Apache PDFBox is a library that provides an environment for generating, manipulating, rendering and printing PDF documents. It was originally developed in 2002 by Ben Litchfield, was taken up as an Apache project in 2008, and became an Apache top-level project in 2009. It offers Unicode support for PDF creation and improved support for interactive forms.

PDFBox comes with a series of command-line utilities for performing various operations on PDF documents. These utilities include encrypting and decrypting PDFs, overlaying, merging, debugging, converting text to PDF, and converting PDF to an image.

The library consists of four components:
PDFBox - The main part of the PDFBox library. It contains the classes and interfaces related to content extraction and manipulation.
FontBox - Contains the classes and interfaces that handle font information.
XmpBox - Contains the classes and interfaces that handle XMP metadata.
PreFlight - Used to verify PDF files against the PDF/A-1b standard.

Two related Apache projects:
Apache Nutch - A highly extensible and scalable open source web search software. It is based on Apache Lucene, adding a web crawler, a link-graph database, Hadoop-based distributed processing, and parsers for HTML and other file formats.
Apache Tika - A toolkit library used mainly for document type detection and content extraction from various file formats, using existing parser libraries.

Patent listing (Google Patents): CN102750391A, File previewing method and system based on Hadoop distribution type. Inventor: 李伟. Application number: CN201210233929XA. Status: pending.
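To give a rough feel for the "document type detection" that Apache Tika is described as doing, here is a minimal sketch in Python that guesses a file's type from its leading magic bytes. This is not Tika's API and not how Tika is implemented in detail. The names `SIGNATURES`, `detect_type` and `detect_file` are hypothetical, and the three signatures are just well-known examples. A real detector checks many more formats and also considers file names and container contents.

```python
# Illustrative sketch only: magic-byte type detection, NOT Apache Tika's API.
# SIGNATURES, detect_type and detect_file are hypothetical names.

# Each entry pairs a leading byte sequence with the MIME type it suggests.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"PK\x03\x04", "application/zip"),
]

# Fallback for data that matches none of the known signatures.
UNKNOWN = "application/octet-stream"


def detect_type(data: bytes) -> str:
    """Return a MIME type guess based on the first bytes of `data`."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return UNKNOWN


def detect_file(path: str) -> str:
    """Read only the first few bytes of a file and guess its type."""
    with open(path, "rb") as handle:
        head = handle.read(16)  # longer than the longest signature above
    return detect_type(head)
```

Calling `detect_file` on a path needs only the first 16 bytes of the file, so the check stays cheap even for large documents. Content extraction, the other half of what Tika is described as doing, would still require a format-specific parser such as PDFBox.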