Working with PDF Documents

Aspose.Words for Cloud can work with PDF documents. It converts them to the requested format (when the convert call is used) or to DOCX (for all other API calls). The request body and parameters are the same whether the input file is PDF or DOC/DOCX, which keeps the user experience consistent. With PDF documents, Aspose.Words for Cloud lets you:

  • Convert PDF to DOC/DOCX format (see below for limitations)
  • Use the Words API on a PDF document, e.g. count the number of words or get, update, or delete a specified paragraph

For document-modifying requests (POST/PUT/DELETE), the output is always saved in Word format. To work exclusively with PDF files, refer to the Aspose.Pdf for Cloud API.

The destFileName parameter is required for document modification calls when a PDF file is specified as input, because the output is always saved as a Word document.
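The destFileName rule can be checked on the client side before a request is sent. The following is a minimal sketch, not the official SDK: the base URL, the `paragraphs/0` resource path, and the helper name `build_modify_url` are assumptions made for illustration, so check them against the API reference for your version. The function only builds a URL and performs no network call.

```python
from urllib.parse import quote, urlencode

# Assumed base URL; verify it against the API reference you target.
BASE_URL = "https://api.aspose.cloud/v4.0/words"


def build_modify_url(name, resource_path, dest_file_name=None):
    """Build the URL for a modifying call (POST/PUT/DELETE).

    Hypothetical helper: it rejects a PDF input without destFileName,
    since the result of a modification is always a Word document.
    """
    if name.lower().endswith(".pdf") and not dest_file_name:
        raise ValueError("destFileName is required when the input is a PDF")
    url = f"{BASE_URL}/{quote(name)}/{resource_path.strip('/')}"
    if dest_file_name:
        url += "?" + urlencode({"destFileName": dest_file_name})
    return url


if __name__ == "__main__":
    # Example: target the first paragraph of a PDF, saving the result as DOCX.
    print(build_modify_url("report.pdf", "paragraphs/0", "report.docx"))
```

Failing early on a missing destFileName keeps the error local instead of relying on the service to reject the request.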

PDF to Word Conversion

Aspose.Words for Cloud implements its own conversion engine, which takes a PDF document from storage as input and converts it into a Word document, so that it can be saved as DOC/DOCX or processed with the various Aspose.Words APIs. The converter turns the document into a "flow" format: it merges multiple paragraphs into one section, converts tables and lists into native Word tables and lists, and so on, so that the document can then be edited naturally in a Word editing application (such as Microsoft Word).

The Aspose.Words conversion engine deliberately focuses on converting the document structure into the "flow" format. The resulting document is editable, but some complex formatting might look different from the original document.

The converter currently supports the following features:

  • Text and paragraphs
  • Text formatting (font, size, foreground/background, options like bold, italic, underline)
  • Bulleted and numbered lists (including nested lists)
  • Tables (bordered, without merged cells or nested tables)
  • Image conversion
    • Semi-transparent images
    • Rotated images
    • Inline images (images placed “inside” the text as a logical part of a text paragraph)

The following limitations apply and will be addressed in future versions:

  • Each page is processed separately: a section break is inserted between pages
  • Links lose their destination reference during conversion
  • Multi-column text is not supported
  • Protected PDF documents are not supported
  • Vector images (rendered via PDF paint operators) are not converted
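
As a rough illustration of the convert call described above, the sketch below builds a request URL for converting a stored PDF to DOCX. It is written under assumptions rather than taken from the official SDK: the base URL, the `format` and `outPath` query parameter names, and the helper name `build_convert_url` should all be verified against the API reference. The code only constructs a URL; authentication and the actual HTTP request are left out.

```python
from urllib.parse import quote, urlencode

# Assumed base URL; verify it against the API reference you target.
BASE_URL = "https://api.aspose.cloud/v4.0/words"


def build_convert_url(name, target_format="docx", out_path=None):
    """Build a conversion URL for a document already in storage.

    Hypothetical helper: `format` and `outPath` are assumed
    query parameter names for the convert call.
    """
    params = {"format": target_format}
    if out_path:
        params["outPath"] = out_path  # where the converted file is saved
    return f"{BASE_URL}/{quote(name)}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_convert_url("report.pdf"))
    print(build_convert_url("report.pdf", "doc", "converted/report.doc"))
```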

cURL Example


SDK Examples


Reading Paragraphs from a PDF Document

cURL Example


SDK Examples
