Output Formats

PDFPipe supports 10 output formats across three categories. Set the format field in your convert request to choose one.

Inline delivery (returnMethod: "inline"): Text-oriented extraction formats (json, text, markdown, xml, csv) are usually the best fit for embedding in the JSON API response. Encoded and image formats (base64, binary, png, jpg, webp) are returned as Base64 strings when delivered inline; the response then includes contentEncoding: "base64" and a matching contentType (e.g. image/png). Very large outputs may fall back to a presigned URL, in which case the response includes returnMethodFallback and returnMethodFallbackReason.
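As a minimal sketch of consuming an inline result, the helper below decodes the payload when contentEncoding is "base64" and returns None when the API fell back to a presigned URL. The field name content is hypothetical (the documentation above does not say which field carries the data), and returnMethodFallback is assumed to be truthy when a fallback occurred.

```python
import base64
from typing import Optional, Union


def decode_inline(response: dict) -> Optional[Union[bytes, str]]:
    """Return the inline payload, or None if the API fell back to a URL."""
    # Assumption: returnMethodFallback is truthy when a fallback happened.
    if response.get("returnMethodFallback"):
        return None
    payload = response["content"]  # hypothetical field name
    if response.get("contentEncoding") == "base64":
        return base64.b64decode(payload)
    return payload  # text-oriented formats are returned as-is


# The PNG signature, Base64-encoded, stands in for a real inline image result.
example = {
    "contentType": "image/png",
    "contentEncoding": "base64",
    "content": base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii"),
}
decoded = decode_inline(example)
```

Returning None for the fallback case keeps the caller's branching explicit: a None result means the data has to be downloaded rather than read from the response body.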

Format    Value     Category    Content-Type      Tiers
JSON      json      Extraction  application/json  All tiers
Text      text      Extraction  text/plain        All tiers
Markdown  markdown  Extraction  text/markdown     Starter+
XML       xml       Extraction  application/xml   Starter+
CSV       csv       Extraction  text/csv          Starter+
Base64    base64    Encoded     text/plain        Starter+
Binary    binary    Encoded     application/pdf   Starter+
PNG       png       Image       image/png         Starter+
JPG       jpg       Image       image/jpeg        Starter+
WebP      webp      Image       image/webp        Starter+
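
To illustrate choosing a format, here is a small sketch that validates the format value against the table above before building a request body. The format and returnMethod fields come from this page; the url field and the function name build_convert_request are assumptions made for the example, and no endpoint is called.

```python
# Format values and their minimum tiers, copied from the table above.
FORMAT_TIERS = {
    "json": "All tiers",
    "text": "All tiers",
    "markdown": "Starter+",
    "xml": "Starter+",
    "csv": "Starter+",
    "base64": "Starter+",
    "binary": "Starter+",
    "png": "Starter+",
    "jpg": "Starter+",
    "webp": "Starter+",
}


def build_convert_request(source_url: str, fmt: str, inline: bool = False) -> dict:
    """Build a convert request body, rejecting unknown format values early."""
    if fmt not in FORMAT_TIERS:
        raise ValueError(f"unsupported format: {fmt!r}")
    body = {"url": source_url, "format": fmt}  # "url" is a hypothetical field
    if inline:
        body["returnMethod"] = "inline"
    return body


request_body = build_convert_request("https://example.com/invoice.pdf", "markdown")
```

Checking the value locally avoids a round trip for a typo such as "jpeg" instead of "jpg".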

Extraction Formats

Extraction formats parse the PDF content and return structured or plain-text data. These are the most commonly used formats.

JSON

format: "json"

Structured page-by-page extraction with text, tables, metadata, font information, and coordinates. The richest output format.

Example JSON output
{
  "pages": [
    {
      "page": 1,
      "width": 612,
      "height": 792,
      "text": "Invoice #2026-0142...",
      "tables": [
        {
          "rows": [
            ["Item", "Qty", "Price"],
            ["API Credits", "1000", "$49.00"]
          ]
        }
      ],
      "metadata": {
        "fonts": ["Helvetica"],
        "hasImages": false
      }
    }
  ],
  "documentMetadata": {
    "title": "Invoice",
    "author": "Acme Corp",
    "pageCount": 1,
    "fileSize": 24576
  }
}

Common use cases

  • Data extraction and processing pipelines
  • LLM / RAG document ingestion
  • Table extraction for spreadsheets or databases
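
For table extraction, the following sketch flattens the tables in a JSON result into one record per data row. It relies only on the pages, page, tables, and rows keys shown in the example output, and it assumes the first row of each table is a header row, which holds for the example but may not hold for every document.

```python
def tables_as_records(result: dict) -> list:
    """Turn every table row into a dict keyed by the table's header row."""
    records = []
    for page in result.get("pages", []):
        for table in page.get("tables", []):
            rows = table.get("rows", [])
            if len(rows) < 2:
                continue  # no data rows under the header
            header, *body = rows  # assumption: first row is the header
            for row in body:
                record = {"page": page["page"]}
                record.update(zip(header, row))
                records.append(record)
    return records


# Trimmed copy of the example output above.
result = {
    "pages": [
        {
            "page": 1,
            "tables": [
                {"rows": [["Item", "Qty", "Price"], ["API Credits", "1000", "$49.00"]]}
            ],
        }
    ]
}
records = tables_as_records(result)
```

Values stay as strings, as in the example output; converting "1000" or "$49.00" to numbers is left to the caller.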

Text

format: "text"

Plain-text extraction. All pages are concatenated, separated by page breaks, with no structural metadata.

Example Text output
Invoice #2026-0142
Date: February 15, 2026

Item          Qty    Price
API Credits   1000   $49.00
Priority      1      $29.00

Total: $78.00

Common use cases

  • Full-text search indexing
  • Simple text processing
  • Content previews
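
As one way to approach the full-text search use case, this sketch builds a tiny inverted index that maps each lowercased token to the line numbers it appears on. The tokenisation rule (runs of ASCII letters and digits) is a simplification chosen for the example, not something PDFPipe prescribes.

```python
import re
from collections import defaultdict


def build_index(text: str) -> dict:
    """Map each lowercased token to the set of 1-based line numbers containing it."""
    index = defaultdict(set)
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in re.findall(r"[a-z0-9]+", line.lower()):
            index[token].add(line_no)
    return dict(index)


sample = "Invoice #2026-0142\nDate: February 15, 2026\n\nTotal: $78.00\n"
index = build_index(sample)
```

A production search index would normally add stemming and Unicode-aware tokenisation; the point here is only that the text format needs no parsing before indexing.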

Markdown

format: "markdown"

Markdown-formatted text with headings, tables, and lists preserved as Markdown syntax. Ideal for LLM prompts.

Example Markdown output
# Invoice #2026-0142

**Date:** February 15, 2026

| Item | Qty | Price |
|------|-----|-------|
| API Credits | 1000 | $49.00 |
| Priority | 1 | $29.00 |

**Total:** $78.00

Common use cases

  • LLM prompt context (best format for AI models)
  • Documentation generation
  • Content migration
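
Because headings are preserved as Markdown syntax, the output can be split into sections before it is passed to a model or a retrieval index. The sketch below splits on ATX headings (lines starting with one to six # characters); the function name split_sections is an illustration, and heading-like lines inside fenced code blocks are not handled.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_sections(markdown: str) -> list:
    """Return (heading, body) pairs; text before the first heading gets heading ''."""
    sections = []
    title, lines = "", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            if title or any(l.strip() for l in lines):
                sections.append((title, "\n".join(lines).strip()))
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    if title or any(l.strip() for l in lines):
        sections.append((title, "\n".join(lines).strip()))
    return sections


sample = "# Invoice #2026-0142\n\n**Date:** February 15, 2026\n\n**Total:** $78.00\n"
sections = split_sections(sample)
```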

XML

format: "xml"

Structured XML output with page, text, and metadata elements. Useful for systems that consume XML natively.

Example XML output
<?xml version="1.0" encoding="UTF-8"?>
<document>
  <metadata>
    <title>Invoice #2026-0142</title>
    <pageCount>1</pageCount>
  </metadata>
  <pages>
    <page number="1">
      <text>Invoice #2026-0142...</text>
    </page>
  </pages>
</document>

Common use cases

  • Enterprise system integrations
  • XSLT transformation pipelines
  • Legacy system compatibility
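
The XML output can be read with a standard parser. This sketch uses Python's xml.etree.ElementTree and relies only on the element and attribute names visible in the example above (document, metadata, title, pages, page, number, text); a real response may contain more elements than are shown there.

```python
import xml.etree.ElementTree as ET

xml_output = b"""<?xml version="1.0" encoding="UTF-8"?>
<document>
  <metadata>
    <title>Invoice #2026-0142</title>
    <pageCount>1</pageCount>
  </metadata>
  <pages>
    <page number="1">
      <text>Invoice #2026-0142...</text>
    </page>
  </pages>
</document>
"""


def page_texts(xml_bytes: bytes) -> dict:
    """Map each page number to the text of that page."""
    root = ET.fromstring(xml_bytes)
    return {
        int(page.get("number")): page.findtext("text") or ""
        for page in root.iterfind("./pages/page")
    }


title = ET.fromstring(xml_output).findtext("./metadata/title")
pages = page_texts(xml_output)
```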

CSV

format: "csv"

Comma-separated values from detected tables. Each table is output sequentially. Best for PDFs with clear tabular data.

Example CSV output
Item,Qty,Price
API Credits,1000,$49.00
Priority Support,1,$29.00

Common use cases

  • Spreadsheet import (Excel, Google Sheets)
  • Database loading
  • Financial document processing
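
A rough sketch of loading the CSV output follows. The description above says tables are output sequentially but does not say how they are delimited, so the code assumes a blank line between tables; that separator is an assumption to verify against real output. Quoted fields containing line breaks would also need a more careful splitter.

```python
import csv


def parse_tables(csv_text: str) -> list:
    """Split CSV text into tables and parse each into a list of row dicts."""
    tables, block = [], []
    for line in csv_text.splitlines():
        if line.strip():
            block.append(line)
        elif block:  # assumption: a blank line ends the current table
            tables.append(list(csv.DictReader(block)))
            block = []
    if block:
        tables.append(list(csv.DictReader(block)))
    return tables


sample = "Item,Qty,Price\nAPI Credits,1000,$49.00\nPriority Support,1,$29.00\n"
tables = parse_tables(sample)
```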

Encoded Formats

Encoded formats return the raw PDF file content in a transport-friendly encoding. Useful when you need the original file rather than extracted text.

Base64

format: "base64"

The raw PDF file content encoded as a Base64 string. Useful when you need the original PDF bytes embedded in a JSON payload or email.

Example Base64 output
JVBERi0xLjQKMSAwIG9iago8PAov
VHlwZSAvQ2F0YWxvZwovUGFnZXMg
MiAwIFIKPj4KZW5kb2JqCjIgMCAo...

Common use cases

  • Embedding PDFs in API responses
  • Email attachments
  • Systems that require Base64 input
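
To recover the PDF bytes, decode the Base64 string and, as a sanity check, confirm the result starts with the %PDF- header. The sketch below strips line breaks first, since the example output is wrapped; the sample uses only the first two complete lines of that example, so it decodes to the start of a PDF rather than a whole file.

```python
import base64


def decode_pdf(b64_text: str) -> bytes:
    """Decode Base64 text (line-wrapped or not) and check for a PDF header."""
    compact = "".join(b64_text.split())  # remove newlines and other whitespace
    data = base64.b64decode(compact, validate=True)
    if not data.startswith(b"%PDF-"):
        raise ValueError("decoded data does not start with a PDF header")
    return data


sample = "JVBERi0xLjQKMSAwIG9iago8PAov\nVHlwZSAvQ2F0YWxvZwovUGFnZXMg\n"
head = decode_pdf(sample)
```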

Binary

format: "binary"

The raw PDF file bytes. The result URL serves the original PDF as a binary download.

Example Binary output
(Binary PDF data - download via the presigned result URL)

Common use cases

  • PDF archival and storage
  • Re-serving PDFs that were downloaded as attachments
  • Proxying PDFs through your own system
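
For archival, the result can be streamed to disk rather than held in memory. The helper below is a sketch that copies any binary file-like object to a path; in practice the stream could come from urllib.request.urlopen called with the presigned result URL, but the demo uses an in-memory stream so that no network access is needed. The name save_pdf is illustrative.

```python
import io
import os
import shutil
import tempfile
from typing import BinaryIO


def save_pdf(stream: BinaryIO, dest_path: str) -> int:
    """Copy a binary stream to dest_path in chunks and return the bytes written."""
    with open(dest_path, "wb") as out:
        shutil.copyfileobj(stream, out)
        return out.tell()


# Stand-in for a downloaded PDF; a real stream would come from the result URL.
demo_path = os.path.join(tempfile.mkdtemp(), "copy.pdf")
written = save_pdf(io.BytesIO(b"%PDF-1.4\n%%EOF\n"), demo_path)
```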

Image Formats

Image formats render each page of the PDF as a raster image. Useful for previews, thumbnails, and visual processing.
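
Whichever image format is requested, a downloaded or decoded page image can be checked against the expected Content-Type by inspecting its leading bytes. The sketch below tests the standard file signatures for PNG, JPEG, and WebP; it is a client-side sanity check and not part of the PDFPipe API.

```python
from typing import Optional


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return the MIME type implied by the file signature, or None if unknown."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


png_type = sniff_image_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
```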

PNG

format: "png"

High-quality rasterised images of each PDF page in PNG format. Lossless compression, best for documents with text.

Example PNG output
(PNG image data - download via the presigned result URL)

Common use cases

  • Document thumbnails and previews
  • OCR pre-processing
  • Visual comparison and auditing

JPG

format: "jpg"

Rasterised page images in JPEG format. Smaller file sizes than PNG with lossy compression.

Example JPG output
(JPEG image data - download via the presigned result URL)

Common use cases

  • Web thumbnails where file size matters
  • Social media previews
  • Quick visual previews

WebP

format: "webp"

Rasterised page images in WebP format. WebP typically produces smaller files than PNG or JPEG at comparable visual quality, giving the best balance of quality and file size for web display.

Example WebP output
(WebP image data - download via the presigned result URL)

Common use cases

  • Web applications optimised for performance
  • Mobile-friendly document previews
  • Progressive web apps