Output Formats

PDFPipe supports 10 output formats across three categories: extraction, encoded, and image. Set the format field in your convert request to one of the values in the table below.

Inline delivery (returnMethod: "inline"): Text-oriented extraction formats (json, text, markdown, xml, csv) are usually the best fit for embedding in the JSON API response. Encoded and image formats (base64, binary, png, jpg, webp) are returned as Base64 strings when delivered inline; the response then includes contentEncoding: "base64" and a matching contentType (e.g. image/png). Very large outputs may fall back to a presigned URL, in which case the response carries returnMethodFallback and returnMethodFallbackReason.
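As a rough illustration of the inline rules above, the following Python sketch classifies a convert response. The field names contentEncoding, contentType, and returnMethodFallback come from this page; the payload field `result` and the URL field `resultUrl` are hypothetical names chosen for the example, and treating any truthy returnMethodFallback as a fallback is also an assumption.

```python
import base64
from typing import Tuple, Union


def read_inline_result(response: dict) -> Tuple[str, Union[str, bytes]]:
    """Classify a convert response as a fallback URL, decoded bytes, or text."""
    # returnMethodFallback is named in the docs; treating any truthy value
    # as "the API fell back to a presigned URL" is an assumption.
    if response.get("returnMethodFallback"):
        # "resultUrl" is a hypothetical name for the presigned URL field.
        return "url", response["resultUrl"]

    # "result" is a hypothetical name for the inline payload field.
    payload = response["result"]
    if response.get("contentEncoding") == "base64":
        # Encoded and image formats arrive as Base64 text when inline.
        return "bytes", base64.b64decode(payload)
    return "text", payload


# Simulated inline PNG response (only the 8-byte PNG signature as content).
png_signature = b"\x89PNG\r\n\x1a\n"
sample = {
    "contentType": "image/png",
    "contentEncoding": "base64",
    "result": base64.b64encode(png_signature).decode("ascii"),
}
kind, data = read_inline_result(sample)
print(kind, len(data))
```

Checking the fallback marker first means a caller never tries to decode a payload that was replaced by a URL.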

Format	Value	Category	Content-Type	Tiers
JSON	`json`	Extraction	`application/json`	All tiers
Text	`text`	Extraction	`text/plain`	All tiers
Markdown	`markdown`	Extraction	`text/markdown`	Starter+
XML	`xml`	Extraction	`application/xml`	Starter+
CSV	`csv`	Extraction	`text/csv`	Starter+
Base64	`base64`	Encoded	`text/plain`	Starter+
Binary	`binary`	Encoded	`application/pdf`	Starter+
PNG	`png`	Image	`image/png`	Starter+
JPG	`jpg`	Image	`image/jpeg`	Starter+
WebP	`webp`	Image	`image/webp`	Starter+
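
The table can be mirrored client-side to catch an invalid format value before a request is sent. This is only a sketch: the lookup is transcribed from the table, while the helper names are made up for illustration, and a real convert request will need further fields (such as the PDF source) that are not covered on this page.

```python
# (category, Content-Type, available on all tiers) per format value,
# transcribed from the table above.
FORMATS = {
    "json": ("Extraction", "application/json", True),
    "text": ("Extraction", "text/plain", True),
    "markdown": ("Extraction", "text/markdown", False),
    "xml": ("Extraction", "application/xml", False),
    "csv": ("Extraction", "text/csv", False),
    "base64": ("Encoded", "text/plain", False),
    "binary": ("Encoded", "application/pdf", False),
    "png": ("Image", "image/png", False),
    "jpg": ("Image", "image/jpeg", False),
    "webp": ("Image", "image/webp", False),
}


def build_convert_fields(fmt: str, return_method: str = "inline") -> dict:
    """Return only the format-related fields of a convert request body."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(FORMATS)}")
    return {"format": fmt, "returnMethod": return_method}


def requires_starter(fmt: str) -> bool:
    """True when the table lists the format as Starter+ rather than All tiers."""
    return not FORMATS[fmt][2]


print(build_convert_fields("markdown"))
print(requires_starter("json"), requires_starter("png"))
```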

Extraction Formats

Extraction formats parse the PDF content and return structured or plain-text data. These are the most commonly used formats.

JSON

format: "json"

Structured page-by-page extraction with text, tables, metadata, font information, and coordinates. The richest output format.

Example JSON output

{
  "pages": [
    {
      "page": 1,
      "width": 612,
      "height": 792,
      "text": "Invoice #2026-0142...",
      "tables": [
        {
          "rows": [
            ["Item", "Qty", "Price"],
            ["API Credits", "1000", "$49.00"]
          ]
        }
      ],
      "metadata": {
        "fonts": ["Helvetica"],
        "hasImages": false
      }
    }
  ],
  "documentMetadata": {
    "title": "Invoice",
    "author": "Acme Corp",
    "pageCount": 1,
    "fileSize": 24576
  }
}

Common use cases

Data extraction and processing pipelines
LLM / RAG document ingestion
Table extraction for spreadsheets or databases
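
For the table-extraction use case, a minimal Python sketch that flattens the JSON output into one record per table row could look like this. It assumes the first row of every table is a header row, as in the example above; the function name and the added `page` key are illustrative.

```python
import json


def tables_as_records(document: dict) -> list:
    """Flatten every table in the JSON output into a list of row dicts."""
    records = []
    for page in document.get("pages", []):
        for table in page.get("tables", []):
            rows = table.get("rows", [])
            if len(rows) < 2:
                continue  # no data rows below the header
            header, *body = rows  # assumption: first row holds column names
            for row in body:
                record = dict(zip(header, row))
                record["page"] = page["page"]  # remember where the row came from
                records.append(record)
    return records


# Trimmed copy of the example output above.
sample = json.loads("""
{
  "pages": [
    {
      "page": 1,
      "tables": [
        {"rows": [["Item", "Qty", "Price"], ["API Credits", "1000", "$49.00"]]}
      ]
    }
  ]
}
""")
print(tables_as_records(sample))
```

Note that cell values arrive as strings in the example, so numeric columns need explicit conversion before loading into a database.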

Text

format: "text"

Plain-text extraction. All pages are concatenated, separated by page breaks, with no structural metadata.

Example Text output

Invoice #2026-0142
Date: February 15, 2026

Item          Qty    Price
API Credits   1000   $49.00
Priority      1      $29.00

Total: $78.00

Common use cases

Full-text search indexing
Simple text processing
Content previews
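
If per-page text is needed for indexing, the concatenated output can be split again. The sketch below assumes the page break is a form feed character (`\f`); this page does not say which character is used, so check a real response and pass a different `page_break` if needed.

```python
def split_pages(text: str, page_break: str = "\f") -> list:
    """Split concatenated text output into per-page strings.

    The form feed default is an assumption, not documented behaviour.
    """
    pages = (page.strip() for page in text.split(page_break))
    return [page for page in pages if page]


def preview(text: str, limit: int = 40) -> str:
    """Return a short single-line preview of the first page."""
    pages = split_pages(text)
    first = " ".join(pages[0].split()) if pages else ""
    return first if len(first) <= limit else first[: limit - 3] + "..."


sample = "Invoice #2026-0142\nDate: February 15, 2026\n\fTerms and conditions"
print(split_pages(sample))
print(preview(sample))
```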

Markdown

format: "markdown"

Markdown-formatted text with headings, tables, and lists preserved as Markdown syntax. Ideal for LLM prompts.

Example Markdown output

# Invoice #2026-0142

**Date:** February 15, 2026

| Item | Qty | Price |
|------|-----|-------|
| API Credits | 1000 | $49.00 |
| Priority | 1 | $29.00 |

**Total:** $78.00

Common use cases

LLM prompt context (keeps headings and tables in a compact syntax)
Documentation generation
Content migration
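
For LLM or RAG ingestion, Markdown output is often chunked by heading before it is placed in a prompt. The following Python sketch shows one simple way to do that; the function name is illustrative, and it deliberately ignores edge cases such as `#` lines inside fenced code blocks.

```python
def split_by_heading(markdown: str) -> list:
    """Split Markdown into (heading, body) pairs at lines starting with '#'."""
    sections = []
    heading, body = "", []

    def flush():
        text = "\n".join(body).strip()
        if heading or text:
            sections.append((heading, text))

    for line in markdown.splitlines():
        if line.startswith("#"):
            flush()  # close the previous section before starting a new one
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    flush()
    return sections


sample = """# Invoice #2026-0142

**Date:** February 15, 2026

| Item | Qty | Price |
|------|-----|-------|
| API Credits | 1000 | $49.00 |
"""
for title, text in split_by_heading(sample):
    print(title, len(text))
```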

XML

format: "xml"

Structured XML output with page, text, and metadata elements. Useful for systems that consume XML natively.

Example XML output

<?xml version="1.0" encoding="UTF-8"?>
<document>
  <metadata>
    <title>Invoice #2026-0142</title>
    <pageCount>1</pageCount>
  </metadata>
  <pages>
    <page number="1">
      <text>Invoice #2026-0142...</text>
    </page>
  </pages>
</document>

Common use cases

Enterprise system integrations
XSLT transformation pipelines
Legacy system compatibility
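
The XML output can be read with any standard parser. As a sketch, the Python standard library's ElementTree handles the example above; the element names are taken from that example, and real output may contain additional elements not shown here.

```python
import xml.etree.ElementTree as ET


def summarize_xml(xml_text: str) -> dict:
    """Pull the title, page count, and per-page text out of the XML output."""
    root = ET.fromstring(xml_text)
    pages = {
        int(page.get("number")): (page.findtext("text") or "")
        for page in root.iterfind("pages/page")
    }
    return {
        "title": root.findtext("metadata/title"),
        "page_count": int(root.findtext("metadata/pageCount", default="0")),
        "pages": pages,
    }


sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<document>\n"
    "  <metadata>\n"
    "    <title>Invoice #2026-0142</title>\n"
    "    <pageCount>1</pageCount>\n"
    "  </metadata>\n"
    "  <pages>\n"
    '    <page number="1">\n'
    "      <text>Invoice #2026-0142...</text>\n"
    "    </page>\n"
    "  </pages>\n"
    "</document>\n"
)
print(summarize_xml(sample))
```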

CSV

format: "csv"

Comma-separated values from detected tables. Each table is output sequentially. Best for PDFs with clear tabular data.

Example CSV output

Item,Qty,Price
API Credits,1000,$49.00
Priority Support,1,$29.00

Common use cases

Spreadsheet import (Excel, Google Sheets)
Database loading
Financial document processing
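
A short Python sketch for loading the CSV output follows. Because tables are output sequentially, it assumes consecutive tables are separated by a blank line and that each table starts with a header row; neither detail is confirmed on this page, so verify against a real multi-table response.

```python
import csv
import io


def parse_tables(csv_text: str) -> list:
    """Parse CSV output into a list of tables, each a list of row dicts."""
    tables = []
    normalized = csv_text.replace("\r\n", "\n").strip()
    # Assumption: a blank line separates one table from the next.
    for block in normalized.split("\n\n"):
        if not block.strip():
            continue
        reader = csv.DictReader(io.StringIO(block))
        tables.append(list(reader))
    return tables


sample = "Item,Qty,Price\nAPI Credits,1000,$49.00\nPriority Support,1,$29.00\n"
tables = parse_tables(sample)

# Values are strings; strip the currency symbol before doing arithmetic.
total = sum(float(row["Price"].lstrip("$")) for row in tables[0])
print(len(tables), total)
```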

Encoded Formats

Encoded formats return the raw PDF file content in a transport-friendly encoding. Useful when you need the original file rather than extracted text.

Base64

format: "base64"

The raw PDF file content encoded as a Base64 string. Useful when you need the original PDF bytes embedded in a JSON payload or email.

Example Base64 output

JVBERi0xLjQKMSAwIG9iago8PAov
VHlwZSAvQ2F0YWxvZwovUGFnZXMg
MiAwIFIKPj4KZW5kb2JqCjIgMCAo...

Common use cases

Embedding PDFs in API responses
Email attachments
Systems that require Base64 input
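
Decoding the Base64 output back into PDF bytes takes only the standard library. The sketch below strips line breaks first, since the example above is wrapped across lines, and checks for the `%PDF-` header that every PDF file starts with; the function name is illustrative.

```python
import base64


def decode_pdf(b64_text: str) -> bytes:
    """Decode Base64 output and verify that the result looks like a PDF."""
    compact = "".join(b64_text.split())  # drop line breaks and spaces
    data = base64.b64decode(compact, validate=True)
    if not data.startswith(b"%PDF-"):
        raise ValueError("decoded data does not start with a PDF header")
    return data


# First two lines of the example output above (a truncated PDF).
sample = "JVBERi0xLjQKMSAwIG9iago8PAov\nVHlwZSAvQ2F0YWxvZwovUGFnZXMg"
pdf_bytes = decode_pdf(sample)
print(pdf_bytes[:8])
```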

Binary

format: "binary"

The raw PDF file bytes. The result URL serves the original PDF as a binary download.

Example Binary output

(Binary PDF data - download via the presigned result URL)

Common use cases

PDF archival and storage
Re-serving PDFs that were downloaded as attachments
Proxying PDFs through your own system
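
For archival, the binary result can be streamed from the presigned URL straight to disk. This Python sketch uses only the standard library; how the result URL is obtained from the API response is not covered here, so `result_url` is simply a parameter, and both function names are illustrative.

```python
import shutil
import urllib.request
from pathlib import Path


def download_pdf(result_url: str, destination: Path) -> Path:
    """Stream the result URL to a file without loading it all into memory."""
    with urllib.request.urlopen(result_url) as response:
        with destination.open("wb") as out:
            shutil.copyfileobj(response, out)
    return destination


def looks_like_pdf(path: Path) -> bool:
    """Check the file header; every PDF starts with the bytes %PDF-."""
    with path.open("rb") as handle:
        return handle.read(5) == b"%PDF-"
```

Calling `looks_like_pdf` on the downloaded file is a cheap guard against saving an error page, for example when a presigned URL has expired.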

Image Formats

Image formats render each page of the PDF as a raster image. Useful for previews, thumbnails, and visual processing.
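
When image results are downloaded or decoded, it can help to confirm that the bytes match the requested format. The Python sketch below checks the standard file signatures for PNG, JPEG, and WebP and returns the matching Content-Type from the table; the function name is illustrative.

```python
from typing import Optional


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return the Content-Type implied by the file signature, or None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


print(sniff_image_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
print(sniff_image_type(b"RIFF\x24\x00\x00\x00WEBPVP8 "))
```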

PNG

format: "png"

High-quality rasterised images of each PDF page in PNG format. Lossless compression, best for documents with text.

Example PNG output

(PNG image data - download via the presigned result URL)

Common use cases

Document thumbnails and previews
OCR pre-processing
Visual comparison and auditing

JPG

format: "jpg"

Rasterised page images in JPEG format. Smaller file sizes than PNG with lossy compression.

Example JPG output

(JPEG image data - download via the presigned result URL)

Common use cases

Web thumbnails where file size matters
Social media previews
Quick visual previews

WebP

format: "webp"

Modern image format that typically produces smaller files than JPEG or PNG at comparable quality. A good balance of quality and file size for web display.

Example WebP output

(WebP image data - download via the presigned result URL)

Common use cases

Web applications optimised for performance
Mobile-friendly document previews
Progressive web apps