Output Formats
PDFPipe supports 10 output formats across three categories. Set the format field in your convert request to choose one.
Inline responses (returnMethod: "inline"): Text-oriented extraction formats (json, text, markdown, xml, csv) are usually the best fit for embedding in the JSON API response. Encoded and image formats (base64, binary, png, jpg, webp) are returned as Base64 strings when inline; the response includes contentEncoding: "base64" and a matching contentType (e.g. image/png). Very large outputs may fall back to a presigned URL, reported via the returnMethodFallback and returnMethodFallbackReason fields.

| Format | Value | Category | Content-Type | Tiers |
|---|---|---|---|---|
| JSON | json | Extraction | application/json | All tiers |
| Text | text | Extraction | text/plain | All tiers |
| Markdown | markdown | Extraction | text/markdown | Starter+ |
| XML | xml | Extraction | application/xml | Starter+ |
| CSV | csv | Extraction | text/csv | Starter+ |
| Base64 | base64 | Encoded | text/plain | Starter+ |
| Binary | binary | Encoded | application/pdf | Starter+ |
| PNG | png | Image | image/png | Starter+ |
| JPG | jpg | Image | image/jpeg | Starter+ |
| WebP | webp | Image | image/webp | Starter+ |
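The inline behaviour described above can be handled with a small helper. This is a minimal sketch, not PDFPipe's client code: the field names contentEncoding, contentType, returnMethodFallback, and returnMethodFallbackReason come from this page, while the `content` field holding the payload, the `FallbackToUrl` exception, and treating returnMethodFallback as a truthy flag are assumptions made for illustration.

```python
import base64
from typing import Union


class FallbackToUrl(Exception):
    """Hypothetical error: the output was not inlined and must be fetched by URL."""


def read_inline_result(response: dict) -> Union[str, bytes]:
    """Return the inline payload as text or as decoded bytes."""
    # Assumption: a truthy returnMethodFallback means the payload is not inline.
    if response.get("returnMethodFallback"):
        reason = response.get("returnMethodFallbackReason", "unknown reason")
        raise FallbackToUrl(reason)

    payload = response["content"]  # hypothetical name for the payload field

    # Encoded and image formats are flagged with contentEncoding: "base64".
    if response.get("contentEncoding") == "base64":
        return base64.b64decode(payload)

    # Text-oriented formats are assumed to arrive as plain strings.
    return payload
```

Callers can then branch on the return type: `bytes` for encoded and image formats, `str` for text-oriented ones, and a caught `FallbackToUrl` when the result has to be downloaded instead.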
Extraction Formats
Extraction formats parse the PDF content and return structured or plain-text data. These are the most commonly used formats.
JSON
format: "json"
Structured page-by-page extraction with text, tables, metadata, font information, and coordinates. The richest output format.
{
  "pages": [
    {
      "page": 1,
      "width": 612,
      "height": 792,
      "text": "Invoice #2026-0142...",
      "tables": [
        {
          "rows": [
            ["Item", "Qty", "Price"],
            ["API Credits", "1000", "$49.00"]
          ]
        }
      ],
      "metadata": {
        "fonts": ["Helvetica"],
        "hasImages": false
      }
    }
  ],
  "documentMetadata": {
    "title": "Invoice",
    "author": "Acme Corp",
    "pageCount": 1,
    "fileSize": 24576
  }
}
Common use cases
- Data extraction and processing pipelines
- LLM / RAG document ingestion
- Table extraction for spreadsheets or databases
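For the table-extraction use case, the JSON output can be flattened into one record per table row. The sketch below works on an abridged copy of the sample response; the `iter_table_records` helper is hypothetical, and treating the first row of each table as a header is an assumption suggested by the sample rather than something this page guarantees.

```python
import json
from typing import Dict, Iterator, Tuple

# Abridged copy of the sample JSON output from this page.
SAMPLE = """
{
  "pages": [
    {
      "page": 1,
      "text": "Invoice #2026-0142...",
      "tables": [
        {"rows": [["Item", "Qty", "Price"], ["API Credits", "1000", "$49.00"]]}
      ]
    }
  ]
}
"""


def iter_table_records(result: dict) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (page number, record) for every data row of every table."""
    for page in result.get("pages", []):
        for table in page.get("tables", []):
            rows = table.get("rows", [])
            if len(rows) < 2:
                continue  # nothing beyond a header row
            # Assumption: the first row holds the column names.
            header, *body = rows
            for row in body:
                yield page["page"], dict(zip(header, row))


if __name__ == "__main__":
    for page_number, record in iter_table_records(json.loads(SAMPLE)):
        print(page_number, record)
```

Cell values stay as strings, so numeric columns such as Qty still need explicit conversion before loading into a spreadsheet or database.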
Text
format: "text"
Plain text extraction. All pages concatenated with page breaks. No structural metadata.
Invoice #2026-0142
Date: February 15, 2026
Item Qty Price
API Credits 1000 $49.00
Priority 1 $29.00
Total: $78.00
Common use cases
- Full-text search indexing
- Simple text processing
- Content previews
Markdown
format: "markdown"
Markdown-formatted text with headings, tables, and lists preserved as Markdown syntax. Ideal for LLM prompts.
# Invoice #2026-0142
**Date:** February 15, 2026
| Item | Qty | Price |
|------|-----|-------|
| API Credits | 1000 | $49.00 |
| Priority | 1 | $29.00 |
**Total:** $78.00
Common use cases
- LLM prompt context (best format for AI models)
- Documentation generation
- Content migration
XML
format: "xml"
Structured XML output with page, text, and metadata elements. Useful for systems that consume XML natively.
<?xml version="1.0" encoding="UTF-8"?>
<document>
  <metadata>
    <title>Invoice #2026-0142</title>
    <pageCount>1</pageCount>
  </metadata>
  <pages>
    <page number="1">
      <text>Invoice #2026-0142...</text>
    </page>
  </pages>
</document>
Common use cases
- Enterprise system integrations
- XSLT transformation pipelines
- Legacy system compatibility
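As an illustration of consuming the XML output, the sample document can be read with Python's standard xml.etree.ElementTree module. This is a minimal sketch that assumes the element layout matches the sample on this page exactly; `parse_xml_result` is a hypothetical helper, not part of any PDFPipe client.

```python
import xml.etree.ElementTree as ET

# Copy of the sample XML output from this page, kept as bytes so the
# encoding declaration is honoured by the parser.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<document>
  <metadata>
    <title>Invoice #2026-0142</title>
    <pageCount>1</pageCount>
  </metadata>
  <pages>
    <page number="1">
      <text>Invoice #2026-0142...</text>
    </page>
  </pages>
</document>
"""


def parse_xml_result(xml_bytes: bytes) -> dict:
    """Collect the title, page count, and per-page text from the XML output."""
    root = ET.fromstring(xml_bytes)
    pages = {
        int(page.get("number")): page.findtext("text", default="")
        for page in root.iterfind("pages/page")
    }
    return {
        "title": root.findtext("metadata/title"),
        "page_count": int(root.findtext("metadata/pageCount", default="0")),
        "pages": pages,
    }


if __name__ == "__main__":
    print(parse_xml_result(SAMPLE_XML))
```

For untrusted PDFs it may be worth parsing with a hardened XML library instead, since the standard parser is not designed to defend against maliciously constructed input.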
CSV
format: "csv"
Comma-separated values from detected tables. Each table is output sequentially. Best for PDFs with clear tabular data.
Item,Qty,Price
API Credits,1000,$49.00
Priority Support,1,$29.00
Common use cases
- Spreadsheet import (Excel, Google Sheets)
- Database loading
- Financial document processing
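A rough sketch of loading the CSV output with Python's csv module follows. This page says tables are output sequentially but does not say how they are delimited, so the blank-line separator used here is an assumption, as is the hypothetical `parse_csv_tables` helper. Splitting on lines also means quoted cells containing line breaks would not survive this approach.

```python
import csv
import io
from typing import Dict, List

# Copy of the sample CSV output from this page.
SAMPLE_CSV = "Item,Qty,Price\nAPI Credits,1000,$49.00\nPriority Support,1,$29.00\n"


def parse_csv_tables(csv_text: str) -> List[List[Dict[str, str]]]:
    """Split the output into tables and parse each one into row dictionaries."""
    blocks: List[List[str]] = []
    current: List[str] = []
    for line in csv_text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # Assumption: a blank line marks the end of one table.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)

    # The first line of each block is used as that table's header row.
    return [list(csv.DictReader(io.StringIO("\n".join(block)))) for block in blocks]


if __name__ == "__main__":
    for table in parse_csv_tables(SAMPLE_CSV):
        for row in table:
            print(row)
```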
Encoded Formats
Encoded formats return the raw PDF file content in a transport-friendly encoding. Useful when you need the original file rather than extracted text.
Base64
format: "base64"
The raw PDF file content encoded as a Base64 string. Useful when you need the original PDF bytes embedded in a JSON payload or email.
JVBERi0xLjQKMSAwIG9iago8PAov
VHlwZSAvQ2F0YWxvZwovUGFnZXMg
MiAwIFIKPj4KZW5kb2JqCjIgMCAo...
Common use cases
- Embedding PDFs in API responses
- Email attachments
- Systems that require Base64 input
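To recover the original PDF bytes from the base64 output, the string can be decoded with Python's standard base64 module. The sketch below uses only the first line of the sample above, since the rest is truncated. The `decode_pdf_base64` helper and the check for the `%PDF-` signature are illustrative additions, not documented PDFPipe behaviour.

```python
import base64

# First line of the sample output from this page (the remainder is elided there).
SAMPLE_FIRST_LINE = "JVBERi0xLjQKMSAwIG9iago8PAov"


def decode_pdf_base64(encoded: str) -> bytes:
    """Decode a Base64 string and confirm the result looks like a PDF."""
    # Drop line breaks and other whitespace that may wrap the encoded text.
    compact = "".join(encoded.split())
    data = base64.b64decode(compact, validate=True)
    if not data.startswith(b"%PDF-"):
        raise ValueError("decoded payload does not start with the PDF signature")
    return data


if __name__ == "__main__":
    print(decode_pdf_base64(SAMPLE_FIRST_LINE))
```

Passing `validate=True` makes the decoder reject characters outside the Base64 alphabet instead of silently discarding them, which helps catch a truncated or corrupted payload early.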
Binary
format: "binary"
The raw PDF file bytes. The result URL serves the original PDF as a binary download.
(Binary PDF data - download via the presigned result URL)
Common use cases
- PDF archival and storage
- Re-serving downloaded attachment PDFs
- Proxying PDFs through your own system
Image Formats
Image formats render each page of the PDF as a raster image. Useful for previews, thumbnails, and visual processing.
PNG
format: "png"
High-quality rasterised images of each PDF page in PNG format. Lossless compression, best for documents with text.
(PNG image data - download via the presigned result URL)
Common use cases
- Document thumbnails and previews
- OCR pre-processing
- Visual comparison and auditing
JPG
format: "jpg"
Rasterised page images in JPEG format. Smaller file sizes than PNG with lossy compression.
(JPEG image data - download via the presigned result URL)
Common use cases
- Web thumbnails where file size matters
- Social media previews
- Quick visual previews
WebP
format: "webp"
Modern image format that typically compresses better than PNG or JPEG at comparable quality. A good balance of quality and file size for web display.
(WebP image data - download via the presigned result URL)
Common use cases
- Web applications optimised for performance
- Mobile-friendly document previews
- Progressive web apps