Output Formats
PDFPipe supports 10 output formats across three categories. Set the format field in your convert request to choose one.
| Format | Value | Category | Content-Type | Tiers |
|---|---|---|---|---|
| JSON | json | Extraction | application/json | All tiers |
| Text | text | Extraction | text/plain | All tiers |
| Markdown | markdown | Extraction | text/markdown | Starter+ |
| XML | xml | Extraction | application/xml | Starter+ |
| CSV | csv | Extraction | text/csv | Starter+ |
| Base64 | base64 | Encoded | text/plain | Starter+ |
| Binary | binary | Encoded | application/pdf | Starter+ |
| PNG | png | Image | image/png | Starter+ |
| JPG | jpg | Image | image/jpeg | Starter+ |
| WebP | webp | Image | image/webp | Starter+ |
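As a rough sketch of how a client might use the table above, the Python below mirrors it as a lookup and validates a format value before building a request body. Only the `format` field comes from this page. `OutputFormat`, `FORMATS`, `build_convert_payload`, and the `source` field in the usage line are hypothetical names chosen for illustration, not part of the PDFPipe API.

```python
from typing import Any, NamedTuple


class OutputFormat(NamedTuple):
    category: str
    content_type: str
    tiers: str  # copied verbatim from the "Tiers" column


# Mirror of the format table on this page, keyed by the `format` value.
FORMATS: dict[str, OutputFormat] = {
    "json": OutputFormat("Extraction", "application/json", "All tiers"),
    "text": OutputFormat("Extraction", "text/plain", "All tiers"),
    "markdown": OutputFormat("Extraction", "text/markdown", "Starter+"),
    "xml": OutputFormat("Extraction", "application/xml", "Starter+"),
    "csv": OutputFormat("Extraction", "text/csv", "Starter+"),
    "base64": OutputFormat("Encoded", "text/plain", "Starter+"),
    "binary": OutputFormat("Encoded", "application/pdf", "Starter+"),
    "png": OutputFormat("Image", "image/png", "Starter+"),
    "jpg": OutputFormat("Image", "image/jpeg", "Starter+"),
    "webp": OutputFormat("Image", "image/webp", "Starter+"),
}


def build_convert_payload(fmt: str, **other_fields: Any) -> dict[str, Any]:
    """Return a request body with a validated `format` field.

    `other_fields` stands in for whatever else the convert request needs;
    those field names are not documented on this page.
    """
    if fmt not in FORMATS:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(FORMATS)}")
    return {**other_fields, "format": fmt}


# `source` is a hypothetical field name used only for this example.
payload = build_convert_payload("markdown", source="invoice.pdf")
print(payload, FORMATS["markdown"].content_type)
```

Validating the value locally catches typos such as "jpeg" (the table lists "jpg") before a request is sent.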
Extraction Formats
Extraction formats parse the PDF content and return structured or plain-text data. These are the most commonly used formats.
JSON
format: "json"
Structured page-by-page extraction with text, tables, metadata, font information, and coordinates. The richest output format.
{
  "pages": [
    {
      "page": 1,
      "width": 612,
      "height": 792,
      "text": "Invoice #2026-0142...",
      "tables": [
        {
          "rows": [
            ["Item", "Qty", "Price"],
            ["API Credits", "1000", "$49.00"]
          ]
        }
      ],
      "metadata": {
        "fonts": ["Helvetica"],
        "hasImages": false
      }
    }
  ],
  "documentMetadata": {
    "title": "Invoice",
    "author": "Acme Corp",
    "pageCount": 1,
    "fileSize": 24576
  }
}
Common use cases
- Data extraction and processing pipelines
- LLM / RAG document ingestion
- Table extraction for spreadsheets or databases
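To illustrate the table-extraction use case, here is a minimal Python sketch that walks a result shaped like the sample above and turns each table into a list of records. The key names (`pages`, `tables`, `rows`) are taken from the sample. Treating the first row of every table as a header is an assumption, and `tables_as_records` is a hypothetical helper name.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE = """
{
  "pages": [
    {
      "page": 1,
      "text": "Invoice #2026-0142...",
      "tables": [
        {"rows": [["Item", "Qty", "Price"], ["API Credits", "1000", "$49.00"]]}
      ]
    }
  ],
  "documentMetadata": {"title": "Invoice", "pageCount": 1}
}
"""


def tables_as_records(result: dict) -> list[dict[str, str]]:
    """Flatten every table on every page into header-keyed dictionaries.

    Assumes the first row of each table is a header row; tables with
    fewer than two rows are skipped.
    """
    records: list[dict[str, str]] = []
    for page in result.get("pages", []):
        for table in page.get("tables", []):
            rows = table.get("rows", [])
            if len(rows) < 2:
                continue
            header, *body = rows
            records.extend(dict(zip(header, row)) for row in body)
    return records


records = tables_as_records(json.loads(SAMPLE))
print(records)
```

Note that the cell values in the sample are strings ("1000", "$49.00"), so numeric columns need converting before any arithmetic.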
Text
format: "text"
Plain text extraction. All pages concatenated with page breaks. No structural metadata.
Invoice #2026-0142
Date: February 15, 2026
Item Qty Price
API Credits 1000 $49.00
Priority 1 $29.00
Total: $78.00
Common use cases
- Full-text search indexing
- Simple text processing
- Content previews
Markdown
format: "markdown"
Markdown-formatted text with headings, tables, and lists preserved as Markdown syntax. Ideal for LLM prompts.
# Invoice #2026-0142
**Date:** February 15, 2026
| Item | Qty | Price |
|------|-----|-------|
| API Credits | 1000 | $49.00 |
| Priority | 1 | $29.00 |
**Total:** $78.00
Common use cases
- LLM prompt context (best format for AI models)
- Documentation generation
- Content migration
XML
format: "xml"
Structured XML output with page, text, and metadata elements. Useful for systems that consume XML natively.
<?xml version="1.0" encoding="UTF-8"?>
<document>
  <metadata>
    <title>Invoice #2026-0142</title>
    <pageCount>1</pageCount>
  </metadata>
  <pages>
    <page number="1">
      <text>Invoice #2026-0142...</text>
    </page>
  </pages>
</document>
Common use cases
- Enterprise system integrations
- XSLT transformation pipelines
- Legacy system compatibility
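For consumers that are not XML-native, the sample above can also be read with Python's standard library. This is a sketch that assumes the element and attribute names shown in the sample (`metadata/title`, `metadata/pageCount`, `pages/page` with a `number` attribute); `parse_document` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Copy of the sample XML shown above.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<document>
  <metadata>
    <title>Invoice #2026-0142</title>
    <pageCount>1</pageCount>
  </metadata>
  <pages>
    <page number="1">
      <text>Invoice #2026-0142...</text>
    </page>
  </pages>
</document>
"""


def parse_document(xml_bytes: bytes) -> dict:
    """Extract the title, page count, and per-page text from the XML output."""
    root = ET.fromstring(xml_bytes)
    pages = {
        int(page.get("number", "0")): page.findtext("text", default="")
        for page in root.iterfind("pages/page")
    }
    return {
        "title": root.findtext("metadata/title"),
        "page_count": int(root.findtext("metadata/pageCount", default="0")),
        "pages": pages,
    }


doc = parse_document(SAMPLE)
print(doc["title"], doc["page_count"], doc["pages"][1])
```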
CSV
format: "csv"
Comma-separated values from detected tables. Each table is output sequentially. Best for PDFs with clear tabular data.
Item,Qty,Price
API Credits,1000,$49.00
Priority Support,1,$29.00
Common use cases
- Spreadsheet import (Excel, Google Sheets)
- Database loading
- Financial document processing
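A short Python sketch of loading the CSV sample above for further processing. It handles a single table only, because this page does not say how consecutive tables are separated in the output. It also assumes prices are formatted like the sample (a leading "$" and no thousands separators). `read_table` and `sum_prices` are hypothetical helper names.

```python
import csv
import io
from decimal import Decimal

# Copy of the sample CSV shown above.
SAMPLE = """Item,Qty,Price
API Credits,1000,$49.00
Priority Support,1,$29.00
"""


def read_table(csv_text: str) -> list[dict[str, str]]:
    """Parse one CSV table, using its first row as the column names."""
    return list(csv.DictReader(io.StringIO(csv_text, newline="")))


def sum_prices(rows: list[dict[str, str]], column: str = "Price") -> Decimal:
    """Add up a currency column; Decimal avoids float rounding on money."""
    return sum((Decimal(row[column].lstrip("$")) for row in rows), Decimal("0"))


rows = read_table(SAMPLE)
total = sum_prices(rows)
print(f"{len(rows)} rows, total ${total}")
```

For the sample, the summed Price column comes to 78.00, which matches the total shown in the Text and Markdown examples.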
Encoded Formats
Encoded formats return the raw PDF file content in a transport-friendly encoding. Useful when you need the original file rather than extracted text.
Base64
format: "base64"
The raw PDF file content encoded as a Base64 string. Useful when you need the original PDF bytes embedded in a JSON payload or email.
JVBERi0xLjQKMSAwIG9iago8PAov
VHlwZSAvQ2F0YWxvZwovUGFnZXMg
MiAwIFIKPj4KZW5kb2JqCjIgMCAo...
Common use cases
- Embedding PDFs in API responses
- Email attachments
- Systems that require Base64 input
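On the receiving side, a Base64 result can be turned back into PDF bytes with a quick sanity check. The sketch below uses only the first twelve characters of the sample above, since the rest of the sample is truncated. `decode_pdf_base64` is a hypothetical helper name, and stripping whitespace first is an assumption about how the string might be wrapped in transit.

```python
import base64

PDF_MAGIC = b"%PDF-"  # every PDF file starts with this header


def decode_pdf_base64(payload: str) -> bytes:
    """Decode a Base64 string and confirm the result looks like a PDF."""
    compact = "".join(payload.split())  # drop line breaks and spaces
    data = base64.b64decode(compact, validate=True)
    if not data.startswith(PDF_MAGIC):
        raise ValueError("decoded bytes do not start with the PDF header")
    return data


# First twelve characters of the sample, split across two lines on purpose.
header = decode_pdf_base64("JVBERi0x\nLjQK")
print(header)  # b'%PDF-1.4\n'
```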
Binary
format: "binary"
The raw PDF file bytes. The result URL serves the original PDF as a binary download.
(Binary PDF data; download via the presigned result URL)
Common use cases
- PDF archival and storage
- Re-serving PDFs that were originally downloaded as attachments
- Proxying PDFs through your own system
Image Formats
Image formats render each page of the PDF as a raster image. Useful for previews, thumbnails, and visual processing.
PNG
format: "png"
High-quality rasterised images of each PDF page in PNG format. Lossless compression, best for documents with text.
(PNG image data; download via the presigned result URL)
Common use cases
- Document thumbnails and previews
- OCR pre-processing
- Visual comparison and auditing
JPG
format: "jpg"
Rasterised page images in JPEG format. Smaller file sizes than PNG with lossy compression.
(JPEG image data; download via the presigned result URL)
Common use cases
- Web thumbnails where file size matters
- Social media previews
- Quick visual previews
WebP
format: "webp"
Modern image format that typically produces smaller files than JPEG or PNG at comparable quality. Best balance of quality and file size for web display.
(WebP image data; download via the presigned result URL)
Common use cases
- Web applications optimised for performance
- Mobile-friendly document previews
- Progressive web apps
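Once bytes have been downloaded from a result URL, it can be worth confirming that they match the requested format before storing or serving them. The Python sketch below checks the standard file signatures for PNG, JPEG, WebP, and PDF. `matches_format` is a hypothetical helper name, and this check is a client-side precaution rather than anything PDFPipe requires.

```python
def matches_format(data: bytes, fmt: str) -> bool:
    """Return True if `data` starts with the file signature for `fmt`.

    `fmt` uses the format values from this page: png, jpg, webp, binary.
    """
    if fmt == "png":
        return data.startswith(b"\x89PNG\r\n\x1a\n")
    if fmt == "jpg":
        return data.startswith(b"\xff\xd8\xff")
    if fmt == "webp":
        # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
        return data[:4] == b"RIFF" and data[8:12] == b"WEBP"
    if fmt == "binary":
        return data.startswith(b"%PDF-")
    raise ValueError(f"no signature check defined for format {fmt!r}")


# Synthetic headers, not real images, to show the checks.
print(matches_format(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8, "png"))
print(matches_format(b"RIFF\x24\x00\x00\x00WEBPVP8 ", "webp"))
print(matches_format(b"%PDF-1.4\n", "jpg"))
```

A signature check only confirms the container type. It does not validate that the image or PDF is complete or uncorrupted.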