Convert PDF to JSON API
Extract structured data from any PDF URL with a single API call. Get text, tables, metadata, and coordinates in clean JSON — from both inline and auto-download PDFs.
Why is PDF to JSON so hard?
PDFs were designed for printing, not data extraction. Text coordinates are absolute, tables have no semantic markup, and there's no standard way to identify structure. Client-side PDF libraries give you raw glyphs — turning that into usable JSON requires layout analysis, table detection, and metadata extraction.
It gets worse when the PDF isn't served inline. Many enterprise systems, government portals, and document management platforms serve PDFs as auto-downloads triggered by redirects, tokens, or JavaScript. A plain HTTP client often receives an HTML page or a redirect chain and never sees the file.
PDFPipe solves both problems. Send us any URL. We auto-detect whether it's inline or an attachment, fetch the PDF (using headless Chromium for downloads), parse it, and return structured JSON.
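To make the inline versus attachment distinction concrete, here is a minimal sketch of how a response could be classified from its headers. It is an illustration only and does not describe PDFPipe's internal detection logic. The function name `classifyPdfResponse`, the lowercase header keys, and the three return labels are all assumptions made for this example.

```javascript
// Illustrative sketch, not PDFPipe's implementation.
// `headers` is assumed to be a plain object with lowercase header names.
function classifyPdfResponse(headers) {
  const type = (headers["content-type"] || "").toLowerCase();
  const disposition = (headers["content-disposition"] || "").toLowerCase();

  // "attachment" asks the client to save the file instead of rendering it.
  if (disposition.trim().startsWith("attachment")) {
    return "attachment";
  }
  // A PDF served without an attachment disposition is displayed inline.
  if (type.startsWith("application/pdf")) {
    return "inline";
  }
  // Anything else (for example an HTML page that starts the download
  // with JavaScript) cannot be resolved from headers alone.
  return "needs-browser";
}

console.log(classifyPdfResponse({ "content-type": "application/pdf" }));
console.log(
  classifyPdfResponse({
    "content-type": "application/octet-stream",
    "content-disposition": 'attachment; filename="invoice.pdf"',
  })
);
console.log(classifyPdfResponse({ "content-type": "text/html; charset=utf-8" }));
```

The third case is the one that a header check cannot settle, which is where a headless browser becomes necessary.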
How it works
One POST request. We handle the rest.
1. Send a request
curl -X POST https://api.pdfpipe.dev/v1/convert \
  -H "Authorization: Bearer pk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/invoice.pdf",
    "format": "json"
  }'
2. Get the response
{
  "requestId": "req_01J9X7K2M...",
  "status": "complete",
  "format": "json",
  "pagesProcessed": 3,
  "creditsUsed": 1,
  "resultUrl": "https://pdfpipe-results.s3..."
}
Rich, structured JSON output
PDFPipe doesn't just dump raw text. You get page-by-page extraction with coordinates, detected tables as arrays, document metadata, font information, and image detection — all in a predictable, well-documented schema.
- Per-page text extraction with coordinates
- Table detection as structured arrays
- Document metadata (title, author, page count)
- Font and image information
- Consistent schema across all PDFs
{
  "pages": [
    {
      "page": 1,
      "width": 612,
      "height": 792,
      "text": "Invoice #2026-0142\nDate: February 15, 2026...",
      "tables": [
        {
          "rows": [
            ["Item", "Qty", "Price"],
            ["API Credits", "1000", "$49.00"],
            ["Priority Support", "1", "$29.00"]
          ]
        }
      ],
      "metadata": {
        "fonts": ["Helvetica", "Helvetica-Bold"],
        "hasImages": false
      }
    }
  ],
  "documentMetadata": {
    "title": "Invoice #2026-0142",
    "author": "Acme Corp",
    "pageCount": 3,
    "fileSize": 245760
  }
}
const response = await fetch(
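  "https://api.pdfpipe.dev/v1/convert",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer pk_live_...",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      url: "https://example.com/invoice.pdf",
      format: "json",
    }),
  }
);
// Stop early on an HTTP error instead of parsing an error body as a result.
if (!response.ok) {
  throw new Error(`PDFPipe request failed with status ${response.status}`);
}
const data = await response.json();
// The response only describes the job; download the converted JSON from resultUrl.
const result = await fetch(data.resultUrl);
const pdf = await result.json();
console.log(pdf.pages[0].text);
console.log(pdf.pages[0].tables);
Works with any language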
PDFPipe is a standard REST API. If your language can make HTTP requests, it can use PDFPipe. No SDKs required — though we provide them for convenience.
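The same holds for post-processing: the result is plain JSON, so ordinary language features are enough. As a sketch, the snippet below turns a detected table's `rows` into keyed records. It assumes the first row is a header row, which is true for the sample invoice above but is not guaranteed by the schema; `tableToRecords` is a hypothetical helper, not part of any PDFPipe SDK.

```javascript
// Hypothetical helper: treats the first row of a detected table as the header
// and maps every following row to an object keyed by those header names.
function tableToRecords(table) {
  const [header = [], ...body] = table.rows;
  return body.map((row) =>
    // Missing cells become null so every record has the same keys.
    Object.fromEntries(header.map((name, i) => [name, row[i] ?? null]))
  );
}

// Sample table copied from the example output on this page.
const table = {
  rows: [
    ["Item", "Qty", "Price"],
    ["API Credits", "1000", "$49.00"],
    ["Priority Support", "1", "$29.00"],
  ],
};

const records = tableToRecords(table);
console.log(records);
```

Cell values stay as strings, exactly as extracted; converting "1000" or "$49.00" to numbers is left to the caller because currency and locale rules vary by document.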
Start converting PDFs to JSON today
Free tier includes 10 requests per month. No credit card required.