Use Case

Convert PDF to JSON API

Extract structured data from any PDF URL with a single API call. Get text, tables, metadata, and coordinates in clean JSON — from both inline and auto-download PDFs.

Why is PDF to JSON so hard?

PDFs were designed for printing, not data extraction. Text is positioned with absolute coordinates, tables have no semantic markup, and most files carry no structural tags that identify headings, paragraphs, or tables. Client-side PDF libraries give you raw glyphs. Turning those into usable JSON requires layout analysis, table detection, and metadata extraction.
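To make the layout-analysis problem concrete, here is a minimal sketch of just one step: grouping positioned text fragments into reading-order lines. The input shape (`str`, `x`, `y`), the `groupIntoLines` name, and the 2-unit tolerance are assumptions for illustration, and the sketch assumes `y` grows downward. It is not how PDFPipe works internally.

```javascript
// Hypothetical input: one entry per text fragment with absolute page coordinates.
// Assumption: y grows downward, so smaller y means higher on the page.
function groupIntoLines(items, tolerance = 2) {
  // Sort top-to-bottom first so fragments on the same line end up adjacent.
  const sorted = [...items].sort((a, b) => a.y - b.y || a.x - b.x);

  const lines = [];
  for (const item of sorted) {
    const last = lines[lines.length - 1];
    // Fragments whose y differs by at most `tolerance` are treated as one line.
    if (last && Math.abs(last.y - item.y) <= tolerance) {
      last.items.push(item);
    } else {
      lines.push({ y: item.y, items: [item] });
    }
  }

  // Within each line, order fragments left-to-right and join them with spaces.
  return lines.map((line) =>
    line.items
      .sort((a, b) => a.x - b.x)
      .map((i) => i.str)
      .join(" ")
  );
}

const fragments = [
  { str: "Qty", x: 200, y: 100 },
  { str: "Item", x: 50, y: 101 },
  { str: "Price", x: 300, y: 100 },
  { str: "Credits", x: 50, y: 120 },
];
console.log(groupIntoLines(fragments)); // [ 'Item Qty Price', 'Credits' ]
```

Even this small step needs a tuned tolerance, and it says nothing about columns, rotated text, or table cells, which is where most of the real work lies.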

It gets worse when the PDF isn't served inline. Many enterprise systems, government portals, and document management platforms serve PDFs as auto-downloads triggered by redirects, tokens, or JavaScript. A plain HTTP client may receive an HTML page or a redirect response instead of the PDF itself.

PDFPipe solves both problems. Send us any URL. We auto-detect whether it's inline or an attachment, fetch the PDF (using headless Chromium for downloads), parse it, and return structured JSON.
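The inline-versus-attachment distinction usually shows up in standard HTTP response headers. The sketch below is one plausible way to classify a response from its `Content-Disposition` and `Content-Type` headers. The `classifyPdfDelivery` name and the three return values are hypothetical, and this is not a description of PDFPipe's actual detection logic.

```javascript
// Hypothetical helper: guess how a server intends to deliver a PDF,
// using only standard HTTP response headers.
function classifyPdfDelivery(headers) {
  // HTTP header names are case-insensitive, so normalise them first.
  const h = {};
  for (const [name, value] of Object.entries(headers)) {
    h[name.toLowerCase()] = String(value);
  }

  const disposition = (h["content-disposition"] || "").trim().toLowerCase();
  const contentType = (h["content-type"] || "")
    .split(";")[0]
    .trim()
    .toLowerCase();

  // "attachment" asks the browser to download the file rather than display it.
  if (disposition.startsWith("attachment")) return "attachment";
  if (contentType === "application/pdf") return "inline";

  // HTML or anything else: the PDF may only appear after a redirect or script
  // runs, which is the case where a browser-based fetch becomes necessary.
  return "unknown";
}

console.log(classifyPdfDelivery({ "Content-Type": "application/pdf" })); // inline
console.log(
  classifyPdfDelivery({
    "Content-Type": "application/octet-stream",
    "Content-Disposition": 'attachment; filename="invoice.pdf"',
  })
); // attachment
```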

How it works

One POST request. We handle the rest.

1. Send a request

curl
curl -X POST https://api.pdfpipe.dev/v1/convert \
  -H "Authorization: Bearer pk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/invoice.pdf",
    "format": "json"
  }'

2. Get the response

JSON response
{
  "requestId": "req_01J9X7K2M...",
  "status": "complete",
  "format": "json",
  "pagesProcessed": 3,
  "creditsUsed": 1,
  "resultUrl": "https://pdfpipe-results.s3..."
}
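Before following `resultUrl`, it can help to confirm that the response really describes a finished conversion. The guard below is a minimal sketch: `getResultUrl` is a hypothetical helper, and since `"complete"` is the only status value shown above, any other value (such as the `"pending"` used in the example) is an assumption and is simply treated as not ready.

```javascript
// Hypothetical guard around the convert response shown above.
// Only the "complete" status appears in the sample; anything else is rejected.
function getResultUrl(job) {
  if (job.status !== "complete") {
    throw new Error(
      `Request ${job.requestId} is not complete (status: ${job.status})`
    );
  }
  if (typeof job.resultUrl !== "string" || job.resultUrl.length === 0) {
    throw new Error(`Request ${job.requestId} has no resultUrl`);
  }
  return job.resultUrl;
}

// Usage with placeholder values:
const url = getResultUrl({
  requestId: "req_example",
  status: "complete",
  resultUrl: "https://example.com/result.json",
});
console.log(url); // https://example.com/result.json
```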

Rich, structured JSON output

PDFPipe doesn't just dump raw text. You get page-by-page extraction with coordinates, detected tables as arrays, document metadata, font information, and image detection — all in a predictable, well-documented schema.

  • Per-page text extraction with coordinates
  • Table detection as structured arrays
  • Document metadata (title, author, page count)
  • Font and image information
  • Consistent schema across all PDFs

Sample JSON output
{
  "pages": [
    {
      "page": 1,
      "width": 612,
      "height": 792,
      "text": "Invoice #2026-0142\nDate: February 15, 2026...",
      "tables": [
        {
          "rows": [
            ["Item", "Qty", "Price"],
            ["API Credits", "1000", "$49.00"],
            ["Priority Support", "1", "$29.00"]
          ]
        }
      ],
      "metadata": {
        "fonts": ["Helvetica", "Helvetica-Bold"],
        "hasImages": false
      }
    }
  ],
  "documentMetadata": {
    "title": "Invoice #2026-0142",
    "author": "Acme Corp",
    "pageCount": 3,
    "fileSize": 245760
  }
}

Node.js
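// Uses the built-in fetch (Node.js 18+); run as an ES module for top-level await.
const response = await fetch(
  "https://api.pdfpipe.dev/v1/convert",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer pk_live_...",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      url: "https://example.com/invoice.pdf",
      format: "json",
    }),
  }
);
if (!response.ok) {
  throw new Error(`Convert request failed: HTTP ${response.status}`);
}

// The convert response points to the extracted JSON via resultUrl.
const data = await response.json();
const result = await fetch(data.resultUrl);
if (!result.ok) {
  throw new Error(`Result download failed: HTTP ${result.status}`);
}
const pdf = await result.json();

console.log(pdf.pages[0].text);   // text of the first page
console.log(pdf.pages[0].tables); // tables detected on the first page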
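
Once the result is loaded, each table's `rows` array can be turned into keyed records. The sketch below assumes the first row is a header row, which holds for the sample output above but may not hold for every PDF. `tableToObjects` is a hypothetical helper, not part of any PDFPipe SDK.

```javascript
// Hypothetical helper: convert a detected table ({ rows: [...] }) into objects.
// Assumption: the first row contains the column names.
function tableToObjects(table) {
  const [header, ...rows] = table.rows;
  if (!header) return []; // empty table

  return rows.map((row) =>
    // Cells missing from a short row become null instead of undefined.
    Object.fromEntries(header.map((name, i) => [name, row[i] ?? null]))
  );
}

// Usage with the table from the sample output:
const table = {
  rows: [
    ["Item", "Qty", "Price"],
    ["API Credits", "1000", "$49.00"],
    ["Priority Support", "1", "$29.00"],
  ],
};
console.log(tableToObjects(table));
// [
//   { Item: 'API Credits', Qty: '1000', Price: '$49.00' },
//   { Item: 'Priority Support', Qty: '1', Price: '$29.00' }
// ]
```

Cell values stay as strings, exactly as in the sample output. Parsing `"$49.00"` into a number is left to the caller, since currency and locale rules vary.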

Works with any language

PDFPipe is a standard REST API. If your language can make HTTP requests, it can use PDFPipe. No SDKs required — though we provide them for convenience.

JavaScript, Python, Go, Ruby, PHP, Java, C#, curl

Start converting PDFs to JSON today

Free tier includes 10 requests per month. No credit card required.