
Extract Tables from PDF to CSV

Pull structured table data from any PDF URL as clean CSV. Automatic table detection, proper column alignment, and ready-to-import output for spreadsheets, databases, and data pipelines.


Why is extracting tables from PDFs so difficult?

PDF tables have no semantic structure. There are no <table>, <tr>, or <td> tags, just text drawn at specific coordinates on a page. Columns are inferred from alignment, rows from vertical spacing, and cell boundaries from whitespace patterns. Get any of that wrong and your data is scrambled.
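To make that concrete, here is a minimal sketch of the naive coordinate heuristic: cluster text fragments into rows by vertical position, then assign each fragment to a column by its left edge. It assumes fragments arrive as hypothetical { x, y, text } objects with y increasing down the page (many extractors flip PDF's native bottom-up coordinates; if yours does not, reverse the sort). groupIntoTable and its tolerances are illustrative only, not PDFPipe's detection algorithm.

```javascript
// Hypothetical input: text fragments as { x, y, text }, with y increasing down the page.
// Naive heuristic: fragments within rowTolerance vertically share a row; columns are
// the distinct left edges (within colTolerance) where fragments start.
function groupIntoTable(fragments, rowTolerance = 3, colTolerance = 5) {
  // 1. Sort top-to-bottom, then left-to-right.
  const sorted = [...fragments].sort((a, b) => a.y - b.y || a.x - b.x);

  // 2. Cluster fragments into rows by vertical proximity to the row's first fragment.
  const rows = [];
  for (const frag of sorted) {
    const last = rows[rows.length - 1];
    if (last && Math.abs(frag.y - last.y) <= rowTolerance) last.items.push(frag);
    else rows.push({ y: frag.y, items: [frag] });
  }

  // 3. Infer column start positions from left edges that line up across rows.
  const columnStarts = [];
  for (const frag of sorted) {
    if (!columnStarts.some(x => Math.abs(x - frag.x) <= colTolerance)) columnStarts.push(frag.x);
  }
  columnStarts.sort((a, b) => a - b);

  // 4. Put each fragment in the nearest column; cells with no text stay "".
  return rows.map(row => {
    const cells = new Array(columnStarts.length).fill("");
    for (const frag of row.items) {
      let best = 0;
      for (let i = 1; i < columnStarts.length; i++) {
        if (Math.abs(columnStarts[i] - frag.x) < Math.abs(columnStarts[best] - frag.x)) best = i;
      }
      cells[best] = cells[best] ? `${cells[best]} ${frag.text}` : frag.text;
    }
    return cells;
  });
}

const table = groupIntoTable([
  { x: 10, y: 100, text: "Item" },
  { x: 120, y: 100, text: "Qty" },
  { x: 10, y: 115, text: "API Credits" },
  { x: 121, y: 116, text: "1000" }, // slightly offset, still lands in the Qty column
]);
// table → [["Item", "Qty"], ["API Credits", "1000"]]
```

The tolerances are guesses, and the approach breaks quickly on real documents: right-aligned numbers start at different x positions, so a column of amounts like $49.00 and $149.00 can split into two columns.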

Financial reports, invoices, and compliance documents are the worst offenders. They have merged cells, multi-line headers, footnotes mixed into table areas, and inconsistent formatting across pages. Many PDF libraries extract text but leave table detection to you, so you end up building your own heuristics.
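Multi-line headers and merged cells are typical of the heuristics you end up writing. One common approach, sketched here as an assumption rather than a description of any particular library, is to collapse stacked header rows column by column and, optionally, to carry a label rightward across empty cells in the upper rows, which is how a horizontally merged header cell often extracts. mergeHeaderRows is a hypothetical helper name.

```javascript
// Hypothetical helper: collapse stacked header rows into a single header row.
// headerRows: array of rows (arrays of strings), top row first.
// fillForward: treat an empty cell in an upper row as continuing the label to
// its left (a horizontally merged cell). The bottom row is never filled.
function mergeHeaderRows(headerRows, { fillForward = false } = {}) {
  const width = Math.max(...headerRows.map(r => r.length));
  const filled = headerRows.map((row, r) => {
    const out = [];
    let carry = "";
    for (let c = 0; c < width; c++) {
      const cell = (row[c] ?? "").trim();
      if (fillForward && r < headerRows.length - 1 && cell === "") {
        out.push(carry); // continue the merged label
      } else {
        out.push(cell);
        carry = cell;
      }
    }
    return out;
  });
  // Join each column's non-empty fragments top to bottom.
  return Array.from({ length: width }, (_, c) =>
    filled.map(row => row[c]).filter(Boolean).join(" ")
  );
}

// "Unit" sits above "Price" in a two-line header.
const simple = mergeHeaderRows([["", "", "Unit", ""], ["Item", "Qty", "Price", "Total"]]);
// simple → ["Item", "Qty", "Unit Price", "Total"]

// A merged "Q1" / "Q2" cell spanning two columns each.
const spanned = mergeHeaderRows(
  [["Q1", "", "Q2", ""], ["Revenue", "Cost", "Revenue", "Cost"]],
  { fillForward: true }
);
// spanned → ["Q1 Revenue", "Q1 Cost", "Q2 Revenue", "Q2 Cost"]
```

Fill-forward is a judgment call: an empty upper cell can also mean "no group label", which is why it is off by default here.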

PDFPipe detects tables automatically, aligns columns correctly, and outputs clean CSV that imports directly into Excel, Google Sheets, or your database. It works with any PDF URL, including files behind authentication or auto-download triggers.

How it works

One POST request. We handle the rest.

1. Send a request

curl

curl -X POST https://api.pdfpipe.dev/v1/convert \
  -H "Authorization: Bearer pk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/financial-report.pdf",
    "format": "csv",
    "returnMethod": "inline"
  }'

2. Get the response

JSON response

{
  "requestId": "req_01J9X7K2M...",
  "status": "complete",
  "format": "csv",
  "pagesProcessed": 8,
  "creditsUsed": 1,
  "contentType": "text/csv",
  "content": "\"Item\",\"Qty\",\"Total\"\n\"API Credits\",\"1000\",\"$49.00\"\n..."
}
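Before passing content to a CSV parser, it helps to check the other fields in the response. This sketch relies only on the fields shown in the example response (status, format, contentType, content, requestId); other status values are not documented on this page, so it treats anything other than "complete" as an error. extractCsv is a hypothetical helper name.

```javascript
// Minimal guard for an inline response, based only on the fields in the example
// response above. Returns the CSV text or throws a descriptive error.
function extractCsv(body) {
  if (body.status !== "complete") {
    throw new Error(`Conversion not complete (status: ${body.status}, requestId: ${body.requestId})`);
  }
  if (body.format !== "csv" || !String(body.contentType).startsWith("text/csv")) {
    throw new Error(`Expected CSV, got format=${body.format}, contentType=${body.contentType}`);
  }
  if (typeof body.content !== "string") {
    throw new Error("Response has no inline content");
  }
  return body.content;
}

// Example body shaped like the response above (requestId is a placeholder).
const csv = extractCsv({
  requestId: "req_example",
  status: "complete",
  format: "csv",
  pagesProcessed: 8,
  creditsUsed: 1,
  contentType: "text/csv",
  content: '"Item","Qty","Total"\n"API Credits","1000","$49.00"\n',
});
```

In the Node.js example further down, extractCsv(data) could stand in for the bare data.content lookup.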

Clean, structured CSV output

PDFPipe detects table boundaries, aligns columns, and outputs properly quoted CSV. Import it directly into Excel, Google Sheets, pandas, or any database with no manual cleanup.

Automatic table detection across pages
Proper column alignment and merging
RFC 4180 compliant CSV formatting
Handles multi-page tables
Works with complex financial documents
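Multi-page tables and RFC 4180 output can be approximated in a few lines when you post-process tables yourself (for example, after converting pages one at a time). This is a minimal sketch, not PDFPipe's internal logic: it assumes each page's table is an array of rows that starts with a repeated header row, drops those repeats, and writes the result with every field quoted, embedded quotes doubled, and CRLF line breaks as RFC 4180 specifies. stitchPages and toCsv are hypothetical helper names.

```javascript
// Hypothetical input: one table per page, each starting with the same header row.
function stitchPages(pages) {
  const first = pages.find(page => page.length > 0);
  if (!first) return [];
  const header = first[0];
  const sameRow = (a, b) => a.length === b.length && a.every((cell, i) => cell === b[i]);
  const rows = [header];
  for (const page of pages) {
    // Drop the repeated header on each page; keep the remaining rows in order.
    const body = page.length > 0 && sameRow(page[0], header) ? page.slice(1) : page;
    rows.push(...body);
  }
  return rows;
}

// RFC 4180 serialization: quote every field, double embedded quotes, CRLF line breaks.
function toCsv(rows) {
  return rows
    .map(row => row.map(cell => `"${String(cell).replace(/"/g, '""')}"`).join(","))
    .join("\r\n") + "\r\n";
}

const stitched = stitchPages([
  [["Item", "Qty"], ["API Credits", "1000"]],
  [["Item", "Qty"], ['Cable, 12" length', "2"]], // hypothetical row with a comma and a quote
]);
const csvText = toCsv(stitched);
// csvText === '"Item","Qty"\r\n"API Credits","1000"\r\n"Cable, 12"" length","2"\r\n'
```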

Sample CSV output

"Item","Description","Qty","Unit Price","Total"
"API Credits","Standard plan credits","1000","$0.049","$49.00"
"Priority Support","24/7 email support","1","$29.00","$29.00"
"Custom Integration","Webhook setup","1","$149.00","$149.00"
"Overage Credits","Additional requests","250","$0.059","$14.75"

"","","","Subtotal","$241.75"
"","","","Tax (8.5%)","$20.55"
"","","","Total","$262.30"

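The sample can also be checked mechanically, which is a useful sanity test when loading extracted financial tables: the line items should sum to the subtotal (49.00 + 29.00 + 149.00 + 14.75 = 241.75), 8.5% of 241.75 is 20.55 after rounding to the cent, and 241.75 + 20.55 = 262.30. A minimal sketch, assuming the rows are already parsed into arrays and that summary rows are the ones with an empty first cell (true for this sample, not a general rule). toCents is a hypothetical helper; it works in integer cents to avoid floating-point drift.

```javascript
// Parse "$1,234.56"-style strings into integer cents to avoid floating-point drift.
function toCents(money) {
  const n = Number(money.replace(/[$,]/g, ""));
  if (Number.isNaN(n)) throw new Error(`Not a money value: ${money}`);
  return Math.round(n * 100);
}

// Rows from the sample output above, already parsed (the blank line is omitted).
const [header, ...rest] = [
  ["Item", "Description", "Qty", "Unit Price", "Total"],
  ["API Credits", "Standard plan credits", "1000", "$0.049", "$49.00"],
  ["Priority Support", "24/7 email support", "1", "$29.00", "$29.00"],
  ["Custom Integration", "Webhook setup", "1", "$149.00", "$149.00"],
  ["Overage Credits", "Additional requests", "250", "$0.059", "$14.75"],
  ["", "", "", "Subtotal", "$241.75"],
  ["", "", "", "Tax (8.5%)", "$20.55"],
  ["", "", "", "Total", "$262.30"],
];
const totalCol = header.indexOf("Total");

// Heuristic for this sample: summary rows have an empty Item cell and carry
// their label in the Unit Price column.
const lineItems = rest.filter(row => row[0] !== "");
const summary = Object.fromEntries(
  rest.filter(row => row[0] === "").map(row => [row[3], toCents(row[totalCol])])
);

const lineItemSum = lineItems.reduce((sum, row) => sum + toCents(row[totalCol]), 0);
console.log(lineItemSum === summary["Subtotal"]);                               // true (24175 cents)
console.log(Math.round(summary["Subtotal"] * 0.085) === summary["Tax (8.5%)"]); // true (2055 cents)
console.log(summary["Subtotal"] + summary["Tax (8.5%)"] === summary["Total"]);  // true (26230 cents)
```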
Node.js

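// Node.js 18+ (built-in fetch); top-level await needs an ES module (.mjs).
const response = await fetch("https://api.pdfpipe.dev/v1/convert", {
  method: "POST",
  headers: {
    "Authorization": "Bearer pk_live_...",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://example.com/financial-report.pdf",
    format: "csv",
    returnMethod: "inline",
  }),
});
if (!response.ok) throw new Error(`PDFPipe request failed: HTTP ${response.status}`);

const data = await response.json();
const csv = data.content;

// Parse CSV rows for your data pipeline. Naively splitting on "\n" and ","
// breaks on quoted fields containing commas ("$1,000.00"), line breaks, or
// escaped quotes (""), so track whether each character is inside quotes.
function parseCsv(text) {
  const rows = [];
  let row = [], field = "", inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (inQuotes) {
      if (c === '"' && text[i + 1] === '"') { field += '"'; i++; } // escaped quote
      else if (c === '"') inQuotes = false;                       // closing quote
      else field += c;
    } else if (c === '"') inQuotes = true;
    else if (c === ",") { row.push(field); field = ""; }
    else if (c === "\n" || c === "\r") {
      if (c === "\r" && text[i + 1] === "\n") i++;                // CRLF is one break
      row.push(field); rows.push(row); row = []; field = "";
    } else field += c;
  }
  if (field !== "" || row.length > 0) { row.push(field); rows.push(row); } // no trailing newline
  return rows;
}

// Drop blank lines (rows whose cells are all empty).
const rows = parseCsv(csv).filter(row => row.some(cell => cell !== ""));

console.log("Headers:", rows[0]);
console.log("Data rows:", rows.length - 1);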
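For loading into a database or pipeline, header-keyed records are often more convenient than positional arrays. Assuming rows is an array of parsed rows with the header first, as in the example above, a small helper can zip each row with the header and skip rows that do not have a full set of columns. rowsToRecords is a hypothetical name; adjust the filtering rule to your documents.

```javascript
// Turn [header, ...rows] into objects keyed by header name.
// Rows whose length differs from the header (e.g. a stray blank line) are skipped.
function rowsToRecords(rows) {
  const [header, ...body] = rows;
  return body
    .filter(row => row.length === header.length)
    .map(row => Object.fromEntries(header.map((name, i) => [name, row[i]])));
}

const records = rowsToRecords([
  ["Item", "Qty", "Total"],
  ["API Credits", "1000", "$49.00"],
  [""], // a blank line parses to a single empty field and is skipped
]);
// records → [{ Item: "API Credits", Qty: "1000", Total: "$49.00" }]
```

Values stay strings here; converting quantities or currency columns to numbers is a separate, column-specific step.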

Works with any language

PDFPipe is a standard REST API. If your language can make HTTP requests, it can extract tables from PDFs. No SDK is required, though we provide SDKs for convenience.

JavaScript, Python, Go, Ruby, PHP, Java, C#, curl

Start extracting tables from PDFs today

Free tier includes 10 requests per month. No credit card required.
