Convert PDF to Markdown API

Turn any PDF URL into clean, structured Markdown with a single API call. Built for LLM pipelines, RAG systems, vector databases, and documentation workflows - with headings, tables, and lists preserved.

Why convert PDFs to Markdown?

Large language models and RAG systems work best with structured text. Raw PDF text loses all formatting - headings become indistinguishable from body text, tables collapse into unreadable strings, and lists lose their hierarchy. Markdown preserves that structure in a lightweight plain-text format that LLMs handle well.

If you are building a RAG pipeline, you need to chunk documents at semantic boundaries: section headings, paragraph breaks, table edges. Markdown exposes those boundaries as syntax (# headings, blank lines, | table rows), so splitting a document into meaningful chunks for embedding becomes a simple text operation.
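The heading-based splitting described above can be sketched in a few lines. The sketch makes three assumptions. splitSections is a hypothetical helper, not part of PDFPipe. Headings use ATX syntax (# through ######), as in the sample output on this page. Each section should carry the path of headings that encloses it, so that path can be stored as metadata next to the embedding.

```javascript
// Hypothetical helper (not a PDFPipe API): split Markdown into sections at
// ATX headings (# to ######), recording the enclosing heading path.
// Lines inside ``` or ~~~ fences are never treated as headings.
function splitSections(markdown) {
  const sections = [];
  const stack = []; // currently open headings: { level, title }
  let current = { path: [], lines: [] };
  let hasBody = false; // skip sections that contain only a heading line
  let inFence = false;

  const flush = () => {
    if (hasBody) {
      sections.push({ path: current.path, text: current.lines.join("\n").trim() });
    }
  };

  for (const line of markdown.split("\n")) {
    if (/^(```|~~~)/.test(line)) inFence = !inFence;
    const match = inFence ? null : /^(#{1,6})\s+(.+)$/.exec(line);
    if (match) {
      flush();
      const level = match[1].length;
      // Close headings at the same or a deeper level before opening this one.
      while (stack.length && stack[stack.length - 1].level >= level) stack.pop();
      stack.push({ level, title: match[2].trim() });
      current = { path: stack.map((h) => h.title), lines: [line] };
      hasBody = false;
    } else {
      current.lines.push(line);
      if (line.trim() !== "") hasBody = true;
    }
  }
  flush();
  return sections;
}
```

For the sample output on this page, the Key Metrics section would come back with the path ["Quarterly Financial Report", "Key Metrics"]. A heading stack is used instead of a single regex split so that each chunk keeps its parent headings, not just its own.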

PDFPipe converts any PDF URL to clean Markdown with proper heading levels, formatted tables, ordered and unordered lists, and bold/italic emphasis. It handles both PDFs served inline and files that download automatically, including files behind authentication, redirects, or JavaScript triggers.

How it works

One POST request. We handle the rest.

1. Send a request

curl

curl -X POST https://api.pdfpipe.dev/v1/convert \
  -H "Authorization: Bearer pk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/whitepaper.pdf",
    "format": "markdown",
    "returnMethod": "inline"
  }'

2. Get the response

JSON response

{
  "requestId": "req_01J9X7K2M...",
  "status": "complete",
  "format": "markdown",
  "pagesProcessed": 12,
  "creditsUsed": 1,
  "contentType": "text/markdown",
  "content": "# Quarterly Financial Report\n\n## Q4 2025\n..."
}
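Before using the response, a client may want to confirm that the conversion actually finished. The sketch below relies only on the field names shown in the example response (requestId, status, format, content). Two points are assumptions, because other status values and non-inline return methods are not documented on this page. First, any status other than "complete" is treated as a failure. Second, content is expected to be a Markdown string. extractMarkdown is a hypothetical helper name.

```javascript
// Hypothetical helper: return the Markdown from a parsed /v1/convert response.
// Assumption: only status "complete" with format "markdown" carries usable
// inline content; anything else is reported as an error.
function extractMarkdown(data) {
  if (data === null || typeof data !== "object") {
    throw new TypeError("Expected a parsed JSON object");
  }
  if (data.status !== "complete") {
    const id = data.requestId ?? "(no requestId)";
    throw new Error(`Conversion ${id} not complete: status=${data.status}`);
  }
  if (data.format !== "markdown" || typeof data.content !== "string") {
    throw new Error("Response does not contain inline Markdown content");
  }
  return data.content;
}
```

Including requestId in the error message makes a failed conversion easier to trace.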

Structured Markdown, ready for AI

PDFPipe preserves document structure as proper Markdown syntax. Headings, tables, lists, and emphasis are all retained - giving your LLM or RAG pipeline the semantic context it needs to generate accurate responses.

Heading hierarchy preserved (H1 through H6)
Tables converted to Markdown pipe syntax
Ordered and unordered lists maintained
Bold and italic emphasis detected
Clean chunk boundaries for RAG pipelines
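Because tables come back in pipe syntax, a table such as Key Metrics in the sample output can be turned back into structured rows downstream. This is a minimal sketch built on a few assumptions. parsePipeTable is a hypothetical helper. Every row starts with "|". There is one header row followed by a delimiter row. Cells contain no escaped pipes (\|).

```javascript
// Hypothetical helper: parse a Markdown pipe table into row objects keyed
// by header cell. Alignment colons in the delimiter row are ignored.
function parsePipeTable(tableText) {
  // Strip the outer pipes, split on the inner ones, and trim each cell.
  const splitRow = (line) =>
    line.replace(/^\|/, "").replace(/\|$/, "").split("|").map((cell) => cell.trim());

  const rows = tableText
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("|"));

  // Require a header row plus a delimiter row such as |--------|-------|
  if (rows.length < 2 || !/^[|\s:-]+$/.test(rows[1]) || !rows[1].includes("-")) {
    return [];
  }

  const headers = splitRow(rows[0]);
  return rows.slice(2).map((line) => {
    const cells = splitRow(line);
    // Missing trailing cells become empty strings.
    return Object.fromEntries(headers.map((header, i) => [header, cells[i] ?? ""]));
  });
}
```

Values stay as strings. Converting "$4,200,000" or "18.5%" to numbers is left to the caller, since the right parsing depends on the document.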

Sample Markdown output

# Quarterly Financial Report

## Q4 2025

**Prepared by:** Acme Corp
**Date:** January 15, 2026

---

## Executive Summary

Revenue for Q4 2025 reached **$4.2M**, a 23% increase
over the previous quarter. Operating margins improved
to 18.5%, driven by reduced infrastructure costs.

## Key Metrics

| Metric | Value |
|--------|-------|
| Revenue | $4,200,000 |
| Operating Margin | 18.5% |
| Customer Acquisition Cost | $142 |
| Monthly Active Users | 52,400 |

## Recommendations

1. Expand API capacity to handle projected Q1 growth
2. Invest in automated onboarding pipeline
3. Evaluate enterprise tier pricing model

Node.js - RAG pipeline

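// Node.js 18+ (built-in fetch), run as an ES module so top-level await works.
const response = await fetch(
  "https://api.pdfpipe.dev/v1/convert",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer pk_live_...",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      url: "https://example.com/whitepaper.pdf",
      format: "markdown",
      returnMethod: "inline",
    }),
  }
);

if (!response.ok) {
  throw new Error(`Conversion request failed: HTTP ${response.status}`);
}

const data = await response.json();
const markdown = data.content;

// Split into chunks at H1-H3 headings. The lookahead (?=...) splits
// before each heading, so every chunk keeps its own heading line.
const chunks = markdown
  .split(/\n(?=#{1,3} )/)
  .map((chunk) => chunk.trim())
  .filter(Boolean);

// Feed each chunk to your vector database.
// vectorDb and embed are placeholders for your own database client and
// embedding function; they are not provided by PDFPipe.
for (const chunk of chunks) {
  await vectorDb.upsert({
    content: chunk,
    embedding: await embed(chunk),
  });
}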
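A single heading section can still exceed what an embedding model accepts. One option is to split oversized chunks again at blank lines, which in Markdown separate paragraphs, lists, and tables. This sketch makes two assumptions. capChunkSize is a hypothetical helper. A character budget (maxChars, defaulting to an arbitrary 2000) stands in for a real token limit.

```javascript
// Hypothetical helper: re-split chunks longer than maxChars at blank lines,
// packing consecutive blocks together while they fit. Blocks are never cut
// internally, so a single block longer than maxChars is passed through as-is.
function capChunkSize(chunks, maxChars = 2000) {
  const out = [];
  for (const chunk of chunks) {
    if (chunk.length <= maxChars) {
      out.push(chunk);
      continue;
    }
    let buffer = "";
    for (const block of chunk.split(/\n\s*\n/)) {
      const candidate = buffer ? `${buffer}\n\n${block}` : block;
      if (candidate.length <= maxChars) {
        buffer = candidate;
      } else {
        if (buffer) out.push(buffer);
        buffer = block;
      }
    }
    if (buffer) out.push(buffer);
  }
  return out;
}
```

In the pipeline above, capChunkSize(chunks) would run before the upsert loop. Splitting at blank lines keeps a table or paragraph intact, which matters more for retrieval quality than hitting the size budget exactly.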

Built for AI workflows

Feed PDFs into ChatGPT, Claude, or any LLM. Build RAG pipelines that chunk on heading boundaries. Index documents into Pinecone, Weaviate, or Chroma. PDFPipe gives you the Markdown that makes all of it work.

OpenAI, Anthropic, LangChain, LlamaIndex, Pinecone, Weaviate, Chroma

Start converting PDFs to Markdown today

Free tier includes 10 requests per month. No credit card required.