Convert PDF to Markdown — structured text for docs, LLMs, and editors

Converting a PDF to Markdown extracts the document's text layer and formats it as structured Markdown: headings are marked with # and ## prefixes, paragraphs are separated by blank lines, and bold text is wrapped in ** markers. The result is a .md file you can open in any Markdown editor, paste into a documentation system, or feed directly into a language model.

Why Markdown instead of plain text

Plain text extraction gives you the words but loses the document's structure: everything arrives as an undifferentiated stream of lines. Markdown preserves that structure in a portable, widely supported format. GitHub, Notion, Obsidian, Linear, and virtually every modern documentation tool read Markdown natively. Converting to Markdown rather than plain text means your content is ready for editing and publishing without manual reformatting.

For feeding content to large language models (GPT, Claude, Gemini), Markdown is preferable to plain text because heading structure tells the model what is a title, what is a section, and what is body copy. With those boundaries marked, the model can summarize by section, extract a specific section, or transform the document without guessing where one part ends and the next begins.
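If you want to send a model one section at a time rather than the whole file, the heading markers make the split straightforward. The sketch below is one minimal way to do it in Python; the function name split_sections is our own, and it assumes the file contains no fenced code blocks with lines that start with #.

```python
import re

# A Markdown ATX heading: one to six # characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_sections(markdown):
    """Return a list of (level, title, body) tuples, one per heading.

    Text that appears before the first heading is returned with level 0
    and an empty title.
    """
    sections = []
    level, title, body = 0, "", []

    def has_content():
        return bool(title) or any(line.strip() for line in body)

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # A new heading closes the section collected so far.
            if has_content():
                sections.append((level, title, "\n".join(body).strip()))
            level = len(match.group(1))
            title = match.group(2).strip()
            body = []
        else:
            body.append(line)

    if has_content():
        sections.append((level, title, "\n".join(body).strip()))
    return sections
```

Each tuple can then be placed in its own prompt, with the title kept alongside the body so the model still knows which part of the document it is reading.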

What is preserved — and what is not

PDF to Markdown preserves the text layer's structure: headings are detected by font size (relative to the document's body text), paragraphs are reconstructed from line groupings, and bold runs are wrapped in ** markers. This works most reliably for documents with a single column of text, such as reports, articles, books, and letters.
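To make the font-size idea concrete, here is a simplified sketch in Python. It is not Filum's actual engine: the Line record, the 1.6 and 1.2 size ratios, and the use of empty lines as paragraph breaks are all assumptions chosen for illustration. It also treats bold at the level of a whole line, which is coarser than wrapping individual bold runs.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Line:
    """One extracted line of text (hypothetical record for this sketch)."""
    text: str
    size: float          # font size in points
    bold: bool = False   # True if the whole line is set in bold


def body_size(lines):
    """Take the font size that covers the most characters as body text."""
    weights = Counter()
    for line in lines:
        weights[line.size] += len(line.text)
    return weights.most_common(1)[0][0]


def to_markdown(lines, h1_ratio=1.6, h2_ratio=1.2):
    """Emit Markdown, classifying each line by its size relative to body text."""
    base = body_size(lines)
    blocks = []      # finished headings and paragraphs
    paragraph = []   # lines of the paragraph being collected

    def flush():
        if paragraph:
            blocks.append(" ".join(paragraph))
            paragraph.clear()

    for line in lines:
        text = line.text.strip()
        if not text:
            flush()  # assumed convention: an empty line ends a paragraph
            continue
        ratio = line.size / base
        if ratio >= h1_ratio:
            flush()
            blocks.append("# " + text)
        elif ratio >= h2_ratio:
            flush()
            blocks.append("## " + text)
        else:
            paragraph.append(f"**{text}**" if line.bold else text)
    flush()
    return "\n\n".join(blocks)
```

Measuring sizes against the dominant body size, rather than using fixed point values, is what lets the same rule work on a document set in 9 pt and one set in 12 pt.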

Complex layouts are not reconstructed. Tables are output as a sequence of cell values without grid structure. Multi-column layouts may interleave text from adjacent columns. Embedded images are absent from the output. If your goal is a faithful visual copy of each page, PDF to JPG or PDF to PNG will serve you better.
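The column interleaving is easy to see in a toy example. The Python sketch below assumes each text fragment is an (x, y, text) tuple with y measured from the top of the page, and orders fragments the simplest possible way: top to bottom, then left to right. This illustrates the general problem and does not describe any particular extraction engine.

```python
def naive_reading_order(fragments):
    """Order (x, y, text) fragments top to bottom, then left to right."""
    ordered = sorted(fragments, key=lambda f: (f[1], f[0]))
    return [text for _, _, text in ordered]


# Two columns of two lines each, at the same vertical positions.
two_columns = [
    (50, 100, "Left one"),
    (50, 120, "Left two"),
    (300, 100, "Right one"),
    (300, 120, "Right two"),
]

print(naive_reading_order(two_columns))
```

Because lines in both columns share the same vertical positions, the output alternates between the left and right columns instead of finishing one column before starting the next.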

Text layer vs scanned PDFs

PDF to Markdown requires a text layer: the embedded text objects that a PDF viewer lets you select and copy. If you open a PDF, try to select some text, and find you can't, it is likely a scanned PDF (an image of each page with no text layer). For scanned PDFs, use the OCR PDF tool to add a text layer first, then convert to Markdown.
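The same check can be automated when you have many files to sort. The sketch below, in Python, assumes you have already extracted the text of each page with a PDF library of your choice and passes it in as a list of strings. The function name and the thresholds (20 characters per page, more than half the pages) are illustrative guesses, not fixed rules.

```python
def looks_scanned(page_texts, min_chars_per_page=20):
    """Guess whether a PDF lacks a text layer.

    page_texts holds the text extracted from each page, in order.
    Returns True when more than half the pages are effectively empty.
    """
    if not page_texts:
        return True
    empty_pages = sum(
        1 for text in page_texts if len(text.strip()) < min_chars_per_page
    )
    return empty_pages / len(page_texts) > 0.5
```

A file flagged this way is a candidate for OCR before conversion. A document that mixes scanned and digital pages can fall on either side of the threshold, so treat the result as a hint.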

The on-device difference

Filum runs PDF to Markdown entirely in your browser using the same extraction engine as PDF to Word and PDF to Text. The file is never sent to a server — conversion happens on your device. This matters for confidential documents: legal filings, financial reports, medical records, internal memos. You control the file at every step.

Using the output

The downloaded .md file opens in any text editor and renders correctly in Markdown-aware tools. In VS Code, preview it with Ctrl+Shift+V (Cmd+Shift+V on macOS). In Obsidian, drop it into your vault. In Notion, use the import function and select Markdown. For LLM use, paste the contents directly into the chat; the heading structure gives the model clear document context.
