PDF Extract Text
Extracts the text from PDF files into plain .txt files. Works in batch, producing either one .txt per PDF or a single merged file.
What it does
Use this when you need raw text from reports, book PDFs, or contracts, so you can paste it into Word, search across files, or feed it to NLP/analysis tools. For scanned (image-only) PDFs, use the OCR tool instead.
How to use
- Drag PDFs into the list.
- Pick an Output Mode: one .txt per PDF, or all merged into a single results.txt.
- Optionally type a Page Range (e.g. 1-5, 10). Leave blank to extract all pages.
- Click Run.
Options
- Output Mode: "Separate file per PDF" is the default and most common. "Merge into one file" puts the text of every PDF into a single results.txt.
- Page Range: Examples: "1-5" or "1-5, 10, 15-20". Blank means every page.
- Preserve Physical Layout: Leave on to keep columns and alignment. Turn off for flat linear text, which is better for NLP/search.
- Add Page Separators: Leave on to get "--- Page 2 ---" markers in the output. In merge mode, this also tells you which text came from which file.
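The Page Range syntax can be illustrated with a small parser. This is only a sketch of how a value such as "1-5, 10, 15-20" might be interpreted. The tool's own parsing rules are not documented here, and the function name parse_page_range, the total_pages parameter, and the handling of out-of-range pages are all assumptions made for illustration.

```python
from typing import List


def parse_page_range(spec: str, total_pages: int) -> List[int]:
    """Expand a spec such as "1-5, 10, 15-20" into sorted 1-based page numbers."""
    spec = spec.strip()
    if not spec:
        # Blank means every page, as described for the Page Range option.
        return list(range(1, total_pages + 1))

    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a stray comma
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Assumption: pages past the end of the document are skipped silently.
        pages.update(range(start, min(end, total_pages) + 1))
    return sorted(pages)
```

For a 30-page document, parse_page_range("1-5, 10, 15-20", 30) yields pages 1 to 5, page 10, and pages 15 to 20. Duplicates and overlapping ranges collapse because the pages are collected in a set.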
Examples
Search across 12 monthly reports: Add all reports, run with "Separate file" mode. You get 12 .txt files.
Pull one chapter out of a book: Add book.pdf, type 45-120 in Page Range. Only that chapter is extracted.
Analyse 50 contracts as one corpus: Add all of them, "Merge" mode, "Preserve layout" off, "Separators" on. You get one results.txt.
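When separators are on, a merged results.txt can be split back into per-page chunks before analysis. The sketch below assumes each marker sits on its own line in exactly the form "--- Page N ---". How the tool marks file boundaries in merge mode is not specified here, so this example does not try to detect them. The function name split_pages is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumption: a marker occupies a whole line and looks like "--- Page 2 ---".
PAGE_MARKER = re.compile(r"^--- Page (\d+) ---[ \t]*$", re.MULTILINE)


def split_pages(text: str) -> List[Tuple[Optional[int], str]]:
    """Return (page_number, page_text) pairs in document order.

    Text that appears before the first marker is returned with page number None.
    Page numbers may repeat when several PDFs were merged into one file.
    """
    matches = list(PAGE_MARKER.finditer(text))
    first_start = matches[0].start() if matches else len(text)

    chunks: List[Tuple[Optional[int], str]] = []
    leading = text[:first_start].strip()
    if leading:
        chunks.append((None, leading))

    for i, match in enumerate(matches):
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append((int(match.group(1)), text[start:end].strip()))
    return chunks
```

A list of pairs is used instead of a dictionary so that repeated page numbers from different PDFs do not overwrite each other.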
Find which PDFs are scanned: Add 20 PDFs and run. Any .txt that comes out empty corresponds to a scanned PDF. Run those through OCR.
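Checking many output files for emptiness by hand is tedious, so the check can be scripted. This is a minimal sketch that assumes the .txt files sit together in one folder and are UTF-8 encoded. The function name find_empty_outputs is hypothetical. If page separators were on, a file from a scanned PDF may contain only marker lines, in which case the check would need to ignore those lines as well.

```python
from pathlib import Path
from typing import List


def find_empty_outputs(folder: str) -> List[str]:
    """Return the names of .txt files that contain nothing but whitespace."""
    empty = []
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps the scan going if a file is not valid UTF-8.
        content = path.read_text(encoding="utf-8", errors="replace")
        if not content.strip():
            empty.append(path.name)
    return empty
```

Each name returned points to a PDF that likely has no text layer and is a candidate for the OCR tool.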
Watch out
- Scanned (image-only) PDFs produce empty output. Use OCR for those.
- Encrypted PDFs cannot be extracted. Unlock them with PDF Encrypt first.
- The page range applies to every file. You cannot set per-file ranges.
- Complex tables and footnotes may not retain their layout perfectly. Convert to Word for those.
- In merge mode the output filename is fixed (results.txt).
License
The free tier has a monthly extraction cap. The Office plan removes it.