GitHub - kwesseling/pdf-extract: Super easy extraction of content from PDF-files

# PDF Extract

As part of integration testing I needed to extract text from PDFs. All existing solutions were either too cumbersome or had a weird API.

PDF Extract works by running an external executable (Win64 only!), but it is fully self-contained and exposes only streams to the outside world.

Internally it uses Xpdf.

How to

To extract text, simply use the provided Extractor class (here from a file):

using (var pdfStream = File.OpenRead("my.pdf"))
using (var extractor = new Extractor())
{
    var extractedText = extractor.ExtractToString(pdfStream);
}

Or extract from a stream to a stream:

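using (var extractor = new Extractor())
{
    // pdfStream is any readable stream containing the PDF (e.g. from File.OpenRead)
    using (var rawTextStream = extractor.ExtractText(pdfStream))
    {
        // ... read the extracted text from rawTextStream here
    }
}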
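
Since the motivation above is integration testing, a test will typically want to read the returned stream and check it for an expected phrase. The following is a minimal sketch and not part of PDF Extract: the `TextStreamAssertions` class and its `ContainsPhrase` method are hypothetical helper names, and it assumes the stream holds UTF-8 encoded text, which this README does not state.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for tests; not part of PDF Extract.
public static class TextStreamAssertions
{
    // Reads the whole stream as text and reports whether it contains the phrase
    // (case-insensitive). Assumption: the text is UTF-8 encoded; adjust if not.
    public static bool ContainsPhrase(Stream textStream, string phrase)
    {
        // leaveOpen: true, so the caller's using block remains responsible for disposal.
        using (var reader = new StreamReader(textStream, Encoding.UTF8, true, 1024, true))
        {
            string text = reader.ReadToEnd();
            return text.IndexOf(phrase, StringComparison.OrdinalIgnoreCase) >= 0;
        }
    }
}
```

Inside the inner `using` block above, this could be called as `TextStreamAssertions.ContainsPhrase(rawTextStream, "Invoice")`, where "Invoice" stands in for whatever text the PDF under test is expected to contain.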

Install

Simply add the NuGet package:

PM> Install-Package pdf-extract

Requirements

You'll need .NET Framework 4.5.1 or later on 64-bit Windows to use the precompiled binaries.

License

PDF Extract is licensed under the GNU General Public License (GPL), version 2 or 3, the same as Xpdf.

