
Super easy extraction of content from PDF-files


kwesseling/pdf-extract

 
 


PDF Extract

As part of integration testing I needed to extract text from PDFs. All existing solutions were either too cumbersome or had a weird API.

PDF Extract works by running an external executable (Win64 only!), but it is fully self-contained and only exposes streams to the outside world.

Internally it uses Xpdf.
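To make the "external executable behind a stream API" idea concrete, here is a minimal sketch of how such a wrapper could be built. It is not taken from PDF Extract's source and may differ from what the library actually does. The class `PdfToTextSketch`, its methods, and the `pdfToTextPath` parameter are hypothetical names. The sketch assumes an Xpdf `pdftotext` executable is available at that path, and it relies on `pdftotext` writing to standard output when the output file is given as `-`.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical sketch, not the library's implementation.
public static class PdfToTextSketch
{
    // Builds the command line: quoted input path, then "-" so that
    // pdftotext sends the extracted text to standard output.
    public static string BuildArguments(string pdfPath)
    {
        return "\"" + pdfPath + "\" -";
    }

    // Copies the PDF stream to a temp file, runs pdftotext on it,
    // and returns everything the tool wrote to standard output.
    public static string Run(Stream pdf, string pdfToTextPath)
    {
        string tempPdf = Path.GetTempFileName();
        try
        {
            using (var file = File.Create(tempPdf))
            {
                pdf.CopyTo(file);
            }

            var startInfo = new ProcessStartInfo
            {
                FileName = pdfToTextPath,
                Arguments = BuildArguments(tempPdf),
                UseShellExecute = false,          // required for redirection
                RedirectStandardOutput = true,
                CreateNoWindow = true
            };

            using (var process = Process.Start(startInfo))
            {
                // Read before waiting so a full output pipe cannot block the tool.
                string text = process.StandardOutput.ReadToEnd();
                process.WaitForExit();
                if (process.ExitCode != 0)
                {
                    throw new InvalidOperationException(
                        "pdftotext exited with code " + process.ExitCode);
                }
                return text;
            }
        }
        finally
        {
            File.Delete(tempPdf);
        }
    }
}
```

The temp file is there because the sketch assumes the tool wants a file path as input. Callers still only deal with a `Stream` going in and text coming out.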

How to

To extract text, use the provided Extractor class (here from a file):

using (var pdfStream = File.OpenRead("my.pdf"))
using (var extractor = new Extractor())
{
    var extractedText = extractor.ExtractToString(pdfStream);
}

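Or extract from one stream to another:

using (var pdfStream = File.OpenRead("my.pdf"))
using (var extractor = new Extractor())
using (var rawTextStream = extractor.ExtractText(pdfStream))
{
    // ... read the extracted text from rawTextStream
}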
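Once you have the text stream, you will usually want to read it as lines or as a string. The sketch below shows one way to do that with a `StreamReader`. The `ExtractedText` class and its `ReadLines` method are hypothetical and not part of PDF Extract. UTF-8 is an assumption, since the README does not state which encoding the stream uses.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of PDF Extract.
public static class ExtractedText
{
    // Reads every line from a text stream.
    // Assumption: the stream is UTF-8 encoded (a byte order mark, if present, is honored).
    public static List<string> ReadLines(Stream textStream)
    {
        var lines = new List<string>();

        // leaveOpen: true, so the caller's using block stays responsible
        // for disposing the stream.
        using (var reader = new StreamReader(textStream, Encoding.UTF8, true, 1024, true))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }

        return lines;
    }
}
```

With the library, this would be called as `ExtractedText.ReadLines(rawTextStream)` on the stream returned by `extractor.ExtractText(pdfStream)`, inside the `using` blocks that own that stream.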

Install

Add the NuGet package:

PM> Install-Package pdf-extract

Requirements

You'll need .NET Framework 4.5.1 or later on 64-bit Windows to use the precompiled binaries.

License

PDF Extract is licensed under the GNU General Public License (GPL), version 2 or 3, the same as Xpdf.
