Saturday, October 09, 2010

How to convert a PDF to a text file

We can use the open-source PDFBox library.
Download the PDFBox .NET version.

Download the source files.

Add references to the following four DLL files, found in the bin folder of the extracted download (a .rar archive):

PDFBox-0.7.3.dll
IKVM.GNU.Classpath.dll
IKVM.Runtime.dll
FontBox-0.1.0-dev.dll

Then add the following using directives (File.WriteAllText lives in System.IO):

using System.IO;
using System.Security;
using org.pdfbox.pdmodel;
using org.pdfbox.util;
Suppose we have a PDF file like the one below:


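In an ASP.NET page (Server.MapPath resolves the "~/" paths to the application's root folder), we can convert the PDF to text like this:

// Load the PDF from the root folder of the web application
PDDocument doc = PDDocument.load(Server.MapPath("~/StudentsResults.pdf"));
try {
    PDFTextStripper stripper = new PDFTextStripper();
    string text = stripper.getText(doc); // extracts the text of all pages
    File.WriteAllText(Server.MapPath("~/StudentsResults.txt"), text);
}
finally { doc.close(); } // release the PDF file even if extraction fails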

The output will look like this:
