bnewman
06-23-2009, 09:31 PM
I need to extract data from PDF files. I'm using .NET.
I've been poring over the web to find a way to do this. This is a case where the web is working against me. Putting data into a PDF is easy, and there's about a gazillion people posting how to do that. That makes it really hard to find out how to do the opposite: get data out of a PDF.
Ideally, I'd like to convert PDF to XML. Failing that, I'd like to read the text out of it into a string or stream.
I'd love to do it without using a COM component or some buggy open source product (I'm not anti-open source, but we all know there's a lot of half-baked open source software out there).
Is it possible?