PDF Algorithm

What kind of algorithm does Adobe Reader implement to read (and write) PDF files? I need to know this because I want to read a PDF file and extract the text. Any ideas?
R. Hollenstein
[193 byte] By [robinhollensteina] at [2007-10-2 3:57:51]
# 1
http://partners.adobe.com/public/developer/pdf/index_reference.html
It's not as easy as you think, because PDF is derived from PostScript, which is a Turing-complete language.
YAT_Archivista at 2007-7-15 23:19:29 > top of Java-index,Other Topics,Algorithms...
# 2
There are several Java PDF libraries, for example iText.
dingjinga at 2007-7-15 23:19:29
# 3
Do you know how to use iText? Have you got any examples of how to extract just the text of a PDF document?
R. Hollenstein
robinhollensteina at 2007-7-15 23:19:29
# 4

The response from dingjing should have prompted you to search the internet for those Java PDF libraries. (The search keywords ought to be obvious.) And then to read the publicly available information about them to see whether they can indeed read PDF files or can only write them. And, for the ones that you consider might be useful, to look at the examples that come with them. Your last post trivializes the problem.

I will save you a bit of time by posting a bit from iText's FAQ page:

"Can I read an existing PDF-document with iText? Can I use a template PDF and fill it with data?

You can extract complete pages of an existing PDF document and copy them to a newly created PDF document. You could use this to add page numbers or to combine different small PDFs into one large document (or just the opposite). You can also use iText to fill in the fields of an AcroForm. This is (or will be) explained in the tutorial.

Is it possible to parse an existing PDF-document and convert it to another format (HTML, DOC, EXCEL)?

No, the pdf format is just a canvas where text and graphics are placed without any structure information. As such there aren't any 'iText-objects' in a PDF file. For instance: you can't retrieve a table object from a PDF file. Tables are formed by placing text and lines at selected places."
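
To make the FAQ's "canvas" point concrete, here is a rough sketch of what text looks like inside a page once its content stream has been decompressed: positioning operators such as Td followed by strings shown with the Tj operator. The class name ContentStreamText, the method extractShownStrings and the sample stream are made up for illustration, and this is not how iText or Adobe Reader does it. It assumes an uncompressed stream, literal (parenthesised) strings only, and a font whose bytes map directly to characters, which real PDFs often violate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentStreamText {
    // Matches a literal string operand followed by the Tj (show text) operator,
    // e.g. "(Hello) Tj". Backslash escapes such as \( and \) may appear inside.
    private static final Pattern SHOW_TEXT =
            Pattern.compile("\\(((?:\\\\.|[^\\\\()])*)\\)\\s*Tj");

    // Hypothetical helper: returns the shown strings in stream order.
    static List<String> extractShownStrings(String contentStream) {
        List<String> result = new ArrayList<>();
        Matcher m = SHOW_TEXT.matcher(contentStream);
        while (m.find()) {
            // Drop the backslash of each escape. This is right for \( \) and \\,
            // but it does not decode \n, \r or octal escapes.
            result.add(m.group(1).replaceAll("\\\\(.)", "$1"));
        }
        return result;
    }

    public static void main(String[] args) {
        // A made-up two-row "table": nothing marks rows or columns,
        // only the coordinates given to the Td operator.
        String stream =
                "BT /F1 12 Tf 72 700 Td (Name) Tj 200 0 Td (Price) Tj ET\n"
              + "BT /F1 12 Tf 72 680 Td (Widget) Tj 200 0 Td (4.50) Tj ET\n";
        System.out.println(extractShownStrings(stream)); // prints [Name, Price, Widget, 4.50]
    }
}
```

The output is a flat list of strings. Deciding that "Name" and "Widget" belong to the same column means interpreting the coordinates yourself, which is why the FAQ says you can't retrieve a table object.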

DrClapa at 2007-7-15 23:19:29
# 5
Dr Clap: Don't think I haven't searched the internet. I have even downloaded iText, but I have tried the com.lowagie.text.pdf.PdfReader class and there is apparently no method that just extracts the text... Sorry for the incorrect wording of my post.
R. Hollenstein
robinhollensteina at 2007-7-15 23:19:29
# 6

Dear Dr. Clap - you are a jack@ss, that is to say the least - oh but you probably already knew that. Here is a tech-tip: Climb down from your holy high horse, get the chip off your shoulder, pull that stick out of your @ss, and come back to earth. Understand that the point of asking a question in a forum, such as this one, is to gleam an answer, not to get some bourgeois lecture on how to interpret a comment or formulate a post or to listen to some guy and his ramblings about how great one is that they already knew the answer.

sunjavasuxa at 2007-7-15 23:19:29
# 7

Well, that was certainly a refreshing and insightful answer to the OP's six-month-old and long-dead question.

Unfortunately I fail to see exactly how this relates to Algorithms. Perhaps this topic belongs over in the section about Patterns and OO Design.

I think the Go4 discussed the Abusive Rant pattern right after Singletons and just before the Loser pattern.

marlin314a at 2007-7-15 23:19:29
# 8

> The point of asking a question in a forum, such as this one, is to gleam an answer,

I think you mean 'glean' here.

dubwaia at 2007-7-15 23:19:29
# 9

> not <snip> to listen to some guy and his ramblings about how great one is that

> they already knew the answer.

The point of his reply was that he didn't already know the answer, but managed to find it in a couple of minutes using nothing more than Google and common sense.

YAT_Archivista at 2007-7-15 23:19:29