After OCR -> Put the text into the correct reading order
I'm working on a program that has to read PDFs and extract some data.
So far I've extracted the text objects with their size, font, text and position (x1,y1,x2,y2).
After that, the program recognizes text blocks and works out the position of each block.
The document can have 1, 2 or 3 columns, and I need to put the text into the correct reading order.
Let me try to explain with an example:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
&&&& &&&& && &&& &&&& &&&&& &&&& &&&& &&&&&&&
&&&&& &&&&& &&&& &&&&&& &&&& &&&&& &&&&&&&
&&&&&&& && &&&&&&& &&& &&&&& &&&&& &&&&&& &&&&
                 ***********
#### ### ###### ######   AAAA AAAAA AAAA AA
###### #### #### ###
### ### ### # ### ## #   AA AAAA AAAA AAAA A
Zone  x1   y1   x2   y2
%     10   10   600  20
@     10   30   600  40
&     10   50   600  80
*     240  90   270  100
#     10   260  240  140
A     250  260  600  140
How can I write an algorithm that sorts these blocks into the correct reading order?
Can anyone guide me?
Thanks in advance,
McRunner
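
One possible starting point, offered as a sketch rather than a definitive answer, is a simplified form of the recursive XY-cut idea: split the page into horizontal bands wherever there is a vertical gap between blocks, then split each band into columns wherever there is a horizontal gap, and repeat. The code below assumes that y grows downward (the % zone at y=10 is read first) and that each pair of coordinates may be given in either order, as with the # and A zones in the table. The names Zone, split_by_gaps, reading_order and min_gap are made up for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Zone:
    label: str
    x1: float
    y1: float
    x2: float
    y2: float

    # Normalised edges, so the order of (x1, x2) and (y1, y2) does not matter.
    @property
    def left(self) -> float:
        return min(self.x1, self.x2)

    @property
    def right(self) -> float:
        return max(self.x1, self.x2)

    @property
    def top(self) -> float:
        return min(self.y1, self.y2)

    @property
    def bottom(self) -> float:
        return max(self.y1, self.y2)


def split_by_gaps(zones: List[Zone],
                  lo: Callable[[Zone], float],
                  hi: Callable[[Zone], float],
                  min_gap: float) -> List[List[Zone]]:
    """Group zones along one axis; a new group starts at every gap >= min_gap."""
    groups: List[List[Zone]] = []
    reach = 0.0
    for z in sorted(zones, key=lo):
        if groups and lo(z) - reach < min_gap:
            groups[-1].append(z)          # overlaps the current group
            reach = max(reach, hi(z))
        else:
            groups.append([z])            # gap found: start a new group
            reach = hi(z)
    return groups


def reading_order(zones: Iterable[Zone], min_gap: float = 0.0) -> List[Zone]:
    zones = list(zones)
    if len(zones) <= 1:
        return zones
    # Horizontal cuts first: bands are read top to bottom.
    rows = split_by_gaps(zones, lambda z: z.top, lambda z: z.bottom, min_gap)
    if len(rows) > 1:
        return [z for row in rows for z in reading_order(row, min_gap)]
    # Then vertical cuts: columns inside one band are read left to right.
    cols = split_by_gaps(zones, lambda z: z.left, lambda z: z.right, min_gap)
    if len(cols) > 1:
        return [z for col in cols for z in reading_order(col, min_gap)]
    # No clean cut in either direction: fall back to a plain top/left sort.
    return sorted(zones, key=lambda z: (z.top, z.left))


# The zones from the table in the question.
ZONES = [
    Zone("%", 10, 10, 600, 20),
    Zone("@", 10, 30, 600, 40),
    Zone("&", 10, 50, 600, 80),
    Zone("*", 240, 90, 270, 100),
    Zone("#", 10, 260, 240, 140),
    Zone("A", 250, 260, 600, 140),
]

if __name__ == "__main__":
    print(" ".join(z.label for z in reading_order(ZONES)))  # % @ & * # A
```

On real OCR output the blocks are rarely this clean, so min_gap would probably need tuning (for example to a fraction of the line height), and layouts where blocks overlap in both directions end up in the simple fallback sort.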

