OCR letter/edge detection...
Hi, sorry if this seems a little odd for this forum but here goes:
I would like to write a piece of OCR software for myself ;) but currently I know very little about the subject. My question is: what mathematical formulas or algorithms (or how would you do it) are used to detect the individual letters within the image as a whole? Let's say I pass in an image containing the word "help." What formula is used to detect the four individual letters 'h', 'e', 'l', 'p' so that the image can be subdivided and each letter passed on for identification?
If anybody has any ideas, or pointers for articles, that would be great.
Thanks, Ron
By cakea at 2007-11-27 2:13:45

# 6
A classical and simple OCR algorithm goes as follows:
Detect connected clumps of black ink. This is done with a flood-fill type algorithm: if a black pixel is adjacent to another black pixel, then those two pixels are in the same connected clump, and each clump spreads to ("infects") any black neighbors.
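A minimal sketch of the clump-finding step, assuming the image has already been reduced to a binary grid (truthy = ink) and treating diagonal neighbours as adjacent (8-connectivity, a choice the post does not specify). The names `find_clumps` and `bounding_box` are hypothetical, chosen for illustration.

```python
from collections import deque


def find_clumps(image):
    """Label connected clumps of ink pixels in a binary image.

    `image` is a list of rows; a truthy value means ink (black).
    Returns a list of clumps, each a list of (row, col) pixels, in the
    order their first pixel is met when scanning row by row.
    Diagonal neighbours count as adjacent (8-connectivity).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    clumps = []
    for r in range(height):
        for c in range(width):
            if not image[r][c] or seen[r][c]:
                continue
            # Flood fill outward from this unvisited ink pixel.
            clump = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                clump.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            clumps.append(clump)
    return clumps


def bounding_box(clump):
    """Return (top, left, bottom, right) of a clump, all inclusive."""
    rows = [y for y, _ in clump]
    cols = [x for _, x in clump]
    return min(rows), min(cols), max(rows), max(cols)
```

Sorting the clumps by the `left` value of their bounding boxes gives them in reading order for a single line of text. Note that letters such as 'i' and 'j', or the period in "help.", come out as more than one clump, which is exactly the kind of segmentation issue described later in the thread.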
Once you have a connected clump, you compare it against a dictionary of letter shapes. This comparison is often done with the Hamming distance: you XOR the source image with the one in the dictionary and then count the number of bits left over. If the images are identical the count is zero; the greater the count, the more pixels did not match and the more likely it is that the characters do not match.
You do a nearest-neighbor match against your dictionary to identify your characters.
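The comparison and matching steps can be sketched as below. This assumes each clump has already been cropped and scaled to the same grid size as the dictionary templates (real systems normalize size first); the 3x3 `TEMPLATES` dictionary and the function names are hypothetical examples, not a real font.

```python
def pack_bitmap(glyph):
    """Pack a glyph (list of strings, '#' = ink, '.' = paper) into one int,
    one bit per pixel, so two glyphs can be compared with a single XOR."""
    bits = 0
    for row in glyph:
        for ch in row:
            bits = (bits << 1) | (1 if ch == "#" else 0)
    return bits


def hamming_distance(glyph_a, glyph_b):
    """XOR the two packed bitmaps and count the 1 bits left over."""
    if [len(r) for r in glyph_a] != [len(r) for r in glyph_b]:
        raise ValueError("glyphs must have the same dimensions")
    return bin(pack_bitmap(glyph_a) ^ pack_bitmap(glyph_b)).count("1")


def classify(glyph, dictionary):
    """Nearest-neighbour match: return (letter, distance) for the template
    with the smallest Hamming distance to `glyph` (first one wins on a tie)."""
    return min(
        ((letter, hamming_distance(glyph, template))
         for letter, template in dictionary.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical 3x3 "font" used only for illustration.
TEMPLATES = {
    "I": ["###", ".#.", "###"],
    "T": ["###", ".#.", ".#."],
    "L": ["#..", "#..", "###"],
}
```

For example, a noisy glyph `["###", "...", "###"]` is one pixel away from "I" and three away from both "T" and "L", so `classify` returns `("I", 1)`. The distance itself can also serve as a rejection signal: a large best distance suggests an unknown font or a bad segmentation.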
So, that is the good news. The algorithms are simple and robust. Tastes great, less filling!!
Then when you find that this system does not work as well as you want, you start fixing the problems.
1) Was there ink bleed in the document, so that you got huge bunches of characters all connected together? Or, alternatively, was the copy so light that things that should have been connected became disconnected?
These are known as segmentation errors. You divided the entire image into segments (the connected regions) and made mistakes in that process that are impossible to unravel later on.
2) Was there a font that you have not yet seen, and so have not loaded and coded into your dictionary?
These kinds of problems are known as classification errors.
3) Were there lines on the drawing (perhaps a background image) that cut across the image and screwed up your segmentation? Was the image perhaps scanned from a book, so that the fold at the binding bent the end of each line up out of sight, leaving the characters slightly skewed, rotated and out of focus? Or perhaps you had to read handwritten addresses done in crayon on the front of an envelope, like the Post Office does.
These kinds of problems are modeling problems: your images did not match the model you thought you were using.
And of course, there may be noise problems: speckles (white or black), coffee stains on the image, etc.
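One common, simple fix for black speckles is to flood-fill the binary image and erase any clump smaller than some minimum size. The sketch below assumes a 0/1 grid and 8-connectivity; `remove_specks` and its `min_size` default of 3 are hypothetical choices that would need tuning per document.

```python
from collections import deque


def remove_specks(image, min_size=3):
    """Return a copy of a 0/1 image with every ink clump of fewer than
    `min_size` pixels erased. Diagonal neighbours count as connected."""
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [list(row) for row in image]  # leave the input untouched
    seen = [[False] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if not image[r][c] or seen[r][c]:
                continue
            # Collect the whole clump containing (r, c).
            clump, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                clump.append((y, x))
                for ny in range(max(0, y - 1), min(height, y + 2)):
                    for nx in range(max(0, x - 1), min(width, x + 2)):
                        if image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Small clumps are assumed to be noise, not letters.
            if len(clump) < min_size:
                for y, x in clump:
                    cleaned[y][x] = 0
    return cleaned
```

The trade-off is that a threshold large enough to remove real noise can also erase legitimate small marks such as the dot on an 'i' or a period, so `min_size` should stay below the size of the smallest punctuation you expect.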
The good news is that if your problems are simple, your code can be simple. If your problems are hard, the code can be hard.
Once you have solutions for hundreds of slightly different modeling, classification and segmentation problems, and you have furthermore built code that tries to detect when you are hitting problem number 19 a lot (so maybe you should turn up the parameter on your little problem-19 solver), you start thinking that you have put a whole lot of work into your OCR system and that you ought to charge people a LOT of money for it, because it would be a whole lot of work for them to replicate it all.
It is easy to start on OCR code, it is easy to make progress, there is always another problem to solve, and it is easy to stop when you aren't having fun any more.
If you value your time at anything more than about a nickel an hour, a thousand-dollar OCR package is quite a bargain.
On the other hand, you pay to go to college, you pay to learn, and you pay to play games. The compiler is FREE and your spare time is your own, so whip out yer compiler and start having some FUN!
# 7
Marlin, I am not sure whether to say sorry, or laugh!
You're right. I have made some progress and have now hit a brick wall, but I do feel better for the process.
My issue now is that the .png files I am reading in (usually) contain only one word, but that word is so low-res that the edge-detection code I am using isn't really doing the job! I need to find another way to subdivide the image into individual letters. Also, the text is usually yellow on a black background or yellow on a green background, which is not helping any.
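Since the colours are known, one option is to skip edge detection and instead classify each pixel as "yellow ink" or "background", then split the word wherever a column contains no ink. This column-counting approach (a vertical projection profile) is a different, simpler technique than the clump method above. The sketch assumes the pixels arrive as rows of `(r, g, b)` tuples (for example, from Pillow's `Image.open(path).convert("RGB")` and `getpixel((x, y))`); the thresholds `bright=128` and `dark=100` and all function names are hypothetical and would need tuning on real images.

```python
def is_yellow(pixel, bright=128, dark=100):
    """Heuristic ink test: red and green both bright, blue dark.
    Black and pure-green backgrounds fail because their red channel is low."""
    r, g, b = pixel
    return r >= bright and g >= bright and b < dark


def binarize(pixels):
    """Turn rows of (r, g, b) tuples into rows of 1 (ink) / 0 (background)."""
    return [[1 if is_yellow(p) else 0 for p in row] for row in pixels]


def split_columns(binary, gap_threshold=0):
    """Split a binarized word into letter spans using a vertical projection.

    Counts ink pixels per column; any column with a count <= gap_threshold
    is treated as a gap between letters. Returns (start, end) column pairs,
    with `end` exclusive.
    """
    width = len(binary[0]) if binary else 0
    profile = [sum(row[c] for row in binary) for c in range(width)]
    spans, start = [], None
    for c, count in enumerate(profile):
        if count > gap_threshold and start is None:
            start = c                    # entering a letter
        elif count <= gap_threshold and start is not None:
            spans.append((start, c))     # leaving a letter
            start = None
    if start is not None:
        spans.append((start, width))     # letter runs to the right edge
    return spans
```

At very low resolution, neighbouring letters often touch, so no column is completely empty. Raising `gap_threshold` to 1 (treating nearly empty columns as gaps) is one crude way to cope with that, at the risk of splitting thin letters.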
Ron
cakea at 2007-7-12 2:09:23
