Poster: | Branko Collin | Date: | Feb 22, 2005 8:24am |
Forum: | toronto | Subject: | Re: Universal OCR |
DP now tries to retain at least the page numbers in its HTML versions, though they will not always appear at the exact page boundaries, because we rejoin words that were broken across page boundaries. Footnotes, columns and other items that span pages are also unlikely to end up in the right position, so to speak.
In other words, when sending a text through DP, it is not unreasonable to ask our volunteers to retain page breaks.
"Is this enough for designing a utility that maps the coordinates of "delightf-ul" to the corrected word "delightful"?"
I don't see why not.
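To make that a bit more concrete, here is a rough sketch of how such a utility might work. It is not anything DP actually runs. It assumes the OCR output for a page is available as a list of (word, bounding box) pairs and the proofread text as a plain list of words; the names `normalize` and `map_boxes` are made up for this example. Words are aligned with Python's standard `difflib.SequenceMatcher` after hyphens are stripped, so "delightf-ul" lines up with "delightful".

```python
from difflib import SequenceMatcher


def normalize(word):
    # Compare words without hyphens or case, so "delightf-ul" matches "delightful".
    return word.replace("-", "").lower()


def map_boxes(ocr_tokens, corrected_words):
    """Pair each corrected word with the box of the OCR token it aligns with.

    ocr_tokens: list of (raw_word, (x0, y0, x1, y1)) tuples (assumed input format).
    corrected_words: list of proofread words in reading order.
    Returns a list of (corrected_word, box) tuples for words that could be aligned.
    """
    raw_keys = [normalize(word) for word, _ in ocr_tokens]
    fixed_keys = [normalize(word) for word in corrected_words]
    matcher = SequenceMatcher(a=raw_keys, b=fixed_keys, autojunk=False)

    mapping = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        same_length = (a1 - a0) == (b1 - b0)
        # Identical runs map one to one; equal-length replacements are
        # treated as per-word corrections of OCR errors.
        if tag == "equal" or (tag == "replace" and same_length):
            for i, j in zip(range(a0, a1), range(b0, b1)):
                mapping.append((corrected_words[j], ocr_tokens[i][1]))
        # Inserted, deleted, or unevenly replaced words are skipped here;
        # a real tool would need a policy for those cases.
    return mapping
```

Calling `map_boxes([("delightf-ul", (12, 0, 60, 10))], ["delightful"])` would pair "delightful" with the original box. Words rejoined across a page boundary would still need special handling, since the two halves live on different page images.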
"Would this be useful?"
I think it would be.
Poster: | Branko Collin | Date: | Feb 22, 2005 8:34am |
Forum: | toronto | Subject: | Re: Universal OCR |