A New Chinese Character OCR/Translation App


. . . is available on Google Play and the Apple App Store, if someone wants to try it - Waygo Translator/Dictionary

 

 

I have the Pleco OCR app, which is occasionally useful but not something I use regularly. Pleco's own description of it seems to summarize the difficulties of OCR pretty well. Judging from the mixed reviews on the Google Play site, I would guess the Waygo app is fairly similar.

 

Principles

Much like our handwriting recognizer, our OCR system works by matching characters to templates in a database; it turns the image of the character into a simple mathematical structure, identifies its key features (lengths / positions / curvatures of strokes, etc), then searches through its database of 10,000+ Chinese characters to find the one that most closely matches that pattern.
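To make the template-matching idea above concrete, here is a minimal sketch in Python. It is not Pleco's code: the three-number feature vectors, the tiny `TEMPLATES` table, and the `top_matches` helper are all invented for illustration, and plain Euclidean distance stands in for whatever comparison the real recognizer uses.

```python
import math

# Hypothetical feature vectors: (stroke count, mean stroke length, mean curvature).
# The numbers are made up for illustration; a real database would hold
# 10,000+ characters with far richer features.
TEMPLATES = {
    "一": (1.0, 0.90, 0.05),
    "十": (2.0, 0.85, 0.05),
    "口": (3.0, 0.60, 0.10),
}


def top_matches(features, templates=TEMPLATES, k=5):
    """Return up to k template characters, closest feature vector first."""
    ranked = sorted(
        templates,
        key=lambda ch: math.dist(features, templates[ch]),
    )
    return ranked[:k]


# Features "extracted" from an image of a two-stroke character (assumed values).
candidates = top_matches((2.0, 0.80, 0.06))
print(candidates)
```

The first entry in the returned list is the best guess, and the rest are the "other, less likely matches" that a handwriting recognizer can show you.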

 

However, while the handwriting recognizer always has a very clear picture of the character you drew - it knows exactly where every stroke is located, where it starts / ends, what order strokes were drawn in, where it overlaps other strokes - the OCR system has to contend with a much murkier one; characters on a camera image can be small, grainy, and out-of-focus, and the same calligraphic flourishes that make printed Chinese text so pretty to look at also make it harder to see the underlying structure of each character.

 

OCR is also up against some psychological hurdles compared to handwriting input; while a mis-recognized handwritten character can be chalked up to one’s poor handwriting / incorrect stroke order, with a printed character there’s nobody to blame but the recognition software. On top of which, because OCR must recognize multiple characters at a time, there’s less of an opportunity for it to show you its other, less likely matches like the handwriting recognizer does. Handling lots of characters at once also means that even if it gets a higher percentage of them accurate on the first try, if just a few of those are incorrect it’ll still feel as if it got the entire block of text wrong. So while handwriting only has to contend with one character at a time, and can even be forgiven for getting that character wrong as long as the correct character is among its top 5 matches, OCR has to deal with multiple characters and get every one of them exactly correct in order to seem like it’s doing its job.
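A quick back-of-the-envelope sketch of why a whole block is so much harder than a single character. This assumes each character is recognized independently with the same accuracy, which is a simplification, and the 95% figure is an arbitrary example rather than a measured accuracy for Pleco or Waygo.

```python
def block_accuracy(per_char_accuracy, num_chars):
    """Chance that every character in a block is right, assuming
    independent errors with the same per-character accuracy."""
    return per_char_accuracy ** num_chars


# Assumed 95% per-character accuracy, for blocks of different sizes.
for n in (1, 5, 20):
    print(n, round(block_accuracy(0.95, n), 3))
```

Under those assumptions, a recognizer that is right 95% of the time per character gets a 20-character block entirely correct only about 36% of the time.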

 

(this is all a convoluted way of asking you to be patient if things don’t work perfectly every time; we’re steadily working to bring this even closer to character recognizer perfection, but in the meantime we hope you’ll find it accurate enough to be useful in its current form)

 

 

Pleco offers an entire system (at varying prices) of dictionaries, text-to-speech, handwriting recognition, etc.

 

 
