Linux OCR Module
The Scanbot SDK's OCR Engine transforms written text in still images into machine-readable data. It is the backbone of the SDK's Data Capture Modules, enabling fast and accurate data extraction from various document formats.
The OCR feature is powered by a modern machine learning-based engine designed to provide high-speed, accurate text recognition. Currently, it supports languages that use Latin characters.
Preconditions to achieve good OCR results
An ideal document for OCR is flat, straight, captured at the highest possible resolution, and free of large shadows, folds, or other objects that could distract the recognizer. The SDK's UI and algorithms do their best to help you meet these requirements, but as in photography, image information that was lost during the shot can never be fully recovered.
Size and position
Put the document on a flat surface. Take the photo from straight above and hold the device parallel to the document to minimize the need for perspective correction. The document should fill as much of the camera frame as possible while still showing all of the text that needs to be recognized. This yields more pixels, and therefore more detail, for each character that needs to be detected. Skewed pages decrease the recognition quality.
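To make the "more pixels per character" point concrete, here is a rough back-of-the-envelope sketch. It is not part of the Scanbot SDK; the function name, its parameters, and the default line spacing factor of 1.5 are assumptions chosen for illustration only.

```python
def estimated_char_height_px(image_height_px: int,
                             document_fill_ratio: float,
                             text_lines_on_page: int,
                             line_spacing_factor: float = 1.5) -> float:
    """Roughly estimate how many pixels tall one character is in a photo.

    Hypothetical helper for illustration. Assumes the text lines are spread
    evenly over the full height of the page and that the line pitch is
    `line_spacing_factor` times the character height.
    """
    if not 0.0 < document_fill_ratio <= 1.0:
        raise ValueError("document_fill_ratio must be in (0, 1]")
    if text_lines_on_page <= 0 or line_spacing_factor <= 0:
        raise ValueError("line count and spacing factor must be positive")
    document_height_px = image_height_px * document_fill_ratio
    line_pitch_px = document_height_px / text_lines_on_page
    return line_pitch_px / line_spacing_factor


# A page with 60 text lines in a 4000 px tall photo:
filling_frame = estimated_char_height_px(4000, 0.9, 60)    # page fills 90% of the frame
half_the_frame = estimated_char_height_px(4000, 0.45, 60)  # page fills 45% of the frame
print(round(filling_frame), round(half_the_frame))
```

Under these assumptions, halving the share of the frame that the document occupies also halves the pixel height of every character.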
Light and shadows
More ambient light is always better: it lets the camera take the shot at a lower ISO value, which results in less grainy photos. Try to make sure there are no visible shadows. If you encounter large shadows, take the shot at an angle instead.
We do not recommend using the flashlight: at close range, it creates a light spot at the center of the document that decreases the recognition quality.
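If you want to screen photos for uneven lighting before running OCR, one simple heuristic is to compare the average brightness of different regions of the image. The sketch below is not part of the Scanbot SDK; the function name and the 3×3 grid are assumptions, and the image is assumed to be already available as a grid of grayscale values from 0 to 255 (loading the file is left out).

```python
def tile_brightness_range(gray, tiles_per_side=3):
    """Return (darkest, brightest) mean brightness over a grid of tiles.

    `gray` is a list of rows of grayscale values (0-255). Hypothetical
    helper for illustration: a large gap between the two values hints at
    a shadow or a flashlight hot spot somewhere on the page.
    """
    height, width = len(gray), len(gray[0])
    if height < tiles_per_side or width < tiles_per_side:
        raise ValueError("image is smaller than the tile grid")
    tile_means = []
    for ty in range(tiles_per_side):
        y0 = ty * height // tiles_per_side
        y1 = (ty + 1) * height // tiles_per_side
        for tx in range(tiles_per_side):
            x0 = tx * width // tiles_per_side
            x1 = (tx + 1) * width // tiles_per_side
            pixels = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            tile_means.append(sum(pixels) / len(pixels))
    return min(tile_means), max(tile_means)


# Evenly lit page versus a page whose left third lies in shadow.
evenly_lit = [[200] * 6 for _ in range(6)]
shadowed = [[50, 50, 200, 200, 200, 200] for _ in range(6)]
print(tile_brightness_range(evenly_lit))
print(tile_brightness_range(shadowed))
```

Any threshold for "too uneven" depends on the camera and the documents, so it would need to be tuned on real photos.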
Focus
The document needs to be properly focused so that the characters are sharp and clear. The auto-focus of the camera works well if you meet the minimum required distance for the lens to be able to focus, usually around 5–10 centimeters (approx. 2–4 inches).
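A common way to check focus programmatically is the variance of the Laplacian: sharp edges produce strong second-derivative responses, while blurry images produce weak ones. The sketch below is a generic, dependency-free illustration of that technique, not the Scanbot SDK's own focus check; the function name is an assumption, and the image is assumed to be a grid of grayscale values from 0 to 255.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    `gray` is a list of rows of grayscale values (0-255). Higher values
    indicate sharper edges. Hypothetical helper for illustration.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 pixels")
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Discrete Laplacian: sum of the four neighbours minus 4x the centre.
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


# A hard black-to-white edge versus the same transition smeared into a ramp.
sharp_edge = [[0, 0, 0, 0, 255, 255, 255, 255] for _ in range(5)]
soft_ramp = [[x * 255 // 7 for x in range(8)] for _ in range(5)]
print(laplacian_variance(sharp_edge) > laplacian_variance(soft_ramp))
```

The score is only meaningful relative to other photos of similar content and resolution, so compare candidate shots against each other rather than against a fixed number.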
Typefaces
The Scanbot OCR Engine is optimized for common serif and sans-serif font types. Decorative or script fonts drastically decrease the recognition quality.
Get in touch
If you need further information or are interested in licensing the Scanbot SDK, please get in touch with our solution experts.