iOS OCR - SDK Features
The Scanbot OCR feature comes with two OCR engines: legacy and ML. The legacy engine is based on the Tesseract OCR engine with some modifications and enhancements. The ML (machine-learning-based) engine was added later. It is much faster and more accurate, but it only supports languages with Latin letters. We recommend using the ML engine whenever possible and falling back to the legacy engine only if you need to recognize text in non-Latin languages such as Arabic, Japanese, Chinese, Russian, Greek, or Korean.
When using the legacy OCR engine, a corresponding .traineddata file (also known as tessdata) must be installed for each desired OCR language in the optional resource bundle named ScanbotSDKOCRData.bundle. Also, the special data file osd.traineddata is required; it is used for orientation and script detection.
The newer ML engine does not require any language training data!
The ScanbotSDK.framework itself does not contain any OCR language files, in order to keep the framework small in size. The optional bundle ScanbotSDKOCRData.bundle, provided in the ZIP archive of the Scanbot SDK, contains the language files for English and German as well as osd.traineddata as examples. You can replace or extend these language files as needed. Add this bundle to your project and make sure that it is copied along with your resources into your app.
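To catch missing language files early, you can verify the bundle contents at runtime. The following is a minimal sketch using only Foundation APIs; it assumes the default bundle name described above and that the .traineddata files sit at the bundle root:

import Foundation

// A minimal sketch: checks that the OCR data bundle and the expected
// .traineddata files are present in the app bundle.
func verifyOCRDataBundle(languages: [String] = ["eng", "deu"]) -> Bool {
    // Locate the optional resource bundle by its default name.
    guard let bundleURL = Bundle.main.url(forResource: "ScanbotSDKOCRData", withExtension: "bundle") else {
        print("ScanbotSDKOCRData.bundle not found in the app bundle.")
        return false
    }
    // osd.traineddata is always required by the legacy engine.
    let expectedFiles = languages.map { "\($0).traineddata" } + ["osd.traineddata"]
    for file in expectedFiles {
        let fileURL = bundleURL.appendingPathComponent(file)
        guard FileManager.default.fileExists(atPath: fileURL.path) else {
            print("Missing OCR data file: \(file)")
            return false
        }
    }
    return true
}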
Preconditions to achieve a good OCR result
Conditions while scanning
A perfect document for OCR is flat, straight, doesn't show large shadows, folds, or any other artifacts that could interfere with recognition, and is captured at the highest possible resolution. Our UI and algorithms do their best to help you meet these requirements. But as in photography, you can never fully recover image information that was lost during the shot.
Languages (applies to legacy OCR engine only)
You can use multiple languages for OCR. But since the recognition of characters and words is a very complex process, increasing the number of languages lowers the overall precision: with more languages, there are more candidate words a detection could match. We suggest using as few languages as possible, as in the sketch below. Make sure that the language you are trying to detect is supported by the SDK and added to the project.
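As a rule of thumb, pass only the languages you actually expect in the documents. A minimal sketch using the legacy configuration factory that also appears in the full example further below:

// A minimal sketch: a single-language configuration usually yields
// better precision than a multi-language one such as "de+en".
let configuration = SBSDKOpticalCharacterRecognizerConfiguration.tesseract(withLanguageString: "de")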
Size and position
Put the document on a flat surface. Take the photo from straight above, parallel to the document, so that as little perspective correction as possible needs to be applied. The document should fill as much of the camera frame as possible while still showing all of the text that needs to be recognized. This results in more pixels for each character that needs to be detected and hence more detail. Skewed pages decrease the recognition quality.
Light and shadows
More ambient light is always better: the camera can take the shot at a lower ISO value, which results in less grainy photos. Make sure that there are no visible shadows. If you cannot avoid large shadows, it's better to take the shot at a slight angle instead. We also do not recommend using the flash: at such a short distance, it creates a bright spot at the center of the document, which decreases the quality.
Focus
The document needs to be properly focused so that the characters are sharp and clear. The camera's auto-focus works well as long as you keep the minimum distance required for the lens to focus, which usually starts at 5-10 cm.
Typefaces
The OCR training data is optimized for common serif and sans-serif font types. Decorative or script fonts decrease the detection quality considerably.
Downloading the OCR language files (applies to legacy OCR engine only)
You can find a list of all supported OCR languages and download links on this Tesseract page.
⚠️ Please choose and download the proper version of the language data files:
- For the latest version of Scanbot SDK, 1.9.0 or newer -
- For the older versions of Scanbot SDK, <= 1.8.6 -
Implementing OCR
There are three different OCR-based actions that can be executed on any image:
- Optical character recognition: use the class SBSDKOpticalCharacterRecognizer
- Text layout and orientation recognition: use the class SBSDKTextLayoutRecognizer
- Creation of searchable PDF documents with selectable text (hOCR): use the class SBSDKPDFRenderer
Performing optical character recognition
The optical character recognizer takes a single image or a collection of images (SBSDKImageStoring) and recognizes the text on each image.
The result contains information about the found text, where it was found (polygon), and what kind of text it is (word, line, paragraph).
Example code for performing optical character recognition on an image:
Swift:
// The file URL of the image we want to analyze.
guard let imageURL = URL(string: "...") else { return }

// Create the OCR configuration object, either with the new ML engine...
let configuration_ML = SBSDKOpticalCharacterRecognizerConfiguration.scanbotOCR()

// ...or with the legacy engine.
let configuration_Legacy
    = SBSDKOpticalCharacterRecognizerConfiguration.tesseract(withLanguageString: "de+en")

// Pass the configuration object to the initializer of the optical character recognizer.
let recognizer = SBSDKOpticalCharacterRecognizer(configuration: configuration_ML /* or configuration_Legacy */)

// Run the recognize(on:) method of the recognizer.
recognizer.recognize(on: imageURL) { result, error in
    // In the completion handler, check for the error and the result.
    if let result = result, error == nil {
        // Enumerate all words and log them to the console together with
        // their confidence values and bounding polygons.
        for page in result.pages {
            for word in page.words {
                print("Word: \(word.text), Confidence: \(word.confidenceValue), Polygon: \(word.polygon.description)")
            }
        }
    }
}
Objective-C:
// The file URL of the image we want to analyze.
NSURL *imageURL = [NSURL URLWithString:@"..."];

// Create the OCR configuration object, either with the new ML engine...
SBSDKOpticalCharacterRecognizerConfiguration *configuration_ML
    = [SBSDKOpticalCharacterRecognizerConfiguration scanbotOCR];

// ...or with the legacy engine.
SBSDKOpticalCharacterRecognizerConfiguration *configuration_Legacy
    = [SBSDKOpticalCharacterRecognizerConfiguration tesseractWithLanguageString:@"en+de"];

// Pass the configuration object to the initializer of the optical character recognizer.
SBSDKOpticalCharacterRecognizer *recognizer
    = [[SBSDKOpticalCharacterRecognizer alloc] initWithConfiguration:configuration_ML /* or configuration_Legacy */];

// Run the recognizeOnImageURL: method of the recognizer.
[recognizer recognizeOnImageURL:imageURL completion:^(SBSDKOCRResult *result, NSError *error) {
    // In the completion handler, check for the error and the result.
    if (result != nil && error == nil) {
        // Enumerate all words and log them to the console together with
        // their confidence values and bounding polygons.
        for (SBSDKOCRPage *page in result.pages) {
            for (SBSDKOCRResultBlock *word in page.words) {
                NSLog(@"Word: %@, Confidence: %0.0f, Polygon: %@",
                      word.text,
                      word.confidenceValue,
                      word.polygon.description);
            }
        }
    }
}];
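If you need the recognized text as one plain string, you can flatten the result yourself. A minimal Swift sketch, using only the pages and words structures from the example above (joining words with spaces and pages with newlines is our assumption; adapt as needed):

// A minimal sketch: joins all recognized words into a single string,
// one line per page. Uses only the result structure shown above.
func plainText(from result: SBSDKOCRResult) -> String {
    var pages: [String] = []
    for page in result.pages {
        let words = page.words.map { $0.text }
        pages.append(words.joined(separator: " "))
    }
    return pages.joined(separator: "\n")
}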
Performing text layout and orientation recognition
The text layout recognizer returns information about the page orientation, the angle by which the image should be rotated to deskew it, the text writing direction, and the text line order. Currently, text layout recognition still uses the legacy OCR engine, so when using it, make sure to install the additional OCR language files.
Example code for performing text layout recognition:
Swift:
// The file URL of the image we want to analyze.
guard let imageURL = URL(string: "...") else { return }

// Start the text layout recognition by creating an instance of the recognizer
// and calling the recognizeLayout(on:) function on it.
let recognizer = SBSDKTextLayoutRecognizer()
recognizer.recognizeLayout(on: imageURL) { result, error in
    // In the completion handler, check for the error and the result.
    if let result = result, error == nil {
        // Now we can work with the result.
        if result.orientation == .up && result.writingDirection == .leftToRight {
            // For example, skip any deskewing and run OCR directly.
        }
    }
}

// Or, if you need the text orientation only...
let orientation = recognizer.recognizeTextOrientation(on: imageURL)
// Now we can work with the result.
if orientation == .up {
    // For example, skip any deskewing and run OCR directly.
}
Objective-C:
// The file URL of the image we want to analyze.
NSURL *imageURL = [NSURL URLWithString:@"..."];

// Start the text layout recognition by creating an instance of the recognizer
// and calling the recognizeLayoutOnImageURL: function on it.
SBSDKTextLayoutRecognizer *recognizer = [[SBSDKTextLayoutRecognizer alloc] init];
[recognizer recognizeLayoutOnImageURL:imageURL completion:^(SBSDKTextLayoutRecognizerResult *result, NSError *error) {
    // In the completion handler, check for the error and the result.
    if (error == nil && result != nil) {
        // Now we can work with the result.
        if (result.orientation == SBSDKTextOrientationUp && result.writingDirection == SBSDKWritingDirectionLeftToRight) {
            // For example, skip any deskewing and run OCR directly.
        }
    }
}];

// Or, if you need the text orientation only...
SBSDKTextOrientation orientation = [recognizer recognizeTextOrientationOnImageURL:imageURL];
// Now we can work with the result.
if (orientation == SBSDKTextOrientationUp) {
    // For example, skip any deskewing and run OCR directly.
}
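A common pattern is to use the layout recognizer as a cheap pre-check before running full OCR. Below is a minimal Swift sketch combining the two recognizers from the examples above; the decision to only run OCR on upright images is our assumption, not an SDK requirement:

// A minimal sketch: run full OCR only when the text orientation is upright.
// Both recognizers and all calls are taken from the examples above.
func recognizeIfUpright(imageURL: URL) {
    let layoutRecognizer = SBSDKTextLayoutRecognizer()
    guard layoutRecognizer.recognizeTextOrientation(on: imageURL) == .up else {
        print("Image is not upright; rotate it before running OCR.")
        return
    }
    let configuration = SBSDKOpticalCharacterRecognizerConfiguration.scanbotOCR()
    let recognizer = SBSDKOpticalCharacterRecognizer(configuration: configuration)
    recognizer.recognize(on: imageURL) { result, error in
        guard let result = result, error == nil else { return }
        for page in result.pages {
            for word in page.words {
                print("Word: \(word.text)")
            }
        }
    }
}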