
iOS OCR - SDK Features

The Scanbot OCR feature comes with two OCR engines: legacy and ML. The legacy engine is based on the Tesseract OCR engine with some modifications and enhancements. The ML (machine learning based) engine was added later. It is much faster and more accurate, but it only supports languages written in Latin letters. We recommend using the ML engine whenever possible and using the legacy engine only if you need to recognize text in languages with non-Latin scripts, such as Arabic, Japanese, Chinese, Russian, Greek, or Korean.
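The recommendation above can be sketched as a small decision helper. This is only an illustration: `OCREngineChoice`, `latinScriptLanguages`, and `chooseEngine(for:)` are hypothetical names, not SDK types, and the language set is a short, incomplete sample. Whether the ML engine covers every Latin-script language is not stated here.

```swift
/// Hypothetical stand-in for the two engines described above; not an SDK type.
enum OCREngineChoice: Equatable {
    case ml
    case legacy
}

/// Illustrative, incomplete sample of language codes written in Latin letters (assumption).
let latinScriptLanguages: Set<String> = ["en", "de", "fr", "es", "it"]

/// Picks the ML engine only when every requested language is in the Latin-script sample.
func chooseEngine(for languages: [String]) -> OCREngineChoice {
    let normalized = languages.map { $0.lowercased() }
    let allLatin = normalized.allSatisfy { latinScriptLanguages.contains($0) }
    return allLatin ? .ml : .legacy
}

print(chooseEngine(for: ["en", "de"]))
print(chooseEngine(for: ["en", "ja"]))
```

A single non-Latin language in the list is enough to fall back to the legacy engine in this sketch.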

When using the legacy OCR engine, a corresponding .traineddata file (a.k.a. tessdata) must be installed for each desired OCR language in the optional resource bundle named ScanbotSDKOCRData.bundle. The special data file osd.traineddata, which is used for orientation and script detection, is also required and must be installed.

The newer ML engine does not require any language training data!

To keep the framework small, the ScanbotSDK.framework itself does not contain any OCR language files. The optional bundle ScanbotSDKOCRData.bundle, provided in the ZIP archive of the Scanbot SDK, contains the language files for English and German as well as osd.traineddata as examples. You can replace or supplement these language files as needed. Add this bundle to your project and make sure that it is copied along with your resources into your app.
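A quick sanity check at startup can catch a bundle that was not copied or is missing a language file. The following is a minimal sketch using only Foundation. `missingTrainedDataFiles(for:in:)` is a hypothetical helper, and it assumes that the files sit at the top level of the bundle and use Tesseract-style base names such as `eng` and `deu`.

```swift
import Foundation

/// Returns the names of required .traineddata files that are missing from `directory`.
/// `osd.traineddata` is always included in the check, as described above.
func missingTrainedDataFiles(for languages: [String], in directory: URL) -> [String] {
    let required = languages.map { "\($0).traineddata" } + ["osd.traineddata"]
    return required.filter { name in
        !FileManager.default.fileExists(atPath: directory.appendingPathComponent(name).path)
    }
}

// Usage: look up the optional bundle among the app's resources (assumed location).
if let bundleURL = Bundle.main.url(forResource: "ScanbotSDKOCRData", withExtension: "bundle") {
    let missing = missingTrainedDataFiles(for: ["eng", "deu"], in: bundleURL)
    if !missing.isEmpty {
        print("Missing OCR data files: \(missing)")
    }
} else {
    print("ScanbotSDKOCRData.bundle was not found in the app resources.")
}
```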

Preconditions to achieve a good OCR result

Conditions while scanning

A perfect document for OCR is flat and straight, is captured at the highest possible resolution, and shows no large shadows, folds, or other objects that could interfere with recognition. Our UI and algorithms do their best to help you meet these requirements. But as in photography, you can never fully recover image information that was lost during the shot.

Languages (applies to legacy OCR engine only)

You can use multiple languages for OCR. But since the recognition of characters and words is a very complicated process, increasing the number of languages lowers the overall precision: with more languages, there are more candidate words that a detected word could match. We suggest using as few languages as possible. Make sure that the language you're trying to detect is supported by the SDK and added to the project.
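One way to keep the language list short is to build it from exactly the languages a given screen needs. The sketch below assembles a "+"-separated string in the same shape as the "de+en" value used in the OCR example on this page. `makeLanguageString(from:)` is a hypothetical helper, not part of the SDK.

```swift
import Foundation

/// Builds a "+"-separated language string such as "de+en" from a list of codes,
/// dropping duplicates and empty entries while keeping the original order.
func makeLanguageString(from codes: [String]) -> String {
    var seen = Set<String>()
    let unique = codes
        .map { $0.trimmingCharacters(in: .whitespaces).lowercased() }
        .filter { !$0.isEmpty && seen.insert($0).inserted }
    return unique.joined(separator: "+")
}

print(makeLanguageString(from: ["de", "en", "DE"]))
```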

Size and position

Put the document on a flat surface. Take the photo from straight above, with the camera parallel to the document, so that as little perspective correction as possible is needed. The document should fill as much of the camera frame as possible while still showing all of the text that needs to be recognized. This yields more pixels, and hence more detail, for each character that needs to be detected. Skewed pages decrease the recognition quality.
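The effect of filling the frame can be estimated with simple arithmetic. The numbers below are hypothetical: an image 4032 px tall, an A4 page (297 mm tall), and a glyph 3 mm tall. `glyphHeightInPixels` is an illustrative helper, not an SDK function.

```swift
/// Approximate height in pixels of a glyph, given the fraction of the image height
/// that the page occupies (1.0 = the page fills the full image height).
func glyphHeightInPixels(imageHeightPx: Double,
                         pageHeightMm: Double,
                         glyphHeightMm: Double,
                         fillFraction: Double) -> Double {
    let pixelsPerMm = imageHeightPx * fillFraction / pageHeightMm
    return glyphHeightMm * pixelsPerMm
}

let full = glyphHeightInPixels(imageHeightPx: 4032, pageHeightMm: 297,
                               glyphHeightMm: 3, fillFraction: 1.0)
let half = glyphHeightInPixels(imageHeightPx: 4032, pageHeightMm: 297,
                               glyphHeightMm: 3, fillFraction: 0.5)
print(full, half)
```

Under these assumptions a page that fills the frame gives each glyph about 41 px of height, while a page that fills only half the frame gives about 20 px.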

Light and shadows

More ambient light is always better: the camera can then take the shot at a lower ISO value, which results in less grainy photos. Make sure that there are no visible shadows. If large shadows cannot be avoided, it's better to take the shot at an angle instead. We also do not recommend using the flash: at such a short distance, it creates a bright spot at the center of the document, which decreases the quality.

Focus

The document needs to be properly focused so that the characters are sharp and clear. The camera's auto-focus works well as long as you keep the minimum distance the lens needs to be able to focus. This minimum is usually 5-10 cm.

Typefaces

The OCR training data is optimized for common serif and sans-serif typefaces. Decorative or script fonts significantly decrease the recognition quality.

Downloading the OCR language files (applies to legacy OCR engine only)

You can find a list of all supported OCR languages and download links on this Tesseract page.

⚠️️️ Please choose and download the proper version of the language data files:

Implementing OCR

There are three different OCR-based actions that can be executed on any image:

  • Optical character recognition: use the class SBSDKOpticalCharacterRecognizer
  • Text layout and orientation recognition: use the class SBSDKTextLayoutRecognizer
  • Creation of searchable PDF documents with selectable text (HOCR): use the class SBSDKPDFRenderer

Performing optical character recognition

The optical character recognition takes a single image or a collection of images (SBSDKImageStoring) and recognizes the text on each image.
The result contains information about the text that was found, where it was found (polygon), and what kind of text it is (word, line, paragraph).

Example code for performing optical character recognition on an image:

// The file URL of the image we want to analyze.
guard let imageURL = URL(string: "...") else { return }

// Create the OCR configuration object, either with the new ML engine...
let configuration_ML = SBSDKOpticalCharacterRecognizerConfiguration.scanbotOCR()

// ...or with the legacy engine
let configuration_Legacy = SBSDKOpticalCharacterRecognizerConfiguration.tesseract(withLanguageString: "de+en")

// Pass the configuration object to the initializer of the optical character recognizer.
let recognizer = SBSDKOpticalCharacterRecognizer(configuration: configuration_ML /* or configuration_Legacy */)

// Run the recognizeOn... method of the recognizer.
recognizer.recognize(on: imageURL) { result, error in

    // In the completion handler check for the error and result.
    if let result = result, error == nil {

        // At the end enumerate all words and log them to the console together with their confidence values and bounding boxes.
        for page in result.pages {
            for word in page.words {
                print("Word: \(word.text), Confidence: \(word.confidenceValue), Polygon: \(word.polygon.description)")
            }
        }
    }
}
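Once the words are available, a common follow-up step is to discard low-confidence words before using the text. The sketch below shows the idea on a stand-in type. `RecognizedWord` is hypothetical and mirrors only the `text` and `confidenceValue` properties used in the example above. The threshold of 80 and the 0 to 100 confidence scale are assumptions, so check the API reference for the actual value range.

```swift
/// Hypothetical stand-in for a recognized word; not an SDK type.
struct RecognizedWord {
    let text: String
    let confidenceValue: Double
}

/// Keeps words at or above `minimumConfidence` and joins them into one string.
func confidentText(from words: [RecognizedWord], minimumConfidence: Double) -> String {
    return words
        .filter { $0.confidenceValue >= minimumConfidence }
        .map { $0.text }
        .joined(separator: " ")
}

let sampleWords = [
    RecognizedWord(text: "Invoice", confidenceValue: 96),
    RecognizedWord(text: "l1|", confidenceValue: 31),
    RecognizedWord(text: "2024", confidenceValue: 88)
]
print(confidentText(from: sampleWords, minimumConfidence: 80))
```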

Performing text layout and orientation recognition

The text layout recognition returns information about the page orientation, the angle by which the image should be rotated to deskew it, the text writing direction, and the text line order. Text layout recognition currently still uses the legacy OCR engine, so make sure to install the additional OCR language files when using it.

Example code for performing text layout recognition:

// The file URL of the image we want to analyze.
guard let imageURL = URL(string: "...") else { return }

// Start the text layout recognition by creating an instance of the recognizer and calling
// the recognizeLayout... function on it.
let recognizer = SBSDKTextLayoutRecognizer()
recognizer.recognizeLayout(on: imageURL) { result, error in

    // In the completion handler check for the error and the result.
    if let result = result, error == nil {

        // Now we can work with the result.
        if result.orientation == .up && result.writingDirection == .leftToRight {

        }
    }
}

// Or if you need the text orientation only...
let orientation = recognizer.recognizeTextOrientation(on: imageURL)
if orientation == .up {
    // Now we can work with the result.
}
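The completion handlers shown above deliver an optional result together with an optional error. One way to handle that pair in a single place is to fold it into Swift's `Result` type. This is a generic sketch that does not depend on the SDK. `makeResult` and `MissingResultError` are hypothetical names.

```swift
/// Error used when a completion handler delivers neither a value nor an error.
struct MissingResultError: Error {}

/// Converts the `(result, error)` pair of a completion handler into a `Result`.
/// An error takes precedence over a value if both are present.
func makeResult<Value>(_ value: Value?, _ error: Error?) -> Result<Value, Error> {
    if let error = error {
        return .failure(error)
    }
    if let value = value {
        return .success(value)
    }
    return .failure(MissingResultError())
}

// Example with plain values in place of an SDK result object.
switch makeResult("recognized text", nil) {
case .success(let text):
    print("Success: \(text)")
case .failure(let error):
    print("Failure: \(error)")
}
```

Inside a completion handler, the same call would take the handler's `result` and `error` arguments, which keeps the success and failure paths explicit.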

Performing PDF rendering

The PDF renderer takes a collection of images (SBSDKImageStoring) and renders a PDF containing each image as a page. Optionally, each page can be run through one of the OCR engines to create an invisible text layer on each PDF page. This process is called HOCR, and in the final PDF the user can search for text fragments or select text. Without HOCR, the PDF is not searchable and its text cannot be selected.

Example code for performing PDF rendering:

// Create an image storage to save the captured document images to
let imagesURL = SBSDKStorageLocation.applicationDocumentsFolderURL.appendingPathComponent("Images")
let imagesLocation = SBSDKStorageLocation.init(baseURL: imagesURL)
guard let imageStorage = SBSDKIndexedImageStorage(storageLocation: imagesLocation) else { return }

// Specify the file URL where the PDF will be saved to. Nil makes no sense here.
guard let outputPDFURL = URL(string: "outputPDF") else { return }


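// Create the OCR configuration for a searchable PDF (HOCR).
// Note: this snippet does not pass ocrConfiguration to the renderer below;
// see the SBSDKPDFRenderer API reference for where the configuration is supplied.
let ocrConfiguration = SBSDKOpticalCharacterRecognizerConfiguration.scanbotOCR()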

// Create the default PDF rendering options.
let options = SBSDKPDFRendererOptions()

// Create the PDF renderer and pass the PDF options to it.
let renderer = SBSDKPDFRenderer(options: options)

// Start the rendering operation and store the SBSDKProgress to watch the progress or cancel the operation.
let progress = renderer.renderImageStorage(imageStorage,
                                           indexSet: nil,
                                           encrypter: nil,
                                           output: outputPDFURL) { finished, error in

    if finished && error == nil {
        // Now you can access the pdf file at outputPDFURL.
    }
}
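The example above creates its output URL with `URL(string: "outputPDF")`, which produces a relative URL without a `file` scheme. Since the comment asks for a file URL, a real app would more likely build one inside a writable directory. The following is a minimal Foundation-only sketch. `makePDFFileURL(named:in:)` is a hypothetical helper, and the Documents folder is only one possible location.

```swift
import Foundation

/// Builds a file URL for a PDF named `name` inside `directory`.
func makePDFFileURL(named name: String, in directory: URL) -> URL {
    return directory.appendingPathComponent(name).appendingPathExtension("pdf")
}

// Usage: place the PDF in the app's Documents folder, if one is available.
if let documents = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first {
    let outputPDFURL = makePDFFileURL(named: "outputPDF", in: documents)
    print(outputPDFURL.path)
}
```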
