Flutter OCR - SDK Features

The Scanbot SDK Plugin provides a simple and convenient API to run Optical Character Recognition (OCR) on images. The OCR feature is a part of the Scanbot SDK Data Capture Modules.

Preconditions to achieve a good OCR result

Conditions while scanning

A perfect document for OCR is flat, straight, in the highest possible resolution and does not contain large shadows, folds, or any other objects that could distract the recognizer. Our UI and algorithms do their best to help you meet these requirements. But as in photography, you can never fully get the image information back that was lost during the shot.

Languages (applies to Tesseract only)

You can use multiple languages for OCR. However, because recognizing characters and words is a complex process, each additional language lowers the overall precision: the more languages are enabled, the more candidate words a detected word could match. We suggest using as few languages as possible. Make sure that each language you want to recognize is supported by the SDK and added to the project.

Size and position

Put the document on a flat surface. Take the photo from directly above, parallel to the document, so that as little perspective correction as possible is needed. The document should fill as much of the camera frame as possible while still showing all of the text that needs to be recognized. This results in more pixels, and hence more detail, for each character that needs to be detected. Skewed pages decrease the recognition quality.

Light and shadows

More ambient light is always better: the camera can then take the shot at a lower ISO value, which results in less grainy photos. Make sure that there are no visible shadows. If you have large shadows, it is better to take the shot at an angle instead. We also do not recommend using the flashlight: at this short distance it creates a light spot at the center of the document, which decreases the recognition quality.

Focus

The document needs to be properly focused so that the characters are sharp and clear. The autofocus of the camera works well if you keep at least the minimum distance the lens needs to be able to focus. This minimum distance is usually 5-10 cm.

Typefaces

The OCR trained data is optimized for common serif and sans-serif font types. Decorative or script fonts drastically decrease the quality of the recognition.

Set OCR Engine

As of version 4.1.0 of the SDK, the new SCANBOT_OCR engine is used by default. It works faster and provides better results for Latin-based languages. If you need to recognize other languages, you can set TESSERACT as the engine and specify the required languages.

NOTE: You do not need to specify languages if you are using the SCANBOT_OCR engine.

var options = OcrOptions(engine: OcrEngine.TESSERACT, languages: ['en', 'de']);
var result = await ScanbotSdk.performOcr(_pages, options);

enum OcrEngine {
  /// Slow but powerful OCR engine. Supports non-Latin languages.
  TESSERACT,

  /// Fast and accurate OCR engine. Supports only Latin languages.
  SCANBOT_OCR,
}

The SCANBOT_OCR engine uses its own trained data models. If you are not going to use the TESSERACT engine, you can delete the .traineddata files from your assets and from the app.

OCR Languages and Data Files

The SCANBOT_OCR engine supports all Latin languages without specifying them in OcrOptions. You don't need to provide any models separately.

The TESSERACT OCR engine supports a wide variety of languages. For each desired language a corresponding OCR training data file (.traineddata) must be provided. Furthermore, the special data file osd.traineddata is required (used for orientation and script detection).

The Scanbot SDK plugin ships with no training data files by default to keep the plugin package small in size. You have to download and provide the desired language files in your app.

Download and Provide OCR Language Files

You can find a list of all supported OCR languages and download links on this Tesseract page.

Please choose and download the proper version of the language data files:

Option 1 - Provide the Language Files in the App Package:

Download the desired language files as well as the osd.traineddata file and make sure they will be packaged in your app as:

  • for Android: as assets in the sub-folder ocr_blobs/
  • for iOS: as resources in the sub-folder ScanbotSDKOCRData.bundle/

Option 2 - Provide the Language Files On-Demand:

Alternatively, to keep the app package small, you can download and provide the language files at runtime. Implement a suitable download mechanism for the desired language files and the osd.traineddata file, and place them in the languageDataPath directory, which can be determined at runtime via the getOcrConfigs method.
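
The on-demand approach could look roughly like the following minimal sketch, which uses only dart:io. The helper names targetFile and downloadLanguageFile, the example directory URI, and the example.com base URL are assumptions made for illustration and are not part of the Scanbot SDK; in an app, the directory URI would be the languageDataPath value returned by getOcrConfigs.

```dart
import 'dart:io';

/// Resolves where [fileName] should be stored, given the directory file URI
/// reported as languageDataPath. (Hypothetical helper, not an SDK function.)
File targetFile(String languageDataPathUri, String fileName) {
  var base = languageDataPathUri;
  // Uri.resolve replaces the last path segment unless the base ends with '/'.
  if (!base.endsWith('/')) base = '$base/';
  return File.fromUri(Uri.parse(base).resolve(fileName));
}

/// Downloads [source] and writes it to [target]. (Hypothetical helper.)
Future<void> downloadLanguageFile(Uri source, File target) async {
  final client = HttpClient();
  try {
    final request = await client.getUrl(source);
    final response = await request.close();
    if (response.statusCode != HttpStatus.ok) {
      await response.drain<void>();
      throw HttpException('Unexpected status ${response.statusCode}',
          uri: source);
    }
    await target.parent.create(recursive: true);
    // pipe() writes the response body to the file and closes the sink.
    await response.pipe(target.openWrite());
  } finally {
    client.close();
  }
}

Future<void> main() async {
  // Assumed values: replace with the real languageDataPath and your own host.
  const languageDataPath = 'file:///tmp/ocr_demo';
  final baseUrl = Uri.parse('https://example.com/tessdata/');
  for (final name in ['eng.traineddata', 'osd.traineddata']) {
    final target = targetFile(languageDataPath, name);
    print('Would download ${baseUrl.resolve(name)} to ${target.path}');
    // Uncomment to actually download:
    // await downloadLanguageFile(baseUrl.resolve(name), target);
  }
}
```

Writing directly into the languageDataPath directory means the files are in the location the SDK inspects when it determines the installed languages.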

Language Codes

The Tesseract language data files are identified by a 3-letter language code. For example:

  • eng - English
  • deu - German
  • etc.

The Scanbot SDK API uses a 2-letter ISO code:

  • en - English
  • de - German
  • etc.

Example:

If you want to perform OCR with languages English and German, you have to download and install the following data files:

  • eng.traineddata - language file for English
  • deu.traineddata - language file for German
  • osd.traineddata - special data file for orientation and script detection

In the Scanbot SDK plugin use languages: ["en", "de"].
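
The mapping between the two code styles can be captured in a small lookup, sketched below in plain Dart. The table tesseractCodes and the function requiredDataFiles are hypothetical helpers, not part of the Scanbot SDK, and the table lists only a few languages as an illustration.

```dart
// Hypothetical lookup table: 2-letter code used by the Scanbot SDK API
// -> 3-letter code used in Tesseract data file names.
// Only a few entries are shown; extend it for the languages you ship.
const Map<String, String> tesseractCodes = {
  'en': 'eng',
  'de': 'deu',
  'fr': 'fra',
};

/// Returns the data file names needed for [sdkLanguages],
/// always including osd.traineddata.
List<String> requiredDataFiles(List<String> sdkLanguages) {
  final files = <String>[];
  for (final code in sdkLanguages) {
    final tess = tesseractCodes[code];
    if (tess == null) {
      throw ArgumentError('No Tesseract code known for "$code"');
    }
    files.add('$tess.traineddata');
  }
  files.add('osd.traineddata');
  return files;
}

void main() {
  print(requiredDataFiles(['en', 'de']));
  // [eng.traineddata, deu.traineddata, osd.traineddata]
}
```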

OCR API

ScanbotSdk.getOcrConfigs()

Use this function to get Scanbot SDK OCR properties of the current App installation.

Call:

var result = await ScanbotSdk.getOcrConfigs();
// result.installedLanguages ...
// result.languageDataPath ...

  • result.languageDataPath - Contains the absolute file URI of the directory in which to place the OCR training data files at runtime.
  • result.installedLanguages - Returns an array of the currently installed OCR languages (e.g. ["en", "fr"]). The Scanbot SDK uses the languageDataPath directory to determine the currently installed OCR languages.
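
Because an OCR run fails when a requested language is not installed, it can help to compare the requested languages against installedLanguages first. The following is a minimal sketch in plain Dart; missingLanguages is a hypothetical helper, and the installed list is hard-coded here where an app would use the value returned by getOcrConfigs.

```dart
/// Returns the requested language codes that are not in [installed].
/// (Hypothetical helper, not an SDK function.)
List<String> missingLanguages(List<String> requested, List<String> installed) {
  final available = installed.toSet();
  return requested.where((code) => !available.contains(code)).toList();
}

void main() {
  // In an app, this list would come from
  // (await ScanbotSdk.getOcrConfigs()).installedLanguages.
  final installed = ['en', 'fr'];
  final missing = missingLanguages(['en', 'de'], installed);
  print(missing); // [de]
}
```

If the returned list is not empty, the app can provide the missing data files before running OCR, or report the problem to the user.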

ScanbotSdk.performOcr(pages, options)

This function takes an array of pages and performs Optical Character Recognition on the DOCUMENT image of each page. The recognized text is returned as plain text and, if PDF generation is enabled, as a composed PDF file containing selectable and searchable text.

List<Page> pages = ...
var options = OcrOptions(languages: ['en', 'de'], engine: OcrEngine.SCANBOT_OCR);
var result = await ScanbotSdk.performOcr(pages, options);

  • result.plainText - Contains the recognized plain text of all images.
  • result.pdfFileUri - File URI of the composed PDF file ('file:///...').
  • result.pages - A list of OcrPages with OCR data. Each OcrPage represents the OCR result of the corresponding scanned Page document image. You can get the bounding boxes and values of recognized words, lines, and paragraphs.

Options:

var options = OcrOptions(languages: languages, engine: engine);

  • pages - An array with valid Page objects. Each page object should contain a cropped DOCUMENT image.
  • options.languages - An array with the OCR languages of the text to be recognized (e.g. ["en", "de"]). The number of languages has an impact on performance: the more languages, the slower the recognition process. The OCR operation will fail with an error if any of the specified languages are missing. Please use the getOcrConfigs function to make sure that the desired languages are installed.
  • options.pdfOptions - PDF options to use when shouldGeneratePdf: true.
  • options.engine - Which engine to use. The new SCANBOT_OCR engine is set by default.
