
Flutter OCR Module

The Scanbot SDK's OCR Engine can transform written text into machine-readable data – both from still images and a live camera stream. It is the backbone of the SDK's Data Capture Modules, enabling fast and accurate data extraction from various document formats.

The Scanbot Flutter SDK provides a simple and convenient API to run Optical Character Recognition (OCR) on images.

As a result, you get:

  • recognized text as plain text,
  • bounding boxes of all recognized paragraphs, lines, and words,
  • text results and confidence values for each bounding box.

The OCR feature is based on the Scanbot OCR Engine created and polished by the Scanbot SDK team to provide the best text recognition speed and quality for our users.

For Tesseract engine users

The Scanbot OCR feature based on the Tesseract OCR engine is still available. To enable it, set the engineMode property of the OCR configuration to TESSERACT:

  • engineMode: EngineMode - the OCR engine mode, either SCANBOT_OCR or TESSERACT;
  • languages: List<String> - a list of languages to be used for OCR (needed only for TESSERACT mode).

For each desired language, a corresponding OCR training data file (.traineddata) must be provided. Furthermore, the special data file osd.traineddata is required (used for orientation and script detection).

To keep the Scanbot SDK package's size as small as possible, it contains no language data files. You have to download and include the desired language files in your app.

Preconditions to achieve good OCR results

A perfect document for OCR is flat, straight, in the highest possible resolution and does not contain large shadows, folds, or any other objects that could distract the recognizer. The SDK's UI and algorithms do their best to help you meet these requirements. But as in photography, you can never fully get the image information back that was lost during the shot.

Size and position

Put the document on a flat surface. Take the photo from straight above and hold the device parallel to the document to minimize the need for perspective correction. The document should fill as much of the camera frame as possible while still showing all of the text that needs to be recognized. This yields more pixels, and hence more detail, for each character that needs to be detected. Skewed pages decrease the recognition quality.

Light and shadows

More ambient light is always better. With more light, the camera can take the shot at a lower ISO value, which results in less grainy photos. Try to make sure there are no visible shadows. If you encounter large shadows, take the shot at an angle instead.

We do not recommend using the flashlight – from a small distance, using it creates a light spot at the center of the document that decreases the recognition quality.

Focus

The document needs to be properly focused so that the characters are sharp and clear. The auto-focus of the camera works well if you meet the minimum required distance for the lens to be able to focus, usually around 5–10 centimeters (approx. 2–4 inches).

Typefaces

The Scanbot OCR Engine is optimized for common serif and sans-serif font types. Decorative or script fonts drastically decrease the recognition quality.

Languages

The Scanbot OCR engine (SCANBOT_OCR) supports German and English. These languages are integrated into the SDK and work out-of-the-box, without requiring additional modules.

For Tesseract engine users

For TESSERACT, you can use multiple languages for OCR. However, character and word recognition is a complicated process, and increasing the number of languages lowers the overall precision: with more languages, there are more candidate words that a detected word could match. We recommend using as few languages as possible. Make sure the language you are trying to detect is supported by the SDK and has been added to the project.

Using the OCR engine

Only for TESSERACT: Downloading and providing the OCR language files

You can find a list of all supported OCR languages and corresponding download links in the Tesseract documentation.

Caution

Please download the proper version of the language data files for the latest version of the Scanbot Flutter SDK:

LSTM Data Files for Version 4.00

Download the desired language files as well as the osd.traineddata file and make sure they will be packaged in your app as:

  • for Android: as assets in the sub-folder ocr_blobs/
  • for iOS: as resources in the sub-folder ScanbotSDKOCRData.bundle/

Alternatively, to keep the app package small, you can download and provide the language files in your app at runtime. Implement a suitable download mechanism for the desired language files and the osd.traineddata file, and place them in the languageDataPath directory, which you can determine at runtime with the getOcrConfigs method.
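The runtime download described above could be sketched as follows. This is a minimal sketch using only dart:io, not part of the Scanbot SDK: the function name downloadTrainedData and the baseUrl parameter are hypothetical, and you have to supply the download location of the data files yourself. It assumes baseUrl ends with a slash so that file names resolve relative to it.

```dart
import 'dart:io';

/// Downloads each file in [fileNames] from [baseUrl] into [targetDir].
/// Hypothetical helper: [baseUrl] must end with '/' and point to a location
/// that serves the .traineddata files under their plain file names.
Future<void> downloadTrainedData(
    Uri baseUrl, Directory targetDir, List<String> fileNames) async {
  final client = HttpClient();
  try {
    // Make sure the target directory exists before writing into it.
    await targetDir.create(recursive: true);
    for (final name in fileNames) {
      final request = await client.getUrl(baseUrl.resolve(name));
      final response = await request.close();
      if (response.statusCode != HttpStatus.ok) {
        throw HttpException(
            'Download of $name failed with status ${response.statusCode}');
      }
      // Directory.uri always ends with '/', so resolve() appends the name.
      final file = File.fromUri(targetDir.uri.resolve(name));
      // pipe() streams the body to disk and closes the file sink when done.
      await response.pipe(file.openWrite());
    }
  } finally {
    client.close(force: true);
  }
}
```

Assuming languageDataPath is a file URI as described in this guide, the target directory could be obtained with Directory.fromUri(Uri.parse(languageDataPath)) and passed together with, for example, ['eng.traineddata', 'deu.traineddata', 'osd.traineddata'].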

Language codes

The Scanbot SDK API uses 2-letter ISO 639-1 codes, e.g.:

  • en - English
  • de - German

The Tesseract language data files are identified by 3-letter language codes, e.g.:

  • eng - English
  • deu - German

Example

If you want to perform OCR for English and German, you have to download and install the following data files:

  • eng.traineddata: language file for English
  • deu.traineddata: language file for German
  • osd.traineddata: special data file for orientation and script detection

Then, in the Scanbot SDK module, use languages: ["en", "de"].
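The relationship between the 2-letter API codes and the Tesseract data files can be expressed as a small lookup. The following is a sketch only: tesseractCodes and requiredTrainedDataFiles are hypothetical names, not SDK API, and the map contains just the two languages listed in this guide.

```dart
/// Maps the 2-letter codes used by the Scanbot SDK API to the 3-letter codes
/// that identify Tesseract data files. Only the languages named in this
/// guide are listed; extend the map for other languages.
const Map<String, String> tesseractCodes = {
  'en': 'eng',
  'de': 'deu',
};

/// Returns the data file names needed for [languages], plus osd.traineddata,
/// which is always required for orientation and script detection.
List<String> requiredTrainedDataFiles(List<String> languages) {
  final files = <String>[];
  for (final language in languages) {
    final code = tesseractCodes[language];
    if (code == null) {
      throw ArgumentError.value(
          language, 'languages', 'No Tesseract code known for this language');
    }
    files.add('$code.traineddata');
  }
  files.add('osd.traineddata');
  return files;
}
```

Calling requiredTrainedDataFiles(['en', 'de']) yields the three files from the example above: eng.traineddata, deu.traineddata, and osd.traineddata.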

OCR APIs

ScanbotSdk.getOcrConfigs()

Use this function to get the Scanbot SDK's OCR properties of the current app installation.

var result = await ScanbotSdk.getOcrConfigs();
// result.value.installedLanguages ...
// result.value.languageDataPath ...

  • languageDataPath: The absolute file URI of the directory in which the OCR training data files must be placed at runtime.
  • installedLanguages: A list of the currently installed OCR languages (e.g., ["en", "fr"]). The Scanbot SDK uses the languageDataPath directory to determine the currently installed OCR languages.

ScanbotSdk.ocrEngine.recognizeOnImageFileUris(List<String> images, {OcrConfiguration? configuration})

This function takes a list of image file URIs and performs OCR on each image. The result contains the recognized plain text of all images as well as detailed OCR data for each page.

List<String> imageFileUris = ...
var configuration = OcrConfiguration(engineMode: OcrEngine.TESSERACT, languages: ['en', 'de']);
var result = await ScanbotSdk.ocrEngine.recognizeOnImageFileUris(imageFileUris, configuration: configuration);

  • result.value.recognizedText: Contains the recognized plain text of all images.
  • result.value.pages: A list of OcrPages with OCR data. Each OcrPage represents the OCR result of a given scanned image. You can get bounding boxes and values of recognized words, lines, and paragraphs.

Configuration:

var options = OcrConfiguration(languages: languages, engineMode: engineMode);

  • options.languages: A list with the OCR languages of the text to be recognized (e.g., ["en", "de"]). The number of languages has an impact on the performance – the more languages, the slower the recognition process. The OCR operation will fail with an error if any of the specified languages are missing. Please use the getOcrConfigs function to make sure that the desired languages are installed.
  • options.engineMode: Determines which engine to use. The SCANBOT_OCR engine is set by default.
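Because OCR fails when a requested language is not installed, it can help to compare the requested languages against the installed ones before starting recognition. The helper below is a hypothetical sketch in plain Dart, not SDK API; it assumes the installed languages are available as a list of 2-letter codes, as described for installedLanguages.

```dart
/// Returns the entries of [requested] that do not appear in [installed],
/// preserving the order of [requested].
/// Hypothetical helper; both lists are expected to hold 2-letter codes.
List<String> missingLanguages(List<String> requested, List<String> installed) {
  final installedSet = installed.toSet();
  return requested
      .where((language) => !installedSet.contains(language))
      .toList();
}
```

For example, with the list returned in installedLanguages passed as the second argument, an empty result means that all requested languages are present and OCR can be started; otherwise the returned codes name the data files that still need to be provided.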

Example of an OCR result:

{
  "recognizedText": "Lorem ipsum dolor sit amet, consectetur\nadipiscing elit. Cdopkx gbydo drsc dohd.",
  "pages": [{
    "_type": "Page",
    "text": "Lorem ipsum dolor sit amet, consectetur\nadipiscing elit. Cdopkx gbydo drsc dohd.",
    "confidence": 0.9996336102485657,
    "roi": [{"x": 88, "y": 80}, {"x": 1016, "y": 80}, {"x": 1016, "y": 208}, {"x": 88, "y": 208}],
    "blocks": [{
      "_type": "Block",
      "text": "Lorem ipsum dolor sit amet, consectetur\nadipiscing elit. Cdopkx gbydo drsc dohd.",
      "confidence": 0.9996336102485657,
      "roi": [{"x": 88, "y": 80}, {"x": 1016, "y": 80}, {"x": 1016, "y": 208}, {"x": 88, "y": 208}],
      "lines": [{
        "_type": "Line",
        "text": "Lorem ipsum dolor sit amet, consectetur",
        "confidence": 0.9998694062232971,
        "roi": [{"x": 88, "y": 80}, {"x": 1012, "y": 80}, {"x": 1012, "y": 124}, {"x": 88, "y": 124}],
        "words": [{
          "_type": "Word",
          "text": "Lorem",
          "confidence": 0.9999752044677734,
          "roi": [{"x": 96.75828552246094, "y": 80}, {"x": 219.37440490722656, "y": 80}, {"x": 219.37440490722656, "y": 124}, {"x": 96.75828552246094, "y": 124}],
          "glyphs": [{
            "_type": "Glyph",
            "text": "L",
            "confidence": 0.9999958276748657,
            "roi": [{"x": 96.75829315185547, "y": 80}, {"x": 114.2748794555664, "y": 80}, {"x": 114.2748794555664, "y": 124}, {"x": 96.75829315185547, "y": 124}]
          }, {
            "_type": "Glyph",
            "text": "o",
            "confidence": 0.9999903440475464,
            "roi": [{"x": 109.89573669433594, "y": 80}, {"x": 140.5497589111328, "y": 80}, {"x": 140.5497589111328, "y": 124}, {"x": 109.89573669433594, "y": 124}]
          }, {
            "_type": "Glyph",
            "text": "r",
            "confidence": 0.99998939037323,
            "roi": [{"x": 136.1706085205078, "y": 80}, {"x": 162.44549560546875, "y": 80}, {"x": 162.44549560546875, "y": 124}, {"x": 136.1706085205078, "y": 124}]
          }, {
            "_type": "Glyph",
            "text": "e",
            "confidence": 0.9999228715896606,
            "roi": [{"x": 162.44549560546875, "y": 80}, {"x": 188.7203826904297, "y": 80}, {"x": 188.7203826904297, "y": 124}, {"x": 162.44549560546875, "y": 124}]
          }, {
            "_type": "Glyph",
            "text": "m",
            "confidence": 0.9999774694442749,
            "roi": [{"x": 184.3412322998047, "y": 80}, {"x": 219.37440490722656, "y": 80}, {"x": 219.37440490722656, "y": 124}, {"x": 184.3412322998047, "y": 124}]
          }]
        }, {
          "_type": "Word",
          "text": "ipsum",
          "confidence": 0.9993802309036255,
          "roi": [{"x": 245.64927673339844, "y": 80}, {"x": 368.265380859375, "y": 80}, {"x": 368.265380859375, "y": 124}, {"x": 245.6492919921875, "y": 124}],
          "glyphs": [{
            "_type": "Glyph",
            "text": "i",
            "confidence": 0.9975591897964478,
            "roi": [{"x": 245.6492919921875, "y": 80}, {"x": 258.7867431640625, "y": 80}, {"x": 258.7867431640625, "y": 124}, {"x": 245.6492919921875, "y": 124}]
          }, {
            "_type": "Glyph",
            "text": "p",
            "confidence": 0.9996521472930908,
            "roi": [{"x": 258.7867431640625, "y": 80}, {"x": 280.6824645996094, "y": 80}, {"x": 280.6824645996094, "y": 124}, {"x": 258.7867431640625, "y": 124}]
          }, {
            "_type": "Glyph",
            "text": "s",
            "confidence": 0.9999656677246094,
            "roi": [{"x": 280.6824645996094, "y": 80}, {"x": 306.95733642578125, "y": 80}, {"x": 306.95733642578125, "y": 124}, {"x": 280.6824645996094, "y": 124}]
          }, {
            "_type": "Glyph",
            "text": "u",
            "confidence": 0.9997465014457703,
            "roi": [{"x": 306.95733642578125, "y": 80}, {"x": 333.23223876953125, "y": 80}, {"x": 333.23223876953125, "y": 124}, {"x": 306.95733642578125, "y": 124}]
          }, {
            "_type": "Glyph",
            "text": "m",
            "confidence": 0.9999798536300659,
            "roi": [{"x": 333.23223876953125, "y": 80}, {"x": 368.2654113769531, "y": 80}, {"x": 368.2654113769531, "y": 124}, {"x": 333.23223876953125, "y": 124}]
          }]
        }]
      }]
    }]
  }]
}
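The nesting in the example result (pages, blocks, lines, words) can be walked to pick out words whose confidence falls below a threshold, for instance to flag them for manual review. This is a sketch that operates on the JSON shape shown above using dart:convert; WordHit and wordsBelow are hypothetical names, and the typed result objects of the SDK may expose the same data differently.

```dart
import 'dart:convert';

/// A recognized word together with its confidence value.
class WordHit {
  final String text;
  final double confidence;
  const WordHit(this.text, this.confidence);
}

/// Collects all words with a confidence below [threshold] from an OCR
/// result encoded as JSON in the shape shown in the example above.
List<WordHit> wordsBelow(String ocrJson, double threshold) {
  final root = jsonDecode(ocrJson) as Map<String, dynamic>;
  final hits = <WordHit>[];
  for (final page in root['pages'] as List<dynamic>) {
    for (final block in page['blocks'] as List<dynamic>) {
      for (final line in block['lines'] as List<dynamic>) {
        for (final word in line['words'] as List<dynamic>) {
          // JSON numbers may decode as int or double, so go through num.
          final confidence = (word['confidence'] as num).toDouble();
          if (confidence < threshold) {
            hits.add(WordHit(word['text'] as String, confidence));
          }
        }
      }
    }
  }
  return hits;
}
```

Applied to the example result with a threshold of 0.9995, only the word "ipsum" (confidence 0.9993802309036255) would be returned, since "Lorem" lies above that value.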
