Android OCR - SDK Features

The Scanbot SDK for Android provides a simple and convenient API (OcrEngine) to run Optical Character Recognition (OCR) on images. As a result, you get:

recognized text as plain text;
bounding boxes of all recognized paragraphs, lines and words;
text results and confidence values for each bounding box.

The Scanbot OCR feature is based on the Scanbot OCR engine created and polished by the Scanbot SDK team to provide the best text recognition speed and quality for our users.

For Tesseract engine users

The Scanbot OCR feature based on the Tesseract OCR engine is still available and can be enabled with OcrEngine#setOcrConfig(ocrConfig: OcrConfig) method, where OcrConfig is a config class with the following properties:

engineMode: EngineMode - the OCR engine mode, either EngineMode.SCANBOT_OCR or EngineMode.TESSERACT;
languages: Set<Language> - a set of languages to be used for OCR (needed only for EngineMode.TESSERACT mode);

For each desired language, a corresponding OCR training data file (.traineddata) must be provided. Furthermore, the special data file osd.traineddata is required (used for orientation and script detection). The Scanbot SDK package contains no language data files to keep the SDK small in size. You have to download and include the desired language files in your app.

Preconditions to achieve a good OCR result

Conditions while scanning

A perfect document for OCR is flat, straight, in the highest possible resolution and does not contain large shadows, folds, or any other objects that could distract the recognizer. Our UI and algorithms do their best to help you meet these requirements. But as in photography, you can never fully get the image information back that was lost during the shot.

Languages

The Scanbot OCR engine (EngineMode.SCANBOT_OCR) supports German and English languages that are integrated into the SDK and works without any additional modules out of the box.

For Tesseract engine users

For EngineMode.TESSERACT you can use multiple languages for OCR. But since the recognition of characters and words is a very complicated process, increasing the number of languages lowers the overall precision. With more languages, there are more results that the detected word could match. We suggest using as few languages as possible. Make sure that the language you are trying to detect is supported by the SDK and added to the project.

Size and position

Put the document on a flat surface. Take the photo from straight above in parallel to the document to make sure that perspective correction does not need to be applied much. The document should fill as much of the camera frame while still showing all of the text that needs to be recognized. This results in more pixels for each character that needs to be detected and hence, more detail. Skewed pages decrease the recognition quality.

Light and shadows

More ambient light is always better. The camera takes the shot at a lower ISO value, which results in less grainy photos. You should make sure that there are no visible shadows. If you have large shadows, it is better to take the shot at an angle instead. We also do not recommend using the flashlight - from this low distance it creates a light spot at the center of the document which decreases the recognition quality.

Focus

The document needs to be properly focused so that the characters are sharp and clear. The auto-focus of the camera works well if you meet the minimum required distance for the lens to be able to focus. This usually starts around 5-10cm.

Typefaces

The trained OCR ML model is optimized for common serif and sans-serif font types. Decorative or script fonts drastically decrease the quality of the recognition.

Implementing OCR

Step 1 - Add OCR Feature as Dependency

The OCR feature is provided in Scanbot SDK Package II (Data Capture Modules). You have to add the corresponding dependency for Package II io.scanbot:sdk-package-2 or higher in your build.gradle:

implementation("io.scanbot:sdk-package-2:$scanbotSdkVersion")
implementation("io.scanbot:sdk-common-ocr-assets:$scanbotSdkVersion") // <<-- please also add this dependency

Get the latest $scanbotSdkVersion from the Changelog.

Step 2 - Download and Provide OCR Language Files (ONLY FOR `EngineMode.TESSERACT`)

You can find a list of all supported OCR languages and download links on this Tesseract page.

caution

Please choose and download the proper version of the language data files:

For the latest version of Scanbot SDK 1.50.0 or newer -
LSTM Data Files for Version 4.00
For the older versions of Scanbot SDK <= 1.41.0 -
Data Files for Version 3.04/3.05

Download the files and place them in the assets sub-folder assets/ocr_blobs/ of your app.

Example:

assets/ocr_blobs/osd.traineddata // required special data file
assets/ocr_blobs/eng.traineddata // english language file
assets/ocr_blobs/deu.traineddata // german language file

Step 3 - Initialization

In order to initialize the Scanbot SDK call ScanbotSDKInitializer#initialise(context: Context) method. In your Application class:

Initialize SDK
loading...

See full example on GitHub

For Tesseract engine users

For EngineMode.TESSERACT please call ScanbotSDKInitializer#prepareOCRLanguagesBlobs(true) before the first usage of the OCR feature.

Then get an instance of the OcrEngine from ScanbotSDK. In your Activity or Service class:

Create OCR Engine
loading...

See full example on GitHub

For Tesseract engine users

For EngineMode.TESSERACT to achieve a better OCR result you can enable image binarization in OcrSettings:

Enable Binarization in OCR Settings
loading...

See full example on GitHub

Define the list of languages and set the engine mode to EngineMode.TESSERACT:

Engine Mode Tesseract
loading...

See full example on GitHub

Step 4 - Run OCR

On arbitrary images

You can run OCR on arbitrary image files (JPG or PNG) provided as file URIs:

Run OCR from images
loading...

See full example on GitHub

On `Document` object

You can use the corresponding methods to pass a Document object to the OCR engine

Run OCR from Document
loading...

See full example on GitHub

caution

The OcrEngine uses the document image (cropped image) of Page objects. Thus, make sure all Page objects contain document images.

OCR Results

The OCR result contains the recognized plain text as well as the bounding boxes and text results of recognized blocks (paragraphs), lines and words:

OCR Result Handling
loading...

See full example on GitHub

Example code for creating a PDF with OCR layer from a `Document`

Creating a PDF from a Document
loading...

See full example on GitHub

Example code for creating a PDF with OCR layer from images

Creating a PDF from images with OCR
loading...

See full example on GitHub

You can omit the PdfConfiguration parameter to use the default PDF settings. In this case PdfConfiguration.default() will be used. It has empty PdfAttributes, PageSize.CUSTOM as page size and PageDirection.AUTO as the default page orientation.

The details of PdfConfiguration can be found in API reference.

See the API references of the OcrResult class for more details.

Want to scan longer than one minute?

Generate a free trial license to test the Scanbot SDK thoroughly.

Get your free Trial License

Android OCR - SDK Features

Preconditions to achieve a good OCR result

Conditions while scanning

Languages

Size and position

Light and shadows

Focus

Typefaces

Implementing OCR

Step 1 - Add OCR Feature as Dependency

Step 2 - Download and Provide OCR Language Files (ONLY FOR `EngineMode.TESSERACT`)

Step 3 - Initialization

Step 4 - Run OCR

On arbitrary images

On `Document` object

OCR Results

Example code for creating a PDF with OCR layer from a `Document`

Example code for creating a PDF with OCR layer from images

Want to scan longer than one minute?

What do you think of this documentation?

On this page

Android OCR - SDK Features

Preconditions to achieve a good OCR result​

Conditions while scanning​

Languages​

Size and position​

Light and shadows​

Focus​

Typefaces​

Implementing OCR​

Step 1 - Add OCR Feature as Dependency​

Step 2 - Download and Provide OCR Language Files (ONLY FOR EngineMode.TESSERACT)​

Step 3 - Initialization​

Step 4 - Run OCR​

On arbitrary images​

On Document object​

OCR Results​

Example code for creating a PDF with OCR layer from a Document​

Example code for creating a PDF with OCR layer from images​

Want to scan longer than one minute?

What do you think of this documentation?

On this page

Preconditions to achieve a good OCR result

Conditions while scanning

Languages

Size and position

Light and shadows

Focus

Typefaces

Implementing OCR

Step 1 - Add OCR Feature as Dependency

Step 2 - Download and Provide OCR Language Files (ONLY FOR `EngineMode.TESSERACT`)

Step 3 - Initialization

Step 4 - Run OCR

On arbitrary images

On `Document` object

OCR Results

Example code for creating a PDF with OCR layer from a `Document`

Example code for creating a PDF with OCR layer from images