Android OCR - SDK Features
The Scanbot SDK for Android provides a simple and convenient API (OpticalCharacterRecognizer) to run Optical Character Recognition (OCR) on images.
As a result, you get:
- a searchable PDF document with a recognized text layer (a.k.a. a sandwiched PDF document);
- recognized text as plain text;
- bounding boxes of all recognized paragraphs, lines and words;
- text results and confidence values for each bounding box.
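Conceptually, the result is a hierarchy of recognized blocks, each carrying its text, a confidence value, and a bounding box. The sketch below uses hypothetical, simplified stand-in types (the real SDK classes are OcrResult and OcrResultBlock) purely to illustrate how such results are typically consumed:

```kotlin
// Hypothetical, simplified stand-ins for the SDK's result types,
// used only to illustrate the shape of the data you get back.
data class Box(val left: Int, val top: Int, val right: Int, val bottom: Int)
data class RecognizedBlock(val text: String, val confidence: Double, val box: Box)

// Example: keep only words the engine is reasonably sure about.
fun filterConfident(words: List<RecognizedBlock>, threshold: Double = 0.8): List<RecognizedBlock> =
    words.filter { it.confidence >= threshold }

fun main() {
    val words = listOf(
        RecognizedBlock("Invoice", 0.97, Box(10, 10, 120, 40)),
        RecognizedBlock("t0tal", 0.42, Box(10, 50, 80, 80)),  // low-confidence noise
        RecognizedBlock("Total", 0.91, Box(10, 90, 80, 120))
    )
    println(filterConfident(words).map { it.text })  // [Invoice, Total]
}
```

Filtering on the per-block confidence like this is a common way to suppress recognition noise before further processing.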
The Scanbot OCR feature is based on the Scanbot OCR engine created and polished by the Scanbot SDK team to provide the best text recognition speed and quality for our users.
The Scanbot OCR feature based on the Tesseract OCR engine is still available and can be enabled with the OpticalCharacterRecognizer#setOcrConfig(ocrConfig: OcrConfig) method, where OcrConfig is a configuration class with the following properties:
- engineMode: EngineMode - the OCR engine mode, either EngineMode.SCANBOT_OCR or EngineMode.TESSERACT
- languages: Set<Language> - a set of languages to be used for OCR (needed only for EngineMode.TESSERACT mode)
For each desired language, a corresponding OCR training data file (.traineddata) must be provided. Furthermore, the special data file osd.traineddata is required (it is used for orientation and script detection).
The Scanbot SDK package contains no language data files to keep the SDK small in size. You have to download and include the desired language files in your app.
Preconditions to achieve a good OCR result
Conditions while scanning
A perfect document for OCR is flat, straight, in the highest possible resolution and does not contain large shadows, folds, or any other objects that could distract the recognizer. Our UI and algorithms do their best to help you meet these requirements. But as in photography, image information that was lost during the shot can never be fully recovered.
Languages
The Scanbot OCR engine supports the German and English languages, which are integrated into the SDK and work out of the box without any additional modules.
For EngineMode.TESSERACT you can use multiple languages for OCR. But since the recognition of characters and words is a very complicated process, increasing the number of languages lowers the overall precision: with more languages, there are more candidates that a detected word could match. We suggest using as few languages as possible. Make sure that the language you are trying to detect is supported by the SDK and added to the project.
Size and position
Put the document on a flat surface. Take the photo from straight above, parallel to the document, so that little perspective correction needs to be applied. The document should fill as much of the camera frame as possible while still showing all of the text that needs to be recognized. This yields more pixels, and hence more detail, for each character that needs to be detected. Skewed pages decrease the recognition quality.
Light and shadows
More ambient light is always better. The camera can then shoot at a lower ISO value, which results in less grainy photos. Make sure that there are no visible shadows; if you cannot avoid large shadows, it is better to take the shot at an angle. We also do not recommend using the flash: at such a short distance it creates a light spot at the center of the document, which decreases the recognition quality.
Focus
The document needs to be properly focused so that the characters are sharp and clear. The camera's auto-focus works well as long as you keep the minimum distance the lens needs to focus, which usually starts at around 5-10 cm.
Typefaces
The trained OCR ML model is optimized for common serif and sans-serif font types. Decorative or script fonts drastically decrease the quality of the recognition.
Implementing OCR
Step 1 - Add OCR Feature as Dependency
The OCR feature is provided in Scanbot SDK Package II (Data Capture Modules). You have to add the corresponding dependency for Package II (io.scanbot:sdk-package-2) or higher in your build.gradle:
implementation("io.scanbot:sdk-package-2:$scanbotSdkVersion")
implementation("io.scanbot:sdk-common-ocr-assets:$scanbotSdkVersion") // <<-- please also add this dependency
Get the latest $scanbotSdkVersion from the Changelog.
Step 2 - Download and Provide OCR Language Files (ONLY FOR EngineMode.TESSERACT)
You can find a list of all supported OCR languages and download links on this Tesseract page.
Please choose and download the proper version of the language data files:
- For the latest version of Scanbot SDK 1.50.0 or newer -
- For the older versions of Scanbot SDK <= 1.41.0 -
Download the files and place them in the assets sub-folder assets/ocr_blobs/ of your app.
Example:
assets/ocr_blobs/osd.traineddata // required special data file
assets/ocr_blobs/eng.traineddata // English language file
assets/ocr_blobs/deu.traineddata // German language file
Step 3 - Initialization
To initialize the Scanbot SDK, call the ScanbotSDKInitializer#initialize(context: Context) method.
In your Application class:
import io.scanbot.sdk.ScanbotSDKInitializer
...
ScanbotSDKInitializer()
...
.initialize(this)
For EngineMode.TESSERACT, please call ScanbotSDKInitializer#prepareOCRLanguagesBlobs(true) before the first use of the OCR feature.
Then get an instance of the OpticalCharacterRecognizer from ScanbotSDK.
In your Activity or Service class:
import io.scanbot.sdk.ScanbotSDK
import io.scanbot.sdk.ocr.OpticalCharacterRecognizer
...
val ocrRecognizer = ScanbotSDK(this).createOcrRecognizer()
For EngineMode.TESSERACT you can enable image binarization in OcrSettings to achieve a better OCR result:
ScanbotSDKInitializer()
.useOcrSettings(OcrSettings.Builder().binarizeImage(true).build())
...
.initialize(this)
Define the set of languages and set the engine mode to EngineMode.TESSERACT:
val ocrRecognizer = ScanbotSDK(this).createOcrRecognizer()
val languages = mutableSetOf<Language>()
languages.add(Language.ENG)
ocrRecognizer.setOcrConfig(
OcrConfig(
engineMode = EngineMode.TESSERACT,
languages = languages,
)
)
Step 4 - Run OCR
On arbitrary images
You can run OCR on arbitrary image files (JPG or PNG) provided as file URIs:
import io.scanbot.sdk.ocr.process.OcrResult
import io.scanbot.pdf.model.PdfConfig
...
val imageFileUris: List<Uri> = ... // ["file:///some/path/file1.jpg", "file:///some/path/file2.jpg", ...]
var result: OcrResult
// with PDF as result:
val pdfConfig = PdfConfig(
pdfAttributes = PdfAttributes(
author = "Your author",
creator = "Your creator",
title = "Your title",
subject = "Your subject",
keywords = "Your keywords"
),
pageSize = PageSize.CUSTOM,
pageDirection = PageDirection.AUTO
)
result = ocrRecognizer.recognizeTextWithPdfFromUris(imageFileUris, false, pdfConfig)
// without PDF:
result = ocrRecognizer.recognizeTextFromUris(imageFileUris, false)
As OpticalCharacterRecognizer#recognizeTextWithPdfFromUris() does not compress input images under the hood, the resulting PDF file might be quite large. Make sure to compress images before passing them to the OpticalCharacterRecognizer.
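A common way to keep the resulting PDF small is to downscale large images before running OCR. The helper below is only a sketch of the dimension arithmetic (it computes a target size whose longer side does not exceed a chosen limit, preserving the aspect ratio); the actual resizing on Android would be done with the Bitmap APIs, which are not shown here, and the limit of 2048 px is an assumed example value, not an SDK requirement.

```kotlin
import kotlin.math.max
import kotlin.math.roundToInt

// Compute target dimensions so that the longer side is at most maxSide,
// preserving the aspect ratio. Returns the original size if already small enough.
fun targetSize(width: Int, height: Int, maxSide: Int = 2048): Pair<Int, Int> {
    val longer = max(width, height)
    if (longer <= maxSide) return width to height
    val scale = maxSide.toDouble() / longer
    return (width * scale).roundToInt() to (height * scale).roundToInt()
}

fun main() {
    println(targetSize(4000, 3000))  // (2048, 1536)
    println(targetSize(1200, 800))   // (1200, 800) - already small enough
}
```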
You can omit the PdfConfig parameter to use the default PDF settings. In this case PdfConfig.defaultConfig() will be used: it has empty PdfAttributes, PageSize.CUSTOM as the page size and PageDirection.AUTO as the default page orientation.
PdfAttributes are used to set the PDF metadata. You can set the following attributes (all of which are optional):
- author
- creator
- title
- subject
- keywords
PageSize can be one of the following. For all fixed page sizes, the image is fitted and centered within the page:
- PageSize.LETTER - 8.5 x 11 inches
- PageSize.LEGAL - 8.5 x 14 inches
- PageSize.A3 - 297 x 420 mm
- PageSize.A4 - 210 x 297 mm
- PageSize.A5 - 148 x 210 mm
- PageSize.B4 - 250 x 353 mm
- PageSize.B5 - 176 x 250 mm
- PageSize.EXECUTIVE - 7.25 x 10.5 inches
- PageSize.US4x6 - 4 x 6 inches
- PageSize.US4x8 - 4 x 8 inches
- PageSize.US5x7 - 5 x 7 inches
- PageSize.COMM10 - 4.125 x 9.5 inches
- PageSize.CUSTOM - a custom page size; each page is as large as its image at 72 dpi
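The page-geometry arithmetic behind these sizes can be sketched as follows. PDF user space is measured in points (1 pt = 1/72 inch), so for PageSize.CUSTOM one image pixel maps to one point at 72 dpi; the fixed sizes are physical dimensions converted to points. This is only an illustration of the unit math, not SDK code:

```kotlin
// For PageSize.CUSTOM the page is as large as the image at 72 dpi,
// i.e. one pixel maps to one point (1 pt = 1/72 inch).
fun customPageSizePoints(widthPx: Int, heightPx: Int): Pair<Int, Int> = widthPx to heightPx

// For fixed sizes, convert the physical dimensions to points.
fun inchesToPoints(inches: Double): Double = inches * 72.0
fun mmToPoints(mm: Double): Double = mm / 25.4 * 72.0

fun main() {
    println(customPageSizePoints(1240, 1754))          // (1240, 1754)
    println(inchesToPoints(8.5))                       // 612.0 (LETTER width)
    println(String.format("%.1f", mmToPoints(210.0)))  // 595.3 (A4 width, approx.)
}
```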
PageDirection can be one of the following:
- PageDirection.AUTO - the page orientation is detected automatically; whether it is portrait or landscape depends on the image's aspect ratio
- PageDirection.PORTRAIT - the page orientation is set to portrait
- PageDirection.LANDSCAPE - the page orientation is set to landscape
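The aspect-ratio rule behind PageDirection.AUTO can be pictured as a simple comparison of width and height. This is an illustrative sketch of the concept, not the SDK's actual implementation:

```kotlin
enum class Orientation { PORTRAIT, LANDSCAPE }

// Illustrative only: choose the orientation that matches the image's
// aspect ratio, mirroring what PageDirection.AUTO does conceptually.
// Square images are treated as portrait here; the SDK's tie-breaking
// behavior is an assumption.
fun autoOrientation(widthPx: Int, heightPx: Int): Orientation =
    if (widthPx > heightPx) Orientation.LANDSCAPE else Orientation.PORTRAIT

fun main() {
    println(autoOrientation(3000, 2000))  // LANDSCAPE
    println(autoOrientation(2000, 3000))  // PORTRAIT
}
```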
On RTU UI Page objects
If you are using our RTU UI Components, you can use the corresponding methods to pass a list of RTU UI Page objects:
import io.scanbot.sdk.persistence.Page
import io.scanbot.pdf.model.PdfConfig
...
val pages: List<Page> = ... // e.g. snap some pages via RTU UI DocumentScannerActivity
var result: OcrResult
// with PDF as result:
val pdfConfig = PdfConfig.defaultConfig()
result = ocrRecognizer.recognizeTextWithPdfFromPages(pages, pdfConfig)
// without PDF:
result = ocrRecognizer.recognizeTextFromPages(pages)
The OpticalCharacterRecognizer uses the document image (cropped image) of a Page object. Thus, make sure all Page objects contain document images.
OCR Results
When running OCR with PDF, the result object contains the searchable PDF document with the recognized text layer (a.k.a. a sandwiched PDF document):
val pdfFile: File = result.sandwichedPdfDocumentFile
In all cases the OCR result also contains the recognized plain text as well as the bounding boxes and text results of recognized paragraphs, lines and words:
val text: String = result.recognizedText // recognized plain text
// bounding boxes and text results of recognized paragraphs, lines and words (as example for the first page):
val paragraphs: List<OcrResultBlock> = result.ocrPages[0].paragraphs
val lines: List<OcrResultBlock> = result.ocrPages[0].lines
val words: List<OcrResultBlock> = result.ocrPages[0].words
See the API reference of the OcrResult class for more details.
Want to scan longer than one minute?
Generate a free trial license to test the Scanbot SDK thoroughly.
Get your free Trial License.