Data Capture

MRZ Scanner

The following example launches the MRZ (machine-readable zone) scanner. The scanned data is returned asynchronously.

MAUI/ReadyToUseUI.Maui/Pages/HomePage.DataDetectors.xaml.cs

Detect MRZ on a still image

The Scanbot SDK can also detect MRZ data on a still image. See the following MAUI example.

MAUI/ReadyToUseUI.Maui/Pages/HomePage.DetectOnImage.xaml.cs

EHIC Scanner

The Scanbot SDK detects and extracts data from European Health Insurance Cards.

The following example launches the EHIC Scanner. The scanned EHIC data is returned asynchronously.

The European Health Insurance Card Scanner is based on the OCR feature and therefore requires the OCR language files deu.traineddata and eng.traineddata (also known as blob files) to be installed correctly. For details on how to set up OCR language files, please refer to the OCR section.

MAUI/ReadyToUseUI.Maui/Pages/HomePage.DataDetectors.xaml.cs

Detect EHIC on a still image

The Scanbot SDK can also detect EHIC data on a still image. See the following MAUI example.

MAUI/ReadyToUseUI.Maui/Pages/HomePage.DetectOnImage.xaml.cs

Check Scanner

You can use the Check Recognizer UI to conveniently scan and extract data from checks.

Launches the Check Recognizer UI. The scanned check will be returned asynchronously.

MAUI/ReadyToUseUI.Maui/Pages/HomePage.DataDetectors.xaml.cs

Detect Check on a still image

The Scanbot SDK can also detect check data on a still image. See the following MAUI example.

MAUI/ReadyToUseUI.Maui/Pages/HomePage.DetectOnImage.xaml.cs

Optical Character Recognition

The Scanbot SDK provides simple and convenient APIs to run Optical Character Recognition (OCR) on images.

As a result, you can get:

  • a searchable PDF document with the recognized text layer (also known as a sandwiched PDF document)
  • recognized text as plain text
  • bounding boxes of all recognized paragraphs, lines and words
  • text results and confidence values for each bounding box
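The bounding boxes and confidence values listed above can be used to discard unreliable words before further processing. The following is a minimal sketch only: `OcrWord` and `OcrResultFilter` are hypothetical names introduced for illustration and are not the Scanbot SDK's result types, and the confidence scale (0 to 1) is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result model for illustration; these are NOT the Scanbot SDK types.
// Confidence is assumed to be in the range 0 to 1.
public sealed record OcrWord(string Text, double Confidence, int X, int Y, int Width, int Height);

public static class OcrResultFilter
{
    // Joins the words whose confidence is at or above the threshold into plain text,
    // keeping the original word order.
    public static string JoinConfidentWords(IEnumerable<OcrWord> words, double minConfidence)
    {
        return string.Join(" ", words
            .Where(w => w.Confidence >= minConfidence)
            .Select(w => w.Text));
    }
}
```

A suitable threshold depends on the documents being scanned, so it is best chosen by testing with real samples.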

The Scanbot OCR feature comes with two OCR engines: Legacy and ML. The Legacy engine is based on the Tesseract OCR engine with some modifications and enhancements. The ML (machine-learning-based) engine was added later. It is much faster and more accurate, but it only supports languages written in Latin script. We recommend using the ML engine whenever possible and the Legacy engine only if you need to recognize text in non-Latin scripts such as Arabic, Japanese, Chinese, Russian, Greek, or Korean.
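The recommendation above can be expressed as a small selection helper. This is a sketch under stated assumptions: `OcrEngineChoice` and `OcrEngineSelector` are hypothetical names, not SDK types, and the list of non-Latin language codes (Tesseract-style codes) is illustrative rather than exhaustive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical enum for illustration; the SDK's own engine setting may be named differently.
public enum OcrEngineChoice { Ml, Legacy }

public static class OcrEngineSelector
{
    // Tesseract language codes for the non-Latin scripts named in the text
    // (Arabic, Japanese, Chinese, Russian, Greek, Korean). Not exhaustive.
    private static readonly HashSet<string> NonLatinLanguages = new(StringComparer.OrdinalIgnoreCase)
    {
        "ara", "jpn", "chi_sim", "chi_tra", "rus", "ell", "kor"
    };

    // Picks the Legacy engine as soon as one requested language uses a non-Latin script,
    // otherwise the ML engine.
    public static OcrEngineChoice Choose(IEnumerable<string> languageCodes)
    {
        return languageCodes.Any(code => NonLatinLanguages.Contains(code))
            ? OcrEngineChoice.Legacy
            : OcrEngineChoice.Ml;
    }
}
```

With this sketch, a request for `eng` and `deu` selects the ML engine, while adding `rus` switches to the Legacy engine.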

When using the Legacy OCR engine, a corresponding OCR training data file (.traineddata) must be provided for each desired OCR language. Furthermore, the special data file osd.traineddata is required (it is used for orientation and script detection). To keep the SDK small, the Scanbot SDK package contains no language data files. You have to download the desired language files and include them in your app.

The newer ML engine does not require any language training data!

Preconditions to achieve a good OCR result

Conditions while scanning

A perfect document for OCR is flat, straight, in the highest possible resolution and does not contain large shadows, folds, or any other objects that could distract the recognizer. Our UI and algorithms do their best to help you meet these requirements. But as in photography, you can never fully get the image information back that was lost during the shot.

Languages

You can use multiple languages for OCR. However, since the recognition of characters and words is a very complicated process, increasing the number of languages lowers the overall precision: with more languages, there are more candidate words that a detected word could match. We suggest using as few languages as possible. Make sure that the language you are trying to detect is supported by the SDK and added to the project.

Size and position

Put the document on a flat surface. Take the photo from straight above, with the camera parallel to the document, so that little perspective correction is needed. The document should fill most of the camera frame while still showing all of the text that needs to be recognized. This results in more pixels for each character that needs to be detected and hence more detail. Skewed pages decrease the recognition quality.

Light and shadows

More ambient light is always better: the camera can then take the shot at a lower ISO value, which results in less grainy photos. Make sure that there are no visible shadows. If you cannot avoid large shadows, it is better to take the shot at an angle instead. We also do not recommend using the flashlight. At such a short distance, it creates a light spot at the center of the document, which decreases the recognition quality.

Focus

The document needs to be properly focused so that the characters are sharp and clear. The autofocus of the camera works well if you keep the minimum distance that the lens needs in order to focus. This is usually 5-10 cm.

Typefaces

The OCR trained data is optimized for common serif and sans-serif font types. Decorative or script fonts drastically decrease the quality of recognition.

Download and Provide OCR Language Files

You can find a list of all supported OCR languages and download links on this Tesseract page.

⚠️️️ Please choose and download the proper version of the language data files:

Download the desired language files as well as the osd.traineddata file and place them in the Assets sub-folder SBSDKLanguageData/ of your Android app or in the Resources sub-folder ScanbotSDKOCRData.bundle/ of your iOS app.

Assets/SBSDKLanguageData/eng.traineddata  // English language file
Assets/SBSDKLanguageData/deu.traineddata  // German language file
Assets/SBSDKLanguageData/osd.traineddata  // required special data file
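A missing language file is easy to overlook, so it can help to check the folder before building. The sketch below lists the files expected for a set of language codes, always including osd.traineddata, and reports those not found in a directory on disk. `OcrLanguageFiles` is a hypothetical helper written for this page, not part of the SDK, and it assumes the folder is reachable through the regular file system (for example, in the project tree on a development machine).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration; not part of the Scanbot SDK.
public static class OcrLanguageFiles
{
    // File names needed for the given language codes, plus the required osd.traineddata.
    public static IReadOnlyList<string> RequiredFileNames(IEnumerable<string> languageCodes)
    {
        return languageCodes
            .Select(code => code + ".traineddata")
            .Append("osd.traineddata")
            .Distinct()
            .ToList();
    }

    // Returns the required file names that do not exist in the given directory.
    public static IReadOnlyList<string> FindMissing(string directory, IEnumerable<string> languageCodes)
    {
        return RequiredFileNames(languageCodes)
            .Where(name => !File.Exists(Path.Combine(directory, name)))
            .ToList();
    }
}
```

For example, calling `FindMissing` with the path of the SBSDKLanguageData folder and the codes `eng` and `deu` returns an empty list when all three files shown above are present.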

OCR API

MAUI/ReadyToUseUI.Maui/Pages/ImageResultsPage.cs

VIN Scanner

You can use the VIN Scanner UI to conveniently scan and extract vehicle identification numbers.

MAUI/ReadyToUseUI.Maui/Pages/HomePage.DataDetectors.xaml.cs

Text Data Scanner

The Text Data Scanner recognizes text (OCR) within a user-defined rectangular area of interest in consecutive video frames. A customizable block lets you clean up the raw string by filtering out unwanted characters and OCR noise. Additionally, you can validate the result using pattern matching or another block.
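The clean-up and validation steps can be pictured as two small functions. This is only a sketch of the idea: `TextDataCleanup`, `Clean`, and `IsValid` are hypothetical names, the clean-up rule (keep letters and digits, convert to upper case) is an assumption chosen for the example, and how such functions are attached to the scanner is not shown here.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the Scanbot SDK.
public static class TextDataCleanup
{
    // Removes everything except letters and digits and converts the result to upper case.
    public static string Clean(string raw)
    {
        return new string(raw.Where(c => char.IsLetterOrDigit(c)).ToArray()).ToUpperInvariant();
    }

    // Returns true only if the entire cleaned string matches the given regular expression.
    public static bool IsValid(string cleaned, string pattern)
    {
        return Regex.IsMatch(cleaned, "^(?:" + pattern + ")$");
    }
}
```

With these rules, a raw reading such as " ab-12 34" is cleaned to "AB1234", which then passes validation against the pattern `[A-Z]{2}[0-9]{4}`.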

MAUI/ReadyToUseUI.Maui/Pages/HomePage.DataDetectors.xaml.cs

License Plate Scanner

The Scanbot SDK provides the ability to scan car license plates and parse data fields. Scanning is currently limited to common EU license plates (country code on blue background on the left side).

MAUI/ReadyToUseUI.Maui/Pages/HomePage.DataDetectors.xaml.cs

Generic Document Recognizer

With the Generic Document Recognizer, the Scanbot SDK can detect various types of documents in an image, crop them, and recognize the data in their fields.

Currently, the Generic Document Recognizer supports the following types of documents:

  • German ID Card
  • German Passport
  • German Driver's License
  • German Residence Permit

The Generic Document Recognizer is based on the OCR feature and thus requires the proper installation of the corresponding OCR language files (e.g. for English please add the file eng.traineddata). For more details on how to set up the OCR language files please refer to the OCR section.

MAUI/ReadyToUseUI.Maui/Pages/HomePage.DataDetectors.xaml.cs

Detect Generic Document on a still image

The Scanbot SDK can also detect Generic Document data on a still image. See the following MAUI example.

MAUI/ReadyToUseUI.Maui/Pages/HomePage.DetectOnImage.xaml.cs

Medical Certificate Scanner

The Scanbot SDK provides the ability to find and extract content from German Medical Certificates (MC / AU-Bescheinigung forms).

MAUI/ReadyToUseUI.Maui/Pages/HomePage.DataDetectors.xaml.cs

Detect Medical Certificate on a still image

The Scanbot SDK can also detect Medical Certificate data on a still image. See the following MAUI example.

MAUI/ReadyToUseUI.Maui/Pages/HomePage.DetectOnImage.xaml.cs
