Cordova OCR - SDK Features
The Scanbot SDK plugin provides a simple and convenient API to run Optical Character Recognition (OCR) on images. The OCR feature is a part of the Scanbot SDK Data Capture Modules. As a result, you get:
- a searchable PDF document with the recognized text layer (a.k.a. sandwiched PDF document);
- recognized text as plain text;
- bounding boxes of all recognized paragraphs, lines and words;
- text results and confidence values for each bounding box.
The Scanbot OCR feature is based on the Scanbot OCR engine created and polished by the Scanbot SDK team to provide the best text recognition speed and quality for our users.
The Scanbot OCR feature based on the Tesseract OCR engine is still available and can be enabled by passing TESSERACT to the engineMode property of the arguments:
- engineMode: EngineMode - the OCR engine mode, either SCANBOT_OCR or TESSERACT;
- languages: Array<String> - a set of languages to be used for OCR (needed only for TESSERACT mode).
In TESSERACT mode, a corresponding OCR training data file (.traineddata) must be provided for each desired language.
Furthermore, the special data file osd.traineddata is required (used for orientation and script detection).
The Scanbot SDK package contains no language data files to keep the SDK small in size. You have to download and include the desired language files in your app.
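The engine switch above can be sketched as a small helper that assembles the languages and engine mode in the shape used by the performOcr example further down this page. This is a minimal sketch, assuming that shape; buildOcrArgs, OcrEngineArgs and the local OCREngineMode alias are hypothetical names for illustration, not part of the plugin API. It encodes the rule stated here: a language list is only required in TESSERACT mode.

```typescript
// Hypothetical helper: buildOcrArgs is NOT part of the plugin API.
// It only assembles `languages` and `options.engineMode` in the shape
// used by the performOcr example on this page.
type OCREngineMode = "SCANBOT_OCR" | "TESSERACT";

interface OcrEngineArgs {
  languages: string[];
  options: { engineMode: OCREngineMode };
}

function buildOcrArgs(engineMode: OCREngineMode, languages: string[] = []): OcrEngineArgs {
  // Languages (and their .traineddata files) are only needed in TESSERACT mode,
  // so reject an empty list there instead of failing later during OCR.
  if (engineMode === "TESSERACT" && languages.length === 0) {
    throw new Error('TESSERACT mode needs at least one language, e.g. ["en"]');
  }
  return { languages, options: { engineMode } };
}

const scanbotArgs = buildOcrArgs("SCANBOT_OCR");
const tesseractArgs = buildOcrArgs("TESSERACT", ["en", "de"]);
console.log(scanbotArgs.options.engineMode); // SCANBOT_OCR
console.log(tesseractArgs.languages.join(",")); // en,de
```

Validating early keeps the misconfiguration close to where the engine is chosen, rather than surfacing as an OCR error at runtime.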
Preconditions to achieve a good OCR result
Conditions while scanning
A perfect document for OCR is flat, straight, in the highest possible resolution and does not contain large shadows, folds, or any other objects that could distract the recognizer. Our UI and algorithms do their best to help you meet these requirements. But as in photography, image information that was lost during the shot can never be fully recovered.
Languages
You can use multiple languages for OCR. But since the recognition of characters and words is a very complicated process, increasing the number of languages lowers the overall precision. With more languages, there are more results where the detected word could match. We suggest using as few languages as possible. Make sure that the language you are trying to detect is supported by the SDK and added to the project.
The SCANBOT_OCR engine supports German and English, which are integrated into the SDK and work out of the box without any additional modules.
For TESSERACT, you can combine multiple languages, with the same precision trade-off described above: use as few languages as possible, and make sure that each language is supported by the SDK and its data file is added to the project.
Size and position
Put the document on a flat surface. Take the photo from directly above, parallel to the document, so that only little perspective correction needs to be applied. The document should fill most of the camera frame while still showing all of the text that needs to be recognized. This gives each character more pixels and hence more detail. Skewed pages decrease the recognition quality.
Light and shadows
More ambient light is always better: it lets the camera take the shot at a lower ISO value, which results in less grainy photos. Make sure that there are no visible shadows. If you have large shadows, it is better to take the shot at an angle instead. We also do not recommend using the flash: at such a short distance, it creates a bright spot at the center of the document, which decreases the recognition quality.
Focus
The document needs to be properly focused so that the characters are sharp and clear. The camera's autofocus works well as long as you keep at least the minimum distance the lens needs to focus, which is usually 5-10 cm.
Typefaces
The OCR training data is optimized for common serif and sans-serif font types. Decorative or script fonts drastically decrease the quality of the recognition.
OCR Languages and Data Files
The Tesseract OCR engine supports a wide variety of languages. For each desired language, a corresponding OCR training data file (.traineddata) must be provided.
Furthermore, the special data file osd.traineddata is required (used for orientation and script detection).
The Scanbot SDK plugin ships with no training data files by default to keep the plugin package small in size. You have to download and provide the desired language files in your app.
Download and Provide OCR Language Files
You can find a list of all supported OCR languages and download links on this Tesseract page.
Download
⚠️️️ Please choose and download the proper version of the language data files:
- For the latest version of Scanbot SDK Cordova Plugin 3.1.0 or newer -
- For the older versions of Scanbot SDK Cordova Plugin <= 3.0.1 -
Provide
Cordova
Option 1 - Provide the Language Files in the App Package:
Download the desired language files as well as the osd.traineddata file and make sure they will be packaged in your app as:
- for Android: as assets in the sub-folder ocr_blobs/
- for iOS: as resources in the sub-folder ScanbotSDKOCRData.bundle/
This can be done by defining the following simple mappings in the config.xml of your Cordova project:
<platform name="android">
<resource-file src="arbitrary-source-folder-of-your-project/osd.traineddata" target="app/src/main/assets/ocr_blobs/osd.traineddata" />
<resource-file src="arbitrary-source-folder-of-your-project/eng.traineddata" target="app/src/main/assets/ocr_blobs/eng.traineddata" />
...
</platform>
<platform name="ios">
<resource-file src="arbitrary-source-folder-of-your-project/ScanbotSDKOCRData.bundle" target="ScanbotSDKOCRData.bundle" />
<resource-file src="arbitrary-source-folder-of-your-project/osd.traineddata" target="ScanbotSDKOCRData.bundle/osd.traineddata" />
<resource-file src="arbitrary-source-folder-of-your-project/eng.traineddata" target="ScanbotSDKOCRData.bundle/eng.traineddata" />
...
</platform>
See the config.xml of our example app.
Option 2 - Provide the Language Files On-Demand:
Alternatively, to keep the app package small, you can download and provide the language files in your app at runtime.
Implement a suitable download mechanism for the desired language files plus the osd.traineddata file, and place them in the languageDataPath directory, which you can determine at runtime with the getOcrConfigs() method.
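A minimal sketch of the planning step for an on-demand download, assuming you already have languageDataPath from getOcrConfigs() and know the 3-letter Tesseract codes you need. planDownloads and DownloadTarget are hypothetical names, and the example path is a placeholder; the actual HTTP download and file write are left to your app. The sketch only computes which files must end up where, always adding osd.traineddata.

```typescript
// Hypothetical helper: planDownloads is NOT part of the plugin API.
// languageDataPath is the value returned by getOcrConfigs(); the actual
// download (HTTP request and file write) is left to your app.
interface DownloadTarget {
  fileName: string;
  targetUri: string;
}

function planDownloads(languageDataPath: string, tesseractCodes: string[]): DownloadTarget[] {
  // Every setup needs osd.traineddata in addition to the language files;
  // the Set removes duplicate file names while keeping insertion order.
  const fileNames = Array.from(
    new Set(tesseractCodes.map(code => `${code}.traineddata`).concat("osd.traineddata"))
  );
  // Join directory and file name without producing a double slash.
  const base = languageDataPath.endsWith("/") ? languageDataPath : `${languageDataPath}/`;
  return fileNames.map(fileName => ({ fileName, targetUri: base + fileName }));
}

// Placeholder path only; use the languageDataPath reported on the device.
for (const target of planDownloads("file:///example/ocr_blobs", ["eng", "deu"])) {
  console.log(target.targetUri);
}
// file:///example/ocr_blobs/eng.traineddata
// file:///example/ocr_blobs/deu.traineddata
// file:///example/ocr_blobs/osd.traineddata
```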
Capacitor
Provide the Language Files in the App Package:
Download the desired language files as well as the osd.traineddata file and make sure they will be packaged in your app as:
- for Android: as assets in the sub-folder ocr_blobs/
- for iOS: as resources in the sub-folder ScanbotSDKOCRData.bundle/
Language Codes
The Tesseract language data files are identified by a 3-letter language code. For example:
- eng - English
- deu - German
- etc.
The Scanbot SDK API uses a 2-letter ISO code:
- en - English
- de - German
- etc.
Example:
If you want to perform OCR with English and German, you have to download and install the following data files:
- eng.traineddata - language file for English
- deu.traineddata - language file for German
- osd.traineddata - special data file for orientation and script detection
Then, in the Scanbot SDK plugin, use languages: ["en", "de"].
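The mapping between the 2-letter codes used by the SDK API and the Tesseract data file names can be sketched as a lookup table. This is a minimal sketch: ISO_TO_TESSERACT is a hypothetical, deliberately partial table (en, de, fr, es, it), and trainedDataFilesFor is a hypothetical helper. Extend the table with the languages your app actually ships, using the codes from the Tesseract download page.

```typescript
// Hypothetical, partial mapping from the 2-letter codes used by the
// Scanbot SDK API to the 3-letter Tesseract data file codes.
const ISO_TO_TESSERACT: Record<string, string> = {
  en: "eng",
  de: "deu",
  fr: "fra",
  es: "spa",
  it: "ita",
};

// Hypothetical helper: returns the .traineddata files needed for the given
// API language codes, plus the always-required osd.traineddata.
function trainedDataFilesFor(isoCodes: string[]): string[] {
  const files = isoCodes.map(code => {
    const tesseractCode = ISO_TO_TESSERACT[code];
    if (tesseractCode === undefined) {
      throw new Error(`No Tesseract code mapped for "${code}"`);
    }
    return `${tesseractCode}.traineddata`;
  });
  return files.concat("osd.traineddata");
}

console.log(trainedDataFilesFor(["en", "de"]).join(", "));
// eng.traineddata, deu.traineddata, osd.traineddata
```

This reproduces the English/German example above: languages: ["en", "de"] in the API corresponds to eng.traineddata, deu.traineddata and osd.traineddata on disk.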
OCR API
Get OCR Configs
ScanbotSdk.getOcrConfigs()
Use this function to get the Scanbot SDK OCR properties of the current app installation.
Result Fields:
- languageDataPath - Contains the absolute file URI of the directory where the OCR training data files must be placed at runtime.
- installedLanguages - Returns an array of the currently installed OCR languages (e.g. ["en", "fr"]). The Scanbot SDK uses the languageDataPath directory to determine the currently installed OCR languages.
Perform OCR
ScanbotSdk.performOcr({ imageFileUris: string[], languages: string[], options?: { outputFormat?: OCROutputFormat, engineMode?: OCREngineMode } })
This function takes an array of images and performs Optical Character Recognition on each of the images. The recognized text can be returned as plain text, as a JSON object, or as a composed PDF file containing selectable and searchable text.
import ScanbotSdk, { Page } from 'cordova-plugin-scanbot-sdk';

export class OcrExample {
  private SDK = ScanbotSdk.promisify();
  // Pages produced by your document scanning flow
  public scannedPages: Page[] = [];

  public async runOcr() {
    // Always make sure you have a valid license at runtime via SDK.getLicenseInfo();
    // licenseCheckMethod() stands for your own check built on that call
    if (!licenseCheckMethod()) { return; }

    const result = await this.SDK.performOcr({
      imageFileUris: this.scannedPages.map(p => p.documentImageFileUri),
      languages: ['en'],
      options: {
        outputFormat: 'FULL_OCR_RESULT',
        engineMode: 'SCANBOT_OCR'
      }
    });

    // Use the result fields that match the requested output format:
    // result.pdfFileUri
    // result.jsonData
    // result.plainText
  }
}
Input args:
- imageFileUris: Input images as an array of file URIs in proper order (image element 1 => page 1, etc.).
- languages: An array with the OCR languages of the text to be recognized (e.g. ["en", "de"]). The number of languages has an impact on performance: the more languages, the slower the recognition process. The OCR operation will fail with an error if any of the specified languages are missing. Please use the getOcrConfigs function to make sure that the desired languages are installed.
- options.outputFormat: OCR output format enum value that specifies the result. See below.
- options.engineMode: The engine used for OCR. SCANBOT_OCR and TESSERACT are valid options.
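The advice to check installed languages with getOcrConfigs before calling performOcr can be sketched as a small wrapper. This is a minimal sketch under stated assumptions: OcrSdk, OcrConfigs, OcrArgs and ocrWithLanguageCheck are hypothetical types and names that mirror only the fields documented on this page, so the check can run against the promisified plugin or a test double. The wrapper targets TESSERACT mode, where language files must be installed. The real plugin's TypeScript types may differ.

```typescript
// Hypothetical structural types covering only the calls and fields used here.
interface OcrConfigs {
  languageDataPath: string;
  installedLanguages: string[];
}

interface OcrArgs {
  imageFileUris: string[];
  languages: string[];
  options?: { outputFormat?: string; engineMode?: string };
}

interface OcrSdk {
  getOcrConfigs(): Promise<OcrConfigs>;
  performOcr(args: OcrArgs): Promise<{ plainText?: string }>;
}

async function ocrWithLanguageCheck(
  sdk: OcrSdk,
  imageFileUris: string[],
  languages: string[]
): Promise<{ plainText?: string }> {
  const configs = await sdk.getOcrConfigs();
  const missing = languages.filter(lang => configs.installedLanguages.indexOf(lang) === -1);
  if (missing.length > 0) {
    // Fail early with a clear message instead of an error from the OCR run.
    throw new Error(`Missing OCR languages: ${missing.join(", ")}`);
  }
  return sdk.performOcr({
    imageFileUris,
    languages,
    options: { outputFormat: "PLAIN_TEXT", engineMode: "TESSERACT" },
  });
}

// Test double standing in for the plugin; placeholder URIs only.
const mockSdk: OcrSdk = {
  getOcrConfigs: async () => ({ languageDataPath: "file:///example/ocr_blobs/", installedLanguages: ["en"] }),
  performOcr: async args => ({ plainText: `recognized ${args.imageFileUris.length} page(s)` }),
};

ocrWithLanguageCheck(mockSdk, ["file:///example/page1.jpg"], ["en"])
  .then(result => console.log(result.plainText)); // recognized 1 page(s)
```

Injecting the SDK as a parameter keeps the language check testable without a device; in an app you would pass the promisified plugin instance instead of the mock.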
Result Fields:
- plainText: Contains the recognized plain text.
- pdfFileUri: File URI of the composed PDF file ('file:///...').
- jsonData: Structured JSON object of the OCR result. See below.
Supported OCR Output Formats
- PLAIN_TEXT: Returns the recognized text as plain text only.
- RESULT_JSON: Returns the OCR result as a JSON object.
- PDF_FILE: Creates a composed PDF file containing selectable and searchable text.
- FULL_OCR_RESULT: Full result: composed PDF file and OCR result as a JSON object.
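Because different output formats fill different result fields, reading the recognized text can be sketched as a single helper. This is a minimal sketch, assuming the fields work as the format descriptions here suggest; OcrResult and recognizedText are hypothetical names, and the jsonData shape is reduced to the per-page text shown in the JSON example on this page. The plugin may populate more fields than this sketch expects.

```typescript
// Hypothetical helper: recognizedText is NOT part of the plugin API.
// OcrResult lists only the result fields documented on this page.
interface OcrResult {
  plainText?: string;
  pdfFileUri?: string;
  jsonData?: { pages: { text: string }[] };
}

function recognizedText(result: OcrResult): string | undefined {
  // PLAIN_TEXT: the text is returned directly.
  if (result.plainText !== undefined) {
    return result.plainText;
  }
  // RESULT_JSON / FULL_OCR_RESULT: rebuild the text from the per-page text.
  // In the JSON example each page text already ends with "\n".
  if (result.jsonData !== undefined) {
    return result.jsonData.pages.map(page => page.text).join("");
  }
  // PDF_FILE: the text only exists inside the composed PDF.
  return undefined;
}

console.log(recognizedText({ jsonData: { pages: [{ text: "Ut enim\n" }, { text: "consequat\n" }] } }));
```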
OCR JSON Result:
The OCR JSON result is a structured object containing the pages, paragraphs, lines, and words as bounding boxes. Each bounding box contains the coordinates, the extracted plain text, and the confidence value.
Example:
{
"pages":[
{
"text":"Ut enim ad minim veniam, quis nostrud exercitation\nullamco laboris nisi ut aliquip ex ea commodo\nconsequat\n",
"words":[
{
"boundingBox":{
"y":0.46182008368200839,
"x":0.011823899371069183,
"width":0.040000000000000001,
"height":0.070606694560669453
},
"text":"Ut",
"confidence":96.546516418457031
},
{
"boundingBox":{
"y":0.46443514644351463,
"x":0.064402515723270437,
"width":0.091320754716981131,
"height":0.06903765690376569
},
"text":"enim",
"confidence":95.442947387695312
},
...
{
"boundingBox":{
"y":0.68462343096234313,
"x":0.010817610062893081,
"width":0.22213836477987423,
"height":0.13702928870292888
},
"text":"consequat",
"confidence":23.432151794433594
}
],
"lines":[
{
"boundingBox":{
"y":0.41684100418410042,
"x":0.011823899371069183,
"width":0.98037735849056606,
"height":0.14225941422594143
},
"text":"Ut enim ad minim veniam, quis nostrud exercitation\n",
"confidence":95.124320983886719
},
{
"boundingBox":{
"y":0.57112970711297073,
"x":0.011572327044025157,
"width":0.86792452830188682,
"height":0.1192468619246862
},
"text":"ullamco laboris nisi ut aliquip ex ea commodo\n",
"confidence":95.142372131347656
},
{
"boundingBox":{
"y":0.68462343096234313,
"x":0.010817610062893081,
"width":0.22213836477987423,
"height":0.13702928870292888
},
"text":"consequat\n",
"confidence":23.432151794433594
}
],
"paragraphs":[
{
"boundingBox":{
"y":0.41684100418410042,
"x":0.011320754716981131,
"width":0.98088050314465414,
"height":0.34884937238493724
},
"text":"Ut enim ad minim veniam, quis nostrud exercitation\nullamco laboris nisi ut aliquip ex ea commodo\nconsequat\n",
"confidence":90.915626525878906
}
]
}
...
]
}
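The JSON structure above can be sketched as TypeScript interfaces together with two small helpers. This is a minimal sketch: the field names come from the example, while the interface names, lowConfidenceWords and toPixels are hypothetical. Two assumptions are worth stating loudly: confidence appears to be on a 0-100 scale, and the bounding box coordinates appear to be normalized to the image size (values between 0 and 1). Verify both against your own results before relying on them.

```typescript
// Interfaces mirroring the fields in the JSON example; names are hypothetical.
interface BoundingBox { x: number; y: number; width: number; height: number; }

interface OcrElement { boundingBox: BoundingBox; text: string; confidence: number; }

interface OcrPage {
  text: string;
  words: OcrElement[];
  lines: OcrElement[];
  paragraphs: OcrElement[];
}

interface OcrJsonResult { pages: OcrPage[]; }

// Collects words below the threshold (assumes confidence on a 0-100 scale).
function lowConfidenceWords(result: OcrJsonResult, threshold: number): string[] {
  const out: string[] = [];
  for (const page of result.pages) {
    for (const word of page.words) {
      if (word.confidence < threshold) out.push(word.text);
    }
  }
  return out;
}

// Assumes coordinates are normalized to [0, 1] relative to the image size.
function toPixels(box: BoundingBox, imageWidth: number, imageHeight: number) {
  return {
    left: Math.round(box.x * imageWidth),
    top: Math.round(box.y * imageHeight),
    width: Math.round(box.width * imageWidth),
    height: Math.round(box.height * imageHeight),
  };
}

// Illustrative data: confidences taken from the example, one shared box.
const box: BoundingBox = { x: 0.5, y: 0.25, width: 0.1, height: 0.2 };
const sample: OcrJsonResult = {
  pages: [{
    text: "Ut enim\nconsequat\n",
    words: [
      { boundingBox: box, text: "Ut", confidence: 96.55 },
      { boundingBox: box, text: "enim", confidence: 95.44 },
      { boundingBox: box, text: "consequat", confidence: 23.43 },
    ],
    lines: [],
    paragraphs: [],
  }],
};

console.log(lowConfidenceWords(sample, 50).join(", ")); // consequat
const px = toPixels(box, 1000, 800);
console.log(px.left, px.top, px.width, px.height); // 500 200 100 160
```

Flagging low-confidence words, such as "consequat" in the example, is one way to decide which parts of a page might need a rescan or manual review.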