Using Data Scanner | Android Document Scanner

The Scanbot SDK can perform text recognition directly on camera frames. Each scan yields a GenericTextRecognitionResult instance, which contains the raw text extracted from the frame and the symbol boxes.

Try our Data Scanner SDK App or follow the step-by-step integration instructions below.

Step 1 - Add Data Scanner SDK as a Dependency

Data Scanner SDK is available with the SDK Package 2. You have to add the following dependencies for it:

api "io.scanbot:sdk-package-2:$latestSdkVersion"
api "io.scanbot:sdk-generictext-assets:$latestSdkVersion"

It can be conveniently used in conjunction with ScanbotCameraView (e.g. live detection for preview).

Step 2 - Add prefetching of the required blobs to the SDK config

Add the OCR training data file (.traineddata) for the desired language to the assets. See Optical Character Recognition.

Add a call to the .prepareOCRLanguagesBlobs(true) method in ScanbotSDKInitializer.

override fun onCreate() {
    super.onCreate()

    ScanbotSDKInitializer()
            .license(this, licenseKey)
            // TODO: other configuration calls
            .prepareOCRLanguagesBlobs(true)
            .initialize(this)
}

Step 3 - Add ScanbotCameraView to layout

<io.scanbot.sdk.camera.ScanbotCameraView
    android:id="@+id/camera_view"
    android:layout_width="match_parent"
    android:layout_height="match_parent" />

Step 4 - Get GenericTextRecognizer instance from ScanbotSDK and attach it to ScanbotCameraView

val scanbotSdk = ScanbotSDK(this)
val textRecognizer = scanbotSdk.createGenericTextRecognizer()
val textRecognizerFrameHandler = GenericTextRecognizerFrameHandler.attach(cameraView, textRecognizer)

Step 5 - Set up the needed config for the Generic Text Recognizer instance

// will pass all the strings in the format "0123 123456"
textRecognizer.setValidator("#### ######")
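
To make the pattern format more concrete, the following is a minimal sketch of how a pattern such as "#### ######" could be checked in plain Kotlin. It assumes that '#' stands for exactly one ASCII digit and that every other character must match literally. matchesPattern is a hypothetical helper written for illustration; it is not part of the Scanbot SDK, and the SDK's actual matching rules may differ.

```kotlin
// Hypothetical helper, not an SDK API: checks a string against a simple pattern.
// Assumption: '#' matches exactly one ASCII digit, any other character matches itself.
fun matchesPattern(text: String, pattern: String): Boolean {
    // A full match requires the same number of characters.
    if (text.length != pattern.length) return false
    return pattern.indices.all { i ->
        when (val p = pattern[i]) {
            '#' -> text[i] in '0'..'9'
            else -> text[i] == p
        }
    }
}
```

Under these assumptions, matchesPattern("0123 123456", "#### ######") returns true, while "0123-123456" is rejected because the separator differs and "0123 12345" is rejected because it is one digit short.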

Step 6 - Add result handler for GenericTextRecognizerFrameHandler

textRecognizerFrameHandler.addResultHandler(object : GenericTextRecognizerFrameHandler.ResultHandler {
    override fun handle(result: FrameHandlerResult<GenericTextRecognitionResult, SdkLicenseError>): Boolean {
        if (result is FrameHandlerResult.Success && result.value.validationSuccessful) {
            // NOTE: 'handle' method runs in background thread - don't forget to switch to main before touching any Views
            runOnUiThread {
                proceedToResult(result.value.rawText)
            }
            return true
        }
        return false
    }
})

Step 7 - Improve the quality and performance of the recognition by setting custom cleaner and validation callbacks and changing options

// CUSTOM VALIDATION FUNCTION in addition to a pattern
// (textRecognizer is the GenericTextRecognizer instance created in Step 4):
textRecognizer.setValidator("######", object : GenericTextRecognizer.GenericTextValidationCallback {
    override fun validate(text: String): Boolean {
        return text.first() in listOf('1', '2') // TODO: add additional validation for the recognized text
    }
})

// CUSTOM CLEANER FUNCTION.
// If the string you intend on scanning is not clearly separated from other parts of the text
// then enable this setting. This will only work with 'pattern' variable from the validator:
textRecognizer.matchSubstringForPattern = true

// As an alternative it is possible to extract the valuable text from the raw scanned text manually
// using a Cleaner. An efficient implementation of this function might significantly improve the speed
// of scanning
textRecognizer.setCleaner(object : GenericTextRecognizer.CleanRecognitionResultCallback {
    override fun process(rawText: String): String {
        return extractValuableDataFromText(rawText)
    }
})

// Set needed supported languages (it is required to add needed blobs to assets)
textRecognizer.supportedLanguages = setOf(Language.ENG, Language.DEU)

// Set which symbols are supported by recognizer
textRecognizer.allowedSymbols = setOf('a', 'b', 'c')

// These parameters allow customizing the performance and quality of recognition. The default values mean that
// to return a result from the recognizer it is required that 2 of the 3 latest scanned frames contain
// the same recognized result
textRecognizer.minimumNumberOfRequiredFramesWithEqualRecognitionResult = 2 // (default is 2)
textRecognizer.maximumNumberOfAccumulatedFrames = 3 // (default is 3)
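
The "2 of the 3 latest frames" rule described above can be pictured as a small sliding-window vote. The sketch below is plain Kotlin and only illustrates the idea; FrameResultAccumulator is a hypothetical class, not the SDK's implementation, and its two constructor parameters merely mirror the two options with their default values.

```kotlin
// Illustrative sketch only, not the SDK's implementation.
// Keeps the most recent per-frame results and reports a text once it appears
// at least `minEqualResults` times within the last `maxFrames` frames.
class FrameResultAccumulator(
    private val minEqualResults: Int = 2, // mirrors minimumNumberOfRequiredFramesWithEqualRecognitionResult
    private val maxFrames: Int = 3        // mirrors maximumNumberOfAccumulatedFrames
) {
    private val recent = ArrayDeque<String>()

    // Adds the text recognized in one frame; returns the confirmed text or null.
    fun add(frameText: String): String? {
        recent.addLast(frameText)
        // Drop the oldest frame once the window is full.
        if (recent.size > maxFrames) recent.removeFirst()
        return recent.groupingBy { it }
            .eachCount()
            .entries
            .firstOrNull { it.value >= minEqualResults }
            ?.key
    }
}
```

With the defaults, feeding the frame texts "AB12", "A812", "AB12" in that order yields null for the first two calls and "AB12" for the third. In this model, raising the first value demands more agreement before a result is reported, and widening the window tolerates more misread frames in between.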