Generic Document Scanner UI Components | Android Document Scanner

Introduction

The Scanbot SDK provides the ability to detect various types of documents in an image, crop them and recognize data fields via the Generic Document Recognizer.

Currently, the Generic Document Recognizer supports the following types of documents:

  • German ID Card
  • German Passport
  • German Driver's license
  • German Residence permit

The Generic Document scanner is available both as an RTU UI and a classic component (types of components are explained here).

Integration

Take a look at our Example Apps to see how to integrate the Generic Document Scanner.

Add Feature as a Dependency

GenericDocumentRecognizer is available with SDK Package 3. You have to add the following dependencies for it:

implementation("io.scanbot:sdk-package-3:$latestSdkVersion")
implementation("io.scanbot:sdk-genericdocument-assets:$latestSdkVersion")
caution

Please do not run multiple scanners at the same time. For example, do not combine the generic document scanner, health insurance scanner, text data scanner, etc. Each scanner instance requires a lot of memory, GPU, and processor resources, so running several of them will lead to performance issues for the entire application.

Initialize the SDK

The Generic Document Recognizer is based on the OCR feature of the Scanbot SDK. Please check the Optical Character Recognition docs for more details.

In order to use the Generic Document Recognizer you need to prepare the German and English OCR language files. Place the deu.traineddata and eng.traineddata files in the assets sub-folder assets/ocr_blobs/ of your app.
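A missing language file is easy to overlook until recognition fails at runtime. As a minimal sketch, the check below reports which of the two required files are absent from a listing of asset names. The names `REQUIRED_OCR_BLOBS` and `missingOcrBlobs` are hypothetical and not part of the Scanbot SDK; on Android the input could come from `assets.list("ocr_blobs")`.

```kotlin
// Language files the Generic Document Recognizer needs, per the text above.
val REQUIRED_OCR_BLOBS = listOf("deu.traineddata", "eng.traineddata")

// Hypothetical helper (not an SDK API): returns the required files that are
// absent from the given asset listing, in the order they are required.
fun missingOcrBlobs(availableAssets: Collection<String>): List<String> =
    REQUIRED_OCR_BLOBS.filter { it !in availableAssets }
```

An empty result means both files are in place; anything else can be logged or surfaced during development before the SDK is initialized.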

Add a call to .prepareOCRLanguagesBlobs(true) method for ScanbotSDKInitializer in your Application class:

override fun onCreate() {
    super.onCreate()

    ScanbotSDKInitializer()
        .license(this, licenseKey)
        // TODO: other configuration calls
        .prepareOCRLanguagesBlobs(true)
        .initialize(this)
}
caution

Unfortunately, we have noticed that all devices using a Cortex A53 processor DO NOT SUPPORT GPU acceleration. If you encounter any problems, please disable GPU acceleration for these devices.

ScanbotSDKInitializer()
.allowGpuAcceleration(false)
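The SDK snippet above does not show how to tell whether a device uses a Cortex A53 processor. One possible approach, assuming that reading /proc/cpuinfo is acceptable in your app, is to look for the ARM part number 0xd03, which identifies Cortex-A53 cores. The helper names below are hypothetical and not part of the Scanbot SDK, and the sketch treats any A53 core as a match, including on devices that pair A53 cores with other core types.

```kotlin
// ARM "CPU part" value reported in /proc/cpuinfo for Cortex-A53 cores.
const val CORTEX_A53_PART = "0xd03"

// Hypothetical helper: true if any "CPU part" line in the given
// /proc/cpuinfo text names a Cortex-A53 core.
fun hasCortexA53(cpuInfo: String): Boolean =
    cpuInfo.lineSequence()
        .map { it.trim() }
        .filter { it.startsWith("CPU part", ignoreCase = true) }
        .map { it.substringAfter(':').trim() }
        .any { it.equals(CORTEX_A53_PART, ignoreCase = true) }

// Hypothetical policy: allow GPU acceleration only when no A53 core is found.
fun shouldAllowGpuAcceleration(cpuInfo: String): Boolean = !hasCortexA53(cpuInfo)
```

The result could be passed to `.allowGpuAcceleration(...)`, with the input read via `File("/proc/cpuinfo").readText()`. Whether this heuristic is strict enough for your device fleet is something to verify on real hardware.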

Ready-To-Use UI Component

GenericDocumentRecognizerActivity is the Ready-To-Use UI component (activity) responsible for scanning the documents supported by the Generic Document Recognizer.

Have a look at our end-to-end working example of the RTU components usage here.

Starting and configuring RTU Generic Document scanner

First of all, you have to add the SDK package and feature dependencies as described here.

Initialize the SDK as described here. More information about the SDK license initialization can be found here.

To use any of the RTU UI components you need to include the corresponding dependency in your build.gradle file:

implementation("io.scanbot:sdk-package-ui:$scanbotSdkVersion")

Get the latest $scanbotSdkVersion from the Changelog.

To start the RTU Generic Document scanner you only have to start a new activity and be ready to process its result later.

info

Starting from version 1.90.0, the SDK RTU components contain predefined AndroidX Result API contracts. They handle part of the boilerplate for starting the RTU activity component and mapping the result once it finishes.

If your code still relies on Android's deprecated startActivityForResult API, check the other approach we offer for this case.

val genericDocumentResult: ActivityResultLauncher<GenericDocumentRecognizerConfiguration>

...

genericDocumentResult =
activity.registerForActivityResultOk(GenericDocumentRecognizerActivity.ResultContract()) { result ->
val resultWrappers = result.result!!
val firstWrapper = resultWrappers.first()
val document = scanbotSDK.resultRepositoryForClass(firstWrapper.clazz).getResultAndErase(firstWrapper.resultId)

Toast.makeText(
activity,
document?.fields?.map { "${it.type.name} = ${it.value?.text}" }.toString(),
Toast.LENGTH_LONG
).show()
}

...

myButton.setOnClickListener {
val genericDocumentConfiguration = GenericDocumentRecognizerConfiguration()
genericDocumentResult.launch(genericDocumentConfiguration)
}
info

We offer some syntactic sugar for handling the result from RTU components via AndroidX Result API:

  • every RTU component's activity contains a Result class which, in turn, along with the resultCode value exposes a Boolean resultOk property. This will be true if resultCode equals Activity.RESULT_OK;

  • when you only expect the Activity.RESULT_OK result code, you can use the AppCompatActivity.registerForActivityResultOk extension method instead of registerForActivityResult. Its callback will be triggered only when a non-null result entity is present.
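The two conveniences above can be modelled in plain Kotlin to make their behaviour concrete. This is only an illustration, not the SDK's implementation: `RtuResult` and `onlyOk` are hypothetical names, and the constants mirror the values of `Activity.RESULT_OK` and `Activity.RESULT_CANCELED`.

```kotlin
const val RESULT_OK = -1       // same value as android.app.Activity.RESULT_OK
const val RESULT_CANCELED = 0  // same value as android.app.Activity.RESULT_CANCELED

// Hypothetical stand-in for an RTU component's Result class.
data class RtuResult<T>(val resultCode: Int, val result: T?) {
    // True only when the activity finished with RESULT_OK.
    val resultOk: Boolean get() = resultCode == RESULT_OK
}

// Hypothetical wrapper: the returned callback forwards the payload to onOk
// only for RESULT_OK with a non-null result, and ignores everything else.
fun <T : Any> onlyOk(onOk: (T) -> Unit): (RtuResult<T>) -> Unit = { r ->
    val payload = r.result
    if (r.resultOk && payload != null) onOk(payload)
}
```

With a wrapper like this, the calling code does not need its own branch for cancelled or empty results.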

caution

Always use the corresponding activity's static newIntent method to create the intent when starting the RTU UI activity via the deprecated startActivityForResult approach. Creating an android.content.Intent object using its constructor (passing the activity's class as a parameter) will cause the RTU UI component to malfunction.

An instance of GenericDocumentRecognizerConfiguration is required for starting the RTU UI activity. It allows configuration changes through methods it exposes:

val genericDocumentConfiguration = GenericDocumentRecognizerConfiguration()

// Apply the color configuration
genericDocumentConfiguration.setTopBarButtonsInactiveColor(ContextCompat.getColor(context, R.color.white))
genericDocumentConfiguration.setTopBarBackgroundColor(ContextCompat.getColor(context, R.color.colorPrimaryDark))
...

// Apply the text configuration
genericDocumentConfiguration.setClearButtonTitle(context.getString(R.string.clear_button))
genericDocumentConfiguration.setSubmitButtonTitle(context.getString(R.string.submit_button))
...

// Apply the parameters for fields
genericDocumentConfiguration.setFieldsDisplayConfiguration(
hashMapOf(
// Use constants from NormalizedFieldNames objects from the corresponding document type
DePassport.NormalizedFieldNames.PHOTO to FieldProperties(
"My passport photo",
FieldProperties.DisplayState.AlwaysVisible
),
MRZ.NormalizedFieldNames.CHECK_DIGIT to FieldProperties(
"Check digit",
FieldProperties.DisplayState.AlwaysVisible
)
...
)
)

Excluding fields from scanning in RTU UI

It is also possible to exclude certain fields from the scanning process altogether. The recognizer will not even attempt to recognize excluded fields, which is useful for security and/or privacy reasons. All other fields will be scanned as usual. Fields must be specified ONLY by their normalized field names.

// Exclude some document fields from being recognized
genericDocumentConfiguration.setExcludedFieldTypes(hashSetOf(
DeIdCardFront.NormalizedFieldNames.PHOTO,
DeIdCardFront.NormalizedFieldNames.PIN,
DePassport.NormalizedFieldNames.PHOTO,
DePassport.NormalizedFieldNames.SIGNATURE,
DeIdCardBack.NormalizedFieldNames.EYE_COLOR
))
info

All parameters in GenericDocumentRecognizerConfiguration are optional.

Full API references for these methods can be found on this page.

Handling the result

Below is a simple example of handling the result: List<ResultWrapper<GenericDocument>>.

For simplicity, we will take only the first document:

val firstResultWrapper = resultWrappers.first()

Get the ResultRepository from the ScanbotSDK instance. scanbotSDK was created in onCreate via ScanbotSDK(context).

val resultRepository = scanbotSDK.resultRepositoryForClass(firstResultWrapper.clazz)

Retrieve an instance of the GenericDocument class from the repository. This call will also remove the result from the repository (to optimize memory usage):

val genericDocument = resultRepository.getResultAndErase(firstResultWrapper.resultId)

If you do not need to remove the result, use resultRepository.getResult(firstResultWrapper.resultId). Please note that this repository does not use persistent storage and is based on an LRU cache. The repository will be cleared if the app process is terminated or the memory consumption is too high.
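To illustrate the difference between reading and erasing a result, and why an LRU-backed store can drop entries on its own, here is a small in-memory sketch. It is not the SDK's ResultRepository: the class name, the `put` method and the entry limit are assumptions made for the example.

```kotlin
import java.util.LinkedHashMap

// Hypothetical in-memory repository with least-recently-used eviction.
class InMemoryResultRepository<T>(private val maxEntries: Int) {
    // accessOrder = true keeps entries ordered from least to most recently used.
    private val cache = object : LinkedHashMap<String, T>(16, 0.75f, true) {
        override fun removeEldestEntry(eldest: MutableMap.MutableEntry<String, T>?): Boolean =
            size > maxEntries
    }

    fun put(id: String, value: T) { cache[id] = value }

    // Returns the result and keeps it stored (also marks it as recently used).
    fun getResult(id: String): T? = cache[id]

    // Returns the result and removes it, freeing the memory right away.
    fun getResultAndErase(id: String): T? = cache.remove(id)
}
```

Because nothing is persisted, a result that has been evicted or erased comes back as null, so callers should be prepared for a missing result.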

Now you can simply show the detected document fields in a Toast notification:

Toast.makeText(
activity,
genericDocument?.fields?.joinToString("\n") { "${it.type.name} = ${it.value?.text}" } ?: "",
Toast.LENGTH_LONG
).show()

It is also possible to use the subclasses of GenericDocumentWrapper to work with strongly typed objects and conveniently access the fields of the corresponding document.

To obtain an instance of the scanned document wrapper, use GenericDocumentLibrary or the wrap() extension function as follows:

// Alternatively, use GenericDocumentLibrary.wrapperFromGenericDocument(genericDocument)
when (val wrapper = genericDocument.wrap()) {
is DeIdCardFront -> {
val id = wrapper.id
val name = wrapper.givenNames
val surname = wrapper.surname
val pin = wrapper.pin
}
is DeDriverLicenseFront -> {
val id = wrapper.id
val name = wrapper.givenNames
val surname = wrapper.surname
val categories = wrapper.licenseCategories
}
is DeResidencePermitFront -> {
val id = wrapper.id
val name = wrapper.givenNames
val surname = wrapper.surname
val pin = wrapper.pin
}
else -> {
// Handle other document types
}
}

Full API references for the result wrapper class are available here and for GenericDocument class here.

Classic component

To integrate the classic component of the Generic Document Scanner you can take a look at our Generic Document Recognizer Example For Live Detection, Generic Document Recognizer Example For Auto Snapping or check the following step-by-step integration instructions.

GenericDocumentRecognizer can be used both in conjunction with ScanbotCameraXView (e.g. live detection for preview) and by itself for detection on a Bitmap or JPEG byte array. Let's have a look at an example with ScanbotCameraXView.

Add feature dependencies and initialize the SDK

First of all, you have to add the SDK package and feature dependencies as described here.

Initialize the SDK as described here. More information about the SDK license initialization can be found here.

Add ScanbotCameraXView to layout

<io.scanbot.sdk.ui.camera.ScanbotCameraXView
android:id="@+id/camera_view"
android:layout_width="match_parent"
android:layout_height="match_parent" />

Get GenericDocumentRecognizer instance from ScanbotSDK, set the required document types and blurriness acceptance score, then attach it to ScanbotCameraXView

val scanbotSdk = ScanbotSDK(this)

// Please note that each call to this method will create a new instance of GenericDocumentRecognizer
// It should be used on a "single instance per screen" basis
val genericDocumentRecognizer = scanbotSdk.createGenericDocumentRecognizer()
genericDocumentRecognizer.acceptedSharpnessScore = 80f

// Uncomment to scan only ID cards and passports
// genericDocumentRecognizer.acceptedDocumentTypes = listOf(
// RootDocumentType.DePassport,
// RootDocumentType.DeIdCardFront,
// RootDocumentType.DeIdCardBack
// )

// Uncomment to scan only Driver's licenses
// genericDocumentRecognizer.acceptedDocumentTypes = listOf(
// RootDocumentType.DeDriverLicenseFront,
// RootDocumentType.DeDriverLicenseBack
// )

// Uncomment to scan only Residence permit cards
// genericDocumentRecognizer.acceptedDocumentTypes = listOf(
// RootDocumentType.DeResidencePermitFront,
// RootDocumentType.DeResidencePermitBack
// )

// To scan all the supported document types (default value)
genericDocumentRecognizer.acceptedDocumentTypes = RootDocumentType.ALL_TYPES

val frameHandler = GenericDocumentRecognizerFrameHandler.attach(cameraView, genericDocumentRecognizer)

Excluding fields from scanning for genericDocumentRecognizer

It is also possible to exclude certain fields from the scanning process altogether. The recognizer will not even attempt to recognize excluded fields, which is useful for security and/or privacy reasons. All other fields will be scanned as usual. Fields must be specified ONLY by their normalized field names.

// Exclude some document fields from being recognized
genericDocumentRecognizer.excludedFieldTypes = setOf(
DeIdCardFront.NormalizedFieldNames.PHOTO,
DeIdCardFront.NormalizedFieldNames.PIN,
DePassport.NormalizedFieldNames.PHOTO,
DePassport.NormalizedFieldNames.SIGNATURE,
DeIdCardBack.NormalizedFieldNames.EYE_COLOR)
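Conceptually, exclusion acts as a filter applied before recognition, so excluded fields never produce a value. The following sketch models that idea in plain Kotlin. The `CandidateField` type and the `fieldsToRecognize` function are hypothetical and say nothing about how the SDK implements exclusion internally; any name strings passed in are placeholders, not real normalized field names.

```kotlin
// Hypothetical stand-in for a field the recognizer could process (not an SDK type).
data class CandidateField(val normalizedName: String, val text: String)

// Drops every candidate whose normalized name is excluded, so it never
// reaches the (simulated) recognition step. All other fields pass through.
fun fieldsToRecognize(
    candidates: List<CandidateField>,
    excluded: Set<String>
): List<CandidateField> =
    candidates.filter { it.normalizedName !in excluded }
```

Matching on the normalized name, rather than on a display label, is what makes the exclusion set unambiguous across document types.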

Add a result handler for GenericDocumentRecognizerFrameHandler

Add a result handler which, for example, counts consecutive successful recognition statuses and shows a toast notification on each further success once two such statuses in a row have already been received.

frameHandler.addResultHandler(object : GenericDocumentRecognizerFrameHandler.ResultHandler {
    private var successCounter = 0

    override fun handle(result: FrameHandlerResult<GenericDocumentRecognitionResult, SdkLicenseError>): Boolean {
        val isSuccess = result is FrameHandlerResult.Success
        when {
            isSuccess && successCounter >= 2 -> {
                // NOTE: 'handle' method runs in background thread
                // - don't forget to switch to main before touching any Views
                runOnUiThread {
                    Toast.makeText(
                        this@MainActivity,
                        "Document found!\nYou can now snap picture.",
                        Toast.LENGTH_SHORT
                    ).show()
                }
            }
            isSuccess -> successCounter++
            else -> successCounter = 0
        }
        return false
    }
})

Method handle(result: FrameHandlerResult<GenericDocumentRecognitionResult, SdkLicenseError>) will be triggered every time GenericDocumentRecognizer detects a document in the camera preview frame or if a license error has occurred.
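The counting logic of such a handler can be kept separate from Android and SDK types, which makes it easy to unit test. Below is a minimal sketch under that assumption. `ConsecutiveSuccessTracker` and its `threshold` parameter are hypothetical names, and the behaviour mirrors the handler example: a notification is due on every success that follows at least `threshold` consecutive successes.

```kotlin
// Hypothetical helper that mirrors the successCounter logic of the handler.
class ConsecutiveSuccessTracker(private val threshold: Int = 2) {
    private var successCounter = 0

    // Feeds one frame outcome; returns true when the user should be notified.
    fun onFrame(isSuccess: Boolean): Boolean = when {
        isSuccess && successCounter >= threshold -> true
        isSuccess -> {
            successCounter++
            false
        }
        else -> {
            // Any failed frame breaks the streak.
            successCounter = 0
            false
        }
    }
}
```

Inside `handle`, the tracker would be fed `result is FrameHandlerResult.Success`, and the toast shown on the main thread whenever `onFrame` returns true.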

If the result of the scanning was successful, the user gets the GenericDocumentRecognitionResult object which contains a cropped document image and a GenericDocument object. Each field is represented by the Field class, holding the field's type, cropped visual source, recognized text and confidence level value.

You can now run your app and should see a simple camera preview that can scan your documents.

Pass snapped picture to GenericDocumentRecognizer, process results

First, decode the image ByteArray obtained from the camera's callback, taking into account the image orientation. Our ImageProcessor component can be used for this:

val resultBitmap = ImageProcessor(image).rotate(imageOrientation).processedBitmap()

Next, we perform a recognition:

val recognitionResult = genericDocumentRecognizer.scanBitmap(resultBitmap)

As an example of further application, set the obtained scan parameters to a TextView:

val myTextView = findViewById<TextView>(R.id.my_text_view)

val resultsMessage = "Recognition results:\n" +
"Recognition status: ${recognitionResult.status}\n" +
"Card type: ${recognitionResult.document?.type}\n" +
"Number of fields scanned: ${recognitionResult.document?.fields?.size ?: 0}"

myTextView.text = resultsMessage

It is also possible to use the subclasses of GenericDocumentWrapper to work with strongly typed objects and conveniently access the fields of the corresponding document.

To obtain an instance of the scanned document wrapper, use GenericDocumentLibrary or the wrap() extension function as follows:

val recognitionResult = genericDocumentRecognizer.scanBitmap(resultBitmap)

val genericDocument = recognitionResult.document
if (genericDocument != null) {
// Alternatively, use GenericDocumentLibrary.wrapperFromGenericDocument(genericDocument)
when (val wrapper = genericDocument.wrap()) {
is DeIdCardFront -> {
val id = wrapper.id
val name = wrapper.givenNames
val surname = wrapper.surname
val pin = wrapper.pin
}
is DeDriverLicenseFront -> {
val id = wrapper.id
val name = wrapper.givenNames
val surname = wrapper.surname
val categories = wrapper.licenseCategories
}
is DeResidencePermitFront -> {
val id = wrapper.id
val name = wrapper.givenNames
val surname = wrapper.surname
val pin = wrapper.pin
}
else -> {
// Handle other document types
}
}
}

Add a Finder Overlay

In addition, it is recommended to add a "Finder Overlay". This feature allows you to predefine the document area on top of the ScanbotCameraXView preview. With this overlay the Generic Document scanner can skip the time-consuming "Search for the document area" step and perform the recognition directly in the specified "Finder Overlay" area. As a result, the Generic Document scanner recognizes and extracts the document content much faster.

Details about applying finder view logic in the layout and in the code can be found here.
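To make the idea of a predefined area concrete, the sketch below computes a centered rectangle with a fixed aspect ratio inside a preview of a given size. It is an illustration only and does not reflect how the SDK's finder view is configured: `FinderRect`, `centeredFinder` and the margin parameter are hypothetical. The default ratio assumes an ID-1 sized card (85.60 mm × 53.98 mm).

```kotlin
import kotlin.math.roundToInt

// Width-to-height ratio of an ID-1 card (85.60 mm x 53.98 mm).
const val ID1_ASPECT_RATIO = 85.60 / 53.98

// Hypothetical rectangle type, in preview pixels.
data class FinderRect(val left: Int, val top: Int, val width: Int, val height: Int)

// Hypothetical helper: a horizontally inset, vertically centered rectangle
// with the requested width-to-height ratio.
fun centeredFinder(
    previewWidth: Int,
    previewHeight: Int,
    horizontalMargin: Int,
    aspectRatio: Double = ID1_ASPECT_RATIO
): FinderRect {
    require(aspectRatio > 0.0) { "aspectRatio must be positive" }
    require(previewWidth > 2 * horizontalMargin) { "margin leaves no room for the finder" }
    val width = previewWidth - 2 * horizontalMargin
    val height = (width / aspectRatio).roundToInt()
    require(height <= previewHeight) { "finder does not fit vertically" }
    return FinderRect(
        left = horizontalMargin,
        top = (previewHeight - height) / 2,
        width = width,
        height = height
    )
}
```

Restricting recognition to a rectangle like this is what lets the scanner skip the search for the document area.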
