Using the Linux Document Scanner SDK's Document Quality Analyzer feature
Overview
The Document Quality Analyzer (DoQA) feature of the Scanbot SDK allows users to assess the quality of an image to determine if it is suitable for Optical Character Recognition (OCR) and content extraction. This tool evaluates factors such as visibility, clarity, and text readability, providing feedback on whether the image meets the required quality standards.
The analysis is based on the visible text within the provided image. Running it before OCR helps ensure that only images of sufficient quality are passed on for processing.
Functionality
The DoQA assigns higher quality scores to text that is sharp, well-lit, and in focus, while blurry, poorly lit, or excessively small text receives lower scores.
Currently, only text using Latin characters (A–Z) and digits (0–9) is supported. The analysis is performed on individual text segments, and the per-segment scores are aggregated to provide the following results:
- Overall quality score
- Text orientation detection (supports only 90° rotations)
- Histogram and heatmap of text scores. These can be used to implement custom statistical quality models. The heatmap also provides a visual representation of the quality assessment, aiding in debugging unexpected results.
- Number of detected characters. A low number of detected characters often indicates a poor-quality image or a non-document photo.
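As an illustration of a custom statistical quality model built on the histogram and the character count, here is a minimal sketch in Python. It does not call the Scanbot SDK. The input shapes are assumptions: the histogram is taken to be a list of counts per score bin with bin 0 holding the lowest scores, and the function name, thresholds, and QualityDecision type are hypothetical. The actual result types should be checked against the SDK's API reference.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QualityDecision:
    """Hypothetical result type for this sketch (not an SDK type)."""
    acceptable: bool
    reason: str
    low_quality_fraction: float


def assess_from_histogram(
    histogram: Sequence[int],
    detected_characters: int,
    min_characters: int = 20,       # assumed threshold, tune per use case
    low_score_bins: int = 3,        # assumed: the first bins hold the lowest scores
    max_low_fraction: float = 0.25  # assumed tolerance for low-quality text
) -> QualityDecision:
    """Accept an image only if enough text was found and most of it scored well."""
    # Too few characters often means a poor image or a non-document photo.
    if detected_characters < min_characters:
        return QualityDecision(False, "too few characters", 0.0)

    total = sum(histogram)
    if total == 0:
        return QualityDecision(False, "empty histogram", 0.0)

    # Share of scored text that falls into the lowest-score bins.
    fraction = sum(histogram[:low_score_bins]) / total
    if fraction > max_low_fraction:
        return QualityDecision(False, "too much low-quality text", fraction)
    return QualityDecision(True, "ok", fraction)


if __name__ == "__main__":
    print(assess_from_histogram([1, 1, 0, 2, 6, 10], detected_characters=200))
```

A rule like this rejects an image when a large share of its text is poor, even if the remaining text is sharp enough to pull an averaged score upward. Whether that is the right trade-off depends on the documents being processed.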
Limitations
The DoQA currently cannot detect or assess the following:
- Overflowing or cropped text
- Text misaligned with image edges
- Text with non-Latin characters
- Handwritten text
- Creases or folds in the paper
- Images that contain only illustrations or very little text
- Incorrect or incomplete document cropping
To automatically detect and crop documents in an image, first use the SDK's built-in document detection feature, and then run the DoQA.
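The order of operations described above (crop first, then analyze) could be wired together roughly as follows. This is only a sketch of the control flow: detect_and_crop and analyze_quality are hypothetical placeholders that the caller supplies, standing in for whatever document detection and DoQA calls the SDK binding in use actually provides. They are not real SDK function names.

```python
from typing import Any, Callable, Optional


def crop_then_analyze(
    image: Any,
    detect_and_crop: Callable[[Any], Optional[Any]],  # hypothetical: returns None if no document is found
    analyze_quality: Callable[[Any], Any],            # hypothetical: wraps the DoQA call
) -> Optional[Any]:
    """Run quality analysis on the cropped document rather than the raw photo."""
    cropped = detect_and_crop(image)
    if cropped is None:
        # No document detected, so there is nothing meaningful to analyze.
        return None
    return analyze_quality(cropped)
```

Passing the two steps in as callables keeps the sketch independent of any particular language binding, and makes the flow easy to exercise with stubs.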