
Document Quality Analyzer

The Document Quality Analyzer (DoQA) feature of the Scanbot SDK lets users assess whether a scanned page or a raw image is of sufficient quality to run Optical Character Recognition (OCR) and extract its content. It evaluates factors such as visibility, clarity, and text readability, and reports whether the image meets the desired quality standards. To produce this result, DoQA analyzes the visible text on the captured document or image. Users can rely on this feature to ensure that their documents are captured at optimal quality and are suitable for subsequent OCR processing.

Functionality

DoQA assigns higher quality scores to text that is sharp, well lit, and in focus, while blurry, poorly lit, or excessively small text receives lower scores. Currently, only text using Latin characters (A-Z) and digits (0-9) is supported. The analysis is performed on individual text segments, and the segment scores are aggregated to provide the following results:

  • Overall document quality score
  • Text orientation detection (only rotations in multiples of 90° are detected)
  • Histogram and heatmap of text scores. The histogram and heatmap can be used to implement custom statistical quality models. The heatmap also provides a visual representation of the quality assessment, aiding in debugging unexpected results.
  • Number of detected characters. A low number of detected characters often indicates a poor-quality scan or a non-document image.
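As a rough illustration of how segment scores can be aggregated and how the histogram can drive a custom statistical quality model, here is a minimal Python sketch. It is not the Scanbot SDK's implementation: the `TextSegment` type, the character-weighted mean used as the overall score, the five equal-width bins over an assumed [0, 1] score range, and the rejection rule in `is_acceptable` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TextSegment:
    """Hypothetical per-segment result: a quality score in [0, 1]
    and the number of characters in the segment."""
    score: float
    num_chars: int


def char_weighted_histogram(segments: Sequence[TextSegment], bins: int = 5) -> List[float]:
    """Fraction of all characters whose segment score falls into each of
    `bins` equal-width bins over [0, 1]."""
    counts = [0] * bins
    for seg in segments:
        # A score of exactly 1.0 would index past the end, so clamp it.
        idx = min(int(seg.score * bins), bins - 1)
        counts[idx] += seg.num_chars
    total = sum(counts)
    if total == 0:
        return [0.0] * bins
    return [c / total for c in counts]


def overall_score(segments: Sequence[TextSegment]) -> float:
    """Character-weighted mean of the segment scores (an assumed aggregation)."""
    total = sum(s.num_chars for s in segments)
    if total == 0:
        return 0.0
    return sum(s.score * s.num_chars for s in segments) / total


def is_acceptable(segments: Sequence[TextSegment], bad_bins: int = 2,
                  max_bad_fraction: float = 0.2, bins: int = 5) -> bool:
    """Custom rule: reject the page if more than `max_bad_fraction` of its
    characters fall into the lowest `bad_bins` histogram bins."""
    if sum(s.num_chars for s in segments) == 0:
        return False  # no text to judge
    hist = char_weighted_histogram(segments, bins)
    return sum(hist[:bad_bins]) <= max_bad_fraction


page = [TextSegment(0.9, 120), TextSegment(0.85, 80), TextSegment(0.3, 20)]
print(round(overall_score(page), 3))  # 0.827
print(is_acceptable(page))            # True
```

Weighting by character count keeps a single short, blurry word from dragging down an otherwise sharp page; an app with different needs could weight segments equally or tighten `max_bad_fraction` instead.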

Limitations

DoQA currently cannot detect or reliably assess the following:

  • Overflowing or cropped text
  • Text misaligned with image edges
  • Text with non-Latin characters
  • Pages with handwritten text
  • Creases or folds in the paper
  • Pages that contain only illustrations or very little text
  • Incorrect or incomplete document cropping. To automatically detect and crop documents in an image, first use our Document Detection feature and then run DoQA on the cropped result.
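The recommended order (detect and crop first, then analyze) can be expressed as a small pipeline function. This Python sketch keeps both steps abstract: `detect_and_crop` and `analyze_quality` are hypothetical callables standing in for the Document Detection feature and DoQA, not real Scanbot SDK functions, and the toy usage passes strings in place of images.

```python
from typing import Callable, Optional, TypeVar

Img = TypeVar("Img")
Res = TypeVar("Res")


def analyze_cropped_page(
    image: Img,
    detect_and_crop: Callable[[Img], Optional[Img]],
    analyze_quality: Callable[[Img], Res],
) -> Optional[Res]:
    """Crop first, then analyze, so the quality step only sees the page.

    `detect_and_crop` and `analyze_quality` are hypothetical stand-ins for
    the document detection step and the quality analysis step."""
    cropped = detect_and_crop(image)
    if cropped is None:
        # No document found, so there is no page to score.
        return None
    return analyze_quality(cropped)


# Usage with toy stand-ins that operate on strings instead of images.
result = analyze_cropped_page("photo", lambda img: img + "-cropped", lambda img: ("analyzed", img))
print(result)  # ('analyzed', 'photo-cropped')
```

Passing the steps in as parameters keeps the ordering logic independent of any particular platform API.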

Analyze the quality of a document image

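After the analysis runs, an app usually turns the result into a concrete action, such as accepting the page, asking the user to retake it, or rotating it upright. The Python sketch below shows one way to do this. It does not call the Scanbot SDK: `QualityResult` and its fields, `decide`, `correction_rotation`, and the thresholds (a minimum score of 0.6 and at least 50 characters) are hypothetical placeholders, and the rotation helper assumes the orientation is reported as one of 0, 90, 180, or 270 degrees.

```python
from dataclasses import dataclass


@dataclass
class QualityResult:
    """Hypothetical summary of a DoQA result; field names are illustrative."""
    score: float          # overall document quality score, assumed in [0, 1]
    orientation_deg: int  # detected text orientation: 0, 90, 180 or 270
    num_chars: int        # number of detected characters


def correction_rotation(orientation_deg: int) -> int:
    """Rotation in degrees (same direction as the reported orientation)
    that brings the text back upright."""
    if orientation_deg % 90 != 0:
        raise ValueError("only multiples of 90 degrees are supported")
    return (360 - orientation_deg) % 360


def decide(result: QualityResult, min_score: float = 0.6, min_chars: int = 50) -> str:
    """Map a result to an app-level action using placeholder thresholds."""
    if result.num_chars < min_chars:
        return "not_a_document"  # too little text: poor scan or non-document image
    if result.score < min_score:
        return "retake"
    return "accept"


r = QualityResult(score=0.82, orientation_deg=90, num_chars=640)
print(decide(r), correction_rotation(r.orientation_deg))  # accept 270
```

Suitable thresholds depend on the documents and the OCR engine in use, so they are best tuned on sample scans.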
