Document Scanner | iOS Document Scanner

Document Scanner UI#

The Scanbot SDK comes with a view controller subclass that handles all the camera and detection implementation details for you. It provides a UI for document scanning guidance as well as a UI and functionality for manual and automatic shutter release.

The view controller's delegate can customize the appearance and behavior of the guidance UI. Furthermore, SBSDKScannerViewController gives its delegate control over how and when frames are analyzed and, most importantly, it delivers the scanned images (perspective-corrected and cropped to the document) to its delegate.

See SBSDKScannerViewControllerDelegate for customization of the UI and its behavior.

There are two ways to integrate the component into the application:

Classical UI Component#

The main class of the Classical UI component is SBSDKScannerViewController.

Usually this view controller is embedded as a child view controller into another view controller, the parent view controller. The parent view controller usually acts as the delegate and processes the detection results. You still have full control over the UI elements and can add additional views and buttons to your view controller. The classical component does not display results; it just forwards them to the delegate.

import UIKit
import ScanbotSDK

class DocumentScannerSwiftViewController: UIViewController {

    // The instance of the scanner view controller.
    var scannerViewController: SBSDKScannerViewController?

    override func viewDidLoad() {
        super.viewDidLoad()

        // Create the SBSDKScannerViewController instance.
        self.scannerViewController = SBSDKScannerViewController(parentViewController: self, imageStorage: nil)

        // Set self as the delegate so that the callback below is delivered.
        self.scannerViewController?.delegate = self
    }
}

extension DocumentScannerSwiftViewController: SBSDKScannerViewControllerDelegate {
    func scannerController(_ controller: SBSDKScannerViewController,
                           didDetect polygon: SBSDKPolygon?,
                           with status: SBSDKDocumentDetectionStatus) {
        // Process the detected document.
    }
}

Ready-To-Use UI Component#

The main class of the Ready-To-Use UI (RTU UI) component is SBSDKUIDocumentScannerViewController.

Usually this view controller is used as a separate screen for scanning documents. It returns results wrapped in an SBSDKUIDocument instance.

While you don't have direct control of the actual scanner view controller, you can use the SBSDKUIDocumentScannerConfiguration to customize it in a variety of ways, such as colors, texts and behavior.

import UIKit
import ScanbotSDK

class DocumentScannerUISwiftViewController: UIViewController {

    override func viewDidAppear(_ animated: Bool) {
        super.viewDidAppear(animated)

        // Start scanning here. Usually this is an action triggered by some button or menu.
        self.startScanning()
    }

    func startScanning() {

        // Create the default configuration object.
        let configuration = SBSDKUIDocumentScannerConfiguration.default()

        // Behavior configuration:
        // e.g. enable multi page mode to scan several documents before processing the result.
        configuration.behaviorConfiguration.isMultiPageEnabled = true

        // UI configuration:
        // e.g. configure various colors.
        configuration.uiConfiguration.topBarBackgroundColor = UIColor.red
        configuration.uiConfiguration.topBarButtonsActiveColor = UIColor.white
        configuration.uiConfiguration.topBarButtonsInactiveColor = UIColor.white.withAlphaComponent(0.3)

        // Text configuration:
        // e.g. customize a UI element's text.
        configuration.textConfiguration.cancelButtonTitle = "Cancel"

        // Present the document scanner view controller modally on this view controller.
        SBSDKUIDocumentScannerViewController.present(on: self,
                                                     with: configuration,
                                                     andDelegate: self)
    }
}

extension DocumentScannerUISwiftViewController: SBSDKUIDocumentScannerViewControllerDelegate {
    func scanningViewController(_ viewController: SBSDKUIDocumentScannerViewController,
                                didFinishWith document: SBSDKUIDocument) {
        // Process the scanned document.
    }
}

Document Representation#

The Scanbot SDK provides two abstractions to encapsulate scanned documents.

SBSDKUIPage - This class represents a scanned document page. It contains all needed information about the scanned page and gives you the ability to manage the scanned image in a variety of ways.

SBSDKUIDocument - This class represents a thread-safe container for SBSDKUIPage instances. It gives the ability to add, remove and replace pages in an array-like fashion.

Both classes are used throughout the Ready-To-Use UI (RTU UI) components.

Examples of some basic operations on SBSDKUIPage and SBSDKUIDocument#

// For this example we're going to create a mock scanned document image.
guard let documentImage = UIImage(named: "documentImage") else { return }

// Create a page with a UIImage instance.
let page = SBSDKUIPage(image: documentImage, polygon: nil, filter: SBSDKImageFilterTypeNone)

// Return the result of the detected document on an image.
let result = page.detectDocument(true)

// Rotate the image 180 degrees clockwise. Negative values will rotate the image counter-clockwise.
page.rotateClockwise(2)

// Return the detected document preview image.
let previewImage = page.documentPreviewImage()

// Return the url of the original image.
let originalImageURL = page.originalImageURL()

// Create an empty document instance.
let document = SBSDKUIDocument()

// Add the page to the document.
document.add(page)

// Replace the first page of the document with the new page.
document.replacePage(at: 0, with: page)

// Remove the first page from the document.
document.removePage(at: 0)

// Find the index of the page by its identifier.
let index = document.indexOfPage(withPageFileID: page.pageFileUUID)
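The rotateClockwise call in the example above takes a signed count of steps rather than an angle. Assuming each step is 90 degrees, as the comment on that call suggests, the following sketch shows how a step count maps to an equivalent clockwise angle. The helper clockwiseDegrees(forSteps:) is hypothetical and not part of the Scanbot SDK.

```swift
// Hypothetical helper, not part of the Scanbot SDK.
// Converts a signed number of 90-degree steps into the equivalent
// clockwise rotation angle in the range 0..<360.
func clockwiseDegrees(forSteps steps: Int) -> Int {
    // Swift's % operator keeps the sign of the dividend,
    // so negative remainders are shifted into the range 0...3.
    let normalizedSteps = ((steps % 4) + 4) % 4
    return normalizedSteps * 90
}

print(clockwiseDegrees(forSteps: 2))   // 180
print(clockwiseDegrees(forSteps: -1))  // 270 (one step counter-clockwise)
print(clockwiseDegrees(forSteps: 5))   // 90 (a full turn plus one step)
```

A helper like this can be handy for keeping a rotation indicator in your own UI in sync with the steps you pass to the page.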

Image Editing#

The Scanbot SDK lets users move the edge and corner handles to redefine the polygon manually and rotate the scanned or imported image clockwise or counter-clockwise in 90-degree steps.

There are two ways to integrate the component into the application:

Classical UI Component#

The main class of the Classical UI component is SBSDKImageEditingViewController.

Usually this view controller is embedded as a child view controller into another view controller, the parent view controller. The parent view controller usually acts as the delegate and processes the editing results. You still have full control over the UI elements and can add additional views and buttons to your view controller.

import UIKit
import ScanbotSDK

class ImageEditingSwiftViewController: UIViewController {

    // Image to edit.
    var editingImage: UIImage?

    override func viewDidAppear(_ animated: Bool) {
        super.viewDidAppear(animated)

        // Check if the image to edit is not nil.
        guard let image = self.editingImage else { return }

        // Create editing view controller.
        let editingViewController = SBSDKImageEditingViewController()

        // Set the editing view controller's image.
        editingViewController.image = image

        // Set self as a delegate.
        editingViewController.delegate = self

        // Create and set up a navigation controller to present control buttons.
        let navigationController = UINavigationController(rootViewController: editingViewController)
        navigationController.modalPresentationStyle = .fullScreen

        // Present editing screen modally.
        self.present(navigationController, animated: true, completion: nil)
    }
}

extension ImageEditingSwiftViewController: SBSDKImageEditingViewControllerDelegate {

    // Create a custom cancel button.
    func imageEditingViewControllerCancelButtonItem(_ editingViewController: SBSDKImageEditingViewController) -> UIBarButtonItem? {
        return UIBarButtonItem(systemItem: .cancel)
    }

    // Create a custom save button.
    func imageEditingViewControllerApplyButtonItem(_ editingViewController: SBSDKImageEditingViewController) -> UIBarButtonItem? {
        return UIBarButtonItem(systemItem: .save)
    }

    // Create a custom button for clockwise rotation.
    func imageEditingViewControllerRotateClockwiseToolbarItem(_ editingViewController: SBSDKImageEditingViewController) -> UIBarButtonItem? {
        return UIBarButtonItem(title: "Rotate clockwise",
                               style: .plain,
                               target: nil,
                               action: nil)
    }

    // Create a custom button for counter-clockwise rotation.
    func imageEditingViewControllerRotateCounterClockwiseToolbarItem(_ editingViewController: SBSDKImageEditingViewController) -> UIBarButtonItem? {
        return UIBarButtonItem(title: "Rotate counter-clockwise",
                               style: .plain,
                               target: nil,
                               action: nil)
    }

    // Handle canceling the changes.
    func imageEditingViewControllerDidCancelChanges(_ editingViewController: SBSDKImageEditingViewController) {
        self.dismiss(animated: true, completion: nil)
    }

    // Handle applying the changes.
    func imageEditingViewController(_ editingViewController: SBSDKImageEditingViewController,
                                    didApplyChangesWith polygon: SBSDKPolygon, croppedImage: UIImage) {
        // Process edited image.
        self.dismiss(animated: true, completion: nil)
    }
}

Ready-To-Use UI Component#

The main class of the Ready-To-Use UI (RTU UI) component is SBSDKUICroppingViewController.

Usually this view controller is used as a separate screen for editing scanned/imported images.

While you don't have direct control of the actual cropping view controller, you can use the SBSDKUICroppingScreenConfiguration to customize it in a variety of ways, such as colors, texts and behavior.

import UIKit
import ScanbotSDK

class ImageEditingUISwiftViewController: UIViewController {

    // Page to edit.
    var editingPage: SBSDKUIPage?

    override func viewDidAppear(_ animated: Bool) {
        super.viewDidAppear(animated)

        // Check if the page to edit exists.
        guard let page = self.editingPage else { return }

        // Create the default configuration object.
        let configuration = SBSDKUICroppingScreenConfiguration.default()

        // Behavior configuration:
        // e.g. disable the rotation feature.
        configuration.behaviorConfiguration.isRotationEnabled = false

        // UI configuration:
        // e.g. configure various colors.
        configuration.uiConfiguration.topBarBackgroundColor = UIColor.red
        configuration.uiConfiguration.topBarButtonsColor = UIColor.white

        // Text configuration:
        // e.g. customize a UI element's text.
        configuration.textConfiguration.cancelButtonTitle = "Cancel"

        // Present the cropping view controller modally on this view controller.
        SBSDKUICroppingViewController.present(on: self,
                                              with: page,
                                              with: configuration,
                                              andDelegate: self)
    }
}

extension ImageEditingUISwiftViewController: SBSDKUICroppingViewControllerDelegate {
    func croppingViewController(_ viewController: SBSDKUICroppingViewController, didFinish changedPage: SBSDKUIPage) {
        // Process the edited page and dismiss the editing screen.
        viewController.dismiss(animated: true, completion: nil)
    }
}

Image Processing#

See SBSDKImageProcessor.

Digital image processing is a core part of the Scanbot SDK. Basically there are three operations on images:

  • Rotation
  • Image filtering
  • Image warping (perspective correction and cropping) into a 4-sided polygon

All of these image operations can be called either synchronously on any thread or queue, or asynchronously on a special serial image processing queue. When working with large images, it is highly recommended to use the asynchronous API, because its serial queue ensures that images are never processed in parallel. Processing large images concurrently easily causes memory warnings and crashes.

The synchronous API can be found in the UIImageSBSDK class extension.

The asynchronous API is implemented as the static class SBSDKImageProcessor. In addition to the three standard operations, SBSDKImageProcessor provides a method to apply custom image processing by specifying an SBSDKImageProcessingHandler block. Its execution is also dispatched to the image processing queue. The operations' completion handlers are called on the main thread.

Each call into the asynchronous API returns an SBSDKProgress object. This NSProgress subclass can be used to observe the progress of the operation and to cancel the operation via its cancel method.
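Because SBSDKProgress is described as an NSProgress subclass, the standard Foundation mechanisms for observing and cancelling should apply to it. The following sketch uses a plain Progress object as a stand-in for the object returned by the SDK; how quickly a running SDK operation reacts to cancellation is not covered here.

```swift
import Foundation

// A plain Progress object stands in for the SBSDKProgress returned by the SDK.
let progress = Progress(totalUnitCount: 10)

// Observe fractionCompleted via key-value observing.
// Keep the returned token alive for as long as you want to receive updates.
let observation = progress.observe(\.fractionCompleted, options: [.new]) { progress, _ in
    print("Completed: \(Int(progress.fractionCompleted * 100))%")
}

// Simulate the operation advancing halfway.
progress.completedUnitCount = 5

// Request cancellation. The code performing the work is expected
// to check isCancelled and stop.
progress.cancel()
print("Cancelled: \(progress.isCancelled)")

// Stop observing once the updates are no longer needed.
observation.invalidate()
```

The observation handler may run on whichever thread changes the progress, so dispatch to the main queue before updating any UI from it.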

Example code for custom asynchronous image filter#

// Specify the file URL for the input image
guard let inputImageURL = URL(string: "...") else { return }

// Specify the file URL the output image is written to. Set to nil if you don't want to save the output image
let outputImageURL = URL(string: "...")

// Create the image processing closure
let processingHandler: SBSDKImageProcessingHandler = { sourceImage, outError in
    // Apply a color filter to the input image,
    let filteredImage = sourceImage.sbsdk_imageFiltered(by: SBSDKImageFilterTypeColor)

    // and return the filtered image.
    return filteredImage
}

let progress = SBSDKImageProcessor.customFilterImage(inputImageURL,
                                                     processingBlock: processingHandler,
                                                     outputImageURL: outputImageURL) { isFinished, error, resultInfo in
    let outputImage = resultInfo?[SBSDKResultInfoDestinationImageKey] as? UIImage
}

Example code for detecting and applying a polygon to an image:#

// Specify the file URL for the input image
guard let inputImageURL = URL(string: "...") else { return }

// Load the image from the specified path
guard let inputImage = UIImage(contentsOfFile: inputImageURL.path) else { return }

// Specify the file URL the output image is written to. Set to nil if you don't want to save the output image
let outputImageURL = URL(string: "...")

// Create a document detector.
let detector = SBSDKDocumentDetector()

// Let the document detector run on the input image.
let result = detector.detectDocumentPolygon(on: inputImage,
                                            visibleImageRect: .zero,
                                            smoothingEnabled: false,
                                            useLiveDetectionParameters: false)

// Check the result.
if result.status == SBSDKDocumentDetectionStatusOK, let polygon = result.polygon {

    // If the result is an acceptable polygon, we warp the image into the polygon asynchronously.
    // When warping is done we check the result and on success we extract the output image.
    // Then do whatever you want with the warped image.
    SBSDKImageProcessor.warpImage(inputImageURL,
                                  polygon: polygon,
                                  outputImageURL: outputImageURL) { isFinished, error, resultInfo in
        if isFinished && error == nil {
            let outputImage = resultInfo?[SBSDKResultInfoDestinationImageKey] as? UIImage
        }
    }
} else {
    // No acceptable polygon found.
}

PDF Creation#

The SBSDKPDFRenderer static class takes an image storage and renders the contained images into a PDF. For each image, one page is generated. The generated pages have sizes that correspond to DIN A4, US Letter or a custom size. As the images are embedded unscaled, the resolution of each page depends on its image. When rendering into the DIN A4 or US Letter format, the orientation of the page (landscape or portrait) is derived from the image's aspect ratio.
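To make the orientation rule concrete, the sketch below picks a DIN A4 page orientation from an image's aspect ratio. It is an illustration only: the names PageOrientation, pageOrientation(forImageSize:) and a4PageSize(for:) are hypothetical, the page size is DIN A4 rounded to whole PDF points, and treating square images as portrait is an assumption. The renderer's actual rule is not documented here.

```swift
import Foundation

// Hypothetical types and helpers, not part of the Scanbot SDK.
enum PageOrientation {
    case portrait
    case landscape
}

// An image that is wider than it is tall gets a landscape page.
// Square images fall back to portrait (an assumption made for this sketch).
func pageOrientation(forImageSize size: CGSize) -> PageOrientation {
    return size.width > size.height ? .landscape : .portrait
}

// DIN A4 is 210 x 297 mm, which is roughly 595 x 842 PDF points (1 point = 1/72 inch).
func a4PageSize(for orientation: PageOrientation) -> CGSize {
    let portrait = CGSize(width: 595, height: 842)
    switch orientation {
    case .portrait:
        return portrait
    case .landscape:
        return CGSize(width: portrait.height, height: portrait.width)
    }
}

let orientation = pageOrientation(forImageSize: CGSize(width: 4000, height: 3000))
let pageSize = a4PageSize(for: orientation)
print(orientation, pageSize.width, pageSize.height)
```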

PDFs can be encrypted using SBSDKAESEncrypter or your custom written encryption classes. The PDF's data is encrypted in memory before it is written to disk. To decrypt the PDF you need to run proper decryption in your backend or clients.

NOTE: The Scanbot SDK does not lock the PDF with the password, but rather encrypts the actual file. This provides the best level of protection. To decrypt the PDF file you can use the key property of SBSDKAESEncrypter or generate the key yourself using salt, password and iterations.

See SBSDKPDFRendererPageSize for further information.

The operations' completion handlers are called on the main thread.

Example code for creating a standard PDF from an image storage:#

// Create an image storage to save the captured document images to
let imagesURL = SBSDKStorageLocation.applicationDocumentsFolderURL().appendingPathComponent("Images")
let imagesLocation = SBSDKStorageLocation(baseURL: imagesURL)
guard let imageStorage = SBSDKIndexedImageStorage(storageLocation: imagesLocation) else { return }

// Define the indices of the images in the image storage you want to render into a PDF, e.g. the first 3.
// To include all images you can simply pass nil for the indexSet. The indexSet is validated internally.
// You don't need to concern yourself with the validity of all the indices.
let indexSet = IndexSet(integersIn: 0...2)

// Specify the file URL where the PDF will be saved to. Nil makes no sense here.
guard let outputPDFURL = URL(string: "outputPDF") else { return }

// In case you want to encrypt your PDF file, create encrypter using a password and an encryption mode.
let encrypter = SBSDKAESEncrypter(password: "password_example#42", mode: .AES256)

// Enqueue the operation and store the SBSDKProgress to watch the progress or cancel the operation.
// After completion the PDF is stored at the URL specified in outputPDFURL.
// You can also extract the image store and the PDF URL from the resultInfo.
let progress = SBSDKPDFRenderer.renderImageStorage(imageStorage,
                                                   copyImageStorage: true,
                                                   indexSet: indexSet,
                                                   with: .autoLocale,
                                                   encrypter: encrypter,
                                                   output: outputPDFURL) { isFinished, error, resultInfo in
    if isFinished && error == nil {
        let completedImageStore = resultInfo?[SBSDKResultInfoImageStorageKey] as? SBSDKIndexedImageStorage
        let completedPDFURL = resultInfo?[SBSDKResultInfoDestinationFileURLKey] as? URL
    }
}
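The output URL in the example above is built from a placeholder string. URL(string:) with a bare name yields a relative URL without the file scheme, so in a real app you will most likely want a proper file URL in a writable directory. A minimal sketch using only Foundation follows; the helper documentsFileURL(named:) and the file name "scan.pdf" are placeholders chosen for this example.

```swift
import Foundation

// Hypothetical helper: builds a file URL inside the app's Documents directory.
func documentsFileURL(named fileName: String) -> URL? {
    guard let documentsURL = FileManager.default.urls(for: .documentDirectory,
                                                      in: .userDomainMask).first else {
        return nil
    }
    return documentsURL.appendingPathComponent(fileName)
}

if let outputPDFURL = documentsFileURL(named: "scan.pdf") {
    // A URL created this way uses the file scheme.
    print(outputPDFURL.isFileURL)          // true
    print(outputPDFURL.lastPathComponent)  // scan.pdf
}
```

The same approach works for the output URL of a TIFF file.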

TIFF Creation#

The Scanbot SDK provides the ability to write scanned images into a TIFF file.

The SBSDKTIFFImageWriter contains convenient functions to write the scanned images into a multi-page TIFF file, adjust the TIFF file's parameters and encrypt the newly created file.

NOTE: The Scanbot SDK can optionally encrypt the TIFF file. This provides the best level of protection. To decrypt the TIFF file you can use the key property of SBSDKAESEncrypter or generate the key yourself using salt, password and iterations.

Example code for creating a TIFF file from the scanned images:#

// For this example we're using an empty array, but there should be scanned images in it.
let scannedImages: [UIImage] = []

// Specify the file URL where the TIFF will be saved to. Nil makes no sense here.
guard let outputTIFFURL = URL(string: "outputTIFF") else { return }

// In case you want to encrypt your TIFF file, create encrypter using a password and an encryption mode.
let encrypter = SBSDKAESEncrypter(password: "password_example#42",
                                  mode: .AES256)

// The `SBSDKTIFFImageWriter` has parameters where you can define various options,
// e.g. compression algorithm or whether the document should be binarized.
// For this example we're going to use the default parameters.
let parameters = SBSDKTIFFImageWriterParameters.default()

// Write a TIFF file with scanned images into the defined URL.
// The result of this function is a boolean signifying whether the operation was successful or not.
let result = SBSDKTIFFImageWriter.writeTIFF(scannedImages,
                                            fileURL: outputTIFFURL,
                                            encrypter: encrypter,
                                            parameters: parameters)

Optical Character Recognition#

The Scanbot OCR feature is based on the Tesseract OCR engine with some modifications and enhancements. The Scanbot SDK uses an optimized custom library of the Tesseract OCR under the hood and provides a convenient API.

For each desired OCR language, a corresponding .traineddata file (also known as tessdata) must be installed in the optional resource bundle named ScanbotSDKOCRData.bundle. In addition, the special data file osd.traineddata, which is used for orientation and script detection, must be installed.

The ScanbotSDK.framework itself does not contain any OCR language files to keep the framework small in size. The optional bundle ScanbotSDKOCRData.bundle, provided in the ZIP archive of the Scanbot SDK, contains the language files for English and German as well as the osd.traineddata as examples. You can replace or complete these language files as needed. Add this bundle to your project and make sure that it is copied along with your resources into your app.

Preconditions to achieve a good OCR result#

Conditions while scanning#

A perfect document for OCR is flat and straight, shows no large shadows, folds, or distracting objects, and is captured at the highest possible resolution. Our UI and algorithms do their best to help you meet these requirements. But as in photography, you can never fully recover image information that was lost during the shot.

Languages#

You can use multiple languages for OCR. But since the recognition of characters and words is a very complicated process, increasing the number of languages lowers the overall precision: with more languages, there are more candidate words that a detected word could match. We suggest using as few languages as possible. Make sure that the language you're trying to detect is supported by the SDK and added to the project.

Size and position#

Put the document on a flat surface. Take the photo from straight above in parallel to the document to make sure that perspective correction doesn't need to be applied much. The document should fill as much of the camera frame as possible while still showing all of the text that needs to be recognized. This results in more pixels for each character that needs to be detected and hence, more detail. Skewed pages decrease the recognition quality.

Light and shadows#

More ambient light is always better. With more light, the camera takes the shot at a lower ISO value, which results in less grainy photos. You should make sure that there are no visible shadows. If you have large shadows, it's better to take the shot at an angle instead. We also do not recommend using the flashlight - from this low distance, it creates a light spot at the center of the document, which decreases the quality.

Focus#

The document needs to be properly focused so that the characters are sharp and clear. The auto-focus of the camera works well if you meet the minimum required distance for the lens to be able to focus. This minimum distance is usually 5-10 cm.

Typefaces#

The OCR trained data is optimized for common serif and sans-serif font types. Decorative or script fonts decrease the quality of the detection a lot.

Implementing OCR#

Download OCR files#

You can find a list of all supported OCR languages and download links on this Tesseract wiki page.

⚠️️️ Please choose and download the proper version of the language data files:

OCR API#

The SBSDKOpticalTextRecognizer takes one or more images and performs various text related operations on each of the images:

  • Page layout analysis
  • Text recognition
  • Creation of searchable PDF documents with selectable text

The page layout analysis returns information about the page orientation, the angle by which the image should be rotated to deskew it, the text writing direction and the text line order.

The text recognition operations take either a collection of images (SBSDKImageStoring) and optionally create a PDF of it, or a single image. The single image operation can also accept a rectangle determining which area of the image the text recognition should be applied to. The results found in the completion handler's resultsDictionary contain information about the found text, where the text was found (boundingBox) and what kind of text it is (word, line, paragraph).

All SBSDKOpticalTextRecognizer operations run in a separate serial queue. The operations' completion handlers are called in the main thread.

Example code for performing a page layout analysis:#

// The file URL of the image we want to analyze.
guard let imageURL = URL(string: "...") else { return }

// Start the page layout analysis and store the returned SBSDKProgress object. This object can be used to cancel
// the operation or to observe the progress. See NSProgress.
// In completion check if we finished without error and extract the analyzer result from the resultInfo dictionary.
let progress = SBSDKOpticalTextRecognizer.analyseImagePageLayout(imageURL) { isFinished, error, resultInfo in
    if isFinished && error == nil {
        if let result = resultInfo?[SBSDKResultInfoPageAnalyzerResultsKey] as? SBSDKPageAnalyzerResult {
            // Now we can work with the result.
        }
    }
}

Example code for performing text recognition on an image:#

// The file URL of the image we want to analyze.
guard let imageURL = URL(string: "...") else { return }

// Enqueue the text recognition operation.
// We limit detection to the center area of the image leaving margins of 25% on each side.
// Only recognize English.
// The returned SBSDKProgress object can be used to cancel the operation or observe the progress.
// Upon completion we extract the result from the resultInfo dictionary.
let progress = SBSDKOpticalTextRecognizer.recognizeText(imageURL,
                                                        rectangle: CGRect(x: 0.25, y: 0.25, width: 0.5, height: 0.5),
                                                        languageString: "eng") { isFinished, error, resultInfo in
    if let result = resultInfo?[SBSDKResultInfoOCRResultsKey] as? SBSDKOCRResult {
        // Now we can work with the result, e.g. the recognized text, words and bounding boxes.
    }
}
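The rectangle in the example above appears to be given in unit coordinates, where 0 and 1 mark the image edges. This is an assumption based on the values used and the "margins of 25%" comment. Under that assumption, the following sketch derives such a rectangle from a margin and shows which pixel area it covers. Both helper functions are hypothetical and not part of the Scanbot SDK.

```swift
import Foundation

// Hypothetical helper: a rectangle in unit coordinates (0...1) that leaves
// the same relative margin on every side. Returns nil for margins that
// would leave no area.
func centeredUnitRect(margin: CGFloat) -> CGRect? {
    guard margin >= 0, margin < 0.5 else { return nil }
    return CGRect(x: margin, y: margin, width: 1 - 2 * margin, height: 1 - 2 * margin)
}

// Hypothetical helper: converts a unit-coordinate rectangle into pixel
// coordinates for an image of the given size.
func pixelRect(for unitRect: CGRect, imageSize: CGSize) -> CGRect {
    return CGRect(x: unitRect.origin.x * imageSize.width,
                  y: unitRect.origin.y * imageSize.height,
                  width: unitRect.width * imageSize.width,
                  height: unitRect.height * imageSize.height)
}

if let unitRect = centeredUnitRect(margin: 0.25) {
    // For a 2000 x 1000 pixel image this covers the area (500, 250, 1000, 500).
    print(pixelRect(for: unitRect, imageSize: CGSize(width: 2000, height: 1000)))
}
```

Restricting recognition to the area that actually contains the text of interest reduces the amount of image data the recognizer has to look at.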