DataScannerViewController: Discussion and Tutorial

At WWDC ’22, the Vision team at Apple introduced a new feature that makes scanning for live data a breeze. This feature takes the form of DataScannerViewController. In this walk-through, we’ll go through the problems this new controller tries to solve, its features, and its limitations. This article references code from Apple's WWDC 2022 session, “Capture machine-readable codes and text with VisionKit”.

What is Data Scanning?

Simply put, data scanning is a way for a sensor, such as a camera, to read data from the real world. This data might take the form of text, barcodes, QR codes, and more. The Live Text feature in iOS 15 is an example of a data scanner that detects text and other useful information in an image.

The Problem

Before iOS 16, scanning for data in a camera feed required a multi-step approach that involved writing code against more than one framework. Let’s go through an example.

AVFoundation

The AVFoundation framework lets us interact with the audio-visual sensors available on the device. In a typical approach using this framework, we would first use an AVCaptureDevice to create an AVCaptureDeviceInput. We would then add this input to an AVCaptureSession. The session would be connected to an AVCaptureVideoPreviewLayer to display the live camera feed. In parallel, we would add an AVCaptureMetadataOutput to the session, which delivers AVMetadataObject instances to its delegate. This gives us a way to capture machine-readable codes such as barcodes, QR codes, and more.
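
To make the number of moving parts concrete, here is a minimal sketch of that pipeline. The class name LegacyCodeScannerViewController and the choice of QR and EAN-13 codes are assumptions made for illustration; camera permission handling and error reporting are left out.

```swift
import AVFoundation
import UIKit

// Hypothetical view controller that wires up the pre-iOS 16 pipeline by hand.
final class LegacyCodeScannerViewController: UIViewController, AVCaptureMetadataOutputObjectsDelegate {
    private let session = AVCaptureSession()

    override func viewDidLoad() {
        super.viewDidLoad()

        // 1. Camera device -> device input -> session.
        guard let device = AVCaptureDevice.default(for: .video),
              let input = try? AVCaptureDeviceInput(device: device),
              session.canAddInput(input) else { return }
        session.addInput(input)

        // 2. Metadata output that reports machine-readable codes to the delegate.
        let output = AVCaptureMetadataOutput()
        guard session.canAddOutput(output) else { return }
        session.addOutput(output)
        output.setMetadataObjectsDelegate(self, queue: DispatchQueue.main)
        // The object types are set after the output has joined the session.
        output.metadataObjectTypes = [.qr, .ean13]

        // 3. Preview layer that shows the live camera feed.
        let previewLayer = AVCaptureVideoPreviewLayer(session: session)
        previewLayer.frame = view.bounds
        previewLayer.videoGravity = .resizeAspectFill
        view.layer.addSublayer(previewLayer)

        // startRunning() blocks, so it is kept off the main thread.
        DispatchQueue.global(qos: .userInitiated).async { [session] in
            session.startRunning()
        }
    }

    // Called whenever the session detects one or more codes in the feed.
    func metadataOutput(_ output: AVCaptureMetadataOutput,
                        didOutput metadataObjects: [AVMetadataObject],
                        from connection: AVCaptureConnection) {
        for case let code as AVMetadataMachineReadableCodeObject in metadataObjects {
            print("code: \(code.stringValue ?? "unknown")")
        }
    }
}
```

Even this stripped-down version has to manage the device, the input, the output, the delegate, and the preview layer separately, and it still cannot read text.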

Using AVFoundation with Vision

Out of the box, the AVFoundation approach can only detect machine-readable codes. If you would also like to detect text in the camera feed, you can add an AVCaptureVideoDataOutput to the AVCaptureSession. This gives us a stream of CMSampleBuffer frames that can be fed into the Vision framework’s VNImageRequestHandler to perform requests. The results take the form of VNObservation objects.
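
As a rough sketch of that second half, the delegate below runs a text-recognition request on every frame. The class name TextFrameProcessor, the fixed .up orientation, and the .fast recognition level are assumptions; a real app would map the device orientation and throttle the work.

```swift
import AVFoundation
import Vision

// Hypothetical sample-buffer delegate for an AVCaptureVideoDataOutput.
final class TextFrameProcessor: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Each sample buffer wraps one video frame as a pixel buffer.
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        // The completion handler receives the observations for this frame.
        let request = VNRecognizeTextRequest { request, _ in
            guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
            for observation in observations {
                if let candidate = observation.topCandidates(1).first {
                    print("text: \(candidate.string)")
                }
            }
        }
        request.recognitionLevel = .fast

        // Assumption: frames arrive upright. Adjust the orientation for your setup.
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up, options: [:])
        try? handler.perform([request])
    }
}
```

The observations come back in Vision's normalised coordinate space, so drawing highlights over the preview would need a further conversion step.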

How to ease this multi-step process?

In iOS 16, Apple introduced a new approach to data scanning that encapsulates all of the aforementioned steps in one view controller: DataScannerViewController, part of the VisionKit framework. DataScannerViewController is a subclass of UIViewController. It combines the features of AVFoundation and Vision specifically for the purpose of data scanning.

User facing features

Live Camera Preview: This view controller displays a live camera feed preview, just as it would if you had configured an AVCaptureVideoPreviewLayer manually.
Guidance: Provides small hints at the top such as “Slow Down” during data scanning.
Item highlighting: Provides item highlighting for detected data and a bounding box.
Tap-to-focus: The user can tap to focus on a different data object from within this controller without any additional code.
Pinch-to-zoom: Users can pinch-to-zoom into the frame to get a closer look at the item to be scanned.

Developer Features

Coordinates for recognised items are in view coordinates. This removes the need to convert from AVFoundation’s coordinate space to Vision’s coordinate space and back to the view’s coordinate space.
A region of interest can be set so that the controller only scans for items in a supplied area. This region is also specified in view coordinates.
Text content types and machine readable code symbologies can be specified so that the scanner only scans for items that you’re interested in.
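
As an illustration of the region-of-interest feature, the helper below computes a centred rectangle in view coordinates. The function name centeredRegion and the fractions used are assumptions; the result is meant to be assigned to the controller's regionOfInterest property.

```swift
import Foundation

/// Returns a rectangle centred in `bounds` that covers the given fractions
/// of its width and height. (Hypothetical helper, not part of VisionKit.)
func centeredRegion(in bounds: CGRect, widthFraction: CGFloat, heightFraction: CGFloat) -> CGRect {
    let width = bounds.width * widthFraction
    let height = bounds.height * heightFraction
    return CGRect(
        x: bounds.midX - width / 2,
        y: bounds.midY - height / 2,
        width: width,
        height: height
    )
}

// Hypothetical usage with a scanner instance named `scanner`:
// scanner.regionOfInterest = centeredRegion(in: scanner.view.bounds,
//                                           widthFraction: 0.8,
//                                           heightFraction: 0.3)
```

Because the region uses view coordinates, the same rectangle can also be used to draw a visible frame around the scanning area.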

Getting Started

Privacy Usage Description

Since we are accessing the camera, we have to add an entry to our Xcode project’s Info.plist for the Privacy - Camera Usage Description key (NSCameraUsageDescription). This key’s value is the string shown in the alert that the system presents when asking for permission to use the camera. It is important to be descriptive yet concise, so that your users know why their device’s camera is going to be used.

Code

The new DataScannerViewController is part of the VisionKit framework. Start by importing it.

import VisionKit

NOTE: Data Scanning isn’t supported on all devices.

In addition to requiring iOS 16, Apple restricts data scanning to devices launched in 2018 or later that have the Apple Neural Engine built into their SoC. To check whether the device is compatible with VisionKit’s new data-scanning controller, use the isSupported class property on DataScannerViewController.

DataScannerViewController.isSupported

We also need to check for availability.

DataScannerViewController.isAvailable

This Boolean might be false if certain restrictions are in place on the system. For example, the user might have denied camera permission, or they might have turned off camera usage for all apps through Content & Privacy Restrictions in Screen Time.
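
The two checks lead to different user-facing outcomes, so it can help to fold them into a single value. The sketch below is one way to do that; the ScannerReadiness type and the scannerReadiness function are hypothetical names, and the function takes plain Booleans so the decision can be exercised without a camera.

```swift
/// The outcomes an app usually needs to distinguish before showing the scanner.
/// (Hypothetical type, not part of VisionKit.)
enum ScannerReadiness: Equatable {
    case ready
    case unsupportedDevice   // the hardware can never scan, so hide the feature
    case unavailable         // the hardware is fine, but access is currently blocked
}

/// Support is checked first: an unsupported device stays unsupported
/// regardless of permissions or restrictions.
func scannerReadiness(isSupported: Bool, isAvailable: Bool) -> ScannerReadiness {
    guard isSupported else { return .unsupportedDevice }
    guard isAvailable else { return .unavailable }
    return .ready
}

// In an app, feed in the real values from the main actor:
// let state = scannerReadiness(
//     isSupported: DataScannerViewController.isSupported,
//     isAvailable: DataScannerViewController.isAvailable
// )
```

An unavailable scanner is usually worth an explanation to the user, since they may be able to fix it in Settings, whereas an unsupported device is not.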

Back to Code

Start by defining a variable called recognizedDataTypes. This variable is a set of DataScannerViewController.RecognizedDataType items. Recognised data types fall into two broad categories: text and machine-readable codes.

Text Types

These types are defined in DataScannerViewController.TextContentType. They consist of the following content types:

URL
dateTimeDuration
emailAddress
flightNumber
fullStreetAddress
shipmentTrackingNumber
telephoneNumber

.text(textContentType: .URL)

In addition to configuring the types of text that can be recognised, we can also pass in a list of languages that we’d like to recognise these texts in. If you know what languages to expect, you can list them out. If you don’t pass in any language explicitly, the scanner will use the user’s preferred language list. To specify a language:

.text(languages: ["en"])

The list of available languages can be accessed with another property on the class: supportedTextRecognitionLanguages

DataScannerViewController.supportedTextRecognitionLanguages

Use this property to get the most up-to-date list of languages that the data scanner can recognise.
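
One way to use that list is to filter the languages you would like against the ones the scanner reports. The helper below is a sketch with a hypothetical name, usableLanguages, and it assumes identifiers match exactly; the real list may use region-qualified identifiers, so check its contents on a device before relying on short codes such as "en".

```swift
/// Keeps only the requested language identifiers that appear in the supported list,
/// preserving the caller's order. (Hypothetical helper; exact string matching is an assumption.)
func usableLanguages(requested: [String], supported: [String]) -> [String] {
    let supportedSet = Set(supported)
    return requested.filter { supportedSet.contains($0) }
}

// Hypothetical usage in an app:
// let languages = usableLanguages(
//     requested: ["en", "ja"],
//     supported: DataScannerViewController.supportedTextRecognitionLanguages
// )
// let textType = DataScannerViewController.RecognizedDataType.text(languages: languages)
```

If the filtered list comes back empty, passing no languages at all falls back to the user's preferred languages, as described above.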

Machine Readable Code Types

These types are defined by the Vision framework’s VNBarcodeSymbology. Some of the available symbologies are listed below; the full list is in Apple’s documentation for VNBarcodeSymbology.

qr
microQR
code128
ean13

To detect these, simply pass in another recognised data type to the variable.

.barcode(symbologies: [.qr, .ean13])

Your code defining the recognised data types should now look like this.

let recognizedDataTypes: Set<DataScannerViewController.RecognizedDataType> = [
	.barcode(symbologies: [.qr, .ean13]),
	.text(languages: ["en"])
]

We can now create an instance of DataScannerViewController and pass this set of data items that we’d like the scanner to recognise.

let controller = DataScannerViewController(recognizedDataTypes: recognizedDataTypes)

You can now present it like you would any other view controller. Once the presentation is complete, call controller.startScanning(). This method can throw, which is why the call below is marked with try?.

present(controller, animated: true) {
	try? controller.startScanning()
}
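
Using try? silently discards any error from startScanning(). A slightly more defensive sketch is shown below; the class name ScannerHostViewController and the print-based error reporting are assumptions made for illustration.

```swift
import UIKit
import Vision
import VisionKit

// Hypothetical presenter that checks support, presents the scanner, and reports failures.
final class ScannerHostViewController: UIViewController {

    func presentScanner() {
        // Bail out early when scanning cannot work on this device or right now.
        guard DataScannerViewController.isSupported,
              DataScannerViewController.isAvailable else {
            print("Data scanning is not possible right now")
            return
        }

        let scanner = DataScannerViewController(
            recognizedDataTypes: [
                .barcode(symbologies: [.qr, .ean13]),
                .text(languages: ["en"])
            ]
        )

        present(scanner, animated: true) {
            do {
                // Start scanning only once the presentation has finished.
                try scanner.startScanning()
            } catch {
                print("Could not start scanning: \(error.localizedDescription)")
            }
        }
    }
}
```

In a shipping app the catch block would more likely show an alert or dismiss the scanner than print to the console.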

Available Initialisation Parameters

recognizedDataTypes: The data types that you’d like to recognise.
qualityLevel: One of three levels: balanced, fast, or accurate. Choose fast if you’d like to perform detections quickly at the expense of accuracy. Choose accurate if you’d like to detect small details such as micro QR codes. For most tasks, the balanced level (also the default) is sufficient.
recognizesMultipleItems: Flag that determines if the scanner focuses on one item or looks to detect several in a frame.
isHighFrameRateTrackingEnabled: Enable this when you draw custom highlights that need to follow items precisely as the camera or the scene moves.
isPinchToZoomEnabled: Allow the user to perform the pinch gesture on the preview window to zoom into and focus on a particular object.
isGuidanceEnabled: Allows labels to be shown at the top of the view to guide the user towards a successful scan.
isHighlightingEnabled: You may choose to disable this if you’re drawing custom highlights.
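
Put together, an initialiser call that sets every parameter might look like the sketch below. The function name makeConfiguredScanner and the specific values are assumptions, chosen to suit a scanner that draws its own highlights over several small codes.

```swift
import Vision
import VisionKit

// Hypothetical factory showing every initialisation parameter in one place.
@MainActor
func makeConfiguredScanner() -> DataScannerViewController {
    DataScannerViewController(
        recognizedDataTypes: [
            .text(languages: ["en"]),
            .barcode(symbologies: [.qr, .microQR])
        ],
        qualityLevel: .accurate,               // small details such as micro QR codes
        recognizesMultipleItems: true,         // track several items per frame
        isHighFrameRateTrackingEnabled: true,  // smoother tracking for custom highlights
        isPinchToZoomEnabled: true,
        isGuidanceEnabled: true,
        isHighlightingEnabled: false           // custom highlights replace the system ones
    )
}
```

Every parameter except recognizedDataTypes has a default value, so in practice you only need to spell out the ones you want to change.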

Delegates

To make use of the data that is being scanned, and to provide custom highlights, we need to implement the delegate methods. Start by assigning a delegate to the instance of DataScannerViewController.

controller.delegate = self

Conform your presenter class to DataScannerViewControllerDelegate.

Handling Tap Interactions

The first method that we are going to discuss is

dataScanner(_:didTapOn:)

As the signature suggests, this method is called when the user taps a recognised item, that is, an item whose data belongs to the set of recognised data types passed when the controller was initialised. From Apple’s WWDC ’22 session:

func dataScanner(_ dataScanner: DataScannerViewController, didTapOn item: RecognizedItem) {
    switch item {
    case .text(let text):
        print("text: \(text.transcript)")
    case .barcode(let barcode):
        print("barcode: \(barcode.payloadStringValue ?? "unknown")")
    default:
        print("unexpected item")
    }
}

Working with Custom Highlights

Adding Custom Highlights
As mentioned earlier, you can provide custom highlights that override the DataScannerViewController’s default highlighting view. Each recognised item that is passed back to us has a UUID that remains attached to that item during the lifetime of that particular data item. This means that you can track a particular data item from the time it enters the frame to the time it is last seen by the controller. Using this UUID, we can add highlights to particular items. Use a dictionary to keep track of these items.

// Dictionary to store our custom highlights keyed by their associated item ID.
var itemHighlightViews: [RecognizedItem.ID: HighlightView] = [:]

Now, you can add a new UIView for every item that is uniquely recognised by the scanner. To do this, use the didAdd method.

// For each new item, create a new highlight view and add it to the view hierarchy.
func dataScanner(_ dataScanner: DataScannerViewController, didAdd addedItems: [RecognizedItem], allItems: [RecognizedItem]) {
    for item in addedItems {
        let newView = newHighlightView(forItem: item)
        itemHighlightViews[item.id] = newView
        dataScanner.overlayContainerView.addSubview(newView)
    }
}

Updating Custom Highlights
As I’ve mentioned previously, recognised items are given unique identifiers that can be used to track that item during its lifetime. Using another delegate method we can update the frame of the custom highlight view in real-time as the item moves across the preview window.

// Animate highlight views to their new bounds
func dataScanner(_ dataScanner: DataScannerViewController, didUpdate updatedItems: [RecognizedItem], allItems: [RecognizedItem]) {
    for item in updatedItems {
        if let view = itemHighlightViews[item.id] {
            animate(view: view, toNewBounds: item.bounds)
        }
    }
}
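
The helpers newHighlightView(forItem:) and animate(view:toNewBounds:) are not defined in this article. One ingredient they are likely to need is a frame for the highlight view: an item's bounds are reported as four corner points (topLeft, topRight, bottomRight, bottomLeft) rather than as a rectangle. The function below is a hypothetical helper that computes the enclosing axis-aligned rectangle; it ignores any rotation of the item.

```swift
import Foundation

/// Smallest axis-aligned rectangle that contains all four corner points.
/// (Hypothetical helper, not part of VisionKit.)
func boundingRect(topLeft: CGPoint, topRight: CGPoint,
                  bottomRight: CGPoint, bottomLeft: CGPoint) -> CGRect {
    let minX = Swift.min(topLeft.x, topRight.x, bottomRight.x, bottomLeft.x)
    let maxX = Swift.max(topLeft.x, topRight.x, bottomRight.x, bottomLeft.x)
    let minY = Swift.min(topLeft.y, topRight.y, bottomRight.y, bottomLeft.y)
    let maxY = Swift.max(topLeft.y, topRight.y, bottomRight.y, bottomLeft.y)
    return CGRect(x: minX, y: minY, width: maxX - minX, height: maxY - minY)
}

// Hypothetical usage inside a delegate method:
// let corners = item.bounds
// view.frame = boundingRect(topLeft: corners.topLeft, topRight: corners.topRight,
//                           bottomRight: corners.bottomRight, bottomLeft: corners.bottomLeft)
```

Because the corner points are already in view coordinates, the resulting frame can be applied directly to a subview of overlayContainerView.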

Removing Custom Highlights
Finally, once the recognised item is no longer in view, we need to remove the custom highlight view from the view hierarchy. To do this, implement this delegate method.

// Remove highlights when their associated items are removed.
func dataScanner(_ dataScanner: DataScannerViewController, didRemove removedItems: [RecognizedItem], allItems: [RecognizedItem]) {
    for item in removedItems {
        if let view = itemHighlightViews[item.id] {
            itemHighlightViews.removeValue(forKey: item.id)
            view.removeFromSuperview()
        }
    }
}

Zooming

To get notified when the controller’s zoomFactor property changes, implement this delegate method.

func dataScannerDidZoom(_ dataScanner: DataScannerViewController) {
	print(dataScanner.zoomFactor)
}

Error Handling

If, for any reason, the data scanner becomes unavailable during the lifecycle of the view controller, this delegate method will be called.

func dataScanner(
    _ dataScanner: DataScannerViewController,
    becameUnavailableWithError error: DataScannerViewController.ScanningUnavailable
) {
	print(error.localizedDescription)
}
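
Rather than printing the error, you may want to tell the user what went wrong. The sketch below maps the error to a message; the function name userMessage(for:) and the wording of the messages are assumptions.

```swift
import VisionKit

/// Maps the scanner's error to a message suitable for an alert.
/// (Hypothetical helper; the wording is illustrative.)
func userMessage(for error: DataScannerViewController.ScanningUnavailable) -> String {
    switch error {
    case .cameraRestricted:
        return "Camera access is restricted. Check Settings to allow the camera for this app."
    case .unsupported:
        return "This device does not support data scanning."
    @unknown default:
        // Covers cases that future OS versions might add.
        return "Scanning is currently unavailable."
    }
}
```

These two cases mirror the isAvailable and isSupported checks made before the controller is created.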

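Continuous Async Stream of Recognised Items

As an alternative to the delegate methods, the controller exposes a recognizedItems property: an AsyncStream that yields the array of currently recognised items as it changes. In the example below, dataScannerViewController, currentItems, and sendDidChangeNotification() are assumed to be defined by the surrounding class.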

func updateViaAsyncStream() async {
    guard let scanner = dataScannerViewController else { return }

    let stream = scanner.recognizedItems
    for await newItems: [RecognizedItem] in stream {
        let diff = newItems.difference(from: currentItems) { a, b in
            return a.id == b.id
        }

        if !diff.isEmpty {
            currentItems = newItems
            sendDidChangeNotification()
        }
    }
}
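
The key step above is the call to difference(from:by:), which compares the two arrays by item ID only. The sketch below isolates that step using a stand-in type, FakeItem, so that it runs without a camera; the type and the changes(from:to:) function are hypothetical.

```swift
/// Stand-in for RecognizedItem: only `id` takes part in the comparison.
struct FakeItem {
    let id: Int
    var transcript: String
}

/// Returns the IDs that were added and removed between two snapshots,
/// treating items with the same `id` as equal.
func changes(from old: [FakeItem], to new: [FakeItem]) -> (added: [Int], removed: [Int]) {
    let diff = new.difference(from: old) { $0.id == $1.id }
    var added: [Int] = []
    var removed: [Int] = []
    for change in diff {
        switch change {
        case .insert(_, let element, _):
            added.append(element.id)
        case .remove(_, let element, _):
            removed.append(element.id)
        }
    }
    return (added, removed)
}
```

Note that an item whose content changes while its ID stays the same produces an empty difference, so the stream loop above would not send a notification for it. The difference is also order-sensitive, so a reordering of the same items counts as a change.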

Additional Features

Taking Photos
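DataScannerViewController can also capture a photo of what the camera currently sees. To take a picture and save it to the user’s photo library, use the code below. Saving to the photo library additionally requires the Privacy - Photo Library Additions Usage Description key (NSPhotoLibraryAddUsageDescription) in your Info.plist.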

if let image = try? await dataScanner.capturePhoto() {
    UIImageWriteToSavedPhotosAlbum(image, nil, nil, nil)
}

Zooming
You can programmatically set the zoom level of the controller and its preview window. To set the zoomFactor:

controller.zoomFactor = 0.4

Choose a value that lies between the minZoomFactor and maxZoomFactor properties. These give you access to the camera’s minimum and maximum available zoom values respectively, and they vary from device to device.
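
Since the valid range depends on the device, a hard-coded value such as 0.4 may fall outside it. A small clamping helper avoids that; the function name clampedZoom is hypothetical.

```swift
/// Clamps a requested zoom factor into the range the camera reports.
/// (Hypothetical helper, not part of VisionKit.)
func clampedZoom(_ requested: Double, min minZoom: Double, max maxZoom: Double) -> Double {
    Swift.min(Swift.max(requested, minZoom), maxZoom)
}

// Hypothetical usage with a scanner instance named `controller`:
// controller.zoomFactor = clampedZoom(0.4,
//                                     min: controller.minZoomFactor,
//                                     max: controller.maxZoomFactor)
```

With a range of 1.0 to 5.0, for example, a request for 0.4 would be raised to 1.0 and a request for 9.0 would be lowered to 5.0.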

Scanning Process Related
To stop scanning, call the stopScanning() method. To check whether the scanner is active, use the isScanning property on the controller.

Conclusion

DataScannerViewController is a powerful new way to scan for data in a live camera feed. This process previously required a lot of code and manual integration of Vision with AVFoundation. The limitations are clear: it requires iOS 16 or later and is not compatible with older devices that lack the Apple Neural Engine. Even so, I believe this API will be really useful in the years to come.