In this tutorial you’ll see how to extract frames from live camera streams, regular movie files and streamed movie files and display them on a MetalKit view. Using Metal gives you a great deal of control over how the pixels are rendered and ensures that the GPU does the rendering, which keeps that work from slowing down the CPU.
The tutorial code will always convert the video frames into CIImage objects before display. This is to help you adapt the code to your own applications: applying filters, resizing and other effects are fast and easy when working with CIImage. Additionally, as long as your CIContext is backed by the GPU, you can be sure that your code uses the GPU and the CPU efficiently.
When working with video, Apple provides a number of frameworks that operate at different levels. With the higher-level frameworks such as AVKit or Core Image, the system determines when to use the GPU and when to use the CPU for rendering and computation. The higher-level frameworks also trade granularity of control for ease of use. If you want direct control over how the GPU renders every pixel, then you will want to use Metal. Remember, though, that you are then responsible for keeping video and audio in sync, encoding and decoding data and determining a reasonable UI for your user. For many use cases, a higher-level framework is probably the better choice; Apple’s engineers have worked to ensure the graphics frameworks use the GPU and CPU in reasonable ways. However, when you need Metal, you need it. So, let's get started.
Setting Up a MTKView
An MTKView is a subclass of UIView, so it has a frame, bounds and other familiar properties. Its drawing and rendering are backed directly by drawables and textures rendered on the GPU, so drawing to it can be quite fast. In addition to displaying video, you can render 3D model objects and graphics shaders, so for a graphics-rich application it can be a powerful tool.
Before you can send data to the view, it needs to be configured. MetalKit views are still heavily tied to UIKit, so if you are working in a SwiftUI project you will need to wrap them in UIViewRepresentable code; a sketch of such a wrapper appears after the setup below.
//MetalKit Variables
@IBOutlet var displayView: MTKView!
var metalDevice : MTLDevice!
var metalCommandQueue : MTLCommandQueue!
//CoreImage Variables
var ciContext : CIContext!
var filteredImage: CIImage?
var cleanImage: CIImage?
//get a reference to the GPU
metalDevice = MTLCreateSystemDefaultDevice()
//link the GPU to our MTKView
displayView.device = metalDevice
//link our command queue variable to the GPU
metalCommandQueue = metalDevice.makeCommandQueue()
//associate our CIContext with the metal stack
ciContext = CIContext(mtlCommandQueue: metalCommandQueue)
You will need to get a reference to the GPU of the iOS device; at least for now, iOS devices only have one GPU. Then you will need to link the displayView and the metalCommandQueue to the metalDevice. Finally, you link the ciContext to the GPU so that all of the image manipulation code runs in the same place.
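If you are wrapping the view for SwiftUI, a minimal UIViewRepresentable sketch might look like the following; MetalVideoView and renderer are illustrative names rather than part of the tutorial project, and renderer is any object that owns the Metal stack and acts as the MTKViewDelegate.
import MetalKit
import SwiftUI
//a minimal, illustrative wrapper for using an MTKView from SwiftUI
struct MetalVideoView: UIViewRepresentable {
    let renderer: MTKViewDelegate
    func makeUIView(context: Context) -> MTKView {
        let view = MTKView()
        view.device = MTLCreateSystemDefaultDevice()
        view.delegate = renderer
        //apply the rest of the configuration from this tutorial here
        return view
    }
    func updateUIView(_ uiView: MTKView, context: Context) { }
}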
//tell our MTKView that we want to call .draw() to make updates
displayView.isPaused = true
displayView.enableSetNeedsDisplay = false
//let its drawable texture be written to dynamically
displayView.framebufferOnly = false
//set our code to be our MTKView's delegate
displayView.delegate = self
In the code above, we set the MTKView to a paused state and do not enable setNeedsDisplay. This ensures that it only redraws when we explicitly call the view's .draw() method, which in turn invokes the draw(in:) method of our delegate. By setting framebufferOnly to false you are telling the view that you will be writing to its drawable texture multiple times and may also read from it.
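If you would rather have the view redraw itself on a timer than call .draw() yourself, the opposite configuration works too; this is just a sketch of that alternative, not part of the tutorial project.
//alternative configuration: let the view drive the refresh at a fixed rate
displayView.isPaused = false
displayView.enableSetNeedsDisplay = false
displayView.preferredFramesPerSecond = 30
//draw(in:) is now called on the delegate roughly 30 times per second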
Drawing in an MTKView
Now the MTKView is configured. The next step is to implement the delegate methods. The MTKViewDelegate protocol has two methods. If your code needs to respond to the view's dimensions changing (to support device rotation, or to let the user resize the window), make your adjustments in func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize). For this example we are only interested in the other delegate method, func draw(in view: MTKView). When showing video, we can extract each frame of the video as a CIImage and render it from there.
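As a point of reference, a minimal conformance might look like the following sketch; the ViewController name is only an assumption about where the Metal variables from earlier live.
//an illustrative conformance; the draw body is filled in below
extension ViewController: MTKViewDelegate {
    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
        //respond to rotation or window resizing here if needed
    }
    func draw(in view: MTKView) {
        //the rendering code below goes here
    }
}
The body of draw(in:) is the code that follows: it renders a CIImage into the MTKView.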
//create a buffer to hold this round of draw commands
guard let commandBuffer = metalCommandQueue.makeCommandBuffer() else {
return
}
//grab the filtered or a clean image to display
guard let ciImage = filteredImage ?? cleanImage else {
return
}
//get a drawable if the GPU is not busy
guard let currentDrawable = view.currentDrawable else {
return
}
//make sure frame is centered on screen
let heightOfImage = ciImage.extent.height
let heightOfDrawable = view.drawableSize.height
let yOffset = (heightOfDrawable - heightOfImage)/2
//render into the metal texture
self.ciContext.render(ciImage,
to: currentDrawable.texture,
commandBuffer: commandBuffer,
bounds: CGRect(origin: CGPoint(x: 0, y: -yOffset),
size: view.drawableSize),
colorSpace: CGColorSpaceCreateDeviceRGB())
//present the drawable and buffer
commandBuffer.present(currentDrawable)
//send the commands to the GPU
commandBuffer.commit()
For each pass of the draw method, we create a new MTLCommandBuffer, then we get the MTKView's .currentDrawable, which contains a texture we can send pixel data to. The texture has a size and a color space. Though we are using Metal to render images to the screen, Metal can be used to send any valid commands to the GPU, for things like computations. Once the buffer has been filled with commands, the drawable can be presented and the buffer committed. When the buffer is committed, the GPU executes all of the commands. If another .draw() gets called while the buffer is being executed, the GPU will not interrupt the current buffer in order to start the new one.
The bounds of the render allow you to move where the CIImage gets displayed in the MTKView and to resize the CIImage within the view. This can be valuable when compositing multiple images or video streams onto the same MTKView. Remember that, unlike a UIView, both the texture and the CIImage have their origin point at the bottom left.
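If you would rather adjust the image itself than offset the render bounds, Core Image transforms make that easy. Here is a minimal sketch of an aspect-fit scale, done before the render call; it is only an illustration, not code from the tutorial project.
//scale the frame to fit the drawable while preserving its aspect ratio
let scale = min(view.drawableSize.width / ciImage.extent.width,
                view.drawableSize.height / ciImage.extent.height)
let scaledImage = ciImage.transformed(by: CGAffineTransform(scaleX: scale, y: scale))
//then shift it so it sits centered in the drawable
let xShift = (view.drawableSize.width - scaledImage.extent.width) / 2
let yShift = (view.drawableSize.height - scaledImage.extent.height) / 2
let centeredImage = scaledImage.transformed(by: CGAffineTransform(translationX: xShift, y: yShift))
You would then render centeredImage with a bounds origin of .zero rather than offsetting the bounds as above.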
Since iOS 11, Apple provides a lighter-weight API for sending renders to the buffer. Use whichever form makes sense to you.
//render into the metal texture
let destination = CIRenderDestination(mtlTexture: currentDrawable.texture,
commandBuffer: commandBuffer)
do {
try self.ciContext.startTask(toRender: ciImage, to: destination)
} catch {
print(error)
}
The CIRenderDestination has some other optional parameters such as height and width, but it will always display with an origin of 0,0. So, unlike the earlier call, if you need to rotate or change the dimensions of the image, you will need to do it to the CIImage earlier in the code. However, if your app always displays the video at full size in the MTKView, this may be easier. Also, the startTask call will return immediately, instead of waiting for other tasks on the CPU to complete.
It is important to remember that you can have any number of renders in the commandBuffer before the calls to .present and .commit. This means that the same MTKView can display video from multiple streams as well as static images or animations.
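Whether you issue several render calls or flatten everything into a single CIImage first is up to you. As a point of comparison, a Core-Image-side composite might look like the following sketch, which assumes a second stream's frame is available in an overlayImage variable.
//shrink the second frame; it keeps its origin at the lower-left corner of the composite
let smallOverlay = overlayImage.transformed(by: CGAffineTransform(scaleX: 0.25, y: 0.25))
//place it over the main frame, then render the result with a single call
let composite = smallOverlay.composited(over: ciImage)
ciContext.render(composite,
                 to: currentDrawable.texture,
                 commandBuffer: commandBuffer,
                 bounds: CGRect(origin: .zero, size: view.drawableSize),
                 colorSpace: CGColorSpaceCreateDeviceRGB())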
Working with Local Files
In order to extract individual frames from a local file, we can use an AVAssetReader to get pixel data to render in the MTKView.
When using higher-level frameworks, a quick way to display a video file for playback is to load it into an AVPlayer. An alternative, however, is to use an AVAssetReader to read the tracks and the individual frames from the video tracks.
let asset = AVAsset(url: Bundle.main.url(forResource: "grocery-train", withExtension: "mov")!)
let reader = try! AVAssetReader(asset: asset)
guard let track = asset.tracks(withMediaType: .video).last else {
return
}
let outputSettings: [String: Any] = [
kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32ARGB
]
let trackOutput = AVAssetReaderTrackOutput(track: track, outputSettings: outputSettings)
reader.add(trackOutput)
reader.startReading()
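If you are not sure which format a track actually uses, a quick sketch like the following can help; it simply prints the track's media subtype (a FourCharCode) so you can compare it against your outputSettings.
//inspect the track so the reader's outputSettings can match it
if let description = track.formatDescriptions.first {
    let formatDescription = description as! CMFormatDescription
    print("media subtype:", CMFormatDescriptionGetMediaSubType(formatDescription))
}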
It is important that the outputSettings of the reader match the image format of the video track. If there is a mismatch, the color may be off or the reader may be unable to extract a usable buffer. If your code is generating empty or black buffers, the outputSettings are the first place to troubleshoot. Once the reader starts reading, it can extract pixel buffers. You can use a while loop to get all of the buffers from the track and send them to the MTKView for display.
var sampleBuffer = trackOutput.copyNextSampleBuffer()
while sampleBuffer != nil {
guard let cvBuffer = CMSampleBufferGetImageBuffer(sampleBuffer!) else {
return
}
print(track.preferredTransform)
print(sampleBuffer?.outputPresentationTimeStamp)
//get a CIImage out of the CVImageBuffer
cleanImage = CIImage(cvImageBuffer: cvBuffer)
displayView.draw()
sampleBuffer = trackOutput.copyNextSampleBuffer()
}
In the code above, we use CMSampleBufferGetImageBuffer to ensure that we have a valid image buffer. Then we create a CIImage from it and call the .draw method of the MTKView. Afterwards, we get the next sample buffer. This continues until the end of the track, when .copyNextSampleBuffer() returns nil.
In the code above, there are two print statements that note some valuable information your app may want to store for later use. Remember, when working with buffers and the GPU directly, much of the higher-level metadata about the video is lost, so you’ll need to keep track of it in your code if you want to use it later. An image may be rotated because of the way it was originally generated in the video. The preferredTransform will provide data that your app can use to transform the image to the “proper” orientation before display. Additionally, the outputPresentationTimeStamp tells you when the image represented by the buffer appeared in the original video. This can be helpful when you are trying to sync individual frames back to audio tracks or if your app only wants to modify specific frames and then reinsert them into the original clip.
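As an example of using that metadata, one common approach (sketched here, not taken from the tutorial project) is to apply preferredTransform to the CIImage and then shift the result so its extent starts back at the origin.
//rotate the frame the way the track says it should be displayed
var orientedImage = CIImage(cvImageBuffer: cvBuffer)
    .transformed(by: track.preferredTransform)
//the transform can move the extent away from (0, 0), so shift it back
orientedImage = orientedImage.transformed(by: CGAffineTransform(
    translationX: -orientedImage.extent.origin.x,
    y: -orientedImage.extent.origin.y))
cleanImage = orientedImage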
Working with Streams
In addition to local files, an app may want to use a video stream as the input. The process is largely the same, as there is a method to extract a CVPixelBuffer from an AVPlayer that is streaming. You can find the complete example code for pushing pixel buffers from a stream to an MTKView in this blog post about streaming. The important part of the code should look familiar by now:
let currentTime = playerItemVideoOutput.itemTime(forHostTime: CACurrentMediaTime())
if playerItemVideoOutput.hasNewPixelBuffer(forItemTime: currentTime) {
if let buffer = playerItemVideoOutput.copyPixelBuffer(forItemTime: currentTime, itemTimeForDisplay: nil) {
let frameImage = CIImage(cvImageBuffer: buffer)
self.currentFrame = frameImage //a CIImage var
self.videoView.draw() //our MTKView
}
}
The code above uses a CADisplayLink to query the stream on a regular basis for new video frames. It then extracts a buffer and sends it to the MTKView for display.
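For context, the setup behind that snippet might look roughly like the sketch below; the playerItem and displayLink variables and the readBuffer(_:) selector are illustrative names, and the full version lives in the streaming post linked above.
//ask the player item for BGRA pixel buffers that Core Image can use
let attributes: [String: Any] = [
    kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA
]
playerItemVideoOutput = AVPlayerItemVideoOutput(pixelBufferAttributes: attributes)
playerItem.add(playerItemVideoOutput)
//poll for a new frame on every screen refresh
displayLink = CADisplayLink(target: self, selector: #selector(readBuffer(_:)))
displayLink.add(to: .main, forMode: .common)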
Working with the Cameras
Much like a video stream uses an AVPlayerItemVideoOutput to output pixel buffer data, the standard object to use with the camera is an AVCaptureVideoDataOutput. First, you need to set up a standard capture session for either the front or back camera. Then attach a video data output object to the session.
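A minimal sketch of that session setup might look like the following; it assumes the back wide-angle camera and that camera permission has already been granted. The tutorial's video data output is then attached to the session right after.
//build the capture session and wire in the back camera
captureSession = AVCaptureSession()
captureSession.sessionPreset = .high
guard let camera = AVCaptureDevice.default(.builtInWideAngleCamera,
                                           for: .video,
                                           position: .back),
      let cameraInput = try? AVCaptureDeviceInput(device: camera),
      captureSession.canAddInput(cameraInput) else {
    fatalError("camera configuration failed")
}
captureSession.addInput(cameraInput)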
videoOutput = AVCaptureVideoDataOutput()
let videoQueue = DispatchQueue(label: "captureQueue", qos: .userInteractive)
videoOutput.setSampleBufferDelegate(self, queue: videoQueue)
if captureSession.canAddOutput(videoOutput) {
captureSession.addOutput(videoOutput)
} else {
fatalError("configuration failed")
}
Whenever the camera has collected enough data to generate a pixel buffer, it will send that buffer to its delegate. The delegate can then convert that data to a CIImage and send it to the MTKView to be rendered. The method in an AVCaptureVideoDataOutputSampleBufferDelegate would look like this:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
//get a CVImageBuffer from the camera
guard let cvBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
return
}
//get a CIImage out of the CVImageBuffer
cleanImage = CIImage(cvImageBuffer: cvBuffer)
displayView.draw()
}
Going Further
In this tutorial, you saw how to extract pixel buffers from the camera, local files and remote streams and convert them to CIImage objects. Then you saw how to render a CIImage to fill all or part of an MTKView. If your application needs to resize or apply filters to the images, you can do that before calling the .draw method of the MTKView. If the only reason you are considering an MTKView is a Metal or OpenGL kernel you want to use, consider our tutorial on how to wrap kernels into Core Image filters.
If your application needs to gather streams of video, filter them and then render them to the screen with precision, then drawing to an MTKView may be sufficient. However, if you want to let your users dictate how and where to filter the streams, then you may want to consider an SDK like VideoEditor SDK or CreativeEditor SDK.