<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Core ML – IMG.LY Blog</title><description>Posts tagged Core ML on the IMG.LY blog.</description><link>https://img.ly/blog/tag/core-ml/</link><language>en-us</language><image><url>https://img.ly/apple-touch-icon.png</url><title>Core ML – IMG.LY Blog</title><link>https://img.ly/blog/tag/core-ml/</link></image><atom:link href="https://img.ly/blog/tag/core-ml/rss.xml" rel="self" type="application/rss+xml"/><generator>Astro</generator><lastBuildDate>Wed, 24 Jun 2026 09:29:46 GMT</lastBuildDate><ttl>60</ttl><item><title>How to Remove Backgrounds Using Core ML</title><link>https://img.ly/blog/how-to-remove-backgrounds-using-coreml/</link><guid isPermaLink="true">https://img.ly/blog/how-to-remove-backgrounds-using-coreml/</guid><description>Bring background removal and replacement to your iOS application.</description><pubDate>Tue, 28 Jun 2022 15:26:53 GMT</pubDate><content:encoded>&lt;p&gt;In this tutorial, you will learn how to use machine learning to identify an image background and mask it out. This is particularly useful for making stickers or avatars or adding a fake background to a video call. The process of assigning the pixels in an image to a specific object is called “segmentation”. Apple provides an optimized method with pictures of people in its &lt;a href=&quot;https://developer.apple.com/documentation/vision&quot;&gt;Vision framework&lt;/a&gt;. For performing the same tasks with non-human subjects, you can use the &lt;a href=&quot;https://github.com/tensorflow/models/tree/master/research/deeplab&quot;&gt;DeepLabV3 machine learning model with Core ML&lt;/a&gt;. The code examples in this tutorial have been tested using Xcode 13 and Swift 5. Because of the use of Core ML, Vision, and CoreImage in this tutorial, you should run the demo code on a device, not on the Simulator. An iOS project with the demo code is &lt;a href=&quot;https://github.com/waltertyree/super-eureka&quot;&gt;on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Image segmentation is a different process and requires other machine learning models than image recognition. With recognition, the model produces bounding rectangles that the system believes to contain the entire object. With segmentation, the model identifies the actual pixels of the object.&lt;/p&gt;
&lt;p&gt;In this tutorial, you’ll start with an image of your subject. Then you’ll generate a mask for the background using Vision. Finally, you’ll use CoreImage filters to blend the original image, image mask, and the new image background.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;segmentation-process-1&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 700px) 700px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;700&quot; height=&quot;666&quot; src=&quot;https://img.ly/_astro/segmentation-process-1_Z2lEfI5.webp&quot; srcset=&quot;/_astro/segmentation-process-1_2i2R9H.webp 640w, /_astro/segmentation-process-1_Z2lEfI5.webp 700w&quot;&gt;&lt;/p&gt;
&lt;p&gt;Whether using the Vision framework alone or supplementing Vision with another Core ML model – the process will be the same:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create a Vision request object with some parameters.&lt;/li&gt;
&lt;li&gt;Create a Vision request handler with the image to be processed.&lt;/li&gt;
&lt;li&gt;Process the image with the handler and object.&lt;/li&gt;
&lt;li&gt;Process the results into the final image using CoreImage.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When working with still images, Apple’s CoreImage framework is usually the best option, mainly because of the large number of available filters and the ability to create reusable pipelines that you can run on the CPU or the GPU.&lt;/p&gt;
&lt;h2 id=&quot;using-vision-to-segment-people&quot;&gt;Using Vision to Segment People&lt;/h2&gt;
&lt;p&gt;Without an external model, Vision can only segment documents or people in an image. It is vital to understand that Vision will only identify a pixel in the image as “this pixel is part of a person” or “this pixel is not part of a person.” If an image contains a group of people, Vision will not be able to separate individuals.&lt;/p&gt;
&lt;p&gt;To segment the image, start by creating an instance of a &lt;code&gt;VNGeneratePersonSegmentationRequest&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&quot;astro-code github-dark&quot; tabindex=&quot;0&quot; data-language=&quot;swift&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;var&lt;/span&gt;&lt;span&gt; segmentationRequest &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; VNGeneratePersonSegmentationRequest&lt;/span&gt;&lt;span&gt;()&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;segmentationRequest.qualityLevel &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; .balanced&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This type of request has a few options you can set. &lt;code&gt;.qualityLevel&lt;/code&gt; can be &lt;code&gt;.fast&lt;/code&gt;, &lt;code&gt;.balanced&lt;/code&gt; or &lt;code&gt;.accurate&lt;/code&gt;. The level of &lt;code&gt;.accurate&lt;/code&gt; is the default. This will determine how closely the mask conforms to the boundaries of the original image. The different levels process at different speeds. The &lt;code&gt;.fast&lt;/code&gt; setting is intended for use in a video so that frames don’t get dropped. Using &lt;code&gt;.balanced&lt;/code&gt; or &lt;code&gt;.accurate&lt;/code&gt; causes noticeable delay in an app processing the image on most devices. Experiment with different settings depending on your needs.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;speed-compare&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 700px) 700px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;700&quot; height=&quot;516&quot; src=&quot;https://img.ly/_astro/speed-compare_Z1SlWxp.webp&quot; srcset=&quot;/_astro/speed-compare_ywD9u.webp 640w, /_astro/speed-compare_Z1SlWxp.webp 700w&quot;&gt;&lt;/p&gt;
&lt;p&gt;The output of the segmentation request will be a &lt;code&gt;CVPixelBuffer&lt;/code&gt;. This is a structure that contains information for each pixel of the image. Most of Apple’s video frameworks as well as CoreImage can work with pixel buffers. The default for &lt;code&gt;VNGeneratePersonSegmentationRequest&lt;/code&gt; is a buffer where the color of each pixel is represented by an 8-bit number. Any pixel that Vision thinks contains part of a person will be white and any not-a-person will be black and represented by zero. This will be exactly what you want for generating a mask to work with &lt;code&gt;CIBlendWithMask&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Next, create a &lt;code&gt;VNImageRequestHandler&lt;/code&gt; with the image to be processed. The handler class has a &lt;a href=&quot;https://developer.apple.com/documentation/vision/vnimagerequesthandler&quot;&gt;number of initializers&lt;/a&gt; for different types of data. In this example we will use &lt;code&gt;CGImage&lt;/code&gt;, but you could also start with &lt;code&gt;CIImage&lt;/code&gt;, &lt;code&gt;CVPixelbuffer&lt;/code&gt;, &lt;code&gt;Data&lt;/code&gt;, or others. You can also specify options for the handler. In this tutorial we will not specify any, but one of the options is to pass in a &lt;code&gt;CIContext&lt;/code&gt;. This can help with performance as you can tell your app to do all of the processing with a single context on the GPU using Metal.&lt;/p&gt;
&lt;pre class=&quot;astro-code github-dark&quot; tabindex=&quot;0&quot; data-language=&quot;swift&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;guard&lt;/span&gt;&lt;span&gt; let&lt;/span&gt;&lt;span&gt; originalCG &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; originalImage&lt;/span&gt;&lt;span&gt;?&lt;/span&gt;&lt;span&gt;.cgImage &lt;/span&gt;&lt;span&gt;else&lt;/span&gt;&lt;span&gt; { &lt;/span&gt;&lt;span&gt;abort&lt;/span&gt;&lt;span&gt;() }&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;let&lt;/span&gt;&lt;span&gt; handler &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; VNImageRequestHandler&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;cgImage&lt;/span&gt;&lt;span&gt;: originalCG)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;try?&lt;/span&gt;&lt;span&gt; handler.&lt;/span&gt;&lt;span&gt;perform&lt;/span&gt;&lt;span&gt;([segmentationRequest])&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;guard&lt;/span&gt;&lt;span&gt; let&lt;/span&gt;&lt;span&gt; maskPixelBuffer &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  segmentationRequest.results&lt;/span&gt;&lt;span&gt;?&lt;/span&gt;&lt;span&gt;.&lt;/span&gt;&lt;span&gt;first&lt;/span&gt;&lt;span&gt;?&lt;/span&gt;&lt;span&gt;.pixelBuffer &lt;/span&gt;&lt;span&gt;else&lt;/span&gt;&lt;span&gt; { &lt;/span&gt;&lt;span&gt;return&lt;/span&gt;&lt;span&gt; }&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;let&lt;/span&gt;&lt;span&gt; maskImage &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; CGImage.&lt;/span&gt;&lt;span&gt;create&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;pixelBuffer&lt;/span&gt;&lt;span&gt;: maskPixelBuffer)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the code above, the &lt;code&gt;originalImage&lt;/code&gt; (which happens to be a UIImage) gets converted to a CGImage. Then we use the image to initialize a request handler. The &lt;code&gt;VNImageRequestHandler&lt;/code&gt; has a method &lt;code&gt;.perform&lt;/code&gt; which takes an array of all of the Vision requests you want to use to process the image. The &lt;code&gt;.perform&lt;/code&gt; method will not return until all of the requests in the array have been completed. Depending on how you want to structure your code, you can either provide a completion handler to use with each of the requests or just process the results in-line. In this example, we’ll process in-line.&lt;/p&gt;
&lt;p&gt;If the &lt;code&gt;segmentationRequest&lt;/code&gt; found any people in the image, the &lt;code&gt;results&lt;/code&gt; array will contain a &lt;code&gt;.pixelBuffer&lt;/code&gt;. Create the mask we need by converting the &lt;code&gt;pixelBuffer&lt;/code&gt; into a CGImage. To convert to a CGImage, the example uses some helper methods that &lt;a href=&quot;https://github.com/hollance/CoreMLHelpers&quot;&gt;Matthijs Hollemans published to GitHub&lt;/a&gt; specifically for working with Core ML inputs and outputs.&lt;/p&gt;
&lt;p&gt;Now that we have the mask image, use &lt;code&gt;CIFilter&lt;/code&gt; to compose the final image. It will take a few steps. First, resize the new background, mask, and original image to the same size. Then, blend the three images.&lt;/p&gt;
&lt;pre class=&quot;astro-code github-dark&quot; tabindex=&quot;0&quot; data-language=&quot;swift&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;//Convert main image to a CIImage and get the size&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;let&lt;/span&gt;&lt;span&gt; mainImage &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; CIImage&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;cgImage&lt;/span&gt;&lt;span&gt;: &lt;/span&gt;&lt;span&gt;self&lt;/span&gt;&lt;span&gt;.originalImage&lt;/span&gt;&lt;span&gt;!&lt;/span&gt;&lt;span&gt;.cgImage&lt;/span&gt;&lt;span&gt;!&lt;/span&gt;&lt;span&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;let&lt;/span&gt;&lt;span&gt; originalSize &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; mainImage.extent.&lt;/span&gt;&lt;span&gt;size&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;//Convert the maskimage to CIImage and set the size&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;//to be the same as the original&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;var&lt;/span&gt;&lt;span&gt; maskCI &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; CIImage&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;cgImage&lt;/span&gt;&lt;span&gt;: maskImage&lt;/span&gt;&lt;span&gt;!&lt;/span&gt;&lt;span&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;let&lt;/span&gt;&lt;span&gt; scaleX &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; originalSize.width &lt;/span&gt;&lt;span&gt;/&lt;/span&gt;&lt;span&gt; maskCI.extent.width&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;let&lt;/span&gt;&lt;span&gt; scaleY &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; originalSize.height &lt;/span&gt;&lt;span&gt;/&lt;/span&gt;&lt;span&gt; maskCI.extent.height&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;maskCI &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; maskCI.&lt;/span&gt;&lt;span&gt;transformed&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;by&lt;/span&gt;&lt;span&gt;: .&lt;/span&gt;&lt;span&gt;init&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;scaleX&lt;/span&gt;&lt;span&gt;: scaleX, &lt;/span&gt;&lt;span&gt;y&lt;/span&gt;&lt;span&gt;: scaleY))&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;//Convert the new background to a CIImage and set the size&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;//to be the same as the original&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;let&lt;/span&gt;&lt;span&gt; backgroundUIImage &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; UIImage&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;named&lt;/span&gt;&lt;span&gt;: &lt;/span&gt;&lt;span&gt;&quot;starfield&quot;&lt;/span&gt;&lt;span&gt;)&lt;/span&gt;&lt;span&gt;!&lt;/span&gt;&lt;span&gt;.&lt;/span&gt;&lt;span&gt;resized&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;to&lt;/span&gt;&lt;span&gt;: originalSize)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;let&lt;/span&gt;&lt;span&gt; background &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; CIImage&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;cgImage&lt;/span&gt;&lt;span&gt;: backgroundUIImage.cgImage&lt;/span&gt;&lt;span&gt;!&lt;/span&gt;&lt;span&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;//Use CIBlendWithMask to combine the three images&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;let&lt;/span&gt;&lt;span&gt; filter &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; CIFilter&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;name&lt;/span&gt;&lt;span&gt;: &lt;/span&gt;&lt;span&gt;&quot;CIBlendWithMask&quot;&lt;/span&gt;&lt;span&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;filter&lt;/span&gt;&lt;span&gt;?&lt;/span&gt;&lt;span&gt;.&lt;/span&gt;&lt;span&gt;setValue&lt;/span&gt;&lt;span&gt;(background, &lt;/span&gt;&lt;span&gt;forKey&lt;/span&gt;&lt;span&gt;: kCIInputBackgroundImageKey)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;filter&lt;/span&gt;&lt;span&gt;?&lt;/span&gt;&lt;span&gt;.&lt;/span&gt;&lt;span&gt;setValue&lt;/span&gt;&lt;span&gt;(mainImage, &lt;/span&gt;&lt;span&gt;forKey&lt;/span&gt;&lt;span&gt;: kCIInputImageKey)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;filter&lt;/span&gt;&lt;span&gt;?&lt;/span&gt;&lt;span&gt;.&lt;/span&gt;&lt;span&gt;setValue&lt;/span&gt;&lt;span&gt;(maskCI, &lt;/span&gt;&lt;span&gt;forKey&lt;/span&gt;&lt;span&gt;: kCIInputMaskImageKey)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;//Update the UI&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;self&lt;/span&gt;&lt;span&gt;.filteredImageView.&lt;/span&gt;&lt;span&gt;image&lt;/span&gt;&lt;span&gt; =&lt;/span&gt;&lt;span&gt; UIImage&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;ciImage&lt;/span&gt;&lt;span&gt;: filter&lt;/span&gt;&lt;span&gt;!&lt;/span&gt;&lt;span&gt;.outputImage&lt;/span&gt;&lt;span&gt;!&lt;/span&gt;&lt;span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The above code resizes the images to match the size of the original image and converts them to &lt;code&gt;CIImage&lt;/code&gt;. Many machine learning models commonly resize inputs and outputs. For example, the output of the &lt;code&gt;VNGeneratePersonSegmentationRequest&lt;/code&gt; using the demo image is 384x512.&lt;/p&gt;
&lt;p&gt;Both CIImage and UIImage formats describe an image but do not always have a bitmap representation. That is why each image gets converted to a CGImage first. Though this guarantees that the demo code will work, converting formats makes the code run slower. In your app, you should experiment with different methods to get your images into CIImage format for filtering. Also, the above code uses Matthijs’ &lt;code&gt;resized&lt;/code&gt; helper method to do some of the resizing.&lt;/p&gt;
&lt;p&gt;With everything resized, the &lt;a href=&quot;https://developer.apple.com/documentation/coreimage/ciblendwithmask&quot;&gt;CIBlendWithMask&lt;/a&gt; filter stitches the images together. In place of the black pixels in the mask, the final image will show the background – for white pixels, the foreground. Wherever the mask image has a gray pixel, the final image will blend background and foreground.&lt;/p&gt;
&lt;p&gt;Now let’s see how to use an additional &lt;code&gt;CoreML&lt;/code&gt; model to process images that don’t have a person as the main subject.&lt;/p&gt;
&lt;h2 id=&quot;choosing-a-model&quot;&gt;Choosing a Model&lt;/h2&gt;
&lt;p&gt;Apple provides a number of pre-built models for text and image processing. The DeepLab v3 Machine Learning model can perform segmentation requests on images that have subjects like dogs. You can download it from &lt;a href=&quot;https://developer.apple.com/machine-learning/models/&quot;&gt;Apple directly&lt;/a&gt; or also find it at the &lt;a href=&quot;https://github.com/tensorflow/models/tree/master/research/deeplab&quot;&gt;DeepLab repo on GitHub&lt;/a&gt;. Once you have downloaded the model, add it to your Xcode project the same as any other file. The DeepLab model will recognize people, the same as the &lt;code&gt;VNPersonSegmentationRequest&lt;/code&gt;, but also other objects. This does come at a cost in an increased filesize for your application. You may notice that Apple provides multiple versions of the model of different file sizes. They all perform the same task, but differ in how they represent the output and a few other things.&lt;/p&gt;
&lt;p&gt;Models on Apple’s website are packaged to work with Core ML and Xcode, so you can easily try a different model. The input and output method names will be the same. It is beyond the scope of this tutorial, yet Apple provides tutorials and example scripts on how to convert TensorFlow and other machine learning models to work with Core ML.&lt;/p&gt;
&lt;h3 id=&quot;core-ml-and-xcode&quot;&gt;Core ML and Xcode&lt;/h3&gt;
&lt;p&gt;When you are working with a Core ML model, Xcode provides some convenient tools. Access them by highlighting the name of the model in the File Navigator pane of Xcode. For instance, clicking “Preview” will let you test the model with your data.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;testing-screenshot&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 700px) 700px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;700&quot; height=&quot;412&quot; src=&quot;https://img.ly/_astro/testing-screenshot_Z2l2s2a.webp&quot; srcset=&quot;/_astro/testing-screenshot_8v5CS.webp 640w, /_astro/testing-screenshot_Z2l2s2a.webp 700w&quot;&gt;&lt;/p&gt;
&lt;p&gt;You can also use the “Predictions” tab to determine how the input needs to be formatted and what to expect for the output.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;predictions-screenshot&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 700px) 700px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;700&quot; height=&quot;355&quot; src=&quot;https://img.ly/_astro/predictions-screenshot_Z1Bf2h7.webp&quot; srcset=&quot;/_astro/predictions-screenshot_Zh7lHX.webp 640w, /_astro/predictions-screenshot_Z1Bf2h7.webp 700w&quot;&gt;&lt;/p&gt;
&lt;p&gt;Here you can see that the DeepLabV3 model expects images to be 513 x 513 and will output a multidimensional array of integers that is 513 x 513 in size.&lt;/p&gt;
&lt;p&gt;Finally, on this screen, you can see an entry for the “Model Class” in the headers. Double-click on the name to jump into the Swift wrapper class for the model to see how to call the model in your code and work with the inputs and outputs.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;model-class&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 418px) 418px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;418&quot; height=&quot;85&quot; src=&quot;https://img.ly/_astro/model-class_1pbhcQ.webp&quot; srcset=&quot;/_astro/model-class_1pbhcQ.webp 418w&quot;&gt;&lt;/p&gt;
&lt;h3 id=&quot;the-deeplabv3-model&quot;&gt;The DeepLabV3 Model&lt;/h3&gt;
&lt;p&gt;The DeepLab model input will be a color image that is 513 x 513 pixels. The Vision framework will handle resizing the input, but you can provide options on how that resize should work. The DeepLabV3 model has been trained to recognize and segment these items:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;aeroplane&lt;/li&gt;
&lt;li&gt;bicycle&lt;/li&gt;
&lt;li&gt;bird&lt;/li&gt;
&lt;li&gt;boat&lt;/li&gt;
&lt;li&gt;bottle&lt;/li&gt;
&lt;li&gt;bus&lt;/li&gt;
&lt;li&gt;car&lt;/li&gt;
&lt;li&gt;cat&lt;/li&gt;
&lt;li&gt;chair&lt;/li&gt;
&lt;li&gt;cow&lt;/li&gt;
&lt;li&gt;dining table&lt;/li&gt;
&lt;li&gt;dog&lt;/li&gt;
&lt;li&gt;horse&lt;/li&gt;
&lt;li&gt;motorbike&lt;/li&gt;
&lt;li&gt;person&lt;/li&gt;
&lt;li&gt;potted plant&lt;/li&gt;
&lt;li&gt;sheep&lt;/li&gt;
&lt;li&gt;sofa&lt;/li&gt;
&lt;li&gt;train&lt;/li&gt;
&lt;li&gt;tv or monitor&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Anything that the model does not recognize, it will consider a &lt;em&gt;background&lt;/em&gt;. After performing the recognition, the model returns a two-dimensional 513 x 513 array. Each entry in the array corresponds to one pixel in the original 513 x 513 input image. For instance, every pixel in the original image that shows a dog will be represented in the output by a &lt;code&gt;12&lt;/code&gt; in the corresponding array entry. Any pixel that is not a recognized object will be given a &lt;code&gt;0&lt;/code&gt; in the output array to represent a &lt;em&gt;background&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;If the model recognizes multiple objects, there will be multiple numbers in the array. If that is not what you want, you need to make changes to the array before creating the mask. For example, using an image of a dog riding a horse, some entries in the array will be &lt;code&gt;12&lt;/code&gt;, others will be &lt;code&gt;13&lt;/code&gt;, and the rest will be &lt;code&gt;0&lt;/code&gt;. To filter out the horse, you need to loop through the array and change any &lt;code&gt;13&lt;/code&gt;s to &lt;code&gt;0&lt;/code&gt;s.&lt;/p&gt;
&lt;h2 id=&quot;segmentation-with-deeplab&quot;&gt;Segmentation with DeepLab&lt;/h2&gt;
&lt;p&gt;To use the DeepLab model in your code, you again use a request to the Vision framework but this time you can specify a model.&lt;/p&gt;
&lt;pre class=&quot;astro-code github-dark&quot; tabindex=&quot;0&quot; data-language=&quot;swift&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;let&lt;/span&gt;&lt;span&gt; config &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; MLModelConfiguration&lt;/span&gt;&lt;span&gt;()&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;var&lt;/span&gt;&lt;span&gt; segmentationModel &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; try!&lt;/span&gt;&lt;span&gt; DeepLabV3&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;configuration&lt;/span&gt;&lt;span&gt;: config)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;if&lt;/span&gt;&lt;span&gt; let&lt;/span&gt;&lt;span&gt; visionModel &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; try?&lt;/span&gt;&lt;span&gt; VNCoreMLModel&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;for&lt;/span&gt;&lt;span&gt;: segmentationModel.model) {&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  self&lt;/span&gt;&lt;span&gt;.request &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; VNCoreMLRequest&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;model&lt;/span&gt;&lt;span&gt;: visionModel)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  self&lt;/span&gt;&lt;span&gt;.request&lt;/span&gt;&lt;span&gt;?&lt;/span&gt;&lt;span&gt;.imageCropAndScaleOption &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; .scaleFill&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the code above, we create an instance of the DeepLab Core ML object and then initialize a generic &lt;code&gt;VNCoreMLRequest&lt;/code&gt; with its model. The only option we set on the request is to tell it how to modify the image when it resizes it for input.&lt;/p&gt;
&lt;p&gt;Now the code is very similar to the first example.&lt;/p&gt;
&lt;pre class=&quot;astro-code github-dark&quot; tabindex=&quot;0&quot; data-language=&quot;swift&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;let&lt;/span&gt;&lt;span&gt; cgImage &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; originalImage&lt;/span&gt;&lt;span&gt;?&lt;/span&gt;&lt;span&gt;.cgImage&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;let&lt;/span&gt;&lt;span&gt; handler &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; VNImageRequestHandler&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;cgImage&lt;/span&gt;&lt;span&gt;: cgImage, &lt;/span&gt;&lt;span&gt;options&lt;/span&gt;&lt;span&gt;: [&lt;/span&gt;&lt;span&gt;:&lt;/span&gt;&lt;span&gt;])&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;try?&lt;/span&gt;&lt;span&gt; handler.&lt;/span&gt;&lt;span&gt;perform&lt;/span&gt;&lt;span&gt;([request])&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;if&lt;/span&gt;&lt;span&gt; let&lt;/span&gt;&lt;span&gt; observations &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; request.results &lt;/span&gt;&lt;span&gt;as?&lt;/span&gt;&lt;span&gt; [VNCoreMLFeatureValueObservation],&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  let&lt;/span&gt;&lt;span&gt; segmentationmap &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; observations.&lt;/span&gt;&lt;span&gt;first&lt;/span&gt;&lt;span&gt;?&lt;/span&gt;&lt;span&gt;.featureValue.multiArrayValue {&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  guard&lt;/span&gt;&lt;span&gt; let&lt;/span&gt;&lt;span&gt; maskUIImage &lt;/span&gt;&lt;span&gt;=&lt;/span&gt;&lt;span&gt; segmentationmap.&lt;/span&gt;&lt;span&gt;image&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;min&lt;/span&gt;&lt;span&gt;: &lt;/span&gt;&lt;span&gt;0.0&lt;/span&gt;&lt;span&gt;, &lt;/span&gt;&lt;span&gt;max&lt;/span&gt;&lt;span&gt;: &lt;/span&gt;&lt;span&gt;1.0&lt;/span&gt;&lt;span&gt;) &lt;/span&gt;&lt;span&gt;else&lt;/span&gt;&lt;span&gt; { &lt;/span&gt;&lt;span&gt;return&lt;/span&gt;&lt;span&gt; }&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  applyBackgroundMask&lt;/span&gt;&lt;span&gt;(maskUIImage)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because we are using a generic &lt;code&gt;VNCoreMLRequest&lt;/code&gt; we have to cast the results as &lt;code&gt;VNCoreMLFeatureValueObservation&lt;/code&gt;. The DeepLab model returns a &lt;code&gt;.multiArrayValue&lt;/code&gt; instead of a &lt;code&gt;.pixelBuffer&lt;/code&gt; so we will again rely on the helper methods to convert it to an image. Now that the mask image has been created, applying the background is the same as in the original example.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;dog-in-space-1&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 400px) 400px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;400&quot; height=&quot;519&quot; src=&quot;https://img.ly/_astro/dog-in-space-1_pxayi.webp&quot; srcset=&quot;/_astro/dog-in-space-1_pxayi.webp 400w&quot;&gt;&lt;/p&gt;
&lt;h2 id=&quot;going-further&quot;&gt;Going Further&lt;/h2&gt;
&lt;p&gt;This tutorial focused on still images, yet both the standard Vision segmentation requests and DeepLabV3 allow processing video input as they can work with &lt;code&gt;CVPixelBuffer&lt;/code&gt; and &lt;code&gt;VNSequenceRequestHandler&lt;/code&gt;.&lt;br&gt;
The rest of the process will be almost identical to our example. You will need to finetune the performance, or else you will experience frame drops.&lt;/p&gt;
&lt;p&gt;Apple provides another method for identifying the background and primary subject in a still image: Portrait mode. When a device renders Portrait mode, it takes pictures with all cameras on the device and stitches them together. This way, the depth of field can change, and you can adjust what items are in focus or blurred.&lt;/p&gt;
&lt;p&gt;If the DeepLab model does not cover your subject matter, you can train the DeepLabV3 model to recognize other objects using your own data.&lt;/p&gt;
&lt;p&gt;However, you may consider using SDKs such as &lt;a href=&quot;https://img.ly/products/photo-sdk/&quot;&gt;PE.SDK&lt;/a&gt; and &lt;a href=&quot;https://img.ly/products/video-sdk/&quot;&gt;VE.SDK&lt;/a&gt;. Enabling background removal for your users is easy — all it takes is a few lines in your configuration as described in the &lt;a href=&quot;https://img.ly/docs/pesdk/ios/guides/background-removal/?utm_source=imgly&amp;#x26;utm_medium=blog&amp;#x26;utm_campaign=howtos&quot;&gt;official PE.SDK documentation for iOS&lt;/a&gt; and &lt;a href=&quot;https://img.ly/docs/pesdk/android/guides/background-removal/?utm_source=imgly&amp;#x26;utm_medium=blog&amp;#x26;utm_campaign=howtos&quot;&gt;Android&lt;/a&gt;, which adds a control to the photo editor to toggle the setting.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;background-removal-photo-editor-app-pe-sdk-1&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 277px) 277px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;277&quot; height=&quot;600&quot; src=&quot;https://img.ly/_astro/background-removal-photo-editor-app-pe-sdk-1_21B1QF.webp&quot; srcset=&quot;/_astro/background-removal-photo-editor-app-pe-sdk-1_21B1QF.webp 277w&quot;&gt;&lt;/p&gt;
&lt;p&gt;Integrate a fully customizable photo editor with Background Removal into your app with PE.SDK.&lt;/p&gt;
&lt;p&gt;The latest &lt;a href=&quot;https://img.ly/blog/photo-editor-video-editor-sdk-v_10-11-release-notes/?utm_source=imgly&amp;#x26;utm_medium=blog&amp;#x26;utm_campaign=howtos&quot;&gt;VE.SDK release&lt;/a&gt; introduced Background Removal for stickers. This feature recognizes people in pictures and removes the background with one tap – no need for manual outlines or masks.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;background-removal-photo-editor-app-pe-sdk-video&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 277px) 277px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;277&quot; height=&quot;600&quot; src=&quot;https://img.ly/_astro/background-removal-photo-editor-app-pe-sdk-video_1flT3l.webp&quot; srcset=&quot;/_astro/background-removal-photo-editor-app-pe-sdk-video_1flT3l.webp 277w&quot;&gt;&lt;/p&gt;
&lt;p&gt;Let your users create beautiful visuals with Sticker Background Removal.&lt;/p&gt;
&lt;p&gt;This feature is available on Android and iOS 15.0 and higher only. See the official documentation for Sticker Background Removal on &lt;a href=&quot;https://img.ly/docs/vesdk/android/guides/stickers/?utm_source=imgly&amp;#x26;utm_medium=blog&amp;#x26;utm_campaign=releasenotes#background-removal&quot;&gt;Android&lt;/a&gt; or &lt;a href=&quot;https://img.ly/docs/vesdk/ios/guides/stickers/?utm_source=imgly&amp;#x26;utm_medium=blog&amp;#x26;utm_campaign=releasenotes#background-removal&quot;&gt;iOS&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Using an SDK will simplify and accelerate your app development, as you can save time and resources, and focus on the growth and innovation of your application instead.&lt;/p&gt;
&lt;h2 id=&quot;wrapping-up&quot;&gt;Wrapping Up&lt;/h2&gt;
&lt;p&gt;In this tutorial, you saw how to use Vision and Core ML to segment an image of a person and remove the background. You also saw how to use DeepLab to work with images of non-persons.&lt;/p&gt;
&lt;p&gt;Thanks for reading! We hope that you found this tutorial helpful. Feel free to reach out on &lt;a href=&quot;https://twitter.com/imgly&quot;&gt;Twitter&lt;/a&gt; with any questions, comments, or suggestions.&lt;/p&gt;</content:encoded><dc:creator>Walter</dc:creator><media:content url="https://blog.img.ly/2022/06/background-remove-removal-editor-app-development.png" medium="image"/><category>How-To</category><category>Background Removal</category><category>Machine Learning</category><category>Photo Editing</category><category>App Development</category><category>iOS App Development</category><category>Core ML</category><category>Tutorial</category></item></channel></rss>