Photography – IMG.LY Blog

On Magic Colors

Pascal — Wed, 17 Jul 2019 00:00:00 GMT

In this article, I will show and explain some pictures with the most magic colors I have encountered as a photographer: the kind of colors that stick in your mind forever.

Apart from my interest in magic colors as a photographer, I am also interested in them as a researcher and developer at img.ly, where we push the quality of photo editing in our PhotoEditor SDK further. This article will therefore be followed by a second one, in which I will show how to use post-production to gently push colors over the edge into the realm of magic colors.

Finding Magic Colors

The best way to capture magic colors is, ‘simply’, to be in the right place at the right time. Both place and time are important, but often not equally so. I will show one example where the place mattered more than the time and one where it was the other way round.

The magic colors from my first example were hidden in the intense, overwhelming, colorful, and loud city of Marrakech. There you can find a quiet place of refuge: Jardin Majorelle, a beautiful garden created by the French painter Jacques Majorelle.

While it is a very relaxing place for the mind and ears, it is also a visually stunning masterpiece. The path through the garden starts with the clean visual appearance of a small bamboo forest with its green and turquoise tones. It then runs past all kinds of palm trees, cacti, small ponds, and fountains. Finally, it leads to the center of the garden, where you are blown away by a striking blue house, Majorelle’s former atelier, outshining the intense blue African sky.

The blue is the famous Majorelle blue: an unreal-looking cobalt blue that is almost too strong to look at.

In the garden, the blue is elaborately complemented with pastel yellow, beige, and turquoise tones, which add a nice contrast and make the blue shine even more.

Surprisingly, around noon was the best time for this photo: the part of the building shown was in the shade under a roof, and only harsh light brought out all the colors.

The magic colors from my second example could be found on a stroll along a beach in the Netherlands.

After a rainy day, the sun suddenly broke through the clouds when it was already low on the horizon. Then the magic happened: the light passed through just the right amount of clouds and haze for its bluish components to be scattered away and its tone to shift towards orange. All the surrounding clouds lit up and the sky began to glow. As a result, everything was bathed in amazing golden light.
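To a first approximation, this color shift can be explained by Rayleigh scattering, whose strength grows with the inverse fourth power of the wavelength, so blue light is scattered out of the direct sunbeam much more than red light. The short sketch below only illustrates that relation; the wavelengths are typical illustrative values, not measurements from this scene, and haze particles scatter less selectively than this idealized model suggests.

```python
def rayleigh_ratio(short_nm: float, long_nm: float) -> float:
    """How many times more strongly light at short_nm is scattered than light
    at long_nm, using the Rayleigh law: intensity ~ 1 / wavelength**4."""
    return (long_nm / short_nm) ** 4


# Illustrative wavelengths: blue at about 450 nm, red at about 700 nm.
ratio = rayleigh_ratio(450.0, 700.0)
print(f"Blue is scattered about {ratio:.1f}x more strongly than red")
# prints: Blue is scattered about 5.9x more strongly than red
```

When the sun is low, its light travels a much longer path through the atmosphere, so this difference accumulates and the light that still reaches you is shifted towards orange.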

These magic colors appeared for only three minutes during the golden hour. As the clouds moved back in front of the sun, the light lost its golden shine. This moment could easily have been missed if I had stayed at home just because of the rain.

Creating Magic Colors

Another way to achieve magic colors is to create them yourself.

The magic colors from my third example were created by photographing the fluorescent mineral wernerite under ultraviolet (UV) lighting. I made the lighting with a DIY ultraviolet lamp consisting of a cheap spotlight housing, a cheap but powerful fluorescent tube for water disinfection, and an expensive filter glass. The filter glass lets only the desired UV wavelengths pass. Without it, the colors would appear washed out, because the fluorescent tube outputs not only invisible UV light but also large amounts of visible light.

The mineral absorbs the UV light and emits visible light instead. The resulting colors look unreal, as the usually boring-looking stone glows all over with great intensity. As a bonus, the blue matches that of Majorelle very closely.

The magic colors from my final example were created by photographing a feather from a blue-and-yellow macaw in my home studio with a simple bidirectional light setup. I used a 50 mm Zeiss Pancolar lens from the 1980s wide open at f/1.8, which adds a nice glow to the yellows. The feather was lit from above and below by two flashes modified with small softboxes. The light was kept off the background, and the shutter speed was set fast enough that the rest of the studio simply disappeared into black.

To keep the color magic on the yellow part of the feather, 20 images focused at different points on the blue part were taken wide open and then stacked into a single image. This kept the blue part sharp while producing the fast drop-off in sharpness, which would have been impossible to create with a single shot.
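Focus stacking is commonly done by measuring local sharpness in every frame and, for each pixel, keeping the value from the frame where it is sharpest. The sketch below illustrates that idea on small grayscale images stored as nested lists. It uses the absolute Laplacian at a single pixel as the sharpness measure, which is an assumption chosen for brevity; it is not the software used for the feather shot.

```python
from typing import List

Image = List[List[float]]  # grayscale image: rows of pixel values


def laplacian_abs(img: Image, y: int, x: int) -> float:
    """Absolute 4-neighbour Laplacian at (y, x); coordinates are clamped at the border."""
    h, w = len(img), len(img[0])

    def px(yy: int, xx: int) -> float:
        return img[min(max(yy, 0), h - 1)][min(max(xx, 0), w - 1)]

    centre = px(y, x)
    return abs(px(y - 1, x) + px(y + 1, x) + px(y, x - 1) + px(y, x + 1) - 4 * centre)


def focus_stack(frames: List[Image]) -> Image:
    """For every pixel, copy the value from the frame with the highest local sharpness.
    Ties go to the earliest frame in the list."""
    h, w = len(frames[0]), len(frames[0][0])
    result: Image = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = max(range(len(frames)), key=lambda i: laplacian_abs(frames[i], y, x))
            result[y][x] = frames[best][y][x]
    return result
```

In practice, stacking tools typically average the sharpness measure over a small window and align the frames first, since refocusing slightly changes the magnification.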

Interestingly, these magic colors could never be seen with the naked eye, as they existed only for approximately 80 μs while the flashes fired. The lens and the post-processing then shaped them into their final appearance.

What Makes Colors Magic?

To sum up, magic colors can be found in many places or even created by yourself. For the four examples in this article, different aspects were important: the right location for the picture of the blue building in Marrakech, the right time as the sun came through the clouds for the golden sea, the subject and the equipment for the fluorescent colors, and the quality of the light and technical realization for the glowing feather.

While writing this article, I asked myself: what exactly makes colors magic? Is it the saturation, vividness, shininess, or glow? Is it the texture of the object, the quality of the light, or the surrounding colors?

All of this is important. However, perhaps the most important condition for colors to look magic is that they appear slightly unreal, but only a little bit. What do you think?

In an upcoming article, I will describe how these thoughts on magic colors influenced the color editing process in our PhotoEditor SDK and share some details on how to gently push colors towards magic in post.

Elbi transforms creative user engagement to help people worldwide

Felix — Wed, 24 Oct 2018 00:00:00 GMT

Elbi Digital:

Founded 2014
Based in the UK
Social enterprise that helps people to connect with charity organizations and do good with micro-donations and content creation

Requirements:

A robust hybrid image editing solution at the core of the app for easy content creation

Results: Easy and seamless content creation with Photos, Doodles, and motivational messages

The Elbi app aims to connect people with charity organizations and beneficiaries alike and to facilitate the social engagement and donation process. Users can support charity organizations via a “Love Button” with micro-donations and get creative for selected good causes. If users don’t donate directly, they can help raise money with their content. To enable their users to produce motivating and exceptional content, Elbi had to provide them with a powerful creative tool. “Our app has photo-editing at the very core of its experience and the editor has been invaluable in making that vision happen. There are no other tools on the market up to the quality of PhotoEditor SDK, so it was an easy choice to have it be at the heart of Elbi,” says Toby Green, former lead developer at Elbi.

For the previous version of the Elbi app, the team built their own creative tool that worked with iOS only. When they decided to rebuild their app, the team at Elbi wanted to follow a different approach. “When we rebuilt the app we wanted it to be a hybrid solution and not only iOS like the previous one, so we abandoned our existing solution and were looking for an existing library. I spent a lot of time researching photo editing tools and did a feasibility study on all the available editors and PhotoEditor SDK came top of the list,” says Toby Green.

With the PhotoEditor SDK, the developers at Elbi were able to rebuild their app swiftly. “We would never have been able to do this so quickly without the editor,” says Green. “The documentation was great so it was very easy to get it all up and running and the support was incredibly patient and helpful. They always responded quickly and were happy to spend time resolving even difficult-to-track-down issues.” The SDK seamlessly integrates into the app, providing an intuitive solution for Elbi’s content creation section. “It really looks like it’s part of the product,” says Toby Green.

“The product is a 10/10; it is the best hybrid tool available. And in general, I would say one of the finest teams I’ve ever worked with; they’ve been very supportive with regards to custom support, adding functionalities and discussing integration and they helped us getting everything working 100%,” concludes Toby Green. With the PhotoEditor SDK, Elbi facilitates the charitable engagement of its users and helps people all around the globe to do good on the go.

Case Study: W | Bear & PhotoEditor SDK

Felix — Mon, 17 Sep 2018 00:00:00 GMT

W | Bear is the first global photo and video blogging social community for gay men. The free app allows its users to post pictures and videos to a personal feed, express their creativity, and connect with like-minded bears. The blogging community provides a safe place where people can build and shape a social environment, chat, flirt, and date. W | Bear was created and developed by the private and public cloud solution provider gNetLabs. We sat down with Xavier Nicolle, CEO at gNetLabs, to talk about the vision behind W | Bear, time to market, and providing users with tailored assets to amplify engagement.

“W | Bear is aimed at the bear subculture of the LGBT community. People express themselves, their story and their life through photos and videos. Our users are very happy with the app. 18 months after the release we reached around 150,000 users worldwide, which is beyond our expectation and it’s getting faster and faster. We get a lot of positive reactions and amazing feedback from users that tell us that our application helped them to gain more self-esteem and to better integrate into the community,” Xavier Nicolle says.

As pictures were going to be a vital part of the Instagram-like photo blogging application (to this day, more than 2 million pictures and videos have been posted on the app), the folks at gNetLabs started exploring options, as their CEO explains: “Time to market was critical for us, and we were kind of in a rush to develop the app. From our background as a hosting company, we’d usually develop and host everything ourselves, however, due to time problems, that wasn’t a viable option regarding the photo editing functionalities. So, we jumped into PhotoEditor SDK to save time, really liked it and eventually kept it. The features and pricing perfectly fit our needs, the integration was straightforward, and the SDK is of great quality and open enough for fine-tuning. On top of that, unlike Adobe Creative SDK for example, it is not linked to any other service. And that is exactly what we were looking for, a piece of software that we could simply plug in that isn’t linked to a service that we don’t need.”

To offer the best experience possible with W | Bear, the folks at gNetLabs pay close attention to how people engage with the app and its features. “Our users mostly work with stickers, frames, and overlays when editing their pictures,” Xavier Nicolle says, “so, bearing that in mind, we want to provide more content and assets that are tailored for the community so our users will hopefully make more use of the editor. Currently, we’re working together with artists that develop stickers that we’re going to introduce using the SDK.”

But it doesn’t stop with W | Bear, as Xavier Nicolle describes the vision behind the community: “W | Bear is the first piece of a larger social network we want to develop that is dedicated and aimed at the whole LGBT community. We’re going to release more applications using the same technology as W | Bear to address more subcultures. On top of that, there are going to be corresponding websites for each app, so the user will be able to find the same functionalities and tools across all platforms.”

**Thanks for reading! To stay in the loop, subscribe to our Newsletter.**

From 2D to 3D Photo Editing

Malte — Tue, 26 Jun 2018 00:00:00 GMT

Last November, we released Portrait, an iOS app that helps create amazing, stylized selfies and portraits instantly.

With over a million downloads and many more portrait images created, we feel that the idea and vision behind Portrait have been more than confirmed. The central component of Portrait is an AI trained to cut out portraits from the background, a technique we are eager to further improve and refine. In fact, Portrait helped us explore a novel technique for image editing, as it let us leverage a powerful new data set in photography: depth data.

We began feeding our AI models with depth data from the iPhone X’s TrueDepth camera with one goal in mind: to infer depth information for portrait imagery, that is, to bring three-dimensionality into a two-dimensional photo. Along the way, we created a new architecture concept that improves performance and memory usage by modularizing and reusing neural networks.

In the following article, we’d like to present some of our results along with the insights we gained.

The New Cool: Depth Data

The use of depth data in image editing first became available with the iPhone 7 Plus, when Apple introduced ‘Portrait Mode’. By combining a depth map and face detection, the devices are able to blur out distant objects and backgrounds, mimicking the ‘bokeh’ or depth of field effect well known from DSLR cameras.

While the actual implementation varies, all major manufacturers nowadays offer a similar mode by incorporating depth data into their image editing pipelines. This is achieved through conventional dual or even triple cameras on the back of a phone, through dual-pixel offset calculations combined with machine learning, or through dedicated sensors like Apple’s TrueDepth module. In fact, for a modern flagship phone, some sort of depth-based portrait mode is almost a commodity.

From a developer’s perspective, things look a little different: depth data became a first-class citizen throughout the iOS APIs in iOS 11, and such data is now easily accessible on supported devices. Android users obviously benefit from depth data as well, produced either by multiple cameras or by Google’s dual-pixel based machine learning approach seen in the newer Pixel 2 phones. But unlike iOS, Android doesn’t yet offer a common developer interface to access such data. In fact, developers aren’t able to access any of the depth information that Google or other manufacturers collect within their camera apps. This means developers would need to either implement the algorithm to infer depth from two images themselves or try to rebuild Google’s sophisticated machine-learning-powered system. Neither option is practical, and both may not even be possible given the usual limitations of camera APIs.
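To give a feel for the first option, here is a minimal sketch of textbook stereo depth estimation, assuming an already rectified image pair: find the horizontal shift (disparity) that best matches a small window between the two images along one scanline, then convert it to depth with Z = f · B / d, where f is the focal length in pixels and B the distance between the cameras. The function names and parameters are hypothetical, and this is the classic block-matching approach, not what Google or any manufacturer ships.

```python
from typing import List, Optional


def disparity_at(left: List[float], right: List[float], x: int,
                 max_disp: int, radius: int = 1) -> int:
    """Find the shift d in [0, max_disp] that best matches a small window around
    left[x] with right[x - d], using the sum of absolute differences (SAD)."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        cost = 0.0
        for k in range(-radius, radius + 1):
            xl, xr = x + k, x + k - d
            if not (0 <= xl < len(left) and 0 <= xr < len(right)):
                cost = float("inf")  # window falls outside one of the images
                break
            cost += abs(left[xl] - right[xr])
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(d: int, focal_px: float, baseline_m: float) -> Optional[float]:
    """Depth (in metres) of a point in a rectified stereo pair: Z = f * B / d.
    Returns None for zero disparity (point at infinity or no match)."""
    return None if d == 0 else focal_px * baseline_m / d
```

Real implementations work on full 2D images, need calibrated and rectified cameras, and must handle occlusions and textureless regions, which is exactly the kind of low-level camera access the restricted APIs make hard to get.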

So although depth data is quite common, it isn’t as easily accessible for developers as one might think. Right now you’re out of luck on Android, dependent on hardware on iOS, and even then limited to the $1,000 flagship if you’re interested in depth for images taken with the front camera. And last but not least, across all devices and platforms, there is no way to generate a depth map for an existing image.

Deep Possibilities

Despite the restrictions, we decided to first explore the power of depth for image editing, as depth data provides many new exciting creative possibilities:

If we have a depth map for a given image, our editing possibilities increase dramatically. Instead of a 2D image, a flat plane of color values, we suddenly have a depth value for each individual pixel, which translates into a 3D landscape highlighting distinct objects in the foreground and giving a clear indication of the background.

Depth-aware Editing

Instead of relying on color and texture differences to determine foreground and background, one could literally edit these regions individually. This allows adjustments like darkening the background while lightening the foreground, which makes portraits ‘pop’. If we were able to generate a high-resolution depth map, we could easily replace the AI currently used in Portrait and enable even more sophisticated creative effects. Thanks to the new APIs, there are already some awesome iOS apps that specialize in depth-based editing. One famous example is Darkroom with its “depth-aware filters”.
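A minimal sketch of such a depth-aware adjustment, assuming a depth map normalized to 0 (near) through 1 (far) and a grayscale luminance image, might look like this. The split depth, the softness, and the two gain values are hypothetical parameters chosen for illustration; this is not how Portrait or Darkroom implement it.

```python
from typing import List

Grid = List[List[float]]


def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """Smooth 0-to-1 ramp between edge0 and edge1 (requires edge1 > edge0)."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def depth_aware_brightness(image: Grid, depth: Grid, split: float = 0.5,
                           softness: float = 0.1, fg_gain: float = 1.15,
                           bg_gain: float = 0.7) -> Grid:
    """Brighten pixels nearer than `split` and darken pixels farther away.

    image: luminance values in [0, 1]; depth: normalised depth, 0 = near, 1 = far.
    The transition is blended over +/- `softness` to avoid a hard edge.
    """
    out: Grid = []
    for img_row, depth_row in zip(image, depth):
        row = []
        for value, d in zip(img_row, depth_row):
            far = smoothstep(split - softness, split + softness, d)  # 0 = foreground, 1 = background
            gain = fg_gain + (bg_gain - fg_gain) * far
            row.append(min(value * gain, 1.0))  # clip to the valid range
        out.append(row)
    return out
```

Blending the gain smoothly around the split depth, rather than thresholding, avoids visible seams where the depth map is noisy near the subject’s outline.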

Depth of Field Effects

As a depth of field or bokeh effect was Apple’s initial motivation for incorporating depth sensing technology, it is one of the most obvious applications. Depth is crucial for such an effect, as the amount of blurriness of any given region depends directly on its distance from the camera relative to the distance that is in focus.
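For a real lens, this relationship is captured by the thin-lens circle-of-confusion formula: the diameter of the blur spot c for a point at distance S2, when a lens with focal length f and aperture diameter A = f/N is focused at distance S1, is c = A · |S2 − S1| / S2 · f / (S1 − f). The sketch below simply evaluates that formula; how Portrait Mode or our prototype map depth to blur is not described here, so treat this as background rather than their method.

```python
def circle_of_confusion_mm(focal_mm: float, f_number: float,
                           focus_mm: float, subject_mm: float) -> float:
    """Blur-spot diameter on the sensor (mm) for a thin lens focused at focus_mm,
    for a point at subject_mm. All distances are measured from the lens in mm."""
    if focus_mm <= focal_mm:
        raise ValueError("focus distance must exceed the focal length")
    aperture_mm = focal_mm / f_number  # aperture diameter A = f / N
    return (aperture_mm * abs(subject_mm - focus_mm) / subject_mm
            * focal_mm / (focus_mm - focal_mm))


# A 50 mm lens at f/1.8, focused at 2 m: a point at 4 m blurs to roughly 0.36 mm.
blur = circle_of_confusion_mm(50.0, 1.8, 2000.0, 4000.0)
```

In a synthetic bokeh effect, one plausible approach is to read the subject distance for each pixel from the depth map, take the focus distance from a chosen focal plane, and scale the resulting blur diameter to a per-pixel blur radius in image pixels.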

3D Asset Placement

As mentioned above, a depth map gives us a 3D understanding of the image. We’re able to tell if subject A is positioned in front of or behind subject B. This allows placement of digital assets like stickers or text in a ‘depth-aware’ fashion, but could also be used to apply ‘intelligent’ depth of field, e.g. a bokeh effect that ensures all faces are in focus.
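Depth-aware placement boils down to a per-pixel depth test, much like a z-buffer in 3D rendering: a sticker pixel is drawn only where the scene behind it is farther away than the depth the sticker is placed at. The sketch below assumes larger depth values mean farther away, uses strings as stand-in pixel values, and treats `None` as a transparent sticker pixel; all names are hypothetical.

```python
from typing import List, Optional

Pixel = str  # stand-in pixel type for the illustration


def place_sticker(scene: List[List[Pixel]], depth: List[List[float]],
                  sticker: List[List[Optional[Pixel]]], top: int, left: int,
                  sticker_depth: float) -> List[List[Pixel]]:
    """Composite a sticker at a given depth. A sticker pixel is drawn only where the
    scene is farther away than the sticker (larger depth = farther), so nearer
    objects such as a person in the foreground occlude it."""
    out = [row[:] for row in scene]  # copy so the input scene is left untouched
    for dy, sticker_row in enumerate(sticker):
        for dx, px in enumerate(sticker_row):
            y, x = top + dy, left + dx
            if px is None or not (0 <= y < len(out) and 0 <= x < len(out[0])):
                continue  # transparent sticker pixel or outside the image
            if depth[y][x] > sticker_depth:
                out[y][x] = px
    return out
```

Placing a sticker ‘behind’ a person then only requires choosing a sticker depth between the person’s depth and the background’s.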

Enter Deep Learning

Motivated by the possibilities enabled by depth maps, we wondered if we could bring this magic to any type of portrait image. We consulted the existing literature on depth inference and found various papers¹ and articles on the topic, some of which even presented results that seemed sufficient for our use cases. We didn’t need metrically accurate results (as in ‘this pixel is 30 cm in front of the camera’); we were only interested in getting the general distance relations right. For us, knowing that region A was slightly behind region B, or far in front of it, was enough to generate a visually pleasing effect, and by constraining our domain to portrait imagery, we were able to further reduce the task’s complexity.
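One way to express ‘only the relations matter’ as a number is the scale-invariant log error proposed by Eigen et al. in the first paper listed in the footnote: with d_i = log(predicted_i) − log(true_i), the error is mean(d²) − λ · mean(d)². With λ = 1 it ignores any global scale factor on the prediction. The post does not say which loss our model used, so the sketch below only illustrates the idea.

```python
import math
from typing import Sequence


def scale_invariant_error(pred: Sequence[float], target: Sequence[float],
                          lam: float = 1.0) -> float:
    """Scale-invariant log error in the style of Eigen et al. (2014):
    mean(d^2) - lam * mean(d)^2, where d = log(pred) - log(target).
    Depth values must be positive. With lam = 1, multiplying every predicted
    depth by the same constant leaves the error unchanged."""
    d = [math.log(p) - math.log(t) for p, t in zip(pred, target)]
    n = len(d)
    return sum(x * x for x in d) / n - lam * (sum(d) / n) ** 2
```

A prediction that is uniformly three times too far scores (numerically) zero, while one that gets the ordering of near and far regions wrong does not, which matches the goal of getting relative distances right rather than absolute ones.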

Given our experience with deep learning and our current focus on introducing machine-learning-powered features to the PhotoEditor SDK, we immediately decided to tackle the new challenge with deep learning, or more specifically, convolutional neural networks. Having a huge dataset of image and depth map pairs available made this choice even easier. We stuck to a system similar to our previous segmentation model but put more emphasis on allowing the reuse of individual parts, as this would come in handy when adding features in the future. To achieve this, we created a new modularized neural network approach named Hydra, which will be presented in an upcoming blog post.

During development, we followed our tried and tested workflow of starting with a complex custom model, which is then tweaked and refined to match our performance requirements while maintaining the prediction quality we need. Once that was done, we had a fast and small model, trained on thousands of iPhone front camera selfies and capable of inferring high-fidelity depth maps from a plain RGB image in under a second.

The Prototype

After creating a small model capable of inferring depth maps for any given portrait image, we immediately wanted to evaluate its performance in a ‘real-world’ environment. We decided to build a prototype that uses the model’s output to apply a depth of field effect to a portrait image. With our long-term goal of deploying the model to iOS, Android, and the web in mind, we built the prototype using TensorFlow.js to explore this newly released library. Our browser demo consists of a minimal ‘Hydra’ implementation with two modules, one for extracting features and one for the actual depth inference, which can each be executed individually.
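Hydra’s actual design is left for its own post. The sketch below only illustrates the general idea stated here, modules that can be executed individually, under the assumption of a shared feature extractor whose output is cached and reused by several task-specific heads. The class, method names, and the lambdas standing in for network modules are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Features = List[float]
ImageKey = Tuple[float, ...]  # hypothetical: a hashable stand-in for an image


class ModularPipeline:
    """Hypothetical sketch: one shared feature extractor, many swappable heads.
    Features are computed once per image and reused by every head."""

    def __init__(self, extractor: Callable[[ImageKey], Features]) -> None:
        self.extractor = extractor
        self.heads: Dict[str, Callable[[Features], object]] = {}
        self._cache: Dict[ImageKey, Features] = {}
        self.extractor_calls = 0  # counts how often the expensive module actually ran

    def add_head(self, name: str, head: Callable[[Features], object]) -> None:
        self.heads[name] = head

    def features(self, image: ImageKey) -> Features:
        if image not in self._cache:
            self.extractor_calls += 1
            self._cache[image] = self.extractor(image)
        return self._cache[image]

    def run(self, name: str, image: ImageKey) -> object:
        return self.heads[name](self.features(image))


# Usage: a 'depth' head and a 'mask' head share one feature extraction pass.
pipeline = ModularPipeline(lambda img: [v * 0.5 for v in img])
pipeline.add_head("depth", lambda f: [1.0 - v for v in f])
pipeline.add_head("mask", lambda f: [v > 0.25 for v in f])
```

The payoff of this kind of split is that adding a new feature means training and shipping only a new head, while the comparatively large feature extractor runs once per image.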

Although the model is optimized for performance and memory footprint, its trained weights still add up to ~18 MB, which we plan to reduce through further fine-tuning or by applying pruning or quantization. Once the models are loaded, though, all further processing happens on the device, so you can try out all the samples without worrying about your data plan.
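To put quantization in numbers: assuming the ~18 MB are weights stored as 32-bit floats (an assumption; the post does not state the format), storing them as 8-bit integers would shrink the payload to roughly a quarter. The sketch below shows one simple scheme, symmetric per-tensor int8 quantization, where each weight is approximated as scale × q with q in [−127, 127]. It is a generic illustration, not the scheme we ended up using.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q,
    with q in [-127, 127]. Returns the integer codes and the shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [scale * v for v in q]


# 4-byte float32 weights stored as 1-byte int8 codes: roughly a 4x reduction.
float32_mb = 18.0
print(f"~{float32_mb / 4:.1f} MB as int8")  # prints: ~4.5 MB as int8
```

The rounding error per weight is at most half the scale, which is why quantization often costs little accuracy; whether that holds for a given depth model has to be checked by evaluating the quantized network.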

Results

Seeing our vision come to life was quite a stunning experience. Suddenly our browser was able to perform a complex depth of field effect without the need for special hardware, manual annotations, or anything else apart from the image itself. And the best part was manually moving the focal plane through the image, either by sliding or by tapping on different regions. Although it was trained on ‘just’ selfies, the model handles turned heads, silhouettes, and multiple people pretty well and isn’t as restricted to its domain as we initially expected.

And while our initial prototype still weighs in at ~18 MB, we’re confident we can slim that down in order to use the model in production. Performance-wise, we were very impressed with the TensorFlow.js inference speed. Even though everything happens on the client side and therefore depends on the client’s hardware, we saw inference times below one second right off the bat, and they improved greatly after the initial run, as the resources were already allocated. While not immediately helpful for the depth inference part, this allowed us to further confirm the theory behind Hydra: re-running inference once the necessary resources on the machine have been allocated greatly increases performance and might even allow real-time performance after an initial setup time.

To summarise, we’re definitely eager to further explore the use of depth data in image editing and think we have found a way to overcome the access restrictions on different platforms and hardware with our custom model. Combined with our new Hydra approach we can see lots of potential features that will delight both our users and customers and we will keep you updated right here.

¹ The papers we drew on most for our use case were:
“Depth Map Prediction from a Single Image using a Multi-Scale Deep Network” (arXiv)
“Deeper Depth Prediction with Fully Convolutional Residual Networks” (arXiv)


The IMG.LY Photo Roll

Felix — Thu, 21 Sep 2017 00:00:00 GMT

Starting today, we at IMG.LY are going to devote some of our energy to a new endeavour, the img.ly Photo Roll. We feel deeply rooted in the open-source movement, and we know that the internet and most of the tools we use today wouldn’t be nearly the same if it weren’t for the awesome and creative work of countless people who shared their assets for free. Even today, we rely heavily on open-source technology and creative work to guide us and inspire us to create new and unseen things.

Working on our product, PhotoEditor SDK, for the last two years has given us a lot of useful and valuable insights into photography and the various possibilities for creating exceptional visual content. As some of us are enthusiastic photographers, it only seems logical to join the community of photographers and creative folks who enrich the internet with extraordinary and free content every single day. Since we have always gratefully used the work of other people, we would like to give something back to the community. Therefore, we are going to start sharing pictures we took around the globe via our Unsplash and Instagram accounts. Please feel free to download, use, modify and share them at will. The photos that we’re going to post over the following weeks and months will also be available in the photo roll that ships with the PhotoEditor SDK.

We certainly hope that you’re going to make use of them. And we’d be happy to hear from you what you created with our pictures, and we’d love to see them in a new guise. After all, their real value only shows if someone is inspired by and can get creative with them.

Thanks to all the people that contribute to our initiative: Malte Baumann, Tommi Gutscher, Niklas Priddat, Cem Selcuk and Eray Basar. Below we have already compiled a few pictures that we are going to post over the next few days. Stay tuned.

Cheers!
Your friends at IMG.LY


The Photograph that Will Not Vanish.

Felix — Mon, 22 May 2017 00:00:00 GMT

The HP Sprocket opens up a whole new way of experiencing mobile photography. While pictures nowadays are either ephemeral or get stored away in digital vaults, HP breathes new life into physical photographs by making mobile printing available to anyone. For this piece, we sat down with Carem Pereira, SCRUM Master for the sprocket Android team in Brazil, to recap the development of HP’s portable gem.

The HP Sprocket is a beautifully small and portable printer that instantly prints 2 × 3-inch color photos, no cartridge or ink required. Peel off the backing of the photos and you can even stick them onto anything you like. “Our message is that printing can be easy and fun on the go. Snap, print, and play. You can take the sprocket anywhere, easily print photos on the spot and share them with your friends,” says Carem Pereira. But creating a straightforward printing experience was only half the battle for the sprocket team, as Pereira explains: “From the first day on, editing features were planned to be a central part of the sprocket’s core experience. We wanted to give our users the ability to personalize and customize their snapshots before printing or sharing them.”

“The PhotoEditor SDK absolutely saved us a lot of time.”

Consequently, HP wanted to incorporate these essential features into their free sprocket companion app that connects the user’s mobile device to the printer via Bluetooth and allows for the management and printing of the pictures. “We’ve been working on a very tight schedule and had only three months until the release of our first MVP mid-September 2016”, Pereira explains, “in the beginning, we wanted to implement the editing features ourselves, but by the time we started the estimates, we realized that we’d eventually not be able to meet our goals. So, we were looking for a third-party solution.”

“One of our team members was already working with the SDK for another project, the HP Print Bot,” Pereira continues, “so, we compared PhotoEditor SDK with other solutions and found that it would be the best fit for us since it provides all the features that are crucial for our use case and we were already familiar with it. Also, it was of great importance to us that the look of the editor in sprocket matched the rest of the app and we saw that it would be very easy to accomplish that with the PhotoEditor SDK.”

The PhotoEditor SDK also gives full control over content assets like stickers, fonts and filters, a feature critical to the HP team: “We want to stay relevant to our customers and one of the examples where we accomplish that is with our assets, like Stickers and Frames that are tailored for specific times and holidays. We have releases every two weeks that contain a new set of assets,” Pereira explains.


The HP Sprocket became an instant success. “The sprocket continues to be one of the great highlights for HP in consumer printing this year. The customers love using the printer, and we are already in millions of prints, we even sold out worldwide during the holidays last year, which was a great surprise for us,” says Carem Pereira.

But it’s not only sales figures that define its impact, as the story of Dom Russell and Seb Trevaskis exemplifies: the two physiotherapy graduates brought the HP Sprocket to a Vietnamese orphanage where they educated, designed and implemented individualized rehabilitation programs for children who suffered from mental and physical disabilities due to the Vietnam War herbicide Agent Orange. The sprocket brought happiness to the orphanage and created everlasting memories for the children, as Dom Russell explains: “The sprocket was awesome, and the kids loved it! They would stick the photos on all their favorite possessions. Some would even just stare at them for hours as it may have been the first time they’d seen a physical copy of themselves. I think those photos will stay with them for life seeing the way they reacted when it got printed and the way they treated them as well. It’s amazing how just a little gesture like a photograph can make such a difference in someone’s life!”

Needless to say, the sprocket was also an instant hit at our office. Playing around with the app that contains our editor was truly amazing and really let us grasp its potential. Using the sprocket on many different occasions gave us hands-on experience not only of what our SDK is capable of but also of what needs further improvement or is even missing. By now, our office is plastered with sprocket prints.

“PhotoEditor SDK was an essential asset for making things work with a top-notch standard.”

“We continue to get great feedback from our users on how straightforward and easy our experience is. And again, easy, that is the key,” Pereira says, “the PhotoEditor SDK was a vital and essential asset for making things work on time and with a top-notch standard. That was paramount to us,” Carem Pereira concludes. We couldn’t be happier about this.


Deep Learning for Photo Editing

Malte — Thu, 20 Apr 2017 00:00:00 GMT

Deep learning, a subfield of machine learning, has become one of the best-known areas in the ongoing AI hype. It has led to many important publications, is applied to dozens of different scenarios, and has already yielded impressive results such as human-like speech generation, high-accuracy object detection, advanced machine translation, super-resolution, and many more.

There is a steady flow of papers and publications that describe the latest advances in network design, compare existing architectures, or describe novel approaches that outperform the current state of the art. At the same time, more and more companies and developers are jumping on the deep learning bandwagon and deploying these ideas and architectures to real-world production systems.

This article describes our approach to applying deep learning to our image editing product, the struggle we had finding the right architecture, and the lessons we learned while developing a system that can be deployed to mobile devices.

Our vision

At 9elements, we’ve had various AI topics on our radar for quite some time now. With deep learning, we finally found a tremendous opportunity for our product, the PhotoEditor SDK: we believe AI-based algorithms could be the ideal approach to boost our users’ creative output and simplify complex design tasks.

Given the hype and results, we decided to dip our toes into deep learning, which quickly led to some research into the most common challenges in interactive image editing. Image segmentation soon surfaced as a major challenge that could be solved using deep learning, and we started investigating further.

If you have ever tried to select a distinctive region in a picture, say your best friend on the beach or your cute pet, you know the struggle of carefully moving your cursor along the object’s outer bounds until you eventually miss a part or accidentally select something that doesn’t belong to the object. Professional image editing tools can be quite helpful in accomplishing such tasks, but on the one hand, they aren’t available on your mobile device, where you take and publish the images, and on the other hand, can be quite expensive and usually require some hands-on time, before you can produce anything usable.

Our goal was to finally remove the hassle from image clipping. We wanted to reduce the required user interaction to a minimum while offering an intuitive solution that doesn’t require any manuals or online courses. On top of that, as we provide native SDKs for web, iOS, and Android, the solution had to be deployable to all of these systems without relying on a powerful backend or being limited to certain features.

Having formulated our rather ambitious goals, we started our journey by studying the most relevant research papers and classic techniques for image segmentation. We then focused on deep learning and quickly had an idea of how to design our approach.

Our journey

Image segmentation, the process of classifying each pixel in a picture as either foreground or background, is a popular research field and is still considered quite challenging because of the complex nature of the task. We humans are extremely well trained at perceiving scenes, identifying objects, and drawing logical conclusions from the visual input we receive; for machines, these skills are hard to replicate.

For a long time, segmentation approaches were based on colors, edges, and contrast, and relied heavily on fine-tuned parameters that had to be adjusted for every new scenario. That changed in 2012, when Krizhevsky et al. presented astonishing object classification results on the ImageNet benchmark using a convolutional neural network. Suddenly, a system was able to classify objects with unprecedented accuracy and without any human fine-tuning. The neural network was ‘just’ trained on the dataset: it was shown images together with their corresponding labels and adjusted its internal representation until it couldn’t learn any further.

As we had already decided on deep learning for our task, a neural network was the obvious choice. We started by examining existing solutions and approaches, built a first prototype based on our findings, and refined our approach and implementation until everything met our expectations.

Scene Labeling

The first approaches we examined focused on segmenting the whole image. This common task is called scene labeling or semantic labeling, and it allows robots and other systems to understand a scene. The goal is to assign each pixel in an image to a particular object category. Take a self-driving car that looks for the road and tries to determine whether any pedestrians are crossing the street: it would classify each pixel as road, pedestrian, tree, traffic sign, and so on.
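Conceptually, scene labeling ends with a simple per-pixel decision: the network outputs one score per category for every pixel, and each pixel receives the category with the highest score. A minimal sketch of that last step, assuming the scores are already available as nested Python lists; the category list is a hypothetical example taken from the self-driving-car scenario:

```python
from typing import List, Sequence

# Hypothetical category list for the self-driving-car example.
CATEGORIES = ["road", "pedestrian", "tree", "traffic sign"]


def label_pixels(scores: Sequence[Sequence[Sequence[float]]]) -> List[List[str]]:
    """Assign each pixel the category with the highest score.

    `scores[y][x][c]` is the network's score for category `c` at pixel (x, y).
    """
    labels = []
    for row in scores:
        label_row = []
        for pixel_scores in row:
            # argmax over the category axis
            best = max(range(len(pixel_scores)), key=lambda c: pixel_scores[c])
            label_row.append(CATEGORIES[best])
        labels.append(label_row)
    return labels


# A 1x2 "image": the first pixel looks like road, the second like a pedestrian.
print(label_pixels([[[0.9, 0.05, 0.03, 0.02], [0.1, 0.7, 0.1, 0.1]]]))
# prints [['road', 'pedestrian']]
```

Note that nothing in this decision cares about how precisely a region’s border follows the real object, which is where accuracy problems at the outlines show up.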

While offering lots of possibilities, the existing solutions lacked the accuracy we needed to produce visually pleasing segmentations. For a self-driving car, it hardly matters whether the region labeled ‘person’ precisely follows a pedestrian’s outline. For us, it does.

To overcome these issues, we experimented with post-processing techniques that used the segmentations as a starting point for further optimizations. This led to our first approach: we would first segment the entire image using a convolutional neural network, offer the resulting regions to the user for selection, and then refine the user’s selection with conventional image segmentation techniques to find the best possible mask.
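In this first approach, a tap only has to pick one of the precomputed regions. A minimal sketch of that selection step, assuming the network’s label map is a 2D list of category ids and that a “region” is the 4-connected set of pixels sharing the tapped pixel’s label (our assumption for illustration; the conventional refinement step is not shown):

```python
from collections import deque
from typing import Sequence, Set, Tuple


def select_region(labels: Sequence[Sequence[int]],
                  tap: Tuple[int, int]) -> Set[Tuple[int, int]]:
    """Return the 4-connected region of pixels sharing the label at `tap`.

    `tap` is (x, y); the result is a set of (x, y) coordinates.
    """
    height, width = len(labels), len(labels[0])
    tx, ty = tap
    target = labels[ty][tx]
    region = {(tx, ty)}
    queue = deque([(tx, ty)])
    while queue:  # breadth-first flood fill
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in region and labels[ny][nx] == target):
                region.add((nx, ny))
                queue.append((nx, ny))
    return region


label_map = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 1],
]
print(sorted(select_region(label_map, (2, 0))))
# prints [(1, 1), (2, 0), (2, 1), (2, 2)]
```

The sketch also shows the weakness of this design: the user can only ever pick regions the network already produced, so a wrong label map cannot be fixed by tapping.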

Although this approach already yielded some useful results, it did not quite meet our requirements. If the initial segmentation was too coarse or off in critical regions, no selection of regions could ever lead to the segmentation the user wanted.

Image segmentation based on user inputs

We went back to the drawing board and searched for other approaches that would fit our use case. It didn’t take long before we stumbled upon Deep Interactive Object Selection, a paper that presents an interactive system that creates image segmentations based on user clicks. It looked like a good fit for our requirements, so we updated our existing system to generate synthetic user inputs and to train on combinations of these inputs and images.
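The Deep Interactive Object Selection paper feeds clicks to the network by converting positive clicks (on the object) and negative clicks (on the background) into two Euclidean distance maps, capped at 255, which are stacked with the RGB channels. The following is a minimal sketch of that encoding idea as described in the paper, not a statement of exactly what our system used; it computes distances by brute force, which is only reasonable for small images:

```python
import math
from typing import List, Sequence, Tuple

Click = Tuple[int, int]  # (x, y) pixel coordinates


def distance_map(clicks: Sequence[Click], width: int, height: int,
                 cap: float = 255.0) -> List[List[float]]:
    """Euclidean distance from every pixel to the nearest click, truncated at `cap`.

    Every pixel gets `cap` when there are no clicks of this kind.
    """
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            nearest = min((math.hypot(x - cx, y - cy) for cx, cy in clicks), default=cap)
            row.append(min(nearest, cap))
        result.append(row)
    return result


def encode_input(rgb: Sequence[Sequence[Sequence[int]]],
                 positive: Sequence[Click],
                 negative: Sequence[Click]) -> List[List[List[float]]]:
    """Stack R, G, B with the positive and negative distance maps (5 channels per pixel)."""
    height, width = len(rgb), len(rgb[0])
    pos = distance_map(positive, width, height)
    neg = distance_map(negative, width, height)
    return [[list(rgb[y][x]) + [pos[y][x], neg[y][x]] for x in range(width)]
            for y in range(height)]


image = [[(10, 20, 30), (0, 0, 0), (255, 255, 255)]]  # a 1x3 RGB image
print(encode_input(image, positive=[(0, 0)], negative=[])[0][2])
# prints [255, 255, 255, 2.0, 255.0]
```

Encoding clicks as images of the same size as the photo lets a standard convolutional network consume them without any architectural change apart from the number of input channels.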

To train the net, we used the publicly available COCO dataset, which contains around 300,000 images with more than 2 million annotated object instances. To keep the amount of data manageable, we limited our training data to a subset of the full dataset: images containing objects from certain categories that cover a minimum area of the image. As we generated the inputs artificially by placing clicks on the object masks, we could derive as much training data from the COCO subset as we wanted. After some experiments, we settled on three different strategies for creating user inputs and trained the net with roughly 300,000 training records.
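Both steps in this paragraph are easy to sketch. The first function keeps only annotations from chosen categories whose object covers a minimum fraction of its image; the second simulates clicks by sampling pixels inside and outside an object mask. The dictionary layout follows COCO’s instances JSON files (`images`, `annotations`, `categories`), but the 5% threshold and the uniform sampling are hypothetical placeholders, not the three strategies we actually used:

```python
import random
from typing import Dict, List, Sequence, Tuple

Click = Tuple[int, int]  # (x, y)


def select_annotations(coco: Dict, category_names: Sequence[str],
                       min_area_fraction: float = 0.05) -> List[Dict]:
    """Keep annotations of the wanted categories whose object covers enough of its image.

    `coco` uses the layout of a COCO instances JSON file with the keys
    "images", "annotations" and "categories".
    """
    wanted = {cat["id"] for cat in coco["categories"] if cat["name"] in category_names}
    image_area = {img["id"]: img["width"] * img["height"] for img in coco["images"]}
    return [ann for ann in coco["annotations"]
            if ann["category_id"] in wanted
            and ann["area"] / image_area[ann["image_id"]] >= min_area_fraction]


def sample_clicks(mask: Sequence[Sequence[int]], n_positive: int, n_negative: int,
                  rng: random.Random) -> Tuple[List[Click], List[Click]]:
    """Simulate user clicks: positives on the object (mask value 1), negatives elsewhere.

    Uniform sampling is a hypothetical strategy chosen for brevity.
    """
    inside = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    outside = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if not v]
    positive = rng.sample(inside, min(n_positive, len(inside)))
    negative = rng.sample(outside, min(n_negative, len(outside)))
    return positive, negative
```

Because every call to `sample_clicks` draws a fresh set of clicks, a single annotated object can yield many different training records, which is why the number of records is not bounded by the size of the subset.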

The masks generated by the updated system were already quite impressive. The neural net could infer which object the user wanted to mask just by looking at the raw pixel data and the user’s clicks on the object. Happy with these first results, we tackled the next hurdle. Before diving deeper into optimizing the neural net, an error-prone and time-consuming process, we wanted to deploy the net to a mobile device to make sure that such a tool would be usable on any device and that its performance would match our expectations.

Neural nets on mobile devices

Neural nets are sets of operations, executed in a specific order and based on millions of parameters. One “run” of such a net therefore requires a lot of computing power, as millions of calculations have to be carried out. At the same time, the millions of parameters need to be deployed, as they are the model: the representation the neural net has learned during training. So, to deploy our neural net, we had to meet both of these requirements on an iPhone.
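To give a feeling for the numbers involved: a standard convolutional layer with a k×k kernel, C_in input channels and C_out output channels stores k·k·C_in·C_out weights plus C_out biases, and producing an H×W output costs H·W·k·k·C_in·C_out multiply-accumulate operations. A small helper; the layer sizes in the example are made up for illustration and are not taken from our network:

```python
from typing import Tuple


def conv_layer_cost(kernel: int, c_in: int, c_out: int,
                    out_h: int, out_w: int) -> Tuple[int, int]:
    """Return (parameters, multiply-accumulates) for a standard 2D convolution."""
    weights = kernel * kernel * c_in * c_out
    params = weights + c_out  # one bias per output channel
    # Each of the out_h * out_w * c_out output values needs kernel * kernel * c_in MACs.
    macs = out_h * out_w * weights
    return params, macs


params, macs = conv_layer_cost(kernel=3, c_in=64, c_out=64, out_h=256, out_w=256)
print(params, macs)  # prints 36928 2415919104
```

Even this single, fairly ordinary layer needs about 2.4 billion multiply-accumulates per run, while storing only about 37,000 parameters; compute and storage are separate problems, which is why both requirements had to be solved.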

The first requirement, computing power, was thankfully solved by Apple. With iOS 10, Apple added neural network support to a specialized framework called Metal Performance Shaders. It offers the most common operations and is tailored to run them on the phone’s GPU, which is fast and efficient. To execute our net using the framework, we had to translate our TensorFlow network code to Swift and rebuild the net’s architecture from Metal Performance Shaders operations. Sadly, Apple only supports a subset of today’s common neural network operations, so we had to write custom shader code to reconstruct the full network.

The second requirement, extracting the trained parameters and deploying them to the device, was much easier. We just had to restore our previously trained model from a TensorFlow checkpoint, write all trained variables into a file, and ship this file with our iOS app. When needed, the app would load the file into memory, and our network implementation would use these parameters to run an inference pass.
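The export step needs little more than a simple container format. A minimal sketch of one possible layout, which is our assumption and not the exact format used in the app: a variable count, then for each variable its name, its shape, and its values as little-endian float32. In a real pipeline, the values would come from the restored TensorFlow checkpoint (for example via `tf.train.load_checkpoint`); that part is omitted to keep the sketch dependency-free, and a device-side loader would parse the same layout that `unpack_params` reads here:

```python
import struct
from typing import Dict, List, Tuple

# (shape, flattened values in row-major order)
Variable = Tuple[Tuple[int, ...], List[float]]


def pack_params(variables: Dict[str, Variable]) -> bytes:
    """Serialize variables as: count, then per variable its name, shape and float32 data."""
    chunks = [struct.pack("<I", len(variables))]
    for name, (shape, values) in variables.items():
        size = 1
        for dim in shape:
            size *= dim
        if size != len(values):
            raise ValueError(f"{name}: shape {shape} does not match {len(values)} values")
        encoded = name.encode("utf-8")
        chunks.append(struct.pack("<I", len(encoded)) + encoded)
        chunks.append(struct.pack(f"<I{len(shape)}I", len(shape), *shape))
        chunks.append(struct.pack(f"<{size}f", *values))
    return b"".join(chunks)


def unpack_params(data: bytes) -> Dict[str, Variable]:
    """Inverse of pack_params: read the same layout back into a dictionary."""
    offset = 0

    def take(fmt: str) -> tuple:
        nonlocal offset
        values = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
        return values

    variables: Dict[str, Variable] = {}
    (count,) = take("<I")
    for _ in range(count):
        (name_len,) = take("<I")
        name = data[offset:offset + name_len].decode("utf-8")
        offset += name_len
        (ndim,) = take("<I")
        shape = take(f"<{ndim}I")
        size = 1
        for dim in shape:
            size *= dim
        variables[name] = (shape, list(take(f"<{size}f")))
    return variables


blob = pack_params({"conv1/bias": ((2,), [1.0, -0.5])})
print(unpack_params(blob))  # prints {'conv1/bias': ((2,), [1.0, -0.5])}
```

Writing `blob` to a file and bundling it with the app is then a single `open(path, "wb").write(blob)`. Values that are not exactly representable in float32 come back slightly rounded.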

With both requirements met, our network worked fine on an iPhone. We added the post-processing operations and were able to segment images with a single tap, without the need for a backend or any network communication. But there were some caveats.

Although our network was based on a very common and widely used architecture, it had a huge number of trainable variables. A trained model contains ~134 million parameters, which, stored as 32-bit floats, translates to about half a gigabyte of data that needs to be shipped with the app. This was obviously a showstopper for a mobile image editing app, as we couldn’t justify a 500 MB download just to let users segment images with their finger.
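The “half a gigabyte” follows directly from the parameter count, assuming each parameter is stored as a 4-byte (32-bit) float:

```python
BYTES_PER_FLOAT32 = 4


def model_size_mb(num_parameters: int, bytes_per_parameter: int = BYTES_PER_FLOAT32) -> float:
    """Size of the raw parameter data in megabytes (10**6 bytes)."""
    return num_parameters * bytes_per_parameter / 1e6


print(model_size_mb(134_000_000))  # prints 536.0
```

Even at 16-bit precision (`model_size_mb(134_000_000, 2)`), the parameters would still take 268 MB, so a more compact storage format alone would not have been enough; the number of parameters itself had to shrink.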

Furthermore, the results were still very coarse. If a colleague was waving their arms in a picture, the net could usually detect their torso, head, and maybe their legs, but almost never the arms or hands. Fixing this with our post-processing algorithms wasn’t really an option: it would have required lots of computing power, and why bother using a neural net with millions of parameters if we fall back to conventional image processing techniques anyway?

All in all, we had already learned a lot: our approach of feeding user inputs combined with raw image data into a neural net produced usable, if coarse, outputs. Deploying such a net to mobile devices was possible, and the performance was good enough for an interactive tool. The next step was to optimize the system to reduce the parameter count and get finer results.

Combining SqueezeNet and SharpMask

We decided to tackle the network size first, as laying a proper foundation before optimizing the coarseness seemed like a sane thing to do. When looking for small nets with few parameters and fast inference, it’s hard not to stumble across the SqueezeNet architecture by Iandola et al., which was published in November 2016. It fit our use case, didn’t use any exotic operations that would be hard to implement on mobile, and the results looked promising, so we removed the original network from our system and replaced it with an altered SqueezeNet implementation. To our surprise, it worked almost right away. We had to tweak our training pipeline, and the results differed slightly, but all in all the small network with only ~5 million parameters matched the performance of our previous behemoth with ~134 million parameters. We quickly updated our conversion script and found that our deployable model file had shrunk from ~500 MB to 2.9 MB. What a happy day!
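Much of SqueezeNet’s size advantage comes from its “Fire” modules: a 1×1 “squeeze” convolution first reduces the number of channels, and an “expand” stage then applies parallel 1×1 and 3×3 convolutions whose outputs are concatenated. The arithmetic below compares one such module, using the channel sizes of the first Fire module in the SqueezeNet paper (96 input channels, 16 squeeze filters, 64 + 64 expand filters), with a plain 3×3 convolution producing the same 128 output channels. It only counts parameters and says nothing about our altered implementation:

```python
def conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights plus one bias per output channel of a standard convolution."""
    return kernel * kernel * c_in * c_out + c_out


def fire_params(c_in: int, squeeze: int, expand1x1: int, expand3x3: int) -> int:
    """Parameters of a Fire module: squeeze 1x1, then parallel expand 1x1 and 3x3."""
    return (conv_params(1, c_in, squeeze)
            + conv_params(1, squeeze, expand1x1)
            + conv_params(3, squeeze, expand3x3))


fire = fire_params(96, 16, 64, 64)
plain = conv_params(3, 96, 64 + 64)
print(fire, plain, round(plain / fire, 1))  # prints 11920 110720 9.3
```

The expensive 3×3 filters only ever see the 16 squeezed channels instead of all 96 inputs, which accounts for most of the roughly ninefold saving in this example.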

Having solved the network size issue, we went ahead and thought about increasing the precision of our predictions. A loss of resolution is built into typical convolutional neural networks, as later layers acquire a larger “view” of the input by reducing the size of their input with so-called “pooling” layers. A 2×2 pooling layer, for example, takes four neighboring values from the previous layer and merges them into a single one, halving the width and height. As a result, our new SqueezeNet-based system created a 32 by 32-pixel mask from a 512 by 512-pixel input image, a reduction by a factor of 16 in each dimension. Until then, we had simply scaled these masks up using a transposed convolution. This allowed the net to learn how to upscale best, but the fine details of the input image were already lost at this point.
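The resolution numbers work out as follows: each 2×2 pooling step (or stride-2 layer) halves width and height, so going from 512 to 32 pixels corresponds to four halvings (512 / 2⁴ = 32). A minimal pure-Python sketch of a 2×2 max-pooling step and of a transposed convolution with a 2×2 kernel and stride 2, the simplest learnable upsampling that doubles the resolution; the kernel values stand in for learned weights:

```python
from typing import List, Sequence

Grid = List[List[float]]


def max_pool_2x2(x: Sequence[Sequence[float]]) -> Grid:
    """Merge each non-overlapping 2x2 block into its maximum (halves width and height)."""
    return [[max(x[r][c], x[r][c + 1], x[r + 1][c], x[r + 1][c + 1])
             for c in range(0, len(x[0]) - 1, 2)]
            for r in range(0, len(x) - 1, 2)]


def transposed_conv_2x2_stride2(x: Sequence[Sequence[float]],
                                kernel: Sequence[Sequence[float]]) -> Grid:
    """Upsample by 2: each input value scatters value * kernel into its own 2x2 block.

    Because the kernel size equals the stride, the output blocks do not overlap.
    """
    height, width = len(x), len(x[0])
    out = [[0.0] * (2 * width) for _ in range(2 * height)]
    for r in range(height):
        for c in range(width):
            for dr in range(2):
                for dc in range(2):
                    out[2 * r + dr][2 * c + dc] += x[r][c] * kernel[dr][dc]
    return out


image = [[1, 2, 0, 0],
         [3, 4, 0, 1],
         [0, 0, 5, 6],
         [0, 2, 7, 8]]
print(max_pool_2x2(image))  # prints [[4, 1], [2, 8]]
print(transposed_conv_2x2_stride2([[1.0, 2.0]], [[1.0, 0.5], [0.5, 0.25]]))
# prints [[1.0, 0.5, 2.0, 1.0], [0.5, 0.25, 1.0, 0.5]]
```

The pooling step keeps only one value out of four, and the upsampling step can only work with what survived; no choice of kernel can bring back the discarded details.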

We remembered Facebook’s SharpMask system, introduced in summer 2016, and revisited the accompanying paper. Its refinement modules seemed like a good fit, as they gradually merge features from earlier, higher-resolution layers into the coarse output. We adopted the idea and altered the refinement modules to take the final SqueezeNet output as input. The modules then combined this coarse output with the intermediate results of the pooling layers and were able to refine it step by step. This increased our model size and computation costs by a fair amount, but led to much finer and more detailed results.
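A minimal sketch of one refinement step in the spirit of the SharpMask paper: a convolution compresses the skip features from an earlier layer, the result is concatenated with the coarse mask features of the same spatial size, another convolution merges both, and the output is upsampled by a factor of two. To keep it short and dependency-free, the sketch makes two simplifications of our own: 1×1 convolutions instead of the paper’s 3×3 convolutions, and nearest-neighbor instead of bilinear upsampling. The weights are hypothetical placeholders for learned values, and this is not our altered production module:

```python
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # [row][col][channel]
Matrix = Sequence[Sequence[float]]    # [out_channel][in_channel]


def conv1x1_relu(features: FeatureMap, weights: Matrix) -> FeatureMap:
    """Per-pixel linear map followed by ReLU (a 1x1 convolution without bias)."""
    return [[[max(0.0, sum(w * v for w, v in zip(w_row, pixel))) for w_row in weights]
             for pixel in row]
            for row in features]


def upsample2x_nearest(features: FeatureMap) -> FeatureMap:
    """Double width and height by repeating each pixel (SharpMask itself uses bilinear)."""
    out = []
    for row in features:
        wide = [list(pixel) for pixel in row for _ in range(2)]
        out.append(wide)
        out.append([list(pixel) for pixel in wide])
    return out


def refine(mask_features: FeatureMap, skip_features: FeatureMap,
           skip_weights: Matrix, merge_weights: Matrix) -> FeatureMap:
    """One refinement step; the result has twice the spatial size of the inputs.

    `mask_features` (coarse, top-down) and `skip_features` (from an earlier,
    higher-resolution layer) must have the same spatial size.
    """
    reduced = conv1x1_relu(skip_features, skip_weights)   # compress skip channels
    merged = [[m + s for m, s in zip(m_row, s_row)]       # concatenate channels
              for m_row, s_row in zip(mask_features, reduced)]
    return upsample2x_nearest(conv1x1_relu(merged, merge_weights))


coarse = [[[1.0]]]        # 1x1 mask encoding with one channel
skip = [[[2.0, -3.0]]]    # 1x1 skip features with two channels
print(refine(coarse, skip, skip_weights=[[0.5, 0.0]], merge_weights=[[1.0, 2.0]]))
# prints [[[3.0], [3.0]], [[3.0], [3.0]]]
```

Stacking several such steps, each fed with features from a progressively earlier layer, brings the coarse output back up to a higher resolution while reinjecting the detail that pooling had removed.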

Once we had settled on our architecture, we started an extensive training run in which we tested more than one hundred variations of hyperparameters, architectural details, and resizing techniques. After evaluating the results, we selected the variation that offered the best compromise between accuracy on the one hand and inference speed and model size on the other.
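One way to make “the best compromise” systematic is to discard every variation that another variation beats on all criteria at once, and then pick from the remaining Pareto front, for example the most accurate variation that fits a latency and size budget. A minimal sketch; the field names, the budgets, and the selection rule are our illustration, not necessarily how the selection was actually made:

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Variation:
    name: str
    accuracy: float    # e.g. a segmentation score on a validation set; higher is better
    latency_ms: float  # inference time on the target device; lower is better
    size_mb: float     # model file size; lower is better


def dominates(a: Variation, b: Variation) -> bool:
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    no_worse = (a.accuracy >= b.accuracy and a.latency_ms <= b.latency_ms
                and a.size_mb <= b.size_mb)
    better = (a.accuracy > b.accuracy or a.latency_ms < b.latency_ms
              or a.size_mb < b.size_mb)
    return no_worse and better


def pareto_front(variations: Sequence[Variation]) -> List[Variation]:
    """Variations that no other variation dominates."""
    return [v for v in variations if not any(dominates(o, v) for o in variations)]


def pick(variations: Sequence[Variation], max_latency_ms: float,
         max_size_mb: float) -> Optional[Variation]:
    """Most accurate non-dominated variation within both budgets, or None."""
    feasible = [v for v in pareto_front(variations)
                if v.latency_ms <= max_latency_ms and v.size_mb <= max_size_mb]
    return max(feasible, key=lambda v: v.accuracy, default=None)
```

Filtering to the Pareto front first keeps the final decision small: with a hundred or more runs, usually only a handful of variations are not strictly worse than some other one.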

Our results and prototype

Having fixed all these issues, we were eager to see how the whole system performed on a mobile device with limited computing power and input options. We updated our mobile app to use the new network architecture and the freshly trained model and compared the refined system with our previous approach. The results were amazing. When selecting objects that matched the categories of our training data and were fully visible in the image, we were able to generate fine-grained selection masks with just a single tap. More complex or larger objects required a few more taps, but we could always find a selection mask for our object that was at least a solid starting point for further refinement.

We decided to build a more polished prototype based on our existing img.ly iOS app. This app uses our PhotoEditor SDK to offer advanced image editing, including focus and filter operations. As we were now able to create masks based on objects in the image, we quickly decided to enhance our filter and focus tools with selective masking.
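Selective masking boils down to blending: with a mask value m between 0 (background) and 1 (selected object), each output pixel is m · filtered + (1 − m) · original, so soft mask edges produce soft transitions. A minimal per-pixel sketch on grayscale values in [0, 1]; `brighten` is a hypothetical stand-in for a real filter and not a PhotoEditor SDK function:

```python
from typing import Callable, List, Sequence


def apply_masked(image: Sequence[Sequence[float]], mask: Sequence[Sequence[float]],
                 effect: Callable[[float], float]) -> List[List[float]]:
    """Apply `effect` only where the mask is set, blending linearly at soft edges."""
    return [[m * effect(p) + (1.0 - m) * p for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def brighten(value: float) -> float:
    """Hypothetical stand-in filter: brighten by 50%, clamped to 1.0."""
    return min(1.0, value * 1.5)


print(apply_masked([[0.25, 0.5, 0.5]], [[0.0, 0.5, 1.0]], brighten))
# prints [[0.25, 0.625, 0.75]]
```

The same blend works for a focus tool: an effect that needs neighboring pixels, such as a blur, would be computed on the whole image first, and the mask would then decide per pixel how much of the blurred version to use.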

Retrospective

Looking back, our journey into deep learning was one of our more frustrating yet fascinating ones. The sheer number of possible applications is exciting, and once you get the hang of training something on your data, you immediately want to start experimenting with new things. On the other hand, you’re usually building huge black boxes with millions of float values, which makes debugging a pain. This can quickly become frustrating, especially when you try to replicate an existing architecture on another platform. If your outputs don’t match the expected results, your only option is to go over your code again and again, check all parameters, and hope you stumble upon the wrong number somewhere. But once you manage to set everything up and start seeing good results, you instantly want to tweak and optimize every part of your system.
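One habit that can shorten the “check everything and hope” phase is to dump the output of every layer on both platforms for the same input and compare them in execution order, so the first diverging layer points at the bug. A minimal sketch, assuming both implementations can export flattened per-layer outputs under matching layer names (the names and tolerances are hypothetical); tolerances are needed because GPU kernels may use a different precision or summation order than the reference:

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def first_mismatch(reference: Dict[str, Sequence[float]], port: Dict[str, Sequence[float]],
                   layer_order: Sequence[str], rel_tol: float = 1e-3,
                   abs_tol: float = 1e-5) -> Optional[Tuple[str, int]]:
    """Return (layer, index) of the first value outside tolerance, or None if all match.

    Layers are checked in execution order, because an error in one layer
    corrupts every layer after it. An index of -1 signals a size mismatch.
    """
    for layer in layer_order:
        expected, actual = reference[layer], port[layer]
        if len(expected) != len(actual):
            return layer, -1
        for i, (e, a) in enumerate(zip(expected, actual)):
            if not math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol):
                return layer, i
    return None
```

Reporting only the first mismatch is deliberate: once one layer is wrong, every later difference is a consequence rather than a separate bug.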

Overall, deep learning is a pain to debug, but it yields great results and opens up a new field of photo editing applications. We’ll definitely keep exploring new ways of applying these techniques in our product. Stay tuned for upcoming features!
