A new generation of A.I. algorithms, propelled by rising computational power, new hardware, and a shift in paradigms, has made its first notable impact in the creative world: The works of Gatys et al. and Krizhevsky et al. have not only gathered considerable public attention but have also helped apps like Prisma to be adopted and used by millions. I strongly believe that this is merely the beginning. With the help of machine learning, we will fine-tune, simplify, and automate creative processes and ultimately empower new techniques for design and content creation.
We’ve been following this topic for quite some time now and have spent considerable effort researching the opportunities of deep learning for our PhotoEditorSDK. After more than a year of research and development, today, we’re finally bringing one of our apps to beta. Portrait combines supervised deep learning with the visual power of our SDK. In a nutshell, Portrait makes creating beautifully designed portrait images as easy as taking a selfie. You turn your selfies into movie poster-like portraits, with styles ranging from double-exposure photography to stencil art. One may consider it the next iteration of what Apple and Google recently brought to market with their new camera features.
We’ve now come a long way and gained invaluable insights on our journey so far. Not only did we get our hands dirty with countless training sessions and refinements to the neural net, but our first-hand experience also helped us set expectations right and separate hype from substance. Most notably, it changed our product shaping process, making it more important than ever to foster strong ties between the product stakeholders and to share a common vision and goal everybody can get behind.
In the following, I’d like to share the story of how we built the app and closed the gaps between the stakeholders’ roles within this process.
Preface: Before Neural Networks Were the Hot New Thing
My journey begins over ten years ago, while I was graduating in neuroscience. Back then, the idea of A.I. was just a vague promise. Artificial Neural Networks were too small, computers lacked the necessary power, and the results, while certainly nice, were still too weak to compete with traditional algorithms. Research felt stuck in tiny little specializations without really following a broader vision. Disillusioned by their impracticality, I slowly lost interest in Neural Networks.
It took research on Neural Networks another six years to get back on my radar. At that time, I was leading several product developments at 9elements. When I learned about the work of DeepMind (now part of Google), I had a genuine feeling that this time, A.I. was ready for the limelight.
As we were in the course of building a library for image editing and computer vision, the PhotoEditorSDK, we realized how much neural nets could also affect the creative space, given their ability to abstract and formalize rules. What if there was a machine that could reproduce, within a second, the common and dull tasks you have to do as an art director? What if designers could get rid of repetitive and tedious activities that interrupt their creative flow?
But this topic isn’t something you’d learn in a week, obviously. Still, innovations cannot happen if you’re not willing to take a risk, so we decided to invest considerable time and resources into this technology.
From a product management perspective, this process is actually an anti-pattern: Usually, you wouldn’t want to start by finding the right purpose for a technology; instead, you’d find the right technology for a purpose. I still believe that this is essentially the right approach, but sometimes you have to abandon your best practices and take a swim in uncharted waters. Consequently, we asked Malte, one of our iOS engineers, to spearhead our research and take a deep dive into this topic. We decided to start off with image segmentation as the first process that we wanted to optimize through machine learning. Masking and clipping can be tedious tasks, and ultimately we wanted to reduce a process that can take several minutes to a single click.
Chapter 1: The Machine Engineer
Malte, who is a diligent engineer and, how convenient, a passionate photographer, started investigating approaches that focused on image segmentation. You can read more about his journey in his article. Although he experimented with various neural networks and post-processing techniques, the resulting masks sometimes lacked the desired accuracy and wouldn’t have matched users’ expectations. This was a first, if expected, insight. As we want to deliver ready-to-use products that don’t need any complex tweaking, this was something we had to fix. Our problems originated mostly from our rather ambitious goal to segment any type of object within an image. It would have required training on vast amounts of data and scaling up the number of filters in our network. However, given our on-device constraint, this would have killed our carefully crafted performance.
Therefore, we shifted from this generalist approach to a specialized network restricted to images from a certain domain. In hindsight, this seems quite obvious, as our rather small model would never have been able to cope with the amount of variation existing in ‘the real world’ anyway. So, we went back to the drawing board and started discussing which domain to focus on. That’s where we got stuck; we struggled to find an obvious trend in our customers’ use cases or on known photography platforms.
It was actually during his summer holiday that Malte had his flash of genius. At a stop-over in Singapore, he noticed how the city was flooded with selfie-stick-wielding tourists. The sheer amount of selfies taken at any public place in Singapore left him astonished, and he realized that he had just found the right domain. Selfies, and portraits in general, felt like an infinite data source and a prime use case for our image segmentation algorithm. Back home, we decided to focus on selfies and portrait-like photography.
Malte started searching for portrait datasets and found a collection of roughly 2,000 portrait images gathered from Flickr. Those were a great starting point, and after a few training runs he already reached satisfactory results, as the model was now capable of capturing all the available variations. At that point, we had a system on our hands that was able to segment portrait or selfie images in real time, on the very device you capture them with. This seemed like a great opportunity, but we didn’t want to stop just there. Releasing a prototype that can free a selfie from its background is nice, but doesn’t feel like something that would truly showcase how AI can make a difference in our creative process.
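Conceptually, the on-device step boils down to running the photo through the segmentation network and turning its per-pixel foreground probabilities into an alpha mask. The following is a minimal sketch in Python with NumPy; the probability map here is a toy stand-in for the network output, and the function name is illustrative rather than part of our SDK:

```python
import numpy as np

def portrait_mask(probabilities: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn per-pixel foreground probabilities (H x W, values in [0, 1])
    into an 8-bit alpha mask suitable for compositing."""
    return (np.where(probabilities >= threshold, 1.0, 0.0) * 255).astype(np.uint8)

# Toy stand-in for the network output: a 4x4 probability map where the
# center pixels are likely "person" and the border is likely background.
probs = np.array([
    [0.1, 0.2, 0.2, 0.1],
    [0.2, 0.9, 0.8, 0.2],
    [0.2, 0.9, 0.9, 0.1],
    [0.1, 0.2, 0.1, 0.0],
])
mask = portrait_mask(probs)
# Center pixels become opaque (255), border pixels transparent (0).
```

In the real pipeline, of course, the probabilities come from the trained network running on the device, and the mask is computed per camera frame rather than on a toy array.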
Chapter 2: The Art Director
This is where our Art Director Tommi, a renowned graphic artist and former sprayer, stepped in to explore what can be done with a selfie, an accurate alpha mask, and the image editing features of our PhotoEditorSDK. When Tommi took the lead, I asked him to draft a vision, a creative direction for our app that combines all the tools and possibilities at our disposal.
Together, we started exploring portrait trends and unique imagery that would help us find a direction for our showcase. Soon, the walls of our meeting rooms and offices were plastered with inspirational works on portrait photography of all different kinds and styles. This visual catalogue kept inspiring us, although we weren’t sure which style to settle on in the end. It was when we could hardly find any more free spots on our walls, and after looking at them countless times, that the idea struck:
What if we could enable users to turn their portraits into what we saw on these walls? And this without the design expertise that would normally be required to do so.
Instead of brooding over a completely new form of portraits, we could take all these styles and instantly realize them with the technology we had. From that point on, we flipped our process upside down. Instead of thinking about what the technology is capable of, or identifying a problem worth solving, we aimed for the creative output that we wanted our app to produce. While we started our venture with a technology, we now had visual results that we could work towards. The main question shifted from “What is our technology capable of?” to “How can we achieve this visual output with our technology?”
Tommi designed five lead graphics, so our team of engineers and designers could grasp what we ultimately wanted to achieve, using only a selfie and the features of our SDK.
Chapter 3: Closing Ranks
With such a clear vision for our app, we started separating the wheat from the chaff, categorizing the portraits and understanding which operations of our SDK we had to combine, assemble and enhance to create these visuals.
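To give a feel for how the alpha mask and the editing operations come together, here is a hedged sketch of the standard “over” compositing step: the mask decides, per pixel, how much of the selfie versus the styled background ends up in the final artwork. All names are illustrative; this is not our SDK’s API, just the underlying arithmetic:

```python
import numpy as np

def composite(foreground: np.ndarray, background: np.ndarray,
              alpha: np.ndarray) -> np.ndarray:
    """Standard alpha compositing: result = a*fg + (1-a)*bg, per pixel.
    foreground/background: H x W x 3 float arrays in [0, 1];
    alpha: H x W float array in [0, 1], e.g. from the segmentation mask."""
    a = alpha[..., np.newaxis]  # broadcast the mask over the color channels
    return a * foreground + (1.0 - a) * background

fg = np.full((2, 2, 3), 0.8)                 # bright "selfie" pixels
bg = np.zeros((2, 2, 3))                     # dark styled background
alpha = np.array([[1.0, 0.0],
                  [0.5, 1.0]])               # opaque, transparent, half-blended
out = composite(fg, bg, alpha)
```

Where alpha is 1 the selfie shows through untouched, where it is 0 the styled background wins, and fractional values blend the two, which is exactly what makes soft mask edges look natural in the final portrait.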
What followed was a remarkable interplay across multiple stakeholders of our team. While we have always been very vocal proponents of building strong relationships between product stakeholders, the introduction of the AI layer actually glued our team further together.
Our designers started to embrace the engineering perspective, playfully identifying both opportunities and constraints through the tech layer. At the same time, our engineers embraced the design vision and formalized it into code. Let me give you some examples:
While thinking about the UI, we understood that the transformation of a selfie into a graphical artwork required immediate feedback for the user, so they can find a pose that works best with the respective artwork. Consequently, we optimized our networks for real-time processing, a true challenge that required strong expertise in both iOS engineering and neural net architecture.
Our designs and recipes, in turn, had to be tweaked to gracefully allow for errors of our AI, because even an error rate of 3% can still produce undesired artefacts and mask inaccuracies. We did that by using techniques that beautifully fringe the edges of the portrait.
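One common way to hide small segmentation errors, sketched below under the assumption of a simple separable box blur (our actual recipes use their own, more decorative edge treatments), is to feather the hard mask border so that mistakes dissolve into a gradual transition instead of a visible jagged edge:

```python
import numpy as np

def feather(mask: np.ndarray, radius: int = 1) -> np.ndarray:
    """Soften a hard 0/1 mask with a separable box blur so that small
    segmentation errors blend into a gradual alpha transition."""
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    # Blur along rows, then along columns (separable convolution).
    blurred = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, mask.astype(float))
    blurred = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, blurred)
    return blurred

hard = np.zeros((5, 5))
hard[1:4, 1:4] = 1.0        # hard-edged "portrait" region
soft = feather(hard)
# The center stays fully opaque while the border fades out gradually.
```

With a fractional alpha at the boundary, a mask that is off by a pixel or two no longer produces a harsh cut-out artifact; the error is absorbed into the soft edge.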
Altogether, the close cooperation, as well as countless meetings, feedback loops, and the continuous fine-tuning of the code and underlying recipes, is what brought us here.
All of this wouldn’t have happened if we hadn’t taken the risk to invest in a rising technology in the first place. And all of this wouldn’t have been possible if it weren’t for the exemplary cooperation between all the stakeholders. Portrait is a showcase of how technology can inspire and tie a team together. This, in the end, is absolutely necessary if we want to achieve the leaps we expect from AI. If you want to impact the creative space by introducing an AI layer to it, your engineers have to think like designers, or at least deeply understand their work.
The Road Ahead
Portrait is a first showcase and one step of many in our venture to wire several AI aspects deep into our SDK. On our journey, we’ve identified many more opportunities to make creative work and design accessible to broader audiences. Of course, we will also improve our models and networks with better and more data, always keeping in mind the aesthetic and visual output we’d like to achieve. We’ll keep you posted on our updates and next ventures into this exciting new era.
Thanks to my co-authors Malte & Felix!