<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Pascal – IMG.LY Blog</title><description>Posts by Pascal on the IMG.LY blog.</description><link>https://img.ly/blog/author/pascal/</link><language>en-us</language><image><url>https://img.ly/apple-touch-icon.png</url><title>Pascal – IMG.LY Blog</title><link>https://img.ly/blog/author/pascal/</link></image><atom:link href="https://img.ly/blog/author/pascal/rss.xml" rel="self" type="application/rss+xml"/><generator>Astro</generator><lastBuildDate>Fri, 12 Jun 2026 10:10:22 GMT</lastBuildDate><ttl>60</ttl><item><title>A Remote Data Aggregation Pipeline to Provide Machine Learning Datasets</title><link>https://img.ly/blog/data-aggregation/</link><guid isPermaLink="true">https://img.ly/blog/data-aggregation/</guid><description>This blog explores how we aggregate datasets remotely, keep track of already generated datasets, and download datasets from a remote server using a generalizable architecture pattern.</description><pubDate>Fri, 16 Apr 2021 15:55:59 GMT</pubDate><content:encoded>&lt;p&gt;In our NRW.EFRE funded research project &lt;a href=&quot;https://kidesign.img.ly/&quot;&gt;KI Design&lt;/a&gt;, we spent a lot of time in training and testing convolutional networks together with our fellows from the &lt;a href=&quot;http://www.bo-i-t.de/&quot;&gt;Bochumer Institute of Technology (BO-I-T)&lt;/a&gt;. Our goal: Using Artificial Intelligence (AI) to make photo editing more comprehensive and easier at the same time. Details about the project and the motivation behind it, and some results can be found on our &lt;a href=&quot;https://kidesign.img.ly&quot;&gt;project homepage&lt;/a&gt; and in this &lt;a href=&quot;https://img.ly/blog/image-inpainting/&quot;&gt;blog post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Figure 1: Example of an AI alpha-matted image of a wood duck.&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 2000px) 2000px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;2000&quot; height=&quot;721&quot; src=&quot;https://img.ly/_astro/Matting_example_sW2Wq.webp&quot; srcset=&quot;/_astro/Matting_example_ZESjaN.webp 640w, /_astro/Matting_example_ZDYIuA.webp 750w, /_astro/Matting_example_Z136yJu.webp 828w, /_astro/Matting_example_2bLPjB.webp 1080w, /_astro/Matting_example_ZkqN8b.webp 1280w, /_astro/Matting_example_Z70KUf.webp 1668w, /_astro/Matting_example_sW2Wq.webp 2000w&quot;&gt;&lt;/p&gt;
&lt;p&gt;Today, we want to share details about our custom-made approach for a remote data aggregation and data transfer pipeline that we developed to support seamless integration of data preprocessing and storage into the training procedure. We think that this topic is of interest to the machine learning community, because the generation, versioning, and handling of datasets for the training of machine learning algorithms constitute a challenge for many researchers and developers.&lt;/p&gt;
&lt;h3 id=&quot;introduction&quot;&gt;Introduction&lt;/h3&gt;
&lt;p&gt;A major challenge in nearly every project that applies machine learning or deep learning lies in preparing and augmenting the data for the training process. Because many current approaches and algorithms are data-driven, having training data in the right amount and quality can make the difference between a project’s success and failure. This also includes a well-organized data infrastructure to store data, possibly in different versions of datasets.&lt;/p&gt;
&lt;p&gt;In our joint project KI Design, our server cluster is split up geographically, with a data storage server at the BO-I-T laboratory and a computing server at the IMG.LY site. This, of course, makes the design of a training pipeline that uses resources efficiently a bit more demanding. After searching for an existing software solution, we decided to develop our own custom-made approach, adapted to our requirements, which we formulated as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An efficient workflow between data (pre)processing, storage, and provisioning on one side, and training initiation and execution on the other side&lt;/li&gt;
&lt;li&gt;A way to request data preparations and augmentations based on configurations generated on the training side&lt;/li&gt;
&lt;li&gt;Versioning of datasets or configurations to ensure reproducibility&lt;/li&gt;
&lt;li&gt;Availability of already computed datasets to omit preparation and processing of equal data requests multiple times&lt;/li&gt;
&lt;li&gt;A notification process implemented on the data server to signal the availability of a dataset and trigger the download as well as the training process on the computation side&lt;/li&gt;
&lt;li&gt;Integration into a TensorFlow training pipeline&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;our-custom-made-approach---overview&quot;&gt;Our Custom-made Approach - Overview&lt;/h3&gt;
&lt;p&gt;In this part, we briefly sketch our concept, before diving into our technical implementation in the next part. To train new models on our work-thirsty computing server at IMG.LY, we need training data. As mentioned before, data is stored on the data server at BO-I-T. We work on different tasks with different experiments in our project, so we need to run many different training jobs, many of which require different datasets.&lt;/p&gt;
&lt;p&gt;A dataset can be created in a data preparation process. In our context, the data preparation can be based on one of several “raw datasets” such as &lt;a href=&quot;https://cocodataset.org/#home&quot;&gt;COCO&lt;/a&gt;, &lt;a href=&quot;http://saliencydetection.net/duts/&quot;&gt;DUTS&lt;/a&gt;, or some self-assembled datasets and might include data pre-formatting (e.g. adjusting image size or section) and augmentation (e.g. image rotation, brightness adjustment, or combining foreground subjects with different backgrounds). Our idea was to use the data server for the whole process of data preparation and augmentation, so that no valuable resources on the client are wasted on it.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Figure 2: Example of an image data preparation process. Left: raw image data; top right: cropped and flipped image; bottom right: cropped and rotated image with adjusted exposure.&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 2000px) 2000px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;2000&quot; height=&quot;1057&quot; src=&quot;https://img.ly/_astro/Augmentation_example-2_yGquC.webp&quot; srcset=&quot;/_astro/Augmentation_example-2_1292sz.webp 640w, /_astro/Augmentation_example-2_1YQ7yv.webp 750w, /_astro/Augmentation_example-2_1TJESd.webp 828w, /_astro/Augmentation_example-2_vt6fU.webp 1080w, /_astro/Augmentation_example-2_1LJEyq.webp 1280w, /_astro/Augmentation_example-2_Bxbj.webp 1668w, /_astro/Augmentation_example-2_yGquC.webp 2000w&quot;&gt;&lt;/p&gt;
&lt;p&gt;The pre-training data aggregation phase can be described as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The client on the computing server should be able to request datasets from the data server. To do so, the client should transmit a (parameter) description of an exact configuration of the required dataset (image type and resolution size, as well as meta information) to the data server.&lt;/li&gt;
&lt;li&gt;While waiting for the requested dataset to be generated and provided, the computing server can spend its resources on other scheduled training jobs. (Computation time is money!)&lt;/li&gt;
&lt;li&gt;Meanwhile, the data server checks whether the requested dataset already exists, prepared from a previous request, or whether a new data generation process needs to be launched.&lt;/li&gt;
&lt;li&gt;As soon as the dataset is available and ready to be downloaded, either directly after the request (because it was already created in a previous request with the exact same configuration) or after the time it took to create the new dataset on the server, the server sends a notification message back to the client.&lt;/li&gt;
&lt;li&gt;Receiving this response, the computing server can download the dataset and initiate the training process.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;To implement this concept, we set up an architecture that is organized into three services, cf. Figure 3:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;a &lt;em&gt;dataset-client&lt;/em&gt;,&lt;/li&gt;
&lt;li&gt;a &lt;em&gt;dataset-server&lt;/em&gt; and&lt;/li&gt;
&lt;li&gt;a &lt;em&gt;dataset-handler&lt;/em&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://lh3.googleusercontent.com/6ef2FEye8FD5EtWNXYVzqspJwpHA9gTcLXZpKJ32S-Qha5BjTlJzVX78yVVCQuo5zHlkXqJgd2TJFQDH7-rv4BP_PMdUnNcDew8YH3Br8pVTpAeRrRVGrP1B4hTpr9SEqJi6Rj1b&quot; alt=&quot;Figure 3: Schema of the data transfer process&quot;&gt;&lt;/p&gt;
&lt;p&gt;For our implementation, we developed a dataset-client service that provides the functionality required on the client side. The dataset-server service is its counterpart on the server side. While these two components are fairly straightforward, the third one, the dataset-handler, requires a little explanation: implementing and training machine learning models is only one part of our research project. Equally important, we develop new strategies and approaches for data preparation. Thus, the server cannot be provided with all preparation functions a priori. Instead, it needs to be able to fetch the required code, e.g., for new augmentation procedures and other algorithms, in its latest version just before the preparation starts. The dataset-handler is our concept for this: it is the adjustable tool the server uses to perform the data preparation.&lt;/p&gt;
&lt;h2 id=&quot;our-custom-made-approach---technical-implementation&quot;&gt;Our Custom-made Approach - Technical Implementation&lt;/h2&gt;
&lt;p&gt;In this part, we do not aim to provide a full description of our implementation. Instead, our goal is to give some insights into which frameworks we used and how we implemented the interfaces between the client, the data server, and the data handler.&lt;/p&gt;
&lt;h3 id=&quot;framework-and-languages&quot;&gt;Framework and Languages&lt;/h3&gt;
&lt;p&gt;To set up our remote data pipeline we used a combination of different tools, frameworks and programming languages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Apache2&lt;/li&gt;
&lt;li&gt;Node.js&lt;/li&gt;
&lt;li&gt;Python&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For our data server, we used Apache2 as a web server to accept requests from outside and redirect them. Our REST API and WebSocket connection are set up in Node.js, where we use the &lt;em&gt;Express.js&lt;/em&gt; and WebSocket libraries to handle the REST requests and establish the WebSocket connection. Further server-side processes, such as checking the availability of datasets by their hash, setting up a virtual environment (we use a pipenv environment to make sure that all libraries are available on the server), and triggering the data handler, are written in Python. Here, we used the standard-library modules hashlib, subprocess, and zipfile, along with some other basic libraries.&lt;/p&gt;
&lt;p&gt;Besides the &lt;em&gt;data-server&lt;/em&gt; service, the &lt;em&gt;data-handler&lt;/em&gt; and the &lt;em&gt;data-client&lt;/em&gt; are also written in Python. Here we used libraries such as requests, asyncio, and WebSocket to establish the client-side connection with the data server. The other libraries used by the data handler depend strongly on the task at hand and thus vary a lot. To name a few examples: we frequently use Pillow, imageio, or OpenCV for image manipulation.&lt;/p&gt;
&lt;h3 id=&quot;client-side&quot;&gt;Client Side&lt;/h3&gt;
&lt;p&gt;The dataset client is running on the client and triggers the whole data aggregation process. It covers the following functionalities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;requesting datasets,&lt;/li&gt;
&lt;li&gt;establishing a WebSocket connection,&lt;/li&gt;
&lt;li&gt;downloading aggregated datasets,&lt;/li&gt;
&lt;li&gt;starting the model training.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To get preprocessed training data from the server, the client sends a REST-based POST request to the data server. This request includes a configuration that defines the exact dataset attributes, as well as meta information that specifies the raw dataset to be used for the preprocessing and the version of the data-handler. Here is a code snippet showing the required POST parameters and their data types:&lt;/p&gt;
&lt;table&gt;&lt;colgroup&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;p dir=&quot;ltr&quot;&gt;&lt;span&gt;request_dataset(dataset_name: str, handler_version: str, dataset_config: Dict)&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;The parameters &lt;em&gt;dataset_name&lt;/em&gt; and &lt;em&gt;handler_version&lt;/em&gt; are passed as strings and the config settings &lt;em&gt;dataset_config&lt;/em&gt; as a dictionary. Different handlers are assigned to different projects and are customized to their specific needs. Therefore, the config settings vary from project to project. To give an idea of some configurations, here is a (short) example:&lt;/p&gt;
&lt;table&gt;&lt;colgroup&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;p dir=&quot;ltr&quot;&gt;&lt;span&gt;short_example_config = {&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;&apos;input_attributes&apos;&lt;/span&gt;&lt;span&gt;: [&lt;/span&gt;&lt;span&gt;&apos;image&apos;&lt;/span&gt;&lt;span&gt;],&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;&apos;batch_size&apos;&lt;/span&gt;&lt;span&gt;: &lt;/span&gt;&lt;span&gt;100&lt;/span&gt;&lt;span&gt;,&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;&apos;image_size&apos;&lt;/span&gt;&lt;span&gt;: [&lt;/span&gt;&lt;span&gt;1024&lt;/span&gt;&lt;span&gt;, &lt;/span&gt;&lt;span&gt;1024&lt;/span&gt;&lt;span&gt;],&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;&apos;variations_per_sample&apos;&lt;/span&gt;&lt;span&gt;: &lt;/span&gt;&lt;span&gt;1&lt;/span&gt;&lt;span&gt;,&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;&apos;crop_size&apos;&lt;/span&gt;&lt;span&gt;: [&lt;/span&gt;&lt;span&gt;800&lt;/span&gt;&lt;span&gt;, &lt;/span&gt;&lt;span&gt;800&lt;/span&gt;&lt;span&gt;],&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;&apos;random_crop_centered&apos;&lt;/span&gt;&lt;span&gt;: &lt;/span&gt;&lt;span&gt;True&lt;/span&gt;&lt;span&gt;,&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;}&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;It shows some basic parameters such as image type, image resolution, and batch size, but also options for further operations such as centered cropping. These settings serve as instructions for the &lt;em&gt;data_handler&lt;/em&gt; to create the dataset; they are fully customizable and can be adapted to the use case.&lt;/p&gt;
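&lt;p&gt;To make the request more concrete, here is a minimal sketch of how such a POST request could be assembled with the Python standard library. The endpoint URL, the JSON field layout, and the helper &lt;em&gt;build_dataset_request&lt;/em&gt; are assumptions made for illustration only; our actual client uses the requests library and may differ in these details.&lt;/p&gt;

```python
import json
import urllib.request
from typing import Any, Dict


def build_dataset_request(base_url: str, dataset_name: str, handler_version: str,
                          dataset_config: Dict[str, Any]) -> urllib.request.Request:
    """Assemble (but do not send) a POST request describing the wanted dataset."""
    payload = {
        "dataset_name": dataset_name,
        "handler_version": handler_version,
        # Sorting the keys keeps the serialized config stable across runs.
        "dataset_config": json.dumps(dataset_config, sort_keys=True),
    }
    return urllib.request.Request(
        url=base_url + "/datasets",  # hypothetical endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_dataset_request(
    "https://data-server.example", "coco", "v1.0.0",
    {"image_size": [1024, 1024], "batch_size": 100},
)
# urllib.request.urlopen(request) would send it; omitted here on purpose.
```

&lt;p&gt;Serializing the config with sorted keys is a design choice of this sketch: it keeps the transmitted string identical for equal configurations, which matters once the server derives a hash from it.&lt;/p&gt;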
&lt;p&gt;After receiving a successful response to the POST request, the dataset client establishes a WebSocket connection. This allows continuous communication between the client and the data server, which is important for a server-side notification once data preprocessing has finished (as an alternative, we could have implemented regularly scheduled client-side polling to check for dataset states). Here is an example of our &lt;em&gt;asyncio&lt;/em&gt; event loop command:&lt;/p&gt;
&lt;table&gt;&lt;colgroup&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;p dir=&quot;ltr&quot;&gt;&lt;span&gt;asyncio.get_event_loop().run_until_complete(wait_for_completion(&lt;/span&gt;&lt;span&gt;&apos;topic_id&apos;&lt;/span&gt;&lt;span&gt;))&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;With this loop, the client waits until it receives a related response from the server. If a success token is returned, the client leaves the loop, downloads the dataset, and then starts the actual training process (as soon as the training scheduler assigns resources, but this is a different story to tell).&lt;/p&gt;
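&lt;p&gt;The waiting logic can be sketched as follows. This is not our production code: the message format (JSON with &lt;em&gt;topic_id&lt;/em&gt;, &lt;em&gt;status&lt;/em&gt;, and &lt;em&gt;file_name&lt;/em&gt; fields) is an assumption, and the WebSocket connection is replaced by a stand-in async generator so that the example runs without a server. On Python 3.7 and later, &lt;em&gt;asyncio.run&lt;/em&gt; is the usual entry point instead of fetching the event loop manually.&lt;/p&gt;

```python
import asyncio
import json
from typing import AsyncIterator, Optional


async def wait_for_completion(messages: AsyncIterator[str], topic_id: str) -> Optional[str]:
    """Consume notifications until one reports success for our topic."""
    async for raw in messages:
        message = json.loads(raw)
        if message.get("topic_id") != topic_id:
            continue  # notification belongs to another request
        if message.get("status") == "success":
            return message.get("file_name")
    return None  # stream closed without a success message


async def fake_socket() -> AsyncIterator[str]:
    """Stand-in for a WebSocket connection, used only for this demo."""
    yield json.dumps({"topic_id": "other", "status": "success", "file_name": "x.zip"})
    yield json.dumps({"topic_id": "abc", "status": "running"})
    yield json.dumps({"topic_id": "abc", "status": "success", "file_name": "coco_v1_abc.zip"})


file_name = asyncio.run(wait_for_completion(fake_socket(), "abc"))
print(file_name)  # prints "coco_v1_abc.zip"
```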
&lt;h3 id=&quot;server-side&quot;&gt;Server Side&lt;/h3&gt;
&lt;p&gt;On the server side, two services are running: the &lt;em&gt;dataset_server&lt;/em&gt; and the &lt;em&gt;dataset_handler&lt;/em&gt;. The &lt;em&gt;dataset_server&lt;/em&gt; handles the communication with the client and receives configuration requests as well as download requests. Furthermore, it checks the availability of datasets and, if necessary, triggers the &lt;em&gt;data_handler&lt;/em&gt; to run a dataset creation process. In summary, the &lt;em&gt;dataset_server&lt;/em&gt; covers the following functionalities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;receiving requests via the REST service&lt;/li&gt;
&lt;li&gt;setting up the WebSocket communication&lt;/li&gt;
&lt;li&gt;checking for existing datasets&lt;/li&gt;
&lt;li&gt;installing the &lt;em&gt;dataset_handler&lt;/em&gt; and initializing dataset creation&lt;/li&gt;
&lt;li&gt;sending notifications to the client&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Using a REST API as an entry point, the data server receives the client’s POST request and checks whether the required dataset has already been created. This check is done by string matching on a unique identifier of the configuration (we call it a UUID, although it is a name derived from an MD5 hash rather than a standard UUID): each request is converted to an identifier generated from the transferred dataset configuration. The identifier consists of the dataset name, the handler version, and the MD5 hash of the config settings. We put the dataset name and handler version in front of the MD5 hash because they might be useful information in case we run out of disk space. Here is our function for generating the UUID:&lt;/p&gt;
&lt;table&gt;&lt;colgroup&gt;&lt;col width=&quot;601&quot;&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;p dir=&quot;ltr&quot;&gt;&lt;span&gt;def&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span&gt;hash_dataset_name&lt;/span&gt;&lt;span&gt;(dataset_name: str, handler_version: str, dataset_config: str)&lt;/span&gt;&lt;span&gt;:&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;h = hashlib.md5()&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;h.update(dataset_config.encode(&lt;/span&gt;&lt;span&gt;&quot;utf-8&quot;&lt;/span&gt;&lt;span&gt;))&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;return&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span&gt;&quot;&quot;&lt;/span&gt;&lt;span&gt;.join([dataset_name, &lt;/span&gt;&lt;span&gt;&quot;_&quot;&lt;/span&gt;&lt;span&gt;, handler_version, &lt;/span&gt;&lt;span&gt;&quot;_&quot;&lt;/span&gt;&lt;span&gt;, h.hexdigest(), &lt;/span&gt;&lt;span&gt;&quot;.zip&quot;&lt;/span&gt;&lt;span&gt;])&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
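&lt;p&gt;Because the hash is computed over the config as a string, two configurations that differ only in key order would produce different archive names unless the serialization is normalized first. The following sketch shows one way to do this together with the cache lookup; the helpers &lt;em&gt;canonical_config&lt;/em&gt;, &lt;em&gt;archive_name&lt;/em&gt;, and &lt;em&gt;is_cached&lt;/em&gt; are hypothetical names, and whether the original pipeline normalizes the config in this way is an assumption.&lt;/p&gt;

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Dict


def canonical_config(dataset_config: Dict[str, Any]) -> str:
    """Serialize a config so that equal configs always yield equal strings."""
    return json.dumps(dataset_config, sort_keys=True, separators=(",", ":"))


def archive_name(dataset_name: str, handler_version: str, dataset_config: Dict[str, Any]) -> str:
    """Readable prefix plus MD5 digest of the canonical config."""
    digest = hashlib.md5(canonical_config(dataset_config).encode("utf-8")).hexdigest()
    return "_".join([dataset_name, handler_version, digest]) + ".zip"


def is_cached(datastore: Path, name: str) -> bool:
    """A dataset counts as available if its archive already exists."""
    return (datastore / name).is_file()


a = archive_name("coco", "v1", {"batch_size": 100, "image_size": [1024, 1024]})
b = archive_name("coco", "v1", {"image_size": [1024, 1024], "batch_size": 100})
print(a == b)  # prints "True": key order does not change the name
```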
&lt;p&gt;If the UUID matches an existing dataset, a notification is returned to the client, pointing to the related zip archive on the datastore. Otherwise, the &lt;em&gt;data_handler&lt;/em&gt; is instructed to start data preprocessing. However, before the generation process can be started, the correct &lt;em&gt;data_handler&lt;/em&gt; has to be downloaded and installed. Different &lt;em&gt;data_handler&lt;/em&gt; versions are stored in a GitHub repository, from which the &lt;em&gt;data_server&lt;/em&gt; can download them. This is done by launching a subprocess from Python that runs a “&lt;em&gt;pip install&lt;/em&gt;”:&lt;/p&gt;
&lt;table&gt;&lt;colgroup&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;p dir=&quot;ltr&quot;&gt;&lt;span&gt;subprocess.call([&lt;/span&gt;&lt;span&gt;&quot;pipenv&quot;&lt;/span&gt;&lt;span&gt;,&lt;/span&gt;&lt;span&gt;&quot;run&quot;&lt;/span&gt;&lt;span&gt;,&lt;/span&gt;&lt;span&gt;&quot;pip&quot;&lt;/span&gt;&lt;span&gt;, &lt;/span&gt;&lt;span&gt;&quot;install&quot;&lt;/span&gt;&lt;span&gt;, &lt;/span&gt;&lt;span&gt;&quot;--no-cache-dir&quot;&lt;/span&gt;&lt;span&gt;, &lt;/span&gt;&lt;span&gt;&quot;--upgrade&quot;&lt;/span&gt;&lt;span&gt;, &lt;/span&gt;&lt;span&gt;&quot;--process-dependency-links&quot;&lt;/span&gt;&lt;span&gt;, &lt;/span&gt;&lt;span&gt;&quot;git+https://github.com/imgly/dataset-handlers@{}#egg=dataset-handlers&quot;&lt;/span&gt;&lt;span&gt;.format(handler_version)])&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;Note: We need pip version 18 here because it is the last version supporting the “&lt;em&gt;--process-dependency-links&lt;/em&gt;” option, which ensures that the dependencies inside of dataset-handlers are installed. The &lt;em&gt;data_handler&lt;/em&gt; version is directly passed into the pip command, linking to the corresponding GitHub repository and handler release version.&lt;/p&gt;
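&lt;p&gt;Since the handler version comes from the client request, it can be worth validating it and checking the exit code of the installation. The sketch below is one possible variant, not our deployed code: the helper names and the example version &lt;em&gt;v1.2.0&lt;/em&gt; are hypothetical, and the validation rule is a deliberately simple assumption. Unlike &lt;em&gt;subprocess.call&lt;/em&gt;, which only returns the exit code, &lt;em&gt;subprocess.run&lt;/em&gt; with &lt;em&gt;check=True&lt;/em&gt; raises an exception when the command fails.&lt;/p&gt;

```python
import subprocess
from typing import List

HANDLER_REPO = "git+https://github.com/imgly/dataset-handlers@{}#egg=dataset-handlers"


def build_install_command(handler_version: str) -> List[str]:
    """Build the argument list for installing one handler release."""
    if not handler_version or handler_version.startswith("-"):
        raise ValueError("handler_version must be a tag or commit, not an option")
    return [
        "pipenv", "run", "pip", "install", "--no-cache-dir", "--upgrade",
        "--process-dependency-links",
        HANDLER_REPO.format(handler_version),
    ]


def install_handler(handler_version: str) -> None:
    """Run the installation and fail loudly if pip reports an error."""
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(build_install_command(handler_version), check=True)


command = build_install_command("v1.2.0")  # hypothetical release tag
```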
&lt;h3 id=&quot;data-handler&quot;&gt;Data Handler&lt;/h3&gt;
&lt;p&gt;After successful installation of the &lt;em&gt;dataset_handler&lt;/em&gt;, we can simply import the handler in Python: &lt;em&gt;import dataset_handlers&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;As described earlier, the core functionality of the &lt;em&gt;data_handler&lt;/em&gt; is to create a dataset based on a given configuration and return zipped .tfrecord files. For those of you who are not familiar with the &lt;em&gt;.tfrecord&lt;/em&gt; file format: it is a TensorFlow format that stores data as a sequence of binary records. This has the advantage of using less disk space, being faster to copy, and being more efficient to read from disk. But let’s go on with our &lt;em&gt;data_handler&lt;/em&gt;: depending on the project and use case, the data handlers vary strongly and offer different methods. This makes an example a bit difficult at this point. But we can present some abstract methods of our class DatasetHandler():&lt;/p&gt;
&lt;p&gt;We start with the initialization method or constructor:&lt;/p&gt;
&lt;table&gt;&lt;colgroup&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;p dir=&quot;ltr&quot;&gt;&lt;span&gt;from typing import Any, Dict&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;class&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span&gt;DatasetHandler&lt;/span&gt;&lt;span&gt;()&lt;/span&gt;&lt;span&gt;:&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;def&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span&gt;__init__&lt;/span&gt;&lt;span&gt;(self, config: Dict[str, Any], base_data_path: str)&lt;/span&gt;&lt;span&gt;:&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;self.base_data_path = base_data_path&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;for&lt;/span&gt;&lt;span&gt; k, v &lt;/span&gt;&lt;span&gt;in&lt;/span&gt;&lt;span&gt; config.items():&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;setattr(self, k, v)&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;The input variables from the client are passed into the constructor method and set as instance attributes. Additionally, the base path of the raw data is required and set as an attribute. These are the main settings needed for data generation. Furthermore, the following methods are important for a generation process:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a static method keeping all allowed config operations,&lt;/li&gt;
&lt;li&gt;a method to create the correct raw data path and&lt;/li&gt;
&lt;li&gt;a method that returns a .tfrecord file list.&lt;/li&gt;
&lt;/ul&gt;
&lt;table&gt;&lt;colgroup&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;p dir=&quot;ltr&quot;&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;def&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span&gt;as_tfrecord&lt;/span&gt;&lt;span&gt;(self)&lt;/span&gt;&lt;span&gt; -&gt; List[str]:&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;pass&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;@staticmethod&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;def&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span&gt;config_options&lt;/span&gt;&lt;span&gt;()&lt;/span&gt;&lt;span&gt; -&gt; Dict[str, Any]:&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;pass&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;def&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span&gt;tf_records_mapper&lt;/span&gt;&lt;span&gt;(self, files: List[str])&lt;/span&gt;&lt;span&gt; -&gt; tf.data.TFRecordDataset:&lt;/span&gt;&lt;span&gt;&lt;br&gt;&lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;    &lt;/span&gt;&lt;span&gt;pass&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;Finally, after successful dataset creation, the server creates a .zip from the .tfrecord files and returns the file name to the client over the WebSocket connection. The client can now download the dataset.&lt;/p&gt;
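&lt;p&gt;To illustrate how the constructor and the static method described above could work together, here is a toy handler that rejects unknown config keys and fills in defaults. The class &lt;em&gt;ValidatingHandler&lt;/em&gt;, its default values, and the validation behavior are assumptions for illustration; our real handlers are project-specific and differ in detail.&lt;/p&gt;

```python
from typing import Any, Dict


class ValidatingHandler:
    """Toy handler that rejects config keys it does not know."""

    @staticmethod
    def config_options() -> Dict[str, Any]:
        # Allowed keys with their defaults (hypothetical values).
        return {"image_size": [1024, 1024], "batch_size": 100, "random_crop_centered": False}

    def __init__(self, config: Dict[str, Any], base_data_path: str):
        unknown = set(config) - set(self.config_options())
        if unknown:
            raise ValueError("unsupported config keys: " + ", ".join(sorted(unknown)))
        self.base_data_path = base_data_path
        # Defaults first, then the overrides sent by the client.
        for key, value in {**self.config_options(), **config}.items():
            setattr(self, key, value)


handler = ValidatingHandler({"batch_size": 32}, "/data/raw/coco")
print(handler.batch_size, handler.image_size)  # prints "32 [1024, 1024]"
```

&lt;p&gt;Failing early on unknown keys avoids spending hours on a dataset whose configuration contains a typo.&lt;/p&gt;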
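&lt;p&gt;The packaging step can be sketched with the standard-library zipfile module. The function name &lt;em&gt;zip_tfrecords&lt;/em&gt;, the shard file names, and the choice of compression are assumptions of this example; the dummy files merely stand in for real .tfrecord shards.&lt;/p&gt;

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List


def zip_tfrecords(files: List[str], archive_path: str) -> str:
    """Pack the given files into one archive and return its file name."""
    # ZIP_DEFLATED compresses the entries; ZIP_STORED would only bundle them.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in files:
            # Store only the base name so the archive has a flat layout.
            archive.write(file, arcname=Path(file).name)
    return Path(archive_path).name


with tempfile.TemporaryDirectory() as tmp:
    shards = []
    for index in range(2):
        shard = Path(tmp) / "part-{}.tfrecord".format(index)
        shard.write_bytes(b"dummy")  # placeholder content
        shards.append(str(shard))
    name = zip_tfrecords(shards, str(Path(tmp) / "coco_v1_abc.zip"))
    with zipfile.ZipFile(Path(tmp) / "coco_v1_abc.zip") as archive:
        members = archive.namelist()
```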
&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;
&lt;p&gt;Today we introduced our concept and some insights into the implementation of our remote data pipeline, which we use to prepare and provide training data for complex machine learning training in our research project. Using this pipeline, we have solved the challenging task of a split infrastructure without suffering from strong performance-related deficits. This remote data transfer was one of the first milestones reached in our project and is also the basis of a successful collaboration.&lt;/p&gt;
&lt;p&gt;The advantage of our presented solution is the shared use of resources: the computing server focuses its resources entirely on training models, while the data server handles the labor-intensive creation of datasets. To give you an idea of why this is so important: as we work with large image data, a full augmentation process can easily take up to 12 hours, which would be a waste of valuable computing resources if the tasks were not split up. Furthermore, we usually start multiple training sessions at the same time that all require the same dataset. Without our data pipeline, each experiment would create its own copy of the same dataset and block even more resources. With our solution, this is handled far more efficiently.&lt;br&gt;
Our developed solution could be extended in further versions by features like a monitoring tool with visualization. Important information and statistics to display could be, for example, the status of currently running data preparation processes, a list of all cached datasets including their configurations, as well as statistics about usage and downloads. This could help to keep the storage more structured and clean.&lt;/p&gt;
&lt;p&gt;All in all, we are very content with our self-developed approach and with how fast the communication and transfer between our servers take place.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;This project was funded by the European Regional Development Fund (ERDF).&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 449px) 449px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;449&quot; height=&quot;112&quot; src=&quot;https://img.ly/_astro/9_Zv71r7.webp&quot; srcset=&quot;/_astro/9_Zv71r7.webp 449w&quot;&gt;&lt;/p&gt;</content:encoded><dc:creator>Leonard</dc:creator><dc:creator>Philip</dc:creator><dc:creator>Pascal</dc:creator><dc:creator>Sebastian</dc:creator><media:content url="https://blog.img.ly/2021/04/Zeichenfl-che-1-100-1.jpg" medium="image"/><category>Machine Learning</category><category>AI</category></item><item><title>Inpainting: Removing Distracting Objects in High-Resolution Images</title><link>https://img.ly/blog/image-inpainting/</link><guid isPermaLink="true">https://img.ly/blog/image-inpainting/</guid><description>We teamed up with the Bochumer Institute of Technology to present improved results with deep learning approaches. Here are our experiences from a mission to make editing easier.</description><pubDate>Tue, 08 Dec 2020 15:03:28 GMT</pubDate><content:encoded>&lt;h3 id=&quot;introduction&quot;&gt;Introduction&lt;/h3&gt;
&lt;p&gt;You may know this situation: You are out on a trip when suddenly a unique opportunity for a photograph appears, like a wild animal showing up or sun rays breaking through the rain clouds for a few seconds. Without hesitation, you grab your camera and capture the sight. Later you discover that a distracting object, like a road sign, is ruining your shot. Time for some cumbersome retouching.&lt;/p&gt;
&lt;p&gt;Now, imagine you could erase the distracting object just by highlighting it. Wonderful! From the field of deep learning, a technique for image manipulation called &lt;strong&gt;Image Inpainting&lt;/strong&gt; makes it possible. Image Inpainting aims to cut out undesired parts of an image and &lt;strong&gt;fill in the missing information&lt;/strong&gt; with plausible content of patterns, colors, and textures that match the surroundings.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Figure 1: Inpainting example. A) shows the original image; B) the masked (input) image; C) the results of the inpainting.&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 1496px) 1496px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;1496&quot; height=&quot;579&quot; src=&quot;https://img.ly/_astro/1_1Tl2Oq.webp&quot; srcset=&quot;/_astro/1_8J1G8.webp 640w, /_astro/1_Zch4i2.webp 750w, /_astro/1_ZygjVb.webp 828w, /_astro/1_2eNKv8.webp 1080w, /_astro/1_1GfTHz.webp 1280w, /_astro/1_1Tl2Oq.webp 1496w&quot;&gt;&lt;/p&gt;
&lt;p&gt;Today we would like to share experiences that we have gained during the application of deep learning inpainting approaches. Furthermore, we’ll present some quality optimization steps that we have implemented to improve results addressing the transformation to high-resolution outputs. But let us start with a quick introduction: who are we and why are we concerned with these kinds of topics?&lt;/p&gt;
&lt;p&gt;We are a small consortium consisting of the &lt;strong&gt;&lt;a href=&quot;https://bo-i-t.de/&quot;&gt;Bochumer Institute of Technology&lt;/a&gt;&lt;/strong&gt;, a research institute aiming to transfer knowledge from academia into industry, and the company &lt;strong&gt;&lt;a href=&quot;https://img.ly&quot;&gt;IMG.LY&lt;/a&gt;,&lt;/strong&gt; a team of software engineers and designers developing creative tools like the &lt;a href=&quot;https://img.ly/products/photo-sdk/&quot;&gt;PhotoEditor SDK&lt;/a&gt; and the &lt;a href=&quot;https://img.ly/blog/building-the-creative-engine-of-the-world/&quot;&gt;UBQ&lt;/a&gt; engine. Together we are working in the EFRE.NRW funded research project &lt;a href=&quot;https://kidesign.img.ly/&quot;&gt;KI Design&lt;/a&gt; that targets artificial intelligence (AI) and deep learning-based algorithms for image content analysis and modification, as well as a leveraging tool kit for aesthetic improvements.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Image Inpainting&lt;/strong&gt; has been a viable technique in image processing for quite some time, even before “Artificial Intelligence” was on everyone’s lips. Common to most inpainting algorithms is that an area of an image is highlighted to be corrected. Many conventional algorithms then analyze the statistical distribution to fill the resulting gap by finding and using nearest neighbor patches. The best-known state-of-the-art approach of this kind is the &lt;strong&gt;&lt;a href=&quot;https://gfx.cs.princeton.edu/pubs/Barnes_2009_PAR/patchmatch.pdf&quot;&gt;PatchMatch&lt;/a&gt; algorithm&lt;/strong&gt;. It uses a fast, structured randomized search to identify the approximate nearest neighbor patches that will fill in the respective part of the image.&lt;/p&gt;
&lt;p&gt;However, there are two &lt;strong&gt;drawbacks&lt;/strong&gt;: first, regardless of the approximation, performance might still be an issue, and second, the results suffer from a lack of semantic understanding of the scene. Thus, research explored new directions and applied AI- and neural network-based approaches to solve these issues. For us, the removal of annoying background content is a useful feature, as it improves the overall image aesthetic. Having this available on mobile devices would be particularly interesting. Due to performance limitations and ever-improving integrated cameras, a mobile solution requires a fast and lightweight model architecture as well as the ability to process high-resolution images.&lt;/p&gt;
&lt;p&gt;In summary, our expectations for an AI-based inpainting algorithm are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;removal of (manually highlighted) background objects/persons&lt;/li&gt;
&lt;li&gt;feasibility to process high-resolution images&lt;/li&gt;
&lt;li&gt;fast and lightweight network (applicable for smartphones)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The number of publications addressing this or similar requirements has increased enormously in recent years. After digging into the literature, we identified two promising approaches and tested them.&lt;/p&gt;
&lt;h3 id=&quot;testing-and-comparing-model-architectures&quot;&gt;Testing and Comparing Model Architectures&lt;/h3&gt;
&lt;p&gt;These selected networks were based on the latest scientific findings and appeared to provide high-quality output. Both approaches have well-documented repositories – a special thank you to the authors for their great work (of repositories and papers as well)! The selected networks are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Partial Convolutional Neural Networks (PCONV); [&lt;a href=&quot;https://arxiv.org/abs/1804.07723&quot;&gt;paper&lt;/a&gt;, &lt;a href=&quot;https://github.com/MathiasGruber/PConv-Keras&quot;&gt;github-repository&lt;/a&gt;]&lt;/li&gt;
&lt;li&gt;Generative Multi-column Convolutional Neural Networks (GMCNN); [&lt;a href=&quot;http://papers.nips.cc/paper/7316-image-inpainting-via-generative-multi-column-convolutional-neural-networks.pdf&quot;&gt;paper&lt;/a&gt;, &lt;a href=&quot;https://github.com/shepnerd/inpainting_gmcnn&quot;&gt;github-repository&lt;/a&gt;]&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You may be wondering why exactly we chose these models for comparison, as “the latest scientific findings” sounds a bit vague. Indeed, it is considerably difficult to identify the best-fitting model architecture. As far as we know, there is no standardized validation method or data set. Most papers demonstrate their results on &lt;em&gt;self-selected&lt;/em&gt; test images and compare them with, again, &lt;em&gt;self-selected&lt;/em&gt; approaches. The only option we had was to evaluate models that seemed reasonable to us. A validation method or standardized test set could be a valuable scientific contribution here. Let’s turn back to the selected models. The PCONV network uses multiple convolutional layers and adds a &lt;em&gt;partial convolutional layer&lt;/em&gt;. The key feature is that the convolution does not consider invalid pixels, which are indicated by a mask that is updated from layer to layer. This prevents the algorithm from picking up the color of the mask (typically the average color tone of the image) and transmitting it into the reconstruction process.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;GMCNN&lt;/strong&gt; – a GAN-based model – has a special architecture consisting of three networks: a generator, split up into three branches addressing different feature levels; a local and a global discriminator; and a VGG19 net calculating the implicit diversified Markov random field (ID-MRF) introduced in the paper. This ID-MRF serves as a loss term comparing generated content with nearest-neighbor patches of the ground truth image. While the interaction of all three networks is required in the training phase, only the generative network is used for testing and production. More details and figures regarding the model architecture are available in the official &lt;a href=&quot;http://papers.nips.cc/paper/7316-image-inpainting-via-generative-multi-column-convolutional-neural-networks.pdf&quot;&gt;paper&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Due to the lack of standardized sets, we created our own test sets addressing different levels of complexity. This also included image data requiring an understanding of semantic structures. In our comparison, we paid special attention to whether the filled content was harmonious and whether artifacts were kept to a minimum. In particular, image artifacts could cause problems when the result is later transferred to high resolution. Here is an example output of our tests:&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Figure 2: Comparison of GMCNN and PCONV model. A) shows the original image; B) the masked (input) image; C) the results of the PCONV network and D) the results of the GMCNN model.&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 2000px) 2000px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;2000&quot; height=&quot;1095&quot; src=&quot;https://img.ly/_astro/2_r92Ec.webp&quot; srcset=&quot;/_astro/2_1AEMMW.webp 640w, /_astro/2_ZtiLji.webp 750w, /_astro/2_H1eho.webp 828w, /_astro/2_18mnqN.webp 1080w, /_astro/2_Z1c3kHp.webp 1280w, /_astro/2_1Hu0Sr.webp 1668w, /_astro/2_r92Ec.webp 2000w&quot;&gt;&lt;/p&gt;
&lt;p&gt;In comparison, the inpainting result based on PCONV suffered from some blurred artifacts and erratically deviating shades, cf. Figure 2C, whereas the GMCNN-based result appeared to be more precise and plausible concerning the semantic context, cf. Figure 2D. You can see this clearly when you look at the grille door that was covered by a person. The GMCNN approach, cf. Figure 2D, recognized and respected the grid structure, while the PCONV overlaid it with a uniform (black) color tone. Considering the results for all test data, we decided to follow up with the GMCNN.&lt;/p&gt;
&lt;p&gt;However, we would like to emphasize that this does not mean that one model architecture is better suited for Image Inpainting than the other. The PCONV architecture may achieve similar results with different weights, further training, or different test sets.&lt;/p&gt;
&lt;h3 id=&quot;what-about-high-resolution-inpainting&quot;&gt;What About High-Resolution Inpainting?&lt;/h3&gt;
&lt;p&gt;At the current state of the model, the processing of high-resolution images remains unaddressed. Out of all the papers and repositories we found, even those promising high resolution often targeted image sizes of at most 1024x1024 pixels. We expected a resolution of substantially more than 2000x2000 pixels. A reason for this gap seemed to be the hardware-demanding and time-consuming training phase when processing high-resolution images.&lt;br&gt;
Furthermore, the application of a high-resolution inpainting model could entail performance issues. These are not negligible for us, as we are aiming at a prospective implementation on smartphones, which can’t keep up with the power of a modern graphics card. Thus, an additional challenge is a high-quality transformation of low-resolution outputs to high resolution.&lt;/p&gt;
&lt;h3 id=&quot;apply-low-resolution-inpainting-output-to-high-resolution-images&quot;&gt;Apply Low-Resolution Inpainting Output to High-Resolution Images&lt;/h3&gt;
&lt;p&gt;The GMCNN model was trained with the &lt;a href=&quot;http://places2.csail.mit.edu/index.html&quot;&gt;Places&lt;/a&gt; dataset, formatted in a 512x680 resolution. Feeding in high-resolution images would exceed the training input size by far and further require information of feature dimensions that the model has never seen before. That could result in almost completely distorted reconstructions.&lt;/p&gt;
&lt;p&gt;A straightforward solution is to downscale the high-resolution image before feeding it to the model and then resize the result back up to the original image size. Due to the upscaling (e.g., via bicubic interpolation), the image details suffer from a loss of quality. Therefore, a better approach is to take only the masked areas of the upscaled inpainting prediction and stitch them back into the original image. That prevents the loss of initially known details from the unmasked regions. For the remaining inpainted regions, the lack of image details, as well as the artifacts and distortions, poses a complex challenge that we aimed to overcome with the following approaches.&lt;/p&gt;
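&lt;p&gt;The stitching idea described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the project’s code: images are nested lists of pixel values, the mask uses 1 for pixels to be inpainted, and a nearest-neighbor upscaling stands in for the bicubic interpolation mentioned above. The function names &lt;code&gt;upscale_nearest&lt;/code&gt; and &lt;code&gt;stitch&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
# Minimal sketch (assumption: images are nested lists, mask value 1 = pixel to inpaint).

def upscale_nearest(image, factor):
    # Nearest-neighbor upscaling by an integer factor; a simple stand-in
    # for the bicubic interpolation mentioned in the text.
    return [
        [row[x // factor] for x in range(len(row) * factor)]
        for row in image
        for _ in range(factor)
    ]


def stitch(original, upscaled_prediction, mask):
    # Take masked pixels from the prediction and all other pixels from the original.
    return [
        [pred if m == 1 else orig for orig, pred, m in zip(o_row, p_row, m_row)]
        for o_row, p_row, m_row in zip(original, upscaled_prediction, mask)
    ]


# Toy example: a 4x4 "high-resolution" image with a masked 2x2 center.
original = [
    [10, 11, 12, 13],
    [14, 15, 16, 17],
    [18, 19, 20, 21],
    [22, 23, 24, 25],
]
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# Pretend this 2x2 grid is the low-resolution model output.
low_res_prediction = [[1, 2], [3, 4]]

result = stitch(original, upscale_nearest(low_res_prediction, 2), mask)
for row in result:
    print(row)
```

&lt;p&gt;Only the four masked pixels change; every unmasked pixel keeps its original value, which is the point of stitching instead of using the upscaled prediction as a whole.&lt;/p&gt;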
&lt;h3 id=&quot;shrinking-mask-approach&quot;&gt;Shrinking-Mask-Approach&lt;/h3&gt;
&lt;p&gt;While the inpainted area mostly yields realistic-looking content for the more marginal regions, the quality decreases strongly towards the center, cf. Figure 3D. We especially noticed this behavior for larger masks. Consequently, a recursive inpainting procedure with an iteratively shrinking mask, cf. Figure 3E, seems to be a reasonable approach. With this concept, we try to improve the inpainting results progressively, starting from the boundary and moving to the center of the masked regions while utilizing the generated information of the preceding recursion, cf. Figure 3F.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Figure 3: Shrinking-mask-approach. A) shows the original image; B) the final inpainting result after applying the shrinking-mask-approach for 4 iterations; C) shows the original image masked by the original mask (model input for iteration 1); D) the inpainting result of iteration 1; E) the inpainting result of the previous iteration 1 masked by a shrunken mask (model input for iteration 2) and F) the inpainting results of iteration 2.&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 1062px) 1062px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;1062&quot; height=&quot;1479&quot; src=&quot;https://img.ly/_astro/3_Z24j7qw.webp&quot; srcset=&quot;/_astro/3_Z1giTCo.webp 640w, /_astro/3_ZPwLax.webp 750w, /_astro/3_ZAqkG6.webp 828w, /_astro/3_Z24j7qw.webp 1062w&quot;&gt;&lt;/p&gt;
&lt;p&gt;To us, it was essential to have a dynamic method that can handle all mask shapes and sizes. Therefore, we decided to apply an erosion kernel to the original mask recursively until it is fully eroded. The number of shrunken masks determines the number of inpainting passes performed by the network.&lt;/p&gt;
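&lt;p&gt;To make the procedure more concrete, here is a minimal sketch of how such a sequence of shrinking masks could be generated and used. It rests on several assumptions that are not taken from our implementation: the mask is a nested list with 1 for masked pixels, the erosion kernel is a 3x3 square, pixels outside the image count as unmasked, and &lt;code&gt;inpaint&lt;/code&gt; is a hypothetical callable standing in for the network.&lt;/p&gt;

```python
# Minimal sketch (assumptions: binary nested-list mask, 3x3 square kernel,
# pixels outside the image are treated as unmasked).

def erode(mask):
    # One erosion step: a pixel stays masked only if its whole 3x3 neighborhood is masked.
    height, width = len(mask), len(mask[0])

    def value(y, x):
        inside = y in range(height) and x in range(width)
        return mask[y][x] if inside else 0

    return [
        [
            1 if all(value(y + dy, x + dx) == 1 for dy in (-1, 0, 1) for dx in (-1, 0, 1)) else 0
            for x in range(width)
        ]
        for y in range(height)
    ]


def shrinking_masks(mask):
    # The original mask followed by its successive erosions, stopping before the empty mask.
    masks = []
    current = mask
    while any(1 in row for row in current):
        masks.append(current)
        current = erode(current)
    return masks


def inpaint_with_shrinking_mask(image, mask, inpaint):
    # One pass of the hypothetical inpaint(image, mask) per mask; each pass
    # starts from the result of the previous one.
    result = image
    for current_mask in shrinking_masks(mask):
        result = inpaint(result, current_mask)
    return result


# Toy example: a 5x5 masked block inside a 7x7 image.
mask = [[1 if y in range(1, 6) and x in range(1, 6) else 0 for x in range(7)] for y in range(7)]
masked_pixel_counts = [sum(map(sum, m)) for m in shrinking_masks(mask)]
print(masked_pixel_counts)  # [25, 9, 1]
```

&lt;p&gt;In this toy case the mask shrinks from 25 to 9 to 1 masked pixels, so the network would be applied three times. A larger kernel shrinks the mask faster and therefore reduces the number of passes.&lt;/p&gt;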
&lt;h3 id=&quot;two-step-approach&quot;&gt;Two-Step-Approach&lt;/h3&gt;
&lt;p&gt;While investigating and testing various quality optimization steps, we also fed high-resolution images into our model and discovered that the results for smaller masks were convincing. That led us to the hypothesis that not the resolution but rather the number of pixels to reconstruct seems to be the limiting factor. This finding served as the basis for our two-step-approach.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Figure 4: Two-step-approach. A) shows the original image; B) the final inpainting result after applying the two-step-approach&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 2000px) 2000px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;2000&quot; height=&quot;710&quot; src=&quot;https://img.ly/_astro/4_OlLcn.webp&quot; srcset=&quot;/_astro/4_ZcIuQI.webp 640w, /_astro/4_ZPa8Y2.webp 750w, /_astro/4_1peh2d.webp 828w, /_astro/4_ZdIXGY.webp 1080w, /_astro/4_xfsjk.webp 1280w, /_astro/4_uCJLy.webp 1668w, /_astro/4_OlLcn.webp 2000w&quot;&gt;&lt;/p&gt;
&lt;p&gt;Briefly, the approach works as follows. In the first step, we perform inpainting on a downscaled high-resolution image while applying the original mask. In the second step, we transfer the model output of step one to a higher resolution and perform inpainting again. This time we apply a modified mask containing only small coherent mask regions, for which we exploit the context information provided by the higher resolution.&lt;br&gt;
In more detail, the first step corresponds to the baseline approach, cf. Figure 5: We scale the masked image down to the training resolution of 512x680 pixels and fill in the missing information.&lt;/p&gt;
&lt;p&gt;Optionally, the shrinking-mask-approach can be applied in the first step.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Figure 5: Step 1 of the two-step-approach. A) shows the original image masked by the original mask; B) the intermediate low-resolution inpainting result emerging from step 1&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 2000px) 2000px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;2000&quot; height=&quot;709&quot; src=&quot;https://img.ly/_astro/5_PAuNW.webp&quot; srcset=&quot;/_astro/5_ZhyHA5.webp 640w, /_astro/5_ZU0lHo.webp 750w, /_astro/5_1ko4iQ.webp 828w, /_astro/5_2oqGyL.webp 1080w, /_astro/5_Z1TL0dQ.webp 1280w, /_astro/5_htj7.webp 1668w, /_astro/5_PAuNW.webp 2000w&quot;&gt;&lt;/p&gt;
&lt;p&gt;In the second step, we upscale the output of step 1 to a resolution of 1024x1360 pixels, doubling each dimension and thus quadrupling the number of pixels. To prevent the resolution loss for unmasked regions caused by this upscaling, we stitch the generated content into the input image downscaled to the same size. The resulting image serves as the model input for step 2.&lt;/p&gt;
&lt;p&gt;To reduce image artifacts in the subsequent inpainting process, we modify the original mask to contain only the small mask regions and the boundaries of the large mask regions. In detail, we temporarily shrink the mask with an erosion kernel to remove small mask segments and the marginal areas of larger mask sections, cf. Figure 6B. Finally, we calculate the difference between the original mask and the altered mask, resulting in our desired modified mask, cf. Figure 6C.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Figure 6: Mask modification required for step 2 of the two-step-approach. A) shows the original mask; B) the original mask temporarily shrunken by an erosion kernel; and C) the modified mask containing small mask regions and boundary areas only.&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 1946px) 1946px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;1946&quot; height=&quot;536&quot; src=&quot;https://img.ly/_astro/6_Zeoyj0.webp&quot; srcset=&quot;/_astro/6_P3BN5.webp 640w, /_astro/6_Zr5uSS.webp 750w, /_astro/6_ChAk2.webp 828w, /_astro/6_1dIWH7.webp 1080w, /_astro/6_1qfv3T.webp 1280w, /_astro/6_81fGE.webp 1668w, /_astro/6_Zeoyj0.webp 1946w&quot;&gt;&lt;/p&gt;
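&lt;p&gt;The mask modification shown in Figure 6 can be sketched as an erosion followed by a difference. As before, this is only an illustration under assumptions that do not come from our implementation: a nested-list binary mask, a 3x3 square erosion kernel, pixels outside the image treated as unmasked, and a hypothetical &lt;code&gt;iterations&lt;/code&gt; parameter that controls how much is eroded.&lt;/p&gt;

```python
# Minimal sketch (assumptions: binary nested-list mask, 3x3 square kernel,
# pixels outside the image are treated as unmasked).

def erode_once(mask):
    # A pixel stays masked only if its whole 3x3 neighborhood is masked.
    height, width = len(mask), len(mask[0])

    def value(y, x):
        inside = y in range(height) and x in range(width)
        return mask[y][x] if inside else 0

    return [
        [
            1 if all(value(y + dy, x + dx) == 1 for dy in (-1, 0, 1) for dx in (-1, 0, 1)) else 0
            for x in range(width)
        ]
        for y in range(height)
    ]


def modified_mask(mask, iterations=1):
    # Temporarily shrink the mask, then keep only what the erosion removed:
    # small regions and the boundary band of large regions.
    eroded = mask
    for _ in range(iterations):
        eroded = erode_once(eroded)
    return [
        [m - e for m, e in zip(m_row, e_row)]
        for m_row, e_row in zip(mask, eroded)
    ]


# Toy example: one large 5x5 region and one small 2x2 region in a 7x10 mask.
mask = [[0] * 10 for _ in range(7)]
for y in range(1, 6):
    for x in range(1, 6):
        mask[y][x] = 1
for y in range(1, 3):
    for x in range(7, 9):
        mask[y][x] = 1

step_two_mask = modified_mask(mask)
for row in step_two_mask:
    print(''.join(str(v) for v in row))
```

&lt;p&gt;In the printed result, the small region is kept completely, while the large region is reduced to a one-pixel-wide ring around its (now unmasked) core. More erosion iterations widen that ring and let larger regions count as small.&lt;/p&gt;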
&lt;p&gt;By re-inpainting, we double the resolution of the generated content for the small contiguous mask regions, cf. Figure 7A bottom right, as well as for the masked boundary areas, cf. Figure 7A upper left. Moreover, through the latter, we achieve smoothing of the intense decay in resolution between the unmasked regions and the generated content arising from step 1. Finally, we scale our image back to the original input resolution and stitch the generated content to the original image to maintain the original resolution for unmasked areas.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Figure 7: Step 2 of the two-step-approach. A) shows the inpainting result of step 1 masked by the modified mask containing only small mask areas and boundary regions; B) final inpainting result arising from step 2&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 2000px) 2000px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;2000&quot; height=&quot;717&quot; src=&quot;https://img.ly/_astro/7_3lyip.webp&quot; srcset=&quot;/_astro/7_Z2x2NCb.webp 640w, /_astro/7_QNPLK.webp 750w, /_astro/7_Z1WXR0V.webp 828w, /_astro/7_ZbTJbF.webp 1080w, /_astro/7_Z1nAuf6.webp 1280w, /_astro/7_1GiIWk.webp 1668w, /_astro/7_3lyip.webp 2000w&quot;&gt;&lt;/p&gt;
&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;For us, it was impressive to see how successfully and with what deceptive realism AI-based inpainting can fill in missing information. Compared to conventional approaches, not only the consideration of structural (semantic) content is an advantage, but especially the decreased demand on the required hardware. In our view, this opens up the opportunity to reach a much larger group of users of inpainting algorithms: instead of relying on powerful hardware and professional software, users could make small but decisive changes on mobile devices.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Easy removal of distracting objects with inpainting: Example photo of a red squirrel taking a nap in the trees, distracting twig marked, and inpainted result.&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 2000px) 2000px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;2000&quot; height=&quot;667&quot; src=&quot;https://img.ly/_astro/8_l_Z1kJHjL.webp&quot; srcset=&quot;/_astro/8_l_1JTG5u.webp 640w, /_astro/8_l_26Omlu.webp 750w, /_astro/8_l_1GpI0f.webp 828w, /_astro/8_l_ZgiRCz.webp 1080w, /_astro/8_l_Z1Fvu52.webp 1280w, /_astro/8_l_ZAIwwN.webp 1668w, /_astro/8_l_Z1kJHjL.webp 2000w&quot;&gt;&lt;/p&gt;
&lt;p&gt;In summary, we have dealt with inpainting for high-resolution images, which is undoubtedly gaining in importance due to ever-improving smartphone cameras. Processing high-resolution images entails an increasing number of pixels to “inpaint” and could further lead to quality as well as performance issues. Thus, we decided to improve the output of low-resolution networks and to provide them with more information to support a subsequent upscaling procedure.&lt;br&gt;
We have implemented two different approaches, the shrinking-mask-approach and the two-step-approach, which can be applied independently or in combination. It turned out that both methods subjectively increased the image quality. However, this comes with higher computational demands, as the model is applied multiple times.&lt;br&gt;
Overall, we think that the combination of these two approaches represents a good toolkit for AI-based high-resolution image inpainting. But we’ll keep an eye on upcoming scientific developments.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;This project was funded by the European Regional Development Fund (ERDF).&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 449px) 449px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;449&quot; height=&quot;112&quot; src=&quot;https://img.ly/_astro/9_Zv71r7.webp&quot; srcset=&quot;/_astro/9_Zv71r7.webp 449w&quot;&gt;&lt;/p&gt;</content:encoded><dc:creator>Vivien</dc:creator><dc:creator>Philip</dc:creator><dc:creator>Pascal</dc:creator><media:content url="https://blog.img.ly/2020/12/image_inpainting_deeplearning.gif" medium="image"/><category>Deep Learning</category><category>Artificial Intelligence</category><category>Image Editing</category><category>Tutorial</category></item><item><title>On Magic Colors</title><link>https://img.ly/blog/on-magic-colors-d327c6d480db/</link><guid isPermaLink="true">https://img.ly/blog/on-magic-colors-d327c6d480db/</guid><description>As a photographer you are always searching for good light that makes everything shine. You are on a never-ending hunt for some of these ‘special’ photons that you can capture with your lens and on your camera sensor as you press the shutter button.</description><pubDate>Wed, 17 Jul 2019 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In this article, I will show and illuminate some pictures with the most magic colors that I have encountered as a photographer: the kind of colors that stick in your mind forever.&lt;/p&gt;
&lt;p&gt;Apart from my interest in magic colors as a photographer, I am also interested in magic colors as a researcher and developer at img.ly where we push the quality of photo editing in our &lt;a href=&quot;https://img.ly/products/photo-sdk/&quot;&gt;PhotoEditor SDK&lt;/a&gt; further. Therefore, this article will be completed by a follow-up article where I will show how to use post-production to gently push colors over the edge into the realm of magic colors.&lt;/p&gt;
&lt;h2 id=&quot;finding-magiccolors&quot;&gt;Finding Magic Colors&lt;/h2&gt;
&lt;p&gt;The best way to capture magic colors is — ‘simply’ — to be at the right place at the right time. Both place and time are important but often not equally important. I will show one example where the place was more important than the time and one example where it was the other way round.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;The magic colors from my first example were hidden in the intense, overwhelming, colorful, and loud city of Marrakech. There you can find a quiet place of refuge: Jardin Majorelle, a beautiful garden created by the French painter Jacques Majorelle.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Example 1 — The bright blue colored former atelier of Jacques Majorelle in Marrakech.&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 800px) 800px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;800&quot; height=&quot;800&quot; src=&quot;https://img.ly/_astro/1-Mr15WvdB-HpsgAsFTe3iVw_1vtGrC.webp&quot; srcset=&quot;/_astro/1-Mr15WvdB-HpsgAsFTe3iVw_Z1BNkWj.webp 640w, /_astro/1-Mr15WvdB-HpsgAsFTe3iVw_Z1swDbF.webp 750w, /_astro/1-Mr15WvdB-HpsgAsFTe3iVw_1vtGrC.webp 800w&quot;&gt;&lt;/p&gt;
&lt;p&gt;While it is a very relaxing place for your mind and ears, it is a visually stunning masterpiece. The path through the garden starts with the clean visual appearance of a small bamboo forest with all the green and turquoise tones. It then runs past all kinds of palm trees, cacti, small ponds, and fountains. Finally, it leads to the center of the garden where you are blown away by a striking blue house, Majorelle’s former atelier, outshining even the intense blue African sky.&lt;/p&gt;
&lt;p&gt;The blue is the famous Majorelle blue: an unreal looking cobalt blue, which is almost too strong to look at.&lt;/p&gt;
&lt;p&gt;In the garden, the blue is elaborately complemented with pastel yellow, beige, and turquoise tones, which add a nice contrast and make the blue shine even more.&lt;/p&gt;
&lt;p&gt;Surprisingly, around noon was the best time for this photo, as the part of the building shown was in the shade under a roof and only harsh light brought out all the colors.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;The magic colors from my second example could be found on a stroll at some beach in the Netherlands.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Example 2 — Golden hour at the sea.&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 800px) 800px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;800&quot; height=&quot;533&quot; src=&quot;https://img.ly/_astro/1-oevAhbKp441HfukIqVQOSQ_moDk.webp&quot; srcset=&quot;/_astro/1-oevAhbKp441HfukIqVQOSQ_22sAew.webp 640w, /_astro/1-oevAhbKp441HfukIqVQOSQ_Z2pW2P3.webp 750w, /_astro/1-oevAhbKp441HfukIqVQOSQ_moDk.webp 800w&quot;&gt;&lt;/p&gt;
&lt;p&gt;After a rainy day, the sun suddenly broke through the clouds when it was already low on the horizon. Then the magic happened: the light passed through just the right amount of clouds and haze so that its bluish rays were scattered away and its tone shifted towards orange. All surrounding clouds lit up and the sky began to glow. As a stunning result, everything was bathed in amazing golden light.&lt;/p&gt;
&lt;p&gt;These magic colors only appeared for three minutes during the golden hour. As the clouds moved back in front of the sun, the light lost its golden shine. This moment could easily have been missed if I had stayed at home just because of the rain.&lt;/p&gt;
&lt;h2 id=&quot;creating-magiccolors&quot;&gt;Creating Magic Colors&lt;/h2&gt;
&lt;p&gt;Another way to achieve magic colors is to create them yourself.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Example 3 — Studio shot of the fluorescent mineral wernerite under ultraviolet lighting.&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 800px) 800px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;800&quot; height=&quot;533&quot; src=&quot;https://img.ly/_astro/1-Xl17gKTpogwSEIDCKyANNw_GfYJH.webp&quot; srcset=&quot;/_astro/1-Xl17gKTpogwSEIDCKyANNw_Z2lOWt2.webp 640w, /_astro/1-Xl17gKTpogwSEIDCKyANNw_Z1J3rIF.webp 750w, /_astro/1-Xl17gKTpogwSEIDCKyANNw_GfYJH.webp 800w&quot;&gt;&lt;/p&gt;
&lt;p&gt;The magic colors from my third example were created by photographing the fluorescent mineral wernerite under ultraviolet (UV) lighting. I made the lighting with a DIY ultraviolet lamp consisting of a cheap spotlight housing, a cheap but powerful fluorescent tube for water disinfection, and an expensive filter glass. The filter glass only lets the desired UV wavelength pass. Without such a filter glass, the colors would appear washed-out, as the fluorescent tube outputs not only invisible UV light but also large amounts of visible light.&lt;/p&gt;
&lt;p&gt;The mineral absorbs the UV light and emits visible light instead. The resulting colors look unreal, as the usually boring-looking stone now glows all over with huge intensity. As a bonus, the blue matches that of Majorelle very closely.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;img alt=&quot;Example 4 — Studio shot of a feather from a blue-and-yellow macaw.&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; sizes=&quot;(min-width: 800px) 800px, 100vw&quot; data-astro-image=&quot;constrained&quot; data-astro-image-pos=&quot;center&quot; width=&quot;800&quot; height=&quot;533&quot; src=&quot;https://img.ly/_astro/1-PeFyPq8sMl809zS2xLR5Mw_s2fD4.webp&quot; srcset=&quot;/_astro/1-PeFyPq8sMl809zS2xLR5Mw_2u8reg.webp 640w, /_astro/1-PeFyPq8sMl809zS2xLR5Mw_Z1XhbPj.webp 750w, /_astro/1-PeFyPq8sMl809zS2xLR5Mw_s2fD4.webp 800w&quot;&gt;&lt;/p&gt;
&lt;p&gt;The magic colors from my final example were created by photographing a feather from a blue-and-yellow macaw in my home studio with a simple bidirectional light setup. I used a 50 mm Zeiss Pancolar lens from the 1980s with a wide-open aperture of f/1.8, which adds a nice glow to the yellows. The feather was lit from above and below by two flashes modified with small softboxes. The light was shaded from the background and the shutter speed was set fast enough so that the studio just disappeared into black.&lt;/p&gt;
&lt;p&gt;To keep the color magic on the yellow part of the feather, 20 images with different focus points on the blue part were taken with a wide-open aperture and then stacked into a single image. This created the fast drop-off in sharpness, which would have been impossible to create with a single shot.&lt;/p&gt;
&lt;p&gt;Interestingly, these magic colors could never be seen with the naked eye, as they only appeared for approximately 80 μs while the flashes were triggered. They were then given their final appearance by the lens and the post-processing.&lt;/p&gt;
&lt;h2 id=&quot;what-makes-colorsmagic&quot;&gt;What Makes Colors Magic?&lt;/h2&gt;
&lt;p&gt;To sum up, magic colors can be found in many places or even created by yourself. For the four examples in this article, different aspects were important: the right location for the picture of the blue building in Marrakech, the right time as the sun came through the clouds for the golden sea, the subject and the equipment for the fluorescent colors, and the quality of the light and technical realization for the glowing feather.&lt;/p&gt;
&lt;p&gt;While writing this article I asked myself: What exactly makes colors magic? Is it the saturation, vividness, shininess, or glow? Is it the texture of the object, the quality of the light, or the surrounding colors?&lt;/p&gt;
&lt;p&gt;All of this is important. However, perhaps the most important requirement for colors to be considered magic is that they look slightly unreal, but only a little bit. What do you think?&lt;/p&gt;
&lt;p&gt;In an upcoming article, I will describe how these thoughts on magic colors influenced the color editing process in our &lt;a href=&quot;https://img.ly/products/photo-sdk/&quot;&gt;PhotoEditor SDK&lt;/a&gt; and share some details on how to gently push colors towards magic colors in post.&lt;/p&gt;</content:encoded><dc:creator>Pascal</dc:creator><media:content url="https://blog.img.ly/downloaded_images/On-Magic-Colors/1-ivMWJ8VqkmXI_TyW8YcyHg.jpeg" medium="image"/><category>Photography</category><category>Design</category><category>Insights</category></item></channel></rss>