
The Power of Image Recognition in Fashion: How FashionDNA Enhances Shopping Experiences

FashionDNA, our AI-powered in-house technology, analyses visual details from millions of fashion items to drive innovation in personalised recommendations, helping shoppers find the perfect match.


It all started with a dress, but not just any dress — the dress. Jennifer Lopez, 2000 Grammys, emerald green Versace. You know the one. It was so iconic that it practically broke the internet and inspired engineers to create Google Image Search. That's right, fashion lit the spark that led to one of the most transformative digital tools we use today. Now, fast-forward to 2017, and the evolution of visual search got another boost when Google launched Google Lens, allowing us to search for anything using just an image. Around that same time, Zalando quietly worked on its own game-changer: FashionDNA.

Built in-house and operating at a truly massive scale, FashionDNA analyses over 100 million fashion images, a proprietary data set that can’t be found or scraped anywhere else. FashionDNA is in-house software, so it is not available worldwide like Google Lens, but its impact on e-commerce is no less impressive. It is a powerful computer vision model built specifically for fashion, designed to "see" what makes an item unique: everything from colour and silhouette to texture and patterns. From multiple product images of an item, FashionDNA compresses all that visual information into a compact "fingerprint" that makes it possible to accurately identify, compare and recommend items.

Here’s the best part: this article isn’t just a peek behind the curtain. It’s a full backstage tour led by the scientists and engineers who built FashionDNA: Christian Bracher, Applied Science Manager, and Sebastian Heinz, Senior Applied Scientist. Get ready for a deep dive into the inner workings of one of the most innovative tools in fashion tech today.


We will let you guess the rule we used to sort the dresses in this image with the help of FashionDNA.

Turning fashion items into unique digital fingerprints

The FashionDNA model is a multi-image vision encoder that processes several images of the same fashion item to produce a unique numeric fingerprint of 128 floating-point numbers, known as an embedding vector. This vector captures the essence of a fashion item in an abstract "fashion space." In effect, embeddings are a form of lossy compression: input images with millions of pixels are distilled into just 128 numbers. FashionDNA’s embedding space has been carefully engineered to have two highly practical properties:

  • Embeddings of visually or functionally similar items are positioned near each other in the space.

  • Key visual features relevant to customers—such as a striped pattern, the colour red, or classifications like sportswear and casual—align with specific directions within the space.

The figure above illustrates how dresses can be mapped into a two-dimensional subspace defined by axes that capture colour (from red to blue) and pattern (from stripes to dots). Together, these three properties (compression, similarity and directional structure) make FashionDNA highly effective for a range of downstream applications.
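To make the similarity and direction properties concrete, here is a minimal sketch with synthetic vectors standing in for real FashionDNA embeddings. It assumes, hypothetically, that a colour axis can be estimated as the difference between the mean embeddings of red and blue items, and a pattern axis likewise from striped versus dotted items. The helper names are illustrative and not part of FashionDNA.

```python
import numpy as np

DIM = 128  # FashionDNA embeddings have 128 dimensions
rng = np.random.default_rng(0)

def cosine_similarity(query, items):
    """Cosine similarity between one query vector and every row of `items`."""
    q = query / np.linalg.norm(query)
    m = items / np.linalg.norm(items, axis=1, keepdims=True)
    return m @ q

def attribute_direction(positives, negatives):
    """Hypothetical heuristic: unit vector from the negatives' mean to the positives' mean."""
    d = positives.mean(axis=0) - negatives.mean(axis=0)
    return d / np.linalg.norm(d)

# Synthetic stand-ins for real embeddings: hidden "red" and "striped" directions plus noise.
red_dir, stripe_dir = rng.normal(size=(2, DIM))

def make_items(colour, pattern, n=20):
    """colour: +1 red / -1 blue; pattern: +1 stripes / -1 dots."""
    return rng.normal(size=(n, DIM)) + 3 * colour * red_dir + 3 * pattern * stripe_dir

red_stripes, red_dots = make_items(+1, +1), make_items(+1, -1)
blue_stripes, blue_dots = make_items(-1, +1), make_items(-1, -1)
catalogue = np.vstack([red_stripes, red_dots, blue_stripes, blue_dots])  # 80 items

# Directional property: project every item onto a colour axis and a pattern axis.
colour_axis = attribute_direction(np.vstack([red_stripes, red_dots]), np.vstack([blue_stripes, blue_dots]))
pattern_axis = attribute_direction(np.vstack([red_stripes, blue_stripes]), np.vstack([red_dots, blue_dots]))
coords = np.stack([catalogue @ colour_axis, catalogue @ pattern_axis], axis=1)  # shape (80, 2)

# Similarity property: nearest neighbours of the first red striped item, excluding itself.
sims = cosine_similarity(red_stripes[0], catalogue)
neighbours = np.argsort(-sims)[1:6]
```

`coords` places each item on a red-to-blue and stripes-to-dots plane, much like the dress figure. With real data, the directions would have to be estimated from labelled items, since nothing guarantees they are known in advance.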

FashionDNA: Zalando's foundational AI for transforming fashion insights

To understand why we have developed and refined FashionDNA over the past nine years, it helps to compare it with Large Language Models (LLMs) such as GPT-4. LLMs are foundational models that can be reused across multiple domains, thanks to a few key qualities:

  • Pre-trained on diverse data: LLMs are trained on vast and varied text data, allowing them to capture a broad spectrum of linguistic patterns, contextual knowledge and nuance, which enables their adaptability across different tasks.

  • Transfer learning: LLMs utilise transfer learning, where insights gained from pre-training on one domain can be applied to new tasks. This approach minimises the need for extensive data and computational resources in task-specific fine-tuning.

  • Versatility: LLMs can be fine-tuned to serve a range of applications, from language translation and text summarisation to code generation and chatbot functionalities, making them highly adaptable.

  • Accessibility: Many LLMs are accessible through simple APIs, allowing developers to integrate advanced language features into their applications without the burden of training or maintaining the models, which improves their reusability.

FashionDNA shares these qualities, which make it a powerful, reusable vision AI infrastructure:

  • Pre-trained on a large dataset: FashionDNA is initially trained on billions of images, then fine-tuned with about 100 million curated image-product attribute pairs from internal datasets and numerous publicly available fashion images, ensuring both high quality and broad coverage.

  • Transfer learning: Beyond attribute prediction, the model is trained to identify identical fashion items across images using contrastive learning, which drives the model to produce robust embeddings independent of image context.

  • Versatility: FashionDNA powers a broad range of applications, including customer-facing tools like personalised product rankings, outfit suggestions, alternative item recommendations and tailored marketing. It also supports business tools like targeted campaign marketing, visual search for in-stream TV shopping and customer clustering, while enhancing internal tasks like size flagging, seasonality forecasts, duplicate detection, brand authenticity checks and automated tagging.

  • Accessibility: FashionDNA embeddings are served in real-time as new items are onboarded to the Zalando store and are available company-wide through our streaming and data infrastructure.

  • Easy integration: FashionDNA abstracts the complexity of working with image data, allowing ML engineers to focus on their specific training and serving infrastructure requirements.

By centralising computer vision expertise, FashionDNA has introduced significant efficiency gains at Zalando, accelerating task-specific AI development and reducing costs.

From size prediction to targeted marketing

FashionDNA is extensively utilised across the Zalando Group, including in the Fashion Store, Offprice and Zalando Marketing Services, among others. Since its inception nine years ago, a steady stream of new use cases has emerged, continually validating the original concept's value.

Ranking Platform

The Ranking Platform is designed to enhance personalisation in Zalando's browse and search functionalities. It consists of two key components: the Candidate Generation Layer and the Ranking Layer.

The Candidate Generation Layer uses an AI model to filter the catalogue down to a set of candidate items most relevant to the user. It is a two-tower system, with a user tower and a fashion item tower, and uses features such as FashionDNA and user session data. The Ranking Layer is a transformer-based model that ranks these candidates on the fly. It considers user interactions, context and rich item features in the form of FashionDNA embeddings, and generates relevance scores from user and item representations. These scores determine the order in which items are shown to a particular user or for a particular query.
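A two-tower candidate generator scores each item by the dot product between a user vector and an item vector. The sketch below is a simplified illustration, not Zalando's implementation. Each tower is assumed to be a single untrained linear layer. The item tower consumes a FashionDNA embedding concatenated with hypothetical extra item features, and the user tower consumes a hypothetical session feature vector. All inputs are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(42)
N_ITEMS, DNA_DIM, EXTRA_DIM, SESSION_DIM, TOWER_DIM = 10_000, 128, 16, 64, 32

# Random stand-in inputs, for illustration only.
fashion_dna = rng.normal(size=(N_ITEMS, DNA_DIM))            # one 128-d embedding per item
extra_item_features = rng.normal(size=(N_ITEMS, EXTRA_DIM))  # hypothetical, e.g. price band
session_features = rng.normal(size=SESSION_DIM)              # hypothetical session summary

# Untrained tower weights; in practice both towers would be learned jointly.
W_item = rng.normal(size=(DNA_DIM + EXTRA_DIM, TOWER_DIM)) / np.sqrt(DNA_DIM + EXTRA_DIM)
W_user = rng.normal(size=(SESSION_DIM, TOWER_DIM)) / np.sqrt(SESSION_DIM)

def item_tower(dna, extra):
    """Map item features (FashionDNA plus extras) into the shared tower space."""
    return np.tanh(np.concatenate([dna, extra], axis=1) @ W_item)

def user_tower(session):
    """Map session features into the same space."""
    return np.tanh(session @ W_user)

def generate_candidates(session, k=500):
    """Return indices of the k highest-scoring items, best first."""
    item_vecs = item_tower(fashion_dna, extra_item_features)  # could be precomputed offline
    scores = item_vecs @ user_tower(session)
    top = np.argpartition(-scores, k)[:k]   # unordered top-k in linear time
    return top[np.argsort(-scores[top])]    # sort only the k survivors

candidates = generate_candidates(session_features)
```

The item tower depends only on item features, so its output can be computed once for the whole catalogue, and only the user tower has to run per request. The resulting shortlist is what a heavier model such as the Ranking Layer would then rescore.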

SizeNet

SizeNet, developed by the Size & Fit team, is a computer vision-based algorithm that predicts potential size issues for an article from its images, which enter the model in the form of FashionDNA. Because it relies only on article images, it can produce a prediction at a very early stage of the article’s life cycle, even before the first sales and returns are recorded.

In production, SizeNet runs ahead of Size Flags, a product that gives customers article-based size advice derived from sales and returns data. As a consequence, Size Flags are produced significantly more efficiently and can be raised much earlier in the article’s life cycle, leading to an average decrease of 5.4 returns per article.
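The article does not describe SizeNet's architecture. As a hedged illustration of the general idea, predicting a size issue from image information alone before any sales exist, here is a logistic-regression probe trained on FashionDNA-like embeddings. The data, the labels (for example "runs small") and the `fit_logistic` helper are all synthetic and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
DIM = 128

# Synthetic stand-ins: embeddings of past articles and whether each later got a
# size flag (hypothetical binary label, e.g. "runs small").
true_w = rng.normal(size=DIM)
X = rng.normal(size=(2_000, DIM))
y = (X @ true_w + rng.normal(scale=2.0, size=2_000) > 0).astype(float)

def fit_logistic(X, y, lr=0.1, epochs=300, l2=1e-3):
    """Batch gradient descent on the L2-regularised logistic loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probability of a size issue
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b

w, b = fit_logistic(X[:1_500], y[:1_500])

# Held-out "new" articles need nothing but their embedding to get a risk score.
p_new = 1.0 / (1.0 + np.exp(-(X[1_500:] @ w + b)))
accuracy = np.mean((p_new > 0.5) == y[1_500:])
```

The point of the sketch is timing rather than modelling: the score exists as soon as the embedding does, which is what allows size advice to appear before returns data accumulates.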

Article seasonality prediction

Understanding customers’ seasonal purchasing patterns is crucial for assortment planning and steering, and for inventory management, so that customer demand can be met effectively. Zalando developed an Article Seasonality data product that generates seasonality insights from different kinds of article-level data, such as article attributes and FashionDNA. For every newly activated article, the data product predicts the seasonality pattern and recommends the timing of the start, peak and end of customer demand. These demand-timing insights support other use cases and data products across assortment lifecycle management, such as assortment planning, delivery timing recommendations and replenishment recommendations.
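One simple way to use FashionDNA for this is nearest-neighbour transfer: a new article inherits the demand timing of visually similar historical articles. This is offered as a hypothetical sketch, not the data product's actual method. Weeks are calendar weeks averaged on a circle so that seasons spanning the turn of the year are handled correctly. The history is synthetic, and `predict_timing` is an illustrative name.

```python
import numpy as np

WEEKS_PER_YEAR = 52

def circular_mean_week(weeks, weights):
    """Weighted mean of calendar weeks treated as points on a circle.

    On the circle, weeks 51 and 1 average to week 0 (that is, week 52),
    not to week 26 as a plain arithmetic mean would give.
    """
    angles = 2 * np.pi * np.asarray(weeks, dtype=float) / WEEKS_PER_YEAR
    x, y = np.sum(weights * np.cos(angles)), np.sum(weights * np.sin(angles))
    return (np.arctan2(y, x) % (2 * np.pi)) * WEEKS_PER_YEAR / (2 * np.pi)

def predict_timing(new_dna, past_dna, past_timing, k=10, temperature=0.1):
    """Hypothetical: (start, peak, end) demand weeks from the k most similar past articles."""
    past_unit = past_dna / np.linalg.norm(past_dna, axis=1, keepdims=True)
    sims = past_unit @ (new_dna / np.linalg.norm(new_dna))
    nearest = np.argsort(-sims)[:k]
    weights = np.exp(sims[nearest] / temperature)  # closer neighbours count more
    return tuple(circular_mean_week(past_timing[nearest, j], weights) for j in range(3))

# Synthetic history: a "winter" cluster peaking at the turn of the year and a "summer" cluster.
rng = np.random.default_rng(3)
DIM = 128
winter_centre, summer_centre = rng.normal(size=(2, DIM))
past_dna = np.vstack([
    winter_centre + 0.3 * rng.normal(size=(100, DIM)),
    summer_centre + 0.3 * rng.normal(size=(100, DIM)),
])
past_timing = np.vstack([
    np.tile([44, 51, 6], (100, 1)),   # winter: start, peak, end weeks
    np.tile([14, 24, 33], (100, 1)),  # summer
]) + rng.integers(-1, 2, size=(200, 3))  # jitter of -1, 0 or +1 week
past_timing = past_timing % WEEKS_PER_YEAR  # week 52 wraps to 0

new_coat = winter_centre + 0.3 * rng.normal(size=DIM)
start, peak, end = predict_timing(new_coat, past_dna, past_timing)
```

Because some winter peaks wrap to week 0, a plain average of the neighbours' peak weeks would land in late summer. The circular mean keeps the predicted peak near week 51.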

Personalised sponsored content

Zalando Marketing Services (ZMS) connects brands to consumers on the Zalando platform. By combining data-driven insights, marketing solutions, and content expertise, ZMS covers the entire marketing funnel—from awareness to conversion. We aim to deliver the right ad to the right user at the right time, ensuring relevance to the customer's intent, context, and history. Sponsored recommendations also consider campaign goals and target audiences, with the ad marketplace determining the best fit based on campaign budgets and contextual valuations.

FashionDNA enhances audience targeting by encoding product appearance and assessing visual similarity, improving recommendation accuracy for both customers and advertisers. For example, Reebok's campaign with ZMS significantly boosted brand image and purchase intent, achieving 25 million viewable impressions and 0.5 million product page views, with lookalike products selected using the FashionDNA algorithm.

In-stream shopping

During spring 2024, RTL and Zalando Studios joined forces in a pilot program that offered dedicated fans of RTL’s cult soap opera ‘Gute Zeiten, Schlechte Zeiten’ (GZSZ) a brand-new service: shopping for fashion products featured in the episodes directly while streaming the series on selected devices. The TV station sent images of the costumes featured in the show to Zalando, where fashion experts compared the pictures with the apparel and footwear in the Zalando catalogue and sent links to matching articles in the German fashion store back to RTL. With hundreds of thousands of in-stock items, this is a challenging task, so our experts turned to FashionDNA to analyse the provided images and return a dozen suitable candidates for review within seconds. In the future, we can even envisage a fully automated process that extracts matching Zalando articles directly from the video stream.

Leveraging Zalando’s rich fashion data for AI model training

Successfully training a fashion foundational model depends on a large amount of high-quality fashion data. Zalando is in a unique position thanks to its comprehensive assortment and integrated in-house media production at Zalando Studios, which combines photography, post-processing, labelling and quality control in a unified process. The resulting package comprises a collection of high-resolution images of the fashion item in multiple studio settings, including article, model and detail shots. It is complemented by a detailed list of curated attribute labels and descriptions, as well as identifying data such as the European Article Number (EAN), a harmonised product code. EANs are assigned per size, so most fashion items have multiple EANs. We also record confounder labels, such as the source an image comes from, which carry no information about the fashion item itself. We discuss below how we use this information to our advantage.


Although Zalando has a large amount of high-quality fashion data, it intentionally comes from a narrow distribution: Zalando Studios has strict guidelines on lighting conditions, model pose, camera angles and outfit styling details, among others. To train a robust AI model, we need to add more broadly distributed data. For this purpose, we use publicly available fashion images in their HTML context. From this unstructured data, we extract product identifiers, which allow us to match products across multiple image sources.
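The article does not say how identifiers are extracted from web pages. One common source, assumed here, is schema.org `Product` markup embedded as JSON-LD, whose `gtin13` property holds the 13-digit GTIN (EAN). Because EANs are assigned per size, several of them resolve to the same article, which lets images from different pages be grouped under one fashion item. The EAN-to-article table below is hypothetical.

```python
import json
from collections import defaultdict
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data

def extract_products(html):
    """Yield (gtin13, image_urls) for each schema.org Product found in the page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed markup is common on the open web
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and item.get("@type") == "Product" and "gtin13" in item:
                images = item.get("image", [])
                yield str(item["gtin13"]), [images] if isinstance(images, str) else list(images)

# Hypothetical lookup: EANs are per size, so several map to one article.
EAN_TO_ARTICLE = {
    "4000000000011": "ARTICLE-A",  # size S
    "4000000000028": "ARTICLE-A",  # size M
    "4000000000035": "ARTICLE-B",
}

def group_images(pages):
    """Group image URLs from many pages by the article their EAN resolves to."""
    groups = defaultdict(list)
    for html in pages:
        for ean, images in extract_products(html):
            article = EAN_TO_ARTICLE.get(ean)
            if article is not None:
                groups[article].extend(images)
    return dict(groups)

page_1 = '<script type="application/ld+json">{"@type": "Product", "gtin13": "4000000000011", "image": "https://shop.example/a-front.jpg"}</script>'
page_2 = '<script type="application/ld+json">[{"@type": "Product", "gtin13": "4000000000028", "image": ["https://blog.example/a-worn.jpg"]}]</script>'
page_3 = '<script type="application/ld+json">{not valid json}</script>'
grouped = group_images([page_1, page_2, page_3])
```

Here the two size-specific EANs on different sites collapse into one article with two images from different sources, which is the kind of cross-source pairing the contrastive objective needs.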

Technical details

In technical terms, FashionDNA is visual information extracted from a deep neural network that encodes an image collection of a fashion item. To train the FashionDNA model, we define three generic training objectives:

  • A binary cross entropy (BCE) loss, independently predicting more than 15,000 individual attribute labels (brands, silhouettes, functions, patterns, etc.)

  • A contrastive loss that pulls embeddings of image samples of the same fashion item closer, while it pushes apart embeddings of different fashion items

  • An adversarial loss to reduce the confounding factor of the image information source (Zalando Studios, Zalando partners, web domains), where each source shows images of its own style.
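The first two objectives can be written down compactly. The NumPy sketch below shows forward computations only, with no training loop. The contrastive term is written as symmetric InfoNCE with in-batch negatives, which is one common choice and an assumption here, since the article does not name the exact formulation. The temperature value is also an assumption, and the adversarial term is omitted.

```python
import numpy as np

def multilabel_bce(logits, targets):
    """Mean binary cross entropy over independent attribute labels.

    logits, targets: shape (batch, n_attributes); targets are 0/1.
    Stable form of BCE-with-logits: max(z, 0) - z*t + log(1 + exp(-|z|)).
    """
    z, t = logits, targets
    return np.mean(np.maximum(z, 0) - z * t + np.log1p(np.exp(-np.abs(z))))

def info_nce(view_a, view_b, temperature=0.07):
    """Symmetric InfoNCE: row i of view_a and row i of view_b embed the same item.

    All other rows in the batch act as negatives, so matching pairs are pulled
    together and different items are pushed apart.
    """
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (batch, batch) similarity matrix

    def cross_entropy_on_diagonal(m):
        m = m - m.max(axis=1, keepdims=True)  # stabilise the softmax
        log_probs = m - np.log(np.exp(m).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits.T))

rng = np.random.default_rng(0)
attribute_logits = rng.normal(size=(8, 15_000))                      # 8 items, 15,000 labels
attribute_targets = (rng.random((8, 15_000)) < 0.01).astype(float)  # sparse synthetic labels
bce = multilabel_bce(attribute_logits, attribute_targets)

views = rng.normal(size=(8, 1024))  # stand-in fused 1,024-d vectors, one per item
matched = info_nce(views + 0.01 * rng.normal(size=views.shape), views)
mismatched = info_nce(np.roll(views, 1, axis=0), views)  # every pair now spans two items
```

`matched` comes out far lower than `mismatched`, which is the gradient signal that drives images of the same item together.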


High-level diagram of FashionDNA training. Gradients are backpropagated as usual for all losses, except that gradients from the classification head for the confounder label are reversed and rescaled.
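Reversing and rescaling one head's gradient is the gradient reversal idea from domain-adversarial training. On the forward pass the reversal is the identity; on the backward pass it multiplies the gradient by -λ. The confounder head therefore still learns to predict the image source, while the shared encoder is pushed to make that prediction harder. Below is a hand-differentiated NumPy sketch with a one-layer linear encoder. The dimensions, λ and learning rate are arbitrary assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def confounder_gradients(W, v, x, source_label, lam=0.5):
    """Gradients for a linear encoder z = W @ x and a logistic source classifier on z.

    Returns (grad_v, grad_W_reversed, grad_W_plain). grad_v trains the head to
    predict the source as usual; grad_W_reversed is what the encoder receives
    through the reversal: the ordinary gradient multiplied by -lam.
    """
    z = W @ x                            # forward pass: reversal acts as identity
    p = sigmoid(v @ z)                   # predicted probability of source 1
    dloss_dlogit = p - source_label      # derivative of BCE w.r.t. the logit
    grad_v = dloss_dlogit * z
    grad_z = dloss_dlogit * v            # gradient arriving at the reversal point
    grad_W_plain = np.outer(grad_z, x)   # what the encoder would get without reversal
    grad_W_reversed = np.outer(-lam * grad_z, x)
    return grad_v, grad_W_reversed, grad_W_plain

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 32)) * 0.1  # stand-in encoder weights
v = rng.normal(size=16) * 0.1        # confounder head, e.g. studio vs. web image
x = rng.normal(size=32)              # stand-in image features
grad_v, grad_W, grad_W_plain = confounder_gradients(W, v, x, source_label=1.0)

lr = 0.1
v -= lr * grad_v  # head: ordinary descent, gets better at spotting the source
W -= lr * grad_W  # encoder: effectively ascends the confounder loss
```

In an autodiff framework, the same effect is usually obtained with a custom backward rule on an identity op, so the attribute and contrastive gradients flow through the encoder unchanged.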

The encoder is a pre-trained ConvNeXt, a fully convolutional neural network with 300 million parameters, used as a siamese network to encode partitioned image collections in parallel. The encoder outputs are fused by a single self-attention head followed by average pooling. The resulting 1,024-dimensional output vectors are used to calculate the contrastive loss between image sets. An additional linear layer compresses these activations into 128-dimensional vectors: the FashionDNA embeddings. We attach linear classification heads with binary cross entropy loss for both article attributes and confounder labels.
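The fusion and projection steps can be sketched as a forward pass. Several assumptions apply. Per-image features are random 1,024-dimensional stand-ins for the ConvNeXt outputs, since the per-image width is not stated and is assumed here to match the fused vector. The attention head uses randomly initialised query, key and value matrices. No residual connections or normalisation layers are included.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT, EMB = 1024, 128

# Randomly initialised parameters (stand-ins for trained weights).
W_q, W_k, W_v = (rng.normal(size=(FEAT, FEAT)) / np.sqrt(FEAT) for _ in range(3))
W_out = rng.normal(size=(FEAT, EMB)) / np.sqrt(FEAT)

def encode_images(images):
    """Stand-in for the shared (siamese) ConvNeXt: the same weights for every image.

    Here each 'image' is already a feature vector; a real encoder maps pixels to features.
    """
    return np.asarray(images)

def softmax(m, axis=-1):
    m = m - m.max(axis=axis, keepdims=True)
    e = np.exp(m)
    return e / e.sum(axis=axis, keepdims=True)

def fuse(features):
    """Single-head self-attention across one item's images, then average pooling."""
    q, k, v = features @ W_q, features @ W_k, features @ W_v
    attn = softmax(q @ k.T / np.sqrt(FEAT))  # (n_images, n_images)
    return (attn @ v).mean(axis=0)           # (FEAT,), used for the contrastive loss

def fashion_dna(images):
    fused = fuse(encode_images(images))
    return fused, fused @ W_out              # 1,024-d fused vector, 128-d embedding

item_images = rng.normal(size=(5, FEAT))     # e.g. article, model and detail shots
fused, embedding = fashion_dna(item_images)
_, embedding_reversed = fashion_dna(item_images[::-1])
```

Without positional encodings, self-attention is permutation-equivariant and mean pooling is permutation-invariant, so the embedding does not depend on the order of the images or on how many there are.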

During training, we use image-level augmentation (random pixel and colour space transformations, blurring and compression) and collection-level augmentations (random sampling from the collection).
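Both augmentation levels can be sketched in NumPy under stated assumptions. Images are float arrays in [0, 1] of shape (H, W, 3). The collection-level step splits an item's shots into two random disjoint subsets. This is one plausible reading of "partitioned image collections", giving two views of the same item for the contrastive loss. The image-level step applies random brightness and per-channel colour gains plus an optional 3×3 box blur. JPEG compression is omitted because it needs an imaging library, and all parameter ranges are arbitrary.

```python
import numpy as np

def split_collection(images, rng):
    """Collection-level augmentation: shuffle one item's images into two non-empty views."""
    if len(images) < 2:
        raise ValueError("need at least two images to form two views")
    order = rng.permutation(len(images))
    cut = rng.integers(1, len(images))  # first view gets 1 .. n-1 images
    return [images[i] for i in order[:cut]], [images[i] for i in order[cut:]]

def box_blur(img):
    """3x3 mean filter with edge padding."""
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    return sum(padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0

def augment_image(img, rng):
    """Image-level augmentation: random brightness, per-channel colour gain, optional blur."""
    out = img * rng.uniform(0.8, 1.2)                  # brightness
    out = out * rng.uniform(0.9, 1.1, size=(1, 1, 3))  # colour balance per RGB channel
    if rng.random() < 0.5:
        out = box_blur(out)
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
collection = [rng.random((64, 48, 3)) for _ in range(6)]  # six stand-in product shots
view_a, view_b = split_collection(collection, rng)
view_a = [augment_image(im, rng) for im in view_a]
view_b = [augment_image(im, rng) for im in view_b]
```

Augmenting each view independently means the contrastive loss can only be minimised by features that survive lighting, colour and sharpness changes.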

The evolution of FashionDNA

Each model iteration reflects the advances in computer vision models, the increase of computational power, and the growth of Zalando’s curated data catalogue, enabling new capabilities and improving existing ones.


Training setup for the three FashionDNA versions. 

As a result, model performance grew rapidly, as the comparative test results below show. We measure the utility of the embeddings on two typical downstream tasks. Classification accuracy (left) indicates how well the FashionDNA model can assign an attribute such as brand to an article from product images alone. Retrieval (right) measures how well FashionDNA can identify a Zalando article among a set of some 70,000 external images. Remarkably, the most recent model (v3) succeeds in both tasks in three out of four cases.
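Retrieval of this kind is commonly scored with recall@k: the fraction of queries whose correct match appears among the k nearest gallery embeddings. The article does not state which metric or k its chart uses, so the sketch below is an assumption. Synthetic embeddings stand in for Zalando articles (queries) and external images (gallery), and the gallery is smaller than the article's roughly 70,000 images to keep the example light.

```python
import numpy as np

def recall_at_k(queries, gallery, true_index, k=1):
    """Fraction of queries whose true gallery item ranks in the top k by cosine similarity.

    queries: (n_queries, d); gallery: (n_gallery, d); true_index[i] is the gallery
    row that matches query i.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = q @ g.T                                   # (n_queries, n_gallery)
    true_sims = sims[np.arange(len(q)), true_index]
    # Rank of the true match = number of gallery items scoring strictly higher.
    ranks = (sims > true_sims[:, None]).sum(axis=1)
    return float(np.mean(ranks < k))

rng = np.random.default_rng(0)
gallery = rng.normal(size=(5_000, 128))              # stand-in external-image embeddings
true_index = rng.choice(len(gallery), size=200, replace=False)
queries = gallery[true_index] + 0.5 * rng.normal(size=(200, 128))  # noisy "same item" views
r1 = recall_at_k(queries, gallery, true_index, k=1)
r10 = recall_at_k(queries, gallery, true_index, k=10)
```

Recall@k can only grow with k, and a larger gallery makes the task harder, so scores are only comparable across model versions when measured on the same gallery.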


Evolution of FashionDNA model performance on our test set, evaluated for classification and for retrieval (image search).

The next generation of fashion

In an era where fashion is not just seen but experienced, Zalando's FashionDNA is leading the charge in transforming how consumers interact with style. By leveraging cutting-edge technology to create unique digital fingerprints for each item, Zalando is redefining e-commerce, making shopping more intuitive and personalised than ever before. As we continue to innovate, FashionDNA enhances customer experiences by delivering tailored recommendations, ensuring customers find exactly what they're looking for while reducing decision fatigue. It also streamlines operations through precise inventory management. For instance, our Article Seasonality Prediction reduces excess stock, while accurate size predictions through SizeNet lead to a decrease in returns. Join us as we explore new frontiers and redefine the way the world shops for fashion.
