I Built My Own Fashion Dataset For Deep Learning

52,000 labeled fashion images from 10 categories

Amir Ali Hashemi

Image by author

Link to the dataset (including the scripts used)

In computer vision, two of the best-known image datasets are MNIST (handwritten digits) and fashion-MNIST. The latter is provided by Zalando, one of the largest online clothing retailers, and consists of images from ten different clothing categories.

The main issue with fashion-MNIST is that it is widely regarded as too easy, overused, and unrepresentative of modern computer vision challenges.

So I decided to build my own fashion dataset, similar to fashion-MNIST but slightly more challenging. I wrote a web scraper that crawled www.zalando.com and extracted the image URLs for the 10 categories listed below.

CATEGORIES = [Jacket, Pants, Jeans, Shorts, T-shirt, Pullover, Bag, Cap, Sandal, Skirt]
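The URL extraction step can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the script from the Kaggle dataset: it parses an HTML string that has already been downloaded, collects every `img` source, and pairs each URL with the integer index of its category. The names `ImageURLCollector`, `extract_labeled_urls`, and `SAMPLE_HTML` are hypothetical, and the sample markup is made up; the real Zalando pages may be structured quite differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Category names from the article; using the list index as the label is an assumption.
CATEGORIES = ["Jacket", "Pants", "Jeans", "Shorts", "T-shirt",
              "Pullover", "Bag", "Cap", "Sandal", "Skirt"]


class ImageURLCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            self.urls.append(urljoin(self.base_url, src))


def extract_labeled_urls(html, category, base_url="https://www.zalando.com/"):
    """Return (image_url, label) pairs for one category page."""
    collector = ImageURLCollector(base_url)
    collector.feed(html)
    label = CATEGORIES.index(category)
    return [(url, label) for url in collector.urls]


# Made-up markup standing in for a downloaded category page.
SAMPLE_HTML = """
<div>
  <img src="/images/jacket-1.jpg">
  <img src="https://img.example.com/jacket-2.jpg">
  <img alt="no source">
</div>
"""

pairs = extract_labeled_urls(SAMPLE_HTML, "Jacket")
for url, label in pairs:
    print(label, url)
```

Keeping the label next to each URL from the start means the later conversion step never has to guess which category an image belongs to.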

A second script then downloaded the image behind each URL and converted it into a 28 by 28-pixel image. This conversion was the most time-consuming part of the project and took 7 days to complete. Here are a few examples from the dataset:

Image by author
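To give an idea of what the 28 by 28 conversion involves, here is a small sketch that shrinks an image by averaging blocks of pixels. It assumes the image has already been decoded into a 2-D list of grayscale intensities (0 to 255), which is my simplification for illustration; the function name `resize_grayscale` is hypothetical, and an image library would normally handle decoding and resizing in a real script.

```python
def resize_grayscale(pixels, size=28):
    """Downsample a 2-D grid of grayscale values to size x size by block averaging."""
    if not pixels or not pixels[0]:
        raise ValueError("image must have at least one pixel")

    src_h = len(pixels)
    src_w = len(pixels[0])
    resized = []

    for out_y in range(size):
        # Source rows that map onto this output row (always at least one row).
        y0 = out_y * src_h // size
        y1 = max((out_y + 1) * src_h // size, y0 + 1)
        row = []
        for out_x in range(size):
            # Source columns that map onto this output column.
            x0 = out_x * src_w // size
            x1 = max((out_x + 1) * src_w // size, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) // len(block))
        resized.append(row)

    return resized


# Synthetic 56 x 56 gradient standing in for a decoded product photo.
image = [[(x + y) % 256 for x in range(56)] for y in range(56)]
thumbnail = resize_grayscale(image)
print(len(thumbnail), len(thumbnail[0]))
```

Averaging each block, rather than keeping a single pixel from it, tends to preserve the overall shape of the item better at such a small resolution.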

What makes this dataset more challenging than other fashion datasets is that many of its images show a human model wearing the corresponding item. This makes the images more realistic and brings the task closer to modern computer vision problems.

Depending on the problem you are solving, you could merge this dataset with fashion-MNIST, which has 70,000 images, and end up with more than 100K samples.
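Merging the two datasets means their labels have to share one numbering, since only some class names appear in both. The sketch below builds a combined class list and remaps fashion-MNIST labels into it. The fashion-MNIST class names are the standard ones; the order of the classes in my dataset is assumed to follow the list above, and the helper names are hypothetical. Only exact name matches are treated as the same class, so "T-shirt" and "T-shirt/top" stay separate here, which you may want to change.

```python
# Assumed label order for this dataset (taken from the category list above).
ZALANDO_CLASSES = ["Jacket", "Pants", "Jeans", "Shorts", "T-shirt",
                   "Pullover", "Bag", "Cap", "Sandal", "Skirt"]

# Standard fashion-MNIST class names, indexed by label 0-9.
FASHION_MNIST_CLASSES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
                         "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]


def build_merged_classes(first, second):
    """Union of two class lists, keeping the first list's order and indices."""
    merged = list(first)
    for name in second:
        if name not in merged:
            merged.append(name)
    return merged


def remap_labels(labels, source_classes, merged_classes):
    """Translate integer labels from one dataset into the merged numbering."""
    lookup = {index: merged_classes.index(name)
              for index, name in enumerate(source_classes)}
    return [lookup[label] for label in labels]


MERGED_CLASSES = build_merged_classes(ZALANDO_CLASSES, FASHION_MNIST_CLASSES)

# Example: fashion-MNIST labels for T-shirt/top, Pullover, Sandal, Bag.
remapped = remap_labels([0, 2, 5, 8], FASHION_MNIST_CLASSES, MERGED_CLASSES)
print(len(MERGED_CLASSES), remapped)
```

Because the first list keeps its original indices, labels from this dataset need no remapping at all; only the fashion-MNIST labels change.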

In the future, I plan to collect more images with additional labels from Zalando (or other clothing websites) and combine them with the existing dataset. If you would like to help, you can find the image extraction and conversion scripts in the Kaggle dataset and contribute to the project.

Image by author

Amir Ali Hashemi

I'm an AI student who attempts to find simple explanations for questions and share them with others