Fashion-MNIST Embedding Visualization


A Convolutional Neural Network (CNN) is a deep learning model that is designed for image analysis. In this type of network, convolutional layers are used to detect features such as edges, textures, and patterns, while pooling layers are applied to reduce spatial dimensions and preserve essential information. In this context, the CNN is employed to classify images from the Fashion‑MNIST dataset into ten clothing categories: T‑shirt, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, and Boot.


During the classification stage, the learned embedding is passed to the final dense (fully connected) layer and softmax classifier, where it is converted into prediction probabilities corresponding to each clothing category. In this process, the mapping from each embedding to a specific class label is learned by the classifier. For visualization purposes, the embeddings obtained just before the final classification layer (that is, before they are transformed into class scores) are used. Each embedding represents how the network perceives the image within the feature space. By applying a dimensionality‑reduction technique such as t‑SNE, the 512‑dimensional vectors are projected into a 3D space.


Diagram of CNN architecture for Fashion-MNIST embedding visualization
Point Image

Petar Bulovic personal website