Multi-Modal Image/Text

This example demonstrates how users can search for images by providing either an image or text as input. Behind the scenes, a Vector Database is used: a type of database that stores embeddings and performs vector similarity searches.

The images in the dataset are pulled from Amazon and were added to the vector database without any description or other information about the images. CLIP is used as the encoder.
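The exact dataset and vector database are not spelled out here, so the snippet below is only a rough sketch of the ingestion step, assuming CLIP is loaded through the `sentence-transformers` wrapper (one of several ways to use CLIP) and that the images sit in a local `images/` folder. The folder path and variable names are illustrative, not part of the actual example.

```python
from pathlib import Path

from PIL import Image
from sentence_transformers import SentenceTransformer

# Load a CLIP checkpoint; clip-ViT-B-32 maps images and text into the same space.
model = SentenceTransformer("clip-ViT-B-32")

# Hypothetical folder of product images; note that no captions or metadata are needed.
image_paths = sorted(Path("images/").glob("*.jpg"))
images = [Image.open(p) for p in image_paths]

# Encode every image into a 512-dimensional vector.
image_vectors = model.encode(images, convert_to_numpy=True)

# Each (path, vector) pair would then be inserted into the vector database.
records = list(zip([str(p) for p in image_paths], image_vectors))
```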

CLIP is a neural network that encodes both images and text into a shared vector space. The resulting vectors are stored in the vector database, which can then perform vector similarity searches: given an image or a piece of text, it finds the stored images whose vectors are closest to the query.
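Continuing the ingestion sketch above, the query side could look roughly like this. Because CLIP places text and images in the same embedding space, one function can handle both query types; plain cosine similarity in NumPy stands in here for the vector database's similarity search, which would normally use an approximate nearest-neighbor index.

```python
import numpy as np

def search(query, vectors, paths, model, top_k=5):
    """Return the top_k most similar images for a text string or a PIL image."""
    # The same encode() call works for text and images with a CLIP model.
    query_vec = model.encode(query, convert_to_numpy=True)

    # Cosine similarity against every stored image vector
    # (a real vector database would do this with an ANN index).
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query_vec)
    scores = vectors @ query_vec / norms

    best = np.argsort(-scores)[:top_k]
    return [(paths[i], float(scores[i])) for i in best]

# Text query and image query share the same code path, e.g.:
# search("red running shoes", image_vectors, [str(p) for p in image_paths], model)
# search(Image.open("query.jpg"), image_vectors, [str(p) for p in image_paths], model)
```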

This is just one example of how Vector Databases can be used. Any data that can be transformed into a vector can be stored in a vector database and queried using vector similarity searches, which opens up many possibilities beyond image search.

To try it, either enter text and press the 'search with text' button, or upload an image and press the 'search with image' button.