In this section, we present how to connect ChromaDB to MindsDB.

ChromaDB is the open-source embedding database. Chroma makes it easy to build LLM apps by making knowledge, facts, and skills pluggable for LLMs.

Connection

This handler is implemented using the chromadb Python library.

The required arguments to establish a connection are:

  • host: the host name or IP address of the ChromaDB instance.
  • port: the TCP/IP port of the ChromaDB instance.

OR

  • persist_directory: the directory to use for persisting data.

The host and port arguments should be provided if you want to connect to a remote ChromaDB instance. Otherwise, the persist_directory argument should be provided. This will create an in-memory ChromaDB instance.

To connect to a remote ChromaDB instance, the following CREATE DATABASE can be used:

CREATE DATABASE chromadb_datasource
WITH ENGINE = 'chromadb'
PARAMETERS = {
    "host": "YOUR_HOST",
    "port": YOUR_PORT
}

Alternateively, to connect to an in-memory ChromaDB instance, the following CREATE DATABASE can be used:

CREATE DATABASE chromadb_datasource
WITH ENGINE = "chromadb",
PARAMETERS = {
    "persist_directory": "YOUR_PERSIST_DIRECTORY"
}

Usage

Now, you can use the established connection to create a collection (or table in the context of MindsDB) in ChromaDB and insert data into it:

CREATE TABLE chromadb_datasource.test_embeddings (
    SELECT embeddings,'{"source": "fda"}' as metadata
    FROM mysql_datasource.test_embeddings
);

mysql_datasource is another MindsDB data source that has been created by connecting to a MySQL database. The test_embeddings table in the mysql_datasource data source contains the embeddings that we want to store in ChromaDB.

You can query your collection (table) as shown below:

SELECT * 
FROM chromadb_datasource.test_embeddings;

To filter the data in your collection (table) by metadata, you can use the following query:

SELECT * 
FROM chromadb_datasource.test_embeddings
WHERE `metadata.source` = "fda";

To conduct a similarity search, the following query can be used:

SELECT *
FROM chromadb_datasource.test_embeddings
WHERE search_vector = (
    SELECT embeddings
    FROM mysql_datasource.test_embeddings
    LIMIT 1
);