ChromaDB
This section describes how to connect ChromaDB to MindsDB.
ChromaDB is the open-source embedding database. Chroma makes it easy to build LLM apps by making knowledge, facts, and skills pluggable for LLMs.
Connection
This handler is implemented using the chromadb Python library.
The required arguments to establish a connection are:
host: the host name or IP address of the ChromaDB instance.
port: the TCP/IP port of the ChromaDB instance.
OR
persist_directory: the directory to use for persisting data.
Provide the host and port arguments to connect to a remote ChromaDB instance. Otherwise, provide the persist_directory argument. This creates a local, in-process ChromaDB instance whose data is persisted to that directory.
To connect to a remote ChromaDB instance, use the following CREATE DATABASE statement:
CREATE DATABASE chromadb_datasource
WITH ENGINE = 'chromadb',
PARAMETERS = {
    "host": "YOUR_HOST",
    "port": YOUR_PORT
};
Alternatively, to use a local ChromaDB instance backed by a persist directory, use the following CREATE DATABASE statement:
CREATE DATABASE chromadb_datasource
WITH ENGINE = 'chromadb',
PARAMETERS = {
    "persist_directory": "YOUR_PERSIST_DIRECTORY"
};
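The either/or rule for the connection parameters can be checked before a statement is sent to MindsDB. The following is a minimal Python sketch and is not part of MindsDB or the chromadb library: the helper name build_create_database is hypothetical, the host and port values are example placeholders, and the rendered text only follows the layout of the statements in this section.

```python
import json


def build_create_database(name, host=None, port=None, persist_directory=None):
    """Render a CREATE DATABASE statement for the chromadb engine.

    Hypothetical helper for illustration only. It accepts either
    host and port (remote instance) or persist_directory (local instance).
    """
    remote = host is not None or port is not None
    local = persist_directory is not None

    if remote and local:
        raise ValueError("use host/port or persist_directory, not both")
    if remote:
        if host is None or port is None:
            raise ValueError("host and port must be given together")
        params = {"host": host, "port": int(port)}
    elif local:
        params = {"persist_directory": persist_directory}
    else:
        raise ValueError("provide host/port or persist_directory")

    # json.dumps produces the double-quoted keys used in PARAMETERS.
    body = json.dumps(params, indent=4)
    return (
        f"CREATE DATABASE {name}\n"
        f"WITH ENGINE = 'chromadb',\n"
        f"PARAMETERS = {body};"
    )


if __name__ == "__main__":
    # Example values only; replace with your own host and port.
    print(build_create_database("chromadb_datasource", host="127.0.0.1", port=8000))
```

Rejecting mixed or partial arguments early gives a clearer error than a failed connection attempt later.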
Usage
Now, you can use the established connection to create a collection (or table in the context of MindsDB) in ChromaDB and insert data into it:
CREATE TABLE chromadb_datasource.test_embeddings (
    SELECT embeddings, '{"source": "fda"}' AS metadata
    FROM mysql_datasource.test_embeddings
);
mysql_datasource is another MindsDB data source that has been created by connecting to a MySQL database. The test_embeddings table in the mysql_datasource data source contains the embeddings that we want to store in ChromaDB.
You can query your collection (table) as shown below:
SELECT *
FROM chromadb_datasource.test_embeddings;
To filter the data in your collection (table) by metadata, you can use the following query:
SELECT *
FROM chromadb_datasource.test_embeddings
WHERE `metadata.source` = "fda";
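Conceptually, the metadata filter keeps only the rows whose metadata holds the requested value under the given key. The sketch below illustrates that behavior in plain Python on made-up rows. It does not call MindsDB or ChromaDB, and the helper name filter_by_metadata and the sample rows are assumptions for illustration.

```python
import json


def filter_by_metadata(rows, key, value):
    """Return the rows whose JSON metadata has metadata[key] == value."""
    matches = []
    for row in rows:
        # Metadata is given as a JSON string, as in the insert example.
        metadata = json.loads(row["metadata"])
        if metadata.get(key) == value:
            matches.append(row)
    return matches


# Made-up rows shaped like the insert example: embeddings plus JSON metadata.
rows = [
    {"embeddings": [0.1, 0.2], "metadata": '{"source": "fda"}'},
    {"embeddings": [0.3, 0.4], "metadata": '{"source": "other"}'},
]

fda_rows = filter_by_metadata(rows, "source", "fda")
print(len(fda_rows))  # 1
```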
To conduct a similarity search, the following query can be used:
SELECT *
FROM chromadb_datasource.test_embeddings
WHERE search_vector = (
SELECT embeddings
FROM mysql_datasource.test_embeddings
LIMIT 1
);
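A similarity search ranks the stored embeddings by their distance to the search vector and returns the closest ones. The sketch below shows that idea in plain Python using squared Euclidean distance. The metric is an assumption for illustration (the one actually applied depends on how the collection is configured), and the names squared_l2 and nearest, as well as the sample vectors, are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(stored, search_vector, limit=1):
    """Return the `limit` stored vectors closest to search_vector."""
    ranked = sorted(stored, key=lambda v: squared_l2(v, search_vector))
    return ranked[:limit]


# Made-up two-dimensional embeddings; real embeddings have many more dimensions.
stored = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.2]]

print(nearest(stored, [1.0, 1.0], limit=2))  # [[1.0, 1.0], [0.9, 1.2]]
```

A vector database avoids this full scan by using an index, but the ranking it approximates is the same.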