OpenAI Models Fine-Tuning
In this example, we teach an OpenAI model how to write MindsDB AI SQL queries.
The OpenAI models used here are Large Language Models (LLMs), which are pre-trained on large amounts of data. You can, however, fine-tune these models with a task-specific dataset for a defined use case.
OpenAI supports fine-tuning of some of its models, including davinci, curie, babbage, and ada (see the OpenAI fine-tuning documentation for details). With MindsDB, you can fine-tune an OpenAI model to make it more applicable to your specific use case.
Let’s create a model to answer questions about MindsDB’s custom SQL syntax.
First, create an OpenAI engine, passing your OpenAI API key:
CREATE ML_ENGINE openai_engine
FROM openai
USING
api_key = 'your-openai-api-key';
Then, create a model using this engine:
CREATE MODEL openai_davinci
PREDICT completion
USING
engine = 'openai_engine',
model_name = 'davinci',
prompt_template = 'Return a valid SQL string for the following question about MindsDB in-database machine learning: {{prompt}}';
You can check model status with this command:
DESCRIBE openai_davinci;
Once the status is complete, we can query for predictions:
SELECT prompt, completion
FROM openai_davinci as m
WHERE prompt = 'What is the SQL syntax to join input data with predictions from a MindsDB machine learning model?'
USING max_tokens=400;
On execution, we get:
+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| prompt | completion |
+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| What is the SQL syntax to join input data with predictions from a MindsDB machine learning model? | The SQL syntax is: SELECT * FROM input_data INNER JOIN predictions ON input_data.id = predictions.id |
+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
If you have followed any of the MindsDB tutorials, you’ll notice that the model returned a generic SQL join rather than MindsDB’s syntax for joining input data with a model.
Now, we’ll fine-tune our model using a table that stores prompt and completion pairs describing MindsDB’s custom SQL syntax.
You can query this table as follows:
SELECT prompt, completion
FROM files.openai_learninghub_ft;
And here is its content:
+---------------------------------------------------------------------------------------------------+------------------------------------------+
| prompt | completion |
+---------------------------------------------------------------------------------------------------+------------------------------------------+
| What is the SQL syntax to connect a database to MindsDB? | CREATE DATABASE datasource_name |
| | [WITH] [ENGINE [=] engine_name] [,] |
| | [PARAMETERS [=] { |
| | "key": "value", |
| | ... |
| | }]; |
| | |
| What is the SQL command to create a home rentals MindsDB machine learning model? | CREATE MODEL |
| | mindsdb.home_rentals_model |
| | FROM example_db |
| | (SELECT * FROM demo_data.home_rentals) |
| | PREDICT rental_price; |
| | |
| What is the SQL syntax to join input data with predictions from a MindsDB machine learning model? | SELECT t.column_name, p.column_name, ... |
| | FROM integration_name.table_name [AS] t |
| | JOIN project_name.model_name [AS] p; |
+---------------------------------------------------------------------------------------------------+------------------------------------------+
This is how you can fine-tune an OpenAI model:
FINETUNE openai_davinci
FROM files
(SELECT prompt, completion FROM openai_learninghub_ft);
The FINETUNE command creates a new version of the openai_davinci model. You can query all available versions as below:
SELECT *
FROM models_versions
WHERE name = 'openai_davinci';
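To compare the original model with the fine-tuned one, it can help to query a specific version directly. The following is a minimal sketch, not a confirmed part of this tutorial. It assumes that your MindsDB release supports addressing a version as project_name.model_name.version_number, that the model lives in the default mindsdb project, and that version 1 is the model as it was before fine-tuning.

```sql
-- Assumption: version 1 is the pre-fine-tuning model in the default
-- `mindsdb` project; adjust the project name and version number to
-- match the rows returned by the models_versions query.
SELECT prompt, completion
FROM mindsdb.openai_davinci.1
WHERE prompt = 'What is the SQL syntax to join input data with predictions from a MindsDB machine learning model?';
```

Running the same prompt against each version makes it easier to see what the fine-tuning changed.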
Once the new version’s status is complete and the version is marked active, we can query the model again, expecting a more accurate output.
SELECT prompt, completion
FROM openai_davinci as m
WHERE prompt = 'What is the SQL syntax to join input data with predictions from a MindsDB machine learning model?'
USING max_tokens=400;
On execution, we get:
+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| prompt | completion |
+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| What is the SQL syntax to join input data with predictions from a MindsDB machine learning model? | SELECT * FROM mindsdb.models.my_model JOIN mindsdb.input_data_name; |
+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
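To check the model against several prompts at once, you can join it with a table of prompts instead of passing a single prompt in the WHERE clause. The sketch below follows the JOIN syntax shown in the training data and reuses the files.openai_learninghub_ft table only as a convenient source of prompts. This reuse is an assumption made for illustration; a separate table of prompts the model has not seen would be a better check.

```sql
-- Batch predictions: each row's `prompt` column fills the {{prompt}}
-- placeholder in the model's prompt_template.
-- Assumption: reusing the fine-tuning table as input, for illustration only.
SELECT t.prompt, m.completion
FROM files.openai_learninghub_ft AS t
JOIN openai_davinci AS m;
```

The result contains one row per prompt in the input table, with the model's answer in the completion column.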