OpenAI Models Fine-Tuning

In this example we are going to teach an OpenAI model, how to write MindsDB AI SQL queries :) All OpenAI models belong to the group of Large Language Models (LLMs). By definition, these are pre-trained on large amounts of data. However, it is possible to fine-tune these models with a task-specific dataset for a defined use case.

OpenAI supports fine-tuning of some of its models, including davinci, curie, babbage, and ada (more details here). And with MindsDB, you can easily fine-tune an OpenAI model making it more applicable to your specific use case.

Let’s create a model to answer questions about MindsDB’s custom SQL syntax. First, create an OpenAI engine, passing your OpenAI API key:

CREATE ML_ENGINE openai_engine
FROM openai
USING
    api_key = 'your-openai-api-key';

Then, create a model using this engine:

CREATE MODEL openai_davinci
PREDICT completion
USING
    engine = 'openai_engine',
    model_name = 'davinci',
    prompt_template = 'Return a valid SQL string for the following question about MindsDB in-database machine learning: {{prompt}}';

You can check model status with this command:

DESCRIBE openai_davinci;

Once the status is complete, we can query for predictions:

SELECT prompt, completion
FROM openai_davinci as m
WHERE prompt = 'What is the SQL syntax to join input data with predictions from a MindsDB machine learning model?'
USING max_tokens=400;

On execution, we get:

+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| prompt                                                                                            | completion                                                                                           |
+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| What is the SQL syntax to join input data with predictions from a MindsDB machine learning model? | The SQL syntax is: SELECT * FROM input_data INNER JOIN predictions ON input_data.id = predictions.id |
+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+

If you followed one of the MindsDB tutorials before, you’ll see that the syntax provided by the model is not exactly as expected. Now, we’ll fine-tune our model using a table that stores details about MindsDB’s custom SQL syntax.

Here is a table we’ll use to fine-tune our model:

SELECT prompt, completion
FROM files.openai_learninghub_ft;

And here is its content:

+---------------------------------------------------------------------------------------------------+------------------------------------------+
| prompt                                                                                            | completion                               |
+---------------------------------------------------------------------------------------------------+------------------------------------------+
| What is the SQL syntax to connect a database to MindsDB?                                          | CREATE DATABASE datasource_name          |
|                                                                                                   | [WITH] [ENGINE [=] engine_name] [,]      |
|                                                                                                   | [PARAMETERS [=] {                        |
|                                                                                                   |    "key": "value",                       |
|                                                                                                   |    ...                                   |
|                                                                                                   |    }];                                   |
|                                                                                                   |                                          |
| What is the SQL command to create a home rentals MindsDB machine learning model?                  | CREATE MODEL                             |
|                                                                                                   |   mindsdb.home_rentals_model             |
|                                                                                                   | FROM example_db                          |
|                                                                                                   |   (SELECT * FROM demo_data.home_rentals) |
|                                                                                                   | PREDICT rental_price;                    |
|                                                                                                   |                                          |
| What is the SQL syntax to join input data with predictions from a MindsDB machine learning model? | SELECT t.column_name, p.column_name, ... |
|                                                                                                   | FROM integration_name.table_name [AS] t  |
|                                                                                                   | JOIN project_name.model_name [AS] p;     |
+---------------------------------------------------------------------------------------------------+------------------------------------------+

This is how you can fine-tune an OpenAI model:

FINETUNE openai_davinci
FROM files
    (SELECT prompt, completion FROM openai_learninghub_ft);

The FINETUNE command creates a new version of the openai_davinci model. You can query all available versions as below:

SELECT *
FROM models_versions
WHERE name = 'openai_davinci';

Once the new version status is complete and active, we can query the model again, expecting a more accurate output.

SELECT prompt, completion
FROM openai_davinci as m
WHERE prompt = 'What is the SQL syntax to join input data with predictions from a MindsDB machine learning model?'
USING max_tokens=400;

On execution, we get:

+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| prompt                                                                                            | completion                                                                                           |
+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| What is the SQL syntax to join input data with predictions from a MindsDB machine learning model? | SELECT * FROM mindsdb.models.my_model JOIN mindsdb.input_data_name;                                  |
+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+

Overview

Concepts

Use Cases

OpenAI Models Fine-Tuning