By extending the `BaseMLEngine` class, you can connect with the machine learning library or framework of your choice.
Apart from the `__init__()` method, there are five methods, of which two must be implemented. We recommend checking actual examples in the codebase to get an idea of what goes into each of these methods, as they can vary depending on the nature of the system being integrated.
Let’s review the purpose of each method.
| Method | Purpose |
|---|---|
| `create()` | It creates a model inside the engine registry. |
| `predict()` | It calls a model and returns prediction data. |
| `update()` | Optional. It updates an existing model without resetting its internal structure. |
| `describe()` | Optional. It provides global model insights. |
| `create_engine()` | Optional. It connects with external sources, such as a REST API. |
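The five methods in the table can be sketched as a skeleton class. To keep the sketch runnable without MindsDB installed, `BaseMLEngine` below is a hypothetical stand-in, not the real class. The method signatures, the `MyFrameworkHandler` class, its `name` attribute, and the use of lists of dicts in place of dataframes are all assumptions; check the real base class in the codebase for the exact signatures.

```python
from typing import Any, Dict, List, Optional

Rows = List[Dict[str, Any]]  # stand-in for a dataframe in this sketch


class BaseMLEngine:
    """Hypothetical stand-in for MindsDB's BaseMLEngine (not the real class)."""

    def create(self, target: str, df: Optional[Rows] = None,
               args: Optional[dict] = None) -> None:
        raise NotImplementedError

    def predict(self, df: Rows, args: Optional[dict] = None) -> Rows:
        raise NotImplementedError


class MyFrameworkHandler(BaseMLEngine):
    """Hypothetical handler: two required methods, three optional stubs."""

    name = "my_framework"  # assumed handler identifier

    def __init__(self) -> None:
        self.target: Optional[str] = None

    def create(self, target, df=None, args=None):
        # Required: set up (and, if the framework allows, train) the model.
        self.target = target

    def predict(self, df, args=None):
        # Required: return the input rows plus a column for the target.
        # A real handler would call the underlying framework here.
        return [{**row, self.target: None} for row in df]

    def update(self, df=None, args=None):
        # Optional: update the model without resetting its structure.
        raise NotImplementedError

    def describe(self, key=None):
        # Optional: return global model insights.
        raise NotImplementedError

    def create_engine(self, connection_args):
        # Optional: connect to an external source, such as a REST API.
        raise NotImplementedError
```

In this sketch the optional methods simply raise `NotImplementedError` until the handler needs them.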
In the `mindsdb.integrations.libs.utils` library, contributors can find various methods that may be useful while implementing new handlers.
Also, there is a wrapper class for `BaseMLEngine` instances called `BaseMLEngineExec`. It is automatically deployed to take care of modifying the data responses into something that can be used alongside data handlers.
As an example, consider building a handler for TPOT. Its high-level API exposes classes that are either for classification or regression. But as a handler designer, you need to ensure that arbitrary ML tasks are dispatched properly to each class (i.e., not using a regressor for a classification problem and vice versa). First, `type_infer` can help you by estimating the data type of the target variable (so you immediately know what class to use). Additionally, to quickly get a stratified train-test split, you can leverage the `dataprep_ml` splitters and continue to focus on the actual usage of TPOT for the training and inference logic.
Step 1: Set up and run MindsDB locally
Step 2: Write a (failing) test for your new handler
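Check that you can run the existing handler tests with `python -m pytest tests/unit/ml_handlers/`. If you get the `ModuleNotFoundError` error, try adding the `__init__.py` file to any subdirectory that doesn’t have it.
In the SQL query of your new test, reference your handler with `USING engine={HandlerName}`.
Run the new test. It should fail at this point with the error `Can't find integration_record for handler ...`.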
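As a rough sketch of the query such a test would send, the snippet below builds a `CREATE MODEL` statement that selects the handler through the `USING engine=...` clause. The helper name `build_create_model_query`, the quoting of the engine name, and every project, integration, table, and handler name used here are hypothetical; copy the exact statement shape from an existing test in `tests/unit/ml_handlers/`.

```python
def build_create_model_query(project: str, model: str, integration: str,
                             source_query: str, target: str, engine: str) -> str:
    """Build a CREATE MODEL statement that routes training to one ML engine."""
    return (
        f"CREATE MODEL {project}.{model}\n"
        f"FROM {integration} ({source_query})\n"
        f"PREDICT {target}\n"
        f"USING engine='{engine}'"
    )


# Hypothetical names throughout; only the USING clause comes from the text.
query = build_create_model_query(
    project="proj",
    model="my_model",
    integration="files",
    source_query="SELECT * FROM df",
    target="price",
    engine="my_framework",
)
print(query)
```

Until the handler is registered, a test that executes this statement is expected to fail.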
Step 3: Add your handler to the source code
Create a new directory in `mindsdb/integrations/handlers/`. You must name the new directory `{HandlerName}_handler/`.
Copy the `.py` files from the StatsForecast handler folder. These are: `__about__.py`, `__init__.py`, `setup.py`, and `statsforecast_handler.py`.
Change the contents of the `.py` files to match your new handler. Also, change the name of the `statsforecast_handler.py` file to match your handler.
Edit the `requirements.txt` file to install your handler’s dependencies. You may get conflicts with other packages like Lightwood, but you can ignore them for now.
Define a class for your handler in the `{HandlerName}_handler.py` file. Like for other handlers, this should be a subclass of the `BaseMLEngine` class.
In the `tests/unit/executor_test_base.py` file starting at line 91, you can see how other handlers are added with `db.session.add(...)`. Copy that and modify it to add your handler. Make sure to add your handler before Lightwood; otherwise, the CI will break.
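The copy-and-rename part of this step can be scripted. The sketch below assumes the template folder is named `statsforecast_handler` (following the `{HandlerName}_handler/` naming rule) and that an empty `requirements.txt` is a reasonable starting point; the `scaffold_handler` function is a hypothetical helper, not part of MindsDB.

```python
import shutil
from pathlib import Path

TEMPLATE = "statsforecast"  # template handler named in the steps above


def scaffold_handler(handlers_dir: Path, handler_name: str) -> Path:
    """Copy the template handler's .py files into {handler_name}_handler/."""
    src = handlers_dir / f"{TEMPLATE}_handler"
    dst = handlers_dir / f"{handler_name}_handler"
    dst.mkdir(exist_ok=False)  # fail loudly if the handler already exists
    for py_file in sorted(src.glob("*.py")):
        # statsforecast_handler.py becomes {handler_name}_handler.py;
        # the other file names are kept as they are.
        new_name = py_file.name.replace(
            f"{TEMPLATE}_handler", f"{handler_name}_handler"
        )
        shutil.copy(py_file, dst / new_name)
    # Empty file to be filled in with the handler's dependencies.
    (dst / "requirements.txt").touch()
    return dst
```

For example, `scaffold_handler(Path("mindsdb/integrations/handlers"), "tpot")` would create `tpot_handler/`. The copied files still reference StatsForecast, so their contents need to be edited by hand afterwards.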
Step 4: Modify the handler source code until your test passes
Define a `create()` method that deals with the model setup arguments. This will add your handler to the models table. Depending on the framework, you may also train the model here using the `df` argument.
Save the relevant arguments and trained models at the end of the `create()` method. This allows them to be accessed later. Use the `engine_storage` attributes; you can find examples in other handlers’ folders.
Define a `predict()` method that makes model predictions. This method must return a dataframe whose format matches the input, plus a column containing your model’s predictions of the target. The input `df` is a subset of the original `df`, with the rows determined by the conditions in the predict SQL query.
Don’t debug the `create()` and `predict()` methods with the `print()` statement because they’re inside a subthread. Instead, write relevant info to disk.
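The pattern described in this step (save state in `create()`, load it in `predict()`, and log to disk instead of printing) might look like the sketch below. `InMemoryStorage` is a hypothetical stand-in for the handler's storage attribute, and its `json_set`/`json_get` methods are assumptions; look at other handlers for the real storage calls. The "model" is just the mean of the target column, and lists of dicts stand in for dataframes.

```python
import logging
from statistics import mean


class InMemoryStorage:
    """Hypothetical stand-in for the handler's storage attribute."""

    def __init__(self):
        self._data = {}

    def json_set(self, key, value):
        self._data[key] = value

    def json_get(self, key):
        return self._data[key]


def make_file_logger(path):
    """Return a logger that writes to a file, since print() is not visible."""
    logger = logging.getLogger("my_framework_handler")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.addHandler(logging.FileHandler(path))
    return logger


class MeanHandler:
    """Toy handler: predicts the training mean of the target column."""

    def __init__(self, storage, logger):
        self.engine_storage = storage
        self.logger = logger

    def create(self, target, df=None, args=None):
        values = [row[target] for row in (df or [])]
        state = {
            "target": target,
            "mean": mean(values) if values else None,
            "args": args or {},
        }
        # Save everything predict() will need at the end of create().
        self.engine_storage.json_set("state", state)
        self.logger.info("create: target=%s rows=%d", target, len(values))

    def predict(self, df, args=None):
        state = self.engine_storage.json_get("state")
        self.logger.info("predict: rows=%d", len(df))
        # Same rows as the input, plus a column with the prediction.
        return [{**row, state["target"]: state["mean"]} for row in df]
```

Calling `make_file_logger("handler.log")` and passing the result to `MeanHandler` leaves a record of each `create()` and `predict()` call in `handler.log`, which can be inspected while MindsDB is running.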
Step 5: QA your handler locally
Launch MindsDB locally with `python -m mindsdb`. Again, any issues will appear in the terminal output.
Check that your handler is listed by running `SELECT * FROM information_schema.handlers`.
Run the relevant tutorial. For regression data, this is Predict Home Rental Prices. And for time series data, this is Forecast Quarterly House Sales. Specify `USING ENGINE={your_handler}` while creating a model.
Step 6: Open a pull request
See the CI workflow in `.github/workflows/mindsdb.yml`.
`pytest` is the recommended testing package. Use `pytest` to confirm your ML handler implementation is correct.
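As one illustration, a `pytest`-style test can check the contract stated for `predict()`: the output keeps the input rows and columns and adds a column for the target. In the sketch below, `toy_predict` is a hypothetical stand-in for a handler's `predict()` method, and lists of dicts stand in for dataframes; a real test would go through the MindsDB test base classes instead.

```python
def toy_predict(rows, target):
    """Hypothetical stand-in for a handler's predict(): constant prediction."""
    return [{**row, target: 0.0} for row in rows]


def test_predict_keeps_input_and_adds_target():
    rows = [{"sqft": 50, "rooms": 2}, {"sqft": 80, "rooms": 3}]
    out = toy_predict(rows, target="rental_price")

    # One output row per input row.
    assert len(out) == len(rows)
    for before, after in zip(rows, out):
        # Same columns as the input, plus the target column.
        assert set(after) == set(before) | {"rental_price"}
        # Input values are passed through unchanged.
        assert all(after[key] == before[key] for key in before)
```

Because the function name starts with `test_`, `pytest` collects it automatically when the file name matches its discovery pattern (for example, `test_*.py`).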