This is the implementation of the Solr data handler for MindsDB.

Apache Solr is a highly reliable, scalable, and fault-tolerant search platform that provides distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration, and more.

Implementation

This handler is implemented using the sqlalchemy-solr library, which provides a Python/SQLAlchemy interface to Solr.

The arguments used to establish a connection are as follows (some are optional, as noted):

  • username is the username used to authenticate with the Solr server. This parameter is optional.
  • password is the password to authenticate the user with the Solr server. This parameter is optional.
  • host is the host name or IP address of the Solr server.
  • port is the port number of the Solr server.
  • server_path is the URL path under which Solr is served. It defaults to solr if not provided.
  • collection is the Solr Collection name.
  • use_ssl specifies whether to connect to the Solr server over SSL. It defaults to false if not provided.
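
To make the role of each argument concrete, here is a minimal Python sketch that assembles a connection URL from them. The function name build_solr_url is hypothetical, and the URL layout (solr://user:password@host:port/server_path/collection?use_ssl=...) is an assumption based on the convention used by sqlalchemy-solr; check that library's documentation before relying on it. This is not the handler's actual code.

```python
from urllib.parse import quote, urlencode


def build_solr_url(host, port, collection, username=None, password=None,
                   server_path="solr", use_ssl=False):
    """Assemble a solr:// URL from the connection arguments (illustrative only)."""
    # Credentials are optional; percent-encode them so special characters are safe.
    credentials = ""
    if username:
        credentials = quote(str(username), safe="")
        if password:
            credentials += ":" + quote(str(password), safe="")
        credentials += "@"

    # use_ssl may arrive as a bool or as a string such as "false".
    ssl_enabled = str(use_ssl).strip().lower() in ("true", "1", "yes")
    query = urlencode({"use_ssl": "true" if ssl_enabled else "false"})

    # Assumed layout: solr://[user[:password]@]host:port/server_path/collection?use_ssl=...
    return f"solr://{credentials}{host}:{port}/{server_path}/{collection}?{query}"


if __name__ == "__main__":
    print(build_solr_url("127.0.0.1", "8981", "gettingstarted",
                         username="demo_user", password="demo_password"))
```

Note that the defaults in the signature mirror the documented ones: server_path falls back to solr and use_ssl to false when they are omitted.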

If you installed MindsDB locally via pip, you need to install all handler dependencies manually. To do so, go to the handler’s folder (mindsdb/integrations/handlers/solr_handler) and run this command: pip install -r requirements.txt.

Usage

To connect to Solr from MindsDB with this handler, use the following syntax:

CREATE DATABASE solr_datasource
WITH
    engine = 'solr',
    parameters = {
      "username": "demo_user",
      "password": "demo_password",
      "host": "127.0.0.1",
      "port": "8981",
      "server_path": "solr",
      "collection": "gettingstarted",
      "use_ssl": "false"
    };
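
If you manage several Solr data sources, it can help to generate the statement from a dictionary of parameters. The following is a minimal sketch in Python. The helper name render_create_database is hypothetical and not part of MindsDB; it only produces the statement text, which you would then run in MindsDB yourself.

```python
import json


def render_create_database(name, parameters, engine="solr"):
    """Render a CREATE DATABASE statement as text (illustrative helper)."""
    # Reject names that are not plain identifiers, since the name is
    # interpolated into the statement without quoting.
    if not name.isidentifier():
        raise ValueError(f"invalid data source name: {name!r}")

    # The parameters block uses JSON-style syntax, so json.dumps can render it.
    params_json = json.dumps(parameters, indent=2)
    return (
        f"CREATE DATABASE {name}\n"
        "WITH\n"
        f"    engine = '{engine}',\n"
        f"    parameters = {params_json};"
    )


if __name__ == "__main__":
    print(render_create_database("solr_datasource", {
        "host": "127.0.0.1",
        "port": "8981",
        "collection": "gettingstarted",
    }))
```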

You can use this established connection to query your table as follows:

SELECT *
FROM solr_datasource.gettingstarted
LIMIT 10000;

Requirements

A running Solr instance with Parallel SQL support.

There are certain limitations that need to be taken into account when issuing queries to Solr. Refer to https://solr.apache.org/guide/solr/latest/query-guide/sql-query.html#parallel-sql-queries.

Don’t forget to put a LIMIT clause at the end of the SQL statement.
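
If queries are generated programmatically, a small guard can add the limit when it is missing. The sketch below is one possible approach in Python. The helper name ensure_limit and the default of 100 rows are assumptions made for illustration, and the check only recognizes a simple trailing LIMIT n (not, for example, LIMIT combined with OFFSET).

```python
import re

# Matches a simple trailing "LIMIT <number>", optionally followed by a semicolon.
_TRAILING_LIMIT = re.compile(r"\blimit\s+\d+\s*;?\s*$", re.IGNORECASE)


def ensure_limit(query, default_limit=100):
    """Return the query with a trailing LIMIT clause, adding one if absent."""
    stripped = query.strip()
    if _TRAILING_LIMIT.search(stripped):
        return stripped  # already limited; leave it unchanged

    # Preserve a terminating semicolon if the original query had one.
    had_semicolon = stripped.endswith(";")
    core = stripped.rstrip(";").rstrip()
    return f"{core}\nLIMIT {default_limit}" + (";" if had_semicolon else "")


if __name__ == "__main__":
    print(ensure_limit("SELECT * FROM solr_datasource.gettingstarted;"))
```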