This is the implementation of the Google Sheets data handler for MindsDB.

Google Sheets is a spreadsheet program included as a part of the free, web-based Google Docs Editors suite offered by Google.

Please note that the integration of MindsDB with Google Sheets works for public sheets only.

Implementation

This handler is implemented using duckdb, a library that allows SQL queries to be executed on pandas DataFrames.

In essence, when querying a particular sheet, the entire sheet is first pulled into a pandas DataFrame using the Google Visualization API. Once this is done, SQL queries can be run on the DataFrame using duckdb.

Since the entire sheet needs to be pulled into memory first (DataFrame), it is recommended to be somewhat careful when querying large datasets so as not to overload your machine.

The required arguments to establish a connection are as follows:

  • spreadsheet_id is the unique ID of the Google Sheet.
  • sheet_name is the name of the sheet within the Google Sheet.

If you installed MindsDB locally via pip, you need to install all handler dependencies manually. To do so, go to the handler’s folder (mindsdb/integrations/handlers/sheets_handler) and run this command: pip install -r requirements.txt.

Usage

In order to make use of this handler and connect to a Google Sheet in MindsDB, the following syntax can be used:

CREATE DATABASE sheets_datasource
WITH
    engine = 'sheets',
    parameters = {
      "spreadsheet_id": "12wgS-1KJ9ymUM-6VYzQ0nJYGitONxay7cMKLnEE2_d0",
      "sheet_name": "iris"
    };

You can use this established connection to query your table as follows:

SELECT *
FROM sheets_datasource.example_tbl;

The name of the table will be the name of the relevant sheet, provided as an input to the sheet_name parameter.

At the moment, only the SELECT statemet is allowed to be executed through duckdb. This, however, has no restriction on running machine learning algorithms against your data in Google Sheets using the CREATE PREDICTOR statement.