Implementation
This handler is implemented using boto3, the AWS SDK for Python.
The required arguments to establish a connection are as follows:
aws_access_key_id is the AWS access key that identifies the user or IAM role.
aws_secret_access_key is the AWS secret access key that identifies the user or IAM role.
region_name is the AWS region.
bucket is the name of the S3 bucket.
key is the key of the object to be queried.
input_serialization is the format of the data in the object that is to be queried.
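As an illustration, the arguments above can be collected in a single mapping and checked for completeness before a connection is attempted. The following is a minimal Python sketch, not the handler's own code: the helper name missing_connection_args and every value shown are hypothetical placeholders, and the input_serialization value assumes a CSV object with a header row, written in the shape boto3 uses for InputSerialization.

```python
# Required connection arguments, as listed above.
REQUIRED_ARGS = (
    "aws_access_key_id",
    "aws_secret_access_key",
    "region_name",
    "bucket",
    "key",
    "input_serialization",
)


def missing_connection_args(params):
    """Return the required arguments that are absent or empty (hypothetical helper)."""
    return [name for name in REQUIRED_ARGS if not params.get(name)]


# Placeholder values only; substitute real credentials and object names.
connection_params = {
    "aws_access_key_id": "<access-key-id>",
    "aws_secret_access_key": "<secret-access-key>",
    "region_name": "<region>",
    "bucket": "<bucket-name>",
    "key": "<object-key>",
    # Assumption: the object is a CSV file whose first line is a header.
    "input_serialization": {"CSV": {"FileHeaderInfo": "USE"}},
}

missing = missing_connection_args(connection_params)
```

An empty result means that every required argument has been supplied; it does not verify that the credentials or object names are valid.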
If you installed MindsDB locally via pip, you need to install all handler dependencies manually. To do so, go to the handler’s folder
(mindsdb/integrations/handlers/s3_handler) and run this command:
pip install -r requirements.txt
Usage
To use this handler, connect MindsDB to an object in the S3 bucket by passing the arguments listed above as connection parameters.
Queries to objects in the S3 bucket are issued using S3 Select. boto3 exposes this feature, as described in the boto3 documentation.
The required format of the InputSerialization parameter, which translates to the input_serialization parameter of the handler, is of special importance, as it specifies the format of the data in the object that is to be queried.
At the moment, S3 Select does not allow multiple files to be queried. Therefore, queries can be issued only to the object that is passed in the key parameter. This object should always be referred to as S3Object when writing queries.
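To make the role of InputSerialization concrete, here is a minimal sketch of how an S3 Select query against a single object can be issued directly with boto3's select_object_content. It is an illustration under stated assumptions, not the handler's implementation: the function names build_select_request and run_select are hypothetical, the bucket and key are placeholders, the object is assumed to be a CSV file with a header row, and JSON is chosen as the output format only for easy parsing.

```python
import json


def build_select_request(bucket, key, expression, input_serialization):
    """Assemble keyword arguments for the S3 client's select_object_content call."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": input_serialization,
        # Assumption: results are requested as newline-delimited JSON.
        "OutputSerialization": {"JSON": {}},
    }


def run_select(request, region_name):
    """Send the request and return decoded rows (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("s3", region_name=region_name)
    response = client.select_object_content(**request)
    # The response payload is an event stream; only Records events carry data.
    raw = b"".join(
        event["Records"]["Payload"]
        for event in response["Payload"]
        if "Records" in event
    )
    return [json.loads(line) for line in raw.decode("utf-8").splitlines() if line]


# The queried object is always addressed as S3Object in the expression.
request = build_select_request(
    bucket="<bucket-name>",
    key="<object-key>",
    expression="SELECT * FROM S3Object s LIMIT 5",
    input_serialization={"CSV": {"FileHeaderInfo": "USE"}},
)
```

Calling run_select(request, "<region>") would send the query; it is left uncalled here because it requires valid AWS credentials and an existing object.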