tabsdata.S3Source

class S3Source(uri: str | List[str], credentials: dict | S3Credentials, format: str | dict | FileFormat = None, initial_last_modified: str | datetime = None, region: str = None)

Bases: Input

Class for managing the configuration of S3-file-based data inputs.

format

The format of the file. If not provided, it will be inferred from the file extension of the data.

Type:

FileFormat
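
To illustrate what inferring the format from the file extension could look like, here is a minimal sketch using only the standard library. The helper `infer_format` and the constant `SUPPORTED_FORMATS` are hypothetical names introduced for this example; they are not part of tabsdata, and the library's actual inference logic may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Formats listed as supported in the `format` parameter description.
SUPPORTED_FORMATS = {"csv", "parquet", "ndjson", "jsonl", "log"}


def infer_format(uri: str) -> str:
    """Hypothetical helper: guess the format from the URI's file extension."""
    path = urlparse(uri).path
    extension = PurePosixPath(path).suffix.lstrip(".").lower()
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(f"Cannot infer a supported format from {uri!r}")
    return extension
```

Under these assumptions, `infer_format("s3://bucket/data/file.csv")` yields `"csv"`, while a URI without a recognized extension raises `ValueError`, which is the case where passing `format` explicitly would be needed.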

uri

The URI of the files, in the format ‘s3://path/to/files’. It can be a single URI or a list of URIs.

Type:

str | List[str]
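
Because `uri` accepts either a single string or a list, code that consumes it usually normalizes both shapes into a list first. The sketch below shows one way to do that with the standard library. `normalize_uris` is a hypothetical helper, not a tabsdata function, and the scheme check is an assumption about what a sensible validation might look like rather than a description of the library's behavior.

```python
from typing import List, Union
from urllib.parse import urlparse


def normalize_uris(uri: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: return a list of URIs, checking the s3:// scheme."""
    # Wrap a single string so callers can always iterate over a list.
    uris = [uri] if isinstance(uri, str) else list(uri)
    for item in uris:
        if urlparse(item).scheme != "s3":
            raise ValueError(f"Expected an 's3://' URI, got {item!r}")
    return uris
```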

credentials

The credentials required to access the S3 bucket.

Type:

S3Credentials

initial_last_modified

If provided, only the files modified after this date and time will be considered.

Type:

str | datetime.datetime
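
The parameter description for `initial_last_modified` states that the value may be an ISO 8601 string or a datetime object, and that UTC is assumed when no timezone is given. The following sketch shows how such a value could be normalized with the standard library. `normalize_last_modified` is a hypothetical helper written for illustration and is not the library's implementation. Note that `datetime.fromisoformat` accepts only a subset of ISO 8601 on Python versions before 3.11 (for example, it rejects a trailing `Z`).

```python
from datetime import datetime, timezone
from typing import Union


def normalize_last_modified(value: Union[str, datetime]) -> datetime:
    """Hypothetical helper: parse ISO 8601 input and default to UTC."""
    parsed = datetime.fromisoformat(value) if isinstance(value, str) else value
    # A naive datetime carries no timezone; treat it as UTC, as documented.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

With this sketch, `"2024-01-15T10:30:00"` becomes a UTC-aware datetime, while a value such as `"2024-01-15T10:30:00+02:00"` keeps its explicit offset.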

to_dict()

Converts the S3Source object to a dictionary.

__init__(uri: str | List[str], credentials: dict | S3Credentials, format: str | dict | FileFormat = None, initial_last_modified: str | datetime = None, region: str = None)
Initializes the S3Source with the given URI and the credentials required to access the S3 bucket, and optionally a format and a date and time after which the files were modified.

Parameters:
  • uri (str | List[str]) – The URI of the files, in the format ‘s3://path/to/files’. It can be a single URI or a list of URIs.

  • credentials (dict | S3Credentials) – The credentials required to access the S3 bucket. Can be a dictionary or an S3Credentials object.

  • format (str | dict | FileFormat, optional) – The format of the file. If not provided, it will be inferred from the file extension of the data. Can be either a string with the format, a FileFormat object or a dictionary with the format as the ‘type’ key and any additional format-specific information. Currently supported formats are ‘csv’, ‘parquet’, ‘ndjson’, ‘jsonl’ and ‘log’.

  • initial_last_modified (str | datetime.datetime, optional) – If provided, only the files modified after this date and time will be considered. The date and time can be provided as a string in [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601) or as a datetime object. If no timezone is provided, UTC will be assumed.

  • region (str, optional) – The region where the S3 bucket is located. If not provided, the default AWS region will be used.

Raises:
  • InputConfigurationError

  • FormatConfigurationError

Methods

__init__(uri, credentials[, format, ...])

Initializes the S3Source with the given URI and the credentials required to access the S3 bucket.

Attributes

credentials

The credentials required to access the S3 bucket.

format

The format of the file.

initial_last_modified

The date and time after which files must have been modified to be considered.

region

The region where the S3 bucket is located.

uri

The URI of the files, in the format 's3://path/to/files'.