For Publishers#
Local File Source#
- class LocalFileSource(path: str | List[str], format: str | dict | FileFormat = None, initial_last_modified: str | datetime = None)[source]#
Bases:
Input
- Initializes the LocalFileSource with the given path, and optionally a format and
a date and time after which the files were modified.
- Parameters:
path – The path where the files can be found. It can be a single path or a list of paths.
format – (optional) The format of the file. If not provided, it will be inferred from the file extension of the data. Can be either a string with the format, a FileFormat object or a dictionary with the format as the ‘type’ key and any additional format-specific information. Currently supported formats are ‘csv’, ‘parquet’, ‘ndjson’, ‘jsonl’ and ‘log’.
initial_last_modified – (optional) If provided, only the files modified after this date and time will be considered. The date and time can be provided as a string in [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601) or as a datetime object. If no timezone is provided, UTC will be assumed.
- Raises:
InputConfigurationError –
FormatConfigurationError –
- property format: FileFormat#
The format of the file or files. If not provided, it will be inferred from the file extension in the path.
- Type:
S3 Source#
- class S3Source(uri: str | List[str], credentials: dict | S3Credentials, format: str | dict | FileFormat = None, initial_last_modified: str | datetime = None, region: str = None)[source]#
Bases:
Input
- Initializes the S3Source with the given URI and the credentials required to
access the S3 bucket, and optionally a format and date and time after which the files were modified.
- Parameters:
uri – The URI of the files with format: ‘s3://path/to/files’. It can be a single URI or a list of URIs.
credentials – The credentials required to access the S3 bucket. Can be a dictionary or a S3Credentials object.
format – (optional) The format of the file. If not provided, it will be inferred from the file extension of the data. Can be either a string with the format, a FileFormat object or a dictionary with the format as the ‘type’ key and any additional format-specific information. Currently supported formats are ‘csv’, ‘parquet’, ‘ndjson’, ‘jsonl’ and ‘log’.
initial_last_modified – (optional) If provided, only the files modified after this date and time will be considered. The date and time can be provided as a string in [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601) or as a datetime object. If no timezone is provided, UTC will be assumed.
region – (optional) The region where the S3 bucket is located. If not provided, the default AWS region will be used.
- Raises:
InputConfigurationError –
FormatConfigurationError –
- property credentials: S3Credentials#
The credentials required to access the S3 bucket.
- Type:
S3Credentials
- property format: FileFormat#
The format of the file. If not provided, it will be inferred from the file.
- Type:
Azure Source#
- class AzureSource(uri: str | List[str], credentials: dict | AzureCredentials, format: str | dict | FileFormat = None, initial_last_modified: str | datetime = None)[source]#
Bases:
Input
- Initializes the AzureSource with the given URI and the credentials required to
access Azure, and optionally a format and date and time after which the files were modified.
- Parameters:
uri – The URI of the files with format: ‘az://path/to/files’. It can be a single URI or a list of URIs.
credentials – The credentials required to access Azure. Can be a dictionary or a AzureCredentials object.
format – (optional) The format of the file. If not provided, it will be inferred from the file extension of the data. Can be either a string with the format, a FileFormat object or a dictionary with the format as the ‘type’ key and any additional format-specific information. Currently supported formats are ‘csv’, ‘parquet’, ‘ndjson’, ‘jsonl’ and ‘log’.
initial_last_modified – (optional) If provided, only the files modified after this date and time will be considered. The date and time can be provided as a string in [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601) or as a datetime object. If no timezone is provided, UTC will be assumed.
- Raises:
InputConfigurationError –
FormatConfigurationError –
- property credentials: AzureCredentials#
The credentials required to access Azure.
- Type:
AzureCredentials
- property format: FileFormat#
The format of the file. If not provided, it will be inferred from the file extension of the data.
- Type:
MySQL Source#
- class MySQLSource(uri: str, query: str | List[str], credentials: dict | UserPasswordCredentials | None = None, initial_values: dict | None = None)[source]#
Bases:
Input
- Initializes the MySQLSource with the given URI and query, and optionally
connection credentials and initial values for the parameters in the SQL queries.
- Parameters:
uri – The URI of the database where the data is located
query – The SQL query(s) to execute. If multiple queries are provided, they must be provided as a list, and they will be mapped to the function inputs in the same order as they are defined.
credentials – (optional) The credentials required to access the MySQL database. Can be a dictionary or a UserPasswordCredentials object.
initial_values – (optional) The initial values for the parameters in the SQL queries.
- Raises:
InputConfigurationError –
- property credentials: UserPasswordCredentials | None#
The credentials required to access the MySQLDatabase. If no credentials were provided, it will return None.
- Type:
UserPasswordCredentials | None
MariaDB Source#
- class MariaDBSource(uri: str, query: str | List[str], credentials: dict | UserPasswordCredentials | None = None, initial_values: dict | None = None)[source]#
Bases:
Input
- Initializes the MariaDBSource with the given URI and query, and optionally
connection credentials and initial values for the parameters in the SQL queries.
- Parameters:
uri – The URI of the database where the data is located
query – The SQL query(s) to execute. If multiple queries are provided, they must be provided as a list, and they will be mapped to the function inputs in the same order as they are defined.
credentials – (optional) The credentials required to access the MariaDB database. Can be a dictionary or a UserPasswordCredentials object.
initial_values – (optional) The initial values for the parameters in the SQL queries.
- Raises:
InputConfigurationError –
- property credentials: UserPasswordCredentials | None#
The credentials required to access MariaDB. If no credentials were provided, it will return None.
- Type:
UserPasswordCredentials | None
Postgres Source#
- class PostgresSource(uri: str, query: str | List[str], credentials: dict | UserPasswordCredentials | None = None, initial_values: dict | None = None)[source]#
Bases:
Input
- Initializes the PostgresSource with the given URI and query, and optionally
connection credentials and initial values for the parameters in the SQL queries.
- Parameters:
uri – The URI of the database where the data is located
query – The SQL query(s) to execute. If multiple queries are provided, they must be provided as a list, and they will be mapped to the function inputs in the same order as they are defined.
credentials – (optional) The credentials required to access the Postgres database. Can be a dictionary or a UserPasswordCredentials object.
initial_values – (optional) The initial values for the parameters in the SQL queries.
- Raises:
InputConfigurationError –
- property credentials: UserPasswordCredentials | None#
The credentials required to access the PostgresDatabase. If no credentials were provided, it will return None.
- Type:
UserPasswordCredentials | None
Oracle Source#
- class OracleSource(uri: str, query: str | List[str], credentials: dict | UserPasswordCredentials | None = None, initial_values: dict | None = None)[source]#
Bases:
Input
- Initializes the OracleSource with the given URI and query, and optionally
connection credentials and initial values for the parameters in the SQL queries.
- Parameters:
uri – The URI of the database where the data is located
query – The SQL query(s) to execute. If multiple queries are provided, they must be provided as a list, and they will be mapped to the function inputs in the same order as they are defined.
credentials – (optional) The credentials required to access the Oracle database. Can be a dictionary or a UserPasswordCredentials object.
initial_values – (optional) The initial values for the parameters in the SQL queries.
- Raises:
InputConfigurationError –
- property credentials: UserPasswordCredentials | None#
The credentials required to access Oracle. If no credentials were provided, it will return None.
- Type:
UserPasswordCredentials | None
Custom Source (Plugin)#
- class SourcePlugin[source]#
Bases:
ABC
Abstract class for input plugins.
- property initial_values: dict#
Return a dictionary with the initial values to be stored after execution of the plugin. They will be accessible in the next execution of the plugin. The dictionary must have the parameter names as keys and the initial values as values, all the type string.
- Returns:
A dictionary with the initial values of the parameters of the plugin.
- Return type:
- abstract trigger_input(working_dir: str) str | Tuple[str, ...] | List[str] [source]#
- Trigger the import of the data. This must be implemented in any class that
inherits from this class. The method will receive a folder where it must store the data as parquet files, and return a list of the paths of the files created. This files will then be loaded and mapped to the dataset function in positional order, so if you want file.parquet to be the first argument of the dataset function, you must return it first. If you want a parameter to receive multiple files, return a list of the paths. For example, you would give the following return to provide a first argument with a single file and a second argument with two files: return [“file1.parquet”, [“file2.parquet”, “file3.parquet”]]
For Subscribers#
Local File Destination#
- class LocalFileDestination(path: str | List[str], format: str | dict | FileFormat = None)[source]#
Bases:
Output
Initializes the LocalFileDestination with the given path; and optionally a format.
- Parameters:
path – The path where the files must be stored. It can be a single path or a list of paths.
format – (optional) The format of the file. If not provided, it will be inferred from the file extension of the data. Can be either a string with the format, a FileFormat object or a dictionary with the format as the ‘type’ key and any additional format-specific information. Currently supported formats are ‘csv’, ‘parquet’, ‘ndjson’ and ‘jsonl’.
- Raises:
OutputConfigurationError –
FormatConfigurationError –
- property format: FileFormat#
The format of the file or files. If not provided, it will be inferred from the file extension in the path.
- Type:
S3 Destination#
- class S3Destination(uri: str | List[str], credentials: dict | S3Credentials, format: str | dict | FileFormat = None, region: str = None)[source]#
Bases:
Output
- Initializes the S3Destination with the given URI and the credentials required to
access the S3 bucket, and optionally a format and date and time after which the files were modified.
- Parameters:
uri – The URI of the files with format: ‘s3://path/to/files’. It can be a single URI or a list of URIs.
credentials – The credentials required to access the S3 bucket. Can be a dictionary or a S3Credentials object.
format – (optional) The format of the file. If not provided, it will be inferred from the file extension of the data. Can be either a string with the format, a FileFormat object or a dictionary with the format as the ‘type’ key and any additional format-specific information. Currently supported formats are ‘csv’, ‘parquet’, ‘ndjson’ and ‘jsonl’.
region – (optional) The region where the S3 bucket is located. If not provided, the default AWS region will be used.
- Raises:
OutputConfigurationError –
FormatConfigurationError –
- property credentials: S3Credentials#
The credentials required to access the S3 bucket.
- Type:
S3Credentials
- property format: FileFormat#
The format of the file. If not provided, it will be inferred from the file.
- Type:
Azure Destination#
- class AzureDestination(uri: str | List[str], credentials: dict | AzureCredentials, format: str | dict | FileFormat = None)[source]#
Bases:
Output
- Initializes the AzureDestination with the given URI and the credentials
required to access Azure; and optionally a format.
- Parameters:
uri – The URI of the files to export with format: ‘az://path/to/files’. It can be a single URI or a list of URIs.
credentials – The credentials required to access Azure. Can be a dictionary or a AzureCredentials object.
format – (optional) The format of the file. If not provided, it will be inferred from the file extension. Can be either a string with the format, a FileFormat object or a dictionary with the format as the ‘type’ key and any additional format-specific information. Currently supported formats are ‘csv’, ‘parquet’, ‘ndjson’ and ‘jsonl’.
- Raises:
OutputConfigurationError –
FormatConfigurationError –
- property credentials: AzureCredentials#
The credentials required to access Azure.
- Type:
AzureCredentials
- property format: FileFormat#
The format of the file. If not provided, it will be inferred from the file extension of the URI.
- Type:
MySQL Destination#
- class MySQLDestination(uri: str, destination_table: List[str] | str, credentials: dict | UserPasswordCredentials = None, if_table_exists: Literal['append', 'replace'] = 'append')[source]#
Bases:
Output
Initializes the MySQLDestination with the given URI and destination table, and optionally connection credentials.
- Parameters:
uri – The URI of the database where the data is going to be stored.
destination_table – The tables to create. If multiple tables are provided, they must be provided as a list.
credentials – (optional): The credentials required to access the MySQL database. Can be a dictionary or a UserPasswordCredentials object.
if_table_exists – (optional) The strategy to follow when the table already exists. Defaults to ‘append’. - ‘replace’ will create a new database table, overwriting an existing one. - ‘append’ will append to an existing table.
- Raises:
OutputConfigurationError –
- property credentials: UserPasswordCredentials#
The credentials required to access the MySQLDatabase.
- Type:
- property destination_table: str | List[str]#
The table(s) to create. If multiple tables are provided, they must be provided as a list.
Maria DBDestination#
- class MariaDBDestination(uri: str, destination_table: List[str] | str, credentials: dict | UserPasswordCredentials = None, if_table_exists: Literal['append', 'replace'] = 'append')[source]#
Bases:
Output
Initializes the MariaDBDestination with the given URI and destination table, and optionally connection credentials.
- Parameters:
uri – The URI of the database where the data is going to be stored.
destination_table – The tables to create. If multiple tables are provided, they must be provided as a list.
credentials – (optional) The credentials required to access the MariaDB database. Can be a dictionary or a UserPasswordCredentials object.
if_table_exists – (optional) The strategy to follow when the table already exists. Defaults to ‘append’. - ‘replace’ will create a new database table, overwriting an existing one. - ‘append’ will append to an existing table.
- Raises:
OutputConfigurationError –
- property credentials: UserPasswordCredentials#
The credentials required to access the MariaDB database.
- Type:
- property destination_table: str | List[str]#
The table(s) to create. If multiple tables are provided, they must be provided as a list.
Postgres Destination#
- class PostgresDestination(uri: str, destination_table: List[str] | str, credentials: dict | UserPasswordCredentials = None, if_table_exists: Literal['append', 'replace'] = 'append')[source]#
Bases:
Output
Initializes the PostgresDestination with the given URI and destination table, and optionally connection credentials.
- Parameters:
uri (str) – The URI of the database where the data is going to be stored.
destination_table (List[str] | str) – The tables to create. If multiple tables are provided, they must be provided as a list.
credentials (dict | UserPasswordCredentials, optional) – The credentials required to access the Postgres database. Can be a dictionary or a UserPasswordCredentials object.
if_table_exists – (optional) The strategy to follow when the table already exists. Defaults to ‘append’. - ‘replace’ will create a new database table, overwriting an existing one. - ‘append’ will append to an existing table.
- Raises:
OutputConfigurationError –
- property credentials: UserPasswordCredentials#
The credentials required to access the Postgres database.
- Type:
- property destination_table: str | List[str]#
The table(s) to create. If multiple tables are provided, they must be provided as a list.
Oracle Destination#
- class OracleDestination(uri: str, destination_table: List[str] | str, credentials: dict | UserPasswordCredentials = None, if_table_exists: Literal['append', 'replace'] = 'append')[source]#
Bases:
Output
Initializes the OracleDestination with the given URI and destination table, and optionally connection credentials.
- Parameters:
uri (str) – The URI of the database where the data is going to be stored.
destination_table (List[str] | str) – The tables to create. If multiple tables are provided, they must be provided as a list.
credentials (dict | UserPasswordCredentials, optional) – The credentials required to access the Oracle database. Can be a dictionary or a UserPasswordCredentials object.
- if_table_exists: (optional) The strategy to
follow when the table already exists. Defaults to ‘append’. - ‘replace’ will create a new database table, overwriting an existing one. - ‘append’ will append to an existing table.
- Raises:
OutputConfigurationError –
- property credentials: UserPasswordCredentials#
The credentials required to access the Oracle database.
- Type:
- property destination_table: str | List[str]#
The table(s) to create. If multiple tables are provided, they must be provided as a list.
Custom Destination (Plugin)#
- class DestinationPlugin[source]#
Bases:
ABC
Abstract class for output plugins.
- abstract trigger_output(*args, **kwargs)[source]#
- Trigger the exporting of the data. This function will receive the resulting data
from the dataset function and must store it in the desired location.
- Parameters:
*args – The data to be exported
**kwargs – Additional parameters to be used in the export
- Returns:
None