Sources
- class AzureSource(
- uri: str | list[str],
- credentials: AzureCredentials,
- format: str | FileFormat = None,
- initial_last_modified: str | datetime = None,
)
Bases: SourcePlugin
Categories: source
Azure-file-based data inputs.
- class SupportedFormats(
- *values,
)
Bases: Enum
Enum for the supported formats for the AzureSource.
- avro = <class 'tabsdata._format.AvroFormat'>
- csv = <class 'tabsdata._format.CSVFormat'>
- log = <class 'tabsdata._format.LogFormat'>
- ndjson = <class 'tabsdata._format.NDJSONFormat'>
- parquet = <class 'tabsdata._format.ParquetFormat'>
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
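The return convention above can be sketched without any tabsdata imports; the file names are placeholders from the docstring, and the body elides the actual Parquet writing:

```python
# Sketch of the chunk() return convention. A bare path maps to a
# single-file argument of the dataset function; a nested list maps all
# of its paths to one argument.
def chunk(working_dir: str) -> list:
    # ... write Parquet files into working_dir here ...
    return ["file1.parquet", ["file2.parquet", "file3.parquet"]]

result = chunk("/tmp/work")
first_arg, second_arg = result  # one file, then a list of two files
```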
- property credentials: AzureCredentials
The credentials required to access Azure.
- Type:
AzureCredentials
- property format: FileFormat
The format of the file. If not provided, it will be inferred from the file extension of the data.
- Type:
FileFormat
- property initial_last_modified: str
The date and time after which files must have been modified to be loaded.
- Type:
str
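Since the constructor accepts either a string or a datetime for initial_last_modified, a timezone-aware ISO-8601 value is an unambiguous choice. A stdlib-only sketch (no tabsdata import):

```python
from datetime import datetime, timezone

# Either form below could be passed as initial_last_modified; the
# ISO-8601 string is what the equivalent datetime serializes to.
cutoff_dt = datetime(2024, 1, 1, tzinfo=timezone.utc)
cutoff_str = cutoff_dt.isoformat()
```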
- property initial_values: dict
Return a dictionary with the initial values to be stored after execution of the plugin. They will be accessible in the next execution of the plugin. The dictionary must have the parameter names as keys and the initial values as values, all of type string.
- Returns:
A dictionary with the initial values of the parameters of the plugin.
- Return type:
dict
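A minimal sketch of the string-keys-to-string-values contract, assuming a plugin that tracks a last-modified cursor between runs (the class and parameter names are illustrative, not part of the tabsdata API):

```python
# Hypothetical plugin state: every key and value must be a string.
class CursorState:
    def __init__(self):
        self.last_seen = "2024-01-01T00:00:00+00:00"

    @property
    def initial_values(self) -> dict:
        # Stored after this execution, handed back on the next one.
        return {"initial_last_modified": self.last_seen}
```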
- class GCSSource(
- uri: str | list[str],
- credentials: GCPCredentials,
- format: str | FileFormat = None,
- initial_last_modified: str | datetime = None,
)
Bases: SourcePlugin
Categories: source
GCS-file-based data inputs.
- class SupportedFormats(
- *values,
)
Bases: Enum
Enum for the supported formats for the GCSSource.
- avro = <class 'tabsdata._format.AvroFormat'>
- csv = <class 'tabsdata._format.CSVFormat'>
- log = <class 'tabsdata._format.LogFormat'>
- ndjson = <class 'tabsdata._format.NDJSONFormat'>
- parquet = <class 'tabsdata._format.ParquetFormat'>
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
- property credentials: GCPCredentials
The credentials required to access GCS.
- Type:
GCPCredentials
- property format: FileFormat
The format of the file. If not provided, it will be inferred from the file extension of the data.
- Type:
FileFormat
- property initial_last_modified: str
The date and time after which files must have been modified to be loaded.
- Type:
str
- property initial_values: dict
Return a dictionary with the initial values to be stored after execution of the plugin. They will be accessible in the next execution of the plugin. The dictionary must have the parameter names as keys and the initial values as values, all of type string.
- Returns:
A dictionary with the initial values of the parameters of the plugin.
- Return type:
dict
- class LocalFileSource(
- path: str | list[str],
- format: str | FileFormat = None,
- initial_last_modified: str | datetime = None,
)
Bases: SourcePlugin
Categories: source
Local-file-based data inputs.
- class SupportedFormats(
- *values,
)
Bases: Enum
Enum for the supported formats for the LocalFileSource.
- avro = <class 'tabsdata._format.AvroFormat'>
- csv = <class 'tabsdata._format.CSVFormat'>
- log = <class 'tabsdata._format.LogFormat'>
- ndjson = <class 'tabsdata._format.NDJSONFormat'>
- parquet = <class 'tabsdata._format.ParquetFormat'>
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
- property format: FileFormat
The format of the file or files. If not provided, it will be inferred from the file extension in the path.
- Type:
FileFormat
- property initial_last_modified: str
The date and time after which files must have been modified to be loaded.
- Type:
str
- property initial_values: dict
Return a dictionary with the initial values to be stored after execution of the plugin. They will be accessible in the next execution of the plugin. The dictionary must have the parameter names as keys and the initial values as values, all of type string.
- Returns:
A dictionary with the initial values of the parameters of the plugin.
- Return type:
dict
- class MSSQLSource(
- **kwargs,
)
Bases: SourcePlugin
Categories: source
Microsoft SQL Server based data inputs.
- chunk(
- working_dir: str,
)
Execute the query and yield chunks of data.
- Parameters:
working_dir (str) – The working directory for temporary files.
- Yields:
pd.DataFrame – A chunk of data from the query result.
- property connection_string: str
Get the connection string for the database.
- Returns:
The connection string.
- Return type:
str
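The exact connection-string scheme the plugin emits is not documented here; as a hypothetical illustration, a SQLAlchemy-style MSSQL URL can be assembled from the same configuration fields:

```python
# Hypothetical helper: builds a SQLAlchemy-style MSSQL URL. The real
# connection_string property assembles its value from the plugin's own
# configuration; the scheme below is an assumption for illustration.
def make_connection_string(user: str, host: str, port: int, database: str) -> str:
    return f"mssql+pyodbc://{user}@{host}:{port}/{database}"
```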
- property credentials: UserPasswordCredentials | None
The credentials required to access Microsoft SQL Server. If no credentials were provided, it will return None.
- Type:
UserPasswordCredentials | None
- class MSSQLSrc(
- conn: MSSQLConn,
- queries: str | list[str],
- initial_values: dict[str, Any] | None = None,
- transactional: bool = True,
- chunk_size: Annotated[int, Gt(gt=0)] = 100000,
- loader: Literal['polars_sqlalchemy', 'pandas'] = 'polars_sqlalchemy',
- schema_overrides: dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64] | list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None = None,
)
Bases: SourcePlugin
Categories: source
Source plugin for reading data from a Microsoft SQL Server (MSSQL) database.
- property chunk_size: int
The chunk size in records for reading large queries in batches.
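Conceptually, chunk_size bounds how many records are materialized at once when a large query is read in batches. A stdlib-only sketch of that batching idea (not the plugin's actual implementation):

```python
from collections.abc import Iterable, Iterator

def batched(rows: Iterable, chunk_size: int = 100_000) -> Iterator[list]:
    """Yield lists of at most chunk_size rows from an iterable."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == chunk_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch
```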
- property conn: MSSQLConn
The MSSQL connection configuration.
- property initial_values: dict
The initial values for the parameters in the SQL queries.
- property loader: Literal['polars_sqlalchemy', 'pandas']
The data processing loader to use for executing SQL imports.
- property schema_overrides: list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None
The schema overrides for the tables being read.
- stream(
- working_dir: str,
)
Execute the query and load the results as a list of TableFrames.
- Parameters:
working_dir – The directory where the Parquet files will be saved.
- Returns:
A list of TableFrames containing the query results.
- property transactional: bool
Whether to use transactions for reading to ensure consistency.
- class MariaDBSource(
- **kwargs,
)
Bases: SourcePlugin
Categories: source
MariaDB-based data inputs.
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
- property credentials: UserPasswordCredentials | None
The credentials required to access MariaDB. If no credentials were provided, it will return None.
- Type:
UserPasswordCredentials | None
- class MariaDBSrc(
- conn: MariaDBConn,
- queries: str | list[str],
- initial_values: dict[str, Any] | None = None,
- transactional: bool = True,
- chunk_size: Annotated[int, Gt(gt=0)] = 100000,
- loader: Literal['polars_sqlalchemy', 'pandas'] = 'polars_sqlalchemy',
- schema_overrides: dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64] | list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None = None,
)
Bases: SourcePlugin
Categories: source
Source plugin for reading data from MariaDB.
- property chunk_size: int
The chunk size in records for reading large queries in batches.
- property conn: MariaDBConn
The MariaDB connection configuration.
- property initial_values: dict
The initial values for the parameters in the SQL queries.
- property loader: Literal['polars_sqlalchemy', 'pandas']
The data processing loader to use for executing SQL imports.
- property schema_overrides: list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None
The schema overrides for the tables being read.
- stream(
- working_dir: str,
)
Execute the query and load the results as a list of TableFrames.
- Parameters:
working_dir – The directory where the Parquet files will be saved.
- Returns:
A list of TableFrames containing the query results.
- property transactional: bool
Whether to use transactions for reading to ensure consistency.
- class MySQLSource(
- **kwargs,
)
Bases: SourcePlugin
Categories: source
MySQL-based data inputs.
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
- property credentials: UserPasswordCredentials | None
The credentials required to access the MySQL database. If no credentials were provided, it will return None.
- Type:
UserPasswordCredentials | None
- class MySQLSrc(
- conn: MySQLConn,
- queries: str | list[str],
- initial_values: dict[str, Any] | None = None,
- transactional: bool = True,
- chunk_size: Annotated[int, Gt(gt=0)] = 100000,
- loader: Literal['polars_sqlalchemy', 'pandas'] = 'polars_sqlalchemy',
- schema_overrides: dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64] | list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None = None,
)
Bases: SourcePlugin
Categories: source
Source plugin for reading data from MySQL.
- property chunk_size: int
The chunk size in records for reading large queries in batches.
- property conn: MySQLConn
The MySQL connection configuration.
- property initial_values: dict
The initial values for the parameters in the SQL queries.
- property loader: Literal['polars_sqlalchemy', 'pandas']
The data processing loader to use for executing SQL imports.
- property schema_overrides: list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None
The schema overrides for the tables being read.
- stream(
- working_dir: str,
)
Execute the query and load the results as a list of TableFrames.
- Parameters:
working_dir – The directory where the Parquet files will be saved.
- Returns:
A list of TableFrames containing the query results.
- property transactional: bool
Whether to use transactions for reading to ensure consistency.
- class OracleSource(
- **kwargs,
)
Bases: SourcePlugin
Categories: source
Oracle-based data inputs.
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
- property credentials: UserPasswordCredentials | None
The credentials required to access Oracle. If no credentials were provided, it will return None.
- Type:
UserPasswordCredentials | None
- class OracleSrc(
- conn: OracleConn,
- queries: str | list[str],
- initial_values: dict[str, Any] | None = None,
- transactional: bool = True,
- chunk_size: Annotated[int, Gt(gt=0)] = 100000,
- loader: Literal['polars_sqlalchemy', 'pandas'] = 'polars_sqlalchemy',
- schema_overrides: dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64] | list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None = None,
)
Bases: SourcePlugin
Categories: source
Source plugin for reading data from Oracle.
- property chunk_size: int
The chunk size in records for reading large queries in batches.
- property conn: OracleConn
The Oracle connection configuration.
- property initial_values: dict
The initial values for the parameters in the SQL queries.
- property loader: Literal['polars_sqlalchemy', 'pandas']
The data processing loader to use for executing SQL imports.
- property schema_overrides: list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None
The schema overrides for the tables being read.
- stream(
- working_dir: str,
)
Execute the query and load the results as a list of TableFrames.
- Parameters:
working_dir – The directory where the Parquet files will be saved.
- Returns:
A list of TableFrames containing the query results.
- property transactional: bool
Whether to use transactions for reading to ensure consistency.
- class PostgresSource(
- **kwargs,
)
Bases: SourcePlugin
Categories: source
Postgres-based data inputs.
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
- property credentials: UserPasswordCredentials | None
The credentials required to access the Postgres database. If no credentials were provided, it will return None.
- Type:
UserPasswordCredentials | None
- class PostgresSrc(
- conn: PostgresConn,
- queries: str | list[str],
- initial_values: dict[str, Any] | None = None,
- transactional: bool = True,
- chunk_size: Annotated[int, Gt(gt=0)] = 100000,
- loader: Literal['polars_sqlalchemy', 'pandas'] = 'polars_sqlalchemy',
- schema_overrides: dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64] | list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None = None,
)
Bases: SourcePlugin
Categories: source
Source plugin for reading data from Postgres.
- property chunk_size: int
The chunk size in records for reading large queries in batches.
- property conn: PostgresConn
The Postgres connection configuration.
- property initial_values: dict
The initial values for the parameters in the SQL queries.
- property loader: Literal['polars_sqlalchemy', 'pandas']
The data processing loader to use for executing SQL imports.
- property schema_overrides: list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None
The schema overrides for the tables being read.
- stream(
- working_dir: str,
)
Execute the query and load the results as a list of TableFrames.
- Parameters:
working_dir – The directory where the Parquet files will be saved.
- Returns:
A list of TableFrames containing the query results.
- property transactional: bool
Whether to use transactions for reading to ensure consistency.
- class S3Source(
- uri: str | list[str],
- credentials: S3Credentials,
- format: str | FileFormat = None,
- initial_last_modified: str | datetime = None,
- region: str = None,
)
Bases: SourcePlugin
Categories: source
S3-file-based data inputs.
- class SupportedFormats(
- *values,
)
Bases: Enum
Enum for the supported formats for the S3Source.
- avro = <class 'tabsdata._format.AvroFormat'>
- csv = <class 'tabsdata._format.CSVFormat'>
- log = <class 'tabsdata._format.LogFormat'>
- ndjson = <class 'tabsdata._format.NDJSONFormat'>
- parquet = <class 'tabsdata._format.ParquetFormat'>
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
- property credentials: S3Credentials
The credentials required to access the S3 bucket.
- Type:
S3Credentials
- property format: FileFormat
The format of the file. If not provided, it will be inferred from the file extension of the data.
- Type:
FileFormat
- property initial_last_modified: str
The date and time after which files must have been modified to be loaded.
- Type:
str
- property initial_values: dict
Return a dictionary with the initial values to be stored after execution of the plugin. They will be accessible in the next execution of the plugin. The dictionary must have the parameter names as keys and the initial values as values, all of type string.
- Returns:
A dictionary with the initial values of the parameters of the plugin.
- Return type:
dict
- class SalesforceReportSource(
- credentials: SalesforceCredentials,
- report: str | list[str],
- column_name_strategy: Literal['columnName', 'label'],
- find_report_by: Literal['id', 'name'] = None,
- filter: tuple[str, str, str] | list[tuple[str, str, str]] = None,
- filter_logic: str = None,
- instance_url: str = None,
- last_modified_column: str = None,
- initial_last_modified: str = None,
- **kwargs,
)
Bases: SourcePlugin
Categories: source
Salesforce Reports based data inputs.
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
- property column_name_strategy: Literal['columnName', 'label']
- property credentials: SalesforceCredentials | SalesforceTokenCredentials
- property find_report_by: Literal['id', 'name']
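The filter and filter_logic parameters suggest Salesforce report filters of the form (column, operator, value) combined by a logic string that refers to filters by 1-based position. The operator names below are illustrative assumptions, not confirmed by this reference:

```python
# Hypothetical report filters: (column, operator, value) triples, with
# filter_logic referring to them by their 1-based position in the list.
filters = [
    ("StageName", "equals", "Closed Won"),
    ("Amount", "greaterThan", "10000"),
]
filter_logic = "1 AND 2"
```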
- class SalesforceSource(
- credentials: SalesforceCredentials,
- query: str | list[str],
- instance_url: str = None,
- include_deleted: bool = False,
- initial_last_modified: str = None,
- **kwargs,
)
Bases: SourcePlugin
Categories: source
Salesforce (SOQL query) based data inputs (not Salesforce Reports).
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
- property credentials: SalesforceCredentials | SalesforceTokenCredentials
- class Stage(
- use_existing_data: bool = True,
)
Bases: SourcePlugin
Categories: source
Inputs that have been placed in a stage, usually from a StageTrigger.
- stream(
- working_dir: str,
)
- property use_existing_data: bool