Sources
- class AzureSource(
- uri: str | list[str],
- credentials: AzureCredentials,
- format: str | FileFormat = None,
- initial_last_modified: str | datetime = None,
)
Bases: SourcePlugin
Categories: source
Azure-file-based data inputs.
- class SupportedFormats(
- *values,
)
Bases: Enum
Enum for the supported formats for the AzureSource.
- avro = <class 'tabsdata._format.AvroFormat'>
- csv = <class 'tabsdata._format.CSVFormat'>
- log = <class 'tabsdata._format.LogFormat'>
- ndjson = <class 'tabsdata._format.NDJSONFormat'>
- parquet = <class 'tabsdata._format.ParquetFormat'>
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
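The return convention above can be sketched without any tabsdata imports; the file names are placeholders from the docstring, and the body elides the actual Parquet writing:

```python
# Sketch of the chunk() return convention. A bare path maps to a
# single-file argument of the dataset function; a nested list maps all
# of its paths to one argument.
def chunk(working_dir: str) -> list:
    # ... write Parquet files into working_dir here ...
    return ["file1.parquet", ["file2.parquet", "file3.parquet"]]

result = chunk("/tmp/work")
first_arg, second_arg = result  # one file, then a list of two files
```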
- property credentials: AzureCredentials
The credentials required to access Azure.
- Type:
AzureCredentials
- property format: FileFormat
The format of the file. If not provided, it will be inferred from the file extension of the data.
- Type:
FileFormat
- property initial_last_modified: str
The date and time after which files must have been modified to be loaded.
- Type:
str
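Since the constructor accepts either a string or a datetime for initial_last_modified, a timezone-aware ISO-8601 value is an unambiguous choice. A stdlib-only sketch (no tabsdata import):

```python
from datetime import datetime, timezone

# Either form below could be passed as initial_last_modified; the
# ISO-8601 string is what the equivalent datetime serializes to.
cutoff_dt = datetime(2024, 1, 1, tzinfo=timezone.utc)
cutoff_str = cutoff_dt.isoformat()
```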
- property initial_values: dict
Return a dictionary with the initial values to be stored after execution of the plugin. They will be accessible in the next execution of the plugin. The dictionary must have the parameter names as keys and the initial values as values, all of type string.
- Returns:
A dictionary with the initial values of the parameters of the plugin.
- Return type:
dict
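A minimal sketch of the string-keys-to-string-values contract, assuming a plugin that tracks a last-modified cursor between runs (the class and parameter names are illustrative, not part of the tabsdata API):

```python
# Hypothetical plugin state: every key and value must be a string.
class CursorState:
    def __init__(self):
        self.last_seen = "2024-01-01T00:00:00+00:00"

    @property
    def initial_values(self) -> dict:
        # Stored after this execution, handed back on the next one.
        return {"initial_last_modified": self.last_seen}
```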
- class GCSSource(
- uri: str | list[str],
- credentials: GCPCredentials,
- format: str | FileFormat = None,
- initial_last_modified: str | datetime = None,
)
Bases: SourcePlugin
Categories: source
GCS-file-based data inputs.
- class SupportedFormats(
- *values,
)
Bases: Enum
Enum for the supported formats for the GCSSource.
- avro = <class 'tabsdata._format.AvroFormat'>
- csv = <class 'tabsdata._format.CSVFormat'>
- log = <class 'tabsdata._format.LogFormat'>
- ndjson = <class 'tabsdata._format.NDJSONFormat'>
- parquet = <class 'tabsdata._format.ParquetFormat'>
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
- property credentials: GCPCredentials
The credentials required to access GCS.
- Type:
GCPCredentials
- property format: FileFormat
The format of the file. If not provided, it will be inferred from the file extension of the data.
- Type:
FileFormat
- property initial_last_modified: str
The date and time after which files must have been modified to be loaded.
- Type:
str
- property initial_values: dict
Return a dictionary with the initial values to be stored after execution of the plugin. They will be accessible in the next execution of the plugin. The dictionary must have the parameter names as keys and the initial values as values, all of type string.
- Returns:
A dictionary with the initial values of the parameters of the plugin.
- Return type:
dict
- class LocalFileSource(
- path: str | list[str],
- format: str | FileFormat = None,
- initial_last_modified: str | datetime = None,
)
Bases: SourcePlugin
Categories: source
Local-file-based data inputs.
- class SupportedFormats(
- *values,
)
Bases: Enum
Enum for the supported formats for the LocalFileSource.
- avro = <class 'tabsdata._format.AvroFormat'>
- csv = <class 'tabsdata._format.CSVFormat'>
- log = <class 'tabsdata._format.LogFormat'>
- ndjson = <class 'tabsdata._format.NDJSONFormat'>
- parquet = <class 'tabsdata._format.ParquetFormat'>
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
- property format: FileFormat
The format of the file or files. If not provided, it will be inferred from the file extension in the path.
- Type:
FileFormat
- property initial_last_modified: str
The date and time after which files must have been modified to be loaded.
- Type:
str
- property initial_values: dict
Return a dictionary with the initial values to be stored after execution of the plugin. They will be accessible in the next execution of the plugin. The dictionary must have the parameter names as keys and the initial values as values, all of type string.
- Returns:
A dictionary with the initial values of the parameters of the plugin.
- Return type:
dict
- class MSSQLSource(
- **kwargs,
)
Bases: SourcePlugin
Categories: source
Microsoft SQL Server based data inputs.
- chunk(
- working_dir: str,
)
Execute the query and yield chunks of data.
- Parameters:
working_dir (str) – The working directory for temporary files.
- Yields:
pd.DataFrame – A chunk of data from the query result.
- property connection_string: str
Get the connection string for the database.
- Returns:
The connection string.
- Return type:
str
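The exact connection-string scheme the plugin emits is not documented here; as a hypothetical illustration, a SQLAlchemy-style MSSQL URL can be assembled from the same configuration fields:

```python
# Hypothetical helper: builds a SQLAlchemy-style MSSQL URL. The real
# connection_string property assembles its value from the plugin's own
# configuration; the scheme below is an assumption for illustration.
def make_connection_string(user: str, host: str, port: int, database: str) -> str:
    return f"mssql+pyodbc://{user}@{host}:{port}/{database}"
```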
- property credentials: UserPasswordCredentials | None
The credentials required to access Microsoft SQL Server. If no credentials were provided, it will return None.
- Type:
UserPasswordCredentials | None
- class MSSQLSrc(
- conn: MSSQLConn,
- queries: str | list[str],
- initial_values: dict[str, Any] | None = None,
- transactional: bool = True,
- chunk_size: Annotated[int, Gt(gt=0)] = 100000,
- loader: Literal['polars_sqlalchemy', 'pandas'] = 'polars_sqlalchemy',
- schema_overrides: dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64] | list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None = None,
)
Bases: SourcePlugin
Categories: source
Source plugin for reading data from a Microsoft SQL Server (MSSQL) database.
- property chunk_size: int
The chunk size in records for reading large queries in batches.
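Conceptually, chunk_size bounds how many records are materialized at once when a large query is read in batches. A stdlib-only sketch of that batching idea (not the plugin's actual implementation):

```python
from collections.abc import Iterable, Iterator

def batched(rows: Iterable, chunk_size: int = 100_000) -> Iterator[list]:
    """Yield lists of at most chunk_size rows from an iterable."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == chunk_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch
```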
- property conn: MSSQLConn
The MSSQL connection configuration.
- property initial_values: dict
The initial values for the parameters in the SQL queries.
- property loader: Literal['polars_sqlalchemy', 'pandas']
The data processing loader to use for executing SQL imports.
- property schema_overrides: list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None
The schema overrides for the tables being read.
- stream(
- working_dir: str,
)
Execute the query and load the results as a list of TableFrames.
- Parameters:
working_dir – The directory where the Parquet files will be saved.
- Returns:
A list of TableFrames containing the query results.
- property transactional: bool
Whether to use transactions for reading to ensure consistency.
- class MariaDBSource(
- **kwargs,
)
Bases: SourcePlugin
Categories: source
MariaDB-based data inputs.
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
- property credentials: UserPasswordCredentials | None
The credentials required to access MariaDB. If no credentials were provided, it will return None.
- Type:
UserPasswordCredentials | None
- class MariaDBSrc(
- conn: MariaDBConn,
- queries: str | list[str],
- initial_values: dict[str, Any] | None = None,
- transactional: bool = True,
- chunk_size: Annotated[int, Gt(gt=0)] = 100000,
- loader: Literal['polars_sqlalchemy', 'pandas'] = 'polars_sqlalchemy',
- schema_overrides: dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64] | list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None = None,
)
Bases: SourcePlugin
Categories: source
Source plugin for reading data from MariaDB.
- property chunk_size: int
The chunk size in records for reading large queries in batches.
- property conn: MariaDBConn
The MariaDB connection configuration.
- property initial_values: dict
The initial values for the parameters in the SQL queries.
- property loader: Literal['polars_sqlalchemy', 'pandas']
The data processing loader to use for executing SQL imports.
- property schema_overrides: list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None
The schema overrides for the tables being read.
- stream(
- working_dir: str,
)
Execute the query and load the results as a list of TableFrames.
- Parameters:
working_dir – The directory where the Parquet files will be saved.
- Returns:
A list of TableFrames containing the query results.
- property transactional: bool
Whether to use transactions for reading to ensure consistency.
- class MySQLSource(
- **kwargs,
)
Bases: SourcePlugin
Categories: source
MySQL-based data inputs.
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
- property credentials: UserPasswordCredentials | None
The credentials required to access the MySQL database. If no credentials were provided, it will return None.
- Type:
UserPasswordCredentials | None
- class MySQLSrc(
- conn: MySQLConn,
- queries: str | list[str],
- initial_values: dict[str, Any] | None = None,
- transactional: bool = True,
- chunk_size: Annotated[int, Gt(gt=0)] = 100000,
- loader: Literal['polars_sqlalchemy', 'pandas'] = 'polars_sqlalchemy',
- schema_overrides: dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64] | list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None = None,
)
Bases: SourcePlugin
Categories: source
Source plugin for reading data from MySQL.
- property chunk_size: int
The chunk size in records for reading large queries in batches.
- property conn: MySQLConn
The MySQL connection configuration.
- property initial_values: dict
The initial values for the parameters in the SQL queries.
- property loader: Literal['polars_sqlalchemy', 'pandas']
The data processing loader to use for executing SQL imports.
- property schema_overrides: list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None
The schema overrides for the tables being read.
- stream(
- working_dir: str,
)
Execute the query and load the results as a list of TableFrames.
- Parameters:
working_dir – The directory where the Parquet files will be saved.
- Returns:
A list of TableFrames containing the query results.
- property transactional: bool
Whether to use transactions for reading to ensure consistency.
- class OracleSource(
- **kwargs,
)
Bases: SourcePlugin
Categories: source
Oracle-based data inputs.
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
- property credentials: UserPasswordCredentials | None
The credentials required to access Oracle. If no credentials were provided, it will return None.
- Type:
UserPasswordCredentials | None
- class OracleSrc(
- conn: OracleConn,
- queries: str | list[str],
- initial_values: dict[str, Any] | None = None,
- transactional: bool = True,
- chunk_size: Annotated[int, Gt(gt=0)] = 100000,
- loader: Literal['polars_sqlalchemy', 'pandas'] = 'polars_sqlalchemy',
- schema_overrides: dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64] | list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None = None,
)
Bases: SourcePlugin
Categories: source
Source plugin for reading data from Oracle.
- property chunk_size: int
The chunk size in records for reading large queries in batches.
- property conn: OracleConn
The Oracle connection configuration.
- property initial_values: dict
The initial values for the parameters in the SQL queries.
- property loader: Literal['polars_sqlalchemy', 'pandas']
The data processing loader to use for executing SQL imports.
- property schema_overrides: list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None
The schema overrides for the tables being read.
- stream(
- working_dir: str,
)
Execute the query and load the results as a list of TableFrames.
- Parameters:
working_dir – The directory where the Parquet files will be saved.
- Returns:
A list of TableFrames containing the query results.
- property transactional: bool
Whether to use transactions for reading to ensure consistency.
- class PostgresSource(
- **kwargs,
)
Bases: SourcePlugin
Categories: source
Postgres-based data inputs.
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
- property credentials: UserPasswordCredentials | None
The credentials required to access the Postgres database. If no credentials were provided, it will return None.
- Type:
UserPasswordCredentials | None
- class PostgresSrc(
- conn: PostgresConn,
- queries: str | list[str],
- initial_values: dict[str, Any] | None = None,
- transactional: bool = True,
- chunk_size: Annotated[int, Gt(gt=0)] = 100000,
- loader: Literal['polars_sqlalchemy', 'pandas'] = 'polars_sqlalchemy',
- schema_overrides: dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64] | list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None = None,
)
Bases: SourcePlugin
Categories: source
Source plugin for reading data from Postgres.
- property chunk_size: int
The chunk size in records for reading large queries in batches.
- property conn: PostgresConn
The Postgres connection configuration.
- property initial_values: dict
The initial values for the parameters in the SQL queries.
- property loader: Literal['polars_sqlalchemy', 'pandas']
The data processing loader to use for executing SQL imports.
- property schema_overrides: list[dict[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | None
The schema overrides for the tables being read.
- stream(
- working_dir: str,
)
Execute the query and load the results as a list of TableFrames.
- Parameters:
working_dir – The directory where the Parquet files will be saved.
- Returns:
A list of TableFrames containing the query results.
- property transactional: bool
Whether to use transactions for reading to ensure consistency.
- class S3Source(
- uri: str | list[str],
- credentials: S3Credentials,
- format: str | FileFormat = None,
- initial_last_modified: str | datetime = None,
- region: str = None,
)
Bases: SourcePlugin
Categories: source
S3-file-based data inputs.
- class SupportedFormats(
- *values,
)
Bases: Enum
Enum for the supported formats for the S3Source.
- avro = <class 'tabsdata._format.AvroFormat'>
- csv = <class 'tabsdata._format.CSVFormat'>
- log = <class 'tabsdata._format.LogFormat'>
- ndjson = <class 'tabsdata._format.NDJSONFormat'>
- parquet = <class 'tabsdata._format.ParquetFormat'>
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
- property credentials: S3Credentials
The credentials required to access the S3 bucket.
- Type:
S3Credentials
- property format: FileFormat
The format of the file. If not provided, it will be inferred from the file extension of the data.
- Type:
FileFormat
- property initial_last_modified: str
The date and time after which files must have been modified to be loaded.
- Type:
str
- property initial_values: dict
Return a dictionary with the initial values to be stored after execution of the plugin. They will be accessible in the next execution of the plugin. The dictionary must have the parameter names as keys and the initial values as values, all of type string.
- Returns:
A dictionary with the initial values of the parameters of the plugin.
- Return type:
dict
- class SalesforceReportSource(
- credentials: SalesforceCredentials,
- report: str | list[str],
- column_name_strategy: Literal['columnName', 'label'],
- find_report_by: Literal['id', 'name'] = None,
- filter: tuple[str, str, str] | list[tuple[str, str, str]] = None,
- filter_logic: str = None,
- instance_url: str = None,
- last_modified_column: str = None,
- initial_last_modified: str = None,
- **kwargs,
)
Bases: SourcePlugin
Categories: source
Salesforce Reports based data inputs.
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
- property column_name_strategy: Literal['columnName', 'label']
- property credentials: SalesforceCredentials | SalesforceTokenCredentials
- property find_report_by: Literal['id', 'name']
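The filter and filter_logic parameters suggest Salesforce report filters of the form (column, operator, value) combined by a logic string that refers to filters by 1-based position. The operator names below are illustrative assumptions, not confirmed by this reference:

```python
# Hypothetical report filters: (column, operator, value) triples, with
# filter_logic referring to them by their 1-based position in the list.
filters = [
    ("StageName", "equals", "Closed Won"),
    ("Amount", "greaterThan", "10000"),
]
filter_logic = "1 AND 2"
```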
- class SalesforceSource(
- credentials: SalesforceCredentials,
- query: str | list[str],
- instance_url: str = None,
- include_deleted: bool = False,
- initial_last_modified: str = None,
- **kwargs,
)
Bases: SourcePlugin
Categories: source
Salesforce (SOQL query) based data inputs (not Salesforce Reports).
- chunk(
- working_dir: str,
)
Trigger the import of the data. Any class that inherits from this class must implement this method unless it implements streaming directly. The method receives a folder where it must store the data as Parquet files and returns a list of the paths of the files created. These files are then loaded and mapped to the dataset function in positional order: if you want file.parquet to be the first argument of the dataset function, return it first. If a parameter should receive multiple files, return a list of their paths. For example, the following return value provides a first argument with a single file and a second argument with two files: return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
- property credentials: SalesforceCredentials | SalesforceTokenCredentials
- class Stage(
- use_existing_data: bool = True,
)
Bases: SourcePlugin
Categories: source
Inputs that have been placed in a stage, usually from a StageTrigger.
- stream(
- working_dir: str,
)
- property use_existing_data: bool