Plugins

- class DestinationPlugin
  - Bases: object
  - Categories: plugin

  Parent class for subscriber connectors.
- trigger_output(working_dir, *args, **kwargs)
  Trigger the exporting of the data. This method will receive the resulting data from the dataset function and must store it in the desired location.
- chunk(working_dir: str, *results: VALID_PLUGIN_RESULT)
  Trigger the exporting of the data to local parquet chunks. This method will receive the resulting data from the user function and must store it on the local system as parquet files, using working_dir. Note: this method should not materialize the data; it should only store it on the local system.
  - Parameters:
    - working_dir (str) – The folder where any files generated must be stored (this refers to temporary files that will be deleted after the execution of the plugin, not the final destination of the data).
    - results – The data to be exported: a list of polars LazyFrames or None.
  - Returns:
    A list of the intermediate files created.
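A minimal sketch of a chunk implementation, assuming each result is a polars LazyFrame (the class name is illustrative; a real connector would subclass DestinationPlugin):

```python
import os

# Hypothetical subclass; in a real connector this would inherit
# from DestinationPlugin.
class ParquetChunkDestination:
    def chunk(self, working_dir, *results):
        """Sink each LazyFrame to a parquet file in working_dir,
        without materializing it, and return the paths created."""
        files = []
        for i, lazy_frame in enumerate(results):
            if lazy_frame is None:
                continue  # results may contain None entries
            path = os.path.join(working_dir, f"chunk_{i}.parquet")
            # polars' LazyFrame.sink_parquet streams the frame to disk
            # without collecting it into memory first.
            lazy_frame.sink_parquet(path)
            files.append(path)
        return files
```

Because only `sink_parquet` is called on each result, the data is never collected in memory inside this method, which is what the "should not materialize" note asks for.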
- stream(working_dir: str, *results: VALID_PLUGIN_RESULT)
  Trigger the exporting of the data. This method will receive the resulting data from the user function and must store it in the desired location. Note: this method might materialize the data in a single chunk generated by the chunk function, if invoked, so chunks should be of an appropriate size.
  - Parameters:
    - working_dir (str) – The folder where any intermediate files generated must be stored (this refers to temporary files that will be deleted after the execution of the plugin, not the final destination of the data).
    - results – The data to be exported: a list of polars LazyFrames or None.
  - Returns:
    None
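One plausible way to implement stream in terms of the other two methods, given the note above that stream may consume chunks produced by chunk (this composition is an assumption for illustration, not the library's documented default):

```python
# Hypothetical illustration of composing stream from chunk and write.
class ComposedDestination:
    def chunk(self, working_dir, *results):
        # Write results to parquet chunks in working_dir (elided here);
        # returns the list of chunk file paths.
        raise NotImplementedError

    def write(self, files):
        # Move the chunk files to the final destination (elided here).
        raise NotImplementedError

    def stream(self, working_dir, *results):
        # Chunk locally first, then write. Since write may materialize
        # each chunk it receives, chunk sizes should stay modest.
        files = self.chunk(working_dir, *results)
        self.write(files)
```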
- write(files)
  Given a file or a list of files, write them to the desired destination. Note: this method might materialize the data in the files it receives, so chunks should be of an appropriate size.
  - Parameters:
    - files (str) – The file or files to be stored in the final destination.
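A sketch of a write implementation that copies chunk files into a local destination folder (the class name and constructor are illustrative assumptions):

```python
import os
import shutil

# Hypothetical destination that writes to a local folder.
class LocalFolderDestination:
    def __init__(self, destination_folder):
        self.destination_folder = destination_folder

    def write(self, files):
        # Accept either a single path or a list of paths.
        if isinstance(files, str):
            files = [files]
        os.makedirs(self.destination_folder, exist_ok=True)
        for path in files:
            # shutil.copy streams file bytes to disk; each file is
            # handled whole, so keep chunks reasonably sized.
            shutil.copy(path, self.destination_folder)
```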
- class SourcePlugin
  - Bases: object
  - Categories: plugin

  Parent class for publisher connectors.
- chunk(working_dir: str) -> Union[str, Tuple[str, ...], List[str]]
  Trigger the import of the data. This method must be implemented by any class that inherits from this class, unless streaming is implemented directly. The method will receive a folder where it must store the data as parquet files, and return a list of the paths of the files created. These files will then be loaded and mapped to the dataset function in positional order, so if you want file.parquet to be the first argument of the dataset function, you must return it first. If you want a parameter to receive multiple files, return a list of the paths. For example, you would use the following return value to provide a first argument with a single file and a second argument with two files:
  return ["file1.parquet", ["file2.parquet", "file3.parquet"]]
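A sketch illustrating the return-value mapping (the class and its constructor are assumptions; a real connector would subclass SourcePlugin and typically fetch the data from an external system rather than from local files):

```python
import os
import shutil

# Hypothetical source that stages pre-existing local parquet files.
class LocalFileSource:
    def __init__(self, single_file, pair_of_files):
        self.single_file = single_file
        self.pair_of_files = pair_of_files

    def chunk(self, working_dir):
        def stage(src):
            # Copy a source file into working_dir, return its new path.
            dst = os.path.join(working_dir, os.path.basename(src))
            shutil.copy(src, dst)
            return dst

        # First element -> first dataset argument (one file);
        # nested list -> second dataset argument (two files).
        return [stage(self.single_file),
                [stage(f) for f in self.pair_of_files]]
```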
- property initial_values: dict
  Return a dictionary with the initial values to be stored after execution of the plugin. They will be accessible in the next execution of the plugin. The dictionary must have the parameter names as keys and the initial values as values, all of type string.
  - Returns:
    A dictionary with the initial values of the parameters of the plugin.
  - Return type:
    dict
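A sketch of initial_values for a hypothetical incremental source that remembers the last record it imported (all names here are illustrative):

```python
# Hypothetical source that tracks import progress across executions.
class IncrementalSource:
    def __init__(self, last_seen_id="0"):
        self.last_seen_id = last_seen_id

    @property
    def initial_values(self):
        # Keys and values must both be strings; they are stored after
        # this execution and handed back on the next one.
        return {"last_seen_id": str(self.last_seen_id)}
```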
- stream(working_dir: str,