Local File System#
You can use this built-in Tabsdata subscriber to write files to a file system that is accessible to the Tabsdata server, such as a local file system or a NAS. Subscribers can write in the following file formats: CSV, jsonl, ndjson, and parquet.
Example (Subscriber - Local File System)#
Here is an example subscriber named write_employees. It reads the departments table and multiple employees tables from Tabsdata, and writes them to the output HR folder. This subscriber executes automatically as soon as a new commit occurs on any of its input tables.
import tabsdata as td

@td.subscriber(
    tables=["departments", "employees_1", "employees_2"],
    destination=td.LocalFileDestination(
        [
            "/users/username/opt/hr/output/departments.csv",
            "/users/username/opt/hr/output/employees_1.csv",
            "/users/username/opt/hr/output/employees_2.csv",
        ]
    ),
)
def write_employees(tf1: td.TableFrame, tf2: td.TableFrame, tf3: td.TableFrame):
    return tf1, tf2, tf3
Note: After defining the subscriber, you need to register it with a Tabsdata collection. For more information, see Register a Function.
Setup (Subscriber - Local File System)#
The following code uses placeholder values for defining a subscriber that reads data from the Tabsdata server and writes it to the Local File System:
import tabsdata as td

@td.subscriber(
    tables=["<output_table1>", "<output_table2>"],
    destination=td.LocalFileDestination(["<path_to_file1>", "<path_to_file2>"]),
    trigger_by=["<trigger_table1>", "<trigger_table2>"],
)
def <subscriber_name>(<table_frame1>: td.TableFrame, <table_frame2>: td.TableFrame):
    <function_logic>
    return <table_frame_output1>, <table_frame_output2>
Note: After defining the subscriber, you need to register it with a Tabsdata collection. For more information, see Register a Function.
The following properties are defined in the setup code above:
destination#
<path_to_files>#
<path_to_file1>, <path_to_file2>… are the full system paths of the files to write to. They are usually of the format /users/username/....
All the destination files in a subscriber need to have the same extension. The following file formats are currently supported: CSV, jsonl, ndjson, and parquet.
You must use the absolute system path (e.g., /user/username/project/employees.csv) in the code instead of a relative one (e.g., ./employees.csv). Since these functions are registered in the Tabsdata server, an absolute path is necessary to ensure proper access to the destination.
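The absolute-path requirement can be checked up front in plain Python before registering the function. This is a minimal sketch; the `require_absolute` helper is illustrative and not part of the Tabsdata API:

```python
from pathlib import PurePosixPath

def require_absolute(path: str) -> str:
    # Reject relative paths such as "./employees.csv" early,
    # before the subscriber is registered with the server.
    if not PurePosixPath(path).is_absolute():
        raise ValueError(f"destination must be an absolute path, got: {path}")
    return path

require_absolute("/users/username/project/employees.csv")  # passes
```

Catching a relative path at definition time gives a clearer error than a failed write inside the server.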
You can specify as many file paths as needed. Additionally, you can use the following variables within the file paths to make the files unique or identifiable by a Tabsdata ID, e.g. td-iceberg/customers-$EXPORT_TIMESTAMP.parquet as used in this Tabsdata tutorial.
Field | Description
---|---
 | The execution plan ID.
 | Datetime the function started executing, in epoch millis.
 | The function run ID.
 | Datetime when the scheduler set the function run to running, in epoch millis.
 | Datetime when the execution plan was triggered, in epoch millis.
 | The transaction ID.
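As a sketch of how such a variable renders in a destination path, the substitution below is an illustration only; Tabsdata performs the actual replacement at export time, and the epoch-millis rendering here is an assumption for demonstration:

```python
import time

# Hypothetical path template containing a Tabsdata variable.
template = "/users/username/opt/hr/output/departments-$EXPORT_TIMESTAMP.csv"

# A rendered path carries a concrete epoch-millis value
# in place of the placeholder.
rendered = template.replace("$EXPORT_TIMESTAMP", str(int(time.time() * 1000)))
print(rendered)
```

Each run therefore produces a distinct file name, so successive exports do not overwrite one another.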
You can define the destination files in the following ways:
File Path
To write by file path where the file extension is included as part of the file path, define the destination as follows:
destination=td.LocalFileDestination([
    "<path_to_file1.ext>", "<path_to_file2.ext>"
]),
<path_to_file1.ext>, <path_to_file2.ext>… are paths to files with the file extension included in the file name.
File Format
To write files by file format where the format is declared separately and not in the file name, define the destination as follows:
destination=td.LocalFileDestination([
    "<path_to_file1_no_extension>",
    "<path_to_file2_no_extension>",
], format="<format_name>"),
<path_to_file1_no_extension>, <path_to_file2_no_extension>… are paths to files without the file extension in the file name. The extension for all files is specified separately in format.
Custom delimiter for CSV
To define a custom delimiter for writing a CSV file, define the destination as follows:
destination=td.LocalFileDestination([
    "<path_to_file1_no_extension>",
    "<path_to_file2_no_extension>",
], format=td.CSVFormat(separator="<separator_character>")),
<path_to_file1_no_extension>, <path_to_file2_no_extension>… are paths to CSV files with a custom delimiter, without the file extension in the file name. The delimiter is a single-byte character, such as a colon (:), semicolon (;), or period (.), that separates the fields in the given file instead of a comma (,). You define the character in separator.
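For illustration, here is what a semicolon-separated file looks like when written with Python's standard csv module. This is a sketch of the output format only; Tabsdata handles the actual writing through the separator setting above:

```python
import csv
import io

rows = [["id", "name"], ["1", "Alice"], ["2", "Bob"]]

# Write the rows with a semicolon as the field separator.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=";", lineterminator="\n")
writer.writerows(rows)

print(buf.getvalue())
# id;name
# 1;Alice
# 2;Bob
```

Downstream consumers of the exported file must be configured with the same separator, or the fields will not split correctly.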
None as an input and output#
A subscriber may receive and return a None value instead of TableFrames.
When a subscriber receives a None value instead of a TableFrame, it means that the requested table dependency version does not exist.
When a subscriber returns a None value instead of a TableFrame, it means there is no new data to write to the external system. This helps avoid creating multiple copies of the same data.
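The pattern inside the function body can be sketched as follows. This is a minimal illustration; the has_changed deduplication check is hypothetical and not a Tabsdata API:

```python
def has_changed(tf, previous):
    # Hypothetical deduplication check: only write when the data
    # differs from what was last exported.
    return tf != previous

def write_employees(tf, previous=None):
    if tf is None:
        # Input is None when the requested table dependency
        # version does not exist: nothing to process.
        return None
    if not has_changed(tf, previous):
        # Returning None tells Tabsdata there is nothing new to
        # write, avoiding duplicate copies of the same data.
        return None
    return tf
```

The key point is that returning None short-circuits the export, so unchanged data is never rewritten to the destination files.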
[Optional] trigger_by#
<trigger_table1>, <trigger_table2>… are the names of the tables in the Tabsdata server. A new commit to any of these tables triggers the subscriber. All listed trigger tables must exist in the server before registering the subscriber.
Defining trigger tables is optional. If you don't define the trigger_by property, the subscriber is triggered by a new commit to any of its input tables. If you define the trigger_by property, then only the tables listed in the property can automatically trigger the subscriber.
For more information, see Working with Triggers.
<subscriber_name>#
<subscriber_name> is the name for the subscriber that you are configuring.
<function_logic>#
<function_logic> governs the processing performed by the subscriber. You can specify the function logic to be a simple write, or to perform additional processing as needed. For more information about the function logic that you can include, see Working with Tables.
<table_frame1>, <table_frame2>… are the names of the variables that temporarily store source data for processing.
<table_frame_output1>, <table_frame_output2>… are the outputs from the function that are written to the external system.
Data Drift Support#
This section describes how Tabsdata handles data drift in the input data for this subscriber connector.
As this is a file-based subscriber, the exported files will reflect the updated schema. No special handling is required in this case, as the target external system serves purely as a storage layer without schema-dependent logic.
However, this applies only if the subscriber function does not contain schema-dependent logic. If such logic exists, changes in the schema may conflict with the function’s expectations, leading to execution errors.