Azure Blob Storage#
You can use a subscriber to write files from the Tabsdata server to Azure Blob Storage. Subscribers can write the following file formats: CSV, jsonl, ndjson, and parquet.
Example (Subscriber - Azure Blob Storage)#
Here is an example subscriber named write_employees. It reads the departments table and multiple employees tables from Tabsdata, and then writes the tables to the output HR folder. This subscriber executes automatically as soon as a new commit occurs on any of its input tables.
import tabsdata as td

azure_credentials = td.AzureAccountKeyCredentials(
    account_name=td.HashiCorpSecret("path-to-secret", "AZURE_ACCOUNT_NAME"),
    account_key=td.HashiCorpSecret("path-to-secret", "AZURE_ACCOUNT_KEY"),
)

@td.subscriber(
    tables=["departments", "employees_1", "employees_2"],
    destination=td.AzureDestination(
        [
            "az://opt/hr/departments.csv",
            "az://opt/hr/employees_1.csv",
            "az://opt/hr/employees_2.csv",
        ],
        credentials=azure_credentials,
    ),
)
def write_employees(tf1: td.TableFrame, tf2: td.TableFrame, tf3: td.TableFrame):
    return tf1, tf2, tf3
Where:
AZURE_ACCOUNT_NAME is your Azure account name.
AZURE_ACCOUNT_KEY is the value of your Azure account key.
Note: After defining the function, you need to register it with a Tabsdata collection. For more information, see here.
Setup (Subscriber - Azure Blob Storage)#
The following code uses placeholder values for defining a subscriber that reads Tabsdata tables and writes them to Azure Blob Storage:
import tabsdata as td

azure_credentials = td.AzureAccountKeyCredentials(
    account_name=td.HashiCorpSecret("path-to-secret", "AZURE_ACCOUNT_NAME"),
    account_key=td.HashiCorpSecret("path-to-secret", "AZURE_ACCOUNT_KEY"),
)

@td.subscriber(
    tables=["<input_table1>", "<input_table2>"],
    destination=td.AzureDestination(
        ["az://<path_to_file1>", "az://<path_to_file2>"], azure_credentials
    ),
    trigger_by=["<trigger_table1>", "<trigger_table2>"],
)
def <subscriber_name>(<table_frame1>: td.TableFrame, <table_frame2>: td.TableFrame):
    <function_logic>
    return <table_frame_output1>, <table_frame_output2>
Note: After defining the function, you need to register it with a Tabsdata collection. For more information, see here.
The following properties are defined in the setup code above:
tables#
<input_table1>, <input_table2>… are the names of the Tabsdata tables to be written to the external system.
destination#
<path_to_file1>, <path_to_file2>… are the full paths of the files to write to.
All the destination files in a subscriber need to have the same extension. The following file formats are currently supported: CSV, jsonl, ndjson, and parquet.
You can specify as many file paths as needed.
You can define the destination files in the following ways:
File Path
To write by file path where the file extension is included as part of the file path, define the destination as follows:
destination=td.AzureDestination(["az://<path_to_file1.ext>","az://<path_to_file2.ext>"], azure_credentials),
<path_to_file1.ext>, <path_to_file2.ext>… have the extensions of the files included in the file names as part of the paths.
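For instance, a file-path destination for two parquet files might look like this (the opt/hr paths are hypothetical, mirroring the example above):
destination=td.AzureDestination(
    ["az://opt/hr/departments.parquet", "az://opt/hr/employees.parquet"],
    azure_credentials,
),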
File Format
To write files by file format where format is declared separately and not in the file name, define the destination as follows:
destination=td.AzureDestination([
    "az://<path_to_file1_no_extension>",
    "az://<path_to_file2_no_extension>",
], azure_credentials, format="<format_name>"),
"<path_to_file1_no_extension>"
, "<path_to_file2_no_extension>"
… are paths to files with extensions of the file not included in the file name. The extension to all files is mentioned separately in format
.
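For instance (with hypothetical paths, and assuming "csv" is an accepted format name per the supported-format list above), the format is declared once for both files:
destination=td.AzureDestination([
    "az://opt/hr/departments",
    "az://opt/hr/employees",
], azure_credentials, format="csv"),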
Custom delimiter for CSV
To define a custom delimiter for writing a CSV file, use the format code as follows:
destination=td.AzureDestination([
    "az://<path_to_file1_no_extension>",
    "az://<path_to_file2_no_extension>",
], azure_credentials, format=td.CSVFormat(separator="<separator_character>")),
"<path_to_file1_no_extension>"
, "<path_to_file2_no_extension>"
… are paths to CSV files with a custom delimiter, with extensions of the file not included in the file name. The delimiter is a single byte character such as colon (:), semicolon (;), and period (.) that separate the fields in the given file instead of a comma(,). You define the character in separator
.
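As a concrete sketch with hypothetical paths, the following writes semicolon-delimited CSV files:
destination=td.AzureDestination([
    "az://opt/hr/departments",
    "az://opt/hr/employees",
], azure_credentials, format=td.CSVFormat(separator=";")),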
credentials#
A subscriber needs credentials to write files to Azure Blob Storage. Here the value is defined using the variable azure_credentials, an object of class AzureAccountKeyCredentials with the following values:
AZURE_ACCOUNT_NAME is your Azure account name.
AZURE_ACCOUNT_KEY is the value of your Azure account key.
You can store the credentials in different ways, which are highlighted here in the documentation.
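For example, if your deployment keeps the secrets in environment variables instead of HashiCorp Vault, the credentials object might be built as follows. This is a sketch: td.EnvironmentSecret is an assumption here, so check the secrets documentation linked above for the mechanisms your Tabsdata version supports.
# Assumption: td.EnvironmentSecret reads the named environment variable.
azure_credentials = td.AzureAccountKeyCredentials(
    account_name=td.EnvironmentSecret("AZURE_ACCOUNT_NAME"),
    account_key=td.EnvironmentSecret("AZURE_ACCOUNT_KEY"),
)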
trigger_by#
[optional] <trigger_table1>, <trigger_table2>… are the names of the tables in the Tabsdata server. A new commit to any of these tables triggers the subscriber. All listed trigger tables must exist in the server before registering the subscriber.
Defining trigger tables is optional. If you don't define the trigger_by property, the subscriber is triggered by a new commit to any of its input tables. If you define the trigger_by property, then only the tables listed in the property can automatically trigger the subscriber.
For more information, see Working with Triggers.
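For example, the following hypothetical subscriber reads both tables but is re-run only when departments receives a new commit:
# Hypothetical subscriber: a commit to "employees" alone does not trigger it.
@td.subscriber(
    tables=["departments", "employees"],
    destination=td.AzureDestination(
        ["az://opt/hr/departments.csv", "az://opt/hr/employees.csv"],
        azure_credentials,
    ),
    trigger_by=["departments"],
)
def write_hr(tf1: td.TableFrame, tf2: td.TableFrame):
    return tf1, tf2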
<subscriber_name>#
<subscriber_name> is the name for the subscriber that you are configuring.
<function_logic>#
<function_logic> governs the processing performed by the subscriber. You can specify the function logic to be a simple write or to perform additional processing as needed. For more information about the function logic that you can include, see Working with Tables.
<table_frame1>, <table_frame2>… are the names of the variables that temporarily store source data for processing.
<table_frame_output1>, <table_frame_output2>… are the outputs from the function that are written to the external system.
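As a sketch of function logic beyond a plain pass-through, the body below filters and trims the frames before they are written. The column names are hypothetical, and the polars-style expression API (td.col, filter, select) is assumed to be available as described in Working with Tables; the @td.subscriber decorator is omitted for brevity.
# Hypothetical columns; assumes TableFrame supports polars-style filter/select.
def write_employees(tf1: td.TableFrame, tf2: td.TableFrame):
    active = tf1.filter(td.col("status") == "active")   # keep only active rows
    trimmed = tf2.select(["id", "name", "department"])   # keep a subset of columns
    return active, trimmed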