Getting Started#

This chapter provides a tutorial to walk you through essential Tabsdata operations.

In this tutorial, you create a collection to store the Tabsdata functions and tables. Then, you implement a publisher to read the data from an external system and write to Tabsdata, a transformer to modify the data, and a subscriber to read the tables from Tabsdata and write the modified data to an external system.

The tutorial shows you how to define, register, and trigger the functions. It also demonstrates how the downstream dependencies are automatically managed by Tabsdata.

If you run into trouble with the tutorial, you might find help in Troubleshooting, or you can reach out to us on Slack.

Step 1. Install Tabsdata#

Note: The following tutorial is written for the CMD shell. You might need to adapt some commands if you are using PowerShell. Alternatively, you can use a bash shell such as MinGW, Cygwin, or Git Bash and follow the steps for Linux/macOS.

While not required, we recommend installing the Tabsdata Python package in a clean virtual environment. You can use python venv, conda, or a similar tool to create and manage your virtual environment.
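
For example, here is one possible way to create and activate a virtual environment with the built-in venv module on Linux/macOS (see the note above for Windows shells; the directory name .venv is just a convention):

```shell
# Create a virtual environment in the .venv directory (requires Python 3.12+)
python3 -m venv .venv

# Activate it (Linux/macOS); in Windows CMD use: .venv\Scripts\activate.bat
. .venv/bin/activate

# Confirm the interpreter meets the version requirement
python --version
```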

The commands in this tutorial assume that you are running the Tabsdata server on the same machine where you are following the tutorial.

Important: The virtual environment or alternative installation location must have Python 3.12 or later.

To install Tabsdata, run the following command in your command line interface (CLI):

$ pip install tabsdata

You can install Tabsdata as a 'full' installation or an 'only what you need' installation.

The 'full' installation installs Tabsdata along with the third-party libraries for all Tabsdata connectors. The 'only what you need' installation installs all of Tabsdata but none of the third-party connector libraries. This split exists for two reasons: it keeps the packages small, and some third-party libraries require users to accept their terms.

The 'full' installation is done with the command pip install tabsdata["all"]. The 'only what you need' installation is done with the command pip install tabsdata, followed by installations of specific components as needed, for example pip install tabsdata["salesforce"].

Step 2. Start the Server#

If you have started the Tabsdata server before, we suggest removing the older Tabsdata instance. This lets you start from scratch and reduces the possibility of errors or conflicts.

Clearing the old Tabsdata instance [Only if you have started a Tabsdata server before]

Run the following commands in your CLI, to stop the Tabsdata server and clear the instance:

$ tdserver stop
$ tdserver clean
$ tdserver delete

Starting the server

To start the Tabsdata server, use the following command:

$ tdserver start

The command runs until all the required components are up. You can then stop the monitoring output and check the server status using the command below.

To verify that the Tabsdata server instance is running:

$ tdserver status

Output:

../../_images/tdserver_status.png

Step 3. Set Up the Tutorial#

To set up the tutorial, run the following commands in your CLI from your desired directory.

The td examples --dir examples command creates the examples directory with input and output subdirectories, and creates the function configuration files called publisher.py, transformer.py, and subscriber.py for the tutorial. This step is needed only in the context of this tutorial; you can choose a different setup for your own projects. The --dir flag specifies the directory where the examples are created. You can change the directory name to your preference.

$ td examples --dir examples
$ cd examples

Step 4. Log in and Create a Collection#

Collections are logical containers for Tabsdata tables and functions. You use collections to enable different business domains to have their own organizational space.

Use the following steps to create a collection for the tutorial.

  1. If you are not logged in, use the following command to log in to Tabsdata:

$ td login --server localhost --user admin --role sys_admin --password tabsdata

  2. To create a collection called tutorial, run the following command:

$ td collection create --name tutorial

Now that you have created a collection, you’re ready to implement a publisher.

Step 5. Implement a Publisher#

Publishers import data from external systems such as local file systems, databases, and cloud storage, and publish the data as tables in the Tabsdata server.

Use the following steps to create a publisher that reads the sample data from local system, register the publisher in the tutorial collection, and manually trigger the publisher to execute.

1. Define the publisher.

In the publisher.py file in your examples directory, the following code is used to define a publisher.

import os
import tabsdata as td

@td.publisher(
    source = td.LocalFileSource(os.path.join(os.getcwd(), "input", "persons.csv")),
    tables = ["persons"]
)
def pub(persons: td.TableFrame):
    return persons

This publisher, named pub, reads the “persons.csv” file in the input folder and writes it to the Tabsdata server as a table called persons.

You can configure publishers to read data from many external systems. For details, see Publishers.
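
Conceptually, the read step of a file publisher is similar to loading a CSV into rows. The sketch below uses only the Python standard library and a made-up two-row sample (not the tutorial's actual persons.csv data) to illustrate the shape of the input:

```python
import csv
import io

# A small stand-in for input/persons.csv (hypothetical sample rows)
raw = """identifier,name,surname,nationality,language
1,Ana,Garcia,Spanish,es
2,Luc,Martin,French,fr
"""

# Read the CSV the way a publisher conceptually ingests a file
rows = list(csv.DictReader(io.StringIO(raw)))
print(len(rows))        # 2
print(rows[0]["name"])  # Ana
```

In the real publisher, Tabsdata performs this read for you and hands the result to your function as a td.TableFrame.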

2. Register the publisher.

Before you execute a publisher, you need to register it with a collection. Publishers write all of their output tables to the collection that they are registered with.

Use the following command to register your pub publisher with the tutorial collection:

$ td fn register --coll tutorial --path publisher.py::pub

3. Execute the publisher.

Use the following command to manually trigger the publisher to execute:

$ td fn trigger --coll tutorial --name pub

The command triggers the function and polls the transaction status. Once successful, the polling returns the following transaction data:

../../_images/pub_trx.png

4. View the new persons Tabsdata table.

To view the schema of the new persons table, run the following command:

$ td table schema --coll tutorial --name persons

The results should look like this:

../../_images/pub_schema.png

To view the data in the table, use the following command:

$ td table sample --coll tutorial --name persons

The results should look like this:

../../_images/sample_persons.png

Now that you have successfully defined, registered, and executed a publisher, you’re ready to do the same for a transformer.

Step 6. Implement a Transformer#

Transformers modify tables inside the Tabsdata server. They can read from one or more Tabsdata tables, transform the data, and write to new Tabsdata tables.

Use the following steps to create a transformer that modifies the tutorial data and writes the results to new tables, register the transformer with the tutorial collection, and manually trigger the transformer to execute.

1. Define a transformer.

In the transformer.py file in your examples directory, the following code defines a transformer.

import tabsdata as td

@td.transformer(
    input_tables=["persons"],
    output_tables=["spanish", "french", "german"]
)
def tfr(persons: td.TableFrame):
    persons = persons.select(["identifier", "name", "surname", "nationality", "language"])
    res = {}
    for nationality in ["Spanish", "French", "German"]:
        res[nationality] = persons.filter(td.col("nationality").eq(nationality)).drop(["nationality"])
    return res["Spanish"], res["French"], res["German"]

This transformer, named tfr, reads data from the persons Tabsdata table, transforms it, and writes the results to three output tables. The transformer performs the following processing:

  • Selects specific columns, omitting the other columns from the data

  • Filters data by nationality

  • Writes country-specific data to the appropriate table
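
The select/filter/split logic above can be sketched in plain Python (no Tabsdata required), with a list of dicts standing in for the persons TableFrame and hypothetical sample rows:

```python
# Stand-in rows for the persons table (made-up sample data)
persons = [
    {"identifier": 1, "name": "Ana", "surname": "Garcia", "nationality": "Spanish", "language": "es", "age": 30},
    {"identifier": 2, "name": "Luc", "surname": "Martin", "nationality": "French", "language": "fr", "age": 41},
    {"identifier": 3, "name": "Max", "surname": "Muller", "nationality": "German", "language": "de", "age": 25},
]

keep = ["identifier", "name", "surname", "nationality", "language"]

# Select only the columns of interest, split by nationality, and drop the split key
res = {}
for nat in ["Spanish", "French", "German"]:
    res[nat] = [
        {k: row[k] for k in keep if k != "nationality"}
        for row in persons
        if row["nationality"] == nat
    ]

print(res["French"])  # [{'identifier': 2, 'name': 'Luc', 'surname': 'Martin', 'language': 'fr'}]
```

Note that the extra age column is dropped by the select step, and the nationality column is dropped after it has been used to split the rows, mirroring what tfr does with TableFrame operations.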

Since no trigger is explicitly defined, the transformer is triggered by a commit to its specified input table: persons. You can use trigger_by to define different trigger tables or to prevent automated triggering. For more information, see Working with Triggers.

You can configure transformers to perform a range of operations. For details, see Transformers.

2. Register the transformer with a Tabsdata collection.

Before you execute a transformer, you need to register it with a collection. Transformers can read data from any Tabsdata collection. And like publishers, they write all of their output tables to the collection that they are registered with.

Use the following command to register the tfr transformer with the tutorial collection:

$ td fn register --coll tutorial --path transformer.py::tfr

3. Execute the transformer.

Use the trigger command to manually trigger the transformer to execute. The command syntax is the same for all Tabsdata functions:

$ td fn trigger --coll tutorial --name tfr

The command triggers the function and polls the transaction status. Once successful, the polling returns the transaction data such as the following:

../../_images/tfr_new.png

If you face any issues, make sure to check out the Troubleshooting guide.

4. View the output tables.

To view the schema of the spanish output table, run the following command:

$ td table schema --coll tutorial --name spanish

The results should look like this:

../../_images/transf_schema.png

If you like, you can use the same command to check the schemas of the other new tables.

To view the data in the table, use the following command:

$ td table sample --coll tutorial --name spanish

The results should look like this:

../../_images/sample_spanish.png

Now that you have implemented a transformer, it's time to work with a subscriber.

Step 7. Implement a Subscriber#

Subscribers export tables from the Tabsdata server to external systems such as local systems, databases, and cloud storage. Use subscribers to provide prepared data to data consumer teams.

Use the following steps to create a subscriber that exports the tutorial tables, register the subscriber with the tutorial collection, and manually trigger the subscriber to execute.

1. Define a subscriber.

In the subscriber.py file in your examples directory, the following code defines a subscriber.

import os
import tabsdata as td

@td.subscriber(
    ["spanish", "french"],
    td.LocalFileDestination(
        [
            os.path.join(os.getcwd(), "output", "spanish.jsonl"),
            os.path.join(os.getcwd(), "output", "french.jsonl")
        ]
    )
)
def sub(spanish: td.TableFrame, french: td.TableFrame) -> (td.TableFrame, td.TableFrame):
    return spanish, french

This subscriber, named sub, reads data from the spanish and french tables in the Tabsdata server and writes the output files spanish.jsonl and french.jsonl to the output folder in the examples directory.
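
JSON Lines (.jsonl) files store one JSON object per line. As a rough sketch of the format the subscriber produces (using made-up rows, not the tutorial's actual output), in plain Python:

```python
import json

# Hypothetical rows like those in the spanish table
rows = [
    {"identifier": 1, "name": "Ana", "surname": "Garcia", "language": "es"},
    {"identifier": 4, "name": "Eva", "surname": "Lopez", "language": "es"},
]

# One JSON object per line, as in spanish.jsonl
lines = "\n".join(json.dumps(r) for r in rows)
print(lines)
```

Each line can be parsed independently with json.loads, which is what makes the format convenient for streaming and line-oriented tools.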

Since no trigger is explicitly defined, the subscriber is triggered by a commit to either of its specified input tables: spanish or french. You can use trigger_by to define different trigger tables or to prevent automated triggering.

You can configure subscribers to export data to many external systems. For details, see Subscribers.

2. Register the subscriber.

Before you execute a subscriber, you need to register it with a collection. Subscribers can read data from any Tabsdata collection.

Use the following command to register the sub subscriber with the tutorial collection:

$ td fn register --coll tutorial --path subscriber.py::sub

3. Execute the subscriber.

Use the trigger command to manually trigger the subscriber:

$ td fn trigger --coll tutorial --name sub

The command triggers the function and polls the transaction status. Once successful, the polling returns the transaction data such as the following:

../../_images/sub_trx.png

If you face any issues, make sure to check out the Troubleshooting guide.

4. Verify subscriber execution.

You can verify that the spanish.jsonl and french.jsonl output files have been created by listing the files in the output directory:

$ ls output

You can use your favorite editor to view the contents of the files.

Step 8. Initiate Automated Triggers#

So far, you have used manual triggers to execute the functions in this tutorial.

As mentioned earlier, transformer and subscriber functions have default automated triggers: when a trigger is not explicitly defined, the function's input tables act as trigger tables. Publishers do not have a default trigger like transformers and subscribers do, but you can define one when needed. For more information, see Working with Triggers.

Since the tutorial transformer does not have a specified trigger, it is triggered by a commit to its input table, persons. Similarly, since the tutorial subscriber does not have a specified trigger, it is triggered by a commit to either of its input tables: french or spanish.

When the tutorial publisher executes, it writes to the persons table and creates a commit to that table. The commit to the persons table automatically triggers the transformer. The transformer processes the data and generates commits to the french and spanish tables. These commits automatically trigger the subscriber, which exports the data to files in the output directory. As a result, a single manual execution of the tutorial publisher results in the automatic processing of the data and the writing of the desired output files.
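
The propagation described above can be modeled as a tiny dependency graph. This is a simplified sketch, not Tabsdata's actual engine: committing to a table triggers every function that reads it, which commits its own output tables, and so on down the chain:

```python
# Map each table to the functions that read it, and each function to the tables it writes
# (mirrors the tutorial's pub -> persons -> tfr -> spanish/french/german -> sub chain)
readers = {"persons": ["tfr"], "spanish": ["sub"], "french": ["sub"]}
writes = {"pub": ["persons"], "tfr": ["spanish", "french", "german"], "sub": []}

def run(fn, executed):
    """Execute fn, then commit its outputs and trigger downstream readers."""
    executed.append(fn)
    for table in writes[fn]:
        for downstream in readers.get(table, []):
            if downstream not in executed:  # trigger each function once per run
                run(downstream, executed)

executed = []
run("pub", executed)
print(executed)  # ['pub', 'tfr', 'sub']
```

Triggering pub alone runs the whole workflow: the transformer fires on the persons commit, and the subscriber fires on the spanish/french commits, just as in the tutorial.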

To see this in action, clear the output folder and test the automated triggers for this tutorial:

1. Remove the existing files from the output directory.

You can do this manually or by running the following command:

$ rm output/*

2. Execute the publisher.

To run the entire tutorial workflow, use the following command to manually trigger the publisher:

$ td fn trigger --coll tutorial --name pub

3. View the list of functions related to the manual trigger of the publisher.

The command triggers the function and polls the transaction status. Once successful, the polling returns the transaction data for your publisher, transformer, and subscriber functions. The transaction data looks like the following:

../../_images/pub_trx_full.png

4. Verify the output.

You can verify that the spanish.jsonl and french.jsonl files have been written to the output folder with the following command:

$ ls output

And you can once again check the contents of those files with your favorite editor.

Next Steps#

Congratulations on working with Tabsdata functions and creating an automated workflow!

Here are some suggestions for your next steps: