Key Concepts
Tabsdata is a data integration middleware that uses the publish-subscribe (pub/sub) model to decouple the teams that produce data from the teams that consume it, so that data can be shared and accessed across an organization.
Here’s the basic Tabsdata pub/sub workflow:
Data producers use a publisher function to read data from an external source system and publish it to tables in the Tabsdata server. This gives engineers on the data production team greater control over the data that is disseminated, and an easy way to provide data to multiple data consumers.
Data engineers can then use a transformer function to transform the published data with Python-based APIs, as needed. The results are stored as tables in the Tabsdata server.
Data consumers, such as operational and analytics teams, use the subscriber function to subscribe to the Tabsdata tables that they are interested in. Subscribers export data from one or more tables in the Tabsdata server to an external destination system.
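The three steps above can be pictured with a small in-memory model. This is only a sketch in plain Python: the names `server_tables`, `external_sink`, `publish`, `transform`, and `subscribe` are hypothetical and are not part of the Tabsdata API.

```python
# Toy in-memory model of the publish / transform / subscribe flow.
# None of these names come from the Tabsdata API; they are illustrative only.

server_tables = {}   # stands in for tables stored in the Tabsdata server
external_sink = {}   # stands in for an external destination system


def publish(source_rows):
    """Publisher: copy rows from an external source into a server table."""
    server_tables["orders"] = list(source_rows)


def transform():
    """Transformer: derive a new server table from an existing one."""
    server_tables["large_orders"] = [
        row for row in server_tables["orders"] if row["amount"] >= 100
    ]


def subscribe():
    """Subscriber: export a server table to an external destination."""
    external_sink["large_orders.export"] = list(server_tables["large_orders"])


publish([{"id": 1, "amount": 50}, {"id": 2, "amount": 250}])
transform()
subscribe()
```

The producer only knows about `publish` and the consumer only knows about `subscribe`; the server tables in between are what decouples them.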
The following Tabsdata concepts enable this workflow:
Tabsdata Server
The Tabsdata server performs all data processing and stores generated data, such as current and historical writes, as Tabsdata tables. The server also maintains the metadata catalog.
The Tabsdata server performs processing using Tabsdata TableFrames, which are populated using built-in connectors. You can also write your own connectors.
Since Tabsdata maintains a full history of table data and metadata, it provides clear and accurate lineage and provenance information to aid in troubleshooting and governance.
Collections
A collection is a logical container for Tabsdata tables and functions. A collection normally defines a domain. For example, all tables and functions related to sales might be grouped into a single collection for easier maintenance and control.
Before you run a function, you must register it with a collection. Registration with a collection affects functions as follows:
Every function and table within a collection must have a unique name.
Functions that read Tabsdata tables (transformers and subscribers) can read from any collection.
Functions that write to Tabsdata tables (publishers and transformers) can write only to the collection they are registered with.
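The registration rules above can be sketched as a small model. This is an illustration under stated assumptions, not the way Tabsdata enforces them: `Collection`, `register`, `read_table`, and `write_table` are hypothetical names, and the uniqueness check is shown for function names only.

```python
class Collection:
    """Hypothetical stand-in for a Tabsdata collection."""

    def __init__(self, name):
        self.name = name
        self.functions = set()
        self.tables = {}

    def register(self, function_name):
        # Rule 1: names must be unique within the collection.
        if function_name in self.functions:
            raise ValueError(
                f"{function_name!r} is already registered in {self.name!r}"
            )
        self.functions.add(function_name)


def read_table(collection, table_name):
    # Rule 2: a reading function may target any collection.
    return collection.tables[table_name]


def write_table(home, target, table_name, rows):
    # Rule 3: a writing function may target only its own collection.
    if target is not home:
        raise PermissionError(
            f"function registered in {home.name!r} cannot write to {target.name!r}"
        )
    target.tables[table_name] = rows


sales = Collection("sales")
marketing = Collection("marketing")
sales.register("join_orders")
write_table(sales, sales, "orders", [{"id": 1}])
rows = read_table(sales, "orders")  # a function in marketing could read this too
```

In this model, registering `join_orders` in `sales` a second time raises `ValueError`, and a function registered in `sales` that tries to write into `marketing` raises `PermissionError`.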
Tables
A table is the basic unit of operation inside Tabsdata. A Tabsdata table is organized in rows and columns, similar to a database table. Each column has a name and all the values in the column have the same data type.
Tables are created in Tabsdata in two ways: publishers import external source data as tables in the Tabsdata server, and transformers process existing Tabsdata tables to create new ones.
Table Commits
A table commit is generated by a successful execution of a publisher or transformer.
Once created, a table commit is immutable. The Tabsdata server retains previous commits of all tables for auditing and traceability.
When a function reads a table, it reads the latest commit by default. You can configure functions to read earlier commits when needed.
When a publisher or transformer writes to multiple output tables, every output table receives a new commit each time the function executes successfully, regardless of whether its data or metadata has changed.
TableFrame
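The commit behavior described above can be modeled as an append-only history. The following is a minimal sketch in plain Python; `VersionedTable` and its `back` parameter are hypothetical and do not reflect how Tabsdata stores or addresses commits.

```python
class VersionedTable:
    """Hypothetical model of a table with an append-only commit history."""

    def __init__(self):
        self._commits = []

    def commit(self, rows):
        # Store a tuple snapshot so an existing commit cannot be edited in place.
        self._commits.append(tuple(rows))

    def read(self, back=0):
        """Return the latest commit (back=0) or an earlier one (back=1, 2, ...)."""
        return self._commits[-1 - back]

    def commit_count(self):
        return len(self._commits)


prices = VersionedTable()
prices.commit([("widget", 10)])
prices.commit([("widget", 10)])  # unchanged data still produces a commit
prices.commit([("widget", 12), ("gear", 7)])

latest = prices.read()        # default: the most recent commit
first = prices.read(back=2)   # two commits before the latest
```

Because earlier snapshots are never overwritten, a reader can always go back to the state a table had at any previous successful execution.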
Based on the Polars DataFrame, the Tabsdata TableFrame is a two-dimensional, labeled data structure, used for data manipulation and analysis by Tabsdata functions. Tables in the Tabsdata server are operated on as TableFrames.
For more information, see Working with Tables.
Functions
Tabsdata functions process data. Tabsdata provides publisher functions to read data from external systems, transformer functions to modify data, and subscriber functions to write to external systems.
Tabsdata functions process data as TableFrames using the TableFrame API. The Tabsdata TableFrame is similar to the dataframes used in popular data engineering libraries, with the added benefits of lineage and provenance.
For publishers, the TableFrame API uses connectors to populate Tabsdata with data from external systems. Similarly, for subscribers, the TableFrame API writes data from Tabsdata to external systems using connectors.
The built-in connectors reduce the complexity of accessing external systems.
Tabsdata provides built-in connectors for the following external systems:
Local file systems
Amazon S3
Azure
MariaDB
MySQL
Oracle
PostgreSQL
When needed, you can build your own connectors. For details, see Working with Connector Plugins.
Publishers
A publisher function reads data from an external system and writes the data to one or more tables in the Tabsdata server.
For example, you might create one publisher to read from Oracle and another from Amazon S3. Both write the data to tables in the Tabsdata server.
For more information, see Working with Publishers.
Transformers
A transformer function reads data from one or more tables in the Tabsdata server, transforms data, then writes the transformed data to tables in the Tabsdata server.
For example, you might use a transformer to join data from two Tabsdata tables and create an output table for further processing or for exporting to external systems.
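To make the join example concrete, here is a minimal sketch of the logic such a transformer would carry out. It uses plain lists of dictionaries rather than TableFrames, and `inner_join`, `customers`, and `orders` are hypothetical names chosen for illustration; a real transformer would express the join through the TableFrame API.

```python
def inner_join(left, right, key):
    """Join two lists of row dicts on a shared key column."""
    # Index the right-hand rows by key so each left row is matched in one lookup.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[key], [])
    ]


customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Lin"},
]
orders = [
    {"customer_id": 1, "total": 40},
    {"customer_id": 1, "total": 15},
]

# The joined rows play the role of the transformer's output table.
customer_orders = inner_join(customers, orders, "customer_id")
```

Customers without orders are dropped here because this is an inner join; a different join type would keep them.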
For more information, see Working with Transformers.
Subscribers
A subscriber function reads data from tables in the Tabsdata server and writes the data to an external system.
For example, you might create two subscribers that read from tables in the Tabsdata server, one writing the data to Amazon S3 and the other to Azure.
For more information, see Working with Subscribers.
Triggers
A trigger executes a function. A trigger can be initiated through a CLI command or by a new commit to its associated table. Consequently, changes on tables can automatically trigger functions, which in turn change other tables, leading to a cascading workflow of updates.
For example, you can define a trigger to automatically execute a publisher each time the sales table in the Tabsdata server is updated. You can also trigger the function manually from the CLI.
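The cascading behavior can be sketched as a walk over a dependency graph. This is an illustrative model only: the `triggers` and `outputs` mappings, the function names in them, and `cascade` are hypothetical, and the sketch assumes the graph has no cycles.

```python
from collections import deque

# Table name -> functions triggered by a new commit to that table (hypothetical).
triggers = {
    "sales": ["summarize_sales"],
    "sales_summary": ["export_summary"],
}

# Function name -> tables it commits to on success (empty for a subscriber).
outputs = {
    "summarize_sales": ["sales_summary"],
    "export_summary": [],
}


def cascade(committed_table):
    """Return the functions executed, in order, after a commit to a table."""
    executed = []
    pending = deque([committed_table])
    while pending:
        table = pending.popleft()
        for function in triggers.get(table, []):
            executed.append(function)
            # Each output table receives a commit, which may fire more triggers.
            pending.extend(outputs[function])
    return executed


run_order = cascade("sales")
```

A single commit to `sales` runs `summarize_sales`, whose commit to `sales_summary` in turn runs `export_summary`, which is the chain of updates the text describes.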
For more information, see Working with Triggers.