tabsdata.tableframe.udf.function

class UDF(
output_columns: list[tuple[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]] | tuple[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64],
)

Bases: ABC

columns() -> list[tuple[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]]
on_batch(
series: list[Series],
) -> list[Series]
on_batch(
*series: Series,
) -> list[Series]

Creating UDFs:

  1. Subclass tabsdata.tableframe.udf.function.UDF.

  2. Implement __init__ to call super().__init__(output_columns), where output_columns is a single (name, data type) tuple or a list of such tuples that defines the UDF's default output schema. Each tuple must contain a column name (string) and a data type (DataType).

  3. Override exactly one of on_batch or on_element to implement the UDF logic.

  4. Return a list of TabsData Series (from on_batch) or a list of supported TabsData scalar values (from on_element). In both cases, the number of elements in the returned list must match the number of entries in the output_columns passed to the UDF constructor.

Using UDFs:

  1. Instantiate the UDF subclass created as above.

  2. Pass the instance to the TableFrame method udf().

  3. Optionally call UDF.with_columns() to override output column names or data types after instantiation.

  4. By default, on_batch receives a list of series. Override the signature property to return “unpacked” to receive each series as a separate argument instead.
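
The creation steps above can be sketched without a TabsData installation. In the block below, SketchUDF is a hypothetical stand-in that mirrors only the documented contract (a constructor that takes output_columns, and an on_batch that returns one series per output column). It is not the library's UDF class: plain Python lists stand in for Series, and the strings "Int64" and "Float64" stand in for DataType objects.

```python
from abc import ABC


# Hypothetical stand-in for tabsdata.tableframe.udf.function.UDF.
# It mirrors only the documented contract, not the real implementation.
class SketchUDF(ABC):
    def __init__(self, output_columns):
        # Accept a single (name, dtype) tuple or a list of such tuples.
        if isinstance(output_columns, tuple):
            output_columns = [output_columns]
        self._columns = list(output_columns)

    def columns(self):
        return list(self._columns)

    def on_batch(self, series):
        raise NotImplementedError


class SumAndMean(SketchUDF):
    def __init__(self):
        # Declare the default output schema in the constructor.
        super().__init__([("total", "Int64"), ("mean", "Float64")])

    # Override exactly one of on_batch / on_element.
    def on_batch(self, series):
        a, b = series  # default "list" style: all inputs arrive in one list
        total = [x + y for x, y in zip(a, b)]
        mean = [(x + y) / 2 for x, y in zip(a, b)]
        # Return one output series per declared output column.
        return [total, mean]


udf = SumAndMean()
result = udf.on_batch([[1, 2, 3], [10, 20, 30]])
```

With the real library, the instance would be passed to the TableFrame method udf() rather than called directly as done here.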

on_element(
values: list[Any],
) -> list[Any]
on_element(
*values: Any,
) -> list[Any]

The steps for creating and using a UDF are the same as those listed under on_batch.

By default, on_element receives a list of values. Override the signature property to return “unpacked” to receive each value as a separate argument instead.
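
As a rough illustration of element-wise processing, the sketch below overrides on_element in the default list style. ElementSketch is a hypothetical stand-in base class, plain Python numbers stand in for TabsData scalars, and the loop that calls on_element once per row is an assumption about usage, not a description of how TabsData invokes the method.

```python
# Hypothetical stand-in; it keeps only the output schema.
class ElementSketch:
    def __init__(self, output_columns):
        # Accept a single (name, dtype) tuple or a list of such tuples.
        if isinstance(output_columns, tuple):
            output_columns = [output_columns]
        self.output_columns = list(output_columns)


class Celsius(ElementSketch):
    def __init__(self):
        super().__init__(("celsius", "Float64"))

    def on_element(self, values):
        # Default "list" style: the input values for one row arrive in a list.
        (fahrenheit,) = values
        # Return one scalar per declared output column.
        return [(fahrenheit - 32) * 5 / 9]


udf = Celsius()
# Assumed driver loop: one on_element call per input row.
converted = [udf.on_element([f]) for f in (32, 212)]
```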

property signature: Literal['list', 'unpacked']

Defines how parameters are passed to on_batch and on_element methods.

Returns:

  • “list”: Parameters are passed as a single list (default).

  • “unpacked”: Each parameter is passed as a separate argument.

Return type:

Literal['list', 'unpacked']

Override this property in your UDF subclass to change the parameter style.
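
The effect of the two parameter styles can be sketched as follows. SignatureSketch and call_on_element are hypothetical names introduced for illustration; in particular, the dispatch function is only an assumption about how a caller might honor the signature value.

```python
from typing import Literal


class SignatureSketch:
    """Hypothetical stand-in exposing only the signature property."""

    @property
    def signature(self) -> Literal["list", "unpacked"]:
        return "list"  # documented default


class RatioList(SignatureSketch):
    # Keeps the default, so all values arrive in one list.
    def on_element(self, values):
        numerator, denominator = values
        return [numerator / denominator]


class RatioUnpacked(SignatureSketch):
    @property
    def signature(self) -> Literal["list", "unpacked"]:
        return "unpacked"  # ask for one argument per value

    def on_element(self, numerator, denominator):
        return [numerator / denominator]


def call_on_element(udf, row):
    """Assumed dispatch: choose the calling convention from udf.signature."""
    if udf.signature == "unpacked":
        return udf.on_element(*row)
    return udf.on_element(list(row))


list_result = call_on_element(RatioList(), (1, 4))
unpacked_result = call_on_element(RatioUnpacked(), (1, 4))
```

Both calls compute the same value; only the shape of the method's parameters differs.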

with_columns(
output_columns: tuple[str | None, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64 | None] | list[tuple[str | None, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64 | None]] | dict[int, tuple[str | None, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64 | None]],
) -> UDF
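
The with_columns signature accepts a tuple, a list of tuples, or a dict keyed by integer, with None allowed for the name or the data type. The sketch below shows one plausible reading of those shapes. Every rule in it is an assumption rather than documented behavior: a bare tuple is applied to the first column, a list is applied positionally, dict keys are zero-based column indices, and None means "keep the current value". merge_columns is a hypothetical helper, and strings stand in for DataType objects.

```python
def merge_columns(current, override):
    """Sketch of one plausible override rule (an assumption, not TabsData's code).

    current:  list of (name, dtype) tuples
    override: a (name, dtype) tuple, a list of such tuples, or {index: tuple}
    """
    if isinstance(override, tuple):
        override = {0: override}  # assumed: a bare tuple targets column 0
    elif isinstance(override, list):
        override = dict(enumerate(override))  # assumed: positional
    merged = list(current)
    for index, (name, dtype) in override.items():
        old_name, old_dtype = merged[index]
        merged[index] = (
            old_name if name is None else name,  # assumed: None keeps the name
            old_dtype if dtype is None else dtype,  # assumed: None keeps the type
        )
    return merged


schema = [("total", "Int64"), ("mean", "Float64")]
renamed = merge_columns(schema, {1: ("average", None)})
retyped = merge_columns(schema, ("total", "Float64"))
both = merge_columns(schema, [(None, "Int32"), ("avg", None)])
```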
class UDFList(
output_columns,
)

Bases: UDF, ABC

Abstract base class for UDFs that use list-style parameter passing.

When subclassing UDFList, implement on_batch or on_element with list signature:

  • on_batch(self, series: list[Series]) -> list[Series]

  • on_element(self, values: list[Any]) -> list[Any]

property signature: Literal['list', 'unpacked']

Defines how parameters are passed to on_batch and on_element methods.

Returns:

  • “list”: Parameters are passed as a single list (default).

  • “unpacked”: Each parameter is passed as a separate argument.

Return type:

Literal['list', 'unpacked']

Override this property in your UDF subclass to change the parameter style.

UDFList.columns() -> list[tuple[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]]
UDFList.on_batch(
*args,
) -> list[Series]

See UDF.on_batch for the steps to create and use a UDF. In a UDFList subclass, on_batch receives the input series as a single list.

UDFList.on_element(
*args,
) -> list[Any]

See UDF.on_element for the steps to create and use a UDF. In a UDFList subclass, on_element receives the input values as a single list.

UDFList.with_columns(
output_columns: tuple[str | None, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64 | None] | list[tuple[str | None, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64 | None]] | dict[int, tuple[str | None, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64 | None]],
) -> UDF
class UDFUnpacked(
output_columns,
)

Bases: UDF, ABC

Abstract base class for UDFs that use unpacked-style parameter passing.

When subclassing UDFUnpacked, implement on_batch or on_element with unpacked signature:

  • on_batch(self, col1: Series, col2: Series, ...) -> list[Series]

  • on_element(self, val1: Any, val2: Any, ...) -> list[Any]

property signature: Literal['list', 'unpacked']

Defines how parameters are passed to on_batch and on_element methods.

Returns:

  • “list”: Parameters are passed as a single list (default).

  • “unpacked”: Each parameter is passed as a separate argument.

Return type:

Literal['list', 'unpacked']

Override this property in your UDF subclass to change the parameter style.

UDFUnpacked.columns() -> list[tuple[str, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64]]
UDFUnpacked.on_batch(
*args,
) -> list[Series]

See UDF.on_batch for the steps to create and use a UDF. In a UDFUnpacked subclass, on_batch receives each input series as a separate argument.

UDFUnpacked.on_element(
*args,
) -> list[Any]

See UDF.on_element for the steps to create and use a UDF. In a UDFUnpacked subclass, on_element receives each input value as a separate argument.

UDFUnpacked.with_columns(
output_columns: tuple[str | None, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64 | None] | list[tuple[str | None, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64 | None]] | dict[int, tuple[str | None, Boolean | Categorical | Date | Datetime | Decimal | Duration | Enum | Float32 | Float64 | Int8 | Int16 | Int64 | Int32 | Int128 | Null | String | Time | UInt8 | UInt16 | UInt32 | UInt64 | None]],
) -> UDF