Aggregation

TableFrame.group_by( *by: td_typing.IntoExpr | Iterable[td_typing.IntoExpr], ) → td_group_by.TableFrameGroupBy

Categories:: aggregation

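Group the rows of the TableFrame by one or more columns or expressions. The returned TableFrameGroupBy is then aggregated with agg(), as in the example below.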

Parameters: by – Columns or expressions to group by.

Example:

>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ A   ┆ 1   │
│ X   ┆ 10  │
│ C   ┆ 3   │
│ D   ┆ 5   │
│ M   ┆ 9   │
│ A   ┆ 100 │
│ M   ┆ 50  │
└─────┴─────┘
>>>
>>> tf.group_by(td.col("a")).agg(td.col("b").sum())
>>>
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ M   ┆ 59  │
│ A   ┆ 101 │
│ C   ┆ 3   │
│ D   ┆ 5   │
│ X   ┆ 10  │
└─────┴─────┘
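
As a rough illustration of what the grouping and sum aggregation above compute, the following plain-Python sketch reproduces the same totals with a dictionary. It does not use tabsdata, and `group_sum` is a hypothetical helper introduced only for this illustration. The example output above is not in input order, so it is safest not to rely on the order of the resulting rows.

```python
from collections import defaultdict


def group_sum(rows):
    """Sum the value of each (key, value) row per key (hypothetical helper)."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


# Same data as the example table: column "a" is the key, column "b" the value.
rows = [("A", 1), ("X", 10), ("C", 3), ("D", 5), ("M", 9), ("A", 100), ("M", 50)]
result = group_sum(rows)
```

Here `result` maps each distinct value of `a` to the sum of its `b` values, matching the aggregated table.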

Expr.count() → Expr

Categories:: aggregation

Aggregation operation that counts the non-null values of the given column in the group.

Examples:

>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
┌──────┬──────┐
│ a    ┆ b    │
│ ---  ┆ ---  │
│ str  ┆ i64  │
╞══════╪══════╡
│ A    ┆ 1    │
│ B    ┆ 2    │
│ A    ┆ 3    │
│ B    ┆ 0    │
│ C    ┆ 5    │
│ null ┆ 6    │
│ C    ┆ null │
└──────┴──────┘
>>>
>>> tf.group_by(td.col("a")).agg(td.col("b").count().alias("count"))
>>>
┌──────┬───────┐
│ a    ┆ count │
│ ---  ┆ ---   │
│ str  ┆ u32   │
╞══════╪═══════╡
│ null ┆ 1     │
│ A    ┆ 2     │
│ B    ┆ 2     │
│ C    ┆ 1     │
└──────┴───────┘

Expr.first() → Expr

Categories:: aggregation

Get the first element.

Example:

>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf = tf.select(td.col("age"), td.col("age").first().alias("first"))
>>>
┌──────┬─────────┐
│ age  ┆ first   │
│ ---  ┆ ------- │
│ i64  ┆ i64     │
╞══════╪═════════╡
│ 10   ┆ 10      │
│ 11   ┆ 10      │
│ 18   ┆ 10      │
│ 65   ┆ 10      │
│ 70   ┆ 10      │
└──────┴─────────┘

Expr.last() → Expr

Categories:: aggregation

Get the last element.

Example:

>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf = tf.select(td.col("age"), td.col("age").last().alias("last"))
>>>
┌──────┬─────────┐
│ age  ┆ last    │
│ ---  ┆ ------- │
│ i64  ┆ i64     │
╞══════╪═════════╡
│ 10   ┆ 70      │
│ 11   ┆ 70      │
│ 18   ┆ 70      │
│ 65   ┆ 70      │
│ 70   ┆ 70      │
└──────┴─────────┘

Expr.len() → Expr

Categories:: aggregation

Aggregation operation that counts the rows in the group, including rows where the value is null.

Examples:

>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
┌──────┬──────┐
│ a    ┆ b    │
│ ---  ┆ ---  │
│ str  ┆ i64  │
╞══════╪══════╡
│ A    ┆ 1    │
│ B    ┆ 2    │
│ A    ┆ 3    │
│ B    ┆ 0    │
│ C    ┆ 5    │
│ null ┆ 6    │
│ C    ┆ null │
└──────┴──────┘
>>>
>>> tf.group_by(td.col("a")).agg(td.col("b").len().alias("len"))
>>>
┌──────┬─────┐
│ a    ┆ len │
│ ---  ┆ --- │
│ str  ┆ u32 │
╞══════╪═════╡
│ null ┆ 1   │
│ A    ┆ 2   │
│ B    ┆ 2   │
│ C    ┆ 2   │
└──────┴─────┘
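
The difference between len() and count() shows up only when a group contains nulls: in the example data, group C has two rows but only one non-null value. The plain-Python sketch below computes both numbers side by side. It is not tabsdata code, and `count_and_len` is a hypothetical helper used for illustration, with `None` standing in for null.

```python
from collections import defaultdict


def count_and_len(rows):
    """Return {key: (non_null_count, row_count)} for (key, value) rows."""
    stats = defaultdict(lambda: [0, 0])
    for key, value in rows:
        if value is not None:
            stats[key][0] += 1  # count(): only non-null values
        stats[key][1] += 1      # len(): every row in the group
    return {key: tuple(pair) for key, pair in stats.items()}


# Same data as the example table; None stands in for null.
rows = [("A", 1), ("B", 2), ("A", 3), ("B", 0), ("C", 5), (None, 6), ("C", None)]
stats = count_and_len(rows)
```

For group C the pair is `(1, 2)`: one non-null value, two rows.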

Expr.max() → Expr

Categories:: aggregation

Aggregation operation that finds the maximum value in the group.

Examples:

>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
┌──────┬──────┐
│ ss   ┆ i    │
│ ---  ┆ ---  │
│ str  ┆ i64  │
╞══════╪══════╡
│ A    ┆ 1    │
│ B    ┆ 0    │
│ A    ┆ 2    │
│ B    ┆ 3    │
│ B    ┆ 4    │
│ C    ┆ -1   │
│ C    ┆ -2   │
│ C    ┆ -3   │
│ D    ┆ -4   │
│ F    ┆ -5   │
│ null ┆ null │
└──────┴──────┘
>>>
>>> tf.group_by(td.col("ss")).agg(td.col("i").max())
>>>
┌──────┬──────┐
│ ss   ┆ i    │
│ ---  ┆ ---  │
│ str  ┆ i64  │
╞══════╪══════╡
│ F    ┆ -5   │
│ C    ┆ -1   │
│ A    ┆ 2    │
│ B    ┆ 4    │
│ D    ┆ -4   │
│ null ┆ null │
└──────┴──────┘

Expr.mean() → Expr

Categories:: aggregation

Aggregation operation that finds the mean of the values in the group.

Examples:

>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
┌──────┬──────┐
│ ss   ┆ i    │
│ ---  ┆ ---  │
│ str  ┆ i64  │
╞══════╪══════╡
│ A    ┆ 1    │
│ B    ┆ 0    │
│ A    ┆ 2    │
│ B    ┆ 3    │
│ B    ┆ 4    │
│ C    ┆ -1   │
│ C    ┆ -2   │
│ C    ┆ -3   │
│ D    ┆ -4   │
│ F    ┆ -5   │
│ null ┆ null │
└──────┴──────┘
>>>
>>> tf.group_by(td.col("ss")).agg(td.col("i").mean())
>>>
┌──────┬──────────┐
│ ss   ┆ i        │
│ ---  ┆ ---      │
│ str  ┆ f64      │
╞══════╪══════════╡
│ null ┆ null     │
│ A    ┆ 1.5      │
│ F    ┆ -5.0     │
│ C    ┆ -2.0     │
│ D    ┆ -4.0     │
│ B    ┆ 2.333333 │
└──────┴──────────┘

Expr.median() → Expr

Categories:: aggregation

Aggregation operation that finds the median of the values in the group.

Examples:

>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
┌──────┬──────┐
│ ss   ┆ i    │
│ ---  ┆ ---  │
│ str  ┆ i64  │
╞══════╪══════╡
│ A    ┆ 1    │
│ B    ┆ 0    │
│ A    ┆ 2    │
│ B    ┆ 3    │
│ B    ┆ 4    │
│ C    ┆ -1   │
│ C    ┆ -2   │
│ C    ┆ -3   │
│ D    ┆ -4   │
│ F    ┆ -5   │
│ null ┆ null │
└──────┴──────┘
>>>
>>> tf.group_by(td.col("ss")).agg(td.col("i").median())
>>>
┌──────┬──────┐
│ ss   ┆ i    │
│ ---  ┆ ---  │
│ str  ┆ f64  │
╞══════╪══════╡
│ F    ┆ -5.0 │
│ C    ┆ -2.0 │
│ B    ┆ 3.0  │
│ D    ┆ -4.0 │
│ A    ┆ 1.5  │
│ null ┆ null │
└──────┴──────┘
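
Group B (values 0, 3, 4) is the one place in this data where mean and median differ: the mean is 7/3 ≈ 2.333333, the median is 3.0. The sketch below recomputes both per group with Python's standard `statistics` module. It is only an illustration of the arithmetic, not tabsdata's implementation; `group_mean_median` is a hypothetical helper, and returning `None` for a group with no non-null values is modeled on the null row in the example output.

```python
import statistics


def group_mean_median(rows):
    """Return {key: (mean, median)} over the non-null values of each group."""
    groups = {}
    for key, value in rows:
        bucket = groups.setdefault(key, [])
        if value is not None:
            bucket.append(value)
    return {
        key: (statistics.fmean(vals), float(statistics.median(vals)))
        if vals
        else (None, None)  # no non-null values: nothing to average
        for key, vals in groups.items()
    }


# Same data as the example table; None stands in for null.
rows = [
    ("A", 1), ("B", 0), ("A", 2), ("B", 3), ("B", 4), ("C", -1),
    ("C", -2), ("C", -3), ("D", -4), ("F", -5), (None, None),
]
result = group_mean_median(rows)
```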

Expr.min() → Expr

Categories:: aggregation

Aggregation operation that finds the minimum value in the group.

Examples:

>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
┌──────┬──────┐
│ ss   ┆ i    │
│ ---  ┆ ---  │
│ str  ┆ i64  │
╞══════╪══════╡
│ A    ┆ 1    │
│ B    ┆ 0    │
│ A    ┆ 2    │
│ B    ┆ 3    │
│ B    ┆ 4    │
│ C    ┆ -1   │
│ C    ┆ -2   │
│ C    ┆ -3   │
│ D    ┆ -4   │
│ F    ┆ -5   │
│ null ┆ null │
└──────┴──────┘
>>>
>>> tf.group_by(td.col("ss")).agg(td.col("i").min())
>>>
┌──────┬──────┐
│ ss   ┆ i    │
│ ---  ┆ ---  │
│ str  ┆ i64  │
╞══════╪══════╡
│ C    ┆ -3   │
│ A    ┆ 1    │
│ null ┆ null │
│ B    ┆ 0    │
│ F    ┆ -5   │
│ D    ┆ -4   │
└──────┴──────┘

Expr.n_unique() → Expr

Categories:: aggregation

Aggregation operation that counts the unique values of the given column in the group.

Examples:

>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
┌──────┬──────┐
│ ss   ┆ i    │
│ ---  ┆ ---  │
│ str  ┆ i64  │
╞══════╪══════╡
│ A    ┆ 1    │
│ B    ┆ 0    │
│ A    ┆ 2    │
│ B    ┆ 3    │
│ B    ┆ 4    │
│ C    ┆ -1   │
│ C    ┆ -2   │
│ C    ┆ -3   │
│ D    ┆ -4   │
│ F    ┆ -5   │
│ null ┆ null │
└──────┴──────┘
>>>
>>> tf.group_by(td.col("ss")).agg(td.col("i").n_unique())
>>>
┌──────┬─────┐
│ ss   ┆ i   │
│ ---  ┆ --- │
│ str  ┆ u32 │
╞══════╪═════╡
│ D    ┆ 1   │
│ C    ┆ 3   │
│ A    ┆ 2   │
│ B    ┆ 3   │
│ F    ┆ 1   │
│ null ┆ 1   │
└──────┴─────┘
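
Note that the null group in the output above reports 1, which suggests that a null is counted as a distinct value. The plain-Python sketch below mirrors that behaviour by collecting each group's values in a set, with `None` standing in for null. It is an illustration only, and `group_n_unique` is a hypothetical helper, not part of tabsdata.

```python
from collections import defaultdict


def group_n_unique(rows):
    """Count distinct values per key; None is kept as a distinct value."""
    distinct = defaultdict(set)
    for key, value in rows:
        distinct[key].add(value)
    return {key: len(values) for key, values in distinct.items()}


# Same data as the example table; None stands in for null.
rows = [
    ("A", 1), ("B", 0), ("A", 2), ("B", 3), ("B", 4), ("C", -1),
    ("C", -2), ("C", -3), ("D", -4), ("F", -5), (None, None),
]
result = group_n_unique(rows)
```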

Expr.rank( method: Literal['average', 'min', 'max', 'dense', 'ordinal', 'random'] = 'average', *, descending: bool = False, seed: int | None = None, ) → Expr

Categories:: aggregation

Compute the rank of the element values. Multiple rank types are available.

Parameters:

method – the ranking type: ‘average’ (default), ‘dense’, ‘max’, ‘min’, ‘ordinal’ or ‘random’.
descending – whether to rank in descending order; the default (False) ranks in ascending order.
seed – random seed when using ‘random’ rank type.

Example:

>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("val"), td.col("val").rank("max").alias("rank"))
>>>
┌──────┬──────┐
│ val  ┆ rank │
│ ---  ┆ ---  │
│ f64  ┆ u32  │
╞══════╪══════╡
│ -1.0 ┆ 1    │
│ 0.0  ┆ 2    │
│ 1.1  ┆ 3    │
│ 2.0  ┆ 4    │
│ inf  ┆ 5    │
│ null ┆ null │
│ NaN  ┆ 6    │
└──────┴──────┘
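
The example above has no tied values, so every method would give the same ranks there. The methods differ only in how they treat ties. The sketch below implements the conventional definitions of 'min', 'max', 'average', 'dense' and 'ordinal' in plain Python, assuming tabsdata follows those usual definitions; `rank` here is a hypothetical stand-alone function, not the library's code, and it omits 'random', `descending` and NaN handling.

```python
def rank(values, method="average"):
    """Rank values in ascending order, 1-based; None (null) stays None."""
    # 1-based positions of each non-null value in sorted order.
    order = sorted((v, i) for i, v in enumerate(values) if v is not None)
    positions = {}
    for pos, (v, _) in enumerate(order, start=1):
        positions.setdefault(v, []).append(pos)
    # Dense rank: consecutive integers over the distinct sorted values.
    dense = {v: d for d, v in enumerate(positions, start=1)}

    seen = {}  # how many times each tied value has been ranked so far
    out = []
    for v in values:
        if v is None:
            out.append(None)
            continue
        tied = positions[v]
        if method == "min":
            out.append(tied[0])
        elif method == "max":
            out.append(tied[-1])
        elif method == "average":
            out.append(sum(tied) / len(tied))
        elif method == "dense":
            out.append(dense[v])
        elif method == "ordinal":
            k = seen.get(v, 0)
            out.append(tied[k])  # ties ranked in order of appearance
            seen[v] = k + 1
        else:
            raise ValueError(f"unsupported method: {method!r}")
    return out


values = [3, 1, 3, 2, None]
ranks = {m: rank(values, m) for m in ("min", "max", "average", "dense", "ordinal")}
```

With the two tied 3s occupying sorted positions 3 and 4, 'min' assigns both 3, 'max' assigns both 4, 'average' assigns both 3.5, 'dense' assigns both 3, and 'ordinal' assigns 3 and 4 in order of appearance.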

Expr.sum() → Expr

Categories:: aggregation

Aggregation operation that sums the values in the group.

Examples:

>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
┌──────┬──────┐
│ ss   ┆ i    │
│ ---  ┆ ---  │
│ str  ┆ i64  │
╞══════╪══════╡
│ A    ┆ 1    │
│ B    ┆ 0    │
│ A    ┆ 2    │
│ B    ┆ 3    │
│ B    ┆ 4    │
│ C    ┆ -1   │
│ C    ┆ -2   │
│ C    ┆ -3   │
│ D    ┆ -4   │
│ F    ┆ -5   │
│ null ┆ null │
└──────┴──────┘
>>>
>>> tf.group_by(td.col("ss")).agg(td.col("i").sum())
>>>
┌──────┬─────┐
│ ss   ┆ i   │
│ ---  ┆ --- │
│ str  ┆ i64 │
╞══════╪═════╡
│ null ┆ 0   │
│ A    ┆ 3   │
│ B    ┆ 7   │
│ C    ┆ -6  │
│ D    ┆ -4  │
│ F    ┆ -5  │
└──────┴─────┘
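
One detail worth noticing in the output above: the group whose only value is null sums to 0, whereas the max() example for the same data reports null for that group. The plain-Python sketch below reproduces that contrast by skipping nulls and treating an empty sum as 0. It is an illustration of the observed results rather than tabsdata's implementation, and `group_sum_and_max` is a hypothetical helper.

```python
def group_sum_and_max(rows):
    """Return {key: (sum, max)} over the non-null values of each group."""
    groups = {}
    for key, value in rows:
        bucket = groups.setdefault(key, [])
        if value is not None:
            bucket.append(value)
    return {
        # sum([]) is 0, while max of no values is reported as None (null).
        key: (sum(vals), max(vals) if vals else None)
        for key, vals in groups.items()
    }


# Same data as the example table; None stands in for null.
rows = [
    ("A", 1), ("B", 0), ("A", 2), ("B", 3), ("B", 4), ("C", -1),
    ("C", -2), ("C", -3), ("D", -4), ("F", -5), (None, None),
]
result = group_sum_and_max(rows)
```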