tabsdata.tableframe.expr.string.ExprStringNameSpace.grok

ExprStringNameSpace.grok(pattern: str, schema: dict[str, Column]) → Expr

Parse log text into structured fields using a Grok pattern.

Applies the given Grok pattern to the values of the current string expression. Each named capture group in the pattern becomes a new output column. Rows that do not match the pattern yield null in every extracted column.

Parameters:
  • pattern (str) – Grok pattern with named captures (e.g., %{WORD:user}).

  • schema (dict[str, td_col.Column]) – A mapping from each capture name in the pattern to its column definition, which specifies the name and data type of the resulting output column.
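
To make the relationship between capture names and output fields concrete, here is a minimal pure-Python sketch of the general Grok idea. It is not how tabsdata implements grok: the names GROK_PRIMITIVES, grok_to_regex, and parse_row are hypothetical, only the WORD and INT primitives are covered, and a plain dict of Python callables stands in for the schema argument.

```python
import re

# Hypothetical subset of Grok primitives; real Grok libraries ship many more.
GROK_PRIMITIVES = {"WORD": r"\w+", "INT": r"[+-]?\d+"}


def grok_to_regex(pattern):
    """Rewrite each %{PRIMITIVE:name} token as a named regex group."""
    def expand(token):
        primitive, name = token.group(1), token.group(2)
        return "(?P<{}>{})".format(name, GROK_PRIMITIVES[primitive])
    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, pattern))


def parse_row(regex, casts, text):
    """Return one value per capture name, or None for every name on a miss."""
    match = regex.search(text)
    if match is None:
        return {name: None for name in casts}
    return {name: casts[name](match.group(name)) for name in casts}


regex = grok_to_regex(r"%{WORD:user}-%{WORD:action}-%{INT:year}")
casts = {"user": str, "action": str, "year": int}  # stand-in for `schema`
rows = [parse_row(regex, casts, s) for s in ["alice-login-2023", "not a log line"]]
```

In this sketch the first input produces one value per capture name, with year converted to an integer, while the second input does not match and produces None for every name, mirroring the null behaviour described above.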

Example:

>>> import tabsdata as td
>>> tf = td.TableFrame({"logs": [
...     "alice-login-2023",
...     "bob-logout-2024",
... ]})
>>>
>>> log_pattern = r"%{WORD:user}-%{WORD:action}-%{INT:year}"
>>> log_schema = {
...     "user": td_col.Column("user", td.String),
...     "action": td_col.Column("action", td.String),
...     "year": td_col.Column("year", td.Int64),
... }
>>> out = tf.grok("logs", log_pattern, log_schema)
>>> tf.select(
...     td.col("logs"),
...     td.col("logs").str.grok(log_pattern, log_schema)
... )
>>>
┌──────────────────┬───────┬────────┬──────┐
│ logs             ┆ user  ┆ action ┆ year │
│ ---              ┆ ---   ┆ ---    ┆ ---  │
│ str              ┆ str   ┆ str    ┆ i64  │
╞══════════════════╪═══════╪════════╪══════╡
│ alice-login-2023 ┆ alice ┆ login  ┆ 2023 │
│ bob-logout-2024  ┆ bob   ┆ logout ┆ 2024 │
└──────────────────┴───────┴────────┴──────┘

Notes

  • The function automatically expands the Grok captures into separate columns.

  • Non-matching rows will show null for the extracted columns.

  • If a pattern defines the same capture name more than once, the resulting columns are disambiguated with index suffixes, for example field, field[1].
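
The duplicate-name rule in the last note can be illustrated with a short sketch. This is an assumption-laden reading of the note, not tabsdata's code: the helper name disambiguate is hypothetical, and continuing the sequence as field[2], field[3], and so on is an extrapolation from the field, field[1] example.

```python
from collections import Counter


def disambiguate(names):
    """Keep the first occurrence as-is; suffix later repeats with [n]."""
    seen = Counter()
    result = []
    for name in names:
        count = seen[name]
        # Assumed scheme: field, field[1], field[2], ...
        result.append(name if count == 0 else "{}[{}]".format(name, count))
        seen[name] += 1
    return result


columns = disambiguate(["ip", "port", "ip"])
```

Under these assumptions, a pattern that captures ip twice would expose the second capture under a suffixed name, so both values remain addressable.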