tabsdata.tableframe.lazyframe.frame.TableFrame.grok
- TableFrame.grok(expr: td_typing.IntoExpr, pattern: str, schema: dict[str, td_col.Column]) → TableFrame
Parse log text into structured fields using a Grok pattern.
Applies a Grok pattern to the given column or expression and directly appends one new column per named capture in the pattern to the output TableFrame. Rows that do not match the pattern will contain null values for the extracted columns.
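The mechanism can be sketched in plain Python. Each %{SYNTAX:name} reference is expanded into a regex named group, and every named group becomes one output field. This is a minimal illustration, not tabsdata's implementation: the two entries in GROK_DEFS only approximate the standard WORD and INT definitions, and grok_to_regex and grok_rows are hypothetical helpers. The sketch matches anywhere in the line (re.search), and a line with no match yields None for every capture, mirroring the null behavior described above.

```python
import re

# Assumed, minimal subset of Grok definitions approximating the standard
# WORD and INT patterns; real Grok libraries ship many more.
GROK_DEFS = {
    "WORD": r"\b\w+\b",
    "INT": r"(?:[+-]?(?:[0-9]+))",
}

# Matches %{SYNTAX:name} references inside a Grok pattern.
GROK_REF = re.compile(r"%\{(\w+):(\w+)\}")


def grok_to_regex(pattern: str) -> re.Pattern:
    """Hypothetical helper: turn each %{SYNTAX:name} into (?P<name>...)."""
    def expand(m: re.Match) -> str:
        syntax, name = m.group(1), m.group(2)
        return f"(?P<{name}>{GROK_DEFS[syntax]})"
    return re.compile(GROK_REF.sub(expand, pattern))


def grok_rows(lines: list[str], pattern: str) -> list[dict]:
    """Hypothetical helper: one dict per line, None for every capture on no match."""
    regex = grok_to_regex(pattern)
    names = list(regex.groupindex)  # one output field per named capture
    rows = []
    for line in lines:
        m = regex.search(line)  # unanchored: match anywhere in the line
        rows.append({n: (m.group(n) if m else None) for n in names})
    return rows


rows = grok_rows(
    ["alice-login-2023", "bob-logout-2024", "malformed line"],
    r"%{WORD:user}-%{WORD:action}-%{INT:year}",
)
# Captured values stay strings here; casting to a schema's types is not shown.
for row in rows:
    print(row)
# {'user': 'alice', 'action': 'login', 'year': '2023'}
# {'user': 'bob', 'action': 'logout', 'year': '2024'}
# {'user': None, 'action': None, 'year': None}
```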
- Parameters:
expr (IntoExpr) – Column name or expression that resolves to a single string column containing log lines.
pattern (str) – Grok pattern with named captures (e.g., %{WORD:user}).
schema (dict[str, td_col.Column]) – Mapping from each capture name in pattern to a td_col.Column that defines the output column's name and data type.
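Because schema is keyed by capture name, it is easy to key an entry by the Grok syntax name (for example word) instead of the capture name (user). The pre-flight check below is a hypothetical helper, not part of tabsdata: it extracts the capture names from simple %{SYNTAX:name} references and compares them with the schema keys. None placeholders stand in for td_col.Column definitions, and how grok itself reports such a mismatch is not described here.

```python
import re

# Matches simple %{SYNTAX:name} references and captures the name.
GROK_CAPTURE = re.compile(r"%\{\w+:(\w+)\}")


def check_schema_keys(pattern: str, schema: dict) -> None:
    """Hypothetical helper (not part of tabsdata): raise if captures and keys differ."""
    captures = set(GROK_CAPTURE.findall(pattern))
    keys = set(schema)
    missing = captures - keys  # captures with no schema entry
    extra = keys - captures    # schema keys with no matching capture
    if missing or extra:
        raise ValueError(
            f"captures without schema entry: {sorted(missing)}; "
            f"schema keys without capture: {sorted(extra)}"
        )


pattern = r"%{WORD:user}-%{WORD:action}-%{INT:year}"
# None placeholders stand in for td_col.Column definitions.
check_schema_keys(pattern, {"user": None, "action": None, "year": None})  # passes
try:
    check_schema_keys(pattern, {"word": None, "action": None, "year": None})
except ValueError as err:
    print(err)
    # captures without schema entry: ['user']; schema keys without capture: ['word']
```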
- Returns:
A new TableFrame with one column per Grok capture added.
- Return type:
TableFrame
Example
>>> import tabsdata as td
>>> tf = td.TableFrame({"logs": [
...     "alice-login-2023",
...     "bob-logout-2024",
... ]})
>>>
>>> # Capture 3 fields: user, action, year
>>> log_pattern = r"%{WORD:user}-%{WORD:action}-%{INT:year}"
>>> log_schema = {
...     "user": td_col.Column("user", td.String),
...     "action": td_col.Column("action", td.String),
...     "year": td_col.Column("year", td.Int64),
... }
>>> out = tf.grok("logs", log_pattern, log_schema)
>>> out.collect()
┌──────────────────┬───────┬────────┬──────┐
│ logs             ┆ user  ┆ action ┆ year │
│ ---              ┆ ---   ┆ ---    ┆ ---  │
│ str              ┆ str   ┆ str    ┆ i64  │
╞══════════════════╪═══════╪════════╪══════╡
│ alice-login-2023 ┆ alice ┆ login  ┆ 2023 │
│ bob-logout-2024  ┆ bob   ┆ logout ┆ 2024 │
└──────────────────┴───────┴────────┴──────┘
Notes
The function automatically expands the Grok captures into separate columns.
Non-matching rows will show null for the extracted columns.
If a pattern defines duplicate capture names, numeric suffixes are used to disambiguate them (for example, field and field[1]).
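The disambiguation rule can be sketched as follows, assuming the first occurrence keeps its plain name and each later duplicate receives the next bracketed index. Only field and field[1] are confirmed by the note; field[2] for a third occurrence is an extrapolation, and disambiguate is a hypothetical helper rather than a tabsdata function.

```python
from collections import Counter


def disambiguate(names: list[str]) -> list[str]:
    """Hypothetical helper: first occurrence keeps its name, later ones get [1], [2], ...

    The [2] and higher suffixes are assumed; only field, field[1] is documented.
    """
    seen = Counter()
    out = []
    for name in names:
        n = seen[name]  # how many times this name has already appeared
        out.append(name if n == 0 else f"{name}[{n}]")
        seen[name] += 1
    return out


print(disambiguate(["field", "ip", "field", "field"]))
# ['field', 'ip', 'field[1]', 'field[2]']
```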