tabsdata.tableframe.expr.string.ExprStringNameSpace.grok
- ExprStringNameSpace.grok(pattern: str, schema: dict[str, Column]) -> Expr
Parse log text into structured fields using a Grok pattern.
Applies the given Grok pattern to each value of the current string expression. Each named capture in the pattern becomes an output column whose name and data type come from schema. Rows that do not match the pattern get null in every extracted column.
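To make the matching semantics concrete, here is a plain-Python sketch built on the standard re module. It is not tabsdata's implementation: GROK_PATTERNS, grok_to_regex and grok_extract are hypothetical helpers, the pattern table covers only WORD and INT, and the unanchored search mirrors Logstash-style Grok, which may differ from how tabsdata anchors matches. It shows each named capture turning into a field and non-matching values producing None in every field.

```python
import re

# Hypothetical minimal subset of a Grok pattern library
# (real Grok libraries ship many more named patterns).
GROK_PATTERNS = {
    "WORD": r"\b\w+\b",
    "INT": r"[+-]?[0-9]+",
}

# Matches %{NAME} or %{NAME:capture} inside a Grok pattern.
_TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(pattern: str) -> re.Pattern:
    """Translate a Grok pattern into a compiled regex with named groups."""

    def expand(m: re.Match) -> str:
        body = GROK_PATTERNS[m.group(1)]
        capture = m.group(2)
        # Named captures become named groups; unnamed references are only grouped.
        return f"(?P<{capture}>{body})" if capture else f"(?:{body})"

    return re.compile(_TOKEN.sub(expand, pattern))


def grok_extract(values, pattern: str) -> list:
    """Return one dict per value; values that do not match get None in every field."""
    regex = grok_to_regex(pattern)
    # Order capture names by their position in the pattern.
    names = sorted(regex.groupindex, key=regex.groupindex.get)
    rows = []
    for value in values:
        # Unanchored search (assumed, Logstash-style); a None input never matches.
        m = regex.search(value) if value is not None else None
        rows.append({name: (m.group(name) if m else None) for name in names})
    return rows
```

With the pattern r"%{WORD:user}-%{WORD:action}-%{INT:year}", the value "alice-login-2023" yields {"user": "alice", "action": "login", "year": "2023"}, while "not a log" yields None for all three fields. The extracted values are still strings at this stage; converting them to the schema's types is a separate step.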
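- Parameters:
pattern (str): The Grok pattern to apply, for example %{WORD:user}-%{INT:year}.
schema (dict[str, Column]): Mapping from each capture name in pattern to the Column (output name and data type) that holds the extracted value.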
Example:
>>> import tabsdata as td
>>> # assumes td_col is the tabsdata module that defines Column
>>> tf = td.TableFrame({"logs": [
...     "alice-login-2023",
...     "bob-logout-2024",
... ]})
>>>
>>> log_pattern = r"%{WORD:user}-%{WORD:action}-%{INT:year}"
>>> log_schema = {
...     "user": td_col.Column("user", td.String),
...     "action": td_col.Column("action", td.String),
...     "year": td_col.Column("year", td.Int64),
... }
>>> out = tf.grok("logs", log_pattern, log_schema)
>>> tf.select(
...     td.col("logs"),
...     td.col("logs").str.grok(log_pattern, log_schema),
... )
┌──────────────────┬───────┬────────┬──────┐
│ logs             ┆ user  ┆ action ┆ year │
│ ---              ┆ ---   ┆ ---    ┆ ---  │
│ str              ┆ str   ┆ str    ┆ i64  │
╞══════════════════╪═══════╪════════╪══════╡
│ alice-login-2023 ┆ alice ┆ login  ┆ 2023 │
│ bob-logout-2024  ┆ bob   ┆ logout ┆ 2024 │
└──────────────────┴───────┴────────┴──────┘
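The schema argument pairs each capture with an output column and a data type. The plain-Python sketch below mimics that conversion step under two assumptions: each schema key is a capture name, and each value supplies the output name plus a converter. FieldSpec and apply_schema are hypothetical helpers for illustration, not part of the tabsdata API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical stand-in for a schema entry: output column name plus converter."""

    name: str
    convert: Callable[[str], object]


def apply_schema(row: Dict[str, Optional[str]], schema: Dict[str, FieldSpec]) -> dict:
    """Rename each captured string and convert it; missing or None values stay None."""
    out = {}
    for capture, spec in schema.items():
        raw = row.get(capture)
        # A non-matching row carries None, which passes through unconverted (null).
        out[spec.name] = spec.convert(raw) if raw is not None else None
    return out
```

Keeping conversion separate from matching means a row that fails the pattern never reaches a converter, so it surfaces as null rather than as a conversion error.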
Notes
The function automatically expands the Grok captures into separate columns.
Non-matching rows will show null for the extracted columns.
If a pattern defines duplicate capture names, the resulting columns are disambiguated with numeric suffixes, for example field and field[1].
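The note shows only field and field[1]. The sketch below assumes the first occurrence keeps the bare name and that later repeats continue counting upward (field[2], and so on), which the note does not state explicitly. disambiguate is a hypothetical helper that illustrates the naming scheme, not tabsdata's implementation.

```python
from collections import Counter


def disambiguate(names):
    """Keep the first occurrence of each name; suffix later repeats with [1], [2], ...

    Assumption: suffixes keep counting per name; collisions with names that
    already look like "field[1]" are not handled in this sketch.
    """
    seen = Counter()
    result = []
    for name in names:
        count = seen[name]
        result.append(name if count == 0 else f"{name}[{count}]")
        seen[name] += 1
    return result
```

For the capture names ["field", "other", "field"], this yields ["field", "other", "field[1]"].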