String
- TableFrame.grok( ) TableFrame
- Categories:
string
Parse log text into structured fields using a Grok pattern.
Applies a Grok pattern to the given column or expression and directly appends one new column per named capture in the pattern to the output TableFrame. Rows that do not match the pattern will contain null values for the extracted columns.
- Parameters:
expr (IntoExpr) – Column name or expression that resolves to a single string column containing log lines.
pattern (str) – Grok pattern with named captures (e.g., %{WORD:user}).
schema (dict[str, td_col.Column]) – A mapping where each capture name is associated with its corresponding column definition, specifying both the column name and its data type.
- Returns:
A new TableFrame with one column per Grok capture added.
- Return type:
TableFrame
Example
>>> import tabsdata as td
>>> tf = td.TableFrame({"logs": [
...     "alice-login-2023",
...     "bob-logout-2024",
... ]})
>>>
>>> # Capture 3 fields: user, action, year
>>> log_pattern = r"%{WORD:user}-%{WORD:action}-%{INT:year}"
>>> log_schema = {
...     "user": td_col.Column("user", td.String),
...     "action": td_col.Column("action", td.String),
...     "year": td_col.Column("year", td.Int64),
... }
>>> out = tf.grok("logs", log_pattern, log_schema)
>>> out.collect()
┌──────────────────┬───────┬────────┬──────┐
│ logs             ┆ user  ┆ action ┆ year │
│ ---              ┆ ---   ┆ ---    ┆ ---  │
│ str              ┆ str   ┆ str    ┆ i64  │
╞══════════════════╪═══════╪════════╪══════╡
│ alice-login-2023 ┆ alice ┆ login  ┆ 2023 │
│ bob-logout-2024  ┆ bob   ┆ logout ┆ 2024 │
└──────────────────┴───────┴────────┴──────┘
Notes
The function automatically expands the Grok captures into separate columns.
Non-matching rows will show null for the extracted columns.
If a pattern defines duplicate capture names, numeric suffixes like field, field[1] will be used to disambiguate them.
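To make the capture-to-column behaviour concrete, here is a minimal pure-Python sketch of how a Grok pattern can be expanded into a regular expression with named groups. It is not the tabsdata implementation: the `GROK_PRIMITIVES` table, the helper names, and the use of an unanchored search are assumptions made for illustration only.

```python
import re

# Assumed definitions for two common Grok primitives; real Grok libraries
# ship a much larger pattern set.
GROK_PRIMITIVES = {
    "WORD": r"\w+",
    "INT": r"[+-]?\d+",
}


def grok_to_regex(pattern):
    """Translate %{NAME:field} tokens into named regex groups."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{GROK_PRIMITIVES[name]})"

    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, pattern))


def grok_row(line, compiled):
    """Return one value per capture; all values are None if the row does not match."""
    fields = list(compiled.groupindex)
    match = compiled.search(line) if line is not None else None
    if match is None:
        return {field: None for field in fields}
    return match.groupdict()


compiled = grok_to_regex(r"%{WORD:user}-%{WORD:action}-%{INT:year}")
rows = [grok_row(s, compiled) for s in ["alice-login-2023", "not a log line"]]
```

Each dictionary key corresponds to one extracted column. The captured values are still strings here; converting them to the types declared in the schema would be a separate step.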
- ExprStringNameSpace.contains( ) Expr
- Categories:
string
Evaluate if the string contains a pattern.
- Parameters:
pattern – The pattern to search for.
literal – Take the pattern as a literal string (not a regex).
strict – If the given pattern is not a valid regex, raise an error.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a").str.contains("ab").alias("contains"))
>>>
┌──────┬──────────┐
│ a    ┆ contains │
│ ---  ┆ ---      │
│ str  ┆ bool     │
╞══════╪══════════╡
│ a    ┆ false    │
│ ab   ┆ true     │
│ b    ┆ false    │
│ xaby ┆ true     │
│ null ┆ null     │
└──────┴──────────┘
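The difference between a regex pattern and a literal pattern can be sketched in plain Python. This is only an illustration of the `literal` flag using the standard `re` module, whose regex dialect may differ from the one tabsdata uses; the `contains` helper below is hypothetical.

```python
import re


def contains(value, pattern, literal=False):
    """Mimic the null-propagating contains check on a single value."""
    if value is None:
        return None
    if literal:
        # Literal mode: plain substring test, metacharacters have no meaning.
        return pattern in value
    # Regex mode: the pattern may match anywhere in the value.
    return re.search(pattern, value) is not None


regex_hit = contains("ab", ".")                  # "." matches any character
literal_miss = contains("ab", ".", literal=True)  # no literal dot in "ab"
```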
- ExprStringNameSpace.contains_any(
- patterns: td_typing.IntoExpr,
- *,
- ascii_case_insensitive: bool = False,
- Categories:
string
Evaluate if the string contains any of the given patterns.
- Parameters:
patterns – The patterns to search for.
ascii_case_insensitive – If true, the search is case-insensitive.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a")
...     .str.contains_any(["a", "b"]).alias("contains_any"))
>>>
┌──────┬──────────────┐
│ a    ┆ contains_any │
│ ---  ┆ ---          │
│ str  ┆ bool         │
╞══════╪══════════════╡
│ abc  ┆ true         │
│ axy  ┆ true         │
│ xyb  ┆ true         │
│ xyz  ┆ false        │
│ null ┆ null         │
└──────┴──────────────┘
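As a rough model of `ascii_case_insensitive`, the sketch below folds only the ASCII letters A-Z before doing a plain substring test. The helper and its treatment of patterns as literal substrings are assumptions for illustration, not a description of the library internals.

```python
import string

# Translation table that lowercases ASCII letters only.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def contains_any(value, patterns, ascii_case_insensitive=False):
    """Return True if any pattern occurs in value; None stays None."""
    if value is None:
        return None
    if ascii_case_insensitive:
        value = value.translate(_ASCII_LOWER)
        patterns = [p.translate(_ASCII_LOWER) for p in patterns]
    return any(p in value for p in patterns)


case_sensitive = contains_any("XYB", ["a", "b"])
case_insensitive = contains_any("XYB", ["a", "b"], ascii_case_insensitive=True)
```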
- ExprStringNameSpace.count_matches( ) Expr
- Categories:
string
Count the occurrences of the given pattern in the string.
- Parameters:
pattern – The pattern to count.
literal – Take the pattern as a literal string (not a regex).
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a")
...     .str.count_matches("b.").alias("count_matches"))
>>>
┌───────────┬───────────────┐
│ a         ┆ count_matches │
│ ---       ┆ ---           │
│ str       ┆ u32           │
╞═══════════╪═══════════════╡
│ a bAb c d ┆ 2             │
│ bCbb c d  ┆ 2             │
│ bb        ┆ 1             │
│ b         ┆ 0             │
│ a         ┆ 0             │
│ null      ┆ null          │
└───────────┴───────────────┘
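The counts in the table follow from non-overlapping, left-to-right matching. A small sketch with Python's `re` module reproduces them; the `count_matches` helper is hypothetical and only mirrors the documented behaviour.

```python
import re


def count_matches(value, pattern, literal=False):
    """Count non-overlapping matches of pattern in value; None stays None."""
    if value is None:
        return None
    if literal:
        return value.count(pattern)
    return len(re.findall(pattern, value))


# "b." needs a 'b' followed by any one character, so a lone "b" never matches.
counts = [count_matches(v, "b.") for v in ["a bAb c d", "bCbb c d", "bb", "b", None]]
```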
- ExprStringNameSpace.ends_with( ) Expr
- Categories:
string
Evaluate if the string ends with the given suffix.
- Parameters:
suffix – The suffix to search for.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a").str.ends_with("b").alias("ends_with"))
>>>
┌──────┬───────────┐
│ a    ┆ ends_with │
│ ---  ┆ ---       │
│ str  ┆ bool      │
╞══════╪═══════════╡
│ a    ┆ false     │
│ ab   ┆ true      │
│ b    ┆ true      │
│ xaby ┆ false     │
│ null ┆ null      │
└──────┴───────────┘
- ExprStringNameSpace.extract(
- pattern: td_typing.IntoExprColumn,
- group_index: int = 1,
- Categories:
string
Extract a pattern from the string.
- Parameters:
pattern – The pattern to extract.
group_index – The group index to extract.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a").str.extract("(b.b)", 1).alias("extract"))
>>>
┌───────────┬─────────┐
│ a         ┆ extract │
│ ---       ┆ ---     │
│ str       ┆ str     │
╞═══════════╪═════════╡
│ a bAb c d ┆ bAb     │
│ bCbb c d  ┆ bCb     │
│ bb        ┆ null    │
│ null      ┆ null    │
└───────────┴─────────┘
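The role of `group_index` is easiest to see with a pattern that has more than one capture group. The sketch below uses Python's `re` module and assumes the usual convention that index 1 is the first parenthesised group; the `extract` helper is illustrative only.

```python
import re


def extract(value, pattern, group_index=1):
    """Return the requested capture group of the first match, else None."""
    if value is None:
        return None
    match = re.search(pattern, value)
    return match.group(group_index) if match else None


first = extract("a bAb c d", "(b.b)", 1)
no_match = extract("bb", "(b.b)", 1)
# Two groups: index 1 is the key, index 2 is the value.
key = extract("level=warn", r"(\w+)=(\w+)", 1)
val = extract("level=warn", r"(\w+)=(\w+)", 2)
```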
- ExprStringNameSpace.find( ) Expr
- Categories:
string
Find the position of the first occurrence of the given pattern.
- Parameters:
pattern – The pattern to search for.
literal – Take the pattern as a literal string (not a regex).
strict – If the given pattern is not a valid regex, raise an error.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a").str.find("b").alias("find"))
>>>
┌──────┬──────┐
│ a    ┆ find │
│ ---  ┆ ---  │
│ str  ┆ u32  │
╞══════╪══════╡
│ a    ┆ null │
│ ab   ┆ 1    │
│ b    ┆ 0    │
│ xaby ┆ 2    │
│ null ┆ null │
└──────┴──────┘
- ExprStringNameSpace.grok( ) Expr
- Categories:
string
Parse log text into structured fields using a Grok pattern.
Applies the given Grok pattern to the values in the current string expression. Each named capture group in the pattern becomes a new output column. Rows that do not match the pattern will return null for the extracted fields.
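- Parameters:
pattern (str) – Grok pattern with named captures (e.g., %{WORD:user}).
schema (dict[str, td_col.Column]) – A mapping from each capture name to its column definition (column name and data type).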
Example:
>>> import tabsdata as td
>>> tf = td.TableFrame({"logs": [
...     "alice-login-2023",
...     "bob-logout-2024",
... ]})
>>>
>>> log_pattern = r"%{WORD:user}-%{WORD:action}-%{INT:year}"
>>> log_schema = {
...     "user": td_col.Column("user", td.String),
...     "action": td_col.Column("action", td.String),
...     "year": td_col.Column("year", td.Int64),
... }
>>> tf.select(
...     td.col("logs"),
...     td.col("logs").str.grok(log_pattern, log_schema)
... )
>>>
┌──────────────────┬───────┬────────┬──────┐
│ logs             ┆ user  ┆ action ┆ year │
│ ---              ┆ ---   ┆ ---    ┆ ---  │
│ str              ┆ str   ┆ str    ┆ i64  │
╞══════════════════╪═══════╪════════╪══════╡
│ alice-login-2023 ┆ alice ┆ login  ┆ 2023 │
│ bob-logout-2024  ┆ bob   ┆ logout ┆ 2024 │
└──────────────────┴───────┴────────┴──────┘
Notes
The function automatically expands the Grok captures into separate columns.
Non-matching rows will show null for the extracted columns.
If a pattern defines duplicate capture names, numeric suffixes like field, field[1] will be used to disambiguate them.
- ExprStringNameSpace.head(
- n: int | td_typing.IntoExprColumn,
- Categories:
string
Extract the start of the string up to the given length.
- Parameters:
n – The length of the head.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a").str.head(2).alias("head"))
>>>
┌──────┬──────┐
│ a    ┆ head │
│ ---  ┆ ---  │
│ str  ┆ str  │
╞══════╪══════╡
│ abc  ┆ ab   │
│ a    ┆ a    │
│ null ┆ null │
└──────┴──────┘
- ExprStringNameSpace.len_bytes() Expr
- Categories:
string
Return the number of bytes (not characters) in a string.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a").str.len_bytes().alias("len_bytes"))
>>>
┌──────┬───────────┐
│ a    ┆ len_bytes │
│ ---  ┆ ---       │
│ str  ┆ u32       │
╞══════╪═══════════╡
│ ab   ┆ 2         │
│ 再   ┆ 3         │
│ null ┆ null      │
└──────┴───────────┘
- ExprStringNameSpace.len_chars() Expr
- Categories:
string
Return the number of characters (not bytes) in a string.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a").str.len_chars().alias("len_chars"))
>>>
┌──────┬───────────┐
│ a    ┆ len_chars │
│ ---  ┆ ---       │
│ str  ┆ u32       │
╞══════╪═══════════╡
│ ab   ┆ 2         │
│ 再   ┆ 1         │
│ null ┆ null      │
└──────┴───────────┘
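The byte count and the character count only differ for non-ASCII text. The following plain-Python sketch shows the distinction, assuming strings are stored as UTF-8; the two helper functions are illustrative stand-ins, not library code.

```python
def len_bytes(value):
    """Number of bytes in the UTF-8 encoding of value; None stays None."""
    return None if value is None else len(value.encode("utf-8"))


def len_chars(value):
    """Number of Unicode characters (code points) in value; None stays None."""
    return None if value is None else len(value)


samples = ["ab", "再", None]
byte_lengths = [len_bytes(s) for s in samples]
char_lengths = [len_chars(s) for s in samples]
```

The character "再" is a single code point that takes three bytes in UTF-8, so the two results diverge only on that row.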
- ExprStringNameSpace.pad_end( ) Expr
- Categories:
string
Pad string values at the end to the given length using the given fill character.
- Parameters:
length – The length to end pad the string to.
fill_char – The character to use for padding.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a").str.pad_end(6, "-").alias("pad_end"))
>>>
┌────────┬─────────┐
│ a      ┆ pad_end │
│ ---    ┆ ---     │
│ str    ┆ str     │
╞════════╪═════════╡
│ abc    ┆ abc---  │
│ def    ┆ def     │
│ null   ┆ null    │
└────────┴─────────┘
- ExprStringNameSpace.pad_start( ) Expr
- Categories:
string
Pad string values at the front to the given length using the given fill character.
- Parameters:
length – The length to front pad the string to.
fill_char – The character to use for padding.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a").str.pad_start(6, "-").alias("pad_start"))
>>>
┌────────┬───────────┐
│ a      ┆ pad_start │
│ ---    ┆ ---       │
│ str    ┆ str       │
╞════════╪═══════════╡
│ abc    ┆ ---abc    │
│ def    ┆ def       │
│ null   ┆ null      │
└────────┴───────────┘
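Padding to a target length leaves values that are already long enough unchanged. A minimal sketch using Python's `str.ljust` and `str.rjust` illustrates this; the helper names mirror the methods above, and the default fill character of a space is an assumption.

```python
def pad_end(value, length, fill_char=" "):
    """Append fill_char until value reaches length; None stays None."""
    return None if value is None else value.ljust(length, fill_char)


def pad_start(value, length, fill_char=" "):
    """Prepend fill_char until value reaches length; None stays None."""
    return None if value is None else value.rjust(length, fill_char)


padded_end = [pad_end(v, 6, "-") for v in ["abc", "abcdefgh", None]]
padded_start = [pad_start(v, 6, "-") for v in ["abc", "abcdefgh", None]]
```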
- ExprStringNameSpace.replace( ) Expr
- Categories:
string
Replace the first occurrence of a pattern with the given string.
- Parameters:
pattern – The pattern to replace.
value – The value to replace the pattern with.
literal – Take the pattern as a literal string (not a regex).
n – Number of matches to replace.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a").str.replace("b", "X").alias("replace"))
>>>
┌───────────┬───────────┐
│ a         ┆ replace   │
│ ---       ┆ ---       │
│ str       ┆ str       │
╞═══════════╪═══════════╡
│ a bAb c d ┆ a XAb c d │
│ bCbb c d  ┆ XCbb c d  │
│ bb        ┆ Xb        │
│ b         ┆ X         │
│ a         ┆ a         │
│ null      ┆ null      │
└───────────┴───────────┘
- ExprStringNameSpace.replace_all( ) Expr
- Categories:
string
Replace all occurrences of a pattern with the given string.
- Parameters:
pattern – The pattern to replace.
value – The value to replace the pattern with.
literal – Take the pattern as a literal string (not a regex).
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a").str.replace_all("b", "X").alias("replace_all"))
>>>
┌───────────┬─────────────┐
│ a         ┆ replace_all │
│ ---       ┆ ---         │
│ str       ┆ str         │
╞═══════════╪═════════════╡
│ a bAb c d ┆ a XAX c d   │
│ bCbb c d  ┆ XCXX c d    │
│ bb        ┆ XX          │
│ b         ┆ X           │
│ a         ┆ a           │
│ null      ┆ null        │
└───────────┴─────────────┘
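The contrast between replacing the first match and replacing every match can be reproduced with Python's `re.sub` and its `count` argument. The helpers below are hypothetical and only model the behaviour shown in the two tables.

```python
import re


def replace(value, pattern, new, n=1):
    """Replace at most n matches, left to right; None stays None."""
    return None if value is None else re.sub(pattern, new, value, count=n)


def replace_all(value, pattern, new):
    """Replace every match; in re.sub, count=0 means no limit."""
    return None if value is None else re.sub(pattern, new, value, count=0)


inputs = ["a bAb c d", "bCbb c d", "bb", "a", None]
first_only = [replace(v, "b", "X") for v in inputs]
every_match = [replace_all(v, "b", "X") for v in inputs]
```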
- ExprStringNameSpace.replace_many(patterns: td_typing.IntoExpr | Mapping[str, str], replace_with: td_typing.IntoExpr | NoDefault = <no_default>, *, ascii_case_insensitive: bool = False) td_expr.Expr
- Categories:
string
Replace all occurrences of any of the given patterns with the given string.
- Parameters:
patterns – The patterns to replace.
replace_with – The value to replace the pattern with.
ascii_case_insensitive – If true, the search is case-insensitive.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a")
...     .str.replace_many(["a", "b"], "X").alias("replace_many"))
>>>
┌──────┬──────────────┐
│ a    ┆ replace_many │
│ ---  ┆ ---          │
│ str  ┆ str          │
╞══════╪══════════════╡
│ abc  ┆ XXc          │
│ axy  ┆ Xxy          │
│ xyb  ┆ xyX          │
│ xyz  ┆ xyz          │
│ null ┆ null         │
└──────┴──────────────┘
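Since the signature also accepts a mapping of patterns to replacements, a short sketch may help show why a single pass over the string matters. This pure-Python model treats patterns as literal substrings and scans the input once; whether tabsdata behaves exactly this way is an assumption, and the helper is hypothetical.

```python
import re


def replace_many(value, patterns, replace_with=None):
    """Replace literal patterns in one left-to-right pass; None stays None."""
    if value is None:
        return None
    if isinstance(patterns, dict):
        mapping = dict(patterns)
    else:
        mapping = {p: replace_with for p in patterns}
    # One alternation of escaped literals, so replaced text is never rescanned.
    alternation = "|".join(re.escape(p) for p in mapping)
    return re.sub(alternation, lambda m: mapping[m.group(0)], value)


same_value = [replace_many(v, ["a", "b"], "X") for v in ["abc", "xyz", None]]
swapped = replace_many("abc", {"a": "b", "b": "a"})
```

With chained single replacements the swap would collapse to "aac", because the first replacement's output would be rewritten by the second.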
- ExprStringNameSpace.reverse() Expr
- Categories:
string
Reverse the string.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a").str.reverse().alias("reverse"))
>>>
┌──────┬─────────┐
│ a    ┆ reverse │
│ ---  ┆ ---     │
│ str  ┆ str     │
╞══════╪═════════╡
│ abc  ┆ cba     │
│ a    ┆ a       │
│ null ┆ null    │
└──────┴─────────┘
- ExprStringNameSpace.slice( ) td_expr.Expr
- Categories:
string
Extract the substring at the given offset for the given length.
- Parameters:
offset – The offset to start the slice.
length – The length of the slice. If None, slice until the end of the string.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a").str.slice(1,1).alias("slice"))
>>>
┌──────┬───────┐
│ a    ┆ slice │
│ ---  ┆ ---   │
│ str  ┆ str   │
╞══════╪═══════╡
│ abc  ┆ b     │
│ a    ┆       │
│ null ┆ null  │
└──────┴───────┘
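The offset and length semantics map directly onto Python slicing. The sketch below covers non-negative offsets only, which is a simplification chosen for this illustration; the `str_slice` helper is hypothetical.

```python
def str_slice(value, offset, length=None):
    """Substring starting at offset; length=None runs to the end of the string."""
    if value is None:
        return None
    if length is None:
        return value[offset:]
    return value[offset:offset + length]


one_char = [str_slice(v, 1, 1) for v in ["abc", "a", None]]
to_end = str_slice("abc", 1)
```

An offset at or past the end of the string yields an empty string rather than null, which matches the second row of the table.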
- ExprStringNameSpace.starts_with( ) Expr
- Categories:
string
Evaluate if the string starts with the given prefix.
- Parameters:
prefix – The prefix to search for.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a")
...     .str.starts_with("a").alias("starts_with"))
>>>
┌──────┬─────────────┐
│ a    ┆ starts_with │
│ ---  ┆ ---         │
│ str  ┆ bool        │
╞══════╪═════════════╡
│ a    ┆ true        │
│ ab   ┆ true        │
│ b    ┆ false       │
│ xaby ┆ false       │
│ null ┆ null        │
└──────┴─────────────┘
- ExprStringNameSpace.strip_chars(
- characters: td_typing.IntoExpr = None,
- Categories:
string
Trim string values.
- Parameters:
characters – Characters to trim from the start and end of the string. All characters in the given string are removed, regardless of order. Default is whitespace.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a")
...     .str.strip_chars("a ").alias("strip_chars"))
>>>
┌────────────┬─────────────┐
│ a          ┆ strip_chars │
│ ---        ┆ ---         │
│ str        ┆ str         │
╞════════════╪═════════════╡
│ acba cda … ┆ cba cd      │
│ xy z       ┆ xy z        │
│ null       ┆ null        │
└────────────┴─────────────┘
- ExprStringNameSpace.strip_chars_end(
- characters: td_typing.IntoExpr = None,
- Categories:
string
Trim string values from the end of the string.
- Parameters:
characters – Characters to trim from the end of the string. All trailing characters found in the given string are removed, regardless of order. Default is whitespace.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a")
...     .str.strip_chars_end("dc ").alias("strip_chars_end"))
>>>
┌────────┬─────────────────┐
│ a      ┆ strip_chars_end │
│ ---    ┆ ---             │
│ str    ┆ str             │
╞════════╪═════════════════╡
│ cba cd ┆ cba             │
│ xy z   ┆ xy z            │
│ null   ┆ null            │
└────────┴─────────────────┘
- ExprStringNameSpace.strip_chars_start(
- characters: td_typing.IntoExpr = None,
- Categories:
string
Trim string values from the start of the string.
- Parameters:
characters – Characters to trim from the start of the string. All leading characters found in the given string are removed, regardless of order. Default is whitespace.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a")
...     .str.strip_chars_start("abc").alias("strip_chars_start"))
>>>
┌────────┬───────────────────┐
│ a      ┆ strip_chars_start │
│ ---    ┆ ---               │
│ str    ┆ str               │
╞════════╪═══════════════════╡
│ cba cd ┆ cd                │
│ xy z   ┆ xy z              │
│ null   ┆ null              │
└────────┴───────────────────┘
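The three `strip_chars` variants treat their argument as a set of characters, not as a substring. Python's `str.strip`, `str.rstrip` and `str.lstrip` follow the same convention, so they serve as a quick mental model; this is an analogy, not the library's implementation.

```python
# The argument is a set of characters: order and repetition do not matter.
both_ends = "acba cda".strip("a ")    # trims 'a' and ' ' from both sides
end_only = "cba cd".rstrip("dc ")     # stops at 'a', which is not in the set
start_only = "cba cd".lstrip("abc")   # stops at the space, which is not in the set
same_set = "acba cda".strip(" a")     # same set in a different order
```

Note that `start_only` keeps its leading space, because the space was not among the characters to strip.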
- ExprStringNameSpace.strip_prefix(
- prefix: td_typing.IntoExpr,
- Categories:
string
Trim string values by removing the given prefix.
- Parameters:
prefix – Prefix to remove from the string.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a")
...     .str.strip_prefix("cb").alias("strip_prefix"))
>>>
┌────────┬──────────────┐
│ a      ┆ strip_prefix │
│ ---    ┆ ---          │
│ str    ┆ str          │
╞════════╪══════════════╡
│ cba cd ┆ a cd         │
│ bx     ┆ bx           │
│ null   ┆ null         │
└────────┴──────────────┘
- ExprStringNameSpace.strip_suffix(
- suffix: td_typing.IntoExpr,
- Categories:
string
Trim string values by removing the given suffix.
- Parameters:
suffix – Suffix to remove from the string.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a")
...     .str.strip_suffix("cd").alias("strip_suffix"))
>>>
┌────────┬──────────────┐
│ a      ┆ strip_suffix │
│ ---    ┆ ---          │
│ str    ┆ str          │
╞════════╪══════════════╡
│ cba cd ┆ cba          │
│ bx     ┆ bx           │
│ null   ┆ null         │
└────────┴──────────────┘
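Unlike the `strip_chars` family, `strip_prefix` and `strip_suffix` remove an exact substring, and only if it is present. Python 3.9+ offers `str.removeprefix` and `str.removesuffix` with the same idea, which the sketch below uses as an analogy.

```python
# Exact-substring removal: "bx" does not start with "cb", so it is unchanged.
prefix_removed = "cba cd".removeprefix("cb")
prefix_absent = "bx".removeprefix("cb")

# Character-set stripping would behave differently on the same input.
charset_stripped = "bx".lstrip("cb")

suffix_removed = "cba cd".removesuffix("cd")
```

The value "bx" is the telling case: prefix removal leaves it intact, while character-set stripping would drop the leading "b".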
- ExprStringNameSpace.tail(
- n: int | td_typing.IntoExprColumn,
- Categories:
string
Extract the end of the string up to the given length.
- Parameters:
n – The length of the tail.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a").str.tail(2).alias("tail"))
>>>
┌──────┬──────┐
│ a    ┆ tail │
│ ---  ┆ ---  │
│ str  ┆ str  │
╞══════╪══════╡
│ abc  ┆ bc   │
│ a    ┆ a    │
│ null ┆ null │
└──────┴──────┘
- ExprStringNameSpace.to_lowercase() Expr
- Categories:
string
Return the lowercase of a string.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a").str.to_lowercase().alias("to_lowercase"))
>>>
┌──────┬──────────────┐
│ a    ┆ to_lowercase │
│ ---  ┆ ---          │
│ str  ┆ str          │
╞══════╪══════════════╡
│ aB   ┆ ab           │
│ null ┆ null         │
└──────┴──────────────┘
- ExprStringNameSpace.to_titlecase() Expr
- Categories:
string
Uppercase the first character of a string and lowercase all the others.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a").str.to_titlecase().alias("titlecase"))
>>>
┌──────┬───────────┐
│ a    ┆ titlecase │
│ ---  ┆ ---       │
│ str  ┆ str       │
╞══════╪═══════════╡
│ ab   ┆ Ab        │
│ Ab   ┆ Ab        │
│ AB   ┆ Ab        │
│ aB   ┆ Ab        │
│ null ┆ null      │
└──────┴───────────┘
- ExprStringNameSpace.to_uppercase() Expr
- Categories:
string
Return the uppercase of a string.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a").str.to_uppercase().alias("to_uppercase"))
>>>
┌──────┬──────────────┐
│ a    ┆ to_uppercase │
│ ---  ┆ ---          │
│ str  ┆ str          │
╞══════╪══════════════╡
│ aB   ┆ AB           │
│ null ┆ null         │
└──────┴──────────────┘
- ExprStringNameSpace.zfill(
- length: int | td_typing.IntoExprColumn,
- Categories:
string
Pad numeric string values at the start to the given length using zeros.
- Parameters:
length – The length to pad the string to.
Example:
>>> import tabsdata as td
>>>
>>> tf: td.TableFrame ...
>>>
>>> tf.select(td.col("a"), td.col("a").str.zfill(2).alias("zfill"))
>>>
┌──────┬───────┐
│ a    ┆ zfill │
│ ---  ┆ ---   │
│ str  ┆ str   │
╞══════╪═══════╡
│ 0    ┆ 00    │
│ 1    ┆ 01    │
│ 1000 ┆ 1000  │
│ null ┆ null  │
└──────┴───────┘
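Zero-filling pads on the left and never truncates. Python's built-in `str.zfill` behaves the same way for the unsigned values in the table, so it can serve as a quick check; the wrapper below is a hypothetical helper that adds null propagation.

```python
def zfill(value, length):
    """Left-pad value with zeros up to length; None stays None."""
    return None if value is None else value.zfill(length)


filled = [zfill(v, 2) for v in ["0", "1", "1000", None]]
```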