tabsdata.dataquality

class AllOk(
none_is_ok: bool = False,
nan_is_ok: bool = False,
*,
tags: str | list[str] | None = None,
)

Bases: BoolCriteria

Categories:

dq-operator

A boolean criteria that selects records where all boolean classifiers pass (True).

class AnyFailed(
none_is_ok: bool = False,
nan_is_ok: bool = False,
*,
tags: str | list[str] | None = None,
)

Bases: BoolCriteria

Categories:

dq-operator

A boolean criteria that selects records where at least one boolean classifier fails (False).
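
The selection logic of AllOk and AnyFailed can be pictured with a small pure-Python sketch. It is not the library's implementation: the helper names are hypothetical, a row is modelled as a list of per-classifier results, and treating a None result as a failure unless none_is_ok is set is an assumption. NaN handling (nan_is_ok) is left out.

```python
from typing import Optional, Sequence


def all_ok(results: Sequence[Optional[bool]], none_is_ok: bool = False) -> bool:
    """AllOk-style check: every classifier result must pass.

    Assumption: a None result counts as a pass only when none_is_ok is True.
    """
    return all(none_is_ok if r is None else r for r in results)


def any_failed(results: Sequence[Optional[bool]], none_is_ok: bool = False) -> bool:
    """AnyFailed-style check, modelled here as the complement of all_ok."""
    return not all_ok(results, none_is_ok)
```

Under these assumptions a row with results [True, None] is selected by any_failed by default, and by all_ok once none_is_ok=True is passed.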

class BoolClassifier(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
on_wrong_value: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: Classifier, ABC

Abstract base class for classifiers that produce a boolean result (True or False).

classmethod supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by boolean classifiers.

class BoolCriteria(
none_is_ok: bool,
nan_is_ok: bool,
*,
tags: str | list[str] | None = None,
)

Bases: Criteria, ABC

A boolean criteria to select records based on classifier results.

property nan_is_ok: bool

Returns whether NaN values should be considered OK.

property none_is_ok: bool

Returns whether None values should be considered OK.

property tags: list[str] | None

Returns the list of tags associated with this criteria.

class Categorizer(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
on_wrong_value: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: Classifier, ABC

A base class for categorizers.

classmethod Categorizer.supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by this classifier.

class CategoryCriteria(
column_name: str,
bins: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=100)])] | Literal['none', 'nan', 'underflow', 'overflow'] | Annotated[list[Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=100)])] | Literal['none', 'nan', 'underflow', 'overflow']], Strict],
)

Bases: Criteria, ABC

A category criteria to select records based on classifier results.

property bins: AbstractSet[Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=100)])] | Literal['none', 'nan', 'underflow', 'overflow']]

Returns a set of bins.

property column_name: str

Returns the column name to check for values.

class Classifier(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
on_wrong_value: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: ABC

Abstract base class for all data quality classifiers.

A classifier is applied to one or more columns to produce a classification result, which is stored in a new column. It supports tags to allow data quality operations to selectively act on specific classifiers.

property column_names: list[tuple[str, str | None]] | None

Returns the list of columns the classifier applies to, including optional destination column names.
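
The constructor accepts several shapes for column_names, while the property always returns a list of (source, destination) pairs or None. A minimal sketch of that normalization, assuming a bare string means "no explicit destination column" (the function name is hypothetical and this is not the library's code):

```python
from typing import List, Optional, Tuple, Union

ColumnSpec = Union[str, Tuple[str, Optional[str]]]


def normalize_column_names(
    column_names: Union[ColumnSpec, List[ColumnSpec], None],
) -> Optional[List[Tuple[str, Optional[str]]]]:
    """Turn any accepted column_names shape into a list of (source, destination) pairs."""
    if column_names is None:
        return None
    # A single string or a single (source, destination) tuple becomes a one-item list.
    if isinstance(column_names, (str, tuple)):
        column_names = [column_names]
    # Bare strings get a None destination (assumption: the library picks a name itself).
    return [(c, None) if isinstance(c, str) else (c[0], c[1]) for c in column_names]
```

For example, normalize_column_names("amount") gives [("amount", None)], and normalize_column_names([("amount", "amount_ok")]) keeps the explicit destination name.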

property on_missing_column: Literal['ignore', 'fail']

Returns what to do if the column is missing.

property on_wrong_type: Literal['ignore', 'fail']

Returns what to do if the column type is incompatible with the classifier.

property on_wrong_value: Literal['ignore', 'fail']

Returns what to do if a column value is incompatible with the classifier.

classmethod supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by this classifier.

property tags: list[str] | None

Returns the list of tags associated with the classifier.

class Criteria

Bases: ABC

Defines the conditions for selecting records based on classifier results.

A Criteria object specifies which rows of a table should be considered for a data quality operation. This can be based on the outcomes of boolean classifiers or the binning results of categorical classifiers.

class DataQuality(
table: str,
classifiers: Classifier | list[Classifier],
operators: Operator | list[Operator],
)

Bases: OnTablesAction

Categories:

dq-action

Represents a data quality action to be performed on a table.

This class encapsulates a set of data quality rules, defined by classifiers and operators, that are applied to a specified table. It serves as a container for a complete data quality workflow.

property classifiers: list[Classifier]

Returns the list of classifiers used in this data quality action.

property operators: list[Operator]

Returns the list of operators that will be executed based on the classifier results.

property table: str

Returns the name of the table targeted by this data quality action.
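
As an illustration of how the pieces fit together, the sketch below composes a DataQuality action using only the constructor signatures listed in this reference. The table and column names are hypothetical, and the module is passed in as a parameter so the sketch itself does not depend on an installed package. How the resulting action is attached to a Tabsdata function is not covered here.

```python
def build_orders_check(dq):
    """Compose a data quality action.

    `dq` is expected to be the tabsdata.dataquality module, for example
    `import tabsdata.dataquality as dq`. All table and column names are made up.
    """
    return dq.DataQuality(
        table="orders",
        classifiers=[
            # Each classifier writes its result to a new column.
            dq.IsNotNull("order_id"),
            dq.IsPositive("amount", tags="amount"),
        ],
        operators=[
            # Move rows with at least one failed check to a separate table.
            dq.Filter(dq.AnyFailed(), to_table="orders_rejected"),
            # Produce a summary report.
            dq.Summary(),
        ],
    )
```

Classifiers are listed before operators because, per the descriptions above, operators act on the classifier results.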

class DoesNotMatch(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
*,
pattern: str,
tags: str | list[str] | None = None,
)

Bases: MatchClassifier

Categories:

dq-classifier

A boolean classifier that checks if a value does not match a regex pattern.

classmethod DoesNotMatch.supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by DoesNotMatch classifier.

class Enrich(
to_table: str | None = None,
*,
tags: str | list[str] | None = None,
)

Bases: TagOperator

Categories:

dq-operator

An operator that adds the data quality classifier columns to the original table frame. It does not remove any rows from the original table.

property to_table: str | None

Returns the destination table with the enriched data quality classifier columns. If None, the original table is used.

class ExponentialScale(
min_val: Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)],
max_val: Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)],
base: Annotated[float, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Gt(gt=0)])] | None = None,
bins: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=1), Le(le=100)])] = 100,
use_bin_zero: bool = False,
)

Bases: NonIdentityScale

Categories:

dq-classifier

An exponential scale.

property base: Annotated[float, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Gt(gt=0)])]

Returns the base of the exponential scale.

class Fail(
criteria: Criteria,
threshold: Threshold,
)

Bases: CriteriaOperator

Categories:

dq-operator

An operator that fails the current transaction if a data quality threshold is met.

property threshold: Threshold

Returns the threshold for this Fail operation.

class Filter(
criteria: Criteria,
to_table: str | None = None,
include_quality_columns: Literal['none', 'criteria', 'all'] = 'none',
)

Bases: CriteriaOperator

Categories:

dq-operator

An operator that filters rows from a table based on data quality criteria.

The filtered rows are removed from the table being processed. They can either be discarded or redirected to another table.

property include_quality_columns: Literal['none', 'criteria', 'all']

Returns the data quality columns to include in the table with filtered out data.

property to_table: str | None

Returns the destination table for filtered rows, or None if they are discarded.
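
The difference between Filter and the Select operator (documented further down) is whether the processed table changes. A pure-Python sketch with rows as dictionaries, assuming the criteria can be modelled as a predicate over a row; the function names are hypothetical and this is not the library's code:

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def filter_rows(rows: List[Row], is_selected: Callable[[Row], bool]) -> Tuple[List[Row], List[Row]]:
    """Filter-style: selected rows leave the table. Returns (kept, removed).

    The removed rows would be discarded, or written to to_table when it is set.
    """
    kept = [r for r in rows if not is_selected(r)]
    removed = [r for r in rows if is_selected(r)]
    return kept, removed


def select_rows(rows: List[Row], is_selected: Callable[[Row], bool]) -> Tuple[List[Row], List[Row]]:
    """Select-style: the table is unchanged. Returns (table, copy of selected rows)."""
    return list(rows), [r for r in rows if is_selected(r)]
```

With two rows of which one fails its checks, filter_rows leaves one row in the table, while select_rows leaves both and only copies the failing one.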

class HasLength(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
min_len: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0)])] = 0,
max_len: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0)])] = 9223372036854775807,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: BoolClassifier

Categories:

dq-classifier

A boolean classifier that checks if the length of a value is within a given range.

property max: int

Returns the maximum allowed length.

property min: int

Returns the minimum allowed length.

classmethod supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by HasLength classifier.
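
The default for max_len is the largest signed 64-bit integer, so leaving it unset means no upper bound in practice. The check itself can be sketched as follows, assuming both bounds are inclusive (an assumption; the helper is hypothetical):

```python
INT64_MAX = 2**63 - 1  # 9223372036854775807, the documented default for max_len


def has_length(value, min_len: int = 0, max_len: int = INT64_MAX) -> bool:
    """Return True when len(value) lies within [min_len, max_len] (inclusive, by assumption)."""
    return min_len <= len(value) <= max_len
```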

class IdentityScale(
min_val: Annotated[int, Strict(strict=True)],
max_val: Annotated[int, Strict(strict=True)],
use_bin_zero: bool = False,
)

Bases: Scale

Categories:

dq-classifier

An identity scale where each integer value corresponds to a bin.

property bins

Returns the number of bins in the scale.

property scale_range: tuple[Annotated[int, Strict(strict=True)], Annotated[int, Strict(strict=True)]]

Returns the (min, max) range of the scale.

class InBins(
column_name: str,
bins: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=100)])] | Literal['none', 'nan', 'underflow', 'overflow'] | Annotated[list[Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=100)])] | Literal['none', 'nan', 'underflow', 'overflow']], Strict],
)

Bases: CategoryCriteria

Categories:

dq-operator

A category criteria that selects records where the value in the specified column falls into one of the given bins.
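
The bins argument accepts a single bin or a list of bins, where a bin is an integer from 0 to 100 or one of the labels 'none', 'nan', 'underflow', 'overflow'. The sketch below mirrors that validation and the membership test in plain Python. The helper names are hypothetical, and how the library stores the special labels in the category column is not stated in this reference.

```python
SPECIAL_BINS = frozenset({"none", "nan", "underflow", "overflow"})


def _is_valid_bin(b) -> bool:
    if isinstance(b, str):
        return b in SPECIAL_BINS
    # bool is a subclass of int, so exclude it explicitly (the signature is strict).
    return isinstance(b, int) and not isinstance(b, bool) and 0 <= b <= 100


def normalize_bins(bins) -> frozenset:
    """Accept one bin or a list of bins and return them as a set."""
    items = bins if isinstance(bins, list) else [bins]
    invalid = [b for b in items if not _is_valid_bin(b)]
    if invalid:
        raise ValueError(f"invalid bins: {invalid!r}")
    return frozenset(items)


def in_bins(category, bins) -> bool:
    """InBins-style check: the row's category is one of the given bins."""
    return category in normalize_bins(bins)


def not_in_bins(category, bins) -> bool:
    """NotInBins-style check: the complement of in_bins."""
    return not in_bins(category, bins)
```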

class IsBetween(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
min_val: int | float | bool | str | date | time | datetime | timedelta | bytes | Decimal | None = None,
max_val: int | float | bool | str | date | time | datetime | timedelta | bytes | Decimal | None = None,
closed_on: Literal['none', 'lower', 'upper', 'both'] = 'both',
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
on_wrong_value: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: BetweenClassifier

Categories:

dq-classifier

A boolean classifier that checks if a value is inside a specified range.

classmethod IsBetween.supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by IsBetween classifier.
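
The closed_on parameter decides which bounds are inclusive. The sketch below shows one plausible reading, in which 'lower' makes only min_val inclusive, 'upper' only max_val, 'both' both, and 'none' neither. Treating a None bound as "unbounded" is also an assumption, and the helper is hypothetical.

```python
def is_between(value, min_val=None, max_val=None, closed_on="both") -> bool:
    """Range check with configurable bound inclusivity (a sketch, not the library's code)."""
    if min_val is None:
        lower_ok = True  # assumption: no lower bound
    elif closed_on in ("lower", "both"):
        lower_ok = value >= min_val
    else:
        lower_ok = value > min_val

    if max_val is None:
        upper_ok = True  # assumption: no upper bound
    elif closed_on in ("upper", "both"):
        upper_ok = value <= max_val
    else:
        upper_ok = value < max_val

    return lower_ok and upper_ok
```

Under this reading, is_between(1, 1, 5) is True with the default 'both', while is_between(1, 1, 5, closed_on="upper") is False because the lower bound is then exclusive.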

class IsFalse(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: BoolClassifier

Categories:

dq-classifier

A boolean classifier that checks if a value is False.

classmethod supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by IsFalse classifier.

class IsIn(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_value: Literal['ignore', 'fail'] = 'ignore',
*,
values: Collection,
tags: str | list[str] | None = None,
)

Bases: InClassifier

Categories:

dq-classifier

A boolean classifier that checks if a value is in a specified set of values.

classmethod IsIn.supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by IsIn classifier.

class IsNan(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: BoolClassifier

Categories:

dq-classifier

A boolean classifier that checks if a value is NaN (Not a Number).

classmethod supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by IsNan classifier.

class IsNegative(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: BoolClassifier

Categories:

dq-classifier

A boolean classifier that checks if a numeric value is negative.

classmethod supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by IsNegative classifier.

class IsNegativeOrZero(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: BoolClassifier

Categories:

dq-classifier

A boolean classifier that checks if a numeric value is negative or zero.

classmethod supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by IsNegativeOrZero classifier.

class IsNotBetween(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
min_val: int | float | bool | str | date | time | datetime | timedelta | bytes | Decimal | None = None,
max_val: int | float | bool | str | date | time | datetime | timedelta | bytes | Decimal | None = None,
closed_on: Literal['none', 'lower', 'upper', 'both'] = 'both',
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
on_wrong_value: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: BetweenClassifier

Categories:

dq-classifier

A boolean classifier that checks if a value is outside a specified range.

classmethod IsNotBetween.supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by IsNotBetween classifier.

class IsNotIn(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_value: Literal['ignore', 'fail'] = 'ignore',
*,
values: Collection,
tags: str | list[str] | None = None,
)

Bases: InClassifier

Categories:

dq-classifier

A boolean classifier that checks if a value is not in a specified set of values.

classmethod IsNotIn.supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by IsNotIn classifier.

class IsNotNan(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: BoolClassifier

Categories:

dq-classifier

A boolean classifier that checks if a value is not NaN.

classmethod supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by IsNotNan classifier.

class IsNotNull(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: BoolClassifier

Categories:

dq-classifier

A boolean classifier that checks if a value is not NULL.

classmethod supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by IsNotNull classifier.

class IsNotNullNorNan(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: BoolClassifier

Categories:

dq-classifier

A boolean classifier that checks if a value is neither NULL nor NaN.

classmethod supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by IsNotNullNorNan classifier.

class IsNotZero(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: BoolClassifier

Categories:

dq-classifier

A boolean classifier that checks if a numeric value is not zero.

classmethod supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by IsNotZero classifier.

class IsNull(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: BoolClassifier

Categories:

dq-classifier

A boolean classifier that checks if a value is NULL.

classmethod supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by IsNull classifier.

class IsNullOrNan(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: BoolClassifier

Categories:

dq-classifier

A boolean classifier that checks if a value is either NULL or NaN.

classmethod supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by IsNullOrNan classifier.
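
NULL and NaN are distinct: NULL is a missing value of any type, while NaN is a floating-point value. That is presumably why the NaN-related classifiers take on_wrong_type while IsNull and IsNotNull do not. The distinction can be sketched in plain Python, modelling NULL as None (the helpers are hypothetical):

```python
import math


def is_null(value) -> bool:
    """NULL is modelled as None: the value is missing, whatever the column type."""
    return value is None


def is_nan(value) -> bool:
    """NaN only exists for floating-point values."""
    return isinstance(value, float) and math.isnan(value)


def is_null_or_nan(value) -> bool:
    return is_null(value) or is_nan(value)
```

A NaN is not NULL and a NULL is not NaN, so a column that may contain both needs the combined check.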

class IsPositive(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: BoolClassifier

Categories:

dq-classifier

A boolean classifier that checks if a numeric value is positive.

classmethod supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by IsPositive classifier.

class IsPositiveOrZero(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: BoolClassifier

Categories:

dq-classifier

A boolean classifier that checks if a numeric value is positive or zero.

classmethod supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by IsPositiveOrZero classifier.

class IsTrue(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: BoolClassifier

Categories:

dq-classifier

A boolean classifier that checks if a value is True.

classmethod supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by IsTrue classifier.

class IsZero(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
*,
tags: str | list[str] | None = None,
)

Bases: BoolClassifier

Categories:

dq-classifier

A boolean classifier that checks if a numeric value is zero.

classmethod supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by IsZero classifier.

class LinearScale(
min_val: Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)],
max_val: Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)],
bins: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=1), Le(le=100)])] = 100,
use_bin_zero: bool = False,
)

Bases: NonIdentityScale

Categories:

dq-classifier

A linear scale.
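
One way to picture a linear scale is as equal-width bins between min_val and max_val, with special labels for values that cannot be placed. The sketch below is an illustration only. Its assumptions: max_val belongs to the last bin, bins are numbered from 0 when use_bin_zero is True and from 1 otherwise, and the labels 'none', 'nan', 'underflow' and 'overflow' (taken from the bin criteria signatures) mark missing, NaN and out-of-range values. It also assumes min_val < max_val.

```python
import math


def linear_bin(value, min_val, max_val, bins=100, use_bin_zero=False):
    """Assign a value to an equal-width bin (hypothetical helper, not the library's code)."""
    if value is None:
        return "none"
    if isinstance(value, float) and math.isnan(value):
        return "nan"
    if value < min_val:
        return "underflow"
    if value > max_val:
        return "overflow"
    width = (max_val - min_val) / bins
    # Clamp so that value == max_val lands in the last bin instead of one past it.
    index = min(int((value - min_val) / width), bins - 1)
    return index if use_bin_zero else index + 1
```

With the default of 100 bins this numbering yields bins 1 to 100, or 0 to 99 with use_bin_zero=True, which fits the 0 to 100 range accepted by the bin criteria.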

class LogarithmicScale(
min_val: Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)],
max_val: Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)],
base: Annotated[float, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Gt(gt=0)])] | None = None,
bins: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=1), Le(le=100)])] = 100,
use_bin_zero: bool = False,
)

Bases: NonIdentityScale

Categories:

dq-classifier

A logarithmic scale.

property base: Annotated[float, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Gt(gt=0)])]

Returns the base of the logarithmic scale.

class Matches(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
*,
pattern: str,
tags: str | list[str] | None = None,
)

Bases: MatchClassifier

Categories:

dq-classifier

A boolean classifier that checks if a value matches a regex pattern.

classmethod Matches.supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by Matches classifier.
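
The pair Matches and DoesNotMatch can be illustrated with Python's re module. This is a sketch under two assumptions that this reference does not confirm: that the pattern may match anywhere in the value (search semantics) unless anchored, and that the pattern syntax is close enough to Python's for simple cases. The library's regex engine and dialect may differ.

```python
import re


def matches(value: str, pattern: str) -> bool:
    """True when the pattern is found in the value (search semantics, by assumption)."""
    return re.search(pattern, value) is not None


def does_not_match(value: str, pattern: str) -> bool:
    """The complement of matches."""
    return not matches(value, pattern)
```

Anchoring the pattern with ^ and $ makes the intent explicit regardless of which semantics apply, for example r"^[A-Z]{2}-\d+$" for codes such as "AB-123".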

class MonomialScale(
min_val: Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)],
max_val: Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)],
power: Annotated[float, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Gt(gt=0)])],
bins: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=1), Le(le=100)])] = 100,
use_bin_zero: bool = False,
)

Bases: NonIdentityScale

Categories:

dq-classifier

A monomial (power-law) scale.

property power: Annotated[float, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Gt(gt=0)])]

Returns the power of the monomial scale.

class NotInBins(
column_name: str,
bins: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=100)])] | Literal['none', 'nan', 'underflow', 'overflow'] | Annotated[list[Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=100)])] | Literal['none', 'nan', 'underflow', 'overflow']], Strict],
)

Bases: CategoryCriteria

Categories:

dq-operator

A category criteria that selects records where the value in the specified column does not fall into any of the given bins.

class Operator

Bases: ABC

Abstract base class for all data quality operators.

This class provides a common structure for operators, including support for tags, which allow for selective application of classifiers.

class PercentThreshold(
percent: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=100)])] | Annotated[float, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=100)])],
)

Bases: Threshold

Categories:

dq-operator

A threshold based on a percentage of total rows.

property percent: float

Returns the percentage for the threshold.

class RowCountThreshold(
row_count: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0)])],
)

Bases: Threshold

A threshold based on a specific number of rows.

property row_count: int

Returns the row count for the threshold.
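
The two threshold types differ only in what they compare: a share of the table or an absolute number of rows. The sketch below assumes the rows being counted are those selected by the Fail operator's criteria and that a threshold is "met" when the count reaches it (>=). Whether the library uses >= or > is not stated in this reference, and the helper names are hypothetical.

```python
def percent_threshold_met(selected_rows: int, total_rows: int, percent: float) -> bool:
    """PercentThreshold-style check: selected rows as a percentage of all rows."""
    if total_rows == 0:
        return False  # assumption: an empty table never meets a percentage threshold
    return 100.0 * selected_rows / total_rows >= percent


def row_count_threshold_met(selected_rows: int, row_count: int) -> bool:
    """RowCountThreshold-style check: absolute number of selected rows."""
    return selected_rows >= row_count
```

For a table of 100 rows with 5 selected, a 5 percent threshold is met under this reading, while a row count threshold of 10 is not.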

class Scale(
scale_range: tuple[Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)], Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)]],
bins: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=1), Le(le=100)])],
use_bin_zero: bool,
)

Bases: ABC

Abstract base class for scales used in categorization.

A scale defines how numerical data is divided into bins.

property use_bin_zero: bool

Returns whether the minimum value falls in bin zero (True) or in bin one (False).

class ScaleCategorizer(
column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
on_missing_column: Literal['ignore', 'fail'] = 'ignore',
on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
on_wrong_scale_value: Literal['ignore', 'fail'] = 'ignore',
*,
scale: Scale,
tags: str | list[str] | None = None,
)

Bases: Categorizer

Categories:

dq-classifier

A classifier that bucketizes data into bins based on a given scale.

property scale: Scale

Returns the scale used for categorization.

classmethod supported_dtypes() FrozenSet[Type]

Returns the set of data types supported by categorizers.

class Select(
criteria: Criteria,
to_table: str,
include_quality_columns: Literal['none', 'criteria', 'all'] = 'none',
)

Bases: CriteriaOperator

Categories:

dq-operator

An operator that selects rows based on data quality criteria and writes them to another table.

This operation does not modify the original table.

property include_quality_columns: Literal['none', 'criteria', 'all']

Returns the data quality columns to include in the table with selected data.

property to_table: str

Returns the destination table for the selected rows.

class Summary(
table: str | None = None,
*,
tags: str | list[str] | None = None,
)

Bases: TagOperator

Categories:

dq-operator

An operator that generates a data quality summary report for a table.

property table: str | None

Returns the name of the data quality summary table.

class Threshold

Bases: ABC

Abstract base class for thresholds used in the Fail operator.