tabsdata.dataquality
- class AllOk( )
Bases:
BoolCriteria
- Categories:
dq-operator
A boolean criteria that selects records where all boolean classifiers pass (True).
- class AnyFailed( )
Bases:
BoolCriteria
- Categories:
dq-operator
A boolean criteria that selects records where at least one boolean classifier fails (False).
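To illustrate how AllOk and AnyFailed relate, the following plain-Python sketch evaluates both over per-row boolean classifier results. It does not use tabsdata: the helper names `all_ok` and `any_failed` and the sample data are hypothetical, and the handling of None or NaN results (see `none_is_ok` and `nan_is_ok` on BoolCriteria) is deliberately left out.

```python
def all_ok(results: list[bool]) -> bool:
    """True when every boolean classifier result for the row is True."""
    return all(r is True for r in results)


def any_failed(results: list[bool]) -> bool:
    """True when at least one boolean classifier result for the row is False."""
    return any(r is False for r in results)


# Each inner list holds the classifier results for one row (illustrative data).
rows = [[True, True], [True, False], [False, False]]
ok_rows = [i for i, r in enumerate(rows) if all_ok(r)]
failed_rows = [i for i, r in enumerate(rows) if any_failed(r)]
```

Under these definitions the two criteria are complementary for rows whose results are all either True or False.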
- class BoolClassifier(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_value: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
Classifier, ABC
Abstract base class for classifiers that produce a boolean result (True or False).
- classmethod BoolClassifier.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by boolean classifiers.
- class BoolCriteria( )
A boolean criteria to select records based on classifier results.
- property nan_is_ok: bool
Returns whether NaN values should be considered OK.
- property none_is_ok: bool
Returns whether None values should be considered OK.
- class Categorizer(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_value: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
Classifier, ABC
A base class for categorizers.
- classmethod Categorizer.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by this classifier.
- class CategoryCriteria(
- column_name: str,
- bins: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=100)])] | Literal['none', 'nan', 'underflow', 'overflow'] | Annotated[list[Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=100)])] | Literal['none', 'nan', 'underflow', 'overflow']], Strict],
A category criteria to select records based on classifier results.
- property bins: AbstractSet[Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=100)])] | Literal['none', 'nan', 'underflow', 'overflow']]
Returns a set of bins.
- property column_name: str
Returns the column name to check for values.
- class Classifier(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_value: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
ABC
Abstract base class for all data quality classifiers.
A classifier is applied to one or more columns to produce a classification result, which is stored in a new column. It supports tags to allow data quality operations to selectively act on specific classifiers.
- property column_names: list[tuple[str, str | None]] | None
Returns the list of columns the classifier applies to, including optional destination column names.
- property on_missing_column: Literal['ignore', 'fail']
Returns what to do if the column is missing.
- property on_wrong_type: Literal['ignore', 'fail']
Returns what to do if the column type is incompatible with the classifier.
- property on_wrong_value: Literal['ignore', 'fail']
Returns what to do if a column value is incompatible with the classifier.
- classmethod Classifier.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by this classifier.
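The constructor accepts `column_names` in several shapes, while the `column_names` property returns a list of (source, destination) tuples. A minimal sketch of such a normalization is shown below; `normalize_column_names` is a hypothetical helper written for illustration, not the library's implementation, and the rule that a bare string maps to a destination of None is an assumption.

```python
from typing import Optional


def normalize_column_names(column_names) -> Optional[list]:
    """Hypothetical helper: map each accepted input shape to a list of
    (source_column, destination_column_or_None) tuples."""
    if column_names is None:
        return None
    # A single string or a single (source, destination) tuple becomes a one-item list.
    if isinstance(column_names, (str, tuple)):
        column_names = [column_names]
    normalized = []
    for spec in column_names:
        if isinstance(spec, str):
            normalized.append((spec, None))  # assumption: no explicit destination
        else:
            source, destination = spec
            normalized.append((source, destination))
    return normalized
```

For example, `normalize_column_names("amount")` yields `[("amount", None)]` in this sketch.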
- class Criteria
Bases:
ABC
Defines the conditions for selecting records based on classifier results.
A Criteria object specifies which rows of a table should be considered for a data quality operation. This can be based on the outcomes of boolean classifiers or the binning results of categorical classifiers.
- class DataQuality(
- table: str,
- classifiers: Classifier | list[Classifier],
- operators: Operator | list[Operator],
Bases:
OnTablesAction
- Categories:
dq-action
Represents a data quality action to be performed on a table.
This class encapsulates a set of data quality rules, defined by classifiers and operators, that are applied to a specified table. It serves as a container for a complete data quality workflow.
- property classifiers: list[Classifier]
Returns the list of classifiers used in this data quality action.
- property operators: list[Operator]
Returns the list of operators that will be executed based on the classifier results.
- property table: str
Returns the name of the table targeted by this data quality action.
- class DoesNotMatch(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- *,
- pattern: str,
- tags: str | list[str] | None = None,
Bases:
MatchClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a value does not match a regex pattern.
- classmethod DoesNotMatch.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by DoesNotMatch classifier.
- class Enrich( )
Bases:
TagOperator
- Categories:
dq-operator
An operator that adds the data quality classifier columns to the original table frame. It does not remove any row from the original table.
- class ExponentialScale(
- min_val: Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)],
- max_val: Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)],
- base: Annotated[float, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Gt(gt=0)])] | None = None,
- bins: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=1), Le(le=100)])] = 100,
- use_bin_zero: bool = False,
Bases:
NonIdentityScale
- Categories:
dq-classifier
An exponential scale.
- class Fail( )
Bases:
CriteriaOperator
- Categories:
dq-operator
An operator that fails the current transaction if a data quality threshold is met.
- property threshold: Threshold
Returns the threshold for this Fail operation.
- class Filter(
- criteria: Criteria,
- to_table: str | None = None,
- include_quality_columns: Literal['none', 'criteria', 'all'] = 'none',
Bases:
CriteriaOperator
- Categories:
dq-operator
An operator that filters rows from a table based on data quality criteria.
The filtered rows are removed from the table being processed. They can either be discarded or redirected to another table.
- property include_quality_columns: Literal['none', 'criteria', 'all']
Returns which data quality columns are included in the table with the filtered-out data.
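The Filter behaviour described above can be pictured as a partition of the rows. The sketch below is plain Python, not tabsdata code: `filter_rows`, the `amount_ok` column, and the sample rows are hypothetical, and it assumes that the rows matching the criteria are the ones removed from the table.

```python
def filter_rows(rows, matches_criteria):
    """Hypothetical helper: split rows into (kept, filtered_out)."""
    kept, filtered_out = [], []
    for row in rows:
        # Rows matching the criteria leave the table; the rest stay.
        (filtered_out if matches_criteria(row) else kept).append(row)
    return kept, filtered_out


rows = [{"id": 1, "amount_ok": True}, {"id": 2, "amount_ok": False}]
kept, rejected = filter_rows(rows, lambda row: not row["amount_ok"])
```

In this picture, `rejected` is what would be discarded, or redirected when `to_table` is given.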
- class HasLength(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- min_len: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0)])] = 0,
- max_len: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0)])] = 9223372036854775807,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
BoolClassifier
- Categories:
dq-classifier
A boolean classifier that checks if the length of a value is within a given range.
- property max: int
Returns the maximum allowed length.
- property min: int
Returns the minimum allowed length.
- classmethod HasLength.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by HasLength classifier.
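As a rough illustration of the HasLength check, the sketch below tests a string length against a range. It is not the library's code: `has_length` is a hypothetical function, and treating both bounds as inclusive is an assumption the signature does not confirm. The default `max_len` shown in the signature equals 2**63 - 1.

```python
INT64_MAX = 9223372036854775807  # default max_len from the signature (2**63 - 1)


def has_length(value: str, min_len: int = 0, max_len: int = INT64_MAX) -> bool:
    """Hypothetical check: True when len(value) lies in [min_len, max_len].

    Inclusive bounds are an assumption made for this sketch.
    """
    return min_len <= len(value) <= max_len
```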
- class IdentityScale(
- min_val: Annotated[int, Strict(strict=True)],
- max_val: Annotated[int, Strict(strict=True)],
- use_bin_zero: bool = False,
Bases:
Scale
- Categories:
dq-classifier
An identity scale where each integer value corresponds to a bin.
- property bins
Returns the number of bins in the scale.
- class InBins(
- column_name: str,
- bins: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=100)])] | Literal['none', 'nan', 'underflow', 'overflow'] | Annotated[list[Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=100)])] | Literal['none', 'nan', 'underflow', 'overflow']], Strict],
Bases:
CategoryCriteria
- Categories:
dq-operator
A category criteria that selects records where the value in the specified column falls into one of the given bins.
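The `bins` argument accepts a single bin or a list of bins, where a bin is either an integer from 0 to 100 or one of the labels 'none', 'nan', 'underflow', 'overflow'. The following sketch shows one way such input could be validated and used for a membership test. `normalize_bins` and `in_bins` are hypothetical helpers, not part of tabsdata.

```python
SPECIAL_BINS = frozenset({"none", "nan", "underflow", "overflow"})


def normalize_bins(bins) -> frozenset:
    """Hypothetical helper: accept one bin or a list of bins, return a frozenset."""
    items = bins if isinstance(bins, list) else [bins]
    for item in items:
        # bool is excluded to mirror the strict int typing in the signature.
        is_int_bin = (
            isinstance(item, int) and not isinstance(item, bool) and 0 <= item <= 100
        )
        is_label = isinstance(item, str) and item in SPECIAL_BINS
        if not (is_int_bin or is_label):
            raise ValueError(f"invalid bin: {item!r}")
    return frozenset(items)


def in_bins(assigned_bin, bins) -> bool:
    """True when the bin assigned to a value is one of the requested bins."""
    return assigned_bin in normalize_bins(bins)
```

A NotInBins-style check would be the negation of `in_bins` in this sketch.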
- class IsBetween(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- min_val: int | float | bool | str | date | time | datetime | timedelta | bytes | Decimal | None = None,
- max_val: int | float | bool | str | date | time | datetime | timedelta | bytes | Decimal | None = None,
- closed_on: Literal['none', 'lower', 'upper', 'both'] = 'both',
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_value: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
BetweenClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a value is inside a specified range.
- classmethod IsBetween.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by IsBetween classifier.
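The `closed_on` parameter presumably controls which endpoints of the range are inclusive. The sketch below spells out that reading in plain Python. `is_between` is a hypothetical function, and two points are assumptions: that `closed_on` names the inclusive endpoints, and that a bound of None means the range is unbounded on that side.

```python
def is_between(value, min_val=None, max_val=None, closed_on="both") -> bool:
    """Hypothetical range check following the assumed meaning of closed_on."""
    if closed_on not in ("none", "lower", "upper", "both"):
        raise ValueError(f"invalid closed_on: {closed_on!r}")
    lower_ok = True
    if min_val is not None:
        # Lower endpoint is inclusive for 'lower' and 'both'.
        lower_ok = value >= min_val if closed_on in ("lower", "both") else value > min_val
    upper_ok = True
    if max_val is not None:
        # Upper endpoint is inclusive for 'upper' and 'both'.
        upper_ok = value <= max_val if closed_on in ("upper", "both") else value < max_val
    return lower_ok and upper_ok
```

An IsNotBetween-style check would be the negation of this result for non-null values.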
- class IsFalse(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
BoolClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a value is False.
- classmethod IsFalse.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by IsFalse classifier.
- class IsIn(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_value: Literal['ignore', 'fail'] = 'ignore',
- *,
- values: Collection,
- tags: str | list[str] | None = None,
Bases:
InClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a value is in a specified set of values.
- classmethod IsIn.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by IsIn classifier.
- class IsNan(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
BoolClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a value is NaN (Not a Number).
- classmethod IsNan.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by IsNan classifier.
- class IsNegative(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
BoolClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a numeric value is negative.
- classmethod IsNegative.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by IsNegative classifier.
- class IsNegativeOrZero(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
BoolClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a numeric value is negative or zero.
- classmethod IsNegativeOrZero.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by IsNegativeOrZero classifier.
- class IsNotBetween(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- min_val: int | float | bool | str | date | time | datetime | timedelta | bytes | Decimal | None = None,
- max_val: int | float | bool | str | date | time | datetime | timedelta | bytes | Decimal | None = None,
- closed_on: Literal['none', 'lower', 'upper', 'both'] = 'both',
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_value: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
BetweenClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a value is outside a specified range.
- classmethod IsNotBetween.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by IsNotBetween classifier.
- class IsNotIn(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_value: Literal['ignore', 'fail'] = 'ignore',
- *,
- values: Collection,
- tags: str | list[str] | None = None,
Bases:
InClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a value is not in a specified set of values.
- classmethod IsNotIn.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by IsNotIn classifier.
- class IsNotNan(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
BoolClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a value is not NaN.
- classmethod IsNotNan.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by IsNotNan classifier.
- class IsNotNull(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
BoolClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a value is not NULL.
- classmethod IsNotNull.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by IsNotNull classifier.
- class IsNotNullNorNan(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
BoolClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a value is neither NULL nor NaN.
- classmethod IsNotNullNorNan.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by IsNotNullNorNan classifier.
- class IsNotZero(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
BoolClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a numeric value is not zero.
- classmethod IsNotZero.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by IsNotZero classifier.
- class IsNull(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
BoolClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a value is NULL.
- classmethod IsNull.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by IsNull classifier.
- class IsNullOrNan(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
BoolClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a value is either NULL or NaN.
- classmethod IsNullOrNan.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by IsNullOrNan classifier.
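The null and NaN classifiers test different things: NULL is a missing value (None in Python), while NaN is a floating-point value that is present but not a number. The plain-Python sketch below makes the distinction concrete; the function names are hypothetical and this is not tabsdata code.

```python
import math


def is_null(value) -> bool:
    """True for a missing value."""
    return value is None


def is_nan(value) -> bool:
    """True for a float NaN; non-float values are never NaN in this sketch."""
    return isinstance(value, float) and math.isnan(value)


def is_null_or_nan(value) -> bool:
    return is_null(value) or is_nan(value)


values = [1.0, None, float("nan")]
flags = [(is_null(v), is_nan(v), is_null_or_nan(v)) for v in values]
```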
- class IsPositive(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
BoolClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a numeric value is positive.
- classmethod IsPositive.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by IsPositive classifier.
- class IsPositiveOrZero(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
BoolClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a numeric value is positive or zero.
- classmethod IsPositiveOrZero.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by IsPositiveOrZero classifier.
- class IsTrue(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
BoolClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a value is True.
- classmethod IsTrue.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by IsTrue classifier.
- class IsZero(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- *,
- tags: str | list[str] | None = None,
Bases:
BoolClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a numeric value is zero.
- classmethod IsZero.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by IsZero classifier.
- class LinearScale(
- min_val: Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)],
- max_val: Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)],
- bins: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=1), Le(le=100)])] = 100,
- use_bin_zero: bool = False,
Bases:
NonIdentityScale
- Categories:
dq-classifier
A linear scale.
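The entry above does not say how values are assigned to bins, so the following is only a sketch of one plausible linear binning. The assumptions are that the range from `min_val` to `max_val` is split into `bins` equal-width intervals, that bins are numbered from 1 unless `use_bin_zero` is set (then from 0), that `max_val` itself falls into the last bin, and that values outside the range map to 'underflow' or 'overflow'. `linear_bin` is a hypothetical function, not part of tabsdata.

```python
def linear_bin(value, min_val, max_val, bins=100, use_bin_zero=False):
    """Hypothetical equal-width binning over [min_val, max_val]."""
    if max_val <= min_val:
        raise ValueError("max_val must be greater than min_val")
    if value < min_val:
        return "underflow"
    if value > max_val:
        return "overflow"
    width = (max_val - min_val) / bins
    # Clamp so that value == max_val lands in the last bin instead of one past it.
    index = min(int((value - min_val) / width), bins - 1)
    return index if use_bin_zero else index + 1
```

With `bins=10` over the range 0 to 100, the value 55 lands in bin 6 under the default numbering of this sketch.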
- class LogarithmicScale(
- min_val: Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)],
- max_val: Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)],
- base: Annotated[float, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Gt(gt=0)])] | None = None,
- bins: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=1), Le(le=100)])] = 100,
- use_bin_zero: bool = False,
Bases:
NonIdentityScale
- Categories:
dq-classifier
A logarithmic scale.
- class Matches(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- *,
- pattern: str,
- tags: str | list[str] | None = None,
Bases:
MatchClassifier
- Categories:
dq-classifier
A boolean classifier that checks if a value matches a regex pattern.
- classmethod Matches.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by Matches classifier.
- class MonomialScale(
- min_val: Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)],
- max_val: Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)],
- power: Annotated[float, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Gt(gt=0)])],
- bins: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=1), Le(le=100)])] = 100,
- use_bin_zero: bool = False,
Bases:
NonIdentityScale
- Categories:
dq-classifier
A monomial (power-law) scale.
- class NotInBins(
- column_name: str,
- bins: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=100)])] | Literal['none', 'nan', 'underflow', 'overflow'] | Annotated[list[Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=100)])] | Literal['none', 'nan', 'underflow', 'overflow']], Strict],
Bases:
CategoryCriteria
- Categories:
dq-operator
A category criteria that selects records where the value in the specified column does not fall into any of the given bins.
- class Operator
Bases:
ABC
Abstract base class for all data quality operators.
This class provides a common structure for operators, including support for tags, which allow for selective application of classifiers.
- class PercentThreshold(
- percent: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=100)])] | Annotated[float, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0), Le(le=100)])],
Bases:
Threshold
- Categories:
dq-operator
A threshold based on a percentage of total rows.
- property percent: float
Returns the percentage for the threshold.
- class RowCountThreshold(
- row_count: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0)])],
Bases:
Threshold
A threshold based on a specific number of rows.
- property row_count: int
Returns the row count for the threshold.
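To make the two threshold kinds concrete, the sketch below checks whether a count of matching rows meets a percentage or a row-count threshold. The function names are hypothetical, and the comparison used (greater than or equal) is an assumption: the reference does not state whether a threshold is met at equality or only when exceeded.

```python
def percent_threshold_met(matching_rows: int, total_rows: int, percent: float) -> bool:
    """Hypothetical check: matching rows as a percentage of all rows."""
    if total_rows == 0:
        return False  # assumption: an empty table never meets a percent threshold
    return 100.0 * matching_rows / total_rows >= percent


def row_count_threshold_met(matching_rows: int, row_count: int) -> bool:
    """Hypothetical check: absolute number of matching rows."""
    return matching_rows >= row_count
```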
- class Scale(
- scale_range: tuple[Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)], Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)]],
- bins: Annotated[int, Strict(strict=True), FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=1), Le(le=100)])],
- use_bin_zero: bool,
Bases:
ABC
Abstract base class for scales used in categorization.
A scale defines how numerical data is divided into bins.
- property use_bin_zero: bool
Returns whether the minimum value is in bin zero or in bin one.
- class ScaleCategorizer(
- column_names: str | tuple[str, str | None] | Annotated[list[str], Strict] | Annotated[list[tuple[str, str | None]], Strict] | None = None,
- on_missing_column: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_type: Literal['ignore', 'fail'] = 'ignore',
- on_wrong_scale_value: Literal['ignore', 'fail'] = 'ignore',
- *,
- scale: Scale,
- tags: str | list[str] | None = None,
Bases:
Categorizer
- Categories:
dq-classifier
A classifier that bucketizes data into bins based on a given scale.
- property scale: Scale
Returns the scale used for categorization.
- classmethod ScaleCategorizer.supported_dtypes() FrozenSet[Type]
Returns the set of data types supported by categorizers.
- class Select(
- criteria: Criteria,
- to_table: str,
- include_quality_columns: Literal['none', 'criteria', 'all'] = 'none',
Bases:
CriteriaOperator
- Categories:
dq-operator
An operator that selects rows based on data quality criteria and writes them to another table.
This operation does not modify the original table.
- property include_quality_columns: Literal['none', 'criteria', 'all']
Returns which data quality columns are included in the table with the selected data.
- property to_table: str
Returns the destination table for the selected rows.
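The three values of `include_quality_columns` are not described further in this reference. One plausible reading, sketched below, is that 'none' writes only the original columns, 'criteria' adds the quality columns the criteria refer to, and 'all' adds every quality column. This interpretation, the `output_columns` helper, and the column names are all hypothetical.

```python
def output_columns(original, quality, criteria_columns, include_quality_columns="none"):
    """Hypothetical helper: columns written to the destination table."""
    if include_quality_columns == "none":
        return list(original)
    if include_quality_columns == "criteria":
        # Keep only the quality columns that the criteria refer to.
        return list(original) + [c for c in quality if c in criteria_columns]
    if include_quality_columns == "all":
        return list(original) + list(quality)
    raise ValueError(f"invalid include_quality_columns: {include_quality_columns!r}")


original = ["id", "amount"]
quality = ["amount_is_positive", "name_has_length"]  # illustrative names
criteria_columns = {"amount_is_positive"}
```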
- class Summary( )
Bases:
TagOperator
- Categories:
dq-operator
An operator that generates a data quality summary report for a table.
- class Threshold
Bases:
ABC
Abstract base class for thresholds used in the Fail operator.