
Data quality

execution.models.data_quality

Data Quality Models and Validation Framework

This module provides data quality validation: validation rules, quality dimensions, metrics tracking, and reporting.

data_quality_validator = DataQualityValidator() module-attribute

ValidationSeverity

Enumeration of validation severity levels.

Used to categorize the importance and impact of data quality issues.

INFO = 'info' class-attribute instance-attribute

WARNING = 'warning' class-attribute instance-attribute

ERROR = 'error' class-attribute instance-attribute

CRITICAL = 'critical' class-attribute instance-attribute

QualityDimension

Data quality dimensions based on industry standards.

Defines the different aspects of data quality that can be measured:

- COMPLETENESS: Data has all required values
- ACCURACY: Data values are correct and precise
- CONSISTENCY: Data is consistent across systems/time
- VALIDITY: Data conforms to defined formats/rules
- UNIQUENESS: No duplicate records exist
- TIMELINESS: Data is up-to-date and available when needed
- INTEGRITY: Data maintains referential integrity

COMPLETENESS = 'completeness' class-attribute instance-attribute

ACCURACY = 'accuracy' class-attribute instance-attribute

CONSISTENCY = 'consistency' class-attribute instance-attribute

VALIDITY = 'validity' class-attribute instance-attribute

UNIQUENESS = 'uniqueness' class-attribute instance-attribute

TIMELINESS = 'timeliness' class-attribute instance-attribute

INTEGRITY = 'integrity' class-attribute instance-attribute
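As a rough illustration of how the dimension values can be used, the sketch below defines a stand-in enum with the members listed above. The stand-in is not the real class: it assumes a plain `Enum` base, which this page does not confirm, and the `scores` table is purely illustrative.

```python
from enum import Enum


# Stand-in mirroring the documented members. The real class lives in
# execution.models.data_quality; its base class is ASSUMED to be Enum.
class QualityDimension(Enum):
    COMPLETENESS = "completeness"
    ACCURACY = "accuracy"
    CONSISTENCY = "consistency"
    VALIDITY = "validity"
    UNIQUENESS = "uniqueness"
    TIMELINESS = "timeliness"
    INTEGRITY = "integrity"


# Look a member up from its serialized string value.
dimension = QualityDimension("uniqueness")

# Build an illustrative per-dimension score table keyed by the string values.
scores = {d.value: 0.0 for d in QualityDimension}
```

Looking members up by value is convenient when dimensions arrive as strings, for example from a configuration file or a serialized result.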

ValidationRule dataclass

Represents a single data validation rule.

Contains the metadata and function needed to validate data quality against specific business or technical requirements.

Attributes:

name (str): Unique identifier for the rule

description (str): Human-readable description of what the rule validates

severity (ValidationSeverity): Impact level of validation failures

dimension (QualityDimension): Quality dimension this rule addresses

rule_function (Optional[Callable]): Optional callable that performs the validation

parameters (Dict[str, Any]): Configuration parameters for the validation function

name: str instance-attribute

description: str instance-attribute

severity: ValidationSeverity instance-attribute

dimension: QualityDimension instance-attribute

rule_function: Optional[Callable] = None class-attribute instance-attribute

parameters: Dict[str, Any] = field(default_factory=dict) class-attribute instance-attribute

__init__(name: str, description: str, severity: ValidationSeverity, dimension: QualityDimension, rule_function: Optional[Callable] = None, parameters: Dict[str, Any] = dict()) -> None

validate(value: Any, context: Optional[Dict[str, Any]] = None) -> ValidationResult
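The following is a minimal sketch of how a rule with the fields above might be defined and applied. It uses trimmed stand-in classes rather than the real module, and it makes several assumptions this page does not confirm: that `rule_function` receives the value plus `parameters` as keyword arguments, that it returns a truthy value on success, and that a rule with no function passes. The `max_length` helper and the rule name are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, Optional


class ValidationSeverity(Enum):
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"
    CRITICAL = "critical"


class QualityDimension(Enum):
    COMPLETENESS = "completeness"
    VALIDITY = "validity"


@dataclass
class ValidationResult:  # trimmed stand-in, not the full documented class
    rule_name: str
    passed: bool
    message: str
    severity: ValidationSeverity = ValidationSeverity.INFO
    dimension: QualityDimension = QualityDimension.VALIDITY


@dataclass
class ValidationRule:  # stand-in with the documented fields
    name: str
    description: str
    severity: ValidationSeverity
    dimension: QualityDimension
    rule_function: Optional[Callable] = None
    parameters: Dict[str, Any] = field(default_factory=dict)

    def validate(self, value: Any, context: Optional[Dict[str, Any]] = None) -> ValidationResult:
        # ASSUMPTION: a rule without a function passes trivially.
        if self.rule_function is None:
            passed = True
        else:
            # ASSUMPTION: the callable takes the value plus keyword parameters.
            # The context argument is accepted but unused in this sketch.
            passed = bool(self.rule_function(value, **self.parameters))
        message = "ok" if passed else f"{self.name} failed for value {value!r}"
        return ValidationResult(self.name, passed, message, self.severity, self.dimension)


def max_length(value: Any, limit: int) -> bool:
    """Hypothetical check: the string form of value fits within limit."""
    return len(str(value)) <= limit


rule = ValidationRule(
    name="code_max_length",
    description="Code must be at most 5 characters long",
    severity=ValidationSeverity.ERROR,
    dimension=QualityDimension.VALIDITY,
    rule_function=max_length,
    parameters={"limit": 5},
)

ok = rule.validate("AAPL")          # 4 characters, within the limit
too_long = rule.validate("AAPL.US")  # 7 characters, exceeds the limit
```

Keeping thresholds in `parameters` rather than hard-coding them in the callable lets one function back several rules with different limits.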

ValidationResult dataclass

rule_name: str instance-attribute

passed: bool instance-attribute

message: str instance-attribute

severity: ValidationSeverity = ValidationSeverity.INFO class-attribute instance-attribute

dimension: QualityDimension = QualityDimension.VALIDITY class-attribute instance-attribute

timestamp: datetime = field(default_factory=(lambda: datetime.now(UTC))) class-attribute instance-attribute

record_id: Optional[str] = None class-attribute instance-attribute

field_name: Optional[str] = None class-attribute instance-attribute

__init__(rule_name: str, passed: bool, message: str, severity: ValidationSeverity = ValidationSeverity.INFO, dimension: QualityDimension = QualityDimension.VALIDITY, timestamp: datetime = (lambda: datetime.now(UTC))(), record_id: Optional[str] = None, field_name: Optional[str] = None) -> None

to_dict() -> Dict[str, Any]
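The output shape of `to_dict()` is not documented here. The sketch below shows one plausible shape on a stand-in class, assuming enums are serialized to their string values and the timestamp to ISO 8601 so the result can be passed to `json.dumps`. It uses `timezone.utc` instead of `datetime.UTC` only so that it also runs on Python versions before 3.11.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Dict, Optional


class ValidationSeverity(Enum):
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"
    CRITICAL = "critical"


class QualityDimension(Enum):
    VALIDITY = "validity"


@dataclass
class ValidationResult:  # stand-in with the documented fields and defaults
    rule_name: str
    passed: bool
    message: str
    severity: ValidationSeverity = ValidationSeverity.INFO
    dimension: QualityDimension = QualityDimension.VALIDITY
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    record_id: Optional[str] = None
    field_name: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # ASSUMED serialization: enum values as strings, timestamp as ISO 8601.
        return {
            "rule_name": self.rule_name,
            "passed": self.passed,
            "message": self.message,
            "severity": self.severity.value,
            "dimension": self.dimension.value,
            "timestamp": self.timestamp.isoformat(),
            "record_id": self.record_id,
            "field_name": self.field_name,
        }


result = ValidationResult(
    rule_name="email_format",  # hypothetical rule name
    passed=False,
    message="missing '@'",
    severity=ValidationSeverity.WARNING,
    record_id="42",
    field_name="email",
)
payload = json.dumps(result.to_dict())
```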

FieldQualityMetrics dataclass

field_name: str instance-attribute

total_records: int instance-attribute

non_null_count: int instance-attribute

unique_count: int instance-attribute

completeness_score: float instance-attribute

uniqueness_score: float instance-attribute

format_compliance_score: float instance-attribute

validation_results: List[ValidationResult] = field(default_factory=list) class-attribute instance-attribute

null_count: int property

duplicate_count: int property

overall_quality_score: float property

__init__(field_name: str, total_records: int, non_null_count: int, unique_count: int, completeness_score: float, uniqueness_score: float, format_compliance_score: float, validation_results: List[ValidationResult] = list()) -> None
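The formulas behind the three properties are not given on this page. The sketch below shows one plausible reading on a stand-in class: nulls are the records without a value, duplicates are non-null values beyond the distinct ones, and the overall score is an unweighted mean of the three scores. All three formulas, and the 0.0 to 1.0 score scale, are assumptions; the `validation_results` field is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class FieldQualityMetrics:  # stand-in; validation_results omitted
    field_name: str
    total_records: int
    non_null_count: int
    unique_count: int
    completeness_score: float
    uniqueness_score: float
    format_compliance_score: float

    @property
    def null_count(self) -> int:
        # ASSUMPTION: every record is either null or non-null.
        return self.total_records - self.non_null_count

    @property
    def duplicate_count(self) -> int:
        # ASSUMPTION: duplicates are non-null values beyond the distinct ones.
        return self.non_null_count - self.unique_count

    @property
    def overall_quality_score(self) -> float:
        # ASSUMPTION: unweighted mean of the three field-level scores.
        return (
            self.completeness_score
            + self.uniqueness_score
            + self.format_compliance_score
        ) / 3


total, non_null, unique = 200, 180, 150
metrics = FieldQualityMetrics(
    field_name="email",
    total_records=total,
    non_null_count=non_null,
    unique_count=unique,
    completeness_score=non_null / total,   # share of records with a value
    uniqueness_score=unique / non_null,    # share of values that are distinct
    format_compliance_score=1.0,
)
```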

DataQualityMetrics dataclass

dataset_name: str = '' class-attribute instance-attribute

schema_name: Optional[str] = None class-attribute instance-attribute

total_records: int = 0 class-attribute instance-attribute

valid_records: int = 0 class-attribute instance-attribute

invalid_records: int = 0 class-attribute instance-attribute

duplicate_records: int = 0 class-attribute instance-attribute

field_metrics: Dict[str, FieldQualityMetrics] = field(default_factory=dict) class-attribute instance-attribute

completeness_score: float = 0.0 class-attribute instance-attribute

accuracy_score: float = 0.0 class-attribute instance-attribute

consistency_score: float = 0.0 class-attribute instance-attribute

validity_score: float = 0.0 class-attribute instance-attribute

uniqueness_score: float = 0.0 class-attribute instance-attribute

timeliness_score: float = 0.0 class-attribute instance-attribute

integrity_score: float = 0.0 class-attribute instance-attribute

overall_quality_score: float = 0.0 class-attribute instance-attribute

validation_results: List[ValidationResult] = field(default_factory=list) class-attribute instance-attribute

processing_stats: Optional[ProcessingStats] = None class-attribute instance-attribute

errors: List[ErrorInfo] = field(default_factory=list) class-attribute instance-attribute

__init__(id: str, record_type: RecordType, created_at: datetime = datetime.now(), updated_at: datetime = datetime.now(), metadata: Dict[str, Any] = dict(), processing_stats: Optional[ProcessingStats] = None, errors: List[ErrorInfo] = list(), dataset_name: str = '', schema_name: Optional[str] = None, total_records: int = 0, valid_records: int = 0, invalid_records: int = 0, duplicate_records: int = 0, field_metrics: Dict[str, FieldQualityMetrics] = dict(), completeness_score: float = 0.0, accuracy_score: float = 0.0, consistency_score: float = 0.0, validity_score: float = 0.0, uniqueness_score: float = 0.0, timeliness_score: float = 0.0, integrity_score: float = 0.0, overall_quality_score: float = 0.0, validation_results: List[ValidationResult] = list()) -> None

__post_init__()

validate() -> List[str]

calculate_overall_score()

add_field_metrics(field_name: str, metrics: FieldQualityMetrics)

add_validation_result(result: ValidationResult)

get_quality_summary() -> Dict[str, Any]

DataQualityValidator

rules: Dict[str, ValidationRule] = {} instance-attribute

__init__()

register_rule(rule: ValidationRule)

validate_record(record: Dict[str, Any], rule_names: Optional[List[str]] = None) -> List[ValidationResult]
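To illustrate how registration and record validation could fit together, here is a minimal sketch built on trimmed stand-in classes. Only the method names and signatures come from this page. The behavior is assumed: rules are stored under `rule.name`, `rule_names=None` means "run every registered rule", unknown or function-less rules are skipped, and each rule reads its target field from a hypothetical `"field"` key in `parameters`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ValidationResult:  # trimmed stand-in
    rule_name: str
    passed: bool
    message: str
    field_name: Optional[str] = None


@dataclass
class ValidationRule:  # trimmed stand-in (severity and dimension omitted)
    name: str
    description: str
    rule_function: Optional[Callable] = None
    parameters: Dict[str, Any] = field(default_factory=dict)


class DataQualityValidator:  # stand-in, not the real implementation
    def __init__(self) -> None:
        self.rules: Dict[str, ValidationRule] = {}

    def register_rule(self, rule: ValidationRule) -> None:
        # ASSUMPTION: rules are keyed by name; re-registering overwrites.
        self.rules[rule.name] = rule

    def validate_record(
        self, record: Dict[str, Any], rule_names: Optional[List[str]] = None
    ) -> List[ValidationResult]:
        # ASSUMPTION: None selects all registered rules, in registration order.
        selected = rule_names if rule_names is not None else list(self.rules)
        results: List[ValidationResult] = []
        for name in selected:
            rule = self.rules.get(name)
            if rule is None or rule.rule_function is None:
                continue  # ASSUMPTION: unknown or function-less rules are skipped
            target = rule.parameters.get("field")  # hypothetical parameter key
            passed = bool(rule.rule_function(record.get(target)))
            message = "ok" if passed else f"{target} failed {name}"
            results.append(ValidationResult(name, passed, message, target))
        return results


validator = DataQualityValidator()
validator.register_rule(
    ValidationRule("id_present", "id must be set", lambda v: v is not None, {"field": "id"})
)
validator.register_rule(
    ValidationRule(
        "price_positive",
        "price must be a positive number",
        lambda v: isinstance(v, (int, float)) and v > 0,
        {"field": "price"},
    )
)

record = {"id": "r1", "price": -3.5}
results = validator.validate_record(record)                 # runs both rules
subset = validator.validate_record(record, ["id_present"])  # runs one rule
```

The module-level `data_quality_validator` instance documented above suggests a shared registry, so in practice rules would likely be registered once at import or startup time and reused across records.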