Architecture¶

Owl-Watch utilizes a fully AWS-native serverless architecture to orchestrate large-scale data engineering workloads. It separates data into multiple zones and uses Glue for ETL processing and Lambda/Bedrock for advanced machine learning curation.

High-Level Flow¶

Ingestion: Raw data is uploaded or streamed into the Raw S3 bucket.
ETL Processing: AWS Glue jobs perform distributed data cleaning and transformation using PySpark.
Storage: Cleaned data is stored in the S3 Cleaned Bucket.
ML Curation: Machine learning processing (using AWS Bedrock via Lambda functions) further refines and analyzes the data, producing sentiment analysis and insights.
Output: The finalized curated data is stored in the S3 Curated Bucket for downstream consumption.

Project Structure¶

cdk/ - AWS CDK infrastructure (TypeScript)
lib/stacks/ - Data, Glue, and Monitoring stacks
lib/utils/ - Asset and resource creation utilities
execution/ - PySpark ETL and ML code (Python)
core/ - Job runners, config managers, and factory patterns
jobs/ - Specialized ETL and ML jobs
models/ - Data models and quality constraints
schemas/ - Standardized schemas for communication data
integration_tests/ - Integration tests (Python, pytest)