Ethiopian Medical Data Warehouse
A data engineering pipeline that scrapes, cleans, and warehouses Ethiopian medical and pharmaceutical data from Telegram channels, enabling downstream analytics and object detection for medical product images.
Problem
Ethiopian medical supply chains lack centralized, structured data. Pharmaceutical pricing, availability, and product information are scattered across informal Telegram channels. Building a reliable data warehouse enables market analysis, price monitoring, and supply chain insights.
Dataset
Raw data scraped from Ethiopian medical and pharmaceutical Telegram channels, including text messages, product images, and pricing information. Data spans multiple channels covering medical equipment, pharmaceuticals, and health services.
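As an illustration of what one scraped record might look like after cleaning, the sketch below assumes each message carries a channel name, message ID, timestamp, text and an image flag. It pulls a birr-denominated price out of the free text with a regular expression. The `RawMessage` type, the `extract_price` and `to_staging_row` helpers, and the price pattern are all hypothetical. Real channel posts will likely need a broader pattern, for example for other spellings or Amharic text.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical price pattern: a number followed by a birr marker
# ("ETB", "Birr" or "br"), e.g. "price 1,250 ETB" or "450birr".
PRICE_RE = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(?:etb|birr|br)\b", re.IGNORECASE)


@dataclass
class RawMessage:
    """One scraped Telegram post (illustrative fields only)."""
    channel: str
    message_id: int
    posted_at: datetime
    text: str
    has_image: bool


def extract_price(text: str) -> Optional[float]:
    """Return the first birr-denominated price found in the text, or None."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(1).replace(",", ""))


def to_staging_row(msg: RawMessage) -> dict:
    """Flatten a scraped message into a row for a staging table."""
    return {
        "channel": msg.channel.strip().lower(),
        "message_id": msg.message_id,
        "posted_at": msg.posted_at.isoformat(),
        "text": msg.text.strip(),
        "price_etb": extract_price(msg.text),
        "has_image": msg.has_image,
    }
```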
Architecture
ELT pipeline: raw data ingested into PostgreSQL staging tables, transformed via dbt models into a star schema data warehouse. YOLOv5 object detection model fine-tuned to identify and classify medical products in images. Apache Airflow orchestrates daily pipeline runs.
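A minimal sketch of the star schema, assuming a single fact table of messages keyed to channel and date dimensions. The project targets PostgreSQL with dbt; this example uses Python's built-in `sqlite3` only so that it runs standalone, and every table, column and function name here is hypothetical.

```python
import sqlite3

# Hypothetical star schema: one fact table referencing two dimension tables.
SCHEMA = """
CREATE TABLE dim_channels (
    channel_key  INTEGER PRIMARY KEY,
    channel_name TEXT UNIQUE NOT NULL
);
CREATE TABLE dim_dates (
    date_key  INTEGER PRIMARY KEY,  -- e.g. 20240115
    full_date TEXT NOT NULL,
    year      INTEGER NOT NULL,
    month     INTEGER NOT NULL
);
CREATE TABLE fct_messages (
    message_id  INTEGER NOT NULL,
    channel_key INTEGER NOT NULL REFERENCES dim_channels(channel_key),
    date_key    INTEGER NOT NULL REFERENCES dim_dates(date_key),
    price_etb   REAL,
    has_image   INTEGER NOT NULL,
    PRIMARY KEY (channel_key, message_id)
);
"""


def build_warehouse() -> sqlite3.Connection:
    """Create an in-memory database with the schema above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def monthly_avg_price(conn: sqlite3.Connection, channel_name: str) -> list:
    """Average listed price per month for one channel (fact joined to dimensions)."""
    return conn.execute(
        """
        SELECT d.year, d.month, AVG(f.price_etb)
        FROM fct_messages AS f
        JOIN dim_channels AS c ON c.channel_key = f.channel_key
        JOIN dim_dates    AS d ON d.date_key = f.date_key
        WHERE c.channel_name = ? AND f.price_etb IS NOT NULL
        GROUP BY d.year, d.month
        ORDER BY d.year, d.month
        """,
        (channel_name,),
    ).fetchall()
```

After loading rows, `monthly_avg_price(conn, "some_channel")` returns `(year, month, average_price)` tuples, the kind of query behind price trend analysis. Keeping descriptive attributes in the dimensions means such analytical queries stay simple joins on integer keys.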
Training
YOLOv5 fine-tuned on a manually labeled dataset of medical product images. dbt transformations implement data quality checks, deduplication, and business logic. An incremental loading strategy processes only newly arrived records on each daily run instead of rebuilding tables from scratch.
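The deduplication, quality-check and incremental-loading logic that the project implements in dbt can be sketched in plain Python. This version assumes each staged row is a dict with `channel`, `message_id`, `posted_at` and `text` keys (illustrative names), treats `(channel, message_id)` as the natural key, and tracks a high-water-mark timestamp between runs. `incremental_batch` is a hypothetical helper, not part of dbt.

```python
from datetime import datetime
from typing import Iterable, Optional


def incremental_batch(
    rows: Iterable[dict], watermark: Optional[datetime]
) -> tuple[list[dict], Optional[datetime]]:
    """Select new, unique, valid rows since the last run and advance the watermark."""
    seen: set[tuple[str, int]] = set()
    batch: list[dict] = []
    new_watermark = watermark
    # Process oldest first so the watermark only moves forward.
    for row in sorted(rows, key=lambda r: r["posted_at"]):
        # Incremental filter: skip anything already loaded by a previous run.
        if watermark is not None and row["posted_at"] <= watermark:
            continue
        # Data quality check: require a non-empty message body.
        if not row.get("text", "").strip():
            continue
        # Deduplication on the natural key (channel, message_id).
        key = (row["channel"], row["message_id"])
        if key in seen:
            continue
        seen.add(key)
        batch.append(row)
        # Rows are sorted, so the latest accepted timestamp is the new maximum.
        new_watermark = row["posted_at"]
    return batch, new_watermark
```

The strict comparison against the watermark assumes every message sharing that timestamp was loaded in the earlier run. A more defensive variant re-reads a small overlap window and relies on the deduplication key to discard repeats.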
Results
The pipeline processes 500+ messages per day across 10+ channels. The object detection model achieves 78% mAP on medical product classification. The data warehouse enables price trend analysis and product availability tracking across the Ethiopian medical market.
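The IoU threshold and averaging scheme behind the 78% mAP figure are not stated. As a reference for what such a number measures, the sketch below computes average precision (AP) for one class on a single image, assuming an IoU threshold of 0.5, greedy matching of confidence-ranked detections to unmatched ground-truth boxes, and all-point interpolation; mAP is then the mean of per-class AP. A dataset-level evaluation pools ranked detections across all images before computing precision and recall, and tools such as YOLOv5's validation script may differ in details.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: list, ground_truth: list, iou_threshold: float = 0.5) -> float:
    """AP for one class; detections are (confidence, box) pairs, ground truth is a list of boxes."""
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    hits = []
    # Rank detections by confidence and greedily match each to the best unmatched box.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_j, best_iou = -1, iou_threshold
        for j, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if not matched[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
            hits.append(1)
        else:
            hits.append(0)
    # Precision and recall after each ranked detection.
    precisions, recalls, cum_tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        cum_tp += hit
        precisions.append(cum_tp / k)
        recalls.append(cum_tp / len(ground_truth))
    # All-point interpolation: at each recall step use the best precision at or beyond it.
    ap, prev_recall = 0.0, 0.0
    for i, r in enumerate(recalls):
        if r > prev_recall:
            ap += (r - prev_recall) * max(precisions[i:])
            prev_recall = r
    return ap


def mean_ap(per_class_ap: list) -> float:
    """mAP as the unweighted mean of per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```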