Traffic Flow — Data Warehouse with Postgres, DBT, Airflow, and Redash
Denamo Markos

Jul 23, 2022
6 min read

Tags: Data Engineering, PostgreSQL, DBT, Airflow, Redash, Data Warehouse

Introduction

A city traffic department initiated a project to collect traffic data using swarm UAVs (drones) from multiple city locations. The goal is to improve traffic flow and support various urban planning initiatives. Our team was tasked with creating a scalable data warehouse to host vehicle trajectory data extracted from drone footage and static roadside cameras.

Info: This project utilizes the pNEUMA dataset - an open large-scale collection of naturalistic trajectories from half a million vehicles, gathered through a unique drone swarm experiment in downtown Athens, Greece.

Objective

Create a scalable central repository for traffic data analysis using:

  • PostgreSQL
  • DBT
  • Airflow
  • Redash

Technology Stack

Core Technologies

Technology | Purpose        | Description
PostgreSQL | Database       | Open-source RDBMS with SQL compliance
Airflow    | Orchestration  | Workflow management platform
DBT        | Transformation | Data transformation and testing tool
Redash     | Visualization  | Data exploration and visualization platform

Implementation

Traffic Flow Diagram
ELT Pipeline

The ELT pipeline consists of three main workflows (a minimal orchestration sketch follows the list):

  1. Data Loading
  • Airflow orchestrates CSV file loading into Postgres
  • Data is persisted in the database
  2. Data Transformation
  • Airflow triggers DBT transformations
  • Transformed data is stored back in Postgres
  3. Data Visualization
  • Redash queries the transformed data
  • Creates interactive dashboards
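
Taken together, these workflows map naturally onto Airflow task dependencies. The snippet below is a minimal sketch under assumed names: the elt_pipeline DAG id, the start date, and the /opt/dbt_project path are placeholders rather than the project's actual configuration.

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.operators.bash_operator import BashOperator

def extract_and_load():
    # Placeholder: read the trajectory CSVs and write them into Postgres
    pass

with DAG('elt_pipeline',
         start_date=datetime(2022, 7, 1),
         schedule_interval='@daily') as dag:
    load = PythonOperator(task_id='load_data', python_callable=extract_and_load)
    transform = BashOperator(task_id='dbt_run',
                             bash_command='cd /opt/dbt_project && dbt run')

    # Redash is not an Airflow task; it simply queries the transformed tables.
    load >> transform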

Airflow Implementation

Airflow — Postgres connection
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.operators.bash_operator import BashOperator

def load_data():
    # Read CSV files
    # Create database tables
    # Load data into Postgres
    pass

with DAG('load_dag',
         start_date=datetime(2022, 7, 1),  # start_date is required for scheduling
         schedule_interval='@daily') as dag:
    load_task = PythonOperator(
        task_id='load_data',
        python_callable=load_data
    )
load_dag graph view
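
The body of load_data is only outlined above. A minimal sketch of one way to implement it, assuming pandas and SQLAlchemy are available and using placeholder connection details, file path, and table name:

import pandas as pd
from sqlalchemy import create_engine

def load_data():
    # Placeholder connection string -- point this at the warehouse database
    engine = create_engine('postgresql://user:password@localhost:5432/traffic')

    # Placeholder path to a trajectory CSV extracted from the drone footage
    df = pd.read_csv('/data/trajectories.csv')

    # Write the raw records into the table that dbt will later transform
    df.to_sql('traffic_data', engine, if_exists='replace', index=False)

Replacing the table on every run keeps the sketch simple; an append or incremental strategy would be more appropriate once the pipeline runs daily.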

DBT Transformations

dbt data lineage

The DBT workflow includes the following steps, which Airflow can trigger (see the sketch after the list):

  1. Running transformations (dbt run)
  2. Executing tests (dbt test)
  3. Generating documentation (dbt docs generate)
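
One way to have Airflow trigger these steps is a chain of BashOperator tasks. This is a sketch only; the dbt_dag id, schedule, and dbt project path are assumptions rather than the project's exact setup.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

DBT_DIR = '/opt/dbt_project'  # placeholder path to the dbt project

with DAG('dbt_dag',
         start_date=datetime(2022, 7, 1),
         schedule_interval='@daily') as dag:
    dbt_run = BashOperator(task_id='dbt_run',
                           bash_command=f'cd {DBT_DIR} && dbt run')
    dbt_test = BashOperator(task_id='dbt_test',
                            bash_command=f'cd {DBT_DIR} && dbt test')
    dbt_docs = BashOperator(task_id='dbt_docs',
                            bash_command=f'cd {DBT_DIR} && dbt docs generate')

    # Tests run only after a successful build; docs are generated last.
    dbt_run >> dbt_test >> dbt_docs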

Redash Visualization

redash dashboard

Example query for location visualization:

SELECT
    track_id,
    lat,
    lon
FROM traffic_data
GROUP BY track_id, lat, lon  -- distinct positions per track for map plotting
LIMIT 10000;

Lessons Learned

  1. DBT Expertise Gained
  • Creating and testing transformations
  • Documentation generation
  • Model management
  2. Airflow Skills
  • PythonOperator implementation
  • BashOperator usage
  • DAG orchestration
  3. Visualization
  • Dashboard creation in Redash
  • Complex query optimization
  • Interactive visualization design

Info: The project provided hands-on experience with modern data engineering tools and practices.

Future Enhancements

  1. DBT Improvements
  • Additional transformation models
  • Enhanced testing coverage
  • More comprehensive documentation
  2. Environment Separation
  • Dedicated production database
  • Staging environment
  • Development instance
