Causal Inference

Causal Inference for Cancer Research

June 2022

Overview

This project applies causal inference techniques to breast cancer analysis using the Wisconsin Breast Cancer dataset. Rather than stopping at correlation analysis, it aims to identify which relationships in the data are plausibly causal, in order to support more effective research and treatment approaches.

Technical Approach

The project implements causal inference methodologies using:

  • Directed Acyclic Graphs (DAGs): Visual modeling of causal relationships between tumor characteristics and diagnosis outcomes
  • Structural Equation Modeling: Using the CausalNex library to construct and analyze causal networks
  • Machine Learning Integration: Combining causal models with a predictive algorithm (Logistic Regression) to classify tumors by diagnosis
  • Feature Analysis: Identifying which tumor characteristics have true causal influence on malignancy
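
As a rough illustration of the DAG idea above, the sketch below represents a causal graph as a list of cause-to-effect edges, checks that it is acyclic, and reads off the direct and indirect causes of the diagnosis node. It uses only the Python standard library (`graphlib`, Python 3.9+) rather than CausalNex. The edges are hypothetical and chosen purely for illustration; they are not results learned from the Wisconsin data. The helper names (`parents_of`, `is_acyclic`, `ancestors_of`) are likewise assumptions, not part of the project.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical edges (cause -> effect), for illustration only.
# They are NOT relationships learned from the Wisconsin dataset.
EDGES = [
    ("mean radius", "mean perimeter"),
    ("mean radius", "mean area"),
    ("mean concavity", "mean concave points"),
    ("mean area", "diagnosis"),
    ("mean concave points", "diagnosis"),
]


def parents_of(edges):
    """Map each node to the set of its direct causes (parents)."""
    parents = {}
    for cause, effect in edges:
        parents.setdefault(cause, set())
        parents.setdefault(effect, set()).add(cause)
    return parents


def is_acyclic(edges):
    """Return True if the edge list forms a valid DAG."""
    try:
        # TopologicalSorter expects {node: predecessors}; iterating
        # static_order() raises CycleError if the graph has a cycle.
        list(TopologicalSorter(parents_of(edges)).static_order())
        return True
    except CycleError:
        return False


def ancestors_of(node, edges):
    """Collect every direct and indirect cause of `node`."""
    parents = parents_of(edges)
    seen, stack = set(), [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


direct_causes = parents_of(EDGES)["diagnosis"]
all_causes = ancestors_of("diagnosis", EDGES)
print("acyclic:", is_acyclic(EDGES))
print("direct causes of diagnosis:", sorted(direct_causes))
print("all causes of diagnosis:", sorted(all_causes))
```

In a graph like this, the direct causes of the diagnosis node are natural candidate inputs for a downstream classifier such as Logistic Regression, while features that are not ancestors of the diagnosis (here, "mean perimeter") would be left out.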

Implementation

  • Leverages MLflow for experiment tracking and model management
  • Implements data version control with DVC for reproducibility
  • Provides interactive visualizations of causal structures
  • Includes comprehensive exploratory data analysis in Jupyter notebooks

Applications

This framework supports breast cancer research by providing tools to:

  • Identify which tumor characteristics are causally linked to malignancy
  • Distinguish between correlational and causal relationships in cancer data
  • Provide a foundation for targeted treatment strategies based on causal factors
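
To make the difference between correlational and causal relationships concrete, here is a minimal sketch on synthetic data (not the Wisconsin dataset). A common cause `z` drives both `x` and `y`, so the two are strongly correlated even though `x` has no effect on `y`. Regressing `y` on both `x` and `z` recovers an effect of `x` close to zero. The variable names and coefficients are assumptions made for the example, and the adjustment works here only because the confounder is observed and the relationships are linear.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data: z is a common cause (confounder) of x and y.
z = rng.normal(size=n)
x = 2.0 * z + rng.normal(size=n)  # x depends on z
y = 3.0 * z + rng.normal(size=n)  # y depends on z only, NOT on x

# Naive view: x and y look strongly related.
naive_corr = np.corrcoef(x, y)[0, 1]

# Adjusted view: least-squares fit of y on [intercept, x, z].
design = np.column_stack([np.ones(n), x, z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
adjusted_effect_of_x = coef[1]

print(f"correlation(x, y):            {naive_corr:.2f}")
print(f"effect of x after adjusting:  {adjusted_effect_of_x:.2f}")
print(f"effect of z after adjusting:  {coef[2]:.2f}")
```

The high correlation alongside a near-zero adjusted coefficient is the pattern a causal analysis is meant to catch: acting on `x` would not change `y`, even though the two move together.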

Impact

By giving researchers tools to separate causal factors from mere correlations in breast cancer data, this project aims to contribute to more accurate diagnosis and potentially more effective treatment approaches. That distinction is crucial for advancing evidence-based medicine in oncology.

Technologies Used

Python
Jupyter
NumPy
Matplotlib
CausalNex
scikit-learn
MLflow
DVC