This repository contains the notebooks used to produce the data, figures, and table-data included in the bachelor's thesis "Evaluation of automatic bias detection and pre-processing mitigation techniques" by Max Kleinegger.

mkleinegger/evaluation-bias-detection-and-mitigation

Evaluation of automatic bias detection and pre-processing mitigation techniques

This repository contains the notebooks used to produce the data, figures, and table-data included in the bachelor's thesis "Evaluation of automatic bias detection and pre-processing mitigation techniques" by Max Kleinegger.

Getting Started

The notebooks contain Python code and require an environment with the necessary dependencies. The environment is created with pip, and a setup script is provided to make this easy. Just run

./setup.sh
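
If the shell script cannot be run (for example on a system without a POSIX shell), roughly the same result can be reached from Python. The following is only a sketch of a pip-based setup and not a transcription of setup.sh: the environment directory `.venv` and the helper names `pip_install_command` and `create_environment` are assumptions made for illustration, and only `requirements.txt` is taken from the repository.

```python
import subprocess
import sys
import venv
from pathlib import Path


def pip_install_command(env_dir=".venv", requirements="requirements.txt"):
    """Build the pip command that installs the requirements into env_dir.

    The directory name ".venv" is an assumption; setup.sh may use another one.
    """
    on_windows = sys.platform == "win32"
    bin_dir = "Scripts" if on_windows else "bin"
    python = Path(env_dir) / bin_dir / ("python.exe" if on_windows else "python")
    return [str(python), "-m", "pip", "install", "-r", str(requirements)]


def create_environment(env_dir=".venv", requirements="requirements.txt"):
    """Create a virtual environment and install the listed dependencies."""
    # with_pip=True bootstraps pip inside the new environment.
    venv.create(env_dir, with_pip=True)
    # Raises CalledProcessError if the installation fails.
    subprocess.check_call(pip_install_command(env_dir, requirements))
```

Calling `create_environment()` from the repository root creates the environment and installs the dependencies; the notebooks then need to be started with the interpreter from that environment.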

Repository Structure

The repository is organized into several main directories, each serving a distinct purpose related to data synthesis, bias detection, and evaluation. Below is an overview of the structure:

  • README.md: Provides an overview of the project, its purpose, and usage instructions.

  • data/: Contains datasets used for experiments, including synthetic data generated by different models.

    • DataSynthesizer/: Holds JSON descriptions for different data synthesis modes (correlated, independent, random).
    • SDV/: Includes pre-trained models for synthetic data generation using various techniques.
    • Synthetic Data Files:
      • Various .json and .csv files containing generated datasets and metadata.
      • Includes synthetic data from different synthesizers (CTGAN, CopulaGAN, GaussianCopula, TVAE, DataSynthesizer).
      • Contains trainset.json and testset.json for model evaluation.
  • notebooks/: Contains Jupyter notebooks used for bias detection, mitigation, and preprocessing.

    • bias_detection_pre_synthesized.ipynb: Analyzes bias in pre-synthesized datasets.
    • bias_detection_synthesized.ipynb: Evaluates bias in synthetic datasets.
    • bias_mitigation_synthesized.ipynb: Implements bias mitigation techniques on synthetic data.
    • bias_mitigation_synthesized_subsampling.ipynb: Applies subsampling-based bias mitigation.
    • preprocessing.ipynb: Prepares data for further analysis.
  • requirements.txt: Lists dependencies required for running the project.

  • results/: Stores experimental results and analysis outputs.

    • data/: Contains JSON files with fairness and utility metrics from different approaches.
      • Includes metrics for original and synthetic datasets, processed through binning and PDF-based techniques.
    • plots.ipynb: Notebook to visualize and analyze fairness results.
  • setup.sh: Shell script for setting up the project environment.

  • src/: Contains the source code for processing data, computing fairness metrics, and generating synthetic datasets.
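
As an illustration of how the layout above might be used, the sketch below collects every JSON file in results/data/ into a dictionary keyed by file name. The helper `load_result_files` is hypothetical and not part of src/, and the internal structure of the JSON files is not assumed; each file is simply parsed as-is.

```python
import json
from pathlib import Path


def load_result_files(results_dir="results/data"):
    """Parse every *.json file in results_dir, keyed by file name without suffix.

    Hypothetical helper for illustration; the repository's own loading code
    may differ.
    """
    results = {}
    # sorted() gives a stable, alphabetical order of the files.
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            results[path.stem] = json.load(handle)
    return results
```

Run from the repository root, `load_result_files()` returns one entry per metrics file, which can then be inspected or passed on to plotting code such as that in plots.ipynb.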
