Towards Anonymous Medical Data Collection

This is the source code of the implementation for the bachelor thesis "Towards Anonymous Medical Data Collection".

Requirements

Python 3.9+
Other dependencies can be installed with pip install -r requirements.txt.

Execution

Both programs read their parameters from a configuration file, whose path is passed as the only command-line argument.

To start the mix network simulator, run python simulation.py [path to configuration file].
See test_conf.json for an example of the configuration file.

To start the k-anonymization algorithms, run python anonymization.py [path to configuration file].
See exp1_conf.json, exp2_conf.json, exp3_conf.json, and test_conf.json for examples of the configuration file.
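
Before launching either script, it can help to confirm that a configuration file exists and parses. The following is a minimal sketch, not part of the repository: it assumes only that the configuration files are JSON (as their .json extension suggests) and makes no assumption about which keys they contain. The helper name load_config is hypothetical.

```python
import json
from pathlib import Path


def load_config(path):
    """Parse a JSON configuration file and return its contents.

    Hypothetical helper for illustration; the repository's scripts may
    load their configuration differently.
    """
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(f"Configuration file not found: {config_path}")
    try:
        with config_path.open(encoding="utf-8") as handle:
            return json.load(handle)
    except json.JSONDecodeError as exc:
        # Re-raise with the file name so the faulty file is easy to locate.
        raise ValueError(f"{config_path} is not valid JSON: {exc}") from exc
```

Calling load_config("test_conf.json") from the repository root would then either return the parsed configuration or report which file is missing or malformed.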

Dataset

Currently, only the Adult dataset is supported. To include more datasets, add a .csv file for the raw dataset and update categorical.py in the datasets folder. See the .csv file for Adult as an example.
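
When preparing a new dataset, a first step is usually to decide which columns are categorical. The sketch below is an illustration only and does not reflect the structure of categorical.py, which is not described here. It assumes the .csv file has a header row, takes the delimiter as a parameter, and treats any column containing a non-numeric value as categorical. The function name guess_categorical_columns is hypothetical.

```python
import csv
import io


def guess_categorical_columns(csv_text, delimiter=","):
    """Return the header names of columns that hold non-numeric values.

    Assumption: the first row is a header. Empty cells are ignored.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    columns = reader.fieldnames or []
    categorical = set()
    for row in reader:
        for name in columns:
            value = (row[name] or "").strip()
            if not value:
                continue
            try:
                float(value)
            except ValueError:
                # At least one value is not a number: treat as categorical.
                categorical.add(name)
    # Preserve the column order of the header.
    return [name for name in columns if name in categorical]


sample = "age,workclass,hours\n39,State-gov,40\n50,Private,13\n"
print(guess_categorical_columns(sample))
```

The result is only a starting point: numerically coded categories (for example, integer class labels) would not be detected and need to be added by hand.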

Code

The implementation is based on the open-source code of Piotrowska and of Slijepčević et al. The original repositories can be found here (Piotrowska) and here (Slijepčević et al.). For more information, see:

  • Ania M. Piotrowska. “Studying the anonymity trilemma with a discrete-event mix network simulator”. In: Proceedings of the 20th Workshop on Privacy in the Electronic Society. 2021, pp. 39–44.
  • Djordje Slijepčević et al. “k-Anonymity in practice: How generalisation and suppression affect machine learning classifiers”. In: Computers & Security 111 (2021), p. 102488.