A New Utility Evaluation Framework for Data Anonymization in the Context of Mobility


Sharing urban mobility and public transportation data is critical to using the mobility infrastructure of cities to its fullest potential. For data protection reasons, however, the disclosure of data to the public is restricted and only permitted if the anonymity of each individual associated with the dataset can be guaranteed. Numerous approaches can be applied to anonymize a given dataset, and each approach follows a different definition of anonymity. One of the most widely used definitions is k-anonymity, which relies on forming equivalence classes so that each row in a dataset belongs to an equivalence class containing at least k rows that cannot be distinguished from one another. Naturally, k-anonymity can be achieved by multiple realizations. The question is which realization provides the highest utility for future real-world applications. Currently, abstract metrics based on the structure of the dataset are used to assess the utility of different k-anonymizations. However, these abstract metrics do not properly reflect the usefulness of the anonymized datasets in real-world applications. Hence, in this work, we provide a novel framework that helps to evaluate the abstract metrics from the literature in terms of how well they measure utility in the context of urban mobility. To do this, we define a set of potential data science use cases that can be derived from a publicly available dataset on taxi trips and compute multiple realizations of k-anonymity. By training prediction models on the original dataset and the anonymized datasets and comparing the corresponding performance decrease with the abstract metrics from the literature, we are able to derive recommendations on the usage of abstract metrics to evaluate the utility of potential realizations of k-anonymity.
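The definition of k-anonymity given above can be illustrated with a minimal sketch. The column names, the toy taxi records, and the generalization step are hypothetical and not taken from the dataset used in this work. The discernibility score (the sum of squared equivalence class sizes) is included only as one well-known example of a structure-based metric; it is an assumption that it resembles the abstract metrics evaluated here.

```python
from collections import Counter


def equivalence_classes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every equivalence class contains at least k rows."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return all(size >= k for size in classes.values())


def discernibility(rows, quasi_identifiers):
    """Sum of squared class sizes; larger values mean coarser classes."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return sum(size ** 2 for size in classes.values())


# Hypothetical taxi trips; field names are illustrative assumptions.
raw = [
    {"pickup_zone": "A1", "hour": 8, "fare": 12.5},
    {"pickup_zone": "A2", "hour": 8, "fare": 9.0},
    {"pickup_zone": "A1", "hour": 9, "fare": 15.0},
    {"pickup_zone": "A2", "hour": 9, "fare": 7.5},
]

# One possible realization: coarsen the zone to its first letter and
# replace the exact hour with a fixed two-hour window.
generalized = [
    {**row, "pickup_zone": row["pickup_zone"][0], "hour": "08-09"}
    for row in raw
]

QUASI_IDENTIFIERS = ["pickup_zone", "hour"]

raw_ok = is_k_anonymous(raw, QUASI_IDENTIFIERS, k=2)
generalized_ok = is_k_anonymous(generalized, QUASI_IDENTIFIERS, k=2)
raw_score = discernibility(raw, QUASI_IDENTIFIERS)
generalized_score = discernibility(generalized, QUASI_IDENTIFIERS)
```

In this toy case the raw records fail the check for k = 2 because every row forms its own class, while the generalized records pass. The structure-based score only records that the classes became larger; it says nothing about how much a model trained on the generalized rows would lose in predictive performance, which is the gap the framework is meant to examine.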