Eran Toch

Talk: How User Privacy Behavior Shapes Machine Learning Models

Many machine learning (ML) models, including large language models, rely on data generated by or about individuals. Yet little attention has been paid to how users' decisions shape the characteristics of these models, even though many ML models depend on personal and sensitive information. In this talk, I will present a series of studies that examine how users' decisions are influenced by the level of privacy guarantees offered to them, and how those decisions in turn affect the performance of ML models trained on the data. In an online experiment (n = 734), we demonstrate that differential privacy guarantees significantly influence people's perception of the data collection process and their willingness to share their data. We also present a tool based on conjoint analysis that enables data scientists to evaluate and predict the impact of these decisions on the collected data. In a follow-up study (n = 817), we show that users' sharing decisions can significantly degrade the collected data and the performance of certain model types. Interestingly (to us, at least), differential privacy guarantees, which add noise to protect individual data, can enhance user trust and thereby improve data quality and model accuracy. We conclude with a discussion of the crucial yet underexplored dynamics between user behavior and machine learning models.
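As a quick illustration of the kind of noise differential privacy introduces (not the specific mechanism or parameters used in the studies above), here is a minimal sketch of the Laplace mechanism applied to a count query; the epsilon value and the data are hypothetical.

```python
import numpy as np

def laplace_count(values, epsilon):
    """Release a differentially private count.

    The Laplace mechanism adds noise with scale sensitivity/epsilon;
    a count query has sensitivity 1, since adding or removing one
    person changes the count by at most 1.
    """
    true_count = len(values)
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical example: number of participants who agreed to share data.
shared = [1] * 500  # placeholder data
print(laplace_count(shared, epsilon=0.5))  # noisy count near 500
```

Smaller epsilon values give stronger privacy guarantees but noisier releases; this privacy-utility trade-off is what the guarantees shown to participants communicate.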

Workshop Home Page

Return to the TAU-UIUC Workshop Home Page
