WINE 2013, 9th Conference on Web and Internet Economics, 11-14 December 2013, Boston, MA, USA / Also published in LNCS, Volume 8289
Linear regression amounts to estimating a linear model that maps features (e.g., age or gender) to corresponding data (e.g., the answer to a survey or the outcome of a medical exam). It is a ubiquitous tool in experimental sciences. We study a setting in which features are public but the data is private information. While the estimation of the linear model may be useful to participating individuals, (if, e.g., it leads to the discovery of a treatment to a disease), individuals may be reluctant to disclose their data due to privacy concerns. In this paper, we propose a generic game-theoretic model to express this trade-off. Users add noise to their data before releasing it. In particular, they choose the variance of this noise to minimize a cost comprising two components: (a) a privacy cost, representing the loss of privacy incurred by the release; and (b) an estimation cost, representing the inaccuracy in the linear model estimate. We study the Nash equilibria of this game, establishing the existence of a unique non-trivial equilibrium. We determine its efficiency for several classes of privacy and estimation costs, using the concept of the price of stability. Finally, we prove that, for a specific estimation cost, the generalized least-square estimator is optimal among all linear unbiased estimators in our non-cooperative setting: this result extends the famous Aitken/Gauss-Markov theorem in statistics, establishing that its conclusion persists even in the presence of strategic individuals.
© Springer. Personal use of this material is permitted. The definitive version of this paper was published in WINE 2013, 9th Conference on Web and Internet Economics, 11-14 December 2013, Boston, MA, USA / Also published in LNCS, Volume 8289 and is available at : http://dx.doi.org/10.1007/978-3-642-45046-4_23