DATA Talk: "Variable Prioritization in Nonlinear Black Box Methods, with Applications in Genomics and to Interpreting Deep Neural Networks"

Seth Flaxman - Lecturer in the statistics section of the Department of Mathematics at Imperial College London, joint with the Data Science Institute

Date: Tue, 05/14/2019, 14:00 - 16:00
Location: Eurecom

Abstract: I will present two recent papers (https://arxiv.org/abs/1801.07318 and https://arxiv.org/abs/1901.09839) describing our work on developing new methods to interpret nonlinear Bayesian machine learning models. In the first paper, we address variable selection questions in nonlinear and nonparametric regression. Motivated by statistical genetics, where nonlinear interactions are of particular interest, we introduce a novel and interpretable way to summarize the relative importance of predictor variables. Methodologically, we develop the "RelATive cEntrality" (RATE) measure to prioritize candidate genetic variables that are not just marginally important, but whose associations also stem from significant covarying relationships with other variants in the data. We illustrate RATE through Gaussian process regression, but the methodological innovations apply to other "black box" methods. In the second paper, we extend these methods to deep neural networks (DNNs) and computer vision. DNNs are successful across a variety of domains, yet our ability to explain and interpret these methods is limited. We propose an effect size analogue for DNNs that is appropriate for applications with highly collinear predictors (ubiquitous in computer vision).