Feature Learning for Bayesian Inference

  • Mira, Antonietta (PI)
  • Perez-Cruz, Fernando (CoPI)
  • Albert, Carlo (CoPI)
  • Onnela, Jukka-Pekka (CoPI)
  • Laio, Alessandro (CoPI)

Project: Research project

Project Details

Description

The goal of this project is to use interpretable Machine Learning (ML) to find low-dimensional features in high-dimensional noisy data generated by (i) stochastic models or (ii) real systems. In the first case, we are interested in the features imprinted on simulated data by the parameters of the stochastic model. In the second, the interesting features depend on the particular system: in hydrology, one of the domains considered in this project, they are fingerprints of catchment properties in observed river-runoff time series. In both cases, the problem is to disentangle the effect of high-dimensional disturbances (noise realizations in the first case, the rain falling on the catchment in the second) from the effect of the relevant characteristics (model parameters in the first case, catchment properties in the second).

This problem is reminiscent of the problem of finding collective quantities that characterize states of interacting particle systems in Statistical Mechanics. Variational Autoencoders (VAEs) have proven capable of learning such quantities in the form of order parameters or collective variables, which can then be used to identify phase transitions or to enhance Molecular Dynamics (MD) simulations.

We expect parameter features of stochastic model outputs to be of great value for Bayesian inference. They can be used to sample from the Bayesian posterior in situations where the likelihood function is too expensive to evaluate, so that one has to resort to comparing summary statistics (ideally sufficient ones) of simulated data with the corresponding statistics of observed data, as in Approximate Bayesian Computation (ABC). Instead of comparing summary statistics, we may attempt to sample the product space of model parameters and model states. As this results in an extremely high-dimensional inference problem, sophisticated sampling schemes such as Hamiltonian Monte Carlo (HMC) have to be employed. But HMC, which is essentially the same algorithm as MD, suffers from slowly mixing collective features of the high-dimensional model states associated with changing model parameters. Hence, one objective of this project is to bring the method of biased sampling with collective variables, developed for MD, to fruition for sampling-based Bayesian inference.

A considerable part of the project is further devoted to the development of ML methods for summary-statistics learning. Towards this end, we will also employ a robust algorithm, developed by the PI, for finding the Intrinsic Dimension (ID) of data. Not only might this method help to identify the ideal number of summary statistics, but it might also be useful for identifying different phases of the model from the distribution of the learned features. The data ID can also guide the design of the architecture of the ML tools, leading to higher interpretability.

Finally, we will apply feature learning to real data in different domains, either directly to observed data (hydrology) or indirectly to the outputs of the stochastic model used to model the data (infectious disease epidemiology). We stress that dimensionality reduction, feature learning and interpretable ML are versatile tools that can be leveraged far beyond Bayesian learning, and that the application domains of the developed methodology extend beyond our two case studies.
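To make the ABC idea above concrete, the sketch below runs plain rejection-ABC on a toy Gaussian model. It is a minimal illustration, not the project's method: the model, prior bounds, tolerance, and the hand-picked `summaries` function are all assumptions; the project would replace `summaries` with ML-learned statistics.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n=200):
    """Toy stochastic model: iid Gaussian draws with unknown mean and scale."""
    mu, sigma = theta
    return rng.normal(mu, sigma, size=n)

def summaries(x):
    """Hand-picked summary statistics; the project would learn these with ML."""
    return np.array([x.mean(), x.std()])

# "Observed" data generated with ground-truth parameters (1.0, 2.0)
s_obs = summaries(simulate((1.0, 2.0)))

accepted = []
for _ in range(50_000):
    theta = rng.uniform([-5.0, 0.1], [5.0, 5.0])   # draw from a uniform prior
    s_sim = summaries(simulate(theta))
    if np.linalg.norm(s_sim - s_obs) < 0.2:        # accept if summaries are close
        accepted.append(theta)

if accepted:
    posterior = np.array(accepted)
    print(len(posterior), "accepted; posterior mean:", posterior.mean(axis=0))
```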
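The product-space sampling route relies on HMC, which integrates Hamiltonian dynamics with a leapfrog scheme exactly as MD does. The following is a textbook single-step implementation, assuming only a callable that returns the log-density and its gradient; it is a generic sketch, not the project's biased-sampling scheme.

```python
import numpy as np

def hmc_step(logp_grad, x, eps=0.1, n_leapfrog=20, rng=None):
    """One Hamiltonian Monte Carlo step.

    logp_grad(x) must return (log-density, gradient) at x.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = rng.standard_normal(x.shape)        # resample Gaussian momentum
    logp0, grad = logp_grad(x)
    h0 = logp0 - 0.5 * p @ p                # negative Hamiltonian at the start
    x_new = x.copy()
    p = p + 0.5 * eps * grad                # leapfrog: initial half step in p
    for _ in range(n_leapfrog):
        x_new = x_new + eps * p             # full step in position
        logp, grad = logp_grad(x_new)
        p = p + eps * grad                  # full step in momentum
    p = p - 0.5 * eps * grad                # undo half of the last momentum step
    h1 = logp - 0.5 * p @ p
    # Metropolis accept/reject corrects the integration error
    return x_new if np.log(rng.uniform()) < h1 - h0 else x

# Usage: sample a 2-D standard normal (log p = -x.x/2, grad = -x)
logp_grad = lambda x: (-0.5 * x @ x, -x)
x = np.zeros(2)
samples = [x := hmc_step(logp_grad, x) for _ in range(1000)]
```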
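The description does not specify which robust ID estimator is meant, so as a hedged illustration the sketch below implements the widely used TWO-NN estimator (Facco, d'Errico, Rodriguez and Laio, 2017): the ratio mu = r2/r1 of second- to first-nearest-neighbour distances depends only on the intrinsic dimension d, with log(mu) approximately Exp(d)-distributed, giving the maximum-likelihood estimate d = N / sum(log mu). The test data are an assumption for demonstration.

```python
import numpy as np

def two_nn_id(X):
    """TWO-NN intrinsic-dimension estimate for an (N, D) data matrix."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D.sort(axis=1)                  # row i: 0 (self), r1(i), r2(i), ...
    mu = D[:, 2] / D[:, 1]          # ratio of 2nd to 1st neighbour distance
    return len(X) / np.sum(np.log(mu))

# A 2-D manifold embedded in a 10-D ambient space: estimate should be near 2
rng = np.random.default_rng(1)
z = rng.standard_normal((1000, 2))
X = np.hstack([z, np.zeros((1000, 8))])
print(two_nn_id(X))
```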

Status: Active
Effective start/end date: 5/1/10 → 8/31/26

ASJC Scopus Subject Areas

  • Statistics and Probability
  • Mathematics (miscellaneous)
  • Public Health, Environmental and Occupational Health
  • Health (social science)
  • Medicine (miscellaneous)