Inferning 2012‎ > ‎

Keynote Talks

Keynote Talks        News        Papers        Schedule        Contact Us        Dates        Call For Papers

Learning Tractable but Expressive Models

Pedro Domingos

University of Washington, Seattle

Inference is the hardest part of learning. Learning most powerful models requires repeated intractable inference, and approximate inference often interacts badly with parameter optimization. At inference time, an intractable accurate model can effectively become an inaccurate model due to approximate inference. All these problems would be avoided if we learned only tractable models, but standard tractable model classes - like thin junction trees and mixture models - are insufficiently expressive for most applications. However, in recent years a series of surprisingly expressive tractable model classes have been developed, including arithmetic circuits, feature trees, sum-product networks, and tractable Markov logic. I will give an overview of these representations, algorithms for learning them, and their startling successes in challenging applications.

Modeling, Inference and Estimation using Random Maximum A-Posteriori Perturbations

Tamir Hazan

Toyota Technological Institute at Chicago

Learning and inference in complex models drives much of the research in machine learning applications, from computer vision, natural language processing, to computational biology. The inference problem in such cases involves assessing the likelihood of possible structures, whether objects, parsers, or molecular structures. Although it is often feasible to only find the most likely or maximum a-posteriori (MAP) assignment rather than considering all possible assignments, MAP inference is limited when there are other likely assignments. In a fully probabilistic treatment, all possible alternative assignments are considered thus requiring summing over the assignments with their respective weights -- evaluating the partition function -- which is considerably harder. The main surprising result of our work is that MAP inference (maximization) can be used to approximate and bound the partition function (weighted counting). This leads us to a new approximate inference framework that is based on MAP-statistics, thus does not depend on pseudo-probabilities, contrasting the current framework of Bethe approximations which lacks statistical meaning. This approach excels in regimes where there are several but not exponentially many prominent assignments. For example, this happens in cases where observations carry strong signals (local evidence) but are also guided by strong consistency constraints (couplings).

Structured Prediction using Linear Programming Relaxations: Algorithms and Theory

David Sontag

New York University

Predicting structured objects such as parse trees or protein folds is often formulated as combinatorial optimization, where the goal is to find the most likely structure given the available evidence. Linear programming relaxations are a powerful tool for solving these optimization problems. Learning for structured prediction corresponds to inverse combinatorial optimization, finding parameters for the model such that for each of the data points, the optimal solution is the desired structure. This talk will survey existing algorithms and theory relating to learning for structured prediction using linear programming relaxations.

Learning Approximate Inference Policies for Fast Prediction

Jason Eisner

Johns Hopkins University

In many domains, our best models are computationally intractable. This problem will only get worse as we manage to build more richly detailed models of specific domains. Fortunately, the practical goal of artificial or natural intelligence is not to do perfect detailed inference, but rather to answer specific questions by reasoning from observed data. Thus, we should seek policies for fast approximate inference that will actually achieve low expected loss on our target task. The target task is a distribution not only over test-time data but also over which variables will be observed and queried. The loss function may explicitly penalize for runtime (or data acquisition).

This story leaves open an engineering question: What space of policies should we search? I will review a range of options and point out past work for each. Among others, I will show our own recent successes using message-passing approximate inference policies for graphical models. The form of these policies is determined by the structure of our intractable and surely mismatched domain model, but we tune the parameters to minimize loss (Stoyanov & Eisner, 2012).

One may search a space of sophisticated policies or a space of crude hacks, but crucially, one should tune the policy parameters to optimize expected error and runtime. This expectation can be taken over training data (empirical risk minimization), or over samples from a posterior belief about the target task (which I will call imputed risk minimization). The latter case requires some sort of prior model, but this is necessary when the empirical risk estimate suffers from sparse, non-independent, or out-of-domain training data.

Marginalization-Based Parameter Learning in Graphical Models

Justin Domke

Rochester Institute of Technology

Likelihood-based learning of graphical models faces challenges of computational complexity and robustness to model error.  This talk will discuss methods that directly maximize a measure of the accuracy of predicted marginals, in the context of a particular approximate inference algorithm.  Experiments suggest that marginalization-based learning, by compensating for both model and inference approximations at training time, can perform better than likelihood-based approximations when the model being fit is approximate in nature.