Inferning 2013

Keynote Talks

Amortized Integer Linear Programming Inference


Dan Roth

University of Illinois, Urbana-Champaign

Computational approaches to problems in Natural Language Understanding and Information Extraction are often modeled as structured predictions – predictions that involve assigning values to sets of interdependent variables. Over the last few years, one of the most successful approaches to studying these problems involves Constrained Conditional Models (CCMs), an Integer Linear Programming formulation that augments probabilistic models with declarative constraints as a way to support such decisions.
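The constrained-decision idea behind CCMs can be illustrated with a toy sketch. Everything here is an illustrative assumption, including the brute-force search: a real CCM compiles the objective and constraints into an Integer Linear Program and calls an ILP solver.

```python
from itertools import product

# Toy CCM-style inference: maximize a sum of per-variable scores subject
# to a declarative constraint, found here by brute-force enumeration
# (a real system would encode this as an Integer Linear Program).

def constrained_map(scores, constraint):
    """scores: one dict (label -> score) per variable.
    constraint: predicate over a full assignment."""
    best, best_score = None, float("-inf")
    labels = [list(s.keys()) for s in scores]
    for assignment in product(*labels):
        if not constraint(assignment):
            continue
        total = sum(s[a] for s, a in zip(scores, assignment))
        if total > best_score:
            best, best_score = assignment, total
    return best, best_score

# Example: three tokens with B/I/O labels; the declarative constraint
# says an "I" label may not start a span.
scores = [
    {"B": 1.0, "I": 2.0, "O": 0.5},
    {"B": 0.2, "I": 1.5, "O": 0.3},
    {"B": 0.1, "I": 0.4, "O": 1.0},
]

def no_leading_I(y):
    return all(not (lab == "I" and (i == 0 or y[i - 1] == "O"))
               for i, lab in enumerate(y))

y, score = constrained_map(scores, no_leading_I)
# The unconstrained optimum (I, I, O) violates the constraint,
# so the constrained optimum is (B, I, O).
```
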

I will present research within this framework, focusing on learning and inference issues and, in particular, will present recent results on amortized inference – learning to accelerate inference over the lifetime of the learning system.

Dan Roth is a Professor in the Department of Computer Science and the Beckman Institute at the University of Illinois at Urbana-Champaign and a University of Illinois Scholar. He is the director of a DHS Center for Multimodal Information Access & Synthesis (MIAS) and holds faculty positions in Statistics, Linguistics, and at the School of Library and Information Sciences.

Roth is a Fellow of the ACM, AAAI and ACL, for his contributions to Machine Learning and to Natural Language Processing. He has published broadly in machine learning, natural language processing, knowledge representation and reasoning, and learning theory, and has developed advanced machine learning based tools for natural language applications that are being used widely by the research community.

Roth is the Associate Editor-in-Chief of the Journal of Artificial Intelligence Research (JAIR) and will serve as Editor-in-Chief for a two-year term beginning in 2015. He was the program chair of AAAI'11, ACL'03 and CoNLL'02, has been on the editorial board of several journals in his research areas and has won several teaching and paper awards. He has also given keynote talks and presented tutorials at some of the major conferences in his research areas. Prof. Roth received his B.A. summa cum laude in Mathematics from the Technion, Israel, and his Ph.D. in Computer Science from Harvard University in 1995.


Learning Discriminative Value of Information for Structured Prediction


Ben Taskar

University of Washington, Seattle

Discriminative methods for learning structured models have enabled widespread use of very rich feature representations. However, the computational cost of feature extraction is prohibitive for large-scale or time-sensitive applications, often dominating the cost of inference in the models. Significant efforts have been devoted to sparsity-based model selection to decrease this cost. Such feature selection methods control computation statically and miss the opportunity to fine-tune feature extraction to each input at run-time. We address the key challenge of learning to control fine-grained feature extraction adaptively, exploiting the non-homogeneity of the data. We propose an architecture that uses a rich feedback loop between extraction and prediction. The run-time control policy is learned using efficient value-function approximation, which adaptively determines the value of information of features at the level of individual variables for each input. We demonstrate significant speedups over state-of-the-art methods on two challenging datasets. For articulated pose estimation in video, we achieve a more accurate state-of-the-art model that is simultaneously many times faster while using only a small fraction of possible features, with similar results on an OCR task.
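As a rough illustration of per-input adaptive feature extraction (an assumed toy interface, not the talk's actual architecture or value-function policy), the following sketch pays for features one at a time and stops extracting for an input once the running prediction is confident enough:

```python
# Minimal sketch of adaptive feature extraction: features are computed
# one at a time for each input, and extraction stops once the current
# prediction margin exceeds a confidence threshold, so easy inputs
# consume fewer features than hard ones.

def adaptive_predict(x, extractors, weights, margin=1.0):
    score = 0.0
    used = 0
    for extract, w in zip(extractors, weights):
        score += w * extract(x)       # pay for one more feature
        used += 1
        if abs(score) >= margin:      # confident enough: stop early
            break
    return (1 if score >= 0 else -1), used

# Two toy features; this easy input stops after the first extraction.
extractors = [lambda x: x[0], lambda x: x[1]]
weights = [2.0, 1.0]
label, n_features = adaptive_predict((0.9, -0.3), extractors, weights)
```
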

Ben Taskar is the Boeing Associate Professor in Computer Science and Engineering at the University of Washington. He received his doctoral degree in Computer Science from Stanford University. His research interests include machine learning, natural language processing and computer vision. He has been awarded the Sloan Research Fellowship, the NSF CAREER Award, and selected for the Young Investigator Program by the Office of Naval Research and the DARPA Computer Science Study Group. His work on structured prediction has received best paper awards at NIPS and EMNLP conferences.

On Finding First and m-best Solutions in Probabilistic Graphical Models


Rina Dechter

University of California, Irvine

Inference tasks over probabilistic graphical models can be reduced to maximization queries such as "most probable explanation" (MPE) that seek a high-probability configuration of the model, to summation queries that compute the marginal distribution over the variables, or to a combination such as MAP queries. These queries are instrumental both for learning the models and for the applications that use them; although all are NP-hard, they are normally solved (mostly approximately) using either message-passing schemes (e.g., the junction-tree algorithm, generalized belief propagation, variable elimination), search (e.g., AND/OR search, MCMC schemes), or a hybrid of the two.
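The two query types can be contrasted on a tiny joint distribution (a toy illustration only, not one of the algorithms discussed in the talk):

```python
# Toy two-variable model: an MPE (maximization) query returns the single
# most probable full configuration, while a summation query marginalizes
# one variable out to obtain a distribution over the other.

joint = {("x0", "y0"): 0.4, ("x0", "y1"): 0.1,
         ("x1", "y0"): 0.2, ("x1", "y1"): 0.3}

mpe = max(joint, key=joint.get)                        # MPE query
marg_x = {x: sum(p for (xv, _), p in joint.items() if xv == x)
          for x in ("x0", "x1")}                       # summation query
```
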

The focus of this talk is on extending the single-solution maximization queries into m-best queries. These queries seek to find several high-probability configurations of the model, which can be used to assess the stability or robustness of the "best" configuration and give an idea of the uncertainty associated with optimal solutions. This task is highly relevant in various applications, including speech recognition and vision.

Following a brief overview of state-of-the-art AND/OR search schemes for MPE and their use of mechanically generated heuristics such as mini-buckets and linear-relaxation-based soft arc-consistency (a version that won the 2011 PASCAL competition), I will describe recent m-best algorithms extending both variable elimination and AND/OR graph search schemes (e.g., AND/OR Branch and Bound). I will analyze, compare and contrast the various schemes and will show (a) that search is superior to pure variable elimination; (b) that best-first schemes (e.g., m-A*) maintain their principled superiority over other search schemes, but in practice can fail when sufficient memory is not available, making depth-first branch and bound (m-BB) more robust overall; and (c) that a simple hybrid of variable elimination and search is superior from a worst-case-analysis point of view. I will discuss extensions of these schemes to approximations and to anytime methods.
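For intuition, an m-best query can be stated on a trivially small model as follows (brute force over all configurations; the algorithms in the talk obtain the same answers far more efficiently via variable elimination or AND/OR search):

```python
import heapq
from itertools import product

# Toy m-best query: enumerate every configuration of a tiny model with
# additive per-variable scores and keep the m highest-scoring ones.

def m_best(unaries, m):
    labels = [list(u.keys()) for u in unaries]
    scored = ((sum(u[a] for u, a in zip(unaries, assignment)), assignment)
              for assignment in product(*labels))
    return heapq.nlargest(m, scored)

unaries = [{"a": 0.7, "b": 0.3}, {"a": 0.6, "b": 0.4}]
top2 = m_best(unaries, 2)   # best: ("a","a"); second best: ("a","b")
```
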

In collaboration with Natasha Flerova, Radu Marinescu and Emma Rollon.

Rina Dechter is a professor of Computer Science at the University of California, Irvine. She received her Ph.D. in Computer Science from UCLA in 1985, an M.S. in Applied Mathematics from the Weizmann Institute, and a B.S. in Mathematics and Statistics from the Hebrew University, Jerusalem. Her research centers on computational aspects of automated reasoning and knowledge representation, including search, constraint processing and probabilistic reasoning. Professor Dechter is the author of Constraint Processing, published by Morgan Kaufmann in 2003, has authored over 150 research papers, and has served on the editorial boards of Artificial Intelligence, the Constraints journal, the Journal of Artificial Intelligence Research, and the Journal of Machine Learning Research (JMLR). She was awarded the Presidential Young Investigator Award in 1991, has been a fellow of the American Association for Artificial Intelligence since 1994, was a Radcliffe Fellow in 2005-2006, and received the 2007 Association for Constraint Programming (ACP) Research Excellence Award. She has been Co-Editor-in-Chief of Artificial Intelligence since 2011.

Better! Faster! Stronger (theorems)! Learning to Balance Accuracy and Efficiency when Predicting Linguistic Structures


Hal Daumé III

University of Maryland, College Park

Viewed abstractly, many classic problems in natural language processing can be cast as trying to map a complex input (e.g., a sequence of words) to a complex output (e.g., a syntax tree or semantic graph). This task is challenging both because language is ambiguous (learning difficulties) and because it is represented with discrete combinatorial structures (computational difficulties). I will discuss some approaches we've been developing to build learning algorithms that explicitly learn to trade off accuracy and efficiency, applied to a variety of language processing phenomena. Moreover, I will show that in some cases we can actually obtain a model that is faster and more accurate by exploiting smarter learning algorithms. And yes, those algorithms come with stronger theoretical guarantees too. This is joint work with Jason Eisner, Jiarong Jiang, He He, Adam Teichert and Tim Vieira.

Hal Daumé III is an assistant professor in Computer Science at the University of Maryland, College Park. He holds joint appointments in UMIACS and Linguistics. He was previously an assistant professor in the School of Computing at the University of Utah. His primary research interest is in developing new learning algorithms for prototypical problems that arise in the context of language processing and artificial intelligence. This includes topics like structured prediction, domain adaptation and unsupervised learning; as well as multilingual modeling and affect analysis. He associates himself most with conferences like ACL, ICML, NIPS and EMNLP. He earned his PhD at the University of Southern California with a thesis on structured prediction for language (his advisor was Daniel Marcu). He spent the summer of 2003 working with Eric Brill in the machine learning and applied statistics group at Microsoft Research. Prior to that, he studied math (mostly logic) at Carnegie Mellon University. He still likes math and doesn't like to use C (instead he uses O'Caml or Haskell). He doesn't like shoes, but does like activities that are hard on your feet: skiing, badminton, Aikido and rock climbing.

Learning to Search the Space of Structured Outputs


Alan Fern

Oregon State University

Researchers in Artificial Intelligence have long studied heuristically guided search in combinatorial search spaces for optimization and constraint satisfaction problems. However, little has been done to leverage those ideas for structured prediction. This talk will review recent work in that direction that develops a new framework for structured prediction called HC-Search. Given a structured input, the framework uses a search procedure guided by a learned heuristic H to uncover high-quality candidate outputs and then uses a separately learned cost function C to select a final prediction among those outputs. We can decompose the regret of the overall approach into the loss due to H not leading to high-quality outputs, and the loss due to C not selecting the best among the generated outputs. Guided by this decomposition, we minimize the overall regret in a greedy stage-wise manner. Experiments on several benchmark domains show that our approach significantly outperforms state-of-the-art methods.
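The two-stage structure of HC-Search can be sketched on a toy problem (all names and the toy domain below are illustrative assumptions; the actual framework learns H and C from data for structured outputs):

```python
# Minimal HC-Search sketch: a heuristic H guides a greedy successor
# search that collects candidate outputs, then a separate cost function
# C selects the final prediction from among the candidates.

def hc_search(start, successors, H, C, steps=5):
    candidates = [start]
    current = start
    for _ in range(steps):
        nexts = successors(current)
        if not nexts:
            break
        current = min(nexts, key=H)   # H steers exploration
        candidates.append(current)
    return min(candidates, key=C)     # C makes the final selection

# Toy domain: outputs are integers, successors are +/-1 moves.
# H (imperfect) prefers moving toward 10; C scores true quality
# as distance to 7, so the search passes through the best output
# and C picks it out of the candidate set.
succ = lambda y: [y - 1, y + 1]
H = lambda y: abs(y - 10)
C = lambda y: abs(y - 7)
pred = hc_search(0, succ, H, C, steps=12)
```

This separation is the point of the decomposition in the abstract: even when H alone would overshoot the best output, the overall prediction is still good as long as H's trajectory contains a high-quality candidate for C to select.
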

Alan Fern is an Associate Professor of Computer Science at Oregon State University. He received his Ph.D. (2004) and M.S. (2000) in Computer Engineering from Purdue University under the direction of Robert Givan, and his B.S. (1997) in Electrical Engineering from the University of Maine. His research interests span a range of topics in artificial intelligence, including machine learning and automated planning, with a particular interest in work falling at their intersection. He is also interested in applications of AI, including AI for real-time strategy games and activity recognition from video of American football and other sports.