Amortized Integer Linear Programming Inference
University of Illinois, Urbana-Champaign
Computational approaches to problems in Natural Language Understanding and Information
Extraction are often modeled as structured predictions – predictions that involve assigning
values to sets of interdependent variables. Over the last few years, one of the most successful
approaches to studying these problems involves Constrained Conditional Models (CCMs), an
Integer Learning Programming formulation that augments probabilistic models with declarative
constraints as a way to support such decisions.
I will present research within this framework, focusing on learning and inference issues and, in
particular, will present recent results on amortized inference – learning to accelerate inference
during the life time of the learning system.
Dan Roth is a Professor in the Department of Computer Science and the Beckman Institute
at the University of Illinois at Urbana-Champaign and a University of Illinois Scholar. He is the
director of a DHS Center for Multimodal Information Access & Synthesis (MIAS) and holds
faculty positions in Statistics, Linguistics, and at the School of Library and Information Sciences.
Roth is a Fellow of the ACM, AAAI and ACL, for his contributions to Machine Learning and to
Natural Language Processing. He has published broadly in machine learning, natural language
processing, knowledge representation and reasoning, and learning theory, and has developed
advanced machine learning based tools for natural language applications that are being used
widely by the research community.
Roth is the Associate Editor-in-Chief of the Journal of Artificial Intelligence Research (JAIR) and
will serve as Editor-in-Chief for a two-year term beginning in 2015. He was the program chair
of AAAI’11, ACL’03 and CoNLL'02, has been on the editorial board of several journals in his
research areas and has won several teaching and paper awards. He has also given keynote talks
and presented tutorials in some of the major conferences in his research areas.
Prof. Roth received his B.A Summa cum laude in Mathematics from the Technion, Israel, and his
Ph.D in Computer Science from Harvard University in 1995.
Learning Discriminative Value of Information for Structured Prediction
University Washington, Seattle
Discriminative methods for learning structured models have
enabled wide-spread use of very rich feature representations.
However, the computational cost of feature extraction is
prohibitive for large-scale or time-sensitive applications, often
dominating the cost of inference in the models. Significant
efforts have been devoted to sparsity-based model selection to
decrease this cost. Such feature selection methods control
computation statically and miss the opportunity to fine-tune
feature extraction to each input at run-time. We address the key
challenge of learning to control fine-grained feature extraction
adaptively, exploiting non-homogeneity of the data. We propose an
architecture that uses a rich feedback loop between extraction
and prediction. The run-time control policy is learned using
efficient value-function approximation, which adaptively
determines the value of information of features at the level of
individual variables for each input. We demonstrate significant
speedups over state-of-the-art methods on two challenging
datasets. For articulated pose estimation in video, we achieve a
more accurate state-of-the-art model that is simultaneously
many times faster while using only a small fraction of possible
features, with similar results on an OCR task.
Ben Taskar is the Boeing Associate Professor in Computer Science
and Engineering at the University of Washington. He received his
doctoral degree in Computer Science from Stanford University. His
research interests include machine learning, natural language
processing and computer vision. He has been awarded the Sloan
Research Fellowship, the NSF CAREER Award, and selected for the
Young Investigator Program by the Office of Naval Research and
the DARPA Computer Science Study Group. His work on structured
prediction has received best paper awards at NIPS and EMNLP
On Finding First and m-best Solutions in Probabilistic Graphical Models
University of California, Irvine
Inference tasks over probabilistic graphical models can be reduced to either maximization queries such as “most probable explanation” (MPE) that seek a high probability configuration of the model, or summation queries that compute the marginal distribution over the variables, or a combination such as the MAP queries. These queries are instrumental for both learning the models and for applications using it and while all are NP-hard they are normally solved (mostly approximately) using either message-passing schemes (e.g., the junction-tree algorithm, generalized belief propagation, variable-elimination) or by search (e.g., AND/OR search, MCMC schemes) or by a hybrid of the two.
The focus of this talk is on an extending the single-solution maximization queries into m-best queries. These queries seek to find several high-probability configurations of the model, which can be used to assess the stability or robustness of the “best” configuration and give an idea of the uncertainty associated with optimal solutions. This task is apparently highly relevant in various applications including speech-recognition and vision.
Following a brief overview of state-of the art AND/OR search schemes for MPE and on their use of mechanically generated heuristics such a mini-buckets and linear-relaxation-based soft-arc-consistency ( a version that won the 2011 Pascal competition), I will describe recent m-best algorithms extending both variable-elimination and AND/OR graph search schemes (e.g. AND/OR Branch and Bound). I will analyze, compare and contrast the various schemes and will show, a) that search is superior to pure variable-elimination and b) that Best-First schemes (e.g., m-A*), maintain their principled superiority compared with other search schemes, but practically can fail whenever not sufficient memory is available thus making depth-first branch and bound (m-BB) overall more robust. Finally, c) a simple hybrid of variable-elimination and search is superior from a worst-case analysis point of view. I will discuss extensions of these schemes to approximations and to anytime methods.
Collaboration with: Natasha Flerova, Radu Marinescu and Emma Rollon
Rina Dechter is a professor of Computer Science at the University of California, Irvine. She received her PhD in Computer Science at UCLA in 1985, an MS degree in Applied Mathematic from the Weizmann Institute and a B.S in Mathematics and Statistics from the Hebrew University, Jerusalem. Her research centers on computational aspects of automated reasoning and knowledge representation including search, constraint processing and probabilistic reasoning. Professor Dechter is an author of Constraint Processing published by Morgan Kaufmann, 2003, has authored over 150 research papers, and has served on the editorial boards of: Artificial Intelligence, the Constraint Journal, Journal of Artificial Intelligence Research and journal of Machine Learning (JLMR). She was awarded the Presidential Young investigator award in 1991, is a fellow of the American association of Artificial Intelligence since 1994, was a Radcliffe Fellowship 2005-2006 and received the 2007 Association of Constraint Programming (ACP) research excellence award. She has been Co-Editor-in-Chief of Artificial Intelligence, since 2011.
Better! Faster! Stronger (theorems)! Learning to Balance Accuracy and Efficiency when Predicting Linguistic Structures
University of Maryland, College Park
Viewed abstractly, many classic problems in natural language processing can be cast as trying to map a complex input (e.g., a sequence of words) to a complex output (e.g., a syntax tree or semantic graph). This task is challenging both because language is ambiguous (learning difficulties) and represented with discrete combinatorial structures (computational difficulties). I will discuss some approaches we've been developing to build learning algorithms that explicitly learn to trade-off accuracy and efficiency, applied to a variety of language processing phenomena. Moreover, I will show that in some cases, we can actually obtain a model that is faster and more accurate by exploiting smarter learning algorithms. And yes, those algorithms come with stronger theoretical guarantees too. This is joint work with Jason Eisner, Jiarong Jiang, He He, Adam Teichert and Tim Vieira.
Hal Daumé III is an assistant professor in Computer Science at the University of Maryland, College Park. He holds joint appointments in UMIACS and Linguistics. He was previously an assistant professor in the School of Computing at the University of Utah. His primary research interest is in developing new learning algorithms for prototypical problems that arise in the context of language processing and artificial intelligence. This includes topics like structured prediction, domain adaptation and unsupervised learning; as well as multilingual modeling and affect analysis. He associates himself most with conferences like ACL, ICML, NIPS and EMNLP. He earned his PhD at the University of Southern California with a thesis on structured prediction for language (his advisor was Daniel Marcu). He spent the summer of 2003 working with Eric Brill in the machine learning and applied statistics group at Microsoft Research. Prior to that, he studied math (mostly logic) at Carnegie Mellon University. He still likes math and doesn't like to use C (instead he uses O'Caml or Haskell). He doesn't like shoes, but does like activities that are hard on your feet: skiing, badminton, Aikido and rock climbing.
Learning to Search the Space of Structured Outputs
Oregon State University
Researchers in Artificial Intelligence have long studied
heuristically guided search in combinatorial search spaces for
optimization and constraint satisfaction problems. However, little
has been done in the area for structured prediction to leverage
those ideas. This talk will review recent work in that direction
that develops a new framework for structured prediction called
HC-Search. Given a structured input, the framework uses a search
procedure guided by a learned heuristic H to uncover high quality
candidate outputs and then uses a separately learned cost function C
to select a final prediction among those outputs. We can decompose
the regret of the overall approach into the loss due to H not
leading to high quality outputs, and the loss due to C not
selecting the best among the generated outputs. Guided by this
decomposition, we minimize the overall regret in a greedy
stage-wise manner. Experiments on several benchmark domains show
that our approach significantly outperforms the state-of-the-art
Alan Fern is an Associate Professor of Computer Science at Oregon
State University. He received his Ph.D (2004) and M.S (2000) in
Computer Engineering from Purdue University under the direction of
Robert Givan, and his B.S (1997) in Electrical Engineering from the
University of Maine. His research interests span a range of topics
in artificial intelligence, including machine learning and
automated planning, with a particular interest in work falling at
their intersection. He is also interested in applications of AI,
including AI for real-time strategy games and activity recognition
from video of American football and other sports.