CSE 120 Fall 2003

We have revised the course extensively for 2003. We use a single programming language, Java, to introduce the basic ideas of computing, rather than starting with O'Caml and switching to Java later. This is a bit tricky, as Java was not designed for introductory teaching, but we think we have found a way to smooth the transition between introductory material and more advanced topics that are already taught with Java. See the course page for more details.

To Prospective Graduate Students

I receive so many e-mail messages from potential graduate students that if I replied thoughtfully to each I would have no time left for research. I try to reply to specific questions about my research. Anything you have to say about your background, qualifications, and research interests would be better put in your official application to our graduate program. Students are admitted to the program, not into individual research groups, based on the record of their studies and tests, recommendation letters, and the match between their stated interests and the ongoing research in the department. All applications are studied carefully. E-mail from applicants to individual faculty is not necessary.

Research

My main research goal is to develop machine-learnable models of language and of other natural sequential data such as biological sequences. Penn, with its strong and growing machine learning group, is the ideal place to pursue that goal. My most recent work has been on finite-state models for text information extraction and speech recognition, but I am also interested in information-theoretic approaches to inducing compact representations of multivariate data, and in bridging the gap between distributional and logical views of natural-language syntax and semantics.

Conditional probability models for information extraction and segmentation

Many sequence-processing problems involve segmenting a sequence into subsequences (person names vs. other text) or labeling sequence elements (parts of speech). Existing probabilistic methods for these tasks, in particular HMMs, have difficulty dealing with multiple overlapping features of the input sequence. Maximum entropy Markov models were a first approach to this problem; they have now been superseded by the more powerful conditional random fields (a schematic form of the model appears after the research summaries below). We are applying these models to text information extraction and gene finding.

Finite-state speech processing

What do regular expressions turn into when we need to assign weights (maybe probabilities) to alternative matches, and to compose pattern matchers? Weighted finite-state transducers (the composition operation is sketched below). At AT&T, I was involved in developing these as a framework for speech recognition, leading to the creation of a powerful library that has been made available for non-commercial use.

The information bottleneck

How does one quantify the notion of information about something? Given some variables of interest, sources of information about those variables can be compressed while preserving the information about the variables. The tradeoff between compression and information preservation, which we call the information bottleneck (stated as an objective below), answers the question. Using this model, we can build compact representations of complex relationships, for instance word cooccurrences in text.

Formal semantics of natural language

The syntactic structures of natural-language sentences and their meanings must be linked by a systematic, compositional process for language learning and use to be possible. However, this form of compositionality is more subtle than those used in logical and programming languages. Linear logic turns out to be a good metalanguage for describing the natural-language syntax-semantics mapping.
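As a concrete reference for the sequence models above, here is the conditional random field distribution in its standard linear-chain form (schematic: the f_k are feature functions over adjacent labels and the input, and the lambda_k are learned weights):

    p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big)

where Z(x) sums the same exponential over all possible label sequences. Because the model conditions on the whole input x, the features f_k may overlap arbitrarily, which is exactly what HMMs handle poorly.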
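The composition of weighted transducers mentioned above has a simple algebraic form. Schematically, over a semiring with addition \oplus and multiplication \otimes (ordinary addition and multiplication for probabilities), the weight that the composed transducer assigns to an input-output string pair is

    (T_1 \circ T_2)(x, y) = \bigoplus_{z} \, T_1(x, z) \otimes T_2(z, y)

that is, we sum over all intermediate strings z the combined weights of mapping x to z and z to y. In speech recognition this lets acoustic, pronunciation, and language models be built separately and combined mechanically.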
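The information bottleneck tradeoff can be written as a single variational objective: choose a compressed representation T of the source X, via a stochastic map p(t | x), that solves

    \min_{p(t \mid x)} \; I(X; T) - \beta \, I(T; Y)

where Y is the variable of interest, I denotes mutual information, and the parameter \beta > 0 sets the balance between compression (small I(X;T)) and preserved information (large I(T;Y)).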
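A small schematic example of the linear-logic approach (in the style of glue semantics, simplified here): word meanings are paired with linear-logic formulas over syntactic resources, say g for the subject and f for the sentence, and linear modus ponens consumes the subject resource to assemble the sentence meaning:

    \mathit{alice} : g, \quad \lambda x.\, \mathit{sleep}(x) : g \multimap f \;\; \vdash \;\; \mathit{sleep}(\mathit{alice}) : f

The resource sensitivity of linear logic (each premise is used exactly once) is what makes the derivation track the syntax-semantics mapping.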
Bio

I was born and raised around Lisbon, Portugal. I started college studying electrical engineering but majored in mathematics. While in college, I worked part-time on an architectural CAD project at LNEC, a government engineering laboratory. After graduating, I stayed at LNEC for two years as a systems programmer and administrator, but also got involved in urban traffic modeling, artificial intelligence, and logic programming. In 1977 I took up a scholarship from the British Council to study artificial intelligence at the University of Edinburgh. There I worked on natural-language understanding and logic programming, and for a while again on architectural CAD. I was involved in creating the first Prolog compiler (for the PDP-10), and I also wrote the first widely used Prolog interpreter for 32-bit Unix machines. I graduated in 1982 and joined the Artificial Intelligence Center of SRI International in Menlo Park, CA, where I worked on logic programming, natural-language understanding, and later on speech-understanding systems. During 1987-88, I headed SRI's research center in Cambridge, England. I joined AT&T in the summer of 1989, where I worked on speech recognition, speech retrieval, probabilistic language models, and several other topics. From 1994 to 2000, I headed the Machine Learning and Information Retrieval department of AT&T Labs -- Research. I spent the 2000-2001 academic year as a research scientist at WhizBang! Labs, where I developed finite-state models and algorithms for information extraction from the Web.