Currently on sabbatical from the Max Planck Institute, Bernhard Schölkopf is spending several months in New York at the Courant Institute of Mathematical Sciences. On May 8, 2013, Professor Schölkopf sat down with reporter M.L. Ball to discuss the field of machine learning, its relationship to large data sets, and the intimidation factor inherent in delivering Courant Lectures.
Please describe your experience of delivering this year’s Courant Lectures.
It was certainly a great honor; they asked me two months ago. It was quite intimidating to be invited and then to look at the list of past Courant speakers. The very first one was Eugene Wigner, who gave a very famous talk, “The Unreasonable Effectiveness of Mathematics in the Natural Sciences.” I had read that before. So that was quite scary.
I started off in mathematics and physics, but now I’m in a field that’s closer to computer science, so I don’t often talk to a mathematical audience. I do every so often, because mathematics, certainly on the applied side, is now getting interested in machine learning and empirical observations. I try to imagine what is of interest to mathematicians; they’re interested in slightly different questions.
The first Courant lecture was supposed to be of broad interest, so I tried to make it very understandable. I’ve found that when you’re working in one part of mathematics and you listen to a speaker from another field who doesn’t really adapt the presentation or material, you can be completely lost. Each field can be so rich and specialized that you can easily get lost if you have a different mathematical background. I hope people found it interesting; afterward I got positive feedback, and quite a few people came back the next day for the more specialized talk.
What brought you to Courant for three months, other than giving the lectures?
My wife and I had been planning to do a sabbatical for quite some time, and the kids’ ages seemed right to do it now. I convinced her that we should go to America; we were looking at Berkeley and New York. I know some very interesting colleagues here, so I have three hosts: one is David Hogg from cosmology and astrophysics at NYU; another is Rob Fergus in computer vision at Courant; and the third is Yann LeCun, in machine learning and head of the new Center for Data Science. I did the main part of my Ph.D. at Bell Labs in New Jersey in a department headed by Yann almost 20 years ago, so it is nice to come back to a place headed by him again! New York has always been my favorite place in America. I lived here for a year in 2001, when September 11 happened.
I’m now in machine learning, but actually I started with physics and mathematics. The reason for studying physics was that I always wanted to become an astronomer; that was my scientific interest as a child, and I was very much fascinated by it. Then I got sidetracked by a chain of events that led me to machine learning, a field I didn’t know existed when I was a student. But when I found out about it, it fascinated me just as much.
I found out that there are people using mathematical tools to study how we find laws or regularities in the world, so we can make observations and also predict what’s going to happen. When I first noticed that this exists, I was amazed that it was a field of mathematical study. I wanted to learn about it so I got into that field. It was a good decision because this field is relatively young. Almost all the people who developed it are still around. It has been developing fast; it’s like a little revolution in science.
Could you define machine learning?
Traditionally in science, you make some measurements, do some experiments, measure some quantities, and then try to come up with some relatively simple laws that describe these measurements. You model your data and develop theories.
In machine learning, the story is a little bit different. We can find regularities in data that humans cannot find. These regularities may not be simple in mathematical terms, because you are looking at systems that are very complex, maybe very high dimensional. We have to look at many quantities at the same time. We don’t understand their interplay and how they produce a certain outcome, but we have training observations. We observe the system for a while, measure many quantities, and see what happens afterward. Then we feed these observations into a learning algorithm that ends up giving us a complicated expression: still a mathematical expression, but nothing that looks simple.
You can also think of it as a computer program that is predictive, that captures something about the real world that usually does not look simple but is not random. It does make correct predictions. It’s a way of finding laws in the real world but the laws are different from the laws of physics; they look more complicated. Bertrand Russell expressed this beautifully: “Physics is mathematical not because we know so much about the physical world, but because we know so little; it is only its mathematical properties that we can discover.”
The actual laws that we find in machine learning are not the thing that we study in theoretical physics. Instead of writing papers about the law that we found, we write papers about what procedures we used to arrive at a law, given empirical observations. We study the methodology by which people arrive at conclusions.
I originally started studying mathematics, physics and philosophy. I was always interested in epistemology and the philosophy of science, so I was interested in the question of how we get knowledge about the world. What I like about machine learning is that it’s a “mathematization” of this branch of philosophy. It’s a very exciting field to work in.
How does machine learning apply to the real world?
In a general sense, machine learning is applicable whenever we want to build a system that’s in some sense intelligent, not just hand-coded or doing some very simple kind of behavior. Something that’s a little bit more complicated and that is able to learn from experience.
Most applications of machine learning are software systems. On the Internet, for example, a company like Amazon needs a method for setting prices. Such systems are now automatic, using machine learning tools to decide what price they can charge for an item.
Another application is search engines, which need methods for ranking search results in a way that is consistent with what people actually look for. There’s no simple algorithm or formula for that. It’s a complicated system based on lots of training examples: which links you clicked on, in which order, and what that means. These are all empirical observations about some complex underlying regularity that’s not easy to model but that can now be automatically extracted from the data.
How does machine learning relate to data science?
Now that we have such systems working on huge data sets, possessing these data sets gives you a lot of power, in some sense. Rob Fergus here at Courant works in computer vision, including image recognition systems. He now probably has the best machine learning-based object recognition system in the world. Courant is a great center for machine learning, but Rob doesn’t have the largest data sets to train his systems on. These are owned by companies like Google.
When you do a Google image search (say, you’re looking for an image of a squirrel), you will get a list of results and you’ll click on one that shows a squirrel. Google records what you clicked on. That implies that they have labeled data sets for all the objects and categories that people search for, lots of data sets. People do billions of search queries, and of course these are not public. So it’s not just a question of who has the best algorithms to do this kind of learning, but also who has all this data. It raises a lot of interesting questions about data science.
We have some problems, for instance in biology, where if you look at only a few hundred training examples, the system will perform at chance level; it will not be better than random guessing. But if we have a few million training examples, it’s close to perfect. There is a regularity in the world, a very complicated one involving a few hundred variables, that will not be visible unless you look at millions of training examples. This is a paradigm shift.
The issue of large data sets connects to the question of what constitutes predictable structure in the world. Something might be predictable but not comprehensible. In the past, our way to predict things would have been to first comprehend, then build a model, and then use that model for prediction. But with computers and machine learning algorithms, we can now bypass that. It’s possible with large data sets; that’s why data is suddenly so important.
Your thoughts on NYU launching the Initiative for Data Science and Statistics and the Center for Data Science.
I think that’s very, very exciting. There are a few other places that are trying to do this but America is ahead of Europe in this respect. If you want to do something intelligent with data, to extract knowledge that will be useful, it’s hard to do that without machine learning or statistics.
I believe machine learning may allow us to discover, or at least predict, other properties of the world — properties that have evaded simple mathematical descriptions, for instance in biology.
While these regularities may not be comprehensible, we hope that there may be comprehensible methods for learning or inferring them. That’s why this field aspires to be called ‘data science,’ and NYU is pioneering this area both in computer science and in the natural sciences.
What do you expect to gain from your time here at Courant?
During the three months that I’m here, I will give a few talks in different places, including one session of a course on computer vision. But the main reason for my sabbatical is to do research.
What’s really nice about the NYU Center for Data Science is that, while some other places have machine learning departments, here it involves not just machine learning people but also people from the sciences, like physics (David Hogg). He wouldn’t call himself a machine learning guy, but he’s very much interested in inference from data.
I think it’s nice to connect data to the actual sciences. Nowadays a lot of machine learning is about making money, and a lot of the good students go to Internet companies or Wall Street. But how we make inferences from data and get useful knowledge is quite a fundamental problem. It shouldn’t all be used just to make more money.
What’s different about the Courant Institute?
I talk to people here every day — that’s what’s special, discussing different things. When you bring people together who normally don’t talk, sometimes something new comes up. It’s extremely inspiring; nowadays a lot of research happens in teams. It’s quite rare that someone just works alone. You’re much more productive if you can discuss with a lot of people who think slightly differently. Something comes up that wouldn’t have happened otherwise.
By M.L. Ball