David Hogg Appointed Deputy Director of the Center for Data Science

David W. Hogg, NYU Associate Professor, Department of Physics, Director of Undergraduate Studies—Physics, Center for Cosmology and Particle Physics, and Adjunct Senior Staff Scientist at the Max Planck Institute for Astronomy, Heidelberg, Germany, has recently been appointed Deputy Director of the Center for Data Science, serving alongside Interim Director Raghu Varadhan.

Professor Hogg is also Executive Director at NYU of the Moore-Sloan Data Science Environment Initiative, a five-year partnership between New York University, the University of California, Berkeley and the University of Washington, supported by a $37.8 million grant from the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation. This ambitious, far-reaching initiative seeks to bring together scientists and engineers in methodological domains and the sciences to foster breakthrough research, interdisciplinary collaboration and scientific discovery.

Recently, Professor Hogg discussed his new appointment, NYU inter-departmental collaboration and the scientific significance of a Washington Square playground.

Why did you agree to become Deputy Director of the Center for Data Science?

Primarily, of course, because I am a data scientist and I want to spread the news. My research is in astronomy, but it is interdisciplinary with statistics, applied mathematics and machine learning. NYU’s entire data science effort is extremely timely; there are many scientific projects across the University that could benefit from interdisciplinary data science approaches.

Another reason I agreed to take on the role as Deputy Director is that I put a lot of time into getting the Center for Data Science started and supporting the Moore-Sloan Data Science Environment here at NYU, and I want that work to result in a smoothly running and sensibly operating Center for Data Science. Raghu Varadhan, the Interim Director, felt that it was important to have continuity between the former director, Yann LeCun, and whoever succeeds him as the next permanent director. I agreed to help with that continuity.

One more reason I agreed to be the Deputy Director is that the Center for Data Science is not a Department of the University. It cuts across the whole University, not just across Departments but across Schools. It includes the Schools of Engineering, Journalism, Education, Nursing, Medicine, Dental, Law, the social sciences... everything. It is effectively a partnership across and among Departments, and among members of Departments.

One of the things this non-Department requires is an academic structure that is consistent with NYU rules and the principles of academia. For instance, an important issue that is starting to come up is that faculty who are teaching in the Data Science Master’s program are being asked in their home Departments, “Why are you teaching in the Center for Data Science and not for us?” This is one of the priorities we’re tackling right now: creating ways for the Center for Data Science to be beneficial to the Departments, for their faculty to be involved in it, and for the Departments to feel the benefit of data science.

What does a department have to gain from lending its faculty to the Center for Data Science?

That is the question. And there are several answers and aspects to it. The most trivial is that the Center for Data Science is a revenue-generating part of the University. Revenue can be shared with the Departments that participate, so one aspect is purely that we can help the Departments with their costs.

Another aspect is that when a faculty member teaches in the Center for Data Science, we cross-list the course with the home Department and make it beneficial for the graduate students and senior undergrads of that Department to take it. For instance, when Foster Provost teaches in Data Science, his courses are cross-listed with Stern, and Stern graduate students take them alongside Center for Data Science grad students.

The Center for Data Science courses are available to everybody in the University. One of the benefits of having a faculty member in your Department teaching in the Center for Data Science is that your students become more aware of the activity in data science. That is one of the ways we are propagating information about data science to the broader University.

That leads to a third point: In the long run, Departments benefit enormously from being intellectually engaged in the Center for Data Science because every scientific Department of the University is doing empirical research with complex data. The Center for Data Science is going to make that research work better, make connections that make graduate students more capable, make projects more successful, make measurements more accurate and permit new kinds of discoveries.

How will you get the Departments to realize the benefit of the Center for Data Science and participate in it?

One of the other reasons I agreed to be the Deputy Director is that the Center for Data Science is the home unit for the Moore-Sloan Data Science Environment, of which I am the Executive Director. The grant from the Moore and Sloan Foundations is partially to understand how to propagate data science activities that are inherently interdisciplinary around the University. In essence, how do you get everybody in the University to benefit from each other’s work in data science? How do you get somebody in the Medical School to benefit from research that’s going on in the Physics Department? There are huge overlaps in the way people work with their data but they’re not being discovered, exploited and understood because people in different Departments don’t talk to each other, at least not as much as they should.

So how do we get the Departments to realize the benefit of the Center for Data Science and want to be a part of it? One of the reasons the Moore and Sloan Foundations chose NYU as one of the three awardees for this grant is that they saw the opportunity to have significant influence as our university considers these things.

What are your plans for getting the word out?

In the short term, we are hosting events on different scales, both large and small. The first one, held in March, was an intimate event where we invited people in different Departments who we know are interested in data science. Jennifer Hill gave a very nice lecture about causality and how to infer from data that X causes Y. It got people who already know about data science to come together and think more deeply about something they’re already involved with. We also want to do some big events that are very broad, symposium-like, where everyone at NYU is encouraged to come and we introduce them to data science.

We also want to organize events that are domain-specific, such as ways that data science impacts biology. Interestingly, when the Center for Data Science was first launched, one of the faculty members of the Physics Department said, “I don’t know what data science is. Can you explain it?” I promised her that I would do an event in the Physics Department explaining to the physicists what data science is and how I think data science will impact this Department and improve physics. In every Department, we have data science-interested faculty. We can hold similar events where someone inside that Department can say, “Here’s how data science benefits dentistry,” for example.

Right now, people around the University are aware of the Center for Data Science but don’t fully know what it is and what it can do for them. And once that is understood, I think a lot of Departments will be very excited about being involved in the Center for Data Science, both from a teaching perspective and from an intellectual perspective of benefiting from what we’re doing.

Can you explain the “Key Park effect”?

When the Moore and Sloan Foundations decided to give us the Data Science Environment grant, they specifically mentioned a particular playground in Washington Square Village called the Key Park, where faculty kids play. They mentioned it because several people they interviewed had started interdisciplinary collaborations by sitting on the bench in the Key Park talking to the person next to them while their kids ran around. The Key Park is a nexus where NYU faculty meet other NYU faculty who are in different Departments and work on different things. The Moore and Sloan people said, “We want to capture that kind of energy and informality, that safe space where new ideas can emerge that are interdisciplinary and not inside the silos of the University.”

Our challenge is to capture this almost magical environment in a way that is more integrated into the institution. For instance, you can’t benefit from the playground if you don’t have a kid! So we want to create a space and a way to interact where people in the University will naturally find new interdisciplinary collaborations without having to take a kid to the Key Park.

If we can figure out how to create an interesting interdisciplinary space where people can talk to each other and new projects can be born, that could affect everything.

What long-term impact do you see this communal space, and in a broader sense, the Center for Data Science, having on NYU?

The Center for Data Science’s permanent space is going to be in the Forbes Magazine building on lower Fifth Avenue, which we will renovate to be appropriate for what we need. One of the things we are thinking about is making it a type of studio/café where people can come in, bring their team and work. Optimally, there might be two teams that could discuss over coffee what they are doing, and maybe they would have some overlap and collaborate on something. Similar to dating, we want to create an environment for people to discover that they click.

First, we need to make this space very attractive for people to come together. Then once they are there, they need a reason to stay. For myself and my style of working, spending time there would make new things happen, help things get done. It would make my research better.

Another idea we’re exploring is whether to have researchers from a certain domain on display on some day in the week or in the year. For instance, “Today the astrophysics group is working here. Feel free to join them, or observe, or ask questions.” People would learn what kind of techniques we use, what kind of data sets we have, where we have trouble, who our experts are, and so on. Then when it’s another domain’s turn, those faculty might say, “Oh, the astrophysicists had this same problem and now we have the solution.”

What’s interesting is that the problems people have in different domains overlap a lot, but they don’t always notice because of their locations in the University and because of traditional academic boundaries. If all the Center for Data Science does is move those boundaries by 10 percent, we will have done a lot of good. There are many people in the empirical sciences who have complex data and are stuck on a data analysis problem that other people have solved, but they just don’t know it. So if we can let people know that there is a solution to that problem, and let the people with the solutions know that those people over there are stuck on that problem, the Center for Data Science will be a huge success. That’s one of the reasons I’m very confident that we are going to have a big impact on the University.

Do you find it gratifying being Director of Undergraduate Studies in the Department of Physics?

I love teaching. That’s the fundamental reason I chose to do research in an academic setting, rather than, say, a national lab or an observatory. One of the things I’ve really enjoyed as the Director of Undergraduate Studies is that we have been making a lot of changes to the Physics Major, making it more experimental so that it involves more time in the lab, more hands-on activities and more data science. Plus, we’ve been getting more and more of our undergraduates involved in original research projects.

As Director of Undergraduate Studies, I’m also Chair of our Curriculum Committee. This ties into the Center for Data Science in the trivial sense that Data Science has a Master’s program, and with that, a curriculum. This means that it needs a Director of Graduate Studies and a Curriculum Committee; one of the first things I did as Deputy Director of the Center for Data Science was establish those two.

I really love my Curriculum Committee because the members are filled with good ideas about how we should be teaching and what changes we should make. And I like bringing that structure to the Center for Data Science because the CDS comprises a very creative faculty and, as we do in the Department of Physics, we should be getting together in a room and arguing about how we are teaching and what we are teaching and why we are teaching. It’s a very exciting time.

By ML Ball