I Basically Can't Hire People Who Don't Know Git

In 2014, Sebastian Gutierrez published a collection of interviews entitled Data Scientists at Work. My friend and former boss Eric Jonas posted his interview on his website. It’s full of gems.

On engineering skills required for data science work, Eric says:

On the industry side, I think that the ability to do software engineering is something that is very important, but isn’t really taught. You don’t actually learn it as a computer science undergraduate, and you certainly don’t learn it as a graduate student. So for me it’s very important that someone has learned it somehow—either by themselves or from someone else. I basically can’t hire people who don’t know Git.

On someone trained in pure mathematics learning to analyze real-world data, Eric says:

…data analysis is so much messier than actual math. I have friends who work on these topology-based approaches, and I’m like, “You realize these manifolds totally evaporate when you actually throw noise into the system. How do you think this is really going to play out here?” So I would much rather someone be computationally skilled. I’m willing to trade off what their Putnam score was for how many open source GitHub projects they’ve committed to in the past.

I tried to argue this same point in an earlier post.

On applying academic research, Eric observes:

For example, when I evaluate machine learning papers, what I am looking to find out is whether the technique worked or not. This is something that the world needs to know—most papers don’t actually tell you whether the thing worked. It’s really infuriating because most papers will show five dataset examples and then show that they’re slightly better on two different metrics when comparing against something from 20 years ago. In academia, it’s fine. In industry, it’s infuriating, because you need to know what actually works and what doesn’t.

I have suggested before that we need a good website for sharing implementations of academic algorithms, with a forum for discussing whether each algorithm actually works.

I highly recommend reading Eric’s full interview.