A Subjective and Anecdotal FAQ on Becoming a Data Scientist

Three years ago, I wrote a post called “How I Became a Data Scientist Despite Being a Math Major”.

When I wrote the post, I thought I was making clear that my own story is about all I know about how someone can become a data scientist: that is, I shared my subjective experience. I intended to communicate my uncertainty about the path others should take. But, given the number of people who have read the post and emailed asking for my advice on becoming a data scientist, that message wasn’t clear. With most of these emails, I have felt bad because I simply don’t know the answers to the questions people have. But because so many people seem to have these questions, I decided to consolidate my uncertainty here. I hope you find it helpful.

A note before the Q&A: one reason it is so hard to advise someone on becoming a “data scientist” is that “data scientist” is an ill-defined job title. At some companies, data scientists are building machine learning models and running them in a high-performance production system. At other companies, data scientists are business analysts running SQL queries and visualizing the results with Tableau. Some data scientists are doing complex experimental design while others are mostly moving files around S3 buckets. Some data science roles require deep domain expertise; others only need some programming skill and knowledge of Scikit-Learn. As you shape your own path to data science, take some time to think specifically about the kind of work you are interested in doing; this will shape your preparation.

Should I get a masters degree?

My sense is that a good masters degree in a technical field is a valuable degree. It takes a fraction of the time a Ph.D. takes, yet you can learn a lot in 3 or 4 semesters of coursework, and the degree is viewed favorably by employers. The quality and curriculum of masters programs vary wildly, so you’ll want to do your due diligence before diving in.

That said, I’m wary of people going into deep debt for a masters degree. If you can get a teaching or research position that waives your tuition, do it. Consider going to a good school on in-state tuition instead of a great school for $100,000. If you have publicly subsidized graduate education (as in many European countries), go for it!

Will a masters degree help me become/be a data scientist?

I don’t know, but it has helped me. As I said in my earlier post, I learned a lot about algorithms, probability models, math, and machine learning that has been invaluable. Grad school also gave me the time and inspiration to learn R and Python, which have played important roles in my career.

Should I get a masters degree in computer science? Will it help me become a data scientist?

I think a masters degree in computer science is likely to pay off in the long run. It may not help you get a job as a data scientist, but it would undoubtedly help you in a data science job.

I have half of a computer science masters, and I sometimes wish I had finished it.

Should I get a masters degree in operations research?

I’m ambivalent about operations research. As a discipline, operations research has been in an identity crisis. As a curriculum, many operations research programs are full of content valuable for data scientists. As a signal to potential employers, operations research is relatively unknown and won’t mean as much as a degree in “machine learning” or similar.

Did you feel that operations research was too theoretical and that you had to hustle outside the classroom and build stat/ML skills separately?

Operations research programs have wildly different curriculums. Mine didn’t do a great job preparing me for real-world applications, but that might be better learned on the job anyway. My program allowed me to build statistics and ML expertise only because I had flexibility in the courses I could select; I was able to take a handful of stats/ML related classes. Other programs might not offer that.

Should I get a masters degree in statistics?

If you have strong programming/software engineering skills (or have another means of building them), a statistics degree could be valuable; as with anything, I’m sure the quality of statistics masters degrees varies greatly, and it’s worth trying to find a good one.

Should I get a Ph.D.?

I don’t think it’s worth it for most people. It’s also not necessary for the vast majority of data science jobs. I have a dedicated website to help you answer this question.

There are a lot of people with Ph.D.’s in data science roles. It’s possible that this is more a result of the large gap between the number of Ph.D.’s and the small number of permanent faculty positions in American universities. People with Ph.D.’s need jobs; many have skills overlapping with data science; they make their way from academia to data science. [1]

Can I become a data scientist with only an undergraduate degree?

Many others have. I have known several who don’t even have college degrees.

I’m in school. Should I take X class?

If it’s linear algebra, definitely. Otherwise, I’m not sure. Among other reasons, I’ve often been surprised by how classes I never expected to apply have helped me years later; it’s hard for me to know what other classes would’ve helped me had I taken them.

Take the best professors you can (note I didn’t say easiest). Talk to older students you admire about different classes and professors. Don’t let your schooling interfere with your education.

Can you evaluate my qualifications for being a data scientist?

Not very well. In fact, I think it’s pretty challenging even for people who interview data science candidates. I have tried to share the things that helped qualify me, and I imagine those things would be valuable for you as well.

I would suggest trying to evaluate your qualifications against specific jobs (or job descriptions) you are interested in. Are you more interested in analysis or production systems? Are you interested in cybersecurity applications? Ad markets? Social good? Journalism? Finance? Healthcare? Self-driving cars? Find job postings for roles and look at the qualifications. Find people in these roles on LinkedIn and look at their qualifications and job history.

One other note: being qualified doesn’t mean you will get job offers. Not getting an offer after interviewing might reflect more on poor interviewers than on your being a poor candidate.

How do I show-on-my-resume/demonstrate-to-employers that I am qualified to be a data scientist?

My best advice is to work on interesting and relevant things and tell people about them. I don’t know how to be more specific.

How can I get a job where I can do more applied math?

Even in data science jobs, a lot of the work is far removed from interesting math. My hypothesis is that relatively few people get to spend a substantial part of their job thinking about interesting math. I’ve almost never gotten to spend as much time doing math as I would like.

What skills would you recommend I develop if I hope to become a data scientist?

You can never be a good enough writer, communicator, software engineer, linear algebraist, or applied statistician. Tenacity is important too, though I’m not sure how you develop it.

Am I doing the right things to build a successful career in data science?

I don’t know the answer to that question. I have tried to share the things that have been valuable for me in the preceding answer and in my blog post.

I’m in a career as a teacher/developer/analyst/etc. Can you advise me on how I can transition to be a data scientist?

I have tried to share the things that worked for me; you might be able to emulate them, but I can’t guarantee they’ll work for you. I would encourage you to stay curious, keep learning, network (via the internet and face to face), and keep applying for jobs.

Can we find a time to talk on the phone about this?

Unfortunately, I don’t have the time and energy to do this.

Thanks to Roy Keyes, Vicki Boykis, and Justin Bozonier for helpful feedback on a draft of this post.

  1. I thank my friend Dr. Roy Keyes for this insight.