I gave a talk at the Data Science Conference on building a real-time machine learning system with Kafka, Streamparse, and Storm. You can see the video on YouTube.
I recently gave a talk to the Duke Big Data Initiative entitled "Dr. Hopper, or How I Quit My Ph.D. and Learned to Love Data Science." The talk was well received, and my slides seemed to resonate with the data science community on Twitter.
I’ve started a long-form blog post with the same message, but it’s not done yet. In the meantime, I wanted to share the slides that went along with the talk.
I gave a talk last week at Research Triangle Analysts on understanding probabilistic topic models (specifically LDA) by using Python for simulation. Here’s the description:
Latent Dirichlet Allocation and related topic models are often presented in the form of complicated equations and confusing diagrams. Tim Hopper presents LDA as a generative model through probabilistic simulation in simple Python. Simulation will help data scientists understand the model's assumptions and limitations and use black-box LDA implementations more effectively.
You can watch the video on YouTube.
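To give a flavor of what "LDA as a generative model" means, here is a minimal sketch of the generative process in plain Python. It is not the code from the talk; the function names (`sample_dirichlet`, `generate_corpus`), the symmetric priors `alpha` and `beta`, and the fixed document length are illustrative assumptions. Each topic is a distribution over the vocabulary, each document draws its own topic mixture, and each word is produced by first picking a topic and then picking a word from that topic.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(vocab, n_topics, n_docs, doc_length, alpha=0.1, beta=0.1, seed=0):
    """Simulate documents from LDA's generative process (hypothetical helper).

    Returns (topics, doc_topic_mixtures, documents).
    """
    rng = random.Random(seed)
    # Each topic is a distribution over the vocabulary: phi_k ~ Dirichlet(beta).
    topics = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    mixtures = []
    documents = []
    for _ in range(n_docs):
        # Each document gets its own topic mixture: theta_d ~ Dirichlet(alpha).
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_length):
            # Choose a topic for this word position, then a word from that topic.
            z = rng.choices(range(n_topics), weights=theta)[0]
            words.append(rng.choices(vocab, weights=topics[z])[0])
        mixtures.append(theta)
        documents.append(words)
    return topics, mixtures, documents


# Usage: simulate three short documents from two topics over a toy vocabulary.
vocab = ["ball", "game", "team", "vote", "party", "law"]
topics, mixtures, docs = generate_corpus(vocab, n_topics=2, n_docs=3, doc_length=8)
for doc in docs:
    print(" ".join(doc))
```

Small values of `alpha` push each document toward a few topics, and small values of `beta` push each topic toward a few words; varying them and looking at the simulated documents is one way to build intuition for what a fitted model assumes.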
I gave a talk at PyData Carolinas 2016 on Sharing Your Side Projects Online. Here’s the abstract:
Python makes it easy to create small programs to handle all kinds of tasks, and tools like GitHub make it easy and free to share code with the world. However, simply adding a .py file to a GitHub repository (or worse, a zip file on your personal website) doesn’t mean other Python programmers will be able to run and use your code.
For years, I’ve written one-off scripts and small programs to automate personal tasks and satisfy my curiosity. Until recently, I was never comfortable sharing this code online. In this talk, I will share good practices I’ve learned and developed for sharing my small projects online.
The talk will include tips on writing reusable scripts, the basics of Git and GitHub, the importance of READMEs and software licenses, and creating reproducible Python environments with Conda.
Besides making your code more usable and accessible to others, the tips in this talk will help you make your GitHub profile a valuable component of your online résumé and open the door for others to improve your programs through GitHub pull requests.
The video is now online. I sincerely hope others find it valuable.
I gave a talk at the Research Triangle Analysts meetup about PySpark. It wasn’t recorded, but you can see the IPython notebook I presented from.
I gave a talk at a recent Research Triangle Analysts meetup on scikit-learn, the excellent machine learning library for Python. The talk wasn’t recorded, but you can see the IPython notebook that I presented from.
I presented at INFORMS 2012 on Bringing Operations Research into the 21st Century with Online Video. You can see the recording on YouTube.
I gave a talk at PyCarolinas 2012 about using Pickle and Redis to persist data with Python. It wasn’t recorded, but you can see the IPython notebook I presented from.
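The basic idea behind pairing Pickle with Redis is that Redis stores byte strings, and `pickle` turns (almost) any Python object into a byte string and back. The sketch below is not the notebook from the talk: to keep it runnable without a server, a plain dict called `fake_store` stands in for Redis, and the helpers `save` and `load` are hypothetical. With a real server, redis-py's `Redis.set(key, value)` and `Redis.get(key)` would take the place of the dict assignment and lookup.

```python
import pickle

# Hypothetical stand-in for a Redis server: a dict mapping string keys to bytes.
fake_store = {}


def save(store, key, obj):
    """Serialize a Python object with pickle and store the bytes under key."""
    store[key] = pickle.dumps(obj)


def load(store, key):
    """Fetch the bytes stored under key and unpickle them into a Python object."""
    return pickle.loads(store[key])


record = {"user": "tim", "scores": [3, 1, 4], "active": True}
save(fake_store, "user:tim", record)
restored = load(fake_store, "user:tim")
print(restored == record)  # True
```

Because unpickling can execute arbitrary code, this pattern is only appropriate when the data in the store comes from a trusted source.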