DataWrangl Dreaming in Data

Most Important Data Science Skill? (Re)learning to Learn, Perhaps

“What, in your opinion, is the most important skill for data scientists to have?”

Fortunately, I had a minute to think. I was one of four data scientist panelists talking to a meetup group of mostly aspiring data scientists about how to get into the profession and how to be successful. I was on the opposite end from the microphone, so I could gather my thoughts while the other panelists answered.

Storytelling: The Power to Influence in Data Science

Note: This is a lightly edited excerpt from my recent interview with Kaggle.
Humans are storytelling animals. We love a good story. Stories, whether in the form of movies, epics, songs, or myths, capture us, move us, bind together generations, and put us in touch with deeper messages. Everyone knows how to tell stories, at least a little bit. But certainly some of us seem to have more storytelling talent than others. Think of that group of five or six friends you hung out with in college: there was probably one person in that group whom everyone loved to listen to. They could tell stories for hours on end – funny stories, stories that made you think, stories that made you cry, stories that changed your life. This person didn’t necessarily know more about the world than you. But they knew how to craft their message in an interesting manner, and likely had an outsized influence on you and your circle of friends.

Data Science: Beyond the Kaggle

A few weekends ago, on a snowy Saturday in April (not uncommon in Denver), I signed into Kaggle for the first time in several months, looking to play around with some competition data in order to while away the chilly day. My kids’ endless chatter and my wife’s disapproving looks faded into the background, and I blissfully wrangled data from the Expedia Hotel Recommendation competition for several hours. I submitted a few entries, slowly climbing the leaderboard, and then finally I got up to help with my family duties.

My Post on the Comverge Blog

Just a quick plug here. I work as a Data Scientist doing energy and demand response forecasting for Comverge in Denver, Colorado. The marketing team asked me to put together a high-level overview of how we are using Machine Learning at the company. If you are interested in learning more about how we use Data Science and Machine Learning techniques at Comverge, please click through to see the blog post. I used R to do all my modeling and make the “Model 1” and “Model 2” charts. My engineering team created the nice user interface using D3.

Exploring 2014 Denver B-cycle Ridership

Denver, Colorado, is considered to be one of the most bikeable cities in America. In addition to the fine bike lanes, bike trails, and mostly favorable weather (300 days of sunshine, says the possibly dubious but sunny claim), Denver is also a B-cycle city, with bicycle rental kiosks dotting the sidewalks near the city center. For this short study, I obtained Denver’s 2014 B-cycle trip data, and used it along with some other data sources to see if I could model hourly ridership across the Denver B-cycle network. My study indicated that most calendar and clock variables are highly significant when predicting ridership, and weather variables such as temperature and amount of cloud cover appear to be as well. This post details how I obtained the data and merged data from different sources, shows some explorations of the data, and finally shows how I created a regression model of system ridership.
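
To make the shape of that workflow concrete, here is a minimal sketch in standard-library Python. It is not the code from the study. The function names, the choice of predictors (temperature, cloud cover, a weekend flag, and a sine/cosine encoding of the hour), and the input layout are all assumptions made for illustration. It counts trips per clock hour, joins those counts to hourly weather readings, and fits an ordinary least squares regression by solving the normal equations.

```python
from collections import Counter
from datetime import datetime
import math


def hourly_counts(checkout_times):
    """Count trips per clock hour by truncating each checkout time to the hour."""
    return Counter(t.replace(minute=0, second=0, microsecond=0)
                   for t in checkout_times)


def merge_hourly(counts, weather):
    """Join hourly trip counts to hourly weather readings.

    `weather` is a hypothetical dict mapping an hour (datetime) to a
    (temperature, cloud_cover) tuple. Hours with no trips get a count of 0.
    """
    return [(hour, counts.get(hour, 0), temp, cloud)
            for hour, (temp, cloud) in sorted(weather.items())]


def design_row(hour, temp, cloud):
    """Build one row of predictors: intercept, weather, weekend flag, cyclic hour."""
    angle = 2 * math.pi * hour.hour / 24
    weekend = 1.0 if hour.weekday() >= 5 else 0.0
    return [1.0, temp, cloud, weekend, math.sin(angle), math.cos(angle)]


def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Solved with Gauss-Jordan elimination and partial pivoting. Assumes the
    columns of X are not collinear; a singular system raises ZeroDivisionError.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit_ridership(checkout_times, weather):
    """Aggregate, merge, and fit; returns one coefficient per design_row column."""
    rows = merge_hourly(hourly_counts(checkout_times), weather)
    X = [design_row(hour, temp, cloud) for hour, _, temp, cloud in rows]
    y = [float(count) for _, count, _, _ in rows]
    return fit_ols(X, y)
```

Calling `fit_ridership(checkout_times, weather)` with a list of trip checkout datetimes and an hour-keyed weather dict returns six coefficients in the column order of `design_row`. A real analysis would more likely lean on a data-frame and statistics library, which would also report the significance tests mentioned above; the hand-rolled solver here only keeps the sketch dependency-free.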