DataWrangl Dreaming in Data

Reviewing the UW Data Science Certificate Program

In December, I completed the third of three courses in the University of Washington’s Professional & Continuing Education (PCE) Certificate in Data Science program. Each course in the program ran for 10 weeks, with one 3-hour lecture per week. Since I live in Denver, I could not physically attend the meetings on the UW campus in Seattle, so I was part of the online cohort. This is my review of the program; I hope it is useful for others who are considering it.

Why Did I Enter the Certificate Program?

Time for a Change

In early 2014 I decided I had had enough. I had been doing data-related work for the federal government for a little over 8 years, and I was ready for a new challenge. The work was good, but I was tired of the crushing bureaucracy, I felt that my career was no longer progressing in the direction I wanted (someone had unfortunately decided that I had a little aptitude for management, and I could feel myself getting pulled in), and I was tired of using a 2006 release of MATLAB for much of my “fun” data work.

Taking MOOCs

So in 2014 I began taking Massive Open Online Courses (MOOCs) in earnest. I knew I loved, and was fairly good at, data analysis and programming, so I began taking courses in the Data Science track at Udacity (this was before their popular Nanodegree programs were developed). By May, I had discovered Coursera, and in June I began the JHU Data Science Specialization (I finished that specialization in December 2014). I plan to review some aspects of these MOOCs at a later date, but for now I’ll just mention that these courses were life-changing.

UW Certificate and Application

In summer 2014, I found out about the UW PCE Data Science certificate. Even though I was taking many MOOCs, I was still unsure whether they would actually help me qualify for a job. I wasn’t getting many hits from recruiters on LinkedIn, and the few applications I chucked over companies’ walls didn’t get any bites. I wasn’t ready for the time and expense of a master’s program in Data Science, so I thought a university professional certificate would be the next-best thing, and it might help me make contacts that could lead to a job.

So I applied for the program. It’s been about a year and a half since then, so I don’t remember many specifics about the application process, but I believe it involved a 25-question test on Data Science knowledge (some SQL and stats are what I remember), a resume, a short statement of interest, and an application fee of about $50. It wasn’t too stressful.

Accepted, but New Job!

I was accepted into the program. However, there was a waitlist, so I couldn’t start until April 2015.

By the time April rolled around, I was in talks with a recruiter and had an interview with the company I now work for. They wanted me to come and do Machine Learning for them! I was 4 weeks into the first class of the certificate program when I accepted the new job. So I had a dilemma: I had reached my goal (a new job in the private sector), but I had just begun a program I had waited 7 months to start (and paid a non-refundable $1100+ to attend). Should I continue? After some deliberation, I gave in and decided that the money was a sunk cost, so I might as well learn something.

Course #1 was good enough that I continued to pay for and take each subsequent course, despite a massive internal debate each time (do I need this? should I spend my time/money elsewhere?). Overall, I mostly enjoyed the rest of the program and feel accomplished for finishing; however, I’m not sure it was the best investment of my time and money. I learned more, and more quickly, from most MOOCs than I did from this program (James Altucher’s Don’t Send Your Kids to College post comes to mind). But I also made several valuable connections with my fellow classmates, far more than I made from any MOOC.

Is This Program for You?

OK, enough about my motivations for taking the certificate program. If you are a budding Data Scientist wondering whether to try for the certificate, here are the positives and negatives I found, which might help you make a decision.

What I Liked About the Program

  • Certificate from a trusted institution that is a leader in Data Science and Machine Learning.
  • Attend courses online or in person (if you live in the Seattle area). Nice that they give you the choice, and that even as an online student you can attend “live.”
  • Personal feedback from the instructors on all assignments (for me it was usually 2-3 lines of feedback). Not sure if this is better or worse than auto-graded or peer-graded assignments on many MOOCs.
  • Courses taught by Data Scientists in industry. My courses were taught by Data Scientists at Prediction Software, Zillow, and Microsoft.
  • Opportunity to talk to and ask questions of the instructors. Even online, there is a chat function, and the teaching assistant relays questions to the instructor. It’s nice to get real-time feedback from the instructors.
  • The cost: I spent around $3400 for the three courses. Much more expensive than a MOOC, but much cheaper than a master’s degree.
  • Extra learning and reading materials. The instructors all did a great job of finding interesting things to read as part of the weekly assignments.
  • Making contacts, and a continuous cohort. In the first course, there were about 45 students, roughly half in class and half online. The instructors set up a LinkedIn group, and most of us connected (the group is closed, so don’t try to find and join it!). Maybe 20% of the students had washed out by the third course, but the students who were most active in the group stayed in and contributed. It’s been fun watching my classmates get new jobs in analytics over the past few months, and making the connections with them has been valuable.
  • Fairly easy assignments. This one cuts both ways: since the assignments weren’t hard, I didn’t learn much from them, but that left time to supplement my learning with other experiences. On average, I spent 3 hours a week in class and 2-3 hours on homework.

What I Didn’t Like about the Program

  • Mandatory attendance at no fewer than 8 of 10 lectures per course. Attendance is taken. The class meets at 6PM Pacific Time and runs for 3 hours, so since I’m an hour ahead, that was 7PM-10PM for me. For the first two months, I had to wake up at 5AM for work, so attending the lectures live could make for a sleepy next day. Obviously, the more time zones there are between you and Seattle, the harder it will be to attend the lectures online.
  • Watching lectures in real-time. When I take MOOCs, I usually speed the videos up to 1.5x to 2.0x the real speed, depending on how fast the speaker talks. This helps me focus better. With MOOCs, you can also rewind if you miss something. With the live course, obviously you can’t do that until the lecture is posted (usually the next day), and then finding your key moment isn’t trivial.
  • Cannot see the instructor in lectures. Other online courses I’ve taken have a camera trained on the instructor, so you can see them while also looking at slides. UW’s technology, for some reason, doesn’t show the instructor, so you just hear their voice and look at the slides. This makes it much harder to focus if you are an online student, and dilutes the learning a bit, since you can’t see the non-verbal communication.
  • Continuity between classes less than ideal. In courses #2 and #3, there were several times when the instructors asked, “Did you learn this last course?” It would be good if a more focused curriculum could be nailed down and the instructors passed information to each other better.
  • Weka for Machine Learning? In the second course, we did statistics using R. Then, instead of continuing with R in the third course, the instructor taught Machine Learning using Weka. The good people at the University of Waikato did a good job with the Weka software, but is anyone in industry still using Weka? I don’t see it in many job advertisements. Fortunately, the instructor knew R well and accepted assignments in R if we wanted, so that’s how I completed mine. But he taught much of the course using Weka, which I think was a major missed opportunity and a mistake.
  • Not enough depth. Okay, Data Science is a very broad subject, and it is growing all the time. UW apparently decided to go for breadth rather than depth in this program. This is probably a good decision, because there’s only so much you can fit into 90 hours of lecture time (3 courses of 10 weekly 3-hour lectures) before sending students off to learn on their own, having at least been exposed to new concepts. But I kept wanting to go deeper into the subjects we talked about; instead, we’d move on to the next concept after a slide or two.

Quick Review of Individual Courses

Course 1: Introduction to Data Science

The first course was a basic survey of the land of Data Science. It truly is an introduction, and it assumes almost no previous knowledge of Data Science. We learned about the basic flow of data through a project and went through primers on the tools of Data Science (R, Python, SQL, MATLAB/Octave, and a brief intro to Hadoop). We had a few homework assignments in R, learned about sparse matrices, and did a SQL assignment or two. The course isn’t very challenging, especially if you’ve had any previous exposure to Data Science, but the lectures were good and the extra readings were helpful. I give this course a B-: good content and a good instructor, but the speed and depth were not what I had hoped I was paying for. I almost dropped out of the program after this course, but when it came time to register and hand over my credit card information for Course 2, I went ahead and did it.

Course 2: Methods for Data Analysis

Despite its name, this was primarily a statistics and data wrangling course. I thought it was the best course of the three, and I learned quite a bit about some stats methods I had not known. My session of the course was taught by a Senior Data Scientist at Zillow, an Applied Math PhD who was also a talented teacher. Unfortunately, it looks like future sessions of the course may have a new instructor (TBD as I write this), so hopefully they can find a good replacement.

All of the homework used R. We had some exposure to web scraping with R (yes, you can say “Ew!” here; Python is far superior for web scraping!), running Monty Hall simulations, basic network graph analysis, regression, and creating autoregressive variables, among other statistics concepts. The course ended with a final project in which we had to find our own data, analyze it, and write a report. My final project was an exploration of Denver B-Cycle 2014 Ridership.

I give this course an A. I found it to be fairly challenging at times, and the homeworks made me think and took a few hours each. I felt this course was worth my money and my time.

Course 3: Deriving Knowledge from Data at Scale

After a good experience with Course 2, I had no problem signing up for the third course.

This course was kind of, sort of, the Machine Learning course.

This course was taught by a long-time Boeing statistician who is currently working as a Principal Data Scientist for Microsoft. He seemed like a very personable guy, making jokes and telling folksy stories for the class. I wish I could have been at the lectures in person, instead of listening in online, because I could tell he was really interacting with the class and sharing his enthusiasm (he also had a nasty habit of wandering away from the microphone for half a minute at a time).

However, I was rather disappointed in this course as a follow-up to the second course. By then, we had spent probably 3 months working in R across the first and second courses, so it would have made sense to teach machine learning concepts and assign homework in R. Instead, as I mentioned in my gripe list, we used Weka. Sure, Weka has a nice-looking GUI, but I didn’t want to learn new software that I guarantee I will not use in the future (sorry, but the Machine Learning packages in R, Python, and Spark will be much more useful). Fortunately, the instructor allowed us to turn in our homework in R. But I became so frustrated with the course that I would tune in to the lectures to get credit for attendance, then leave my computer running while I went to do something else.

The final project for this course was to participate in a Kaggle competition. The focus was only partly on the competition itself; it was more about documenting our process, from understanding and exploring the data, to modeling, to writing up our results. The instructor asked us to pair up, which is another very difficult thing to do as an online student. I initially found a couple of guys to form a team; however, we were all in different time zones, and given our family responsibilities it was difficult to meet up at night, so I eventually said “sorry guys, I’m going it alone.” Each team in the class chose a current Kaggle competition that looked interesting to them, so we collectively worked on about 8 different projects instead of competing against each other, as in the edX Analytics Edge course. I settled on the How Much Did It Rain? II competition; if you’re interested, you can see my project writeup in my GitHub repo.

I give this course a C-. There was good information, and the instructor was interesting, but the choice of Weka boggled my mind, and the course organization was a bit poor (students were often confused by due dates).

Summary

Overall, I give this certificate program a grade of B-.

I would recommend it for people who can attend in person; for those with good tech/math/stats skills who have not been exposed to Data Science but are really curious and want to learn; and for those who like a very well-structured learning environment. I would also recommend it if you are interested in growing your professional network of like-minded individuals (especially in the Seattle area). I’ve heard that people in the courses network and have helped each other find jobs. Maybe that happened in my cohort too, but it’s hard to tell as an online student.

I don’t recommend it if you’ve taken, or are comfortable taking, MOOCs in Data Science and/or Machine Learning (Coursera, Udacity, edX, etc.), if you are willing to slog through the Open Source Data Science Masters list, or if you have been working in a Data Science capacity for any length of time. For my money and my time, courses on the MOOC sites were much more valuable. Don’t expect to finish these three courses and come out a Data Scientist on the other side; they barely scratch the surface (although they could be a good place to start!).

If you have any additional questions, please ask them in the comments below! Also, if any of my fellow students care to share their experiences in the comments, that would also be appreciated! I am but one voice; anyone else trying to decide for themselves whether this is a good program for them would be wise to listen to more than just me!