Introduction
The Netflix Prize was launched in late 2006 and, as of this post, is still running. The premise is simple: write software that predicts how a Netflix subscriber will rate a movie. To win, the predictions have to be 10% more accurate than Netflix's own system. At stake: $1,000,000.
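The prize measured accuracy by root mean squared error (RMSE) on held-out ratings, so "10% better" means an RMSE at least 10% lower than Netflix's baseline. The arithmetic is easy to get backwards, so here is a minimal Python sketch of it. The function names are hypothetical, and the baseline is a parameter rather than Netflix's published figure.

```python
def improvement_pct(baseline_rmse, candidate_rmse):
    """Percentage by which candidate_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


def required_rmse(baseline_rmse, target_pct=10.0):
    """RMSE a submission must reach to beat the baseline by target_pct percent."""
    return baseline_rmse * (1.0 - target_pct / 100.0)


# Hypothetical numbers: with a baseline of 2.0, an RMSE of 1.5 is a 25% improvement.
print(improvement_pct(2.0, 1.5))
```

Because lower RMSE is better, the improvement is measured relative to the baseline's error, not as an absolute difference.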
Coding, Not Competing
I can program, but this challenge is different: it demands specialized skills, database optimization and machine learning among them, and my expertise lies in neither. Why compete when my chances of winning are so small? Simply put, I wanted to see how far I could get. I also had a few goals of my own:
- I could barely remember how to write an SQL query. The last time I touched a database was in college, when I helped design a web-enabled, automated horse feeder using MySQL and PHP. This was the perfect excuse to re-learn those skills.
- I was still using .NET 1.1 at work. The .NET Framework kept evolving, so it was time to try 2.0. I could have written the software in any language, but C# was the most relevant to my everyday work.
- Writing interfaces is something that takes practice. Let me rephrase that: writing good interfaces is something that takes practice.
- Why not make this an exercise in GUI design? I love drawing buttons.
Database
The challenge supplies a large amount of sample data: 2GB. Unless you're a seasoned database administrator, that can be intimidating. Where do you start? Is a database even the right tool for such a project? The problems mounted, and the actual challenge seemed far away.
I stayed up until 1AM several nights a week trying different ideas. I waited hours, sometimes a whole day, while my INSERT statements worked through the sample data line by line and loaded it into the database tables. Finally, a breakthrough: I converted the sample data into flat files that MySQL could bulk-import with LOAD DATA INFILE. Importing dropped to minutes.
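The original converter was part of the C# project and isn't reproduced here. The following is a minimal Python sketch of the same idea, assuming the training files use a per-movie layout: a `MovieID:` header line followed by `CustomerID,Rating,Date` lines. It flattens them into one comma-separated file so a single bulk load replaces millions of individual INSERTs. The `ratings` table and its column names are hypothetical, and depending on server configuration, `LOCAL` loading may have to be enabled first.

```python
import os


def flatten_movie_file(lines):
    """Turn one per-movie file into flat 'movie,customer,rating,date' rows.

    Assumes a '<MovieID>:' header line followed by
    '<CustomerID>,<Rating>,<Date>' lines; blank lines are skipped.
    """
    rows = []
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            movie_id = line[:-1]  # new movie header
            continue
        if movie_id is None:
            raise ValueError("rating line appears before a movie header")
        customer_id, rating, date = line.split(",")
        rows.append(f"{movie_id},{customer_id},{rating},{date}")
    return rows


def flatten_directory(src_dir, out_path):
    """Write every per-movie file in src_dir into one flat file at out_path."""
    with open(out_path, "w", newline="\n") as out:
        for name in sorted(os.listdir(src_dir)):
            with open(os.path.join(src_dir, name)) as f:
                for row in flatten_movie_file(f):
                    out.write(row + "\n")


def load_statement(csv_path, table="ratings"):
    """Build a MySQL bulk-load statement for the flat file (hypothetical schema)."""
    return (
        f"LOAD DATA LOCAL INFILE '{csv_path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' "
        "(movie_id, customer_id, rating, rating_date)"
    )
```

The speedup comes from letting the server parse and insert rows in one pass instead of paying per-statement overhead for every rating.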
GUI
The key is knowing which tasks get repeated most often. In the beginning, formatting and importing data took the most time, so three of the four tabs are devoted to it. I also added some basic logging and started thinking in terms of threads: importing data and making predictions both take a long time to process, so they run on worker threads that report how long they have been running.
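A minimal sketch of a worker that runs a long task off the UI thread and exposes its elapsed time is shown below, in Python rather than the project's C#. The `TimedWorker` class is hypothetical. In .NET 2.0, `System.ComponentModel.BackgroundWorker` covers similar ground.

```python
import threading
import time


class TimedWorker:
    """Run a long task on a background thread and track how long it has run."""

    def __init__(self, task, on_done=None):
        self._task = task
        self._on_done = on_done
        self._start = None
        self._end = None
        self.result = None
        self.error = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        try:
            self.result = self._task()
        except Exception as exc:  # keep the failure so the caller can report it
            self.error = exc
        finally:
            self._end = time.perf_counter()
            if self._on_done is not None:
                self._on_done(self)

    def start(self):
        self._start = time.perf_counter()
        self._thread.start()

    def elapsed(self):
        """Seconds since start; stops advancing once the task finishes."""
        if self._start is None:
            return 0.0
        end = self._end if self._end is not None else time.perf_counter()
        return end - self._start

    def is_running(self):
        return self._thread.is_alive()

    def join(self, timeout=None):
        self._thread.join(timeout)
```

A UI timer can poll `elapsed()` and `is_running()` to update a status label without ever blocking on the task itself.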

Cooking Metaphor
I estimate that only 10% of my time went into the core problem: generating rating predictions. I spent so much time on the database, caching, and GUI that by the time I was ready to implement the prediction algorithm, burnout was imminent. Still, since that first 90% of my effort met 100% of the goals stated above, I felt a warm sense of accomplishment.
As in cooking, most of my time went into preparation rather than the cooking itself. Watch the leaderboard for a few weeks and you'll see the cooking metaphor in action: the same teams making incremental improvements, adding a dash of salt here, a pinch of oregano there. Improving a recipe is trial and error, and so is tuning a prediction algorithm to push the RMSE (root mean squared error) slightly lower.
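The "dash of salt" loop is easy to see in miniature. This Python sketch is not the FlixLogix algorithm, which the post doesn't describe. It compares two standard baselines by RMSE on a made-up toy dataset: a global-mean predictor, and a per-movie mean pulled toward the global mean by a hypothetical `shrink` parameter.

```python
import math
from collections import defaultdict


def rmse(pairs):
    """Root mean squared error over (predicted, actual) pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


def global_mean_model(train):
    """Predict the overall average rating for everything."""
    mu = sum(r for _, _, r in train) / len(train)
    return lambda movie, user: mu


def movie_mean_model(train, shrink=0.0):
    """Predict each movie's mean, blended with `shrink` phantom ratings at the global mean."""
    mu = sum(r for _, _, r in train) / len(train)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for movie, _, r in train:
        totals[movie] += r
        counts[movie] += 1

    def predict(movie, user):
        n = counts.get(movie, 0)
        if n + shrink == 0:
            return mu  # unseen movie and no shrinkage: fall back to the global mean
        return (totals.get(movie, 0.0) + shrink * mu) / (n + shrink)

    return predict


def evaluate(model, test):
    return rmse((model(movie, user), r) for movie, user, r in test)


# Toy (movie, user, rating) triples, invented for illustration.
train = [(1, "a", 5), (1, "b", 4), (2, "a", 2), (2, "c", 1)]
test = [(1, "c", 5), (2, "b", 2)]
for name, model in [("global mean", global_mean_model(train)),
                    ("movie mean", movie_mean_model(train)),
                    ("movie mean, shrink=2", movie_mean_model(train, shrink=2))]:
    print(name, round(evaluate(model, test), 4))
```

On data this small, shrinkage happens to hurt. On sparse real data, pulling rarely rated movies toward the global mean usually helps. Finding out which way each tweak goes is exactly the trial and error described above.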
Conclusion
My FlixLogix source is available for download (63KB). I created the project and solution with Microsoft Visual C# 2005 Express Edition. To build it, you need to reference an ADO.NET driver for MySQL. Used as-is, the project yields an RMSE of 1.269. Make my recipe your own. Enjoy!
