FlixLogix
Posted October 20th, 2007 at 10:06 PM in the Projects category

Introduction

The Netflix Prize was inaugurated in late 2006 and, as of this post, is still going. The premise is simple: write software that predicts how a Netflix subscriber will rate a movie. To win, the predictions have to be 10% better than Netflix’s own. At stake: $1,000,000.

Coding, Not Competing

I have programming talent, but this challenge is different; it demands specialized skills. Database optimization and machine learning are two areas that come to mind, and my expertise is in neither. Why bother to compete if my chances of winning are so small? Put simply, I wanted to see how far I could get.

Database

The challenge supplies you with a large amount of sample data: 2GB. Unless you’re a seasoned database administrator, this can be intimidating. Where do you start? Is a database even the right direction for such a project? The problems mounted and addressing the actual challenge seemed far away.

I stayed up until 1AM several nights a week trying different ideas. I waited for hours, even a day, as my INSERT statements went through the sample data line by line and placed it into the database tables. Finally, a breakthrough: I formatted the sample data into flat files that MySQL could import via LOAD DATA INFILE. Import time dropped to minutes.
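The flat-file step can be sketched in a few lines. This is not the FlixLogix code (which is C#), only a minimal Python illustration. It assumes each training file starts with a movie ID followed by a colon, then one `customer,rating,date` line per rating. The sample rows, the `ratings` table, and its column names are made up for illustration.

```python
import csv
import io


def flatten_movie_lines(lines):
    """Yield (movie_id, customer_id, rating, date) rows from one per-movie file.

    Assumed layout: a header line such as "1:" followed by
    "customer_id,rating,date" lines.
    """
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # Header line: remember which movie the following ratings belong to.
            movie_id = int(line[:-1])
            continue
        if movie_id is None:
            raise ValueError("rating line seen before a movie header")
        customer_id, rating, date = line.split(",")
        yield movie_id, int(customer_id), int(rating), date


def write_flat_file(rows, out):
    """Write rows as comma-separated lines that a bulk loader can read."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(rows)


# Illustrative sample, not real data.
sample = ["1:", "1488844,3,2005-09-06", "822109,5,2005-05-13"]
buffer = io.StringIO()
write_flat_file(flatten_movie_lines(sample), buffer)
flat_text = buffer.getvalue()

# Hypothetical bulk-load statement for a table named `ratings`.
LOAD_SQL = (
    "LOAD DATA INFILE 'ratings.csv' INTO TABLE ratings "
    "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' "
    "(movie_id, customer_id, rating, rated_on)"
)
```

The speedup comes from handing MySQL one large file to parse rather than sending one statement per rating.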

GUI

The trick to designing the GUI is knowing which tasks are repeated most often. In the beginning, formatting and importing data took the most time, so three of the four tabs are for that purpose. I also created some basic logging features and began to think in terms of multiple threads. Importing data and making predictions both take a long time to process, so they run on worker threads, which report how long they’ve taken to execute.
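The worker-thread idea looks roughly like this. It is a Python sketch, not the C# used in FlixLogix, and the names `run_timed` and `log` are hypothetical. A long task runs off the main thread and reports its elapsed time through a callback when it finishes.

```python
import threading
import time


def run_timed(name, task, report):
    """Run task() on a worker thread and pass its elapsed time to report()."""
    def worker():
        start = time.perf_counter()
        result = task()
        elapsed = time.perf_counter() - start
        report(name, elapsed, result)

    thread = threading.Thread(target=worker, name=name)
    thread.start()
    return thread


messages = []
messages_lock = threading.Lock()


def log(name, elapsed, result):
    """Collect feedback from workers; a GUI would update a log pane instead."""
    with messages_lock:
        messages.append((name, elapsed, result))


# Stand-in for a slow import or prediction job.
worker_thread = run_timed("import", lambda: sum(range(100000)), log)
worker_thread.join()
```

In a real GUI the callback would have to pass its message back to the UI thread rather than touch controls directly.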

FlixLogix Screen Capture

Cooking Metaphor

I estimate that only 10% of my time was spent on the core problem: generating rating predictions. I spent so much time on the database, caching, and GUI that by the time I was ready to implement the prediction algorithm, burnout was imminent. Still, that other 90% of my effort fully met the goal I set out above, which was to see how far I could get, and I felt a warm sense of accomplishment.

As in cooking, most of my time went into preparation, not the cooking itself. If you watch the leaderboard for several weeks, you’ll see the cooking metaphor in action: the same teams making incremental improvements, adding a dash of salt, oregano, etc. It’s trial and error to make the original recipe better. The same goes for prediction algorithms and getting the RMSE (root mean squared error) just slightly lower.
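For reference, RMSE is the square root of the average squared difference between predicted and actual ratings, so lower is better. Here is a minimal Python sketch. The `improvement` helper is my own illustration of how a "10% better" target can be read as a relative drop in RMSE.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def improvement(baseline_rmse, candidate_rmse):
    """Relative improvement of a candidate over a baseline (0.10 means 10%)."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse
```

Because the errors are squared before averaging, a few badly wrong predictions hurt more than many slightly wrong ones.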

Conclusion

My FlixLogix source is available for download (63KB). I created the project and solution using Microsoft Visual C# 2005 Express Edition. To build it, you’ll need an ADO.NET driver for MySQL referenced in the project. Using this project as-is will yield an RMSE of 1.269. Make my recipe your own. Enjoy!
