July 31, 2010
This is an interesting podcast from Marketplace (American Public Media) about how data mining pushes marketing to a new level. The segment is only about 5 minutes long, running from 09:49 to 14:26, because the radio show also covers other topics. If you do not want to listen to the whole show, just click on the time bar to jump to the segment.
The podcast starts like this:
Kai Ryssdal: Almost everything you do, short of taking a long, lonely walk in the woods or something, leaves little bits of electronic data behind. Every time you search the Internet, you punch something into your mobile phone or you write on someone’s Facebook wall, there’s a giant industry right behind you sucking up all that data and using it to figure out how to sell you something. Toothpaste to life insurance. The data mining business, as it’s known, is growing 10 percent a year, and as you might have guessed, the amount of data we produce is booming.
July 24, 2010
When I first stumbled over Kaggle, I thought it was just a page offering a data analysis competition. But I was wrong: it is a platform for data-related competitions. Companies, researchers, governments and others can open their research questions to analysts worldwide in search of the best possible solution. This approach is much more likely to yield an innovative solution to problems that seem impossible to solve, as a single company or organization rarely has the perfect team or technique for any given obstacle. Publishing a data analysis assignment via such a platform not only improves the chances of getting the most suitable results but also gives academic researchers the option to interact with the business world. They can test and improve their new methods by applying them to industrial challenges.
Of course, the idea of companies putting up a competition for business-related topics is not new. For example, in 2006 Netflix, a DVD rental provider, offered $1m to the analyst who could improve its recommendation algorithm by 10%. $1m seems like a huge prize, but according to Netflix CEO Reed Hastings, an improvement of 10 per cent was worth “well in excess of $1m”.
Something that really amazed me is that hosting a contest on Kaggle is free! They take care of the competition’s privacy and provide the infrastructure, and it is quite easy to set up a contest.
Here are the three steps to host a competition on Kaggle:
STEP 1: The competition host posts contest details
The competition host frames the contest and uploads relevant data. Framing the contest involves setting a deadline, outlining the competition’s objectives, describing any data, providing submission instructions and the criteria by which the winner will be selected. Visit Post a Contest to step through the process.
STEP 2: Competitors upload their predictions
Competitors upload their submissions. For predicting-the-past competitions, submissions are evaluated on-the-fly (against a solution file uploaded during STEP 1). For predicting-the-future competitions, submissions are evaluated once the relevant event has taken place (but the competition host can make use of the forecasts in the meantime).
STEP 3: Predictions are evaluated based on their accuracy
Once the deadline passes, the winners are selected. In some competitions, the prize may not be awarded until a satisfactory explanation is received.
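To make the "predicting-the-past" case from the steps above a bit more concrete, here is a minimal sketch of how submissions could be scored on the fly against a solution file. This is not Kaggle's actual implementation: the `id,value` CSV layout, the function names and the choice of root-mean-square error as the accuracy measure are all assumptions made for illustration, since each host defines its own evaluation criteria.

```python
import csv
import io
import math


def load_predictions(csv_text):
    """Parse 'id,value' CSV text (with a header row) into a dict of id -> float.

    The two-column layout is a hypothetical format chosen for this sketch.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["id"]: float(row["value"]) for row in reader}


def rmse_score(solution, submission):
    """Root-mean-square error of a submission against the held-back solution.

    RMSE is an assumed metric; a real contest uses whatever the host specifies.
    """
    missing = set(solution) - set(submission)
    if missing:
        raise ValueError(f"submission is missing ids: {sorted(missing)}")
    squared_errors = [(submission[i] - solution[i]) ** 2 for i in solution]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rank_submissions(solution, submissions):
    """Return (competitor, score) pairs sorted best (lowest error) first."""
    scores = {name: rmse_score(solution, preds) for name, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Solution file uploaded by the host in STEP 1 (made-up numbers).
solution = load_predictions("id,value\n1,3.0\n2,5.0\n")

# Predictions uploaded by two hypothetical competitors in STEP 2.
submissions = {
    "alice": load_predictions("id,value\n1,3.0\n2,4.0\n"),
    "bob": load_predictions("id,value\n1,1.0\n2,5.0\n"),
}

# STEP 3: rank by accuracy; the first entry is the current leader.
leaderboard = rank_submissions(solution, submissions)
for name, score in leaderboard:
    print(f"{name}: {score:.4f}")
```

For a predicting-the-future contest the same scoring would simply run later, once the real outcomes are known and can serve as the solution file.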
I love the idea!
Here is a great interview with the CEO of Kaggle.
What about you?
Would you enter a competition on Kaggle?
July 7, 2010
How do you decide which statistical software to use? Sure, you think about which one you handle best or which is most suitable for the aim of the analysis. But it can also be helpful to consider the size of the software’s market share and whether it is growing or shrinking. This matters when you check whether your software skills are still up to date, or when you make a strategic decision about which package to adopt when building a new system. Ranking is not an easy task, though. Robert A. Muenchen presents various ways of measuring the popularity or market share of BMDP, JMP, Minitab, R, R-PLUS, Revolution R, S-PLUS, SAS, SPSS, Stata, Statistica, and Systat. He examines different indicators such as internet discussion, internet searches, surveys of use, impact on scholarly activity, growth in capability and the job market.
Open source tools are growing in popularity, and R in particular challenges the commercial packages. SAS and SPSS nevertheless remain prevalent in the business community, and Stata is the first choice in education. Overall, each package is the most widely used in one market or another.