Any question? Quora gives the answer

March 16, 2011

Need some hints about useful statistics books or free public data sets, or have any question about statistics? Go to:

http://www.quora.com/What-are-some-good-resources-for-learning-about-statistical-analysis
or
http://www.quora.com/What-are-some-free-public-data-sets
or
http://www.quora.com/Statistics-mathematical-science


Google Refine 2.0

January 16, 2011

From time to time I find software that amazes me. Google Refine falls into this category.
When we deal with real data, we find missing values, inconsistencies, etc. that need to be cleaned up before conducting any analysis. Fixing these problems manually is time-consuming and annoying; use Google Refine instead. It not only cleans data in a powerful way but also transforms data sets from one format into another and extends them with other data sets.

It is a desktop application that runs on your own computer. It is thus not a web service, and there is no need to upload your super-sensitive data to some other server. Of course, Refine is free.
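To make the kind of cleanup mentioned above a bit more concrete, here is a minimal sketch in Python of one common idea: grouping inconsistent spellings by a normalized key and counting missing values. It is only an illustration of the problem Refine addresses, not Refine's actual implementation; the function names and the sample column are made up for this example.

```python
import re
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key: lowercase, drop punctuation, sort unique tokens."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def cluster_values(values):
    """Group raw values whose keys collide; missing entries are skipped."""
    clusters = defaultdict(list)
    for v in values:
        if v is None or not v.strip():
            continue  # missing value: handled separately below
        clusters[fingerprint(v)].append(v)
    return dict(clusters)


# Hypothetical messy column, invented for illustration only.
raw = ["New York", "new york ", "York, New", "Boston", "", None, "boston."]

clusters = cluster_values(raw)
missing = sum(1 for v in raw if v is None or not v.strip())

for key, members in clusters.items():
    print(key, "<-", members)
print("missing values:", missing)
```

Values that end up in the same group are candidates for merging into one spelling. A person should still review each group, because a simple key like this can also lump together values that really are different.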


Microsoft SQL Server

September 19, 2010

In this video Scott Golightly shows how to use the Microsoft SQL Server data mining wizard. The video is about 20 minutes long and easy to follow.
In the “How Do I?” library you can find many more video tutorials on different topics, e.g. “How Do I: Optimize SQL Server Integration Services?” or “How Do I: Render reports to a wide-range of formats?”


Rocketing Analysis Software Market

August 22, 2010

Click on the image to get redirected to the video and news article:


Popular Data Analysis Software

July 7, 2010

How do you decide which statistical software to use? Of course you consider which one you handle best or which is most suitable for the aim of the analysis. But it can also be helpful to consider the size of the software’s market share and whether it is growing or shrinking. This matters when you check whether your software skills are still up to date, or when you make a strategic decision about which package to adopt when starting to build a new system. Ranking is not an easy task, though. Robert A. Muenchen presents various ways of measuring the popularity or market share of BMDP, JMP, Minitab, R, R-PLUS, Revolution R, S-PLUS, SAS, SPSS, Stata, Statistica, and Systat. He examines different indicators such as internet discussion, internet search, surveys of use, impact on scholarly activity, growth in capability and the job market.

Conclusion

Open source tools are growing in popularity; R in particular is challenging the commercial packages. SAS and SPSS, however, remain prevalent in the business community, and Stata is the first choice in education. Overall, each package is the most widely used in one market or another.

Examples:

Internet Discussion

Job Market

Scholarly Activity

Source: http://r4stats.com/popularity

Ever heard of Weka?

May 22, 2010


Weka stands for Waikato Environment for Knowledge Analysis, as it was developed at the University of Waikato (New Zealand); the weka is also a bird that is endemic to New Zealand. I like that, maybe because I spent some time in New Zealand myself.
The Weka software is easy to use thanks to its graphical user interfaces. It supports several data mining tasks such as preprocessing, classification, clustering, regression and many more. You can download Weka for free on this page.
Of course, you will need some tutorials on how to use Weka! Visit this web page for free video tutorials. You will find videos on how to apply different data mining techniques such as text mining, neural networks and clustering, and you can download the data sets from the tutorials to experiment with them on your own. The quality of the videos is good and the lectures are given at an understandable pace, so you should not have problems following them, although the lecturer sometimes gets confused, which can be distracting for the listener. My favorite tutorials about Weka, however, are from IBM. They show how to use Weka and also how to interpret the results, which is an advantage if you want to improve your data mining skills. So I recommend checking out the page.


RapidMiner: Data Mining Software

May 22, 2010

If you are looking for really good data mining software, you should consider RapidMiner. RapidMiner is a leading open-source system for knowledge discovery and data mining with a graphical user interface. It supports a variety of data mining algorithms such as decision trees, self-organizing maps, clustering and classification, to name only a few. Of course it can be downloaded for free from the RapidMiner website, where you can also find free video tutorials on how to use the software. As it is widely used, you will be able to find many resources and discussion boards about RapidMiner.
A blog with good content about RapidMiner (and other interesting topics) is Neural Market Trends. Check out the video tutorials here!


Free video tutorials from StatSoft

May 22, 2010

One of my favorite free data mining courses is from StatSoft. You can find the tutorials on YouTube. If you want to practice with StatSoft, you can download a free 30-day trial version of their Statistica software on the StatSoft page.

There are 35 sessions which give practical insight into how to apply data mining. I think that this is a really good video series if you are at a beginner's level, because these very professionally made videos focus on how data mining works in general and not only on how to use their software. After watching the 35 sessions you really get an idea of how to do a whole data mining analysis, from data cleaning to interpreting the results. The first two sessions give an overview of data mining and CRISP. The following sessions are about preparing the data and applying different DM techniques, using credit risk data and marketing data, respectively. So if you are a practical learner (and who is not), you should give it a try and just listen to the videos, which are about 10 minutes each.

Here you can find the topics of the videos.