January 16, 2011
From time to time I find software that amazes me. Google Refine falls into this category.
When we deal with real data, we find missing values, inconsistencies, and other problems that need to be cleaned up before conducting any analysis. Fixing these problems manually is time-consuming and annoying; use Google Refine instead. It not only cleans data in a powerful way but also transforms data sets from one format into another and extends them with other data sets.
It is a desktop application that runs on your own computer. It is therefore not a web service, and there is no need to upload your super-sensitive data to some other server. And of course, Refine is free.
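To give a feel for the kind of cleanup meant here, the following is a minimal Python sketch. It is not Google Refine's code or API, just a plain-Python illustration of the same idea: trimming stray whitespace, collapsing inconsistent spellings, and flagging missing values. The sample data, the list of missing-value markers, and the function names are all made up for this example.

```python
import csv
import io

# Hypothetical sample data with typical problems: stray whitespace,
# inconsistent capitalization, and empty or placeholder values.
RAW = """city,population
 Berlin ,3450000
berlin,
MUNICH,1350000
Munich,N/A
"""

# Assumed set of strings that should count as "missing" (lowercase).
MISSING_MARKERS = {"", "n/a", "na", "null", "-"}


def clean_cell(value):
    """Trim whitespace and map missing-value markers to None."""
    stripped = value.strip()
    if stripped.lower() in MISSING_MARKERS:
        return None
    return stripped


def clean_rows(text):
    """Parse CSV text and return a list of cleaned row dicts."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        record = {key: clean_cell(val) for key, val in row.items()}
        # Normalize city spelling so that variants collapse together.
        if record["city"] is not None:
            record["city"] = record["city"].title()
        cleaned.append(record)
    return cleaned


rows = clean_rows(RAW)
for r in rows:
    print(r)
```

After this step the four rows contain only two distinct city names, and the two rows without a usable population are marked with `None` instead of an empty string or "N/A". A tool like Refine does this kind of work interactively and at a much larger scale.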
September 19, 2010
In this video Scott Golightly shows how to use the Microsoft SQL Server data mining wizard. The video is about 20 minutes long and quite easy to follow.
In the “How Do I?” library you can find many more video tutorials on different topics, e.g. “How Do I: Optimize SQL Server Integration Services?” or “How Do I: Render reports to a wide-range of formats?”
August 22, 2010
Click on the image to get redirected to the video and news article:
July 7, 2010
How do you decide which statistical software to use? Surely you think about which one you handle best or which is most suitable for the aim of the analysis. But it can also be helpful to consider the size of the software’s market share and whether it is growing or shrinking. This matters when you check whether your software skills are still up to date, or when you make a strategic decision about which package to adopt when starting to build a new system. Ranking is not an easy task, though. Robert A. Muenchen presents various ways of measuring the popularity or market share of BMDP, JMP, Minitab, R, R-PLUS, Revolution R, S-PLUS, SAS, SPSS, Stata, Statistica, and Systat. He examines different indicators such as internet discussion, internet search, surveys of use, impact on scholarly activity, growth in capability, and the job market.
Open source tools are growing in popularity, and R in particular challenges the commercial packages. SAS and SPSS nevertheless remain prevalent in the business community, and Stata is the first choice in education. Overall, each package is the most widely used in one market or another.
May 22, 2010
Weka stands for Waikato Environment for Knowledge Analysis, as it was developed at the University of Waikato (New Zealand). The weka is also a bird that is endemic to New Zealand. I like that, maybe because I spent some time in New Zealand myself.
The Weka software is easy to use thanks to its graphical user interfaces. It supports several data mining tasks such as preprocessing, classification, clustering, regression, and many more. You can download Weka for free on this page.
You will surely need some tutorials on how to use Weka! Visit this web page for free video tutorials. You will find videos on how to apply different data mining techniques such as text mining, neural networks, and clustering, and you can download the data sets from the tutorials to experiment with them on your own. The quality of the videos is good and the lectures are given at an understandable pace, so you should have no problem following them, although the lecturer sometimes gets confused, which can be distracting for the listener. My favorite tutorials about Weka, however, are from IBM. They show how to use Weka and also how to interpret the results, which is an advantage if you want to improve your data mining skills. So I recommend checking out the page.
May 22, 2010
If you are looking for really good data mining software, you should consider RapidMiner. RapidMiner is a leading open-source system for knowledge discovery and data mining with a graphical user interface. It supports a variety of data mining algorithms such as decision trees, self-organizing maps, clustering, and classification, to name only a few. Of course, it can be downloaded for free from the RapidMiner website, where you can also find free video tutorials on how to use the software. As it is widely used, you will be able to find many sources and discussion boards about RapidMiner.
A blog with good content about RapidMiner (and other interesting topics) is Neural Market Trends. Check out the video tutorials here!