Application: Missing Data

In the real world we will almost always come across missing values in data, for many reasons. This problem must be addressed to produce reliable statistical results. First of all, we need to identify what is missing. Then ask yourself why the data is missing and what that means. Once you have answered those questions, you need to deal with the missing values. These are the options, depending on the situation:

1. When missing values are few and lie far apart, do nothing.

2. When a column has a significant number of missing values, create an indicator variable that records whether each value is missing or present (1/0).

3. When a column has a significant number of missing values, replace the missing values with a constant, e.g. the column's mean, median or mode.

4. When a column and its values are essential to producing accurate predictions, estimate the missing values from other, non-missing data elements.
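
The options above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a recommended implementation: the toy `rows` data, the column names `age` and `income`, and all function names are made up for this example, and `None` is assumed to mark a missing value. For option 4, a straight line fitted by least squares on the complete rows stands in for "estimate from other data elements"; in practice a different model may fit better.

```python
from statistics import median

# Hypothetical toy data; None marks a missing value (an assumption of this sketch).
rows = [
    {"age": 25, "income": 30000},
    {"age": 35, "income": None},
    {"age": 45, "income": 50000},
    {"age": None, "income": 40000},
    {"age": 55, "income": None},
]

def count_missing(rows, column):
    """Identify what is missing: count the None entries in one column."""
    return sum(1 for r in rows if r[column] is None)

def add_missing_flag(rows, column):
    """Option 2: return copies of the rows with a 0/1 indicator (1 = missing)."""
    return [{**r, column + "_missing": 1 if r[column] is None else 0} for r in rows]

def impute_constant(rows, column, stat=median):
    """Option 3: fill missing values with one constant, here the median by default."""
    fill = stat([r[column] for r in rows if r[column] is not None])
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def impute_from_predictor(rows, target, predictor):
    """Option 4: estimate missing target values from another column,
    using a least-squares line fitted on the rows where both are present."""
    pairs = [(r[predictor], r[target]) for r in rows
             if r[predictor] is not None and r[target] is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("predictor has no variation in the complete rows")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / sxx
    intercept = mean_y - slope * mean_x
    # Only fill rows where the target is missing and the predictor is known.
    return [
        {**r, target: intercept + slope * r[predictor]}
        if r[target] is None and r[predictor] is not None else dict(r)
        for r in rows
    ]

if __name__ == "__main__":
    print("missing incomes:", count_missing(rows, "income"))
    print(add_missing_flag(rows, "income"))
    print(impute_constant(rows, "income"))
    print(impute_from_predictor(rows, "income", "age"))
```

Each function returns new rows instead of changing the input, so the original data, including the information about what was missing, stays available for comparison. Options 2 and 3 can also be combined: add the indicator first, then fill in the constant, so the fact that a value was imputed is not lost.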
