If a variable is normally distributed, most of its values cluster around the mean: about 95% fall within two standard deviations (SD) of it (up to the blue areas on both sides). If a value is more than two SD away from the mean, it is potentially an outlier and can be removed. If there are too many values like that (noticeably more than the roughly 5% expected), we should not consider the variable normally distributed.
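To judge whether a variable is ‘normally distributed’:
- calculate the mean and SD: SD = square root[sum((xi – mean)^2)/(n – 1)], where n is the number of values
- convert each value into a z score: z = (xi – mean)/SD
- compare the absolute value of each z score with 1.96 (the z score is already expressed in SD units, so it is not multiplied by SD again)
- if the absolute z score is greater than 1.96, that value is a potential outlier.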
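
The steps above can be sketched in Python. This is only an illustration, not a definitive implementation: the function name z_score_outliers, the cutoff parameter, and the sample data are made up for the example, and it assumes the sample SD (dividing by n – 1), which is what statistics.stdev computes.

```python
import statistics


def z_score_outliers(values, cutoff=1.96):
    """Return (z_scores, outliers) for a list of numbers.

    Hypothetical helper: a value counts as a potential outlier
    when its absolute z score exceeds the cutoff.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, divides by n - 1
    z_scores = [(x - mean) / sd for x in values]
    outliers = [x for x, z in zip(values, z_scores) if abs(z) > cutoff]
    return z_scores, outliers


# Made-up data: nine values near 5 and one far from the rest.
data = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3, 5.1, 9.0]
z_scores, outliers = z_score_outliers(data)
share = len(outliers) / len(data)  # fraction of values flagged
print(outliers, share)
```

If the flagged share is well above roughly 5%, that is a hint the variable may not be normally distributed. Note that a single extreme value also inflates the mean and SD themselves, so this check is a rough screen rather than a formal test.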