A boxplot for males and females for the variable Day1
A clustered bar chart for independent means
A clustered bar chart for related means
A clustered bar chart for mixed designs
A scatter plot that includes a regression line
Exploratory data analysis (EDA) comprises the statistical procedures that give a researcher a first view of a data set's distribution and general characteristics. SPSS offers a variety of procedures for this purpose. These include descriptive statistics such as the mean, median, mode, minimum, maximum, range, variance, and standard deviation, among other numerical summaries. In addition, graphical procedures help in exploring the data visually. These include scatter plots, bar charts, histograms, pie charts, and stem-and-leaf plots, among others.
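The numerical summaries listed above can be illustrated outside SPSS as well. The following is a minimal sketch in Python using only the standard library; the `scores` values are invented for illustration and do not come from any data set discussed here.

```python
import statistics

# Hypothetical sample invented for illustration only.
scores = [2, 4, 4, 5, 7, 9, 11]

summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "minimum": min(scores),
    "maximum": max(scores),
    "range": max(scores) - min(scores),
    "variance": statistics.variance(scores),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(scores),      # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:>8}: {value:.3f}")
```

The sample (n - 1) forms of the variance and standard deviation are used here because they are the usual choice when describing a sample rather than a whole population.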
There are several reasons why exploratory data analysis is a critical step. Any data set is prone to errors introduced during collection or data entry. EDA makes it possible to identify such errors, and outliers are among the best indicators of them. Outliers can be identified using box plots: because a box plot is a visual form of EDA, abnormal values stand out clearly. Some features of the data, such as the pattern of distribution, cannot be identified at collection time without analyzing the data. This is where exploratory data analysis proves invaluable. The skewness measure shows how asymmetrically the data are distributed around the mean. When data are skewed to the left (negative skew), the long tail extends toward lower values and most values typically lie above the mean; when data are skewed to the right (positive skew), the long tail extends toward higher values and most values typically lie below the mean. When the values are distributed symmetrically around the mean, the histogram assumes a bell shape. Scatter plots are useful for displaying the relationship between two variables plotted on the X- and Y-axes. A scatter plot can be enriched with a regression line (the line of best fit), which makes it easy to see how far individual points deviate from the overall trend. The regression line can have a positive or a negative gradient, indicating a positive (direct) or a negative (inverse) relationship between the variables (Field, 2009).
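The three ideas in this paragraph (flagging outliers as a box plot does, measuring skewness, and fitting a regression line) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reproduction of SPSS output: the data are invented, the 1.5 × IQR fences are the common box plot convention, and quartile definitions differ between packages, so the exact fences may not match those SPSS draws.

```python
import statistics

def boxplot_outliers(values):
    """Return values outside the 1.5 * IQR fences commonly used in box plots."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def regression_line(xs, ys):
    """Least-squares slope and intercept of the line of best fit."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: one suspicious entry (20) among otherwise small values.
values = [2, 3, 3, 4, 4, 4, 5, 5, 6, 20]
print(boxplot_outliers(values))   # [20]
print(skewness(values) > 0)       # True: the long tail points to the right

# Hypothetical paired observations for a scatter plot.
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 68]
slope, intercept = regression_line(hours, marks)
print(slope > 0)                  # True: a positive relationship
```

A positive skewness value corresponds to right skew and a negative value to left skew, while the sign of the slope gives the direction of the relationship between the two variables.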
Exploratory data analysis is also helpful in testing whether data are normally distributed. Using normal probability plots, a researcher can judge how closely the data follow a normal distribution; in a perfectly normal distribution, the mean, mode, and median are the same. To test whether a distribution differs significantly from normal, one can use the Kolmogorov-Smirnov and Shapiro-Wilk tests. Performing EDA therefore helps in choosing between parametric and non-parametric tests for further analysis. For numerical data with a normal distribution, one can use parametric tests such as the t-test or ANOVA. Non-parametric methods, on the other hand, include the chi-square test and the Spearman correlation coefficient (Field, 2009). Missing data can also be dealt with during exploratory data analysis, using listwise deletion (excluding any case with a missing value on any variable) or pairwise deletion (excluding a case only from the analyses that involve its missing variable).
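The difference between listwise and pairwise deletion is easiest to see by counting how many cases each approach keeps. The sketch below uses invented records in which `None` marks a missing value; the variable names `day1`, `day2`, and `day3` and the helper functions `listwise` and `pairwise` are hypothetical and chosen only for illustration.

```python
# Hypothetical records; None marks a missing value.
rows = [
    {"day1": 2.6, "day2": 1.4, "day3": 1.1},
    {"day1": 1.9, "day2": None, "day3": 0.8},
    {"day1": 3.2, "day2": 2.0, "day3": None},
    {"day1": None, "day2": 1.1, "day3": 0.9},
    {"day1": 2.1, "day2": 1.6, "day3": 1.3},
]

def listwise(rows, variables):
    """Keep only cases that are complete on every listed variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

def pairwise(rows, var_a, var_b):
    """Keep cases that are complete on the two variables being analysed."""
    return [(r[var_a], r[var_b]) for r in rows
            if r[var_a] is not None and r[var_b] is not None]

complete = listwise(rows, ["day1", "day2", "day3"])
print("listwise cases:", len(complete))

for a, b in [("day1", "day2"), ("day1", "day3"), ("day2", "day3")]:
    print(f"pairwise cases for {a} and {b}:", len(pairwise(rows, a, b)))
```

In this invented example listwise deletion keeps two of the five cases, whereas pairwise deletion keeps three cases for each pair of variables, at the cost of each analysis resting on a slightly different subset of cases.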
Field, A. (2009). Discovering statistics using SPSS (3rd ed.). Los Angeles: Sage. ISBN: 9781847879073