
Does the Shoe Fit? Know What Statistical Analysis Method Works Best for Your Data

By Robin High (robinh@darkwing.uoregon.edu)

The development of high-speed personal computers and user-friendly statistics software capable of performing many sophisticated procedures has increased the likelihood of choosing the wrong technique for the job. At the very least, the bewildering growth in the number and variety of available statistical procedures has made it easy to overlook, or simply not know about, methods better suited to your needs.

Statistics can easily be misapplied, or even abused, with any set of data. Always think carefully about how a collection of numbers can and should be used. Before you even begin collecting data, ask yourself some key questions:

The answers to these questions are critical, because they determine which statistical techniques are most appropriate for your data and research questions.

Common Pitfalls

Chi-square test. For example, the commonly used chi-square test is most appropriate for nominal data (discrete data whose levels have no inherent order). If discrete data have an order (that is, ordinal levels), or if the data are measured on a more "continuous" scale (interval or ratio), some other technique, such as a test for trend across the ordered levels or a method designed for continuous measurements, will probably produce a much more powerful and appropriate test.
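The following minimal Python sketch illustrates the point above. It compares the Pearson chi-square test, which ignores the ordering of the groups, with the Cochran-Armitage test for trend, one common alternative when a 2 x k table has ordered levels. The function names, the scores 0, 1, 2, and the dose-response counts are hypothetical. The p-value helper uses the closed form of the chi-square upper tail, which holds only for an even number of degrees of freedom, and the trend statistic uses the usual large-sample variance.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; closed form valid for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def pearson_chi_square(successes, totals):
    """Pearson chi-square test for a 2 x k table; the order of the groups is ignored."""
    n, r = sum(totals), sum(successes)
    stat = 0.0
    for s, t in zip(successes, totals):
        exp_s = t * r / n      # expected successes in this group
        exp_f = t - exp_s      # expected failures in this group
        stat += (s - exp_s) ** 2 / exp_s + ((t - s) - exp_f) ** 2 / exp_f
    df = len(totals) - 1
    return stat, df, chi2_sf_even_df(stat, df)


def cochran_armitage_trend(successes, totals, scores):
    """Large-sample Cochran-Armitage test for a linear trend in proportions."""
    n, r = sum(totals), sum(successes)
    p_bar = r / n
    u = sum(x * (s - t * p_bar) for x, s, t in zip(scores, successes, totals))
    sx = sum(t * x for x, t in zip(scores, totals))
    sxx = sum(t * x * x for x, t in zip(scores, totals))
    var_u = p_bar * (1 - p_bar) * (sxx - sx * sx / n)
    z = u / math.sqrt(var_u)
    return z, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


# Hypothetical data: responders out of 40 subjects at three ordered dose levels
responders = [10, 15, 22]
group_sizes = [40, 40, 40]
chi2, df, p_chi2 = pearson_chi_square(responders, group_sizes)
z, p_trend = cochran_armitage_trend(responders, group_sizes, scores=[0, 1, 2])
print(f"Pearson chi-square = {chi2:.2f} on {df} df, p = {p_chi2:.4f}")
print(f"Trend test z = {z:.2f}, p = {p_trend:.4f}")
```

With these made-up counts, the chi-square test gives p of about 0.022 on 2 df. The trend test concentrates its single degree of freedom on the ordered pattern and gives p of about 0.006.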

T-tests. A simple example of choosing a less-than-optimal procedure involves the two-sample t-test versus the paired t-test. Both tests require two "columns" of interval/ratio data.

The two-sample t-test assumes two independent groups of experimental units, whose sample sizes need not be equal, and the difference between the two column means is of interest. With the paired t-test, the experimental units have some inherent or planned matching feature, so the mean of the paired differences between the two columns is of interest. In the first situation, the data are assumed to be independent of each other. In the second, there is an implied correlation, either positive or negative, that must be considered. The reversed order of the words "mean" and "difference" in these two descriptions reflects a very different interpretation of the problem. When the same paired data are analyzed both ways, the two point estimates coincide. The standard errors, and therefore the tests, can differ greatly, however, because only the paired test accounts for the correlation within pairs.
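The following minimal Python sketch applies both t statistics to the same hypothetical before/after measurements on six subjects. It uses only the standard `statistics` and `math` modules. The function names and data are illustrative assumptions, and p-values are omitted because the standard library has no t distribution.

```python
import math
import statistics


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic, treating a and b as independent groups."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2


def paired_t(a, b):
    """Paired t statistic, computed from the within-pair differences."""
    d = [x - y for x, y in zip(a, b)]
    se = statistics.stdev(d) / math.sqrt(len(d))
    return statistics.mean(d) / se, len(d) - 1


# Hypothetical measurements on the same six subjects before and after a treatment
before = [120, 135, 118, 142, 130, 125]
after = [115, 131, 113, 139, 124, 121]
t_ind, df_ind = two_sample_t(before, after)
t_pair, df_pair = paired_t(before, after)
print(f"two-sample t = {t_ind:.2f} on {df_ind} df")
print(f"paired t     = {t_pair:.2f} on {df_pair} df")
```

Both analyses estimate the same 4.5-unit change. The paired version removes the large subject-to-subject variation, so its t statistic (about 10.5) is far larger than the two-sample value (about 0.8).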

"Normal" theoretical models. A more complex example involves analyses framed in terms of a "normal" or "bell-shaped" theoretical model, such as analysis of variance or linear regression. These methods are usually among the first procedures taught and in many respects are the most easily understood. If necessary, the data may be transformed to make the normal theory model fit.

In many situations these normal theory models are good approximations. However, statistical computing has reached the point where models based on other distributions can be applied to the data. Discrete data can and should be treated differently from continuous data. For example, if the response is a count, Poisson regression (for counts with no fixed upper limit) or logistic regression (for the number of successes out of a fixed number of trials) may be much more appropriate than ANOVA or linear regression.
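The following minimal Python sketch shows a model built for counts rather than a normal response. It fits a Poisson regression with a log link, log(mu) = b0 + b1 * x, by Newton-Raphson maximum likelihood. The function name `fit_poisson_regression` and the data are hypothetical. A statistics package's generalized linear model routine would do the same fit and also report standard errors.

```python
import math


def fit_poisson_regression(x, y, tol=1e-10, max_iter=100):
    """Maximum-likelihood fit of log(mu) = b0 + b1 * x for count responses y."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the overall mean count
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the Poisson log-likelihood
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information; for the log link it equals the negative Hessian
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton-Raphson step: solve the 2 x 2 system I * step = U
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            return b0, b1
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical data: number of events observed at five levels of a predictor
x = [0, 1, 2, 3, 4]
y = [2, 3, 6, 7, 12]
b0, b1 = fit_poisson_regression(x, y)
fitted = [math.exp(b0 + b1 * xi) for xi in x]
print(f"log(mu) = {b0:.3f} + {b1:.3f} * x")
print("fitted mean counts:", [round(m, 2) for m in fitted])
```

The log link keeps every fitted mean positive, so the model cannot predict negative counts, which a straight-line fit to raw counts can do. Its variance also grows with the mean, as is typical of count data.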

Choose Appropriate Models

To summarize, it's essential to understand which statistical analysis is most appropriate for your data. Look for statistical models that suit your data, rather than forcing your data into some specific model.


Spring 1999 Computing News