An Introduction to Statistical Computing at the University of Oregon

Robin High
Statistical Programmer and Consultant
robinh@uoregon.edu

Each year statistical computing technology presents new developments. This article introduces you to the statistical computing resources currently available on campus.

Most statistical analysis programs are available for both desktop and laptop PCs. PC programs are powerful, fast, relatively easy to learn, and efficient to use. Their speed and portability are among the advantages that give you the computing edge for most data analysis problems.

If you have a Mac, however, you'll discover that choices of good programs for data analysis are more limited. Other options are available through access to the UO's timesharing computers via a secure shell connection (shell.uoregon.edu).

Statistical Programs on the UO's Large Systems

Statistical programs that are currently available on shell.uoregon.edu include SAS Version 9.1.3, SPLUS 7.0.0, and SPSS Release 6.14. These programs are available to all UO students, faculty, and staff. (Note: The UNIX version of SPSS dates back to the mid 1990s and does not have many of the options available with more recently released versions of SPSS because SPSS focused on developing its PC product.)

Statistical Programs for Personal Computers

SAS for the PC. SAS is currently the only statistical software we offer for installation on personal computers.

In recent years the PC version of SAS has become much more versatile, powerful, and easy to learn, and we now have the most recent and advanced version available, 9.1.3. In some unusual cases you may need to install Version 8.2. (Both Version 9.1.3 and Version 8.2 of SAS are available for checkout from the Computing Center Documents Room in 175 McKenzie. See "About SAS at the UO" on page 20 for details on installing SAS.)

UO site license. Our site license for SAS requires an annual update of the license file for all users because the SAS Institute doesn't sell its software outright; it only leases its product. This license allows all UO students, faculty, and staff to run SAS on campus computers and also to install it on their home computers through mid-July 2006 (for installation details, see "Time to Renew Your SAS License" on page 20).

SPSS for PC or Mac. Version 14 of SPSS has recently been released for the PC, and version 11 is the most recent version of SPSS available for the Mac. Unfortunately, we cannot afford to offer a site-wide license for SPSS. However, you may be able to run it on a computer located in a campus computing laboratory, or you may purchase a license directly from SPSS (http://www.spss.com/).

For students, a less expensive version of SPSS is the Grad Pack. This version is available for purchase from the UO Bookstore at a price that's significantly less than regular SPSS licensing fees. (The Student Version of SPSS is not recommended, since its functionality for professional data analysis is extremely limited.)

Other Programs. Other programs such as STATA and SCA may be found within specific departments, but they are not officially supported at this time by the Computing Center. STATA offers three nearly identical versions of its product for Windows, Mac OS X, and Linux 32-bit operating systems. The only real differences in the programs are the price and the size of the datasets you can analyze. STATA 9 is currently available for purchase at reduced rates (campus-wide GradPlan pricing) by UO students, faculty, and staff. Further information about acquiring STATA is available at http://www.stata.com/order/new/edu/gradplans/gp-campus.html

Which Statistical Program Should You Choose?

SAS and SPSS have long been the primary choices for most applications of statistical methods and are fully supported by the Computing Center. Both programs handle routine analysis tasks. Your analytical needs and the choices of your professional colleagues may also influence the program you choose.

An increasingly important aspect of data analysis is how well a program works with correlated or clustered data; that is, what can the program do for you when you have collected two or more observations from the same subject under the same or different conditions? SAS has several procedures that will analyze both continuous and discrete data under these conditions with the most recent statistical methods available. Recent releases of SPSS have improved its capability to analyze correlated continuous data, yet it still lacks important techniques for analyzing correlated discrete or count data.
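To see why correlated observations need their own methods, here is a minimal sketch in Python (used only for illustration; it is not one of the programs discussed in this article). The scores are made up, and the function names se_independent and se_paired are hypothetical. It compares the standard error of a mean difference when two measurements per subject are wrongly treated as independent groups and when the within-subject pairing is used.

```python
import math
import statistics

# Made-up scores for five subjects, each measured under two conditions.
before = [10.0, 12.0, 14.0, 16.0, 18.0]
after = [11.0, 13.5, 15.0, 17.5, 19.0]


def se_independent(x, y):
    """Standard error of the mean difference if x and y were unrelated groups."""
    return math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))


def se_paired(x, y):
    """Standard error of the mean difference based on within-subject differences."""
    diffs = [b - a for a, b in zip(x, y)]
    return statistics.stdev(diffs) / math.sqrt(len(diffs))


if __name__ == "__main__":
    print("treated as independent:", round(se_independent(before, after), 3))
    print("treated as paired:     ", round(se_paired(before, after), 3))
```

With these numbers the paired standard error is far smaller, because most of the variation is between subjects rather than within them. Ignoring the correlation would badly misstate the precision of the comparison.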

SAS is very well suited for data analysis that requires a step-by-step approach or that involves several datasets containing many variables. It is particularly well suited for data collected over time (e.g., repeated measures, crossover, time series, or other types of longitudinal studies), for survey data, for computations, and for situations in which you need to merge multiple files.
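As a rough illustration of what merging files involves, the following Python sketch combines one record per subject with several visit records per subject, matched on a shared id. The data, the field names, and the function merge_by_id are all hypothetical. This is not SAS code and does not reproduce how any SAS procedure behaves; it only shows the idea of a one-to-many match.

```python
# Made-up records standing in for two data files that share an "id" field.
subjects = [
    {"id": 1, "group": "control"},
    {"id": 2, "group": "treated"},
]
visits = [
    {"id": 1, "week": 0, "score": 12.0},
    {"id": 1, "week": 4, "score": 10.5},
    {"id": 2, "week": 0, "score": 11.0},
    {"id": 3, "week": 0, "score": 9.0},  # no matching subject record
]


def merge_by_id(one_per_subject, many_per_subject):
    """Attach subject-level fields to each visit; also report visits with no match."""
    lookup = {row["id"]: row for row in one_per_subject}
    merged, unmatched = [], []
    for row in many_per_subject:
        base = lookup.get(row["id"])
        if base is None:
            unmatched.append(row)
        else:
            merged.append({**base, **row})
    return merged, unmatched


merged, unmatched = merge_by_id(subjects, visits)

if __name__ == "__main__":
    for row in merged:
        print(row)
    print("unmatched visits:", unmatched)
```

Returning the unmatched records separately is a deliberate choice: a merge that silently drops rows is a common source of errors that are hard to trace later.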

SAS also has a macro feature which makes repeated tasks simple to run once the program is written. The Output Delivery System (ODS) can be a real aid to answering questions or formatting results not easily seen from the usual output listing. And the full functionality of the DATA step is far beyond what you can ever expect to accomplish in other programs.

SPSS, on the other hand, works well for the most basic textbook-oriented statistical applications (e.g., frequency tables, regression, and fixed-effects analysis of variance for independent groups). Complex data file manipulations and advanced statistical work are more difficult to carry out in SPSS, or are not available.

No matter which program you choose, make certain you have a valid license to run it! You should also select a program that will help develop your analytical and data management skills over time.

Point-and-Click vs. Syntax Method

Some programs, such as SPSS, are interactive: their Graphical User Interface (GUI) offers you a menu of choices so that you can "point-and-click" your way through the analysis. While this approach seems appealing, it has the potential to cause problems. Contrary to popular belief and the desire to follow the path of least resistance, the GUI is actually slower for tasks that require multiple steps or that need to be repeated; it is also tedious to document.

The recommended process for data analysis is to enter commands into a program file with a text editor. Known as the syntax approach, it allows you to run the entire program with a "submit" command, or you can block contiguous lines of the program and submit only that portion.

The real strength of saving commands is that it clearly documents your data analysis computations and choices of statistical analyses from beginning to end. It shows what you did and, perhaps even more important, helps you discover what you neglected to do. The tasks are all there for future reference, saved in a retrievable file! The syntax approach is also a highly efficient way to proceed if you have many repetitive tasks or a large number of variables to process.
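The same principle applies in any scripting environment. The short Python sketch below (an illustration only, with made-up data and a hypothetical summarize function, not SAS or SPSS syntax) shows how a saved, commented program file records each step and repeats one task across every variable without extra effort.

```python
import statistics

# Step 1: the data. Made-up measurements; in practice these would be read from a file.
data = {
    "height_cm": [160.0, 172.0, 181.0, 167.0],
    "weight_kg": [55.0, 70.0, 82.0, 61.0],
}


def summarize(values):
    """Step 2: the same summary is computed for every variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


# Step 3: repeat the task for each variable; adding a variable needs no new code.
summaries = {name: summarize(values) for name, values in data.items()}

if __name__ == "__main__":
    for name, result in summaries.items():
        print(name, result)
```

Because every step lives in the file, rerunning the analysis after a data correction means submitting the file again, and the comments show what was done and in what order.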

SPSS should actually be considered a sophisticated system in which the window menus are syntax-directed editors. The "paste" option writes statements to a window where they can accumulate and then be saved in one file. Some tasks can only be accomplished through syntax, for example, testing factor means in an ANOVA in the presence of significant interactions. Learning how to write commands in this manner and then to consistently run and save them in a program file is well worth the time expended.

One frequently asked question is why SAS doesn't give greater focus to a Windows interface similar to what you find with SPSS. Although the PC version of SAS does have Windows-based selection menus, the real power and versatility of SAS is found by working with data through commands entered with the Enhanced Editor.

In summary, whether you run programs on a personal computer or submit them in the batch mode on a large system, it is extremely important to keep a current record of what you did. Whatever software you select or system you run it on, always document your work! When you record your data processing and analysis steps in a syntax file with concise and relevant comments, this simple process can save you a great deal of time and confusion in the long run.

What about Spreadsheets?

Spreadsheet programs such as Microsoft Office's Excel are helpful tools for data entry and storage; in some situations they can compute basic summary statistics or make graphical displays. However, they are rarely suitable for statistical analyses. Their statistical methods are limited to the most basic choices, and they can be very awkward to run, especially if your dataset contains many rows and columns. In addition, computations in the presence of missing data may be suspect. Further information concerning the disadvantages of Microsoft Excel as a statistics program is available at http://www.practicalstats.com/Pages/excelstats.html
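The concern about missing data can be made concrete with a small Python sketch. The cell values and function names are made up, and the example does not claim to describe how any particular spreadsheet treats blank cells. It only shows how much a result changes depending on whether blanks are excluded or quietly counted as zeros.

```python
import statistics

# Made-up column of cells as text, where "" stands for a blank (missing) entry.
cells = ["4.0", "", "6.0", "5.0", ""]


def mean_skipping_missing(column):
    """Mean of the observed values only; blank cells are left out."""
    observed = [float(c) for c in column if c.strip() != ""]
    return statistics.mean(observed)


def mean_blank_as_zero(column):
    """Mean when blank cells are (wrongly) counted as zeros."""
    filled = [float(c) if c.strip() != "" else 0.0 for c in column]
    return statistics.mean(filled)


if __name__ == "__main__":
    print("blanks excluded:     ", mean_skipping_missing(cells))
    print("blanks counted as 0: ", mean_blank_as_zero(cells))
```

Here the first function returns 5.0 and the second 3.0 for the same column. Whatever tool you use, check how it handles missing entries before trusting a summary statistic.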

Once your data have been entered into an Excel spreadsheet in the required format, it's a very simple process to access them with statistical programs such as SAS or SPSS through their various import processes. You can find information about data transfer from Excel to SAS or SPSS at http://www.uoregon.edu/~robinh/data_transfer.html

More Information

Web Resources: You'll find detailed information concerning statistical programs and direct connections to statistical websites at http://www.uoregon.edu/~robinh/statistics.html

For detailed product information on SAS, SPSS, STATA, and Splus, visit the vendor websites:


Fall 2005 Computing News | Computing Center Home Page