The Three Major Statistical Packages: SAS, Stata and SPSS Compared

Strategically using General Purpose Statistics Packages:
A Look at Stata, SAS and SPSS

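Translated from the Chinese version (itself a translation of the English original):
Many people have asked about the differences between SAS, Stata and SPSS, and which of them is best. As you might expect, each package has its own distinctive style and its own strengths and weaknesses. This article gives an overview of them, but it is not a comprehensive comparison. People often have strong preferences for the statistical package they use; we hope most will agree that this is a fair and accurate comparison of these packages.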

  SAS
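  General use. SAS is popular with advanced users because it is powerful and programmable. For the same reason, it is one of the hardest packages to master. To use SAS, you write SAS programs that process your data and run your analyses. If a program contains an error, finding and correcting it can be difficult.
  Data management. SAS is very powerful in data management and lets you manipulate your data in just about any way possible. It includes an SQL (Structured Query Language) procedure, so you can run SQL queries on SAS data sets. Learning and mastering data management in SAS takes a long time, however, and many complex data management tasks can be done with much simpler commands in Stata or SPSS. On the other hand, SAS can work with many data files at once, which makes such tasks easier. It can handle up to 32,768 variables and as many records as your hard disk space allows.
  Statistical analysis. SAS performs most statistical analyses (regression, logistic regression, survival analysis, analysis of variance, factor analysis, multivariate analysis). Its greatest strengths are probably analysis of variance, mixed model analysis and multivariate analysis, while its main weaknesses are ordinal and multinomial logistic regression (because these commands are difficult) and robust methods (robust regression and other robust methods are hard to carry out). It supports the analysis of survey data, but that support is quite limited compared with Stata.
  Graphics. Of all the statistical packages, SAS has the most powerful graphics tools, provided by the SAS/Graph module. However, SAS/Graph is also very technical and complicated to learn, and graphs are created mainly through programming syntax. SAS 8 does allow interactive point-and-click graphing, but it is not as easy as in SPSS.
  Summary. SAS is suited to advanced users. It is hard to learn and can be discouraging at first. Advanced users nevertheless favor it for its powerful data management and its ability to work with large numbers of data files at once.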

  Stata
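  General use. Stata is popular with both beginners and advanced users because it is easy to understand yet powerful. Commands can be entered one at a time (which suits beginners) or many at a time through a Stata program (which suits advanced users). Either way, when an error occurs it is fairly easy to find and fix.
  Data management. Although Stata's data management capabilities are not as powerful as those of SAS, it has many powerful yet simple data management commands that make complex operations easy. Stata mainly works with one data file at a time, and working with several files at once is difficult. With the release of Stata/SE, a Stata data file can now contain up to 32,768 variables, but you may be unable to analyze a data file that exceeds what your computer's memory allows.
  Statistical analysis. Stata also performs most statistical analyses (regression, logistic regression, survival analysis, analysis of variance, factor analysis, and some multivariate analysis). Its greatest strengths are probably regression (it includes easy-to-use regression diagnostic tools) and logistic regression (add-on programs help interpret logistic regression results, and ordinal and multinomial logistic regression are easy to run). Stata also has a good set of robust methods, including robust regression, regression with robust standard errors, and other commands that include robust standard error estimates. In addition, Stata has a clear advantage in survey data analysis, offering survey versions of regression, logistic regression, Poisson regression, probit regression and so on. Its weaknesses are analysis of variance and traditional multivariate methods (multivariate analysis of variance, discriminant analysis, etc.).
  Graphics. Like SPSS, Stata lets you create graphs either with commands or through a point-and-click interface. Unlike SPSS, it has no graph editor. Of the three packages, its graph command syntax is the simplest and yet the most powerful. Graph quality is also good enough for publication. The graphs also do a good job of supplementing statistical analysis; for example, many commands simplify the creation of scatterplots for regression diagnostics.
  Summary. Stata combines ease of use and power well. Although it is easy to learn, it is still very powerful in data management and in many cutting-edge statistical methods. Users can easily download programs written by others, or write their own and integrate them tightly into Stata.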

  SPSS
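  General use. SPSS is very easy to use, which makes it the package most readily accepted by beginners. It has a point-and-click interface in which you choose the commands to run from pull-down menus. It also has a "syntax" language that you can learn by copying and pasting, but this syntax is usually very complicated and not very intuitive.
  Data management. SPSS has a friendly, Excel-like data editor for entering and defining data (missing values, value labels, and so on). It is not a very powerful data management tool (SPSS version 11 added some commands for reshaping data files, but their effect is limited). SPSS also works mainly with one file at a time and is poorly suited to handling several files at once. Its data files can have 4096 variables, and the number of records is limited by your disk space.
  Statistical analysis. SPSS also performs most statistical analyses (regression, logistic regression, survival analysis, analysis of variance, factor analysis, multivariate analysis). Its strengths are analysis of variance (SPSS can perform many kinds of tests of specific effects) and multivariate analysis (multivariate analysis of variance, factor analysis, discriminant analysis, etc.); SPSS version 11.5 also added mixed model analysis. Its weaknesses are the lack of robust methods (it cannot perform robust regression or produce robust standard errors) and the lack of survey data analysis (SPSS version 12 added a module that covers some of these procedures).
  Graphics. The SPSS point-and-click graphing interface is very simple, and once you have drawn a graph you can modify it as needed by pointing and clicking. The graphs are of excellent quality and can be pasted into other documents (Word documents, PowerPoint, etc.). SPSS also has syntax for creating graphs, but it cannot produce some of the effects available through the interactive interface. This syntax is harder than Stata's but simpler (and somewhat less powerful) than SAS's.
  Summary. SPSS aims at ease of use (its motto is "real stats, real easy") and succeeds at it. But if you are an advanced user, you will lose interest in it over time. SPSS is strong in graphics, but because it lacks robust and survey methods, cutting-edge statistical procedures are its weak point.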

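  Overall assessment
  Each package has its own strengths and, inevitably, its own weak spots. Taken together, SAS, Stata and SPSS form a set of tools that can be used for a wide variety of statistical analyses. With Stat/Transfer, data files can be converted between formats in a matter of seconds or minutes, so you can choose a package according to the nature of your problem. For example, you might choose SAS for mixed model analysis, Stata for logistic regression, and SPSS for analysis of variance. If you do statistical analysis regularly, we strongly recommend adding all of these packages to your data analysis toolkit.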

English Version:

SAS

General use. SAS is a package that many "power users" like because of its power and programmability. Because SAS is such a powerful package, it is also one of the most difficult to learn. To use SAS, you write SAS programs that manipulate your data and perform your data analyses. If you make a mistake in a SAS program, it can be hard to see where the error occurred or how to correct it.
Data Management. SAS is very powerful in the area of data management, allowing you to manipulate your data in just about any way possible. SAS includes proc sql, which allows you to perform SQL queries on your SAS data files. However, it can take a long time to learn and understand data management in SAS, and many complex data management tasks can be done using simpler commands in Stata or SPSS. On the other hand, SAS can work with many data files at once, easing tasks that involve multiple files. SAS can handle enormous data files of up to 32,768 variables, and the number of records is generally limited only by the size of your hard disk.
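To make the idea of running SQL queries across several data files concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than SAS proc sql, and the table names, columns and values are hypothetical illustration data, not anything from a real study.

```python
import sqlite3

# Two small in-memory tables stand in for two separate data files.
# All names and values here are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE visits (patient_id INTEGER, cost REAL);
    INSERT INTO patients VALUES (1, 34), (2, 51), (3, 47);
    INSERT INTO visits VALUES (1, 120.0), (1, 80.0), (2, 200.0);
""")

# A single query combines both tables: total cost per patient.
# LEFT JOIN keeps patients with no visits; COALESCE reports their total as 0.
rows = conn.execute("""
    SELECT p.id, p.age, COALESCE(SUM(v.cost), 0) AS total_cost
    FROM patients AS p
    LEFT JOIN visits AS v ON v.patient_id = p.id
    GROUP BY p.id, p.age
    ORDER BY p.id
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

The point of the sketch is only that one declarative query can merge and summarize two data sources at once, which is the kind of multi-file task described above.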
Statistical Analysis. SAS performs most general statistical analyses (regression, logistic regression, survival analysis, analysis of variance, factor analysis, multivariate analysis). The greatest strengths of SAS are probably in its ANOVA, mixed model analysis and multivariate analysis, while it is probably weakest in ordinal and multinomial logistic regression (because these commands are especially difficult) and in robust methods (it is difficult to perform robust regression or other kinds of robust methods). While there is some support for the analysis of survey data, it is quite limited compared to Stata.
Graphics. SAS may have the most powerful graphic tools among all of the packages via SAS/Graph. However, SAS/Graph is also very technical and tricky to learn. The graphs are created largely using syntax language; SAS 8 does have a point and click interface for creating graphs, but it is not as easy to use as SPSS.
Summary. SAS is a package geared towards power users. It has a steep learning curve and can be frustrating at first. However, power users enjoy its powerful data management and ability to work with numerous data files at once.


Stata

General Use. Stata is a package that many beginners and power users like because it is both easy to learn and yet very powerful. Stata uses one line commands which can be entered one command at a time (a mode favored by beginners) or can be entered many at a time in a Stata program (a mode favored by power users). Even if you make a mistake in a Stata command, it is often easy to diagnose and correct the error.
Data Management. While the data management capabilities of Stata may not be quite as extensive as those of SAS, Stata has numerous powerful yet very simple data management commands that allow you to perform complex manipulations of your data with ease. However, Stata primarily works with one data file at a time, so tasks that involve working with multiple files at once can be cumbersome. With the release of Stata/SE, you can now have up to 32,768 variables in a Stata data file, but you probably would not want to analyze a data file that exceeds the size of your computer's memory.
Statistical Analysis. Stata performs most general statistical analyses (regression, logistic regression, survival analysis, analysis of variance, factor analysis, and some multivariate analysis). The greatest strengths of Stata are probably in regression (it has very easy to use regression diagnostic tools) and logistic regression (add-on programs are available that greatly simplify the interpretation of logistic regression results, and ordinal logistic and multinomial logistic regressions are very easy to perform). Stata also has a very nice array of robust methods that are very easy to use, including robust regression and regression with robust standard errors; many other estimation commands offer robust standard errors as well. Stata also excels in the area of survey data analysis, offering the ability to analyze survey data with regression, logistic regression, Poisson regression, probit regression, etc. The greatest weaknesses of Stata would probably be in analysis of variance and traditional multivariate methods (e.g. manova, discriminant analysis, etc.).
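For readers unfamiliar with "regression with robust standard errors", the following minimal sketch shows the idea for a one-predictor regression in plain Python. It compares the classical standard error of the slope with a heteroskedasticity-robust (sandwich) standard error using the n/(n-2) small-sample scaling often labeled HC1. The function name and the data are hypothetical, and the sketch is not claimed to reproduce the output of any particular package.

```python
import math

def slope_with_standard_errors(x, y):
    """Fit y = a + b*x by OLS; return (b, classical SE of b, robust HC1 SE of b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]                 # centered predictor
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical SE assumes every residual has the same variance.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)

    # Robust SE weights each squared residual by its own squared leverage term,
    # so it remains valid when the residual variance changes with x.
    meat = sum((d * e) ** 2 for d, e in zip(dx, resid))
    se_robust = math.sqrt(n / (n - 2) * meat) / sxx
    return slope, se_classical, se_robust

# Hypothetical illustration data.
x = [-2, -1, 0, 1, 2]
y = [-3, -1, 0, 1, 3]
slope, se_classical, se_robust = slope_with_standard_errors(x, y)
print(slope, se_classical, se_robust)
```

Both standard errors belong to the same slope estimate; only the assumption about the residual variance differs, which is why a package that offers the robust version as a simple option is convenient.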
Graphics. Like SPSS, Stata graphics can be created using Stata commands or using a point and click interface. Unlike SPSS, the graphs cannot be edited using a graph editor. The syntax of the graph commands is the easiest of the three packages and is also the most powerful. Stata graphs are of publication quality. In addition, Stata graphics are very functional for supplementing statistical analysis; for example, there are numerous commands that simplify the creation of plots for regression diagnostics.
Summary. Stata offers a good combination of ease of use and power. While Stata is easy to learn, it also has very powerful tools for data management, many cutting edge statistical procedures, the ability to easily download programs developed by other users and the ability to create your own Stata programs that seamlessly become part of Stata.

SPSS

General use. SPSS is a package that many beginners enjoy because it is very easy to use. SPSS has a "point and click" interface that allows you to use pulldown menus to select commands that you wish to perform. SPSS does have a "syntax" language which you can learn by "pasting" the syntax from the point and click menus, but the syntax that is pasted is generally overly complicated and often unintuitive.
Data Management. SPSS has a friendly data editor that resembles Excel and allows you to enter your data and attributes of your data (missing values, value labels, etc.). However, SPSS does not have very strong data management tools (although SPSS version 11 added commands for reshaping data files from "wide" format to "long" format, and vice versa). SPSS primarily edits one data file at a time and is not very strong for tasks that involve working with multiple data files at once. SPSS data files can have 4096 variables and the number of records is limited only by your disk space.
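As an illustration of what reshaping between "wide" and "long" formats means, here is a minimal sketch in plain Python. The helper names wide_to_long and long_to_wide, the column stub "score" and the data are all hypothetical; they are not commands from any of the three packages.

```python
# Hypothetical "wide" data: one row per subject, one column per occasion.
wide = [
    {"id": 1, "score1": 10, "score2": 12, "score3": 15},
    {"id": 2, "score1": 8, "score2": 9, "score3": 11},
]

def wide_to_long(rows, id_key, stub):
    """Return one record per (subject, occasion) for columns named stub + number."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            suffix = key[len(stub):]
            if key.startswith(stub) and suffix.isdigit():
                long_rows.append({id_key: row[id_key], "time": int(suffix), stub: value})
    return long_rows

def long_to_wide(rows, id_key, stub):
    """Inverse operation: gather each subject's records back into a single row."""
    by_id = {}
    for row in rows:
        wide_row = by_id.setdefault(row[id_key], {id_key: row[id_key]})
        wide_row[f"{stub}{row['time']}"] = row[stub]
    return list(by_id.values())

long_data = wide_to_long(wide, "id", "score")
for record in long_data:
    print(record)
```

In the long format each subject contributes one record per measurement occasion, which is the layout many repeated-measures and mixed model procedures expect.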
Statistical Analysis. SPSS performs most general statistical analyses (regression, logistic regression, survival analysis, analysis of variance, factor analysis, and multivariate analysis). The greatest strengths of SPSS are in the area of analysis of variance (SPSS allows you to perform many kinds of tests of specific effects) and multivariate analysis (e.g. manova, factor analysis, discriminant analysis), and SPSS 11 has added some capabilities for analyzing mixed models. The greatest weaknesses of SPSS are probably the absence of robust methods (we know of no abilities to perform robust regression or to obtain robust standard errors) and the absence of survey data analysis (we know of no tools in this area).
Graphics. SPSS has a very simple point and click interface for creating graphs, and once you create graphs they can be extensively customized via the same interface. The graphs are very high quality and can be pasted into other documents (e.g. Word documents or PowerPoint). SPSS does have a syntax language for creating graphs, but many of the features in the point and click interface are not available via the syntax language. The syntax language is more complicated than the language provided by Stata, but probably simpler (and less powerful) than the SAS language.
Summary. SPSS focuses on ease of use (their motto is "real stats, real easy"), and it succeeds in this area. But if you intend to use SPSS as a power user, you may outgrow it over time. SPSS is strong in the area of graphics, but weak in more cutting edge statistical procedures, lacking robust methods and survey methods.

Overall Summary

Each package offers its own unique strengths and weaknesses. As a whole, SAS, Stata and SPSS form a set of tools that can be used for a wide variety of statistical analyses. With Stat/Transfer it is very easy to convert data files from one package to another in just a matter of seconds or minutes. Therefore, there can be quite an advantage to switching from one analysis package to another depending on the nature of your problem. For example, if you were performing analyses using mixed models you might choose SAS, but if you were doing logistic regression you might choose Stata, and if you were doing analysis of variance you might choose SPSS. If you are frequently performing statistical analyses, we would strongly urge you to consider making each one of these packages part of your toolkit for data analysis.