Pages

Thursday, January 20, 2011

SAS

SAS (pronounced "sass", originally Statistical Analysis System) is an integrated system of software products provided by SAS Institute Inc. that enables programmers to perform:
data entry, retrieval, management, and mining
report writing and graphics
statistical analysis
business planning, forecasting, and decision support
operations research and project management
quality improvement
applications development
data warehousing (extract, transform, load)
platform independent and remote computing
In addition, SAS has many business solutions that enable large-scale software solutions for areas such as IT management, human resource management, financial management, business intelligence, customer relationship management and more.


Description

SAS is driven by SAS programs, which define a sequence of operations to be performed on data stored as tables. Although non-programmer graphical user interfaces to SAS exist (such as the SAS Enterprise Guide), these GUIs are most often merely a front-end that automates or facilitates the generation of SAS programs. The functionalities of SAS components are intended to be accessed via application programming interfaces, in the form of statements and procedures.
A SAS program has three major parts:
the DATA step
procedure steps (effectively, everything that is not enclosed in a DATA step)
a macro language
SAS Library Engines and Remote Library Services allow access to data stored in external data structures and on remote computer platforms.
The DATA step section of a SAS program,[1] like other database-oriented fourth-generation programming languages such as SQL or Focus, assumes a default file structure, and automates the process of identifying files to the operating system, opening the input file, reading the next record, opening the output file, writing the next record, and closing the files. This allows the user/programmer to concentrate on the details of working with the data within each record, in effect working almost entirely within an implicit program loop that runs for each record.
All other tasks are accomplished by procedures that operate on the data set (SAS' terminology for "table") as a whole. Typical tasks include printing or performing statistical analysis, and may just require the user/programmer to identify the data set. Procedures are not restricted to only one behavior and thus allow extensive customization, controlled by mini-languages defined within the procedures. SAS also has an extensive SQL procedure, allowing SQL programmers to use the system with little additional knowledge.
There are macro programming extensions, that allow for rationalization of repetitive sections of the program. Proper imperative and procedural programming constructs can be simulated by use of the "open code" macros or the Interactive Matrix Language SAS/IML component.
Macro code in a SAS program, if any, undergoes preprocessing. At runtime, DATA steps are compiled and procedures are interpreted and run in the sequence they appear in the SAS program. A SAS program requires the SAS software to run.
Compared to general-purpose programming languages, this structure allows the user/programmer to concentrate less on the technical details of the data and how it is stored, and more on the information contained in the data. This blurs the line between user and programmer, appealing to individuals who fall more into the 'business' or 'research' area and less in the 'information technology' area, since SAS does not enforce (although it recommends) a structured, centralized approach to data and infrastructure management.
SAS runs on IBM mainframes, Unix, Linux, OpenVMS Alpha, and Microsoft Windows. Code is "almost" transparently moved between these environments. Older versions have supported PC-DOS, the Apple Macintosh, VMS, VM/CMS, PrimeOS, Data General AOS and OS/2.

Early history

SAS was conceived by Anthony J. Barr in 1966. As a North Carolina State University graduate student from 1962 to 1964, Barr had created an analysis of variance modeling language inspired by the notation of statistician Maurice Kendall, followed by a multiple regression program that generated machine code for performing algebraic transformations of the raw data. Drawing on those programs and his experience with structured data files, he created SAS, placing statistical procedures into a formatted file framework. From 1966 to 1968, Barr developed the fundamental structure and language of SAS.
In January 1968, Barr and James Goodnight collaborated, integrating new multiple regression and analysis of variance routines developed by Goodnight into Barr's framework. Goodnight's routines made the handling of basic statistical analysis more robust, and his later implementation (in SAS 76) of the general linear model increased the analytical power of the system.[citation needed] By 1971, SAS was gaining popularity within the academic community. One strength of the system was analyzing experiments with missing data, which was useful to the pharmaceutical and agricultural industries, among others.
In 1973, John Sall joined the project, making extensive programming contributions in econometrics, time series, and matrix algebra. Other participants in the early years included Caroll G. Perkins, Jolayne W. Service, and Jane T. Helwig. Perkins made programming contributions. Service and Helwig created the early documentation.
In 1976, SAS Institute, Inc. was incorporated by Barr, Goodnight, Sall, and Helwig.

Components

SAS consists of a number of components, which organizations separately license and install as required.
SAS Add-In for Microsoft Office
A component of the SAS Enterprise Business Intelligence Server, is designed to provide access to data, analysis, reporting and analytics for non-technical workers (such as business analysts, power users, domain experts and decision makers) via menus and toolbars integrated into Office applications.
Base SAS
The core of SAS, the so-called Base SAS Software, manages data. SAS procedures software analyzes and reports the data. The SQL procedure allows SQL (Structured Query Language) programming in lieu of data step and procedure programming. Library Engines allow transparent access to common data structures such as Oracle, as well as pass-through of SQL to be executed by such data structures. The Macro facility is a tool for extending and customizing SAS software programs and reducing overall program verbosity. The DATA step debugger is a programming tool that helps find logic problems in DATA step programs. The Output Delivery System (ODS) is an extendable system that delivers output in a variety of formats, such as SAS data sets, listing files, RTF, PDF, XML, or HTML. The SAS windowing environment is an interactive, graphical user interface used to run and test SAS programs.
BI Dashboard
A plugin for Information Delivery Portal. It allows the user to create various graphics that represent a broad range of data. This allows a quick glance to provide a lot of information, without having to look at all the underlying data.
Data Integration Studio
Provides extract, transform, load (ETL) services.
SAS Enterprise Business Intelligence Server
Includes both a suite of business intelligence (BI) tools and a platform to provide uniform access to data. The goal of this product is to compete with Business Objects and Cognos' offerings.
Enterprise Computing Offer (ECO)
Not to be confused with Enterprise Guide or Enterprise Miner, ECO is a product bundle.
Enterprise Guide
SAS Enterprise Guide is a Microsoft Windows client application that provides a guided mechanism to use SAS and publish dynamic results throughout an organization in a uniform way. It is marketed as the default interface to SAS for business analysts, statisticians, and programmers. Though Data Integration Studio is the true ETL tool of SAS, Enterprise Guide can be used for the ETL of smaller projects.
Enterprise Miner
A data mining tool.
Information Delivery Portal
Allows users to setup personalized homepages where they can view automatically generated reports, dashboards, and other SAS data structures.
Information Map Studio
A client application that helps with building information maps.
OLAP Cube Studio
A client application that helps with building OLAP Cubes.
SAS Web OLAP Viewer for Java
Web based application for viewing OLAP cubes and data explorations.
SAS Web OLAP Viewer for.NET
SAS/ACCESS
Provides the ability for SAS to transparently share data with non-native datasources.
SAS/ACCESS for PC Files
Allows SAS to transparently share data with personal computer applications including MS Access and Microsoft Office Excel.
SAS/AF
Applications facility, a set of application development tools to create customized desktop GUI applications; a library of drag-and-drop widgets are available; widgets and models are fully object oriented; SCL programs can be attacted as needed.
SAS/SCL
SAS Component Language, allows programmers to create and compile object-oriented programs. Uniquely, SAS allows objects to submit and execute Base/SAS and SAS/Macro statements.
SAS/ASSIST
Early point-and-click interface to SAS, has since been superseded by SAS Enterprise Guide and its client–server architecture.
SAS/C
SAS/CALC
Is a discontinued spreadsheet application, which came out in version 6 for mainframes and PCs, and didn't make it further.
SAS/CONNECT
Provides ability for SAS sessions on different platforms to communicate with each other.
SAS/DMI
A programming interface between interactive SAS and ISPF/PDF applications. Obsolete since version 5.
SAS/EIS
A menu-driven system for developing, running, and maintaining an enterprise information systems.
SAS/ETS
Provides Econometrics and Time Series Analysis
SAS/FSP
Allows interaction with data using integrated tools for data entry, computation, query, editing, validation, display, and retrieval.
SAS/GIS
An interactive desktop Geographic Information System for mapping applications.
SAS/GRAPH
Although base SAS includes primitive graphing capabilities, SAS/GRAPH is needed for charting on graphical media.
SAS/IML
Matrix-handling SAS script extensions.
SAS/INSIGHT
Dynamic tool for data mining - allows examination of univariate distributions, visualization of multivariate data, and model fitting using regression, analysis of variance, and the generalized linear model.
SAS/Integration Technologies
Allows the SAS System to use standard protocols, like LDAP for directory access, CORBA and Microsoft's COM/DCOM for inter-application communication, as well as message-oriented middleware like Microsoft Message Queuing and IBM WebSphere MQ. Also includes the SAS' proprietary client–server protocols used by all SAS clients.
SAS/IntrNet
Extends SAS’ data retrieval and analysis functionality to the Web with a suite of CGI and Java tools
SAS/LAB
Superseded by SAS Enterprise Guide.
SAS/OR
Operations Research
SAS/PH-Clinical
Defunct product
SAS/QC
Quality Control provides quality improvement tools.
SAS/SHARE
A data server that allows multiple users to gain simultaneous access to SAS files
SAS/SHARE*NET
Discontinued and now part of SAS/SHARE. It allowed a SAS/SHARE data server to be accessed from non-sas clients, like JDBC or ODBC compliant applications.
SAS/SPECTRAVIEW
Allows visual exploration of large amounts of data. Once the system has plotted the data in a 3D space, users can then visualise it by creating envelope surfaces, cutting planes, etc, which can be animated depending on a fourth parameter (time for example).
SAS/STAT
Statistical Analysis with a number of procedures, providing statistical information such as analysis of variance, regression, multivariate analysis, and categorical data analysis. Note for example the GLIMMIX procedure.
SAS/TOOLKIT
SAS/Warehouse Administrator
superseded in SAS 9 by SAS ETL Server.
SAS Web Report Studio
Part of the SAS Enterprise Business Intelligence Server, provides access to query and reporting capabilities on the Web. Aimed at non-technical users.
SAS Financial Management
Budgeting, planning, financial reporting and consolidation.
SAS Activity Based Management
Cost and revenue modeling.
SAS Strategy Management (formerly Strategic Performance Management)
Collaborative scorecards.
SAS Scalable Performance Data Server (SPDS)
Distributed data system offering increased performance; Data processing server.

Terminology

Where many other languages refer to tables, rows, and columns/fields, SAS uses the terms data sets, observations, and variables. There are only two kinds of variables in SAS: numeric and character (string). By default all numeric variables are stored as (8 byte) real. It is possible to reduce precision in external storage only. Date and datetime variables are numeric variables that inherit the C tradition and are stored as either the number of days (for date variables) or seconds (for datetime variables).

Features

Read and write different file formats.
Process data in different formats.
SAS programming language, a 4th generation programming language. SAS DATA steps are written in a 3rd-generation procedural language very similar to PL/I; SAS PROCS, especially PROC SQL, are non-procedural and therefore better fit the definition of a 4GL.
SAS AF/SCL is a fifth generation programming language[citation needed] that is similar in syntax to Java.
WHERE filtering available in DATA steps and PROCs; based on SQL WHERE clauses, incl. operators like LIKE and BETWEEN/AND.
Built-in statistical and random number functions.
Functions for manipulating character and numeric variables. Version 9 includes Perl Regular Expression processing.
System of formats and informats. These control representation and categorization of data and may be used within DATA step programs in a wide variety of ways. Users can create custom formats, either by direct specification or via an input dataset.
Comprehensive date- and time-handling functions; a variety of formats to represent date and time information without transformation of underlying values.
Interaction with database products through a subset of SQL (and ability to use SQL internally to manipulate SAS data sets). Almost all SAS functions and operators available in PROC SQL.
SAS/ACCESS modules allow communication with databases (including databases accessible via ODBC); in most cases, database tables can be viewed as though they were native SAS data sets. As a result, applications may combine data from many platforms without the end-user needing to know details of or distinctions between data sources.
Direct output of reports to CSV, HTML, PCL, PDF, PostScript, RTF, XML, and more using Output Delivery System. Templates, custom tagsets, styles incl. CSS and other markup tools available and fully programmable.
Interaction with the operating system (for example, pipelining on Unix and Windows and DDE on Windows).
Fast development time, particularly from the many built-in procedures, functions, in/formats, the macro facility, etc.
An integrated development environment.
Dynamic data-driven code generation using the SAS Macro language.
Can process files containing millions of rows and thousands of columns of data.
University research centers often offer SAS code for advanced statistical techniques, especially in fields such as Political Science, Economics and Business Administration.
Large user community supported by SAS Institute. Users have a say in future development, e.g. via the annual SASWare Ballot.

Example SAS code

SAS uses data steps and procedures to analyze and manipulate data. By default, a data step iterates through each observation in a data set (like every row in a SQL table).
This data step creates a new data set BBB that includes those observations from data set AAA that had charges greater than 100.
data BBB;
set AAA;
where charge > 100;
run;
SAS makes available procedures that can summarize data. The proc freq procedure shows a frequency distribution of a given variable in a data set.
proc freq data=BBB;
table charge;
run;
SAS also allows direct subsetting of rows and/or columns of the data used as input to a procedure. The two previous examples could be replaced by the following:
proc freq data=AAA;
where charge > 100;
table charge;
run;
The same program could produce a data set containing the frequency distribution:
...
table charge / out=charge_freq;
...
The SAS Macro Language enables such features as conditional execution of SAS language components either across multiple data-steps and proc-steps, or within a single such step. One can think of it as a "code-generator", although it can also be used merely to establish static values that can be reused throughout the program, and altered as needed. For instance, the above example could be re-used in many pieces of code by rewriting it as a macro:
%macro freqtable (table, variable);
proc freq data = &table;
table &variable;
run;
%mend freqtable;
%freqtable (BBB, charge)
And further, other macro variables could be used for both conditional execution, as well as modification of the functionality of the step, as shown below. The first procedure is modified to include a new parameter limitObs, which, if used, subsets the data before performing the frequency analysis. A second macro provides overall program control functionality, including a flag indicating whether the frequency analysis should be performed at all.
%macro freqtable (table, variable, limitObs);
proc freq data = &table
%if .&limitObs ne . %then (obs=&limitObs);
; /* End of PROC FREQ statement */
table &variable;
run;
%mend freqtable;
%macro wrapper(myTable, myVariable, limitObs, doFreq);
/* Perform other proc-steps and data-steps */
%if &doFreq=Y %then %freqtable(&mytable, &myVariable, &limitObs);
%mend wrapper;
%wrapper(work.test, CLASS, 20, Y)
SAS also features SQL, usable to create, modify or query SAS datasets or external database tables accessed with a SAS libname engine. For example, duplicate records could be extracted from a table for analysis:
proc sql;
create table dup_recs
as select *
from your_dataset
group by id
having count(*) > 1
;
quit;
The proc print procedure allows the user to display information in ways not possible using only the SQL SELECT statement.
proc print data=BBB;
run;
SAS features SCL, which can be used to create object-oriented programs. SCL programs provide a robust library of features not available in Base SAS or the SAS Macro Language
class arrays;
public num supplyChain [*,*,*,*];
eventhandler runInterface / (sender='*', event='prepack for singles and bulk');
runInterface: method;
call send(_self_, 'step1');
call send(_self_, 'step2');
* ---;
call send(_self_, 'step99');
endmethod;
step1: method / (description='initialize array: suppliers, distro-centers, stores, prepack options');
supplyChain=makearray(34000, 15, 3207, 10);
endmethod;
step2: method / (description='load data');
* code cut;
endmethod;
step99: method / (description='print results');
submit continue;
proc print data=work.results;
run;
endsubmit;
endmethod;
endclass;
Version history

SAS 71
SAS 71 represents the first limited release of the system. The first manual for SAS was printed at this time, approximately 60 pages long. The DATA step was implemented. Regression and analysis of variance were the main uses of the program.

SAS 72
This more robust release was the first to achieve wide distribution. It included a substantial user's guide, 260 pages in length. The MERGE statement was introduced in this release, adding the ability to perform a database JOIN on two data sets. This release also introduced the comprehensive handling of missing data.

SAS 76
SAS 76 was a complete system level rewrite, featuring an open architecture for adding and extending procedures, and for extending the compiler. The INPUT and INFILE statements were significantly enhanced to read virtually all data formats in use on the IBM mainframe. Report generation was added through the PUT and FILE statements.The capacity to analyze general linear models was added.

79.3 - 82.4
1980 saw the addition of SAS/GRAPH, a graphing component; and SAS/ETS for econometric and time-series analysis. In 1981 SAS/FSP followed, providing full-screen interactive data entry, editing, browsing, retrieval, and letter writing.
In 1983 full-screen spreadsheet capabilities were introduced (PROC FSCALC).
For IBM mainframes, SAS 82 no longer required SAS databases to have direct access organization ( (DSORG=DAU), because SAS 82 removed location-dependent information from databases. This permitted SAS to work with datasets on tape and other media besides disk.

Version 4 series
In the early 1980s, SAS Institute released Version 4, the first version for non-IBM computers. It was written mostly in a subset of the PL/I language, to run on several minicomputer manufacturers' operating systems and hardware: Data General's AOS/VS, Digital Equipment's VAX/VMS, and Prime Computer's PRIMOS. The version was colloquially called "Portable SAS" because most of the code was portable, i.e., the same code would run under different operating systems.

Version 5 series
Version 6 series
Version 6 represented a major milestone for SAS. While it appeared superficially similar to the user, major changes occurred "under the hood": the software was rewritten. From its FORTRAN origins, followed by PL/I and mainframe assembly language; in version 6 SAS was rewritten in C, to provide enhanced portability between operating systems, as well as access to an increasing pool of C programmers compared to the shrinking pool of PL/I programmers.
This was the first version to run on UNIX, MS-DOS and Windows platforms. The DOS versions were incomplete implementations of the Version 6 spec: some functions and formats were unavailable, as were SQL and related items such as indexing and WHERE subsetting. DOS memory limitations restricted the size of some user-defined items.
The mainframe version of SAS 6 changed the physical format of SAS databases from "direct files" (DSORG=DA) to standard blocked physical sequential files (DSORG=PS,RECFM=FS) with a customized EXCP macro instead of BSAM, QSAM or previously BDAM which was used through version 5 until the complete rewrite of version 6. The practical benefit of this change is that a SAS 6 database can be copied from any media with any copying tool including IEBGENER - which uses BSAM.
In 1984 a project management component was added (SAS/PROJECT).
In 1985 SAS/AF software, econometrics and time series analysis (SAS/ETS) component, and interactive matrix programming (SAS/IML) software was introduced. MS-DOS SAS (version 6.02) was introduced, along with a link to mainframe SAS.
In 1986 Statistical quality improvement component is added (SAS/QC software); SAS/IML and SAS/STAT software is released for personal computers.
1987 saw concurrent update access provided for SAS data sets with SAS/SHARE software. Database interfaces are introduced for DB2 and SQL-DS.
In 1988 SAS introduced the concept of MultiVendor Architecture (MVA); SAS/ACCESS software is released. Support for UNIX-based hardware announced. SAS/ASSIST software for building user-friendly front-end menus is introduced. New SAS/CPE software establishes SAS as innovator in computer performance evaluation. Version 6.03 for MS-DOS is released.
6.06 for MVS, CMS, and OpenVMS is announced in 1990. The same year, the last MS-DOS version (6.04) is released.
Data visualization capabilities added in 1991 with SAS/INSIGHT software.
In 1992 SAS/CALC, SAS/TOOLKIT, SAS/PH-Clinical, and SAS/LAB software is released.
In 1993 software for building customized executive information systems (EIS) is introduced. Release 6.08 for MVS, CMS, VMS, VSE, OS/2, and Windows is announced.
1994 saw the addition of ODBC support, plus SAS/SPECTRAVIEW and SAS/SHARE*NET components.
6.09 saw the addition of a data step debugger.
6.09E for MVS.
6.10 in 1995 was a Microsoft Windows release and the first release for the Apple Macintosh. Version 6 was the first, and last series to run on the Macintosh. JMP, also produced by the SAS Institute, is the software package the company produces for the Macintosh.
Also in 1995, 6.11 (codenamed Orlando) was released for Windows 95, Windows NT, and UNIX.
6.12 (Some of the following milestones in this sub-section may belong under version 7 or 8.)
In 1996 SAS announces Web enablement of SAS software and introduced the scalable performance data server.
In 1997 SAS/Warehouse Administrator and SAS/IntrNet software goes into production.
1998 sees SAS introduce a customer relationship management (CRM) solution, and an ERP access interface — SAS/ACCESS interface for SAP R/3. SAS is also the first to release OLE-DB for OLAP and releases HOLAP solution. Balanced scorecard, SAS/Enterprise Reporter, and HR Vision are released. First release of SAS Enterprise Miner.
1999 sees the releases of HR Vision software, the first end-to-end decision-support system for human resources reporting and analysis; and Risk Dimensions software, an end-to-end risk-management solution. MS-DOS versions are abandoned because of Y2K issues and lack of continued demand.
In 2000 SAS shipped Enterprise Guide and ported its software to Linux.

Version 7 series
The Output Delivery System debuted in version 7; as did long variable names (from 8 to 32 characters); storage of long character strings in variables (from 200 to 32,767); and a much improved built-in text editor, the Enhanced Editor.
Version 7 saw the synchronisation of features between the various platforms for a particular version number (which previously hadn't been the case).
Version 7 foreshadowed version 8. It was believed in the SAS users community, although never officially confirmed, that in releasing version 7 SAS Institute released a snapshot from their development on version 8 to meet a deadline promise. To some, SAS Institute recommending that sites wait until version 8 before deploying the new software was a confirmation of this.

Version 8 series
Released about 1999; 8.0, 8.1, 8.2 were Unix, Linux, Microsoft Windows, CMS (z/VM) and z/OS releases. Key features: long variable names, Output Delivery System (ODS).
SAS 8.1 was released in 2000.
SAS 8.2 was released in 2001.

Version 9 series
SAS 9.2 is the latest release (March 2008) and was demonstrated at SAS Global Forum (previously called SUGI) 2008. A list of features added to this release of SAS can be seen at the "What's New in SAS" web page.
SAS 9.2 will be released incrementally in three phases:
1) MVA-based products eg. SAS/BASE, SAS/STAT, SAS/Graph. Nothing that relies on metadata. Limited availability from March 2008 because most users rely on the Metadata Server (see Phase 2) or products released in Phased 3.
2) Enterprise Intelligence Platform. Metadata Server for Business Intelligence (BI) and Data Integration. Availability from March 2009.
3) Client software for metadata driven analytics and business solutions. Enterprise Miner, Text Miner, Model manager. Solutions include Financial, Retail, Health & Life Science. Availability unknown, probably 2nd Quarter 2009.
Version 9 makes additions to base SAS. The new hash object now allows functionality similar to the MERGE statement without sorting data or building formats. The function library was enlarged, and many functions have new parameters. Perl Regular Expressions are now supported, as opposed to the old "Regular Expression" facility, which was incompatible with most other implementations of Regular Expressions. Long format names are now supported.

Criticism

The Base SAS component had been criticized for its poor graphics when compared with other statistical software packages. With the release of the Output Delivery System (ODS) for Statistical Graphics extension in SAS 7, and with the use of the SAS Graph component the graphics have improved significantly.[19] The development tools provided — which include the Enhanced text editor, log, DATA step debugger, SCL debugger — are also outdated compared to what other development environments provide. Debugging tools are especially lacking. Finding bugs in modern SAS programs that use many macros can be complex; SAS will often not note the correct line number of execution when reporting an error, as diagnostic messages will refer to the expanded macro code.


See also



(source:wikipedia)

No comments:

Post a Comment