The objectives of this chapter are to:
- Understand the terms data value, variable, observation, and data set.
- Understand the rules for writing SAS statements and for naming variables and data sets.
Let us start with a generic example to understand how SAS reads and interprets data. Most of the time, data is extracted from a data warehouse. Often, however, we need to create our own data sets to test certain procedures or functions, so this knowledge comes in handy.
The table below provides some information about 10 account holders of a credit card company. The information includes the account number, account open date, account status code, and current credit limit. Now let us look at some features of this data.

| Account # | Open Date | Status Code | Credit Limit |
|---|---|---|---|
| 1234670 | 11-Sep-04 | Z | 2000 |
| 1234671 | 12-Sep-04 | | 3000 |
| 1234672 | 13-Sep-04 | Z | 2500 |
| 1234673 | 14-Sep-04 | T | 3200 |
| 1234674 | 15-Sep-04 | | 8000 |
| 1234675 | 16-Sep-04 | D | 2000 |
| 1234676 | 17-Sep-04 | | 4000 |
| 1234677 | 18-Sep-04 | S | 6000 |
| 1234678 | 19-Sep-04 | T | 8000 |
| 1234679 | 20-Sep-04 | T | 2000 |

DATA VALUE
A data value is the basic unit of information. In the field containing the account holder’s credit limit, the DATA VALUES are 2000, 3000, and so on. The DATA VALUES in the Status Code field are ‘Z’, ‘D’, ‘S’ and ‘T’.
VARIABLE
A set of data values that describes a given attribute makes up a VARIABLE. Each column of data values is a VARIABLE. For example, the first column in our data set holds the VARIABLE we'll call Account #. It contains the account numbers of the credit card holders in our sample.
SAS variables are of two
types - numeric and character. Values of numeric variables can only be
numbers or a period (.) for missing data. Character variables can be
made up of letters and special characters such as plus signs, dollar
signs, colons and percent signs, as well as numeric digits.
In the sample data above, account number and credit limit are numeric variables and status code is a character variable. Open date deserves special mention. In SAS, a date is stored as a number (the number of days since January 1, 1960) and can be displayed in various layouts using SAS formats.
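As a small illustration of how a date is held as a number, the sketch below assigns the first account's open date to a variable and writes it to the log with and without a format. The variable name open_date is an assumption made for this example; the date literal syntax and the DATE9. and MMDDYY10. formats are standard SAS.

```sas
/* Sketch: a SAS date is a number (days since 01JAN1960). */
data _null_;
    open_date = '11SEP2004'd;                /* date literal; open_date is an illustrative name */
    put 'Unformatted: ' open_date;           /* the stored numeric value */
    put 'DATE9.:      ' open_date date9.;    /* same value shown as ddMMMyyyy */
    put 'MMDDYY10.:   ' open_date mmddyy10.; /* same value shown as mm/dd/yyyy */
run;
```

The log should show 16325 for the unformatted value, then 11SEP2004 and 09/11/2004. The stored value is the same in all three lines; only the format applied when writing it differs.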
OBSERVATION
All the data values associated with a single case (an entity, a subject, an individual, a year, a record, and so on) make up an OBSERVATION. Each row of the data table (or matrix) represents one OBSERVATION. The row below represents all the data values associated with OBSERVATION #1.

| Account # | Open Date | Status Code | Credit Limit |
|---|---|---|---|
| 1234670 | 11-Sep-04 | Z | 2000 |

DATA SET
A DATA SET is a collection
of data values usually arranged in a rectangular table (or matrix).
A SAS DATA SET is the special way that SAS organizes and stores data. For example, if we convert our sample into a SAS data set, we will have a data set with 4 columns (fields) and 10 rows, with 3 numeric fields and 1 character field. Why three numeric fields? Because SAS stores a date as a number.
The DATA step creates the SAS data set, and the PROC steps are instructions indicating how the SAS data set is to be manipulated or analyzed. Some procedures also create one or more data sets as part of their output.
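The following is a minimal sketch of how the sample table might be turned into a SAS data set and then passed to a few procedures. The data set names (work.accounts, work.limit_summary) and variable names (account_no, open_date, status_code, credit_limit, avg_limit, max_limit) are assumptions chosen for this example. The data lines are comma-separated so that an empty status code is read as a missing value.

```sas
/* DATA step: create the data set from in-stream data lines. */
data work.accounts;
    infile datalines dsd truncover;    /* DSD: comma-delimited, ",," means missing */
    input account_no                   /* numeric */
          open_date   :date9.          /* numeric, read with a date informat */
          status_code :$1.             /* character, length 1 */
          credit_limit;                /* numeric */
    format open_date date9.;           /* display the stored number as a date */
datalines;
1234670,11-Sep-04,Z,2000
1234671,12-Sep-04,,3000
1234672,13-Sep-04,Z,2500
1234673,14-Sep-04,T,3200
1234674,15-Sep-04,,8000
1234675,16-Sep-04,D,2000
1234676,17-Sep-04,,4000
1234677,18-Sep-04,S,6000
1234678,19-Sep-04,T,8000
1234679,20-Sep-04,T,2000
;
run;

/* PROC steps: describe and list the data set. */
proc contents data=work.accounts;
run;

proc print data=work.accounts;
run;

/* A procedure that writes its result to a new data set. */
proc means data=work.accounts noprint;
    var credit_limit;
    output out=work.limit_summary mean=avg_limit max=max_limit;
run;
```

PROC CONTENTS should report 10 observations and 4 variables, three numeric and one character. The two-digit years are interpreted according to the YEARCUTOFF system option, so writing four-digit years in the data lines is the safer choice in real programs.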
Among the kinds of SAS names that appear in SAS statements are the names of variables, SAS data sets, formats, procedures, options, and statement labels. The following rules apply to SAS names:
- Many SAS names can be 32
characters long; others have a maximum length of 8.
- The first character must
be a letter (A, B, C, . . ., Z) or underscore (_). Subsequent
characters can be letters, numeric digits (0, 1, . . ., 9), or
underscores.
- You can use upper or
lowercase letters. SAS processes names as uppercase regardless of how
you type them.
- Blanks cannot appear in
SAS names.
- Special characters, except for the underscore, are not allowed. In filerefs only, you can use the dollar sign ($), pound sign (#), and at sign (@).
- SAS reserves a few names for automatic variables and variable lists, for example _N_ and _ERROR_.
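The naming rules above can be tried out in a short DATA step. This is only a sketch; the data set and variable names are made up for illustration.

```sas
/* Sketch: applying the naming rules (all names here are illustrative). */
data work.name_demo;
    credit_limit = 2000;              /* letters and an underscore */
    _status      = 'Z';               /* a name may start with an underscore */
    limit2       = CREDIT_LIMIT * 2;  /* digits allowed after the first character;
                                         CREDIT_LIMIT and credit_limit are the same variable */
    obs_number   = _n_;               /* _N_ is an automatic variable */
    error_flag   = _error_;           /* _ERROR_ is an automatic variable */
    /* Not valid as variable names: 2limit (starts with a digit),
       credit limit (contains a blank), credit-limit (contains a hyphen). */
run;
```

Because _N_ and _ERROR_ are automatic variables, they are not written to the output data set, which is why the sketch copies them into ordinary variables.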
The following rules apply to writing SAS statements:
- SAS statements may begin in any column of the line.
- SAS statements must end
with a semicolon (;).
- Some SAS statements may
consist of more than one line of commands.
- A SAS statement may
continue over more than one line.
- One or more blanks
should be placed between items in SAS statements. If the items are
special characters such as '=', '+', '$', the blanks are not necessary.
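As a rough illustration of the statement rules above, the DATA step below is deliberately laid out unevenly. The data set and variable names are hypothetical, and the layout is for demonstration only, not a recommended style.

```sas
data work.layout_demo;
input status_code$ credit_limit;     /* no blank needed before '$' */
            increase=500;            /* starts in a later column; no blanks around '=' */
new_limit = credit_limit
          + increase;                /* one statement continued over two lines */
datalines;
Z 2000
T 3200
;
run;
```

Every statement ends with a semicolon, so SAS knows where each one stops regardless of line breaks or indentation. In practice, consistent indentation and spacing make programs much easier to read.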