Objectives of this Chapter:
- Learn about SAS variables.
- Learn how to LABEL, RENAME and FORMAT data.
- Learn commonly used SAS functions for data processing.
- Learn how to MERGE datasets.
- Learn conditional processing using WHERE, IF/ELSE and DO-END.
The previous chapters demonstrated what a simple SAS program looks like
and how to read data from various sources into SAS. We came across
concepts like informats, formats and some SAS procedures. Let us now
discuss some of these concepts in detail, so that we understand the
functions and methods frequently used in data processing: cleaning,
formatting and combining/subsetting data to make it ready for analytics.
Declaration, assignment,
length, keep, drop, array, PROC contents, label, macro variables and
scope of variables.
SAS variables are containers that you create within a program to store
and use character and numeric values. There are two types of variables:
character and numeric. Character variables contain alphabetic
characters, numeric digits 0 through 9, and other special characters.
Numeric variables are stored as floating-point numbers, and they include
dates and times. Yes, SAS stores dates and times as numbers.
Put simply, every field/column in a SAS dataset is a SAS variable.
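The point that dates are stored as numbers can be illustrated with a
minimal sketch. The variable name sale_date is hypothetical and chosen
only for this illustration. A SAS date value is a count of days since
January 1, 1960, so the first PUT writes a plain number to the log and
the second writes the same value through the DATE9. format.

```sas
data _null_;
    /* Date literal; sale_date is a hypothetical name used for illustration */
    sale_date = '15JAN2001'd;

    /* Written without a format: the raw numeric value (days since 01JAN1960) */
    put 'Raw stored value: ' sale_date;

    /* Written with a date format: the same value shown as a calendar date */
    put 'Formatted value:  ' sale_date date9.;
run;
```

Both lines appear in the SAS log, since DATA _NULL_ creates no dataset.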
These are the questions about SAS variables that we will discuss in this
chapter.
- How do we create a variable and set its type?
- How do we decide the length of a variable?
- How do we keep or drop one or more variables from a dataset?
- What are array variables and how do we declare them?
- How do we find the type of the variables in an existing dataset?
- How do we label a variable and apply a format to it?
- What is a macro variable and how do we declare one?
- What is the scope of a variable in a SAS program?
How to create a variable?
There are four ways we commonly use to create a variable.
1. Using an assignment statement
2. Using an INPUT statement
3. Through a LENGTH statement
4. As a result of PROC SQL/PROC IMPORT
There are many other ways too, but we limit ourselves to these four, as
they are used about 90% of the time.
Using an assignment statement
This is the most common form of variable creation. It is not necessary
to declare the variables in advance.
Have a look at the following program:
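data var_test;
    id = 'JK';                          /* character variable, length 2 */
    NProducts = 6;                      /* numeric variable */
    pro_price = 4.55;                   /* numeric variable with a decimal value */
    tot_cost = NProducts * pro_price;   /* product of two other variables */
    final_price = tot_cost;             /* copy of another variable */
run;

proc print data=var_test;
run;

proc contents data=var_test;
run;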
The above program creates a dataset ‘var_test’ with five variables. As
we have seen earlier, each variable forms a column/field in the
dataset. ‘id’ is assigned the value ‘JK’. NProducts and pro_price are
assigned numbers, the latter being a decimal. tot_cost takes the value
of the product of two other variables. Finally, final_price is assigned
the value of another variable in the dataset, namely tot_cost.
PROC PRINT prints the dataset with all its variables, and this is what
the output looks like:
Obs    id    NProducts    pro_price    tot_cost    final_price
---------------------------------------------------------------
  1    JK            6         4.55        27.3           27.3
PROC CONTENTS is the procedure that reports the type, length and label
of each variable in a dataset.
-----Alphabetic List of Variables and Attributes-----
#    Variable       Type    Len    Pos
---------------------------------------------
2    NProducts      Num       8      0
5    final_price    Num       8     24
1    id             Char      2     32
3    pro_price      Num       8      8
4    tot_cost       Num       8     16
Together, these outputs show how the variables were created and what
values, types and lengths SAS assigned to them. Now let us discuss some
general rules of variable creation by assignment.
In a DATA step, you can
create a new variable and assign it a value by using it for the first
time on the left side of an assignment statement. SAS determines the
length of a variable from its first occurrence in the DATA step. The
new variable gets the same type and length as the expression on the
right side of the assignment statement.
When the type and length
of a variable are not explicitly set, SAS gives the variable a default
type and length as shown in the examples in the following table.
| Expression | Example | Resulting Type of X | Resulting Length of X | Explanation |
|---|---|---|---|---|
| Numeric variable | a=34; x=a; | Numeric variable | 8 | Default numeric length (8 bytes unless otherwise specified) |
| Character variable | a='ABCD'; x=a; | Character variable | 4 | Length of source variable |
| Character literal | x='ABC'; x='ABCDE'; | Character variable | 3 | Length of first literal encountered |
Practical problems: Often the length of a variable is not sufficient to
hold a value encountered later during data processing. SAS then
truncates the value to the length the variable was created with. This
problem can be solved by declaring the length of the variable before
the first assignment.
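The truncation problem described above can be reproduced with a small
sketch. The dataset names trunc_demo and trunc_fixed and the variable
status are hypothetical and used only for illustration. In the first
step the length is fixed by the first literal, so the longer value is
cut off; in the second step the length is declared before the first
assignment.

```sas
data trunc_demo;
    /* The first literal fixes the length of status at 3 */
    status = 'NEW';
    status = 'CLOSED';      /* stored as 'CLO' because the length is 3 */
run;

data trunc_fixed;
    /* Reserve room for the longest expected value before assigning */
    length status $ 6;
    status = 'NEW';
    status = 'CLOSED';      /* stored in full */
run;

proc print data=trunc_demo;
run;

proc print data=trunc_fixed;
run;
```

SAS does not flag the truncation in the first step as an error, which is
why it is easy to miss in practice.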
Using an INPUT statement
We have already seen how to use INPUT to read data into variables. We
have also seen how to use SAS informats to tell SAS what kind of data
it is reading. One example is reproduced below to show how it is done.
DATA acctinfo;
    INPUT acctnum $8. date mmddyy10. amount comma9.;
    CARDS;
0074309801/15/2001$1,003.59
1028754301/17/2001$672.05
3320899201/19/2001$702.77
0345900601/19/2001$1,209.61
;
run;

proc contents data=acctinfo;
run;
Output:
Alphabetic List of Variables and Attributes
#    Variable    Type    Len    Pos
___________________________________
1    acctnum     Char      8     16
3    amount      Num       8      8
2    date        Num       8      0
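Here the informat that follows each variable in the INPUT statement
tells SAS the data type and how many positions (columns) to read. For
the character variable, the informat width also becomes its length;
the numeric variables are stored with length 8 regardless of the
informat width.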
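An informat only controls how a value is read; it does not control how
the value is displayed. The sketch below is a shortened variant of the
example above (the dataset name acct_small is hypothetical) and attaches
display formats in PROC PRINT. Without the FORMAT statement, date would
print as a raw number and amount without the dollar sign and comma.

```sas
data acct_small;
    /* Informats tell SAS how to read each field from the raw line */
    input acctnum $8. date mmddyy10. amount comma9.;
    datalines;
0074309801/15/2001$1,003.59
1028754301/17/2001$672.05
;
run;

proc print data=acct_small;
    /* Formats tell SAS how to display the stored numeric values */
    format date mmddyy10. amount dollar10.2;
run;
```

The data lines start in column 1 on purpose, because formatted input
reads fixed column positions.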
Through a LENGTH statement
In practical situations, the length of a new variable often needs to be
defined explicitly. For example, when we assign two character values
successively to a variable, SAS sets the length of the variable to that
of the first value. If the second value is longer than the first, SAS
stores it only up to the length of the first value. So it is good
programming practice to declare the variable with a LENGTH statement, so
that we are sure it can hold every value the data contains.
You can use the LENGTH statement to create a variable and set its
length. Let us modify our earlier example:
data var_test;
    length id $ 10;        /* character variable, 10 bytes */
    length NProducts 4;    /* numeric variable stored in 4 bytes */
    id = 'JK';
    NProducts = 6;
    pro_price = 4.55;
    tot_cost = NProducts * pro_price;
    final_price = tot_cost;
run;

proc contents data=var_test;
run;
Output is:
Alphabetic List of Variables and Attributes
#    Variable       Type    Len    Pos
______________________________________
2    NProducts      Num       4     24
5    final_price    Num       8     16
1    id             Char     10     28
3    pro_price      Num       8      0
4    tot_cost       Num       8      8
The output shows that the id variable is now a 10-character field, so it
can hold longer values. Without this explicit declaration, the id field
could hold only two characters, the length automatically assigned by
SAS.
For character variables,
you must allow for the longest possible value in the first statement
that uses the variable, because you cannot change the length with a
subsequent LENGTH statement within the same DATA step. The maximum
length of any character variable in the SAS System is 32,767 bytes. For
numeric variables, you can change the length of the variable by using a
subsequent LENGTH statement.
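The rule for character variables can be checked with a short sketch.
The dataset name late_length and the variable code are hypothetical.
Here the LENGTH statement comes after the first assignment, so it is
too late to take effect; SAS is expected to keep the original length
and write a warning to the log rather than stop the step.

```sas
data late_length;
    code = 'AB';            /* first use: length of code is set to 2 */
    length code $ 10;       /* too late: the length stays 2 */
    code = 'ABCDEFGH';      /* stored as 'AB' */
run;

/* PROC CONTENTS should report code as Char with Len 2 */
proc contents data=late_length;
run;
```

Moving the LENGTH statement above the first assignment is the fix.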
As a result of PROC SQL/PROC IMPORT
We often extract data from data warehouses or import data using the SAS
import utilities, and the resulting dataset is created with columns in
various formats. During this process, SAS identifies the best format for
the database fields being extracted and applies it to the dataset.
PROC SQL provides flexibility in formatting the variables. This is
discussed separately in the PROC SQL chapter.
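As a brief preview, the sketch below shows one way a variable can come
into existence through PROC SQL. The table name sql_vars and the label
text are hypothetical, and the input dataset is rebuilt here only to
keep the example self-contained. The new column tot_cost is created by
the expression in the SELECT clause, and the FORMAT= and LABEL= column
modifiers attach attributes to it.

```sas
data var_test;
    id = 'JK';
    NProducts = 6;
    pro_price = 4.55;
run;

proc sql;
    /* tot_cost is created by the expression; no prior declaration needed */
    create table sql_vars as
    select id,
           NProducts * pro_price as tot_cost format=8.2 label='Total cost'
    from var_test;
quit;

/* Inspect the type, length, format and label SAS assigned */
proc contents data=sql_vars;
run;
```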