SAS Functions and Conditional Processing

Objectives of this Chapter:

  • Learn SAS commonly used functions for data processing.
  • Learn how to MERGE datasets.
  • Learn conditional processing using WHERE, IF/ELSE and DO-END

SAS Functions  in DATA steps

A SAS function performs a computation or manipulation on variables (arguments) and returns a value. Most functions use arguments supplied by the user. SAS functions are mainly used in DATA step programming statements.


data function_test;
Max_var= max(100,101);
Length_var = length(Max_var);
This_month = month(today());
run;
proc print;run;


The syntax of a function is :

Function-name (argument-1, . . .,argument-n>)


In the above example, we have shown how MAX, LENGTH and MONTH functions are used. The table below has some frequently used functions in our programs.

Function Name

Description

LENGTH

Returns the length of an argument

LOWCASE

Converts all letters in an argument to lowercase

SCAN

Selects a given word from a character expression

SUBSTR (right of =)

Extracts a substring from an argument

UPCASE

Converts all letters in an argument to uppercase

DATEPART

Extracts the date from a SAS datetime value

DAY

Returns the day of the month from a SAS date value

INTCK

Returns the integer number of time intervals in a given time span

INTNX

Advances a date, time, or datetime value by a given interval, and returns a date, time, or datetime value

MONTH

Returns the month from a SAS date value

QTR

Returns the quarter of the year from a SAS date value

TODAY

Returns the current date as a SAS date value

YEAR

Returns the year from a SAS date value

MAX

Returns the largest value

MEAN

Returns the arithmetic mean (average)

MIN

Returns the smallest value

SUM

Returns the sum of the nonmissing arguments

CALL SYMPUT

Assigns DATA step information to a macro variable

LOG

Returns the natural (base e) logarithm

MOD

Returns the remainder value

SQRT

Returns the square root of a value

RANUNI

Returns a random variate from a uniform distribution

INPUT

Returns the value produced when a SAS expression that uses a specified informat expression is read

LAG

Returns values from a queue

PUT

Returns a value using a specified format

ZIPSTATE

Converts ZIP codes to state postal codes

CEIL

Returns the smallest integer that is greater than or equal to the argument

FLOOR

Returns the largest integer that is less than or equal to the argument

ROUND

Rounds to the nearest round-off unit

TRUNC

Truncates a numeric value to a specified length

VLABEL

Returns the label that is associated with the specified variable

VNAME

Returns the name of the specified variable

VTYPE

Returns the type (character or numeric) of the specified variable

MERGE  –Combining datasets

One of the features frequently used while SAS programming is combining one or many datasets. For example to combine the application data with performance data we use MERGE utility in SAS.

Most of the time a match merging is done on SAS datasets. For example, if we are interested in the performance of the accounts that are opened only in Jan 2004, we try to match only those accounts while doing the performance data merging. In consumer Finance, account number or Account Key is usually used for match merging.

Let us look at the following examples. We have two record sets that have information about credit lines of the accounts. Intial_cl holds data for all initial credit lines assigned and new_cl has all increased credit lines. Now we need to combine these into one dataset so that it will show the latest credit lines for each account.


DATA intial_cl;
INPUT account credit_limit;
DATALINES;
1002 2000
1003 4000
1004 3000
;

DATA new_cl;
INPUT account credit_limit;
DATALINES;
1002 3000
1004 5000
;

DATA credit_limit;
  MERGE intial_cl new_cl;
  BY account;
RUN;

 

PROC PRINT;
RUN;


Output looks like as follows:


                                 credit_
       Obs    account     limit

_______________________________________________________

        1       1002       3000
        2       1003       4000
        3       1004       5000

So a simple merge statement combined the datasets and the BY statement made sure that its updated with the latest information.

What if we put ‘new_cl’ data set first in the MERGE statement? Note that the data overwritten on the first dataset.

Let us introduce some more complexity to reflect the way we use MERGE .

DATA credit_limit;
  MERGE  intial_cl(in=a) new_cl(in=b) ;
  BY account;
  IF  a and b;
RUN;


The above data step introduces a conditional processing using IF . Also a handle to the dataset is declared after the dataset name (in=a, in=b) . Using these handles we can combine the datasets conditionally.

‘IF a and b’ combines the dataset if the BY statement finds a match in the second dataset. Only those records with a match are combined. This is different from the first example where all records are output into the resultant dataset. Output looks like as follows
          

Account Sample

                               credit_
     Obs    account     limit

____________________________________________

      1       1002       3000
      2       1004       5000


There are various combinations that suits to various requirements


If a ; = combines the data with data updated for all accounts in the first dataset
If b; = combines the data with data updated for all accounts in the second data set
If a or b; combines both the datasets with data updated into the first dataset specified (Outer join)
Also it is possible to merge more than two data sets at a time. PROC SQL is also used to merge datasets. Please go through the Appendix-I for more details.

Conditional  Processing with WHERE , IF-ELSE , DO-END

When working with data we come across situations where data need to be filtered or we may have to apply several conditions before we have the final data for analysis or modeling. For example, When we want to report out the number of active accounts and total $ outstanding, we need to subset the entire data so that only accounts that are active are included. Suppose the  active definition says “ An active account is an account where current $ outstanding is greater than 100 and Status_code is not  Z”. Let us look at an example now.


DATA account_perf;
INPUT account current_os status_code $;
cards;
1002 300   A
1003 20    A
1004 1200  .
1005 800   Z
1006 450   D
1007 560   Z
1008 450   A
1009 900   C
1110  300  C
;run;

Data perf;
set account_perf;
where current_os >100 and status_code ne 'Z';
run;


Here dataset ‘perf’ is a subset of account_perf.  Conditions behind creating this subset were the activity definitions we mentioned before. Note that WHERE statement is used to evaluate two variables with and ‘and’ condition. The same way we can use OR also.

IF can also be used instead of WHERE . (You can use IF current_os >100 and status_code ne 'Z';). But WHERE is more efficient when we do the sub-setting.

IF ELSE conditions are commonly used to flag accounts or create new variables based on certain conditions. Suppose we need to flag the accounts into Good and Bad based on certain conditions, IF/ELSE can be used.

The example below shows the usage of IF/ELSE

Data perf;
set account_perf;
length status $20.;
if status_code eq 'Z' then status= "Bad-Charged Off";
else if status_code eq 'C' then status = "Cancelled";
else  Status= "Good Account";
run;

Dataset used is same as the previous example. Here we are categorizing the data into three segments. Bad Charged off, Cancelled and Good Accounts. A new variable (status) is created based on the condition.

                   current_    status_
 Obs    account       os        code      status

  1       1002        300         A       Good Account
  2       1003         20         A       Good Account
  3       1004       1200                 Good Account
  4       1005        800         Z       Bad-Charged Off
  5       1006        450         D       Good Account
  6       1007        560         Z       Bad-Charged Off
  7       1008        450         A       Good Account
  8       1009        900         C       Cancelled
  9       1110        300         C       Cancelled

When there is multiple processing done conditionally, then DO-END loop is used. For example we want to categorize the good and bad accounts and also want to compute the write-off amount based on a same condition ie Status_code =Z, this can be achieved using a DO-END loop. Let us have a look at the example below.

 Data perf;
set account_perf;
length status $20.;

if  status_code eq 'Z' then do ;
status= "Bad-Charged Off";
wo_amount = current_os ;
end;

else  do;
Status= "Good Account";
wo_amount = 0 ;
end;

run;


proc print ;run;

 

The output looks like the following:

                            current_  status_
   Obs    account       os        code          status         wo_amount

    1       1002        300         A       Good Account            0
    2       1003         20         A       Good Account            0
    3       1004       1200                 Good Account            0
    4       1005        800         Z       Bad-Charged Off       800
    5       1006        450         D       Good Account            0
    6       1007        560         Z       Bad-Charged Off       560
    7       1008        450         A       Good Account            0
    8       1009        900         C       Good Account            0
    9       1110        300         C       Good Account            0

Now let us summarize:


WHERE, IF-ELSE, DO-END are used for conditional processing in SAS data steps. WHERE and IF conditions are also used in many SAS PROCs to subset data and filter out unwanted data. 

Exercises:


Create two data sets as showed below and try out the following combination of merging. Write down your observations on how each merging worked.

DATA intial_cl;
INPUT account credit_limit;
DATALINES;
1002 2000
1003 4000
1004 3000
;

 

DATA new_cl;
INPUT account credit_limit;
DATALINES;
1002 3000
1004 5000
1005 2500
;

DATA credit_limit;
  MERGE  intial_cl(in=a) new_cl(in=b) ;
  BY account;
  IF  a ;
RUN;

PROC PRINT;
RUN;

DATA credit_limit;
  MERGE  intial_cl(in=a) new_cl(in=b) ;
  BY account;
  IF  b ;
RUN;

PROC PRINT;
RUN;

 

DATA credit_limit;
  MERGE  intial_cl(in=a) new_cl(in=b) ;
  BY account;
  IF  a or b ;
RUN;

PROC PRINT;
RUN;

 

DATA credit_limit;
  MERGE  intial_cl(in=a) new_cl(in=b) ;
  BY account;
  IF  a=b ;
RUN;

PROC PRINT;
RUN;

 

 

Copyright free public information. All trademarks,service marks, logos and names are properties of their respective owners.