First Steps with Data and SAS

  The objectives of this chapter are to:

  • Understand the terms: data value, variable, observation, and data set.
  • Understand the rules for writing SAS statements and for naming variables and data sets.

Let us start with a generic example to see how SAS reads and interprets data. Most of the time, data is extracted from a data warehouse. Often, however, we need to create our own data sets to test certain procedures or functions, so this knowledge comes in handy.

The table below provides some information about 10 account holders of a credit card company: account number, account open date, account status code, and current credit limit. Let us look at some features of this data.
 

Account #   Open Date   Status Code   Credit Limit
1234670     11-Sep-04   Z             2000
1234671     12-Sep-04                 3000
1234672     13-Sep-04   Z             2500
1234673     14-Sep-04   T             3200
1234674     15-Sep-04                 8000
1234675     16-Sep-04   D             2000
1234676     17-Sep-04                 4000
1234677     18-Sep-04   S             6000
1234678     19-Sep-04   T             8000
1234679     20-Sep-04   T             2000

Data Value, Variable, Observations, and Dataset

DATA VALUE

A data value is the basic unit of information. In the field containing the account holder’s credit limit, the DATA VALUES are 2000, 3000, and so on. The DATA VALUES in the field ‘status code’ are ‘Z’, ‘D’, ‘S’, and ‘T’.

VARIABLE

A set of data values that describes a given attribute makes up a VARIABLE. Each column of data values is a VARIABLE. For example, the first column in our data set is reserved for the VARIABLE we'll call Account #. It holds the account numbers of all the credit card holders in our sample.

SAS variables are of two types - numeric and character. Values of numeric variables can only be numbers or a period (.) for missing data. Character variables can be made up of letters and special characters such as plus signs, dollar signs, colons and percent signs, as well as numeric digits.

In the sample data above, account number and credit limit are numeric variables and status code is a character variable. Open date deserves special mention: in SAS, dates are stored as numbers (the number of days since January 1, 1960) and displayed in various layouts using SAS formats.
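As a minimal sketch of how the sample table could be turned into a SAS data set, the DATA step below reads the ten rows from in-stream data lines. The data set name accounts and the variable names account_no, open_date, status_code, and credit_limit are assumptions chosen for illustration; they do not come from the original table. Column pointers are used so that a blank status code is read as a missing character value rather than shifting the remaining fields.

```sas
/* Sketch only: data set and variable names are illustrative assumptions. */
data accounts;
    /* Formatted input with column pointers:                              */
    /*   columns 1-7   account number (numeric)                           */
    /*   columns 9-17  open date, read with the DATE9. informat           */
    /*   column  19    status code (character, may be blank)              */
    /*   columns 21-24 credit limit (numeric)                             */
    input @1  account_no   7.
          @9  open_date    date9.
          @19 status_code  $1.
          @21 credit_limit 4.;
    /* The date is stored as a number; the format controls its display.   */
    format open_date date9.;
    datalines;
1234670 11-Sep-04 Z 2000
1234671 12-Sep-04   3000
1234672 13-Sep-04 Z 2500
1234673 14-Sep-04 T 3200
1234674 15-Sep-04   8000
1234675 16-Sep-04 D 2000
1234676 17-Sep-04   4000
1234677 18-Sep-04 S 6000
1234678 19-Sep-04 T 8000
1234679 20-Sep-04 T 2000
;
run;

/* List each variable with its type (numeric or character). */
proc contents data=accounts;
run;

/* Print the observations; open_date is shown using the DATE9. format. */
proc print data=accounts;
run;
```

In the PROC CONTENTS listing, open_date should appear as a numeric variable alongside account_no and credit_limit, with status_code as the only character variable. Removing the FORMAT statement would make PROC PRINT show the underlying day count instead of a readable date.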

OBSERVATION

All the data values associated with a case, a single entity, a subject, an individual, a year, or a record and so on, make up an OBSERVATION. Each row of the data table (or Matrix) represents one OBSERVATION. The row below represents all the data values associated with OBSERVATION #1.

 


Account #   Open Date   Status Code   Credit Limit
1234670     11-Sep-04   Z             2000


DATA SET

A DATA SET is a collection of data values usually arranged in a rectangular table (or matrix).

A SAS DATA SET is the special way that SAS organizes and stores data. For example, if we convert our sample into a SAS data set, we will have a data set with 4 columns (fields) and 10 rows, with 3 numeric fields and 1 character field. Why three numeric fields? Because SAS stores a date as a number.

The DATA step creates the SAS data set, and PROC steps are instructions indicating how the SAS data set is to be manipulated or analyzed. Some procedures also create one or more data sets as output when they run.
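The division of labor between DATA and PROC steps can be sketched as follows. This is an illustration under assumptions: the names accounts_sample and accounts_by_limit are hypothetical, only three rows of the sample table are used to keep the example short, and PROC MEANS and PROC SORT are simply two common procedures picked to show a step that reports versus a step that writes a new data set.

```sas
/* DATA step: creates a small SAS data set from three sample rows.      */
/* Names are illustrative assumptions, not part of the original text.   */
data accounts_sample;
    input account_no status_code $ credit_limit;
    datalines;
1234670 Z 2000
1234673 T 3200
1234675 D 2000
;
run;

/* PROC step that analyzes the data set and only produces a report. */
proc means data=accounts_sample mean max;
    var credit_limit;
run;

/* PROC step whose execution also creates a new data set:           */
/* OUT= writes the sorted rows to accounts_by_limit and leaves      */
/* accounts_sample unchanged.                                       */
proc sort data=accounts_sample out=accounts_by_limit;
    by descending credit_limit;
run;
```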

Rules for SAS names

Among the kinds of SAS names that appear in SAS statements are the names of variables, SAS data sets, formats, procedures, options, and statement labels.

  1. Many SAS names, such as variable and data set names, can be up to 32 characters long; others, such as librefs and filerefs, have a maximum length of 8.
  2. The first character must be a letter (A, B, C, . . ., Z) or underscore (_). Subsequent characters can be letters, numeric digits (0, 1, . . ., 9), or underscores.
  3. You can use upper or lowercase letters. SAS processes names as uppercase regardless of how you type them.
  4. Blanks cannot appear in SAS names.
  5. Special characters, except for the underscore, are not allowed. In filerefs only, you can also use the dollar sign ($), pound sign (#), and at sign (@).
  6. SAS reserves a few names for automatic variables and variable lists. For example, _N_ and _ERROR_.
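The naming rules above can be tried out with a short DATA step. The names used here (_name_demo, Credit_Limit, acct_2004) are made up for illustration; the invalid examples are left as comments because they would stop the step from compiling.

```sas
/* Sketch: valid names, case-insensitivity, and automatic variables. */
data _name_demo;               /* a name may start with an underscore   */
    Credit_Limit = 2000;       /* letters and underscores are allowed   */
    /* The next line refers to the same variable in three spellings,    */
    /* because SAS does not distinguish upper and lower case in names.  */
    credit_limit = CREDIT_LIMIT + 500;
    acct_2004 = 1;             /* digits are fine after the first character */

    /* Invalid names (would cause errors if uncommented):               */
    /*   2004_acct = 1;      starts with a digit                        */
    /*   credit limit = 1;   contains a blank                           */
    /*   credit-limit = 1;   contains a special character               */

    /* _N_ and _ERROR_ are automatic variables that SAS creates itself. */
    put _N_= _ERROR_= credit_limit=;
run;
```

The PUT statement writes the values to the SAS log, which is a convenient place to check that the three spellings of the credit limit variable really did update a single variable.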

Rules for SAS statements

  1. SAS statements may begin in any column of the line.
  2. SAS statements must end with a semicolon (;).
  3. Some SAS statements may consist of more than one line of commands.
  4. A SAS statement may continue over more than one line.
  5. One or more blanks should be placed between items in SAS statements. If the items are special characters such as '=', '+', '$', the blanks are not necessary.
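A small sketch of these layout rules follows; layout_demo and the variables x, y, and z are hypothetical names. It also places several statements on one line, which SAS accepts because the semicolon, not the line break, ends a statement.

```sas
/* Statements may start in any column, and several may share a line. */
data layout_demo; x=1; y = x + 2;
            /* One statement continued over three lines; it ends only */
            /* at the semicolon.                                      */
            z
              = x+y
            ;
    /* Blanks are required between the keyword PUT and its items, but */
    /* not around special characters such as = and +.                 */
    put x= y= z=;
run;
```

Running this step should write x=1 y=3 z=4 to the SAS log. In practice, one statement per line with consistent indentation is easier to read, even though SAS does not require it.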