Using PROC FREQ to Detect Duplicate IDs

Another way to find duplicates is to use PROC FREQ to count the number of observations for each value of the patient ID variable (PATNO). Name PATNO in the TABLES statement and add the OUT= option to create a SAS data set that contains each value of PATNO together with its frequency count (PROC FREQ stores the frequency in a variable named COUNT and the percentage in a variable named PERCENT). With this information, you can keep only the ID values whose count is greater than 1 and use them to select the original duplicate observations from your data set. To demonstrate how this works, Program 5-7 identifies duplicate patient numbers in the PATIENTS data set.

Program 5-7. Using PROC FREQ and an Output Data Set to Identify Duplicate IDs

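PROC FREQ DATA=CLEAN.PATIENTS NOPRINT;  /* 1 */
   /* Count observations per PATNO; keep only IDs that occur more than once */
   TABLES PATNO / OUT=DUP_NO(KEEP=PATNO COUNT
                             WHERE=(COUNT GT 1));  /* 2 */
RUN;

/* Sort the original data by PATNO so it can be merged */
PROC SORT DATA=CLEAN.PATIENTS OUT=TMP;
   BY PATNO;
RUN;

/* Make sure the list of duplicate IDs is also in PATNO order */
PROC SORT DATA=DUP_NO;
   BY PATNO;
RUN;

/* Keep every observation whose PATNO appears in DUP_NO */
DATA DUP;
   MERGE TMP DUP_NO(IN=YES_DUP DROP=COUNT);  /* 3 */
   BY PATNO;
   IF YES_DUP;  /* 4 */
RUN;

PROC PRINT DATA=DUP;
   TITLE "Listing of Data Set DUP";
RUN;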
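Because the OUT= data set holds a count for every ID, the same pattern can flag any ID whose number of observations differs from what you expect, not only duplicates. The following is a minimal sketch under stated assumptions: the data set VISITS and its values, the output data set names WRONG_N and NOT_N, and the macro variable N are all hypothetical, and each patient is assumed to be expected to have exactly two observations. Relative to Program 5-7, only the WHERE= condition changes, and COUNT is kept in the merge so the listing shows how many observations each flagged patient has.

```sas
/* Hypothetical visit-level data set (illustrative values only). */
/* Assumption: each patient should have exactly two observations. */
DATA VISITS;
   INPUT PATNO $ VISIT;
DATALINES;
001 1
001 2
002 1
003 1
003 2
003 3
;

%LET N = 2;  /* hypothetical expected number of observations per patient */

/* Count observations per PATNO; keep IDs whose count is not &N */
PROC FREQ DATA=VISITS NOPRINT;
   TABLES PATNO / OUT=WRONG_N(KEEP=PATNO COUNT
                              WHERE=(COUNT NE &N));
RUN;

/* Both data sets must be in PATNO order for the BY merge */
PROC SORT DATA=VISITS OUT=TMP;
   BY PATNO;
RUN;

PROC SORT DATA=WRONG_N;
   BY PATNO;
RUN;

/* Keep every observation whose PATNO has the wrong count;  */
/* COUNT is retained so the listing shows the actual count. */
DATA NOT_N;
   MERGE TMP WRONG_N(IN=IN_WRONG);
   BY PATNO;
   IF IN_WRONG;
RUN;

PROC PRINT DATA=NOT_N;
   TITLE "Patients Without Exactly &N Observations";
RUN;
```

With these hypothetical values, patient 002 (one observation) and patient 003 (three observations) are listed, four observations in all, while patient 001 is not.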