Date: Thu, 25 Jul 1996 17:20:18 -0600 (MDT)
From: Lipe Robert
To: doncram@gsb-ecu.stanford.edu
Subject: SAS bug

I ran across a copy of your paper on SAS installations. I just spent a very frustrating month trying to use CRSP's monthly master file. Without getting into gory details, the problem occurs because SAS DATASOURCE fails to reset the DIST vector when it goes to the next firm (next permno). Some firms have no distributions (e.g. dividends or stock splits). Thus if firm A has 5 dividends in its past and is followed by firm B, which has 0 distributions, SAS will produce a dataset where firm B has 5 distributions that are identical to firm A's distributions. This does NOT affect the returns file, but it definitely screws up the EVENTS dataset. Anyone doing research using CRSP distribution data will be using wrong data for roughly 10% of their firms!

You probably already know about this problem. SAS admits that it is a bug that will not be fixed until the next release. I developed a program that removes these duplicates. Unfortunately, it also removes a few valid observations for firms with Class A and Class B shares that have the exact same distributions.

You are welcome to use this information in any way that will save others the frustrations I have experienced in the last month.

Bob Lipe, Assoc Prof of Accounting

The program follows. Admittedly, parts of it are idiosyncratic to the specific firms I want to investigate. But it should provide a start:

OPTIONS /*OBS=10000*/ NONUMBER NOCENTER NODATE LS=78 ;
* READS MONTHLY MASTER, USES PROC MEANS TO LOOK FOR REPEATED
* DIST CODES AND DELETES THEM;
* ELIMINATES MOST ERRORS IN SAS's PROC DATASOURCE FOUND IN SUMMER 96;
* HOWEVER, FIRMS WITH SAME DISTRIBUTIONS FOR CLASS A & B COMMON ARE;
* STILL A PROBLEM;
LIBNAME CRSPMLY '/busgen/crsp95/monthly';
FILENAME DD1 'mmast0.dat';
FILENAME CK1 'mmerr1.dat';
FILENAME CK2 'mmerr2.dat';
FILENAME IN1 /*'cusip_date.dat'*/ 'mm0.del';

* READ MONTHLY MASTER FILE.
  MUST BE KEPT IN PERMNO ORDER AT THIS POINT!;
* SELECT MY FIRMS LATER;
DATA MM0 ;
  SET CRSPMLY.mevnt95 /*(firstobs=392000 obs=400000)*/;
  IF EVENT='DIST' THEN OUTPUT;
RUN;

PROC MEANS SUM DATA=MM0 NOPRINT;
  BY PERMNO;
  VAR DATE DIVAMT FACPR;
  OUTPUT OUT=MM1 sum=sumdt sumdiv sumfac;
RUN;

DATA MM2;
  MERGE MM1 MM0;
  BY PERMNO;
RUN;

PROC DATASETS;
  DELETE MM0 MM1;
RUN;

DATA MM3;
  RETAIN LC LDT LDIV LF DELE;
  SET MM2;
  IF _N_=1 THEN DO;
    LC=CUSIP; LDT=SUMDT; LF=SUMFAC; LDIV=SUMDIV; DELE=0;
  END;
  IF LC^=CUSIP THEN DO;
    IF (LDT=SUMDT AND LF=SUMFAC AND LDIV=SUMDIV) THEN DELE=1;
    ELSE DO;
      LC=CUSIP; LDT=SUMDT; LF=SUMFAC; LDIV=SUMDIV; DELE=0;
    END;
  END;
  IF DELE=0 THEN OUTPUT;
  ELSE DELETE;
  KEEP CUSIP PERMNO EVENT DATE COMNAM DISTCD DIVAMT FACPR DCLRDT PAYDT;
RUN;

PROC DATASETS;
  DELETE MM2;
RUN;

* PROC PRINT;
* var cusip permno event date distcd divamt facpr;
* FORMAT DATE YYMMDD7.;

PROC SORT DATA=MM3;
  BY CUSIP;

* READ INPUT FILE WITH CUSIPS AND DATES;
DATA MYFIRMS ;
  INFILE IN1 ;
  INPUT CUSIP $ 1-8 STRTDATE yymmdd7. ENDDATE yymmdd7.;
  STRTDATE=STRTDATE-10;
  ENDDATE=ENDDATE+5;
  SMPL=1;
run;

PROC SORT DATA=MYFIRMS;
  BY CUSIP;

DATA MM4;
  MERGE MYFIRMS MM3;
  BY CUSIP;

PROC DATASETS;
  DELETE MYFIRMS MM3;

data TEST;
  set MM4;
  IF (SMPL=0 or smpl=.) THEN DELETE;

PROC PRINT DATA=MM4(OBS=1000);
  VAR CUSIP DATE SMPL DISTCD FACPR;
  FORMAT DATE YYMMDD7.;

DATA MMAST;
  SET MM4;
  * delete if CRSP observation not within MYFIRMS;
  IF (SMPL=0 or smpl=.) THEN DELETE;
  IF STRTDATE>DATE OR ENDDATE5499 AND DISTCD<5600) OR DISTCD>6000 THEN D1=1;
  IF D1=1 THEN DELETE;
  * FORMAT DATE YYMMDD.;
  * PROC PRINT DATA=CHECK1;
  * VAR CUSIP DATE DISTCD DIVAMT FACPR ;
  FILE CK1;
  PUT @1 cusip $8. @10 DATE YYMMDD6. @17 DISTCD DIVAMT FACPR;

DATA CHECK2;
  SET MMAST;
  IF ^(FACPR=0 OR FACPR=.) and DISTCD<4999 OR DISTCD>6000 THEN D1=1;
  IF D1^=1 THEN DELETE;
  * FORMAT DATE YYMMDD.;
  * PROC PRINT DATA=CHECK1;
  * VAR CUSIP DATE DISTCD DIVAMT FACPR ;
  FILE CK2;
  PUT @1 cusip $8. @10 DATE YYMMDD6.
      @17 DISTCD DIVAMT FACPR;
RUN;

DATA OUT;
  SET MMAST;
  FILE DD1;
  PUT @1 cusip $8. @10 DATE yymmdd6. @17 DISTCD DIVAMT FACPR;
  *PROC PRINT DATA=OUT;
  *VAR cusip DATE RETD APRC CPA1 PRC DIVAMT FACPR ;
  *FORMAT DATE YYMMDD.;
RUN;
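
The clean-up idea in the program above (sum DATE, DIVAMT and FACPR per firm, then drop any firm whose sums match those of the last firm kept) can be tried on toy data without SAS. What follows is only a minimal Python sketch of that idea, not the author's code: the `Dist` record, the `signature` and `drop_carried_over` helpers, and every CUSIP, PERMNO, date and amount are hypothetical, and dates are plain integers rather than SAS date values. It assumes rows arrive in file order with each firm's rows adjacent, as in the PERMNO-ordered extract.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical stand-in for one DIST event row of the extract.
Dist = namedtuple("Dist", "permno cusip date divamt facpr")


def signature(rows):
    """Per-firm sums of DATE, DIVAMT and FACPR (the role PROC MEANS plays above)."""
    return (
        sum(r.date for r in rows),
        sum(r.divamt for r in rows),
        sum(r.facpr for r in rows),
    )


def drop_carried_over(rows):
    """Drop each firm whose signature equals that of the last firm kept.

    Rows must be in file order with each firm's rows adjacent.
    """
    kept, last_sig = [], None
    for _, grp in groupby(rows, key=lambda r: r.cusip):
        firm_rows = list(grp)
        sig = signature(firm_rows)
        if sig == last_sig:
            continue  # treated as a stale copy of the previous firm
        last_sig = sig
        kept.extend(firm_rows)
    return kept


# Toy extract (all values invented for illustration).
extract = [
    # Firm A: two real dividends.
    Dist(10001, "AAAAAAAA", 19950315, 0.25, 0.0),
    Dist(10001, "AAAAAAAA", 19950615, 0.25, 0.0),
    # Firm B: no real distributions; rows are firm A's, carried over.
    Dist(10002, "BBBBBBBB", 19950315, 0.25, 0.0),
    Dist(10002, "BBBBBBBB", 19950615, 0.25, 0.0),
    # Firm C: one real dividend of its own.
    Dist(10003, "CCCCCCCC", 19950401, 0.10, 0.0),
    # Firms D and E: Class A and Class B shares with identical real dividends.
    Dist(10004, "DDDDDDDD", 19950210, 0.50, 0.0),
    Dist(10004, "DDDDDDDD", 19950510, 0.50, 0.0),
    Dist(10005, "EEEEEEEE", 19950210, 0.50, 0.0),
    Dist(10005, "EEEEEEEE", 19950510, 0.50, 0.0),
]

cleaned = drop_carried_over(extract)
kept_permnos = sorted({r.permno for r in cleaned})
print(kept_permnos)  # [10001, 10003, 10004]
```

In this toy run firm B's carried-over rows are removed as intended, but firm E is removed as well, which is the Class A / Class B false positive mentioned in the message. Comparing sums rather than full rows keeps the check cheap, at the cost of that ambiguity.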