The SERVER address (the place you send subscribe and unsubscribe commands) is: listserv@listserv.tamu.edu
The LIST address (the place you send your messages) is: crsp-l@listserv.tamu.edu
You may contact the list owner, H. Alan Montgomery, at owner-crsp-l@listserv.tamu.edu if you have problems with the list.
A searchable, public archive of CRSP-L postings is open at http://listserv.tamu.edu/archives/crsp-l.html
Copyright 1999, Donald P. Cram. Permission to copy for individual reference use is granted provided that the document remains in its entirety including this copyright statement; publication or commercial use is not permitted. Please contact doncram@gsb.stanford.edu with comments, suggestions, other permission requests.
-.0001 means "missing value" and is translated to a period (.),
       the simplest SAS missing value character.
-.0004 occurs only on the Aggregate File, means "Combined with
       another figure", and becomes .C
-.0008 occurs only on the Aggregate File, means "Insignificantly
       nonzero and small", and becomes .I
-.0002 occurs only on the Quarterly File, indicates that the figure
       is semi-annual instead of quarterly, and becomes .S
-.0003 occurs only on the Quarterly File, indicates that the figure
       is annual instead of quarterly, and becomes .A
The codes to be replaced by PROC DATASOURCE when reading Compustat Annual,
IBM 360/370 Format files, however, are different:
.0001 becomes .
.0004 becomes .C
.0008 becomes .I
.0002 becomes .S
.0003 becomes .A
Having these replacements done is one way in which SAS conversion
programs can add value. User SAS programs will then show missing
values in calculations based upon missing values, as they should. But
it seems that the actual missing values in Compustat data can vary
across filetypes, so programmers of new SAS programs to read raw
Compustat data need to be careful.
(Summarized 9/97 by Don Cram from 12/96 discussion on CRSP-L)
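For sites that read raw Compustat files with their own DATA steps rather than PROC DATASOURCE, the recoding described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data set names work.rawcomp and work.compclean are hypothetical, the codes are the negative ones listed above for the Aggregate and Quarterly files, and the codes on your own file type may differ, so check the Compustat documentation first.

```sas
/* Sketch only: work.rawcomp is a hypothetical data set whose numeric   */
/* items still carry the raw Compustat missing value codes.             */
data work.compclean;
  set work.rawcomp;
  array items {*} _numeric_;          /* all numeric variables read so far */
  do _i = 1 to dim(items);
    if missing(items{_i}) then continue;   /* leave existing missings alone */
    /* compare with a small tolerance, since the codes are floating point */
    if      abs(items{_i} + 0.0001) < 1e-7 then items{_i} = .;
    else if abs(items{_i} + 0.0004) < 1e-7 then items{_i} = .C;
    else if abs(items{_i} + 0.0008) < 1e-7 then items{_i} = .I;
    else if abs(items{_i} + 0.0002) < 1e-7 then items{_i} = .S;
    else if abs(items{_i} + 0.0003) < 1e-7 then items{_i} = .A;
  end;
  drop _i;
run;
```

Because special missing values such as .C and .I are still missing values, any calculation based on a recoded item comes out missing, which is the behavior described above.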
Advertisers are welcome to submit an ad to the maintainer of this FAQ, for the ad to be included in a new advertisers section of this FAQ, at no charge. You will only be required to give notification when the ad is no longer relevant.
Please do NOT use your mailer's "enclosure" or "attachment" features in sending mail to CRSP-L. Even if what you enclose is an ASCII file, the enclosure link will not be ASCII for many readers and will make your file unreadable for some. Instead, embody the file directly into your email message. If you are in doubt whether you are doing it right, then DO NOT send it to the list. Send a test message somewhere else, say to this author, instead, and/or ask your local computer administrator for help.
Non-ASCII enclosures, such as mime-formatted Word files or uuencoded text, are especially unacceptable.
A: CRSP is the acronym for the Center for Research in Security Prices, but it is commonly used to refer to the database files that the center distributes. The CRSP securities files contain stock prices, trading volumes, dividends and returns for firms traded on U.S. exchanges (NYSE, AMEX, NASDAQ) only.
This author has developed a webpage "CRSP Data Access and Analysis by Don Cram" to address this question. The webpage includes a form to request his GSB Technical Report #80 on CRSP access and analysis using SAS.
Here's one which works on the Stanford GSB unix system, where the 1993 CRSP data for the NYSE exchange was installed in ASCII format. The program extracts 10 years of daily returns data for two firms, IBM and Disney:
/* simple.sas */
/* Calendar and data files of the ASCII CRSP daily installation */
filename NYSEDCAL '/crsp1/udx/CALENDAR.DAT';
filename NYSEDDAT '/crsp1/udx/DATA1.DAT';

/* Extract ten years of daily returns for IBM and Disney */
proc datasource filetype=crspdcs infile=(NYSEDCAL NYSEDDAT) out=IBMDIS;
  where cusip = '45920010' or cusip = '25468710';
  range from '01jan84'd to '31dec93'd;
  keep cusip date ret;
run;

proc print data=IBMDIS;
run;
So what bugs exist?
A: A database from the center which provides cumulative coverage from the earliest data available through the end of that year, i.e. to 12/31/95. Each CRSP installation completely replaces the previous installation.
A: It is believed that speed-of-access is faster for SAS programs drawing on CRSP data stored as an indexed permanent data set, as compared to SAS programs using SAS's PROC DATASOURCE from an ASCII or binary CRSP data original installation. [How much faster? It would be nice to get some measurements to report here.] There's a tradeoff between speed of access vs. diskspace used, however.
Diskspace used is larger, for two reasons. First, one may have to leave up the original CRSP data for FORTRAN users. However, Gerry Pauline commented that at Pace:
> We've kept the CRSP database as permanent online SAS datasets for
> at least 7 or 8 years now. In that time, we've only had a handful
> of users that wanted to use FORTRAN to process the data.
>
> These were accommodated in two ways: 1- the raw datafiles were made
> available on tapes, and 2- the SAS data was converted to raw data
> format.
>
> As more and more users became familiar with the SAS implementation
> and its advantages, requests for the raw datafiles have faded (I
> probably should not have said that!).

Second, the SAS permanent data set version uses far more disk space than an original CRSP file in binary format, and it even exceeds the size of an original CRSP file in ASCII format.
Note that the installation process itself requires even more diskspace temporarily, as SAS requires a large workspace for the indexing process.
I have taken some perhaps useful measurements of filesizes on the GSB-SunOS unix system. It would be interesting to see speed-of-access compared to size measurements. Here are the size measurements, anyhow, which I measured on the relatively small 1993 CRSP monthly dataset:
Installed ASCII files (which Fortran or SAS can access, slowly)
NYSE monthly CRSP data:
/crsp1/umx/CALENDAR.DAT:    107027 Bytes
           DATA1.DAT:    107270307 Bytes
total 107 meg

Note that CRSP ASCII installations require about 3 times as much disk space as the binary installation. We can see this in the CRSP Data Specifications sheet, reproduced on page 48 of GSB Technical Report 80.
NYSE monthly CRSP data, installed as SAS data set, omitting events database
(dataset indexed as SAS/ETS manual recommends, by six fields)
-rw------- 1 doncram  54444032 Mar 24 17:28 nysemin3.snx01
-rw------- 1 doncram  80429056 Mar 24 17:28 nysemin3.ssd01
total 135 meg

NYSE monthly CRSP data installed as SAS data set, including events database
(dataset indexed as SAS/ETS manual recommends, by six fields)
-rw------- 1 doncram 112173056 Mar 25 23:29 nysemev4.ssd01
-rw------- 1 doncram  54444032 Mar 25 23:29 nysemin4.snx01
-rw------- 1 doncram  80429056 Mar 25 23:29 nysemin4.ssd01
total 247 meg

Of course, you might choose to install just a portion of the CRSP data, say the returns data which is the most frequently utilized:
Selected NYSE monthly CRSP data, in indexed SAS permanent data sets:
just the cusip & date & ret fields (indexed by cusip alone)
-rwxr--r-- 1 qmao   9650176 Mar 24 15:20 nysemind.snx01
-rwxr--r-- 1 qmao  17309696 Mar 24 15:20 nysemind.ssd01
total 27 meg
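As a rough illustration of how such a reduced, indexed permanent data set might be built, here is a minimal sketch. It is not the procedure used for the measurements above: the libname path and the input data set work.nysem (assumed to hold already-extracted cusip, date and ret fields) are hypothetical, and the index is created on cusip alone with PROC DATASETS.

```sas
/* Sketch only: path and data set names are hypothetical.               */
libname crsp '/hypothetical/sasdata';   /* where permanent data sets go  */

/* Put the extract in key order before indexing it */
proc sort data=work.nysem out=crsp.nysemind;
  by cusip date;
run;

/* Create a simple index on cusip alone */
proc datasets library=crsp nolist;
  modify nysemind;
  index create cusip;
quit;

/* A WHERE clause on the indexed variable can then use the index */
proc print data=crsp.nysemind;
  where cusip = '45920010';
run;
```

Indexing on one field rather than six keeps the index file smaller, at the cost of speeding up only those retrievals that select on cusip.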
UPDATE: This has indeed been fixed in 6.12, as verified by a user.
SAS PROC DATASOURCE enables one to do this directly for the CRSP daily and monthly security files. See discussion of the INDEX option using PROC DATASOURCE and other approaches to index on fewer fields in my GSB Technical Report 80. In response, Gerry Pauline commented on the CRSP-L that:
> Don is right in that the main tradeoff is space for speed. Our CRSP
> database (NYSE/Amex, Nasdaq daily files) and their index files
> occupy approximately 2100 cylinders on IBM 3380 DASD. The return is
> execution speed, jobs generally execute in 60 to 90 seconds,
> depending on what the user is doing.
>
> SAS Index files, as Don notes, can get very large (and this database
> needs to be indexed). One way to minimize the size of the database
> is to make sure that the data are ordered by the key fields BEFORE
> the index is created.
>
> Depending on how the table files are created, this may be as easy as
> specifying the SORTEDBY= dataset option. As I mentioned in a previous
> note, our CRSP database methodology requires the datasets to be
> in CUSIP order, which they no longer are (at least '94 wasn't).

The CRSP bond file is not accessible by PROC DATASOURCE and no programs to read this file into SAS have been shared (at least prior to 9/97).
Various CRSP files
Gerry Pauline at Pace University has written a series of SAS programs which will read the indices and other CRSP files (but not the bond files). They are available by anonymous ftp at ftp.pace.edu (by ftp client only, i.e. _not_ accessible by web browsers, as some ftp sites are, using URLs like "ftp://..."). The Pace University programs are written for IBM CMS and would have to be adapted for use on other operating systems. Also, as Gerry stated on the CRSP-L:
> Please note that a change will have to be made to the program since
> it expects the data to be CUSIP number order. The 1994 tape files
> are no longer in CUSIP order (we sorted it with Syncsort before
> converting the data. If you can't sort the raw datafile outside
> of SAS, the required change to the code is minor, one line at
> three different locations.)

He added, in email to me:
I do want to make an update to the README file. I wrote a little Syncsort procedure that will sort the raw CRSP datafiles into CUSIP number order. This of course, is of use only to users that run under VM/CMS and have Syncsort (the sort code is similar for IBM's DFSORT utility).

and then followed up with (6/20):
> I've made some minor updates to the CRSP readme, noting the CRSPSORT
> procedure to sort the database into CUSIP order, and what to do if you
> can't, or don't want to, put the database into CUSIP order.

We're lucky to have such a generous contributor participating on the CRSP-L!
I believe (subject of ongoing discussion on the CRSP-L) that the CRSP Indices file (the IN product) cannot be read by SAS PROC DATASOURCE, but our list-owner H. Alan Montgomery has provided the following five SAS programs that should convert the IN data into SAS datasets: chcap.sas, chdec.sas, chreb.sas, chsbbi.sas, and chsp.sas. Alan provides little upfront explanation, but this may be enough for a SAS user who has the CRSP Indices manual. (Help me! by submitting an expanded explanation or sharing your full program to apply the installation and/or to access the resulting files.) David Louton provides an independent, alternative implementation of a SAS program to convert the CRSP Indices files: Louton's program to read CRSP indices into SAS.
This author had looked, but could find no programs which run an event study published anywhere, besides my own paper "CRSP Data Retrieval and Analysis in SAS Software: Sample Programs and Programming Tips", published in the non-peer-reviewed SAS Users Group International (SUGI) 21 Conference Proceedings, in 1996. On the CRSP-L on 6/17/96, James Conover offered, however:
There are several published event studies that report the code used. Generally, they are in the research annuals that report longer papers. Most journals don't publish the code because it appears trivial once data issues are resolved. One recent pedagogical paper that comes to mind is: John J. Glascock and Imre Karafiath, "Statistical Inference in Event Studies Using Multiple Regression" published in Research Issues in Real Estate: Alternative Ideas In Real Estate Investment, edited by Arthur L. Schwartz, Jr. and Steven D. Kapplin, published by Kluwer Academic Publishers in 1995. This paper gives a 33-line SAS program using SAS MACRO's to calculate the Brown and Warner (1985, JFE) event study statistics. This program assumes that you have yanked the returns from CRSP already, modified for missing data, etc, and put them in flat files. Dr. Karafiath is our department head here at the University of North Texas (he can be reached with electronic mail at karafiat@cobaf.unt.edu and will be joining this list soon) if you have questions about the code.

Other references would be appreciated.
This author knows of no event study programs posted on the internet before 6/15/96, besides my own program cars.sas, the fourth sample application described in the GSB Technical Report 80, which has been available at a web address given in that report.
This "cars.sas" program yields Cumulative Abnormal Return (CAR) time series over event periods based on a market model, with beta for each firm calculated over a period preceding the event period. As described in the GSB report, it is easily adapted to calculate an Abnormal Price Index (as in the Ball and Brown study published in Journal Accounting Research in 1968). Returns values which are missing on the CRSP database during the beta calculation period are ignored. Returns values missing during the event period will yield missing CAR thereafter.
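The market model CAR calculation described above can be sketched roughly as follows. This is not the cars.sas program itself, only a minimal illustration under stated assumptions: the data sets work.estim (pre-event estimation period) and work.event (event period) are hypothetical, each holds cusip, date, ret and a market return mktret, and both are sorted by cusip and date.

```sas
/* Sketch only: data set and variable names are hypothetical.           */

/* Market model per firm over the estimation period; PROC REG drops     */
/* observations with missing returns, so they are effectively ignored.  */
proc reg data=work.estim outest=work.betas(keep=cusip intercept mktret)
         noprint;
  by cusip;
  model ret = mktret;
run;
quit;

/* Abnormal return and running CAR over the event period */
data work.cars;
  merge work.event(in=inevent)
        work.betas(rename=(intercept=alpha mktret=beta));
  by cusip;
  if inevent;
  retain car;
  if first.cusip then car = 0;
  ar  = ret - (alpha + beta * mktret);   /* abnormal return              */
  car = car + ar;                        /* missing ar propagates onward */
run;
```

Writing the accumulation as car = car + ar, rather than with a SAS sum statement, is deliberate: a missing event-period return then makes CAR missing from that date on, matching the behavior described above.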
Kevin Harper, at University of Washington, provides example SAS programs, including an event study program that calculates market model abnormal returns. I am hopeful that he will share the code for the SAS macros which those programs call. The date that he provided the program on the web may be as early as 11/30/94 (a "last update" date in his xamples.html page).
On 6/15/96, I added a FORTRAN program EVENT.FOR here. The latter operates on an input file EVENT.INP, calls a subroutine which I believe is part of the standard CRSP Inc. installation, and ran on Stanford GSB's now-almost-fully-retired VMS system. It was authored, I believe, by Stephen Gray when he was a finance doctoral student at Stanford GSB; he provided it in a "Routines" sample programs binder to fellow students.
The EVENT.FOR program also yields as output a CAR time series. It easily permits calculation of beta over data from both before and after the event period.
And now (6/16/96) I've also just added Terrance Jalbert's beta calculation program, in SAS, which was mentioned on CRSP-L. Terrance points out this program may be very specific to his situation. It appears to me to draw on CRSP data installed by a FORTRAN program (as it addresses the CRSP file missing value codes -66, -77, etc.) rather than by SAS PROC DATASOURCE (which recodes missing values as .P, .T etc.; see p. 19 of my GSB Technical Report 80). It implements a beta calculation formula from the CRSP manual, a formula which Terrance reports was listed incorrectly in many copies of the CRSP stock file guide dated December 31, 1993, at least.
And now (6/16/96), I've also added Mark Trombley's FORTRAN event study program. Its header reports:
THIS PROGRAM CONDUCTS A TYPICAL EVENT STUDY USING DAILY RETURNS FROM CRSP TAPE. THIS PROGRAM
(1) ESTIMATES MARKET MODEL USING OLS OR THE SCHOLES AND WILLIAMS [1977] METHOD;
(2) CALCULATES 3 DIFFERENT ABNORMAL RETURNS METRICS: (A) UNSTANDARDIZED AR, (B) STANDARDIZED AR, AND (C) U-STATISTIC.
THE ALGORITHM CLOSELY FOLLOWS PATELL [1976; JAR].

It draws on the CRSP data files directly, uses subroutines crsp.include and include.crsp (are these programs in the standard CRSP Inc. installation?) and permits use of either the Value-weighted or Equally-weighted market returns indices. Mark comments:
This program was originally written by Kyung Lee to run on VM, patched by me to run on unix. However, the code is just f77, so it should compile anywhere by just changing the file definitions in the OPEN statements.
And now on 6/17/96 I am adding an event study SAS program contributed by Gerry Pauline, together with a "REXX" shell program and instructions in five text notes: overview, rexx, format setup, SAS program, and data input sample.
I summarized a discussion on CRSP-L about compounding returns, then Spencer Martin, at Wharton, provided SAS code to compound returns. Here are some incomplete further thoughts on calculations using prices, dividends, splits.
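For readers who want a starting point, compounding simple returns over a holding period can be sketched as below. This is not Spencer Martin's code, only a minimal illustration under stated assumptions: the input data set work.rets is hypothetical, holds cusip, date and ret, and is sorted by cusip and date.

```sas
/* Sketch only: work.rets is a hypothetical extract of CRSP returns.    */
data work.compound;
  set work.rets;
  by cusip;
  retain growth;
  if first.cusip then growth = 1;
  growth = growth * (1 + ret);     /* value of $1 invested at the start  */
  if last.cusip then do;
    cumret = growth - 1;           /* compounded return over the period  */
    output;
  end;
  keep cusip cumret;
run;
```

Note that a single missing return makes the compounded return missing for that firm; whether that is the right treatment depends on the application.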
Independent evaluations of these programs, and contributions of other programs would be greatly appreciated! Let me be the first to say that my own program would be rated as somewhat inefficient in terms of runtime by SAS experts, due to its use of repeated data steps in SAS macro program loops. On the other hand, it is straightforward and understandable, and therefore economizes on one's programming time in modification and use. Alternate programming strategies in SAS might yield considerably faster runtimes; comparisons would be welcomed.
Here's one which works at Stanford GSB. It extracts data for all firms on the Compustat "research tape" and calculates market to book ratios.
/* Sample Compustat access program written by Seth Weingram */
libname sew '/d1/sew';
options nocenter replace linesize=80;
filename compstat '/d2/compustat/research/annual';

/* Read the annual research file into a permanent SAS data set */
proc datasource filetype = csauc infile=compstat
                out=sew.resbooks;
  keep cnum year name data199 data60 data25;
  rename data199=mkt data60=comeq data25=shout;
  label mkt="market value";
run;

proc contents data=sew.resbooks;
run;

/* Write market-to-book ratios to a flat file */
data _NULL_;
  set sew.resbooks;
  file '/d1/sew/resbooks';
  mtob = mkt/(comeq/shout);
  keep cic cnum date mkt comeq shout mtob;
  put @1 cnum @7 cic @11 date mkt 14.5 +1
      comeq 14.5 +1 shout 7.3 mtob 10.6;
run;
endsas;
SAS PROC DATASOURCE enables one to do this directly for the COMPUSTAT annual industrial files only, not the bank file (noted by Joe Kelley on the CRSP-L). Gerry Pauline offered:
I have code that will convert the 94 Compustat files to SAS datasets on CMS. It's basically a set of Data Step routines in a macro that will handle the Industrial Annual, Industrial Quarterly, Research, Over-The-Counter, Bank Annual and Quarterly (we run SAS v6.08 TS430 on VM/ESA).

As of June 17, 1996, Gerry has made this code available by anonymous ftp at ftp.pace.edu in the COMPUSTAT directory (currently by ftp client only, i.e. not accessible by webbrowsers as some ftp sites are accessible by URLs like "ftp://..."). The Pace University programs are written for the IBM CMS operating system and would have to be adapted for use on other operating systems.
In response to interest from CRSP-L readers, Gerry Pauline has made available SAS code to convert, into SAS datasets, the IMF's International Financial Statistics file (in the IFS directory of the FTP.PACE.EDU server), the I/B/E/S files (in the IBES directory) and the prices file from the Toronto Stock Exchange (in the TORONTO directory).
Gerry Pauline offers Pace University's ACCESS (code to provide a "common" interface to the online research databases used at Pace).
Citibase offers a commercial product, used by many investment banks, which handles the entire process of updating and serving financial databases.
Webpage opened 6/6/96.