Stanford University GSB financial research databases

(captured from http://www.stanford.edu/class/gsb/crsp/index.html and modified to update webpage URLs only, doncram, 12/99) This webpage provides information for local use of the CRSP, Compustat, and other financial databases by Stanford University students, staff, and faculty.

Financial data files at Stanford University

The financial data files at Stanford University are located within /afs/ir/data/gsb.

        What changed in the 1995 files?
                Compustat:
                        we now have "complete" rather than "limited" files
                        Compustat added 20 variables to the OUT= dataset of SAS PROC DATASOURCE, and some to the OUTBY= dataset
                                (which variables these are remains to be determined)

                CRSP:
                        we now have combined NYSE/AMEX/NASDAQ files rather than
                                separate NYSE/AMEX and NASDAQ files
                        1995 CRSP data is not readable from SAS 6.09

Most users will find SAS programs the most convenient way to access the data, although Fortran still has some advantages.


CRSP 1995 data

The CRSP data files are described in the 1995 versions of the CRSP Stock File Guide and the CRSP Bond File Guide, on reserve at the Jackson Library circulation desk. To work with CRSP data, you really have to use this documentation; otherwise you won't know how to interpret seemingly bizarre conventions, such as stock prices sometimes being reported as negative. (A negative price indicates that the stock was not traded on that day and that the reported figure is the average of its bid and ask prices.) You may download your own copy of the 1994 versions, in Acrobat format, from the documentation section of Kevin Harper's University of Washington webpage.
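The negative-price convention is easy to mishandle in downstream code, since averaging or compounding raw prices silently mixes signs. Below is a minimal Python sketch of one way to handle it, assuming prices arrive as plain floats; the names CleanPrice and clean_crsp_price are hypothetical, and the CRSP guides remain the authority on the convention.

```python
from typing import NamedTuple


class CleanPrice(NamedTuple):
    price: float       # magnitude of the reported price, never negative
    is_bid_ask: bool   # True when the raw value was negative (no trade that day)


def clean_crsp_price(raw: float) -> CleanPrice:
    """Separate a raw CRSP-style price into its magnitude and a bid/ask flag.

    A negative price means the stock did not trade and the figure is the
    bid/ask average, so the sign is treated as a flag, not part of the value.
    Zero is passed through unflagged; how to treat it is left to the caller
    (an assumption of this sketch).
    """
    return CleanPrice(price=abs(raw), is_bid_ask=raw < 0)


print(clean_crsp_price(-12.5))  # CleanPrice(price=12.5, is_bid_ask=True)
```

Keeping the flag, rather than just taking the absolute value, lets you later drop or separately analyze the days on which the stock did not trade.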

[NOTICE 8/98: THE CONVENIENT WEB LOOKUP OF PERMNOS HAS BEEN REMOVED DUE TO AN OFFICIAL CRSP REQUEST. Although CRSP data obviously become more valuable when they are easily matched to other data sources, the Center for Research in Security Prices apparently seeks to protect a copyright(?) or something on the PERMNOs that is endangered by open display of the permno-to-cusip correspondence, etc. Apparently it would be okay to have this feature on a password-protected webpage. doncram 8/98] Look up CUSIP or PERMNO from 1995 CRSP stock data arranged [DISABLED]:

STANFORD COMPUTER USERS MAY USE EMACS OR ANOTHER EDITOR ON THE LELAND UNIX SYSTEMS TO SEE THE TEXT DATA THAT WAS PREVIOUSLY DISPLAYED HERE. cd to /afs/ir/class/gsb/WWW/crsp/Lookup and open the files there in emacs.

Compustat data

The full 1995 Compustat subscription data is now available to all Stanford researchers on leland, currently in the /afs/ir/data/gsb/compustat directories. The 1995 Compustat data files include:
    Name                              Date range    Size (bytes)   Description
    prim-supp-tert.annual             1976-1995      274,322,360   Industrials: Primary/Supplementary/Tertiary
    full_coverage.annual              1976-1995      378,384,864   Industrials: Full Coverage
    merged_ind.annual                 1976-1995      549,311,360   Industrials: Research (firms since merged, bankrupt, etc.)
    back_data_1958-1977_fc.annual     1958-1977       57,597,696   Industrials: (Back) Full Coverage
    back_data_1958-1977_pst.annual    1958-1977       91,796,328   Industrials: (Back) Primary/Supplementary/Tertiary
    canadian_us_industrial.annual     1976-1995       37,398,504   Industrials: Canadian firms (in U.S. dollars)
    aggregate.annual                  1976-1995       20,932,496   Aggregate
    bank.annual                       1976-1995       29,917,780   Banks
    business_info-industry.annual     1976-1995       49,671,732   Business Information: Industry (?)
    prim-supp-ter.48qtr               1984?-1995     440,638,801   Industrials: Primary/Supplementary/Tertiary, quarterly
    full_coverage_industrial.48qtr    1984?-1995     606,318,956   Industrials: Full Coverage, quarterly
    aggregate.48qtr                   1984?-1995      32,617,635   Aggregate, quarterly (?)
    bank.48qtr                        1984?-1995      52,293,426   Banks, quarterly
The file merged_ind.annual was formerly merged_industrial.annual, but was renamed on 1/28/97 to get around a Fortran filename length problem.

Apparently, prim-supp-tert.annual is the complete industrial annual file for NYSE/AMEX companies, replacing the more limited version GSB purchased in 1994. The full_coverage.annual file covers industrials listed on NASDAQ and the regional exchanges, among others. The merged_ind.annual file (formerly merged_industrial.annual) is the corresponding "Industrial Research" file described in the Compustat manual (Chapter 1, page 1, dated 6/94, v25). That page of the manual is unchanged from the 1994 to the 1995 version, although other pages were added or replaced by new inserts (e.g. Chapter 5, Data Definitions, page 257, dated 3/95, v25). The Research file contains companies that have been deleted from the Primary/Supplementary/Tertiary and Full Coverage files because of bankruptcy, merger, etc.

The canadian_us_industrial file is apparently the U.S.-dollar-denominated version of Compustat's Canadian file, which covers major Canadian firms.

The aggregate file has information summed for 275 industry groupings, including the S&P 500 and other S&P and Dow Jones indices.

The bank and business information files are not readable by SAS PROC DATASOURCE.

The 1994 limited subscription data remains available for GSB researchers on gsb-crown and gsb-birr.

In addition, Jackson Library has the Compustat Compact Disclosure data on CD-ROM. Its friendly, menu-driven interface has served many researchers, as well as general business students, well.

See Ellen Engel's handout "Collecting data from the Compustat PC Plus CD-ROM database", which she wrote for a presentation to accounting students given 2/11/94. She also provided sample programs for accessing the Compustat tape data on Lira. The CD-ROM version is available through Jackson Library via the MBA network. She recommends using the CD-ROM version for screening companies of interest (say, all companies that had net losses for three years running during 1980-1992). For Bank data, and for downloading complex or large sets of data from the Industrial database, you need Fortran programs. Ellen's sample programs, in Fortran, were available on Lira in:

	userdisks:[resources.routines]compustat.demo
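The kind of screen Ellen describes (all companies with net losses three years running during 1980-1992) is done through the CD-ROM's menus, but the logic is simple enough to sketch for data you have already extracted. A minimal Python sketch, assuming net income has been pulled into a dictionary keyed by company and fiscal year; the data layout and the function name are hypothetical.

```python
from typing import Dict, List


def firms_with_consecutive_losses(
    net_income: Dict[str, Dict[int, float]],  # company -> {fiscal year: net income}
    run_length: int = 3,
    first_year: int = 1980,
    last_year: int = 1992,
) -> List[str]:
    """Return companies with at least `run_length` consecutive loss years.

    Only years first_year..last_year are examined. A year with no data
    breaks the run, as does a year with zero or positive net income.
    """
    hits = []
    for firm, by_year in net_income.items():
        run = 0
        for year in range(first_year, last_year + 1):
            ni = by_year.get(year)
            if ni is not None and ni < 0:
                run += 1
                if run >= run_length:
                    hits.append(firm)
                    break  # one qualifying run is enough
            else:
                run = 0
    return sorted(hits)
```

Treating a missing year as breaking the run is a deliberate, conservative choice; a screen that skips over missing years would need a different rule.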
Peter Joos suggests that "formatted" output is preferable to "columnar" output when taking data from the Compustat CD-ROM, because formatted output is easy to read into SAS, say. It lets you take large sets of data efficiently, while columnar output is limited to 20 columns per data set taken.

The format he recommends is like the following:

		conm y1 sal at pccf ...
		conm y2 sal at pccf ...
		conm y3 sal at pccf ...
where you can collect as many variables as you want (going to the right) and as many years of data as you want (going down). Reading this into SAS is then straightforward, as all the data for one company comes together. Because each output line may be very long, while SAS's default record length is often only 256 characters, you may need the LRECL= (logical record length) option of the INFILE statement, as in the SAS code below, which raises the limit to 330 characters:
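	data compdat;                          /* output dataset name is illustrative */
	  infile filename lrecl=330 missover;  /* filename = fileref of the CD-ROM output file */
	  /* note: "$" list input keeps at most 8 characters of the name, up to its first blank */
	  input conm91 $ @34 sal91 at91 pccf91 ...;   /* 1st line: 1991 data */
	  input conm92 $ @34 sal92 at92 pccf92 ...;   /* 2nd line: 1992 data */
	  company = conm92;                    /* keep one copy of the company name */
	  drop conm91 conm92;
	run;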
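The MISSOVER option tells SAS what to do when a line ends before every variable in the INPUT statement has been given a value: the remaining variables are set to missing. Without it, SAS by default goes on to the next line to look for the missing values, which throws the rest of the input out of alignment. When some companies have missing trailing items, MISSOVER is what you typically want.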
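For readers working outside SAS, the same reshaping (several formatted lines per company into one wide record, with MISSOVER-style handling of short lines) can be sketched in Python. This assumes, as the SAS example's @34 pointer suggests, that the company name fills columns 1-33 and the numeric items start in column 34; the column width, function names, and handling of "." as missing are assumptions, not the CD-ROM's documented layout.

```python
from typing import Dict, Iterable, List, Optional

NAME_WIDTH = 33  # company name in columns 1-33, as implied by @34 in the SAS code


def parse_line(line: str, variables: List[str]) -> Dict[str, object]:
    """Split one formatted line into the company name and numeric values.

    As with SAS's MISSOVER, variables with no value left on a short line are
    set to missing (None) rather than read from the next line.
    """
    record: Dict[str, object] = {"conm": line[:NAME_WIDTH].strip()}
    fields = line[NAME_WIDTH:].split()
    for i, var in enumerate(variables):
        value: Optional[float] = None
        if i < len(fields):
            try:
                value = float(fields[i])
            except ValueError:
                value = None  # non-numeric token such as "." treated as missing
        record[var] = value
    return record


def read_formatted(lines: Iterable[str], variables: List[str],
                   years: List[int]) -> List[Dict[str, object]]:
    """Turn len(years) consecutive lines per company into one wide record.

    Output keys follow the SAS example's naming, e.g. sal91, at92.
    A trailing incomplete group of lines is ignored.
    """
    records: List[Dict[str, object]] = []
    group: List[str] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        group.append(line.rstrip("\n"))
        if len(group) == len(years):
            wide: Dict[str, object] = {}
            for year, text in zip(years, group):
                parsed = parse_line(text, variables)
                wide["company"] = parsed["conm"]  # last line's name wins, like company = conm92
                for var in variables:
                    wide[f"{var}{year % 100:02d}"] = parsed[var]
            records.append(wide)
            group = []
    return records
```

Passing an open file object as `lines` processes the output one line at a time, so very long extracts need not fit in memory.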


I/B/E/S and Zacks

The I/B/E/S analyst forecast database comes with extra copyright restrictions that limit access, perhaps because of the confidential nature of information about specific persons' forecasting ability and errors(?). Permission to access the I/B/E/S analyst forecast data can be provided by Paul Reist, Research Librarian, of the GSB's Jackson Library. [5-2003; sreist@gsb.stanford.edu] Access is usually by Fortran programs. The new (as of May 1997) IBES data files are ASCII files stored in access-restricted AFS directories:
/afs/ir/data/gsb/ibes/DETAIL
/afs/ir/data/gsb/ibes/SUMMARY

A current printed version of the IBES monthly summary information, such as the mean, median, low, and high analyst forecasts of each firm's earnings for the next two fiscal years, is also available (the following catalog record is from Socrates):

I/B/E/S MONTHLY SUMMARY DATA (New York, N.Y. : Lynch, Jones & Ryan)
       LOCATION: Jackson Business Periodicals (Library has NOV. 15, 1984- 
                   (INCOMPL.) (LATEST ISSUE: REFERENCE TABLE A-1; PREVIOUS 11 
                   YEARS: PERIODICAL STACKS; PRIOR YEARS IN THIRD FLOOR 
                   STACKS))
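The summary figures (mean, median, low, high) are simple statistics over individual analysts' forecasts of the kind found in the DETAIL files. A minimal Python sketch, assuming the forecasts for one firm and one fiscal year have already been extracted as (analyst, EPS) pairs; the function name is hypothetical, and whatever rules I/B/E/S itself uses to decide which forecasts enter its summary are not reproduced here. This sketch simply keeps each analyst's latest forecast.

```python
from statistics import mean, median
from typing import Dict, Iterable, Tuple


def consensus(forecasts: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Summarize analysts' EPS forecasts for one firm and one fiscal year.

    If an analyst appears more than once, only that analyst's last forecast
    in the input order is kept (an assumption of this sketch).
    """
    latest: Dict[str, float] = {}
    for analyst, eps in forecasts:
        latest[analyst] = eps
    if not latest:
        raise ValueError("no forecasts to summarize")
    values = list(latest.values())
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "low": min(values),
        "high": max(values),
    }
```

Comparing such a recomputed summary against the published SUMMARY file for a few firms is one way to check an extraction program.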

There is an equivalent(?) earnings forecast database called Zacks, which Stanford apparently(?) receives only in printed form; see the

ZACKS EARNINGS FORECASTER (Chicago, Ill. : Zacks Investment Research, 
       1987-). Apr. 17, 1987-
       LOCATION: Jackson Business Periodicals (Library has Apr. 17, 1987- 
                   (PERIODICAL STACKS; LATEST ISSUE: REFERENCE TABLE A-1))

Tips on dealing with huge files

Accessing these data files differs from other programming tasks because the files are very large. The CRSP files, for example, are easily the largest files on any computer at Stanford. Programs therefore need to be crafted more carefully; otherwise one runs into serious memory-management and run-time problems.
  • 1. Test new programs on small datasets.
  • 2. In SAS, use:
          options obs=100;
          options nonotes;
          qprint, qprintn, qaprint for small printouts
  • 3. Use tested library programs: in SAS, Don's SAS macros; in Fortran, ?? perhaps John Dai's programs ??
  • 4. Use the fastest computers available, e.g. gsb-kwanza.
  • 5. Get access to large workspace directories, e.g. /afs/ir/data/gsb/work.
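Tips 1 and 2 come down to two habits: read the data as a stream rather than loading whole files into memory, and cap the number of records while debugging (the role of OPTIONS OBS=100 in SAS). A minimal Python sketch of the same idea; the function names are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def first_n(records: Iterable[T], obs: Optional[int]) -> Iterator[T]:
    """Yield at most `obs` records, or all of them when obs is None."""
    if obs is None:
        return iter(records)
    return islice(records, obs)


def count_matching(lines: Iterable[str], keep: Callable[[str], bool],
                   obs: Optional[int] = 100) -> int:
    """Count lines satisfying `keep`, processing one line at a time.

    The loop holds only the current line, so with an open file object
    memory use does not grow with file size. The default obs=100 plays
    the role of SAS's OPTIONS OBS=100 during testing.
    """
    n = 0
    for line in first_n(lines, obs):
        if keep(line):
            n += 1
    return n
```

Once a program gives sensible results on the first 100 lines of an open file, call it again with obs=None to run over the whole file.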

    SAS access

    Notes on SAS programs to access the 1995 Compustat data

  • Note on SAS 6.09 on crown/birr vs. SAS 6.11 on leland. SAS 6.11 is required for direct SAS PROC DATASOURCE access to the 1995 CRSP data. SAS 6.11 is now the default on the leland machines but is not available on the old gsb-unix machines (crown, birr, etc.) and will not be installed there.

    Use of the local SAS macro library on gsb-unix machines such as gsb-crown and gsb-birr is not fully supported, because SAS 6.11 is not available there and several of the macros draw upon reference datasets created with SAS 6.11. Other macros will run fine, for example the regression and logistic-regression macros. Most CRSP and Compustat datasets have been converted from 6.11 format to 6.09 format so that many macros in the suite can run on gsb-unix, but this is not guaranteed. In particular, if you encounter a message such as "ERROR: READ lock is not available for CRSP3.MRET.DATA, lock held by unknown process", the likely cause is that the data set in question (/afs/ir/data/gsb/CRSP/SAS/mret.sas in this case) is in SAS 6.09 format.

    Sample direct access programs are given in Don Cram's "SAS Macros" document.

    BUGS that new users need to know about:

    Notes on SAS programs to access the 1995 CRSP data

    BUGS that new users need to know about:

    I, the webpage author, have been using the '95 CRSP data in RA work and have written a suite of SAS macros that make it easy. These now run much faster than the programs in my Technical Report 80. I believe these macros may now be effectively as fast as Fortran programs can be, and they should be far easier to use.

    For example, the macro call
            %crspmer(indata,drets,begdate,enddate,outdata); 
    
    in your program will merge in CRSP returns and corresponding market returns for each CUSIP in your dataset INDATA, for event periods from each event's BEGDATE to its ENDDATE, drawing returns data from a SAS dataset DRETS. Other macros can be used to compound or to sum the returns and corresponding market returns in OUTDATA, to set the begdate to enddate windows around event dates, to look up the next trading day from a given date, to set all the filenames and libnames for accessing CRSP and Compustat data, etc.
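The macros' source is not reproduced here, but two of the steps just described, looking up the next trading day from a given date and compounding stock and market returns over an event window, can be sketched in Python. This is an illustration only, not the %crspmer implementation; the data layout (a sorted list of trading dates and a date-keyed dictionary of stock and market returns) and the function names are assumptions.

```python
from bisect import bisect_left
from datetime import date
from typing import Dict, Sequence, Tuple


def next_trading_day(calendar: Sequence[date], d: date) -> date:
    """First trading day on or after d; `calendar` must be sorted.

    Use bisect_right instead if "next" should mean strictly after d.
    """
    i = bisect_left(calendar, d)
    if i == len(calendar):
        raise ValueError(f"{d} is after the last date in the calendar")
    return calendar[i]


def window_returns(
    daily: Dict[date, Tuple[float, float]],  # date -> (stock return, market return)
    begdate: date,
    enddate: date,
) -> Tuple[float, float]:
    """Compound stock and market returns over begdate..enddate, inclusive.

    Compounded return = (1 + r1)(1 + r2)...(1 + rn) - 1.
    """
    stock, market = 1.0, 1.0
    for d in sorted(daily):
        if begdate <= d <= enddate:
            r, m = daily[d]
            stock *= 1.0 + r
            market *= 1.0 + m
    return stock - 1.0, market - 1.0
```

Summing instead of compounding would just replace the two products with running sums; which is appropriate depends on the study design.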

    Please see my draft description of the use of these macros (via netscape at http://www.stanford.edu/~doncram/sasacctg.ps which Stanford users may also print by "lpr /afs/ir/users/d/o/doncram/WWW/sasacctg.ps"). I would be happy to receive any comments. Let me know if/when you want to try them, and I will give you access to the macro code and any necessary update information.

    The CRSP-related macros rely upon SAS dataset extracts which GSB Computing's Robert Booth and Jill Fukuhara have been installing in the /afs/ir/data/gsb directories. While the installation is not fully verified, I have been using several of the files successfully and it is not too soon for you to check out how the suite of macros can be used in your work.

    Fortran access to the Compustat and CRSP data

    Fortran reads the data faster than SAS, although using Fortran may cost more of your programming time, both in Fortran itself and in the SAS or other programs needed afterward to read the Fortran output. Sample Fortran programs contributed by users may be downloaded with your web browser; local users may also copy them directly from their source directory, /afs/ir/class/gsb/WWW/crsp/Fortran/. Because the format of the 1995 data is unchanged and is the same across most Compustat files (i.e. all the industrial tapes), I think these programs should work after simply changing the filename to the appropriate 1995 filename, as tabulated in my SAS Macros documentation. Please let me know if they work for you.


    On References and Getting Help

    For links to all other known CRSP resources on the internet, see Don Cram's webpage on CRSP Data Access and Analysis.

    A very helpful general resource for users of CRSP, Compustat, IBES, and other financial data is the CRSP-L email list, administered by H. Alan Montgomery at Texas A&M. As of 10/96, CRSP-L had about 150 subscribers: computer administrators, finance and accounting professors, and PhD students. See the information about the CRSP-L email list.

    For general questions on the use of SAS or Fortran, local users may seek the assistance of the Statistical Applications Consultants during their office hours, by appointment, or by email to their newsgroup forum su.computers.consult.statistics@news.stanford.edu.

    Questions more specific to local CRSP and Compustat access at GSB should be directed to su.school.gsb.computing@news.stanford.edu, a newsgroup help forum.

    For general questions relating to Fortran, SAS, Splus, or other programs that read CRSP, Compustat, IBES, and other financial databases more broadly than at Stanford alone, consider subscribing to the CRSP-L email list administered by H. Alan Montgomery at Texas A&M. Subscription instructions and other information, including a link to the CRSP-L FAQ webpage, are at the CRSP-L information webpage.

    Stanford users, especially, should first consult the GSB Technical Report "CRSP Data Retrieval and Analysis in SAS Software for Users and Administrators" by Donald P. Cram. The report itself is available from the Jackson Library reference desk or, for users anywhere, by request via the webform at Don's CRSP Data Access and Analysis webpage. The sample programs described in that report are available at a website given in the report. Local users may also access them at /afs/ir.stanford.edu/class/gsb/WWW/crsp/TechReport80.

    For links to sites on the internet providing CRSP-related programs, check Don's CRSP webpage. This includes several sites where one can download the voluminous official CRSP manuals. The Jackson Library circulation desk also has the CRSP and Compustat manuals on reserve.


    webpage info

    Copyright 1996, all rights reserved. This page opened 3/16/96 and is maintained by Don Cram.
    Please send your comments and suggestions to Don.

    Opened at Stanford, 6/6/96.