-------------------------------------------------------------------------------
       log:  h:/1433/whales.log
  log type:  text
 opened on:  18 Sep 2001, 14:05:04

. set memory 10m;
(10240k)

. use h:/1433/whales.dta, replace;

. ***********************************************************************;
. *
> * WHALES.DO
> *
> * An introduction to stata.
> *
> * edited - jmwilder 9/18/01
> *
> ***********************************************************************;
. /* describing the data */
> 
> d;

Contains data from h:/1433/whales.dta
  obs:            91                          
 vars:             6                          18 Sep 2001 12:15
 size:         1,729 (100.0% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
year            int    %9.0g                  
gs              int    %9.0g                  Vessels sailed to 'Greenland'
gw              float  %9.0g                  Whales caught off 'Greenland'
ds              int    %9.0g                  Vessels sailed to Davis Straits
dw              float  %9.0g                  Whales caught in Davis Straits
war             byte   %9.0g                  
-------------------------------------------------------------------------------
Sorted by:  

. /* of note: the variable names are terrible.  We'll rename them to be a 
> little more descriptive.  */
> 
> rename gs GLships;

. rename gw GLwhales;

. rename ds DSships;

. rename dw DSwhales;

. /* summarizing the data */
> 
> summ;

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        year |      91        1715   26.41338       1670       1760
     GLships |      71    126.1268   58.47806          0        246
    GLwhales |      69    578.0749   454.3196          0    2071.75
     DSships |      22    74.63636   50.90472          7        153
    DSwhales |      21     335.119   364.7955         10       1311
         war |      91     .043956   .2061331          0          1

. summ if war != 1;

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        year |      87    1716.724   25.66814       1670       1760
     GLships |      67    133.6269   51.12043         31        246
    GLwhales |      66    604.3511   446.9993         50    2071.75
     DSships |      22    74.63636   50.90472          7        153
    DSwhales |      21     335.119   364.7955         10       1311
         war |      87           0          0          0          0

. bysort war: summ;

_______________________________________________________________________________
-> war = 0

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        year |      87    1716.724   25.66814       1670       1760
     GLships |      67    133.6269   51.12043         31        246
    GLwhales |      66    604.3511   446.9993         50    2071.75
     DSships |      22    74.63636   50.90472          7        153
    DSwhales |      21     335.119   364.7955         10       1311
         war |      87           0          0          0          0

_______________________________________________________________________________
-> war = 1

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        year |       4      1677.5   9.036961       1672       1691
     GLships |       4          .5          1          0          2
    GLwhales |       3           0          0          0          0
     DSships |       0
    DSwhales |       0
         war |       4           1          0          1          1


.  /* equivalently, sort war; by war: summ; */
> 
> /* or even better, we can introduce the tabulate command */
> 
> tab war;

        war |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         87       95.60       95.60
          1 |          4        4.40      100.00
------------+-----------------------------------
      Total |         91      100.00

. tab war, missing;

        war |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         87       95.60       95.60
          1 |          4        4.40      100.00
------------+-----------------------------------
      Total |         91      100.00

. tab year if war == 1;

       year |      Freq.     Percent        Cum.
------------+-----------------------------------
       1672 |          1       25.00       25.00
       1673 |          1       25.00       50.00
       1674 |          1       25.00       75.00
       1691 |          1       25.00      100.00
------------+-----------------------------------
      Total |          4      100.00

.  /* important! == tests equality (as in the 'if' above); a single = assigns */
> tab war, summ(GLships);

            |    Summary of Vessels sailed to
            |             'Greenland'
        war |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          0 |   133.62687   51.120427          67
          1 |          .5           1           4
------------+------------------------------------
      Total |   126.12676   58.478062          71

. /* alternatively, if we are interested in looking at the data itself, we 
> can use the list command.  I use it only for the smallest of tasks because 
> it is dominated by the 'browse' command; however, output from 'browse' 
> cannot be sent to a logfile. */
> 
> list year GLships GLwhales if war == 1;

          year    GLships   GLwhales 
 88.      1691          2          .  
 89.      1674          0          0  
 90.      1673          0          0  
 91.      1672          0          0  

. /* Histograms and scatterplots can also be useful.  In the first command, 
> bin(20) sets the number of bars in the histogram.  The second command 
> gives a scatterplot. */
> 
> graph GLships, bin(20) saving(gph1,replace);
(note: file gph1.gph not found)

.  graph GLships GLwhales, saving(gph2, replace);
(note: file gph2.gph not found)

. ************************ Empirical Analysis ***************************;
. /* Having taken a look at the data, I turn to the questions I wanted to 
> ask in the first place:
> 
> 1. Does technology improve over time?
> 2. What are the returns to scale?  Does sending an additional ship to a 
> fishing area crowd the others and reduce their yields?
> 3. Do fishermen respond with particular zeal to a good year by sending 
> out a large fleet of ships? */
> 
> /* To start to get at the first question, I want to generate the average 
> yield per ship in each year. */
> 
> gen GLw_s = GLwhales/GLships;
(25 missing values generated)

. gen DSw_s = DSwhales/DSships;
(70 missing values generated)

. /* now I look at how the two series trend over time */
> 
> graph GLw_s DSw_s year, connect(ll) saving(gph3,replace);
(note: file gph3.gph not found)

. /* that doesn't look good because STATA had no idea that the series was 
> naturally ordered along the x axis.  we'll need to sort first. */
> 
> sort year;

. graph GLw_s DSw_s year, connect(ll) saving(gph4,replace);
(note: file gph4.gph not found)

. /* there doesn't seem to be much evidence of improving yields over time.  
> But let's do some regressions just to get a feel for how to implement the 
> procedures.  I'll focus my attention on Greenland. */
> 
> /* Let's say I want to fit a quadratic, then I'll need a linear trend 
> and a squared term.  I'll create each. */
> 
> sort year;

. gen trend = _n;

. gen trend2 = trend*trend;

. /* this wasn't exactly necessary because we already have a linear time 
> trend with the variable year.  But I wanted to use it to introduce 
> explicit subscripting via '_n' because it is among the most useful tools 
> in STATA.  Using it creatively can avoid looping as one might in C, and 
> is much faster.
> 
> '_n' represents the observation number.  This is not something intrinsic 
> to a particular observation but depends on how the data are sorted.  _N 
> gives the total number of observations.  Use of the 'by' command resets 
> _n for each category _and_ gives each category its own _N.  Assume we 
> have hospital blood pressure data for individual patients over time.  
> For each patient id at time t, I observe the patient's blood pressure.
> 
> To calculate the total observations per patient:
> 
> sort id date;
> by id: gen readings = _N;
> 
> To number each patient's readings in date order:
> 
> sort id date;
> by id: gen number = _n;
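> 
> Subscripts also work within each 'by' group.  As a sketch (assuming the 
> blood pressure variable is called bp, a name made up for this example), 
> to calculate the change since the patient's previous reading:
> 
> sort id date;
> by id: gen change = bp - bp[_n-1];
> 
> The first reading for each patient has no predecessor, so change is 
> missing there.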
> 
> But there is much more that can be done with this subscripting.  We'll 
> see some examples in a second.  For now, I return to the regression I 
> suggested above.  Note that STATA recognizes that 'war' does not vary 
> for the observations used in the regression (those with missing values 
> are dropped) and drops it owing to collinearity with the constant term.  
> When STATA drops variables, always make sure you know why, because it is 
> often a sign that you made a mistake in constructing them. */
> 
> regress GLw_s trend trend2 war;

      Source |       SS       df       MS              Number of obs =      66
-------------+------------------------------           F(  2,    63) =    3.10
       Model |  44.3460331     2  22.1730166           Prob > F      =  0.0520
    Residual |  450.530626    63  7.15127978           R-squared     =  0.0896
-------------+------------------------------           Adj R-squared =  0.0607
       Total |  494.876659    65  7.61348707           Root MSE      =  2.6742

------------------------------------------------------------------------------
       GLw_s |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       trend |  -.0799027   .0533756    -1.50   0.139    -.1865653    .0267598
      trend2 |   .0005466    .000543     1.01   0.318    -.0005384    .0016316
         war |  (dropped)
       _cons |     6.4577   1.033434     6.25   0.000     4.392548    8.522853
------------------------------------------------------------------------------

. /* is the quadratic significant? */
> 
> test trend trend2;

 ( 1)  trend = 0.0
 ( 2)  trend2 = 0.0

       F(  2,    63) =    3.10
            Prob > F =    0.0520


. /* So there is very little evidence of my technology theory.  Perhaps 
> the whale population is being depleted at the same time, and we are 
> unable to see any effect.  Now I ask the question regarding returns to 
> scale: do additional ships tend to crowd out production?  */
> 
> regress GLw_s GLships;

      Source |       SS       df       MS              Number of obs =      66
-------------+------------------------------           F(  1,    64) =    0.53
       Model |  4.04939691     1  4.04939691           Prob > F      =  0.4701
    Residual |  490.827262    64  7.66917598           R-squared     =  0.0082
-------------+------------------------------           Adj R-squared = -0.0073
       Total |  494.876659    65  7.61348707           Root MSE      =  2.7693

------------------------------------------------------------------------------
       GLw_s |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     GLships |   .0048497   .0066741     0.73   0.470    -.0084834    .0181828
       _cons |   3.789967   .9531268     3.98   0.000     1.885877    5.694057
------------------------------------------------------------------------------

. /* Very little evidence of this.  The number of ships sent doesn't 
> decrease yields.  Now I turn to the final question: do fishermen seem 
> to respond to big yields in the prior year when deciding how many 
> ships to send out in the current year?  To answer this, we want to 
> regress GLships at time t on GLw_s at time t-1.  We can create such lags
> by virtue of the explicit subscripting introduced above.  I control for 
> the number of ships sent the previous year. */
> 
> sort year;

. gen lagGLw_s = GLw_s[_n-1];
(26 missing values generated)

. gen lagGLships = GLships[_n-1];
(21 missing values generated)

. regress GLships GLw_s lagGLships trend trend2;

      Source |       SS       df       MS              Number of obs =      62
-------------+------------------------------           F(  4,    57) =   42.30
       Model |  121196.301     4  30299.0752           Prob > F      =  0.0000
    Residual |  40827.1832    57  716.266371           R-squared     =  0.7480
-------------+------------------------------           Adj R-squared =  0.7303
       Total |  162023.484    61  2656.12269           Root MSE      =  26.763

------------------------------------------------------------------------------
     GLships |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       GLw_s |   .3517397   1.284938     0.27   0.785    -2.221304    2.924783
  lagGLships |   .7411841   .0646971    11.46   0.000     .6116304    .8707377
       trend |  -1.029278    .581699    -1.77   0.082     -2.19411     .135554
      trend2 |    .007384   .0057684     1.28   0.206    -.0041669     .018935
       _cons |   62.02602   17.75177     3.49   0.001     26.47871    97.57333
------------------------------------------------------------------------------

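. /* the R2 went up, but still nothing.  Note that the command above uses 
> the current-year yield GLw_s; substituting lagGLw_s gives the lagged 
> specification described earlier.  Note also that a regression on the 
> change in the number of ships would be equivalent to constraining the 
> coefficient of lagged ships to equal one. */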
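> 
> /* A sketch of that constrained version, not run in this session.  It 
> keeps the other regressors of the logged command; dGLships is a name 
> introduced here for illustration:
> 
> gen dGLships = GLships - lagGLships;
> regress dGLships GLw_s trend trend2; */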
> 
> /* I might need the predicted values from the above regressions or the 
> residuals.  The respective commands to construct these variables are: */
> 
> predict GLs_hat;
(option xb assumed; fitted values)
(29 missing values generated)

. predict GLs_resid, resid;
(29 missing values generated)

. ***********************************************************************;
. set more 0;

. *save h:/14.33/whalenew, replace;
. log close;
       log:  h:/1433/whales.log
  log type:  text
 closed on:  18 Sep 2001, 14:05:04
-------------------------------------------------------------------------------