-------------------------------------------------------------------------------
       log:  h:/1433/whales.log
  log type:  text
 opened on:  18 Sep 2001, 14:05:04

. set memory 10m;
(10240k)

. use h:/1433/whales.dta, replace;

. ***********************************************************************;

. *
> * WHALES.DO
> *
> * An introduction to Stata.
> *
> * edited - jmwilder 9/18/01
> *
> ***********************************************************************;

. /* describing the data */
>
> d;

Contains data from h:/1433/whales.dta
  obs:            91
 vars:             6                          18 Sep 2001 12:15
 size:         1,729 (100.0% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
year            int    %9.0g
gs              int    %9.0g                  Vessels sailed to 'Greenland'
gw              float  %9.0g                  Whales caught off 'Greenland'
ds              int    %9.0g                  Vessels sailed to Davis Straits
dw              float  %9.0g                  Whales caught in Davis Straits
war             byte   %9.0g
-------------------------------------------------------------------------------
Sorted by:

. /* Of note: the variable names are terrible. We'll rename them to be a
> little more descriptive. */
>
> rename gs GLships;

. rename gw GLwhales;

. rename ds DSships;

. rename dw DSwhales;

. /* summarizing the data */
>
> summ;

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        year |      91        1715    26.41338       1670       1760
     GLships |      71    126.1268    58.47806          0        246
    GLwhales |      69    578.0749    454.3196          0    2071.75
     DSships |      22    74.63636    50.90472          7        153
    DSwhales |      21     335.119    364.7955         10       1311
         war |      91     .043956    .2061331          0          1

. summ if war != 1;

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        year |      87    1716.724    25.66814       1670       1760
     GLships |      67    133.6269    51.12043         31        246
    GLwhales |      66    604.3511    446.9993         50    2071.75
     DSships |      22    74.63636    50.90472          7        153
    DSwhales |      21     335.119    364.7955         10       1311
         war |      87           0           0          0          0

. bysort war: summ;

_______________________________________________________________________________
-> war = 0

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        year |      87    1716.724    25.66814       1670       1760
     GLships |      67    133.6269    51.12043         31        246
    GLwhales |      66    604.3511    446.9993         50    2071.75
     DSships |      22    74.63636    50.90472          7        153
    DSwhales |      21     335.119    364.7955         10       1311
         war |      87           0           0          0          0

_______________________________________________________________________________
-> war = 1

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        year |       4      1677.5    9.036961       1672       1691
     GLships |       4          .5           1          0          2
    GLwhales |       3           0           0          0          0
     DSships |       0
    DSwhales |       0
         war |       4           1           0          1          1

. /* equivalently, sort war; by war: summ; */
>
> /* or even better, we can introduce the tabulate command */
>
> tab war;

        war |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         87       95.60       95.60
          1 |          4        4.40      100.00
------------+-----------------------------------
      Total |         91      100.00

. tab war, missing;

        war |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         87       95.60       95.60
          1 |          4        4.40      100.00
------------+-----------------------------------
      Total |         91      100.00

. tab year if war == 1;

       year |      Freq.     Percent        Cum.
------------+-----------------------------------
       1672 |          1       25.00       25.00
       1673 |          1       25.00       50.00
       1674 |          1       25.00       75.00
       1691 |          1       25.00      100.00
------------+-----------------------------------
      Total |          4      100.00

. /* important! == tests for equality (as in 'if war == 1'), whereas a
> single = assigns a value (as in 'gen' or 'replace'). */
> tab war, summ(GLships);

            |  Summary of Vessels sailed to
            |           'Greenland'
        war |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          0 |   133.62687   51.120427          67
          1 |          .5           1           4
------------+------------------------------------
      Total |   126.12676   58.478062          71

. /* Alternatively, if we are interested in looking at the data itself, we
> can use the list command. I use it only for the smallest of tasks
> because it is dominated by the 'browse' command; however, output from
> 'browse' cannot be sent to a log file. */
>
> list year GLships GLwhales if war == 1;

          year   GLships  GLwhales
 88.      1691         2         .
 89.      1674         0         0
 90.      1673         0         0
 91.      1672         0         0

. /* Histograms and scatterplots can also be useful. In the first command,
> bin(20) sets the number of bars in the histogram; the second command
> draws a scatterplot. */
>
> graph GLships, bin(20) saving(gph1,replace);
(note: file gph1.gph not found)

. graph GLships GLwhales, saving(gph2, replace);
(note: file gph2.gph not found)

. ************************ Empirical Analysis ***************************;

. /* Having taken a look at the data, I turn to the questions I wanted to
> ask in the first place:
>
> 1. Does technology improve over time?
> 2. What are the returns to scale? Does sending an additional ship to a
> fishing area crowd the others and reduce their yields?
> 3. Do fishermen respond with particular zeal to a good year by sending
> out a large fleet of ships? */
>
> /* To get at the first question, I want to generate the average
> yield per ship in each year. */
>
> gen GLw_s = GLwhales/GLships;
(25 missing values generated)

. gen DSw_s = DSwhales/DSships;
(70 missing values generated)

. /* now I look at how the two series trend over time */
>
> graph GLw_s DSw_s year, connect(ll) saving(gph3,replace);
(note: file gph3.gph not found)

. /* That doesn't look good: connect() joins the points in the order the
> observations appear in the data, and the data are not sorted by year
> (note the war years at the end of the file). We'll need to sort first. */
>
> sort year;

. graph GLw_s DSw_s year, connect(ll) saving(gph4,replace);
(note: file gph4.gph not found)
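Stata's division rule explains the "(25 missing values generated)" message. A ratio is missing whenever either operand is missing or the denominator is zero, as in the war years that sent 0 ships. The plain-Python sketch below (standard library only, not Stata) mimics that rule and the sort-before-plotting step. The records and the helper name `per_ship` are hypothetical, and `None` stands in for Stata's missing value ".".

```python
# Hypothetical yearly records (not the whales data); None stands in for
# Stata's missing value ".".
rows = [
    {"year": 1703, "ships": 2, "whales": None},   # whales missing
    {"year": 1702, "ships": 0, "whales": 0},      # 0/0
    {"year": 1700, "ships": 100, "whales": 450},
    {"year": 1701, "ships": 120, "whales": 300},
]


def per_ship(whales, ships):
    """Mimic Stata's whales/ships: missing if an operand is missing or ships == 0."""
    if whales is None or ships is None or ships == 0:
        return None
    return whales / ships


# Equivalent of: gen w_s = whales/ships;
for r in rows:
    r["w_s"] = per_ship(r["whales"], r["ships"])

n_missing = sum(r["w_s"] is None for r in rows)
print(f"({n_missing} missing values generated)")

# Equivalent of: sort year;  (a connected plot joins points in data order)
rows.sort(key=lambda r: r["year"])
series = [(r["year"], r["w_s"]) for r in rows]
print(series)
```

A connected line plot draws its segments in list order, which is why the sort has to happen before plotting.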
. /* There doesn't seem to be much evidence of this. But let's do some
> regressions just to get a feel for how to implement the procedures.
> I'll focus my attention on Greenland. */
>
> /* Say I want to fit a quadratic; then I'll need a linear trend
> and a squared term. I'll create each. */
>
> sort year;

. gen trend = _n;

. gen trend2 = trend*trend;

. /* This wasn't strictly necessary because we already have a linear time
> trend in the variable year. But I wanted to use it to introduce
> explicit subscripting via '_n', because it is among the most useful
> tools in Stata. Using it creatively can avoid writing explicit loops as
> one might in C (and is much faster).
>
> '_n' represents the observation number. This is not something intrinsic
> to a particular observation but depends on how the data are sorted. _N
> gives the total number of observations. Using the 'by' prefix resets _n
> for each category _and_ gives each category its own _N. Assume we have
> hospital blood-pressure data for individual patients over time: for
> each patient id at date t, I observe the patient's blood pressure.
>
> To count the total number of readings per patient:
>
> sort id date;
> by id: gen readings = _N;
>
> To number each patient's readings in date order (1, 2, 3, ...):
>
> sort id date;
> by id: gen number = _n;
>
> But there is much more that can be done with this subscripting. We'll
> see some examples in a second. For now, I return to the regression I
> suggested above. Note that Stata recognizes that 'war' does not vary
> across the observations used in the regression (observations with
> missing values are dropped, and every war year has a missing GLw_s), so
> it drops 'war' owing to collinearity with the constant term. When Stata
> drops a variable, always make sure you know why: in general it is a
> good sign that you made a mistake in constructing it. */
>
> regress GLw_s trend trend2 war;

      Source |       SS       df       MS              Number of obs =      66
-------------+------------------------------           F(  2,    63) =    3.10
       Model |  44.3460331     2  22.1730166           Prob > F      =  0.0520
    Residual |  450.530626    63  7.15127978           R-squared     =  0.0896
-------------+------------------------------           Adj R-squared =  0.0607
       Total |  494.876659    65  7.61348707           Root MSE      =  2.6742

------------------------------------------------------------------------------
       GLw_s |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       trend |  -.0799027   .0533756    -1.50   0.139    -.1865653    .0267598
      trend2 |   .0005466    .000543     1.01   0.318    -.0005384    .0016316
         war |  (dropped)
       _cons |     6.4577   1.033434     6.25   0.000     4.392548    8.522853
------------------------------------------------------------------------------

. /* is the quadratic significant? */
>
> test trend trend2;

 ( 1)  trend = 0.0
 ( 2)  trend2 = 0.0

       F(  2,    63) =    3.10
            Prob > F =    0.0520

. /* So there is little evidence for my technology theory: the joint test
> just misses significance at the 5% level (p = 0.052), and the fitted
> quadratic slopes downward over most of the sample. Perhaps the whale
> population is being depleted at the same time, and we are unable to
> see any effect. Now I ask the question regarding returns to scale: do
> additional ships tend to crowd out production? */
>
> regress GLw_s GLships;

      Source |       SS       df       MS              Number of obs =      66
-------------+------------------------------           F(  1,    64) =    0.53
       Model |  4.04939691     1  4.04939691           Prob > F      =  0.4701
    Residual |  490.827262    64  7.66917598           R-squared     =  0.0082
-------------+------------------------------           Adj R-squared = -0.0073
       Total |  494.876659    65  7.61348707           Root MSE      =  2.7693

------------------------------------------------------------------------------
       GLw_s |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     GLships |   .0048497   .0066741     0.73   0.470    -.0084834    .0181828
       _cons |   3.789967   .9531268     3.98   0.000     1.885877    5.694057
------------------------------------------------------------------------------
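The '_n'/'_N' discussion in the comment before the first regression can be mirrored outside Stata. The Python sketch below is an illustration, not Stata code. It sorts hypothetical blood-pressure readings by patient id and date, then computes the within-patient equivalents of 'by id: gen readings = _N' and 'by id: gen number = _n'. The data and variable names are made up.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical readings: (patient id, date as ISO string, blood pressure)
readings = [
    ("B", "2001-03-02", 128),
    ("A", "2001-01-15", 120),
    ("B", "2001-01-20", 135),
    ("A", "2001-02-10", 118),
    ("A", "2001-03-05", 122),
]

# Equivalent of: sort id date;
readings.sort(key=itemgetter(0, 1))

numbered = []
for pid, group in groupby(readings, key=itemgetter(0)):
    group = list(group)
    big_n = len(group)  # by id: gen readings = _N;
    # by id: gen number = _n;  (position within the patient, starting at 1)
    for small_n, (_, date, bp) in enumerate(group, start=1):
        numbered.append((pid, date, bp, small_n, big_n))

for row in numbered:
    print(row)
```

`itertools.groupby` only groups adjacent items, so the sort must come first. This is the same reason Stata's 'by' prefix requires the data to be sorted by the 'by' variables ('bysort' sorts and groups in one step).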
. /* Very little evidence of this: the estimated effect of an additional
> ship on yield per ship is small, positive, and insignificant. Now I turn
> to the final question: do fishermen seem to respond to big yields in
> the prior year when deciding how many ships to send out in the current
> year? To answer this, we want to regress GLships at time t on GLw_s at
> time t-1. We can create such lags by virtue of the explicit
> subscripting introduced above. I control for the number of ships sent
> the previous year (and for the quadratic trend). */
>
> sort year;

. gen lagGLw_s = GLw_s[_n-1];
(26 missing values generated)

. gen lagGLships = GLships[_n-1];
(21 missing values generated)

. regress GLships GLw_s lagGLships trend trend2;

      Source |       SS       df       MS              Number of obs =      62
-------------+------------------------------           F(  4,    57) =   42.30
       Model |  121196.301     4  30299.0752           Prob > F      =  0.0000
    Residual |  40827.1832    57  716.266371           R-squared     =  0.7480
-------------+------------------------------           Adj R-squared =  0.7303
       Total |  162023.484    61  2656.12269           Root MSE      =  26.763

------------------------------------------------------------------------------
     GLships |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       GLw_s |   .3517397   1.284938     0.27   0.785    -2.221304    2.924783
  lagGLships |   .7411841   .0646971    11.46   0.000     .6116304    .8707377
       trend |  -1.029278    .581699    -1.77   0.082     -2.19411     .135554
      trend2 |    .007384   .0057684     1.28   0.206    -.0041669     .018935
       _cons |   62.02602   17.75177     3.49   0.001     26.47871    97.57333
------------------------------------------------------------------------------

. /* The R-squared is high (0.75), but the only significant regressor is
> lagged ships (t = 11.46); yield per ship has a small, insignificant
> coefficient. Note, however, that this regression uses the current-year
> yield GLw_s rather than the lag lagGLw_s constructed above; to answer
> the question as posed, the command should be
> 'regress GLships lagGLw_s lagGLships trend trend2'. Note also that a
> regression on the change in the number of ships would be equivalent to
> constraining the coefficient of lagged ships to equal one. */
>
> /* I might need the fitted values or the residuals from the most recent
> regression. The respective commands to construct these variables are: */
>
> predict GLs_hat;
(option xb assumed; fitted values)
(29 missing values generated)

. predict GLs_resid, resid;
(29 missing values generated)

. ***********************************************************************;

. set more 0;

. *save h:/14.33/whalenew, replace;

. log close;
       log:  h:/1433/whales.log
  log type:  text
 closed on:  18 Sep 2001, 14:05:04
-------------------------------------------------------------------------------
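The lag 'GLw_s[_n-1]' takes the value from the previous observation in the current sort order. The first observation has no predecessor, so it gets a missing value. Below is a minimal Python sketch of that rule. The series and the helper name `lag` are hypothetical, and `None` stands in for ".".

```python
# Hypothetical yearly series, already sorted by year; None stands in for
# Stata's missing value ".".
years = [1700, 1701, 1702, 1703, 1704]
ships = [110, None, 95, 120, 130]


def lag(values):
    """Mimic gen lagx = x[_n-1] on sorted data: the first observation is missing."""
    return [None] + values[:-1]


lag_ships = lag(ships)
n_missing = sum(v is None for v in lag_ships)
print(list(zip(years, ships, lag_ships)))
print(f"({n_missing} missing values generated)")
```

Because x[_n-1] refers to the previous observation rather than the previous year, a gap in the years would silently pair non-adjacent years. In the whales data the years run consecutively from 1670 to 1760 (91 observations), so the two coincide.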