Solutions for Wooldridge's Introductory Econometrics
CONTENTS

PREFACE
SUGGESTED COURSE OUTLINES
Chapter 1   The Nature of Econometrics and Economic Data
Chapter 2   The Simple Regression Model
Chapter 3   Multiple Regression Analysis: Estimation
Chapter 4   Multiple Regression Analysis: Inference
Chapter 5   Multiple Regression Analysis: OLS Asymptotics
Chapter 6   Multiple Regression Analysis: Further Issues
Chapter 7   Multiple Regression Analysis With Qualitative Information: Binary (or Dummy) Variables
Chapter 8   Heteroskedasticity
Chapter 9   More on Specification and Data Problems
Chapter 10  Basic Regression Analysis With Time Series Data
Chapter 11  Further Issues in Using OLS With Time Series Data
Chapter 12  Serial Correlation and Heteroskedasticity in Time Series Regressions
Chapter 13  Pooling Cross Sections Across Time: Simple Panel Data Methods
Chapter 14  Advanced Panel Data Methods
Chapter 15  Instrumental Variables Estimation and Two Stage Least Squares
Chapter 16  Simultaneous Equations Models
Chapter 17  Limited Dependent Variable Models and Sample Selection Corrections
Chapter 18  Advanced Time Series Topics
Chapter 19  Carrying Out an Empirical Project
Appendix A  Basic Mathematical Tools
Appendix B  Fundamentals of Probability
Appendix C  Fundamentals of Mathematical Statistics
Appendix D  Summary of Matrix Algebra
Appendix E  The Linear Regression Model in Matrix Form
PREFACE

This manual contains suggested course outlines, teaching notes, and detailed solutions to all of the problems and computer exercises in Introductory Econometrics: A Modern Approach, 2nd edition. For several problems I have added additional notes to the instructor about interesting asides or suggestions for how to modify or extend the problem. Some of the answers given here are subjective, and you may want to supplement or replace them with your own answers.

I wrote all solutions as if I were preparing them for the students, so you may find some solutions a bit tedious. This way, if you prefer, you can distribute my answers to some of the even-numbered problems directly to the students. (The student study guide contains answers to all odd-numbered problems.)

The solutions to the computer exercises were obtained using Stata, starting with version 4.0 and running through version 7.0. Nevertheless, almost all of the estimation methods covered in the text have been standardized, and different econometrics or statistical packages should give the same answers. There can be differences when applying more advanced techniques, as conventions sometimes differ on how to choose or estimate auxiliary parameters. (Examples include heteroskedasticity-robust standard errors, estimates of a random effects model, and corrections for sample selection bias.)

While I have endeavored to make the solutions mistake-free, some errors may have crept in. I would appreciate hearing from you if you find mistakes. I will keep a list of any substantive errors on the Web site for the book. I heard from many of you regarding the first edition of the text, and I incorporated many of your suggestions. I welcome any comments that will help me make improvements to future editions. I can be reached via email at wooldri1@msu.edu.

I hope you find this instructor's manual useful, and I look forward to hearing your reactions to the second edition.

Jeffrey M. Wooldridge
Department of Economics
Michigan State University
East Lansing, MI
SUGGESTED COURSE OUTLINES
For an introductory, one-semester course, I like to cover most of the material in Chapters 1 through 8 and Chapters 10 through 12, as well as parts of Chapter 9 (but mostly through examples). I do not typically cover all sections or subsections within each chapter. Under the chapter headings listed below, I provide some comments on the material I find most relevant for a first-semester course.

An alternative course ignores time series applications altogether, while delving into some of the more advanced methods that are particularly useful for policy analysis. This would consist of Chapters 1 through 8, much of Chapter 9, and the first four sections of Chapter 13. Chapter 9 discusses the important practical topics of proxy variables, measurement error, outlying observations, and stratified sampling. In addition, I have written a more careful description of the method of least absolute deviations, including a discussion of its strengths and weaknesses. Chapter 13 covers, in a straightforward fashion, methods for pooled cross sections (including the so-called "natural experiment" approach) and two-period panel data analysis. The basic cross-sectional treatment of instrumental variables in Chapter 15 is a natural topic for cross-sectional, policy-oriented courses. For an accelerated course, the nonlinear methods used for cross-sectional analysis in Chapter 17 can be covered.

I typically do not begin with a review of basic algebra, probability, and statistics. In my experience, this takes too long and the payoff is minimal. (Students tend to think that they are taking another statistics course, and start to drift.) Instead, when I need a tool (such as the summation or expectations operator), I briefly review the necessary definitions and key properties. Statistical inference is not more difficult to describe in terms of multiple regression than in tests of a population mean, and so I briefly review the principles of statistical inference during multiple regression analysis. Appendices A, B, and C are fairly extensive. When I cover asymptotic properties of OLS, I provide a brief discussion of the main definitions and limit theorems. If students need more than the brief review provided in class, I point them to the appendices.

For a master's level course, I include a couple of lectures on the matrix approach to linear regression. This could be integrated into Chapters 3 and 4 or covered after Chapter 4. Again, I do not summarize matrix algebra before proceeding. Instead, the material in Appendix D can be reviewed as it is needed in covering Appendix E.

A second semester course, at either the undergraduate or master's level, could begin with some of the material in Chapter 9, particularly with the issues of proxy variables and measurement error. The advanced chapters, starting with Chapter 13, are useful for students who have an interest in policy analysis. The pooled cross section and panel data chapters (Chapters 13 and 14) emphasize how these data sets can be used, in conjunction with econometric methods, for policy evaluation. Chapter 15, which introduces the method of instrumental variables, is also important for policy analysis. Most modern IV applications are used to address the problems of omitted variables (unobserved heterogeneity) or measurement error. I have intentionally separated out the conceptually more difficult topic of simultaneous equations models in Chapter 16.
Chapter 17, in particular the material on probit, logit, Tobit, and Poisson regression models, is a good introduction to nonlinear econometric methods. Specialized courses that emphasize applications in labor economics can use the material on sample selection corrections. Duration models are also briefly covered as an example of a censored regression model. Chapter 18 is much different from the other advanced chapters, as it focuses on more advanced or recent developments in time series econometrics. Combined with some of the more advanced topics in Chapter 12, it can serve as the basis for a second semester course in time series topics, including forecasting. Most second semester courses would include an assignment to write an original empirical paper, and Chapter 19 should be helpful in this regard.
CHAPTER 1

TEACHING NOTES

You have substantial latitude about what to emphasize in Chapter 1. I find it useful to talk about the economics of crime example (Example 1.1) and the wage example (Example 1.2) so that students see, at the outset, that econometrics is linked to economic reasoning, even if the economics is not complicated theory. I like to familiarize students with the important data structures that empirical economists use, focusing primarily on cross-sectional and time series data sets, as these are what I cover in a first-semester course. It is probably a good idea to mention the growing importance of data sets that have both a cross-sectional and time dimension.

I spend almost an entire lecture talking about the problems inherent in drawing causal inferences in the social sciences. I do this mostly through the agricultural yield, return to education, and crime examples. These examples also contrast experimental and nonexperimental (observational) data. Students studying business and finance tend to find the term structure of interest rates example more relevant, although the issue there is testing the implication of a simple theory, as opposed to inferring causality. I have found that spending time talking about these examples, in place of a formal review of probability and statistics, is more successful (and more enjoyable for the students and me).
SOLUTIONS TO PROBLEMS

1.1 (i) Ideally, we could randomly assign students to classes of different sizes. That is, each student is assigned a different class size without regard to any student characteristics such as ability and family background. For reasons we will see in Chapter 2, we would like substantial variation in class sizes (subject, of course, to ethical considerations and resource constraints).

(ii) A negative correlation means that larger class size is associated with lower performance. We might find a negative correlation because larger class size actually hurts performance. However, with observational data, there are other reasons we might find a negative relationship. For example, children from more affluent families might be more likely to attend schools with smaller class sizes, and affluent children generally score better on standardized tests. Another possibility is that, within a school, a principal might assign the better students to smaller classes. Or, some parents might insist that their children are in the smaller classes, and these same parents tend to be more involved in their children's education.

(iii) Given the potential for confounding factors, some of which are listed in (ii), finding a negative correlation would not be strong evidence that smaller class sizes actually lead to better performance. Some way of controlling for the confounding factors is needed, and this is the subject of multiple regression analysis.

1.2 (i) Here is one way to pose the question: If two firms, say A and B, are identical in all respects except that firm A supplies job training one hour per worker more than firm B, by how much would firm A's output differ from firm B's?

(ii) Firms are likely to choose job training depending on the characteristics of workers. Some observed characteristics are years of schooling, years in the workforce, and experience in a particular job. Firms might even discriminate based on age, gender, or race. Perhaps firms choose to offer training to more or less able workers, where "ability" might be difficult to quantify but where a manager has some idea about the relative abilities of different employees. Moreover, different kinds of workers might be attracted to firms that offer more job training on average, and this might not be evident to employers.

(iii) The amount of capital and technology available to workers would also affect output. So, two firms with exactly the same kinds of employees would generally have different outputs if they use different amounts of capital or technology. The quality of managers would also have an effect.

(iv) No, unless the amount of training is randomly assigned. The many factors listed in parts (ii) and (iii) can contribute to finding a positive correlation between output and training even if job training does not improve worker productivity.

1.3 It does not make sense to pose the question in terms of causality. Economists would assume that students choose a mix of studying and working (and other activities, such as attending class, leisure, and sleeping) based on rational behavior, such as maximizing utility subject to the constraint that there are only 168 hours in a week. We can then use statistical methods to
measure the association between studying and working, including regression analysis that we cover starting in Chapter 2. But we would not be claiming that one variable "causes" the other. They are both choice variables of the student.

SOLUTIONS TO COMPUTER EXERCISES

C1.1 (i) The average of educ is about 12.6 years. There are two people reporting zero years of education, and 19 people reporting 18 years of education.

(ii) The average of wage is about $5.90, which seems low in 2005.

(iii) Using Table B-60 in the 2004 Economic Report of the President, the CPI was 56.9 in 1976 and 184.0 in 2003.

(iv) To convert 1976 dollars into 2003 dollars, we use the ratio of the CPIs, which is 184/56.9 ≈ 3.23. Therefore, the average hourly wage in 2003 dollars is roughly 3.23($5.90) ≈ $19.06, which is a reasonable figure.

(v) The sample contains 252 women (the number of observations with female = 1) and 274 men.
C1.2 (i) There are 1,388 observations in the sample. Tabulating the variable cigs shows that 212 women have cigs > 0.

(ii) The average of cigs is about 2.09, but this includes the 1,176 women who did not smoke. Reporting just the average masks the fact that almost 85 percent of the women did not smoke. It makes more sense to say that the "typical" woman does not smoke; indeed, the median number of cigarettes smoked is zero.

(iii) The average of cigs over the women with cigs > 0 is about 13.7. Of course this is much higher than the average over the entire sample because we are excluding 1,176 zeros.

(iv) The average of fatheduc is about 13.2. There are 196 observations with a missing value for fatheduc, and those observations are necessarily excluded in computing the average.

(v) The average and standard deviation of faminc are about 29.027 and 18.739, respectively, but faminc is measured in thousands of dollars. So, in dollars, the average and standard deviation are $29,027 and $18,739.

C1.3 (i) The largest is 100, the smallest is 0.

(ii) 38 out of 1,823, or about 2.1 percent of the sample.

(iii) 17
(iv) The average of math4 is about 71.9 and the average of read4 is about 60.1. So, at least in 2001, the reading test was harder to pass. (v) The sample correlation between math4 and read4 is about .843, which is a very high degree of (linear) association. Not surprisingly, schools that have high pass rates on one test have a strong tendency to have high pass rates on the other test. (vi) The average of exppp is about $5,194.87. The standard deviation is $1,091.89, which shows rather wide variation in spending per pupil. [The minimum is $1,206.88 and the maximum is $11,957.64.]
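Statistics like those in C1.1 through C1.3 are one-line commands in any package; the manual's own computations were done in Stata (see the preface). Here is a minimal Python sketch of the C1.2 calculations. The file name bwght.csv is a hypothetical CSV export of the book's BWGHT data; the variable names cigs, fatheduc, and faminc are the ones used in the exercise.

    import pandas as pd

    df = pd.read_csv("bwght.csv")  # hypothetical export of the BWGHT data

    print((df["cigs"] > 0).sum())        # number of women with cigs > 0
    print(df["cigs"].mean())             # average over all women, zeros included
    print(df["cigs"].median())           # the "typical" woman: the median is 0
    print(df.loc[df["cigs"] > 0, "cigs"].mean())  # average among smokers only

    # pandas excludes missing values (NaN) from the mean automatically, just
    # as Stata drops the 196 missing fatheduc observations:
    print(df["fatheduc"].mean())
    print(df["faminc"].mean(), df["faminc"].std())  # in thousands of dollars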
CHAPTER 2

TEACHING NOTES

This is the chapter where I expect students to follow most, if not all, of the algebraic derivations. In class I like to derive at least the unbiasedness of the OLS slope coefficient, and usually I derive the variance. At a minimum, I talk about the factors affecting the variance. To simplify the notation, after I emphasize the assumptions in the population model, and assume random sampling, I just condition on the values of the explanatory variables in the sample. Technically, this is justified by random sampling because, for example, E(ui|x1, x2, …, xn) = E(ui|xi) by independent sampling. I find that students are able to focus on the key assumption SLR.4 and subsequently take my word about how conditioning on the independent variables in the sample is harmless. (If you prefer, the appendix to Chapter 3 does the conditioning argument carefully.) Because statistical inference is no more difficult in multiple regression than in simple regression, I postpone inference until Chapter 4. (This reduces redundancy and allows you to focus on the interpretive differences between simple and multiple regression.)

You might notice how, compared with most other texts, I use relatively few assumptions to derive the unbiasedness of the OLS slope estimator, followed by the formula for its variance. This is because I do not introduce redundant or unnecessary assumptions. For example, once SLR.4 is assumed, nothing further about the relationship between u and x is needed to obtain the unbiasedness of OLS under random sampling.
SOLUTIONS TO PROBLEMS

2.1 (i) Income, age, and family background (such as number of siblings) are just a few possibilities. It seems that each of these could be correlated with years of education. (Income and education are probably positively correlated; age and education may be negatively correlated because women in more recent cohorts have, on average, more education; and number of siblings and education are probably negatively correlated.)

(ii) Not if the factors we listed in part (i) are correlated with educ. Because we would like to hold these factors fixed, they are part of the error term. But if u is correlated with educ then E(u|educ) ≠ 0, and so SLR.4 fails.

2.2 In the equation y = β0 + β1x + u, add and subtract α0 from the right hand side to get y = (α0 + β0) + β1x + (u − α0). Call the new error e = u − α0, so that E(e) = 0. The new intercept is α0 + β0, but the slope is still β1.

2.3 (i) Let yi = GPAi, xi = ACTi, and n = 8. Then x̄ = 25.875, ȳ = 3.2125, Σ (xi − x̄)(yi − ȳ) = 5.8125, and Σ (xi − x̄)² = 56.875. From equation (2.9), we obtain the slope as β̂1 = 5.8125/56.875 ≈ .1022, rounded to four places after the decimal. From (2.17), β̂0 = ȳ − β̂1x̄ ≈ 3.2125 − (.1022)25.875 ≈ .5681. So we can write

GPA = .5681 + .1022 ACT, n = 8.

The intercept does not have a useful interpretation because ACT is not close to zero for the population of interest. If ACT is 5 points higher, GPA increases by .1022(5) = .511.

(ii) The fitted values and residuals, rounded to four decimal places, are given along with the observation number i and GPA in the following table:
 i    GPA    GPAhat     uhat
 1    2.8    2.7143     .0857
 2    3.4    3.0209     .3791
 3    3.0    3.2253    −.2253
 4    3.5    3.3275     .1725
 5    3.6    3.5319     .0681
 6    3.0    3.1231    −.1231
 7    2.7    3.1231    −.4231
 8    3.7    3.6341     .0659
You can verify that the residuals, as reported in the table, sum to −.0002, which is pretty close to zero given the inherent rounding error.
(iii) When ACT = 20, GPA = .5681 + .1022(20) ≈ 2.61.

(iv) The sum of squared residuals, SSR = Σ ûi², is about .4347 (rounded to four decimal places), and the total sum of squares, SST = Σ (yi − ȳ)², is about 1.0288. So the R-squared from the regression is

R² = 1 − SSR/SST ≈ 1 − (.4347/1.0288) ≈ .577.

Therefore, about 57.7% of the variation in GPA is explained by ACT in this small sample of students.

2.4 (i) When cigs = 0, predicted birth weight is 119.77 ounces. When cigs = 20, bwght = 109.49. This is about an 8.6% drop.

(ii) Not necessarily. There are many other factors that can affect birth weight, particularly overall health of the mother and quality of prenatal care. These could be correlated with cigarette smoking during pregnancy. Also, something such as caffeine consumption can affect birth weight, and might also be correlated with cigarette smoking.

(iii) If we want a predicted bwght of 125, then cigs = (125 − 119.77)/(−.514) ≈ −10.18, or about −10 cigarettes! This is nonsense, of course, and it shows what happens when we are trying to predict something as complicated as birth weight with only a single explanatory variable. The largest predicted birth weight is necessarily 119.77. Yet almost 700 of the births in the sample had a birth weight higher than 119.77.

(iv) 1,176 out of 1,388 women did not smoke while pregnant, or about 84.7%. Because we are using only cigs to explain birth weight, we have only one predicted birth weight at cigs = 0. The predicted birth weight is necessarily roughly in the middle of the observed birth weights at cigs = 0, and so we will under predict high birth weights.

2.5 (i) The intercept implies that when inc = 0, cons is predicted to be negative $124.84. This, of course, cannot be true, and reflects the fact that this consumption function might be a poor predictor of consumption at very low income levels. On the other hand, on an annual basis, $124.84 is not so far from zero.

(ii) Just plug 30,000 into the equation: cons = −124.84 + .853(30,000) = 25,465.16 dollars.

(iii) The MPC and the APC are shown in the following graph. Even though the intercept is negative, the smallest APC in the sample is positive. The graph starts at an annual income level of $1,000 (in 1970 dollars).

[Graph omitted: the MPC is constant at .853, while the APC (= cons/inc = .853 − 124.84/inc) is about .728 at inc = $1,000 and rises toward .853 as income grows.]
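The hand calculations in Problem 2.3 above can be verified with a few lines of code. This is a minimal sketch (the manual's own computations were done in Stata); the eight (ACT, GPA) pairs are the data from the textbook problem.

    import numpy as np

    # The eight (ACT, GPA) pairs from Problem 2.3.
    act = np.array([21, 24, 26, 27, 29, 25, 25, 30], dtype=float)
    gpa = np.array([2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7])

    xbar, ybar = act.mean(), gpa.mean()         # 25.875 and 3.2125
    sxy = np.sum((act - xbar) * (gpa - ybar))   # 5.8125
    sxx = np.sum((act - xbar) ** 2)             # 56.875

    b1 = sxy / sxx               # slope, about .1022
    b0 = ybar - b1 * xbar        # intercept, about .5681

    fitted = b0 + b1 * act
    resid = gpa - fitted         # sums to zero up to rounding
    ssr = np.sum(resid ** 2)                    # about .4347
    sst = np.sum((gpa - ybar) ** 2)             # about 1.0288
    print(b1, b0, 1 - ssr / sst)                # R-squared about .577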
2.6 (i) Yes. If living closer to an incinerator depresses housing prices, then being farther away increases housing prices.

(ii) If the city chose to locate the incinerator in an area away from more expensive neighborhoods, then log(dist) is positively correlated with housing quality. This would violate SLR.4, and OLS estimation is biased.

(iii) Size of the house, number of bathrooms, size of the lot, age of the home, and quality of the neighborhood (including school quality), are just a handful of factors. As mentioned in part (ii), these could certainly be correlated with dist [and log(dist)].

2.7 (i) When we condition on inc in computing an expectation, √inc becomes a constant. So E(u|inc) = E(√inc · e|inc) = √inc · E(e|inc) = √inc · 0 = 0 because E(e|inc) = E(e) = 0.

(ii) Again, when we condition on inc in computing a variance, √inc becomes a constant. So Var(u|inc) = Var(√inc · e|inc) = (√inc)² Var(e|inc) = σe² inc because Var(e|inc) = σe².

(iii) Families with low incomes do not have much discretion in spending; typically, a low-income family must spend on food, clothing, housing, and other necessities. Higher income people have more discretion, and some might choose more consumption while others more saving. This discretion suggests wider variability in saving among higher income families.
2.8 (i) From equation (2.66),

β̃1 = (Σ xiyi) / (Σ xi²),

where the sums run over i = 1, …, n. Plugging in yi = β0 + β1xi + ui gives

β̃1 = [Σ xi(β0 + β1xi + ui)] / (Σ xi²).

After standard algebra, the numerator can be written as

β0 Σ xi + β1 Σ xi² + Σ xiui.

Putting this over the denominator shows we can write β̃1 as

β̃1 = β0 (Σ xi)/(Σ xi²) + β1 + (Σ xiui)/(Σ xi²).

Conditional on the xi, we have

E(β̃1) = β0 (Σ xi)/(Σ xi²) + β1

because E(ui) = 0 for all i. Therefore, the bias in β̃1 is given by the first term in this equation. This bias is obviously zero when β0 = 0. It is also zero when Σ xi = 0, which is the same as x̄ = 0. In the latter case, regression through the origin is identical to regression with an intercept.

(ii) From the last expression for β̃1 in part (i) we have, conditional on the xi,

Var(β̃1) = Var(Σ xiui) / (Σ xi²)² = [Σ xi² Var(ui)] / (Σ xi²)² = [σ² Σ xi²] / (Σ xi²)² = σ² / (Σ xi²).

(iii) From (2.57), Var(β̂1) = σ² / [Σ (xi − x̄)²]. From the hint, Σ (xi − x̄)² ≤ Σ xi², and so Var(β̃1) ≤ Var(β̂1). A more direct way to see this is to write Σ (xi − x̄)² = Σ xi² − n(x̄)², which is less than Σ xi² unless x̄ = 0.

(iv) For a given sample size, the bias in β̃1 increases as x̄ increases (holding the sum of the xi² fixed). But as x̄ increases, the variance of β̂1 increases relative to Var(β̃1). The bias in β̃1 is also small when β0 is small. Therefore, whether we prefer β̃1 or β̂1 on a mean squared error basis depends on the sizes of β0, x̄, and n (in addition to the size of Σ xi²).
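A small simulation makes the bias/variance trade-off in Problem 2.8 concrete. This is an illustrative sketch only; the parameter values and the uniform design for x are my own assumptions, chosen so that x̄ is far from zero and the bias is visible.

    import numpy as np

    rng = np.random.default_rng(0)
    beta0, beta1, sigma, n, reps = 1.0, 0.5, 1.0, 50, 10_000
    x = rng.uniform(1.0, 3.0, size=n)   # nonzero mean, so the bias shows up

    b_origin, b_ols = [], []
    for _ in range(reps):
        u = rng.normal(0.0, sigma, size=n)
        y = beta0 + beta1 * x + u
        # Regression through the origin:
        b_origin.append(np.sum(x * y) / np.sum(x ** 2))
        # OLS with an intercept:
        b_ols.append(np.sum((x - x.mean()) * (y - y.mean()))
                     / np.sum((x - x.mean()) ** 2))

    # Through the origin: biased (bias = beta0 * sum(x) / sum(x**2)) but
    # with the smaller variance, exactly as parts (i)-(iii) predict.
    print(np.mean(b_origin) - beta1, np.var(b_origin))
    print(np.mean(b_ols) - beta1, np.var(b_ols))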
2.9 (i) We follow the hint, noting that the sample average of c1yi is c1ȳ and the sample average of c2xi is c2x̄. When we regress c1yi on c2xi (including an intercept) we use equation (2.19) to obtain the slope:

β̃1 = [Σ (c2xi − c2x̄)(c1yi − c1ȳ)] / [Σ (c2xi − c2x̄)²]
   = [c1c2 Σ (xi − x̄)(yi − ȳ)] / [c2² Σ (xi − x̄)²]
   = (c1/c2) · [Σ (xi − x̄)(yi − ȳ)] / [Σ (xi − x̄)²]
   = (c1/c2) β̂1.

From (2.17), we obtain the intercept as β̃0 = (c1ȳ) − β̃1(c2x̄) = (c1ȳ) − [(c1/c2)β̂1](c2x̄) = c1(ȳ − β̂1x̄) = c1β̂0, because the intercept from regressing yi on xi is (ȳ − β̂1x̄).

(ii) We use the same approach from part (i) along with the fact that the sample average of (c1 + yi) is (c1 + ȳ) and the sample average of (c2 + xi) is (c2 + x̄). Therefore, (c1 + yi) − (c1 + ȳ) = yi − ȳ and (c2 + xi) − (c2 + x̄) = xi − x̄. So c1 and c2 entirely drop out of the slope formula for the regression of (c1 + yi) on (c2 + xi), and β̃1 = β̂1. The intercept is β̃0 = (c1 + ȳ) − β̃1(c2 + x̄) = (c1 + ȳ) − β̂1(c2 + x̄) = (ȳ − β̂1x̄) + c1 − c2β̂1 = β̂0 + c1 − c2β̂1, which is what we wanted to show.

(iii) We can simply apply part (ii) because log(c1yi) = log(c1) + log(yi). In other words, replace c1 with log(c1), yi with log(yi), and set c2 = 0.

(iv) Again, we can apply part (ii) with c1 = 0 and replacing c2 with log(c2) and xi with log(xi). If β̂0 and β̂1 are the original intercept and slope, then β̃1 = β̂1 and β̃0 = β̂0 − log(c2)β̂1.
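The invariance results in Problem 2.9 are easy to confirm numerically. The data-generating process and the constants c1 and c2 below are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(10.0, 2.0, size=200)
    y = 3.0 + 0.7 * x + rng.normal(0.0, 1.0, size=200)

    def ols(xv, yv):
        """Return (intercept, slope) from a simple regression of yv on xv."""
        b1 = (np.sum((xv - xv.mean()) * (yv - yv.mean()))
              / np.sum((xv - xv.mean()) ** 2))
        return yv.mean() - b1 * xv.mean(), b1

    c1, c2 = 100.0, 2.54
    b0, b1 = ols(x, y)

    a0, a1 = ols(c2 * x, c1 * y)   # part (i): rescaling both variables
    print(np.isclose(a1, (c1 / c2) * b1), np.isclose(a0, c1 * b0))

    d0, d1 = ols(c2 + x, c1 + y)   # part (ii): adding constants instead
    print(np.isclose(d1, b1), np.isclose(d0, b0 + c1 - c2 * b1))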
2.10 (i) This derivation is essentially done in equation (2.52), once (1/SSTx) is brought inside the summation (which is valid because SSTx does not depend on i). Then, just define wi = di/SSTx.

(ii) Because Cov(β̂1, ū) = E[(β̂1 − β1)ū], we show that the latter is zero. But, from part (i), E[(β̂1 − β1)ū] = E[(Σ wiui)ū] = Σ wiE(uiū). Because the ui are pairwise uncorrelated (they are independent), E(uiū) = E(ui²/n) = σ²/n (because E(uiuh) = 0 for i ≠ h). Therefore, Σ wiE(uiū) = Σ wi(σ²/n) = (σ²/n) Σ wi = 0.

(iii) The formula for the OLS intercept is β̂0 = ȳ − β̂1x̄ and, plugging in ȳ = β0 + β1x̄ + ū, gives β̂0 = (β0 + β1x̄ + ū) − β̂1x̄ = β0 + ū − (β̂1 − β1)x̄.

(iv) Because β̂1 and ū are uncorrelated,

Var(β̂0) = Var(ū) + Var(β̂1)x̄² = σ²/n + (σ²/SSTx)x̄² = σ²/n + σ²x̄²/SSTx,

which is what we wanted to show.

(v) Using the hint and substitution gives

Var(β̂0) = σ²[(SSTx/n) + x̄²]/SSTx = σ²[n⁻¹ Σ xi² − x̄² + x̄²]/SSTx = σ²(n⁻¹ Σ xi²)/SSTx.
2.11 (i) We would want to randomly assign the number of hours in the preparation course so that hours is independent of other factors that affect performance on the SAT. Then, we would collect information on SAT score for each student in the experiment, yielding a data set {(sati, hoursi): i = 1, …, n}, where n is the number of students we can afford to have in the study. From equation (2.7), we should try to get as much variation in hoursi as is feasible.

(ii) Here are three factors: innate ability, family income, and general health on the day of the exam. If students with higher native intelligence think they do not need to prepare for the SAT, then ability and hours will be negatively correlated. Family income would probably be positively correlated with hours, because higher income families can more easily afford preparation courses. Ruling out chronic health problems, health on the day of the exam should be roughly uncorrelated with hours spent in a preparation course.

(iii) If preparation courses are effective, β1 should be positive: other factors equal, an increase in hours should increase sat.

(iv) The intercept, β0, has a useful interpretation in this example: because E(u) = 0, β0 is the average SAT score for students in the population with hours = 0.
SOLUTIONS TO COMPUTER EXERCISES

C2.1 (i) The average prate is about 87.36 and the average mrate is about .732.

(ii) The estimated equation is
prate = 83.05 + 5.86 mrate
n = 1,534, R2 = .075.

(iii) The intercept implies that, even if mrate = 0, the predicted participation rate is 83.05 percent. The coefficient on mrate implies that a one-dollar increase in the match rate (a fairly large increase) is estimated to increase prate by 5.86 percentage points. This assumes, of course, that this change in prate is possible (if, say, prate is already at 98, this interpretation makes no sense).

(iv) If we plug mrate = 3.5 into the equation we get prate = 83.05 + 5.86(3.5) = 103.59. This is impossible, as we can have at most a 100 percent participation rate. This illustrates that, especially when dependent variables are bounded, a simple regression model can give strange predictions for extreme values of the independent variable. (In the sample of 1,534 firms, only 34 have mrate ≥ 3.5.)
(v) mrate explains about 7.5% of the variation in prate. This is not much, and suggests that many other factors influence 401(k) plan participation rates. C2.2 (i) Average salary is about 865.864, which means $865,864 because salary is in thousands of dollars. Average ceoten is about 7.95. (ii) There are five CEOs with ceoten = 0. The longest tenure is 37 years. (iii) The estimated equation is
log(salary) = 6.51 + .0097 ceoten
n = 177, R2 = .013.

We obtain the approximate percentage change in salary given Δceoten = 1 by multiplying the coefficient on ceoten by 100: 100(.0097) = .97%. Therefore, one more year as CEO is predicted to increase salary by almost 1%.

C2.3 (i) The estimated equation is
sleep = 3,586.4 − .151 totwrk
n = 706, R2 = .103. The intercept implies that the estimated amount of sleep per week for someone who does not work is 3,586.4 minutes, or about 59.77 hours. This comes to about 8.5 hours per night.
(ii) If someone works two more hours per week then Δtotwrk = 120 (because totwrk is measured in minutes), and so Δsleep = −.151(120) = −18.12 minutes. This is only a few minutes a night. If someone were to work one more hour on each of five working days, Δsleep = −.151(300) = −45.3 minutes per week, or about nine minutes on each of those five nights.

C2.4 (i) Average salary is about $957.95 and average IQ is about 101.28. The sample standard deviation of IQ is about 15.05, which is pretty close to the population value of 15.

(ii) This calls for a level-level model:
wage = 116.99 + 8.30 IQ
n = 935, R2 = .096. An increase in IQ of 15 increases predicted monthly salary by 8.30(15) = $124.50 (in 1980 dollars). IQ score does not even explain 10% of the variation in wage. (iii) This calls for a log-level model:
log(wage) = 5.89 + .0088 IQ
n = 935, R2 = .099. If ΔIQ = 15 then Δlog(wage) = .0088(15) = .132, which is the (approximate) proportionate change in predicted wage. The percentage increase is therefore approximately 13.2%.

C2.5 (i) The constant elasticity model is a log-log model: log(rd) = β0 + β1 log(sales) + u, where β1 is the elasticity of rd with respect to sales.

(ii) The estimated equation is
log(rd) = −4.105 + 1.076 log(sales)
n = 32, R2 = .910. The estimated elasticity of rd with respect to sales is 1.076, which is just above one. A one percent increase in sales is estimated to increase rd by about 1.08%.

C2.6 (i) It seems plausible that another dollar of spending has a larger effect for low-spending schools than for high-spending schools. At low-spending schools, more money can go toward purchasing more books, computers, and for hiring better qualified teachers. At high levels of spending, we would expect little, if any, effect because the high-spending schools already have high-quality teachers, nice facilities, plenty of books, and so on.

(ii) If we take changes, as usual, we obtain
Δmath10 = β1 Δlog(expend) ≈ (β1/100)(%Δexpend),

just as in the second row of Table 2.3. So, if %Δexpend = 10, Δmath10 = β1/10.

(iii) The regression results are

math10 = −69.34 + 11.16 log(expend)
n = 408, R2 = .0297.
(iv) If expend increases by 10 percent, math10 increases by about 1.1 percentage points. This is not a huge effect, but it is not trivial for low-spending schools, where a 10 percent increase in spending might be a fairly small dollar amount.

(v) In this data set, the largest value of math10 is 66.7, which is not especially close to 100. In fact, the largest fitted value is only about 30.2.
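A quick computation clarifies the approximation used in C2.4 and C2.6: a log-level (or level-log) coefficient gives an approximate proportionate change, while the exact change is exp(β1 Δx) − 1. The numbers below are from C2.4(iii); the gap between the two versions grows with the size of the change.

    import numpy as np

    beta1, d_iq = 0.0088, 15
    approx = beta1 * d_iq              # .132, i.e. about 13.2%
    exact = np.exp(beta1 * d_iq) - 1   # about .141, i.e. 14.1%
    print(approx, exact)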
CHAPTER 3

TEACHING NOTES

For undergraduates, I do not work through most of the derivations in this chapter, at least not in detail. Rather, I focus on interpreting the assumptions, which mostly concern the population. Other than random sampling, the only assumption that involves more than population considerations is the assumption about no perfect collinearity, where the possibility of perfect collinearity in the sample (even if it does not occur in the population) should be touched on. The more important issue is perfect collinearity in the population, but this is fairly easy to dispense with via examples. These come from my experiences with the kinds of model specification issues that beginners have trouble with.

The comparison of simple and multiple regression estimates (based on the particular sample at hand, as opposed to their statistical properties) usually makes a strong impression. Sometimes I do not bother with the "partialling out" interpretation of multiple regression. As far as statistical properties, notice how I treat the problem of including an irrelevant variable: no separate derivation is needed, as the result follows from Theorem 3.1.

I do like to derive the omitted variable bias in the simple case. This is not much more difficult than showing unbiasedness of OLS in the simple regression case under the first four Gauss-Markov assumptions. It is important to get the students thinking about this problem early on, and before too many additional (unnecessary) assumptions have been introduced.

I have intentionally kept the discussion of multicollinearity to a minimum. This partly indicates my bias, but it also reflects reality. It is, of course, very important for students to understand the potential consequences of having highly correlated independent variables. But this is often beyond our control, except that we can ask less of our multiple regression analysis. If two or more explanatory variables are highly correlated in the sample, we should not expect to precisely estimate their ceteris paribus effects in the population. I find extensive treatments of multicollinearity, where one "tests" or somehow "solves" the multicollinearity problem, to be misleading, at best. Even the organization of some texts gives the impression that imperfect multicollinearity is somehow a violation of the Gauss-Markov assumptions: they include multicollinearity in a chapter or part of the book devoted to "violation of the basic assumptions," or something like that. I have noticed that master's students who have had some undergraduate econometrics are often confused on the multicollinearity issue. It is very important that students not confuse multicollinearity among the included explanatory variables in a regression model with the bias caused by omitting an important variable.

I do not prove the Gauss-Markov theorem. Instead, I emphasize its implications. Sometimes, and certainly for advanced beginners, I put a special case of Problem 3.12 on a midterm exam, where I make a particular choice for the function g(x). Rather than have the students directly compare the variances, they should appeal to the Gauss-Markov theorem for the superiority of OLS over any other linear, unbiased estimator.
SOLUTIONS TO PROBLEMS

3.1 (i) hsperc is defined so that the smaller it is, the lower the student's standing in high school. Everything else equal, the worse the student's standing in high school, the lower is his/her expected college GPA.

(ii) Just plug these values into the equation:
colgpa = 1.392 − .0135(20) + .00148(1050) = 2.676.
(iii) The difference between A and B is simply 140 times the coefficient on sat, because hsperc is the same for both students. So A is predicted to have a score .00148(140) ≈ .207 higher.

(iv) With hsperc fixed, Δcolgpa = .00148 Δsat. Now, we want to find Δsat such that Δcolgpa = .5, so .5 = .00148(Δsat) or Δsat = .5/(.00148) ≈ 338. Perhaps not surprisingly, a large ceteris paribus difference in SAT score (almost two and one-half standard deviations) is needed to obtain a predicted difference in college GPA of a half a point.
3.2 (i) Yes. Because of budget constraints, it makes sense that, the more siblings there are in a family, the less education any one child in the family has. To find the increase in the number of siblings that reduces predicted education by one year, we solve 1 = .094(Δsibs), so Δsibs = 1/.094 ≈ 10.6.

(ii) Holding sibs and feduc fixed, one more year of mother's education implies .131 years more of predicted education. So if a mother has four more years of education, her son is predicted to have about a half a year (.524) more years of education.

(iii) Since the number of siblings is the same, but meduc and feduc are both different, the coefficients on meduc and feduc both need to be accounted for. The predicted difference in education between B and A is .131(4) + .210(4) = 1.364.

3.3 (i) If adults trade off sleep for work, more work implies less sleep (other things equal), so β1 < 0.

(ii) The signs of β2 and β3 are not obvious, at least to me. One could argue that more educated people like to get more out of life, and so, other things equal, they sleep less (β2 < 0). The relationship between sleeping and age is more complicated than this model suggests, and economists are not in the best position to judge such things.

(iii) Since totwrk is in minutes, we must convert five hours into minutes: Δtotwrk = 5(60) = 300. Then sleep is predicted to fall by .148(300) = 44.4 minutes. For a week, 45 minutes less sleep is not an overwhelming change.
(iv) More education implies less predicted time sleeping, but the effect is quite small. If we assume the difference between college and high school is four years, the college graduate sleeps about 45 minutes less per week, other things equal.

(v) Not surprisingly, the three explanatory variables explain only about 11.3% of the variation in sleep. One important factor in the error term is general health. Another is marital status, and whether the person has children. Health (however we measure that), marital status, and number and ages of children would generally be correlated with totwrk. (For example, less healthy people would tend to work less.)

3.4 (i) A larger rank for a law school means that the school has less prestige; this lowers starting salaries. For example, a rank of 100 means there are 99 schools thought to be better.

(ii) β1 > 0, β2 > 0. Both LSAT and GPA are measures of the quality of the entering class. No matter where better students attend law school, we expect them to earn more, on average. β3, β4 > 0. The number of volumes in the law library and the tuition cost are both measures of the school quality. (Cost is less obvious than library volumes, but should reflect quality of the faculty, physical plant, and so on.)

(iii) This is just the coefficient on GPA, multiplied by 100: 24.8%.

(iv) This is an elasticity: a one percent increase in library volumes implies a .095% increase in predicted median starting salary, other things equal.

(v) It is definitely better to attend a law school with a lower rank. If law school A has a ranking 20 less than law school B, the predicted difference in starting salary is 100(.0033)(20) = 6.6% higher for law school A.

3.5 (i) No. By definition, study + sleep + work + leisure = 168. Therefore, if we change study, we must change at least one of the other categories so that the sum is still 168.

(ii) From part (i), we can write, say, study as a perfect linear function of the other independent variables: study = 168 − sleep − work − leisure. This holds for every observation, so MLR.3 is violated.

(iii) Simply drop one of the independent variables, say leisure:
GPA = β 0 + β1 study + β 2 sleep + β 3 work + u.
Now, for example, β1 is interpreted as the change in GPA when study increases by one hour, where sleep, work, and u are all held fixed. If we are holding sleep and work fixed but increasing study by one hour, then we must be reducing leisure by one hour. The other slope parameters have a similar interpretation.
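The failure of MLR.3 in Problem 3.5 can be demonstrated directly: with the identity study + sleep + work + leisure = 168, the regressor matrix (with an intercept) is rank deficient, so the OLS normal equations have no unique solution. The simulated hours below are arbitrary illustrative numbers.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 100
    study = rng.uniform(10, 40, size=n)
    sleep = rng.uniform(40, 60, size=n)
    work = rng.uniform(0, 40, size=n)
    leisure = 168 - study - sleep - work   # the identity behind the problem

    X = np.column_stack([np.ones(n), study, sleep, work, leisure])
    print(np.linalg.matrix_rank(X))        # 4, not 5: perfect collinearity

    # Dropping one category (say leisure), as in part (iii), restores full rank:
    X_drop = np.column_stack([np.ones(n), study, sleep, work])
    print(np.linalg.matrix_rank(X_drop))   # 4 = number of columns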
3.6 Conditioning on the outcomes of the explanatory variables, we have E(θ̂1) = E(β̂1 + β̂2) = E(β̂1) + E(β̂2) = β1 + β2 = θ1.
3.7 Only (ii), omitting an important variable, can cause bias, and this is true only when the omitted variable is correlated with the included explanatory variables. The homoskedasticity assumption, MLR.5, played no role in showing that the OLS estimators are unbiased. (Homoskedasticity was used to obtain the usual variance formulas for the β̂j.) Further, the degree of collinearity between the explanatory variables in the sample, even if it is reflected in a correlation as high as .95, does not affect the Gauss-Markov assumptions. Only if there is a perfect linear relationship among two or more explanatory variables is MLR.3 violated.
3.8 We can use Table 3.2. By definition, β2 > 0, and by assumption, Corr(x1, x2) < 0. Therefore, there is a negative bias in β̃1: E(β̃1) < β1. This means that, on average across different random samples, the simple regression estimator underestimates the effect of the training program. It is even possible that E(β̃1) is negative even though β1 > 0.
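The downward bias in Problem 3.8 can be illustrated with a simulation in which "training" is assigned more heavily to less able workers, so Corr(x1, x2) < 0 while β2 > 0. All parameter values here are illustrative assumptions, not from the text.

    import numpy as np

    rng = np.random.default_rng(3)
    n, reps, beta1, beta2 = 200, 5_000, 1.0, 2.0

    simple = []
    for _ in range(reps):
        abil = rng.normal(size=n)
        train = -0.5 * abil + rng.normal(size=n)   # negatively correlated
        y = beta1 * train + beta2 * abil + rng.normal(size=n)
        xd = train - train.mean()
        simple.append(np.sum(xd * (y - y.mean())) / np.sum(xd ** 2))

    # E(simple estimator) = beta1 + beta2 * delta1 with delta1 < 0 here, so
    # the average estimate falls well below beta1 = 1 (about 0.2 with these
    # numbers); with a stronger correlation it can even turn negative.
    print(np.mean(simple))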
3.9 (i) β1 < 0 because more pollution can be expected to lower housing values; note that β1 is the elasticity of price with respect to nox. β2 is probably positive because rooms roughly measures the size of a house. (However, it does not allow us to distinguish homes where each room is large from homes where each room is small.)

(ii) If we assume that rooms increases with quality of the home, then log(nox) and rooms are negatively correlated when poorer neighborhoods have more pollution, something that is often true. We can use Table 3.2 to determine the direction of the bias. If β2 > 0 and Corr(x1, x2) < 0, the simple regression estimator β̃1 has a downward bias. But because β1 < 0, this means that the simple regression, on average, overstates the importance of pollution. [E(β̃1) is more negative than β1.]

(iii) This is what we expect from the typical sample based on our analysis in part (ii). The simple regression estimate, −1.043, is more negative (larger in magnitude) than the multiple regression estimate, −.718. As those estimates are only for one sample, we can never know which is closer to β1. But if this is a "typical" sample, β1 is closer to −.718.
3.10 (i) Because x1 is highly correlated with x2 and x3, and these latter variables have large partial effects on y, the simple and multiple regression coefficients on x1 can differ by large amounts. We have not done this case explicitly, but given equation (3.46) and the discussion with a single omitted variable, the intuition is pretty straightforward.

(ii) Here we would expect β̃1 and β̂1 to be similar (subject, of course, to what we mean by "almost uncorrelated"). The amount of correlation between x2 and x3 does not directly affect the multiple regression estimate on x1 if x1 is essentially uncorrelated with x2 and x3.

(iii) In this case we are (unnecessarily) introducing multicollinearity into the regression: x2 and x3 have small partial effects on y, and yet x2 and x3 are highly correlated with x1. Adding x2 and x3 likely increases the standard error of the coefficient on x1 substantially, so se(β̂1) is likely to be much larger than se(β̃1).

(iv) In this case, adding x2 and x3 will decrease the residual variance without causing much collinearity (because x1 is almost uncorrelated with x2 and x3), so we should see se(β̂1) smaller than se(β̃1). The amount of correlation between x2 and x3 does not directly affect se(β̂1).
3.11 From equation (3.22) we have

β̃1 = (Σ r̂i1yi) / (Σ r̂i1²),

where the r̂i1 are defined in the problem. As usual, we must plug in the true model for yi:

β̃1 = [Σ r̂i1(β0 + β1xi1 + β2xi2 + β3xi3 + ui)] / (Σ r̂i1²).

The numerator of this expression simplifies because Σ r̂i1 = 0, Σ r̂i1xi2 = 0, and Σ r̂i1xi1 = Σ r̂i1². These all follow from the fact that the r̂i1 are the residuals from the regression of xi1 on xi2: the r̂i1 have zero sample average and are uncorrelated in sample with xi2. So the numerator of β̃1 can be expressed as

β1 Σ r̂i1² + β3 Σ r̂i1xi3 + Σ r̂i1ui.

Putting this back over the denominator gives

β̃1 = β1 + β3 (Σ r̂i1xi3)/(Σ r̂i1²) + (Σ r̂i1ui)/(Σ r̂i1²).

Conditional on all sample values of x1, x2, and x3, only the last term is random due to its dependence on ui. But E(ui) = 0, and so

E(β̃1) = β1 + β3 (Σ r̂i1xi3)/(Σ r̂i1²),

which is what we wanted to show. Notice that the term multiplying β3 is the regression coefficient from the simple regression of xi3 on r̂i1.
3.12 (i) The shares, by definition, add to one. If we do not omit one of the shares then the equation would suffer from perfect multicollinearity. The parameters would not have a ceteris paribus interpretation, as it is impossible to change one share while holding all of the other shares fixed.
(ii) Because each share is a proportion (and can be at most one, when all other shares are zero), it makes little sense to increase sharep by one unit. If sharep increases by .01 (which is equivalent to a one percentage point increase in the share of property taxes in total revenue), holding shareI, shareS, and the other factors fixed, then growth increases by β1(.01). With the other shares fixed, the excluded share, shareF, must fall by .01 when sharep increases by .01.
3.13 (i) For notational simplicity, define szx = Σ (zi − z̄)xi; this is not quite the sample covariance between z and x because we do not divide by n − 1, but we are only using it to simplify notation. Then we can write β̃1 as

β̃1 = [Σ (zi − z̄)yi] / szx.

This is clearly a linear function of the yi: take the weights to be wi = (zi − z̄)/szx. To show unbiasedness, as usual we plug yi = β0 + β1xi + ui into this equation and simplify:

β̃1 = [Σ (zi − z̄)(β0 + β1xi + ui)] / szx
   = [β0 Σ (zi − z̄) + β1szx + Σ (zi − z̄)ui] / szx
   = β1 + [Σ (zi − z̄)ui] / szx,

where we use the fact that Σ (zi − z̄) = 0 always. Now szx is a function of the zi and xi, and the expected value of each ui is zero conditional on all zi and xi in the sample. Therefore, conditional on these values,

E(β̃1) = β1 + [Σ (zi − z̄)E(ui)] / szx = β1

because E(ui) = 0 for all i.

(ii) From the final expression in part (i) we have (again conditional on the zi and xi in the sample),

Var(β̃1) = Var[Σ (zi − z̄)ui] / szx² = [Σ (zi − z̄)² Var(ui)] / szx² = σ² [Σ (zi − z̄)²] / szx²

because of the homoskedasticity assumption [Var(ui) = σ² for all i]. Given the definition of szx, this is what we wanted to show.

(iii) We know that Var(β̂1) = σ² / [Σ (xi − x̄)²]. Now we can rearrange the inequality in the hint, drop x̄ from the sample covariance, and cancel n − 1 everywhere, to get [Σ (zi − z̄)²]/szx² ≥ 1/[Σ (xi − x̄)²]. When we multiply through by σ² we get Var(β̃1) ≥ Var(β̂1), which is what we wanted to show.
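The conclusion of Problem 3.13 is also easy to see by simulation: the estimator based on z is unbiased when the errors are mean-independent of z and x, but its sampling variance exceeds that of OLS, consistent with part (iii). The design below is an arbitrary illustration.

    import numpy as np

    rng = np.random.default_rng(4)
    n, reps, beta0, beta1 = 100, 5_000, 1.0, 0.5
    x = rng.normal(size=n)
    z = x + rng.normal(size=n)   # correlated with x, so szx is nonzero

    szx = np.sum((z - z.mean()) * x)
    sxx = np.sum((x - x.mean()) ** 2)

    b_z, b_ols = [], []
    for _ in range(reps):
        y = beta0 + beta1 * x + rng.normal(size=n)
        b_z.append(np.sum((z - z.mean()) * y) / szx)
        b_ols.append(np.sum((x - x.mean()) * (y - y.mean())) / sxx)

    print(np.mean(b_z), np.var(b_z))      # unbiased, larger variance
    print(np.mean(b_ols), np.var(b_ols))  # unbiased, smaller variance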
SOLUTIONS TO COMPUTER EXERCISES

C3.1 (i) Probably β2 > 0, as more income typically means better nutrition for the mother and better prenatal care.
(ii) On the one hand, an increase in income generally increases the consumption of a good, and cigs and faminc could be positively correlated. On the other hand, family incomes are also higher for families with more education, and more education and cigarette smoking tend to be negatively correlated. The sample correlation between cigs and faminc is about −.173, indicating a negative correlation.

(iii) The regressions without and with faminc are
bwght = 119.77 − .514 cigs
n = 1,388, R2 = .023
bwght = 116.97 − .463 cigs + .093 faminc
n = 1,388, R2 = .030.
The effect of cigarette smoking is slightly smaller when faminc is added to the regression, but the difference is not great. This is due to the fact that cigs and faminc are not very correlated, and the coefficient on faminc is practically small. (The variable faminc is measured in thousands, so $10,000 more in 1988 income increases predicted birth weight by only .93 ounces.)
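For readers who want to reproduce regressions like those in C3.1 outside of Stata, here is a hedged Python sketch. The file name bwght.csv is a hypothetical CSV export of the book's BWGHT data; the variable names bwght, cigs, and faminc are the ones used in the text.

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("bwght.csv")   # hypothetical export of the BWGHT data

    simple = smf.ols("bwght ~ cigs", data=df).fit()
    multiple = smf.ols("bwght ~ cigs + faminc", data=df).fit()

    # The cigs coefficient should change only slightly (about -.514 to -.463)
    # because cigs and faminc are not very correlated.
    print(simple.params)
    print(multiple.params)
    print(df[["cigs", "faminc"]].corr())   # roughly -.173 off the diagonal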
C3.2 (i) The estimated equation is
price = −19.32 + .128 sqrft + 15.20 bdrms
n = 88, R2 = .632

(ii) Holding square footage constant, Δprice = 15.20 Δbdrms, and so price increases by 15.20, which means $15,200.

(iii) Now Δprice = .128 Δsqrft + 15.20 Δbdrms = .128(140) + 15.20 = 33.12, or $33,120. Because the size of the house is increasing, this is a much larger effect than in (ii).

(iv) About 63.2%.

(v) The predicted price is −19.32 + .128(2,438) + 15.20(4) = 353.544, or $353,544.

(vi) From part (v), the estimated value of the home based only on square footage and number of bedrooms is $353,544. The actual selling price was $300,000, which suggests the buyer underpaid by some margin. But, of course, there are many other features of a house (some that we cannot even measure) that affect price, and we have not controlled for these.
C3.3 (i) The constant elasticity equation is
log(salary) = 4.62 + .162 log(sales) + .107 log(mktval)
n = 177, R2 = .299.
(ii) We cannot include profits in logarithmic form because profits are negative for nine of the companies in the sample. When we add it in levels form we get

log(salary) = 4.69 + .161 log(sales) + .098 log(mktval) + .000036 profits
n = 177, R2 = .299.
The coefficient on profits is very small. Here, profits are measured in millions, so if profits increase by $1 billion, which means Δprofits = 1,000 (a huge change), predicted salary increases by about only 3.6%. However, remember that we are holding sales and market value fixed. Together, these variables (and we could drop profits without losing anything) explain almost 30% of the sample variation in log(salary). This is certainly not "most" of the variation.

(iii) Adding ceoten to the equation gives

log(salary) = 4.56 + .162 log(sales) + .102 log(mktval) + .000029 profits + .012 ceoten
n = 177, R2 = .318.
This means that one more year as CEO increases predicted salary by about 1.2%.

(iv) The sample correlation between log(mktval) and profits is about .78, which is fairly high. As we know, this causes no bias in the OLS estimators, although it can cause their variances to be large. Given the fairly substantial correlation between market value and firm profits, it is not too surprising that the latter adds nothing to explaining CEO salaries. Also, profits is a short-term measure of how the firm is doing, while mktval is based on past, current, and expected future profitability.
C3.4 (i) The minimum, maximum, and average values for these three variables are given in the table below:

Variable    Average    Minimum    Maximum
atndrte     81.71      6.25       100
priGPA      2.59       .          .
ACT         22.51      13         .

(ii) The estimated equation is
atndrte = 75.70 + 17.26 priGPA − 1.72 ACT
n = 680, R2 = .291.
The intercept means that, for a student whose prior GPA is zero and ACT score is zero, the predicted attendance rate is 75.7%. But this is clearly not an interesting segment of the population. (In fact, there are no students in the college population with priGPA = 0 and ACT = 0, or with values even close to zero.)

(iii) The coefficient on priGPA means that, if a student's prior GPA is one point higher (say, from 2.0 to 3.0), the attendance rate is about 17.3 percentage points higher. This holds ACT fixed. The negative coefficient on ACT is, perhaps, initially a bit surprising. Five more points on the ACT is predicted to lower attendance by 8.6 percentage points at a given level of priGPA. As priGPA measures performance in college (and, at least partially, could reflect past attendance rates), while ACT is a measure of potential in college, it appears that students that had more promise (which could mean more innate ability) think they can get by with missing lectures.

(iv) We have atndrte = 75.70 + 17.26(3.65) − 1.72(20) ≈ 104.3. Of course, a student cannot have higher than a 100% attendance rate. Getting predictions like this is always possible when using regression methods for dependent variables with natural upper or lower bounds. In practice, we would predict a 100% attendance rate for this student. (In fact, this student had an actual attendance rate of 87.5%.)

(v) The difference in predicted attendance rates for A and B is 17.26(3.1 − 2.1) − 1.72(21 − 26) = 25.86.
C3.5 The regression of educ on exper and tenure yields
educ = 13.57 − .074 exper + .048 tenure + r̂1.
n = 526, R2 = .101.
Now, when we regress log(wage) on r̂1 we obtain

log(wage) = 1.62 + .092 r̂1
n = 526, R2 = .207.
As expected, the coefficient on r̂1 in the second regression is identical to the coefficient on educ in equation (3.19). Notice that the R-squared from the above regression is below that in (3.19). In effect, the regression of log(wage) on r̂1 explains log(wage) using only the part of educ that is uncorrelated with exper and tenure; separate effects of exper and tenure are not included.
C3.6 (i) The slope coefficient from the regression of IQ on educ is (rounded to five decimal places) δ̃1 = 3.53383.

(ii) The slope coefficient from log(wage) on educ is β̃1 = .05984.

(iii) The slope coefficients from log(wage) on educ and IQ are β̂1 = .03912 and β̂2 = .00586, respectively.

(iv) We have β̂1 + δ̃1β̂2 = .03912 + 3.53383(.00586) ≈ .05983, which is very close to .05984; the small difference is due to rounding error.
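Both facts used in C3.5 and C3.6 are exact algebraic identities in any sample, which a short script can confirm. The simulated data below are stand-ins (the exercises themselves use the book's WAGE1 and WAGE2 data sets); sm.OLS and add_constant are from the statsmodels package.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    n = 500
    iq = rng.normal(100, 15, size=n)
    educ = 8 + 0.05 * iq + rng.normal(0, 2, size=n)
    lwage = 1 + 0.04 * educ + 0.006 * iq + rng.normal(0, 0.3, size=n)

    # Multiple regression of lwage on educ and IQ:
    b = sm.OLS(lwage, sm.add_constant(np.column_stack([educ, iq]))).fit().params
    b1, b2 = b[1], b[2]

    # Simple regressions of lwage on educ, and of IQ on educ:
    g1 = sm.OLS(lwage, sm.add_constant(educ)).fit().params[1]
    d1 = sm.OLS(iq, sm.add_constant(educ)).fit().params[1]
    print(np.isclose(g1, b1 + d1 * b2))   # C3.6(iv): simple = multiple + d1*b2

    # C3.5: regress lwage on the residuals from educ on IQ; the slope
    # reproduces the multiple regression coefficient on educ exactly.
    r1 = educ - sm.OLS(educ, sm.add_constant(iq)).fit().fittedvalues
    print(np.isclose(sm.OLS(lwage, sm.add_constant(r1)).fit().params[1], b1))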
C3.7 (i) The results of the regression are
math10 = −20.36 + 6.23 log(expend) − .305 lnchprg
n = 408, R2 = .180.
The signs of the estimated slopes imply that more spending increases the pass rate (holding lnchprg fixed) and a higher poverty rate (proxied well by lnchprg) decreases the pass rate (holding spending fixed). These are what we expect.

(ii) As usual, the estimated intercept is the predicted value of the dependent variable when all regressors are set to zero. Setting lnchprg = 0 makes sense, as there are schools with low poverty rates. Setting log(expend) = 0 does not make sense, because it is the same as setting expend = 1, and spending is measured in dollars per student. Presumably this is well outside any sensible range. Not surprisingly, the prediction of a −20 pass rate is nonsensical.

(iii) The simple regression results are
math10 = −69.34 + 11.16 log(expend)
n = 408, R2 = .030
and the estimated spending effect is larger than it was in part (i), almost double.

(iv) The sample correlation between lexpend and lnchprg is about −.19, which means that, on average, high schools with poorer students spent less per student. This makes sense, especially in 1993 in Michigan, where school funding was essentially determined by local property tax collections.

(v) We can use equation (3.23). Because Corr(x1, x2) < 0, which means δ̃1 < 0, and β̂2 < 0, the simple regression estimate, β̃1, is larger than the multiple regression estimate, β̂1. Intuitively,
failing to account for the poverty rate leads to an overestimate of the effect of spending.
C3.8 (i) The average of prpblck is .113 with standard deviation .182; the average of income is 47,053.78 with standard deviation 13,179.29. It is evident that prpblck is a proportion and that income is measured in dollars.
(ii) The results from the OLS regression are
psoda = .956 + .115 prpblck + .0000016 income
n = 401, R2 = .064.

If, say, prpblck increases by .10 (ten percentage points), the price of soda is estimated to increase by .0115 dollars, or about 1.2 cents. While this does not seem large, there are communities with no black population and others that are almost all black, in which case the difference in psoda is estimated to be almost 11.5 cents.

(iii) The simple regression estimate on prpblck is .065, so the simple regression estimate is actually lower. This is because prpblck and income are negatively correlated (−.43) and income has a positive coefficient in the multiple regression.

(iv) To get a constant elasticity, income should be in logarithmic form. I estimate the constant elasticity model:
log(psoda) = −.794 + .122 prpblck + .077 log(income)
n = 401, R2 = .068.
If prpblck increases by .20, log(psoda) is estimated to increase by .20(.122) = .0244, or about 2.44 percent.
(v) The coefficient on prpblck falls to about .073 when prppov is added to the regression.

(vi) The correlation is about −.84, which makes sense because poverty rates are determined by income (but not directly in terms of median income).

(vii) There is no argument that they are highly correlated, but we are using them simply as controls to determine if there is price discrimination against blacks. In order to isolate the pure discrimination effect, we need to control for as many relevant factors as possible; including both variables makes sense.
CHAPTER 4

TEACHING NOTES

At the start of this chapter is a good time to remind students that a specific error distribution played no role in the results of Chapter 3. That is because only the first two moments were derived under the full set of Gauss-Markov assumptions. Nevertheless, normality is needed to obtain exact normal sampling distributions (conditional on the explanatory variables). I emphasize that the full set of CLM assumptions are used in this chapter, but that in Chapter 5 we relax the normality assumption and still perform approximately valid inference. One could argue that the classical linear model results could be skipped entirely, and that only large-sample analysis is needed. But, from a practical perspective, students still need to know where the t distribution comes from because virtually all regression packages report t statistics and obtain p-values off of the t distribution. I then find it very easy to cover Chapter 5 quickly, by just saying we can drop normality and still use t statistics and the associated p-values as being approximately valid. Besides, occasionally students will have to analyze smaller data sets, especially if they do their own small surveys for a term project.

It is crucial to emphasize that we test hypotheses about unknown population parameters. I tell my students that they will be punished if they write something like H0: β̂1 = 0 on an exam or, even worse, H0: .632 = 0.

One useful feature of Chapter 4 is its illustration of how to rewrite a population model so that it contains the parameter of interest in testing a single restriction. I find this is easier, both theoretically and practically, than computing variances that can, in some cases, depend on numerous covariance terms. The example of testing equality of the return to two- and four-year colleges illustrates the basic method, and shows that the respecified model can have a useful interpretation. Of course, some statistical packages now provide a standard error for linear combinations of estimates with a simple command, and that should be taught, too.

One can use an F test for single linear restrictions on multiple parameters, but this is less transparent than a t test and does not immediately produce the standard error needed for a confidence interval or for testing a one-sided alternative. The trick of rewriting the population model is useful in several instances, including obtaining confidence intervals for predictions in Chapter 6, as well as for obtaining confidence intervals for marginal effects in models with interactions (also in Chapter 6).

The major league baseball player salary example illustrates the difference between individual and joint significance when explanatory variables (rbisyr and hrunsyr in this case) are highly correlated. I tend to emphasize the R-squared form of the F statistic because, in practice, it is applicable a large percentage of the time, and it is much more readily computed. I do regret that this example is biased toward students in countries where baseball is played. Still, it is one of the better examples of multicollinearity that I have come across, and students of all backgrounds seem to get the point.
SOLUTIONS TO PROBLEMS

4.1 (i) and (iii) generally cause the t statistics not to have a t distribution under H0. Homoskedasticity is one of the CLM assumptions. An important omitted variable violates Assumption MLR.3. The CLM assumptions contain no mention of the sample correlations among independent variables, except to rule out the case where the correlation is one.

4.2 (i) H0: β3 = 0. H1: β3 > 0. (ii) The proportionate effect on salary is .00024(50) = .012. To obtain the percentage effect, we multiply this by 100: 1.2%. Therefore, a 50 point ceteris paribus increase in ros is predicted to increase salary by only 1.2%. Practically speaking, this is a very small effect for such a large change in ros. (iii) The 10% critical value for a one-tailed test, using df = ∞, is obtained from Table G.2 as 1.282. The t statistic on ros is .00024/.00054 ≈ .44, which is well below the critical value. Therefore, we fail to reject H0 at the 10% significance level. (iv) Based on this sample, the estimated ros coefficient appears to be different from zero only because of sampling variation. On the other hand, including ros may not be causing any harm; it depends on how correlated it is with the other independent variables (although these are very significant even with ros in the equation).

4.3 (i) Holding profmarg fixed, Δrdintens = .321 Δlog(sales) = (.321/100)[100·Δlog(sales)] ≈ .00321(%Δsales). Therefore, if %Δsales = 10,
Δrdintens ≈ .032, or only about 3/100 of a percentage point. For such a large percentage increase in sales, this seems like a practically small effect.
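The Table G.2 critical values cited in Problem 4.2 and in part (ii) just below can be reproduced with scipy; this is my quick check, not part of the text:

from scipy import stats

# 10% one-tailed critical value with df = infinity (Problem 4.2): about 1.282.
print(stats.norm.ppf(0.90))

# 5% and 10% one-tailed critical values with df = 29 (part (ii) below): 1.699 and 1.311.
print(stats.t.ppf(0.95, 29), stats.t.ppf(0.90, 29))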
(ii) H0: β1 = 0 versus H1: β1 > 0, where β1 is the population slope on log(sales). The t statistic is .321/.216 ≈ 1.486. The 5% critical value for a one-tailed test, with df = 32 − 3 = 29, is obtained from Table G.2 as 1.699; so we cannot reject H0 at the 5% level. But the 10% critical value is 1.311; since the t statistic is above this value, we reject H0 in favor of H1 at the 10% level. (iii) Not really. Its t statistic is only 1.087, which is well below even the 10% critical value for a one-tailed test.

4.4 (i) H0: β3 = 0. H1: β3 ≠ 0. (ii) Other things equal, a larger population increases the demand for rental housing, which should increase rents. The demand for overall housing is higher when average income is higher, pushing up the cost of housing, including rental rates.
(iii) The coefficient on log(pop) is an elasticity. A correct statement is that “a 10% increase in population increases rent by .066(10) = .66%.” (iv) With df = 64 − 4 = 60, the 1% critical value for a two-tailed test is 2.660. The t statistic is about 3.29, which is well above the critical value. So β3 is statistically different from zero at the 1% level.

4.5 (i) .412 ± 1.96(.094), or about .228 to .596. (ii) No, because the value .4 is well inside the 95% CI. (iii) Yes, because 1 is well outside the 95% CI.

4.6 (i) With df = n − 2 = 86, we obtain the 5% critical value from Table G.2 with df = 90. Because each test is two-tailed, the critical value is 1.987. The t statistic for H0: β0 = 0 is about .89, which is much less than 1.987 in absolute value. Therefore, we fail to reject H0: β0 = 0. The t statistic for H0: β1 = 1 is (.976 − 1)/.049 ≈ −.49, which is even less significant. (Remember, we reject H0 in favor of H1 in this case only if |t| > 1.987.) (ii) We use the SSR form of the F statistic. We are testing q = 2 restrictions and the df in the unrestricted model is 86. We are given SSRr = 209,448.99 and SSRur = 165,644.51. Therefore,
F = [(209,448.99 − 165,644.51)/165,644.51](86/2) ≈ 11.37,
which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is 4.85. (iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions and there are 88 − 5 = 83 df in the unrestricted model. The F statistic is [(.829 − .820)/(1 − .829)](83/3) ≈ 1.46. The 10% critical value (again using 90 denominator df in Table G.3a) is 2.15, so we fail to reject H0 at even the 10% level. In fact, the p-value is about .23. (iv) If heteroskedasticity were present, Assumption MLR.5 would be violated, and the F statistic would not have an F distribution under the null hypothesis. Therefore, comparing the F statistic against the usual critical values, or obtaining the p-value from the F distribution, would not be especially meaningful.

4.7 (i) While the standard error on hrsemp has not changed, the magnitude of the coefficient has increased by half. The t statistic on hrsemp has gone from about −1.47 to −2.21, so now the coefficient is statistically less than zero at the 5% level. (From Table G.2 the 5% critical value with 40 df is −1.684. The 1% critical value is −2.423, so the p-value is between .01 and .05.) (ii) If we add and subtract β2 log(employ) from the right-hand side and collect terms, we have
log(scrap) = β0 + β1 hrsemp + [β2 log(sales) − β2 log(employ)] + [β2 log(employ) + β3 log(employ)] + u
           = β0 + β1 hrsemp + β2 log(sales/employ) + (β2 + β3) log(employ) + u,
where the second equality follows from the fact that log(sales/employ) = log(sales) − log(employ). Defining θ3 ≡ β2 + β3 gives the result. (iii) No. We are interested in the coefficient on log(employ), which has a t statistic of .2, which is very small. Therefore, we conclude that the size of the firm, as measured by employees, does not matter, once we control for training and sales per employee (in a logarithmic functional form). (iv) The null hypothesis in the model from part (ii) is H0: β2 = −1. The t statistic is [−.951 − (−1)]/.37 = (1 − .951)/.37 ≈ .132; this is very small, and we fail to reject whether we specify a one- or two-sided alternative.
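The add-and-subtract trick in part (ii) is easy to demonstrate with simulated data. The sketch below is mine, not from the text: it does not use the JTRAIN file, and the population values are chosen so that θ3 = β2 + β3 = 0. Regressing on log(sales/employ) and log(employ) makes the reported coefficient and t statistic on log(employ) refer directly to θ3; equivalently, a linear-combination command gives the same test in the original parameterization.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
hrsemp = rng.uniform(0, 40, n)
lsales = rng.normal(17.0, 1.0, n)
lemploy = rng.normal(4.0, 0.5, n)
u = rng.normal(0, 0.5, n)

# Simulated population: beta2 = -.9, beta3 = .9, so theta3 = beta2 + beta3 = 0.
lscrap = 0.5 - 0.02 * hrsemp - 0.9 * lsales + 0.9 * lemploy + u

# Reparameterized regression: the coefficient on lemploy is theta3.
X = sm.add_constant(np.column_stack([hrsemp, lsales - lemploy, lemploy]))
res = sm.OLS(lscrap, X).fit()
print(res.params[3], res.tvalues[3])  # estimate of theta3 and its t statistic

# Same test via a linear combination in the original parameterization.
X0 = sm.add_constant(np.column_stack([hrsemp, lsales, lemploy]))
res0 = sm.OLS(lscrap, X0).fit()
print(res0.t_test("x2 + x3 = 0"))  # default exog names: const, x1, x2, x3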
4.8 (i) We use Property VAR.3 from Appendix B: Var(β̂1 − 3β̂2) = Var(β̂1) + 9 Var(β̂2) − 6 Cov(β̂1, β̂2).