Lecture 6: Multiple Regression Analysis, Advanced Topics

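Multiple Regression Analysis: Further Issues

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$$

6.1 Effects of Data Scaling on OLS Statistics

• Changing the scale of the y variable leads to a corresponding change in the scale of the coefficients and standard errors, so significance and interpretation do not change.
• Changing the scale of one x variable leads to a change in the scale of that coefficient and its standard error, so significance and interpretation do not change.

The OLS statistics involved:

$$\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^{n}(x_i-\bar x)^2}, \qquad \operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{SST_j\,(1-R_j^2)}, \qquad R^2 = \frac{SSE}{SST}$$

$$t = \frac{\hat\beta_j}{\operatorname{se}(\hat\beta_j)}, \qquad F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}, \qquad SER = \sqrt{\frac{SSR}{n-k-1}}$$

• When the dependent variable or an independent variable appears in logarithmic form, changing the unit of measurement does not affect the slope coefficients; only the intercept changes:

$$\log(c_1 y_i) = \log(c_1) + \log(y_i), \qquad \log(c_1 x_j) = \log(c_1) + \log(x_j)$$

Beta Coefficients

• Occasionally you'll see reference to a "standardized coefficient" or "beta coefficient", which has a specific meaning.
• The idea is to replace y and each x variable with a standardized version, i.e., subtract the mean and divide by the standard deviation.
• The coefficient then reflects the change in y, in standard deviations, for a one-standard-deviation change in x.

Beta Coefficients (cont.)

$$z_y = \hat b_1 z_1 + \hat b_2 z_2 + \dots + \hat b_k z_k + \text{error},$$

where $z_y$ denotes the z-score of $y$, $z_1$ is the z-score of $x_1$, and so on, and

$$\hat b_j = (\hat\sigma_j/\hat\sigma_y)\,\hat\beta_j \quad \text{for } j = 1, \dots, k$$

are called beta coefficients.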
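The link between raw and standardized coefficients can be checked numerically. The sketch below is a minimal illustration in plain Python, restricted to a single regressor to stay short. The data and the helper names `ols_slope` and `zscores` are hypothetical and not part of the lecture. It computes the beta coefficient two ways (rescaling the raw slope, and regressing z-scores on z-scores) and shows that changing the units of x changes the raw slope but not the beta coefficient.

```python
from statistics import mean, stdev

def ols_slope(x, y):
    """Slope of the simple OLS regression of y on x (with an intercept)."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

def zscores(v):
    """Standardize: subtract the mean, divide by the sample standard deviation."""
    m, s = mean(v), stdev(v)
    return [(vi - m) / s for vi in v]

# Hypothetical toy data, not taken from HPRICE2.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

slope = ols_slope(x, y)
beta_formula = slope * stdev(x) / stdev(y)       # (sigma_x / sigma_y) * slope
beta_zscore = ols_slope(zscores(x), zscores(y))  # slope after standardizing

# Measuring x in units 100 times smaller divides the raw slope by 100 ...
x_scaled = [100.0 * xi for xi in x]
slope_scaled = ols_slope(x_scaled, y)
# ... but leaves the beta coefficient unchanged.
beta_scaled = slope_scaled * stdev(x_scaled) / stdev(y)

print(slope, slope_scaled, beta_formula, beta_zscore, beta_scaled)
```

With several regressors the same rescaling applies coefficient by coefficient; Stata's `beta` option reports the result directly.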

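Example 6.1: Effects of Pollution on Housing Prices (dataset: HPRICE2)

Stata command (regression reporting standardized coefficients):

    reg price nox crime rooms dist stratio, beta

6.2 More on Functional Form

• OLS can be used for relationships that are not strictly linear in x and y by using nonlinear functions of x and y; the model is still linear in the parameters.
• Can take the natural log of x, y, or both.
• Can use quadratic forms of x.
• Can use interactions of x variables.

Interpretation of Log Models

• $\ln(y) = \beta_0 + \beta_1 \ln(x) + u$: $\beta_1$ is the elasticity of y with respect to x.
• $\ln(y) = \beta_0 + \beta_1 x + u$: $100\beta_1$ is approximately the percentage change in y given a 1-unit change in x.
• $y = \beta_0 + \beta_1 \ln(x) + u$: $\beta_1/100$ is approximately the change in y for a 1 percent change in x.

Why use log models?

1. Taking natural logs gives the coefficients an appealing interpretation: they can be read directly as elasticities.
2. Slope coefficients do not change with the units of measurement.
3. Taking logs can mitigate heteroskedasticity even when it does not eliminate it.
4. Taking logs usually narrows the range of a variable, in some cases considerably.
5. Drawbacks: the variable cannot take zero or negative values, and it is harder to predict the original variable, because the model predicts log(y) rather than y.

Quadratic Models

• For a model of the form $y = \beta_0 + \beta_1 x + \beta_2 x^2 + u$ we can't interpret $\beta_1$ alone as measuring the change in y with respect to x; we need to take $\beta_2$ into account as well, since

$$\Delta\hat y \approx (\hat\beta_1 + 2\hat\beta_2 x)\,\Delta x, \quad\text{so}\quad \frac{\Delta\hat y}{\Delta x} \approx \hat\beta_1 + 2\hat\beta_2 x.$$

When $\hat\beta_1 + 2\hat\beta_2 x = 0$, that is, at $x = -\hat\beta_1/(2\hat\beta_2)$, y reaches its maximum or minimum value.

More on Quadratic Models

• Suppose that the coefficient on x is positive and the coefficient on $x^2$ is negative.
• Then y is increasing in x at first, but will eventually turn around and be decreasing in x.
• For $\hat\beta_1 > 0$ and $\hat\beta_2 < 0$ the turning point will be at $x^* = |\hat\beta_1/(2\hat\beta_2)|$.

More on Quadratic Models

• Suppose that the coefficient on x is negative and the coefficient on $x^2$ is positive.
• Then y is decreasing in x at first, but will eventually turn around and be increasing in x.
• For $\hat\beta_1 < 0$ and $\hat\beta_2 > 0$ the turning point will be at $x^* = |\hat\beta_1/(2\hat\beta_2)|$, which is the same as when $\hat\beta_1 > 0$ and $\hat\beta_2 < 0$.

How to describe a decreasing effect

How to describe an increasing effect

Quadratic model example: effects of pollution on housing prices (dataset: HPRICE2)

$$\widehat{\log(price)} = 13.39 - 0.902\,\log(nox) - 0.087\,\log(dist) - 0.545\,rooms + 0.062\,rooms^2 - 0.048\,stratio$$

Standard errors, in the same order: (0.57), (0.115), (0.043), (0.165), (0.013), (0.006). $n = 506$, $R^2 = 0.603$.

$$\Delta\widehat{\log(price)} \approx [-0.545 + 2(0.062)\,rooms]\,\Delta rooms$$

$$\%\Delta\widehat{price} \approx 100\,[-0.545 + 2(0.062)\,rooms]\,\Delta rooms = (-54.5 + 12.4\,rooms)\,\Delta rooms$$

To the right of $rooms^* = 0.545/(2 \times 0.062) \approx 4.4$, an additional room has an increasing effect on the percentage change in price.

• For example, going from 5 to 6 rooms raises price by about $-54.5 + 12.4 \times 5 = 7.5\%$; going from 6 to 7 rooms raises price by about $-54.5 + 12.4 \times 6 = 19.9\%$. This is a very strong increasing effect.

Stata commands:

    gen rooms2 = rooms*rooms
    gen ldist = log(dist)
    reg lprice lnox ldist rooms rooms2 stratio
    // turning point
    display -1*_b[rooms]/(2*_b[rooms2])
    // percentage increase in price when rooms rises from 6 to 7
    display 100*(_b[rooms] + 2*_b[rooms2]*6)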
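The turning point and the marginal effects in the housing-price example can be reproduced with a few lines of arithmetic. The sketch below is in Python rather than Stata so that it runs without the HPRICE2 data. It hard-codes the two reported coefficients on rooms and rooms squared, and the function names are illustrative only.

```python
# Coefficients on rooms and rooms^2 as reported in the housing-price example.
B_ROOMS = -0.545
B_ROOMS2 = 0.062

def turning_point(b1, b2):
    """Value of x at which b1 + 2*b2*x equals zero."""
    return -b1 / (2.0 * b2)

def pct_effect(b1, b2, x):
    """Approximate % change in price from one more room, starting at x rooms."""
    return 100.0 * (b1 + 2.0 * b2 * x)

rooms_star = turning_point(B_ROOMS, B_ROOMS2)
effect_5_to_6 = pct_effect(B_ROOMS, B_ROOMS2, 5)
effect_6_to_7 = pct_effect(B_ROOMS, B_ROOMS2, 6)

# Prints: 4.4 7.5 19.9
print(round(rooms_star, 1), round(effect_5_to_6, 1), round(effect_6_to_7, 1))
```

These figures are approximations based on the derivative. The exact change from x to x + 1 rooms also includes the squared-term difference, so it would differ somewhat.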

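Interaction Terms (models with interaction terms)

• For a model of the form $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + u$ we can't interpret $\beta_1$ alone as measuring the change in y with respect to $x_1$; we need to take $\beta_3$ into account as well, since

$$\frac{\Delta y}{\Delta x_1} = \beta_1 + \beta_3 x_2,$$

so to summarize the effect of $x_1$ on y we typically evaluate the above at $\bar x_2$.

• Interaction effects usually call for reparameterizing the model: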

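Original model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + u$$

Here $\beta_2$ is the partial effect of $x_2$ on y when $x_1 = 0$, which is usually not of interest. We instead reparameterize the model as

$$y = \alpha_0 + \delta_1 x_1 + \delta_2 x_2 + \beta_3 (x_1 - \mu_1)(x_2 - \mu_2) + u,$$

where $\mu_1$ and $\mu_2$ are the population means of $x_1$ and $x_2$. It is easy to show that $\delta_2 = \beta_2 + \beta_3 \mu_1$, so we immediately obtain the partial effect at the mean.

Example: Effects of Attendance on Final Exam Performance (dataset: ATTEND.DTA)

• The coefficient on atndrte is negative. Does this mean that attending class lowers the final exam score? No: $\beta_1$ only measures the effect when priGPA = 0.
• The t statistics on atndrte and on priGPA·atndrte are each insignificant. Does this mean that neither affects the final exam score? No: the p-value of the F test is 0.014.
• Partial effect of atndrte on stndfnl:

$$\hat\beta_1 + \hat\beta_6(2.59) = -0.0067 + 0.0056 \times 2.59 = 0.0078$$

Interpretation: at the average priGPA (2.59), a 10 percentage point increase in atndrte raises stndfnl by 0.078 standard deviations relative to the mean final exam score.

To obtain a standard error, define $\theta_1 = \beta_1 + 2.59\,\beta_6$, so that $\beta_1 = \theta_1 - 2.59\,\beta_6$, and substitute:

$$stndfnl = \beta_0 + (\theta_1 - 2.59\,\beta_6)\,atndrte + \dots + \beta_6\,priGPA \cdot atndrte + u = \beta_0 + \theta_1\,atndrte + \dots + \beta_6\,(priGPA - 2.59)\cdot atndrte + u$$

$$\hat\theta_1 = 0.0078, \qquad \operatorname{se}(\hat\theta_1) = 0.0026, \qquad t = 3$$

Stata commands:

    sum priGPA
    gen priGPA2 = priGPA*priGPA
    gen ACT2 = ACT*ACT
    gen priatn = priGPA*atndrte
    reg stndfnl atndrte priGPA ACT priGPA2 ACT2 priatn

6.3 More on Goodness-of-Fit and Selection of Regressors

• Recall that $R^2$ never decreases, and usually increases, as more variables are added to the model.
• The adjusted $R^2$ takes into account the number of variables in a model, and may decrease.

$$R^2 = 1 - \frac{SSR/n}{SST/n}, \quad\text{which estimates}\quad 1 - \frac{\sigma_u^2}{\sigma_y^2}$$

$$\bar R^2 = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)} = 1 - \frac{\hat\sigma^2}{SST/(n-1)}$$

• Role of the adjusted $R^2$: it imposes a penalty for adding further independent variables to a model. When a new independent variable enters the regression, SSR falls, but so do the degrees of freedom, $df = n - k - 1$. Hence $SSR/(n-k-1)$ may rise or fall.
• As a result: when one new variable is added to a regression, the adjusted $R^2$ increases if and only if the t statistic on the new variable is greater than 1 in absolute value. When a group of variables is added, the adjusted $R^2$ increases if and only if the F statistic for the joint significance of the new variables is greater than 1.

Adjusted R-Squared (cont.)

• It's easy to see that the adjusted $R^2$ is just $1 - (1 - R^2)(n-1)/(n-k-1)$, but most packages will give you both $R^2$ and adj-$R^2$.
• You can compare the fit of two models (with the same y) by comparing the adj-$R^2$.
• You cannot use the adj-$R^2$ to compare models with different y's (e.g., y vs. ln(y)).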
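The adjusted R-squared formula is simple enough to check by hand. The sketch below is a minimal Python illustration with two hypothetical helper functions. It applies the formula to the sample size and R-squared reported for the housing-price regression (n = 506, R² = 0.603, five regressors). It then applies it to an invented case, not taken from the lecture, in which a sixth regressor raises R² only to 0.6031.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared from R-squared, sample size n, and k regressors."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def adjusted_r2_from_ss(ssr, sst, n, k):
    """The same quantity computed from the residual and total sums of squares."""
    return 1.0 - (ssr / (n - k - 1)) / (sst / (n - 1))

# n and R-squared as reported for the housing-price regression (k = 5).
base = adjusted_r2(0.603, 506, 5)

# Hypothetical: a sixth regressor that nudges R-squared up only to 0.6031.
bigger = adjusted_r2(0.6031, 506, 6)

print(round(base, 4), round(bigger, 4), bigger < base)
```

In the invented case R-squared rises while the adjusted R-squared falls. That pattern is what the t-statistic rule predicts for an added variable whose |t| is below 1.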

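Using adjusted R-squared to choose between nonnested models

• Two nonnested sets of regressors: $\bar R^2 = 0.6211$ versus $\bar R^2 = 0.6226$.
• Two functional forms: $R^2 = 0.061$, $\bar R^2 = 0.03$ versus $R^2 = 0.148$, $\bar R^2 = 0.09$.
• Different dependent variables: $SST_{salary} = 391{,}732{,}982$ versus $SST_{lsalary} = 66.72$.

Controlling for Too Many Factors in Regression Analysis

• It is important not to fixate too much on adj-$R^2$ and lose sight of theory and common sense.
• If economic theory clearly predicts a variable belongs, generally leave it in.
• Don't include a variable that prohibits a sensible interpretation of the variable of interest; remember the ceteris paribus interpretation of multiple regression.
• In a regression model of the effect of the beer tax on traffic fatality rates, should per capita beer consumption be included in the model?
• Holding beercons fixed, the coefficient would measure the difference in fatality rates due to a one percentage point increase in tax. Does that statement make sense?

Adding Regressors to Reduce the Error Variance

• Adding a new independent variable to a regression can worsen multicollinearity; on the other hand, taking factors out of the error term and using them as explanatory variables reduces the error variance.
• We should include independent variables that affect y and are uncorrelated with all of the independent variables of interest.

6.4 Prediction and Residual Analysis

• Suppose we want to use our estimates to obtain a specific prediction.
• First, suppose that we want an estimate of $E(y \mid x_1 = c_1, \dots, x_k = c_k) = \theta_0 = \beta_0 + \beta_1 c_1 + \dots + \beta_k c_k$.
• This is easy to obtain by substituting the c's for the x's in our estimated model, but what about a standard error?
• Really just a test of a linear combination.

Predictions (cont.)

• Can rewrite as $\beta_0 = \theta_0 - \beta_1 c_1 - \dots - \beta_k c_k$.
• Substitute in to obtain $y = \theta_0 + \beta_1 (x_1 - c_1) + \dots + \beta_k (x_k - c_k) + u$.
• So, if you regress $y_i$ on $(x_{ij} - c_j)$, the intercept will give the predicted value and its standard error.
• Note that the standard error will be smallest when the c's equal the means of the x's.

Example: CI for predicted college GPA

• Regressors are recentered at the values of interest: $sat0 = sat - 1200$, $hsperc0 = hsperc - 30$, $hsize0 = hsize - 5$.
• 95% CI: $2.70 \pm 1.96\,(0.020)$, that is, from 2.66 to 2.74.

Predictions (cont.)

• This standard error for the expected value is not the same as a standard error for a new outcome on y.
• We also need to take into account the variance in the unobserved error.

Let the prediction error be

$$\hat e^0 = y^0 - \hat y^0 = (\beta_0 + \beta_1 x_1^0 + \dots + \beta_k x_k^0) + u^0 - \hat y^0.$$

$$E(\hat y^0) = E(\hat\beta_0) + E(\hat\beta_1)x_1^0 + \dots + E(\hat\beta_k)x_k^0 = \beta_0 + \beta_1 x_1^0 + \dots + \beta_k x_k^0, \quad\text{so}\quad E(\hat e^0) = 0$$

$$\operatorname{Var}(\hat e^0) = \operatorname{Var}(\hat y^0) + \operatorname{Var}(u^0) = \operatorname{Var}(\hat y^0) + \sigma^2$$

$$\operatorname{se}(\hat e^0) = \left\{[\operatorname{se}(\hat y^0)]^2 + \hat\sigma^2\right\}^{1/2}$$

The variance of $\hat e^0$ has two sources. The first is the sampling error in $\hat y^0$, which arises because we estimate the $\beta_j$. Because $\operatorname{Var}(\hat\beta_j)$ is proportional to $1/n$, $\operatorname{Var}(\hat y^0)$ is proportional to $1/n$. The second is the variance of the population error, $\sigma^2$, which does not change with the sample size.

Under the classical linear model (CLM) assumptions, $\hat\beta_j$ and $u^0$ are normally distributed, so $\hat e^0$ is also normally distributed, and $\hat e^0/\operatorname{se}(\hat e^0)$ follows a t distribution with $n - k - 1$ degrees of freedom. Then

$$P\left[-t_{0.025} \le \frac{\hat e^0}{\operatorname{se}(\hat e^0)} \le t_{0.025}\right] = 0.95.$$

Prediction interval

• Given that $\hat e^0 = y^0 - \hat y^0$ and $\hat e^0/\operatorname{se}(\hat e^0) \sim t_{n-k-1}$, we have a 95% prediction interval for $y^0$: $\hat y^0 \pm t_{0.025} \cdot \operatorname{se}(\hat e^0)$.
• Usually the estimate of $\sigma^2$ is much larger than the variance of the prediction, thus
• this prediction interval will be a lot wider than the simple confidence interval for the prediction.

Residual Analysis

• Information can be obtained from looking at the residuals (i.e., predicted vs. observed): $\hat u_i = y_i - \hat y_i$.
• Example: regress price of cars on characteristics; big negative residuals indicate a good deal.
• Example: regress average earnings for students from a school on student characteristics; big positive residuals indicate greatest value added.
• For example, in the housing price model estimated with HPRICE1.RAW:

$$price = \beta_0 + \beta_1\,lotsize + \beta_2\,sqrft + \beta_3\,bdrms + u, \qquad \hat u_{81} = price_{81} - \widehat{price}_{81} = -120.206$$

Predicting y in a log model

• Simple exponentiation of the predicted ln(y) will underestimate the expected value of y.
• Instead we need to scale this up by an estimate of the expected value of exp(u).

$$\log(y) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u \;\Rightarrow\; y = \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)\exp(u)$$

$$E(y \mid \mathbf{x}) = \exp(\sigma^2/2)\,\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k) \quad\text{if } u \sim \text{Normal}(0, \sigma^2)$$

In this case we can predict y as follows:

$$\hat y = \exp(\hat\sigma^2/2)\,\exp\!\big(\widehat{\log y}\big)$$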
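A short numerical illustration of the scale-up for log models follows. It is a sketch under the assumption that the error term is normally distributed, in which case the expected value of exp(u) is exp(σ²/2). The function name `predict_level`, the fitted log value, and the variance estimate are all hypothetical.

```python
import math

def predict_level(pred_log_y, sigma2_hat):
    """Predict y from a fitted log(y), assuming normal errors:
    exp(sigma2_hat / 2) * exp(pred_log_y)."""
    return math.exp(sigma2_hat / 2.0) * math.exp(pred_log_y)

# Hypothetical fitted value of log(y) and estimate of the error variance.
pred_log_y = 4.0
sigma2_hat = 0.25

naive = math.exp(pred_log_y)                      # plain exponentiation
adjusted = predict_level(pred_log_y, sigma2_hat)  # scaled up by exp(sigma^2 / 2)

print(naive, adjusted, adjusted / naive)
```

The ratio of the two predictions equals exp(σ̂²/2). It is always at least 1, which is why plain exponentiation underpredicts the level of y.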
