Machine Learning Question Bank

I. Maximum Likelihood

1. ML estimation of an exponential model (10 points)
A Gaussian distribution is often used to model data on the real line, but it is sometimes inappropriate when the data are often close to zero but constrained to be nonnegative. In such cases one can fit an exponential distribution, whose probability density function is given by

p(x \mid b) = \frac{1}{b} e^{-x/b}.

Given N observations x_i drawn from such a distribution:
(a) Write down the likelihood as a function of the scale parameter b.
(b) Write down the derivative of the log likelihood.
(c) Give a simple expression for the ML estimate of b.

Solution:
(a) L(b) = \prod_{i=1}^{N} \frac{1}{b} e^{-x_i/b} = b^{-N} \exp\left(-\frac{1}{b}\sum_{i=1}^{N} x_i\right)
(b) \log L(b) = -N \log b - \frac{1}{b}\sum_{i=1}^{N} x_i, so \frac{d}{db}\log L(b) = -\frac{N}{b} + \frac{1}{b^2}\sum_{i=1}^{N} x_i
(c) Setting the derivative to zero gives \hat{b}_{ML} = \frac{1}{N}\sum_{i=1}^{N} x_i.

2. The same question for a Poisson distribution: p(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, x = 0, 1, 2, \ldots
The log likelihood of N observations is
\log p(x_1, \ldots, x_N \mid \lambda) = \sum_{i=1}^{N}\left(x_i \log\lambda - \lambda - \log x_i!\right) = \left(\sum_{i=1}^{N} x_i\right)\log\lambda - N\lambda - \sum_{i=1}^{N}\log x_i!,
and setting its derivative with respect to \lambda to zero gives \hat{\lambda}_{ML} = \frac{1}{N}\sum_{i=1}^{N} x_i.
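
Both closed-form estimates can be checked numerically. The short sketch below is only an illustration (it assumes NumPy, with made-up sample sizes and true parameters) and compares the closed-form estimates against a brute-force maximization of the exponential log likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential data with true scale b = 2.0, density p(x|b) = exp(-x/b) / b.
x_exp = rng.exponential(scale=2.0, size=1000)
b_ml = x_exp.mean()                      # closed-form ML estimate from part (c)

# Poisson data with true rate lambda = 3.0; the ML estimate is again the sample mean.
x_poi = rng.poisson(lam=3.0, size=1000)
lam_ml = x_poi.mean()

def exp_loglik(b, x):
    return -len(x) * np.log(b) - x.sum() / b

grid = np.linspace(0.5, 5.0, 2000)
b_grid = grid[np.argmax([exp_loglik(b, x_exp) for b in grid])]
print(b_ml, b_grid, lam_ml)              # b_ml and b_grid agree up to the grid resolution
```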

II. Bayes

1. Applying Bayes' rule
In a multiple-choice exam, suppose a student knows the correct answer with probability p and guesses with probability 1 - p. Assume that a student who knows the answer answers correctly with probability 1, and that a student who guesses picks the correct answer with probability 1/m, where m is the number of choices. Given that the student answered the question correctly, what is the probability that the student actually knew the answer?

Solution: By Bayes' rule,
p(\text{known} \mid \text{correct}) = \frac{p(\text{known}, \text{correct})}{p(\text{correct})} = \frac{p \cdot 1}{p \cdot 1 + (1-p) \cdot \frac{1}{m}} = \frac{mp}{mp + 1 - p}.
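
As a quick sanity check of that closed form, the following sketch (hypothetical values p = 0.6 and m = 4, NumPy assumed) estimates the same conditional probability by simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 0.6, 4, 200_000

knows = rng.random(n) < p
correct = np.where(knows, True, rng.random(n) < 1.0 / m)
estimate = knows[correct].mean()               # P(knows | answered correctly)
closed_form = m * p / (m * p + 1 - p)
print(estimate, closed_form)                   # the two agree up to sampling noise
```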

2. Conjugate priors
Given a likelihood p(x \mid \theta) for a class of models with parameters \theta, a conjugate prior is a distribution p(\theta \mid \gamma) with hyperparameters \gamma such that the posterior distribution
p(\theta \mid X, \gamma) \propto p(X \mid \theta)\, p(\theta \mid \gamma)
belongs to the same family of distributions as the prior.
(a) Suppose that the likelihood is given by the exponential distribution with rate parameter \lambda: p(x \mid \lambda) = \lambda e^{-\lambda x}. Show that the gamma distribution
\mathrm{Gamma}(\lambda \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha-1} e^{-\beta\lambda}
is a conjugate prior for the exponential. Derive the parameter update given observations x_1, \ldots, x_N, and the prediction distribution p(x_{N+1} \mid x_1, \ldots, x_N).

Solution: The likelihood of the N observations is
p(x_1, \ldots, x_N \mid \lambda) = \lambda^{N} \exp\left(-\lambda \sum_{i=1}^{N} x_i\right),
so the posterior is
p(\lambda \mid x_1, \ldots, x_N) \propto \lambda^{\alpha-1} e^{-\beta\lambda} \cdot \lambda^{N} \exp\left(-\lambda \sum_i x_i\right) \propto \mathrm{Gamma}\left(\lambda \mid \alpha + N,\ \beta + \sum_{i=1}^{N} x_i\right),
which is again a gamma distribution; the hyperparameter update is \alpha \to \alpha + N and \beta \to \beta + \sum_i x_i. The prediction distribution is obtained by integrating out \lambda:
p(x_{N+1} \mid x_1, \ldots, x_N) = \int p(x_{N+1} \mid \lambda)\, p(\lambda \mid x_1, \ldots, x_N)\, d\lambda = \frac{(\alpha + N)\left(\beta + \sum_i x_i\right)^{\alpha+N}}{\left(\beta + \sum_i x_i + x_{N+1}\right)^{\alpha+N+1}}.
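
The update and the predictive density can be illustrated numerically; the sketch below assumes NumPy and uses hypothetical prior hyperparameters (alpha = 2, beta = 1) chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha0, beta0 = 2.0, 1.0                       # Gamma(alpha, rate beta) prior on lambda
x = rng.exponential(scale=1 / 3.0, size=50)    # data with true rate lambda = 3

# Conjugate update derived above: alpha' = alpha + N, beta' = beta + sum(x)
alpha_n = alpha0 + len(x)
beta_n = beta0 + x.sum()

post_mean = alpha_n / beta_n                   # posterior mean of lambda, should be near 3
x_new = 0.5
pred = alpha_n * beta_n ** alpha_n / (beta_n + x_new) ** (alpha_n + 1)
print(post_mean, pred)                         # predictive density p(x_{N+1} = 0.5 | data)
```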

(c) Suppose p(\theta \mid \gamma) is a conjugate prior for the likelihood p(x \mid \theta). Show that the mixture prior
p(\theta \mid \gamma_1, \ldots, \gamma_M) = \sum_{m=1}^{M} w_m\, p(\theta \mid \gamma_m)
is also conjugate for the same likelihood, assuming the mixture weights w_m sum to 1.

Solution (a mixture prior): The prior is given by the mixture
p(\theta \mid \gamma_1, \ldots, \gamma_M) = \sum_{m=1}^{M} w_m\, p(\theta \mid \gamma_m).
Moreover, we are given that each p(\theta \mid \gamma_m) is a conjugate prior for the likelihood p(x \mid \theta); in other words,
p(\theta \mid x, \gamma_m) = \frac{p(x \mid \theta)\, p(\theta \mid \gamma_m)}{p(x \mid \gamma_m)} = p(\theta \mid \gamma_m')
for some updated hyperparameters \gamma_m'. When we multiply the mixture prior by the likelihood, we get the posterior
p(\theta \mid x, \gamma_1, \ldots, \gamma_M) \propto \sum_{m=1}^{M} w_m\, p(x \mid \theta)\, p(\theta \mid \gamma_m) = \sum_{m=1}^{M} w_m\, p(x \mid \gamma_m)\, p(\theta \mid \gamma_m'),
which, once normalized, has the same form as the prior: a mixture distribution with updated weights w_m' \propto w_m\, p(x \mid \gamma_m) and updated hyperparameters \gamma_m'.

(d) Repeat part (c) for the case where the prior is a single distribution and the likelihood is a mixture, and the prior is conjugate for each mixture component of the likelihood. (Some priors can be conjugate for several different likelihoods; for example, the beta is conjugate for the Bernoulli and for the geometric distributions, and the gamma is conjugate for the exponential and for the gamma with fixed shape parameter.)

(e) (Extra credit, 20) Explore the case where the likelihood is a mixture with fixed components and unknown weights, i.e., the weights are the parameters to be learned.

Problem 2
Consider the probability density function (or probability function, if X is discrete) of the exponential family:
p(x \mid \eta) = h(x)\, g(\eta) \exp\left(\eta^{\top} u(x)\right).
(a) Show that the univariate Gaussian and the multinomial distributions belong to this family.
(b) Show that, in a generative classification model, if the class-conditional densities belong to the exponential family, then the posterior distribution for a class is a softmax of a linear function of the feature vector x.
(c) Derive an expression for … . Where will this expression be … ?
(d) (For extra credit) A statistic s(X) is said to be sufficient for a parameter \eta if p(\eta \mid X) = p(\eta \mid s(X)), or in other words, \eta is independent of X given s(X); for a random variable X drawn from an exponential family density p(x \mid \eta), u(x) is a sufficient statistic for \eta. Show that a factorization p(x, \eta) = h_1(x)\, h_2(s(x), \eta) is necessary and sufficient for s(x) to be a sufficient statistic for \eta.
(e) (For extra credit) Suppose x_1, \ldots, x_N are drawn i.i.d. from an exponential family density p(x \mid \eta). What is the sufficient statistic s(x_1, \ldots, x_N) for \eta?
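
Part (b) above can be illustrated in the simplest case. The sketch below (NumPy assumed; the means, shared variance and prior are arbitrary choices) checks that two shared-variance Gaussian class-conditionals give a class posterior that is a sigmoid of a linear function of x.

```python
import numpy as np

# Two classes with 1-D Gaussian class-conditionals sharing a variance (members of the
# exponential family); the class posterior should then be a sigmoid of a linear function of x.
mu0, mu1, sigma2, prior1 = -1.0, 2.0, 1.5, 0.3

def gauss(x, mu):
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

x = np.linspace(-4.0, 4.0, 9)
post1 = prior1 * gauss(x, mu1) / (prior1 * gauss(x, mu1) + (1 - prior1) * gauss(x, mu0))

# The linear function predicted by the derivation: a(x) = w * x + b
w = (mu1 - mu0) / sigma2
b = (mu0 ** 2 - mu1 ** 2) / (2 * sigma2) + np.log(prior1 / (1 - prior1))
post1_sigmoid = 1.0 / (1.0 + np.exp(-(w * x + b)))
print(np.allclose(post1, post1_sigmoid))   # True: the posterior is sigmoid(linear in x)
```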

III. True/False
(1) Given n data points, if half are used for training and the other half for testing, the difference between the training error and the test error will decrease as n increases.
(2) Maximum likelihood estimation is an unbiased estimator and, among all unbiased estimators, has the smallest variance, so the risk of the maximum likelihood estimate is minimal.
(3) For two regression functions A and B, if A is simpler than B, then A will almost certainly perform better than B on the test set.
(4) Global linear regression needs to use all of the sample points to predict the output for a new input, whereas local linear regression only needs to use the samples near the query point to predict the output; therefore global linear regression is computationally more expensive than local linear regression.

(5) Boosting and Bagging both combine multiple classifiers by voting, and both determine the weight of each individual classifier according to its accuracy.
(6) In the boosting iterations, the training error of each new decision stump and the training error of the combined classifier vary roughly in concert. (F)
While the training error of the combined classifier typically decreases as a function of boosting iterations, the error of the individual decision stumps typically increases, since the example weights become concentrated at the most difficult examples.
(7) One advantage of Boosting is that it does not overfit. (F)
(8) Support vector machines are resistant to outliers, i.e., very noisy examples drawn from a different distribution. (F)
(9) In regression analysis, best-subset selection can perform feature selection, but it is computationally expensive when the number of features is large; ridge regression and the Lasso are computationally cheaper, and the Lasso can also perform feature selection.
(10) Overfitting is more likely to occur when the training data are scarce.
(11) Gradient descent can sometimes get stuck in local minima, but the EM algorithm cannot.
(12) In kernel regression, the parameter that most affects the balance between overfitting and underfitting is the width of the kernel function.

(13) In the AdaBoost algorithm, the weights on all the misclassified points will go up by the same multiplicative factor. (T)
7. [2 points] True/False: In AdaBoost, the weighted training error \epsilon_t of the t-th weak classifier on the training data with weights D_t tends to increase as a function of t.
SOLUTION: True. In the course of the boosting iterations the weak classifiers are forced to try to classify more difficult examples. The weights will increase for examples that are repeatedly misclassified by the weak classifiers. The weighted training error \epsilon_t of the t-th weak classifier on the training data therefore tends to increase.
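
A toy sketch of the bookkeeping behind statement (13) and question 7 is given below (NumPy assumed; the weak learners are replaced by random stand-ins purely to show the weight update, not to train a real ensemble).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
y = rng.choice([-1, 1], size=n)
D = np.full(n, 1.0 / n)                      # initial example weights

for t in range(3):
    pred = rng.choice([-1, 1], size=n)       # stand-in for a trained weak learner
    miss = pred != y
    eps = D[miss].sum()                      # weighted training error epsilon_t
    alpha = 0.5 * np.log((1 - eps) / eps)    # weight of the weak learner
    # Every misclassified point is multiplied by the same factor e^{alpha},
    # every correctly classified point by e^{-alpha}; then the weights are renormalized.
    D *= np.exp(np.where(miss, alpha, -alpha))
    D /= D.sum()
    print(t, round(eps, 3))
```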

9. [2 points] Consider a point that is correctly classified and distant from the decision boundary. Why would the SVM decision boundary be unaffected by this point, but the boundary learned by logistic regression be affected?
SOLUTION: The hinge loss used by SVMs gives zero weight to these points, while the log-loss used by logistic regression gives a little bit of weight to these points.
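
A one-line numerical check of that argument (NumPy assumed): for a large positive margin the hinge loss is exactly zero, while the logistic log-loss is small but nonzero.

```python
import numpy as np

m = np.linspace(0.5, 5.0, 10)            # margins y * w.x of correctly classified points
hinge = np.maximum(0.0, 1.0 - m)         # SVM hinge loss: exactly zero once the margin >= 1
logloss = np.log1p(np.exp(-m))           # logistic log-loss: positive for every finite margin
print(hinge[-1], logloss[-1])
```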

(14) True/False: In a least-squares linear regression problem, adding an L2 regularization penalty cannot decrease the L2 error of the solution \hat{w} on the training data. (T)
(15) True/False: In a least-squares linear regression problem, adding an L2 regularization penalty always decreases the expected L2 error of the solution \hat{w} on unseen test data. (F)
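
Statement (14) can be verified directly. The sketch below (NumPy assumed, with made-up data) fits ridge regression for increasing penalties and reports the training sum of squared errors.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=50)

def train_sse(lam):
    # Ridge solution w = (X^T X + lam * I)^{-1} X^T y; lam = 0 is ordinary least squares.
    w = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
    return np.sum((y - X @ w) ** 2)

print([round(train_sse(lam), 4) for lam in (0.0, 1.0, 10.0, 100.0)])
# The training error is non-decreasing in lam: the penalty can only hurt the training fit.
```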

(16) Besides the EM algorithm, gradient descent can also be used to estimate the parameters of a Gaussian mixture model. (T)
(20) Any decision boundary that we get from a generative model with class-conditional Gaussian distributions could in principle be reproduced with an SVM and a polynomial kernel.
True! In fact, since class-conditional Gaussians always yield quadratic decision boundaries, they can be reproduced with an SVM with a kernel of degree less than or equal to two.
(21) AdaBoost will eventually reach zero training error, regardless of the type of weak classifier it uses, provided enough weak classifiers have been combined.
False! If the data is not separable by a linear combination of the weak classifiers, AdaBoost cannot achieve zero training error.
(22) The L2 penalty in ridge regression is equivalent to a Laplace prior on the weights. (F)
(23) The log-likelihood of the data will always increase through successive iterations of the expectation-maximization algorithm. (F)
(24) In training a logistic regression model by maximizing the likelihood of the labels given the inputs, we have multiple locally optimal solutions. (F)

2. Consider the linear regression model y ~ N(w_0 + w_1 x, \sigma^2), with the training data shown in the figure below. (10 points)
(1) Estimate the parameters by maximum likelihood and sketch the resulting model in Figure (a). (3 points)
(2) Estimate the parameters by regularized maximum likelihood, i.e., add a regularization penalty on the weights to the log-likelihood objective, and sketch in Figure (b) the model obtained when the regularization parameter C takes a very large value. (3 points)
(3) After regularization, does the variance \sigma^2 of the Gaussian become larger, smaller, or stay the same? (4 points)

IV. Regression
1. Consider a regularized regression problem. The figure below shows the log-likelihood (mean log-probability) on the training set and on the test set when the penalty is a quadratic regularization function and the regularization parameter C takes different values. (10 points)
(1) Is the statement "as C increases, the log-likelihood on the training set in Figure 2 will never increase" correct? Explain why.
(2) Explain why the log-likelihood on the test set in Figure 2 decreases when C takes large values.
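
A small numerical sketch for question 2 above (NumPy assumed; the penalty is taken to act on the slope w1 only, which is an assumption about the garbled penalty term): with a very large C the fitted line flattens and the maximum-likelihood estimate of sigma^2 grows, which answers part (3).

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(0, 5, 40)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)

def fit(c):
    # Maximize the Gaussian log-likelihood with a penalty (c/2) * w1^2 on the slope only;
    # for fixed sigma this is ridge regression on w1 with an unpenalized intercept.
    X = np.column_stack([np.ones_like(x), x])
    w = np.linalg.solve(X.T @ X + np.diag([0.0, c]), X.T @ y)
    resid = y - X @ w
    return w, resid.var()          # the ML estimate of sigma^2 is the mean squared residual

for c in (0.0, 1e6):
    w, s2 = fit(c)
    print(c, w.round(3), round(s2, 3))
# With a huge penalty the slope is driven toward 0 and the fitted variance grows.
```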

3. Consider a regression problem on points x = (x_1, x_2)^T in the two-dimensional input space, where x_j ∈ [-1, 1], j = 1, 2, i.e. the unit square. Training and test samples are drawn uniformly from the unit square, and the output is generated as
y ~ N(10 x_1 x_2 + 7 x_2^2 + 5 x_2 + 3, 1).
We use polynomial features of order 1 to 10 and fit a linear regression model to learn the relationship between x and y (a higher-order feature model contains all lower-order features); the loss function is the squared error.
(1) Train models with 1st-, 2nd-, 8th- and 10th-order features on n = 20 samples and then evaluate them on a large independent test set. Mark the appropriate model(s) in each of the three columns of the table below (there may be more than one choice per column), and explain why the model you chose in the third column has the smallest test error. (10 points)

| Model | Smallest training error | Largest training error | Smallest test error |
| --- | --- | --- | --- |
| Linear model with 1st-order features |  | X |  |
| Linear model with 2nd-order features |  |  | X |
| Linear model with 8th-order features | X |  |  |
| Linear model with 10th-order features | X |  |  |

(2) Now train models with 1st-, 2nd-, 8th- and 10th-order features on n = 10 samples and then evaluate them on a large independent test set. Again mark the appropriate model(s) in each of the three columns, and explain why the model you chose in the third column has the smallest test error. (10 points)

| Model | Smallest training error | Largest training error | Smallest test error |
| --- | --- | --- | --- |
| Linear model with 1st-order features |  | X |  |
| Linear model with 2nd-order features |  |  |  |
| Linear model with 8th-order features | X |  | X |
| Linear model with 10th-order features | X |  |  |

(3) The approximation error of a polynomial regression model depends on the number of training points. (T)
(4) The structural error of a polynomial regression model depends on the number of training points. (F)
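
The experiment in parts (1) and (2) can be reproduced with a short script. The sketch below (NumPy assumed) uses the generating function 10·x1·x2 + 7·x2^2 + 5·x2 + 3 as reconstructed above, so the exact numbers are only indicative.

```python
import numpy as np

rng = np.random.default_rng(3)

def features(x, order):
    # All monomials x1^a * x2^b with 1 <= a + b <= order, plus a constant term.
    cols = [np.ones(len(x))]
    for total in range(1, order + 1):
        for a in range(total + 1):
            cols.append(x[:, 0] ** a * x[:, 1] ** (total - a))
    return np.column_stack(cols)

def target(x):
    return 10 * x[:, 0] * x[:, 1] + 7 * x[:, 1] ** 2 + 5 * x[:, 1] + 3

def experiment(n_train, order):
    xtr = rng.uniform(-1, 1, size=(n_train, 2))
    ytr = target(xtr) + rng.normal(size=n_train)
    xte = rng.uniform(-1, 1, size=(5000, 2))
    yte = target(xte) + rng.normal(size=5000)
    w, *_ = np.linalg.lstsq(features(xtr, order), ytr, rcond=None)
    train_mse = np.mean((ytr - features(xtr, order) @ w) ** 2)
    test_mse = np.mean((yte - features(xte, order) @ w) ** 2)
    return round(train_mse, 3), round(test_mse, 3)

for n in (20, 10):
    for order in (1, 2, 8, 10):
        print(n, order, experiment(n, order))
```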

4. We are trying to learn regression parameters for a dataset which we know was generated from a polynomial of a certain degree, but we do not know what this degree is. Assume the data was actually generated from a polynomial of degree 5 with some added Gaussian noise, that is
y = w_0 + w_1 x + w_2 x^2 + w_3 x^3 + w_4 x^4 + w_5 x^5 + \epsilon, \qquad \epsilon ~ N(0, 1).
For training we have 100 {x, y} pairs and for testing we are using an additional set of 100 {x, y} pairs. Since we do not know the degree of the polynomial, we learn two models from the data: model A learns parameters for a polynomial of degree 4 and model B learns parameters for a polynomial of degree 6. Which of these two models is likely to fit the test data better?
Answer: The degree-6 polynomial. Since the true model is a degree-5 polynomial and we have enough training data, the degree-6 model we learn will likely fit a very small coefficient for x^6. Thus, even though it is a degree-6 polynomial, it will actually behave in a very similar way to a degree-5 polynomial, which is the correct model, leading to a better fit to the data.

5. Input-dependent noise in regression
Ordinary least-squares regression is equivalent to assuming that each data point is generated according to a linear function of the input plus zero-mean, constant-variance Gaussian noise. In many systems, however, the noise variance is itself a positive linear function of the input (which is assumed to be non-negative, i.e., x ≥ 0).
a) Which of the following families of probability models correctly describes this situation in the univariate case? (Hint: only one of them does.)
Answer: (iii) is correct. In a Gaussian distribution over y, the variance is determined by the coefficient of y^2; so by replacing \sigma^2 by \sigma^2 x we get a variance that increases linearly with x. (Note also the change to the normalization "constant".) (i) has a quadratic dependence on x; (ii) does not change the variance at all, it just renames w_1.
b) Circle the plots in Figure 1 that could plausibly have been generated by some instance of the model family(ies) you chose.
Answer: (ii) and (iii). (Note that (iii) works for \sigma^2 = 0.) (i) exhibits a large variance at x = 0, and the variance appears independent of x.
c) True/False: Regression with input-dependent noise gives the same solution as ordinary regression for an infinite data set generated according to the corresponding model.
Answer: True. In both cases the algorithm will recover the true underlying model.
d) For the model you chose in part (a), write down the derivative of the negative log likelihood with respect to w_1.
Answer: It is easiest to take the derivative of the natural log of the likelihood. For a single training example the negative log likelihood of model (iii) is
-\log p(y \mid x) = \frac{(y - w_0 - w_1 x)^2}{2\sigma^2 x} + \frac{1}{2}\log\left(2\pi\sigma^2 x\right),
and with multiple data points the product of probabilities becomes a sum of log probabilities, so the derivative is
\frac{\partial}{\partial w_1}\left(-\log L\right) = -\sum_{i=1}^{N} \frac{y_i - w_0 - w_1 x_i}{\sigma^2}.
Note that for lines through the origin (w_0 = 0) the solution has the particularly simple form w_1 = \bar{y} / \bar{x}.
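
For parts (c) and (d), the sketch below (NumPy assumed, parameters made up) draws data from the input-dependent noise model and compares the closed-form maximum-likelihood slope ȳ/x̄ (for w0 = 0) with ordinary least squares through the origin.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
x = rng.uniform(0.1, 4.0, size=n)
w1_true, sigma2 = 1.5, 0.5
y = w1_true * x + rng.normal(scale=np.sqrt(sigma2 * x))   # noise variance grows linearly in x

w1_mle = y.mean() / x.mean()       # ML estimate for model (iii) with w0 = 0, from part (d)
w1_ols = (x @ y) / (x @ x)         # ordinary least squares through the origin
print(round(w1_mle, 3), round(w1_ols, 3))   # both approach 1.5 as n grows, as in part (c)
```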

V. Classification
1. Generative models vs. discriminative models
(a) Your billionaire friend needs your help. She needs to classify job applications into good/bad categories, and also to detect job applicants who lie in their applications, using density estimation to detect outliers. To meet these needs, do you recommend using a discriminative or a generative classifier? Why?
Answer: A generative model, because we need to estimate the density p(x|y).
(b) Your billionaire friend also wants to classify software applications to detect bug-prone applications using features of the source code. This pilot project only has a few applications to be used as training data, though. To create the most accurate classifier, do you recommend using a discriminative or a generative classifier? Why?
Answer: A discriminative model; when there are only a few samples, classifying directly with a discriminative model usually works better.
(d) Finally, your billionaire friend also wants to classify companies to decide which one to acquire. This project has lots of training data based on several decades of research. To create the most accurate classifier, do you recommend using a discriminative or a generative classifier? Why?
Answer: A generative model; when there are many samples, the correct generative model can be learned.
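
The trade-offs discussed in (a), (b) and (d) can be played with using off-the-shelf models. The sketch below assumes scikit-learn and NumPy, with synthetic data and a class structure made up purely for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB            # a simple generative classifier
from sklearn.linear_model import LogisticRegression   # a discriminative classifier

rng = np.random.default_rng(5)

def make_data(n):
    y = np.arange(n) % 2                               # balanced binary labels
    x = rng.normal(loc=y[:, None] * 1.5, scale=1.0, size=(n, 2))
    return x, y

x_test, y_test = make_data(5000)
for n_train in (10, 1000):
    x_tr, y_tr = make_data(n_train)
    gen = GaussianNB().fit(x_tr, y_tr)
    disc = LogisticRegression().fit(x_tr, y_tr)
    print(n_train, round(gen.score(x_test, y_test), 3), round(disc.score(x_test, y_test), 3))

# The generative model also provides class-conditional densities p(x|y), which can be
# thresholded for outlier detection as in part (a); a purely discriminative model cannot.
```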

2. Logistic regression

Figure 2: Log-probability of labels as a function of the regularization parameter C.

Here we use a logistic regression model to solve a classification problem. In Figure 2, we have plotted the mean log-probability of labels in the training and test sets after having trained the classifier with a quadratic regularization penalty and different values of the regularization parameter C.
1. In training a logistic regression model by maximizing the likelihood of the labels given the inputs, we have multiple locally optimal solutions. (F)
Answer: The log-probability of labels given the examples, implied by the logistic regression model, is a concave (convex down) function with respect to the weights. The (only) locally optimal solution is also globally optimal.
2. A stochastic gradient algorithm for training logistic regression models with a fixed learning rate will find the optimal setting of the weights exactly. (F)
Answer: A fixed learning rate means that we are always taking a finite step towards improving the log-probability of any single training example in the update equation. Unless the examples are somehow "aligned", we continue jumping from side to side of the optimal solution and will not be able to get arbitrarily close to it. The learning rate has to approach zero in the course of the updates for the weights to converge.
3. The average log-probability of training labels as in Figure 2 can never increase as we increase C. (T)
Stronger regularization means more constraints on the solution, and thus the (average) log-probability of the training examples can only get worse.
4. Explain why in Figure 2 the test log-probability of labels decreases for large values of C.
As C increases, we give more weight to constraining the predictor, and thus give less flexibility to fitting the training set. The increased regularization guarantees that the test performance gets closer to the training performance, but as we over-constrain our allowed predictors, we are not able to fit the training set at all; and although the test performance is now very close to the training performance, both are low.
5. The log-probability of labels in the test set would decrease for large values of C even if we had a large number of training examples. (T)
The above argument still holds, but the value of C for which we will observe such a decrease will scale up with the number of examples.
6. Adding a quadratic regularization penalty for the parameters when estimating a logistic regression model ensures that some of the parameters (weights associated with the components of the input vectors) vanish. (F)
A regularization penalty for feature selection must have a non-zero derivative at zero. Otherwise, the regularization has no effect at zero, and the weights will tend to be slightly non-zero, even when this does not improve the log-probabilities by much.
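
Questions 3-5 can be reproduced numerically. The sketch below (NumPy/SciPy assumed, with synthetic data) maximizes the penalized log-likelihood for several values of C and prints the mean log-probability on the training and test sets.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
d = 5
w_true = rng.normal(size=d)

def make(n):
    X = rng.normal(size=(n, d))
    y = (rng.random(n) < 1 / (1 + np.exp(-X @ w_true))).astype(float)
    return X, y

Xtr, ytr = make(60)
Xte, yte = make(2000)

def mean_logprob(w, X, y):
    z = X @ w
    return np.mean(y * z - np.logaddexp(0.0, z))     # mean log P(y | x, w)

def fit(C):
    obj = lambda w: -len(ytr) * mean_logprob(w, Xtr, ytr) + 0.5 * C * w @ w
    return minimize(obj, np.zeros(d)).x

for C in (0.0, 1.0, 10.0, 100.0):
    w = fit(C)
    print(C, round(mean_logprob(w, Xtr, ytr), 3), round(mean_logprob(w, Xte, yte), 3))
# The training log-probability can only get worse as C grows; the test log-probability
# typically improves for moderate C and then drops again when C becomes too large.
```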

3. Regularized logistic regression
In this problem we will refer to the binary classification task depicted in Figure 1(a), which we attempt to solve with the simple linear logistic regression model
P(y = 1 \mid x, w_1, w_2) = g(w_1 x_1 + w_2 x_2) = \frac{1}{1 + \exp(-w_1 x_1 - w_2 x_2)}
(for simplicity we do not use the bias parameter w_0). The training data can be separated with zero training error; see line L1 in Figure 1(b) for instance.

Figure 1: (a) The two-dimensional data set used in this problem. (b) The points can be separated by L1 (solid line); possible other decision boundaries are shown by L2, L3, L4.

Consider a regularization approach where we try to maximize
\sum_{i=1}^{n} \log P(y_i \mid x_i, w_1, w_2) - \frac{C}{2} w_2^2
for large C. Note that only w_2 is penalized. We would like to know which of the lines in Figure 1(b) could arise as a result of such regularization. For each potential line L2, L3 or L4, determine whether it can result from regularizing w_2. If not, explain very briefly why not.
L2: No. When we regularize w_2, the resulting boundary can rely less on the value of x_2 and therefore becomes more vertical. L2 here seems to be more horizontal than the unregularized solution, so it cannot come as a result of penalizing w_2.
L3: Yes. Here w_2^2 is small relative to w_1^2 (as evidenced by the high slope), and even though it would assign a rather low log-probability to
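
The geometric effect described for L2 and L3 can be sketched numerically. The example below (NumPy/SciPy assumed, toy data invented for illustration) penalizes only w2 and reports how the boundary slope changes.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
X = 2.0 * rng.normal(size=(60, 2))
# Noisy labels from a "diagonal" rule, so the unregularized MLE stays finite.
y = (X[:, 0] + X[:, 1] + rng.normal(size=60) > 0).astype(float)

def fit(C):
    def neg_obj(w):
        z = X @ w
        loglik = np.sum(y * z - np.logaddexp(0.0, z))
        return -(loglik - 0.5 * C * w[1] ** 2)      # only w2 is penalized, as in the problem
    return minimize(neg_obj, np.zeros(2)).x

for C in (0.0, 50.0):
    w1, w2 = fit(C)
    print(C, round(w1, 2), round(w2, 2), "boundary slope:", round(-w1 / w2, 2))
# As C grows, w2 shrinks toward zero and the boundary x2 = -(w1/w2) x1 becomes more vertical.
```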
