ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models

Andrei Barbu*, MIT, CSAIL & CBMM

David Mayo*, MIT, CSAIL & CBMM

Julian Alverio, MIT, CSAIL

William Luo, MIT, CSAIL

Christopher Wang, MIT, CSAIL

Dan Gutfreund, MIT-IBM Watson AI

Joshua Tenenbaum, MIT, BCS & CBMM

Boris Katz, MIT, CSAIL & CBMM

Abstract

We collect a large real-world test set, ObjectNet, for object recognition with controls where object backgrounds, rotations, and imaging viewpoints are random. Most scientific experiments have controls, confounds which are removed from the data, to ensure that subjects cannot perform a task by exploiting trivial correlations in the data. Historically, large machine learning and computer vision datasets have lacked such controls. This has resulted in models that must be fine-tuned for new datasets and perform better on datasets than in real-world applications. When tested on ObjectNet, object detectors show a 40-45% drop in performance, with respect to their performance on other benchmarks, due to the controls for biases. Controls make ObjectNet robust to fine-tuning showing only small performance increases. We develop a highly automated platform that enables gathering datasets with controls by crowdsourcing image capturing and annotation. ObjectNet is the same size as the ImageNet test set (50,000 images), and by design does not come paired with a training set in order to encourage generalization. The dataset is both easier than ImageNet – objects are largely centered and unoccluded – and harder, due to the controls. Although we focus on object recognition here, data with controls can be gathered at scale using automated tools throughout machine learning to generate datasets that exercise models in new ways thus providing valuable feedback to researchers. This work opens up new avenues for research in generalizable, robust, and more human-like computer vision and in creating datasets where results are predictive of real-world performance.

1 Introduction

Datasets are of central importance to computer vision and more broadly machine learning. Particularly with the advent of techniques that are less well understood from a theoretical point of view, raw performance on datasets is now the major driver of new developments and the major feedback about the state of the field. Yet, as a community, we collect datasets in a way that is unusual compared to other scientific fields. We rely almost exclusively on dataset size to minimize confounds (artificial correlations between the correct labels and features in the input), to attest unusual phenomena, and encourage generalization. Unfortunately, scale is not enough because of rare events and biases – Sun et al. [1] provide evidence that we should expect to see logarithmic performance increases as a function of dataset size alone. The sources of data that datasets draw on today are highly biased, e.g., object class is correlated with backgrounds [2], and omit many phenomena, e.g., objects appear in stereotypical rotations with little occlusion. The resulting datasets themselves are similarly biased [3].

*Equal contribution. Website https://objectnet.dev. Corresponding author abarbu@

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

[Figure 1 plot. Vertical axis ticks from 0 to 100; horizontal axis: detectors by year. Series: ImageNet Top-5, ImageNet Top-1, Overlap Top-5, Overlap Top-1, ObjectNet Top-5, ObjectNet Top-1. Annotation: 40-45% performance drop.]

Figure 1: Performance on ObjectNet for high-performing detectors trained on ImageNet in recent years: AlexNet [4], VGG-19 [5], ResNet-152 [6], Inception-v4 [7], NASNET-A [8], and PNASNet-5 Large [9]. Solid lines show top-1 performance, dashed lines show top-5 performance. ImageNet performance on all 1000 classes is shown in green. ImageNet performance on classes that overlap with ObjectNet is shown in blue; the two overlap in 113 classes out of 313 ObjectNet classes, which are only slightly more difficult than the average ImageNet class. The remaining curves show performance on ObjectNet for those overlapping classes. We see a 40-45% drop in performance. Object detectors have improved substantially. Performance on ObjectNet tracks performance on ImageNet but the gap between the two remains large.

In other areas of science, such issues are controlled for with careful data creation and curation that intentionally covers phenomena and controls for biases – important ideas that do not easily scale to large datasets. For example, models for natural language inference, NLI, that perform well on large datasets fail when systematically varying aspects of the input [10], but these are not collected at scale. In computer vision, datasets like CLEVR [11] do the same through simulation, but simulated data is much easier for modern detectors than real-world data. We show that with significant automation and crowdsourcing, you can have scale and controls in real-world data and that this provides feedback about the phenomena that must be understood to achieve human-level accuracy.

ObjectNet is a new large crowdsourced test set for object recognition that includes controls for object rotations, viewpoints, and backgrounds. Objects are posed by workers in their own homes in natural settings according to specific instructions detailing what object class they should use, how and where they should pose the object, and where to image the scene from. Every image is annotated with these properties, allowing us to test how well object detectors work across these conditions. Each of these properties is randomly sampled leading to a much more varied dataset.

In effect, we are removing some of the brittle priors that object detectors can exploit to perform well on existing datasets. Overall, current object detectors experience a large performance loss, 40-45%, when such priors are removed; see fig. 1 for performance comparisons. Each of the controls removes a prior and degrades the performance of detectors; see fig. 2 for sample images from the dataset. Practically, this means that important feedback for the community about the limitations of models is missing, and that performance on datasets is limited as a predictor of the performance users can expect on their own unrelated tasks.

[Figure 2 image grid. Left column: ImageNet chairs. ObjectNet columns: chairs by rotation, chairs by background, chairs by viewpoint, teapots, T-shirts.]

Figure 2: ImageNet (left column) often shows objects on typical backgrounds, with few rotations, and few viewpoints. Typical ObjectNet objects are imaged in many rotations, on different backgrounds, from multiple viewpoints. The first three columns show chairs varying by the three properties that are being controlled for: rotation, background, and viewpoint. One can see the large variety introduced to the dataset because of these manipulations. ObjectNet images are lightly cropped for this figure due to inconsistent aspect ratios. Most detectors fail on most of the images included in ObjectNet.

To encourage generalization, we make three other unusual choices when constructing ObjectNet. First, ObjectNet is only a test set, and does not come paired with a training set. Separating training and test set collection may be an important tool to avoid correlations between the two which are easily accessible to large models but not detectable by humans. Since humans easily generalize to new datasets, adopting this separation can encourage new machine learning techniques that do the same. Second, while ObjectNet will be freely available, it comes with an important stipulation: one cannot update the parameters of any model for any reason on the images present in ObjectNet. While fine-tuning for transfer learning is common, it encourages overfitting to particular datasets – we disallow fine-tuning but report such experiments in section 4.3 to demonstrate the robustness of the dataset. Third, we mark every image by a one pixel red border that must be removed on the fly before testing. As large-scale web datasets are gathered, there is a danger that data will leak between the training and test sets of different datasets. This has already happened, as Caltech-UCSD Birds-200-2011, a popular dataset, and ImageNet were discovered to have overlap putting into question some results [12]. With test set images marked by a red border and available online, one can perform reverse image search and determine if an image is included in any training set anywhere. We encourage all computer vision datasets – not just ones for object detection – to adopt this standard.
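
The on-the-fly border removal described above can be sketched as follows. This is a minimal illustration, assuming images are loaded as NumPy arrays of shape (height, width, channels) in RGB order; the function names and the exact red value (255, 0, 0) are assumptions made for the example and are not taken from the ObjectNet tooling.

```python
import numpy as np


def strip_border(image: np.ndarray, border: int = 1) -> np.ndarray:
    """Crop `border` pixels from every side of an (H, W, C) image array."""
    if border < 1:
        raise ValueError("border must be at least 1 pixel")
    height, width = image.shape[:2]
    if height <= 2 * border or width <= 2 * border:
        raise ValueError("image is too small to crop")
    return image[border:-border, border:-border]


def has_red_border(image: np.ndarray) -> bool:
    """Return True if the outermost pixel ring is pure red.

    Assumption: the marker is exactly (255, 0, 0) in RGB; the text only
    says the border is red and one pixel wide.
    """
    ring = np.concatenate([image[0], image[-1], image[:, 0], image[:, -1]])
    return bool(np.all(ring[:, :3] == (255, 0, 0)))


# Toy 6x8 image: a red frame around a uniform grey interior.
demo = np.full((6, 8, 3), (255, 0, 0), dtype=np.uint8)
demo[1:-1, 1:-1] = (90, 90, 90)
cropped = strip_border(demo)
```

A check like `has_red_border` could also be run over a training set to flag images that may have leaked from a marked test set, although lossy re-encoding would likely require a tolerance rather than an exact match.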

While it includes controls, ObjectNet is not hard in arbitrary ways. It is in many ways intentionally easy compared to ImageNet or other datasets. Objects are highly centralized in the image, they are rarely occluded and even then lightly so, and many backgrounds are not particularly cluttered. In other senses, ObjectNet is harder: a small percentage of viewpoints, rotations, and even object instances are also difficult for humans. This demonstrates a much wider range of difficulty and provides an opportunity to also test the limits of human object recognition – if object detectors are to augment or replace humans, such knowledge is critical. Our overall goal is to test the bias of detectors and their ability to generalize to specific manipulations, not to just create images that are difficult for arbitrary reasons. Future versions of the dataset will ratchet up this difficulty in terms of clutter, occlusion, lighting, etc. with additional controls for these properties.

Our contributions are:

1. a new methodology to evaluate computer vision approaches on datasets that have controls,

2. an automated platform to gather data at scale for computer vision,

3. a new object recognition test set, ObjectNet, consisting of 50,000 images (the same size as the ImageNet test set) and 313 object classes, and

4. an analysis of biases at scale and the role of fine-tuning.

2 Related work

Many large datasets for object recognition exist such as ImageNet [13], MS COCO [14], and OpenImages [15]. While the training sets for these datasets are huge, the test sets are comparable to the size of the dataset presented here, with ImageNet having 50,000 test images, MS COCO having 81,434, and OpenImages having 125,436, compared to ObjectNet's 50,000 test images. Such datasets are collected from repositories of existing images, particularly Flickr, which consist of photographs – images that users want to share online. This intent biases against many object instances, backgrounds, rotations, occlusion, lighting conditions, etc. Biases lead simultaneously to models that do not transfer well between datasets [3] – detectors pick up on biases inside a dataset and fail when those biases change – and that achieve good performance with little fine-tuning on new datasets [16] – detectors can quickly acquire the new biases even with only a few training images per class. In computer vision applications, biases may not match those of any existing dataset, they may change over time, adversaries may exploit the biases of a system, etc.

The dataset-dependent nature of existing object detectors is well-understood with several other approaches – aside from scale – having been attempted to alleviate this problem. Some focus on the datasets themselves, e.g., Khosla et al. [17] subdivide datasets into partitions that are sufficiently different, something possible only if datasets have enough variety in them. Others focus on the models, e.g., Zhu et al. [2] train models that separate foregrounds and backgrounds explicitly to become more resilient to biases. Demonstrating the value of models that have robustness built into them by design requires datasets that control for biases – controls are not just a sanity check, they encourage better research.

Some datasets, such as MPII cooking [18], KITTI [19], TACoS [20], CHARADES [21], Something-Something [22], AVA [23], and Partially Occluded Hands [24] collect novel data. Explicitly collecting data is difficult, as evidenced by the large gap in scale between these datasets and those collected from existing online sources. At the same time, explicit instructions and controls can lead to more varied and interesting datasets. These datasets on the whole do not attempt to impose controls by systematically varying some aspect of the data – users are prompted to perform actions or hold objects but are not told how to do this or what properties those actions should have. Workers choose convenient settings and manners in which to perform actions leading to biases in datasets.

3 Dataset construction

Figure 3: Workers select one object that they have available from a small number of choices. They are shown a rectangular prism, in blue, with two labeled orthogonal axes in red and yellow. These labels are object-class specific, so that workers can register the object correctly against the rectangular prism. We do not show workers images of desired objects to not bias them toward certain instances. Workers see an animation of how the object should be manipulated, perform this manipulation, and then align the object against the final rectangular prism rendered on their camera. Not shown above is the post-capture review UI to ensure that images contain the right objects and are not blurry.

ObjectNet is collected by workers on Mechanical Turk who image objects in their homes; see fig. 3. This gives us control over the properties of those objects while also ensuring that the images are natural. We asked workers to image objects in 4 backgrounds (kitchens, living rooms, bedrooms, washrooms), from 3 viewpoints (top, angled at 45 degrees, and side), and in 50 object rotations. Rotations were uniformly distributed on a sphere, after which nearby points were snapped to the equator and the poles. We found that workers are able to pose objects to within around 20 degrees of rotation depending on the axis, although the uniformity of the resulting rotations varies by class. This could be more accurate, but we intentionally did not show instances of object classes to workers in order to avoid biasing them toward particular instances. In roughly one third of the trials we showed a rotated 3D car (cars do not appear in our dataset) as an additional cue for the desired rotation.
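
One way to read "uniformly distributed on a sphere, then snapped to the equator and the poles" is sketched below. It treats each rotation as a direction on the unit sphere, which is an interpretation made for this example. The 15-degree snapping tolerance, the function names, and the seed are all hypothetical; the text does not state how close a point must be before it is snapped.

```python
import math
import random
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def sample_direction(rng: random.Random) -> Vec3:
    """Draw a unit vector uniformly on the sphere (normalized 3D Gaussian)."""
    while True:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 1e-12:  # avoid dividing by a vanishing norm
            return (x / norm, y / norm, z / norm)


def snap(direction: Vec3, threshold_deg: float = 15.0) -> Vec3:
    """Snap directions near the poles or the equator onto them.

    Assumption: `threshold_deg` is a hypothetical tolerance chosen for
    illustration only.
    """
    x, y, z = direction
    latitude = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    if abs(latitude) >= 90.0 - threshold_deg:
        # Close to a pole: move exactly onto it.
        return (0.0, 0.0, math.copysign(1.0, z))
    if abs(latitude) <= threshold_deg:
        # Close to the equator: drop the vertical component and renormalize.
        horizontal = math.hypot(x, y)
        return (x / horizontal, y / horizontal, 0.0)
    return direction


def sample_rotations(count: int = 50, seed: int = 0) -> List[Vec3]:
    """Sample `count` directions and snap each one."""
    rng = random.Random(seed)
    return [snap(sample_direction(rng)) for _ in range(count)]


rotations = sample_rotations()
```

Normalizing a 3D Gaussian is a standard way to get a uniform distribution over directions; sampling latitude and longitude uniformly would instead over-represent the poles.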

Workers are transitioned to their phone using a QR code, an object is described to them (but no example is shown), and they verify if an object that matches the description is available. A rectangular prism is then presented with labeled faces that are semantically relevant to that object, e.g., the front and top of a chair. Each object class was annotated with two semantically meaningful orthogonal axes, a single axis if the object class was rotationally symmetric, or no axis if it was spherical. We found that describing such parts in a manner that leads to little disagreement is difficult and requires careful validation. While this provides a weak bias toward particular object instances – one might imagine a chair with no distinctive front – it is necessary for explaining the desired object pose.

The rectangular prism is also animated to show the desired object pose. The animation starts with the rectangular prism representing the object in a default and common pose, e.g., the front of a chair facing a user and the top pointed upward, and then transitions it into the desired pose. Another animation shows the viewpoint from which the object should be imaged. We found that animating such instructions was critical in allowing workers to determine the desired object poses.

Workers are asked to move the object into a specific room, pose it, and image it from a certain angle. The rectangular prism was overlaid on their phone camera in the final desired position with the arrows marking the class-specific semantically-relevant faces. This also proved critical as remembering the desired rotation for an object is too unreliable.

This process annotates every image with three properties (rotation, viewpoint, and background); it controls for biases by sampling these properties randomly, thus allowing us to include objects in rotations and scenes that are unusual. Each image is validated to ensure that it contains the correct objects and that any identifying information is removed.
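
The random sampling of the three controlled properties could look roughly like the sketch below. The background and viewpoint values and the count of 50 rotations come from the text; the `CaptureTask` record, the `sample_task` function, and the use of an integer rotation index are hypothetical names and choices made for illustration.

```python
import random
from dataclasses import dataclass

BACKGROUNDS = ("kitchen", "living room", "bedroom", "washroom")
VIEWPOINTS = ("top", "angled at 45 degrees", "side")
NUM_ROTATIONS = 50


@dataclass(frozen=True)
class CaptureTask:
    """One request shown to a worker; also the annotation stored with the image."""
    object_class: str
    background: str
    viewpoint: str
    rotation_index: int


def sample_task(object_class: str, rng: random.Random) -> CaptureTask:
    """Sample each controlled property independently and uniformly."""
    return CaptureTask(
        object_class=object_class,
        background=rng.choice(BACKGROUNDS),
        viewpoint=rng.choice(VIEWPOINTS),
        rotation_index=rng.randrange(NUM_ROTATIONS),
    )


rng = random.Random(0)
tasks = [sample_task("chair", rng) for _ in range(5)]

# Number of distinct (background, viewpoint, rotation) conditions per class.
conditions_per_class = len(BACKGROUNDS) * len(VIEWPOINTS) * NUM_ROTATIONS  # 600
```

Because the properties are drawn independently of the object class, no class is tied to a particular room or pose at request time; correlations can still enter later through rejected images.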

To select object classes for the dataset, we listed 420 common household objects. Of these, 55 classes were eliminated because they are not easily movable, e.g., beds (16 classes), pose a safety concern, e.g., fire alarms (8), were too confusing to subjects, e.g., we found little agreement on what armbands are (10), posed privacy concerns, e.g., people (5), or were alive and cannot be manipulated safely, e.g., plants (2); numbers do not add up because classes were excluded for multiple reasons. In addition, 52 object classes were too rare, e.g., golf clubs. Data was collected for 313 object classes, with ≈160 images per class on average with a standard deviation of 44.

Workers did not always have instances of every class. For each image to be collected, they were given ten choices out of which to select one that is available or request ten other choices. This naturally would lead to an extreme class imbalance as the easiest and most common classes would be vastly overrepresented. To make the class distribution more uniform, we presented objects inversely proportional to how frequent they are; the resulting distribution is fairly uniform, see fig. 4.
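
A minimal sketch of presenting choices inversely proportional to frequency follows. It assumes that "frequency" means the number of images already collected per class and uses a weight of 1 / (count + 1); both are assumptions, as are the function name `pick_choices` and the toy counts. The platform's actual weighting scheme is not given in the text.

```python
import random
from collections import Counter
from typing import List, Mapping, Optional, Sequence


def pick_choices(
    counts: Mapping[str, int],
    classes: Sequence[str],
    k: int = 10,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick up to `k` distinct classes, favoring those collected least often."""
    rng = rng or random.Random()
    remaining = list(classes)
    chosen: List[str] = []
    while remaining and len(chosen) < k:
        # Hypothetical weighting: +1 keeps never-collected classes finite.
        weights = [1.0 / (counts.get(name, 0) + 1) for name in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)  # sample without replacement
    return chosen


# Toy counts, invented for the example.
classes = ["mug", "chair", "banana", "trophy", "whistle", "wok", "walker",
           "plunger", "squeegee", "tarp", "pen", "sock"]
counts = Counter({"mug": 300, "chair": 250, "pen": 280, "sock": 260, "banana": 120})
choices = pick_choices(counts, classes, k=10, rng=random.Random(0))
```

With these toy counts, rarely collected classes such as "trophy" are much more likely to appear among the ten choices than "mug", which pushes the collected distribution toward uniform over time.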

Objects were described to workers using one to four words, depending on the class. Two exceptions were made, for forks and spoons, as user agreement on how to label two orthogonal faces of these object classes is very low; rough sketches were shown instead. When aligning their object and phone, workers were instructed to ignore the aspect ratio of the rectangular prism. We found that having a single aspect ratio, a cube for example, for all object classes was very confusing to workers. Each object class is annotated with a rough aspect ratio for its rectangular prism. This again represents a small bias toward particular kinds of objects, although this is alleviated by the fact that most objects did not fit a rectangular prism anyway. Deformable objects were still rotated and users followed those rotations aligning the semantically meaningful axes with object parts, but other details of the object pose were not controlled for.

No instructions were given about how to stabilize objects in the desired poses. When necessary, some workers held the objects while others propped them up. For each image, workers were asked two questions on their phone collection UI: to verify that the image depicts an object of the intended class and that it is not too blurry. In many indoor lighting conditions, particularly with low-end cameras, it is easy to take unrecognizable photos without careful stabilization. We estimate the task took around 1.5 minutes per object on average and workers were paid 10 dollars per hour on average.

In total, 95,824 images were collected from 5,982 workers out of which 50,000 images were retained after validation and included in the dataset. Each image was manually verified. About 48% of the data collected was removed. Objects were placed in incorrect backgrounds in 10% of images; other images showed faces (0.2% of images) or contained other private information (0.03% of images). We found that despite instructions, many users took photos of screens if they did not have an object (23%) – these were removed because on the whole they are very easy for models to recognize. Centralized locations that employ workers on Mechanical Turk were eliminated from the dataset to ensure that objects are not imaged on the same backgrounds across many workers (20%). Note that some problem categories overlapped. So as not to bias the dataset toward images which are easy for humans, validators were instructed to be permissive and only rule out an image of an object if it clearly violated the constraints. Since workers who carry out the task correctly do so nearly perfectly, while workers who do not carry it out correctly fail on almost every trial, we have additional confidence that images which are hard to recognize depict the correct object classes.
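
The headline numbers in this section can be cross-checked with a few lines of arithmetic. The inputs below are the figures stated in the text; the variable names are illustrative, and the per-object pay is a derived estimate that assumes the stated averages apply uniformly.

```python
# Figures stated in the text.
collected_images = 95_824
retained_images = 50_000
listed_classes = 420
unsuitable_classes = 55
too_rare_classes = 52
final_classes = 313
minutes_per_object = 1.5
dollars_per_hour = 10.0

# Share of collected images removed during validation ("about 48%").
removed_fraction = (collected_images - retained_images) / collected_images  # ~0.478

# Class count after both exclusion steps.
remaining_classes = listed_classes - unsuitable_classes - too_rare_classes  # 313

# Mean images per class (the text reports roughly 160).
mean_images_per_class = retained_images / final_classes  # ~159.7

# Derived estimate: average pay per posed object.
dollars_per_object = dollars_per_hour * minutes_per_object / 60.0  # 0.25
```

The individual rejection percentages (10%, 23%, 20%, and the smaller privacy categories) sum to more than 48% because, as noted above, the problem categories overlap.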

This dataset construction method is not without its limitations. All objects are indoor objects which are easy to manipulate, they cannot be too large or small, fixed to the wall, or dangerous. We cannot ask workers to manipulate objects in ways that would damage or otherwise permanently alter them. Some object classes which are rare can be difficult to gather and are more likely to have incorrect images before validation. Not all undesirable correlations are removed by this process; for example, some objects are more likely to be held than others while certain object classes are predisposed to have particular colors. We are not guaranteed to cover the space of shapes or textures for each object class. Finally, not all object classes are as easy to rotate, so the resulting poses are still correlated with the object class.

4 Results

We investigate object detector performance on ObjectNet using an image labeling task; see section 4.1. Then we explain this performance by breaking down how controls affect results; section 4.2. Finally we demonstrate that the difficulty of ObjectNet lies in the controls, and not in the particular properties of the images, by fine-tuning on the dataset; section 4.3.
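
As a rough sketch of the kind of evaluation outlined above, the code below computes top-k accuracy and breaks it down by one annotated control. The record layout (a dict with "label", "scores", and one key per control), the function names, and the toy scores are all assumptions made for illustration; they do not describe the evaluation code used for the reported results.

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def topk_correct(scores: Mapping[str, float], label: str, k: int) -> bool:
    """Return True if `label` is among the k highest-scoring classes."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return label in ranked[:k]


def accuracy_by(records: List[dict], control: str, k: int = 1) -> Dict[str, float]:
    """Top-k accuracy grouped by the value of one control annotation."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for record in records:
        group = record[control]
        totals[group] += 1
        hits[group] += topk_correct(record["scores"], record["label"], k)
    return {group: hits[group] / totals[group] for group in totals}


# Toy predictions, invented for the example. Scores are assumed to be
# restricted to the classes shared by the model and the test set.
records = [
    {"label": "chair", "viewpoint": "side",
     "scores": {"chair": 0.7, "bench": 0.2, "teapot": 0.1}},
    {"label": "chair", "viewpoint": "top",
     "scores": {"bench": 0.5, "chair": 0.4, "teapot": 0.1}},
    {"label": "teapot", "viewpoint": "top",
     "scores": {"chair": 0.6, "bench": 0.3, "teapot": 0.1}},
]

top1 = accuracy_by(records, "viewpoint", k=1)  # {"side": 1.0, "top": 0.0}
top2 = accuracy_by(records, "viewpoint", k=2)  # {"side": 1.0, "top": 0.5}
```

The same grouping can be applied to the background or rotation annotation, since every image carries all three.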

4.1 Transfer from ImageNet

We tested six object detectors published over the past several years on ObjectNet, choosing top performers for each year: AlexNet (2012) [4], VGG-19 (2014) [5], ResNet-152 (2016) [6], Inception-v4 (2017) [7], NASNET-A (2018) [8], and PNASNet-5L (2018) [9]. All detectors were pre-trained

[Figure 4 histograms. Panels: Object class, Background, Rotation (*), Viewpoint.]

Figure 4: The distribution of the 313 object classes, backgrounds, rotations, and viewpoints in the dataset. The class distribution is fairly uniform due to biasing workers toward low-frequency objects. Object backgrounds, viewpoints, and rotations were sampled uniformly but rejected data can skew the distribution. Each image is also labeled with a 3D rectangular prism and semantically meaningful faces for each object. Spherical objects pop out of the rotation histogram as they have a single rotation. (*) Note that object rotations are less reliable than this indicates: not all objects are equally easy to rotate, the actual rotations of objects pictured in the dataset are less uniform. This represents the object rotations that workers were asked to collect. While this is also true for background and viewpoint, we expect that the true rotation graph is more skewed than the other two.

Air freshener, Alarm clock, Backpack, Baking sheet, Banana, Bandaid, Baseball bat, Baseball glove, Basket, Bathrobe, Bath towel, Battery, Bed sheet, Beer bottle, Beer can, Belt, Bench, Bicycle, Bike pump, Bills (money), Binder (closed), Biscuits, Blanket, Blender, Blouse, Board game, Book (closed), Bookend, Boots, Bottle cap, Bottle opener, Bottle stopper, Box, Bracelet, Bread knife, Bread loaf, Briefcase, Brooch, Broom, Bucket, Butcher's knife, Butter, Button, CD/DVD case, Calendar, Can opener, Candle, Canned food, Cellphone, Cellphone case, Cellphone charger, Cereal, Chair, Cheese, Chess piece, Chocolate, Chopstick, Clothes hamper, Clothes hanger, Coaster, Coffee beans, Coffee grinder, Coffee machine, Coffee table, Coin (money), Comb, Combination lock, Computer mouse, Contact lens case, Cooking oil bottle, Cork, Cutting board, DVD player, Deodorant, Desk lamp, Detergent, Dishrag or hand towel, Dish soap, Document folder (closed), Dog bed, Doormat, Drawer (open), Dress, Dress pants, Dress shirt, Dress shoe (men), Dress shoe (women), Drill, Drinking Cup, Drinking straw, Drying rack for clothes, Drying rack for plates, Dustpan, Earbuds, Earring, Egg, Egg carton, Envelope, Eraser (whiteboard), Extension cable, Eyeglasses, Fan, Figurine or statue, First aid kit, Flashlight, Floss container, Flour container, Fork, French press, Frying pan, Glue container, Hairbrush, Hair clip, Hair dryer, Hair tie, Hammer, Hand mirror, Handbag, Hat, Headphones (over ear), Helmet, Honey container, Ice, Ice cube tray, Iron, Ironing board, Jam, Jar, Jeans, Kettle, Keyboard, Keychain, Ladle, Lampshade, Laptop (open), Laptop charger, Leaf, Leggings, Lemon, Letter opener, Lettuce, Light bulb, Lighter, Lipstick, Loofah, Magazine, Makeup, Makeup brush, Marker, Match, Measuring cup, Microwave, Milk, Mixing/Salad Bowl, Monitor, Mouse pad, Mouthwash, Mug, Multitool, Nail, Nail clippers, Nail file, Nail polish, Napkin, Necklace, Newspaper, Night light, Nightstand, Notebook, Notepad, Nut for a screw, Orange, Oven mitts, Padlock, Paintbrush, Paint can, Paper, Paper bag, Paper plates, Paper towel, Paperclip, Peeler, Pen, Pencil, Pepper shaker, Pet food container, Landline phone, Photograph, Pill bottle, Pill organizer, Pillow, Pitcher, Placemat, Plastic bag, Plastic cup, Plastic wrap, Plate, Playing cards, Pliers, Plunger, Pop can, Portable heater, Poster, Power bar, Power cable, Printer, Raincoat, Rake, Razor, Receipt, Remote control, Removable blade, Ribbon, Ring, Rock, Rolling pin, Ruler, Running shoe, Safety pin, Salt shaker, Sandal, Scarf, Scissors, Screw, Scrub brush, Shampoo bottle, Shoelace, Shorts, Shovel, Skateboard, Skirt, Sleeping bag, Slipper, Soap bar, Soap dispenser, Sock, Soup Bowl, Sewing kit, Spatula, Speaker, Sponge, Spoon, Spray bottle, Squeegee, Squeeze bottle, Standing lamp, Stapler, Step stool, Still Camera, Sink Stopper, Strainer, Stuffed animal, Sugar container, Suit jacket, Suitcase, Sunglasses, Sweater, Swimming trunks, T-shirt, TV, Table knife, Tablecloth, Tablet, Tank top, Tape, Tape measure, Tarp, Teabag, Teapot, Tennis racket, Thermometer, Thermos, Throw pillow, Tie, Tissue, Toaster, Toilet paper roll, Tomato, Tongs, Toothbrush, Toothpaste, Tote bag, Toy, Trash bag, Trash bin, Travel case, Tray, Trophy, Tweezers, Umbrella, USB cable, USB flash drive, Vacuum cleaner, Vase, Video camera, Walker, Walking cane, Wallet, Watch, Water bottle, Water filter, Webcam, Weight (exercise), Weight scale, Wheel, Whisk, Whistle, Wine bottle, Wine glass, Winter glove, Wok, Wrench, Ziploc bag

Figure 5: The 313 object classes in ObjectNet. We chose object classes that were fairly common, not too similar to one another, cover a wide range of objects available in homes, and can be safely manipulated by workers. The 113 classes which overlap with ImageNet are marked in italics.

[Figure 6 plots. Panels: Object class, Background, Rotation, Viewpoint.]

Figure 6: Top-1 performance of ResNet-152 pretrained on ImageNet on the subset of ObjectNet – 113 classes which overlap with ImageNet – as a function of controls used. No fine-tuning was performed; see section 4.3. Classes
