A Compact Method for Backface Culling - Gamasutra

文章推薦指數: 80 %
投票人數:10人

Backface culling is an important part of how a 3D engine performs visibility checks. Its purpose is to detect polygons that are invisible in a particular ... ACompactMethodforBackfaceCulling ByOsnatLevi Backface cullingisanimportantpartofhowa3Dengineperformsvisibilitychecks. Itspurposeistodetectpolygonsthatareinvisibleinaparticularscene –thatis,polygonsthatfaceawayfromtheviewer.Detectingback-facing polygonsallowsustoeliminatethemfromanengine'srenderingpipeline atanearlystage,thusreducingtheamountofcomputationandmemory traffic.Therearevariouswaystodetectwhetherapolygonfacestowards orawayfromtheviewer,andeachmethodhasitsownlevelofcomplexity andaccuracy. Thisarticle describesanewcullingtechniquethatreducesthestoragerequirements forfacetnormalswhenperformingbackfaceculling.Thishighlyaccurate technique,whichperformsaccuratebackfacecullinginobjectspace,is particularlyapplicabletofront-endculling,andrequiresonlyhalfas muchstorageasthestandardfacet-normaltechnique.Includedattheend ofthisarticleisanappendixcontainingC++implementationsofboth thetraditionalmethodandthecompactmethod,bothofwhichcomeinscalar andSIMDmodes.Thecodeforthesemethodsiscontainedinadownloadable archive. Motivation forfront-endculling Backface cullingcanbeperformedatseveralstagesinthe3Dpipeline.Though wecouldjustsitbackandlettherasterizercullforus,it'sadvantageous toperformthecullingstepearlier.Afterall,theearlierwegetrid ofirrelevantdata,thelessdataneedstobepushedaroundthesystem (savingbandwidth)andthefewercalculationsthatmustbeperformed(saving CPUload).Cullmaybeperformedduringoneofthreestages: 1.Before transformationandlighting 2.After transformationbutbeforelighting 3.Inthe rasterizer(aftertransformationandlighting). Culling duringstages2and3istypicallyperformedinscreenspacebychecking theclockwise/counterclockwiseorderofapolygon'svertices.Front-end culling(stage1)istypicallyperformedbycalculatingthedotproduct oftheviewingvectorandthefacetnormalofthepolygon.Thefacetnormal canbecalculatedontheflyorprecalculatedandstoredwiththedata set.Inanycase,cullingupfront(stage1)typicallyisfasterthan theothercullingstrategies(stages2or3)sinceitsavesbandwidth andrequiresfewercalculations. Thereal expenseoffront-endcullingisthatweeitherhavetocalculatethefacet normalontheflyoruseaprecalculatedfacetnormal,whichincreases thesizeofourmodeldatabase.However,thisincreasecanbereduced byhalf,asweshowbelow.Andalthoughtheprecalculatedfacetnormals addtothemodelsize,themethodaccessesmemorysequentially,withis advantageoustous. Ontheother hand,withthecullingbasedonvertex-ordertesting(stages2or3), thecullingprocedureloopsthroughthetrianglesandaccessesthevertices foreachtriangle.Theverticesofneighboringtriangles(andevenof thesametriangle)canbespreadalloverthevertexpool,causingrandom memoryaccessesduringtheculling.Theserandomaccessesareslowand leadtopoorcacheutilization. Theculling techniquebasedonfacetnormalsalsoloopsthroughthetriangles,but itaccessestheirfacetnormalsinsteadoftheirvertices.Sincethefacet normalsarestoredonper-trianglebasis,theyarefetchedsequentially inthecullingloop.Thereforethesequentialmemoryaccessesarefast, theyutilizethecacheeffectively,andtheycanbefurtheraccelerated usingprefetch. Ourwork withgamedevelopershasshownthatfront-endcullingoftenincreases performancesubstantially(a10to20percentboostintheframe-rate) ifdoneproperly.Inthefollowingsections,weshowhowtoimplement efficientfront-endculling,usingprecalculatedfacetnormalsstored inacompactformwithonlyhalfthestoragerequirementsofthestandard form. Accurate front-endcullingusingfacetnormals Thistechnique performsaper-polygontestusingthecamerapositionandthepolygon's facetnormalandpositiontodeterminewhetheritisfrontorbackfacing. Itsmainadvantagesareitsaccuracy—themethodcullsthemaximumnumber ofpolygons—andthefactthatitscalculationsaredoneinobjectspace, whicheliminatesverticesinthepretransformphase. Theclassic implementationofthistechniquerequiresustoprecalculateandstore thefollowinginformationaboutthepolygon: •Thenormal ofthepolygon(facetnormal)storedinobjectspace.(Sometoolswill exportthenormal,butotherwiseitmustbecalculatedusingthepolygon's vertices.) •Aposition ontheplaneofthepolygoninobjectspace—eitheroneofthevertices orthepolygon'scenter. •Alist holdingtheindicesofthepolygon'svertices(indiceslist) Todetermine whetherapolygonisfrontorbackfacing,wemustfirsttransformthe viewerposition(orcameraposition)toobjectspace,calculatetheviewing vector(thevectorinwhichthevieweris"looking"atthepolygon),and thencalculatethedotproductbetweentheviewingvectorandthepolygon normal.Iftheresultispositive,thenthepolygonisfacingbackwards. Here'san exampleofthetraditionalimplementationforindexedtriangles.Thestructure we'reassumingforeachfaceis32bytesinsize,andisasfollows: struct{   floatx,y,z; }Vector; struct{   Vector position; //positiononpolygon //12bytes Vector normal; //normalofpolygon //12bytes WORDp1,p2,p3; //indicestovertexpool //6bytes WORDstub; // neededtoavoid2bytealignment //stallsandfillingacacheline. }faceData; //= totalsize32bytes Here'spseudo-code fortheculling(pleaserefertotheappendix foraC++implementation): for(eachvertexin pool) {   markvertex "notvisible" } Transform Vw(viewerinworldspace)toVo(viewerinobjectspace)usingtheworld-to-object transformationmatrix. for(eachfacein array) {   Calculatevector T=face->position–Vo Tistheviewing vectorinobjectspace. Calculatethedot productbetweenTandthefacenormal. (T*face->normal) isthecosinebetweenthefacenormalandtheviewingvector. Iftheresult<=0 thenthepolygonisfrontfacing. if(dotproduct <=0) //polygonisfrontfacing { face->p1,face->p2 andface->p3areindicestothevertexpool. Usethemtomark theappropriatevertices"visible" } } Compact culling Compact cullingisderivedfromfront-endcullingandprovidesanalgorithmthat isaccurateandisdoneinobjectspace.Applyingthefollowingtwooptimizations reducesthestorageperpolygonbyhalf: 1.Precalculate thedotproduct.Let'sassumethat: P= positiononpolygon(inobjectspace) N= normalofpolygon(inobjectspace) V= viewingvector(transformedtoobjectspace) Thevisibility checkcalculates(P-V)*N,whichcanalsobewrittenas(P*N)-(V*N). Wecanprecalculate (P*N)andstoretheresult.Bystoringthedotproduct,we'veeliminated theneedtostorethepolygon'sposition. 2.Store thenormalinfixed-pointrepresentation.Becausethepolygonnormalis normalized,itscomponentsarewithintherange–1to+1.Ifwemultiply thenormalbythemaximumvalueforsignedword,itwillbewithinthe rangeof–32767to+32767.Wecannowstorethenormalasa2-bytesigned wordinsteadofa4-bytefloat. Ourface structureisnow struct{   floatpn; // precalculatedposition*normal //4bytes signedshortnormal[3]; //normalofpolygon //6bytes WORD p1,p2, p3; //indicestovertexpool //6bytes }faceData;//=total size16bytes Thesize ofthecompactstructureis50percentofthesizeofthestandardstructure. Ourcullingloopnowbecomes: for(eachfacein thearray) { Expandthenormalbacktofloatfromitsfixed-pointrepresentation. Calculateface->tn –(face->normal*Vo) Previouslywecalculated (P-V)*Nhere,butsincewehavetn=P*N(precalculated)we nowneedtocalculate(tn–V*N) if(result<= 0) //polygonisfront-facing { face->p1,face->p2 andface->p3areindicestothevertexpool. Usethemtomark theappropriatevertices"visible"   } } Performance results Theperformance improvementsthatcompactcullingoffersdependsontheimplementation. However,eventhemoststraightforwardimplementationshouldhavesome importantbenefits.Forone,compactcullingoffersbettermemoryutilization. Becausethecompactstructureishalfthestandardsize,therewillbe fewercachemisses.Furthermore,compactcullingusesfront-endculling formoreobjects.Ifwehavebothaheavycontentapplicationandmemory sizerestrictions,wecanusefront-endcullingforobjectsthatweotherwise wouldn'tbeabletoincludeinourscene. Theappendix containsadowloadabletestapplicationthatcomparescullingfortwo kindsofobjects.Foregroundandbackgroundobjectsarerepresented byasphereandaterrain,respectively.Mostgameobjectscorrespond tooneofthesetypeswithregardtothenumberofculledtriangles.On average,aboutof50percentofthetrianglesinaforegroundobjectare backfacing.Backgroundobjectshavefewerback-facingtriangles,depending onthekindofbackground. Thetest applicationcontainsfourcullingimplementationsinC++: 1.Compact culling–Scalarimplementation. 2.Compact culling–SIMDimplementation. 3.Compact culling–SIMDimplementationwithprefetchinstructions. 4.Non-compact (standard)cullingimplementation. Thefirst isascalarimplementationwithacompactstructure,asmentionedpreviously. ThesecondandthirduseacompactstructureexpandedforSIMD(single instruction,multipledata);eachcomponentofthestructurebecomesan arrayof4.TheseimplementationsallowustouseSIMDinstructions,including MMXtechnologyandStreamingSIMDExtensions,andtoprocessfourelements inparallel.Thethirdimplementationisidenticaltothesecond,with theadditionofmemoryprefetchinstructions.Thefourthisthestandard non-compactcullingimplementation. Performance ratiosareasfollows.Notethatahighertrianglecountyieldshigher ratiosbecauserelativelymoretimeisspentintheoptimizedkernel. SPHEREOBJECT #triangles 400 6400 20,000 CompactScalar/ Non-Compact 1.15 1.2 1.2 CompactSIMD/ Non-Compact 1.45 1.6 1.7 CompactSIMD+prefetch / Non-Compact 2.3 2.7 2.7 Thefacet normal-basedfront-endcullingtechniqueincreasesthetotalapplication performancebyeliminatingtheexcessivetrianglesearlyinthe3Dpipeline. Italsoofferstheadvantageofthesequentialmemoryaccess.Themodel sizeincreaseduetopre-calculatedfacetnormalscanbelimitedusing thecompactcullingtechniquedescribedinthispaper.Ourmeasurements haveshownthatusingMMXtechnologyandStreamingSIMDExtensionscan doubletheperformanceofthecompactculling. Osnat LevihasaB.SC.(1991)incomputersciencefromTheTechnion-Israel InstituteofTehnology.Herareasofconcentrationarein3Dgraphics andoptimizationtechniques.OsnatworksintheMediaTeamatIntel's IsraelDesignCenter(IDC)inHaifa,Israel.BeforejoiningIntel,Osnat wasdevelopingsoftwareforembeddedrealtimemilitarysystems.Shecan becontactedat[email protected]. Ronen ZoharhasaB.A.(1998)incomputersciencefromTheTechnion-Israel InstituteofTechnology.Hisareasofconcentrationarein3Dgraphics, geometricmodelingandcomputervision.RonencurrentlyworksintheMedia TeamatIntel'sIsraelDesignCenter(IDC)inHaifa,Israel.Hecanbe reachedat[email protected]. Haim BaradhasaBSEEandBSCSfromTulaneUniv(1981).HealsohasanMSEE (1983)andPhD(1987)fromUnivofSouthernCalifornia.Hisareasofconcentration arein3Dgraphics,videoandimageprocessing.Haimcurrentlyleadsthe MediaTeamatIntel'sIsraelDesignCenter(IDC)inHaifa,Israel. Alex KlimovitskiisaseniorapplicationengineerwithIntelDeveloperRelations andEngineeringGroupEurope.Alexworksonsoftwarelibrariesandtools forthelatestIntelprocessors,mostrecentlyforPentium®III.He assistedseveralleadingsoftwarevendorsinportingtheirproductsto PentiumIII.BeforejoiningIntel,Alexwasdevelopingsignalprocessing andscientificvisualizationsoftware. Appendix Thisappendix containsaC++implementationwithall fourmethods:non-compactculling,compactscalarculling,compactSIMD culling,andcompactSIMDcullingwithprefetchinstructions. Thetraditional, non-compactstructureforstoringtriangledatais32bytesandisthe following: structSTriangleData {   floatpx,py,pz; //positiononthetriangle floatnx,ny,nz; //thenormalofthetriangle WORDv1,v2,v3; //indicestothevertexpool WORDstub; //paddingtofillacachelineand //avoidalignmentstalls }; Thecompact structureisisonly16bytes.Westorethenormalinfixed-pointrepresentation insteadoffloat.Thetnfieldisaprecalculateddotproductoftheposition*normal ofthetriangle.Hereitis: struct SCompactTriangleData {   float tn; // precalculateddotproduct //(position*normal) signedshort nx,ny,nz; //thenormalofthetriangle WORD v1,v2,v3; //indicestothevertexpool }; Thecompact SIMDstructurehasthesamefieldsasthecompactstructure,exceptthat eachisfourelementswide.TheF32vec4andIs16vec4classesaredefined intheIntelVectorClassLibraries.TheselibrariesprovideaC++interface forusingMMXTechnologyandStreamingSIMDExtensions.Hereisthecompact SIMDstructure: struct SCompactSIMDTriangleData {   F32vec4 tn; //4itemsofprecalculateddot //product(position*normal) Is16vec4 nx,ny,nz; //4itemsofthetriangle'snormal WORD v[4][3]; //4itemsofindicestothevertexpool }; Thetraditional backfacecullingroutineisasfollows: intC3DObject::backfaceCulling() {   //With1bitper vertexcalculatenumberof //dwordsincullflagarray DWORDcullFlagsLen =(m_nVerts+31)/32; inti; intff= 0;//frontfacingtriangles //Markallvertices as"invisible" for(i=0 ;i 0) //anyofthe4dotproductsisnegative {   for(int j=0;j<4;j++) //do4times {   if (mask&1) //dotproductisnegative {   ff++; // markthe //triangle3 //verticesas //"visible" SELECT_VTX(vertsCulled, m_triData[i].v[j][0]); SELECT_VTX(vertsCulled, m_triData[i].v[j][1]); SELECT_VTX(vertsCulled, m_triData[i].v[j][2]); } // shiftright mask >>=1; } } } _m_empty(); return ff; } Returntothefullversionofthisarticle Copyright© UBMTech,Allrightsreserved



請為這篇文章評分?