A Compact Method for Backface Culling - Gamasutra
文章推薦指數: 80 %
Backface culling is an important part of how a 3D engine performs visibility checks. Its purpose is to detect polygons that are invisible in a particular ...
ACompactMethodforBackfaceCulling
ByOsnatLevi
Backface
cullingisanimportantpartofhowa3Dengineperformsvisibilitychecks.
Itspurposeistodetectpolygonsthatareinvisibleinaparticularscene
–thatis,polygonsthatfaceawayfromtheviewer.Detectingback-facing
polygonsallowsustoeliminatethemfromanengine'srenderingpipeline
atanearlystage,thusreducingtheamountofcomputationandmemory
traffic.Therearevariouswaystodetectwhetherapolygonfacestowards
orawayfromtheviewer,andeachmethodhasitsownlevelofcomplexity
andaccuracy.
Thisarticle
describesanewcullingtechniquethatreducesthestoragerequirements
forfacetnormalswhenperformingbackfaceculling.Thishighlyaccurate
technique,whichperformsaccuratebackfacecullinginobjectspace,is
particularlyapplicabletofront-endculling,andrequiresonlyhalfas
muchstorageasthestandardfacet-normaltechnique.Includedattheend
ofthisarticleisanappendixcontainingC++implementationsofboth
thetraditionalmethodandthecompactmethod,bothofwhichcomeinscalar
andSIMDmodes.Thecodeforthesemethodsiscontainedinadownloadable
archive.
Motivation
forfront-endculling
Backface
cullingcanbeperformedatseveralstagesinthe3Dpipeline.Though
wecouldjustsitbackandlettherasterizercullforus,it'sadvantageous
toperformthecullingstepearlier.Afterall,theearlierwegetrid
ofirrelevantdata,thelessdataneedstobepushedaroundthesystem
(savingbandwidth)andthefewercalculationsthatmustbeperformed(saving
CPUload).Cullmaybeperformedduringoneofthreestages:
1.Before
transformationandlighting
2.After
transformationbutbeforelighting
3.Inthe
rasterizer(aftertransformationandlighting).
Culling
duringstages2and3istypicallyperformedinscreenspacebychecking
theclockwise/counterclockwiseorderofapolygon'svertices.Front-end
culling(stage1)istypicallyperformedbycalculatingthedotproduct
oftheviewingvectorandthefacetnormalofthepolygon.Thefacetnormal
canbecalculatedontheflyorprecalculatedandstoredwiththedata
set.Inanycase,cullingupfront(stage1)typicallyisfasterthan
theothercullingstrategies(stages2or3)sinceitsavesbandwidth
andrequiresfewercalculations.
Thereal
expenseoffront-endcullingisthatweeitherhavetocalculatethefacet
normalontheflyoruseaprecalculatedfacetnormal,whichincreases
thesizeofourmodeldatabase.However,thisincreasecanbereduced
byhalf,asweshowbelow.Andalthoughtheprecalculatedfacetnormals
addtothemodelsize,themethodaccessesmemorysequentially,withis
advantageoustous.
Ontheother
hand,withthecullingbasedonvertex-ordertesting(stages2or3),
thecullingprocedureloopsthroughthetrianglesandaccessesthevertices
foreachtriangle.Theverticesofneighboringtriangles(andevenof
thesametriangle)canbespreadalloverthevertexpool,causingrandom
memoryaccessesduringtheculling.Theserandomaccessesareslowand
leadtopoorcacheutilization.
Theculling
techniquebasedonfacetnormalsalsoloopsthroughthetriangles,but
itaccessestheirfacetnormalsinsteadoftheirvertices.Sincethefacet
normalsarestoredonper-trianglebasis,theyarefetchedsequentially
inthecullingloop.Thereforethesequentialmemoryaccessesarefast,
theyutilizethecacheeffectively,andtheycanbefurtheraccelerated
usingprefetch.
Ourwork
withgamedevelopershasshownthatfront-endcullingoftenincreases
performancesubstantially(a10to20percentboostintheframe-rate)
ifdoneproperly.Inthefollowingsections,weshowhowtoimplement
efficientfront-endculling,usingprecalculatedfacetnormalsstored
inacompactformwithonlyhalfthestoragerequirementsofthestandard
form.
Accurate
front-endcullingusingfacetnormals
Thistechnique
performsaper-polygontestusingthecamerapositionandthepolygon's
facetnormalandpositiontodeterminewhetheritisfrontorbackfacing.
Itsmainadvantagesareitsaccuracy—themethodcullsthemaximumnumber
ofpolygons—andthefactthatitscalculationsaredoneinobjectspace,
whicheliminatesverticesinthepretransformphase.
Theclassic
implementationofthistechniquerequiresustoprecalculateandstore
thefollowinginformationaboutthepolygon:
•Thenormal
ofthepolygon(facetnormal)storedinobjectspace.(Sometoolswill
exportthenormal,butotherwiseitmustbecalculatedusingthepolygon's
vertices.)
•Aposition
ontheplaneofthepolygoninobjectspace—eitheroneofthevertices
orthepolygon'scenter.
•Alist
holdingtheindicesofthepolygon'svertices(indiceslist)
Todetermine
whetherapolygonisfrontorbackfacing,wemustfirsttransformthe
viewerposition(orcameraposition)toobjectspace,calculatetheviewing
vector(thevectorinwhichthevieweris"looking"atthepolygon),and
thencalculatethedotproductbetweentheviewingvectorandthepolygon
normal.Iftheresultispositive,thenthepolygonisfacingbackwards.
Here'san
exampleofthetraditionalimplementationforindexedtriangles.Thestructure
we'reassumingforeachfaceis32bytesinsize,andisasfollows:
struct{
floatx,y,z;
}Vector;
struct{
Vector position;
//positiononpolygon
//12bytes
Vector normal;
//normalofpolygon
//12bytes
WORDp1,p2,p3;
//indicestovertexpool
//6bytes
WORDstub; //
neededtoavoid2bytealignment
//stallsandfillingacacheline.
}faceData; //=
totalsize32bytes
Here'spseudo-code
fortheculling(pleaserefertotheappendix
foraC++implementation):
for(eachvertexin
pool)
{
markvertex
"notvisible"
}
Transform
Vw(viewerinworldspace)toVo(viewerinobjectspace)usingtheworld-to-object
transformationmatrix.
for(eachfacein
array)
{
Calculatevector
T=face->position–Vo
Tistheviewing
vectorinobjectspace.
Calculatethedot
productbetweenTandthefacenormal.
(T*face->normal)
isthecosinebetweenthefacenormalandtheviewingvector.
Iftheresult<=0
thenthepolygonisfrontfacing.
if(dotproduct
<=0)
//polygonisfrontfacing
{
face->p1,face->p2
andface->p3areindicestothevertexpool.
Usethemtomark
theappropriatevertices"visible"
}
}
Compact
culling
Compact
cullingisderivedfromfront-endcullingandprovidesanalgorithmthat
isaccurateandisdoneinobjectspace.Applyingthefollowingtwooptimizations
reducesthestorageperpolygonbyhalf:
1.Precalculate
thedotproduct.Let'sassumethat:
P=
positiononpolygon(inobjectspace)
N=
normalofpolygon(inobjectspace)
V=
viewingvector(transformedtoobjectspace)
Thevisibility
checkcalculates(P-V)*N,whichcanalsobewrittenas(P*N)-(V*N).
Wecanprecalculate
(P*N)andstoretheresult.Bystoringthedotproduct,we'veeliminated
theneedtostorethepolygon'sposition.
2.Store
thenormalinfixed-pointrepresentation.Becausethepolygonnormalis
normalized,itscomponentsarewithintherange–1to+1.Ifwemultiply
thenormalbythemaximumvalueforsignedword,itwillbewithinthe
rangeof–32767to+32767.Wecannowstorethenormalasa2-bytesigned
wordinsteadofa4-bytefloat.
Ourface
structureisnow
struct{
floatpn; //
precalculatedposition*normal
//4bytes
signedshortnormal[3];
//normalofpolygon
//6bytes
WORD p1,p2,
p3; //indicestovertexpool
//6bytes
}faceData;//=total
size16bytes
Thesize
ofthecompactstructureis50percentofthesizeofthestandardstructure.
Ourcullingloopnowbecomes:
for(eachfacein
thearray)
{
Expandthenormalbacktofloatfromitsfixed-pointrepresentation.
Calculateface->tn
–(face->normal*Vo)
Previouslywecalculated
(P-V)*Nhere,butsincewehavetn=P*N(precalculated)we
nowneedtocalculate(tn–V*N)
if(result<=
0) //polygonisfront-facing
{
face->p1,face->p2
andface->p3areindicestothevertexpool.
Usethemtomark
theappropriatevertices"visible"
}
}
Performance
results
Theperformance
improvementsthatcompactcullingoffersdependsontheimplementation.
However,eventhemoststraightforwardimplementationshouldhavesome
importantbenefits.Forone,compactcullingoffersbettermemoryutilization.
Becausethecompactstructureishalfthestandardsize,therewillbe
fewercachemisses.Furthermore,compactcullingusesfront-endculling
formoreobjects.Ifwehavebothaheavycontentapplicationandmemory
sizerestrictions,wecanusefront-endcullingforobjectsthatweotherwise
wouldn'tbeabletoincludeinourscene.
Theappendix
containsadowloadabletestapplicationthatcomparescullingfortwo
kindsofobjects.Foregroundandbackgroundobjectsarerepresented
byasphereandaterrain,respectively.Mostgameobjectscorrespond
tooneofthesetypeswithregardtothenumberofculledtriangles.On
average,aboutof50percentofthetrianglesinaforegroundobjectare
backfacing.Backgroundobjectshavefewerback-facingtriangles,depending
onthekindofbackground.
Thetest
applicationcontainsfourcullingimplementationsinC++:
1.Compact
culling–Scalarimplementation.
2.Compact
culling–SIMDimplementation.
3.Compact
culling–SIMDimplementationwithprefetchinstructions.
4.Non-compact
(standard)cullingimplementation.
Thefirst
isascalarimplementationwithacompactstructure,asmentionedpreviously.
ThesecondandthirduseacompactstructureexpandedforSIMD(single
instruction,multipledata);eachcomponentofthestructurebecomesan
arrayof4.TheseimplementationsallowustouseSIMDinstructions,including
MMXtechnologyandStreamingSIMDExtensions,andtoprocessfourelements
inparallel.Thethirdimplementationisidenticaltothesecond,with
theadditionofmemoryprefetchinstructions.Thefourthisthestandard
non-compactcullingimplementation.
Performance
ratiosareasfollows.Notethatahighertrianglecountyieldshigher
ratiosbecauserelativelymoretimeisspentintheoptimizedkernel.
SPHEREOBJECT
#triangles
400
6400
20,000
CompactScalar/
Non-Compact
1.15
1.2
1.2
CompactSIMD/
Non-Compact
1.45
1.6
1.7
CompactSIMD+prefetch
/
Non-Compact
2.3
2.7
2.7
Thefacet
normal-basedfront-endcullingtechniqueincreasesthetotalapplication
performancebyeliminatingtheexcessivetrianglesearlyinthe3Dpipeline.
Italsoofferstheadvantageofthesequentialmemoryaccess.Themodel
sizeincreaseduetopre-calculatedfacetnormalscanbelimitedusing
thecompactcullingtechniquedescribedinthispaper.Ourmeasurements
haveshownthatusingMMXtechnologyandStreamingSIMDExtensionscan
doubletheperformanceofthecompactculling.
Osnat
LevihasaB.SC.(1991)incomputersciencefromTheTechnion-Israel
InstituteofTehnology.Herareasofconcentrationarein3Dgraphics
andoptimizationtechniques.OsnatworksintheMediaTeamatIntel's
IsraelDesignCenter(IDC)inHaifa,Israel.BeforejoiningIntel,Osnat
wasdevelopingsoftwareforembeddedrealtimemilitarysystems.Shecan
becontactedat[email protected].
Ronen
ZoharhasaB.A.(1998)incomputersciencefromTheTechnion-Israel
InstituteofTechnology.Hisareasofconcentrationarein3Dgraphics,
geometricmodelingandcomputervision.RonencurrentlyworksintheMedia
TeamatIntel'sIsraelDesignCenter(IDC)inHaifa,Israel.Hecanbe
reachedat[email protected].
Haim
BaradhasaBSEEandBSCSfromTulaneUniv(1981).HealsohasanMSEE
(1983)andPhD(1987)fromUnivofSouthernCalifornia.Hisareasofconcentration
arein3Dgraphics,videoandimageprocessing.Haimcurrentlyleadsthe
MediaTeamatIntel'sIsraelDesignCenter(IDC)inHaifa,Israel.
Alex
KlimovitskiisaseniorapplicationengineerwithIntelDeveloperRelations
andEngineeringGroupEurope.Alexworksonsoftwarelibrariesandtools
forthelatestIntelprocessors,mostrecentlyforPentium®III.He
assistedseveralleadingsoftwarevendorsinportingtheirproductsto
PentiumIII.BeforejoiningIntel,Alexwasdevelopingsignalprocessing
andscientificvisualizationsoftware.
Appendix
Thisappendix
containsaC++implementationwithall
fourmethods:non-compactculling,compactscalarculling,compactSIMD
culling,andcompactSIMDcullingwithprefetchinstructions.
Thetraditional,
non-compactstructureforstoringtriangledatais32bytesandisthe
following:
structSTriangleData
{
floatpx,py,pz;
//positiononthetriangle
floatnx,ny,nz;
//thenormalofthetriangle
WORDv1,v2,v3;
//indicestothevertexpool
WORDstub;
//paddingtofillacachelineand
//avoidalignmentstalls
};
Thecompact
structureisisonly16bytes.Westorethenormalinfixed-pointrepresentation
insteadoffloat.Thetnfieldisaprecalculateddotproductoftheposition*normal
ofthetriangle.Hereitis:
struct SCompactTriangleData
{
float tn; //
precalculateddotproduct
//(position*normal)
signedshort nx,ny,nz;
//thenormalofthetriangle
WORD v1,v2,v3;
//indicestothevertexpool
};
Thecompact
SIMDstructurehasthesamefieldsasthecompactstructure,exceptthat
eachisfourelementswide.TheF32vec4andIs16vec4classesaredefined
intheIntelVectorClassLibraries.TheselibrariesprovideaC++interface
forusingMMXTechnologyandStreamingSIMDExtensions.Hereisthecompact
SIMDstructure:
struct SCompactSIMDTriangleData
{
F32vec4 tn;
//4itemsofprecalculateddot
//product(position*normal)
Is16vec4 nx,ny,nz;
//4itemsofthetriangle'snormal
WORD v[4][3];
//4itemsofindicestothevertexpool
};
Thetraditional
backfacecullingroutineisasfollows:
intC3DObject::backfaceCulling()
{
//With1bitper
vertexcalculatenumberof
//dwordsincullflagarray
DWORDcullFlagsLen
=(m_nVerts+31)/32;
inti;
intff=
0;//frontfacingtriangles
//Markallvertices
as"invisible"
for(i=0
;i
延伸文章資訊
- 1learnopengl学习笔记——面剔除(face culling) - CSDN博客
这正是面剔除(Face Culling)所做的。OpenGL能够检查所有面向(Front Facing)观察者的面,并渲染它们,而丢弃那些背向(Back Facing)的面,节省我们很多 ...
- 2Face culling - LearnOpenGL
face culling does. OpenGL checks all the faces that are · front facing towards the viewer and ren...
- 3What is Back-face Culling? - Computer Hope
Back-face culling is a method in computer graphics programming which determines whether a polygon...
- 4OpenGL Tutorial 16 - Face Culling & FPS Counter - YouTube
- 5OpenGL學習腳印:背面剔除(Face Culling) - 程式人生
OpenGL學習腳印:背面剔除(Face Culling). 阿新• • 發佈:2019-02-02. 寫在前面在繪製封閉型別的幾何物件時,開啟背面剔除功能能夠提高渲染效能。本節簡要介紹下背面 ...