Contact-hardening Soft Shadows Made Fast - GameDev.net


Published September 02, 2018 by Wojciech Sterna, posted by maxest

Figure 1: Contact-hardening soft shadows in the Sponza scene. Note how the green adornments' shadows are hard (because the shadows are close to the adornments) whereas poles hung high in the air (the poles are not visible in this screenshot) cast very blurry, barely visible shadows (like on the green adornment in the center of the figure or on the lit wall above the arch). Shadows of the columns in the center of Sponza best present the contact-hardening nature of shadows. The content presented here, but from a different point of view, will be the subject of the following screenshots. Compare this screenshot with figure 2 to better understand which objects' shadows we will be inspecting throughout the article.

Shadow Mapping [15] is by far the most prevalent technique used to render shadows in real time. One of its advantages is that it's quite easy to get decent soft shadows with this technique, as in [14]. From that point on we can go straight to contact-hardening soft shadows (CHSS), which were first introduced in the Percentage-Closer Soft Shadows (PCSS) paper [7]. [2] reviews the subject extensively.
Soft shadows are slower than standard shadow mapping because they usually require taking more than one shadow map sample, as in the aforementioned [14]. There are alternative approaches, however: Exponential Shadow Maps (ESM) [6] and Variance Shadow Maps (VSM) [10] treat shadow map texels as random variables and apply statistical methods to them. These methods are usually faster but suffer from annoying artifacts like light bleeding. A completely different approach to producing soft shadows is to blur them in screen space, as in [4]. It's a very interesting approach but comes with all the problems associated with computing in screen space, like how to blur a shadow boundary that covers the entire screen without killing the GPU.

Contact-hardening soft shadows, based on [7], are even slower than regular soft shadows because they require even more samples to find the average occluder depth that is used to estimate the penumbra (the region of transition between fully lit and fully shadowed areas). So, up from barely one shadow map sample, we might end up with 16 for occluder depth estimation and 32 for the actual shadow map filtering. That is a lot by today's standards, especially with video game consoles in mind. One implementation worth checking out that shipped with a big commercial game is [5].

Statistical methods, like ESM and VSM, and screen space methods both have their merits. In this article, however, I'll focus only on how to improve on PCSS. We will start with how to achieve decently big penumbras without the need to take dozens of samples and will proceed to how to make soft shadows contact-hardening at only a fraction of additional GPU time. We will also investigate checkerboarding and see how that relatively new optimization approach performs with regard to shadow rendering.

This article is meant for an audience that already has experience in implementing shadow mapping and would like to have fast contact-hardening soft shadows in their games or simulations. This work is heavily based on PCSS [7] and aspires to be its improvement. There is a demo application [11] presenting the techniques described here.

Figure 1 presents contact-hardening soft shadows in an actual scenario.
1 Soft Shadows with Few Samples

In order to have bigger penumbra regions a large number of shadow map samples is needed. An initial approach usually involves taking all shadow map samples within a given region. That can be a lot of samples, particularly when the shadow map is of high resolution (to cover the same world space region, a high-res shadow map needs more samples than a lower-res one). A way to speed this up is to use the GPU's gather instructions, as was done in [5] and as described in [12]. With gather instructions we can reduce the number of sampling operations in a shader by a factor of four. But even that might not be enough for the big sample kernels that one might want for elongated soft sunlight shadows.

An alternative to sampling the whole region is to randomly pick a few samples in the region and only use those. But if we take the same random set of samples and apply it to all pixels in the region we will end up with quite nasty banding. Compare figures 2 and 3. To counter this effect we can make sure that each pixel in the region uses a different set of random samples. This will result in noisy shadows, which is usually preferred to banding, but not always. Both have their merits: noise gives randomness whereas banding gives stability. Ideally we would like something in between. It turns out that various researchers have worked on this problem and come up with interesting solutions.

We actually have two problems to solve right now. The first is to figure out which samples to pick for shadow map sampling. The second is how to pick different samples for different pixels in the region since, as we know, using exactly the same samples for all pixels in the region causes banding. We will solve the first problem using Vogel disk samples and the second with interleaved gradient noise.

Figure 2: Naive shadows with 11x11 kernel, that is 121 samples.

Figure 3: Shadows with 16 samples. Banding is seen because each pixel uses the same shadow map sample coordinates.

1.1 Vogel Disk

The Vogel disk algorithm [3][1] spreads samples evenly on a disk, as shown in figure 4. A great feature of the Vogel disk is that when you rotate the points they never map onto themselves (a given sample will not map onto itself nor onto any other sample), unless you rotate by 2π.

Figure 4: Vogel disk.
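For reference, the placement of the samples can be written in closed form. This is only a sketch: the symbols are not used in the article itself. Here N is the number of samples, i the sample index, φ an extra rotation applied to the whole set, and θ_g the golden angle.

```latex
r_i = \sqrt{\frac{i + 0.5}{N}}, \qquad
\theta_i = i\,\theta_g + \varphi, \qquad
\theta_g = \pi\left(3 - \sqrt{5}\right) \approx 2.39996,
\qquad
(x_i,\, y_i) = \left(r_i \cos\theta_i,\; r_i \sin\theta_i\right),
\quad i = 0, \dots, N - 1 .
```

The square root in the radius is what keeps the density even: the area of a disk grows with the square of its radius, so the i-th sample sits on the circle that encloses roughly i/N of the unit disk's area.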
Listing 1 shows how to generate Vogel disk coordinates in a shader.

    float2 VogelDiskSample(int sampleIndex, int samplesCount, float phi)
    {
        float GoldenAngle = 2.4f;

        float r = sqrt(sampleIndex + 0.5f) / sqrt(samplesCount);
        float theta = sampleIndex * GoldenAngle + phi;

        float sine, cosine;
        sincos(theta, sine, cosine);

        return float2(r * cosine, r * sine);
    }

Listing 1: Vogel disk samples generation

As an alternative to the Vogel disk, a somewhat similar set of samples can be generated with blue noise. Blue noise is also often used when not-that-random randomness is needed, as in [8]. I checked both sets of samples and found that, at least for shadows, Vogel seems to produce more accurate results with less undersampling. Vogel also has the advantage that it's very cheap to compute at runtime. The source code of the demo declares an array that stores blue noise samples.

1.2 Interleaved Gradient Noise

Figure 3 actually uses Vogel disk samples for all of the screen's pixels, but that is not enough to compensate for the low number of samples used. The Vogel disk really shines when used in combination with a good random function applied per pixel. Let's say that you have a region of pixels where each pixel uses the same Vogel disk samples (shown in figure 4) but applies a slightly different rotation to all these samples, such that the rotation value spans the range [0; 2π] over different pixels. This guarantees that all pixels in the region sample the shadow map with different sets of samples.

A good random function, called interleaved gradient noise, was developed by [9]. This function takes the pixel's window space coordinates as input and outputs a "random" number in the [0; 1] range. Multiplying the result by 2π and passing it as the argument phi to VogelDiskSample results in very high quality, very cheap shadows, as presented in figure 5. Usage of interleaved gradient noise is the only difference between the results in figures 5 and 3. Also note how the shadows in figure 5 are nicely rounded compared to the blocky ones in figure 2. The reason is that in the latter case a naive 11x11 rectangular filter was used.

Figure 5: Interleaved Gradient Noise together with Vogel disk. Only 16 shadow map samples.

Listing 2 presents the interleaved gradient noise function.
    float InterleavedGradientNoise(float2 position_screen)
    {
        float3 magic = float3(0.06711056f, 0.00583715f, 52.9829189f);
        return frac(magic.z * frac(dot(position_screen, magic.xy)));
    }

Listing 2: Interleaved Gradient Noise

2 Contact-hardening Soft Shadows

In this section we will see how contact-hardening shadows work, as presented in [7]. After that, we will go through two different ways to speed up the base algorithm.

2.1 Regular Solution

To make soft shadows contact-hardening, all we need to know is how big the penumbra for a given pixel is or, in other words, how big the shadow map sampling kernel for a given pixel should be. This is estimated with a procedure called average blocker search, shown in figure 6. We have a receiver, whose little red square is the pixel we're calculating shadows for. There is also a blocker floating above the receiver that blocks some light, and finally a light source (and its shadow map). We need to find the depths (in light space) of pixels that block the red square from light, average those depths, and use that average to compute the size of the penumbra in that area.

Think about what this particular shadow map from figure 6 contains. The right side of that shadow map contains depths of the blocker, whereas the left side stores the blue receiver's depths. If we take some kernel around the receiver's red area and take a few samples (orange dots) of the shadow map, some of those samples will hit the blocker's depths (samples on the right side) and some will hit the receiver (samples on the left side). Since the red area's depth is more or less equal (up to the bias that we usually employ in shadow mapping techniques) to the depths on the left, we don't consider those depths as blockers. But we do treat as blockers the depths that are smaller (closer to the light) than the depth of the red area, which in this case are the depths that come from the green blocker. These depths are averaged together to yield the average occluder depth that lets us estimate the penumbra of the pixel's shadow.

Figure 6: Average blocker search. Orange dots/samples' depths are checked against depths in the shadow map. Shadow map depths that are closer to the light are considered blockers and are averaged together.
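It is important to realize that this algorithm has its drawbacks. Look again at figure 6 and the blocker. Now imagine you have not one but two such blockers, where one of them (call it b1) is very close to the light source and the second one (call it b2) is very close to the receiver. Since the shadow map only stores depths of the closest layer, that is blocker b1 in this case, it has no information about the depths of blocker b2. And it is blocker b2's depths that we care about in this case. This is where average blocker search based on a one-layer shadow map fails. An obvious solution would be to store more layers of depths in the shadow map, but that would be overkill. Fortunately, the artifacts that result from this drawback are often negligible.

Listing 3 shows a function that calculates the penumbra, which is a value used to scale the shadow map sampling kernel.

    float Penumbra(float gradientNoise, float2 shadowMapUV, float z_shadowMapView, int samplesCount)
    {
        float avgBlockersDepth = 0.0f;
        float blockersCount = 0.0f;
        // Loop body restored from the description in section 2.1; penumbraFilterMaxSize and pointClampSampler are assumed names.
        for (int i = 0; i < samplesCount; i++)
        {
            float2 sampleUV = VogelDiskSample(i, samplesCount, gradientNoise);
            sampleUV = shadowMapUV + penumbraFilterMaxSize * sampleUV;
            // Read the raw depth stored in the shadow map (no hardware comparison here).
            float sampleDepth = shadowMapTexture.SampleLevel(pointClampSampler, sampleUV, 0).x;
            // Only depths closer to the light than the shaded pixel count as blockers.
            if (sampleDepth < z_shadowMapView)
            {
                avgBlockersDepth += sampleDepth;
                blockersCount += 1.0f;
            }
        }

        if (blockersCount > 0.0f)
        {
            avgBlockersDepth /= blockersCount;
            return AvgBlockersDepthToPenumbra(z_shadowMapView, avgBlockersDepth);
        }
        else
        {
            return 0.0f;
        }
    }

Listing 3: Penumbra calculation.

The function Penumbra first finds the average blockers depth and then converts that value to the actual penumbra with AvgBlockersDepthToPenumbra. Listing 4 shows the implementation of that function.

    float AvgBlockersDepthToPenumbra(float z_shadowMapView, float avgBlockersDepth)
    {
        float penumbra = (z_shadowMapView - avgBlockersDepth) / avgBlockersDepth;
        penumbra *= penumbra;
        return saturate(80.0f * penumbra);
    }

Listing 4: Average blockers depth to penumbra conversion.

This function should be implemented such that it suits your needs.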
The basic idea is to take the distance between the depth of the pixel we're calculating shadows for, z_shadowMapView, and the average blockers depth. The bigger that distance, the bigger the penumbra should be. In the demo accompanying this article, all depths are in the shadow map's view space. The 3rd line of listing 4 (the first statement of the function body) calculates the distance and "normalizes" it. Later on we square it to get rid of a possible minus sign, but also to make the transition between fully hard and fully soft shadows more visible. Finally, the penumbra is scaled by some constant and clamped to the [0; 1] range.

The original formula from [7] is a bit different from listing 4 and takes the light's size into account directly, making it more physically correct. It is shown in listing 5.

    float AvgBlockersDepthToPenumbra(float lightSize, float z_shadowMapView, float avgBlockersDepth)
    {
        float penumbra = lightSize * (z_shadowMapView - avgBlockersDepth) / avgBlockersDepth;
        return penumbra;
    }

Listing 5: Average blockers depth to penumbra conversion from [7].

You should fiddle with that function to get the look and feel of soft shadows and the hard-to-soft transition that suits your needs.

For the sake of completeness, listing 6 shows how the penumbra estimation function is used in conjunction with the actual shadow map sampling.

    float penumbra = Penumbra(gradientNoise, shadowMapUV, z_shadowMapView, 16);

    float shadow = 0.0f;
    for (int i = 0; i < 16; i++)
    {
        float2 sampleUV = VogelDiskSample(i, 16, gradientNoise);
        sampleUV = shadowMapUV + sampleUV * penumbra * shadowFilterMaxSize;

        shadow += shadowMapTexture.SampleCmp(linearClampComparisonSampler, sampleUV, z_shadowMapView).x;
    }
    shadow /= 16.0f;

Listing 6: Shadows computation with penumbra used as kernel scale.

In line 9 (the SampleCmp call), instead of doing traditional shadow mapping by sampling the shadow map and comparing depths ourselves, we use the function SampleCmp together with the linearClampComparisonSampler sampler to let the hardware take the four closest samples, compare them all, and bilinearly filter the results. This is much faster than doing it the traditional way.

Figure 7 presents contact-hardening soft shadows in action.
Figure 7: Regular contact-hardening soft shadows. We clearly see how far from the ground objects are. The red rectangle indicates an area of shadows with highly varying penumbra size, hence some jittering/jaggies can be seen. These jaggies are caused by the fact that some pixels (randomized by interleaved gradient noise) fall in areas of zero penumbra and some in areas of maximum penumbra, hence the varying lightness of pixels.

2.2 Penumbra Mask

Penumbra estimation, if it uses the same number of samples as the actual shadow mapping, can be as expensive as the full soft shadow mapping itself. So the cost of contact-hardening soft shadows is twice as big in this case. Halving the number of samples (only for the penumbra) can significantly reduce rendering time. It will still be a few dozen percent slower than full non-contact-hardening soft shadows though. The simplest way to significantly reduce rendering time is to apply the same trick that we use all the time in real-time rendering: render at lower resolution.

Shadows themselves need full-screen resolution because they are a very high-frequency phenomenon (there are rapid changes in the intensity of the signal: there is a shadow of an object for a few pixels and then suddenly there can be a fully lit area, with the shadow ending abruptly). But the penumbra quite often changes gradually, so it is a good candidate for rendering at lower res and then upsampling the results. In essence, we split the shadow mask generation pass into two passes: a first one that computes the penumbra mask at quarter res (penumbra mask pass) and a second one that computes the actual soft shadows but samples the penumbra from the penumbra mask (shadow mask pass). Figure 8 shows the result of that change.

Figure 8: Soft shadows with penumbra rendered in a separate pass in lower res.
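The two-pass split described above could be sketched as follows. This is not the demo's code: penumbraMaskTexture and linearClampSampler are assumed names, the value given to shadowFilterMaxSize is a placeholder, and SampleCmpLevelZero is used in place of SampleCmp only so that the comparison sample does not depend on screen-space derivatives inside the loop. The penumbra mask pass itself would simply write the result of the penumbra estimation function into a quarter-res render target.

```hlsl
Texture2D<float> shadowMapTexture;
Texture2D<float> penumbraMaskTexture;                 // assumed name: quarter-res output of the penumbra mask pass
SamplerComparisonState linearClampComparisonSampler;
SamplerState linearClampSampler;                      // assumed name: bilinear upsampling of the mask

static const float shadowFilterMaxSize = 0.01f;       // placeholder value, in shadow map UV units

float2 VogelDiskSample(int sampleIndex, int samplesCount, float phi)
{
    float goldenAngle = 2.4f;
    float r = sqrt(sampleIndex + 0.5f) / sqrt((float)samplesCount);
    float theta = sampleIndex * goldenAngle + phi;

    float sine, cosine;
    sincos(theta, sine, cosine);
    return float2(r * cosine, r * sine);
}

// Shadow mask pass (full res): the penumbra is read from the low-res mask
// instead of being estimated per full-res pixel.
float ShadowFromPenumbraMask(float2 screenUV, float2 shadowMapUV,
                             float z_shadowMapView, float gradientNoise)
{
    // Bilinear fetch upsamples the quarter-res mask to the current pixel.
    float penumbra = penumbraMaskTexture.SampleLevel(linearClampSampler, screenUV, 0);

    float shadow = 0.0f;
    for (int i = 0; i < 16; i++)
    {
        float2 offset = VogelDiskSample(i, 16, gradientNoise);
        float2 sampleUV = shadowMapUV + offset * penumbra * shadowFilterMaxSize;
        shadow += shadowMapTexture.SampleCmpLevelZero(linearClampComparisonSampler,
                                                      sampleUV, z_shadowMapView);
    }
    return shadow / 16.0f;
}
```

The only per-pixel cost the penumbra adds to the full-res pass in this arrangement is a single texture fetch.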
As you can see, it's quite nice, with the exception that the penumbra often ends abruptly (look at the shadows of those oblong poles). If we look at what the bilinearly upsampled penumbra mask looks like, we will understand why (figure 9). We have, for instance, areas of maximum penumbra (white) interleaved with areas of zero penumbra (black). The problem was there before, but because we rendered in full res, interleaved gradient noise was equally good for sampling both shadows and penumbra. Now that we render the penumbra in lower res, it is not enough. Not all is lost though. A simple solution is to increase the size of the penumbra's sampling kernel. Multiplying it by 1.2 (with regard to what is used for the actual shadow mapping) results in what is seen in figure 10.

Figure 9: Penumbra mask visualized.

Figure 10: Soft shadows with penumbra rendered in a separate pass in lower res using the kernel scaling trick. The red rectangles outline the same types of areas that were described in the caption of figure 7. This time, because we're rendering in lower res and upsampling, jittering is more obvious and looks like bubbles of sorts (despite the kernel scaling trick). It wouldn't be a big problem if it wasn't for the fact that very distracting swimming occurs under camera motion.

The caption of figure 10 describes the problem with swimming bubbles: even though we scaled the kernel, it is not enough to eliminate all artifacts. We need another hack to fix this. The solution is to blur the penumbra mask to soften the bubbles. Figure 11 shows what the penumbra mask looks like when blurred with a 7x7 separable Gaussian blur with σ = 3. Figure 12 shows the final result of both applying kernel scaling and blurring the penumbra mask.

Figure 11: Penumbra mask blurred visualized.

Figure 12: Soft shadows with penumbra rendered in a separate pass in lower res. Kernel scaling of 1.2 for penumbra estimation applied. Also, the penumbra mask used is blurred with a 7x7 separable Gaussian blur with σ = 3.
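A separable 7-tap Gaussian blur with σ = 3, as mentioned above, might look roughly like the sketch below. The resource and constant names (penumbraMaskTexture, pointClampSampler, penumbraMaskPixelSize) are assumptions made for the example and are not taken from the demo. The function is meant to be run twice, once with direction (1, 0) and once with (0, 1) on the output of the first run.

```hlsl
Texture2D<float> penumbraMaskTexture;   // assumed name: low-res penumbra mask (or the output of the first blur pass)
SamplerState pointClampSampler;         // assumed name
float2 penumbraMaskPixelSize;           // assumed constant: 1 / penumbra mask resolution

// Unnormalized Gaussian weight for an integer tap offset.
float GaussianWeight(int offset, float sigma)
{
    float x = (float)offset;
    return exp(-(x * x) / (2.0f * sigma * sigma));
}

// One direction of the separable blur: 7 taps centred on uv.
float BlurPenumbraMask(float2 uv, float2 direction)
{
    const float sigma = 3.0f;

    float sum = 0.0f;
    float weightsSum = 0.0f;

    for (int i = -3; i <= 3; i++)
    {
        float weight = GaussianWeight(i, sigma);
        float2 tapUV = uv + direction * penumbraMaskPixelSize * (float)i;

        sum += weight * penumbraMaskTexture.SampleLevel(pointClampSampler, tapUV, 0);
        weightsSum += weight;
    }

    // Dividing by the sum of weights keeps the overall penumbra level unchanged.
    return sum / weightsSum;
}
```

In a real shader the seven weights would normally be precomputed constants; they are evaluated in the loop here only to keep the relation to σ visible.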
If you have an eye for detail you will notice that the problem we had before, with the abruptly ending penumbra, has been reintroduced somewhat (compare the shadow of the post on the left). It's not as bad as in figure 8 but a little worse than in figure 10. So why is it back exactly? Because we blurred the penumbra mask and thus made zero-penumbra regions bleed onto full-penumbra regions. We can fix this again by simply scaling the penumbra kernel by more than 1.2. Of course, we cannot increase that scale indefinitely, as at some point penumbra estimation will start to miss occluders and the result will not conform with what the shadow mask pass is sampling. But there are values in the range of about [1.2; 1.6] where the vast majority of artifacts are gone.

Blurring the penumbra mask introduces two other problems that don't necessarily need to be fixed for the effect to look good, but it's worth knowing about them.

As we know, the penumbra mask is calculated in lower res in screen space. We blur it to alleviate artifacts that stem from rendering in low res. But because we're working in the camera's screen space, blurring "just like that" makes the penumbra mask bleed between geometries that are at possibly very different distances from the camera. This is a huge problem for screen space effects like screen space ambient occlusion, where such bleeding is immediately visible. But it turns out this is almost no problem for penumbra mask computation. If you want, you can do smart blurring by sampling the depth buffer and comparing depths, but in my opinion that is really not necessary.

The other problem is a bit more annoying. Because we blur in screen space with a constant-size kernel, the farther away the camera goes from a shadow, the more the penumbra blurring kicks in. This has the effect of decreasing penumbra size for shadows that are far away from the camera (making soft shadows harder). To fight this, just scale the penumbra blur kernel using the distance from the pixel to the camera, just as is done in effects like screen space ambient occlusion. Note that this solution is not implemented in the demo application, but I quickly prototyped it locally and found it to work as expected.
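One possible way to scale the blur kernel with distance is sketched below. The article does not describe how the author's prototype did it, so everything here is an assumption: the names, the inverse-depth falloff, and the tuning constant blurReferenceDepth (the view-space depth at and below which the full-size kernel is used). A bilinear sampler is assumed because scaled taps no longer land on texel centres.

```hlsl
Texture2D<float> penumbraMaskTexture;   // assumed name: low-res penumbra mask
SamplerState linearClampSampler;        // assumed name: bilinear, scaled taps fall between texels
float2 penumbraMaskPixelSize;           // assumed constant: 1 / penumbra mask resolution
float blurReferenceDepth;               // hypothetical tuning constant, in view-space units

// Perspective projection shrinks on-screen size in proportion to 1 / depth,
// so the screen-space tap spacing is shrunk by the same factor.
float PenumbraBlurScale(float viewDepth)
{
    return saturate(blurReferenceDepth / max(viewDepth, 0.0001f));
}

// One direction of a 7-tap Gaussian blur (sigma = 3) with distance-scaled tap spacing.
// viewDepth is the linear view-space depth of the pixel, assumed to come from a depth buffer.
float BlurPenumbraMaskScaled(float2 uv, float2 direction, float viewDepth)
{
    float scale = PenumbraBlurScale(viewDepth);

    float sum = 0.0f;
    float weightsSum = 0.0f;

    for (int i = -3; i <= 3; i++)
    {
        float x = (float)i;
        float weight = exp(-(x * x) / 18.0f);   // 2 * sigma^2 = 18 for sigma = 3
        float2 tapUV = uv + direction * penumbraMaskPixelSize * x * scale;

        sum += weight * penumbraMaskTexture.SampleLevel(linearClampSampler, tapUV, 0);
        weightsSum += weight;
    }

    return sum / weightsSum;
}
```

Clamping the scale to 1 means the kernel never grows beyond its original 7-texel footprint for nearby pixels; it only shrinks with distance.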
Penumbra blurring is a must to avoid swimming bubbles. But it is possible to avoid blurring smaller values into bigger values (the effect that decreases penumbra size). Before blurring the penumbra mask with a 7x7 kernel you can first apply a 7x7 max filter. The two combined, max and blur, have the effect of only blurring outwards, so no bigger value is ever washed out by smaller ones. This idea actually sounds better than it works. The problem is that the max filter loses information, penumbra information in this case, and this leads to a lot of subtle artifacts. Hence I don't recommend using that filter. I've tried a multitude of combinations of max and Gaussian blur filters and always went back to one single Gaussian blur pass, also for performance reasons.

There is actually one more alternative solution to the drawbacks that blurring the penumbra mask brings. Look at the last return statement of listing 3 (line 27). Now look at figure 9. As you can see, when a pixel is fully lit, i.e. it does not have any occluders, we return 0. But what if we returned 1 instead? The penumbra of the pole on the left would not be affected by blurring at all and that shadow would look correct regardless of the pixel's distance from the camera. At least in this case it would be all fine. In other cases we would get the reverse artifacts. Earlier our problem was that distant shadows that should be soft would become hard. Here it would be the opposite: distant shadows that should be hard would become soft. Unless we scaled the Gaussian blur kernel according to the pixel's distance from the camera, of course.

One more important property of outputting 1 by default instead of 0 is that scaling the kernel in the penumbra mask pass is no longer necessary. So in the end you might prefer to output 1 by default.
There has been a lot of discussion about problems related to rendering the penumbra in a separate pass in lower res and how to fix them. Let's now sum things up. To get contact-hardening soft shadows we need to know what penumbra a pixel has. To calculate the penumbra fast we need to do it in a pass separate from the shadow mask, because then we can do it in lower resolution, as the penumbra is a rather low-frequency phenomenon. When we do so, things generally work except for some artifacts. To fix those artifacts it is usually sufficient to blur the penumbra mask and increase the kernel size when sampling the shadow map in the penumbra mask pass. The latter is not necessary if the penumbra mask outputs 1 instead of 0 by default.

At this point you should have nice contact-hardening soft shadows that are almost as fast as non-contact-hardening soft shadows. There is one last problem though that I wasn't able to solve. Again, because we're rendering the penumbra in low res and also blurring it, there is some shadow swimming where areas with highly varying penumbras blend. This is usually a problem if you use too small a shadow map for too big an area of the scene. But that is something you usually avoid anyway, for instance by using cascaded shadow mapping. So, all in all, the little shadow swimming that is left in those situations can be neglected.

2.3 Min Filter

Before I ever came up with the idea of the penumbra mask, I first came up with the idea of using a min filter on the shadow map. This solution is not appealing to me today but I wanted to mention it. It is implemented in the demo application but it's hardcoded to be turned off, so you need to fiddle with the source code to be able to turn it on.
In standard CHSS we first find the average blocker depth. Since that is the expensive part of the algorithm, I looked for ways to speed it up, even if they are only approximations. I came up with the idea to, instead of finding the average blocker depth, use the minimum depth from the shadow map over some region of texels (a region of the max penumbra size, of course). For the average blocker depth we sample a bunch of texels, say in a 5x5 region of the shadow map, to see if they are closer to the light source than the pixel we're calculating the penumbra for; those that are closer to the light source are averaged together. The idea of the min filter is to just take a single min value over the whole 5x5 filter region and use that directly instead of the average blocker depth. The reason why the min filter makes sense is that this value will always be closer to the light source than the pixel we're calculating the penumbra for. So it is some kind of blocker depth. Not a very sophisticated one, and rather crude, but still.

But why would you do that if you can just compute the average? Because applying a min filter over a region does not have to be done in the shadow mask pass. We can compute a min shadow map from the shadow map in a separate pass. Moreover, the min filter is a separable filter, which means we don't have to take n² samples (assuming an n x n region) but can do two passes, horizontal and vertical, each requiring n samples, totalling 2n. We traded quadratic complexity (dependent on the screen's resolution) for linear complexity (dependent on the shadow map size).

One problem with the min filter is that, by its nature, it loses a lot of information. It creates lumps of pixels, with all pixels in a lump having the same depth, and these depths often differ significantly between lumps. A solution to that is to blur the min shadow map. Later, in the shadow mask pass, you simply sample the blurred min shadow map and use that instead of the average blocker depth. Blurring the min shadow map has the effect of pushing those min values a bit farther from the light source, which could possibly push them behind the pixels that use them (an "occluder" ends up being behind a pixel). In practice, however, that is not a problem.
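One direction of the separable min filter could be sketched like this. It is an illustration rather than the demo's shader: sourceDepthTexture, pointClampSampler and sourcePixelSize are assumed names. The first pass would read the shadow map with direction (1, 0) and the second pass would read the first pass's output with direction (0, 1); a radius of 2 gives the 5x5 region used as the example above.

```hlsl
Texture2D<float> sourceDepthTexture;   // assumed name: the shadow map, or the output of the horizontal pass
SamplerState pointClampSampler;        // assumed name
float2 sourcePixelSize;                // assumed constant: 1 / shadow map resolution

// Minimum depth over (2 * radius + 1) texels along one axis.
float MinFilter(float2 uv, float2 direction, int radius)
{
    float minDepth = sourceDepthTexture.SampleLevel(pointClampSampler, uv, 0);

    for (int i = 1; i <= radius; i++)
    {
        float2 offset = direction * sourcePixelSize * (float)i;

        minDepth = min(minDepth, sourceDepthTexture.SampleLevel(pointClampSampler, uv + offset, 0));
        minDepth = min(minDepth, sourceDepthTexture.SampleLevel(pointClampSampler, uv - offset, 0));
    }

    return minDepth;
}
```

Running the two passes back to back is exact, not an approximation: the minimum over a square equals the minimum over its rows of each row's minimum.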
Figure 13 shows the results. Compare that to figure 12. One change is in the round shadows of an adornment in the top middle part of the figure. Figure 12 handled that nicely, whereas in figure 13 we see how the min depth from some distant geometry floods its neighbours, including the adornment, and makes its penumbra very large. Another problem occurs on the shadows of a column that spreads from the left to the center of the figures. Both the top pole's min depth and some distant geometry at the bottom left of the picture flood the column, making the penumbra of the column's shadows in the middle very large.

Figure 13: Soft shadows with penumbra calculated using min filter applied to shadow map.

So we know the min filter kind of sucks. Does it have any advantages over the solution we discussed in the previous section? Yes. Because the min filter works in the shadow map's space and not in screen space, we don't get any swimming artifacts under camera motion. That is a nice feature of the algorithm as it makes it more stable.

3 Checkerboarding

Since the advent of "even more next-gen" consoles like PS4 Pro, developers have decided to aim even higher in terms of quality and try to reach the magical 4k barrier. PS4 Pro indeed has measurably more computing power than the bare-bones PS4, but not enough to pull off 4k the way PS4 handled full HD. So developers resorted to a kind of cheating to achieve 4k with only half of the required power: checkerboarding.

Figure 14: Checkerboard pattern.
Look at figure 14. The idea of checkerboarding is to perform calculations only for every other pixel, in a checkerboard pattern. This way we only need half of the computing power. The missing pixels are then filled in some way. The tricky part is figuring out this "some way". The way implemented in the demo is averaging the neighbours. Take a look at the figure again. Assume black pixels represent those for which we calculate shadows. Now think about one of the white pixels, which is not computed. The idea of averaging is to simply take the four neighbouring black pixels, average them, and plug that value into the white pixel. There is one catch though. The neighbours might belong to different geometries, and since shadows are a rather high-frequency phenomenon (as opposed to the penumbra), averaging them thoughtlessly will produce artifacts. To do it right we need to compare the depths of the neighbours with the depth of the pixel we're averaging for and only blend in those neighbours whose depths don't differ much. It's the very same thing we do when performing bilateral upsampling in many postprocessing algorithms, like SSAO or depth of field, to save on computing power.

Alternatively, instead of computing averages, we could employ temporal filtering and flip the checkerboard pattern every second frame. This should result in shadow quality on par with a full-res shadow mask.

If you have ever tried computing a high-frequency signal in low res and then upsampling it, you know that this just does not work well. It's hard to make SSAO stable, which is a quite low-frequency effect, let alone a shadow mask. That is true when the low res we are talking about means computing in a buffer which has half of the screen's width and half of the screen's height or, in other words, is four times smaller. In this scenario problems are mostly visible at oblique angles. When the camera is looking straight at a wall with some shadows on it, even not very smooth ones, it's okay. But once the camera is at an oblique angle to some floor, swimming artifacts appear, stemming from undersampling, and they are extremely distracting. You might increase the resolution of the low-res buffer such that overall it holds half as many pixels as the full-res screen (twice as many as the quarter-res buffer), but that will not eliminate the problem: swimming will remain, although it will be reduced a bit.
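The depth-aware fill of a missing checkerboard pixel described at the start of this section could be sketched as below. It is a minimal illustration, not the demo's upsampling shader: the function and parameter names are made up for the example, the four neighbour values are assumed to have been fetched already (for instance with gather instructions), and depthThreshold is a hypothetical tuning parameter.

```hlsl
// Fills a pixel that was skipped by the checkerboard pattern.
// centerDepth:      linear depth of the missing pixel
// neighbourShadows: shadow values of the four computed neighbours (left, right, up, down)
// neighbourDepths:  linear depths of the same four neighbours
// depthThreshold:   hypothetical tuning parameter; the largest depth difference still
//                   treated as "the same surface"
float FillMissingShadowPixel(float centerDepth, float4 neighbourShadows,
                             float4 neighbourDepths, float depthThreshold)
{
    // step(y, x) returns 1 where x >= y, so a weight is 1 where the
    // depth difference does not exceed the threshold and 0 otherwise.
    float4 weights = step(abs(neighbourDepths - centerDepth), depthThreshold.xxxx);
    float weightsSum = dot(weights, float4(1.0f, 1.0f, 1.0f, 1.0f));

    // No neighbour lies on the same surface: fall back to the plain average.
    if (weightsSum == 0.0f)
        return dot(neighbourShadows, float4(0.25f, 0.25f, 0.25f, 0.25f));

    return dot(neighbourShadows, weights) / weightsSum;
}
```

Binary weights keep the example short; a smooth falloff with depth difference, as often used in bilateral upsampling, would work in the same place.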
If, instead of trying to squeeze as much as possible from rendering in low res and upsampling, you render in full res but in a checkerboard pattern, which requires only half of the memory and computing only half of the pixels, you will find that the swimming problem at oblique angles is completely gone. That is the most powerful feature of this algorithm. There will still be some minor swimming artifacts in very detailed areas of the screen, where you just don't have the information you would have had if you had stayed in standard full res, but chances are high that the result will be good enough.

Truth be told, there is a little lie in the statement that checkerboarding costs half as much as the full-res path. Checkerboarding requires smart upsampling, which includes sampling the depth buffer, and that costs some time. This pass's time is constant, however. So the more expensive the checkerboard-res computation pass, the less upsampling costs relatively.

Checkerboarding is supported in the demo application. The application creates a separate render target for storing the results of the checkerboarded shadow mask. This render target has the same height as the screen but half the width. Checkerboard-computed pixels are "packed" into this buffer and later, in the upsample pass, unpacked. Also, the missing pixels are filled in (interpolated). Source code that handles the checkerboarded shadow mask in shaders is located in two places. First, there are some minor modifications to the shadow mask pass itself to handle checkerboarding. The second place is a dedicated shader that performs upsampling. The code is rather easy to follow. One important change is in the gradient noise formula in the shadow mask pass. Using the original formula made samples look less random than in full res. I experimented with the formula and found one modification to work reasonably well, but I think that one could come up with a more scientifically grounded formula. Let's now see the only modification to the shadow mask shader code, presented in listing 7.

    float2 CheckerResPixelCoordToFullResUV(int2 pixelCoord)
    {
        pixelCoord.x *= 2;
        if (pixelCoord.y % 2 == 1)
            pixelCoord.x += 1;

        return ((float2)pixelCoord + float2(0.5f, 0.5f)) * screenPixelSize;
    }

    ...

    #ifndef USE_CHECKER
        float2 uv = input.texCoord;
        float gradientNoise = TwoPi * InterleavedGradientNoise(uv * screenSize);
    #else
        float2 uv = CheckerResPixelCoordToFullResUV(pixelCoord);
        float gradientNoise = TwoPi * InterleavedGradientNoise(uv * screenSize * float2(1.0f, 4.0f));
    #endif

Listing 7: Checkerboarding in the shadow mask shader.

As you can see, the only modification to the input of the interleaved gradient noise function is the multiplication of the y-coordinate by 4.

In the upsampling shader a few optimizations can be turned on and off via defines, which can make the shader harder to follow, but it really does no magic. One of the optimizations is using a 16-bit floating-point (linear) depth buffer. The second and third ones, more important, involve using GatherRed instructions on both the shadow mask and the depth buffer (both are one-channel textures) to save on sampling instructions.

Figure 15 shows the checkerboarded shadow mask in action. On careful inspection one can indeed see that the dithered smooth shadows are a bit cruder in the checkerboard variant.

Figure 15: Shadow mask calculated in checkerboard resolution and upsampled.

4 Demo

As already mentioned, there is a demo application accompanying this article. It is implemented using the MaxestFramework framework [13] and uses Direct3D 11 as the rendering API. It is located in [11].

Key configuration:
- WSAD + mouse: camera movement.
- Shift: speeding up.
- Space: aligns the light direction with the current camera's view vector.
- Q: if pressed, the penumbra mask is displayed.
- E: if pressed, penumbra mask blurring is turned off, which lets you see the bubble artifacts.
- F4: recompile shaders.
- ESC: exit.

The demo's HUD displays a multitude of options to turn on/off and tweak.

4.1 Performance Comparison

Table 1: Performance on GeForce 660 GTX in 1080p and 2048² shadow map resolution. All shadow mask passes take 16 samples. Penumbra Mask takes 32 samples.
Table 1 shows example performance comparisons for the view presented in the screenshots shown throughout the article. Note that the penumbra pass takes twice as many samples (32) as the shadow mask pass (16), while in the original paper [7] this proportion was reversed. One reason is that in this demo we use hardware bilinear filtering of shadow map samples, so in the end we are not using 16 shadow map samples but actually 64. Another reason is that we could in fact use a smaller number of samples for the penumbra, like 16, and this works well, but only in the original PCSS implementation. When using the penumbra mask pass with only 16 samples, some small artifacts start to pop up. I found 32 samples to work well in the quarter-res penumbra mask pass and 16 to be okay for the original PCSS. The demo lets you switch between 16 and 32 samples used in penumbra estimation to see the difference.

Regular Shadow Mask is a pass that uses no contact-hardening and whose results are shown in figure 5. Next, there is Regular CHSS Shadow Mask, whose results are in figure 12. CHSS is roughly two times more expensive than the no-CHSS approach. Note that the penumbra estimation takes 32 samples whereas the shadow mask takes 16. That would imply that Regular CHSS Shadow Mask should be more than two times as expensive as Regular Shadow Mask. It's not, because, as you can see in figure 12, most shadows have low penumbras, thus the shadow map sampling kernels are not wide. In a case where the shadow map sampling kernels of all of the screen's pixels are maxed out, the performance difference would be greater.
Now to the gist. Performance measurements for Shadow Mask using Penumbra Mask clearly show a great performance boost. Together with Penumbra Mask, both these passes jointly take 0.74 ms, which is more than two times faster than the Regular CHSS Shadow Mask pass at 1.72 ms. The reason is that in Regular CHSS Shadow Mask most of the performance hit comes from penumbra estimation (32 shadow map samples), and this is the part that has been optimized. Note that the performance of Penumbra Mask + Shadow Mask using Penumbra Mask (0.74 ms) is even better than Regular Shadow Mask's (0.84 ms). The reason for that surprising difference is that for the tested scene most of the screen's pixels in the penumbra variant have small shadow map sampling kernels, and that is what is mostly responsible for the great performance. However, in situations where the whole screen is filled with max-penumbra pixels, the performance of Penumbra Mask + Shadow Mask using Penumbra Mask is only about 10-15% worse than that of the Regular Shadow Mask pass.

Finally, checkerboarding is indeed capable of speeding up Regular CHSS Shadow Mask. It does not perform well when we use the penumbra mask though. The reason is that the Shadow Mask with Penumbra Mask pass becomes so cheap that checkerboarding-related operations like upsampling become relatively very expensive. The takeaway for checkerboarding is to use it only when the pass that it is optimizing is relatively expensive.

4.2 Future Work

This work can be improved further, both on the penumbra generation side and on the shadow mask side.

Using the penumbra mask greatly improves performance but comes with some swimming artifacts, which might or might not be a problem. Temporal filtering could possibly help here. Instead of rendering the penumbra mask in quarter res we could still render in full res but only compute one pixel in each 2x2 quad, filling the remaining pixels in subsequent frames. Given that the penumbra mask contains low-frequency data, this could work.

When it comes to the shadow mask, it could be beneficial to use more than one mipmap of the shadow map. This would be just for efficiency purposes. By increasing the filtering radius we take a bigger performance hit due to texture cache misses. For samples that are farther away we could sample the second mipmap of the shadow map. This applies to the penumbra mask as well.
Implementing temporal checkerboarding is also tempting. Quality should be comparable to the full-res shadow mask and the performance increase should be significant. This could be particularly desirable if we needed more shadow map samples.

5 Conclusions

The point of this article was to present an efficient way to render contact-hardening soft shadows, based on the original idea from [7]. I believe that separating penumbra estimation (in lower res) from shadow mask rendering is the way to go, as it achieves performance nearly as good as rendering regular soft shadows. There is a cost in the form of some swimming artifacts, which should not be a deal-breaker. The overall image quality boost that CHSS brings is at least worth exploring, given that with the penumbra mask approach performance suffers only slightly compared with non-CHSS.

In this paper we discussed:
- Efficient soft shadows with a Vogel disk and interleaved gradient noise.
- The original PCSS implementation.
- The penumbra mask as a way to decouple penumbra estimation from shadow mask generation and greatly improve performance.
- The min filter as another way to decouple penumbra estimation from shadow mask generation, though with worse quality and performance than the penumbra mask approach.
- The use of checkerboarding with shadow mask generation.

6 Acknowledgments

I would like to thank Krzysztof Narkowicz of Epic Games for proofreading this article and Adam Cichocki of Microsoft for his ideas to use the Vogel disk and checkerboarding.

References

[1] 4rknova. Shadertoy: Vogel's Distribution Method. https://www.shadertoy.com/view/XtXXDN.
[2] Louis Bavoil. Advanced Soft Shadow Mapping Techniques. http://gamedevs.org/uploads/advanced-soft-shadow-mapping-techniques.pdf.
[3] Alexandre Devert. Spreading points on a disc and on a sphere. http://blog.marmakoide.org/?p=1.
[4] Marton Tamas et al. Chapter 4.1 in GPU Pro 6: Practical Screen Space Soft Shadows. https://www.crcpress.com/GPU-Pro-6-Advanced-Rendering-Techniques/Engel/p/book/9781482264616.
[5] Peter Sikachev et al. Chapter 2.1 in GPU Pro 6: Next-Gen Rendering in Thief. https://www.crcpress.com/GPU-Pro-6-Advanced-Rendering-Techniques/Engel/p/book/9781482264616.
[6] Thomas Annen et al. Exponential Shadow Maps. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.146.177&rep=rep1&type=pdf.
[7] Randima Fernando. Percentage-Closer Soft Shadows. http://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf.
[8] Mikkel Gjol and Mikkel Svendsen. The Rendering of Inside. https://github.com/playdeadgames/publications/blob/master/INSIDE/rendering_inside_gdc2016.pdf.
[9] Jorge Jimenez. Next Generation Post Processing in Call of Duty: Advanced Warfare, 2014. http://www.slideshare.net/guerrillagames/killzone-shadow-fall-demo-postmortem.
[10] Andrew Lauritzen. Summed-Area Variance Shadow Maps. https://developer.nvidia.com/gpugems/GPUGems3/gpugems3_ch08.html.
[11] Wojciech Sterna. Contact-hardening Soft Shadows Demo. https://github.com/maxest/MaxestFramework/tree/master/samples/shadows.
[12] Wojciech Sterna. DirectX 11, HLSL, GatherRed. http://wojtsterna.blogspot.com/2018/02/directx-11-hlsl-gatherred.html.
[13] Wojciech Sterna. Maxest Framework. https://github.com/maxest/MaxestFramework.
[14] Yury Uralsky. Efficient Soft-Edged Shadows Using Pixel Shader Branching. https://developer.nvidia.com/gpugems/GPUGems2/gpugems2_chapter17.html.
[15] Wikipedia. Shadow mapping. https://en.wikipedia.org/wiki/Shadow_mapping.

About the author:

Wojciech has over 6 years of professional engine/technology/game programming experience, with a strong C++ and graphics/rendering emphasis, enriched with experience in other areas including multiplatform, multithreaded and multiplayer programming, high-performance computing (GPGPU) and machine learning (first steps here). Wojciech is keen on understanding the mathematical and/or algorithmic fundamentals of the tools he uses.

This paper was originally published on the author's homepage and is reproduced here with kind permission.
Latest Comments

IcyTower (September 11, 2018 09:19 AM):
Hey, the book you wrote on OpenGL was the first graphics book I read.

I'm doing a very similar thing, but in raytracing. I also separated shadow blurring from dist-to-occluder spreading/blurring. My textures look very similar to yours. I also tried to combine it with Shadow Mapping, but there is one problem that is not fixable. Shadow maps store only the closest-to-light occluder, so getting proper hard shadows from an object partially shadowed by other objects is impossible. In raytracing that is not a problem: we can get the distance to the closest-to-surface occluder. But there is still one serious problem with such processing: some regions are covered in hard and soft shadows at the same time, which requires storing multiple values of dist-to-occluder and multiple shadow values per pixel (as those shadows hide under each other before blurring, but should be visible after blurring). When you process hard and soft shadows separately it obviously also helps with shadow bending, bubbles, etc. where they intersect. But a bunch of problems arise as well with combining them.

vladAlex420 (November 26, 2021 06:02 PM):
If you want to visualize how the Vogel disk works, I have implemented it in Desmos: https://www.desmos.com/calculator/ewz1wocedf


