Contact-hardening Soft Shadows Made Fast
Published September 02, 2018
by Wojciech Sterna, posted by maxest
Figure 1: Contact-hardening Soft Shadows in the Sponza scene. Note how the green adornments' shadows are hard (because the shadows are close to the adornments), whereas poles hung high in the air (not visible in this screenshot) cast very blurry, barely visible shadows (as on the green adornment in the center of the figure or on the lit wall above the arch). Shadows of the columns in the center of Sponza best present the contact-hardening nature of shadows. The content presented here, but from a different point of view, will be the subject of the following screenshots. Compare this screenshot with figure 2 to better understand which objects' shadows we will be inspecting throughout the article.
Shadow Mapping [15] is by far the most prevalent technique used to render shadows in real time. One of its advantages is that it is quite easy to get decent soft shadows with this technique, as in [14]. From that point on we can go straight to contact-hardening soft shadows (CHSS), which were first introduced in the Percentage-Closer Soft Shadows (PCSS) paper [7]. [2] reviews the subject extensively.
Soft shadows are slower than standard shadow mapping because they usually require taking more than one shadow map sample, as in the aforementioned [14]. There are alternative approaches, however: Exponential Shadow Maps (ESM) [6] and Variance Shadow Maps (VSM) [10] treat shadow map texels as random variables and apply statistical methods to them. These methods are usually faster but suffer from annoying artifacts like light bleeding. A completely different approach to producing soft shadows is to blur them in screen space, as in [4]. It's a very interesting approach, but it comes with all the problems associated with computing in screen space, such as how to blur a shadow boundary that covers the entire screen without killing the GPU.
Contact-hardening soft shadows, based on [7], are even slower than regular soft shadows because they require even more samples to look for an average occluder's depth, which is used to estimate the penumbra (the region of transition between fully lit and fully shadowed areas). So, up from barely one shadow map sample, we might end up with 16 for the occluder's depth estimation and 32 for the actual shadow map filtering. That is a lot by today's standards, especially with video game consoles in mind. One implementation worth checking out, which shipped with a big commercial game, is [5].
Statistical methods, like ESM and VSM, and screen space methods both have their merits. In this article, however, I'll only focus on how to improve on PCSS. We will start with how to achieve decently big penumbras without the need to take dozens of samples, and will proceed to how to make soft shadows contact-hardening at only a fraction of additional GPU time. We will also investigate checkerboarding and see how that relatively new optimization approach performs with regard to shadow rendering.
This article is meant for an audience that already has experience in implementing shadow mapping and would like to have fast contact-hardening soft shadows in their games/simulations. This work is heavily based on PCSS [7] and aspires to be its improvement. There is a demo application [11] presenting the techniques described here.
Figure 1 presents contact-hardening soft shadows in an actual scenario.
1 Soft Shadows with Few Samples
In order to have bigger penumbra regions, a large number of shadow map samples is needed. An initial approach usually involves taking all shadow map samples within a given region. That can be a lot of samples, particularly when the shadow map is of high resolution (to cover the same world space region, a high-res shadow map needs more samples than a lower-res one). A way to speed this up is to use the GPU's gather instructions, as was done in [5] and as described in [12]. With gather instructions we can reduce the number of sampling operations in a shader by a factor of four. But even that might not be enough for the big sample kernels that one might want for elongated soft sunlight shadows.
An alternative to sampling the whole region is to randomly pick a few samples in the region and only use those. But if we take the same random set of samples and apply it to all pixels in the region, we will end up with quite nasty banding. Compare figures 2 and 3. To counter this effect we can make sure that each pixel in the region uses a different set of random samples. This will result in noisy shadows, which is usually preferred to banding, but not always. Both have their merits: noise gives randomness whereas banding gives stability. Ideally we would like something in between. It turns out that various researchers have worked on this problem and come up with interesting solutions.
We actually have two problems to solve right now. The first one is to figure out which samples to pick for shadow map sampling. The second one is how we should pick different samples for different pixels in the region; as we know, using exactly the same samples for all pixels in the region causes banding. We will solve the first problem using Vogel disk samples and the second with interleaved gradient noise.
Figure 2: Naive shadows with an 11×11 kernel, that is 121 samples.
Figure 3: Shadows with 16 samples. Banding is seen because each pixel uses the same shadow map sample coordinates.
1.1 Vogel Disk
The Vogel disk algorithm [3][1] spreads samples evenly on a disk, as shown in figure 4. A great feature of the Vogel disk is that when you rotate the points they will never map onto themselves (a given sample will not map onto itself nor onto any other sample), unless you rotate by 2π.
Figure 4: Vogel disk.
Listing 1 shows how to generate Vogel disk coordinates in a shader.
float2 VogelDiskSample(int sampleIndex, int samplesCount, float phi)
{
    float GoldenAngle = 2.4f; // golden angle in radians, rounded

    float r = sqrt(sampleIndex + 0.5f) / sqrt(samplesCount); // radius in [0; 1)
    float theta = sampleIndex * GoldenAngle + phi;           // phi rotates the whole disk

    float sine, cosine;
    sincos(theta, sine, cosine);

    return float2(r * cosine, r * sine);
}
Listing 1: Vogel disk samples generation.
As an alternative to the Vogel disk, a somewhat similar set of samples can be generated with blue noise. Blue noise is also often used when not-that-random randomness is needed, as in [8]. I checked both sets of samples and found that, at least for shadows, Vogel seems to produce more accurate results with less undersampling. Vogel also has the advantage that it is very cheap to compute at runtime. In the source code of the demo there is an array that stores blue noise samples.
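To make the sample layout easier to inspect outside a shader, here is a minimal CPU-side sketch of Listing 1 in Python. It is only an illustration: the function names `vogel_disk_sample` and `vogel_disk` are made up for this example, and the constant 2.4 is the same rounded golden angle used in the listing.

```python
import math

GOLDEN_ANGLE = 2.4  # radians; same rounded constant as in Listing 1


def vogel_disk_sample(sample_index, samples_count, phi):
    """Return the (x, y) position of one Vogel disk sample inside the unit disk."""
    r = math.sqrt(sample_index + 0.5) / math.sqrt(samples_count)
    theta = sample_index * GOLDEN_ANGLE + phi
    return (r * math.cos(theta), r * math.sin(theta))


def vogel_disk(samples_count, phi=0.0):
    """Return all samples of the disk, rotated as a whole by phi radians."""
    return [vogel_disk_sample(i, samples_count, phi) for i in range(samples_count)]


# 16 samples, as used for the shadow mask throughout the article.
samples = vogel_disk(16)
radii = [math.hypot(x, y) for x, y in samples]
```

The radius grows with the square root of the sample index, which is what keeps the sample density roughly uniform over the disk; the rotation `phi` changes only the angles, never the radii.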
1.2 Interleaved Gradient Noise
Figure 3 actually uses Vogel disk samples for all of the screen's pixels, but that is not enough to compensate for the low number of samples used. The Vogel disk really shines when used in combination with a good random function applied per pixel. Let's say that you have a region of pixels where each pixel uses the same Vogel disk samples (shown in figure 4), but each pixel applies a slightly different rotation to all these Vogel samples, such that the rotation value spans the range [0; 2π] across different pixels. This will guarantee that all pixels in the region sample the shadow map with different sets of samples.
A good random function, called interleaved gradient noise, was developed by [9]. This function takes the pixel's window space coordinates as input and outputs a "random" number in the [0; 1] range. Multiplying the result by 2π and passing it as the argument phi to VogelDiskSample results in very high quality, very cheap shadows, as presented in figure 5. The use of interleaved gradient noise is the only difference between the results in figures 5 and 3. Also note how the shadows in figure 5 are nicely rounded compared to the squarish, blocky ones in figure 2. The reason is that in the latter case a naive 11×11 rectangular filter was used.
Figure 5: Interleaved Gradient Noise together with Vogel disk. Only 16 shadow map samples.
Listing 2 presents the interleaved gradient noise function.
float InterleavedGradientNoise(float2 position_screen)
{
    float3 magic = float3(0.06711056f, 0.00583715f, 52.9829189f);
    return frac(magic.z * frac(dot(position_screen, magic.xy)));
}
Listing 2: Interleaved Gradient Noise.
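The per-pixel rotation can be checked on the CPU with a small Python sketch of Listing 2. The helper names (`frac`, `interleaved_gradient_noise`, `rotation_for_pixel`) and the choice of sampling at pixel centers (x + 0.5, y + 0.5) are assumptions made for this illustration.

```python
import math


def frac(x):
    """Fractional part, like HLSL frac()."""
    return x - math.floor(x)


def interleaved_gradient_noise(px, py):
    """CPU port of Listing 2: maps window-space coordinates to a value in [0, 1)."""
    return frac(52.9829189 * frac(0.06711056 * px + 0.00583715 * py))


def rotation_for_pixel(px, py):
    """Rotation angle (the phi argument of the Vogel disk) for one pixel."""
    return 2.0 * math.pi * interleaved_gradient_noise(px, py)


# Rotation angles for a 4x4 block of neighbouring pixels, sampled at pixel centers.
angles = [rotation_for_pixel(x + 0.5, y + 0.5) for y in range(4) for x in range(4)]
```

Every pixel in the block gets its own angle, so neighbouring pixels sample the shadow map with differently rotated copies of the same disk, which is what trades banding for noise.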
2 Contact-hardening Soft Shadows
In this section, we will see how contact-hardening shadows work, as presented in [7]. After that, we will go through two different ways to speed up the base algorithm.
2.1 Regular Solution
To add contact hardening to soft shadows, all we need to know is how big the penumbra for a given pixel is, or in other words, how big the shadow map sampling kernel for a given pixel should be. This is estimated with a procedure called average blocker search, shown in figure 6. We have a receiver, whose little red square is the pixel we're calculating shadows for. There is also a blocker floating above the receiver that blocks some light, and finally a light source (and its shadow map). We need to find the depths (in light space) of pixels that block the red square from the light, average those depths and use that average to compute the size of the penumbra in that area.
Think about what this particular shadow map from figure 6 contains. The right side of that shadow map contains depths of the blocker, whereas the left side stores the blue receiver's depths. If we take some kernel around the receiver's red area and take a few samples (orange dots) of the shadow map, some of those samples will hit the blocker's depths (samples on the right side) and some will hit the receiver (samples on the left side). Since the red area's depth is more or less equal (up to some bias that we usually employ in shadow mapping techniques) to the depths on the left, we don't consider those depths as blockers. But we do treat as blockers the depths that are smaller (closer to the light) than the depth of the red area, which in this case are the depths that come from the green blocker. These depths are averaged together to yield the average occluder's depth, which lets us estimate the penumbra of the pixel's shadow.
Figure 6: Average blocker search. The depths of the orange dots/samples are checked against depths in the shadow map. Shadow map depths that are closer to the light are considered blockers and are averaged together.
It is important to realize that this algorithm has its drawbacks. Look again at figure 6 and the blocker. Now imagine you have not one but two such blockers, where one of them (call it b1) is very close to the light source and the second one (call it b2) is very close to the receiver. Since the shadow map only stores depths of the closest layer, that is blocker b1 in this case, it has no information about blocker b2's depths. And it is blocker b2's depths that we care about in this case. This is where an average blocker search based on a one-layer shadow map will fail. An obvious solution to this problem would be to store more layers of depths in the shadow map, but that would be overkill. Fortunately, artifacts that result from this drawback are often negligible.
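The average blocker search described above can be sketched on the CPU as follows. This is a simplified illustration in Python, not the demo's shader: the shadow map is a plain 2D list that is point-sampled with clamping, and the names `average_blocker_depth`, `kernel_radius` and the bias value are assumptions made for the example.

```python
import math


def vogel_disk_sample(i, n, phi):
    """Vogel disk sample in the unit disk (same formula as Listing 1)."""
    r = math.sqrt(i + 0.5) / math.sqrt(n)
    theta = i * 2.4 + phi
    return (r * math.cos(theta), r * math.sin(theta))


def average_blocker_depth(shadow_map, u, v, receiver_depth, kernel_radius,
                          samples_count=16, phi=0.0):
    """Average the shadow-map depths that are closer to the light than the receiver.

    Returns None when no sample is a blocker (the pixel is fully lit).
    """
    height, width = len(shadow_map), len(shadow_map[0])
    total, count = 0.0, 0
    for i in range(samples_count):
        dx, dy = vogel_disk_sample(i, samples_count, phi)
        # Point sampling with clamp addressing (a simplification).
        x = min(max(math.floor((u + dx * kernel_radius) * width), 0), width - 1)
        y = min(max(math.floor((v + dy * kernel_radius) * height), 0), height - 1)
        depth = shadow_map[y][x]
        if depth < receiver_depth:  # closer to the light: a blocker
            total += depth
            count += 1
    return total / count if count else None


# Left half of the map sees the receiver (depth 10), right half sees a blocker (depth 4).
half_blocked_map = [[10.0] * 4 + [4.0] * 4 for _ in range(8)]
fully_lit_map = [[10.0] * 8 for _ in range(8)]
bias = 0.05  # assumed depth bias so the receiver does not block itself

blocked = average_blocker_depth(half_blocked_map, 0.5, 0.5, 10.0 - bias, 0.25)
lit = average_blocker_depth(fully_lit_map, 0.5, 0.5, 10.0 - bias, 0.25)
```

Only the samples that land on the blocker contribute, so the average equals the blocker's depth no matter how many samples fall on the receiver itself.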
Listing 3 shows a function that calculates the penumbra, which is a value used to scale the shadow map sampling kernel.
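float Penumbra(float gradientNoise, float2 shadowMapUV, float z_shadowMapView, int samplesCount)
{
    float avgBlockersDepth = 0.0f;
    float blockersCount = 0.0f;
    // Average blocker search as described above. The names penumbraFilterMaxSize
    // and pointClampSampler are assumed.
    for (int i = 0; i < samplesCount; i++)
    {
        float2 sampleUV = VogelDiskSample(i, samplesCount, gradientNoise);
        sampleUV = shadowMapUV + penumbraFilterMaxSize * sampleUV;
        float sampleDepth = shadowMapTexture.SampleLevel(pointClampSampler, sampleUV, 0).x;
        if (sampleDepth < z_shadowMapView) // closer to the light than the receiver: a blocker
        {
            avgBlockersDepth += sampleDepth;
            blockersCount += 1.0f;
        }
    }
    if (blockersCount > 0.0f)
    {
        avgBlockersDepth /= blockersCount;
        return AvgBlockersDepthToPenumbra(z_shadowMapView, avgBlockersDepth);
    }
    else
    {
        return 0.0f; // fully lit: no blockers found
    }
}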
Listing 3: Penumbra calculation.
Function Penumbra first finds the average blockers depth and then converts that value to the actual penumbra with AvgBlockersDepthToPenumbra.
Listing 4 shows the implementation of that function.
float AvgBlockersDepthToPenumbra(float z_shadowMapView, float avgBlockersDepth)
{
    float penumbra = (z_shadowMapView - avgBlockersDepth) / avgBlockersDepth;
    penumbra *= penumbra;
    return saturate(80.0f * penumbra);
}
Listing 4: Average blockers depth to penumbra conversion.
This function should be implemented such that it suits your needs.
The basic idea is to take the distance between the pixel we're calculating shadows for (z_shadowMapView) and the average blockers depth. The bigger that distance is, the bigger the penumbra should be. In the case of the demo accompanying this article, depths are all in the shadow map's view space. The 3rd line of the listing (the first statement) calculates the distance and "normalizes" it. Later on, we square it to get rid of a possible minus sign but also to make a more visible transition between fully hard and fully soft shadows. Finally, the penumbra is scaled by some constant and clamped to the [0; 1] range.
The original formula from [7] is a bit different from listing 4 and takes the light's size into account directly, making it more physically correct. It is shown in listing 5.
float AvgBlockersDepthToPenumbra(float lightSize, float z_shadowMapView, float avgBlockersDepth)
{
    float penumbra = lightSize * (z_shadowMapView - avgBlockersDepth) / avgBlockersDepth;
    return penumbra;
}
Listing 5: Average blockers depth to penumbra conversion from [7].
You should fiddle with that function to get the look and feel of soft shadows and the hard-to-soft transition that accommodates your needs.
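To get a feel for how the two conversions behave, here is a small Python sketch that evaluates both on a few depths. The function names, the depth values and the light size of 0.5 are arbitrary choices for illustration; the constant 80 is the one from listing 4.

```python
def saturate(x):
    """Clamp to [0, 1], like HLSL saturate()."""
    return min(max(x, 0.0), 1.0)


def penumbra_listing4(z_receiver, avg_blockers_depth, scale=80.0):
    """Listing 4: normalized distance, squared, scaled and clamped."""
    p = (z_receiver - avg_blockers_depth) / avg_blockers_depth
    return saturate(scale * p * p)


def penumbra_pcss(light_size, z_receiver, avg_blockers_depth):
    """Listing 5: the similar-triangles estimate that includes the light size."""
    return light_size * (z_receiver - avg_blockers_depth) / avg_blockers_depth


# Receiver at depth 10; the blocker moves from touching the receiver towards the light.
contact = penumbra_listing4(10.0, 10.0)  # blocker touches the receiver: hard shadow
near = penumbra_listing4(10.0, 9.5)
far = penumbra_listing4(10.0, 8.0)       # clamped to the maximum penumbra
pcss = penumbra_pcss(0.5, 10.0, 8.0)
```

With the constant 80, the listing 4 variant reaches the maximum penumbra once the blocker is roughly 10% closer to the light than the receiver, which is why the constant is worth tuning per scene.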
For the sake of completeness, listing 6 shows how the penumbra estimation function is used in conjunction with the actual shadow map sampling.
float penumbra = Penumbra(gradientNoise, shadowMapUV, z_shadowMapView, 16);

float shadow = 0.0f;
for (int i = 0; i < 16; i++)
{
    float2 sampleUV = VogelDiskSample(i, 16, gradientNoise);
    sampleUV = shadowMapUV + sampleUV * penumbra * shadowFilterMaxSize;

    shadow += shadowMapTexture.SampleCmp(linearClampComparisonSampler, sampleUV, z_shadowMapView).x;
}
shadow /= 16.0f;
Listing 6: Shadows computation with penumbra used as kernel scale.
In line 9 of listing 6 (the SampleCmp call), instead of doing traditional shadow mapping by sampling the shadow map and comparing depths manually, we use the function SampleCmp together with the linearClampComparisonSampler sampler to let the hardware take the four closest samples, compare them all and bilinearly filter the results. This is much faster than doing it the traditional way.
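What the comparison sampler does per tap can be emulated on the CPU. The Python sketch below is an approximation for illustration only: it assumes texel centers at half-integer coordinates, clamp addressing, and a "less or equal" comparison function (the actual comparison is whatever the sampler state is configured with).

```python
import math


def sample_cmp_bilinear(shadow_map, u, v, reference_depth):
    """Compare the four nearest texels against reference_depth, then blend bilinearly.

    Returns a value in [0, 1]: 1 means fully lit, 0 means fully shadowed.
    """
    height, width = len(shadow_map), len(shadow_map[0])
    # Texel centers sit at integer + 0.5, so shift by half a texel.
    x = u * width - 0.5
    y = v * height - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def lit(tx, ty):
        tx = min(max(tx, 0), width - 1)   # clamp addressing
        ty = min(max(ty, 0), height - 1)
        # Assumed comparison function: reference <= stored depth means lit.
        return 1.0 if reference_depth <= shadow_map[ty][tx] else 0.0

    top = lit(x0, y0) * (1.0 - fx) + lit(x0 + 1, y0) * fx
    bottom = lit(x0, y0 + 1) * (1.0 - fx) + lit(x0 + 1, y0 + 1) * fx
    return top * (1.0 - fy) + bottom * fy


# Left column is far from the light (lit for a receiver at 0.5), right column blocks it.
tiny_map = [[1.0, 0.0],
            [1.0, 0.0]]
at_left_texel = sample_cmp_bilinear(tiny_map, 0.25, 0.5, 0.5)
at_boundary = sample_cmp_bilinear(tiny_map, 0.5, 0.5, 0.5)
at_right_texel = sample_cmp_bilinear(tiny_map, 0.75, 0.5, 0.5)
```

Note that the comparison results are filtered, not the depths themselves. This is why one SampleCmp tap already yields a smooth value between 0 and 1 at shadow edges.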
Figure 7 presents contact-hardening soft shadows in action.
Figure 7: Regular contact-hardening soft shadows. We can clearly see how far from the ground objects are. The red rectangle indicates an area with shadows of highly varying penumbra size, hence some jittering/jaggies can be seen. These jaggies are caused by the fact that some pixels (randomized by interleaved gradient noise) fall in areas of zero penumbra and some in areas of maximum penumbra, hence the varying lightness of pixels.
2.2 Penumbra Mask
Penumbra estimation, if it uses the same number of samples as the actual shadow mapping, can be as expensive as the full soft shadow mapping itself, so the cost of contact-hardening soft shadows is twice as big in this case. Lowering the number of samples by half (only for the penumbra) can significantly reduce rendering time. It will still be a few dozen percent slower than full non-contact-hardening soft shadows, though. The simplest way to significantly reduce rendering time is to apply the same trick that we use all the time in real-time rendering: render at a lower resolution.
Shadows themselves need full-screen resolution because they are a very high-frequency phenomenon (there are rapid changes in the intensity of the signal: there is a shadow of an object for a few pixels and then suddenly there can be a fully lit area, with the shadow ending abruptly). But the penumbra quite often changes gradually, so it is a good candidate for rendering at lower res and then upsampling the results. In essence, we split the shadow mask generation pass into two passes: the first one computes the penumbra mask at quarter res (penumbra mask pass) and the second one computes the actual soft shadows but samples the penumbra from the penumbra mask (shadow mask pass).
Figure 8 shows the result of that change.
Figure 8: Soft shadows with penumbra rendered in a separate pass in lower res.
As you can see, it's quite nice, with the exception that the penumbra often ends abruptly (look at the shadows of those oblong poles). If we look at what the (bilinearly) upsampled penumbra mask looks like, we will understand why (figure 9). We have, for instance, areas with maximum penumbra (white color) interleaved with zero penumbra areas (black color). The problem was there before, but because we rendered in full res, interleaved gradient noise was equally good for sampling both shadows and penumbra. Now that we render the penumbra in lower res, it is not enough. Not all is lost, though. A simple solution is to increase the size of the penumbra's sampling kernel. Multiplying it by 1.2 (with regard to what is used for the actual shadow mapping) results in what is seen in figure 10.
Figure 9: Penumbra mask visualized.
Figure 10: Soft shadows with penumbra rendered in a separate pass in lower res using the kernel scaling trick. The red rectangles outline the same types of areas that were described in the caption of figure 7. This time, because we're rendering in lower res and upsampling, the jittering is more obvious and looks like a sort of bubbles (despite the kernel scaling trick). It wouldn't be a big problem if it weren't for the fact that very distracting swimming occurs under camera motion.
Figure 10's caption describes the problem with swimming bubbles: even though we scaled the kernel, it is not enough to eliminate all artifacts. We need another hack to fix this. The solution is to blur the penumbra mask to soften the bubbles. Figure 11 shows what the penumbra mask looks like when blurred with a 7×7 separable Gaussian blur with σ = 3. Figure 12 shows the final result of applying both the kernel scaling and the penumbra mask blur.
Figure 11: Penumbra mask blurred, visualized.
Figure 12: Soft shadows with penumbra rendered in a separate pass in lower res. Kernel scaling of 1.2 for penumbra estimation applied. Also, the penumbra mask used is blurred with a 7×7 separable Gaussian blur with σ = 3.
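The 7×7 separable Gaussian blur mentioned above can be sketched as two 1D passes. This Python version is a CPU illustration under assumptions: the helper names are made up, the weights are normalized to sum to 1, and the borders use clamp addressing; the demo's shader may differ in these details.

```python
import math


def gaussian_weights(taps=7, sigma=3.0):
    """Normalized 1D Gaussian weights (7 taps and sigma = 3 match the text)."""
    half = taps // 2
    raw = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def blur_1d(values, weights):
    """One 1D blur pass with clamp addressing at the borders."""
    half = len(weights) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(weights):
            j = min(max(i + k - half, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out


def blur_2d(mask, weights):
    """Separable blur: a horizontal pass followed by a vertical pass."""
    rows = [blur_1d(row, weights) for row in mask]
    cols = [blur_1d(list(col), weights) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


weights = gaussian_weights()
# A hard edge between zero penumbra and maximum penumbra gets smoothed out.
blurred_edge = blur_1d([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0], weights)
blurred_flat = blur_2d([[0.5] * 4 for _ in range(4)], weights)
```

With σ = 3 and only 7 taps the weights are fairly flat, so the filter behaves much like a soft box blur: zero-penumbra values spread well into neighbouring full-penumbra areas and vice versa.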
If you have an eye for detail, you will notice that the problem we had before, with the abruptly ending penumbra, has been reintroduced somewhat (compare the shadow of the post on the left). It's not as bad as in figure 8 but a little worse than in figure 10. So why exactly is it back? Because we blurred the penumbra mask and thus made zero-penumbra regions bleed onto full-penumbra regions. We can fix this again by simply scaling the penumbra kernel by more than 1.2. Of course, we cannot increase that scale indefinitely, as at some point penumbra estimation will start to miss occluders and the result will no longer conform to what the shadow mask pass is sampling. But there are values in the range of about [1.2; 1.6] where the vast majority of artifacts are gone.
Blurring the penumbra mask introduces two other problems that don't necessarily need to be fixed for the effect to look good, but it's worth knowing about them.
As we know, the penumbra mask is calculated in lower res in screen space. We blur it to alleviate artifacts that stem from rendering in low res. But because we're working in the camera's screen space, blurring "just like that" will make the penumbra mask bleed/blend between geometries that are at possibly very different depths/distances from the camera. This is a huge problem for screen space effects like screen space ambient occlusion, where such bleeding is immediately visible. But it turns out this is almost no problem for penumbra mask computation. If you want, you can do smart blurring by sampling the depth buffer and comparing depths, but that is really not necessary in my opinion.
Another problem is a bit more annoying. Because we blur in screen space with a constant-size kernel, the farther the camera moves away from a shadow, the more the penumbra blurring kicks in. This has the effect of decreasing the penumbra size for shadows that are far away from the camera (making soft shadows harder). To fight this, just scale the penumbra blur kernel using the distance from the pixel to the camera, just as is done in effects like screen space ambient occlusion. Note that this solution is not implemented in the demo application, but I quickly prototyped it locally and found it to work as expected.
Penumbra blurring is a must to avoid swimming bubbles. But it is possible to avoid blurring smaller values into bigger values (the effect that decreases the penumbra size). Before blurring the penumbra mask with a 7×7 kernel, you can first apply a 7×7 max filter. These two, max and blur combined, have the effect of only blurring outwards, so no bigger values will ever be washed out by smaller ones. This idea actually sounds better than it works. The problem is that the max filter loses information, penumbra information in this case, and this leads to a lot of subtle artifacts. Hence I don't recommend using that filter. I've tried a multitude of combinations of max and Gaussian blur filters and always reverted to a single Gaussian blur pass, also for performance reasons.
There is actually one more alternative solution to the drawbacks that blurring the penumbra mask brings. Look at listing 3, at the `return 0.0f;` statement in the else branch. Now look at figure 9. As you can see, when a pixel is fully lit, i.e. it does not have any occluders, we return 0. But what if we returned 1 instead? The penumbra of the pole (on the left) would not be affected by blurring at all, and that shadow would look correct regardless of the distance of a pixel from the camera. At least in this case it would be all fine. In other cases, we would have reversed artifacts. Earlier, our problem was that distant shadows that should be soft would become hard. Here it would be the opposite: distant shadows that are hard would become soft, unless of course we scaled the Gaussian blur kernel according to the pixel's distance from the camera.
One more important property of outputting 1 by default instead of 0 is that scaling the kernel in the penumbra mask pass is no longer necessary. So in the end you might prefer to output 1 by default.
There has been a lot of discussion about problems related to rendering the penumbra in a separate pass in lower res and how to fix them. Let's now sum things up. To get contact-hardening soft shadows we need to know what penumbra a pixel has. To calculate the penumbra fast, we need to do it in a pass separate from the shadow mask, because then we can do it in lower resolution, as the penumbra is a rather low-frequency phenomenon. When we do so, things generally work except for some artifacts. To fix those artifacts it is usually sufficient to blur the penumbra mask and increase the kernel size when sampling the shadow map in the penumbra mask pass. The latter is not necessary if the penumbra mask outputs 1 instead of 0 by default.
At this point, you should have nice contact-hardening soft shadows that are almost as fast as non-contact-hardening soft shadows. There is one last problem, though, that I wasn't able to solve. Again, because we're rendering the penumbra in low res and we also blur it, there is some shadow swimming where areas with highly varying penumbras blend. This is usually a problem if you use too small a shadow map for too big an area of the scene. But that is something you usually avoid, by using cascaded shadow mapping for instance. So, all in all, the little shadow swimming that is left in those situations can be neglected.
2.3 Min Filter
Before I ever came up with the idea of the penumbra mask, I first came up with the idea of using a min filter on a shadow map. This solution is not appealing to me today, but I wanted to mention it. It is implemented in the demo application but it is hard-coded to be off, so you need to fiddle with the source code to turn it on.
In standard CHSS we first find the average blocker's depth. Since that is an expensive part of the algorithm, I looked for ways to speed it up, even if they are only approximations. I came up with the idea to use, instead of the average blocker's depth, the minimum depth from the shadow map over some region of texels (a region of the max penumbra size, of course). For the average blocker's depth we sample a bunch of texels, say in a 5×5 region of the shadow map, to see if they are closer to the light source than the pixel we're calculating the penumbra for; those that are closer to the light source are averaged together. The idea of the min filter is to just take a single min value over the whole 5×5 filter region and use that directly instead of the average blocker's depth. The reason why the min filter makes sense is that this value will always be closer to the light source than the pixel we're calculating the penumbra for. So it is some kind of blocker's depth. Not a very sophisticated one, and rather crude, but still.
But why would you do that if you can just compute the average? Because applying a min filter over a region does not require us to do it in the shadow mask pass. We can compute a min shadow map from the shadow map in a separate pass. Moreover, the min filter is a separable filter, which means we don't have to take n² samples (assuming an n×n region); we can do two passes, horizontal and vertical, each requiring n samples, totalling 2n. We traded quadratic complexity (dependent on the screen's resolution) for linear complexity (dependent on the shadow map size).
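The separability claim is easy to verify on the CPU. The Python sketch below (illustrative helper names, clamped windows at the borders) compares a brute-force n×n min filter against a horizontal pass followed by a vertical pass.

```python
def min_filter_1d(values, radius):
    """1D min over a window of 2 * radius + 1 texels, clamped at the borders."""
    n = len(values)
    return [min(values[max(i - radius, 0):min(i + radius, n - 1) + 1]) for i in range(n)]


def min_filter_separable(depth_map, radius):
    """Two 1D passes (horizontal, then vertical): 2n samples per texel."""
    rows = [min_filter_1d(row, radius) for row in depth_map]
    cols = [min_filter_1d(list(col), radius) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def min_filter_brute_force(depth_map, radius):
    """Reference n x n min filter: n squared samples per texel."""
    h, w = len(depth_map), len(depth_map[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            row.append(min(depth_map[yy][xx]
                           for yy in range(max(y - radius, 0), min(y + radius, h - 1) + 1)
                           for xx in range(max(x - radius, 0), min(x + radius, w - 1) + 1)))
        out.append(row)
    return out


# A small deterministic "shadow map" and a 5x5 window (radius 2).
test_map = [[float((7 * x + 13 * y) % 11) for x in range(8)] for y in range(6)]
separable_result = min_filter_separable(test_map, 2)
brute_force_result = min_filter_brute_force(test_map, 2)
```

Both versions produce identical maps, because the minimum over a rectangle equals the minimum over its rows of each row's minimum. For the 5×5 window that is 10 samples per texel instead of 25.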
One problem with the min filter is that, by its nature, it loses a lot of information. It creates lumps of pixels, with all pixels in a lump having the same depth, and these depths will often differ significantly between lumps. A solution to that is to blur the min shadow map. Later, in the shadow mask pass, you simply sample the blurred min shadow map and use that instead of the average blocker's depth. Blurring the min shadow map has the effect of pushing those min values a bit farther from the light source, which could possibly push them behind the pixels that use them (an "occluder" ends up being behind a pixel). In practice, however, that is not a problem.
Figure 13 shows the results. Compare it to figure 12. One change is in the round shadows of an adornment in the top middle part of the figure. Figure 12 handled that nicely, whereas in figure 13 we see how the min depth from some distant geometry floods its neighbours, including the adornment, and makes its penumbra very large. Another problem occurs on the shadow of a column that spreads from the left to the center of the figures. Both the top pole's min depth and some distant geometry on the bottom left of the picture flood the column, making the penumbra of the column's shadow in the middle very large.
Figure 13: Soft shadows with penumbra calculated using a min filter applied to the shadow map.
So we know the min filter kind of sucks. Does it have any advantages over the solution we discussed in the previous section? Yes. Because the min filter works in the shadow map's space and not in screen space, we don't have any swimming artifacts under camera motion. That is a nice feature of the algorithm, as it makes it more stable.
3 Checkerboarding
Since the advent of "even more next-gen" consoles like the PS4 Pro, developers have decided to aim even higher in terms of quality and try to reach the magical 4k barrier. The PS4 Pro indeed has measurably more computing power than the bare-bones PS4, but not enough to pull off 4k the way the PS4 handled full HD. So developers resorted to a kind of cheating to achieve 4k with only half of the power available: checkerboarding.
Figure 14: Checkerboard pattern.
Look at figure 14. The idea of checkerboarding is to perform calculations only for every other pixel, in a checkerboard pattern. This way we only need half of the computing power. The missing pixels are then filled in some way. The tricky part is figuring out this "some way". The way that is implemented in the demo is by averaging the neighbours. Take a look at the figure again. Assume the black pixels represent those for which we calculate shadows. Now think about one of the white pixels, which is not computed. The idea of averaging is to simply take the four neighbouring black pixels, average them and plug that value into the white pixel. There is one catch, though. The neighbours might belong to different geometries, and since shadows are a rather high-frequency phenomenon (as opposed to the penumbra), averaging them thoughtlessly will result in artifacts. To do it right we need to compare the depths of the neighbours with the depth of the pixel we're averaging for, and only blend in those neighbours whose depths don't differ much. It's the very same thing we do when performing bilateral upsampling in many postprocessing algorithms, like SSAO or depth of field, to save on computing power.
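The depth-aware fill described above can be sketched on the CPU like this. It is a simplified Python illustration, not the demo's upsampling shader: the function name, the depth threshold, and the fallback to a plain average when no neighbour passes the depth test are all assumptions made for the example.

```python
def fill_checkerboard(shadow, depth, depth_threshold=0.1):
    """Fill the pixels that were skipped by the checkerboard pattern.

    shadow[y][x] is valid where (x + y) is even; the other pixels are reconstructed
    from their four direct neighbours, using only neighbours of similar depth.
    """
    h, w = len(shadow), len(shadow[0])
    out = [row[:] for row in shadow]
    for y in range(h):
        for x in range(w):
            if (x + y) % 2 == 0:
                continue  # computed pixel, keep as is
            accepted_sum, accepted_count = 0.0, 0
            all_sum, all_count = 0.0, 0
            for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
                if 0 <= nx < w and 0 <= ny < h:
                    all_sum += shadow[ny][nx]
                    all_count += 1
                    if abs(depth[ny][nx] - depth[y][x]) <= depth_threshold:
                        accepted_sum += shadow[ny][nx]
                        accepted_count += 1
            if accepted_count:
                out[y][x] = accepted_sum / accepted_count
            else:
                out[y][x] = all_sum / all_count  # assumed fallback: plain average
    return out


# Two surfaces side by side: near one (depth 1, shadowed) and far one (depth 5, lit).
depth_buffer = [[1.0, 1.0, 5.0, 5.0] for _ in range(4)]
checker_shadow = [[(1.0 if x >= 2 else 0.0) if (x + y) % 2 == 0 else 0.0
                   for x in range(4)] for y in range(4)]
filled = fill_checkerboard(checker_shadow, depth_buffer)
```

At the boundary between the two surfaces, the reconstructed pixels take their value only from neighbours on their own surface, so the lit surface does not leak into the shadowed one.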
Alternatively, instead of computing averages, we could employ temporal filtering and flip the checkerboard pattern every second frame. This should result in shadow quality on par with a full-res shadow mask.
If you have ever tried computing a high-frequency signal in low res and then upsampling it, you know that this just does not work well. It's hard to make SSAO stable, which is quite a low-frequency effect, let alone a shadow mask. That is true when the low res we are talking about means computing in a buffer that has half of the screen's width and half of the screen's height, or in other words is four times smaller. In this scenario problems are mostly visible at oblique angles. When the camera is looking straight at a wall, even with some not-that-smooth shadows, it's okay. But once the camera is at an oblique angle to some floor, swimming artifacts appear, stemming from undersampling, and they are extremely distracting. You might increase the resolution of the low-res buffer such that overall it has half as many pixels as the full-resolution screen, but that will not eliminate the problem: swimming will remain, although it will be reduced a bit.
If, instead of trying to squeeze as much as possible from rendering in low res and upsampling, you render in full res but in a checkerboard pattern, which requires only half of the memory and computing only half of the pixels, you will find out that the swimming problem at oblique angles is completely gone. That is the most powerful feature of this algorithm. There will still be some minor swimming artifacts in very detailed areas of the screen, where you just don't have the information you would have had if you had stayed in standard full res, but there is a high chance that the result will be good enough.
Truth be told, there is a little lie in the statement that checkerboarding costs half as much as the full-res path. Checkerboarding requires smart upsampling, which includes sampling the depth buffer, and that costs some time. This pass's time is constant, however. So the more expensive the checkerboard-res computation pass is, the less upsampling costs in relative terms.
Checkerboarding is supported in the demo application. The application creates a separate render target for storing the results of the checkerboarded shadow mask. This render target has the same height as the screen but half the width. Checkerboard-computed pixels are "packed" into this buffer and later, in the upsample pass, unpacked. Also, the missing pixels are filled in (interpolated). The source code that handles the checkerboarded shadow mask in shaders is located in two places. First, there are some minor modifications to the shadow mask pass itself to handle checkerboarding. The second place is a dedicated shader that performs upsampling. The code is rather easy to follow. One important change is in the gradient noise formula in the shadow mask pass. Using the original formula made samples look less random than in full res. I experimented with the formula and found one modification that works reasonably well, but I think that one could come up with a more scientifically grounded formula. Let's now see the only modification to the shadow mask shader code, presented in listing 7.
float2 CheckerResPixelCoordToFullResUV(int2 pixelCoord)
{
    pixelCoord.x *= 2;
    if (pixelCoord.y % 2 == 1)
        pixelCoord.x += 1;
    return ((float2)pixelCoord + float2(0.5f, 0.5f)) * screenPixelSize;
}
...
#ifndef USE_CHECKER
    float2 uv = input.texCoord;
    float gradientNoise = TwoPi * InterleavedGradientNoise(uv * screenSize);
#else
    float2 uv = CheckerResPixelCoordToFullResUV(pixelCoord);
    float gradientNoise = TwoPi * InterleavedGradientNoise(uv * screenSize * float2(1.0f, 4.0f));
#endif
Listing 7: Checkerboarding in the shadow mask shader.
As you can see, the only modification to the interleaved gradient noise input is the multiplication of the y-coordinate by 4.
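The packing scheme behind CheckerResPixelCoordToFullResUV can be checked with a short Python sketch. The function names are made up for this illustration, and it assumes that screenPixelSize in listing 7 is the reciprocal of the full screen resolution.

```python
def checker_pixel_to_full_res(px, py):
    """Map a pixel of the half-width checker buffer to its full-res pixel coordinate."""
    full_x = 2 * px + (1 if py % 2 == 1 else 0)  # odd rows are shifted by one pixel
    return (full_x, py)


def checker_pixel_to_full_res_uv(px, py, screen_width, screen_height):
    """Same mapping, returning the UV of the full-res pixel center."""
    fx, fy = checker_pixel_to_full_res(px, py)
    return ((fx + 0.5) / screen_width, (fy + 0.5) / screen_height)


def full_res_to_checker_pixel(fx, fy):
    """Inverse mapping used when unpacking: which packed pixel holds this full-res pixel."""
    return (fx // 2, fy)


# An 8x4 screen is packed into a 4x4 checker buffer.
SCREEN_W, SCREEN_H = 8, 4
covered = {checker_pixel_to_full_res(px, py)
           for py in range(SCREEN_H) for px in range(SCREEN_W // 2)}
expected = {(x, y) for y in range(SCREEN_H) for x in range(SCREEN_W) if (x + y) % 2 == 0}
```

The packed buffer covers exactly the full-res pixels where x + y is even, i.e. one colour of the checkerboard, and each of them maps back to a unique packed pixel.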
In the upsampling shader a few optimizations can be turned on/off via defines, which can make the shader harder to follow, but it really does no magic. One of the optimizations is using a 16-bit floating-point (linear) depth buffer. The second and third ones, which are more important, involve using GatherRed instructions on both the shadow mask and the depth buffer (both are one-channel textures) to save on sampling instructions.
Figure 15 shows the checkerboarded shadow mask in action. With careful inspection one can indeed see that the dithered smooth shadows are a bit more crude in the checkerboard variant.
Figure 15: Shadow mask calculated in checkerboard resolution and upsampled.
4 Demo
As already mentioned, there is a demo application accompanying this article. It is implemented using the MaxestFramework framework [13] and uses Direct3D 11 as the rendering API. It is located in [11].
Key configuration:
WSAD + mouse: camera movement.
Shift: speeds up camera movement.
Space: aligns the light direction with the current camera's view vector.
Q: while pressed, displays the penumbra mask.
E: while pressed, turns penumbra mask blurring off, which lets you see the bubble artifacts.
F4: recompile shaders.
ESC: exit.
The demo's HUD displays a multitude of options to turn on/off and tweak.
4.1 Performance Comparison
Table 1: Performance on GeForce 660 GTX in 1080p with a 2048² shadow map. All shadow mask passes take 16 samples. Penumbra Mask takes 32 samples.
Table 1 shows example performance comparisons for the view presented in the screenshots shown throughout the article. Note that the penumbra pass takes twice as many samples (32) as the shadow mask pass (16), while in the original paper [7] this proportion was reversed. One reason is that in this demo we use hardware bilinear filtering of shadow map samples, so in the end we are not using 16 shadow map samples but actually 64. Another reason is that we could in fact use a smaller number of samples for the penumbra, like 16, and this works well, but only in the original PCSS implementation. When using the penumbra mask pass with only 16 samples, some small artifacts start to pop up. I found 32 samples to work well in the quarter-res penumbra mask pass and 16 to be okay for the original PCSS. The demo lets you switch between 16 and 32 samples used in penumbra estimation to see the difference.
Regular Shadow Mask is a pass that uses no contact hardening and whose results are shown in figure 5. Next, there is Regular CHSS Shadow Mask, whose results are in figure 12. CHSS is roughly two times more expensive than the no-CHSS approach. Note that the penumbra pass takes 32 samples whereas the shadow mask takes 16. That would imply that Regular CHSS Shadow Mask should be more than two times as expensive as Regular Shadow Mask. It's not, because, as you can see in figure 12, most shadows have low penumbras, thus the shadow map sampling kernels are not wide. In a case where the shadow map sampling kernels of all screen pixels are maxed out, the performance difference would be greater.
Now to the gist. The performance measurements for Shadow Mask using Penumbra Mask clearly show a great performance boost. Together with Penumbra Mask, these two passes jointly take 0.74 ms, which is more than two times faster than the Regular CHSS Shadow Mask pass at 1.72 ms. The reason is that in Regular CHSS Shadow Mask most of the performance hit comes from penumbra estimation (32 shadow map samples), and this is the part that has been optimized. Note that the performance of Penumbra Mask + Shadow Mask using Penumbra Mask (0.74 ms) is even better than Regular Shadow Mask's (0.84 ms). The reason for that surprising difference is that, for the tested scene, most screen pixels in the penumbra variant have small shadow map sampling kernels, and that is what is mostly responsible for the great performance. However, in situations where the whole screen is filled with max-penumbra pixels, the performance of Penumbra Mask + Shadow Mask using Penumbra Mask is only about 10-15% worse than that of the Regular Shadow Mask pass.
Finally, checkerboarding is indeed capable of speeding up Regular CHSS Shadow Mask. It does not perform well when we use the penumbra mask, though. The reason is that the Shadow Mask with Penumbra Mask pass becomes so cheap that checkerboarding-related operations like upsampling become very expensive in relative terms. The takeaway for checkerboarding is to use it only when the pass that it is optimizing is relatively expensive.
4.2FutureWork
Thisworkcanfurtherbeimproved,bothonthepenumbragenerationsideaswellshadowmask.
Usingpenumbramaskgreatlyimprovesperformancebutcomeswithsomeswimmingartifactswhichmightormightnotbeaproblem.Possiblyusingtemporalfilteringcouldhelphere.Insteadofrenderingpenumbramaskinquarterreswecouldstillrenderinfullresbutonlycomputeonepixelina2*2quad,fillingtheremainingpixelsinsubsequentframes.Giventhatpenumbramaskcontainslow-frequencydatathiscouldwork.
When it comes to the shadow mask, it could be beneficial to use more than one mipmap of the shadow map. This would be purely for efficiency. Increasing the filtering radius costs more performance because of texture cache misses. For samples that are farther away we could sample the second mipmap of the shadow map. This applies to the penumbra mask as well.
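One possible way to pick the mip level per sample is sketched below. It assumes samples are spread radially like a Vogel disk (radius proportional to the square root of the sample index) and that a fixed distance threshold, here a hypothetical 8 texels, separates "near" from "far" samples. Both the threshold and the function names are illustrative assumptions.

```python
import math
from typing import List


def select_shadow_map_mip(sample_distance_texels: float,
                          far_threshold_texels: float = 8.0) -> int:
    """Return mip 0 for near samples and mip 1 (the second mipmap) for far ones."""
    return 1 if sample_distance_texels > far_threshold_texels else 0


def mips_for_kernel(kernel_radius_texels: float,
                    sample_count: int = 16,
                    far_threshold_texels: float = 8.0) -> List[int]:
    """Mip level for each sample of a disk kernel with Vogel-style radii."""
    mips = []
    for i in range(sample_count):
        # Vogel-style radial distribution scaled to the kernel radius.
        distance = kernel_radius_texels * math.sqrt((i + 0.5) / sample_count)
        mips.append(select_shadow_map_mip(distance, far_threshold_texels))
    return mips


small_kernel = mips_for_kernel(4.0)    # every sample stays within the threshold
large_kernel = mips_for_kernel(32.0)   # only the innermost sample stays on mip 0
```

With small kernels nothing changes, so the extra mipmap would only matter for the wide kernels that cause the cache misses in the first place.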
Implementing temporal checkerboarding is also tempting. Quality should be comparable to the full-res shadow mask and the performance increase should be significant. This could be particularly desirable if we needed more shadow map samples.
5 Conclusions
The point of this article was to present an efficient way to render contact-hardening soft shadows, based on the original idea from [7]. I believe that separating penumbra estimation (in lower res) from shadow mask rendering is the way to go, as it allows for achieving performance nearly as good as when rendering regular soft shadows. There is a cost in the form of some swimming artifacts, which should not be a deal-breaker. The overall image quality boost that CHSS brings is at least worth exploring, given that performance relative to non-CHSS will suffer only a tiny bit with the penumbra mask approach.
In this paper we discussed:
Efficient soft shadows with Vogel disk and interleaved gradient noise.
Original PCSS implementation.
Penumbra mask as a way to decouple penumbra estimation from shadow mask generation to greatly improve performance.
Min filter as another way to decouple penumbra estimation from shadow mask generation. Worse quality and performance than in the penumbra mask approach, though.
Use of checkerboarding with shadow mask generation.
6 Acknowledgments
I would like to thank Krzysztof Narkowicz of Epic Games for proofreading this article and Adam Cichocki of Microsoft for his ideas to use Vogel disk and checkerboarding.
References
[1] 4rknova. Shadertoy: Vogel's Distribution Method. https://www.shadertoy.com/view/XtXXDN.
[2] Louis Bavoil. Advanced Soft Shadow Mapping Techniques. http://gamedevs.org/uploads/advanced-soft-shadow-mapping-techniques.pdf.
[3] Alexandre Devert. Spreading points on a disc and on a sphere. http://blog.marmakoide.org/?p=1.
[4] Marton Tamas et al. Chapter 4.1 in GPU Pro 6: Practical Screen Space Soft Shadows. https://www.crcpress.com/GPU-Pro-6-Advanced-Rendering-Techniques/Engel/p/book/9781482264616.
[5] Peter Sikachev et al. Chapter 2.1 in GPU Pro 6: Next-Gen Rendering in Thief. https://www.crcpress.com/GPU-Pro-6-Advanced-Rendering-Techniques/Engel/p/book/9781482264616.
[6] Thomas Annen et al. Exponential Shadow Maps. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.146.177&rep=rep1&type=pdf.
[7] Randima Fernando. Percentage-Closer Soft Shadows. http://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf.
[8] Mikkel Gjol and Mikkel Svendsen. The Rendering of Inside. https://github.com/playdeadgames/publications/blob/master/INSIDE/rendering_inside_gdc2016.pdf.
[9] Jorge Jimenez. Next Generation Post Processing in Call of Duty: Advanced Warfare, 2014. http://www.slideshare.net/guerrillagames/killzone-shadow-fall-demo-postmortem.
[10] Andrew Lauritzen. Summed-Area Variance Shadow Maps. https://developer.nvidia.com/gpugems/GPUGems3/gpugems3_ch08.html.
[11] Wojciech Sterna. Contact-hardening Soft Shadows Demo. https://github.com/maxest/MaxestFramework/tree/master/samples/shadows.
[12] Wojciech Sterna. DirectX 11, HLSL, GatherRed. http://wojtsterna.blogspot.com/2018/02/directx-11-hlsl-gatherred.html.
[13] Wojciech Sterna. Maxest Framework. https://github.com/maxest/MaxestFramework.
[14] Yury Uralsky. Efficient Soft-Edged Shadows Using Pixel Shader Branching. https://developer.nvidia.com/gpugems/GPUGems2/gpugems2_chapter17.html.
[15] Wikipedia. Shadow mapping. https://en.wikipedia.org/wiki/Shadow_mapping.
About the author:
Wojciech has over 6 years of professional engine/technology/game programming experience, with strong C++ and graphics/rendering emphasis, enriched with experience in other areas including multiplatform, multithreaded and multiplayer programming, high-performance computing (GPGPU) and machine learning (first steps here). Wojciech is keen on understanding mathematical and/or algorithmic fundamentals of tools he uses.
This paper was originally published on the author's home page and is reproduced here with kind permission.
Latest Comments
IcyTower
Hey,
the book you wrote on OpenGL was the first graphics book I read
I'm doing a very similar thing, but in raytracing. I also separated shadow blurring from dist-to-occluder spreading/blurring. My textures look very similar to yours. I also tried to combine it with Shadow Mapping, but there is one problem that is not fixable. Shadow Maps store only the closest-to-light occluder, thus having proper hard shadows from an object partially shadowed by other objects is impossible. In raytracing that is not a problem - we can get the distance to the closest-to-surface occluder. But there is still one serious problem with such processing - some regions are covered in hard and soft shadows at the same time, which requires storing multiple values of dist-to-occluder and multiple shadow values per pixel (as those shadows hide under each other before blurring, but should be visible after blurring). When you process hard/soft shadows separately it obviously also helps with shadow bending/bubbles etc. where they intersect. But a bunch of problems arise as well with combining them
September 11, 2018 09:19 AM
vladAlex420
If you want to visualize how Vogel Disk works I have implemented it in desmos: https://www.desmos.com/calculator/ewz1wocedf
November 26, 2021 06:02 PM