Fast Rounded Rectangle Shadows - Made by Evan

2025-01-09

Evan Wallace

This demonstrates a shader I came up with for constant-time single-step rounded rectangle drop shadows on the GPU. I became interested in this problem after reading that browsers still render box shadows on the CPU and upload them to the GPU. Rectangular drop shadows are used all over the place in UI design and are an important feature to accelerate, especially given the rise of animation in interface design.

You can pause the animation and toggle between rounded and square corners for comparison:

[Interactive demo; controls: Use rectangle shader, Use rounded rectangle shader, Pause animation]

Fast Rectangle Blur

There's a well-known shortcut for rendering drop shadows of axis-aligned rectangles. It turns out that the 2D drop shadow of such a rectangle can be written as the product of two perpendicular 1D blurred boxes. Since there's a closed-form solution to the convolution of a 1D gaussian with a 1D box (it's just the piecewise integral of a gaussian), this becomes a constant-time single-step rendering algorithm. It looks like this:

    // License: CC0 (http://creativecommons.org/publicdomain/zero/1.0/)

    // This approximates the error function, needed for the gaussian integral
    vec4 erf(vec4 x) {
      vec4 s = sign(x), a = abs(x);
      x = 1.0 + (0.278393 + (0.230389 + 0.078108 * (a * a)) * a) * a;
      x *= x;
      return s - s / (x * x);
    }

    // Return the mask for the shadow of a box from lower to upper
    float boxShadow(vec2 lower, vec2 upper, vec2 point, float sigma) {
      // Signed offsets from the point to the lower and upper edges
      vec4 query = vec4(lower - point, upper - point);
      vec4 integral = 0.5 + 0.5 * erf(query * (sqrt(0.5) / sigma));
      // Each factor is the blurred 1D box along one axis, in the range [0, 1]
      return (integral.z - integral.x) * (integral.w - integral.y);
    }

The integral of the gaussian f(x) = exp(-x^2 / (2 sigma^2)) / (sigma sqrt(2 pi)) is exactly F(x) = (1 + erf(x / (sigma sqrt(2)))) / 2, where erf() is the error function.

Fast Rounded Rectangle Blur

Unfortunately there's no closed-form solution for a rounded rectangle drop shadow. After a lot of experimentation, I've found the best approach is to use the closed-form solution along the first dimension and to sample along the second dimension.
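
The first half of that idea, the closed-form blur along one dimension, can be sketched on the CPU. The Python below is only an illustration under stated assumptions, not the shader itself: the helper names `half_width_at_row` and `blurred_row_coverage` are hypothetical, it uses the exact `math.erf` instead of a polynomial approximation, the rectangle is assumed to be centered at the origin, and the early return for rows outside the rectangle is an added guard.

```python
import math

def half_width_at_row(dy, half_size, corner):
    """Half-width of a centered rounded rectangle at vertical offset dy."""
    half_w, half_h = half_size
    if abs(dy) > half_h:
        return 0.0  # added guard: this row misses the rectangle entirely
    # delta is 0 in the straight band and negative inside the corner band
    delta = min(half_h - corner - abs(dy), 0.0)
    return half_w - corner + math.sqrt(max(0.0, corner * corner - delta * delta))

def blurred_row_coverage(x, dy, sigma, half_size, corner):
    """Closed-form gaussian blur at x of the 1D box [-w, w] for one row."""
    w = half_width_at_row(dy, half_size, corner)
    if w == 0.0:
        return 0.0
    k = math.sqrt(0.5) / sigma
    cdf_hi = 0.5 + 0.5 * math.erf((x + w) * k)
    cdf_lo = 0.5 + 0.5 * math.erf((x - w) * k)
    return cdf_hi - cdf_lo

if __name__ == "__main__":
    size = (10.0, 5.0)  # half-width 10, half-height 5 (arbitrary example values)
    print(half_width_at_row(0.0, size, 2.0))  # middle row: full half-width
    print(half_width_at_row(5.0, size, 2.0))  # top row: shortened by the corner radius
    print(blurred_row_coverage(10.0, 0.0, 1.0, size, 2.0))  # on the right edge
```

Blurring along the second dimension then amounts to a gaussian-weighted sum of this per-row quantity over a range of rows.
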
If the sample locations are chosen intelligently, only a small fixed number of samples are needed for a good approximation. Here's a straightforward, unoptimized implementation:

    // License: CC0 (http://creativecommons.org/publicdomain/zero/1.0/)

    // A standard gaussian function, used for weighting samples
    float gaussian(float x, float sigma) {
      const float pi = 3.141592653589793;
      return exp(-(x * x) / (2.0 * sigma * sigma)) / (sqrt(2.0 * pi) * sigma);
    }

    // This approximates the error function, needed for the gaussian integral
    vec2 erf(vec2 x) {
      vec2 s = sign(x), a = abs(x);
      x = 1.0 + (0.278393 + (0.230389 + 0.078108 * (a * a)) * a) * a;
      x *= x;
      return s - s / (x * x);
    }

    // Return the blurred mask along the x dimension
    float roundedBoxShadowX(float x, float y, float sigma, float corner, vec2 halfSize) {
      float delta = min(halfSize.y - corner - abs(y), 0.0);
      float curved = halfSize.x - corner + sqrt(max(0.0, corner * corner - delta * delta));
      vec2 integral = 0.5 + 0.5 * erf((x + vec2(-curved, curved)) * (sqrt(0.5) / sigma));
      return integral.y - integral.x;
    }

    // Return the mask for the shadow of a box from lower to upper
    float roundedBoxShadow(vec2 lower, vec2 upper, vec2 point, float sigma, float corner) {
      // Center everything to make the math easier
      vec2 center = (lower + upper) * 0.5;
      vec2 halfSize = (upper - lower) * 0.5;
      point -= center;

      // The signal is only non-zero in a limited range, so don't waste samples
      float low = point.y - halfSize.y;
      float high = point.y + halfSize.y;
      float start = clamp(-3.0 * sigma, low, high);
      float end = clamp(3.0 * sigma, low, high);

      // Accumulate samples (we can get away with surprisingly few samples)
      float step = (end - start) / 4.0;
      float y = start + step * 0.5;
      float value = 0.0;
      for (int i = 0; i < 4; i++) {
        value += roundedBoxShadowX(point.x, point.y - y, sigma, corner, halfSize) * gaussian(y, sigma) * step;
        y += step;
      }

      return value;
    }

Instruction count can be reduced further using forward differencing and other tricks, but I've left that stuff out since it obfuscates the algorithm.

Other Approaches

Drop shadows are defined as the convolution of a circular gaussian blob with the mask of the object receiving the shadow. 2D convolution is prohibitively expensive because each pixel requires O(r^2) samples, where r is the radius of the gaussian.
Since convolution by a gaussian blob is separable, these blurs are usually done using two successive 1D convolutions, one along x and one along y, which brings the complexity down to O(r). This is much better than naive convolution but is still pretty expensive. GPU blurs can use tricks such as double-sampling using linear interpolation and downsampling before blurring, but any texture sampling technique will suffer from the overhead of extra memory, extra fill-rate, and extra render passes.

Drop shadows can also be rendered on the CPU using repeated application of a moving average box-blur to approximate a gaussian blur. This works out to O(1) cost per pixel assuming the blur radius is much less than the size of the image. Every additional box blur pass improves the approximation, and in practice stopping after three passes is fine. This is a legitimate approach, but the data upload to the GPU ends up becoming a significant bottleneck.
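
The repeated box-blur idea can be sketched in a few lines. This is a minimal 1D Python illustration, not code from any browser: the function names are made up for this example, samples beyond the ends are assumed to be zero, and `matched_sigma` relies on the standard fact that a box covering w consecutive samples has variance (w^2 - 1) / 12, with variances adding across passes.

```python
import math

def box_blur(signal, radius):
    """One moving-average pass using a running sum: O(1) work per sample.
    Samples beyond either end are treated as zero (an assumption)."""
    n = len(signal)
    width = 2 * radius + 1
    out = [0.0] * n
    # Sum of the in-range part of the first window [-radius, radius]
    total = sum(signal[0:radius + 1])
    for i in range(n):
        out[i] = total / width
        enter = i + radius + 1  # sample that joins the next window
        leave = i - radius      # sample that drops out of the next window
        if enter < n:
            total += signal[enter]
        if leave >= 0:
            total -= signal[leave]
    return out

def approx_gaussian_blur(signal, radius, passes=3):
    """Approximate a gaussian blur by repeating the box blur."""
    for _ in range(passes):
        signal = box_blur(signal, radius)
    return signal

def matched_sigma(radius, passes=3):
    """Standard deviation of the gaussian that the repeated box blur approaches."""
    width = 2 * radius + 1
    return math.sqrt(passes * (width * width - 1) / 12.0)

def gaussian(x, sigma):
    return math.exp(-(x * x) / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

if __name__ == "__main__":
    # Blur a unit impulse so the output is the effective kernel itself
    impulse = [0.0] * 101
    impulse[50] = 1.0
    kernel = approx_gaussian_blur(impulse, radius=4)
    sigma = matched_sigma(4)
    worst = max(abs(kernel[i] - gaussian(i - 50, sigma)) for i in range(101))
    print("largest deviation from the matched gaussian:", worst)
```

Each pass touches every sample once regardless of the radius, which is where the O(1) per-pixel cost comes from. A 2D version would run the same pass over rows and then columns.
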