Introducing FlockDB - Twitter Blog

2025-01-20

文章推薦指數： 80 %

投票人數：10人

FlockDB measures the latency distribution of each query type across each service (MySQL, Kestrel, Thrift) so we can tune timeouts, and reports ... IntroducingFlockDB By Monday,3May2010 Linkcopiedsuccessfully Twitterstoresmanygraphsofrelationshipsbetweenpeople:whoyou’refollowing,who’sfollowingyou,whoyoureceivephonenotificationsfrom,andsoon. Someofthefeaturesofthesegraphshavebeenchallengingtostoreinscalablewaysaswe’vegrown.Forexample,insteadofrequiringeachfriendshiptoberequestedandconfirmed,youcanbuildone-wayrelationshipsbyjustfollowingotherpeople.There’salsonolimittohowmanypeopleareallowedtofollowyou,sosomepeoplehavemillionsoffollowers(like@aplusk),whileothershaveonlyafew. Todeliveratweet,weneedtobeabletolookupsomeone’sfollowersandpagethroughthemrapidly.Butwealsoneedtohandleheavywritetraffic,asfollowersareaddedorremoved,orspammersarecaughtandputonice.Andforsomeoperations,likedeliveringa@mention,weneedtodosetarithmeticlike“who’sfollowingbothoftheseusers?”Thesefeaturesaredifficulttoimplementinatraditionalrelationaldatabase. Avalianteffort Wewentthroughseveralstoragelayersintheearlydays,includingabusiveuseofrelationaltablesandkey-valuestorageofdenormalizedlists.Theywereeithergoodathandlingwriteoperationsorgoodatpagingthroughgiantresultsets,butnevergoodatboth. Alittleoverayearago,wecouldseethatweneededtotrysomethingnew.Ourgoalswere: Writethesimplestpossiblethingthatcouldwork. Useoff-the-shelfMySQLasthestorageengine,becauseweunderstanditsbehavior—innormaluseaswellasunderextremeloadandunusualfailureconditions.Giveitenoughmemorytokeepeverythingincache. Allowforhorizontalpartitioningsowecanaddmoredatabasehardwareasthecorpusgrows. Allowwriteoperationstoarriveoutoforderorbeprocessedmorethanonce.(Allowfailurestoresultinredundantworkratherthanlostwork.) FlockDBwastheresult.Wefinishedmigratingtoitabout9monthsagoandneverlookedback. Avaliant-ereffort FlockDBisadatabasethatstoresgraphdata,butitisn’tadatabaseoptimizedforgraph-traversaloperations.Instead,it’soptimizedforverylargeadjacency lists,fastreadsandwrites,andpage-ablesetarithmeticqueries. Itstoresgraphsassetsofedgesbetweennodesidentifiedby64-bitintegers.Forasocialgraph,thesenodeIDswillbeuserIDs,butinagraphstoring“favorite”tweets,thedestinationmaybeatweetID.Eachedgeisalsomarkedwitha64-bitposition,usedforsorting.(Twitterputsatimestamphereforthe“following”graph,sothatyourfollowerlistisdisplayedlatest-first.) Whenanedgeis“deleted”,therowisn’tactuallydeletedfromMySQL;it’sjustmarkedasbeinginthedeletedstate,whichhastheeffectofmovingtheprimarykey(acompoundkeyofthesourceID,state,andposition).Similarly,userswhodeletetheiraccountcanhavetheiredgesputintoanarchivedstate,allowingthemtoberestoredlater(butonlyforalimitedtime,accordingtoourtermsofservice).Wekeeponlyacompoundprimarykeyandasecondaryindexforeachrow,andanswerallqueriesfromasingleindex.ThiskindofschemaoptimizationallowsMySQLtoshineandgivesuspredictableperformance. Acomplexquerylike“What’stheintersectionofpeopleIfollowandpeoplewhoarefollowingPresidentObama?”canbeansweredquicklybydecomposingitintosingle-userqueries(“WhoisfollowingPresidentObama?”).Dataispartitionedbynode,sothesequeriescaneachbeansweredbyasinglepartition,usinganindexedrangequery.Similarly,pagingthroughlongresultsetsisdonebyusingthepositionfieldasacursor,ratherthanusingLIMIT/OFFSET,soanypageofresultsforaqueryisindexedandisequallyfast. Writeoperationsareidempotentandcommutative,basedonthetimetheyenterthesystem.Wecanprocessoperationsoutoforderandendupwiththesameresult,sowecanpaperovertemporarynetworkandhardwarefailures,orevenreplaylostdatafromminutesorhoursago.Thiswasespeciallyhelpfulduringtheinitialroll-out. Commutativewritesalsosimplifytheprocessofbringingupnewpartitions.Anewpartitioncanreceivewritetrafficimmediately,andreceiveadumpofdatafromtheoldpartitionsslowlyinthebackground.Oncethedumpisover,thepartitionisimmediately“live”andreadytoreceivereads. Theappservers(affectionatelycalled“flapps”)arewritteninScala,arestateless,andarehorizontallyscalable.Wecanaddmoreasqueryloadincreases,independentofthedatabases.TheyexposeaverysmallthriftAPItoclients,thoughwe’vewrittenaRubyclientwithamuchricherinterface. WeusetheGizzardlibrarytohandlethepartitioninglayer.AforwardinglayermapsrangesofsourceIDstophysicaldatabases,andreplicationishandledbybuildingatreeofsuchtablesunderthesameforwardingaddress.Writeoperationsareacknowledgedafterbeingjournalledlocally,sothatdisruptionsindatabaseavailabilityorperformancearedecoupledfromwebsiteresponsetimes. Eachedgeisactuallystoredtwice:onceinthe“forward”direction(indexedandpartitionedbythesourceID)andonceinthe“backward”direction(indexedandpartitionedbythedestinationID).Thatwayaquerylike“Whofollowsme?”isjustasefficientas“WhodoIfollow?”,andtheanswertoeachquerycanbefoundentirelyonasinglepartition. Theendresultisaclusterofcommodityserversthatwecanexpandasneeded.Overthewinter,weadded50%databasecapacitywithoutanyonenoticing.Wecurrentlystoreover 13billionedges andsustainpeaktrafficof 20kwrites/second and 100kreads/second . Lessonslearned Somehelpfulpatternsfelloutofourexperience,eventhoughtheyweren’tgoalsoriginally: Useaggressivetimeoutstocutoffthelongtail.Youcan’tevershakeoutalltheunfairnessinthesystem,sosomerequestswilltakeanunreasonablylongtimetofinish—wayoverthe99.9thpercentile.Iftherearemultiplestatelessappservers,youcanjustcutaclientloosewhenithaspasseda“reasonable”amountoftime,andletittryitsluckwithadifferentappserver. Makeeverycaseanerrorcase.Or,toputitanotherway,usethesamecodepathforerrorsasyouuseinnormaloperation.Don’tcreaterarely-testedmodulesthatonlykickinduringemergencies,whenyou’releastlikelytofeelliketryingnewthings.Wequeueallwriteoperationslocally(usingKestrelasalibrary),andanythatfailarethrownintoaseparateerrorqueue.Thiserrorqueueisperiodicallyflushedbackintothewritequeue,sothatretriesusethesamecodepathastheinitialattempt. Donothingautomaticallyatfirst.Providelotsofgaugesandlevers,andautomatewithscriptsoncepatternsemerge.FlockDBmeasuresthelatencydistributionofeachquerytypeacrosseachservice(MySQL,Kestrel,Thrift)sowecantunetimeouts,andreportscountsofeachoperationsowecanseewhenaclientlibrarysuddenlydoublesitsqueryload(orweneedtoaddmorehardware).Writeoperationsthatcyclethroughtheerrorqueuetoomanytimesaredumpedintoalogformanualinspection.Ifitturnsouttobeabug,wecanfixit,andre-injectthejob.Ifit’saclienterror,wehaveagoodbugreport. Checkitout Thesourceisingithub:http://github.com/twitter/flockdb Inparticular,checkoutthedemotogetafeelforthekindofdatathatcanbestoredandwhatyoucandowithit: http://github.com/twitter/flockdb/blob/master/doc/demo.markdown TalktousonIRC,in#twinfra(irc.freenode.net),orjointhemailinglist: http://groups.google.com/group/flockdb —@robey,@nk,@asdf,@jkalucki Share: Linkcopiedsuccessfully Follow Tweets Following Followers ByusingTwitter’sservicesyouagreetoourCookiesUse.Weusecookiesforpurposesincludinganalytics,personalisation,andads. OK

請為這篇文章評分？

延伸文章資訊

Introducing FlockDB - Twitter Blog

FlockDB measures the latency distribution of each query type across each service (MySQL, Kestrel,...

FlockDB in 2022 - Reviews, Features, Pricing, Comparison

FlockDB is much simpler than other graph databases such as neo4j because it tries to solve fewer ...

FlockDB

FlockDB was a distributed, fault-tolerant graph database. History. Twitter announced in 2017 that...

FlockDB - Wikipedia

FlockDB is an open-source distributed, fault-tolerant graph database for managing wide but shallo...

Flockdb: A Simple Graph Database for Social Media Training ...

FlockDB is an open source distributed, fault-tolerant graph database for managing wide but shallo...

Introducing FlockDB - Twitter Blog

文章推薦指數： 80 %

請為這篇文章評分？

延伸文章資訊

最新文章

相關網站資訊

中日口譯課程

中國生產力中心口譯評價

紙的應用