Introducing FlockDB - Twitter Blog
文章推薦指數: 80 %
FlockDB measures the latency distribution of each query type across each service (MySQL, Kestrel, Thrift) so we can tune timeouts, and reports ... IntroducingFlockDB By Monday,3May2010 Linkcopiedsuccessfully Twitterstoresmanygraphsofrelationshipsbetweenpeople:whoyou’refollowing,who’sfollowingyou,whoyoureceivephonenotificationsfrom,andsoon. Someofthefeaturesofthesegraphshavebeenchallengingtostoreinscalablewaysaswe’vegrown.Forexample,insteadofrequiringeachfriendshiptoberequestedandconfirmed,youcanbuildone-wayrelationshipsbyjustfollowingotherpeople.There’salsonolimittohowmanypeopleareallowedtofollowyou,sosomepeoplehavemillionsoffollowers(like@aplusk),whileothershaveonlyafew. Todeliveratweet,weneedtobeabletolookupsomeone’sfollowersandpagethroughthemrapidly.Butwealsoneedtohandleheavywritetraffic,asfollowersareaddedorremoved,orspammersarecaughtandputonice.Andforsomeoperations,likedeliveringa@mention,weneedtodosetarithmeticlike“who’sfollowingbothoftheseusers?”Thesefeaturesaredifficulttoimplementinatraditionalrelationaldatabase. Avalianteffort Wewentthroughseveralstoragelayersintheearlydays,includingabusiveuseofrelationaltablesandkey-valuestorageofdenormalizedlists.Theywereeithergoodathandlingwriteoperationsorgoodatpagingthroughgiantresultsets,butnevergoodatboth. Alittleoverayearago,wecouldseethatweneededtotrysomethingnew.Ourgoalswere: Writethesimplestpossiblethingthatcouldwork. Useoff-the-shelfMySQLasthestorageengine,becauseweunderstanditsbehavior—innormaluseaswellasunderextremeloadandunusualfailureconditions.Giveitenoughmemorytokeepeverythingincache. Allowforhorizontalpartitioningsowecanaddmoredatabasehardwareasthecorpusgrows. Allowwriteoperationstoarriveoutoforderorbeprocessedmorethanonce.(Allowfailurestoresultinredundantworkratherthanlostwork.) FlockDBwastheresult.Wefinishedmigratingtoitabout9monthsagoandneverlookedback. Avaliant-ereffort FlockDBisadatabasethatstoresgraphdata,butitisn’tadatabaseoptimizedforgraph-traversaloperations.Instead,it’soptimizedforverylargeadjacency lists,fastreadsandwrites,andpage-ablesetarithmeticqueries. Itstoresgraphsassetsofedgesbetweennodesidentifiedby64-bitintegers.Forasocialgraph,thesenodeIDswillbeuserIDs,butinagraphstoring“favorite”tweets,thedestinationmaybeatweetID.Eachedgeisalsomarkedwitha64-bitposition,usedforsorting.(Twitterputsatimestamphereforthe“following”graph,sothatyourfollowerlistisdisplayedlatest-first.) Whenanedgeis“deleted”,therowisn’tactuallydeletedfromMySQL;it’sjustmarkedasbeinginthedeletedstate,whichhastheeffectofmovingtheprimarykey(acompoundkeyofthesourceID,state,andposition).Similarly,userswhodeletetheiraccountcanhavetheiredgesputintoanarchivedstate,allowingthemtoberestoredlater(butonlyforalimitedtime,accordingtoourtermsofservice).Wekeeponlyacompoundprimarykeyandasecondaryindexforeachrow,andanswerallqueriesfromasingleindex.ThiskindofschemaoptimizationallowsMySQLtoshineandgivesuspredictableperformance. Acomplexquerylike“What’stheintersectionofpeopleIfollowandpeoplewhoarefollowingPresidentObama?”canbeansweredquicklybydecomposingitintosingle-userqueries(“WhoisfollowingPresidentObama?”).Dataispartitionedbynode,sothesequeriescaneachbeansweredbyasinglepartition,usinganindexedrangequery.Similarly,pagingthroughlongresultsetsisdonebyusingthepositionfieldasacursor,ratherthanusingLIMIT/OFFSET,soanypageofresultsforaqueryisindexedandisequallyfast. Writeoperationsareidempotentandcommutative,basedonthetimetheyenterthesystem.Wecanprocessoperationsoutoforderandendupwiththesameresult,sowecanpaperovertemporarynetworkandhardwarefailures,orevenreplaylostdatafromminutesorhoursago.Thiswasespeciallyhelpfulduringtheinitialroll-out. Commutativewritesalsosimplifytheprocessofbringingupnewpartitions.Anewpartitioncanreceivewritetrafficimmediately,andreceiveadumpofdatafromtheoldpartitionsslowlyinthebackground.Oncethedumpisover,thepartitionisimmediately“live”andreadytoreceivereads. Theappservers(affectionatelycalled“flapps”)arewritteninScala,arestateless,andarehorizontallyscalable.Wecanaddmoreasqueryloadincreases,independentofthedatabases.TheyexposeaverysmallthriftAPItoclients,thoughwe’vewrittenaRubyclientwithamuchricherinterface. WeusetheGizzardlibrarytohandlethepartitioninglayer.AforwardinglayermapsrangesofsourceIDstophysicaldatabases,andreplicationishandledbybuildingatreeofsuchtablesunderthesameforwardingaddress.Writeoperationsareacknowledgedafterbeingjournalledlocally,sothatdisruptionsindatabaseavailabilityorperformancearedecoupledfromwebsiteresponsetimes. Eachedgeisactuallystoredtwice:onceinthe“forward”direction(indexedandpartitionedbythesourceID)andonceinthe“backward”direction(indexedandpartitionedbythedestinationID).Thatwayaquerylike“Whofollowsme?”isjustasefficientas“WhodoIfollow?”,andtheanswertoeachquerycanbefoundentirelyonasinglepartition. Theendresultisaclusterofcommodityserversthatwecanexpandasneeded.Overthewinter,weadded50%databasecapacitywithoutanyonenoticing.Wecurrentlystoreover 13billionedges andsustainpeaktrafficof 20kwrites/second and 100kreads/second . Lessonslearned Somehelpfulpatternsfelloutofourexperience,eventhoughtheyweren’tgoalsoriginally: Useaggressivetimeoutstocutoffthelongtail.Youcan’tevershakeoutalltheunfairnessinthesystem,sosomerequestswilltakeanunreasonablylongtimetofinish—wayoverthe99.9thpercentile.Iftherearemultiplestatelessappservers,youcanjustcutaclientloosewhenithaspasseda“reasonable”amountoftime,andletittryitsluckwithadifferentappserver. Makeeverycaseanerrorcase.Or,toputitanotherway,usethesamecodepathforerrorsasyouuseinnormaloperation.Don’tcreaterarely-testedmodulesthatonlykickinduringemergencies,whenyou’releastlikelytofeelliketryingnewthings.Wequeueallwriteoperationslocally(usingKestrelasalibrary),andanythatfailarethrownintoaseparateerrorqueue.Thiserrorqueueisperiodicallyflushedbackintothewritequeue,sothatretriesusethesamecodepathastheinitialattempt. Donothingautomaticallyatfirst.Providelotsofgaugesandlevers,andautomatewithscriptsoncepatternsemerge.FlockDBmeasuresthelatencydistributionofeachquerytypeacrosseachservice(MySQL,Kestrel,Thrift)sowecantunetimeouts,andreportscountsofeachoperationsowecanseewhenaclientlibrarysuddenlydoublesitsqueryload(orweneedtoaddmorehardware).Writeoperationsthatcyclethroughtheerrorqueuetoomanytimesaredumpedintoalogformanualinspection.Ifitturnsouttobeabug,wecanfixit,andre-injectthejob.Ifit’saclienterror,wehaveagoodbugreport. Checkitout Thesourceisingithub:http://github.com/twitter/flockdb Inparticular,checkoutthedemotogetafeelforthekindofdatathatcanbestoredandwhatyoucandowithit: http://github.com/twitter/flockdb/blob/master/doc/demo.markdown TalktousonIRC,in#twinfra(irc.freenode.net),orjointhemailinglist: http://groups.google.com/group/flockdb —@robey,@nk,@asdf,@jkalucki Share: Linkcopiedsuccessfully Follow Tweets Following Followers ByusingTwitter’sservicesyouagreetoourCookiesUse.Weusecookiesforpurposesincludinganalytics,personalisation,andads. OK
延伸文章資訊
- 1Introducing FlockDB - Twitter Blog
FlockDB measures the latency distribution of each query type across each service (MySQL, Kestrel,...
- 2FlockDB in 2022 - Reviews, Features, Pricing, Comparison
FlockDB is much simpler than other graph databases such as neo4j because it tries to solve fewer ...
- 3FlockDB
FlockDB was a distributed, fault-tolerant graph database. History. Twitter announced in 2017 that...
- 4FlockDB - Wikipedia
FlockDB is an open-source distributed, fault-tolerant graph database for managing wide but shallo...
- 5Flockdb: A Simple Graph Database for Social Media Training ...
FlockDB is an open source distributed, fault-tolerant graph database for managing wide but shallo...