A quick tale about FEFF, an invisible UTF-8 character that ...


By Eumir Gaspar

Today, we encountered an error while trying to create some database seeds from a CSV. This CSV was originally generated by me using a Ruby script which piped the output to a file and saved it as a CSV. The CSV was checked into Git and had been used for a while until we had to update some parts of it by adding a new column and fixing some values.

While we don't know the exact reason yet, my theory is that somehow, Excel for Mac (we are all using Macs) added some additional metadata to it even after saving the file as a CSV. This in turn made anyone using the seed receive the following error:

    CSV::MalformedCSVError: Illegal quoting in line 1.

I opened the CSV file and nothing looked suspicious. My first thought was that some left/right quotation marks were somehow mixed into the file instead of just the 'normal' double quotes: ". But upon further investigation, there was nothing out of the ordinary. This led me to just wipe out the whole file and actually type out the first row again. I saved that file again and ran the migration:

    CSV::MalformedCSVError: Illegal quoting in line 1.

What?! Okay, this was driving me nuts. I opened up a new file, typed the exact single line again, and ran the migration. It worked. So what was in that file?! Only one way to find out:

    cat companies.csv | pbcopy | pbpaste > temp.csv
    rm companies.csv
    mv temp.csv companies.csv
    git diff

So OS X has these two functions that are very useful: pbcopy and pbpaste. Basically anything piped to pbcopy gets into your clipboard, and pbpaste puts what you have on your clipboard to standard output (stdout). But it removes all formatting. Very useful when you want to just copy some text from somewhere and you want to paste it into a WYSIWYG editor without all the formatting. Like when writing an email from Gmail, for example.

I then removed the original file and saved the new 'unformatted' file with the same filename so I could see the difference. And we finally saw the invisible man:

[Image: The invisible character showing in Atlassian's Bitbucket.]

[Image: The invisible character's actual name!]

A quick Google search told us that our friend U+FEFF was called a ZERO WIDTH NO-BREAK SPACE. Also, a quick trip to Wikipedia told us about the actual uses for U+FEFF, more commonly known as the byte order mark or BOM.

Our friend FEFF means different things, but it's basically a signal for a program on how to read the text. It can be UTF-8 (more common), UTF-16, or even UTF-32. The bytes FE FF themselves are how the mark appears in (big-endian) UTF-16; in UTF-8 the same character is encoded as the three-byte sequence 0xEF, 0xBB, 0xBF.

From my understanding, when the CSV file was opened in Excel and saved, Excel created a space for our invisible stowaway, U+FEFF. And in front of the file to boot! Excel did some magic, and the file was probably saved with a BOM (as UTF-8 with a BOM, or perhaps even as UTF-16) instead of plain UTF-8. Most editors do not display a BOM, so visually the file was okay. But Ruby's CSV thought that there was something wrong because it assumed the file it was reading was plain UTF-8 and it couldn't ignore Mr. U+FEFF, presumably because the BOM sat in front of the opening quote of the first field.

So lesson learned: don't open (and save!) a CSV file in Excel if you want to feed it to Ruby's CSV parser. If you do ever encounter an error like that, be sure to look for hidden characters not shown by your editor. If you still can't see it and are using OS X, then pbcopy and pbpaste can help you out: in our case they stripped out the hidden character, along with any formatting, while copying and pasting the text.
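For anyone who would rather detect the stowaway from Ruby than round-trip through the clipboard, here is a minimal sketch. It assumes the file is UTF-8 with a leading BOM (the three bytes 0xEF 0xBB 0xBF). The helper names `utf8_bom?` and `read_csv_ignoring_bom` are made up for illustration and are not part of the original seed script. The `"bom|utf-8"` encoding string is Ruby's standard way of asking an IO to skip a leading BOM when one is present.

```ruby
require "csv"

# The three bytes that encode U+FEFF in UTF-8.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: true when the file starts with the UTF-8 BOM bytes.
def utf8_bom?(path)
  File.binread(path, 3) == UTF8_BOM
end

# Hypothetical helper: parse a CSV with a header row, asking Ruby to
# consume a leading BOM (if any) before the CSV parser sees the data.
def read_csv_ignoring_bom(path)
  CSV.read(path, headers: true, encoding: "bom|utf-8")
end
```

With these in place, `utf8_bom?("companies.csv")` would tell you whether the file carries the mark, and `read_csv_ignoring_bom("companies.csv")` should parse it either way, since the `bom|` prefix is simply a no-op for files that have no BOM.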