A quick tale about FEFF, an invisible UTF-8 character that ...
Eumir Gaspar

Today, we encountered an error while trying to create some database seeds from a CSV. This CSV was originally generated by me using a Ruby script which piped the output to a file and saved it as a CSV. The CSV was checked into Git and had been used for a while until we had to update some parts of it by adding a new column and fixing some values.

While we don't know the exact reason yet, my theory is that somehow, Excel for Mac (we are all using Macs) added some additional metadata to it even after saving the file as a CSV. This in turn made anyone using the seed receive the following error:

    CSV::MalformedCSVError: Illegal quoting in line 1.

I opened the CSV file and nothing looked suspicious. My first thought was that some left/right quotation marks were somehow mixed into the file instead of just the 'normal' double quotes: ". But upon further investigation, there was nothing out of the ordinary.

This led me to just wipe out the whole file, and actually type out the first row again. I saved that file again and ran the migration:

    CSV::MalformedCSVError: Illegal quoting in line 1.

What?! Okay, this was driving me nuts. I opened up a new file, typed the exact single line again, and ran the migration. It worked. So what was in that file?! Only one way to find out:

    cat companies.csv | pbcopy
    pbpaste > temp.csv
    rm companies.csv
    mv temp.csv companies.csv
    git diff

So OS X has these two commands that are very useful: pbcopy and pbpaste. Basically anything piped to pbcopy gets into your clipboard, and pbpaste puts what you have on your clipboard to standard output (stdout). But it removes all formatting. Very useful when you want to just copy some text from somewhere and you want to paste it into a WYSIWYG editor without all the formatting. Like when writing an email from Gmail, for example.

I then removed the original file and saved the new 'unformatted' file with the same filename so I could see the difference. And we finally saw the invisible man:

[Image: the invisible character showing in Atlassian's Bitbucket.]

[Image: the invisible character's actual name!]

A quick Google search told us that our friend U+FEFF was called a ZERO WIDTH NO-BREAK SPACE. Also, a quick trip to Wikipedia told us about the actual uses for U+FEFF, more commonly known as the byte order mark or BOM.

Our friend FEFF means different things, but it's basically a signal for a program on how to read the text. It can mark a file as UTF-8 (more common), UTF-16, or even UTF-32. The bytes FE FF themselves are the UTF-16 (big-endian) form; in UTF-8 the same character is stored as the three-byte sequence 0xEF, 0xBB, 0xBF.

From my understanding, when the CSV file was opened in Excel and saved, Excel created a space for our invisible stowaway, U+FEFF. And in front of the file to boot! Excel did some magic, and my guess is that it saved the file as UTF-16 instead of UTF-8, or at least added the BOM. UTF-8 text does not need a BOM, and editors usually render it as a zero-width character, so visually the file was okay. But Ruby's CSV thought that there was something wrong because it assumed the file it was reading was plain UTF-8 and it couldn't ignore Mr. U+FEFF. Most likely the character ended up glued to the start of the first field, in front of the opening double quote, which would explain the complaint about illegal quoting in line 1.

So lesson learned: don't open (and save!) a CSV file in Excel if you want to feed it to Ruby's CSV parser. If you do ever encounter an error like that, be sure to look for hidden characters not shown by your editor. If you still can't see it and are using OS X, then pbcopy and pbpaste will help you out: in our case they stripped out the formatting and hidden characters from the text in addition to copying and pasting it.
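If you would rather handle the stowaway from Ruby itself, here is a minimal sketch of one way to do it. It assumes the file is UTF-8 with a leading BOM (the three bytes 0xEF, 0xBB, 0xBF mentioned above). The helper names `utf8_bom?` and `read_csv_ignoring_bom` and the sample rows are made up for illustration and are not from our seed code. The `"bom|utf-8"` encoding prefix asks Ruby to skip a leading BOM, if there is one, when opening the file.

```ruby
require "csv"
require "tempfile"

# The UTF-8 encoding of U+FEFF is the three bytes EF BB BF.
UTF8_BOM = "\xEF\xBB\xBF".b

# Returns true when the file at `path` starts with the UTF-8 BOM bytes.
# (Hypothetical helper name, for illustration only.)
def utf8_bom?(path)
  File.binread(path, 3) == UTF8_BOM
end

# Parses a CSV file and lets Ruby drop a leading BOM if one is present.
# (Hypothetical helper name, for illustration only.)
def read_csv_ignoring_bom(path)
  CSV.read(path, encoding: "bom|utf-8")
end

# Demo: write a made-up two-line CSV that starts with a BOM.
demo = Tempfile.new(["companies", ".csv"])
demo.binmode
demo.write(UTF8_BOM + %("name","city"\n"Acme","Manila"\n).b)
demo.close

puts utf8_bom?(demo.path)

rows = read_csv_ignoring_bom(demo.path)
p rows.first
```

Checking the first three bytes is a quick way to confirm the suspicion before touching the file, and reading with `"bom|utf-8"` means nobody has to remember to clean the CSV after Excel has been at it.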