Remove \ufeff - actorsfit
Language: Python
Programming tool: PyCharm
Hardware environment: Windows 10, 64-bit

A problem came up while reading a file: a non-empty Notepad file was converted to UTF-8 and copied into PyCharm, and the printed result has \ufeff at the beginning. The printing code is as follows:

    with open('new2.txt', encoding='UTF-8') as f:  # open the file as UTF-8
        lines = []
        for line in f:
            lines.append(line.strip())
    print(lines)

In the printed result, the first element starts with \ufeff.

The fix is to change the encoding from UTF-8 to UTF-8-sig:

    with open('new2.txt', encoding='UTF-8-sig') as f:  # UTF-8-sig skips a leading BOM
        lines = []
        for line in f:
            lines.append(line.strip())
    print(lines)

The printed result no longer starts with \ufeff.

The difference between the utf-8 and utf-8-sig encodings:

As UTF-8 is an 8-bit encoding, no BOM is required, and any U+FEFF character in the decoded Unicode string (even if it's the first character) is treated as a ZERO WIDTH NO-BREAK SPACE.

UTF-8 uses bytes as its encoding unit, so its byte order is the same on every system and there is no endianness problem. It therefore does not actually need a BOM ("Byte Order Mark"). utf-8-sig, that is, UTF-8 with BOM, does work with one: when decoding it skips a leading BOM if present, and when encoding it writes one.

Some information about \ufeff (quoted from Wikipedia):

The byte-order mark (BOM) is the name of the Unicode character located at code point U+FEFF. When a string of UCS/Unicode characters is encoded in UTF-16 or UTF-32, this character is used to indicate its endianness. It is also often used as a marker indicating that the file is encoded in UTF-8, UTF-16 or UTF-32.
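To make the difference between the two codecs concrete without needing a file on disk, here is a minimal sketch that builds the bytes in memory. The sample text "hello" and the variable names are illustrative assumptions, not taken from the original new2.txt.

```python
import codecs

# Hypothetical sample: the bytes an editor would write for "hello"
# when saving as UTF-8 with a BOM (EF BB BF followed by the text).
raw = codecs.BOM_UTF8 + "hello".encode("utf-8")

# Decoding as plain utf-8 keeps the BOM as a U+FEFF character.
as_utf8 = raw.decode("utf-8")

# Decoding as utf-8-sig skips a leading BOM.
as_utf8_sig = raw.decode("utf-8-sig")

# utf-8-sig also accepts input that has no BOM at all.
no_bom = "hello".encode("utf-8").decode("utf-8-sig")

# When encoding, utf-8-sig writes the BOM; plain utf-8 does not.
written_with_sig = "hello".encode("utf-8-sig")

print(repr(as_utf8))      # '\ufeffhello'
print(repr(as_utf8_sig))  # 'hello'
print(repr(no_bom))       # 'hello'
print(written_with_sig == raw)  # True
```

Because utf-8-sig tolerates files both with and without a BOM when reading, it is a reasonable default for text that may have come from Notepad.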
If the character U+FEFF appears at the beginning of a byte stream, it identifies the byte order of that stream, that is, whether it is big-endian or little-endian. If it appears in the middle of a byte stream, it carries the meaning of a zero-width no-break space. Starting from Unicode 3.2, U+FEFF should only appear at the beginning of a byte stream and should only be used to identify the byte order, as its name, byte-order mark, indicates; other uses are deprecated. Use U+2060 instead to express a zero-width no-break space.

In UTF-16, the byte order mark is placed as the first character of the file or character stream to mark the byte order of all the 16-bit units in it.

If the 16-bit units are big-endian, the byte order mark appears as the byte sequence 0xFE followed by 0xFF (where 0x indicates hexadecimal). If the 16-bit units are little-endian, the byte sequence is 0xFF followed by 0xFE.

In Unicode, the code point U+FFFE is guaranteed never to be assigned as a Unicode character. This means that 0xFF, 0xFE can only be interpreted as little-endian U+FEFF (because it cannot be big-endian U+FFFE).

UTF-8 has no byte-order issue. The UTF-8 encoded byte order mark is used only to mark a file as UTF-8, not to state the byte order.[1] Many Windows programs (including Notepad) add a byte order mark to UTF-8 files. However, on Unix-like systems (which make heavy use of text files for file formats and inter-process communication), this practice is not recommended, because it interferes with the correct processing of important codes such as the shebang at the beginning of an interpreter script. It also affects programming languages that do not recognize it. For example, gcc reports unrecognized characters at the beginning of the source file. In PHP, if output buffering is not enabled, it causes page content to start being sent to the browser (that is, the headers have already been sent), which prevents the PHP script from setting HTTP headers. In UTF-8 the byte order mark is represented as the byte sequence EF BB BF; in text editors and web browsers that are not prepared to handle UTF-8, it is displayed as ï»¿ in an ISO-8859-1 environment.
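The byte sequences described above can be checked directly in Python. This is a small sketch using only the standard codecs module; the sample text "hi" and the variable names are illustrative assumptions.

```python
import codecs

bom = "\ufeff"

# The same code point U+FEFF, serialized in each encoding.
be_hex = bom.encode("utf-16-be").hex()   # big-endian: FE then FF
le_hex = bom.encode("utf-16-le").hex()   # little-endian: FF then FE
utf8_hex = bom.encode("utf-8").hex()     # UTF-8: EF BB BF

print(be_hex, le_hex, utf8_hex)  # feff fffe efbbbf

# The generic "utf-16" decoder reads the BOM to pick the byte order,
# then drops it from the decoded text.
from_le = (codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")).decode("utf-16")
from_be = (codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")).decode("utf-16")
print(from_le, from_be)  # hi hi

# What a UTF-8 BOM looks like when the bytes are misread as ISO-8859-1.
misread = codecs.BOM_UTF8.decode("iso-8859-1")
print(misread)  # ï»¿
```

Seeing ï»¿ at the top of a page or file is therefore a quick hint that UTF-8 data with a BOM is being decoded as a single-byte encoding.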
Although the byte order mark can also be used in UTF-32, this encoding is rarely used for transmission, and its rules are similar to those of UTF-16. For the character sets UTF-16BE, UTF-16LE, UTF-32BE and UTF-32LE, which are registered with IANA, the byte order mark must not be used. A U+FEFF at the beginning of the document is interpreted as a (deprecated) "zero-width no-break space", because the names of these character sets already determine their byte order. For the registered character sets UTF-16 and UTF-32, a U+FEFF at the beginning is used to indicate the byte order.

Reprinted from: https://www.cnblogs.com/chongzi1990/p/8694883.html