Remove \ufeff - actorsfit

Language: Python
Programming tool: PyCharm
Hardware environment: Windows 10, 64-bit

A problem came up while reading a file: a non-empty Notepad file was converted to UTF-8 and copied into PyCharm, and the printed result has \ufeff at the beginning. The code is as follows:

    # Open the file with the plain UTF-8 codec
    with open('new2.txt', encoding='UTF-8') as f:
        lines = []
        for line in f:
            lines.append(line.strip())
    print(lines)

The printed list shows \ufeff at the start of its first element.

To fix this, just change the encoding from UTF-8 to UTF-8-sig:

    # UTF-8-sig skips a leading BOM while decoding
    with open('new2.txt', encoding='UTF-8-sig') as f:
        lines = []
        for line in f:
            lines.append(line.strip())
    print(lines)

The printed list no longer contains \ufeff.

The difference between the utf-8 and utf-8-sig encodings:

As UTF-8 is an 8-bit encoding, no BOM is required, and any U+FEFF character in the decoded Unicode string (even if it's the first character) is treated as a ZERO WIDTH NO-BREAK SPACE.

UTF-8 uses bytes as its encoding unit, so its byte order is the same on every system and there is no endianness problem; it does not actually need a BOM ("byte order mark"). utf-8-sig is UTF-8 with a BOM: when encoding it writes the BOM first, and when decoding it skips a leading BOM if one is present.

Some information about \ufeff (quoted from Wikipedia):

The byte order mark (BOM) is the name of the Unicode character at code point U+FEFF. When a string of UCS/Unicode characters is encoded in UTF-16 or UTF-32, this character is used to indicate its endianness. It is also often used as a marker that a file is encoded in UTF-8, UTF-16, or UTF-32.
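Because the mark doubles as an encoding signature, a reader can inspect the first bytes of a file to guess which codec to use. The following is a minimal sketch of that idea, not part of the original article: `sniff_bom` and its `default` parameter are hypothetical names, and the check is only a heuristic (for example, it cannot tell a UTF-32 little-endian mark from a UTF-16 little-endian file whose first character is U+0000).

```python
import codecs

# BOM byte sequences paired with a codec that consumes the mark on decode.
# UTF-32 is tested first because its little-endian mark (FF FE 00 00)
# begins with the UTF-16 little-endian mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_BE, "utf-16"),
    (codecs.BOM_UTF16_LE, "utf-16"),
]


def sniff_bom(data, default="utf-8"):
    """Return a codec name chosen from the BOM at the start of `data`.

    Falls back to `default` when no known mark is found.
    """
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec
    return default


# Illustrative input: the UTF-8 mark followed by ASCII text.
sample = codecs.BOM_UTF8 + "hello".encode("utf-8")
encoding = sniff_bom(sample)
print(encoding, repr(sample.decode(encoding)))
```

Returning `utf-8-sig`, `utf-16`, or `utf-32` rather than an endian-specific name is a deliberate choice in this sketch: those codecs read the mark themselves and drop it from the decoded string.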
If the character U+FEFF appears at the beginning of a byte stream, it identifies the byte order of that stream, big-endian or little-endian. If it appears in the middle of the stream, it means a zero-width no-break space, which is invisible to the user. Since Unicode 3.2, U+FEFF should appear only at the beginning of a byte stream and be used only to identify the byte order, as its name, byte order mark, indicates; the other usage has been deprecated. Use U+2060 (WORD JOINER) instead to express a zero-width no-break space.

In UTF-16, the byte order mark is placed as the first character of the file or string stream to mark the byte order of all the 16-bit code units that follow.

If the 16-bit units are big-endian, the byte order mark appears as the byte sequence 0xFE followed by 0xFF (where 0x indicates hexadecimal). If the 16-bit units are little-endian, the byte sequence is 0xFF followed by 0xFE. In Unicode, the code point U+FFFE is guaranteed never to be assigned as a character. This means that 0xFF, 0xFE can only be interpreted as a little-endian U+FEFF (because it cannot be a big-endian U+FFFE).

UTF-8 has no byte order issue. The UTF-8 encoded byte order mark only marks a file as UTF-8; it does not describe byte order.[1] Many Windows programs (including Notepad) add a byte order mark to UTF-8 files. On Unix-like systems, however (which make heavy use of text files for file formats and inter-process communication), this practice is not recommended, because it interferes with the correct handling of important sequences such as the shebang at the start of an interpreted script. It also affects programming languages that do not recognize it. For example, gcc reports unrecognized characters at the beginning of the source file. In PHP, if output buffering is not enabled, the mark causes page content to start being sent to the browser (i.e. the headers have already been sent), which prevents the PHP script from setting HTTP headers. In UTF-8 the byte order mark is the byte sequence EF BB BF; text editors and web browsers that are not prepared to handle UTF-8 display it, in an ISO-8859-1 environment, as ï»¿.
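To see the UTF-8 mark from Python's side, here is a minimal sketch that writes the three bytes EF BB BF in front of some text, as Notepad would, and reads the file back with both codecs. The file content, the temporary file, and the helper name `read_lines` are assumptions made for illustration; they stand in for the Notepad file and are not from the original article.

```python
import codecs
import os
import tempfile

# Simulate a Notepad-style file: the UTF-8 BOM followed by the text.
# The two lines of text are made up for this example.
raw = codecs.BOM_UTF8 + "first line\nsecond line\n".encode("utf-8")

with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(raw)
    path = tmp.name


def read_lines(path, encoding):
    """Read a text file and return its lines with surrounding whitespace stripped."""
    with open(path, encoding=encoding) as f:
        return [line.strip() for line in f]


plain = read_lines(path, "utf-8")       # the mark survives as '\ufeff'
signed = read_lines(path, "utf-8-sig")  # the mark is consumed by the codec
os.remove(path)

print(plain)
print(signed)

# The same code point U+FEFF serialised in the encodings discussed above.
for codec in ("utf-8", "utf-16-be", "utf-16-le"):
    print(codec, "\ufeff".encode(codec).hex())
```

Note that `strip()` does not remove the mark in the first read: Python does not classify U+FEFF as whitespace, so only the choice of codec gets rid of it.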
Although the byte order mark can also be used in UTF-32, that encoding is rarely used for transmission, and its rules are similar to those of UTF-16. For the IANA-registered character sets UTF-16BE, UTF-16LE, UTF-32BE, and UTF-32LE, the byte order mark must not be used: because the names of these character sets already fix their byte order, a U+FEFF at the beginning of the document is interpreted as a (deprecated) "zero-width no-break space". For the registered character sets UTF-16 and UTF-32, a U+FEFF at the beginning is used to indicate the byte order.

Reprinted at: https://www.cnblogs.com/chongzi1990/p/8694883.html


