How can I remove the BOM from a UTF-8 file?
Asked 5 years, 2 months ago
Modified 6 months ago
Viewed 166k times
139
I have a file in UTF-8 encoding with BOM and want to remove the BOM. Are there any Linux command-line tools to remove the BOM from the file?
    $ file test.xml
    test.xml: XML 1.0 document, UTF-8 Unicode (with BOM) text, with very long lines
command-line files unicode
edited Jul 23, 2017 at 10:06 by Michael Homer
asked Jul 23, 2017 at 10:05 by m13r
4
Similar: AWK with BOM: Is there any cool way to handle Unicode BOM with regexp?
– Stéphane Chazelas, Jul 23, 2017 at 10:40
1
I've made a fairly simple tool to do just that a few months ago: oskog97.com/read/?path=/small-scripts/killbom&referer=/… Might be worth installing something like it in /usr/local/bin if you have many UTF-8 encoded files with BOMs.
– Oskar Skog, Jul 23, 2017 at 11:24
Weirdly, cross-posted at stackoverflow.com/questions/45240387/…
– tripleee, Jan 12, 2021 at 7:27
In UTF-8, U+FEFF is encoded as 3 bytes: EF BB BF. One thing you could do is combine xxd and xxd -r to change those first three bytes to something within the printable ASCII range, like 41 41 41, so that "AAA" will appear in the BOM's place, which you can then simply delete and save with a regular text editor. Bit of a roundabout way but it works.
– Braden Best, Aug 11, 2021 at 23:29
10 Answers
140
If you're not sure if the file contains a UTF-8 BOM, then this (assuming the GNU implementation of sed) will remove the BOM if it exists, or make no changes if it doesn't.
    sed '1s/^\xEF\xBB\xBF//' < orig.txt > new.txt
You can also overwrite the existing file with the -i option:
    sed -i '1s/^\xEF\xBB\xBF//' orig.txt
If you are using the BSD version of sed (e.g. macOS) then you need to have bash do the escaping:
    sed $'1s/\xef\xbb\xbf//' < orig.txt > new.txt
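Both variants above lean on something outside POSIX sed: GNU sed's `\xHH` escapes, or bash's `$'...'` quoting. As a rough sketch of a more portable route, assuming only that `printf` understands octal escapes, the three BOM bytes can be built first and spliced into the sed script. The function name `strip_bom` is made up for illustration and is not part of any existing tool.

```shell
#!/bin/sh
# Hypothetical helper: copy stdin to stdout, dropping a leading UTF-8 BOM.
strip_bom() {
    # EF BB BF written in octal, which POSIX printf understands.
    bom=$(printf '\357\273\277')
    # LC_ALL=C makes sed work on bytes rather than multibyte characters.
    # The 1 address limits the substitution to the first line only.
    LC_ALL=C sed "1s/^${bom}//"
}

# Usage: strip_bom < orig.txt > new.txt
```

Because the pattern is anchored to the start of line 1, input without a BOM should pass through unchanged.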
edited May 28, 2020 at 13:05 by Matthew Buckett
answered Jul 23, 2017 at 14:08 by CSM
11
4
This may not work in a UTF-8 locale, but prepending a locale override to C or POSIX will always work.
– hildred, Jul 23, 2017 at 15:29
3
@hildred I've tested it with the en_US.UTF-8 locale and it worked. When will it fail?
– m13r, Jul 24, 2017 at 6:55
2
@m13r, It depends on the version of sed and compile options. In the failure case a very new version of sed with Unicode character classes will bring the three-byte sequence in as a single character, which does not match the three-character sequence. However, in such a case you can do a sixteen-bit character match. However, this is a new feature and not universally present. If you want to test, I recommend compiling the latest version.
– hildred, Jul 24, 2017 at 16:25
4
To fix it to work with a Unicode-enabled sed, do LC_ALL=C sed '1s/^\xEF\xBB\xBF//'
– Joshua, Jul 24, 2017 at 17:41
2
@mazunki, 1s/ means only search the first line; other lines are unaffected. The ^ means only match at the start of the (first) line. \xEF\xBB\xBF is the UTF-8 BOM (escaped hex string). // means replace with nothing. I could have added 1 to the end (for 1s/^\xEF\xBB\xBF//1), which would mean only match the first occurrence of the pattern on the line. But as the search is anchored with ^, this won't make any difference. If the file doesn't have the BOM at the start of the first line, the pattern won't match, and thus no change is made.
– CSM, Oct 27, 2019 at 18:47
117
A BOM doesn't make sense in UTF-8. Those are generally added by mistake by bogus software on Microsoft OSes.
dos2unix will remove it and also take care of other idiosyncrasies of Windows text files.
    dos2unix test.xml
answered Jul 23, 2017 at 10:42 by Stéphane Chazelas
12
22
I agree that a UTF-8 encoded BOM does not make sense, but believe it or not, there are lots of people who think it is a great idea that helps differentiate UTF-8 from other 8-bit encodings. So it is a matter of taste. Windows Notepad adds a BOM on purpose.
– Johan Myréen, Jul 23, 2017 at 14:02
24
What does it matter if it makes sense or not, when the context is just a question on how to remove it? According to Wikipedia, Notepad requires the BOM to recognize a file as UTF-8, and Google Docs also adds it while exporting a file as text. I doubt they all do it by mistake.
– ilkkachu, Jul 23, 2017 at 14:09
3
Is there a way of not converting the line endings and just removing the BOM with dos2unix?
– m13r, Jul 25, 2017 at 7:55
3
@m13r Then use the sed script in this answer. That will remove only the BOM (if it exists); nothing else will be changed.
– user232326, Jul 26, 2017 at 5:51
5
@JohanMyréen yes, but it is not correct calling them UTF-8. They are not UTF-8 files. They are UTF-8-with-BOM files, which is another file format. I suppose those Windows freaks won't be happy getting ODT files called MS Office files :)
– 9ilsdx9rvj0lo, Nov 9, 2018 at 9:47
84
Using VIM
Open file in VIM:
    vi text.xml
Remove BOM encoding:
    :set nobomb
Save and quit:
    :wq
For a non-interactive solution, try the following command line:
    vi -c ":set nobomb" -c ":wq" text.xml
That should remove the BOM, save the file and quit, all from the command line.
edited Aug 4, 2020 at 20:54
answered Dec 24, 2017 at 18:05 by Joshua Pinter
3
1
Oddly with vim 8 on a mac, I have a csv utf-8 file made by Excel and it starts with , yet :set nobomb doesn't modify or remove it.
– dlamblin, Oct 9, 2019 at 21:11
1
This is much faster than tail on large files.
– user239558, Dec 2, 2019 at 20:14
For multiple files: vim -c ":bufdo set nobomb|update" -c "q" *
– Dennis Williamson, Sep 7, 2021 at 13:41
33
It is possible to remove the BOM from a file with the tail command:
    tail -c +4 withBOM.txt > withoutBOM.txt
Be aware that this unconditionally chops the first 3 bytes from the file (output starts at the 4th byte), so be sure that the file really contains the BOM before running tail.
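The caveat above can be automated by looking at the first three bytes before calling tail. The following is a minimal sketch, assuming a POSIX `od` (for `-N`) and a `tail` that accepts `-c +N`; the function names `has_utf8_bom` and `cat_without_bom` are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Hypothetical helpers; the names are not from any existing tool.

# Succeeds (exit status 0) if the file named by $1 starts with EF BB BF.
has_utf8_bom() {
    # Dump at most the first three bytes as hex and strip all whitespace.
    first=$(od -An -tx1 -N3 < "$1" | tr -d ' \t\n')
    [ "$first" = "efbbbf" ]
}

# Write the file named by $1 to stdout, skipping the BOM only if present.
cat_without_bom() {
    if has_utf8_bom "$1"; then
        tail -c +4 < "$1"   # start output at byte 4, i.e. skip 3 bytes
    else
        cat < "$1"
    fi
}

# Usage: cat_without_bom withBOM.txt > withoutBOM.txt
```

Reading through redirection rather than passing the filename as an argument is a deliberate choice here: it sidesteps filenames that begin with a dash.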
edited Oct 13, 2021 at 14:30
answered Jul 23, 2017 at 10:05 by m13r
8
2
Why 4? The BOM has 3 bytes.
– deviantfan, Jul 23, 2017 at 17:12
12
@deviantfan Which is why you need to start at the 4th byte if you want to skip it.
– Stéphane Chazelas, Jul 23, 2017 at 18:33
13
tail is using 1-based indexing?! WTF!
– CodesInChaos, Jul 23, 2017 at 19:31
6
@CodesInChaos, tail -c -1 or tail -c 1 (what tail is generally used for) is the content starting with the last byte, tail -c +1 starting with the first byte. tail -c 0 / tail -c +0 for that would be a lot more unintuitive.
– Stéphane Chazelas, Jul 23, 2017 at 23:05
2
@deviantfan: (dd bs=1 count=3 of=/dev/null; cat) <input >output. Or with GNU (head -c3 >/dev/null; cat) -- even in UTF8 or other non-single-byte locale; GNU head does 'char' = byte.
– dave_thompson_085, Jul 24, 2017 at 6:16
7
You can use
    LANG=C LC_ALL=C sed -e 's/\r$//; 1s/^\xef\xbb\xbf//' -i -- filename
to remove the byte order mark from the beginning of the file, if it has any, as well as convert any CRLF newlines to LF only. The LANG=C LC_ALL=C tells the shell you want the command to run in the default C locale (also known as the default POSIX locale), where the three bytes forming the Byte Order Mark are treated as bytes. The -i option to sed means in-place. If you use -i.old, then sed saves the original file as filename.old, and the new file (with the modifications, if any) as filename.
I personally like to have this as ~/bin/ms-fix; for example, as
    #!/bin/dash
    export LANG=C LC_ALL=C
    if [ $# -gt 0 ]; then
        for FILE in "$@"; do
            sed -e 's/\r$//; 1s/^\xef\xbb\xbf//' -i -- "$FILE" || exit 1
        done
    else
        exec sed -e 's/\r$//; 1s/^\xef\xbb\xbf//'
    fi
so that if I need to apply this to, say, all C source files and headers (my old code from the MS-DOS era, for example!), I just run
    find . -name '*.[CHch]' -print0 | xargs -r0 ~/bin/ms-fix
or, if I just want to look at such a file, without modifying it, I can run
    ~/bin/ms-fix < filename
in my UTF-8 terminal.
edited Jul 24, 2017 at 14:25
answered Jul 23, 2017 at 19:10 by Nominal Animal
3
1
Why not simply sed -e 's/\r$//; 1s/^\xef\xbb\xbf//' -i -- "$@"?
– Stéphane Chazelas, Jul 24, 2017 at 14:02
@StéphaneChazelas: Because I want the script to exit immediately if there is an issue with a replacement, which sed -e 's/\r$//; 1s/^\xef\xbb\xbf//' -i -- "$@" does not do; it does return an exit code, but it processes all files listed in the argument list before exiting.
– Nominal Animal, Jul 24, 2017 at 14:24
@StéphaneChazelas: The -- before the filename(s) is, of course, important: without it, filenames beginning with a dash may be considered options by sed. I edited those into my answer; thank you for the reminder!
– Nominal Animal, Jul 24, 2017 at 14:27
7
I use a vim one-liner on the regular for this:
    vim --clean -c 'se nobomb|wq' filename
    vim --clean -c 'bufdo se nobomb|wqa' filename1 filename2 ...
answered Jan 23, 2020 at 19:40 by Robyn Murdock
1
This should also be achievable using VIM's ex personality.
– JdeBP, Oct 7, 2020 at 9:46
2
I have a slightly different problem, and am putting this here for someone who, like me, ends up here with data full of ZERO WIDTH NO-BREAK SPACE characters (which are known as Byte Order Mark when they are the first character of the file).
I got this data by copying out of the grafana query metrics field, and it had multiple (17) \xef\xbb\xbf sequences (which show up in vim as rate(node{job) in a single line with only 81 actual characters.
I modified Nominal Animal's code just slightly:
    LANG=C LC_ALL=C sed -e 's/\xef\xbb\xbf//g'
And the :set nobomb thing in vim only removes the very first one in the file.
I also tried this:
    LANG=C vim b
Then vim doesn't show them, but they are still there (even after a write...)
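To confirm how many of these sequences a piece of text contains before and after cleaning, a small pair of helpers can be used. This is only a sketch: it assumes a grep that supports `-o` (GNU and BSD grep do, but it is not required by POSIX), and the names `FEFF`, `count_feff` and `strip_all_feff` are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for U+FEFF sequences anywhere in the input.
FEFF=$(printf '\357\273\277')   # UTF-8 encoding of U+FEFF: EF BB BF

# Print how many EF BB BF sequences stdin contains.
count_feff() {
    # grep -o prints every match on its own line; wc -l counts them.
    LC_ALL=C grep -o "$FEFF" | wc -l | tr -d ' '
}

# Copy stdin to stdout with every EF BB BF sequence removed.
strip_all_feff() {
    LC_ALL=C sed "s/${FEFF}//g"
}

# Usage: count_feff < query.txt; strip_all_feff < query.txt > clean.txt
```

Unlike the first-line-only sed command, `strip_all_feff` drops the `1` address and the `^` anchor, so it touches every occurrence on every line.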
answered Aug 4, 2020 at 22:15 by Wayne Walker
1
I had the same question and ended up writing a dedicated utility bom(1) for this. It's available here.
Here's the man page:

NAME
    bom -- Decode Unicode byte order mark

SYNOPSIS
    bom --strip [--expect types] [--lenient] [--prefer32] [--utf8] [file]
    bom --detect [--expect types] [--prefer32] [file]
    bom --print type
    bom --list
    bom --help
    bom --version

DESCRIPTION
    bom decodes, verifies, reports, and/or strips the byte order mark (BOM) at the
    start of the specified file, if any.

    When no file is specified, or when file is -, read standard input.

OPTIONS
    -d, --detect
        Report the detected BOM type to standard output and then exit.
        See SUPPORTED BOM TYPES for possible values.

    -e, --expect types
        Expect to find one of the specified BOM types, otherwise exit with an
        error.
        Multiple types may be specified, separated by commas.
        Specifying NONE is acceptable and matches when the file has no
        (supported) BOM.

    -h, --help
        Output command line usage help.

    -l, --lenient
        Silently ignore any illegal byte sequences encountered when converting
        the remainder of the file to UTF-8.
        Without this flag, bom will exit immediately with an error if an
        illegal byte sequence is encountered.
        This flag has no effect unless the --utf8 flag is given.

    --list
        List the supported BOM types and exit.

    -p, --print type
        Output the byte sequence corresponding to the type byte order mark.

    --prefer32
        Used to disambiguate the byte sequence FF FE 00 00, which can be
        either a UTF-32LE BOM or a UTF-16LE BOM followed by a NUL character.
        Without this flag, UTF-16LE is assumed; with this flag, UTF-32LE is
        assumed.

    -s, --strip
        Strip the BOM, if any, from the beginning of the file and output the
        remainder of the file.

    -u, --utf8
        Convert the remainder of the file to UTF-8, assuming the character
        encoding implied by the detected BOM.
        For files with no (supported) BOM, this flag has no effect and the
        remainder of the file is copied unmodified.
        For files with a UTF-8 BOM, the identity transformation is still
        applied, so (for example) illegal byte sequences will be detected.

    -v, --version
        Output program version and exit.

SUPPORTED BOM TYPES
    The supported BOM types are:

    NONE      No supported BOM was detected.
    UTF-7     A UTF-7 BOM was detected.
    UTF-8     A UTF-8 BOM was detected.
    UTF-16BE  A UTF-16 (Big Endian) BOM was detected.
    UTF-16LE  A UTF-16 (Little Endian) BOM was detected.
    UTF-32BE  A UTF-32 (Big Endian) BOM was detected.
    UTF-32LE  A UTF-32 (Little Endian) BOM was detected.
    GB18030   A GB18030 (Chinese National Standard) BOM was detected.

EXAMPLES
    To tell what kind of byte order mark a file has:
        $ bom --detect

    To normalize files with byte order marks into UTF-8, and pass other files
    through unchanged:
        $ bom --strip --utf8

    Same as previous example, but discard illegal byte sequences instead of
    generating an error:
        $ bom --strip --utf8 --lenient

    To verify a properly encoded UTF-8 or UTF-16 file with a byte-order-mark and
    output it as UTF-8:
        $ bom --strip --utf8 --expect UTF-8,UTF-16LE,UTF-16BE

    To just remove any byte order mark and get on with your life:
        $ bom --strip file

RETURN VALUES
    bom exits with one of the following values:

    0  Success.
    1  A general error occurred.
    2  The --expect flag was given but the detected BOM did not match.
    3  An illegal byte sequence was detected (and --lenient was not
       specified).

SEE ALSO
    iconv(1)
    bom: Decode Unicode byte order mark, https://github.com/archiecobbs/bom.
answered Apr 6 at 19:08 by Archie
0
Recently I found this tiny command-line tool which adds or removes the BOM on arbitrary UTF-8 encoded files: UTF BOM Utils (new link at github)
One small drawback: only the plain C++ source code is available for download. You have to create the makefile (with CMake, for example) and compile it yourself; binaries are not provided on this page.
answered Oct 16, 2018 at 17:58 by Wernfried Domscheit
0
I know it's been a while, but since I had a slightly different issue, I'm posting so others may benefit.
My text file was randomly haunted by the characters \fe\ff. Luckily for me, they appeared at the start of lines, and the set of allowed characters is limited to alphanumeric.
The command below, run in vim, cuts a non-alphanumeric first character from every line, so use it with caution, as your set of allowed characters might vary.
    :%s/^[^a-zA-Z0-9]//g
edited Nov 10, 2021 at 11:07 by AdminBee
answered Nov 10, 2021 at 10:54 by Smirk