How do I encode/decode UTF-16LE byte arrays with a BOM?


Asked 13 years, 4 months ago. Modified 5 years, 1 month ago. Viewed 39k times.

I need to encode/decode UTF-16 byte arrays to and from java.lang.String. The byte arrays are given to me with a Byte Order Mark (BOM), and I need to encode byte arrays with a BOM.

Also, because I'm dealing with a Microsoft client/server, I'd like to emit the encoding in little endian (along with the LE BOM) to avoid any misunderstandings. I do realize that with the BOM it should work big endian, but I don't want to swim upstream in the Windows world.

As an example, here is a method which encodes a java.lang.String as UTF-16 in little endian with a BOM:

    public static byte[] encodeString(String message) {
        byte[] tmp = null;
        try {
            tmp = message.getBytes("UTF-16LE");
        } catch (UnsupportedEncodingException e) {
            // should not be possible
            AssertionError ae = new AssertionError("Could not encode UTF-16LE");
            ae.initCause(e);
            throw ae;
        }

        // use brute force method to add BOM
        byte[] utf16lemessage = new byte[2 + tmp.length];
        utf16lemessage[0] = (byte) 0xFF;
        utf16lemessage[1] = (byte) 0xFE;
        System.arraycopy(tmp, 0, utf16lemessage, 2, tmp.length);
        return utf16lemessage;
    }

What is the best way to do this in Java? Ideally I'd like to avoid copying the entire byte array into a new byte array that has two extra bytes allocated at the beginning.
The same goes for decoding such a string, but that's much more straightforward by using the java.lang.String constructor:

    public String(byte[] bytes, int offset, int length, String charsetName)

Tags: java, unicode, utf-16, byte-order-mark

asked May 18, 2009 at 19:55 by Jared Oberhaus; edited May 18, 2009 at 20:27

5 Answers

The "UTF-16" charset name will always encode with a BOM and will decode data using either big/little endianness, but "UnicodeBig" and "UnicodeLittle" are useful for encoding in a specific byte order. Use UTF-16LE or UTF-16BE for no BOM - see this post for how to use "\uFEFF" to handle BOMs manually. See here for canonical naming of charset string names or (preferably) the Charset class. Also take note that only a limited subset of encodings is absolutely required to be supported.

answered May 18, 2009 at 20:08 by McDowell

Comments:

Thanks! One more issue though... Using "UTF-16" encodes the data as Big Endian, which I suspect will not go over well with Microsoft data (even though the BOM exists). Any way to encode UTF-16LE with BOM with Java? I'll update my question to reflect what I was really looking for...
– Jared Oberhaus May 18, 2009 at 20:14

Click on the "see this post" link he gave. Basically, you stuff a \uFEFF character at the beginning of your string, and then encode to UTF-16LE, and the result will have a proper BOM.
– Daniel Martin May 18, 2009 at 20:17

Use "UnicodeLittle" (assuming your JRE supports it - ("\uFEFF" + "my string").getBytes("UTF-16LE") otherwise). Though I would be surprised if Microsoft APIs expected a BOM but couldn't handle big-endian data - they tend to like using BOMs more than other platforms. Test with empty strings - you may get empty arrays if there is no data.
– McDowell May 18, 2009 at 20:22

I would be completely unsurprised at Microsoft defining a format where it expects a UTF-16LE BOM to begin a file and will not behave if the file begins with a UTF-8 BOM or a UTF-16BE BOM. I would be completely unsurprised because this is exactly the behavior I have observed with Excel loading CSV files - if the file begins with a UTF-16LE BOM, then it loads the data in UTF-16LE and expects tabs between columns. Any other character sequence and it loads data in some local character set with "," or ";" (locale-dependent!) between columns.
– Daniel Martin May 18, 2009 at 20:42

Just to reiterate: "UnicodeLittle" (a.k.a. "x-UTF-16LE-BOM") will write the file as UTF-16 little-endian with a BOM. This should be the preferred method for WRITING the files, but it only seems to be available since Java 6 (JDK 1.6). For READING, you should stick with "UTF-16".
– Alan Moore May 18, 2009 at 23:51

First off, for decoding you can use the character set "UTF-16"; that automatically detects an initial BOM. For encoding UTF-16BE, you can also use the "UTF-16" character set - that'll write a proper BOM and then output big endian stuff.

For encoding to little endian with a BOM, I don't think your current code is too bad, even with the double allocation (unless your strings are truly monstrous). What you might want to do if they are is not deal with a byte array but rather a java.nio ByteBuffer, and use the java.nio.charset.CharsetEncoder class (which you can get from Charset.forName("UTF-16LE").newEncoder()).

answered May 18, 2009 at 20:15 by Daniel Martin

This is how you do it in nio:

    return Charset.forName("UTF-16LE").encode(message)
            .put(0, (byte) 0xFF)
            .put(1, (byte) 0xFE)
            .array();

It is certainly supposed to be faster. I don't know how many arrays it makes under the covers, but my understanding of the point of the API is that it is supposed to minimize that.

answered May 18, 2009 at 23:09 by Yishai

Comments:

This one actually doesn't work. The put(0) and put(1) calls overwrite the first two bytes of the encoded message's ByteBuffer.
– hopia Aug 24, 2017 at 22:18

    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(string.length() * 2 + 2);
    byteArrayOutputStream.write(new byte[] {(byte) 0xFF, (byte) 0xFE});
    byteArrayOutputStream.write(string.getBytes("UTF-16LE"));
    return byteArrayOutputStream.toByteArray();

EDIT: Rereading your question, I see you would rather avoid the double array allocation altogether. Unfortunately the API doesn't give you that, as far as I know. (There was a method, but it is deprecated, and you can't specify encoding with it.)

I wrote the above before I saw your comment. I think the answer to use the nio classes is on the right track. I was looking at that, but I'm not familiar enough with the API to know offhand how you get that done.

answered May 18, 2009 at 20:09 by Yishai; edited May 18, 2009 at 20:36

Comments:

Thanks. In addition, what I would have liked here is to not allocate the entire byte array with string.getBytes("UTF-16LE") -- perhaps by wrapping the stream as an InputStream, which was the point of my earlier question: stackoverflow.com/questions/837703/…
– Jared Oberhaus May 18, 2009 at 20:21

Note that this code actually allocates arrays big enough for the String three times, since you have the internal array of the ByteArrayOutputStream, which is copied in the call to .toByteArray(). A way to get it back down to only allocating two is to wrap the ByteArrayOutputStream in an OutputStreamWriter and write the string to that. Then you still have the ByteArrayOutputStream's internal state and the copy made by .toByteArray(), but not the return value from .getBytes.
– Daniel Martin May 18, 2009 at 20:55

It seems that you are just exchanging a char array for a byte array if you do that, as the OutputStreamWriter delegates to the StreamEncoder class, which creates a char[] buffer to retrieve the String data. String is immutable, and the size of an array is invariable, so that copy seems unavoidable. I think nio is supposed to help with that double creation on the ByteArrayOutputStream.
– Yishai May 18, 2009 at 21:29
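For illustration, the OutputStreamWriter variant described in the comments above might look roughly like this. It is only a sketch, assuming Java 8 or later (for StandardCharsets and UncheckedIOException); the class and method names are made up for the example and do not come from the thread. Writing U+FEFF as the first character makes the UTF-16LE encoder emit the bytes FF FE, so the BOM needs no separate byte handling.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the name is an assumption, not part of the thread.
final class Utf16LeStreamEncoder {
    private Utf16LeStreamEncoder() {}

    static byte[] encodeWithBom(String message) {
        // Two bytes per UTF-16 code unit, plus two bytes for the BOM.
        ByteArrayOutputStream out = new ByteArrayOutputStream(message.length() * 2 + 2);
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_16LE)) {
            writer.write('\uFEFF'); // encoded little-endian as FF FE
            writer.write(message);
        } catch (IOException e) {
            // ByteArrayOutputStream does not fail, but the Writer API declares IOException.
            throw new UncheckedIOException(e);
        }
        // toByteArray() still copies the stream's internal buffer once.
        return out.toByteArray();
    }
}
```

As the comments point out, this avoids the array returned by getBytes but not the copy made by toByteArray(), so it trades one allocation for another rather than removing copies entirely.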
This is an old question, but still, I couldn't find an acceptable answer for my situation. Basically, Java doesn't have a built-in encoder for UTF-16LE with a BOM, so you have to roll your own implementation.

Here's what I ended up with:

    private byte[] encodeUTF16LEWithBOM(final String s) {
        ByteBuffer content = Charset.forName("UTF-16LE").encode(s);
        byte[] bom = {(byte) 0xff, (byte) 0xfe};
        // remaining() is the number of encoded bytes; capacity() is only guaranteed to be at least that
        return ByteBuffer.allocate(content.remaining() + bom.length).put(bom).put(content).array();
    }

answered Aug 24, 2017 at 22:17 by hopia
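If the goal is to avoid the intermediate array entirely, one possibility along the lines of the java.nio suggestion is to let a CharsetEncoder write directly into a ByteBuffer that already holds the BOM. The following is a sketch under stated assumptions, not a definitive implementation: the class and method names are hypothetical, it assumes Java 7 or later for StandardCharsets, and it relies on UTF-16LE producing exactly two bytes per Java char so the buffer can be sized up front.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper; the name is an assumption, not part of the thread.
final class Utf16LeDirectEncoder {
    private Utf16LeDirectEncoder() {}

    static byte[] encodeWithBom(String s) {
        // Room for the 2-byte BOM plus 2 bytes per UTF-16 code unit.
        ByteBuffer buffer = ByteBuffer.allocate(2 + 2 * s.length());
        buffer.put((byte) 0xFF).put((byte) 0xFE);

        CharsetEncoder encoder = StandardCharsets.UTF_16LE.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);

        // Encode the characters straight into the buffer, after the BOM.
        CoderResult result = encoder.encode(CharBuffer.wrap(s), buffer, true);
        if (result.isUnderflow()) {
            result = encoder.flush(buffer);
        }
        if (!result.isUnderflow()) {
            throw new IllegalStateException("Unexpected encoder result: " + result);
        }

        // Normally the buffer is exactly full and its backing array is returned as-is;
        // the copy is only a safety net in case fewer bytes were written.
        return buffer.hasRemaining()
                ? Arrays.copyOf(buffer.array(), buffer.position())
                : buffer.array();
    }
}
```

In the usual case this allocates the result array once and returns it without copying. The encoder may still use small internal buffers, so it reduces copies of the full message rather than guaranteeing none.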

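To tie the thread together, here is a small round-trip sketch of the two points most answers agree on: prefix the string with U+FEFF and encode as "UTF-16LE" when writing, and decode with "UTF-16" when reading so the BOM is detected and stripped. The class and method names are hypothetical, and StandardCharsets assumes Java 7 or later. Whether the "x-UTF-16LE-BOM" charset mentioned in the comments is present depends on the JRE, so the sketch only checks for it rather than relying on it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the name is an assumption, not part of the thread.
final class Utf16BomRoundTrip {
    private Utf16BomRoundTrip() {}

    // Prefixing U+FEFF makes the UTF-16LE encoder emit FF FE first.
    // Note that the concatenation creates a temporary String.
    static byte[] encode(String s) {
        return ("\uFEFF" + s).getBytes(StandardCharsets.UTF_16LE);
    }

    // "UTF-16" reads the BOM to pick the byte order and does not include it in the result.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    // Reports whether this JRE offers the charset named in the comments.
    static boolean hasBuiltInLeBomCharset() {
        return Charset.isSupported("x-UTF-16LE-BOM");
    }
}
```

For example, decode(encode("hi")) gives back "hi", and decode also accepts big-endian input that starts with FE FF.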

