STR51-J. Use the charset encoder and decoder classes when more control over the encoding process is required
SEI CERT Oracle Coding Standard for Java, 3 Recommendations, Rec. 04. Characters and Strings (STR)

Created by Fred Long, last modified by Will Snavely on Nov 16, 2017

String objects in Java are encoded in UTF-16. The Java platform is required to support other character encodings, or charsets, such as US-ASCII, ISO-8859-1, and UTF-8. Errors may occur when converting between differently coded character data.

There are two general types of encoding errors. If the byte sequence is not valid for the specified charset, the input is considered malformed. If the byte sequence cannot be mapped to an equivalent character sequence, an unmappable character has been encountered.

According to the Java API [API 2014] for the String constructors:

    The behavior of this constructor when the given bytes are not valid in the given charset is unspecified.

Similarly, the description of the String.getBytes(Charset) method states:

    This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement byte array.

The CharsetEncoder class is used to transform character data into a sequence of bytes in a specific charset. The input character sequence is provided in a character buffer or a series of such buffers. The output byte sequence is written to a byte buffer or a series of such buffers.

The CharsetDecoder class reverses this process by transforming a sequence of bytes in a specific charset into character data.
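As a minimal sketch of the difference described above, the following example contrasts String.getBytes(Charset), which silently substitutes the charset's replacement bytes, with a CharsetEncoder configured to report errors. The class name StrictEncodingSketch, the helper encodeStrict, and the sample text are hypothetical and not part of the original guideline; setting CodingErrorAction.REPORT explicitly is a choice made here for clarity.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names and sample text are illustrative only.
final class StrictEncodingSketch {

    // Encodes text as US-ASCII and throws instead of substituting bytes.
    static byte[] encodeStrict(String text) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer out = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // the last character has no US-ASCII mapping

        // String.getBytes(Charset) replaces the unmappable character with '?'.
        byte[] lossy = text.getBytes(StandardCharsets.US_ASCII);
        System.out.println("getBytes: " + lossy.length + " bytes, last = "
                + (char) lossy[lossy.length - 1]);

        // The encoder surfaces the same problem as an exception.
        try {
            encodeStrict(text);
            System.out.println("encoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("encoder reported: " + e);
        }
    }
}
```

Reporting lets the caller decide whether to reject the input, log it, or fall back to a different charset, instead of discovering the substitution later as corrupted data.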
For a decoder, the input byte sequence is provided in a byte buffer or a series of such buffers, while the output character sequence is written to a character buffer or a series of such buffers.

Special care should be taken when decoding untrusted byte data to ensure that malformed-input or unmappable-character errors do not result in defects and vulnerabilities. Encoding errors can also occur; for example, encoding a cryptographic key that contains malformed input for transmission will result in an error. Encoding and decoding errors typically result in data corruption.

Noncompliant Code Example

This noncompliant code example is similar to the one used in STR03-J. Do not represent numeric data as strings, in that it attempts to convert a byte array containing the two's-complement representation of a BigInteger value to a String. Because the byte array contains malformed-input sequences, the behavior of the String constructor is unspecified.

    import java.math.BigInteger;

    public class CharsetConversion {
      public static void main(String[] args) {
        BigInteger x = new BigInteger("530500452766");
        byte[] byteArray = x.toByteArray();
        // Noncompliant: decodes arbitrary bytes using the platform default charset
        String s = new String(byteArray);
        System.out.println(s);
      }
    }

Compliant Solution

The java.nio.charset.CharsetEncoder and java.nio.charset.CharsetDecoder classes provide greater control over the process. In this compliant solution, the CharsetDecoder.decode() method is used to convert the byte array containing the two's-complement representation of the BigInteger value to a CharBuffer. Because the bytes do not represent valid UTF-16, the input is considered malformed, and a MalformedInputException is thrown.
    import java.math.BigInteger;
    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.MalformedInputException;
    import java.nio.charset.StandardCharsets;
    import java.nio.charset.UnmappableCharacterException;

    public class CharsetConversion {
      public static void main(String[] args) {
        CharBuffer charBuffer;
        // A newly created decoder reports malformed input and unmappable characters
        CharsetDecoder decoder = StandardCharsets.UTF_16.newDecoder();
        BigInteger x = new BigInteger("530500452766");
        byte[] byteArray = x.toByteArray();
        ByteBuffer byteBuffer = ByteBuffer.wrap(byteArray);
        try {
          charBuffer = decoder.decode(byteBuffer);
          String s = charBuffer.toString();
          System.out.println(s);
        } catch (IllegalStateException e) {
          e.printStackTrace();
        } catch (MalformedInputException e) {
          // Thrown here: the five input bytes are not valid UTF-16
          e.printStackTrace();
        } catch (UnmappableCharacterException e) {
          e.printStackTrace();
        } catch (CharacterCodingException e) {
          e.printStackTrace();
        }
      }
    }

Risk Assessment

Malformed input or unmappable character errors can result in a loss of data integrity.

    Rule      Severity   Likelihood   Remediation Cost   Priority   Level
    STR05-J   low        unlikely     medium             P2         L3

Related Guidelines

MITRE CWE
    CWE-838. Inappropriate Encoding for Output Context
    CWE-116. Improper Encoding or Escaping of Output

Bibliography

[API 2006] Class String

11 Comments

Thomas Hawtin
The compliant solution implicitly uses the platform default character encoding twice. Generally it is better to specify an encoding explicitly (even if you do want the platform default). This also goes for the likes of locale, time zone, etc. In this particular example, the platform default character encoding may not contain Latin characters, and their byte representation is arbitrary even if it does.
Aug 19, 2009

Fred Long
Oops, yes, that's exactly what FIO03-J. "Specify the character encoding while performing file or network IO" says! Thanks.
Aug 19, 2009

Dhruv Mohindra
Instead of using "Some Arbitrary String" you can use the same BigInteger as the NCE and use the toString() method to convert it to a String and then do the byte array stuff. The CS can then actually address the problem described in the NCE, rather than doing something which is not quite related to the NCE.
EDIT: I've incorporated this change.
Sep 20, 2009

Yozo TODA
In the NCCE, the conversion from byte array to String (new String(byteArray)) depends on the default charset. For example, the NCCE works OK on my PC (Windows 7 64-bit, jdk-7-fcs-b147-x64), where the default charset is windows-31j. How about changing new String(byteArray) to new String(byteArray, "US-ASCII")?
Jul 27, 2011

David Svoboda
Well, that would make the noncompliant solution less noncompliant, wouldn't it? Actually, I got different results when I ran the program, so I added those results in, along with the fact that the default encoding affects the results.
Jul 27, 2011

Yozo TODA
Using String(byteArray, "US-ASCII"), I got 529342807871 as a converted-back BigInteger, which is clearly environment-dependent (-:
Java SE API 6 (and 7) says:

    The behavior of this constructor when the given bytes are not valid in the default charset is unspecified.

In the following text, the value of s looks like garbage in my browser.

    When run on a platform where the default character encoding is US-ASCII, the string s gets the value {{{ÂJÂÂ}}, because some of the characters are unprintable. When converted back to a BigInteger, x gets the value 149830058370101340468658109.

So, how about replacing that paragraph with the following text?

    When run on a platform where the default character encoding is US-ASCII, the string s includes some unprintable characters. When converted back to a BigInteger, x gets the value 149830058370101340468658109.

Jul 27, 2011

A Bishop
I've hit this problem twice in 10 years; I was expecting it to be higher than 'unlikely'. One instance was storing the binary values of encrypted passwords as Strings in the DB, and then wondering why people were complaining about being locked out when we shifted DB OS.
Jul 11, 2014

Robert Seacord
Just to clarify, the "unlikely" means how likely it is that a flaw introduced by violating the rule could lead to an exploitable vulnerability (see Priority and Levels). It is not meant to indicate how common the defect is. This information is used to prioritize repairs to the code. Let me know if you still believe this should be changed. Your example above seemed exceptionally secure. 8^)
Jul 13, 2014

A Bishop
OK, agreed on the probability. Unfortunately it wasn't an 'example'; it was a real-life project for a government agency.
Jul 13, 2014

Robert Seacord
I'm very confused by the first example. As far as I can tell, this example has nothing to do with specifying a valid character encoding. The problem is that in the NCE, the BigInteger is converted to binary, and in the CS the BigInteger is converted to a String. Perhaps there are two separate rules here? The rule that goes with this example is probably "Don't represent binary values as Strings". I would think this would be pretty obvious, but A Bishop has seen it twice in 10 years. I do sort of like his example, if we could code it up.
Jan 01, 2015

Robert Seacord
AFAICT, the NCE and CS perform the same actions, as the default behavior of CharsetEncoder seems to be the same as or similar to getBytes(). Right now I'm thinking this would be best as a guideline which says something like "Use the CharsetEncoder class when more control over the encoding process is required."
Jan 02, 2015
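The lossy round trip reported in the comments can be reproduced with a short sketch. It assumes an explicit US-ASCII charset rather than the platform default, and uses the String(byte[], Charset) constructor, which replaces invalid bytes. The Base64 variant is not part of the original guideline; it is shown only as one common way to carry binary data as text. The class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; names are illustrative only.
final class BinaryRoundTripSketch {

    // Lossy: treats arbitrary bytes as US-ASCII text, then encodes the text again.
    // Bytes outside the ASCII range are replaced on the way in and come back as '?'.
    static BigInteger viaAsciiString(BigInteger value) {
        String s = new String(value.toByteArray(), StandardCharsets.US_ASCII);
        return new BigInteger(s.getBytes(StandardCharsets.US_ASCII));
    }

    // Lossless: Base64 maps every byte sequence to ASCII text and back.
    static BigInteger viaBase64(BigInteger value) {
        String s = Base64.getEncoder().encodeToString(value.toByteArray());
        return new BigInteger(Base64.getDecoder().decode(s));
    }

    public static void main(String[] args) {
        BigInteger x = new BigInteger("530500452766");
        System.out.println(viaAsciiString(x)); // prints 529342807871
        System.out.println(viaBase64(x));      // prints 530500452766
    }
}
```

The first result matches the value quoted in the thread for the "US-ASCII" conversion: three of the five bytes fall outside the ASCII range and are replaced, so the original number cannot be recovered.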