Oracle OCI changing invalid UTF-8 characters to U+FFFD
I am writing a C++ data conversion program which copies data from an ODBC data source into an Oracle database. I chose C++ (with array operations) because of the very high volume of data to move (billions of rows).

Now, the text columns are "supposed" to be UTF-8, but this is not always the case. When it's not, I still want to copy the invalid raw bytes into Oracle; we will clean them up later. The column is a simple VARCHAR2(100), so 100 bytes long. But Oracle appears to be attempting some sort of UTF-8 parsing/processing on the data.
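Since the source text is only "supposed" to be UTF-8, one option is to check each value before it is bound for the insert, so that rows needing later cleanup can be flagged without touching their bytes. The following is a minimal C++ sketch, assuming the values arrive as `std::string` buffers; the name `is_valid_utf8` is hypothetical and not part of OCI or ODBC. It follows the well-formed byte ranges of RFC 3629.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8 per RFC 3629: no overlong forms,
// no surrogate code points (U+D800..U+DFFF), nothing above U+10FFFF, and no
// multi-byte sequence cut off at the end of the buffer.
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the 2nd byte
        if (b0 < 0x80) { ++i; continue; }                // ASCII
        if (b0 >= 0xC2 && b0 <= 0xDF) len = 2;
        else if (b0 == 0xE0) { len = 3; lo = 0xA0; }      // no overlongs
        else if (b0 >= 0xE1 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xED) hi = 0x9F;                    // no surrogates
        }
        else if (b0 == 0xF0) { len = 4; lo = 0x90; }      // no overlongs
        else if (b0 >= 0xF1 && b0 <= 0xF3) len = 4;
        else if (b0 == 0xF4) { len = 4; hi = 0x8F; }      // max U+10FFFF
        else return false;  // 80..C1 and F5..FF never start a sequence
        if (s.size() - i < len) return false;             // truncated at end
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            const unsigned char lo_k = (k == 1) ? lo : 0x80;
            const unsigned char hi_k = (k == 1) ? hi : 0xBF;
            if (b < lo_k || b > hi_k) return false;
        }
        i += len;
    }
    return true;
}
```

For instance, `is_valid_utf8("FFT\xF0")` returns false because the final lead byte has no continuation bytes. Flagging rather than rewriting keeps the raw bytes intact, as the question asks; whether Oracle then accepts them unchanged still depends on the client character set set through NLS_LANG.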
Forexamplethefollowingstring(hasbeentruncatedto100bytes,thusinvalid): HexBytes:464654F09F9884F09F9888F09F9894F09F9885F09F9890F09F9888F09F9894F09F9888F09F9885F09F9894F09F9886F09F9894F09F9885F09F9890F09F9890F09F9886F09F9890F09F9890F09F9887F09F9890F09F9892F09F9888F09F989AF09F9888F0 http://tinyurl.com/nhhkf62 Isactuallybeinginsertedintothedatabaseas: HexBytes:464654EFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBDEFBFBD http://tinyurl.com/orkv6z6 Whichisbasicallytheleading3asciicharsfollowedbytheUTF-8encodingofU+FFFDforeachofthesubsequentbytes. Otherdetails: OracleVersion:11gEnterpriseEditionRelease11.2.0.1.0 OracleClient:oracle-instantclient11.2-basic-11.2.0.3.0-1 OracleOCIrpm:oracle-instantclient11.2-devel-11.2.0.3.0-1 Environment:LANG=en_US.UTF-8 Environment:NLS_CHARACTERSET=AMERICAN_AMERICA.UTF8 Environment:NLS_LANG=AMERICAN.UTF8 SodoesanyoneknowwhyOracleand/orOCIismodifyingthisdata?Andisthereawaytostopitfromhappenning? Thanks c++oracleunicodeutf-8oracle-call-interface Share Follow askedOct30,2013at1:27 SodvedSodved 8,26811goldbadge2929silverbadges4242bronzebadges 5 1 Whatisthedatabasecharacterset?Ifyouwanttostorethebyteswithoutworryingaboutcharactersetconversiontakingplace,youreallyoughttostorethedatainaRAW(100)column,notVARCHAR2(100).Isthatanoption? 
– Justin Cave Oct 30, 2013 at 3:02

They are not invalid UTF-8 characters but bytes that do not constitute a valid UTF-8 representation of anything. It is correct, by the Unicode standard, to replace such data by U+FFFD REPLACEMENT CHARACTER when reading data as UTF-8. So if you don't want that, you must not read it that way, but as raw binary data.
– Jukka K. Korpela Oct 30, 2013 at 5:05

Unfortunately I don't think RAW is an option; it would put things elsewhere at risk. The problem is not on read, it's on insert. The bytes in the "before" are dumped from the memory buffer right before the OCIStmtExecute of the insert. The bytes in the "after" are extracted from the DB using dump(colname,16).
– Sodved Oct 30, 2013 at 5:51

Oracle, like all quality databases, is very careful about data integrity. A Unicode string must be in Unicode, for instance. Your idea to store raw bytes (invalid Unicode) in a Unicode string is an explicit violation of that data integrity. If you don't want data integrity, don't use a database.
– MSalters Oct 30, 2013 at 8:40

The character set in the DB is AL32UTF8. Your points are very valid, but I have my requirements. I have not seen this implicit conversion happen before when using a Perl client? Maybe I am messing something else up somewhere...
– Sodved Oct 30, 2013 at 23:49

1 Answer

NLS_LANG is what matters most for implicit character conversion. I think it should be NLS_LANG=AMERICAN_AMERICA.UTF8 instead of NLS_LANG=AMERICAN.UTF8.

What is your database character set?

answered Oct 30, 2013 at 6:07 by HAL9000 (edited Oct 30, 2013 at 6:30)

Close, I had to match the character set of the database exactly. So I set NLS_LANG=AMERICAN.AL32UTF8 and it all worked. Thanks for pointing me in the right direction.
– Sodved Oct 31, 2013 at 0:29

Ah, OK. AL32UTF8 is the newer version, allowing for more (all?) characters.
– HAL9000 Oct 31, 2013 at 4:44
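The bytes in the question fit the NLS_LANG explanation. The input is well-formed UTF-8 except for its last byte: after `FFT` come 24 four-byte sequences (F0 9F 98 xx, emoji U+1F604 to U+1F61A) and a lone F0. Oracle's legacy `UTF8` character set, which NLS_LANG=AMERICAN.UTF8 declared for the client, has no four-byte forms (it stores supplementary characters as pairs of three-byte surrogate sequences), while `AL32UTF8` accepts them. If every byte that cannot start an acceptable sequence is replaced by U+FFFD on its own, the 97 bytes after `FFT` become 97 copies of EF BF BD, which is exactly the "after" dump. The C++ sketch below models that assumption; it is not Oracle's conversion code, and `utf8_seq_len`, `replace_invalid`, `from_hex` and `to_hex` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of the well-formed UTF-8 sequence starting at in[i], or 0 if the
// bytes there are ill-formed (bad lead byte, bad continuation, overlong,
// surrogate, above U+10FFFF, or cut off by the end of the buffer).
std::size_t utf8_seq_len(const std::string& in, std::size_t i) {
    const unsigned char b0 = static_cast<unsigned char>(in[i]);
    std::size_t len = 0;
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the second byte
    if (b0 < 0x80) return 1;
    if (b0 >= 0xC2 && b0 <= 0xDF) len = 2;
    else if (b0 == 0xE0) { len = 3; lo = 0xA0; }
    else if (b0 >= 0xE1 && b0 <= 0xEF) { len = 3; if (b0 == 0xED) hi = 0x9F; }
    else if (b0 == 0xF0) { len = 4; lo = 0x90; }
    else if (b0 >= 0xF1 && b0 <= 0xF3) len = 4;
    else if (b0 == 0xF4) { len = 4; hi = 0x8F; }
    else return 0;
    if (in.size() - i < len) return 0;
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(in[i + k]);
        if (b < (k == 1 ? lo : 0x80) || b > (k == 1 ? hi : 0xBF)) return 0;
    }
    return len;
}

// Copies well-formed sequences of at most `max_len` bytes and replaces every
// other byte, one at a time, with U+FFFD (EF BF BD). max_len = 3 models a
// character set without four-byte forms; max_len = 4 models full UTF-8.
std::string replace_invalid(const std::string& in, std::size_t max_len) {
    static const std::string kReplacement = "\xEF\xBF\xBD";
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const std::size_t len = utf8_seq_len(in, i);
        if (len != 0 && len <= max_len) { out.append(in, i, len); i += len; }
        else { out += kReplacement; ++i; }
    }
    return out;
}

// Parses a hex dump such as "464654F0" into raw bytes.
std::string from_hex(const std::string& hex) {
    assert(hex.size() % 2 == 0);  // two hex digits per byte
    std::string out;
    for (std::size_t i = 0; i < hex.size(); i += 2)
        out.push_back(static_cast<char>(std::stoi(hex.substr(i, 2), nullptr, 16)));
    return out;
}

// Formats raw bytes as uppercase hex, like the "Hex bytes" lines above.
std::string to_hex(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes) {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0F]);
    }
    return out;
}
```

Under this model, `to_hex(replace_invalid(from_hex(input), 3))` reproduces the "after" dump, while `max_len = 4` keeps all 24 emoji and replaces only the trailing F0. When client and database character sets match, as with NLS_LANG=AMERICAN.AL32UTF8, Oracle is commonly reported to skip conversion altogether, which would leave even that F0 untouched; the sketch does not model that case.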
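In the example, the value became invalid only because the 100-byte cut landed inside a four-byte sequence, leaving the lone trailing F0. Where the conversion program does the truncation itself (an assumption; the question does not say where the string was cut), one option is to cut at a character boundary instead. A minimal C++ sketch, assuming the input is otherwise well-formed UTF-8; `truncate_utf8` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Cuts `s` to at most `max_bytes` bytes without splitting a UTF-8 sequence.
// If the first byte to be dropped is a continuation byte (10xxxxxx), the
// sequence it belongs to started before the limit, so back up to that
// sequence's lead byte and drop the whole partial character.
std::string truncate_utf8(const std::string& s, std::size_t max_bytes) {
    if (s.size() <= max_bytes) return s;
    std::size_t cut = max_bytes;
    while (cut > 0 && (static_cast<unsigned char>(s[cut]) & 0xC0) == 0x80)
        --cut;
    return s.substr(0, cut);
}
```

If the untruncated value continued with the rest of that four-byte sequence, a 100-byte limit would keep 99 bytes (`FFT` plus the 24 complete sequences) instead of storing a dangling lead byte. This still changes the data, so it only fits where dropping a partial trailing character is acceptable.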