How to write a check in Python to see if a file is valid UTF-8?
Question (score 22), asked Jul 16, 2010 at 22:33 by Jox, edited May 31 at 7:49 by malat. Tags: utf-8, python-2.x

As stated in the title, I would like to check whether a given file object (opened as a binary stream) is a valid UTF-8 file. Anyone? Thanks

Answer (score 30), by Daniel Stutzbach, Jul 16, 2010 at 22:53:

    def try_utf8(data):
        "Returns a Unicode object on success, or None on failure"
        try:
            return data.decode('utf-8')
        except UnicodeDecodeError:
            return None

    data = f.read()  # f: the file object from the question, opened in binary mode
    udata = try_utf8(data)
    if udata is None:
        # Not UTF-8. Do something else
        pass
    else:
        # Handle unicode data
        pass

Comment: Obviously I didn't do my homework well enough, when there is more than one solution as simple as this :( Thanks! – Jox, Jul 16, 2010 at 23:53

Answer (score 14), by michael, Jul 16, 2010 at 22:39:

You could do something like

    import codecs

    filename = "path/to/file"  # placeholder: the file to check

    try:
        # errors='strict' makes any invalid byte sequence raise UnicodeDecodeError
        with codecs.open(filename, encoding='utf-8', errors='strict') as f:
            for line in f:
                pass  # iterating forces every line to be decoded
        print("Valid utf-8")
    except UnicodeDecodeError:
        print("invalid utf-8")

Comment: Could be simpler by using only one line: codecs.open("path/to/file", encoding="utf-8", errors="strict").readlines() instead of three. – colidyre, May 7, 2019 at 19:06

Answer (score 0), by Vulwsztyn, Jun 13 at 17:22:

If anyone needs a script to find all non-UTF-8 files in the current directory:

    import os

    def try_utf8(data):
        try:
            return data.decode('utf-8')
        except UnicodeDecodeError:
            return None

    for root, _, files in os.walk('.'):
        # Skip Git metadata (this prefix check assumes '/' separators
        # and also matches e.g. ./.github)
        if root.startswith('./.git'):
            continue
        for file in files:
            # Compiled Python bytecode is binary, so skip it
            if file.endswith('.pyc'):
                continue
            path = os.path.join(root, file)
            with open(path, 'rb') as f:
                data = try_utf8(f.read())
                if data is None:
                    print(path)  # report every file that fails to decode
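The first answer reads the whole file into memory, and the second decodes it line by line, which can still load a huge chunk if the file has no newlines. A minimal sketch of a bounded-memory alternative, assuming Python 3 and a stream opened in binary mode: feed fixed-size chunks to the standard library's incremental UTF-8 decoder. The helper name `is_valid_utf8` and the 64 KiB chunk size are illustrative assumptions, not taken from any answer.

```python
import codecs
import io
from typing import BinaryIO


def is_valid_utf8(stream: BinaryIO, chunk_size: int = 64 * 1024) -> bool:
    """Return True if every byte read from `stream` is valid UTF-8.

    `is_valid_utf8` is a hypothetical helper; `stream` must be opened
    in binary mode ('rb').
    """
    # The incremental decoder buffers a multi-byte sequence that is split
    # across two chunks instead of reporting it as an error.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            decoder.decode(chunk)
        # final=True turns a sequence truncated at end of file into an error.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid_utf8(io.BytesIO("naïve café".encode("utf-8"))))  # True
    print(is_valid_utf8(io.BytesIO(b"\xff\xfe")))                   # False
```

With a real file this would be called as `with open(path, 'rb') as f: ok = is_valid_utf8(f)`. Note that a leading UTF-8 BOM (EF BB BF) is itself valid UTF-8, so such files pass the check.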
Related articles
- How to guess the encoding of a document?
Check for BOM markers. If the string begins with a BOM, the encoding can be extracted from the B...
- How can I detect if a file is binary (non-text) in Python? - Stack Overflow
- Is there a Linux command to find out if a file is UTF-8? - Super User
- How can Python check if a file name is in UTF8?
How can Python check if a file name is in UTF8? I have a PHP script that creates a list of files ...
- Unicode HOWTO - Python 3.10.7 documentation
If bytes are corrupted or lost, it's possible to determine the start of the next UTF-8-encoded co...
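The first related article suggests checking for a byte order mark (BOM) before guessing an encoding. A minimal sketch of that idea, assuming Python 3 and using the BOM constants from the standard `codecs` module; the helper name `detect_bom` and the table layout are illustrative assumptions. The 4-byte UTF-32 BOMs are tested first because the UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).

```python
import codecs
from typing import Optional

# Each codec name below strips the BOM when used to decode the data.
# UTF-32 entries come first: BOM_UTF32_LE starts with BOM_UTF16_LE.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def detect_bom(data: bytes) -> Optional[str]:
    """Return a codec name if `data` starts with a known BOM, else None.

    `detect_bom` is a hypothetical helper; a missing BOM says nothing
    about the encoding, so callers still need another check.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding
    return None


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + "hello".encode("utf-8")
    enc = detect_bom(sample)
    print(enc)                 # utf-8-sig
    print(sample.decode(enc))  # hello
```

A BOM is only a hint: most UTF-8 files have none, so a `None` result should fall through to an actual decode check.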
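The Unicode HOWTO excerpt refers to UTF-8 being self-synchronizing: every continuation byte has the bit pattern 10xxxxxx, so after a corrupted or lost byte a reader can skip forward to the next byte that is not a continuation byte. A minimal sketch of that skip, assuming the input is a Python 3 `bytes` object; `next_char_start` is a hypothetical name, and the function only locates a boundary without validating what follows it.

```python
def next_char_start(data: bytes, pos: int) -> int:
    """Return the index of the first byte at or after `pos` that is not a
    UTF-8 continuation byte (0b10xxxxxx), or len(data) if none remains.

    `next_char_start` is a hypothetical helper used for illustration.
    """
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1  # continuation byte: it cannot start a code point, skip it
    return pos


if __name__ == "__main__":
    encoded = "aé€b".encode("utf-8")  # 61 | C3 A9 | E2 82 AC | 62
    # Start in the middle of the 3-byte euro sign and resynchronize.
    start = next_char_start(encoded, 4)
    print(start)                            # 6
    print(encoded[start:].decode("utf-8"))  # b
```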