Read UTF-8 files correctly with PowerShell - Stack Overflow
The situation is as follows:

- A PowerShell script creates a file with UTF-8 encoding.
- The user may or may not edit the file, possibly losing the BOM (but keeping the encoding as UTF-8) and possibly changing the line separators.
- The same PowerShell script reads the file, adds some more content and writes it all back to the same file as UTF-8.
- This can be iterated many times.

With Get-Content and Out-File -Encoding UTF8 I have problems reading it correctly. It stumbles over the BOM it has written before (putting it in the content, which breaks my parsing regex), does not use UTF-8 encoding, and even deletes line breaks in the original content part.

I need a function that can read any file with UTF-8 encoding, ignore and delete the BOM, and not modify the content. What should I use?

Update: I have added a little test script that shows what I'm trying to do and what happens instead.
    # Read data if exists
    $data = ""
    $startRev = 1;
    if (Test-Path test.txt)
    {
        $data = Get-Content -Path test.txt
        if ($data -match "^[0-9-]{10}-r([0-9]+)")
        {
            $startRev = [int]$matches[1] + 1
        }
    }
    Write-Host Next revision is $startRev

    # Define example data to add
    $startRev = $startRev + 10
    $newMsgs = "2014-04-01-r" + $startRev + "`r`n`r`n" + `
        "Line 1`r`n" + `
        "Line 2`r`n`r`n"

    # Write new data back
    $data = $newMsgs + $data
    $data | Out-File test.txt -Encoding UTF8

After running it a few times, new sections should be added to the beginning of the file, the existing content should not be altered in any way (currently it loses line breaks), and no additional newlines should be added at the end of the file (this seems to happen sometimes).

Instead, the second run gives me an error.

Tags: powershell, encoding, utf-8
Asked Apr 1, 2014 at 14:49 by ygoe (edited Apr 1, 2014 at 15:13)

Comments:

I'm not great with the whole encoding topic, but wouldn't you have to re-inject the BOM, if it gets removed, in order to read it properly? I'm a little confused by the question. Why do you want to remove the UTF-8 BOM? – user189198, Apr 1, 2014 at 14:54

My text editor is stupid and removes it. Anyway, you never know what text editors do with UTF-8 files. My script should simply be smart enough to handle it. The StreamReader class, for example, does it pretty well. – ygoe, Apr 1, 2014 at 15:05

3 Answers

Answer by JPBlanc (answered Apr 1, 2014 at 16:20)

If the file is supposed to be UTF-8, why don't you try to read it decoding UTF-8:

    Get-Content -Path test.txt -Encoding UTF8

Comments:

Because, according to the official documentation, this parameter doesn't even exist? How could I know about it? I'll give it a try. – ygoe, Apr 2, 2014 at 8:47

Sorry, 5 years later I don't know that anymore. I haven't used PS much in a while.
– ygoe, Jan 23, 2019 at 20:00

The parameter has existed since at least PowerShell 3.0. – phuclv, Apr 6, 2019 at 3:15

Answer by TheMadTechnician (answered Apr 1, 2014 at 18:19)

Really, JPBlanc is right. If you want it read as UTF-8, then specify that when the file is read.

On a side note, you're losing formatting here with the [String] + [String] stuff. Not to mention your regex match doesn't work. Check out the regex search changes, the changes made to $newMsgs, and the way I'm outputting your data to the file.

    # Read data if exists
    $data = ""
    $startRev = 1;
    if (Test-Path test.txt)
    {
        $data = Get-Content -Path test.txt #-Encoding UTF8
        if ($data -match "\br([0-9]+)\b") {
            $startRev = [int]([regex]::Match($data, "\br([0-9]+)\b")).groups[1].value + 1
        }
    }
    Write-Host Next revision is $startRev

    # Define example data to add
    $startRev = $startRev + 10
    $newMsgs = @"
    2014-04-01-r$startRev`r`n`r`n
    Line 1`r`n
    Line 2`r`n`r`n
    "@

    # Write new data back
    $newmsgs, $data | Out-File test.txt -Encoding UTF8

Comments:

That improved it. The regex itself was good, just not how I used it. I found that somewhere else... Isn't there a way without duplicating the regex string? Also, what does the comma in the last command do? I see lots of additional newlines added at the end initially. – ygoe, Apr 2, 2014 at 8:56

Found it, must be an array. Unfortunately the empty $data for the first run causes extra lines. And why does the + operator of two strings change their actual content? That's new to me in any programming language. – ygoe, Apr 2, 2014 at 9:16

Okay, it's Get-Content's fault. It gives me an array of lines, not a single multiline string. That causes all sorts of chaos. I've switched to [System.IO.File]::ReadAllText() and [System.IO.File]::WriteAllText() and now I get much more predictable results. – ygoe, Apr 2, 2014 at 9:32

Get-Content -Raw gives you the single multiline string you're looking for.
– Polymorphix, Jun 7, 2017 at 8:20

Answer by EmilG (answered Feb 24, 2017 at 8:57, edited Feb 24, 2017 at 9:04)

Get-Content doesn't seem to handle UTF-8 files without a BOM at all (if you omit the -Encoding parameter). System.IO.File.ReadLines seems to be an alternative. Examples:

    PS C:\temp\powershellutf8> $a = Get-Content .\utf8wobom.txt
    PS C:\temp\powershellutf8> $b = Get-Content .\utf8wbom.txt
    PS C:\temp\powershellutf8> $a2 = Get-Content .\utf8wbom.txt -Encoding UTF8
    PS C:\temp\powershellutf8> $a
    ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ   <== This doesn't seem to be right at all
    PS C:\temp\powershellutf8> $b
    ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ
    PS C:\temp\powershellutf8> $a2
    ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ
    PS C:\temp\powershellutf8>
    PS C:\temp\powershellutf8> $c = [IO.File]::ReadLines('.\utf8wbom.txt');
    PS C:\temp\powershellutf8> $c
    ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ
    PS C:\temp\powershellutf8> $d = [IO.File]::ReadLines('.\utf8wobom.txt');
    PS C:\temp\powershellutf8> $d
    ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ   <== Works!
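ygoe's comments on the second answer ask why `+` changes the content and conclude that Get-Content returns an array of lines. The sketch below illustrates both effects, using a hard-coded array as an assumed stand-in for what Get-Content returns for a three-line file (no file I/O involved). Adding an array to a string joins the elements with `$OFS`, a single space by default, which is where the line breaks go. Applying `-match` to a collection filters it and leaves `$Matches` unset, which may be what produces the error on the question's second run (indexing into a null `$Matches`).

```powershell
# Assumed stand-in for Get-Content output on a three-line file: an array of strings.
$lines = @('2014-04-01-r11', '', 'Line 1')

# String + array: the array is converted to ONE string joined with $OFS
# (a space by default), so the original line breaks disappear.
$joined = "new`r`n" + $lines
Write-Host ($joined -replace "`r`n", '<CRLF>')

# -match on a collection returns the matching elements and does not populate $Matches.
$Matches = $null
$hits = $lines -match '^[0-9-]{10}-r([0-9]+)'
$matchesSet = $null -ne $Matches
Write-Host "Matching lines: $(@($hits).Count); `$Matches populated: $matchesSet"

# Joining the lines back into one string restores the scalar behaviour.
if (($lines -join "`r`n") -match '^[0-9-]{10}-r([0-9]+)') {
    Write-Host "Newest revision: $($Matches[1])"
}
```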
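Polymorphix's comment points to `Get-Content -Raw`. A minimal sketch of the question's read-prepend-write cycle built on it follows, assuming PowerShell 5.0 or later (`-Raw` arrived in 3.0, `-NoNewline` in 5.0); the function name `Add-TextToTop` is hypothetical. `-NoNewline` targets the extra line break that Set-Content and Out-File otherwise append on every run. One caveat: in Windows PowerShell, `-Encoding UTF8` writes a BOM, whereas PowerShell 6 and later write none with that value.

```powershell
# Hypothetical helper: prepend $Text to a UTF-8 file (with or without a BOM)
# without touching the existing content. Assumes PowerShell 5.0+.
function Add-TextToTop {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Text
    )

    $data = ''
    if (Test-Path -LiteralPath $Path) {
        # -Raw returns the whole file as a single string, so line breaks survive.
        # -Encoding UTF8 decodes as UTF-8 whether or not a BOM is present;
        # a BOM, if present, is not part of the returned string.
        # [string] turns a $null result (empty file) into ''.
        $data = [string](Get-Content -LiteralPath $Path -Raw -Encoding UTF8)
    }

    # -NoNewline stops Set-Content from appending a line break on every run.
    Set-Content -LiteralPath $Path -Value ($Text + $data) -Encoding UTF8 -NoNewline
}

# Usage with the question's example data:
# Add-TextToTop -Path .\test.txt -Text "2014-04-01-r12`r`n`r`nLine 1`r`nLine 2`r`n`r`n"
```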
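ygoe's last comment on the second answer reports switching to `[System.IO.File]::ReadAllText()` and `WriteAllText()`. Below is a sketch of the question's whole cycle on that basis; the function name `Add-Revision`, the fixed default date, and consecutive numbering with the newest revision on top are assumptions, not the asker's code. `ReadAllText` with an explicit UTF-8 encoding still detects a BOM and leaves it out of the returned string, and passing `UTF8Encoding($false)` makes `WriteAllText` write no BOM. Because .NET resolves relative paths against the process working directory rather than PowerShell's current location, the path is resolved first.

```powershell
# Hypothetical helper built on the .NET File APIs mentioned in the comments.
function Add-Revision {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [string] $Date = '2014-04-01'
    )

    # .NET resolves relative paths against the process working directory,
    # not PowerShell's current location, so resolve the path first.
    $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)

    $data = ''
    $startRev = 1
    if (Test-Path -LiteralPath $fullPath) {
        # One string with line breaks untouched; a UTF-8 BOM is detected and not returned.
        $data = [System.IO.File]::ReadAllText($fullPath, [System.Text.Encoding]::UTF8)
        # (?m) lets ^ match at every line start; the first hit is the newest revision.
        if ($data -match '(?m)^[0-9-]{10}-r([0-9]+)') {
            $startRev = [int]$Matches[1] + 1
        }
    }

    $newMsgs = "$Date-r$startRev`r`n`r`nLine 1`r`nLine 2`r`n`r`n"

    # UTF8Encoding($false) writes UTF-8 without a BOM (pass $true to write one).
    $utf8NoBom = New-Object System.Text.UTF8Encoding($false)
    [System.IO.File]::WriteAllText($fullPath, $newMsgs + $data, $utf8NoBom)

    # Output the revision number that was written.
    $startRev
}
```

Each call outputs the revision number it wrote; for valid UTF-8 input, the previous text is written back unchanged apart from any leading BOM.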