Understanding file encoding in VS Code and PowerShell

Article, 10/22/2021

When using VS Code to create and edit PowerShell scripts, it is important that your files are saved using the correct character encoding format.

What is file encoding and why is it important?

VS Code manages the interface between a human entering strings of characters into a buffer and reading/writing blocks of bytes to the filesystem. When VS Code saves a file, it uses a text encoding to decide what bytes each character becomes. For more information, see about_Character_Encoding.

Similarly, when PowerShell runs a script it must convert the bytes in a file to characters to reconstruct the file into a PowerShell program. Since VS Code writes the file and PowerShell reads the file, they need to use the same encoding system. This process of parsing a PowerShell script goes: bytes -> characters -> tokens -> abstract syntax tree -> execution.

Both VS Code and PowerShell are installed with a sensible default encoding configuration. However, the default encoding used by PowerShell has changed with the release of PowerShell 6. To ensure you have no problems using PowerShell or the PowerShell extension in VS Code, you need to configure your VS Code and PowerShell settings properly.

Common causes of encoding issues

Encoding problems occur when the encoding of VS Code or your script file does not match the expected encoding of PowerShell. There is no way for PowerShell to automatically determine the file encoding.
You're more likely to have encoding problems when you're using characters not in the 7-bit ASCII character set. For example:

- Extended non-letter characters like em-dash (—), non-breaking space (U+00A0) or left double quotation mark (“)
- Accented latin characters (É, ü)
- Non-latin characters like Cyrillic (Д, Ц)
- CJK characters (本, 화, が)

Common reasons for encoding issues are:

- The encodings of VS Code and PowerShell have not been changed from their defaults. For PowerShell 5.1 and below, the default encoding is different from VS Code's.
- Another editor has opened and overwritten the file in a new encoding. This often happens with the ISE.
- The file is checked into source control in an encoding that is different from what VS Code or PowerShell expects. This can happen when collaborators use editors with different encoding configurations.

How to tell when you have encoding issues

Often encoding errors present themselves as parse errors in scripts. If you find strange character sequences in your script, this can be the problem. In the example below, an en-dash (–) appears as the characters â€“:

    Send-MailMessage : A positional parameter cannot be found that accepts argument 'Testing FuseMail SMTP...'.
    At C:\Users\\\Development\PowerShell\Scripts\Send-EmailUsingSmtpRelay.ps1:6 char:1
    + Send-MailMessage â€“From $from â€“To $recipient1 â€“Subject $subject ...
    + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        + CategoryInfo          : InvalidArgument: (:) [Send-MailMessage], ParameterBindingException
        + FullyQualifiedErrorId : PositionalParameterNotFound,Microsoft.PowerShell.Commands.SendMailMessage

This problem occurs because VS Code encodes the character – in UTF-8 as the bytes 0xE2 0x80 0x93. When these bytes are decoded as Windows-1252, they are interpreted as the characters â€“ (â, €, and a left double quotation mark).

Some strange character sequences that you might see include:

- â€“ instead of – (en-dash; the last character is a left double quotation mark)
- â€” instead of — (em-dash; the last character is a right double quotation mark)
- Ã„ instead of Ä
- Â instead of a non-breaking space (the Â is followed by the non-breaking space itself)
- Ã© instead of é

This handy reference lists the common patterns that indicate a UTF-8/Windows-1252 encoding problem.
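The mismatch described above can be reproduced outside PowerShell. The following is a minimal sketch in Python, chosen only because it makes the bytes easy to inspect; the helper name `misread_as_windows_1252` is hypothetical and is not part of PowerShell, VS Code, or the extension. It encodes a character as UTF-8 and then decodes those same bytes as Windows-1252 (`cp1252` in Python).

```python
# Sketch: reproduce UTF-8 text being misread as Windows-1252.
# The function name is illustrative only.

def misread_as_windows_1252(text: str) -> str:
    """Encode text as UTF-8, then decode the same bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


EN_DASH = "\u2013"

# The three bytes VS Code writes for an en-dash when saving as UTF-8.
en_dash_bytes = EN_DASH.encode("utf-8")

# What a Windows-1252 reader makes of those three bytes.
garbled_en_dash = misread_as_windows_1252(EN_DASH)

if __name__ == "__main__":
    print(en_dash_bytes.hex(" "))
    # en-dash, em-dash, e with acute accent, non-breaking space
    for ch in ("\u2013", "\u2014", "\u00e9", "\u00a0"):
        print(repr(ch), "->", repr(misread_as_windows_1252(ch)))
```

Each non-ASCII character becomes two or three unrelated characters, while plain ASCII text passes through unchanged. That is why scripts containing only ASCII rarely show the problem.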
How the PowerShell extension in VS Code interacts with encodings

The PowerShell extension interacts with scripts in a number of ways:

- When scripts are edited in VS Code, the contents are sent by VS Code to the extension. The Language Server Protocol mandates that this content is transferred in UTF-8. Therefore, it is not possible for the extension to get the wrong encoding.
- When scripts are executed directly in the Integrated Console, they're read from the file by PowerShell directly. If PowerShell's encoding differs from VS Code's, something can go wrong here.
- When a script that is open in VS Code references another script that is not open in VS Code, the extension falls back to loading that script's content from the file system. The PowerShell extension defaults to UTF-8 encoding, but uses byte-order mark, or BOM, detection to select the correct encoding.

The problem occurs when assuming the encoding of BOM-less formats (like UTF-8 with no BOM and Windows-1252). The PowerShell extension defaults to UTF-8. The extension cannot change VS Code's encoding settings. For more information, see issue #824.

Choosing the right encoding

Different systems and applications can use different encodings:

- In .NET Standard, on the web, and in the Linux world, UTF-8 is now the dominant encoding.
- Many .NET Framework applications use UTF-16. For historical reasons, this is sometimes called "Unicode", a term that now refers to a broad standard that includes both UTF-8 and UTF-16.
- On Windows, many native applications that predate Unicode continue to use Windows-1252 by default.

Unicode encodings also have the concept of a byte-order mark (BOM). BOMs occur at the beginning of text to tell a decoder which encoding the text is using. For multi-byte encodings, the BOM also indicates endianness of the encoding. BOMs are designed to be bytes that rarely occur in non-Unicode text, allowing a reasonable guess that text is Unicode when a BOM is present.

BOMs are optional and their adoption isn't as popular in the Linux world because a dependable convention of UTF-8 is used everywhere. Most Linux applications presume that text input is encoded in UTF-8. While many Linux applications will recognize and correctly handle a BOM, a number do not, leading to artifacts in text manipulated with those applications.
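To make the "use the BOM if present, otherwise fall back to a default" idea concrete, here is a minimal sketch in Python. It is not the PowerShell extension's or PowerShell's actual implementation. The names `guess_encoding` and `decode_script` are hypothetical, and the `fallback` parameter stands in for whatever default a given reader assumes (UTF-8 or Windows-1252 in the cases discussed above).

```python
import codecs

# BOM byte sequences and the Python codec that consumes each one.
# UTF-32 is checked before UTF-16 because the UTF-32 LE BOM begins
# with the same two bytes as the UTF-16 LE BOM.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_encoding(data: bytes, fallback: str = "utf-8") -> str:
    """Return a codec name based on a leading BOM, else the fallback."""
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            return codec_name
    return fallback


def decode_script(data: bytes, fallback: str = "utf-8") -> str:
    """Decode file bytes, letting a BOM override the assumed default."""
    return data.decode(guess_encoding(data, fallback))
```

With a BOM, the result is the same whatever `fallback` is. Without one, the same bytes decode differently depending on the assumed default, which is the ambiguity the section describes for BOM-less files.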
Given these differences between platforms:

- If you work primarily with Windows applications and Windows PowerShell, you should prefer an encoding like UTF-8 with BOM or UTF-16.
- If you work across platforms, you should prefer UTF-8 with BOM.
- If you work mainly in Linux-associated contexts, you should prefer UTF-8 without BOM.
- Windows-1252 and latin-1 are essentially legacy encodings that you should avoid if possible. However, some older Windows applications may depend on them.
- It's also worth noting that script signing is encoding-dependent, meaning a change of encoding on a signed script will require resigning.

Configuring VS Code

VS Code's default encoding is UTF-8 without BOM.

To set VS Code's encoding, go to the VS Code settings (Ctrl+,) and set the "files.encoding" setting:

    "files.encoding": "utf8bom"

Some possible values are:

- utf8: UTF-8 without BOM
- utf8bom: UTF-8 with BOM
- utf16le: Little endian UTF-16
- utf16be: Big endian UTF-16
- windows1252: Windows-1252

You should get a dropdown for this in the GUI view, or completions for it in the JSON view.

You can also add the following to autodetect encoding when possible:

    "files.autoGuessEncoding": true

If you don't want these settings to affect all file types, VS Code also allows per-language configurations. Create a language-specific setting by putting settings in a field named for the language in square brackets. For example:

    "[powershell]": {
        "files.encoding": "utf8bom",
        "files.autoGuessEncoding": true
    }

You may also want to consider installing the Gremlins tracker for Visual Studio Code. This extension reveals certain Unicode characters that are easily corrupted because they are invisible or look like other normal characters.

Configuring PowerShell

PowerShell's default encoding varies depending on version:

- In PowerShell 6+, the default encoding is UTF-8 without BOM on all platforms.
- In Windows PowerShell, the default encoding is usually Windows-1252, an extension of latin-1, also known as ISO 8859-1.
In PowerShell 5+ you can find your default encoding with this:

    [psobject].Assembly.GetTypes() | Where-Object { $_.Name -eq 'ClrFacade' } |
      ForEach-Object {
        $_.GetMethod('GetDefaultEncoding', [System.Reflection.BindingFlags]'nonpublic,static').Invoke($null, @())
      }

The following script can be used to determine what encoding your PowerShell session infers for a script without a BOM.

    # 0xC3 0x80 is a single character in UTF-8 but two characters in Windows-1252
    $badBytes = [byte[]]@(0xC3, 0x80)
    $utf8Str = [System.Text.Encoding]::UTF8.GetString($badBytes)
    # Build a BOM-less script file: Write-Output "<the two bytes>"
    $bytes = [System.Text.Encoding]::ASCII.GetBytes('Write-Output "') + [byte[]]@(0xC3, 0x80) + [byte[]]@(0x22)
    $path = Join-Path ([System.IO.Path]::GetTempPath()) 'encodingtest.ps1'

    try
    {
        [System.IO.File]::WriteAllBytes($path, $bytes)

        # Run the script and compare its output with the UTF-8 interpretation
        switch (& $path)
        {
            $utf8Str
            {
                return 'UTF-8'
                break
            }

            default
            {
                return 'Windows-1252'
                break
            }
        }
    }
    finally
    {
        Remove-Item $path
    }

It's possible to configure PowerShell to use a given encoding more generally using profile settings. See the following articles:

- @mklement0's answer about PowerShell encoding on Stack Overflow.
- @rkeithhill's blog post about dealing with BOM-less UTF-8 input in PowerShell.

It's not possible to force PowerShell to use a specific input encoding. PowerShell 5.1 and below, running on Windows with the locale set to en-US, defaults to Windows-1252 encoding when there's no BOM. Other locale settings may use a different encoding. To ensure interoperability, it's best to save scripts in a Unicode format with a BOM.

Important

Any other tools you have that touch PowerShell scripts may be affected by your encoding choices or re-encode your scripts to another encoding.

Existing scripts

Scripts already on the file system may need to be re-encoded to your new chosen encoding. In the bottom bar of VS Code, you'll see the label UTF-8. Click it to open the action bar and select Save with encoding. You can now pick a new encoding for that file. See VS Code's encoding documentation for full instructions.
If you need to re-encode multiple files, you can use the following script:

    Get-ChildItem *.ps1 -Recurse | ForEach-Object {
        # Use the full path for both read and write so files in subfolders resolve correctly
        $content = Get-Content -Path $_.FullName
        # In Windows PowerShell, UTF8 writes a BOM; in PowerShell 6+, use utf8BOM to get one
        Set-Content -Path $_.FullName -Value $content -Encoding UTF8 -PassThru -Force
    }

The PowerShell Integrated Scripting Environment (ISE)

If you also edit scripts using the PowerShell ISE, you need to synchronize your encoding settings there.

The ISE should honor a BOM, but it's also possible to use reflection to set the encoding. Note that this wouldn't be persisted between startups.

Source control software

Some source control tools, such as git, ignore encodings; git just tracks the bytes. Others, like Azure DevOps or Mercurial, may not. Even some git-based tools rely on decoding text.

When this is the case, make sure you:

- Configure the text encoding in your source control to match your VS Code configuration.
- Ensure all your files are checked into source control in the relevant encoding.
- Be wary of changes to the encoding received through source control. A key sign of this is a diff indicating changes but where nothing seems to have changed (because the bytes have changed but the characters have not).

Collaborators' environments

On top of configuring source control, ensure that your collaborators on any files you share don't have settings that override your encoding by re-encoding PowerShell files.

Other programs

Any other program that reads or writes a PowerShell script may re-encode it.

Some examples are:

- Using the clipboard to copy and paste a script. This is common in scenarios like:
  - Copying a script into a VM
  - Copying a script out of an email or webpage
  - Copying a script into or out of a Microsoft Word or PowerPoint document
- Other text editors, such as:
  - Notepad
  - vim
  - Any other PowerShell script editor
- Text editing utilities, like:
  - Get-Content/Set-Content/Out-File
  - PowerShell redirection operators like > and >>
  - sed/awk
- File transfer programs, like:
  - A web browser, when downloading scripts
  - A file share

Some of these tools deal in bytes rather than text, but others offer encoding configurations. In those cases where you need to configure an encoding, you need to make it the same as your editor encoding to prevent problems.
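The "bytes have changed but characters have not" situation, which any re-encoding tool can cause, can be shown with a small sketch in Python. The helper `reencode` and the sample script text are hypothetical, used only for illustration. The sketch takes a one-line script stored as Windows-1252 and rewrites it as UTF-8 with a BOM (`utf-8-sig` in Python).

```python
def reencode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes using the source encoding and re-encode as the target."""
    return data.decode(source).encode(target)


# A one-line script containing one non-ASCII character, stored as Windows-1252.
original = 'Write-Output "caf\u00e9"'.encode("cp1252")

# The same script after a tool re-saves it as UTF-8 with a BOM.
converted = reencode(original, "cp1252", "utf-8-sig")

# A byte-oriented tool sees two different files...
bytes_differ = original != converted

# ...while a reader that decodes each file correctly sees identical text.
same_text = original.decode("cp1252") == converted.decode("utf-8-sig")
```

Both `bytes_differ` and `same_text` are true here. A byte-level diff reports a change even though an editor that decodes each version correctly shows the same characters.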
Other resources on encoding in PowerShell

There are a few other nice posts on encoding and configuring encoding in PowerShell that are worth a read:

- about_Character_Encoding
- @mklement0's summary of PowerShell encoding on Stack Overflow
- Previous issues opened on VSCode-PowerShell for encoding problems: #1308, #1628, #1680, #1744, #1751
- The classic Joel on Software write-up about Unicode
- Encoding in .NET Standard


