convert2utf 1.0.0
pip install convert2utf==1.0.0
Newer version available (1.3.2)
Released: Aug 12, 2017
A lightweight tool that converts non-UTF-encoded (such as GB2312, GBK, BIG5 encoded) files to UTF-8 encoded files. At the same time, it can also remove Byte-order-mark (BOM) in those files.
Meta
License: MIT License (MIT)
Author: x1ang.li
Tags: target_encoding, UTF-8, UTF, UTF8, GBK, GB2312, Byte-Order-Mark, BOM
Maintainers: x1angli
Classifiers
Intended Audience: Developers; End Users/Desktop
License: OSI Approved :: MIT License
Programming Language: Python :: 3; Python :: 3.3; Python :: 3.4; Python :: 3.5; Python :: 3.6; Python :: Implementation :: CPython
Topic: Software Development :: Internationalization; Text Editors; Text Processing :: General
Converts text files or source code files into UTF-8 encoding
============================================================

This lightweight tool converts non-UTF-encoded (such as GB2312, GBK,
BIG5 encoded) files to UTF-8 encoded files. It can either be executed
from command line (CLI), or imported into other Python code.
Installation
------------

Automatic Installation (recommended)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Make sure Python 3, along with pip, is properly installed.
2. In your CLI, execute ``pip install convert2utf``

Manual Installation (for developers only)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Make sure Python 3 is properly installed.
2. Clone this project, or just download the .zip file from github.com
   and unarchive it
3. Start CLI (command line interface), enter the local folder
4. Set up Python virtual environment with ``virtualenv ...`` or
   ``python -m venv ...``
5. Run: ``pip install -r requirements.txt``
Usage
-----

There is only one mandatory argument: filename, where you can specify
the directory or file name.

- ***Batch mode***: Pass in a directory as the input, and all text files
  underneath it that meet the criteria will be converted to UTF-8
  encoding.
- ***Single file mode***: If the input argument is just an individual
  file, it is converted to UTF-8 directly.
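
The difference between the two modes can be pictured with a small sketch. This is not the tool's actual implementation: the function name ``collect_targets`` and the default ``("txt",)`` extension filter are assumptions made purely for illustration.

```python
import os

def collect_targets(path, exts=("txt",)):
    """Return the files that a batch or single-file run would visit.

    Hypothetical helper: the name and the default extension filter are
    illustrative and are not taken from cvt2utf.py.
    """
    if os.path.isfile(path):
        # Single file mode: the file itself is the only target.
        return [path]
    wanted = {"." + e.lower().lstrip(".") for e in exts}
    targets = []
    # Batch mode: walk the whole tree and keep files whose extension matches.
    for root, _dirs, files in os.walk(path):
        for name in files:
            if os.path.splitext(name)[1].lower() in wanted:
                targets.append(os.path.join(root, name))
    return sorted(targets)
```

Calling ``collect_targets("/path/to/your/repo", exts=("c", "cpp"))`` would list every matching file below that directory, while passing a single file path returns just that file.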
***Examples:***

- Change all .txt files to UTF-8 encoding.
  ``python cvt2utf.py "/path/to/your/repo"``

- Change all .txt files to UTF-8 encoding. Plus remove byte-order marks
  (a.k.a. "BOM"s or "signature"s) from existing UTF-8 files.
  ``python cvt2utf.py "/path/to/your/repo" -u``

- Change all .csv files to UTF-8 encoding.
  Since BOMs are used by some applications (such as Microsoft Excel), we
  want to add a BOM.
  ``python cvt2utf.py "/path/to/your/repo" -b -u --exts csv``

- Convert all .php, .js, .java, .py files to UTF-8 encoding.
  Meanwhile, the BOMs in existing UTF-encoded files will be
  **removed**.
  ``python cvt2utf.py "/path/to/your/repo" -u --exts php js java py``

- Convert all .c and .cpp files to UTF-8 with BOMs.
  This action will also **add** BOMs to existing UTF-encoded files.
  Per `issue #3`__,
  Visual Studio may mandate BOMs in source files. If BOMs are missing,
  then Visual Studio will be unable to compile them.
  ``python cvt2utf.py "/path/to/your/repo" -b -u --exts c cpp``

- After manually verifying that the new UTF-8 files are correct, you can
  remove all .bak files.
  ``python cvt2utf.py "/path/to/your/repo" --cleanbak``

- Alternatively, if you are extremely confident about everything, you
  can simply convert files without creating backups in the first place.
  Do **NOT** run the command in this way, unless you know what you are
  doing!
  ``python cvt2utf.py "/path/to/your/repo" --overwrite``

- Convert an individual file.
  ``python cvt2utf.py "/path/to/your/repo/a.txt"``

- Show help information.
  ``python cvt2utf.py -h``
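
The backup behaviour implied by ``--cleanbak`` and ``--overwrite`` can be sketched roughly as follows. The helper names, the backup naming scheme (appending ``.bak`` to the original file name), and the explicit ``source_encoding`` parameter are all assumptions for illustration; the real script works out the source encoding itself.

```python
import os
import shutil

def convert_file(path, source_encoding, overwrite=False):
    """Re-encode one file as UTF-8, keeping a .bak copy unless overwrite is set."""
    with open(path, "rb") as f:
        raw = f.read()
    # Decode first: a wrong encoding raises UnicodeDecodeError before
    # anything on disk has been touched.
    text = raw.decode(source_encoding)
    if not overwrite:
        shutil.copy2(path, path + ".bak")  # assumed backup naming scheme
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))

def clean_backups(root):
    """Delete every *.bak file under root and return how many were removed."""
    removed = 0
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".bak"):
                os.remove(os.path.join(folder, name))
                removed += 1
    return removed
```

Decoding before writing means a failed conversion leaves the original file as it was, which is the property the backup step is meant to protect.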
(Linux only) Directly run the program
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sometimes, you may want to run the program without specifying the Python
interpreter, such as:

::

    ./cvt2utf.py "/path/to/your/repo"

(Note the leading ``python`` command is missing here)

To achieve this, you first need to grant execute permission on the
Python file (skip this step if it already has the execute permission):

::

    sudo chmod +x ./cvt2utf.py

Then activate the virtual environment:

::

    . venv/bin/activate

Next, make sure dependencies are installed:

::

    pip install -r requirements.txt

Finally, execute the file (you could add command arguments here):

::

    ./cvt2utf.py "/path/to/your/repo"

You might want to use the absolute path to this program if you are
running it from an arbitrary working directory.
Miscellaneous
-------------

By default, the converted output text files will **NOT** contain BOM
(byte order mark).

However, you can use the switch ``-b`` or ``--addbom`` to explicitly
include BOM in the output text files.

To learn more, please check:
https://en.wikipedia.org/wiki/Byte\_order\_mark
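
For readers who want to see what adding or removing a BOM means at the byte level, here is a minimal sketch using only the standard library. The helper names ``strip_bom`` and ``add_bom`` are hypothetical and are not part of this package.

```python
import codecs

def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def add_bom(data: bytes) -> bytes:
    """Return data with exactly one leading UTF-8 BOM."""
    return codecs.BOM_UTF8 + strip_bom(data)

payload = "héllo".encode("utf-8")
print(add_bom(payload)[:3])                    # b'\xef\xbb\xbf'
print(strip_bom(add_bom(payload)) == payload)  # True
```

Python's ``utf-8-sig`` codec offers similar behaviour when working with text rather than bytes: it writes a BOM when encoding and skips one, if present, when decoding.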
FAQ
---

Why do we choose UTF-8 among all charsets?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For i18n, UTF-8 is widespread. It is the de facto standard for
non-English texts.

Compared with UTF-16, UTF-8 is usually more compact and "with full
fidelity". It also doesn't suffer from the endianness issue of UTF-16.
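
The size and endianness points can be checked with a few lines of Python. The sample string is arbitrary and chosen only for illustration.

```python
text = "hello, 世界"

utf8 = text.encode("utf-8")
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")

print(len(utf8))             # 13: 7 ASCII bytes + 2 characters * 3 bytes
print(len(utf16_le))         # 18: 9 characters * 2 bytes
print(utf16_le == utf16_be)  # False: byte order changes the output
```

Note the "usually": for text made up mostly of CJK characters, UTF-16 needs 2 bytes per character where UTF-8 needs 3, so the size advantage depends on the content.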
Why do we need this tool?
^^^^^^^^^^^^^^^^^^^^^^^^^

Indeed, there are a bunch of text editors out there (such as Notepad++)
that handle various encodings of text files very well. Yet for the
purpose of **batch conversion** we need this Python script. This script
is also written for educational purposes: developers can learn from
this script to get an idea of how to handle text encoding.
When should we remove BOM?
^^^^^^^^^^^^^^^^^^^^^^^^^^

Below is a list of places where BOM might cause a problem. To make your
life easy and smooth, BOMs in these files are advised to be removed.

- **Jekyll**: Jekyll is a Ruby-based CMS that generates static websites.
  Please remove BOMs in your source files. Also, remove them in your CSS
  if you are SASSifying.
- **PHP**: BOMs in ``*.php`` files should be stripped.
- **JSP**: BOMs in ``*.jsp`` files should be stripped.
- (to be added...)
When should we add BOM?
^^^^^^^^^^^^^^^^^^^^^^^

BOMs in these files are not strictly necessary, but it is recommended
to add them.

- **Unicode plain text file**: Microsoft suggests "Always prefix a
  Unicode plain text file with a byte order mark"
  (https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101(v=vs.85).aspx)
- **CSV**: BOMs in CSV files can be useful, and some applications
  (such as Microsoft Excel) may need them.
Is the current version reliable?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We are striving to deliver highly reliable solutions to our users. This
code is still in its "beta" phase. You might be aware that Python's
built-in UTF encoding/decoding plus chardet may not be very reliable.
For that reason, we suggest users create backups, either by manually
duplicating the file/directory, or automatically through our package
(remember, the backup feature is skipped when the ``--overwrite``
switch is used).
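
One reason backups matter is that a decode can succeed while still producing the wrong text. The sketch below uses only the standard library (it does not call chardet), and the helper name ``decodes_cleanly`` is hypothetical.

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Report whether data can be decoded with encoding without an error."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

original = "中文"
gbk_bytes = original.encode("gbk")

# UTF-8 rejects these bytes, so this wrong guess is caught.
print(decodes_cleanly(gbk_bytes, "utf-8"))      # False

# Latin-1 accepts any byte sequence, so this wrong guess goes unnoticed...
print(decodes_cleanly(gbk_bytes, "latin-1"))    # True

# ...even though the decoded text is not the original.
print(gbk_bytes.decode("latin-1") == original)  # False
```

Because a clean decode is not proof of a correct guess, checking the converted files by eye before removing the backups remains worthwhile.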
Release history

- 1.3.2 (Dec 25, 2018)
- 1.3.1 (Dec 24, 2018)
- 1.3.0 (Dec 23, 2018)
- 1.0.2 (Jan 15, 2018)
- 1.0.1 (Jan 15, 2018)
- 1.0.0 (Aug 12, 2017), this version
- 0.8.5 (Jun 4, 2017)
- 0.8.4 (Jun 4, 2017)
- 0.8.3 (Jun 4, 2017)
- 0.8.2 (Jun 4, 2017)
- 0.8.1 (Jun 4, 2017)
- 0.8 (Jul 21, 2016)
Download files

Source Distribution: convert2utf-1.0.0.tar.gz (8.2 kB), uploaded Aug 12, 2017

Hashes for convert2utf-1.0.0.tar.gz

- SHA256: 6b15ee93168b8354a4915a31a6d8fbd336da84641f7a4b352774f71f88d217f7
- MD5: a5a679e6ef8fd3a9d1e3283669703f96
- BLAKE2-256: d7695fde1a11819efe563df0669de74a481d72fc0a57f3462d5ea9207827ecc3