All text that comes from outside of your Python process starts as binary data.
All input starts as raw bytes
When you open a file in Python, the default mode is r or rt, for read text mode:
>>> with open("my_file.txt") as f:
...     contents = f.read()
...
>>> f.mode
'r'
Meaning when we read our file, we'll get back strings that represent text:
>>> contents
'This is a file ✨\n'
But that's not what Python actually reads from disk.
If we open a file with the mode rb and read from our file we'll see what Python sees; that is bytes:
>>> with open("my_file.txt", mode="rb") as f:
...     contents = f.read()
...
>>> contents
b'This is a file \xe2\x9c\xa8\n'
>>> type(contents)
<class 'bytes'>
Bytes are what Python decodes to make strings.
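To see both views of the same file side by side, here is a minimal sketch that writes a small file into a temporary directory and reads it back in text mode and in binary mode. The file name `demo.txt` and the helper `read_both_ways` are made up for illustration; they are not part of the original example.

```python
import tempfile
from pathlib import Path


def read_both_ways(path):
    """Return (text, raw_bytes) for the same file (hypothetical helper)."""
    with open(path, mode="rt", encoding="utf-8") as f:
        text = f.read()
    with open(path, mode="rb") as f:
        raw = f.read()
    return text, raw


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.txt"
    # Write the bytes directly so we know exactly what is on disk.
    path.write_bytes("This is a file \u2728\n".encode("utf-8"))
    text, raw = read_both_ways(path)

print(type(text).__name__, type(raw).__name__)
# Decoding the raw bytes by hand gives the same string that text mode returned.
print(raw.decode("utf-8") == text)
```

Text mode is doing the decoding step for us; binary mode hands back the bytes untouched.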
Encoding strings into bytes
If you have a string in Python and you'd like to convert it into bytes, you can call its encode method:
>>> text = "Hello there! \u2728"
>>> text.encode()
b'Hello there! \xe2\x9c\xa8'
The encode method uses the character encoding utf-8 by default:
>>> text.encode("utf-8")
b'Hello there! \xe2\x9c\xa8'
But you can specify a different character encoding if you'd like:
>>> text.encode("utf-16-le")
b"H\x00e\x00l\x00l\x00o\x00 \x00t\x00h\x00e\x00r\x00e\x00!\x00 \x00('"
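As a rough illustration of how much the choice of encoding matters, the sketch below encodes the same string with three encodings and compares the number of bytes each one produces. It also tries ascii, which cannot represent the sparkles character at all. The variable names `sizes` and `ascii_failed_on` are only for this example.

```python
text = "Hello there! \u2728"

# Number of bytes produced by a few encodings that can represent this text.
sizes = {}
for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    sizes[encoding] = len(text.encode(encoding))
    print(f"{encoding:>9}: {sizes[encoding]} bytes")

# Not every encoding can represent every character.
try:
    text.encode("ascii")
except UnicodeEncodeError as error:
    # error.start and error.end mark the slice of text that could not be encoded.
    ascii_failed_on = text[error.start:error.end]
    print("ascii cannot encode:", ascii(ascii_failed_on))
```

The string has 14 characters, but the byte count depends entirely on the encoding: 16 bytes in utf-8, 28 in utf-16-le, and 56 in utf-32-le.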
Decoding bytes into strings
If you have a bytes object and you'd like to convert it into a string, you need to decode it by calling its decode method:
>>> data = b"Hello there! \xe2\x9c\xa8"
>>> data.decode()
'Hello there! ✨'
Like the string encode method, the bytes decode method uses the character encoding utf-8 by default:
>>> data.decode("utf-8")
'Hello there! ✨'
But if you have bytes that represent data in a different character encoding, you'll need to specify that character encoding instead:
>>> data = b"H\x00e\x00l\x00l\x00o\x00 \x00t\x00h\x00e\x00r\x00e\x00!\x00 \x00('"
>>> data.decode("utf-16le")
'Hello there! ✨'
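Encoding and decoding are inverse operations as long as both sides use the same character encoding. The following sketch checks that round trip for a few encodings, and also shows that calling `str` with an encoding argument is another way to spell `decode`. The variable names are illustrative only.

```python
text = "Hello there! \u2728"

# Encoding and then decoding with the same encoding returns the original string.
for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    data = text.encode(encoding)
    round_tripped = data.decode(encoding)
    print(encoding, round_tripped == text)

# str(data, encoding) is equivalent to data.decode(encoding).
data = b"Hello there! \xe2\x9c\xa8"
same = str(data, "utf-8") == data.decode("utf-8")
print(same)
```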
Specifying a character encoding when opening files
When you open a file in Python, whether for writing or for reading, it's considered a best practice to specify the character encoding that you're working with:
>>> with open("message.txt", mode="wt", encoding="utf-8") as f:
...     f.write("In Jan 2020 I said \u201cI'm glad I upgraded to Python 3\u201d.")
...
53
>>> with open("message.txt", mode="rt", encoding="utf-8") as f:
...     contents = f.read()
...
>>> contents
"In Jan 2020 I said “I'm glad I upgraded to Python 3”."
This is because on different operating systems, Python will use a different character encoding by default when it's working with text files.
On my machine, the default character encoding is utf-8.
But on Windows, the default character encoding is usually cp1252.
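If you're curious which default your own machine uses, one way to check is to open a file without an encoding argument and inspect the file object's `encoding` attribute. This is a small sketch using a throwaway temporary file; the variable names `default_used` and `explicit` are made up for the example, and the first value it prints will differ between machines.

```python
import locale
import os
import tempfile

# Create a throwaway file so there is something to open in text mode.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(path) as f:  # no encoding given: Python picks a default
        default_used = f.encoding
    with open(path, encoding="utf-8") as f:  # encoding stated explicitly
        explicit = f.encoding
finally:
    os.remove(path)

print("open() default:  ", default_used)
print("locale preferred:", locale.getpreferredencoding(False))
print("explicit:        ", explicit)
```

Only the last line is guaranteed to be the same everywhere, which is the argument for always passing `encoding`.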
Be careful with your character encodings
So if we read this UTF-8 file on a Windows machine without specifying an encoding, we would get a UnicodeDecodeError:
>>> with open("message.txt", mode="rt") as f:
...     contents = f.read()
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python3.10/encodings/cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 55: character maps to <undefined>
This traceback is telling us that there's a mismatch between the character encoding of the bytes that we're reading and the character encoding that Python is trying to use to read them.
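The same failure can be reproduced on any operating system without touching a file, by encoding the message as UTF-8 and decoding the bytes as cp1252 by hand. This is a minimal sketch; `bad_position` and `bad_byte` are names chosen for the example.

```python
text = "In Jan 2020 I said \u201cI'm glad I upgraded to Python 3\u201d."
data = text.encode("utf-8")

try:
    data.decode("cp1252")
except UnicodeDecodeError as error:
    # error.start is the index of the first byte that could not be decoded.
    bad_position = error.start
    bad_byte = data[error.start]
    print(f"cannot decode byte {bad_byte:#x} at position {bad_position}")
else:
    bad_position = bad_byte = None

# Decoding with the encoding that was used to write the bytes works fine.
print(data.decode("utf-8") == text)
```

The offending byte is the last of the three UTF-8 bytes for the closing curly quote. The opening curly quote does not trigger an error, because each of its three bytes happens to be assigned to a character in cp1252.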
But you can't rely on a UnicodeDecodeError always being raised when there's a character encoding mismatch.
Sometimes two different encodings may use the same bytes to represent different text.
Here we've saved a file using the UTF-8 character encoding:
>>> text = "Yay unicode! \N{SPARKLES}"
>>> print(text)
Yay unicode! ✨
>>> with open("sparkles.txt", mode="wt", encoding="utf-8") as f:
...     f.write(text)
...
14
If we read this file using the cp1252 character encoding, we'll see different text than what we started with:
>>> with open("sparkles.txt", encoding="cp1252") as f:
...     contents = f.read()
...
>>> contents
'Yay unicode! âœ¨'
We used cp1252 to decode bytes that were encoded using utf-8 and ended up with mojibake.
This is actually a really common problem between utf-8 (default encoding on Linux/Mac) and cp1252 (default encoding on Windows) in particular, because these two character encodings are identical for ASCII text but differ for everything else.
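Here is the same mojibake produced in memory, along with one way it can sometimes be undone. This is only a sketch: reversing the mistake works here because every byte of the sparkles character happens to map to a cp1252 character, which is not guaranteed for arbitrary text. The names `garbled` and `repaired` are illustrative.

```python
text = "Yay unicode! \N{SPARKLES}"

# Encode as UTF-8, then (incorrectly) decode those bytes as cp1252.
garbled = text.encode("utf-8").decode("cp1252")
print(garbled)

# Undo the mistake: turn the garbled text back into the original bytes,
# then decode those bytes with the encoding they were really written in.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == text)
```

One sparkles character became three unrelated characters, because its three UTF-8 bytes were each treated as a separate cp1252 character.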
Summary
When you read a file, Python will read bytes from disk and then decode those bytes to make them into strings.
When you write to a file, Python will take your strings and encode those strings into bytes to write them to disk.
It's considered a best practice to specify the character encoding that you're working with whenever you're reading or writing text from outside of your Python process, especially if you're working with non-ASCII text.
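Putting the summary together, this short sketch writes and reads a file with an explicit encoding and checks what ended up on disk. It uses a temporary directory so it leaves nothing behind; the variable names `on_disk` and `contents` are just for this example.

```python
import tempfile
from pathlib import Path

text = "Yay unicode! \N{SPARKLES}"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sparkles.txt"

    # Writing: the string is encoded into bytes using the encoding we name.
    with open(path, mode="wt", encoding="utf-8") as f:
        f.write(text)
    on_disk = path.read_bytes()

    # Reading: the bytes are decoded back into a string with the same encoding.
    with open(path, mode="rt", encoding="utf-8") as f:
        contents = f.read()

print(on_disk == text.encode("utf-8"))
print(contents == text)
```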