Victor Stinner blog 3

Python 3, locales and encodings

Thu 06 September 2018 — Victor Stinner

Recently, I worked on a change which looked simple: move the code to initialize the sys.stdout encoding before Py_Initialize(). While I was at it, I also decided to move the code which selects the Python "filesystem encoding". I didn't expect that I would spend 2 weeks on these issues …

Category: cpython Tags: unicode locales

Python 3.7 UTF-8 Mode

Tue 27 March 2018 — Victor Stinner

Since Python 3.0 was released in 2008, each time a user reported an encoding issue, someone showed up and asked why Python does not "simply" always use UTF-8. Well, it's not that easy. UTF-8 is the best encoding in most cases, but it is still not the best encoding …

Category: python Tags: cpython unicode

Python 3.7 and the POSIX locale

Fri 23 March 2018 — Victor Stinner

During the childhood of Python 3, encoding issues were common, even on well-configured systems. Python used UTF-8 rather than the locale encoding, and so commonly produced mojibake. For these reasons, when users complained about the Python behaviour with the POSIX locale, bug reports were closed with a message like …

Category: python Tags: cpython unicode

Python 3.6 now uses UTF-8 on Windows

Thu 22 March 2018 — Victor Stinner

In September 2016, a few days before the CPython core dev sprint, Steve Dower proposed two major backward incompatible changes for Python 3.6 on Windows: PEP 528: Change Windows console encoding to UTF-8 and PEP 529: Change Windows filesystem encoding to UTF-8. On first read, I was sure that …

Category: python Tags: cpython unicode

Python 3.2 Painful History of the Filesystem Encoding

Thu 15 March 2018 — Victor Stinner

Between Python 3.0 released in 2008 and Python 3.4 released in 2014, the Python filesystem encoding changed multiple times. It took 6 years to choose the best Python filesystem encoding on each platform.

I was officially promoted to core developer in January 2010 by Martin von …

Category: python Tags: cpython unicode

Python 3.1 surrogateescape error handler (PEP 383)

Thu 15 March 2018 — Victor Stinner

In my previous article, I wrote that os.listdir(str) silently ignored undecodable filenames in Python 3.0 and that lying about the real content of a directory looks like a very bad idea.

Martin v. Löwis found a very smart solution to this problem: the surrogateescape error handler.

This …
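As a rough illustration of what the surrogateescape error handler does, here is a minimal sketch in current Python 3. The byte string is borrowed from the listdir() example elsewhere on this page, and the variable names are only illustrative; it assumes UTF-8 as the codec, whereas the real filesystem encoding depends on the platform and locale.

```python
# Bytes that are not valid UTF-8: 0xff never appears in UTF-8 data.
raw_name = b'nonascii\xff'

# With surrogateescape, each undecodable byte 0xNN becomes the lone
# surrogate U+DCNN instead of raising UnicodeDecodeError.
text_name = raw_name.decode('utf-8', errors='surrogateescape')
print(ascii(text_name))  # 'nonascii\udcff'

# Encoding with the same handler restores the original bytes exactly.
round_trip = text_name.encode('utf-8', errors='surrogateescape')
print(round_trip == raw_name)  # True
```

The round trip is the point: a filename that cannot be decoded can still be carried around as str and passed back to the operating system unchanged.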

Category: python Tags: cpython unicode

Python 3.0 listdir() Bug on Undecodable Filenames

Fri 09 March 2018 — Victor Stinner

Ten years ago, when Python 3.0 final was released, os.listdir(str) silently ignored undecodable filenames:

$ python3.0
>>> import os
>>> os.mkdir(b'x')
>>> open(b'x/nonascii\xff', 'w').close()
>>> os.listdir('x')
[]

You had to use bytes to see all filenames:

>>> os.listdir(b'x')
[b'nonascii\xff']

If the locale is POSIX …

Category: python Tags: cpython unicode

How I fixed a very old GIL race condition in Python 3.7

Thu 08 March 2018 — Victor Stinner

It took me 4 years to fix a nasty bug in the famous Python GIL (Global Interpreter Lock), one of the most critical parts of Python. I had to dig into the Git history to find a change made 26 years ago by Guido van Rossum: at that time, threads were …

Category: python Tags: cpython

Python 3.7 nanoseconds

Tue 06 March 2018 — Victor Stinner

Thanks to my latest change to time.perf_counter(), all Python 3.7 clocks now use integer nanoseconds internally. It became possible to propose my old idea again, getting time as nanoseconds at the Python level, and so I wrote a new PEP, PEP 564 "Add new time functions with nanosecond …
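To make the motivation concrete, here is a small sketch, assuming Python 3.7 or newer, where time.perf_counter_ns() is available. The timestamp value is made up for illustration. It shows that a nanosecond count of this magnitude does not survive a conversion to float, while the integer clock returns an exact int.

```python
import time

# Hypothetical timestamp in nanoseconds since the epoch (made-up value).
timestamp_ns = 1_500_000_000_123_456_789

# A float has 53 bits of mantissa, so this value cannot be stored exactly.
as_float = float(timestamp_ns)
lost_precision = int(as_float) != timestamp_ns
print(lost_precision)  # True

# The nanosecond clock returns an int, so no rounding is involved.
start_ns = time.perf_counter_ns()
sum(range(1000))
elapsed_ns = time.perf_counter_ns() - start_ns
print(type(elapsed_ns).__name__)  # int
```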

Category: python Tags: cpython

Python 3.7 perf_counter() nanoseconds

Tue 06 March 2018 — Victor Stinner

Since 2012, I have been trying to convert all Python clocks to use nanoseconds internally. The last clock which still used floating point internally was time.perf_counter(). INADA Naoki's new importtime tool was an opportunity for me to take a fresh look at a tricky integer overflow issue.

Modify importtime …

Category: python Tags: cpython