Victor Stinner blog 3https://vstinner.github.io/2024-03-20T17:00:00+01:00Status of the Python Limited C API (March 2024)2024-03-20T17:00:00+01:002024-03-20T17:00:00+01:00Victor Stinnertag:vstinner.github.io,2024-03-20:/status-limited-c-api-march-2024.html<a class="reference external image-reference" href="https://danielazconegui.com/en/prints/ghibli-spyrited-away.html"><img alt="Ghibli - Spirited Away" src="https://vstinner.github.io/images/ghibli-spyrited-away.jpg" /></a>
<p>In Python 3.13, I made multiple enhancements to make the limited C API more
usable:</p>
<ul class="simple">
<li>Add 14 functions to the limited C API.</li>
<li>Make the special debug build <tt class="docutils literal">Py_TRACE_REFS</tt> compatible with the limited
C API.</li>
<li>Enhance Argument Clinic to generate C code using the limited C API.</li>
<li>Add a convenient API to format a type's fully qualified name using the limited
C API (PEP 737).</li>
<li>Add <tt class="docutils literal">_testlimitedcapi</tt> extension.</li>
<li>Convert 16 stdlib extensions to the limited C API.</li>
</ul>
<p>What's Next?</p>
<ul class="simple">
<li>PEP 741: Python Configuration C API.</li>
<li>Py_GetConstant().</li>
<li>Cython and PyO3.</li>
</ul>
<p><em>Drawing: Ghibli - Spirited Away by Daniel Azconegui.</em></p>
<div class="section" id="new-functions">
<h2>New Functions</h2>
<p>I added 14 functions to the limited C API:</p>
<ul class="simple">
<li><tt class="docutils literal">PyDict_GetItemRef()</tt></li>
<li><tt class="docutils literal">PyDict_GetItemStringRef()</tt></li>
<li><tt class="docutils literal">PyImport_AddModuleRef()</tt></li>
<li><tt class="docutils literal">PyLong_AsInt()</tt></li>
<li><tt class="docutils literal">PyMem_RawCalloc()</tt></li>
<li><tt class="docutils literal">PyMem_RawFree()</tt></li>
<li><tt class="docutils literal">PyMem_RawMalloc()</tt></li>
<li><tt class="docutils literal">PyMem_RawRealloc()</tt></li>
<li><tt class="docutils literal">PySys_Audit()</tt></li>
<li><tt class="docutils literal">PySys_AuditTuple()</tt></li>
<li><tt class="docutils literal">PyType_GetFullyQualifiedName()</tt></li>
<li><tt class="docutils literal">PyType_GetModuleName()</tt></li>
<li><tt class="docutils literal">PyWeakref_GetRef()</tt></li>
<li><tt class="docutils literal">Py_IsFinalizing()</tt></li>
</ul>
<p>It makes code using these functions <strong>compatible with the limited C API</strong>.</p>
</div>
<div class="section" id="py-trace-refs">
<h2>Py_TRACE_REFS</h2>
<p>I modified the special debug build <tt class="docutils literal">Py_TRACE_REFS</tt>. Instead of adding two
members to <tt class="docutils literal">PyObject</tt> to create a doubly linked list of all objects, I added
a hash table to track all objects.</p>
<p>Since the <tt class="docutils literal">PyObject</tt> structure is no longer modified, this special debug
build is now <strong>ABI compatible</strong> with the <strong>release build</strong>! Moreover, it also
becomes compatible with the <strong>limited C API</strong>!</p>
</div>
<div class="section" id="argument-clinic">
<h2>Argument Clinic</h2>
<p>I modified Argument Clinic (AC) to generate C code compatible with the limited
C API.</p>
<p>First, I moved private functions used by Argument Clinic to the internal C API
and modified Argument Clinic to generate <tt class="docutils literal">#include</tt> directives to get these
functions. Then I modified Argument Clinic to use only the limited C API and to
stop generating these <tt class="docutils literal">#include</tt> directives.</p>
<p>At the beginning, only some converters were supported, and only the slower
<tt class="docutils literal">METH_VARARGS</tt> calling convention was available.</p>
<p>Now, more and more converters and formats are supported, and the regular
efficient <tt class="docutils literal">METH_FASTCALL</tt> calling convention is used.</p>
<div class="section" id="example">
<h3>Example</h3>
<p>Example from the <tt class="docutils literal">grp</tt> extension:</p>
<pre class="literal-block">
/*[clinic input]
grp.getgrgid
id: object
Return the group database entry for the given numeric group ID.
</pre>
<p>Python 3.12 uses the <strong>private</strong> <tt class="docutils literal">_PyArg_UnpackKeywords()</tt> function:</p>
<pre class="literal-block">
args = _PyArg_UnpackKeywords(args, nargs, NULL, kwnames, &_parser, 1, 1, 0, argsbuf);
if (!args) {
goto exit;
}
id = args[0];
return_value = grp_getgrgid_impl(module, id);
</pre>
<p>Python 3.13 now uses the public <tt class="docutils literal">PyArg_ParseTupleAndKeywords()</tt> function of
the <strong>limited C API</strong>:</p>
<pre class="literal-block">
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O:getgrgid", _keywords,
&id))
goto exit;
return_value = grp_getgrgid_impl(module, id);
</pre>
</div>
</div>
<div class="section" id="pep-737-format-type-name">
<h2>PEP 737: Format Type Name</h2>
<p>One issue that I had with Argument Clinic was to <strong>format an error message</strong>
with the limited C API. I cannot use the private <tt class="docutils literal">_PyArg_BadArgument()</tt>
function, nor access <tt class="docutils literal">PyTypeObject.tp_name</tt> (the structure is opaque in the
limited C API) to format a type name. While the limited C API provides
<tt class="docutils literal">PyType_GetName()</tt> and <tt class="docutils literal">PyType_GetQualName()</tt>, their output is still
different from how Python formats type names in error messages.</p>
<p>I proposed different APIs but there was no agreement. So I decided to write
<a class="reference external" href="https://peps.python.org/pep-0737/">PEP 737</a> "C API to format a type fully
qualified name".</p>
<p>After four months of discussions, the <strong>Steering Council</strong> decided to accept it
in Python 3.13.</p>
<p>Changes:</p>
<ul class="simple">
<li>Add <tt class="docutils literal">PyType_GetFullyQualifiedName()</tt> function.</li>
<li>Add <tt class="docutils literal">PyType_GetModuleName()</tt> function.</li>
<li>Add <tt class="docutils literal">%T</tt>, <tt class="docutils literal">%#T</tt>, <tt class="docutils literal">%N</tt> and <tt class="docutils literal">%#N</tt> formats to
<tt class="docutils literal">PyUnicode_FromFormat()</tt>.</li>
</ul>
<p>I also proposed adding a new <tt class="docutils literal">type.__fully_qualified_name__</tt> attribute, and a
few methods to format the fully qualified name of a type in Python. But the
Steering Council was not convinced and asked me to <strong>remove these Python
changes</strong> until someone comes up with a strong use case for this attribute and
these methods.</p>
<p>In <strong>2018</strong>, I made a <strong>first attempt</strong> at a similar change, but I had to
revert it. I created a discussion on the python-dev mailing list, but we failed
to reach a consensus.</p>
<p>In <strong>2011</strong>, I had already asked to stop the <strong>cargo cult</strong> of truncating type
names, but I didn't follow through on my idea by proactively removing the
truncation.</p>
<div class="section" id="example-1">
<h3>Example</h3>
<p>Example of the code generating an error message in the <tt class="docutils literal">pwd</tt> extension.</p>
<p>Python 3.12 uses the <strong>private</strong> <tt class="docutils literal">_PyArg_BadArgument()</tt> function:</p>
<pre class="literal-block">
_PyArg_BadArgument("getpwnam", "argument", "str", arg);
</pre>
<p>Python 3.13 now uses the new <tt class="docutils literal">%T</tt> format (PEP 737) of the <strong>limited C API</strong>:</p>
<pre class="literal-block">
PyErr_Format(PyExc_TypeError,
"getpwnam() argument must be str, not %T",
arg);
</pre>
</div>
</div>
<div class="section" id="add-testlimitedcapi-extension">
<h2>Add _testlimitedcapi extension</h2>
<p>In Python 3.12, C API tests are split into two categories:</p>
<ul class="simple">
<li><tt class="docutils literal">_testcapi</tt>: public C API</li>
<li><tt class="docutils literal">_testinternalcapi</tt>: internal C API (<tt class="docutils literal">Py_BUILD_CORE</tt>)</li>
</ul>
<p>I added a third <tt class="docutils literal">_testlimitedcapi</tt> extension to test the limited C API
(<tt class="docutils literal">Py_LIMITED_API</tt>). I moved tests using the limited C API from
<tt class="docutils literal">_testcapi</tt> to <tt class="docutils literal">_testlimitedcapi</tt>.</p>
<p>The difference between <tt class="docutils literal">_testcapi</tt> and <tt class="docutils literal">_testlimitedcapi</tt> is that the
<tt class="docutils literal">_testlimitedcapi</tt> extension is built with the <tt class="docutils literal">Py_LIMITED_API</tt> macro
defined, and so can only access the limited C API.</p>
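<p>For reference, an extension opts into the limited C API by defining <tt class="docutils literal">Py_LIMITED_API</tt> before the first <tt class="docutils literal">#include</tt> of <tt class="docutils literal">Python.h</tt>; for example, to target the Python 3.13 stable ABI:</p>

```c
/* Must come before the first #include of Python.h. */
#define Py_LIMITED_API 0x030d0000   /* Python 3.13 */
#include <Python.h>
```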
</div>
<div class="section" id="convert-stdlib-extensions-to-the-limited-c-api">
<h2>Convert stdlib extensions to the limited C API</h2>
<p>In August 2023, I proposed to:
<a class="reference external" href="https://discuss.python.org/t/use-the-limited-c-api-for-some-of-our-stdlib-c-extensions/32465">Use the limited C API for some of our stdlib C extensions</a>.</p>
<p>In March 2024, there are now <strong>16</strong> C extensions built with the limited C API:</p>
<ul class="simple">
<li><tt class="docutils literal">_ctypes_test</tt></li>
<li><tt class="docutils literal">_multiprocessing.posixshmem</tt></li>
<li><tt class="docutils literal">_scproxy</tt></li>
<li><tt class="docutils literal">_stat</tt></li>
<li><tt class="docutils literal">_statistics</tt></li>
<li><tt class="docutils literal">_testimportmultiple</tt></li>
<li><tt class="docutils literal">_testlimitedcapi</tt></li>
<li><tt class="docutils literal">_uuid</tt></li>
<li><tt class="docutils literal">errno</tt></li>
<li><tt class="docutils literal">fcntl</tt></li>
<li><tt class="docutils literal">grp</tt></li>
<li><tt class="docutils literal">md5</tt></li>
<li><tt class="docutils literal">pwd</tt></li>
<li><tt class="docutils literal">resource</tt></li>
<li><tt class="docutils literal">termios</tt></li>
<li><tt class="docutils literal">winsound</tt></li>
</ul>
<p>Other stdlib C extensions use the internal C API for various reasons, or use
functions which are missing from the limited C API. The remaining issues should
be analyzed on a case-by-case basis.</p>
<p>This work shows that non-trivial C extensions can be written using only the
limited C API version 3.13.</p>
</div>
<div class="section" id="what-s-next">
<h2>What's Next?</h2>
<div class="section" id="pep-741-python-configuration-c-api">
<h3>PEP 741: Python Configuration C API</h3>
<p>In Python 3.8, I added the <tt class="docutils literal">PyConfig</tt> API to configure the Python
initialization. Problem: it has no stable ABI and is excluded from the limited
C API.</p>
<p>Recently, I proposed <a class="reference external" href="https://peps.python.org/pep-0741/">PEP 741: Python Configuration C API</a>, which is built on top of
<tt class="docutils literal">PyConfig</tt>, provides a stable ABI, and is compatible with the limited C API. I
submitted PEP 741 to the Steering Council.</p>
</div>
<div class="section" id="py-getconstant">
<h3>Py_GetConstant()</h3>
<p>Accessing constants currently reads private ABI symbols. For example, the <tt class="docutils literal">Py_None</tt> API
reads the private <tt class="docutils literal">_Py_NoneStruct</tt> symbol at the stable ABI level.</p>
<p>I <a class="reference external" href="https://github.com/python/cpython/pull/116883">proposed</a> changing the
constant implementations to use function calls instead. For example, reading
<tt class="docutils literal">Py_None</tt> would call <tt class="docutils literal">Py_GetConstant(Py_CONSTANT_NONE)</tt>. An advantage is
that it adds 5 more constants: zero, one, the empty string, the empty bytes string, and
the empty tuple. For example, <tt class="docutils literal">Py_GetConstant(Py_CONSTANT_ZERO)</tt> gives the number
<tt class="docutils literal">0</tt>, and the function cannot fail.</p>
</div>
<div class="section" id="cython-and-pyo3">
<h3>Cython and PyO3</h3>
<p>The Cython and PyO3 projects are two big consumers of the C API.</p>
<p>While Cython has an experimental build mode for the limited C API, it's still
incomplete. It would be nice to complete it to cover more use cases and more
APIs.</p>
<p>PyO3 can use the limited C API, but still uses the non-limited API for some use
cases. It would be interesting to only use the limited C API. PEP 741 would
also be interesting for embedding Python in Rust.</p>
</div>
</div>
Remove private C API functions2023-12-15T23:00:00+01:002023-12-15T23:00:00+01:00Victor Stinnertag:vstinner.github.io,2023-12-15:/remove-c-api-funcs-313.html<a class="reference external image-reference" href="https://en.wikipedia.org/wiki/The_Seasons_(Mucha)"><img alt="Mucha painting: the 4 seasons" src="https://vstinner.github.io/images/mucha_seasons.jpg" /></a>
<p>In Python 3.13 alpha 1, I removed more than 300 private C API functions. Even
though I announced my plan early in July, users didn't "embrace" my plan and didn't
agree with the rationale. I reverted 50 functions in the alpha 2 release to
calm down the situation and have more time to replace private functions with
public functions.</p>
<p><em>Painting: The Seasons by Czech visual artist Alphonse Mucha (1900)</em></p>
<div class="section" id="remove-private-functions">
<h2>Remove private functions</h2>
<p>On June 25th, I created <a class="reference external" href="https://github.com/python/cpython/issues/106084">issue gh-106084</a>: "Remove private C API
functions from abstract.h".</p>
<blockquote>
Over the years, we accumulated many <strong>private</strong> functions as part of the
<strong>public</strong> C API in abstract.h header file. I propose to remove them: move
them to the <strong>internal</strong> C API.</blockquote>
<p>On July 1st, I created the meta <a class="reference external" href="https://github.com/python/cpython/issues/106320">issue gh-106320</a>: "Remove private C API
functions". The issue has 63 pull requests (a lot!), 53 comments and more than
300 events (created by commits and pull requests), which makes the issue hard
to navigate.</p>
<p>On July 3rd, <strong>Petr Viktorin</strong> shared his concerns:</p>
<blockquote>
<p>Please be careful about assuming that the <strong>underscore</strong> means a function
is <strong>private</strong>. AFAIK, that rule first appears for <a class="reference external" href="https://docs.python.org/3.10/c-api/stable.html#stable">3.10</a>, and was only
properly formalized in <a class="reference external" href="https://peps.python.org/pep-0689/">PEP 689</a>, for
Python 3.12.</p>
<p>For older functions, please consider if they should be added to the
unstable API. IMO it's better to call them “underscored” than “private”.</p>
<p>See also: historical note in the <a class="reference external" href="https://devguide.python.org/developer-workflow/c-api/index.html#private-names">devguide</a>.</p>
</blockquote>
<p>On July 4th, <strong>Petr</strong> posted on Discourse: <a class="reference external" href="https://discuss.python.org/t/pssst-lets-treat-all-api-in-public-headers-as-public/28916">(pssst) Let's treat all API in
public headers as public</a>.</p>
</div>
<div class="section" id="remove-more-private-functions">
<h2>Remove more private functions</h2>
<p>By July 4th, I had removed <a class="reference external" href="https://github.com/python/cpython/issues/106320#issuecomment-1620749616">181 private functions</a>.</p>
<p>On July 4th, I identified that <a class="reference external" href="https://github.com/python/cpython/issues/106320#issuecomment-1620773057">34 projects</a> in the
PyPI top 5,000 are affected by these removals.</p>
<p>On July 7th, I <a class="reference external" href="https://github.com/python/pythoncapi-compat/pull/62">added PyObject_Vectorcall()</a> to the
pythoncapi-compat project.</p>
<p>On July 9th, I started the discussion:
<a class="reference external" href="https://discuss.python.org/t/c-api-how-much-private-is-the-private-py-identifier-api/29190">C API: How much private is the private _Py_IDENTIFIER() API?</a></p>
<p>On July 13th, I asked if <a class="reference external" href="https://github.com/python/cpython/issues/106320#issuecomment-1633302147">the PyComplex API</a>
should be made private or not. Petr noticed that this API was documented.</p>
<p>On July 23rd, I tried to build numpy, but I was blocked by Cython, which was broken by my
changes. I created <a class="reference external" href="https://github.com/python/cpython/issues/107076">issue gh-107076</a>: "C API: Cython 3.0 uses
private functions removed in Python 3.13 (numpy 1.25.1 fails to build)".</p>
<p>On July 23rd, I found that the private <tt class="docutils literal">_PyTuple_Resize()</tt> function is documented. I
proposed <a class="reference external" href="https://github.com/python/cpython/pull/107139">adding a new internal _PyTupleBuilder API</a> to replace
<tt class="docutils literal">_PyTuple_Resize()</tt>.</p>
<p>On July 23rd, I proposed:
<a class="reference external" href="https://discuss.python.org/t/c-api-my-plan-to-clarify-private-vs-public-functions-in-python-3-13/30131">C API: My plan to clarify private vs public functions in Python 3.13</a>.</p>
<blockquote>
Private API has multiple issues: they are usually <strong>not documented</strong>, <strong>not
tested</strong>, and so their <strong>behavior may change</strong> without any warning or
anything. Also, they can be <strong>removed anytime</strong> without any notice.</blockquote>
<ul class="simple">
<li>Phase 1: Remove as many private APIs as possible.</li>
<li>Phase 2 (Python 3.13 alpha 1): revert removals if needed to make sure that Cython, numpy and pip
work.</li>
<li>Phase 3 (Python 3.13 beta 1): consider reverting more removals if needed.</li>
</ul>
<p>On July 24th, I created the PR <a class="reference external" href="https://github.com/python/cpython/pull/107068">Remove private _PyCrossInterpreterData API</a>. <strong>Eric Snow</strong> asked me
to keep this private API since it's used by 3rd party C extensions.</p>
<p>On August 24th, I created <a class="reference external" href="https://github.com/python/cpython/issues/108444">issue gh-108444</a> to add <tt class="docutils literal">PyLong_AsInt()</tt>
public function, replacing the removed <tt class="docutils literal">_PyLong_AsInt()</tt> function.</p>
<p>On September 4th, I looked at the <tt class="docutils literal">_PyArg</tt> API. I started the discussion:
<a class="reference external" href="https://discuss.python.org/t/use-the-limited-c-api-for-some-of-our-stdlib-c-extensions/32465">Use the limited C API for some of our stdlib C extensions</a>.</p>
<p>On September 4th, <a class="reference external" href="https://discuss.python.org/t/c-api-my-plan-to-clarify-private-vs-public-functions-in-python-3-13/30131/9">I declared</a>:</p>
<blockquote>
I declare that the Python 3.13 <strong>season of “removing as many private C API
as possible” ended</strong>! I stop here until Python 3.14.</blockquote>
<p>Python 3.12 exports <strong>385</strong> private functions. After the cleanup, Python 3.13
only exports <strong>86</strong> private functions: I removed 299 functions. I closed the
issue.</p>
</div>
<div class="section" id="python-3-13-alpha-1-negative-feedback">
<h2>Python 3.13 alpha 1 negative feedback</h2>
<p>On October 13th, <strong>Python 3.13 alpha 1 was released</strong> with my changes
removing around 300 private C API functions.</p>
<p>On October 14th, <strong>Guido van Rossum</strong> <a class="reference external" href="https://github.com/python/cpython/issues/106320#issuecomment-1762755146">asked</a>:</p>
<blockquote>
Thanks for the list. Should we <strong>encourage</strong> various <strong>projects to test
3.13a1</strong>, which just came out? Is there a way we can encourage them more?</blockquote>
<p>On October 30th, <strong>Stefan Behnel</strong>, Cython creator, posted the message:
<a class="reference external" href="https://discuss.python.org/t/python-3-13-alpha-1-contains-breaking-changes-whats-the-plan/37490">Python 3.13 alpha 1 contains breaking changes, what's the plan?</a>.
He also <a class="reference external" href="https://github.com/python/cpython/issues/106320#issuecomment-1772735064">commented on the issue</a>.
Extract:</p>
<blockquote>
I just came across this issue. Let me express my general disapproval
regarding deliberate breakage, which this issue appears to be entirely
about. As far as I can see, none of these removals was motivated. The mere
idea of removing existing API "because we can" is entirely foreign to me.</blockquote>
<p>On October 31st, <strong>Petr</strong> asked the Steering Council:
<a class="reference external" href="https://github.com/python/steering-council/issues/212">Is it OK to remove _PyObject_Vectorcall?</a>
about the removal of old aliases with underscore, such as
<tt class="docutils literal">_PyObject_Vectorcall</tt>.
I didn't know that these names were part of <a class="reference external" href="https://peps.python.org/pep-0590/">PEP 590 – Vectorcall: a fast
calling protocol for CPython</a>, nothing was
written about that in the header files.</p>
<p>On November 2nd, <strong>Guido</strong> <a class="reference external" href="https://github.com/python/cpython/issues/106320#issuecomment-1790832433">wrote</a>
(where WG stands for C API Working Group):</p>
<blockquote>
<p>We can talk till we’re blue in the face but please no more action (i.e., no
more moving/removing APIs) until the full WG has had a chance to discuss
this and make a decision.</p>
<p>(Restoring removed APIs at users’ requests is fine.)</p>
</blockquote>
<p>On November 3rd, <strong>Gregory P. Smith</strong> <a class="reference external" href="https://github.com/python/cpython/issues/111481#issuecomment-1794211126">wrote</a>:</p>
<blockquote>
<p>I'd much prefer 'revert' for any API anyone is found using in 3.13.</p>
<p>We need to treat 3.13 as a more special than usual release and aim to
minimize compatibility headaches for existing project code. That way more
things that build and run on 3.12 build can run on 3.13 as is or with
minimal work.</p>
<p>This will enable ecosystem code owners to focus on the bigger picture task
of enabling existing code to be built and tested on an experimental pep703
free-threading build rather than having a pile of unrelated cleanup trivia
blocking that.</p>
</blockquote>
<p>On November 7th, my colleague <strong>Karolina Surma</strong> posted a report: <a class="reference external" href="https://discuss.python.org/t/ongoing-packages-rebuild-with-python-3-13-in-fedora/38134">Ongoing packages'
rebuild with Python 3.13 in Fedora</a>.
She did great bug triage work, counting build failures per C API issue while
recompiling 4,000+ Python packages in Fedora with Python 3.13.</p>
<p>On November 13th, <strong>Petr</strong> also identified that the private PyComplex API, such as
the <tt class="docutils literal">_Py_c_sum()</tt> function, was documented. Moreover, <a class="reference external" href="https://github.com/python/cpython/issues/112019">issue gh-112019</a> was created to ask to
revert these APIs.</p>
</div>
<div class="section" id="revert-in-python-3-13-alpha-2">
<h2>Revert in Python 3.13 alpha 2</h2>
<p>On November 13th, I created <a class="reference external" href="https://github.com/python/cpython/issues/112026">issue gh-112026</a>: "[C API] Revert of private
functions removed in Python 3.13 causing most problems". I made 4 changes:</p>
<ul class="simple">
<li>Add again <tt class="docutils literal"><unistd.h></tt> include in Python.h</li>
<li>Restore removed private C API</li>
<li>Restore removed _PyDict_GetItemStringWithError()</li>
<li>Add again _PyThreadState_UncheckedGet() function</li>
</ul>
<p>I selected functions by looking at bug reports, <strong>Karolina</strong>'s report, and by
trying to build numpy and cffi. With my reverts, numpy built successfully, and
cffi built successfully with a minor change that I reported upstream
(<a class="reference external" href="https://github.com/python-cffi/cffi/pull/34">cffi: Use PyErr_FormatUnraisable() on Python 3.13</a>).</p>
<p>In total, I restored <a class="reference external" href="https://github.com/python/cpython/issues/112026#issuecomment-1813191948">50 private functions</a>.</p>
<p>On November 22nd, <strong>Python 3.13 alpha 2 was released</strong> with these restored
functions. It seems like the situation is calmer now.</p>
<p>Reverting was part of my initial plan; it had been clearly announced from the
beginning. But I didn't expect that so many people would test Python 3.13 alpha
1 as soon as it was released (in October)! Usually, we only start to get feedback
around beta 1 (in May). I had like <strong>2 weeks to fix most issues instead of 7
months</strong>. It was really stressful for me.</p>
<p>I <a class="reference external" href="https://discuss.python.org/t/python-3-13-alpha-1-contains-breaking-changes-whats-the-plan/37490/29">posted a message to apologize</a>
and to give the context of this work. Extract:</p>
<blockquote>
<p>Following the announced plan, I reverted 50 private APIs which were
removed in Python 3.13 alpha 1. These APIs will be available again in the
incoming Python 3.13 alpha 2 (scheduled next Tuesday).</p>
<p>I <strong>planned to make Cython, numpy and cffi compatible</strong> with Python 3.13
<strong>alpha 1</strong>. Well, I missed this release. With the reverted changes, numpy
1.26.2 can be built successfully, and cffi 1.16.0 just requires a single
change. So we should be good (or almost good) for Python 3.13
<strong>alpha 2</strong>.</p>
<p>(...)</p>
<p>I’m sorry if some people felt that this C API work was forced on them and
their opinion was not taken into account. We heard you and we took your
feedback into account. It took me time to adjust my plan according to early
received feedback. I expected to have 6 months to work step by step. Well,
I had 2 weeks instead 🙂</p>
</blockquote>
</div>
<div class="section" id="add-public-functions">
<h2>Add public functions</h2>
<p>On October 30th, I created <a class="reference external" href="https://github.com/python/cpython/issues/111481">issue gh-111481</a>: "[C API] Meta issue: add
new public functions with doc+tests to replace removed private functions".</p>
<p>So far, I added 7 public functions to Python 3.13:</p>
<ul class="simple">
<li><tt class="docutils literal">PyDict_Pop()</tt></li>
<li><tt class="docutils literal">PyDict_PopString()</tt></li>
<li><tt class="docutils literal">PyList_Clear()</tt></li>
<li><tt class="docutils literal">PyList_Extend()</tt></li>
<li><tt class="docutils literal">PyLong_AsInt()</tt></li>
<li><tt class="docutils literal">Py_HashPointer()</tt></li>
<li><tt class="docutils literal">Py_IsFinalizing()</tt></li>
</ul>
<p>More functions are coming soon: I have many open pull requests!</p>
<p>Adding new functions is slower than I expected. The good part is that many
people are reviewing the APIs, and the new public APIs are way better than
the old private ones: less error-prone, can be more efficient, etc. At least,
the conversion from private to public is moving steadily: functions are added
one by one.</p>
</div>
Design the API of a new PyDict_GetItemRef() function2023-11-16T20:00:00+01:002023-11-16T20:00:00+01:00Victor Stinnertag:vstinner.github.io,2023-11-16:/c-api-dict-getitemref.html<p>Last June, I proposed adding a new <tt class="docutils literal">PyDict_GetItemRef()</tt> function to Python
3.13 C API. Every aspect of the API design was discussed at length. I will
explain how the API was designed, and finish with the future creation of the
C API Working Group.</p>
<img alt="Psyche Revived by Cupid's Kiss" src="https://vstinner.github.io/images/amour_psychee.jpg" />
<p>Photo: <em>Psyche Revived by Cupid's Kiss</em> sculpture by Antonio Canova.</p>
<div class="section" id="add-pyimport-addmoduleref-function">
<h2>Add PyImport_AddModuleRef() function</h2>
<p>In June, while reading Python C code, I found <a class="reference external" href="https://github.com/python/cpython/blob/8cd70eefc7f3363cfa0d43f34522c3072fa9e160/Python/import.c#L345-L369">surprising code</a>:
the <tt class="docutils literal">PyImport_AddModuleObject()</tt> function creates a <strong>weak reference</strong> to the
module returned by <tt class="docutils literal">import_add_module()</tt>, calls <tt class="docutils literal">Py_DECREF()</tt> on the module,
and then tries to get the module back from the weak reference: it can be NULL if
the reference count was one. I expected just a <tt class="docutils literal">Py_DECREF()</tt>, but no,
complicated code involving a weak reference is needed to prevent a crash.</p>
<p>So I <a class="reference external" href="https://github.com/python/cpython/issues/105922">added</a> the new
<a class="reference external" href="https://docs.python.org/dev/c-api/import.html#c.PyImport_AddModuleRef">PyImport_AddModuleRef() function</a> to
directly return the strong reference, avoiding the creation of a temporary
weak reference.</p>
<p>Note: The API of the new PyImport_AddModuleRef() function is <a class="reference external" href="https://github.com/python/cpython/issues/106915">still being
discussed and may change in the near future</a>.</p>
</div>
<div class="section" id="add-pyweakref-getref-function">
<h2>Add PyWeakref_GetRef() function</h2>
<p>Shortly after, I <a class="reference external" href="https://github.com/python/cpython/issues/105927">added</a> the
new <a class="reference external" href="https://docs.python.org/dev/c-api/weakref.html#c.PyWeakref_GetRef">PyWeakref_GetRef() function</a>. It is
similar to <tt class="docutils literal">PyWeakref_GetObject()</tt>, but returns a strong reference instead of
a borrowed reference.</p>
<p>Since I listed <a class="reference external" href="https://pythoncapi.readthedocs.io/bad_api.html#borrowed-references">Bad C API</a> in my
"Design a new better C API for Python" project in 2018, I have been fighting
against borrowed references, since they cause multiple issues such as:</p>
<ul class="simple">
<li>Subtle crashes in C extensions.</li>
<li>Make the C API implementation in PyPy more complicated: see
<a class="reference external" href="https://www.pypy.org/posts/2018/09/inside-cpyext-why-emulating-cpython-c-8083064623681286567.html">Inside cpyext: Why emulating CPython C API is so Hard</a>
(2018) by Antonio Cuni.</li>
<li>Unknown object lifetimes, preventing optimization opportunities.</li>
<li>Make the C API less regular and harder to use: some functions return a new
reference, others return a borrowed reference.</li>
</ul>
<p>In 2020, my first attempt to <a class="reference external" href="https://github.com/python/cpython/issues/86460">add a new PyTuple_GetItemRef() function</a> was rejected.</p>
</div>
<div class="section" id="pydict-getitemref-easy">
<h2>PyDict_GetItemRef(): easy!</h2>
<p>Since adding the
<tt class="docutils literal">PyImport_AddModuleRef()</tt> and <tt class="docutils literal">PyWeakref_GetRef()</tt> functions went well
(quick discussion, no major disagreement), I felt lucky and
proposed <a class="reference external" href="https://github.com/python/cpython/issues/106004">adding a new PyDict_GetItemRef() function</a>. It should be easy as well,
right? The discussion started in the issue and continued in the associated
<a class="reference external" href="https://github.com/python/cpython/pull/106005">pull request</a>.</p>
<p>The idea of <tt class="docutils literal">PyDict_GetItemRef()</tt> is to replace the <tt class="docutils literal">PyDict_GetItem()</tt>
function, which returns a borrowed reference and ignores all errors:
<tt class="docutils literal">hash(key)</tt> error, <tt class="docutils literal">key == key2</tt> comparison error, <tt class="docutils literal">KeyboardInterrupt</tt>,
etc.</p>
<p>There is already the <tt class="docutils literal">PyDict_GetItemWithError()</tt> function which reports
errors. But it returns a borrowed reference and its API has an issue: when it
returns <tt class="docutils literal">NULL</tt>, the caller must check <tt class="docutils literal">PyErr_Occurred()</tt> to know whether an
exception is set or the key is simply missing. This problem was the <a class="reference external" href="https://github.com/capi-workgroup/problems/issues/1">very first
issue</a> created in the
Problems project of the C API Working Group.</p>
<p>This Problems project is a collaborative work to collect C API issues. By the
way, the <a class="reference external" href="https://peps.python.org/pep-0733/">PEP 733 – An Evaluation of Python’s Public C API</a> was published on October 16: a summary of
these problems.</p>
</div>
<div class="section" id="pydict-getitemref-api-version-1">
<h2>PyDict_GetItemRef(): API version 1</h2>
<p>I proposed the API:</p>
<pre class="literal-block">
int PyDict_GetItemRef(PyObject *mp, PyObject *key, PyObject **pvalue)
int PyDict_GetItemStringRef(PyObject *mp, const char *key, PyObject **pvalue)
</pre>
<p>Return <tt class="docutils literal">0</tt> on success, or <tt class="docutils literal"><span class="pre">-1</span></tt> on error. Simple, right?</p>
<p><strong>Gregory Smith</strong> was supportive:</p>
<blockquote>
I'm in favor of this because I don't think we should have public APIs that
(a) require a value check + <tt class="docutils literal">PyErr_Occurred()</tt> call pattern - a frequent
source of lurking bugs - or (b) return borrowed references. Yes I know we
already have them, that's missing the point. The point is that with these
in place, we can promote their use over the others because these are better
in all respects.</blockquote>
<p>Later, I discovered that the draft <a class="reference external" href="https://peps.python.org/pep-0703/">PEP 703 – Making the Global Interpreter
Lock Optional in CPython</a> proposed adding
a <tt class="docutils literal">PyDict_FetchItem()</tt> similar to my proposed <tt class="docutils literal">PyDict_GetItemRef()</tt>
function.</p>
</div>
<div class="section" id="api-version-2-change-the-return-value">
<h2>API version 2: Change the Return Value</h2>
<p><strong>Mark Shannon</strong> asked:</p>
<blockquote>
What's the rationale for not distinguishing between found and not found in
the return value? See: <a class="reference external" href="https://github.com/python/devguide/issues/1121">Document the preferred style for API functions with
three, four or five-way returns</a>.</blockquote>
<p>I modified the API to return <tt class="docutils literal">1</tt> if the key is present and return <tt class="docutils literal">0</tt> if
the key is missing.</p>
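<p>The resulting calling convention can be sketched with a small self-contained
stub (a hypothetical stand-in written for this article, not CPython's real
code): <tt class="docutils literal"><span class="pre">-1</span></tt> means error, <tt class="docutils literal">0</tt> means missing key, <tt class="docutils literal">1</tt> means found:</p>
<pre class="literal-block">
#include &lt;stddef.h&gt;
#include &lt;string.h&gt;

/* Hypothetical stub modeling the PyDict_GetItemRef() calling convention:
 * return -1 on error, 0 if the key is missing, 1 if found. On success,
 * the value is stored in *result; in the real API, it is a strong
 * reference that the caller must release with Py_DECREF(). */
static const char *stored_key = "answer";
static const char *stored_value = "42";

static int
stub_get_item_ref(const char *key, const char **result)
{
    *result = NULL;
    if (key == NULL) {
        return -1;   /* model a hash(key) or comparison error */
    }
    if (strcmp(key, stored_key) != 0) {
        return 0;    /* missing key: no error */
    }
    *result = stored_value;
    return 1;        /* found */
}
</pre>
<p>With this convention, the caller never needs an extra <tt class="docutils literal">PyErr_Occurred()</tt> call
to distinguish a missing key from an error.</p>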
<p>By the way, <strong>Erlend Aasland</strong> added <a class="reference external" href="https://devguide.python.org/developer-workflow/c-api/index.html#guidelines-for-expanding-changing-the-public-api">C API guidelines</a>
in the Python Developer Guide (devguide) about function return values.</p>
</div>
<div class="section" id="function-name">
<h2>Function Name</h2>
<p><strong>Serhiy Storchaka</strong> had concerns about the name:</p>
<blockquote>
The only problem is that functions with so similar names have completely
different interface. It is pretty confusing. Would not be better to name it
<tt class="docutils literal">PyDict_LookupItem</tt> or like? It may be worth to add also
<tt class="docutils literal">PyMapping_LookupItem</tt> for convenience.</blockquote>
<p><strong>Mark Shannon</strong> added:</p>
<blockquote>
<p>Can we come up with a better name than <tt class="docutils literal">PyDict_GetItemRef</tt>?
I see why you are adding <tt class="docutils literal">Ref</tt> to the end, but all API functions should
return new references, so it is a bit like calling the function
PyDict_GetItemNotWrong.</p>
<p>Obviously, the ideal name [<tt class="docutils literal">PyDict_GetItem()</tt>] is already taken. Anyone
have any suggestions for a better name?</p>
</blockquote>
<p><strong>Sam Gross</strong> wrote:</p>
<blockquote>
<p>In the context of PEP 703, I think it would be better to have variations
that only change one axis of the semantics (e.g., new vs. borrowed, error
vs. no error) and have the naming reflect that. For example, PEP 703
proposes:</p>
<p><tt class="docutils literal">PyDict_FetchItem</tt> for <tt class="docutils literal">PyDict_GetItem</tt> and
<tt class="docutils literal">PyDict_FetchItemWIthError</tt> for <tt class="docutils literal">PyDict_GetItemWithError</tt>.</p>
</blockquote>
<p>I created <a class="reference external" href="https://github.com/capi-workgroup/problems/issues/52">Naming convention for new C API functions</a> to discuss the <tt class="docutils literal">Ref</tt>
suffix for new functions returning a strong reference.</p>
<p>PEP 703 proposes the <tt class="docutils literal">PyDict_FetchItem()</tt> name.</p>
</div>
<div class="section" id="first-argument-type">
<h2>First Argument Type</h2>
<p><strong>Mark Shannon</strong> had concerns about the first argument type:</p>
<blockquote>
Using <tt class="docutils literal">PyObject*</tt> is needlessly throwing away type information.</blockquote>
<p><strong>Erlend Aasland</strong> added:</p>
<blockquote>
Why not strongly typed, since it is a <tt class="docutils literal">PyDict_</tt> API?</blockquote>
</div>
<div class="section" id="pull-request-approvals-and-the-function-name-strikes-back">
<h2>Pull Request Approvals And The Function Name Strikes Back</h2>
<p><strong>Erlend</strong> and <strong>Gregory</strong> approved my pull request.</p>
<p><strong>Erlend</strong> wrote:</p>
<blockquote>
I'm approving this. A new naming scheme makes sense for a new API; I'm not
sure it makes sense to try and enforce a new scheme in the current API. For
now, there is already precedence of the <tt class="docutils literal">Ref</tt> suffix in the current API;
I'm ok with that. Also, the current API uses <tt class="docutils literal">PyObject*</tt> all over the
place. If we are to change this, we practically will end up with a
completely new API; AFAICS, there is no problem with sticking to the
current practice.</blockquote>
<p>Then the discussion about the function name came back. So <strong>Gregory</strong> asked the
Steering Council: <a class="reference external" href="https://github.com/python/steering-council/issues/201">Should we add non-borrowed-ref public C APIs, if
so, is there a naming convention?</a>. He asked two
questions:</p>
<ul class="simple">
<li>Q1: Should we add non-borrowed-reference public C APIs where only
borrowed-reference ones exist?</li>
<li>Q2: If yes to Q1, is there a preferred naming convention to use for new
public C APIs that return a strong reference when the earlier APIs these
would be parallel versions of only returned a borrowed reference?</li>
</ul>
<p>Later, <strong>Serhiy Storchaka</strong> also approved the pull request:</p>
<blockquote>
<p>In general, I support adding this function. The benefits:</p>
<ul class="simple">
<li>Returns a strong reference. It will save from some errors and may be
better for PyPy.</li>
<li>Save CPU time for calling PyErr_Occurred().</li>
</ul>
</blockquote>
<p>The PR had a total of 3 approvals.</p>
</div>
<div class="section" id="api-version-3-use-pydictobject">
<h2>API version 3: use PyDictObject</h2>
<p>When I asked <strong>Mark</strong> again for his opinion on the API, he wrote:</p>
<blockquote>
I'm opposed because making ad-hoc changes like this is going to make the
C-API worse, not better.</blockquote>
<p>I made the change <strong>Mark</strong> asked for: I changed the first parameter type from
<tt class="docutils literal">PyObject*</tt> to <tt class="docutils literal">PyDictObject*</tt>. API version 3:</p>
<pre class="literal-block">
int PyDict_GetItemRef(PyDictObject *op, PyObject *key, PyObject **pvalue)
</pre>
</div>
<div class="section" id="disagreement-on-the-pydictobject-type">
<h2>Disagreement On The PyDictObject Type</h2>
<p><strong>Serhiy</strong> was against the change:</p>
<blockquote>
I dislike using concrete struct types instead of <tt class="docutils literal">PyObject*</tt> in API,
especially in public API. Isn't there a rule forbidding this?</blockquote>
<p>In May, <strong>Mark</strong> created <a class="reference external" href="https://github.com/capi-workgroup/problems/issues/31">The C API is weakly typed</a> discussion in the
Problems project.</p>
<p>During the discussion, <strong>Erlend</strong> created <a class="reference external" href="https://github.com/python/devguide/issues/1127">Document guidelines for when to use
dynamically typed APIs</a> in
the devguide to try to find a consensus regarding guidelines for weakly/strongly
typed APIs.</p>
<p>There are two questions:</p>
<ul class="simple">
<li>Use <tt class="docutils literal">PyObject*</tt> or <tt class="docutils literal">PyDictObject*</tt> type for the parameter.</li>
<li>Check the type at runtime, or don't check for best performance (use an
assertion in debug mode).</li>
</ul>
<p><strong>Serhiy</strong> wrote:</p>
<blockquote>
<p>It is not about runtime checking.</p>
<p>It is about requiring to cast the argument to <tt class="docutils literal">PyDictObject*</tt> every time
you use the function: <tt class="docutils literal"><span class="pre">PyDict_GetItemRef((PyDictObject*)foo,</span> bar, &baz)</tt>.</p>
<p>It is tiresome, and it is unsafe, because the compiler will not reject the
code if <tt class="docutils literal">foo</tt> is <tt class="docutils literal">int</tt> or <tt class="docutils literal">const char*</tt>.</p>
</blockquote>
<p><strong>Gregory</strong> added:</p>
<blockquote>
Our C API only accepts plain <tt class="docutils literal">PyObject*</tt> as input to all our public
APIs. Otherwise user code will be littered with typecasts all over the
place.</blockquote>
<p><strong>Gregory</strong> removed his approval.</p>
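<p>The trade-off can be illustrated with a self-contained sketch (stub types
invented for this article, not the real CPython structs): a strongly typed
parameter forces a cast at each call site, and the cast silences the compiler
even when the pointer has the wrong type, while a weakly typed parameter can
check the type at runtime:</p>
<pre class="literal-block">
/* Stub types playing the roles of PyObject and PyDictObject. */
typedef struct { int kind; } ObjStub;
typedef struct { ObjStub base; int used; } DictStub;

/* Strongly typed: callers holding an ObjStub* must cast at each call,
 * and the cast also compiles for pointers which are not DictStub*. */
static int
used_strong(DictStub *mp)
{
    return mp->used;
}

/* Weakly typed: no cast needed, the type is checked at runtime. */
static int
used_weak(ObjStub *mp)
{
    if (mp->kind != 1) {
        return -1;   /* runtime type check failed */
    }
    return ((DictStub *)mp)->used;
}
</pre>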
</div>
<div class="section" id="revert-back-to-pyobject-type-api-version-2">
<h2>Revert: Back To PyObject Type (API Version 2)</h2>
<p>Since <strong>Serhiy</strong> and <strong>Gregory</strong> were against the change, I reverted it to move
back to the <tt class="docutils literal">PyObject*</tt> type. <strong>Serhiy</strong> and <strong>Erlend</strong> confirmed their
approval.</p>
<p>I created the issue <a class="reference external" href="https://github.com/capi-workgroup/problems/issues/55">Design a brand new C API with new PyCAPI_ prefix where all
functions respect new guidelines</a> in the Problems
project to discuss the creation of a brand new API. I suggested that <strong>Mark</strong>
only consider changing the weakly typed <tt class="docutils literal">PyObject*</tt> type to the strongly typed
<tt class="docutils literal">PyDictObject*</tt> in such a new API.</p>
</div>
<div class="section" id="more-changes-api-version-4">
<h2>More changes? API version 4</h2>
<p><strong>Petr Viktorin</strong> joined the discussion and proposed a late change:</p>
<blockquote>
FWIW, here's a possible new variant: you could set result to <tt class="docutils literal">NULL</tt> in
which case the result isn't stored/incref'd. And that would start a
convention of how to turn a get operation into a membership test. (And the
Lookup name would fit that better.)</blockquote>
<p>I didn't take <strong>Petr</strong>'s suggestion since <strong>Serhiy</strong> pointed out that there is
already the <tt class="docutils literal">PyDict_Contains()</tt> function to test whether a dictionary contains a
key.</p>
<p><strong>Mark Shannon</strong> wrote:</p>
<blockquote>
If this function is to take <tt class="docutils literal">PyObject*</tt>, as <strong>Erlend</strong> seems to insist,
then it shouldn't raise a <tt class="docutils literal">SystemError</tt> when passed something other than
a dict. It should raise a <tt class="docutils literal">TypeError</tt>.</blockquote>
<p>I modified the API (version 4) to raise <tt class="docutils literal">TypeError</tt> if the first argument
is not a dictionary, instead of <tt class="docutils literal">SystemError</tt>.</p>
</div>
<div class="section" id="merge-the-change">
<h2>Merge The Change</h2>
<p>After around one month of intense discussion, I merged my change adding the
<tt class="docutils literal">PyDict_GetItemRef()</tt> function (<a class="reference external" href="https://github.com/python/cpython/commit/41ca16455188db806bfc7037058e8ecff2755e6c">commit</a>)
with <a class="reference external" href="https://github.com/python/cpython/pull/106005#issuecomment-1646249360">a summary of the discussion</a>.</p>
<p>I also <a class="reference external" href="https://github.com/python/pythoncapi-compat/commit/eaff3c172f94ed32ac38860c38d7a8fa27483e57">added the function to pythoncapi-compat project</a>.</p>
<p>Final API:</p>
<pre class="literal-block">
int PyDict_GetItemRef(PyObject *p, PyObject *key, PyObject **result)
int PyDict_GetItemStringRef(PyObject *p, const char *key, PyObject **result)
</pre>
<p>Documentation:</p>
<ul class="simple">
<li><a class="reference external" href="https://docs.python.org/dev/c-api/dict.html#c.PyDict_GetItemRef">PyDict_GetItemRef</a></li>
<li><a class="reference external" href="https://docs.python.org/dev/c-api/dict.html#c.PyDict_GetItemStringRef">PyDict_GetItemStringRef</a></li>
</ul>
<p>Using the <a class="reference external" href="https://pythoncapi-compat.readthedocs.io/">pythoncapi-compat project</a>, you can use this new API right
now on all Python versions!</p>
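<p>Here is a sketch of the caller-side pattern (the <tt class="docutils literal">get_answer()</tt> helper is a
made-up example, not a CPython function):</p>
<pre class="literal-block">
#include &lt;Python.h&gt;

/* Sketch: look up a key and convert the value to a C long.
 * Return -1 on error (exception set), 0 if the key is missing,
 * 1 on success. */
static int
get_answer(PyObject *dict, const char *key, long *out)
{
    PyObject *value;
    int rc = PyDict_GetItemStringRef(dict, key, &value);
    if (rc <= 0) {
        return rc;       /* error (-1) or missing key (0) */
    }
    *out = PyLong_AsLong(value);
    Py_DECREF(value);    /* release our strong reference */
    if (*out == -1 && PyErr_Occurred()) {
        return -1;
    }
    return 1;
}
</pre>
<p>No <tt class="docutils literal">PyErr_Occurred()</tt> call is needed to distinguish a missing key from an
error, and there is no borrowed reference which could be invalidated while the
dictionary is mutated.</p>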
</div>
<div class="section" id="how-to-take-decisions">
<h2>How To Take Decisions?</h2>
<p>The discussions took place in many different places:</p>
<ul class="simple">
<li>My Python issue</li>
<li>My Python pull request</li>
<li>Multiple Problems issues</li>
<li>Multiple devguide issues</li>
<li>Steering Council issue</li>
</ul>
<p>The discussion was heated. <strong>Erlend</strong> decided to take a break:</p>
<blockquote>
I'm taking a break from the C API discussions; I'm removing myself from
this PR for now</blockquote>
<p>While the change was approved by 3 core developers, there was not strictly a
consensus since <strong>Mark</strong> did not formally approve the change. Some people asked
to wait until general guidelines for new APIs were decided, <strong>before</strong>
making further C API changes.</p>
<p><strong>Gregory</strong> opened a Steering Council issue on July 2. I asked for an update
on July 17. Three meetings later, they had not had the opportunity to discuss the
question. They were busy discussing the heavy <a class="reference external" href="https://peps.python.org/pep-0703/">PEP 703 – Making the Global
Interpreter Lock Optional in CPython</a>. I
merged my change before the Steering Council spoke up. I proposed to revert
the change if needed. On July 25, <strong>Gregory</strong> replied in the name of the
Steering Council:</p>
<blockquote>
The steering council chatted about non-borrowed-ref and naming conventions
today. We want to <strong>delegate</strong> this to the <strong>C API working group</strong> to come
back with a broader recommendation. <strong>Irit Katriel</strong> has put together the
initial draft of <a class="reference external" href="https://github.com/capi-workgroup/problems/blob/main/capi_problems.rst">An Evaluation of Python's Public C API</a>
for example.</blockquote>
<p>The problem was that the C API Working Group was just a GitHub organization; it
was not an organized group with designated members.</p>
</div>
<div class="section" id="c-api-working-group">
<h2>C API Working Group</h2>
<p>From October 9 to 14, there was a Core Dev Sprint in Brno (Czech Republic). I
gave a talk about the C API status and my C API agenda: <a class="reference external" href="https://github.com/vstinner/talks/blob/main/2023-CoreDevSprint-Brno/c-api.pdf">slides of my C API
talk</a>.
At the end, I called for the creation of a formal C API Working Group to unblock
the situation.</p>
<p>During the sprint, after my talk, <strong>Guido van Rossum</strong> wrote <a class="reference external" href="https://peps.python.org/pep-0731/">PEP 731 – C API
Working Group Charter</a> with 5 members:</p>
<ul class="simple">
<li><strong>Steve Dower</strong></li>
<li><strong>Irit Katriel</strong></li>
<li><strong>Guido van Rossum</strong></li>
<li><strong>Victor Stinner</strong> (me)</li>
<li><strong>Petr Viktorin</strong></li>
</ul>
<p>Once the PEP was published, it was <a class="reference external" href="https://discuss.python.org/t/pep-731-c-api-working-group-charter/36117">discussed on discuss.python.org</a>.
Two weeks later, <strong>Guido</strong> submitted the PEP to the Steering Council: <a class="reference external" href="https://github.com/python/steering-council/issues/210">PEP 731
-- C API Working Group Charter</a>.</p>
<p>The Steering Council has not made a decision yet. Previously, the Steering
Council expressed its desire to delegate some C API decisions to a C API
Working Group.</p>
</div>
My contributions to Python (July 2023)2023-07-08T23:00:00+02:002023-07-08T23:00:00+02:00Victor Stinnertag:vstinner.github.io,2023-07-08:/contrib-python-july-2023.html<p>In 2023, between May 4 and July 8, I made 144 commits in the Python main
branch. In this article, I describe the most important Python contributions
that I made to Python 3.12 and Python 3.13 in these months.</p>
<a class="reference external image-reference" href="https://twitter.com/foxes_in_love/status/1668558475490742277"><img alt="Foxes in Love: Cuddle" src="https://vstinner.github.io/images/foxes_in_love_cuddle.jpg" /></a>
<p><em>Drawing: Foxes in Love: Cuddle</em></p>
<div class="section" id="summary">
<h2>Summary</h2>
<ul class="simple">
<li>Add PyImport_AddModuleRef() and PyWeakref_GetRef().</li>
<li>Py_INCREF() and Py_DECREF() become opaque function calls in the limited C API.</li>
<li>PyList_SET_ITEM() and PyTuple_SET_ITEM() now check index bounds.</li>
<li>Define "Soft Deprecation" in PEP 387; getopt and optparse are soft
deprecated.</li>
<li>Document how to replace imp with importlib.</li>
<li>Remove 19 stdlib modules.</li>
<li>Remove locale.resetlocale() and logging.Logger.warn().</li>
<li>Remove 181 private C API functions.</li>
</ul>
</div>
<div class="section" id="pep-594">
<h2>PEP 594</h2>
<p>In Python 3.13, I removed 19 modules deprecated in Python 3.11 by PEP 594:</p>
<ul class="simple">
<li>aifc</li>
<li>audioop</li>
<li>cgi</li>
<li>cgitb</li>
<li>chunk</li>
<li>crypt</li>
<li>imghdr</li>
<li>mailcap</li>
<li>nis</li>
<li>nntplib</li>
<li>ossaudiodev</li>
<li>pipes</li>
<li>sndhdr</li>
<li>spwd</li>
<li>sunau</li>
<li>telnetlib</li>
<li>uu</li>
<li>xdrlib</li>
</ul>
<p><em>Zachary Ware</em> removed the last deprecated module, msilib, so PEP 594 is
now fully implemented in Python 3.13!</p>
<p>I announced the change: <a class="reference external" href="https://discuss.python.org/t/pep-594-has-been-implemented-python-3-13-removes-20-stdlib-modules/27124">PEP 594 has been implemented: Python 3.13 removes 20
stdlib modules</a>.</p>
<p>Removing imghdr caused me some trouble with building the Python documentation.
The Sphinx version we used relied on imghdr, but recent Sphinx versions no
longer use it. I updated the Sphinx version to work around this issue.</p>
</div>
<div class="section" id="c-api-strong-reference">
<h2>C API: Strong reference</h2>
<p><strong>tl; dr I added PyImport_AddModuleRef() and PyWeakref_GetRef() to Python 3.13
to return strong references, instead of borrowed references.</strong></p>
<p>When I <a class="reference external" href="https://pythoncapi.readthedocs.io/">analyzed issues of the Python C API</a>, I quickly identified that the usage of
borrowed references causes a lot of trouble. By the way, I recently
updated the <a class="reference external" href="https://pythoncapi.readthedocs.io/bad_api.html#functions">list of the 41 functions returning borrowed references</a>. This issue is
also tracked as <a class="reference external" href="https://github.com/capi-workgroup/problems/issues/21">Returning borrowed references is fundamentally unsafe</a> in the recently
created <a class="reference external" href="https://github.com/capi-workgroup/problems/">Problems</a> project of
the new C API workgroup.</p>
<p>In Python 3.10, I added the <tt class="docutils literal">Py_NewRef()</tt> and <tt class="docutils literal">Py_XNewRef()</tt> functions, which
have better semantics: they create a new strong reference to a Python object.
I also added the <tt class="docutils literal">PyModule_AddObjectRef()</tt> function, variant of
<tt class="docutils literal">PyModule_AddObject()</tt>, which returns a strong reference. And I added
<a class="reference external" href="https://docs.python.org/dev/glossary.html#term-borrowed-reference">borrowed reference</a> and
<a class="reference external" href="https://docs.python.org/dev/glossary.html#term-strong-reference">strong reference</a> terms to
the glossary.</p>
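<p>A minimal sketch of the <tt class="docutils literal">Py_NewRef()</tt> idiom (the <tt class="docutils literal">first_item_ref()</tt> helper is a
made-up example, not a CPython function):</p>
<pre class="literal-block">
#include &lt;Python.h&gt;

/* Return the first item of a list as a strong reference, using
 * Py_NewRef() instead of the "Py_INCREF(item); return item;" idiom. */
static PyObject *
first_item_ref(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);  /* borrowed reference */
    if (item == NULL) {
        return NULL;   /* IndexError on an empty list */
    }
    return Py_NewRef(item);   /* create a new strong reference */
}
</pre>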
<p>In Python 3.13, I added two functions:</p>
<ul class="simple">
<li><strong>PyImport_AddModuleRef()</strong>: variant of <tt class="docutils literal">PyImport_AddModule()</tt></li>
<li><strong>PyWeakref_GetRef()</strong>: variant of <tt class="docutils literal">PyWeakref_GetObject()</tt>.
I also deprecated <tt class="docutils literal">PyWeakref_GetObject()</tt> and <tt class="docutils literal">PyWeakref_GET_OBJECT()</tt>
functions.</li>
</ul>
<p>I updated pythoncapi-compat to <a class="reference external" href="https://pythoncapi-compat.readthedocs.io/en/latest/api.html#python-3-13">provide these functions to Python 3.12 and
older</a>.</p>
<p>I also added <tt class="docutils literal">Py_TYPE()</tt> to <tt class="docutils literal">Doc/data/refcounts.dat</tt>, the file listing how C
functions handle references; it's maintained manually.</p>
<p>Now I'm working on adding <strong>PyDict_GetItemRef()</strong> but the API and the function
name are causing more friction: see the <a class="reference external" href="https://github.com/python/cpython/pull/106005">pull request</a>. Recently,
PyDict_GetItemRef() API was raised to the Steering Council:
<a class="reference external" href="https://github.com/python/steering-council/issues/201">decision: Should we add non-borrowed-ref public C APIs, if so, is there a
naming convention?</a></p>
</div>
<div class="section" id="c-api-pylist-set-item">
<h2>C API: PyList_SET_ITEM()</h2>
<p><strong>tl;dr In Python 3.13, PyList_SET_ITEM() and PyTuple_SET_ITEM() now check
index bounds.</strong></p>
<p>In Python 3.9, <tt class="docutils literal">Include/cpython/listobject.h</tt> was created for the PyList API
excluded from the limited C API. <tt class="docutils literal">PyList_SET_ITEM()</tt> was implemented as:</p>
<pre class="literal-block">
#define PyList_SET_ITEM(op, i, v) (_PyList_CAST(op)->ob_item[i] = (v))
</pre>
<p>In Python 3.10, the <a class="reference external" href="https://github.com/python/cpython/issues/74644">return value was removed to fix a bug</a> by adding a <tt class="docutils literal">(void)</tt> cast:
<pre class="literal-block">
#define PyList_SET_ITEM(op, i, v) ((void)(_PyList_CAST(op)->ob_item[i] = (v)))
</pre>
<p>In Python 3.11, <a class="reference external" href="https://peps.python.org/pep-0670/">PEP 670: Convert macros to functions in the Python C API</a> was accepted and I converted the macro to
a static inline function:</p>
<pre class="literal-block">
static inline void
PyList_SET_ITEM(PyObject *op, Py_ssize_t index, PyObject *value) {
    PyListObject *list = _PyList_CAST(op);
    list->ob_item[index] = value;
}
</pre>
<p>I tried to add an assertion in <tt class="docutils literal">PyTuple_SET_ITEM()</tt> to check index bounds,
but I got assertion failures when running the Python test suite, related to
PyStructSequence, which inherits from PyTuple.</p>
<p>Recently, I tried again. I updated the PyStructSequence API to check the index
bounds differently. The tricky part is that getting the number of fields of a
PyStructSequence requires getting an item from a dictionary, and
<tt class="docutils literal">PyDict_GetItemWithError()</tt> can raise an exception. Moreover,
<tt class="docutils literal">PyStructSequence_SET_ITEM()</tt> was still implemented as a macro in Python
3.12:</p>
<pre class="literal-block">
#define PyStructSequence_SET_ITEM(op, i, v) PyTuple_SET_ITEM((op), (i), (v))
</pre>
<p>Old PyStructSequence_SetItem() implementation:</p>
<pre class="literal-block">
void
PyStructSequence_SetItem(PyObject* op, Py_ssize_t i, PyObject* v)
{
    PyStructSequence_SET_ITEM(op, i, v);
}
</pre>
<p>New implementation:</p>
<pre class="literal-block">
void
PyStructSequence_SetItem(PyObject *op, Py_ssize_t index, PyObject *value)
{
    PyTupleObject *tuple = _PyTuple_CAST(op);
    assert(0 <= index);
#ifndef NDEBUG
    Py_ssize_t n_fields = REAL_SIZE(op);
    assert(n_fields >= 0);
    assert(index < n_fields);
#endif
    tuple->ob_item[index] = value;
}
</pre>
<p>The <tt class="docutils literal">REAL_SIZE()</tt> macro is only available in <tt class="docutils literal">Objects/structseq.c</tt>.
Exposing it in the public C API would be a bad idea, so I just converted the
PyStructSequence_SET_ITEM() macro into an alias of PyStructSequence_SetItem():</p>
<pre class="literal-block">
#define PyStructSequence_SET_ITEM PyStructSequence_SetItem
</pre>
<p>This way, PyStructSequence_SET_ITEM() and PyStructSequence_SetItem() are
implemented as opaque function calls.</p>
<p>So it became possible to check index bounds in PyList_SET_ITEM():</p>
<pre class="literal-block">
static inline void
PyList_SET_ITEM(PyObject *op, Py_ssize_t index, PyObject *value) {
    PyListObject *list = _PyList_CAST(op);
    assert(0 <= index);
    assert(index < Py_SIZE(list));
    list->ob_item[index] = value;
}
</pre>
<p>I had to modify code which called PyList_SET_ITEM() <em>before</em> setting the list
size: the list_extend() and _PyList_AppendTakeRef() functions. The size is now
set before PyList_SET_ITEM() is called.</p>
<p>I made a similar change to <tt class="docutils literal">PyTuple_SET_ITEM()</tt> to also check the index.</p>
<p>These bounds checks are implemented with assertions: they are only enabled if
Python is built in debug mode or built with assertions.</p>
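<p>The effect of the new check can be modeled with a self-contained stub (written
for this article, not CPython's real code): with assertions enabled, an
out-of-range index aborts immediately instead of silently corrupting memory:</p>
<pre class="literal-block">
#include &lt;assert.h&gt;
#include &lt;stddef.h&gt;

/* Stub playing the roles of a list object and of PyList_SET_ITEM()
 * with the Python 3.13 bounds check. */
typedef struct {
    ptrdiff_t size;     /* plays the role of Py_SIZE() */
    void *items[8];
} ListStub;

static void
stub_set_item(ListStub *list, ptrdiff_t index, void *value)
{
    assert(0 <= index);
    assert(index < list->size);   /* the new bounds check */
    list->items[index] = value;
}

static void *
stub_get_item(const ListStub *list, ptrdiff_t index)
{
    return list->items[index];
}
</pre>
<p>Calling <tt class="docutils literal">stub_set_item()</tt> with index 2 on a stub of size 2 would fail the
assertion, just like the new <tt class="docutils literal">PyList_SET_ITEM()</tt> check.</p>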
</div>
<div class="section" id="c-api-python-3-12-py-incref">
<h2>C API: Python 3.12 Py_INCREF()</h2>
<p><strong>tl;dr I changed Py_INCREF() and Py_DECREF() to be implemented as opaque
function calls in any version of the limited C API if Python is built in debug
mode.</strong></p>
<p>In Python 3.12, <a class="reference external" href="https://peps.python.org/pep-0683/">PEP 683 – Immortal Objects, Using a Fixed Refcount</a> was implemented. It made the Py_INCREF() and
Py_DECREF() static inline functions even more complicated than before. The
implementation required exposing the private <tt class="docutils literal">_Py_IncRefTotal_DO_NOT_USE_THIS()</tt>
and <tt class="docutils literal">_Py_DecRefTotal_DO_NOT_USE_THIS()</tt> functions in the stable ABI for debug
builds of Python, even though the function names say "DO NOT USE THIS".</p>
<p>In Python 3.10, I modified Py_INCREF() and Py_DECREF() to implement them as
opaque function calls in the limited C API version 3.10 or newer if Python is
built in debug mode (if <tt class="docutils literal">Py_REF_DEBUG</tt> macro is defined). Thanks to this
change, the limited C API is supported if Python is built in debug mode since
Python 3.10.</p>
<p>In Python 3.12, I <strong>modified Py_INCREF() and Py_DECREF() to implement them as
opaque function calls in all limited C API versions</strong>, not only in the limited C
API version 3.10 and newer, if Python is built in debug mode. This way,
implementation details are now hidden and no longer leaked in the stable ABI. I
removed <tt class="docutils literal">_Py_NegativeRefcount()</tt> in the limited C API and I removed
<tt class="docutils literal">_Py_IncRefTotal_DO_NOT_USE_THIS()</tt> and <tt class="docutils literal">_Py_DecRefTotal_DO_NOT_USE_THIS()</tt>
in the stable ABI.</p>
<p>Later, I discovered that my fix broke backward compatibility with Python 3.9.
My implementation used <tt class="docutils literal">_Py_IncRef()</tt> and <tt class="docutils literal">_Py_DecRef()</tt> that I added to
Python 3.10. I updated the implementation to use <tt class="docutils literal">Py_IncRef()</tt> and
<tt class="docutils literal">Py_DecRef()</tt> on Python 3.9 and older; these functions have been available since
Python 2.4.</p>
</div>
<div class="section" id="c-api-py-incref-opaque-function-call">
<h2>C API: Py_INCREF() opaque function call</h2>
<p><strong>tl;dr I changed Py_INCREF() and Py_DECREF() to be implemented as opaque
function calls in the limited C API version 3.12.</strong> (also in the regular
release build, not only in the debug build)</p>
<p>In Python 3.8, I converted Py_INCREF() and Py_DECREF() macros to static inline
functions. I already wanted to convert them to opaque function calls, but that
can have a significant performance cost, so I left them as static inline
functions.</p>
<p>As a follow-up of my Python 3.12 Py_INCREF() fix for the debug build, I
modified Py_INCREF() and Py_DECREF() in Python 3.12 to always implement them
as <strong>opaque function calls in the limited C API version 3.12</strong> and newer.</p>
<ul class="simple">
<li>Discussion: <a class="reference external" href="https://discuss.python.org/t/limited-c-api-implement-py-incref-and-py-decref-as-function-calls/27592">Limited C API: implement Py_INCREF() and Py_DECREF() as function calls</a></li>
<li><a class="reference external" href="https://github.com/python/cpython/pull/105388">Pull request</a></li>
</ul>
<p>For me, it's a <strong>major enhancement</strong> to make the stable ABI more <strong>future
proof</strong> by leaking fewer implementation details.</p>
<p><a class="reference external" href="https://github.com/python/cpython/blob/da98ed0aa040791ef08b24befab697038c8c9fd5/Include/object.h#L613-L622">Code</a>:</p>
<pre class="literal-block">
static inline Py_ALWAYS_INLINE void Py_INCREF(PyObject *op)
{
#if defined(Py_LIMITED_API) && (Py_LIMITED_API+0 >= 0x030c0000 || defined(Py_REF_DEBUG))
    // Stable ABI implements Py_INCREF() as a function call on limited C API
    // version 3.12 and newer, and on Python built in debug mode. _Py_IncRef()
    // was added to Python 3.10.0a7, use Py_IncRef() on older Python versions.
    // Py_IncRef() accepts NULL whereas _Py_IncRef() doesn't.
# if Py_LIMITED_API+0 >= 0x030a00A7
    _Py_IncRef(op);
# else
    Py_IncRef(op);
# endif
#else
    ...
#endif
}
</pre>
</div>
<div class="section" id="tests">
<h2>Tests</h2>
<p>The Python test runner <em>regrtest</em> has specific constraints because tests
are run in subprocesses, on different platforms, with custom encodings
and options. Over the last year, an annoying regrtest bug came and went: if
a subprocess standard output (stdout) could not be decoded, the test was treated
as a success! I fixed <a class="reference external" href="https://github.com/python/cpython/issues/101634">the bug</a> and made the code more
reliable by treating this class of bug as a test failure.</p>
<p>I fixed test_counter_optimizer() of test_capi when run twice: it now creates a
new function at each call, so each run starts in a known state. Previously, the
second run started in a different state since the function was already optimized.</p>
<p>I cleaned up the old test_ctypes. My main goal was to remove <tt class="docutils literal">from ctypes import
*</tt> so that pyflakes can be used on these tests. I found many skipped tests: I
re-enabled 3 of them and removed the other ones. I also removed dead code.</p>
<p>I removed test_xmlrpc_net: it was skipped since 2017. The public
<tt class="docutils literal">buildbot.python.org</tt> server has no XML-RPC interface anymore, and no
replacement public XML-RPC server was found in 6 years.</p>
<p>I fixed dangling threads in <tt class="docutils literal">test_importlib.test_side_effect_import()</tt>: the
import spawns threads, so the test now waits until they complete.</p>
</div>
<div class="section" id="c-api-deprecate">
<h2>C API: Deprecate</h2>
<p>I listed <a class="reference external" href="https://docs.python.org/dev/whatsnew/3.13.html#pending-removal-in-python-3-14">pending C API removals</a>
in the What's New in Python 3.13 document.</p>
<p>I deprecated multiple APIs:</p>
<ul class="simple">
<li>Py_UNICODE and PY_UNICODE_TYPE</li>
<li>PyImport_ImportModuleNoBlock()</li>
<li>Py_HasFileSystemDefaultEncoding</li>
</ul>
<p>I deprecated legacy Python initialization functions:</p>
<ul class="simple">
<li>PySys_ResetWarnOptions()</li>
<li>Py_GetExecPrefix()</li>
<li>Py_GetPath()</li>
<li>Py_GetPrefix()</li>
<li>Py_GetProgramFullPath()</li>
<li>Py_GetProgramName()</li>
<li>Py_GetPythonHome()</li>
</ul>
<p>I removed the PyArg_Parse() deprecation. The deprecation was added in 2007 as
a comment in the documentation, but the function remains relevant in Python
3.13 for some specific use cases.</p>
</div>
<div class="section" id="soft-deprecation">
<h2>Soft Deprecation</h2>
<p><strong>tl; dr The getopt module is now soft deprecated.</strong></p>
<p>I updated <a class="reference external" href="https://peps.python.org/pep-0387/">PEP 387: Backwards Compatibility Policy</a> to add <a class="reference external" href="https://peps.python.org/pep-0387/#soft-deprecation">Soft Deprecation</a>:</p>
<blockquote>
<p>A soft deprecation can be used when using an API which should no longer be
used to write new code, but it remains safe to continue using it in
existing code. The API remains documented and tested, but will not be
developed further (no enhancement).</p>
<p>The main difference between a “soft” and a (regular) “hard” deprecation is
that the soft deprecation does not imply scheduling the removal of the
deprecated API.</p>
</blockquote>
<p>I converted the <strong>optparse</strong> deprecation to a <strong>soft deprecation</strong>.</p>
<p>I soft deprecated the <strong>getopt</strong> module: it remains available and maintained,
but argparse should be preferred for new projects.</p>
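<p>For new code, a typical <tt class="docutils literal">getopt.getopt()</tt> option loop maps naturally to
argparse. A minimal sketch (the option names are made up for the example):</p>

```python
import argparse

# Roughly equivalent to getopt.getopt(argv, "vo:", ["verbose", "output="])
# followed by a manual loop over the parsed (option, value) pairs.
parser = argparse.ArgumentParser()
parser.add_argument("-v", "--verbose", action="store_true")
parser.add_argument("-o", "--output", default="out.txt")

args = parser.parse_args(["-v", "-o", "result.txt"])
assert args.verbose is True
assert args.output == "result.txt"
```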
</div>
<div class="section" id="deprecate">
<h2>Deprecate</h2>
<p>I deprecated the <tt class="docutils literal">getmark()</tt>, <tt class="docutils literal">setmark()</tt> and <tt class="docutils literal">getmarkers()</tt> methods of
the Wave_read and Wave_write classes. These methods only existed for
compatibility with the aifc module, but they did nothing or always failed, and
the aifc module was removed in Python 3.13.</p>
<p>I also deprecated <tt class="docutils literal">SetPointerType()</tt> and <tt class="docutils literal">ARRAY()</tt> functions of ctypes.</p>
</div>
<div class="section" id="c-api-remove">
<h2>C API: Remove</h2>
<ul class="simple">
<li>I removed the following old functions for configuring the Python initialization,
which I deprecated in Python 3.11:<ul>
<li>PySys_AddWarnOptionUnicode()</li>
<li>PySys_AddWarnOption()</li>
<li>PySys_AddXOption()</li>
<li>PySys_HasWarnOptions()</li>
<li>PySys_SetArgvEx()</li>
<li>PySys_SetArgv()</li>
<li>PySys_SetPath()</li>
<li>Py_SetPath()</li>
<li>Py_SetProgramName()</li>
<li>Py_SetPythonHome()</li>
<li>Py_SetStandardStreamEncoding()</li>
<li>_Py_SetProgramFullPath()</li>
</ul>
</li>
<li>I also removed deprecated "call" functions:<ul>
<li>PyCFunction_Call()</li>
<li>PyEval_CallFunction()</li>
<li>PyEval_CallMethod()</li>
<li>PyEval_CallObject()</li>
<li>PyEval_CallObjectWithKeywords()</li>
</ul>
</li>
<li>I removed deprecated PyEval_AcquireLock() and PyEval_InitThreads() functions.</li>
<li>Remove old aliases which were kept for backwards compatibility with Python 3.8:<ul>
<li>_PyObject_CallMethodNoArgs()</li>
<li>_PyObject_CallMethodOneArg()</li>
<li>_PyObject_CallOneArg()</li>
<li>_PyObject_FastCallDict()</li>
<li>_PyObject_Vectorcall()</li>
<li>_PyObject_VectorcallMethod()</li>
<li>_PyVectorcall_Function()</li>
</ul>
</li>
</ul>
</div>
<div class="section" id="remove">
<h2>Remove</h2>
<p>I removed the <strong>locale.resetlocale()</strong> function, but I failed to remove
locale.getdefaultlocale() in Python 3.13: INADA-san asked me to keep it.</p>
<p>I removed the untested and undocumented <strong>logging.Logger.warn()</strong> method.</p>
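<p>Code still calling the removed <tt class="docutils literal">warn()</tt> method only needs to switch to
<tt class="docutils literal">warning()</tt>; a quick check:</p>

```python
import io
import logging

# Logger.warn() was an undocumented alias of Logger.warning(): replacing
# warn() with warning() is the whole migration.
stream = io.StringIO()
logger = logging.getLogger("demo")
logger.addHandler(logging.StreamHandler(stream))

logger.warning("disk almost full")
assert "disk almost full" in stream.getvalue()
```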
<p>Oh, and I had forgotten to remove the <strong>cafile</strong>, <strong>capath</strong> and <strong>cadefault</strong>
parameters of the <strong>urllib.request.urlopen()</strong> function: this is now also done in
Python 3.13. I removed similar parameters from many other modules in Python 3.12.</p>
</div>
<div class="section" id="cleanup">
<h2>Cleanup</h2>
<p>As usual, I removed a bunch of unused imports (in the stdlib, tests and tools).</p>
<p>I reimplemented the xmlrpc.client <tt class="docutils literal">_iso8601_format()</tt> function with
<tt class="docutils literal">datetime.datetime.isoformat()</tt>. The timezone is ignored on purpose: the
XML-RPC specification doesn't explain how to handle it, and many implementations
ignore it.</p>
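<p>The idea can be sketched as follows; this is an illustration of the approach,
not necessarily the exact stdlib code (XML-RPC formats dates as
<tt class="docutils literal">YYYYMMDDTHH:MM:SS</tt>, without dashes):</p>

```python
from datetime import datetime, timezone

def iso8601_format(value):
    # XML-RPC doesn't define how to handle timezones: drop tzinfo on purpose.
    if value.tzinfo is not None:
        value = value.replace(tzinfo=None)
    # XML-RPC dates use no dashes in the date part: "19980717T14:08:55".
    return value.isoformat(timespec="seconds").replace("-", "")

dt = datetime(1998, 7, 17, 14, 8, 55, tzinfo=timezone.utc)
assert iso8601_format(dt) == "19980717T14:08:55"
```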
</div>
<div class="section" id="port-imp-code-to-importlib">
<h2>Port imp code to importlib</h2>
<p>The importlib module was added in Python 3.1 and became the default
in Python 3.3. The imp module was deprecated in Python 3.4 but was only removed
in Python 3.12. Replacing imp code with importlib is not trivial: importlib
has a different design and API.</p>
<p>I wrote documentation on how to port imp code to importlib in <a class="reference external" href="https://docs.python.org/dev/whatsnew/3.12.html#removed">What's New in
Python 3.12</a>.</p>
<p>I proposed <a class="reference external" href="https://github.com/python/cpython/pull/105755">adding importlib.util.load_source_path() function</a>, but I understood that the
devil is in the details: it's hard to decide how to handle the <tt class="docutils literal">sys.modules</tt>
cache. I gave up and instead added a recipe to the What's New in Python 3.12
documentation:</p>
<pre class="literal-block">
import importlib.util
import importlib.machinery

def load_source(modname, filename):
    loader = importlib.machinery.SourceFileLoader(modname, filename)
    spec = importlib.util.spec_from_file_location(modname, filename, loader=loader)
    module = importlib.util.module_from_spec(spec)
    # The module is always executed and not cached in sys.modules.
    # Uncomment the following line to cache the module.
    # sys.modules[module.__name__] = module
    loader.exec_module(module)
    return module
</pre>
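<p>The recipe can be exercised with a module written to a temporary file:</p>

```python
import importlib.machinery
import importlib.util
import os
import sys
import tempfile

def load_source(modname, filename):
    loader = importlib.machinery.SourceFileLoader(modname, filename)
    spec = importlib.util.spec_from_file_location(modname, filename, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "plugin.py")
    with open(path, "w") as f:
        f.write("ANSWER = 42\n")

    module = load_source("plugin", path)
    assert module.ANSWER == 42
    # Unlike the old imp.load_source(), the module is not cached.
    assert "plugin" not in sys.modules
```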
<p>There are many projects affected by the imp removal and porting them is not
easy. See <a class="reference external" href="https://discuss.python.org/t/how-do-i-migrate-from-imp/27885">How do I migrate from imp?</a> discussion.</p>
</div>
<div class="section" id="c-api-remove-private-functions">
<h2>C API: Remove private functions</h2>
<p>Last but not least, in <a class="reference external" href="https://github.com/python/cpython/issues/106320">issue #106320</a>, I <strong>removed</strong> no less
than <strong>181 private C API functions</strong>.</p>
<p>As a reaction to my changes, a discussion was started to propose <a class="reference external" href="https://discuss.python.org/t/pssst-lets-treat-all-api-in-public-headers-as-public/28916">treating
private functions as public functions</a>.</p>
<p>I'm now working on identifying projects affected by these removals and on
proposing solutions for the most commonly used removed functions like the
<tt class="docutils literal">_PyObject_Vectorcall()</tt> alias.</p>
<p>The list of the 181 removed private C API functions:</p>
<ul class="simple">
<li><tt class="docutils literal">_PyArg_NoKwnames()</tt></li>
<li><tt class="docutils literal">_PyBytesWriter_Alloc()</tt></li>
<li><tt class="docutils literal">_PyBytesWriter_Dealloc()</tt></li>
<li><tt class="docutils literal">_PyBytesWriter_Finish()</tt></li>
<li><tt class="docutils literal">_PyBytesWriter_Init()</tt></li>
<li><tt class="docutils literal">_PyBytesWriter_Prepare()</tt></li>
<li><tt class="docutils literal">_PyBytesWriter_Resize()</tt></li>
<li><tt class="docutils literal">_PyBytesWriter_WriteBytes()</tt></li>
<li><tt class="docutils literal">_PyCodecInfo_GetIncrementalDecoder()</tt></li>
<li><tt class="docutils literal">_PyCodecInfo_GetIncrementalEncoder()</tt></li>
<li><tt class="docutils literal">_PyCodec_DecodeText()</tt></li>
<li><tt class="docutils literal">_PyCodec_EncodeText()</tt></li>
<li><tt class="docutils literal">_PyCodec_Forget()</tt></li>
<li><tt class="docutils literal">_PyCodec_Lookup()</tt></li>
<li><tt class="docutils literal">_PyCodec_LookupTextEncoding()</tt></li>
<li><tt class="docutils literal">_PyComplex_FormatAdvancedWriter()</tt></li>
<li><tt class="docutils literal">_PyDeadline_Get()</tt></li>
<li><tt class="docutils literal">_PyDeadline_Init()</tt></li>
<li><tt class="docutils literal">_PyErr_CheckSignals()</tt></li>
<li><tt class="docutils literal">_PyErr_FormatFromCause()</tt></li>
<li><tt class="docutils literal">_PyErr_GetExcInfo()</tt></li>
<li><tt class="docutils literal">_PyErr_GetHandledException()</tt></li>
<li><tt class="docutils literal">_PyErr_GetTopmostException()</tt></li>
<li><tt class="docutils literal">_PyErr_ProgramDecodedTextObject()</tt></li>
<li><tt class="docutils literal">_PyErr_SetHandledException()</tt></li>
<li><tt class="docutils literal">_PyException_AddNote()</tt></li>
<li><tt class="docutils literal">_PyImport_AcquireLock()</tt></li>
<li><tt class="docutils literal">_PyImport_FixupBuiltin()</tt></li>
<li><tt class="docutils literal">_PyImport_FixupExtensionObject()</tt></li>
<li><tt class="docutils literal">_PyImport_GetModuleAttr()</tt></li>
<li><tt class="docutils literal">_PyImport_GetModuleAttrString()</tt></li>
<li><tt class="docutils literal">_PyImport_GetModuleId()</tt></li>
<li><tt class="docutils literal">_PyImport_IsInitialized()</tt></li>
<li><tt class="docutils literal">_PyImport_ReleaseLock()</tt></li>
<li><tt class="docutils literal">_PyImport_SetModule()</tt></li>
<li><tt class="docutils literal">_PyImport_SetModuleString()</tt></li>
<li><tt class="docutils literal">_PyInterpreterState_Get()</tt></li>
<li><tt class="docutils literal">_PyInterpreterState_GetConfig()</tt></li>
<li><tt class="docutils literal">_PyInterpreterState_GetConfigCopy()</tt></li>
<li><tt class="docutils literal">_PyInterpreterState_GetMainModule()</tt></li>
<li><tt class="docutils literal">_PyInterpreterState_HasFeature()</tt></li>
<li><tt class="docutils literal">_PyInterpreterState_SetConfig()</tt></li>
<li><tt class="docutils literal">_PyLong_AsTime_t()</tt></li>
<li><tt class="docutils literal">_PyLong_FromTime_t()</tt></li>
<li><tt class="docutils literal">_PyModule_CreateInitialized()</tt></li>
<li><tt class="docutils literal">_PyOS_URandom()</tt></li>
<li><tt class="docutils literal">_PyOS_URandomNonblock()</tt></li>
<li><tt class="docutils literal">_PyObject_CallMethod()</tt></li>
<li><tt class="docutils literal">_PyObject_CallMethodId()</tt></li>
<li><tt class="docutils literal">_PyObject_CallMethodIdNoArgs()</tt></li>
<li><tt class="docutils literal">_PyObject_CallMethodIdObjArgs()</tt></li>
<li><tt class="docutils literal">_PyObject_CallMethodIdOneArg()</tt></li>
<li><tt class="docutils literal">_PyObject_CallMethodNoArgs()</tt></li>
<li><tt class="docutils literal">_PyObject_CallMethodOneArg()</tt></li>
<li><tt class="docutils literal">_PyObject_CallOneArg()</tt></li>
<li><tt class="docutils literal">_PyObject_FastCallDict()</tt></li>
<li><tt class="docutils literal">_PyObject_HasLen()</tt></li>
<li><tt class="docutils literal">_PyObject_MakeTpCall()</tt></li>
<li><tt class="docutils literal">_PyObject_RealIsInstance()</tt></li>
<li><tt class="docutils literal">_PyObject_RealIsSubclass()</tt></li>
<li><tt class="docutils literal">_PyObject_Vectorcall()</tt></li>
<li><tt class="docutils literal">_PyObject_VectorcallMethod()</tt></li>
<li><tt class="docutils literal">_PyObject_VectorcallMethodId()</tt></li>
<li><tt class="docutils literal">_PySequence_BytesToCharpArray()</tt></li>
<li><tt class="docutils literal">_PySequence_IterSearch()</tt></li>
<li><tt class="docutils literal">_PyStack_AsDict()</tt></li>
<li><tt class="docutils literal">_PyThreadState_GetDict()</tt></li>
<li><tt class="docutils literal">_PyThreadState_Prealloc()</tt></li>
<li><tt class="docutils literal">_PyThread_CurrentExceptions()</tt></li>
<li><tt class="docutils literal">_PyThread_CurrentFrames()</tt></li>
<li><tt class="docutils literal">_PyTime_Add()</tt></li>
<li><tt class="docutils literal">_PyTime_As100Nanoseconds()</tt></li>
<li><tt class="docutils literal">_PyTime_AsMicroseconds()</tt></li>
<li><tt class="docutils literal">_PyTime_AsMilliseconds()</tt></li>
<li><tt class="docutils literal">_PyTime_AsNanoseconds()</tt></li>
<li><tt class="docutils literal">_PyTime_AsNanosecondsObject()</tt></li>
<li><tt class="docutils literal">_PyTime_AsSecondsDouble()</tt></li>
<li><tt class="docutils literal">_PyTime_AsTimespec()</tt></li>
<li><tt class="docutils literal">_PyTime_AsTimespec_clamp()</tt></li>
<li><tt class="docutils literal">_PyTime_AsTimeval()</tt></li>
<li><tt class="docutils literal">_PyTime_AsTimevalTime_t()</tt></li>
<li><tt class="docutils literal">_PyTime_AsTimeval_clamp()</tt></li>
<li><tt class="docutils literal">_PyTime_FromMicrosecondsClamp()</tt></li>
<li><tt class="docutils literal">_PyTime_FromMillisecondsObject()</tt></li>
<li><tt class="docutils literal">_PyTime_FromNanoseconds()</tt></li>
<li><tt class="docutils literal">_PyTime_FromNanosecondsObject()</tt></li>
<li><tt class="docutils literal">_PyTime_FromSeconds()</tt></li>
<li><tt class="docutils literal">_PyTime_FromSecondsObject()</tt></li>
<li><tt class="docutils literal">_PyTime_FromTimespec()</tt></li>
<li><tt class="docutils literal">_PyTime_FromTimeval()</tt></li>
<li><tt class="docutils literal">_PyTime_GetMonotonicClock()</tt></li>
<li><tt class="docutils literal">_PyTime_GetMonotonicClockWithInfo()</tt></li>
<li><tt class="docutils literal">_PyTime_GetPerfCounter()</tt></li>
<li><tt class="docutils literal">_PyTime_GetPerfCounterWithInfo()</tt></li>
<li><tt class="docutils literal">_PyTime_GetSystemClock()</tt></li>
<li><tt class="docutils literal">_PyTime_GetSystemClockWithInfo()</tt></li>
<li><tt class="docutils literal">_PyTime_MulDiv()</tt></li>
<li><tt class="docutils literal">_PyTime_ObjectToTime_t()</tt></li>
<li><tt class="docutils literal">_PyTime_ObjectToTimespec()</tt></li>
<li><tt class="docutils literal">_PyTime_ObjectToTimeval()</tt></li>
<li><tt class="docutils literal">_PyTime_gmtime()</tt></li>
<li><tt class="docutils literal">_PyTime_localtime()</tt></li>
<li><tt class="docutils literal">_PyTraceMalloc_ClearTraces()</tt></li>
<li><tt class="docutils literal">_PyTraceMalloc_GetMemory()</tt></li>
<li><tt class="docutils literal">_PyTraceMalloc_GetObjectTraceback()</tt></li>
<li><tt class="docutils literal">_PyTraceMalloc_GetTraceback()</tt></li>
<li><tt class="docutils literal">_PyTraceMalloc_GetTracebackLimit()</tt></li>
<li><tt class="docutils literal">_PyTraceMalloc_GetTracedMemory()</tt></li>
<li><tt class="docutils literal">_PyTraceMalloc_GetTraces()</tt></li>
<li><tt class="docutils literal">_PyTraceMalloc_Init()</tt></li>
<li><tt class="docutils literal">_PyTraceMalloc_IsTracing()</tt></li>
<li><tt class="docutils literal">_PyTraceMalloc_ResetPeak()</tt></li>
<li><tt class="docutils literal">_PyTraceMalloc_Start()</tt></li>
<li><tt class="docutils literal">_PyTraceMalloc_Stop()</tt></li>
<li><tt class="docutils literal">_PyUnicodeTranslateError_Create()</tt></li>
<li><tt class="docutils literal">_PyUnicodeWriter_Dealloc()</tt></li>
<li><tt class="docutils literal">_PyUnicodeWriter_Finish()</tt></li>
<li><tt class="docutils literal">_PyUnicodeWriter_Init()</tt></li>
<li><tt class="docutils literal">_PyUnicodeWriter_PrepareInternal()</tt></li>
<li><tt class="docutils literal">_PyUnicodeWriter_PrepareKindInternal()</tt></li>
<li><tt class="docutils literal">_PyUnicodeWriter_WriteASCIIString()</tt></li>
<li><tt class="docutils literal">_PyUnicodeWriter_WriteChar()</tt></li>
<li><tt class="docutils literal">_PyUnicodeWriter_WriteLatin1String()</tt></li>
<li><tt class="docutils literal">_PyUnicodeWriter_WriteStr()</tt></li>
<li><tt class="docutils literal">_PyUnicodeWriter_WriteSubstring()</tt></li>
<li><tt class="docutils literal">_PyUnicode_AsASCIIString()</tt></li>
<li><tt class="docutils literal">_PyUnicode_AsLatin1String()</tt></li>
<li><tt class="docutils literal">_PyUnicode_AsUTF8String()</tt></li>
<li><tt class="docutils literal">_PyUnicode_CheckConsistency()</tt></li>
<li><tt class="docutils literal">_PyUnicode_Copy()</tt></li>
<li><tt class="docutils literal">_PyUnicode_DecodeRawUnicodeEscapeStateful()</tt></li>
<li><tt class="docutils literal">_PyUnicode_DecodeUnicodeEscapeInternal()</tt></li>
<li><tt class="docutils literal">_PyUnicode_DecodeUnicodeEscapeStateful()</tt></li>
<li><tt class="docutils literal">_PyUnicode_EQ()</tt></li>
<li><tt class="docutils literal">_PyUnicode_EncodeCharmap()</tt></li>
<li><tt class="docutils literal">_PyUnicode_EncodeUTF16()</tt></li>
<li><tt class="docutils literal">_PyUnicode_EncodeUTF32()</tt></li>
<li><tt class="docutils literal">_PyUnicode_EncodeUTF7()</tt></li>
<li><tt class="docutils literal">_PyUnicode_Equal()</tt></li>
<li><tt class="docutils literal">_PyUnicode_EqualToASCIIId()</tt></li>
<li><tt class="docutils literal">_PyUnicode_EqualToASCIIString()</tt></li>
<li><tt class="docutils literal">_PyUnicode_FastCopyCharacters()</tt></li>
<li><tt class="docutils literal">_PyUnicode_FastFill()</tt></li>
<li><tt class="docutils literal">_PyUnicode_FindMaxChar()</tt></li>
<li><tt class="docutils literal">_PyUnicode_FormatAdvancedWriter()</tt></li>
<li><tt class="docutils literal">_PyUnicode_FormatLong()</tt></li>
<li><tt class="docutils literal">_PyUnicode_FromASCII()</tt></li>
<li><tt class="docutils literal">_PyUnicode_FromId()</tt></li>
<li><tt class="docutils literal">_PyUnicode_InsertThousandsGrouping()</tt></li>
<li><tt class="docutils literal">_PyUnicode_JoinArray()</tt></li>
<li><tt class="docutils literal">_PyUnicode_ScanIdentifier()</tt></li>
<li><tt class="docutils literal">_PyUnicode_TransformDecimalAndSpaceToASCII()</tt></li>
<li><tt class="docutils literal">_PyUnicode_WideCharString_Converter()</tt></li>
<li><tt class="docutils literal">_PyUnicode_WideCharString_Opt_Converter()</tt></li>
<li><tt class="docutils literal">_PyUnicode_XStrip()</tt></li>
<li><tt class="docutils literal">_PyVectorcall_Function()</tt></li>
<li><tt class="docutils literal">_Py_AtExit()</tt></li>
<li><tt class="docutils literal">_Py_CheckFunctionResult()</tt></li>
<li><tt class="docutils literal">_Py_CoerceLegacyLocale()</tt></li>
<li><tt class="docutils literal">_Py_FatalErrorFormat()</tt></li>
<li><tt class="docutils literal">_Py_FdIsInteractive()</tt></li>
<li><tt class="docutils literal">_Py_FreeCharPArray()</tt></li>
<li><tt class="docutils literal">_Py_GetConfig()</tt></li>
<li><tt class="docutils literal">_Py_IsCoreInitialized()</tt></li>
<li><tt class="docutils literal">_Py_IsFinalizing()</tt></li>
<li><tt class="docutils literal">_Py_IsInterpreterFinalizing()</tt></li>
<li><tt class="docutils literal">_Py_LegacyLocaleDetected()</tt></li>
<li><tt class="docutils literal">_Py_RestoreSignals()</tt></li>
<li><tt class="docutils literal">_Py_SetLocaleFromEnv()</tt></li>
<li><tt class="docutils literal">_Py_VaBuildStack()</tt></li>
<li><tt class="docutils literal">_Py_add_one_to_index_C()</tt></li>
<li><tt class="docutils literal">_Py_add_one_to_index_F()</tt></li>
<li><tt class="docutils literal">_Py_c_abs()</tt></li>
<li><tt class="docutils literal">_Py_c_diff()</tt></li>
<li><tt class="docutils literal">_Py_c_neg()</tt></li>
<li><tt class="docutils literal">_Py_c_pow()</tt></li>
<li><tt class="docutils literal">_Py_c_prod()</tt></li>
<li><tt class="docutils literal">_Py_c_quot()</tt></li>
<li><tt class="docutils literal">_Py_c_sum()</tt></li>
<li><tt class="docutils literal">_Py_gitidentifier()</tt></li>
<li><tt class="docutils literal">_Py_gitversion()</tt></li>
</ul>
</div>
Convert macros to functions in the Python C API2022-12-12T23:00:00+01:002022-12-12T23:00:00+01:00Victor Stinnertag:vstinner.github.io,2022-12-12:/c-api-convert-macros-functions.html<a class="reference external image-reference" href="https://www.exemplaire-editions.fr/librairie/livre/loeil-du-cyclone"><img alt="L'oeil du cyclone - Théo Grosjean" src="https://vstinner.github.io/images/loeil_cyclone.jpg" /></a>
<p><em>Drawing: "L'oeil du cyclone" by Théo Grosjean.</em></p>
<div class="section" id="convert-macros-to-functions">
<h2>Convert macros to functions</h2>
<p>For 4 years, between Python 3.7 (2018) and Python 3.12 (2022), I made many
changes to macros in the Python C API to make the API less error-prone (avoiding
<a class="reference external" href="https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html">macro pitfalls</a>) and
to better define the API (parameter types and return types, variable scope, etc.).
<a class="reference external" href="https://peps.python.org/pep-0670/">PEP 670</a> "Convert macros to functions in
the Python C API" describes the rationale of these changes at length.</p>
<p>I moved private functions to the internal C API to reduce the C API size.</p>
<p>Some changes are also related to preparing the API to make members of
structures like <tt class="docutils literal">PyObject</tt> or <tt class="docutils literal">PyTypeObject</tt> private.</p>
<p>Converting macros and static inline functions to regular functions hides
implementation details and moves the API towards the limited C API and the
stable ABI (build a C extension once, use the binary on multiple Python
versions). Regular functions are also usable from programming languages and use
cases which cannot use C macros or C static inline functions.</p>
<p>Most macros are converted to static inline functions, rather than regular
functions, to have no impact on performance.</p>
<p>This work was made incrementally in 5 Python versions (3.8, 3.9, 3.10, 3.11 and
3.12) to limit the number of impacted projects at each Python release.</p>
<p>Changing the <tt class="docutils literal">Py_TYPE()</tt> and <tt class="docutils literal">Py_SIZE()</tt> macros impacted the most
projects. The change landed in Python 3.11: during the Python 3.10 development
cycle, it had to be reverted since it impacted too many projects.</p>
<p>Note: I didn't modify all macros and functions listed in this article; it's a
collaborative work as usual.</p>
</div>
<div class="section" id="statistics">
<h2>Statistics</h2>
<p><a class="reference external" href="https://pythoncapi.readthedocs.io/stats.html">Statistics on public functions</a>:</p>
<ul class="simple">
<li>Python 3.7: 893 regular functions, 315 macros.</li>
<li>Python 3.12: 943 regular functions, 246 macros, 69 static inline functions.</li>
</ul>
<p>Cumulative changes on macros between Python 3.7 and Python 3.12 on public,
private and internal APIs:</p>
<ul class="simple">
<li>Converted 88 macros to static inline functions</li>
<li>Converted 11 macros to regular functions</li>
<li>Converted 3 static inline functions to regular functions</li>
<li>Removed 47 macros</li>
</ul>
<p>See <a class="reference external" href="https://pythoncapi.readthedocs.io/stats.html">Statistics on the Python C API</a> for more numbers.</p>
</div>
<div class="section" id="python-3-12">
<h2>Python 3.12</h2>
<p>Convert 39 macros to static inline functions:</p>
<ul class="simple">
<li><tt class="docutils literal">PyCell_GET()</tt></li>
<li><tt class="docutils literal">PyCell_SET()</tt></li>
<li><tt class="docutils literal">PyCode_GetNumFree()</tt></li>
<li><tt class="docutils literal">PyDict_GET_SIZE()</tt></li>
<li><tt class="docutils literal">PyFloat_AS_DOUBLE()</tt></li>
<li><tt class="docutils literal">PyFunction_GET_ANNOTATIONS()</tt></li>
<li><tt class="docutils literal">PyFunction_GET_CLOSURE()</tt></li>
<li><tt class="docutils literal">PyFunction_GET_CODE()</tt></li>
<li><tt class="docutils literal">PyFunction_GET_DEFAULTS()</tt></li>
<li><tt class="docutils literal">PyFunction_GET_GLOBALS()</tt></li>
<li><tt class="docutils literal">PyFunction_GET_KW_DEFAULTS()</tt></li>
<li><tt class="docutils literal">PyFunction_GET_MODULE()</tt></li>
<li><tt class="docutils literal">PyInstanceMethod_GET_FUNCTION()</tt></li>
<li><tt class="docutils literal">PyMemoryView_GET_BASE()</tt></li>
<li><tt class="docutils literal">PyMemoryView_GET_BUFFER()</tt></li>
<li><tt class="docutils literal">PyMethod_GET_FUNCTION()</tt></li>
<li><tt class="docutils literal">PyMethod_GET_SELF()</tt></li>
<li><tt class="docutils literal">PySet_GET_SIZE()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_HIGH_SURROGATE()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_ISALNUM()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_ISSPACE()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_IS_HIGH_SURROGATE()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_IS_LOW_SURROGATE()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_IS_SURROGATE()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_JOIN_SURROGATES()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_LOW_SURROGATE()</tt></li>
<li><tt class="docutils literal">_PyGCHead_FINALIZED()</tt></li>
<li><tt class="docutils literal">_PyGCHead_NEXT()</tt></li>
<li><tt class="docutils literal">_PyGCHead_PREV()</tt></li>
<li><tt class="docutils literal">_PyGCHead_SET_FINALIZED()</tt></li>
<li><tt class="docutils literal">_PyGCHead_SET_NEXT()</tt></li>
<li><tt class="docutils literal">_PyGCHead_SET_PREV()</tt></li>
<li><tt class="docutils literal">_PyGC_FINALIZED()</tt></li>
<li><tt class="docutils literal">_PyGC_SET_FINALIZED()</tt></li>
<li><tt class="docutils literal">_PyObject_GC_IS_TRACKED()</tt></li>
<li><tt class="docutils literal">_PyObject_GC_MAY_BE_TRACKED()</tt></li>
<li><tt class="docutils literal">_PyObject_SIZE()</tt></li>
<li><tt class="docutils literal">_PyObject_VAR_SIZE()</tt></li>
<li><tt class="docutils literal">_Py_AS_GC()</tt></li>
</ul>
<p>Remove 5 macros:</p>
<ul class="simple">
<li><tt class="docutils literal">PyUnicode_AS_DATA()</tt></li>
<li><tt class="docutils literal">PyUnicode_AS_UNICODE()</tt></li>
<li><tt class="docutils literal">PyUnicode_GET_DATA_SIZE()</tt></li>
<li><tt class="docutils literal">PyUnicode_GET_SIZE()</tt></li>
<li><tt class="docutils literal">PyUnicode_WSTR_LENGTH()</tt></li>
</ul>
<p>The following 4 macros can be used as l-values in Python 3.12:</p>
<ul class="simple">
<li><tt class="docutils literal">PyList_GET_ITEM()</tt></li>
<li><tt class="docutils literal">PyTuple_GET_ITEM()</tt></li>
<li><tt class="docutils literal">PyDescr_NAME()</tt></li>
<li><tt class="docutils literal">PyDescr_TYPE()</tt></li>
</ul>
<p>Code patterns like <tt class="docutils literal">&PyTuple_GET_ITEM(tuple, 0)</tt> and <tt class="docutils literal">&PyList_GET_ITEM(list,
0)</tt> are still commonly used to get direct access to items as <tt class="docutils literal">PyObject**</tt>.
<tt class="docutils literal">PyDescr_NAME()</tt> and <tt class="docutils literal">PyDescr_TYPE()</tt> are used by SWIG: see
<a class="reference external" href="https://bugs.python.org/issue46538">https://bugs.python.org/issue46538</a></p>
</div>
<div class="section" id="python-3-11">
<h2>Python 3.11</h2>
<p>Convert 33 macros to static inline functions:</p>
<ul class="simple">
<li><tt class="docutils literal">PyByteArray_AS_STRING()</tt></li>
<li><tt class="docutils literal">PyByteArray_GET_SIZE()</tt></li>
<li><tt class="docutils literal">PyBytes_AS_STRING()</tt></li>
<li><tt class="docutils literal">PyBytes_GET_SIZE()</tt></li>
<li><tt class="docutils literal">PyCFunction_GET_CLASS()</tt></li>
<li><tt class="docutils literal">PyCFunction_GET_FLAGS()</tt></li>
<li><tt class="docutils literal">PyCFunction_GET_FUNCTION()</tt></li>
<li><tt class="docutils literal">PyCFunction_GET_SELF()</tt></li>
<li><tt class="docutils literal">PyList_GET_SIZE()</tt></li>
<li><tt class="docutils literal">PyList_SET_ITEM()</tt></li>
<li><tt class="docutils literal">PyTuple_GET_SIZE()</tt></li>
<li><tt class="docutils literal">PyTuple_SET_ITEM()</tt></li>
<li><tt class="docutils literal">PyUnicode_AS_DATA()</tt></li>
<li><tt class="docutils literal">PyUnicode_AS_UNICODE()</tt></li>
<li><tt class="docutils literal">PyUnicode_CHECK_INTERNED()</tt></li>
<li><tt class="docutils literal">PyUnicode_DATA()</tt></li>
<li><tt class="docutils literal">PyUnicode_GET_DATA_SIZE()</tt></li>
<li><tt class="docutils literal">PyUnicode_GET_LENGTH()</tt></li>
<li><tt class="docutils literal">PyUnicode_GET_SIZE()</tt></li>
<li><tt class="docutils literal">PyUnicode_IS_ASCII()</tt></li>
<li><tt class="docutils literal">PyUnicode_IS_COMPACT()</tt></li>
<li><tt class="docutils literal">PyUnicode_IS_COMPACT_ASCII()</tt></li>
<li><tt class="docutils literal">PyUnicode_IS_READY()</tt></li>
<li><tt class="docutils literal">PyUnicode_MAX_CHAR_VALUE()</tt></li>
<li><tt class="docutils literal">PyUnicode_READ()</tt></li>
<li><tt class="docutils literal">PyUnicode_READY()</tt></li>
<li><tt class="docutils literal">PyUnicode_READ_CHAR()</tt></li>
<li><tt class="docutils literal">PyUnicode_WRITE()</tt></li>
<li><tt class="docutils literal">PyWeakref_GET_OBJECT()</tt></li>
<li><tt class="docutils literal">Py_SIZE()</tt>: <tt class="docutils literal">Py_SET_SIZE()</tt> must be used to set an object size</li>
<li><tt class="docutils literal">Py_TYPE()</tt>: <tt class="docutils literal">Py_SET_TYPE()</tt> must be used to set an object type</li>
<li><tt class="docutils literal">_PyUnicode_COMPACT_DATA()</tt></li>
<li><tt class="docutils literal">_PyUnicode_NONCOMPACT_DATA()</tt></li>
</ul>
<p>Convert 2 macros to regular functions:</p>
<ul class="simple">
<li><tt class="docutils literal">PyType_SUPPORTS_WEAKREFS()</tt></li>
<li><tt class="docutils literal">Py_GETENV()</tt></li>
</ul>
<p>Remove 11 macros:</p>
<ul class="simple">
<li>Moved to the internal C API:<ul>
<li><tt class="docutils literal">PyHeapType_GET_MEMBERS()</tt>: renamed to <tt class="docutils literal">_PyHeapType_GET_MEMBERS()</tt></li>
<li><tt class="docutils literal">_Py_InIntegralTypeRange()</tt></li>
<li><tt class="docutils literal">_Py_IntegralTypeMax()</tt></li>
<li><tt class="docutils literal">_Py_IntegralTypeMin()</tt></li>
<li><tt class="docutils literal">_Py_IntegralTypeSigned()</tt></li>
</ul>
</li>
<li><tt class="docutils literal">PyFunction_AS_FRAME_CONSTRUCTOR()</tt></li>
<li><tt class="docutils literal">Py_FORCE_DOUBLE()</tt></li>
<li><tt class="docutils literal">Py_OVERFLOWED()</tt></li>
<li><tt class="docutils literal">Py_SET_ERANGE_IF_OVERFLOW()</tt></li>
<li><tt class="docutils literal">Py_SET_ERRNO_ON_MATH_ERROR()</tt></li>
<li><tt class="docutils literal">_Py_SET_EDOM_FOR_NAN()</tt></li>
</ul>
<p>Add <tt class="docutils literal">_Py_RVALUE()</tt> to 7 macros to disallow using them as l-values:</p>
<ul class="simple">
<li><tt class="docutils literal">_PyGCHead_SET_FINALIZED()</tt></li>
<li><tt class="docutils literal">_PyGCHead_SET_NEXT()</tt></li>
<li><tt class="docutils literal">asdl_seq_GET()</tt></li>
<li><tt class="docutils literal">asdl_seq_GET_UNTYPED()</tt></li>
<li><tt class="docutils literal">asdl_seq_LEN()</tt></li>
<li><tt class="docutils literal">asdl_seq_SET()</tt></li>
<li><tt class="docutils literal">asdl_seq_SET_UNTYPED()</tt></li>
</ul>
<p>Note: the <tt class="docutils literal">PyCell_SET()</tt> macro was modified to use <tt class="docutils literal">_Py_RVALUE()</tt>, but it
already used <tt class="docutils literal">(void)</tt> in Python 3.10.</p>
</div>
<div class="section" id="python-3-10">
<h2>Python 3.10</h2>
<p>Convert 3 macros to regular functions:</p>
<ul class="simple">
<li><tt class="docutils literal">PyDescr_IsData()</tt></li>
<li><tt class="docutils literal">PyExceptionClass_Name()</tt></li>
<li><tt class="docutils literal">PyIter_Check()</tt></li>
</ul>
<p>Convert 2 macros to static inline functions:</p>
<ul class="simple">
<li><tt class="docutils literal">PyObject_TypeCheck()</tt></li>
<li><tt class="docutils literal">Py_REFCNT()</tt>: <tt class="docutils literal">Py_SET_REFCNT()</tt> must be used to set an object reference
count</li>
</ul>
<p>Remove 6 macros:</p>
<ul class="simple">
<li><tt class="docutils literal">PyAST_Compile()</tt></li>
<li><tt class="docutils literal">PyParser_SimpleParseFile()</tt></li>
<li><tt class="docutils literal">PyParser_SimpleParseString()</tt></li>
<li><tt class="docutils literal">PySTEntry_Check()</tt>: moved to the internal C API</li>
<li><tt class="docutils literal">_PyErr_OCCURRED()</tt></li>
<li><tt class="docutils literal">_PyList_ITEMS()</tt>: moved to the internal C API</li>
</ul>
<p>Modify 3 macros to disallow using them as l-values by adding a <tt class="docutils literal">(void)</tt> cast:</p>
<ul class="simple">
<li><tt class="docutils literal">PyCell_SET()</tt></li>
<li><tt class="docutils literal">PyList_SET_ITEM()</tt></li>
<li><tt class="docutils literal">PyTuple_SET_ITEM()</tt></li>
</ul>
</div>
<div class="section" id="python-3-9">
<h2>Python 3.9</h2>
<p>Convert 6 macros to regular functions:</p>
<ul class="simple">
<li><tt class="docutils literal">PyIndex_Check()</tt></li>
<li><tt class="docutils literal">PyObject_CheckBuffer()</tt></li>
<li><tt class="docutils literal">PyObject_GET_WEAKREFS_LISTPTR()</tt></li>
<li><tt class="docutils literal">PyObject_IS_GC()</tt></li>
<li><tt class="docutils literal">Py_EnterRecursiveCall()</tt></li>
<li><tt class="docutils literal">Py_LeaveRecursiveCall()</tt></li>
</ul>
<p>Convert 5 macros to static inline functions:</p>
<ul class="simple">
<li><tt class="docutils literal">PyType_Check()</tt></li>
<li><tt class="docutils literal">PyType_CheckExact()</tt></li>
<li><tt class="docutils literal">PyType_HasFeature()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_COPY()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_FILL()</tt></li>
</ul>
<p>Convert 3 static inline functions to regular functions:</p>
<ul class="simple">
<li><tt class="docutils literal">_Py_Dealloc()</tt></li>
<li><tt class="docutils literal">_Py_ForgetReference()</tt></li>
<li><tt class="docutils literal">_Py_NewReference()</tt></li>
</ul>
<p>Remove 18 macros:</p>
<ul class="simple">
<li>Moved to the internal C API:<ul>
<li><tt class="docutils literal">PyDoc_STRVAR_shared()</tt></li>
<li><tt class="docutils literal">PyObject_GC_IS_TRACKED()</tt></li>
<li><tt class="docutils literal">PyObject_GC_MAY_BE_TRACKED()</tt></li>
<li><tt class="docutils literal">Py_AS_GC()</tt></li>
<li><tt class="docutils literal">_PyGCHead_FINALIZED()</tt></li>
<li><tt class="docutils literal">_PyGCHead_NEXT()</tt></li>
<li><tt class="docutils literal">_PyGCHead_PREV()</tt></li>
<li><tt class="docutils literal">_PyGCHead_SET_FINALIZED()</tt></li>
<li><tt class="docutils literal">_PyGCHead_SET_NEXT()</tt></li>
<li><tt class="docutils literal">_PyGCHead_SET_PREV()</tt></li>
<li><tt class="docutils literal">_PyGC_SET_FINALIZED()</tt></li>
</ul>
</li>
<li><tt class="docutils literal">Py_UNICODE_MATCH()</tt></li>
<li><tt class="docutils literal">_Py_DEC_TPFREES()</tt></li>
<li><tt class="docutils literal">_Py_INC_TPALLOCS()</tt></li>
<li><tt class="docutils literal">_Py_INC_TPFREES()</tt></li>
<li><tt class="docutils literal">_Py_MakeEndRecCheck()</tt></li>
<li><tt class="docutils literal">_Py_MakeRecCheck()</tt></li>
<li><tt class="docutils literal">_Py_RecursionLimitLowerWaterMark()</tt></li>
</ul>
</div>
<div class="section" id="python-3-8">
<h2>Python 3.8</h2>
<p>Convert 9 macros to static inline functions:</p>
<ul class="simple">
<li><tt class="docutils literal">Py_DECREF()</tt></li>
<li><tt class="docutils literal">Py_INCREF()</tt></li>
<li><tt class="docutils literal">Py_XDECREF()</tt></li>
<li><tt class="docutils literal">Py_XINCREF()</tt></li>
<li><tt class="docutils literal">_PyObject_CallNoArg()</tt></li>
<li><tt class="docutils literal">_PyObject_FastCall()</tt></li>
<li><tt class="docutils literal">_Py_Dealloc()</tt></li>
<li><tt class="docutils literal">_Py_ForgetReference()</tt></li>
<li><tt class="docutils literal">_Py_NewReference()</tt></li>
</ul>
<p>Remove 7 macros:</p>
<ul class="simple">
<li><tt class="docutils literal">_PyGCHead_DECREF()</tt></li>
<li><tt class="docutils literal">_PyGCHead_REFS()</tt></li>
<li><tt class="docutils literal">_PyGCHead_SET_REFS()</tt></li>
<li><tt class="docutils literal">_PyGC_REFS()</tt></li>
<li><tt class="docutils literal">_PyObject_GC_TRACK()</tt>: moved to the internal C API</li>
<li><tt class="docutils literal">_PyObject_GC_UNTRACK()</tt>: moved to the internal C API</li>
<li><tt class="docutils literal">_Py_CHECK_REFCNT()</tt></li>
</ul>
</div>
Debug a Python reference leak2022-11-04T13:00:00+01:002022-11-04T13:00:00+01:00Victor Stinnertag:vstinner.github.io,2022-11-04:/debug-python-refleak.html<a class="reference external image-reference" href="https://twitter.com/djamilaknopf/status/1587441869403099136"><img alt="Childhood memories in the countryside" src="https://vstinner.github.io/images/refleak.jpg" /></a>
<p>This morning, I got <a class="reference external" href="https://mail.python.org/archives/list/buildbot-status@python.org/message/MU2EJRTFF4ZCYTDXYER7KCL3IQUM5F3T/">this email</a>
from the buildbot-status mailing list:</p>
<blockquote>
The Buildbot has detected a new failure on builder PPC64LE Fedora Rawhide
<strong>Refleaks</strong> 3.x while building Python.</blockquote>
<p>I get many buildbot failure reports per month (by email), but I like to debug
reference leaks: they are more challenging :-) I decided to write this article
to document and explain my work on maintaining Python (buildbots).</p>
<p>I truncated the output of most commands in this article to make it easier
to read.</p>
<p>Drawing: <a class="reference external" href="https://twitter.com/djamilaknopf/status/1587441869403099136">Childhood memories in the countryside</a> by <a class="reference external" href="https://twitter.com/djamilaknopf/">Djamila
Knopf</a>.</p>
<div class="section" id="reproduce-the-bug">
<h2>Reproduce the bug</h2>
<p>I look into <a class="reference external" href="https://buildbot.python.org/all/#builders/300/builds/548">buildbot logs</a>:</p>
<pre class="literal-block">
test_int leaked [1, 1, 1] references, sum=3
</pre>
<p>Aha, interesting: the <tt class="docutils literal">test_int</tt> test leaks Python strong references; each
test iteration leaks exactly one reference. Well, in short, it leaks memory.</p>
<p>I build Python to check if the refleak is still there:</p>
<pre class="literal-block">
git switch main
make clean
./configure --with-pydebug
make
</pre>
<p>The main branch is currently at this commit:</p>
<pre class="literal-block">
$ git show main
commit 2844aa6a8eb1d486b5c432f0ed33a2082998f41e
(...)
</pre>
<p>I run the test with <tt class="docutils literal"><span class="pre">-R</span> 3:3</tt> to check for reference leaks:</p>
<pre class="literal-block">
$ ./python -m test -R 3:3 test_int
(...)
test_int leaked [1, 1, 1] references, sum=3
(...)
Total duration: 4.8 sec
</pre>
<p>Great! It's still there, it's a real regression. I told you, I love this kind
of bug :-)</p>
</div>
<div class="section" id="identify-which-test-leaks-test-bisect-cmd">
<h2>Identify which test leaks (test.bisect_cmd)</h2>
<pre class="literal-block">
$ ./python -m test test_int --list-cases|wc -l
42
$ wc -l Lib/test/test_int.py
885 Lib/test/test_int.py
</pre>
<p><tt class="docutils literal">test_int</tt> has only 42 methods and takes 4.8 seconds to run (with <tt class="docutils literal"><span class="pre">-R</span>
3:3</tt>). That's small, but the file is made of 885 lines of Python code. I'm
lazy, I don't want to read so many lines. I will use <tt class="docutils literal">python <span class="pre">-m</span>
test.bisect_cmd</tt> to identify which test method leaks so I have less test code
to read and reproducing the test will be even faster.</p>
<p>I run <tt class="docutils literal">python <span class="pre">-m</span> test.bisect_cmd</tt>:</p>
<pre class="literal-block">
$ ./python -m test.bisect_cmd -R 3:3 test_int
(...)
[+] Iteration 17: run 1 tests/2
(...)
test_int leaked [1, 1, 1] references, sum=3
(...)
* test.test_int.PyLongModuleTests.test_pylong_misbehavior_error_path_from_str
</pre>
<p>I love watching this tool doing my job, I don't have anything to do! :-)</p>
<p>I confirm that the <tt class="docutils literal">test_pylong_misbehavior_error_path_from_str()</tt> test
leaks:</p>
<pre class="literal-block">
$ ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str
test_int leaked [1, 1, 1] references, sum=3
Total duration: 445 ms
</pre>
<p>The <tt class="docutils literal">test_pylong_misbehavior_error_path_from_str()</tt> method is only 17 lines
of code, it's way better than 885 lines of code (52x less code to read). And
reproducing the bug now only takes 445 ms instead of 4.8 seconds (10x faster).</p>
<p>At this point, there is the brave method of looking into the C code: Python is
made of 500 000 lines of C code. Good luck! Or maybe there is another way?</p>
</div>
<div class="section" id="git-bisection">
<h2>Git bisection</h2>
<p>Again, I'm lazy. I always begin with the "divide and conquer" method. A Git
bisection is an efficient method for that.</p>
<p>I start <tt class="docutils literal">git bisect</tt>:</p>
<pre class="literal-block">
git bisect reset
git bisect start --term-bad=leak --term-good=noleak
git bisect leak # we just saw that current commit leaks
</pre>
<p>Defining "good" and "bad" terms helps me a lot to prevent mistakes: it's a nice
Git bisect feature! In the past, I always picked the wrong one at some point
which messed up the whole bisection.</p>
<p>Ok, now how can I know when the leak was introduced? Well, I like to move in
the past step by step: one day, two days, one week, one month, one year, etc.</p>
<p>I pick a random commit merged yesterday:</p>
<pre class="literal-block">
$ date
Fri Nov 4 11:55:12 CET 2022
$ git log
(...)
commit 016c7d37b6acfe2203542a2655080c6402b3be1f
Date: Thu Nov 3 23:21:01 2022 +0000
(...)
commit 4c4b5ce2e529a1279cd287e2d2d73ffcb6cf2ead
Date: Thu Nov 3 16:18:38 2022 -0700
(...)
</pre>
<p>I'm not lucky at my first bet, the code already leaked yesterday:</p>
<pre class="literal-block">
$ git checkout 4c4b5ce2e529a1279cd287e2d2d73ffcb6cf2ead^C
$ make && ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str
test_int leaked [1, 1, 1] references, sum=3
</pre>
<p>I repeat the process, I pick a random commit the day before:</p>
<pre class="literal-block">
$ git log
(...)
commit f3007ac3702ea22c7dd0abf8692b1504ea3c9f63
Author: Victor Stinner <vstinner@python.org>
Date: Wed Nov 2 20:45:58 2022 +0100
(...)
</pre>
<p>To my great pleasure, I pick a commit made by myself. Maybe I'm lucky and
I'm the one who introduced the leak :-D</p>
<pre class="literal-block">
$ git checkout f3007ac3702ea22c7dd0abf8692b1504ea3c9f63
$ make && ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str
(...)
Tests result: NO TESTS RAN
</pre>
<p>"NO TESTS RAN" means that the test doesn't exist. Oh wait, the test didn't
exist 2 days ago? So the test itself is new? Well, no tests ran also means...
"no leak".</p>
<p>I will make the assumption that "NO TESTS RAN" means "no leak" and see what's
going on:</p>
<pre class="literal-block">
$ git bisect noleak
Bisecting: 13 revisions left to test after this (roughly 4 steps)
$ make && ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str
Tests result: NO TESTS RAN
$ git bisect noleak
Bisecting: 6 revisions left to test after this (roughly 3 steps)
$ make && ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str
Tests result: NO TESTS RAN
$ git bisect noleak
Bisecting: 3 revisions left to test after this (roughly 2 steps)
$ make && ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str
Tests result: NO TESTS RAN
$ git bisect noleak
Bisecting: 1 revision left to test after this (roughly 1 step)
$ make && ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str
test_int leaked [1, 1, 1] references, sum=3
$ git bisect leak
Bisecting: 0 revisions left to test after this (roughly 0 steps)
$ make && ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str
test_int leaked [1, 1, 1] references, sum=3
vstinner@mona$ git bisect leak
4c4b5ce2e529a1279cd287e2d2d73ffcb6cf2ead is the first leak commit
commit 4c4b5ce2e529a1279cd287e2d2d73ffcb6cf2ead
Author: Gregory P. Smith <greg@krypto.org>
Date: Thu Nov 3 16:18:38 2022 -0700
gh-90716: bugfixes and more tests for _pylong. (#99073)
* Properly decref on _pylong import error.
* Improve the error message on _pylong TypeError.
* Fix the assertion error in pydebug builds to be a TypeError.
* Tie the return value comments together.
These are minor followups to issues not caught among the reviewers on
https://github.com/python/cpython/pull/96673.
Lib/test/test_int.py | 39 +++++++++++++++++++++++++++++++++++++++
Objects/longobject.c | 15 +++++++++++----
2 files changed, 50 insertions(+), 4 deletions(-)
</pre>
<p>In total, it took 7 <tt class="docutils literal">git bisect</tt> steps to identify a single commit. That's
quick! I also love this tool, I feel that it does my job!</p>
<p>Sometimes, I mess up with Git bisection. Here, <a class="reference external" href="https://github.com/python/cpython/commit/4c4b5ce2e529a1279cd287e2d2d73ffcb6cf2ead">the guilty commit</a>
seems like a good candidate since it changes <tt class="docutils literal">Objects/longobject.c</tt> which is
C code, so it can likely introduce a leak. Moreover, this C file is the
implementation of the Python <tt class="docutils literal">int</tt> type, so it is directly related to
<tt class="docutils literal">test_int</tt> (the test suite of the <tt class="docutils literal">int</tt> type).</p>
<p>Just in case, I manually test the leak before/after:</p>
<pre class="literal-block">
# after
$ git checkout 4c4b5ce2e529a1279cd287e2d2d73ffcb6cf2ead
$ make && ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str
test_int leaked [1, 1, 1] references, sum=3
# before
$ git checkout 4c4b5ce2e529a1279cd287e2d2d73ffcb6cf2ead^
$ make && ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str
Tests result: NO TESTS RAN
</pre>
<p>Ok, there is no doubt anymore: the commit introduced the leak. But since the
commit also adds the leaking test, maybe the leak already existed, and it's
just that nobody noticed the leak before.</p>
</div>
<div class="section" id="debug-the-leak">
<h2>Debug the leak</h2>
<p>Since I identified the commit introducing the leak, I only have to review code
changes by this single commit. But to debug the code, I prefer to come back to
the main branch. To prepare a fix, I will have to start from the main branch
anyway.</p>
<p>Go back to the main branch:</p>
<pre class="literal-block">
$ git bisect reset
$ git switch main
</pre>
<p>The second command is useless, I was already on the main branch. I made so
many mistakes with Git in the past that I got into the habit of doing things
very carefully. I don't mind doing things twice, just in case. It's cheaper
than messing with the Git god! Trust me.</p>
<p>Just in case, I double check that the leak is still there in the main branch:</p>
<pre class="literal-block">
$ make && ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str
test_int leaked [1, 1, 1] references, sum=3
</pre>
<p>Ok, we are good to start debugging. Let me open Lib/test/test_int.py and look
for the test_pylong_misbehavior_error_path_from_str() method:</p>
<pre class="literal-block">
@support.cpython_only # tests implementation details of CPython.
@unittest.skipUnless(_pylong, "_pylong module required")
@mock.patch.object(_pylong, "int_from_string")
def test_pylong_misbehavior_error_path_from_str(
self, mock_int_from_str):
big_value = '7'*19_999
with support.adjust_int_max_str_digits(20_000):
mock_int_from_str.return_value = b'not an int'
with self.assertRaises(TypeError) as ctx:
int(big_value)
self.assertIn('_pylong.int_from_string did not',
str(ctx.exception))
mock_int_from_str.side_effect = RuntimeError("test123")
with self.assertRaises(RuntimeError):
int(big_value)
</pre>
<p>Always divide and conquer: let me try to make the code as short as possible (7
lines); I also make the "big_value" smaller:</p>
<pre class="literal-block">
@mock.patch.object(_pylong, "int_from_string")
def test_pylong_misbehavior_error_path_from_str(self, mock_int_from_str):
big_value = '7' * 9999
with support.adjust_int_max_str_digits(10_000):
mock_int_from_str.return_value = b'not an int'
with self.assertRaises(TypeError) as ctx:
int(big_value)
</pre>
<p>Ok, so the test is about converting a long string (9999 decimal digits) to an
integer using the new <tt class="docutils literal">_pylong</tt> module which is implemented
in pure Python (<tt class="docutils literal">Lib/_pylong.py</tt>) and called from C code
(<tt class="docutils literal">Objects/longobject.c</tt>). Well, I followed recent developments, so I don't
have to dig into the C code to know that. It helps!</p>
<p>If I search for <tt class="docutils literal">_pylong</tt> in <tt class="docutils literal">Objects/longobject.c</tt>, I find this
interesting function:</p>
<pre class="literal-block">
/* asymptotically faster str-to-long conversion for base 10, using _pylong.py */
static int
pylong_int_from_string(const char *start, const char *end, PyLongObject **res)
{
PyObject *mod = PyImport_ImportModule("_pylong");
...
}
</pre>
<p>With a quick look, I don't see any obvious reference leak in this code. I add
<tt class="docutils literal">printf()</tt> to make sure that I'm looking at the right function:</p>
<pre class="literal-block">
static int
pylong_int_from_string(const char *start, const char *end, PyLongObject **res)
{
...
PyObject *s = PyUnicode_FromStringAndSize(start, end-start);
if (s == NULL) {
Py_DECREF(mod);
goto error;
}
printf("pylong_int_from_string()\n");
PyObject *result = PyObject_CallMethod(mod, "int_from_string", "O", s);
...
}
</pre>
<p>I added the print before the int_from_string() call, since this function is
overridden by the test.</p>
<p>I build Python and run the test:</p>
<pre class="literal-block">
$ make
$ ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str
(...)
beginning 6 repetitions
123456
pylong_int_from_string()
.pylong_int_from_string()
.pylong_int_from_string()
.pylong_int_from_string()
.pylong_int_from_string()
.pylong_int_from_string()
(...)
</pre>
<p>Ok, I'm looking at the right place. The print happens when the test runs. So
which code path is taken? Let me add print calls <em>after</em> the function call:</p>
<pre class="literal-block">
static int
pylong_int_from_string(const char *start, const char *end, PyLongObject **res)
{
...
PyObject *result = PyObject_CallMethod(mod, "int_from_string", "O", s);
Py_DECREF(s);
Py_DECREF(mod);
if (result == NULL) {
printf("pylong_int_from_string() error\n"); // <====== ADD
goto error;
}
if (!PyLong_Check(result)) {
printf("pylong_int_from_string() wrong type\n"); // <====== ADD
PyErr_SetString(PyExc_TypeError,
"_pylong.int_from_string did not return an int");
goto error;
}
printf("pylong_int_from_string() ok\n"); // <====== ADD
...
}
</pre>
<p>Test output:</p>
<pre class="literal-block">
...
pylong_int_from_string() wrong type
.pylong_int_from_string() wrong type
.pylong_int_from_string() wrong type
...
</pre>
<p>Aha, the bug should be around the <tt class="docutils literal">if (!PyLong_Check(result))</tt> code path. Oh
wait... <tt class="docutils literal">result</tt> is a Python object, and in this code path, the function exits
without returning <tt class="docutils literal">result</tt> to the caller, nor removing the reference to
<tt class="docutils literal">result</tt>. That's our leak!</p>
</div>
<div class="section" id="write-a-fix">
<h2>Write a fix</h2>
<p>To write a fix, I start by reverting all local changes (remove debug traces,
restore the original test code):</p>
<pre class="literal-block">
$ git checkout .
</pre>
<p>I write a fix:</p>
<pre class="literal-block">
$ git diff
diff --git a/Objects/longobject.c b/Objects/longobject.c
index a872938990..652fdb7974 100644
--- a/Objects/longobject.c
+++ b/Objects/longobject.c
@@ -2376,6 +2376,7 @@ pylong_int_from_string(const char *start, const char *end, PyLongObject **res)
goto error;
}
if (!PyLong_Check(result)) {
+ Py_DECREF(result);
PyErr_SetString(PyExc_TypeError,
"_pylong.int_from_string did not return an int");
goto error;
</pre>
<p>I build and test my fix:</p>
<pre class="literal-block">
$ make && ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str
(...)
Tests result: SUCCESS
</pre>
<p>Ok, the leak is fixed! So it was just a missing <tt class="docutils literal">Py_DECREF()</tt> in code
recently added to Python. It's a common mistake. By the way, when I looked at
the code the first time, I also missed this "obvious" leak.</p>
<p>I prepare a PR:</p>
<pre class="literal-block">
$ git switch -c int_str
$ git commit -a
# Commit message:
# gh-90716: Fix pylong_int_from_string() refleak
</pre>
<p>Let me validate my work from the new clean commit:</p>
<pre class="literal-block">
$ make && ./python -m test -R 3:3 test_int
(...)
Tests result: SUCCESS
</pre>
<p>I complete the commit message using <tt class="docutils literal">git commit <span class="pre">--amend</span></tt>:</p>
<pre class="literal-block">
gh-90716: Fix pylong_int_from_string() refleak
Fix validated by:
$ ./python -m test -R 3:3 test_int
Tests result: SUCCESS
</pre>
<p>I run <tt class="docutils literal">gh_pr.sh</tt> (my short shell script) to create a PR from the command
line.</p>
<p>I add the <tt class="docutils literal">skip news</tt> label on the PR: since this refleak is not part of any
Python release, no user is impacted, so it's not worth documenting. I don't
think that the leaking change is part of Python 3.12 alpha 1. Moreover, only
very few users test alpha releases.</p>
<p>Here it is, my shiny PR fixing the leak! <a class="reference external" href="https://github.com/python/cpython/pull/99094">https://github.com/python/cpython/pull/99094</a></p>
<p>Since Gregory worked on longobject.c recently, I put him in copy of my PR: I
just add the comment <tt class="docutils literal">cc @gpshead</tt> to the PR.</p>
<p>I don't plan to wait for this review: the change is just one line and I'm
confident that it fixes the issue, so I don't need a review.</p>
<p>To finish, I <a class="reference external" href="https://mail.python.org/archives/list/buildbot-status@python.org/message/J3MC7FIPFN6GNQAWQQRHE4EDLE7J2MIQ/">reply by email to the buildbot-status failure email</a>.</p>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>In total, it took me between one and two hours to reproduce, debug and fix this
reference leak.</p>
<p>In the meantime, I also looked into other Python work (and I chatted with
friends!) while the bisection was running or during the Python builds. It's
hard to estimate exactly how much time it takes me to fix a refleak.</p>
<p>I consider that I'm efficient at fixing such leaks since I follow Python
development: I was already aware of the ongoing <tt class="docutils literal">_pylong</tt> work. I
also fixed many refleaks in the past.</p>
<p>By the way, I wrote the <tt class="docutils literal">python <span class="pre">-m</span> test.bisect_cmd</tt> tool exactly to
accelerate my work on debugging reference leaks. I'm now also used to Git
bisection.</p>
<p>For me, <strong>the key of my whole methodology is to "divide and conquer"</strong>:</p>
<ul class="simple">
<li>Reproduce the issue</li>
<li>Get a reproducer</li>
<li>Make the reproducer as fast as possible and as short as possible</li>
<li>Use Git bisection to identify the commit introducing the bug</li>
<li>Add print calls to identify which code paths the failing test
takes</li>
</ul>
<p>Oh by the way, while I was finishing this article, my PR got reviewed and I merged it:
<a class="reference external" href="https://github.com/python/cpython/commit/387f72588d538bc56669f0f28cc41df854fc5b43">my commit fixing the leak</a>!</p>
</div>
Python C API: Add functions to access PyObject2021-10-05T14:00:00+02:002021-10-05T14:00:00+02:00Victor Stinnertag:vstinner.github.io,2021-10-05:/c-api-abstract-pyobject.html<a class="reference external image-reference" href="https://twitter.com/Kekeflipnote/status/1433139994516934663"><img alt="A spider in my bedroom" src="https://vstinner.github.io/images/spider.png" /></a>
<p>The PyObject structure indirectly prevents optimizing CPython. We will see why
and how I prepared the C API to make this structure opaque. It took me a year
and a half to add functions and to introduce <strong>incompatible C API changes</strong>
(fear!).</p>
<p>In February 2020, I started by adding functions like <tt class="docutils literal">Py_SET_TYPE()</tt> to
abstract accesses to the <tt class="docutils literal">PyObject</tt> structure. I modified C extensions of the
standard library to use functions like <tt class="docutils literal">Py_TYPE()</tt> and <tt class="docutils literal">Py_SET_TYPE()</tt>.</p>
<p>I converted the <tt class="docutils literal">Py_TYPE()</tt> macro to a static inline function, but my change
was reverted twice. I had to fix many C extensions and fix a test_exceptions
crash on Windows to be able to finally merge my change in September 2021.</p>
<p>Finally, we will also see what can be done next to be able to fully make the
PyObject structure opaque.</p>
<p>Thanks to <strong>Dong-hee Na</strong>, <strong>Hai Shi</strong> and <strong>Andy Lester</strong> who helped me to
make these changes, and thanks to <strong>Miro Hrončok</strong> who reported C extensions
broken by my incompatible C API changes.</p>
<p>This article is a follow-up of the <a class="reference external" href="https://vstinner.github.io/c-api-opaque-structures.html">Make structures opaque in the Python C API</a> article.</p>
<p><em>Drawing: "A spider in my bedroom" by Kéké</em></p>
<div class="section" id="the-c-api-prevents-to-optimize-cpython">
<h2>The C API prevents optimizing CPython</h2>
<p>The C API allows accessing structure members directly by dereferencing a
<tt class="docutils literal">PyObject*</tt> pointer. Example: getting the reference count of an
object directly:</p>
<pre class="literal-block">
Py_ssize_t get_refcnt(PyObject *obj)
{
return obj->ob_refcnt;
}
</pre>
<p>This ability to access structure members directly prevents optimizing CPython.</p>
<div class="section" id="mandatory-inefficient-boxing-unboxing">
<h3>Mandatory inefficient boxing/unboxing</h3>
<p>The ability to dereference a <tt class="docutils literal">PyObject*</tt> pointer prevents optimizations which
avoid inefficient boxing/unboxing, like tagged pointers or list strategies.</p>
</div>
<div class="section" id="no-tagged-pointer">
<h3>No tagged pointer</h3>
<p>Tagged pointers require adding code to all functions which currently
dereference object pointers. The current C API prevents doing that in C
extensions, since pointers can be dereferenced directly.</p>
</div>
<div class="section" id="no-list-strategies">
<h3>No list strategies</h3>
<p>Since all Python object structures must start with a <tt class="docutils literal">PyObject ob_base;</tt>
member, it is not possible to make other structures opaque before PyObject is
made opaque. It prevents implementing PyPy list strategies to reduce the memory
footprint, like storing an array of numbers directly as numbers, not as boxed
numbers (<tt class="docutils literal">PyLongObject</tt> objects).</p>
<p>Currently, the <tt class="docutils literal">PyListObject</tt> structure cannot be made opaque. If
<tt class="docutils literal">PyListObject</tt> could be made opaque, it would be possible to store an array
of numbers directly as numbers, and to box objects in <tt class="docutils literal">PyList_GetItem()</tt> on
demand.</p>
</div>
<div class="section" id="no-moving-garbage-collector">
<h3>No moving garbage collector</h3>
<p>Being able to dereference a <tt class="docutils literal"><span class="pre">PyObject*</span></tt> pointer also prevents moving
objects in memory. A moving garbage collector can compact memory to reduce
fragmentation. Currently, it cannot be implemented in CPython.</p>
</div>
<div class="section" id="cannot-allocate-temporarily-objects-on-the-stack">
<h3>Cannot allocate temporarily objects on the stack</h3>
<p>In CPython, all objects must be allocated on the heap. If an object were
allocated on the stack, stored in a list, and the list remained accessible
after the function returned, the stack memory would no longer be valid: the
list would be corrupted.</p>
<p>If objects were only referenced through opaque handles, as in the HPy project,
it would be possible to copy an object from the stack to heap memory when the
object is added to the list.</p>
</div>
<div class="section" id="reference-counting-doesn-t-scale">
<h3>Reference counting doesn't scale</h3>
<p>The <tt class="docutils literal">PyObject</tt> structure has a reference count (the <tt class="docutils literal">ob_refcnt</tt> member),
but reference counting is a performance bottleneck when the same objects are
used from multiple threads running in parallel: contention quickly arises on
the memory cache line containing the <tt class="docutils literal">PyObject.ob_refcnt</tt> counter. This is
especially true for the most commonly used Python objects, like the None and
True singletons, which all CPUs want to read or modify in parallel.</p>
<p>This problem killed the Gilectomy project which attempted to remove the GIL
from CPython.</p>
<p>A <a class="reference external" href="https://en.wikipedia.org/wiki/Tracing_garbage_collection">tracing garbage collector</a> doesn't need
reference counting, but it cannot be implemented currently because of the
<tt class="docutils literal">PyObject</tt> structure.</p>
</div>
</div>
<div class="section" id="creation-of-the-issue-feb-2020">
<h2>Creation of the issue (Feb 2020)</h2>
<p>In February 2020, I created <a class="reference external" href="https://bugs.python.org/issue39573">bpo-39573</a>: "[C API] Make PyObject an opaque
structure in the limited C API". It is related to my work on <a class="reference external" href="https://www.python.org/dev/peps/pep-0620/">PEP 620
(Hide implementation details from the C API)</a>.</p>
<p>My initial plan was to make the PyObject structure fully opaque in the C API.</p>
</div>
<div class="section" id="add-functions">
<h2>Add functions</h2>
<p>In Python 3.8, the <tt class="docutils literal">Py_REFCNT()</tt> and <tt class="docutils literal">Py_TYPE()</tt> macros could still be used
to directly set an object's reference count or type:</p>
<pre class="literal-block">
Py_REFCNT(obj) = new_refcnt;
Py_TYPE(obj) = new_type;
</pre>
<p>Such syntax requires direct access to the <tt class="docutils literal">PyObject.ob_refcnt</tt> and
<tt class="docutils literal">PyObject.ob_type</tt> members as l-values.</p>
<p>In Python 3.9, I added the Py_SET_REFCNT() and Py_SET_TYPE() functions to add
an abstraction over <tt class="docutils literal">PyObject</tt> members, and I added <tt class="docutils literal">Py_SET_SIZE()</tt> to
abstract the <tt class="docutils literal">PyVarObject.ob_size</tt> member.</p>
<p>In Python 3.9, I also added the <tt class="docutils literal">Py_IS_TYPE(obj, type)</tt> helper function to
test an object's type. It is equivalent to <tt class="docutils literal">Py_TYPE(obj) == type</tt>.</p>
</div>
<div class="section" id="use-py-type-and-py-set-size-in-the-stdlib">
<h2>Use Py_TYPE() and Py_SET_SIZE() in the stdlib</h2>
<p>I modified the standard library (C extensions) to no longer access
<tt class="docutils literal">PyObject</tt> and <tt class="docutils literal">PyVarObject</tt> members directly:</p>
<ul class="simple">
<li>Replace <tt class="docutils literal"><span class="pre">"obj->ob_refcnt"</span></tt> with <tt class="docutils literal">Py_REFCNT(obj)</tt></li>
<li>Replace <tt class="docutils literal"><span class="pre">"obj->ob_type"</span></tt> with <tt class="docutils literal">Py_TYPE(obj)</tt></li>
<li>Replace <tt class="docutils literal"><span class="pre">"obj->ob_size"</span></tt> with <tt class="docutils literal">Py_SIZE(obj)</tt></li>
<li>Replace <tt class="docutils literal">"Py_REFCNT(obj) = new_refcnt"</tt> with <tt class="docutils literal">Py_SET_REFCNT(obj, new_refcnt)</tt></li>
<li>Replace <tt class="docutils literal">"Py_TYPE(obj) = new_type"</tt> with <tt class="docutils literal">Py_SET_TYPE(obj, new_type)</tt></li>
<li>Replace <tt class="docutils literal">"Py_SIZE(obj) = new_size"</tt> with <tt class="docutils literal">Py_SET_SIZE(obj, new_size)</tt></li>
<li>Replace <tt class="docutils literal">"Py_TYPE(obj) == type"</tt> test with <tt class="docutils literal">Py_IS_TYPE(obj, type)</tt></li>
</ul>
</div>
<div class="section" id="enforce-py-set-type">
<h2>Enforce Py_SET_TYPE()</h2>
<p>In Python 3.10, I converted the Py_REFCNT(), Py_TYPE() and Py_SIZE() macros to
static inline functions, so that <tt class="docutils literal">Py_TYPE(obj) = new_type</tt> becomes a
compiler error.</p>
<p>Static inline functions still access <tt class="docutils literal">PyObject</tt> and <tt class="docutils literal">PyVarObject</tt>
members directly at the ABI level, and so don't achieve the initial goal: "make
the PyObject structure opaque". Not accessing members at the ABI level can have
a negative impact on performance, so I prefer to address that later. I already
got enough backfire with the other C API changes that I made :-)</p>
</div>
<div class="section" id="broken-c-extensions-first-revert">
<h2>Broken C extensions (first revert)</h2>
<p>Converting the Py_TYPE() and Py_SIZE() macros to static inline functions broke
16 C extensions:</p>
<ul class="simple">
<li><strong>Cython</strong></li>
<li>PyPAM</li>
<li>bitarray</li>
<li>boost</li>
<li>breezy</li>
<li>duplicity</li>
<li>gobject-introspection</li>
<li>immutables</li>
<li>mercurial</li>
<li><strong>numpy</strong></li>
<li>pybluez</li>
<li>pycurl</li>
<li>pygobject3</li>
<li>pylibacl</li>
<li>pyside2</li>
<li>rdiff-backup</li>
</ul>
<p>In November 2020, during the Python 3.10 devcycle, I preferred to revert
Py_TYPE() and Py_SIZE() changes.</p>
<p>I kept the Py_REFCNT() change since it only broke a single C extension
(PySide2) and it was simple to update it to Py_SET_REFCNT().</p>
</div>
<div class="section" id="pythoncapi-compat">
<h2>pythoncapi_compat</h2>
<p>I created the <a class="reference external" href="https://github.com/pythoncapi/pythoncapi_compat">pythoncapi_compat</a> project to provide the
following functions to Python 3.8 and older:</p>
<ul class="simple">
<li><tt class="docutils literal">Py_SET_REFCNT()</tt></li>
<li><tt class="docutils literal">Py_SET_TYPE()</tt></li>
<li><tt class="docutils literal">Py_SET_SIZE()</tt></li>
<li><tt class="docutils literal">Py_IS_TYPE()</tt></li>
</ul>
<p>I also wrote an upgrade_pythoncapi.py script to upgrade C extensions to use
these functions without losing support for Python 3.8 and older.</p>
<p>Using the pythoncapi_compat project, I managed to update multiple C
extensions to prepare them for Py_TYPE() becoming a static inline function.</p>
</div>
<div class="section" id="test-exceptions-crash-second-revert">
<h2>test_exceptions crash (second revert)</h2>
<p>In June 2021, during the Python 3.11 devcycle, I changed Py_TYPE() and
Py_SIZE() again, since <a class="reference external" href="https://bugs.python.org/issue39573#msg401378">most C extensions had been fixed in the meantime</a>.</p>
<p>Problem: <tt class="docutils literal">test_recursion_in_except_handler()</tt> of <tt class="docutils literal">test_exceptions</tt> started
to crash on a Python debug build on Windows: see <a class="reference external" href="https://bugs.python.org/issue44348">bpo-44348</a>.</p>
<p>Since nobody understood the issue, it was decided to revert my change again to
repair buildbots.</p>
</div>
<div class="section" id="fix-baseexception-deallocator">
<h2>Fix BaseException deallocator</h2>
<p>In September 2021, I looked into the test_exceptions crash. In a <strong>debug build</strong>,
the MSC compiler <strong>doesn't inline</strong> calls to static inline functions. Because
of that, converting the Py_TYPE() macro to a static inline function <strong>increases
the stack memory usage</strong> of a Python debug build on Windows.</p>
<p>I proposed to enable compiler optimizations when building Python in debug mode
on Windows, to inline calls to static inline functions like Py_TYPE(). This
idea was rejected, since the debug build must remain fully usable in a
debugger.</p>
<p>I looked at the crash again and found the root issue:
test_recursion_in_except_handler() creates chains of exceptions. When an
exception is deallocated, it calls the deallocator of another exception, and so on.</p>
<ul class="simple">
<li>recurse_in_except() sub-test creates chains of 11 nested deallocator calls</li>
<li>recurse_in_body_and_except() sub-test creates a chain of <strong>8192 nested deallocator calls</strong></li>
</ul>
<p>I proposed a change to use the <strong>trashcan mechanism</strong>, which limits the call
stack to 50 function calls. I checked with a benchmark that the performance
overhead is acceptable. My change fixed the test_exceptions crash!</p>
</div>
<div class="section" id="close-the-pyobject-issue">
<h2>Close the PyObject issue</h2>
<p>Since most C extensions and test_exceptions were fixed, I was able to change
Py_TYPE() and Py_SIZE() for the third time. My final commit:
<a class="reference external" href="https://github.com/python/cpython/commit/cb15afcccffc6c42cbfb7456ce8db89cd2f77512">Py_TYPE becomes a static inline function</a>.</p>
<p>I changed the issue topic to restrict it to adding functions to access PyObject
members; previously, the goal was to make the PyObject structure opaque.
It took a year and a half to make all these changes.</p>
</div>
<div class="section" id="what-s-next-to-make-pyobject-opaque">
<h2>What's Next to Make PyObject opaque?</h2>
<p>The <tt class="docutils literal">PyObject</tt> structure is used to define the structures of all Python
types, like <tt class="docutils literal">PyListObject</tt>. All structures start with <tt class="docutils literal">PyObject ob_base;</tt>,
and so the compiler must have access to the <tt class="docutils literal">PyObject</tt> structure.</p>
<p>Moreover, the <tt class="docutils literal">PyType_FromSpec()</tt> and <tt class="docutils literal">PyType_Spec</tt> API indirectly uses
<tt class="docutils literal">sizeof(PyObject)</tt> in the <tt class="docutils literal">PyType_Spec.basicsize</tt> member when defining a
type.</p>
<p>One option to make the <tt class="docutils literal">PyObject</tt> structure opaque would be to modify the
<tt class="docutils literal">PyObject</tt> structure to make it empty, and move its members into a new
private <tt class="docutils literal">_PyObject</tt> structure. This <tt class="docutils literal">_PyObject</tt> structure would be
allocated before the <tt class="docutils literal">PyObject*</tt> pointer, same idea as the current
<tt class="docutils literal">PyGC_Head</tt> header which is also allocated before the <tt class="docutils literal">PyObject*</tt> pointer.</p>
<p>These changes are more complex than I expected, so I prefer to open a new issue
later to propose them. The performance impact of these changes must also be
checked with benchmarks, to ensure that there is no overhead, or that the
overhead is acceptable.</p>
</div>
C API changes between Python 3.5 to 3.102021-10-04T15:00:00+02:002021-10-04T15:00:00+02:00Victor Stinnertag:vstinner.github.io,2021-10-04:/c-api-python3_10-changes.html<img alt="Homer Simpson hiding" src="https://vstinner.github.io/images/homer_hiding.webp" />
<p>I'm trying to enhance and to fix the Python C API for 5 years. My first goal
was to shrink the C API without breaking third party C extensions. I hid many
private functions from the public functions: I moved them to the "internal C
API". I also deprecated and removed many functions.</p>
<p>Between Python 3.5 and 3.10, 80 symbols have been removed. Python 3.10 is the
first Python version exporting fewer symbols than its predecessor!</p>
<p>Since Python 3.8, the C API is organized as 3 parts:</p>
<ol class="arabic simple">
<li><tt class="docutils literal">Include/</tt> directory: Limited API</li>
<li><tt class="docutils literal">Include/cpython/</tt> directory: CPython implementation details</li>
<li><tt class="docutils literal">Include/internal/</tt> directory: The internal API</li>
</ol>
<p>The devguide <a class="reference external" href="https://devguide.python.org/c-api/">Changing Python’s C API</a>
documentation now gives guidelines for C API additions, like avoiding borrowed
references.</p>
<p>The limited C API got a few more functions, whereas broken and private
functions have been removed. The Stable ABI is now explicitly defined and
documented in the <a class="reference external" href="https://docs.python.org/dev/c-api/stable.html#stable">C API Stability</a> page.</p>
<p>This article lists all C API changes, not only the ones done by me.</p>
<div class="section" id="shrink-the-the-c-api">
<h2>Shrink the C API</h2>
<p>Between Python 3.5 and 3.10, 80 symbols (functions or variables) have been
removed, 3 structures have been removed, and 21 functions have been deprecated.
In the meantime, other symbols have been added at each Python version to
implement new Python features.</p>
<p>Python 3.10 is the first Python version exporting fewer symbols than its
predecessor.</p>
<div class="section" id="python-3-6">
<h3>Python 3.6</h3>
<p>Deprecate 4 functions:</p>
<ul class="simple">
<li><tt class="docutils literal">PyUnicode_AsDecodedObject()</tt></li>
<li><tt class="docutils literal">PyUnicode_AsDecodedUnicode()</tt></li>
<li><tt class="docutils literal">PyUnicode_AsEncodedObject()</tt></li>
<li><tt class="docutils literal">PyUnicode_AsEncodedUnicode()</tt></li>
</ul>
</div>
<div class="section" id="python-3-7">
<h3>Python 3.7</h3>
<ul class="simple">
<li>Deprecate <tt class="docutils literal">PyOS_AfterFork()</tt></li>
<li>Remove <tt class="docutils literal">PyExc_RecursionErrorInst</tt> singleton (also removed in Python 3.6.4).</li>
</ul>
</div>
<div class="section" id="python-3-8">
<h3>Python 3.8</h3>
<p>Remove 3 functions:</p>
<ul class="simple">
<li><tt class="docutils literal">PyByteArray_Init()</tt></li>
<li><tt class="docutils literal">PyByteArray_Fini()</tt></li>
<li><tt class="docutils literal">PyEval_ReInitThreads()</tt></li>
</ul>
<p>Remove 1 structure:</p>
<ul class="simple">
<li><tt class="docutils literal">PyInterpreterState</tt> (moved to the internal C API)</li>
</ul>
</div>
<div class="section" id="python-3-9">
<h3>Python 3.9</h3>
<p>Remove 32 symbols:</p>
<ul class="simple">
<li><tt class="docutils literal">PyAsyncGen_ClearFreeLists()</tt></li>
<li><tt class="docutils literal">PyCFunction_ClearFreeList()</tt></li>
<li><tt class="docutils literal">PyCmpWrapper_Type</tt></li>
<li><tt class="docutils literal">PyContext_ClearFreeList()</tt></li>
<li><tt class="docutils literal">PyDict_ClearFreeList()</tt></li>
<li><tt class="docutils literal">PyFloat_ClearFreeList()</tt></li>
<li><tt class="docutils literal">PyFrame_ClearFreeList()</tt></li>
<li><tt class="docutils literal">PyFrame_ExtendStack()</tt></li>
<li><tt class="docutils literal">PyList_ClearFreeList()</tt></li>
<li><tt class="docutils literal">PyMethod_ClearFreeList()</tt></li>
<li><tt class="docutils literal">PyNoArgsFunction</tt> type</li>
<li><tt class="docutils literal">PyNullImporter_Type</tt></li>
<li><tt class="docutils literal">PySet_ClearFreeList()</tt></li>
<li><tt class="docutils literal">PySortWrapper_Type</tt></li>
<li><tt class="docutils literal">PyTuple_ClearFreeList()</tt></li>
<li><tt class="docutils literal">PyUnicode_ClearFreeList()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_MATCH()</tt></li>
<li><tt class="docutils literal">_PyAIterWrapper_Type</tt></li>
<li><tt class="docutils literal">_PyBytes_InsertThousandsGrouping()</tt></li>
<li><tt class="docutils literal">_PyBytes_InsertThousandsGroupingLocale()</tt></li>
<li><tt class="docutils literal">_PyDebug_PrintTotalRefs()</tt></li>
<li><tt class="docutils literal">_PyFloat_Digits()</tt></li>
<li><tt class="docutils literal">_PyFloat_DigitsInit()</tt></li>
<li><tt class="docutils literal">_PyFloat_Repr()</tt></li>
<li><tt class="docutils literal">_PyThreadState_GetFrame()</tt> (and <tt class="docutils literal">_PyRuntime.getframe</tt>)</li>
<li><tt class="docutils literal">_PyUnicode_ClearStaticStrings()</tt></li>
<li><tt class="docutils literal">_Py_AddToAllObjects()</tt></li>
<li><tt class="docutils literal">_Py_InitializeFromArgs()</tt></li>
<li><tt class="docutils literal">_Py_InitializeFromWideArgs()</tt></li>
<li><tt class="docutils literal">_Py_PrintReferenceAddresses()</tt></li>
<li><tt class="docutils literal">_Py_PrintReferences()</tt></li>
<li><tt class="docutils literal">_Py_tracemalloc_config</tt></li>
</ul>
<p>Remove 1 structure:</p>
<ul class="simple">
<li><tt class="docutils literal">PyGC_Head</tt> (moved to the internal C API)</li>
</ul>
<p>Deprecate 15 functions:</p>
<ul class="simple">
<li><tt class="docutils literal">PyEval_CallFunction()</tt></li>
<li><tt class="docutils literal">PyEval_CallMethod()</tt></li>
<li><tt class="docutils literal">PyEval_CallObject()</tt></li>
<li><tt class="docutils literal">PyEval_CallObjectWithKeywords()</tt></li>
<li><tt class="docutils literal">PyNode_Compile()</tt></li>
<li><tt class="docutils literal">PyParser_SimpleParseFileFlags()</tt></li>
<li><tt class="docutils literal">PyParser_SimpleParseStringFlags()</tt></li>
<li><tt class="docutils literal">PyParser_SimpleParseStringFlagsFilename()</tt></li>
<li><tt class="docutils literal">PyUnicode_AsUnicode()</tt></li>
<li><tt class="docutils literal">PyUnicode_AsUnicodeAndSize()</tt></li>
<li><tt class="docutils literal">PyUnicode_FromUnicode()</tt></li>
<li><tt class="docutils literal">PyUnicode_WSTR_LENGTH()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_COPY()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_FILL()</tt></li>
<li><tt class="docutils literal">_PyUnicode_AsUnicode()</tt></li>
</ul>
</div>
<div class="section" id="python-3-10">
<h3>Python 3.10</h3>
<p>Remove 44 symbols:</p>
<ul class="simple">
<li><tt class="docutils literal">PyAST_Compile()</tt></li>
<li><tt class="docutils literal">PyAST_CompileEx()</tt></li>
<li><tt class="docutils literal">PyAST_CompileObject()</tt></li>
<li><tt class="docutils literal">PyAST_Validate()</tt></li>
<li><tt class="docutils literal">PyArena_AddPyObject()</tt></li>
<li><tt class="docutils literal">PyArena_Free()</tt></li>
<li><tt class="docutils literal">PyArena_Malloc()</tt></li>
<li><tt class="docutils literal">PyArena_New()</tt></li>
<li><tt class="docutils literal">PyFuture_FromAST()</tt></li>
<li><tt class="docutils literal">PyFuture_FromASTObject()</tt></li>
<li><tt class="docutils literal">PyLong_FromUnicode()</tt></li>
<li><tt class="docutils literal">PyNode_Compile()</tt></li>
<li><tt class="docutils literal">PyOS_InitInterrupts()</tt></li>
<li><tt class="docutils literal">PyObject_AsCharBuffer()</tt></li>
<li><tt class="docutils literal">PyObject_AsReadBuffer()</tt></li>
<li><tt class="docutils literal">PyObject_AsWriteBuffer()</tt></li>
<li><tt class="docutils literal">PyObject_CheckReadBuffer()</tt></li>
<li><tt class="docutils literal">PyParser_ASTFromFile()</tt></li>
<li><tt class="docutils literal">PyParser_ASTFromFileObject()</tt></li>
<li><tt class="docutils literal">PyParser_ASTFromFilename()</tt></li>
<li><tt class="docutils literal">PyParser_ASTFromString()</tt></li>
<li><tt class="docutils literal">PyParser_ASTFromStringObject()</tt></li>
<li><tt class="docutils literal">PyParser_SimpleParseFileFlags()</tt></li>
<li><tt class="docutils literal">PyParser_SimpleParseStringFlags()</tt></li>
<li><tt class="docutils literal">PyParser_SimpleParseStringFlagsFilename()</tt></li>
<li><tt class="docutils literal">PyST_GetScope()</tt></li>
<li><tt class="docutils literal">PySymtable_Build()</tt></li>
<li><tt class="docutils literal">PySymtable_BuildObject()</tt></li>
<li><tt class="docutils literal">PySymtable_Free()</tt></li>
<li><tt class="docutils literal">PyUnicode_AsUnicodeCopy()</tt></li>
<li><tt class="docutils literal">PyUnicode_GetMax()</tt></li>
<li><tt class="docutils literal">Py_ALLOW_RECURSION</tt></li>
<li><tt class="docutils literal">Py_END_ALLOW_RECURSION</tt></li>
<li><tt class="docutils literal">Py_SymtableString()</tt></li>
<li><tt class="docutils literal">Py_SymtableStringObject()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_strcat()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_strchr()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_strcmp()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_strcpy()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_strlen()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_strncmp()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_strncpy()</tt></li>
<li><tt class="docutils literal">Py_UNICODE_strrchr()</tt></li>
<li><tt class="docutils literal">_Py_CheckRecursionLimit</tt></li>
</ul>
<p>Remove 1 structure:</p>
<ul class="simple">
<li><tt class="docutils literal">_PyUnicode_Name_CAPI</tt></li>
</ul>
<p>Deprecate 1 function:</p>
<ul class="simple">
<li><tt class="docutils literal">PyUnicode_InternImmortal()</tt></li>
</ul>
<p>Moreover, <tt class="docutils literal">PyUnicode_FromStringAndSize(NULL, size)</tt> and
<tt class="docutils literal">PyUnicode_FromUnicode(NULL, size)</tt> have been deprecated.</p>
</div>
<div class="section" id="statistics">
<h3>Statistics</h3>
<p>Public Python symbols exported with <tt class="docutils literal">PyAPI_FUNC()</tt> and <tt class="docutils literal">PyAPI_DATA()</tt>:</p>
<table border="1" class="docutils">
<colgroup>
<col width="39%" />
<col width="61%" />
</colgroup>
<thead valign="bottom">
<tr><th class="head">Python</th>
<th class="head">Symbols</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>2.7</td>
<td>891</td>
</tr>
<tr><td>3.6</td>
<td>1041 (+150)</td>
</tr>
<tr><td>3.7</td>
<td>1068 (+27)</td>
</tr>
<tr><td>3.8</td>
<td>1105 (+37)</td>
</tr>
<tr><td>3.9</td>
<td>1115 (+10)</td>
</tr>
<tr><td>3.10</td>
<td>1080 (-35)</td>
</tr>
</tbody>
</table>
<p>Command used to count public symbols:</p>
<pre class="literal-block">
grep -E 'PyAPI_(FUNC|DATA)' Include/*.h Include/cpython/*.h|grep -v ' _Py'|wc -l
</pre>
</div>
</div>
<div class="section" id="reorganize-header-files">
<h2>Reorganize header files</h2>
<p>Since Python 3.8, the C API is organized as 3 parts:</p>
<ol class="arabic simple">
<li><tt class="docutils literal">Include/</tt> directory: Limited API</li>
<li><tt class="docutils literal">Include/cpython/</tt> directory: CPython implementation details</li>
<li><tt class="docutils literal">Include/internal/</tt> directory: The internal API</li>
</ol>
<p>The intent is to help developers think about whether their additions belong in
the limited C API, the CPython C API, or the internal C API.</p>
<div class="section" id="python-3-7-1">
<h3>Python 3.7</h3>
<p>Creation of the <tt class="docutils literal">Include/internal/</tt> directory.</p>
</div>
<div class="section" id="python-3-8-1">
<h3>Python 3.8</h3>
<p>Creation of the <tt class="docutils literal">Include/cpython/</tt> directory.</p>
</div>
<div class="section" id="python-3-10-1">
<h3>Python 3.10</h3>
<p>Move 8 header files from <tt class="docutils literal">Include/</tt> to <tt class="docutils literal">Include/cpython/</tt>:</p>
<ul class="simple">
<li><tt class="docutils literal">odictobject.h</tt></li>
<li><tt class="docutils literal">parser_interface.h</tt></li>
<li><tt class="docutils literal">picklebufobject.h</tt></li>
<li><tt class="docutils literal">pyarena.h</tt></li>
<li><tt class="docutils literal">pyctype.h</tt></li>
<li><tt class="docutils literal">pydebug.h</tt></li>
<li><tt class="docutils literal">pyfpe.h</tt></li>
<li><tt class="docutils literal">pytime.h</tt></li>
</ul>
<p>Python 3.10 added an <a class="reference external" href="https://github.com/python/cpython/blob/master/Include/README.rst">Include/README.rst documentation</a> to explain
this organization and give guidelines for adding new functions. For example,
new functions in the public C API must not steal references nor return borrowed
references. Since then, this documentation has moved to the devguide:
<a class="reference external" href="https://devguide.python.org/c-api/">Changing Python’s C API</a>.</p>
</div>
<div class="section" id="statistics-1">
<h3>Statistics</h3>
<p>Number of lines in C API header files per Python version:</p>
<table border="1" class="docutils">
<colgroup>
<col width="14%" />
<col width="27%" />
<col width="22%" />
<col width="24%" />
<col width="14%" />
</colgroup>
<thead valign="bottom">
<tr><th class="head">Python</th>
<th class="head">Limited API</th>
<th class="head">CPython API</th>
<th class="head">Internal API</th>
<th class="head">Total</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>2.7</td>
<td>12,686 (100%)</td>
<td>0</td>
<td>0</td>
<td>12,686</td>
</tr>
<tr><td>3.6</td>
<td>16,011 (100%)</td>
<td>0</td>
<td>0</td>
<td>16,011</td>
</tr>
<tr><td>3.7</td>
<td>16,517 (96%)</td>
<td>0</td>
<td>705 (4%)</td>
<td>17,222</td>
</tr>
<tr><td>3.8</td>
<td>13,160 (70%)</td>
<td>3,417 (18%)</td>
<td>2,230 (12%)</td>
<td>18,807</td>
</tr>
<tr><td>3.9</td>
<td>12,264 (62%)</td>
<td>4,343 (22%)</td>
<td>3,066 (16%)</td>
<td>19,673</td>
</tr>
<tr><td>3.10</td>
<td>10,305 (52%)</td>
<td>4,513 (23%)</td>
<td>5,092 (26%)</td>
<td>19,910</td>
</tr>
</tbody>
</table>
<p>Commands:</p>
<ul class="simple">
<li>Limited: <tt class="docutils literal">wc <span class="pre">-l</span> <span class="pre">Include/*.h</span></tt></li>
<li>CPython: <tt class="docutils literal">wc <span class="pre">-l</span> <span class="pre">Include/cpython/*.h</span></tt></li>
<li>Internal: <tt class="docutils literal">wc <span class="pre">-l</span> <span class="pre">Include/internal/*.h</span></tt></li>
</ul>
</div>
</div>
<div class="section" id="changes-in-the-limited-c-api">
<h2>Changes in the Limited C API</h2>
<p>Between Python 3.8 and 3.10, 4 new functions have been added to the limited
C API and 14 symbols (functions or variables) have been removed from it.</p>
<p>The trashcan API was excluded from the limited C API since it never worked
there: its implementation directly accessed PyThreadState members, whereas this
structure is opaque in the limited C API.</p>
<p>On the other hand, the Py_EnterRecursiveCall() and Py_LeaveRecursiveCall()
functions have been added to the limited C API. In Python 3.8, they were
defined as macros directly accessing PyThreadState members. In Python 3.9, they
became opaque function calls and so are now compatible with the stable ABI.</p>
<div class="section" id="python-3-9-1">
<h3>Python 3.9</h3>
<p>Add 3 functions to the limited C API:</p>
<ul class="simple">
<li><tt class="docutils literal">Py_EnterRecursiveCall()</tt></li>
<li><tt class="docutils literal">Py_LeaveRecursiveCall()</tt></li>
<li><tt class="docutils literal">PyFrame_GetLineNumber()</tt></li>
</ul>
<p>Remove 14 symbols from the limited C API:</p>
<ul class="simple">
<li><tt class="docutils literal">PyFPE_START_PROTECT()</tt></li>
<li><tt class="docutils literal">PyFPE_END_PROTECT()</tt></li>
<li><tt class="docutils literal">PyThreadState_DeleteCurrent()</tt></li>
<li><tt class="docutils literal">PyTrash_UNWIND_LEVEL</tt></li>
<li><tt class="docutils literal">Py_TRASHCAN_BEGIN</tt></li>
<li><tt class="docutils literal">Py_TRASHCAN_BEGIN_CONDITION</tt></li>
<li><tt class="docutils literal">Py_TRASHCAN_END</tt></li>
<li><tt class="docutils literal">Py_TRASHCAN_SAFE_BEGIN</tt></li>
<li><tt class="docutils literal">Py_TRASHCAN_SAFE_END</tt></li>
<li><tt class="docutils literal">_PyTraceMalloc_NewReference()</tt></li>
<li><tt class="docutils literal">_Py_CheckRecursionLimit</tt></li>
<li><tt class="docutils literal">_Py_GetRefTotal()</tt></li>
<li><tt class="docutils literal">_Py_NewReference()</tt></li>
<li><tt class="docutils literal">_Py_ForgetReference()</tt></li>
</ul>
</div>
<div class="section" id="python-3-10-2">
<h3>Python 3.10</h3>
<p>Add 1 function to the limited C API:</p>
<ul class="simple">
<li><tt class="docutils literal">PyUnicode_AsUTF8AndSize()</tt></li>
</ul>
</div>
</div>
<div class="section" id="pep-652-maintaining-the-stable-abi">
<h2>PEP 652: Maintaining the Stable ABI</h2>
<p>Petr Viktorin wrote and implemented the <a class="reference external" href="https://www.python.org/dev/peps/pep-0652/">PEP 652: Maintaining the Stable ABI</a> in Python 3.10.</p>
<p>The Stable ABI (Application Binary Interface) for extension modules or
embedding Python is now explicitly defined. The <a class="reference external" href="https://docs.python.org/dev/c-api/stable.html#stable">C API Stability</a> documentation
describes C API and ABI stability guarantees along with best practices for
using the Stable ABI.</p>
</div>
Creation of the pythoncapi_compat project2021-03-30T20:00:00+02:002021-03-30T20:00:00+02:00Victor Stinnertag:vstinner.github.io,2021-03-30:/pythoncapi_compat.html<a class="reference external image-reference" href="https://twitter.com/Kekeflipnote/status/1378034391872638980"><img alt="Strange Cat by Kéké" src="https://vstinner.github.io/images/strange_cat.jpg" /></a>
<p>In 2020, I created a new <a class="reference external" href="https://github.com/pythoncapi/pythoncapi_compat">pythoncapi_compat project</a> to add Python 3.10 support
to C extensions without losing support for old Python versions. It supports
Python 2.7-3.10 and PyPy 2.7-3.7. The project is made of two parts:</p>
<ul class="simple">
<li><tt class="docutils literal">pythoncapi_compat.h</tt>: Header file providing new C API functions to old
Python versions, like <tt class="docutils literal">Py_SET_TYPE()</tt>.</li>
<li><tt class="docutils literal">upgrade_pythoncapi.py</tt>: Script upgrading C extension modules using
<tt class="docutils literal">pythoncapi_compat.h</tt>. For example, it replaces <tt class="docutils literal">Py_TYPE(obj) = type;</tt>
with <tt class="docutils literal">Py_SET_TYPE(obj, type);</tt>.</li>
</ul>
<p>This article is about the creation of the header file and the upgrade script.</p>
<p>Photo: Strange cats 🐾 by Kéké.</p>
<div class="section" id="py-set-type-macro-for-python-3-8-and-older">
<h2>Py_SET_TYPE() macro for Python 3.8 and older</h2>
<div class="section" id="py-type-macro-converted-to-a-static-inline-function">
<h3>Py_TYPE() macro converted to a static inline function</h3>
<p>In May 2020, in <a class="reference external" href="https://bugs.python.org/issue39573">bpo-39573 "Make PyObject an opaque structure"</a>, the <a class="reference external" href="https://github.com/python/cpython/commit/ad3252bad905d41635bcbb4b76db30d570cf0087">Py_TYPE()</a>
macro (change by Dong-hee Na) and the <a class="reference external" href="https://github.com/python/cpython/commit/fe2978b3b940fe2478335e3a2ca5ad22338cdf9c">Py_REFCNT() and Py_SIZE()</a>
macros (changes by me) were converted to static inline functions. This change
broke 17 C extension modules (see my previous article <a class="reference external" href="https://vstinner.github.io/c-api-opaque-structures.html">Make structures opaque
in the Python C API</a>).</p>
<p>I prepared this change in Python 3.9 by adding the Py_SET_REFCNT(), Py_SET_TYPE()
and Py_SET_SIZE() functions, and by modifying Python to use them. I
also <a class="reference external" href="https://github.com/python/cpython/commit/d905df766c367c350f20c46ccd99d4da19ed57d8">added the Py_IS_TYPE() function</a>
which tests the type of an object:</p>
<pre class="literal-block">
static inline int _Py_IS_TYPE(PyObject *ob, PyTypeObject *type) {
    return ob->ob_type == type;
}
#define Py_IS_TYPE(ob, type) _Py_IS_TYPE(_PyObject_CAST(ob), type)
</pre>
<p>For example, <tt class="docutils literal">Py_TYPE(ob) == (tp)</tt> can be replaced with <tt class="docutils literal">Py_IS_TYPE(ob, tp)</tt>.</p>
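<p>Outside of the CPython headers, the shape of such a helper can be sketched with self-contained toy structures (the names below are made up for the example, not the real PyTypeObject and PyObject):</p>

```c
#include <stddef.h>

/* Toy stand-ins for PyTypeObject and PyObject (assumption: simplified). */
struct toytype { const char *tp_name; };
struct toyobj  { struct toytype *ob_type; };

/* Like Py_IS_TYPE(): test the type by comparing the type pointer. */
static inline int toy_is_type(struct toyobj *ob, struct toytype *type)
{
    return ob->ob_type == type;
}
```

The comparison is a single pointer equality test, which is why the CPython version is a cheap static inline function rather than an exported function call.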
</div>
<div class="section" id="cython-and-numpy-fixes">
<h3>Cython and numpy fixes</h3>
<p>I fixed Cython by <a class="reference external" href="https://github.com/cython/cython/commit/d8e93b332fe7d15459433ea74cd29178c03186bd">adding __Pyx_SET_REFCNT() and __Pyx_SET_SIZE() macros</a>:</p>
<pre class="literal-block">
#if PY_VERSION_HEX >= 0x030900A4
#define __Pyx_SET_REFCNT(obj, refcnt) Py_SET_REFCNT(obj, refcnt)
#define __Pyx_SET_SIZE(obj, size) Py_SET_SIZE(obj, size)
#else
#define __Pyx_SET_REFCNT(obj, refcnt) Py_REFCNT(obj) = (refcnt)
#define __Pyx_SET_SIZE(obj, size) Py_SIZE(obj) = (size)
#endif
</pre>
<p>The <a class="reference external" href="https://github.com/numpy/numpy/commit/a96b18e3d4d11be31a321999cda4b795ea9eccaa">numpy fix</a>:</p>
<pre class="literal-block">
#if PY_VERSION_HEX < 0x030900a4
#define Py_SET_TYPE(obj, typ) (Py_TYPE(obj) = typ)
#define Py_SET_SIZE(obj, size) (Py_SIZE(obj) = size)
#endif
</pre>
<p><a class="reference external" href="https://github.com/numpy/numpy/commit/f1671076c80bd972421751f2d48186ee9ac808aa">The numpy fix was updated</a>
to not have a return value by adding <tt class="docutils literal">", (void)0"</tt>:</p>
<pre class="literal-block">
#if PY_VERSION_HEX < 0x030900a4
#define Py_SET_TYPE(obj, type) ((Py_TYPE(obj) = (type)), (void)0)
#define Py_SET_SIZE(obj, size) ((Py_SIZE(obj) = (size)), (void)0)
#endif
</pre>
<p>So the macros better mimic the behavior of the static inline functions.</p>
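<p>To see why the trailing <tt class="docutils literal">", (void)0"</tt> matters, here is a minimal self-contained sketch (toy names, not the CPython or numpy macros): the comma operator gives the whole macro expression the type <tt class="docutils literal">void</tt>, so it can no longer be misused as a value, just like a <tt class="docutils literal">static inline void</tt> function.</p>

```c
/* Toy object: stands in for PyVarObject (assumption, not CPython code). */
struct toy { int size; };

/* Expression macro: evaluates to the assigned value (first numpy fix). */
#define TOY_SET_SIZE_EXPR(obj, n) ((obj)->size = (n))

/* Void macro: ", (void)0" discards the value (updated numpy fix). */
#define TOY_SET_SIZE_VOID(obj, n) (((obj)->size = (n)), (void)0)

static int demo(void)
{
    struct toy t = {0};
    /* The expression form can leak into a larger expression... */
    int leaked = TOY_SET_SIZE_EXPR(&t, 5) + 1;  /* macro "returns" 5 */
    /* ...whereas "TOY_SET_SIZE_VOID(&t, 7) + 1" would not compile. */
    TOY_SET_SIZE_VOID(&t, 7);
    return leaked + t.size;  /* 6 + 7 */
}
```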
</div>
<div class="section" id="c-api-porting-guide">
<h3>C API Porting Guide</h3>
<p>I copied the numpy macros <a class="reference external" href="https://github.com/python/cpython/commit/dc24b8a2ac32114313bae519db3ccc21fe45c982">to the C API section of the Python 3.10 porting
guide (What's New in Python 3.10)</a>. Quote of the Py_SET_TYPE() entry:</p>
<blockquote>
<p>Since <tt class="docutils literal">Py_TYPE()</tt> is changed to the inline static function,
<tt class="docutils literal">Py_TYPE(obj) = new_type</tt> must be replaced with
<tt class="docutils literal">Py_SET_TYPE(obj, new_type)</tt>: see <tt class="docutils literal">Py_SET_TYPE()</tt> (available since
Python 3.9). For backward compatibility, this macro can be used:</p>
<pre class="literal-block">
#if PY_VERSION_HEX < 0x030900A4
# define Py_SET_TYPE(obj, type) ((Py_TYPE(obj) = (type)), (void)0)
#endif
</pre>
</blockquote>
</div>
<div class="section" id="copy-paste-macros">
<h3>Copy/paste macros</h3>
<p>Up to 3 macros must be copied/pasted for backward compatibility in each
project:</p>
<pre class="literal-block">
#if PY_VERSION_HEX < 0x030900A4
# define Py_SET_TYPE(obj, type) ((Py_TYPE(obj) = (type)), (void)0)
#endif
#if PY_VERSION_HEX < 0x030900A4
# define Py_SET_REFCNT(obj, refcnt) ((Py_REFCNT(obj) = (refcnt)), (void)0)
#endif
#if PY_VERSION_HEX < 0x030900A4
# define Py_SET_SIZE(obj, size) ((Py_SIZE(obj) = (size)), (void)0)
#endif
</pre>
<p>These macros started to be copied into multiple projects. Examples:</p>
<ul class="simple">
<li><a class="reference external" href="https://bazaar.launchpad.net/~brz/brz/3.1/revision/7647">breezy</a></li>
<li><a class="reference external" href="https://github.com/numpy/numpy/commit/f1671076c80bd972421751f2d48186ee9ac808aa">numpy</a></li>
<li><a class="reference external" href="https://github.com/pycurl/pycurl/commit/e633f9a1ac4df5e249e78c218d5fbbd848219042">pycurl</a></li>
</ul>
<p>There might be a better way than copying/pasting these compatibility layer in
each project, adding macros one by one...</p>
</div>
</div>
<div class="section" id="creation-of-the-pythoncapi-compat-h-header-file">
<h2>Creation of the pythoncapi_compat.h header file</h2>
<p>While the code for the Py_SET_REFCNT(), Py_SET_TYPE() and Py_SET_SIZE() macros is
short, I also wanted to use the seven new Python 3.9 getter functions on Python
3.8 and older:</p>
<ul class="simple">
<li>Py_IS_TYPE()</li>
<li>PyFrame_GetBack()</li>
<li>PyFrame_GetCode()</li>
<li>PyInterpreterState_Get()</li>
<li>PyThreadState_GetFrame()</li>
<li>PyThreadState_GetID()</li>
<li>PyThreadState_GetInterpreter()</li>
</ul>
<p>In June 2020, I created <a class="reference external" href="https://github.com/pythoncapi/pythoncapi_compat">the pythoncapi_compat project</a> with a
<a class="reference external" href="https://github.com/pythoncapi/pythoncapi_compat/blob/main/pythoncapi_compat.h">pythoncapi_compat.h header file</a>
which defines these functions as static inline functions. An
<tt class="docutils literal">"#if PY_VERSION_HEX"</tt> guard avoids defining a function if it's already
provided by <tt class="docutils literal">Python.h</tt>. Example of the current implementation of
PyThreadState_GetInterpreter() for Python 3.8 and older:</p>
<pre class="literal-block">
// bpo-39947 added PyThreadState_GetInterpreter() to Python 3.9.0a5
#if PY_VERSION_HEX < 0x030900A5
static inline PyInterpreterState *
PyThreadState_GetInterpreter(PyThreadState *tstate)
{
    assert(tstate != NULL);
    return tstate->interp;
}
#endif
</pre>
<p>I wrote tests for each function using a C extension. The project initially
supported Python 3.6 to Python 3.10. The test runner also checks for reference
leaks.</p>
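<p>As an illustration of what such a reference leak check can look like, here is a self-contained sketch (an assumption about the approach, not the actual pythoncapi_compat test runner): compare a global reference counter before and after running a test case, similar to CPython's total reference count check in debug builds.</p>

```c
/* Toy reference counting (not the CPython implementation). */
static int total_refs = 0;
struct obj { int refcnt; };

static void obj_incref(struct obj *o) { o->refcnt++; total_refs++; }
static void obj_decref(struct obj *o) { o->refcnt--; total_refs--; }

/* Run a leak-free test case and report the reference count delta:
   a non-zero result would mean that the test leaked references. */
static int check_refleak(void)
{
    struct obj o = {0};
    int before = total_refs;
    obj_incref(&o);   /* every incref... */
    obj_decref(&o);   /* ...is paired with a decref */
    return total_refs - before;
}
```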
</div>
<div class="section" id="mercurial-and-python-2-7">
<h2>Mercurial and Python 2.7</h2>
<p>The Mercurial project has multiple C extensions, was broken on Python 3.10 by
the Py_TYPE() change, and is one of the last projects still requiring Python 2.7
in 2021. It is a good candidate to check whether pythoncapi_compat.h is useful.</p>
<p><a class="reference external" href="https://bz.mercurial-scm.org/show_bug.cgi?id=6451">I proposed a patch</a> then
<a class="reference external" href="https://foss.heptapod.net/octobus/mercurial-devel/-/merge_requests/61">converted to a merge request</a>. It
got accepted in the "next" branch, but compatibility with Visual Studio 2008
had to be fixed for Python 2.7 on Windows. I fixed pythoncapi_compat.h by
defining <tt class="docutils literal">inline</tt> as <tt class="docutils literal">__inline</tt>:</p>
<pre class="literal-block">
// Compatibility with Visual Studio 2013 and older which don't support
// the inline keyword in C (only in C++): use __inline instead.
#if (defined(_MSC_VER) && _MSC_VER < 1900 \
     && !defined(__cplusplus) && !defined(inline))
# define inline __inline
# define PYTHONCAPI_COMPAT_MSC_INLINE
// These two macros are undefined at the end of this file
#endif
(...)
#ifdef PYTHONCAPI_COMPAT_MSC_INLINE
# undef inline
# undef PYTHONCAPI_COMPAT_MSC_INLINE
#endif
</pre>
<p>I chose to continue writing <tt class="docutils literal">static inline</tt>, so pythoncapi_compat.h remains
close to the Python header files. I also modified the pythoncapi_compat test
suite to test Python 2.7.</p>
</div>
<div class="section" id="pybind11-and-pypy">
<h2>pybind11 and PyPy</h2>
<p>More recently, I added PyPy 2.7, 3.6 and 3.7 support to pythoncapi_compat for
pybind11, since PyPy is tested by the pybind11 CI. The fix is to no longer
define the following functions on PyPy:</p>
<ul class="simple">
<li>PyFrame_GetBack(), _PyFrame_GetBackBorrow()</li>
<li>PyThreadState_GetFrame(), _PyThreadState_GetFrameBorrow()</li>
<li>PyThreadState_GetID()</li>
<li>PyObject_GC_IsTracked()</li>
<li>PyObject_GC_IsFinalized()</li>
</ul>
</div>
<div class="section" id="creation-of-the-upgrade-pythoncapi-py-script">
<h2>Creation of the upgrade_pythoncapi.py script</h2>
<div class="section" id="upgrade-pythoncapi-py">
<h3>upgrade_pythoncapi.py</h3>
<p>In November 2020, I created a new <tt class="docutils literal">upgrade_pythoncapi.py</tt> script to replace
<tt class="docutils literal">"Py_TYPE(obj) = type;"</tt> with <tt class="docutils literal">"Py_SET_TYPE(obj, <span class="pre">type);"</span></tt>. The script is
based on my <a class="reference external" href="https://github.com/vstinner/sixer">old sixer.py project</a> which
adds Python 3 support to a Python project without losing Python 2 support. The
<tt class="docutils literal">upgrade_pythoncapi.py</tt> script uses regular expressions to replace one
pattern with another.</p>
<p>Similar to <tt class="docutils literal">sixer</tt> which adds <tt class="docutils literal">import six</tt> to support Python 2 and Python 3
in a single code base, <tt class="docutils literal">upgrade_pythoncapi.py</tt> adds
<tt class="docutils literal">#include "pythoncapi_compat.h"</tt> to support old and new versions of the
Python C API in a single code base.</p>
<p>I first created a new GitHub project for upgrade_pythoncapi.py, but since it
was too tightly coupled to the pythoncapi_compat.h header file, I moved the
script to the pythoncapi_compat project.</p>
</div>
<div class="section" id="tests">
<h3>Tests</h3>
<p>I added more and more "operations" to update C extensions. For me, <strong>the most
important part is the test suite</strong>, which ensures that the script doesn't
introduce bugs. The test suite contains code which must not be replaced. For
example, it ensures that
<tt class="docutils literal"><span class="pre">frame->f_code</span> = code</tt> is not replaced with <tt class="docutils literal">_PyFrame_GetCodeBorrow(frame) =
code</tt> by mistake.</p>
</div>
<div class="section" id="borrowed-references">
<h3>Borrowed references</h3>
<p>Code accessing <tt class="docutils literal"><span class="pre">frame->f_code</span></tt> directly must use <tt class="docutils literal">PyFrame_GetCode()</tt> but
this function returns a strong reference, whereas
<tt class="docutils literal"><span class="pre">frame->f_code</span></tt> gives a borrowed reference. I added "Borrow" variants of the
functions to <tt class="docutils literal">pythoncapi_compat.h</tt> for <tt class="docutils literal">upgrade_pythoncapi.py</tt>. For
example, <tt class="docutils literal"><span class="pre">frame->f_code</span></tt> is replaced with <tt class="docutils literal">_PyFrame_GetCodeBorrow()</tt> which
is defined as:</p>
<pre class="literal-block">
static inline PyCodeObject*
_PyFrame_GetCodeBorrow(PyFrameObject *frame)
{
    return (PyCodeObject *)_Py_StealRef(PyFrame_GetCode(frame));
}
</pre>
<p>The <tt class="docutils literal">_Py_StealRef(obj)</tt> function converts a strong reference to a borrowed
reference (simplified code):</p>
<pre class="literal-block">
static inline PyObject* _Py_StealRef(PyObject *obj)
{
    Py_DECREF(obj);
    return obj;
}
</pre>
<p>It is the opposite of <tt class="docutils literal">Py_NewRef()</tt>. It is similar to <tt class="docutils literal">Py_DECREF(obj)</tt> but
it can be used as an expression: it returns <em>obj</em>. pythoncapi_compat.h defines
private <tt class="docutils literal">_Py_StealRef()</tt> and <tt class="docutils literal">_Py_XStealRef()</tt> static inline functions.
First I proposed to add them to Python, but I abandoned the idea (see
<a class="reference external" href="https://bugs.python.org/issue42522">bpo-42522</a>).</p>
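<p>The semantics can be sketched with a toy reference counter in plain C (the names are made up for the example; this is not the Python C API): a "new ref" helper creates a strong reference, while a "steal ref" helper drops it but still returns the pointer, turning it into a borrowed reference.</p>

```c
/* Toy refcounted object (assumption: simplified, not PyObject). */
struct obj { int refcnt; };

/* Like Py_NewRef(): create a new strong reference. */
static struct obj *toy_newref(struct obj *o)
{
    o->refcnt++;
    return o;
}

/* Like _Py_StealRef(): drop the strong reference but return the
   pointer, which becomes a borrowed reference. */
static struct obj *toy_stealref(struct obj *o)
{
    o->refcnt--;
    return o;
}
```

Chaining them, <tt class="docutils literal">toy_stealref(toy_newref(o))</tt> leaves the reference count unchanged, which is exactly how the "Borrow" variants wrap the strong-reference getters.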
<p>Thanks to the "Borrow" suffix in function names, it becomes easier to discover
the usage of borrowed references. Using a borrowed reference is unsafe if the
object can be destroyed before the last usage of the borrowed reference. In
case of doubt, it's better to use a strong reference. For example,
<tt class="docutils literal">_PyFrame_GetCodeBorrow()</tt> can be replaced with
<tt class="docutils literal">PyFrame_GetCode()</tt>, but it requires explicitly deleting the created strong
reference with <tt class="docutils literal">Py_DECREF()</tt>.</p>
</div>
</div>
<div class="section" id="practical-solution-for-incompatible-c-api-changes">
<h2>Practical solution for incompatible C API changes</h2>
<p>So far, I have convinced 4 projects to use pythoncapi_compat.h:
bitarray, immutables, Mercurial and python-zstandard.</p>
<p>In my opinion, pythoncapi_compat.h is the right approach to introduce
incompatible C API changes: provide a practical solution to support old and new
Python versions in a single code base.</p>
<p>The next step is to get it adopted more widely and endorsed by the
Python project, maybe by moving it under the PSF organization on GitHub.</p>
</div>
Make structures opaque in the Python C API2021-03-26T12:00:00+01:002021-03-26T12:00:00+01:00Victor Stinnertag:vstinner.github.io,2021-03-26:/c-api-opaque-structures.html<a class="reference external image-reference" href="https://fr.wikipedia.org/wiki/Incendie_du_centre_de_donn%C3%A9es_d%27OVHcloud_%C3%A0_Strasbourg"><img alt="OVHcloud datacenter fire in Strasbourg" src="https://vstinner.github.io/images/incendie-ovh.jpg" /></a>
<p>This article is about changes that I made, with the help of other developers, in
the Python C API in Python 3.8, 3.9 and 3.10 to avoid accessing structure
members: prepare the C API to <a class="reference external" href="https://en.wikipedia.org/wiki/Opaque_data_type">make structures opaque</a>. These changes are related
to my <a class="reference external" href="https://www.python.org/dev/peps/pep-0620/">PEP 620 "Hide implementation …</a></p><a class="reference external image-reference" href="https://fr.wikipedia.org/wiki/Incendie_du_centre_de_donn%C3%A9es_d%27OVHcloud_%C3%A0_Strasbourg"><img alt="OVHcloud datacenter fire in Strasbourg" src="https://vstinner.github.io/images/incendie-ovh.jpg" /></a>
<p>This article is about changes that I made, with the help of other developers, in
the Python C API in Python 3.8, 3.9 and 3.10 to avoid accessing structure
members: prepare the C API to <a class="reference external" href="https://en.wikipedia.org/wiki/Opaque_data_type">make structures opaque</a>. These changes are related
to my <a class="reference external" href="https://www.python.org/dev/peps/pep-0620/">PEP 620 "Hide implementation details from the C API"</a>.</p>
<p>One change had a <strong>negative impact on performance</strong> and had to be
reverted. Making Python slower just to make structures opaque would first
require getting PEP 620 accepted.</p>
<p>While the compatible changes merged in Python 3.8 and Python 3.9 went fine, one
Python 3.10 <strong>incompatible change caused more trouble</strong> and had to be
reverted.</p>
<p>Photo: OVHcloud data center fire in Strasbourg.</p>
<div class="section" id="rationale">
<h2>Rationale</h2>
<p>The C API currently exposes most object structures. C extensions indirectly
access structure members through the API, but can also access them directly.
This causes different issues:</p>
<ul class="simple">
<li>Modifying a structure can break an unknown number of C extensions. To prevent
any risk, CPython core developers avoid modifying structures. Once most
structures are opaque, it will be possible to experiment with <strong>optimizations</strong>
which require deep structure changes without breaking C extensions. The
irony is that we first have to break backward compatibility and C
extensions for that.</li>
<li>Any structure change breaks the ABI. The <strong>stable ABI</strong> solved this issue by
not exposing structures in its limited C API. The idea is to bend the
default C API towards the limited C API to provide a stable ABI for everyone
in the long term.</li>
</ul>
</div>
<div class="section" id="issues">
<h2>Issues</h2>
<ul class="simple">
<li><a class="reference external" href="https://bugs.python.org/issue39573">PyObject: bpo-39573</a></li>
<li><a class="reference external" href="https://bugs.python.org/issue40170">PyTypeObject: bpo-40170</a></li>
<li><a class="reference external" href="https://bugs.python.org/issue39947">PyThreadState: bpo-39947</a></li>
<li><a class="reference external" href="https://bugs.python.org/issue40421">PyFrameObject: bpo-40421</a></li>
</ul>
</div>
<div class="section" id="opaque-structures">
<h2>Opaque structures</h2>
<ul class="simple">
<li>Python 3.8 made the PyInterpreterState structure opaque.</li>
<li>Python 3.9 made the PyGC_Head structure opaque.</li>
</ul>
</div>
<div class="section" id="add-getter-functions-to-python-3-9">
<h2>Add getter functions to Python 3.9</h2>
<ul class="simple">
<li>PyObject, PyVarObject:<ul>
<li>Py_SET_REFCNT()</li>
<li>Py_SET_TYPE()</li>
<li>Py_SET_SIZE()</li>
<li>Py_IS_TYPE()</li>
</ul>
</li>
<li>PyFrameObject:<ul>
<li>PyFrame_GetCode()</li>
<li>PyFrame_GetBack()</li>
</ul>
</li>
<li>PyThreadState:<ul>
<li>PyThreadState_GetInterpreter()</li>
<li>PyThreadState_GetFrame()</li>
<li>PyThreadState_GetID()</li>
</ul>
</li>
<li>PyInterpreterState:<ul>
<li>PyInterpreterState_Get()</li>
</ul>
</li>
</ul>
<p>PyInterpreterState_Get() can be used to replace <tt class="docutils literal"><span class="pre">PyThreadState_Get()->interp</span></tt>
and <tt class="docutils literal"><span class="pre">PyThreadState_GetInterpreter(PyThreadState_Get())</span></tt>.</p>
</div>
<div class="section" id="convert-macros-to-static-inline-functions-in-python-3-8">
<h2>Convert macros to static inline functions in Python 3.8</h2>
<div class="section" id="macro-pitfalls">
<h3>Macro pitfalls</h3>
<p>Macros are convenient but have <a class="reference external" href="https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html">multiple pitfalls</a>. Some macros
can be abused in surprising ways. For example, the following code is valid with
Python 3.9:</p>
<pre class="literal-block">
if (obj == NULL || PyList_SET_ITEM(l, i, obj) < 0) { ... }
</pre>
<p>In Python 3.9, PyList_SET_ITEM() returns <em>obj</em> in this case. Since <em>obj</em> is a
pointer, the test checks if a pointer is negative, which makes no sense
(but is accepted by C compilers by default). This code is likely a confusion
with PyList_SetItem() which returns an int, negative in case of an error.</p>
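<p>The pitfall can be reproduced with a self-contained toy macro (simplified for the example, not the real PyList_SET_ITEM()): because the assignment macro evaluates to a pointer, comparing it to zero compiles, but the "error check" can never be true.</p>

```c
#include <stddef.h>

/* Toy list (assumption: not the CPython structure). */
struct toylist { void *items[8]; };

/* Like the Python 3.9 PyList_SET_ITEM(): the assignment macro
   evaluates to the stored pointer. */
#define TOYLIST_SET_ITEM(l, i, v) ((l)->items[(i)] = (v))

static int buggy_check(struct toylist *l, void *v)
{
    /* Compiles because the macro is an expression, but a valid object
       pointer never compares below the null pointer: the error path
       is dead code, silently hiding the bug. */
    if (TOYLIST_SET_ITEM(l, 0, v) < (void *)0) {
        return -1;
    }
    return 0;
}
```

With a void-returning macro (or a static inline void function), the same <tt class="docutils literal">&lt; 0</tt> comparison becomes a compiler error instead of a silent no-op.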
<p>Zackery Spytz and I modified <a class="reference external" href="https://github.com/python/cpython/commit/556d97f473fa538cef780f84bd29239ecf57d9c5">PyList_SET_ITEM()</a>
and <a class="reference external" href="https://github.com/python/cpython/commit/0ef96c2b2a291c9d2d9c0ba42bbc1900a21e65f3">PyCell_SET()</a>
macros in Python 3.10 to return void.</p>
<p>This change broke alsa-python: I proposed a <a class="reference external" href="https://github.com/alsa-project/alsa-python/commit/5ea2f8709b4d091700750661231f8a3ddce0fc7c">fix which was merged</a>.</p>
<p>One nice side effect of converting macros to static inline functions is that
debuggers and profilers are able to retrieve the name of the function.</p>
</div>
<div class="section" id="converted-macros">
<h3>Converted macros</h3>
<ul class="simple">
<li>Py_INCREF(), Py_XINCREF()</li>
<li>Py_DECREF(), Py_XDECREF()</li>
<li>PyObject_INIT(), PyObject_INIT_VAR()</li>
<li>_PyObject_GC_TRACK(), _PyObject_GC_UNTRACK(), _Py_Dealloc()</li>
</ul>
</div>
<div class="section" id="performance">
<h3>Performance</h3>
<p>Since <tt class="docutils literal">Py_INCREF()</tt> is critical for general Python performance, the impact
of the change was analyzed in depth before <a class="reference external" href="https://github.com/python/cpython/commit/2aaf0c12041bcaadd7f2cc5a54450eefd7a6ff12">being merged</a>
in <a class="reference external" href="https://bugs.python.org/issue35059">bpo-35059</a>. The usage of
<tt class="docutils literal"><span class="pre">__attribute__((always_inline))</span></tt> and <tt class="docutils literal">__forceinline</tt> to force inlining was
rejected.</p>
</div>
<div class="section" id="cast-to-pyobject">
<h3>Cast to PyObject*</h3>
<p>Old Py_INCREF() implementation in Python 3.7:</p>
<pre class="literal-block">
#define Py_INCREF(op) ( \
_Py_INC_REFTOTAL _Py_REF_DEBUG_COMMA \
((PyObject *)(op))->ob_refcnt++)
</pre>
<p>where <tt class="docutils literal">_Py_INC_REFTOTAL _Py_REF_DEBUG_COMMA</tt> becomes <tt class="docutils literal"><span class="pre">_Py_RefTotal++,</span></tt> if
the <tt class="docutils literal">Py_REF_DEBUG</tt> macro is defined, or nothing otherwise. Current
Py_INCREF() implementation in Python 3.10:</p>
<pre class="literal-block">
static inline void _Py_INCREF(PyObject *op)
{
#ifdef Py_REF_DEBUG
    _Py_RefTotal++;
#endif
    op->ob_refcnt++;
}
#define Py_INCREF(op) _Py_INCREF(_PyObject_CAST(op))
</pre>
<p>Most static inline functions go through a macro which casts their argument to
<tt class="docutils literal">PyObject*</tt>:</p>
<pre class="literal-block">
#define _PyObject_CAST(op) ((PyObject*)(op))
</pre>
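<p>The same pattern can be sketched outside of CPython with toy types (the names below are made up for the example): the macro casts any pointer whose first member is the base structure, so callers don't need an explicit cast, while the static inline function keeps type checking on its own parameter.</p>

```c
/* Toy base/subtype layout mimicking PyObject (assumption, not CPython). */
struct base { int refcnt; };
struct sub  { struct base ob_base; int extra; };

/* Like _PyObject_CAST(): cast the argument to the base type. */
#define AS_BASE(op) ((struct base *)(op))

/* Like _Py_INCREF(): the typed static inline function. */
static void base_incref(struct base *op)
{
    op->refcnt++;
}

/* Like "#define Py_INCREF(op) _Py_INCREF(_PyObject_CAST(op))". */
#define INCREF(op) base_incref(AS_BASE(op))
```

Casting a <tt class="docutils literal">struct sub *</tt> to <tt class="docutils literal">struct base *</tt> is well defined in C because the base structure is the first member, which is the same layout guarantee that CPython objects rely on.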
</div>
</div>
<div class="section" id="convert-macros-to-regular-functions-in-python-3-9">
<h2>Convert macros to regular functions in Python 3.9</h2>
<div class="section" id="converted-macros-1">
<h3>Converted macros</h3>
<ul class="simple">
<li>PyIndex_Check()</li>
<li>PyObject_CheckBuffer()</li>
<li>PyObject_GET_WEAKREFS_LISTPTR()</li>
<li>PyObject_IS_GC()</li>
<li>PyObject_NEW(): alias to PyObject_New()</li>
<li>PyObject_NEW_VAR(): alias to PyObjectVar_New()</li>
</ul>
</div>
<div class="section" id="performance-1">
<h3>Performance</h3>
<p>PyType_HasFeature() was modified to always call the PyType_GetFlags() function,
rather than directly accessing <tt class="docutils literal">PyTypeObject.tp_flags</tt>. The problem is that
on macOS, Python is built without LTO, so the PyType_GetFlags() call is not
inlined, making functions like tuplegetter_descr_get() <strong>slower</strong>: see
<a class="reference external" href="https://bugs.python.org/issue39542#msg372962">bpo-39542</a>. I <strong>reverted the
PyType_HasFeature() change</strong> until the PEP 620 is accepted. macOS does not
use LTO to keep support for macOS 10.6 (Snow Leopard): see <a class="reference external" href="https://bugs.python.org/issue41181">bpo-41181</a>.</p>
</div>
<div class="section" id="fast-static-inline-functions">
<h3>Fast static inline functions</h3>
<p>To keep the best performance on Python built without LTO, fast private variants
were added as static inline functions to the internal C API:</p>
<ul class="simple">
<li>_PyIndex_Check()</li>
<li>_PyObject_IS_GC()</li>
<li>_PyType_HasFeature()</li>
<li>_PyType_IS_GC()</li>
</ul>
<p>For example, PyObject_IS_GC() is defined as a function, whereas
_PyObject_IS_GC() is defined as an internal static inline function. Header
file:</p>
<pre class="literal-block">
/* Test if an object implements the garbage collector protocol */
PyAPI_FUNC(int) PyObject_IS_GC(PyObject *obj);
// Fast inlined version of PyObject_IS_GC()
static inline int _PyObject_IS_GC(PyObject *obj)
{
    return (PyType_IS_GC(Py_TYPE(obj))
            && (Py_TYPE(obj)->tp_is_gc == NULL
                || Py_TYPE(obj)->tp_is_gc(obj)));
}
</pre>
<p>C code:</p>
<pre class="literal-block">
int
PyObject_IS_GC(PyObject *obj)
{
    return _PyObject_IS_GC(obj);
}
</pre>
</div>
</div>
<div class="section" id="python-3-10-incompatible-c-api-change">
<h2>Python 3.10 incompatible C API change</h2>
<p>The <tt class="docutils literal">Py_REFCNT()</tt> macro was converted to a static inline function:
<tt class="docutils literal">Py_REFCNT(obj) = refcnt;</tt> now fails with a compiler error. It must be
replaced with <tt class="docutils literal">Py_SET_REFCNT(obj, refcnt)</tt>: Py_SET_REFCNT() was added to
Python 3.9.</p>
</div>
<div class="section" id="the-complex-case-of-py-type-and-py-size-macros">
<h2>The complex case of Py_TYPE() and Py_SIZE() macros</h2>
<div class="section" id="macros-converted-and-then-reverted">
<h3>Macros converted and then reverted</h3>
<p>The <tt class="docutils literal">Py_TYPE()</tt> and <tt class="docutils literal">Py_SIZE()</tt> macros were also converted to static inline
functions in Python 3.10, but the change <a class="reference external" href="https://bugs.python.org/issue39573#msg370303">broke 17 C extensions</a>.</p>
<p>Since the change broke too many C extensions, I reverted it: I
<a class="reference external" href="https://github.com/python/cpython/commit/0e2ac21dd4960574e89561243763eabba685296a">converted Py_TYPE() and Py_SIZE() back to macros</a>
to have more time to fix C extensions.</p>
</div>
<div class="section" id="i-fixed-6-extensions">
<h3>I fixed 6 extensions</h3>
<ul class="simple">
<li>Cython: <a class="reference external" href="https://github.com/cython/cython/commit/d8e93b332fe7d15459433ea74cd29178c03186bd">my fix adding __Pyx_SET_SIZE() and __Pyx_SET_REFCNT()</a></li>
<li>immutables: <a class="reference external" href="https://github.com/MagicStack/immutables/commit/45105ecd8b56a4d88dbcb380fcb8ff4b9cc7b19c">my fix adding pythoncapi_compat.h for Py_SET_SIZE()</a></li>
<li>breezy: <a class="reference external" href="https://bazaar.launchpad.net/~brz/brz/3.1/revision/7647">my fix adding Py_SET_REFCNT() macro</a></li>
<li>bitarray: <a class="reference external" href="https://github.com/ilanschnell/bitarray/commit/a0cca9f2986ec796df74ca8f42aff56c4c7103ba">my fix adding pythoncapi_compat.h</a></li>
<li>python-zstandard: <a class="reference external" href="https://github.com/indygreg/python-zstandard/commit/e5a3baf61b65f3075f250f504ddad9f8612bfedf">my fix adding pythoncapi_compat.h</a>
followed by <a class="reference external" href="https://github.com/indygreg/python-zstandard/commit/477776e6019478ca1c0b5777b073afbec70975f5">a pythoncapi_compat.h update for Python 2.7</a></li>
<li>mercurial: <a class="reference external" href="https://www.mercurial-scm.org/repo/hg/rev/e92ca942ddca">my fix adding pythoncapi_compat.h</a>
followed by a <a class="reference external" href="https://www.mercurial-scm.org/repo/hg/rev/38b9a63d3a13">fix for Python 2.7</a>
(then <a class="reference external" href="https://github.com/pythoncapi/pythoncapi_compat/commit/3e0bde93954ea8df328d36900c7060a3f3433eb0">fixed into upstream pythoncapi_compat.h</a>)</li>
</ul>
</div>
<div class="section" id="extensions-fixed-by-others">
<h3>Extensions fixed by others</h3>
<ul class="simple">
<li>numpy: <a class="reference external" href="https://github.com/numpy/numpy/commit/a96b18e3d4d11be31a321999cda4b795ea9eccaa">fix defining Py_SET_TYPE() and Py_SET_SIZE()</a>,
followed by a <a class="reference external" href="https://github.com/numpy/numpy/commit/f1671076c80bd972421751f2d48186ee9ac808aa">cleanup commit</a></li>
<li>pycurl: <a class="reference external" href="https://github.com/pycurl/pycurl/commit/e633f9a1ac4df5e249e78c218d5fbbd848219042">fix defining Py_SET_TYPE()</a></li>
<li>boost: <a class="reference external" href="https://github.com/boostorg/python/commit/500194edb7833d0627ce7a2595fec49d0aae2484#diff-b06ac66c98951b48056826c904be75263cdf56ec9b79d3274ea493e7d27cbac4">fix adding Py_SET_TYPE() and Py_SET_SIZE() macros</a></li>
<li>duplicity:
<a class="reference external" href="https://git.launchpad.net/duplicity/commit/?id=9c63dcb83e922e0afac206188203891e203b4e66">fix 1</a>,
<a class="reference external" href="https://git.launchpad.net/duplicity/commit/?id=bbaae91b5ac6ef7e295968e508522884609fbf84">fix 2</a></li>
<li>pylibacl: <a class="reference external" href="https://github.com/iustin/pylibacl/commit/26712b8fd92f1146102248cac1c92cb344620eff">fixed</a></li>
<li>gobject-introspection: <a class="reference external" href="https://gitlab.gnome.org/GNOME/gobject-introspection/-/commit/c4d7d21a2ad838077c6310532fdf7505321f0ae7">fix adding Py_SET_TYPE() macro</a></li>
</ul>
</div>
<div class="section" id="extensions-still-not-fixed">
<h3>Extensions still not fixed</h3>
<ul class="simple">
<li>pyside2:<ul>
<li>My patch is not merged upstream yet</li>
<li><a class="reference external" href="https://bugreports.qt.io/browse/PYSIDE-1436">https://bugreports.qt.io/browse/PYSIDE-1436</a></li>
<li><a class="reference external" href="https://src.fedoraproject.org/rpms/python-pyside2/pull-request/7">https://src.fedoraproject.org/rpms/python-pyside2/pull-request/7</a></li>
<li><a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1898974">https://bugzilla.redhat.com/show_bug.cgi?id=1898974</a></li>
<li><a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1902618">https://bugzilla.redhat.com/show_bug.cgi?id=1902618</a></li>
</ul>
</li>
<li>pybluez: <a class="reference external" href="https://github.com/pybluez/pybluez/pull/371">closed PR (not merged)</a></li>
<li>PyPAM</li>
<li>pygobject3</li>
<li>rdiff-backup</li>
</ul>
</div>
</div>
<div class="section" id="what-s-next">
<h2>What's Next?</h2>
<ul class="simple">
<li>Convert again Py_TYPE() and Py_SIZE() macros to static inline functions.</li>
<li>Add "%T" formatter for <tt class="docutils literal"><span class="pre">Py_TYPE(obj)->tp_name</span></tt>:
see <a class="reference external" href="https://bugs.python.org/issue34595">rejected bpo-34595</a>.</li>
<li>Modify Cython to use getter functions.</li>
<li>Attempt to make some structures opaque, like PyThreadState.</li>
</ul>
</div>
Isolate Python Subinterpreters2020-12-27T22:00:00+01:002020-12-27T22:00:00+01:00Victor Stinnertag:vstinner.github.io,2020-12-27:/isolate-subinterpreters.html<img alt="Christmas gift." src="https://vstinner.github.io/images/christmas-gift.jpg" />
<p>This article is about the work done in Python in 2019 and 2020 to better
isolate subinterpreters. Static types are converted to heap types, extension
modules are converted to use the new multiphase initialization API (PEP 489),
caches, states, singletons and free lists are made per-interpreter, many bugs
have been …</p><img alt="Christmas gift." src="https://vstinner.github.io/images/christmas-gift.jpg" />
<p>This article is about the work done in Python in 2019 and 2020 to better
isolate subinterpreters. Static types are converted to heap types, extension
modules are converted to use the new multiphase initialization API (PEP 489),
caches, states, singletons and free lists are made per-interpreter, many bugs
have been fixed, etc.</p>
<p>Running multiple interpreters in parallel with one "GIL" per interpreter cannot
be done yet, but a lot of complex technical challenges have been solved.</p>
<div class="section" id="why-isolating-subinterpreters">
<h2>Why isolating subinterpreters?</h2>
<p>The final goal is to be able to run multiple interpreters in parallel in the
same process, like one interpreter per CPU, each interpreter running in its own
thread. The principle is the same as the multiprocessing module and has the
same limitations: no Python object can be shared directly between two
interpreters. Later, we can imagine helpers to share mutable Python objects
using proxies which would prevent race conditions.</p>
<p>The work on subinterpreters requires modifying many functions and extension
modules. It will benefit Python in different ways.</p>
<p>Converting static types to heap types and converting extension modules to the
multiphase initialization API (PEP 489) makes extension modules implemented in
C behave closer to modules implemented in Python, which is good for the <a class="reference external" href="https://www.python.org/dev/peps/pep-0399/">PEP
399 -- Pure Python/C Accelerator Module Compatibility Requirements</a>. So <strong>this work also helps
Python implementations other than CPython, like PyPy</strong>.</p>
<p>These changes also destroy more Python objects and release more memory at
Python exit, which matters <strong>when Python is embedded in an application</strong>. Python
should be "stateless" and especially release all memory at exit. This work slowly
fixes <a class="reference external" href="https://bugs.python.org/issue1635741">bpo-1635741: Py_Finalize() doesn't clear all Python objects at exit</a>. Python leaks fewer and fewer Python
objects at exit.</p>
</div>
<div class="section" id="proof-of-concept-in-may-2020">
<h2>Proof-of-concept in May 2020</h2>
<p>In May 2020, I wrote a proof-of-concept to prove the feasibility of the project
and that it is faster than sequential execution: <a class="reference external" href="https://mail.python.org/archives/list/python-dev@python.org/thread/S5GZZCEREZLA2PEMTVFBCDM52H4JSENR/#RIK75U3ROEHWZL4VENQSQECB4F4GDELV">PoC: Subinterpreters
4x faster than sequential execution or threads on CPU-bound workaround</a>.
Benchmark on 4 CPUs:</p>
<ul class="simple">
<li>Sequential: 1.99 sec +- 0.01 sec</li>
<li>Threads: 3.15 sec +- 0.97 sec (1.5x <strong>slower</strong>)</li>
<li>Multiprocessing: 560 ms +- 12 ms (3.6x <strong>faster</strong>)</li>
<li>Subinterpreters: 583 ms +- 7 ms (3.4x <strong>faster</strong>)</li>
</ul>
<p>On this benchmark, subinterpreters are basically as fast as multiprocessing,
which is promising.</p>
</div>
<div class="section" id="experimental-isolated-subintepreters">
<h2>Experimental isolated subinterpreters</h2>
<p>To write this PoC, I added a <tt class="docutils literal"><span class="pre">--with-experimental-isolated-subinterpreters</span></tt>
option to <tt class="docutils literal">./configure</tt> in <a class="reference external" href="https://bugs.python.org/issue40514">bpo-40514</a>
which defines the <tt class="docutils literal">EXPERIMENTAL_ISOLATED_SUBINTERPRETERS</tt> macro. Effects of
this special build:</p>
<ul class="simple">
<li>Make the GIL per-interpreter.</li>
<li><tt class="docutils literal">_xxsubinterpreters.run_string()</tt> releases the GIL when running the
subinterpreter.</li>
<li>Add a thread local storage for the Python thread state ("tstate").</li>
<li>Disable the garbage collector in subinterpreters.</li>
<li>Disable the type attribute lookup cache.</li>
<li>Disable free lists: frame, list, tuple, type attribute lookup cache.</li>
<li>Disable singletons: latin1 characters.</li>
<li>Disable interned strings.</li>
<li>Disable the fast pymalloc memory allocator (force libc malloc memory
allocator).</li>
</ul>
<p>Features are disabled because their implementation is currently not compatible
with multiple interpreters running in parallel.</p>
<p>This special build is designed to be temporary. It should ease the
development of isolated subinterpreters. It will be removed once
subinterpreters are fully isolated (once each interpreter has its own GIL).</p>
</div>
<div class="section" id="convert-static-types-to-heap-types">
<h2>Convert static types to heap types</h2>
<p>Types declared in Python (<tt class="docutils literal">class MyType: ...</tt>) are always "heap types":
types dynamically allocated on the heap memory. Historically, all types
declared in C were declared as "static types": defined statically at build
time.</p>
<p>In C, static types are referenced directly using the <tt class="docutils literal">&</tt> operator to
get their address; they are not copied. For example, the Python <tt class="docutils literal">str</tt> type is
referenced as <tt class="docutils literal">&PyUnicode_Type</tt> in C.</p>
<p>Types are also regular objects (<tt class="docutils literal">PyTypeObject</tt> inherits from <tt class="docutils literal">PyObject</tt>)
and so have a reference count, but the <tt class="docutils literal">PyObject.ob_refcnt</tt> member is not
atomic and must not be modified in parallel. Problem: all interpreters share
the same static types. Static types have other problems:</p>
<ul class="simple">
<li>A type <tt class="docutils literal">__mro__</tt> tuple (<tt class="docutils literal">PyTypeObject.tp_mro</tt> member) has the same
problem of non-atomic reference count.</li>
<li>When a subtype is created, it is stored in the <tt class="docutils literal">PyTypeObject.tp_subclasses</tt>
dictionary member (accessible in Python with the <tt class="docutils literal">__subclasses__()</tt>
method), whereas Python dictionaries are not thread-safe.</li>
<li>Static types behave differently than regular Python types. For example,
usually it is not possible to add an arbitrary attribute or override
an attribute. It goes against the <a class="reference external" href="https://www.python.org/dev/peps/pep-0399/">PEP 399 -- Pure Python/C Accelerator
Module Compatibility Requirements</a> principles.</li>
<li>etc.</li>
</ul>
<p>Right now, <strong>43% (89/206)</strong> of types are declared as heap types. For
comparison, in Python 3.8, only 9% (15/172) of types were declared as heap
types: <strong>74 types</strong> have been converted in the meantime.</p>
<p>TODO: convert the remaining 117 static types: see <a class="reference external" href="https://bugs.python.org/issue40077">bpo-40077</a>.</p>
</div>
<div class="section" id="multiphase-initialization-api">
<h2>Multiphase initialization API</h2>
<p>Historically, extension modules are declared with the <tt class="docutils literal">PyModule_Create()</tt>
function. Usually, such an extension can be instantiated exactly once. It is
stored in an internal <tt class="docutils literal">PyInterpreterState.modules_by_index</tt> list; a unique
index is assigned to the module and stored in <tt class="docutils literal">PyModuleDef.m_base.m_index</tt>.
Usually, such extensions use static global variables.</p>
<p>Such "static" extensions have multiple issues:</p>
<ul class="simple">
<li>The extension cannot be unloaded: its memory is not released at Python exit.
It is an issue when Python is embedded in an application.</li>
<li>The extension behaves differently than modules defined in Python. When an
extension is reimported, its namespace (<tt class="docutils literal">module.__dict__</tt>) is duplicated,
but mutable objects and static global variables are still shared. It goes
against the <a class="reference external" href="https://www.python.org/dev/peps/pep-0399/">PEP 399 -- Pure Python/C Accelerator Module Compatibility
Requirements</a> principles.</li>
<li>etc.</li>
</ul>
<p>In 2013, <strong>Petr Viktorin</strong>, <strong>Stefan Behnel</strong> and <strong>Nick Coghlan</strong> wrote the
<a class="reference external" href="https://www.python.org/dev/peps/pep-0489/">PEP 489 -- Multi-phase extension module initialization</a> which has been approved and
implemented in Python 3.5. For example, the <tt class="docutils literal">_abc</tt> module initialization
function is now just a call to the new <tt class="docutils literal">PyModuleDef_Init()</tt> function:</p>
<pre class="literal-block">
PyMODINIT_FUNC
PyInit__abc(void)
{
return PyModuleDef_Init(&_abcmodule);
}
</pre>
<p>An extension module can have a module state, if <tt class="docutils literal">PyModuleDef.m_size</tt> is
greater than zero. Example:</p>
<pre class="literal-block">
typedef struct {
PyTypeObject *_abc_data_type;
unsigned long long abc_invalidation_counter;
} _abcmodule_state;
static struct PyModuleDef _abcmodule = {
...
.m_size = sizeof(_abcmodule_state), // <=== HERE ===
};
</pre>
<p>The <tt class="docutils literal">PyModule_GetState()</tt> function can be used to retrieve the module state. Example:</p>
<pre class="literal-block">
static inline _abcmodule_state*
get_abc_state(PyObject *module)
{
void *state = PyModule_GetState(module);
assert(state != NULL);
return (_abcmodule_state *)state;
}
static PyObject *
_abc__abc_init(PyObject *module, PyObject *self)
{
_abcmodule_state *state = get_abc_state(module);
...
data = abc_data_new(state->_abc_data_type, NULL, NULL);
...
}
</pre>
<p>Right now, <strong>77% (102/132)</strong> of extension modules use the new multiphase
initialization API (PEP 489). For comparison, in Python 3.8, only 23% (27/118)
of extensions used it: <strong>75 extensions</strong> have been converted in the
meantime.</p>
<p>TODO: convert the remaining 30 extension modules (<a class="reference external" href="https://bugs.python.org/issue1635741">bpo-1635741</a>).</p>
</div>
<div class="section" id="module-states">
<h2>Module states</h2>
<p>Some modules have a state which should be stored in the interpreter, both to
share it between multiple instances of the module and to give access to it in
functions of the public C API (ex: <tt class="docutils literal">PyAST_Check()</tt>).</p>
<p>States made per-interpreter:</p>
<ul class="simple">
<li>2019-05-10: <strong>warnings</strong>
(<a class="reference external" href="https://bugs.python.org/issue36737">bpo-36737</a>,
<a class="reference external" href="https://github.com/python/cpython/commit/86ea58149c3e83f402cecd17e6a536865fb06ce1">commit</a> by <strong>Eric Snow</strong>)</li>
<li>2019-11-07: <strong>parser</strong>
(<a class="reference external" href="https://bugs.python.org/issue36876">bpo-36876</a>,
<a class="reference external" href="https://github.com/python/cpython/commit/9def81aa52adc3cc89554156e40742cf17312825">commit</a> by <strong>Vinay Sajip</strong>)</li>
<li>2019-11-20: <strong>gc</strong>
(<a class="reference external" href="https://bugs.python.org/issue36854">bpo-36854</a>,
<a class="reference external" href="https://github.com/python/cpython/commit/7247407c35330f3f6292f1d40606b7ba6afd5700">commit</a> by me)</li>
<li>2020-11-02: <strong>ast</strong>
(<a class="reference external" href="https://bugs.python.org/issue41796">bpo-41796</a>,
<a class="reference external" href="https://github.com/python/cpython/commit/5cf4782a2630629d0978bf4cf6b6340365f449b2">commit</a> by me)</li>
<li>2020-12-15: <strong>atexit</strong>
(<a class="reference external" href="https://bugs.python.org/issue42639">bpo-42639</a>,
<a class="reference external" href="https://github.com/python/cpython/commit/b8fa135908d294b350cdad04e2f512327a538dee">commit</a> by me)</li>
</ul>
</div>
<div class="section" id="singletons">
<h2>Singletons</h2>
<p>Singletons must not be shared between interpreters.</p>
<p>Singletons made per-interpreter:</p>
<p><a class="reference external" href="https://bugs.python.org/issue38858">bpo-38858</a>:</p>
<ul class="simple">
<li>2019-12-17: small <strong>integer</strong>, the [-5; 256] range
(<a class="reference external" href="https://github.com/python/cpython/commit/630c8df5cf126594f8c1c4579c1888ca80a29d59">commit</a> by me)</li>
</ul>
<p><a class="reference external" href="https://bugs.python.org/issue40521">bpo-40521</a>:</p>
<ul class="simple">
<li>2020-06-04: empty <strong>tuple</strong> singleton
(<a class="reference external" href="https://github.com/python/cpython/commit/69ac6e58fd98de339c013fe64cd1cf763e4f9bca">commit</a> by me)</li>
<li>2020-06-23: empty <strong>bytes</strong> string singleton and single byte character
(<tt class="docutils literal"><span class="pre">b'\x00'</span></tt> to <tt class="docutils literal"><span class="pre">b'\xFF'</span></tt>) singletons
(<a class="reference external" href="https://github.com/python/cpython/commit/c41eed1a874e2f22bde45c3c89418414b7a37f46">commit</a> by me)</li>
<li>2020-06-23: empty <strong>Unicode</strong> string singleton
(<a class="reference external" href="https://github.com/python/cpython/commit/f363d0a6e9cfa50677a6de203735fbc0d06c2f49">commit</a> by me)</li>
<li>2020-06-23: empty <strong>frozenset</strong> singleton
(<a class="reference external" href="https://github.com/python/cpython/commit/261cfedf7657a515e04428bba58eba2a9bb88208">commit</a> by me);
later removed.</li>
<li>2020-06-24: single <strong>Unicode</strong> character (U+0000-U+00FF range)
(<a class="reference external" href="https://github.com/python/cpython/commit/2f9ada96e0d420fed0d09a032b37197f08ef167a">commit</a> by me)</li>
</ul>
<p>I also micro-optimized the code: most singletons are now created at startup,
so there is no longer a need to check at each function call whether they have
been created. Moreover, an assertion now ensures that singletons are no longer
used after they are deleted.</p>
</div>
<div class="section" id="free-lists">
<h2>Free lists</h2>
<p>A free list is a micro-optimization on memory allocations. The memory of
recently destroyed objects is not freed to be able to reuse it for new objects.
Free lists must not be shared between interpreters.</p>
<p>Free lists made per-interpreter (<a class="reference external" href="https://bugs.python.org/issue40521">bpo-40521</a>):</p>
<ul class="simple">
<li>2020-06-04: <strong>slice</strong>
(<a class="reference external" href="https://github.com/python/cpython/commit/7daba6f221e713f7f60c613b246459b07d179f91">commit</a> by me)</li>
<li>2020-06-04: <strong>tuple</strong>
(<a class="reference external" href="https://github.com/python/cpython/commit/69ac6e58fd98de339c013fe64cd1cf763e4f9bca">commit</a> by me)</li>
<li>2020-06-04: <strong>float</strong>
(<a class="reference external" href="https://github.com/python/cpython/commit/2ba59370c3dda2ac229c14510e53a05074b133d1">commit</a> by me)</li>
<li>2020-06-04: <strong>frame</strong>
(<a class="reference external" href="https://github.com/python/cpython/commit/3744ed2c9c0b3905947602fc375de49533790cb9">commit</a> by me)</li>
<li>2020-06-05: <strong>async generator</strong>
(<a class="reference external" href="https://github.com/python/cpython/commit/78a02c2568714562e23e885b6dc5730601f35226">commit</a> by me)</li>
<li>2020-06-05: <strong>context</strong>
(<a class="reference external" href="https://github.com/python/cpython/commit/e005ead49b1ee2b1507ceea94e6f89c28ecf1f81">commit</a> by me)</li>
<li>2020-06-05: <strong>list</strong>
(<a class="reference external" href="https://github.com/python/cpython/commit/88ec9190105c9b03f49aaef601ce02b242a75273">commit</a> by me)</li>
<li>2020-06-23: <strong>dict</strong>
(<a class="reference external" href="https://github.com/python/cpython/commit/b4e85cadfbc2b1b24ec5f3159e351dbacedaa5e0">commit</a> by me)</li>
<li>2020-06-23: <strong>MemoryError</strong>
(<a class="reference external" href="https://github.com/python/cpython/commit/281cce1106568ef9fec17e3c72d289416fac02a5">commit</a> by me)</li>
</ul>
</div>
<div class="section" id="caches">
<h2>Caches</h2>
<p>Caches made per interpreter:</p>
<ul class="simple">
<li>2020-06-04: <strong>slice</strong> cache
(<a class="reference external" href="https://bugs.python.org/issue40521">bpo-40521</a>,
<a class="reference external" href="https://github.com/python/cpython/commit/7daba6f221e713f7f60c613b246459b07d179f91">commit</a> by me)</li>
<li>2020-12-26: <strong>type</strong> attribute lookup cache
(<a class="reference external" href="https://bugs.python.org/issue42745">bpo-42745</a>,
<a class="reference external" href="https://github.com/python/cpython/commit/41010184880151d6ae02a226dbacc796e5c90d11">commit</a> by me)</li>
</ul>
</div>
<div class="section" id="interned-strings-and-identifiers">
<h2>Interned strings and identifiers</h2>
<ul class="simple">
<li>2020-12-25: Per-interpreter identifiers: <tt class="docutils literal">_PyUnicode_FromId()</tt>
(<a class="reference external" href="https://bugs.python.org/issue39465">bpo-39465</a>,
<a class="reference external" href="https://github.com/python/cpython/commit/ba3d67c2fb04a7842741b1b6da5d67f22c579f33">commit</a> by me)</li>
<li>2020-12-26: Per-interpreter interned strings: <tt class="docutils literal">PyUnicode_InternInPlace()</tt>
(<a class="reference external" href="https://bugs.python.org/issue40521">bpo-40521</a>,
<a class="reference external" href="https://github.com/python/cpython/commit/ea251806b8dffff11b30d2182af1e589caf88acf">commit</a> by me)</li>
</ul>
<p>For <tt class="docutils literal">_PyUnicode_FromId()</tt>, I added the <tt class="docutils literal">pycore_atomic_funcs.h</tt> header file
(<a class="reference external" href="https://github.com/python/cpython/commit/52a327c1cbb86c7f2f5c460645889b23615261bf">commit</a>)
which adds functions for atomic memory accesses (to variables of type
<tt class="docutils literal">Py_ssize_t</tt>). It uses <tt class="docutils literal">__atomic_load_n()</tt> and <tt class="docutils literal">__atomic_store_n()</tt> on GCC
and clang, or <tt class="docutils literal">_InterlockedCompareExchange64()</tt> and
<tt class="docutils literal">_InterlockedExchange64()</tt> on MSC (Windows).</p>
<p>First, I tried to use the <tt class="docutils literal">_Py_hashtable</tt> type: <a class="reference external" href="https://github.com/python/cpython/pull/20048">PR 20048</a>. Using <tt class="docutils literal">_Py_hashtable</tt>,
<tt class="docutils literal">_PyUnicode_FromId()</tt> took 15.5 ns +- 0.1 ns. I optimized <tt class="docutils literal">_Py_hashtable</tt>:
<tt class="docutils literal">_PyUnicode_FromId()</tt> took 6.65 ns +- 0.09 ns. But it was still slower than
the reference code: 2.38 ns +- 0.00 ns.</p>
<p>The merged implementation uses an array: a unique index is assigned to each
identifier, and the interned string is stored at that index in the array,
which is made larger on demand. The final change adds about 1 ns per function
call:</p>
<pre class="literal-block">
[ref] 2.42 ns +- 0.00 ns -> [atomic] 3.39 ns +- 0.00 ns: 1.40x slower
</pre>
</div>
<div class="section" id="misc">
<h2>Misc</h2>
<ul class="simple">
<li>2020-03-19: Per-interpreter pending calls
(<a class="reference external" href="https://bugs.python.org/issue39984">bpo-39984</a>,
<a class="reference external" href="https://github.com/python/cpython/commit/50e6e991781db761c496561a995541ca8d83ff87">commit</a> by me).</li>
</ul>
</div>
<div class="section" id="bugfixes">
<h2>Bugfixes</h2>
<ul class="simple">
<li><a class="reference external" href="https://vstinner.github.io/gil-bugfixes-daemon-threads-python39.html">GIL bugfixes for daemon threads in Python 3.9</a></li>
<li>Fix many <a class="reference external" href="https://vstinner.github.io/subinterpreter-leaks.html">leaks discovered by subinterpreters</a></li>
<li>Fix pickling heap types implemented in C with protocols 0 and 1
(<a class="reference external" href="https://bugs.python.org/issue41052">bpo-41052</a>)</li>
</ul>
</div>
<div class="section" id="pep-630-isolating-extension-modules">
<h2>PEP 630: Isolating Extension Modules</h2>
<p>In August 2020, <strong>Petr Viktorin</strong> wrote <a class="reference external" href="https://www.python.org/dev/peps/pep-0630/">PEP 630 -- Isolating Extension Modules</a> which gives practical advice on
how to update an extension module to make it stateless using the previous PEPs
(heap types, multi-phase init, etc.). Once a module is stateless, it becomes
safe to use with subinterpreters running in parallel.</p>
</div>
<div class="section" id="thanks">
<h2>Thanks</h2>
<p>The work on subinterpreters, multiphase init and heap types is a
collaborative effort ongoing for 2 years. I would like to thank the following
developers for helping with this large task:</p>
<ul class="simple">
<li><strong>Christian Heimes</strong></li>
<li><strong>Dong-hee Na</strong></li>
<li><strong>Eric Snow</strong></li>
<li><strong>Erlend Egeberg Aasland</strong></li>
<li><strong>Hai Shi</strong></li>
<li><strong>Mohamed Koubaa</strong></li>
<li><strong>Nick Coghlan</strong></li>
<li><strong>Paulo Henrique Silva</strong></li>
<li><strong>Petr Viktorin</strong></li>
<li><strong>Vinay Sajip</strong></li>
</ul>
<p>Note: Since the work is scattered in many issues and pull requests, it's hard
to track who helped: sorry if I forgot someone! (Please contact me and I
will complete the list.)</p>
</div>
<div class="section" id="what-s-next">
<h2>What's Next?</h2>
<p>There are still multiple interesting technical challenges:</p>
<ul class="simple">
<li><a class="reference external" href="https://bugs.python.org/issue39511">bpo-39511: Per-interpreter singletons (None, True, False, etc.)</a></li>
<li><a class="reference external" href="https://bugs.python.org/issue40601">bpo-40601: Hide static types from the C API</a></li>
<li>Make pymalloc allocator compatible with subinterpreters.</li>
<li>Make the GIL per interpreter. Maybe even give the choice to share or not
the GIL when a subinterpreter is created.</li>
<li>Make the <tt class="docutils literal">_PyArg_Parser</tt> (<tt class="docutils literal">parser_init()</tt>) function compatible with
subinterpreters. Maybe use a per-interpreter array, a solution similar to
<tt class="docutils literal">_PyUnicode_FromId()</tt>.</li>
<li><a class="reference external" href="https://bugs.python.org/issue15751">bpo-15751: Make the PyGILState API compatible with subinterpreters</a> (issue created in 2012!)</li>
<li><a class="reference external" href="https://bugs.python.org/issue40522">bpo-40522: Get the current Python interpreter state from Thread Local
Storage (autoTSSkey)</a></li>
</ul>
<p>Also, there are still many static types to convert to heap types (<a class="reference external" href="https://bugs.python.org/issue40077">bpo-40077</a>) and many extension modules to convert
to the multiphase initialization API (<a class="reference external" href="https://bugs.python.org/issue1635741">bpo-1635741</a>).</p>
<p>I'm tracking the work in my <a class="reference external" href="https://pythondev.readthedocs.io/subinterpreters.html">Python Subinterpreters</a> page
and in the <a class="reference external" href="https://bugs.python.org/issue40512">bpo-40512: Meta issue: per-interpreter GIL</a>.</p>
</div>
Hide implementation details from the Python C API2020-12-25T22:00:00+01:002020-12-25T22:00:00+01:00Victor Stinnertag:vstinner.github.io,2020-12-25:/hide-implementation-details-python-c-api.html<img alt="My cat attacking the Python C API" src="https://vstinner.github.io/images/pepsie.jpg" />
<p>This article is the history of Python C API discussions over the last 4 years,
and the creation of C API projects: <a class="reference external" href="https://pythoncapi.readthedocs.io/">pythoncapi website</a>, <a class="reference external" href="https://github.com/pythoncapi/pythoncapi_compat">pythoncapi_compat.h header file</a> and <a class="reference external" href="https://hpy.readthedocs.io/">HPy (new clean C API)</a>. More and more people are aware of issues
caused by the C API and are working on solutions.</p>
<p>It took me many iterations to find the right approach to evolve the C API
without breaking too many third-party extension modules. My first ideas were
based on two APIs with some kind of opt-in option. In the end, I decided to fix
the default API directly, and helped maintainers of extension modules update
their projects for incompatible C API changes.</p>
<p>I wrote a <tt class="docutils literal">pythoncapi_compat.h</tt> header file which adds C API functions of
newer Python to old Python versions, down to Python 2.7. I also wrote an
<tt class="docutils literal">upgrade_pythoncapi.py</tt> script to add Python 3.10 support to an extension
module without losing Python 2.7 support: the tool adds <tt class="docutils literal">#include
"pythoncapi_compat.h"</tt>. For example, it replaces <tt class="docutils literal">Py_TYPE(obj) = type</tt>
with <tt class="docutils literal">Py_SET_TYPE(obj, type)</tt>.</p>
<p>The photo: my cat attacking the Python C API.</p>
<div class="section" id="year-2016">
<h2>Year 2016</h2>
<p>Between 2016 and 2017, Larry Hastings worked on removing the GIL in a CPython
fork called "The Gilectomy". He pushed the first commit in April 2016: <a class="reference external" href="https://github.com/larryhastings/gilectomy/commit/4a1a4ff49e34b9705608cad968f467af161dcf02">Removed
the GIL. Don't merge this!</a>
("Few programs work now"). At EuroPython 2016, he gave the talk <a class="reference external" href="https://www.youtube.com/watch?v=fgWUwQVoLHo">Larry Hastings
- The Gilectomy</a> where he
explains that the current parallelism bottleneck is the CPython reference
counting which doesn't scale with the number of threads.</p>
<p>It was yet another hint telling me that "something" should be done to make
the C API more abstract and move away from implementation details like
reference counting. PyPy has also had performance issues with the C API for
many years.</p>
</div>
<div class="section" id="year-2017">
<h2>Year 2017</h2>
<div class="section" id="may">
<h3>May</h3>
<p>In 2017, I discussed with Eric Snow who was working on subinterpreters. He
had to modify public structures, especially the <tt class="docutils literal">PyInterpreterState</tt>
structure. He created the <tt class="docutils literal">Include/internal/</tt> subdirectory for a new
"internal C API" which should not be exported. (Later, he moved the
<tt class="docutils literal">PyInterpreterState</tt> structure to the internal C API in Python 3.8.)</p>
<p>I started discussing C API changes at the Python Language Summit
(PyCon US 2017): <a class="reference external" href="https://github.com/vstinner/conf/raw/master/2017-PyconUS/summit.pdf">"Python performance" slides (PDF)</a>:</p>
<ul class="simple">
<li>Split Include in sub-directories</li>
<li>Move towards a stable ABI by default</li>
</ul>
<p>See also the LWN article: <a class="reference external" href="https://lwn.net/Articles/723752/#723949">Keeping Python competitive</a> by Jake Edge.</p>
</div>
<div class="section" id="july-first-pep-draft">
<h3>July: first PEP draft</h3>
<p>I proposed the first PEP draft to python-ideas:
<a class="reference external" href="https://mail.python.org/archives/list/python-ideas@python.org/thread/6XATDGWK4VBUQPRHCRLKQECTJIPBVNJQ/">PEP: Hide implementation details in the C API</a>.</p>
<p>The idea is to add an opt-in option to distutils to build an extension module
with a new C API, remove implementation details from the new C API, and maybe
later switch to the new C API by default.</p>
</div>
<div class="section" id="september">
<h3>September</h3>
<p>I discussed my C API change ideas at the CPython core dev sprint (at Instagram,
California). The ideas were liked by most (if not all) core developers who are
fine with a minor performance slowdown (caused by replacing macros with
function calls). I wrote <a class="reference external" href="https://vstinner.github.io/new-python-c-api.html">A New C API for CPython</a> blog post about these
discussions.</p>
</div>
<div class="section" id="november">
<h3>November</h3>
<p>I proposed <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-November/150607.html">Make the stable API-ABI usable</a> on
the python-dev list. The idea is to add <tt class="docutils literal">PyTuple_GET_ITEM()</tt> (for example) to
the limited C API but declared as a function call. Later, if enough extension
modules are compatible with the extended limited C API, make it the default.</p>
</div>
</div>
<div class="section" id="year-2018">
<h2>Year 2018</h2>
<p>In July, I created the <a class="reference external" href="https://pythoncapi.readthedocs.io/">pythoncapi website</a> to collect issues of the current C
API, list things to avoid in new functions like borrowed references, and start
to design a new better C API.</p>
<p>In September, Antonio Cuni wrote <a class="reference external" href="https://morepypy.blogspot.com/2018/09/inside-cpyext-why-emulating-cpython-c.html">Inside cpyext: Why emulating CPython C API is
so Hard</a>
article.</p>
</div>
<div class="section" id="year-2019">
<h2>Year 2019</h2>
<p>In February, I sent <a class="reference external" href="https://mail.python.org/archives/list/capi-sig@python.org/thread/WS6ATJWRUQZESGGYP3CCSVPF7OMPMNM6/">Update on CPython header files reorganization</a>
to the capi-sig list.</p>
<ul class="simple">
<li><tt class="docutils literal">Include/</tt>: limited C API</li>
<li><tt class="docutils literal">Include/cpython/</tt>: CPython C API</li>
<li><tt class="docutils literal">Include/internal/</tt>: CPython internal C API</li>
</ul>
<p>In March, I modified the Python debug build to make its ABI compatible with the
release build ABI:
<a class="reference external" href="https://docs.python.org/dev/whatsnew/3.8.html#debug-build-uses-the-same-abi-as-release-build">What’s New In Python 3.8: Debug build uses the same ABI as release build</a>.</p>
<p>In May, I gave a lightning talk <a class="reference external" href="https://github.com/vstinner/conf/blob/master/2019-Pycon/status_stable_api_abi.pdf">Status of the stable API and ABI in Python 3.8</a>,
at the Language Summit (during Pycon US 2019):</p>
<ul class="simple">
<li>Convert macros to static inline functions</li>
<li>Install the internal C API</li>
<li>Debug build now ABI compatible with the release build ABI</li>
<li>Getting rid of global variables</li>
</ul>
<p>By the way, see my <a class="reference external" href="https://vstinner.github.io/split-include-directory-python38.html">Split Include/ directory in Python 3.8</a> article: I converted many macros in
Python 3.8.</p>
<p>In July, the <a class="reference external" href="https://hpy.readthedocs.io/">HPy project</a> was created during
EuroPython at Basel. There was an informal meeting which included core
developers of PyPy (Antonio, Armin and Ronan), CPython (Victor Stinner and Mark
Shannon) and Cython (Stefan Behnel).</p>
<p>In December, Antonio, Armin and Ronan had a small internal sprint to
kick off the development of HPy: <a class="reference external" href="https://morepypy.blogspot.com/2019/12/hpy-kick-off-sprint-report.html">HPy kick-off sprint report</a>.</p>
</div>
<div class="section" id="year-2020">
<h2>Year 2020</h2>
<div class="section" id="april">
<h3>April</h3>
<p>I proposed <a class="reference external" href="https://mail.python.org/archives/list/python-dev@python.org/thread/HKM774XKU7DPJNLUTYHUB5U6VR6EQMJF/#TKHNENOXP6H34E73XGFOL2KKXSM4Z6T2">PEP: Modify the C API to hide implementation details</a>
on the python-dev list. The main idea is to provide a new optimized Python
runtime which is backward incompatible on purpose, and continue to ship the
regular runtime which is fully backward compatible.</p>
</div>
<div class="section" id="june">
<h3>June</h3>
<p>I wrote <a class="reference external" href="https://www.python.org/dev/peps/pep-0620/">PEP 620 -- Hide implementation details from the C API</a> and <a class="reference external" href="https://mail.python.org/archives/list/python-dev@python.org/thread/HKM774XKU7DPJNLUTYHUB5U6VR6EQMJF/">proposed the PEP to
python-dev</a>.
This PEP is my 3rd attempt to fix the C API: I rewrote it from scratch. Python
now distributes a new <tt class="docutils literal">pythoncapi_compat.h</tt> header and a process is defined
to reduce the number of broken C extensions when introducing C API incompatible
changes listed in this PEP.</p>
<p>I created the <a class="reference external" href="https://github.com/pythoncapi/pythoncapi_compat">pythoncapi_compat project</a>: header file providing new
C API functions to old Python versions using static inline functions.</p>
</div>
<div class="section" id="december">
<h3>December</h3>
<p>I wrote a new <tt class="docutils literal">upgrade_pythoncapi.py</tt> script to add Python 3.10
support to an extension module without losing Python 2.7 support. I sent
<a class="reference external" href="https://mail.python.org/archives/list/capi-sig@python.org/thread/LFLXFMKMZ77UCDUFD5EQCONSAFFWJWOZ/">New script: add Python 3.10 support to your C extensions without losing Python
3.6 support</a>
to the capi-sig list.</p>
<p>The pythoncapi_compat project got its first users (bitarray, immutables,
python-zstandard)! It proves that the project is useful and needed.</p>
<p>I collaborated with the HPy project to create a manifesto explaining how the
C API prevents optimizing CPython and makes the CPython C API inefficient on
PyPy. It is still a draft.</p>
</div>
</div>
Leaks discovered by subinterpreters2020-12-23T14:00:00+01:002020-12-23T14:00:00+01:00Victor Stinnertag:vstinner.github.io,2020-12-23:/subinterpreter-leaks.html<p>This article is about old reference leaks discovered or caused by the work on
isolating subinterpreters: leaks in 6 different modules (gc, _weakref, _abc,
_signal, _ast and _thread).</p>
<img alt="_thread GC bug" src="https://vstinner.github.io/images/thread_gc_bug.jpg" />
<div class="section" id="refleaks-buildbot-failures">
<h2>Refleaks buildbot failures</h2>
<p>With my work on isolating subinterpreters, old bugs about Python objects leaked
at Python exit are suddenly becoming blocker issues on buildbots.</p>
<p>As long as subinterpreters still share Python objects with the main interpreter,
it is more or less acceptable to leak these objects at Python exit. Right now (current master
branch), there are still more than 18 000 Python objects which are not
destroyed at Python exit:</p>
<pre class="literal-block">
$ ./python -X showrefcount -c pass
[18411 refs, 6097 blocks]
</pre>
<p>This issue is being solved in the <a class="reference external" href="https://bugs.python.org/issue1635741">bpo-1635741: Py_Finalize() doesn't clear all
Python objects at exit</a> which was
opened almost 14 years ago (2007).</p>
<p>When subinterpreters are better isolated, objects are no longer shared, and
suddenly these leaks make subinterpreter tests fail on Refleak buildbots.
For example, when an extension module is converted to the multiphase
initialization API (PEP 489) or when static types are converted to heap types,
these issues pop up.</p>
<p>It is a blocker issue for me, since I care about having only "green" buildbots (no
test failure); otherwise, more serious regressions can easily be missed.</p>
</div>
<div class="section" id="per-interpreter-gc-state">
<h2>Per-interpreter GC state</h2>
<p>In November 2019, I made the state of the GC module per-interpreter in
<a class="reference external" href="https://bugs.python.org/issue36854">bpo-36854</a>
(<a class="reference external" href="https://github.com/python/cpython/commit/7247407c35330f3f6292f1d40606b7ba6afd5700">commit</a>)
and test_atexit started to leak:</p>
<pre class="literal-block">
$ ./python -m test -R 3:3 test_atexit -m test.test_atexit.SubinterpreterTest.test_callbacks_leak
test_atexit leaked [3988, 3986, 3988] references, sum=11962
</pre>
<p>I fixed the usage of the <tt class="docutils literal">PyModule_AddObject()</tt> function in the <tt class="docutils literal">_testcapi</tt>
module (<a class="reference external" href="https://github.com/python/cpython/commit/310e2d25170a88ef03f6fd31efcc899fe062da2c">commit</a>).</p>
<p>I also pushed a <strong>workaround</strong> in <tt class="docutils literal">finalize_interp_clear()</tt>:</p>
<pre class="literal-block">
+ /* bpo-36854: Explicitly clear the codec registry
+ and trigger a GC collection */
+ PyInterpreterState *interp = tstate->interp;
+ Py_CLEAR(interp->codec_search_path);
+ Py_CLEAR(interp->codec_search_cache);
+ Py_CLEAR(interp->codec_error_registry);
+ _PyGC_CollectNoFail();
</pre>
<p>I dislike having to push a "temporary" workaround, but the Python finalization
is really complex and fragile. Fixing the root issues would require too much
work, whereas I wanted to repair the Refleak buildbots as soon as possible.</p>
<p>In December 2019, the workaround was partially removed (<a class="reference external" href="https://github.com/python/cpython/commit/ac0e1c2694bc199dbd073312145e3c09bee52cc4">commit</a>):</p>
<pre class="literal-block">
- Py_CLEAR(interp->codec_search_path);
- Py_CLEAR(interp->codec_search_cache);
- Py_CLEAR(interp->codec_error_registry);
</pre>
<p>The year after (December 2020), the last GC collection was moved into
<tt class="docutils literal">PyInterpreterState_Clear()</tt>, before finalizing the GC (<a class="reference external" href="https://github.com/python/cpython/commit/eba5bf2f5672bf4861c626937597b85ac0c242b9">commit</a>).</p>
</div>
<div class="section" id="port-weakref-to-multiphase-init">
<h2>Port _weakref to multiphase init</h2>
<p>In March 2020, the <tt class="docutils literal">_weakref</tt> module was ported to the multiphase
initialization API (PEP 489) in <a class="reference external" href="https://bugs.python.org/issue40050">bpo-40050</a> and test_importlib started to leak:</p>
<pre class="literal-block">
$ ./python -m test -R 3:3 test_importlib
test_importlib leaked [6303, 6299, 6303] references, sum=18905
</pre>
<p>The analysis was quite long and complicated. importlib imported some
extension modules twice, and it has to inject frozen modules to "bootstrap" the
import machinery.</p>
<p>In the end, I fixed the issue by removing the now unused <tt class="docutils literal">_weakref</tt> import in
<tt class="docutils literal">importlib._bootstrap_external</tt>
(<a class="reference external" href="https://github.com/python/cpython/commit/83d46e0622d2efdf5f3bf8bf8904d0dcb55fc322">commit</a>).
The fix also avoids importing an extension module twice.</p>
</div>
<div class="section" id="convert-abc-static-types-to-heap-types">
<h2>Convert _abc static types to heap types</h2>
<p>In April 2020, the static types of the <tt class="docutils literal">_abc</tt> extension module were converted
to heap types in <a class="reference external" href="https://bugs.python.org/issue40077">bpo-40077</a>
(<a class="reference external" href="https://github.com/python/cpython/commit/53e4c91725083975598350877e2ed8e2d0194114">commit</a>) and
test_threading started to leak:</p>
<pre class="literal-block">
$ ./python -m test -R 3:3 test_threading
test_threading leaked [19, 19, 19] references, sum=57
</pre>
<p>I created <a class="reference external" href="https://bugs.python.org/issue40149">bpo-40149</a> to track the leak.</p>
<div class="section" id="objects-hold-a-reference-to-heap-types">
<h3>Objects hold a reference to heap types</h3>
<p>In March 2019, the <tt class="docutils literal">PyObject_Init()</tt> function was modified in <a class="reference external" href="https://bugs.python.org/issue35810">bpo-35810</a> to keep a strong reference (<tt class="docutils literal">INCREF</tt>)
to the type if the type is a heap type
(<a class="reference external" href="https://github.com/python/cpython/commit/364f0b0f19cc3f0d5e63f571ec9163cf41c62958">commit</a>):</p>
<pre class="literal-block">
+ if (PyType_GetFlags(tp) & Py_TPFLAGS_HEAPTYPE) {
+ Py_INCREF(tp);
+ }
</pre>
<p>I opened <a class="reference external" href="https://bugs.python.org/issue40217">bpo-40217: The garbage collector doesn't take in account that objects
of heap allocated types hold a strong reference to their type</a> to discuss the regression
(the test_threading leak).</p>
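<p>The effect of that change is observable from pure Python: classes defined in
Python code are heap types, so the GC reports the type among the referents of
its instances. A minimal sketch (plain Python, not the C internals):</p>

```python
import gc

class Heap:  # classes defined in Python code are heap types
    pass

obj = Heap()

# Since bpo-35810, each instance holds a strong reference to its heap
# type; bpo-40217 is about making traverse functions report that
# reference to the GC, so cycles going through the type can be collected.
assert Heap in gc.get_referents(obj)
```

<p>For heap types defined in C, a missing <tt class="docutils literal"><span class="pre">Py_VISIT(Py_TYPE(self))</span></tt> in the
traverse function is exactly what makes such cycles invisible to the GC.</p>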
</div>
<div class="section" id="first-workaround-not-merged-force-a-second-garbage-collection">
<h3>First workaround (not merged): force a second garbage collection</h3>
<p>While analysing the test_threading leak regression, I identified a first
workaround: add a second <tt class="docutils literal">_PyGC_CollectNoFail()</tt> call in
<tt class="docutils literal">finalize_interp_clear()</tt>.</p>
<p>It was only a workaround which helped to understand the issue; it was not
merged.</p>
</div>
<div class="section" id="first-fix-merged-abc-data-traverse">
<h3>First fix (merged): abc_data_traverse()</h3>
<p>I merged a first fix: add a traverse function to the <tt class="docutils literal">_abc._abc_data</tt> type
(<a class="reference external" href="https://github.com/python/cpython/commit/9cc3ebd7e04cb645ac7b2f372eaafa7464e16b9c">commit</a>):</p>
<pre class="literal-block">
+static int
+abc_data_traverse(_abc_data *self, visitproc visit, void *arg)
+{
+ Py_VISIT(self->_abc_registry);
+ Py_VISIT(self->_abc_cache);
+ Py_VISIT(self->_abc_negative_cache);
+ return 0;
+}
</pre>
</div>
<div class="section" id="second-workaround-not-merged-visit-the-type-in-abc-data-traverse">
<h3>Second workaround (not merged): visit the type in abc_data_traverse()</h3>
<p>A second workaround was identified: add <tt class="docutils literal"><span class="pre">Py_VISIT(Py_TYPE(self));</span></tt> to
the new <tt class="docutils literal">abc_data_traverse()</tt> function.</p>
<p>Again, it was only a workaround which helped to understand the issue, but it
was not merged.</p>
</div>
<div class="section" id="second-fix-merged-call-py-visit-py-type-self-automatically">
<h3>Second fix (merged): call Py_VISIT(Py_TYPE(self)) automatically</h3>
<p>20 days after I opened <a class="reference external" href="https://bugs.python.org/issue40217">bpo-40217</a>,
<strong>Pablo Galindo</strong> modified <tt class="docutils literal">PyType_FromSpec()</tt> to add a wrapper around the
traverse function of heap types to ensure that <tt class="docutils literal">Py_VISIT(Py_TYPE(self))</tt> is
always called (<a class="reference external" href="https://github.com/python/cpython/commit/0169d3003be3d072751dd14a5c84748ab63a249f">commit</a>).</p>
</div>
<div class="section" id="last-fix-merged-fix-every-traverse-function">
<h3>Last fix (merged): fix every traverse function</h3>
<p>In May 2020, <strong>Pablo Galindo</strong> changed his mind. He reverted his
<tt class="docutils literal">PyType_FromSpec()</tt> change and instead fixed traverse function of heap types
(<a class="reference external" href="https://github.com/python/cpython/commit/1cf15af9a6f28750f37b08c028ada31d38e818dd">commit</a>).</p>
<p>In the end, <tt class="docutils literal">abc_data_traverse()</tt> calls <tt class="docutils literal">Py_VISIT(Py_TYPE(self))</tt>. The
second "workaround" was the correct fix!</p>
</div>
</div>
<div class="section" id="convert-signal-to-multiphase-init">
<h2>Convert _signal to multiphase init</h2>
<p>In September 2020, <strong>Mohamed Koubaa</strong> ported the <tt class="docutils literal">_signal</tt> module to the
multiphase initialization API (PEP 489) in <a class="reference external" href="https://bugs.python.org/issue1635741">bpo-1635741</a> (<a class="reference external" href="https://github.com/python/cpython/commit/71d1bd9569c8a497e279f2fea6fe47cd70a87ea3">commit 71d1bd95</a>)
and test_interpreters started to leak:</p>
<pre class="literal-block">
$ ./python -m test -R 3:3 test_interpreters
test_interpreters leaked [237, 237, 237] references, sum=711
</pre>
<p>I created <a class="reference external" href="https://bugs.python.org/issue41713">bpo-41713</a> to track the
regression. Since I failed to find a simple fix, I started by reverting the
change which caused Refleak buildbots to fail (<a class="reference external" href="https://github.com/python/cpython/commit/4b8032e5a4994a7902076efa72fca1e2c85d8b7f">commit</a>).</p>
<p>I had to refactor the <tt class="docutils literal">_signal</tt> extension module code with multiple commits
to fix all bugs.</p>
<p>The first fix was to remove the <tt class="docutils literal">IntHandler</tt> variable: there was no need to
keep it alive, since it was only needed once in <tt class="docutils literal">signal_module_exec()</tt>.</p>
<p>The second fix was to close the Windows event at exit:</p>
<pre class="literal-block">
+ #ifdef MS_WINDOWS
+ if (sigint_event != NULL) {
+ CloseHandle(sigint_event);
+ sigint_event = NULL;
+ }
+ #endif
</pre>
<p>The last fix, the most important one, was to clear the strong reference to old
Python signal handlers when <tt class="docutils literal">signal_module_exec()</tt> is called more than once:</p>
<pre class="literal-block">
// If signal_module_exec() is called more than once, we must
// clear the strong reference to the previous function.
Py_XSETREF(Handlers[signum].func, Py_NewRef(func));
</pre>
<p>The <tt class="docutils literal">_signal</tt> module is not well isolated for subinterpreters yet, but at
least it no longer leaks.</p>
</div>
<div class="section" id="per-interpreter-ast-state">
<h2>Per-interpreter _ast state</h2>
<p>In September 2019, the <tt class="docutils literal">_ast</tt> extension module was converted to PEP 384
(stable ABI) in <a class="reference external" href="https://bugs.python.org/issue38113">bpo-38113</a> (<a class="reference external" href="https://github.com/python/cpython/commit/ac46eb4ad6662cf6d771b20d8963658b2186c48c">commit</a>):
the AST state moved into a module state.</p>
<p>This change caused 3 different bugs including crashes (<a class="reference external" href="https://bugs.python.org/issue41194">bpo-41194</a>, <a class="reference external" href="https://bugs.python.org/issue41261">bpo-41261</a>, <a class="reference external" href="https://bugs.python.org/issue41631">bpo-41631</a>). The issue is complex since there are
public C APIs which require access to AST types, whereas it became possible to
have multiple <tt class="docutils literal">_ast</tt> extension module instances.</p>
<p>In July 2020, I fixed the root issue in <a class="reference external" href="https://bugs.python.org/issue41194">bpo-41194</a> by replacing the module state with a
global state (<a class="reference external" href="https://github.com/python/cpython/commit/91e1bc18bd467a13bceb62e16fbc435b33381c82">commit</a>):</p>
<pre class="literal-block">
static astmodulestate global_ast_state;
</pre>
<p>A global state is bad for subinterpreters. In November 2020, I made the AST
state per-interpreter in <a class="reference external" href="https://bugs.python.org/issue41796">bpo-41796</a>
(<a class="reference external" href="https://github.com/python/cpython/commit/5cf4782a2630629d0978bf4cf6b6340365f449b2">commit</a>)
and test_ast started to leak:</p>
<pre class="literal-block">
$ ./python -m test -R 3:3 test_ast
test_ast leaked [23640, 23636, 23640] references, sum=70916
</pre>
<p>The fix is to call <tt class="docutils literal">_PyAST_Fini()</tt> earlier (<a class="reference external" href="https://github.com/python/cpython/commit/fd957c124c44441d9c5eaf61f7af8cf266bafcb1">commit</a>).</p>
<p>Python types contain a reference to themselves in their
<tt class="docutils literal">PyTypeObject.tp_mro</tt> member (the MRO tuple: Method Resolution Order).
<tt class="docutils literal">_PyAST_Fini()</tt> must be called before the last GC collection to destroy AST
types.</p>
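<p>This self-reference is visible from Python: <tt class="docutils literal">tp_mro</tt> is exposed as
<tt class="docutils literal">__mro__</tt>, and the tuple starts with the type itself, so every type sits
in a small reference cycle that only a GC collection can break:</p>

```python
import gc

class Node:
    pass

# The MRO tuple references the type itself: a reference cycle.
assert Node.__mro__[0] is Node

# type's traverse function visits tp_mro, so the GC can see the cycle.
assert any(ref is Node.__mro__ for ref in gc.get_referents(Node))
```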
<p><tt class="docutils literal">_PyInterpreterState_Clear()</tt> now calls <tt class="docutils literal">_PyAST_Fini()</tt>. It now also
calls <tt class="docutils literal">_PyWarnings_Fini()</tt> on subinterpreters, not only on the main
interpreter.</p>
</div>
<div class="section" id="thread-lock-traverse">
<h2>_thread lock traverse</h2>
<p>In December 2020, while I tried to port the <tt class="docutils literal">_thread</tt> extension module to the multiphase initialization API
(PEP 489), test_threading started to leak:</p>
<pre class="literal-block">
$ ./python -m test -R 3:3 test_threading
test_threading leaked [56, 56, 56] references, sum=168
</pre>
<p>As usual, the workaround was to force a second GC collection in <tt class="docutils literal">interpreter_clear()</tt>:</p>
<pre class="literal-block">
/* Last garbage collection on this interpreter */
_PyGC_CollectNoFail(tstate);
+ _PyGC_CollectNoFail(tstate);
_PyGC_Fini(tstate);
</pre>
<p>It took me two days to fully understand the problem. I drew the reference cycles
on paper to help me understand them:</p>
<img alt="_thread GC bug" src="https://vstinner.github.io/images/thread_gc_bug.jpg" />
<p>There are two cycles:</p>
<ul class="simple">
<li>Cycle 1:<ul>
<li>at fork function</li>
<li>-> __main__ module dict</li>
<li>-> at fork function</li>
</ul>
</li>
<li>Cycle 2:<ul>
<li>_thread lock type</li>
<li>-> lock type methods</li>
<li>-> _thread module dict</li>
<li>-> _thread local type</li>
<li>-> _thread module</li>
<li>-> _thread module state</li>
<li>-> _thread lock type</li>
</ul>
</li>
</ul>
<p>Moreover, there is a link between these two reference cycles: an instance of
the lock type.</p>
<p>I fixed the issue by adding a traverse function to the lock type and adding the
<tt class="docutils literal">Py_TPFLAGS_HAVE_GC</tt> flag to the type (<a class="reference external" href="https://github.com/python/cpython/commit/6104013838e181e3c698cb07316f449a0c31ea96">commit</a>):</p>
<pre class="literal-block">
+static int
+lock_traverse(lockobject *self, visitproc visit, void *arg)
+{
+ Py_VISIT(Py_TYPE(self));
+ return 0;
+}
</pre>
</div>
<div class="section" id="notes-on-weird-gc-bugs">
<h2>Notes on weird GC bugs</h2>
<ul class="simple">
<li><tt class="docutils literal">gc.get_referents()</tt> and <tt class="docutils literal">gc.get_referrers()</tt> can be used to check
traverse functions.</li>
<li><tt class="docutils literal">gc.is_tracked()</tt> can be used to check if the GC tracks an object.</li>
<li>Using the <tt class="docutils literal">gdb</tt> debugger on <tt class="docutils literal">gc_collect_main()</tt> helps to see which
objects are collected. See for example the <tt class="docutils literal">finalize_garbage()</tt> function,
which calls finalizers on unreachable objects.</li>
<li>The solution is usually a missing traverse function or a missing
<tt class="docutils literal">Py_VISIT()</tt> in an existing traverse function.</li>
<li>GC bugs are hard to debug :-)</li>
</ul>
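<p>The first two helpers can be demonstrated on a trivial self-referencing
container:</p>

```python
import gc

cycle = []
cycle.append(cycle)  # a list that references itself

# The GC tracks container objects.
assert gc.is_tracked(cycle)

# get_referents() exposes what the traverse function visits: a missing
# Py_VISIT() in C code shows up as a missing entry here.
assert gc.get_referents(cycle) == [cycle]

# get_referrers() goes the other way: which tracked objects reference this one?
assert any(ref is cycle for ref in gc.get_referrers(cycle))
```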
<p>Thanks <strong>Pablo Galindo</strong> for helping me to debug all these tricky GC bugs!</p>
<p>Thanks to everybody who is helping to better isolate subinterpreters by
converting extension modules to the multiphase initialization API (PEP 489) and
by converting dozens of static types to heap types. We have made huge progress in
recent months!</p>
</div>
GIL bugfixes for daemon threads in Python 3.92020-04-04T22:00:00+02:002020-04-04T22:00:00+02:00Victor Stinnertag:vstinner.github.io,2020-04-04:/gil-bugfixes-daemon-threads-python39.html<a class="reference external image-reference" href="https://twitter.com/Bouletcorp/status/1241018332112998401"><img alt="`#CoronaMaison by Boulet" src="https://vstinner.github.io/images/coronamaison_boulet.jpg" /></a>
<p>My previous article <a class="reference external" href="https://vstinner.github.io/daemon-threads-python-finalization-python32.html">Daemon threads and the Python finalization in Python 3.2 and 3.3</a> introduces
issues caused by daemon threads in the Python finalization and past changes to
make them work.</p>
<p>This article is about bugfixes of the infamous GIL (Global Interpreter Lock) in
Python 3.9, between …</p><a class="reference external image-reference" href="https://twitter.com/Bouletcorp/status/1241018332112998401"><img alt="`#CoronaMaison by Boulet" src="https://vstinner.github.io/images/coronamaison_boulet.jpg" /></a>
<p>My previous article <a class="reference external" href="https://vstinner.github.io/daemon-threads-python-finalization-python32.html">Daemon threads and the Python finalization in Python 3.2 and 3.3</a> introduces
issues caused by daemon threads in the Python finalization and past changes to
make them work.</p>
<p>This article is about bugfixes of the infamous GIL (Global Interpreter Lock) in
Python 3.9, between March 2019 and March 2020, for daemon threads during Python
finalization. Some bugs were old: up to 6 years old. Some bugs were triggered
by the on-going work on isolating subinterpreters in Python 3.9.</p>
<p>Drawing: <a class="reference external" href="https://twitter.com/Bouletcorp/status/1241018332112998401">#CoronaMaison by Boulet</a>.</p>
<div class="section" id="fix-1-exit-pyeval-acquirethread-if-finalizing">
<h2>Fix 1: Exit PyEval_AcquireThread() if finalizing</h2>
<p>In March 2019, <strong>Remy Noel</strong> created <a class="reference external" href="https://bugs.python.org/issue36469">bpo-36469</a>: a multithreaded Python application
using 20 daemon threads hangs randomly at exit on Python 3.5:</p>
<blockquote>
The bug happens about once every two weeks on a script that is fired more
than 10K times a day.</blockquote>
<p><strong>Eric Snow</strong> analyzed the bug and understood that it is related to daemon
threads and Python finalization. He identified that the <tt class="docutils literal">PyEval_AcquireLock()</tt>
and <tt class="docutils literal">PyEval_AcquireThread()</tt> functions take the GIL but don't exit the thread
if Python is finalizing.</p>
<p>When Python is finalizing and a daemon thread takes the GIL, Python can hang
randomly.</p>
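<p>The scenario is easy to set up, even if the hang itself was rare: a daemon
thread keeps taking and releasing the GIL while the main thread exits. A
minimal sketch (it exits cleanly on fixed Python versions):</p>

```python
import threading
import time

def worker():
    # A daemon thread repeatedly releases and re-takes the GIL.
    while True:
        time.sleep(0.001)

t = threading.Thread(target=worker, daemon=True)
t.start()
assert t.daemon and t.is_alive()
# When the main thread exits here, Python starts finalizing while the
# daemon thread may be blocked waiting for the GIL: before the fix,
# taking the GIL at that point could make the process hang randomly.
```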
<p>Eric created <a class="reference external" href="https://bugs.python.org/issue36475">bpo-36475</a> to propose to
modify <tt class="docutils literal">PyEval_AcquireLock()</tt> and <tt class="docutils literal">PyEval_AcquireThread()</tt> to also exit
the thread in this case. In April 2019, <strong>Joannah Nanjekye</strong> fixed the issue
with <a class="reference external" href="https://github.com/python/cpython/commit/f781d202a2382731b43bade845a58d28a02e9ea1">commit f781d202</a>:</p>
<pre class="literal-block">
bpo-36475: Finalize PyEval_AcquireLock() and PyEval_AcquireThread() properly (GH-12667)
PyEval_AcquireLock() and PyEval_AcquireThread() now
terminate the current thread if called while the interpreter is
finalizing, making them consistent with PyEval_RestoreThread(),
Py_END_ALLOW_THREADS, and PyGILState_Ensure().
</pre>
<p>The fix adds an <tt class="docutils literal">exit_thread_if_finalizing()</tt> function which exits the thread if
Python is finalizing. This function is called after each <tt class="docutils literal">take_gil()</tt> call.</p>
<p>The fix is very similar to <tt class="docutils literal">PyEval_RestoreThread()</tt> fix made in 2013 (<a class="reference external" href="https://github.com/python/cpython/commit/0d5e52d3469a310001afe50689f77ddba6d554d1">commit
0d5e52d3</a>)
to fix <a class="reference external" href="https://bugs.python.org/issue1856#msg60014">bpo-1856</a> (Python crash
involving daemon threads during Python exit).</p>
</div>
<div class="section" id="fix-2-pyeval-restorethread-on-freed-tstate">
<h2>Fix 2: PyEval_RestoreThread() on freed tstate</h2>
<div class="section" id="concurrent-futures-crash-on-freebsd">
<h3>concurrent.futures crash on FreeBSD</h3>
<p>In December 2019, I reported <a class="reference external" href="https://bugs.python.org/issue39088">bpo-39088</a>:
test_concurrent_futures <strong>crashed randomly</strong> with a coredump on the AMD64 FreeBSD
Shared 3.x buildbot. In March 2020, I managed to reproduce the bug on FreeBSD
and I was able to debug the coredump in gdb:</p>
<pre class="literal-block">
(gdb) frame
#0 0x00000000003b518c in PyEval_RestoreThread (tstate=0x801f23790) at Python/ceval.c:387
387 _PyRuntimeState *runtime = tstate->interp->runtime;
(gdb) p tstate->interp
$3 = (PyInterpreterState *) 0xdddddddddddddddd
</pre>
<p>The Python thread state (<tt class="docutils literal">tstate</tt>) was freed. In debug mode, the "free()"
function of the Python memory allocator fills the freed memory block with the
<tt class="docutils literal">0xDD</tt> byte pattern (<tt class="docutils literal">D</tt> stands for "dead byte") to detect usage of freed
memory.</p>
<p>The problem is that Python finalization had already freed the memory of all
PyThreadState structures by the time <tt class="docutils literal">PyEval_RestoreThread(tstate)</tt> was called by a
daemon thread. <tt class="docutils literal">PyEval_RestoreThread()</tt> dereferences <tt class="docutils literal">tstate</tt>:</p>
<pre class="literal-block">
_PyRuntimeState *runtime = tstate->interp->runtime;
</pre>
<p>This bug is a regression caused by my change:
<a class="reference external" href="https://github.com/python/cpython/commit/01b1cc12e7c6a3d6a3d27ba7c731687d57aae92a">Add PyInterpreterState.runtime field</a>
of <a class="reference external" href="https://bugs.python.org/issue36710">bpo-36710</a>. I replaced:</p>
<pre class="literal-block">
void PyEval_RestoreThread(PyThreadState *tstate) {
_PyRuntimeState *runtime = &_PyRuntime;
...
}
</pre>
<p>with:</p>
<pre class="literal-block">
void PyEval_RestoreThread(PyThreadState *tstate) {
_PyRuntimeState *runtime = tstate->interp->runtime;
...
}
</pre>
</div>
<div class="section" id="fix-pyeval-restorethread-for-daemon-threads">
<h3>Fix PyEval_RestoreThread() for daemon threads</h3>
<p>I created <a class="reference external" href="https://bugs.python.org/issue39877">bpo-39877</a> to investigate
this bug. I managed to reproduce the crash on Linux with a script spawning
daemon threads which sleep randomly between 0.0 and 1.0 second, and by adding
a <tt class="docutils literal">sleep(1);</tt> call at <tt class="docutils literal">Py_RunMain()</tt> exit.</p>
<p>I wrote a <tt class="docutils literal">PyEval_RestoreThread()</tt> fix which accesses
<tt class="docutils literal">_PyRuntimeState.finalizing</tt> without holding the GIL.</p>
<p><strong>Antoine Pitrou</strong> asked me to convert <tt class="docutils literal">_PyRuntimeState.finalizing</tt> to an
atomic variable to avoid inconsistencies in case of parallel accesses. On March
7, 2020, I pushed <a class="reference external" href="https://github.com/python/cpython/commit/7b3c252dc7f44d4bdc4c7c82d225ebd09c78f520">commit 7b3c252d</a>:
<pre class="literal-block">
bpo-39877: _PyRuntimeState.finalizing becomes atomic (GH-18816)
Convert _PyRuntimeState.finalizing field to an atomic variable:
* Rename it to _finalizing
* Change its type to _Py_atomic_address
* Add _PyRuntimeState_GetFinalizing() and _PyRuntimeState_SetFinalizing()
functions
* Remove _Py_CURRENTLY_FINALIZING() function: replace it with testing
directly _PyRuntimeState_GetFinalizing() value
Convert _PyRuntimeState_GetThreadState() to static inline function.
</pre>
<p>The day after, I pushed my fix, <a class="reference external" href="https://github.com/python/cpython/commit/eb4e2ae2b8486e8ee4249218b95d94a9f0cc513e">commit eb4e2ae2</a>:</p>
<pre class="literal-block">
bpo-39877: Fix PyEval_RestoreThread() for daemon threads (GH-18811)
* exit_thread_if_finalizing() does now access directly _PyRuntime
variable, rather than using tstate->interp->runtime since tstate
can be a dangling pointer after Py_Finalize() has been called.
* exit_thread_if_finalizing() is now called *before* calling
take_gil(). _PyRuntime.finalizing is an atomic variable,
we don't need to hold the GIL to access it.
</pre>
<p><tt class="docutils literal">exit_thread_if_finalizing()</tt> is now called <strong>before</strong> <tt class="docutils literal">take_gil()</tt> to
ensure that <tt class="docutils literal">take_gil()</tt> cannot be called with an invalid Python thread state
(<tt class="docutils literal">tstate</tt>).</p>
<p>I commented <em>naively</em>:</p>
<blockquote>
Ok, it should now be fixed.</blockquote>
</div>
</div>
<div class="section" id="clear-python-thread-states-earlier-my-first-failed-attempt-in-2013">
<h2>Clear Python thread states earlier: my first failed attempt in 2013</h2>
<p>In 2013, I opened <a class="reference external" href="https://bugs.python.org/issue19466">bpo-19466</a> to clear
the Python thread states of threads earlier during Python finalization. My
intent was to display <tt class="docutils literal">ResourceWarning</tt> warnings of daemon threads as well.
In November 2013, I pushed <a class="reference external" href="https://github.com/python/cpython/commit/45956b9a33af634a2919ade64c1dd223ab2d5235">commit 45956b9a</a>:</p>
<pre class="literal-block">
Close #19466: Clear the frames of daemon threads earlier during the Python
shutdown to call objects destructors. So "unclosed file" resource warnings
are now correctly emitted for daemon threads.
</pre>
<p>Later, I discovered a crash in the garbage collector while trying to
reproduce a race condition in asyncio: I created <a class="reference external" href="https://bugs.python.org/issue20526">bpo-20526</a>. Sadly, this bug was triggered by my
previous change. I decided that it was safer to revert my change.</p>
<p>By the way, when I looked again at <a class="reference external" href="https://bugs.python.org/issue20526">bpo-20526</a>, I was able to reproduce the
garbage collector bug again, likely because of recent changes. With the help of
<strong>Pablo Galindo Salgado</strong>, we <a class="reference external" href="https://bugs.python.org/issue20526#msg364851">understood the root issue</a>. On March 24, 2020, I pushed
a fix (<a class="reference external" href="https://github.com/python/cpython/commit/5804f878e779712e803be927ca8a6df389d82cdf">commit</a>)
to finally fix this 6-year-old bug! The fix removes the following line from
<tt class="docutils literal">PyThreadState_Clear()</tt>:</p>
<pre class="literal-block">
Py_CLEAR(tstate->frame);
</pre>
</div>
<div class="section" id="fix-3-exit-also-take-gil-at-exit-point-if-finalizing">
<h2>Fix 3: Exit also take_gil() at exit point if finalizing</h2>
<p>After fixing <tt class="docutils literal">PyEval_RestoreThread()</tt>, I decided to attempt again to fix
<a class="reference external" href="https://bugs.python.org/issue19466">bpo-19466</a> (clear earlier Python thread
states). Sadly, I discovered that my <tt class="docutils literal">PyEval_RestoreThread()</tt> fix
<strong>introduced a race condition</strong>!</p>
<p>While the main thread finalizes Python, daemon threads can be waiting for the
GIL: they block in <tt class="docutils literal">take_gil()</tt>. When the main thread releases the GIL during
finalization, a daemon thread takes the GIL instead of exiting. Daemon threads
only check if they must exit <strong>before</strong> trying to take the GIL.</p>
<p>The solution is to call <tt class="docutils literal">exit_thread_if_finalizing()</tt> twice in
<tt class="docutils literal">take_gil()</tt>: before <strong>and</strong> after taking the GIL.</p>
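<p>The double check can be sketched with a plain <tt class="docutils literal">threading.Lock</tt> standing
in for the GIL and an event standing in for the finalizing flag (all names here
are hypothetical, not the real ceval internals):</p>

```python
import threading

gil = threading.Lock()          # stands in for the GIL
finalizing = threading.Event()  # stands in for the "Python is finalizing" flag

def take_gil_sketch():
    # Check *before* blocking: don't wait for a GIL that will never be
    # handed over once finalization has started.
    if finalizing.is_set():
        raise SystemExit
    gil.acquire()
    # Check *again* after acquiring: finalization may have started while
    # this thread was waiting for the lock.
    if finalizing.is_set():
        gil.release()
        raise SystemExit
```

<p>A daemon thread calling this sketch after finalization has started exits
instead of holding the lock, which is the behavior the real fix restores.</p>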
<p>In March 2020, I pushed <a class="reference external" href="https://github.com/python/cpython/commit/9229eeee105f19705f72e553cf066751ac47c7b7">commit 9229eeee</a>:</p>
<pre class="literal-block">
bpo-39877: take_gil() checks tstate_must_exit() twice (GH-18890)
take_gil() now also checks tstate_must_exit() after acquiring
the GIL: exit the thread if Py_Finalize() has been called.
</pre>
<p>I commented:</p>
<blockquote>
<p>I ran multiple times <tt class="docutils literal">daemon_threads_exit.py</tt> with <tt class="docutils literal">slow_exit.patch</tt>:
no crash.</p>
<p>I also ran multiple times <tt class="docutils literal">stress.py</tt> + <tt class="docutils literal">sleep_at_exit.patch</tt> of
bpo-37135: no crash.</p>
<p>And I tested <tt class="docutils literal">asyncio_gc.py</tt> of bpo-19466: no crash neither.</p>
<p><strong>Python finalization now looks reliable.</strong> I'm not sure if it's "more"
reliable than previously, but at least, I cannot get a crash anymore, even
after bpo-19466 has been fixed (clear Python thread states of daemon
threads earlier).</p>
</blockquote>
<p>Funny fact: in June 2019, <strong>Eric Snow</strong> introduced a very similar bug in <a class="reference external" href="https://bugs.python.org/issue36818">bpo-36818</a> with <a class="reference external" href="https://github.com/python/cpython/commit/396e0a8d9dc65453cb9d53500d0a620602656cfe">commit 396e0a8d</a>:
test_multiprocessing_spawn segfaulted on FreeBSD (<a class="reference external" href="https://bugs.python.org/issue37135">bpo-37135</a>). At the time, I didn't have the bandwidth to
investigate the root cause: I just reverted Eric's change to fix the issue.</p>
</div>
<div class="section" id="fix-4-exit-take-gil-while-waiting-for-the-gil-if-finalizing">
<h2>Fix 4: Exit take_gil() while waiting for the GIL if finalizing</h2>
<p>While I was working on moving pending calls from <tt class="docutils literal">_PyRuntime</tt> to
<tt class="docutils literal">PyInterpreterState</tt>, <a class="reference external" href="https://bugs.python.org/issue39984">bpo-39984</a>, I hit
another bug.</p>
<p>On March 18, 2020, I pushed a <tt class="docutils literal">take_gil()</tt> fix to avoid accessing <tt class="docutils literal">tstate</tt>
if Python is finalizing, <a class="reference external" href="https://github.com/python/cpython/commit/29356e03d4f8800b04f799efe7a10e3ce8b16f61">commit 29356e03</a>:</p>
<pre class="literal-block">
bpo-39877: Fix take_gil() for daemon threads (GH-19054)
bpo-39877, bpo-39984: If the thread must exit, don't access tstate to
prevent a potential crash: tstate memory has been freed.
</pre>
<p>And while working on the inefficient signal handling in multithreaded
applications (<a class="reference external" href="https://bugs.python.org/issue40010">bpo-40010</a>), I discovered
that the previous fix was not enough!</p>
<p>On March 19, 2020, I pushed another <tt class="docutils literal">take_gil()</tt> fix to exit the thread
while it is waiting for the GIL if Python is finalizing, <a class="reference external" href="https://github.com/python/cpython/commit/a36adfa6bbf5e612a4d4639124502135690899b8">commit a36adfa6</a>:</p>
<pre class="literal-block">
bpo-39877: 4th take_gil() fix for daemon threads (GH-19080)
bpo-39877, bpo-40010: Add a third tstate_must_exit() check in
take_gil() to prevent using tstate which has been freed.
</pre>
<p>I can only hope that this fix is the last one needed to handle all corner cases
with daemon threads in <tt class="docutils literal">take_gil()</tt> (<a class="reference external" href="https://bugs.python.org/issue39877">bpo-39877</a>)!</p>
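<p>The behavior can be observed from Python. In the sketch below (the embedded script is only illustrative), a daemon thread repeatedly drops and re-takes the GIL by sleeping; when the main thread finalizes Python, the daemon thread silently exits in <tt class="docutils literal">take_gil()</tt> instead of crashing on a freed thread state:</p>

```python
import subprocess
import sys

# The embedded script is illustrative: a daemon thread drops and re-takes
# the GIL in a loop (each sleep does both), then the main thread exits.
script = """
import threading, time

def worker():
    while True:
        time.sleep(0.01)  # each sleep drops and then re-takes the GIL

threading.Thread(target=worker, daemon=True).start()
time.sleep(0.1)
print("main thread exiting")
"""

proc = subprocess.run([sys.executable, "-c", script],
                      capture_output=True, text=True)
print(proc.returncode)
```

<p>On a current CPython, the subprocess exits cleanly (code 0): the daemon thread simply disappears during finalization.</p>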
</div>
<div class="section" id="summary-of-gil-bugfixes">
<h2>Summary of GIL bugfixes</h2>
<p>The GIL got 5 main bugfixes for daemon threads and Python finalization:</p>
<ul class="simple">
<li>May 2011, <strong>Antoine Pitrou</strong>,
<a class="reference external" href="https://github.com/python/cpython/commit/0d5e52d3469a310001afe50689f77ddba6d554d1">commit 0d5e52d3</a>:
<tt class="docutils literal">take_gil()</tt> exits if finalizing <strong>after</strong> taking the GIL (1 check)</li>
<li>April 2019, <strong>Joannah Nanjekye</strong>,
<a class="reference external" href="https://github.com/python/cpython/commit/f781d202a2382731b43bade845a58d28a02e9ea1">commit f781d202</a>:
<tt class="docutils literal">PyEval_AcquireLock()</tt> and <tt class="docutils literal">PyEval_AcquireThread()</tt> also exit if Python is finalizing</li>
<li>March 8, 2020, <strong>Victor Stinner</strong>,
<a class="reference external" href="https://github.com/python/cpython/commit/eb4e2ae2b8486e8ee4249218b95d94a9f0cc513e">commit eb4e2ae2</a>:
<tt class="docutils literal">take_gil()</tt> exits if finalizing <strong>before</strong> taking the GIL (1 check)</li>
<li>March 9, 2020, <strong>Victor Stinner</strong>,
<a class="reference external" href="https://github.com/python/cpython/commit/9229eeee105f19705f72e553cf066751ac47c7b7">commit 9229eeee</a>:
<tt class="docutils literal">take_gil()</tt> exits if finalizing <strong>before and after</strong> taking the GIL (2 checks)</li>
<li>March 19, 2020, <strong>Victor Stinner</strong>,
<a class="reference external" href="https://github.com/python/cpython/commit/a36adfa6bbf5e612a4d4639124502135690899b8">commit a36adfa6</a>:
<tt class="docutils literal">take_gil()</tt> exits if finalizing <strong>before, while, and after</strong> taking the GIL (3 checks)</li>
</ul>
</div>
<h1>Threading shutdown race condition (2020-04-03)</h1>
<p>This article is about a race condition in threading shutdown that I fixed in
Python 3.9 in June 2019. I also forbade spawning daemon threads in
subinterpreters to fix another related bug.</p>
<a class="reference external image-reference" href="https://twitter.com/neeljulien/status/1240292383369150464"><img alt="#CoronaMaison by Julien Neel" src="https://vstinner.github.io/images/coronamaison_jneel.jpg" /></a>
<p>Drawing: <a class="reference external" href="https://twitter.com/neeljulien/status/1240292383369150464">#CoronaMaison by Julien Neel</a>.</p>
<div class="section" id="race-condition-in-threading-shutdown">
<h2>Race condition in threading shutdown</h2>
<div class="section" id="random-test-failure-noticed-on-freebsd-buildbot">
<h3>Random test failure noticed on FreeBSD buildbot</h3>
<p>In March 2019, I noticed that <tt class="docutils literal">test_threading.test_threads_join_2()</tt> was
killed by SIGABRT on the FreeBSD CURRENT buildbot, <a class="reference external" href="https://bugs.python.org/issue36402">bpo-36402</a>:</p>
<pre class="literal-block">
Fatal Python error: Py_EndInterpreter: not the last thread
</pre>
<p>The <tt class="docutils literal">test_threads_join_2()</tt> test <strong>failed randomly</strong> on buildbots when tests
were <strong>run in parallel</strong>, but test_threading <strong>passed</strong> when it was <strong>re-run
sequentially</strong>. Such a failure was silently ignored, since the build was seen
overall as a success.</p>
<p>The <tt class="docutils literal">test_threading.test_threads_join_2()</tt> test was added in 2013 by <a class="reference external" href="https://github.com/python/cpython/commit/7b4769937fb612d576b6829c3b834f3dd31752f1">commit
7b476993</a>.</p>
<p>In 2016, I already reported the same test failure: <a class="reference external" href="https://bugs.python.org/issue27791">bpo-27791</a> (same test, also on FreeBSD). And
Christian Heimes reported a similar issue: <a class="reference external" href="https://bugs.python.org/issue28084">bpo-28084</a>. I simply closed these issues because I
only saw the failure once in 4 months and <strong>I didn't have access to FreeBSD to
attempt to reproduce the crash</strong>.</p>
</div>
<div class="section" id="reproduce-the-race-condition">
<h3>Reproduce the race condition</h3>
<p>In 2019, I had a FreeBSD VM to attempt to reproduce the bug locally.</p>
<p>In June 2019, I found a reliable way to reproduce the bug by <a class="reference external" href="https://github.com/python/cpython/pull/13889/files">adding random
sleeps to the test</a>. With
this patch, I was also able to reproduce the bug on Linux. <strong>I am way more
comfortable debugging an issue on Linux</strong> with my favorite debugging tools!</p>
<p>I identified a race condition in the Python finalization. I also understood
that the bug was not specific to subinterpreters:</p>
<blockquote>
The test shows the bug using subinterpreters (Py_EndInterpreter), but
<strong>the bug also exists in Py_Finalize()</strong> which has the same race condition.</blockquote>
<p>I wrote a patch for <tt class="docutils literal">Py_Finalize()</tt> to help me reproduce the bug without
subinterpreters:</p>
<pre class="literal-block">
+ if (tstate != interp->tstate_head || tstate->next != NULL) {
+ Py_FatalError("Py_EndInterpreter: not the last thread");
+ }
</pre>
</div>
<div class="section" id="threading-shutdown-race-condition-1">
<h3>threading._shutdown() race condition</h3>
<p><tt class="docutils literal">threading._shutdown()</tt> uses <tt class="docutils literal">threading.enumerate()</tt> which iterates over
the <tt class="docutils literal">threading._active</tt> dictionary.</p>
<p><tt class="docutils literal">threading.Thread</tt> registers itself into <tt class="docutils literal">threading._active</tt> when the
thread starts. It unregisters itself from <tt class="docutils literal">threading._active</tt> when it
completes.</p>
<p>The bug occurs when the thread is unregistered while the underlying native
thread is still running and <strong>the Python thread state has not been deleted yet</strong>.</p>
<p><tt class="docutils literal">_thread._set_sentinel()</tt> creates a lock and registers a
<tt class="docutils literal"><span class="pre">tstate->on_delete</span></tt> callback to release this lock. It's called by
<tt class="docutils literal">threading.Thread</tt> when the thread starts to set
<tt class="docutils literal">threading.Thread._tstate_lock</tt>. This lock is used by the
<tt class="docutils literal">threading.Thread.join()</tt> method to wait until the thread completes.</p>
<p><tt class="docutils literal">_thread.start_new_thread()</tt> calls the C function <tt class="docutils literal">t_bootstrap()</tt> which
ends with:</p>
<pre class="literal-block">
tstate->interp->num_threads--;
PyThreadState_Clear(tstate);
PyThreadState_DeleteCurrent();
PyThread_exit_thread();
</pre>
<p>When the native thread completes, <tt class="docutils literal">_PyThreadState_DeleteCurrent()</tt> is called:
it calls <tt class="docutils literal"><span class="pre">tstate->on_delete()</span></tt> callback which releases
<tt class="docutils literal">threading.Thread._tstate_lock</tt> lock.</p>
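<p>The mechanism can be modeled in pure Python. The sketch below is illustrative only (the class and attribute names mimic CPython but are not its real implementation): a sentinel lock plays the role of <tt class="docutils literal">threading.Thread._tstate_lock</tt>, acquired when the thread starts and only released by the equivalent of the <tt class="docutils literal"><span class="pre">tstate->on_delete</span></tt> callback, so that <tt class="docutils literal">join()</tt> really waits until the thread state is gone:</p>

```python
import threading

class SentinelThread:
    """Toy model of CPython's _tstate_lock mechanism (illustrative names)."""

    def __init__(self, target):
        self._target = target
        # Plays the role of threading.Thread._tstate_lock, created by
        # _thread._set_sentinel() in the real implementation.
        self._tstate_lock = threading.Lock()

    def start(self):
        self._tstate_lock.acquire()  # held while the "thread state" is alive
        self._thread = threading.Thread(target=self._bootstrap)
        self._thread.start()

    def _bootstrap(self):
        try:
            self._target()
        finally:
            # Models the tstate->on_delete() callback: the lock is only
            # released once the "thread state" is deleted.
            self._tstate_lock.release()

    def join(self):
        # Wait until the sentinel lock is released, i.e. until the
        # "thread state" has been deleted, then join the native thread.
        self._tstate_lock.acquire()
        self._tstate_lock.release()
        self._thread.join()

results = []
t = SentinelThread(target=lambda: results.append(42))
t.start()
t.join()
print(results)
```

<p>After <tt class="docutils literal">join()</tt> returns, the sentinel lock is guaranteed to have been released by the finalization callback.</p>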
<p>The root issue is that:</p>
<ul class="simple">
<li><tt class="docutils literal">threading._shutdown()</tt> relies on the <tt class="docutils literal">threading._active</tt> dictionary;</li>
<li><tt class="docutils literal">Py_EndInterpreter()</tt> relies on the linked list of Python thread
states of the interpreter (<tt class="docutils literal"><span class="pre">interp->tstate_head</span></tt>).</li>
</ul>
<p>The lock on Python thread states (<tt class="docutils literal">threading.Thread._tstate_lock</tt>) and
<tt class="docutils literal">PyThreadState.on_delete</tt> callback were added in 2013 by <strong>Antoine Pitrou</strong>
to Python 3.4, <a class="reference external" href="https://github.com/python/cpython/commit/7b4769937fb612d576b6829c3b834f3dd31752f1">commit 7b476993</a>
of <a class="reference external" href="https://bugs.python.org/issue18808">bpo-18808</a>:</p>
<pre class="literal-block">
Issue #18808: Thread.join() now waits for the underlying thread state
to be destroyed before returning. This prevents unpredictable aborts
in Py_EndInterpreter() when some non-daemon threads are still running.
</pre>
</div>
<div class="section" id="fix-threading-shutdown">
<h3>Fix threading._shutdown()</h3>
<p>Finally in June 2019, I fixed the race condition in <tt class="docutils literal">threading._shutdown()</tt>
with <a class="reference external" href="https://github.com/python/cpython/commit/468e5fec8a2f534f1685d59da3ca4fad425c38dd">commit 468e5fec</a>:</p>
<pre class="literal-block">
bpo-36402: Fix threading._shutdown() race condition (GH-13948)
Fix a race condition at Python shutdown when waiting for threads. Wait
until the Python thread state of all non-daemon threads get deleted
(join all non-daemon threads), rather than just wait until Python
threads complete.
</pre>
<p>The fix modifies <tt class="docutils literal">threading._shutdown()</tt> to wait until the Python thread
states of all non-daemon threads are deleted, rather than calling the <tt class="docutils literal">join()</tt>
method of all non-daemon threads: the <tt class="docutils literal">join()</tt> method does not ensure that the
Python thread state is deleted.</p>
<p>The Python finalization calls <tt class="docutils literal">threading._shutdown()</tt> to wait until all
threads complete. Only non-daemon threads are awaited: daemon threads can
continue to run after <tt class="docutils literal">threading._shutdown()</tt>.</p>
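<p>This is easy to observe: even though the main thread returns immediately, the interpreter does not exit until the non-daemon thread has completed. A small demonstration, run as a subprocess (the embedded script is illustrative):</p>

```python
import subprocess
import sys

# The embedded script is illustrative: the main thread returns immediately,
# but threading._shutdown() waits for the non-daemon thread at exit.
script = """
import threading, time

def worker():
    time.sleep(0.2)
    print("worker done")

threading.Thread(target=worker).start()
# The main thread ends here; the Python finalization waits for the worker.
"""

proc = subprocess.run([sys.executable, "-c", script],
                      capture_output=True, text=True)
print(proc.stdout.strip())
```
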
<p><tt class="docutils literal">Py_EndInterpreter()</tt> requires that the Python thread states of all threads
have been deleted. <strong>What about daemon threads?</strong> More about that in the next
section ;-)</p>
<p>Note: This change introduced a regression (memory leak) which is not fixed yet:
<a class="reference external" href="https://bugs.python.org/issue37788">bpo-37788</a>.</p>
</div>
</div>
<div class="section" id="forbid-daemon-threads-in-subinterpreters">
<h2>Forbid daemon threads in subinterpreters</h2>
<p>In June 2019, while fixing the threading shutdown, I found a reliable way to
trigger a bug with daemon threads when a subinterpreter is finalized:</p>
<pre class="literal-block">
Fatal Python error: Py_EndInterpreter: not the last thread
</pre>
<p>By design, daemon threads can run after a Python interpreter is finalized,
whereas <tt class="docutils literal">Py_EndInterpreter()</tt> requires that all threads have completed.</p>
<p>I reported <a class="reference external" href="https://bugs.python.org/issue37266">bpo-37266</a> to propose to
forbid the creation of daemon threads in subinterpreters. I fixed the issue
with <a class="reference external" href="https://github.com/python/cpython/commit/066e5b1a917ec2134e8997d2cadd815724314252">commit 066e5b1a</a>:</p>
<pre class="literal-block">
bpo-37266: Daemon threads are now denied in subinterpreters (GH-14049)
In a subinterpreter, spawning a daemon thread now raises an
exception. Daemon threads were never supported in subinterpreters.
Previously, the subinterpreter finalization crashed with a Python
fatal error if a daemon thread was still running.
</pre>
<p>The change adds this check to <tt class="docutils literal">Thread.start()</tt>:</p>
<pre class="literal-block">
if self.daemon and not _is_main_interpreter():
raise RuntimeError("daemon thread are not supported "
"in subinterpreters")
</pre>
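<p>In the main interpreter the check is a no-op: the private helper <tt class="docutils literal">_is_main_interpreter()</tt> returns true and daemon threads start as usual. A quick check (guarded with <tt class="docutils literal">hasattr()</tt> because the helper is private and may change across Python versions):</p>

```python
import _thread
import threading

# _is_main_interpreter() is the private helper used by the check; it may
# change between Python versions, hence the hasattr() guard.
if hasattr(_thread, "_is_main_interpreter"):
    print("main interpreter:", _thread._is_main_interpreter())

# In the main interpreter, spawning a daemon thread is allowed.
t = threading.Thread(target=lambda: None, daemon=True)
t.start()
t.join()
```
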
<p>I commented:</p>
<blockquote>
<strong>Daemon threads must die.</strong> That's a first step towards their death!</blockquote>
<p><strong>Antoine Pitrou</strong> created <a class="reference external" href="https://bugs.python.org/issue39812">bpo-39812: Avoid daemon threads in
concurrent.futures</a> as a follow-up.</p>
<p>In February 2020, when rebuilding Fedora Rawhide with Python 3.9, <strong>Miro
Hrončok</strong> of my team noticed that my change <a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1792062">broke the python-jep project</a>. I <a class="reference external" href="https://github.com/ninia/jep/issues/229">reported the bug
upstream</a>. It was fixed by
using regular threads rather than daemon threads: <a class="reference external" href="https://github.com/ninia/jep/commit/a31d461c6cacc96de68d68320eaa83e19a45d0cc">commit</a>.</p>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>A random failure on a FreeBSD buildbot was hiding a severe race condition in
the threading shutdown. The bug had existed since 2013, but was silently
ignored since the test passed when re-run.</p>
<p>The race condition was that the threading shutdown didn't ensure that the
Python thread states of all non-daemon threads were deleted, whereas this is a
<tt class="docutils literal">Py_EndInterpreter()</tt> requirement.</p>
<p>I fixed the threading shutdown by waiting until the Python thread states of all
non-daemon threads are deleted.</p>
<p>I also modified <tt class="docutils literal">Thread.start()</tt> to forbid spawning daemon threads in Python
subinterpreters to fix a related issue.</p>
</div>
<h1>Daemon threads and the Python finalization in Python 3.2 and 3.3 (2020-03-26)</h1>
<a class="reference external image-reference" href="https://twitter.com/LuppiChan/status/1240346448606171136"><img alt="#CoronaMaison by Luppi" src="https://vstinner.github.io/images/coronamaison_luppi.jpg" /></a>
<p>At exit, the Python finalization calls Python object finalizers (the
<tt class="docutils literal">__del__()</tt> method) and deallocates memory. Daemon threads are a special
kind of thread which continues to run during and after the Python finalization.
They cause race conditions and tricky bugs in the Python finalization.</p>
<p>This article covers bugs fixed in the Python finalization in Python 3.2 and
Python 3.3 (2009 to 2011), and a backport in Python 2.7.8 (2014).</p>
<p>Drawing: <a class="reference external" href="https://twitter.com/LuppiChan/status/1240346448606171136">#CoronaMaison by Luppi</a>.</p>
<div class="section" id="daemon-threads">
<h2>Daemon threads</h2>
<p>Python has a special kind of thread: "daemon" threads. The difference with
regular threads is that Python doesn't wait until daemon threads complete at
exit, whereas it waits until all regular ("non-daemon") threads complete.
Example:</p>
<pre class="literal-block">
import threading, time
thread = threading.Thread(target=time.sleep, args=(5.0,), daemon=False)
thread.start()
</pre>
<p>This Python program spawns a regular thread which sleeps for 5 seconds. Python
takes 5 seconds to exit:</p>
<pre class="literal-block">
$ time python3 sleep.py
real 0m5,047s
</pre>
<p>If <tt class="docutils literal">daemon=False</tt> is replaced with <tt class="docutils literal">daemon=True</tt> to spawn a daemon thread
instead, Python exits immediately (57 ms):</p>
<pre class="literal-block">
$ time python3 sleep.py
real 0m0,057s
</pre>
<p>Note: The <tt class="docutils literal">Thread.join()</tt> method can be called explicitly to wait until a
daemon thread completes.</p>
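<p>The difference is easy to measure by running both variants as subprocesses, here with a 1 second sleep to keep the run short (the <tt class="docutils literal">exit_time()</tt> helper is just for this demonstration):</p>

```python
import subprocess
import sys
import time

def exit_time(daemon):
    """Time how long Python takes to exit with one sleeping thread."""
    script = (
        "import threading, time\n"
        "t = threading.Thread(target=time.sleep, args=(1.0,), "
        f"daemon={daemon})\n"
        "t.start()\n"
    )
    start = time.monotonic()
    subprocess.run([sys.executable, "-c", script], check=True)
    return time.monotonic() - start

regular = exit_time(daemon=False)  # waits for the thread: at least 1 second
daemon = exit_time(daemon=True)    # exits without waiting for the thread
print(f"regular: {regular:.2f}s, daemon: {daemon:.2f}s")
```
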
</div>
<div class="section" id="don-t-destroy-the-gil-at-exit">
<h2>Don't destroy the GIL at exit</h2>
<p>In November 2009, <strong>Antoine Pitrou</strong> implemented a new GIL (Global Interpreter
Lock) in Python 3.2: <a class="reference external" href="https://github.com/python/cpython/commit/074e5ed974be65fbcfe75a4c0529dbc53f13446f">commit 074e5ed9</a>.</p>
<p>In September 2010, he found a crash with daemon threads while stressing
<tt class="docutils literal">test_threading</tt>: <a class="reference external" href="https://bugs.python.org/issue9901">bpo-9901: GIL destruction can fail</a>. <tt class="docutils literal">test_finalize_with_trace()</tt> failed
with:</p>
<pre class="literal-block">
Fatal Python error: pthread_mutex_destroy(gil_mutex) failed
</pre>
<p>He pushed a fix for this crash in Python 3.2, <a class="reference external" href="https://github.com/python/cpython/commit/b0b384b7c0333bf1183cd6f90c0a3f9edaadd6b9">commit b0b384b7</a>:</p>
<pre class="literal-block">
Issue #9901: Destroying the GIL in Py_Finalize() can fail if some other
threads are still running. Instead, reinitialize the GIL on a second
call to Py_Initialize().
</pre>
<p>The Python GIL internally uses a lock. If the lock is destroyed while a daemon
thread is waiting for it, the thread can crash. The fix is to <strong>no longer
destroy the GIL at exit</strong>.</p>
</div>
<div class="section" id="exit-the-thread-in-pyeval-restorethread">
<h2>Exit the thread in PyEval_RestoreThread()</h2>
<p>The Python finalization clears and deallocates the "Python thread state" of all
threads (in <tt class="docutils literal">PyInterpreterState_Delete()</tt>), which calls the Python object
finalizers of these threads. Calling a finalizer can drop the GIL to make a
system call: for example, closing a file drops the GIL. When the GIL is
dropped, a daemon thread is awakened to take the GIL. Since its Python thread
state was just deallocated, the daemon thread crashes.</p>
<p>This bug is a race condition. It depends on the order in which threads are
executed, objects are finalized, memory is deallocated, etc.</p>
<p>The crash was first reported in April 2005: <a class="reference external" href="https://bugs.python.org/issue1193099">bpo-1193099: Embedded python thread
crashes</a>. In January 2008, <strong>Gregory P.
Smith</strong> reported <a class="reference external" href="https://bugs.python.org/issue1856#msg60014">bpo-1856: shutdown (exit) can hang or segfault with daemon
threads running</a>. He wrote a
short Python program reproducing the bug: spawn 40 daemon threads which do some
I/O operations and sleep randomly between 0 ms and 5 ms in a loop.</p>
<p><strong>Adam Olsen</strong> <a class="reference external" href="https://bugs.python.org/issue1856#msg60059">proposed a solution</a> (with a patch):</p>
<blockquote>
I think <strong>non-main threads should kill themselves off</strong> if they grab the
interpreter lock and the interpreter is tearing down. They're about to get
killed off anyway, when the process exits.</blockquote>
<p>In May 2011, <strong>Antoine Pitrou</strong> pushed a fix to Python 3.3 (6 years after the
first bug report) which implements this solution, <a class="reference external" href="https://github.com/python/cpython/commit/0d5e52d3469a310001afe50689f77ddba6d554d1">commit 0d5e52d3</a>:</p>
<pre class="literal-block">
Issue #1856: Avoid crashes and lockups when daemon threads run while the
interpreter is shutting down; instead, these threads are now killed when
they try to take the GIL.
</pre>
</div>
<div class="section" id="pyeval-restorethread-fix-explanation">
<h2>PyEval_RestoreThread() fix explanation</h2>
<p>The fix adds a new <tt class="docutils literal">_Py_Finalizing</tt> variable which is set by
<tt class="docutils literal">Py_Finalize()</tt> to the (Python thread state of the) thread which runs the
finalization.</p>
<p>Simplified patch of the <tt class="docutils literal">PyEval_RestoreThread()</tt> fix:</p>
<pre class="literal-block">
@@ -440,6 +440,12 @@ PyEval_RestoreThread()
take_gil(tstate);
+ if (_Py_Finalizing && tstate != _Py_Finalizing) {
+ drop_gil(tstate);
+ PyThread_exit_thread();
+ }
</pre>
<p>If Python is finalizing (<tt class="docutils literal">_Py_Finalizing</tt> is not NULL) and
<tt class="docutils literal">PyEval_RestoreThread()</tt> is called by a thread which is not the thread
running the finalization, the thread exits immediately (it calls
<tt class="docutils literal">PyThread_exit_thread()</tt>).</p>
<p><tt class="docutils literal">PyEval_RestoreThread()</tt> is called when a thread takes the GIL. Typical
example of code which drops the GIL to call a system call (closing a file
descriptor in the <tt class="docutils literal">io.FileIO()</tt> finalizer) and then takes the GIL again:</p>
<pre class="literal-block">
Py_BEGIN_ALLOW_THREADS
close(fd);
Py_END_ALLOW_THREADS
</pre>
<p>The <tt class="docutils literal">Py_BEGIN_ALLOW_THREADS</tt> macro calls <tt class="docutils literal">PyEval_SaveThread()</tt> to drop the
GIL, and the <tt class="docutils literal">Py_END_ALLOW_THREADS</tt> macro calls <tt class="docutils literal">PyEval_RestoreThread()</tt> to
take the GIL. Pseudo-code:</p>
<pre class="literal-block">
PyEval_SaveThread(); // drop the GIL
close(fd);
PyEval_RestoreThread(); // take the GIL
</pre>
<p>With Antoine's fix, if Python is finalizing, a thread now exits immediately
when calling <tt class="docutils literal">PyEval_RestoreThread()</tt>.</p>
</div>
<div class="section" id="revert-take-gil-backport-to-2-7">
<h2>Revert take_gil() backport to 2.7</h2>
<p>In June 2014, <strong>Benjamin Peterson</strong> (Python 2.7 release manager) backported
Antoine's change to Python 2.7: fix included in 2.7.8.</p>
<p>Problem: the Ceph project <a class="reference external" href="https://tracker.ceph.com/issues/8797">started to crash with Python 2.7.8</a>.</p>
<p>In November 2014, the change was reverted in Python 2.7.9: see
<a class="reference external" href="https://bugs.python.org/issue21963">bpo-21963 discussion</a> for the rationale.</p>
<p>In 2014, I already wrote:</p>
<blockquote>
Anyway, <strong>daemon threads are evil</strong> :-( Expecting them to exit cleanly
automatically is not good. Last time I tried to improve code to cleanup
Python at exit in Python 3.4, I also had a regression (just before the
release of Python 3.4.0): see the <a class="reference external" href="https://bugs.python.org/issue21788">issue #21788</a>.</blockquote>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>Daemon threads caused crashes in the Python finalization, first noticed in
2005.</p>
<p>Python 3.2 (released in February 2011) got a new GIL and also a bugfix for
daemon threads. Python 3.3 (released in September 2012) also got a bugfix for
daemon threads. The Python finalization became more reliable.</p>
<p>Changing the Python finalization is risky. A backport of a bugfix into Python 2.7.8
caused a regression which required reverting the bugfix in Python 2.7.9.</p>
</div>
<h1>Python 3.7 Development Mode (2020-01-16)</h1>
<a class="reference external image-reference" href="https://twitter.com/guinoir/status/1217146968029331456"><img alt="Ready to race" src="https://vstinner.github.io/images/ready_to_race.jpg" /></a>
<p>This article describes the discussion on the design of the <a class="reference external" href="https://docs.python.org/dev/using/cmdline.html#id5">development mode
(-X dev)</a> that I <strong>added
to Python 3.7</strong> and how it has been implemented.</p>
<p>The development mode enables runtime checks which are too expensive to be
enabled by default. It can be enabled with the <tt class="docutils literal">python3 <span class="pre">-X</span> dev</tt> command line option
or the <tt class="docutils literal">PYTHONDEVMODE=1</tt> environment variable. It helps developers spot
bugs in their code and prepare for future Python changes.</p>
<p>Drawing: <em>Ready to race, by Guillaume Singelin.</em></p>
<div class="section" id="email-sent-to-python-ideas">
<h2>Email sent to python-ideas</h2>
<p>In March 2016, I proposed <a class="reference external" href="https://mail.python.org/pipermail/python-ideas/2016-March/039314.html">Add a developer mode to Python: -X dev command line
option</a> on
the python-ideas list:</p>
<blockquote>
<p>When I develop on CPython, I'm always building Python in debug mode
using <tt class="docutils literal">./configure <span class="pre">--with-pydebug</span></tt>. This mode enables a <strong>lot</strong> of extra
checks which helps me to detect bugs earlier. The debug mode makes Python
much slower and so is not enabled by default.</p>
<p>I propose to add a "development mode" to Python, to get a few checks
to detect bugs earlier: a new <tt class="docutils literal"><span class="pre">-X</span> dev</tt> command line option. Example:</p>
<pre class="literal-block">
python3.6 -X dev script.py
</pre>
<p>I propose to enable:</p>
<ul class="simple">
<li>Show <tt class="docutils literal">DeprecationWarning</tt> and <tt class="docutils literal">ResourceWarning</tt> warnings: <tt class="docutils literal">python <span class="pre">-Wd</span></tt></li>
<li>Show <tt class="docutils literal">BytesWarning</tt> warning: <tt class="docutils literal">python <span class="pre">-b</span></tt></li>
<li>Enable Python assertions (<tt class="docutils literal">assert</tt>) and set <tt class="docutils literal">__debug__</tt> to True:
remove (or just ignore) <tt class="docutils literal"><span class="pre">-O</span></tt> or <tt class="docutils literal"><span class="pre">-OO</span></tt> command line arguments</li>
<li>faulthandler to get a Python traceback on segfault and fatal errors:
<tt class="docutils literal">python <span class="pre">-X</span> faulthandler</tt></li>
<li>Debug hooks on Python memory allocators: <tt class="docutils literal">PYTHONMALLOC=debug</tt></li>
</ul>
</blockquote>
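<p>As implemented in Python 3.7, these checks are easy to demonstrate: for example, a file left unclosed triggers a <tt class="docutils literal">ResourceWarning</tt> which is silent by default but displayed in development mode:</p>

```python
import os
import subprocess
import sys

# An unclosed file is collected silently by default; -X dev enables
# ResourceWarning. Strip variables which could change the default filters.
env = {k: v for k, v in os.environ.items()
       if k not in ("PYTHONDEVMODE", "PYTHONWARNINGS")}
code = "import os\nf = open(os.devnull)\ndel f"

default = subprocess.run([sys.executable, "-c", code],
                         capture_output=True, text=True, env=env)
dev = subprocess.run([sys.executable, "-X", "dev", "-c", code],
                     capture_output=True, text=True, env=env)

print("default:", "ResourceWarning" in default.stderr)
print("-X dev: ", "ResourceWarning" in dev.stderr)
```
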
<p>I wrote an implementation of this development mode using <tt class="docutils literal">exec()</tt>. <strong>Ronald
Oussoren</strong> <a class="reference external" href="https://bugs.python.org/issue26670#msg262659">commented on my patch</a>:</p>
<blockquote>
Why does this patch execv() the interpreter to set options? I'd expect it
to be possible to get the same result by updating the argument parsing code
in Py_Main.</blockquote>
<p>More on that later :-) <strong>Marc-Andre Lemburg</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-ideas/2016-March/039325.html">didn't buy the idea</a>:</p>
<blockquote>
<strong>I'm not sure whether this would make things easier for the
majority of developers</strong>, e.g. someone not writing C extensions
would likely not be interested in debugging memory allocations
or segfaults, someone spending more time on numerics wouldn't
bother with bytes warnings, etc.</blockquote>
<p>This opinion was shared by <strong>Ethan Furman</strong>, so I gave up at this point and
closed my issue and my PR.</p>
</div>
<div class="section" id="async-keyword-deprecationwarning-and-pep-565">
<h2>async keyword, DeprecationWarning and PEP 565</h2>
<p>On November 1, 2017, Ned Deily, the Python 3.7 release manager,
sent an email to python-dev: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-November/150061.html">Reminder: 12 weeks to 3.7 feature code cutoff</a>.</p>
<p>A discussion started on <tt class="docutils literal">async</tt> and <tt class="docutils literal">await</tt> becoming keywords and how this
incompatible change was conducted. Read LWN article <a class="reference external" href="https://lwn.net/Articles/740804/">Who should see Python
deprecation warnings?</a> (December 2017) by
Jonathan Corbet for the whole story:</p>
<blockquote>
In early November, one sub-thread of a big discussion on preparing for the
Python 3.7 release focused on the await and async identifiers. They will
become keywords in 3.7, meaning that any code using those names for any
other purpose will break. Nick Coghlan observed that <strong>Python 3.6 does not
warn</strong> about the use of those names, calling it "a fairly major
oversight/bug". <strong>In truth, though, Python 3.6 does emit warnings in that
case — but users rarely see them.</strong></blockquote>
<p>The question is who should see <tt class="docutils literal">DeprecationWarning</tt>. A long time ago, it
was decided to hide these warnings by default to avoid bothering users: users
usually cannot fix them, so they are only a source of annoyance.</p>
<p>If the warning is displayed by default, developers can be annoyed by warnings
coming from code that they cannot easily fix, like third-party dependencies.</p>
<p>On November 12, 2017, Nick Coghlan proposed <a class="reference external" href="https://www.python.org/dev/peps/pep-0565/">PEP 565: Show DeprecationWarning
in __main__</a> as a compromise:</p>
<blockquote>
This change will mean that code entered at the interactive prompt and code
in single file scripts will revert to reporting these warnings by default,
while they will <strong>continue to be silenced by default for packaged code</strong>
distributed as part of an importable module.</blockquote>
<p>The PEP has been approved and implemented in Python 3.7. For example,
<tt class="docutils literal">DeprecationWarning</tt> is now displayed by default when running a script and in
the REPL:</p>
<pre class="literal-block">
$ cat example.py
import imp
$ python3 example.py
example.py:1: DeprecationWarning: the imp module is deprecated ...
import imp
$ python3
Python 3.7.6 (default, Dec 19 2019, 22:52:49)
>>> import imp
__main__:1: DeprecationWarning: the imp module is deprecated ...
</pre>
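<p>The difference between the old and the new default actions can be simulated with
the <tt class="docutils literal">warnings</tt> module; a minimal sketch (the <tt class="docutils literal">call_deprecated()</tt> function
name is made up for the example):</p>

```python
import warnings

def call_deprecated():
    warnings.warn("this API is deprecated", DeprecationWarning, stacklevel=2)

# Before PEP 565, the "ignore" action applied to DeprecationWarning everywhere.
with warnings.catch_warnings(record=True) as before_log:
    warnings.simplefilter("ignore", DeprecationWarning)
    call_deprecated()

# PEP 565 restores the "default" action (show once) for __main__ code.
with warnings.catch_warnings(record=True) as after_log:
    warnings.simplefilter("default", DeprecationWarning)
    call_deprecated()

print(len(before_log), len(after_log))  # 0 1
```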
</div>
<div class="section" id="development-mode-proposed-on-python-dev">
<h2>Development mode proposed on python-dev</h2>
<p>I was not convinced that only displaying warnings in the <tt class="docutils literal">__main__</tt> module is
enough to help developers fix issues in their code: a project is much larger
than just this module.</p>
<p>I came back with my idea, now on the python-dev list: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-November/150514.html">Add a developer mode to
Python: -X dev command line option</a>.</p>
<p>This mode shows <tt class="docutils literal">DeprecationWarning</tt> and <tt class="docutils literal">ResourceWarning</tt> in all modules,
not only in the <tt class="docutils literal">__main__</tt> module. In my opinion, having an opt-in mode for
developers is the best option. Python should not spam users with warnings which
are targeting developers.</p>
<p><strong>In the context of Python 3.7 incompatible changes, the feedback was way better
this time.</strong></p>
</div>
<div class="section" id="issues-with-the-python-initialization">
<h2>Issues with the Python initialization</h2>
<p>When I proposed the idea, my plan was to call exec() to replace the current
process with a new process. But when I tried to implement it, it turned out to
be trickier than expected. My first blocker issue was removing the <tt class="docutils literal"><span class="pre">-O</span></tt> option
from the command line. I hate having to parse the command line: it is very
fragile and it's too easy to make mistakes.</p>
<p>So I tried to write a clean implementation: configure Python properly in
"development mode". The blocker issue there was implementing
<tt class="docutils literal">PYTHONMALLOC=debug</tt>. The C code to read and apply the Python configuration
used Python objects before the Python initialization even started. For example,
<tt class="docutils literal"><span class="pre">-W</span></tt> and <tt class="docutils literal"><span class="pre">-X</span></tt> options were stored as Python lists. It means that the Python
memory allocator was used before Python could parse the <tt class="docutils literal">PYTHONMALLOC</tt>
environment variable.</p>
<p>Moreover, the Python configuration is quite complex. Many options are
inter-dependent. For example, the <tt class="docutils literal"><span class="pre">-E</span></tt> command line option ignores
environment variables with a name starting with <tt class="docutils literal">PYTHON</tt>, like
<tt class="docutils literal">PYTHONMALLOC</tt>! Python has to parse the command line before being able to
handle <tt class="docutils literal">PYTHONMALLOC</tt>.</p>
<p>Python lists depend on the memory allocator, which depends on the <tt class="docutils literal">PYTHONMALLOC</tt>
environment variable which depends on the <tt class="docutils literal"><span class="pre">-E</span></tt> command line option which
depends on Python lists...</p>
<p>In short, <strong>it wasn't possible to write a clean implementation of the
development mode without refactoring the Python initialization code</strong>.</p>
</div>
<div class="section" id="refactoring-main-c">
<h2>Refactoring main.c</h2>
<p>For all these reasons, I refactored Python initialization code in <tt class="docutils literal">main.c</tt>,
with <a class="reference external" href="https://bugs.python.org/issue32030">bpo-32030</a> with two <strong>large</strong>
changes:</p>
<ul class="simple">
<li><a class="reference external" href="https://github.com/python/cpython/commit/f7e5b56c37eb859e225e886c79c5d742c567ee95">commit f7e5b56c</a>:
bpo-32030: Split Py_Main() into subfunctions</li>
<li><a class="reference external" href="https://github.com/python/cpython/commit/a7368ac6360246b1ef7f8f152963c2362d272183">commit a7368ac6</a>:
bpo-32030: Enhance Py_Main()</li>
</ul>
</div>
<div class="section" id="add-x-dev-option">
<h2>Add -X dev option</h2>
<p>Once I got enough approval from my peers (core developers), I pushed <a class="reference external" href="https://github.com/python/cpython/commit/ccb0442a338066bf40fe417455e5a374e5238afb">commit
ccb0442a</a>
of <a class="reference external" href="https://bugs.python.org/issue32043">bpo-32043</a> to add the <tt class="docutils literal"><span class="pre">-X</span> dev</tt>
command line option. Thanks to the previous refactoring, the implementation is
less intrusive.</p>
<p>Effects of the development mode:</p>
<ul class="simple">
<li>Add <tt class="docutils literal">default</tt> warnings option. For example, display <tt class="docutils literal">DeprecationWarning</tt>
and <tt class="docutils literal">ResourceWarning</tt> warnings.</li>
<li>Install <a class="reference external" href="https://docs.python.org/dev/c-api/memory.html#c.PyMem_SetupDebugHooks">debug hooks on memory allocators</a> as if
<tt class="docutils literal">PYTHONMALLOC</tt> is set to <tt class="docutils literal">debug</tt>.</li>
<li>Enable my <a class="reference external" href="https://docs.python.org/dev/library/faulthandler.html">faulthandler</a> module to dump the
Python traceback on a crash.</li>
</ul>
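<p>The development mode can be checked at runtime with <tt class="docutils literal">sys.flags.dev_mode</tt>; a
small sketch spawning a child interpreter with <tt class="docutils literal"><span class="pre">-X</span> dev</tt> (assuming the
documented behavior that dev mode adds the <tt class="docutils literal">default</tt> warnings option):</p>

```python
import subprocess
import sys

# Run a child interpreter in development mode and inspect its configuration:
# dev_mode is set, and the "default" warnings filter is added.
code = "import sys; print(sys.flags.dev_mode, sys.warnoptions)"
proc = subprocess.run([sys.executable, "-X", "dev", "-c", code],
                      capture_output=True, text=True, check=True)
output = proc.stdout.strip()
print(output)
```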
</div>
<div class="section" id="add-pythondevmode-environment-variable">
<h2>Add PYTHONDEVMODE environment variable</h2>
<p>In a PR review, Antoine Pitrou <a class="reference external" href="https://github.com/python/cpython/pull/4478#pullrequestreview-77874230">proposed</a>:</p>
<blockquote>
Speaking of which, perhaps it would be nice to set those environment
variables so that child processes launched using subprocess inherit them?</blockquote>
<p>I created <a class="reference external" href="https://bugs.python.org/issue32101">bpo-32101</a> to add
<tt class="docutils literal">PYTHONDEVMODE</tt> environment variable: <a class="reference external" href="https://github.com/python/cpython/commit/5e3806f8cfd84722fc55d4299dc018ad9b0f8401">commit 5e3806f8</a>.</p>
<p>Setting <tt class="docutils literal">PYTHONDEVMODE=1</tt> also enables the development mode in Python
child processes, without having to touch their command line.</p>
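<p>For example, the environment variable propagates naturally through
<tt class="docutils literal">subprocess</tt>; a sketch:</p>

```python
import os
import subprocess
import sys

# PYTHONDEVMODE=1 is inherited by Python child processes, whereas the
# "-X dev" command line option only affects the process it is passed to.
env = dict(os.environ, PYTHONDEVMODE="1")
child = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.flags.dev_mode)"],
    env=env, capture_output=True, text=True, check=True)
dev_mode = child.stdout.strip()
print(dev_mode)
```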
</div>
<div class="section" id="enable-asyncio-debug-mode">
<h2>Enable asyncio debug mode</h2>
<p>I created <a class="reference external" href="https://bugs.python.org/issue32047">bpo-32047: asyncio: enable debug mode when -X dev is used</a> and <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-November/150572.html">asked in the -X dev thread on
python-dev</a>:</p>
<blockquote>
What do you think? Is it ok to include asyncio in the global "developer mode"?</blockquote>
<p>Antoine Pitrou didn't like the idea because asyncio debug mode was "quite
expensive", but Yury Selivanov (one of the asyncio maintainers) and Barry
Warsaw liked the idea, so I merged my PR: <a class="reference external" href="https://github.com/python/cpython/commit/44862df2eeec62adea20672b0fe2a5d3e160569e">commit 44862df2</a>.</p>
<p>Antoine Pitrou created <a class="reference external" href="https://bugs.python.org/issue31970">bpo-31970: asyncio debug mode is very slow</a>. Fortunately, he found a way to make
asyncio debug mode more efficient by truncating tracebacks to 10 frames
(<a class="reference external" href="https://github.com/python/cpython/commit/921e9432a1461bbf312c9c6dcc2b916be6c05fa0">commit 921e9432</a>).</p>
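<p>The asyncio debug mode can also be enabled explicitly; a minimal sketch using
the <tt class="docutils literal">debug</tt> parameter of <tt class="docutils literal">asyncio.run()</tt>:</p>

```python
import asyncio

# -X dev enables asyncio debug mode globally; the same mode can be
# requested explicitly with the debug parameter of asyncio.run().
async def main():
    loop = asyncio.get_running_loop()
    return loop.get_debug()

debug_enabled = asyncio.run(main(), debug=True)
print(debug_enabled)
```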
</div>
<div class="section" id="fix-warnings-filters">
<h2>Fix warnings filters</h2>
<p>While checking warnings filters, I noticed that the development mode was hiding
some ResourceWarning warnings. I completed the documentation and fixed warnings
filters in <a class="reference external" href="https://bugs.python.org/issue32089">bpo-32089</a>.</p>
</div>
<div class="section" id="python-3-8-logs-close-exception">
<h2>Python 3.8 logs close() exception</h2>
<p>By default, Python silently ignores the <tt class="docutils literal">EBADF</tt> error (bad file descriptor),
which can lead to a <strong>severe crash</strong>, <a class="reference external" href="https://bugs.python.org/issue18748">bpo-18748</a> (simplified gdb traceback):</p>
<pre class="literal-block">
Program received signal SIGABRT, Aborted.
[Switching to Thread 0xb7b0eb70 (LWP 17152)]
0xb7fe1424 in __kernel_vsyscall ()
(gdb) bt
#0 0xb7fe1424 in __kernel_vsyscall ()
#1 0xb7e4e941 in *__GI_raise (sig=6)
#2 0xb7e51d72 in *__GI_abort ()
#3 0xb7e8ae15 in __libc_message (do_abort=1, fmt=0xb7f606f5 "%s")
#4 0xb7e8af44 in *__GI___libc_fatal (message=0xb7fc75ec
"libgcc_s.so.1 must be installed for pthread_cancel to work\n")
#5 0xb7fc4ffa in pthread_cancel_init ()
#6 0xb7fc509d in _Unwind_ForcedUnwind (...)
#7 0xb7fc2b98 in *__GI___pthread_unwind (buf=<optimized out>)
#8 0xb7fbcce0 in __do_cancel () at pthreadP.h:265
#9 __pthread_exit (value=0x0) at pthread_exit.c:30
...
</pre>
<p>Notice the <tt class="docutils literal">"libgcc_s.so.1 must be installed for pthread_cancel to work"</tt> error
message: glibc dynamically loads the <tt class="docutils literal">libgcc_s.so.1</tt> library when a thread
completes, but another thread closed its file descriptor!</p>
<p>Worse, <strong>the crash is not deterministic</strong>: it's a <strong>race condition</strong>
which requires many attempts to reproduce, even with an example designed to
trigger the crash!</p>
<p>Since the <tt class="docutils literal">EBADF</tt> error is silently ignored, such an issue is hard to notice
or to debug. I modified the development mode in Python 3.8 to <strong>log close()
exceptions in io.IOBase destructor</strong>.</p>
<p>Always logging the <tt class="docutils literal">close()</tt> exception was not accepted, so having an
opt-in development mode is a good practical compromise!</p>
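<p>A minimal sketch of how <tt class="docutils literal">EBADF</tt> can appear when a file descriptor is closed
behind the back of a file object (here the error is raised by an explicit
<tt class="docutils literal">close()</tt>; in the <tt class="docutils literal">io.IOBase</tt> destructor it used to be silently ignored):</p>

```python
import errno
import os

# Open a file object on one end of a pipe, then close the underlying
# file descriptor directly: the file object now owns a stale fd.
rfd, wfd = os.pipe()
os.close(wfd)
fobj = os.fdopen(rfd, "rb")
os.close(rfd)           # closed behind the file object's back

try:
    fobj.close()        # the second close(rfd) fails with EBADF
    error = None
except OSError as exc:
    error = exc.errno

print(error == errno.EBADF)
```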
</div>
<div class="section" id="python-3-9-checks-encoding-and-errors">
<h2>Python 3.9 checks encoding and errors</h2>
<p>In June 2019, my colleague <strong>Miro Hrončok</strong> reported <a class="reference external" href="https://bugs.python.org/issue37388">bpo-37388</a>:</p>
<blockquote>
<p>I was just bit by specifying an nonexisitng error handler for
bytes.decode() without noticing.</p>
<p>Consider this code:</p>
<pre class="literal-block">
>>> 'a'.encode('cp1250').decode('utf-8', errors='Boom, Shaka Laka, Boom!')
'a'
</pre>
</blockquote>
<p>I modified the development mode in Python 3.9, to also check <em>encoding</em> and
<em>errors</em> arguments on string encoding and decoding operations, like
<tt class="docutils literal">bytes.decode()</tt> or <tt class="docutils literal">str.encode()</tt>.</p>
<p>By default, for best performance, the <em>errors</em> argument is only checked at the
first encoding/decoding error and the <em>encoding</em> argument is sometimes ignored
for empty strings.</p>
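<p>A quick sketch of the difference: decoding pure ASCII never reaches the error
handler, so a bogus <em>errors</em> name normally goes unnoticed, while development
mode rejects it eagerly (assumes Python 3.9 or newer):</p>

```python
import subprocess
import sys

# Without development mode, the bogus errors= handler is never looked up
# because no decoding error occurs...
code = "b'a'.decode('utf-8', errors='Boom, Shaka Laka, Boom!')"
default = subprocess.run([sys.executable, "-c", code],
                         capture_output=True, text=True)

# ... but in development mode, the handler name is checked eagerly.
dev = subprocess.run([sys.executable, "-X", "dev", "-c", code],
                     capture_output=True, text=True)

print(default.returncode, dev.returncode)
```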
<p>Having an opt-in development mode makes it possible to enable additional debug
checks at runtime, without having to worry too much about the performance
overhead.</p>
<p>Note: I love the choice of the example, "Boom, Shaka Laka, Boom!"
from the game Gruntz :-D</p>
</div>
<div class="section" id="development-mode-example">
<h2>Development Mode Example</h2>
<p>Even in the <tt class="docutils literal">__main__</tt> module with PEP 565, <tt class="docutils literal">ResourceWarning</tt> is still not
displayed by default (PEP 565 only shows <tt class="docutils literal">DeprecationWarning</tt>):</p>
<pre class="literal-block">
$ python3 -c 'print(len(open("README.rst").readlines()))'
39
</pre>
<p>The development mode shows the warning:</p>
<pre class="literal-block">
$ python3 -X dev -c 'print(len(open("README.rst").readlines()))'
-c:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='README.rst' mode='r' encoding='UTF-8'>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
39
</pre>
<p>Not closing a resource explicitly can leave a resource open for way longer than
expected. It can cause severe issues at Python exit. It is bad in CPython, but
it is even worse in PyPy. <strong>Closing resources explicitly makes an application
more deterministic and more reliable.</strong></p>
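<p>The <tt class="docutils literal">ResourceWarning</tt> emitted for a leaked file can also be captured
programmatically; a small sketch using a temporary file:</p>

```python
import gc
import os
import tempfile
import warnings

fd, path = tempfile.mkstemp()
os.close(fd)

with warnings.catch_warnings(record=True) as log:
    warnings.simplefilter("always", ResourceWarning)
    open(path)          # leaked: the file object is never closed explicitly
    gc.collect()        # CPython already ran the destructor; needed on PyPy

os.unlink(path)
leaked = any(issubclass(w.category, ResourceWarning) for w in log)
print(leaked)
```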
<p>If one of the development mode effects causes an issue, it is still possible to
override most options. For example,
<tt class="docutils literal">PYTHONMALLOC=default python3 <span class="pre">-X</span> dev ...</tt> command enables the development
mode without installing debug hooks on memory allocators.</p>
</div>
Pass the Python thread state explicitly2020-01-08T15:00:00+01:002020-01-08T15:00:00+01:00Victor Stinnertag:vstinner.github.io,2020-01-08:/cpython-pass-tstate.html<img alt="Python C API" src="https://vstinner.github.io/images/capi.jpg" />
<div class="section" id="keeping-python-competitive">
<h2>Keeping Python competitive</h2>
<p>I have been trying to find ways to make Python more efficient for many years, see for
example my discussion at the Language Summit during Pycon US 2017: <a class="reference external" href="https://lwn.net/Articles/723949/">Keeping
Python competitive</a> (LWN article); <a class="reference external" href="https://github.com/vstinner/talks/blob/master/2017-PyconUS/summit.pdf">slides</a>.
At EuroPython 2019 (Basel), I gave the keynote "Python Performance: Past,
Present and Future": <a class="reference external" href="https://github.com/vstinner/talks/blob/master/2019-EuroPython/python_performance.pdf">slides …</a></p></div><img alt="Python C API" src="https://vstinner.github.io/images/capi.jpg" />
<div class="section" id="keeping-python-competitive">
<h2>Keeping Python competitive</h2>
<p>I have been trying to find ways to make Python more efficient for many years, see for
example my discussion at the Language Summit during Pycon US 2017: <a class="reference external" href="https://lwn.net/Articles/723949/">Keeping
Python competitive</a> (LWN article); <a class="reference external" href="https://github.com/vstinner/talks/blob/master/2017-PyconUS/summit.pdf">slides</a>.
At EuroPython 2019 (Basel), I gave the keynote "Python Performance: Past,
Present and Future": <a class="reference external" href="https://github.com/vstinner/talks/blob/master/2019-EuroPython/python_performance.pdf">slides</a>
and <a class="reference external" href="https://www.youtube.com/watch?v=T6vC_LOHBJ4&feature=youtu.be&t=1875">video</a>. I
gave my vision on the Python performance and listed 3 projects to speedup
Python that I consider as realistic:</p>
<ul class="simple">
<li>subinterpreters: see Eric Snow's <a class="reference external" href="https://github.com/ericsnowcurrently/multi-core-python/">multi-core-python</a> project</li>
<li>better C API: see <a class="reference external" href="https://github.com/pyhandle/hpy">HPy (new C API)</a>
and <a class="reference external" href="https://pythoncapi.readthedocs.io/">pythoncapi.readthedocs.io</a></li>
<li>tracing garbage collector for CPython</li>
</ul>
<p>This article is about <strong>subinterpreters</strong>.</p>
</div>
<div class="section" id="subinterpreters">
<h2>Subinterpreters</h2>
<p>Eric Snow has been working on subinterpreters since 2015; see his first blog post
published in September 2016: <a class="reference external" href="http://ericsnowcurrently.blogspot.com/2016/09/solving-mutli-core-python.html">Solving Multi-Core Python</a>.
See Eric Snow's <a class="reference external" href="https://github.com/ericsnowcurrently/multi-core-python/wiki">multi-core-python project wiki</a> for the whole
history.</p>
<p>In September 2017, he wrote a concrete proposal: <a class="reference external" href="https://www.python.org/dev/peps/pep-0554/">PEP 554: Multiple
Interpreters in the Stdlib</a>.</p>
<p>Eric mentions the <a class="reference external" href="https://www.python.org/dev/peps/pep-0432/">PEP 432: Simplifying the CPython startup sequence</a> as one blocker issue. I fixed
this issue (at least for the subinterpreters case) with my <a class="reference external" href="https://www.python.org/dev/peps/pep-0587/">PEP 587: Python
Initialization Configuration</a> that
I implemented in Python 3.8.</p>
<p>Sadly, implementing subinterpreters in the 30-year-old CPython project is hard
since a lot of code has to be updated. CPython is made of no less than <strong>603K
lines of C code</strong> (and 815K lines of Python code)!</p>
<p>In May 2018, at CPython sprint during Pycon US, I discussed subinterpreters
with Eric Snow and Nick Coghlan. I drew an overview of Python internals and the
different "states" on a whiteboard:</p>
<img alt="Python states" src="https://vstinner.github.io/images/subinterpreters2.jpg" />
<p>Python and Python subinterpreter lifecycles (creation and finalization):</p>
<img alt="Python subinterpreter lifecycle" src="https://vstinner.github.io/images/subinterpreters1.jpg" />
<p>As a follow-up of this meeting, I wrote down the current state and what should
be done: <a class="reference external" href="https://pythoncapi.readthedocs.io/runtime.html">Reorganize Python “runtime”</a>.</p>
</div>
<div class="section" id="getting-the-current-python-thread-state">
<h2>Getting the current Python thread state</h2>
<p>In the current master branch of Python, getting the current Python thread state
is done using these two macros:</p>
<pre class="literal-block">
#define _PyRuntimeState_GetThreadState(runtime) \
((PyThreadState*)_Py_atomic_load_relaxed(&(runtime)->gilstate.tstate_current))
#define _PyThreadState_GET() _PyRuntimeState_GetThreadState(&_PyRuntime)
</pre>
<p>These macros depend on the global <tt class="docutils literal">_PyRuntime</tt> variable: instance of the
<tt class="docutils literal">_PyRuntimeState</tt> structure. There is exactly one instance of
<tt class="docutils literal">_PyRuntimeState</tt>: data shared by all interpreters on purpose (more info
about <tt class="docutils literal">_PyRuntimeState</tt> below).</p>
<p><tt class="docutils literal">_Py_atomic_load_relaxed()</tt> uses an atomic operation which may become a
performance issue if Python is modified to get the Python thread state in more
places. I tried to check if it uses a slow atomic read instruction, but it
seems like only a write uses an explicit memory fence operation: read seems to
be "free" (it's a regular efficient <tt class="docutils literal">MOV</tt> instruction). I only checked the
x86-64 machine code, it may be different on other architectures.</p>
</div>
<div class="section" id="gil-state">
<h2>GIL state</h2>
<p>Currently, the <tt class="docutils literal">_PyRuntimeState</tt> structure has a <tt class="docutils literal">gilstate</tt> field which is
shared between all subinterpreters. The long term goal of the PEP 554
(subinterpreters) is to <strong>have one GIL per subinterpreter</strong> to <strong>execute
multiple interpreters in parallel</strong>. Currently, only one interpreter can be
executed at the same time: there is no parallelism, except when a thread releases
the GIL, which is not the common case.</p>
<p>It's tracked by these two issues:</p>
<ul class="simple">
<li><a class="reference external" href="https://bugs.python.org/issue10915">Make the PyGILState API compatible with multiple interpreters</a></li>
<li><a class="reference external" href="https://bugs.python.org/issue15751">Support subinterpreters in the GIL state API</a></li>
</ul>
<p>I expect that fixing this issue may require adding a lock somewhere which <strong>can
hurt performance</strong>, depending on how the GIL state is accessed.</p>
</div>
<div class="section" id="passing-a-state-to-internal-function-calls">
<h2>Passing a state to internal function calls</h2>
<p>To avoid any risk of performance penalty with upcoming Python internal changes
for subinterpreters, but also to make things more explicit, I proposed to
<strong>pass explicitly "a state" to internal C function calls</strong>.</p>
<p>First, it wasn't obvious which "state" should be passed: <tt class="docutils literal">_PyRuntimeState</tt>,
<tt class="docutils literal">PyThreadState</tt>, a structure containing both, or something else?</p>
<p>Moreover, it was unclear how to get the runtime from <tt class="docutils literal">PyThreadState</tt>, and how
to get <tt class="docutils literal">PyThreadState</tt> from the runtime.</p>
<p>I started to <strong>pass runtime to some functions</strong> (<tt class="docutils literal">_PyRuntimeState</tt>): <a class="reference external" href="https://bugs.python.org/issue36710">Pass
_PyRuntimeState as an argument rather than using the _PyRuntime global variable</a>.</p>
<p>Then I pushed more changes to <strong>pass tstate to some other functions</strong>
(<tt class="docutils literal">PyThreadState</tt>): <a class="reference external" href="https://bugs.python.org/issue38644">Pass explicitly tstate to function calls</a>.</p>
<p>I added <tt class="docutils literal">PyInterpreterState.runtime</tt> so getting <tt class="docutils literal">_PyRuntimeState</tt> from
<tt class="docutils literal">PyThreadState</tt> is now done using: <tt class="docutils literal"><span class="pre">tstate->interp->runtime</span></tt>. It's no
longer needed to pass <tt class="docutils literal">runtime</tt> <strong>and</strong> <tt class="docutils literal">tstate</tt> to internal functions:
<tt class="docutils literal">tstate</tt> is enough.</p>
<p>Slowly, I modified the internals to only pass <tt class="docutils literal">tstate</tt> to internal functions:
<strong>tstate should become the root object to access all Python states</strong>.</p>
<p>I ended with a thread on the python-dev mailing list to summarize this work:
<a class="reference external" href="https://mail.python.org/archives/list/python-dev@python.org/thread/PQBGECVGVYFTVDLBYURLCXA3T7IPEHHO/#Q4IPXMQIM5YRLZLHADUGSUT4ZLXQ6MYY">Pass the Python thread state to internal C functions</a>.
The feedback was quite positive, most core developers agreed that passing
explicitly tstate is a good practice and the work should be continued.</p>
</div>
<div class="section" id="pyruntimestate-and-pyinterpreterstate">
<h2>_PyRuntimeState and PyInterpreterState</h2>
<p>Currently, some <tt class="docutils literal">_PyRuntimeState</tt> fields are shared by all interpreters,
whereas they should be moved into <tt class="docutils literal">PyInterpreterState</tt>: it's still a work in
progress.</p>
<p>For example, I continued the work started by Eric Snow to move the garbage
collector state from <tt class="docutils literal">_PyRuntimeState</tt> to <tt class="docutils literal">PyInterpreterState</tt>: <a class="reference external" href="https://bugs.python.org/issue36854">GC
operates out of global runtime state</a>.
<p>As explained above, another example is <tt class="docutils literal">gilstate</tt> that should also be moved
to <tt class="docutils literal">PyInterpreterState</tt>, but that's a complex change that should be well
prepared to not break anything.</p>
</div>
<div class="section" id="more-subinterpreter-work">
<h2>More subinterpreter work</h2>
<p>Implementing subinterpreters also requires cleaning up various parts of Python
internals.</p>
<p>For example, I modified Python so Py_NewInterpreter() and Py_EndInterpreter()
(create and finalize a subinterpreter) share more code with Py_Initialize()
and Py_Finalize() (create and finalize the <strong>main</strong> interpreter):
<a class="reference external" href="https://bugs.python.org/issue38858">new_interpreter() should reuse more Py_InitializeFromConfig() code</a>.</p>
<p>There are still many issues to fix: <strong>it's moving slowly but steadily!</strong></p>
</div>
Graphics bugs in Firefox and GNOME2019-10-10T17:00:00+02:002019-10-10T17:00:00+02:00Victor Stinnertag:vstinner.github.io,2019-10-10:/graphics-bugs-firefox-gnome.html<p>After explaining how to <a class="reference external" href="https://vstinner.github.io/debug-hybrid-graphics-issues-linux.html">Debug Hybrid Graphics issues on Linux</a>, here is the story of four graphics bugs
that I had in GNOME and Firefox on my Fedora 30 between May 2018 and September
2019: bugs in gnome-shell, Gtk, Firefox and mutter.</p>
<a class="reference external image-reference" href="https://www.flickr.com/photos/34298393@N06/14488759356/"><img alt="Glitch" src="https://vstinner.github.io/images/glitch.jpg" /></a>
<div class="section" id="gnome-shell-freezes">
<h2>gnome-shell freezes</h2>
<p>In May 2018, six months after …</p></div><p>After explaining how to <a class="reference external" href="https://vstinner.github.io/debug-hybrid-graphics-issues-linux.html">Debug Hybrid Graphics issues on Linux</a>, here is the story of four graphics bugs
that I had in GNOME and Firefox on my Fedora 30 between May 2018 and September
2019: bugs in gnome-shell, Gtk, Firefox and mutter.</p>
<a class="reference external image-reference" href="https://www.flickr.com/photos/34298393@N06/14488759356/"><img alt="Glitch" src="https://vstinner.github.io/images/glitch.jpg" /></a>
<div class="section" id="gnome-shell-freezes">
<h2>gnome-shell freezes</h2>
<p>In May 2018, six months after I got my Lenovo P50 laptop, gnome-shell was
"sometimes" freezing between 1 and 5 seconds. It was annoying because
keystrokes created repeated keys, writing "helloooooooooooooooooooooo" instead of
"hello" for example.</p>
<p>My colleagues led me to <tt class="docutils literal"><span class="pre">#fedora-desktop</span></tt> on the GIMP IRC server where I met
my colleague <strong>Jonas Ådahl</strong> (jadahl) who almost immediately identified my
issue! Extract of the IRC chat:</p>
<pre class="literal-block">
15:03 <vstinner> hello. i upgraded from F27 to F28, and it seems like I
switched from Xorg to Wayland. sometimes, the desktop hangs a few
milliseconds (less than 2 secondes)
15:03 <vstinner> bentiss told me that "libinput error: client bug: timer
event7 keyboard: offset negative (-39ms)" can occur when shell is too
slow
15:04 <vstinner> journalctl shows me frenquently the bug
https://gitlab.gnome.org/GNOME/gnome-shell/issues/1 "Object
Shell.GenericContainer (0x559e6bfddc60), has been already finalized.
Impossible to get any property from it."
15:04 <vstinner> i also get "Window manager warning: last_user_time
(3093467) is greater than comparison timestamp (3093466). This most
likely represents a buggy client sending inaccurate timestamps in
messages such as _NET_ACTIVE_WINDOW. Trying to work around..." errors
in logs (from shell)
15:05 <vstinner> bentiss: ah, i also get "libinput error: client bug: timer
event7 trackpoint: offset negative (-352ms)" errors
15:06 <vstinner> it's a recent laptop, Lenovo P50: 32 GB of RAM, 4 physical
CPUs (8 threads) Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
15:06 <vstinner> so. what can i do to debug such performance issue? may it
come from shell? what does it mean if shell is slow? can it be a GPU
issue? a javascript issue?
...
15:13 <jadahl> vstinner: whats your hardware? Do you have a hybrid gpu
system?
15:13 <jadahl> ah, yes P50
15:14 <jadahl> vstinner: there is a branch on mutter upstream that fixes
that issue. want to compile it to test?
</pre>
<p>Ten minutes after I asked my question, Jonas asked the right question: <strong>Do you
have a hybrid gpu system?</strong></p>
<p>I was able to workaround the issue by connecting my laptop to my TV using the
HDMI port:</p>
<pre class="literal-block">
15:22 < jadahl> for example, IIRC if you have a monitor connected to the
HDMI, the issue will go away since the secondary GPU is always awake
anyway
...
15:31 < vstinner> jadahl: i plugged a HDMI cable to my TV and it seems like
the issue is gone
15:31 < vstinner> jadahl: impressive
</pre>
<p>When an external monitor is used (like a TV plugged on the HDMI port), my
NVIDIA GPU is always active which works around the bug I had in gnome-shell.</p>
<p>Jonas provided me a RPM package for Fedora including his work-in-progress fix:
<a class="reference external" href="https://gitlab.gnome.org/GNOME/mutter/merge_requests/106">Upload HW cursor sprite on-demand</a>. I confirmed that
this change fixed my bug. His mutter change has been merged upstream.</p>
</div>
<div class="section" id="firefox-crash-when-selecting-text">
<h2>Firefox crash when selecting text</h2>
<p>In March 2019, Firefox with Wayland crashed on <tt class="docutils literal">wl_abort()</tt> when selecting
more than 4000 characters in a <tt class="docutils literal">&lt;textarea&gt;</tt>. I found the bug in Gmail when
selecting the whole email text to remove it. Pressing <strong>CTRL + A</strong> or
Right-click + Select All <strong>crashed the whole Firefox process!</strong></p>
<p>I reported the bug to Firefox: <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=1539773">Firefox with Wayland crash on wl_abort() when
selecting more than 4000 characters in a &lt;textarea&gt;</a>.</p>
<p>Running gdb on Firefox caused me some trouble since it's a very large binary with
many libraries. I also read <a class="reference external" href="https://cgit.freedesktop.org/wayland/wayland-protocols/tree/unstable/text-input/text-input-unstable-v3.xml#n138">Wayland protocol specifications</a>.
I managed to analyze the bug and so I reported the bug to Gtk as well, <a class="reference external" href="https://gitlab.gnome.org/GNOME/gtk/issues/1783">On
Wayland, notify_surrounding_text() crash on wl_abort() if text is longer than
4000 bytes</a>:</p>
<blockquote>
According to gdb, <tt class="docutils literal">wl_proxy_marshal_array_constructor_versioned()</tt> calls
<tt class="docutils literal">wl_abort()</tt> because the buffer is too short. It seems like
<tt class="docutils literal">wl_buffer_put()</tt> fails with <tt class="docutils literal">E2BIG</tt>.</blockquote>
<p>Quickly, I identified that <strong>my Gtk bug has already been fixed 3 months before
by Carlos Garnacho</strong> (<a class="reference external" href="https://gitlab.gnome.org/GNOME/gtk/merge_requests/438">imwayland: Respect maximum length of 4000 Bytes on
strings being sent</a>)
and <strong>the fix is part of gtk-3.24.3</strong> ("wayland: Respect length limits in text
protocol" says "Overview of Changes in GTK+ 3.24.3").</p>
<p>I requested to upgrade Gtk in Fedora. But it was not possible since the newer
version changed the theme. I was asked to cherry-pick the fix and that's what I
did: <a class="reference external" href="https://src.fedoraproject.org/rpms/gtk3/pull-request/5">imwayland: Respect maximum length of 4000 Bytes on strings</a>.</p>
<p>My PR was merged and a new package was built. I tested it and confirmed that it
fixed the crash: <a class="reference external" href="https://bodhi.fedoraproject.org/updates/FEDORA-2019-d67ec97b0b">FEDORA-2019-d67ec97b0b</a>. Soon, the
package was pushed to the public Fedora package repository.</p>
<p><strong>That's the cool part about open source: if you have the skills to hack the
code, you can fix an annoying bug which is affecting you!</strong></p>
</div>
<div class="section" id="firefox-wayland-window-partially-or-not-updated-when-switching-between-two-tabs">
<h2>Firefox: [Wayland] Window partially or not updated when switching between two tabs</h2>
<div class="section" id="analyze-the-bug">
<h3>Analyze the bug</h3>
<p>In September 2019, after a large system upgrade (install 6 packages, upgrade
234 packages, remove 5 packages), Firefox sometimes stopped updating the window
content when I switched from one tab to another. Example:</p>
<img alt="Firefox bug of window partially updated" src="https://vstinner.github.io/images/firefox_bug_1.jpg" />
<p>It took me a few hours to analyze the bug to be able to produce a useful bug
report.</p>
<p>I followed the advice in Fedora's guide <a class="reference external" href="https://fedoraproject.org/wiki/How_to_debug_Firefox_problems">How to debug Firefox problems</a>.</p>
<p>First, I tried to <strong>understand which GPU driver is used</strong>. I ended up
blacklisting the nouveau driver in the Linux kernel, to ensure that Firefox was
using my Intel IGP. I still reproduced the bug.</p>
<p>I <strong>disabled all Firefox extensions</strong>: bug reproduced.</p>
<p>Then I created a new Firefox profile and started Firefox in <strong>safe mode</strong>: bug
reproduced.</p>
<p>I tested the latest Firefox binary from mozilla.org (Firefox 69.0): bug
reproduced.</p>
<p>Finally, <strong>I tested Firefox Nightly</strong> from mozilla.org (Firefox 71.0a1): bug
reproduced.</p>
<p>Ok, it was enough data to produce an interesting bug report. I reported
<a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=1580152">[Wayland] Window partially or not updated when switching between two tabs</a> to Firefox.</p>
</div>
<div class="section" id="identify-the-regression-using-fedora-packages">
<h3>Identify the regression using Fedora packages</h3>
<p>Then I looked at <tt class="docutils literal">/var/log/dnf.log</tt> and I tried to identify which package
update could explain the regression.</p>
<p>I downgraded <strong>gtk3</strong>-3.24.11-1.fc30.x86_64 to gtk3-3.24.10-1.fc30.x86_64: bug
reproduced.</p>
<p>I rebooted on the oldest available <strong>Linux kernel</strong>, version 5.2.8-200.fc30.x86_64:
bug reproduced. I checked the journalctl logs to see which Linux version I was
running when the bug was first seen: Linux 5.2.9-200.fc30.x86_64.</p>
<p>I don't know why, but <strong>downgrading Firefox was only my 3rd test</strong>.</p>
<p>I downgraded firefox-69.0-2.fc30.x86_64 to firefox-68.0.2-1.fc30.x86_64: the
bug was gone! Ok, so <strong>the regression came from the Firefox package</strong>, and it
was introduced between package versions 68.0.2-1.fc30 and 69.0-2.fc30.</p>
<p>On IRC, I met my colleague <strong>Martin Stránský</strong> who packages Firefox for Fedora.
He told me that he was aware of my bug and might have a fix for it. Great!</p>
<p>Only 9 days later, <strong>Martin Stránský</strong>'s fix was merged in Firefox upstream,
released in Firefox Nightly, and a new package was shipped in Fedora 30!
Thanks Martin for your efficiency!</p>
<p>The final Firefox change is quite large and intrusive: <a class="reference external" href="https://hg.mozilla.org/releases/mozilla-beta/rev/3281a617f22b">[Wayland] Fix rendering
glitches on wayland</a>.</p>
</div>
</div>
<div class="section" id="xwayland-crash-in-xwl-glamor-gbm-create-pixmap">
<h2>Xwayland crash in xwl_glamor_gbm_create_pixmap()</h2>
<p>In September 2019, while I was debugging the previous Firefox bug, I started my
IRC client hexchat. Suddenly, <strong>Xwayland crashed, which closed my whole GNOME
session</strong>! I was testing various GPU configurations to analyze the Firefox
bug.</p>
<p>ABRT managed to rebuild a useless traceback, but it identified an existing bug
report. It added my comment to the <a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1729200#c20">[abrt] xorg-x11-server-Xwayland:
OsLookupColor(): Segmentation fault at address 0x28</a> report.</p>
<p>On July 26, 2019 (1 month before I got the bug), <strong>Olivier Fourdan</strong> added <a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1729200#c9">an
interesting comment</a>:</p>
<blockquote>
<tt class="docutils literal">glamor_get_modifiers+0x767</tt> is <tt class="docutils literal">xwl_glamor_gbm_create_pixmap()</tt> so this
is the same as <a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1729925">bug 1729925</a> fixed upstream with
<a class="reference external" href="https://gitlab.freedesktop.org/xorg/xserver/merge_requests/242">xwayland: Do not free a NULL GBM bo</a>.</blockquote>
<p>So in fact, my bug was already fixed by <strong>Olivier Fourdan</strong> in Xwayland
upstream, but the fix hadn't landed in Fedora yet.</p>
</div>
<div class="section" id="thanks">
<h2>Thanks!</h2>
<p>I would like to thank the following developers who fixed my Fedora 30. What a
coincidence: all four are my colleagues! It seems like Red Hat is investing in
the Linux desktop :-)</p>
<p><a class="reference external" href="https://blogs.gnome.org/carlosg/">Carlos Garnacho</a> (Red Hat).</p>
<a class="reference external image-reference" href="https://www.flickr.com/photos/183829480@N06/48623543091/in/pool-14662216@N23/"><img alt="Carlos Garnacho" src="https://vstinner.github.io/images/carlos_garnacho.jpg" /></a>
<p><a class="reference external" href="https://gitlab.gnome.org/jadahl">Jonas Ådahl</a> (Red Hat).</p>
<a class="reference external image-reference" href="https://www.flickr.com/photos/183829480@N06/48623189663/in/pool-14662216@N23/"><img alt="Jonas Ådahl" src="https://vstinner.github.io/images/jonas_adahl.jpg" /></a>
<p><a class="reference external" href="http://people.redhat.com/stransky/">Martin Stránský</a> (Red Hat).</p>
<a class="reference external image-reference" href="http://people.redhat.com/stransky/"><img alt="Martin Stránský" src="https://vstinner.github.io/images/mstransky.jpg" /></a>
<p><a class="reference external" href="https://en.wikipedia.org/wiki/Olivier_Fourdan">Olivier Fourdan</a> (Red Hat).</p>
<a class="reference external image-reference" href="https://en.wikipedia.org/wiki/Olivier_Fourdan"><img alt="Olivier Fourdan" src="https://vstinner.github.io/images/olivier_fourdan.jpg" /></a>
</div>
Debug Hybrid Graphics issues on Linux2019-09-11T15:50:00+02:002019-09-11T15:50:00+02:00Victor Stinnertag:vstinner.github.io,2019-09-11:/debug-hybrid-graphics-issues-linux.html<p><a class="reference external" href="https://wiki.archlinux.org/index.php/Hybrid_graphics">Hybrid Graphics</a> is a
complex hardware and software solution to achieve longer laptop battery life:
an <strong>integrated</strong> graphics device is used by default, and a <strong>discrete</strong>
graphics device with higher graphics performance is enabled on demand.</p>
<a class="reference external image-reference" href="https://www.theregister.co.uk/2010/02/09/inside_nvidia_optimus/"><img alt="Hybrid Graphics" src="https://vstinner.github.io/images/hybrid_graphics.jpg" /></a>
<p>If it is designed and implemented carefully, users should not notice that a
laptop …</p><p><a class="reference external" href="https://wiki.archlinux.org/index.php/Hybrid_graphics">Hybrid Graphics</a> is a
complex hardware and software solution to achieve longer laptop battery life:
an <strong>integrated</strong> graphics device is used by default, and a <strong>discrete</strong>
graphics device with higher graphics performance is enabled on demand.</p>
<a class="reference external image-reference" href="https://www.theregister.co.uk/2010/02/09/inside_nvidia_optimus/"><img alt="Hybrid Graphics" src="https://vstinner.github.io/images/hybrid_graphics.jpg" /></a>
<p>If it is designed and implemented carefully, users should not notice that a
laptop has two graphical devices.</p>
<p>Sadly, the Linux implementation is not perfect yet. I had to debug different
graphics issues on GNOME last months, so I decided to write down an article
about this technology.</p>
<p>This article is about the <strong>GNOME</strong> desktop environment with <strong>Wayland</strong>
running on <strong>Fedora</strong> 30, with the Linux kernel <strong>vgaswitcheroo</strong> in muxless mode
(more about that below).</p>
<div class="section" id="hybrid-graphics-1">
<h2>Hybrid Graphics</h2>
<p>Hybrid Graphics are known under different names:</p>
<ul class="simple">
<li>Linux kernel <a class="reference external" href="https://www.kernel.org/doc/html/latest/gpu/vga-switcheroo.html">vgaswitcheroo</a></li>
<li><a class="reference external" href="https://wiki.archlinux.org/index.php/PRIME">PRIME</a> in Linux open source
GPU drivers (nouveau, ati, amdgpu and intel), the "muxless" flavor of hybrid graphics</li>
<li><a class="reference external" href="https://wiki.archlinux.org/index.php/bumblebee">Bumblebee</a>:
<a class="reference external" href="https://wiki.archlinux.org/index.php/NVIDIA_Optimus">NVIDIA Optimus</a>
for Linux</li>
<li>"AMD Dynamic Switchable Graphics" for Radeon</li>
<li>"Dual GPUs"</li>
<li>etc.</li>
</ul>
<p>Nowadays, most manufacturers use the <strong>muxless</strong> model:</p>
<blockquote>
Dual GPUs but <strong>only one of them is connected to outputs</strong>. The other one
is merely used to <strong>offload rendering</strong>, its results are copied over PCIe
into the framebuffer. On Linux this is supported with DRI PRIME.</blockquote>
<p>In 2010, the first generation hybrid model used the <strong>muxed</strong> model:</p>
<blockquote>
Dual GPUs with a hardware multiplexer chip to switch outputs between GPUs.
This model makes the user choose (at boot time or at login time) between
the two power/graphics profiles and is almost fixed throughout the user
session.</blockquote>
<p>Note: The development to support hybrid graphics in Linux started in 2010.</p>
</div>
<div class="section" id="does-my-linux-have-hybrid-graphics">
<h2>Does my Linux have Hybrid Graphics?</h2>
<p>On Linux, Hybrid Graphics is used if the <tt class="docutils literal">/sys/kernel/debug/vgaswitcheroo/</tt>
directory exists.</p>
<p>No Hybrid Graphics, single graphics device:</p>
<pre class="literal-block">
$ sudo cat /sys/kernel/debug/vgaswitcheroo/switch
cat: /sys/kernel/debug/vgaswitcheroo/switch: No such file or directory
</pre>
<p>Hybrid Graphics with two graphics devices:</p>
<pre class="literal-block">
$ sudo cat /sys/kernel/debug/vgaswitcheroo/switch
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :DynOff:0000:01:00.0
</pre>
<p>Command to list graphics devices:</p>
<pre class="literal-block">
$ lspci|grep VGA
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M1000M] (rev a2)
</pre>
</div>
<div class="section" id="hardware">
<h2>Hardware</h2>
<p>My employer gave me a Lenovo P50 laptop for work in December 2017. It is my only
computer at home, so I needed a powerful laptop (even if it's heavy for
traveling to conferences). The CPU, RAM and battery are great, but the hybrid
graphics caused me some headaches.</p>
<p>My Lenovo P50 has two GPUs:</p>
<pre class="literal-block">
$ lspci|grep VGA
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M1000M] (rev a2)
</pre>
<ul class="simple">
<li>The <strong>Integrated Graphics Device</strong> is an <strong>Intel</strong> IGP (Intel HD Graphics 530)</li>
<li>The <strong>Discrete Graphics Device</strong> is a <strong>NVIDIA</strong> GPU (NVIDIA Quadro M1000M)</li>
</ul>
<p>I didn't know that the laptop had two graphics devices when I chose the
laptop model. I discovered hybrid graphics when I started to debug graphics
issues.</p>
</div>
<div class="section" id="bios">
<h2>BIOS</h2>
<p>Hybrid graphics can be configured in the BIOS:</p>
<ul class="simple">
<li><strong>Discrete Graphics mode</strong> will achieve higher graphics performances.</li>
<li><strong>Hybrid Graphics mode</strong> (default) runs as Integrated Graphics mode to
achieve longer battery life, and Discrete Graphics is enabled on demand.</li>
</ul>
<p>On my Lenovo P50, using the <strong>Discrete Graphics mode</strong> removes "00:02.0 VGA
compatible controller: Intel Corporation HD Graphics 530" from <tt class="docutils literal">lspci</tt>
command output: the <strong>Intel IGP is fully disabled</strong>. The Linux kernel only
sees the NVIDIA GPU.</p>
</div>
<div class="section" id="linux-kernel">
<h2>Linux kernel</h2>
<p>On Linux, hybrid graphics is handled by <strong>vgaswitcheroo</strong>:</p>
<pre class="literal-block">
$ sudo cat /sys/kernel/debug/vgaswitcheroo/switch
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :DynPwr:0000:01:00.0
</pre>
<ul class="simple">
<li><tt class="docutils literal">IGD</tt> stands for <strong>Integrated</strong> Graphics Device</li>
<li><tt class="docutils literal">DIS</tt> stands for <strong>DIScrete</strong> Graphics Device</li>
<li>"+" marks the <strong>active</strong> card</li>
<li><tt class="docutils literal">Pwr</tt>: the graphics device is <strong>always active</strong></li>
<li><tt class="docutils literal">DynPwr</tt>: the graphics device is activated <strong>on demand</strong></li>
</ul>
<p>The last field (ex: <tt class="docutils literal">0000:00:02.0</tt>) is based on the PCI identifier:</p>
<pre class="literal-block">
$ lspci|grep VGA
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M1000M] (rev a2)
</pre>
<p>On my laptop, hybrid graphics is detected by an <a class="reference external" href="https://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface">ACPI</a>
"Device-Specific Method" (DSM):</p>
<pre class="literal-block">
$ journalctl -b -k|grep 'VGA switcheroo'
Sep 11 02:29:54 apu kernel: VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
</pre>
<p>See: <a class="reference external" href="https://www.kernel.org/doc/html/latest/gpu/vga-switcheroo.html">VGA Switcheroo (Linux kernel documentation)</a>.</p>
</div>
<div class="section" id="opengl">
<h2>OpenGL</h2>
<p><a class="reference external" href="https://en.wikipedia.org/wiki/Mesa_(computer_graphics)">Mesa</a> provides
<tt class="docutils literal">glxinfo</tt> utility to get information about the OpenGL driver currently used:</p>
<pre class="literal-block">
$ glxinfo|grep -E 'Device|direct rendering'
direct rendering: Yes
Device: Mesa DRI Intel(R) HD Graphics 530 (Skylake GT2) (0x191b)
</pre>
<p>In this example, the integrated Intel IGP is used.</p>
<p>In Firefox, go to <strong>about:support</strong> page and search for the <tt class="docutils literal">Graphics</tt>
section to get information about compositing, WebGL, GPU, etc.</p>
</div>
<div class="section" id="dri-prime-environment-variable">
<h2>DRI_PRIME environment variable</h2>
<p>Set DRI_PRIME=1 environment variable to run an application with the
<strong>discrete</strong> GPU.</p>
<p>Example:</p>
<pre class="literal-block">
$ DRI_PRIME=1 glxinfo|grep -E 'Device|rendering'
direct rendering: Yes
Device: NV117 (0x13b1)
</pre>
</div>
<div class="section" id="switcheroo-control">
<h2>switcheroo-control</h2>
<p><a class="reference external" href="https://github.com/hadess/switcheroo-control">switcheroo-control</a> is a
daemon controlling <tt class="docutils literal">/sys/kernel/debug/vgaswitcheroo/switch</tt> (Linux kernel).
It can be accessed over DBus.</p>
<p>When the daemon starts, it looks for the <tt class="docutils literal">xdg.force_integrated=VALUE</tt> parameter
in the Linux command line. If <em>VALUE</em> is <tt class="docutils literal">1</tt>, <tt class="docutils literal">true</tt> or <tt class="docutils literal">on</tt>, or if
<tt class="docutils literal">xdg.force_integrated=VALUE</tt> is not found in the command line, the daemon
writes <tt class="docutils literal">DIGD</tt> into <tt class="docutils literal">/sys/kernel/debug/vgaswitcheroo/switch</tt> (delayed
<strong>switch to the integrated graphics device</strong>: my Intel IGP).</p>
<p>If <tt class="docutils literal">xdg.force_integrated=0</tt> is found in the command line, the daemon leaves
<tt class="docutils literal">/sys/kernel/debug/vgaswitcheroo/switch</tt> unchanged.</p>
<p>systemd:</p>
<ul class="simple">
<li>Check if the service is running: <tt class="docutils literal">sudo systemctl status <span class="pre">switcheroo-control.service</span></tt></li>
<li>Disable the service: <tt class="docutils literal">sudo systemctl disable <span class="pre">switcheroo-control.service</span></tt>
and <tt class="docutils literal">sudo systemctl stop <span class="pre">switcheroo-control.service</span></tt></li>
</ul>
<p>On Fedora, switcheroo-control is installed by default.</p>
<p>It is unclear to me if this daemon is still useful for my setup. It seems like
the Linux kernel switcheroo uses the integrated Intel IGP by default
anyway.</p>
</div>
<div class="section" id="disable-the-discrete-gpu-by-blacklisting-its-driver">
<h2>Disable the discrete GPU by blacklisting its driver</h2>
<p>To debug graphical bugs, I wanted to ensure that the discrete NVIDIA GPU is
never used.</p>
<p>I found the solution of fully disabling the nouveau driver in the Linux kernel:
add <tt class="docutils literal">modprobe.blacklist=nouveau</tt> to the Linux kernel command line. On Fedora,
you can use:</p>
<pre class="literal-block">
sudo grubby --update-kernel=ALL --args="modprobe.blacklist=nouveau"
</pre>
<p>To reenable nouveau, remove the parameter. On Fedora:</p>
<pre class="literal-block">
sudo grubby --update-kernel=ALL --remove-args="modprobe.blacklist=nouveau"
</pre>
</div>
<div class="section" id="demo">
<h2>Demo!</h2>
<p>For this test, my laptop is not connected to anything (no power cable, no
external monitor, no dock).</p>
<p>When my laptop is idle (no 3D application is running), the NVIDIA GPU is
<strong>suspended</strong>:</p>
<pre class="literal-block">
$ cat /sys/bus/pci/drivers/nouveau/0000\:01\:00.0/enable
0
$ cat /sys/bus/pci/drivers/nouveau/0000\:01\:00.0/power/runtime_status
suspended
</pre>
<p>I explicitly run a 3D application on it:</p>
<pre class="literal-block">
DRI_PRIME=1 glxgears
</pre>
<p>The NVIDIA GPU becomes <strong>active</strong>:</p>
<pre class="literal-block">
$ cat /sys/bus/pci/drivers/nouveau/0000\:01\:00.0/enable
2
$ cat /sys/bus/pci/drivers/nouveau/0000\:01\:00.0/power/runtime_status
active
</pre>
<p>I stop the 3D application. A few seconds later, the NVIDIA GPU is <strong>suspended</strong>
again:</p>
<pre class="literal-block">
$ cat /sys/bus/pci/drivers/nouveau/0000\:01\:00.0/enable
0
$ cat /sys/bus/pci/drivers/nouveau/0000\:01\:00.0/power/runtime_status
suspended
</pre>
</div>
<div class="section" id="graphics-devices-and-monitors">
<h2>Graphics devices and monitors</h2>
<p>When I disabled the nouveau driver using <tt class="docutils literal">modprobe.blacklist=nouveau</tt> kernel
command line parameter, I was no longer able to use external monitors. I
understood that:</p>
<ul class="simple">
<li>The <strong>Intel</strong> IGP is connected to the <strong>internal</strong> laptop screen</li>
<li>The <strong>NVIDIA</strong> GPU is connected to the <strong>external</strong> monitors (DisplayPort
and HDMI ports)</li>
</ul>
<p>When my laptop has <strong>no external monitor</strong> connected, the <strong>discrete</strong> NVIDIA
GPU is <strong>activated on demand</strong> (suspended when idle).</p>
<p>When I connect my laptop to <strong>two external monitors</strong> (using my dock), the
<strong>discrete</strong> NVIDIA GPU is <strong>always active</strong>:</p>
<pre class="literal-block">
$ cat /sys/bus/pci/drivers/nouveau/0000\:01\:00.0/power/runtime_status
active
</pre>
</div>
<div class="section" id="links">
<h2>Links</h2>
<ul class="simple">
<li><a class="reference external" href="https://wiki.archlinux.org/index.php/Hybrid_graphics">https://wiki.archlinux.org/index.php/Hybrid_graphics</a></li>
<li><a class="reference external" href="https://www.kernel.org/doc/html/latest/gpu/vga-switcheroo.html">https://www.kernel.org/doc/html/latest/gpu/vga-switcheroo.html</a></li>
<li><a class="reference external" href="https://wiki.archlinux.org/index.php/PRIME">https://wiki.archlinux.org/index.php/PRIME</a></li>
<li><a class="reference external" href="https://help.ubuntu.com/community/HybridGraphics">https://help.ubuntu.com/community/HybridGraphics</a></li>
<li><a class="reference external" href="https://en.wikipedia.org/wiki/Nvidia_Optimus">https://en.wikipedia.org/wiki/Nvidia_Optimus</a></li>
<li><a class="reference external" href="https://en.wikipedia.org/wiki/AMD_Hybrid_Graphics">https://en.wikipedia.org/wiki/AMD_Hybrid_Graphics</a></li>
<li><a class="reference external" href="https://nouveau.freedesktop.org/wiki/Optimus">https://nouveau.freedesktop.org/wiki/Optimus</a></li>
</ul>
</div>
Split Include/ directory in Python 3.82019-06-19T12:00:00+02:002019-06-19T12:00:00+02:00Victor Stinnertag:vstinner.github.io,2019-06-19:/split-include-directory-python38.html<a class="reference external image-reference" href="https://www.flickr.com/photos/mortengade/2747989334/"><img alt="Private way. Trespassers and those disposing rubbish will be prosecuted." src="https://vstinner.github.io/images/private_way.jpg" /></a>
<p>In September 2017, during the CPython sprint at Facebook, I proposed my
idea to create <a class="reference external" href="https://vstinner.github.io/new-python-c-api.html">A New C API for CPython</a>.
I'm still working on the Python C API at: <a class="reference external" href="http://pythoncapi.readthedocs.io/">pythoncapi.readthedocs.io</a>.</p>
<p>My analysis is that the C API leaks too many implementation details which
prevent optimizing Python …</p><a class="reference external image-reference" href="https://www.flickr.com/photos/mortengade/2747989334/"><img alt="Private way. Trespassers and those disposing rubbish will be prosecuted." src="https://vstinner.github.io/images/private_way.jpg" /></a>
<p>In September 2017, during the CPython sprint at Facebook, I proposed my
idea to create <a class="reference external" href="https://vstinner.github.io/new-python-c-api.html">A New C API for CPython</a>.
I'm still working on the Python C API at: <a class="reference external" href="http://pythoncapi.readthedocs.io/">pythoncapi.readthedocs.io</a>.</p>
<p>My analysis is that the C API leaks too many implementation details, which
prevent optimizing Python and make the implementation of PyPy (cpyext) more
painful.</p>
<p>In Python 3.8, I created <tt class="docutils literal">Include/cpython/</tt> sub-directory to stop adding new
APIs to the stable API by mistake.</p>
<p>I moved more private functions into the internal C API: <tt class="docutils literal">Include/internal/</tt>
directory.</p>
<p>I also converted some macros like <tt class="docutils literal">Py_INCREF()</tt> and <tt class="docutils literal">Py_DECREF()</tt> to static
inline functions to have well-defined parameter and return types, and to avoid
macro pitfalls.</p>
<p>Finally, I removed 3 functions from the C API.</p>
<div class="section" id="include-internal">
<h2>Include/internal/</h2>
<p>In Python 3.7, <strong>Eric Snow</strong> created <tt class="docutils literal">Include/internal/</tt> sub-directory for
the CPython "internal C API": API which should not be used outside CPython code
base. In Python 3.6, these APIs were surrounded by:</p>
<pre class="literal-block">
#ifdef Py_BUILD_CORE
...
#endif
</pre>
<p>In Python 3.8, I continued this work by moving more private functions into
this directory: see <a class="reference external" href="https://bugs.python.org/issue35081">bpo-35081</a>.</p>
<p>I started a thread on python-dev: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2018-October/155587.html">[Python-Dev] Rename Include/internal/ to
Include/pycore/</a>. But
it was decided to keep the <tt class="docutils literal">Include/internal/</tt> name. It was also decided that internal
header files must not be included implicitly by the generic <tt class="docutils literal">#include
<Python.h></tt>, but included explicitly. For example, when I moved
<tt class="docutils literal">_PyObject_GC_TRACK()</tt> and <tt class="docutils literal">_PyObject_GC_UNTRACK()</tt> to the internal C API,
I had to add <tt class="docutils literal">#include "pycore_object.h"</tt> to 32 C files!</p>
<p><a class="reference external" href="https://bugs.python.org/issue35296">I also modified make install</a> to install
this internal C API, so it can be used for specific needs like debuggers or
profilers which have to access CPython internals (access structure fields) but
cannot call functions. For example, <strong>Eric Snow</strong> moved the <tt class="docutils literal">PyInterpreterState</tt>
structure to the internal C API.</p>
<p>Installing the internal C API eases the migration of APIs to internal: if an API
is still needed after it's moved, it's now possible to opt in to use it.</p>
<p>Using the internal C API requires defining the <tt class="docutils literal">Py_BUILD_CORE_MODULE</tt> macro and
using a different include, like <tt class="docutils literal">#include "internal/pycore_pystate.h"</tt>. It's
more complicated on purpose: this ensures that it's not used by mistake.</p>
<p>Python 3.8 now provides 21 internal header files:</p>
<pre class="literal-block">
pycore_accu.h pycore_getopt.h pycore_pyhash.h
pycore_atomic.h pycore_gil.h pycore_pylifecycle.h
pycore_ceval.h pycore_hamt.h pycore_pymem.h
pycore_code.h pycore_initconfig.h pycore_pystate.h
pycore_condvar.h pycore_object.h pycore_traceback.h
pycore_context.h pycore_pathconfig.h pycore_tupleobject.h
pycore_fileutils.h pycore_pyerrors.h pycore_warnings.h
</pre>
</div>
<div class="section" id="include-cpython">
<h2>Include/cpython/</h2>
<p>The <a class="reference external" href="https://www.python.org/dev/peps/pep-0384/">PEP 384 "Defining a Stable ABI"</a> introduced <tt class="docutils literal">Py_LIMITED_API</tt>
macro to exclude functions from the Python C API. The problem is that when a new API
is added, it has to be explicitly excluded using <tt class="docutils literal">#ifndef Py_LIMITED_API</tt>.
If the author forgets, the function is added to the stable API by mistake.</p>
<p>I proposed to move the API which should be excluded from the stable ABI to a
new subdirectory. I created a <a class="reference external" href="https://discuss.python.org/t/poll-what-is-your-favorite-name-for-the-new-include-subdirectory/477">poll on the sub-directory name</a>:</p>
<ul class="simple">
<li><tt class="docutils literal">Include/cpython/</tt></li>
<li><tt class="docutils literal">Include/board/</tt></li>
<li><tt class="docutils literal">Include/impl/</tt></li>
<li><tt class="docutils literal">Include/pycapi/</tt> (the name that I proposed initially)</li>
<li><tt class="docutils literal">Include/unstable/</tt></li>
<li>other (add comment)</li>
</ul>
<p>The <tt class="docutils literal">Include/cpython/</tt> name won with 100% of the 3 votes (and a few more
supports in the python-dev discussion and in the bug tracker) :-)</p>
<p>I created <a class="reference external" href="https://bugs.python.org/issue35134">bpo-35134: Add a new Include/cpython/ subdirectory for the "CPython
API" with implementation details</a>.</p>
<p>My initial description of the directory content:</p>
<blockquote>
The new subdirectory will contain <tt class="docutils literal">#ifndef Py_LIMITED_API</tt> code, not the
“Stable ABI” of <a class="reference external" href="https://www.python.org/dev/peps/pep-0384/">PEP 384</a>, but
more “implementation details” of CPython.</blockquote>
<p>The change is backward compatible: <tt class="docutils literal">#include <Python.h></tt> will still provide
exactly the same API. For example, <tt class="docutils literal">object.h</tt> automatically includes
<tt class="docutils literal">cpython/object.h</tt>. But <tt class="docutils literal">Include/cpython/</tt> headers must not be included
directly (it would fail with a compilation error).</p>
<p>For example, <tt class="docutils literal">Include/object.h</tt> now ends with:</p>
<pre class="literal-block">
#ifndef Py_LIMITED_API
# define Py_CPYTHON_OBJECT_H
# include "cpython/object.h"
# undef Py_CPYTHON_OBJECT_H
#endif
</pre>
<p><tt class="docutils literal">Include/cpython/object.h</tt> structure (content replaced with <tt class="docutils literal">...</tt>):</p>
<pre class="literal-block">
#ifndef Py_CPYTHON_OBJECT_H
# error "this header file must not be included directly"
#endif
#ifdef __cplusplus
extern "C" {
#endif
...
#ifdef __cplusplus
}
#endif
</pre>
<p>In Python 3.8, the work is not complete. I tried to double- or even
triple-check my changes to ensure that I don't remove an API by mistake. This
work is still on-going in Python 3.9.</p>
</div>
<div class="section" id="summary-of-include-directories">
<h2>Summary of Include/ directories</h2>
<p>The header files have been reorganized to better separate the different kinds
of APIs:</p>
<ul class="simple">
<li><tt class="docutils literal"><span class="pre">Include/*.h</span></tt> should be the portable public stable C API.</li>
<li><tt class="docutils literal"><span class="pre">Include/cpython/*.h</span></tt> should be the unstable C API specific to CPython;
public API, with some private API prefixed by <tt class="docutils literal">_Py</tt> or <tt class="docutils literal">_PY</tt>.</li>
<li><tt class="docutils literal"><span class="pre">Include/internal/*.h</span></tt> is the private internal C API very specific to
CPython. This API comes with no backward compatibility guarantee and should
not be used outside CPython. It is only exposed for very specific needs
like debuggers and profilers which have to access CPython internals
without calling functions. This API is now installed by <tt class="docutils literal">make install</tt>.</li>
</ul>
</div>
<div class="section" id="convert-macros-to-static-inline-functions">
<h2>Convert macros to static inline functions</h2>
<p>In <a class="reference external" href="https://bugs.python.org/issue35059">bpo-35059</a>, I converted some macros
to static inline functions:</p>
<ul class="simple">
<li><tt class="docutils literal">Py_INCREF()</tt>, <tt class="docutils literal">Py_DECREF()</tt></li>
<li><tt class="docutils literal">Py_XINCREF()</tt>, <tt class="docutils literal">Py_XDECREF()</tt></li>
<li><tt class="docutils literal">PyObject_INIT()</tt>, <tt class="docutils literal">PyObject_INIT_VAR()</tt></li>
<li>Private functions: <tt class="docutils literal">_PyObject_GC_TRACK()</tt>, <tt class="docutils literal">_PyObject_GC_UNTRACK()</tt>,
<tt class="docutils literal">_Py_Dealloc()</tt></li>
</ul>
<p>Compared to macros, static inline functions have multiple advantages:</p>
<ul class="simple">
<li>Parameter types and return type are well defined;</li>
<li>They don't have issues specific to macros: see <a class="reference external" href="https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html">GCC Macro Pitfalls</a>;</li>
<li>Variables have a well defined local scope.</li>
</ul>
<p>Python 3.7 uses ugly macros with comma and semicolon. Example:</p>
<pre class="literal-block">
#define _Py_REF_DEBUG_COMMA ,
#define _Py_CHECK_REFCNT(OP) /* a semicolon */;
#define _Py_NewReference(op) ( \
_Py_INC_TPALLOCS(op) _Py_COUNT_ALLOCS_COMMA \
_Py_INC_REFTOTAL _Py_REF_DEBUG_COMMA \
Py_REFCNT(op) = 1)
</pre>
<p><a class="reference external" href="https://www.python.org/dev/peps/pep-0007/#c-dialect">Python 3.6 requires the C99 standard of the C dialect</a>. It was time to start
using it :-)</p>
</div>
<div class="section" id="removed-functions">
<h2>Removed functions</h2>
<p><a class="reference external" href="https://bugs.python.org/issue35713">bpo-35713</a>: I removed
<tt class="docutils literal">PyByteArray_Init()</tt> and <tt class="docutils literal">PyByteArray_Fini()</tt> functions. They did nothing
since Python 2.7.4 and Python 3.2.0, were excluded from the limited API (stable
ABI), and were not documented.</p>
<p><a class="reference external" href="https://bugs.python.org/issue36728">bpo-36728</a>: I also removed
<tt class="docutils literal">PyEval_ReInitThreads()</tt> function. It should not be called explicitly: use
<tt class="docutils literal">PyOS_AfterFork_Child()</tt> instead.</p>
</div>
Python 3.8 sys.unraisablehook2019-06-15T01:00:00+02:002019-06-15T01:00:00+02:00Victor Stinnertag:vstinner.github.io,2019-06-15:/sys-unraisablehook-python38.html<a class="reference external image-reference" href="https://www.flickr.com/photos/dawnmanser/8046201692/"><img alt="Hidden kitten" src="https://vstinner.github.io/images/hidden_kitten.jpg" /></a>
<p>I added a new <a class="reference external" href="https://docs.python.org/dev/library/sys.html#sys.unraisablehook">sys.unraisablehook</a> function to
allow to set a custom hook to control how "unraisable exceptions" are handled.
It is already testable in <a class="reference external" href="https://pythoninsider.blogspot.com/2019/06/python-380b1-is-now-available-for.html">Python 3.8 beta1</a>,
released last week!</p>
<p>An "unraisable exception" is an error which happens when Python cannot report
it to the caller. Examples …</p><a class="reference external image-reference" href="https://www.flickr.com/photos/dawnmanser/8046201692/"><img alt="Hidden kitten" src="https://vstinner.github.io/images/hidden_kitten.jpg" /></a>
<p>I added a new <a class="reference external" href="https://docs.python.org/dev/library/sys.html#sys.unraisablehook">sys.unraisablehook</a> function which
allows setting a custom hook to control how "unraisable exceptions" are handled.
It is already testable in <a class="reference external" href="https://pythoninsider.blogspot.com/2019/06/python-380b1-is-now-available-for.html">Python 3.8 beta1</a>,
released last week!</p>
<p>An "unraisable exception" is an error which happens when Python cannot report
it to the caller. Examples: object finalizer error (<tt class="docutils literal">__del__()</tt>), weak
reference callback failure, error during a GC collection. At the C level, the
<tt class="docutils literal">PyErr_WriteUnraisable()</tt> function is called to handle such exceptions.</p>
<p>Designing the new hook was tricky, and so was its implementation.</p>
<p>The photo shows an exception waiting to catch you ;-)</p>
<div class="section" id="kill-python-at-the-first-unraisable-exception">
<h2>Kill Python at the first unraisable exception</h2>
<p>One month ago, <strong>Thomas Grainger</strong> opened <a class="reference external" href="https://bugs.python.org/issue36829">bpo-36829</a>: "CLI option to make
PyErr_WriteUnraisable abort the current process". He wrote:</p>
<blockquote>
Currently it's quite easy for these <strong>errors</strong> to go <strong>unnoticed</strong>. (...)
The point for me is that CI will fail if it happens, then <strong>I can use gdb</strong>
to find out the cause</blockquote>
<p><strong>Zackery Spytz</strong> wrote the <a class="reference external" href="https://github.com/python/cpython/pull/13175">PR 13175</a> to add <tt class="docutils literal"><span class="pre">-X</span> abortunraisable</tt>
command line option. When this option is used, <tt class="docutils literal">PyErr_WriteUnraisable()</tt>
calls <tt class="docutils literal"><span class="pre">Py_FatalError("Unraisable</span> exception")</tt> which calls <tt class="docutils literal">abort()</tt>: it
raises the <tt class="docutils literal">SIGABRT</tt> signal which kills the process by default.</p>
</div>
<div class="section" id="handle-unraisable-exception-in-python-sys-unraisablehook">
<h2>Handle unraisable exception in Python: sys.unraisablehook</h2>
<p>I concur with Thomas that it's easy to miss such exceptions, but I dislike
killing the process. It's not practical to have to use a low-level debugger
like gdb to handle such a bug.</p>
<p>I proposed a different design: add a new <tt class="docutils literal">sys.unraisablehook</tt> hook which
allows arbitrary Python code to handle an "unraisable exception".</p>
<p>I wrote a <a class="reference external" href="https://bugs.python.org/issue36829#msg341868">hook example</a> which
displays the Python stack where the exception occurred using the <tt class="docutils literal">traceback</tt>
module.</p>
<p>I chose to pass a single object as argument to <tt class="docutils literal">sys.unraisablehook</tt>. The
object has 4 attributes:</p>
<ul class="simple">
<li>exc_type: Exception type.</li>
<li>exc_value: Exception value, can be None.</li>
<li>exc_traceback: Exception traceback, can be None.</li>
<li>object: Object causing the exception, can be None.</li>
</ul>
<p>I wanted to design an <strong>extensible API</strong>: keep backward compatibility even
if tomorrow we want to add a new attribute to the object to pass more
information.</p>
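<p>Here is a minimal sketch of what such a hook looks like with this API (the
<tt class="docutils literal">captured</tt> list is mine, for illustration only):</p>
<pre class="literal-block">
import sys

captured = []

def hook(unraisable):
    # The single argument packs everything into attributes, so new
    # fields can be added later without breaking existing hooks.
    captured.append((unraisable.exc_type, unraisable.exc_value,
                     unraisable.object))

sys.unraisablehook = hook

class BrokenDel:
    def __del__(self):
        raise ValueError("del is broken")

obj = BrokenDel()
del obj  # __del__() fails: the hook is called, the caller sees nothing
</pre>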
</div>
<div class="section" id="adding-source-parameter-to-the-warnings-module">
<h2>Adding source parameter to the warnings module</h2>
<p>To explain the rationale of my proposed <tt class="docutils literal">sys.unraisablehook</tt> design (single
object with attributes), let me tell you about my bad experience with the
<tt class="docutils literal">warnings</tt> module.</p>
<div class="section" id="use-tracemalloc-for-resourcewarning">
<h3>Use tracemalloc for ResourceWarning</h3>
<p>In March 2016, I was tired of debugging <tt class="docutils literal">ResourceWarning</tt> warnings: it's
hard to guess where the bug comes from. The warning is logged where the
resource is released, but I was interested in where the resource was allocated.</p>
<p>My <a class="reference external" href="https://docs.python.org/dev/library/tracemalloc.html">tracemalloc</a> module
provides a convenient <a class="reference external" href="https://docs.python.org/dev/library/tracemalloc.html#tracemalloc.get_object_traceback">get_object_traceback()</a>
function which provides the traceback where any Python object has been allocated.</p>
<p>I opened <a class="reference external" href="https://bugs.python.org/issue26604">bpo-26604</a>: "ResourceWarning:
Use tracemalloc to display the traceback where an object was allocated when a
ResourceWarning is emitted".</p>
</div>
<div class="section" id="warnings-hooks-cannot-be-extended">
<h3>warnings hooks cannot be extended</h3>
<p>The problem is that the <tt class="docutils literal">showwarning()</tt> and <tt class="docutils literal">formatwarning()</tt> functions of
<tt class="docutils literal">warnings</tt> can be overridden. They use a fixed number of positional
parameters:</p>
<pre class="literal-block">
def showwarning(message, category, filename, lineno, file=None, line=None): ...
def formatwarning(message, category, filename, lineno, line=None): ...
</pre>
<p>If they are called with an additional parameter, they fail with a
<tt class="docutils literal">TypeError</tt>. I wanted to add a new <tt class="docutils literal">source</tt> parameter to these functions.</p>
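<p>A short sketch shows the problem (the extra <tt class="docutils literal">source</tt> argument passed at
the end is hypothetical here):</p>
<pre class="literal-block">
def my_showwarning(message, category, filename, lineno, file=None, line=None):
    # A typical third-party override using the historical signature
    print(f"{filename}:{lineno}: {category.__name__}: {message}")

# Calling the override with one more positional argument fails:
try:
    my_showwarning("unclosed file", ResourceWarning, "x.py", 1, None, None,
                   "the source object")
except TypeError as exc:
    error = str(exc)  # the override cannot accept the new argument
</pre>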
</div>
<div class="section" id="reuse-existing-warningmessage-class">
<h3>Reuse existing WarningMessage class</h3>
<p>To extend the warnings module, I chose to rely on the existing
<tt class="docutils literal">WarningMessage</tt> class which can be used to "pack" all parameters as a single
object. This class was used by <tt class="docutils literal">catch_warnings</tt> context manager.</p>
<p>I had to add new private <tt class="docutils literal">_showwarnmsg()</tt> and <tt class="docutils literal">_formatwarnmsg()</tt> functions.
They are called with a <tt class="docutils literal">WarningMessage</tt> instance. The implementation has to
detect when <tt class="docutils literal">showwarning()</tt> or <tt class="docutils literal">formatwarning()</tt> is overridden: the
overridden function must be called with the legacy API in this case. This
backward compatibility requirement makes the implementation complex.</p>
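<p>The detection logic can be sketched like this (simplified, with invented
names; the real code lives in the <tt class="docutils literal">warnings</tt> module and is more careful):</p>
<pre class="literal-block">
import warnings

_default_showwarning = warnings.showwarning  # captured at import time

def show_message(msg):
    # msg is a WarningMessage-like object packing all parameters
    current = warnings.showwarning
    if current is not _default_showwarning:
        # Legacy override detected: call it with the legacy positional
        # parameters only; newer fields (like a "source") are dropped.
        current(msg.message, msg.category, msg.filename, msg.lineno,
                msg.file, msg.line)
    else:
        # Modern path: free to use every attribute of msg.
        _default_showwarning(msg.message, msg.category, msg.filename,
                             msg.lineno, msg.file, msg.line)
</pre>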
</div>
<div class="section" id="regression">
<h3>Regression</h3>
<p>After Python 3.6 was released with my new feature, <a class="reference external" href="https://bugs.python.org/issue35178">bpo-35178</a> was reported. The <tt class="docutils literal">warnings</tt> module
called a custom <tt class="docutils literal">formatwarning()</tt> with the <tt class="docutils literal">line</tt> argument passed as a
keyword argument, whereas other arguments are passed as positional arguments.
The <a class="reference external" href="https://github.com/python/cpython/commit/be7c460fb50efe3b88a00281025d76acc62ad2fd">fix was trivial</a>,
but it shows that backward compatibility is hard.</p>
</div>
<div class="section" id="example">
<h3>Example</h3>
<p>By the way, here is an example of the feature using a <tt class="docutils literal">filebug.py</tt> script:</p>
<pre class="literal-block">
def func():
    f = open(__file__)
    f = None

func()
</pre>
<p>The feature adds the "Object allocated at" traceback, whereas the existing
<tt class="docutils literal">f = None</tt> output is worthless.</p>
<pre class="literal-block">
$ python3 -Wd -X tracemalloc=5 filebug.py
filebug.py:3: ResourceWarning: unclosed file <_io.TextIOWrapper name='filebug.py' mode='r' encoding='UTF-8'>
  f = None
Object allocated at (most recent call first):
  File "filebug.py", lineno 2
    f = open(__file__)
  File "filebug.py", lineno 5
    func()
</pre>
</div>
</div>
<div class="section" id="limitations-of-my-unraisablehook-idea">
<h2>Limitations of my unraisablehook idea</h2>
<p>To come back to <a class="reference external" href="https://bugs.python.org/issue36829">bpo-36829</a>, I identified
a limitation in my <tt class="docutils literal">sys.unraisablehook</tt> idea: unraisable exceptions which
occur very late during Python finalization cannot be handled by a custom hook.</p>
<p>Thomas said that he is fine with having to use <tt class="docutils literal">gdb</tt> to debug an issue
during Python finalization.</p>
<p>In my experience, using <tt class="docutils literal">gdb</tt> on system Python is unpleasant, since it's
usually deeply optimized (PGO + LTO optimizations). gdb fails to read variables
which are only displayed as <tt class="docutils literal"><optimized out></tt>. By the way, that's why I fixed
the <a class="reference external" href="https://docs.python.org/dev/whatsnew/3.8.html#debug-build-uses-the-same-abi-as-release-build">debug build of Python to be ABI compatible with a release build</a>,
but that's a different story.</p>
<p>Thomas's idea of killing the process allows detecting unraisable exceptions
whenever they occur.</p>
</div>
<div class="section" id="api-discussed-on-python-dev">
<h2>API discussed on python-dev</h2>
<p>I started a discussion on python-dev to get more feedback: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157436.html">bpo-36829: Add
sys.unraisablehook()</a>.</p>
<div class="section" id="new-exception-while-handling-an-exception">
<h3>New exception while handling an exception</h3>
<p><strong>Nathaniel Smith</strong> asked: what happens if a custom hook raises a new exception?</p>
<p>This problem is easy to fix: <tt class="docutils literal">PyErr_WriteUnraisable()</tt> calls the default
hook to handle the new exception (I already implemented this solution).</p>
</div>
<div class="section" id="positional-arguments">
<h3>Positional arguments</h3>
<p><strong>Serhiy Storchaka</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157439.html">preferred</a> passing 5
positional arguments (exc_type, exc_value, exc_tb, obj and msg):</p>
<blockquote>
Currently we have no plans for adding more details, and I do not think that
we will need to do this in future.</blockquote>
<p>Later, he added:</p>
<blockquote>
If you have plans for adding new details in future, I propose to add a 6th
parameter "context" or "extra" (always None currently). It is as extensible
as packing all arguments into a single structure, but you do not need to
introduce the structure type and create its instance until you need to pass
additional info.</blockquote>
</div>
<div class="section" id="reuse-sys-excepthook">
<h3>Reuse sys.excepthook</h3>
<p><strong>Steve Dower</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157453.html">proposed to reuse sys.excepthook</a>, rather
than adding a new hook, and <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157465.html">create a new exception to pass extra info</a>.</p>
<p><strong>Nathaniel</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157460.html">explained</a> that
<tt class="docutils literal">sys.excepthook</tt> and <tt class="docutils literal">sys.unraisablehook</tt> have different behaviors and so
need to be separate hooks.</p>
</div>
<div class="section" id="object-resurrection">
<h3>Object resurrection</h3>
<p><strong>Steve Dower</strong> was <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157452.html">concerned by object resurrection</a> and
proposed to only pass <tt class="docutils literal">repr(obj)</tt> to the hook.</p>
<p><a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157463.html">I explained</a> that an
object can only be resurrected after its finalization, which is different from
deallocation. Accessing a finalized object should not crash Python.
Deallocation makes an object unusable, but deallocation only happens
once the last reference to an object is gone, and so the object is no longer
accessible.</p>
<p><a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157467.html">Nathaniel added</a> that
<tt class="docutils literal">repr()</tt> would limit features of the hook:</p>
<blockquote>
A clever hook might want the actual object, so it can pretty-print it, or
open an interactive debugger and let you examine it, or something.</blockquote>
</div>
<div class="section" id="naming">
<h3>Naming</h3>
<p><strong>Gregory P. Smith</strong> proposed the term "uncatchable" rather than "unraisable".</p>
</div>
<div class="section" id="keyword-only-arguments">
<h3>Keyword-only arguments</h3>
<p><strong>Barry Warsaw</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157457.html">suggested</a> to
consider keyword-only arguments to help future proof the call signature.</p>
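<p>Such a signature could have looked like this (hypothetical sketch, not the
design that was adopted):</p>
<pre class="literal-block">
def unraisablehook(*, exc_type, exc_value=None, exc_traceback=None, obj=None):
    # New keyword parameters could be added later with default values
    # without breaking existing callers.
    return exc_type

# Callers must spell out every argument name:
result = unraisablehook(exc_type=ValueError, obj=None)
</pre>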
</div>
<div class="section" id="avoid-redundant-exc-type-and-exc-traceback-parameters">
<h3>Avoid redundant exc_type and exc_traceback parameters</h3>
<p><strong>Petr Viktorin</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157459.html">asked</a> why
<tt class="docutils literal">(exc_type, exc_value, exc_traceback)</tt> triple is needed, whereas <em>exc_type</em>
could be obtained from <tt class="docutils literal">type(exc_value)</tt> and <em>exc_traceback</em> from
<tt class="docutils literal">exc_value.__traceback__</tt>.</p>
<p><a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157462.html">I made some tests</a>.
<em>exc_value</em> can be <tt class="docutils literal">NULL</tt> sometimes. In some cases, <em>exc_traceback</em> can be
set, whereas <tt class="docutils literal">exc_value.__traceback__</tt> is not set (<tt class="docutils literal">None</tt>).</p>
</div>
</div>
<div class="section" id="productive-discussion">
<h2>Productive discussion!</h2>
<p>As usual, the python-dev discussion was very productive. Each corner case has
been discussed and the API has been challenged.</p>
<p>Thanks to Petr's remark, I enhanced the existing hook to instantiate an
exception if <em>exc_value</em> is <tt class="docutils literal">NULL</tt>, create a traceback if <em>exc_traceback</em> is
<tt class="docutils literal">NULL</tt>, and set <tt class="docutils literal">exc_value.__traceback__</tt> to the traceback. If one of these
actions fails, the failure is silently ignored.</p>
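<p>This normalization can be sketched in Python (the real code is written in C;
the function name is mine):</p>
<pre class="literal-block">
def normalize(exc_type, exc_value, exc_tb):
    if exc_value is None:
        # Instantiate the exception from its type
        exc_value = exc_type()
    if exc_tb is not None and exc_value.__traceback__ is None:
        try:
            exc_value.__traceback__ = exc_tb
        except Exception:
            # Failures are silently ignored
            pass
    return exc_type, exc_value, exc_tb
</pre>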
<p>I also paid more attention to object resurrection.</p>
<p>After one week of discussion, I was not convinced by the alternative
proposals, whereas multiple core devs wrote that they liked my API.</p>
<p>I decided to push my <a class="reference external" href="https://github.com/python/cpython/commit/ef9d9b63129a2f243591db70e9a2dd53fab95d86">commit ef9d9b63</a>:</p>
<pre class="literal-block">
commit ef9d9b63129a2f243591db70e9a2dd53fab95d86
Author: Victor Stinner <vstinner@redhat.com>
Date:   Wed May 22 11:28:22 2019 +0200

    bpo-36829: Add sys.unraisablehook() (GH-13187)

    Add new sys.unraisablehook() function which can be overridden to
    control how "unraisable exceptions" are handled. It is called when an
    exception has occurred but there is no way for Python to handle it.
    For example, when a destructor raises an exception or during garbage
    collection (gc.collect()).
</pre>
</div>
<div class="section" id="new-err-msg-attribute">
<h2>New err_msg attribute</h2>
<p>Unraisable exceptions were logged with no context, only a hardcoded
"Exception ignored in:" error message.</p>
<p>Early in <tt class="docutils literal">sys.unraisablehook</tt> discussion, <strong>Serhiy</strong> proposed to add a new
<em>err_msg</em> parameter to pass an optional error message.</p>
<p>I implemented this idea in <a class="reference external" href="https://bugs.python.org/issue36829">bpo-36829</a>
with <a class="reference external" href="https://github.com/python/cpython/commit/71c52e3048dd07567f0c690eab4e5d57be66f534">commit 71c52e30</a>:</p>
<pre class="literal-block">
commit 71c52e3048dd07567f0c690eab4e5d57be66f534
Author: Victor Stinner <vstinner@redhat.com>
Date:   Mon May 27 08:57:14 2019 +0200

    bpo-36829: Add _PyErr_WriteUnraisableMsg() (GH-13488)
</pre>
<p>I was able to add a new parameter as a new <em>err_msg</em> attribute without breaking
backward compatibility!</p>
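<p>A hook written before this change keeps working, since it can simply probe
for the new attribute. A sketch (the fallback string mirrors the historical
message):</p>
<pre class="literal-block">
def hook(unraisable):
    # err_msg may be missing or None: fall back to the historical message
    msg = getattr(unraisable, "err_msg", None) or "Exception ignored in"
    return f"{msg}: {unraisable.object!r}"
</pre>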
</div>
<div class="section" id="test-support-catch-unraisable-exception">
<h2>test.support.catch_unraisable_exception()</h2>
<p>I wrote a new context manager catching unraisable exceptions:
<tt class="docutils literal">test.support.catch_unraisable_exception()</tt>. The exception is stored, so it
can be checked inside the context manager, and it is cleared at context manager
exit.</p>
<p>I modified tests to use this new context manager:</p>
<ul class="simple">
<li>test_coroutines</li>
<li>test_cprofile</li>
<li>test_exceptions</li>
<li>test_generators</li>
<li>test_io</li>
<li>test_raise</li>
<li>test_ssl</li>
<li>test_thread</li>
<li>test_yield_from</li>
</ul>
<p>Example:</p>
<pre class="literal-block">
class BrokenDel:
    def __del__(self):
        raise ValueError("del is broken")

obj = BrokenDel()
with support.catch_unraisable_exception() as cm:
    del obj
    self.assertEqual(cm.unraisable.object, BrokenDel.__del__)
</pre>
</div>
<div class="section" id="test-io-memory-leak-regression">
<h2>test_io memory leak regression</h2>
<p>I modified test_io to ignore expected unraisable exceptions:</p>
<pre class="literal-block">
commit c15a682603a47f5aef5025f6a2e3babb699273d6
Author: Victor Stinner <vstinner@redhat.com>
Date:   Thu Jun 13 00:23:49 2019 +0200

    bpo-37223: test_io: silence destructor errors (GH-14031)
</pre>
<p>This change introduced a memory leak, <a class="reference external" href="https://bugs.python.org/issue37261">bpo-37261</a>:</p>
<pre class="literal-block">
test_io leaked [23208, 23204, 23208] references, sum=69620
test_io leaked [7657, 7655, 7657] memory blocks, sum=22969
</pre>
<p>The problem was this <tt class="docutils literal">catch_unraisable_exception</tt> method:</p>
<pre class="literal-block">
def __exit__(self, *exc_info):
    del self.unraisable
    sys.unraisablehook = self._old_hook
</pre>
<p>Sometimes, <tt class="docutils literal">del self.unraisable</tt> triggered a new unraisable exception. At
this point, the <tt class="docutils literal">catch_unraisable_exception</tt> hook was still registered:</p>
<pre class="literal-block">
def _hook(self, unraisable):
    self.unraisable = unraisable
</pre>
<p>At the end, the <tt class="docutils literal">del self.unraisable</tt> instruction <em>indirectly</em> set the
<tt class="docutils literal">self.unraisable</tt> attribute again.</p>
<div class="section" id="first-fix">
<h3>First fix</h3>
<p>First, I suspected that the <tt class="docutils literal">io.BufferedRWPair</tt> object which triggered the
first unraisable exception was <strong>resurrected</strong>, and that <tt class="docutils literal">del
self.unraisable</tt> called its finalizer or deallocator again, which triggered
the <em>same</em> unraisable exception a second time.</p>
<p>My first attempt to fix the issue was to clear the <tt class="docutils literal">sys.unraisablehook</tt> by
setting it to <tt class="docutils literal">None</tt>, and only later delete the attribute:</p>
<pre class="literal-block">
def __exit__(self, *exc_info):
    self.unraisablehook = None
    sys.unraisablehook = self._old_hook
    del self.unraisable
</pre>
<p>If <tt class="docutils literal">self.unraisablehook = None</tt> triggers a new unraisable exception, it is
silently ignored.</p>
</div>
<div class="section" id="second-correct-fix">
<h3>Second correct fix</h3>
<p>But when I chatted with <strong>Pablo Galindo</strong>, he told me that an object cannot be
finalized twice thanks to <strong>Antoine Pitrou</strong>'s <a class="reference external" href="https://www.python.org/dev/peps/pep-0442/">PEP 442: Safe object finalization</a>.</p>
<p>I looked again into gdb. Oh. In fact, it's more subtle. <tt class="docutils literal">del self.unraisable</tt>
clears the last reference to <tt class="docutils literal">BufferedRWPair</tt> which calls its
<strong>deallocator</strong>. The deallocator indirectly calls the <tt class="docutils literal">BufferedWriter</tt>
finalizer; the <tt class="docutils literal">BufferedWriter</tt> was stored in the <tt class="docutils literal">BufferedRWPair</tt>. This
finalizer triggers a new unraisable exception.</p>
<p>So <tt class="docutils literal">BufferedRWPair</tt> does not trigger two unraisable exceptions: the second
one comes from a different object (<tt class="docutils literal">BufferedWriter</tt>).</p>
<p>My final fix is to restore the old hook before deleting the <tt class="docutils literal">unraisable</tt>
attribute:</p>
<pre class="literal-block">
def __exit__(self, *exc_info):
    sys.unraisablehook = self._old_hook
    del self.unraisable
</pre>
<p>And fix test_io using two nested context managers:</p>
<pre class="literal-block">
# Ignore BufferedWriter (of the BufferedRWPair) unraisable exception
with support.catch_unraisable_exception():
    # Ignore BufferedRWPair unraisable exception
    with support.catch_unraisable_exception():
        pair = None
        support.gc_collect()
    support.gc_collect()
</pre>
<p>I also documented corner cases in <tt class="docutils literal">sys.unraisablehook</tt> documentation:</p>
<blockquote>
<p><tt class="docutils literal">sys.unraisablehook</tt> can be overridden to control how unraisable
exceptions are handled.</p>
<p>Storing <em>exc_value</em> using a custom hook can create a <strong>reference cycle</strong>. It
should be cleared explicitly to break the reference cycle when the exception
is no longer needed.</p>
<p>Storing <em>object</em> using a custom hook <strong>can resurrect</strong> it if it is set to an
object which is being finalized. Avoid storing <em>object</em> after the custom
hook completes to avoid resurrecting objects.</p>
</blockquote>
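<p>In practice, a careful hook extracts what it needs immediately and keeps no
reference alive. A sketch following these recommendations:</p>
<pre class="literal-block">
import sys
import traceback

def careful_hook(unraisable):
    # Format immediately instead of storing exc_value (reference cycle)
    # or object (possible resurrection).
    lines = traceback.format_exception_only(unraisable.exc_type,
                                            unraisable.exc_value)
    sys.stderr.write("".join(lines))
    # No reference to unraisable survives past this point.
</pre>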
</div>
</div>
<div class="section" id="regrtest-now-detects-unraisable-exceptions">
<h2>regrtest now detects unraisable exceptions</h2>
<p>Once I fixed tests to silence all expected unraisable exceptions, I created
<a class="reference external" href="https://bugs.python.org/issue37069">bpo-37069</a> to modify regrtest to install
a custom hook. I merged my <a class="reference external" href="https://github.com/python/cpython/commit/95f61c8b1619e736bd5e29a0da0183234634b6e8">commit 95f61c8b</a>:</p>
<pre class="literal-block">
commit 95f61c8b1619e736bd5e29a0da0183234634b6e8
Author: Victor Stinner <vstinner@redhat.com>
Date:   Thu Jun 13 01:09:04 2019 +0200

    bpo-37069: regrtest uses sys.unraisablehook (GH-13759)

    regrtest now uses sys.unraisablehook() to mark a test as "environment
    altered" (ENV_CHANGED) if it emits an "unraisable exception".
    Moreover, regrtest logs a warning in this case.

    Use "python3 -m test --fail-env-changed" to catch unraisable
    exceptions in tests.
</pre>
<p>A test is marked as "environment altered" (ENV_CHANGED) if the test triggers an
unraisable exception. Using <tt class="docutils literal"><span class="pre">--fail-env-changed</span></tt> option (option used by
default on all Python CIs), a test is marked as failed in this case.</p>
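<p>The regrtest hook can be sketched like this (simplified, with invented names;
the real implementation lives in the regrtest code):</p>
<pre class="literal-block">
import sys

environment_altered = False

def regrtest_unraisable_hook(unraisable):
    global environment_altered
    # Mark the current test as ENV_CHANGED and log a warning
    environment_altered = True
    print(f"Warning -- unraisable exception: {unraisable.exc_value!r}",
          file=sys.stderr)
</pre>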
</div>
<div class="section" id="hook-features">
<h2>Hook features</h2>
<p><tt class="docutils literal">sys.unraisablehook</tt> allows setting a custom hook to handle unraisable
exceptions. It enables many interesting features:</p>
<ul class="simple">
<li>Log the exception into system logs, over the network, or open a popup.</li>
<li>Inspect the Python stack: <tt class="docutils literal">traceback.print_stack()</tt></li>
<li>Inspect <em>object</em> content (object which caused the exception)</li>
<li>Get the traceback where <em>object</em> has been allocated:
<tt class="docutils literal">tracemalloc.get_object_traceback()</tt></li>
</ul>
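<p>For example, here is a hook combining several of these features (a sketch:
<tt class="docutils literal">tracemalloc</tt> must have been started early enough to know the allocation
site, otherwise <tt class="docutils literal">get_object_traceback()</tt> returns <tt class="docutils literal">None</tt>):</p>
<pre class="literal-block">
import sys
import traceback
import tracemalloc

def verbose_hook(unraisable):
    print(f"Unraisable exception: {unraisable.exc_value!r}")
    traceback.print_stack()  # current Python stack (written to stderr)
    tb = tracemalloc.get_object_traceback(unraisable.object)
    if tb is not None:
        print("Object allocated at (most recent call first):")
        print("\n".join(tb.format()))

tracemalloc.start(5)  # keep 5 frames per allocation
sys.unraisablehook = verbose_hook
</pre>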
<p>By the way, reimplementing Thomas's initial idea became trivial:</p>
<pre class="literal-block">
import signal, sys

def abort_hook(unraisable):
    signal.raise_signal(signal.SIGABRT)

sys.unraisablehook = abort_hook
</pre>
</div>
<div class="section" id="threading-excepthook">
<h2>threading.excepthook</h2>
<p>Since I was happy with <tt class="docutils literal">sys.unraisablehook</tt>, I decided to work on the
14-year-old issue <a class="reference external" href="https://bugs.python.org/issue1230540">bpo-1230540</a>: I proposed to
add <a class="reference external" href="https://docs.python.org/dev/library/threading.html#threading.excepthook">threading.excepthook()</a>,
but that's a different story!</p>
</div>
asyncio WSASend() memory leak2019-03-06T20:00:00+01:002019-03-06T20:00:00+01:00Victor Stinnertag:vstinner.github.io,2019-03-06:/asyncio-proactor-wsasend-memory-leak.html<a class="reference external image-reference" href="https://www.flickr.com/photos/jronaldlee/5996590138/"><img alt="Leaking tap" src="https://vstinner.github.io/images/leaking_tap.jpg" /></a>
<p>I fixed multiple bugs in asyncio <tt class="docutils literal">ProactorEventLoop</tt> previously. But test_asyncio
still failed sometimes. I noticed a memory leak in <tt class="docutils literal">test_asyncio</tt> which would
haunt me for a year in 2018...</p>
<p><strong>Yet another example of a test failure which looks harmless but hides a
critical bug.</strong> The bug is that sending a …</p><a class="reference external image-reference" href="https://www.flickr.com/photos/jronaldlee/5996590138/"><img alt="Leaking tap" src="https://vstinner.github.io/images/leaking_tap.jpg" /></a>
<p>I fixed multiple bugs in asyncio <tt class="docutils literal">ProactorEventLoop</tt> previously. But test_asyncio
still failed sometimes. I noticed a memory leak in <tt class="docutils literal">test_asyncio</tt> which would
haunt me for a year in 2018...</p>
<p><strong>Yet another example of a test failure which looks harmless but hides a
critical bug.</strong> The bug is that sending a network packet on Windows using
asyncio <tt class="docutils literal">ProactorEventLoop</tt> can leak the packet. With such bug, it is easy to
imagine a very quick increase of the memory footprint of a network server...</p>
<p>I'm curious why nobody noticed it before me. The only explanation I see is
that nobody was running a server using <tt class="docutils literal">ProactorEventLoop</tt>. Before Python
3.8, <tt class="docutils literal">SelectorEventLoop</tt> was the default asyncio event loop on Windows.
<a class="reference external" href="https://bugs.python.org/issue34687">bpo-34687</a>: Andrew Svetlov, Yury
Selivanov and I agreed to make <tt class="docutils literal">ProactorEventLoop</tt> the default in Python
3.8! <tt class="docutils literal">Lib/asyncio/windows_events.py</tt> change of my <a class="reference external" href="https://github.com/python/cpython/commit/6ea29c5e90dde6c240bd8e0815614b52ac307ea1">commit 6ea29c5e</a>:</p>
<pre class="literal-block">
-DefaultEventLoopPolicy = WindowsSelectorEventLoopPolicy
+DefaultEventLoopPolicy = WindowsProactorEventLoopPolicy
</pre>
<p>The bug wasn't a regression. It was only discovered 5 years after the code had
been written, thanks to new tests.</p>
<p><strong>UPDATE:</strong> I updated the article to add the "Regression? Nope" section and
elaborate the Conclusion.</p>
<p>Previous article:
<a class="reference external" href="https://vstinner.github.io/asyncio-proactor-wsarecv-cancellation-data-loss.html">asyncio: WSARecv() cancellation causing data loss</a>.</p>
<div class="section" id="yet-another-random-buildbot-failure">
<h2>Yet another random buildbot failure</h2>
<p>One day at the end of January 2018, I noticed a new failure on the "AMD64
Windows8.1 Refleaks 3.x" buildbot worker. I reported <a class="reference external" href="https://bugs.python.org/issue32710">bpo-32710</a>:</p>
<blockquote>
<p>AMD64 Windows8.1 Refleaks 3.x:
<a class="reference external" href="http://buildbot.python.org/all/#/builders/80/builds/118">http://buildbot.python.org/all/#/builders/80/builds/118</a></p>
<p>test_asyncio leaked [4, 4, 3] memory blocks, sum=11</p>
<p>I reproduced the issue. I'm running test.bisect to try to isolate this bug.</p>
</blockquote>
<p>Only 15 minutes later thanks to my <tt class="docutils literal">test.bisect</tt> tool, I identified the
leaking test, <strong>test_sendfile_close_peer_in_middle_of_receiving()</strong>:</p>
<pre class="literal-block">
It seems to be related to sendfile():

C:\vstinner\python\master>python -m test -R 3:3 test_asyncio \
    -m test.test_asyncio.test_events.ProactorEventLoopTests.test_sendfile_close_peer_in_middle_of_receiving
...
test_asyncio leaked [1, 2, 1] memory blocks, sum=4
</pre>
<p>The test is identified, so it should take a few hours, maximum, to fix the bug,
no? We will see...</p>
</div>
<div class="section" id="april">
<h2>April</h2>
<p>3 months later, I asked:</p>
<blockquote>
The test is still leaking memory blocks. Any progress on investigating the
issue?</blockquote>
<p>Nobody replied.</p>
<p>At that time, I was busy fixing a bunch of various other bugs reported by
buildbots which were easier to fix, and I was kind of exhausted by asyncio: I
didn't want to touch it.</p>
</div>
<div class="section" id="june">
<h2>June</h2>
<p>Oh, I found again this bug while working on my <a class="reference external" href="https://github.com/python/cpython/pull/7827">PR 7827</a> (detect handle leaks on Windows
in regrtest).</p>
<p>In 2018, I was very busy fixing dozens of multiprocessing issues (fixing tests,
but also fixing some bugs in multiprocessing itself).</p>
<p>For example, I noticed another memory leak on AMD64 Windows8.1 Refleaks
3.7, <a class="reference external" href="https://bugs.python.org/issue33735#msg318425">bpo-33735</a>:</p>
<blockquote>
<p><a class="reference external" href="http://buildbot.python.org/all/#/builders/132/builds/154">http://buildbot.python.org/all/#/builders/132/builds/154</a></p>
<p>test_multiprocessing_spawn leaked [1, 2, 1] memory blocks, sum=4</p>
</blockquote>
<p>This test_multiprocessing_spawn leak and the test_asyncio leak on Windows
Refleaks haunted me in 2018...</p>
<p>In fact, it wasn't a real leak. After a few runs, <a class="reference external" href="https://bugs.python.org/issue33735#msg320948">the test stopped leaking</a>:</p>
<pre class="literal-block">
$ ./python -m test test_multiprocessing_spawn \
    -m test.test_multiprocessing_spawn.WithProcessesTestPool.test_imap_unordered \
    -R 1:30
...
test_multiprocessing_spawn leaked [4, 5, 1, 5, 1, 2, 0, 0, 0, ..., 0, 0, 0] memory blocks, sum=18
test_multiprocessing_spawn failed in 42 sec 470 ms
</pre>
<p>I fixed the test with <a class="reference external" href="https://github.com/python/cpython/commit/23401fb960bb94e6ea62d2999527968d53d3fc65">commit
23401fb9</a>.</p>
<p>I fixed other multiprocessing bugs like <a class="reference external" href="https://bugs.python.org/issue33929">bpo-33929</a>.</p>
<p>These multiprocessing bugs kept me busy.</p>
</div>
<div class="section" id="july-december">
<h2>July-December</h2>
<p>Nothing. Nobody looked at the issue.</p>
<p>Again, I was busy fixing various test failures reported by buildbots.</p>
</div>
<div class="section" id="update-in-january-2019">
<h2>Update in January 2019</h2>
<p>In January 2019, after months of hard work on fixing every single buildbot
failure, I realized <strong>suddenly</strong> that the <tt class="docutils literal">test_asyncio</tt> leak, <a class="reference external" href="https://bugs.python.org/issue32710">bpo-32710</a>, was one of the last known unfixed test
failures! So I decided to have a new look at it.</p>
<p>Update on <tt class="docutils literal">test_asyncio.test_sendfile.ProactorEventLoopTests</tt>:</p>
<ul class="simple">
<li><tt class="docutils literal">test_sendfile_close_peer_in_the_middle_of_receiving()</tt> leaks 1 reference per
run: this leak was the obvious bug <a class="reference external" href="https://bugs.python.org/issue35682">bpo-35682</a>, which I already fixed with <a class="reference external" href="https://github.com/python/cpython/commit/80fda712c83f5dd9560d42bf2aa65a72b18b7759">commit
80fda712</a>.</li>
<li><tt class="docutils literal">test_sendfile_fallback_close_peer_in_the_middle_of_receiving()</tt> leaks 1
reference per run: <strong>I don't understand why</strong>.</li>
</ul>
<p>Note: I had to copy/paste these test names a lot of times. Pleeease, for my
comfort, use shorter test names! :-) (I had to copy/paste them, I don't think
that a regular human is able to type these very long names!)</p>
<p>I spent a lot of time investigating the
<tt class="docutils literal">test_sendfile_fallback_close_peer_in_the_middle_of_receiving()</tt> leak and still didn't
understand the issue.</p>
<p>The main loop is <tt class="docutils literal">BaseEventLoop._sendfile_fallback()</tt>. For
the specific case of this test, the loop can be simplified to:</p>
<pre class="literal-block">
proto = _SendfileFallbackProtocol(transp)
try:
    while True:
        data = b'x' * (1024 * 64)
        await proto.drain()
        transp.write(data)
finally:
    await proto.restore()
</pre>
<p>The server closes the connection after it gets 1024 bytes. The client socket
gets a <tt class="docutils literal">ConnectionAbortedError</tt> exception in
<tt class="docutils literal">_ProactorBaseWritePipeTransport._loop_writing()</tt> which calls <tt class="docutils literal">_fatal_error()</tt>:</p>
<pre class="literal-block">
except OSError as exc:
    self._fatal_error(exc, 'Fatal write error on pipe transport')
</pre>
<p><tt class="docutils literal">_fatal_error()</tt> calls <tt class="docutils literal">_force_close()</tt> which sets <tt class="docutils literal">_closing</tt> to
<tt class="docutils literal">True</tt>, and calls <tt class="docutils literal">protocol.connection_lost()</tt>. In the meanwhile,
<tt class="docutils literal">drain()</tt> raises <tt class="docutils literal">ConnectionError</tt> because <tt class="docutils literal">is_closing()</tt> is true:</p>
<pre class="literal-block">
async def drain(self):
    if self._transport.is_closing():
        raise ConnectionError("Connection closed by peer")
    ...
</pre>
<p>Said differently: <strong>everything works as expected</strong>.</p>
</div>
<div class="section" id="regression-caused-by-my-previous-proactor-fix">
<h2>Regression caused by my previous proactor fix?</h2>
<p>I suspected my own <a class="reference external" href="https://github.com/python/cpython/commit/79790bc35fe722a49977b52647f9b5fe1deda2b7">commit 79790bc3</a>
pushed 7 months ago to fix a race condition in WSARecv() causing data loss
(that's my previous article: <a class="reference external" href="https://vstinner.github.io/asyncio-proactor-wsarecv-cancellation-data-loss.html">asyncio: WSARecv() cancellation causing data loss</a>).</p>
<p>Hint: nah, it's unrelated. Moreover, this change has been pushed in May,
whereas I reported <a class="reference external" href="https://bugs.python.org/issue32710">bpo-32710 leak</a> in
January.</p>
</div>
<div class="section" id="short-script-reproducing-the-leak">
<h2>Short script reproducing the leak</h2>
<p><strong>Identifying a leak of a single reference is really hard</strong> since the test uses
hundreds of Python objects! My blocker issue was to repeat the test enough
times to trigger the leak N times rather than getting a leak of exactly a
single Python reference. The problem was that the test failed when run more
than once.</p>
<p>All my previous attempts to identify the bug failed:</p>
<ul class="simple">
<li>Use <tt class="docutils literal">gc.get_referrers()</tt> to track references between Python objects.</li>
<li>Use <tt class="docutils literal">tracemalloc</tt> to track memory usage: the leak is too small, it's lost
in the results "noise".</li>
</ul>
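<p>The first of those attempts can be sketched with a toy example (a minimal
illustration, not the actual test code; the names are made up):</p>
<pre class="literal-block">
import gc

leaked = object()
container = [leaked]  # simulated accidental reference keeping the object alive

# gc.get_referrers() lists every object that refers to its argument,
# which helps to find who keeps a leaked object alive...
referrers = gc.get_referrers(leaked)
assert container in referrers
</pre>
<p>...but with hundreds of live objects involved in a real test, the output of
<tt class="docutils literal">gc.get_referrers()</tt> quickly becomes overwhelming, which is why this
approach failed here.</p>

```python
import gc

leaked = object()
container = [leaked]  # simulated accidental reference keeping the object alive

# gc.get_referrers() lists every object that refers to its argument,
# which helps to find who keeps a leaked object alive...
referrers = gc.get_referrers(leaked)
assert container in referrers
```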
<p>I decided to do what I should have done first: <strong>remove as much code as
possible</strong> to reduce the amount of code that I had to audit. I removed most Python
imports, manually inlined function calls, removed a lot of code which was
unused in the test, etc.</p>
<p>After a few hours, I managed to reduce the giant pile of code used by the test
into a very short script of only 159 lines of Python code: <a class="reference external" href="https://bugs.python.org/file48030/test_aiosend.py">test_aiosend.py</a>. The script doesn't call
the asyncio <tt class="docutils literal">sendfile()</tt> implementation, but uses its own copy of the code,
simplified to do exactly what the test needs:</p>
<pre class="literal-block">
async def sendfile(transp):
    proto = _SendfileFallbackProtocol(transp)
    try:
        data = b'x' * (1024 * 24)
        while True:
            await proto.drain()
            transp.write(data)
    finally:
        await proto.restore()
</pre>
<p>with a local copy of the code of <tt class="docutils literal">_SendfileFallbackProtocol</tt> class.</p>
<p>Having all the code involved in the bug in a single file makes it way more
efficient to follow the control flow and understand what happens.</p>
<p>The original code is waaaaay more complex, scattered across multiple Python
files in <tt class="docutils literal">Lib/asyncio</tt> and <tt class="docutils literal">Lib/test/test_asyncio/</tt> directories.</p>
</div>
<div class="section" id="root-bug-identified-wsasend">
<h2>Root bug identified: WSASend()</h2>
<p><strong>It took me 1 year, a few sleepless nights, multiple attempts to understand
the leak, but I eventually found it!</strong> WSASend() doesn't release the memory if
it fails immediately. I expected something way more complex, but it's that
simple...</p>
<p>Using the <tt class="docutils literal">test_aiosend.py</tt> script that I created, I was finally able to
repeat the test in a loop. Thanks to that, it became obvious using
<tt class="docutils literal">tracemalloc</tt> that the leaked memory was the memory passed to <tt class="docutils literal">WSASend()</tt>.</p>
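<p>That <tt class="docutils literal">tracemalloc</tt> workflow can be sketched as follows, with a fake
leaky function standing in for the real test (names are illustrative, not from
<tt class="docutils literal">test_aiosend.py</tt>):</p>
<pre class="literal-block">
import tracemalloc

kept = []

def leaky_operation():
    # simulated bug: every call keeps a 64 KiB buffer alive forever
    kept.append(b'x' * (64 * 1024))

tracemalloc.start()
before = tracemalloc.take_snapshot()
for _ in range(100):  # repeating the test amplifies the leak above the noise
    leaky_operation()
after = tracemalloc.take_snapshot()

# the biggest difference points at the allocation site of the leak
top_stat = after.compare_to(before, 'lineno')[0]
print(top_stat)
</pre>
<p>Once the leak dominates the snapshot diff, the top statistic names the file
and line of the allocation, which is exactly how the <tt class="docutils literal">WSASend()</tt>
buffer was identified.</p>

```python
import tracemalloc

kept = []

def leaky_operation():
    # simulated bug: every call keeps a 64 KiB buffer alive forever
    kept.append(b'x' * (64 * 1024))

tracemalloc.start()
before = tracemalloc.take_snapshot()
for _ in range(100):  # repeating the test amplifies the leak above the noise
    leaky_operation()
after = tracemalloc.take_snapshot()

# the biggest difference points at the allocation site of the leak
top_stat = after.compare_to(before, 'lineno')[0]
print(top_stat)
```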
<p>I pushed <a class="reference external" href="https://github.com/python/cpython/commit/a234e148394c2c7419372ab65b773d53a57f3625">commit a234e148</a>
to fix <tt class="docutils literal">WSASend()</tt>:</p>
<pre class="literal-block">
commit a234e148394c2c7419372ab65b773d53a57f3625
Author: Victor Stinner <vstinner@redhat.com>
Date:   Tue Jan 8 14:23:09 2019 +0100

    bpo-32710: Fix leak in Overlapped_WSASend() (GH-11469)

    Fix a memory leak in asyncio in the ProactorEventLoop when ReadFile()
    or WSASend() overlapped operation fail immediately: release the
    internal buffer.
</pre>
<p>I was very disappointed by the simplicity of the fix, <strong>it only adds a single
line</strong>:</p>
<pre class="literal-block">
diff --git a/Modules/overlapped.c b/Modules/overlapped.c
index 69875a7f37da..bbaa4fb3008f 100644
--- a/Modules/overlapped.c
+++ b/Modules/overlapped.c
@@ -1011,6 +1012,7 @@ Overlapped_WSASend(OverlappedObject *self, PyObject *args)
         case ERROR_IO_PENDING:
             Py_RETURN_NONE;
         default:
+            PyBuffer_Release(&self->user_buffer);
             self->type = TYPE_NOT_STARTED;
             return SetFromWindowsErr(err);
     }
</pre>
<p>So what? One year to add a single line? That's unfair!</p>
<p>My commit contains a very similar fix for <tt class="docutils literal">do_ReadFile()</tt> used by
<tt class="docutils literal">Overlapped_ReadFile()</tt> and <tt class="docutils literal">Overlapped_ReadFileInto()</tt>.</p>
</div>
<div class="section" id="fixing-more-memory-leaks">
<h2>Fixing more memory leaks</h2>
<p>By the way, the <tt class="docutils literal">_overlapped.Overlapped</tt> type has no traverse function: adding
one may help the garbage collector. Asyncio is famous for building reference
cycles by design in <tt class="docutils literal">Future.set_exception()</tt>.</p>
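<p>The cycle comes from the exception's traceback: it references the frame where
the exception was raised, and that frame's locals reference the future back. A
minimal model of the mechanism (using a stand-in class rather than a real
<tt class="docutils literal">asyncio.Future</tt>):</p>
<pre class="literal-block">
import gc
import weakref

class FakeFuture:
    # stand-in for asyncio.Future, which stores the exception the same way
    def set_exception(self, exc):
        self._exception = exc

def make_future():
    fut = FakeFuture()
    try:
        raise ValueError("boom")
    except ValueError as exc:
        # exc.__traceback__ references this frame, whose locals include fut:
        # fut -> exc -> traceback -> frame -> fut is a reference cycle
        fut.set_exception(exc)
    return fut

gc.disable()
ref = weakref.ref(make_future())
# the cycle keeps the future alive: only the cyclic GC can reclaim it
assert ref() is not None
gc.collect()
assert ref() is None
gc.enable()
</pre>
<p>This is why a missing <tt class="docutils literal">tp_traverse</tt> matters: without it, the garbage
collector cannot see such cycles through a C object.</p>

```python
import gc
import weakref

class FakeFuture:
    # stand-in for asyncio.Future, which stores the exception the same way
    def set_exception(self, exc):
        self._exception = exc

def make_future():
    fut = FakeFuture()
    try:
        raise ValueError("boom")
    except ValueError as exc:
        # exc.__traceback__ references this frame, whose locals include fut:
        # fut -> exc -> traceback -> frame -> fut is a reference cycle
        fut.set_exception(exc)
    return fut

gc.disable()
ref = weakref.ref(make_future())
# the cycle keeps the future alive: only the cyclic GC can reclaim it
alive_before_collect = ref() is not None
gc.collect()
alive_after_collect = ref() is not None
gc.enable()
```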
<p>I wrote <a class="reference external" href="https://github.com/python/cpython/pull/11489">PR 11489</a> to implement
<tt class="docutils literal">tp_traverse</tt> for the <tt class="docutils literal">_overlapped.Overlapped</tt> type. <a class="reference external" href="https://github.com/python/cpython/pull/11489#pullrequestreview-191093765">Serhiy Storchaka
added</a>:</p>
<blockquote>
I suspect that there are leaks when self->type is set to TYPE_NOT_STARTED.</blockquote>
<p>And he was right! I modified my PR to fix all memory leaks. After my PR has
been reviewed, I merged it, <a class="reference external" href="https://github.com/python/cpython/commit/5485085b324a45307c1ff4ec7d85b5998d7d5e0d">commit 5485085b</a>:</p>
<pre class="literal-block">
commit 5485085b324a45307c1ff4ec7d85b5998d7d5e0d
Author: Victor Stinner <vstinner@redhat.com>
Date:   Fri Jan 11 14:35:14 2019 +0100

    bpo-32710: Fix _overlapped.Overlapped memory leaks (GH-11489)

    Fix memory leaks in asyncio ProactorEventLoop on overlapped operation
    failures.

    Changes:

    * Implement the tp_traverse slot in the _overlapped.Overlapped type
      to help to break reference cycles and identify referrers in the
      garbage collector.
    * Always clear overlapped on failure: not only set type to
      TYPE_NOT_STARTED, but release also resources.
</pre>
</div>
<div class="section" id="regression-nope">
<h2>Regression? Nope</h2>
<p>Was the memory leak a regression? Nope. The bug existed since the creation of
the <tt class="docutils literal">overlapped.c</tt> file in the "Tulip" project in 2013, <a class="reference external" href="https://github.com/python/asyncio/commit/27c403531670f52cad8388aaa2a13a658f753fd5">commit 27c40353</a>:</p>
<pre class="literal-block">
commit 27c403531670f52cad8388aaa2a13a658f753fd5
Author: Richard Oudkerk <shibturn@gmail.com>
Date:   Mon Jan 21 20:34:38 2013 +0000

    New experimental iocp branch.
</pre>
<p>Tulip was the old name of the asyncio project, when it was still an external
project on <tt class="docutils literal">code.google.com</tt>. In the meanwhile, <tt class="docutils literal">code.google.com</tt> has been
closed and the project moved to <a class="reference external" href="https://github.com/python/asyncio/">https://github.com/python/asyncio/</a> (now
read-only).</p>
<p><a class="reference external" href="https://github.com/python/asyncio/blob/27c403531670f52cad8388aaa2a13a658f753fd5/overlapped.c#L632-L658">Extract of the original Overlapped_WSASend() implementation</a>,
I added a comment to show the location of the bug:</p>
<pre class="literal-block">
if (!PyArg_Parse(bufobj, "y*", &self->write_buffer))
return NULL;
#if SIZEOF_SIZE_T > SIZEOF_LONG
if (self->write_buffer.len > (Py_ssize_t)PY_ULONG_MAX) {
PyBuffer_Release(&self->write_buffer);
PyErr_SetString(PyExc_ValueError, "buffer to large");
return NULL;
}
#endif
...
self->error = err = (ret < 0 ? WSAGetLastError() : ERROR_SUCCESS);
switch (err) {
case ERROR_SUCCESS:
case ERROR_MORE_DATA:
case ERROR_IO_PENDING:
/********* !!! BUG HERE, BUFFER NOT RELEASED !!! ***********/
Py_RETURN_NONE;
...
}
</pre>
<p><strong>I fixed the memory leak 6 years after the code had been written!</strong></p>
<p>So... why was this bug only discovered in 2018? Multiple very old asyncio bugs
were only discovered recently thanks to more realistic and more advanced
<strong>functional tests</strong>. The first asyncio tests were mostly tiny unit tests
mocking most parts of the code. It made sense in the early days of asyncio, when
the code was not mature.</p>
<p>By the way, the <a class="reference external" href="https://github.com/python/cpython/blob/1f58f4fa6a0e3c60cee8df4a35c8dcf3903acde8/Lib/test/test_asyncio/test_sendfile.py#L446-L457">code of the test</a>
which helped to discover the bug is:</p>
<pre class="literal-block">
def test_sendfile_close_peer_in_the_middle_of_receiving(self):
    srv_proto, cli_proto = self.prepare_sendfile(close_after=1024)
    with self.assertRaises(ConnectionError):
        self.run_loop(
            self.loop.sendfile(cli_proto.transport, self.file))
    self.run_loop(srv_proto.done)

    self.assertTrue(1024 <= srv_proto.nbytes < len(self.DATA),
                    srv_proto.nbytes)
    self.assertTrue(1024 <= self.file.tell() < len(self.DATA),
                    self.file.tell())
    self.assertTrue(cli_proto.transport.is_closing())
</pre>
<p>Note: The test name has been made even longer in the meanwhile (add "the") :-)</p>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>For such complex bugs, <strong>a reliable debugging method is to remove as much code as
possible</strong> to reduce the number of lines of code that must be read.
<tt class="docutils literal">tracemalloc</tt> remains efficient at identifying a memory leak once a test can be
run in a loop to make the leak more obvious (I was blocked at the beginning
because the test failed when run a second time in a loop).</p>
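<p>The "run it in a loop and amplify the leak" trick can be reduced to a small
helper. This is an illustrative sketch (the <tt class="docutils literal">leaked_blocks()</tt> name is
made up); <tt class="docutils literal">sys.getallocatedblocks()</tt> is the counter that regrtest's
<tt class="docutils literal"><span class="pre">-R</span></tt> option relies on for "memory blocks":</p>
<pre class="literal-block">
import gc
import sys

def leaked_blocks(func, repeat=100):
    """Return how many memory blocks `repeat` calls of func() left behind."""
    # warmup runs, so one-time caches are not counted as leaks
    for _ in range(repeat):
        func()
    gc.collect()
    before = sys.getallocatedblocks()
    for _ in range(repeat):
        func()
    gc.collect()
    return sys.getallocatedblocks() - before

kept = []
no_leak = leaked_blocks(lambda: None)
leak = leaked_blocks(lambda: kept.append(object()))
print(no_leak, leak)  # the leaky function keeps roughly one block per call
</pre>
<p>A real refleak hunt must also deal with warmup noise and caches, which is
exactly why the first runs of a test often report a few spurious "leaked"
blocks.</p>

```python
import gc
import sys

def leaked_blocks(func, repeat=100):
    """Return how many memory blocks `repeat` calls of func() left behind."""
    # warmup runs, so one-time caches are not counted as leaks
    for _ in range(repeat):
        func()
    gc.collect()
    before = sys.getallocatedblocks()
    for _ in range(repeat):
        func()
    gc.collect()
    return sys.getallocatedblocks() - before

kept = []
no_leak = leaked_blocks(lambda: None)
leak = leaked_blocks(lambda: kept.append(object()))
print(no_leak, leak)  # the leaky function keeps roughly one block per call
```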
<p>Lessons learned? You should try to <strong>investigate every single failure of your
CI</strong>. It is important to have a test suite with functional tests. "Mock tests"
are fine to quickly write reliable tests, but they are not enough: functional
tests make the difference.</p>
<p>Thanks <strong>Richard Oudkerk</strong> for your great code to use Windows native APIs in
<strong>asyncio</strong> and <strong>multiprocessing</strong>! I like <a class="reference external" href="https://en.wikipedia.org/wiki/Input/output_completion_port">Windows IOCP</a>, even if the
asyncio implementation is quite complex :-)</p>
<p>Ok, <tt class="docutils literal">_overlapped.Overlapped</tt> should now have a few less memory leaks :-)</p>
</div>
asyncio: WSARecv() cancellation causing data loss2019-01-31T15:20:00+01:002019-01-31T15:20:00+01:00Victor Stinnertag:vstinner.github.io,2019-01-31:/asyncio-proactor-wsarecv-cancellation-data-loss.html<a class="reference external image-reference" href="https://www.flickr.com/photos/joybot/6026542856/"><img alt="Unlocked lock" src="https://vstinner.github.io/images/lock.jpg" /></a>
<p>In December 2017, <strong>Yury Selivanov</strong> pushed the long awaited <tt class="docutils literal">start_tls()</tt>
function.</p>
<p>A newly added test failed on Windows. Later, the test started to fail
randomly on Linux as well. In fact, it was a well hidden race condition in the
asynchronous handshake of <tt class="docutils literal">SSLProtocol</tt> which will take 5 months of work to
be identified and fixed. The bug wasn't a recent regression, but only spotted
thanks to newly added tests.</p>
<p>Even after this bug has been fixed, the same test still failed randomly on
Windows! Once I found how to reproduce the bug, I understood that it's a <strong>very
scary bug</strong>: <tt class="docutils literal">WSARecv()</tt> cancellation randomly caused <strong>data loss</strong>! Again,
it was a very well hidden bug which likely existing since the early days of the
<tt class="docutils literal">ProactorEventLoop</tt> implementation.</p>
<p>Previous article: <a class="reference external" href="https://vstinner.github.io/asyncio-proactor-connect-pipe-race-condition.html">Asyncio: Proactor ConnectPipe() Race Condition</a>.
Next article: <a class="reference external" href="https://vstinner.github.io/asyncio-proactor-wsasend-memory-leak.html">asyncio: WSASend() memory leak</a>.</p>
<div class="section" id="new-start-tls-function">
<h2>New start_tls() function</h2>
<p>The "starttls" feature has been requested since the creation of asyncio. On
October 24, 2013, <strong>Guido van Rossum</strong> created <a class="reference external" href="https://github.com/python/asyncio/issues/79">asyncio issue #79</a>:</p>
<blockquote>
<strong>Glyph [Lefkowitz]</strong> and <strong>Antoine [Pitrou]</strong> really want a API to upgrade an
existing Transport/Protocol pair to SSL/TLS, without having to create a new
protocol.</blockquote>
<p>On March 23, 2015, <strong>Giovanni Cannata</strong> created <a class="reference external" href="https://bugs.python.org/issue23749">bpo-23749</a>, which is basically the same feature
request. I <a class="reference external" href="https://bugs.python.org/issue23749#msg239022">replied</a>:</p>
<blockquote>
asyncio got a new SSL implementation which makes possible to implement
STARTTLS. Are you interested to implement it?</blockquote>
<p><strong>Elizabeth Myers</strong>, <strong>Antoine Pitrou</strong>, <strong>Guido van Rossum</strong> and
<strong>Yury Selivanov</strong> designed the feature. Yury <a class="reference external" href="https://bugs.python.org/issue23749#msg253495">wrote a prototype</a> in 2015 for PostgreSQL. In
2017, <strong>Barry Warsaw</strong> <a class="reference external" href="https://bugs.python.org/issue23749#msg293912">wrote his own implementation for SMTP</a>.</p>
<p>At the end of 2017, <strong>four years</strong> after Guido van Rossum created the feature
request, <strong>Yury Selivanov</strong> implemented the feature and pushed the <a class="reference external" href="https://github.com/python/cpython/commit/f111b3dcb414093a4efb9d74b69925e535ddc470">commit
f111b3dc</a>:</p>
<pre class="literal-block">
commit f111b3dcb414093a4efb9d74b69925e535ddc470
Author: Yury Selivanov <yury@magic.io>
Date:   Sat Dec 30 00:35:36 2017 -0500

    bpo-23749: Implement loop.start_tls() (#5039)
</pre>
</div>
<div class="section" id="sslprotocol-race-condition">
<h2>SSLProtocol Race Condition</h2>
<div class="section" id="test-fails-on-appveyor-windows-temporary-fix">
<h3>Test fails on AppVeyor (Windows): temporary fix</h3>
<p>On December 30, 2017, just after Yury pushed his implementation of
<tt class="docutils literal">start_tls()</tt> (the same day), <strong>Antoine Pitrou</strong> reported <a class="reference external" href="https://bugs.python.org/issue32458">bpo-32458</a>: test_asyncio seems to fail
sporadically on AppVeyor:</p>
<pre class="literal-block">
ERROR: test_start_tls_server_1 (test.test_asyncio.test_sslproto.ProactorStartTLS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\projects\cpython\lib\test\test_asyncio\test_sslproto.py", line 284, in test_start_tls_server_1
    asyncio.wait_for(main(), loop=self.loop, timeout=10))
  File "C:\projects\cpython\lib\asyncio\base_events.py", line 440, in run_until_complete
    return future.result()
  File "C:\projects\cpython\lib\asyncio\tasks.py", line 398, in wait_for
    raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
</pre>
<p><strong>Yury Selivanov</strong> <a class="reference external" href="https://bugs.python.org/issue32458#msg309254">wrote</a>:</p>
<blockquote>
I'm leaving on a two-weeks vacation today. To avoid risking breaking the workflow, I'll mask this tests on AppVeyor. I'll investigate this when I get back.</blockquote>
<p>and skipped the test as a <strong>temporary fix</strong>, <a class="reference external" href="https://github.com/python/cpython/commit/0c36bed1c46d07ef91d3e02e69e974e4f3ecd31a">commit 0c36bed1</a>:</p>
<pre class="literal-block">
commit 0c36bed1c46d07ef91d3e02e69e974e4f3ecd31a
Author: Yury Selivanov <yury@magic.io>
Date:   Sat Dec 30 15:40:20 2017 -0500

    bpo-32458: Temporarily mask start-tls proactor test on Windows (#5054)
</pre>
</div>
<div class="section" id="bug-reproduced-on-linux">
<h3>Bug reproduced on Linux</h3>
<p>On May 23, 2018, five months after the bug had been reported, <a class="reference external" href="https://bugs.python.org/issue32458#msg317468">I wrote</a>:</p>
<blockquote>
test_start_tls_server_1() just failed on my Linux. It likely depends on the system load.</blockquote>
<p>Christian Heimes <a class="reference external" href="https://bugs.python.org/issue32458#msg317760">added</a>:</p>
<blockquote>
[On Linux,] It's failing reproducible with OpenSSL 1.1.1 and TLS 1.3
enabled. I haven't seen it failing with TLS 1.2 yet.</blockquote>
<p>On May 28, 2018, I found a reliable way to <a class="reference external" href="https://bugs.python.org/issue32458#msg317833">reproduce the issue on Linux</a>:</p>
<blockquote>
<p>Open 3 terminals and run these commands in parallel:</p>
<ol class="arabic simple">
<li><tt class="docutils literal">./python <span class="pre">-m</span> test test_asyncio <span class="pre">-m</span> test_start_tls_server_1 <span class="pre">-F</span></tt></li>
<li><tt class="docutils literal">./python <span class="pre">-m</span> test <span class="pre">-j16</span> <span class="pre">-r</span></tt></li>
<li><tt class="docutils literal">./python <span class="pre">-m</span> test <span class="pre">-j16</span> <span class="pre">-r</span></tt></li>
</ol>
<p>It's a <strong>race condition</strong> which doesn't depend on the OS, but on the system
load.</p>
</blockquote>
</div>
<div class="section" id="root-issue-identified">
<h3>Root issue identified</h3>
<p>Once I found how to reproduce the bug, I was able to investigate it. I created
<a class="reference external" href="https://bugs.python.org/issue33674">bpo-33674</a>.</p>
<p>I found a race condition in <tt class="docutils literal">SSLProtocol</tt> of <tt class="docutils literal">asyncio/sslproto.py</tt>.
Sometimes, <tt class="docutils literal">_sslpipe.feed_ssldata()</tt> is called before
<tt class="docutils literal">_sslpipe.shutdown()</tt>.</p>
<ul class="simple">
<li><tt class="docutils literal">SSLProtocol.connection_made()</tt> -> <tt class="docutils literal">SSLProtocol._start_handshake()</tt>: <tt class="docutils literal">self._loop.call_soon(self._process_write_backlog)</tt></li>
<li><tt class="docutils literal">SSLProtocol.data_received()</tt>: direct call to <tt class="docutils literal">self._sslpipe.feed_ssldata(data)</tt></li>
<li>Later, <tt class="docutils literal">self._process_write_backlog()</tt> calls <tt class="docutils literal">self._sslpipe.do_handshake()</tt></li>
</ul>
<p>The first <strong>write</strong> is <strong>delayed</strong> by <tt class="docutils literal">call_soon()</tt>, whereas the first
<strong>read</strong> is a <strong>direct call</strong> to the SSL pipe.</p>
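<p>This ordering inversion can be reproduced with a few lines of asyncio,
independent of SSL (a minimal illustration, not the actual
<tt class="docutils literal">SSLProtocol</tt> code):</p>
<pre class="literal-block">
import asyncio

events = []

async def main():
    loop = asyncio.get_running_loop()
    # the "handshake" is scheduled for a later loop iteration...
    loop.call_soon(events.append, 'handshake (call_soon)')
    # ...while "data_received" is invoked directly, so it runs first
    events.append('data_received (direct call)')
    # yield once to let the scheduled callback run
    await asyncio.sleep(0)

asyncio.run(main())
print(events)
# ['data_received (direct call)', 'handshake (call_soon)']
</pre>
<p>The direct call always wins the race: that is exactly how
<tt class="docutils literal">data_received()</tt> could find the protocol before the handshake had
started.</p>

```python
import asyncio

events = []

async def main():
    loop = asyncio.get_running_loop()
    # the "handshake" is scheduled for a later loop iteration...
    loop.call_soon(events.append, 'handshake (call_soon)')
    # ...while "data_received" is invoked directly, so it runs first
    events.append('data_received (direct call)')
    # yield once to let the scheduled callback run
    await asyncio.sleep(0)

asyncio.run(main())
print(events)
# ['data_received (direct call)', 'handshake (call_soon)']
```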
<p>Workaround:</p>
<pre class="literal-block">
diff --git a/Lib/asyncio/sslproto.py b/Lib/asyncio/sslproto.py
index 2bfa45dd15..4a5dbb38a1 100644
--- a/Lib/asyncio/sslproto.py
+++ b/Lib/asyncio/sslproto.py
@@ -592,7 +592,7 @@ class SSLProtocol(protocols.Protocol):
         # (b'', 1) is a special value in _process_write_backlog() to do
         # the SSL handshake
         self._write_backlog.append((b'', 1))
-        self._loop.call_soon(self._process_write_backlog)
+        self._process_write_backlog()
         self._handshake_timeout_handle = \
             self._loop.call_later(self._ssl_handshake_timeout,
                                   self._check_handshake_timeout)
</pre>
<p>Yury Selivanov wrote:</p>
<blockquote>
<p><strong>The fix is correct and the bug is now obvious</strong>: <tt class="docutils literal">data_received()</tt> occurs
pretty much any time after <tt class="docutils literal">connection_made()</tt> call; if <tt class="docutils literal">call_soon()</tt> is
used in <tt class="docutils literal">connection_made()</tt>, <tt class="docutils literal">data_received()</tt> may find the protocol in
an incorrect state.</p>
<p><strong>Kudos Victor for debugging this.</strong></p>
</blockquote>
<p>I pushed <a class="reference external" href="https://github.com/python/cpython/commit/be00a5583a2cb696335c527b921d1868266a42c6">commit be00a558</a>:</p>
<pre class="literal-block">
commit be00a5583a2cb696335c527b921d1868266a42c6
Author: Victor Stinner <vstinner@redhat.com>
Date:   Tue May 29 01:33:35 2018 +0200

    bpo-33674: asyncio: Fix SSLProtocol race (GH-7175)

    Fix a race condition in SSLProtocol.connection_made() of
    asyncio.sslproto: start immediately the handshake instead of using
    call_soon(). Previously, data_received() could be called before the
    handshake started, causing the handshake to hang or fail.
</pre>
<p>... the change is basically a single line change:</p>
<pre class="literal-block">
-        self._loop.call_soon(self._process_write_backlog)
+        self._process_write_backlog()
</pre>
<p>I closed <a class="reference external" href="https://bugs.python.org/issue32458">bpo-32458</a> and <strong>Yury
Selivanov</strong> closed <a class="reference external" href="https://bugs.python.org/issue33674">bpo-33674</a>.</p>
</div>
<div class="section" id="not-a-regression">
<h3>Not a regression</h3>
<p>The SSLProtocol race condition wasn't new: it existed since January 2015,
<a class="reference external" href="https://github.com/python/cpython/commit/231b404cb026649d4b7172e75ac394ef558efe60">commit 231b404c</a>:</p>
<pre class="literal-block">
commit 231b404cb026649d4b7172e75ac394ef558efe60
Author: Victor Stinner <victor.stinner@gmail.com>
Date:   Wed Jan 14 00:19:09 2015 +0100

    Issue #22560: New SSL implementation based on ssl.MemoryBIO

    The new SSL implementation is based on the new ssl.MemoryBIO which is only
    available on Python 3.5. On Python 3.4 and older, the legacy SSL implementation
    (using SSL_write, SSL_read, etc.) is used. The proactor event loop only
    supports the new implementation.

    The new asyncio.sslproto module adds _SSLPipe, SSLProtocol and
    _SSLProtocolTransport classes. _SSLPipe allows to "wrap" or "unwrap" a socket
    (switch between cleartext and SSL/TLS).

    Patch written by Antoine Pitrou. sslproto.py is based on gruvi/ssl.py of the
    gruvi project written by Geert Jansen.

    This change adds SSL support to ProactorEventLoop on Python 3.5 and newer!

    It becomes also possible to implement STARTTTLS: switch a cleartext socket to
    SSL.
</pre>
<p>This is the new cool asynchronous SSL implementation written by <strong>Antoine
Pitrou</strong> and <strong>Geert Jansen</strong>. It took <strong>3 years</strong> and <strong>new functional tests</strong>
to discover the race condition.</p>
</div>
</div>
<div class="section" id="wsarecv-cancellation-causing-data-loss">
<h2>WSARecv() cancellation causing data loss</h2>
<div class="section" id="yet-another-very-boring-buildbot-test-failure">
<h3>Yet another very boring buildbot test failure</h3>
<p>On May 30, 2018, the day after I fixed the SSLProtocol race condition, I created
<a class="reference external" href="https://bugs.python.org/issue33694">bpo-33694</a>.</p>
<p>test_asyncio.test_start_tls_server_1() got multiple fixes recently (see
<a class="reference external" href="https://bugs.python.org/issue32458">bpo-32458</a> and <a class="reference external" href="https://bugs.python.org/issue33674">bpo-33674</a>)... but it still fails on the x86
Windows7 3.x buildbot at revision bb9474f1fb2fc7c7ed9f826b78262d6a12b5f9e8 which
contains all these fixes.</p>
<p>The test fails even when test_asyncio is re-run alone (not when other tests run
in parallel).</p>
<p>Example of failure:</p>
<pre class="literal-block">
ERROR: test_start_tls_server_1 (test.test_asyncio.test_sslproto.ProactorStartTLSTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "...\lib\test\test_asyncio\test_sslproto.py", line 467, in test_start_tls_server_1
    self.loop.run_until_complete(run_main())
  File "...\lib\asyncio\base_events.py", line 566, in run_until_complete
    raise RuntimeError('Event loop stopped before Future completed.')
RuntimeError: Event loop stopped before Future completed.
</pre>
<p>The test also fails on x86 Windows7 3.7. Moreover, 3.7 got an additional failure:</p>
<pre class="literal-block">
ERROR: test_pipe_handle (test.test_asyncio.test_windows_utils.PipeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "...\lib\test\test_asyncio\test_windows_utils.py", line 73, in test_pipe_handle
    raise RuntimeError('expected ERROR_INVALID_HANDLE')
RuntimeError: expected ERROR_INVALID_HANDLE
</pre>
</div>
<div class="section" id="unable-to-reproduce-the-bug">
<h3>Unable to reproduce the bug</h3>
<p><strong>Yury Selivanov</strong> <a class="reference external" href="https://bugs.python.org/issue33694#msg318193">failed to reproduce the issue</a> in Windows 7 VM (on macOS) using:</p>
<ol class="arabic simple">
<li>run <tt class="docutils literal">test_asyncio</tt></li>
<li>run <tt class="docutils literal">test_asyncio.test_sslproto</tt></li>
<li>run <tt class="docutils literal">test_asyncio.test_sslproto <span class="pre">-m</span> test_start_tls_server_1</tt></li>
</ol>
<p><strong>Andrew Svetlov</strong> <a class="reference external" href="https://bugs.python.org/issue33694#msg318194">added</a>:</p>
<blockquote>
I used <tt class="docutils literal">SNDBUF</tt> to enforce send buffer overloading. It is not required by
sendfile tests but I thought that better to have non-mocked way to test such
situations. We can remove the socket buffers size manipulation at all
without any problem.</blockquote>
<p>But Yury Selivanov <a class="reference external" href="https://bugs.python.org/issue33694#msg318195">replied</a>:</p>
<blockquote>
When I tried to do that I think <strong>I was having more failures</strong> with that
test. But really up to you.</blockquote>
<p>Over the next days, I reported more and more similar failures on Windows buildbots and
AppVeyor (our Windows CI).</p>
</div>
<div class="section" id="root-issue-identified-pause-reading">
<h3>Root issue identified: pause_reading()</h3>
<p>Since this bug became more and more frequent, I decided to work on it. Yury and
Andrew failed to reproduce it.</p>
<p>On June 7, 2018, I managed to <strong>reproduce the bug on Linux</strong> by <a class="reference external" href="https://bugs.python.org/issue33694#msg318869">inserting a
sleep at the right place</a>...
One hour later, I understood that my patch was wrong: "it introduces a bug in
the test".</p>
<p>On the other hand, I found the root cause: calling <tt class="docutils literal">pause_reading()</tt> and
<tt class="docutils literal">resume_reading()</tt> on the transport is not safe. Sometimes, we lose data.
See the <strong>ugly hack</strong> described in the TODO comment below:</p>
<pre class="literal-block">
class _ProactorReadPipeTransport(_ProactorBasePipeTransport,
                                 transports.ReadTransport):
    """Transport for read pipes."""

    (...)

    def pause_reading(self):
        if self._closing or self._paused:
            return
        self._paused = True

        if self._read_fut is not None and not self._read_fut.done():
            # TODO: This is an ugly hack to cancel the current read future
            # *and* avoid potential race conditions, as read cancellation
            # goes through `future.cancel()` and `loop.call_soon()`.
            # We then use this special attribute in the reader callback to
            # exit *immediately* without doing any cleanup/rescheduling.
            self._read_fut.__asyncio_cancelled_on_pause__ = True

            self._read_fut.cancel()
            self._read_fut = None
            self._reschedule_on_resume = True

        if self._loop.get_debug():
            logger.debug("%r pauses reading", self)
</pre>
<p>If you remove the "ugly hack", the test no longer hangs...</p>
<p>Extract of <tt class="docutils literal">_ProactorReadPipeTransport.set_transport()</tt>:</p>
<pre class="literal-block">
if self.is_reading():
    # reset reading callback / buffers / self._read_fut
    self.pause_reading()
    self.resume_reading()
</pre>
<p>This method <strong>cancels the pending overlapped</strong> <tt class="docutils literal">WSARecv()</tt>, and then creates
a new overlapped <tt class="docutils literal">WSARecv()</tt>.</p>
<p>Even after <tt class="docutils literal">CancelIoEx(old overlapped)</tt>, the IOCP loop still gets an event
for the completion of the cancelled overlapped <tt class="docutils literal">WSARecv()</tt>. Problem: <strong>since
the Python future is cancelled, the event is ignored and so 176 bytes of data
are lost</strong>.</p>
<p>I'm surprised that an overlapped <tt class="docutils literal">WSARecv()</tt> <strong>cancelled</strong> by
<tt class="docutils literal">CancelIoEx()</tt> still returns data when IOCP polls for events.</p>
<p>Something else. The bug occurs when <tt class="docutils literal">CancelIoEx()</tt> (on the current overlapped
<tt class="docutils literal">WSARecv()</tt>) fails internally with <tt class="docutils literal">ERROR_NOT_FOUND</tt>. According to
overlapped.c, it means:</p>
<pre class="literal-block">
/* CancelIoEx returns ERROR_NOT_FOUND if the I/O completed in-between */
</pre>
<p><tt class="docutils literal">HasOverlappedIoCompleted()</tt> returns 0 in that case.</p>
<p>The problem is that currently, <tt class="docutils literal">Overlapped.cancel()</tt> also returns <tt class="docutils literal">None</tt> in
that case, and later the asyncio IOCP loop ignores the completion event and so
<strong>drops incoming received data</strong>.</p>
</div>
<div class="section" id="release-blocker-bug">
<h3>Release blocker bug?</h3>
<p>Yury, Andrew, Ned: I set the priority to release blocker because I'm scared by
what I saw. START TLS has a race condition in its ProactorEventLoop
implementation. But the bug doesn't seem to be specific to START TLS, but rather
to <tt class="docutils literal">transport.set_protocol()</tt>, and even more generally to
<tt class="docutils literal">transport.pause_reading()</tt> / <tt class="docutils literal">transport.resume_reading()</tt>. The bug is quite
severe: we lose data and it's really hard to know why (I spent a few hours
adding many print statements and trying to reproduce it in a tiny reliable unit
test). As an asyncio user, I expect transports to be 100% reliable, and I would
first look into my own code (like the <tt class="docutils literal">start_tls()</tt> implementation in my case).</p>
<p>If the bug were specific to <tt class="docutils literal">start_tls()</tt>, I would suggest "just"
disabling start_tls() on ProactorEventLoop (sorry, Windows!). But since the
data loss seems to affect basically any application using
<tt class="docutils literal">ProactorEventLoop</tt>, I don't see any simple workaround.</p>
<p><strong>My hope is that a fix can be written shortly</strong> to not block the 3.7.0 final
release for too long :-(</p>
<p>Yury, Andrew: Can you please just confirm that it's a regression and that a
release blocker is justified?</p>
</div>
<div class="section" id="functional-test-reproducing-the-bug">
<h3>Functional test reproducing the bug</h3>
<p>I wrote the <a class="reference external" href="https://bugs.python.org/file47632/race.py">race.py script</a>: a simple
echo client and server sending packets in both directions. It pauses/resumes
reading on the client transport every 100 ms to trigger the bug.</p>
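<p>The approach can be sketched as a minimal, self-contained echo test. This is a
hypothetical simplification of race.py (function name and parameters are mine, not
from the script): an echo server, a client sending data, and periodic
<tt class="docutils literal">pause_reading()</tt> / <tt class="docutils literal">resume_reading()</tt> calls
stressing the read path. On a selector event loop no data is lost, which is exactly
what the final check verifies:</p>

```python
import asyncio

async def run_echo_test(npackets=20, size=1024):
    # Minimal sketch in the spirit of race.py: echo the client's bytes
    # back while repeatedly pausing/resuming reading on the client
    # transport (race.py toggles every 100 ms; we toggle per read).
    total = npackets * size

    async def echo(reader, writer):
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
        writer.close()

    server = await asyncio.start_server(echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    transport = writer.transport

    writer.write(b"x" * total)
    await writer.drain()

    received = 0
    while received < total:
        # pause/resume reading around each read to stress the transport
        transport.pause_reading()
        await asyncio.sleep(0.001)
        transport.resume_reading()
        chunk = await reader.read(65536)
        if not chunk:
            break
        received += len(chunk)

    writer.close()
    server.close()
    await server.wait_closed()
    return received, total
```

<p>On the buggy <tt class="docutils literal">ProactorEventLoop</tt> of Python 3.7, a test like this could end
with <tt class="docutils literal">received &lt; total</tt>: bytes silently dropped by the cancelled read.</p>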
<p>Using <tt class="docutils literal">ProactorEventLoop</tt> and 2000 packets of 16 KiB, I can easily reproduce
the bug.</p>
<p>So again, it's not related to <tt class="docutils literal">start_tls()</tt>: <tt class="docutils literal">start_tls()</tt> was just one
way to spot the bug.</p>
<p>The bug is in the Proactor transport: the cancellation of an overlapped <tt class="docutils literal">WSARecv()</tt>
sometimes drops packets. The bug occurs when <tt class="docutils literal">CancelIoEx()</tt> fails with
<tt class="docutils literal">ERROR_NOT_FOUND</tt>, which means that the I/O (<tt class="docutils literal">WSARecv()</tt>) completed.</p>
<p>One solution would be to not cancel <tt class="docutils literal">WSARecv()</tt> on pause_reading(): wait
until the current <tt class="docutils literal">WSARecv()</tt> completes, store data somewhere but don't pass
it to <tt class="docutils literal">protocol.data_received()</tt>, and don't schedule a new <tt class="docutils literal">WSARecv()</tt>.
Once reading is resumed: call <tt class="docutils literal">protocol.data_received()</tt> and schedule a new
<tt class="docutils literal">WSARecv()</tt>.</p>
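<p>The workaround can be sketched as a tiny state machine (hypothetical class and
method names, not the actual asyncio implementation): the pending read is never
cancelled; if it completes while reading is paused, the data is parked instead of
being dropped, and delivered on resume:</p>

```python
class PausableReader:
    # Hypothetical sketch of the workaround: never cancel the pending
    # read; park completed data while paused, deliver it on resume.
    def __init__(self, protocol):
        self._protocol = protocol
        self._paused = False
        self._pending_data = None

    def pause_reading(self):
        self._paused = True            # no cancellation: no data loss

    def resume_reading(self):
        self._paused = False
        if self._pending_data is not None:
            data, self._pending_data = self._pending_data, None
            self._protocol.data_received(data)

    def read_completed(self, data):
        # called when the (never cancelled) read finishes
        if self._paused:
            self._pending_data = data  # keep it for resume_reading()
        else:
            self._protocol.data_received(data)
```

<p>The key property: a read that completes during a pause is buffered, so pausing
and resuming can never lose bytes.</p>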
<p>That would be a workaround. I don't know how to really fix <tt class="docutils literal">WSARecv()</tt>
cancellation without losing data. A good start would be to modify
<tt class="docutils literal">Overlapped.cancel()</tt> to return a boolean indicating whether the overlapped
I/O completed even though we just cancelled it. Currently, the corner case
(<tt class="docutils literal">CancelIoEx()</tt> fails with <tt class="docutils literal">ERROR_NOT_FOUND</tt>) is silently ignored, and then
the IOCP loop silently ignores the event of the completed I/O...</p>
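<p>The suggested boolean can be illustrated with plain asyncio futures (a
hypothetical helper, not the real overlapped.c API; the real code would inspect
the <tt class="docutils literal">ERROR_NOT_FOUND</tt> result of <tt class="docutils literal">CancelIoEx()</tt>):</p>

```python
import asyncio

def cancel_pending_read(fut):
    # Hypothetical helper: report whether the I/O had already completed
    # when we tried to cancel it, so the caller can still deliver the
    # data instead of the event loop silently dropping it.
    if fut.done() and not fut.cancelled():
        return True    # completed in-between: data must be delivered
    fut.cancel()
    return False

async def demo():
    loop = asyncio.get_running_loop()
    completed = loop.create_future()
    completed.set_result(b"bytes read before the cancellation")
    pending = loop.create_future()
    return (cancel_pending_read(completed),
            cancel_pending_read(pending),
            pending.cancelled())
```

<p>A <tt class="docutils literal">True</tt> return is the corner case that was being silently ignored:
the read is done and its data must still reach the protocol.</p>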
</div>
<div class="section" id="fix-the-bug-no-longer-cancel-wsarecv">
<h3>Fix the bug: no longer cancel WSARecv()</h3>
<p>On June 8, 2018, I pushed <a class="reference external" href="https://github.com/python/cpython/commit/79790bc35fe722a49977b52647f9b5fe1deda2b7">commit 79790bc3</a>:</p>
<pre class="literal-block">
commit 79790bc35fe722a49977b52647f9b5fe1deda2b7
Author: Victor Stinner <vstinner@redhat.com>
Date: Fri Jun 8 00:25:52 2018 +0200
bpo-33694: Fix race condition in asyncio proactor (GH-7498)
The cancellation of an overlapped WSARecv() has a race condition
which causes data loss because of the current implementation of
proactor in asyncio.
No longer cancel overlapped WSARecv() in _ProactorReadPipeTransport
to work around the race condition.
Remove the optimized recv_into() implementation to get simple
implementation of pause_reading() using the single _pending_data
attribute.
Move _feed_data_to_bufferred_proto() to protocols.py.
Remove set_protocol() method which became useless.
</pre>
<p>I fixed the root issue (in Python 3.7 and the future Python 3.8).</p>
<p>I used my <tt class="docutils literal">race.py</tt> script to validate that the issue was fixed for real.</p>
</div>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>I fixed one race condition in the asynchronous handshake of <tt class="docutils literal">SSLProtocol</tt>.</p>
<p>I found and fixed a data loss bug caused by <tt class="docutils literal">WSARecv()</tt> cancellation.</p>
<p>Lessons learnt from these two bugs:</p>
<ul class="simple">
<li>You should <strong>write an extensive test suite</strong> for your code.</li>
<li>You should <strong>keep an eye on your continuous integration (CI)</strong>: any tiny test
failure can hide a very severe bug.</li>
</ul>
</div>
Asyncio: Proactor ConnectPipe() Race Condition2019-01-30T18:00:00+01:002019-01-30T18:00:00+01:00Victor Stinnertag:vstinner.github.io,2019-01-30:/asyncio-proactor-connect-pipe-race-condition.html<a class="reference external image-reference" href="https://www.flickr.com/photos/phrawr/7612947262/"><img alt="Pipes" src="https://vstinner.github.io/images/pipes.jpg" /></a>
<p>Between December 2014 and January 2015, once I succeeded to fix the root issue
of the random asyncio crashes on Windows (<a class="reference external" href="https://vstinner.github.io/asyncio-proactor-cancellation-from-hell.html">Proactor Cancellation From Hell</a>), I fixed more race conditions
and bugs in <tt class="docutils literal">ProactorEventLoop</tt>:</p>
<ul class="simple">
<li><tt class="docutils literal">ConnectPipe()</tt> Race Condition</li>
<li>Race Condition in <tt class="docutils literal">BaseSubprocessTransport._try_finish()</tt></li>
<li>Close the transport on failure: ResourceWarning</li>
<li>Cleanup code handling pipes</li>
</ul>
<p>Previous article: <a class="reference external" href="https://vstinner.github.io/asyncio-proactor-cancellation-from-hell.html">Proactor Cancellation From Hell</a>. Next article:
<a class="reference external" href="https://vstinner.github.io/asyncio-proactor-wsarecv-cancellation-data-loss.html">asyncio: WSARecv() cancellation causing data loss</a>.</p>
<div class="section" id="connectpipe-race-condition">
<h2>ConnectPipe() Race Condition</h2>
<p>Once I succeeded in fixing the root issue of the random asyncio crashes on Windows
(<a class="reference external" href="https://vstinner.github.io/asyncio-proactor-cancellation-from-hell.html">Proactor Cancellation From Hell</a>), I started to look at the
ConnectPipe special case: <a class="reference external" href="https://github.com/python/asyncio/issues/204">asyncio issue #204: Investigate
IocpProactor.accept_pipe() special case (don't register overlapped)</a> (issue created on 25 Aug
2014).</p>
<p>On January 21, 2015, I opened <a class="reference external" href="https://bugs.python.org/issue23293">bpo-23293: race condition related to
IocpProactor.connect_pipe()</a>.</p>
<p>While fixing <a class="reference external" href="https://bugs.python.org/issue23095">bpo-23095 (race condition when cancelling a _WaitHandleFuture)</a>, I saw that
<tt class="docutils literal">IocpProactor.connect_pipe()</tt> causes "GetQueuedCompletionStatus() returned an
unexpected event" messages to be logged, but also hangs the test suite.</p>
<p><tt class="docutils literal">IocpProactor._register()</tt> contains the comment:</p>
<pre class="literal-block">
# Even if GetOverlappedResult() was called, we have to wait for the
# notification of the completion in GetQueuedCompletionStatus().
# Register the overlapped operation to keep a reference to the
# OVERLAPPED object, otherwise the memory is freed and Windows may
# read uninitialized memory.
#
# For an unknown reason, ConnectNamedPipe() behaves differently:
# the completion is not notified by GetOverlappedResult() if we
# already called GetOverlappedResult(). For this specific case, we
# don't expect notification (register is set to False).
</pre>
<p><tt class="docutils literal">IocpProactor.close()</tt> contains this comment:</p>
<pre class="literal-block">
# The operation was started with connect_pipe() which
# queues a task to Windows' thread pool. This cannot
# be cancelled, so just forget it.
</pre>
<p><tt class="docutils literal">IocpProactor.connect_pipe()</tt> is implemented with <tt class="docutils literal">QueueUserWorkItem()</tt>
which <strong>starts a thread that cannot be interrupted</strong>. Because of that, this
function requires special cases in <tt class="docutils literal">_register()</tt> and <tt class="docutils literal">close()</tt> methods of
<tt class="docutils literal">IocpProactor</tt>.</p>
<p>I proposed a solution to reimplement <tt class="docutils literal">IocpProactor.connect_pipe()</tt> <strong>without
a thread</strong>: <a class="reference external" href="https://code.google.com/p/tulip/issues/detail?id=197">asyncio issue #197: Rewrite IocpProactor.connect_pipe() with
non-blocking calls to avoid non interruptible QueueUserWorkItem()</a>.</p>
<p>On January 22, 2015, I pushed <a class="reference external" href="https://github.com/python/cpython/commit/7ffa2c5fdda8a9cc254edf67c4458b15db1252fa">commit 7ffa2c5f</a>:</p>
<pre class="literal-block">
commit 7ffa2c5fdda8a9cc254edf67c4458b15db1252fa
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Thu Jan 22 22:55:08 2015 +0100
Issue #23293, asyncio: Rewrite IocpProactor.connect_pipe()
</pre>
<p>The change adds <tt class="docutils literal">_overlapped.ConnectPipe()</tt>, which tries to connect to the
pipe for asynchronous I/O (overlapped): it <strong>calls CreateFile() in a loop until
it no longer fails with ERROR_PIPE_BUSY</strong>, using an increasing delay between 1 ms
and 100 ms.</p>
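<p>The retry pattern can be sketched in pure Python (hypothetical names:
<tt class="docutils literal">try_once</tt> stands in for the real <tt class="docutils literal">CreateFile()</tt>
call and <tt class="docutils literal">BlockingIOError</tt> for <tt class="docutils literal">ERROR_PIPE_BUSY</tt>;
the actual delays in the commit are 1 ms growing to 100 ms):</p>

```python
import asyncio

async def connect_pipe(try_once, first_delay=0.001, max_delay=0.1):
    # Sketch of the retry loop described above: keep calling try_once()
    # until it stops failing with a "busy" error, sleeping with an
    # increasing (doubling, capped) delay between attempts.
    delay = first_delay
    while True:
        try:
            return try_once()
        except BlockingIOError:
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)
```

<p>Because the waiting is done with <tt class="docutils literal">asyncio.sleep()</tt>, the loop stays
cancellable: no uninterruptible <tt class="docutils literal">QueueUserWorkItem()</tt> thread is needed.</p>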
</div>
<div class="section" id="race-condition-in-basesubprocesstransport-try-finish">
<h2>Race Condition in BaseSubprocessTransport._try_finish()</h2>
<p>If the process exited before the <tt class="docutils literal">_post_init()</tt> method was called, scheduling
the call to <tt class="docutils literal">_call_connection_lost()</tt> with <tt class="docutils literal">call_soon()</tt> is wrong:
<tt class="docutils literal">connection_made()</tt> must be called before <tt class="docutils literal">connection_lost()</tt>.</p>
<p>Reuse the <tt class="docutils literal">BaseSubprocessTransport._call()</tt> method to schedule the call to
<tt class="docutils literal">_call_connection_lost()</tt> to ensure that <tt class="docutils literal">connection_made()</tt> and
<tt class="docutils literal">connection_lost()</tt> are called in the correct order.</p>
<p>On December 18, 2014, I pushed <a class="reference external" href="https://github.com/python/cpython/commit/1b9763d0a9c62c13dc2a06770032e5906b610c96">commit 1b9763d0</a>.
The explanation is long, but the change is basically a one-line change,
extract:</p>
<pre class="literal-block">
- self._loop.call_soon(self._call_connection_lost, None)
+ self._call(self._call_connection_lost, None)
</pre>
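<p>The idea behind <tt class="docutils literal">_call()</tt> can be shown with a heavily simplified,
hypothetical sketch (not the actual <tt class="docutils literal">BaseSubprocessTransport</tt> code):
callbacks scheduled before <tt class="docutils literal">connection_made()</tt> has run are queued and
only flushed afterwards, so <tt class="docutils literal">connection_lost()</tt> can never be delivered
first:</p>

```python
import asyncio

class SubprocessTransportSketch:
    # Hypothetical simplification of the deferred-call pattern: while
    # _pending_calls is a list, callbacks are queued; once
    # connection_made() has run, they are flushed in order.
    def __init__(self, loop, protocol):
        self._loop = loop
        self._protocol = protocol
        self._pending_calls = []      # queue until connection_made()
        loop.call_soon(self._connection_made)

    def _call(self, cb, *args):
        if self._pending_calls is not None:
            self._pending_calls.append((cb, args))   # defer
        else:
            self._loop.call_soon(cb, *args)

    def _connection_made(self):
        self._protocol.connection_made(self)
        pending, self._pending_calls = self._pending_calls, None
        for cb, args in pending:
            self._loop.call_soon(cb, *args)

events = []

class Proto:
    def connection_made(self, transport):
        events.append("connection_made")
    def connection_lost(self, exc):
        events.append("connection_lost")

async def demo():
    proto = Proto()
    transport = SubprocessTransportSketch(asyncio.get_running_loop(), proto)
    # the process "exits" before connection_made() has been called:
    transport._call(proto.connection_lost, None)
    await asyncio.sleep(0.01)
    return events
```

<p>Even though <tt class="docutils literal">connection_lost</tt> is scheduled first, it is delivered
after <tt class="docutils literal">connection_made</tt>, which is the ordering guarantee the fix
restores.</p>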
<p><strong>Properly ordering callbacks in asyncio is challenging!</strong> The order matters
for the semantics of asyncio: it is part of the design of <a class="reference external" href="https://www.python.org/dev/peps/pep-3156/">PEP 3156 --
Asynchronous IO Support Rebooted: the "asyncio" Module</a>.</p>
</div>
<div class="section" id="close-the-transport-on-failure-resourcewarning">
<h2>Close the transport on failure: ResourceWarning</h2>
<p>On January 15, 2015, I pushed <a class="reference external" href="https://github.com/python/cpython/commit/4bf22e033e975f61c33752db5a3764dc0f7d0b03">commit 4bf22e03</a>,
extract:</p>
<pre class="literal-block">
-    yield from transp._post_init()
+    try:
+        yield from transp._post_init()
+    except:
+        transp.close()
+        raise
</pre>
<p>Later, I will spend a lot of time (and push many more changes) to ensure that
resources are properly released (especially closing transports on failure,
similar to this change).</p>
<p>I will add many <strong>ResourceWarning</strong> warnings in destructors, emitted when a
transport, subprocess or event loop is not closed explicitly.</p>
<p>For example, notice the <tt class="docutils literal">ResourceWarning</tt> in the current destructor of
<tt class="docutils literal">_SelectorTransport</tt>:</p>
<pre class="literal-block">
class _SelectorTransport(transports._FlowControlMixin,
                         transports.Transport):

    def __del__(self, _warn=warnings.warn):
        if self._sock is not None:
            _warn(f"unclosed transport {self!r}", ResourceWarning,
                  source=self)
            self._sock.close()
<p>I even enhanced Python 3.6 to be able to provide the <strong>traceback where the
leaked resource has been allocated</strong> thanks to my <tt class="docutils literal">tracemalloc</tt> module.
Example with <tt class="docutils literal">filebug.py</tt>:</p>
<pre class="literal-block">
def func():
    f = open(__file__)
    f = None

func()
<p>Output with Python 3.6:</p>
<pre class="literal-block">
$ python3 -Wd -X tracemalloc=5 filebug.py
filebug.py:3: ResourceWarning: unclosed file <_io.TextIOWrapper name='filebug.py' mode='r' encoding='UTF-8'>
  f = None
Object allocated at (most recent call first):
  File "filebug.py", lineno 2
    f = open(__file__)
  File "filebug.py", lineno 5
    func()
</pre>
<p>The line where the warning is emitted is usually useless to understand the bug,
whereas the traceback is very useful to identify the leaked resource.</p>
<p>See <a class="reference external" href="https://pythondev.readthedocs.io/debug_tools.html#resourcewarning">my ResourceWarning documentation</a>.</p>
</div>
<div class="section" id="cleanup-code-handling-pipes">
<h2>Cleanup code handling pipes</h2>
<p>Thanks to the new implementation of <tt class="docutils literal">connect_pipe()</tt>, I was able to push
changes to simplify the code and remove various hacks in code handling pipes.</p>
<p><a class="reference external" href="https://github.com/python/cpython/commit/2b77c5467f376257ae22cbfbcb3a0e5e6349e92d">commit 2b77c546</a>:</p>
<pre class="literal-block">
commit 2b77c5467f376257ae22cbfbcb3a0e5e6349e92d
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Thu Jan 22 23:50:03 2015 +0100
asyncio, Tulip issue 204: Fix IocpProactor.accept_pipe()
Overlapped.ConnectNamedPipe() now returns a boolean: True if the pipe is
connected (if ConnectNamedPipe() failed with ERROR_PIPE_CONNECTED), False if
the connection is in progress.
This change removes multiple hacks in IocpProactor.
</pre>
<p><a class="reference external" href="https://github.com/python/cpython/commit/3d2256f671b7ed5c769dd34b27ae597cbc69047c">commit 3d2256f6</a>:</p>
<pre class="literal-block">
commit 3d2256f671b7ed5c769dd34b27ae597cbc69047c
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Mon Jan 26 11:02:59 2015 +0100
Issue #23293, asyncio: Cleanup IocpProactor.close()
The special case for connect_pipe() is not more needed. connect_pipe() doesn't
use overlapped operations anymore.
</pre>
<p><a class="reference external" href="https://github.com/python/cpython/commit/a19b7b3fcafe52b98245e14466ffc4d6750ca4f1">commit a19b7b3f</a>:</p>
<pre class="literal-block">
commit a19b7b3fcafe52b98245e14466ffc4d6750ca4f1
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Mon Jan 26 15:03:20 2015 +0100
asyncio: Fix ProactorEventLoop.start_serving_pipe()
If a client connected before the server was closed: drop the client (close the
pipe) and exit.
</pre>
<p><a class="reference external" href="https://github.com/python/cpython/commit/e0fd157ba0cc92e435e7520b4ff641ca68d72244">commit e0fd157b</a>:</p>
<pre class="literal-block">
commit e0fd157ba0cc92e435e7520b4ff641ca68d72244
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Mon Jan 26 15:04:03 2015 +0100
Issue #23293, asyncio: Rewrite IocpProactor.connect_pipe() as a coroutine
Use a coroutine with asyncio.sleep() instead of call_later() to ensure that the
schedule call is cancelled.
Add also a unit test cancelling connect_pipe().
</pre>
<p><a class="reference external" href="https://github.com/python/cpython/commit/41063d2a59a24e257cd9ce62137e36c862e3ab1e">commit 41063d2a</a>:</p>
<pre class="literal-block">
commit 41063d2a59a24e257cd9ce62137e36c862e3ab1e
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Mon Jan 26 22:30:49 2015 +0100
asyncio, Tulip issue 204: Fix IocpProactor.recv()
If ReadFile() fails with ERROR_BROKEN_PIPE, the operation is not pending: don't
register the overlapped.
I don't know if WSARecv() can fail with ERROR_BROKEN_PIPE. Since
Overlapped.WSARecv() already handled ERROR_BROKEN_PIPE, let me guess that it
has the same behaviour than ReadFile().
</pre>
</div>
Asyncio: Proactor Cancellation From Hell2019-01-28T20:20:00+01:002019-01-28T20:20:00+01:00Victor Stinnertag:vstinner.github.io,2019-01-28:/asyncio-proactor-cancellation-from-hell.html<img alt="South Park Hell" src="https://vstinner.github.io/images/south_park_hell.jpg" />
<p>Between 2014 and 2015, I was working on the new shiny <tt class="docutils literal">asyncio</tt> module
(module added to Python 3.4 released in March 2014). I helped to stabilize the
Windows implementation because... well, nobody else was paying attention to it,
and I was worried that test_asyncio <strong>randomly crashed</strong> on Windows.</p>
<p>One bug really annoyed me: I started to fix it in July 2014, but I only
succeeded in fixing the root issue in January 2015: <strong>six months later</strong>!</p>
<p>It was really difficult to find documentation on IOCP and asynchronous
programming on Windows. <strong>I had to ask someone who had access to the Windows
source code for help</strong> to understand the bug...</p>
<p><strong>Spoiler:</strong> cancelling an overlapped <tt class="docutils literal">RegisterWaitForSingleObject()</tt> with
<tt class="docutils literal">UnregisterWait()</tt> is asynchronous. The asynchronous part is not well
documented and it took me months of debug to understand it. Moreover, the bug
was well hidden for various reasons that we will see below.</p>
<p>Next article: <a class="reference external" href="https://vstinner.github.io/asyncio-proactor-connect-pipe-race-condition.html">Asyncio: Proactor ConnectPipe() Race Condition</a>.</p>
<div class="section" id="fix-cancel-when-called-twice">
<h2>Fix cancel() when called twice</h2>
<p>July 2014, <a class="reference external" href="https://github.com/python/asyncio/issues/195">asyncio issue #195</a>: while working on a
<tt class="docutils literal">SIGINT</tt> signal handler for the <tt class="docutils literal">ProactorEventLoop</tt> on Windows (<a class="reference external" href="https://github.com/python/asyncio/issues/195">asyncio
issue #191</a>), I hit a bug on
Windows: <tt class="docutils literal">_WaitHandleFuture.cancel()</tt> crashed if the wait event had already been
unregistered by <tt class="docutils literal">finish_wait_for_handle()</tt>. The bug was that
<tt class="docutils literal">UnregisterWait()</tt> was called twice.</p>
<p>I pushed <a class="reference external" href="https://github.com/python/cpython/commit/fea6a100dc51012cb0187374ad31de330ebc0035">commit fea6a100</a>
to fix this crash:</p>
<pre class="literal-block">
commit fea6a100dc51012cb0187374ad31de330ebc0035
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Fri Jul 25 00:54:53 2014 +0200
Improve stability of the proactor event loop, especially operations on
overlapped objects (...)
</pre>
<p>Main changes:</p>
<ul class="simple">
<li>Fix a crash: <strong>don't call UnregisterWait() twice if a _WaitHandleFuture
is cancelled twice</strong>.</li>
<li>Fix another crash: <tt class="docutils literal">_OverlappedFuture.cancel()</tt> doesn't cancel the
overlapped anymore if it is already cancelled or completed. Log also an error
if the cancellation failed.</li>
<li><tt class="docutils literal">IocpProactor.close()</tt> now cancels futures rather than directly cancelling
the underlying overlapped objects.</li>
<li>Add a destructor to the <tt class="docutils literal">IocpProactor</tt> class which closes it.</li>
</ul>
</div>
<div class="section" id="clear-reference-from-overlappedfuture-to-overlapped">
<h2>Clear reference from _OverlappedFuture to overlapped</h2>
<p>July 2014, I created <a class="reference external" href="https://github.com/python/asyncio/issues/196">asyncio issue #196</a>:
<tt class="docutils literal">_OverlappedFuture.set_result()</tt> should clear its reference to the
overlapped object.</p>
<p>It is important to explicitly clear references to Python objects as soon as
possible to release resources. Otherwise, an object can remain alive
longer than expected.</p>
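<p>The principle is easy to demonstrate with a <tt class="docutils literal">weakref</tt> (hypothetical
sketch with made-up class names, not the real <tt class="docutils literal">_OverlappedFuture</tt>
code): as soon as the future drops its reference, CPython's reference counting
frees the object immediately:</p>

```python
import weakref

class OverlappedFutureSketch:
    # Hypothetical sketch of the idea behind asyncio issue #196: drop
    # the reference to the overlapped object as soon as the operation
    # is done, so its memory can be released immediately.
    def __init__(self, ov):
        self._ov = ov
        self._result = None

    def set_result(self, result):
        self._result = result
        self._ov = None        # clear the reference as soon as possible

def demo():
    class Overlapped:
        pass
    ov = Overlapped()
    ref = weakref.ref(ov)
    fut = OverlappedFutureSketch(ov)
    del ov                      # the future holds the last reference
    alive_before = ref() is not None
    fut.set_result(b"data")
    alive_after = ref() is not None
    return alive_before, alive_after
```

<p>Before <tt class="docutils literal">set_result()</tt> the object is kept alive by the future; right
after, it is gone.</p>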
<p>I noticed that _OverlappedFuture kept a reference to the underlying overlapped
object even after the asynchronous operation completed. I started to work on a
fix, but I had many issues fixing this bug completely... it was just the
beginning of a long journey.</p>
<div class="section" id="clear-the-reference-on-cancellation-and-error">
<h3>Clear the reference on cancellation and error</h3>
<p>I pushed a first fix: <a class="reference external" href="https://github.com/python/cpython/commit/18a28dc5c28ae9a953f537486780159ddb768702">commit 18a28dc5</a>
clears the reference to the overlapped in <tt class="docutils literal">cancel()</tt> and <tt class="docutils literal">set_exception()</tt>
methods of <tt class="docutils literal">_OverlappedFuture</tt>:</p>
<pre class="literal-block">
commit 18a28dc5c28ae9a953f537486780159ddb768702
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Fri Jul 25 13:05:20 2014 +0200
* _OverlappedFuture.cancel() now clears its reference to the overlapped object.
Make also the _OverlappedFuture.ov attribute private.
* _OverlappedFuture.set_exception() now cancels the overlapped operation.
* (...)
</pre>
<p>I started with this change because it didn't make the tests less stable.</p>
</div>
<div class="section" id="clear-the-reference-in-poll">
<h3>Clear the reference in poll()</h3>
<p>Clearing the reference to the overlapped in <tt class="docutils literal">cancel()</tt> and
<tt class="docutils literal">set_exception()</tt> <strong>works well</strong>. But when I tried to do the same on success (in
<tt class="docutils literal">set_result()</tt>), <strong>I got random errors</strong>. Example:</p>
<pre class="literal-block">
C:\haypo\tulip>\python33\python.exe runtests.py test_pipe
...
Exception RuntimeError: '<_overlapped.Overlapped object at 0x00000000035E7660> s
till has pending operation at deallocation, the process may crash' ignored
...
Fatal read error on pipe transport
protocol: <asyncio.streams.StreamReaderProtocol object at 0x00000000035EE668>
transport: <_ProactorDuplexPipeTransport fd=348>
Traceback (most recent call last):
  File "C:\haypo\tulip\asyncio\proactor_events.py", line 159, in _loop_reading
    data = fut.result() # deliver data later in "finally" clause
  File "C:\haypo\tulip\asyncio\futures.py", line 271, in result
    raise self._exception
  File "C:\haypo\tulip\asyncio\windows_events.py", line 488, in _poll
    value = callback(transferred, key, ov)
  File "C:\haypo\tulip\asyncio\windows_events.py", line 279, in finish_recv
    return ov.getresult()
OSError: [WinError 996] Overlapped I/O event is not in a signaled state
...
</pre>
<p>It seems that the problem only occurs in the fast-path of
<tt class="docutils literal">IocpProactor._register()</tt>, when the overlapped is not added to <tt class="docutils literal">_cache</tt>.</p>
<p>Clearing the reference in <tt class="docutils literal">_poll()</tt>, when <tt class="docutils literal">GetQueuedCompletionStatus()</tt> reads
the status, <strong>works</strong>! I pushed a second fix: <a class="reference external" href="https://github.com/python/cpython/commit/65dd69a3da16257bd86b92900e5ec5a8dd26f1d9">commit 65dd69a3</a>
changes <tt class="docutils literal">_poll()</tt>:</p>
<pre class="literal-block">
commit 65dd69a3da16257bd86b92900e5ec5a8dd26f1d9
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Fri Jul 25 22:36:05 2014 +0200
IocpProactor._poll() clears the reference to the overlapped operation
when the operation is done. (...)
</pre>
</div>
<div class="section" id="ignore-false-alarms">
<h3>Ignore false alarms</h3>
<p>I tried to add the overlapped into <tt class="docutils literal">_cache</tt> but <strong>then the event loop started
to hang or to fail with new errors</strong>.</p>
<p>I analyzed an overlapped <tt class="docutils literal">WSARecv()</tt> which has been cancelled. Just after
calling <tt class="docutils literal">CancelIoEx()</tt>, <tt class="docutils literal">HasOverlappedIoCompleted()</tt> returns 0.</p>
<p>Even after <tt class="docutils literal">GetQueuedCompletionStatus()</tt> read the status,
<tt class="docutils literal">HasOverlappedIoCompleted()</tt> still returns 0.</p>
<p><strong>After hours of debug, I eventually found the main issue!</strong></p>
<p>Sometimes <tt class="docutils literal">GetQueuedCompletionStatus()</tt> returns an overlapped operation which
has not completed yet. I modified <tt class="docutils literal">IocpProactor._poll()</tt> to ignore the false
alarm, <a class="reference external" href="https://github.com/python/cpython/commit/51e44ea66aefb4229e506263acf40d35596d279c">commit 51e44ea6</a>:</p>
<pre class="literal-block">
commit 51e44ea66aefb4229e506263acf40d35596d279c
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Sat Jul 26 00:58:34 2014 +0200
_OverlappedFuture.set_result() now clears its reference to the
overlapped object.
IocpProactor._poll() now also ignores false alarms:
GetQueuedCompletionStatus() returns the overlapped but it is still
pending.
</pre>
<p>The fix adds this comment:</p>
<pre class="literal-block">
# FIXME: why do we get false alarms?
</pre>
</div>
<div class="section" id="keep-a-reference-of-overlapped">
<h3>Keep a reference of overlapped</h3>
<p>To stabilize the code, I modified <tt class="docutils literal">ProactorIocp</tt> to keep a reference to the
overlapped object (it already kept a reference previously, but not in all cases).
<strong>Otherwise the memory may be reused and GetQueuedCompletionStatus() may use
random bytes and behave badly</strong>. I pushed <a class="reference external" href="https://github.com/python/cpython/commit/42d3bdeed6e34117b787d61a471563a0dba6a894">commit 42d3bdee</a>:</p>
<pre class="literal-block">
commit 42d3bdeed6e34117b787d61a471563a0dba6a894
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Mon Jul 28 00:18:43 2014 +0200
ProactorIocp._register() now registers the overlapped
in the _cache dictionary, even if we already got the result. We need to keep a
reference to the overlapped object, otherwise the memory may be reused and
GetQueuedCompletionStatus() may use random bytes and behaves badly.
There is still a hack for ConnectNamedPipe(): the overlapped object is not
registered into _cache if the overlapped object completed directly.
Log also an error in debug mode in ProactorIocp._loop() if we get an unexpected
event.
Add a protection in ProactorIocp.close() to avoid blocking, even if it should
not happen. I still don't understand exactly why some the completion of some
overlapped objects are not notified.
</pre>
<p>The change adds a long comment:</p>
<pre class="literal-block">
# Even if GetOverlappedResult() was called, we have to wait for the
# notification of the completion in GetQueuedCompletionStatus().
# Register the overlapped operation to keep a reference to the
# OVERLAPPED object, otherwise the memory is freed and Windows may
# read uninitialized memory.
#
# For an unknown reason, ConnectNamedPipe() behaves differently:
# the completion is not notified by GetOverlappedResult() if we
# already called GetOverlappedResult(). For this specific case, we
# don't expect notification (register is set to False).
</pre>
<p>I pushed another change to attempt to stabilize the code, <a class="reference external" href="https://github.com/python/cpython/commit/313a9809043ed2ed1ad25282af7169e08cdc92a3">commit 313a9809</a>:</p>
<pre class="literal-block">
commit 313a9809043ed2ed1ad25282af7169e08cdc92a3
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Tue Jul 29 12:58:23 2014 +0200
* _WaitHandleFuture.cancel() now notify IocpProactor through the overlapped
object that the wait was cancelled.
* Optimize IocpProactor.wait_for_handle() gets the result if the wait is
signaled immediatly.
(...)
</pre>
</div>
<div class="section" id="asyncio-issue-196-closed">
<h3>asyncio issue #196 closed</h3>
<p>The initial issue "_OverlappedFuture.set_result() should clear its reference to
the overlapped object" has been fixed, so <strong>I closed this issue</strong>. I didn't
know at that point that not all bugs were fixed yet...</p>
<p>I also opened the new <a class="reference external" href="https://github.com/python/asyncio/issues/204">asyncio issue #204</a> to investigate
<tt class="docutils literal">accept_pipe()</tt> special case. We will analyze this funny bug in another article.</p>
</div>
</div>
<div class="section" id="bpo-23095-race-condition-when-cancelling-a-waithandlefuture">
<h2>bpo-23095: race condition when cancelling a _WaitHandleFuture</h2>
<p>On December 21, 2014, five months after a long series of changes to stabilize
asyncio... <strong>asyncio was still crashing randomly on Windows</strong>! I created
<a class="reference external" href="https://bugs.python.org/issue23095">bpo-23095: race condition when cancelling a _WaitHandleFuture</a>.</p>
<p>On Windows using the IOCP (proactor) event loop, I noticed race conditions when
running the test suite of Trollius (my old deprecated asyncio port to Python
2). For example, sometimes the return code of a process was <tt class="docutils literal">None</tt>, whereas
this case <strong>must never happen</strong>. It looks like the <tt class="docutils literal">wait_for_handle()</tt> method
doesn't behave properly.</p>
<p>When I ran the test suite of asyncio in debug mode (PYTHONASYNCIODEBUG=1),
I sometimes saw the message "GetQueuedCompletionStatus() returned an unexpected
event", which <strong>should never occur either</strong>.</p>
<p>I added debug traces. I saw that <tt class="docutils literal">IocpProactor.wait_for_handle()</tt> later
calls <tt class="docutils literal">PostQueuedCompletionStatus()</tt> through its internal C callback
(<tt class="docutils literal">PostToQueueCallback</tt>). It looked like <strong>sometimes the callback was called
even though the wait had been cancelled/acknowledged</strong> by <tt class="docutils literal">UnregisterWait()</tt>.</p>
<p>... I didn't understand the logic between <tt class="docutils literal">RegisterWaitForSingleObject()</tt>,
<tt class="docutils literal">UnregisterWait()</tt> and the callback ....</p>
<p>It looks like sometimes the overlapped object created in Python
(<tt class="docutils literal">ov = _overlapped.Overlapped(NULL)</tt>) is destroyed before
<tt class="docutils literal">PostToQueueCallback()</tt> is called. In the unit tests, <strong>it doesn't crash
because a different overlapped object is created and it gets the same memory
address</strong> (the memory allocator reuses a just-freed memory block).</p>
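This address-reuse effect can be reproduced from pure Python (a rough analogy only; the real bug lived in C code and Windows callbacks). CPython's small-object allocator tends to hand a just-freed block back to the next allocation of the same size:

```python
def address_reused():
    """Check whether a new object reuses the address of a just-freed one."""
    a = object()
    addr = id(a)        # in CPython, id() is the object's memory address
    del a               # the object is freed immediately (refcount drops to 0)
    b = object()        # the allocator commonly reuses the freed block
    return id(b) == addr

print(address_reused())  # commonly True on CPython, but not guaranteed
```

When this returns True, a stale write through the old address hits a valid but different object: no crash, just baffling behavior, which is exactly what the unit tests observed.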
<p>The implementation of <tt class="docutils literal">wait_for_handle()</tt> had an optimization: it
immediately polls the wait to check whether it has already completed. I tried to
remove it, but I got different issues. If I understood correctly, <strong>this
optimization hides other bugs and reduces the probability of hitting the race
condition</strong>.</p>
<p><tt class="docutils literal">wait_for_handle()</tt> is used to wait for the completion of a subprocess, so
it is exercised by all unit tests running subprocesses, and also directly by the
<tt class="docutils literal">test_wait_for_handle()</tt> and <tt class="docutils literal">test_wait_for_handle_cancel()</tt> tests.
I suspected that running <tt class="docutils literal">test_wait_for_handle()</tt> or
<tt class="docutils literal">test_wait_for_handle_cancel()</tt> triggered the bug.</p>
<p>Removing <tt class="docutils literal">_winapi.CloseHandle(self._iocp)</tt> in <tt class="docutils literal">IocpProactor.close()</tt>
works around the bug. The bug seems to be an unexpected call to
<tt class="docutils literal">PostToQueueCallback()</tt> which calls <tt class="docutils literal">PostQueuedCompletionStatus()</tt> on an
IOCP. Not closing the IOCP means using a different IOCP for each test, so the
unexpected call to <tt class="docutils literal">PostQueuedCompletionStatus()</tt> has no effect on the
following tests.</p>
<p>I rewrote some parts of the IOCP code in asyncio. Maybe I introduced this issue
during the refactoring. Maybe <strong>it already existed before but nobody noticed
it, since asyncio had fewer unit tests back then</strong>.</p>
</div>
<div class="section" id="fixing-the-root-issue-overlapped-cancellation-from-hell">
<h2>Fixing the root issue: Overlapped Cancellation From Hell</h2>
<p>I looked into Twisted's implementation of the proactor, but it didn't support
subprocesses.</p>
<p>I looked at libuv: it supported processes but not cancelling a wait on a
process handle...</p>
<p><strong>I had to ask for help from someone who had access to the Windows source code</strong>
to understand the bug...</p>
<p><strong>After six months of intense debugging, I eventually identified the root
issue</strong> (I pushed the first fix on July 25, 2014). I pushed the <a class="reference external" href="https://github.com/python/cpython/commit/d0a28dee78d099fcadc71147cba4affb6efa0c97">commit
d0a28dee</a>
(<a class="reference external" href="https://bugs.python.org/issue23095">bpo-23095</a>):</p>
<pre class="literal-block">
commit d0a28dee78d099fcadc71147cba4affb6efa0c97
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Wed Jan 21 23:39:51 2015 +0100
Issue #23095, asyncio: Rewrite _WaitHandleFuture.cancel()
</pre>
<p>This change fixes a race condition related to <tt class="docutils literal">_WaitHandleFuture.cancel()</tt>
leading to a Python crash or "GetQueuedCompletionStatus() returned an
unexpected event" logs. Previously, <strong>the cancelled wait could complete after
the overlapped object had already been destroyed</strong>. Sometimes, a
different overlapped was allocated at the same address, emitting a log about an
unexpected completion (but no crash).</p>
<p><tt class="docutils literal">_WaitHandleFuture.cancel()</tt> now <strong>waits until the handle wait is cancelled</strong>
(until the cancellation completes) before clearing its reference to the
overlapped object. To wait until the cancellation completes,
<tt class="docutils literal">UnregisterWaitEx()</tt> is used with an event (instead of using
<tt class="docutils literal">UnregisterWait()</tt>).</p>
<p>To wait for this event, a new <tt class="docutils literal">_WaitCancelFuture</tt> class was added. It's a
simplified version of <tt class="docutils literal">_WaitHandleFuture</tt>. For example, its <tt class="docutils literal">cancel()</tt>
method calls <tt class="docutils literal">UnregisterWait()</tt>, not <tt class="docutils literal">UnregisterWaitEx()</tt>.
<tt class="docutils literal">_WaitCancelFuture</tt> should not be cancelled.</p>
<p>The overlapped object is <strong>kept alive</strong> in <tt class="docutils literal">_WaitHandleFuture</tt> <strong>until the
wait is unregistered</strong>.</p>
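The shape of the fix can be sketched in pure Python, with a <tt class="docutils literal">threading.Event</tt> standing in for the <tt class="docutils literal">UnregisterWaitEx()</tt> event (class and method names below are illustrative, not the real asyncio code):

```python
import threading

class WaitHandleSketch:
    """Release the 'overlapped' buffer only after cancellation is acknowledged."""

    def __init__(self):
        self.buf = bytearray(16)             # stands in for the OVERLAPPED memory
        self._unregistered = threading.Event()

    def _callback(self):
        # Runs on another thread, like PostToQueueCallback(): it writes
        # into the buffer, then acknowledges the cancellation.
        self.buf[0] = 1
        self._unregistered.set()

    def cancel(self):
        threading.Thread(target=self._callback).start()
        # The buggy code dropped self.buf right away; the fix waits until
        # the cancellation completes, like UnregisterWaitEx() plus an event.
        self._unregistered.wait(timeout=5)
        self.buf = None                      # only now is the memory released

w = WaitHandleSketch()
w.cancel()
print(w.buf)  # None: the buffer outlived the last callback write
```

The key invariant is the same as in the real fix: the buffer written by the concurrent callback stays alive until the cancellation acknowledgement, so there is no window for a write into released memory.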
<p>Later, I pushed a few more changes to fix corner cases.</p>
<p><a class="reference external" href="https://github.com/python/cpython/commit/1ca9392c7083972c1953c02e6f2cca54934ce0a6">commit 1ca9392c</a>:</p>
<pre class="literal-block">
commit 1ca9392c7083972c1953c02e6f2cca54934ce0a6
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Thu Jan 22 00:17:54 2015 +0100
Issue #23095, asyncio: IocpProactor.close() must not cancel pending
_WaitCancelFuture futures
</pre>
<p><a class="reference external" href="https://github.com/python/cpython/commit/752aba7f999b08c833979464a36840de8be0baf0">commit 752aba7f</a>:</p>
<pre class="literal-block">
commit 752aba7f999b08c833979464a36840de8be0baf0
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Thu Jan 22 22:47:13 2015 +0100
asyncio: IocpProactor.close() doesn't cancel anymore futures which are already
cancelled
</pre>
<p><a class="reference external" href="https://github.com/python/cpython/commit/24dfa3c1d6b21e731bd167a13153968bba8fa5ce">commit 24dfa3c1</a>:</p>
<pre class="literal-block">
commit 24dfa3c1d6b21e731bd167a13153968bba8fa5ce
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Mon Jan 26 22:30:28 2015 +0100
Issue #23095, asyncio: Fix _WaitHandleFuture.cancel()
If UnregisterWaitEx() fais with ERROR_IO_PENDING, it doesn't mean that the wait
is unregistered yet. We still have to wait until the wait is cancelled.
</pre>
<p>I think that <em>this</em> issue can now be closed: <tt class="docutils literal">UnregisterWaitEx()</tt> really does
what we need in asyncio.</p>
<p>I don't like the complexity of the IocpProactor._unregister() method and of the
_WaitCancelFuture class, but it looks like that's how we are supposed to wait
until a wait on a handle is cancelled...</p>
<p>The Windows IOCP API is much more complex than I expected. It's probably
because some parts (especially <tt class="docutils literal">RegisterWaitForSingleObject()</tt>) are
implemented with threads in user land, not in the kernel.</p>
<p>In short, I'm very happy to have fixed this very complex but also very
annoying IOCP bug in asyncio.</p>
<p>I got a nice comment from <a class="reference external" href="https://bugs.python.org/issue23095#msg234453">Guido van Rossum</a>:</p>
<blockquote>
<strong>Congrats with the fix, and thanks for your perseverance!</strong></blockquote>
</div>
<div class="section" id="summary-of-the-race-condition">
<h2>Summary of the race condition</h2>
<p>Events of the crashing unit test:</p>
<ul class="simple">
<li>The loop (ProactorEventLoop) spawns a subprocess.</li>
<li>The loop creates a _WaitHandleFuture object which creates an overlapped to
wait until the process completes (call <tt class="docutils literal">RegisterWaitForSingleObject()</tt>):
<strong>allocate</strong> memory for the overlapped.</li>
<li>The wait future is cancelled (call <tt class="docutils literal">UnregisterWait()</tt>).</li>
<li>The overlapped is destroyed: <strong>free</strong> overlapped memory.</li>
<li>The overlapped completes: <strong>write</strong> into the overlapped memory.</li>
</ul>
<p>The main issue is the order of the two last events.</p>
<p>Sometimes, the overlapped completed before the memory was freed: everything was
fine.</p>
<p>Sometimes, the overlapped completed after the memory was freed: Python crashed
(segmentation fault).</p>
<p>Sometimes, another _WaitHandleFuture was created in the meantime and created a
second overlapped which was allocated at the same memory address as the freed
memory of the previous overlapped. In this case, when the first overlapped
completed, Python didn't crash but logged an unexpected completion message.</p>
<p>Sometimes, the write was done in freed memory: the write didn't crash Python,
but caused bugs which didn't make sense.</p>
<p>There were even more cases causing even more surprising behaviors.</p>
<p>Summary of the fix:</p>
<ul class="simple">
<li>(... similar steps for the beginning ...)</li>
<li>The wait future is cancelled: <strong>create an event</strong> to wait until the
cancellation completes (call <tt class="docutils literal">UnregisterWaitEx()</tt>).</li>
<li>Wait for the event.</li>
<li>The event is signalled which means that the cancellation completed: <strong>write</strong>
into the overlapped memory.</li>
<li>The overlapped is destroyed: <strong>free</strong> overlapped memory.</li>
</ul>
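The dangerous "write after free" ordering has a convenient pure-Python analogue: a released <tt class="docutils literal">memoryview</tt> rejects late writes with an exception, where raw C memory silently accepts them (or crashes):

```python
buf = bytearray(8)
view = memoryview(buf)   # plays the role of the overlapped memory
view.release()           # "free": the buffer is gone from the view's side
try:
    view[0] = 1          # the late completion write
except ValueError as exc:
    print("late write rejected:", exc)
```

In C there is no such guard: the late write scribbles over whatever now occupies that address, producing the crashes and nonsensical bugs described above.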
</div>
Locale Bugfixes in Python 32019-01-09T00:30:00+01:002019-01-09T00:30:00+01:00Victor Stinnertag:vstinner.github.io,2019-01-09:/locale-bugfixes-python3.html<a class="reference external image-reference" href="https://www.flickr.com/photos/svensson/40467591/"><img alt="Unicode Mixed Bag" src="https://vstinner.github.io/images/unicode_bag.jpg" /></a>
<p>This article describes a few locales bugs that I fixed in Python 3 between 2012
(Python 3.3) and 2018 (Python 3.7):</p>
<ul class="simple">
<li>Support non-ASCII decimal point and thousands separator</li>
<li>Crash with non-ASCII decimal point</li>
<li>LC_NUMERIC encoding different than LC_CTYPE encoding</li>
<li>LC_MONETARY encoding different than LC_CTYPE encoding</li>
<li>Tests non-ASCII locales …</li></ul><a class="reference external image-reference" href="https://www.flickr.com/photos/svensson/40467591/"><img alt="Unicode Mixed Bag" src="https://vstinner.github.io/images/unicode_bag.jpg" /></a>
<p>This article describes a few locales bugs that I fixed in Python 3 between 2012
(Python 3.3) and 2018 (Python 3.7):</p>
<ul class="simple">
<li>Support non-ASCII decimal point and thousands separator</li>
<li>Crash with non-ASCII decimal point</li>
<li>LC_NUMERIC encoding different than LC_CTYPE encoding</li>
<li>LC_MONETARY encoding different than LC_CTYPE encoding</li>
<li>Tests non-ASCII locales</li>
</ul>
<p>See also my previous locale bugfixes: <a class="reference external" href="https://vstinner.github.io/python3-locales-encodings.html">Python 3, locales and encodings</a></p>
<div class="section" id="introduction">
<h2>Introduction</h2>
<p>Each language and each country has different ways to represent dates, monetary
values, numbers, etc. Unix has "locales" to configure applications for a
specific language and a specific country. For example, there are <tt class="docutils literal">fr_BE</tt> for
Belgium (French) and <tt class="docutils literal">fr_FR</tt> for France (French).</p>
<p>In practice, each locale uses its own encoding, and problems arise when an
application uses a different encoding than the locale. There is an LC_NUMERIC
locale category for numbers, LC_MONETARY for monetary values and LC_CTYPE for the
encoding. Not only is it possible to configure an application to use LC_NUMERIC
with a different encoding than LC_CTYPE, but some users actually use such a
configuration!</p>
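The categories can be queried and set independently with Python's <tt class="docutils literal">locale</tt> module. This sketch sticks to the portable "C" locale, since names like <tt class="docutils literal">fr_FR</tt> or <tt class="docutils literal">fr_BE</tt> vary across systems and may not be installed:

```python
import locale

# LC_ALL sets every category at once; individual categories such as
# locale.LC_NUMERIC or locale.LC_MONETARY can also be set separately.
locale.setlocale(locale.LC_ALL, "C")
conv = locale.localeconv()
print(conv["decimal_point"])         # '.' in the C locale
print(repr(conv["thousands_sep"]))   # '' : the C locale does no grouping
```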
<p>In an application which only uses bytes for text, as Python 2 mostly does,
things are mostly fine: in the worst case, users see <a class="reference external" href="https://en.wikipedia.org/wiki/Mojibake">mojibake</a>, but the application doesn't
"crash" (exit and/or lose data). On the other hand, <strong>Python 3 is designed to
use Unicode for text and raises hard Unicode errors when it fails to decode
bytes or to encode text</strong>.</p>
</div>
<div class="section" id="support-non-ascii-decimal-point-and-thousands-separator">
<h2>Support non-ASCII decimal point and thousands separator</h2>
<p>The Unicode type has been reimplemented in Python 3.3 to use "compact string":
<a class="reference external" href="https://www.python.org/dev/peps/pep-0393/">PEP 393 "Flexible String Representation"</a>. The new implementation is more
complex and the format() function has been limited to ASCII for the decimal
point and thousands separator (format a number using the "n" type).</p>
<p>In January 2012, Stefan Krah noticed the regression (compared to Python 3.2)
and reported <a class="reference external" href="https://bugs.python.org/issue13706">bpo-13706</a>. I fixed the
code to support non-ASCII in format (<a class="reference external" href="https://github.com/python/cpython/commit/a4ac600d6f9c5b74b97b99888b7cf3a7973cadc8">commit a4ac600d</a>).
But when I did more tests, I noticed that the "n" type didn't properly decode
the decimal point and thousands separator, which come from the <tt class="docutils literal">localeconv()</tt>
function as byte strings.</p>
<p>I fixed <tt class="docutils literal">format(int, "n")</tt> with <a class="reference external" href="https://github.com/python/cpython/commit/41a863cb81608c779d60b49e7be8a115816734fc">commit 41a863cb</a>,
which decodes the decimal point and the thousands separator (<tt class="docutils literal">localeconv()</tt> fields) from
the locale encoding rather than Latin-1, using <tt class="docutils literal">PyUnicode_DecodeLocale()</tt>:</p>
<pre class="literal-block">
commit 41a863cb81608c779d60b49e7be8a115816734fc
Author: Victor Stinner <victor.stinner@haypocalc.com>
Date: Fri Feb 24 00:37:51 2012 +0100
Issue #13706: Fix format(int, "n") for locale with non-ASCII thousands separator
* Decode thousands separator and decimal point using PyUnicode_DecodeLocale()
(from the locale encoding), instead of decoding them implicitly from latin1
* Remove _PyUnicode_InsertThousandsGroupingLocale(), it was not used
* Change _PyUnicode_InsertThousandsGrouping() API to return the maximum
character if unicode is NULL
* (...)
</pre>
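As a quick reminder of what the fix protects: the "n" format type is the locale-aware cousin of "d". In the C locale it degrades to plain digits, while the "," option groups independently of the locale:

```python
import locale

locale.setlocale(locale.LC_ALL, "C")
print(format(1234567, "n"))    # '1234567': the C locale has no thousands separator
print(format(1234567, ",d"))   # '1,234,567': grouping independent of the locale
```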
<p>Note: I decided to not fix Python 3.2:</p>
<blockquote>
Hum, <strong>it is not trivial to redo the work on Python 3.2</strong>. I prefer to leave
the code unchanged to not introduce a regression, and I wait until a Python
3.2 user complains (the bug exists since Python 3.0 and nobody complained).</blockquote>
</div>
<div class="section" id="crash-with-non-ascii-decimal-point">
<h2>Crash with non-ASCII decimal point</h2>
<p>Six years later, in June 2018, I noticed that Python does crash when running
tests on locales:</p>
<pre class="literal-block">
$ ./python
Python 3.8.0a0 (heads/master-dirty:bcd3a1a18d, Jun 23 2018, 10:31:03)
[GCC 8.1.1 20180502 (Red Hat 8.1.1-1)] on linux
>>> import locale
>>> locale.str(2.5)
'2.5'
>>> '{:n}'.format(2.5)
'2.5'
>>> locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> locale.str(2.5)
'2,5'
>>> '{:n}'.format(2.5)
python: Objects/unicodeobject.c:474: _PyUnicode_CheckConsistency: Assertion `maxchar < 128' failed.
Aborted (core dumped)
</pre>
<p>I reported the issue as <a class="reference external" href="https://bugs.python.org/issue33954">bpo-33954</a>. The
bug only occurs for a decimal point with a code point greater than U+00FF
(255). It was a bug in my <a class="reference external" href="https://bugs.python.org/issue13706">bpo-13706</a>
fix: <a class="reference external" href="https://github.com/python/cpython/commit/a4ac600d6f9c5b74b97b99888b7cf3a7973cadc8">commit a4ac600d</a>.</p>
<p>I pushed a second fix to properly support all cases, <a class="reference external" href="https://github.com/python/cpython/commit/59423e3ddd736387cef8f7632c71954c1859bed0">commit 59423e3d</a>:</p>
<pre class="literal-block">
commit 59423e3ddd736387cef8f7632c71954c1859bed0
Author: Victor Stinner <vstinner@redhat.com>
Date: Mon Nov 26 13:40:01 2018 +0100
bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623)
Fix str.format(), float.__format__() and complex.__format__() methods
for non-ASCII decimal point when using the "n" formatter.
Changes:
* Rewrite _PyUnicode_InsertThousandsGrouping(): it now requires
a _PyUnicodeWriter object for the buffer and a Python str object
for digits.
* Rename FILL() macro to unicode_fill(), convert it to static inline function,
add "assert(0 <= start);" and rework its code.
</pre>
</div>
<div class="section" id="lc-numeric-encoding-different-than-lc-ctype-encoding">
<h2>LC_NUMERIC encoding different than LC_CTYPE encoding</h2>
<p>In August 2017, Petr Viktorin identified a bug in Koji (server building Fedora
packages): <a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1484497">UnicodeDecodeError in localeconv() makes test_float fail in Koji</a></p>
<blockquote>
"This is tripped by Python's test suite, namely
test_float.GeneralFloatCases.test_float_with_comma"</blockquote>
<p>He wrote a short reproducer script:</p>
<pre class="literal-block">
import locale
locale.setlocale(locale.LC_ALL, 'C.UTF-8')
locale.setlocale(locale.LC_NUMERIC, 'fr_FR.ISO8859-1')
print(locale.localeconv())
</pre>
<p>Two months later, Charalampos Stratakis reported the bug upstream: <a class="reference external" href="https://bugs.python.org/issue31900">bpo-31900</a>. The problem arises when <strong>the
LC_NUMERIC locale uses a different encoding than the LC_CTYPE encoding</strong>.</p>
<p>The bug was already known:</p>
<ul class="simple">
<li>2015-12-05: Serhiy Storchaka reported <a class="reference external" href="https://bugs.python.org/issue25812">bpo-25812</a> with uk_UA locale</li>
<li>2016-11-03: Guillaume Pasquet reported <a class="reference external" href="https://bugs.python.org/issue28604">bpo-28604</a> with en_GB locale</li>
</ul>
<p>Moreover, <strong>the bug was known since 2009</strong>, Stefan Krah reported a very similar
bug: <a class="reference external" href="https://bugs.python.org/issue7442">bpo-7442</a>. I was even involved in
this issue in 2013, but then I forgot about it (as usual, I am working on too
many issues in parallel :-)).</p>
<p>In 2010, PostgreSQL <a class="reference external" href="https://www.postgresql.org/message-id/20100422015552.4B7E07541D0@cvs.postgresql.org">had the same issue</a>
and <a class="reference external" href="https://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/pg_locale.c?r1=1.53&r2=1.54">fixed the bug by changing temporarily the LC_CTYPE locale to the
LC_NUMERIC locale</a>.</p>
<p>In January 2018, I came back to this 9-year-old bug. I was fixing bugs in the
implementation of my <a class="reference external" href="https://www.python.org/dev/peps/pep-0540/">PEP 540 "Add a new UTF-8 Mode"</a>. I pushed a large change to fix
locale encodings in <a class="reference external" href="https://bugs.python.org/issue29240">bpo-29240</a>, <a class="reference external" href="https://github.com/python/cpython/commit/7ed7aead9503102d2ed316175f198104e0cd674c">commit
7ed7aead</a>:</p>
<pre class="literal-block">
commit 7ed7aead9503102d2ed316175f198104e0cd674c
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Mon Jan 15 10:45:49 2018 +0100
bpo-29240: Fix locale encodings in UTF-8 Mode (#5170)
Modify locale.localeconv(), time.tzname, os.strerror() and other
functions to ignore the UTF-8 Mode: always use the current locale
encoding.
Changes: (...)
</pre>
<p>Stefan Krah asked:</p>
<blockquote>
I have the exact same questions as Marc-Andre. This is one of the reasons
why I blocked the _decimal change. I don't fully understand the role of the
new glibc, since #7442 has existed for ages -- and <strong>it is a open question
whether it is a bug or not</strong>.</blockquote>
<p>I replied:</p>
<blockquote>
<p>Past 10 years, I repeated to every single user I met that "Python 3 is
right, your system setup is wrong". But that's a waste of time. People
continue to associate Python3 and Unicode to annoying bugs, because they
don't understand how locales work.</p>
<p>Instead of having to repeat to each user that "hum, maybe your config is
wrong", <strong>I prefer to support this non convential setup and work as expected
("it just works")</strong>. With my latest implementation, setlocale() is only done
when LC_CTYPE and LC_NUMERIC are different, which is the corner case which
"shouldn't occur in practice".</p>
</blockquote>
<p>Marc-Andre Lemburg added:</p>
<blockquote>
Sounds like a good compromise :-)</blockquote>
<p>After doing more tests on FreeBSD, Linux and macOS, I pushed <a class="reference external" href="https://github.com/python/cpython/commit/cb064fc2321ce8673fe365e9ef60445a27657f54">commit cb064fc2</a>
to fix <a class="reference external" href="https://bugs.python.org/issue31900">bpo-31900</a> by changing
temporarily the LC_CTYPE locale to the LC_NUMERIC locale:</p>
<pre class="literal-block">
commit cb064fc2321ce8673fe365e9ef60445a27657f54
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Mon Jan 15 15:58:02 2018 +0100
bpo-31900: Fix localeconv() encoding for LC_NUMERIC (#4174)
* Add _Py_GetLocaleconvNumeric() function: decode decimal_point and
thousands_sep fields of localeconv() from the LC_NUMERIC encoding,
rather than decoding from the LC_CTYPE encoding.
* Modify locale.localeconv() and "n" formatter of str.format() (for
int, float and complex to use _Py_GetLocaleconvNumeric()
internally.
</pre>
<p>I dislike my own fix because temporarily changing the LC_CTYPE locale impacts
all threads, not only the current thread. But we failed to find another
solution. <strong>The LC_CTYPE locale is only changed if the LC_NUMERIC locale is
different than the LC_CTYPE locale and if the decimal point or the thousands
separator is non-ASCII.</strong></p>
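A Python-level sketch of the same workaround (the real fix is the C function <tt class="docutils literal">_Py_GetLocaleconvNumeric()</tt>; the helper name below is made up for illustration):

```python
import locale

def localeconv_with_numeric_ctype():
    """Read localeconv() while LC_CTYPE temporarily matches LC_NUMERIC."""
    saved_ctype = locale.setlocale(locale.LC_CTYPE)    # query current setting
    numeric = locale.setlocale(locale.LC_NUMERIC)      # query current setting
    try:
        if numeric != saved_ctype:
            # This affects all threads, which is why the real fix only
            # does it when the two locales actually differ.
            locale.setlocale(locale.LC_CTYPE, numeric)
        return locale.localeconv()
    finally:
        locale.setlocale(locale.LC_CTYPE, saved_ctype)

locale.setlocale(locale.LC_ALL, "C")
print(localeconv_with_numeric_ctype()["decimal_point"])  # '.'
```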
<p>Note: I proposed a change to fix the same bug in the <tt class="docutils literal">decimal</tt> module: <a class="reference external" href="https://github.com/python/cpython/pull/5191">PR
#5191</a>, but I abandoned my
patch.</p>
</div>
<div class="section" id="lc-monetary-encoding-different-than-lc-ctype-encoding">
<h2>LC_MONETARY encoding different than LC_CTYPE encoding</h2>
<p>Fixing <a class="reference external" href="https://bugs.python.org/issue31900">bpo-31900</a> drained all my
energy, but sadly... there was a similar bug with LC_MONETARY!</p>
<p>On 2016-11-03, Guillaume Pasquet reported <a class="reference external" href="https://bugs.python.org/issue28604">bpo-28604: Exception raised by
python3.5 when using en_GB locale</a>.</p>
<p>The fix is similar to the LC_NUMERIC fix: temporarily change the LC_CTYPE
locale to the LC_MONETARY locale, <a class="reference external" href="https://github.com/python/cpython/commit/02e6bf7f2025cddcbde6432f6b6396198ab313f4">commit 02e6bf7f</a>:</p>
<pre class="literal-block">
commit 02e6bf7f2025cddcbde6432f6b6396198ab313f4
Author: Victor Stinner <vstinner@redhat.com>
Date: Tue Nov 20 16:20:16 2018 +0100
bpo-28604: Fix localeconv() for different LC_MONETARY (GH-10606)
locale.localeconv() now sets temporarily the LC_CTYPE locale to the
LC_MONETARY locale if the two locales are different and monetary
strings are non-ASCII. This temporary change affects other threads.
Changes:
* locale.localeconv() can now set LC_CTYPE to LC_MONETARY to decode
monetary fields.
* (...)
</pre>
</div>
<div class="section" id="tests-non-ascii-locales">
<h2>Tests non-ASCII locales</h2>
<p>To test my bugfixes, I used manual tests. The first issue was to identify
locales with problematic characters: non-ASCII decimal point or thousands
separator for example. I wrote my own "test suite" for Windows, Linux, macOS
and FreeBSD on my website: <a class="reference external" href="https://vstinner.readthedocs.io/unicode.html#test-non-ascii-characters-with-locales">Test non-ASCII characters with locales</a>.</p>
<p>Example with localeconv() on Fedora 27:</p>
<table border="1" class="docutils">
<colgroup>
<col width="15%" />
<col width="8%" />
<col width="16%" />
<col width="25%" />
<col width="36%" />
</colgroup>
<thead valign="bottom">
<tr><th class="head">LC_ALL locale</th>
<th class="head">Encoding</th>
<th class="head">Field</th>
<th class="head">Bytes</th>
<th class="head">Text</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>es_MX.utf8</td>
<td>UTF-8</td>
<td>thousands_sep</td>
<td><tt class="docutils literal">0xE2 0x80 0x89</tt></td>
<td>U+2009</td>
</tr>
<tr><td>fr_FR.UTF-8</td>
<td>UTF-8</td>
<td>currency_symbol</td>
<td><tt class="docutils literal">0xE2 0x82 0xAC</tt></td>
<td>U+20AC (€)</td>
</tr>
<tr><td>ps_AF.utf8</td>
<td>UTF-8</td>
<td>thousands_sep</td>
<td><tt class="docutils literal">0xD9 0xAC</tt></td>
<td>U+066C (٬)</td>
</tr>
<tr><td>uk_UA.koi8u</td>
<td>KOI8-U</td>
<td>currency_symbol</td>
<td><tt class="docutils literal">0xC7 0xD2 0xCE 0x2E</tt></td>
<td>U+0433 U+0440 U+043d U+002E (грн.)</td>
</tr>
<tr><td>uk_UA.koi8u</td>
<td>KOI8-U</td>
<td>thousands_sep</td>
<td><tt class="docutils literal">0x9A</tt></td>
<td>U+00A0</td>
</tr>
</tbody>
</table>
<p>Manual tests became more and more complex, since there are so many cases: each
operating system uses different locale names, and the result depends on the libc
version. After months of manual tests, I wrote my small personal <strong>portable</strong>
locale test suite: <a class="reference external" href="https://github.com/vstinner/misc/blob/master/python/test_all_locales.py">test_all_locales.py</a>.
It supports:</p>
<ul class="simple">
<li>FreeBSD 11</li>
<li>macOS</li>
<li>Fedora (Linux)</li>
</ul>
<p>Example:</p>
<pre class="literal-block">
def test_zh_TW_Big5(self):
loc = "zh_TW.Big5" if BSD else "zh_TW.big5"
if FREEBSD:
currency_symbol = u'\uff2e\uff34\uff04'
decimal_point = u'\uff0e'
thousands_sep = u'\uff0c'
date_str = u'\u661f\u671f\u56db 2\u6708'
else:
currency_symbol = u'NT$'
decimal_point = u'.'
thousands_sep = u','
if MACOS:
date_str = u'\u9031\u56db 2\u6708'
else:
date_str = u'\u9031\u56db \u4e8c\u6708'
self.set_locale(loc, "Big5")
lc = locale.localeconv()
self.assertLocaleEqual(lc['currency_symbol'], currency_symbol)
self.assertLocaleEqual(lc['decimal_point'], decimal_point)
self.assertLocaleEqual(lc['thousands_sep'], thousands_sep)
self.assertLocaleEqual(time.strftime('%A %B', FEBRUARY), date_str)
</pre>
<p>The best would be to integrate these tests directly into the Python test suite,
but it's neither portable nor future-proof, since most constants are hardcoded
and depend on the operating system and the libc version.</p>
</div>
Python 3, locales and encodings2018-09-06T16:00:00+02:002018-09-06T16:00:00+02:00Victor Stinnertag:vstinner.github.io,2018-09-06:/python3-locales-encodings.html<img alt="I □ Unicode" src="https://vstinner.github.io/images/i-square-unicode.jpg" />
<p>Recently, I worked on a change which looked simple: move the code to initialize
the <tt class="docutils literal">sys.stdout</tt> encoding before <tt class="docutils literal">Py_Initialize()</tt>. While I was on it,
I also decided to move the code which selects the Python "filesystem encoding".
I didn't expect that I would spend 2 weeks on these issues …</p><img alt="I □ Unicode" src="https://vstinner.github.io/images/i-square-unicode.jpg" />
<p>Recently, I worked on a change which looked simple: move the code to initialize
the <tt class="docutils literal">sys.stdout</tt> encoding before <tt class="docutils literal">Py_Initialize()</tt>. While I was on it,
I also decided to move the code which selects the Python "filesystem encoding".
I didn't expect that I would spend 2 weeks on these issues... This article
tells the story of my recent journey through locales and encodings on AIX, HP-UX,
Windows, Linux, macOS, Solaris and FreeBSD.</p>
<p>Table of Contents:</p>
<ul class="simple">
<li>Lying HP-UX</li>
<li>Standard streams and filesystem encodings</li>
<li>POSIX locale on FreeBSD</li>
<li>C locale on Windows</li>
<li>Back to stdio encoding</li>
<li>Back to filesystem encoding</li>
<li>Use surrogatepass on Windows</li>
<li>Filesystem encoding documentation</li>
<li>Final FreeBSD 10 issue</li>
<li>Configuration of locales and encodings</li>
</ul>
<div class="section" id="lying-hp-ux">
<h2>Lying HP-UX</h2>
<p>On 2018-08-14, Michael Osipov reported <a class="reference external" href="https://bugs.python.org/issue34403">bpo-34403</a>:
"test_utf8_mode.test_cmd_line() fails on HP-UX due to false assumptions":</p>
<pre class="literal-block">
======================================================================
FAIL: test_cmd_line (test.test_utf8_mode.UTF8ModeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
(...)
AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != "['h\\udcc3\\udca9\\udce2\\udc82\\udcac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ ['h\udcc3\udca9\udce2\udc82\udcac']
: roman8:['h\xc3\xa9\xe2\x82\xac']
</pre>
<p>Interesting, HP-UX uses "roman8" as its locale encoding. What is this "new"
encoding? Wikipedia: <a class="reference external" href="https://en.wikipedia.org/wiki/HP_Roman#Roman-8">HP Roman-8</a>. Oh, that's even older than
the common ISO 8859 encodings like Latin1!</p>
<p>Michael Felt was working on a similar test_utf8_mode failure on AIX, so the two
of them tried to debug the issue together, but failed to understand it. Osipov
proposed giving up and just skipping the test on HP-UX...</p>
<p>I showed up and proposed a fix for the unit test: <a class="reference external" href="https://github.com/python/cpython/pull/8967/files">PR 8967</a>. The test was hardcoding
the expected locale encoding. I modified the test to query the locale encoding
at runtime instead.</p>
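Querying the announced locale encoding at runtime, instead of hardcoding it, looks like this (<tt class="docutils literal">nl_langinfo()</tt> is Unix-only; it does not exist on Windows):

```python
import locale

locale.setlocale(locale.LC_ALL, "")            # adopt the user's locale
codeset = locale.nl_langinfo(locale.CODESET)   # what the libc announces
print(codeset)   # e.g. 'UTF-8', 'ANSI_X3.4-1968', or 'roman8' on HP-UX
```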
<p>Bad surprise: the test still failed. <a class="reference external" href="https://bugs.python.org/issue34403#msg324219">I commented</a>:</p>
<blockquote>
Hum, it looks like a bug in the C library of HP-UX.</blockquote>
<p>I wrote a C program calling mbstowcs() to check what is the actual encoding
used by the C library: <a class="reference external" href="https://bugs.python.org/file47767/c_locale.c">c_locale.c</a>. <a class="reference external" href="https://bugs.python.org/issue34403#msg324225">Result</a>:</p>
<blockquote>
Well, it confirms what I expected: <tt class="docutils literal">nl_langinfo(CODESET)</tt> announces
<tt class="docutils literal">"roman8"</tt>, but <tt class="docutils literal">mbstowcs()</tt> uses Latin1 encoding in practice.</blockquote>
<p>So I wrote a workaround similar to the one used on FreeBSD and Solaris: check
if the libc announces an encoding different from the real encoding, and if so,
force the usage of the ASCII encoding in Python. See
my <a class="reference external" href="https://github.com/python/cpython/commit/d500e5307aec9c5d535f66d567fadb9c587a9a36">commit d500e530</a>:</p>
<pre class="literal-block">
Author: Victor Stinner <vstinner@redhat.com>
Date: Tue Aug 28 17:27:36 2018 +0200
bpo-34403: On HP-UX, force ASCII for C locale (GH-8969)
On HP-UX with C or POSIX locale, sys.getfilesystemencoding() now returns
"ascii" instead of "roman8" (when the UTF-8 Mode is disabled and the C locale
is not coerced).
nl_langinfo(CODESET) announces "roman8" whereas it uses the Latin1
encoding in practice.
</pre>
<p>Extract of the heuristic code:</p>
<pre class="literal-block">
if (strcmp(encoding, "roman8") == 0) {
    unsigned char ch = (unsigned char)0xA7;
    wchar_t wch;
    size_t res = mbstowcs(&wch, (char*)&ch, 1);
    if (res != (size_t)-1 && wch == L'\xA7') {
        /* On HP-UX with the C locale or the POSIX locale,
           nl_langinfo(CODESET) announces "roman8",
           whereas mbstowcs() uses Latin1 encoding in practice.
           Force ASCII in this case. Roman8 decodes 0xA7
           to U+00CF. Latin1 decodes 0xA7 to U+00A7. */
        return 1;
    }
}
</pre>
<p>Python 3.8 will handle Unicode better on HP-UX. The test_utf8_mode
failure was just a hint at a real underlying bug!</p>
</div>
<div class="section" id="standard-streams-and-filesystem-encodings">
<h2>Standard streams and filesystem encodings</h2>
<p>While reworking the Python initialization, I tried to move <strong>all</strong>
configuration parameters to a new <tt class="docutils literal">_PyCoreConfig</tt> structure. But I knew that
I had missed at least the standard streams encoding (ex: <tt class="docutils literal">sys.stdout.encoding</tt>).
My first attempt to move the code failed: it broke many tests. I created
<a class="reference external" href="https://bugs.python.org/issue34485">bpo-34485</a>: "_PyCoreConfig: add
stdio_encoding and stdio_errors".</p>
<p>While working on the stdio encoding, I recalled that the Python
filesystem encoding is also initialized "late". I created <a class="reference external" href="https://bugs.python.org/issue34523">bpo-34523</a>: "Choose the filesystem encoding before
Python initialization (add _PyCoreConfig.filesystem_encoding)" to move this
code as well.</p>
<p>I quickly had an implementation, but it didn't go as well as expected...</p>
</div>
<div class="section" id="posix-locale-on-freebsd">
<h2>POSIX locale on FreeBSD</h2>
<p><a class="reference external" href="https://bugs.python.org/issue34485">bpo-34485</a>: To me, the "C" and "POSIX"
locales were the same locale: C is an alias of POSIX, or the opposite; it
didn't really matter. But Python handles them differently in some corner
cases. For example, Nick Coghlan's PEP 538 (C locale coercion) is only enabled
if the LC_CTYPE locale is equal to "C", not if it's equal to "POSIX".</p>
<p>In Python 3.5, I changed stdin and stdout error handlers from strict to
surrogateescape if the LC_CTYPE locale is "C": <a class="reference external" href="https://bugs.python.org/issue19977">bpo-19977</a>. But when I tested my
stdio and filesystem changes on Linux, FreeBSD and Windows, I noticed that
I forgot to handle the "POSIX" locale. On FreeBSD, <tt class="docutils literal">LC_ALL=POSIX</tt> and <tt class="docutils literal">LC_ALL=C</tt>
behave differently:</p>
<ul class="simple">
<li>With <tt class="docutils literal">LC_ALL=POSIX</tt> environment, <tt class="docutils literal">setlocale(LC_CTYPE, "")</tt> returns <tt class="docutils literal">"POSIX"</tt></li>
<li>With <tt class="docutils literal">LC_ALL=C</tt> environment, <tt class="docutils literal">setlocale(LC_CTYPE, "")</tt> returns <tt class="docutils literal">"C"</tt></li>
</ul>
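<p>The query/set semantics of the <tt class="docutils literal">setlocale()</tt> calls above can be sketched in Python (a minimal illustration; whether the libc reports "C" or "POSIX" for the environment-based call remains platform-dependent):</p>

```python
import locale

# Passing None queries the current locale without changing it;
# an explicit name sets it; "" would pick the locale from the
# LC_ALL / LC_CTYPE / LANG environment variables (platform-dependent,
# so not demonstrated here).
locale.setlocale(locale.LC_CTYPE, "C")
assert locale.setlocale(locale.LC_CTYPE, None) == "C"
```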
<p>I fixed that to also use the "surrogateescape" error handler for the POSIX
locale on FreeBSD. <a class="reference external" href="https://github.com/python/cpython/commit/315877dc361d554bec34b4b62c270479ad36a1be">Commit 315877dc</a>:</p>
<pre class="literal-block">
Author: Victor Stinner <vstinner@redhat.com>
Date: Wed Aug 29 09:58:12 2018 +0200
bpo-34485: stdout uses surrogateescape on POSIX locale (GH-8986)
Standard streams like sys.stdout now use the "surrogateescape" error
handler, instead of "strict", on the POSIX locale (when the C locale is not
coerced and the UTF-8 Mode is disabled).
Add tests on sys.stdout.errors with LC_ALL=POSIX.
</pre>
<p>The most important change is just one line:</p>
<pre class="literal-block">
- if (strcmp(ctype_loc, "C") == 0) {
+ if (strcmp(ctype_loc, "C") == 0 || strcmp(ctype_loc, "POSIX") == 0) {
return "surrogateescape";
}
</pre>
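<p>What the <tt class="docutils literal">surrogateescape</tt> error handler does can be illustrated in pure Python: undecodable bytes are smuggled into the string as lone surrogates and restored on encoding:</p>

```python
# b"\xe9" is invalid as standalone UTF-8: strict decoding would fail,
# surrogateescape maps the byte 0xE9 to the lone surrogate U+DCE9
data = b"caf\xe9"
text = data.decode("utf-8", "surrogateescape")
assert text == "caf\udce9"
# Encoding with surrogateescape restores the original bytes unchanged
assert text.encode("utf-8", "surrogateescape") == data
```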
<p><a class="reference external" href="https://bugs.python.org/issue34527">bpo-34527</a>: Since I was testing
various configurations, I also noticed that my UTF-8 Mode (PEP 540) had the
same bug. Python 3.7 enables it if the LC_CTYPE locale is equal to "C",
but not if it's equal to "POSIX". I also changed that (<a class="reference external" href="https://github.com/python/cpython/commit/5cb258950ce9b69b1f65646431c464c0c17b1510">commit 5cb25895</a>).</p>
</div>
<div class="section" id="c-locale-on-windows">
<h2>C locale on Windows</h2>
<p>While testing my changes on Windows, I noticed that Python starts with the
LC_CTYPE locale equal to "C", whereas <tt class="docutils literal">locale.setlocale(locale.LC_CTYPE, "")</tt>
changes the LC_CTYPE locale to something like <tt class="docutils literal">English_United States.1252</tt>
(English with the code page 1252). Example with Python 3.6:</p>
<pre class="literal-block">
C:\> python
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
>>> import locale
>>> locale.setlocale(locale.LC_CTYPE, None)
'C'
>>> locale.setlocale(locale.LC_CTYPE, "")
'English_United States.1252'
>>> locale.setlocale(locale.LC_CTYPE, None)
'English_United States.1252'
</pre>
<p>On UNIX, Python 2 starts with the default C locale, whereas Python 3 always
sets the LC_CTYPE locale to my preference. Example on Fedora 28 with
<tt class="docutils literal"><span class="pre">LANG=fr_FR.UTF-8</span></tt>:</p>
<pre class="literal-block">
$ python2 -c 'import locale; print(locale.setlocale(locale.LC_CTYPE, None))'
C
$ python3 -c 'import locale; print(locale.setlocale(locale.LC_CTYPE, None))'
fr_FR.UTF-8
</pre>
<p>I modified Windows to behave as UNIX, <a class="reference external" href="https://github.com/python/cpython/commit/177d921c8c03d30daa32994362023f777624b10d">commit 177d921c</a>:</p>
<pre class="literal-block">
Author: Victor Stinner <vstinner@redhat.com>
Date: Wed Aug 29 11:25:15 2018 +0200
bpo-34485, Windows: LC_CTYPE set to user preference (GH-8988)
On Windows, the LC_CTYPE is now set to the user preferred locale at
startup: _Py_SetLocaleFromEnv(LC_CTYPE) is now called during the
Python initialization. Previously, the LC_CTYPE locale was "C" at
startup, but changed when calling setlocale(LC_CTYPE, "") or
setlocale(LC_ALL, "").
pymain_read_conf() now also calls _Py_SetLocaleFromEnv(LC_CTYPE) to
behave as _Py_InitializeCore(). Moreover, it doesn't save/restore the
LC_ALL anymore.
On Windows, standard streams like sys.stdout now always use
surrogateescape error handler by default (ignore the locale).
</pre>
<p>Example:</p>
<pre class="literal-block">
C:\> python3.6 -c "import locale; print(locale.setlocale(locale.LC_CTYPE, None))"
C
C:\> python3.8 -c "import locale; print(locale.setlocale(locale.LC_CTYPE, None))"
English_United States.1252
</pre>
<p>On Windows, Python 3.8 now starts with the LC_CTYPE locale set to my
preference, as it was already previously done on UNIX.</p>
</div>
<div class="section" id="back-to-stdio-encoding">
<h2>Back to stdio encoding</h2>
<p>After all previous changes and fixes, I was able to push my <a class="reference external" href="https://github.com/python/cpython/commit/dfe0dc74536dfb6f331131d9b2b49557675bb6b7">commit dfe0dc74</a>:</p>
<pre class="literal-block">
Author: Victor Stinner <vstinner@redhat.com>
Date: Wed Aug 29 11:47:29 2018 +0200
bpo-34485: Add _PyCoreConfig.stdio_encoding (GH-8881)
* Add stdio_encoding and stdio_errors fields to _PyCoreConfig.
* Add unit tests on stdio_encoding and stdio_errors.
</pre>
</div>
<div class="section" id="back-to-filesystem-encoding">
<h2>Back to filesystem encoding</h2>
<p><a class="reference external" href="https://github.com/python/cpython/commit/b2457efc78b74a1d6d1b77d11a939e886b8a4e2c">Commit b2457efc</a>:</p>
<pre class="literal-block">
Author: Victor Stinner <vstinner@redhat.com>
Date: Wed Aug 29 13:25:36 2018 +0200
bpo-34523: Add _PyCoreConfig.filesystem_encoding (GH-8963)
_PyCoreConfig_Read() is now responsible to choose the filesystem
encoding and error handler. Using Py_Main(), the encoding is now
chosen even before calling Py_Initialize().
_PyCoreConfig.filesystem_encoding is now the reference, instead of
Py_FileSystemDefaultEncoding, for the Python filesystem encoding.
Changes:
* Add filesystem_encoding and filesystem_errors to _PyCoreConfig
* _PyCoreConfig_Read() now reads the locale encoding for the file
system encoding.
* PyUnicode_EncodeFSDefault() and PyUnicode_DecodeFSDefaultAndSize()
now use the interpreter configuration rather than
Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors
global configuration variables.
* Add _Py_SetFileSystemEncoding() and _Py_ClearFileSystemEncoding()
private functions to only modify Py_FileSystemDefaultEncoding and
Py_FileSystemDefaultEncodeErrors in coreconfig.c.
* _Py_CoerceLegacyLocale() now takes an int rather than
_PyCoreConfig for the warning.
</pre>
</div>
<div class="section" id="use-surrogatepass-on-windows">
<h2>Use surrogatepass on Windows</h2>
<p>While working on the filesystem encoding change, I had a bug in
_freeze_importlib.exe which failed at startup:</p>
<pre class="literal-block">
ValueError: only 'strict' and 'surrogateescape' error handlers are supported, not 'surrogatepass'
</pre>
<p>I used the following workaround in <tt class="docutils literal">_freeze_importlib.c</tt>:</p>
<pre class="literal-block">
#ifdef MS_WINDOWS
    /* bpo-34523: initfsencoding() is not called if _install_importlib=0,
       so interp->fscodec_initialized value remains 0.
       PyUnicode_EncodeFSDefault() doesn't support the "surrogatepass" error
       handler in such case, whereas it's the default error handler on Windows.
       Force the "strict" error handler to work around this bootstrap issue. */
    config.filesystem_errors = "strict";
#endif
</pre>
<p>But I wasn't fully happy with the workaround. When running more manual tests, I
found that the <tt class="docutils literal">PYTHONLEGACYWINDOWSFSENCODING</tt> environment variable wasn't
handled properly. I pushed a first fix,
<a class="reference external" href="https://github.com/python/cpython/commit/c5989cd87659acbfd4d19dc00dbe99c3a0fc9bd2">commit c5989cd8</a>:</p>
<pre class="literal-block">
Author: Victor Stinner <vstinner@redhat.com>
Date: Wed Aug 29 19:32:47 2018 +0200
bpo-34523: Py_DecodeLocale() use UTF-8 on Windows (GH-8998)
Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding on
Windows if Py_LegacyWindowsFSEncodingFlag is zero.
pymain_read_conf() now sets Py_LegacyWindowsFSEncodingFlag in its
loop, but restore its value at exit.
</pre>
<p>My intent was to be able to use the <tt class="docutils literal">surrogatepass</tt> error handler. If
<tt class="docutils literal">Py_DecodeLocale()</tt> is hardcoded to use UTF-8 on Windows, we get
access to the <tt class="docutils literal">surrogatepass</tt> error handler. Previously, the <tt class="docutils literal">mbstowcs()</tt>
function was used, and it only supports the <tt class="docutils literal">strict</tt> and
<tt class="docutils literal">surrogateescape</tt> error handlers.</p>
<p>I pushed a second big change to add support for the <tt class="docutils literal">surrogatepass</tt> error
handler in locale codecs, <a class="reference external" href="https://github.com/python/cpython/commit/3d4226a832cabc630402589cc671cc4035d504e5">commit 3d4226a8</a>:</p>
<pre class="literal-block">
Author: Victor Stinner <vstinner@redhat.com>
Date: Wed Aug 29 22:21:32 2018 +0200
bpo-34523: Support surrogatepass in locale codecs (GH-8995)
Add support for the "surrogatepass" error handler in
PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault()
for the UTF-8 encoding.
Changes:
* _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the
surrogatepass error handler (_Py_ERROR_SURROGATEPASS).
* _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use
the _Py_error_handler enum instead of "int surrogateescape" to pass
the error handler. These functions now return -3 if the error
handler is unknown.
* Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx()
in test_codecs.
* Rename get_error_handler() to _Py_GetErrorHandler() and expose it
as a private function.
* _freeze_importlib doesn't need config.filesystem_errors="strict"
workaround anymore.
</pre>
<p><tt class="docutils literal">PyUnicode_DecodeFSDefault()</tt> and <tt class="docutils literal">PyUnicode_EncodeFSDefault()</tt> functions
use <tt class="docutils literal">Py_DecodeLocale()</tt> and <tt class="docutils literal">Py_EncodeLocale()</tt> before the Python codec of
the filesystem encoding is loaded. With this big change, <tt class="docutils literal">Py_DecodeLocale()</tt>
and <tt class="docutils literal">Py_EncodeLocale()</tt> now really behave like the Python codec.</p>
<p>Previously, Python started with the <tt class="docutils literal">surrogateescape</tt> error handler, and
switched to the <tt class="docutils literal">surrogatepass</tt> error handler once the Python codec was
loaded.</p>
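<p>The difference between the two handlers can be shown in pure Python (a minimal sketch): <tt class="docutils literal">surrogatepass</tt> writes a lone surrogate as regular UTF-8 bytes, whereas <tt class="docutils literal">surrogateescape</tt> maps it back to its original single byte:</p>

```python
s = "\udc80"  # a lone surrogate, as produced by surrogateescape
# surrogatepass encodes it as the regular 3-byte UTF-8 sequence of U+DC80
assert s.encode("utf-8", "surrogatepass") == b"\xed\xb2\x80"
# surrogateescape instead maps it back to the single original byte
assert s.encode("utf-8", "surrogateescape") == b"\x80"
# decoding with surrogatepass restores the lone surrogate
assert b"\xed\xb2\x80".decode("utf-8", "surrogatepass") == s
```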
</div>
<div class="section" id="filesystem-encoding-documentation">
<h2>Filesystem encoding documentation</h2>
<p>One "last" change: I documented how Python selects the filesystem encoding,
<a class="reference external" href="https://github.com/python/cpython/commit/de427556746aa41a8b5198924ce423021bc0c718">commit de427556</a>:</p>
<pre class="literal-block">
Author: Victor Stinner <vstinner@redhat.com>
Date: Wed Aug 29 23:26:55 2018 +0200
bpo-34523: Py_FileSystemDefaultEncoding NULL by default (GH-9003)
* Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors
default value is now NULL: initfsencoding() set them
during Python initialization.
* Document how Python chooses the filesystem encoding and error
handler.
* Add an assertion to _PyCoreConfig_Read().
</pre>
<p>Documentation:</p>
<pre class="literal-block">
/* Python filesystem encoding and error handler:
sys.getfilesystemencoding() and sys.getfilesystemencodeerrors().
Default encoding and error handler:
* if Py_SetStandardStreamEncoding() has been called: they have the
highest priority;
* PYTHONIOENCODING environment variable;
* The UTF-8 Mode uses UTF-8/surrogateescape;
* locale encoding: ANSI code page on Windows, UTF-8 on Android,
LC_CTYPE locale encoding on other platforms;
* On Windows, "surrogateescape" error handler;
* "surrogateescape" error handler if the LC_CTYPE locale is "C" or "POSIX";
* "surrogateescape" error handler if the LC_CTYPE locale has been coerced
(PEP 538);
* "strict" error handler.
Supported error handlers: "strict", "surrogateescape" and
"surrogatepass". The surrogatepass error handler is only supported
if Py_DecodeLocale() and Py_EncodeLocale() use directly the UTF-8 codec;
it's only used on Windows.
initfsencoding() updates the encoding to the Python codec name.
For example, "ANSI_X3.4-1968" is replaced with "ascii".
On Windows, sys._enablelegacywindowsfsencoding() sets the
encoding/errors to mbcs/replace at runtime.
See Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors.
*/
char *filesystem_encoding;
char *filesystem_errors;
</pre>
</div>
<div class="section" id="final-freebsd-10-issue">
<h2>Final FreeBSD 10 issue</h2>
<p><a class="reference external" href="https://bugs.python.org/issue34544">bpo-34544</a>: The stdio and filesystem
encodings are now properly selected before Py_Initialize(), the LC_CTYPE locale
should be properly initialized, the "POSIX" locale is now properly handled, but
the FreeBSD 10 buildbot still complained about my recent changes... Many
<tt class="docutils literal">test_c_locale_coerce</tt> tests started to fail with:</p>
<blockquote>
Fatal Python error: get_locale_encoding: failed to get the locale encoding: nl_langinfo(CODESET) failed</blockquote>
<p>Sadly, I wasn't able to reproduce the issue on my FreeBSD 11 VM. I also got
access to the FreeBSD CURRENT buildbot, but I also failed to reproduce the bug
there. I was supposed to get access to the FreeBSD 10 buildbot, but there was a
DNS issue.</p>
<p>I had to <em>guess</em> the origin of the bug and I attempted a fix, <a class="reference external" href="https://github.com/python/cpython/commit/f01b2a1b84ee08df73a78cf1017eecf15e3cb995">commit f01b2a1b</a>:</p>
<pre class="literal-block">
Author: Victor Stinner <vstinner@redhat.com>
Date: Mon Sep 3 14:38:21 2018 +0200
bpo-34544: Fix setlocale() in pymain_read_conf() (GH-9041)
bpo-34485, bpo-34544: On some FreeBSD, nl_langinfo(CODESET) fails if
LC_ALL or LC_CTYPE is set to an invalid locale name. Replace
_Py_SetLocaleFromEnv(LC_CTYPE) with _Py_SetLocaleFromEnv(LC_ALL) to
initialize properly locales.
Partially revert commit 177d921c8c03d30daa32994362023f777624b10d.
</pre>
<p>... but it didn't work.</p>
<p>I decided to install a FreeBSD 10 VM and, one week later... I finally succeeded
in reproducing the issue!</p>
<p>The bug was that the <tt class="docutils literal">_Py_CoerceLegacyLocale()</tt> function didn't restore
LC_CTYPE to its previous value when it attempted to coerce the LC_CTYPE locale
but no locale worked.</p>
<p>Previously, it didn't matter, since the LC_CTYPE locale was initialized again
later, or it was saved/restored indirectly. But with my latest changes, the
LC_CTYPE was left unchanged.</p>
<p>The fix is just to restore LC_CTYPE if <tt class="docutils literal">_Py_CoerceLegacyLocale()</tt> fails,
<a class="reference external" href="https://github.com/python/cpython/commit/8ea09110d413829f71d979d8c7073008cb87fb03">commit 8ea09110</a>:</p>
<pre class="literal-block">
Author: Victor Stinner <vstinner@redhat.com>
Date: Mon Sep 3 17:05:18 2018 +0200
_Py_CoerceLegacyLocale() restores LC_CTYPE on fail (GH-9044)
bpo-34544: If _Py_CoerceLegacyLocale() fails to coerce the C locale,
restore the LC_CTYPE locale to its previous value.
</pre>
<p>Finally, I succeeded in doing what I initially wanted: remove the code which
saved/restored the LC_ALL locale. <tt class="docutils literal">pymain_read_conf()</tt> is now really
responsible for setting the LC_CTYPE locale, and it doesn't modify the LC_ALL
locale anymore.</p>
</div>
<div class="section" id="configuration-of-locales-and-encodings">
<h2>Configuration of locales and encodings</h2>
<p>Python has <strong>many</strong> options to configure the locales and encodings.</p>
<p>Main options of Python 3.7:</p>
<ul class="simple">
<li>Legacy Windows stdio (PEP 528)</li>
<li>Legacy Windows filesystem encoding (PEP 529)</li>
<li>C locale coercion (PEP 538)</li>
<li>UTF-8 mode (PEP 540)</li>
</ul>
<p>The combination of C locale coercion and UTF-8 mode is non-obvious and should
be carefully tested!</p>
<p>Environment variables:</p>
<ul class="simple">
<li><tt class="docutils literal">PYTHONCOERCECLOCALE=0</tt></li>
<li><tt class="docutils literal">PYTHONCOERCECLOCALE=1</tt></li>
<li><tt class="docutils literal">PYTHONCOERCECLOCALE=warn</tt></li>
<li><tt class="docutils literal"><span class="pre">PYTHONIOENCODING=:<errors></span></tt></li>
<li><tt class="docutils literal"><span class="pre">PYTHONIOENCODING=<encoding>:<errors></span></tt></li>
<li><tt class="docutils literal"><span class="pre">PYTHONIOENCODING=<encoding></span></tt></li>
<li><tt class="docutils literal">PYTHONLEGACYWINDOWSFSENCODING=1</tt></li>
<li><tt class="docutils literal">PYTHONLEGACYWINDOWSSTDIO=1</tt></li>
<li><tt class="docutils literal">PYTHONUTF8=0</tt></li>
<li><tt class="docutils literal">PYTHONUTF8=1</tt></li>
</ul>
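<p>The effect of <tt class="docutils literal">PYTHONUTF8</tt> can be checked from a child interpreter: <tt class="docutils literal">sys.flags.utf8_mode</tt> reports whether the UTF-8 Mode is enabled (a minimal sketch):</p>

```python
import os
import subprocess
import sys

# Run a child interpreter with PYTHONUTF8=1 and query the UTF-8 Mode flag
env = dict(os.environ, PYTHONUTF8="1")
proc = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
    env=env, capture_output=True, text=True,
)
assert proc.stdout.strip() == "1"
```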
<p>Command line options:</p>
<ul class="simple">
<li><tt class="docutils literal"><span class="pre">-X</span> utf8=0</tt></li>
<li><tt class="docutils literal"><span class="pre">-X</span> utf8</tt> or <tt class="docutils literal"><span class="pre">-X</span> utf8=1</tt></li>
<li><tt class="docutils literal"><span class="pre">-E</span></tt> or <tt class="docutils literal"><span class="pre">-I</span></tt> (ignore <tt class="docutils literal">PYTHON*</tt> environment variables)</li>
</ul>
<p>Global configuration variables:</p>
<ul class="simple">
<li><tt class="docutils literal">Py_FileSystemDefaultEncodeErrors</tt></li>
<li><tt class="docutils literal">Py_FileSystemDefaultEncoding</tt></li>
<li><tt class="docutils literal">Py_LegacyWindowsFSEncodingFlag</tt></li>
<li><tt class="docutils literal">Py_LegacyWindowsStdioFlag</tt></li>
<li><tt class="docutils literal">Py_UTF8Mode</tt></li>
</ul>
<p>_PyCoreConfig:</p>
<ul class="simple">
<li><tt class="docutils literal">coerce_c_locale</tt></li>
<li><tt class="docutils literal">coerce_c_locale_warn</tt></li>
<li><tt class="docutils literal">filesystem_encoding</tt></li>
<li><tt class="docutils literal">filesystem_errors</tt></li>
<li><tt class="docutils literal">stdio_encoding</tt></li>
<li><tt class="docutils literal">stdio_errors</tt></li>
</ul>
<p>The LC_CTYPE locale depends on 3 environment variables:</p>
<ul class="simple">
<li><tt class="docutils literal">LC_ALL</tt></li>
<li><tt class="docutils literal">LC_CTYPE</tt></li>
<li><tt class="docutils literal">LANG</tt></li>
</ul>
<p>Depending on the platform, the following configuration gives a different
LC_CTYPE locale:</p>
<ul class="simple">
<li><tt class="docutils literal">LC_ALL= LC_CTYPE= LANG=</tt> (no variable set)</li>
<li><tt class="docutils literal">LC_ALL= LC_CTYPE=C LANG=</tt> (C locale)</li>
<li><tt class="docutils literal">LC_ALL= LC_CTYPE=POSIX LANG=</tt> (POSIX locale)</li>
</ul>
<p>In case of doubt, I also tested:</p>
<ul class="simple">
<li><tt class="docutils literal">LC_ALL=C LC_CTYPE= LANG=</tt> (C locale)</li>
<li><tt class="docutils literal">LC_ALL=POSIX LC_CTYPE= LANG=</tt> (POSIX locale)</li>
</ul>
<p>The LC_CTYPE encoding (locale encoding) can be queried using
<tt class="docutils literal">nl_langinfo(CODESET)</tt>. On FreeBSD, Solaris, HP-UX and maybe other platforms,
<tt class="docutils literal">nl_langinfo(CODESET)</tt> announces an encoding different from the one actually
used by the <tt class="docutils literal">mbstowcs()</tt> and <tt class="docutils literal">wcstombs()</tt> functions, and so Python forces
the usage of the ASCII encoding.</p>
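<p>Forcing ASCII is safe because ASCII combined with the surrogateescape error handler round-trips arbitrary bytes, as this Python sketch shows:</p>

```python
# Every non-ASCII byte becomes a lone surrogate (U+DC80..U+DCFF) and is
# restored unchanged on encoding, so no byte sequence can make it fail
data = bytes(range(256))
text = data.decode("ascii", "surrogateescape")
assert text.encode("ascii", "surrogateescape") == data
```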
<p>The test matrix of all these configurations and all platforms is quite big.
Honestly, I would not bet that Python 3.8 will behave properly in all possible
cases. At least, I tried to fix all issues that I spotted! Moreover, I added
many tests which should help to detect bugs and prevent regressions.</p>
</div>
Python 3.7 UTF-8 Mode2018-03-27T20:00:00+02:002018-03-27T20:00:00+02:00Victor Stinnertag:vstinner.github.io,2018-03-27:/python37-new-utf8-mode.html<a class="reference external image-reference" href="https://www.flickr.com/photos/99444752@N06/9368903367/"><img alt="Sunrise" src="https://vstinner.github.io/images/sunrise.jpg" /></a>
<p>Since Python 3.0 was released in 2008, each time a user reported an encoding
issue, someone showed up and asked why Python does not "simply" always use UTF-8.
Well, it's not that easy. <strong>UTF-8 is the best encoding in most cases, but it is
still not the best encoding in all cases</strong>, even in 2018. The locale encoding
remains the best default filesystem encoding for Python. I would say that <strong>the
locale encoding is the least bad filesystem encoding</strong>.</p>
<p>This article tells the story of my <a class="reference external" href="https://www.python.org/dev/peps/pep-0540/">PEP 540: Add a new UTF-8 Mode</a> which adds an opt-in option to
<strong>"use UTF-8 everywhere"</strong>. Moreover, the UTF-8 Mode is enabled by the POSIX
locale: <strong>Python 3.7 now uses UTF-8 for the POSIX locale</strong>. My
PEP 540 is complementary to Nick Coghlan's PEP 538.</p>
<p>When I started to write this article, I wrote something like: "Hey! I added a
new option to use UTF-8, enjoy!". Written like that, it seems like using UTF-8
was an obvious choice and that it was really easy to write such PEP. No.
<strong>Nothing was obvious, nothing was simple.</strong></p>
<p>It took me one year to design and implement my PEP 540, and to get it accepted.
I wrote five articles before this one to show that PEP 540 only came after
a long, painful journey, starting with Python 3.0, to choose the best Python
encoding. My PEP relies on all the great work done previously.</p>
<p><strong>This article is the sixth and last in a series of articles telling the
history and rationale of the Python 3 Unicode model for the operating system:</strong></p>
<ol class="arabic simple">
<li><a class="reference external" href="https://vstinner.github.io/python30-listdir-undecodable-filenames.html">Python 3.0 listdir() Bug on Undecodable Filenames</a></li>
<li><a class="reference external" href="https://vstinner.github.io/pep-383.html">Python 3.1 surrogateescape error handler (PEP 383)</a></li>
<li><a class="reference external" href="https://vstinner.github.io/painful-history-python-filesystem-encoding.html">Python 3.2 Painful History of the Filesystem Encoding</a></li>
<li><a class="reference external" href="https://vstinner.github.io/python36-utf8-windows.html">Python 3.6 now uses UTF-8 on Windows</a></li>
<li><a class="reference external" href="https://vstinner.github.io/posix-locale.html">Python 3.7 and the POSIX locale</a></li>
<li><a class="reference external" href="https://vstinner.github.io/python37-new-utf8-mode.html">Python 3.7 UTF-8 Mode</a></li>
</ol>
<div class="section" id="fallback-to-utf-8-if-getting-the-locale-encoding-fails">
<h2>Fallback to UTF-8 if getting the locale encoding fails?</h2>
<p>May 2010, I reported <a class="reference external" href="https://bugs.python.org/issue8610">bpo-8610</a>:
"Python3/POSIX: errors if file system encoding is None". I asked what should
be the default encoding when getting the locale encoding fails. I proposed
to fall back to UTF-8. <a class="reference external" href="https://bugs.python.org/issue8610#msg105008">I wrote</a>:</p>
<blockquote>
<strong>UTF-8 is also an optimist choice</strong>: I bet that more and more operating
systems will move to UTF-8.</blockquote>
<p><a class="reference external" href="https://bugs.python.org/issue8610#msg105010">Marc-Andre commented</a>:</p>
<blockquote>
Ouch, that was a poor choice. <strong>In Python we have a tradition to avoid
guessing</strong>, if possible. Since we cannot guarantee that the file system
will indeed use UTF-8, it would have been safer to use ASCII. Not sure why
this reasoning wasn't applied for the file system encoding.</blockquote>
<p>In practice, Python already used UTF-8 when the filesystem encoding was set to
<tt class="docutils literal">None</tt>. I pushed the <a class="reference external" href="https://github.com/python/cpython/commit/b744ba1d14c5487576c95d0311e357b707600b47">commit b744ba1d</a>
into the Python 3.2 development branch to make the default encoding (UTF-8)
more obvious. But before Python 3.2 was released, I removed the fallback with
my <a class="reference external" href="https://github.com/python/cpython/commit/e474309bb7f0ba6e6ae824c215c45f00db691889">commit e474309b</a>
(Oct 2010):</p>
<blockquote>
<p><tt class="docutils literal">initfsencoding()</tt>: <tt class="docutils literal">get_codeset()</tt> failure is now a fatal error</p>
<p>Don't fallback to UTF-8 anymore to avoid mojibake. I never got any error
from this function.</p>
</blockquote>
</div>
<div class="section" id="the-utf8-option-proposed-for-windows">
<h2>The utf8 option proposed for Windows</h2>
<p>August 2016, <a class="reference external" href="https://bugs.python.org/issue27781">bpo-27781</a>: when <strong>Steve
Dower</strong> <a class="reference external" href="https://vstinner.github.io/python36-utf8-windows.html">was working on changing the filesystem encoding to UTF-8</a>, I was not sure that Windows should use UTF-8
by default. I was more in favor of <strong>making the backward incompatible change an
opt-in option</strong>. <a class="reference external" href="https://bugs.python.org/issue27781#msg272950">I wrote</a>:</p>
<blockquote>
<p><strong>If you go in this direction, I would like to follow you for the UNIX/BSD
side to make the switch portable. I was thinking about "-X utf8" which
avoids to change the command line parser.</strong></p>
<p>If we agree on a plan, <strong>I would like to write it down as a PEP since I
expect a lot of complains and questions which I would prefer to only answer
once</strong> (see for example the length of your thread on python-ideas where
each people repeated the same things multiple times ;-))</p>
</blockquote>
<p><a class="reference external" href="https://bugs.python.org/issue27781#msg272962">I added</a>:</p>
<blockquote>
I mean that <tt class="docutils literal">python3 <span class="pre">-X</span> utf8</tt> should force
<tt class="docutils literal">sys.getfilesystemencoding()</tt> to UTF-8 on UNIX/BSD, it would ignore the
current locale setting.</blockquote>
<p>Since Steve chose to <strong>change the default to UTF-8</strong> on Windows, my <tt class="docutils literal"><span class="pre">-X</span> utf8</tt>
option idea was ignored in this issue.</p>
</div>
<div class="section" id="the-utf8-option-proposed-for-the-posix-locale">
<h2>The utf8 option proposed for the POSIX locale</h2>
<p>September 2016: <strong>Jan Niklas Hasse</strong> opened <a class="reference external" href="https://bugs.python.org/issue28180">bpo-28180</a> about Docker images,
<strong>"sys.getfilesystemencoding() should default to utf-8"</strong>.</p>
<p><a class="reference external" href="https://bugs.python.org/issue28180#msg276707">I proposed again my option</a>:</p>
<blockquote>
I proposed to add <tt class="docutils literal"><span class="pre">-X</span> utf8</tt> command line option for UNIX to force utf8
encoding. Would it work for you?</blockquote>
<p><strong>Jan Niklas Hasse</strong> <a class="reference external" href="https://bugs.python.org/issue28180#msg276709">answered</a>:</p>
<blockquote>
Unfortunately no, as this would mean I'll have to change all my python
invocations in my scripts and it wouldn't work for executable files with</blockquote>
<p>December 2016, <a class="reference external" href="https://bugs.python.org/issue28180#msg283408">I added</a>:</p>
<blockquote>
<p>Usually, when a new option is added to Python, we add a command line option
(-X utf8) but also an environment variable: <strong>I propose PYTHONUTF8=1</strong>.</p>
<p>Use your favorite method to define the env var "system wide" in your docker
containers.</p>
<p>Note: Technically, I'm not sure that it's possible to support -E option
with PYTHONUTF8, since -E comes from the command line, and we first need to
decode command line arguments with an encoding to parse these options....
Chicken-and-egg issue ;-)</p>
</blockquote>
<p><strong>Nick Coghlan</strong> <a class="reference external" href="https://vstinner.github.io/posix-locale.html">wrote his PEP 538 "Coercing the C locale to a UTF-8 based
locale"</a> which has been approved in May 2017
and finally implemented in June 2017.</p>
<p>Again, my utf8 idea was ignored in this issue.</p>
</div>
<div class="section" id="first-version-of-my-pep-540-add-a-new-utf-8-mode">
<h2>First version of my PEP 540: Add a new UTF-8 Mode</h2>
<p>January 2017, as a follow-up of <a class="reference external" href="https://bugs.python.org/issue27781">bpo-27781</a> and <a class="reference external" href="https://bugs.python.org/issue28180">bpo-28180</a>, I wrote the <a class="reference external" href="https://www.python.org/dev/peps/pep-0540/">PEP 540: Add a new UTF-8
Mode</a> and <a class="reference external" href="https://mail.python.org/pipermail/python-ideas/2017-January/044089.html">I posted it to
python-ideas for comments</a>.</p>
<p>Abstract:</p>
<blockquote>
Add a new UTF-8 mode, opt-in option to use UTF-8 for operating system
data instead of the locale encoding. Add <tt class="docutils literal"><span class="pre">-X</span> utf8</tt> command line option
and <tt class="docutils literal">PYTHONUTF8</tt> environment variable.</blockquote>
<p>Ten hours and a few messages later, I <a class="reference external" href="https://mail.python.org/pipermail/python-ideas/2017-January/044099.html">wrote a second version</a>:</p>
<blockquote>
I modified my PEP: <strong>the POSIX locale now enables the UTF-8 mode</strong>.</blockquote>
<p><strong>INADA Naoki</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-ideas/2017-January/044112.html">wrote</a>:</p>
<blockquote>
<p>I want UTF-8 mode is <strong>enabled by default (opt-out option) even if locale
is not POSIX</strong>, like <cite>PYTHONLEGACYWINDOWSFSENCODING</cite>.</p>
<p>Users depends on locale know what locale is and how to configure it. They
can understand difference between locale mode and UTF-8 mode and they can
opt-out UTF-8 mode.</p>
<p><strong>But many people lives in "UTF-8 everywhere" world</strong>, and don't know about
locale.</p>
</blockquote>
<p>Always ignoring the locale to <strong>always use UTF-8 would be a backward
incompatible change</strong>. I wasn't brave enough to propose it: I only
wanted to propose an opt-in option, except for the specific case of the POSIX
locale.</p>
<p>Not only did people have different opinions, but most people had strong opinions
on how to handle Unicode and were not ready for compromises.</p>
</div>
<div class="section" id="third-version-of-my-pep-540">
<h2>Third version of my PEP 540</h2>
<p>One week and 59 emails later, I <a class="reference external" href="https://bugs.python.org/issue29240">implemented my PEP 540</a> and <a class="reference external" href="https://mail.python.org/pipermail/python-ideas/2017-January/044197.html">I wrote a third version of my PEP</a>:</p>
<blockquote>
<p>I made multiple changes since the first version of my PEP:</p>
<ul class="simple">
<li>The <strong>UTF-8 Strict mode now only uses strict for inputs and outputs</strong>:
it keeps surrogateescape for operating system data. Read the "Use the
strict error handler for operating system data" alternative for the
rationale.</li>
<li>The POSIX locale now enables the UTF-8 mode. See the "Don't modify
the encoding of the POSIX locale" alternative for the rationale.</li>
<li>Specify the priority between -X utf8, PYTHONUTF8, PYTHONIOENCODING, etc.</li>
</ul>
<p>The PEP version 3 has a longer rationale with more example. (...)</p>
</blockquote>
<p>The new thread also got 19 emails, total: <strong>78 emails in one month</strong>. The same
month, Nick Coghlan's PEP 538 was also under discussion.</p>
</div>
<div class="section" id="silence-during-one-year">
<h2>Silence during one year</h2>
<p>Because of the tone of the python-ideas threads and because I didn't know how
to deal with Nick Coghlan's PEP 538, <strong>I decided to do nothing for one
year</strong> (January to December 2017).</p>
<p>April 2017, Nick <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-April/147795.html">proposed</a>
<strong>INADA Naoki</strong> as the BDFL Delegate for his PEP 538 and my PEP 540. Guido
<a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-April/147796.html">accepted to delegate</a>.</p>
<p>May 2017, Naoki approved Nick's PEP 538, and Nick implemented it.</p>
</div>
<div class="section" id="pep-540-version-3-posted-to-python-dev">
<h2>PEP 540 version 3 posted to python-dev</h2>
<p>At the end of 2017, when I looked at my contributions in Python 3.7 in the
<a class="reference external" href="https://docs.python.org/dev/whatsnew/3.7.html">What’s New In Python 3.7</a>
document, I didn't see any significant contribution. I wanted to propose
something. Moreover, the deadline for the Python 3.7 feature freeze (first beta
version) was getting close, end of January 2018: see the <a class="reference external" href="https://www.python.org/dev/peps/pep-0537/">PEP 537: Python 3.7
Release Schedule</a>.</p>
<p>December 2017, I decided to move to the next step: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151054.html">I sent my PEP to the
python-dev mailing list</a>.</p>
<p>Guido van Rossum <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151069.html">complained about the length of the PEP</a>:</p>
<blockquote>
<p>I've been discussing this PEP offline with Victor, but he suggested we
should discuss it in public instead.</p>
<p><strong>I am very worried about this long and rambling PEP, and I propose that it
not be accepted without a major rewrite to focus on clarity of the
specification. The "Unicode just works" summary is more a wish than a
proper summary of the PEP.</strong></p>
<p>(...)</p>
<p>So I guess PEP acceptance week is over. :-(</p>
</blockquote>
</div>
<div class="section" id="pep-rewritten-from-scratch">
<h2>PEP rewritten from scratch</h2>
<p>Even though <strong>I was not fully convinced myself that my PEP was a good idea</strong>, I
wanted to get an official vote, to know if my idea should be implemented or
abandoned. I decided to rewrite my PEP from scratch:</p>
<ul class="simple">
<li><a class="reference external" href="https://github.com/python/peps/blob/f92b5fbdc2bcd9b182c1541da5a0f4ce32195fb6/pep-0540.txt">PEP version 3 (before rewrite)</a>:
1,017 lines</li>
<li><a class="reference external" href="https://github.com/python/peps/blob/0bb19ff93af9855db327e9a02f3e86b6f932a25a/pep-0540.txt">PEP version 4 (after rewrite)</a>:
263 lines (26% of the previous version)</li>
</ul>
<p>I reduced the rationale to the strict minimum, to explain <strong>key points</strong> of the
PEP:</p>
<ul class="simple">
<li>Locale encoding and UTF-8</li>
<li>Passthrough undecodable bytes: surrogateescape</li>
<li>Strict UTF-8 for correctness</li>
<li>No change by default for best backward compatibility</li>
</ul>
</div>
<div class="section" id="reading-jpeg-pictures-with-surrogateescape">
<h2>Reading JPEG pictures with surrogateescape</h2>
<p>December 2017, I sent the <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151074.html">shorter PEP version 4 to python-dev</a>.</p>
<p>INADA Naoki, the BDFL-delegate, <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151081.html">spotted a design issue</a>:</p>
<blockquote>
<p>And I have one worrying point. With UTF-8 mode, <strong>open()'s default</strong>
encoding/error handler <strong>is UTF-8/surrogateescape</strong>.</p>
<p>(...)</p>
<p>And <strong>opening binary file without "b" option is very common mistake</strong> of
new developers. If default error handler is surrogateescape, <strong>they lose a
chance to notice their bug</strong>.</p>
</blockquote>
<p>He <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151101.html">gave a concrete example</a>:</p>
<blockquote>
<p>With PEP 538 (C.UTF-8 locale), <tt class="docutils literal">open()</tt> uses UTF-8/strict, not
UTF-8/surrogateescape.</p>
<p>For example, this code raises <tt class="docutils literal">UnicodeDecodeError</tt> with PEP 538 if the
file is JPEG file.</p>
<pre class="literal-block">
with open(fn) as f:
    f.read()
</pre>
</blockquote>
<p><a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151132.html">I replied</a>:</p>
<blockquote>
<p>While I'm not strongly convinced that <tt class="docutils literal">open()</tt> error handler must be
changed for <tt class="docutils literal">surrogateescape</tt>, first <strong>I would like to make sure that
it's really a very bad idea</strong> before changing it :-)</p>
<p>(...)</p>
<p>Using a JPEG image, the example is obviously wrong.</p>
<p>But using surrogateescape on open() has been chosen to <strong>read text files
which are mostly correctly encoded to UTF-8, except a few bytes</strong>.</p>
<p>I'm not sure how to explain the issue. The Mercurial wiki page has a good
example of this issue that they call the <a class="reference external" href="https://www.mercurial-scm.org/wiki/EncodingStrategy#The_.22makefile_problem.22">"Makefile problem"</a>.</p>
</blockquote>
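<p>To make the trade-off concrete, here is a small illustration of my own (not
from the thread) of a "mostly UTF-8" text file, such as a Makefile, containing
one undecodable byte:</p>

```python
# A "mostly UTF-8" byte string with one invalid byte:
# 0xE9 is Latin-1 'é', which is not valid UTF-8 in this position.
data = b"all: caf\xe9\n"

# Strict decoding fails on the invalid byte.
try:
    data.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape smuggles the byte through as the surrogate U+DCE9
# and round-trips losslessly back to the original bytes.
text = data.decode("utf-8", errors="surrogateescape")
round_trip = text.encode("utf-8", errors="surrogateescape")

print(strict_ok, round_trip == data)  # → False True
```

This is exactly why surrogateescape is attractive for operating system data,
and why it is dangerous as the default for <tt class="docutils literal">open()</tt>: the
JPEG mistake above would silently "succeed" too.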
<p><strong>Guido van Rossum</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151134.html">finally convinced me</a>:</p>
<blockquote>
You will quickly get decoding errors, and that is <strong>INADA</strong>'s point.
(Unless you use <tt class="docutils literal"><span class="pre">encoding='Latin-1'</span></tt>.) His worry is that the
surrogateescape error handler makes it so that you won't get decoding
errors, and then <strong>the failure mode is much harder to debug</strong>.</blockquote>
<p>I <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151136.html">wrote a 5th version of my PEP</a>:</p>
<blockquote>
<p>I made the following two changes to the PEP 540:</p>
<ul class="simple">
<li>open() error handler remains <tt class="docutils literal">"strict"</tt></li>
<li>Remove the "Strict UTF8 mode" which doesn't make much sense anymore</li>
</ul>
</blockquote>
</div>
<div class="section" id="last-question-on-locale-getpreferredencoding">
<h2>Last question on locale.getpreferredencoding()</h2>
<p>December 2017, <strong>INADA Naoki</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151144.html">asked</a>:</p>
<blockquote>
Or <tt class="docutils literal">locale.getpreferredencoding()</tt> returns <tt class="docutils literal"><span class="pre">'UTF-8'</span></tt> in UTF-8 mode too?</blockquote>
<p>Oh, that's a good question! I <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151148.html">looked at the code</a> and
agreed to return UTF-8:</p>
<blockquote>
<p>I checked the stdlib, and I found many places where
<tt class="docutils literal">locale.getpreferredencoding()</tt> is used to get the user preferred
encoding:</p>
<ul class="simple">
<li>builtin <tt class="docutils literal">open()</tt>: default encoding</li>
<li><tt class="docutils literal">cgi.FieldStorage</tt>: encode the query string</li>
<li><tt class="docutils literal">encoding._alias_mbcs()</tt>: check if the requested encoding is the ANSI
code page</li>
<li><tt class="docutils literal">gettext.GNUTranslations</tt>: <tt class="docutils literal">lgettext()</tt> and <tt class="docutils literal">lngettext()</tt> methods</li>
<li><tt class="docutils literal">xml.etree.ElementTree</tt>: <tt class="docutils literal"><span class="pre">ElementTree.write(encoding='unicode')</span></tt></li>
</ul>
<p>In the UTF-8 mode, I would expect that cgi, gettext and xml.etree all use
the UTF-8 encoding by default. So <strong>locale.getpreferredencoding() should
return UTF-8 if the UTF-8 mode is enabled</strong>.</p>
</blockquote>
<p>I <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151151.html">sent a 6th version of my PEP</a>:</p>
<blockquote>
locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8 Mode.</blockquote>
<p>Moreover, I also wrote a new, much better "Relationship with the locale
coercion (PEP 538)" section, replacing the "Annex: Differences between
PEP 538 and PEP 540" section. The new section was requested by many people who
were confused by the relationship between PEP 538 and PEP 540.</p>
<p>Finally, one year after the first PEP version, INADA Naoki <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151193.html">approved my PEP</a>!</p>
</div>
<div class="section" id="first-incomplete-implementation">
<h2>First incomplete implementation</h2>
<p>I started to work on the implementation of my PEP 540 in March 2017. Once the
PEP was approved, I asked INADA Naoki for a review. <a class="reference external" href="https://github.com/python/cpython/pull/855#issuecomment-351089573">He asked me to fix the
command line parsing</a> to handle
the <tt class="docutils literal"><span class="pre">-X</span> utf8</tt> option properly:</p>
<blockquote>
And when <tt class="docutils literal"><span class="pre">-X</span> utf8</tt> option is found, we can decode from <tt class="docutils literal">char **argv</tt>
again. Since <tt class="docutils literal">mbstowcs()</tt> doesn't guarantee round tripping, it is better
than re-encode <tt class="docutils literal">wchar_t **argv</tt>.</blockquote>
<p>Properly implementing the <tt class="docutils literal"><span class="pre">-X</span> utf8</tt> option was tricky. The command line
was parsed as <tt class="docutils literal">wchar_t*</tt> C strings (Unicode), which requires decoding the
<tt class="docutils literal">char** argv</tt> C array of byte strings (bytes). Python starts by decoding byte
strings from the locale encoding. If the utf8 option is detected, the <tt class="docutils literal">argv</tt> byte
strings must be decoded again, but this time from UTF-8. The problem was that the
code was not designed for that, and a lot of code in
<tt class="docutils literal">Py_Main()</tt> had to be refactored.</p>
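<p>The two-pass decoding can be sketched in a few lines of Python (my own
simplification, not CPython's actual C code, and with deliberately naive option
detection):</p>

```python
import locale

def parse_argv(argv_bytes):
    """Decode argv twice if -X utf8 is found (simplified sketch)."""
    # First pass: decode with the locale encoding, as Python does at startup.
    encoding = locale.getpreferredencoding(False)
    argv = [arg.decode(encoding, "surrogateescape") for arg in argv_bytes]
    if "-X" in argv and "utf8" in argv:
        # Chicken-and-egg: the option only becomes visible after decoding,
        # so re-decode the *original* bytes, this time from UTF-8.
        argv = [arg.decode("utf-8", "surrogateescape") for arg in argv_bytes]
    return argv

print(parse_argv([b"-X", b"utf8", b"caf\xc3\xa9"]))  # → ['-X', 'utf8', 'café']
```

The option characters themselves are plain ASCII, so they decode identically in
both passes; only the remaining arguments change meaning.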
<p><a class="reference external" href="https://github.com/python/cpython/pull/855#issuecomment-351252873">I replied</a>:</p>
<blockquote>
<p><tt class="docutils literal">main()</tt> and <tt class="docutils literal">Py_Main()</tt> are very complex. With the <a class="reference external" href="https://www.python.org/dev/peps/pep-0432/">PEP 432</a>, <strong>Nick Coghlan</strong>, <strong>Eric
Snow</strong> and me are working on making this code better. See for example
<a class="reference external" href="https://bugs.python.org/issue32030">bpo-32030</a>.</p>
<p>(...)</p>
<p>For all these reasons, <strong>I propose to merge this uncomplete PR and write a
different PR for the most complex part</strong>, re-encode wchar_t* command line
arguments, implement Py_UnixMain() or another even better option?</p>
</blockquote>
<p>I wanted to get my code merged as soon as possible to make sure that it would
get into the first Python 3.7 beta, to get a longer testing period before
Python 3.7 final.</p>
<p>December 2017, <a class="reference external" href="https://bugs.python.org/issue29240">bpo-29240</a>, I pushed my
<a class="reference external" href="https://github.com/python/cpython/commit/91106cd9ff2f321c0f60fbaa09fd46c80aa5c266">commit 91106cd9</a>:</p>
<blockquote>
<p>PEP 540: Add a new UTF-8 Mode</p>
<ul class="simple">
<li>Add <tt class="docutils literal"><span class="pre">-X</span> utf8</tt> command line option, <tt class="docutils literal">PYTHONUTF8</tt> environment variable
and a new <tt class="docutils literal">sys.flags.utf8_mode</tt> flag.</li>
<li><tt class="docutils literal">locale.getpreferredencoding()</tt> now returns 'UTF-8' in the UTF-8
mode. As a side effect, open() now uses the UTF-8 encoding by
default in this mode.</li>
</ul>
</blockquote>
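<p>The commit's two visible effects can be checked from any Python 3.7+
interpreter by spawning a child process with <tt class="docutils literal"><span class="pre">-X</span> utf8</tt>
(a quick check of my own, not part of the commit):</p>

```python
import subprocess
import sys

# Ask a child interpreter started with -X utf8 for the new
# sys.flags.utf8_mode flag and for locale.getpreferredencoding().
code = ("import sys, locale; "
        "print(sys.flags.utf8_mode); "
        "print(locale.getpreferredencoding(False))")
out = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", code],
    capture_output=True, text=True, check=True,
).stdout.split()
print(out)  # e.g. ['1', 'UTF-8'] (the case of the name varies across versions)
```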
</div>
<div class="section" id="split-py-main-into-subfunctions">
<h2>Split Py_Main() into subfunctions</h2>
<p>November 2017, I created <a class="reference external" href="https://bugs.python.org/issue32030">bpo-32030</a> to
split the big <tt class="docutils literal">Py_Main()</tt> function into smaller subfunctions. My motivation
was to be able to properly implement my PEP 540.</p>
<p>It took me <strong>3 months of work and 45 commits</strong> to completely clean up
<tt class="docutils literal">Py_Main()</tt> and put almost all Python configuration options into the private
C <tt class="docutils literal">_PyCoreConfig</tt> structure.</p>
</div>
<div class="section" id="parse-again-the-command-line-when-x-utf8-is-used">
<h2>Parse the command line again when -X utf8 is used</h2>
<p>December 2017, <a class="reference external" href="https://bugs.python.org/issue32030">bpo-32030</a>, thanks to
the <tt class="docutils literal">Py_Main()</tt> refactoring, I was able to finish the implementation of my
PEP.</p>
<p>I pushed my <a class="reference external" href="https://github.com/python/cpython/commit/9454060e84a669dde63824d9e2fcaf295e34f687">commit 9454060e</a>:</p>
<blockquote>
<p><tt class="docutils literal">Py_Main()</tt> re-reads config if encoding changes</p>
<p>If the encoding change (C locale coerced or UTF-8 Mode changed),
<tt class="docutils literal">Py_Main()</tt> now reads again the configuration with the new encoding.</p>
</blockquote>
<p>If the encoding changed after reading the Python configuration, Python now clears
the configuration and <strong>reads the configuration again with the new encoding.</strong>
The key ability enabled by the refactoring is being able to properly clean up
the whole configuration.</p>
</div>
<div class="section" id="utf-8-mode-and-the-locale-encoding">
<h2>UTF-8 Mode and the locale encoding</h2>
<p>January 2018, while working on <a class="reference external" href="https://bugs.python.org/issue31900">bpo-31900</a> "localeconv() should decode numeric
fields from LC_NUMERIC encoding, not from LC_CTYPE encoding", I tested various
combinations of locales and encodings. <strong>I found bugs with the UTF-8 mode.</strong></p>
<p>When the UTF-8 mode is enabled explicitly by <tt class="docutils literal"><span class="pre">-X</span> utf8</tt>, the intent is to use
UTF-8 "everywhere". Right. But <strong>there are some places where the current
locale encoding really is the correct encoding</strong>, like the <tt class="docutils literal">time.strftime()</tt>
function.</p>
<p><a class="reference external" href="https://bugs.python.org/issue29240">bpo-29240</a>: I pushed a first fix,
<a class="reference external" href="https://github.com/python/cpython/commit/cb3ae5588bd7733e76dc09277bb7626652d9bb64">commit cb3ae558</a>:</p>
<blockquote>
<p>Ignore UTF-8 Mode in the <tt class="docutils literal">time</tt> module</p>
<p><tt class="docutils literal">time.strftime()</tt> must use the current <tt class="docutils literal">LC_CTYPE</tt> encoding, not UTF-8
if the UTF-8 mode is enabled.</p>
</blockquote>
<p>I tested more cases and found... <strong>more bugs</strong>. Other functions must also use the
current locale encoding, rather than UTF-8, when the UTF-8 Mode is enabled.</p>
<p>I pushed a second fix, <a class="reference external" href="https://github.com/python/cpython/commit/7ed7aead9503102d2ed316175f198104e0cd674c">commit 7ed7aead</a>:</p>
<blockquote>
<p>Fix locale encodings in UTF-8 Mode</p>
<p>Modify <tt class="docutils literal">locale.localeconv()</tt>, <tt class="docutils literal">time.tzname</tt>, <tt class="docutils literal">os.strerror()</tt> and
other functions to ignore the UTF-8 Mode: always use the current locale
encoding.</p>
</blockquote>
<p>The second fix documented the encoding used by the public C functions
<a class="reference external" href="https://docs.python.org/dev/c-api/sys.html#c.Py_DecodeLocale">Py_DecodeLocale()</a> and
<a class="reference external" href="https://docs.python.org/dev/c-api/sys.html#c.Py_EncodeLocale">Py_EncodeLocale()</a>:</p>
<blockquote>
<p>Encoding, highest priority to lowest priority:</p>
<ul class="simple">
<li><tt class="docutils literal"><span class="pre">UTF-8</span></tt> on macOS and Android;</li>
<li><tt class="docutils literal"><span class="pre">UTF-8</span></tt> if the Python UTF-8 mode is enabled;</li>
<li><tt class="docutils literal">ASCII</tt> if the <tt class="docutils literal">LC_CTYPE</tt> locale is <tt class="docutils literal">"C"</tt>,
<tt class="docutils literal">nl_langinfo(CODESET)</tt> returns the <tt class="docutils literal">ASCII</tt> encoding (or an alias),
and <tt class="docutils literal">mbstowcs()</tt> and <tt class="docutils literal">wcstombs()</tt> functions uses the
<tt class="docutils literal"><span class="pre">ISO-8859-1</span></tt> encoding.</li>
<li>the current locale encoding.</li>
</ul>
</blockquote>
<p>The fix was complex to write because I had to extend Py_DecodeLocale() and
Py_EncodeLocale() to support the <tt class="docutils literal">strict</tt> error handler internally. I also
extended the API to report an error message (called "reason") on failure.</p>
<p>For example, <tt class="docutils literal">Py_DecodeLocale()</tt> has the prototype:</p>
<pre class="literal-block">
wchar_t*
Py_DecodeLocale(const char* arg, size_t *wlen)
</pre>
<p>whereas the new extended and more generic <tt class="docutils literal">_Py_DecodeLocaleEx()</tt> has a much
more complex prototype:</p>
<pre class="literal-block">
int
_Py_DecodeLocaleEx(const char* arg, wchar_t **wstr, size_t *wlen,
                   const char **reason,
                   int current_locale, int surrogateescape)
</pre>
<p>To decode, there are two main use cases:</p>
<ul class="simple">
<li>(FILENAME) Use UTF-8 if the UTF-8 Mode is enabled, or the locale encoding
otherwise. See the <tt class="docutils literal">Py_DecodeLocale()</tt> documentation for the exact
encoding used; the truth is more complex.</li>
<li>(LOCALE) Always use the current locale encoding</li>
</ul>
<p>(FILENAME) examples:</p>
<ul class="simple">
<li><tt class="docutils literal">Py_DecodeLocale()</tt>, <tt class="docutils literal">PyUnicode_DecodeFSDefaultAndSize()</tt>: use the
<tt class="docutils literal">surrogateescape</tt> error handler</li>
<li><tt class="docutils literal">os.fsdecode()</tt></li>
<li><tt class="docutils literal">os.listdir()</tt></li>
<li><tt class="docutils literal">os.environ</tt></li>
<li><tt class="docutils literal">sys.argv</tt></li>
<li>etc.</li>
</ul>
<p>(LOCALE) examples:</p>
<ul class="simple">
<li><tt class="docutils literal">PyUnicode_DecodeLocale()</tt>: the error handler is passed as an argument and
must be <tt class="docutils literal">strict</tt> or <tt class="docutils literal">surrogateescape</tt></li>
<li><tt class="docutils literal">time.strftime()</tt></li>
<li><tt class="docutils literal">locale.localeconv()</tt></li>
<li><tt class="docutils literal">time.tzname</tt></li>
<li><tt class="docutils literal">os.strerror()</tt></li>
<li><tt class="docutils literal">readline</tt> module: internal <tt class="docutils literal">decode()</tt> function</li>
<li>etc.</li>
</ul>
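<p>The practical difference between the two families can be seen from Python
itself (an illustration of mine, on a POSIX system where the filesystem error
handler is <tt class="docutils literal">surrogateescape</tt>):</p>

```python
import os

raw = b"caf\xff.txt"        # 0xFF can never appear in valid UTF-8

# (FILENAME) path: surrogateescape never fails and round-trips the bytes.
name = os.fsdecode(raw)
assert os.fsencode(name) == raw

# (LOCALE)-style strict decoding rejects the same bytes.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)  # → False
```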
</div>
<div class="section" id="summary-of-pep-540-history">
<h2>Summary of PEP 540 history</h2>
<ul class="simple">
<li>Version 1: first version sent to python-ideas</li>
<li>Version 2: the POSIX locale now enables the UTF-8 mode</li>
<li>Version 3: the UTF-8 Strict mode now only uses the <tt class="docutils literal">strict</tt> error handler
for inputs and outputs</li>
<li>Version 4: PEP rewritten from scratch to be shorter</li>
<li>Version 5: open() error handler remains <tt class="docutils literal">strict</tt>, and the "Strict UTF8
mode" has been removed</li>
<li>Version 6: locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8
Mode.</li>
</ul>
<p>Abstract of the final approved PEP:</p>
<blockquote>
<p>Add a new "UTF-8 Mode" to enhance Python's use of UTF-8. When UTF-8 Mode
is active, Python will:</p>
<ul class="simple">
<li>use the <tt class="docutils literal"><span class="pre">utf-8</span></tt> encoding, irregardless of the locale currently set by
the current platform, and</li>
<li>change the <tt class="docutils literal">stdin</tt> and <tt class="docutils literal">stdout</tt> error handlers to
<tt class="docutils literal">surrogateescape</tt>.</li>
</ul>
<p>This mode is off by default, but is automatically activated when using
the "POSIX" locale.</p>
<p>Add the <tt class="docutils literal"><span class="pre">-X</span> utf8</tt> command line option and <tt class="docutils literal">PYTHONUTF8</tt> environment
variable to control UTF-8 Mode.</p>
</blockquote>
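<p>As shipped in Python 3.7, the two knobs from the abstract can be checked from
a shell (a quick session of my own, assuming a <tt class="docutils literal">python3</tt> that is 3.7 or newer):</p>

```shell
# Two equivalent ways to opt in to the UTF-8 Mode (Python 3.7+);
# both commands print 1.
python3 -X utf8 -c 'import sys; print(sys.flags.utf8_mode)'
PYTHONUTF8=1 python3 -c 'import sys; print(sys.flags.utf8_mode)'
```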
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>It's now time for a well-deserved nap... until the next major Unicode issue in Python.</p>
<a class="reference external image-reference" href="https://www.flickr.com/photos/manager_2000/2911858714/"><img alt="Tiger nap" src="https://vstinner.github.io/images/tiger_nap.jpg" /></a>
<p>(I love tigers: my favorite animals!)</p>
</div>
Python 3.7 and the POSIX locale2018-03-23T13:00:00+01:002018-03-23T13:00:00+01:00Victor Stinnertag:vstinner.github.io,2018-03-23:/posix-locale.html<a class="reference external image-reference" href="https://www.flickr.com/photos/rj65/15010849568/"><img alt="Bee" src="https://vstinner.github.io/images/bee.jpg" /></a>
<p>During the childhood of Python 3, encoding issues were common, even on well
configured systems. Python used UTF-8 rather than the locale encoding, and so
commonly produced <a class="reference external" href="https://en.wikipedia.org/wiki/Mojibake">mojibake</a>. For
these reasons, when users complained about the Python behaviour with the POSIX
locale, bug reports were closed with a message like: "your system is not
properly configured, please fix your locale".</p>
<p>I only made a first shy change for the POSIX locale in Python 3.5, at the end
of 2013: use <tt class="docutils literal">surrogateescape</tt> for stdin and stdout. We had to wait
for Nick Coghlan in 2017 for significant changes in Python 3.7.</p>
<p>This article explains the slow transition, <strong>six years</strong> since the first bug
report (2011) to the significant change (2017), from "you must fix your locale"
to "maybe Python can do something for you".</p>
<p><strong>This article is the fifth in a series of articles telling the history and
rationale of the Python 3 Unicode model for the operating system:</strong></p>
<ul class="simple">
<li><ol class="first arabic">
<li><a class="reference external" href="https://vstinner.github.io/python30-listdir-undecodable-filenames.html">Python 3.0 listdir() Bug on Undecodable Filenames</a></li>
</ol>
</li>
<li><ol class="first arabic" start="2">
<li><a class="reference external" href="https://vstinner.github.io/pep-383.html">Python 3.1 surrogateescape error handler (PEP 383)</a></li>
</ol>
</li>
<li><ol class="first arabic" start="3">
<li><a class="reference external" href="https://vstinner.github.io/painful-history-python-filesystem-encoding.html">Python 3.2 Painful History of the Filesystem Encoding</a></li>
</ol>
</li>
<li><ol class="first arabic" start="4">
<li><a class="reference external" href="https://vstinner.github.io/python36-utf8-windows.html">Python 3.6 now uses UTF-8 on Windows</a></li>
</ol>
</li>
<li><ol class="first arabic" start="5">
<li><a class="reference external" href="https://vstinner.github.io/posix-locale.html">Python 3.7 and the POSIX locale</a></li>
</ol>
</li>
<li><ol class="first arabic" start="6">
<li><a class="reference external" href="https://vstinner.github.io/python37-new-utf8-mode.html">Python 3.7 UTF-8 Mode</a></li>
</ol>
</li>
</ul>
<div class="section" id="first-rejected-attempt-2011">
<h2>First rejected attempt, 2011</h2>
<p>December 2011, <strong>Martin Packman</strong>, a Bazaar developer, reported <a class="reference external" href="https://bugs.python.org/issue13643">bpo-13643</a> to propose to use UTF-8 in Python if the
locale encoding is ASCII:</p>
<blockquote>
<p>Currently when running Python on a non-OSX posix environment under either
the <strong>C locale</strong>, or with an invalid or missing locale, it's <strong>not possible
to operate using unicode filenames outside the ascii range</strong>. Using bytes
works, as does reading expecting unicode, using the surrogates hack.</p>
<p>This makes robustly working with non-ascii filenames on different platforms
needlessly annoying, given <strong>no modern nix should have problems just using
UTF-8 in these cases</strong>.</p>
<p>See the <a class="reference external" href="https://bugs.launchpad.net/bzr/+bug/794353">downstream bzr bug for more</a>.</p>
<p>One option is to <strong>just use UTF-8</strong> for encoding and decoding filenames
<strong>when otherwise ascii would be used</strong>. As a strict superset, this
shouldn't break too many existing assumptions, and <strong>it's unlikely that
non-UTF-8 filenames will accidentally be mangled due to a locale setting
blip.</strong> See the attached patch for this behaviour change. It does not
include a test currently, but it's possible to write one using subprocess
and overriden <tt class="docutils literal">LANG</tt> and <tt class="docutils literal">LC_ALL</tt> vars.</p>
</blockquote>
<p><a class="reference external" href="https://bugs.python.org/issue13643#msg149928">He added</a>:</p>
<blockquote>
<p>This is more about <strong>un-encodable filenames</strong>.</p>
<p>At the moment work with non-ascii filenames in Python robustly requires two
branches, one using unicode and one that encodes to bytestrings and deals
with the case where the name can't be represented in the declared
filesystem encoding.</p>
<p><strong>That may be something that just had to be lived with</strong>, but it's a little
annoying when even without a UTF-8 locale for a particular process, that's
what most systems will want on disk.</p>
</blockquote>
<p>At this time, I was still traumatised by the <tt class="docutils literal">PYTHONFSENCODING</tt> mess: using a
filesystem encoding different than the locale encoding caused many issues (see
<a class="reference external" href="https://vstinner.github.io/painful-history-python-filesystem-encoding.html">Python 3.2 Painful History of the Filesystem Encoding</a>). <a class="reference external" href="https://bugs.python.org/issue13643#msg149926">I wrote</a>:</p>
<blockquote>
It was already discussed: using a different encoding for filenames and for
other things is really not a good idea. (...)</blockquote>
<p>and <a class="reference external" href="https://bugs.python.org/issue13643#msg149927">I added</a>:</p>
<blockquote>
The right fix is to <strong>fix your locale, not Python</strong>.</blockquote>
<p>Antoine Pitrou <a class="reference external" href="https://bugs.python.org/issue13643#msg149949">suggested fixing the operating system, not Python</a>:</p>
<blockquote>
<p>So <strong>why don't these supposedly "modern" systems at least set the
appropriate environment variables</strong> for Python to infer the proper
character encoding? (since these "modern" systems don't have a
well-defined encoding...)</p>
<p>Answer: because they are not modern at all, <strong>they are antiquated,
inadapted and obsolete pieces of software designed and written by clueless
Anglo-American people</strong>. Please report bugs against these systems. <strong>The
culprit is not Python, it's the Unix crap</strong> and the utterly clueless
attitude of its maintainers ("filesystems are just bytes", yeah,
whatever...).</p>
</blockquote>
<p><strong>Martin Pool</strong> <a class="reference external" href="https://bugs.python.org/issue13643#msg149951">wrote</a>:</p>
<blockquote>
The standard encoding is UTF-8. Python shouldn't need to have a variable
set to tell it this.</blockquote>
<p><a class="reference external" href="https://bugs.python.org/issue13643#msg149952">Antoine replied</a>:</p>
<blockquote>
How so? I don't know of any Linux or Unix spec which says so.</blockquote>
<p>Four days and 34 messages later, <strong>Terry J. Reedy</strong>
<a class="reference external" href="https://bugs.python.org/issue13643#msg150204">closed the issue</a>:</p>
<blockquote>
<p>Martin, after reading most all of the <strong>unusually large sequence of
messages</strong>, I am closing this because <strong>three of the core developers</strong> with
the most experience in this area are <strong>dead-set against your proposal</strong>.</p>
<p>That does not make it 'wrong', but does mean that it will not be approved
and implemented without new data and more persuasive arguments than those
presented so far. I do not see that continued repetition of what has been
said so far will change anything.</p>
</blockquote>
<p>Getting many messages in a short time is common when discussing Unicode issues
:-)</p>
<p>March 2011, <strong>Armin Ronacher</strong> and <strong>Carl Meyer</strong> reported a similar issue:
<a class="reference external" href="https://bugs.python.org/issue11574">bpo-11574</a> and <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2011-March/109361.html">[Python-Dev] Low-Level Encoding Behavior on Python 3</a>. I
closed the issue as "wont fix" in April 2012.</p>
</div>
<div class="section" id="second-attempt-2013">
<h2>Second attempt, 2013</h2>
<p>November 2013, <strong>Sworddragon</strong> reported <a class="reference external" href="https://bugs.python.org/issue19846">bpo-19846</a>: <tt class="docutils literal">LANG=C python3 <span class="pre">-c</span> <span class="pre">'print("\xe4")'</span></tt>
fails with an <tt class="docutils literal">UnicodeEncodeError</tt>.</p>
<p><strong>Antoine Pitrou</strong> wrote a patch to use UTF-8 when the locale encoding is
ASCII, the same approach as the first attempt <a class="reference external" href="https://bugs.python.org/issue13643">bpo-13643</a>.</p>
<p><strong>The patch was incomplete and so caused many issues.</strong> Python used the C codec
of the locale encoding during Python initialization, and so Python had to use
the locale encoding as its filesystem encoding.</p>
<p>I listed all functions that should be modified to fix issues and get a fully
working solution. Nobody came up with a full implementation, likely because
<strong>too many changes were required</strong>.</p>
<p>One month and 66 messages (almost double the previous attempt) later,
again, <a class="reference external" href="https://bugs.python.org/issue19846#msg205675">I closed the issue</a>:</p>
<blockquote>
<p>I'm closing the issue as invalid, because <strong>Python 3 behaviour is correct</strong>
and must not be changed.</p>
<p>Standard streams (sys.stdin, sys.stdout, sys.stderr) use the locale
encoding. (...) These encodings and error handlers can be overridden by the
<strong>PYTHONIOENCODING</strong> environment variable.</p>
</blockquote>
<p>My <a class="reference external" href="https://bugs.python.org/issue19846#msg205675">full comment</a>
describes the encodings used on each platform.</p>
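<p>The failure reported in <tt class="docutils literal"><span class="pre">bpo-19846</span></tt> is easy to reproduce without changing the locale: encoding a non-ASCII character with the ASCII codec and the default <tt class="docutils literal">strict</tt> error handler raises <tt class="docutils literal">UnicodeEncodeError</tt>, which is exactly what <tt class="docutils literal">sys.stdout</tt> did under the C locale. A minimal sketch:</p>

```python
# Reproduce the bpo-19846 failure: under the C locale, sys.stdout used
# the ASCII codec with the "strict" error handler, so print("\xe4")
# raised UnicodeEncodeError.
try:
    "\xe4".encode("ascii")
    raised = False
except UnicodeEncodeError:
    raised = True
print(raised)  # True

# PYTHONIOENCODING can override the stream encoding and error handler;
# for example ascii:backslashreplace keeps the output pure ASCII:
print("\xe4".encode("ascii", "backslashreplace"))  # b'\\xe4'
```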
</div>
<div class="section" id="use-surrogateescape-for-stdin-and-stdout-in-python-3-5">
<h2>Use surrogateescape for stdin and stdout in Python 3.5</h2>
<p>December 2013: Just after closing the second attempt <a class="reference external" href="https://bugs.python.org/issue19846">bpo-19846</a>, I created <a class="reference external" href="https://bugs.python.org/issue19977">bpo-19977</a> to propose to use the
<tt class="docutils literal">surrogateescape</tt> error handler in <tt class="docutils literal">sys.stdin</tt> and <tt class="docutils literal">sys.stdout</tt> for the
POSIX locale.</p>
<p><strong>R. David Murray</strong> <a class="reference external" href="https://bugs.python.org/issue19977#msg206131">disliked my idea</a>:</p>
<blockquote>
<p><strong>Reintroducing moji-bake intentionally doesn't sound like a particularly
good idea</strong>, wasn't that what python3 was supposed to help prevent?</p>
<p>It does seem like a <strong>utf-8 default is the Way of the Future</strong>. Or even the
present, most places.</p>
</blockquote>
<p>March 2014, since <strong>Serhiy Storchaka</strong> and <strong>Nick Coghlan</strong> supported my idea,
I pushed my <a class="reference external" href="https://github.com/python/cpython/commit/7143029d4360637aadbd7ddf386ea5c64fb83095">commit 7143029d</a>
in Python 3.5:</p>
<blockquote>
Issue #19977: When the <tt class="docutils literal">LC_TYPE</tt> locale is the POSIX locale (<tt class="docutils literal">C</tt>
locale), <tt class="docutils literal">sys.stdin</tt> and <tt class="docutils literal">sys.stdout</tt> are now using the
<tt class="docutils literal">surrogateescape</tt> error handler, instead of the <tt class="docutils literal">strict</tt> error handler.</blockquote>
<p>Previously, <strong>Python 3 was very strict on encodings</strong>: core developers were
convinced that they could force developers to fix their applications. This change
was one of the <strong>first Python 3 changes which could produce "mojibake" on purpose</strong>.</p>
<p><strong>Six years after the Python 3.0 release, we started to understand that while
developers can fix their code, we cannot ask users to fix their configuration
("fix their locale").</strong></p>
</div>
<div class="section" id="read-etc-locale-conf">
<h2>Read /etc/locale.conf?</h2>
<p>April 2014, <strong>Nick Coghlan</strong> created <a class="reference external" href="https://bugs.python.org/issue21368">bpo-21368</a>: "Check for systemd locale on
startup if current locale is set to POSIX".</p>
<blockquote>
If a modern Linux system is using systemd as the process manager, then
there will likely be <strong>a "/etc/locale.conf" file</strong> providing settings like
LANG - due to problematic requirements in the POSIX specification, <strong>this
file</strong> (when available) is <strong>likely to be a better "source of truth"
regarding the system encoding</strong> than the environment where the interpreter
process is started, at least when the latter is claiming ASCII as the
default encoding.</blockquote>
<p><a class="reference external" href="https://bugs.python.org/issue21368#msg217328">I disliked the idea</a>:</p>
<blockquote>
I don't think that Python should read such configuration file. If you
consider that something is wrong here, <strong>please report the issue to the C
library</strong>.</blockquote>
<p>Since no consensus was found, no action was taken.</p>
</div>
<div class="section" id="misconfigured-locales-in-docker-images">
<h2>Misconfigured locales in Docker images</h2>
<p>September 2016: <strong>Jan Niklas Hasse</strong> opened <a class="reference external" href="https://bugs.python.org/issue28180">bpo-28180</a>, <strong>"sys.getfilesystemencoding() should
default to utf-8"</strong>.</p>
<blockquote>
<strong>Working with Docker I often end up with an environment where the locale
isn't correctly set.</strong> In these cases <strong>it would be great if
sys.getfilesystemencoding() could default to 'utf-8'</strong> instead of
<tt class="docutils literal">'ascii'</tt>, as it's the encoding of the future and ascii is a subset of it
anyway.</blockquote>
<p>December 2016, <strong>Jan Niklas Hasse</strong> <a class="reference external" href="https://bugs.python.org/issue28180#msg282972">mentioned</a> the <tt class="docutils literal"><span class="pre">C.UTF-8</span></tt> locale:</p>
<blockquote>
<p><a class="reference external" href="https://sourceware.org/glibc/wiki/Proposals/C.UTF-8#Defaults">glibc C.UTF-8 article</a> mentions
that <strong>C.UTF-8 should be glibc's default</strong>.</p>
<p>This bug report <a class="reference external" href="https://sourceware.org/bugzilla/show_bug.cgi?id=17318">also mentions Python</a>. It <strong>hasn't been
fixed yet</strong>, though :/</p>
</blockquote>
<p><strong>Marc-Andre Lemburg</strong> <a class="reference external" href="https://bugs.python.org/issue28180#msg282977">added</a>:</p>
<blockquote>
<p>If we just restrict this to the file system encoding (and not the whole
LANG setting), how about:</p>
<ul class="simple">
<li>default the file system encoding to 'utf-8' and use the surrogate escape
handler as default error handler</li>
<li>add a <tt class="docutils literal">PYTHONFSENCODING</tt> env var to set the file system encoding to
something else (*)</li>
</ul>
<p>(*) I believe we discussed this at some point already, but don't remember the outcome.</p>
</blockquote>
<p>The removed <tt class="docutils literal">PYTHONFSENCODING</tt> environment variable, using a filesystem
encoding different than the locale encoding, caused many issues: see <a class="reference external" href="https://vstinner.github.io/painful-history-python-filesystem-encoding.html">Python
3.2 Painful History of the Filesystem Encoding</a>.</p>
<p><strong>Nick Coghlan</strong> <cite>proposed to experiment using the C.UTF-8 locale</cite> in Fedora
26:</p>
<blockquote>
<p><strong>For Fedora 26,</strong> I'm going to explore the feasibility of patching our system
3.6 installation such that the python3 command itself (rather than the
shared library) <strong>checks for "LC_CTYPE=C"</strong> as almost the first thing it
does, and forcibly <strong>sets LANG and LC_ALL to C.UTF-8</strong> if it gets an answer
it doesn't like. If we're able to do that successfully in the more
constrained environment of a specific recent Fedora release, then I think
it will bode well for doing something similar by default in CPython 3.7</p>
<p><a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1404918">Downstream Fedora issue proposing the above idea for F26</a>.</p>
</blockquote>
<p>Fedora 26 integrated a downstream change in Python 3.6:
see <a class="reference external" href="https://fedoraproject.org/wiki/Releases/26/ChangeSet#Python_3_C.UTF-8_locale">Python 3 C.UTF-8 locale</a>.</p>
</div>
<div class="section" id="pep-538-coercing-the-c-locale-to-a-utf-8-based-locale">
<h2>PEP 538: Coercing the C locale to a UTF-8 based locale</h2>
<a class="reference external image-reference" href="http://www.curiousefficiency.org/"><img alt="Nick Coghlan" src="https://vstinner.github.io/images/nick_coghlan.jpg" /></a>
<p>December 2016, as a follow-up of <a class="reference external" href="https://bugs.python.org/issue28180">bpo-28180</a>, <strong>Nick Coghlan</strong> wrote the <a class="reference external" href="https://www.python.org/dev/peps/pep-0538/">PEP
538: Coercing the legacy C locale to a UTF-8 based locale</a> and <a class="reference external" href="https://mail.python.org/pipermail/python-ideas/2017-January/044130.html">posted it to python-ideas
list</a>
and <a class="reference external" href="https://mail.python.org/pipermail/linux-sig/2017-January/000014.html">to the linux-sig list</a>.</p>
<p>April 2017, Nick <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-April/147795.html">proposed</a>
<strong>INADA Naoki</strong> as the BDFL Delegate for his PEP. Guido <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-April/147796.html">agreed to delegate</a>.</p>
<p>May 2017, after 5 months of discussions and changes, INADA Naoki <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-May/148035.html">approved the
PEP</a>.</p>
<p>June 2017, <a class="reference external" href="https://bugs.python.org/issue28180">bpo-28180</a>: Nick Coghlan
pushed the <a class="reference external" href="https://github.com/python/cpython/commit/6ea4186de32d65b1f1dc1533b6312b798d300466">commit 6ea4186d</a>:</p>
<blockquote>
bpo-28180: Implementation for PEP 538 (#659)</blockquote>
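<p>The coercion logic itself is simple to sketch: if the effective <tt class="docutils literal">LC_CTYPE</tt> locale is the legacy C/POSIX locale (or unset), switch it to a UTF-8 based target locale such as <tt class="docutils literal"><span class="pre">C.UTF-8</span></tt> before the interpreter configures its encodings. Below is a rough, simplified Python model of the check; the real implementation lives in C at startup, honors the <tt class="docutils literal">PYTHONCOERCECLOCALE</tt> environment variable, never overrides an explicit <tt class="docutils literal">LC_ALL</tt>, and tries several target locales (<tt class="docutils literal"><span class="pre">C.UTF-8</span></tt>, <tt class="docutils literal">C.utf8</tt>, <tt class="docutils literal"><span class="pre">UTF-8</span></tt>):</p>

```python
# Simplified model of the PEP 538 C locale coercion check.
# The real check is done in C before Python chooses its encodings.
_LEGACY = {"", "C", "POSIX"}

def coerce_c_locale(env):
    """Return the effective LC_CTYPE value for the given environment
    dict, coercing a legacy C locale to C.UTF-8 (simplified)."""
    candidate = env.get("LC_ALL") or env.get("LC_CTYPE") or env.get("LANG") or ""
    if env.get("PYTHONCOERCECLOCALE") == "0":
        return candidate            # coercion explicitly disabled
    if env.get("LC_ALL"):
        return candidate            # PEP 538 never overrides LC_ALL
    if candidate in _LEGACY:
        return "C.UTF-8"            # coerce the legacy C locale
    return candidate

print(coerce_c_locale({}))                      # C.UTF-8
print(coerce_c_locale({"LANG": "C"}))           # C.UTF-8
print(coerce_c_locale({"LC_ALL": "C"}))         # C (not coerced)
print(coerce_c_locale({"LANG": "fr_FR.UTF-8"})) # fr_FR.UTF-8
```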
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>A first attempt to use a different encoding for the POSIX locale was rejected
in 2011. A second attempt was also rejected in 2013.</p>
<p>I modified Python 3.5 in 2014 to use the <tt class="docutils literal">surrogateescape</tt> error handler in
<tt class="docutils literal">stdin</tt> and <tt class="docutils literal">stdout</tt> for the POSIX locale. Six years after the Python 3.0
release, we started to understand that while developers can fix their code, we
cannot ask users to "fix their locale" (configure properly their locale).</p>
<p>In 2016, the problem occurred again with misconfigured locales in Docker
images. In 2017, Nick Coghlan wrote the PEP 538 "Coercing the legacy C locale
to a UTF-8 based locale" which has been approved by INADA Naoki and implemented
in Python 3.7.</p>
</div>
Python 3.6 now uses UTF-8 on Windows2018-03-22T17:00:00+01:002018-03-22T17:00:00+01:00Victor Stinnertag:vstinner.github.io,2018-03-22:/python36-utf8-windows.html<p>September 2016, a few days before the CPython core dev sprint, <strong>Steve Dower</strong>
proposed two major backward incompatible changes for Python 3.6 on Windows:
<a class="reference external" href="https://www.python.org/dev/peps/pep-0528/">PEP 528: Change Windows console encoding to UTF-8</a> and <a class="reference external" href="https://www.python.org/dev/peps/pep-0529/">PEP 529: Change Windows
filesystem encoding to UTF-8</a>.
At the first read, I was sure that …</p><p>September 2016, a few days before the CPython core dev sprint, <strong>Steve Dower</strong>
proposed two major backward incompatible changes for Python 3.6 on Windows:
<a class="reference external" href="https://www.python.org/dev/peps/pep-0528/">PEP 528: Change Windows console encoding to UTF-8</a> and <a class="reference external" href="https://www.python.org/dev/peps/pep-0529/">PEP 529: Change Windows
filesystem encoding to UTF-8</a>.
At first read, I was sure that PEP 529 would break all applications on
Windows. This article tells the story behind the approval of both PEPs.</p>
<p><strong>This article is the fourth in a series of articles telling the history and
rationale of the Python 3 Unicode model for the operating system:</strong></p>
<ul class="simple">
<li><ol class="first arabic">
<li><a class="reference external" href="https://vstinner.github.io/python30-listdir-undecodable-filenames.html">Python 3.0 listdir() Bug on Undecodable Filenames</a></li>
</ol>
</li>
<li><ol class="first arabic" start="2">
<li><a class="reference external" href="https://vstinner.github.io/pep-383.html">Python 3.1 surrogateescape error handler (PEP 383)</a></li>
</ol>
</li>
<li><ol class="first arabic" start="3">
<li><a class="reference external" href="https://vstinner.github.io/painful-history-python-filesystem-encoding.html">Python 3.2 Painful History of the Filesystem Encoding</a></li>
</ol>
</li>
<li><ol class="first arabic" start="4">
<li><a class="reference external" href="https://vstinner.github.io/python36-utf8-windows.html">Python 3.6 now uses UTF-8 on Windows</a></li>
</ol>
</li>
<li><ol class="first arabic" start="5">
<li><a class="reference external" href="https://vstinner.github.io/posix-locale.html">Python 3.7 and the POSIX locale</a></li>
</ol>
</li>
<li><ol class="first arabic" start="6">
<li><a class="reference external" href="https://vstinner.github.io/python37-new-utf8-mode.html">Python 3.7 UTF-8 Mode</a></li>
</ol>
</li>
</ul>
<div class="section" id="pep-529">
<h2>PEP 529</h2>
<p>September 2016, <strong>Steve Dower</strong>, who works for Microsoft, wrote the <a class="reference external" href="https://www.python.org/dev/peps/pep-0529/">PEP 529:
Change Windows filesystem encoding to UTF-8</a> and <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2016-September/146051.html">posted it to python-dev</a> for
comments.</p>
<a class="reference external image-reference" href="http://stevedower.id.au/blog/"><img alt="Steve Dower" src="https://vstinner.github.io/images/steve_dower.jpg" /></a>
<p>Abstract:</p>
<blockquote>
<p><strong>Historically, Python uses the ANSI APIs</strong> for interacting with the
Windows operating system, often via C Runtime functions. However, these
have been long discouraged in favor of the UTF-16 APIs. Within the
operating system, all text is represented as UTF-16, and the ANSI APIs
perform encoding and decoding using the active code page. See Naming Files,
Paths, and Namespaces for more details.</p>
<p>This PEP proposes <strong>changing the default filesystem encoding on Windows to
utf-8</strong>, and changing all filesystem functions to use the Unicode APIs for
filesystem paths. This will not affect code that uses strings to represent
paths, however those that use bytes for paths will now be able to correctly
round-trip all valid paths in Windows filesystems. <strong>Currently, the
conversions between Unicode (in the OS) and bytes (in Python) were lossy</strong>
and would fail to round-trip characters outside of the user's active code
page.</p>
<p>Notably, this does not impact the encoding of the contents of files. These
will continue to default to <tt class="docutils literal">locale.getpreferredencoding()</tt> (for text
files) or plain bytes (for binary files). This only affects the encoding
used when users pass a bytes object to Python where it is then passed to
the operating system as a path name.</p>
</blockquote>
</div>
<div class="section" id="my-analysis">
<h2>My analysis</h2>
<p>Here is my analysis of the rationale for the PEP 529 change.</p>
<p><strong>On Unix, the native type for filenames is bytes</strong>. A filename is seen by the
Linux kernel as an opaque object. The ext4 filesystem stores filenames as
bytes. If a Python 2 application uses Unicode for filenames, filesystem
operations can fail with a Unicode error (encoding or decoding error) depending
on the locale encoding. If the locale encoding is ASCII, Unicode errors are
likely to occur at the first non-ASCII filename. For example, Mercurial handles
filenames as bytes.</p>
<p>On Python 3, handling filenames as Unicode works thanks to the
<tt class="docutils literal">surrogateescape</tt> error handler. <strong>Most Python 2 applications ported to
Python 3 keep their Python 2 support, and so still handle filenames as bytes.</strong></p>
<p>Problems arise when such software is used on Windows.</p>
<p><strong>On Windows, the native type for filenames is Unicode</strong>. Many functions come
in two flavors: "ANSI" (bytes) and "Wide" (Unicode) versions. In my opinion,
the ANSI flavor mostly exists for backward compatibility. In Python 3.5,
passing a filename as bytes uses the ANSI flavor, whereas the Wide flavor is
used for Unicode filenames. The ANSI flavor uses the ANSI code page which is
very limited compared to Unicode, usually only 256 code points or less. Some
filenames not encodable to the ANSI code page simply cannot be opened, renamed,
etc. using the ANSI API.</p>
<p>The other issue is that <strong>some developers only develop on Unix</strong> (ex: Linux or
macOS) <strong>and never test their application on Windows</strong>.</p>
<p>For a better rationale, read the <a class="reference external" href="https://www.python.org/dev/peps/pep-0529/#background">Background section</a> of Steve Dower's PEP
:-)</p>
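<p>In Python, the portable bridge between the two worlds is <tt class="docutils literal">os.fsencode()</tt>/<tt class="docutils literal">os.fsdecode()</tt>, which apply <tt class="docutils literal">sys.getfilesystemencoding()</tt> together with its error handler. After PEP 529, that encoding is UTF-8 on Windows, so bytes paths can round-trip any valid filename. A sketch of the idea:</p>

```python
import os
import sys

# os.fsencode()/os.fsdecode() convert between str and bytes paths using
# sys.getfilesystemencoding() plus its error handler, so code that
# handles filenames as bytes still round-trips non-ASCII names.
name = "caf\xe9.txt"
encoded = os.fsencode(name)
assert os.fsdecode(encoded) == name

print(sys.getfilesystemencoding())  # 'utf-8' on most modern platforms
```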
</div>
<div class="section" id="discussion-at-the-cpython-sprint-and-guido-s-approval">
<h2>Discussion at the CPython sprint and Guido's approval</h2>
<p>Honestly, <strong>at first read, I was sure that PEP 529 would break all
applications on Windows</strong>.</p>
<p>Fortunately, thanks to the PSF and Instagram, I was able to attend my first
CPython sprint at Instagram headquarters: <a class="reference external" href="https://vstinner.github.io/cpython-sprint-2016.html">CPython sprint, september 2016</a>. There I discussed with <strong>Steve, who
reassured me and explained his PEP to me</strong>. Later, we talked with <strong>Guido van
Rossum</strong>.</p>
<p>Even if I liked the idea of using UTF-8, I was still not fully confident that the
change would not break the world. <strong>We agreed to try the change during the
Python 3.6 beta phase</strong>, and to revert it if something bad happened.</p>
<a class="reference external image-reference" href="http://blog.python.org/2016/09/python-core-development-sprint-2016-36.html"><img alt="CPython developers at the Facebook sprint" src="https://vstinner.github.io/images/cpython_sprint_2016_photo.jpg" /></a>
<p>Following this talk, <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2016-September/146277.html">Guido accepted the PEP under conditions</a>:</p>
<blockquote>
<p>I'm hijacking this thread to <strong>provisionally accept PEP 529</strong>. (I'll also
do this for PEP 528, in its own thread.)</p>
<p><strong>I've talked things over with Steve and Victor and we're going to do an
experiment</strong> (as <a class="reference external" href="https://www.python.org/dev/peps/pep-0529/#beta-experiment">now written up in the PEP</a>) to tease out
any issues with this change during the beta. <strong>If serious problems crop up
we may have to roll back the changes and reject the PEP</strong> -- we won't get
another chance at getting this right. (That would also mean that using the
binary filesystem APIs will remain deprecated and will eventually be
disallowed; as long as the PEP remains accepted they are undeprecated.)</p>
<p>Congrats Steve! Thanks for the massive amount of work on the
implementation and the thinking that went into the design. Thanks
everyone else for their feedback.</p>
<p class="attribution">—Guido</p>
</blockquote>
<p><strong>I was honoured that Guido listened to my Unicode experience</strong> to take a
decision on the PEP ;-)</p>
<p>Steve chose the right timing to get his PEP accepted. Thanks to the sprint,
which made it possible to quickly discuss such a backward incompatible change, <strong>the PEP
was approved in just 12 days</strong>! For comparison, some of my PEPs like my
<a class="reference external" href="https://www.python.org/dev/peps/pep-0446/">PEP 446: Make newly created file descriptors non-inheritable</a> (another backward incompatible
change) took 8 months to get accepted.</p>
</div>
<div class="section" id="pep-528-windows-console">
<h2>PEP 528: Windows console</h2>
<p>Just before the PEP 529, Steve Dower also wrote <a class="reference external" href="https://www.python.org/dev/peps/pep-0528/">PEP 528: Change Windows
console encoding to UTF-8</a>. This
change only impacts the Windows console, so there is a lower risk of breaking
the world.</p>
<p>This PEP was also <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2016-September/146278.html">quickly approved by Guido</a>
during the CPython sprint. Steve implemented it in Python 3.6.</p>
<p>Even if it's a smaller change, it is <strong>yet another change towards using UTF-8
everywhere</strong>.</p>
</div>
<div class="section" id="great-success">
<h2>Great success!</h2>
<p>Fortunately, I was wrong about the risk of breaking the world. <strong>No user
complained about these two backward incompatible changes: Python 3.6 on Windows
is a success!</strong></p>
<p>Python 3.6 now has <strong>better Unicode support</strong> on Windows thanks to PEP
528 and PEP 529!</p>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>September 2016: Steve Dower proposed two major backward incompatible changes
for Python 3.6 on Windows: <a class="reference external" href="https://www.python.org/dev/peps/pep-0528/">PEP 528: Change Windows console encoding to UTF-8</a> and <a class="reference external" href="https://www.python.org/dev/peps/pep-0529/">PEP 529: Change Windows
filesystem encoding to UTF-8</a>.</p>
<p>At first read, I was sure that PEP 529 (filesystem encoding) would break
all applications on Windows.</p>
<p>Thanks to the CPython core dev sprint, I was able to discuss with Steve, who
reassured me and explained his PEP 529 to me. We agreed with Guido van Rossum to
try the change during the Python 3.6 beta phase, and to revert it if something bad
happened. I was honoured that Guido listened to my Unicode experience to take a
decision on the PEP.</p>
<p>The <a class="reference external" href="https://www.python.org/dev/peps/pep-0528/">PEP 528: Change Windows console encoding to UTF-8</a> was also quickly approved,
another change towards using UTF-8 everywhere.</p>
<p>No user complained about these two backward incompatible changes: Python 3.6 on
Windows is a success!</p>
<p>Python 3.6 now has better Unicode support on Windows thanks to PEP 528
and PEP 529!</p>
</div>
Python 3.2 Painful History of the Filesystem Encoding2018-03-15T23:00:00+01:002018-03-15T23:00:00+01:00Victor Stinnertag:vstinner.github.io,2018-03-15:/painful-history-python-filesystem-encoding.html<p>Between Python 3.0 released in 2008 and Python 3.4 released in 2014, the Python
filesystem encoding changed multiple times. <strong>It took 6 years to choose the best
Python filesystem encoding on each platform.</strong></p>
<p><strong>I have been officially promoted as a core developer</strong> in January 2010 by
<strong>Martin von …</strong></p><p>Between Python 3.0 released in 2008 and Python 3.4 released in 2014, the Python
filesystem encoding changed multiple times. <strong>It took 6 years to choose the best
Python filesystem encoding on each platform.</strong></p>
<p><strong>I have been officially promoted as a core developer</strong> in January 2010 by
<strong>Martin von Loewis</strong>. I spent the whole year of 2010 fixing dozens of encoding
issues during the development of Python 3.2, following my Unicode work started
in 2008.</p>
<p>This article is focused on the long discussions to choose the best Python
filesystem encoding on each platform in 2010 for Python 3.2.</p>
<p><strong>This article is the third in a series of articles telling the history and
rationale of the Python 3 Unicode model for the operating system:</strong></p>
<ul class="simple">
<li><ol class="first arabic">
<li><a class="reference external" href="https://vstinner.github.io/python30-listdir-undecodable-filenames.html">Python 3.0 listdir() Bug on Undecodable Filenames</a></li>
</ol>
</li>
<li><ol class="first arabic" start="2">
<li><a class="reference external" href="https://vstinner.github.io/pep-383.html">Python 3.1 surrogateescape error handler (PEP 383)</a></li>
</ol>
</li>
<li><ol class="first arabic" start="3">
<li><a class="reference external" href="https://vstinner.github.io/painful-history-python-filesystem-encoding.html">Python 3.2 Painful History of the Filesystem Encoding</a></li>
</ol>
</li>
<li><ol class="first arabic" start="4">
<li><a class="reference external" href="https://vstinner.github.io/python36-utf8-windows.html">Python 3.6 now uses UTF-8 on Windows</a></li>
</ol>
</li>
<li><ol class="first arabic" start="5">
<li><a class="reference external" href="https://vstinner.github.io/posix-locale.html">Python 3.7 and the POSIX locale</a></li>
</ol>
</li>
<li><ol class="first arabic" start="6">
<li><a class="reference external" href="https://vstinner.github.io/python37-new-utf8-mode.html">Python 3.7 UTF-8 Mode</a></li>
</ol>
</li>
</ul>
<a class="reference external image-reference" href="https://commons.wikimedia.org/wiki/File:Longleat-maze.jpg"><img alt="Maze" src="https://vstinner.github.io/images/maze.jpg" /></a>
<div class="section" id="python-3-0-loves-utf-8">
<h2>Python 3.0 loves UTF-8</h2>
<p>When Python 3.0 was released, it was unclear which encodings should be used
for:</p>
<ul class="simple">
<li>File content: <tt class="docutils literal"><span class="pre">open().read()</span></tt></li>
<li>Filenames: <tt class="docutils literal">os.listdir()</tt>, <tt class="docutils literal">open()</tt>, etc.</li>
<li>Command line arguments: <tt class="docutils literal">sys.argv</tt> and <tt class="docutils literal">subprocess.Popen</tt> arguments</li>
<li>Environment variables: <tt class="docutils literal">os.environ</tt></li>
<li>etc.</li>
</ul>
<p>Python 3.0 was forked from Python 2.6 and functions were modified to use
Unicode. Many Python 3 functions used UTF-8 only because the implementation
was modified to use the default encoding, which is UTF-8: it was not a
deliberate choice.</p>
<p><strong>While UTF-8 is a good choice in most cases, it is not the best choice in
all cases.</strong> Almost everything worked well in Python 3.0 when all data used
UTF-8, but Python 3.0 failed badly if the locale encoding was not UTF-8.</p>
<p>Python 3.1, 3.2 and 3.3 got a lot of changes to adjust encodings in all
corners of the standard library.</p>
<p>Python 3.1 got the <tt class="docutils literal">surrogateescape</tt> error handler (PEP 383) which reduced
Unicode errors: read my previous article <a class="reference external" href="https://vstinner.github.io/pep-383.html">Python 3.1 surrogateescape error
handler (PEP 383)</a>.</p>
</div>
<div class="section" id="add-sys-setfilesystemencoding">
<h2>Add sys.setfilesystemencoding()</h2>
<p>September 2008, <a class="reference external" href="https://bugs.python.org/issue3187">bpo-3187</a>: To fix
<tt class="docutils literal">os.listdir(str)</tt> to support undecodable filenames, <strong>Martin v. Löwis</strong>
<a class="reference external" href="https://bugs.python.org/issue3187#msg74080">proposed a new function to change the filesystem encoding</a>:</p>
<blockquote>
Here is a patch that solves the issue in a different way: it introduces
sys.setfilesystemencoding. <strong>If applications invoke
sys.setfilesystemencoding("iso-8859-1"), all file names can be successfully
converted into a character string.</strong></blockquote>
<p>The ISO-8859-1 encoding has a very interesting property for bytes: it maps
exactly the <tt class="docutils literal">0x00 - 0xff</tt> byte range to the U+0000 - U+00ff Unicode range,
the decoder cannot fail:</p>
<pre class="literal-block">
$ python3.6 -q
>>> all(ord((b'%c' % byte).decode('iso-8859-1')) == byte for byte in range(256))
True
>>> all(ord(('%c' % char).encode('iso-8859-1')) == char for char in range(256))
True
</pre>
<p>Guido van Rossum <a class="reference external" href="https://bugs.python.org/issue3187#msg74173">commented</a>:</p>
<blockquote>
<p>I will check in Victor's changes (with some edits).</p>
<p>Together this means that the various <strong>suggested higher-level solutions</strong>
(like returning path-like objects, or some kind of roundtripping
almost-but-not-quite-utf-8 encoding) <strong>can be implemented in pure Python</strong>.</p>
</blockquote>
<p>October 2008, <strong>Martin v. Löwis</strong> pushed the <a class="reference external" href="https://github.com/python/cpython/commit/04dc25c53728f5c2fe66d9e66af67da0c9b8959d">commit 04dc25c5</a>:</p>
<pre class="literal-block">
Issue #3187: Add sys.setfilesystemencoding.
</pre>
<p>Python 3.0 will be the first major release with this function.</p>
<p>In retrospect, I see this function as asking developers and users to be
smart and choose the encoding themselves.</p>
<p>While the ISO-8859-1 encoding trick is tempting, we will see later that
<tt class="docutils literal">setfilesystemencoding()</tt> is broken by design and so cannot be used in
practice.</p>
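<p>An illustrative sketch of the design flaw: filenames decoded <em>before</em> the switch no longer round-trip to the original on-disk bytes <em>after</em> it, so the application ends up with an inconsistent mix of encodings:</p>

```python
# Why a mutable filesystem encoding is broken by design: a filename
# decoded while the encoding was ISO-8859-1 no longer matches the
# on-disk bytes once the application switches to UTF-8.
on_disk = b"caf\xe9"                      # Latin-1 encoded filename

name = on_disk.decode("iso-8859-1")       # decoded before the switch
assert name.encode("iso-8859-1") == on_disk

# ... sys.setfilesystemencoding("utf-8") happens here ...
assert name.encode("utf-8") != on_disk    # '\xe9' becomes b'\xc3\xa9'
print(name.encode("utf-8"))               # b'caf\xc3\xa9'
```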
</div>
<div class="section" id="what-if-getting-the-locale-encoding-fails">
<h2>What if getting the locale encoding fails?</h2>
<p>May 2010, I reported <a class="reference external" href="https://bugs.python.org/issue8610">bpo-8610</a>,
"Python3/POSIX: errors if file system encoding is None":</p>
<blockquote>
On POSIX (but not on Mac OS X), Python3 calls get_codeset() to get the file
system encoding. If this function fails, sys.getfilesystemencoding()
returns None.</blockquote>
<p>I pushed the <a class="reference external" href="https://github.com/python/cpython/commit/b744ba1d14c5487576c95d0311e357b707600b47">commit b744ba1d</a>:</p>
<blockquote>
Issue #8610: Load file system codec at startup, and <strong>display a fatal error
on failure</strong>. <strong>Set the file system encoding to utf-8</strong> (instead of None)
<strong>if getting the locale encoding failed</strong>, or if nl_langinfo(CODESET)
function is missing.</blockquote>
<p>This change <strong>adds the function initfsencoding()</strong>: logic to initialize the
filesystem encoding.</p>
<p>In practice, Python already used UTF-8 when the filesystem encoding was set to
<tt class="docutils literal">None</tt>, but this change makes the default more obvious. The change also makes
the error case better defined: Python exits immediately with a fatal error.</p>
</div>
<div class="section" id="support-locale-encodings-different-than-utf-8">
<h2>Support locale encodings different than UTF-8</h2>
<p>My biggest Unicode project in Python 3 was to <strong>fix the encoding</strong> in all
corners of the standard library. This task kept me busy between Python 3.0 and
Python 3.4, at least.</p>
<p>May 2010, I created <a class="reference external" href="https://bugs.python.org/issue8611">bpo-8611</a>:</p>
<blockquote>
<strong>Python3 is unable to start</strong> (bootstrap failure) on a POSIX system <strong>if
the locale encoding is different than utf8 and the Python path</strong> (standard
library path where the encoding module is stored) <strong>contains a non-ASCII
character</strong>. (Windows and Mac OS X are not affected by this issue because
the file system encoding is hardcoded.)</blockquote>
<p>For example, <a class="reference external" href="https://bugs.python.org/issue8242">bpo-8242</a> "Improve support
of PEP 383 (surrogates) in Python3" is a meta issue tracking multiple issues:</p>
<ul class="simple">
<li><a class="reference external" href="https://bugs.python.org/issue7606">bpo-7606</a>:
test_xmlrpc fails with non-ascii path</li>
<li><a class="reference external" href="https://bugs.python.org/issue8092">bpo-8092</a>:
utf8, backslashreplace and surrogates</li>
<li><a class="reference external" href="https://bugs.python.org/issue8383">bpo-8383</a>:
pickle is unable to encode unicode surrogates</li>
<li><a class="reference external" href="https://bugs.python.org/issue8390">bpo-8390</a>:
tarfile: use surrogates for undecode fields</li>
<li><a class="reference external" href="https://bugs.python.org/issue8391">bpo-8391</a>:
os.execvpe() doesn't support surrogates in env</li>
<li><a class="reference external" href="https://bugs.python.org/issue8393">bpo-8393</a>:
subprocess: support undecodable current working directory on POSIX OS</li>
<li><a class="reference external" href="https://bugs.python.org/issue8394">bpo-8394</a>:
ctypes.dlopen() doesn't support surrogates</li>
<li><a class="reference external" href="https://bugs.python.org/issue8412">bpo-8412</a>:
os.system() doesn't support surrogates nor bytes</li>
<li><a class="reference external" href="https://bugs.python.org/issue8467">bpo-8467</a>:
subprocess: surrogates of the error message (Python implementation on non-Windows)</li>
<li><a class="reference external" href="https://bugs.python.org/issue8468">bpo-8468</a>:
bz2: support surrogates in filename, and bytes/bytearray filename</li>
<li><a class="reference external" href="https://bugs.python.org/issue8477">bpo-8477</a>:
_ssl: support surrogates in filenames, and bytes/bytearray filenames</li>
<li><a class="reference external" href="https://bugs.python.org/issue8485">bpo-8485</a>:
Don't accept bytearray as filenames, or simplify the API</li>
</ul>
<p>I fixed all these issues, and reported most of them.</p>
<p>October 2010, finally, five months later, I managed to close the issue!</p>
<blockquote>
Starting at r85691, the full test suite of Python 3.2 pass with ASCII,
ISO-8859-1 and UTF-8 locale encodings in a non-ascii directory.
<strong>The work on this issue is done.</strong></blockquote>
<p>At that time, I didn't know that it would take me a few more years to really
fix <strong>all</strong> encoding issues. For example, it took me <strong>3 years</strong> to modify the
core of the import machinery to pass filenames as Unicode on Windows: <a class="reference external" href="https://bugs.python.org/issue3080">bpo-3080</a> <strong>Full unicode import system</strong>.</p>
</div>
<div class="section" id="add-pythonfsencoding-environment-variable">
<h2>Add PYTHONFSENCODING environment variable</h2>
<p>May 2010, while discussing how to fix <a class="reference external" href="https://bugs.python.org/issue8610">bpo-8610</a> "Python3/POSIX: errors if file system
encoding is None", I asked what the best encoding would be if reading the locale
encoding fails. As a follow-up, <strong>Marc-Andre Lemburg</strong> created <a class="reference external" href="https://bugs.python.org/issue8622">bpo-8622</a>:</p>
<blockquote>
<p>As discussed on issue8610, we need a way to <strong>override the automatic
detection of the file system encoding</strong> - for much the same reasons we also
do for the I/O encoding: the detection mechanism isn't fail-safe.</p>
<p>We should add a new environment variable with the same functionality as
<tt class="docutils literal">PYTHONIOENCODING</tt>:</p>
<pre class="literal-block">
PYTHONFSENCODING: Encoding[:errors] used for file system.
</pre>
</blockquote>
<p>I liked the idea, so I implemented it. August 2010, I pushed the <a class="reference external" href="https://github.com/python/cpython/commit/94908bbc1503df830d1d615e7b57744ae1b41079">commit
94908bbc</a>:</p>
<blockquote>
<p>Issue #8622: Add <tt class="docutils literal">PYTHONFSENCODING</tt> environment variable to override the
filesystem encoding.</p>
<p><tt class="docutils literal">initfsencoding()</tt> displays also a better error message
if <tt class="docutils literal">get_codeset()</tt> failed.</p>
</blockquote>
</div>
<div class="section" id="remove-sys-setfilesystemencoding">
<h2>Remove sys.setfilesystemencoding()</h2>
<p>August 2010, just after adding <tt class="docutils literal">PYTHONFSENCODING</tt>, I opened <a class="reference external" href="https://bugs.python.org/issue9632">bpo-9632</a> to remove the
<tt class="docutils literal">sys.setfilesystemencoding()</tt> function:</p>
<blockquote>
<p>The <tt class="docutils literal">sys.setfilesystemencoding()</tt> function is <strong>dangerous</strong> because it
introduces a lot of inconsistencies: this function is <strong>unable to reencode
all filenames</strong> of all objects (eg. Python is unable to find filenames in
user objects or 3rd party libraries). Eg. if you change the filesystem from
utf8 to ascii, it will not be possible to use existing non-ascii (unicode)
filenames: they will raise UnicodeEncodeError.</p>
<p>As <tt class="docutils literal">sys.setdefaultencoding()</tt> in Python2, I think that
<tt class="docutils literal">sys.setfilesystemencoding()</tt> is the <strong>root of evil</strong> :-)
<strong>PYTHONFSENCODING</strong> (issue #8622) <strong>is the right solution</strong> to set the
filesysteme encoding.</p>
</blockquote>
<p><strong>Marc-Andre Lemburg</strong> complained that applications embedding Python may want
to set the encoding used by Python. I proposed to use the <tt class="docutils literal">PYTHONFSENCODING</tt>
environment variable as a workaround, even if it was not the best option.</p>
<p>One month later, I pushed the <a class="reference external" href="https://github.com/python/cpython/commit/5b519e02016ea3a51f784dee70eead3be4ab1aff">commit 5b519e02</a>:</p>
<blockquote>
Issue #9632: Remove <tt class="docutils literal">sys.setfilesystemencoding()</tt> function: use
<tt class="docutils literal">PYTHONFSENCODING</tt> environment variable to set the filesystem encoding at
Python startup. <tt class="docutils literal">sys.setfilesystemencoding()</tt> created inconsistencies
because it was unable to reencode all filenames of all objects.</blockquote>
</div>
<div class="section" id="reencode-filenames-when-setting-the-filesystem-encoding">
<h2>Reencode filenames when setting the filesystem encoding</h2>
<p>August 2010, I created <a class="reference external" href="https://bugs.python.org/issue9630">bpo-9630</a>:
"Reencode filenames when setting the filesystem encoding".</p>
<p>Since the beginning of 2010, I identified a design flaw in the Python
initialization. Python starts by <strong>decoding strings from the default encoding
UTF-8</strong>. Later, Python reads the locale encoding and loads the Python codec of
this encoding. Then Python <strong>decodes string from the locale encoding</strong>.
Problem: if the locale encoding is not UTF-8, <strong>encoding strings decoded from
UTF-8 to the locale encoding can fail</strong> in different ways.</p>
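<p>The failure modes can be sketched in a few lines of Python (the path and the encodings below are hypothetical, chosen only for illustration):</p>

```python
# The same bytes decoded with the early default (UTF-8) and with the locale
# encoding (here Latin-1) give two different strings, and the early string
# may not be encodable to the locale encoding at all.
path_bytes = b"/home/caf\xc3\xa9"      # UTF-8 bytes for "/home/café"

early = path_bytes.decode("utf-8")     # decoded before the locale codec is loaded
late = path_bytes.decode("latin-1")    # decoded once the locale encoding is known

print(early)   # /home/café
print(late)    # /home/cafÃ©  (mojibake)

try:
    early.encode("ascii")              # fails if the locale encoding is ASCII
except UnicodeEncodeError:
    print("cannot encode to the locale encoding")
```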
<p>I wrote a patch to "reencode" filenames of all module and code objects once the
filesystem encoding is set, in <tt class="docutils literal">initfsencoding()</tt>.</p>
<p>When I wrote the patch, I knew that it was an <strong>ugly hack and not the proper
design</strong>. I proposed to try to avoid importing any Python module before the Python
codec of the locale encoding is loaded, but there was a practical issue: Python
only has built-in implementations (written in C) of the most popular encodings
like ASCII and UTF-8. Some encodings like ISO-8859-15 are only implemented in
Python.</p>
<p>I also proposed to "unload all modules, clear all caches and delete all code
objects" after setting the filesystem encoding. This option would have been very
inefficient and made Python startup even slower, whereas Python 3 startup was
already much slower than Python 2 startup.</p>
<p>September 2010, I pushed the <a class="reference external" href="https://github.com/python/cpython/commit/c39211f51e377919952b139c46e295800cbc2a8d">commit c39211f5</a>:</p>
<blockquote>
<p>Issue #9630: Redecode filenames when setting the filesystem encoding</p>
<p>Redecode the filenames of:</p>
<blockquote>
<ul class="simple">
<li>all modules: __file__ and __path__ attributes</li>
<li>all code objects: co_filename attribute</li>
<li>sys.path</li>
<li>sys.meta_path</li>
<li>sys.executable</li>
<li>sys.path_importer_cache (keys)</li>
</ul>
</blockquote>
<p>Keep weak references to all code objects until <tt class="docutils literal">initfsencoding()</tt> is
called, to be able to redecode co_filename attribute of all code objects.</p>
</blockquote>
<p>The list of weak references to code objects really looked like a hack and I
disliked it, but I failed to find a better way to fix Python startup.</p>
</div>
<div class="section" id="pythonfsencoding-dead-end">
<h2>PYTHONFSENCODING dead end</h2>
<p>Even with my latest big and ugly "redecode filenames when setting the
filesystem encoding" fix, there were <strong>issues when the filesystem encoding was
different than the locale encoding</strong>. I identified 4 bugs:</p>
<ul class="simple">
<li><a class="reference external" href="https://bugs.python.org/issue9992">bpo-9992</a>, <tt class="docutils literal">sys.argv</tt>: decoded from the <strong>locale</strong> encoding, but subprocess encodes process arguments to the <strong>filesystem</strong> encoding</li>
<li><a class="reference external" href="https://bugs.python.org/issue10014">bpo-10014</a>, <tt class="docutils literal">sys.path</tt>: decoded from the <strong>locale</strong> encoding, but import encodes paths to the <strong>filesystem</strong> encoding</li>
<li><a class="reference external" href="https://bugs.python.org/issue10039">bpo-10039</a>, the script name: read from the command line
(ex: <tt class="docutils literal">python script.py</tt>) and decoded from the <strong>locale</strong> encoding, whereas
it is used to fill <tt class="docutils literal">sys.path[0]</tt> and import encodes paths to the
<strong>filesystem</strong> encoding.</li>
<li><a class="reference external" href="https://bugs.python.org/issue9988">bpo-9988</a>, <tt class="docutils literal">PYTHONWARNINGS</tt> environment variable: decoded from the
<strong>locale</strong> encoding, but <tt class="docutils literal">subprocess</tt> encodes environment variables to the
<strong>filesystem</strong> encoding.</li>
</ul>
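<p>All four bugs share the same shape. A minimal sketch, assuming a Latin-1 locale encoding and an ASCII filesystem encoding (hypothetical values):</p>

```python
# A value decoded with the locale encoding cannot always be re-encoded
# with a different filesystem encoding.
locale_encoding = "latin-1"      # used to decode sys.argv, sys.path, etc.
filesystem_encoding = "ascii"    # used by subprocess and import to encode

arg_bytes = b"caf\xe9"                    # "café" as Latin-1 bytes from the OS
arg = arg_bytes.decode(locale_encoding)   # decoding succeeds: 'café'
try:
    arg.encode(filesystem_encoding)       # encoding back fails
except UnicodeEncodeError:
    print("the value cannot cross the encoding boundary")
```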
<p>October 2010, I wrote an email to the python-dev list: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2010-October/104509.html">Inconsistencies if
locale and filesystem encodings are different</a>. I
proposed two solutions:</p>
<ul class="simple">
<li>(a) use the same encoding to encode and decode values (it can be different
for each issue).</li>
<li>(b) <strong>remove PYTHONFSENCODING variable</strong> and raise an error if locale and
filesystem encodings are different (ensure that both encodings are the same).</li>
</ul>
<p><strong>Marc-Andre Lemburg</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2010-October/104511.html">replied</a>:</p>
<blockquote>
<p>You have to differentiate between the meaning of a file system
encoding and the locale:</p>
<p>A file system encoding defines how the applications interact
with the file system.</p>
<p>A locale defines how the user expects to interact with the
application.</p>
<p>It is well possible that the two are different. Mac OS X is
just one example. Another common example is having a Unix
account using the C locale (=ASCII) while working on a UTF-8
file system.</p>
</blockquote>
<p>This email is a good example of the dilemma we faced when having to choose <strong>one</strong>
encoding. There is a big temptation to use multiple encodings, but in the end,
<strong>data are not isolated</strong>. A filename can be found in command line arguments
(<tt class="docutils literal">python3 script.py file.txt</tt>), in environment variables
(<tt class="docutils literal">LOG_FILE=log.txt</tt>), in file content (ex: <tt class="docutils literal">Makefile</tt> or a configuration
file), etc. Using multiple encodings does not work in practice.</p>
<img alt="Dead end" src="https://vstinner.github.io/images/dead_end.jpg" />
</div>
<div class="section" id="remove-pythonfsencoding">
<h2>Remove PYTHONFSENCODING</h2>
<p>September 2010, I reported <a class="reference external" href="https://bugs.python.org/issue9992">bpo-9992</a>:
Command-line arguments are not correctly decoded if locale and filesystem
encodings are different.</p>
<p>I proposed a patch to use the <strong>locale encoding</strong> to decode and encode command
line arguments, rather than using the <strong>filesystem encoding</strong>.</p>
<p><strong>Martin v. Löwis</strong> proposed to use the <strong>locale encoding</strong> for the command
line arguments, environment variables and all filenames. <a class="reference external" href="https://bugs.python.org/issue9992#msg118352">My summary</a>:</p>
<blockquote>
<p>You mean that we should use the following encoding:</p>
<ul class="simple">
<li>Mac OS X: UTF-8</li>
<li>Windows: unicode for command line/env, mbcs to decode filenames</li>
<li>others OSes: <strong>locale encoding</strong></li>
</ul>
<p>To do that, we have to:</p>
<ul class="simple">
<li>"others OSes": <strong>delete the PYTHONFSENCODING variable</strong></li>
<li>Mac OS X: use UTF-8 to decode the command line arguments (we can use
<tt class="docutils literal">PyUnicode_DecodeUTF8()</tt> + <tt class="docutils literal">PyUnicode_AsWideCharString()</tt> before
Python is initialized)</li>
</ul>
</blockquote>
<p>October 2010, I pushed the <a class="reference external" href="https://github.com/python/cpython/commit/8f6b6b0cc3febd15e33a96bd31dcb3cbef2ad1ac">commit 8f6b6b0c</a>:</p>
<blockquote>
Issue #9992: Remove PYTHONFSENCODING environment variable.</blockquote>
<p>Two days later, I pushed an important change to <strong>use the locale encoding</strong> and
remove the ugly <tt class="docutils literal">redecode_filenames()</tt> hack, <a class="reference external" href="https://github.com/python/cpython/commit/f3170ccef8809e4a3f82fe9f82dc7a4a486c28c1">commit f3170cce</a>:</p>
<blockquote>
<p>Use locale encoding if <tt class="docutils literal">Py_FileSystemDefaultEncoding</tt> is not set</p>
<ul class="simple">
<li><tt class="docutils literal">PyUnicode_EncodeFSDefault()</tt>, <tt class="docutils literal">PyUnicode_DecodeFSDefaultAndSize()</tt>
and <tt class="docutils literal">PyUnicode_DecodeFSDefault()</tt> use the locale encoding instead of
UTF-8 if <tt class="docutils literal">Py_FileSystemDefaultEncoding</tt> is <tt class="docutils literal">NULL</tt></li>
<li><tt class="docutils literal">redecode_filenames()</tt> functions and <tt class="docutils literal">_Py_code_object_list</tt> (issue #9630)
are no more needed: remove them</li>
</ul>
</blockquote>
<p>This change was made possible by enhancements to
<tt class="docutils literal">PyUnicode_EncodeFSDefault()</tt> and <tt class="docutils literal">PyUnicode_DecodeFSDefaultAndSize()</tt>.
Previously, <strong>these functions used UTF-8</strong> before the filesystem encoding was set. With
my change, these functions <strong>now use the C implementation of the locale
encoding</strong>: <tt class="docutils literal">mbstowcs()</tt> to decode and <tt class="docutils literal">wcstombs()</tt> to encode. In
practice, the code is more complex because Python uses the <tt class="docutils literal">surrogateescape</tt>
error handler.</p>
<p>Using the C implementation of the locale encoding fixed a lot of "bootstrap"
issues of the Python initialization. It works because <strong>the Python codec of the
locale encoding is 100% compatible with the C implementation</strong> of the locale
codec.</p>
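<p>At the Python level, this guarantee is visible through <tt class="docutils literal">os.fsdecode()</tt> and <tt class="docutils literal">os.fsencode()</tt>, which combine the filesystem encoding with the <tt class="docutils literal">surrogateescape</tt> error handler. A small sketch (the exact decoded string depends on your locale):</p>

```python
# Any byte string survives fsdecode() followed by fsencode() on POSIX
# systems, whatever the locale encoding is.
import os

raw = b"caf\xe9"          # not valid UTF-8
name = os.fsdecode(raw)   # e.g. 'caf\udce9' under a UTF-8 locale
assert os.fsencode(name) == raw
```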
</div>
<div class="section" id="encodings-used-by-python-3-2">
<h2>Encodings used by Python 3.2</h2>
<p>February 2011, Python 3.2 was released. Summary of the filesystem encodings
used:</p>
<ul class="simple">
<li><strong>ANSI code page</strong> on Windows;</li>
<li><strong>UTF-8</strong> on macOS;</li>
<li><strong>locale encoding</strong> on other platforms.</li>
</ul>
<p>Note: UTF-8 is used if the <tt class="docutils literal">nl_langinfo(CODESET)</tt> function is not available.</p>
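<p>The chosen encoding can be inspected at runtime with <tt class="docutils literal">sys.getfilesystemencoding()</tt> (the value depends on the platform and, in Python 3.2, on the locale):</p>

```python
# Report which filesystem encoding this interpreter uses.
import sys

print(sys.getfilesystemencoding())  # e.g. 'utf-8', 'mbcs' or a locale encoding
```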
</div>
<div class="section" id="force-ascii-encoding-on-freebsd-and-solaris">
<h2>Force ASCII encoding on FreeBSD and Solaris</h2>
<p>November 2012, I created <a class="reference external" href="https://bugs.python.org/issue16455">bpo-16455</a>:</p>
<blockquote>
<p>On FreeBSD and OpenIndiana, <tt class="docutils literal">sys.getfilesystemencoding()</tt> returns
<tt class="docutils literal">'ascii'</tt> when the locale is not set, whereas the locale encoding is
<tt class="docutils literal"><span class="pre">ISO-8859-1</span></tt> in practice.</p>
<p>This inconsistency causes different issues.</p>
</blockquote>
<p>December 2012, I pushed the <a class="reference external" href="https://github.com/python/cpython/commit/d45c7f8d74d30de0a558b10e04541b861428b7c1">commit d45c7f8d</a>:</p>
<blockquote>
Issue #16455: On FreeBSD and Solaris, if the locale is C, the
ASCII/surrogateescape codec is now used, instead of the locale encoding, to
decode the command line arguments. This change fixes inconsistencies with
os.fsencode() and os.fsdecode() because these operating systems announces
an ASCII locale encoding, whereas the ISO-8859-1 encoding is used in
practice.</blockquote>
<p>Extract of the main comment:</p>
<blockquote>
<p>Workaround FreeBSD and OpenIndiana locale encoding issue with the C locale.
On these operating systems, <strong>nl_langinfo(CODESET) announces an alias of
the ASCII encoding, whereas mbstowcs() and wcstombs() functions use the
ISO-8859-1 encoding</strong>. The problem is that os.fsencode() and
<tt class="docutils literal">os.fsdecode()</tt> use <tt class="docutils literal">locale.getpreferredencoding()</tt> codec. For example,
if command line arguments are decoded by <tt class="docutils literal">mbstowcs()</tt> and encoded back by
<tt class="docutils literal">os.fsencode()</tt>, we get a <tt class="docutils literal">UnicodeEncodeError</tt> instead of retrieving
the original byte string.</p>
<p>The workaround is enabled if <tt class="docutils literal">setlocale(LC_CTYPE, NULL)</tt> returns <tt class="docutils literal">"C"</tt>,
<tt class="docutils literal">nl_langinfo(CODESET)</tt> announces <tt class="docutils literal">"ascii"</tt> (or an alias to ASCII), and
at least one byte in range 0x80-0xff can be decoded from the locale
encoding. The workaround is also enabled on error, for example if getting
the locale failed.</p>
</blockquote>
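<p>The effect of the workaround can be sketched with codecs (the byte value is illustrative): with ASCII and <tt class="docutils literal">surrogateescape</tt> on both sides, undecodable bytes round-trip instead of raising <tt class="docutils literal">UnicodeEncodeError</tt> later in <tt class="docutils literal">os.fsencode()</tt>:</p>

```python
# Announced encoding: ASCII; actual mbstowcs() behaviour: ISO-8859-1.
raw = b"caf\xe9"

# Without the workaround, the round-trip breaks:
decoded = raw.decode("iso-8859-1")      # what mbstowcs() effectively produces
try:
    decoded.encode("ascii")             # what os.fsencode() then attempts
except UnicodeEncodeError:
    print("round-trip broken")

# With ASCII/surrogateescape on both sides, the original bytes come back:
decoded = raw.decode("ascii", "surrogateescape")   # 'caf\udce9'
assert decoded.encode("ascii", "surrogateescape") == raw
```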
<p>Python 3.4 will be the first major release getting the fix (March 2014), but I
also backported the change to the Python 3.2 and 3.3 branches.</p>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p><strong>It took 6 years</strong> to get Python to use the best filesystem encoding.</p>
<p>Python 3.0 mostly uses UTF-8 everywhere, but it was not a deliberate choice and
it caused many issues when the locale encoding was not UTF-8. Python 3.1 got
the <tt class="docutils literal">surrogateescape</tt> error handler (PEP 383) which reduced Unicode errors.</p>
<p>October 2008, <strong>Martin v. Löwis</strong> added <tt class="docutils literal">sys.setfilesystemencoding()</tt> to
Python 3.0.</p>
<p>August 2010, I added a new <tt class="docutils literal">PYTHONFSENCODING</tt> environment variable,
<strong>Marc-Andre Lemburg</strong>'s idea.</p>
<p>September 2010, I removed the <tt class="docutils literal">sys.setfilesystemencoding()</tt> function because
it creates mojibake by design. I also pushed an ugly change to reencode
filenames to fix many <tt class="docutils literal">PYTHONFSENCODING</tt> bugs.</p>
<p>October 2010, I fixed all tests when Python lives in a non-ASCII directory:
the first milestone in supporting locale encodings different than UTF-8. I also
removed the <tt class="docutils literal">PYTHONFSENCODING</tt> environment variable after a long discussion.
Moreover, I pushed the most important Python 3.2 change: <strong>Python now uses the
locale encoding as the filesystem encoding</strong>. This change fixed many issues.</p>
<p>December 2012, I forced the filesystem encoding to ASCII on FreeBSD and Solaris
when the announced locale encoding is wrong.</p>
</div>
Python 3.1 surrogateescape error handler (PEP 383)2018-03-15T18:00:00+01:002018-03-15T18:00:00+01:00Victor Stinnertag:vstinner.github.io,2018-03-15:/pep-383.html<p>In my previous article, I wrote that <tt class="docutils literal">os.listdir(str)</tt> silently ignored
undecodable filenames in Python 3.0 and that lying about the real content of a
directory looks like a very bad idea.</p>
<p><strong>Martin v. Löwis</strong> found a very smart solution to this problem: the
<tt class="docutils literal">surrogateescape</tt> error handler.</p>
<p><strong>This article is the second in a series of articles telling the history and
rationale of the Python 3 Unicode model for the operating system:</strong></p>
<ul class="simple">
<li><ol class="first arabic">
<li><a class="reference external" href="https://vstinner.github.io/python30-listdir-undecodable-filenames.html">Python 3.0 listdir() Bug on Undecodable Filenames</a></li>
</ol>
</li>
<li><ol class="first arabic" start="2">
<li><a class="reference external" href="https://vstinner.github.io/pep-383.html">Python 3.1 surrogateescape error handler (PEP 383)</a></li>
</ol>
</li>
<li><ol class="first arabic" start="3">
<li><a class="reference external" href="https://vstinner.github.io/painful-history-python-filesystem-encoding.html">Python 3.2 Painful History of the Filesystem Encoding</a></li>
</ol>
</li>
<li><ol class="first arabic" start="4">
<li><a class="reference external" href="https://vstinner.github.io/python36-utf8-windows.html">Python 3.6 now uses UTF-8 on Windows</a></li>
</ol>
</li>
<li><ol class="first arabic" start="5">
<li><a class="reference external" href="https://vstinner.github.io/posix-locale.html">Python 3.7 and the POSIX locale</a></li>
</ol>
</li>
<li><ol class="first arabic" start="6">
<li><a class="reference external" href="https://vstinner.github.io/python37-new-utf8-mode.html">Python 3.7 UTF-8 Mode</a></li>
</ol>
</li>
</ul>
<div class="section" id="first-attempt-to-propose-the-solution">
<h2>First attempt to propose the solution</h2>
<p>September 2008, <a class="reference external" href="https://bugs.python.org/issue3187">bpo-3187</a>: While
solutions to fix <tt class="docutils literal">os.listdir(str)</tt> were discussed, <strong>Martin v. Löwis</strong>
<a class="reference external" href="https://bugs.python.org/issue3187#msg73992">proposed a different approach</a>:</p>
<blockquote>
<p>I'd like to propose yet another approach: make sure that <strong>conversion</strong>
according to the file system encoding <strong>always succeeds</strong>. <strong>If an
unconvertable byte is detected, map it into some private-use character.</strong>
To reduce the chance of conflict with other people's private-use
characters, we can use some of the plane 15 private-use characters, e.g.
map byte 0xPQ to U+F30PQ (in two-byte Unicode mode, this would result in
a surrogate pair).</p>
<p>This would make all file names accessible to all text processing
(including glob and friends); UI display would typically either report
an encoding error, or arrange for some replacement glyph to be shown.</p>
<p>There are certain variations of the approach possible, in case there is
objection to a specific detail.</p>
</blockquote>
<p>He amended this proposal:</p>
<blockquote>
<p><strong>James Knight</strong> points out that UTF-8b can be used to give unambiguous
round-tripping of characters in a UTF-8 locale. So I would like to amend my
previous proposal:</p>
<ul class="simple">
<li>for a non-UTF-8 encoding, use private-use characters for roundtripping</li>
<li>if the locale's charset is UTF-8, use UTF-8b as the file system encoding.</li>
</ul>
</blockquote>
<p><strong>But Martin's smart idea was lost</strong> in the middle of a long discussion.</p>
<a class="reference external image-reference" href="https://github.com/loewis"><img alt="Martin v. Löwis" src="https://vstinner.github.io/images/martin_von_loewis.jpg" /></a>
</div>
<div class="section" id="pep-383">
<h2>PEP 383</h2>
<p>April 2009, Martin v. Löwis proposed his idea again, now as the well-defined
<a class="reference external" href="https://peps.python.org/pep-0383">PEP 383</a>: <strong>Non-decodable Bytes in System Character Interfaces</strong>. He <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2009-April/088919.html">posted
his PEP to python-dev</a> for
comments.</p>
<p>Abstract:</p>
<blockquote>
<p>File names, environment variables, and command line arguments are defined
as being character data in POSIX; the C APIs however allow passing
arbitrary bytes - whether these conform to a certain encoding or not.</p>
<p><strong>This PEP proposes a means of dealing with such irregularities by embedding
the bytes in character strings in such a way that allows recreation of the
original byte string.</strong></p>
</blockquote>
<p>The <tt class="docutils literal">surrogateescape</tt> encoding is based on <strong>Markus Kuhn</strong>'s idea that he
called <strong>UTF-8b</strong>. Undecodable bytes in the range <tt class="docutils literal"><span class="pre">0x80-0xff</span></tt> are mapped to
Unicode surrogate characters in the range <tt class="docutils literal">U+DC80</tt> - <tt class="docutils literal">U+DCFF</tt>.</p>
<p>Example:</p>
<pre class="literal-block">
>>> b'nonascii\xff'.decode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff (...)
>>> b'nonascii\xff'.decode('ascii', 'surrogateescape')
'nonascii\udcff'
>>> 'nonascii\udcff'.encode('ascii', 'surrogateescape')
b'nonascii\xff'
</pre>
<p>Using the <tt class="docutils literal">surrogateescape</tt> error handler, <strong>decoding cannot fail</strong>. For
example, <tt class="docutils literal">os.listdir(str)</tt> no longer silently ignores undecodable filenames,
since all filenames become decodable with any encoding. Moreover, encoding
filenames with <tt class="docutils literal">surrogateescape</tt> returns the original bytes unchanged.</p>
<p><a class="reference external" href="https://mail.python.org/pipermail/python-dev/2009-April/089278.html">The PEP was accepted</a> by
<strong>Guido van Rossum</strong> in less than one week!</p>
</div>
<div class="section" id="implementation">
<h2>Implementation</h2>
<p>May 2009, Martin v. Löwis opened <a class="reference external" href="https://bugs.python.org/issue5915">bpo-5915</a> to get a review of his implementation.</p>
<p>Two days later, after <strong>Benjamin Peterson</strong> and <strong>Antoine Pitrou</strong> reviews,
Martin pushed the <a class="reference external" href="https://github.com/python/cpython/commit/011e8420339245f9b55d41082ec6036f2f83a182">commit 011e8420</a>:</p>
<blockquote>
Issue #5915: Implement PEP 383, Non-decodable Bytes
in System Character Interfaces.</blockquote>
<p>Five days later, Martin renamed his "utf8b" error handler to its final name
<strong>surrogateescape</strong>, <a class="reference external" href="https://github.com/python/cpython/commit/43c57785d3319249c03c3fa46c9df42a8ccd3e52">commit 43c57785</a>:</p>
<blockquote>
Rename utf8b error handler to surrogateescape.</blockquote>
<p><strong>Python 3.1</strong> will be the first release getting the <tt class="docutils literal">surrogateescape</tt> error
handler.</p>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>In Python 3.0, <tt class="docutils literal">os.listdir(str)</tt> silently ignored undecodable filenames,
which was not ideal.</p>
<p><strong>Martin v. Löwis</strong> proposed to apply <strong>Markus Kuhn</strong>'s idea called <strong>UTF-8b</strong>
in Python as a new <tt class="docutils literal">surrogateescape</tt> error handler.</p>
<p>Martin's PEP was approved in less than one week and implemented a few days
later.</p>
<p>Using the <tt class="docutils literal">surrogateescape</tt> error handler, decoding cannot fail:
<tt class="docutils literal">os.listdir(str)</tt> no longer silently ignores undecodable filenames.
Moreover, encoding filenames with <tt class="docutils literal">surrogateescape</tt> returns the original
bytes unchanged.</p>
<p>The <tt class="docutils literal">surrogateescape</tt> error handler fixed a lot of old and very complex
Unicode issues on Unix. It is still widely used in Python 3.6 to <strong>avoid annoying
users with Unicode errors</strong>.</p>
</div>
Python 3.0 listdir() Bug on Undecodable Filenames2018-03-09T13:00:00+01:002018-03-09T13:00:00+01:00Victor Stinnertag:vstinner.github.io,2018-03-09:/python30-listdir-undecodable-filenames.html<p>Ten years ago, when Python 3.0 final was released, <tt class="docutils literal">os.listdir(str)</tt>
<strong>silently ignored undecodable filenames</strong>:</p>
<pre class="literal-block">
$ python3.0
>>> os.mkdir(b'x')
>>> open(b'x/nonascii\xff', 'w').close()
>>> os.listdir('x')
[]
</pre>
<p>You had to use bytes to see all filenames:</p>
<pre class="literal-block">
>>> os.listdir(b'x')
[b'nonascii\xff']
</pre>
<p>If the locale is POSIX or C, listdir() silently ignored all non-ASCII
filenames. Fortunately, <tt class="docutils literal">os.listdir()</tt> accepts <tt class="docutils literal">bytes</tt>, right? In fact, 4
months before the 3.0 final release, this was not the case.</p>
<p>Lying about the real content of a directory looks like a very bad idea. Well,
there is a rationale behind this design. Let me tell you this story, which is
now 10 years old.</p>
<p><strong>This article is the first in a series of articles telling the history and
rationale of the Python 3 Unicode model for the operating system:</strong></p>
<ul class="simple">
<li><ol class="first arabic">
<li><a class="reference external" href="https://vstinner.github.io/python30-listdir-undecodable-filenames.html">Python 3.0 listdir() Bug on Undecodable Filenames</a></li>
</ol>
</li>
<li><ol class="first arabic" start="2">
<li><a class="reference external" href="https://vstinner.github.io/pep-383.html">Python 3.1 surrogateescape error handler (PEP 383)</a></li>
</ol>
</li>
<li><ol class="first arabic" start="3">
<li><a class="reference external" href="https://vstinner.github.io/painful-history-python-filesystem-encoding.html">Python 3.2 Painful History of the Filesystem Encoding</a></li>
</ol>
</li>
<li><ol class="first arabic" start="4">
<li><a class="reference external" href="https://vstinner.github.io/python36-utf8-windows.html">Python 3.6 now uses UTF-8 on Windows</a></li>
</ol>
</li>
<li><ol class="first arabic" start="5">
<li><a class="reference external" href="https://vstinner.github.io/posix-locale.html">Python 3.7 and the POSIX locale</a></li>
</ol>
</li>
<li><ol class="first arabic" start="6">
<li><a class="reference external" href="https://vstinner.github.io/python37-new-utf8-mode.html">Python 3.7 UTF-8 Mode</a></li>
</ol>
</li>
</ul>
<div class="section" id="the-os-walk-bug">
<h2>The os.walk() bug</h2>
<a class="reference external image-reference" href="http://www.dailymail.co.uk/news/article-3592525/Classic-crashes-Incredible-black-white-photos-chaos-roads-early-days-automobile-beautiful-vintage-motors-smashing-trees-careering-canals-plummeting-bridges.html"><img alt="Boston Herald-Traveler photographer Leslie Jones had an eye for a dramatic scene, including when this seven-tonne dump truck plunged through the Warren Avenue bridge, in Boston" src="https://vstinner.github.io/images/car_accident_hole.jpg" /></a>
<p><a class="reference external" href="https://bugs.python.org/issue3187">bpo-3187</a>, June 2008: <strong>Helmut
Jarausch</strong> tested the <strong>first beta release of Python 3.0</strong> and reported a bug
on <tt class="docutils literal">os.walk()</tt> when he tried to walk into his home directory:</p>
<pre class="literal-block">
Traceback (most recent call last):
  File "WalkBug.py", line 5, in <module>
    for Dir, SubDirs, Files in os.walk('/home/jarausch') :
  File "/usr/local/lib/python3.0/os.py", line 278, in walk
    for x in walk(path, topdown, onerror, followlinks):
  File "/usr/local/lib/python3.0/os.py", line 268, in walk
    if isdir(join(top, name)):
  File "/usr/local/lib/python3.0/posixpath.py", line 64, in join
    if b.startswith('/'):
TypeError: expected an object with the buffer interface
</pre>
<p>In Python 3.0b1, <tt class="docutils literal">os.listdir(str)</tt> returned undecodable filenames as
<tt class="docutils literal">bytes</tt>. The caller had to be prepared to get filenames of two types, <tt class="docutils literal">str</tt>
and <tt class="docutils literal">bytes</tt>: this wasn't the case for <tt class="docutils literal">os.walk()</tt>, which failed with a
<tt class="docutils literal">TypeError</tt>.</p>
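The failure mode can be reproduced without os.walk(): here is a minimal sketch of what happens when a bytes entry leaks into str path joining (the path and filename are illustrative).

```python
# Sketch: in 3.0b1, os.listdir(str) could return a mix of str and bytes,
# and joining a str directory with a bytes entry raises a TypeError.
top = '/home/jarausch'    # str path passed to os.walk()
name = b'nonascii\xff'    # undecodable entry, returned as bytes

try:
    path = top + '/' + name   # what the path joining ends up doing
except TypeError:
    print('cannot mix str and bytes')
```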
<p><strong>At first glance, the bug seemed trivial to fix. In fact, although many
solutions were proposed, it took 4 months and 79 messages to fix the bug</strong>.</p>
</div>
<div class="section" id="i-proposed-a-new-filename-class">
<h2>I proposed a new Filename class</h2>
<p>August 2008, <a class="reference external" href="https://bugs.python.org/issue3187#msg71612">my first comment proposed</a> to use a custom "Filename" type
to store the original <tt class="docutils literal">bytes</tt> filename while also giving a Unicode view of
the filename, in a single object, using a hypothetical <tt class="docutils literal">myformat()</tt> function:</p>
<pre class="literal-block">
class Filename:
    def __init__(self, orig):
        self.as_bytes = orig
        self.as_str = myformat(orig)

    def __str__(self):
        return self.as_str

    def __bytes__(self):
        return self.as_bytes
</pre>
<p><strong>Antoine Pitrou</strong> suggested to inherit from <tt class="docutils literal">str</tt>:</p>
<blockquote>
I agree that logically it's the right solution. It's also the most
invasive. If that class is <strong>made a subclass of str</strong>, however, existing
code shouldn't break more than it currently does.</blockquote>
<p>I preferred to inherit from <tt class="docutils literal">bytes</tt> for practical reasons. Antoine noted that
the native type for filenames on Windows is <tt class="docutils literal">str</tt>, and so inheriting from
<tt class="docutils literal">bytes</tt> can be an issue on Windows.</p>
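Antoine's str-subclass suggestion can be sketched like this (hypothetical code, not an API that was ever adopted):

```python
# Hypothetical sketch of Antoine's suggestion: a str subclass, so that
# existing str-based code keeps working, while the original bytes are
# preserved for calls back into the operating system.
class Filename(str):
    def __new__(cls, orig, decoded):
        self = super().__new__(cls, decoded)
        self.as_bytes = orig
        return self

    def __bytes__(self):
        return self.as_bytes

name = Filename(b'nonascii\xff', 'nonascii?')
print(name.upper())    # behaves like a regular str
print(bytes(name))     # ...but still gives back the original bytes
```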
<p>Anyway, <a class="reference external" href="https://bugs.python.org/issue3187#msg71749">Guido van Rossum disliked the idea</a> (comment on InvalidFilename, a
variant of the class):</p>
<blockquote>
I'm not interested in the InvalidFilename class; it's an API complification
that might seem right for your situation but <strong>will hinder most other
people</strong>.</blockquote>
</div>
<div class="section" id="guido-van-rossum-proposed-to-use-replace-error-handler">
<h2>Guido van Rossum proposed to use replace error handler</h2>
<p><strong>Guido van Rossum</strong> <a class="reference external" href="https://bugs.python.org/issue3187#msg71655">proposed to use the replace error handler</a> to prevent decoding errors. For
example, <tt class="docutils literal">b'nonascii\xff'</tt> is decoded as <tt class="docutils literal">'nonascii�'</tt>.</p>
<p>The problem is that this filename cannot be used to read the file content using
<tt class="docutils literal">open()</tt> or to remove the file using <tt class="docutils literal">os.unlink()</tt>, since the operating
system doesn't know the Unicode filename containing the "�" character.</p>
<p>An important property is that <strong>encoding back the Unicode filename to bytes
must return the same original bytes filename</strong>.</p>
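The round-trip property can be checked directly: the replace handler loses it, while the surrogateescape handler that Python 3.1 eventually adopted (PEP 383, the next article in this series) preserves it.

```python
name = b'nonascii\xff'

# The replace error handler loses information: encoding back the decoded
# name gives the UTF-8 encoding of U+FFFD, not the original bytes.
decoded = name.decode('utf-8', 'replace')
assert decoded.encode('utf-8') != name

# The surrogateescape handler (PEP 383, Python 3.1) maps undecodable
# bytes to lone surrogates, so encoding back restores the exact bytes.
decoded = name.decode('utf-8', 'surrogateescape')
assert decoded == 'nonascii\udcff'
assert decoded.encode('utf-8', 'surrogateescape') == name
```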
</div>
<div class="section" id="defer-the-choice-to-the-caller-pass-a-callback">
<h2>Defer the choice to the caller: pass a callback</h2>
<p>As no obvious choice arose, <a class="reference external" href="https://bugs.python.org/issue3187#msg71680">I proposed to use a callback to handle
undecodable filenames</a>.
Pseudo-code:</p>
<pre class="literal-block">
def listdir(path, fallback_decoder=default_fallback_decoder):
    charset = sys.getfilesystemcharset()
    dir_fd = opendir(path)
    try:
        for bytesname in readdir(dir_fd):
            try:
                name = str(bytesname, charset)
            except UnicodeDecodeError:
                name = fallback_decoder(bytesname)
            yield name
    finally:
        closedir(dir_fd)
</pre>
<p>The default behaviour is to raise an exception on decoding error:</p>
<pre class="literal-block">
def default_fallback_decoder(name):
    raise
</pre>
<p>Example of callback returning the raw bytes string unchanged (Python 3.0 beta1
behaviour):</p>
<pre class="literal-block">
def return_undecodable_unchanged(name):
    return name
</pre>
<p>Example to use a custom filename class:</p>
<pre class="literal-block">
class Filename:
    ...

def filename_decoder(name):
    return Filename(name)
</pre>
<p><a class="reference external" href="https://bugs.python.org/issue3187#msg71699">Guido also disliked my callback idea</a>:</p>
<blockquote>
The callback variant is <strong>too complex</strong>; you could <strong>write it yourself by
using os.listdir() with a bytes argument</strong>.</blockquote>
</div>
<div class="section" id="emit-a-warning-on-undecodable-filename">
<h2>Emit a warning on undecodable filename</h2>
<a class="reference external image-reference" href="http://www.unicode.org/"><img alt="Warning: venoumous snakes" src="https://vstinner.github.io/images/warning_venomous_snakes.png" /></a>
<p>As ignoring undecodable filenames in <tt class="docutils literal">os.listdir(str)</tt> slowly became the most
popular option, <strong>Benjamin Peterson</strong> <a class="reference external" href="https://bugs.python.org/issue3187#msg71700">proposed to emit a warning</a> if a filename cannot be decoded,
to ease debugging:</p>
<blockquote>
(...) I don't like the idea of silently losing the contents of a directory.
That's asking for difficult to discover bugs. Could Python emit a warning
in this case?</blockquote>
<p>Guido van Rossum <a class="reference external" href="https://bugs.python.org/issue3187#msg71705">liked the idea</a>:</p>
<blockquote>
This may be the best compromise yet.</blockquote>
<p><strong>Amaury Forgeot d'Arc</strong> <a class="reference external" href="https://bugs.python.org/issue3187#msg73535">asked</a>:</p>
<blockquote>
Does the warning warn multiple times? IIRC the default behaviour is to warn
once.</blockquote>
<p><strong>Benjamin Peterson</strong> <a class="reference external" href="https://bugs.python.org/issue3187#msg73535">replied</a>:</p>
<blockquote>
<strong>Making a warning happen more than once is tricky because it requires
messing with the warnings filter.</strong> This of course takes away some of the
user's control which is one of the main reasons for using the Python
warning system in the first place.</blockquote>
<p>Because of this issue, the warning idea was abandoned.</p>
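The warn-once behaviour that Amaury and Benjamin discussed can be observed with the warnings module: under the standard "default" filter action, a given warning is shown only once per call site. (The warning text below is a stand-in for a hypothetical warning emitted by os.listdir().)

```python
import warnings

def warn_undecodable():
    # Stand-in for a hypothetical warning emitted on undecodable filenames
    warnings.warn("undecodable filename", UnicodeWarning)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("default")
    for _ in range(3):
        warn_undecodable()

# The "default" action deduplicates per (message, category, call site):
# only the first of the three identical warnings is delivered.
assert len(caught) == 1
```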
</div>
<div class="section" id="support-bytes-and-fix-os-listdir">
<h2>Support bytes and fix os.listdir()</h2>
<p>Guido repeated that the best workaround is to pass filenames as <tt class="docutils literal">bytes</tt>,
which is the native type for filenames on Unix, but most functions only
accepted filenames as <tt class="docutils literal">str</tt>.</p>
<p>I started to write multiple patches to support passing filenames as <tt class="docutils literal">bytes</tt>:</p>
<ul class="simple">
<li><tt class="docutils literal">posix_path_bytes.patch</tt>: enhance <tt class="docutils literal">posixpath.join()</tt></li>
<li><tt class="docutils literal">io_byte_filename.patch</tt>: enhance <tt class="docutils literal">open()</tt></li>
<li><tt class="docutils literal">fnmatch_bytes.patch</tt>: enhance <tt class="docutils literal">fnmatch.filter()</tt></li>
<li><tt class="docutils literal">glob1_bytes.patch</tt>: enhance <tt class="docutils literal">glob.glob()</tt></li>
<li><tt class="docutils literal">getcwd_bytes.patch</tt>: <tt class="docutils literal">os.getcwd()</tt> returns bytes if unicode conversion fails</li>
<li><tt class="docutils literal">merge_os_getcwd_getcwdu.patch</tt>: Remove <tt class="docutils literal">os.getcwdu()</tt>;
<tt class="docutils literal">os.getcwd(bytes=True)</tt> returns bytes</li>
<li><tt class="docutils literal">os_getcwdb.patch</tt>: Fix <tt class="docutils literal">os.getcwd()</tt> by using <tt class="docutils literal">PyUnicode_Decode()</tt> and
add <tt class="docutils literal">os.getcwdb()</tt> which returns <tt class="docutils literal">bytes</tt></li>
</ul>
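Two additions from this period, <tt class="docutils literal">os.fsencode()</tt> and <tt class="docutils literal">os.fsdecode()</tt> (bpo-8514, shipped in Python 3.2), are still the standard way to convert filenames between str and bytes. Combined with surrogateescape, they guarantee a lossless round-trip on Unix:

```python
import os

# os.fsdecode()/os.fsencode() use the filesystem encoding with the
# surrogateescape error handler (on Unix), so any bytes filename
# survives a str round-trip unchanged, even if it is undecodable.
raw = b'nonascii\xff'
name = os.fsdecode(raw)          # a str, possibly with lone surrogates
assert os.fsencode(name) == raw  # lossless round-trip
```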
<p>Guido van Rossum created a <a class="reference external" href="https://codereview.appspot.com/3055">review on my combined patches</a>. Then I also combined my patches into a
single <tt class="docutils literal">python3_bytes_filename.patch</tt> file.</p>
<p><strong>After one month of development and 6 versions of the combined patch, Guido
committed my big change</strong> as <a class="reference external" href="https://github.com/python/cpython/commit/f0af3e30db9475ab68bcb1f1ce0b5581e214df76">commit f0af3e30</a>:</p>
<pre class="literal-block">
commit f0af3e30db9475ab68bcb1f1ce0b5581e214df76
Author: Guido van Rossum <guido@python.org>
Date: Thu Oct 2 18:55:37 2008 +0000
Issue #3187: Better support for "undecodable" filenames. Code by Victor
Stinner, with small tweaks by GvR.
Lib/fnmatch.py | 27 ++++---
Lib/genericpath.py | 5 +-
Lib/glob.py | 17 +++--
Lib/io.py | 15 ++--
Lib/posixpath.py | 171 +++++++++++++++++++++++++++++++-----------
Lib/test/test_fnmatch.py | 9 +++
Lib/test/test_posix.py | 2 +-
Lib/test/test_posixpath.py | 150 ++++++++++++++++++++++++++++++++----
Lib/test/test_unicode_file.py | 6 +-
Misc/NEWS | 10 ++-
Modules/posixmodule.c | 90 +++++++++-------------
11 files changed, 358 insertions(+), 144 deletions(-)
</pre>
<p>My change:</p>
<ul class="simple">
<li>Modify <tt class="docutils literal">os.listdir(str)</tt> to <strong>silently ignore undecodable filenames</strong>,
instead of returning them as <tt class="docutils literal">bytes</tt></li>
<li>Add <tt class="docutils literal">os.getcwdb()</tt> function: similar to <tt class="docutils literal">os.getcwd()</tt> but returns the
current working directory as <tt class="docutils literal">bytes</tt>.</li>
<li>Support <tt class="docutils literal">bytes</tt> paths:<ul>
<li><tt class="docutils literal">fnmatch.filter()</tt></li>
<li><tt class="docutils literal">glob.glob1()</tt></li>
<li><tt class="docutils literal">glob.iglob()</tt></li>
<li><tt class="docutils literal">open()</tt></li>
<li><tt class="docutils literal">os.path.isabs()</tt></li>
<li><tt class="docutils literal">os.path.issep()</tt></li>
<li><tt class="docutils literal">os.path.join()</tt></li>
<li><tt class="docutils literal">os.path.split()</tt></li>
<li><tt class="docutils literal">os.path.splitext()</tt></li>
<li><tt class="docutils literal">os.path.basename()</tt></li>
<li><tt class="docutils literal">os.path.dirname()</tt></li>
<li><tt class="docutils literal">os.path.splitdrive()</tt></li>
<li><tt class="docutils literal">os.path.ismount()</tt></li>
<li><tt class="docutils literal">os.path.expanduser()</tt></li>
<li><tt class="docutils literal">os.path.expandvars()</tt></li>
<li><tt class="docutils literal">os.path.normpath()</tt></li>
<li><tt class="docutils literal">os.path.abspath()</tt></li>
<li><tt class="docutils literal">os.path.realpath()</tt></li>
</ul>
</li>
</ul>
</div>
<div class="section" id="more-bytes-patches">
<h2>More bytes patches</h2>
<p>I checked whether other functions accepted passing filenames as <tt class="docutils literal">bytes</tt> and... I
was disappointed. It took me some years to fix the full Python standard
library. Examples of issues between 2008 and 2010:</p>
<ul class="simple">
<li><a class="reference external" href="https://bugs.python.org/issue4035">bpo-4035</a>: Support bytes in <tt class="docutils literal"><span class="pre">os.exec*()</span></tt></li>
<li><a class="reference external" href="https://bugs.python.org/issue4036">bpo-4036</a>: Support bytes in <tt class="docutils literal">subprocess.Popen()</tt></li>
<li><a class="reference external" href="https://bugs.python.org/issue8513">bpo-8513</a>: <tt class="docutils literal">subprocess</tt>: support bytes program name (POSIX)</li>
<li><a class="reference external" href="https://bugs.python.org/issue8514">bpo-8514</a>: Add <tt class="docutils literal">fsencode()</tt> functions to os module</li>
<li><a class="reference external" href="https://bugs.python.org/issue8603">bpo-8603</a>: Create a bytes version of <tt class="docutils literal">os.environ</tt> and <tt class="docutils literal">getenvb()</tt> -- Add <tt class="docutils literal">os.environb</tt></li>
<li><a class="reference external" href="https://bugs.python.org/issue8412">bpo-8412</a>: <tt class="docutils literal">os.system()</tt> doesn't support surrogates nor bytes</li>
<li><a class="reference external" href="https://bugs.python.org/issue8468">bpo-8468</a>: <tt class="docutils literal">bz2</tt> module: support surrogates in filename, and bytes/bytearray filename</li>
<li><a class="reference external" href="https://bugs.python.org/issue8477">bpo-8477</a>: <tt class="docutils literal">ssl</tt> module: support surrogates in filenames, and bytes/bytearray filenames</li>
<li><a class="reference external" href="https://bugs.python.org/issue8640">bpo-8640</a>: <tt class="docutils literal">subprocess:</tt> canonicalize env to bytes on Unix (Python3)</li>
<li><a class="reference external" href="https://bugs.python.org/issue8776">bpo-8776</a>: Bytes version of <tt class="docutils literal">sys.argv</tt> (REJECTED)</li>
</ul>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>At first glance, <strong>Helmut Jarausch</strong>'s <tt class="docutils literal">os.walk()</tt> bug looked trivial to
fix.</p>
<p>I proposed a <strong>new Filename class</strong> storing filenames as <tt class="docutils literal">bytes</tt> and <tt class="docutils literal">str</tt>,
but Guido van Rossum rejected the idea because this API complification
would <em>hinder most people</em>.</p>
<p>Guido van Rossum proposed to <strong>use the replace error handler</strong>, but decoded
filenames were not recognized by the operating system making them useless for
most cases.</p>
<p>I proposed to <strong>use a callback to handle undecodable filenames</strong>, but Guido van
Rossum also rejected this idea because it was too complex and could be written
using os.listdir() with a bytes argument.</p>
<p>Benjamin Peterson proposed to <strong>emit a warning</strong> when a filename cannot be
decoded, but the idea was abandoned because of the complexity of making the
warnings filters emit the warning multiple times.</p>
<p>I wrote a big change modifying <tt class="docutils literal">os.listdir()</tt> to silently ignore undecodable
filenames, and also modifying a lot of functions to accept filenames as
<tt class="docutils literal">bytes</tt>. I made further changes in the following years to fix the full Python
standard library to accept <tt class="docutils literal">bytes</tt>.</p>
<p>While it "only" took 4 months to fix the <tt class="docutils literal">os.listdir(str)</tt> issue, <strong>this kind
of bug would keep me busy for the next 10 years</strong> (2008-2018)...</p>
<p><strong>This article is the first in a series of articles telling the history and
rationale of the Python 3 Unicode model for the operating system.</strong></p>
</div>
How I fixed a very old GIL race condition in Python 3.72018-03-08T10:00:00+01:002018-03-08T10:00:00+01:00Victor Stinnertag:vstinner.github.io,2018-03-08:/python37-gil-change.html<p><strong>It took me 4 years to fix a nasty bug in the famous Python GIL</strong> (Global
Interpreter Lock), one of the most critical parts of Python. I had to dig into the
Git history to find a <strong>change made 26 years ago</strong> by <strong>Guido van Rossum</strong>: at
this time, <em>threads were something esoteric</em>. Let me tell you my story.</p>
<div class="section" id="fatal-python-error-caused-by-a-c-thread-and-the-gil">
<h2>Fatal Python error caused by a C thread and the GIL</h2>
<p>In March 2014, <strong>Steve Dower</strong> reported the bug <a class="reference external" href="https://bugs.python.org/issue20891">bpo-20891</a> when a "C thread" uses the Python C
API:</p>
<blockquote>
<p>In Python 3.4rc3, calling <tt class="docutils literal">PyGILState_Ensure()</tt> from a thread that was
not created by Python and without any calls to <tt class="docutils literal">PyEval_InitThreads()</tt>
will cause a fatal exit:</p>
<p><tt class="docutils literal">Fatal Python error: take_gil: NULL tstate</tt></p>
</blockquote>
<p>My first comment:</p>
<blockquote>
IMO it's a bug in <tt class="docutils literal">PyEval_InitThreads()</tt>.</blockquote>
<a class="reference external image-reference" href="https://twitter.com/kwinkunks/status/619496450834087938"><img alt="Release the GIL!" src="https://vstinner.github.io/images/release_the_gil.png" /></a>
</div>
<div class="section" id="pygilstate-ensure-fix">
<h2>PyGILState_Ensure() fix</h2>
<p>I forgot about the bug for 2 years. In March 2016, I modified Steve's test
program to make it compatible with Linux (the test was written for Windows). I
managed to reproduce the bug on my computer and wrote a fix for
<tt class="docutils literal">PyGILState_Ensure()</tt>.</p>
<p>One year later, in November 2017, <strong>Marcin Kasperski</strong> asked:</p>
<blockquote>
Is this fix released? I can't find it in the changelog…</blockquote>
<p>Oops, again, I completely forgot this issue! This time, not only did I <strong>apply my
PyGILState_Ensure() fix</strong>, but I also wrote the <strong>unit test</strong>
<tt class="docutils literal">test_embed.test_bpo20891()</tt>:</p>
<blockquote>
Ok, the bug is now fixed in Python 2.7, 3.6 and master (future 3.7). On 3.6
and master, the fix comes with an unit test.</blockquote>
<p>My fix for the master branch, commit <a class="reference external" href="https://github.com/python/cpython/commit/b4d1e1f7c1af6ae33f0e371576c8bcafedb099db">b4d1e1f7</a>:</p>
<pre class="literal-block">
bpo-20891: Fix PyGILState_Ensure() (#4650)
When PyGILState_Ensure() is called in a non-Python thread before
PyEval_InitThreads(), only call PyEval_InitThreads() after calling
PyThreadState_New() to fix a crash.
Add an unit test in test_embed.
</pre>
<p>And I closed the issue <a class="reference external" href="https://bugs.python.org/issue20891">bpo-20891</a>...</p>
</div>
<div class="section" id="random-crash-of-the-test-on-macos">
<h2>Random crash of the test on macOS</h2>
<p>Everything was fine... but one week later, I noticed <strong>random</strong> crashes on
macOS buildbots in my newly added unit test. I managed to reproduce the bug
manually; example of a crash on the 3rd run:</p>
<pre class="literal-block">
macbook:master haypo$ while true; do ./Programs/_testembed bpo20891 ||break; date; done
Lun 4 déc 2017 12:46:34 CET
Lun 4 déc 2017 12:46:34 CET
Lun 4 déc 2017 12:46:34 CET
Fatal Python error: PyEval_SaveThread: NULL tstate
Current thread 0x00007fffa5dff3c0 (most recent call first):
Abort trap: 6
</pre>
<p><tt class="docutils literal">test_embed.test_bpo20891()</tt> on macOS showed a race condition in
<tt class="docutils literal">PyGILState_Ensure()</tt>: the creation of the GIL lock itself... was not
protected by a lock! Adding a new lock to check if Python currently has the GIL
lock doesn't make sense...</p>
<p>I proposed an incomplete fix for <tt class="docutils literal">PyThread_start_new_thread()</tt>:</p>
<blockquote>
I found a working fix: call <tt class="docutils literal">PyEval_InitThreads()</tt> in
<tt class="docutils literal">PyThread_start_new_thread()</tt>. So the GIL is created as soon as a second
thread is spawned. The GIL cannot be created anymore while two threads are
running. At least, with the <tt class="docutils literal">python</tt> binary. It doesn't fix the issue if
a thread is not spawned by Python, but this thread calls
<tt class="docutils literal">PyGILState_Ensure()</tt>.</blockquote>
</div>
<div class="section" id="why-not-always-create-the-gil">
<h2>Why not always create the GIL?</h2>
<p><strong>Antoine Pitrou</strong> asked a simple question:</p>
<blockquote>
Why not <em>always</em> call <tt class="docutils literal">PyEval_InitThreads()</tt> at interpreter
initialization? Are there any downsides?</blockquote>
<p>Thanks to <tt class="docutils literal">git blame</tt> and <tt class="docutils literal">git log</tt>, I found the origin of the code
creating the GIL "on demand", <strong>a change made 26 years ago</strong>!</p>
<pre class="literal-block">
commit 1984f1e1c6306d4e8073c28d2395638f80ea509b
Author: Guido van Rossum <guido@python.org>
Date: Tue Aug 4 12:41:02 1992 +0000
* Makefile adapted to changes below.
* split pythonmain.c in two: most stuff goes to pythonrun.c, in the library.
* new optional built-in threadmodule.c, build upon Sjoerd's thread.{c,h}.
* new module from Sjoerd: mmmodule.c (dynamically loaded).
* new module from Sjoerd: sv (svgen.py, svmodule.c.proto).
* new files thread.{c,h} (from Sjoerd).
* new xxmodule.c (example only).
* myselect.h: bzero -> memset
* select.c: bzero -> memset; removed global variable
(...)
+void
+init_save_thread()
+{
+#ifdef USE_THREAD
+ if (interpreter_lock)
+ fatal("2nd call to init_save_thread");
+ interpreter_lock = allocate_lock();
+ acquire_lock(interpreter_lock, 1);
+#endif
+}
+#endif
</pre>
<p>My guess was that the intent of the dynamically created GIL was to reduce the
"overhead" of the GIL for applications using only a single Python thread (which
never spawn a new Python thread).</p>
<p>Luckily, <strong>Guido van Rossum</strong> was around and was able to elaborate the
rationale:</p>
<blockquote>
Yeah, the original reasoning was that <strong>threads were something esoteric and
not used by most code</strong>, and at the time we definitely felt that <strong>always
using the GIL would cause a (tiny) slowdown</strong> and <strong>increase the risk of
crashes</strong> due to bugs in the GIL code. I'd be happy to learn that we no
longer need to worry about this and <strong>can just always initialize it</strong>.</blockquote>
</div>
<div class="section" id="second-fix-for-py-initialize-proposed">
<h2>Second fix for Py_Initialize() proposed</h2>
<p>I proposed a <strong>second fix</strong> for <tt class="docutils literal">Py_Initialize()</tt> to always create the GIL as
soon as Python starts, and no longer "on demand", to prevent any risk of a race
condition:</p>
<pre class="literal-block">
+ /* Create the GIL */
+ PyEval_InitThreads();
</pre>
<p><strong>Nick Coghlan</strong> asked if I could run my patch through the performance
benchmarks. I ran <a class="reference external" href="http://pyperformance.readthedocs.io/">pyperformance</a> on my <a class="reference external" href="https://github.com/python/cpython/pull/4700/">PR 4700</a>. Differences of at least 5%:</p>
<pre class="literal-block">
haypo@speed-python$ python3 -m perf compare_to \
2017-12-18_12-29-master-bd6ec4d79e85.json.gz \
2017-12-18_12-29-master-bd6ec4d79e85-patch-4700.json.gz \
--table --min-speed=5
+----------------------+--------------------------------------+-------------------------------------------------+
| Benchmark | 2017-12-18_12-29-master-bd6ec4d79e85 | 2017-12-18_12-29-master-bd6ec4d79e85-patch-4700 |
+======================+======================================+=================================================+
| pathlib | 41.8 ms | 44.3 ms: 1.06x slower (+6%) |
+----------------------+--------------------------------------+-------------------------------------------------+
| scimark_monte_carlo | 197 ms | 210 ms: 1.07x slower (+7%) |
+----------------------+--------------------------------------+-------------------------------------------------+
| spectral_norm | 243 ms | 269 ms: 1.11x slower (+11%) |
+----------------------+--------------------------------------+-------------------------------------------------+
| sqlite_synth | 7.30 us | 8.13 us: 1.11x slower (+11%) |
+----------------------+--------------------------------------+-------------------------------------------------+
| unpickle_pure_python | 707 us | 796 us: 1.13x slower (+13%) |
+----------------------+--------------------------------------+-------------------------------------------------+
Not significant (55): 2to3; chameleon; chaos; (...)
</pre>
<p>Oh, 5 benchmarks were slower. Performance regressions are not welcome in
Python: we are working hard on <a class="reference external" href="https://lwn.net/Articles/725114/">making Python faster</a>!</p>
</div>
<div class="section" id="skip-the-failing-test-before-christmas">
<h2>Skip the failing test before Christmas</h2>
<p>I didn't expect that 5 benchmarks would be slower. It required further
investigation, but I didn't have time for that, and I was too shy or ashamed to
take the responsibility of pushing a performance regression.</p>
<p>Before the Christmas holidays, no decision had been taken, while
<tt class="docutils literal">test_embed.test_bpo20891()</tt> was still failing randomly on macOS buildbots.
I <strong>was not comfortable touching a critical part of Python</strong>, its GIL, just
before leaving for two weeks. So I decided to skip <tt class="docutils literal">test_bpo20891()</tt> until
I was back.</p>
<p>No gift for you, Python 3.7.</p>
<a class="reference external image-reference" href="https://drawception.com/panel/drawing/0teL3336/charlie-brown-sad-about-small-christmas-tree/"><img alt="Sad Christmas tree" src="https://vstinner.github.io/images/sad_christmas_tree.png" /></a>
</div>
<div class="section" id="new-benchmark-run-and-second-fix-applied-to-master">
<h2>New benchmark run and second fix applied to master</h2>
<p>At the end of January 2018, I again ran the 5 benchmarks made slower by my PR.
I ran these benchmarks manually on my laptop using CPU isolation:</p>
<pre class="literal-block">
vstinner@apu$ python3 -m perf compare_to ref.json patch.json --table
Not significant (5): unpickle_pure_python; sqlite_synth; spectral_norm; pathlib; scimark_monte_carlo
</pre>
<p>Ok, it confirms that my second fix has <strong>no significant impact on
performance</strong> according to the <a class="reference external" href="http://pyperformance.readthedocs.io/">Python "performance" benchmark suite</a>.</p>
<p>I decided to <strong>push my fix</strong> to the master branch, commit <a class="reference external" href="https://github.com/python/cpython/commit/2914bb32e2adf8dff77c0ca58b33201bc94e398c">2914bb32</a>:</p>
<pre class="literal-block">
bpo-20891: Py_Initialize() now creates the GIL (#4700)
The GIL is no longer created "on demand" to fix a race condition when
PyGILState_Ensure() is called in a non-Python thread.
</pre>
<p>Then I reenabled <tt class="docutils literal">test_embed.test_bpo20891()</tt> on the master branch.</p>
</div>
<div class="section" id="no-second-fix-for-python-2-7-and-3-6-sorry">
<h2>No second fix for Python 2.7 and 3.6, sorry!</h2>
<p><strong>Antoine Pitrou</strong> considered that the backport to Python 3.6 <a class="reference external" href="https://github.com/python/cpython/pull/5421#issuecomment-361214537">should not be
merged</a>:</p>
<blockquote>
I don't think so. People can already call <tt class="docutils literal">PyEval_InitThreads()</tt>.</blockquote>
<p><strong>Guido van Rossum</strong> didn't want to backport this change either. So I only
removed <tt class="docutils literal">test_embed.test_bpo20891()</tt> from the 3.6 branch.</p>
<p>I didn't apply my second fix to Python 2.7 either, for the same reason.
Moreover, Python 2.7 got no unit test, since it was too difficult to backport
it.</p>
<p>At least, Python 2.7 and 3.6 got my first <tt class="docutils literal">PyGILState_Ensure()</tt> fix.</p>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>Python still has some race conditions in corner cases. Such a bug was found in
the creation of the GIL when a C thread starts using the Python API. I pushed a
first fix, but a new and different race condition was found on macOS.</p>
<p>I had to dig into the very old history (1992) of the Python GIL. Luckily,
<strong>Guido van Rossum</strong> was also able to elaborate the rationale.</p>
<p>After a glitch in benchmarks, we agreed to modify Python 3.7 to always create
the GIL, instead of creating it "on demand". The change has no significant
impact on performance.</p>
<p>It was also decided to leave Python 2.7 and 3.6 unchanged, to prevent any risk
of regression: continue to create the GIL "on demand".</p>
<p><strong>It took me 4 years to fix a nasty bug in the famous Python GIL.</strong> I am never
comfortable when touching such a <strong>critical part</strong> of Python. I am now happy that
the bug is behind us: it's now fully fixed in the future Python 3.7!</p>
<p>See <a class="reference external" href="https://bugs.python.org/issue20891">bpo-20891</a> for the full story.
Thanks to all developers who helped me to fix this bug!</p>
</div>
Python 3.7 nanoseconds2018-03-06T16:30:00+01:002018-03-06T16:30:00+01:00Victor Stinnertag:vstinner.github.io,2018-03-06:/python37-pep-564-nanoseconds.html<p>Thanks to my <a class="reference external" href="https://vstinner.github.io/python37-perf-counter-nanoseconds.html">latest change on time.perf_counter()</a>, all Python 3.7 clocks now use
nanoseconds as integers internally. It became possible to propose my old
idea again, getting time as nanoseconds at the Python level, so I wrote a new
<a class="reference external" href="https://peps.python.org/pep-0564">PEP 564</a> "Add new time functions with nanosecond resolution". While the PEP
was discussed, I also deprecated <tt class="docutils literal">time.clock()</tt> and removed
<tt class="docutils literal">os.stat_float_times()</tt>.</p>
<a class="reference external image-reference" href="https://www.flickr.com/photos/dkalo/2909921582/"><img alt="Old clock" src="https://vstinner.github.io/images/clock.jpg" /></a>
<div class="section" id="time-clock">
<h2>time.clock()</h2>
<p>Since I wrote the <a class="reference external" href="https://peps.python.org/pep-0418">PEP 418</a> "Add monotonic time, performance counter, and
process time functions" in 2012, I have disliked <tt class="docutils literal">time.clock()</tt>. This clock is not
portable: on Windows it measures wall-clock time, whereas it measures CPU time on
Unix. Extract of the <a class="reference external" href="https://docs.python.org/dev/library/time.html#time.clock">time.clock() documentation</a>:</p>
<blockquote>
<em>Deprecated since version 3.3: The behaviour of this function depends on
the platform: use perf_counter() or process_time() instead, depending on
your requirements, to have a well defined behaviour.</em></blockquote>
<p>My PEP 418 deprecated <tt class="docutils literal">time.clock()</tt> in the documentation. In <a class="reference external" href="https://bugs.python.org/issue31803">bpo-31803</a>, I modified <tt class="docutils literal">time.clock()</tt> and
<tt class="docutils literal"><span class="pre">time.get_clock_info('clock')</span></tt> to also emit a <tt class="docutils literal">DeprecationWarning</tt> warning.
I replaced <tt class="docutils literal">time.clock()</tt> with <tt class="docutils literal">time.perf_counter()</tt> in tests and demos. I
also removed <tt class="docutils literal">hasattr(time, 'monotonic')</tt> in <tt class="docutils literal">test_time</tt> since
<tt class="docutils literal">time.monotonic()</tt> is always available since Python 3.5.</p>
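<p>As a small sketch (not code from the patches themselves), the replacement
recommended by the documentation is straightforward:</p>
<pre class="literal-block">
import time

# time.perf_counter() is the portable replacement for time.clock()
# when measuring elapsed wall-clock time; time.process_time() is the
# replacement when CPU time is wanted.
start = time.perf_counter()
total = sum(range(10**6))  # some workload to measure
elapsed = time.perf_counter() - start

assert total == 499999500000
assert elapsed >= 0.0
</pre>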
</div>
<div class="section" id="os-stat-float-times">
<h2>os.stat_float_times()</h2>
<p>The <tt class="docutils literal">os.stat_float_times()</tt> function was introduced in Python 2.3 to get file
modification times with sub-second resolution (commit <a class="reference external" href="https://github.com/python/cpython/commit/f607bdaa77475ec8c94614414dc2cecf8fd1ca0a">f607bdaa</a>),
the default was still to get time as seconds (integer). The function was
introduced to provide a smooth transition to time as floating point numbers, while
keeping backward compatibility with Python 2.2.</p>
<p><tt class="docutils literal">os.stat()</tt> was modified to return time as float by default in Python 2.5
(commit <a class="reference external" href="https://github.com/python/cpython/commit/fe33d0ba87f5468b50f939724b303969711f3be5">fe33d0ba</a>).
Python 2.5 was released 11 years ago; I consider that people have had enough time to
migrate their code to float time :-) I modified <tt class="docutils literal">os.stat_float_times()</tt> in
Python 3.1 to emit a <tt class="docutils literal">DeprecationWarning</tt> warning (commit <a class="reference external" href="https://github.com/python/cpython/commit/034d0aa2171688c40cee1a723ddcdb85bbce31e8">034d0aa2</a>
of <a class="reference external" href="https://bugs.python.org/issue14711">bpo-14711</a>).</p>
<p>Finally, I removed <tt class="docutils literal">os.stat_float_times()</tt> in Python 3.7: <a class="reference external" href="https://bugs.python.org/issue31827">bpo-31827</a>.</p>
<p>Serhiy Storchaka proposed to also remove the last three items from
<tt class="docutils literal">os.stat_result</tt>. For example, <tt class="docutils literal">stat_result[stat.ST_MTIME]</tt> could be
replaced with <tt class="docutils literal">stat_result.st_mtime</tt>. But I tried to remove these items and
it broke the <tt class="docutils literal">logging</tt> module, so I decided to leave them unchanged.</p>
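<p>To illustrate the current behaviour (a sketch, not code from the issue):
since the removal, <tt class="docutils literal">os.stat()</tt> simply always returns float timestamps,
and the <tt class="docutils literal">st_*_ns</tt> attributes (added in Python 3.3) expose the same values
as integer nanoseconds:</p>
<pre class="literal-block">
import os

# os.stat() returns timestamps as floats by default since Python 2.5;
# the st_mtime_ns attribute gives the same value as an integer number
# of nanoseconds, without rounding.
st = os.stat(".")
assert isinstance(st.st_mtime, float)
assert isinstance(st.st_mtime_ns, int)
# the float value is the nanosecond value scaled to seconds
assert abs(st.st_mtime - st.st_mtime_ns / 10**9) < 1.0
</pre>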
</div>
<div class="section" id="pep-564-time-time-ns">
<h2>PEP 564: time.time_ns()</h2>
<p>Six years ago (2012), I wrote the <a class="reference external" href="https://peps.python.org/pep-0410">PEP 410</a> "Use decimal.Decimal type for
timestamps" which proposed a large and complex change in all Python functions
returning time to support nanosecond resolution using the <tt class="docutils literal">decimal.Decimal</tt>
type. The PEP was <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2012-February/116837.html">rejected for different reasons</a>.</p>
<p>Since all clocks now use nanoseconds internally in Python 3.7, I proposed a new
<a class="reference external" href="https://peps.python.org/pep-0564">PEP 564</a> "Add new time functions with nanosecond resolution". Abstract:</p>
<blockquote>
<p>Add six new "nanosecond" variants of existing functions to the <tt class="docutils literal">time</tt>
module: <tt class="docutils literal">clock_gettime_ns()</tt>, <tt class="docutils literal">clock_settime_ns()</tt>,
<tt class="docutils literal">monotonic_ns()</tt>, <tt class="docutils literal">perf_counter_ns()</tt>, <tt class="docutils literal">process_time_ns()</tt> and
<tt class="docutils literal">time_ns()</tt>. While similar to the existing functions without the
<tt class="docutils literal">_ns</tt> suffix, they provide nanosecond resolution: they return a number of
nanoseconds as a Python <tt class="docutils literal">int</tt>.</p>
<p>The <tt class="docutils literal">time.time_ns()</tt> resolution is 3 times better than the <tt class="docutils literal">time.time()</tt>
resolution on Linux and Windows.</p>
</blockquote>
<p>People were not convinced by the need for nanosecond resolution, so I
added an "Issues caused by precision loss" section with 2 examples:</p>
<ul class="simple">
<li>Example 1: measure time delta in long-running process</li>
<li>Example 2: compare times with different resolution</li>
</ul>
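<p>The precision loss can be sketched in a few lines (the timestamp below is
arbitrary): a C <tt class="docutils literal">double</tt> has a 53-bit mantissa, so a 2018-era timestamp
stored as a float only has a resolution of roughly 238 nanoseconds, and
converting it back to nanoseconds cannot recover the exact value:</p>
<pre class="literal-block">
# A nanosecond timestamp from 2018 and the same instant as a float
# (seconds): the float cannot represent it exactly, because around
# 1.5e9 seconds a 64-bit float only has ~238 ns of resolution.
t_ns = 1_520_354_388_319_257_562   # int nanoseconds, like time.time_ns()
t = t_ns / 10**9                   # float seconds, like time.time()

# converting the float back to nanoseconds loses the exact value
assert int(t * 10**9) != t_ns
</pre>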
<p>As for my previous PEP 410, many people proposed many alternatives recorded in
the PEP: sub-nanosecond resolution, modifying <tt class="docutils literal">time.time()</tt> result type,
different types, different API, a new module, etc.</p>
<p>Fortunately for me, Guido van Rossum quickly approved my PEP for Python 3.7!</p>
</div>
<div class="section" id="implementaton-of-the-pep-564">
<h2>Implementation of the PEP 564</h2>
<p>I implemented my PEP 564 in <a class="reference external" href="https://bugs.python.org/issue31784">bpo-31784</a>
with the commit <a class="reference external" href="https://github.com/python/cpython/commit/c29b585fd4b5a91d17fc5dd41d86edff28a30da3">c29b585f</a>.
I added 6 new time functions:</p>
<ul class="simple">
<li><tt class="docutils literal">time.clock_gettime_ns()</tt></li>
<li><tt class="docutils literal">time.clock_settime_ns()</tt></li>
<li><tt class="docutils literal">time.monotonic_ns()</tt></li>
<li><tt class="docutils literal">time.perf_counter_ns()</tt></li>
<li><tt class="docutils literal">time.process_time_ns()</tt></li>
<li><tt class="docutils literal">time.time_ns()</tt></li>
</ul>
<p>Example:</p>
<pre class="literal-block">
$ python3.7
Python 3.7.0b2+ (heads/3.7:31e2b76f7b, Mar 6 2018, 15:31:29)
[GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux
>>> import time
>>> time.time()
1520354387.7663522
>>> time.time_ns()
1520354388319257562
</pre>
<p>I also added tests on <tt class="docutils literal">os.times()</tt> in <tt class="docutils literal">test_os</tt>; previously the function
wasn't tested at all!</p>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>I added 6 new functions to get time with nanosecond resolution, such as
<tt class="docutils literal">time.time_ns()</tt> with my approved <a class="reference external" href="https://peps.python.org/pep-0564">PEP 564</a>. I also modified
<tt class="docutils literal">time.clock()</tt> to emit a <tt class="docutils literal">DeprecationWarning</tt> and I removed the legacy
<tt class="docutils literal">os.stat_float_times()</tt> function.</p>
</div>
Python 3.7 perf_counter() nanoseconds2018-03-06T15:00:00+01:002018-03-06T15:00:00+01:00Victor Stinnertag:vstinner.github.io,2018-03-06:/python37-perf-counter-nanoseconds.html<p>Since 2012, I have been trying to convert all Python clocks to use
nanoseconds internally. The last clock which still used floating point internally was
<tt class="docutils literal">time.perf_counter()</tt>. INADA Naoki's new importtime tool was an opportunity
for me to take a new look at a tricky integer overflow issue.</p>
<div class="section" id="modify-importtime-to-use-time-perf-counter-clock">
<h2>Modify importtime …</h2></div><p>Since 2012, I have been trying to convert all Python clocks to use
nanoseconds internally. The last clock which still used floating point internally was
<tt class="docutils literal">time.perf_counter()</tt>. INADA Naoki's new importtime tool was an opportunity
for me to take a new look at a tricky integer overflow issue.</p>
<div class="section" id="modify-importtime-to-use-time-perf-counter-clock">
<h2>Modify importtime to use time.perf_counter() clock</h2>
<p>INADA Naoki added to Python 3.7 a cool new <a class="reference external" href="https://docs.python.org/dev/using/cmdline.html#id5">-X importtime</a> command line option to
analyze Python import performance. This tool can be used to optimize the
startup time of your application. Example:</p>
<pre class="literal-block">
vstinner@apu$ ./python -X importtime -c pass
import time: self [us] | cumulative | imported package
(...)
import time: 901 | 1902 | io
import time: 374 | 374 | _stat
import time: 663 | 1037 | stat
import time: 617 | 617 | genericpath
import time: 877 | 1493 | posixpath
import time: 3840 | 3840 | _collections_abc
import time: 2106 | 8474 | os
import time: 674 | 674 | _sitebuiltins
import time: 922 | 922 | sitecustomize
import time: 598 | 598 | usercustomize
import time: 1444 | 12110 | site
</pre>
<p>Read Naoki's article <a class="reference external" href="https://dev.to/methane/how-to-speed-up-python-application-startup-time-nkf">How to speed up Python application startup time</a>
(Jan 19, 2018) for a concrete analysis of <tt class="docutils literal">pipenv</tt> performance.</p>
<p>Naoki chose to use the <tt class="docutils literal">time.monotonic()</tt> clock internally to measure elapsed
time. On Windows, this clock (<tt class="docutils literal">GetTickCount64()</tt> function) has a resolution
around 15.6 ms, whereas most Python imports take less than 10 ms, and so most
numbers are just zeros. Example:</p>
<pre class="literal-block">
f:\dev\3x>python -X importtime -c "import idlelib.pyshell"
Running Debug|Win32 interpreter...
import time: self [us] | cumulative | imported package
import time: 0 | 0 | _codecs
import time: 0 | 0 | codecs
import time: 0 | 0 | encodings.aliases
import time: 15000 | 15000 | encodings
import time: 0 | 0 | encodings.utf_8
import time: 0 | 0 | _signal
import time: 0 | 0 | encodings.latin_1
import time: 0 | 0 | _weakrefset
import time: 0 | 0 | abc
import time: 0 | 0 | io
import time: 0 | 0 | _stat
(...)
</pre>
<p>In <a class="reference external" href="https://bugs.python.org/issue31415">bpo-31415</a>, I fixed the issue by
adding a new C function <tt class="docutils literal">_PyTime_GetPerfCounter()</tt> to access the
<tt class="docutils literal">time.perf_counter()</tt> clock at the C level and I modified "importtime" to use
it.</p>
<p>Problem solved! ... almost...</p>
</div>
<div class="section" id="double-integer-float-conversions">
<h2>Double integer-float conversions</h2>
<p>My commit <a class="reference external" href="https://github.com/python/cpython/commit/a997c7b434631f51e00191acea2ba6097691e859">a997c7b4</a>
of <a class="reference external" href="https://bugs.python.org/issue31415">bpo-31415</a> adding
<tt class="docutils literal">_PyTime_GetPerfCounter()</tt> moved the C code from <tt class="docutils literal">Modules/timemodule.c</tt> to
<tt class="docutils literal">Python/pytime.c</tt>, but also changed the internal type storing time from
floating point number (C <tt class="docutils literal">double</tt>) to integer number (<tt class="docutils literal">_PyTime_t</tt>, which
is <tt class="docutils literal">int64_t</tt> in practice).</p>
<p>The drawback of this change is that <tt class="docutils literal">time.perf_counter()</tt> now converts
<tt class="docutils literal">QueryPerformanceCounter() / QueryPerformanceFrequency()</tt> float into a
<tt class="docutils literal">_PyTime_t</tt> (integer) and then back to a float, and these conversions cause a
precision loss. I computed that the conversions start to lose precision
after a single second with <tt class="docutils literal">QueryPerformanceFrequency()</tt> equal to
<tt class="docutils literal">3,579,545</tt> Hz (3.6 MHz).</p>
<p>To fix the precision loss, I modified again <tt class="docutils literal">time.clock()</tt> and
<tt class="docutils literal">time.perf_counter()</tt> to not use <tt class="docutils literal">_PyTime_t</tt> anymore, only double.</p>
</div>
<div class="section" id="grumpy-victor">
<h2>Grumpy Victor</h2>
<img alt="Grumpy" src="https://vstinner.github.io/images/grumpy.jpg" />
<p>My change to replace <tt class="docutils literal">_PyTime_t</tt> with <tt class="docutils literal">double</tt> made me grumpy. I had been
trying to convert all Python clocks to <tt class="docutils literal">_PyTime_t</tt> for 6 years (since 2012).</p>
<p>Being blocked by a single clock made me grumpy, especially because the issue
is specific to the Windows implementation. The Linux implementation of
<tt class="docutils literal">time.perf_counter()</tt> uses <tt class="docutils literal">clock_gettime()</tt> which directly returns
nanoseconds as integers, no division needed to get time as <tt class="docutils literal">_PyTime_t</tt>.</p>
<p>I looked at the clock sources in the Linux kernel source code:
<a class="reference external" href="https://github.com/torvalds/linux/blob/master/kernel/time/clocksource.c">kernel/time/clocksource.c</a>.
Linux clocks only use integers and support nanosecond resolution. I'm always
impressed by the quality of the Linux kernel source code; the code is
straightforward C. If Linux is able to use integers for various kinds of
clocks, I should be able to use integers for my specific Windows
implementations of <tt class="docutils literal">time.perf_counter()</tt>, no?</p>
<p>In practice, a <tt class="docutils literal">_PyTime_t</tt> value is a number of nanoseconds, so the computation
is:</p>
<pre class="literal-block">
(QueryPerformanceCounter() * 1_000_000_000) / QueryPerformanceFrequency()
</pre>
<p>where <tt class="docutils literal">1_000_000_000</tt> is the number of nanoseconds in one second. <strong>The problem
is preventing integer overflow</strong> in the first part, using <tt class="docutils literal">_PyTime_t</tt> which is
<tt class="docutils literal">int64_t</tt> in practice:</p>
<pre class="literal-block">
QueryPerformanceCounter() * 1_000_000_000
</pre>
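<p>To give an idea of the magnitude (a back-of-the-envelope sketch, assuming the
common 10 MHz frequency listed later in this article): the naive product
overflows a signed 64-bit integer after only about 15 minutes of counter ticks.</p>
<pre class="literal-block">
# How quickly does ticks * SEC_TO_NS overflow a signed 64-bit integer?
INT64_MAX = 2**63 - 1
SEC_TO_NS = 10**9
frequency = 10_000_000   # 10 MHz, a common QueryPerformanceFrequency() value

# the product overflows once ticks > INT64_MAX // SEC_TO_NS ticks,
# which corresponds to this many seconds of counter time:
overflow_after = (INT64_MAX // SEC_TO_NS) // frequency
assert overflow_after == 922   # seconds, i.e. about 15 minutes
</pre>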
</div>
<div class="section" id="some-maths-to-avoid-the-precision-loss">
<h2>Some maths to avoid the precision loss</h2>
<p>Using a pencil, a sheet of paper and some maths, I found a solution!</p>
<pre class="literal-block">
(a * b) / q == (a / q) * b + ((a % q) * b) / q
</pre>
<img alt="Math rocks" src="https://vstinner.github.io/images/math_rocks.jpg" />
<p>This prevents the risk of integer overflow. C implementation:</p>
<pre class="literal-block">
Py_LOCAL_INLINE(_PyTime_t)
_PyTime_MulDiv(_PyTime_t ticks, _PyTime_t mul, _PyTime_t div)
{
    _PyTime_t intpart, remaining;
    /* Compute (ticks * mul / div) in two parts to prevent integer overflow:
       compute the integer part, and then the remaining part.
       (ticks * mul) / div == (ticks / div) * mul + (ticks % div) * mul / div
       The caller must ensure that "(div - 1) * mul" cannot overflow. */
    intpart = ticks / div;
    ticks %= div;
    remaining = ticks * mul;
    remaining /= div;
    return intpart * mul + remaining;
}
</pre>
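<p>The identity is easy to check with Python's arbitrary-precision integers (a
sketch, not CPython code): for nonnegative operands, the two-part computation
gives exactly the same result as the naive product.</p>
<pre class="literal-block">
def muldiv(ticks, mul, div):
    # Compute (ticks * mul) // div in two parts, like _PyTime_MulDiv();
    # in C this avoids overflowing ticks * mul whenever the final result
    # itself fits in the integer type.
    intpart, remaining = divmod(ticks, div)
    return intpart * mul + (remaining * mul) // div

SEC_TO_NS = 10**9
frequency = 3_579_545   # Hz, a known QueryPerformanceFrequency() value

# a tick count whose naive product would overflow int64
ticks = 2**62 + 12_345
assert muldiv(ticks, SEC_TO_NS, frequency) == (ticks * SEC_TO_NS) // frequency
</pre>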
<p>Simplified Windows implementation of perf_counter():</p>
<pre class="literal-block">
_PyTime_t win_perf_counter(void)
{
    LARGE_INTEGER freq;
    LONGLONG frequency;
    LARGE_INTEGER now;
    LONGLONG ticksll;
    _PyTime_t ticks;

    (void)QueryPerformanceFrequency(&freq);
    frequency = freq.QuadPart;

    QueryPerformanceCounter(&now);
    ticksll = now.QuadPart;
    ticks = (_PyTime_t)ticksll;

    return _PyTime_MulDiv(ticks, SEC_TO_NS, (_PyTime_t)frequency);
}
</pre>
<p>On Windows, I added the following sanity checks to make sure that integer
overflows cannot occur:</p>
<pre class="literal-block">
/* Check that frequency can be cast to _PyTime_t.
   Make also sure that (ticks * SEC_TO_NS) cannot overflow in
   _PyTime_MulDiv(), with ticks < frequency.

   Known QueryPerformanceFrequency() values:

   * 10,000,000 (10 MHz): 100 ns resolution
   * 3,579,545 Hz (3.6 MHz): 279 ns resolution

   None of these frequencies can overflow with 64-bit _PyTime_t, but
   check for overflow, just in case. */
if (frequency > _PyTime_MAX
    || frequency > (LONGLONG)_PyTime_MAX / (LONGLONG)SEC_TO_NS) {
    PyErr_SetString(PyExc_OverflowError,
                    "QueryPerformanceFrequency is too large");
    return -1;
}
</pre>
<p>Since I also modified the macOS implementation of <tt class="docutils literal">time.monotonic()</tt> to use
<tt class="docutils literal">_PyTime_MulDiv()</tt>, I also added this check for macOS:</p>
<pre class="literal-block">
/* Make sure that (ticks * timebase.numer) cannot overflow in
   _PyTime_MulDiv(), with ticks < timebase.denom.

   Known time bases:

   * always (1, 1) on Intel
   * (1000000000, 33333335) or (1000000000, 25000000) on PowerPC

   None of these time bases can overflow with 64-bit _PyTime_t, but
   check for overflow, just in case. */
if ((_PyTime_t)timebase.numer > _PyTime_MAX / (_PyTime_t)timebase.denom) {
    PyErr_SetString(PyExc_OverflowError,
                    "mach_timebase_info is too large");
    return -1;
}
</pre>
</div>
<div class="section" id="pytime-c-source-code">
<h2>pytime.c source code</h2>
<p>If you are curious, the full code lives at <a class="reference external" href="https://github.com/python/cpython/blob/master/Python/pytime.c">Python/pytime.c</a> and is
currently around 1,100 lines of C code.</p>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>INADA Naoki's importtime tool was using <tt class="docutils literal">time.monotonic()</tt> clock which failed
to measure short import times on Windows. I modified it to use
<tt class="docutils literal">time.perf_counter()</tt> internally to get better precision on Windows. I
identified a precision loss caused by my internal <tt class="docutils literal">_PyTime_t</tt> type to store
time as nanoseconds. Thanks to maths, I succeeded in using nanoseconds while
preventing any risk of integer overflow.</p>
</div>
My contributions to CPython during 2017 Q3: Part 3 (funny bugs)2017-10-19T16:00:00+02:002017-10-19T16:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-10-19:/contrib-cpython-2017q3-part3.html<p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q3
(July, August, September), Part 3 (funny bugs).</p>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part2.html">My contributions to CPython during 2017 Q3: Part 2 (dangling
threads)</a>.</p>
<p>Summary:</p>
<ul class="simple">
<li>FreeBSD bug: minor() device regression</li>
<li>regrtest snowball effect when hunting memory leaks</li>
<li>Bugfixes</li>
<li>Other Changes</li>
</ul>
<div class="section" id="freebsd-bug-minor-device-regression">
<h2>FreeBSD bug: minor() device regression</h2>
<a class="reference external image-reference" href="https://www.freebsd.org/"><img alt="Logo of the FreeBSD project" src="https://vstinner.github.io/images/freebsd.png" /></a>
<p><a class="reference external" href="https://bugs.python.org/issue31044">bpo-31044</a>: The …</p></div><p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q3
(July, August, September), Part 3 (funny bugs).</p>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part2.html">My contributions to CPython during 2017 Q3: Part 2 (dangling
threads)</a>.</p>
<p>Summary:</p>
<ul class="simple">
<li>FreeBSD bug: minor() device regression</li>
<li>regrtest snowball effect when hunting memory leaks</li>
<li>Bugfixes</li>
<li>Other Changes</li>
</ul>
<div class="section" id="freebsd-bug-minor-device-regression">
<h2>FreeBSD bug: minor() device regression</h2>
<a class="reference external image-reference" href="https://www.freebsd.org/"><img alt="Logo of the FreeBSD project" src="https://vstinner.github.io/images/freebsd.png" /></a>
<p><a class="reference external" href="https://bugs.python.org/issue31044">bpo-31044</a>: The test_makedev() of
test_posix started to fail in the build 632 (Wed Jul 26 10:47:01 2017) of AMD64
FreeBSD CURRENT. The test failed on debug but also on non-debug buildbots, in
the master and 3.6 branches. It looked more like a change on the buildbot itself, maybe a
FreeBSD upgrade?</p>
<p>Thanks to <strong>koobs</strong>, I had SSH access to the buildbot. I was able to
reproduce the bug manually. I noticed that minor() truncates most significant
bits.</p>
<p>I continued my analysis and found that, on May 23, the FreeBSD <tt class="docutils literal">dev_t</tt> type
changed from 32 bits to 64 bits in the kernel, but the <tt class="docutils literal">minor()</tt> userland
function was not updated.</p>
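<p>The property that test_makedev() essentially checks, and that the FreeBSD bug
broke for 64-bit device numbers, is the round trip between a device number and
its (major, minor) pair. A minimal sketch (Unix-only functions):</p>
<pre class="literal-block">
import os

# Composing a device number from (major, minor) and decomposing it again
# must give back the original values; the FreeBSD bug truncated the
# result of minor() to 32 bits once dev_t became 64-bit.
major, minor = 1234, 5678
dev = os.makedev(major, minor)
assert os.major(dev) == major
assert os.minor(dev) == minor
</pre>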
<p>I reported a bug to FreeBSD: <a class="reference external" href="https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221048">Bug 221048 - minor() truncates device number to
32 bits, whereas dev_t type was extended to 64 bits</a>.</p>
<p>In the meanwhile, I skipped test_posix.test_makedev() on FreeBSD if <tt class="docutils literal">dev_t</tt>
is larger than 32-bit.</p>
<p>Fortunately, the FreeBSD bug was quickly fixed!</p>
</div>
<div class="section" id="regrtest-snowball-effect-when-hunting-memory-leaks">
<h2>regrtest snowball effect when hunting memory leaks</h2>
<p>While trying to fix all reference leaks on the new Windows and Linux "Refleaks"
buildbots, I reported the bug <a class="reference external" href="https://bugs.python.org/issue31217">bpo-31217</a>:</p>
<pre class="literal-block">
test_code leaked [1, 1, 1] memory blocks, sum=3
</pre>
<p>Two weeks after reporting the bug, I was able to reproduce the bug, but <strong>only
with Python compiled in 32-bit mode</strong>. Strange.</p>
<p>I spent one day understanding the bug. I removed as much as possible while
making sure that I could still reproduce the bug. At the end, I wrote <a class="reference external" href="https://bugs.python.org/file47114/leak2.py">leak2.py</a> which reproduces the bug with a
single import: <tt class="docutils literal">import sys</tt>. Even though the script is only 86 lines long, I was
still unable to understand the bug.</p>
<p>My first hypothesis:</p>
<blockquote>
It seems like the "leak" is the call to <tt class="docutils literal">sys.getallocatedblocks()</tt> which
creates a new integer, and the integer is kept alive between two loop
iterations.</blockquote>
<p><strong>Antoine Pitrou</strong> rejected it:</p>
<blockquote>
I doubt it. If that was the case, the reference count would increase as
well.</blockquote>
<p>It was Antoine Pitrou who understood the bug:</p>
<pre class="literal-block">
Ahah.
Actually, it's quite simple :-) On 64-bit Python:
>>> id(82914 - 82913) == id(1)
True
On 32-bit Python:
>>> id(82914 - 82913) == id(1)
False
So the first non-zero alloc_delta really has a snowball effect, as it
creates new memory block which will produce a non-zero alloc_delta on the
next run, etc.
</pre>
<p>I implemented Antoine's idea to fix the bug, <a class="reference external" href="https://github.com/python/cpython/commit/6c2feabc5dac2f3049b15134669e9ad5af573193">commit</a>:</p>
<pre class="literal-block">
Use a pool of integer objects to prevent false alarm when checking for
memory block leaks. Fill the pool with values in -1000..1000 which
are the most common (reference, memory block, file descriptor)
differences.
Co-Authored-By: Antoine Pitrou <pitrou@free.fr>
</pre>
<p>The bug is probably as old as the code hunting memory leaks.</p>
</div>
<div class="section" id="bugfixes">
<h2>Bugfixes</h2>
<ul class="simple">
<li><a class="reference external" href="https://bugs.python.org/issue30891">bpo-30891</a>: Second fix for
importlib <tt class="docutils literal">_find_and_load()</tt> to handle correctly parallelism with threads.
Call <tt class="docutils literal">sys.modules.get()</tt> in the <tt class="docutils literal">with _ModuleLockManager(name):</tt> block to
protect the dictionary key with the module lock and use an atomic get to
prevent race conditions.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31019">bpo-31019</a>:
<tt class="docutils literal">multiprocessing.Process.is_alive()</tt> now removes the process from the
<tt class="docutils literal">_children set</tt> if the process completed. The change prevents leaking
"dangling" processes.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31326">bpo-31326</a>, <tt class="docutils literal">concurrent.futures</tt>:
<tt class="docutils literal">ProcessPoolExecutor.shutdown()</tt> now explicitly closes the call queue.
Moreover, <tt class="docutils literal">shutdown(wait=True)</tt> now also joins the call queue thread, to
prevent leaking a dangling thread.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31170">bpo-31170</a>: Update libexpat from
2.2.3 to 2.2.4: fix copying of partial characters for UTF-8 input (<a class="reference external" href="https://github.com/libexpat/libexpat/issues/115">libexpat
bug 115</a>). Later, I also
wrote non-regression tests for this bug (libexpat doesn't have any test
for this bug).</li>
<li><a class="reference external" href="https://bugs.python.org/issue31499">bpo-31499</a>, <tt class="docutils literal">xml.etree</tt>:
<tt class="docutils literal">xmlparser_gc_clear()</tt> now sets self.parser to <tt class="docutils literal">NULL</tt> to prevent a crash
in <tt class="docutils literal">xmlparser_dealloc()</tt> if <tt class="docutils literal">xmlparser_gc_clear()</tt> was called previously
by the garbage collector, because the parser was part of a reference cycle.
Fix co-written with <strong>Serhiy Storchaka</strong>.</li>
<li><a class="reference external" href="https://bugs.python.org/issue30892">bpo-30892</a>: Fix <tt class="docutils literal">_elementtree</tt>
module initialization (accelerator of <tt class="docutils literal">xml.etree</tt>), handle correctly
<tt class="docutils literal">getattr(copy, 'deepcopy')</tt> failure to not fail with an assertion error.</li>
</ul>
</div>
<div class="section" id="other-changes">
<h2>Other Changes</h2>
<ul class="simple">
<li><a class="reference external" href="https://bugs.python.org/issue30866">bpo-30866</a>: Add _testcapi.stack_pointer(). I used it to write the "Stack
consumption" section of a previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q1.html">My contributions to CPython
during 2017 Q1</a></li>
<li><tt class="docutils literal">_ssl</tt>: Fix compiler warning. Cast Py_buffer.len (Py_ssize_t, signed) to
size_t (unsigned) to prevent the "comparison between signed and unsigned
integer expressions" warning.</li>
<li><a class="reference external" href="https://bugs.python.org/issue30486">bpo-30486</a>: Make cell_set_contents() symbol private. Don't export the
<tt class="docutils literal">cell_set_contents()</tt> symbol in the C API.</li>
</ul>
</div>
My contributions to CPython during 2017 Q3: Part 2 (dangling threads)2017-10-19T15:00:00+02:002017-10-19T15:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-10-19:/contrib-cpython-2017q3-part2.html<p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q3
(July, August, September), Part 2: "Dangling threads".</p>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part1.html">My contributions to CPython during 2017 Q3: Part 1</a>.</p>
<p>Next reports:</p>
<ul class="simple">
<li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part3.html">My contributions to CPython during 2017 Q3: Part 3 (funny bugs)</a>.</li>
</ul>
<p>Summary:</p>
<ul class="simple">
<li>Bugfixes: Reference cycles</li>
<li>socketserver leaking threads and processes<ul>
<li>test_logging random bug …</li></ul></li></ul><p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q3
(July, August, September), Part 2: "Dangling threads".</p>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part1.html">My contributions to CPython during 2017 Q3: Part 1</a>.</p>
<p>Next reports:</p>
<ul class="simple">
<li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part3.html">My contributions to CPython during 2017 Q3: Part 3 (funny bugs)</a>.</li>
</ul>
<p>Summary:</p>
<ul class="simple">
<li>Bugfixes: Reference cycles</li>
<li>socketserver leaking threads and processes<ul>
<li>test_logging random bug</li>
<li>Skip failing tests</li>
<li>Fix socketserver for processes</li>
<li>Fix socketserver for threads</li>
<li>Issue not done yet</li>
</ul>
</li>
<li>Environment altered and dangling threads<ul>
<li>Environment changed</li>
<li>test.support and regrtest enhancements</li>
<li>multiprocessing bug fixes</li>
<li>concurrent.futures bug fixes</li>
<li>test_threading and test_thread</li>
<li>Other fixes</li>
</ul>
</li>
</ul>
<div class="section" id="bugfixes-reference-cycles">
<h2>Bugfixes: Reference cycles</h2>
<p>While fixing "dangling threads" (see below), I found and fixed 4 reference
cycles which caused memory leaks and objects to live longer than expected. I
was surprised that the bug in the common <tt class="docutils literal">socket.create_connection()</tt>
function was not noticed before! So my work on dangling threads was useful!</p>
<p>The typical pattern of such reference cycle is:</p>
<pre class="literal-block">
def func():
    err = None
    try:
        do_something()
    except Exception as exc:
        err = exc
    if err is not None:
        handle_error(err)
    # the exception is stored in the 'err' variable

func()
# surprise, surprise, the exception is still alive at this point!
</pre>
<p>Or the variant:</p>
<pre class="literal-block">
def func():
    try:
        do_something()
    except Exception as exc:
        exc_info = sys.exc_info()
        handle_error(exc_info)
    # the exception is stored in the 'exc_info' variable

func()
# surprise, surprise, the exception is still alive at this point!
</pre>
<p>It's not easy to spot the bug: it is subtle. An exception object in Python
3 has a <tt class="docutils literal">__traceback__</tt> attribute which contains frames. If a frame stores
the exception in a variable, like <tt class="docutils literal">err</tt> in the first example, or <tt class="docutils literal">exc_info</tt>
in the second example, a cycle exists between the exception and frames. In this
case, the exception, the traceback, the frames, <strong>and all variables of all
frames are kept alive</strong> by the reference cycle, <strong>until the cycle is broken by
the garbage collector</strong>.</p>
<p>The problem is that the garbage collector runs only infrequently, so the
cycle may stay alive for a long time.</p>
<p>Sometimes, the reference cycle is even more subtle than the simple examples
above.</p>
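<p>A hedged sketch of the general fix pattern used in the changes below
(<tt class="docutils literal">do_something()</tt> and <tt class="docutils literal">handle_error()</tt> are placeholders, not
functions from the stdlib): clear the local variable in a <tt class="docutils literal">finally</tt>
block, so the frame no longer keeps the exception alive.</p>
<pre class="literal-block">
def do_something():
    raise ValueError("boom")        # placeholder for real work

def handle_error(exc):
    return "handled: %s" % exc      # placeholder for real error handling

def func():
    err = None
    result = None
    try:
        do_something()
    except Exception as exc:
        err = exc
    try:
        if err is not None:
            result = handle_error(err)
    finally:
        # Break the cycle: drop the frame's reference to the exception,
        # whose __traceback__ references this frame.
        err = None
    return result

assert func() == "handled: boom"
</pre>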
<p>Fixed reference cycles:</p>
<ul class="simple">
<li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>,
<tt class="docutils literal">socket.create_connection()</tt>: Fix reference cycle.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31247">bpo-31247</a>: <tt class="docutils literal">xmlrpc.server</tt> now explicitly breaks reference cycles when using
<tt class="docutils literal">sys.exc_info()</tt> in code handling exceptions.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31249">bpo-31249</a>, <tt class="docutils literal">concurrent.futures</tt>:
<tt class="docutils literal">WorkItem.run()</tt> used by ThreadPoolExecutor now explicitly breaks a
reference cycle between an exception object and the <tt class="docutils literal">WorkItem</tt> object.
<tt class="docutils literal">ThreadPoolExecutor.shutdown()</tt> now also clears its threads set.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31238">bpo-31238</a>: <tt class="docutils literal">pydoc</tt>:
<tt class="docutils literal">ServerThread.stop()</tt> now joins itself to wait until
<tt class="docutils literal">DocServer.serve_until_quit()</tt> completes and then explicitly sets its
docserver attribute to None to break a reference cycle. This change was made
to fix <tt class="docutils literal">test_doc</tt>.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31323">bpo-31323</a>: Fix reference leak in
test_ssl. Store exceptions as strings rather than objects to prevent reference
cycles which leak dangling threads.</li>
</ul>
<p>I also started a discussion on reference cycles caused by exceptions:
<a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-September/149586.html">[Python-Dev] Evil reference cycles caused by Exception.__traceback__</a>.
Sadly, no action was taken: no obvious solution was found.</p>
<p>I found the <tt class="docutils literal">socket.create_connection()</tt> reference cycle because of an
unrelated change in test.support:</p>
<pre class="literal-block">
bpo-29639: change test.support.HOST to "localhost"
</pre>
<p>Read <a class="reference external" href="https://bugs.python.org/issue29639#msg302087">my message</a> on bpo-29639
for the full story. Extract:</p>
<blockquote>
Modifying support.HOST to "localhost" triggered a reference cycle!?</blockquote>
</div>
<div class="section" id="socketserver-leaking-threads-and-processes">
<h2>socketserver leaking threads and processes</h2>
<div class="section" id="test-logging-random-bug">
<h3>test_logging random bug</h3>
<p>This story started on July 3, with test_logging failing randomly on FreeBSD,
<a class="reference external" href="https://bugs.python.org/issue30830">bpo-30830</a>:</p>
<pre class="literal-block">
test_output (test.test_logging.HTTPHandlerTest) ... ok
Warning -- threading_cleanup() failed to cleanup -1 threads after 3 sec (count: 0, dangling: 1)
</pre>
<p>I failed to reproduce the bug on my FreeBSD VM or on Linux. The bug only
occurred on one specific FreeBSD buildbot. I even got access to the buildbot...
and I still failed to reproduce the bug! I tried to run test_logging multiple
times in parallel, to increase the system load with my <tt class="docutils literal">system_load.py</tt>
script which spawns Python processes running <tt class="docutils literal">while 1: pass</tt> to stress the
CPU, etc. I felt disappointed.</p>
<p>After one month, I succeeded in reproducing the bug by running two commands in
parallel.</p>
<p>Command 1 to trigger the bug:</p>
<pre class="literal-block">
./python -m test -v test_logging \
--fail-env-changed \
--forever \
-m test.test_logging.DatagramHandlerTest.test_output \
-m test.test_logging.ConfigDictTest.test_listen_config_10_ok \
-m test.test_logging.SocketHandlerTest.test_output
</pre>
<p>Command 2 to stress the system:</p>
<pre class="literal-block">
./python -m test -j4
</pre>
<p>It seems like the Python test suite is a very good tool to stress a system to
trigger a race condition!</p>
<p>Finally, I was able to identify the bug:</p>
<blockquote>
The problem is that <tt class="docutils literal">socketserver.ThreadingMixIn</tt> spawns threads without
waiting for their completion in server_close().</blockquote>
</div>
<div class="section" id="skip-failing-tests">
<h3>Skip failing tests</h3>
<p>To stabilize the buildbots and to be able to work on other bugs, I decided to
first skip all tests using <tt class="docutils literal">socketserver.ThreadingMixIn</tt> until this class was
fixed to prevent "dangling threads".</p>
</div>
<div class="section" id="fix-socketserver-for-processes">
<h3>Fix socketserver for processes</h3>
<p>While trying to see how to fix <tt class="docutils literal">socketserver.ThreadingMixIn</tt>, I understood
that <a class="reference external" href="https://bugs.python.org/issue31151">bpo-31151</a> was a similar bug in
the <tt class="docutils literal">socketserver</tt> module but for processes:</p>
<pre class="literal-block">
test_ForkingUDPServer (test.test_socketserver.SocketServerTest) ... creating server
(...)
Warning -- reap_children() reaped child process 18281
</pre>
<p>My analysis:</p>
<blockquote>
The problem is that <tt class="docutils literal">socketserver.ForkingMixIn</tt> doesn't wait until all
children complete. It only calls <tt class="docutils literal">os.waitpid()</tt> in non-blocking mode
(using <tt class="docutils literal">os.WNOHANG</tt>) after each loop iteration. If a child process
completes after the last call to <tt class="docutils literal">ForkingMixIn.collect_children()</tt>, the
server leaks zombie processes.</blockquote>
<p>I fixed <tt class="docutils literal">socketserver.ForkingMixIn</tt> by modifying the <tt class="docutils literal">server_close()</tt>
method to <strong>block</strong> until all child processes complete: <a class="reference external" href="https://github.com/python/cpython/commit/aa8ec34ad52bb3b274ce91169e1bc4a598655049">commit</a>.</p>
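<p>The idea behind the fix can be sketched with a plain <tt class="docutils literal">os.waitpid()</tt> loop.
The <tt class="docutils literal">collect_children()</tt> helper below is a simplified illustration, not the
actual socketserver code: polling with <tt class="docutils literal">os.WNOHANG</tt> can miss children which
finish later, while passing 0 blocks until each child is reaped.</p>

```python
import os

def collect_children(active_children, block=False):
    # Poll (block=False, like the old per-request collection) or block
    # (block=True, like the fixed server_close()) until children are reaped.
    flags = 0 if block else os.WNOHANG
    for pid in sorted(active_children):
        try:
            done_pid, _status = os.waitpid(pid, flags)
        except ChildProcessError:
            active_children.discard(pid)    # child already reaped
            continue
        if done_pid == pid:                 # with WNOHANG, 0 means "still running"
            active_children.discard(pid)

if hasattr(os, "fork"):                     # POSIX only
    children = set()
    for _ in range(2):
        pid = os.fork()
        if pid == 0:
            os._exit(0)                     # child: exit immediately
        children.add(pid)
    collect_children(children, block=True)  # no zombie processes left behind
    print(children)                         # set()
```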
<p>Just after pushing my fix, I understood that it changed the
<tt class="docutils literal">ForkingMixIn</tt> behaviour. I wrote an email to ask whether this was the right
behaviour or if a change was needed: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-August/148826.html">[Python-Dev] socketserver ForkingMixin waiting for
child processes</a>.
The answer is that not everybody wants this behaviour. Sadly, I didn't have
time yet to let the user choose the behaviour.</p>
</div>
<div class="section" id="fix-socketserver-for-threads">
<h3>Fix socketserver for threads</h3>
<p>Fixing <tt class="docutils literal">socketserver.ForkingMixIn</tt> was simple because the code already tracked
the identifiers of the child processes and already had code to wait for their
completion.</p>
<p>Fixing <tt class="docutils literal">socketserver.ThreadingMixIn</tt> (<a class="reference external" href="https://bugs.python.org/issue31233">bpo-31233</a>) was more complicated since it didn't
keep track of spawned threads.</p>
<p>I chose to keep a list of <tt class="docutils literal">threading.Thread</tt> objects, but only for
non-daemonic threads. <tt class="docutils literal">socketserver.ThreadingMixIn.server_close()</tt> now joins
all threads: <a class="reference external" href="https://github.com/python/cpython/commit/b8f4163da30e16c7cd58fe04f4b17e38d53cd57e">commit</a>.</p>
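<p>The approach can be summarized with a small mixin sketch (illustrative names,
not the actual socketserver implementation): remember each non-daemonic handler
thread, then join them all in <tt class="docutils literal">server_close()</tt>:</p>

```python
import threading
import time

class ThreadTrackingMixIn:
    # Simplified sketch of the bpo-31233 approach: remember each
    # non-daemonic handler thread so that server_close() can join them
    # all instead of leaking "dangling threads".
    daemon_threads = False
    _threads = None

    def process_request(self, request, client_address):
        thread = threading.Thread(target=self.process_request_thread,
                                  args=(request, client_address))
        thread.daemon = self.daemon_threads
        if not thread.daemon:
            if self._threads is None:
                self._threads = []
            self._threads.append(thread)
        thread.start()

    def server_close(self):
        threads, self._threads = self._threads, None
        if threads:
            for thread in threads:
                thread.join()

class DemoServer(ThreadTrackingMixIn):
    def __init__(self):
        self.handled = []

    def process_request_thread(self, request, client_address):
        time.sleep(0.01)                  # simulate request handling
        self.handled.append(request)

server = DemoServer()
for i in range(3):
    server.process_request(i, ("127.0.0.1", 0))
server.server_close()                     # blocks until all handlers complete
print(sorted(server.handled))             # [0, 1, 2]
```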
</div>
<div class="section" id="issue-not-done-yet">
<h3>Issue not done yet</h3>
<p>As I wrote above, the <tt class="docutils literal">socketserver</tt> module still needs to be reworked to let the
user decide whether the server must gracefully wait for child completion or not.
Maybe also expose a method to explicitly wait for children, maybe with a
timeout?</p>
</div>
</div>
<div class="section" id="environment-altered-and-dangling-threads">
<h2>Environment altered and dangling threads</h2>
<p>This part kept me busy for the whole quarter. While trying to fix "all bugs", I
looked at two specific "environment changes": "dangling threads" and "zombie
processes". A dangling thread comes from a test which spawns a thread but doesn't
properly "clean up" the thread.</p>
<p>Leaking threads or processes is a very bad side effect since it is likely to
cause random bugs in following tests.</p>
<p>At the beginning, I expected that only 2 or 3 bugs would need to be fixed. In the
end, it was closer to 100 bugs. I don't regret it: I'm now sure that I made the Python
test suite more reliable, and this work allowed me to catch <strong>and fix</strong> old
reference cycle bugs (see above).</p>
<div class="section" id="environment-changed">
<h3>Environment changed</h3>
<p>To detect bugs, I modified Travis CI jobs, AppVeyor and buildbots to run tests
with <tt class="docutils literal"><span class="pre">--fail-env-changed</span></tt>. With this option, if a test alters the
environment, the full test suite is marked as failed with "ENV_CHANGED".</p>
<p>I also fixed <tt class="docutils literal">python3 <span class="pre">-m</span> test <span class="pre">--fail-env-changed</span> <span class="pre">--forever</span></tt> in <a class="reference external" href="https://bugs.python.org/issue30764">bpo-30764</a>: --forever now stops if a test alters
the environment.</p>
</div>
<div class="section" id="test-support-and-regrtest-enhancements">
<h3>test.support and regrtest enhancements</h3>
<ul class="simple">
<li><a class="reference external" href="https://bugs.python.org/issue30845">bpo-30845</a>: reap_children() now logs
warnings.</li>
<li><tt class="docutils literal">support.reap_children()</tt> now sets environment_altered to <tt class="docutils literal">True</tt> if a
test leaked a zombie process, to detect bugs using <tt class="docutils literal">python3 <span class="pre">-m</span> test
<span class="pre">--fail-env-changed</span></tt>.</li>
<li>regrtest: count also "env changed" tests as failed tests in the test
progress.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>:
<tt class="docutils literal">support.threading_cleanup()</tt> now emits a warning immediately if there are
threads running in the background, to be able to catch bugs more easily.
Previously, the warning was only emitted if the function failed to cleanup
these threads after 1 second.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: Add
<tt class="docutils literal">test.support.wait_threads_exit()</tt>. Use <tt class="docutils literal">_thread.count()</tt> to wait until
threads exit. The new context manager prevents the "dangling thread" warning.
Also add the <tt class="docutils literal">support.join_thread()</tt> helper: it joins a thread but raises an
AssertionError if the thread is still alive after <em>timeout</em> seconds.</li>
</ul>
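<p>The <tt class="docutils literal">join_thread()</tt> helper can be sketched as follows (a simplified version,
not the exact test.support code): join the thread, then fail loudly if it is
still alive instead of silently leaving it dangling.</p>

```python
import threading

def join_thread(thread, timeout=30.0):
    # Join the thread, but raise if it is still alive afterwards
    # instead of silently leaving a dangling thread behind.
    thread.join(timeout)
    if thread.is_alive():
        msg = f"failed to join the thread in {timeout:.1f} seconds"
        raise AssertionError(msg)

# A well-behaved thread joins fine...
quick = threading.Thread(target=lambda: None)
quick.start()
join_thread(quick)

# ...while a stuck thread raises AssertionError instead of hanging the test.
stuck = threading.Thread(target=threading.Event().wait, daemon=True)
stuck.start()
try:
    join_thread(stuck, timeout=0.1)
except AssertionError as exc:
    print(exc)   # failed to join the thread in 0.1 seconds
```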
</div>
<div class="section" id="multiprocessing-bug-fixes">
<h3>multiprocessing bug fixes</h3>
<p>The multiprocessing module is very complex. multiprocessing tests have been failing
randomly for years, but nobody seems able to fix them. I can only hope that my
following fixes will help make these tests more reliable.</p>
<ul class="simple">
<li>multiprocessing.Queue.join_thread() now waits until the thread
completes, even if the thread was started by the same process which
created the queue.</li>
<li><a class="reference external" href="https://bugs.python.org/issue26762">bpo-26762</a>: Avoid daemon processes in _test_multiprocessing. test_level() of
_test_multiprocessing._TestLogging now uses regular processes rather than
daemon processes to prevent zombie processes (to not "leak" processes).</li>
<li><a class="reference external" href="https://bugs.python.org/issue26762">bpo-26762</a>: Fix more dangling processes and threads in test_multiprocessing.
Queue: call close() followed by join_thread(). Process: call join() or
self.addCleanup(p.join).</li>
<li><a class="reference external" href="https://bugs.python.org/issue26762">bpo-26762</a>: test_multiprocessing now detects dangling processes and threads
per test case classes.</li>
<li><a class="reference external" href="https://bugs.python.org/issue26762">bpo-26762</a>: test_multiprocessing closes more queues. Explicitly close queues to
make sure that we don't leave dangling threads. test_queue_in_process():
remove unused queue. test_access() joins also the process to fix a random
warning.</li>
<li><a class="reference external" href="https://bugs.python.org/issue26762">bpo-26762</a>: _test_multiprocessing now marks the test as ENV_CHANGED on
dangling process or thread.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31069">bpo-31069</a>, Fix a warning about dangling processes in test_rapid_restart() of
_test_multiprocessing: join the process.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>, test_multiprocessing:
Give 30 seconds to join_process(), instead of 5 or 10 seconds, to wait until
the process completes.</li>
</ul>
</div>
<div class="section" id="concurrent-futures-bug-fixes">
<h3>concurrent.futures bug fixes</h3>
<ul class="simple">
<li><a class="reference external" href="https://bugs.python.org/issue30845">bpo-30845</a>: Enhance test_concurrent_futures cleanup. Make sure that tests
don't leak threads or processes. Explicitly clear the reference to the
executor to make sure that it's destroyed.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31249">bpo-31249</a>: test_concurrent_futures checks dangling threads. Add a
BaseTestCase class to test_concurrent_futures to check for dangling threads
and processes on all tests, not only tests using ExecutorMixin.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31249">bpo-31249</a>: Fix test_concurrent_futures dangling thread.
ProcessPoolShutdownTest.test_del_shutdown() now closes the call queue and
joins its thread, to prevent leaking a dangling thread.</li>
</ul>
</div>
<div class="section" id="test-threading-and-test-thread">
<h3>test_threading and test_thread</h3>
<ul class="simple">
<li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: test_threaded_import: fix
test_side_effect_import(). Don't leak the module into sys.modules. Avoid
also dangling threads.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>:
test_thread.test_forkinthread() now waits until the thread completes.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: Try to fix the
threading_cleanup() warning in test.lock_tests: wait a little bit longer to
give time to the threads to complete. Warning seen on test_thread and
test_importlib.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: Join threads in test_threading. Call thread.join() to prevent the
"dangling thread" warning.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: Join timers in
test_threading. Call the .join() method of threading.Timer timers to prevent
the threading_cleanup() warning.</li>
</ul>
</div>
<div class="section" id="other-fixes">
<h3>Other fixes</h3>
<ul class="simple">
<li>test_urllib2_localnet: clear server variable. Set the server attribute to
None in cleanup to avoid dangling threads.</li>
<li><a class="reference external" href="https://bugs.python.org/issue30818">bpo-30818</a>: test_ftplib calls asyncore.close_all(). Always clear asyncore
socket map using asyncore.close_all(ignore_all=True) in tearDown() method.</li>
<li><a class="reference external" href="https://bugs.python.org/issue30908">bpo-30908</a>: Fix dangling thread in test_os.TestSendfile. tearDown() now explicitly
clears the self.server variable to make sure that the thread is
completely cleared when tearDownClass() checks if all threads have been
cleaned up.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31067">bpo-31067</a>: test_subprocess now also calls reap_children() in tearDown(), not
only in setUp().</li>
<li><a class="reference external" href="https://bugs.python.org/issue31160">bpo-31160</a>: Fix test_builtin for zombie process. PtyTests.run_child() now calls
os.waitpid() to read the exit status of the child process to avoid creating
zombie process and leaking processes in the background.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31160">bpo-31160</a>: Fix test_random for zombie process. TestModule.test_after_fork()
now calls os.waitpid() to read the exit status of the child process to avoid
creating a zombie process.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31160">bpo-31160</a>: test_tempfile: TestRandomNameSequence.test_process_awareness() now
calls os.waitpid() to avoid leaking a zombie process.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: fork_wait.py tests now joins threads, to not leak running threads
in the background.</li>
<li><a class="reference external" href="https://bugs.python.org/issue30830">bpo-30830</a>: test_logging uses threading_setup/cleanup. Replace
@support.reap_threads on some methods with support.threading_setup() in
setUp() and support.threading_cleanup() in tearDown() in BaseTest.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: test_httpservers joins the server thread.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31250">bpo-31250</a>, test_asyncio: fix dangling threads. Explicitly call
shutdown(wait=True) on executors to wait until all threads complete to
prevent side effects between tests. Fix test_loop_self_reading_exception():
don't mock loop.close(). Previously, the original close() method was called
rather than the mock, because of how set_event_loop() registered loop.close().</li>
<li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: Explicitly clear the server attribute in test_ftplib and
test_poplib to prevent dangling thread. Clear also self.server_thread
attribute in TestTimeouts.tearDown().</li>
<li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: Join threads in tests. Call thread.join() on threads to prevent
the "dangling threads" warning.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: Join threads in test_hashlib: use thread.join() to wait until the
parallel hash tasks complete rather than using events. Calling thread.join()
prevents "dangling thread" warnings.</li>
<li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: Join threads in test_queue. Call thread.join() to prevent the
"dangling thread" warning.</li>
</ul>
<p><strong>Next report:</strong> <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part3.html">My contributions to CPython during 2017 Q3: Part 3 (funny
bugs)</a>.</p>
</div>
</div>
My contributions to CPython during 2017 Q3: Part 12017-10-18T15:00:00+02:002017-10-18T15:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-10-18:/contrib-cpython-2017q3-part1.html<p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q3
(july, august, september), Part 1.</p>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part1.html">My contributions to CPython during 2017 Q2 (part1)</a>.</p>
<p>Next reports:</p>
<ul class="simple">
<li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part2.html">My contributions to CPython during 2017 Q3: Part 2 (dangling
threads)</a>.</li>
<li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part3.html">My contributions to CPython during 2017 Q3: Part 3 (funny bugs)</a>.</li>
</ul>
<p>Summary:</p>
<ul class="simple">
<li>Statistics</li>
<li>Security fixes …</li></ul><p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q3
(july, august, september), Part 1.</p>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part1.html">My contributions to CPython during 2017 Q2 (part1)</a>.</p>
<p>Next reports:</p>
<ul class="simple">
<li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part2.html">My contributions to CPython during 2017 Q3: Part 2 (dangling
threads)</a>.</li>
<li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part3.html">My contributions to CPython during 2017 Q3: Part 3 (funny bugs)</a>.</li>
</ul>
<p>Summary:</p>
<ul class="simple">
<li>Statistics</li>
<li>Security fixes</li>
<li>Enhancement: socket.close() now ignores ECONNRESET</li>
<li>Removal of the macOS job of Travis CI</li>
<li>New test.pythoninfo utility</li>
<li>Revert commits if buildbots are broken</li>
<li>Fix the Python test suite</li>
</ul>
<div class="section" id="statistics">
<h2>Statistics</h2>
<pre class="literal-block">
# All branches
$ git log --after=2017-06-30 --before=2017-10-01 --reverse --branches='*' --author=Stinner|grep '^commit ' -c
209
# Master branch only
$ git log --after=2017-06-30 --before=2017-10-01 --reverse --author=Stinner origin/master|grep '^commit ' -c
97
</pre>
<p>Statistics: I pushed <strong>97</strong> commits to the master branch out of a <strong>total of 209
commits</strong>; the remaining 112 commits went to other branches (backports, fixes
specific to Python 2.7, security fixes in Python 3.3 and 3.4, etc.).</p>
</div>
<div class="section" id="security-fixes">
<h2>Security fixes</h2>
<ul class="simple">
<li><a class="reference external" href="https://bugs.python.org/issue30947">bpo-30947</a>: Update libexpat from 2.2.1 to 2.2.3. Fix applied to master, 3.6,
3.5, 3.4, 3.3 and 2.7 branches! Expat 2.2.2 and 2.2.3 fixed multiple security
vulnerabilities.
<a class="reference external" href="http://python-security.readthedocs.io/vuln/expat_2.2.3.html">http://python-security.readthedocs.io/vuln/expat_2.2.3.html</a></li>
<li>Fix whichmodule() of _pickle: _PyUnicode_FromId() can return NULL, so replace
Py_INCREF() with Py_XINCREF(). Fix Coverity report: CID 1417269.</li>
<li><a class="reference external" href="https://bugs.python.org/issue30860">bpo-30860</a>: <tt class="docutils literal">_PyMem_Initialize()</tt> contains code which is never executed.
Replace the runtime check with a build assertion. Fix Coverity CID 1417587.</li>
</ul>
<p>See also my <a class="reference external" href="http://python-security.readthedocs.io/">python-security website</a>.</p>
</div>
<div class="section" id="enhancement-socket-close-now-ignores-econnreset">
<h2>Enhancement: socket.close() now ignores ECONNRESET</h2>
<p><a class="reference external" href="https://bugs.python.org/issue30319">bpo-30319</a>: socket.close() now ignores ECONNRESET. Previously, many network
tests failed randomly with ConnectionResetError on socket.close().</p>
<p>Patching all functions calling socket.close() would require a lot of work, and
it was surprising to get a "connection reset" when closing a socket.</p>
<p>Who cares that the peer closed the connection, since we are already closing
it!?</p>
<p>Note: socket.close() was modified in Python 3.6 to raise OSError on failure
(<a class="reference external" href="https://bugs.python.org/issue26685">bpo-26685</a>).</p>
</div>
<div class="section" id="removal-of-the-macos-job-of-travis-ci">
<h2>Removal of the macOS job of Travis CI</h2>
<a class="reference external image-reference" href="https://travis-ci.org/"><img alt="call_method microbenchmark" class="align-right" src="https://vstinner.github.io/images/travis-ci.png" /></a>
<p>While the Linux jobs of Travis CI usually took 15 minutes, up to 30 minutes in
the worst case, the macOS job regularly took longer than 30 minutes, sometimes
longer than 1 hour.</p>
<p>While the macOS job was optional, sometimes it went mad and prevented a PR
from being merged. Cancelling the job marked Travis CI as failed on the PR, so
it was still not possible to merge the PR, whereas, again, the job was marked as
optional ("Allowed Failure").</p>
<p>Moreover, when the macOS job failed, the failure was not reported on the PR,
since the job was marked as optional. The only way to notice a failure was to
go to Travis CI and wait at least 30 minutes (whereas the Linux jobs had already
completed and it was already possible to merge the PR...).</p>
<p>I sent a first mail in June: <a class="reference external" href="https://mail.python.org/pipermail/python-committers/2017-June/004661.html">[python-committers] macOS Travis CI job became
mandatory?</a></p>
<p>In september, we decided to remove the macOS job during the CPython sprint at
Instagram (see my previous <a class="reference external" href="https://vstinner.github.io/new-python-c-api.html">New C API</a>
article), to not slowdown our development speed (<a class="reference external" href="https://bugs.python.org/issue31355">bpo-31355</a>). I sent another
email to announce the change: <a class="reference external" href="https://mail.python.org/pipermail/python-committers/2017-September/004824.html">[python-committers] Travis CI: macOS is now
blocking -- remove macOS from Travis CI?</a>.</p>
<p>After the sprint, it was decided not to add the macOS job back, since we have
3 macOS buildbots: they are enough to detect regressions specific to macOS.</p>
<p>After the removal of the macOS job, at the end of September, Travis CI
published an article about the poor performance of their macOS fleet: <a class="reference external" href="https://blog.travis-ci.com/2017-09-22-macos-update">Updating
Our macOS Open Source Offering</a>. Sadly, the article
confirms that the situation is not going to improve quickly.</p>
</div>
<div class="section" id="new-test-pythoninfo-utility">
<h2>New test.pythoninfo utility</h2>
<p>To understand the "Segfault when readline history is more then 2 * history
size" crash of <a class="reference external" href="https://bugs.python.org/issue29854">bpo-29854</a>, I modified
<tt class="docutils literal">test_readline</tt> to log libreadline versions. I also added
<tt class="docutils literal">readline._READLINE_LIBRARY_VERSION</tt>. My colleague <strong>Nir Soffer</strong> wrote the
final readline fix: skip the test on old readline versions.</p>
<p>As a follow-up to this issue, I added a new <tt class="docutils literal">test.pythoninfo</tt> program which logs
a lot of information useful to debug Python tests (<a class="reference external" href="https://bugs.python.org/issue30871">bpo-30871</a>). pythoninfo is now run on
Travis CI, AppVeyor and buildbots.</p>
<p>Example of output:</p>
<pre class="literal-block">
$ ./python -m test.pythoninfo
(...)
_decimal.__libmpdec_version__: 2.4.2
expat.EXPAT_VERSION: expat_2.2.4
gdb_version: GNU gdb (GDB) Fedora 8.0.1-26.fc26
locale.encoding: UTF-8
os.cpu_count: 4
(...)
time.timezone: -3600
time.tzname: ('CET', 'CEST')
tkinter.TCL_VERSION: 8.6
tkinter.TK_VERSION: 8.6
tkinter.info_patchlevel: 8.6.6
zlib.ZLIB_RUNTIME_VERSION: 1.2.11
zlib.ZLIB_VERSION: 1.2.11
</pre>
<p><tt class="docutils literal">test.pythoninfo</tt> can be easily extended to log more information, without
polluting the output of the Python test suite which is already too verbose and
very long.</p>
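<p>Extending it mostly means adding one more "key: value" pair. A minimal sketch
in the same spirit (not the actual test.pythoninfo code): collect flat dotted
keys, and never let one failing probe hide the others:</p>

```python
import platform
import sys

def collect_info():
    # Gather flat "dotted.key: value" pairs, ignoring anything which fails:
    # a missing module must not prevent logging the rest of the info.
    info = {}
    for key, getter in [
        ("platform.platform", platform.platform),
        ("sys.version", lambda: sys.version.replace("\n", " ")),
        ("sys.maxsize", lambda: str(sys.maxsize)),
    ]:
        try:
            info[key] = getter()
        except Exception:
            pass
    return info

for key, value in sorted(collect_info().items()):
    print(f"{key}: {value}")
```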
</div>
<div class="section" id="revert-commits-if-buildbots-are-broken">
<h2>Revert commits if buildbots are broken</h2>
<p>Thanks to my work over the last months on the Python test suite, the buildbots are
now very reliable. When a buildbot fails, it becomes very likely that it's a
real regression, and not a random failure caused by a bug in the Python test
suite.</p>
<p>I proposed a new rule: <strong>revert a change if it breaks buildbots and the bug
cannot be fixed easily</strong>:</p>
<blockquote>
<p>So I would like to set a new rule: if I'm unable to fix buildbots
failures caused by a recent change quickly (say, in less than 2
hours), I propose to revert the change.</p>
<p>It doesn't mean that the commit is bad and must not be merged ever.
No. It would just mean that we need time to work on fixing the issue,
and it shouldn't impact other pending changes, to keep a sane master
branch.</p>
</blockquote>
<p><a class="reference external" href="https://mail.python.org/pipermail/python-committers/2017-June/004588.html">[python-committers] Revert changes which break too many buildbots</a>.</p>
<div class="section" id="test-datetime">
<h3>test_datetime</h3>
<p>The first revert was an enhancement of test_datetime, <a class="reference external" href="https://bugs.python.org/issue30822">bpo-30822</a>:</p>
<pre class="literal-block">
commit 98b6bc3bf72532b784a1c1fa76eaa6026a663e44
Author: Utkarsh Upadhyay <mail@musicallyut.in>
Date: Sun Jul 2 14:46:04 2017 +0200
bpo-30822: Fix testing of datetime module. (#2530)
Only C implementation was tested.
</pre>
<p>I wrote an email to announce the revert: <a class="reference external" href="https://mail.python.org/pipermail/python-committers/2017-July/004673.html">[python-committers] Revert changes
which break too many buildbots</a>.</p>
<p>It took 15 days to decide how to properly fix the issue (exclude <tt class="docutils literal">tzdata</tt>
from the test resources). I don't regret my revert, since having broken buildbots
for 15 days would have been very annoying.</p>
</div>
<div class="section" id="python-gdb-py-fix">
<h3>python-gdb.py fix</h3>
<p>I also reverted this commit of <a class="reference external" href="https://bugs.python.org/issue30983">bpo-30983</a>:</p>
<pre class="literal-block">
commit 2e0f4db114424a00354eab889ba8f7334a2ab8f0
Author: Bruno "Polaco" Penteado <polaco@gmail.com>
Date: Mon Aug 14 23:14:17 2017 +0100
bpo-30983: eval frame rename in pep 0523 broke gdb's python extension (#2803)
pep 0523 renames PyEval_EvalFrameEx to _PyEval_EvalFrameDefault while the gdb python extension only looks for PyEval_EvalFrameEx to understand if it is dealing with a frame.
Final effect is that attaching gdb to a python3.6 process doesnt resolve python objects. Eg. py-list and py-bt dont work properly.
This patch fixes that. Tested locally on python3.6
</pre>
<p>My comment on the issue:</p>
<blockquote>
<p>I chose to revert the change because I don't have the bandwidth right now
to investigate why the change broke test_gdb.</p>
<p>I'm surprised that a change affecting python-gdb.py wasn't properly tested
manually using test_gdb.py :-( I understand that Travis CI doesn't have gdb
and/or that the test pass in some cases?</p>
<p>The revert only gives us more time to design the proper solution.</p>
</blockquote>
<p>Fortunately, a fixed commit was pushed 4 days later, and this one didn't break
the buildbots!</p>
</div>
</div>
<div class="section" id="fix-the-python-test-suite">
<h2>Fix the Python test suite</h2>
<p>As usual, I spent a significant part of my time fixing bugs in the Python test
suite to make it more reliable and more "usable".</p>
<ul>
<li><p class="first"><a class="reference external" href="https://bugs.python.org/issue30822">bpo-30822</a>: Exclude <tt class="docutils literal">tzdata</tt> from <tt class="docutils literal">regrtest <span class="pre">--all</span></tt>. When running the test suite
using <tt class="docutils literal"><span class="pre">--use=all</span></tt> / <tt class="docutils literal"><span class="pre">-u</span> all</tt>, exclude <tt class="docutils literal">tzdata</tt> since it makes
test_datetime too slow (15-20 min on some buildbots, just this single test
file) which then times out on some buildbots. <tt class="docutils literal"><span class="pre">-u</span> tzdata</tt> must now be
enabled explicitly.</p>
</li>
<li><p class="first"><a class="reference external" href="https://bugs.python.org/issue30188">bpo-30188</a>, test_nntplib: Catch also
ssl.SSLEOFError in NetworkedNNTPTests.setUpClass(), not only EOFError.
(<em>Sadly, test_nntplib still fails randomly with EOFError or SSLEOFError...</em>)</p>
</li>
<li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31009">bpo-31009</a>: Fix
<tt class="docutils literal">support.fd_count()</tt> on Windows. Call <tt class="docutils literal">msvcrt.CrtSetReportMode()</tt> to not
kill the process nor log any error on stderr on os.dup(fd) if the file
descriptor is invalid.</p>
</li>
<li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31034">bpo-31034</a>: Reliable signal handler for test_asyncio. Don't rely on the
current SIGHUP signal handler; make sure that it is set to the default
signal handler, SIG_DFL. A colleague reported to me that the Python test suite
hangs when running test_subprocess_send_signal() of test_asyncio. After
analysing the issue, it turned out that the test hangs because the RPM package
builder ignores SIGHUP.</p>
</li>
<li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31028">bpo-31028</a>: Fix test_pydoc when run
directly. Fix <tt class="docutils literal">get_pydoc_link()</tt>: get the absolute path to <tt class="docutils literal">__file__</tt> to
prevent relative directories.</p>
</li>
<li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31066">bpo-31066</a>: Fix
<tt class="docutils literal">test_httpservers.test_last_modified()</tt>. Write the temporary file on disk
and then get its modification time.</p>
</li>
<li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31173">bpo-31173</a>: Rewrite WSTOPSIG test of test_subprocess.</p>
<p>The current <tt class="docutils literal">test_child_terminated_in_stopped_state()</tt> function creates a
child process which calls <tt class="docutils literal">ptrace(PTRACE_TRACEME, 0, 0)</tt> and then crashes
(SIGSEGV). The problem is that calling <tt class="docutils literal">os.waitpid()</tt> in the parent process is
not enough to reap the process: the child process remains alive, and so the
unit test leaks a child process in a strange state. Closing the child process
requires non-trivial, maybe platform-specific, code.</p>
<p>Remove the functional test and replace it with a unit test which mocks
<tt class="docutils literal">os.waitpid()</tt> using a new <tt class="docutils literal">_testcapi.W_STOPCODE()</tt> function to test the
<tt class="docutils literal">WIFSTOPPED()</tt> path.</p>
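<p>On Linux, a "stopped" wait status encodes the stop signal in the high byte
and the marker <tt class="docutils literal">0x7f</tt> in the low byte; this is what the
glibc <tt class="docutils literal">W_STOPCODE()</tt> macro builds. A minimal sketch of
such a mock status (Linux-specific encoding, assumed here for illustration):</p>

```python
import os
import signal

def w_stopcode(signum):
    # Build a wait status meaning "child stopped by signum".
    # Mirrors the glibc W_STOPCODE() macro; the encoding is Linux-specific.
    return (signum << 8) | 0x7f

status = w_stopcode(signal.SIGSTOP)
assert os.WIFSTOPPED(status)                    # recognized as "stopped"
assert os.WSTOPSIG(status) == signal.SIGSTOP    # the stop signal is recovered
```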
</li>
<li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31008">bpo-31008</a>: Fix asyncio
test_wait_for_handle on Windows, tolerate a difference of 50 ms.</p>
</li>
<li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31235">bpo-31235</a>: Fix ResourceWarning in
test_logging: always close all asyncore dispatchers (ignoring errors if any).</p>
</li>
<li><p class="first"><a class="reference external" href="https://bugs.python.org/issue30121">bpo-30121</a>: Add test_subprocess.test_nonexisting_with_pipes(). Test the Popen
failure when Popen was created with pipes. Also create a NONEXISTING_CMD
variable in test_subprocess.py.</p>
</li>
<li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31250">bpo-31250</a>, test_asyncio: fix EventLoopTestsMixin.tearDown(). Call
doCleanups() to close the loop after calling executor.shutdown(wait=True).</p>
</li>
<li><p class="first">test_ssl: Implement timeout in ssl_io_loop(). The timeout parameter was not
used.</p>
</li>
<li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31448">bpo-31448</a>, test_poplib: Call POP3.close(); don't close the sock
attribute directly, to fix a ResourceWarning.</p>
</li>
<li><p class="first">os.test_utime_current(): tolerate 50 ms delta.</p>
</li>
<li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31135">bpo-31135</a>: ttk: fix the LabeledScale and OptionMenu destroy() methods. Call the
parent destroy() method even if the accessed attribute doesn't exist. The
LabeledScale.destroy() method now also explicitly clears the label and scale
attributes to help the garbage collector destroy all widgets.</p>
</li>
<li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31479">bpo-31479</a>: Always reset the signal alarm in tests. Use
the <tt class="docutils literal">try: ... finally: signal.alarm(0)</tt> pattern to make sure that tests
don't "leak" a pending fatal signal alarm. Move some signal.alarm() calls
into the try block.</p>
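<p>The pattern looks like this (a minimal sketch; in the real tests, the body of
the <tt class="docutils literal">try</tt> block is the test code which must finish
before the alarm fires):</p>

```python
import signal

def alarm_handler(signum, frame):
    # If the alarm ever fires, fail loudly instead of killing the process.
    raise InterruptedError("test took too long, SIGALRM fired")

signal.signal(signal.SIGALRM, alarm_handler)
signal.alarm(10)  # arm a 10-second watchdog alarm
try:
    pass  # ... run the test code here ...
finally:
    signal.alarm(0)  # always cancel the pending alarm, even on failure
```

After the <tt class="docutils literal">finally</tt> block, no alarm is pending anymore, so a later test cannot be killed by a leftover SIGALRM.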
</li>
</ul>
<p><strong>Next report:</strong> <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part2.html">My contributions to CPython during 2017 Q3: Part 2 (dangling
threads)</a>.</p>
</div>
Python Security2017-09-15T22:00:00+02:002017-09-15T22:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-09-15:/python-security.html<p>I have been working on Python security for years, but I have never written
anything about it. Let's fix this!</p>
<div class="section" id="psrt">
<h2>PSRT</h2>
<p>I am part of the Python Security Response Team (PSRT): I get emails sent to
<a class="reference external" href="mailto:security@python.org">security@python.org</a>. I try to analyze each report to validate that the bug
is …</p></div><p>I have been working on Python security for years, but I have never written
anything about it. Let's fix this!</p>
<div class="section" id="psrt">
<h2>PSRT</h2>
<p>I am part of the Python Security Response Team (PSRT): I get emails sent to
<a class="reference external" href="mailto:security@python.org">security@python.org</a>. I try to analyze each report to validate that the bug
is reproducible, find the impacted Python versions and start discussing how to
fix the vulnerability. In some cases, the reported issue is not a security
vulnerability, is not related to CPython, or is already fixed. We get reports
not only about CPython, but also about the Python web sites and other projects
related to Python.</p>
<p>Warning: I don't represent the PSRT, I only speak for myself!</p>
</div>
<div class="section" id="vulnerabilities-sent-to-psrt">
<h2>Vulnerabilities sent to PSRT</h2>
<p>In this article, I will focus on vulnerabilities impacting CPython: the C and
Python code of CPython core and the standard library.</p>
<p>When vulnerabilities are obvious bugs, they are quickly fixed. Done.</p>
<p>But it's not uncommon that fixing a vulnerability impacts backward
compatibility, which is a major concern of CPython core developers. There is
also a risk of rejecting legitimate input data because the added checks are too
strict. We have to be very careful, and so fixing vulnerabilities can take
weeks, if not months in the worst case.</p>
<p>While CPython has few active core developers, the PSRT has even fewer active
members to handle incoming reports. We are volunteers, so please be kind and
patient...</p>
</div>
<div class="section" id="example-of-a-complex-fix">
<h2>Example of a complex fix</h2>
<p>The <a class="reference external" href="https://python-security.readthedocs.io/vuln/urllib_ftp_protocol_stream_injection.html">urllib FTP protocol stream injection</a>
vulnerability was reported to the PSRT at 2016-01-15. The fix was only merged
at 2017-07-26.</p>
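<p>That is more than 18 months between the report and the merged fix:</p>

```python
from datetime import date

reported = date(2016, 1, 15)  # vulnerability reported to the PSRT
merged = date(2017, 7, 26)    # fix merged

print((merged - reported).days)  # 558 days
```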
<p>First, it was not obvious how the vulnerability could be exploited, nor
whether it should be fixed at all.</p>
<p>Then it was not obvious whether the vulnerability should be fixed in the
urllib module or in the ftplib module.</p>
<p>Even though the bug was public, it didn't get much attention. Since I don't
know the urllib module well, I wrote an email to the python-dev mailing
list: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-July/148699.html">Need help to fix urllib(.parse) vulnerabilities</a>.</p>
<p>I proposed a fix for the urllib module: <a class="reference external" href="https://bugs.python.org/issue30713">Reject newline character (U+000A) in
URLs in urllib.parse</a>. But it was
rejected: it was the wrong approach and my checks were too strict in many
cases (they rejected legitimate requests).</p>
<p>The final fix rejects the <tt class="docutils literal">\n</tt> and <tt class="docutils literal">\r</tt> newline characters in the putline()
method of the ftplib module.</p>
</div>
<div class="section" id="track-known-and-fixed-cpython-vulnerabilities">
<h2>Track known and fixed CPython vulnerabilities</h2>
<p>Currently, no fewer than six branches still get security fixes!</p>
<ul class="simple">
<li>Python 2.7</li>
<li>Python 3.3</li>
<li>Python 3.4</li>
<li>Python 3.5</li>
<li>Python 3.6</li>
<li>master: the development branch</li>
</ul>
<p>Last year, I added a table to the Python developer guide to help me to track
the status of each branch: see the <a class="reference external" href="https://devguide.python.org/#status-of-python-branches">Status of Python branches</a>.</p>
<p>This year, I created a tool to help me to track known CPython vulnerabilities:
<a class="reference external" href="https://github.com/vstinner/python-security">python-security project</a> (hosted
at GitHub). The <a class="reference external" href="https://github.com/vstinner/python-security/blob/master/vulnerabilities.yaml">vulnerabilities.yaml file</a>
is a YAML file with one section per vulnerability. Each vulnerability has
a title, link to the Python bug, disclosure date, reported date, commits, etc.</p>
<p>The tool gets the dates of commits and the Git tags which contain each
commit, to infer the first Python version of each branch which contains the
fix. It also builds a timeline to help understand how the vulnerability was
handled.</p>
<p>I also wanted to be more transparent on how we handle vulnerabilities and our
velocity to fix them.</p>
<p>Honestly, I was disappointed that it took so long to fix some vulnerabilities
in the past. Fortunately, it seems like we are more reactive nowadays!</p>
</div>
<div class="section" id="example-of-a-fixed-vulnerability">
<h2>Example of a fixed vulnerability</h2>
<p>Example: <a class="reference external" href="https://python-security.readthedocs.io/vuln/cve-2016-5699_http_header_injection.html">CVE-2016-5699: HTTP header injection</a>.</p>
<p>Right now, Python 3.3 is still vulnerable (my fix was committed; I am now
waiting for Python 3.3.7, which is coming at the end of September).</p>
<p>Since the vulnerability was reported, it took 108 days to merge the fix, and
72 more days (180 days in total) for the first release including the fix
(Python 2.7.10).</p>
<p>Sadly, the PSRT doesn't assign a severity to vulnerabilities yet.</p>
<p>Fortunately, for this vulnerability, web frameworks were able to work around
it with input sanitization.</p>
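<p>Such sanitization boils down to refusing CR and LF characters in header
values before they reach the socket. A minimal, hypothetical sketch (the
helper name is invented for illustration):</p>

```python
def sanitize_header_value(value):
    """Reject HTTP header values that would allow header injection.

    Illustrative sketch: refuse any CR or LF character so that a value
    cannot smuggle extra header lines into the request.
    """
    if "\r" in value or "\n" in value:
        raise ValueError("invalid character in HTTP header value")
    return value

sanitize_header_value("text/html")  # accepted
try:
    sanitize_header_value("x\r\nSet-Cookie: injected=1")
except ValueError:
    pass  # injection attempt rejected
```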
</div>
<div class="section" id="backport-all-fixes">
<h2>Backport all fixes</h2>
<p>In recent months, I backported fixes to the six branches which still accept
security fixes, to respect the contract with our users: we are doing our best
to protect you!</p>
<p>The good news is that with Python 2.7.14 and Python 3.3.7 releases scheduled
this month, all major security vulnerabilities will be fixed in all maintained
Python branches!</p>
<p>Some fixes were not backported on purpose. For example, the <a class="reference external" href="https://python-security.readthedocs.io/vuln/cve-2013-7040_hash_not_properly_randomized.html#cve-2013-7040-hash-not-properly-randomized">CVE-2013-7040:
Hash not properly randomized</a>
vulnerability requires changing the hash algorithm, and we decided not to touch
Python 2.7 and 3.3 for backward compatibility reasons (don't break code relying
on the exact hash function). The issue was fixed in Python 3.4 by using the
SipHash hash algorithm, which uses a hash secret (generated randomly by Python
at startup).</p>
</div>
<div class="section" id="python-security-documentation">
<h2>Python security documentation</h2>
<p>In recent months, I also started to collect random notes about Python security.</p>
<p>Explore my <a class="reference external" href="https://python-security.readthedocs.io/">python-security.readthedocs.io</a> documentation and send me feedback!</p>
</div>
A New C API for CPython2017-09-07T18:00:00+02:002017-09-07T18:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-09-07:/new-python-c-api.html<p>I am currently at a CPython sprint 2017 at Facebook. We are discussing my idea
of writing a new C API for CPython hiding implementation details and replacing
macros with function calls.</p>
<img alt="CPython sprint at Facebook, september 2017" src="https://vstinner.github.io/images/cpython_sprint_sept2017.jpg" />
<p>This article tries to explain why the CPython C API needs to <strong>evolve</strong>.</p>
<div class="section" id="c-api-prevents-further-optimizations">
<h2>C API prevents further optimizations …</h2></div><p>I am currently at a CPython sprint 2017 at Facebook. We are discussing my idea
of writing a new C API for CPython hiding implementation details and replacing
macros with function calls.</p>
<img alt="CPython sprint at Facebook, september 2017" src="https://vstinner.github.io/images/cpython_sprint_sept2017.jpg" />
<p>This article tries to explain why the CPython C API needs to <strong>evolve</strong>.</p>
<div class="section" id="c-api-prevents-further-optimizations">
<h2>C API prevents further optimizations</h2>
<p>The CPython <tt class="docutils literal">PyListObject</tt> type uses an array of <tt class="docutils literal">PyObject*</tt> objects. PyPy
is able to use a C array of integers if the list only contains small integers.
CPython cannot because PyList_GET_ITEM(list, index) is implemented as a macro:</p>
<pre class="literal-block">
#define PyList_GET_ITEM(op, i) (((PyListObject *)(op))->ob_item[i])
</pre>
<p>The macro relies on the <tt class="docutils literal">PyListObject</tt> structure:</p>
<pre class="literal-block">
typedef struct {
    PyVarObject ob_base;
    PyObject **ob_item;     // <-- pointer to real data
    Py_ssize_t allocated;
} PyListObject;

typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size;     /* Number of items in variable part */
} PyVarObject;

typedef struct _object {
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;
</pre>
</div>
<div class="section" id="api-and-abi">
<h2>API and ABI</h2>
<p>Compiling C extension code using <tt class="docutils literal">PyList_GET_ITEM()</tt> produces machine code
accessing <tt class="docutils literal">PyListObject</tt> members. Something like (C pseudo code):</p>
<pre class="literal-block">
PyObject **items;
PyObject *item;
items = (PyObject **)(((char*)list) + 24);
item = items[i];
</pre>
<p>The offset 24 is hardcoded in the C extension object file: the <strong>API</strong>
(<strong>programming</strong> interface) becomes the <strong>ABI</strong> (<strong>binary</strong> interface).</p>
<p>But debug builds use a different memory layout:</p>
<pre class="literal-block">
typedef struct _object {
    struct _object *_ob_next;   // <--- two new fields are added
    struct _object *_ob_prev;   // <--- for debug purpose
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;
</pre>
<p>The machine code becomes something like:</p>
<pre class="literal-block">
items = (PyObject **)(((char*)op) + 40);
item = items[i];
</pre>
<p>The offset changes from 24 to 40 (+16, two pointers of 8 bytes).</p>
<p>C extensions have to be recompiled to work on Python compiled in debug mode.</p>
<p>Another example is Python 2.7 which uses a different ABI for UTF-16 and UCS-4
Unicode string: the <tt class="docutils literal"><span class="pre">--with-wide-unicode</span></tt> configure option.</p>
</div>
<div class="section" id="stable-abi">
<h2>Stable ABI</h2>
<p>If the machine code doesn't hardcode the offset, C extensions only need to be
compiled once.</p>
<p>A solution is to replace PyList_GET_ITEM() <strong>macro</strong> with a <strong>function</strong>:</p>
<pre class="literal-block">
PyObject* PyList_GET_ITEM(PyObject *list, Py_ssize_t index);
</pre>
<p>defined as:</p>
<pre class="literal-block">
PyObject* PyList_GET_ITEM(PyObject *list, Py_ssize_t index)
{
    return ((PyListObject *)list)->ob_item[index];
}
</pre>
<p>The machine code becomes a <strong>function call</strong>:</p>
<pre class="literal-block">
PyObject *item;
item = PyList_GET_ITEM(list, index);
</pre>
</div>
<div class="section" id="specialized-list-for-small-integers">
<h2>Specialized list for small integers</h2>
<p>If C extensions don't access structure members anymore, it becomes
possible to modify the memory layout.</p>
<p>For example, it's possible to design a specialized implementation of
<tt class="docutils literal">PyListObject</tt> for small integers:</p>
<pre class="literal-block">
typedef struct {
    PyVarObject ob_base;
    int use_small_int;
    PyObject **pyobject_array;
    int32_t *small_int_array;   // <-- new compact C array for integers
    Py_ssize_t allocated;
} PyListObject;

PyObject* PyList_GET_ITEM(PyObject *op, Py_ssize_t index)
{
    PyListObject *list = (PyListObject *)op;
    if (list->use_small_int) {
        int32_t item = list->small_int_array[index];
        /* create a new object at each call */
        return PyLong_FromLong(item);
    }
    else {
        return list->pyobject_array[index];
    }
}
</pre>
<p>It's just an example to show that it becomes possible to modify PyObject
structures. I'm not sure that it's useful in practice.</p>
</div>
<div class="section" id="multiple-python-runtimes">
<h2>Multiple Python "runtimes"</h2>
<p>Assuming that all used C extensions use the new stable ABI, we can now imagine
multiple specialized Python runtimes installed in parallel, instead of a single
runtime:</p>
<ul class="simple">
<li>python3.7: regular/legacy CPython, backward compatible</li>
<li>python3.7-dbg: runtime checks to ease debug</li>
<li>fasterpython3.7: use specialized list</li>
<li>etc.</li>
</ul>
<p>The <tt class="docutils literal">python3</tt> runtime would remain <strong>fully</strong> compatible since it would use
the old C API with macros and full structures. So by default, everything will
continue to work.</p>
<p>But the other runtimes require that all imported C extensions were compiled
with the new C API.</p>
<p><tt class="docutils literal"><span class="pre">python3.7-dbg</span></tt> adds more checks tested at runtime. Example:</p>
<pre class="literal-block">
PyObject* PyList_GET_ITEM(PyObject *list, Py_ssize_t index)
{
    assert(PyList_Check(list));
    assert(0 <= index && index < Py_SIZE(list));
    return ((PyListObject *)list)->ob_item[index];
}
</pre>
<p>Currently, some Linux distributions provide a <tt class="docutils literal"><span class="pre">python3-dbg</span></tt> binary, but may
not provide <tt class="docutils literal"><span class="pre">-dbg</span></tt> binary packages of all C extensions. So all C extensions
have to be recompiled manually, which is quite painful (need to install build
dependencies, wait until everything is recompiled, etc.).</p>
</div>
<div class="section" id="experiment-optimizations">
<h2>Experiment optimizations</h2>
<p>With the new C API, it becomes possible to implement a new class of
optimizations.</p>
<div class="section" id="tagged-pointer">
<h3>Tagged pointer</h3>
<p>Store small integers directly into the pointer value. Reduce the memory usage,
avoid expensive unboxing-boxing.</p>
<p>See <a class="reference external" href="https://en.wikipedia.org/wiki/Tagged_pointer">Wikipedia: Tagged pointer</a>.</p>
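<p>The idea can be illustrated in Python (a toy sketch of the encoding, not how
CPython works today): use the low bit of a machine word as a tag, and store a
small integer shifted into the remaining bits. Real object pointers are aligned,
so their low bit is always 0 and the two cases cannot be confused:</p>

```python
def tag_small_int(n):
    # Low bit set to 1 means "this word is an inline integer, not a pointer".
    return (n << 1) | 1

def untag(word):
    if word & 1:         # tagged: decode the integer in place, no heap object
        return word >> 1
    raise TypeError("word is a real pointer, dereference it instead")

assert untag(tag_small_int(42)) == 42
assert untag(tag_small_int(-7)) == -7   # arithmetic shift keeps the sign
```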
</div>
<div class="section" id="no-garbage-collector-gc-at-all">
<h3>No garbage collector (GC) at all</h3>
<p>Python runtime without GC at all. Remove the following header from objects
tracked by the GC:</p>
<pre class="literal-block">
struct {
    union _gc_head *gc_next;
    union _gc_head *gc_prev;
    Py_ssize_t gc_refs;
} PyGC_Head;
</pre>
<p>It would remove 24 bytes per object tracked by the GC.</p>
<p>For comparison, the smallest Python object is "object()" which only takes 16
bytes.</p>
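<p>This size can be checked with <tt class="docutils literal">sys.getsizeof()</tt>
(the value below assumes a typical 64-bit CPython build):</p>

```python
import sys

# The smallest Python object: just a reference count and a type pointer,
# two 8-byte fields on a 64-bit build.
print(sys.getsizeof(object()))  # 16
```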
</div>
<div class="section" id="tracing-garbage-collector-without-reference-counting">
<h3>Tracing garbage collector without reference counting</h3>
<p>This idea is really the most complex and most experimental one, but IMHO it's
required to "unlock" Python performance.</p>
<ul class="simple">
<li>Write a new API to keep track of pointers:<ul>
<li>Declare a variable storing a <tt class="docutils literal">PyObject*</tt> object</li>
<li>Set a pointer</li>
<li>Maybe also read a pointer?</li>
</ul>
</li>
<li>Modify C extensions to use this new API</li>
<li>Implement a tracing garbage collector which can move objects in memory
to compact memory</li>
<li>Remove reference counting</li>
</ul>
<p>It even seems possible to implement a tracing garbage collector <strong>and</strong> use
reference counting. But I'm not an expert in this area, I need to dig into the topic.</p>
<p>Questions:</p>
<ul class="simple">
<li>Is it possible to fix all C extensions to use the new API? Should be an
opt-in option in a first stage.</li>
<li>Is it possible to emulate the Py_INCREF/DECREF API, for backward compatibility,
using a hash table which maintains a reference counter outside <tt class="docutils literal">PyObject</tt>?</li>
<li>Do we need to fix all C extensions?</li>
</ul>
<p>Read also <a class="reference external" href="https://en.wikipedia.org/wiki/Tracing_garbage_collection">Wikipedia: Tracing garbage collection</a>.</p>
</div>
<div class="section" id="gilectomy">
<h3>Gilectomy</h3>
<p>Abstracting the ABI allows customizing the runtime for Gilectomy's needs, to
be able to remove the GIL.</p>
<p>Removing reference counting would make Gilectomy much simpler.</p>
</div>
</div>
My contributions to CPython during 2017 Q2 (part 3)2017-07-13T17:00:00+02:002017-07-13T17:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-07-13:/contrib-cpython-2017q2-part3.html<p>This is the third part of my contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q2 (april, may, june):</p>
<ul class="simple">
<li>Security</li>
<li>Tricky bug: Clang 4.0, dtoa and strict aliasing</li>
<li>sigwaitinfo() race condition in test_eintr</li>
<li>FreeBSD test_subprocess core dump</li>
</ul>
<p>Previous reports:</p>
<ul class="simple">
<li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part1.html">My contributions to CPython during 2017 Q2 (part 1)</a>.</li>
<li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part2.html">My contributions to CPython …</a></li></ul><p>This is the third part of my contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q2 (april, may, june):</p>
<ul class="simple">
<li>Security</li>
<li>Tricky bug: Clang 4.0, dtoa and strict aliasing</li>
<li>sigwaitinfo() race condition in test_eintr</li>
<li>FreeBSD test_subprocess core dump</li>
</ul>
<p>Previous reports:</p>
<ul class="simple">
<li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part1.html">My contributions to CPython during 2017 Q2 (part 1)</a>.</li>
<li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part2.html">My contributions to CPython during 2017 Q2 (part 2)</a>.</li>
</ul>
<p>Next report:</p>
<ul class="simple">
<li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part1.html">My contributions to CPython during 2017 Q3: Part 1</a>.</li>
</ul>
<div class="section" id="security">
<h2>Security</h2>
<div class="section" id="backport-fixes">
<h3>Backport fixes</h3>
<p>I am trying to backport all known security fixes to the 6 maintained Python
branches: 2.7, 3.3, 3.4, 3.5, 3.6 and master.</p>
<p>I created the <a class="reference external" href="http://python-security.readthedocs.io/">python-security.readthedocs.io</a> website to track these
vulnerabilities, especially which Python versions are fixed, to identify
missing backports.</p>
<p>Python 2.7, 3.5, 3.6 and master are in quite good shape; I am still working on
backporting fixes into 3.4 and 3.3. Larry Hastings merged my 3.4 backports and
other security fixes, and scheduled a new 3.4.7 release in the coming weeks.
Later, I will try to fix Python 3.3 as well, before its end-of-life, scheduled
for the end of September.</p>
<p>See the <a class="reference external" href="https://docs.python.org/devguide/#status-of-python-branches">Status of Python branches</a> in the
devguide.</p>
</div>
<div class="section" id="libexpat-2-2">
<h3>libexpat 2.2</h3>
<p>Python embeds a copy of libexpat to ease Python compilation on Windows and
macOS. It means that we have to remember to upgrade it at each libexpat
release. This is especially important when security vulnerabilities are fixed
in libexpat.</p>
<p>libexpat 2.2 was released on 2016-06-21 and contains such fixes for
vulnerabilities, see: <a class="reference external" href="http://python-security.readthedocs.io/vuln/cve-2016-0718_expat_2.2_bug_537.html">CVE-2016-0718: expat 2.2, bug #537</a>.</p>
<p>Sadly, it took us a few months to upgrade libexpat. I wrote a short shell
script to easily upgrade libexpat: recreate the <tt class="docutils literal">Modules/expat/</tt> directory
from a libexpat tarball.</p>
<p>My commit:</p>
<blockquote>
<p>bpo-29591: Upgrade Modules/expat to libexpat 2.2 (#2164)</p>
<p>Remove the configuration (<tt class="docutils literal"><span class="pre">Modules/expat/*config.h</span></tt>) of unsupported
platforms: Amiga, MacOS Classic on PPC32, Open Watcom.</p>
<p>Remove XML_HAS_SET_HASH_SALT define: it became useless since our local
expat copy was upgrade to expat 2.1 (it's now expat 2.2.0).</p>
</blockquote>
<p>I upgraded libexpat to 2.2 in the Python 2.7, 3.4, 3.5, 3.6 and master
branches. I still have a pending pull request for 3.3.</p>
</div>
<div class="section" id="libexpat-2-2-1">
<h3>libexpat 2.2.1</h3>
<p>Just after I finally upgraded our libexpat copy to 2.2.0... libexpat 2.2.1 was
released with new security fixes! See <a class="reference external" href="http://python-security.readthedocs.io/vuln/cve-2017-9233_expat_2.2.1.html">CVE-2017-9233: Expat 2.2.1</a></p>
<p>Again, I upgraded libexpat to 2.2.1 in all branches (pending: 3.3), see
bpo-30694. My commit:</p>
<blockquote>
<p>Upgrade expat copy from 2.2.0 to 2.2.1 to get fixes
of multiple security vulnerabilities including:</p>
<ul class="simple">
<li>CVE-2017-9233 (External entity infinite loop DoS),</li>
<li>CVE-2016-9063 (Integer overflow, re-fix),</li>
<li>CVE-2016-0718 (Fix regression bugs from 2.2.0's fix to CVE-2016-0718)</li>
<li>CVE-2012-0876 (Counter hash flooding with SipHash).</li>
</ul>
<p>Note: the CVE-2016-5300 (Use os-specific entropy sources like getrandom)
doesn't impact Python, since Python already gets entropy from the OS to set
the expat secret using <tt class="docutils literal">XML_SetHashSalt()</tt>.</p>
</blockquote>
</div>
<div class="section" id="urllib-splithost-vulnerability">
<h3>urllib splithost() vulnerability</h3>
<p>Vulnerability: <a class="reference external" href="http://python-security.readthedocs.io/vuln/bpo-30500_urllib_connects_to_a_wrong_host.html">bpo-30500: urllib connects to a wrong host</a>.</p>
<p>While it was quick to confirm the vulnerability, it was tricky to decide how to
properly <strong>fix it without breaking backward compatibility</strong>. We had too few
unit tests, and no obvious definition of the <em>expected</em> behaviour. I
contributed to the discussion and to polishing the fix:</p>
<p>bpo-30500 commit:</p>
<blockquote>
Fix urllib.parse.splithost() to correctly parse fragments. For example,
<tt class="docutils literal"><span class="pre">splithost('//127.0.0.1#@evil.com/')</span></tt> now correctly returns the
<tt class="docutils literal">127.0.0.1</tt> host, instead of treating <tt class="docutils literal">@evil.com</tt> as the host in an
authentification (<tt class="docutils literal">login@host</tt>).</blockquote>
<p>Fix applied to master, 3.6, 3.5, 3.4 and 2.7; pending pull request for 3.3.</p>
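<p>The fixed behaviour can also be observed with the modern
<tt class="docutils literal">urllib.parse</tt> API: everything after
<tt class="docutils literal">#</tt> belongs to the fragment and must not leak into the
host:</p>

```python
from urllib.parse import urlsplit

parts = urlsplit("http://127.0.0.1#@evil.com/")
print(parts.hostname)  # 127.0.0.1: the real host
print(parts.fragment)  # @evil.com/: never treated as login@host credentials
```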
</div>
<div class="section" id="travis-ci">
<h3>Travis CI</h3>
<p>I also wrote a pull request to enable Travis CI and AppVeyor on the Python 3.3
and 3.4 branches, so that security fixes are tested by the CI. These changes
are complex and not merged yet, but I am now confident that the CI will be
enabled on 3.4!</p>
<p>My PR for Python 3.4: <a class="reference external" href="https://github.com/python/cpython/pull/2475">[3.4] Backport CI config from master</a>.</p>
</div>
</div>
<div class="section" id="tricky-bug-clang-4-0-dtoa-and-strict-aliasing">
<h2>Tricky bug: Clang 4.0, dtoa and strict aliasing</h2>
<p>Aha, another funny story about compilers: bpo-30104.</p>
<p>I noticed that the following tests started to fail on the "AMD64 FreeBSD
CURRENT Debug 3.x" buildbot:</p>
<ul class="simple">
<li>test_cmath</li>
<li>test_float</li>
<li>test_json</li>
<li>test_marshal</li>
<li>test_math</li>
<li>test_statistics</li>
<li>test_strtod</li>
</ul>
<p>First, I bet on a libc change on FreeBSD. Then, I found that test_strtod fails
on FreeBSD using clang 4.0, but passes on FreeBSD using clang 3.8.</p>
<p>I started to bisect the code on Linux using a subset of <tt class="docutils literal">Python/dtoa.c</tt>:</p>
<ul class="simple">
<li>Start (integrated in CPython code base): 2,876 lines</li>
<li>dtoa2.c (standalone): 2,865 lines</li>
<li>dtoa5.c: 50 lines</li>
</ul>
<p>Extract of dtoa5.c:</p>
<pre class="literal-block">
typedef union { double d; uint32_t L[2]; } U;

struct Bigint { int wds; };

static double
ratio(struct Bigint *a)
{
    U da, db;
    int k, ka, kb;
    double r;

    da.d = 1.682;
    ka = 6;
    db.d = 1.0;
    kb = 5;
    k = ka - kb + 32 * (a->wds - 12);
    printf("k=%i\n", k);
    if (k > 0)
        da.L[1] += k * 0x100000;
    else {
        k = -k;
        db.L[1] += k * 0x100000;
    }
    r = da.d / db.d;
    /* r == 3.364 */
    return r;
}
</pre>
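<p>What the union trick does at the hardware level is scale a double by a power
of two by adding directly to its exponent field: <tt class="docutils literal">L[1]</tt>
is the high 32-bit word of the double, and <tt class="docutils literal">0x100000</tt>
(bit 20 of that word, bit 52 overall) is one unit of the 11-bit biased
exponent. The same bit manipulation can be written in Python with
<tt class="docutils literal">struct</tt> (an illustration of the intended semantics,
assuming IEEE 754 doubles):</p>

```python
import struct

def scale_by_pow2(d, k):
    # Reinterpret the double as a 64-bit integer (what the U union does in C),
    # add k to the 11-bit biased exponent field (bits 52-62), convert back.
    bits = struct.unpack("<Q", struct.pack("<d", d))[0]
    bits += k << 52
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

print(scale_by_pow2(1.682, 1))  # 3.364, exactly 1.682 * 2, like in dtoa5.c
```

Clang's optimizer breaks the C version because reading <tt class="docutils literal">da.d</tt> after writing <tt class="docutils literal">da.L[1]</tt> through the union violates its interpretation of the strict aliasing rules; the Python version has no such issue since the reinterpretation is explicit.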
<p>Even with a very short C code (50 lines) reproducing the bug, I was still
unable to understand it. I read many articles about aliasing, and I still
don't fully understand the bug... I suggest these two good articles:</p>
<ul class="simple">
<li><a class="reference external" href="http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html">Understanding Strict Aliasing</a>
(Mike Acton, June 1, 2006)</li>
<li><a class="reference external" href="http://cellperformance.beyond3d.com/articles/2006/05/demystifying-the-restrict-keyword.html">Demystifying The Restrict Keyword</a>
(Mike Acton, May 29, 2006)</li>
</ul>
<p>Anyway, I wanted to report the bug to clang (LLVM), but the LLVM bug tracker was
migrating and I was unable to subscribe to get an account!</p>
<p>In the meanwhile, <strong>Dimitry Andric</strong>, a FreeBSD developer, told me that he got
<em>exactly</em> the same clang 4.0 issue with "dtoa.c" in the <em>julia</em> programming
language. Two months before I saw the same bug, he already reported the bug to
FreeBSD: <a class="reference external" href="https://bugs.freebsd.org/216770">lang/julia: fails to build with clang 4.0</a>, and to clang: <a class="reference external" href="https://bugs.llvm.org//show_bug.cgi?id=31928">After r280351: if/else
blocks incorrectly optimized away?</a>.</p>
<p>The "problem" is that clang
developers disagree that it's a bug. In short, the discussion was about the C
standard: does clang respect the C aliasing rules or not? In the end, clang
developers consider that they are right to optimize. To summarize:</p>
<blockquote>
It's a bug in the code, not in the compiler</blockquote>
<p>So I made a first change to use the <tt class="docutils literal"><span class="pre">-fno-strict-aliasing</span></tt> flag when Python
is compiled with clang:</p>
<blockquote>
Python/dtoa.c is not compiled correctly with clang 4.0 and
optimization level -O2 or higher, because of an aliasing issue on
the double/ULong[2] union.</blockquote>
<p>But this change can make Python slower when compiled with clang, so I was asked
to only compile <tt class="docutils literal">Python/dtoa.c</tt> with this flag:</p>
<blockquote>
On clang, only compile dtoa.c with -fno-strict-aliasing, use strict
aliasing to compile all other C files.</blockquote>
</div>
<div class="section" id="sigwaitinfo-race-condition-in-test-eintr">
<h2>sigwaitinfo() race condition in test_eintr</h2>
<div class="section" id="the-tricky-test-eintr">
<h3>The tricky test_eintr</h3>
<p>When I wrote and implemented the <a class="reference external" href="https://www.python.org/dev/peps/pep-0475/">PEP 475, Retry system calls failing with
EINTR</a>, I didn't expect so many
annoying bugs in the newly written <tt class="docutils literal">test_eintr</tt> unit test. This test
invokes system calls while signals are sent every 100 ms. Usually a test tries
to block on a system call for at least 200 ms, to make sure that the syscall
is interrupted at least once by a signal, and then checks that Python correctly
retries the interrupted system call.</p>
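<p>The idea can be sketched as follows (a minimal Unix-only sketch, not the
actual test code): block in a syscall while an interval timer fires signals,
and check that the call is transparently retried.</p>

```python
import os
import signal
import time

# Unix-only sketch of the test_eintr idea (not the actual test code):
# block in a syscall while an interval timer delivers a signal every
# 100 ms, and check that Python (PEP 475) retries the interrupted call.
def handler(signum, frame):
    pass  # must not raise, so the interrupted syscall is retried

signal.signal(signal.SIGALRM, handler)
signal.setitimer(signal.ITIMER_REAL, 0.1, 0.1)   # SIGALRM every 100 ms

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    time.sleep(0.3)            # keep the parent blocked longer than 200 ms
    os.write(w, b"done")
    os._exit(0)

data = os.read(r, 4)           # interrupted by SIGALRM, retried by Python
signal.setitimer(signal.ITIMER_REAL, 0, 0)       # disarm the timer
os.waitpid(pid, 0)
assert data == b"done"
```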
<p>Since the PEP was implemented, I already fixed many race conditions in
<tt class="docutils literal">test_eintr</tt>, but there was still a race condition on the <tt class="docutils literal">sigwaitinfo()</tt>
unit test. <em>Sometimes</em> on a <em>few specific buildbots</em> (FreeBSD), the test fails
randomly.</p>
</div>
<div class="section" id="first-attempt">
<h3>First attempt</h3>
<p>My first attempt was <a class="reference external" href="http://bugs.python.org/issue25277">bpo-25277</a>,
opened on 2015-09-30. I added faulthandler to dump tracebacks if a test hangs
longer than 10 minutes. Then I changed the sleep from 200 ms to 2 seconds in
the <tt class="docutils literal">sigwaitinfo()</tt> test... just to make the bug less likely, but using a
longer sleep doesn't fix the root issue.</p>
</div>
<div class="section" id="second-attempt">
<h3>Second attempt</h3>
<p>My second attempt was <a class="reference external" href="http://bugs.python.org/issue25868">bpo-25868</a>,
opened on 2015-12-15. I added a pipe to "synchronize the parent and the child
processes", to try to make the sigwaitinfo() test a little bit more reliable. I
also reduced the sleep from 2 seconds to 100 ms.</p>
<p>7 minutes after my fix, <strong>Martin Panter</strong> wrote:</p>
<blockquote>
<p>With the pipe, there is still a potential race after the parent writes to
the pipe and before sigwaitinfo() is invoked, versus the child sleep()
call.</p>
<p>What do you think of my suggestion to block the signal? Then (in theory) it
should be robust, rather than relying on timing.</p>
</blockquote>
<p>I replied that I wasn't sure that the sigwaitinfo() EINTR error would still
be tested if we made his proposed change.</p>
<p>One month later, Martin wrote a patch but I was unable to make a decision on
his change. In September 2016, Martin noticed a new test failure on the FreeBSD
9 buildbot.</p>
</div>
<div class="section" id="third-attempt">
<h3>Third attempt</h3>
<p>My third attempt was bpo-30320, opened on 2017-05-09. This time, I really
wanted to fix <em>all</em> random buildbot failures. Since I was now able to reproduce
the bug on my FreeBSD VM, I was able to write a fix and also to check that:</p>
<ul class="simple">
<li>sigwaitinfo() and sigtimedwait() fail with EINTR and Python automatically
restarts the interrupted syscall</li>
<li>I hacked the test file to only run the sigwaitinfo() and sigtimedwait() unit
tests. Running the test in a loop no longer fails: I ran the test for 5
minutes in 10 shells (10 instances in parallel) with no failure, so the
race condition seems to be gone.</li>
</ul>
<p>So I <a class="reference external" href="https://github.com/python/cpython/commit/211a392cc15f9a7b1b8ce65d8f6c9f8237d1b77f">pushed my fix</a>:</p>
<blockquote>
<p>bpo-30320: test_eintr now uses pthread_sigmask()</p>
<p>Rewrite sigwaitinfo() and sigtimedwait() unit tests for EINTR using
pthread_sigmask() to fix a race condition between the child and the
parent process.</p>
<p>Remove the pipe which was used as a weak workaround against the race
condition.</p>
<p>sigtimedwait() is now tested with a child process sending a signal
instead of testing the timeout feature which is more unstable
(especially regarding to clock resolution depending on the platform).</p>
</blockquote>
<p>To be honest, when I pushed my fix, I wasn't really confident that blocking
the awaited signal was the proper fix.</p>
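<p>The core idea of the fix can be sketched like this (a simplified Unix-only
sketch of the technique, not the actual test code): once the signal is blocked,
it stays pending until sigwaitinfo() consumes it, so the timing between parent
and child no longer matters.</p>

```python
import os
import signal

# Unix-only sketch of the fix (not the actual test code): block the
# signal *before* the child can send it, so it stays pending until
# sigwaitinfo() consumes it: no timing-based race remains.
signal.pthread_sigmask(signal.SIG_BLOCK, [signal.SIGUSR1])

pid = os.fork()
if pid == 0:
    os.kill(os.getppid(), signal.SIGUSR1)    # may arrive at any time
    os._exit(0)

info = signal.sigwaitinfo([signal.SIGUSR1])  # picks up the pending signal
signal.pthread_sigmask(signal.SIG_UNBLOCK, [signal.SIGUSR1])
os.waitpid(pid, 0)
assert info.si_signo == signal.SIGUSR1
```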
<p>So it took <strong>1 year and 8 months</strong> to really find and fix the root bug.</p>
<p>Sadly, while I was working on dozens of other bugs, I completely lost track of
Martin's patch, even though I had opened bpo-25868. Sorry Martin for forgetting
to review your patch! But when you wrote it, I was unable to test whether
sigwaitinfo() was still failing with EINTR.</p>
</div>
</div>
<div class="section" id="freebsd-test-subprocess-core-dump">
<h2>FreeBSD test_subprocess core dump</h2>
<p>bpo-30448: For one month, some FreeBSD buildbots were emitting this warning,
which started to annoy me, since I was trying to fix <em>all</em> buildbot warnings:</p>
<pre class="literal-block">
Warning -- files was modified by test_subprocess
Before: []
After: ['python.core']
</pre>
<p>I tried and failed to reproduce the warning on my FreeBSD 11 VM. I also asked a
friend to reproduce the bug, but he also failed. I was developing my
<tt class="docutils literal">test.bisect</tt> tool and I wanted to get access to a machine to reproduce the
bug!</p>
<p>Later, <strong>Kubilay Kocak</strong> aka <em>koobs</em> gave me access to his FreeBSD buildbots
and in a few seconds with my new test.bisect tool, I identified that the
<tt class="docutils literal">test_child_terminated_in_stopped_state()</tt> test triggers a deliberate crash,
but doesn't disable core dump creation. The fix is simple: use the
<tt class="docutils literal">test.support.SuppressCrashReport</tt> context manager. Thanks <em>koobs</em> for the
access!</p>
<p>Maybe only FreeBSD 10 and older dump a core on this specific test, not FreeBSD
11. I don't know why. The test is special: it tests a process which crashes
while being traced with <tt class="docutils literal">ptrace()</tt>.</p>
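<p>The fix can be sketched as follows (a simplified sketch, not the actual test
code): on Unix, SuppressCrashReport sets RLIMIT_CORE to 0, which child
processes inherit, so the deliberate crash leaves no <tt class="docutils literal">python.core</tt> file
behind.</p>

```python
import subprocess
import sys
from test.support import SuppressCrashReport

# Sketch of the fix (simplified from the actual test): on Unix,
# SuppressCrashReport sets RLIMIT_CORE to 0, and child processes
# inherit the limit, so the deliberate crash dumps no core file.
code = "import faulthandler; faulthandler._sigsegv()"
with SuppressCrashReport():
    proc = subprocess.run([sys.executable, "-c", code])
assert proc.returncode != 0    # the child crashed, but left no core dump
```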
</div>
My contributions to CPython during 2017 Q2 (part 2)2017-07-13T16:30:00+02:002017-07-13T16:30:00+02:00Victor Stinnertag:vstinner.github.io,2017-07-13:/contrib-cpython-2017q2-part2.html<p>This is the second part of my contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q2 (april, may, june):</p>
<ul class="simple">
<li>Mentoring</li>
<li>Reference and memory leaks</li>
<li>Contributions</li>
<li>Enhancements</li>
<li>Bugfixes</li>
<li>Stars of the CPython GitHub project</li>
</ul>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part1.html">My contributions to CPython during 2017 Q2 (part 1)</a>.</p>
<p>Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part3.html">My contributions to CPython during 2017 Q2 …</a></p><p>This is the second part of my contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q2 (april, may, june):</p>
<ul class="simple">
<li>Mentoring</li>
<li>Reference and memory leaks</li>
<li>Contributions</li>
<li>Enhancements</li>
<li>Bugfixes</li>
<li>Stars of the CPython GitHub project</li>
</ul>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part1.html">My contributions to CPython during 2017 Q2 (part 1)</a>.</p>
<p>Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part3.html">My contributions to CPython during 2017 Q2 (part 3)</a>.</p>
<div class="section" id="mentoring">
<h2>Mentoring</h2>
<p>During this quarter, I tried to mark "easy" issues using a "[EASY]" tag in
their title and the "easy" or "easy C" keyword. I announced these issues on the
<a class="reference external" href="https://www.python.org/dev/core-mentorship/">core-mentorship mailing list</a>.
I asked core developers not to fix these easy issues, but rather to explain how
to fix them. In each issue, I described how to fix it.</p>
<p>It was a success since all easy issues were fixed quickly: usually the PR was
merged less than 24 hours after I created the issue!</p>
<p>I mentored <strong>Stéphane Wirtel</strong> and <strong>Louie Lu</strong> to fix issues (easy or not).
During this quarter, Stéphane Wirtel got <strong>5 commits</strong> merged into master (on a
<strong>total of 11 commits</strong>), and Louie Lu got <strong>6 commits</strong> merged into master (on
a <strong>total of 10 commits</strong>).</p>
<p>They helped me to fix reference leaks spotted by the new Refleaks buildbots.</p>
</div>
<div class="section" id="reference-and-memory-leaks">
<h2>Reference and memory leaks</h2>
<p>Zachary Ware installed Gentoo and Windows buildbots running the Python test
suite with <tt class="docutils literal"><span class="pre">--huntrleaks</span></tt> to detect reference and memory leaks.</p>
<p>I worked hard with others, especially Stéphane Wirtel and Louie Lu, to fix
<em>all</em> reference leaks and memory leaks in Python 2.7, 3.5, 3.6 and master.
Right now, there are no more leaks on Windows! For Gentoo, the buildbot is
currently offline, but I am confident that all leaks are also fixed.</p>
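<p>For reference, hunting reference leaks locally looks like this (a sketch;
the exact buildbot invocation may differ):</p>

```shell
# Run a test with reference-leak hunting, as the Refleaks buildbots do:
# 3 warmup runs, then 3 measured runs per test.
./python -m test --huntrleaks 3:3 test_struct
```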
<ul class="simple">
<li>bpo-30598: _PySys_EndInit() now duplicates warnoptions. Fix a reference leak
in subinterpreters, like test_callbacks_leak() of test_atexit. warnoptions is
a list used to pass options from the command line to the sys module
constructor. Before this change, the list was shared by multiple interpreters,
which is not the expected behaviour: each interpreter should have its own
independent mutable world. This change duplicates the list in each
interpreter, so each interpreter owns its own list and can clear it
independently.</li>
<li>bpo-30601: Fix a refleak in WindowsConsoleIO. Fix a reference leak in
_io._WindowsConsoleIO: PyUnicode_FSDecoder() always initializes decodedname
when it succeeds and doesn't clear the input decodedname object.</li>
<li>bpo-30599: Fix test_threaded_import reference leak. Mock
os.register_at_fork() when importing the random module, since this function
doesn't allow unregistering callbacks and so leaked memory.</li>
<li>2.7: _tkinter: Fix refleak in getint(). PyNumber_Int() creates a new reference:
need to decrement result reference counter.</li>
<li>bpo-30635: Fix refleak in test_c_locale_coercion. When checking for reference
leaks, test_c_locale_coercion is run multiple times and so
_LocaleCoercionTargetsTestCase.setUpClass() is called multiple times.
setUpClass() appends new value at each call, so it looks like a reference
leak. Moving the setup from setUpClass() to setUpModule() avoids this,
eliminating the false alarm.</li>
<li>bpo-30602: Fix refleak in os.spawnve(). When os.spawnve() fails while
handling arguments, free correctly argvlist: pass lastarg+1 rather than
lastarg to free_string_array() to also free the first item.</li>
<li>bpo-30602: Fix refleak in os.spawnv(). When os.spawnv() fails while handling
arguments, free correctly argvlist: pass lastarg+1 rather than lastarg to
free_string_array() to also free the first item.</li>
<li>Fix ref cycles in TestCase.assertRaises(). bpo-23890:
unittest.TestCase.assertRaises() now manually breaks a reference cycle to not
keep objects alive longer than expected.</li>
<li>Python 2.7: bpo-30675: Fix refleak hunting in regrtest. regrtest now warms up
caches: create explicitly all internal singletons which are created on demand
to prevent false positives when checking for reference leaks.</li>
<li>_winconsoleio: Fix memory leak. Fix memory leak when _winconsoleio tries to
open a non-console file: free the name buffer.</li>
<li>bpo-30813: Fix unittest when hunting refleaks. bpo-11798, bpo-16662,
bpo-16935, bpo-30813: Skip
test_discover_with_module_that_raises_SkipTest_on_import() and
test_discover_with_init_module_that_raises_SkipTest_on_import() of
test_unittest when hunting reference leaks using regrtest.</li>
<li>bpo-30704, bpo-30604: Fix memleak in code_dealloc(): free also
co_extra->ce_extras, not only co_extra. Note: Serhiy later rewrote the structure
in master to use a single memory block, implementing my idea.</li>
</ul>
<div class="section" id="python-3-5-regrtest-fix">
<h3>Python 3.5 regrtest fix</h3>
<p>bpo-30675, Fix the multiprocessing code in regrtest:</p>
<ul class="simple">
<li>Rewrite code to pass <tt class="docutils literal">slaveargs</tt> from the master process to worker
processes: reuse the same code as the Python master branch.</li>
<li>Move code to initialize tests in a new <tt class="docutils literal">setup_tests()</tt> function,
similar change was done in the master branch.</li>
<li>In a worker process, call <tt class="docutils literal">setup_tests()</tt> with the namespace built
from <tt class="docutils literal">slaveargs</tt> to initialize tests correctly.</li>
</ul>
<p>Before this change, <tt class="docutils literal">warm_caches()</tt> was not called in worker processes
because the setup was done before rebuilding the namespace from <tt class="docutils literal">slaveargs</tt>.
As a consequence, the <tt class="docutils literal">huntrleaks</tt> feature was unstable. For example,
<tt class="docutils literal">test_zipfile</tt> reported randomly false positive on reference leaks.</p>
</div>
<div class="section" id="false-positives">
<h3>False positives</h3>
<p>bpo-30776: reduce regrtest -R false positives (#2422)</p>
<ul class="simple">
<li>Change the regrtest --huntrleaks checker to decide if a test file
leaks or not. Require that each run leaks at least 1 reference.</li>
<li>Warmup runs are now completely ignored: ignored in the checker test
and not used anymore to compute the sum.</li>
<li>Add a unit test for a reference leak.</li>
</ul>
<p>Example of reference count differences previously considered a failure
(leak) and now considered a success (no leak):</p>
<pre class="literal-block">
[3, 0, 0]
[0, 1, 0]
[8, -8, 1]
</pre>
<p>The same change was done to check for memory leaks.</p>
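<p>The new decision rule can be sketched as follows (a simplified model of the
checker, not the exact regrtest code):</p>

```python
# A test is only reported as leaking if *every* measured run leaked at
# least one reference; warmup runs are ignored entirely.
def check_rc_deltas(deltas):
    return all(delta >= 1 for delta in deltas)

assert not check_rc_deltas([3, 0, 0])   # previously a failure, now OK
assert not check_rc_deltas([0, 1, 0])
assert not check_rc_deltas([8, -8, 1])
assert check_rc_deltas([5, 5, 5])       # a genuine, consistent leak
```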
</div>
</div>
<div class="section" id="contributions">
<h2>Contributions</h2>
<p>This quarter, I helped to merge two contributions:</p>
<ul class="simple">
<li>bpo-9850: Deprecate the macpath module. Co-Authored-By: <strong>Chi Hsuan Yen</strong>.</li>
<li>bpo-30595: Fix multiprocessing.Queue.get(timeout).
multiprocessing.Queue.get() with a timeout now polls its reader in
non-blocking mode if it succeeded in acquiring the lock but the acquire took
longer than the timeout. Co-Authored-By: <strong>Grzegorz Grzywacz</strong>.</li>
</ul>
</div>
<div class="section" id="enhancements">
<h2>Enhancements</h2>
<ul class="simple">
<li>bpo-30265: support.unlink() now only ignores ENOENT and ENOTDIR, instead of
ignoring all OSError exceptions.</li>
<li>bpo-30054: Expose tracemalloc C API: make PyTraceMalloc_Track() and
PyTraceMalloc_Untrack() functions public. numpy is able to use
tracemalloc since numpy 1.13.</li>
</ul>
</div>
<div class="section" id="bugfixes">
<h2>Bugfixes</h2>
<ul class="simple">
<li>bpo-30125: On Windows, faulthandler.disable() now removes the exception
handler installed by faulthandler.enable().</li>
<li>bpo-30284: Fix regrtest for out of tree build. Use a build/ directory in the
build directory, not in the source directory, since the source directory may
be read-only and must not be modified. Fall back to the source directory if
the build directory is not available (missing "abs_builddir" sysconfig
variable).</li>
<li>test_locale now ignores the DeprecationWarning and no longer fails if tests
are run with <tt class="docutils literal">python3 <span class="pre">-Werror</span></tt>. Also fix the deprecation message: add a space.</li>
<li>Fix compiler warnings on AIX: only define get_zone() and get_gmtoff() if
needed.</li>
<li>Fix a compiler warning in tmtotuple(): use the <tt class="docutils literal">time_t</tt> type for the
<tt class="docutils literal">gmtoff</tt> parameter.</li>
<li>bpo-30264: ExpatParser closes the source on error. ExpatParser.parse() of
xml.sax.xmlreader now always closes the source: close the file object or the
urllib object if source is a string (not an open file-like object). The
change fixes a ResourceWarning on parsing error. Add
test_parse_close_source() unit test.</li>
<li>Fix SyntaxWarning on importing test_inspect. Fix the following warning when
test_inspect.py is compiled to test_inspect.pyc:
<tt class="docutils literal">SyntaxWarning: tuple parameter unpacking has been removed in 3.x</tt></li>
<li>bpo-30418: On Windows, subprocess.Popen.communicate() now also ignore EINVAL
on stdin.write(): ignore also EINVAL if the child process is still running
but closed the pipe.</li>
<li>bpo-30257: _bsddb: Fix newDBObject(). Don't set cursorSetReturnsNone to
DEFAULT_CURSOR_SET_RETURNS_NONE anymore if self->myenvobj is set.
Fix a GCC warning on the strange indentation.</li>
<li>bpo-30231: Remove skipped test_imaplib tests. The public cyrus.andrew.cmu.edu
IMAP server (port 993) doesn't accept TLS connection using our self-signed
x509 certificate. Remove the two tests which are already skipped. Write a new
test_certfile_arg_warn() unit test for the certfile deprecation warning.</li>
</ul>
</div>
<div class="section" id="stars-of-the-cpython-github-project">
<h2>Stars of the CPython GitHub project</h2>
<p>On June 30, I wrote <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-June/148523.html">an email to python-dev</a> about
<a class="reference external" href="https://github.com/showcases/programming-languages">GitHub showcase of hosted programming languages</a>: Python is only #11 with
8,539 stars, behind PHP and Ruby! I suggested to "like" ("star"?) the <a class="reference external" href="https://github.com/python/cpython/">CPython
project on GitHub</a> if you like the Python
programming language!</p>
<p>Four days later, <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-July/148548.html">we got +2,389 new stars (8,539 => 10,928)</a>, thank
you! Python moved from the 11th place to the 9th, before Elixir and Julia.</p>
<p>Ben Hoyt <a class="reference external" href="https://www.reddit.com/r/Python/comments/6kg4w0/cpython_recently_moved_to_github_star_the_project/">posted it on reddit.com/r/Python</a>,
where it got a bit of traction. Terry Jan Reedy also <a class="reference external" href="https://mail.python.org/pipermail/python-list/2017-July/723476.html">posted it on python-list</a>.</p>
<p>Screenshot at 2017-07-13 showing Ruby, PHP and CPython:</p>
<a class="reference external image-reference" href="https://github.com/showcases/programming-languages"><img alt="GitHub showcase: Programming languages" src="https://vstinner.github.io/images/github_cpython_stars.png" /></a>
<p>CPython now has 11,512 stars, only 861 stars behind PHP ;-)</p>
</div>
My contributions to CPython during 2017 Q2 (part 1)2017-07-13T16:00:00+02:002017-07-13T16:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-07-13:/contrib-cpython-2017q2-part1.html<p>This is the first part of my contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q2 (april, may, june):</p>
<ul class="simple">
<li>Statistics</li>
<li>Buildbots and test.bisect</li>
<li>Python 3.6.0 regression</li>
<li>struct.Struct.format type</li>
<li>Optimization: one less syscall per open() call</li>
<li>make regen-all</li>
</ul>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q1.html">My contributions to CPython during 2017 Q1</a>.</p>
<p>Next reports …</p><p>This is the first part of my contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q2 (april, may, june):</p>
<ul class="simple">
<li>Statistics</li>
<li>Buildbots and test.bisect</li>
<li>Python 3.6.0 regression</li>
<li>struct.Struct.format type</li>
<li>Optimization: one less syscall per open() call</li>
<li>make regen-all</li>
</ul>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q1.html">My contributions to CPython during 2017 Q1</a>.</p>
<p>Next reports:</p>
<ul class="simple">
<li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part2.html">My contributions to CPython during 2017 Q2 (part 2)</a>.</li>
<li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part3.html">My contributions to CPython during 2017 Q2 (part 3)</a>.</li>
<li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part1.html">My contributions to CPython during 2017 Q3: Part 1</a>.</li>
</ul>
<div class="section" id="statistics">
<h2>Statistics</h2>
<pre class="literal-block">
# All branches
$ git log --after=2017-03-31 --before=2017-06-30 --reverse --branches='*' --author=Stinner > 2017Q2
$ grep '^commit ' 2017Q2|wc -l
222
# Master branch only
$ git log --after=2017-03-31 --before=2017-06-30 --reverse --author=Stinner origin/master|grep '^commit '|wc -l
85
</pre>
<p>Statistics: <strong>85</strong> commits in the master branch, a <strong>total of 222 commits</strong>:
most (but not all) of the remaining 137 commits are cherry-picked backports to
2.7, 3.5 and 3.6 branches.</p>
<p>Note: I didn't use <tt class="docutils literal"><span class="pre">--no-merges</span></tt> since we no longer use merges, but <tt class="docutils literal">git
<span class="pre">cherry-pick</span> <span class="pre">-x</span></tt>, to <em>backport</em> fixes. Before GitHub, we used to <strong>forwardport</strong>
with Mercurial merges (ex: commit into 3.6, then merge into master).</p>
</div>
<div class="section" id="buildbots-and-test-bisect">
<h2>Buildbots and test.bisect</h2>
<p>Since this article became way too long, I split it into sub-articles:</p>
<ul class="simple">
<li><a class="reference external" href="https://vstinner.github.io/python-test-bisect.html">New Python test.bisect tool</a></li>
<li><a class="reference external" href="https://vstinner.github.io/python-buildbots-2017q2.html">Work on Python buildbots, 2017 Q2</a></li>
</ul>
</div>
<div class="section" id="python-3-6-0-regression">
<h2>Python 3.6.0 regression</h2>
<p>I am ashamed: I introduced a tricky regression in Python 3.6.0 with my work on
FASTCALL optimizations :-( A special way to call C builtin functions was broken:</p>
<pre class="literal-block">
from datetime import datetime
next(iter(datetime.now, None))
</pre>
<p>This code raises a <tt class="docutils literal">StopIteration</tt> exception instead of formatting the
current date and time.</p>
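<p>For the curious, the two-argument form of iter() calls a callable until it
returns the sentinel; on a fixed Python, the call behaves as expected:</p>

```python
from datetime import datetime

# iter(callable, sentinel) calls the callable on each next() until the
# sentinel is returned; datetime.now never returns None, so next() must
# yield the current date and time instead of raising StopIteration.
it = iter(datetime.now, None)
value = next(it)
assert isinstance(value, datetime)
```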
<p>It's even worse: I was aware of the bug and had already fixed it in master, but
I just forgot to backport my fix: bpo-30524, fix _PyStack_UnpackDict().</p>
<p>To prevent regressions, I wrote exhaustive unit tests on the 3 FASTCALL
functions, commit: <a class="reference external" href="https://github.com/python/cpython/commit/3b5cf85edc188345668f987c824a2acb338a7816">bpo-30524: Write unit tests for FASTCALL</a></p>
</div>
<div class="section" id="struct-struct-format-type">
<h2>struct.Struct.format type</h2>
<p>Sometimes, fixing a bug can take longer than expected. In March 2014, <strong>Zbyszek
Jędrzejewski-Szmek</strong> reported a bug on the <tt class="docutils literal">format</tt> attribute of the
<tt class="docutils literal">struct.Struct</tt> class: this attribute type is bytes, whereas a Unicode string
(str) was expected.</p>
<p>I proposed to "just" change the attribute type in December 2014, but it was an
incompatible change which would break backward compatibility. <strong>Martin
Panter</strong> agreed and wrote a patch. <strong>Serhiy Storchaka</strong> asked to discuss such an
incompatible change on python-dev, but then nothing happened for more
than... 2 years!</p>
<p>In March 2017, I converted Martin's old patch into a new GitHub pull
request. <strong>Serhiy</strong> asked again to write to python-dev, so I wrote:
<a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-March/147688.html">Issue #21071: change struct.Struct.format type from bytes to str</a>. And...
I got zero answer.</p>
<p>Well, I didn't expect any, since it's a trivial change, and I don't expect
anyone to rely on the exact <tt class="docutils literal">format</tt> attribute type. Moreover, the
<tt class="docutils literal">struct.Struct</tt> constructor already accepts bytes and str types. If the
attribute is passed to the constructor: it just works.</p>
<p>In June 2017, Serhiy Storchaka replied to my email: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-June/148360.html">If nobody opposed to this
change it will be made in short time.</a></p>
<p>Since nobody replied, again, I just merged my pull request. So it took <strong>3
years and 3 months</strong> to change the type of an uncommon attribute :-)</p>
<p>Note: I never used this attribute... Before reading this issue, I didn't even
know that the <tt class="docutils literal">struct</tt> module has a <tt class="docutils literal">struct.Struct</tt> type...</p>
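<p>After the change (Python 3.7 and later), the attribute round-trips cleanly:</p>

```python
import struct

# Struct.format is now a str (it was bytes); the constructor accepts
# both types, so feeding the attribute back just works.
s = struct.Struct("<hh")
assert isinstance(s.format, str)
assert struct.Struct(s.format).size == s.size == 4
```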
</div>
<div class="section" id="optimization-one-less-syscall-per-open-call">
<h2>Optimization: one less syscall per open() call</h2>
<p>In bpo-30228, I modified FileIO.seek() and FileIO.tell() methods to now set the
internal seekable attribute to avoid one <tt class="docutils literal">fstat()</tt> syscall per Python open()
call in buffered or text mode.</p>
<p>The seekable property is now also more reliable since its value is
set correctly on memory allocation failure.</p>
<p>I still have a second pending pull request to remove one more <tt class="docutils literal">fstat()</tt>
syscall: <a class="reference external" href="https://github.com/python/cpython/pull/1385">bpo-30228: TextIOWrapper uses abs_pos, not tell()</a>.</p>
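<p>The optimization can be illustrated like this (a sketch of the observable
behaviour, not the C implementation): a successful lseek() performed by tell()
proves the file is seekable, so the result is cached.</p>

```python
import io
import os

# A successful tell() (an lseek() under the hood) proves the file is
# seekable, so FileIO caches the answer and seekable() no longer needs
# an extra fstat() syscall.
fd = os.open(os.devnull, os.O_RDONLY)
f = io.FileIO(fd, "r", closefd=True)
f.tell()                       # sets the internal "seekable" flag
was_seekable = f.seekable()    # answered from the cached flag
f.close()
assert was_seekable
```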
</div>
<div class="section" id="make-regen-all">
<h2>make regen-all</h2>
<p>I started to look at bpo-23404, because the Python compilation failed on the
"AMD64 FreeBSD 9.x 3.x" buildbot when trying to regenerate the
<tt class="docutils literal">Include/opcode.h</tt> file.</p>
<div class="section" id="old-broken-make-touch">
<h3>Old broken make touch</h3>
<p>We had a <tt class="docutils literal">make touch</tt> command to work around this file timestamp issue, but
the command used Mercurial, whereas Python migrated to Git last February. The
buildbot "touch" step was removed because <tt class="docutils literal">make touch</tt> was broken.</p>
<p>I was always annoyed by the Makefile wanting to regenerate generated files
because of wrong file modification times, even though the generated files were
already up to date.</p>
<p>The bug annoyed me on OpenIndiana where "make touch" didn't work because the
operating system only provides Python 2.6 and Mercurial didn't work on this
version.</p>
<p>The bug also annoyed me on FreeBSD which has no "python" command, only
"python2.7", and so required manual steps.</p>
<p>The bug was also a pain point when trying to cross-compile Python.</p>
</div>
<div class="section" id="new-shiny-make-regen-all">
<h3>New shiny make regen-all</h3>
<p>I decided to rewrite the Makefile to not regenerate generated files based on
the file modification time anymore. Instead, I added a new <tt class="docutils literal">make <span class="pre">regen-all</span></tt>
command to regenerate explicitly all generated files. Basically, I replaced
<tt class="docutils literal">make touch</tt> with <tt class="docutils literal">make <span class="pre">regen-all</span></tt>.</p>
<p>Changes:</p>
<ul class="simple">
<li>Add a new <tt class="docutils literal">make <span class="pre">regen-all</span></tt> command to rebuild all generated files</li>
<li>Add subcommands to only generate specific files:<ul>
<li><tt class="docutils literal"><span class="pre">regen-ast</span></tt>: Include/Python-ast.h and Python/Python-ast.c</li>
<li><tt class="docutils literal"><span class="pre">regen-grammar</span></tt>: Include/graminit.h and Python/graminit.c</li>
<li><tt class="docutils literal"><span class="pre">regen-importlib</span></tt>: Python/importlib_external.h and Python/importlib.h</li>
<li><tt class="docutils literal"><span class="pre">regen-opcode</span></tt>: Include/opcode.h</li>
<li><tt class="docutils literal"><span class="pre">regen-opcode-targets</span></tt>: Python/opcode_targets.h</li>
<li><tt class="docutils literal"><span class="pre">regen-typeslots</span></tt>: Objects/typeslots.inc</li>
</ul>
</li>
<li>Rename <tt class="docutils literal">PYTHON_FOR_GEN</tt> to <tt class="docutils literal">PYTHON_FOR_REGEN</tt></li>
<li>pgen is now only built by <tt class="docutils literal">make <span class="pre">regen-grammar</span></tt></li>
<li>Add <tt class="docutils literal">$(srcdir)/</tt> prefix to paths to source files to handle correctly
compilation outside the source directory</li>
<li>Remove <tt class="docutils literal">make touch</tt>, <tt class="docutils literal">Tools/hg/hgtouch.py</tt> and <tt class="docutils literal">.hgtouch</tt></li>
</ul>
<p>Note: By default, <tt class="docutils literal">$(PYTHON_FOR_REGEN)</tt> is no longer used nor needed by "make".</p>
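<p>Typical usage, with the command names listed above:</p>

```shell
make regen-opcode      # rebuild Include/opcode.h only
make regen-grammar     # rebuild Include/graminit.h and Python/graminit.c
make regen-all         # rebuild every generated file explicitly
```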
</div>
</div>
Work on Python buildbots, 2017 Q22017-07-13T09:00:00+02:002017-07-13T09:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-07-13:/python-buildbots-2017q2.html<p>I spent the last 6 months on working on buildbots: reduce the failure rate,
send email notification on failure, fix random bugs, detect more bugs using
warnings, backport fixes to older branches, etc. I decided to fix <em>all</em>
buildbots issues: fix all warnings and all unstable tests!</p>
<p>The good news …</p><p>I spent the last 6 months on working on buildbots: reduce the failure rate,
send email notification on failure, fix random bugs, detect more bugs using
warnings, backport fixes to older branches, etc. I decided to fix <em>all</em>
buildbots issues: fix all warnings and all unstable tests!</p>
<p>The good news is that I made great progress and fixed most random failures. A
random failure is now the exception rather than the norm. Some issues were not
bugs in tests, but real race conditions in the code. It's always good to fix
unlikely race conditions before users hit them in production!</p>
<ul class="simple">
<li>Introduction: Python Buildbots</li>
<li>Orange Is The New Color</li>
<li>New buildbot-status Mailing List</li>
<li>Hardware issues<ul>
<li>The vacuum cleaner</li>
<li>The memory stick</li>
</ul>
</li>
<li>Warnings</li>
<li>regrtest</li>
<li>Bug fixes</li>
<li>Python 2.7</li>
<li>Buildbot reports to python-dev</li>
</ul>
<div class="section" id="introduction-python-buildbots">
<h2>Introduction: Python Buildbots</h2>
<p>CPython is running a <a class="reference external" href="https://buildbot.net/">Buildbot</a> server for continuous
integration, but tests are run as post-commit: see <a class="reference external" href="https://www.python.org/dev/buildbot/">Python buildbots</a>. CPython is tested by a wide range of
buildbot slaves:</p>
<ul class="simple">
<li>6 operating systems:<ul>
<li>Linux (Debian, Ubuntu, Gentoo, RHEL, SLES)</li>
<li>Windows (7, 8, 8.1 and 10)</li>
<li>macOS (Tiger, El Capitan, Sierra)</li>
<li>FreeBSD (9, 10, CURRENT)</li>
<li>AIX</li>
<li>OpenIndiana (currently offline)</li>
</ul>
</li>
<li>5 CPU architectures:<ul>
<li>ARMv7</li>
<li>x86 (Intel 32 bit)</li>
<li>x86-64 aka "AMD64" (Intel 64-bit)</li>
<li>PPC64, PPC64LE</li>
<li>s390x</li>
</ul>
</li>
<li>3 C compilers:<ul>
<li>GCC</li>
<li>Clang (FreeBSD, macOS)</li>
<li>Visual Studio (Windows)</li>
</ul>
</li>
</ul>
<p>There are different kinds of tests:</p>
<ul class="simple">
<li>Python test suite: the most common check</li>
<li>Docs: check that the documentation can be built and doesn't contain warnings</li>
<li>Refleaks: check for reference leaks and memory leaks, run the Python test
suite with the <tt class="docutils literal"><span class="pre">--huntrleaks</span></tt> option</li>
<li>DMG: Build the macOS installer with the
<tt class="docutils literal"><span class="pre">Mac/BuildScript/build-installer.py</span></tt> script</li>
</ul>
<p>Python is tested in different configurations:</p>
<ul class="simple">
<li>Debug: <tt class="docutils literal">./configure <span class="pre">--with-pydebug</span></tt>, the most common configuration</li>
<li>Non-debug: release mode, with compiler optimizations</li>
<li>PGO: Profiled Guided Optimization, <tt class="docutils literal">./configure <span class="pre">--enable-optimizations</span></tt></li>
<li>Installed: <tt class="docutils literal">./configure <span class="pre">--prefix=XXX</span> && make install</tt></li>
<li>Shared library (libpython): <tt class="docutils literal">./configure <span class="pre">--enable-shared</span></tt></li>
</ul>
<p>Currently, 4 branches are tested:</p>
<ul class="simple">
<li><tt class="docutils literal">master</tt>: called "3.x" on buildbots</li>
<li><tt class="docutils literal">3.6</tt></li>
<li><tt class="docutils literal">3.5</tt></li>
<li><tt class="docutils literal">2.7</tt></li>
</ul>
<p>There is also <tt class="docutils literal">custom</tt>, a special branch used by core developers for testing
patches.</p>
<p>The buildbot configuration can be found in the <a class="reference external" href="https://github.com/python/buildmaster-config/">buildmaster-config project</a> (start with the
<tt class="docutils literal">master/master.cfg</tt> file).</p>
<p>Note: Thanks to the migration to GitHub, Pull Requests are now tested on Linux,
Windows and macOS by Travis CI and AppVeyor. It's the first time in the CPython
development history that we have automated pre-commit tests!</p>
</div>
<div class="section" id="orange-is-the-new-color">
<h2>Orange Is The New Color</h2>
<p>A buildbot now becomes orange when tests contain warnings.</p>
<p>My first change was to modify the buildbot configuration to extract warnings
from the raw test output into a new "warnings" report, to more easily
detect warnings and tests failing randomly (tests which fail and then pass when
re-run).</p>
<p>Example of an orange build, x86-64 El Capitan 3.x:</p>
<img alt="Buildbot: orange build" src="https://vstinner.github.io/images/buildbot_orange.png" />
<p>Extract of the current <tt class="docutils literal">master/custom/steps.py</tt>:</p>
<pre class="literal-block">
class Test(BaseTest):
    # Regular expression used to catch warnings, errors and bugs
    warningPattern = (
        # regrtest saved_test_environment warning:
        # Warning -- files was modified by test_distutils
        # test.support @reap_threads:
        # Warning -- threading_cleanup() failed to cleanup ...
        r"Warning -- ",
        # Py_FatalError() call
        r"Fatal Python error:",
        # PyErr_WriteUnraisable() exception: usually, error in
        # garbage collector or destructor
        r"Exception ignored in:",
        # faulthandler_exc_handler(): Windows exception handler installed with
        # AddVectoredExceptionHandler() by faulthandler.enable()
        r"Windows fatal exception:",
        # Resource warning: unclosed file, socket, etc.
        # NOTE: match the "ResourceWarning" anywhere, not only at the start
        r"ResourceWarning",
        # regrtest: At least one test failed. Log a warning even if the test
        # passed on the second try, to notify that a test is unstable.
        r'Re-running failed tests in verbose mode',
        # Re-running test 'test_multiprocessing_fork' in verbose mode
        r'Re-running test .* in verbose mode',
        # Thread last resort exception handler in t_bootstrap()
        r'Unhandled exception in thread started by ',
        # test_os leaked [6, 6, 6] memory blocks, sum=18,
        r'test_[^ ]+ leaked ',
    )
    # Use ".*" prefix to search the regex anywhere since stdout is mixed
    # with stderr, so warnings are not always written at the start
    # of a line. The log consumer calls warningPattern.match(line)
    warningPattern = r".*(?:%s)" % "|".join(warningPattern)
    warningPattern = re.compile(warningPattern)
    # if tests have warnings, mark the overall build as WARNINGS (orange)
    warnOnWarnings = True
</pre>
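<p>To see how such a combined pattern behaves, here is a small standalone sketch:
it only reuses a subset of the patterns above, without any of the buildbot
plumbing, and classifies a few sample log lines:</p>

```python
# Standalone sketch of the warning-catching regex above, using only a
# subset of the patterns and none of the buildbot plumbing
import re

warning_patterns = (
    r"Warning -- ",
    r"Fatal Python error:",
    r"Exception ignored in:",
    r"ResourceWarning",
    r"test_[^ ]+ leaked ",
)
# ".*" prefix: stdout is mixed with stderr, so a warning is not always
# written at the start of a line, but the log consumer calls .match(line)
warning_re = re.compile(r".*(?:%s)" % "|".join(warning_patterns))

for line in (
    "Warning -- files was modified by test_distutils",
    "0:00:42 [ 42/405] test_os passed",
    "test_os leaked [6, 6, 6] memory blocks, sum=18",
):
    status = "WARNING" if warning_re.match(line) else "ok"
    print(status, line)
```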
</div>
<div class="section" id="new-buildbot-status-mailing-list">
<h2>New buildbot-status Mailing List</h2>
<p>To check buildbots, I previously had to manually analyze the huge "waterfall"
view of four Python branches: 2.7, 3.5, 3.6 and master ("3.x").</p>
<ul class="simple">
<li><a class="reference external" href="http://buildbot.python.org/all/waterfall?category=3.x.stable&category=3.x.unstable">Python master ("3.x")</a></li>
<li><a class="reference external" href="http://buildbot.python.org/all/waterfall?category=3.6.stable&category=3.6.unstable">Python 3.6</a></li>
<li><a class="reference external" href="http://buildbot.python.org/all/waterfall?category=3.5.stable&category=3.5.unstable">Python 3.5</a></li>
<li><a class="reference external" href="http://buildbot.python.org/all/waterfall?category=2.7.stable&category=2.7.unstable">Python 2.7</a></li>
</ul>
<p>Example of typical buildbot waterfall:</p>
<a class="reference external image-reference" href="http://buildbot.python.org/all/waterfall?category=3.x.stable&category=3.x.unstable"><img alt="Buildbot waterfall" src="https://vstinner.github.io/images/buildbot_waterfall.png" /></a>
<p>The screenshot is obviously truncated since the webpage is giant: I have to
scroll in all directions... It's not convenient to check the status of all
builds, detect random failures, etc.</p>
<p>We also have an IRC bot reporting buildbot failures: when a green (success) or
orange (warning) buildbot becomes red (failure). I wanted to have the same
thing, but by email. Technically, it's trivial to enable email notification,
but I never did it because buildbots were simply too unstable: most failures
were not related to the newly tested changes.</p>
<p>But I decided to fix <em>all</em> buildbot issues, so I enabled email notification
(<a class="reference external" href="https://bugs.python.org/issue30325">bpo-30325</a>). Since May 2017,
buildbots are now sending notifications to a new <a class="reference external" href="https://mail.python.org/mm3/mailman3/lists/buildbot-status.python.org/">buildbot-status mailing list</a>.</p>
<p>I use the mailing list to check whether a failure is already known: I try to
answer all failure notification emails. If the failure is known, I copy the
link to the existing issue. Otherwise, I create a new issue and then copy the
link to it.</p>
</div>
<div class="section" id="hardware-issues">
<h2>Hardware issues</h2>
<p>Unit tests versus real life :-) (or "software versus hardware")</p>
<div class="section" id="the-vacuum-cleaner">
<h3>The vacuum cleaner</h3>
<p>Fixing buildbot issues can sometimes be boring, so let's start with a funny
bug. On June 25, Nick Coghlan wrote to the <a class="reference external" href="https://mail.python.org/mailman/listinfo/python-buildbots">python-buildbots</a> mailing list:</p>
<blockquote>
It looks like the FreeBSD buildbots had an outage a little while ago,
and the FreeBSD 10 one may need a nudge to get back online (the
FreeBSD Current one looks like it came back automatically).</blockquote>
<p>The reason is unexpected :-) <a class="reference external" href="https://mail.python.org/pipermail/python-buildbots/2017-June/000122.html">Kubilay Kocak, owner of the buildbot, answered</a>:</p>
<blockquote>
Vacuum cleaner tripped RCD pulling too much current from the same circuit
as heater was running on. Buildbot worker host on same circuit.</blockquote>
</div>
<div class="section" id="the-memory-stick">
<h3>The memory stick</h3>
<p>I opened at least 50 issues to report random buildbot failures. In the middle
of these issues, you can find <a class="reference external" href="http://bugs.python.org/issue30371">bpo-30371</a>:</p>
<pre class="literal-block">
http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/436/steps/test/logs/stdio
======================================================================
FAIL: test_long_lines (test.test_email.test_email.TestFeedParsers)
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_email\test_email.py", line 3526, in test_long_lines
self.assertEqual(m.get_payload(), 'x'*M*N)
AssertionError: 'xxxx[17103482 chars]xxxxxzxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx[2896464 chars]xxxx' != 'xxxx[17103482 chars]xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx[2896464 chars]xxxx'
Notice the "z" in "...xxxxxz...".
</pre>
<p>and:</p>
<pre class="literal-block">
New fail, same buildbot:
======================================================================
FAIL: test_long_lines (test.test_email.test_email.TestFeedParsers)
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_email\test_email.py", line 3534, in test_long_lines
self.assertEqual(m.items(), [('a', ''), ('b', 'x'*M*N)])
AssertionError: Lists differ: [('a'[1845894 chars]xxxxxzxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx[18154072 chars]xx')] != [('a'[1845894 chars]xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx[18154072 chars]xx')]
First differing element 1:
('b',[1845882 chars]xxxxxzxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx[18154071 chars]xxx')
('b',[1845882 chars]xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx[18154071 chars]xxx')
[('a', ''),
('b',
Don't click on http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/439/steps/test/logs/stdio
: the log contains lines of 2 MB which make my Firefox super slow :-)
</pre>
<p>Jeremy Kloth, owner of the buildbot, answered:</p>
<blockquote>
Watch this space, but I'm pretty sure that it is (was) bad memory.</blockquote>
<p>He fixed the issue:</p>
<blockquote>
That's the real problem, I'm not <em>sure</em> it's the memory, but it does have
the symptoms. And that is why my buildbot was down earlier, I was
attempting to determine the bad stick and replace it.</blockquote>
</div>
</div>
<div class="section" id="warnings">
<h2>Warnings</h2>
<p>To fix test warnings, I enhanced the test suite to report more information when
a warning is emitted and to make failures easier to detect.</p>
<p>A major change is the new <tt class="docutils literal"><span class="pre">--fail-env-changed</span></tt> option I added to regrtest
(bpo-30764): make tests fail if the "environment" is changed. This option is
now used on buildbots, Travis CI and AppVeyor, but so far only for the
<em>master</em> branch.</p>
<p>Other changes:</p>
<ul class="simple">
<li>The @reap_threads decorator and the threading_cleanup() function of
test.support now log a warning if they fail to cleanup threads. The log may
help to debug warnings like this one seen on the AMD64 FreeBSD CURRENT
Non-Debug 3.x buildbot: "Warning -- threading._dangling was modified by
test_logging".</li>
<li>threading_cleanup() failure now marks the test as ENV_CHANGED. If
threading_cleanup() fails to cleanup threads, a new
support.environment_altered flag is set to true; the flag is used by
save_env, which regrtest uses to check whether a test altered the
environment. At the end, the test file fails with ENV_CHANGED
instead of SUCCESS, to report that it altered the environment.</li>
<li>regrtest: always show before/after values of modified environment.</li>
</ul>
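<p>The mechanism behind <tt class="docutils literal"><span class="pre">--fail-env-changed</span></tt> can be sketched in a few lines. This is
a simplified illustration of the idea, not the actual regrtest code: the real
implementation in test.support and Lib/test/libregrtest snapshots many more
resources (threads, open files, environment variables, etc.):</p>

```python
# Simplified sketch of the --fail-env-changed idea: snapshot parts of the
# "environment" before a test, compare after it, and report ENV_CHANGED.
# Names are illustrative; the real code lives in test.support and
# Lib/test/libregrtest, and checks many more resources.
import os
import sys

def run_test_checking_env(test_func):
    # Snapshot a few resources before running the test
    before = (os.getcwd(), list(sys.path))
    test_func()
    after = (os.getcwd(), list(sys.path))
    # If the test altered the environment, fail with ENV_CHANGED
    # even if all of its assertions passed
    return "SUCCESS" if before == after else "ENV_CHANGED"

def well_behaved_test():
    pass

def leaky_test():
    sys.path.append("/tmp/injected")  # alters the environment

print(run_test_checking_env(well_behaved_test))  # SUCCESS
print(run_test_checking_env(leaky_test))         # ENV_CHANGED
```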
<p>I backported all these changes to the 2.7, 3.5 and 3.6 branches to make sure
that warnings are fixed in all maintained branches.</p>
</div>
<div class="section" id="regrtest">
<h2>regrtest</h2>
<p>As usual, I spent time on our specialized test runner, regrtest:</p>
<ul class="simple">
<li>bpo-30263: regrtest: log system load and the number of CPUs. I tried to find
a relationship between race conditions and the system load. I failed to find
any obvious correlation, but I still consider the system load information
useful.</li>
<li>bpo-27103: regrtest disables -W if -R (reference hunting) is used. Workaround
for a regrtest bug.</li>
</ul>
<p>But the most complex task was to backport <em>all</em> regrtest features and
enhancements from master to regrtest of 3.6, 3.5 and then 2.7 branches.</p>
<p>In Python 3.6, I rewrote the regrtest.py file to split it into smaller files in
a new Lib/test/libregrtest/ library, so it was painful to backport changes to
3.5 (bpo-30383), which still uses a single regrtest.py file.</p>
<p>In Python 2.7 (bpo-30283), it was even worse. Lib/test/regrtest.py uses the old
<tt class="docutils literal">getopt</tt> module to parse the command line instead of the <tt class="docutils literal">argparse</tt> module
used in 3.5 and newer. But I succeeded in backporting all features and
enhancements from master!</p>
<p>Python 2.7, 3.5, 3.6 and master now have almost the same CLI for <tt class="docutils literal">python <span class="pre">-m</span>
test</tt>, almost the same features (except for one or two missing features), and
should provide the same level of information on failures and warnings.</p>
<p>By the way, the new <tt class="docutils literal">test.bisect</tt> tool is now also available in all these
branches. See my <a class="reference external" href="https://vstinner.github.io/python-test-bisect.html">New Python test.bisect tool</a> article.</p>
</div>
<div class="section" id="bug-fixes">
<h2>Bug fixes</h2>
<p>As expected, the longest section here is the list of changes I wrote to fix all
buildbot failures and warnings:</p>
<ul class="simple">
<li>bpo-29972: Skip tests known to fail on AIX. See <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-April/147748.html">[Python-Dev] Fix or drop AIX
buildbot?</a>
email.</li>
<li>bpo-29925: Skip test_uuid1_safe() on OS X Tiger</li>
<li>Fix and optimize test_asyncore.test_quick_connect(). Don't use addCleanup() in
test_quick_connect() because it keeps the Thread object alive and so
@reap_threads times out after 1 second. "./python -m test -v
test_asyncore -m test_quick_connect" now takes 185 ms, instead of 11 seconds.</li>
<li>bpo-30106: Fix test_asyncore.test_quick_connect(). test_quick_connect() runs
a thread for up to 50 seconds, whereas the socket is connected in 0.2 seconds
and then the thread is expected to end in less than 3 seconds. On Linux, the
thread ends quickly because select() seems to always return quickly. On
FreeBSD, sometimes select() fails with timeout and so the thread runs much
longer than expected. Fix the thread timeout to fix a race condition in the
test.</li>
<li>bpo-30106: Fix tearDown() of test_asyncore. Call asyncore.close_all() with
ignore_all=True in the tearDown() method of the test_asyncore base test case.
It prevents keeping alive sockets in asyncore.socket_map if close()
fails with an unexpected error.</li>
<li>bpo-30108: Restore sys.path in test_site. Add setUpModule() and
tearDownModule() functions to test_site to save/restore sys.path at the
module level to prevent warning if the user site directory is created, since
site.addsitedir() modifies sys.path.</li>
<li>bpo-30107: test_io doesn't dump a core file on an expected crash anymore.
test_io has two unit tests which trigger a deadlock:
test_daemon_threads_shutdown_stdout_deadlock() and
test_daemon_threads_shutdown_stderr_deadlock(). These tests call
Py_FatalError() if the expected bug is triggered which calls abort(). Use
test.support.SuppressCrashReport to prevent the creation of a core dump, to
fix the warning:
<tt class="docutils literal">Warning <span class="pre">--</span> files was modified by test_io <span class="pre">(...)</span> After: ['python.core']</tt></li>
<li>bpo-30125: Disable faulthandler to run test_SEH() of test_ctypes to prevent
the following log with a traceback:
<tt class="docutils literal">Windows fatal exception: access violation</tt></li>
<li>bpo-30131: test_logging cleans up threads using @support.reap_threads.</li>
<li>bpo-30132: BuildExtTestCase of test_distutils now uses support.temp_cwd() in
setUp() to remove files created in the current working directory by
BuildExtTestCase unit tests.</li>
<li>bpo-30107: On macOS, test.support.SuppressCrashReport now redirects
/usr/bin/defaults command stderr into a pipe to not pollute stderr. It fixes
a test_io.test_daemon_threads_shutdown_stderr_deadlock() failure when the
CrashReporter domain doesn't exist.</li>
<li>bpo-30175: Skip client cert tests of test_imaplib. The IMAP server
cyrus.andrew.cmu.edu doesn't accept our randomly generated client x509
certificate anymore.</li>
<li>bpo-30175: test_nntplib fails randomly with EOFError in NetworkedNNTPTests.setUpClass():
catch EOFError to skip tests in that case.</li>
<li>bpo-30199: AsyncoreEchoServer of test_ssl now calls
asyncore.close_all(ignore_all=True) to ensure that asyncore.socket_map is
cleared once the test completes, even if ConnectionHandler was not correctly
unregistered. Fix the following warning:
<tt class="docutils literal">Warning <span class="pre">--</span> asyncore.socket_map was modified by test_ssl</tt>.</li>
<li>Fix test_ftplib warning if IPv6 is not available. DummyFTPServer now calls
del_channel() on bind() error to prevent the following warning in
TestIPv6Environment.setUpClass():
<tt class="docutils literal">Warning <span class="pre">--</span> asyncore.socket_map was modified by test_ftplib</tt></li>
<li>bpo-30329: Catch Windows error 10022 on shutdown(). Catch the Windows socket
WSAEINVAL error (code 10022) in imaplib and poplib on shutdown(SHUT_RDWR): An
invalid operation was attempted. This error occurs sometimes on SSL
connections.</li>
<li>bpo-30357: test_thread now uses threading_cleanup(). test_thread: setUp() now
uses support.threading_setup() and support.threading_cleanup() to wait until
threads complete to avoid random side effects on following tests.
Co-Authored-By: <strong>Grzegorz Grzywacz</strong>.</li>
<li>bpo-30339: test_multiprocessing_main_handling timeout.
test_multiprocessing_main_handling: increase the test_source timeout from 10
seconds to 60 seconds, since the test fails randomly on busy buildbots.
Sadly, this change wasn't enough to fix buildbots.</li>
<li>bpo-30387: Fix warning in test_threading. test_is_alive_after_fork() now
joins directly the thread to avoid the following warning added by bpo-30357:
"Warning -- threading_cleanup() failed to cleanup 0 threads after 2 sec
(count: 0, dangling: 21)". Use also a different exit code to catch generic
exit code 1.</li>
<li>bpo-30649: On Windows, test_os now tolerates a delta of 50 ms instead of 20
ms in test_utime_current() and test_utime_current_old(). On other platforms,
reduce the delta from 20 ms to 10 ms. PPC64 Fedora 3.x buildbot requires at
least a delta of 14 ms.</li>
<li>bpo-30595: test_queue_feeder_donot_stop_onexc() of _test_multiprocessing now
uses a timeout of 1 second on Queue.get(), instead of 0.1 second, for slow
buildbots.</li>
<li>bpo-30764, bpo-29335: test_child_terminated_in_stopped_state() of
test_subprocess now uses support.SuppressCrashReport() to prevent the
creation of a core dump on FreeBSD.</li>
<li>bpo-30280: TestBaseSelectorEventLoop of
test.test_asyncio.test_selector_events now correctly closes the event loop:
cleanup its executor to not leak threads: don't override the close() method
of the event loop, only override the _close_self_pipe() method. asyncio base
TestCase now uses threading_setup() and threading_cleanup() of test.support
to cleanup threads.</li>
<li>bpo-26568, bpo-30812: Fix test_showwarnmsg_missing(): restore the attribute
after removing it.</li>
</ul>
</div>
<div class="section" id="python-2-7-1">
<h2>Python 2.7</h2>
<p>I wanted to fix <em>all</em> buildbot issues of <em>all</em> branches including 2.7, whereas
I hadn't touched the Python 2.7 code base much in recent months (years?). During
the first six months of 2017, I backported dozens of commits from master to 2.7!</p>
<p>For example, I added AppVeyor on 2.7: a Windows CI for GitHub!</p>
<p>On Windows we support multiple versions of Visual Studio. I use Visual Studio
2008, whereas most 2.7 Windows buildbots use Visual Studio 2010 or newer. I
fixed sysconfig.is_python_build() if Python is built with Visual Studio 2008
(VS 9.0) (bpo-30342).</p>
<p>Other Python 2.7 changes:</p>
<ul class="simple">
<li>Fix "make tags" command.</li>
<li>bpo-30764: support.SuppressCrashReport backported to 2.7 and "ported" to
Windows. Add Windows support to test.support.SuppressCrashReport: call
SetErrorMode() and CrtSetReportMode(). _testcapi: add CrtSetReportMode() and
CrtSetReportFile() functions and CRT_xxx and CRTDBG_xxx constants needed by
SuppressCrashReport.</li>
<li>bpo-30705: Fix test_regrtest.test_crashed(). Add test.support._crash_python()
which triggers a crash but uses test.support.SuppressCrashReport() to prevent
a crash report from popping up. Modify
test_child_terminated_in_stopped_state() of test_subprocess and
test_crashed() of test_regrtest to use _crash_python().</li>
</ul>
<p>I also backported many fixes written by other developers, including fixes up
to 8 years old!</p>
<p>Usually, <strong>finding</strong> the proper fix takes much more time than the cherry-pick
itself, which is usually straightforward (no conflict, nothing to do). I am
always impressed that Git is able to detect that a file was renamed between
Python 2 and Python 3, and applies the change cleanly!</p>
<p>Example of backports from master to 2.7:</p>
<ul class="simple">
<li>bpo-6393: Fix locale.getpreferredencoding() on macOS. Python crashes on OSX
when <tt class="docutils literal">$LANG</tt> is set to some (but not all) invalid values due to an invalid
result from nl_langinfo(). Fix written in <strong>September 2009</strong> (8 years ago)!</li>
<li>bpo-15526: test_startfile changes the cwd. Try to fix test_startfile's
inability to clean up after itself in time. Patch by <strong>Jeremy Kloth</strong>.
Fix the following support.rmtree() error while trying to remove the temporary
working directory used by Python tests:
"WindowsError: [Error 32] The process cannot access the file because it is
being used by another process: ...".
Original commit written in <strong>September 2012</strong>!</li>
<li>bpo-11790: Fix sporadic failures in
test_multiprocessing.WithProcessesTestCondition.
Fix written in <strong>April 2011</strong>. This backported commit was tricky to
identify!</li>
<li>bpo-8799, fix test_threading: Reduce timing sensitivity of condition test by
explicitly delaying the main thread so that it doesn't race ahead of the
workers. Fix written in <strong>Nov 2013</strong>.</li>
<li>test_distutils: Use EnvironGuard on InstallTestCase, UtilTestCase, and
BuildExtTestCase to prevent the following warning:
<tt class="docutils literal">Warning <span class="pre">--</span> os.environ was modified by test_distutils</tt></li>
<li>Fix test_multiprocessing: Relax test timing (bpo-29861) to avoid sporadic
failures.</li>
</ul>
</div>
<div class="section" id="buildbot-reports-to-python-dev">
<h2>Buildbot reports to python-dev</h2>
<p>I also wrote 3 reports to the Python-Dev mailing list:</p>
<ul class="simple">
<li>May 3: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-May/147838.html">Status of Python buildbots</a></li>
<li>June 8: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-June/148271.html">Buildbot report, june 2017</a></li>
<li>June 29: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-June/148511.html">Buildbot report (almost July)</a></li>
</ul>
</div>
New Python test.bisect tool2017-07-12T15:00:00+02:002017-07-12T15:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-07-12:/python-test-bisect.html<p>This article tells the story of the new CPython <tt class="docutils literal">test.bisect</tt> tool to
identify failing tests in the CPython test suite.</p>
<div class="section" id="modify-manually-a-test-file">
<h2>Modify manually a test file</h2>
<p>I have been fixing reference leaks for many years. When a test file contains more
than 200 tests and is longer than 5,000 lines …</p></div><p>This article tells the story of the new CPython <tt class="docutils literal">test.bisect</tt> tool to
identify failing tests in the CPython test suite.</p>
<div class="section" id="modify-manually-a-test-file">
<h2>Modify manually a test file</h2>
<p>I have been fixing reference leaks for many years. When a test file contains more
than 200 tests and is longer than 5,000 lines, it's just not possible to spot a
reference leak by reading it. Each time, I modified the long test file and <em>removed</em>
enough code until the file became short enough to read.</p>
<p>This method <em>works</em>, but it usually took me 20 to 30 minutes, so it was
common that I made mistakes... and usually had to restart from scratch...</p>
</div>
<div class="section" id="first-failed-attempt">
<h2>First failed attempt</h2>
<p>In October 2014, while fixing <a class="reference external" href="http://bugs.python.org/issue22588#msg228905">yet another reference leak in test_capi</a>, <strong>Xavier de Gaye</strong> was
surprised that I identified the leak so quickly and wanted to know how I
proceeded. I explained my method of removing code, but I also asked for a tool.</p>
<p>Xavier created bpo-22607 on 2014-10-11 and wrote a patch based on an integer
range to run a subset of tests, doing something special with the <tt class="docutils literal">subTest()</tt>
context manager. But <strong>Georg Brandl</strong> wasn't convinced by this approach and...
I forgot about this issue.</p>
</div>
<div class="section" id="new-design-list-tests-run-a-subset">
<h2>New design: list tests, run a subset</h2>
<p>During this quarter, I had to fix dozens of reference leaks but also tests
failing with "environment changed": one test method modified "something". It
was really painful to identify each time the failing test.</p>
<p>So I created bpo-29512 on 2017-02-09 to ask again for such a tool. Technically, I
just wanted to run a subset of tests.</p>
<p>While working on OpenStack, I enjoyed the <tt class="docutils literal">testr</tt> tool, a test runner able to
list tests and to run a subset of tests. <tt class="docutils literal">testr</tt> also provides a bisection
tool to identify a subset of tests enough to reproduce a bug. The subset can
contain more than a single test. Sometimes you need to run two tests
sequentially to trigger a specific bug, and it's usually long and boring to
identify these two tests manually.</p>
<p>I proposed a similar design for my bisection tool. Start by listing all tests,
and then:</p>
<ul class="simple">
<li>create a pure <em>random</em> sample of tests: a subset half the size of the
current test set</li>
<li>If tests still fail, use the subset as the new set. Otherwise, throw the
subset away.</li>
<li>Loop until the subset is small enough or the process has run for more than
100 iterations.</li>
</ul>
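<p>The bisection loop above can be sketched in a few lines of Python. This is a
simplified illustration: run_tests() and the other names here are stand-ins,
not the actual test.bisect code:</p>

```python
# Simplified sketch of the bisection algorithm described above: repeatedly
# try a random half of the test set, and keep the half when the failure
# still reproduces. run_tests() stands in for "run this subset of tests
# and report whether they fail".
import random

def bisect_tests(tests, run_tests, max_tests=1, max_iter=100):
    tests = list(tests)
    for _ in range(max_iter):
        if len(tests) <= max_tests:
            break
        # Pure random sample: half the size of the current test set
        subset = random.sample(tests, len(tests) // 2)
        if run_tests(subset):
            # Tests still fail: use the subset as the new set
            tests = subset
        # Otherwise: throw the subset away and draw a new one
    return tests

# Toy failure: the "bug" triggers whenever test_access is in the subset
all_tests = ["test_%d" % i for i in range(256)] + ["test_access"]
failing = bisect_tests(all_tests, lambda subset: "test_access" in subset)
print(failing)  # usually narrowed down to just ['test_access']
```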
</div>
<div class="section" id="regrtest-list-cases">
<h2>regrtest --list-cases</h2>
<p>To list tests, I created bpo-30523 and wrote a patch for the unittest module.
Modifying unittest didn't work well with doctests and the command line
interface (CLI) didn't work as I wanted. I proposed to modify regrtest instead
of unittest.</p>
<p>I proposed that <strong>Louie Lu</strong> implement my new idea. I was impressed that he
implemented it so quickly and that it worked so well! I just asked him to not
exclude doctest test cases, since these test cases were working as expected! I
quickly merged his modified patch which adds the <tt class="docutils literal"><span class="pre">--list-cases</span></tt> option to
regrtest.</p>
<p>Note: regrtest already had a <tt class="docutils literal"><span class="pre">--list-tests</span></tt> option which lists test <em>files</em>, whereas
<tt class="docutils literal"><span class="pre">--list-cases</span></tt> lists test <em>methods</em> and doctests.</p>
</div>
<div class="section" id="regrtest-matchfile">
<h2>regrtest --matchfile</h2>
<p>I created bpo-30540 to add a --matchfile option to regrtest. regrtest already
had a --match option, but it was only possible to use the option once, and I
wanted to use a text file for my list of tests.</p>
<p>Again, I was surprised that the feature was so simple to implement. By the
way, I modified regrtest --match to allow specifying the option multiple
times, to run multiple tests instead of a single one.</p>
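<p>The idea of the --matchfile filter can be illustrated with a short sketch:
read fnmatch-style patterns (one per line in the real option) and keep only
the matching test identifiers. This is a simplified illustration, not the
actual regrtest code:</p>

```python
# Simplified sketch of a --matchfile-style filter: keep only the test
# identifiers matching at least one fnmatch-style pattern. In regrtest,
# the patterns would be read from the --matchfile text file, one per line.
import fnmatch

def match_tests(test_ids, patterns):
    return [test_id for test_id in test_ids
            if any(fnmatch.fnmatch(test_id, pattern) for pattern in patterns)]

test_ids = [
    "test.test_os.FileTests.test_access",
    "test.test_os.TestSendfile.test_keywords",
    "test.test_email.test_email.TestFeedParsers.test_long_lines",
]
patterns = ["*test_os*"]
print(match_tests(test_ids, patterns))  # keeps only the two test_os tests
```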
</div>
<div class="section" id="new-test-bisect-tool">
<h2>New test.bisect tool</h2>
<p>Since I had the two key features: <tt class="docutils literal">regrtest <span class="pre">--list-cases</span></tt> and <tt class="docutils literal">regrtest
<span class="pre">--matchfile</span></tt>, it became trivial to implement the bisection tool. I wrote a
first prototype. The "prototype" worked much better than expected.</p>
<p>My first version required a text file listing test cases. I modified it to run
the new <tt class="docutils literal"><span class="pre">--list-cases</span></tt> command automatically.</p>
<p>I extended the tool to not only track reference leaks, but also "environment
changed" failures like finding a test which creates a file but doesn't remove
it.</p>
<p>I was asked to add this tool in the Python stdlib, so I added it as
<tt class="docutils literal">Lib/test/bisect.py</tt> to use it with:</p>
<pre class="literal-block">
python3 -m test.bisect ...
</pre>
<p>The test.bisect CLI is similar to the test CLI on purpose.</p>
</div>
<div class="section" id="reference-leak-example">
<h2>Reference leak example</h2>
<p>I modified <tt class="docutils literal">test_access()</tt> of test_os to add manually a reference leak:</p>
<pre class="literal-block">
$ ./python -m test -R 3:3 test_os
(...)
test_os leaked [1, 1, 1] references, sum=3
test_os leaked [1, 1, 1] memory blocks, sum=3
test_os failed in 33 sec
(...)
</pre>
<p>Just replace <tt class="docutils literal"><span class="pre">-m</span> test</tt> with <tt class="docutils literal"><span class="pre">-m</span> test.bisect</tt> in the command, and you get
the guilty method:</p>
<pre class="literal-block">
$ ./python -m test.bisect -R 3:3 test_os
Start bisection with 257 tests
Test arguments: -R 3:3 test_os
Bisection will stop when getting 1 or less tests (-n/--max-tests option), or after 100 iterations (-N/--max-iter option)
[+] Iteration 1: run 128 tests/257
+ /home/haypo/prog/python/master/python -m test --matchfile /tmp/tmpvbraed7h -R 3:3 test_os
(...)
Tests succeeded: skip this subtest, try a new subbset
[+] Iteration 2: run 128 tests/257
+ /home/haypo/prog/python/master/python -m test --matchfile /tmp/tmpcjqtzgfe -R 3:3 test_os
(...)
Tests failed: use this new subtest
[+] Iteration 3: run 64 tests/128
(...)
[+] Iteration 15: run 1 tests/2
(...)
Tests (1):
* test.test_os.FileTests.test_access
Bisection completed in 16 iterations and 0:03:10
</pre>
<p>The <tt class="docutils literal">test.bisect</tt> command found the bug I introduced:
<tt class="docutils literal">test.test_os.FileTests.test_access</tt>.</p>
<p>The command takes a few minutes, but I don't care about its performance as long
as it's fully automated! If you use the <tt class="docutils literal"><span class="pre">-o</span> file</tt> option, each time the tool is
able to reduce the size of the test set, it writes the new list of tests to
disk. So even if the tool crashes or fails to narrow things down to a single
failing test, it already helps!</p>
<p>I am now very happy that <tt class="docutils literal">test.bisect</tt> works better than I expected. So I
backported it to 2.7, 3.5, 3.6 and master branches, since I want to fix <em>all</em>
buildbot failures on <em>all</em> maintained branches.</p>
</div>
<div class="section" id="environment-changed-example">
<h2>Environment changed example</h2>
<p>While running the previous example, I noticed the following warning:</p>
<pre class="literal-block">
Warning -- threading_cleanup() failed to cleanup 0 threads after 3 sec (count: 0, dangling: 2)
</pre>
<p>Using the new <tt class="docutils literal"><span class="pre">--fail-env-changed</span></tt> option, it is now possible to check which
test of test_os emits such warning:</p>
<pre class="literal-block">
haypo@selma$ ./python -m test.bisect --fail-env-changed -R 3:3 test_os
(...)
Tests (1):
* test.test_os.TestSendfile.test_keywords
Bisection completed in 14 iterations and 0:03:27
</pre>
<p>I never trust anything, so let's confirm the bug:</p>
<pre class="literal-block">
haypo@selma$ ./python -m test --fail-env-changed -R 3:3 test_os -m test.test_os.TestSendfile.test_keywords
Run tests sequentially
0:00:00 load avg: 0.33 [1/1] test_os
Warning -- threading_cleanup() failed to cleanup 0 threads after 3 sec (count: 0, dangling: 2)
beginning 6 repetitions
123456
Warning -- threading_cleanup() failed to cleanup 0 threads after 3 sec (count: 0, dangling: 2)
.
Warning -- threading_cleanup() failed to cleanup 0 threads after 3 sec (count: 0, dangling: 2)
.Warning -- threading_cleanup() failed to cleanup 0 threads after 3 sec (count: 0, dangling: 2)
.Warning -- threading_cleanup() failed to cleanup 0 threads after 3 sec (count: 0, dangling: 2)
.Warning -- threading_cleanup() failed to cleanup 0 threads after 3 sec (count: 0, dangling: 2)
.Warning -- threading_cleanup() failed to cleanup 0 threads after 3 sec (count: 0, dangling: 2)
.
test_os failed (env changed)
1 test altered the execution environment:
test_os
Total duration: 21 sec
Tests result: ENV CHANGED
</pre>
<p>OK, right: there is something wrong with test_keywords(). I just opened
<a class="reference external" href="http://bugs.python.org/issue30908">bpo-30908</a>.</p>
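<p>For the record, the "dangling" threads in the warning are simply threads that
are still alive when a test returns. A minimal reproducer of the pattern
(hypothetical, not the actual test_keywords() code):</p>

```python
import threading
import time

def leaky_test():
    # Start a worker thread and return without joining it: the thread is
    # still alive after the "test", which regrtest reports as dangling.
    t = threading.Thread(target=time.sleep, args=(0.5,))
    t.start()
    return t

before = set(threading.enumerate())
t = leaky_test()
dangling = [th for th in threading.enumerate() if th not in before]
t.join()  # clean up so this snippet itself doesn't leak a thread
```

regrtest's threading_cleanup() does essentially this comparison before and
after each test, and waits a few seconds for stragglers before warning.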
</div>
My contributions to CPython during 2017 Q12017-07-05T12:00:00+02:002017-07-05T12:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-07-05:/contrib-cpython-2017q1.html<p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q1
(January, February, March):</p>
<ul class="simple">
<li>Statistics</li>
<li>Optimization</li>
<li>Tricky bug</li>
<li>FASTCALL optimizations</li>
<li>Stack consumption</li>
<li>Contributions</li>
<li>os.urandom() and getrandom()</li>
<li>Migration to GitHub</li>
<li>Enhancements</li>
<li>Security</li>
<li>regrtest</li>
<li>Bugfixes</li>
</ul>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q4.html">My contributions to CPython during 2016 Q4</a>. Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part1.html">My contributions to
CPython during 2017 Q2 (part 1 …</a></p><p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q1
(January, February, March):</p>
<ul class="simple">
<li>Statistics</li>
<li>Optimization</li>
<li>Tricky bug</li>
<li>FASTCALL optimizations</li>
<li>Stack consumption</li>
<li>Contributions</li>
<li>os.urandom() and getrandom()</li>
<li>Migration to GitHub</li>
<li>Enhancements</li>
<li>Security</li>
<li>regrtest</li>
<li>Bugfixes</li>
</ul>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q4.html">My contributions to CPython during 2016 Q4</a>. Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part1.html">My contributions to
CPython during 2017 Q2 (part 1)</a>.</p>
<div class="section" id="statistics">
<h2>Statistics</h2>
<pre class="literal-block">
# All commits
$ git log --after=2016-12-31 --before=2017-04-01 --reverse --branches='*' --author=Stinner > 2017Q1
$ grep '^commit ' 2017Q1|wc -l
121
# Exclude merges
$ git log --no-merges --after=2016-12-31 --before=2017-04-01 --reverse --branches='*' --author=Stinner|grep '^commit '|wc -l
105
# master branch (excluding merges)
$ git log --no-merges --after=2016-12-31 --before=2017-04-01 --reverse --author=Stinner origin/master|grep '^commit '|wc -l
98
# Only merges
$ git log --merges --after=2016-12-31 --before=2017-04-01 --reverse --branches='*' --author=Stinner|grep '^commit '|wc -l
16
</pre>
<p>Statistics: <strong>98</strong> commits in the master branch, 16 merge commits (done using
Mercurial before the migration to GitHub, and then converted to Git), and 7
other commits (likely backports), total: <strong>121</strong> commits.</p>
</div>
<div class="section" id="optimization">
<h2>Optimization</h2>
<p>With the work done in 2016 on FASTCALL, it became much easier to optimize code
by using the new FASTCALL API.</p>
<div class="section" id="python-slots">
<h3>Python slots</h3>
<p>Issue #29507: I worked with <strong>INADA Naoki</strong> to continue the work he did with
<strong>Yury Selivanov</strong> on optimizing method calls. We optimized "slots" implemented
in Python. Slots are an internal mechanism used to call "dunder" methods like
<tt class="docutils literal">__getitem__()</tt>.</p>
<p>For Python methods, get the unbound Python function and prepend arguments with
<em>self</em>, rather than calling the descriptor which creates a temporary
PyMethodObject.</p>
<p>Add a new _PyObject_FastCall_Prepend() function used to call the unbound Python
method with <em>self</em>. It avoids the creation of a temporary tuple to pass
positional arguments.</p>
<p>Avoiding a temporary PyMethodObject and a temporary tuple makes Python slots up
to <strong>1.46x faster</strong>. Microbenchmark on a <tt class="docutils literal">__getitem__()</tt> method implemented
in Python:</p>
<pre class="literal-block">
Median +- std dev: 121 ns +- 5 ns -> 82.8 ns +- 1.0 ns: 1.46x faster (-31%)
</pre>
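<p>The trick can be illustrated in pure Python: calling the dunder through the
type with an explicit <em>self</em> gives the same result while skipping the
temporary bound method:</p>

```python
class Seq:
    def __getitem__(self, index):
        return index * 2

obj = Seq()

# Regular lookup: the descriptor protocol creates a temporary bound method
# (a PyMethodObject at the C level) before the call.
r1 = obj.__getitem__(3)

# What the optimized slot does, conceptually: fetch the unbound function
# from the type and prepend self to the positional arguments.
r2 = type(obj).__getitem__(obj, 3)

assert r1 == r2 == 6
```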
</div>
<div class="section" id="struct-module">
<h3>struct module</h3>
<p>In the issue #29300, <strong>Serhiy Storchaka</strong> and I converted most methods of the
C <tt class="docutils literal">_struct</tt> module to Argument Clinic to make them use the FASTCALL calling
convention. Using METH_FASTCALL avoids the creation of a temporary tuple to pass
positional arguments and so is faster. For example, <tt class="docutils literal"><span class="pre">struct.pack("i",</span> 1)</tt>
becomes <strong>1.56x faster</strong> (-36%):</p>
<pre class="literal-block">
$ ./python -m perf timeit \
-s 'import struct; pack=struct.pack' 'pack("i", 1)' \
--compare-to=../default-ref/python
Median +- std dev: 119 ns +- 1 ns -> 76.8 ns +- 0.4 ns: 1.56x faster (-36%)
Significant (t=295.91)
</pre>
<p>The difference is only <tt class="docutils literal">42.2 ns</tt>, but since the function only takes <tt class="docutils literal">76.8
ns</tt>, the difference is significant. The speedup can also be explained by more
efficient functions used to parse arguments. The new functions now use a cache
on the format string.</p>
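<p>That caching can also be made explicit in user code by precompiling the format
with <tt class="docutils literal">struct.Struct</tt>, which skips even the cache lookup:</p>

```python
import struct

# struct.pack() caches compiled format strings internally; a precompiled
# Struct object makes this explicit and avoids the per-call cache lookup.
packer = struct.Struct("i")
assert packer.pack(1) == struct.pack("i", 1)
assert packer.size == 4  # native C int on common platforms
```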
</div>
<div class="section" id="deque-module">
<h3>deque module</h3>
<p>I made a similar change to the deque type: the index(), insert() and
rotate() methods now use METH_FASTCALL. Speedup:</p>
<ul class="simple">
<li>d.index(): <strong>1.24x faster</strong></li>
<li>d.rotate(1): 1.24x faster</li>
<li>d.insert(): 1.18x faster</li>
<li>d.rotate(): 1.10x faster</li>
</ul>
</div>
</div>
<div class="section" id="tricky-bug">
<h2>Tricky bug</h2>
<div class="section" id="test-exceptions-test-unraisable">
<h3>test_exceptions.test_unraisable()</h3>
<p>The optimization on Python slots (issue #29507) caused a regression in the
test_unraisable() unit test of test_exceptions.</p>
<p>The <tt class="docutils literal">test_unraisable()</tt> method expects that <tt class="docutils literal">PyErr_WriteUnraisable(method)</tt>
fails on <tt class="docutils literal">repr(method)</tt>.</p>
<p>Before the change, <tt class="docutils literal">slot_tp_finalize()</tt> called
<tt class="docutils literal">PyErr_WriteUnraisable()</tt> with a PyMethodObject. In this case,
<tt class="docutils literal">repr(method)</tt> calls <tt class="docutils literal">repr(self)</tt> which is <tt class="docutils literal">BrokenRepr.__repr__()</tt> and
this call raises a new exception.</p>
<p>After the change, <tt class="docutils literal">slot_tp_finalize()</tt> uses an unbound method:
<tt class="docutils literal">repr()</tt> is called on a regular <tt class="docutils literal">__del__()</tt> method which doesn't call
<tt class="docutils literal">repr(self)</tt> and so <tt class="docutils literal">repr()</tt> doesn't fail anymore.</p>
<p>The fix is to remove the BrokenRepr unit test, since
<tt class="docutils literal">PyErr_WriteUnraisable()</tt> doesn't call <tt class="docutils literal">__repr__()</tt> anymore.</p>
<p>The removed test was really implementation specific, and my optimization
"fixed" the bug or "broke" the test. It's hard to say :-)</p>
</div>
<div class="section" id="unittest-assertraises-reference-cycle">
<h3>unittest assertRaises() reference cycle</h3>
<p>In April 2015, <strong>Vjacheslav Fyodorov</strong> reported a reference cycle in the
assertRaises() method of the unittest module: bpo-23890.</p>
<p>When the context manager API of the <tt class="docutils literal">assertRaises()</tt> method is used, the
context manager returns an object which contains the exception. So the
exception is kept alive longer than usual.</p>
<p>Python 3 exceptions now store traceback objects which contain local variables.
If a function stores the current exception in a local variable and the frame of
this function is part of the traceback, we get a reference cycle:</p>
<blockquote>
exception -> traceback -> frame -> variable -> exception</blockquote>
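<p>The cycle is easy to reproduce in a few lines (a minimal sketch of the
pattern, not the unittest code itself):</p>

```python
def make_exc():
    saved = None
    try:
        raise ValueError("boom")
    except ValueError as exc:
        saved = exc  # store the current exception in a local variable
    return saved

err = make_exc()
frame = err.__traceback__.tb_frame
# The traceback keeps the frame alive, and the frame's locals still
# reference the exception: exception -> traceback -> frame -> exception.
assert frame.f_locals["saved"] is err
```

Setting <tt class="docutils literal">saved = None</tt> in a <tt class="docutils literal">finally</tt> block, as in the fix below, breaks the
cycle.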
<p>I fixed the reference cycle by manually clearing local variables. Example of
change of my commit:</p>
<pre class="literal-block">
try:
return context.handle('assertRaises', args, kwargs)
finally:
# bpo-23890: manually break a reference cycle
context = None
</pre>
<p>It's not the first time that I fixed such a reference cycle in the unittest
module. My previous fix was issue #19880: fix a reference leak in
unittest.TestCase. Explicitly break reference cycles between frames and the
<tt class="docutils literal">_Outcome</tt> instance: commit <a class="reference external" href="https://github.com/python/cpython/commit/031bd532c48cf20a9cbf438bdae75dde49e36c51">031bd532</a>.</p>
</div>
</div>
<div class="section" id="fastcall-optimizations">
<h2>FASTCALL optimizations</h2>
<p>FASTCALL is my project to avoid creating a temporary tuple to pass positional
arguments and a temporary dictionary to pass keyword arguments when calling a
function. It optimizes function calls in general.</p>
<p>I continued work on FASTCALL to optimize code further and use FASTCALL in more
cases.</p>
<div class="section" id="recursion-depth">
<h3>Recursion depth</h3>
<p>In the issue #29306, I fixed the usage of Py_EnterRecursiveCall() to correctly
account for the recursion depth, fixing the code responsible for preventing C
stack overflows:</p>
<ul class="simple">
<li><tt class="docutils literal"><span class="pre">*PyCFunction_*Call*()</span></tt> functions now call <tt class="docutils literal">Py_EnterRecursiveCall()</tt>.</li>
<li><tt class="docutils literal">PyObject_Call()</tt> now directly calls <tt class="docutils literal">_PyFunction_FastCallDict()</tt> and
<tt class="docutils literal">PyCFunction_Call()</tt> to avoid calling <tt class="docutils literal">Py_EnterRecursiveCall()</tt> twice per
function call.</li>
</ul>
</div>
<div class="section" id="support-position-arguments">
<h3>Support positional arguments</h3>
<p>The issue #29286 enhanced Argument Clinic to use FASTCALL for functions which
only accept positional arguments:</p>
<ul class="simple">
<li>Rename _PyArg_ParseStack to _PyArg_ParseStackAndKeywords</li>
<li>Add _PyArg_ParseStack() helper function</li>
<li>Add _PyArg_NoStackKeywords() helper function.</li>
<li>Add _PyArg_UnpackStack() function helper</li>
<li>Argument Clinic: Use the METH_FASTCALL calling convention instead of
METH_VARARGS to parse positional arguments and to parse "boring" positional
arguments.</li>
</ul>
</div>
<div class="section" id="functions-converted-to-fastcall">
<h3>Functions converted to FASTCALL</h3>
<ul class="simple">
<li>_hashopenssl module</li>
<li>collections.OrderedDict methods (some of them, not all)</li>
<li>__build_class__(), getattr(), next() and sorted() builtin functions</li>
<li>type_prepare() C function, used in type constructor</li>
<li>dict.get() and dict.setdefault() now use Argument Clinic. The signature of
docstrings is also enhanced. For example, <tt class="docutils literal"><span class="pre">get(...)</span></tt> becomes
<tt class="docutils literal">get(self, key, default=None, /)</tt>. Add also a note explaining why
dict_update() doesn't use METH_FASTCALL.</li>
</ul>
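<p>The docstring signature improvement on dict.get() is visible from Python via
<tt class="docutils literal">inspect</tt>:</p>

```python
import inspect

# After the Argument Clinic conversion, dict.get() exposes a real
# text signature with its default value and positional-only marker.
sig = str(inspect.signature(dict.get))
assert "default=None" in sig
assert sig.endswith("/)")  # "/" marks positional-only parameters
```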
</div>
<div class="section" id="optimizations">
<h3>Optimizations</h3>
<ul class="simple">
<li>Issue #28839: Optimize function_call(), now simply calls
_PyFunction_FastCallDict() which is more efficient (fast paths for the common
case, optimized code object and no keyword argument).</li>
<li>Issue #28839: Optimize _PyFunction_FastCallDict() when kwargs is an empty
dictionary: avoid the creation of a useless empty tuple.</li>
<li>Issue #29259: Write fast path in _PyCFunction_FastCallKeywords() for
METH_FASTCALL, avoid the creation of a temporary dictionary for keyword
arguments.</li>
<li>Issue #29259, #29263: methoddescr_call() creates a PyCFunction object, calls
it and then destroys it. Add a new _PyMethodDef_RawFastCallDict() function to
avoid the temporary PyCFunction object.</li>
<li>PyCFunction_Call() now calls _PyCFunction_FastCallDict()</li>
<li>bpo-29735: Optimize partial_call(): avoid tuple. Add _PyObject_HasFastCall().
Fix also a performance regression in partial_call() if the callable doesn't
support FASTCALL.</li>
</ul>
</div>
<div class="section" id="bugfixes">
<h3>Bugfixes</h3>
<ul class="simple">
<li>Issue #29286: _PyStack_UnpackDict() now returns -1 on error. Change
_PyStack_UnpackDict() prototype to be able to notify of failure when args is
NULL.</li>
<li>Fix a PyCFunction_Call() performance issue. Issue #29259, #29465:
PyCFunction_Call() no longer creates a redundant tuple to pass positional
arguments for METH_VARARGS. Add a new cfunction_call()
subfunction.</li>
</ul>
</div>
<div class="section" id="objects-call-c-file">
<h3>Objects/call.c file</h3>
<p>The issue #29465 moved all C "call" functions to a new
Objects/call.c file. Moving all these functions to the same place should help
keep the code consistent. It might also help the compiler to inline code more
easily, or help to keep more machine code in the CPU instruction cache.</p>
<p>This change was made during the GitHub migration. Since the change is big
(modify many <tt class="docutils literal">.c</tt> files), I got many conflicts and it was annoying to rebase
it. I am now happy to get this <tt class="docutils literal">call.c</tt> file, it already helped me :-)</p>
<p>Having <tt class="docutils literal">call.c</tt> also helps to keep helper functions near their
callers, and avoids exposing them in the C API, even as private
functions.</p>
</div>
<div class="section" id="don-t-optimize-keywords">
<h3>Don't optimize keywords</h3>
<ul class="simple">
<li>Document that _PyFunction_FastCallDict() must copy kwargs. Issue #29318:
Caller and callee functions must not share the dictionary: kwargs must be
copied.</li>
<li>Document why functools.partial() must copy kwargs. Add a comment to prevent
further attempts to avoid a copy for optimization.</li>
</ul>
</div>
</div>
<div class="section" id="stack-consumption">
<h2>Stack consumption</h2>
<p>A FASTCALL micro-optimization was blocked by Serhiy Storchaka because it
increased the C stack consumption. In the past, I never analyzed the C stack
consumption. Since I wanted to get this micro-optimization merged, I tried to
reduce the consumption.</p>
<p>At the beginning, I wrote a function to <strong>measure</strong> the C stack consumption in
a reliable way. It took me a few iterations.</p>
<p>Table showing the C stack consumption in bytes, and the difference compared to
Python 3.5 (last release before I started working on FASTCALL):</p>
<table border="1" class="docutils">
<colgroup>
<col width="27%" />
<col width="22%" />
<col width="7%" />
<col width="22%" />
<col width="22%" />
</colgroup>
<thead valign="bottom">
<tr><th class="head">Function</th>
<th class="head">2.7</th>
<th class="head">3.5</th>
<th class="head">3.6</th>
<th class="head">3.7</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>test_python_call</td>
<td>1,360 (<strong>+352</strong>)</td>
<td>1,008</td>
<td>1,120 (<strong>+112</strong>)</td>
<td>960 (<strong>-48</strong>)</td>
</tr>
<tr><td>test_python_getitem</td>
<td>1,408 (<strong>+288</strong>)</td>
<td>1,120</td>
<td>1,168 (<strong>+48</strong>)</td>
<td>880 (<strong>-240</strong>)</td>
</tr>
<tr><td>test_python_iterator</td>
<td>1,424 (<strong>+192</strong>)</td>
<td>1,232</td>
<td>1,200 (<strong>-32</strong>)</td>
<td>1,024 (<strong>-208</strong>)</td>
</tr>
<tr><td>Total</td>
<td>4,192 (<strong>+832</strong>)</td>
<td>3,360</td>
<td>3,488 (<strong>+128</strong>)</td>
<td>2,864 (<strong>-496</strong>)</td>
</tr>
</tbody>
</table>
<p>Table showing the number of function calls before a stack overflow,
and the difference compared to Python 3.5:</p>
<table border="1" class="docutils">
<colgroup>
<col width="24%" />
<col width="23%" />
<col width="7%" />
<col width="23%" />
<col width="23%" />
</colgroup>
<thead valign="bottom">
<tr><th class="head">Function</th>
<th class="head">2.7</th>
<th class="head">3.5</th>
<th class="head">3.6</th>
<th class="head">3.7</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>test_python_call</td>
<td>6,161 (<strong>-2,153</strong>)</td>
<td>8,314</td>
<td>7,482 (<strong>-832</strong>)</td>
<td>8,729 (<strong>+415</strong>)</td>
</tr>
<tr><td>test_python_getitem</td>
<td>5,951 (<strong>-1,531</strong>)</td>
<td>7,482</td>
<td>7,174 (<strong>-308</strong>)</td>
<td>9,522 (<strong>+2,040</strong>)</td>
</tr>
<tr><td>test_python_iterator</td>
<td>5,885 (<strong>-916</strong>)</td>
<td>6,801</td>
<td>6,983 (<strong>+182</strong>)</td>
<td>8,184 (<strong>+1,383</strong>)</td>
</tr>
<tr><td>Total</td>
<td>17,997 (<strong>-4600</strong>)</td>
<td>22,597</td>
<td>21,639 (<strong>-958</strong>)</td>
<td>26,435 (<strong>+3,838</strong>)</td>
</tr>
</tbody>
</table>
<p>Python 3.7 is the best of 2.7, 3.5, 3.6 and 3.7: lowest stack consumption and
maximum number of calls (before a stack overflow) ;-)</p>
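<p>The C-level measurements above require a dedicated harness, but a rough
Python-level analogue of the second table (counting calls before the
interpreter gives up) is easy to write:</p>

```python
import sys

def calls_before_limit():
    # Count how many nested Python calls fit before RecursionError.
    # This measures the interpreter's recursion limit, which exists
    # precisely to keep the C stack from overflowing.
    depth = 0
    def recurse():
        nonlocal depth
        depth += 1
        recurse()
    try:
        recurse()
    except RecursionError:
        pass
    return depth

n = calls_before_limit()
assert 0 < n <= sys.getrecursionlimit()
```

Note that this counts Python frames against sys.getrecursionlimit(), while
the table counts actual C stack usage, so the absolute numbers differ.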
<p>Changes:</p>
<ul class="simple">
<li>call_method() now uses _PyObject_FastCall(). Issue #29233: Replace the
inefficient _PyObject_VaCallFunctionObjArgs() with _PyObject_FastCall() in
call_method() and call_maybe().</li>
<li>Issue #29227: Inline call_function() into _PyEval_EvalFrameDefault() using
Py_LOCAL_INLINE to reduce the stack consumption.</li>
<li>Issue #29234: Inlining _PyStack_AsTuple() into callers increases their stack
consumption, so disable inlining to reduce the stack consumption. Add
_Py_NO_INLINE: use __attribute__((noinline)) with GCC and Clang.</li>
</ul>
</div>
<div class="section" id="contributions">
<h2>Contributions</h2>
<ul class="simple">
<li>Issue #28961: Fix unittest.mock._Call helper: don't ignore the name parameter
anymore. Patch written by <strong>Jiajun Huang</strong>.</li>
<li>Prohibit implicit C function declarations. Issue #27659: use
-Werror=implicit-function-declaration when possible (GCC and Clang, but it
depends on the compiler version). Patch written by <strong>Chi Hsuan Yen</strong>.</li>
</ul>
</div>
<div class="section" id="os-urandom-and-getrandom">
<h2>os.urandom() and getrandom()</h2>
<p>As usual, I had fun with os.urandom() in this quarter (see my previous article
on urandom: <a class="reference external" href="https://vstinner.github.io/pep-524-os-urandom-blocking.html">PEP 524: os.urandom() now blocks on Linux in Python 3.6</a>).</p>
<p>The glibc developers succeeded in implementing a getrandom() function in glibc
2.25 (February 2017) to expose the "new" Linux getrandom() syscall which was
introduced in Linux 3.17 (August 2014). Read the LWN article: <a class="reference external" href="https://lwn.net/Articles/711013/">The long road to
getrandom() in glibc</a>.</p>
<p>I created the issue #29157 because my os.urandom() implementation wasn't ready
for the addition of a getrandom() function to the glibc on Linux. My
implementation using the getrandom() function didn't handle the ENOSYS error
(syscall not supported), raised when Python is compiled on a recent kernel and
glibc but run on an older kernel and glibc.</p>
<p>I rewrote the code to prefer getrandom() over getentropy():</p>
<ul class="simple">
<li>dev_urandom() now calls py_getentropy(): prepare the fallback to support a
getentropy() failure by falling back on reading from /dev/urandom.</li>
<li>Simplify dev_urandom(). pyurandom() is now responsible to call getentropy()
or getrandom(). Enhance also dev_urandom() and pyurandom() documentation.</li>
<li>getrandom() is now preferred over getentropy(). glibc 2.24 implements
getentropy() on Linux using the getrandom() syscall, but getentropy()
doesn't support non-blocking mode. Since getrandom() is tried first, it's no
longer needed to explicitly exclude getentropy() on Solaris. Replace
"if defined(HAVE_GETENTROPY) && !defined(sun)"
with "if defined(HAVE_GETENTROPY)".</li>
<li>Enhance the py_getrandom() documentation. py_getentropy() now supports the
ENOSYS, EPERM and EINTR errors.</li>
</ul>
<p>IMHO the main enhancement was the documentation (comments) of the code. The
main function pyurandom() now has this long comment:</p>
<blockquote>
<p>Read random bytes:</p>
<ul class="simple">
<li>Return 0 on success</li>
<li>Raise an exception (if raise is non-zero) and return -1 on error</li>
</ul>
<p>Used sources of entropy ordered by preference, preferred source first:</p>
<ul class="simple">
<li>CryptGenRandom() on Windows</li>
<li>getrandom() function (ex: Linux and Solaris): call py_getrandom()</li>
<li>getentropy() function (ex: OpenBSD): call py_getentropy()</li>
<li>/dev/urandom device</li>
</ul>
<p>Read from the /dev/urandom device if getrandom() or getentropy() function
is not available or does not work.</p>
<p>Prefer getrandom() over getentropy() because getrandom() supports blocking
and non-blocking mode: see the PEP 524. Python requires non-blocking RNG at
startup to initialize its hash secret, but os.urandom() must block until the
system urandom is initialized (at least on Linux 3.17 and newer).</p>
<p>Prefer getrandom() and getentropy() over reading directly /dev/urandom
because these functions don't need file descriptors and so avoid ENFILE or
EMFILE errors (too many open files): see the issue #18756.</p>
<p>Only the getrandom() function supports non-blocking mode.</p>
<p>Only use RNG running in the kernel. They are more secure because it is
harder to get the internal state of a RNG running in the kernel land than a
RNG running in the user land. The kernel has a direct access to the hardware
and has access to hardware RNG, they are used as entropy sources.</p>
<p>Note: the OpenSSL RAND_pseudo_bytes() function does not automatically reseed
its RNG on fork(), two child processes (with the same pid) generate the same
random numbers: see issue #18747. Kernel RNGs don't have this issue,
they have access to good quality entropy sources.</p>
<p>If raise is zero:</p>
<ul class="simple">
<li>Don't raise an exception on error</li>
<li>Don't call the Python signal handler (don't call PyErr_CheckSignals()) if
a function fails with EINTR: retry directly the interrupted function</li>
<li>Don't release the GIL to call functions.</li>
</ul>
</blockquote>
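<p>The fallback chain can be sketched at the Python level. This is a hypothetical
simplification of the C code: os.getrandom() only exists on Linux (exposed in
Python 3.6), and the CryptGenRandom branch for Windows is omitted:</p>

```python
import os

def read_random(n):
    # Sketch of the preference order: getrandom() first, then the
    # /dev/urandom device.
    getrandom = getattr(os, "getrandom", None)
    if getrandom is not None:
        try:
            return getrandom(n)
        except OSError:
            pass  # e.g. ENOSYS: built on a recent kernel, run on an old one
    with open("/dev/urandom", "rb") as fp:
        return fp.read(n)

data = read_random(16)
```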
</div>
<div class="section" id="migration-to-github">
<h2>Migration to GitHub</h2>
<p>In February 2017, the Mercurial repository was converted to Git and the
development of CPython moved to GitHub at <a class="reference external" href="https://github.com/python/cpython/">https://github.com/python/cpython/</a>. I
helped to polish the migration in early days:</p>
<ul class="simple">
<li>Rename README to README.rst and enhance formatting</li>
<li>bpo-29527: Don't treat warnings as error in Travis docs job</li>
<li>Travis CI: run rstlint.py in the docs job. Currently,
<a class="reference external" href="http://buildbot.python.org/all/buildslaves/ware-docs">http://buildbot.python.org/all/buildslaves/ware-docs</a> buildbot is only run as
post-commit. For example, bpo-29521 (PR#41) introduced two warnings which
were not caught by the Travis CI docs job. Modify the docs job to run
tools/rstlint.py. Also fix the two minor warnings which caused the buildbot
slave to fail. Doc/Makefile: set PYTHON to python3.</li>
<li>Add Travis CI and Codecov badges to README.</li>
<li>Exclude myself from mention-bot. I made changes in almost all CPython files
last 5 years, so mention-bot asks me to review basically all pull requests. I
simply don't have the bandwidth to review everything, sorry! I prefer to
select myself which PR I want to follow.</li>
<li>bpo-27425: Add .gitattributes, fix Windows tests. Mark binary files as binary
in .gitattributes so that Git doesn't translate newline characters in
repositories on Windows.</li>
</ul>
</div>
<div class="section" id="enhancements">
<h2>Enhancements</h2>
<ul class="simple">
<li>Issue #29259: python-gdb.py now also looks for PyCFunction in the current
frame, not only in the older frame. python-gdb.py now also supports
method-wrapper (wrapperobject) objects (Issue #29367).</li>
<li>Issue #26273: Document the new TCP_USER_TIMEOUT and TCP_CONGESTION constants</li>
<li>bpo-29919: Remove unused imports found by pyflakes. Make also minor PEP8
coding style fixes on modified imports.</li>
<li>bpo-29887: Test normalization now fails if download fails; fix also a
ResourceWarning.</li>
</ul>
</div>
<div class="section" id="security">
<h2>Security</h2>
<ul class="simple">
<li>Backport for Python 3.4. Issues #27850 and #27766: Remove 3DES from ssl
default cipher list and add ChaCha20 Poly1305. See the <a class="reference external" href="http://python-security.readthedocs.io/vuln/cve-2016-2183_sweet32_attack_des_3des.html">CVE-2016-2183:
Sweet32 attack (DES, 3DES)</a>
vulnerability.</li>
</ul>
</div>
<div class="section" id="regrtest">
<h2>regrtest</h2>
<p>regrtest is the runner of the Python test suite. Changes:</p>
<ul class="simple">
<li>regrtest: don't fail immediately if a child process crashes. Issue #29362:
Catch a crash of a worker process as a normal failure and continue to run the
next tests. It allows getting the usual test summary: single line result
(OK/FAIL), total duration, etc.</li>
<li>Fix regrtest -j0 -R output: also write dots to stderr, instead of stdout.</li>
</ul>
</div>
<div class="section" id="bugfixes-1">
<h2>Bugfixes</h2>
<ul class="simple">
<li>Issue #29140: Fix hash(datetime.time). Fix time_hash() function: replace
DATE_xxx() macros with TIME_xxx() macros. Before, the hash function used a
wrong value for the microseconds if fold was set (equal to 1).</li>
<li>Issue #29174, #26741: Fix subprocess.Popen.__del__() on Python shutdown.
subprocess.Popen.__del__() now keeps a strong reference to the warnings.warn()
function. The change allows logging the warning late during Python
finalization. Before, the warning was ignored, or an error was logged instead
of the warning.</li>
<li>Issue #25591: Fix test_imaplib if the module ssl is missing.</li>
<li>Fix script_helper.run_python_until_end(): copy the <tt class="docutils literal">SYSTEMROOT</tt> environment
variable. Windows requires at least the SYSTEMROOT environment variable to
start Python. If run_python_until_end() doesn't copy SYSTEMROOT, the
function always fails on Windows.</li>
<li>Fix datetime.fromtimestamp(): check bounds. Issue #29100: Fix
datetime.fromtimestamp() regression introduced in Python 3.6.0: check minimum
and maximum years.</li>
<li>Fix test_datetime on system with 32-bit time_t. Issue #29100: Catch
OverflowError in the new test_timestamp_limits() test.</li>
<li>Fix test_datetime on Windows. Issue #29100: On Windows,
datetime.datetime.fromtimestamp(min_ts) fails with an OSError in
test_timestamp_limits().</li>
<li>bpo-29176: Fix the name of the _curses.window class. Set name to
<tt class="docutils literal">_curses.window</tt> instead of <tt class="docutils literal">_curses.curses window</tt> with a space!?</li>
<li>bpo-29619: os.stat() and os.DirEntry.inode() now convert the inode (st_ino)
using unsigned integers to support very large inodes (larger than 2^31).</li>
</ul>
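<p>The hash fix on the first item can be checked directly: two times which differ
only by <tt class="docutils literal">fold</tt> compare equal, so they must hash equal too:</p>

```python
from datetime import time

t0 = time(1, 2, 3, 4, fold=0)
t1 = time(1, 2, 3, 4, fold=1)

assert t0 == t1              # fold is ignored in comparisons
assert hash(t0) == hash(t1)  # equal objects must have equal hashes
```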
</div>
speed.python.org results: March 20172017-03-29T00:40:00+02:002017-03-29T00:40:00+02:00Victor Stinnertag:vstinner.github.io,2017-03-29:/speed-python-org-march-2017.html<p>In February 2017, CPython moved from Bitbucket with Mercurial to GitHub with
Git: read <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-February/147381.html">[Python-Dev] CPython is now on GitHub</a> by
Brett Cannon.</p>
<p>In 2016, I worked on speed.python.org to automate running benchmarks and make
benchmarks more stable. At the end, I had a single command to:</p>
<ul class="simple">
<li>tune …</li></ul><p>In February 2017, CPython moved from Bitbucket with Mercurial to GitHub with
Git: read <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-February/147381.html">[Python-Dev] CPython is now on GitHub</a> by
Brett Cannon.</p>
<p>In 2016, I worked on speed.python.org to automate running benchmarks and make
benchmarks more stable. At the end, I had a single command to:</p>
<ul class="simple">
<li>tune the system for benchmarks</li>
<li>compile CPython using LTO+PGO</li>
<li>install CPython</li>
<li>install performance</li>
<li>run performance</li>
<li>upload results</li>
</ul>
<p>But my tools were written for Mercurial, and speed.python.org uses Mercurial
revisions as keys for changes. Since the CPython repository was converted to
Git, I have to remove all old results and run the old benchmarks again. But
before removing everything, I took screenshots of the most interesting pages. I
would prefer to keep a copy of all data, but that would require writing new
tools and I am not motivated to do that.</p>
<div class="section" id="python-3-7-compared-to-python-2-7">
<h2>Python 3.7 compared to Python 2.7</h2>
<p>Benchmarks where Python 3.7 is <strong>faster</strong> than Python 2.7:</p>
<img alt="python37_faster_py27" src="https://vstinner.github.io/images/speed2017/python37_faster_py27.png" />
<p>Benchmarks where Python 3.7 is <strong>slower</strong> than Python 2.7:</p>
<img alt="python37_slower_py27" src="https://vstinner.github.io/images/speed2017/python37_slower_py27.png" />
</div>
<div class="section" id="significant-optimizations">
<h2>Significant optimizations</h2>
<p>CPython became regularly faster in 2016 on the following benchmarks.</p>
<p>call_method: the main optimization was <a class="reference external" href="https://bugs.python.org/issue26110">Speedup method calls 1.2x</a>:</p>
<img alt="call_method" src="https://vstinner.github.io/images/speed2017/call_method.png" />
<p>float:</p>
<img alt="float" src="https://vstinner.github.io/images/speed2017/float.png" />
<p>hexiom:</p>
<img alt="hexiom" src="https://vstinner.github.io/images/speed2017/hexiom.png" />
<p>nqueens:</p>
<img alt="nqueens" src="https://vstinner.github.io/images/speed2017/nqueens.png" />
<p>pickle_list, something happened near September 2016:</p>
<img alt="pickle_list" src="https://vstinner.github.io/images/speed2017/pickle_list.png" />
<p>richards:</p>
<img alt="richards" src="https://vstinner.github.io/images/speed2017/richards.png" />
<p>scimark_lu, I like the last dot!</p>
<img alt="scimark_lu" src="https://vstinner.github.io/images/speed2017/scimark_lu.png" />
<p>scimark_sor:</p>
<img alt="scimark_sor" src="https://vstinner.github.io/images/speed2017/scimark_sor.png" />
<p>sympy_sum:</p>
<img alt="sympy_sum" src="https://vstinner.github.io/images/speed2017/sympy_sum.png" />
<p>telco is one of the most impressive: it became regularly faster:</p>
<img alt="telco" src="https://vstinner.github.io/images/speed2017/telco.png" />
<p>unpickle_list, something happened between March and May 2016:</p>
<img alt="unpickle_list" src="https://vstinner.github.io/images/speed2017/unpickle_list.png" />
</div>
<div class="section" id="the-enum-change">
<h2>The enum change</h2>
<p>One change related to the <tt class="docutils literal">enum</tt> module had significant impact on the two
following benchmarks.</p>
<p>python_startup:</p>
<img alt="python_startup" src="https://vstinner.github.io/images/speed2017/python_startup.png" />
<p>See "Python startup performance regression" section of <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q4.html">My contributions to
CPython during 2016 Q4</a> for the
explanation on changes around September 2016.</p>
<p>regex_compile became 1.2x slower (312 ms => 376 ms: +20%) because constants
of the <tt class="docutils literal">re</tt> module became <tt class="docutils literal">enum</tt> objects: see <a class="reference external" href="http://bugs.python.org/issue28082">convert re flags to (much
friendlier) IntFlag constants (issue #28082)</a>.</p>
<img alt="regex_compile" src="https://vstinner.github.io/images/speed2017/regex_compile.png" />
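<p>The change is visible from Python: the module-level flags are now members of
<tt class="docutils literal">re.RegexFlag</tt>:</p>

```python
import re

# Since the bpo-28082 change, re's flags are IntFlag members with a
# readable repr instead of bare integers.
assert isinstance(re.IGNORECASE, re.RegexFlag)
assert re.IGNORECASE | re.MULTILINE == re.RegexFlag(re.I | re.M)
```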
</div>
<div class="section" id="benchmarks-became-stable">
<h2>Benchmarks became stable</h2>
<p>The following benchmarks are microbenchmarks which are impacted by many
external factors, so it's hard to get stable results. I'm happy to see that the
results are now stable, even very stable compared to the results when I started
to work on the project!</p>
<p>call_simple:</p>
<img alt="call_simple" src="https://vstinner.github.io/images/speed2017/call_simple.png" />
<p>spectral_norm:</p>
<img alt="spectral_norm" src="https://vstinner.github.io/images/speed2017/spectral_norm.png" />
</div>
<div class="section" id="straight-line">
<h2>Straight line</h2>
<p>It seems like no optimization had a significant impact on the following
benchmarks. You can also see that the benchmarks became stable, making it
easier to detect a performance regression or a significant optimization.</p>
<p>dulwich_log:</p>
<img alt="dulwich_log" src="https://vstinner.github.io/images/speed2017/dulwich_log.png" />
<p>pidigits:</p>
<img alt="pidigits" src="https://vstinner.github.io/images/speed2017/pidigits.png" />
<p>sqlite_synth:</p>
<img alt="sqlite_synth" src="https://vstinner.github.io/images/speed2017/sqlite_synth.png" />
<p>Apart from something around April 2016, the tornado_http result is stable:</p>
<img alt="tornado_http" src="https://vstinner.github.io/images/speed2017/tornado_http.png" />
</div>
<div class="section" id="unstable-benchmarks">
<h2>Unstable benchmarks</h2>
<p>After months of effort to make everything stable, some benchmarks are still
unstable, even if temporary spikes are lower than before. See <a class="reference external" href="https://vstinner.github.io/analysis-python-performance-issue.html">Analysis of a
Python performance issue</a>
for the size of previous temporary performance spikes.</p>
<p>regex_v8:</p>
<img alt="regex_v8" src="https://vstinner.github.io/images/speed2017/regex_v8.png" />
<p>scimark_sparse_mat_mult:</p>
<img alt="scimark_sparse_mat_mult" src="https://vstinner.github.io/images/speed2017/scimark_sparse_mat_mult.png" />
<p>unpickle_pure_python:</p>
<img alt="unpickle_pure_python" src="https://vstinner.github.io/images/speed2017/unpickle_pure_python.png" />
</div>
<div class="section" id="boring-results">
<h2>Boring results</h2>
<p>There is nothing interesting to say on the following benchmark results.</p>
<p>2to3:</p>
<img alt="2to3" src="https://vstinner.github.io/images/speed2017/2to3.png" />
<p>crypto_pyaes:</p>
<img alt="crypto_pyaes" src="https://vstinner.github.io/images/speed2017/crypto_pyaes.png" />
<p>deltablue:</p>
<img alt="deltablue" src="https://vstinner.github.io/images/speed2017/deltablue.png" />
<p>logging_silent:</p>
<img alt="logging_silent" src="https://vstinner.github.io/images/speed2017/logging_silent.png" />
<p>mako:</p>
<img alt="mako" src="https://vstinner.github.io/images/speed2017/mako.png" />
<p>xml_etree_process:</p>
<img alt="xml_etree_process" src="https://vstinner.github.io/images/speed2017/xml_etree_process.png" />
<p>xml_etree_iterparse:</p>
<img alt="xml_etre_iterparse" src="https://vstinner.github.io/images/speed2017/xml_etre_iterparse.png" />
</div>
FASTCALL issues2017-02-25T00:00:00+01:002017-02-25T00:00:00+01:00Victor Stinnertag:vstinner.github.io,2017-02-25:/fastcall-issues.html<p>Here is the raw list of the 46 CPython issues I opened between 2016-04-21 and
2017-02-10 to implement my FASTCALL optimization. Most issues created in 2016
are already part of Python 3.6.0, some are already merged into the future
Python 3.7, the few remaining issues are still …</p><p>Here is the raw list of the 46 CPython issues I opened between 2016-04-21 and
2017-02-10 to implement my FASTCALL optimization. Most issues created in 2016
are already part of Python 3.6.0, some are already merged into the future
Python 3.7, the few remaining issues are still open.</p>
<div class="section" id="fastcall-issues-1">
<h2>27 FASTCALL issues</h2>
<ul class="simple">
<li>2016-04-21: <a class="reference external" href="http://bugs.python.org/issue26814">[WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments</a></li>
<li>2016-05-26: <a class="reference external" href="http://bugs.python.org/issue27128">Add _PyObject_FastCall()</a></li>
<li>2016-08-20: <a class="reference external" href="http://bugs.python.org/issue27809">Add _PyFunction_FastCallDict(): fast call with keyword arguments as a dict</a></li>
<li>2016-08-20: <a class="reference external" href="http://bugs.python.org/issue27810">Add METH_FASTCALL: new calling convention for C functions</a></li>
<li>2016-08-22: <a class="reference external" href="http://bugs.python.org/issue27830">Add _PyObject_FastCallKeywords(): avoid the creation of a temporary dictionary for keyword arguments</a></li>
<li>2016-08-23: <a class="reference external" href="http://bugs.python.org/issue27840">functools.partial: don't copy keywoard arguments in partial_call()?</a> [<strong>REJECTED</strong>]</li>
<li>2016-08-23: <a class="reference external" href="http://bugs.python.org/issue27841">Use fast call in method_call() and slot_tp_new()</a></li>
<li>2016-08-23: <a class="reference external" href="http://bugs.python.org/issue27845">Optimize update_keyword_args() function</a></li>
<li>2016-11-22: <a class="reference external" href="http://bugs.python.org/issue28770">Update python-gdb.py for fastcalls</a></li>
<li>2016-11-30: <a class="reference external" href="http://bugs.python.org/issue28839">_PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()</a> [<strong>REJECTED</strong>]</li>
<li>2016-12-02: <a class="reference external" href="http://bugs.python.org/issue28855">Compiler warnings in _PyObject_CallArg1()</a></li>
<li>2016-12-02: <a class="reference external" href="http://bugs.python.org/issue28858">Fastcall uses more C stack</a></li>
<li>2016-12-09: <a class="reference external" href="http://bugs.python.org/issue28915">Modify PyObject_CallFunction() to use fast call internally</a></li>
<li>2017-01-10: <a class="reference external" href="http://bugs.python.org/issue29227">Reduce C stack consumption in function calls</a></li>
<li>2017-01-10: <a class="reference external" href="http://bugs.python.org/issue29233">call_method(): call _PyObject_FastCall() rather than _PyObject_VaCallFunctionObjArgs()</a></li>
<li>2017-01-11: <a class="reference external" href="http://bugs.python.org/issue29234">Disable inlining of _PyStack_AsTuple() to reduce the stack consumption</a></li>
<li>2017-01-13: <a class="reference external" href="http://bugs.python.org/issue29259">Add tp_fastcall to PyTypeObject: support FASTCALL calling convention for all callable objects</a> [<strong>REJECTED</strong>]</li>
<li>2017-01-13: <a class="reference external" href="http://bugs.python.org/issue29263">Implement LOAD_METHOD/CALL_METHOD for C functions</a></li>
<li>2017-01-18: <a class="reference external" href="http://bugs.python.org/issue29306">Check usage of Py_EnterRecursiveCall() and Py_LeaveRecursiveCall() in new FASTCALL functions</a></li>
<li>2017-01-19: <a class="reference external" href="http://bugs.python.org/issue29318">Optimize _PyFunction_FastCallDict() for **kwargs</a> [<strong>REJECTED</strong>]</li>
<li>2017-01-24: <a class="reference external" href="http://bugs.python.org/issue29358">Add tp_fastnew and tp_fastinit to PyTypeObject, 15-20% faster object instanciation</a> [<strong>REJECTED</strong>]</li>
<li>2017-01-24: <a class="reference external" href="http://bugs.python.org/issue29360">_PyStack_AsDict(): Don't check if all keys are strings nor if keys are unique</a></li>
<li>2017-01-25: <a class="reference external" href="http://bugs.python.org/issue29367">python-gdb: display wrapper_call()</a></li>
<li>2017-02-05: <a class="reference external" href="http://bugs.python.org/issue29451">Use _PyArg_Parser for _PyArg_ParseStack(): support positional only arguments</a></li>
<li>2017-02-06: <a class="reference external" href="http://bugs.python.org/issue29465">Modify _PyObject_FastCall() to reduce stack consumption</a></li>
<li>2017-02-09: <a class="reference external" href="http://bugs.python.org/issue29507">Use FASTCALL in call_method() to avoid temporary tuple</a></li>
<li>2017-02-10: <a class="reference external" href="http://bugs.python.org/issue29524">Move functions to call objects into a new Objects/call.c file</a></li>
</ul>
</div>
<div class="section" id="issues-converting-functions-to-fastcall">
<h2>3 issues converting functions to FASTCALL</h2>
<ul class="simple">
<li>2017-01-16: <a class="reference external" href="http://bugs.python.org/issue29286">Use METH_FASTCALL in str methods</a></li>
<li>2017-01-18: <a class="reference external" href="http://bugs.python.org/issue29312">Use FASTCALL in dict.update()</a> [<strong>REJECTED</strong>]</li>
<li>2017-02-05: <a class="reference external" href="http://bugs.python.org/issue29452">Use FASTCALL for collections.deque methods: index, insert, rotate</a></li>
</ul>
</div>
<div class="section" id="argument-clinic-issues">
<h2>6 Argument Clinic issues</h2>
<p>Converting code to Argument Clinic converts METH_VARARGS methods to
METH_FASTCALL.</p>
<ul class="simple">
<li>2017-01-16: <a class="reference external" href="http://bugs.python.org/issue29289">Convert OrderedDict methods to Argument Clinic</a></li>
<li>2017-01-17: <a class="reference external" href="http://bugs.python.org/issue29299">Argument Clinic: Fix signature of optional positional-only arguments</a></li>
<li>2017-01-17: <a class="reference external" href="http://bugs.python.org/issue29300">Modify the _struct module to use FASTCALL and Argument Clinic</a></li>
<li>2017-01-17: <a class="reference external" href="http://bugs.python.org/issue29301">decimal: Use FASTCALL and/or Argument Clinic</a></li>
<li>2017-01-18: <a class="reference external" href="http://bugs.python.org/issue29311">Argument Clinic: convert dict methods</a></li>
<li>2017-02-02: <a class="reference external" href="http://bugs.python.org/issue29419">Argument Clinic: inline PyArg_UnpackTuple and PyArg_ParseStack(AndKeyword)?</a></li>
</ul>
</div>
<div class="section" id="other-optimization-issues">
<h2>10 other optimization issues</h2>
<ul class="simple">
<li>2016-08-24: <a class="reference external" href="http://bugs.python.org/issue27848">C function calls: use Py_ssize_t rather than C int for number of arguments</a></li>
<li>2016-09-07: <a class="reference external" href="http://bugs.python.org/issue28004">Optimize bytes.join(sequence)</a> [<strong>REJECTED</strong>]</li>
<li>2016-11-05: <a class="reference external" href="http://bugs.python.org/issue28618">Decorate hot functions using __attribute__((hot)) to optimize Python</a></li>
<li>2016-11-07: <a class="reference external" href="http://bugs.python.org/issue28637">Python startup performance regression</a></li>
<li>2016-11-25: <a class="reference external" href="http://bugs.python.org/issue28800">Add RETURN_NONE bytecode instruction</a> [<strong>REJECTED</strong>]</li>
<li>2016-11-25: <a class="reference external" href="http://bugs.python.org/issue28799">Drop CALL_PROFILE special build?</a></li>
<li>2016-12-09: <a class="reference external" href="http://bugs.python.org/issue28924">Inline PyEval_EvalFrameEx() in callers</a> [<strong>REJECTED</strong>]</li>
<li>2016-12-15: <a class="reference external" href="http://bugs.python.org/issue28977">Document PyObject_CallFunction() special case more explicitly</a></li>
<li>2017-02-06: <a class="reference external" href="http://bugs.python.org/issue29461">Experiment usage of likely/unlikely in CPython core</a></li>
<li>2017-02-08: <a class="reference external" href="http://bugs.python.org/issue29502">Should PyObject_Call() call the profiler on C functions, use C_TRACE() macro?</a></li>
</ul>
</div>
FASTCALL microbenchmarks2017-02-24T22:00:00+01:002017-02-24T22:00:00+01:00Victor Stinnertag:vstinner.github.io,2017-02-24:/fastcall-microbenchmarks.html<p>For my FASTCALL project (CPython optimization avoiding temporary tuples and
dictionaries to pass arguments), I wrote many short microbenchmarks. I grouped
them into a new Git repository: <a class="reference external" href="https://github.com/vstinner/pymicrobench">pymicrobench</a>. Benchmark results are required by
CPython developers to prove that an optimization is worth it. It's not uncommon
that I abandon a …</p><p>For my FASTCALL project (CPython optimization avoiding temporary tuples and
dictionaries to pass arguments), I wrote many short microbenchmarks. I grouped
them into a new Git repository: <a class="reference external" href="https://github.com/vstinner/pymicrobench">pymicrobench</a>. Benchmark results are required by
CPython developers to prove that an optimization is worth it. It's not uncommon
that I abandon a change because the speedup is not significant, makes CPython
slower, or because the change is too complex. Over the last 12 months, I counted
that I abandoned 9 optimization issues, rejected for different reasons, out of a
total of 46 optimization issues.</p>
<p>This article gives Python 3.7 results of these microbenchmarks compared to
Python 3.5 (before FASTCALL). I ignored 3 microbenchmarks which are between 2%
and 5% slower: the code was not optimized and the result is not significant
(less than 10% on a <em>microbenchmark</em> is not significant).</p>
<p>In the results below, the speedup is between 1.11x faster (-10%) and 1.92x
faster (-48%). It's not easy to isolate the speedup of FASTCALL alone, since
Python 3.7 gained many other optimizations after Python 3.5.</p>
<p>Using FASTCALL gives a speedup of around 20 ns, as measured on a patch
converting code to FASTCALL. It's not a lot, but many builtin functions take
less than 100 ns, so 20 ns is significant in practice! Avoiding a tuple to pass
positional arguments is interesting in itself, but FASTCALL also enables
further internal optimizations.</p>
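<p>Per-call timings like the ones in the tables below can be reproduced with the <tt class="docutils literal">timeit</tt> module; a minimal sketch (absolute numbers depend heavily on the CPU, the compiler and the Python version):</p>

```python
import timeit

# Measure the per-call cost of one of the builtin calls benchmarked below.
timer = timeit.Timer('getattr(1, "real")')
number, total = timer.autorange()  # grow the loop count until a run takes >= 0.2 s
print(f"getattr(1, 'real'): {total / number * 1e9:.1f} ns per call")
```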
<p>Microbenchmark on calling builtin functions:</p>
<table border="1" class="docutils">
<colgroup>
<col width="53%" />
<col width="11%" />
<col width="36%" />
</colgroup>
<thead valign="bottom">
<tr><th class="head">Benchmark</th>
<th class="head">3.5</th>
<th class="head">3.7</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>struct.pack("i", 1)</td>
<td>105 ns</td>
<td>77.6 ns: 1.36x faster (-26%)</td>
</tr>
<tr><td>getattr(1, "real")</td>
<td>79.4 ns</td>
<td>64.4 ns: 1.23x faster (-19%)</td>
</tr>
</tbody>
</table>
<p>Microbenchmark on calling methods of builtin types:</p>
<table border="1" class="docutils">
<colgroup>
<col width="53%" />
<col width="11%" />
<col width="36%" />
</colgroup>
<thead valign="bottom">
<tr><th class="head">Benchmark</th>
<th class="head">3.5</th>
<th class="head">3.7</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>{1: 2}.get(7, None)</td>
<td>84.9 ns</td>
<td>61.6 ns: 1.38x faster (-27%)</td>
</tr>
<tr><td>collections.deque([None]).index(None)</td>
<td>116 ns</td>
<td>87.0 ns: 1.33x faster (-25%)</td>
</tr>
<tr><td>{1: 2}.get(1)</td>
<td>79.4 ns</td>
<td>59.6 ns: 1.33x faster (-25%)</td>
</tr>
<tr><td>"a".replace("x", "y")</td>
<td>134 ns</td>
<td>101 ns: 1.33x faster (-25%)</td>
</tr>
<tr><td>b"".decode()</td>
<td>71.5 ns</td>
<td>54.5 ns: 1.31x faster (-24%)</td>
</tr>
<tr><td>b"".decode("ascii")</td>
<td>99.1 ns</td>
<td>75.7 ns: 1.31x faster (-24%)</td>
</tr>
<tr><td>collections.deque.rotate(1)</td>
<td>106 ns</td>
<td>82.8 ns: 1.28x faster (-22%)</td>
</tr>
<tr><td>collections.deque.insert()</td>
<td>778 ns</td>
<td>608 ns: 1.28x faster (-22%)</td>
</tr>
<tr><td>b"".join((b"hello", b"world") * 100)</td>
<td>4.02 us</td>
<td>3.32 us: 1.21x faster (-17%)</td>
</tr>
<tr><td>[0].count(0)</td>
<td>53.9 ns</td>
<td>46.3 ns: 1.16x faster (-14%)</td>
</tr>
<tr><td>collections.deque.rotate()</td>
<td>72.6 ns</td>
<td>63.1 ns: 1.15x faster (-13%)</td>
</tr>
<tr><td>b"".join((b"hello", b"world"))</td>
<td>102 ns</td>
<td>89.8 ns: 1.13x faster (-12%)</td>
</tr>
</tbody>
</table>
<p>Microbenchmark on builtin functions calling Python functions (callbacks):</p>
<table border="1" class="docutils">
<colgroup>
<col width="53%" />
<col width="11%" />
<col width="36%" />
</colgroup>
<thead valign="bottom">
<tr><th class="head">Benchmark</th>
<th class="head">3.5</th>
<th class="head">3.7</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>map(lambda x: x, list(range(1000)))</td>
<td>76.1 us</td>
<td>61.1 us: 1.25x faster (-20%)</td>
</tr>
<tr><td>sorted(list(range(1000)), key=lambda x: x)</td>
<td>90.2 us</td>
<td>78.2 us: 1.15x faster (-13%)</td>
</tr>
<tr><td>filter(lambda x: x, list(range(1000)))</td>
<td>81.8 us</td>
<td>73.4 us: 1.11x faster (-10%)</td>
</tr>
</tbody>
</table>
<p>Microbenchmark on calling slots (<tt class="docutils literal">__getitem__</tt>, <tt class="docutils literal">__init__</tt>, <tt class="docutils literal">__int__</tt>)
implemented in Python:</p>
<table border="1" class="docutils">
<colgroup>
<col width="53%" />
<col width="11%" />
<col width="36%" />
</colgroup>
<thead valign="bottom">
<tr><th class="head">Benchmark</th>
<th class="head">3.5</th>
<th class="head">3.7</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>Python __getitem__: obj[0]</td>
<td>167 ns</td>
<td>87.0 ns: 1.92x faster (-48%)</td>
</tr>
<tr><td>call_pyinit_kw1</td>
<td>348 ns</td>
<td>240 ns: 1.45x faster (-31%)</td>
</tr>
<tr><td>call_pyinit_kw5</td>
<td>564 ns</td>
<td>401 ns: 1.41x faster (-29%)</td>
</tr>
<tr><td>call_pyinit_kw10</td>
<td>960 ns</td>
<td>734 ns: 1.31x faster (-24%)</td>
</tr>
<tr><td>Python __int__: int(obj)</td>
<td>241 ns</td>
<td>207 ns: 1.16x faster (-14%)</td>
</tr>
</tbody>
</table>
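<p>To give an idea of what these slot microbenchmarks exercise, here is a minimal sketch of a class implementing such slots in Python (the class and attribute names are illustrative, not the exact benchmark code):</p>

```python
class Wrapper:
    """Class whose slots are implemented in Python, not in C."""

    def __init__(self, value=0):
        self.value = value

    def __getitem__(self, index):
        # obj[0] dispatches through this Python-level slot
        return self.value

    def __int__(self):
        # int(obj) dispatches through this Python-level slot
        return self.value

obj = Wrapper(42)
print(obj[0])    # calls the Python __getitem__
print(int(obj))  # calls the Python __int__
```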
<p>Microbenchmark on calling a method descriptor (static method):</p>
<table border="1" class="docutils">
<colgroup>
<col width="53%" />
<col width="11%" />
<col width="36%" />
</colgroup>
<thead valign="bottom">
<tr><th class="head">Benchmark</th>
<th class="head">3.5</th>
<th class="head">3.7</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>int.to_bytes(1, 4, "little")</td>
<td>177 ns</td>
<td>103 ns: 1.72x faster (-42%)</td>
</tr>
</tbody>
</table>
<p>Benchmarks were run on <tt class="docutils literal"><span class="pre">speed-python</span></tt>, the server used to run CPython benchmarks.</p>
The start of the FASTCALL project2017-02-16T17:00:00+01:002017-02-16T17:00:00+01:00Victor Stinnertag:vstinner.github.io,2017-02-16:/start-fastcall-project.html<div class="section" id="false-start">
<h2>False start</h2>
<p>In April 2016, I experimented with a Python change avoiding a temporary tuple
when calling functions. Builtin functions were between 20 and 50% faster!</p>
<p>Sadly, some benchmarks were randomly slower. It would take me four months to
understand why!</p>
</div>
<div class="section" id="work-on-benchmarks">
<h2>Work on benchmarks</h2>
<p>During four months, I worked on making …</p></div><div class="section" id="false-start">
<h2>False start</h2>
<p>In April 2016, I experimented with a Python change avoiding a temporary tuple
when calling functions. Builtin functions were between 20 and 50% faster!</p>
<p>Sadly, some benchmarks were randomly slower. It would take me four months to
understand why!</p>
</div>
<div class="section" id="work-on-benchmarks">
<h2>Work on benchmarks</h2>
<p>For four months, I worked on making benchmarks more stable. See my previous
blog posts:</p>
<ul class="simple">
<li><a class="reference external" href="https://vstinner.github.io/journey-to-stable-benchmark-system.html">My journey to stable benchmark, part 1 (system)</a> (May 21, 2016)</li>
<li><a class="reference external" href="https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html">My journey to stable benchmark, part 2 (deadcode)</a> (May 22, 2016)</li>
<li><a class="reference external" href="https://vstinner.github.io/journey-to-stable-benchmark-average.html">My journey to stable benchmark, part 3 (average)</a> (May 23, 2016)</li>
<li><a class="reference external" href="https://vstinner.github.io/perf-visualize-system-noise-with-cpu-isolation.html">Visualize the system noise using perf and CPU isolation</a> (June 16, 2016)</li>
<li><a class="reference external" href="https://vstinner.github.io/intel-cpus.html">Intel CPUs: P-state, C-state, Turbo Boost, CPU frequency, etc.</a> (July 15, 2015)</li>
<li><a class="reference external" href="https://vstinner.github.io/intel-cpus-part2.html">Intel CPUs (part 2): Turbo Boost, temperature, frequency and Pstate C0 bug</a>
(September 23, 2016)</li>
<li><a class="reference external" href="https://vstinner.github.io/analysis-python-performance-issue.html">Analysis of a Python performance issue</a>
(November 19, 2016)</li>
<li>...</li>
</ul>
<p>See my talk <a class="reference external" href="https://fosdem.org/2017/schedule/event/python_stable_benchmark/">How to run a stable benchmark</a> that I gave
at FOSDEM 2017 (Brussels, Belgium): slides + video. I listed all the issues
that I had to solve to get reliable benchmarks.</p>
</div>
<div class="section" id="ask-for-permission">
<h2>Ask for permission</h2>
<p>In August 2016, I confirmed that my change didn't introduce any slowdown. So I
asked for
permission on the python-dev mailing list to start pushing changes: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2016-August/145793.html">New
calling convention to avoid temporarily tuples when calling functions</a>.</p>
<p>Guido van Rossum asked me for benchmark results:</p>
<blockquote>
But is there a performance improvement?</blockquote>
</div>
<div class="section" id="benchmark-results">
<h2>Benchmark results</h2>
<p>On micro-benchmarks, FASTCALL is much faster:</p>
<ul class="simple">
<li><tt class="docutils literal">getattr(1, "real")</tt> becomes <strong>44%</strong> faster</li>
<li><tt class="docutils literal">list(filter(lambda x: x, <span class="pre">list(range(1000))))</span></tt> becomes <strong>31%</strong> faster</li>
<li><tt class="docutils literal">namedtuple.attr</tt> (read the attribute) becomes <strong>23%</strong> faster</li>
<li>...</li>
</ul>
<p>Full results:</p>
<ul class="simple">
<li><a class="reference external" href="https://bugs.python.org/issue26814#msg263999">FASTCALL compared to Python 3.6 (default branch)</a></li>
<li><a class="reference external" href="https://bugs.python.org/issue26814#msg264003">2.7 / 3.4 / 3.5 / 3.6 / 3.6 FASTCALL comparison</a></li>
</ul>
<p>On the <a class="reference external" href="https://bugs.python.org/issue26814#msg266359">CPython benchmark suite</a>, I also saw many faster
benchmarks:</p>
<ul class="simple">
<li>pickle_list: <strong>1.29x faster</strong></li>
<li>etree_generate: <strong>1.22x faster</strong></li>
<li>pickle_dict: <strong>1.19x faster</strong></li>
<li>etree_process: <strong>1.16x faster</strong></li>
<li>mako_v2: <strong>1.13x faster</strong></li>
<li>telco: <strong>1.09x faster</strong></li>
<li>...</li>
</ul>
</div>
<div class="section" id="replies-to-my-email">
<h2>Replies to my email</h2>
<p>I got two very positive replies, so I understood that it was ok.</p>
<p>Brett Cannon:</p>
<blockquote>
I just wanted to say I'm excited about this and I'm glad someone is taking
advantage of what Argument Clinic allows for and what I know Larry had
initially hoped AC would make happen!</blockquote>
<p>Yury Selivanov:</p>
<blockquote>
Exceptional results, congrats Victor. Will be happy to help with code
review.</blockquote>
</div>
<div class="section" id="real-start">
<h2>Real start</h2>
<p>That's how the FASTCALL project began for real! I started to push a long series
of patches adding new private functions, and then modified code to call these
new functions.</p>
</div>
My contributions to CPython during 2016 Q42017-02-16T11:00:00+01:002017-02-16T11:00:00+01:00Victor Stinnertag:vstinner.github.io,2017-02-16:/contrib-cpython-2016q4.html<p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2016 Q4
(October, November, December):</p>
<pre class="literal-block">
hg log -r 'date("2016-10-01"):date("2016-12-31")' --no-merges -u Stinner
</pre>
<p>Statistics: 105 non-merge commits + 31 merge commits (total: 136 commits).</p>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q3.html">My contributions to CPython during 2016 Q3</a>. Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q1.html">My contributions to
CPython during 2017 Q1</a>.</p>
<p>Table of …</p><p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2016 Q4
(October, November, December):</p>
<pre class="literal-block">
hg log -r 'date("2016-10-01"):date("2016-12-31")' --no-merges -u Stinner
</pre>
<p>Statistics: 105 non-merge commits + 31 merge commits (total: 136 commits).</p>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q3.html">My contributions to CPython during 2016 Q3</a>. Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q1.html">My contributions to
CPython during 2017 Q1</a>.</p>
<p>Table of Contents:</p>
<ul class="simple">
<li>Python startup performance regression</li>
<li>Optimizations</li>
<li>Code placement and __attribute__((hot))</li>
<li>Interesting bug: duplicated filters when tests reload the warnings module</li>
<li>Contributions</li>
<li>regrtest</li>
<li>Other changes</li>
</ul>
<div class="section" id="python-startup-performance-regression">
<h2>Python startup performance regression</h2>
<div class="section" id="regresion">
<h3>Regression</h3>
<p>My work on tracking Python performance started to become useful :-) I
identified a performance slowdown in the <tt class="docutils literal">bm_python_startup</tt> benchmark
(average time to start Python).</p>
<p>Before September 2016, startup took around <strong>17.9 ms</strong>. On September 15,
after the <a class="reference external" href="https://vstinner.github.io/cpython-sprint-2016.html">CPython sprint</a>, it was
better: <strong>13.4 ms</strong>. But suddenly, on September 19, it became much worse:
<strong>22.8 ms</strong>. What happened?</p>
<p>Timeline of Python startup performance on speed.python.org:</p>
<a class="reference external image-reference" href="https://speed.python.org/timeline/#/?exe=5&ben=python_startup&env=1&revs=50&equid=off&quarts=on&extr=on"><img alt="Timeline of Python startup performance" src="https://vstinner.github.io/images/python_startup_regression.png" /></a>
<p>I looked at commits between September 15 and September 19, and I quickly
identified the commit of the <a class="reference external" href="http://bugs.python.org/issue28082">convert re flags to (much
friendlier) IntFlag constants (issue #28082)</a>. The <tt class="docutils literal">re</tt> module now imports the
<tt class="docutils literal">enum</tt> module to get a better representation for their flags. Example:</p>
<pre class="literal-block">
$ ./python
Python 3.7.0a0
>>> import re; re.M
<RegexFlag.MULTILINE: 8>
</pre>
</div>
<div class="section" id="revert">
<h3>Revert</h3>
<p>On November 7, I opened issue #28637 proposing to revert the commit to get
back better Python startup performance. The revert was approved by Guido van
Rossum, so I pushed it.</p>
</div>
<div class="section" id="better-fix">
<h3>Better fix</h3>
<p>I also noticed that the <tt class="docutils literal">re</tt> module is not imported by default if Python is
installed or if Python is run from its source code directory. The <tt class="docutils literal">re</tt> module
is only imported by default if Python is installed in a virtual environment.</p>
<p><strong>Serhiy Storchaka</strong> proposed a change to no longer import <tt class="docutils literal">re</tt> in the
<tt class="docutils literal">site</tt> module when Python runs in a virtual environment. Since the change was
simple and the benefit obvious (one less import at startup), it was quickly merged.</p>
</div>
<div class="section" id="restore-reverted-enum-change">
<h3>Restore reverted enum change</h3>
<p>Since using <tt class="docutils literal">enum</tt> in <tt class="docutils literal">re</tt> no longer has an impact on Python startup
performance by default, the <tt class="docutils literal">enum</tt> change was restored on November 14.</p>
<p>Sadly, the <tt class="docutils literal">enum</tt> change still has an impact on performance:
<tt class="docutils literal">re.compile()</tt> became 1.2x slower (312 ms => 376 ms: +20%).</p>
<a class="reference external image-reference" href="https://speed.python.org/timeline/#/?exe=5&ben=regex_compile&env=1&revs=50&equid=off&quarts=on&extr=on"><img alt="Timeline of re.compile() performance" src="https://vstinner.github.io/images/regex_compile_perf.png" /></a>
<p>I think that it's ok since it is very easy to use precompiled regular
expressions in an application: store and reuse the result of <tt class="docutils literal">re.compile()</tt>
instead of calling <tt class="docutils literal">re.match()</tt> directly, for example.</p>
</div>
</div>
<div class="section" id="optimizations">
<h2>Optimizations</h2>
<div class="section" id="fastcall">
<h3>FASTCALL</h3>
<p>Same as in 2016 Q3: I pushed a <em>lot</em> of changes for FASTCALL optimizations, but
I will write a dedicated article later.</p>
</div>
<div class="section" id="no-int-int-micro-optimization-thank-you">
<h3>No int+int micro-optimization, thank you</h3>
<p>After 2 years of benchmarking and a huge effort to make Python benchmarks more
reliable and stable, I decided to close the issue #21955 "ceval.c: implement
fast path for integers with a single digit" as REJECTED. It became clear to me
that such a micro-optimization has no effect on non-trivial code, but only on
specially crafted micro-benchmarks. I added a comment in the C code to prevent
further optimization attempts:</p>
<pre class="literal-block">
/* NOTE(haypo): Please don't try to micro-optimize int+int on
CPython using bytecode, it is simply worthless.
See http://bugs.python.org/issue21955 and
http://bugs.python.org/issue10044 for the discussion. In short,
no patch shown any impact on a realistic benchmark, only a minor
speedup on microbenchmarks. */
</pre>
</div>
<div class="section" id="timeit">
<h3>timeit</h3>
<p>I enhanced the <tt class="docutils literal">timeit</tt> benchmark module to make it more reliable (issue
#28240):</p>
<ul class="simple">
<li>Autorange now starts with a single loop iteration instead of 10. For example,
<tt class="docutils literal">python3 <span class="pre">-m</span> timeit <span class="pre">-s</span> 'import time' 'time.sleep(1)'</tt> now only takes 4
seconds instead of 40 seconds.</li>
<li>Repeat the benchmarks 5 times by default, instead of only 3, to make
benchmarks more reliable.</li>
<li>Remove <tt class="docutils literal"><span class="pre">-c/--clock</span></tt> and <tt class="docutils literal"><span class="pre">-t/--time</span></tt> command line options which were
deprecated since Python 3.3.</li>
<li>Add a <tt class="docutils literal">nsec</tt> (nanosecond) unit to format timings.</li>
<li>Enhance formatting of raw timings in verbose mode. Add newlines to the output
for readability.</li>
</ul>
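<p>These new defaults are also visible from the Python API; a sketch of the Python 3.7 behavior:</p>

```python
import timeit

timer = timeit.Timer("sum(range(100))")

# autorange() starts with a single loop iteration and multiplies the
# loop count until one run takes at least 0.2 seconds.
number, elapsed = timer.autorange()

# repeat() now runs the benchmark 5 times by default (it was 3 before).
results = timer.repeat(number=number)
print(len(results), min(results) / number)
```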
</div>
<div class="section" id="micro-optimizations">
<h3>Micro-optimizations</h3>
<p>I also pushed two minor micro-optimizations:</p>
<ul class="simple">
<li>Use <tt class="docutils literal">PyThreadState_GET()</tt> macro in performance critical code.
<tt class="docutils literal">_PyThreadState_UncheckedGet()</tt> calls are not inlined as expected, even
when using <tt class="docutils literal">gcc <span class="pre">-O3</span></tt>.</li>
<li>Modify <tt class="docutils literal">type_setattro()</tt> to call directly
<tt class="docutils literal">_PyObject_GenericSetAttrWithDict()</tt> instead of
<tt class="docutils literal">PyObject_GenericSetAttr()</tt>. <tt class="docutils literal">PyObject_GenericSetAttr()</tt> is a thin
wrapper to <tt class="docutils literal">_PyObject_GenericSetAttrWithDict()</tt>.</li>
</ul>
</div>
</div>
<div class="section" id="code-placement-and-attribute-hot">
<h2>Code placement and __attribute__((hot))</h2>
<p>On <a class="reference external" href="https://speed.python.org/">speed.python.org</a>, I still noticed random
performance slowdowns on the evil <tt class="docutils literal">call_simple</tt> benchmark. This benchmark is
a <em>micro</em>-benchmark measuring the performance of a single Python function call:
it is CPU-bound and very small, and so very sensitive to CPU caches. I was
bitten again by a significant performance slowdown caused only by code placement.</p>
<p>It wasn't possible to use <em>Profiled Guided Optimization</em> (PGO) on the benchmark
runner, since it used Ubuntu 14.04 and GCC crashed with an "internal error".</p>
<p>So I tried something different: mark "hot functions" with
<tt class="docutils literal"><span class="pre">__attribute__((hot))</span></tt>. It's a GCC and Clang attribute helping code
placement: "hot functions" are moved to a dedicated ELF section and so are
closer in memory, and the compiler tries to optimize these functions even more.</p>
<p>The following functions are considered as hot according to statistics collected
by Linux <tt class="docutils literal">perf record</tt> and <tt class="docutils literal">perf report</tt> commands:</p>
<ul class="simple">
<li>_PyEval_EvalFrameDefault()</li>
<li>call_function()</li>
<li>_PyFunction_FastCall()</li>
<li>PyFrame_New()</li>
<li>frame_dealloc()</li>
<li>PyErr_Occurred()</li>
</ul>
<p>I added a <tt class="docutils literal">_Py_HOT_FUNCTION</tt> macro which uses <tt class="docutils literal"><span class="pre">__attribute__((hot))</span></tt> and
used <tt class="docutils literal">_Py_HOT_FUNCTION</tt> on these functions (issue #28618).</p>
<p>Read also my previous blog article <a class="reference external" href="https://vstinner.github.io/analysis-python-performance-issue.html">Analysis of a Python performance issue</a> for a deeper analysis.</p>
<p>Sadly, after I wrote this blog post and after more analysis of the <tt class="docutils literal">call_simple</tt>
benchmark results, I saw that <tt class="docutils literal"><span class="pre">__attribute__((hot))</span></tt> wasn't enough. I still
had random major performance slowdowns.</p>
<p>I decided to upgrade the performance runner to Ubuntu 16.04. It was risky
because nobody has access to the physical server, so it might have taken weeks to
repair it if I had made a mistake. Fortunately, the upgrade went smoothly and I was
able to run all benchmarks using PGO again. As expected, using PGO+LTO, the
benchmark results are more stable!</p>
</div>
<div class="section" id="interesting-bug-duplicated-filters-when-tests-reload-the-warnings-module">
<h2>Interesting bug: duplicated filters when tests reload the warnings module</h2>
<p>Python test suite has an old bug: the issue #18383 opened in July 2013.
Sometimes, the test suite emits the following warning:</p>
<pre class="literal-block">
[247/375] test_warnings
Warning -- warnings.filters was modified by test_warnings
</pre>
<p>Since it's only a warning and it only occurs in the Python test suite, it was
low priority and took 3 years to fix! It also took time to find the right
design to fix the root cause.</p>
<div class="section" id="duplicated-filters">
<h3>Duplicated filters</h3>
<p>test_warnings imports the <tt class="docutils literal">warnings</tt> module 3 times:</p>
<pre class="literal-block">
import warnings as original_warnings # Python
py_warnings = support.import_fresh_module('warnings', blocked=['_warnings']) # Python
c_warnings = support.import_fresh_module('warnings', fresh=['_warnings']) # C
</pre>
<p>The Python <tt class="docutils literal">warnings</tt> module (<tt class="docutils literal">Lib/warnings.py</tt>) installs warning filters
when the module is loaded:</p>
<pre class="literal-block">
_processoptions(sys.warnoptions)
</pre>
<p>where <tt class="docutils literal">sys.warnoptions</tt> contains the value of the <tt class="docutils literal"><span class="pre">-W</span></tt> command line option.</p>
<p>If the Python module is loaded more than once, filters are duplicated.</p>
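<p>The end result of the fixes described below can be checked directly: since
Python 3.6, adding the very same filter twice no longer duplicates it (a
sketch, assuming Python 3.6 or newer):</p>

```python
import warnings

with warnings.catch_warnings():
    warnings.resetwarnings()
    warnings.simplefilter("ignore", DeprecationWarning)
    warnings.simplefilter("ignore", DeprecationWarning)  # identical filter
    # The duplicate is removed before the filter is re-inserted at the
    # front of the list, so only one entry remains.
    print(len(warnings.filters))
```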
</div>
<div class="section" id="first-fix-use-the-right-module">
<h3>First fix: use the right module</h3>
<p>I pushed a first fix in September 2015:</p>
<p>Fix test_warnings: don't modify warnings.filters. BaseTest now ensures that
unittest.TestCase.assertWarns() uses the same warnings module as
warnings.catch_warnings(). Otherwise, warnings.catch_warnings() would be unable
to remove the added filter.</p>
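<p>The behaviour the fix relies on can be sketched in a few lines:
<tt class="docutils literal">warnings.catch_warnings()</tt> can only undo filters added through the same
warnings module object that it saved on entry:</p>

```python
import warnings

saved = list(warnings.filters)
with warnings.catch_warnings():
    # The filter is added to the same module that catch_warnings() saved...
    warnings.simplefilter("error", UserWarning)
# ...so it is removed again when the block exits and filters are restored.
assert warnings.filters == saved
```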
</div>
<div class="section" id="second-fix-don-t-add-duplicated-filters">
<h3>Second fix: don't add duplicated filters</h3>
<p>Issue #18383: the first patch was proposed by <strong>Florent Xicluna</strong> in 2013: save
the length of filters, and remove newly added filters after the <tt class="docutils literal">warnings</tt>
modules are reloaded by <tt class="docutils literal">test_warnings</tt>. In December 2014, <strong>Serhiy Storchaka</strong>
reviewed the patch: he didn't like this <em>workaround</em>, he wanted to fix the
<em>root cause</em>.</p>
<p>In March 2015, <strong>Alex Shkop</strong> proposed a patch which avoids adding duplicate
filters.</p>
<p>In September 2015, <strong>Martin Panter</strong> proposed to try to save/restore filters on
the C warnings module. I proposed something similar in issue #26742. But
this solution has the same flaw as Florent's idea: it's only a workaround.</p>
<p>Martin also proposed adding a private flag indicating that filters were already
set, to avoid adding the same filters again.</p>
<p>Finally, in May 2016, Martin updated Alex's patch avoiding duplicate filters
and pushed it.</p>
</div>
<div class="section" id="third-fix">
<h3>Third fix</h3>
<p>The filter comparison wasn't perfect: a filter can contain a precompiled
regular expression, and these objects didn't implement comparison.</p>
<p>In November 2016, I opened issue #28727 proposing to implement rich
comparison for <tt class="docutils literal">_sre.SRE_Pattern</tt>.</p>
<p>My first patch didn't implement <tt class="docutils literal">hash()</tt> and had various bugs. It took me
almost one week and 6 versions to write complete unit tests and handle all
cases: support bytes and Unicode, and handle regular expression flags.</p>
<p><strong>Serhiy Storchaka</strong> found bugs and helped me write the implementation.</p>
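<p>The result shipped in Python 3.7: compiled patterns implement equality and
<tt class="docutils literal">hash()</tt>. A quick check, assuming Python 3.7 or newer; <tt class="docutils literal">re.purge()</tt> is used
to bypass the internal pattern cache, which would otherwise return the same
object twice:</p>

```python
import re

p1 = re.compile("ab", re.IGNORECASE)
re.purge()  # drop the internal cache so the next compile builds a new object
p2 = re.compile("ab", re.IGNORECASE)

assert p1 is not p2                       # two distinct pattern objects...
assert p1 == p2 and hash(p1) == hash(p2)  # ...but equal, with matching hashes
assert p1 != re.compile("ab")             # flags take part in the comparison
```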
</div>
</div>
<div class="section" id="contributions">
<h2>Contributions</h2>
<p>As usual, I reviewed and pushed changes written by other contributors:</p>
<ul>
<li><p class="first">Issue #27896: Allow passing sphinx options to Doc/Makefile. Patch written by
<strong>Julien Palard</strong>.</p>
</li>
<li><p class="first">Issue #28476: Reuse math.factorial() in test_random.
Patch written by <strong>Francisco Couzo</strong>.</p>
</li>
<li><p class="first">Issue #28479: Fix reST syntax in windows.rst. Patch written by <strong>Julien Palard</strong>.</p>
</li>
<li><p class="first">Issue #26273: Add new constants: <tt class="docutils literal">socket.TCP_CONGESTION</tt> (Linux 2.6.13) and
<tt class="docutils literal">socket.TCP_USER_TIMEOUT</tt> (Linux 2.6.37).
Patch written by <strong>Omar Sandoval</strong>.</p>
</li>
<li><p class="first">Issue #28979: Fix What's New in Python 3.6: compact dict is not faster, but
only more compact. Patch written by <strong>Brendan Donegan</strong>.</p>
</li>
<li><p class="first">Issue #28147: Fix a memory leak in split-table dictionaries: <tt class="docutils literal">setattr()</tt>
must not convert combined table into split table.
Patch written by <strong>INADA Naoki</strong>.</p>
</li>
<li><p class="first">Issue #29109: Enhance tracemalloc documentation:</p>
<ul class="simple">
<li>Wrong parameter name, 'group_by' instead of 'key_type'</li>
<li>Don't round up numbers when explaining the examples. If they exactly match
what can be read in the script output, it is easier to understand
(4.8 MiB vs 4855 KiB)</li>
<li>Fix incorrect method link that was pointing to another module</li>
</ul>
<p>Patch written by <strong>Loic Pefferkorn</strong>.</p>
</li>
</ul>
</div>
<div class="section" id="regrtest">
<h2>regrtest</h2>
<ul class="simple">
<li>regrtest <tt class="docutils literal"><span class="pre">--fromfile</span></tt> now accepts a list of filenames, not only a list of
<em>test</em> names.</li>
<li>Issue #28409: regrtest: fix the parser of command line arguments.</li>
</ul>
</div>
<div class="section" id="other-changes">
<h2>Other changes</h2>
<ul class="simple">
<li>Fix <tt class="docutils literal">_Py_normalize_encoding()</tt> function: it was not exactly the same as
Python's <tt class="docutils literal">encodings.normalize_encoding()</tt>: the C function now also converts
to lowercase.</li>
<li>Issue #28256: Cleanup <tt class="docutils literal">_math.c</tt>: only define fallback implementations when
needed. It avoids producing dead code when the system provides the required math
functions, and so enhances the code coverage.</li>
<li>_csv: use <tt class="docutils literal">_PyLong_AsInt()</tt> to simplify the code, the function checks for
the limits of the C <tt class="docutils literal">int</tt> type.</li>
<li>Issue #28544: Fix <tt class="docutils literal">_asynciomodule.c</tt> on Windows. <tt class="docutils literal">PyType_Ready()</tt> sets
the reference to <tt class="docutils literal">&PyType_Type</tt>. <tt class="docutils literal">&PyType_Type</tt> address cannot be
resolved at compilation time (not on Windows?).</li>
<li>Issue #28082: Add basic unit tests on the new <tt class="docutils literal">re</tt> enums.</li>
<li>Issue #28691: Fix <tt class="docutils literal">warn_invalid_escape_sequence()</tt>: handle correctly
<tt class="docutils literal">DeprecationWarning</tt> raised as an exception. First clear the current
exception to replace the <tt class="docutils literal">DeprecationWarning</tt> exception with a
<tt class="docutils literal">SyntaxError</tt> exception. Unit test written by <strong>Serhiy Storchaka</strong>.</li>
<li>Issue #28023: Fix python-gdb.py on old GDB versions. Replace
<tt class="docutils literal"><span class="pre">int(value.address)+offset</span></tt> with <tt class="docutils literal">value.cast(unsigned <span class="pre">char*)+offset</span></tt>.
It seems like <tt class="docutils literal">int(value.address)</tt> fails on old GDB versions.</li>
<li>Issue #28765: <tt class="docutils literal">_sre.compile()</tt> now checks the type of <tt class="docutils literal">groupindex</tt> and
<tt class="docutils literal">indexgroup</tt> arguments. <tt class="docutils literal">groupindex</tt> must be a dictionary and <tt class="docutils literal">indexgroup</tt>
must be a tuple. Previously, <tt class="docutils literal">indexgroup</tt> was a list. Use a tuple to
reduce the memory usage.</li>
<li>Issue #28782: Fix a bug in the implementation of <tt class="docutils literal">yield from</tt>
(fix <tt class="docutils literal">_PyGen_yf()</tt> function). Fix the test checking if the next instruction
is <tt class="docutils literal">YIELD_FROM</tt>. Regression introduced by the new "WordCode" bytecode
(issue #26647). Fix reviewed by <strong>Serhiy Storchaka</strong> and <strong>Yury Selivanov</strong>.</li>
<li>Issue #28792: Remove aliases from <tt class="docutils literal">_bisect</tt>. Remove aliases from the C
module. Always implement <tt class="docutils literal">bisect()</tt> and <tt class="docutils literal">insort()</tt> aliases in
<tt class="docutils literal">bisect.py</tt>. Remove also the <tt class="docutils literal"># backward compatibility</tt> comment: there
is no plan to deprecate nor remove these aliases. When keys are equal, it
makes sense to use <tt class="docutils literal">bisect.bisect()</tt> and <tt class="docutils literal">bisect.insort()</tt>.</li>
<li>Fix a <tt class="docutils literal">ResourceWarning</tt> in <tt class="docutils literal">generate_opcode_h.py</tt>. Use a context manager
to close the Python file. Replace also <tt class="docutils literal">open()</tt> with <tt class="docutils literal">tokenize.open()</tt> to
handle coding cookie of <tt class="docutils literal">Lib/opcode.py</tt>.</li>
<li>Issue #28740: Add <tt class="docutils literal">sys.getandroidapilevel()</tt> function: return the build
time API version of Android as an integer. Function only available on
Android. The availability of this function can be tested to check if Python
is running on Android.</li>
<li>Issue #28152: Fix <tt class="docutils literal"><span class="pre">-Wunreachable-code</span></tt> warnings on Clang.<ul>
<li>Don't declare dead code when the code is compiled with Clang.</li>
<li>Replace C <tt class="docutils literal">if()</tt> with precompiler <tt class="docutils literal">#if</tt> to fix a warning on dead code
when using Clang.</li>
<li>Replace <tt class="docutils literal">0</tt> with <tt class="docutils literal">(0)</tt> to ignore a compiler warning about dead code on
<tt class="docutils literal"><span class="pre">((int)(SEM_VALUE_MAX)</span> < 0)</tt>: <tt class="docutils literal">SEM_VALUE_MAX</tt> is not negative on Linux.</li>
</ul>
</li>
<li>Issue #28835: Fix a regression introduced in <tt class="docutils literal">warnings.catch_warnings()</tt>:
call <tt class="docutils literal">warnings.showwarning()</tt> if it was overridden inside the context
manager.</li>
<li>Issue #28915: Replace <tt class="docutils literal">int</tt> with <tt class="docutils literal">Py_ssize_t</tt> in <tt class="docutils literal">modsupport</tt>.
<tt class="docutils literal">Py_ssize_t</tt> type is better for indexes. The compiler might emit more
efficient code for <tt class="docutils literal">i++</tt>. <tt class="docutils literal">Py_ssize_t</tt> is the type of a PyTuple index for
example. Replace also <tt class="docutils literal">int endchar</tt> with <tt class="docutils literal">char endchar</tt>.</li>
<li>Initialize variables to fix compiler warnings. Warnings seen on the "AMD64
Debian PGO 3.x" buildbot. The warnings are false positives, but variable
initialization should not harm performance.</li>
<li>Remove useless variable initialization. Don't initialize variables which are
not used before they are assigned.</li>
<li>Issue #28838: Cleanup <tt class="docutils literal">abstract.h</tt>. Rewrite all comments to use the same style
than other Python header files: comment functions <em>before</em> their declaration,
no newline between the comment and the declaration. Reformat some comments,
add newlines, to make them easier to read. Quote argument like 'arg' to
mention an argument in a comment.</li>
<li>Issue #28838: <tt class="docutils literal">abstract.h</tt>: remove long outdated comment. The documentation
of the Python C API is more complete and more up to date than this old
comment. Removal suggested by <strong>Antoine Pitrou</strong>.</li>
<li>python-gdb.py: catch <tt class="docutils literal">gdb.error</tt> on <tt class="docutils literal">gdb.selected_frame()</tt>.</li>
<li>Issue #28383: the <tt class="docutils literal">__hash__</tt> documentation recommended a naive XOR to
combine hashes, but this is suboptimal. Update the documentation to suggest
reusing the <tt class="docutils literal">hash()</tt> function on a tuple, with an example.</li>
</ul>
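<p>The recommended pattern looks like this (a minimal sketch): delegate to
<tt class="docutils literal">hash()</tt> on a tuple of the fields that <tt class="docutils literal">__eq__()</tt> compares. A naive XOR
would give <tt class="docutils literal">Point(1, 2)</tt> and <tt class="docutils literal">Point(2, 1)</tt> the same hash, whereas the tuple
hash is sensitive to order:</p>

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Delegate to the tuple hash instead of XORing field hashes:
        # hash((x, y)) != hash((y, x)) in general, unlike x ^ y.
        return hash((self.x, self.y))

print(len({Point(1, 2), Point(1, 2), Point(2, 1)}))  # 2
```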
</div>
<h1>My contributions to CPython during 2016 Q3</h1>
<p>Victor Stinner, 2017-02-14</p>
<p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2016 Q3
(july, august, september):</p>
<pre class="literal-block">
hg log -r 'date("2016-07-01"):date("2016-09-30")' --no-merges -u Stinner
</pre>
<p>Statistics: 161 non-merge commits + 29 merge commits (total: 190 commits).</p>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q2.html">My contributions to CPython during 2016 Q2</a>. Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q4.html">My contributions to
CPython during 2016 Q4</a>.</p>
<p>Table of Contents:</p>
<ul class="simple">
<li>Two new core developers</li>
<li>CPython sprint, September, in California</li>
<li>PEP 524: Make os.urandom() blocking on Linux</li>
<li>PEP 509: private dictionary version</li>
<li>FASTCALL: optimization avoiding temporary tuple to call functions</li>
<li>More efficient CALL_FUNCTION bytecode</li>
<li>Work on optimization</li>
<li>Interesting bug: hidden resource warnings</li>
<li>Contributions</li>
<li>Bugfixes</li>
<li>regrtest changes</li>
<li>Tests changes</li>
<li>Other changes</li>
</ul>
<div class="section" id="two-new-core-developers">
<h2>Two new core developers</h2>
<p>Two new core developers are the result of a productive third quarter of 2016.</p>
<p>On September 25, 2016, Yury Selivanov proposed to give <a class="reference external" href="https://mail.python.org/pipermail/python-committers/2016-September/004013.html">commit privileges for
INADA Naoki</a>.
Naoki became a core developer the day after!</p>
<p>On November 14, 2016, I proposed to <a class="reference external" href="https://mail.python.org/pipermail/python-committers/2016-November/004045.html">promote Xiang Zhang as a core developer</a>.
One week later, he also became a core developer! I mentored him for one
month, and later let him push changes directly.</p>
<p>Most Python core developers are men coming from North America and Europe.
INADA Naoki comes from Japan and Xiang Zhang comes from China: with more core
developers from Asia, we increased the diversity of Python core developers!</p>
</div>
<div class="section" id="cpython-sprint-september-in-california">
<h2>CPython sprint, September, in California</h2>
<p>I was invited to my first CPython sprint in September! Five days, September
5-9, at the Instagram office in California, USA. I reviewed a lot of changes and
pushed many new features! Read my previous blog post: <a class="reference external" href="https://vstinner.github.io/cpython-sprint-2016.html">CPython sprint,
september 2016</a>.</p>
</div>
<div class="section" id="pep-524-make-os-urandom-blocking-on-linux">
<h2>PEP 524: Make os.urandom() blocking on Linux</h2>
<p>I pushed the implementation of my PEP 524: read my previous blog post: <a class="reference external" href="https://vstinner.github.io/pep-524-os-urandom-blocking.html">PEP 524:
os.urandom() now blocks on Linux in Python 3.6</a>.</p>
</div>
<div class="section" id="pep-509-private-dictionary-version">
<h2>PEP 509: private dictionary version</h2>
<p>Another enhancement from my <a class="reference external" href="http://faster-cpython.readthedocs.io/fat_python.html">FAT Python</a> project: my <a class="reference external" href="https://www.python.org/dev/peps/pep-0509/">PEP 509:
Add a private version to dict</a> was
approved at the CPython sprint by Guido van Rossum.</p>
<p>The dictionary version is used by FAT Python to check quickly if a variable was
modified in a Python namespace. Technically, a Python namespace is a regular
dictionary.</p>
<p>Using the feedback from the python-ideas mailing list on the first version of
my PEP, I made further changes:</p>
<ul class="simple">
<li>Use 64-bit unsigned integers on 32-bit system: "A risk of an integer overflow
every 584 years is acceptable." Using 32-bit, an overflow occurs every 4
seconds!</li>
<li>Don't expose the version at the Python level, to prevent users from writing
optimizations based on it in Python. Reading the dictionary version in Python
is as slow as a dictionary lookup, whereas the version is usually used to
avoid a "slow" dictionary lookup. The version is only accessible at the C
level.</li>
</ul>
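<p>The quoted numbers follow from simple arithmetic, assuming a deliberately
pessimistic rate of one dictionary modification per nanosecond:</p>

```python
rate = 10**9  # hypothetical worst case: one dict modification per nanosecond

seconds_to_overflow_32 = 2**32 / rate
years_to_overflow_64 = 2**64 / rate / (365.25 * 24 * 3600)

print(seconds_to_overflow_32)  # roughly 4.3 seconds
print(years_to_overflow_64)    # roughly 584 years
```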
<p>While my experimental FAT Python static optimizer didn't convince Guido, Yury
Selivanov wrote yet another cache for global variables using the dictionary
version: <a class="reference external" href="http://bugs.python.org/issue28158">Implement LOAD_GLOBAL opcode cache</a> (sadly, not merged yet).</p>
<p>I added the private version to the builtin dict type with the issue #26058. The
global dictionary version is incremented at each dictionary creation and at
each dictionary change, and each dictionary has its own version as well.</p>
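<p>The semantics can be sketched in pure Python. This is only a toy model: the
real version tag lives in the C structure, is invisible from Python code, and
the class and attribute names below are made up for illustration:</p>

```python
class VersionedDict(dict):
    """Toy model of PEP 509: a global counter bumped on creation and change.

    Only __setitem__ and __delitem__ are instrumented here; the real C
    implementation covers every mutating operation.
    """
    _global_version = 0

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._bump()

    def _bump(self):
        VersionedDict._global_version += 1
        self.version = VersionedDict._global_version

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._bump()

    def __delitem__(self, key):
        super().__delitem__(key)
        self._bump()

ns = VersionedDict(x=1)
cached = (ns.version, ns["x"])  # guard: remember the version with the value
ns["x"] = 2                     # any change bumps the version
assert ns.version != cached[0]  # the cheap guard detects the change
```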
</div>
<div class="section" id="fastcall-optimization-avoiding-temporary-tuple-to-call-functions">
<h2>FASTCALL: optimization avoiding temporary tuple to call functions</h2>
<p>Thanks to my work on making Python benchmarks more stable, I confirmed that my
FASTCALL patches don't introduce performance regressions, and make Python
faster in some specific cases.</p>
<p>I started to push FASTCALL changes. It will take me 6 months to push most
of the changes, to fully enable FASTCALL "everywhere" in the code base and to
finish the implementation.</p>
<p>Following blog posts will describe FASTCALL changes, its history and
performance enhancements. Spoiler: Python 3.6 is fast!</p>
</div>
<div class="section" id="more-efficient-call-function-bytecode">
<h2>More efficient CALL_FUNCTION bytecode</h2>
<p>I reviewed and merged Demur Rumed's patch to make the CALL_FUNCTION opcodes
more efficient. Demur implemented the design proposed by Serhiy Storchaka.
Serhiy Storchaka also reviewed the implementation with me.</p>
<p>Issue #27213: Rework CALL_FUNCTION* opcodes to produce shorter and more
efficient bytecode:</p>
<ul class="simple">
<li><tt class="docutils literal">CALL_FUNCTION</tt> now only accepts positional arguments</li>
<li><tt class="docutils literal">CALL_FUNCTION_KW</tt> accepts positional arguments and keyword arguments,
keys of keyword arguments are packed into a constant tuple.</li>
<li><tt class="docutils literal">CALL_FUNCTION_EX</tt> is the most generic opcode: it expects a tuple and a
dict for positional and keyword arguments.</li>
</ul>
<p><tt class="docutils literal">CALL_FUNCTION_VAR</tt> and <tt class="docutils literal">CALL_FUNCTION_VAR_KW</tt> opcodes have been removed.</p>
<p>Demur Rumed also implemented "Wordcode", a new bytecode format using fixed
units of 16-bit: 8-bit opcode with 8-bit argument. Wordcode was merged in May
2016, see <a class="reference external" href="http://bugs.python.org/issue26647">issue #26647: ceval: use Wordcode, 16-bit bytecode</a>.</p>
<p>All instructions now have an argument: opcodes without an argument use the
argument <tt class="docutils literal">0</tt>. This allowed removing the following conditional code in the very
hot code of <tt class="docutils literal">Python/ceval.c</tt>:</p>
<pre class="literal-block">
if (HAS_ARG(opcode))
oparg = NEXTARG();
</pre>
<p>The bytecode is now fetched using 16-bit words, instead of loading one or two
8-bit words per instruction.</p>
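<p>The fixed-width layout is easy to see with the <tt class="docutils literal">dis</tt> module on any
Python 3.6 or newer:</p>

```python
import dis

def f(x):
    return x + 1

code = f.__code__.co_code
assert len(code) % 2 == 0  # every instruction is a 16-bit unit
for offset in range(0, len(code), 2):
    # One opcode byte followed by one oparg byte (0 when the opcode takes none)
    print(dis.opname[code[offset]], code[offset + 1])
```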
</div>
<div class="section" id="work-on-optimization">
<h2>Work on optimization</h2>
<p>I continued with work on the <a class="reference external" href="https://github.com/python/performance">performance</a> Python benchmark suite. The suite
works on CPython and PyPy, but it's maybe not fine tuned for PyPy yet.</p>
<ul class="simple">
<li>Issue #27938: Add a fast-path for us-ascii encoding</li>
<li>Issue #15369: Remove the (old version of) pybench microbenchmark. Please use
the new "performance" benchmark suite which includes a more recent version of
pybench.</li>
<li>Issue #15369. Remove old and unreliable pystone microbenchmark. Please use
the new "performance" benchmark suite which is much more reliable.</li>
</ul>
</div>
<div class="section" id="interesting-bug-hidden-resource-warnings">
<h2>Interesting bug: hidden resource warnings</h2>
<p>On 2016-08-22, I started to investigate why "Warning -- xxx was modified by
test_xxx" warnings were not logged on some buildbots (issue #27829).</p>
<p>I modified the code logging the warning to immediately flush stderr:
<tt class="docutils literal"><span class="pre">print(...,</span> flush=True)</tt>.</p>
<p>19 days later, I tried to remove a quiet flag <tt class="docutils literal"><span class="pre">-q</span></tt> on the Windows build...
but it was a mistake, this flag doesn't mean quiet in the modified batch script
:-)</p>
<p>13 days later, I finally understood that the <tt class="docutils literal"><span class="pre">-W</span></tt> option of regrtest was
eating stderr if the test passed but the environment was modified.</p>
<p>I fixed regrtest to log stderr in all cases, except when the test passes! It should
now be easier to fix "environment changed" warnings emitted by regrtest.</p>
</div>
<div class="section" id="contributions">
<h2>Contributions</h2>
<p>As usual, I reviewed and pushed changes written by other contributors:</p>
<ul class="simple">
<li>Issue #27350: I reviewed and pushed the implementation of compact
dictionaries preserving insertion order. This resulted in dictionaries using
20% to 25% less memory when compared to Python 3.5. The implementation was
written by <strong>INADA Naoki</strong>, based on the PyPy implementation, with a design
by Raymond Hettinger.</li>
<li>"make tags": remove the <tt class="docutils literal"><span class="pre">-t</span></tt> option of <tt class="docutils literal">ctags</tt>. The option was kept for
backward compatibility, but it was completely removed recently. Patch written
by <strong>Stéphane Wirtel</strong>.</li>
<li>Issue #27558: Fix a <tt class="docutils literal">SystemError</tt> in the implementation of "raise" statement.
In a brand new thread, raise a RuntimeError since there is no active
exception to reraise. Patch written by <strong>Xiang Zhang</strong>.</li>
<li>Issue #28120: Fix <tt class="docutils literal">dict.pop()</tt> for a split dictionary when trying to remove a
"pending key": a key not yet inserted in the split table. Patch by <strong>Xiang
Zhang</strong>.</li>
</ul>
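<p>A visible side effect of the compact dictionary implementation is that
dictionaries preserve insertion order (an implementation detail in Python 3.6,
and a language guarantee since Python 3.7):</p>

```python
d = {}
d["banana"] = 1
d["apple"] = 2
d["cherry"] = 3
# Iteration follows insertion order, not hash order
print(list(d))  # ['banana', 'apple', 'cherry']
```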
</div>
<div class="section" id="bugfixes">
<h2>Bugfixes</h2>
<ul>
<li><p class="first">socket: Fix <tt class="docutils literal">internal_select()</tt> function. Bug found by <strong>Pavel Belikov</strong>
("Fragment N1"): <a class="reference external" href="http://www.viva64.com/en/b/0414/#ID0ECDAE">http://www.viva64.com/en/b/0414/#ID0ECDAE</a></p>
</li>
<li><p class="first">socket: use INVALID_SOCKET.</p>
<ul class="simple">
<li>Replace <tt class="docutils literal">fd = <span class="pre">-1</span></tt> with <tt class="docutils literal">fd = INVALID_SOCKET</tt></li>
<li>Replace <tt class="docutils literal">fd < 0</tt> with <tt class="docutils literal">fd == INVALID_SOCKET</tt>:
SOCKET_T is unsigned on Windows</li>
</ul>
<p>Bug found by Pavel Belikov ("Fragment N1"):
<a class="reference external" href="http://www.viva64.com/en/b/0414/#ID0ECDAE">http://www.viva64.com/en/b/0414/#ID0ECDAE</a></p>
</li>
<li><p class="first">Issue #11048: ctypes, fix <tt class="docutils literal">CThunkObject_new()</tt></p>
<ul class="simple">
<li>Initialize restype and flags fields to fix a crash when Python runs on a
read-only file system</li>
<li>Use <tt class="docutils literal">Py_ssize_t</tt> type rather than <tt class="docutils literal">int</tt> for the <tt class="docutils literal">i</tt> iterator variable</li>
<li>Reorder assignments to be able to more easily check if all fields are
initialized</li>
</ul>
<p>Initial patch written by <strong>Marcin Bachry</strong>.</p>
</li>
<li><p class="first">Issue #27744: socket: Fix memory leak in <tt class="docutils literal">sendmsg()</tt> and
<tt class="docutils literal">sendmsg_afalg()</tt>. Release <tt class="docutils literal">msg.msg_iov</tt> memory block. Release memory
on <tt class="docutils literal">PyMem_Malloc(controllen)</tt> failure</p>
</li>
<li><p class="first">Issue #27866: ssl: Fix refleak in <tt class="docutils literal">cipher_to_dict()</tt>.</p>
</li>
<li><p class="first">Issue #28077: Fix dict type, <tt class="docutils literal">find_empty_slot()</tt> only supports combined
dictionaries.</p>
</li>
<li><p class="first">Issue #28200: Fix memory leak in <tt class="docutils literal">path_converter()</tt>. Replace
<tt class="docutils literal">PyUnicode_AsWideCharString()</tt> with <tt class="docutils literal">PyUnicode_AsUnicodeAndSize()</tt>.</p>
</li>
<li><p class="first">Issue #27955: Catch permission error (<tt class="docutils literal">EPERM</tt>) in <tt class="docutils literal">py_getrandom()</tt>.
Fallback on reading from the <tt class="docutils literal">/dev/urandom</tt> device when the <tt class="docutils literal">getrandom()</tt>
syscall fails with <tt class="docutils literal">EPERM</tt>, for example if blocked by SECCOMP.</p>
</li>
<li><p class="first">Issue #27778: Fix a memory leak in <tt class="docutils literal">os.getrandom()</tt> when the
<tt class="docutils literal">getrandom()</tt> is interrupted by a signal and a signal handler raises a
Python exception.</p>
</li>
<li><p class="first">Issue #28233: Fix <tt class="docutils literal">PyUnicode_FromFormatV()</tt> error handling. Fix a memory
leak if the format string contains a non-ASCII character: destroy the unicode
writer.</p>
</li>
</ul>
</div>
<div class="section" id="regrtest-changes">
<h2>regrtest changes</h2>
<ul class="simple">
<li>regrtest: rename <tt class="docutils literal"><span class="pre">--slow</span></tt> option to <tt class="docutils literal"><span class="pre">--slowest</span></tt> (to get same option name
than the <tt class="docutils literal">testr</tt> tool). Thanks to optparse, --slow syntax still works ;-)
Add --slowest option to buildbots. Display the top 10 slowest tests.</li>
<li>regrtest: nicer output for durations. Use milliseconds and minutes units, not
only seconds.</li>
<li>regrtest: Add a summary of the tests at the end of tests output:
"Tests result: xxx". It was sometimes hard to check quickly if tests
succeeded, failed or something bad happened.</li>
<li>regrtest: accept options after test names. For example, <tt class="docutils literal">./python <span class="pre">-m</span> test
test_os <span class="pre">-v</span></tt> runs <tt class="docutils literal">test_os</tt> in verbose mode. Before, regrtest tried to run
a test called "-v"!</li>
<li>Issue #28195: Fix <tt class="docutils literal">test_huntrleaks_fd_leak()</tt> of test_regrtest. Don't expect
the fd leak message to be on a specific line number, just make sure that the
line is present in the output.</li>
</ul>
<p>Example of a recent (2017-02-15) successful test run, truncated output:</p>
<pre class="literal-block">
...
0:08:20 [403/404] test_codecs passed
0:08:21 [404/404] test_threading passed
391 tests OK.
10 slowest tests:
- test_multiprocessing_spawn: 1 min 24 sec
- test_concurrent_futures: 1 min 3 sec
- test_multiprocessing_forkserver: 60 sec
...
13 tests skipped:
test_devpoll test_ioctl test_kqueue ...
Total duration: 8 min 22 sec
Tests result: SUCCESS
</pre>
</div>
<div class="section" id="tests-changes">
<h2>Tests changes</h2>
<ul>
<li><p class="first">script_helper: kill the subprocess on error. If Popen.communicate() raises an
exception, kill the child process to not leave a running child process in the
background and maybe create a zombie process. This change fixes a
ResourceWarning in Python 3.6 when unit tests are interrupted by CTRL+c.</p>
</li>
<li><p class="first">Issue #27181: Skip test_statistics tests known to fail until a fix is found.</p>
</li>
<li><p class="first">Issue #18401: Fix test_pdb if $HOME is not set. HOME is not set on Windows
for example.</p>
</li>
<li><p class="first">test_eintr: Fix <tt class="docutils literal">ResourceWarning</tt> warnings</p>
</li>
<li><p class="first">Buildbot: give 20 minute per test file. It seems like at least 2 buildbots
need more than 15 minutes per test file. Example with "AMD64 Snow Leop 3.x":</p>
<pre class="literal-block">
10 slowest tests:
- test_tools: 14 min 40 sec
- test_tokenize: 11 min 57 sec
- test_datetime: 11 min 25 sec
- ...
</pre>
</li>
<li><p class="first">Issue #28176: test_asyncio: fix test_sock_connect_sock_write_race(), increase
the timeout from 10 seconds to 60 seconds.</p>
</li>
</ul>
</div>
<div class="section" id="other-changes">
<h2>Other changes</h2>
<ul class="simple">
<li>Issue #22624: Python 3 now requires the <tt class="docutils literal">clock()</tt> function to build to
simplify the C code.</li>
<li>Issue #27404: tag security related changes with the "[Security]" prefix in
the changelog Misc/NEWS.</li>
<li>Issue #27776: <tt class="docutils literal">dev_urandom(raise=0)</tt> now closes the file descriptor on error</li>
<li>Issue #27128, #18295: Use <tt class="docutils literal">Py_ssize_t</tt> in <tt class="docutils literal">_PyEval_EvalCodeWithName()</tt>.
Replace <tt class="docutils literal">int</tt> type with <tt class="docutils literal">Py_ssize_t</tt> for index variables used for
positional arguments. It should help to avoid integer overflow and help to
emit better machine code for <tt class="docutils literal">i++</tt> (no trap needed for overflow). Make also
the <tt class="docutils literal">total_args</tt> variable constant.</li>
<li>Fix "make tags": set the locale to C to call sort. vim expects the tags file
to be sorted using English collation, so it fails if the locale is French, for
example. Use LC_ALL=C to force the English sorting order. Issue #27726.</li>
<li>Issue #27698: Add <tt class="docutils literal">socketpair</tt> function to <tt class="docutils literal">socket.__all__</tt> on Windows</li>
<li>Issue #27786: Simplify (optimize?) PyLongObject private function <tt class="docutils literal">x_sub()</tt>:
the <tt class="docutils literal">z</tt> variable is known to be a new object which cannot be shared,
<tt class="docutils literal">Py_SIZE()</tt> can be used directly to negate the number.</li>
<li>Fix a clang warning in grammar.c. Clang is smarter than GCC and emits a
warning for dead code on a function declared with
<tt class="docutils literal"><span class="pre">__attribute__((__noreturn__))</span></tt> (the <tt class="docutils literal">Py_FatalError()</tt> function in this
case).</li>
<li>Issue #28114: Add unit tests on <tt class="docutils literal"><span class="pre">os.spawn*()</span></tt> to prepare to fix a crash
with bytes environment.</li>
<li>Issue #28127: Add <tt class="docutils literal">_PyDict_CheckConsistency()</tt>: a function checking that a
dictionary remains consistent after any change. By default, only basic
attributes are tested; the table content is not checked because the impact on
Python performance would be too high. <tt class="docutils literal">DEBUG_PYDICT</tt> must be defined (ex:
<tt class="docutils literal">gcc <span class="pre">-D</span> DEBUG_PYDICT</tt>) to also check the dictionary content.</li>
</ul>
</div>
CPython sprint, september 20162017-02-14T18:00:00+01:002017-02-14T18:00:00+01:00Victor Stinnertag:vstinner.github.io,2017-02-14:/cpython-sprint-2016.html<p>I was invited to my first CPython sprint in September! Five days, September
5-9, at Instagram office in California, USA. The sprint was sponsored by
Instagram, Microsoft, and the PSF.</p>
<p><strong>First little game:</strong> Many happy faces, but <em>Where is Victor?</em></p>
<a class="reference external image-reference" href="http://blog.python.org/2016/09/python-core-development-sprint-2016-36.html"><img alt="CPython developers at the Facebook sprint" src="https://vstinner.github.io/images/cpython_sprint_2016_photo.jpg" /></a>
<p>IMHO it was the most productive CPython week ever :-) Having …</p><p>I was invited at my first CPython sprint in September! Five days, September
5-9, at Instagram office in California, USA. The sprint was sponsored by
Instagram, Microsoft, and the PSF.</p>
<p><strong>First little game:</strong> Many happy faces, but <em>Where is Victor?</em></p>
<a class="reference external image-reference" href="http://blog.python.org/2016/09/python-core-development-sprint-2016-36.html"><img alt="CPython developers at the Facebook sprint" src="https://vstinner.github.io/images/cpython_sprint_2016_photo.jpg" /></a>
<p>IMHO it was the most productive CPython week ever :-) Having Guido van Rossum
in a room helped to get many PEPs accepted. Having a lot of highly skilled
reviewers in the same room helped to get many new features and many PEP
implementations merged much faster than usual.</p>
<p><strong>Second little game:</strong> try to spot the sprint on the CPython commit statistics of
the last 12 months (Feb, 2016-Feb, 2017) ;-)</p>
<a class="reference external image-reference" href="https://github.com/python/cpython/graphs/commit-activity"><img alt="CPython commits statistics" src="https://vstinner.github.io/images/cpython_sprint_2016_commits.png" /></a>
<div class="section" id="compact-dict">
<h2>Compact dict</h2>
<p>Issue #27350: I reviewed and pushed the "compact dict" implementation which
makes Python dictionaries ordered (by insertion order) by default. It reduces
the memory usage of dictionaries by 20% to 25%.</p>
<p>The implementation was written by INADA Naoki, based on the PyPy
implementation, with a design by Raymond Hettinger.</p>
</div>
<div class="section" id="fastcall">
<h2>FASTCALL</h2>
<p>"Fast calls": Python 3.6 has a new private C API and a new METH_FASTCALL
calling convention which avoids a temporary tuple for positional arguments and
a temporary dictionary for keyword arguments. Changes:</p>
<ul class="simple">
<li>Add a new C calling convention: METH_FASTCALL</li>
<li>Add _PyArg_ParseStack() function</li>
<li>Add _PyCFunction_FastCallKeywords() function: issue #27810</li>
<li>Add _PyObject_FastCallKeywords() function: issue #27830</li>
</ul>
</div>
<div class="section" id="more-efficient-call-function-bytecode">
<h2>More efficient CALL_FUNCTION bytecode</h2>
<p>I reviewed and pushed: "Rework CALL_FUNCTION* opcodes to produce shorter and
more efficient bytecode" (issue #27213).</p>
<p>Patch written by Demur Rumed, designed by Serhiy Storchaka, reviewed by Serhiy
Storchaka and me.</p>
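<p>The effect of the rework can be inspected with the <tt class="docutils literal">dis</tt> module. The exact opcode names depend on the Python version (CALL_FUNCTION in 3.6, CALL in recent releases), so this sketch only looks for a CALL-family opcode:</p>

```python
import dis

# Disassemble a simple call: the positional arguments are pushed on the
# stack and consumed by a single CALL-family opcode.
code = compile("f(1, 2)", "<demo>", "eval")
opnames = [instr.opname for instr in dis.Bytecode(code)]
```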
</div>
<div class="section" id="pep-509-add-a-private-version-to-dict">
<h2>PEP 509: Add a private version to dict</h2>
<p>Guido approved my PEP 509 "Add a new private version to the builtin dict type".</p>
<p>I pushed the implementation.</p>
</div>
<div class="section" id="pep-524-make-os-urandom-blocking-on-linux">
<h2>PEP 524: Make os.urandom() blocking on Linux</h2>
<p>I pushed the implementation of my PEP 524: "Make os.urandom() blocking on
Linux".</p>
<p>Issue #27776: The os.urandom() function now blocks on Linux 3.17 and newer
until the system urandom entropy pool is initialized, to increase security.</p>
<p>Read my previous blog post for the painful story behind the PEP:
<a class="reference external" href="https://vstinner.github.io/pep-524-os-urandom-blocking.html">PEP 524: os.urandom() now blocks on Linux</a>.</p>
</div>
<div class="section" id="asynchronous-pep-525-and-530">
<h2>Asynchronous PEP 525 and 530</h2>
<p>Guido van Rossum approved two PEPs of Yury Selivanov:</p>
<ul class="simple">
<li>PEP 525: Asynchronous Generators</li>
<li>PEP 530: Asynchronous Comprehensions</li>
</ul>
<p>I reviewed the huge C implementation with Yury on my side :-)</p>
</div>
<div class="section" id="unicode-escape-codec-optimization">
<h2>unicode_escape codec optimization</h2>
<p>I reviewed and pushed "Optimize unicode_escape and raw_unicode_escape"
(issue #16334), a patch written by Serhiy Storchaka.</p>
</div>
<div class="section" id="python-3-6-bugfixes">
<h2>Python 3.6 bugfixes</h2>
<p>I happily found many issues including a major one: regular list
comprehensions were completely broken :-)</p>
<p>Another minor issue: SyntaxError didn't report the correct line number in a
specific case.</p>
<p>Don't worry, Yury fixed both ;-)</p>
</div>
<div class="section" id="official-sprint-report">
<h2>Official sprint report</h2>
<p>Read also the official report: <a class="reference external" href="http://blog.python.org/2016/09/python-core-development-sprint-2016-36.html">Python Core Development Sprint 2016: 3.6 and
beyond!</a>.</p>
</div>
PEP 524: os.urandom() now blocks on Linux in Python 3.62017-02-14T12:00:00+01:002017-02-14T12:00:00+01:00Victor Stinnertag:vstinner.github.io,2017-02-14:/pep-524-os-urandom-blocking.html<div class="section" id="getrandom-avoids-file-descriptors">
<h2>getrandom() avoids file descriptors</h2>
<p>Last years, I'm making sometimes enhancements in the Python code used to
generate random numbers, the C implementation of <tt class="docutils literal">os.urandom()</tt>. My main two
changes were to use the new <tt class="docutils literal">getentropy()</tt> and <tt class="docutils literal">getrandom()</tt> functions when
available on Linux, Solaris, OpenBSD, etc.</p>
<p>In 2013, <tt class="docutils literal">os.urandom()</tt> opened …</p></div><div class="section" id="getrandom-avoids-file-descriptors">
<h2>getrandom() avoids file descriptors</h2>
<p>In recent years, I have occasionally made enhancements to the Python code used to
generate random numbers, the C implementation of <tt class="docutils literal">os.urandom()</tt>. My two main
changes were to use the new <tt class="docutils literal">getentropy()</tt> and <tt class="docutils literal">getrandom()</tt> functions when
available on Linux, Solaris, OpenBSD, etc.</p>
<p>In 2013, <tt class="docutils literal">os.urandom()</tt> opened a file descriptor to read from
<tt class="docutils literal">/dev/urandom</tt> and then closed it. It was decided to use a single private
file descriptor and keep it open to prevent <tt class="docutils literal">EMFILE</tt> or <tt class="docutils literal">ENFILE</tt> errors
(too many open files) under high system loads with many threads: see the issue
#18756.</p>
<p>The private file descriptor introduced a backward incompatible change for badly
written programs. The code was modified to call <tt class="docutils literal">fstat()</tt> to detect whether the
file descriptor was closed and then replaced with a different file descriptor
(with the same number): it checks whether the <tt class="docutils literal">st_dev</tt> or <tt class="docutils literal">st_ino</tt> attributes changed.</p>
<p>In 2014, the new Linux kernel 3.17 added a new <tt class="docutils literal">getrandom()</tt> syscall which
gives access to random bytes without having to handle a file descriptor. I
modified <tt class="docutils literal">os.urandom()</tt> to call <tt class="docutils literal">getrandom()</tt> to avoid file descriptors,
but a different issue appeared.</p>
</div>
<div class="section" id="getrandom-hangs-at-system-startup">
<h2>getrandom() hangs at system startup</h2>
<p>On embedded devices and virtual machines, Python 3.5 started to hang at
startup.</p>
<p>On Debian, a systemd script used Python to compute a MD5 checksum, but Python
was blocked during its initialization. Other users reported that Python blocked
on importing the <tt class="docutils literal">random</tt> module, sometimes imported indirectly by a
different module.</p>
<p>Python was blocked on the <tt class="docutils literal">getrandom(0)</tt> syscall, waiting until the system
collected enough entropy to initialize the urandom pool. It took longer than 90
seconds, so systemd killed the service with a timeout. As a consequence, the
system boot took longer than 90 seconds, or could even fail!</p>
</div>
<div class="section" id="fix-python-startup">
<h2>Fix Python startup</h2>
<p>The fix was obvious: call <tt class="docutils literal">getrandom(GRND_NONBLOCK)</tt> which fails immediately
if the call would block, and fall back on reading from <tt class="docutils literal">/dev/urandom</tt> which
doesn't block even if the entropy pool is not initialized yet.</p>
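<p>A rough Python-level sketch of that fallback logic (hypothetical helper; the real code is C, and <tt class="docutils literal">os.getrandom()</tt> only exists on Linux with Python 3.6 and newer):</p>

```python
import os

def best_effort_urandom(nbytes):
    # Try the non-blocking syscall first; if the entropy pool is not
    # initialized yet, fall back on /dev/urandom, which never blocks.
    if hasattr(os, "getrandom"):
        try:
            return os.getrandom(nbytes, os.GRND_NONBLOCK)
        except BlockingIOError:
            pass
    with open("/dev/urandom", "rb") as fp:
        return fp.read(nbytes)

data = best_effort_urandom(16)
```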
<p>Quickly, our security experts complained that falling back on <tt class="docutils literal">/dev/urandom</tt>
makes Python less secure. When the fallback path is taken, <tt class="docutils literal">/dev/urandom</tt>
returns random numbers not suitable for security purposes (initialized with low
entropy), whereas the <a class="reference external" href="https://docs.python.org/dev/library/os.html#os.urandom">os.urandom() documentation</a> says: "The returned
data should be unpredictable enough for cryptographic applications" (and
"though its exact quality depends on the OS implementation.").</p>
<p>Calling <tt class="docutils literal">getrandom()</tt> in blocking mode for <tt class="docutils literal">os.urandom()</tt> makes Python more
secure, but it doesn't fix the startup bug.</p>
</div>
<div class="section" id="discussion-storm">
<h2>Discussion storm</h2>
<p>The proposed change started a huge rain of messages. More than 200 messages,
maybe even more than 500 messages, on the bug tracker and python-dev mailing
list. Everyone became a security expert and wanted to give his/her very
important opinion, without listening to other arguments.</p>
<p>Two Python security experts left the discussion.</p>
<p>I also ignored new messages. I simply did not have enough time to read all of
them, and the discussion tone made me angry.</p>
</div>
<div class="section" id="new-mailing-list-and-two-new-peps">
<h2>New mailing list and two new PEPs</h2>
<p>A new <tt class="docutils literal"><span class="pre">security-sig</span></tt> mailing list, subtitled "os.urandom rehab clinic", was
created just to take a decision on <tt class="docutils literal">os.urandom()</tt>!</p>
<p>Nick Coghlan wrote the <a class="reference external" href="https://www.python.org/dev/peps/pep-0522/">PEP 522: Allow BlockingIOError in security sensitive
APIs</a>. Basically: he considers
that there is no good default behaviour when <tt class="docutils literal">os.urandom()</tt> would block, so
raise an exception to let users decide.</p>
<p>I wrote <a class="reference external" href="https://www.python.org/dev/peps/pep-0524/">PEP 524: Make os.urandom() blocking on Linux</a>. My PEP proposes to make
<tt class="docutils literal">os.urandom()</tt> blocking, <em>but</em> also to modify Python startup to fall back on a
non-blocking RNG to initialize the secret hash seed and the <tt class="docutils literal">random</tt> module
(which is <em>not</em> security sensitive, except for <tt class="docutils literal">random.SystemRandom</tt>).</p>
<p>Nick's PEP describes an important use case: being able to check whether
<tt class="docutils literal">os.urandom()</tt> would block. Instead of adding a flag to <tt class="docutils literal">os.urandom()</tt>,
I chose to expose the low-level C
<tt class="docutils literal">getrandom()</tt> function as a new Python <tt class="docutils literal">os.getrandom()</tt> function. Calling
<tt class="docutils literal">os.getrandom(1, os.GRND_NONBLOCK)</tt> raises a <tt class="docutils literal">BlockingIOError</tt> exception,
as Nick proposed for <tt class="docutils literal">os.urandom()</tt>, so it's possible to decide what to do in
this case.</p>
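<p>With <tt class="docutils literal">os.getrandom()</tt> exposed, the check Nick's PEP cared about fits in a few lines (hypothetical helper name):</p>

```python
import os

def urandom_would_block():
    # A BlockingIOError from getrandom(GRND_NONBLOCK) means the urandom
    # entropy pool is not initialized yet, i.e. os.urandom() would block.
    if not hasattr(os, "getrandom"):
        return False  # platforms without getrandom() never block
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return False
    except BlockingIOError:
        return True

blocked = urandom_would_block()
```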
<p>While both PEPs are valid, IMHO my PEP was <em>less</em> backward incompatible,
simpler and maybe closer to what users <em>expect</em>. The "os.urandom() would block"
case is a special case with my PEP, but my PEP lets you decide what to do in
that case (thanks to <tt class="docutils literal">os.getrandom()</tt>).</p>
<p>Guido van Rossum approved my PEP and rejected Nick's PEP. I worked with Nick to
implement my PEP.</p>
</div>
<div class="section" id="python-3-6-changes">
<h2>Python 3.6 changes</h2>
<p>I added a new <tt class="docutils literal">os.getrandom()</tt> function: expose the Linux
<tt class="docutils literal">getrandom()</tt> syscall (issue #27778). I also added the two getrandom() flags:
<tt class="docutils literal">os.GRND_NONBLOCK</tt> and <tt class="docutils literal">os.GRND_RANDOM</tt>.</p>
<p>I modified <tt class="docutils literal">os.urandom()</tt> to block on Linux: call <tt class="docutils literal">getrandom(0)</tt>
instead of <tt class="docutils literal">getrandom(GRND_NONBLOCK)</tt> (issue #27776).</p>
<p>I also added a private <tt class="docutils literal">_PyOS_URandomNonblock()</tt> function used to initialize
the hash secret and used by <tt class="docutils literal">random.Random.seed()</tt> (used to initialize the
<tt class="docutils literal">random</tt> module).</p>
<p>The <tt class="docutils literal">os.urandom()</tt> function now blocks in Python 3.6 on Linux 3.17 and newer
until the system urandom entropy pool is initialized, to increase security.</p>
</div>
<div class="section" id="read-also-lwn-articles">
<h2>Read also LWN articles</h2>
<ul class="simple">
<li><a class="reference external" href="https://lwn.net/Articles/606141/">A system call for random numbers: getrandom()</a> (July 2014)</li>
<li><a class="reference external" href="https://lwn.net/Articles/693189/">Python's os.urandom() in the absence of entropy</a> (July 2016) -- this story</li>
<li><a class="reference external" href="https://lwn.net/Articles/711013/">The long road to getrandom() in glibc</a> (January 2017)</li>
</ul>
</div>
My contributions to CPython during 2016 Q22017-02-12T18:00:00+01:002017-02-12T18:00:00+01:00Victor Stinnertag:vstinner.github.io,2017-02-12:/contrib-cpython-2016q2.html<p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2016 Q2
(april, may, june):</p>
<pre class="literal-block">
hg log -r 'date("2016-04-01"):date("2016-06-30")' --no-merges -u Stinner
</pre>
<p>Statistics: 52 non-merge commits + 22 merge commits (total: 74 commits).</p>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q1.html">My contributions to CPython during 2016 Q1</a>. Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q3.html">My contributions to
CPython during 2016 Q3</a>.</p>
<div class="section" id="start-of-my-work-on-optimization">
<h2>Start of …</h2></div><p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2016 Q2
(april, may, june):</p>
<pre class="literal-block">
hg log -r 'date("2016-04-01"):date("2016-06-30")' --no-merges -u Stinner
</pre>
<p>Statistics: 52 non-merge commits + 22 merge commits (total: 74 commits).</p>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q1.html">My contributions to CPython during 2016 Q1</a>. Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q3.html">My contributions to
CPython during 2016 Q3</a>.</p>
<div class="section" id="start-of-my-work-on-optimization">
<h2>Start of my work on optimization</h2>
<p>During 2016 Q2, I started to spend more time on optimizing CPython.</p>
<p>I experimented with a change on CPython: a new FASTCALL calling convention to avoid
the creation of a temporary tuple to pass positional arguments: <a class="reference external" href="http://bugs.python.org/issue26814">issue26814</a>. Early results were really good: calling
builtin functions became between 20% and 50% faster!</p>
<p>Quickly, my optimization work was blocked by unreliable benchmarks. I spent the
rest of the year 2016 analyzing benchmarks and making benchmarks more stable.</p>
</div>
<div class="section" id="subprocess-now-emits-resourcewarning">
<h2>subprocess now emits ResourceWarning</h2>
<p>subprocess.Popen destructor now emits a ResourceWarning warning if the child
process is still running (issue #26741). The warning helps to track and fix
zombie processes. I updated asyncio to prevent a false ResourceWarning (a warning
emitted although the child process completed): asyncio now copies the child process
exit status to the internal Popen object.</p>
<p>I also fixed the POSIX implementation of subprocess.Popen._execute_child(): it
now sets the returncode attribute from the child process exit status when exec
failed.</p>
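<p>The warning is easy to trigger on purpose. A sketch (ResourceWarning is ignored by default, hence the "always" filter; the leaked child is killed afterwards):</p>

```python
import os
import signal
import subprocess
import sys
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    proc = subprocess.Popen([sys.executable, "-c",
                             "import time; time.sleep(60)"])
    pid = proc.pid
    # Dropping the last reference while the child is still running makes
    # the Popen destructor emit a ResourceWarning.
    del proc

got_warning = any(issubclass(w.category, ResourceWarning) for w in caught)

# Clean up the child process that was deliberately leaked above.
os.kill(pid, signal.SIGTERM)
if os.name == "posix":
    os.waitpid(pid, 0)
```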
</div>
<div class="section" id="security-fix-potential-shell-injections-in-ctypes-util">
<h2>Security: fix potential shell injections in ctypes.util</h2>
<p>I rewrote methods of the ctypes.util module using <tt class="docutils literal">os.popen()</tt>. I replaced
<tt class="docutils literal">os.popen()</tt> with <tt class="docutils literal">subprocess.Popen</tt> without a shell (issue #22636) to fix a
class of security vulnerability, "shell injection" (injecting arbitrary shell
commands to take control of a computer).</p>
<p>The <tt class="docutils literal">os.popen()</tt> function uses a shell, so there is a risk if the command
line arguments are not properly escaped for the shell. Using <tt class="docutils literal">subprocess.Popen</tt>
without a shell completely removes the risk.</p>
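<p>The difference is easy to demonstrate with a hypothetical hostile "library name" (echo stands in for the real command run by ctypes.util):</p>

```python
import subprocess

# A payload that a shell would interpret as two commands.
payload = "libfoo.so; rm -rf /tmp/important"

# With an argument list and no shell, the payload reaches the program as
# one literal argument: the ";" is never parsed as a command separator.
proc = subprocess.Popen(["echo", payload],
                        stdout=subprocess.PIPE,
                        universal_newlines=True)
out, _ = proc.communicate()
```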
<p>Note: the <tt class="docutils literal">ctypes</tt> module is generally not considered "safe", but it
doesn't hurt to make it more secure ;-)</p>
</div>
<div class="section" id="optimization-pymem-malloc-now-uses-pymalloc">
<h2>Optimization: PyMem_Malloc() now uses pymalloc</h2>
<p>PyMem_Malloc() now uses the fast Python "pymalloc" memory allocator which is
optimized for small objects with a short lifetime (issue #26249). The change
makes some benchmarks up to 4% faster.</p>
<p>This change was possible thanks to the whole preparation work I did in the 2016
Q1, especially the new GIL check in memory allocator debug hooks and the new
<tt class="docutils literal">PYTHONMALLOC=debug</tt> environment variable enabling these hooks on a Python
compiled in released mode.</p>
<p>I tested lxml, Pillow, cryptography and numpy before pushing the change,
as asked by Marc-Andre Lemburg. All these projects work with the change, except
for numpy. I wrote a fix for numpy: <a class="reference external" href="https://github.com/numpy/numpy/pull/7404">Use PyMem_RawMalloc on Python 3.4 and newer</a>, merged one month later (my first
contribution to numpy!).</p>
<p>The change indirectly helped to identify and fix a memory leak in the
<tt class="docutils literal">formatfloat()</tt> function used to format bytes strings: <tt class="docutils literal"><span class="pre">b"%f"</span> % 1.2</tt> (issue
#25349, #26249).</p>
</div>
<div class="section" id="optimization">
<h2>Optimization</h2>
<p>Issue #27056: Optimize pickle.load() and pickle.loads(), up to 10% faster when
deserializing a lot of small objects. I found this optimization using Linux perf
on Python compiled with PGO. My change implements the optimization manually when
Python is not compiled with PGO.</p>
<p>Issue #26770: When <tt class="docutils literal">set_inheritable()</tt> is implemented with <tt class="docutils literal">fcntl()</tt>, don't
call <tt class="docutils literal">fcntl()</tt> twice if the <tt class="docutils literal">FD_CLOEXEC</tt> flag is already set to the
requested value. Linux uses <tt class="docutils literal">ioctl()</tt> and so always needs only a single
syscall.</p>
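<p>The pickle case above can be exercised with a toy workload of many small objects:</p>

```python
import pickle

# Issue #27056 targets exactly this shape of data: deserializing a large
# number of small objects.
data = pickle.dumps([(i, str(i)) for i in range(10_000)])
objs = pickle.loads(data)
```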
</div>
<div class="section" id="changes">
<h2>Changes</h2>
<ul>
<li><p class="first">Issue #26716: Replace IOError with OSError in fcntl documentation, IOError is
a deprecated alias to OSError since Python 3.3.</p>
</li>
<li><p class="first">Issue #26639: Replace the deprecated <tt class="docutils literal">imp</tt> module with the <tt class="docutils literal">importlib</tt>
module in <tt class="docutils literal">Tools/i18n/pygettext.py</tt>. Remove <tt class="docutils literal">_get_modpkg_path()</tt>,
replaced with <tt class="docutils literal">importlib.util.find_spec()</tt>.</p>
</li>
<li><p class="first">Issue #26735: Fix os.urandom() on Solaris 11.3 and newer when reading more
than 1024 bytes: call getrandom() multiple times with a limit of 1024 bytes
per call.</p>
</li>
<li><p class="first">configure: fix <tt class="docutils literal">HAVE_GETRANDOM_SYSCALL</tt> check, syscall() function requires
<tt class="docutils literal">#include <unistd.h></tt>.</p>
</li>
<li><p class="first">Issue #26766: Fix _PyBytesWriter_Finish(). Return a bytearray object when
bytearray is requested and when the small buffer is used. Fix also
test_bytes: bytearray%args must return a bytearray type.</p>
</li>
<li><p class="first">Issue #26777: Fix random failure of test_asyncio.test_timeout_disable() on
the "AMD64 FreeBSD 9.x 3.5" buildbot:</p>
<pre class="literal-block">
File ".../Lib/test/test_asyncio/test_tasks.py", line 2398, in go
self.assertTrue(0.09 < dt < 0.11, dt)
AssertionError: False is not true : 0.11902812402695417
</pre>
<p>Replace <tt class="docutils literal">< 0.11</tt> with <tt class="docutils literal">< 0.15</tt>.</p>
</li>
<li><p class="first">Backport test_gdb fix for s390x buildbots to Python 3.5.</p>
</li>
<li><p class="first">Cleanup import.c: replace <tt class="docutils literal">PyUnicode_RPartition()</tt> with
<tt class="docutils literal">PyUnicode_FindChar()</tt> and <tt class="docutils literal">PyUnicode_Substring()</tt> to avoid the creation
of a temporary tuple. Use <tt class="docutils literal">PyUnicode_FromFormat()</tt> to build a string and
avoid the single_dot ('.') singleton.</p>
</li>
<li><p class="first">regrtest now uses subprocesses when the <tt class="docutils literal"><span class="pre">-j1</span></tt> command line option is used:
each test file runs in a fresh child process. Before, the -j1 option was
ignored. <tt class="docutils literal">Tools/buildbot/test.bat</tt> script now uses -j1 by default to run
each test file in fresh child process.</p>
</li>
<li><p class="first">regrtest: display test result (passed, failed, ...) after each test
completion. In multiprocessing mode: always display the result. In sequential
mode: only display the result if the test did not pass.</p>
</li>
<li><p class="first">Issue #27278: Fix <tt class="docutils literal">os.urandom()</tt> implementation using <tt class="docutils literal">getrandom()</tt> on
Linux. Truncate the size to <tt class="docutils literal">INT_MAX</tt> and loop until enough random bytes have
been collected, instead of directly casting a <tt class="docutils literal">Py_ssize_t</tt> to <tt class="docutils literal">int</tt>.</p>
</li>
</ul>
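<p>The chunking logic from the getrandom() fixes above can be sketched in pure Python (illustrative helper; the real code is C):</p>

```python
import os

def urandom_chunked(size, chunk=1024):
    # Collect random bytes at most `chunk` bytes per call, looping until
    # the requested size has been gathered (mirrors issues #26735/#27278).
    parts = []
    remaining = size
    while remaining > 0:
        data = os.urandom(min(remaining, chunk))
        parts.append(data)
        remaining -= len(data)
    return b"".join(parts)

blob = urandom_chunked(5000)
```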
</div>
<div class="section" id="contributions">
<h2>Contributions</h2>
<p>I also pushed a few changes written by other contributors.</p>
<p>Issue #26839: <tt class="docutils literal">os.urandom()</tt> doesn't block on Linux anymore. On Linux,
<tt class="docutils literal">os.urandom()</tt> now calls getrandom() with <tt class="docutils literal">GRND_NONBLOCK</tt> to fall back on
reading <tt class="docutils literal">/dev/urandom</tt> if the urandom entropy pool is not initialized yet.
Patch written by <strong>Colm Buckley</strong>. This issue started a huge annoying discussion
around random number generation on the bug tracker and the python-dev mailing
list. I later wrote the <a class="reference external" href="https://www.python.org/dev/peps/pep-0524/">PEP 524: Make os.urandom() blocking on Linux</a> to fix the issue!</p>
<p>Other changes:</p>
<ul class="simple">
<li>Issue #26647: Cleanup opcode: simplify code to build <tt class="docutils literal">opcode.opname</tt>. Patch
written by <strong>Demur Rumed</strong>.</li>
<li>Issue #26647: Cleanup modulefinder: use <tt class="docutils literal">dis.opmap[name]</tt> rather than
<tt class="docutils literal">dis.opname.index(name)</tt>. Patch written by <strong>Demur Rumed</strong>.</li>
<li>Issue #26801: Fix error handling in <tt class="docutils literal">shutil.get_terminal_size()</tt>: catch
AttributeError instead of NameError. Skip the functional test of test_shutil
using the <tt class="docutils literal">stty size</tt> command if the <tt class="docutils literal">os.get_terminal_size()</tt> function is
missing. Patch written by <strong>Emanuel Barry</strong>.</li>
<li>Issue #26802: Optimize function calls only using unpacking like
<tt class="docutils literal"><span class="pre">func(*tuple)</span></tt> (no other positional argument, no keyword argument): avoid
copying the tuple. Patch written by <strong>Joe Jevnik</strong>.</li>
<li>Issue #21668: Add missing libm dependency in setup.py: link audioop,
_datetime, _ctypes_test modules to libm, except on Mac OS X. Patch written by
<strong>Chi Hsuan Yen</strong>.</li>
<li>Issue #26799: Fix python-gdb.py: don't get C types at startup, only on
demand. The C types can change if python-gdb.py is loaded before loading the
Python executable in gdb. Patch written by <strong>Thomas Ilsche</strong>.</li>
<li>Issue #27057: Fix os.set_inheritable() on Android, ioctl() is blocked by
SELinux and fails with EACCESS. The function now falls back to fcntl(). Patch
written by <strong>Michał Bednarski</strong>.</li>
<li>Issue #26647: Fix typo in test_grammar. Patch written by <strong>Demur Rumed</strong>.</li>
</ul>
</div>
My contributions to CPython during 2016 Q12017-02-09T17:00:00+01:002017-02-09T17:00:00+01:00Victor Stinnertag:vstinner.github.io,2017-02-09:/contrib-cpython-2016q1.html<p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2016 Q1
(january, february, march):</p>
<pre class="literal-block">
hg log -r 'date("2016-01-01"):date("2016-03-31")' --no-merges -u Stinner
</pre>
<p>Statistics: 196 non-merge commits + 33 merge commits (total: 229 commits).</p>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2015q4.html">My contributions to CPython during 2015 Q4</a>. Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q2.html">My contributions to
CPython during 2016 Q2</a>.</p>
<div class="section" id="summary">
<h2>Summary</h2>
<p>Since …</p></div><p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2016 Q1
(january, february, march):</p>
<pre class="literal-block">
hg log -r 'date("2016-01-01"):date("2016-03-31")' --no-merges -u Stinner
</pre>
<p>Statistics: 196 non-merge commits + 33 merge commits (total: 229 commits).</p>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2015q4.html">My contributions to CPython during 2015 Q4</a>. Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q2.html">My contributions to
CPython during 2016 Q2</a>.</p>
<div class="section" id="summary">
<h2>Summary</h2>
<p>Since this report is much longer than I expected, here are the highlights:</p>
<ul class="simple">
<li>Python 8: no pep8, no chocolate!</li>
<li>AST enhancements coming from FAT Python</li>
<li>faulthandler now catches Windows fatal exceptions</li>
<li>New PYTHONMALLOC environment variable</li>
<li>tracemalloc: new C API and support for multiple address spaces</li>
<li>ResourceWarning warnings now come with a traceback</li>
<li>PyMem_Malloc() now fails if the GIL is not held</li>
<li>Interesting bug: reentrant flag in tracemalloc</li>
</ul>
</div>
<div class="section" id="python-8-no-pep8-no-chocolate">
<h2>Python 8: no pep8, no chocolate!</h2>
<p>I prepared an April Fool: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2016-March/143603.html">[Python-Dev] The next major Python version will be
Python 8</a> :-)</p>
<p>I increased Python version to 8, added the <tt class="docutils literal">pep8</tt> module and modified
<tt class="docutils literal">importlib</tt> to raise an <tt class="docutils literal">ImportError</tt> if a module is not PEP8-compliant!</p>
</div>
<div class="section" id="ast-enhancements-coming-from-fat-python">
<h2>AST enhancements coming from FAT Python</h2>
<p>Changes coming from my <a class="reference external" href="http://faster-cpython.readthedocs.io/fat_python.html">FAT Python</a> (AST optimizer, run
ahead of time):</p>
<p>The compiler now ignores constant statements like <tt class="docutils literal">b'bytes'</tt> (issue #26204).
I had to replace constant statements with expressions to prepare the change (ex:
replace <tt class="docutils literal">b'bytes'</tt> with <tt class="docutils literal">x = b'bytes'</tt>). First, the compiler emitted a
<tt class="docutils literal">SyntaxWarning</tt>, but it was quickly decided to let linters emit such
warnings to avoid annoying users: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2016-February/143163.html">read the thread on python-dev</a>.</p>
<p>Example, Python 3.5:</p>
<pre class="literal-block">
>>> def f():
... b'bytes'
...
>>> import dis; dis.dis(f)
2 0 LOAD_CONST 1 (b'bytes')
3 POP_TOP
4 LOAD_CONST 0 (None)
7 RETURN_VALUE
</pre>
<p>Python 3.6:</p>
<pre class="literal-block">
>>> def f():
... b'bytes'
...
>>> import dis; dis.dis(f)
1 0 LOAD_CONST 0 (None)
2 RETURN_VALUE
</pre>
<p>Other changes:</p>
<ul class="simple">
<li>Issue #26107: The format of the co_lnotab attribute of code objects changed
to support negative line number deltas. This allows AST optimizers to move
instructions without breaking Python tracebacks, a change needed by the loop
unrolling optimization of FAT Python.</li>
<li>Issue #26146: Add a new kind of AST node: <tt class="docutils literal">ast.Constant</tt>. It can be used by
external AST optimizers like FAT Python, but the compiler does not emit
directly such node. Update code to accept ast.Constant instead of ast.Num
and/or ast.Str.</li>
<li>Issue #26146: <tt class="docutils literal">marshal.loads()</tt> now uses the empty frozenset singleton. It
fixes a test failure in FAT Python and reduces the memory footprint.</li>
</ul>
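<p>As a rough sketch of what <tt class="docutils literal">ast.Constant</tt> enables, an external optimizer can
rewrite a parsed tree with such nodes and feed it to <tt class="docutils literal">compile()</tt>; the constant
folding below is a hypothetical example, not FAT Python's actual code:</p>

```python
import ast

# Hypothetical sketch: an external AST optimizer (like FAT Python) can
# replace an expression with an ast.Constant node; compile() accepts
# such nodes even though the Python 3.6 parser never emitted them itself.
tree = ast.parse("x = 1 + 1")
tree.body[0].value = ast.Constant(2)   # fold the BinOp into a constant
ast.fix_missing_locations(tree)        # the new node needs line numbers

ns = {}
exec(compile(tree, "<example>", "exec"), ns)
```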
</div>
<div class="section" id="faulthandler-now-catchs-windows-fatal-exceptions">
<h2>faulthandler now catches Windows fatal exceptions</h2>
<p>I enhanced the faulthandler.enable() function on Windows to set a
handler for Windows fatal exceptions using <tt class="docutils literal">AddVectoredExceptionHandler()</tt>
(issue #23848).</p>
<p>Windows exceptions are the native way to handle fatal errors on Windows,
whereas UNIX signals SIGSEGV, SIGFPE and SIGABRT are "emulated" on top of that.</p>
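<p>On the Python side, nothing changes in how the handler is enabled; a minimal
sketch:</p>

```python
import faulthandler

# faulthandler.enable() installs handlers for SIGSEGV, SIGFPE, SIGABRT
# and friends; on Windows it now also registers a fatal exception
# handler via AddVectoredExceptionHandler().
faulthandler.enable()
print(faulthandler.is_enabled())
```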
</div>
<div class="section" id="new-pythonmalloc-environment-variable">
<h2>New PYTHONMALLOC environment variable</h2>
<p>I added a new <tt class="docutils literal">PYTHONMALLOC</tt> environment variable (issue #26516) to set the
Python memory allocators.</p>
<p><tt class="docutils literal">PYTHONMALLOC=debug</tt> enables debug hooks on a Python compiled in release
mode, whereas Python 3.5 required recompiling Python in debug mode. These
hooks implement various checks:</p>
<ul class="simple">
<li>Detect <strong>buffer underflow</strong>: write before the start of the buffer</li>
<li>Detect <strong>buffer overflow</strong>: write after the end of the buffer</li>
<li>Detect API violations, ex: <tt class="docutils literal">PyObject_Free()</tt> called on a buffer
allocated by <tt class="docutils literal">PyMem_Malloc()</tt></li>
<li>Check if the GIL is held when allocator functions of PYMEM_DOMAIN_OBJ (ex:
<tt class="docutils literal">PyObject_Malloc()</tt>) and PYMEM_DOMAIN_MEM (ex: <tt class="docutils literal">PyMem_Malloc()</tt>) domains
are called</li>
</ul>
<p>Moreover, logging a fatal memory error now uses the tracemalloc module to get
the traceback where a memory block was allocated. Example of a buffer overflow
using <tt class="docutils literal">python3.6 <span class="pre">-X</span> tracemalloc=5</tt> (store 5 frames in traces):</p>
<pre class="literal-block">
Debug memory block at address p=0x7fbcd41666f8: API 'o'
    4 bytes originally requested
    The 7 pad bytes at p-7 are FORBIDDENBYTE, as expected.
    The 8 pad bytes at tail=0x7fbcd41666fc are not all FORBIDDENBYTE (0xfb):
        at tail+0: 0x02 *** OUCH
        at tail+1: 0xfb
        at tail+2: 0xfb
        ...
    The block was made by call #1233329 to debug malloc/realloc.
    Data at p: 1a 2b 30 00
Memory block allocated at (most recent call first):
  File "test/test_bytes.py", line 323
  File "unittest/case.py", line 600
  ...
Fatal Python error: bad trailing pad byte
Current thread 0x00007fbcdbd32700 (most recent call first):
  File "test/test_bytes.py", line 323 in test_hex
  File "unittest/case.py", line 600 in run
  ...
</pre>
<p><tt class="docutils literal">PYTHONMALLOC=malloc</tt> forces the usage of the system <tt class="docutils literal">malloc()</tt> allocator.
This option can be used with Valgrind. Without this option, Valgrind emits tons
of false alarms in the Python <tt class="docutils literal">pymalloc</tt> memory allocator.</p>
</div>
<div class="section" id="tracemalloc-new-c-api-and-support-multiple-address-spaces">
<h2>tracemalloc: new C API and support multiple address spaces</h2>
<p>Antoine Pitrou and Nathaniel Smith asked me to enhance the tracemalloc module:</p>
<ul class="simple">
<li>Add a C API to be able to manually track/untrack memory blocks, to track
the memory allocated by custom memory allocators. For example, numpy uses
allocators with a specific memory alignment for SIMD instructions.</li>
<li>Support tracking memory of different address spaces. For example, central
(CPU) memory and GPU memory for numpy.</li>
</ul>
<div class="section" id="support-multiple-address-spaces">
<h3>Support multiple address spaces</h3>
<p>I made deep changes in the <tt class="docutils literal">hashtable.c</tt> code (a simple C implementation of a
hash table used by <tt class="docutils literal">_tracemalloc</tt>) to support keys of variable size (issue
#26588), instead of a hardcoded <tt class="docutils literal">void *</tt> size. This makes it possible to support
keys larger than <tt class="docutils literal">sizeof(void*)</tt>, but also to use <em>less</em> memory for keys
smaller than <tt class="docutils literal">sizeof(void*)</tt> (ex: <tt class="docutils literal">int</tt> keys).</p>
<p>Then I extended the C <tt class="docutils literal">_tracemalloc</tt> module and the Python <tt class="docutils literal">tracemalloc</tt>
module to add a new <tt class="docutils literal">domain</tt> attribute to traces: I added a <tt class="docutils literal">Trace.domain</tt>
attribute and a <tt class="docutils literal">tracemalloc.DomainFilter</tt> class.</p>
<p>The final step was to optimize the memory footprint of _tracemalloc: start with
compact keys (<tt class="docutils literal">Py_uintptr_t</tt> type) and only switch to <tt class="docutils literal">pointer_t</tt> keys when
the first memory block with a non-zero domain is tracked (when more than one
address space is used). So the <tt class="docutils literal">_tracemalloc</tt> memory usage doesn't change by
default in Python 3.6!</p>
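<p>From Python, the new domain support is visible through <tt class="docutils literal">Trace.domain</tt> and
<tt class="docutils literal">tracemalloc.DomainFilter</tt>; a minimal sketch (pure-Python allocations all live
in domain 0, only C code using the tracking API can record other domains):</p>

```python
import tracemalloc

tracemalloc.start()
data = [bytes(1000) for _ in range(100)]   # keep the blocks alive
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Keep only traces of the default address space (domain 0).
snapshot = snapshot.filter_traces(
    [tracemalloc.DomainFilter(inclusive=True, domain=0)])
stats = snapshot.statistics("lineno")
```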
</div>
<div class="section" id="c-api">
<h3>C API</h3>
<p>I added a private C API (issue #26530):</p>
<pre class="literal-block">
int _PyTraceMalloc_Track(_PyTraceMalloc_domain_t domain, Py_uintptr_t ptr, size_t size);
int _PyTraceMalloc_Untrack(_PyTraceMalloc_domain_t domain, Py_uintptr_t ptr);
</pre>
<p>I waited for feedback from Antoine and Nathaniel on this API, but it remains
private in Python 3.6 since no one reviewed it.</p>
</div>
</div>
<div class="section" id="resourcewarning-warnings-now-come-with-a-traceback">
<h2>ResourceWarning warnings now come with a traceback</h2>
<div class="section" id="final-result">
<h3>Final result</h3>
<p>Before going to explain the long development of the feature, let's see an
example of the final result! Example with the script <tt class="docutils literal">example.py</tt>:</p>
<pre class="literal-block">
import warnings

def func():
    return open(__file__)

f = func()
f = None
</pre>
<p>Output of the command <tt class="docutils literal">python3.6 <span class="pre">-Wd</span> <span class="pre">-X</span> tracemalloc=5 example.py</tt>:</p>
<pre class="literal-block">
example.py:7: ResourceWarning: unclosed file <_io.TextIOWrapper name='example.py' mode='r' encoding='UTF-8'>
  f = None
Object allocated at (most recent call first):
  File "example.py", lineno 4
    return open(__file__)
  File "example.py", lineno 6
    f = func()
</pre>
<p>The <tt class="docutils literal">Object allocated at <span class="pre">(...)</span></tt> part is the new feature ;-)</p>
</div>
<div class="section" id="add-source-parameter-to-warnings">
<h3>Add source parameter to warnings</h3>
<p>Python 3 logs <tt class="docutils literal">ResourceWarning</tt> warnings when a resource is not closed
properly, to help developers handle resources correctly. The problem is that
the warning is only logged when the object is destroyed, which can occur far
from the object creation, and can occur on a line unrelated to the object
because of the garbage collector.</p>
<p>I added the <tt class="docutils literal">tracemalloc</tt> module to Python 3.4; it has an interesting
<tt class="docutils literal">tracemalloc.get_object_traceback()</tt> function. If tracemalloc traced the
allocation of an object, it can later provide the traceback where the
object was allocated.</p>
<p>I wanted to modify the <tt class="docutils literal">warnings</tt> module to call
<tt class="docutils literal">get_object_traceback()</tt>, but I noticed that it wasn't possible
to easily extend the <tt class="docutils literal">warnings</tt> API, because this module allows overriding
the <tt class="docutils literal">showwarning()</tt> and <tt class="docutils literal">formatwarning()</tt> functions, and these
functions have a fixed number of parameters. Example:</p>
<pre class="literal-block">
def showwarning(message, category, filename, lineno, file=None, line=None):
    ...
</pre>
<p>With the issue #26568, I added new <tt class="docutils literal">_showwarnmsg()</tt> and <tt class="docutils literal">_formatwarnmsg()</tt>
functions to the warnings module which get a <tt class="docutils literal">warnings.WarningMessage</tt> object
instead of a list of parameters:</p>
<pre class="literal-block">
def _showwarnmsg(msg):
    ...
</pre>
<p>I added a <tt class="docutils literal">source</tt> attribute to <tt class="docutils literal">warnings.WarningMessage</tt> (issue #26567)
and a new optional <tt class="docutils literal">source</tt> parameter to <tt class="docutils literal">warnings.warn()</tt> (issue #26604):
the leaked resource object. I modified <tt class="docutils literal">_formatwarnmsg()</tt> to log the
traceback where the resource was allocated, if available.</p>
<p>The tricky part was to fix corner cases when the following functions of the
<tt class="docutils literal">warnings</tt> module are overridden:</p>
<ul class="simple">
<li><tt class="docutils literal">formatwarning()</tt>, <tt class="docutils literal">showwarning()</tt></li>
<li><tt class="docutils literal">_formatwarnmsg()</tt>, <tt class="docutils literal">_showwarnmsg()</tt></li>
</ul>
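<p>Putting the pieces together, a minimal sketch of the new <tt class="docutils literal">source</tt> parameter
(the <tt class="docutils literal">Resource</tt> class is made up for the example):</p>

```python
import tracemalloc
import warnings

tracemalloc.start(5)

class Resource:
    """Hypothetical leaked resource."""
    def __init__(self):
        self.payload = bytearray(1000)

res = Resource()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # source= attaches the leaked object to the warning, so
    # _formatwarnmsg() can ask tracemalloc where it was allocated.
    warnings.warn("unclosed resource", ResourceWarning, source=res)

msg = caught[0]
```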
</div>
<div class="section" id="set-the-source-parameter">
<h3>Set the source parameter</h3>
<p>I started to modify modules to set the source parameter when logging
<tt class="docutils literal">ResourceWarning</tt> warnings.</p>
<p>The easy part was to modify the <tt class="docutils literal">asyncore</tt>, <tt class="docutils literal">asyncio</tt> and <tt class="docutils literal">_pyio</tt> modules to
set the <tt class="docutils literal">source</tt> parameter. These modules are implemented in Python; the
change was just to add <tt class="docutils literal">source=self</tt>. Example of an <tt class="docutils literal">asyncio</tt> destructor:</p>
<pre class="literal-block">
def __del__(self):
    if not self.is_closed():
        warnings.warn("unclosed event loop %r" % self, ResourceWarning,
                      source=self)
        if not self.is_running():
            self.close()
</pre>
<p>Note: The warning is logged before the resource is closed to provide more
information in <tt class="docutils literal">repr()</tt>. Many objects clear most information in their
<tt class="docutils literal">close()</tt> method.</p>
<p>Modifying C modules was trickier than expected. I had to implement
"finalizers" (<a class="reference external" href="https://www.python.org/dev/peps/pep-0442/">PEP 442: Safe object finalization</a>) for the <tt class="docutils literal">_socket.socket</tt> type
(issue #26590) and for the <tt class="docutils literal">os.scandir()</tt> iterator (issue #26603).</p>
</div>
<div class="section" id="more-reliable-warnings">
<h3>More reliable warnings</h3>
<p>The Python shutdown process is complex, and some Python functions are broken
during shutdown. I enhanced the warnings module to handle these failures
gracefully and try to log warnings anyway.</p>
<p>I modified <tt class="docutils literal">warnings.formatwarning()</tt> to catch <tt class="docutils literal">linecache.getline()</tt>
failures when formatting the traceback.</p>
<p>Logging the resource traceback is complex, so I only implemented it in Python.
Python tries to use the Python <tt class="docutils literal">warnings</tt> module if it was imported, or falls
back on the C <tt class="docutils literal">_warnings</tt> module. To get the resource traceback at Python
shutdown, I modified the C module: <tt class="docutils literal">_warnings.warn_explicit()</tt> now tries to
import the Python warnings module if the source parameter is set, to be able to
log the traceback where the source was allocated (issue #26592).</p>
</div>
<div class="section" id="fix-resourcewarning-warnings">
<h3>Fix ResourceWarning warnings</h3>
<p>Since it became easy to debug these warnings, I fixed some of them in the
Python test suite:</p>
<ul class="simple">
<li>Issue #26620: Fix ResourceWarning in test_urllib2_localnet. Use context
manager on urllib objects and use self.addCleanup() to cleanup resources even
if a test is interrupted with CTRL+c</li>
<li>Issue #25654: multiprocessing: open file with <tt class="docutils literal">closefd=False</tt> to avoid
ResourceWarning. _test_multiprocessing: open file with <tt class="docutils literal">O_EXCL</tt> to detect
bugs in tests (if a previous test forgot to remove TESTFN).
<tt class="docutils literal">test_sys_exit()</tt>: remove TESTFN after each loop iteration</li>
<li>Fix <tt class="docutils literal">ResourceWarning</tt> in test_unittest when interrupted</li>
</ul>
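<p>The <tt class="docutils literal">addCleanup()</tt> pattern mentioned above looks like this (a made-up test,
not the actual test_urllib2_localnet code):</p>

```python
import os
import tempfile
import unittest

class ExampleTest(unittest.TestCase):
    def test_read(self):
        # addCleanup() callbacks run in LIFO order even if the test is
        # interrupted, so the file is closed before its path is removed,
        # and no ResourceWarning is emitted.
        fd, path = tempfile.mkstemp()
        self.addCleanup(os.remove, path)
        f = open(fd, "w")
        self.addCleanup(f.close)
        f.write("data")

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest))
```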
</div>
</div>
<div class="section" id="pymem-malloc-now-fails-if-the-gil-is-not-held">
<h2>PyMem_Malloc() now fails if the GIL is not held</h2>
<p>Since using the small object allocator (<tt class="docutils literal">pymalloc</tt>) for dictionary key
storage showed a speedup for the dict type (issue #23601), I proposed to
generalize the change and use <tt class="docutils literal">pymalloc</tt> for <tt class="docutils literal">PyMem_Malloc()</tt>: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2016-February/143084.html">[Python-Dev]
Modify PyMem_Malloc to use pymalloc for performance</a>.</p>
<p>The main issue was that with this change, <tt class="docutils literal">PyMem_Malloc()</tt> now requires
the GIL to be held, whereas it didn't before since it called
<tt class="docutils literal">malloc()</tt> directly.</p>
<div class="section" id="check-if-the-gil-is-held">
<h3>Check if the GIL is held</h3>
<p>CPython has a <tt class="docutils literal">PyGILState_Check()</tt> function to check if the GIL is held.
Problem: the function doesn't work with subinterpreters: see issues #10915 and
#15751.</p>
<p>I added an internal flag to <tt class="docutils literal">PyGILState_Check()</tt> (issue #26558) to skip the
test. The flag value is false at startup, set to true once the GIL is fully
initialized (Python initialization), set to false again when the GIL is
destroyed (Python finalization). The flag is also set to false when the first
subinterpreter is created.</p>
<p>This hack works around the <tt class="docutils literal">PyGILState_Check()</tt> limitations, allowing
<tt class="docutils literal">PyGILState_Check()</tt> to be called anytime to catch more bugs earlier.</p>
<p><tt class="docutils literal">_Py_dup()</tt>, <tt class="docutils literal">_Py_fstat()</tt>, <tt class="docutils literal">_Py_read()</tt> and <tt class="docutils literal">_Py_write()</tt> are
low-level helper functions for system functions, but these functions require
the GIL to be held. Thanks to the <tt class="docutils literal">PyGILState_Check()</tt> enhancement, it
became possible to check the GIL using an assertion.</p>
</div>
<div class="section" id="pymem-malloc-and-gil">
<h3>PyMem_Malloc() and GIL</h3>
<p>Issue #26563: Debug hooks on Python memory allocators now raise a fatal error
if memory allocator functions like PyMem_Malloc() and PyObject_Malloc() are
called without holding the GIL.</p>
<p>The change spotted two bugs which I fixed:</p>
<ul class="simple">
<li>Issue #26563: Replace PyMem_Malloc() with PyMem_RawMalloc() in the Windows
implementation of os.stat(), the code is called without holding the
GIL.</li>
<li>Issue #26563: Fix usage of PyMem_Malloc() in overlapped.c. Replace
PyMem_Malloc()/PyMem_Free() with PyMem_RawMalloc()/PyMem_RawFree() since
PostToQueueCallback() frees the buffer in a new C thread which doesn't hold
the GIL.</li>
</ul>
<p>I wasn't able to switch <tt class="docutils literal">PyMem_Malloc()</tt> to <tt class="docutils literal">pymalloc</tt> this quarter,
since implementing the requested checks and testing third party modules took a
lot of time.</p>
</div>
<div class="section" id="fatal-error-and-faulthandler">
<h3>Fatal error and faulthandler</h3>
<p>I enhanced the faulthandler module to work in non-Python threads (issue
#26563). I fixed <tt class="docutils literal">Py_FatalError()</tt> when called without holding the GIL: it no
longer tries to print the current exception or to flush stdout and stderr; it
only dumps the traceback of Python threads.</p>
</div>
</div>
<div class="section" id="interesting-bug-reentrant-flag-in-tracemalloc">
<h2>Interesting bug: reentrant flag in tracemalloc</h2>
<p>A bug annoyed me a lot: a random assertion error related to a reentrant flag in
the _tracemalloc module.</p>
<p>Story starting in the <a class="reference external" href="http://bugs.python.org/issue26588#msg262125">middle of the issue #26588 (2016-03-21)</a>. While working on issue #26588,
"_tracemalloc: add support for multiple address spaces (domains)", I noticed an
assertion failure in set_reentrant(), a helper function to set a <em>Thread Local
Storage</em> (TLS) variable, on a buildbot:</p>
<pre class="literal-block">
python: ./Modules/_tracemalloc.c:195: set_reentrant:
Assertion `PyThread_get_key_value(tracemalloc_reentrant_key) == ((PyObject *) &_Py_TrueStruct)' failed.
</pre>
<p>I was unable to reproduce the bug on my Fedora 23 (AMD64). After changes to my
patch, I pushed it the day after, but the assertion failed again. I added
assertions and debug information. More failures followed, including an
interesting one on Windows, which uses a single process.</p>
<p>I added an assertion in tracemalloc_init() to ensure that the reentrant flag
is set at the end of the function. The reentrant flag was no longer set at
tracemalloc_start() entry, for an unknown reason. I changed the module
initialization to not call tracemalloc_init() anymore; it's only called by
tracemalloc.start().</p>
<p>"The bug was seen on 5 buildbots yet: PPC Fedora, AMD64 Debian, s390x RHEL,
AMD64 Windows, x86 Ubuntu."</p>
<p>I finally understood and fixed the bug with the <a class="reference external" href="https://hg.python.org/cpython/rev/af1c1149784a">change af1c1149784a</a>: tracemalloc_start() and
tracemalloc_stop() don't clear/set the reentrant flag anymore.</p>
<p>The problem was that I expected the tracemalloc_init() and tracemalloc_start()
functions to always be called in the same thread, whereas it turned out that
tracemalloc_init() was called in thread A when the tracemalloc module was
imported, while tracemalloc_start() was called in thread B.</p>
</div>
<div class="section" id="other-commits">
<h2>Other commits</h2>
<div class="section" id="enhancements">
<h3>Enhancements</h3>
<p>The developers of the <tt class="docutils literal">vmprof</tt> profiler asked me to expose the atomic
variable <tt class="docutils literal">_PyThreadState_Current</tt>. The private variable was removed from the
Python 3.5.1 API because the implementation of atomic variables depends on the
compiler, compiler options, etc., and so caused compilation issues. I added a
new private <tt class="docutils literal">_PyThreadState_UncheckedGet()</tt> function (issue #26154) which
gets the value of the variable without exposing its implementation.</p>
<p>Other enhancements:</p>
<ul class="simple">
<li>Issue #26099: The site module now writes an error to stderr if the
sitecustomize module can be imported but executing it raises an
ImportError. Same change for usercustomize.</li>
<li>Issue #26516: Enhance Python memory allocators documentation. Add link to
PYTHONMALLOCSTATS environment variable. Add parameters to PyMem macros like
PyMem_MALLOC().</li>
<li>Issue #26569: Fix pyclbr.readmodule() and pyclbr.readmodule_ex() to support
importing packages.</li>
<li>Issue #26564, #26516, #26563: Enhance documentation on memory allocator debug
hooks.</li>
<li>doctest now supports packages. Issue #26641: doctest.DocFileTest and
doctest.testfile() now support packages (a module split into multiple
directories) for the package parameter.</li>
</ul>
</div>
<div class="section" id="bugfixes">
<h3>Bugfixes</h3>
<p>Issue #25843: When compiling code, don't merge constants if they are equal but
have different types. For example, <tt class="docutils literal">f1, f2 = lambda: 1, lambda: 1.0</tt> is now
correctly compiled to two different functions: <tt class="docutils literal">f1()</tt> returns <tt class="docutils literal">1</tt> (int) and
<tt class="docutils literal">f2()</tt> returns <tt class="docutils literal">1.0</tt> (float), even if 1 and 1.0 are equal.</p>
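<p>This can be checked directly:</p>

```python
# Each lambda keeps its own, correctly typed constant after the fix.
f1, f2 = lambda: 1, lambda: 1.0
print(type(f1()).__name__, type(f2()).__name__)  # → int float
```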
<p>Other fixes:</p>
<ul class="simple">
<li>Issue #26101: Fix test_compilepath() of test_compileall. Exclude Lib/test/
from sys.path in test_compilepath(). The directory contains invalid Python
files like Lib/test/badsyntax_pep3120.py, whereas the test ensures that all
files can be compiled.</li>
<li>Issue #24520: Replace fpgetmask() with fedisableexcept(). On FreeBSD,
fpgetmask() was deprecated a long time ago; fedisableexcept() is now
preferred.</li>
<li>Issue #26161: Use Py_uintptr_t instead of void* for atomic pointers in
pyatomic.h. Use atomic_uintptr_t when &lt;stdatomic.h&gt; is used. Using void*
causes compilation warnings depending on which implementation of atomic types
is used.</li>
<li>Issue #26637: The importlib module now raises an ImportError rather than a
TypeError if __import__() is called during the Python shutdown process when
sys.path has already been cleared (set to None).</li>
<li>doctest: fix _module_relative_path() error message. Write the module name
rather than &lt;module&gt; in the error message, if the module has no __file__
attribute (ex: a package).</li>
</ul>
</div>
<div class="section" id="fix-type-downcasts-on-windows-64-bit">
<h3>Fix type downcasts on Windows 64-bit</h3>
<p>In my spare time, I'm trying to fix a few compiler warnings on Windows 64-bit
where the C <tt class="docutils literal">long</tt> type is only 32-bit, whereas pointers are <tt class="docutils literal"><span class="pre">64-bit</span></tt> long:</p>
<ul class="simple">
<li>posix_getcwd(): limit to INT_MAX on Windows. It's mostly to fix a compiler
warning; I don't think that Windows supports current
working directories larger than 2 GB :-)</li>
<li>_pickle: Fix load_counted_tuple(), use Py_ssize_t for size. Fix a warning on
Windows 64-bit.</li>
<li>getpathp.c: fix compiler warning, wcsnlen_s() result type is size_t.</li>
<li>compiler.c: fix compiler warnings on Windows</li>
<li>_msi.c: try to fix compiler warnings</li>
<li>longobject.c: fix compilation warning on Windows 64-bit. We know that
Py_SIZE(b) is -1 or 1 and so fits into the sdigit type.</li>
<li>On Windows, socket.setsockopt() now raises an OverflowError if the socket
option is larger than INT_MAX bytes.</li>
</ul>
</div>
<div class="section" id="unicode-bugfixes">
<h3>Unicode bugfixes</h3>
<ul class="simple">
<li>Issue #26227: On Windows, getnameinfo(), gethostbyaddr() and
gethostbyname_ex() functions of the socket module now decode the hostname
from the ANSI code page rather than UTF-8.</li>
<li>Issue #26217: Unicode resize_compact() must set wstr_length to 0 after
freeing the wstr string. Otherwise, an assertion fails in
_PyUnicode_CheckConsistency().</li>
<li>Issue #26464: Fix str.translate() when string is ASCII and first replacements
removes characters, but next replacements use a non-ASCII character or a
string longer than 1 character. Regression introduced in Python 3.5.0.</li>
</ul>
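<p>The fixed <tt class="docutils literal">str.translate()</tt> case can be reproduced with a small example:</p>

```python
# ASCII input where the first mapping deletes "a" and a later mapping
# replaces "b" with a non-ASCII character: the case fixed here.
table = {ord("a"): None, ord("b"): "é"}
print("abc".translate(table))  # → éc
```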
</div>
<div class="section" id="buildbot-tests">
<h3>Buildbot, tests</h3>
<p>Just to give you an idea of the work required to keep a working CI, here is the
list of changes I made in a single quarter to make tests and Python buildbots
more reliable.</p>
<ul class="simple">
<li>Issue #26610: Skip test_venv.test_with_pip() if ctypes is missing</li>
<li>test_asyncio: fix test_timeout_time(). Accept time delta up to 0.12 second,
instead of 0.11, for the "AMD64 FreeBSD 9.x" buildbot slave.</li>
<li>Issue #13305: Always test datetime.datetime.strftime("%4Y") for years < 1900.
Change quickly reverted, strftime("%4Y") fails on most platforms.</li>
<li>Issue #17758: Skip test_site if site.USER_SITE directory doesn't exist and
cannot be created.</li>
<li>Fix test_venv on FreeBSD buildbot. Ignore pip warning in
test_venv.test_with_venv().</li>
<li>Issue #26566: Rewrite test_signal.InterProcessSignalTests. Don't use
os.fork() with a subprocess to not inherit existing signal handlers or
threads: start from a fresh process. Use a timeout of 10 seconds to wait for
the signal instead of 1 second</li>
<li>Issue #26538: regrtest: Fix module.__path__. libregrtest: Fix setup_tests()
to keep module.__path__ type (_NamespacePath), don't convert to a list.
Add _NamespacePath.__setitem__() method to importlib._bootstrap_external.</li>
<li>regrtest: add time to output. Timestamps should help to debug slow buildbots,
and timeout and hang on buildbots.</li>
<li>regrtest: add timeout to main process when using -jN. libregrtest: add a
watchdog to run_tests_multiprocess() using faulthandler.dump_traceback_later().</li>
<li>Makefile: change default value of TESTTIMEOUT from 1 hour to 15 min.
The whole test suite takes 6 minutes on my laptop. It takes less than 30
minutes on most buildbots. The TESTTIMEOUT is the timeout for a single test
file.</li>
<li>Buildbots: change also Windows timeout from 1 hour to 15 min</li>
<li>regrtest: display test duration in sequential mode. Only display duration if
a test takes more than 30 seconds.</li>
<li>Issue #18787: Try to fix test_spwd on OpenIndiana. Try to get the "root"
entry which should exist on all UNIX instead of "bin" which doesn't exist on
OpenIndiana.</li>
<li>regrtest: fix --fromfile feature. Update code for the new regrtest output
format. Also enhance the test_regrtest test on --fromfile</li>
<li>regrtest: mention if tests run sequentially or in parallel</li>
<li>regrtest: when parallel tests are interrupted, display progress</li>
<li>support.temp_dir(): call support.rmtree() instead of shutil.rmtree(). Try
harder to remove directories on Windows.</li>
<li>rt.bat: use -m test instead of Lib\test\regrtest.py</li>
<li>Refactor regrtest.</li>
<li>Fix test_warnings.test_improper_option(). test_warnings: only run
test_improper_option() and test_warnings_bootstrap() once. The unit test
doesn't depend on self.module.</li>
<li>Fix test_os.test_symlink(): remove created symlink.</li>
<li>Issue #26643: Add missing shutil resources to regrtest.py</li>
<li>test_urllibnet: set timeout on test_fileno(). Use the default timeout of 30
seconds to avoid blocking forever.</li>
<li>Issue #26295: When using "python3 -m test --testdir=TESTDIR", regrtest
doesn't add "test." prefix to test module names. regrtest also prepends
testdir to sys.path.</li>
<li>Issue #26295: test_regrtest now uses a temporary directory</li>
</ul>
</div>
<div class="section" id="contributions">
<h3>Contributions</h3>
<p>I also pushed a few changes written by other contributors:</p>
<ul class="simple">
<li>Issue #25907: Use {% trans %} tags in HTML templates to ease the translation
of the documentation. The tag comes from Jinja templating system, used by
Sphinx. Patch written by <strong>Julien Palard</strong>.</li>
<li>Issue #26248: Enhance os.scandir() doc. Patch written by <strong>Ben Hoyt</strong>.</li>
<li>Fix error message in asyncio.selector_events. Patch written by <strong>Carlo
Beccarini</strong>.</li>
<li>Issue #16851: Fix inspect.ismethod() doc, return also True if object is an
unbound method. Patch written by <strong>Anna Koroliuk</strong>.</li>
<li>Issue #26574: Optimize bytes.replace(b'', b'.') and bytearray.replace(b'', b'.'):
up to 80% faster. Patch written by <strong>Josh Snider</strong>.</li>
</ul>
</div>
</div>
Analysis of a Python performance issue2016-11-19T00:30:00+01:002016-11-19T00:30:00+01:00Victor Stinnertag:vstinner.github.io,2016-11-19:/analysis-python-performance-issue.html<p>I am working on the CPython benchmark suite (<a class="reference external" href="https://github.com/python/performance">performance</a>) and I run the benchmark suite to
upload results to <a class="reference external" href="http://speed.python.org/">speed.python.org</a>. While
analyzing results, I noticed a temporary peak in the <tt class="docutils literal">call_method</tt>
benchmark on October 19th:</p>
<img alt="call_method microbenchmark" src="https://vstinner.github.io/images/call_method.png" />
<p>The graphic shows the performance of the <tt class="docutils literal">call_method</tt> microbenchmark between
Feb 29, 2016 …</p><p>I am working on the CPython benchmark suite (<a class="reference external" href="https://github.com/python/performance">performance</a>) and I run the benchmark suite to
upload results to <a class="reference external" href="http://speed.python.org/">speed.python.org</a>. While
analyzing results, I noticed a temporary peak in the <tt class="docutils literal">call_method</tt>
benchmark on October 19th:</p>
<img alt="call_method microbenchmark" src="https://vstinner.github.io/images/call_method.png" />
<p>The graphic shows the performance of the <tt class="docutils literal">call_method</tt> microbenchmark between
Feb 29, 2016 and November 17, 2016 on the <tt class="docutils literal">default</tt> branch of CPython. The average
is around 17.2 ms, whereas the peak is at 29.0 ms: <strong>68% slower</strong>!</p>
<p>The server has two "Intel(R) Xeon(R) CPU X5680 @ 3.33GHz" CPUs, total: 24
logical cores (12 physical cores with HyperThreading). This CPU was launched in
2010 and based on the <a class="reference external" href="https://en.wikipedia.org/wiki/Gulftown">Westmere-EP microarchitecture</a>. Westmere-EP is based on Westmere,
which is the 32 nm shrink of the Nehalem microarchitecture.</p>
<div class="section" id="reproduce-results">
<h2>Reproduce results</h2>
<p>Before going too far, the first step is to validate that the results are
reproducible: reboot the computer, recompile Python, run the benchmark again.</p>
<p>Instead of running the full benchmark suite and installing Python, we will run
the benchmark manually, directly using the Python binary freshly built in its
source code directory.</p>
<p>Interesting dots on the graphic (can be seen at speed.python.org, not on the
screenshot):</p>
<ul class="simple">
<li>678fe178da0d, Oct 09, 17.0 ms: "Fast"</li>
<li>1ce50f7027c1, Oct 19, 28.9 ms: "Slow"</li>
<li>36af3566b67a, Nov 3, 16.9 ms: Fast again</li>
</ul>
<p>I use the following directories:</p>
<ul class="simple">
<li>~/perf: GitHub haypo/perf project</li>
<li>~/performance: GitHub python/performance project</li>
<li>~/cpython: Mercurial CPython repository</li>
</ul>
<p>Tune the system for benchmarks:</p>
<pre class="literal-block">
sudo python3 -m perf system tune
</pre>
<p>Note: all <tt class="docutils literal">system</tt> commands in this article are optional. They help to reduce
the operating system jitter (making benchmarks more reliable).</p>
<p>Fast:</p>
<pre class="literal-block">
$ hg up -C -r 678fe178da0d
$ ./configure --with-lto -C && make clean && make
$ mv python python-fast
$ PYTHONPATH=~/perf ./python-fast ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --fast
call_method: Median +- std dev: 17.0 ms +- 0.1 ms
</pre>
<p>Slow:</p>
<pre class="literal-block">
$ hg up -C -r 1ce50f7027c1
$ ./configure --with-lto -C && make clean && make
$ mv python python-slow
$ PYTHONPATH=~/perf ./python-slow ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --fast
call_method: Median +- std dev: 29.3 ms +- 0.9 ms
</pre>
<p>We reproduced the significant benchmark result: 17 ms => 29 ms.</p>
<p>I use <tt class="docutils literal">./configure</tt> and <tt class="docutils literal">make clean</tt> instead of an incremental compilation
(a bare <tt class="docutils literal">make</tt> command) to avoid compilation errors and potential side
effects of incremental compilation.</p>
</div>
<div class="section" id="analysis-with-the-linux-perf-tool">
<h2>Analysis with the Linux perf tool</h2>
<p>To collect perf events, we will run the benchmark with <tt class="docutils literal"><span class="pre">--worker</span></tt> to run a
single process and with <tt class="docutils literal"><span class="pre">-w0</span> <span class="pre">-n100</span></tt> to run the benchmark long enough: 100
samples means at least 10 seconds (a single sample takes at least 100 ms).</p>
<p>First, reset the system configuration to reset the Linux perf configuration:</p>
<pre class="literal-block">
sudo python3 -m perf system reset
</pre>
<p>Note: <tt class="docutils literal">python3 <span class="pre">-m</span> perf system tune</tt> reduces the sampling rate of Linux perf
to reduce operating system jitter, which is why it must be reset before profiling.</p>
</div>
<div class="section" id="perf-stat">
<h2>perf stat</h2>
<p>Command to get general statistics on the benchmark:</p>
<pre class="literal-block">
$ perf stat ./python-slow ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --worker -v -w0 -n100
</pre>
<p>"Fast" results:</p>
<pre class="literal-block">
Performance counter stats for ./python-fast:
3773.585194 task-clock (msec) # 0.998 CPUs utilized
369 context-switches # 0.098 K/sec
0 cpu-migrations # 0.000 K/sec
8,300 page-faults # 0.002 M/sec
12,981,234,867 cycles # 3.440 GHz [83.27%]
1,460,980,720 stalled-cycles-frontend # 11.25% frontend cycles idle [83.36%]
435,806,788 stalled-cycles-backend # 3.36% backend cycles idle [66.72%]
29,982,530,201 instructions # 2.31 insns per cycle
# 0.05 stalled cycles per insn [83.40%]
5,613,631,616 branches # 1487.612 M/sec [83.40%]
16,006,564 branch-misses # 0.29% of all branches [83.27%]
3.780064486 seconds time elapsed
</pre>
<p>"Slow" results:</p>
<pre class="literal-block">
Performance counter stats for ./python-slow:
5906.239860 task-clock (msec) # 0.998 CPUs utilized
556 context-switches # 0.094 K/sec
0 cpu-migrations # 0.000 K/sec
8,393 page-faults # 0.001 M/sec
20,651,474,102 cycles # 3.497 GHz [83.36%]
8,480,803,345 stalled-cycles-frontend # 41.07% frontend cycles idle [83.37%]
4,247,826,420 stalled-cycles-backend # 20.57% backend cycles idle [66.64%]
30,011,465,614 instructions # 1.45 insns per cycle
# 0.28 stalled cycles per insn [83.32%]
5,612,485,730 branches # 950.264 M/sec [83.36%]
13,584,136 branch-misses # 0.24% of all branches [83.29%]
5.915402403 seconds time elapsed
</pre>
<p>Significant differences, Fast => Slow:</p>
<ul class="simple">
<li>Instruction per cycle: 2.31 => 1.45</li>
<li>stalled-cycles-frontend: <strong>11.25% => 41.07%</strong></li>
<li>stalled-cycles-backend: <strong>3.36% => 20.57%</strong></li>
</ul>
<p>The increase of stalled cycles is interesting. Since the code is supposed to be
identical, it probably means that fetching instructions is slower. It sounds
like an issue with CPU caches.</p>
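<p>As a sanity check, the figures above can be recomputed from the raw counters. A quick Python sketch (numbers copied from the <tt class="docutils literal">perf stat</tt> output above):</p>

```python
# Recompute instructions per cycle (IPC) and the frontend-stall
# percentage from the perf stat counters shown above.
fast = {"cycles": 12_981_234_867, "insns": 29_982_530_201,
        "stalled_frontend": 1_460_980_720}
slow = {"cycles": 20_651_474_102, "insns": 30_011_465_614,
        "stalled_frontend": 8_480_803_345}

for name, c in (("fast", fast), ("slow", slow)):
    ipc = c["insns"] / c["cycles"]
    stall = c["stalled_frontend"] / c["cycles"] * 100
    print(f"{name}: {ipc:.2f} insns per cycle, "
          f"{stall:.2f}% frontend cycles idle")
```

<p>Both builds execute almost the same number of instructions; only the number of cycles (and so the stalls) differs.</p>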
</div>
<div class="section" id="statistics-on-the-cpu-l1-instruction-cache">
<h2>Statistics on the CPU L1 instruction cache</h2>
<p>The <tt class="docutils literal">perf list</tt> command can be used to get the name of events collecting
statistics on the CPU L1 instruction cache:</p>
<pre class="literal-block">
$ perf list | grep L1
L1-icache-loads [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
(...)
</pre>
<p>Collect statistics on the CPU L1 instruction cache:</p>
<pre class="literal-block">
PYTHONPATH=~/perf perf stat -e L1-icache-loads,L1-icache-load-misses ./python-slow ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --worker -w0 -n10
</pre>
<p>"Fast" statistics:</p>
<pre class="literal-block">
Performance counter stats for './python-fast (...)':
10,134,106,571 L1-icache-loads
10,917,606 L1-icache-load-misses # 0.11% of all L1-icache hits
3.775067668 seconds time elapsed
</pre>
<p>"Slow" statistics:</p>
<pre class="literal-block">
Performance counter stats for './python-slow (...)':
10,753,371,258 L1-icache-loads
848,511,308 L1-icache-load-misses # 7.89% of all L1-icache hits
6.020490449 seconds time elapsed
</pre>
<p>Cache misses on the L1 instruction cache: <strong>0.11%</strong> (Fast) => <strong>7.89%</strong> (Slow).</p>
<p>The slow Python has a <strong>71.7x higher L1 instruction cache miss rate</strong> than the
fast Python! It can explain the significant performance drop.</p>
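<p>The miss rates can be double-checked from the raw counters. A quick Python sketch (numbers copied from the <tt class="docutils literal">perf stat</tt> output above):</p>

```python
# L1 instruction cache miss rates, computed from the counters above.
fast_loads, fast_misses = 10_134_106_571, 10_917_606
slow_loads, slow_misses = 10_753_371_258, 848_511_308

fast_rate = fast_misses / fast_loads * 100  # ~0.11%
slow_rate = slow_misses / slow_loads * 100  # ~7.89%

print(f"fast: {fast_rate:.2f}% misses, slow: {slow_rate:.2f}% misses")
# Dividing the rounded rates gives the 71.7x figure:
print(f"ratio of rounded rates: {7.89 / 0.11:.1f}x")
```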
<div class="section" id="perf-report">
<h3>perf report</h3>
<p>The <tt class="docutils literal">perf record</tt> command can be used to collect statistics on the functions
where the benchmark spends most of its time. Commands:</p>
<pre class="literal-block">
PYTHONPATH=~/perf perf record ./python ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --worker -v -w0 -n100
perf report
</pre>
<p>Output:</p>
<pre class="literal-block">
40.27% python python [.] _PyEval_EvalFrameDefault
10.30% python python [.] call_function
10.21% python python [.] PyFrame_New
8.56% python python [.] frame_dealloc
5.51% python python [.] PyObject_GenericGetAttr
(...)
</pre>
<p>Almost 75% of the time is spent in these 5 functions.</p>
</div>
<div class="section" id="system-tune">
<h3>system tune</h3>
<p>To run benchmarks again, re-tune the system for benchmarks:</p>
<pre class="literal-block">
sudo python3 -m perf system tune
</pre>
</div>
</div>
<div class="section" id="hg-bisect">
<h2>hg bisect</h2>
<p>To find the revision which introduces the performance slowdown, we use a
shell script to automate the bisection of the Mercurial history.</p>
<p><tt class="docutils literal">cmd.sh</tt> script checking if a revision is fast or slow:</p>
<pre class="literal-block">
set -e -x
./configure --with-lto -C && make clean && make
rm -f json
PYTHONPATH=~/perf ./python ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --worker -o json -v
PYTHONPATH=~/perf python3 cmd.py json
</pre>
<p><tt class="docutils literal">cmd.sh</tt> uses the following <tt class="docutils literal">cmd.py</tt> script, which checks if the benchmark
is slow: slow means longer than 23 ms (the average of 17 and 29 ms):</p>
<pre class="literal-block">
import perf, sys
bench = perf.Benchmark.load('json')
bad = (29 + 17) / 2.0
ms = bench.median() * 1e3
if ms >= bad:
    print("BAD! %.1f ms >= %.1f ms" % (ms, bad))
    sys.exit(1)
else:
    print("good: %.1f ms < %.1f ms" % (ms, bad))
</pre>
<p>In the bisection, "good" means "fast" (17 ms), whereas "bad" means "slow" (29
ms). The peak, revision 1ce50f7027c1, is used as the first "bad" revision. The
previous fast revision before the peak is 678fe178da0d, our first "good"
revision.</p>
<p>Commands to identify the first revision which introduced the slowdown:</p>
<pre class="literal-block">
hg bisect --reset
hg bisect -b 1ce50f7027c1
hg bisect -g 678fe178da0d
time hg bisect -c ./cmd.sh
</pre>
<p>3 min 52 sec later:</p>
<pre class="literal-block">
The first bad revision is:
changeset: 104531:83877018ef97
parent: 104528:ce85a1f129e3
parent: 104530:2d352bf2b228
user: Serhiy Storchaka <storchaka@gmail.com>
date: Tue Oct 18 13:27:54 2016 +0300
files: Misc/NEWS
description:
Issue #23782: Fixed possible memory leak in _PyTraceback_Add() and exception
loss in PyTraceBack_Here().
</pre>
<p>Thank you <tt class="docutils literal">hg bisect</tt>! I love this tool.</p>
<p>Even if I trust <tt class="docutils literal">hg bisect</tt>, I don't trust benchmarks, so I recheck manually:</p>
<p>Slow:</p>
<pre class="literal-block">
$ hg up -C -r 83877018ef97
$ ./configure --with-lto -C && make clean && make
$ PYTHONPATH=~/perf ./python ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --fast
call_method: Median +- std dev: 29.4 ms +- 1.8 ms
</pre>
<p>Use <tt class="docutils literal">hg parents</tt> to get the latest fast revision:</p>
<pre class="literal-block">
$ hg parents -r 83877018ef97
changeset: 104528:ce85a1f129e3
(...)
changeset: 104530:2d352bf2b228
branch: 3.6
(...)
</pre>
<p>Check the parent:</p>
<pre class="literal-block">
$ hg up -C -r ce85a1f129e3
$ ./configure --with-lto -C && make clean && make
$ PYTHONPATH=~/perf ./python ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --fast
call_method: Median +- std dev: 17.1 ms +- 0.1 ms
</pre>
<p>The revision ce85a1f129e3 is fast and the following revision 83877018ef97 is
slow. <strong>The revision 83877018ef97 introduced the slowdown</strong>. We found it!</p>
</div>
<div class="section" id="analysis-of-the-revision-introducing-the-slowdown">
<h2>Analysis of the revision introducing the slowdown</h2>
<p>The <a class="reference external" href="https://hg.python.org/cpython/rev/83877018ef97/">revision 83877018ef97</a>
changes two files: Misc/NEWS and Python/traceback.c. The NEWS file is only
documentation and so cannot impact performance. Python/traceback.c is part
of the C code and so is more interesting.</p>
<p>The commit only changes two C functions: <tt class="docutils literal">PyTraceBack_Here()</tt> and
<tt class="docutils literal">_PyTraceback_Add()</tt>, but <tt class="docutils literal">perf report</tt> didn't show these functions as "hot".
In fact, these functions are never called by the benchmark.</p>
<p><strong>The commit doesn't touch the C code used in the benchmark.</strong></p>
<p>An unrelated C change impacting performance reminds me of my previous <a class="reference external" href="https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html">deadcode
horror story</a>. The performance
difference is probably caused by <strong>"code placement"</strong>: <tt class="docutils literal">perf stat</tt> showed a
significant increase of the cache miss rate on the L1 instruction cache.</p>
</div>
<div class="section" id="use-gcc-attribute-hot">
<h2>Use GCC __attribute__((hot))</h2>
<p>Using PGO compilation was the solution for deadcode, but PGO doesn't work on
Ubuntu 14.04 (the OS used by the benchmark server, speed-python) and PGO seems
to make benchmarks less reliable.</p>
<p>I wanted to try something else: mark hot functions using the GCC
<tt class="docutils literal"><span class="pre">__attribute__((hot))</span></tt> attribute. PGO compilation does this automatically.</p>
<p>This attribute only has an impact on code placement: where functions are
loaded in memory. The flag puts functions in the <tt class="docutils literal">.text.hot</tt> ELF section
rather than the <tt class="docutils literal">.text</tt> ELF section. Grouping hot functions in the same
section reduces the distance between them and so enhances the usage of CPU
caches.</p>
<p>I wrote and then pushed a patch in the <a class="reference external" href="http://bugs.python.org/issue28618">issue #28618</a>: "Decorate hot functions using
__attribute__((hot)) to optimize Python".</p>
<p>The patch marks 6 functions as hot:</p>
<ul class="simple">
<li><tt class="docutils literal">_PyEval_EvalFrameDefault()</tt></li>
<li><tt class="docutils literal">call_function()</tt></li>
<li><tt class="docutils literal">_PyFunction_FastCall()</tt></li>
<li><tt class="docutils literal">PyFrame_New()</tt></li>
<li><tt class="docutils literal">frame_dealloc()</tt></li>
<li><tt class="docutils literal">PyErr_Occurred()</tt></li>
</ul>
<p>Let's try the patch:</p>
<pre class="literal-block">
$ hg up -C -r 83877018ef97
$ wget https://hg.python.org/cpython/raw-rev/59b91b4e9506 -O patch
$ patch -p1 < patch
$ ./configure --with-lto -C && make clean && make
$ PYTHONPATH=~/perf ./python ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --fast
call_method: Median +- std dev: 16.7 ms +- 0.3 ms
</pre>
<p>It's easy to make mistakes and benchmarks are always surprising, so let's retry
without the patch:</p>
<pre class="literal-block">
$ hg up -C -r 83877018ef97
$ ./configure --with-lto -C && make clean && make
$ PYTHONPATH=~/perf ./python ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --fast
call_method: Median +- std dev: 29.3 ms +- 0.6 ms
</pre>
<p>The check confirms that the GCC attribute fixed the issue!</p>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>On modern Intel CPUs, the code placement can have a major impact on the
performance of microbenchmarks.</p>
<p>The GCC <tt class="docutils literal"><span class="pre">__attribute__((hot))</span></tt> attribute can be used manually to make "hot
functions" close in memory to enhance the usage of CPU caches.</p>
<p>To learn more about the impact of code placement, see the very good talk by Zia
Ansari (Intel) at the LLVM Developers' Meeting 2016: <a class="reference external" href="https://llvmdevelopersmeetingbay2016.sched.org/event/8YzY/causes-of-performance-instability-due-to-code-placement-in-x86">Causes of Performance
Swings Due to Code Placement in IA</a>.
He describes well "performance swings" like the one described in this article,
and explains how CPUs work internally and how code placement impacts CPU
performance.</p>
</div>
Intel CPUs (part 2): Turbo Boost, temperature, frequency and Pstate C0 bug2016-09-23T23:00:00+02:002016-09-23T23:00:00+02:00Victor Stinnertag:vstinner.github.io,2016-09-23:/intel-cpus-part2.html<p class="first last">Intel CPUs (part 2): Turbo Boost, temperature, frequency and Pstate C0 bug</p>
<p>My first article <a class="reference external" href="https://vstinner.github.io/intel-cpus.html">Intel CPUs</a> is a general
introduction on modern CPU technologies having an impact on benchmarks.</p>
<p>This second article is much more practical, with numbers and a concrete bug
having a major impact on benchmarks: a benchmark suddenly becomes 2x faster!</p>
<p>I will tell you how I first noticed the bug, which tests I ran to analyze the
issue, how I found commands to reproduce the bug, and finally how I identified
the bug.</p>
<div class="section" id="glitch-in-benchmarks">
<h2>"Glitch" in benchmarks</h2>
<p>Last week I ran a benchmark to check if enabling Profile Guided Optimization
(PGO) when compiling Python makes benchmark results less stable. I recompiled
Python 5 times, and after each compilation I ran a benchmark. I tested
different commands and options to compile Python. Everything was fine until
the last benchmark of the last compilation. <strong>The benchmark suddenly became 2
times faster.</strong></p>
<p>Fortunately, my perf module collects a lot of metadata, so I was able to
analyze in depth what happened.</p>
<p>The "glitch" occurred in a benchmark having 400 runs (the benchmark runs in 400
different processes), between run 105 (20.3 ms) and run 106
(11.0 ms).</p>
<p>I noticed that the CPU temperature was between 69°C and 72°C until run 105,
and then decreased from 69°C to 58°C.</p>
<p>The system load slowly increased from 1.25 up to 1.62 around run 108 and
then slowly decreased to 1.00.</p>
<p>The system was not idle while the benchmark was running: I was working on the
PC too! But according to timestamps, it seems like the glitch occurred close to
when I stopped working. When I stopped working, I closed all applications
(except the benchmark running in the background) and turned off my two
monitors.</p>
<p>Well, at this point, it's hard to correlate for sure an event with the major
performance change.</p>
<p>So I started to analyze different factors affecting CPUs and benchmarks: Turbo
Boost, CPU temperature and CPU frequency.</p>
</div>
<div class="section" id="impact-of-turbo-boost-on-benchmarks">
<h2>Impact of Turbo Boost on benchmarks</h2>
<p>Without Turbo Boost, the maximum frequency of the "Intel(R) Core(TM) i7-3520M
CPU @ 2.90GHz" of my laptop is 2.9 GHz. With Turbo Boost, the maximum
frequency is 3.6 GHz if only one core is active, or 3.4 GHz otherwise:</p>
<pre class="literal-block">
$ sudo cpupower frequency-info
...
boost state support:
Supported: yes
Active: yes
3400 MHz max turbo 4 active cores
3400 MHz max turbo 3 active cores
3400 MHz max turbo 2 active cores
3600 MHz max turbo 1 active cores
</pre>
<p>I ran the bm_call_simple.py microbenchmark (CPU-bound) of performance 0.2.2.</p>
<p>Turbo Boost disabled:</p>
<ul class="simple">
<li>1 physical CPU active: 2.9 GHz, Median +- std dev: 14.6 ms +- 0.3 ms</li>
<li>2 physical CPU active: 2.9 GHz, Median +- std dev: 14.7 ms +- 0.5 ms</li>
</ul>
<p>Turbo Boost enabled:</p>
<ul class="simple">
<li>1 physical CPU active: 3.6 GHz, Median +- std dev: 11.8 ms +- 0.3 ms</li>
<li>2 physical CPU active: 3.4 GHz, Median +- std dev: 12.4 ms +- 0.1 ms</li>
</ul>
<p><strong>The maximum performance boost is 19% faster</strong> (14.6 ms => 11.8 ms); the
minimum boost is 15% faster (14.6 ms => 12.4 ms).</p>
<p>Hmm, I don't think that Turbo Boost can explain the bug.</p>
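<p>The speedup percentages can be recomputed from the median timings above; a quick Python check:</p>

```python
# Turbo Boost speedups, computed from the median timings above.
base = 14.6          # ms, Turbo Boost disabled
turbo_1cpu = 11.8    # ms, Turbo Boost enabled, 1 physical CPU active
turbo_2cpu = 12.4    # ms, Turbo Boost enabled, 2 physical CPUs active

max_boost = (base - turbo_1cpu) / base * 100  # ~19%
min_boost = (base - turbo_2cpu) / base * 100  # ~15%
print(f"max boost: {max_boost:.0f}%, min boost: {min_boost:.0f}%")
```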
</div>
<div class="section" id="impact-of-the-cpu-temperature-on-benchmarks">
<h2>Impact of the CPU temperature on benchmarks</h2>
<p>The CPU temperature is mentioned in the Intel Turbo Boost documentation as a
factor used to decide which P-state will be used. I always wanted to check how
the CPU temperature impacts its performance.</p>
<div class="section" id="burn-the-cpu-of-my-desktop-pc">
<h3>Burn the CPU of my desktop PC</h3>
<p>CPU of my desktop PC: "Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz".</p>
<p>I used my <a class="reference external" href="https://github.com/vstinner/misc/blob/master/bin/system_load.py">system_load.py script</a> to generate a
system load higher than 10.</p>
<p>When the fan is cooling the CPU correctly, all cores run at 3.4 GHz (Turbo Boost
was disabled) and the CPU temperature is 66°C.</p>
<p>I used a simple sheet of paper to block the fan of my CPU. Yeah, I really
wanted to <a class="reference external" href="https://www.youtube.com/watch?v=Xf0VuRG7MN4">burn my CPU</a>! More
seriously, I checked the CPU temperature every second using the <tt class="docutils literal">sensors</tt>
command and was prepared to unblock the fan if something went wrong.</p>
<img alt="Sheet of paper blocking the CPU fan" src="https://vstinner.github.io/images/paper_blocks_cpu_fan.jpg" />
<p>After one minute, the CPU reached 97°C. I expected a system crash, smoke or
something worse, but I was disappointed. <strong>At 97°C, I was still able to use my
computer as if everything was fine. The CPU automatically slowed down to the
minimum CPU frequency: 1533 MHz</strong> according to turbostat (the minimum frequency
of this CPU is 1.6 GHz).</p>
<p>When I unblocked the fan, the temperature decreased quickly to go back to its
previous state (62°C) and the CPU frequency quickly increased to 3.4 GHz as
well.</p>
<p>My Intel CPU is really impressive! I didn't expect such efficient
protection against overheating!</p>
</div>
<div class="section" id="burn-my-laptop-cpu">
<h3>Burn my laptop CPU</h3>
<p>I used my system_load.py script to get a system load over 200. I also opened 4
tabs in Firefox playing YouTube videos to also stress the GPU, which is
integrated into the CPU (IGP) on such a laptop.</p>
<img alt="Stress test playing Youtube videos in Firefox, CPU at 102°" src="https://vstinner.github.io/images/burn_cpu_firefox.jpg" />
<p>With such crazy stress test, the CPU temperature was "only" 83°C.</p>
<p>Using a simple tissue, I blocked the air hole used by the CPU fan. <strong>When the
CPU temperature increased from 100°C to 101°C, the CPU frequency slowly started
to decrease from 3391 MHz to 3077 MHz</strong> (with steps between 10 MHz and 50 MHz
every second, or something like that).</p>
<p>When pushing the tissue hard and waiting longer than 5 minutes, the CPU
temperature increased up to 102°C, but the CPU frequency only decreased
from 3.4 GHz (Turbo Mode with 4 active logical CPUs) to 3.1 GHz.</p>
<p>The maximum frequency without Turbo Boost is 2.9 GHz. A frequency higher than
2.9 GHz means that Turbo Mode was enabled! It means that <strong>even when
overheating, the CPU is still fine and able to "overclock" itself!</strong></p>
<p>Again, I was disappointed. With a CPU at 102°C, my laptop was still super fast
and responsive. It seems like mobile CPUs handle overheating even better than
desktop CPUs (which is not surprising at all).</p>
</div>
</div>
<div class="section" id="impact-of-the-cpu-frequency-on-benchmarks">
<h2>Impact of the CPU frequency on benchmarks</h2>
<p>I ran the bm_call_simple.py microbenchmark (CPU-bound) of performance 0.2.2
on my desktop PC.</p>
<p>Command to set the frequency of CPU 0 to the minimum frequency (1.6 GHz):</p>
<pre class="literal-block">
$ cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq|sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
1600000
</pre>
<p>Command to set the frequency of CPU 0 to the maximum frequency (3.4 GHz):</p>
<pre class="literal-block">
$ cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq|sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
3400000
</pre>
<ul class="simple">
<li>CPU running at 1.6 GHz (min freq): Median +- std dev: 27.7 ms +- 0.7 ms</li>
<li>CPU running at 3.4 GHz (max freq): Median +- std dev: 12.9 ms +- 0.2 ms</li>
</ul>
<p>The impact of the CPU frequency is quite obvious: <strong>when the CPU frequency is
doubled, the performance is also doubled</strong>. The benchmark is 53% faster (27.7
ms => 12.9 ms).</p>
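<p>The scaling claim can be checked numerically: the timing ratio closely tracks the frequency ratio. A quick Python sketch using the numbers above:</p>

```python
# Compare the frequency ratio with the timing ratio from the runs above.
freq_ratio = 3.4 / 1.6                 # ~2.12x higher frequency
time_ratio = 27.7 / 12.9               # ~2.15x lower timing
speedup = (27.7 - 12.9) / 27.7 * 100   # ~53% faster

print(f"frequency: {freq_ratio:.2f}x, timing: {time_ratio:.2f}x, "
      f"speedup: {speedup:.0f}%")
```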
</div>
<div class="section" id="bug-reproduced-and-then-identified-in-the-linux-cpu-driver">
<h2>Bug reproduced and then identified in the Linux CPU driver</h2>
<p>Two days ago, I ran a very simple "timeit" microbenchmark to try to bisect a
performance regression in Python 3.6 on <tt class="docutils literal">functools.partial</tt>. Again, suddenly,
the microbenchmark became 2x faster!</p>
<p>But this time, I found something: I noticed that running or stopping <tt class="docutils literal">cpupower
monitor</tt> and/or <tt class="docutils literal">turbostat</tt> can "enable" or "disable" the bug.</p>
<p>After a lot of tests, I understood that running the benchmark with turbostat
"disables" the bug, whereas running "cpupower monitor" while running a
benchmark enables the bug.</p>
<p>I reported the bug in the Fedora bug tracker, on the component kernel:
<a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1378529">intel_pstate C0 bug on isolated CPUs with the performance governor and
NOHZ_FULL</a>.</p>
<p>It seems like the bug is related to CPU isolation and NOHZ_FULL. The NOHZ_FULL
option is able to fully disable the scheduler clock interruption on isolated
CPUs. I understood that the <tt class="docutils literal">intel_pstate</tt> driver uses a callback on the
scheduler to update the P-state of the CPU. According to an Intel engineer, the
<tt class="docutils literal">intel_pstate</tt> driver was never tested with CPU isolation.</p>
<p>The issue is not fully analyzed yet, but at least I managed to write a list
of commands which reproduces it with a success rate of 100% :-) Moreover, the
Intel engineer suggested adding an extra parameter to the Linux kernel command
line (<tt class="docutils literal">rcu_nocbs=3,7</tt>) which works around the issue.</p>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>This article describes how I found and then identified a bug in the Linux
driver of my CPU.</p>
<p>Summary:</p>
<ul class="simple">
<li>The maximum speedup of Turbo Boost is about 20%</li>
<li>Overheating on a desktop PC can decrease the CPU frequency to its minimum
(half of the maximum in my case), which implies a slowdown of 50%</li>
<li>A bug in the Linux CPU driver suddenly changes the CPU frequency from its
minimum to its maximum (or the opposite), which means a speedup of 50%
(or a slowdown of 50%)</li>
</ul>
<p><strong>To get stable benchmarks, the safest fix for all these issues is probably to
set the CPU frequency of the CPUs used by benchmarks to the minimum.</strong>
It seems like nothing can reduce the frequency of a CPU below its minimum.</p>
<p><strong>When running benchmarks, raw timings and CPU performance don't matter. Only
comparisons between benchmark results and stable performances matter.</strong></p>
</div>
Intel CPUs: P-state, C-state, Turbo Boost, CPU frequency, etc.2016-07-15T12:00:00+02:002016-07-15T12:00:00+02:00Victor Stinnertag:vstinner.github.io,2016-07-15:/intel-cpus.html<p class="first last">Intel CPUs: Hyper-threading, Turbo Boost, CPU frequency, etc.</p>
<p>Ten years ago, most computers were desktop computers designed for best
performance, and their CPU frequency was fixed. Nowadays, most devices are
embedded and use <a class="reference external" href="https://en.wikipedia.org/wiki/Low-power_electronics">low power consumption</a> processors like ARM
CPUs. Power consumption now matters more than peak performance.</p>
<p>Intel CPUs evolved from a single core to multiple physical cores in the same
<a class="reference external" href="https://en.wikipedia.org/wiki/CPU_socket">package</a> and got new features:
<a class="reference external" href="https://en.wikipedia.org/wiki/Hyper-threading">Hyper-threading</a> to run two
threads on the same physical core and <a class="reference external" href="https://en.wikipedia.org/wiki/Intel_Turbo_Boost">Turbo Boost</a> to maximize performance.
CPU cores can be temporarily turned off completely (CPU HALT, frequency of 0) to
reduce power consumption, and the frequency of cores changes regularly
depending on many factors like the workload and the temperature. Power
consumption is now an important part of the design of modern CPUs.</p>
<p>Warning! This article is a summary of what I learned over the last weeks from
various articles. It may be full of mistakes: don't hesitate to report them, so
I can enhance the article! It's hard to find simple articles explaining the
performance of modern Intel CPUs, so I tried to write mine.</p>
<div class="section" id="tools-used-in-this-article">
<h2>Tools used in this article</h2>
<p>This article mentions various tools. Commands to install them on Fedora 24:</p>
<p><tt class="docutils literal">dnf install <span class="pre">-y</span> <span class="pre">util-linux</span></tt>:</p>
<ul class="simple">
<li>lscpu</li>
</ul>
<p><tt class="docutils literal">dnf install <span class="pre">-y</span> <span class="pre">kernel-tools</span></tt>:</p>
<ul class="simple">
<li><a class="reference external" href="http://linux.die.net/man/1/cpupower">cpupower</a></li>
<li>turbostat</li>
</ul>
<p><tt class="docutils literal">sudo dnf install <span class="pre">-y</span> <span class="pre">msr-tools</span></tt>:</p>
<ul class="simple">
<li>rdmsr</li>
<li>wrmsr</li>
</ul>
<p>Other interesting tools, not used in this article: i7z (sadly no longer
maintained), lshw, dmidecode, sensors.</p>
<p>The sensors tool is supposed to report the current CPU voltage, but it doesn't
provide this information on my computers. At least, it gives the temperature of
different components, as well as the speed of fans.</p>
</div>
<div class="section" id="example-of-intel-cpus">
<h2>Example of Intel CPUs</h2>
<div class="section" id="my-laptop-cpu-proc-cpuinfo">
<h3>My laptop CPU: /proc/cpuinfo</h3>
<p>On Linux, the most common way to retrieve information on the CPU is to read
<tt class="docutils literal">/proc/cpuinfo</tt>. Example on my laptop:</p>
<pre class="literal-block">
selma$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
model name : Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
cpu MHz : 1200.214
...
processor : 1
vendor_id : GenuineIntel
model name : Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
cpu MHz : 3299.882
...
</pre>
<p>"i7-3520M" CPU is a model designed for Mobile Platforms (see the "M" suffix).
It was built in 2012 and is the third generation of the Intel i7
microarchitecture: <a class="reference external" href="https://en.wikipedia.org/wiki/Ivy_Bridge_(microarchitecture)">Ivy Bridge</a>.</p>
<p>The CPU has two physical cores, I disabled HyperThreading in the BIOS.</p>
<p>The first strange thing is that the CPU announces "2.90 GHz" but Linux reports
1.2 GHz on the first core, and 3.3 GHz on the second core. 3.3 GHz is greater
than 2.9 GHz!</p>
</div>
<div class="section" id="my-desktop-cpu-cpu-topology-with-lscpu">
<h3>My desktop CPU: CPU topology with lscpu</h3>
<p>cpuinfo:</p>
<pre class="literal-block">
smithers$ cat /proc/cpuinfo
processor : 0
physical id : 0
core id : 0
...
model name : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
cpu cores : 4
...
processor : 1
physical id : 0
core id : 1
...
(...)
processor : 7
physical id : 0
core id : 3
...
</pre>
<p>The CPU i7-2600 is the 2nd generation: <a class="reference external" href="https://en.wikipedia.org/wiki/Sandy_Bridge">Sandy Bridge microarchitecture</a>. There are 8 logical cores and 4
physical cores (so with Hyper-threading).</p>
<p>The <tt class="docutils literal">lscpu</tt> command renders a short table which helps to understand the CPU topology:</p>
<pre class="literal-block">
smithers$ lscpu -a -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ
0 0 0 0 0:0:0:0 yes 3800.0000 1600.0000
1 0 0 1 1:1:1:0 yes 3800.0000 1600.0000
2 0 0 2 2:2:2:0 yes 3800.0000 1600.0000
3 0 0 3 3:3:3:0 yes 3800.0000 1600.0000
4 0 0 0 0:0:0:0 yes 3800.0000 1600.0000
5 0 0 1 1:1:1:0 yes 3800.0000 1600.0000
6 0 0 2 2:2:2:0 yes 3800.0000 1600.0000
7 0 0 3 3:3:3:0 yes 3800.0000 1600.0000
</pre>
<p>There are 8 logical CPUs (<tt class="docutils literal">CPU <span class="pre">0..7</span></tt>), all on the same node (<tt class="docutils literal">NODE 0</tt>) and
the same socket (<tt class="docutils literal">SOCKET 0</tt>). There are only 4 physical cores (<tt class="docutils literal">CORE
<span class="pre">0..3</span></tt>). For example, the physical core <tt class="docutils literal">2</tt> is made of the two logical CPUs:
<tt class="docutils literal">2</tt> and <tt class="docutils literal">6</tt>.</p>
<p>Using the <tt class="docutils literal">L1d:L1i:L2:L3</tt> column, we can see that each pair of logical
cores shares the caches of its physical core for levels 1 (L1 data, L1
instruction) and 2 (L2). All physical cores share the same level 3 cache (L3).</p>
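<p>The logical-to-physical mapping can be rebuilt from the table. A minimal Python sketch, using the (CPU, CORE) pairs from the <tt class="docutils literal">lscpu</tt> output above:</p>

```python
# Group logical CPUs by physical core, using the lscpu table above.
pairs = [(0, 0), (1, 1), (2, 2), (3, 3),
         (4, 0), (5, 1), (6, 2), (7, 3)]  # (logical CPU, physical CORE)

cores = {}
for cpu, core in pairs:
    cores.setdefault(core, []).append(cpu)

print(cores)  # {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
```

<p>Each physical core hosts two logical CPUs: for example, core 2 is made of logical CPUs 2 and 6, as stated above.</p>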
</div>
</div>
<div class="section" id="p-states">
<h2>P-states</h2>
<p>A new CPU driver, <tt class="docutils literal">intel_pstate</tt>, was added to the Linux kernel 3.9 (April
2013). At first, it only supported Sandy Bridge CPUs (2nd generation); Linux 3.10
extended it to Ivy Bridge generation CPUs (3rd gen), and so on and so forth.</p>
<p>This driver supports recent features and thermal control of modern Intel CPUs.
Its name comes from P-states.</p>
<p>The processor P-state is the capability of running the processor at different
voltage and/or frequency levels. Generally, P0 is the highest state resulting
in maximum performance, while P1, P2, and so on, will save power but at some
penalty to CPU performance.</p>
<p>It is possible to force the legacy CPU driver (<tt class="docutils literal">acpi_cpufreq</tt>) using the
<tt class="docutils literal">intel_pstate=disable</tt> option on the kernel command line.</p>
<p>See also:</p>
<ul class="simple">
<li><a class="reference external" href="https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt">Documentation of the intel-pstate driver</a></li>
<li><a class="reference external" href="https://plus.google.com/+ArjanvandeVen/posts/dLn9T4ehywL">Some basics on CPU P states on Intel processors</a> (2013) by Arjan
van de Ven (Intel)</li>
<li><a class="reference external" href="https://events.linuxfoundation.org/sites/events/files/slides/LinuxConEurope_2015.pdf">Balancing Power and Performance in the Linux Kernel</a>
talk at LinuxCon Europe 2015 by Kristen Accardi (Intel)</li>
<li><a class="reference external" href="https://software.intel.com/en-us/blogs/2008/05/29/what-exactly-is-a-p-state-pt-1">What exactly is a P-state? (Pt. 1)</a>
(2008) by Taylor K. (Intel)</li>
</ul>
</div>
<div class="section" id="idle-states-c-states">
<h2>Idle states: C-states</h2>
<p>C-states are idle power saving states, in contrast to P-states, which are
execution power saving states.</p>
<p>During a P-state, the processor is still executing instructions, whereas during
a C-state (other than C0), the processor is idle, meaning that nothing is
executing.</p>
<p>C-states:</p>
<ul class="simple">
<li>C0 is the operational state, meaning that the CPU is doing useful work</li>
<li>C1 is the first idle state</li>
<li>C2 is the second idle state: The external I/O Controller Hub blocks
interrupts to the processor.</li>
<li>etc.</li>
</ul>
<p>When a logical processor is idle (in a C-state other than C0), its frequency
is typically 0 (HALT).</p>
<p>The <tt class="docutils literal">cpupower <span class="pre">idle-info</span></tt> command lists supported C-states:</p>
<pre class="literal-block">
selma$ cpupower idle-info
CPUidle driver: intel_idle
CPUidle governor: menu
analyzing CPU 0:
Number of idle states: 6
Available idle states: POLL C1-IVB C1E-IVB C3-IVB C6-IVB C7-IVB
...
</pre>
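<p>A minimal sketch of parsing that output in Python (the function name is mine; the format is assumed from the example above):</p>

```python
def parse_idle_info(output):
    """Parse `cpupower idle-info` text output (format assumed from the
    example above) into driver, governor and the list of C-states."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    states = info.get("Available idle states", "").split()
    return info.get("CPUidle driver"), info.get("CPUidle governor"), states

sample = """\
CPUidle driver: intel_idle
CPUidle governor: menu
analyzing CPU 0:
Number of idle states: 6
Available idle states: POLL C1-IVB C1E-IVB C3-IVB C6-IVB C7-IVB
"""
driver, governor, states = parse_idle_info(sample)
print(driver, governor, states)
```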
<p>The <tt class="docutils literal">cpupower monitor</tt> shows statistics on C-states:</p>
<pre class="literal-block">
smithers$ sudo cpupower monitor -m Idle_Stats
|Idle_Stats
CPU | POLL | C1-S | C1E- | C3-S | C6-S
0| 0,00| 0,19| 0,09| 0,58| 96,23
4| 0,00| 0,00| 0,00| 0,00| 99,90
1| 0,00| 2,34| 0,00| 0,00| 97,63
5| 0,00| 0,00| 0,17| 0,00| 98,02
2| 0,00| 0,00| 0,00| 0,00| 0,00
6| 0,00| 0,00| 0,00| 0,00| 0,00
3| 0,00| 0,00| 0,00| 0,00| 0,00
7| 0,00| 0,00| 0,00| 0,00| 49,97
</pre>
<p>See also: <a class="reference external" href="https://software.intel.com/en-us/articles/power-management-states-p-states-c-states-and-package-c-states">Power Management States: P-States, C-States, and Package C-States</a>.</p>
</div>
<div class="section" id="turbo-boost-1">
<h2>Turbo Boost</h2>
<p>In 2005, Intel introduced <a class="reference external" href="https://en.wikipedia.org/wiki/SpeedStep">SpeedStep</a>, a series of dynamic frequency
scaling technologies to reduce the power consumption of laptop CPUs. Turbo
Boost is an enhancement of these technologies, now also used on desktop and
server CPUs.</p>
<p>Turbo Boost allows one or more CPU cores to run at higher P-states than usual.
The maximum P-state is constrained by the following factors:</p>
<ul class="simple">
<li>The number of active cores (in C0 or C1 state)</li>
<li>The estimated current consumption of the processor (Imax)</li>
<li>The estimated power consumption (TDP - Thermal Design Power) of the processor</li>
<li>The temperature of the processor</li>
</ul>
<p>Example on my laptop:</p>
<pre class="literal-block">
selma$ cat /proc/cpuinfo
model name : Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
...
selma$ sudo cpupower frequency-info
analyzing CPU 0:
driver: intel_pstate
...
boost state support:
Supported: yes
Active: yes
3400 MHz max turbo 4 active cores
3400 MHz max turbo 3 active cores
3400 MHz max turbo 2 active cores
3600 MHz max turbo 1 active cores
</pre>
<p>The CPU base frequency is 2.9 GHz. If more than one physical core is "active"
(busy), their frequency can be increased up to 3.4 GHz. If only one physical
core is active, its frequency can be increased up to 3.6 GHz.</p>
<p>In this example, Turbo Boost is supported and active.</p>
<p>See also the <a class="reference external" href="https://www.kernel.org/doc/Documentation/cpu-freq/boost.txt">Linux cpu-freq documentation on CPU boost</a>.</p>
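<p>The frequency table above can be expressed as a tiny lookup (values are specific to this i7-3520M, and the function name is mine):</p>

```python
# Max turbo frequency (MHz) for a given number of active cores, from the
# `cpupower frequency-info` output above (values specific to this i7-3520M).
MAX_TURBO_MHZ = {1: 3600, 2: 3400, 3: 3400, 4: 3400}
BASE_MHZ = 2900  # base frequency of this CPU

def max_turbo_mhz(active_cores):
    """Upper P-state limit as a function of the active core count."""
    return MAX_TURBO_MHZ.get(active_cores, BASE_MHZ)

print(max_turbo_mhz(1))  # a single active core can boost the highest
print(max_turbo_mhz(4))
```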
<div class="section" id="turbo-boost-msr">
<h3>Turbo Boost MSR</h3>
<p>Bit 38 of the <a class="reference external" href="https://en.wikipedia.org/wiki/Model-specific_register">Model-specific register
(MSR)</a> <tt class="docutils literal">0x1a0</tt> can
be used to check whether Turbo Boost is enabled:</p>
<pre class="literal-block">
selma$ sudo rdmsr -f 38:38 0x1a0
0
</pre>
<p><tt class="docutils literal">0</tt> means that Turbo Boost is enabled, whereas <tt class="docutils literal">1</tt> means disabled (no
turbo). (The <tt class="docutils literal"><span class="pre">-f</span> 38:38</tt> option asks to display only bit 38.)</p>
<p>If the command doesn't work, you may have to load the <tt class="docutils literal">msr</tt> kernel module:</p>
<pre class="literal-block">
sudo modprobe msr
</pre>
<p>Note: I'm not sure that all Intel CPUs use the same MSR.</p>
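<p>The bit extraction itself is simple arithmetic; here is a sketch in Python (the MSR value must still be read from <tt class="docutils literal">/dev/cpu/N/msr</tt> as root, only the bit math is shown, and the <tt class="docutils literal">0x850089</tt> sample value is made up):</p>

```python
# Sketch of the `rdmsr -f 38:38` bit extraction. The MSR value itself must
# be read from /dev/cpu/N/msr (root only); here we only do the bit math.
TURBO_DISABLE_BIT = 38  # bit 38 of MSR 0x1a0 (IA32_MISC_ENABLE)

def turbo_disable_bit(msr_value):
    """Equivalent of `rdmsr -f 38:38 0x1a0`: 0 = turbo enabled, 1 = disabled."""
    return (msr_value >> TURBO_DISABLE_BIT) & 1

def turbo_boost_enabled(msr_value):
    return turbo_disable_bit(msr_value) == 0

print(turbo_disable_bit(0x850089))             # bit 38 clear: 0 (enabled)
print(turbo_disable_bit(0x850089 | 1 << 38))   # bit 38 set: 1 (disabled)
```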
</div>
<div class="section" id="intel-state-no-turbo">
<h3>intel_pstate/no_turbo</h3>
<p>Turbo Boost can also be disabled at runtime in the <tt class="docutils literal">intel_pstate</tt> driver.</p>
<p>Check if Turbo Boost is enabled:</p>
<pre class="literal-block">
selma$ cat /sys/devices/system/cpu/intel_pstate/no_turbo
0
</pre>
<p>where <tt class="docutils literal">0</tt> means that Turbo Boost is enabled. Disable Turbo Boost:</p>
<pre class="literal-block">
selma$ echo 1|sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
</pre>
</div>
<div class="section" id="cpu-flag-ida">
<h3>CPU flag "ida"</h3>
<p>It looks like the Turbo Boost status (supported or not) can also be read via
CPUID(6): "Thermal/Power Management". It gives access to the flag <a class="reference external" href="https://en.wikipedia.org/wiki/Intel_Dynamic_Acceleration">Intel
Dynamic Acceleration (IDA)</a>.</p>
<p>The <tt class="docutils literal">ida</tt> flag can also be seen in CPU flags of <tt class="docutils literal">/proc/cpuinfo</tt>.</p>
</div>
</div>
<div class="section" id="read-the-cpu-frequency">
<h2>Read the CPU frequency</h2>
<p>General information using <tt class="docutils literal">cpupower <span class="pre">frequency-info</span></tt>:</p>
<pre class="literal-block">
selma$ cpupower -c 0 frequency-info
analyzing CPU 0:
driver: intel_pstate
...
hardware limits: 1.20 GHz - 3.60 GHz
...
</pre>
<p>The frequency of CPUs is between 1.2 GHz and 3.6 GHz (the base frequency is
2.9 GHz on this CPU).</p>
<div class="section" id="get-the-frequency-of-cpus-turbostat">
<h3>Get the frequency of CPUs: turbostat</h3>
<p>It looks like the most reliable way to get a realistic estimate of the CPU
frequency is to use the tool <tt class="docutils literal">turbostat</tt>:</p>
<pre class="literal-block">
selma$ sudo turbostat
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz
- 224 7.80 2878 2893
0 448 15.59 2878 2893
1 0 0.01 2762 2893
CPU Avg_MHz Busy% Bzy_MHz TSC_MHz
- 139 5.65 2469 2893
0 278 11.29 2469 2893
1 0 0.01 2686 2893
...
</pre>
<ul class="simple">
<li><tt class="docutils literal">Avg_MHz</tt>: average frequency, based on APERF</li>
<li><tt class="docutils literal">Busy%</tt>: CPU usage in percent</li>
<li><tt class="docutils literal">Bzy_MHz</tt>: busy frequency, based on MPERF</li>
<li><tt class="docutils literal">TSC_MHz</tt>: fixed frequency, TSC stands for <a class="reference external" href="https://en.wikipedia.org/wiki/Time_Stamp_Counter">Time Stamp Counter</a></li>
</ul>
<p>APERF (actual performance) and MPERF (maximum performance) are MSRs that
provide feedback on the current CPU frequency.</p>
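<p>My understanding of how turbostat derives its columns from the counter deltas can be sketched as follows (the function name and sample numbers are mine, and turbostat's exact formulas may differ slightly):</p>

```python
def turbostat_columns(d_aperf, d_mperf, d_tsc, tsc_mhz):
    """Reconstruct turbostat's columns from counter deltas (my reading of
    the tool's output; turbostat's exact formulas may differ slightly)."""
    busy_pct = 100.0 * d_mperf / d_tsc     # MPERF only ticks while in C0
    bzy_mhz = tsc_mhz * d_aperf / d_mperf  # frequency while busy
    avg_mhz = bzy_mhz * busy_pct / 100.0   # averaged over idle time too
    return busy_pct, bzy_mhz, avg_mhz

# A CPU busy 10% of the interval, running near 2878 MHz while busy:
busy, bzy, avg = turbostat_columns(d_aperf=2878, d_mperf=2893, d_tsc=28930,
                                   tsc_mhz=2893.0)
print(f"Busy% {busy:.2f}  Bzy_MHz {bzy:.0f}  Avg_MHz {avg:.0f}")
```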
</div>
<div class="section" id="other-tools-to-get-the-cpu-frequency">
<h3>Other tools to get the CPU frequency</h3>
<p>It looks like the following tools are less reliable to estimate the CPU
frequency.</p>
<p>cpuinfo:</p>
<pre class="literal-block">
selma$ grep MHz /proc/cpuinfo
cpu MHz : 1372.289
cpu MHz : 3401.042
</pre>
<p>In April 2016, Len Brown proposed a patch modifying cpuinfo to use APERF and
MPERF MSR to estimate the CPU frequency: <a class="reference external" href="https://lkml.org/lkml/2016/4/1/7">x86: Calculate MHz using APERF/MPERF
for cpuinfo and scaling_cur_freq</a>.</p>
<p>The <tt class="docutils literal">tsc</tt> clock source logs the CPU frequency in kernel logs:</p>
<pre class="literal-block">
selma$ dmesg|grep 'MHz processor'
[ 0.000000] tsc: Detected 2893.331 MHz processor
</pre>
<p>cpupower frequency-info:</p>
<pre class="literal-block">
selma$ for core in $(seq 0 1); do sudo cpupower -c $core frequency-info|grep 'current CPU'; done
current CPU frequency: 3.48 GHz (asserted by call to hardware)
current CPU frequency: 3.40 GHz (asserted by call to hardware)
</pre>
<p>cpupower monitor:</p>
<pre class="literal-block">
selma$ sudo cpupower monitor -m 'Mperf'
|Mperf
CPU | C0 | Cx | Freq
0| 4.77| 95.23| 1924
1| 0.01| 99.99| 1751
</pre>
</div>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>Modern Intel CPUs use various technologies to provide the best performance
without excessive power consumption. Monitoring and understanding CPU
performance has become harder than with older CPUs, since performance now
depends on many more factors.</p>
<p>It also becomes common to get an integrated graphics processor (IGP) in the
same package, which makes the exact performance even more complex to predict,
since the IGP produces heat and so has an impact on the CPU P-state.</p>
<p>I should also explain that P-states are "voted" on between CPU cores, but I
didn't fully understand this part. I'm not sure that understanding the exact
algorithm matters much. I tried not to give too much information.</p>
</div>
<div class="section" id="annex-amt-and-the-me-power-management-coprocessor">
<h2>Annex: AMT and the ME (power management coprocessor)</h2>
<p>Computers with Intel vPro technology include <a class="reference external" href="https://en.wikipedia.org/wiki/Intel_Active_Management_Technology">Intel Active Management
Technology (AMT)</a>: "hardware
and firmware technology for remote out-of-band management of personal
computers". AMT has many features which includes power management.</p>
<p><a class="reference external" href="https://en.wikipedia.org/wiki/Intel_Active_Management_Technology#Hardware">Management Engine (ME)</a>
is the hardware part: an isolated and protected coprocessor, embedded as a
non-optional part in all current (as of 2015) Intel chipsets. The coprocessor
is a special 32-bit ARC microprocessor (RISC architecture) that's physically
located inside the PCH chipset (or MCH on older chipsets). The coprocessor can
for example be found on Intel MCH chipsets Q35 and Q45.</p>
<p>See <a class="reference external" href="https://boingboing.net/2016/06/15/intel-x86-processors-ship-with.html">Intel x86s hide another CPU that can take over your machine (you can't
audit it)</a> for
more information on the coprocessor.</p>
<p>More recently, the Intel Xeon Phi CPU (2016) also includes a coprocessor for
power management. I couldn't determine whether it is the same coprocessor.</p>
</div>
Visualize the system noise using perf and CPU isolation2016-06-16T13:30:00+02:002016-06-16T13:30:00+02:00Victor Stinnertag:vstinner.github.io,2016-06-16:/perf-visualize-system-noise-with-cpu-isolation.html<p>I developed a new <a class="reference external" href="http://perf.readthedocs.io/">perf module</a> designed to run
stable benchmarks, give fine control on benchmark parameters and compute
statistics on results. With such a tool, it becomes simple to <em>visualize</em>
sources of noise. The CPU isolation will be used to visualize the system noise.
Running a benchmark on isolated CPUs …</p><p>I developed a new <a class="reference external" href="http://perf.readthedocs.io/">perf module</a> designed to run
stable benchmarks, give fine control on benchmark parameters and compute
statistics on results. With such a tool, it becomes simple to <em>visualize</em>
sources of noise. The CPU isolation will be used to visualize the system noise.
Running a benchmark on isolated CPUs isolates it from the system noise.</p>
<div class="section" id="isolate-cpus">
<h2>Isolate CPUs</h2>
<p>My computer has 4 physical CPU cores. I isolated half of them using the
<tt class="docutils literal">isolcpus=2,3</tt> parameter of the Linux kernel. I manually modified the command
line in GRUB to add this parameter.</p>
<p>Check that CPUs are isolated:</p>
<pre class="literal-block">
$ cat /sys/devices/system/cpu/isolated
2-3
</pre>
<p>The CPU supports HyperThreading, but I disabled it in the BIOS.</p>
</div>
<div class="section" id="run-a-benchmark">
<h2>Run a benchmark</h2>
<p>The <tt class="docutils literal">perf</tt> module automatically detects and uses isolated CPU cores. I will
use the <tt class="docutils literal"><span class="pre">--affinity=0,1</span></tt> option to force running the benchmark on the CPUs
which are not isolated.</p>
<p>Microbenchmark with and without CPU isolation:</p>
<pre class="literal-block">
$ python3 -m perf.timeit --json-file=timeit_isolcpus.json --verbose -s 'x=1; y=2' 'x+y'
Pin process to isolated CPUs: 2-3
.........................
Median +- std dev: 36.6 ns +- 0.1 ns (25 runs x 3 samples x 10^7 loops; 1 warmup)
$ python3 -m perf.timeit --affinity=0,1 --json-file=timeit_no_isolcpus.json --verbose -s 'x=1; y=2' 'x+y'
Pin process to CPUs: 0-1
.........................
Median +- std dev: 36.7 ns +- 1.3 ns (25 runs x 3 samples x 10^7 loops; 1 warmup)
</pre>
<p>My computer was not 100% idle: I was using it while the benchmarks were
running.</p>
<p>The median is almost the same (36.6 ns and 36.7 ns). The first major difference
is the standard deviation: it is much larger without CPU isolation: 0.1 ns =>
1.3 ns (13x larger).</p>
<p>Just in case, check manually CPU affinity in metadata:</p>
<pre class="literal-block">
$ python3 -m perf show timeit_isolcpus.json --metadata | grep cpu
- cpu_affinity: 2-3 (isolated)
- cpu_count: 4
- cpu_model_name: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
$ python3 -m perf show timeit_no_isolcpus.json --metadata | grep cpu_affinity
- cpu_affinity: 0-1
</pre>
</div>
<div class="section" id="statistics">
<h2>Statistics</h2>
<p>The <tt class="docutils literal">perf stats</tt> command computes statistics on the distribution of samples:</p>
<pre class="literal-block">
$ python3 -m perf stats timeit_isolcpus.json
Number of samples: 75
Minimum: 36.5 ns (-0.1%)
Median +- std dev: 36.6 ns +- 0.1 ns (36.5 ns .. 36.7 ns)
Maximum: 36.7 ns (+0.4%)
$ python3 -m perf stats timeit_no_isolcpus.json
Number of samples: 75
Minimum: 36.5 ns (-0.5%)
Median +- std dev: 36.7 ns +- 1.3 ns (35.4 ns .. 38.0 ns)
Maximum: 43.0 ns (+17.0%)
</pre>
<p>The minimum is the same. The second major difference is the maximum: it is much
larger without CPU isolation: 36.7 ns (+0.4%) => 43.0 ns (+17.0%).</p>
<p>The difference between the maximum and the median is 63x larger without CPU
isolation: 0.1 ns (<tt class="docutils literal">36.7 - 36.6</tt>) => 6.3 ns (<tt class="docutils literal">43.0 - 36.7</tt>).</p>
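<p>These comparisons are simple arithmetic on the numbers above; a quick sketch (the function name is mine, and the rounding may differ slightly from what perf prints):</p>

```python
def spread(minimum, median, maximum):
    """Relative distance of min/max from the median, as printed by
    `perf stats` (rounding may differ slightly from perf's output)."""
    return (100.0 * (minimum - median) / median,
            100.0 * (maximum - median) / median)

low, high = spread(36.5, 36.7, 43.0)   # run without CPU isolation
print(f"Minimum: {low:+.1f}%  Maximum: {high:+.1f}%")

# Tail size (maximum - median), with and without isolation:
tail_isol = 36.7 - 36.6        # isolated run: max - median
tail_no_isol = 43.0 - 36.7     # non-isolated run: max - median
print(round(tail_no_isol / tail_isol))  # roughly 63x larger without isolation
```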
<p>Depending on the system load, a single sample of the microbenchmark is up to
17% slower (maximum of 43.0 ns with a median of 36.7 ns) without CPU isolation.
The difference is smaller with CPU isolation: only 0.4% slower (for the
maximum, and 0.1% faster for the minimum).</p>
</div>
<div class="section" id="histogram">
<h2>Histogram</h2>
<p>Another way to analyze the distribution of samples is to render a histogram:</p>
<pre class="literal-block">
$ python3 -m perf hist --bins=8 timeit_isolcpus.json timeit_no_isolcpus.json
[ timeit_isolcpus ]
36.1 ns: 75 ################################################
36.9 ns: 0 |
37.7 ns: 0 |
38.5 ns: 0 |
39.3 ns: 0 |
40.1 ns: 0 |
40.9 ns: 0 |
41.7 ns: 0 |
42.5 ns: 0 |
[ timeit_no_isolcpus ]
36.1 ns: 52 ################################################
36.9 ns: 13 ############
37.7 ns: 1 #
38.5 ns: 4 ####
39.3 ns: 2 ##
40.1 ns: 0 |
40.9 ns: 1 #
41.7 ns: 0 |
42.5 ns: 2 ##
</pre>
<p>I chose the number of bars to get a small histogram and to fit all samples of
the first benchmark into the same bar. With 8 bars, each bar covers a range of
0.8 ns.</p>
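<p>The binning behind such a histogram can be sketched in a few lines (a simplified version of my own, not perf's actual code; the sample timings are made up):</p>

```python
def histogram(samples, bins=8):
    """Bin samples into equal-width buckets, similar in spirit to
    `perf hist --bins=8` (a simplified sketch, not perf's exact code)."""
    low, high = min(samples), max(samples)
    width = (high - low) / bins or 1.0
    counts = [0] * bins
    for x in samples:
        index = min(int((x - low) / width), bins - 1)
        counts[index] += 1
    return [(low + i * width, count) for i, count in enumerate(counts)]

for start, count in histogram([36.6, 36.6, 36.7, 38.5, 43.0], bins=4):
    print(f"{start:5.1f} ns: {count:2d} {'#' * count}")
```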
<p>The last major difference is the shape of these histograms. Without CPU
isolation, there is a "long tail" to the right of the median: <a class="reference external" href="https://en.wikipedia.org/wiki/Outlier">outliers</a> in the range [37.7 ns; 42.5 ns].
The outliers come from the "noise" caused by the multitasking system.</p>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>The <tt class="docutils literal">perf</tt> module provides multiple tools to analyze the distribution of
benchmark samples. Three tools show a major difference without CPU isolation
compared to results with CPU isolation:</p>
<ul class="simple">
<li>Standard deviation: 13x larger without isolation</li>
<li>Maximum: difference to median 63x larger without isolation</li>
<li>Shape of the histogram: long tail to the right of the median</li>
</ul>
<p>It explains why CPU isolation helps to make benchmarks more stable.</p>
</div>
My journey to stable benchmark, part 3 (average)2016-05-23T23:00:00+02:002016-05-23T23:00:00+02:00Victor Stinnertag:vstinner.github.io,2016-05-23:/journey-to-stable-benchmark-average.html<p class="first last">My journey to stable benchmark, part 3 (average)</p>
<a class="reference external image-reference" href="https://www.flickr.com/photos/stanzim/11100202065/"><img alt="Fog" src="https://vstinner.github.io/images/fog.jpg" /></a>
<p><em>Stable benchmarks are so close, but ...</em></p>
<div class="section" id="address-space-layout-randomization">
<h2>Address Space Layout Randomization</h2>
<p>When I started to work on removing the noise of the system, I was told that
disabling <a class="reference external" href="https://en.wikipedia.org/wiki/Address_space_layout_randomization">Address Space Layout Randomization (ASLR)</a> makes
benchmarks more stable.</p>
<p>I followed this advice without trying to understand it. We will see in this
article that it was a bad idea, but I had to hit other issues to really
understand the root issue with disabling ASLR.</p>
<p>Example of command to see the effect of ASLR, the first number of the output is
the start address of the heap memory:</p>
<pre class="literal-block">
$ python -c 'import os; os.system("grep heap /proc/%s/maps" % os.getpid())'
55e6a716c000-55e6a7235000 rw-p 00000000 00:00 0 [heap]
</pre>
<p>Heap address of 3 runs with ASLR enabled (random):</p>
<ul class="simple">
<li>55e6a716c000</li>
<li>561c218eb000</li>
<li>55e6f628f000</li>
</ul>
<p>Disable ASLR:</p>
<pre class="literal-block">
sudo bash -c 'echo 0 >| /proc/sys/kernel/randomize_va_space'
</pre>
<p>Heap addresses of 3 runs with ASLR disabled (all the same):</p>
<ul class="simple">
<li>555555756000</li>
<li>555555756000</li>
<li>555555756000</li>
</ul>
<p>Note: To reenable ASLR, it's better to use the value 2, the value 1 only
partially enables the feature:</p>
<pre class="literal-block">
sudo bash -c 'echo 2 >| /proc/sys/kernel/randomize_va_space'
</pre>
</div>
<div class="section" id="python-randomized-hash-function">
<h2>Python randomized hash function</h2>
<p>With <a class="reference external" href="https://vstinner.github.io/journey-to-stable-benchmark-system.html">system tuning (part 1)</a>, a
<a class="reference external" href="https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html">Python compiled with PGO (part 2)</a>
and ASLR disabled, I still failed to get the same result when manually
running <tt class="docutils literal">bm_call_simple.py</tt>.</p>
<p>On Python 3, the hash function is now randomized by default: <a class="reference external" href="http://bugs.python.org/issue13703">issue #13703</a>. The problem is that for a
microbenchmark, the number of hash collisions of a "hot" dictionary has a
non-negligible impact on performance.</p>
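<p>The randomization is easy to observe by spawning fresh interpreters (the helper name and the hashed string are mine, just to illustrate):</p>

```python
import os
import subprocess
import sys

def hash_with_seed(seed):
    """Hash a string in a fresh interpreter with a fixed PYTHONHASHSEED
    (helper name is mine, just to illustrate the randomization)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.run([sys.executable, "-c", "print(hash('benchmark'))"],
                         env=env, capture_output=True, text=True, check=True)
    return int(out.stdout)

# A fixed seed is reproducible across processes; different seeds almost
# always produce different hash values, hence different collisions.
print(hash_with_seed(1) == hash_with_seed(1))
print(hash_with_seed(1), hash_with_seed(2))
```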
<p>The <tt class="docutils literal">PYTHONHASHSEED</tt> environment variable can be used to get a fixed hash
function. Example with the patch:</p>
<pre class="literal-block">
$ PYTHONHASHSEED=1 taskset -c 1 ./python bm_call_simple.py -n 1
0.198
$ PYTHONHASHSEED=2 taskset -c 1 ./python bm_call_simple.py -n 1
0.201
$ PYTHONHASHSEED=3 taskset -c 1 ./python bm_call_simple.py -n 1
0.207
$ PYTHONHASHSEED=4 taskset -c 1 ./python bm_call_simple.py -n 1
0.187
$ PYTHONHASHSEED=5 taskset -c 1 ./python bm_call_simple.py -n 1
0.180
</pre>
<p>Timings of the reference python:</p>
<pre class="literal-block">
$ PYTHONHASHSEED=1 taskset -c 1 ./ref_python bm_call_simple.py -n 1
0.204
$ PYTHONHASHSEED=2 taskset -c 1 ./ref_python bm_call_simple.py -n 1
0.206
$ PYTHONHASHSEED=3 taskset -c 1 ./ref_python bm_call_simple.py -n 1
0.195
$ PYTHONHASHSEED=4 taskset -c 1 ./ref_python bm_call_simple.py -n 1
0.192
$ PYTHONHASHSEED=5 taskset -c 1 ./ref_python bm_call_simple.py -n 1
0.187
</pre>
<p>The minimum is 187 ms for the reference and 180 ms for the patch. The patched
Python is 4% faster, yeah!</p>
<p>Wait. What if we only test PYTHONHASHSEED from 1 to 3? In this case, the
minimum is 195 ms for the reference and 198 ms for the patch. The patched
Python becomes 2% slower, oh no!</p>
<p>Faster? Slower? Who is right?</p>
<p>Maybe I should write a script to find a <tt class="docutils literal">PYTHONHASHSEED</tt> value for which my
patch is always faster :-)</p>
</div>
<div class="section" id="command-line-and-environment-variables">
<h2>Command line and environment variables</h2>
<p>Well, let's say that we will use a fixed PYTHONHASHSEED value. Anyway, my
patch doesn't touch the hash function. So it doesn't matter.</p>
<p>While running benchmarks, I noticed differences when running the benchmark from
a different directory:</p>
<pre class="literal-block">
$ cd /home/haypo/prog/python/fastcall
$ PYTHONHASHSEED=3 taskset -c 1 pgo/python ../benchmarks/performance/bm_call_simple.py -n 1
0.215
$ cd /home/haypo/prog/python/benchmarks
$ PYTHONHASHSEED=3 taskset -c 1 ../fastcall/pgo/python ../benchmarks/performance/bm_call_simple.py -n 1
0.203
$ cd /home/haypo/prog/python
$ PYTHONHASHSEED=3 taskset -c 1 fastcall/pgo/python benchmarks/performance/bm_call_simple.py -n 1
0.200
</pre>
<p>In fact, a different command line is enough to get different results (added
arguments are ignored):</p>
<pre class="literal-block">
$ PYTHONHASHSEED=3 taskset -c 1 ./python bm_call_simple.py -n 1
0.201
$ PYTHONHASHSEED=3 taskset -c 1 ./python bm_call_simple.py -n 1 arg1
0.198
$ PYTHONHASHSEED=3 taskset -c 1 ./python bm_call_simple.py -n 1 arg1 arg2 arg3
0.203
$ PYTHONHASHSEED=3 taskset -c 1 ./python bm_call_simple.py -n 1 arg1 arg2 arg3 arg4 arg5
0.206
$ PYTHONHASHSEED=3 taskset -c 1 ./python bm_call_simple.py -n 1 arg1 arg2 arg3 arg4 arg5 arg6
0.210
</pre>
<p>I also noticed minor differences when the environment changes (added variables
are ignored):</p>
<pre class="literal-block">
$ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py -n 1
0.201
$ taskset -c 1 env -i PYTHONHASHSEED=3 VAR1=1 VAR2=2 VAR3=3 VAR4=4 ./python bm_call_simple.py -n 1
0.202
$ taskset -c 1 env -i PYTHONHASHSEED=3 VAR1=1 VAR2=2 VAR3=3 VAR4=4 VAR5=5 ./python bm_call_simple.py -n 1
0.198
</pre>
<p>Using <tt class="docutils literal">strace</tt> and <tt class="docutils literal">ltrace</tt>, I saw that the memory addresses are different when
something (command line, env var, etc.) changes.</p>
</div>
<div class="section" id="average-and-standard-deviation">
<h2>Average and standard deviation</h2>
<p>Basically, it looks like a lot of "external factors" have an impact on the
exact memory addresses, even if ASLR is disabled and PYTHONHASHSEED is set. I
started to think how to get <em>exactly</em> the same command line, the same
environment (easy), the same current directory (easy), etc. The problem is that
it's just not possible to control all external factors (having an effect on the
exact memory addresses).</p>
<p>Maybe I was plain wrong from the beginning and ASLR must be enabled,
as the default on Linux:</p>
<pre class="literal-block">
$ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py
0.198
$ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py
0.202
$ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py
0.199
$ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py
0.207
$ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py
0.200
$ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py
0.201
</pre>
<p>These results look "random". Yes, they are. It's exactly the purpose of ASLR.</p>
<p>But how can we compare performances if results are random? Take the minimum?</p>
<p>No! You must never (ever again) use the minimum for benchmarking! Compute the
average and some statistics like the standard deviation:</p>
<pre class="literal-block">
$ python3
Python 3.4.3
>>> timings=[0.198, 0.202, 0.199, 0.207, 0.200, 0.201]
>>> import statistics
>>> statistics.mean(timings)
0.2011666666666667
>>> statistics.stdev(timings)
0.0031885210782848245
</pre>
<p>On this example, the average is 201 ms +/- 3 ms. IMHO the standard deviation is
quite small (reliable) which means that my benchmark is stable. To get a good
distribution, it's better to have many samples. It looks like at least 25
processes are needed. Each process tests a different memory layout and a
different hash function.</p>
<p>Result of 5 runs, each run uses 25 processes (ASLR enabled, random hash
function):</p>
<ul class="simple">
<li>Average: 205.2 ms +/- 3.0 ms (min: 201.1 ms, max: 214.9 ms)</li>
<li>Average: 205.6 ms +/- 3.3 ms (min: 201.4 ms, max: 216.5 ms)</li>
<li>Average: 206.0 ms +/- 3.9 ms (min: 201.1 ms, max: 215.3 ms)</li>
<li>Average: 205.7 ms +/- 3.6 ms (min: 201.5 ms, max: 217.8 ms)</li>
<li>Average: 206.4 ms +/- 3.5 ms (min: 201.9 ms, max: 214.9 ms)</li>
</ul>
<p>While memory layout and hash functions are random again, the result looks
<em>less</em> random, and so more reliable, than before!</p>
<p>With ASLR enabled, the effect of the environment variables, command line and
current directory is negligible on the (average) result.</p>
</div>
<div class="section" id="the-average-solves-issues-with-uniform-random-noises">
<h2>The average solves issues with uniform random noises</h2>
<p>The user will run the application with default system settings which means
ASLR enabled and Python hash function randomized. Running a benchmark in one
specific environment is a mistake because it is not representative of the
performance in practice.</p>
<p>Computing the average and standard deviation "fixes" the issue with hash
randomization. It's much better to use random hash functions and compute the
average, than using a fixed hash function (setting <tt class="docutils literal">PYTHONHASHSEED</tt> variable
to a value).</p>
<p>Oh wow, already 3 big articles explaining how to get stable benchmarks. Please
tell me that it was the last one! Nope, more is coming...</p>
</div>
<div class="section" id="annex-why-only-n1">
<h2>Annex: why only -n1?</h2>
<p>In this article, I ran <tt class="docutils literal">bm_call_simple.py</tt> with <tt class="docutils literal"><span class="pre">-n</span> 1</tt>, which runs only one
iteration.</p>
<p>Usually, a single iteration is not reliable at all, at least 50 iterations are
needed. But thanks to system tuning, compilation with PGO, ASLR disabled and
<tt class="docutils literal">PYTHONHASHSEED</tt> set, a single iteration is enough.</p>
<p>Example of 3 runs, each with 3 iterations:</p>
<pre class="literal-block">
$ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py -n 3
0.201
0.201
0.201
$ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py -n 3
0.201
0.201
0.201
$ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py -n 3
0.201
0.201
0.201
</pre>
<p>Always the same timing!</p>
</div>
My journey to stable benchmark, part 2 (deadcode)2016-05-22T22:00:00+02:002016-05-22T22:00:00+02:00Victor Stinnertag:vstinner.github.io,2016-05-22:/journey-to-stable-benchmark-deadcode.html<p class="first last">My journey to stable benchmark, part 2 (deadcode)</p>
<a class="reference external image-reference" href="https://www.flickr.com/photos/uw67/16875152403/"><img alt="Snail" src="https://vstinner.github.io/images/snail.jpg" /></a>
<p>With <a class="reference external" href="https://vstinner.github.io/journey-to-stable-benchmark-system.html">the system tuning (part 1)</a>, I
expected to get very stable benchmarks and so I started to benchmark seriously
my <a class="reference external" href="https://bugs.python.org/issue26814">FASTCALL branch</a> of CPython (a new
calling convention avoiding temporary tuples).</p>
<p>I was disappointed to get many slowdowns in the CPython benchmark suite. I
started to analyze why my change introduced performance regressions.</p>
<p>I took my overall patch and slowly reverted more and more code to check which
changes introduced most of the slowdowns.</p>
<p>I focused on the <tt class="docutils literal">call_simple</tt> benchmark which does only one thing: call
Python functions which do nothing. Making Python function calls slower would
be a big and unacceptable regression in my work.</p>
<div class="section" id="linux-perf">
<h2>Linux perf</h2>
<p>I started to learn how to use the great <a class="reference external" href="https://perf.wiki.kernel.org/index.php/Main_Page">Linux perf</a> tool to analyze why
<tt class="docutils literal">call_simple</tt> was slower. I tried to find a major difference between my
reference python and the patched python.</p>
<p>I analyzed cache misses on L1 instruction and data caches. I analyzed stalled
CPU cycles. I analyzed all memory events, branch events, etc. Basically, I tried
all perf events and spent a lot of time running benchmarks multiple times.</p>
<p>By the way, I strongly suggest running <tt class="docutils literal">perf stat</tt> with the <tt class="docutils literal"><span class="pre">--repeat</span></tt>
command line option to get an average over multiple runs and see the standard
deviation. It helps to get more reliable numbers. I even wrote a Python script
implementing <tt class="docutils literal"><span class="pre">--repeat</span></tt> (run perf multiple times, parse the output), before
seeing that it was already a builtin feature!</p>
<p>Use <tt class="docutils literal">perf list</tt> to list all available (pre-defined) events.</p>
<p>After many days, I decided to give up with perf.</p>
</div>
<div class="section" id="cachegrind">
<h2>Cachegrind</h2>
<a class="reference external image-reference" href="http://valgrind.org/"><img alt="Logo of the Valgrind project" src="https://vstinner.github.io/images/valgrind.png" /></a>
<p><a class="reference external" href="http://valgrind.org/">Valgrind</a> is a great tool known to detect memory
leaks, but it also contains gems like the <a class="reference external" href="http://valgrind.org/docs/manual/cg-manual.html">Cachegrind tool</a> which <em>simulates</em> the
CPU caches.</p>
<p>I used Cachegrind with the nice <a class="reference external" href="http://kcachegrind.sourceforge.net/">Kcachegrind GUI</a>. Sadly, I also failed to see anything
obvious in cache misses between the reference python and the patched python.</p>
</div>
<div class="section" id="strace-and-ltrace">
<h2>strace and ltrace</h2>
<img alt="strace and ltrace" src="https://vstinner.github.io/images/strace_ltrace.png" />
<p>I also tried <tt class="docutils literal">strace</tt> and <tt class="docutils literal">ltrace</tt> tools to try to see a difference in the
execution of the reference and the patched pythons. I saw different memory
addresses, but no major difference which can explain a difference of the
timing.</p>
<p>Moreover, the hot code simply does not call any syscall or library
function. It's pure CPU-bound code.</p>
</div>
<div class="section" id="compiler-options">
<h2>Compiler options</h2>
<a class="reference external image-reference" href="https://gcc.gnu.org/"><img alt="GCC logo" class="align-right" src="https://vstinner.github.io/images/gcc.png" /></a>
<p>I used <a class="reference external" href="https://gcc.gnu.org/">GCC</a> to build the code. Just in case, I tried the
LLVM compiler, but it didn't "fix" the issue.</p>
<p>I also tried different optimization levels: <tt class="docutils literal"><span class="pre">-O0</span></tt>, <tt class="docutils literal"><span class="pre">-O1</span></tt>, <tt class="docutils literal"><span class="pre">-O2</span></tt> and
<tt class="docutils literal"><span class="pre">-O3</span></tt>.</p>
<p>I read that the exact address of functions can have an impact on the CPU L1
cache: <a class="reference external" href="https://stackoverflow.com/questions/19470873/why-does-gcc-generate-15-20-faster-code-if-i-optimize-for-size-instead-of-speed">Why does gcc generate 15-20% faster code if I optimize for size instead
of speed?</a>.
I tried various values of the <tt class="docutils literal"><span class="pre">-falign-functions=N</span></tt> option (1, 2, 6, 12).</p>
<p>I also tried <tt class="docutils literal"><span class="pre">-fno-omit-frame-pointer</span></tt> (keep the frame pointer) to record the call graph with <tt class="docutils literal">perf record</tt>.</p>
<p>I also tried <tt class="docutils literal"><span class="pre">-flto</span></tt>: Link Time Optimization (LTO).</p>
<p>These compiler options didn't fix the issue.</p>
<p>The truth is out there.</p>
<p><strong>UPDATE:</strong> See also <a class="reference external" href="https://lwn.net/Articles/534735/">Rethinking optimization for size</a> article on Linux Weekly News (LWN):
<em>"Such an option has obvious value if one is compiling for a
space-constrained environment like a small device. But it turns out that, in
some situations, optimizing for space can also produce faster code."</em></p>
</div>
<div class="section" id="when-cpython-performance-depends-on-dead-code">
<h2>When CPython performance depends on dead code</h2>
<p>I continued to revert changes. In the end, my giant patch was reduced to a
very few changes which only added code that was never called (at least, I was
sure that it was not called by the <tt class="docutils literal">call_simple</tt> benchmark).</p>
<p>Let me rephrase: <em>adding dead code</em> makes Python slower. What?</p>
<p>A colleague suggested that I remove the body of the added function (replace it
with <tt class="docutils literal">return;</tt>): the code became faster. OK, now I was completely
lost. To be clear, I did not expect adding dead code to have <em>any</em> impact on
performance.</p>
<p>My email <a class="reference external" href="https://mail.python.org/pipermail/speed/2016-April/000341.html">When CPython performance depends on dead code...</a> explains how
to reproduce the issue and contains a lot of information.</p>
</div>
<div class="section" id="solution-pgo">
<h2>Solution: PGO</h2>
<p>The solution is called Profile-Guided Optimization, "PGO". The Python build
system supports it with a single command: <tt class="docutils literal">make <span class="pre">profile-opt</span></tt>, which profiles the
execution of the Python test suite.</p>
<p>Using PGO, adding dead code no longer has any impact on performance.</p>
<p>With system tuning and PGO compilation, benchmarks must be stable this
time, no? ... No, sorry, not yet. We will see more sources of noise in the
following articles ;-)</p>
</div>
My journey to stable benchmark, part 1 (system)2016-05-21T16:50:00+02:002016-05-21T16:50:00+02:00Victor Stinnertag:vstinner.github.io,2016-05-21:/journey-to-stable-benchmark-system.html<p class="first last">My journey to stable benchmark, part 1</p>
<div class="section" id="background">
<h2>Background</h2>
<p>In CPython development, it has become common to require the results of the
<a class="reference external" href="https://hg.python.org/benchmarks">CPython benchmark suite</a> ("The Grand
Unified Python Benchmark Suite") to evaluate the effect of an optimization
patch. The minimum requirement is to not introduce performance regressions.</p>
<p>I used the CPython benchmark suite and had many bad surprises when trying to
analyze (understand) the results. A change expected to be faster makes some
benchmarks slower without any obvious reason. At best, the change is faster on
some specific benchmarks, but has no impact on the other benchmarks. The
slowdown is usually between 5% and 10%. I am not comfortable with any kind of
slowdown.</p>
<p>Many benchmarks look unstable, which makes it hard to trust the overall
report. Some developers started to say that they had learned to ignore some
benchmarks known to be unstable.</p>
<p>It's not the first time that I have been totally disappointed by microbenchmark
results, so I decided to analyze the issue completely and go as deep as
possible to really understand the problem.</p>
</div>
<div class="section" id="how-to-get-stable-benchmarks-on-a-busy-linux-system">
<h2>How to get stable benchmarks on a busy Linux system</h2>
<p>Common advice for getting stable benchmarks is to stay away from the keyboard
("freeze!") and stop all other applications, so that only one application runs:
the benchmark.</p>
<p>Well, I work on a single computer and the full CPython benchmark suite
takes up to 2 hours in rigorous mode. I just cannot stop working for 2 hours
to wait for the result of a benchmark. I like running benchmarks locally: it
is convenient to run benchmarks on the same computer used for development.</p>
<p>The goal here is to "remove the noise of the system": get the same result on a
busy system as on an idle system. My simple <a class="reference external" href="https://github.com/vstinner/misc/blob/master/bin/system_load.py">system_load.py</a> program can be
used to increase the system load. For example, run <tt class="docutils literal">system_load.py 10</tt> in a
terminal to get a system load of at least 10 (busy system) and run the
benchmark in a different terminal. Use CTRL+c to stop <tt class="docutils literal">system_load.py</tt>.</p>
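<p>The <tt class="docutils literal">system_load.py</tt> script itself is not reproduced here, but a minimal sketch of the same idea, spawning a few CPU-bound processes to raise the load, could look like this (the function names are made up for the example):</p>

```python
import multiprocessing
import time

def burn(stop_at):
    # Busy-loop until the deadline to keep one core at 100%.
    while time.monotonic() < stop_at:
        pass

def load_system(nproc, seconds):
    """Spawn nproc CPU-bound processes to raise the system load."""
    stop_at = time.monotonic() + seconds
    procs = [multiprocessing.Process(target=burn, args=(stop_at,))
             for _ in range(nproc)]
    for proc in procs:
        proc.start()
    return procs
```

<p>Run it with a large <tt class="docutils literal">nproc</tt> in one terminal while the benchmark runs in another.</p>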
</div>
<div class="section" id="cpu-isolation">
<h2>CPU isolation</h2>
<p>In 2016, it is common to get a CPU with multiple physical cores. For example,
my Intel CPU has 4 physical cores and 8 logical cores thanks to
<a class="reference external" href="https://en.wikipedia.org/wiki/Hyper-threading">Hyper-Threading</a>. It is
possible to configure the Linux kernel to not schedule processes on some CPUs
using the "CPU isolation" feature. It is the <tt class="docutils literal">isolcpus</tt> parameter of the
Linux command line, the value is a list of CPUs. Example:</p>
<pre class="literal-block">
isolcpus=2,3,6,7
</pre>
<p>Check with:</p>
<pre class="literal-block">
$ cat /sys/devices/system/cpu/isolated
2-3,6-7
</pre>
<p>If you have Hyper-Threading, you must isolate the two logical cores of each
isolated physical core. You can use the <tt class="docutils literal">lscpu <span class="pre">--all</span> <span class="pre">--extended</span></tt> command to
identify physical cores. Example:</p>
<pre class="literal-block">
$ lscpu -a -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ
0 0 0 0 0:0:0:0 yes 5900,0000 1600,0000
1 0 0 1 1:1:1:0 yes 5900,0000 1600,0000
2 0 0 2 2:2:2:0 yes 5900,0000 1600,0000
3 0 0 3 3:3:3:0 yes 5900,0000 1600,0000
4 0 0 0 0:0:0:0 yes 5900,0000 1600,0000
5 0 0 1 1:1:1:0 yes 5900,0000 1600,0000
6 0 0 2 2:2:2:0 yes 5900,0000 1600,0000
7 0 0 3 3:3:3:0 yes 5900,0000 1600,0000
</pre>
<p>The physical core <tt class="docutils literal">0</tt> (CORE column) is made of two logical cores (CPU
column): <tt class="docutils literal">0</tt> and <tt class="docutils literal">4</tt>.</p>
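<p>To decide which CPUs to isolate together, the logical CPUs of each physical core can be grouped from the CPU and CORE columns. A small Python sketch (the <tt class="docutils literal">pairs</tt> data below is copied from the <tt class="docutils literal">lscpu</tt> output above):</p>

```python
def group_siblings(cpu_core_pairs):
    """Group logical CPU ids by physical core id, as in the
    CPU and CORE columns of `lscpu --all --extended`."""
    cores = {}
    for cpu, core in cpu_core_pairs:
        cores.setdefault(core, []).append(cpu)
    return cores

# (CPU, CORE) pairs from the lscpu output above:
pairs = [(0, 0), (1, 1), (2, 2), (3, 3),
         (4, 0), (5, 1), (6, 2), (7, 3)]
group_siblings(pairs)  # → {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
```

<p>Isolating physical cores 2 and 3 therefore means isolating logical CPUs 2, 3, 6 and 7, as in the <tt class="docutils literal">isolcpus</tt> example above.</p>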
</div>
<div class="section" id="nohz-mode">
<h2>NOHZ mode</h2>
<p>By default, the Linux kernel uses a scheduling-clock which interrupts the
running application <tt class="docutils literal">HZ</tt> times per second to run the scheduler. <tt class="docutils literal">HZ</tt> is
usually between 100 and 1000: time slice between 1 ms and 10 ms.</p>
<p>Linux supports a <a class="reference external" href="https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt">NOHZ mode</a> which is able to
disable the scheduling-clock when the system is idle to reduce the power
consumption. Linux 3.10 introduces a <a class="reference external" href="https://lwn.net/Articles/549580/">full tickless mode</a>, NOHZ full, which is able to disable the
scheduling-clock when only one application is running on a CPU.</p>
<p>NOHZ full is disabled by default. It can be enabled with the <tt class="docutils literal">nohz_full</tt>
parameter of the Linux command line, the value is a list of CPUs. Example:</p>
<pre class="literal-block">
nohz_full=2,3,6,7
</pre>
<p>Check with:</p>
<pre class="literal-block">
$ cat /sys/devices/system/cpu/nohz_full
2-3,6-7
</pre>
</div>
<div class="section" id="interrupts-irq">
<h2>Interrupts (IRQ)</h2>
<p>The Linux kernel can also be configured to not run <a class="reference external" href="https://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29">interrupt (IRQ)</a>
handlers on some CPUs using the <tt class="docutils literal">/proc/irq/default_smp_affinity</tt> and
<tt class="docutils literal"><span class="pre">/proc/irq/<number>/smp_affinity</span></tt> files. The value is not a list of CPUs but
a bitmask.</p>
<p>The <tt class="docutils literal">/proc/interrupts</tt> file can be read to see the number of interrupts
per CPU.</p>
<p>Read the <a class="reference external" href="https://www.kernel.org/doc/Documentation/IRQ-affinity.txt">Linux SMP IRQ affinity</a> documentation.</p>
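<p>A list of CPUs can be converted to the expected hexadecimal bitmask with a few lines of Python (a sketch, not part of any kernel tool):</p>

```python
def cpus_to_smp_affinity(cpus):
    """Convert a list of CPU ids into the hexadecimal bitmask format
    expected by /proc/irq/*/smp_affinity (bit N set = CPU N allowed)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

# Allow IRQs only on the non-isolated CPUs 0, 1, 4 and 5:
cpus_to_smp_affinity([0, 1, 4, 5])  # → "33"
```

<p>For the isolation example above (isolated CPUs 2, 3, 6 and 7 on an 8-CPU machine), allowing IRQs only on CPUs 0, 1, 4 and 5 gives the mask <tt class="docutils literal">33</tt>.</p>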
</div>
<div class="section" id="example-of-effect-of-cpu-isolation-on-a-microbenchmark">
<h2>Example of effect of CPU isolation on a microbenchmark</h2>
<p>Example with Linux parameters:</p>
<pre class="literal-block">
isolcpus=2,3,6,7 nohz_full=2,3,6,7
</pre>
<p>Microbenchmark on an idle system (without CPU isolation):</p>
<pre class="literal-block">
$ python3 -m timeit 'sum(range(10**7))'
10 loops, best of 3: 229 msec per loop
</pre>
<p>Result on a busy system using <tt class="docutils literal">system_load.py 10</tt> and <tt class="docutils literal">find /</tt> commands
running in other terminals:</p>
<pre class="literal-block">
$ python3 -m timeit 'sum(range(10**7))'
10 loops, best of 3: 372 msec per loop
</pre>
<p>The microbenchmark is about 62% slower because of the high system load!</p>
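<p>The slowdown can be computed directly from the two timings (a quick sketch):</p>

```python
def slowdown_percent(reference, measured):
    """Relative slowdown of a measured timing vs a reference timing."""
    return round((measured - reference) / reference * 100)

slowdown_percent(229, 372)  # → 62
```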
<p>Result on the same busy system but using isolated CPUs. The <tt class="docutils literal">taskset</tt> command
allows pinning an application to specific CPUs:</p>
<pre class="literal-block">
$ taskset -c 1,3 python3 -m timeit 'sum(range(10**7))'
10 loops, best of 3: 230 msec per loop
</pre>
<p>Just to check, new run without CPU isolation:</p>
<pre class="literal-block">
$ python3 -m timeit 'sum(range(10**7))'
10 loops, best of 3: 357 msec per loop
</pre>
<p>The result with CPU isolation on a busy system is the same as the result on an
idle system! CPU isolation removes most of the noise of the system.</p>
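<p>From Python itself, the same pinning can be done with <tt class="docutils literal">os.sched_setaffinity()</tt> (available since Python 3.3, Linux only). A small sketch with a made-up helper name:</p>

```python
import os

def run_pinned(cpus, func, *args):
    """Run func pinned to the given CPUs (a Linux-only equivalent of
    `taskset -c`), restoring the previous affinity afterwards."""
    old = os.sched_getaffinity(0)
    os.sched_setaffinity(0, cpus)
    try:
        return func(*args)
    finally:
        os.sched_setaffinity(0, old)
```

<p>For example, <tt class="docutils literal">run_pinned({3}, sum, range(10**7))</tt> runs the benchmark body on CPU 3 only.</p>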
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>Great job Linux!</p>
<p>Ok! Now, the benchmark is super stable, no? ... Sorry, no, it's not stable yet.
I found a lot of other sources of "noise". We will see them in the following
articles ;-)</p>
</div>
Status of Python 3 in OpenStack Mitaka2016-03-02T14:00:00+01:002016-03-02T14:00:00+01:00Victor Stinnertag:vstinner.github.io,2016-03-02:/openstack_mitaka_python3.html<p class="first last">Status of Python 3 in OpenStack Mitaka</p>
<p>Now that most OpenStack services have reached feature freeze for the Mitaka
cycle (November 2015-April 2016), it's time to look back on the progress made
for Python 3 support.</p>
<p>Previous status update: <a class="reference external" href="http://techs.enovance.com/7807/python-3-status-openstack-liberty">Python 3 Status in OpenStack Liberty</a>
(September 2015).</p>
<div class="section" id="services-ported-to-python-3">
<h2>Services ported to Python 3</h2>
<p>13 services were ported to Python 3 during the Mitaka cycle:</p>
<ul class="simple">
<li>Cinder</li>
<li>Congress</li>
<li>Designate</li>
<li>Glance</li>
<li>Heat</li>
<li>Horizon</li>
<li>Manila</li>
<li>Mistral</li>
<li>Octavia</li>
<li>Searchlight</li>
<li>Solum</li>
<li>Watcher</li>
<li>Zaqar</li>
</ul>
<p>Red Hat contributed to the Cinder, Designate, Glance and Horizon service
porting efforts.</p>
<p>"Ported to Python 3" means that all unit tests pass on Python 3.4 which is
verified by a voting gate job. It is not enough to run applications in
production with Python 3. Integration and functional tests are not run on
Python 3 yet. See the section dedicated to these tests below.</p>
<p>See the <a class="reference external" href="https://wiki.openstack.org/wiki/Python3">Python 3 wiki page</a> for the
current status of the OpenStack port to Python 3; especially the list of
services ported to Python 3.</p>
</div>
<div class="section" id="services-not-ported-yet">
<h2>Services not ported yet</h2>
<p>It has become easier to list the services which are not compatible with Python 3
than to list the services already ported to Python 3!</p>
<p>9 services still need to be ported:</p>
<ul class="simple">
<li>Work-in-progress:<ul>
<li>Magnum: 83% (959 unit tests/1,161)</li>
<li>Cue: 81% (208 unit tests/257)</li>
<li>Nova: 74% (10,859 unit tests/14,726)</li>
<li>Barbican: 34% (392 unit tests/1168)</li>
<li>Murano: 29% (133 unit tests/455)</li>
<li>Keystone: 27% (1200 unit tests/4455)</li>
<li>Swift: 0% (3 unit tests/4,435)</li>
<li>Neutron-LBaaS: 0% (1 unit test/806)</li>
</ul>
</li>
<li>Port not started yet:<ul>
<li>Trove: no python34 gate</li>
</ul>
</li>
</ul>
<p>Red Hat contributed Python 3 patches to Cue, Neutron-LBaaS, Swift and Trove
during the Mitaka cycle.</p>
<p>Trove developers are ready to start the port at the beginning of the next cycle
(Newton). The py34 test environment was blocked by the MySQL-Python dependency (it
was not possible to build the test environment), but this dependency is now
skipped on Python 3. Later, it will be <a class="reference external" href="https://review.openstack.org/#/c/225915/">replaced with PyMySQL</a> on Python 2 and Python 3.</p>
</div>
<div class="section" id="python-3-issues-in-eventlet">
<h2>Python 3 issues in Eventlet</h2>
<p>Four Python 3 issues were fixed in Eventlet:</p>
<ul class="simple">
<li><a class="reference external" href="https://github.com/eventlet/eventlet/issues/295">Issue #295: Python 3: wsgi doesn't handle correctly partial write of
socket send() when using writelines()</a></li>
<li>PR #275: <a class="reference external" href="https://github.com/eventlet/eventlet/pull/275">Issue #274: Fix GreenSocket.recv_into()</a>.
Issue: <a class="reference external" href="https://github.com/eventlet/eventlet/issues/274">On Python 3, sock.makefile('rb').readline() doesn't handle blocking
errors correctly</a></li>
<li>PR #257: <a class="reference external" href="https://github.com/eventlet/eventlet/pull/257">Fix GreenFileIO.readall() for regular file</a></li>
<li><a class="reference external" href="https://github.com/eventlet/eventlet/issues/248">Issue #248: eventlet.monkey_patch() on Python 3.4 makes stdout
non-blocking</a>: pull
request <a class="reference external" href="https://github.com/eventlet/eventlet/pull/250">Fix GreenFileIO.write()</a></li>
</ul>
</div>
<div class="section" id="next-milestone-functional-and-integration-tests">
<h2>Next Milestone: Functional and integration tests</h2>
<p>The next major milestone will be to run functional and integration tests on
Python 3.</p>
<ul class="simple">
<li>functional tests are restricted to one component (ex: only Glance)</li>
<li>integration tests, like Tempest, test the integration of multiple components</li>
</ul>
<p>It is now possible to install some packages on Python 3 in DevStack using
<tt class="docutils literal">USE_PYTHON3</tt> and <tt class="docutils literal">PYTHON3_VERSION</tt> variables: <a class="reference external" href="https://review.openstack.org/#/c/181165/">Enable optional Python 3
support</a>. It means that it is
possible to run tests with some services running on Python 3, and the remaining
services on Python 2.</p>
<p>The port to Python 3 of the Glance, Heat and Neutron functional and integration
tests has already started.</p>
<p>For Glance, 159 functional tests already pass on Python 3.4.</p>
<p>Heat:</p>
<ul class="simple">
<li>project-config: <a class="reference external" href="https://review.openstack.org/#/c/228194/">Add python34 integration test job for Heat</a> (WIP)</li>
<li>heat: <a class="reference external" href="https://review.openstack.org/#/c/188033/">py34: integration tests</a>
(WIP)</li>
</ul>
<p>Neutron: the <a class="reference external" href="https://review.openstack.org/#/c/231897/">Add the functional-py34 and dsvm-functional-py34 targets to
tox.ini</a> change was merged, but a
gate job hasn't been added for it yet.</p>
<p>Another pending project is to fix issues specific to Python 3.5, but the gate
doesn’t use Python 3.5 yet. There are some minor issues, probably easy to fix.</p>
</div>
<div class="section" id="how-to-port-remaining-code">
<h2>How to port remaining code?</h2>
<p>The <a class="reference external" href="https://wiki.openstack.org/wiki/Python3">Python 3 wiki page</a> contains
a lot of information about adding Python 3 support to Python 2 code.</p>
<p>Join us in the <tt class="docutils literal"><span class="pre">#openstack-python3</span></tt> IRC channel on Freenode to discuss
Python 3!</p>
</div>
Fast _PyAccu, _PyUnicodeWriter and _PyBytesWriter APIs to produce strings in CPython2016-03-01T16:00:00+01:002016-03-01T16:00:00+01:00Victor Stinnertag:vstinner.github.io,2016-03-01:/pybyteswriter.html<p class="first last">_PyBytesWriter API</p>
<p>This article describes the _PyBytesWriter and _PyUnicodeWriter private APIs of
CPython. These APIs are designed to optimize code producing strings when the
output size is not known in advance.</p>
<p>I created the _PyUnicodeWriter API in response to complaints that Python 3 was much
slower than Python 2, especially with the new Unicode implementation (PEP 393).</p>
<div class="section" id="pyaccu-api">
<h2>_PyAccu API</h2>
<p>Issue #12778: In 2011, Antoine Pitrou found a performance issue in the JSON
serializer when serializing many small objects: it used way too much memory for
temporary objects compared to the final output string.</p>
<p>The JSON serializer used a list of strings and joined all strings at the end to
create the final output string. Pseudocode:</p>
<pre class="literal-block">
def serialize():
    pieces = [serialize(item) for item in self]
    return ''.join(pieces)
</pre>
<p>Antoine introduced an accumulator compacting the temporary list of "small"
strings and putting the result in a second list of "large" strings. At the end, the
list of "large" strings was also compacted to build the final output string.
Pseudo-code:</p>
<pre class="literal-block">
def serialize():
    small = []
    large = []
    for item in self:
        small.append(serialize(item))
        if len(small) > 10000:
            large.append(''.join(small))
            small.clear()
    if small:
        large.append(''.join(small))
    return ''.join(large)
</pre>
<p>The threshold of 10,000 strings is justified by this comment:</p>
<pre class="literal-block">
/* Each item in a list of unicode objects has an overhead (in 64-bit
* builds) of:
* - 8 bytes for the list slot
* - 56 bytes for the header of the unicode object
* that is, 64 bytes. 100000 such objects waste more than 6MB
* compared to a single concatenated string.
*/
</pre>
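<p>The overhead from the comment can be checked with a quick computation (a sketch; the constants are the ones quoted above for 64-bit builds):</p>

```python
LIST_SLOT = 8        # bytes per list slot (64-bit build)
UNICODE_HEADER = 56  # bytes per str object header (64-bit build)

def wasted_bytes(n_items):
    """Memory overhead of keeping n_items separate strings in a list,
    compared to one concatenated string (per the comment above)."""
    return n_items * (LIST_SLOT + UNICODE_HEADER)

wasted_bytes(100_000)  # → 6400000 bytes, i.e. more than 6 MB
```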
<p>Issue #12911: Antoine Pitrou found a similar performance issue in repr(list),
and so proposed to convert its accumulator code into a new private _PyAccu API.
He added the _PyAccu API to Python 2.7.5 and 3.2.3. Title of the repr(list)
change: "Fix memory consumption when calculating the repr() of huge tuples or
lists".</p>
</div>
<div class="section" id="the-pyunicodewriter-api">
<h2>The _PyUnicodeWriter API</h2>
<div class="section" id="inefficient-implementation-of-the-pep-393">
<h3>Inefficient implementation of the PEP 393</h3>
<p>In 2010, the PEP 393 proposed a completely new implementation of the Python type
<tt class="docutils literal">str</tt>, which landed in Python 3.3. The implementation of the PEP was the topic of a
Google Summer of Code 2011 with the student Torsten Becker mentored by Martin
v. Löwis (author of the PEP). The project was successful: the PEP 393 was
implemented, it worked!</p>
<p>The first implementation of the PEP 393 used a lot of 32-bit character buffers
(<tt class="docutils literal">Py_UCS4</tt>) which use a lot of memory and require expensive conversions to
8-bit (<tt class="docutils literal">Py_UCS1</tt>, ASCII and Latin1) or 16-bit (<tt class="docutils literal">Py_UCS2</tt>, BMP) characters.</p>
<p>The new internal structures for Unicode strings are very complex and
require being smart when building a new string to avoid memory copies. I
created the _PyUnicodeWriter API to try to reduce expensive memory copies, and
even completely avoid them in the best cases.</p>
</div>
<div class="section" id="design-of-the-pyunicodewriter-api">
<h3>Design of the _PyUnicodeWriter API</h3>
<p>According to benchmarks, creating a <tt class="docutils literal">Py_UCS1*</tt> buffer and then expanding it
to <tt class="docutils literal">Py_UCS2*</tt> or <tt class="docutils literal">Py_UCS4*</tt> is more efficient, since <tt class="docutils literal">Py_UCS1*</tt> is the
most common format.</p>
<p>The Python <tt class="docutils literal">str</tt> type is used for a wide range of usages. For example, it is used
for variable names in the Python language itself. Variable names
are almost always ASCII.</p>
<p>The worst case for _PyUnicodeWriter is when a long <tt class="docutils literal">Py_UCS1*</tt> buffer must be
converted to <tt class="docutils literal">Py_UCS2*</tt>, and then converted to <tt class="docutils literal">Py_UCS4*</tt>. Each conversion
is expensive: a second memory block must be allocated and the characters converted to
the new format.</p>
<p>_PyUnicodeWriter features:</p>
<ul class="simple">
<li>Optional overallocation: overallocate the buffer by 50% on Windows and 25%
on Linux. The ratio depends on the OS: it is a rough heuristic to get
the best performance out of the <tt class="docutils literal">malloc()</tt> memory allocator.</li>
<li>The buffer can be a shared read-only string if the buffer was only created
from a single string. Micro-optimization for <tt class="docutils literal">"%s" % str</tt>.</li>
</ul>
<p>The API allows disabling overallocation before the last write. For example,
<tt class="docutils literal">"%s%s" % ('abc', 'def')</tt> disables the overallocation before writing
<tt class="docutils literal">'def'</tt>.</p>
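<p>The overallocation heuristic itself is simple; a Python sketch (the function name is made up, the ratios are the ones given above):</p>

```python
def overallocate(length, ratio_percent):
    """Sketch of the overallocation heuristic: reserve extra room so
    that repeated writes do not resize the buffer every time
    (+50% on Windows, +25% on Linux for _PyUnicodeWriter)."""
    return length + length * ratio_percent // 100

overallocate(100, 25)  # → 125
overallocate(100, 50)  # → 150
```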
<p>The _PyUnicodeWriter API was introduced in issue #14716 (change 7be716a47e9d):</p>
<blockquote>
Close #14716: str.format() now uses the new "unicode writer" API instead
of the PyAccu API. For example, it makes str.format() from 25% to 30%
faster on Linux.</blockquote>
</div>
<div class="section" id="fast-path-for-ascii">
<h3>Fast-path for ASCII</h3>
<p>The cool and <em>unexpected</em> side-effect of the _PyUnicodeWriter is that many
intermediate operations got a fast-path for <tt class="docutils literal">Py_UCS1*</tt>, especially for ASCII
strings. For example, padding a number with spaces on <tt class="docutils literal">'%10i' % 123</tt> is
implemented with <tt class="docutils literal">memset()</tt>.</p>
<p>Formatting a floating point number uses the <tt class="docutils literal">PyOS_double_to_string()</tt> function
which creates an ASCII buffer. If the writer buffer uses Py_UCS1, a
<tt class="docutils literal">memcpy()</tt> is enough to copy the formatted number.</p>
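<p>The padding fast-path can be observed from Python: the formatted result below is 10 characters wide, 7 pad spaces followed by 3 digits.</p>

```python
# Right-align 123 in a field of 10 characters: when the writer buffer
# uses Py_UCS1, CPython fills the 7 pad characters with memset().
text = '%10i' % 123
print(repr(text))  # '       123'
```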
</div>
<div class="section" id="avoid-temporary-buffers">
<h3>Avoid temporary buffers</h3>
<p>Since the beginning, I had the idea of avoiding temporary buffers thanks
to a unified API to handle a "Unicode buffer". Slowly, I spread my changes
to all functions producing Unicode strings.</p>
<p>The obvious targets were <tt class="docutils literal">str % args</tt> and <tt class="docutils literal">str.format(args)</tt>. The two
instructions use very different code, but it was possible to share a few
functions, especially the code to format integers in bases 2 (binary), 8
(octal), 10 (decimal) and 16 (hexadecimal).</p>
<p>The function formatting an integer computes the exact size of the output,
requests that number of characters and then writes the characters. The characters are
written directly into the writer buffer. No temporary memory block is needed
anymore, and moreover no Py_UCS conversion is needed: <tt class="docutils literal">_PyLong_Format()</tt> writes
characters directly in the character format (Py_UCS1, Py_UCS2 or Py_UCS4) of
the buffer.</p>
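<p>The "compute the exact size first" step boils down to counting the digits of the number in the target base; a Python sketch (the helper name is made up):</p>

```python
def digits_needed(n, base):
    """Exact number of characters needed to format abs(n) in the
    given base (sketch of computing the output size up front)."""
    if n == 0:
        return 1
    n = abs(n)
    count = 0
    while n:
        n //= base
        count += 1
    return count

digits_needed(255, 16)  # → 2 ("ff")
digits_needed(255, 2)   # → 8 ("11111111")
```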
</div>
<div class="section" id="performance-compared-to-python-2">
<h3>Performance compared to Python 2</h3>
<p>The PEP 393 uses a complex storage for strings, so the exact performance
now depends on the character set used in the benchmark. For benchmarks using
a character set other than ASCII, the results are trickier to understand.</p>
<p>To compare performance with Python 2, I focused my benchmarks on ASCII. I
compared Python 3 str with Python 2 unicode, but also sometimes with Python 2 str
(bytes). On ASCII, Python 3.3 was as fast as Python 2, or even faster in some
very specific cases, but these cases are probably artificial and never seen in
real applications.</p>
<p>In the best case, Python 3 str (Unicode) was faster than Python 2 bytes.</p>
</div>
</div>
<div class="section" id="pybyteswriter-api-first-try-big-fail">
<h2>_PyBytesWriter API: first try, big fail</h2>
<p>Since Python was <em>much</em> faster with _PyUnicodeWriter, I expected to get a good
speedup with a similar API for bytes. The holy grail would be to share code for
bytes and Unicode (Spoiler alert! I reached this goal, but only for a single
function: formatting an integer in decimal).</p>
<p>My first attempt at a _PyBytesWriter API was in 2013: <a class="reference external" href="https://bugs.python.org/issue17742">Issue #17742: Add
_PyBytesWriter API</a>. But quickly, I
noticed with microbenchmarks that my change made Python slower! I spent hours
trying to understand why GCC produced less efficient machine code. When I started to
dig into the "strict aliasing" optimization issue, I realized that I had reached a
dead end.</p>
<p>Extract of the _PyBytesWriter structure:</p>
<pre class="literal-block">
typedef struct {
/* Current position in the buffer */
char *str;
/* Start of the buffer */
char *start;
/* End of the buffer */
char *end;
...
} _PyBytesWriter;
</pre>
<p>The problem is that GCC emitted less efficient machine code for the C code (see
my <a class="reference external" href="https://bugs.python.org/issue17742#msg187595">msg187595</a>):</p>
<pre class="literal-block">
while (collstart++<collend)
*writer.str++ = '?';
</pre>
<p>For the <tt class="docutils literal">writer.str++</tt> instruction, the new pointer value is written
immediately into the structure. The pointer value is read again at each iteration.
So we have 1 LOAD and 1 STORE per iteration.</p>
<p>GCC emits better code for the original C code:</p>
<pre class="literal-block">
while (collstart++<collend)
*str++ = '?';
</pre>
<p>The <tt class="docutils literal">str</tt> variable is stored in a register and the new value of <tt class="docutils literal">str</tt> is
only written <em>once</em>, at the end of the loop (instead of being written at each
iteration). The pointer value is <em>only read once</em>, before the loop. So we have 0
LOAD and 0 STORE (related to the pointer value) in the loop body.</p>
<p>It looks like an aliasing issue, but I didn't find a way to tell GCC that the
new value of <tt class="docutils literal">writer.str</tt> can be written only once, at the end of the loop. I
tried the <tt class="docutils literal">__restrict__</tt> keyword: the LOAD (getting the pointer value) was moved
out of the loop, but the STORE was still in the loop body.</p>
<p>I wrote to gcc-help: <a class="reference external" href="https://gcc.gnu.org/ml/gcc-help/2013-04/msg00192.html">Missed optimization when using a structure</a>, but I didn't get any
reply. I just gave up.</p>
</div>
<div class="section" id="pybyteswriter-api-new-try-the-good-one">
<h2>_PyBytesWriter API: new try, the good one</h2>
<p>In 2015, I created the <a class="reference external" href="https://bugs.python.org/issue25318">Issue #25318: Add _PyBytesWriter API to optimize
Unicode encoders</a>. I redesigned the API
to avoid the aliasing issue.</p>
<p>The new _PyBytesWriter doesn't contain the <tt class="docutils literal">char*</tt> pointers anymore: they are
now local variables in functions. Instead, the functions of the API require two
parameters: the bytes writer and a <tt class="docutils literal">char*</tt> parameter. Example:</p>
<pre class="literal-block">
PyObject * _PyBytesWriter_Finish(_PyBytesWriter *writer, char *str)
</pre>
<p>The idea is to keep <tt class="docutils literal">char*</tt> pointers in functions to keep the most efficient
machine code in loops. The compiler doesn't have to compute complex aliasing
rules to decide if a CPU register can be used or not.</p>
<p>_PyBytesWriter features:</p>
<ul class="simple">
<li>Optional overallocation: overallocate the buffer by 50% on Windows and 25%
on Linux. Same idea as _PyUnicodeWriter.</li>
<li>Support <tt class="docutils literal">bytes</tt> and <tt class="docutils literal">bytearray</tt> type as output format to avoid an expensive
memory copy from <tt class="docutils literal">bytes</tt> to <tt class="docutils literal">bytearray</tt>.</li>
<li>A small buffer of 512 bytes allocated on the stack to avoid the need for a
buffer allocated on the heap, before creating the final
<tt class="docutils literal">bytes</tt>/<tt class="docutils literal">bytearray</tt> object.</li>
</ul>
<p>A _PyBytesWriter structure must always be allocated on the stack (to get fast
memory allocation of the small buffer).</p>
<p>While _PyUnicodeWriter has 5 functions and 1 macro to write a single
character, write strings, write a substring, etc., _PyBytesWriter has a single
_PyBytesWriter_WriteBytes() function to write a string, since all other writes
are done directly with regular C code on <tt class="docutils literal">char*</tt> pointers.</p>
<p>The API itself doesn't make the code faster. Disabling overallocation on the
last write and using the small buffer allocated on the stack are what can make it
faster.</p>
<p>In Python 3.6, I optimized error handlers on various codecs: ASCII, Latin1
and UTF-8. For example, the UTF-8 encoder is now up to 75 times as fast for
error handlers: <tt class="docutils literal">ignore</tt>, <tt class="docutils literal">replace</tt>, <tt class="docutils literal">surrogateescape</tt>,
<tt class="docutils literal">surrogatepass</tt>. The <tt class="docutils literal">bytes % int</tt> instruction became between 30% and 50%
faster on a microbenchmark.</p>
<p>Later, I replaced the <tt class="docutils literal">char*</tt> type with <tt class="docutils literal">void*</tt> to avoid compiler warnings
in functions using <tt class="docutils literal">Py_UCS1*</tt> or <tt class="docutils literal">unsigned char*</tt>, which are unsigned types.</p>
</div>
My contributions to CPython during 2015 Q42016-03-01T15:00:00+01:002016-03-01T15:00:00+01:00Victor Stinnertag:vstinner.github.io,2016-03-01:/contrib-cpython-2015q4.html<p class="first last">My contributions to CPython during 2015 Q4</p>
<p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2015 Q4
(october, november, december):</p>
<pre class="literal-block">
hg log -r 'date("2015-10-01"):date("2015-12-31")' --no-merges -u Stinner
</pre>
<p>Statistics: 100 non-merge commits + 25 merge commits (total: 125 commits).</p>
<p>As usual, I pushed changes from various contributors and helped them polish
their changes.</p>
<p>I fought against a recursion error, a regression introduced by my recent work
on the Python test suite.</p>
<p>I focused on optimizing the bytes type during this quarter. It started with the
issue #24870 opened by <strong>INADA Naoki</strong> who works on PyMySQL: decoding bytes
using the surrogateescape error handler was the bottleneck of this benchmark.
For me, it was an opportunity for a new attempt to implement a fast "bytes
writer API".</p>
<p>I pushed my first change related to <a class="reference external" href="http://faster-cpython.readthedocs.org/fat_python.html">FAT Python</a>! Fix parser and AST:
fill lineno and col_offset of "arg" node when compiling AST from Python
objects.</p>
<p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2015q3.html">My contributions to CPython during 2015 Q3</a>. Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q1.html">My contributions to
CPython during 2016 Q1</a>.</p>
<div class="section" id="recursion-error">
<h2>Recursion error</h2>
<div class="section" id="the-bug-issue-25274">
<h3>The bug: issue #25274</h3>
<p>During the previous quarter, I refactored the huge Lib/test/regrtest.py file (1,600
lines) into a new Lib/test/libregrtest/ library (8 files). The problem was that
test_sys started to crash with "Fatal Python error: Cannot recover from stack
overflow" in test_recursionlimit_recovery(). The regression was introduced by a
change to regrtest which indirectly added one more Python frame in the code
executing test_sys.</p>
<p>CPython has a limit on the depth of the call stack: <tt class="docutils literal">sys.getrecursionlimit()</tt>,
1000 by default. The limit is a weak protection against overflowing the C
stack: weak because it only counts Python frames, while intermediate C functions
may allocate a lot of memory on the stack.</p>
<p>When we reach the limit, an "overflow" flag is set, but we still allow up to
limit+50 frames, because handling a RecursionError may need a few more frames.
The overflow flag is cleared when the stack level goes below a "low-water
mark".</p>
<p>After the regrtest change, test_recursionlimit_recovery() was called at stack
level 36. Before, it was called at level 35. The test triggers a RecursionError.
The problem is that the stack level never goes below the low-water mark again,
so the overflow flag is never cleared.</p>
</div>
<div class="section" id="the-fix">
<h3>The fix</h3>
<p>Another problem is that the function used to compute the "low-water mark" was
not monotonic:</p>
<pre class="literal-block">
if limit > 100:
    low_water_mark = limit - 50
else:
    low_water_mark = 3 * limit // 4
</pre>
<p>The gap occurs near a limit of 100 frames:</p>
<ul class="simple">
<li>limit = 99 => low_water_mark = 74</li>
<li>limit = 100 => low_water_mark = 75</li>
<li>limit = 101 => low_water_mark = 51</li>
</ul>
<p>The formula was replaced with:</p>
<pre class="literal-block">
if limit > 200:
    low_water_mark = limit - 50
else:
    low_water_mark = 3 * limit // 4
</pre>
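<p>The two formulas can be compared with a few lines of Python (a sketch, not the
actual CPython code; the <tt class="docutils literal">threshold</tt> parameter is introduced here only
to compare the old and new switch points):</p>

```python
def low_water_mark(limit, threshold=100):
    # Sketch of the formula above; `threshold` selects the switch point
    # between the two branches (100 = original, 200 = fixed).
    if limit > threshold:
        return limit - 50
    else:
        return 3 * limit // 4

# Old threshold of 100: the mark *drops* from 75 to 51 when the limit
# crosses 100, so the function is not monotonic.
assert low_water_mark(100) == 75
assert low_water_mark(101) == 51

# Fixed threshold of 200: 3 * 200 // 4 == 150 == 200 - 50, so the two
# branches meet exactly and the function becomes monotonic.
assert low_water_mark(200, threshold=200) == 150
assert low_water_mark(201, threshold=200) == 151
```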
<p>The fix (<a class="reference external" href="https://hg.python.org/cpython/rev/eb0c76442cee">change eb0c76442cee</a>) modified the
<tt class="docutils literal">sys.setrecursionlimit()</tt> function to raise a <tt class="docutils literal">RecursionError</tt> exception if
the new limit is too low depending on the <em>current</em> stack depth.</p>
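<p>The effect of the fix can be observed from pure Python: setting a limit far
below the current stack depth is rejected (a small sketch, not from the original
post; the exact error message varies between versions):</p>

```python
import sys

def deep(n):
    # Recurse n frames, then try to set a recursion limit far below the
    # current stack depth: since the fix, this raises RecursionError.
    if n:
        return deep(n - 1)
    old = sys.getrecursionlimit()
    try:
        sys.setrecursionlimit(50)
        return "accepted"
    except RecursionError:
        return "rejected"
    finally:
        sys.setrecursionlimit(old)

assert deep(100) == "rejected"
```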
</div>
</div>
<div class="section" id="optimizations">
<h2>Optimizations</h2>
<p>As usual for performance work, Serhiy Storchaka was very helpful with reviews,
running independent benchmarks, etc.</p>
<p>Optimizations on the <tt class="docutils literal">bytes</tt> type, ASCII, Latin1 and UTF-8 codecs:</p>
<ul class="simple">
<li>Issue #25318: Add _PyBytesWriter API. Add a new private API to optimize
Unicode encoders. It uses a small buffer of 512 bytes allocated on the stack
and supports configurable overallocation.</li>
<li>Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable
overallocation for the UTF-8 encoder with error handlers.</li>
<li>unicode_encode_ucs1(): initialize collend to collstart+1 to not check the
current character twice, we already know that it is not ASCII.</li>
<li>Issue #25267: The UTF-8 encoder is now up to 75 times as fast for error
handlers: <tt class="docutils literal">ignore</tt>, <tt class="docutils literal">replace</tt>, <tt class="docutils literal">surrogateescape</tt>, <tt class="docutils literal">surrogatepass</tt>.
Patch co-written with <strong>Serhiy Storchaka</strong>.</li>
<li>Issue #25301: The UTF-8 decoder is now up to 15 times as fast for error
handlers: <tt class="docutils literal">ignore</tt>, <tt class="docutils literal">replace</tt> and <tt class="docutils literal">surrogateescape</tt>.</li>
<li>Issue #25318: Optimize backslashreplace and xmlcharrefreplace error handlers
in UTF-8 encoder. Optimize also backslashreplace error handler for ASCII and
Latin1 encoders.</li>
<li>Issue #25349: Optimize bytes % args using the new private _PyBytesWriter API</li>
<li>Optimize error handlers of ASCII and Latin1 encoders when the replacement
string is pure ASCII: use _PyBytesWriter_WriteBytes(), don't check individual
characters.</li>
<li>Issue #25349: Optimize bytes % int. Formatting is between 30% and 50% faster
on a microbenchmark.</li>
<li>Issue #25357: Add an optional newline parameter to binascii.b2a_base64().
base64.b64encode() uses it to avoid a memory copy.</li>
<li>Issue #25353: Optimize unicode escape and raw unicode escape encoders: use
the new _PyBytesWriter API.</li>
<li>Rewrite PyBytes_FromFormatV() using _PyBytesWriter API</li>
<li>Issue #25399: Optimize bytearray % args. Most formatting operations are now
between 2.5 and 5 times faster.</li>
<li>Issue #25401: Optimize bytes.fromhex() and bytearray.fromhex(): they are now
between 2x and 3.5x faster.</li>
</ul>
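<p>Several of these optimizations target the <tt class="docutils literal">surrogateescape</tt> error
handler. As a reminder of what it does (a quick illustration, unrelated to the
C implementation):</p>

```python
# surrogateescape maps each undecodable byte 0x80-0xFF to a lone
# surrogate U+DC80-U+DCFF, so decoding never fails and encoding with
# the same handler restores the original bytes exactly.
data = b"caf\xe9"  # Latin-1 encoded "café": invalid UTF-8

text = data.decode("utf-8", errors="surrogateescape")
assert text == "caf\udce9"
assert text.encode("utf-8", errors="surrogateescape") == data
```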
</div>
<div class="section" id="changes">
<h2>Changes</h2>
<ul class="simple">
<li>Issue #25003: On Solaris 11.3 and newer, os.urandom() now uses the getrandom()
function instead of the getentropy() function. The getentropy() function
blocks to generate very high-quality entropy; os.urandom() doesn't need
such high-quality entropy.</li>
<li>Issue #22806: Add <tt class="docutils literal">python <span class="pre">-m</span> test <span class="pre">--list-tests</span></tt> command to list tests.</li>
<li>Issue #25670: Remove duplicate getattr() in ast.NodeTransformer</li>
<li>Issue #25557: Refactor _PyDict_LoadGlobal(). Don't fallback to
PyDict_GetItemWithError() if the hash is unknown: compute the hash instead.
Add also comments to explain the _PyDict_LoadGlobal() optimization.</li>
<li>Issue #25868: Try to make test_eintr.test_sigwaitinfo() more reliable
especially on slow buildbots</li>
</ul>
</div>
<div class="section" id="changes-specific-to-python-2-7">
<h2>Changes specific to Python 2.7</h2>
<ul class="simple">
<li>Closes #25742: locale.setlocale() now accepts a Unicode string for its second
parameter.</li>
</ul>
</div>
<div class="section" id="bugfixes">
<h2>Bugfixes</h2>
<ul class="simple">
<li>Fix regrtest --coverage on Windows</li>
<li>Fix pytime on OpenBSD</li>
<li>More fixes for test_eintr on FreeBSD</li>
<li>Close #25373: Fix regrtest --slow with interrupted test</li>
<li>Issue #25555: Fix parser and AST: fill lineno and col_offset of "arg" node
when compiling AST from Python objects. First contribution related
to FAT Python ;-)</li>
<li>Issue #25696: Fix installation of Python on UNIX with make -j9.</li>
</ul>
</div>
My contributions to CPython during 2015 Q32016-02-18T01:00:00+01:002016-02-18T01:00:00+01:00Victor Stinnertag:vstinner.github.io,2016-02-18:/contrib-cpython-2015q3.html<p class="first last">My contributions to CPython during 2015 Q3</p>
<p>A few years ago, someone asked me: "Why do you contribute to CPython? Python is
perfect, there are no more bugs, right?". This article lists most of my
contributions to CPython during 2015 Q3 (July, August, September). It gives an
idea of which areas of Python are not perfect yet :-)</p>
<p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2015 Q3
(July, August, September):</p>
<pre class="literal-block">
hg log -r 'date("2015-07-01"):date("2015-09-30")' --no-merges -u Stinner
</pre>
<p>Statistics: 153 non-merge commits + 75 merge commits (total: 228 commits).</p>
<p>The major event in Python of this quarter was the release of Python 3.5.0.</p>
<p>As usual, I helped various contributors to refine their changes and I pushed
their final changes.</p>
<p>Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2015q4.html">My contributions to CPython during 2015 Q4</a>.</p>
<div class="section" id="freebsd-kernel-bug">
<h2>FreeBSD kernel bug</h2>
<p>It took me a while to polish the implementation of the <a class="reference external" href="https://www.python.org/dev/peps/pep-0475/">PEP 475 (retry syscall
on EINTR)</a> especially its unit
test <tt class="docutils literal">test_eintr</tt>. The unit test is supposed to test Python but, as usual,
it also indirectly tests the operating system.</p>
<p>I spent some days investigating a random hang on the FreeBSD buildbots: <a class="reference external" href="https://bugs.python.org/issue25122">issue
#25122</a>. I quickly found the guilty test
(test_eintr.test_open), but it took me a while to understand that it was a
kernel bug in the FIFO driver. Fortunately, in the end I was able to reproduce
the bug with a short C program in my FreeBSD VM: that is the best way to ask
for a fix upstream.</p>
<p>My <a class="reference external" href="https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203162">FreeBSD bug report #203162</a> ("when close(fd)
on a fifo fails with EINTR, the file descriptor is not really closed") was
quickly fixed. The FreeBSD team is reactive!</p>
<p>I like free software because it's possible to investigate bugs deep in the
code, and it's usually quick to get a fix.</p>
</div>
<div class="section" id="timestamp-rounding-issue">
<h2>Timestamp rounding issue</h2>
<p>Even though the <a class="reference external" href="http://bugs.python.org/issue23517">issue #23517</a> is well defined
and simple to fix, it took me days (weeks?) to understand exactly how
timestamps are supposed to be rounded and to agree on the "right" rounding
method. Alexander Belopolsky reminded me of the important property:</p>
<pre class="literal-block">
(datetime(1970,1,1) + timedelta(seconds=t)) == datetime.utcfromtimestamp(t)
</pre>
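<p>The property can be checked directly in Python (a quick sanity check, not from
the original post; note that <tt class="docutils literal">utcfromtimestamp()</tt> has been deprecated in
recent Python versions):</p>

```python
from datetime import datetime, timedelta

# Spot-check the identity for a few timestamps, including fractional
# and negative ones: adding a timedelta to the epoch must agree with
# utcfromtimestamp().
for t in (0, 1, 1234567890, 0.5, -1.25):
    assert datetime(1970, 1, 1) + timedelta(seconds=t) == datetime.utcfromtimestamp(t)
```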
<p>Tim Peters helped me understand why Python rounds to nearest with ties going
to the nearest even integer (ROUND_HALF_EVEN) in <tt class="docutils literal">round(float)</tt> and other
functions. At first glance, the rounding method doesn't look natural or
logical:</p>
<pre class="literal-block">
>>> round(0.5)
0
>>> round(1.5)
2
</pre>
<p>See my previous article on the _PyTime API for the long story of rounding
methods between Python 3.2 and Python 3.6: <a class="reference external" href="https://vstinner.github.io/pytime.html">History of the Python private C API
_PyTime</a>.</p>
</div>
<div class="section" id="enhancements">
<h2>Enhancements</h2>
<ul class="simple">
<li>type_call() now detects C bugs in type __new__() and __init__() methods.</li>
<li>Issue #25220: Enhancements of the test runner: add more info when regrtest runs
tests in parallel, fix some features of regrtest, add functional tests to
test_regrtest.</li>
</ul>
</div>
<div class="section" id="optimizations">
<h2>Optimizations</h2>
<ul class="simple">
<li>Issue #25227: Optimize ASCII and latin1 encoders with the <tt class="docutils literal">surrogateescape</tt>
error handler: the encoders are now up to 3 times as fast.</li>
</ul>
</div>
<div class="section" id="changes">
<h2>Changes</h2>
<ul class="simple">
<li>Polish the implementation of the PEP 475 (retry syscall on EINTR)</li>
<li>Work on the "What's New in Python 3.5" document: add my changes
(PEP 475, socket timeout, os.urandom)</li>
<li>Work on asyncio: fix ResourceWarning warnings, fixes specific to Windows</li>
<li>test_time: rewrite rounding tests of the private pytime API</li>
<li>Issue #24707: Remove an assertion in the monotonic clock. No longer check at
runtime that the monotonic clock doesn't go backward. Yes, it happens! It
occurred a few times per month on a Debian buildbot slave running in a VM.</li>
<li>test_eintr: replace os.fork() with subprocess (fork+exec) to make the test
more reliable</li>
</ul>
</div>
<div class="section" id="changes-specific-to-python-2-7">
<h2>Changes specific to Python 2.7</h2>
<ul class="simple">
<li>Backport python-gdb.py changes: enhance py-bt command</li>
<li>Issue #23375: Fix test_py3kwarn for modules implemented in C</li>
</ul>
</div>
<div class="section" id="bug-fixes">
<h2>Bug fixes</h2>
<ul class="simple">
<li>Closes #23247: Fix a crash in the StreamWriter.reset() of CJK codecs</li>
<li>Issue #24732, #23834: Fix sock_accept_impl() on Windows. Regression of the
PEP 475 (retry syscall on EINTR)</li>
<li>test_gdb: fix regex to parse the GDB version and fix ResourceWarning on error</li>
<li>Fix test_warnings: don't modify warnings.filters to fix random failures of
the test.</li>
<li>Issue #24891: Fix a race condition at Python startup if the file descriptor
of stdin (0), stdout (1) or stderr (2) is closed while Python is creating
sys.stdin, sys.stdout and sys.stderr objects.</li>
<li>Issue #24684: socket.socket.getaddrinfo() now calls
PyUnicode_AsEncodedString() instead of calling the encode() method of the
host, to handle correctly custom string with an encode() method which doesn't
return a byte string. The encoder of the IDNA codec is now called directly
instead of calling the encode() method of the string.</li>
<li>Issue #25118: Fix a regression of Python 3.5.0 in os.waitpid() on Windows.
Add an unit test on os.waitpid()</li>
<li>Issue #25122: Fix test_eintr, kill child process on error</li>
<li>Issue #25155: Add _PyTime_AsTimevalTime_t() function to fix a regression:
support again years after 2038.</li>
<li>Issue #25150: Hide the private _Py_atomic_xxx symbols from the public
Python.h header to fix a compilation error with OpenMP. PyThreadState_GET()
becomes an alias to PyThreadState_Get() to avoid ABI incompatibilities.</li>
<li>Issue #25003: On Solaris 11.3 or newer, os.urandom() now uses the getrandom()
function instead of the getentropy() function.</li>
</ul>
</div>
History of the Python private C API _PyTime2016-02-17T22:00:00+01:002016-02-17T22:00:00+01:00Victor Stinnertag:vstinner.github.io,2016-02-17:/pytime.html<p class="first last">History of the Python private C API _PyTime</p>
<p>I added functions to the private "pytime" library to convert timestamps from/to
various formats. I expected to spend a few days on them; in the end, I spent 3
years (2012-2015)!</p>
<div class="section" id="python-3-3">
<h2>Python 3.3</h2>
<p>In 2012, I proposed the <a class="reference external" href="https://www.python.org/dev/peps/pep-0410/">PEP 410 -- Use decimal.Decimal type for timestamps</a> because storing timestamps as
floating point numbers loses precision. The PEP was rejected because it
modified many functions and had a bad API. At least os.stat() got 3 new fields
(st_atime_ns, st_mtime_ns, st_ctime_ns): timestamps as a number of nanoseconds
(<tt class="docutils literal">int</tt>).</p>
<p>My <a class="reference external" href="https://www.python.org/dev/peps/pep-0418/">PEP 418 -- Add monotonic time, performance counter, and process time
functions</a> was accepted, Python
3.3 got a new <tt class="docutils literal">time.monotonic()</tt> function (and a few others). Again, I spent
much more time than I expected on a problem which looked simple at first
glance.</p>
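<p>The key guarantee of <tt class="docutils literal">time.monotonic()</tt> is that consecutive readings never
decrease, unlike <tt class="docutils literal">time.time()</tt>, which can jump backwards when the system
clock is adjusted:</p>

```python
import time

# Two consecutive readings of the monotonic clock: the second one can
# never be smaller than the first, even if the wall clock is changed.
t1 = time.monotonic()
t2 = time.monotonic()
assert t2 >= t1
```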
<p>With the <a class="reference external" href="http://bugs.python.org/issue14180">issue #14180</a>, I added timestamp
conversion functions to the private "pytime" API to factorize the code of
various modules. Timestamps were rounded towards +infinity (ROUND_CEILING), but
it was not a deliberate choice.</p>
</div>
<div class="section" id="python-3-4">
<h2>Python 3.4</h2>
<p>To fix correctly a performance issue in asyncio (<a class="reference external" href="https://bugs.python.org/issue20311">issue20311</a>), I added two rounding modes to the
pytime API: _PyTime_ROUND_DOWN (round towards zero), and _PyTime_ROUND_UP
(round away from zero). Polling for events (ex: using <tt class="docutils literal">select.select()</tt>) with
a non-zero timeout must not call the underlying C function in non-blocking mode.</p>
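<p>The asyncio problem can be sketched in pure Python (illustration only: the
function name is made up, and the real code lives in C):</p>

```python
import math

def timeout_to_ms(seconds, round_up=True):
    # Convert a timeout in seconds to the whole milliseconds expected by
    # poll()-style APIs.
    ms = seconds * 1000.0
    return math.ceil(ms) if round_up else math.floor(ms)

# Rounding a 0.9 ms timeout down yields 0 ms, i.e. a non-blocking call:
# the event loop would busy-loop. Rounding up keeps the call blocking.
assert timeout_to_ms(0.0009, round_up=False) == 0
assert timeout_to_ms(0.0009) == 1
```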
</div>
<div class="section" id="python-3-5">
<h2>Python 3.5</h2>
<p>When working on the <a class="reference external" href="https://bugs.python.org/issue22117">issue #22117</a>, I
noticed that the implementation of rounding methods was buggy for negative
timestamps. I replaced the _PyTime_ROUND_DOWN with _PyTime_ROUND_FLOOR (round
towards minus infinity), and _PyTime_ROUND_UP with _PyTime_ROUND_CEILING (round
towards infinity).</p>
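<p>The distinction matters for negative timestamps, where rounding towards zero
and rounding towards minus infinity disagree (a quick illustration):</p>

```python
import math

t = -1.5  # a timestamp before the epoch, in seconds

# Round towards zero ("down") vs towards minus infinity (FLOOR):
assert math.trunc(t) == -1   # old _PyTime_ROUND_DOWN behavior
assert math.floor(t) == -2   # _PyTime_ROUND_FLOOR
# For positive values, both methods agree:
assert math.trunc(1.5) == math.floor(1.5) == 1
```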
<p>This issue also introduced a new private <tt class="docutils literal">_PyTime_t</tt> type to support
nanosecond resolution. The type is an opaque integer type to store timestamps.
In practice, it's a signed 64-bit integer. Since it's an integer, it's easy and
natural to compute the sum or difference of two timestamps: <tt class="docutils literal">t1 + t2</tt> and
<tt class="docutils literal">t2 - t1</tt>. I added _PyTime_XXX() functions to create a timestamp and
_PyTime_AsXXX() functions to convert a timestamp to a different format.</p>
<p>I had to keep three _PyTime_ObjectToXXX() functions for fromtimestamp() methods
of the datetime module. These methods must support extreme timestamps (year
1..9999), whereas _PyTime_t is "limited" to a delta of +/- 292 years (year
1678..2262).</p>
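<p>The +/- 292 years range follows directly from a signed 64-bit nanosecond
counter (a back-of-the-envelope check):</p>

```python
# A signed 64-bit integer counting nanoseconds overflows after about
# 292 years in either direction.
max_ns = 2**63 - 1
seconds_per_year = 365.25 * 24 * 3600

years = max_ns / 1e9 / seconds_per_year
assert 292 < years < 293
```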
</div>
<div class="section" id="python-3-6">
<h2>Python 3.6</h2>
<p>In 2015, the <a class="reference external" href="http://bugs.python.org/issue23517">issue #23517</a> reported that
Python 2 and Python 3 don't use the same rounding method in
datetime.datetime.fromtimestamp(): there was a difference of 1 microsecond.</p>
<p>After a long discussion, I modified the fromtimestamp() methods of the datetime
module to round to nearest with ties going to the nearest even integer
(ROUND_HALF_EVEN), as done by round() in Python 3.</p>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>It took me three years to stabilize the API and fix all issues. Well, I didn't
spend all my days on it, but it shows that handling time is not a simple issue.</p>
<p>At the Python level, nothing changed: timestamps are still stored as float
(except for the 3 new fields of os.stat()).</p>
<p>Python 3.5 only supports timezones with a fixed offset; it does not handle the
local timezone (with DST rules), for example. Timezones are still a hot topic:
the <a class="reference external" href="https://mail.python.org/mailman/listinfo/datetime-sig">datetime-sig mailing list</a> was created to
enhance timezone support in Python.</p>
</div>
Status of the FAT Python project, January 12, 20162016-01-12T13:42:00+01:002016-01-12T13:42:00+01:00Victor Stinnertag:vstinner.github.io,2016-01-12:/fat-python-status-janv12-2016.html<p class="first last">Status of the FAT Python project, January 12, 2016</p>
<a class="reference external image-reference" href="http://faster-cpython.readthedocs.org/fat_python.html"><img alt="FAT Python project" class="align-right" src="https://vstinner.github.io/images/fat_python.jpg" /></a>
<p>Previous status: <a class="reference external" href="https://vstinner.github.io/fat-python-status-nov26-2015.html">Status of the FAT Python project, November 26, 2015</a>.</p>
<div class="section" id="summary">
<h2>Summary</h2>
<ul class="simple">
<li>New optimizations implemented:<ul>
<li>constant propagation</li>
<li>constant folding</li>
<li>dead code elimination</li>
<li>simplify iterable</li>
<li>replace builtin __debug__ variable with its value</li>
</ul>
</li>
<li>Major API refactoring to make the API more generic and reusable by other
projects, and maybe for different use cases.</li>
<li>Work on 3 different Python Enhancement Proposals (PEP): API for pluggable
static optimizers and function specialization</li>
</ul>
<p>The two previously known major bugs, "Wrong Line Numbers (and Tracebacks)" and
"exec(code, dict)", are now fixed.</p>
</div>
<div class="section" id="python-enhancement-proposals-pep">
<h2>Python Enhancement Proposals (PEP)</h2>
<p>I proposed an API to support function specialization and static optimizers.
I split the changes into 3 different Python Enhancement Proposals (PEP):</p>
<ul class="simple">
<li><a class="reference external" href="https://www.python.org/dev/peps/pep-0509/">PEP 509 - Add a private version to dict</a>: "Add a new private version to
builtin <tt class="docutils literal">dict</tt> type, incremented at each change, to implement fast guards
on namespaces."</li>
<li><a class="reference external" href="https://www.python.org/dev/peps/pep-0510/">PEP 510 - Specialize functions</a>: "Add functions to the Python C
API to specialize pure Python functions: add specialized codes with guards.
It allows to implement static optimizers respecting the Python semantics."</li>
<li><a class="reference external" href="https://www.python.org/dev/peps/pep-0511/">PEP 511 - API for AST transformers</a>: "Propose an API to
support AST transformers."</li>
</ul>
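<p>The guard mechanism of PEP 509 can be modeled in pure Python (a toy sketch;
the real version field lives in the C <tt class="docutils literal">PyDictObject</tt> structure and is not
exposed at the Python level):</p>

```python
class VersionedDict(dict):
    # Toy model of PEP 509: bump a version counter on each change so a
    # guard can detect "did this namespace change?" with one comparison.
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1

ns = VersionedDict(x=1)
guard = ns.version          # the guard remembers the version...
ns["y"] = 2                 # ...the namespace is modified...
assert ns.version != guard  # ...so the cheap version check fails.
```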
<p>The PEP 509 was sent to the python-ideas mailing list for a first round, and
then to the python-dev mailing list. The PEP 510 was sent to python-ideas for a
first round. The last PEP has not been published yet; I'm still working on it.</p>
</div>
<div class="section" id="major-api-refactor">
<h2>Major API refactor</h2>
<p>The API has been deeply refactored to write the Python Enhancement Proposals.</p>
<p>First set of changes for function specialization (PEP 510):</p>
<ul class="simple">
<li>astoptimizer now adds <tt class="docutils literal">import fat</tt> to optimized code when specialization is
used</li>
<li>Remove the function subtype: add directly the <tt class="docutils literal">specialize()</tt> method to
functions</li>
<li>Add support of any callable object to <tt class="docutils literal">func.specialize()</tt>, not only code
object (bytecode)</li>
<li>Create guard objects:<ul>
<li>fat.Guard</li>
<li>fat.GuardArgType</li>
<li>fat.GuardBuiltins</li>
<li>fat.GuardDict</li>
<li>fat.GuardFunc</li>
</ul>
</li>
<li>Add functions to create guards:<ul>
<li>fat.GuardGlobals</li>
<li>fat.GuardTypeDict</li>
</ul>
</li>
<li>Move code.replace_consts() to fat.replace_consts()</li>
</ul>
<p>Second set of changes for AST transformers (PEP 511):</p>
<ul class="simple">
<li>Add sys.implementation.ast_transformers and sys.implementation.optim_tag</li>
<li>Rename sys.asthook to sys.ast_transformers</li>
<li>Add -X fat command line option to enable the FAT mode: register the
astoptimizer in AST transformers</li>
<li>Replace -F command line option with -o OPTIM_TAG</li>
<li>Remove sys.flags.fat (Python flag) and Py_FatPython (C variable)</li>
<li>Rewrite how an AST transformer is registered</li>
<li>importlib skips .py if optim_tag is not 'opt' and required AST transformers
are missing. Raise ImportError if the .pyc file is missing.</li>
</ul>
<p>Third set of changes for dictionary versioning, updates after the first round
of the PEP 509 on python-ideas:</p>
<ul class="simple">
<li>Remove dict.__version__ read-only property: the version is now only
accessible from the C API</li>
<li>Change the type of the C field <tt class="docutils literal">ma_version</tt> from <tt class="docutils literal">size_t</tt> to <tt class="docutils literal">unsigned
PY_INT64_T</tt> to also use 64-bit unsigned integer on 32-bit platforms. The
risk of missing a change in a guard with a 32-bit version is too high,
whereas the risk with a 64-bit version is very very low.</li>
</ul>
<p>Fourth set of changes for function specialization, updates after the first round
of the PEP 510 on python-ideas:</p>
<ul class="simple">
<li>Remove func.specialize() and func.get_specialized() at the Python level,
replace them with C functions. Expose them again as fat.specialize(func, ...)
and fat.get_specialized(func)</li>
<li>fat.get_specialized() now returns a list of tuples, instead of a list of dict</li>
<li>Make fat.Guard type private: rename it to fat._Guard</li>
<li>Add fat.PyGuard: toy to implement a guard in pure Python</li>
<li>Guard C API: rename first_check to init and support reporting errors</li>
</ul>
</div>
<div class="section" id="change-log">
<h2>Change log</h2>
<p>Detailed changes of the FAT Python between November 24, 2015 and January 12,
2016.</p>
<div class="section" id="end-of-november">
<h3>End of November</h3>
<p>Major change:</p>
<ul class="simple">
<li>Add a __version__ read-only property to dict, remove the verdict subtype of
dict. As a consequence, dictionary guards now hold a strong reference to the
dict value</li>
</ul>
<p>Minor changes:</p>
<ul class="simple">
<li>Dynamically allocate memory for specialized code and guards; don't use
fixed-size arrays anymore</li>
<li>astoptimizer: enhance scope detection</li>
<li>optimize astoptimizer: don't copy a whole AST tree anymore with
copy.deepcopy(), only copy modified nodes.</li>
<li>Add Config.max_constant_size</li>
<li>Reenable checks on cell variables: allow cell variables if they are the same</li>
<li>Reenable optimizations on methods calling super(), but never copy super()
builtin to constants. If super() is replaced with a string, the required free
variable (reference to the current class) is not created by the compiler</li>
<li>Add PureBuiltin config</li>
<li>NodeVisitor now calls generic_visit() before visit_XXX()</li>
<li>Loop unrolling now also optimizes tuple iterators</li>
<li>At the end of Python initialization, create a copy of the builtins dictionary
to be able later to detect if a builtin name was replaced.</li>
<li>Implement collections.UserDict.__version__</li>
</ul>
</div>
<div class="section" id="december-first-half">
<h3>December (first half)</h3>
<p>Major changes:</p>
<ul class="simple">
<li>Implement 4 new optimizations:<ul>
<li>constant propagation</li>
<li>constant folding</li>
<li>replace builtin __debug__ variable with its value</li>
<li>dead code elimination</li>
</ul>
</li>
<li>Add support of per module configuration using an __astoptimizer__ variable</li>
<li>code.co_lnotab now supports negative line number deltas. Change the type of
the line number delta in co_lnotab from unsigned 8-bit integer to signed 8-bit
integer. This change fixes almost all issues with line numbers.</li>
</ul>
<p>Minor changes:</p>
<ul class="simple">
<li>Change .pyc magic number to 3600</li>
<li>Remove unused fat.specialized_method() function</li>
<li>Remove Lib/fat.py, rename Modules/_fat.c to Modules/fat.c: fat module is now
only implemented in C</li>
<li>Fix more tests of the Python test suite</li>
<li>A builtin guard now adds a guard on globals. Ignore also the specialization
if globals()[name] already exists.</li>
<li>Ignore duplicated guards</li>
<li>Implement namespace following the control flow for constant propagation</li>
<li>Config.max_int_bits becomes a simple integer</li>
<li>Fix bytecode compilation for tuple constants. Don't merge (0, 0) and (0.0,
0.0) constants, they are different.</li>
<li>Call more builtin functions</li>
<li>Optimize the optimizer: write a metaclass to discover visitors when the class
is created, not when the class is instantiated</li>
</ul>
</div>
<div class="section" id="december-second-half">
<h3>December (second half)</h3>
<p>Major changes:</p>
<ul class="simple">
<li>Implement "simplify iterable" optimization. The loop unrolling optimization
now relies on it to replace <tt class="docutils literal">range(n)</tt>.</li>
<li>Split the function optimization in two stages: first apply optimizations
which don't require specialization, then apply optimizations which
require specialization.</li>
<li>Replace the builtin __fat__ variable with a new sys.flags.fat flag</li>
</ul>
<p>Minor changes:</p>
<ul class="simple">
<li>Extend optimizations to optimize more cases (more builtins, more loop
unrolling, remove more dead code, etc.)</li>
<li>Add Config.logger attribute. astoptimize logs into sys.stderr when Python is
started in verbose mode (python3 -v)</li>
<li>Move func.patch_constants() to code.replace_consts()</li>
<li>Enhance marshal to fix tests: call frozenset() to get the empty frozenset
singleton</li>
<li>Don't remove code which must raise a SyntaxError. Don't remove code
containing the continue instruction.</li>
<li>Restrict GlobalNonlocalVisitor to the current namespace</li>
<li>Emit logs when optimizations are skipped</li>
<li>Use some maths to avoid optimizing pow() if the result is an integer and
will be larger than the configured limit. For example, don't optimize 2 ** (2**100).</li>
</ul>
</div>
<div class="section" id="january">
<h3>January</h3>
<p>Major changes:</p>
<ul class="simple">
<li>astoptimizer now produces a single builtin guard with all names,
instead of a guard per name.</li>
<li>Major API refactoring detailed in a dedicated section above</li>
</ul>
<p>Minor changes:</p>
<ul class="simple">
<li>Start to write PEPs</li>
<li>Dictionary guards now expect a list of names, instead of a single name, to
reduce the cost of guards.</li>
<li>GuardFunc now uses a strong reference to the function, instead of a weak
reference to simplify the code</li>
<li>Initialize dictionary version to 0</li>
</ul>
</div>
</div>
Status of the FAT Python project, November 26, 20152015-11-26T17:30:00+01:002015-11-26T17:30:00+01:00Victor Stinnertag:vstinner.github.io,2015-11-26:/fat-python-status-nov26-2015.html<p class="first last">Status of the FAT Python project, November 26, 2015</p>
<a class="reference external image-reference" href="http://faster-cpython.readthedocs.org/fat_python.html"><img alt="FAT Python project" class="align-right" src="https://vstinner.github.io/images/fat_python.jpg" /></a>
<p>Previous status: [python-dev] <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2015-November/142113.html">Second milestone of FAT Python</a>
(Nov 4, 2015).</p>
<div class="section" id="documentation">
<h2>Documentation</h2>
<p>I combined the documentation of various optimization projects into a single
documentation: <a class="reference external" href="http://faster-cpython.readthedocs.org/">Faster CPython</a>.
My previous optimizations projects:</p>
<ul class="simple">
<li><a class="reference external" href="http://faster-cpython.readthedocs.org/old_ast_optimizer.html">"old" astoptimizer</a> (now
replaced with a "new" astoptimizer included in the FAT Python)</li>
<li><a class="reference external" href="http://faster-cpython.readthedocs.org/registervm.html">registervm</a></li>
<li><a class="reference external" href="http://faster-cpython.readthedocs.org/readonly.html">read-only Python</a></li>
</ul>
<p>The FAT Python project has its own page: <a class="reference external" href="http://faster-cpython.readthedocs.org/fat_python.html">FAT Python project</a>.</p>
</div>
<div class="section" id="copy-builtins-to-constants-optimization">
<h2>Copy builtins to constants optimization</h2>
<p>The <tt class="docutils literal">LOAD_GLOBAL</tt> instruction is used to load a builtin function. The
instruction requires two dictionary lookups: one in the global namespace (which
almost always fails) and then one in the builtin namespace.</p>
<p>It's rare to replace builtins, so the idea here is to replace the dynamic
<tt class="docutils literal">LOAD_GLOBAL</tt> instruction with a static <tt class="docutils literal">LOAD_CONST</tt> instruction which
loads the function from a C array, a fast O(1) lookup.</p>
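<p>The <tt class="docutils literal">LOAD_GLOBAL</tt> instruction is easy to observe with the
<tt class="docutils literal">dis</tt> module (opcode details vary across Python versions):</p>

```python
import dis

def log(message):
    print(message)

# The name "print" is resolved at run time by LOAD_GLOBAL, which looks
# it up in the global namespace and then in the builtins namespace.
opnames = [instr.opname for instr in dis.get_instructions(log)]
assert "LOAD_GLOBAL" in opnames
```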
<p>It is not possible to inject a builtin function during the compilation. Python
code objects are serialized by the marshal module, which only supports simple
types like integers, strings and tuples, not functions. The trick is to modify
the constants at runtime when the module is loaded. I added a new
<tt class="docutils literal">patch_constants()</tt> method to functions.</p>
<p>Example:</p>
<pre class="literal-block">
def log(message):
    print(message)
</pre>
<p>This function is specialized to:</p>
<pre class="literal-block">
def log(message):
    'LOAD_GLOBAL print'(message)

log.patch_constants({'LOAD_GLOBAL print': print})
</pre>
<p>The specialized bytecode uses two guards on builtin and global namespaces to
disable the optimization if the builtin function is replaced.</p>
<p>See <a class="reference external" href="https://faster-cpython.readthedocs.org/fat_python.html#copy-builtin-functions-to-constants">Copy builtin functions to constants</a>
for more information.</p>
</div>
<div class="section" id="loop-unrolling-optimization">
<h2>Loop unrolling optimization</h2>
<p>A simple optimization is to "unroll" a loop to reduce its cost. The
optimization generates assignment statements (for the loop index variable)
and duplicates the loop body.</p>
<p>Example with a tuple iterator:</p>
<pre class="literal-block">
def func():
    for i in (1, 2, 3):
        print(i)
</pre>
<p>The function is specialized to:</p>
<pre class="literal-block">
def func():
    i = 1
    print(i)
    i = 2
    print(i)
    i = 3
    print(i)
</pre>
<p>If the iterator uses the builtin <tt class="docutils literal">range</tt> function, two guards are
required on the builtin and global namespaces.</p>
<p>With a tuple iterator, as in this example, no guard is needed: the code is
always optimized.</p>
<p>See <a class="reference external" href="https://faster-cpython.readthedocs.org/fat_python.html#loop-unrolling">Loop unrolling</a>
for more information.</p>
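<p>The tuple case can be sketched with a small <tt class="docutils literal">ast.NodeTransformer</tt>.
This is a toy illustration, not FAT Python's implementation: it only handles a
constant-tuple iterator with a simple name target and no <tt class="docutils literal">else</tt>
clause (and would mishandle <tt class="docutils literal">break</tt> and <tt class="docutils literal">continue</tt>).</p>

```python
import ast
import copy

class UnrollTupleLoops(ast.NodeTransformer):
    """Toy unroller: 'for i in (c1, c2, ...)' -> assignments + body copies."""
    def visit_For(self, node):
        self.generic_visit(node)
        if (isinstance(node.iter, ast.Tuple)
                and all(isinstance(e, ast.Constant) for e in node.iter.elts)
                and isinstance(node.target, ast.Name)
                and not node.orelse):
            unrolled = []
            for elt in node.iter.elts:
                # i = <constant>
                target = ast.Name(id=node.target.id, ctx=ast.Store())
                unrolled.append(ast.Assign(targets=[target], value=elt))
                # duplicated loop body
                unrolled.extend(copy.deepcopy(node.body))
            return unrolled
        return node

source = "def func():\n    for i in (1, 2, 3):\n        result.append(i * 2)\n"
tree = ast.fix_missing_locations(UnrollTupleLoops().visit(ast.parse(source)))
namespace = {'result': []}
exec(compile(tree, '<unrolled>', 'exec'), namespace)
namespace['func']()
# namespace['result'] is now [2, 4, 6]
```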
</div>
<div class="section" id="lot-of-enhancements-of-the-ast-optimizer">
<h2>Many enhancements to the AST optimizer</h2>
<p>New optimizations helped to find bugs in the <a class="reference external" href="https://faster-cpython.readthedocs.org/new_ast_optimizer.html">AST optimizer</a>. Many fixes
and various enhancements were done in the AST optimizer.</p>
<p>The number of lines of code more than doubled, from 500 to 1200 lines.</p>
<p>Optimization: <tt class="docutils literal">copy.deepcopy()</tt> is no longer used to duplicate the full
tree. The new <tt class="docutils literal">NodeTransformer</tt> class copies a single node only if at least
one field is modified.</p>
<p>The <tt class="docutils literal">VariableVisitor</tt> class, which detects local and global variables, was
heavily modified. It understands many more kinds of AST nodes: <tt class="docutils literal">For</tt>, <tt class="docutils literal">AugAssign</tt>,
<tt class="docutils literal">AsyncFunctionDef</tt>, <tt class="docutils literal">ClassDef</tt>, etc. It now also detects non-local
variables (the <tt class="docutils literal">nonlocal</tt> keyword). The scope is now limited to the current
function: the visitor doesn't enter nested <tt class="docutils literal">DictComp</tt>, <tt class="docutils literal">FunctionDef</tt> or
<tt class="docutils literal">Lambda</tt> nodes, since these create a new, separate namespace.</p>
<p>The optimizer is now able to optimize a function without guards: this is needed
to unroll a loop using a tuple as iterator.</p>
</div>
<div class="section" id="known-bugs">
<h2>Known bugs</h2>
<p>See the <a class="reference external" href="https://hg.python.org/sandbox/fatpython/file/0d30dba5fa64/TODO.rst">TODO.rst file</a> for
known bugs.</p>
<div class="section" id="wrong-line-numbers-and-tracebacks">
<h3>Wrong Line Numbers (and Tracebacks)</h3>
<p>AST nodes have <tt class="docutils literal">lineno</tt> and <tt class="docutils literal">col_offset</tt> fields, so an AST optimizer is not
"supposed" to break line numbers. In practice, line numbers, and therefore
tracebacks, are completely wrong in FAT mode. The problem is probably that the
AST optimizer can copy and move instructions, so line numbers are no longer
monotonic. CPython probably doesn't handle this case (negative line delta).</p>
<p>It should be possible to fix it, but right now I prefer to focus on new
optimizations and fix other bugs.</p>
</div>
<div class="section" id="exec-code-dict">
<h3>exec(code, dict)</h3>
<p>In FAT mode, some optimizations require guards on the global namespace.
If <tt class="docutils literal">exec()</tt> is called with a Python <tt class="docutils literal">dict</tt> for globals, an exception
is raised because <tt class="docutils literal">func.specialize()</tt> requires a <tt class="docutils literal">fat.verdict</tt> for
globals.</p>
<p>It's not possible to implicitly convert the <tt class="docutils literal">dict</tt> to a <tt class="docutils literal">fat.verdict</tt>,
because the <tt class="docutils literal">dict</tt> is expected to be mutated, and the guards will be on the
<tt class="docutils literal">fat.verdict</tt>, not on the original <tt class="docutils literal">dict</tt>.</p>
<p>I worked around the bug by manually creating a <tt class="docutils literal">fat.verdict</tt> in FAT mode,
instead of a <tt class="docutils literal">dict</tt>.</p>
<p>This bug will go away if the versioning feature is moved directly into
the builtin <tt class="docutils literal">dict</tt> type (and the <tt class="docutils literal">fat.verdict</tt> type is removed).</p>
</div>
</div>
Port your Python 2 applications to Python 3 with sixer2015-06-16T15:00:00+02:002015-06-16T15:00:00+02:00Victor Stinnertag:vstinner.github.io,2015-06-16:/python3-sixer.html<p class="first last">Port your Python 2 applications to Python 3 with sixer</p>
<div class="section" id="from-2to3-to-2to6">
<h2>From 2to3 to 2to6</h2>
<p>When Python 3.0 was released, the official statement was to port your
application using <a class="reference external" href="https://docs.python.org/3.5/library/2to3.html">2to3</a> and
drop Python 2 support. It didn't work because you had to port all libraries
first. If a library drops Python 2 support, existing applications running on
Python 2 cannot use this library anymore.</p>
<p>This chicken-and-egg issue was solved by the creation of the <a class="reference external" href="https://pythonhosted.org/six/">six module</a> by <a class="reference external" href="https://benjamin.pe/">Benjamin Peterson</a>. Thank you so much Benjamin! Using the six module, it
is possible to write a single code base working on Python 2 and Python 3.</p>
<p>2to3 was hacked to create the <a class="reference external" href="http://python-modernize.readthedocs.org/">modernize</a> and <a class="reference external" href="https://github.com/limodou/2to6">2to6</a> projects to <em>add Python 3 support</em> without
losing Python 2 support. Problem solved!</p>
</div>
<div class="section" id="creation-of-the-sixer-tool">
<h2>Creation of the sixer tool</h2>
<p>Problem solved? Well, not for my specific use case. I'm porting the huge
OpenStack project to Python 3. modernize and 2to6 modify a lot of things at
once, add unwanted changes (ex: add <tt class="docutils literal">from __future__ import absolute_import</tt>
at the top of each file), and don't respect the OpenStack coding style
(especially the <a class="reference external" href="http://docs.openstack.org/developer/hacking/#imports">complex rules to sort and group Python imports</a>).</p>
<p>I wrote the <a class="reference external" href="https://pypi.python.org/pypi/sixer">sixer</a> project to
<em>generate</em> patches for OpenStack. The problem is that OpenStack code changes
very quickly, so it's common to have to fix conflicts the day after submitting
a change. At the beginning, it took at least one week to get Python 3 changes
merged, whereas many changes are merged every day, so being able to regenerate
patches helped a lot.</p>
<p>I created the <a class="reference external" href="https://pypi.python.org/pypi/sixer">sixer</a> tool using a list
of regular expressions to replace a pattern with another. For example, it
replaces <tt class="docutils literal">dict.itervalues()</tt> with <tt class="docutils literal">six.itervalues(dict)</tt>. The code was
very simple. The most difficult part was to respect the OpenStack coding
style for Python imports.</p>
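<p>The regular-expression approach can be sketched in a few lines. The pattern
below is a simplified illustration, not sixer's actual code:</p>

```python
import re

# Simplified sketch of sixer's approach: rewrite "obj.itervalues()" into
# "six.itervalues(obj)" with a single regular expression.
ITERVALUES = re.compile(r'(\w+(?:\.\w+)*)\.itervalues\(\)')

def replace_itervalues(source):
    return ITERVALUES.sub(r'six.itervalues(\1)', source)

replace_itervalues("for value in counters.itervalues():")
# → "for value in six.itervalues(counters):"
```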
<p>sixer has been a success since its creation: it helped me fix all the obvious
Python 3 issues: replace <tt class="docutils literal">unicode(x)</tt> with <tt class="docutils literal">six.text_type(x)</tt>, replace
<tt class="docutils literal">dict.itervalues()</tt> with <tt class="docutils literal">six.itervalues(dict)</tt>, etc. These changes are
simple, but it's boring to have to modify manually many files. The OpenStack
Nova project has almost 1500 Python files for example.</p>
<p>The development version of sixer supports the following operations:</p>
<ul class="simple">
<li>all</li>
<li>basestring</li>
<li>dict0</li>
<li>dict_add</li>
<li>iteritems</li>
<li>iterkeys</li>
<li>itertools</li>
<li>itervalues</li>
<li>long</li>
<li>next</li>
<li>raise</li>
<li>six_moves</li>
<li>stringio</li>
<li>unicode</li>
<li>urllib</li>
<li>xrange</li>
</ul>
</div>
<div class="section" id="creation-of-the-sixer-test-suite">
<h2>Creation of the Sixer Test Suite</h2>
<p>Slowly, I added more and more patterns to sixer. The code became too complex
to be able to check regressions manually, so I also started to write unit
tests. Now each operation has at least one unit test. Some complex operations
have four tests or more.</p>
<p>At the beginning, tests called the Python functions directly. It is fast and
convenient, but it failed to catch regressions in the command line program.
So I added tests running sixer as a black box: pass an input file and check
the output file. Then I added specific tests on the code parsing command line
options.</p>
</div>
<div class="section" id="the-new-all-operation">
<h2>The new "all" operation</h2>
<p>At the beginning, I used sixer to generate a patch for a single pattern. For
example, replace <tt class="docutils literal">unicode()</tt> in a whole project.</p>
<p>Later, I started to use it differently: I fixed all Python 3 issues at once,
but only in some selected files. I did that when we reached a minimum set of
tests which pass on Python 3 to have a green py34 check on Jenkins. Then we
ported tests one by one. It's better to write short patches, they are easier
and faster to review. And the review process is the bottleneck of the
OpenStack development process.</p>
<p>To fix all Python 3 issues at once, I added an <tt class="docutils literal">all</tt> operation which simply applies
sequentially each operation. So <tt class="docutils literal">sixer</tt> can now be used as <tt class="docutils literal">modernize</tt> and
<tt class="docutils literal">2to6</tt> to fix all Python 3 issues at once in a whole project.</p>
<p>I also added the ability to pass filenames instead of having to pass a
directory to modify all files in all subdirectories.</p>
</div>
<div class="section" id="new-urllib-six-moves-and-stringio-operations">
<h2>New urllib, six_moves and stringio operations</h2>
<div class="section" id="urllib">
<h3>urllib</h3>
<p>I tried to keep the sixer code simple. But some changes are boring to write,
like replacing <tt class="docutils literal">urllib</tt> imports with <tt class="docutils literal">six.moves.urllib</tt> imports. Python 2 has 3
modules (<tt class="docutils literal">urllib</tt>, <tt class="docutils literal">urllib2</tt>, <tt class="docutils literal">urlparse</tt>), whereas Python 3 uses a
single <tt class="docutils literal">urllib</tt> namespace with submodules (<tt class="docutils literal">urllib.request</tt>,
<tt class="docutils literal">urllib.parse</tt>, <tt class="docutils literal">urllib.error</tt>). Some Python 2 functions moved to one
submodule, whereas others moved to another submodule. It requires knowing both
the old and the new layout well.</p>
<p>After losing many hours writing patches for <tt class="docutils literal">urllib</tt> manually, I decided
to add a <tt class="docutils literal">urllib</tt> operation. In fact, it didn't take long to implement,
compared to the time spent writing patches manually.</p>
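<p>The operation boils down to a mapping from each Python 2 name to its location
under <tt class="docutils literal">six.moves.urllib</tt>. The sketch below shows the idea with only
a handful of entries (sixer's real table is much larger, and its rewriting is more
careful than a plain substitution):</p>

```python
import re

# A few entries of the Python 2 -> six.moves.urllib mapping; the real
# sixer table covers many more names.
URLLIB_MAP = {
    'urllib2.urlopen': 'six.moves.urllib.request.urlopen',
    'urllib2.HTTPError': 'six.moves.urllib.error.HTTPError',
    'urlparse.urlparse': 'six.moves.urllib.parse.urlparse',
    'urllib.quote': 'six.moves.urllib.parse.quote',
}

def replace_urllib(source):
    for old, new in URLLIB_MAP.items():
        source = re.sub(re.escape(old) + r'\b', new, source)
    return source

replace_urllib("parts = urlparse.urlparse(url)")
# → "parts = six.moves.urllib.parse.urlparse(url)"
```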
</div>
<div class="section" id="stringio">
<h3>stringio</h3>
<p>Handling StringIO is also a little bit tricky because <tt class="docutils literal">StringIO.StringIO</tt> and
<tt class="docutils literal">cStringIO.StringIO</tt> don't have the same performance on Python 2. Producing
patches without killing performance requires picking the right module or
symbol from six: <tt class="docutils literal">six.StringIO()</tt> or <tt class="docutils literal">six.moves.cStringIO</tt>, for
example.</p>
</div>
<div class="section" id="six-moves">
<h3>six_moves</h3>
<p>The generic <tt class="docutils literal">six_moves</tt> operation replaces various Python 2 imports with
imports from <tt class="docutils literal">six.moves</tt>:</p>
<ul class="simple">
<li>BaseHTTPServer</li>
<li>ConfigParser</li>
<li>Cookie</li>
<li>HTMLParser</li>
<li>Queue</li>
<li>SimpleHTTPServer</li>
<li>SimpleXMLRPCServer</li>
<li>__builtin__</li>
<li>cPickle</li>
<li>cookielib</li>
<li>htmlentitydefs</li>
<li>httplib</li>
<li>repr</li>
<li>xmlrpclib</li>
</ul>
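<p>The idea behind this operation can be sketched as a small lookup table. The
code below is a simplified illustration, not sixer's implementation: it holds only a
few entries and only rewrites bare <tt class="docutils literal">import</tt> lines, without touching
the usages of the module:</p>

```python
# A few entries of the Python 2 module -> six.moves mapping.
SIX_MOVES = {
    'ConfigParser': 'six.moves.configparser',
    'Queue': 'six.moves.queue',
    'httplib': 'six.moves.http_client',
    'xmlrpclib': 'six.moves.xmlrpc_client',
}

def replace_import(line):
    words = line.split()
    if len(words) == 2 and words[0] == 'import' and words[1] in SIX_MOVES:
        # Import the six.moves name directly, e.g. "from six.moves import queue"
        module = SIX_MOVES[words[1]].rsplit('.', 1)[1]
        return 'from six.moves import {}'.format(module)
    return line

replace_import('import ConfigParser')
# → 'from six.moves import configparser'
```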
</div>
</div>
<div class="section" id="kiss-emit-warnings-instead-of-complex-implementation">
<h2>KISS: emit warnings instead of complex implementation</h2>
<p>As I wrote, I tried to keep sixer simple (KISS principle: Keep It Simple,
Stupid). I'm also lazy: I didn't try to write a perfect tool, and I don't want
to spend hours on the sixer project.</p>
<p>When it is too tricky to make a decision or to implement a pattern, sixer
emits a "warning" instead. For example, a warning is emitted on
<tt class="docutils literal">def next(self):</tt> to remind you that a <tt class="docutils literal">__next__ = next</tt> alias is probably
needed on this class for Python 3.</p>
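<p>Such a warning can be sketched as a simple scan over the source. The function
below is illustrative, not sixer's actual implementation:</p>

```python
import re

# Flag "def next(self):" methods: on Python 3 the iterator protocol calls
# __next__(), so the class probably needs a "__next__ = next" alias.
NEXT_METHOD = re.compile(r'^\s+def next\(self\):', re.MULTILINE)

def check_next(source, filename):
    warnings = []
    for match in NEXT_METHOD.finditer(source):
        line = source.count('\n', 0, match.start()) + 1
        warnings.append('%s:%s: "def next(self):" may need a '
                        '"__next__ = next" alias for Python 3' % (filename, line))
    return warnings
```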
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>The sixer tool is incomplete and generates invalid changes. For example, it
replaces patterns in comments, docstrings and strings, whereas usually these
changes don't make sense. But I'm happy because the tool helped me a lot
to port OpenStack: it saved me hours.</p>
<p>I hope that the tool will now be useful to others! Don't hesitate to give me
feedback.</p>
</div>