My contributions to CPython during 2017 Q3 (july, august, september), Part 3 (funny bugs).
Previous report: My contributions to CPython during 2017 Q3: Part 2 (dangling threads).
Summary:
- FreeBSD bug: minor() device regression
- regrtest snowball effect when hunting memory leaks
- Bugfixes
- Other Changes
FreeBSD bug: minor() device regression
bpo-31044: The test_makedev() of test_posix started to fail in the build 632 (Wed Jul 26 10:47:01 2017) of AMD64 FreeBSD CURRENT. The test failed on Debug, but also Non-Debug buildbots, in master and 3.6 branches. It looks more like a change on the buildbot, maybe a FreeBSD upgrade?
Thanks to koobs, I have a SSH access to the buildbot. I was able to reproduce the bug manually. I noticed that minor() truncates most significant bits.
I continued my analysis and I found that, at May 23, the FreeBSD dev_t type changed from 32 bits to 64 bits in the kernel, but the minor() userland function was not updated.
I reported a bug to FreeBSD: Bug 221048 - minor() truncates device number to 32 bits, whereas dev_t type was extended to 64 bits.
In the meanwhile, I skipped test_posix.test_makedev() on FreeBSD if dev_t is larger than 32-bit.
Hopefully, the FreeBSD bug was quickly fixed!
regrtest snowball effect when hunting memory leaks
While trying to fix all reference leaks on the new Windows and Linux "Refleaks" buildbots, I reported the bug bpo-31217:
test_code leaked [1, 1, 1] memory blocks, sum=3
Two weeks after reporting the bug, I was able to reproduce the bug, but only with Python compiled in 32-bit mode. Strange.
I spent one day to understand the bug. I removed as much as possible while making sure that I can still reproduce the bug. At the end, I wrote leak2.py which reproduces the bug with a single import: import sys. Even if the script is only 86 lines long, I was still unable to understand the bug.
My first hypothesis:
It seems like the "leak" is the call to sys.getallocatedblocks() which creates a new integer, and the integer is kept alive between two loop iterations.
Antoine Pitrou rejected it:
I doubt it. If that was the case, the reference count would increase as well.
It was Antoine Pitrou who understood the bug:
Ahah. Actually, it's quite simple :-) On 64-bit Python: >>> id(82914 - 82913) == id(1) True On 32-bit Python: >>> id(82914 - 82913) == id(1) False So the first non-zero alloc_delta really has a snowball effect, as it creates new memory block which will produce a non-zero alloc_delta on the next run, etc.
I implemented Antoine's idea to fix the bug, commit:
Use a pool of integer objects to prevent false alarm when checking for memory block leaks. Fill the pool with values in -1000..1000 which are the most common (reference, memory block, file descriptor) differences. Co-Authored-By: Antoine Pitrou <pitrou@free.fr>
The bug is probably as old as the code hunting memory leaks.
Bugfixes
- bpo-30891: Second fix for importlib _find_and_load() to handle correctly parallelism with threads. Call sys.modules.get() in the with _ModuleLockManager(name): block to protect the dictionary key with the module lock and use an atomic get to prevent race conditions.
- bpo-31019: multiprocessing.Process.is_alive() now removes the process from the _children set if the process completed. The change prevents leaking "dangling" processes.
- bpo-31326, concurrent.futures: ProcessPoolExecutor.shutdown() now explicitly closes the call queue. Moreover, shutdown(wait=True) now also joins the call queue thread, to prevent leaking a dangling thread.
- bpo-31170: Update libexpat from 2.2.3 to 2.2.4: fix copying of partial characters for UTF-8 input (libexpat bug 115). Later, I also wrote non-regression tests for this bug (libexpat doesn't have any test for this bug).
- bpo-31499, xml.etree: xmlparser_gc_clear() now sets self.parser to NULL to prevent a crash in xmlparser_dealloc() if xmlparser_gc_clear() was called previously by the garbage collector, because the parser was part of a reference cycle. Fix co-written with Serhiy Storchaka.
- bpo-30892: Fix _elementtree module initialization (accelerator of xml.etree), handle correctly getattr(copy, 'deepcopy') failure to not fail with an assertion error.
Other Changes
- bpo-30866: Add _testcapi.stack_pointer(). I used it to write the "Stack consumption" section of a previous report: My contributions to CPython during 2017 Q1
- _ssl_: Fix compiler warning. Cast Py_buffer.len (Py_ssize_t, signed) to size_t (unsigned) to prevent the "comparison between signed and unsigned integer expressions" warning.
- bpo-30486: Make cell_set_contents() symbol private. Don't export the cell_set_contents() symbol in the C API.