
This article is about changes that I made, with the help of other developers, to the Python C API in Python 3.8, 3.9 and 3.10 to avoid accessing structure members directly, in order to prepare the C API for opaque structures. These changes are related to my PEP 620 "Hide implementation details from the C API".

One change had a negative impact on performance and had to be reverted. Making Python slower just to make structures opaque would first require PEP 620 to be accepted.

While the compatible changes merged in Python 3.8 and Python 3.9 went fine, one incompatible change in Python 3.10 caused more trouble and had to be reverted.

Photo: OVHcloud data center fire in Strasbourg.

Rationale

The C API currently exposes most object structures. C extensions indirectly access structure members through the API, but can also access them directly. This causes several issues:

  • Modifying a structure can break an unknown number of C extensions. To avoid any risk, CPython core developers avoid modifying structures. Once most structures are opaque, it will be possible to experiment with optimizations which require deep structure changes without breaking C extensions. The irony is that we first have to break backward compatibility and C extensions to get there.
  • Any structure change breaks the ABI. The stable ABI solved this issue by not exposing structures in its limited C API. The idea is to bend the default C API towards the limited C API to provide a stable ABI for everyone in the long term.
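To illustrate the first point, here is a minimal sketch of an opaque structure in plain C. The Counter type and its functions are hypothetical and are not part of the Python C API. In a real project, the struct definition would live in a private .c file, so callers could only go through the functions.

```c
#include <assert.h>
#include <stdlib.h>

/* "Public header" part: the type is only declared, so its layout is hidden. */
typedef struct Counter Counter;
Counter *Counter_New(long start);
long Counter_GetValue(const Counter *c);
void Counter_Free(Counter *c);

/* "Private implementation" part: members can be added or reordered
   without touching any caller that only uses the functions above. */
struct Counter {
    int debug_flags;   /* hypothetical member added in a later version */
    long value;
};

Counter *Counter_New(long start)
{
    Counter *c = malloc(sizeof(*c));
    if (c == NULL) {
        return NULL;
    }
    c->debug_flags = 0;
    c->value = start;
    return c;
}

/* Getter: the only way for callers to read the value. */
long Counter_GetValue(const Counter *c)
{
    return c->value;
}

void Counter_Free(Counter *c)
{
    free(c);
}
```

Code calling Counter_GetValue(c) keeps working when debug_flags is added, whereas code reading c->value directly would depend on the exact structure layout.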

Opaque structures

  • Python 3.8 made the PyInterpreterState structure opaque.
  • Python 3.9 made the PyGC_Head structure opaque.

Add getter and setter functions to Python 3.9

  • PyObject, PyVarObject:
    • Py_SET_REFCNT()
    • Py_SET_TYPE()
    • Py_SET_SIZE()
    • Py_IS_TYPE()
  • PyFrameObject:
    • PyFrame_GetCode()
    • PyFrame_GetBack()
  • PyThreadState:
    • PyThreadState_GetInterpreter()
    • PyThreadState_GetFrame()
    • PyThreadState_GetID()
  • PyInterpreterState:
    • PyInterpreterState_Get()

PyInterpreterState_Get() can be used to replace PyThreadState_Get()->interp and PyThreadState_GetInterpreter(PyThreadState_Get()).

Convert macros to static inline functions in Python 3.8

Macro pitfalls

Macros are convenient but have multiple pitfalls. Some macros can be abused in surprising ways. For example, the following code is valid with Python 3.9:

if (obj == NULL || PyList_SET_ITEM (l, i, obj) < 0) { ... }

In Python 3.9, PyList_SET_ITEM() returns obj in this case. Since obj is a pointer, the test checks whether a pointer is negative, which makes no sense (but is accepted by C compilers by default). This code likely confuses PyList_SET_ITEM() with PyList_SetItem(), which returns an int that is negative in case of an error.

Zackery Spytz and I modified the PyList_SET_ITEM() and PyCell_SET() macros in Python 3.10 to return void.

This change broke alsa-python: I proposed a fix which was merged.

One nice side effect of converting macros to static inline functions is that debuggers and profilers are able to retrieve the name of the function.
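The PyList_SET_ITEM() pitfall can be reproduced with a small mock. This is only a sketch: MockList and the MOCK_SET_ITEM_* macros are hypothetical stand-ins, not the real CPython definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a list object, not the real PyListObject. */
typedef struct {
    void *items[4];
} MockList;

/* Old style: an assignment expression, so the macro evaluates to the
   stored pointer and can be misused in a comparison. */
#define MOCK_SET_ITEM_OLD(list, i, v) ((list)->items[i] = (v))

/* New style: the cast to void discards the value, so the macro can no
   longer be used where a value is expected. */
#define MOCK_SET_ITEM_NEW(list, i, v) ((void)((list)->items[i] = (v)))

/* Returns whatever the old macro evaluates to: the stored pointer. */
static void *store_old(MockList *list, int i, void *v)
{
    return MOCK_SET_ITEM_OLD(list, i, v);
}

static void store_new(MockList *list, int i, void *v)
{
    MOCK_SET_ITEM_NEW(list, i, v);
    /* The following line would be rejected by the compiler:
       if (MOCK_SET_ITEM_NEW(list, i, v) < 0) { }  */
}
```

With the void variant, a misuse such as comparing the result to 0 becomes a build error instead of silently compiling.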

Converted macros

  • Py_INCREF(), Py_XINCREF()
  • Py_DECREF(), Py_XDECREF()
  • PyObject_INIT(), PyObject_INIT_VAR()
  • _PyObject_GC_TRACK(), _PyObject_GC_UNTRACK(), _Py_Dealloc()

Performance

Since Py_INCREF() is critical for general Python performance, the impact of the change was analyzed in depth before being merged in bpo-35059. The usage of __attribute__((always_inline)) and __forceinline to force inlining was rejected.

Cast to PyObject*

Old Py_INCREF() implementation in Python 3.7:

#define Py_INCREF(op) (                   \
    _Py_INC_REFTOTAL  _Py_REF_DEBUG_COMMA \
    ((PyObject *)(op))->ob_refcnt++)

where _Py_INC_REFTOTAL _Py_REF_DEBUG_COMMA becomes _Py_RefTotal++, if the Py_REF_DEBUG macro is defined, or nothing otherwise. Current Py_INCREF() implementation in Python 3.10:

static inline void _Py_INCREF(PyObject *op)
{
#ifdef Py_REF_DEBUG
    _Py_RefTotal++;
#endif
    op->ob_refcnt++;
}
#define Py_INCREF(op) _Py_INCREF(_PyObject_CAST(op))

Most static inline functions are wrapped by a macro which casts the argument to PyObject* using:

#define _PyObject_CAST(op) ((PyObject*)(op))
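The reason for this wrapper is that existing code passes pointers to many different object types. A minimal mock of the pattern, with hypothetical MockObject and MockTuple types rather than the real CPython structures, could look like this:

```c
#include <assert.h>

/* Hypothetical base object, standing in for PyObject. */
typedef struct {
    long refcnt;
} MockObject;

/* Hypothetical "subtype": the base must be the first member so that a
   pointer to MockTuple can be cast to a pointer to MockObject. */
typedef struct {
    MockObject base;
    int payload;
} MockTuple;

#define MOCK_OBJECT_CAST(op) ((MockObject *)(op))

/* The static inline function has a strict parameter type. */
static inline void mock_incref(MockObject *op)
{
    op->refcnt++;
}

/* The macro keeps call sites compiling with any object pointer type. */
#define MOCK_INCREF(op) mock_incref(MOCK_OBJECT_CAST(op))
```

Calling mock_incref(&tuple) directly with a MockTuple* would produce an incompatible pointer type diagnostic, whereas MOCK_INCREF(&tuple) behaves as the old macro did.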

Convert macros to regular functions in Python 3.9

Converted macros

  • PyIndex_Check()
  • PyObject_CheckBuffer()
  • PyObject_GET_WEAKREFS_LISTPTR()
  • PyObject_IS_GC()
  • PyObject_NEW(): alias to PyObject_New()
  • PyObject_NEW_VAR(): alias to PyObject_NewVar()

Performance

PyType_HasFeature() was modified to always call the PyType_GetFlags() function, rather than directly accessing PyTypeObject.tp_flags. The problem is that on macOS, Python is built without LTO, so the PyType_GetFlags() call is not inlined, which makes functions like tuplegetter_descr_get() slower: see bpo-39542. I reverted the PyType_HasFeature() change until PEP 620 is accepted. macOS does not use LTO in order to keep support for macOS 10.6 (Snow Leopard): see bpo-41181.

Fast static inline functions

To keep the best performance when Python is built without LTO, fast private variants were added as static inline functions to the internal C API:

  • _PyIndex_Check()
  • _PyObject_IS_GC()
  • _PyType_HasFeature()
  • _PyType_IS_GC()

For example, PyObject_IS_GC() is defined as a function, whereas _PyObject_IS_GC() is defined as an internal static inline function. Header file:

/* Test if an object implements the garbage collector protocol */
PyAPI_FUNC(int) PyObject_IS_GC(PyObject *obj);

// Fast inlined version of PyObject_IS_GC()
static inline int _PyObject_IS_GC(PyObject *obj)
{
    return (PyType_IS_GC(Py_TYPE(obj))
            && (Py_TYPE(obj)->tp_is_gc == NULL
                || Py_TYPE(obj)->tp_is_gc(obj)));
}

C code:

int
PyObject_IS_GC(PyObject *obj)
{
    return _PyObject_IS_GC(obj);
}

Python 3.10 incompatible C API change

The Py_REFCNT() macro was converted to a static inline function, so Py_REFCNT(obj) = refcnt; now fails with a compiler error. It must be replaced with Py_SET_REFCNT(obj, refcnt), which was added in Python 3.9.
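The difference between a macro usable as an lvalue and a function returning a value can be sketched with a mock. MockObject and the mock_* names are hypothetical and only mirror the shape of the real API.

```c
#include <assert.h>

/* Hypothetical stand-in for PyObject. */
typedef struct {
    long refcnt;
} MockObject;

/* Old style: the macro expands to the member itself, which is an lvalue,
   so "MOCK_REFCNT_OLD(ob) = n;" compiles. */
#define MOCK_REFCNT_OLD(ob) ((ob)->refcnt)

/* New style: the getter returns a copy of the value, so assigning to
   "mock_refcnt(ob)" is a compiler error. */
static inline long mock_refcnt(const MockObject *ob)
{
    return ob->refcnt;
}

/* A dedicated setter replaces the assignment. */
static inline void mock_set_refcnt(MockObject *ob, long refcnt)
{
    ob->refcnt = refcnt;
}
```

Code written as "MOCK_REFCNT_OLD(&ob) = 5;" has to become "mock_set_refcnt(&ob, 5);", which is the same migration as from Py_REFCNT(obj) = refcnt to Py_SET_REFCNT(obj, refcnt).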

The complex case of Py_TYPE() and Py_SIZE() macros

Macros converted and then reverted

The Py_TYPE() and Py_SIZE() macros were also converted to static inline functions in Python 3.10, but the change broke 17 C extensions.

Since the change broke too many C extensions, I reverted it: I converted Py_TYPE() and Py_SIZE() back to macros to leave more time to fix C extensions.

What's Next?

  • Convert the Py_TYPE() and Py_SIZE() macros to static inline functions again.
  • Add "%T" formatter for Py_TYPE(obj)->tp_name: see rejected bpo-34595.
  • Modify Cython to use getter functions.
  • Attempt to make some structures opaque, like PyThreadState.