PEP 757 – C API to import-export Python integers
- Author: Sergey B Kirpichev <skirpichev at gmail.com>, Victor Stinner <vstinner at python.org>
- Discussions-To: Discourse thread
- Status: Final
- Type: Standards Track
- Created: 13-Sep-2024
- Python-Version: 3.14
- Post-History: 14-Sep-2024
- Resolution: 08-Dec-2024
Table of Contents
- Abstract
- Rationale
- Specification
- Optimize import for small integers
- Implementation
- Benchmarks
  - Export: PyLong_Export() with gmpy2
  - Import: PyLongWriter_Create() with gmpy2
- Backwards Compatibility
- Rejected Ideas
  - Support arbitrary layout
  - Don’t add PyLong_GetNativeLayout() function
  - Provide mpz_import/export-like API instead
  - Drop value field from the export API
- Discussions
- Copyright
Abstract
Add a new C API to import and export Python integers (int objects),
in particular the PyLongWriter_Create() and PyLong_Export() functions.
Rationale
Projects such as gmpy2, SAGE and Python-FLINT either directly access Python
“internals” (the PyLongObject structure) or use an inefficient
temporary format (hex strings for Python-FLINT) to import and
export Python int objects.  The Python int implementation
changed in Python 3.12 to add a tag and “compact values”.
In the 3.13 alpha 1 release, the private undocumented _PyLong_New()
function was removed, even though these projects use it to
import Python integers. The private function was restored in 3.13
alpha 2.
A public, efficient abstraction is needed to interface Python with these projects without exposing implementation details. It would allow Python to change its internals without breaking these projects. For example, the gmpy2 implementation recently had to change for CPython 3.9 and again for CPython 3.12.
Specification
Layout API
Data needed by GMP-like import-export functions.

struct PyLongLayout
  Layout of an array of “digits” (“limbs” in the GMP terminology), used to represent absolute value for arbitrary precision integers.
  Use PyLong_GetNativeLayout() to get the native layout of Python int objects, used internally for integers with “big enough” absolute value.
  See also sys.int_info which exposes similar information to Python.
  uint8_t bits_per_digit
    Bits per digit. For example, a 15 bit digit means that bits 0-14 contain meaningful information.
  uint8_t digit_size
    Digit size in bytes. For example, a 15 bit digit will require at least 2 bytes.
  int8_t digits_order
    Digits order:
    - 1 for most significant digit first
    - -1 for least significant digit first
  int8_t digit_endianness
    Digit endianness:
    - 1 for most significant byte first (big endian)
    - -1 for least significant byte first (little endian)

const PyLongLayout *PyLong_GetNativeLayout(void)
  Get the native layout of Python int objects.
  See the PyLongLayout structure.
  The function must not be called before Python initialization nor after Python finalization. The returned layout is valid until Python is finalized. The layout is the same for all Python sub-interpreters and so it can be cached.
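As a rough illustration of how the layout members are typically consumed, the sketch below mirrors them in a stand-in structure and derives two quantities that GMP-style callers need: the number of unused high ("nail") bits per digit and the number of digits required for a magnitude of a given bit length. DemoLayout and the demo_* functions are hypothetical names used only so that the example compiles without Python headers; real code would read the values from PyLong_GetNativeLayout().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the PyLongLayout members described above. */
typedef struct {
    uint8_t bits_per_digit;
    uint8_t digit_size;
    int8_t digits_order;
    int8_t digit_endianness;
} DemoLayout;

/* Unused high bits ("nails" in GMP terminology) in each stored digit. */
size_t
demo_nails(const DemoLayout *layout)
{
    assert(layout->bits_per_digit <= layout->digit_size * 8);
    return (size_t)layout->digit_size * 8 - layout->bits_per_digit;
}

/* Number of digits needed for a magnitude that is `nbits` bits long.
   At least one digit is always reported. */
size_t
demo_digits_needed(const DemoLayout *layout, size_t nbits)
{
    assert(layout->bits_per_digit > 0);
    if (nbits == 0) {
        return 1;
    }
    return (nbits + layout->bits_per_digit - 1) / layout->bits_per_digit;
}
```

For example, assuming 30-bit digits stored in 4 bytes (an assumption about a particular build, not something guaranteed by the API), demo_nails() reports 2 nail bits and a 301-bit magnitude such as 1<<300 needs 11 digits.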
Export API
struct PyLongExport
  Export of a Python int object.
  There are two cases:
  - If digits is NULL, only use the value member.
  - If digits is not NULL, use negative, ndigits and digits members.
  int64_t value
    The native integer value of the exported int object. Only valid if digits is NULL.
  uint8_t negative
    1 if the number is negative, 0 otherwise. Only valid if digits is not NULL.
  Py_ssize_t ndigits
    Number of digits in digits array. Only valid if digits is not NULL.
  const void *digits
    Read-only array of unsigned digits. Can be NULL.
If PyLongExport.digits is not NULL, a private field of the
PyLongExport structure stores a strong reference to the Python
int object to make sure that that structure remains valid until
PyLong_FreeExport() is called.

int PyLong_Export(PyObject *obj, PyLongExport *export_long)
  Export a Python int object.
  export_long must point to a PyLongExport structure allocated by the caller. It must not be NULL.
  On success, fill in *export_long and return 0. On error, set an exception and return -1.
  PyLong_FreeExport() must be called when the export is no longer needed.
  CPython implementation detail: This function always succeeds if obj is a Python int object or a subclass.
On CPython 3.14, no memory copy is needed in PyLong_Export(), it’s just
a thin wrapper to expose Python int internal digits array.

void PyLong_FreeExport(PyLongExport *export_long)
  Release the export export_long created by PyLong_Export().
  CPython implementation detail: Calling PyLong_FreeExport() is optional if export_long->digits is NULL.
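The two export cases can be sketched without Python or GMP as follows. DemoExport is a hypothetical stand-in that mirrors the public members listed above, and demo_read_export() is an illustrative helper, not part of the proposed API. The sketch assumes 4-byte digits in host byte order, stored least significant digit first; a real consumer would take these properties from PyLong_GetNativeLayout().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the public PyLongExport members. */
typedef struct {
    int64_t value;
    uint8_t negative;
    ptrdiff_t ndigits;      /* Py_ssize_t in the real structure */
    const void *digits;
} DemoExport;

/* Read an export as a sign and a 64-bit magnitude.
   `bits` is the number of meaningful bits per digit.
   Returns 0 on success, -1 if the magnitude does not fit in 64 bits. */
int
demo_read_export(const DemoExport *ex, unsigned bits,
                 int *negative, uint64_t *magnitude)
{
    assert(bits > 0 && bits <= 32);
    if (ex->digits == NULL) {
        /* Case 1: only `value` is meaningful. */
        *negative = ex->value < 0;
        /* Negate in unsigned arithmetic so that INT64_MIN is handled. */
        *magnitude = ex->value < 0 ? (uint64_t)0 - (uint64_t)ex->value
                                   : (uint64_t)ex->value;
        return 0;
    }
    /* Case 2: `negative`, `ndigits` and `digits` are meaningful. */
    const uint32_t *d = (const uint32_t *)ex->digits;
    uint64_t acc = 0;
    for (ptrdiff_t i = ex->ndigits - 1; i >= 0; i--) {
        if (acc >> (64 - bits)) {
            return -1;      /* the next shift would drop high bits */
        }
        acc = (acc << bits) | d[i];
    }
    *negative = ex->negative != 0;
    *magnitude = acc;
    return 0;
}
```

With the real API, the structure would be filled by PyLong_Export() and released with PyLong_FreeExport() once the digits have been consumed.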
Import API
The PyLongWriter API can be used to import an integer.
struct PyLongWriter
  A Python int writer instance.
  The instance must be destroyed by PyLongWriter_Finish() or PyLongWriter_Discard().

PyLongWriter *PyLongWriter_Create(int negative, Py_ssize_t ndigits, void **digits)
  Create a PyLongWriter.
  On success, allocate *digits and return a writer. On error, set an exception and return NULL.
  negative is 1 if the number is negative, or 0 otherwise.
  ndigits is the number of digits in the digits array. It must be greater than 0.
  digits must not be NULL.
  After a successful call to this function, the caller should fill in the array of digits digits and then call PyLongWriter_Finish() to get a Python int. The layout of digits is described by PyLong_GetNativeLayout().
  Digits must be in the range [0; (1 << bits_per_digit) - 1] (where the bits_per_digit is the number of bits per digit). Any unused most significant digits must be set to 0.
  Alternately, call PyLongWriter_Discard() to destroy the writer instance without creating an int object.
On CPython 3.14, the PyLongWriter_Create() implementation is a thin
wrapper to the private _PyLong_New() function.

PyObject *PyLongWriter_Finish(PyLongWriter *writer)
  Finish a PyLongWriter created by PyLongWriter_Create().
  On success, return a Python int object. On error, set an exception and return NULL.
  The function takes care of normalizing the digits and converts the object to a compact integer if needed.
  The writer instance and the digits array are invalid after the call.

void PyLongWriter_Discard(PyLongWriter *writer)
  Discard a PyLongWriter created by PyLongWriter_Create().
  writer must not be NULL.
  The writer instance and the digits array are invalid after the call.
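The digit constraints above can be illustrated with a small helper that splits a 64-bit magnitude into digits. This is a minimal sketch under stated assumptions: demo_fill_digits() and demo_long_from_u64() are hypothetical names, and the code only handles 4-byte digits stored least significant digit first. The second function is compiled only when Python.h from 3.14 or later has been included beforehand, so the block also builds as plain C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split `magnitude` into `ndigits` digits of `bits` meaningful bits each,
   least significant digit first, stored as 4-byte host-order values.
   Returns 0 on success, -1 if `ndigits` digits are not enough. */
int
demo_fill_digits(uint64_t magnitude, unsigned bits,
                 uint32_t *digits, size_t ndigits)
{
    assert(bits > 0 && bits < 32);
    const uint32_t mask = ((uint32_t)1 << bits) - 1;
    for (size_t i = 0; i < ndigits; i++) {
        /* Each digit stays in the range [0; (1 << bits) - 1]. */
        digits[i] = (uint32_t)(magnitude & mask);
        magnitude >>= bits;
    }
    /* Unused most significant digits were written as 0 by the loop. */
    return magnitude == 0 ? 0 : -1;
}

#if defined(Py_PYTHON_H) && PY_VERSION_HEX >= 0x030E0000
/* Build a Python int from a sign and a 64-bit magnitude. */
PyObject *
demo_long_from_u64(int negative, uint64_t magnitude)
{
    const PyLongLayout *layout = PyLong_GetNativeLayout();
    /* Assumption of this sketch: 4-byte digits, least significant first. */
    if (layout->digit_size != sizeof(uint32_t) || layout->digits_order != -1) {
        PyErr_SetString(PyExc_NotImplementedError, "unsupported digit layout");
        return NULL;
    }
    Py_ssize_t ndigits = (64 + layout->bits_per_digit - 1)
                         / layout->bits_per_digit;
    void *digits;
    PyLongWriter *writer = PyLongWriter_Create(negative, ndigits, &digits);
    if (writer == NULL) {
        return NULL;
    }
    if (demo_fill_digits(magnitude, layout->bits_per_digit,
                         digits, (size_t)ndigits) < 0) {
        /* Not expected with ndigits chosen as above; kept to show cleanup. */
        PyLongWriter_Discard(writer);
        PyErr_SetString(PyExc_OverflowError, "value needs more digits");
        return NULL;
    }
    return PyLongWriter_Finish(writer);
}
#endif
```

Because PyLongWriter_Finish() normalizes the digits, the zero-valued high digits written by the helper are acceptable input.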
Optimize import for small integers
The proposed import API is efficient for large integers. Compared to directly accessing Python internals, it can have a significant performance overhead on small integers.
For small integers of a few digits (for example, 1 or 2 digits), existing APIs such as PyLong_FromLong() can be used instead, as the gmpy2 import code in the Benchmarks section does for values that fit in a C long.
Implementation
Benchmarks
Code:
/* Query parameters of Python’s internal representation of integers. */
const PyLongLayout *layout = PyLong_GetNativeLayout();
size_t int_digit_size = layout->digit_size;
int int_digits_order = layout->digits_order;
size_t int_bits_per_digit = layout->bits_per_digit;
size_t int_nails = int_digit_size*8 - int_bits_per_digit;
int int_endianness = layout->digit_endianness;
Export: PyLong_Export() with gmpy2
Code:
static int
mpz_set_PyLong(mpz_t z, PyObject *obj)
{
    static PyLongExport long_export;
    if (PyLong_Export(obj, &long_export) < 0) {
        return -1;
    }
    if (long_export.digits) {
        mpz_import(z, long_export.ndigits, int_digits_order, int_digit_size,
                   int_endianness, int_nails, long_export.digits);
        if (long_export.negative) {
            mpz_neg(z, z);
        }
        PyLong_FreeExport(&long_export);
    }
    else {
        const int64_t value = long_export.value;
        if (LONG_MIN <= value && value <= LONG_MAX) {
            mpz_set_si(z, value);
        }
        else {
            mpz_import(z, 1, -1, sizeof(int64_t), 0, 0, &value);
            if (value < 0) {
                mpz_t tmp;
                mpz_init(tmp);
                mpz_ui_pow_ui(tmp, 2, 64);
                mpz_sub(z, z, tmp);
                mpz_clear(tmp);
            }
        }
    }
    return 0;
}
Reference code: mpz_set_PyLong() in the gmpy2 master for commit 9177648.
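The fix-up in the last branch of the code above relies on two's complement reinterpretation: mpz_import() reads the 8 bytes of value as an unsigned number, which for a negative value equals value + 2**64, so 2**64 is subtracted afterwards. The following GMP-free sketch (all demo_* names are hypothetical) shows the same identity in plain C, together with a way to obtain the magnitude directly in unsigned arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* The unsigned number obtained by reinterpreting the bits of `value`:
   `value` itself when non-negative, `value + 2**64` when negative. */
uint64_t
demo_as_unsigned(int64_t value)
{
    return (uint64_t)value;
}

/* Absolute value of `value`, computed in unsigned arithmetic so that
   INT64_MIN (whose magnitude does not fit in int64_t) is handled. */
uint64_t
demo_magnitude(int64_t value)
{
    uint64_t u = (uint64_t)value;
    return value < 0 ? (uint64_t)0 - u : u;
}

/* Usage: spot checks of the identities described above. */
void
demo_check_twos_complement(void)
{
    assert(demo_as_unsigned(-1) == UINT64_MAX);     /* -1 + 2**64 */
    assert(demo_magnitude(-1) == 1);
    assert(demo_magnitude(INT64_MIN) == ((uint64_t)1 << 63));
}
```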
Benchmark:
import pyperf
from gmpy2 import mpz
runner = pyperf.Runner()
runner.bench_func('1<<7', mpz, 1 << 7)
runner.bench_func('1<<38', mpz, 1 << 38)
runner.bench_func('1<<300', mpz, 1 << 300)
runner.bench_func('1<<3000', mpz, 1 << 3000)
Results on Linux Fedora 40 with CPU isolation, Python built in release mode:
| Benchmark | ref | pep757 | 
|---|---|---|
| 1<<7 | 91.3 ns | 89.9 ns: 1.02x faster | 
| 1<<38 | 120 ns | 94.9 ns: 1.27x faster | 
| 1<<300 | 196 ns | 203 ns: 1.04x slower | 
| 1<<3000 | 939 ns | 945 ns: 1.01x slower | 
| Geometric mean | (ref) | 1.05x faster | 
Import: PyLongWriter_Create() with gmpy2
Code:
static PyObject *
GMPy_PyLong_From_MPZ(MPZ_Object *obj, CTXT_Object *context)
{
    if (mpz_fits_slong_p(obj->z)) {
        return PyLong_FromLong(mpz_get_si(obj->z));
    }
    size_t size = (mpz_sizeinbase(obj->z, 2) +
                   int_bits_per_digit - 1) / int_bits_per_digit;
    void *digits;
    PyLongWriter *writer = PyLongWriter_Create(mpz_sgn(obj->z) < 0, size,
                                               &digits);
    if (writer == NULL) {
        return NULL;
    }
    mpz_export(digits, NULL, int_digits_order, int_digit_size,
               int_endianness, int_nails, obj->z);
    return PyLongWriter_Finish(writer);
}
Reference code: GMPy_PyLong_From_MPZ() in the gmpy2 master for commit 9177648.
Benchmark:
import pyperf
from gmpy2 import mpz
runner = pyperf.Runner()
runner.bench_func('1<<7', int, mpz(1 << 7))
runner.bench_func('1<<38', int, mpz(1 << 38))
runner.bench_func('1<<300', int, mpz(1 << 300))
runner.bench_func('1<<3000', int, mpz(1 << 3000))
Results on Linux Fedora 40 with CPU isolation, Python built in release mode:
| Benchmark | ref | pep757 | 
|---|---|---|
| 1<<7 | 56.7 ns | 56.2 ns: 1.01x faster | 
| 1<<300 | 191 ns | 213 ns: 1.12x slower | 
| Geometric mean | (ref) | 1.03x slower | 
Benchmark hidden because not significant (2): 1<<38, 1<<3000.
Backwards Compatibility
There is no impact on backward compatibility: only new APIs are added.
Rejected Ideas
Support arbitrary layout
It would be convenient to support arbitrary layout to import-export Python integers.
For example, it was proposed to add a layout parameter to
PyLongWriter_Create() and a layout member to the
PyLongExport structure.
The problem is that it’s more complex to implement and not really needed. What’s strictly needed is only an API to import-export using the Python “native” layout.
If later there are use cases for arbitrary layouts, new APIs can be added.
Don’t add PyLong_GetNativeLayout() function
Currently, most of the information required for int import/export is already
available via PyLong_GetInfo() (and sys.int_info).  More could be
added (such as the order of digits), and this interface doesn’t pose any
constraints on the future evolution of PyLongObject.
The problem is that PyLong_GetInfo() returns a Python object,
a named tuple, rather than a convenient C structure, which might discourage
people from using it in favor of, for example, the current semi-private macros such as
PyLong_SHIFT and PyLong_BASE.
Provide mpz_import/export-like API instead
Another approach to importing and exporting data from int objects would be
to expect C extensions to provide contiguous buffers, into which CPython
exports (or from which it imports) the absolute value of an integer.
API example:
struct PyLongLayout {
    uint8_t bits_per_digit;
    uint8_t digit_size;
    int8_t digits_order;
};
size_t PyLong_GetDigitsNeeded(PyLongObject *obj, PyLongLayout layout);
int PyLong_Export(PyLongObject *obj, PyLongLayout layout, void *buffer);
PyLongObject *PyLong_Import(PyLongLayout layout, void *buffer);
This might work for GMP, as it has the mpz_limbs_read() and
mpz_limbs_write() functions, which can provide the required access to the
internals of mpz_t.  Other libraries may need to use temporary
buffers and then mpz_import/export-like functions on their side.
The major drawback of this approach is that it’s much more complex on the
CPython side (i.e. the actual conversion between different layouts).  For example,
the implementation of PyLong_FromNativeBytes() and
PyLong_AsNativeBytes() (which together provide a restricted version of the
required API) in CPython took ~500 LOC (cf. ~100 LOC in the current
implementation).
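To give a feel for the kind of layout conversion this approach would move into CPython, here is a minimal sketch of a single case: repacking least-significant-first digits that carry fewer than 32 meaningful bits into full 32-bit limbs. The function name demo_repack_to_u32() is hypothetical, and a general implementation would also have to cover other digit sizes, byte orders and digit orders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Repack `n` digits of `bits` meaningful bits each (least significant
   digit first, each digit already within range) into full 32-bit limbs,
   also least significant first.
   `out` must have room for (n * bits + 31) / 32 limbs.
   Returns the number of limbs written. */
size_t
demo_repack_to_u32(const uint32_t *digits, size_t n, unsigned bits,
                   uint32_t *out)
{
    assert(bits > 0 && bits <= 32);
    uint64_t acc = 0;       /* bit accumulator, low bits are emitted first */
    unsigned acc_bits = 0;  /* number of valid bits currently in `acc` */
    size_t nout = 0;
    for (size_t i = 0; i < n; i++) {
        acc |= (uint64_t)digits[i] << acc_bits;
        acc_bits += bits;
        while (acc_bits >= 32) {
            out[nout++] = (uint32_t)acc;
            acc >>= 32;
            acc_bits -= 32;
        }
    }
    if (acc_bits > 0) {
        out[nout++] = (uint32_t)acc;    /* final, partially filled limb */
    }
    return nout;
}
```

For instance, the two 30-bit digits {0x3FFFFFFF, 1} represent 0x7FFFFFFF and repack into the limbs {0x7FFFFFFF, 0}.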
Drop value field from the export API
With this suggestion, only one export type would exist (an array of “digits”).  If
such a view is not available for a given integer, it would either be emulated by
the export function, or PyLong_Export() would return an error.  In both
cases, it’s assumed that users would use other C API functions to get “small
enough” integers (i.e. those that fit in some machine integer type), such as
PyLong_AsLongAndOverflow().  PyLong_Export() would be
inefficient (or just fail) in this case.
An example:
static int
mpz_set_PyLong(mpz_t z, PyObject *obj)
{
    int overflow;
#if SIZEOF_LONG == 8
    long value = PyLong_AsLongAndOverflow(obj, &overflow);
#else
    /* Windows has 32-bit long, so use 64-bit long long instead */
    long long value = PyLong_AsLongLongAndOverflow(obj, &overflow);
#endif
    Py_BUILD_ASSERT(sizeof(value) == sizeof(int64_t));
    if (!overflow) {
        if (LONG_MIN <= value && value <= LONG_MAX) {
            mpz_set_si(z, (long)value);
        }
        else {
            mpz_import(z, 1, -1, sizeof(int64_t), 0, 0, &value);
            if (value < 0) {
                mpz_t tmp;
                mpz_init(tmp);
                mpz_ui_pow_ui(tmp, 2, 64);
                mpz_sub(z, z, tmp);
                mpz_clear(tmp);
            }
        }
    }
    else {
        static PyLongExport long_export;
        if (PyLong_Export(obj, &long_export) < 0) {
            return -1;
        }
        mpz_import(z, long_export.ndigits, int_digits_order, int_digit_size,
                   int_endianness, int_nails, long_export.digits);
        if (long_export.negative) {
            mpz_neg(z, z);
        }
        PyLong_FreeExport(&long_export);
    }
    return 0;
}
This might look like a simplification from the API designer’s point of view, but
it would be less convenient for end users.  They would have to follow Python
development, benchmark different variants for exporting small integers (is it
obvious why the above approach was chosen instead of PyLong_AsInt64()?), and maybe
support different code paths for various CPython versions or across different
Python implementations.
Discussions
- Discourse: PEP 757 – C API to import-export Python integers
- C API Working Group decision issue #35
- Pull request #121339
- Issue #102471: The C-API for Python to C integer conversion is, to be frank, a mess.
- Add public function PyLong_GetDigits()
- Consider restoring _PyLong_New() function as public
- Pull request gh-106320: Remove private _PyLong_New() function.
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0757.rst
Last modified: 2024-12-16 07:23:59 GMT