gh-101178: Add Ascii85, Base85, and Z85 support to binascii #102753

Merged
serhiy-storchaka merged 45 commits into python:main from kangtastic:gh-101178-rework-base85
Feb 6, 2026
kangtastic (Contributor) commented Mar 16, 2023

Synopsis

Add Ascii85, Base85, and Z85 encoder and decoder functions, implemented in C, to binascii, and use them to greatly improve the performance and reduce the memory usage of the existing Ascii85, Base85, and Z85 codec functions in base64.

No API or documentation changes are necessary with respect to any functions in base64, and all existing unit tests for those functions continue to pass without modification.

Resolves: gh-101178

Discussion

The base85-related functions in base64 are now wrappers for the new functions in binascii, as envisioned in the docs:

The binascii module contains a number of methods to convert between binary and various ASCII-encoded binary representations. Normally, you will not use these functions directly but use wrapper modules like uu or base64 instead. The binascii module contains low-level functions written in C for greater speed that are used by the higher-level modules.
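Because the public base64 functions keep their signatures, a round-trip check like the following should behave the same before and after this change. This is only a minimal sketch: the payload and the `codecs` and `results` names are made up for illustration, and it deliberately calls nothing but the existing base64 API.

```python
import base64

# Illustrative payload (261 bytes); its length is not a multiple of 4,
# so the final partial group is exercised as well.
payload = bytes(range(256)) + b"tail!"

codecs = {
    "a85": (base64.a85encode, base64.a85decode),
    "b85": (base64.b85encode, base64.b85decode),
}
# z85encode/z85decode only exist on Python 3.13 and later.
if hasattr(base64, "z85encode"):
    codecs["z85"] = (base64.z85encode, base64.z85decode)

results = {}
for name, (encode, decode) in codecs.items():
    encoded = encode(payload)
    results[name] = decode(encoded) == payload
    print(f"{name}: {len(payload)} -> {len(encoded)} bytes, round trip ok: {results[name]}")
```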

In my testing, splitting Ascii85 out from Base85 and Z85 was warranted despite the code duplication, because Ascii85 has several special cases that badly hurt performance.
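For readers unfamiliar with those special cases, the sketch below shows a few of them through the existing base64 API. It illustrates the documented behavior of `a85encode`/`a85decode` and is not taken from the C code in this PR; the variable names are made up.

```python
import base64

zeros = b"\x00" * 4
spaces = b" " * 4

# Ascii85 collapses an all-zero 4-byte group into the single character "z".
a85_zero = base64.a85encode(zeros)

# With foldspaces=True, four consecutive spaces collapse into "y".
a85_space = base64.a85encode(spaces, foldspaces=True)

# With adobe=True, the output is framed with "<~" and "~>".
a85_adobe = base64.a85encode(b"data", adobe=True)

# Base85 has no such shortcuts: every 4-byte group becomes 5 characters.
b85_zero = base64.b85encode(zeros)

print(a85_zero, a85_space, a85_adobe, b85_zero)

# The decoder has to recognise each of these forms again.
print(base64.a85decode(a85_zero) == zeros)
print(base64.a85decode(a85_space, foldspaces=True) == spaces)
print(base64.a85decode(a85_adobe, adobe=True) == b"data")
```

Each shortcut adds a branch per group in both directions, which Base85 and Z85 never have to take.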

Comments and questions are welcome.

Benchmarks

Updated December 28, 2025.

# bench_b85.py

# Note: EXTREMELY SLOW on unmodified mainline CPython
#       when tracing malloc on the base-85 functions.

import base64
import sys
import timeit
import tracemalloc

funcs = [(base64.b64encode, base64.b64decode),  # sanity check/comparison
         (base64.a85encode, base64.a85decode),
         (base64.b85encode, base64.b85decode),
         (base64.z85encode, base64.z85decode)]

def mb(n):
    return f"{n / 1024 / 1024:.3f} MB"

def stats(func, data, t, m):
    name, n, bps = func.__qualname__, len(data), len(data) / t
    print(f"{name} : {n} b in {t:.3f} s ({mb(bps)}/s) using {mb(m)}")

if __name__ == "__main__":
    data = b"a" * int(sys.argv[1]) * 1024 * 1024
    for fenc, fdec in funcs:
        tracemalloc.start()
        enc = fenc(data)
        menc = tracemalloc.get_traced_memory()[1] - len(enc)
        tracemalloc.stop()
        tenc = timeit.timeit("fenc(data)", number=1, globals=globals())
        stats(fenc, data, tenc, menc)

        tracemalloc.start()
        dec = fdec(enc)
        mdec = tracemalloc.get_traced_memory()[1] - len(dec)
        tracemalloc.stop()
        tdec = timeit.timeit("fdec(enc)", number=1, globals=globals())
        stats(fdec, enc, tdec, mdec)

# Python 3.15.0a3+ (heads/main:0efbad60e13, Dec 28 2025, 11:02:16)
# ./configure --enable-optimizations --with-lto

# Unmodified
$ time ./python bench_b85.py 64
b64encode : 67108864 b in 0.092 s (693.266 MB/s) using 42.667 MB
b64decode : 89478488 b in 0.234 s (364.961 MB/s) using 56.889 MB
a85encode : 67108864 b in 7.163 s (8.935 MB/s) using 2664.401 MB
a85decode : 83886080 b in 14.478 s (5.526 MB/s) using 3332.254 MB
b85encode : 67108864 b in 6.965 s (9.189 MB/s) using 2664.401 MB
b85decode : 83886080 b in 10.082 s (7.935 MB/s) using 3332.254 MB
z85encode : 67108864 b in 7.245 s (8.834 MB/s) using 2664.102 MB
z85decode : 83886080 b in 9.666 s (8.277 MB/s) using 3332.254 MB

real    9m44.382s
user    9m27.271s
sys     0m12.747s


# With this PR
b64encode : 67108864 b in 0.085 s (753.375 MB/s) using 42.667 MB
b64decode : 89478488 b in 0.230 s (371.282 MB/s) using 56.889 MB
a85encode : 67108864 b in 0.094 s (681.709 MB/s) using 0.000 MB
a85decode : 83886080 b in 0.191 s (418.019 MB/s) using 0.000 MB
b85encode : 67108864 b in 0.075 s (850.118 MB/s) using 0.000 MB
b85decode : 83886080 b in 0.141 s (567.490 MB/s) using 0.000 MB
z85encode : 67108864 b in 0.074 s (864.559 MB/s) using 0.000 MB
z85decode : 83886080 b in 0.173 s (462.854 MB/s) using 0.000 MB

real    0m1.865s
user    0m1.726s
sys     0m0.126s

The old pure-Python implementation is roughly two orders of magnitude slower and uses more than 40 times the input size in temporary memory.
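As a rough cross-check, the arithmetic behind these figures can be reproduced from the numbers in the benchmark output. The timings and the memory figure below are copied from that output, not re-measured, and the variable names are only illustrative.

```python
MIB = 1024 * 1024
input_size = 64 * MIB  # the 67108864-byte input used in the benchmark

# Base85 variants map each 4-byte group to 5 characters;
# Base64 maps each 3-byte group (padded) to 4 characters.
b85_size = input_size * 5 // 4
b64_size = -(-input_size // 3) * 4  # ceiling division, then 4 chars per group

# Encode timings in seconds, copied from the output: (unmodified, with this PR).
encode_seconds = {
    "a85encode": (7.163, 0.094),
    "b85encode": (6.965, 0.075),
    "z85encode": (7.245, 0.074),
}
speedups = {name: before / after for name, (before, after) in encode_seconds.items()}

# Reported temporary memory of the unmodified a85encode, relative to the input size.
memory_ratio = 2664.401 / (input_size / MIB)

print(b85_size, b64_size)
for name, factor in speedups.items():
    print(f"{name}: about {factor:.0f}x faster")
print(f"temporary memory: about {memory_ratio:.1f}x the input size")
```

The computed sizes match the decoder input sizes printed in the output (83886080 bytes for the base-85 codecs, 89478488 bytes for Base64).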
