AFAIK ptmalloc (on which glibc malloc is based) was created decades ago, when both multi-threaded applications and multi-CPU systems were rare (at least in the Linux world), so multi-threaded performance didn't matter. Some improvements have been made to glibc malloc since then, but I don't think it's possible to improve it significantly without a more or less complete rewrite. At that point it would make more sense to import an existing malloc implementation.
And speaking of tradeoffs: the default maximum number of arenas in glibc malloc is 8 times the number of CPU cores on 64-bit systems (e.g. up to 128 arenas on a 16-core machine), which is a terrible tradeoff. On many workloads it causes heap fragmentation and memory usage (RSS) many times higher than the amount of memory actually allocated, which is why it's common to find advice to set MALLOC_ARENA_MAX to 1 or 2. But such a high number of arenas probably helps glibc look less bad on synthetic benchmarks.
Jemalloc, tcmalloc, and mimalloc were all designed for multi-threaded applications from the beginning, and while they don't perform better than glibc malloc in single-threaded applications, they don't perform worse in that case either. Probably the main disadvantage of using je/tc/mi mallocs in a single-threaded app is the larger code size.