mirror of
https://sourceware.org/git/glibc.git
synced 2025-11-06 19:29:35 +03:00
The benchtests/inet_ntop_ipv4 and benchtests/inet_ntop_ipv6 profiles
show that most of the time is spent in costly sprintf operations:
$ perf record ./benchtests/bench-inet_ntop_ipv4 && perf report --stdio
[...]
38.53% bench-inet_ntop libc.so [.] __printf_buffer
18.69% bench-inet_ntop libc.so [.] __printf_buffer_write
11.01% bench-inet_ntop libc.so [.] _itoa_word
8.02% bench-inet_ntop bench-inet_ntop_ipv4 [.] bench_start
6.99% bench-inet_ntop libc.so [.] __memmove_avx_unaligned_erms
3.86% bench-inet_ntop libc.so [.] __strchrnul_avx2
2.82% bench-inet_ntop libc.so [.] __strcpy_avx2
1.90% bench-inet_ntop libc.so [.] inet_ntop4
1.78% bench-inet_ntop libc.so [.] __vsprintf_internal
1.55% bench-inet_ntop libc.so [.] __sprintf_chk
1.18% bench-inet_ntop libc.so [.] __GI___inet_ntop
$ perf record ./benchtests/bench-inet_ntop_ipv6 && perf report --stdio
35.44% bench-inet_ntop libc.so [.] __printf_buffer
14.35% bench-inet_ntop libc.so [.] __printf_buffer_write
10.27% bench-inet_ntop libc.so [.] __GI___inet_ntop
7.93% bench-inet_ntop libc.so [.] _itoa_word
7.00% bench-inet_ntop libc.so [.] __sprintf_chk
6.20% bench-inet_ntop libc.so [.] __vsprintf_internal
5.26% bench-inet_ntop libc.so [.] __strchrnul_avx2
5.05% bench-inet_ntop bench-inet_ntop_ipv6 [.] bench_start
3.70% bench-inet_ntop libc.so [.] __memmove_avx_unaligned_erms
2.11% bench-inet_ntop libc.so [.] __printf_buffer_done
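For context, the sprintf-based pattern that the profiles point at can be
sketched roughly as follows. This is an illustration only, not the code
that was removed: the function name sprintf_style_ntop4 is hypothetical,
and snprintf stands in for whatever formatting call the previous
implementation used.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of a printf-based IPv4 formatter.  Every call goes
   through the generic format-string machinery, then copies the result
   from a temporary buffer into the caller's buffer.  */
static const char *
sprintf_style_ntop4 (const uint8_t src[4], char *dst, size_t size)
{
  char tmp[sizeof "255.255.255.255"];
  int len = snprintf (tmp, sizeof tmp, "%u.%u.%u.%u",
                      src[0], src[1], src[2], src[3]);

  /* Fail if the result plus its terminating NUL does not fit in DST.  */
  if (len < 0 || (size_t) len >= size)
    {
      errno = ENOSPC;
      return NULL;
    }
  return strcpy (dst, tmp);
}
```

The format parsing, integer conversion, and final copy in this sketch
correspond to the __printf_buffer, _itoa_word, and strcpy entries in the
profiles above.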
A new implementation is used instead:
* The printf usage is replaced with an expanded function that prints
either an IPv4 octet or an IPv6 quartet;
* The strcpy is replaced with a memcpy (since ABIs usually tend to
optimize the latter);
* For IPv6, the '::' shorthanding is done in-place instead of using
a temporary buffer;
* A temporary buffer is used iff the size is larger than
INET_ADDRSTRLEN/INET6_ADDRSTRLEN;
* Both inet_ntop4 and inet_ntop6 are inlined.
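The octet/quartet printing and the direct '::' shorthanding described in
the list above can be sketched as follows. This is a minimal illustration
under stated assumptions, not the code in the patch: the names put_octet,
put_quartet, format_ipv4, and format_ipv6 are hypothetical, the caller is
assumed to provide a buffer of at least INET6_ADDRSTRLEN (46) bytes, and
the embedded-IPv4 forms that inet_ntop also handles are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write V in decimal at P (no NUL) and return the
   position just past the last digit.  */
static char *
put_octet (char *p, uint8_t v)
{
  if (v >= 100)
    {
      *p++ = '0' + v / 100;
      v %= 100;
      *p++ = '0' + v / 10;
    }
  else if (v >= 10)
    *p++ = '0' + v / 10;
  *p++ = '0' + v % 10;
  return p;
}

/* Hypothetical helper: write V in lowercase hex without leading zeros.  */
static char *
put_quartet (char *p, uint16_t v)
{
  static const char digits[] = "0123456789abcdef";
  if (v >= 0x1000)
    *p++ = digits[v >> 12];
  if (v >= 0x100)
    *p++ = digits[(v >> 8) & 0xf];
  if (v >= 0x10)
    *p++ = digits[(v >> 4) & 0xf];
  *p++ = digits[v & 0xf];
  return p;
}

/* Format 4 bytes as dotted decimal; returns the length without the NUL.  */
static size_t
format_ipv4 (const uint8_t src[4], char *dst)
{
  char *p = dst;
  for (int i = 0; i < 4; i++)
    {
      if (i > 0)
        *p++ = '.';
      p = put_octet (p, src[i]);
    }
  *p = '\0';
  return (size_t) (p - dst);
}

/* Format 16 bytes as IPv6 text, writing the '::' shorthand directly into
   DST.  Embedded IPv4 forms are NOT handled in this sketch.  */
static size_t
format_ipv6 (const uint8_t src[16], char *dst)
{
  uint16_t w[8];
  for (int i = 0; i < 8; i++)
    w[i] = (uint16_t) ((src[2 * i] << 8) | src[2 * i + 1]);

  /* Find the first longest run of zero words with length >= 2.  */
  int best = -1, best_len = 1;
  for (int i = 0; i < 8;)
    {
      int j = i;
      while (j < 8 && w[j] == 0)
        j++;
      if (j - i > best_len)
        {
          best = i;
          best_len = j - i;
        }
      i = (j == i) ? i + 1 : j;
    }

  char *p = dst;
  for (int i = 0; i < 8; i++)
    {
      if (best >= 0 && i >= best && i < best + best_len)
        {
          /* One ':' for the whole run; the next word adds the second.  */
          if (i == best)
            *p++ = ':';
          continue;
        }
      if (i > 0)
        *p++ = ':';
      p = put_quartet (p, w[i]);
    }
  /* A run that reaches the end still needs its second ':'.  */
  if (best >= 0 && best + best_len == 8)
    *p++ = ':';
  *p = '\0';
  return (size_t) (p - dst);
}
```

Because both formatters return the length, a caller that formatted into a
temporary buffer could copy the result out with memcpy of length + 1
bytes rather than strcpy, which is the substitution the second item in
the list describes.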
The code is significantly rewritten, so I take it that this requires a new license.
The performance results on aarch64 Neoverse1 with gcc 14.2.1:
* master
aarch64-linux-gnu-master$ ./benchtests/bench-inet_ntop_ipv4
"inet_ntop_ipv4": {
"workload-ipv4-random": {
"duration": 1.43067e+09,
"iterations": 8e+06,
"reciprocal-throughput": 178.572,
"latency": 179.096,
"max-throughput": 5.59997e+06,
"min-throughput": 5.58359e+06
}
}
aarch64-linux-gnu-master$ ./benchtests/bench-inet_ntop_ipv6
"inet_ntop_ipv6": {
"workload-ipv6-random": {
"duration": 1.68539e+09,
"iterations": 4e+06,
"reciprocal-throughput": 421.307,
"latency": 421.388,
"max-throughput": 2.37357e+06,
"min-throughput": 2.37311e+06
}
}
* patched
aarch64-linux-gnu$ ./benchtests/bench-inet_ntop_ipv4
"inet_ntop_ipv4": {
"workload-ipv4-random": {
"duration": 1.06133e+09,
"iterations": 5.6e+07,
"reciprocal-throughput": 18.8482,
"latency": 19.0565,
"max-throughput": 5.30555e+07,
"min-throughput": 5.24755e+07
}
}
aarch64-linux-gnu$ ./benchtests/bench-inet_ntop_ipv6
"inet_ntop_ipv6": {
"workload-ipv6-random": {
"duration": 1.01246e+09,
"iterations": 2.4e+07,
"reciprocal-throughput": 42.5576,
"latency": 41.8139,
"max-throughput": 2.34976e+07,
"min-throughput": 2.39155e+07
}
}
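To put the figures above side by side, the ratio of the master
reciprocal-throughput to the patched one can be computed as in the sketch
below. The helper name speedup and the constant names are hypothetical;
the four values are copied from the benchmark output above. The ratios
come to roughly 9.5x for IPv4 and 9.9x for IPv6.

```c
#include <stdio.h>

/* Reciprocal-throughput values copied from the benchmark output.  */
static const double ipv4_master = 178.572;
static const double ipv4_patched = 18.8482;
static const double ipv6_master = 421.307;
static const double ipv6_patched = 42.5576;

/* Hypothetical helper: how many times faster the patched build is.  */
static double
speedup (double master, double patched)
{
  return master / patched;
}

/* Print both ratios with two decimal places.  */
static void
print_speedups (void)
{
  printf ("ipv4: %.2fx\n", speedup (ipv4_master, ipv4_patched));
  printf ("ipv6: %.2fx\n", speedup (ipv6_master, ipv6_patched));
}
```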
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>