1
0
mirror of https://sourceware.org/git/glibc.git synced 2025-08-07 06:43:00 +03:00
Commit Graph

743 Commits

Author SHA1 Message Date
Adhemerval Zanella
79bfbc93de powerpc: Remove modf optimization
The generic implementation is slight more optimized than the powerpc
one, where it has a more optimized inf/nan check (by not using FP
unit checks, along with branch prediction hints), and removed one
branch by issuing trunc instead of a combination of floor/ceil (which
also generated less code).

On power10 with gcc 14.2.1:

reciprocal-throughput        master         patch        difference
workload-0_1                 1.1351        0.9067            20.12%
workload-1_maxint            1.4230        0.9040            36.47%
workload-maxint_maxfloat     1.5038        0.9076            39.65%
workload-integral            1.1280        0.9111            19.23%

latency                      master         patch        difference
workload-0_1                 1.1440        2.7117          -137.03%
workload-1_maxint            4.0556        2.7070            33.25%
workload-maxint_maxfloat     3.2122        2.7164            15.43%
workload-integral            3.2381        2.7281            15.75%

Checked on powerpc64le-linux-gnu.
Reviewed-by: Sachin Monga <smonga@linux.ibm.com>
2025-06-25 15:05:30 -03:00
Adhemerval Zanella
5c2b21c478 powerpc: Remove modff optimization
The generic implementation is slight more optimized than the powerpc
one, where it has a more optimized inf/nan check (by not using FP
unit checks, along with branch prediction hints), and removed one
branch by issuing trunc instead of a combination of floor/ceil (which
also generated less code).

On power10 with gcc 14.2.1:

reciprocal-throughput        master        patch        difference
workload-0_1                 1.5210       1.3942             8.34%
workload-1_maxint            2.0926       1.3940            33.38%
workload-maxint_maxfloat     1.7851       1.3940            21.91%
workload-integral            1.5216       1.3941             8.37%

latency                      master        patch        difference
workload-0_1                 1.5928       2.6337           -65.35%
workload-1_maxint            3.2929       2.6337            20.02%
workload-maxint_maxfloat     1.9697       2.6341           -33.73%
workload-integral            2.0597       2.6337           -27.87%

Checked on powerpc64le-linux-gnu.
Reviewed-by: Sachin Monga <smonga@linux.ibm.com>
2025-06-25 15:05:30 -03:00
Andreas Schwab
9b3730a54b powerpc: use .machine power10 in POWER10 assembler sources
They were misattributed as POWER9 sources.
2025-06-23 14:40:34 +02:00
Andreas Schwab
eae5bb0f60 powerpc: Remove assembler workarounds
Now that we require at least binutils 2.39 the support for POWER9 and
POWER10 instructions can be assumed.
2025-06-18 09:29:10 +02:00
Carlos O'Donell
15808c77b3 ppc64le: Revert "powerpc: Optimized strcmp for power10" (CVE-2025-5702)
This reverts commit 3367d8e180

Reason for revert: Power10 strcmp clobbers non-volatile vector
registers (Bug 33056)

Tested on ppc64le without regression.
2025-06-16 18:02:58 -04:00
Carlos O'Donell
a7877bb668 ppc64le: Revert "powerpc : Add optimized memchr for POWER10" (Bug 33059)
This reverts commit b9182c793c

Reason for revert: Power10 memchr clobbers v20 vector register
(Bug 33059)

This is not a security issue, unlike CVE-2025-5745 and
CVE-2025-5702.

Tested on ppc64le without regression.
2025-06-16 18:02:58 -04:00
Carlos O'Donell
c22de63588 ppc64le: Revert "powerpc: Fix performance issues of strcmp power10" (CVE-2025-5702)
This reverts commit 90bcc8721e

This change is in the chain of the final revert that fixes the CVE
i.e. 3367d8e180

Reason for revert: Power10 strcmp clobbers non-volatile vector
registers (Bug 33056)

Tested on ppc64le with no regressions.
2025-06-16 18:02:58 -04:00
Carlos O'Donell
63c60101ce ppc64le: Revert "powerpc: Optimized strncmp for power10" (CVE-2025-5745)
This reverts commit 23f0d81608

Reason for revert: Power10 strncmp clobbers non-volatile vector
registers (Bug 33060)

Tested on ppc64le with no regressions.
2025-06-16 18:02:58 -04:00
Adhemerval Zanella
39775f00b1 math: Optimize float ilogb/llogb
It removes the wrapper by moving the error/EDOM handling to an
out-of-line implementation (__math_invalidf_i/__math_invalidf_li).
Also, __glibc_unlikely is used on errors case since it helps
code generation on recent gcc.

The code now builds to with gcc-14 on aarch64:

0000000000000000 <__ilogbf>:
   0:   1e260000        fmov    w0, s0
   4:   d3577801        ubfx    x1, x0, #23, #8
   8:   340000e1        cbz     w1, 24 <__ilogbf+0x24>
   c:   5101fc20        sub     w0, w1, #0x7f
  10:   7103fc3f        cmp     w1, #0xff
  14:   54000040        b.eq    1c <__ilogbf+0x1c>  // b.none
  18:   d65f03c0        ret
  1c:   12b00000        mov     w0, #0x7fffffff                 // #2147483647
  20:   14000000        b       0 <__math_invalidf_i>
  24:   53175800        lsl     w0, w0, #9
  28:   340000a0        cbz     w0, 3c <__ilogbf+0x3c>
  2c:   5ac01000        clz     w0, w0
  30:   12800fc1        mov     w1, #0xffffff81                 // #-127
  34:   4b000020        sub     w0, w1, w0
  38:   d65f03c0        ret
  3c:   320107e0        mov     w0, #0x80000001                 // #-2147483647
  40:   14000000        b       0 <__math_invalidf_i>

Some ABI requires additional adjustments:

  * i386 and m68k requires to use the template version, since
    both provide __ieee754_ilogb implementatations.

  * loongarch uses a custom implementation as well.

  * powerpc64le also has a custom implementation for POWER9, which
    is also used for float and float128 version.  The generic
    e_ilogb.c implementation is moved on powerpc to keep the
    current code as-is.

Checked on aarch64-linux-gnu and x86_64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-02 13:32:19 -03:00
Adhemerval Zanella
c4be334400 math: Optimize double ilogb/llogb
It removes the wrapper by moving the error/EDOM handling to an
out-of-line implementation (__math_invalid_i/__math_invalid_li).
Also, __glibc_unlikely is used on errors case since it helps
code generation on recent gcc.

The code now builds to with gcc-14 on aarch64:

0000000000000000 <__ilogb>:
   0:   9e660000        fmov    x0, d0
   4:   d374f801        ubfx    x1, x0, #52, #11
   8:   340000e1        cbz     w1, 24 <__ilogb+0x24>
   c:   510ffc20        sub     w0, w1, #0x3ff
  10:   711ffc3f        cmp     w1, #0x7ff
  14:   54000040        b.eq    1c <__ilogb+0x1c>  // b.none
  18:   d65f03c0        ret
  1c:   12b00000        mov     w0, #0x7fffffff                 // #2147483647
  20:   14000000        b       0 <__math_invalid_i>
  24:   d374cc00        lsl     x0, x0, #12
  28:   b40000a0        cbz     x0, 3c <__ilogb+0x3c>
  2c:   dac01000        clz     x0, x0
  30:   12807fc1        mov     w1, #0xfffffc01                 // #-1023
  34:   4b000020        sub     w0, w1, w0
  38:   d65f03c0        ret
  3c:   320107e0        mov     w0, #0x80000001                 // #-2147483647
  40:   14000000        b       0 <__math_invalid_i>

Some ABI requires additional adjustments:

  * i386 and m68k requires to use the template version, since
    both provide __ieee754_ilogb implementatations.

  * loongarch uses a custom implementation as well.

  * powerpc64le also has a custom implementation for POWER9, which
    is also used for float and float128 version.  The generic
    e_ilogb.c implementation is moved on powerpc to keep the
    current code as-is.

Checked on aarch64-linux-gnu and x86_64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-02 13:32:19 -03:00
Stefan Liebler
4b1ffb828c powerpc64le: Remove configure check for objcopy >= 2.26.
Due to raising the minimum binutils version to >= 2.26, the configure
check for testing support of --update-section is not needed anymore.
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>
2025-05-14 10:35:55 +02:00
Joseph Myers
ae31254432 Implement C23 compoundn
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the compoundn functions, which compute (1+X) to the
power Y for integer Y (and X at least -1).  The integer exponent has
type long long int in C23; it was intmax_t in TS 18661-4, and as with
other interfaces changed after their initial appearance in the TS, I
don't think we need to support the original version of the interface.

Note that these functions are "compoundn" with a trailing "n", *not*
"compound" (CORE-MATH has the wrong name, for example).

As with pown, I strongly encourage searching for worst cases for ulps
error for these implementations (necessarily non-exhaustively, given
the size of the input space).  I also expect a custom implementation
for a given format could be much faster as well as more accurate (I
haven't tested or benchmarked the CORE-MATH implementation for
binary32); this is one of the more complicated and less efficient
functions to implement in a type-generic way.

As with exp2m1 and exp10m1, this showed up places where the
powerpc64le IFUNC setup is not as self-contained as one might hope (in
this case, without the changes specific to powerpc64le, there were
undefined references to __GI___expf128).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2025-05-09 15:17:27 +00:00
Adhemerval Zanella
ac4e838289 powerpc: Remove POWER7 strncasecmp optimization
These routines are not extensively used (gnulib documentation even
recommend use a replacement [1]), and there is already a POWER8
version that uses proper vectorized instructions.

[1] https://www.gnu.org/software/gnulib/manual/gnulib.html#C-strings

Checked with a build for some powerpc variations.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2025-05-06 13:31:01 -03:00
Florian Weimer
77e8b40a6e powerpc: Remove relocation cache flush code for power64
This is only needed for -mno-secure-plt, and this linkage mode
is not supported with powerpc64 and powerp64le.

Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2025-04-10 06:52:18 +02:00
Florian Weimer
3755ffb665 powerpc64le: Also avoid IFUNC for __mempcpy
Code used during early static startup in elf/dl-tls.c uses
__mempcpy.

Fixes commit cbd9fd2369 ("Consolidate
TLS block allocation for static binaries with ld.so").

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-02-05 09:53:11 +01:00
Paul Eggert
2642002380 Update copyright dates with scripts/update-copyrights 2025-01-01 11:22:09 -08:00
Florian Weimer
2b1dba3eb3 elf: Introduce is_rtld_link_map
Unconditionally define it to false for static builds.

This avoids the awkward use of weak_extern for _dl_rtld_map
in checks that cannot be possibly true on static builds.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-12-20 15:52:57 +01:00
Peter Bergner
aec85b2557 powerpc64: Fix dl-trampoline.S big-endian / non-ROP build failure
Fix a big-endian / non-ROP build failure caused by commit 4d9a4c02 when
building dl-trampoline.S.

Reported-by: Joseph Myers <josmyers@redhat.com>
2024-12-11 23:15:13 +03:00
Peter Bergner
4d9a4c02f9 powerpc64le: ROP changes for the dl-trampoline functions
Add ROP protection for the _dl_runtime_resolve and _dl_profile_resolve
functions.
2024-12-10 23:25:56 -05:00
Sachin Monga
be13e46764 powerpc64le: ROP changes for the *context and setjmp functions
Add ROP protection for the getcontext, setcontext, makecontext, swapcontext
and __sigsetjmp_symbol functions.

Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-12-09 16:49:54 -05:00
Sachin Monga
2062e02772 powerpc64le: ROP Changes for strncpy/ppc-mount
Add ROP protect instructions to strncpy and ppc-mount functions.
Modify FRAME_MIN_SIZE to 48 bytes for ELFv2 to reserve additional
16 bytes for ROP save slot and padding.

Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-11-25 10:44:20 -05:00
Sachin Monga
3051f3495c powerpc64le: _init/_fini file changes for ROP
The ROP instructions were added in ISA 3.1 (ie, Power10), however they
were defined so that if executed on older cpus, they would behave as
nops.  This allows us to emit them on older cpus and they'd just be
ignored, but if run on a Power10, then the binary would be ROP protected.

Hash instructions use negative offsets so the default position
of ROP pointer is FRAME_ROP_SAVE from caller's SP.

Modified FRAME_MIN_SIZE_PARM to 112 for ELFv2 to reserve
additional 16 bytes for ROP save slot and padding.

Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-11-20 16:50:34 -05:00
Mahesh Bodapati
3ef7e42861 powerpc64le: Optimized strcat for POWER10
This patch adds an optimized strcat which makes use of the default
strcat function which calls the Power10 strcpy and strlen routines.
2024-11-19 15:59:15 -05:00
Sachin Monga
f144dae4a1 powerpc64le: Adhere to ABI stack alignment requirement
The ABI requires all stack frames be 16-byte aligned.

Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-10-28 16:12:34 -05:00
Carlos O'Donell
cae9944a6c Fix whitespace related license issues.
Several copies of the licenses in files contained whitespace related
problems.  Two cases are addressed here, the first is two spaces
after a period which appears between "PURPOSE." and "See". The other
is a space after the last forward slash in the URL. Both issues are
corrected and the licenses now match the official textual description
of the license (and the other license in the sources).

Since these whitespaces changes do not alter the paragraph structure of
the license, nor create new sentences, they do not change the license.
2024-10-07 18:08:16 -04:00
Florian Weimer
cc3e743fc0 powerpc64le: Build new strtod tests with long double ABI flags (bug 32145)
This fixes several test failures:

=====FAIL: stdlib/tst-strtod1i.out=====
Locale tests
all OK
Locale tests
all OK
Locale tests
strtold("1,5") returns -6,38643e+367 and not 1,5
strtold("1.5") returns 1,5 and not 1
strtold("1.500") returns 1 and not 1500
strtold("36.893.488.147.419.103.232") returns 1500 and not 3,68935e+19
Locale tests
all OK

=====FAIL: stdlib/tst-strtod3.out=====
0: got wrong results -2.5937e+4826, expected 0

=====FAIL: stdlib/tst-strtod4.out=====
0: got wrong results -6,38643e+367, expected 0
1: got wrong results 0, expected 1e+06
2: got wrong results 1e+06, expected 10

=====FAIL: stdlib/tst-strtod5i.out=====
0: got wrong results -6,38643e+367, expected 0
2: got wrong results 0, expected -0
4: got wrong results -0, expected 0
5: got wrong results 0, expected -0
6: got wrong results -0, expected 0
7: got wrong results 0, expected -0
8: got wrong results -0, expected 0
9: got wrong results 0, expected -0
10: got wrong results -0, expected 0
11: got wrong results 0, expected -0
12: got wrong results -0, expected 0
13: got wrong results 0, expected -0
14: got wrong results -0, expected 0
15: got wrong results 0, expected -0
16: got wrong results -0, expected 0
17: got wrong results 0, expected -0
18: got wrong results -0, expected 0
20: got wrong results 0, expected -0
22: got wrong results -0, expected 0
23: got wrong results 0, expected -0
24: got wrong results -0, expected 0
25: got wrong results 0, expected -0
26: got wrong results -0, expected 0
27: got wrong results 0, expected -0

Fixes commit 3fc063dee0
("Make __strtod_internal tests type-generic").

Suggested-by: Joseph Myers <josmyers@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-09-05 22:02:23 +02:00
Jeevitha Palanisamy
29f0db6a2e powerpc64: Fix syscall_cancel build for powerpc64le-linux-gnu [BZ #32125]
In __syscall_cancel_arch, there's a tail call to __syscall_do_cancel.
On P10, since the caller uses the TOC and the callee is using
PC-relative addressing, there's only a branch instruction with no NOPs
to restore the TOC, which causes the build error. The fix involves adding
the NOTOC directive to the branch instruction, informing the linker
not to generate a TOC stub, thus resolving the issue.
2024-08-30 08:50:47 -05:00
Mahesh Bodapati
82b5340ebd powerpc64: Optimize strcpy and stpcpy for Power9/10
This patch modifies the current Power9 implementation of strcpy and
stpcpy to optimize it for Power9 and Power10.

No new Power10 instructions are used, so the original Power9 strcpy
is modified instead of creating a new implementation for Power10.

The changes also affect stpcpy, which uses the same implementation
with some additional code before returning.

Improvements compared to the old Power9 version:

Use simple comparisons for the first ~512 bytes:
  The main loop is good for long strings, but comparing 16B each time is
  better for shorter strings. After aligning the address to 16 bytes, we
  unroll the loop four times, checking 128 bytes each time. There may be
  some overlap with the main loop for unaligned strings, but it is better
  for shorter strings.

Loop with 64 bytes for longer bytes:
  Use 4 consecutive lxv/stxv instructions.

Showed an average improvement of 13%.

Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-08-23 16:48:32 -05:00
Adhemerval Zanella
89b53077d2 nptl: Fix Race conditions in pthread cancellation [BZ#12683]
The current racy approach is to enable asynchronous cancellation
before making the syscall and restore the previous cancellation
type once the syscall returns, and check if cancellation has happen
during the cancellation entrypoint.

As described in BZ#12683, this approach shows 2 problems:

  1. Cancellation can act after the syscall has returned from the
     kernel, but before userspace saves the return value.  It might
     result in a resource leak if the syscall allocated a resource or a
     side effect (partial read/write), and there is no way to program
     handle it with cancellation handlers.

  2. If a signal is handled while the thread is blocked at a cancellable
     syscall, the entire signal handler runs with asynchronous
     cancellation enabled.  This can lead to issues if the signal
     handler call functions which are async-signal-safe but not
     async-cancel-safe.

For the cancellation to work correctly, there are 5 points at which the
cancellation signal could arrive:

	[ ... )[ ... )[ syscall ]( ...
	   1      2        3    4   5

  1. Before initial testcancel, e.g. [*... testcancel)
  2. Between testcancel and syscall start, e.g. [testcancel...syscall start)
  3. While syscall is blocked and no side effects have yet taken
     place, e.g. [ syscall ]
  4. Same as 3 but with side-effects having occurred (e.g. a partial
     read or write).
  5. After syscall end e.g. (syscall end...*]

And libc wants to act on cancellation in cases 1, 2, and 3 but not
in cases 4 or 5.  For the 4 and 5 cases, the cancellation will eventually
happen in the next cancellable entrypoint without any further external
event.

The proposed solution for each case is:

  1. Do a conditional branch based on whether the thread has received
     a cancellation request;

  2. It can be caught by the signal handler determining that the saved
     program counter (from the ucontext_t) is in some address range
     beginning just before the "testcancel" and ending with the
     syscall instruction.

  3. SIGCANCEL can be caught by the signal handler and determine that
     the saved program counter (from the ucontext_t) is in the address
     range beginning just before "testcancel" and ending with the first
     uninterruptable (via a signal) syscall instruction that enters the
      kernel.

  4. In this case, except for certain syscalls that ALWAYS fail with
     EINTR even for non-interrupting signals, the kernel will reset
     the program counter to point at the syscall instruction during
     signal handling, so that the syscall is restarted when the signal
     handler returns.  So, from the signal handler's standpoint, this
     looks the same as case 2, and thus it's taken care of.

  5. For syscalls with side-effects, the kernel cannot restart the
     syscall; when it's interrupted by a signal, the kernel must cause
     the syscall to return with whatever partial result is obtained
     (e.g. partial read or write).

  6. The saved program counter points just after the syscall
     instruction, so the signal handler won't act on cancellation.
     This is similar to 4. since the program counter is past the syscall
     instruction.

So The proposed fixes are:

  1. Remove the enable_asynccancel/disable_asynccancel function usage in
     cancellable syscall definition and instead make them call a common
     symbol that will check if cancellation is enabled (__syscall_cancel
     at nptl/cancellation.c), call the arch-specific cancellable
     entry-point (__syscall_cancel_arch), and cancel the thread when
     required.

  2. Provide an arch-specific generic system call wrapper function
     that contains global markers.  These markers will be used in
     SIGCANCEL signal handler to check if the interruption has been
     called in a valid syscall and if the syscalls has side-effects.

     A reference implementation sysdeps/unix/sysv/linux/syscall_cancel.c
     is provided.  However, the markers may not be set on correct
     expected places depending on how INTERNAL_SYSCALL_NCS is
     implemented by the architecture.  It is expected that all
     architectures add an arch-specific implementation.

  3. Rewrite SIGCANCEL asynchronous handler to check for both canceling
     type and if current IP from signal handler falls between the global
     markers and act accordingly.

  4. Adjust libc code to replace LIBC_CANCEL_ASYNC/LIBC_CANCEL_RESET to
     use the appropriate cancelable syscalls.

  5. Adjust 'lowlevellock-futex.h' arch-specific implementations to
     provide cancelable futex calls.

Some architectures require specific support on syscall handling:

  * On i386 the syscall cancel bridge needs to use the old int80
    instruction because the optimized vDSO symbol the resulting PC value
    for an interrupted syscall points to an address outside the expected
    markers in __syscall_cancel_arch.  It has been discussed in LKML [1]
    on how kernel could help userland to accomplish it, but afaik
    discussion has stalled.

    Also, sysenter should not be used directly by libc since its calling
    convention is set by the kernel depending of the underlying x86 chip
    (check kernel commit 30bfa7b3488bfb1bb75c9f50a5fcac1832970c60).

  * mips o32 is the only kABI that requires 7 argument syscall, and to
    avoid add a requirement on all architectures to support it, mips
    support is added with extra internal defines.

Checked on aarch64-linux-gnu, arm-linux-gnueabihf, powerpc-linux-gnu,
powerpc64-linux-gnu, powerpc64le-linux-gnu, i686-linux-gnu, and
x86_64-linux-gnu.

[1] https://lkml.org/lkml/2016/3/8/1105
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-08-23 14:27:43 -03:00
Andreas K. Hüttel
98ffc1bfeb Convert to autoconf 2.72 (vanilla release, no distribution patches)
As discussed at the patch review meeting

Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
Reviewed-by: Simon Chopin <simon.chopin@canonical.com>
2024-06-17 21:15:28 +02:00
Joseph Myers
7ec903e028 Implement C23 exp2m1, exp10m1
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the exp2m1 and exp10m1 functions (exp2(x)-1 and
exp10(x)-1, like expm1).

As with other such functions, these use type-generic templates that
could be replaced with faster and more accurate type-specific
implementations in future.  Test inputs are copied from those for
expm1, plus some additions close to the overflow threshold (copied
from exp2 and exp10) and also some near the underflow threshold.

exp2m1 has the unusual property of having an input (M_MAX_EXP) where
whether the function overflows (under IEEE semantics) depends on the
rounding mode.  Although these could reasonably be XFAILed in the
testsuite (as we do in some cases for arguments very close to a
function's overflow threshold when an error of a few ulps in the
implementation can result in the implementation not agreeing with an
ideal one on whether overflow takes place - the testsuite isn't smart
enough to handle this automatically), since these functions aren't
required to be correctly rounding, I made the implementation check for
and handle this case specially.

The Makefile ordering expected by lint-makefiles for the new functions
is a bit peculiar, but I implemented it in this patch so that the test
passes; I don't know why log2 also needed moving in one Makefile
variable setting when it didn't in my previous patches, but the
failure showed a different place was expected for that function as
well.

The powerpc64le IFUNC setup seems not to be as self-contained as one
might hope; it shouldn't be necessary to add IFUNCs for new functions
such as these simply to get them building, but without setting up
IFUNCs for the new functions, there were undefined references to
__GI___expm1f128 (that IFUNC machinery results in no such function
being defined, but doesn't stop include/math.h from doing the
redirection resulting in the exp2m1f128 and exp10m1f128
implementations expecting to call it).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-06-17 16:31:49 +00:00
Joseph Myers
bb014f50c4 Implement C23 logp1
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the logp1 functions (aliases for log1p functions - the
name is intended to be more consistent with the new log2p1 and
log10p1, where clearly it would have been very confusing to name those
functions log21p and log101p).  As aliases rather than new functions,
the content of this patch is somewhat different from those actually
adding new functions.

Tests are shared with log1p, so this patch *does* mechanically update
all affected libm-test-ulps files to expect the same errors for both
functions.

The vector versions of log1p on aarch64 and x86_64 are *not* updated
to have logp1 aliases (and thus there are no corresponding header,
tests, abilist or ulps changes for vector functions either).  It would
be reasonable for such vector aliases and corresponding changes to
other files to be made separately.  For now, the log1p tests instead
avoid testing logp1 in the vector case (a Makefile change is needed to
avoid problems with grep, used in generating the .c files for vector
function tests, matching more than one ALL_RM_TEST line in a file
testing multiple functions with the same inputs, when it assumes that
the .inc file only has a single such line).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-06-17 13:47:09 +00:00
Adhemerval Zanella
5fededd825 powerpc: Remove duplicate strchrnul and strncasecmp_l libc.a (BZ 31786)
For powerpc64 the generic version provides a weak definition of
strchrnul, which are already provided by the ifunc resolver.  The
powerpc32 version is slight different, where for static case there
is no iFUNC support.

The strncasecmp_l is provided ifunc resolver.

Checked on powerpc-linux-gnu-power4 and powerpc64-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-23 09:36:08 -03:00
Manjunath Matti
a81cdde1cb powerpc64: Fix by using the configure value $libc_cv_cc_submachine [BZ #31629]
This patch ensures that $libc_cv_cc_submachine, which is set from
"--with-cpu", overrides $CFLAGS for configure time tests.

Suggested-by: Peter Bergner <bergner@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-05-16 17:31:45 -05:00
Amrita H S
23f0d81608 powerpc: Optimized strncmp for power10
This patch is based on __strcmp_power10.

Improvements from __strncmp_power9:

    1. Uses new POWER10 instructions
       - This code uses lxvp to decrease contention on load
	 by loading 32 bytes per instruction.

    2. Performance implication
       - This version has around 38% better performance on average.
       - Minor performance regression is seen for few small sizes
	 and specific combination of alignments.

Signed-off-by: Amrita H S <amritahs@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-05-06 09:01:29 -05:00
Florian Weimer
9abdae94c7 login: structs utmp, utmpx, lastlog _TIME_BITS independence (bug 30701)
These structs describe file formats under /var/log, and should not
depend on the definition of _TIME_BITS.  This is achieved by
defining __WORDSIZE_TIME64_COMPAT32 to 1 on 32-bit ports that
support 32-bit time_t values (where __time_t is 32 bits).

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-04-19 14:38:17 +02:00
Florian Weimer
14e56bd4ce powerpc: Fix ld.so address determination for PCREL mode (bug 31640)
This seems to have stopped working with some GCC 14 versions,
which clobber r2.  With other compilers, the kernel-provided
r2 value is still available at this point.

Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-04-14 08:24:51 +02:00
Amrita H S
1ea0511456 powerpc: Placeholder and infrastructure/build support to add Power11 related changes.
The following three changes have been added to provide initial Power11 support.
    1. Add the directories to hold Power11 files.
    2. Add support to select Power11 libraries based on AT_PLATFORM.
    3. Let submachine=power11 be set automatically.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-03-19 21:11:34 -05:00
Adhemerval Zanella
4a76fb1da8 powerpc: Remove power8 strcasestr optimization
Similar to strstr (1e9a550ba4), power8 strcasestr does not show much
improvement compared to the generic implementation.  The geomean
on bench-strcasestr shows:

            __strcasestr_power8  __strcasestr_ppc
  power10                  1159              1120
  power9                   1640              1469
  power8                   1787              1904

The strcasestr uses the same 'trick' as power7 strstr to detect
potential quadradic behavior, which only adds overheads for input
that trigger quadradic behavior and it is really a hack.

Checked on powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-03-12 17:11:01 -03:00
Adhemerval Zanella
1e9a550ba4 powerpc: Remove power7 strstr optimization
The optimization is not faster than the generic algorithm,
using the bench-strstr the geometric mean running on a POWER10 machine
using gcc 13.1.1 is 482.47 while the default __strstr_ppc is 340.97
(which uses the generic implementation).

Also, there is no need to redirect the internal str*/mem* call
to optimized version, internal ifunc is supported and enabled
for internal calls (meaning that the generic implementation
will use any asm optimization if available).

Checked on powerpc64le-linux-gnu.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-02-23 08:50:00 -03:00
Joseph Myers
83d8d289b2 Rename c2x / gnu2x tests to c23 / gnu23
Complete the internal renaming from "C2X" and related names in GCC by
renaming *-c2x and *-gnu2x tests to *-c23 and *-gnu23.

Tested for x86_64, and with build-many-glibcs.py for powerpc64le.
2024-02-01 17:55:57 +00:00
Adhemerval Zanella Netto
ae4b8d6a0e string: Use builtins for ffs and ffsll
It allows to remove a lot of arch-specific implementations.

Checked on x86_64, aarch64, powerpc64.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-02-01 09:31:33 -03:00
Paul Eggert
dff8da6b3e Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
Amrita H S
90bcc8721e powerpc: Fix performance issues of strcmp power10
Current implementation of strcmp for power10 has
performance regression for multiple small sizes
and alignment combination.

Most of these performance issues are fixed by this
patch. The compare loop is unrolled and page crosses
of unrolled loop is handled.

Thanks to Paul E. Murphy for helping in fixing the
performance issues.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
Co-Authored-By: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
2023-12-15 16:42:40 -06:00
MAHESH BODAPATI
b9182c793c powerpc : Add optimized memchr for POWER10
Optimized memchr for POWER10 based on existing rawmemchr and strlen.
Reordering instructions and loop unrolling helped in getting better performance.
Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
2023-12-14 14:40:14 -06:00
Amrita H S
3367d8e180 powerpc: Optimized strcmp for power10
This patch is based on __strcmp_power9 and __strlen_power10.

Improvements from __strcmp_power9:

    1. Uses new POWER10 instructions
       - This code uses lxvp to decrease contention on load
         by loading 32 bytes per instruction.

    2. Performance implication
       - This version has around 30% better performance on average.
       - Performance regression is seen for a specific combination
         of sizes and alignments. Some of them is observed without
         changes also, while rest may be induced by the patch.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
2023-12-07 11:10:40 -06:00
Adhemerval Zanella
55f41ef8de elf: Remove LD_PROFILE for static binaries
The _dl_non_dynamic_init does not parse LD_PROFILE, which does not
enable profile for dlopen objects.  Since dlopen is deprecated for
static objects, it is better to remove the support.

It also allows to trim down libc.a of profile support.

Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-11-21 16:15:42 -03:00
Mahesh Bodapati
21841f0d56 PowerPC: Influence cpu/arch hwcap features via GLIBC_TUNABLES
This patch enables the option to influence hwcaps used by PowerPC.
The environment variable, GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,-zzz....,
can be used to enable CPU/ARCH feature yyy, disable CPU/ARCH feature xxx
and zzz, where the feature name is case-sensitive and has to match the ones
mentioned in the file{sysdeps/powerpc/dl-procinfo.c}.

Note that the hwcap tunables only used in the IFUNC selection.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-08-01 07:41:17 -05:00
Adhemerval Zanella Netto
648c3b574d powerpc: Fix powerpc64 strchrnul build with old gcc
The compiler might not see that internal definition is an alias
due the libc_ifunc macro, which redefines __strchrnul.  With
gcc 6 it fails with:

In file included from <command-line>:0:0:
./../include/libc-symbols.h:472:33: error: ‘__EI___strchrnul’ aliased to
undefined symbol ‘__GI___strchrnul’
   extern thread __typeof (name) __EI_##name \
                                 ^
./../include/libc-symbols.h:468:3: note: in expansion of macro
‘__hidden_ver2’
   __hidden_ver2 (, local, internal, name)
   ^~~~~~~~~~~~~
./../include/libc-symbols.h:476:29: note: in expansion of macro
‘__hidden_ver1’
 #  define hidden_def(name)  __hidden_ver1(__GI_##name, name, name);
                             ^~~~~~~~~~~~~
./../include/libc-symbols.h:557:32: note: in expansion of macro
‘hidden_def’
 # define libc_hidden_def(name) hidden_def (name)
                                ^~~~~~~~~~
../sysdeps/powerpc/powerpc64/multiarch/strchrnul.c:38:1: note: in
expansion of macro ‘libc_hidden_def’
 libc_hidden_def (__strchrnul)
 ^~~~~~~~~~~~~~~

Use libc_ifunc_hidden as stpcpy.  Checked on powerpc64 with
gcc 6 and gcc 13.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-07-26 09:45:22 -03:00
Siddhesh Poyarekar
c6cb8783b5 configure: Use autoconf 2.71
Bump autoconf requirement to 2.71 to allow regenerating configure on
more recent distributions.  autoconf 2.71 has been in Fedora since F36
and is the current version in Debian stable (bookworm).  It appears to
be current in Gentoo as well.

All sysdeps configure and preconfigure scripts have also been
regenerated; all changes are trivial transformations that do not affect
functionality.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-07-17 10:08:10 -04:00