The sched_getparam call is also replaced with a INTERNAL_SYSCALL_CALL
to avoid clobbering errno.
Also return EINVAL if the thread is already terminated at the time
of the call.
Checked on x86_64-linux-gnu.
Also return EINVAL if the thread is already terminated at the time
of the call.
The sched_getparam call is also replaced with a INTERNAL_SYSCALL_CALL
to avoid clobbering errno.
Checked on x86_64-linux-gnu.
Also return EINVAL if the thread is already terminated at the time of
the call. This is slight better than returning the calling thread
affinity (current behaviour), since the thread lifetime is defined.
Checked on x86_64-linux-gnu.
The use after free described in BZ#19951 is due the use of two different
PD fields, 'joinid' and 'cancelhandling', to describe the thread state
and to synchronize the calls of pthread_join, pthread_detach,
pthread_exit, and normal thread exit.
Any state change potentially requires to check for both field
atomically to handle partial state (such as pthread_join() with a
cancellation handler to issue a 'joinstate' field rollback).
This patch uses a different PD member with 4 possible states (JOINABLE,
DETACHED, EXITING, and EXITED) instead of pthread 'tid' field, with
the following logic:
1. On pthread_create the inital state is set either to JOINABLE or
DETACHED depending of the pthread attribute used.
2. On pthread_detach, a CAS is issued on the state. If the CAS
fails it means that thread is already detached (DETACHED) or is
being terminated (EXITING). For former an EINVAL is returned,
while for latter pthread_detach should be reponsible to join the
thread (and deallocate any internal resource).
3. In the exit phase of the wrapper function for the thread start
routine (reached either if the thread function has returned,
pthread_exit has being called, or cancellation handled has been
acted upon) we issue a CAS on state to set to EXITING mode. If the
thread is previously on DETACHED mode the thread itself is
responsible for arranging the deallocation of any resource,
otherwise the thread needs to be joined (detached threads cannot
immediately deallocate themselves).
4. The clear_tid_field on 'clone' call is changed to set the new
'state' field on thread exit (EXITED). This state is only
reached at thread termination.
5. The pthread_join implementation is now simpler: the futex wait
is done directly on thread state and there is no need to reset it
in case of timeout since the state is now set either by
pthread_detach() or by the kernel on process termination.
The race condition on pthread_detach is avoided with only one atomic
operation on PD state: once the mode is set to THREAD_STATE_DETACHED
it is up to thread itself to deallocate its memory (done on the exit
phase at pthread_create()).
Also, the INVALID_NOT_TERMINATED_TD_P is removed since a a negative
tid is not possible and the macro is not used anywhere.
This change trigger an invalid C11 thread tests: it crates a thread,
which detaches itself, and after a timeout the creating thread checks
if the join fails. The issue is once thrd_join() is called the thread
lifetime is not defined.
Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
arm-linux-gnueabihf, and powerpc64-linux-gnu.
The changes in commit a93d9e03a3
("Extend struct r_debug to support multiple namespaces [BZ #15971]")
break the dyninst dynamic instrumentation tool. It brings its
own definition of _r_debug (rather than a declaration).
Furthermore, it turns out it is rather hard to use the proposed
handshake for accessing _r_debug via DT_DEBUG. If applications want
to access _r_debug, they can do so directly if the relevant code has
been built as PIC. To protect against harm from accidental copy
relocations due to linker relaxations, this commit restores copy
relocation support by adjusting both copies if interposition or
copy relocations are in play. Therefore, it is possible to
use a hidden reference in ld.so to access _r_debug.
Only perform the copy relocation initialization if libc has been
loaded. Otherwise, the ld.so search scope can be empty, and the
lookup of the _r_debug symbol mail fail.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
It combines updating r_state with the debugger notification.
The second change to _dl_open introduces an additional debugger
notification for dlmopen, but debuggers are expected to ignore it.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
It replaces the ns_debug member of the namespaces. Previously,
the base namespace had an unused ns_debug member.
This change also fixes a concurrency issue: Now _dl_debug_initialize
only updates r_next of the previous namespace's r_debug after the new
r_debug is initialized, so that only the initialized version is
observed. (Client code accessing _r_debug will benefit from load
dependency tracking in CPUs even without explicit barriers.)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The iovec size should account for all substrings between each conversion
specification. For the format:
"abc %s efg"
The list of substrings are:
["abc ", arg, " efg]
which is 2 times the number of maximum arguments *plus* one.
This issue triggered 'out of bounds' errors by stdlib/tst-bz20544 when
glibc is built with experimental UBSAN support [1].
Besides adjusting the iovec size, a new runtime and check is added to
avoid wrong __libc_message_impl usage.
Checked on x86_64-linux-gnu.
[1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/ubsan-undef
Co-authored-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
During early startup memcpy or memset must not be called since many targets
use ifuncs for them which won't be initialized yet. Security hardening may
use -ftrivial-auto-var-init=zero which inserts calls to memset. Redirect
memset to memset_generic by including dl-symbol-redir-ifunc.h in cpu-features.c.
This fixes BZ #33112.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The generic implementation is slight more optimized than the powerpc
one, where it has a more optimized inf/nan check (by not using FP
unit checks, along with branch prediction hints), and removed one
branch by issuing trunc instead of a combination of floor/ceil (which
also generated less code).
On power10 with gcc 14.2.1:
reciprocal-throughput master patch difference
workload-0_1 1.1351 0.9067 20.12%
workload-1_maxint 1.4230 0.9040 36.47%
workload-maxint_maxfloat 1.5038 0.9076 39.65%
workload-integral 1.1280 0.9111 19.23%
latency master patch difference
workload-0_1 1.1440 2.7117 -137.03%
workload-1_maxint 4.0556 2.7070 33.25%
workload-maxint_maxfloat 3.2122 2.7164 15.43%
workload-integral 3.2381 2.7281 15.75%
Checked on powerpc64le-linux-gnu.
Reviewed-by: Sachin Monga <smonga@linux.ibm.com>
The generic implementation is slight more optimized than the powerpc
one, where it has a more optimized inf/nan check (by not using FP
unit checks, along with branch prediction hints), and removed one
branch by issuing trunc instead of a combination of floor/ceil (which
also generated less code).
On power10 with gcc 14.2.1:
reciprocal-throughput master patch difference
workload-0_1 1.5210 1.3942 8.34%
workload-1_maxint 2.0926 1.3940 33.38%
workload-maxint_maxfloat 1.7851 1.3940 21.91%
workload-integral 1.5216 1.3941 8.37%
latency master patch difference
workload-0_1 1.5928 2.6337 -65.35%
workload-1_maxint 3.2929 2.6337 20.02%
workload-maxint_maxfloat 1.9697 2.6341 -33.73%
workload-integral 2.0597 2.6337 -27.87%
Checked on powerpc64le-linux-gnu.
Reviewed-by: Sachin Monga <smonga@linux.ibm.com>
Make '__close_nocancel_nostatus' standalone. This is a generic version
analogous to '__close_nocancel'. Platforms may choose to implement an
inline variant instead where the syscall invocation code sequence is
short enough to be beneficial over a function call.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Fix fallout from commit c181840c93 ("Consolidate non cancellable close
call") that caused '__close_nocancel_nostatus' to clobber 'errno' on a
close(2) failure, a 2.27 regression.
The problem came from a rewrite from 'close_not_cancel_no_status' to
'__close_nocancel_nostatus' switching from an inline implementation that
used INTERNAL_SYSCALL macro (which stays away from 'errno') to a call to
'__close_nocancel' function that uses INLINE_SYSCALL_CALL macro (which
does poke at 'errno').
Implement '__close_nocancel_nostatus' in terms of INTERNAL_SYSCALL_CALL
then, which leaves 'errno' intact.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
When building on GNU/Hurd the following warnings repeat themselves:
../Rules:400: target '/home/collin/obj/glibc/io/test-lfs.out' given more than once in the same rule
../Rules:400: target '/home/collin/obj/glibc/io/test-lfs.out' given more than once in the same rule
This is because commit 73b854e955 (hurd: Mark more memory-hungry tests
as unsupported, 2025-01-12) added it to 'tests-unsupported' even though
it was already added by decf02d382 (hurd: Mark two tests as unsupported,
2023-04-13).
Message-ID: <54dc6bf7e0dbedb1b19356f41fec843c1c523b11.1750130025.git.collin.funk1@gmail.com>
When building on GNU/Hurd warnings like the following occur:
../sysdeps/x86_64/multiarch/strnlen-evex-base.S:53:10: warning: "P2ALIGN" redefined
53 | # define P2ALIGN(...) .p2align 4,, 6
| ^~~~~~~
In file included from /usr/include/x86_64-gnu/mach/x86_64/syscall_sw.h:30,
from ../sysdeps/mach/sysdep.h:21,
from ../sysdeps/mach/x86/sysdep.h:31,
from ../sysdeps/x86_64/multiarch/strnlen-evex-base.S:24:
/usr/include/x86_64-gnu/mach/x86_64/asm.h:78:9: note: this is the location of the previous definition
78 | #define P2ALIGN(p2) .p2align p2 /* gas-specific */
| ^~~~~~~
The fix is to undefine the macro from system headers in sysdep.h so that
it can be properly defined in assembly files where its definition
depends on whether string functions are being compiled for
wide-characters or not.
Message-ID: <721cd3a1bae1a553857db1dd69761a175f611364.1750131904.git.collin.funk1@gmail.com>
Update tst-gnu2-tls2 tests to set XMM0...XMM7 to all 1s in malloc to
verify that XMM registers are preserved when _dl_tlsdesc_dynamic is
called by clearing vectors with zeroed XMM registers before
_dl_tlsdesc_dynamic and using these XMM registers to clear vectors
after _dl_tlsdesc_dynamic. This improves the BZ #31372 test.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
Compiler generates the following instruction sequence for dynamic TLS
access:
leal tls_var@tlsgd(,%ebx,1), %eax
call ___tls_get_addr@PLT
CALL instruction is transparent to compiler which assumes all registers,
except for EFLAGS, AX, CX, and DX, are unchanged after CALL. But
___tls_get_addr is a normal function which doesn't preserve any vector
registers.
1. Rename the generic __tls_get_addr function to ___tls_get_addr_internal.
2. Change ___tls_get_addr to a wrapper function with implementations for
FNSAVE, FXSAVE, XSAVE and XSAVEC to save and restore all vector registers.
3. dl-tlsdesc-dynamic.h has:
_dl_tlsdesc_dynamic:
/* Like all TLS resolvers, preserve call-clobbered registers.
We need two scratch regs anyway. */
subl $32, %esp
cfi_adjust_cfa_offset (32)
It is wrong to use
movl %ebx, -28(%esp)
movl %esp, %ebx
cfi_def_cfa_register(%ebx)
...
mov %ebx, %esp
cfi_def_cfa_register(%esp)
movl -28(%esp), %ebx
to preserve EBX on stack. Fix it with:
movl %ebx, 28(%esp)
movl %esp, %ebx
cfi_def_cfa_register(%ebx)
...
mov %ebx, %esp
cfi_def_cfa_register(%esp)
movl 28(%esp), %ebx
4. Update _dl_tlsdesc_dynamic to call ___tls_get_addr_internal directly.
5. Add have-test-mtls-traditional to compile tst-tls23-mod.c with
traditional TLS variant to verify the fix.
6. Define DL_RUNTIME_RESOLVE_REALIGN_STACK in sysdeps/x86/sysdep.h.
This fixes BZ #32996.
Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Refactor the generic implementation to use math_config.h definitions,
and add an alternative one if the ABI supports truncf instructions
(gated through math-use-builtins-trunc.h).
The generic implementation generates similar code on x86_64, while
the optimization one for aarch64 (where truncf is supported as a
builtin by through frintz), the improvements are:
reciprocal-throughput master patch difference
workload-0_1 3.0595 3.0698 -0.34%
workload-1_maxint 5.1747 3.0542 40.98%
workload-maxint_maxfloat 3.4391 3.0349 11.75%
workload-integral 3.2732 3.0293 7.45%
latency master patch difference
workload-0_1 3.5267 4.7107 -33.57%
workload-1_maxint 6.9074 4.7282 31.55%
workload-maxint_maxfloat 3.7210 4.7506 -27.67%
workload-integral 3.8634 4.8137 -24.60%
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Refactor the generic implementation to use math_config.h definitions,
and add an alternative one if the ABI supports truncf instructions
(gated through math-use-builtins-trunc.h).
The generic implementation generates similar code for x86_64, while
the optimization path aarch64 (where truncf is supported as a builtin)
through frintz), the improvements are:
reciprocal-throughput master patch difference
workload-0_1 3.0740 3.0326 1.35%
workload-1_maxint 5.2231 3.0436 41.73%
workload-maxint_maxfloat 4.0962 3.0551 25.42%
workload-integral 3.7093 3.0612 17.47%
latency master patch difference
workload-0_1 3.5521 4.7313 -33.20%
workload-1_maxint 6.7148 4.7314 29.54%
workload-maxint_maxfloat 4.0458 4.7518 -17.45%
workload-integral 3.9719 4.7427 -19.40%
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Improve codegen by packing coefficients.
4% and 2% improvement in throughput microbenchmark on Neoverse V1, for acosh
and atanh respectively.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reworke SVE FP64 hyperbolics to use the SVE FEXPA
instruction.
Also update the special case handelling for large
inputs to be entirely vectorised.
Performance improvements on Neoverse V1:
cosh_sve: 19% for |x| < 709, 5x otherwise
sinh_sve: 24% for |x| < 709, 5.9x otherwise
tanh_sve: 12% for |x| < 19, 9x otherwise
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Improve performance of SVE exps by making better use
of the SVE FEXPA instruction.
Performance improvement on Neoverse V1:
exp2_sve: 21%
exp2f_sve: 24%
exp10f_sve: 23%
expm1_sve: 25%
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Commit 404526ee2e changed _start to write
the last argument to __libc_start_main without taking into consideration
that the function did not create a full stack frame, which leads to
overwriting the argv[0].
There is no functional change in this patch.
We remove stores and loads to stack, return address signing, and redundant
CFI directives before and after call to __libc_arm_za_disable().
The __libc_arm_za_disable implementation follows special calling convention
that allows to avoid most of the operations that would be necessary for a
call to a normal function (see [1] for details).
First, we rely on __libc_arm_za_disable() not clobbering certain registers,
and we put return address into one of these registers. Now we don't need
to store it on stack, so we don't need to sign return address using PAC.
Second, as a result of the above, we don't need to update the CFI offset.
This patch provides small optimisation avoiding unnecessary store and load
on stack also simplifies assembly code and CFI directives.
[1]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
No functional change here, just a small refactoring to simplify
using __alloc_gcs() for allocating shadow stacks.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch adds the TCPI_OPT_USEC_TS constant from Linux 6.14 to
sysdeps/gnu/netinet/tcp.h
This patch adds the TCPI_OPT_TFO_CHILD constant from Linux 6.15 to
sysdeps/gnu/netinet/tcp.h
Signed-off-by: Jeremy Harris <jgh@exim.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Test that runs through a fairly large combination of the various
termios speed functions, for the new speed_t interface, for the old
speed_t interface (if enabled), and for the new baud_t interface.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
The generic code has __ispeed and __ospeed; Linux has c_ispeed and
c_ospeed. Use an anonymous union member to allow both set of names on
all platforms.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Add an explicitly numeric interface for baudrate setting. For glibc,
this only announces what is a fair accompli, but this is a plausible
way forward for standardization, and may be possible to infill on
non-compliant systems. The POSIX committee has stated:
[https://www.austingroupbugs.net/view.php?id=1916#c7135]
A future version of this standard is expected to add at least
the following symbolic constants for use as values of objects
of type speed_t: B57600, B115200, B230400, B460800, and
B921600.
Implementations are encouraged to propose additional
interfaces which will make it possible to set and query a
wider range of speeds than just those enumerated by the
constants beginning with B. If a set of common interfaces
emerges between several implementations, a future version of
this standard will likely add those interfaces.
This is exactly that interface.
The use of the term "baud" is due to the need to have a term
contrasting "speed", and it is already well established as a legacy
term -- including in the names of the legacy Bxxx
constants. Futhermore, it *is* valid from the point of view that the
termios interface fundamentally emulates an RS-232 serial port as far
as the application software is concerned.
The documentation states that for the current version of glibc,
speed_t == baud_t, but explicitly declares that this may not be the
case in the future.
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Now all platforms unconditionally use the "sane" definitions of the
termios baud constants. Unify them into a common file.
Note: I have made them explicitly unsigned to avoid problems with
compiler warnings for comparisons of unequal signedness or
similar. These constants were historically octal on most platforms,
and so unsigned by default.
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Hurd with USE_OLD_TTY was the only remaining platform with speed_t not
containing a proper baud rate. From the looks of it, that code has
long since bitrotted.
Remove the vestiges of USE_OLD_TTY.
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Linux has supported arbitrary speeds and split speeds in the kernel
since 2008 on all platforms except Alpha (fixed in 2020), but glibc
was never updated to match. This is further complicated by POSIX uses
of macros for the cf[gs]et[io]speed interfaces, rather than plain
numbers, as it really ought to have.
On most platforms, the glibc ABI includes the c_[io]speed fields in
struct termios, but they are incorrectly used. On MIPS and SPARC, they
are entirely missing.
For backwards compatibility, the kernel will still use the legacy
speed fields unless they are set to BOTHER, and will use the legacy
output speed as the input speed if the latter is 0 (== B0). However,
the specific encoding used is visible to user space applications,
including ones other than the one running.
- SPARC and MIPS get a new struct termios, and tc[gs]etattr() is
versioned accordingly. However, the new struct termios is set to be
a strict extension of the old one, which means that cf* interfaces
other than the speed-related ones do not need versioning.
- The Bxxx constants are redefined as equivalent to their integer
values and the legacy Bxxx constants are renamed __Bxxx.
- cf[gs]et[io]speed() and cfsetspeed() are versioned accordingly.
- tcgetattr() and cfset[io]speed() are adjusted to always keep the
c_[io]speed fields correct (unlike earlier versions), but to
canonicalize the representation to ALSO configure the legacy fields
if a valid legacy representation exists.
- tcsetattr(), too, canonicalizes the representation in this way
before passing it to the kernel, to maximize compatibility with
older applications/tools.
- The old IBAUD0 hack is removed; it is no longer necessary since
even the legacy c_cflag baud rate fields have had separate input
values for a long time.
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The powerpc architecture, only, emulates the termios ioctls using the
glibc termios structure. Export the real kernel ones as the termios2
interface; although the kernel doesn't call it termios2, it is exactly
the termios2 interface, and it avoids the namespace clash between the
emulated ioctls and the real kernel ioctls.
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
In the kernel, these are <linux/sockios.h>. The differences between
<linux/sockios.h> and the copied data in <bits/ioctls.h> are minor;
mainly some #ifdefs, so try to use <linux/sockios.h> directly; it is
hopefully clean enough these days to use directly.
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Replace local_isatty() inlined in libio with a proper function
__isatty_nostatus(). This allows simpler system-specific
implementations that don't need to touch errno at all.
Note: I left the prototype in include/unistd.h (the internal header
file.) It didn't much make sense to me to put it in a different header
(not-cancel.h), but perhaps someone can elucidate the need.
Add such an implementation for Linux, with a generic fallback.
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>