glibc

lib/glibc

mirror of https://sourceware.org/git/glibc.git synced 2025-09-15 12:01:15 +03:00

Author	SHA1	Message	Date
Noah Goldstein	4af6844aa5	x86: Optimize memrchr-evex.S Optimizations are: 1. Use the fact that lzcnt(0) -> VEC_SIZE for memchr to save a branch in short string case. 2. Save several instructions in len = [VEC_SIZE, 4 * VEC_SIZE] case. 3. Use more code-size efficient instructions. - tzcnt ... -> bsf ... - vpcmpb $0 ... -> vpcmpeq ... Code Size Changes: memrchr-evex.S : -29 bytes Net perf changes: Reported as geometric mean of all improvements / regressions from N=10 runs of the benchtests. Value as New Time / Old Time so < 1.0 is improvement and > 1.0 is regression. memrchr-evex.S : 0.949 (Mostly from improvements in small strings) Full results attached in email. Full check passes on x86-64.	2022-10-19 17:31:03 -07:00
Noah Goldstein	b79f8ff26a	x86: Optimize strnlen-evex.S and implement with VMM headers Optimizations are: 1. Use the fact that bsf(0) leaves the destination unchanged to save a branch in short string case. 2. Restructure code so that small strings are given the hot path. - This is a net-zero on the benchmark suite but in general makes sense as smaller sizes are far more common. 3. Use more code-size efficient instructions. - tzcnt ... -> bsf ... - vpcmpb $0 ... -> vpcmpeq ... 4. Align labels less aggressively, especially if it doesn't save fetch blocks / causes the basic-block to span extra cache-lines. The optimizations (especially for point 2) make the strnlen and strlen code essentially incompatible so split strnlen-evex to a new file. Code Size Changes: strlen-evex.S : -23 bytes strnlen-evex.S : -167 bytes Net perf changes: Reported as geometric mean of all improvements / regressions from N=10 runs of the benchtests. Value as New Time / Old Time so < 1.0 is improvement and > 1.0 is regression. strlen-evex.S : 0.992 (No real change) strnlen-evex.S : 0.947 Full results attached in email. Full check passes on x86-64.	2022-10-19 17:31:03 -07:00
Noah Goldstein	69717709ec	x86: Shrink / minorly optimize strchr-evex and implement with VMM headers Size Optimizations: 1. Condense hot path for better cache-locality. - This has the most impact for strchrnul, where the logic for strings with len <= VEC_SIZE or with a match in the first VEC now fits entirely in the first cache line. 2. Reuse common targets in first 4x VEC and after the loop. 3. Don't align targets so aggressively if it doesn't change the number of fetch blocks it will require and put more care in avoiding the case where targets unnecessarily split cache lines. 4. Align the loop better for DSB/LSD 5. Use more code-size efficient instructions. - tzcnt ... -> bsf ... - vpcmpb $0 ... -> vpcmpeq ... 6. Align labels less aggressively, especially if it doesn't save fetch blocks / causes the basic-block to span extra cache-lines. Code Size Changes: strchr-evex.S : -63 bytes strchrnul-evex.S: -48 bytes Net perf changes: Reported as geometric mean of all improvements / regressions from N=10 runs of the benchtests. Value as New Time / Old Time so < 1.0 is improvement and > 1.0 is regression. strchr-evex.S (Fixed) : 0.971 strchr-evex.S (Rand) : 0.932 strchrnul-evex.S : 0.965 Full results attached in email. Full check passes on x86-64.	2022-10-19 17:31:03 -07:00
Noah Goldstein	330881763e	x86: Optimize memchr-evex.S and implement with VMM headers Optimizations are: 1. Use the fact that tzcnt(0) -> VEC_SIZE for memchr to save a branch in short string case. 2. Restructure code so that small strings are given the hot path. - This is a net-zero on the benchmark suite but in general makes sense as smaller sizes are far more common. 3. Use more code-size efficient instructions. - tzcnt ... -> bsf ... - vpcmpb $0 ... -> vpcmpeq ... 4. Align labels less aggressively, especially if it doesn't save fetch blocks / causes the basic-block to span extra cache-lines. The optimizations (especially for point 2) make the memchr and rawmemchr code essentially incompatible so split rawmemchr-evex to a new file. Code Size Changes: memchr-evex.S : -107 bytes rawmemchr-evex.S : -53 bytes Net perf changes: Reported as geometric mean of all improvements / regressions from N=10 runs of the benchtests. Value as New Time / Old Time so < 1.0 is improvement and > 1.0 is regression. memchr-evex.S : 0.928 rawmemchr-evex.S : 0.986 (Less targets cross cache lines) Full results attached in email. Full check passes on x86-64.	2022-10-19 17:31:03 -07:00
Sunil K Pandey	451c6e5854	x86_64: Implement evex512 version of memchr, rawmemchr and wmemchr This patch implements the following evex512 versions of string functions. The evex512 version takes up to 30% fewer cycles than evex, depending on length and alignment. - memchr function using 512 bit vectors. - rawmemchr function using 512 bit vectors. - wmemchr function using 512 bit vectors. Code size data: memchr-evex.o 762 byte memchr-evex512.o 576 byte (-24%) rawmemchr-evex.o 461 byte rawmemchr-evex512.o 412 byte (-11%) wmemchr-evex.o 794 byte wmemchr-evex512.o 552 byte (-30%) Placeholder function, not used by any processor at the moment. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-10-18 13:26:33 -07:00
Noah Goldstein	be066536bd	x86: Update strlen-evex-base to use new reg/vec macros. To avoid duplicating the VMM / GPR / mask insn macros in all incoming evex512 files, use the macros defined in 'reg-macros.h' and '{vec}-macros.h' This commit does not change libc.so Tested build on x86-64	2022-10-14 21:21:58 -07:00
Noah Goldstein	47f5d51461	x86: Remove now unused vec header macros. This commit does not change libc.so Tested build on x86-64	2022-10-14 21:21:58 -07:00
Noah Goldstein	a6784653f7	x86: Update memset to use new VEC macros Replace %VEC(n) -> %VMM(n) This commit does not change libc.so Tested build on x86-64	2022-10-14 21:21:58 -07:00
Noah Goldstein	4fb7d8a938	x86: Update memmove to use new VEC macros Replace %VEC(n) -> %VMM(n) This commit does not change libc.so Tested build on x86-64	2022-10-14 21:21:58 -07:00
Noah Goldstein	3088a66ff8	x86: Update memrchr to use new VEC macros Replace %VEC(n) -> %VMM(n) This commit does not change libc.so Tested build on x86-64	2022-10-14 21:21:58 -07:00
Noah Goldstein	52ab7604db	x86: Update VEC macros to complete API for evex/evex512 impls 1) Copy so that backport will be easier. 2) Make section only define if there is not a previous definition 3) Add `VEC_lo` definition for proper reg-width but in the ymm/zmm0-15 range. 4) Add macros for accessing GPRs based on VEC_SIZE This is to make it easier to do things like: ``` vpcmpb %VEC(0), %VEC(1), %k0 kmov{d|q} %k0, %{eax|rax} test %{eax|rax} ``` It adds macros such that any GPR can get the proper width with: `V{upcase_GPR_name}` and any mask insn can get the proper width with: `{upcase_mask_insn_without_postfix}` This commit does not change libc.so Tested build on x86-64	2022-10-14 21:21:58 -07:00
Adhemerval Zanella	5355f9ca7b	elf: Remove -fno-tree-loop-distribute-patterns usage on dl-support Besides the option being gcc specific, this approach is still fragile and not future proof since we do not know if this will be the only optimization option gcc will add that transforms loops to memset (or any libcall). This patch adds a new header, dl-symbol-redir-ifunc.h, that can be used to redirect the compiler generated libcalls to port the generic memset implementation if required. Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2022-10-10 10:32:28 -03:00
Adhemerval Zanella Netto	9dc4e29f63	x86: Fix -Os build (BZ #29576) The compiler might transform __stpcpy calls (which are routed to __builtin_stpcpy as an optimization) to strcpy and x86_64 strcpy multiarch implementation does not build any working symbol due to ISA_SHOULD_BUILD not being evaluated for IS_IN(rtld). Checked on x86_64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>	2022-10-05 18:04:13 -03:00
Aurelien Jarno	7e8283170c	x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations The AVX2 strrchr and wcsrchr implementations use the 'blsmsk' instruction, which belongs to the BMI1 CPU feature, and the 'shrx' instruction, which belongs to the BMI2 CPU feature. Fixes: `df7e295d18` ("x86: Optimize {str|wcs}rchr-avx2") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-10-03 23:46:11 +02:00
Aurelien Jarno	3c0c78afab	x86-64: Require BMI2 and LZCNT for AVX2 memrchr implementation The AVX2 memrchr implementation uses the 'shlxl' instruction, which belongs to the BMI2 CPU feature and uses the 'lzcnt' instruction, which belongs to the LZCNT CPU feature. Fixes: `af5306a735` ("x86: Optimize memrchr-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-10-03 23:46:11 +02:00
Aurelien Jarno	e3e7fab7fe	x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations The AVX2 memchr, rawmemchr and wmemchr implementations use the 'bzhi' and 'sarx' instructions, which belong to the BMI2 CPU feature. Fixes: `acfd088a19` ("x86: Optimize memchr-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-10-03 23:46:11 +02:00
Aurelien Jarno	f31a5a884e	x86-64: Require BMI2 for AVX2 wcs(n)cmp implementations The AVX2 wcs(n)cmp implementations use the 'bzhi' instruction, which belongs to the BMI2 CPU feature. NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF if the CPU doesn't support TZCNT, and produces the same result for non-zero input. Partially fixes: `b77b06e0e2` ("x86: Optimize strcmp-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-10-03 23:46:11 +02:00
Aurelien Jarno	fc7de1d9b9	x86-64: Require BMI2 for AVX2 strncmp implementation The AVX2 strncmp implementation uses the 'bzhi' instruction, which belongs to the BMI2 CPU feature. NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF if the CPU doesn't support TZCNT, and produces the same result for non-zero input. Partially fixes: `b77b06e0e2` ("x86: Optimize strcmp-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-10-03 23:46:11 +02:00
Aurelien Jarno	4d64c64457	x86-64: Require BMI2 for AVX2 strcmp implementation The AVX2 strcmp implementation uses the 'bzhi' instruction, which belongs to the BMI2 CPU feature. NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF if the CPU doesn't support TZCNT, and produces the same result for non-zero input. Partially fixes: `b77b06e0e2` ("x86: Optimize strcmp-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-10-03 23:46:11 +02:00
Aurelien Jarno	10f79d3670	x86-64: Require BMI2 for AVX2 str(n)casecmp implementations The AVX2 str(n)casecmp implementations use the 'bzhi' instruction, which belongs to the BMI2 CPU feature. NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF if the CPU doesn't support TZCNT, and produces the same result for non-zero input. Partially fixes: `b77b06e0e2` ("x86: Optimize strcmp-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-10-03 23:46:11 +02:00
Noah Goldstein	b0969fa53a	x86: Fix wcsnlen-avx2 page cross length comparison [BZ #29591] Previous implementation was adjusting length (rsi) to match bytes (eax), but since there is no bound to length this can cause overflow. Fix is to just convert the byte-count (eax) to length by dividing by sizeof (wchar_t) before the comparison. Full check passes on x86-64 and build succeeds w/ and w/o multiarch.	2022-09-28 20:15:16 -07:00
Noah Goldstein	312ded0d63	x86: Fix `#define STRCPY` guard in strcpy-sse2.S `#ifndef STPCPY` is incorrect for checking if `STRCPY` is already defined. It doesn't end up mattering as the whole check is guarded by `#if IS_IN (libc)` but is incorrect nonetheless.	2022-08-09 17:00:03 +08:00
Noah Goldstein	49889fb256	x86: Add support to build st{p|r}{n}{cpy|cat} with explicit ISA level 1. Add default ISA level selection in non-multiarch/rtld implementations. 2. Add ISA level build guards to different implementations. - I.e. strcpy-avx2.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (strcpy-evex.S). 3. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guaranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.	2022-07-16 03:07:59 -07:00
Noah Goldstein	192979ee35	x86: Add support to build wcscpy with explicit ISA level 1. Add ISA level build guards to different implementations. - wcscpy-ssse3.S is used as ISA level 2/3/4. - wcscpy-generic.c is only used at ISA level 1 and will only build if compiled with ISA level == 1. Otherwise there is no reason to include it as we will always use wcscpy-ssse3.S 2. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guaranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.	2022-07-16 03:07:59 -07:00
Noah Goldstein	ceabdcd130	x86: Add support to build strcmp/strlen/strchr with explicit ISA level 1. Add default ISA level selection in non-multiarch/rtld implementations. 2. Add ISA level build guards to different implementations. - I.e. strcmp-avx2.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (strcmp-evex.S). 3. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guaranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.	2022-07-16 03:07:59 -07:00
Noah Goldstein	42b014dd1b	x86: Remove unneeded rtld-wmemcmp wmemcmp isn't used by the dynamic loader so there is no need to add an RTLD stub for it. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.	2022-07-13 14:55:31 -07:00
Noah Goldstein	e19bb87c97	x86: Move wcslen SSE2 implementation to multiarch/wcslen-sse2.S This commit doesn't affect libc.so.6, it's just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.	2022-07-13 14:55:31 -07:00
Noah Goldstein	64479f11b7	x86: Move wcschr SSE2 implementation to multiarch/wcschr-sse2.S This commit doesn't affect libc.so.6, it's just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.	2022-07-13 14:55:31 -07:00
Noah Goldstein	72a48ec0f7	x86: Move strcat SSE2 implementation to multiarch/strcat-sse2.S This commit doesn't affect libc.so.6, it's just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.	2022-07-13 14:55:31 -07:00
Noah Goldstein	cd080d0741	x86: Move strchr SSE2 implementation to multiarch/strchr-sse2.S This commit doesn't affect libc.so.6, it's just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.	2022-07-13 14:55:31 -07:00
Noah Goldstein	425647458b	x86: Move strrchr SSE2 implementation to multiarch/strrchr-sse2.S This commit doesn't affect libc.so.6, it's just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.	2022-07-13 14:55:31 -07:00
Noah Goldstein	08af081ffd	x86: Move memrchr SSE2 implementation to multiarch/memrchr-sse2.S This commit doesn't affect libc.so.6, it's just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.	2022-07-13 14:55:31 -07:00
Noah Goldstein	6b9006bfb0	x86: Move strcpy SSE2 implementation to multiarch/strcpy-sse2.S This commit doesn't affect libc.so.6, it's just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.	2022-07-13 14:55:31 -07:00
Noah Goldstein	58e6cd4bcb	x86: Move strlen SSE2 implementation to multiarch/strlen-sse2.S This commit doesn't affect libc.so.6, it's just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.	2022-07-13 14:55:31 -07:00
Noah Goldstein	60a583ec60	x86: Move strcmp SSE42 implementation to multiarch/strcmp-sse4_2.S This commit doesn't affect libc.so.6, it's just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.	2022-07-13 14:55:31 -07:00
Noah Goldstein	427eaa2c85	x86: Move wcscmp SSE2 implementation to multiarch/wcscmp-sse2.S This commit doesn't affect libc.so.6, it's just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.	2022-07-13 14:55:31 -07:00
Noah Goldstein	d561fbb041	x86: Move strcmp SSE2 implementation to multiarch/strcmp-sse2.S This commit doesn't affect libc.so.6, it's just housekeeping to prepare for adding explicit ISA level support. Because strcmp-sse2.S implements so many functions (more from avx2/evex/sse42) add a new file 'strcmp-naming.h' to assist in getting the correct symbol name for all the functions across multiarch/non-multiarch builds. Tested build on x86_64 and x86_32 with/without multiarch.	2022-07-13 14:55:31 -07:00
Noah Goldstein	30e57e0a21	x86: Rename STRCASECMP_NONASCII macro to STRCASECMP_L_NONASCII The previous macro name can be confusing given that both `__strcasecmp_l_nonascii` and `__strcasecmp_nonascii` are functions and we use the `_l` version.	2022-07-13 14:55:31 -07:00
Noah Goldstein	f2698954ff	x86: Remove __mmask intrinsics in strstr-avx512.c The intrinsics are not available before GCC7 and using standard operators generates code of equivalent or better quality. Removed: _cvtmask64_u64 _kshiftri_mask64 _kand_mask64 Geometric Mean of 5 Runs of Full Benchmark Suite New / Old: 0.958	2022-07-12 15:41:14 -07:00
Noah Goldstein	9c38deec96	x86: Remove generic strncat, strncpy, and stpncpy implementations These functions all have optimized versions: __strncat_sse2_unaligned, __strncpy_sse2_unaligned, and __stpncpy_sse2_unaligned which are faster than their respective generic implementations. Since the sse2 versions can run on baseline x86_64, we should use these as the baseline implementation and can remove the generic implementations. Geometric mean of N=20 runs of the entire benchmark suite on: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz (Tigerlake) __strncat_sse2_unaligned / __strncat_generic: .944 __strncpy_sse2_unaligned / __strncpy_generic: .726 __stpncpy_sse2_unaligned / __stpncpy_generic: .650 Tested build with and without multiarch and full check with multiarch.	2022-07-12 11:44:12 -07:00
H.J. Lu	ec9013727d	x86-64: Remove redundant strcspn-generic/strpbrk-generic/strspn-generic Remove redundant strcspn-generic, strpbrk-generic and strspn-generic from sysdep_routines in sysdeps/x86_64/multiarch/Makefile added by commit `c69f960b01` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Sun Jul 3 21:28:07 2022 -0700 x86: Add support for building str{c|p}{brk|spn} with explicit ISA level since they have been added to sysdep_routines in sysdeps/x86_64/Makefile.	2022-07-08 16:06:04 -07:00
H.J. Lu	eedf7886ed	x86-64: Don't mark symbols as hidden in strcmp-XXX.S Don't mark symbols as hidden in strcmp-avx2.S, strcmp-evex.S and strcmp-sse42.S since they are marked as hidden in the IFUNC selectors.	2022-07-07 16:38:11 -07:00
Noah Goldstein	ae308947ff	x86: Add support for building {w}memcmp{eq} with explicit ISA level 1. Refactor files so that all implementations are in the multiarch directory - Moved the implementation portion of memcmp sse2 from memcmp.S to multiarch/memcmp-sse2.S - The non-multiarch file now only includes one of the implementations in the multiarch directory based on the compiled ISA level (only used for non-multiarch builds. Otherwise we go through the ifunc selector). 2. Add ISA level build guards to different implementations. - I.e. memcmp-avx2-movsb.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (memcmp-evex-movbe.S). 3. Add new multiarch/rtld-{w}memcmp{eq}.S that just include the non-multiarch {w}memcmp{eq}.S which will in turn select the best implementation based on the compiled ISA level. 4. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guaranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.	2022-07-05 16:42:42 -07:00
Noah Goldstein	37ecc657b2	x86: Add support for building {w}memset{_chk} with explicit ISA level 1. Refactor files so that all implementations are in the multiarch directory - Moved the implementation portion of memset sse2 from memset.S to multiarch/memset-sse2.S - The non-multiarch file now only includes one of the implementations in the multiarch directory based on the compiled ISA level (only used for non-multiarch builds. Otherwise we go through the ifunc selector). 2. Add ISA level build guards to different implementations. - I.e. memset-avx2-unaligned-erms.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (memset-evex-unaligned-erms.S). 3. Add new multiarch/rtld-memset.S that just includes the non-multiarch memset.S which will in turn select the best implementation based on the compiled ISA level. 4. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guaranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.	2022-07-05 16:42:42 -07:00
Noah Goldstein	b6a02c3606	x86: Add support for building {w}memmove{_chk} with explicit ISA level 1. Refactor files so that all implementations are in the multiarch directory - Moved the implementation portion of memmove sse2 from memmove.S to multiarch/memmove-sse2.S - The non-multiarch file now only includes one of the implementations in the multiarch directory based on the compiled ISA level (only used for non-multiarch builds. Otherwise we go through the ifunc selector). 2. Add ISA level build guards to different implementations. - I.e. memmove-avx2-unaligned-erms.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (memmove-evex-unaligned-erms.S). 3. Add new multiarch/rtld-memmove.S that just includes the non-multiarch memmove.S which will in turn select the best implementation based on the compiled ISA level. 4. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guaranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch. isa raising memmove	2022-07-05 16:42:42 -07:00
Noah Goldstein	c69f960b01	x86: Add support for building str{c|p}{brk|spn} with explicit ISA level The changes for these functions are different than the others because the best implementation (sse4_2) requires the generic implementation as a fallback to be built as well. Changes are: 1. Add non-multiarch functions for str{c|p}{brk|spn}.c to statically select the best implementation based on the configured ISA build level. 2. Add stubs for str{c|p}{brk|spn}-generic and varshift.c in the sysdeps/x86_64 directory so that the sse4 implementation will have all of its dependencies for the non-multiarch / rtld build when ISA level >= 2. 3. Add new multiarch/rtld-strcspn.c that just includes the non-multiarch strcspn.c which will in turn select the best implementation based on the compiled ISA level. 4. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guaranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.	2022-07-05 16:42:42 -07:00
Noah Goldstein	baeae86fb8	x86: Add comment explaining no Slow_SSE4_2 check in ifunc-sse4_2 Just for clarity's sake and so that if a future implementation is added we remember to add the check.	2022-07-05 16:42:42 -07:00
Noah Goldstein	96ac447d91	x86: Add missing IS_IN (libc) check to strncmp-sse4_2.S The check was missing, so for the multiarch build rtld-strncmp-sse4_2.os was being built and exporting symbols: build/glibc/string/rtld-strncmp-sse4_2.os: 0000000000000000 T __strncmp_sse42 Introduced in: commit `11ffcacb64` Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Jun 21 12:10:50 2017 -0700 x86-64: Implement strcmp family IFUNC selectors in C	2022-06-29 19:47:52 -07:00
Noah Goldstein	0aa294fb88	x86: Add missing IS_IN (libc) check to strcspn-sse4.c The check was missing, so for the multiarch build rtld-strcspn-sse4.os was being built and exporting symbols: build/glibc/string/rtld-strcspn-sse4.os: U ___m128i_shift_right U __strcspn_generic 0000000000000000 T __strcspn_sse42 U strlen build/glibc/string/rtld-varshift.os: 0000000000000000 R ___m128i_shift_right Introduced in: commit `06e51c8f3d` Author: H.J. Lu <hongjiu.lu@intel.com> Date: Fri Jul 3 02:48:56 2009 -0700 Add SSE4.2 support for strcspn, strpbrk, and strspn on x86-64.	2022-06-29 19:47:52 -07:00
Noah Goldstein	8cfbbbcdf9	x86: Add missing IS_IN (libc) check to memmove-ssse3.S The check was missing, so for the multiarch build rtld-memmove-ssse3.os was being built and exporting symbols: >$ nm string/rtld-memmove-ssse3.os U __GI___chk_fail 0000000000000020 T __memcpy_chk_ssse3 0000000000000040 T __memcpy_ssse3 0000000000000020 T __memmove_chk_ssse3 0000000000000040 T __memmove_ssse3 0000000000000000 T __mempcpy_chk_ssse3 0000000000000010 T __mempcpy_ssse3 U __x86_shared_cache_size_half Introduced after 2.35 in: commit `26b2478322` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Thu Apr 14 11:47:40 2022 -0500 x86: Reduce code size of mem{move|pcpy|cpy}-ssse3	2022-06-29 19:47:52 -07:00
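
Note on the "tzcnt(0) -> VEC_SIZE" trick mentioned in `330881763e` (and the lzcnt variant in `4af6844aa5`): the idea, as far as the commit messages describe it, is that a trailing-zero count of an empty match mask already equals the vector width, so one bounds comparison covers both "no match" and "match past the requested length". The following is only a rough C model of that idea, not the glibc assembly. The helper names (`model_tzcnt32`, `match_mask32`, `short_memchr32`) and the fixed 32-byte vector width are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of TZCNT on a 32-bit operand.  Unlike
   BSF, TZCNT is defined for a zero input and returns the operand
   width (32 here).  For non-zero input both give the same index.  */
static unsigned
model_tzcnt32 (uint32_t x)
{
  unsigned n = 0;
  if (x == 0)
    return 32;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Hypothetical stand-in for one 32-byte vector compare: bit i of the
   result is set when buf[i] == c.  Always inspects all 32 bytes, as a
   full-width vector load would.  */
static uint32_t
match_mask32 (const unsigned char buf[32], unsigned char c)
{
  uint32_t mask = 0;
  for (unsigned i = 0; i < 32; i++)
    if (buf[i] == c)
      mask |= (uint32_t) 1 << i;
  return mask;
}

/* Short-input path, assuming len <= 32 and a readable 32-byte buffer.
   A single comparison rejects both an empty mask (pos == 32) and a
   match that lies beyond len.  */
static const unsigned char *
short_memchr32 (const unsigned char buf[32], unsigned char c, size_t len)
{
  unsigned pos = model_tzcnt32 (match_mask32 (buf, c));
  if (pos >= len)
    return NULL;
  return buf + pos;
}
```

With BSF semantics the zero-mask case would need its own branch before the length check, which is presumably the branch the commit message says is saved.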


505 Commits
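
Note on `b0969fa53a` (wcsnlen-avx2, BZ #29591): the commit message says that scaling the length up to a byte count can overflow because the length is unbounded, and that the fix divides the byte count by sizeof (wchar_t) instead. A minimal C sketch of that difference follows. The function names and the `WCHAR_BYTES` constant are hypothetical, and the real code operates on registers in assembly rather than on `size_t` values.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sizeof (wchar_t) on x86-64 Linux; hypothetical constant.  */
#define WCHAR_BYTES 4

/* Overflow-prone form: converts the caller's length (in wide
   characters) to bytes.  For very large maxlen the multiplication
   wraps around, so the bound can collapse to a small value.  */
static int
in_bounds_scaled_up (size_t byte_off, size_t maxlen)
{
  return byte_off < maxlen * WCHAR_BYTES;
}

/* Safer form, in the spirit of the fix: converts the byte offset down
   to a wide-character count, which cannot overflow.  */
static int
in_bounds_scaled_down (size_t byte_off, size_t maxlen)
{
  return byte_off / WCHAR_BYTES < maxlen;
}
```

For example, with `maxlen = SIZE_MAX / 4 + 1` the product wraps to 0, so the first form reports every offset as out of bounds while the second still accepts offset 16.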