pclmulqdq intrinsics don't inline well across target_feature changes anymore #139029
Open
Labels: A-LLVM, A-target-feature, C-bug, E-help-wanted, E-medium, E-needs-test, I-slow, O-x86_64, S-has-bisection, T-compiler
I'd noticed the `pclmulqdq` intrinsics in `crc32fast` were notable in a `perf report` of a benchmark last night. Somewhat shockingly, there were functions whose body was `pclmulqdq xmm0, xmm1, 17; ret` and `pclmulqdq xmm0, xmm1, 0; ret`, complete with constraining callers' choice of xmm registers! After a bit of digging, it seems to be a regression in nightly.

The specific regression I'd started at can be reproduced with `cargo bench` in https://github.com/srijs/rust-crc32fast. `cargo +1.85.1 bench` produces:

whereas `cargo +nightly bench` produces:

After looking at perf a bit, I believe this is representative: https://rust.godbolt.org/z/8dxcE4vo1. I'm including everything from there in this issue as well.
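For reference, the immediate in `pclmulqdq xmm0, xmm1, 17` selects which 64-bit halves are multiplied: bit 0 picks the low or high quadword of the first operand and bit 4 picks the quadword of the second, so 17 (0x11) multiplies the two high halves and 0 multiplies the two low halves. The pure-Rust model below is a hedged sketch of that selection plus a bit-by-bit carry-less multiply; `clmul_soft` and `pclmulqdq_model` are hypothetical names, and the model is only meant for checking values by hand, not as a stand-in for the instruction.

```rust
/// Carry-less (GF(2) polynomial) multiply of two 64-bit values: schoolbook
/// multiplication where partial products are combined with XOR, not addition.
fn clmul_soft(a: u64, b: u64) -> u128 {
    let mut acc = 0u128;
    for i in 0..64 {
        if (b >> i) & 1 == 1 {
            // "Add" (XOR) a copy of `a` shifted to bit position i.
            acc ^= (a as u128) << i;
        }
    }
    acc
}

/// Software model of the quadword selection done by `pclmulqdq`
/// (`_mm_clmulepi64_si128` in `std::arch`). Each 128-bit operand is given
/// as `[low_qword, high_qword]`.
fn pclmulqdq_model(a: [u64; 2], b: [u64; 2], imm8: u8) -> u128 {
    let a_sel = a[(imm8 & 0x01) as usize]; // imm8 bit 0: which half of the first operand
    let b_sel = b[((imm8 >> 4) & 0x01) as usize]; // imm8 bit 4: which half of the second operand
    clmul_soft(a_sel, b_sel)
}
```

For example, `pclmulqdq_model([0, 3], [0, 3], 17)` multiplies the two high halves and returns 5 rather than 9, because the carries an ordinary integer multiply would produce are discarded.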
Code
I tried this code:
I expected to see this happen (with `-C opt-level=3`):

Instead, this happened (also `-C opt-level=3`):

Version it worked on
1.85.1, 1.31.0, and a half dozen in between.
Additionally, beta (rust version 1.86.0-beta.7 (7824ede 2025-03-22)) seems good.
Nightly with `-C opt-level=3 -C target-feature=+pclmul` still does great.

Version with regression
In the above godbolt link, I see that passing `--version` in the `rustc nightly` tab provides `rustc 1.87.0-nightly (a2e63569f 2025-03-26)`. This is consistent with how I first saw this locally, `rustc +nightly --version --verbose`:

Related improvement along the way
Adding the same `target_feature` attribute to the inner function makes nightly produce somewhat better-than-baseline code: https://rust.godbolt.org/z/sGrYedeaP

With `rustc +nightly -C opt-level=3`, this yields:

whereas before, the codegen was identical regardless of the `target_feature` attribute on the inner function. So at least in some cases there is a modest improvement?

@rustbot modify labels: +regression-from-stable-to-nightly -regression-untriaged
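The change described under "Related improvement along the way", repeating the caller's `target_feature` on the inner function, might look roughly like the following. This is a hedged sketch modeled on the shape the issue describes, not the code behind either godbolt link: `lo_hi_xor_plain`, `lo_hi_xor`, `fold_pclmul`, `fold`, and `clmul_soft` are hypothetical names, and the runtime check plus software fallback exist only so the sketch runs on any CPU. Whether a given toolchain actually inlines either helper still has to be confirmed by reading the generated assembly.

```rust
// Portable fallback so the sketch also runs where pclmulqdq is unavailable.
fn clmul_soft(a: u64, b: u64) -> u128 {
    let mut acc = 0u128;
    for i in 0..64 {
        if (b >> i) & 1 == 1 {
            acc ^= (a as u128) << i;
        }
    }
    acc
}

#[cfg(target_arch = "x86_64")]
mod x86 {
    use std::arch::x86_64::{__m128i, _mm_clmulepi64_si128, _mm_set_epi64x, _mm_xor_si128};

    // Hypothetical helper WITHOUT #[target_feature]: the shape the issue title
    // describes. It only gets pclmulqdq codegen by being inlined into a caller
    // that has the feature; the issue reports that on the affected nightly the
    // intrinsics could instead end up as out-of-line `pclmulqdq ...; ret` bodies.
    #[allow(dead_code)]
    #[inline]
    unsafe fn lo_hi_xor_plain(a: __m128i, k: __m128i) -> __m128i {
        unsafe {
            _mm_xor_si128(
                _mm_clmulepi64_si128::<0x00>(a, k), // a.lo * k.lo
                _mm_clmulepi64_si128::<0x11>(a, k), // a.hi * k.hi
            )
        }
    }

    // The same helper annotated with the caller's feature: the change the issue
    // reports as producing somewhat better code on nightly.
    #[inline]
    #[target_feature(enable = "pclmulqdq")]
    unsafe fn lo_hi_xor(a: __m128i, k: __m128i) -> __m128i {
        unsafe {
            _mm_xor_si128(
                _mm_clmulepi64_si128::<0x00>(a, k),
                _mm_clmulepi64_si128::<0x11>(a, k),
            )
        }
    }

    #[target_feature(enable = "pclmulqdq")]
    pub unsafe fn fold_pclmul(a: [u64; 2], k: [u64; 2]) -> u128 {
        unsafe {
            // _mm_set_epi64x takes (high, low).
            let va = _mm_set_epi64x(a[1] as i64, a[0] as i64);
            let vk = _mm_set_epi64x(k[1] as i64, k[0] as i64);
            // x86_64 is little-endian, so lane 0 becomes the low 64 bits of the u128.
            std::mem::transmute::<__m128i, u128>(lo_hi_xor(va, vk))
        }
    }
}

/// (a.lo clmul k.lo) XOR (a.hi clmul k.hi), using pclmulqdq when the CPU has it.
pub fn fold(a: [u64; 2], k: [u64; 2]) -> u128 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("pclmulqdq") {
            // SAFETY: pclmulqdq support was checked at runtime just above.
            return unsafe { x86::fold_pclmul(a, k) };
        }
    }
    clmul_soft(a[0], k[0]) ^ clmul_soft(a[1], k[1])
}
```

The helpers are written as `unsafe fn` rather than safe `#[target_feature]` functions so the sketch also compiles on 1.85-era toolchains, which only accept `#[target_feature]` on `unsafe fn`.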
LLVM upstream issue: llvm/llvm-project#142321