about summary refs log tree commit diff
path: root/src/test/assembly/align_offset.rs
AgeCommit message (Collapse)AuthorLines
2023-01-11Move /src/test to /testsAlbert Larsan-48/+0
2022-07-17Add a special case for align_offset /w stride != 1Simonas Kazlauskas-0/+48
This generalizes the previous `stride == 1` special case to apply to any situation where the requested alignment is divisible by the stride. This in turn allows the test case from #98809 produce ideal assembly, along the lines of: leaq 15(%rdi), %rax andq $-16, %rax This also produces pretty high quality code for situations where the alignment of the input pointer isn’t known: pub unsafe fn ptr_u32(slice: *const u32) -> *const u32 { slice.offset(slice.align_offset(16) as isize) } // => movl %edi, %eax andl $3, %eax leaq 15(%rdi), %rcx andq $-16, %rcx subq %rdi, %rcx shrq $2, %rcx negq %rax sbbq %rax, %rax orq %rcx, %rax leaq (%rdi,%rax,4), %rax Here LLVM is smart enough to replace the `usize::MAX` special case with a branch-less bitwise-OR approach, where the mask is constructed using the neg and sbb instructions. This appears to work across various architectures I’ve tried. This change ends up introducing more branches and code in situations where there is less knowledge of the arguments. For example when the requested alignment is entirely unknown. This use-case was never really a focus of this function, so I’m not particularly worried, especially since llvm-mca is saying that the new code is still appreciably faster, despite all the new branching. Fixes #98809. Sadly, this does not help with #72356.