I need to add two ulong values.
The first half (low 32 bits) of each number is stored in lanes with even id, and the second half (high 32 bits) is stored in the lane next to it (next odd id).
So one lane adds the low 32 bits, and one lane adds the high 32 bits.
The problem is that if the low 32-bit sum produces a carry, I need to add that carry to the high 32-bit sum.
One way may be like this (for a += b):
v_add_co_u32 %[a], vcc, %[b], %[a]
v_addc_co_u32 %[carry], vcc, 0, 0, vcc
v_add_co_u32_dpp %[a], vcc, %[carry], %[a] quad_perm[1, 0, 3, 2]
The second line is needed because I have to turn the carry flag into a number so that I can share it across lanes using the dpp instruction in line 3.
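To make the intended result concrete, here is a rough scalar model in plain C of what one even/odd lane pair should end up computing. It is only a sketch of the arithmetic, not GPU code: the function name add_pair and its parameters are made up for illustration, with a_lo/b_lo standing for the values in the even lane and a_hi/b_hi for the values in the odd lane.

```c
#include <stdint.h>

/* Hypothetical scalar model of a += b for one even/odd lane pair.
 * a_lo, b_lo: low 32 bits (held by the even lane).
 * a_hi, b_hi: high 32 bits (held by the odd lane).
 */
static void add_pair(uint32_t *a_lo, uint32_t *a_hi,
                     uint32_t b_lo, uint32_t b_hi)
{
    /* Even lane: add the low halves; unsigned addition wraps mod 2^32. */
    uint32_t lo = *a_lo + b_lo;

    /* The sum wrapped exactly when it is smaller than an operand.
     * This is the carry flag turned into a number (0 or 1). */
    uint32_t carry = (lo < b_lo) ? 1u : 0u;

    /* Odd lane: add the high halves, then the carry taken from the
     * even neighbour. Only the odd lane should receive a carry. */
    uint32_t hi = *a_hi + b_hi + carry;

    *a_lo = lo;
    *a_hi = hi;
}
```

For example, with a = 0x00000001FFFFFFFF and b = 1, the low half wraps to 0 and the high half becomes 2. Note that in this model the carry only ever flows from the low half to the high half, never the other way.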
Is there a better way to do this? Thanks in advance.