Lines Matching +full:matrix +full:- +full:multiply +full:- +full:accumulate +full:- +full:instruction

6  * This software is provided 'as-is', without any express or implied warranty.
43 * clang (at least 3.9.[0-1]) pessimizes "rm" (y) and "m" (y) in _mm_crc32_u8()
46 * the latter. This costs a register and an instruction but in _mm_crc32_u8()
69 /* CRC-32C (iSCSI) polynomial in reversed bit order. */
73 * Block sizes for three-way parallel crc computation. LONG and SHORT must
90 * Multiply a matrix times a vector over the Galois field of two elements,
111 * Multiply a matrix by itself over GF(2). Both mat and square must have 32
133 uint32_t odd[32]; /* odd-power-of-two zeros operator */ in crc32c_zeros_op()
138 odd[0] = POLY; /* CRC-32C polynomial */ in crc32c_zeros_op()
153 * bits), in even -- next square puts operator for two zero bytes in in crc32c_zeros_op()
165 /* answer ended up in odd -- copy to even */ in crc32c_zeros_op()
172 * for that length, byte-by-byte on the operand.
216 /* Compute CRC-32C using the Intel hardware instruction. */
236 while (len && ((uintptr_t)next & (align - 1)) != 0) { in sse42_crc32c()
239 len--; in sse42_crc32c()
245 * crc instructions, each on LONG bytes -- this is optimized for the in sse42_crc32c()
271 /*- in sse42_crc32c()
273 * loop. 'crc' is used to accumulate crc0 and crc1 in sse42_crc32c()
296 * can run in 24 cycles, so the 3-way blocking is worse in sse42_crc32c()
319 len -= LONG * 3; in sse42_crc32c()
354 len -= SHORT * 3; in sse42_crc32c()
359 end = next + (len - (len & (align - 1))); in sse42_crc32c()
368 len &= (align - 1); in sse42_crc32c()
374 len--; in sse42_crc32c()
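
The hits above show only fragments of the GF(2) matrix helpers and of the CRC loop, so the following is a minimal sketch of how such pieces are commonly written, not the file's actual code. The function names (gf2_matrix_times, gf2_matrix_square, crc32c_bitwise) and the one-column-per-word matrix layout are assumptions made for illustration. The bitwise routine is a plain software reference for CRC-32C, swapped in here for the hardware-instruction path the listing refers to. The value 0x82f63b78 is the standard CRC-32C (Castagnoli) polynomial in reversed bit order.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-32C (iSCSI) polynomial in reversed bit order. */
#define POLY 0x82f63b78u

/*
 * Multiply a 32x32 matrix by a vector over GF(2). Assumed layout: mat[n]
 * holds column n as a 32-bit word, and bit n of vec selects that column.
 * Addition over GF(2) is XOR, so the product is the XOR of the selected
 * columns.
 */
static uint32_t gf2_matrix_times(const uint32_t *mat, uint32_t vec)
{
    uint32_t sum = 0;

    while (vec != 0) {
        if (vec & 1)
            sum ^= *mat;
        vec >>= 1;
        mat++;
    }
    return sum;
}

/*
 * Multiply a matrix by itself over GF(2). Both mat and square must have 32
 * entries; column n of the result is mat applied to column n of mat.
 */
static void gf2_matrix_square(uint32_t *square, const uint32_t *mat)
{
    for (int n = 0; n < 32; n++)
        square[n] = gf2_matrix_times(mat, mat[n]);
}

/*
 * Bit-at-a-time software CRC-32C, for reference only. The register is
 * inverted on entry and exit, matching the usual CRC-32C convention.
 */
static uint32_t crc32c_bitwise(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *next = buf;

    crc = ~crc;
    while (len--) {
        crc ^= *next++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (POLY & (0u - (crc & 1)));
    }
    return ~crc;
}
```

Repeated squaring of a one-zero-bit operator (first column POLY, remaining columns a one-bit shift) gives operators for 2, 4, 8, ... zero bits, which is presumably how the even/odd arrays in crc32c_zeros_op() are used. A software routine like crc32c_bitwise is handy as a cross-check for a hardware implementation: both should return 0xE3069283 for the nine ASCII bytes "123456789" with an initial crc of 0.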