Searched hist:"6 a8ce1ef3940e0cab5ff5f11e1cff5301f83fef6" (Results 1 – 4 of 4) sorted by relevance
/linux/arch/x86/crypto/ |
H A D | crc32c-intel_glue.c | diff 6a8ce1ef3940e0cab5ff5f11e1cff5301f83fef6 Fri Sep 28 00:44:22 CEST 2012 Tim Chen <tim.c.chen@linux.intel.com> crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instruction
This patch adds the crc_pcl function that calculates CRC32C checksum using the PCLMULQDQ instruction on processors that support this feature. This will provide speedup over using CRC32 instruction only. The usage of PCLMULQDQ necessitate the invocation of kernel_fpu_begin and kernel_fpu_end and incur some overhead. So the new crc_pcl function is only invoked for buffer size of 512 bytes or more. Larger sized buffers will expect to see greater speedup. This feature is best used coupled with eager_fpu which reduces the kernel_fpu_begin/end overhead. For buffer size of 1K the speedup is around 1.6x and for buffer size greater than 4K, the speedup is around 3x compared to original implementation in crc32c-intel module. Test was performed on Sandy Bridge based platform with constant frequency set for cpu.
A white paper detailing the algorithm can be found here: http://download.intel.com/design/intarch/papers/323405.pdf
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
H A D | crc32c-pcl-intel-asm_64.S | 6a8ce1ef3940e0cab5ff5f11e1cff5301f83fef6 Fri Sep 28 00:44:22 CEST 2012 Tim Chen <tim.c.chen@linux.intel.com> crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instruction
This patch adds the crc_pcl function that calculates CRC32C checksum using the PCLMULQDQ instruction on processors that support this feature. This will provide speedup over using CRC32 instruction only. The usage of PCLMULQDQ necessitate the invocation of kernel_fpu_begin and kernel_fpu_end and incur some overhead. So the new crc_pcl function is only invoked for buffer size of 512 bytes or more. Larger sized buffers will expect to see greater speedup. This feature is best used coupled with eager_fpu which reduces the kernel_fpu_begin/end overhead. For buffer size of 1K the speedup is around 1.6x and for buffer size greater than 4K, the speedup is around 3x compared to original implementation in crc32c-intel module. Test was performed on Sandy Bridge based platform with constant frequency set for cpu.
A white paper detailing the algorithm can be found here: http://download.intel.com/design/intarch/papers/323405.pdf
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
H A D | Makefile | diff 6a8ce1ef3940e0cab5ff5f11e1cff5301f83fef6 Fri Sep 28 00:44:22 CEST 2012 Tim Chen <tim.c.chen@linux.intel.com> crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instruction
This patch adds the crc_pcl function that calculates CRC32C checksum using the PCLMULQDQ instruction on processors that support this feature. This will provide speedup over using CRC32 instruction only. The usage of PCLMULQDQ necessitate the invocation of kernel_fpu_begin and kernel_fpu_end and incur some overhead. So the new crc_pcl function is only invoked for buffer size of 512 bytes or more. Larger sized buffers will expect to see greater speedup. This feature is best used coupled with eager_fpu which reduces the kernel_fpu_begin/end overhead. For buffer size of 1K the speedup is around 1.6x and for buffer size greater than 4K, the speedup is around 3x compared to original implementation in crc32c-intel module. Test was performed on Sandy Bridge based platform with constant frequency set for cpu.
A white paper detailing the algorithm can be found here: http://download.intel.com/design/intarch/papers/323405.pdf
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
/linux/crypto/ |
H A D | Kconfig | diff 6a8ce1ef3940e0cab5ff5f11e1cff5301f83fef6 Fri Sep 28 00:44:22 CEST 2012 Tim Chen <tim.c.chen@linux.intel.com> crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instruction
This patch adds the crc_pcl function that calculates CRC32C checksum using the PCLMULQDQ instruction on processors that support this feature. This will provide speedup over using CRC32 instruction only. The usage of PCLMULQDQ necessitate the invocation of kernel_fpu_begin and kernel_fpu_end and incur some overhead. So the new crc_pcl function is only invoked for buffer size of 512 bytes or more. Larger sized buffers will expect to see greater speedup. This feature is best used coupled with eager_fpu which reduces the kernel_fpu_begin/end overhead. For buffer size of 1K the speedup is around 1.6x and for buffer size greater than 4K, the speedup is around 3x compared to original implementation in crc32c-intel module. Test was performed on Sandy Bridge based platform with constant frequency set for cpu.
A white paper detailing the algorithm can be found here: http://download.intel.com/design/intarch/papers/323405.pdf
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|