
Commit e821464

mayeut authored and dcommander committed
ARM64 NEON SIMD impl. of prog. Huffman encoding
This commit adds ARM64 NEON optimizations for the encode_mcu_AC_first() and encode_mcu_AC_refine() functions used in progressive Huffman encoding.

Compression speedups for the typical set of five libjpeg-turbo test images (https://libjpeg-turbo.org/About/Performance):
Cortex-A53: 23.8-39.2% (avg. 32.2%)
Cortex-A72: 26.8-41.1% (avg. 33.5%)
Apple A7: 29.7-45.9% (avg. 39.6%)

Closes #229
1 parent b8a7680 commit e821464
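
For readers unfamiliar with the preparation step that the NEON code accelerates, the following is a minimal scalar sketch of what an "AC first" preparation pass might compute, inferred from the parameter names visible in this commit (`block`, `jpeg_natural_order_start`, `Sl`, `Al`, `values`, `zerobits`). It is not libjpeg-turbo's code. The function name `ac_first_prepare_sketch`, the `BLOCK_SIZE` constant, the zero-fill of `values`, and the convention that a set mask bit means "nonzero after the shift" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64  /* coefficients in one 8x8 DCT block (assumed) */

/*
 * Hypothetical scalar reference, for illustration only.
 *
 * order[k] is the index within block of the k-th coefficient to visit.
 * values[0..63] receives abs(coef) >> Al.
 * values[64..127] receives the bits to emit: the shifted absolute value for
 * a positive coefficient, its bitwise complement for a negative one.
 * The return value is a 64-bit mask in which bit k is set when entry k is
 * still nonzero after the point transform (the shift by Al).
 */
uint64_t ac_first_prepare_sketch(const int16_t *block, const int *order,
                                 int Sl, int Al, int16_t *values)
{
  uint64_t mask = 0;
  int k;

  assert(Sl >= 0 && Sl <= BLOCK_SIZE);
  /* Zero-filling the output is a choice made for this sketch. */
  memset(values, 0, 2 * BLOCK_SIZE * sizeof(int16_t));

  for (k = 0; k < Sl; k++) {
    int coef = block[order[k]];
    /* Shift the absolute value so that the division rounds toward zero. */
    int absval = (coef < 0 ? -coef : coef) >> Al;

    if (absval == 0)
      continue;  /* zero, or a coefficient that became zero after the shift */

    values[k] = (int16_t)absval;
    values[k + BLOCK_SIZE] = (int16_t)(coef < 0 ? ~absval : absval);
    mask |= (uint64_t)1 << k;
  }
  return mask;
}
```

Packing one bit per coefficient into a single 64-bit word is a plausible reason for the `SIZEOF_SIZE_T != 8` guard added in simd/arm64/jsimd.c: with a 64-bit `size_t`, all 64 positions of a block fit in one `size_t`.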

4 files changed

Lines changed: 675 additions & 1 deletion

ChangeLog.md

Lines changed: 5 additions & 0 deletions
@@ -92,6 +92,11 @@ segfault or other user-visible errant behavior, and given that the lossless
 transformer (unlike the decompressor) is not generally exposed to arbitrary
 data exploits, this issue did not likely pose a security risk.
 
+12. Added SIMD acceleration for progressive Huffman encoding on ARM 64-bit
+(ARMv8) platforms. This speeds up the compression of full-color progressive
+JPEGs by about 30-40% on average (relative to libjpeg-turbo 2.0.x) when using
+modern ARMv8 CPUs.
+
 
 2.0.3
 =====

simd/arm64/jsimd.c

Lines changed: 30 additions & 1 deletion
@@ -22,6 +22,7 @@
 #include "../../jdct.h"
 #include "../../jsimddct.h"
 #include "../jsimd.h"
+#include "jconfigint.h"
 
 #include <stdio.h>
 #include <string.h>
@@ -773,6 +774,18 @@ jsimd_huff_encode_one_block(void *state, JOCTET *buffer, JCOEFPTR block,
 GLOBAL(int)
 jsimd_can_encode_mcu_AC_first_prepare(void)
 {
+  init_simd();
+
+  if (DCTSIZE != 8)
+    return 0;
+  if (sizeof(JCOEF) != 2)
+    return 0;
+  if (SIZEOF_SIZE_T != 8)
+    return 0;
+
+  if (simd_support & JSIMD_NEON)
+    return 1;
+
   return 0;
 }
 
@@ -781,11 +794,25 @@ jsimd_encode_mcu_AC_first_prepare(const JCOEF *block,
                                   const int *jpeg_natural_order_start, int Sl,
                                   int Al, JCOEF *values, size_t *zerobits)
 {
+  jsimd_encode_mcu_AC_first_prepare_neon(block, jpeg_natural_order_start,
+                                         Sl, Al, values, zerobits);
 }
 
 GLOBAL(int)
 jsimd_can_encode_mcu_AC_refine_prepare(void)
 {
+  init_simd();
+
+  if (DCTSIZE != 8)
+    return 0;
+  if (sizeof(JCOEF) != 2)
+    return 0;
+  if (SIZEOF_SIZE_T != 8)
+    return 0;
+
+  if (simd_support & JSIMD_NEON)
+    return 1;
+
   return 0;
 }
 
@@ -794,5 +821,7 @@ jsimd_encode_mcu_AC_refine_prepare(const JCOEF *block,
                                    const int *jpeg_natural_order_start, int Sl,
                                    int Al, JCOEF *absvalues, size_t *bits)
 {
-  return 0;
+  return jsimd_encode_mcu_AC_refine_prepare_neon(block,
+                                                 jpeg_natural_order_start,
+                                                 Sl, Al, absvalues, bits);
 }
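
The diff pairs each `jsimd_can_*()` query with a `jsimd_*()` wrapper, which suggests that a caller asks once whether the accelerated path is usable and then dispatches through a function pointer. The calling code is not part of this commit, so the following is only a sketch of that general pattern under stated assumptions. Every name in it (`count_fn`, `count_nonzero_scalar`, `count_nonzero_fast`, `cpu_has_fast_path`, `can_count_nonzero_fast`, `select_count_fn`) is hypothetical, and the "fast" routine is a plain stand-in rather than real SIMD code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature shared by the portable and accelerated routines. */
typedef int (*count_fn)(const short *block, int n);

/* Portable fallback: counts the nonzero coefficients in block[0..n-1]. */
int count_nonzero_scalar(const short *block, int n)
{
  int i, count = 0;

  assert(n >= 0);
  for (i = 0; i < n; i++)
    count += (block[i] != 0);
  return count;
}

/* Stand-in for an accelerated routine; it must return the same result. */
int count_nonzero_fast(const short *block, int n)
{
  return count_nonzero_scalar(block, n);
}

/* Stand-in for a runtime CPU feature flag (compare simd_support). */
int cpu_has_fast_path = 0;

/* Capability query: check the build-time layout first, then the CPU flag. */
int can_count_nonzero_fast(void)
{
  if (sizeof(short) != 2)
    return 0;
  if (sizeof(size_t) != 8)
    return 0;
  return cpu_has_fast_path ? 1 : 0;
}

/* Pick the implementation once; callers then use the returned pointer. */
count_fn select_count_fn(void)
{
  return can_count_nonzero_fast() ? count_nonzero_fast : count_nonzero_scalar;
}
```

Checking type sizes before the CPU flag mirrors the order used in the `jsimd_can_encode_mcu_AC_*_prepare()` functions above: if the data layout does not match what the assembly expects, the accelerated path is rejected regardless of CPU support.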
