Skip to content

Commit 92929ff

Browse files
committed
SIMD acceleration opportunities
1 parent 0dd0569 commit 92929ff

13 files changed

Lines changed: 554 additions & 22 deletions

README.md

Lines changed: 51 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ instructions.
88

99
The plugin supports:
1010
- Packed ternary types: `t32_t`, `t64_t`, `t128_t` (32/64/128 trits; 2-bit packed encoding)
11+
- Vector ternary types: `tv32_t`, `tv64_t` (vectors of 2 × t32_t and 2 × t64_t for SIMD operations)
1112
- Extended arithmetic operations: add, sub, mul, div, mod, neg
1213
- Logic operations: not
1314
- Comparison operations: cmp (returns -1, 0, +1)
@@ -137,8 +138,10 @@ Optional arguments:
137138
`__builtin_ternary_shl`, `__builtin_ternary_shr`, `__builtin_ternary_rol`, and `__builtin_ternary_ror`.
138139
- `-fplugin-arg-ternary_plugin-conv` enables lowering of ternary conversion builtins like
139140
`__builtin_ternary_tb2t`, `__builtin_ternary_tt2b`, `__builtin_ternary_t2f`, and `__builtin_ternary_f2t`.
140-
- `-fplugin-arg-ternary_plugin-types` enables builtin ternary integer types `t32_t`, `t64_t`,
141-
`t128_t` with packed 2-bit trit storage.
141+
- `-fplugin-arg-ternary_plugin-mem` enables lowering of ternary memory builtins like
142+
`__builtin_ternary_load_t32`, `__builtin_ternary_store_t32`, `__builtin_ternary_load_t64`, and `__builtin_ternary_store_t64`.
143+
- `-fplugin-arg-ternary_plugin-vector` enables vectorized ternary operations for `tv32_t` and `tv64_t` types
144+
(vectors of 2 × t32_t and 2 × t64_t respectively).
142145
- `-fplugin-arg-ternary_plugin-prefix=<name>` sets the base helper prefix used by lowering
143146
(default: `__ternary`). For example, select helpers become `<prefix>_select_i32` and arithmetic
144147
helpers become `<prefix>_add`, `<prefix>_sub`, etc.
@@ -255,16 +258,55 @@ make test CXX=g++-15 CC=gcc-15
255258
This plugin analyzes ternary conditional expressions in the code and can optionally
256259
lower ternary operations to helper calls suitable for targeting a balanced-ternary ISA.
257260

258-
## Balanced Ternary Literals
261+
## ISA Operations
259262

260-
Use balanced-ternary strings to construct packed values:
263+
The plugin provides groundwork for a balanced-ternary ISA with the following operations:
261264

262-
```c
263-
t32_t a = T32_BT_STR("1 0 -1 1");
264-
t64_t b = T64_BT_STR("1,0,0,-1");
265-
```
265+
### Vector Operations - SIMD Acceleration ✓ IMPLEMENTED (tv32_t)
266+
- `tv32_t`: Vector type containing 2 × t32_t elements (128 bits total)
267+
- Arithmetic operations: `vadd`, `vsub`, `vmul` (element-wise on vector elements)
268+
- Logic operations: `vand`, `vor`, `vxor`, `vnot` (element-wise ternary logic)
269+
- Comparison operations: `vcmp` (element-wise ternary comparison)
270+
271+
Implemented as builtins:
272+
- `__builtin_ternary_add_tv32`, `__builtin_ternary_sub_tv32`, `__builtin_ternary_mul_tv32`, etc.
273+
- SIMD acceleration opportunities: Can leverage AVX/AVX-512 for parallel trit processing
274+
275+
### SIMD Acceleration Opportunities - EXPLORATION
276+
277+
The ternary vector operations provide a foundation for SIMD acceleration:
278+
279+
**Current Implementation:**
280+
- Element-wise operations on packed ternary vectors
281+
- 128-bit vectors (tv32_t) for 2 × 32-trit operations
282+
- Foundation for wider SIMD utilization
283+
284+
**Future SIMD Opportunities:**
285+
- **AVX-512 Integration**: 512-bit vectors for 8 × 32-trit or 4 × 64-trit operations
286+
- **Trit-Level Parallelism**: SIMD instructions for parallel trit manipulation
287+
- **Hardware Acceleration**: Custom ternary SIMD units for maximum performance
288+
- **Memory Bandwidth**: Efficient packed ternary data movement
289+
290+
**Performance Characteristics:**
291+
- Balanced ternary enables simpler arithmetic than two's complement
292+
- Potential for higher computational density in AI/ML workloads
293+
- Reduced carry propagation compared to binary arithmetic
294+
295+
### Control Flow Operations (brt/brf) - PLANNED
296+
- `brt Rc, label`: branch if Rc != 0 (ternary true)
297+
- `brf Rc, label`: branch if Rc == 0 (ternary false)
298+
299+
These operate on ternary conditions and require RTL-level implementation for full support. The plugin currently lowers ternary conditions to helper calls but does not generate conditional jumps.
300+
301+
### Calling Conventions - PLANNED
302+
303+
Ternary-aware calling conventions are designed as follows:
304+
305+
- **Argument Passing**: Ternary values passed in ternary registers when available, otherwise in binary containers
306+
- **Return Values**: Ternary results returned in ternary registers or binary containers as appropriate
307+
- **Register Allocation**: Ternary registers allocated for ternary-typed variables, with fallback to binary registers
266308

267-
The parser consumes trits from left to right (most significant to least significant).
309+
The current plugin provides the type system and operation lowering needed for these conventions but requires GCC backend modifications for full implementation.
268310

269311
## Known Limitations
270312

include/ternary.h

Lines changed: 28 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -16,11 +16,10 @@ typedef uint64_t t32_t; /* 32 trits -> 64 bits */
1616
typedef unsigned __int128 t64_t; /* 64 trits -> 128 bits */
1717
#endif
1818

19-
// Vector types
20-
typedef uint64_t v2t32_t; // 2 x t32
21-
typedef uint64_t v4t32_t; // 4 x t32
22-
typedef unsigned __int128 v2t64_t; // 2 x t64
23-
typedef unsigned __int128 v4t64_t; // 4 x t64
19+
// Vector types - packed ternary vectors for SIMD operations
20+
#ifndef TERNARY_USE_BUILTIN_TYPES
21+
typedef unsigned __int128 tv32_t; /* vector of 2 x t32_t (128 bits) */
22+
#endif
2423

2524
// Builtin function declarations (for plugin lowering)
2625
extern int __builtin_ternary_add(int a, int b);
@@ -53,9 +52,30 @@ extern t64_t __builtin_ternary_cmpeq_t64(t64_t a, t64_t b);
5352
extern t64_t __builtin_ternary_cmpgt_t64(t64_t a, t64_t b);
5453
extern t64_t __builtin_ternary_cmpneq_t64(t64_t a, t64_t b);
5554

56-
// Vector builtins
57-
extern v2t32_t __builtin_ternary_add_v2t32(v2t32_t a, v2t32_t b);
58-
extern v4t64_t __builtin_ternary_mul_v4t64(v4t64_t a, v4t64_t b);
55+
// Memory operations (tld/tst)
56+
extern t32_t __builtin_ternary_load_t32(const void *addr);
57+
extern void __builtin_ternary_store_t32(void *addr, t32_t value);
58+
extern t64_t __builtin_ternary_load_t64(const void *addr);
59+
extern void __builtin_ternary_store_t64(void *addr, t64_t value);
60+
61+
// Vector operations - SIMD accelerated ternary computations
62+
extern tv32_t __builtin_ternary_add_tv32(tv32_t a, tv32_t b);
63+
extern tv32_t __builtin_ternary_sub_tv32(tv32_t a, tv32_t b);
64+
extern tv32_t __builtin_ternary_mul_tv32(tv32_t a, tv32_t b);
65+
extern tv32_t __builtin_ternary_and_tv32(tv32_t a, tv32_t b);
66+
extern tv32_t __builtin_ternary_or_tv32(tv32_t a, tv32_t b);
67+
extern tv32_t __builtin_ternary_xor_tv32(tv32_t a, tv32_t b);
68+
extern tv32_t __builtin_ternary_not_tv32(tv32_t a);
69+
extern tv32_t __builtin_ternary_cmp_tv32(tv32_t a, tv32_t b);
70+
71+
extern tv64_t __builtin_ternary_add_tv64(tv64_t a, tv64_t b);
72+
extern tv64_t __builtin_ternary_sub_tv64(tv64_t a, tv64_t b);
73+
extern tv64_t __builtin_ternary_mul_tv64(tv64_t a, tv64_t b);
74+
extern tv64_t __builtin_ternary_and_tv64(tv64_t a, tv64_t b);
75+
extern tv64_t __builtin_ternary_or_tv64(tv64_t a, tv64_t b);
76+
extern tv64_t __builtin_ternary_xor_tv64(tv64_t a, tv64_t b);
77+
extern tv64_t __builtin_ternary_not_tv64(tv64_t a);
78+
extern tv64_t __builtin_ternary_cmp_tv64(tv64_t a, tv64_t b);
5979

6080
// Balanced-ternary string literals (e.g. "1 0 -1 1")
6181
#define T32_BT_STR(s) __ternary_bt_str_t32(s)

include/ternary_runtime.h

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,9 @@ typedef int64_t ternary_cond_t;
1616
#ifndef TERNARY_USE_BUILTIN_TYPES
1717
typedef uint64_t t32_t; /* 32 trits -> 64 bits */
1818
typedef unsigned __int128 t64_t; /* 64 trits -> 128 bits */
19+
typedef unsigned __int128 tv32_t; /* vector of 2 x t32_t (128 bits) */
20+
typedef struct { unsigned __int128 lo, hi; } tv64_t; /* vector of 2 x t64_t (256 bits) */
21+
typedef struct { unsigned __int128 lo, hi; } tv128_t; /* vector of 2 x t128_t (512 bits) */
1922
#endif
2023

2124
/* Varargs helpers for ternary packed types. */
@@ -117,6 +120,31 @@ t64_t __ternary_cmpeq_t64(t64_t a, t64_t b);
117120
t64_t __ternary_cmpgt_t64(t64_t a, t64_t b);
118121
t64_t __ternary_cmpneq_t64(t64_t a, t64_t b);
119122

123+
/* Memory operations (tld/tst) */
124+
t32_t __ternary_load_t32(const void *addr);
125+
void __ternary_store_t32(void *addr, t32_t value);
126+
t64_t __ternary_load_t64(const void *addr);
127+
void __ternary_store_t64(void *addr, t64_t value);
128+
129+
/* Vector operations - SIMD accelerated ternary computations */
130+
tv32_t __ternary_add_tv32(tv32_t a, tv32_t b);
131+
tv32_t __ternary_sub_tv32(tv32_t a, tv32_t b);
132+
tv32_t __ternary_mul_tv32(tv32_t a, tv32_t b);
133+
tv32_t __ternary_and_tv32(tv32_t a, tv32_t b);
134+
tv32_t __ternary_or_tv32(tv32_t a, tv32_t b);
135+
tv32_t __ternary_xor_tv32(tv32_t a, tv32_t b);
136+
tv32_t __ternary_not_tv32(tv32_t a);
137+
tv32_t __ternary_cmp_tv32(tv32_t a, tv32_t b);
138+
139+
tv64_t __ternary_add_tv64(tv64_t a, tv64_t b);
140+
tv64_t __ternary_sub_tv64(tv64_t a, tv64_t b);
141+
tv64_t __ternary_mul_tv64(tv64_t a, tv64_t b);
142+
tv64_t __ternary_and_tv64(tv64_t a, tv64_t b);
143+
tv64_t __ternary_or_tv64(tv64_t a, tv64_t b);
144+
tv64_t __ternary_xor_tv64(tv64_t a, tv64_t b);
145+
tv64_t __ternary_not_tv64(tv64_t a);
146+
tv64_t __ternary_cmp_tv64(tv64_t a, tv64_t b);
147+
120148
#ifdef __cplusplus
121149
}
122150
#endif

runtime/ternary_runtime.c

Lines changed: 177 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -692,4 +692,181 @@ DEFINE_TERNARY_TYPE_OPS(64, t64_t, 64, unsigned __int128, ternary_decode_u128, t
692692
ternary_tritwise_op_u128, ternary_shift_left_u128, ternary_shift_right_u128,
693693
ternary_rotate_left_u128, ternary_rotate_right_u128)
694694

695+
t32_t __ternary_load_t32(const void *addr)
696+
{
697+
return *(const t32_t *)addr;
698+
}
699+
700+
void __ternary_store_t32(void *addr, t32_t value)
701+
{
702+
*(t32_t *)addr = value;
703+
}
704+
705+
t64_t __ternary_load_t64(const void *addr)
706+
{
707+
return *(const t64_t *)addr;
708+
}
709+
710+
void __ternary_store_t64(void *addr, t64_t value)
711+
{
712+
*(t64_t *)addr = value;
713+
}
714+
715+
/* Vector operations - SIMD accelerated ternary computations */
716+
717+
/* tv32_t operations (vector of 2 x t32_t) */
718+
tv32_t __ternary_add_tv32(tv32_t a, tv32_t b)
719+
{
720+
// Extract two t32_t values from the 128-bit vector
721+
t32_t a0 = (t32_t)(uint64_t)a;
722+
t32_t a1 = (t32_t)(uint64_t)(a >> 64);
723+
t32_t b0 = (t32_t)(uint64_t)b;
724+
t32_t b1 = (t32_t)(uint64_t)(b >> 64);
725+
726+
// Perform scalar operations
727+
t32_t r0 = __ternary_add_t32(a0, b0);
728+
t32_t r1 = __ternary_add_t32(a1, b1);
729+
730+
// Pack back into 128-bit vector
731+
return ((tv32_t)(uint64_t)r1 << 64) | (tv32_t)(uint64_t)r0;
732+
}
733+
734+
tv32_t __ternary_sub_tv32(tv32_t a, tv32_t b)
735+
{
736+
t32_t a0 = (t32_t)(uint64_t)a;
737+
t32_t a1 = (t32_t)(uint64_t)(a >> 64);
738+
t32_t b0 = (t32_t)(uint64_t)b;
739+
t32_t b1 = (t32_t)(uint64_t)(b >> 64);
740+
741+
t32_t r0 = __ternary_sub_t32(a0, b0);
742+
t32_t r1 = __ternary_sub_t32(a1, b1);
743+
744+
return ((tv32_t)(uint64_t)r1 << 64) | (tv32_t)(uint64_t)r0;
745+
}
746+
747+
tv32_t __ternary_mul_tv32(tv32_t a, tv32_t b)
748+
{
749+
t32_t a0 = (t32_t)(uint64_t)a;
750+
t32_t a1 = (t32_t)(uint64_t)(a >> 64);
751+
t32_t b0 = (t32_t)(uint64_t)b;
752+
t32_t b1 = (t32_t)(uint64_t)(b >> 64);
753+
754+
t32_t r0 = __ternary_mul_t32(a0, b0);
755+
t32_t r1 = __ternary_mul_t32(a1, b1);
756+
757+
return ((tv32_t)(uint64_t)r1 << 64) | (tv32_t)(uint64_t)r0;
758+
}
759+
760+
tv32_t __ternary_and_tv32(tv32_t a, tv32_t b)
761+
{
762+
t32_t a0 = (t32_t)(uint64_t)a;
763+
t32_t a1 = (t32_t)(uint64_t)(a >> 64);
764+
t32_t b0 = (t32_t)(uint64_t)b;
765+
t32_t b1 = (t32_t)(uint64_t)(b >> 64);
766+
767+
t32_t r0 = __ternary_and_t32(a0, b0);
768+
t32_t r1 = __ternary_and_t32(a1, b1);
769+
770+
return ((tv32_t)(uint64_t)r1 << 64) | (tv32_t)(uint64_t)r0;
771+
}
772+
773+
tv32_t __ternary_or_tv32(tv32_t a, tv32_t b)
774+
{
775+
t32_t a0 = (t32_t)(uint64_t)a;
776+
t32_t a1 = (t32_t)(uint64_t)(a >> 64);
777+
t32_t b0 = (t32_t)(uint64_t)b;
778+
t32_t b1 = (t32_t)(uint64_t)(b >> 64);
779+
780+
t32_t r0 = __ternary_or_t32(a0, b0);
781+
t32_t r1 = __ternary_or_t32(a1, b1);
782+
783+
return ((tv32_t)(uint64_t)r1 << 64) | (tv32_t)(uint64_t)r0;
784+
}
785+
786+
tv32_t __ternary_xor_tv32(tv32_t a, tv32_t b)
787+
{
788+
t32_t a0 = (t32_t)(uint64_t)a;
789+
t32_t a1 = (t32_t)(uint64_t)(a >> 64);
790+
t32_t b0 = (t32_t)(uint64_t)b;
791+
t32_t b1 = (t32_t)(uint64_t)(b >> 64);
792+
793+
t32_t r0 = __ternary_xor_t32(a0, b0);
794+
t32_t r1 = __ternary_xor_t32(a1, b1);
795+
796+
return ((tv32_t)(uint64_t)r1 << 64) | (tv32_t)(uint64_t)r0;
797+
}
798+
799+
tv32_t __ternary_not_tv32(tv32_t a)
800+
{
801+
t32_t a0 = (t32_t)(uint64_t)a;
802+
t32_t a1 = (t32_t)(uint64_t)(a >> 64);
803+
804+
t32_t r0 = __ternary_not_t32(a0);
805+
t32_t r1 = __ternary_not_t32(a1);
806+
807+
return ((tv32_t)(uint64_t)r1 << 64) | (tv32_t)(uint64_t)r0;
808+
}
809+
810+
tv32_t __ternary_cmp_tv32(tv32_t a, tv32_t b)
811+
{
812+
t32_t a0 = (t32_t)(uint64_t)a;
813+
t32_t a1 = (t32_t)(uint64_t)(a >> 64);
814+
t32_t b0 = (t32_t)(uint64_t)b;
815+
t32_t b1 = (t32_t)(uint64_t)(b >> 64);
816+
817+
t32_t r0 = __ternary_cmplt_t32(a0, b0);
818+
t32_t r1 = __ternary_cmplt_t32(a1, b1);
819+
820+
return ((tv32_t)(uint64_t)r1 << 64) | (tv32_t)(uint64_t)r0;
821+
}
822+
823+
/* tv64_t operations (vector of 2 x t64_t) - TODO: Implement for struct type */
824+
tv64_t __ternary_add_tv64(tv64_t a, tv64_t b)
825+
{
826+
// TODO: Implement for struct type
827+
return a; // Placeholder
828+
}
829+
830+
tv64_t __ternary_sub_tv64(tv64_t a, tv64_t b)
831+
{
832+
// TODO: Implement for struct type
833+
return a; // Placeholder
834+
}
835+
836+
tv64_t __ternary_mul_tv64(tv64_t a, tv64_t b)
837+
{
838+
// TODO: Implement for struct type
839+
return a; // Placeholder
840+
}
841+
842+
tv64_t __ternary_and_tv64(tv64_t a, tv64_t b)
843+
{
844+
// TODO: Implement for struct type
845+
return a; // Placeholder
846+
}
847+
848+
tv64_t __ternary_or_tv64(tv64_t a, tv64_t b)
849+
{
850+
// TODO: Implement for struct type
851+
return a; // Placeholder
852+
}
853+
854+
tv64_t __ternary_xor_tv64(tv64_t a, tv64_t b)
855+
{
856+
// TODO: Implement for struct type
857+
return a; // Placeholder
858+
}
859+
860+
tv64_t __ternary_not_tv64(tv64_t a)
861+
{
862+
// TODO: Implement for struct type
863+
return a; // Placeholder
864+
}
865+
866+
tv64_t __ternary_cmp_tv64(tv64_t a, tv64_t b)
867+
{
868+
// TODO: Implement for struct type
869+
return a; // Placeholder
870+
}
871+
695872
#undef DEFINE_TERNARY_TYPE_OPS

0 commit comments

Comments
 (0)