Feature Request: loop_order constraint on arch components to model fixed-dataflow hardware

Summary

Most real DNN accelerators have a fixed internal dataflow — a hardware-determined loop execution order that cannot be changed at runtime (e.g., a raster scan: X → Y → Z → N, or a weight-stationary systolic array: K → C → X → Y). Currently, AccelForge has no way to express this in the arch definition. FFM generates and explores all valid loop orders during template generation, unaware that most of them are illegal on the target hardware. The arch API should provide a mechanism to declare the fixed loop order so FFM only searches over tile sizes within that constraint.


Motivation

The problem

Consider a custom DL accelerator with a MAC array that executes a fixed raster-scan dataflow internally:
for N:           ← outermost (batch)
  for Z:         ← channels
    for Y:       ← spatial height
      for X:     ← spatial width (innermost — fastest changing)
        MAC()

This order is etched into the hardware datapath. A mapping that iterates Z inside Y is physically impossible on this chip — it would require rewiring the memory bus at runtime.
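To put a number on the mismatch, here is a minimal sketch in plain Python. It does not call AccelForge or FFM, and the names `RANKS` and `HARDWARE_ORDER` are illustrative only. It counts the unconstrained orderings of the four ranks against the single ordering this hardware supports; FFM's actual enumeration may prune some of these on its own.

```python
from itertools import permutations

# Ranks of the example loop nest (illustrative names, not an AccelForge API).
RANKS = ("N", "Z", "Y", "X")

# The one order the hardware can execute, listed outermost to innermost.
HARDWARE_ORDER = ("N", "Z", "Y", "X")

# Every ordering of four distinct ranks: 4! candidates.
all_orders = list(permutations(RANKS))

# Only the ordering that matches the datapath is physically realizable.
legal_orders = [order for order in all_orders if order == HARDWARE_ORDER]

print(f"{len(legal_orders)} of {len(all_orders)} loop orders are legal")
```

For a single four-rank loop nest, 23 of the 24 candidate orders describe executions the chip cannot perform.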

The user wants to:
Fix the loop order (N → Z → Y → X) in the arch, as a hardware property, not a workload property.
Let FFM search for optimal tile sizes within that fixed order.
Get correct PPA estimates that reflect the actual hardware execution pattern.
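As a rough illustration of the second goal, the sketch below holds the loop order fixed and enumerates only tile sizes. It is a toy model, not FFM's search: the rank extents in `EXTENTS` are made-up values, and the assumption that tile sizes must evenly divide the extent is a simplification chosen for the example.

```python
from itertools import product

# Fixed hardware loop order, outermost to innermost (from the example above).
FIXED_ORDER = ("N", "Z", "Y", "X")

# Hypothetical rank extents for one layer; not taken from any real workload.
EXTENTS = {"N": 2, "Z": 4, "Y": 6, "X": 6}


def divisors(n: int) -> list[int]:
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


# One candidate per combination of per-rank tile sizes; the order never varies.
candidates = [
    dict(zip(FIXED_ORDER, tiles))
    for tiles in product(*(divisors(EXTENTS[rank]) for rank in FIXED_ORDER))
]

print(f"{len(candidates)} tile-size candidates under one fixed loop order")
```

With the order pinned, the search space is the product of the per-rank tile choices alone, rather than that product multiplied by the number of loop permutations.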


What currently exists
The arch API provides Memory.tensors.tile_shape (Comparison objects) and Memory.tensors.no_refetch_from_above to constrain which loops go above or below a storage node — but nothing that fixes the order of temporal loops within a slot.

spec.mapper.explore_loop_orders = False reduces the number of loop orders FFM explores (it yields only one canonical ordering for partially-relevant rank variables), but:


It does not guarantee a specific user-defined order.
It applies globally, not per-component or per-rank.
It has no semantic connection to any arch property.


The only available workaround is the private/undocumented _pmapping_row_filter_function parameter in map_workload_to_arch(), which is a post-hoc filter applied after template generation — not a first-class arch constraint:
# Current workaround — fragile and disconnected from the arch
def raster_scan_filter(row) -> bool:
    ...  # inspect generated mapping and discard non-conforming templates

result = spec.map_workload_to_arch(
    _pmapping_row_filter_function=raster_scan_filter
)

This approach:
Is not expressed in the arch, so the constraint is invisible to anyone reading the arch definition.
Does not prevent FFM from generating invalid templates — it only discards them after the fact, wasting search budget.
Uses an internal/unstable API.
Requires users to understand FFM's internal mapping representation to implement correctly.
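For reference, the core check such a filter has to perform can be sketched as follows. This is a hypothetical helper, not part of AccelForge: it assumes the temporal loops of one slot have already been extracted into a list of rank names, outermost first. Extracting that list from the row passed to _pmapping_row_filter_function depends on FFM's internal representation and is deliberately left out.

```python
from typing import Sequence

# Fixed hardware loop order, outermost to innermost (from the example above).
HARDWARE_ORDER = ("N", "Z", "Y", "X")


def follows_fixed_order(
    loops: Sequence[str],
    fixed_order: Sequence[str] = HARDWARE_ORDER,
) -> bool:
    """Check that `loops` (outermost first) respects `fixed_order`.

    Ranks may be omitted, but each rank that appears must appear once and
    must sit inside every rank that precedes it in `fixed_order`.
    """
    position = {rank: i for i, rank in enumerate(fixed_order)}
    # Reject loops over ranks the hardware order does not mention.
    if any(rank not in position for rank in loops):
        return False
    indices = [position[rank] for rank in loops]
    # Strictly increasing positions means no reordering and no repeats.
    return all(a < b for a, b in zip(indices, indices[1:]))
```

Under these assumptions, `follows_fixed_order(["N", "Z", "Y", "X"])` accepts the raster-scan order and `follows_fixed_order(["N", "Y", "Z", "X"])` rejects the mapping that iterates Z inside Y. The check itself is small; the fragile part of the workaround is everything around it.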


