Feature Request: loop_order constraint on arch components to model fixed-dataflow hardware #50

Description

@okaikov

Summary

Most real DNN accelerators have a fixed internal dataflow — a hardware-determined loop execution order that cannot be changed at runtime (e.g., a raster scan: X → Y → Z → N, or a weight-stationary systolic array: K → C → X → Y). Currently, AccelForge has no way to express this in the arch definition. FFM generates and explores all valid loop orders during template generation, unaware that most of them are illegal on the target hardware. The arch API should provide a mechanism to declare the fixed loop order so FFM only searches over tile sizes within that constraint.

Motivation

The problem

Consider a custom DL accelerator with a MAC array that executes a fixed raster-scan dataflow internally:
for N:              ← outermost (batch)
  for Z:            ← channels
    for Y:          ← spatial height
      for X:        ← spatial width (innermost, fastest changing)
        MAC()

This order is etched into the hardware datapath. A mapping that iterates Z inside Y is physically impossible on this chip — it would require rewiring the memory bus at runtime.
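To make the visit order concrete, here is a minimal Python sketch of the same loop nest. The function name `raster_scan` and the rank sizes are illustrative assumptions, not part of AccelForge; the only thing taken from the example above is the N → Z → Y → X nesting.

```python
def raster_scan(n_size, z_size, y_size, x_size):
    """Yield (n, z, y, x) coordinates in the fixed N -> Z -> Y -> X order.

    Illustrative only: the sizes are arbitrary and the MAC() call is
    replaced by yielding the coordinate that would be processed.
    """
    for n in range(n_size):              # outermost (batch)
        for z in range(z_size):          # channels
            for y in range(y_size):      # spatial height
                for x in range(x_size):  # innermost, fastest changing
                    yield (n, z, y, x)


visits = list(raster_scan(1, 2, 2, 2))
print(visits[:3])  # [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0)]
```

X advances on every step, Y only after X wraps, and so on outward. A mapping with Z inside Y would produce a different visit sequence, which is what the hardware cannot do.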

The user wants to:
- Fix the loop order (N → Z → Y → X) in the arch, as a hardware property, not a workload property.
- Let FFM search for optimal tile sizes within that fixed order.
- Get correct PPA estimates that reflect the actual hardware execution pattern.
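As a rough illustration of "search over tile sizes within a fixed order", the sketch below enumerates candidate tilings while never permuting the ranks. Everything here is hypothetical: the rank sizes, the `tilings` helper, and the choice to use exact divisors as tile sizes are assumptions made for the example, not FFM's actual search procedure.

```python
from itertools import product

FIXED_ORDER = ("N", "Z", "Y", "X")             # outermost to innermost
RANK_SIZES = {"N": 2, "Z": 4, "Y": 6, "X": 6}  # hypothetical workload shape


def divisors(n):
    """Return all positive divisors of n (used here as candidate tile sizes)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def tilings(order, sizes):
    """Yield one candidate per tile-size choice; the loop order never changes."""
    for tiles in product(*(divisors(sizes[rank]) for rank in order)):
        yield tuple(zip(order, tiles))


all_tilings = list(tilings(FIXED_ORDER, RANK_SIZES))
print(len(all_tilings))  # 96
print(all_tilings[0])    # (('N', 1), ('Z', 1), ('Y', 1), ('X', 1))
```

The search space is the product of the per-rank tile-size choices (2 × 3 × 4 × 4 = 96 here), with no factor for loop permutations.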

What currently exists
The arch API provides Memory.tensors.tile_shape (Comparison objects) and Memory.tensors.no_refetch_from_above to constrain which loops go above or below a storage node — but nothing that fixes the order of temporal loops within a slot.

spec.mapper.explore_loop_orders = False reduces the number of loop orders FFM explores (it yields only one canonical ordering for partially-relevant rank variables), but:

- It does not guarantee a specific user-defined order.
- It applies globally, not per-component or per-rank.
- It has no semantic connection to any arch property.

The only available workaround is the private/undocumented _pmapping_row_filter_function parameter in map_workload_to_arch(), which is a post-hoc filter applied after template generation — not a first-class arch constraint:

Current workaround — fragile and disconnected from the arch

def raster_scan_filter(row) -> bool:
    ...  # inspect generated mapping and discard non-conforming templates

result = spec.map_workload_to_arch(
    _pmapping_row_filter_function=raster_scan_filter
)

This approach:
- Is not expressed in the arch, so the constraint is invisible to anyone reading the arch definition.
- Does not prevent FFM from generating invalid templates — it only discards them after the fact, wasting search budget.
- Uses an internal/unstable API.
- Requires users to understand FFM's internal mapping representation to implement correctly.
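The wasted search budget can be seen in a toy model that compares filtering after generation with constraining the generator up front. This is a sketch under stated assumptions: a "template" here is just a tuple of `(rank, tile_size)` pairs, which is a stand-in and not FFM's internal mapping representation, and the rank sizes are hypothetical.

```python
from itertools import permutations, product

HW_ORDER = ("N", "Z", "Y", "X")           # fixed hardware order, outermost first
SIZES = {"N": 2, "Z": 4, "Y": 6, "X": 6}  # hypothetical workload shape


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def templates_for(order):
    """Toy template generator: every divisor tiling for one loop order."""
    for tiles in product(*(divisors(SIZES[rank]) for rank in order)):
        yield tuple(zip(order, tiles))


def raster_scan_filter(template) -> bool:
    """Post-hoc check on the toy representation: keep only the hardware order."""
    return tuple(rank for rank, _ in template) == HW_ORDER


# Post-hoc filtering: generate every loop order, then discard illegal ones.
generated = [t for order in permutations(HW_ORDER) for t in templates_for(order)]
kept = [t for t in generated if raster_scan_filter(t)]

# First-class constraint: only the legal order is ever generated.
constrained = list(templates_for(HW_ORDER))

print(len(generated), len(kept), len(constrained))  # 2304 96 96
```

Both paths end with the same 96 templates, but in this toy setup the filtering path builds 24 times as many candidates before discarding most of them.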
