Stardust: Compiling Sparse Tensor Algebra to a Reconfigurable Dataflow Architecture
We introduce Stardust, a compiler from a sparse tensor algebra language to a reconfigurable dataflow architecture, by way of the Spatial parallel-patterns programming model. The key insight is to let performance engineers specify the placement of data into memories separately from the placement of computation onto compute units. Data is placed using an abstract memory model, and Stardust binds that data to complex on-chip physical memories. Stardust then binds the computation that uses those on-chip data structures to the appropriate parallel patterns. Using cycle-accurate simulation, we show that Stardust generates nine more tensor algebra kernels than the original Capstan work supports. On average, the generated kernels perform 138$\times$ better than generated CPU kernels and 41$\times$ better than generated GPU kernels.
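For concreteness, the following is a minimal, hand-written Spatial-style sketch of one such sparse tensor algebra kernel: sparse matrix-vector multiplication (SpMV) over a CSR format. It illustrates the two concerns the abstract separates: the compressed arrays are explicitly placed into on-chip SRAMs, and the loop nest is expressed as parallel patterns (Foreach/Reduce). All names here (SpMVSketch, rowPtr, colIdx, vals) are illustrative, and the code is an assumption-laden sketch built from Spatial's documented constructs, not Stardust's actual generated output; entry-point boilerplate may vary across Spatial versions.

```scala
import spatial.dsl._

// Hand-written sketch of the abstraction Stardust targets: on-chip data
// placement (SRAMs) is specified separately from the computation, which
// is expressed as parallel patterns (Foreach, Reduce).
@spatial object SpMVSketch extends SpatialApp {
  def main(args: Array[String]): Unit = {
    val nRows = 16
    val nnz   = 16 // diagonal example matrix, so one nonzero per row

    // Off-chip (DRAM) tensors in CSR form, initialized by the host.
    val rowPtrD = DRAM[Int](nRows + 1)
    val colIdxD = DRAM[Int](nnz)
    val valsD   = DRAM[Float](nnz)
    val xD      = DRAM[Float](nRows)
    val yD      = DRAM[Float](nRows)

    setMem(rowPtrD, Array.tabulate(nRows + 1){ i => i })
    setMem(colIdxD, Array.tabulate(nnz){ i => i })
    setMem(valsD,   Array.tabulate(nnz){ _ => 1.to[Float] })
    setMem(xD,      Array.tabulate(nRows){ i => i.to[Float] })

    Accel {
      // Data placement: bind each CSR-level array to an on-chip SRAM.
      val rowPtr = SRAM[Int](nRows + 1)
      val colIdx = SRAM[Int](nnz)
      val vals   = SRAM[Float](nnz)
      val x      = SRAM[Float](nRows)
      val y      = SRAM[Float](nRows)
      rowPtr load rowPtrD(0 :: nRows + 1)
      colIdx load colIdxD(0 :: nnz)
      vals load valsD(0 :: nnz)
      x load xD(0 :: nRows)

      // Computation placement: an outer Foreach over rows and an inner
      // Reduce over each row's nonzeros (a data-dependent range).
      Foreach(nRows by 1) { i =>
        val acc = Reduce(Reg[Float](0))(rowPtr(i) until rowPtr(i + 1)) { j =>
          vals(j) * x(colIdx(j))
        } { _ + _ }
        y(i) = acc.value
      }

      yD(0 :: nRows) store y
    }

    printArray(getMem(yD), "y = A * x")
  }
}
```

In Stardust, the binding of rowPtr, colIdx, and vals to particular physical memories would come from the performance engineer's abstract memory-model directives, and the patterns above would be emitted by the compiler rather than written by hand.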