BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20260630T193243EDT-4412KvNteG@132.216.98.100
DTSTAMP:20260630T233243Z
DESCRIPTION:Abstract\n\nThe rapid development of machine learning has drive
 n the demand for high computational performance\, making Graphics Processi
 ng Units (GPUs) essential for workloads such as deep neural networks. Howe
 ver\, alternative architectures such as Field Programmable Gate Arrays (FP
 GAs) remain critical in resource-constrained and power-limited settings. D
 espite their advantages\, FPGA programming remains challenging\, as both t
 raditional Hardware Description Languages (HDLs) and modern high-level fra
 meworks\, such as Spatial or Lift-HLS\, lack explicit abstractions for coa
 rse-grained resource sharing\, which limits the efficient implementation o
 f neural network applications.\n\nThis thesis adopts a functional programm
 ing-based approach to raise the level of abstraction in FPGA accelerator d
 esign while preserving performance.\n\nPrograms are lowered into an existi
 ng functional IR that captures both parallelism and memory behavior\, incl
 uding asynchronous off-chip accesses and synchronous on-chip buffering. Th
 e IR is extended with coarse-grained function sharing\, enabling efficient
  deployment of neural network workloads while exposing architectural chara
 cteristics for systematic optimization and performance analysis. Concretel
 y\, this thesis makes three main contributions.\n\nFirst\, hardware resour
 ce usage is reduced through coarse-grained function sharing in the functio
 nal IR. Based on Let-bindings and 𝜆-abstractions\, shared computations are
  represented in a function-call-based execution model. Compiler rewrite ru
 les and transformation passes eliminate redundant hardware and generate va
 lid design points\, including optimizations such as duplicate-path removal
  and function fusion to reduce sharing overhead. This enables full neural 
 network deployment on a single FPGA while achieving competitive performanc
 e compared to layer-specialized and hand-crafted designs.\n\nSecond\, opti
 mizations such as data partitioning have a significant impact on performan
 ce\, as they directly affect data reuse patterns and the efficient utiliza
 tion of hardware resources. A divide-and-conquer primitive enables the sym
 bolic expression of partitioning strategies\, with semi-automated insertio
 n of tunable parameters. These parameters are propagated through the compi
 ler pipeline and evaluated using a cost model\, avoiding expensive synthes
 is-driven evaluation while enabling efficient design-space exploration. Ex
 periments on Intel Arria 10 FPGAs demonstrate competitive performance on V
 GG and TinyYOLO benchmarks.\n\nFinally\, Let-based sharing introduces rout
 ing congestion that worsens as the number of function invocations incr
 eases. To address this issue\, a novel sharing mechanism\, Reduce-based sh
 aring\, improves runtime flexibility with respect to the number of layers 
 while reducing routing congestion during synthesis. Combined with SwitchAp
 ply over instruction streams\, this approach enables programmable function
  units with shared control and datapaths. Upper-bounded streams further en
 hance programmability by reducing control overhead for data-shape manageme
 nt\, thereby improving routability. Evaluations on networks ranging from L
 eNet-5 to ResNet demonstrate consistently routable designs and speedups of
  up to 3.4× over prior work.\n\nOverall\, this thesis demonstrates that a 
 functional IR-driven approach bridges high-level programmability and hardw
 are efficiency\, enabling scalable FPGA accelerator design. The evaluation
  is conducted on classical convolutional neural network models\, whose cor
 e operators (convolution and fully connected layers) remain fundamental bu
 ilding blocks in modern machine learning workloads\, supporting the broade
 r relevance of the results. This thesis represents a step towards bridging
  high-level machine learning frameworks and low-level hardware design.\n
DTSTART:20260708T140000Z
DTEND:20260708T160000Z
LOCATION:Room 603\, McConnell Engineering Building\, 3480 rue University\,
  Montreal\, QC\, H3A 0E9\, CA
SUMMARY:PhD defence of Tzung-Han Juang – Enabling Efficient Resource Sharin
 g with Functional IR for Mapping Neural Networks on FPGAs
URL:/ece/channels/event/phd-defence-tzung-han-juang-en
 abling-efficient-resource-sharing-functional-ir-mapping-neural-373414
END:VEVENT
END:VCALENDAR