BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20260630T193243EDT-4412KvNteG@132.216.98.100
DTSTAMP:20260630T233243Z
DESCRIPTION:Abstract\n\nThe rapid development of machine learning has driven the demand for high computational performance\, making Graphics Processing Units (GPUs) essential for workloads such as deep neural networks. However\, alternative architectures such as Field Programmable Gate Arrays (FPGAs) remain critical in resource-constrained and power-limited settings. Despite their advantages\, FPGA programming remains challenging\, as both traditional Hardware Description Languages (HDLs) and modern high-level frameworks\, such as Spatial or Lift-HLS\, lack explicit abstractions for coarse-grained resource sharing\, which limits the efficient implementation of neural network applications.\n\nThis thesis adopts a functional programming-based approach to raise the level of abstraction in FPGA accelerator design while preserving performance.\n\nPrograms are lowered into an existing functional IR that captures both parallelism and memory behavior\, including asynchronous off-chip accesses and synchronous on-chip buffering. The IR is extended with coarse-grained function sharing\, enabling efficient deployment of neural network workloads while exposing architectural characteristics for systematic optimization and performance analysis. Concretely\, this thesis makes three main contributions.\n\nFirst\, hardware resource usage is reduced through coarse-grained function sharing in the functional IR. Based on Let-bindings and 𝜆-abstractions\, shared computations are represented in a function-call-based execution model. Compiler rewrite rules and transformation passes eliminate redundant hardware and generate valid design points\, including optimizations such as duplicate-path removal and function fusion to reduce sharing overhead.
This enables full neural network deployment on a single FPGA while achieving competitive performance compared to layer-specialized and hand-crafted designs.\n\nSecond\, optimizations such as data partitioning have a significant impact on performance\, as they directly affect data reuse patterns and the efficient utilization of hardware resources. A divide-and-conquer primitive enables the symbolic expression of partitioning strategies\, with semi-automated insertion of tunable parameters. These parameters are propagated through the compiler pipeline and evaluated using a cost model\, avoiding expensive synthesis-driven evaluation while enabling efficient design-space exploration. Experiments on Intel Arria 10 FPGAs demonstrate competitive performance on VGG and TinyYOLO benchmarks.\n\nFinally\, the Let-based sharing introduces routing congestion that worsens as the number of function invocations increases. To address this issue\, a novel sharing mechanism\, Reduce-based sharing\, improves runtime flexibility with respect to the number of layers while reducing routing congestion during synthesis. Combined with SwitchApply over instruction streams\, this approach enables programmable function units with shared control and datapaths. Upper-bounded streams further enhance programmability by reducing control overhead for data-shape management\, thereby improving routability. Evaluations on networks ranging from LeNet-5 to ResNet demonstrate consistently routable designs and speedups of up to 3.4× over prior work.\n\nOverall\, this thesis demonstrates that a functional IR-driven approach bridges high-level programmability and hardware efficiency\, enabling scalable FPGA accelerator design. The evaluation is conducted on classical convolutional neural network models\, whose core operators (convolution and fully connected layers) remain fundamental building blocks in modern machine learning workloads\, supporting the broader relevance of the results.
This thesis represents a step towards bridging high-level machine learning frameworks and low-level hardware design.\n
DTSTART:20260708T140000Z
DTEND:20260708T160000Z
LOCATION:Room 603\, McConnell Engineering Building\, CA\, QC\, Montreal\, H3A 0E9\, 3480 rue University
SUMMARY:PhD defence of Tzung-Han Juang – Enabling Efficient Resource Sharing with Functional IR for Mapping Neural Networks on FPGAs
URL:/ece/channels/event/phd-defence-tzung-han-juang-enabling-efficient-resource-sharing-functional-ir-mapping-neural-373414
END:VEVENT
END:VCALENDAR