Neural networks have prevailed in many fields, with image recognition and generation as prominent examples. However, the rapid growth in model size and complexity has made them increasingly reliant on cloud servers or high-end GPUs. This dependency incurs high operational costs, privacy concerns, and round-trip latency. Deploying models to edge and mobile devices is emerging as a promising solution but often faces challenges such as limited memory, long runtime, and unsatisfactory user experience. In the post-Moore’s Law era, hardware advancements alone are insufficient to overcome these challenges, highlighting the growing need for designing efficient neural networks for resource-constrained environments. This thesis addresses these critical efficiency bottlenecks by introducing novel operators and architectures that enhance inference speed and reduce model size across various tasks, ranging from layout-specific applications to general visual tasks, and from image recognition to generation. First, we propose Translation Variant Convolution (TVConv), a novel operator optimized for layout-specific applications, such as face recognition, by leveraging spatial feature variance for efficient region-wise processing. Second, we identify inefficiencies in popular depthwise convolution, such as low compute intensity and frequent memory access, and present Partial Convolution (PConv) to overcome these inefficiencies. Building on this, we develop FasterNet, a family of neural networks that achieves considerably faster running speeds across various devices without sacrificing recognition accuracy. Finally, we introduce SnapGen, a highly compact and fast text-to-image model, capable of generating high-quality, high-resolution images directly and instantly on mobile devices. These contributions collectively advance the democratization of neural networks on edge devices, enabling seamless, cost-effective, and privacy-preserving services accessible anytime, anywhere.
| Date of Award | 2025 |
|---|---|
| Original language | English |
| Awarding Institution | The Hong Kong University of Science and Technology |
| Supervisor | Gary Shueng Han CHAN (Supervisor) |
Efficient neural networks for image recognition and generation on the edge
CHEN, J. (Author). 2025
Student thesis: Doctoral thesis