ORIGINAL RESEARCH article

Front. Comput. Sci.
Sec. Networks and Communications
Volume 6 - 2024 | doi: 10.3389/fcomp.2024.1493399
This article is part of the Research Topic Design and Optimization of Distributed Computing/Storage Systems Driven by Novel Intelligent Networking Technologies

ML-NIC: Accelerating Machine Learning Inference using Smart Network Interface Cards

Provisionally accepted
Santa Clara University, Santa Clara, California, United States

The final, formatted version of the article will be published soon.

    Low-latency inference for machine learning models is increasingly a necessity, as these models are used in mission-critical applications such as autonomous driving, military defense (e.g., target recognition), and network traffic analysis. A widely studied and used technique to meet this requirement is to offload some or all of the inference task onto specialized hardware such as graphics processing units. More recently, offloading machine learning inference onto programmable network devices, such as programmable network interface cards or programmable switches, has been gaining interest from both industry and academia, especially due to the latency reduction and computational benefits of performing inference directly on the data plane, where network packets are processed. Yet, current approaches are relatively limited in scope, and there is a need for more general approaches for mapping machine learning models onto programmable network devices. To fulfill such a need, this work introduces a novel framework, called ML-NIC, for deploying trained machine learning models onto the data planes of programmable network devices. ML-NIC deploys models directly into the computational cores of the devices to efficiently leverage the inherent parallelism of network devices, thus providing substantial latency and throughput gains. Our experiments show that ML-NIC reduced inference latency by at least 6× on average and in the 99th percentile and increased throughput by at least 16× with little to no degradation in model effectiveness compared to existing CPU solutions. In addition, ML-NIC can provide tighter guaranteed latency bounds in the presence of other network traffic, with shorter tail latencies. Furthermore, ML-NIC reduces CPU utilization by 6.65% and host server RAM usage by 320.80 MB. Finally, ML-NIC can handle machine learning models that are 2.25× larger than those supported by current state-of-the-art network device offloading approaches.
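
    To make the core idea concrete, the following is a minimal conceptual sketch (in standard C, not the authors' implementation) of the kind of per-core inference kernel the abstract describes: a quantized, fixed-point fully connected layer that each computational core of a programmable NIC could evaluate independently on the packets assigned to it. The layer sizes, fixed-point format, and function names are hypothetical; real SmartNIC firmware (e.g., for Netronome devices) would be written in the vendor's data-plane language, with the trained weights loaded into per-core memory at deployment time.

    /* Conceptual sketch only: a fixed-point dense layer of the kind a per-core
     * inference kernel on a programmable NIC might evaluate. All sizes, the
     * Q8.8 fixed-point format, and the weight/bias arrays are hypothetical. */
    #include <stdint.h>

    #define IN_DIM    16   /* hypothetical number of input features per packet */
    #define OUT_DIM    4   /* hypothetical number of output classes            */
    #define FRAC_BITS  8   /* Q8.8 fixed point: values scaled by 2^8           */

    /* Trained parameters, assumed to be written into per-core memory when the
     * model is deployed onto the device. */
    static int16_t weights[OUT_DIM][IN_DIM];
    static int32_t biases[OUT_DIM];

    /* Evaluate one dense layer on features extracted from a packet and return
     * the index of the highest-scoring class (argmax). */
    static int infer_packet(const int16_t features[IN_DIM])
    {
        int best_class = 0;
        int32_t best_score = INT32_MIN;

        for (int o = 0; o < OUT_DIM; o++) {
            int32_t acc = biases[o];
            for (int i = 0; i < IN_DIM; i++)
                acc += (int32_t)weights[o][i] * features[i];
            acc >>= FRAC_BITS;          /* rescale the product back to Q8.8 */
            if (acc > best_score) {
                best_score = acc;
                best_class = o;
            }
        }
        return best_class;
    }

    Because each core holds its own copy of the (small) parameter arrays and processes only the packets dispatched to it, many such kernels can run in parallel, which is one plausible reading of how the parallelism of the NIC's computational cores translates into the latency and throughput gains reported above.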

    Keywords: machine learning, SmartNic, Netronome, Data plane, inference

    Received: 09 Sep 2024; Accepted: 04 Dec 2024.

    Copyright: © 2024 Kapoor, Anastasiu and Choi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    David C. Anastasiu, Santa Clara University, Santa Clara, 95053, California, United States
    Sean Choi, Santa Clara University, Santa Clara, 95053, California, United States

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.