Deep neural networks, or DNNs (layers of algorithms that perform computations in an orderly progression to make apps work), need to be optimized for each device. On a device the DNN was not optimized for, computations can take longer, causing the app to lag. This lag is known to computer scientists as “latency.” But with so many different makes and models of devices and customization options, optimizing a DNN for every device is almost impossible for developers to accomplish.
Even if they tried, the process would be lengthy and add considerably to the cost. A crucial bottleneck is the difficulty of quickly evaluating latencies of many DNN candidate models on a wide range of devices. Thus, DNNs tend to be optimized for one specific device.
A team led by Shaolei Ren, an associate professor of electrical and computer engineering in UCR’s Marlan and Rosemary Bourns College of Engineering, has come up with a simple, inexpensive way to optimize DNNs for numerous devices across different platforms. The work was accepted by and will be presented at the highly selective ACM SIGMETRICS / IFIP Performance 2022 conference.
Ren’s group studied the DNN latency relationships across different devices. They found that selecting the best DNN model for a device does not require the actual latencies; the latency ranking is enough. Ranking latency is a relatively simple matter of sorting latency values from high to low, where lowest is best. The latency rankings for different devices are highly correlated.
If latency ranking follows one order on one device, it will be about the same on a different device. The order doesn’t change that much. For example, it’s not that different between all types of cellphones, regardless of operating system or model.
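One common way to quantify how similar two latency rankings are is Spearman's rank correlation, which compares only the ordering of the values and ignores their magnitudes. The sketch below is an illustration, not the paper's code: the device names and latency numbers are invented, and the helper functions are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks (1 = lowest latency); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def spearman(latencies_a, latencies_b):
    """Spearman rank correlation: the Pearson correlation of the two rank lists."""
    return pearson(ranks(latencies_a), ranks(latencies_b))


# Invented latencies (ms) of the same five candidate models on three hypothetical devices.
phone_a = [12.0, 30.5, 18.2, 45.1, 25.0]
phone_b = [20.1, 52.3, 33.0, 80.4, 41.7]  # slower overall, same ordering as phone_a
phone_c = [20.1, 33.0, 52.3, 80.4, 41.7]  # two models swap places

print(round(spearman(phone_a, phone_b), 3))  # 1.0
print(round(spearman(phone_a, phone_c), 3))  # 0.6
```

In this toy data, phone_b is slower than phone_a on every model, yet the ordering is identical, so the fastest model on one is also the fastest on the other. That is why a ranking can stand in for the actual latencies when picking a model.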
If two devices have similar latency rankings, there is no need to do anything. But if a new device has a very different ranking order, a technique called proxy adaptation based on transfer learning can help optimize the DNN for that device quickly.
Lightweight transfer learning enables us to adapt our default proxy device latency evaluator to the new device. That only needs a few tens of models for latency measurement. Compared to measuring thousands of models to get a new latency predictor, measuring 50 or 60 is nothing. So we can scale up our design process very quickly without building a new latency predictor for each device.
For example, suppose DNNs must be optimized for 100 different devices. If pre-training a super DNN model containing all the candidate models takes about 1,000 hours, building an accuracy predictor takes about 100 hours, and measuring the latencies to build a latency predictor takes 20-100 hours per device, existing approaches will end up taking anywhere between 3,100 and 11,100 machine-hours to optimize DNNs for these 100 devices.
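The arithmetic behind this estimate can be written out as a short calculation. This sketch assumes the super DNN pre-training and the accuracy predictor are one-time costs shared by all devices, while the latency measurements are repeated for every device. The function name is made up for illustration.

```python
def total_machine_hours(num_devices, supernet_hours, accuracy_predictor_hours,
                        latency_hours_per_device):
    """One-time costs plus a per-device latency-measurement cost."""
    return (supernet_hours
            + accuracy_predictor_hours
            + num_devices * latency_hours_per_device)


# Figures quoted in the article: 1,000 h pre-training, 100 h accuracy predictor,
# 20 to 100 h of latency measurement per device.
low_100 = total_machine_hours(100, 1000, 100, 20)
high_100 = total_machine_hours(100, 1000, 100, 100)

# The same pipeline run for a single device.
low_1 = total_machine_hours(1, 1000, 100, 20)
high_1 = total_machine_hours(1, 1000, 100, 100)

print(low_100, high_100)  # 3100 11100
print(low_1, high_1)      # 1120 1200
```

The per-device latency measurements dominate once many devices are involved, which is why avoiding a separate latency predictor for each device saves most of the cost. The single-device figures leave out the few tens of measurements needed when a new device requires adaptation.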
The new approach can keep the total design cost about the same as for a single device while producing DNNs that are much better optimized for a wider variety of devices. Ren’s group tested their method on a wide range of devices and public datasets and found that it worked extremely well to optimize DNNs.
The paper, “One proxy device is enough for hardware-aware neural architecture search,” has been accepted for the ACM SIGMETRICS 2022 conference and is published in the December 2021 issue of Proceedings of the ACM on Measurement and Analysis of Computing Systems. Other authors include Bingqian Lu and Jianyi Yang, who are doctoral students at UC Riverside; Weiwen Jiang at George Mason University; and Yiyu Shi at Notre Dame.
Link to the UCR research highlight: https://news.ucr.edu/articles/2021/11/15/quickly-optimizing-deep-neural-networks-different-devices