We present a post-training weight pruning method for deep neural networks that achieves accuracy levels acceptable for production settings and is fast enough to run on commodity hardware such as desktop CPUs or edge devices. We propose a data-free extension of the approach for computer vision models based on automatically generated synthetic fractal images. We obtain state-of-the-art results for data-free neural network pruning, with a ∼1.5% top-1 accuracy drop for a ResNet50 on ImageNet at a 50% sparsity rate. When using real data, we obtain a ResNet50 model on ImageNet with a 65% sparsity rate in 8-bit precision in a post-training setting with a ∼1% top-1 accuracy drop. We release the code as a part of the OpenVINO™ Post-Training Optimization tool.
Deep neural network (DNN) models have achieved unprecedented accuracy in several crucial domains such as computer vision and natural language processing. Despite this success, the large amount of computation and memory required for inference limits the deployment of DNN models on edge devices, such as smart cameras equipped with low-power CPUs, GPUs or ASIC accelerators. Significant efforts in recent years have been devoted to both hardware design and algorithmic approaches to DNN model compression to enable inference speedups for various model architectures and use cases. Some DNN compression methods, such as 8-bit quantization, have been adapted to the post-training setting, where the original DNN model to be compressed may come from any software framework and no access to the original training pipeline or training dataset is given. One of the promising approaches to reducing the memory footprint and inference latency of DNNs is weight pruning, which results in models with sparse weight matrices. Recently, a lot of research and development has been aimed at leveraging weight sparsity to achieve inference speedups on a range of hardware platforms. However, relatively little effort has been devoted to providing accurate sparse DNN models in the post-training setting.
In this work, we propose a recipe for fast post-training pruning of DNNs that produces models with significant sparsity rates (e.g. 50%) but negligible accuracy drops. Furthermore, if combined with weight quantization techniques, the proposed method could reduce the model memory footprint by a factor of 6-8x. We also propose a fast data-free extension of our weight pruning pipeline, which achieves state-of-the-art accuracy levels for a range of computer vision models. To streamline the deployment of sparse quantized DNNs on hardware, we have implemented the proposed method as a part of the OpenVINO™ Post-Training Optimization tool.
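The 6-8x footprint figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is only an illustration and not the accounting used in this work: it assumes dense weights stored in 32-bit floating point, non-zero weights stored as 8-bit integers, and a hypothetical 1-bit-per-weight mask that records which positions are non-zero. The function name `compression_ratio` and all of its parameters are invented for this example; real sparse storage formats have different index overheads.

```python
def compression_ratio(sparsity: float,
                      weight_bits: int = 8,
                      dense_bits: int = 32,
                      mask_bits: float = 1.0) -> float:
    """Ratio of dense storage to sparse quantized storage, per weight.

    Assumed (hypothetical) storage model: each surviving weight costs
    `weight_bits`, and every weight position costs `mask_bits` to mark
    whether it is zero or not.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    # Average number of bits spent per original weight position.
    sparse_bits = weight_bits * (1.0 - sparsity) + mask_bits
    return dense_bits / sparse_bits


if __name__ == "__main__":
    for s in (0.5, 0.65):
        ideal = compression_ratio(s, mask_bits=0.0)  # no index overhead
        masked = compression_ratio(s)                # 1-bit mask per weight
        print(f"sparsity={s:.0%}: ideal {ideal:.1f}x, with bitmask {masked:.1f}x")
```

Under these assumptions, 8-bit quantization alone gives 4x, and 50% sparsity raises this to 8x with no index overhead or 6.4x with the bitmask, which is consistent with the range quoted above.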