Hi everybody,
I am currently trying to benchmark the inference of models when using ONNX, the TensorFlow C++ API, and Ahead-Of-Time (AOT) compilation.
The benchmark itself uses std::chrono to measure the runtime. To reduce fluctuation I time 500 calls of each network. The inputs are just randomly generated floats, since I'm not interested in the actual predictions.
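For context, the timing loop looks roughly like this (a minimal sketch; `run_inference`, the input size, and the RNG seed are placeholders for the actual ONNX / TensorFlow / AOT call in my code):

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <random>
#include <vector>

// Placeholder for a single forward pass; in the real benchmark this calls the
// ONNX Runtime session, the TensorFlow C++ SavedModel, or the AOT-compiled
// function instead of this no-op stub.
void run_inference(const std::vector<float>& input) {
  (void)input;
}

int main() {
  constexpr std::size_t kInputSize = 32;  // e.g. the 32 inputs of the CNN
  constexpr int kNumCalls = 500;          // repeated calls to reduce fluctuation

  // Randomly generated float inputs; the actual predictions don't matter here.
  std::mt19937 rng{42};
  std::uniform_real_distribution<float> dist{0.0f, 1.0f};
  std::vector<float> input(kInputSize);
  for (auto& x : input) x = dist(rng);

  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < kNumCalls; ++i) {
    run_inference(input);
  }
  const auto end = std::chrono::steady_clock::now();

  const auto total_us =
      std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
  std::cout << "mean runtime per call: "
            << static_cast<double>(total_us) / kNumCalls << " us\n";
  return 0;
}
```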
Benchmark for DNN:
The benchmark results for simple feed-forward networks are reasonably comparable; all inference approaches are at least within the same order of magnitude.
Benchmark for CNN:
When running a very simple CNN, the AOT-compiled model takes much longer (roughly a factor of 100).
The CNN is fairly simple: the model takes 32 inputs with 1 channel, the kernel sizes range from 1 to 4, and the follow-up feed-forward network is of moderate size (10 layers and 128 units).
My question is: why does the AOT network perform so badly? Is there a way to prevent this, for example by setting certain XLA flags?
After looking around I found that this is a fairly well-known problem, at least on GPU:
See here.
I have the feeling that AOT tries to be "clever" and maps the convolution kernel in a way that results in a very large number of operations, for example by reserving a buffer for each movement of the kernel window. This would at least explain why I found a very large buffer for the filters in the header of the AOT model (about 10x as large as that of a dense layer).
Thanks a lot for your time.
I appreciate this a lot.
PS: If you are interested, I can also upload the benchmark plots and a picture of the buffers.