I am currently writing some OpenCL kernels for a project. I want to use those kernels (and/or sequences of kernels) as layers in TensorFlow.
I am targeting GPUs with these kernels.
They operate on 4D data, like the convolutions do, but I am confused after hearing about Google’s PHWC4 data format. Some of my kernels might need a data transform too. I imagine I would have to do some kind of permute to get things to work.
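To make the permute I have in mind concrete, here is a minimal pure-Python sketch. It assumes PHWC4 means the channels are split into groups of 4 (zero-padded at the end), with each group stored as an H x W x 4 plane, i.e. a [B, ceil(C/4), H, W, 4] layout. That is only my reading of the format, and the function name `to_phwc4` is made up for illustration, so please correct me if the GPU delegate does something different.

```python
def to_phwc4(data, b, h, w, c):
    """Repack a flat NHWC list into a flat [B, ceil(C/4), H, W, 4] list.

    Assumption: this is what "PHWC4" means; check it against the
    delegate source before relying on it.
    """
    slices = (c + 3) // 4  # number of 4-channel groups; the last may be padded
    out = [0.0] * (b * slices * h * w * 4)  # padding channels stay 0.0
    for n in range(b):
        for y in range(h):
            for x in range(w):
                for ch in range(c):
                    # Source index in the usual NHWC layout.
                    src = ((n * h + y) * w + x) * c + ch
                    # Destination: pick the channel group, then the pixel,
                    # then the position inside the group of 4.
                    dst = (((n * slices + ch // 4) * h + y) * w + x) * 4 + ch % 4
                    out[dst] = data[src]
    return out


# A 1 x 1 x 2 x 5 tensor: two pixels with five channels each.
nhwc = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
packed = to_phwc4(nhwc, b=1, h=1, w=2, c=5)
```

With 5 channels the data is padded out to 8, so the fifth channel of each pixel lands in a second plane followed by three zeros. If that is right, any of my kernels that assume plain NHWC would need a repack like this on the way in and the inverse on the way out.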
I saw this guide, but it did not make it clear to me how to expose my OpenCL kernels as a layer.
I don’t really know where to start.
For differentiability support, I am comfortable working out the math; it’s not too hard. As for the implementation, my current idea is to use Enzyme, an autodiff tool. But I am getting ahead of myself: first the operator needs to be able to run in TensorFlow Lite on the GPU.
Any guidance is much appreciated.
Varun Nawathey