How to create a new operator with an OpenCL kernel

I am currently writing some OpenCL kernels for a project. I want to use those kernels (and/or sequences of kernels) as layers in TensorFlow.

I am targeting GPUs with these kernels.

They operate on 4D data, like the convolutions do, but I am confused after hearing about Google’s PHWC4 data format. Some of my kernels might need a data transform too. I imagine I would have to do some kind of permute between my layout and PHWC4 to get things to work.
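To make the permute concrete, here is a minimal sketch of how I currently understand PHWC4. I am assuming that channels are padded up to a multiple of 4 and grouped into slices of 4, so that the buffer is laid out as [slice, H, W, 4] for a batch of 1. The function names `hwc_to_phwc4` and `phwc4_to_hwc` are made up for illustration and are not part of any TensorFlow API; this is plain Python on flat lists, only meant to show the index math a conversion kernel would need.

```python
def hwc_to_phwc4(data, h, w, c):
    """Repack a flat HWC buffer (batch of 1) into a flat PHWC4 buffer.

    Assumed layout: [slices, H, W, 4], with channels zero-padded to a
    multiple of 4. Hypothetical helper, not a TensorFlow function.
    """
    slices = (c + 3) // 4  # number of 4-channel groups, rounded up
    out = [0.0] * (slices * h * w * 4)
    for y in range(h):
        for x in range(w):
            for ch in range(c):
                src = (y * w + x) * c + ch
                s, lane = divmod(ch, 4)  # which slice, which of its 4 lanes
                dst = ((s * h + y) * w + x) * 4 + lane
                out[dst] = data[src]
    return out


def phwc4_to_hwc(data, h, w, c):
    """Inverse of hwc_to_phwc4: drop the padding and restore HWC order."""
    out = [0.0] * (h * w * c)
    for y in range(h):
        for x in range(w):
            for ch in range(c):
                s, lane = divmod(ch, 4)
                src = ((s * h + y) * w + x) * 4 + lane
                out[(y * w + x) * c + ch] = data[src]
    return out
```

With this layout, a 1x2 image with 5 channels would occupy 2 slices (16 floats), and the second slice would hold only channel 4 plus three padding zeros per pixel. If my assumption about the layout is wrong, the index expressions above are what would need to change.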

I saw this guide, but it did not give me any clarity on how to bring my OpenCL kernels in as a layer.

I don’t really know where to start.

For differentiability support, I am comfortable working out the math; it’s not too hard. As for the implementation, my current idea is to use Enzyme, an autodiff tool. But I am getting ahead of myself: first, the operator should be able to run in TensorFlow Lite on the GPU.

Any guidance is much appreciated.

Varun Nawathey