I want to add OpenCL support to TF.
I have a working set of DL operators (cuDNN/MIOpen-like) and a mini-framework with relatively good performance: GitHub - artyom-beilis/dlprimitives: Deep Learning Primitives and Mini-Framework for OpenCL
It reaches performance comparable to TF/CUDA (75% for training and 90% for inference) and vastly outperforms existing OpenCL solutions like PlaidML and Caffe-OpenCL: DLPrimitives Blog
I discovered that there is something called a PluggableDevice. However, I'm looking for an API reference, and ideally a basic dummy device I can start from (the APU you had written about).
Is there anything like that?
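
For reference, my current understanding is that a plugin exposes an entry point roughly like the sketch below, based on my reading of tensorflow/c/experimental/stream_executor/stream_executor.h. The field names and the "OPENCL" naming are my assumptions and may well be off, which is exactly why I'm looking for an API reference or a dummy device to copy from.

```cpp
// Rough sketch of the PluggableDevice entry point (my assumptions, not verified):
// TF loads the plugin shared library and calls SE_InitPlugin to register the platform.
#include "tensorflow/c/experimental/stream_executor/stream_executor.h"

void SE_InitPlugin(SE_PlatformRegistrationParams* params, TF_Status* status) {
  // Platform/device naming -- I assume this is what ends up as "/device:OPENCL:0".
  params->platform->name = "OPENCL";
  params->platform->type = "OPENCL";

  // Callbacks that I would back with dlprimitives/OpenCL: device enumeration,
  // per-device state, streams, allocators, host<->device memcpy, etc.
  // params->platform_fns->get_device_count       = ...;
  // params->platform_fns->create_device          = ...;
  // params->platform_fns->create_stream_executor = ...;
}
```

If there is a minimal sample plugin that fills in these callbacks with a no-op or host-backed device, that would be the ideal starting point.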