I was wondering whether there is a minimal set of operators that could ship with a PluggableDevice and serve as a basis for computing all other operators on that device.
By this I mean that higher-level operators (like Conv2D) wouldn’t have a specific kernel implementation registered for that plugin; instead, TensorFlow would reuse operators from this minimal set (like matrix multiplication, addition, etc.) to compute Conv2D on the device.
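As a concrete illustration of the idea, a Conv2D can be lowered to a matrix multiplication via the classic im2col trick: gather each filter-sized patch of the input into a row of a matrix, then the whole convolution becomes one matmul. This is only a hedged sketch in plain Python (single channel, "valid" padding, no strides); the function name and list-based layout are mine, not TensorFlow's.

```python
def conv2d_via_matmul(x, w):
    """Compute a 'valid' 2D convolution by lowering to matmul (im2col).

    x: 2D list of shape (H, W); w: 2D list of shape (KH, KW).
    Illustrative only -- not how TensorFlow implements Conv2D kernels.
    """
    H, W = len(x), len(x[0])
    KH, KW = len(w), len(w[0])
    OH, OW = H - KH + 1, W - KW + 1
    # im2col: flatten every KHxKW patch of x into one row of `cols`.
    cols = [
        [x[i + di][j + dj] for di in range(KH) for dj in range(KW)]
        for i in range(OH) for j in range(OW)
    ]
    wvec = [w[di][dj] for di in range(KH) for dj in range(KW)]
    # The convolution is now a single matrix-vector product,
    # i.e. only multiply and add primitives are needed.
    flat = [sum(c * f for c, f in zip(row, wvec)) for row in cols]
    return [flat[i * OW:(i + 1) * OW] for i in range(OH)]

x = [[float(4 * i + j) for j in range(4)] for i in range(4)]
w = [[1.0, 1.0], [1.0, 1.0]]
print(conv2d_via_matmul(x, w))  # each output entry is a 2x2 patch sum
```

A device that only ships matmul and add kernels could, in principle, execute Conv2D this way, which is exactly the composability question above.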
Looking at the files in c/experimental/ops/, I have a feeling that if such a set exists, it could be the set of operators defined in those files.
Unfortunately, kernel implementations are not that composable at the moment. (We do have a project called TF Composite exploring that.) For now, PluggableDevice authors will have to manually register custom kernels for the high-level ops they want to support. And the minimal op set (needed to avoid transferring tensors back and forth between devices) varies based on the models the device is targeting.
Thank you for your answer!
With the advance of deep learning and the increasing number of operations, I believe this is a promising project. I look forward to it!