Hi @markdaoust @Bhack
(I removed all links - this chat keeps saying “Sorry, you can’t post a link to that host”)
I looked over those artifacts Mark mentioned.
> So personally I feel like this already has a lot of moving parts (too many), so I’d vote on the side of cleaning up and expanding the systems we already have instead of rolling new ones.
Yes, I agree - I would not advocate for creating a new system and having it sit alongside the existing infrastructure; that would just create clutter, redundancy, and confusion. My sense from looking at some of the current infrastructure, for example shape_inference.h, is that the basic abstractions are lacking. So I feel the first step, if there were interest, would be to discuss purely at a logical level what abstractions are needed to define the legal inputs of an op.
For instance, tf.gather_nd accepts over 1600 different legal input combinations, just counting int32 inputs:
python -m opschema.cl explain tf.gather_nd -i \
| awk '{ if ($2 == "int32" && $4 == "int32") { print $0 } }' \
| less
These combinations are specified in ops/tf/gather_nd.py
The TensorFlow counterpart seems to be partly expressed in common_shape_fns.cc:GatherNdShape. But it doesn’t seem like the set of primitives defined in shape_inference.h is sufficient to truly define the legal inputs of an op.
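To make “legal inputs” concrete, here is a minimal sketch (my own illustration, written from the documented behaviour of tf.gather_nd with batch_dims=0, not code from opschema or TensorFlow) of the rule that decides whether a params/indices shape pair is legal and what the output shape is:

from typing import Optional, Tuple

def gather_nd_output_shape(params_shape: Tuple[int, ...],
                           indices_shape: Tuple[int, ...]) -> Optional[Tuple[int, ...]]:
    """Return the output shape if the input shapes are legal, else None."""
    if len(params_shape) < 1 or len(indices_shape) < 1:
        return None
    index_depth = indices_shape[-1]        # how many leading dims of params are indexed
    if index_depth > len(params_shape):    # cannot index more dims than params has
        return None
    return indices_shape[:-1] + params_shape[index_depth:]

# params [4, 5, 6] indexed by indices [2, 3, 2] -> output [2, 3, 6]
assert gather_nd_output_shape((4, 5, 6), (2, 3, 2)) == (2, 3, 6)
# illegal: index depth 4 exceeds the rank of params
assert gather_nd_output_shape((4, 5, 6), (7, 4)) is None

The point is that “legal” here means a relation between the two input shapes (and their dtypes), not just a per-tensor rank or dtype check.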
My sense is that the first step, if this were viable, would be to talk to the people who wrote shape_inference.{h,cc} and discuss what motivated its design and what other problems came up along the way.
I would love to hear from other developers who have thought about this problem, as well as your thoughts on the other aspects of my proposal, such as comp_dims and the notion of signatures, layouts, etc.
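To illustrate what I mean by signatures, layouts, and comp_dims, here is a purely hypothetical Python sketch; the class and method names are invented for this post and are not the actual opschema API:

# Toy recorder of schema declarations, for illustration only.
class OpSchema:
    def __init__(self, name):
        self.name = name
        self.index_groups, self.args, self.comp_dims, self.outputs = {}, {}, {}, []

    def index(self, sym, desc):        # a named group of dimensions
        self.index_groups[sym] = desc

    def arg_tensor(self, name, sig):   # an input tensor's signature over index groups
        self.args[name] = sig

    def comp_dim(self, sym, fn):       # a dimension computed from other index groups
        self.comp_dims[sym] = fn

    def return_tensor(self, sig):      # an output layout over index groups
        self.outputs.append(sig)

op = OpSchema('tf.gather_nd')
op.index('b', 'batch dims of indices')
op.index('r', 'read-location dims of params')
op.index('e', 'element dims of params copied through to the output')
op.index('c', 'index depth (last dim of indices)')
op.arg_tensor('params', 'r e')
op.arg_tensor('indices', 'b c')
op.comp_dim('c', lambda r_dims: len(r_dims))   # index depth == rank of the 'r' group
op.return_tensor('b e')

The intent is that each input’s shape is a concatenation of named index groups (its signature), some dimensions are computed from others (comp_dims), and the output is simply another layout over the same groups.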
> The other concern I have about this approach is that there are 3000+ APIs in TensorFlow at the moment, and even if we can find a good sponsor for this, it would be months of work to get the top 10% of the API covered.
When you say ‘APIs’ do you mean APIs for each op, or something else?
For sure it would be a ton of work. However, I spent almost four months full-time on opschema (with many blind alleys), so I’d be eager to put in a significant effort on the TF codebase. Part of my motivation, I’ll admit, is to pad my CV.
Also, my hope is that ultimately it will save a lot of time for TF core developers. Defining an op in the schema automatically gives you a generator for unit tests, so edge cases that would otherwise be discovered late as new issues (as happened with LSTMBlockCell) can be discovered very early, and more thoroughly. As an example, it took about an hour to write the schema for LSTMBlockCell, and I can now use it to quickly discover inputs that crash it.
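As a rough illustration of the kind of generated test I mean (a sketch, reusing the hypothetical gather_nd_output_shape helper from earlier in this thread, not the actual opschema generator): enumerate candidate shapes, predict legality from the schema, then check that the op agrees.

import itertools
import numpy as np
import tensorflow as tf

def check_gather_nd(params_shape, indices_shape):
    predicted = gather_nd_output_shape(params_shape, indices_shape)
    params = np.zeros(params_shape, dtype=np.float32)
    indices = np.zeros(indices_shape, dtype=np.int32)   # zeros keep index values in range
    try:
        actual = tuple(tf.gather_nd(params, indices).shape.as_list())
    except (tf.errors.InvalidArgumentError, ValueError):
        actual = None
    return predicted == actual

dims = (1, 2, 3)
for p_rank, i_rank in itertools.product((1, 2, 3), (1, 2)):
    for p_shape in itertools.product(dims, repeat=p_rank):
        for i_shape in itertools.product(dims, repeat=i_rank):
            if not check_gather_nd(p_shape, i_shape):
                print('disagreement:', p_shape, i_shape)

Each disagreement is either a bug in the schema or something worth filing against TF, and the enumeration can be made as dense or as sparse as you like.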
Best,
Henry