Support for ONNX, or better conversion support from ONNX -> TF?

Hi all, I posted this in the TensorFlow.js Discord but am reposting here for additional visibility.

I’ve been trying to port a number of recent research papers’ repositories to TensorFlow.js with limited success.

It seems like quite a lot of the recent research code implementations I’ve seen are written in PyTorch (the article “PyTorch vs TensorFlow in 2023” ostensibly provides some hard numbers, though I’m not clear where its data comes from).

PyTorch → ONNX → TensorFlow → TensorFlow.js seems to be the generally recommended conversion path for a model, but I’ve yet to successfully convert a model all the way from PyTorch to TFJS (here’s an example: “Working ONNX file converted to tfjs (via tf SavedModel) doesn't work”, tensorflow/tfjs issue #5832 on GitHub). On top of that, of the two main libraries for ONNX → TensorFlow conversion - onnx/onnx-tensorflow (a TensorFlow backend for ONNX) and gmalivenko/onnx2keras (converts an ONNX model graph to Keras format) - the former is no longer maintained and the latter hasn’t seen a commit since mid-2021.
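For reference, the rough shape of that path in code - a minimal sketch with illustrative names (`net` stands for whatever PyTorch module is being exported, and the input shape is just an example):

import torch
import onnx
from onnx_tf.backend import prepare

# 1. PyTorch -> ONNX
dummy_input = torch.randn(1, 3, 224, 224)  # example input shape
torch.onnx.export(net, dummy_input, "model.onnx", opset_version=13)

# 2. ONNX -> TensorFlow SavedModel, via onnx-tensorflow
tf_rep = prepare(onnx.load("model.onnx"))
tf_rep.export_graph("saved_model")

# 3. SavedModel -> TFJS then happens via the tensorflowjs_converter CLI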

I’m curious whether the TensorFlow team has any plans to better support ONNX models, either through conversion to TensorFlow or natively. Or does the team have a recommendation for ingesting the latest research into TensorFlow, short of reimplementing the PyTorch models in TensorFlow by hand?

I think there are a number of recent models whose inference speeds are fast enough to run on the web, and TensorFlow.js would be a great vehicle for sharing them.

Thanks for the post @thekevinscott!

I would love to hear from others too who are having issues here. Specifically:

  1. What types of models you are trying to convert
  2. The main blocker you are encountering
  3. The use case, and the impact it would have, if we found a way to make PyTorch to TensorFlow / JS more seamless

If others seeing this thread are trying something similar (either PyTorch to TF Python or PyTorch to TFJS), do let us know! Our command line converter from TF to TFJS is actively maintained, so my guess is that the main bottleneck is the ONNX stage?

I can start:

I’ve tried to convert DDNM. I was able to convert it to ONNX and run inference in JavaScript. I was also able to convert it to TensorFlow and run inference. I was able to convert it to TensorFlow.js as well, but it fails at runtime with the issue I posted above.

I also tried to convert MIRNet (I know there’s a TF and TFJS port available, but it appears that not all the weights were converted, and the repo’s .h5 Google Drive link is broken). With this, the ONNX version works in JavaScript, but I was unable to convert from ONNX to TF; the tool fails with:

ValueError: '/conv_in/Conv_pad/' is not a valid root scope name. A root scope name has to match the following pattern: ^[A-Za-z0-9.][A-Za-z0-9_.\\/>-]*$

And none of the other name-related command line options worked either.

my guess is that the main bottleneck is the ONNX stage?

While the second seems like an ONNX → TF problem, the first seems like a legitimate bug in the TFJS converter (since the Python TF version works). Still, to your point, I agree that the ONNX → TF conversion step is the one I’m most worried about, since there’s no actively maintained library I’m aware of for doing that conversion.

Thanks for this extra context. For the TFJS converter issues you found, do you know why it failed? A missing op or such? Or is there any extra error message you can share for the TFJS converter? I can then pass that back to our team to check whether it is something we are aware of. A missing op is usually the most common issue, and can only be solved by gaining more op parity; if certain ops keep coming up, then when and if we have capacity to look into this, we can at least prioritize based on the valuable feedback in this thread to see what would help the most folk.

For the TFJS converter issues you found, do you know why it failed? A missing op or such? Or is there any extra error message you can share for the TFJS converter?

The forum is not letting me repost the GitHub link, but I posted details in the GitHub thread linked from the original comment.

Weirdly, I get different errors depending on when I switch backends in Node - I wonder if that might offer a relevant clue?

@Joana @Marcus Can you check what is going on with the forum? Kevin can’t post valid links.

I don’t know if this will be useful to you guys, but I am building my own tool to generate saved_model, h5, and tflite files from ONNX.

saved_model is a specification that does not allow ops to have leading slashes in their names, so my conversion tool automatically replaces leading slashes with harmless strings. I believe this problem is caused by the hdf5 specification.
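Roughly the kind of rewrite I mean - a simplified sketch using the onnx package, not the exact code my tool uses:

import onnx

model = onnx.load("model.onnx")
for node in model.graph.node:
    # TF rejects names with a leading "/", so prepend a harmless string
    if node.name.startswith("/"):
        node.name = "op" + node.name
onnx.save(model, "model_renamed.onnx")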

onnx2tf

# -i: input ONNX file
# -osd: also emit SignatureDefs so the SavedModel can be converted further
# -cotof: check that the ONNX and converted TF outputs match elementwise
# -cotoa: absolute tolerance used for that check
onnx2tf \
-i mirnet_180x320.onnx \
-osd \
-cotof \
-cotoa 1e-4

# then convert the SavedModel to a TFJS graph model
tensorflowjs_converter \
--input_format tf_saved_model \
--output_format tfjs_graph_model \
saved_model \
tfjs_model

Writing weight file tfjs_model/model.json...

The tool tries to convert as much as possible taking into account TPU and GPU Delegate support.

I am also creating a set of tools for modifying ONNX graphs at will.

simple-onnx-processing-tools

I know that there are various difficulties in converting to TFJS, but I also know that conversion succeeds with a fairly high probability if you modify the ONNX graph first, and so on.

This is my first time posting a comment in this forum, so I apologize if some of my comments are not in line with policy.

@PINTO0309 Thank you for sharing this work with us! I was hoping to find gems like this from folk reading this thread, so it is good to see what others have been doing to solve this so far; I am sure many folk are affected by this more generally. As you say, you also target TFLite as an output, so it is good to know this is useful for folk converting to other TF ecosystems too.

I’ve had these same issues. The hardest case is when there’s a custom operation that needs to be implemented, but just getting all the Python dependencies correct and getting the flow to work has been super challenging. Even converting from the Python version of TF to TFJS can be tough. I managed to get Super SloMo and deinterlacing models converted and working with the pretrained weights. However, like Kevin, despite many hours of attempts, I have yet to get a PyTorch model converted. My next target is the MAXIM models, which are being refactored to work with arbitrary resolutions.

Thank you Dan for weighing in. This is very valuable feedback, and I shall be sure to bring this up in our future team discussions internally 🙂

Converting to TFJS is a notoriously frustrating process. I once had a model with the following operation, which crashed during inference:

x = 1 - x

Wrapping it in a separate layer solved the problem:

from tensorflow import keras

class OneMinus(keras.layers.Layer):
    def call(self, x):
        return 1 - x

The moral is, it’s usually possible to get what you want if you have fine control over the TensorFlow model. That is not the case when the model is converted from ONNX, though.

If your source model is in PyTorch, you might want to take a look at the Nobuco library. It should be flexible enough to let you fix such problems.
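Basic usage looks roughly like this - a sketch, so treat the argument names as approximate and check the Nobuco README for the current API:

import torch
import nobuco
from nobuco import ChannelOrder

# pytorch_module is your torch.nn.Module; the dummy input matches its shape
dummy_input = torch.randn(1, 3, 256, 256)
keras_model = nobuco.pytorch_to_keras(
    pytorch_module,
    args=[dummy_input],
    inputs_channel_order=ChannelOrder.TENSORFLOW,
)
keras_model.save("saved_model")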

Thanks for the feedback and welcome to the TF forum Lex!

I am trying to convert this model: Megvii-BaseDetection/YOLOX on GitHub (at commit bb9185c095dfd7a8015a1b82f3e9a065090860b8).

It just wouldn’t produce the right numbers after running in TFJS. I think pth → onnx → tf → tfjs is too many steps, and something may get lost along the way.
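One way to narrow down which step loses accuracy is to feed the same input to each stage and compare - a rough sketch, assuming an NCHW float input; the actual input name, shape, and signature depend on the converted model:

import numpy as np
import onnxruntime as ort
import tensorflow as tf

x = np.random.rand(1, 3, 640, 640).astype(np.float32)  # example shape

# output of the ONNX stage
sess = ort.InferenceSession("model.onnx")
onnx_out = sess.run(None, {sess.get_inputs()[0].name: x})[0]

# output of the TensorFlow SavedModel stage
infer = tf.saved_model.load("saved_model").signatures["serving_default"]
tf_out = list(infer(tf.constant(x)).values())[0].numpy()

# a large max difference points at the broken conversion step
print(np.abs(onnx_out - tf_out).max())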

Thank you for sharing; indeed, it is known to be tricky to go from PyTorch to TensorFlow via ONNX. I have passed this on to the team.

Currently I am just using ONNX Runtime, which works in Node.js and the browser, but it would be far more convenient to have the model in TFJS, as TFJS works well for the pre/post-processing steps, which ONNX Runtime cannot do.