TF-Serving with TensorRT seems to compress batches to 1

I have a model that runs successfully under TensorFlow Serving. I then convert it with saved_model_cli; below is the exact command line:

docker run --rm --user 3004 --gpus all -it \
    -v /path/to/tensorflow_serving:/work/tf_model \
    -e CUDA_VISIBLE_DEVICES=1 \
    harbor.private.com/dev/tf:1.15.5-gpu /usr/local/bin/saved_model_cli convert \
    --dir /work/tf_model/buyer_sent_model_pb_02/01 \
    --output_dir /work/tf_model/buyer_sent_model_trt/02 \
    --tag_set serve \
    tensorrt --precision_mode FP32 --max_batch_size 16 --is_dynamic_op True
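
To verify that the conversion kept a dynamic batch dimension, the signature of the converted SavedModel can be inspected. A minimal sketch, assuming the output directory above is accessible locally, a TF 2.x build with TensorRT support, and the default serving_default signature key:

import tensorflow as tf

# Sketch: load the converted SavedModel and print its serving signature;
# the batch dimension of Input-Token / Input-Segment should show as None.
loaded = tf.saved_model.load("/work/tf_model/buyer_sent_model_trt/02")
sig = loaded.signatures["serving_default"]
print(sig.structured_input_signature)
print(sig.structured_outputs)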

Then I serve it with TensorFlow Serving, using this command line:

docker run -d --gpus all -p 8501:8501 --mount type=bind,source=/path/to/tensorflow_serving/my_model_dir,target=/models/my_model_dir \
-e MODEL_NAME=my_model_name -e CUDA_VISIBLE_DEVICES=1 \
-e TF_FORCE_GPU_ALLOW_GROWTH='true' \
-t harbor.private.com/dev/tf-serving:2.4.1-gpu
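
The signature the server actually loaded can be double-checked through the REST metadata endpoint. A minimal sketch, assuming the container above is reachable on localhost:8501 and the model name my_model_name from the serving command:

import requests

# Sketch: query TensorFlow Serving's REST metadata endpoint; the expected
# input names and shapes are listed under
# metadata -> signature_def -> signature_def -> serving_default.
resp = requests.get("http://localhost:8501/v1/models/my_model_name/metadata")
print(resp.json())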

my input:

{
    "inputs": {
             "Input-Token": data1,
             "Input-Segment": data2
        }
}

data1 and data2 are both lists of length 16.

data1:

[
    [101, 3766, 752, 8024, 6814, 3341, 6760, 6760, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2218, 3221, 8238, 697, 1259, 1408, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2769, 6206, 743, 2643, 5948, 1947, 6163, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 930, 702, 6963, 3221, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2218, 3221, 2769, 6821, 6804, 791, 1921, 1157, 2802, 2458, 3341, 4500, 749, 671, 833, 6230, 2533, 679, 1916, 3265, 102, 0, 0, 0],
    [101, 2769, 3221, 6206, 2864, 4706, 5296, 3890, 5011, 4638, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 1962, 4638, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 1355, 749, 1557, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 6843, 3819, 4706, 3344, 1408, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 872, 1962, 6435, 7309, 6821, 702, 743, 671, 6843, 671, 3221, 2582, 720, 702, 6843, 3791, 102, 0, 0, 0, 0, 0, 0, 0],
    [101, 1119, 3247, 2458, 1993, 8043, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 6716, 7770, 8725, 8175, 1408, 8043, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2769, 743, 749, 6821, 702, 121, 119, 8146, 4638, 4385, 1762, 4684, 2970, 4802, 6371, 3119, 6573, 2218, 1377, 809, 749, 511, 1968, 102],
    [101, 4692, 1168, 928, 2622, 1726, 1908, 678, 1521, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 155, 4772, 7027, 7481, 1377, 809, 3022, 679, 6585, 6716, 4638, 3688, 6132, 720, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2571, 6853, 4157, 3766, 3300, 2571, 6853, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
]

data2:

[
    [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
]

This set of data works fine.
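
The exact client code is not shown above; for reference, a minimal sketch of how such a request can be sent, assuming the REST endpoint on port 8501, the model name from the serving command, and data1/data2 being the lists above:

import requests

# Sketch: POST the request body shown above to the TensorFlow Serving
# REST predict endpoint (endpoint and model name assumed from the question).
payload = {"inputs": {"Input-Token": data1, "Input-Segment": data2}}
resp = requests.post("http://localhost:8501/v1/models/my_model_name:predict", json=payload)
print(resp.json())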

But when I change data2 to:

[
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
]

This set of data1 and data2 runs into trouble.

On the server side, the log shows:

2021-07-01 08:29:53.363285: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:587] Running native segment forTRTEngineOp_26 due to failure in verifying input shapes: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,25,768], [1,25,768]]
2021-07-01 08:29:58.734463: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:587] Running native segment forTRTEngineOp_26 due to failure in verifying input shapes: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,25,768], [1,25,768]]
2021-07-01 08:29:58.863914: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:587] Running native segment forTRTEngineOp_26 due to failure in verifying input shapes: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,30,768], [1,30,768]]

On the client side, I got:

{'error': 'Timed out waiting for notification'}

It seems TensorFlow compresses data2 from a batch of 16 down to a batch of 1?

What is the problem in my case? Am I missing something?

Environment

NVIDIA driver version: 455.38 (on the host)
GPU type: 2080 Ti, used for both conversion and serving

tensorflow:1.15.5-gpu for the conversion
tensorflow-serving:2.4.1-gpu for serving
Both Docker images were pulled from the official repositories on Docker Hub.

Hello @Arashi

Thank you for using TensorFlow.
For the data2 that consists of all zeros, please add logging statements to capture the intermediate tensor shapes, so we can investigate where the reshape is happening. Also, the model is dependent on the input distribution; if the model works fine when non-zero input is given, please check with any random input to confirm the behaviour (a sketch of such a check is below).
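
A minimal sketch of such a check, assuming the REST endpoint from the question (localhost:8501, model my_model_name), the 25-token sequence length of the working example, and an arbitrary token-ID range:

import random
import requests

# Sketch of the suggested check: send a batch of 16 random, non-zero inputs
# and compare the behaviour with the all-zero data2 case. The 101/102 start
# and end markers follow the pattern visible in data1 above; the rest of the
# token-ID range is arbitrary.
batch_size, seq_len = 16, 25
data1 = [[101] + [random.randint(1, 20000) for _ in range(seq_len - 2)] + [102]
         for _ in range(batch_size)]
data2 = [[random.randint(0, 1) for _ in range(seq_len)] for _ in range(batch_size)]

payload = {"inputs": {"Input-Token": data1, "Input-Segment": data2}}
resp = requests.post("http://localhost:8501/v1/models/my_model_name:predict", json=payload)
print(resp.status_code, resp.json())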
