TF-Serving with TensorRT seems to compress batches to 1

I have a model that runs successfully under TensorFlow Serving. I then convert it with saved_model_cli; below is the exact command line:

docker run --rm --user 3004 --gpus all -it \
    -v /path/to/tensorflow_serving:/work/tf_model \
    -e CUDA_VISIBLE_DEVICES=1 \
    harbor.private.com/dev/tf:1.15.5-gpu /usr/local/bin/saved_model_cli convert \
    --dir /work/tf_model/buyer_sent_model_pb_02/01 \
    --output_dir /work/tf_model/buyer_sent_model_trt/02 \
    --tag_set serve \
    tensorrt --precision_mode FP32 --max_batch_size 16 --is_dynamic_op True
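
To verify that the conversion kept a dynamic batch dimension, the signature of the converted SavedModel can be inspected. A minimal sketch, assuming the output directory above is accessible locally, a TF 2.x build with TensorRT support, and the default serving_default signature key:

import tensorflow as tf

# Sketch: load the converted SavedModel and print its serving signature;
# the batch dimension of Input-Token / Input-Segment should show as None.
loaded = tf.saved_model.load("/work/tf_model/buyer_sent_model_trt/02")
sig = loaded.signatures["serving_default"]
print(sig.structured_input_signature)
print(sig.structured_outputs)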

Then I serve it with TensorFlow Serving, using this command line:

docker run -d --gpus all -p 8501:8501 --mount type=bind,source=/path/to/tensorflow_serving/my_model_dir,target=/models/my_model_dir \
-e MODEL_NAME=my_model_name -e CUDA_VISIBLE_DEVICES=1 \
-e TF_FORCE_GPU_ALLOW_GROWTH='true' \
-t harbor.private.com/dev/tf-serving:2.4.1-gpu
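
The signature the server actually loaded can be double-checked through the REST metadata endpoint. A minimal sketch, assuming the container above is reachable on localhost:8501 and the model name my_model_name from the serving command:

import requests

# Sketch: query TensorFlow Serving's REST metadata endpoint; the expected
# input names and shapes are listed under
# metadata -> signature_def -> signature_def -> serving_default.
resp = requests.get("http://localhost:8501/v1/models/my_model_name/metadata")
print(resp.json())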

my input:

{
    "inputs": {
             "Input-Token": data1,
             "Input-Segment": data2
        }
}

data1 and data2 are both lists of length 16.

data1:

[
    [101, 3766, 752, 8024, 6814, 3341, 6760, 6760, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2218, 3221, 8238, 697, 1259, 1408, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2769, 6206, 743, 2643, 5948, 1947, 6163, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 930, 702, 6963, 3221, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2218, 3221, 2769, 6821, 6804, 791, 1921, 1157, 2802, 2458, 3341, 4500, 749, 671, 833, 6230, 2533, 679, 1916, 3265, 102, 0, 0, 0],
    [101, 2769, 3221, 6206, 2864, 4706, 5296, 3890, 5011, 4638, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 1962, 4638, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 1355, 749, 1557, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 6843, 3819, 4706, 3344, 1408, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 872, 1962, 6435, 7309, 6821, 702, 743, 671, 6843, 671, 3221, 2582, 720, 702, 6843, 3791, 102, 0, 0, 0, 0, 0, 0, 0],
    [101, 1119, 3247, 2458, 1993, 8043, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 6716, 7770, 8725, 8175, 1408, 8043, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2769, 743, 749, 6821, 702, 121, 119, 8146, 4638, 4385, 1762, 4684, 2970, 4802, 6371, 3119, 6573, 2218, 1377, 809, 749, 511, 1968, 102],
    [101, 4692, 1168, 928, 2622, 1726, 1908, 678, 1521, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 155, 4772, 7027, 7481, 1377, 809, 3022, 679, 6585, 6716, 4638, 3688, 6132, 720, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2571, 6853, 4157, 3766, 3300, 2571, 6853, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
]

data2:

[
    [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
]

This set of data works fine.
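
The exact client code is not shown above; for reference, a minimal sketch of how such a request can be sent, assuming the REST endpoint on port 8501, the model name from the serving command, and data1/data2 being the lists above:

import requests

# Sketch: POST the request body shown above to the TensorFlow Serving
# REST predict endpoint (endpoint and model name assumed from the question).
payload = {"inputs": {"Input-Token": data1, "Input-Segment": data2}}
resp = requests.post("http://localhost:8501/v1/models/my_model_name:predict", json=payload)
print(resp.json())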

But when I change data2 to:

[
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
]

This set of data1 and data2 runs into trouble.

On the server side, the log shows:

2021-07-01 08:29:53.363285: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:587] Running native segment forTRTEngineOp_26 due to failure in verifying input shapes: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,25,768], [1,25,768]]
2021-07-01 08:29:58.734463: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:587] Running native segment forTRTEngineOp_26 due to failure in verifying input shapes: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,25,768], [1,25,768]]
2021-07-01 08:29:58.863914: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:587] Running native segment forTRTEngineOp_26 due to failure in verifying input shapes: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,30,768], [1,30,768]]

On the client side, I got:

{'error': 'Timed out waiting for notification'}

It seems TensorFlow compresses data2 from a batch of 16 down to a batch of 1?

What is the problem in my case? Am I missing something?

Environment

NVIDIA driver version: 455.38 (on the host)
GPU type: 2080 Ti, used for both conversion and serving

tensorflow:1.15.5-gpu for the conversion
tensorflow-serving:2.4.1-gpu for serving
Both Docker images were pulled from the official repositories on Docker Hub.

Hello @Arashi

Thank you for using TensorFlow.
For the data2 that consists of all zeros, please add logging statements to capture the intermediate tensor shapes, so we can investigate where the reshape is happening. Also, the model is dependent on the input distribution; if the model works fine when non-zero input is given, please check with any random input to confirm the behaviour (a sketch of such a check is below).
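
A minimal sketch of such a check, assuming the REST endpoint from the question (localhost:8501, model my_model_name), the 25-token sequence length of the working example, and an arbitrary token-ID range:

import random
import requests

# Sketch of the suggested check: send a batch of 16 random, non-zero inputs
# and compare the behaviour with the all-zero data2 case. The 101/102 start
# and end markers follow the pattern visible in data1 above; the rest of the
# token-ID range is arbitrary.
batch_size, seq_len = 16, 25
data1 = [[101] + [random.randint(1, 20000) for _ in range(seq_len - 2)] + [102]
         for _ in range(batch_size)]
data2 = [[random.randint(0, 1) for _ in range(seq_len)] for _ in range(batch_size)]

payload = {"inputs": {"Input-Token": data1, "Input-Segment": data2}}
resp = requests.post("http://localhost:8501/v1/models/my_model_name:predict", json=payload)
print(resp.status_code, resp.json())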
