Hi there,
I’ve been running into sporadic latency spikes when using TensorFlow Serving. I see the issue with TensorFlow Decision Forests (TFDF) gradient boosted tree models, TFDF random forest models, and TensorFlow deep learning models.
I reproduced the issue using the classic penguin classification dataset. The majority of my predictions complete in under 5 ms; however, I randomly get spikes above 50 ms, which worries me about putting these models into production.
I created a repo here that reproduces the issue along with instructions in the README.
The training code is from TensorFlow Decision Forests’ penguin classification tutorial here. Here is a snippet of the training code:
import tensorflow_decision_forests as tfdf
from wurlitzer import sys_pipes

# Convert the pandas DataFrames into TensorFlow datasets.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_ds_pd, label=label)
test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_ds_pd, label=label)

# Specify the model.
model_1 = tfdf.keras.RandomForestModel()

# Optionally, add evaluation metrics.
model_1.compile(metrics=["accuracy"])

# Train the model.
# "sys_pipes" is optional. It enables the display of the training logs.
with sys_pipes():
    model_1.fit(x=train_ds)
And here is a snippet of the request code calling TF serving:
data = {
    "instances": [
        {
            "bill_length_mm": 20.0,
            "body_mass_g": 100.0,
            "bill_depth_mm": 100.0,
            "flipper_length_mm": 100.0,
            "island": "Mallorca",
            "sex": "male",
            "year": 2021,
        }
    ]
}
import json
import time

import requests

def make_request(data):
    # Record the start time of the prediction.
    start_time = time.time()
    # Send the prediction request to fake_model_id_gbt.
    requests.post(
        "http://localhost:8501/v1/models/fake_model_id_gbt:predict",
        headers={"Content-Type": "application/json"},
        data=json.dumps(data),
    )
    # Record the end time and compute the elapsed time in milliseconds.
    end_time = time.time()
    pred_time = (end_time - start_time) * 1000
    # print("prediction time was {} ms".format(pred_time))
    return pred_time
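To quantify how often the spikes occur, it can help to run make_request many times and summarize the timings as percentiles rather than watching individual requests. A minimal sketch of that summary step (summarize_latencies is a helper I made up for illustration, and the timings below are fake placeholder values, not measurements):

```python
import statistics

def summarize_latencies(latencies_ms):
    """Return median, p99 (nearest-rank), and max for a list of timings in ms."""
    ordered = sorted(latencies_ms)
    p50 = statistics.median(ordered)
    # Nearest-rank p99: the value 99% of the way through the sorted list.
    p99 = ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]
    return {"p50": p50, "p99": p99, "max": ordered[-1]}

# Fake example: 99 fast requests (~3 ms) plus one 60 ms spike.
timings = [3.0] * 99 + [60.0]
print(summarize_latencies(timings))  # {'p50': 3.0, 'p99': 60.0, 'max': 60.0}
```

In practice the timings list would be filled by something like `[make_request(data) for _ in range(100)]` against the running server; a large gap between p50 and p99 makes the spikes easy to demonstrate in a report.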
I am running on an M1 MacBook Air, using TFDF v0.2.3 and this Docker image.
Any ideas as to what could be causing the spikes?
Thank you,
Shayan