I can’t make predictions with a TF-DF model. I’m working in Colab and I get an “AssertionError” whenever I call something like model(Xtest) or model.predict(Xtest), although everything runs smoothly up to that point.
Any idea what is going wrong?
Can you post the code you used?
Using the beginner colab, after the cell with…) (in the section Prepare this model for TensorFlow Serving) I’ve tried this:
Indeed, it works this way! Thanks a lot!
I guess I was feeding in the wrong data types…
Here’s my code. There are probably quite a few awkward things in it, so if anyone has time to comment, that would be very helpful. I was trying to use the iris dataset:
import tensorflow_decision_forests as tfdf
import pandas as pd
from sklearn.datasets import load_iris
Thanks for the code snippet. Some of the error messages are certainly not explicit enough. We will improve that in the next TF-DF release.
Regarding your example, the issue is that Xtest is a Pandas dataframe, while predict expects a TensorFlow dataset, a NumPy array or a Tensor (or one of the more exotic formats such as DatasetCreator).
Your code could be rewritten as follows:
import tensorflow_decision_forests as tfdf
import pandas as pd
from sklearn.datasets import load_iris
iris_frame = load_iris(as_frame=True)
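# Assumed reconstruction: features taken from `iris_frame.data`, labels from `iris_frame.target`.
iris_dataframe = pd.DataFrame(data=iris_frame.data)
iris_dataframe["species"] = iris_frame.target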
# Replace the spaces by "_" in the feature names.
iris_dataframe = iris_dataframe.rename(columns=lambda x: x.replace(" ","_"))
# Shuffle the dataset
iris_dataframe = iris_dataframe.sample(frac=1)
# Train/Test split.
train_dataframe = iris_dataframe[:100]
test_dataframe = iris_dataframe[100:]
# Converts from Pandas dataframes to TensorFlow datasets.
train_dataset = tfdf.keras.pd_dataframe_to_tf_dataset(train_dataframe, label="species")
test_dataset = tfdf.keras.pd_dataframe_to_tf_dataset(test_dataframe, label="species")
# Train the model.
model = tfdf.keras.RandomForestModel()
model.fit(train_dataset)
# Generate the predictions.
predictions = model.predict(test_dataset)
Keras can also consume NumPy arrays directly. This option is less powerful than Pandas dataframes, but in your case it leads to more compact code:
import tensorflow_decision_forests as tfdf
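import numpy as np
from sklearn.datasets import load_iris
iris_frame = load_iris()
# Assumed reconstruction: features taken from `iris_frame.data`, labels from `iris_frame.target`.
features = iris_frame.data
labels = iris_frame.target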
# Shuffle the examples.
permutations = np.random.permutation(features.shape[0])
features = features[permutations]
labels = labels[permutations]
train_features = features[:100]
train_labels = labels[:100]
test_features = features[100:]
test_labels = labels[100:]
model = tfdf.keras.RandomForestModel()
model.fit(x=train_features, y=train_labels)
# Generate the predictions.
predictions = model.predict(test_features)
Edit: Shuffle the examples before the train/test split.
The Iris examples returned by sklearn are grouped by class. A [:100] vs [100:] split would therefore be of poor quality for a train/test evaluation, since the test part would contain a single class. Shuffling the examples before the split solves the issue. The example above was edited accordingly.
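To make the effect of the shuffle concrete, here is a minimal sketch that uses only the Python standard library. It does not load the real dataset: the `labels` list is a synthetic stand-in that mimics the Iris layout (50 examples per class, stored in class order), and `split_class_counts` is a hypothetical helper written for this illustration.

```python
import random
from collections import Counter

# Synthetic stand-in for the Iris labels: 50 examples per class, in class order.
labels = [0] * 50 + [1] * 50 + [2] * 50


def split_class_counts(labels, n_train=100):
    """Return per-class counts for the train and test parts of a positional split."""
    return Counter(labels[:n_train]), Counter(labels[n_train:])


# Without shuffling, the test part holds only class 2,
# and the training part never sees that class.
train_counts, test_counts = split_class_counts(labels)
print("unshuffled train:", dict(train_counts))
print("unshuffled test: ", dict(test_counts))

# Shuffling first (fixed seed for repeatability) mixes the classes across both parts.
shuffled = labels[:]
random.Random(0).shuffle(shuffled)
shuffled_train_counts, shuffled_test_counts = split_class_counts(shuffled)
print("shuffled train:  ", dict(shuffled_train_counts))
print("shuffled test:   ", dict(shuffled_test_counts))
```

With real data, the features and labels have to be reordered with the same permutation, as in the NumPy snippet above, so that each example keeps its own label.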