How deeply do you evaluate your model's performance? Do you slice your data and evaluate each slice separately? Do you try to measure fairness or accuracy for different subsets of your users or use cases?
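To make "slicing" concrete, here is a minimal sketch. It assumes you have labels, predictions, and one slice key per example (for example, a user segment). The function names and the toy data are hypothetical. The code computes accuracy overall and within each slice, then reports the gap between the best and worst slice as one simple disparity measure. This is only an illustration, not a full fairness metric.

```python
from collections import defaultdict


def accuracy_by_slice(y_true, y_pred, slice_keys):
    """Return overall accuracy and a dict of accuracy per slice key (hypothetical helper)."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for t, p, k in zip(y_true, y_pred, slice_keys):
        totals[k] += 1
        correct[k] += int(t == p)
    per_slice = {k: correct[k] / totals[k] for k in totals}
    overall = sum(correct.values()) / sum(totals.values())
    return overall, per_slice


def max_slice_gap(per_slice):
    """Difference between the best- and worst-performing slice (a simple disparity measure)."""
    return max(per_slice.values()) - min(per_slice.values())


# Toy data: the model looks fine overall but does much worse on "returning" users.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
segment = ["new", "new", "new", "returning", "returning", "returning"]

overall, per_slice = accuracy_by_slice(y_true, y_pred, segment)
print(overall)
print(per_slice)
print(max_slice_gap(per_slice))
```

The point of the toy example is that a single aggregate number can hide a large gap between subsets. Looking at per-slice metrics is what surfaces that gap.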