Hi,
I'm using tfdf.keras.GradientBoostedTreesModel with verbose=2 and num_trees=500, and I added several metrics: model.compile(metrics=['binary_crossentropy', 'mse', 'AUC', 'accuracy']).
I would like to see the training loss versus the validation loss at each iteration (in my case, after each tree).
I have two problems:
1- The logs do not show the loss after every tree is added; a line is only printed every few dozen trees.
2- Not all of the metrics I added are reported (I saw here that this is not supported).
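Judging by the timestamps in the log below, the progress lines seem to be printed on a timer (about every 30 seconds) rather than every N trees, so the number of trees between two lines depends on training speed. This is an inference from the log, not documented behavior. The short check below only uses the timestamps and num-trees values copied from this post and does not call TF-DF:

```python
from datetime import datetime

# (timestamp, num-trees) pairs copied from the progress lines of the log in this post.
PROGRESS = [
    ("2023-06-13T13:57:11.553+03:00", 2),
    ("2023-06-13T13:57:41.563+03:00", 68),
    ("2023-06-13T13:58:12.573+03:00", 127),
    ("2023-06-13T13:58:42.583+03:00", 186),
    ("2023-06-13T13:59:12.595+03:00", 242),
    ("2023-06-13T13:59:42.604+03:00", 293),
    ("2023-06-13T14:00:12.613+03:00", 347),
    ("2023-06-13T14:00:43.624+03:00", 408),
    ("2023-06-13T14:01:13.634+03:00", 470),
]


def gaps(points):
    """Return (seconds elapsed, trees added) between consecutive progress lines."""
    result = []
    for (t0, n0), (t1, n1) in zip(points, points[1:]):
        elapsed = datetime.fromisoformat(t1) - datetime.fromisoformat(t0)
        result.append((round(elapsed.total_seconds(), 1), n1 - n0))
    return result


for seconds, trees in gaps(PROGRESS):
    print(f"{seconds:5.1f} s -> {trees} new trees")
```

On this run the gaps are all 30 to 31 seconds, while the number of trees added in between varies from 51 to 66.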
2023-06-13T13:57:10.553+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:1047] 600000 examples used for training and 200000 examples used for validation
2023-06-13T13:57:11.553+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:1430] num-trees:1 train-loss:0.262175 train-accuracy:0.965992 valid-loss:0.266117 valid-accuracy:0.965540
2023-06-13T13:57:11.553+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:1432] num-trees:2 train-loss:0.255583 train-accuracy:0.966120 valid-loss:0.262195 valid-accuracy:0.965565
2023-06-13T13:57:41.563+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:1432] num-trees:68 train-loss:0.176204 train-accuracy:0.971425 valid-loss:0.263119 valid-accuracy:0.966120
2023-06-13T13:58:12.573+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:1432] num-trees:127 train-loss:0.133686 train-accuracy:0.976682 valid-loss:0.274393 valid-accuracy:0.965830
2023-06-13T13:58:42.583+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:1432] num-trees:186 train-loss:0.100317 train-accuracy:0.982568 valid-loss:0.286893 valid-accuracy:0.965585
2023-06-13T13:59:12.595+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:1432] num-trees:242 train-loss:0.076956 train-accuracy:0.987472 valid-loss:0.300240 valid-accuracy:0.965450
2023-06-13T13:59:42.604+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:1432] num-trees:293 train-loss:0.061192 train-accuracy:0.991143 valid-loss:0.311722 valid-accuracy:0.965425
2023-06-13T14:00:12.613+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:1432] num-trees:347 train-loss:0.048823 train-accuracy:0.994160 valid-loss:0.324167 valid-accuracy:0.965365
2023-06-13T14:00:43.624+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:1432] num-trees:408 train-loss:0.038996 train-accuracy:0.996265 valid-loss:0.337411 valid-accuracy:0.965270
2023-06-13T14:01:13.634+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:1432] num-trees:470 train-loss:0.031601 train-accuracy:0.997723 valid-loss:0.350617 valid-accuracy:0.965165
2023-06-13T14:01:28.639+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:1430] num-trees:500 train-loss:0.028782 train-accuracy:0.998177 valid-loss:0.357087 valid-accuracy:0.965185
2023-06-13T14:01:28.639+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:264] Final model num-trees:10 valid-loss:0.357087 valid-accuracy:0.965185
2023-06-13T14:01:28.639+03:00 [1,mpirank:0,algo-1]:[INFO kernel.cc:957] Export model in log directory: /opt/adva/checkpoints/tfdf with prefix 8aef82d8e4434fb8
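In the meantime, the train and validation loss at each logged point can at least be pulled out of this verbose output with a regular expression. A minimal sketch, assuming the progress-line format shown above (it is not a documented, stable interface); `parse_progress` is a hypothetical helper name, and it only recovers the points that were actually printed, not every tree:

```python
import re

# A few raw lines copied from the training log in this post
# (the "#011" is presumably a tab escaped by the log collector).
SAMPLE_LOG = """\
2023-06-13T13:57:11.553+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:1430] #011num-trees:1 train-loss:0.262175 train-accuracy:0.965992 valid-loss:0.266117 valid-accuracy:0.965540
2023-06-13T13:57:11.553+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:1432] #011num-trees:2 train-loss:0.255583 train-accuracy:0.966120 valid-loss:0.262195 valid-accuracy:0.965565
2023-06-13T13:57:41.563+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:1432] #011num-trees:68 train-loss:0.176204 train-accuracy:0.971425 valid-loss:0.263119 valid-accuracy:0.966120
2023-06-13T14:01:28.639+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:1430] #011num-trees:500 train-loss:0.028782 train-accuracy:0.998177 valid-loss:0.357087 valid-accuracy:0.965185
2023-06-13T14:01:28.639+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:264] Final model num-trees:10 valid-loss:0.357087 valid-accuracy:0.965185
"""

# Matches the per-iteration progress lines printed with verbose=2.
# The "Final model" line has no train-loss, so it is skipped.
PROGRESS_RE = re.compile(
    r"num-trees:(\d+) train-loss:([\d.]+) train-accuracy:[\d.]+ "
    r"valid-loss:([\d.]+) valid-accuracy:[\d.]+"
)


def parse_progress(log_text):
    """Return (num_trees, train_loss, valid_loss) for every progress line found."""
    return [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)))
        for m in PROGRESS_RE.finditer(log_text)
    ]


rows = parse_progress(SAMPLE_LOG)
for num_trees, train_loss, valid_loss in rows:
    print(f"{num_trees:4d}  train-loss={train_loss:.6f}  valid-loss={valid_loss:.6f}")

# Logged point with the lowest validation loss.
best = min(rows, key=lambda row: row[2])
print("lowest logged valid-loss at num-trees:", best[0])
```

On the full log above, the lowest logged validation loss is at num-trees:2 (0.262195); it then climbs to 0.357087 at 500 trees while the training loss keeps falling to 0.028782, so even the sparse log shows the overfitting. It cannot tell which tree between 2 and 68 is the true minimum.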
I tried to use the inspector, model.make_inspector().training_logs(), but it failed to load the model. Here is the log:
[1,mpirank:0,algo-1]:The model at /opt/adva/checkpoints/tfdf/v0/model/ contains multiple YDF models. Please specify the prefix of the intended model. Available prefixes: ['72a694e546584792', '3d9f0b2ad7984337', 'a39992b7f3aa4541', 'bdf53386edd64b97', 'b729c3f939b14592', '8aef82d8e4434fb8', '7b0eeab2e4e840a3', 'cadda57c48874500', '8aeda86d45964712', '0ecb05d742d24dce', '44e569ec76d742ea', '325fb31fa3b54dd1', '3307f0d3ab014532', '77cf5988e9e64ce8', '7f2654b72f7c4b6a', 'd4013745082f4abe', '18e7f5be5f8840c8']
What are these multiple YDF models? The directory is new, so how can it already contain several models?
I also noticed that after fitting, the model is exported even though I never saved it (log: Export model in log directory: /opt/adva/checkpoints/tfdf/v0 with prefix cadda57c48874500). Is there a way to find out the model's prefix and pass it to the inspector?
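Each "Export model" line names both the directory and the prefix, so one workaround is to read the prefix from the training output instead of guessing among the available ones. A minimal sketch; `exported_models` is a hypothetical helper, and it relies on the exact wording of that log line:

```python
import re

# Matches the line printed after fit(), e.g.
# "Export model in log directory: /opt/adva/checkpoints/tfdf/v0 with prefix cadda57c48874500"
EXPORT_RE = re.compile(r"Export model in log directory: (\S+) with prefix (\w+)")


def exported_models(log_text):
    """Return (directory, prefix) for every model export mentioned in the log, in order."""
    return EXPORT_RE.findall(log_text)


# The two export lines quoted in this post.
SAMPLE_LOG = """\
2023-06-13T14:01:28.639+03:00 [1,mpirank:0,algo-1]:[INFO kernel.cc:957] Export model in log directory: /opt/adva/checkpoints/tfdf with prefix 8aef82d8e4434fb8
Export model in log directory: /opt/adva/checkpoints/tfdf/v0 with prefix cadda57c48874500
"""

models = exported_models(SAMPLE_LOG)
for directory, prefix in models:
    print(directory, prefix)

# Assuming the log is in chronological order, the last export is the latest model.
latest_dir, latest_prefix = models[-1]
print("latest:", latest_dir, latest_prefix)
```

The prefix could then be handed to the standalone inspector. This part is an assumption to verify against your installed version: recent TF-DF releases appear to expose `tfdf.inspector.make_inspector(directory, file_prefix=...)`, and the error above suggests the YDF files sit in a `model/` subdirectory of the logged path.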
Any idea how I can solve this and see the logs for every tree?
Thanks