I am brand new to TensorFlow and machine learning in general, but I have managed to do more than I originally anticipated (actually getting code to run). However, I am getting some strange results from my model. My data is stored in a pandas DataFrame and is structured as follows:
ptype E px py pz E1E9 ... E_L4 dEdxCDC dEdxFDC tShower tTrack thetac
205971 -11 4.617147 4.685261 1.792256 1.387385 NaN ... 2.057134 0.000003 NaN 2.464077 -0.021874 NaN
50287 130 0.264139 NaN NaN NaN NaN ... 0.176349 NaN NaN 7.092736 NaN NaN
133619 -2212 0.685756 -0.862369 -0.147122 0.232955 NaN ... 0.438425 0.000003 NaN 4.046603 -0.232221 NaN
269408 -211 0.290424 -1.381033 4.391991 4.357138 NaN ... 0.128230 0.000002 NaN 3.223814 0.202712 NaN
124688 2212 1.118175 -1.094253 0.306560 4.897157 NaN ... 0.488274 0.000002 0.000002 10.575593 0.050699 NaN
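For context, trainingDataDF and testDataDF used in the code below are just disjoint train/test subsets of this DataFrame, split with something roughly like the sketch below (the exact split isn't the issue here; fullDF is a placeholder name for the complete DataFrame, and the random 80/20 split is only an assumption):

# Rough sketch of the train/test split (assumes a simple random 80/20 split;
# fullDF is a hypothetical name for the complete DataFrame shown above)
trainingDataDF = fullDF.sample(frac=0.8, random_state=0)  # 80% of rows for training
testDataDF = fullDF.drop(trainingDataDF.index)            # remaining rows for testing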
I then followed some code I found in a Keras tutorial for TensorFlow Decision Forests, which I have included below:
import tensorflow_decision_forests as tfdf

# Convert the pandas DataFrames to TensorFlow datasets, using 'ptype' as the label column
trainingData = tfdf.keras.pd_dataframe_to_tf_dataset(trainingDataDF, label='ptype')
testData = tfdf.keras.pd_dataframe_to_tf_dataset(testDataDF, label='ptype')

# Train a gradient boosted trees model, then run it on the test set
boostedDecisionTree = tfdf.keras.GradientBoostedTreesModel(max_depth=4, num_trees=50)
boostedDecisionTree.fit(trainingData, verbose=2)
results = boostedDecisionTree.predict(testData)
print(results)
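In case the dimensions are useful for diagnosing this, here is a quick check I can run on the object that predict returns (not part of the tutorial code, just basic inspection):

print(type(results))   # what kind of object predict() returned
print(results.shape)   # number of rows and columns in the prediction array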
In case this helps, here is what I get when I print out the variable testData:
<PrefetchDataset element_spec=({'E': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'px': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'py': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'pz': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'E1E9': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'E9E25': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'docaTrack': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'sumU': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'sumV': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'preshowerE': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'sigLong': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'sigTrans': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'sigTheta': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'E_L2': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'E_L3': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'E_L4': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'dEdxCDC': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'dEdxFDC': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'tShower': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'tTrack': TensorSpec(shape=(None,), dtype=tf.float64, name=None), 'thetac': TensorSpec(shape=(None,), dtype=tf.float64, name=None)}, TensorSpec(shape=(None,), dtype=tf.int64, name=None))>
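Similarly, here is a way to peek at one batch of testData and see what the label tensor actually contains (a minimal sketch using the dataset's take(1); this is not part of my original code):

for features, labels in testData.take(1):
    print(labels[:10])   # label values from the first batch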
What I expect from the predict call is a list of the values that occur in the ptype column (they are all integers and can only take on certain discrete values). However, the output of this code is:
[[3.1166435e-03 3.5639703e-03 5.6484230e-03 ... 6.8573549e-04
9.3489707e-01 8.1527224e-03]
[4.2285807e-03 4.7440380e-01 2.0284561e-02 ... 5.7000392e-03
4.8168894e-02 3.8701162e-02]
[1.2789826e-03 3.1301938e-03 2.9117553e-03 ... 1.2747306e-02
3.7976676e-03 2.0397757e-03]
...
[3.2000155e-03 6.0929000e-02 1.4465253e-01 ... 3.7460185e-03
5.2436376e-01 1.2216719e-01]
[1.8458949e-02 4.1683179e-01 3.9546650e-02 ... 6.7035188e-03
9.7159848e-02 8.9666203e-02]
[4.4867280e-01 8.1635034e-03 6.7420891e-03 ... 2.5453945e-03
1.0263861e-02 1.5434443e-02]]
which makes absolutely no sense as an output. I have no clue how to interpret this; I'm not even sure how to troubleshoot, so any help would be greatly appreciated. Thank you in advance!