Hi,
I’m trying to run TensorFlow Decision Forests (tfdf) on SageMaker, but I keep getting a segmentation fault. I have tried several versions and get the same error every time. The version and log from the latest run:
tensorflow_decision_forests==1.2.0
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:[1,mpirank:0,algo-1]:1051] 4096 examples used for training and 4096 examples used for validation[1,mpirank:0,algo-1]:
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:[INFO gradient_boosted_trees.cc:1195] Resume the GBT training from tree #246
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:[INFO abstract_model.cc:1248] Engine "[1,mpirank:0,algo-1]:GradientBoostedTreesQuickScorerExtended" built
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] *** Process received signal ***
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] Signal: Segmentation fault (11)
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] Signal code: Address not mapped (1)
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] Failing at address: 0x55a5ce05a478
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [ 0] [1,mpirank:0,algo-1]:/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7fa0ca620420]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [ 1]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:/usr/local/lib/python3.9/site-packages/tensorflow_decision_forests/tensorflow/ops/training/training.so(+0x762f20)[0x7fa028321f20]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [ 2]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:/usr/local/lib/python3.9/site-packages/tensorflow_decision_forests/tensorflow/ops/training/training.so(_ZN26yggdrasil_decision_forests7serving15decision_forest7PredictINS1_59GradientBoostedTreesBinaryClassificationQuickScorerExtendedEEEvRKT_RKNS4_10ExampleSetEiPSt6vectorIfSaIfEE+0x30)[0x7fa028322130]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [ 3]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:/usr/local/lib/python3.9/site-packages/tensorflow_decision_forests/tensorflow/ops/training/training.so(_ZN26yggdrasil_decision_forests5model22gradient_boosted_trees8internal18ComputePredictionsEPKNS1_25GradientBoostedTreesModelEPKNS_7serving10FastEngineERKSt6vectorIPNS0_13decision_tree12DecisionTreeESaISD_EERKNS2_24AllTrainingConfigurationERKNS_7dataset15VerticalDatasetEPSA_IfSaIfEE+0x14a)[0x7fa0281f73ea]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [ 4]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:/usr/local/lib/python3.9/site-packages/tensorflow_decision_forests/tensorflow/ops/training/training.so(_ZNK26yggdrasil_decision_forests5model22gradient_boosted_trees27GradientBoostedTreesLearner15TrainWithStatusERKNS_7dataset15VerticalDatasetEN4absl12lts_202111028optionalISt17reference_wrapperIS5_EEE+0x278b)[0x7fa02820a0bb]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [ 5]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:/usr/local/lib/python3.9/site-packages/tensorflow_decision_forests/tensorflow/ops/training/training.so(_ZN27tensorflow_decision_forests3ops20SimpleMLModelTrainer7ComputeEPN10tensorflow15OpKernelContextE+0x855)[0x7fa0281595a5]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [ 6]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:/usr/local/lib/python3.9/site-packages/tensorflow/python/…/libtensorflow_framework.so.2(_ZN10tensorflow16ThreadPoolDevice7ComputeEPNS_8OpKernelEPNS_15OpKernelContextE+0x4b)[0x7fa096f0f69b]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [1,mpirank:0,algo-1]:[ 7]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:/usr/local/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so(_ZN10tensorflow17KernelAndDeviceOp3RunEPNS_19ScopedStepContainerERKNS_15EagerKernelArgsEPSt6vectorIN4absl12lts_202111027variantIJNS_6TensorENS_11TensorShapeEEEESaISC_EEPNS_19CancellationManagerERKNS8_8optionalINS_19EagerFunctionParamsEEERKNSI_INS_17ManagedStackTraceEEEPNS_24CoordinationServiceAgentE+0x9c7)[0x7fa0a514ce47]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [ 8]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:/usr/local/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so(_ZN10tensorflow18EagerKernelExecuteEPNS_12EagerContextERKN4absl12lts_2021110213InlinedVectorIPNS_12TensorHandleELm4ESaIS6_EEERKNS3_8optionalINS_19EagerFunctionParamsEEERKSt10unique_ptrINS_15KernelAndDeviceENS_4core15RefCountDeleterEEPNS_14GraphCollectorEPNS_19CancellationManagerENS3_4SpanIS6_EERKNSB_INS_17ManagedStackTraceEEE+0x289)[0x7fa09dc17149]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [1,mpirank:0,algo-1]:[ 9]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:/usr/local/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so(_ZN10tensorflow11ExecuteNode3RunEv+0x1c9)[0x7fa09dc18509]
2023-05-29T13:56:30.882+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [1,mpirank:0,algo-1]:[10]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:/usr/local/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so(_ZN10tensorflow13EagerExecutor11SyncExecuteEPNS_9EagerNodeE+0x410)[0x7fa0a5778ba0]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [11]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:/usr/local/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so(+0x59e55c6)[0x7fa09dc125c6]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [12]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:/usr/local/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so(_ZN10tensorflow12EagerExecuteEPNS_14EagerOperationEPPNS_12TensorHandleEPi+0x254)[0x7fa09dc12c34]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [13]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:/usr/local/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so(_ZN10tensorflow14EagerOperation7ExecuteEN4absl12lts_202111024SpanIPNS_20AbstractTensorHandleEEEPi+0x200)[0x7fa09d958e50]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [14]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:/usr/local/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so(_ZN10tensorflow21CustomDeviceOpHandler7ExecuteEPNS_27ImmediateExecutionOperationEPPNS_30ImmediateExecutionTensorHandleEPi+0x5da)[0x7fa0a5158d7a]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [1,mpirank:0,algo-1]:[15]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:/usr/local/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so(TFE_Execute+0x66)[0x7fa09d0a95b6]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [16]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:/usr/local/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so(_Z24TFE_Py_FastPathExecute_CP7_object+0x25a7)[0x7fa09cd55427]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [1,mpirank:0,algo-1]:[17] /usr/local/lib/python3.9/site-packages/tensorflow/python/_pywrap_tfe.so(+0x69ea7)[0x7fa06b248ea7]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [18] [1,mpirank:0,algo-1]:/usr/local/lib/python3.9/site-packages/tensorflow/python/_pywrap_tfe.so(+0x9529f)[0x7fa06b27429f]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [19]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:/usr/local/bin/python3.9(+0x227513)[0x55a5bfd18513]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [20] [1,mpirank:0,algo-1]:/usr/local/bin/python3.9(_PyObject_MakeTpCall+0x8c)[0x55a5bfb64e2c]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [21] [1,mpirank:0,algo-1]:/usr/local/bin/python3.9(_PyEval_EvalFrameDefault+0x7fb8)[0x55a5bfb55cd8]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [22] [1,mpirank:0,algo-1]:/usr/local/bin/python3.9(+0x1276aa)[0x55a5bfc186aa]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [23]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:/usr/local/bin/python3.9(_PyFunction_Vectorcall+0x97)[0x55a5bfb65e77]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [24] [1,mpirank:0,algo-1]:/usr/local/bin/python3.9(PyVectorcall_Call+0xc6)[0x55a5bfb658a6]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [25]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:/usr/local/bin/python3.9(_PyEval_EvalFrameDefault+0x20b4)[0x55a5bfb4fdd4]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [26] [1,mpirank:0,algo-1]:/usr/local/bin/python3.9(+0x1276aa)[0x55a5bfc186aa]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [27] [1,mpirank:0,algo-1]:/usr/local/bin/python3.9(_PyFunction_Vectorcall+0x97)[0x55a5bfb65e77]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [28] [1,mpirank:0,algo-1]:/usr/local/bin/python3.9(_PyEval_EvalFrameDefault+0x603a)[0x55a5bfb53d5a]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] [29] [1,mpirank:0,algo-1]:/usr/local/bin/python3.9(+0x1276aa)[0x55a5bfc186aa]
2023-05-29T13:56:30.883+03:00 [1,mpirank:0,algo-1]:[algo-1:00091] *** End of error message ***
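
Since the trace above is hard to read with the timestamps and MPI rank tags interleaved into every line, here is a minimal sketch of how the log could be cleaned up before sharing or searching it. The helper names `clean_line` and `clean_trace` are hypothetical, and the regular expressions only assume the prefix formats visible in this particular log (an ISO timestamp, a `[1,mpirank:0,algo-1]:` tag, and an `[algo-1:00091]` process tag); other SageMaker or Open MPI setups may print different prefixes.

```python
import re

# ISO timestamp at the start of a line, e.g. 2023-05-29T13:56:30.882+03:00
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2}\s*")
# MPI rank tag, e.g. [1,mpirank:0,algo-1]: (may appear in the middle of a line)
RANK_TAG = re.compile(r"\[\d+,mpirank:\d+,[^\]]+\]:")
# Process tag printed by the signal handler, e.g. [algo-1:00091]
PROC_TAG = re.compile(r"\[[\w.-]+:\d+\]\s*")


def clean_line(line):
    """Strip the timestamp, rank tags, and process tag from one log line."""
    line = TIMESTAMP.sub("", line)
    line = RANK_TAG.sub("", line)
    line = PROC_TAG.sub("", line)
    return line.strip()


def clean_trace(text):
    """Clean every line of a log and drop lines that end up empty."""
    cleaned = (clean_line(line) for line in text.splitlines())
    return [line for line in cleaned if line]
```

For example, `clean_line` applied to the "Signal: Segmentation fault (11)" line above leaves only that message, and a frame line such as `[1,mpirank:0,algo-1]:[algo-1:00091] [1,mpirank:0,algo-1]:[ 7]` is reduced to `[ 7]`. This only removes prefixes; it does not demangle the C++ symbol names.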