Before TensorFlow 2.4, string tensors were defined (see tensorflow/c/c_api.h at r1.13 in the tensorflow/tensorflow GitHub repository) as a list of uint64 offsets to varint-prefixed char strings, where the varint gives the length of the string.
We have the following code to pass a string tensor as an argument:
TF_Tensor* ScalarStringTensor(const char* str, TF_Status* status) {
  size_t nbytes = 8 + TF_StringEncodedSize(strlen(str));
  TF_Tensor* t = TF_AllocateTensor(TF_STRING, NULL, 0, nbytes);
  /* Use char* so the pointer arithmetic below is standard C. */
  char* data = (char*)TF_TensorData(t);
  memset(data, 0, 8);  // 8-byte offset of first string.
  TF_StringEncode(str, strlen(str), data + 8, nbytes - 8, status);
  return t;
}
void foo() {
  TF_Tensor* t = ScalarStringTensor(checkpoint_prefix, model->status);
  if (!Okay(model->status)) {
    TF_DeleteTensor(t);
    return;  // foo() returns void, so there is no value to return.
  }
  TF_Output inputs[1] = {model->checkpoint_file};
  TF_Tensor* input_values[1] = {t};
  const TF_Operation* op[1] = {type == SAVE ? model->save_op
                                            : model->restore_op};
  TF_SessionRun(model->session, NULL, inputs, input_values, 1,
                /* No outputs */
                NULL, NULL, 0,
                /* The operation */
                op, 1, NULL, model->status);
  TF_DeleteTensor(t);
}
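To make the pre-2.4 byte layout concrete, here is a minimal sketch in plain C that builds the same bytes for a scalar string without calling TensorFlow. The helper names (varint_len, varint_put, legacy_scalar_string) are hypothetical and not part of any TensorFlow API, and the sketch assumes the length prefix is the usual base-128 varint (7 bits per byte, least-significant group first), which is my understanding of what TF_StringEncode wrote.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Number of bytes a base-128 varint needs to encode v. */
static size_t varint_len(uint64_t v) {
  size_t n = 1;
  while (v >= 128) {
    v >>= 7;
    n++;
  }
  return n;
}

/* Write v as a base-128 varint (low 7 bits first, high bit set on every
 * byte except the last). Returns the number of bytes written. */
static size_t varint_put(uint8_t* dst, uint64_t v) {
  size_t n = 0;
  while (v >= 128) {
    dst[n++] = (uint8_t)(v | 0x80);
    v >>= 7;
  }
  dst[n++] = (uint8_t)v;
  return n;
}

/* Hypothetical helper: build the legacy (pre-2.4) buffer for a scalar
 * string tensor: one uint64 offset (0), then varint length, then bytes.
 * The caller owns the returned buffer and must free() it. */
static uint8_t* legacy_scalar_string(const char* str, size_t* nbytes) {
  assert(str != NULL && nbytes != NULL);
  size_t len = strlen(str);
  *nbytes = 8 + varint_len(len) + len;
  uint8_t* buf = (uint8_t*)malloc(*nbytes);
  if (buf == NULL) return NULL;
  memset(buf, 0, 8);                    /* offset table: first string at 0 */
  size_t n = varint_put(buf + 8, len);  /* length prefix */
  memcpy(buf + 8 + n, str, len);        /* string bytes, no terminator */
  return buf;
}
```

For the string "ckpt" this produces 13 bytes: eight zero bytes for the offset, one byte 0x04 for the length, and the four characters.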
Since TensorFlow 2.4, the string representation in C, C++, and TF Core is unified:
- The byte layout for string tensors across the C-API has been updated to match TF Core/C++; i.e., a contiguous array of tensorflow::tstring/TF_TString values.
- C-API functions TF_StringDecode, TF_StringEncode, and TF_StringEncodedSize are no longer relevant and have been removed; see core/platform/ctstring.h for string access/modification in C.
This document describes how a string is represented in memory.
How should I update the code above to send a string as a tensor? Should I just copy the memory layout into a 1-D tensor? What about memory padding? This part is unclear to me.
Thank you in advance.