In TF versions before 2.4, I allocated the space for my string tensor with TF_AllocateTensor, using a size of (8 bytes + TF_StringEncodedSize). Then I encoded the actual string with TF_StringEncode directly into the right place in the tensor (at an offset of 8 bytes). This also worked with an array of strings, by adjusting the offsets accordingly.
Since 2.4, the two string functions above are no longer available.
Can somebody give me a small hint on how to do it now? What did TF_StringEncode actually do? Should I do something similar manually, or is there another solution?
Thanks,
Zsolt
Btw, I am using Pascal, but through the C API (c_api.h and related headers).
I have just found the same question from last August: http://discuss.ai.google.dev/t/wrap-string-into-tensor-in-tensorflow-c-api-v2-4/11468
Unfortunately there is no answer there.
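On the question of what TF_StringEncode did: as far as I know, the pre-2.4 C API stored a TF_STRING tensor as a table of 8-byte offsets (one per element) followed by the strings, each prefixed with its length as a base-128 varint. That is where the "8 bytes + TF_StringEncodedSize" for a single string comes from. The sketch below reproduces that layout in plain C++ without linking TensorFlow; the names EncodedSize, AppendEncoded, BuildLegacyStringBuffer and DecodeElement are hypothetical stand-ins, not functions from c_api.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Bytes needed for the varint length prefix plus the string itself
// (stand-in for what the old TF_StringEncodedSize reported).
std::size_t EncodedSize(std::size_t len) {
  std::size_t prefix = 1;
  for (std::uint64_t v = len; v >= 0x80; v >>= 7) ++prefix;
  return prefix + len;
}

// Appends varint(length) followed by the raw bytes
// (stand-in for what the old TF_StringEncode wrote).
void AppendEncoded(const std::string& s, std::vector<std::uint8_t>& out) {
  std::uint64_t len = s.size();
  while (len >= 0x80) {
    out.push_back(static_cast<std::uint8_t>((len & 0x7F) | 0x80));
    len >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(len));
  out.insert(out.end(), s.begin(), s.end());
}

// Legacy layout: [uint64 offset per element][encoded strings...].
// Offsets are relative to the first byte after the offset table.
std::vector<std::uint8_t> BuildLegacyStringBuffer(
    const std::vector<std::string>& strings) {
  const std::size_t table_bytes = strings.size() * sizeof(std::uint64_t);
  std::vector<std::uint8_t> buffer(table_bytes);
  for (std::size_t i = 0; i < strings.size(); ++i) {
    const std::uint64_t offset = buffer.size() - table_bytes;
    std::memcpy(buffer.data() + i * sizeof(offset), &offset, sizeof(offset));
    AppendEncoded(strings[i], buffer);
  }
  return buffer;
}

// Reads element i back out of a buffer holding `count` strings.
std::string DecodeElement(const std::vector<std::uint8_t>& buffer,
                          std::size_t count, std::size_t i) {
  assert(i < count);
  std::uint64_t offset = 0;
  std::memcpy(&offset, buffer.data() + i * sizeof(offset), sizeof(offset));
  std::size_t pos =
      count * sizeof(std::uint64_t) + static_cast<std::size_t>(offset);
  std::uint64_t len = 0;
  int shift = 0;
  while (buffer[pos] & 0x80) {  // continuation bit set: more length bytes
    len |= static_cast<std::uint64_t>(buffer[pos] & 0x7F) << shift;
    shift += 7;
    ++pos;
  }
  len |= static_cast<std::uint64_t>(buffer[pos]) << shift;
  ++pos;
  return std::string(buffer.begin() + pos, buffer.begin() + pos + len);
}
```

For a single short string the buffer is 8 bytes of offset table plus one length byte plus the characters, which matches the allocation size described in the question. This layout is only of historical interest from 2.4 onward.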
I think I finally solved it. The trick is that one cell of a string tensor is 24 bytes (the size of the TF_TString structure). To make a string tensor, create a tensor of type TF_STRING and allocate 24 * (number of cells) bytes of memory. Then fill every cell with the result (dst) of TF_StringCopy. If the string is 22 bytes or shorter, it is stored inside the 24-byte cell itself; if it is longer, TF_StringCopy takes care of the heap allocation and referencing. Nothing more is needed.
Once the tensor is deleted, it also frees any heap memory that was used.
If I misunderstood something, please let me know.
For anyone who might have stumbled upon this via a Google search, creating the string tensor will look something like this:
const char embedding[] = "The quick brown fox jumps over the lazy dog.";
// tstring backs the tensor data, so it must outlive input_tensor.
TF_TString tstring[1];
TF_TString_Init(&tstring[0]);
TF_TString_Copy(&tstring[0], embedding, sizeof(embedding) - 1);
// One dimension holding one element.
int64_t dims[] = {1};
int num_dims = 1;
// Deallocator is a user-supplied callback; it is not defined in this snippet.
TF_Tensor* input_tensor = TF_NewTensor(TF_STRING, dims, num_dims, &tstring[0], sizeof(tstring), &Deallocator, nullptr);
Can you give a code example?
I am trying to make an Octave interface to TensorFlow through OCT files in C++ and the TensorFlow C API, but I cannot find a way to add multiple strings (from an Octave cellstr array) to a single tensor. I can load a single TF_TString into a TF_Tensor, but not multiple ones. The following code works properly from within an oct file. Note: the args variable is an octave_value_list passed into the oct file, and the second argument in that list is the character vector.
else if (args(1).is_char_matrix ())
  {
    if (! rowvec)
      {
        error ("tensorflow: only a character vector can be loaded into Tensor. "
               "For multiple rows of characters use a cellstr array.");
      }
    string oct_data = args(1).string_value ();
    const char* oct_str = oct_data.c_str ();
    size_t str_len = strlen (oct_str);
    // The buffer holds one TF_TString cell, not the raw characters, so it
    // must be sizeof (TF_TString) bytes regardless of the string length.
    TF_TString* tstring
      = reinterpret_cast<TF_TString *> (malloc (sizeof (TF_TString)));
    TF_StringInit (tstring);
    TF_StringCopy (tstring, oct_str, str_len);
    int64_t dims[] = {1};
    int ndims = 1;
    // The length passed to TF_NewTensor is the size of the cell buffer.
    newTensor = TF_NewTensor (TF_STRING, dims, ndims, tstring,
                              sizeof (TF_TString), &NoOpDeallocator, 0);
  }
I have not dealt with this question for a while, and what I wrote was a Pascal wrapper around the C interface, so I am not sure I can help much. What I remember is that when TF changed from 1.5 to 2.x, the string handling changed and I had to rewrite the interface. If you look at my unit, tensorflowforpascal/units/tf_tensors.pas at master · zsoltszakaly/tensorflowforpascal · GitHub, you will find three subroutines there, all called CreateTensorString. For arrays, check from line 442.
I hope you find something useful there.