I'm following this doc:
and trying to adapt it to this model:
inputs = keras.Input(shape=(), dtype="string")
x = SPtokenizer(inputs)
x = layers.Embedding(input_dim=SPtokenizer.vocabulary_size(), output_dim=embed_size, name="embed")(x)
predictions = layers.Dense(1, activation="sigmoid", name="predictions")(x)
model = keras.Model(inputs, predictions)
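(SPtokenizer above is a SentencePiece tokenizer layer that isn't shown; roughly something like this, assuming KerasNLP, with a placeholder proto path, and sequence_length=32 to match the 32 that shows up in the output shapes below:)

import keras_nlp

# Assumption: SPtokenizer is a KerasNLP SentencePieceTokenizer loaded from a
# pre-trained SentencePiece model file ("spm.model" is a placeholder path).
# sequence_length=32 pads/truncates every example to 32 token ids.
SPtokenizer = keras_nlp.tokenizers.SentencePieceTokenizer(
    proto="spm.model",
    sequence_length=32,
)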
I get a (Hugging Face) datasets.Dataset like this:
LH_dataset_HF = datasets.load_dataset("nguha/legalbench", 'learned_hands_torts')
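The columns I care about are "text" and "answer" (the answers are the strings 'Yes'/'No'); a quick check of one row:

# Inspect the split structure and one example row:
print(LH_dataset_HF)
print(LH_dataset_HF['train'][0])   # expecting something like {'text': ..., 'answer': 'Yes', ...}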
I create a Keras/TF-friendly dataset like this (renameDS is the dataset after a column-rename step that isn't shown):
trainDS = renameDS['train'].to_tf_dataset(
    columns=["text"],
    label_cols="answer",
    batch_size=batch_size,
    shuffle=False,
)
The resulting trainDS looks like this:
_PrefetchDataset: <_PrefetchDataset
element_spec=(TensorSpec(shape=(None,), dtype=tf.string, name=None),
TensorSpec(shape=(None,), dtype=tf.string, name=None))>
When I pass it to model.fit(trainDS), I get this error:
ValueError: Arguments `target` and `output` must have the same rank (ndim).
Received: target.shape=(None,), output.shape=(None, 32, 1)
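If I'm reading it right, the (None, 32, 1) comes from the Embedding output being (batch, 32, embed_size) and the Dense(1) head being applied at every token position; the per-layer shapes can be checked with:

# Per-layer output shapes (sketch):
model.summary()
# embed (Embedding)     -> roughly (None, 32, embed_size)
# predictions (Dense)   -> roughly (None, 32, 1), i.e. one sigmoid per token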
I have also tried applying a map to convert the labels to integers:
def binaryLbl(txt, tlbl):
    if tlbl == 'Yes':
        ilbl = 1
    else:
        ilbl = 0
    return txt, ilbl
trainDS2 = trainDS.map(binaryLbl)
It produces a trainDS2 that looks like this:
_MapDataset: <_MapDataset
element_spec=(TensorSpec(shape=(None,), dtype=tf.string, name=None),
TensorSpec(shape=(), dtype=tf.int32, name=None))>
and generates this error:
ValueError: Arguments `target` and `output` must have the same rank (ndim). Received: target.shape=(), output.shape=(None, 32, 1)
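For completeness, a vectorized variant of binaryLbl that keeps the batched (None,) label shape instead of collapsing it to a scalar would presumably look like this (untested sketch; binaryLblVec and trainDS3 are just names I made up):

import tensorflow as tf

def binaryLblVec(txt, tlbl):
    # Element-wise comparison keeps the (None,) batch shape of the labels.
    return txt, tf.cast(tf.equal(tlbl, 'Yes'), tf.int32)

trainDS3 = trainDS.map(binaryLblVec)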
What can I do to make the dataset conform to the model's expectations? Thanks for any help.