Can someone help me with this code? I can't solve it.

This is the link to my code. I tried everything I know, but it's still the same. Can someone really help?
Thank you.

Are you getting an error when importing AdamW?

from transformers import AdamW, get_linear_schedule_with_warmup

ImportError                               Traceback (most recent call last)
<ipython-input-73-8113644e73e6> in <cell line: 0>()
----> 1 from transformers import AdamW, get_linear_schedule_with_warmup
ImportError: cannot import name 'AdamW' from 'transformers' (/usr/local/lib/python3.11/dist-packages/transformers/__init__.py)

I ran into errors there, then started to make progress using
from torch.optim import AdamW

I reached a point where, with only a quick glance, a Hugging Face training dataset as a stand-in, and without your config file and tokenization settings, I could only do so much. I will leave my debugging Colab notebook available.

In recent versions of transformers, AdamW has been removed: "This optimizer has been removed from the transformers library, and users are now expected to use the AdamW implementation provided by PyTorch, located in torch.optim."
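A minimal sketch of the replacement, assuming a standard fine-tuning setup (the model, learning rate, and step counts here are placeholders, not from your code): AdamW now comes from torch.optim, while the scheduler is still imported from transformers.

```python
import torch
from torch.optim import AdamW  # PyTorch's AdamW, replacing transformers.AdamW
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(10, 2)  # stand-in for your actual model
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

# e.g. len(train_dataloader) * num_epochs in a real training loop
num_training_steps = 1000
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,
    num_training_steps=num_training_steps,
)
```

Only the import location changes; the optimizer and scheduler are used in the training loop exactly as before (optimizer.step() followed by scheduler.step()).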


TypeError                                 Traceback (most recent call last)
in <cell line: 433>()
    461
    462 # Call the main function DIRECTLY and pass the variables as arguments
--> 463 main(
    464     argparse.Namespace(  # we mimic the arguments by creating a Namespace object
    465         config_path=config_path,

in main(args)
    382 config['tokenizer_path'] = tokenizer_path
    383
--> 384 train_sequences = data_processor.prepare_training_data(
    385     seq_length=config.get('seq_length', 64),
    386     stride=config.get('stride', 32)

in prepare_training_data(self, seq_length, stride)
    167 all_token_ids.extend(encoded.ids)
    168 sequences = []
--> 169 for i in range(0, len(all_token_ids) - seq_length, stride):
    170     seq = all_token_ids[i:i + seq_length]
    171     if len(seq) == seq_length:

TypeError: 'NoneType' object cannot be interpreted as an integer

I'm having this error. Can you explain it a little more simply? I'm kind of a newbie. Thank you for your help.

At a glance, this looks like an issue in the code that processes/reads the training dataset file, or with how the training dataset file is formatted.

The traceback arrow points at
for i in range(0, len(all_token_ids) - seq_length, stride):
so either seq_length or stride is None when it reaches range(). That usually means the config values never arrived, or no valid data was read into encoded.ids on the line
all_token_ids.extend(encoded.ids)
The next step will most likely be debugging by adding some debugging statements, i.e.

print("Dataset sample:", dataset[:5])  # Check if dataset is loaded correctly
print("Encoded sample:", encoded)  # Check if tokenizer is working
print("Token IDs:", encoded.ids)  # Ensure token IDs exist
print("seq_length:", seq_length, "stride:", stride)  # Confirm values are set correctly
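Given the traceback, here is a minimal sketch of what is likely happening and one way to guard against it. The function body is reconstructed from the traceback lines; the 64/32 defaults mirror the config.get() calls, and the assumption that the values arrive as None (e.g. the config keys exist but are set to null, so config.get() skips its default) is mine, not confirmed from your code:

```python
def prepare_training_data(all_token_ids, seq_length, stride):
    # Guard: fall back to sane defaults instead of letting None reach range().
    seq_length = seq_length if seq_length is not None else 64
    stride = stride if stride is not None else 32
    sequences = []
    for i in range(0, len(all_token_ids) - seq_length, stride):
        seq = all_token_ids[i:i + seq_length]
        if len(seq) == seq_length:
            sequences.append(seq)
    return sequences

# range() with a None bound raises exactly the error in the traceback:
try:
    range(0, None, 32)
except TypeError as e:
    print(e)  # 'NoneType' object cannot be interpreted as an integer

# With the guard in place, None inputs fall back to 64/32 and slicing works:
print(len(prepare_training_data(list(range(200)), None, None)))  # → 5
```

The real fix is upstream (make sure the config actually supplies seq_length and stride), but the guard makes the failure mode visible instead of crashing deep inside the loop.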

I'll be able to check more Monday or Tuesday. You can check a ChatGPT conversation here with some more interactive information (where to insert some debugging code, etc., translated from English to Turkish).
Just tell ChatGPT:
A) English or B) Turkish

  1. whether you are comfortable with basic Python syntax
  2. whether you want more detailed guidance
  3. your experience with debugging and machine learning pipelines

It will be Monday or Tuesday until I can do some hands-on testing with the script to get more specific / verify.

Thank you! Thanks to you, I solved the error quickly, and I have now finished my code. Unfortunately, when I run it, the Kaggle notebook I use does not run it: it automatically stops the cell after running for about 5 seconds, and I could not solve it. I'm leaving the link to the new code. Thank you very much again. New and not running code