What are the input/output token limits for Claude Sonnet via the Vertex Model Garden?

Ron_Parker · July 16, 2024, 7:01am

In the Vertex Model Garden, Anthropic Claude Sonnet 3.5 is now available: Google Cloud console

What are the input/output token limits? Where do I even find that information?

Siva_Sravana_Kumar_N · July 18, 2024, 9:37pm

Hi @Ron_Parker,

The official documentation for Claude Sonnet 3.5 is here: Models - Anthropic. It lists all of the limits.

Also, here is the Google doc for using partner models: Partner-Models/Claude

Thanks

Ron_Parker · July 25, 2024, 8:09pm

From Home - Anthropic

Max output: 8192 tokens [1]

[1] 8192 output tokens is in beta and requires the header anthropic-beta: max-tokens-3-5-sonnet-2024-07-15. If the header is not specified, the limit is 4096 tokens.
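
To make the quoted rule concrete, here is a minimal sketch of it as a check. `effective_max_output_tokens` is a hypothetical helper, not part of any SDK, and the two limits are simply the ones quoted above for claude-3-5-sonnet. Treating the header value as a comma-separated list of beta flags is an assumption.

```python
BETA_HEADER = "anthropic-beta"
BETA_VALUE = "max-tokens-3-5-sonnet-2024-07-15"


def effective_max_output_tokens(headers=None):
    """Return the output-token limit implied by the request headers.

    Hypothetical helper: 8192 if the beta flag is present, else 4096,
    per the documentation quoted above.
    """
    # Header names are case-insensitive, so normalise them first.
    normalised = {k.lower(): v for k, v in (headers or {}).items()}
    # Assumption: the header may carry several comma-separated beta flags.
    flags = [flag.strip() for flag in normalised.get(BETA_HEADER, "").split(",")]
    return 8192 if BETA_VALUE in flags else 4096
```

This only describes what the documentation promises; whether the Vertex endpoint honours the header is a separate question.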

I read somewhere there was a feature request to modify headers in the AnthropicVertex request.

github.com/googleapis/python-aiplatform

Add support for custom headers in vertexai.init function

opened 10:45AM - 29 Mar 24 UTC

patrykkotlowski-dsstream

api: vertex-ai

**Is your feature request related to a problem? Please describe.**
I cannot fin…d option for adding custom headers to VertexAI Gemini API.

**Describe the solution you'd like**
I'm trying to implement proxy between end client and VertexAI Gemini API. In my implementation custom HTTP headers are must have requirement. I would like to ask for adding option to vertexai.init or any other place to add HTTP headers, similar as it is in genai gemini library.

**Describe alternatives you've considered**
Actually there are no alternatives. Headers are must have.

**Additional context**
Example for custom headers in google.generativeai (https://github.com/google/generative-ai-python)

```
import google.generativeai as genai

genai.configure(
    api_key="mytoken",
    transport="rest",
    client_options={
        "api_endpoint": "https://my/custom/proxy"
    },
    default_metadata=tuple(my_headers.items()),
)
```

My proposal for vertexai.init:

```
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(
    api_endpoint="https://my/custom/proxy",
    api_transport="rest",
    default_metadata=tuple(my_headers.items()),  # New argument
)

multimodal_model = GenerativeModel("gemini-1.0-pro-vision")
response = multimodal_model.generate_content(
    [
        "How are you",
    ]
)
```

Can I get some sort of follow up from Google?

Ron_Parker · July 26, 2024, 1:53am

Actually, I realized that I needed to post the issue here:

github.com/GoogleCloudPlatform/vertex-ai-samples

How to add header to AnthropicVertex Python request

opened 08:50PM - 25 Jul 24 UTC

closed 02:10PM - 18 Oct 24 UTC

SomebodySysop

## Expected Behavior
Allow max output tokens of 8192 when making calls to cla…ude-sonnet-3.5 using AnthropicVertex SDK (Python)

## Actual Behavior
Only 4196 output tokens allows.

## Steps to Reproduce the Problem
From: https://docs.anthropic.com/en/docs/about-claude/models

> Max output | 8192 tokens1
> 8192 output tokens is in beta and requires the header anthropic-beta: max-tokens-3-5-sonnet-2024-07-15. If the header is not specified, the limit is 4096 tokens.
> Question: How do I add _anthropic-beta: max-tokens-3-5-sonnet-2024-07-15_ to header when using AnthropicVertex SDK?

## Specifications
- Version: Python 3.11.9. AnthropicVertex was installed about a month ago.
- Platform: Linux system with Vertex AI Platform and Anthropic Vertex SDKs installed for Python

In the Google Cloud Platform - Vertex AI GitHub.

But it doesn’t look like anyone responds there.

Ron_Parker · July 26, 2024, 6:36am

After extensive online searches, I did find two ways advertised to modify the headers to increase the maximum output tokens:

    from anthropic import AnthropicVertex

    # Set the beta header once, for every request made by this client.
    client = AnthropicVertex(
        region=LOCATION,
        project_id=project_id,
        # https://github.com/anthropics/anthropic-sdk-python?tab=readme-ov-file#default-headers
        default_headers={"anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15"},  # Custom headers here
    )

And also:

    message = client.messages.create(
        max_tokens=max_tokens,
        messages=[
            {
                "role": "user",
                "content": content,
            }
        ],
        model="claude-3-5-sonnet@20240620",
        # https://x.com/alexalbert__/status/1812921642143900036
        extra_headers={"anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15"},  # Custom headers here
    )

But neither way appears to work in the AnthropicVertex SDK:

Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'max_tokens: 8192 > 4096, which is the maximum allowed number of output tokens for claude-3-5-sonnet-20240620'}}
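
Until the header is honoured, one possible workaround is to read the allowed limit out of that error message and retry with it. The sketch below is only an illustration: `allowed_max_tokens` and `create_with_fallback` are hypothetical helpers, the regular expression assumes the message keeps the `max_tokens: N > M` wording shown above, and the broad `except Exception` is there only to keep the example free of SDK imports.

```python
import re

# Matches the "max_tokens: 8192 > 4096" fragment of the error message above.
_LIMIT_RE = re.compile(r"max_tokens:\s*(\d+)\s*>\s*(\d+)")


def allowed_max_tokens(error_message):
    """Return the limit reported in the error text, or None if not found."""
    match = _LIMIT_RE.search(error_message)
    return int(match.group(2)) if match else None


def create_with_fallback(create, max_tokens, **kwargs):
    """Call `create`; if it rejects max_tokens, retry once with the reported limit.

    `create` is any callable that accepts max_tokens plus keyword arguments,
    for example client.messages.create.
    """
    try:
        return create(max_tokens=max_tokens, **kwargs)
    except Exception as exc:  # in real code, catch the SDK's 400 error type instead
        limit = allowed_max_tokens(str(exc))
        if limit is None or limit >= max_tokens:
            raise  # not the token-limit error, so do not hide it
        return create(max_tokens=limit, **kwargs)
```

With the client from earlier in this post, that would be called as `create_with_fallback(client.messages.create, 8192, model="claude-3-5-sonnet@20240620", messages=[...])`. It does not unlock 8192 tokens; it only avoids the hard failure by falling back to 4096.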
