Scale AI in South Africa using Amazon Bedrock global cross-Region inference with Anthropic Claude 4.5 models

Building AI applications with Amazon Bedrock can run into throughput limits that constrain how far your applications scale. Global cross-Region inference in the af-south-1 AWS Region changes that. You can now invoke models from the Cape Town Region while Amazon Bedrock automatically routes requests to Regions with available capacity. Your applications get consistent response times, your users get reliable experiences, and your Amazon CloudWatch and AWS CloudTrail logs stay centralized in af-south-1.

Global cross-Region inference on Amazon Bedrock in the Cape Town Region (af-south-1) gives you access to the Anthropic Claude 4.5 model family: Claude Sonnet 4.5, Haiku 4.5, and Opus 4.5. South African customers can now use global inference profiles to access these models with enhanced throughput and resilience. Global cross-Region inference routes requests to supported commercial Regions worldwide, optimizing resources and enabling higher throughput, which is particularly valuable during peak usage times. The feature supports Amazon Bedrock prompt caching, batch inference, Amazon Bedrock Guardrails, Amazon Bedrock Knowledge Bases, and more.

In this post, we walk through how global cross-Region inference routes requests and where your data resides, then show you how to configure the required AWS Identity and Access Management (IAM) permissions and invoke Claude 4.5 models using the global inference profile Amazon Resource Name (ARN). We also cover how to request quota increases for your workload. By the end, you’ll have a working implementation of global cross-Region inference in af-south-1.

Understanding cross-Region inference

Cross-Region inference is a powerful feature that organizations can use to seamlessly distribute inference processing across multiple Regions. This capability helps you get higher throughput while building at scale, allowing your generative AI applications to remain responsive and reliable even under heavy load.

An inference profile in Amazon Bedrock defines a foundation model (FM) and one or more Regions to which it can route model invocation requests. Inference profiles operate on two key concepts:

  • Source Region – The Region from which the API request is made
  • Destination Region – A Region to which Amazon Bedrock can route the request for inference
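
You can inspect an inference profile to see the destination Regions it can route to before you use it. The following is a minimal sketch using the Bedrock control-plane client; the profile ID shown is illustrative:

import boto3

# Use the Bedrock control-plane client (not bedrock-runtime) in your source Region
bedrock = boto3.client('bedrock', region_name='af-south-1')

# Illustrative profile ID - substitute the profile you plan to use
profile = bedrock.get_inference_profile(
    inferenceProfileIdentifier="global.anthropic.claude-opus-4-5-20251101-v1:0"
)

print("Profile:", profile['inferenceProfileName'])
# Each entry's modelArn indicates a destination Region the profile can route to
for model in profile['models']:
    print("Destination model ARN:", model['modelArn'])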

Cross-Region inference operates through the secure AWS network with end-to-end encryption for both data in transit and at rest. When a customer submits an inference request from a source Region, cross-Region inference intelligently routes the request to one of the destination Regions configured for the inference profile over the Amazon Bedrock managed network.

The key distinction is that while inference processing (the transient computation) can occur in another Region, data at rest, including logs, knowledge bases, and stored configurations, is designed to remain within your source Region. Sensitive information is designed to stay protected throughout the inference process, regardless of which Region handles the request, and encrypted responses are returned to your application in your source Region.

Amazon Bedrock provides two types of cross-Region inference profiles:

  1. Geographic cross-Region inference: Amazon Bedrock automatically selects the optimal commercial Region within a defined geography (US, EU, Australia, or Japan) to process your inference request. (Recommended for use cases with data residency needs.)
  2. Global cross-Region inference: Routes inference requests to supported commercial Regions worldwide, optimizing available resources and enabling higher model throughput. (Recommended for use cases that don’t have data residency needs.)
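
The two types are reflected in the inference profile ID prefix: geographic profiles use a geography prefix (for example, an ID beginning with eu. routes within Europe), while global profiles use the global. prefix, as in the examples later in this post.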

Monitoring and logging

With global cross-Region inference from af-south-1, your requests can be processed anywhere across the AWS global infrastructure. However, Amazon CloudWatch and AWS CloudTrail logs are recorded in af-south-1, simplifying monitoring by keeping your records in one place.
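
If you have model invocation logging enabled, you can confirm that it's configured in your source Region. A minimal sketch, assuming logging has already been set up in af-south-1:

import boto3

# The logging configuration lives in the source Region (af-south-1),
# even when inference is routed elsewhere
bedrock = boto3.client('bedrock', region_name='af-south-1')

config = bedrock.get_model_invocation_logging_configuration()
print(config.get('loggingConfig', 'Model invocation logging is not configured'))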

Data security and compliance

Security and compliance are a shared responsibility between AWS and each customer. Global cross-Region inference is designed to maintain data security. Data transmitted during cross-Region inference is encrypted by Amazon Bedrock and is designed to remain within the secure AWS network. Sensitive information remains protected throughout the inference process, regardless of which Region processes the request. Customers are responsible for configuring their applications and IAM policies appropriately and for evaluating whether global cross-Region inference meets their specific security and compliance requirements. Because global cross-Region inference routes requests to supported commercial Regions worldwide, you should evaluate whether this approach aligns with your regulatory obligations, including the Protection of Personal Information Act (POPIA) and other sector-specific requirements. We recommend consulting with your legal and compliance teams to determine the appropriate approach for your specific use cases.

Implement global cross-Region inference

To use global cross-Region inference with Claude 4.5 models, developers must complete the following key steps:

  1. Use the global inference profile ID – When making API calls to Amazon Bedrock, specify the global Claude 4.5 model’s inference profile ID (for example, global.anthropic.claude-opus-4-5-20251101-v1:0). This works with both the InvokeModel and Converse APIs.
  2. Configure IAM permissions – Grant IAM permissions to access the inference profile and FMs in potential destination Regions. In the next section, we provide more details. You can also read more about prerequisites for inference profiles.

Implementing global cross-Region inference with Claude 4.5 models is straightforward, requiring only a few changes to your existing application code. The following is an example of how to update your code in Python:

import boto3

# Connect to Amazon Bedrock from your deployed Region
bedrock = boto3.client('bedrock-runtime', region_name='af-south-1')

# Use the global cross-Region inference profile for Claude Opus 4.5
model_id = "global.anthropic.claude-opus-4-5-20251101-v1:0"

# Make the request - global cross-Region inference routes it to a Region with available capacity
response = bedrock.converse(
    messages=[
        {
            "role": "user", 
            "content": [{"text": "Explain cloud computing in 2 sentences."}]
        }
    ],
    modelId=model_id,
)

print("Response:", response['output']['message']['content'][0]['text'])
print("Token usage:", response['usage'])
print("Total tokens:", response['usage']['totalTokens'])

If you’re using the Amazon Bedrock InvokeModel API, you can quickly switch to a different model by changing the model ID, as shown in Invoke model code examples.
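
For reference, the following is a minimal sketch of the same request using the InvokeModel API with the Anthropic Messages request format; the prompt and max_tokens value are illustrative:

import boto3
import json

bedrock = boto3.client('bedrock-runtime', region_name='af-south-1')

# InvokeModel takes a model-specific JSON body - this is the Anthropic Messages format
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [
        {"role": "user", "content": "Explain cloud computing in 2 sentences."}
    ]
})

response = bedrock.invoke_model(
    modelId="global.anthropic.claude-opus-4-5-20251101-v1:0",
    body=body,
)

result = json.loads(response['body'].read())
print("Response:", result['content'][0]['text'])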

IAM policy requirements for global cross-Region inference

Global cross-Region inference requires three specific permissions because the routing mechanism spans multiple scopes: your Regional inference profile, the FM definition in your source Region, and the FM definition at the global level. Without all three, the service cannot resolve the model, validate your access, or route requests across Regions.

Access to Anthropic models requires a use case submission before invoking a model. This submission can be completed at either the individual account level or centrally through the organization’s management account. To submit your use case, use the PutUseCaseForModelAccess API or select an Anthropic model from the model catalog in the AWS Management Console for Amazon Bedrock. AWS Marketplace permissions are required to enable models and can be scoped to specific product IDs where supported.

The following example IAM policy provides granular control:

{
    "Version": "2012-10-17",
    "Statement": [{
            "Sid": "GrantGlobalCrisInferenceProfileRegionAccess",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": [
                "arn:aws:bedrock:af-south-1:<ACCOUNT>:inference-profile/global.<MODEL NAME>"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:RequestedRegion": "af-south-1"
                }
            }
        },
        {
            "Sid": "GrantGlobalCrisInferenceProfileInRegionModelAccess",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": [
                "arn:aws:bedrock:af-south-1::foundation-model/<MODEL NAME>"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:RequestedRegion": "af-south-1",
                    "bedrock:InferenceProfileArn": "arn:aws:bedrock:af-south-1:<ACCOUNT>:inference-profile/global.<MODEL NAME>"
                }
            }
        },
        {
            "Sid": "GrantGlobalCrisInferenceProfileGlobalModelAccess",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": [
                "arn:aws:bedrock:::foundation-model/<MODEL NAME> "
            ],
            "Condition": {
                "StringEquals": {
                    "aws:RequestedRegion": "unspecified",
                    "bedrock:InferenceProfileArn": "arn:aws:bedrock:af-south-1:<ACCOUNT>:inference-profile/global.<MODEL NAME>"
                }
            }
        }
    ]
}

The policy comprises three parts. The first statement grants access to the Regional inference profile in af-south-1, so that users can invoke the specified global inference profile from South Africa. The second statement provides access to the Regional FM resource, which the service needs to understand which model is being requested within the Regional context. The third statement grants access to the global FM resource, which allows cross-Region routing to function.

When implementing these policies, verify that the three ARNs are included:

  • The Regional inference profile ARN follows the pattern arn:aws:bedrock:af-south-1:<ACCOUNT>:inference-profile/global.<MODEL NAME>. This grants access to the global inference profile in your source Region.
  • The Regional FM uses arn:aws:bedrock:af-south-1::foundation-model/<MODEL NAME>. This grants access to the model definition in af-south-1.
  • The global FM requires arn:aws:bedrock:::foundation-model/<MODEL NAME>. This grants access to the model across Regions.

The global FM ARN has no Region or account specified, which is intentional and required for the cross-Region functionality.

Important note on Service Control Policies (SCPs): If your organization uses Region-specific SCPs, verify that "aws:RequestedRegion": "unspecified" isn’t included in the deny Regions list, because global cross-Region inference requests use this Region value. Organizations whose restrictive SCPs deny all Regions except specifically approved ones will need to explicitly allow this value to enable global cross-Region inference functionality.

If your organization determines that global cross-Region inference isn’t appropriate for certain workloads because of data residency or compliance requirements, you can disable it using one of two approaches:

  • Remove IAM permissions – Remove one or more of the three required IAM policy statements. Because global cross-Region inference requires the three statements to function, removing one of these statements causes requests to the global inference profile to return an access denied error.
  • Implement an explicit deny policy – Create a deny policy that specifically targets global cross-Region inference profiles using the condition "aws:RequestedRegion": "unspecified", as shown in the example after this list. This approach clearly documents your security intent, and the explicit deny takes precedence even if allow policies are accidentally added later.
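
A minimal sketch of such a deny policy follows; the statement ID is illustrative:

{
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyGlobalCrisInference",
        "Effect": "Deny",
        "Action": "bedrock:InvokeModel",
        "Resource": "*",
        "Condition": {
            "StringEquals": {
                "aws:RequestedRegion": "unspecified"
            }
        }
    }]
}

Depending on which actions your applications use, you may also want to include bedrock:InvokeModelWithResponseStream in the Action list.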

Request limit increases for global cross-Region inference

When using global cross-Region inference profiles from af-south-1, you can request quota increases through the AWS Service Quotas console. Because this is a global limit, requests must be made in your source Region (af-south-1).

Before requesting an increase, calculate your required quota using the burndown rate for your model. For Sonnet 4.5 and Haiku 4.5, output tokens have a five-fold burndown rate—each output token consumes 5 tokens from your quota—while input tokens maintain a 1:1 ratio. Your total token consumption per request is:

Input token count + Cache write input tokens + (Output token count × Burndown rate)
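
For example, a request on Sonnet 4.5 with 1,000 input tokens, no cache writes, and 200 output tokens consumes 1,000 + 0 + (200 × 5) = 2,000 tokens against your quota.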

To request a limit increase:

  1. Sign in to the AWS Service Quotas console in af-south-1.
  2. In the navigation pane, choose AWS services.
  3. Find and choose Amazon Bedrock.
  4. Search for the specific global cross-Region inference quotas (for example, Global cross-Region model inference tokens per minute for Claude Sonnet 4.5 V1).
  5. Select the quota and choose Request increase at account level.
  6. Enter your desired quota value and submit the request.
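
If you prefer to script this, the same request can be made through the Service Quotas API. A minimal sketch, assuming you first look up the quota code by name (the QuotaCode and DesiredValue below are placeholders):

import boto3

# Quota requests for global inference must be made in the source Region
quotas = boto3.client('service-quotas', region_name='af-south-1')

# Find the quota code by listing Amazon Bedrock quotas and matching on the name
paginator = quotas.get_paginator('list_service_quotas')
for page in paginator.paginate(ServiceCode='bedrock'):
    for quota in page['Quotas']:
        if 'Global cross-Region' in quota['QuotaName']:
            print(quota['QuotaName'], quota['QuotaCode'], quota['Value'])

# Submit the increase request (QuotaCode and DesiredValue are placeholders)
quotas.request_service_quota_increase(
    ServiceCode='bedrock',
    QuotaCode='L-XXXXXXXX',
    DesiredValue=2000000.0,
)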

Conclusion

Global cross-Region inference brings the Claude 4.5 model family to the Cape Town Region, giving you access to the same capabilities available in other Regions. You can build with Sonnet 4.5, Haiku 4.5, and Opus 4.5 from your local Region while the routing infrastructure handles distribution transparently. To get started, update your applications to use the global inference profile ID, configure appropriate IAM permissions, and monitor performance as your applications use the worldwide AWS infrastructure. Visit the Amazon Bedrock console and explore how global cross-Region inference can enhance your AI applications.


About the authors

Christian Kamwangala is an AI/ML and Generative AI Specialist Solutions Architect at AWS, where he partners with enterprise customers to architect, optimize, and deploy production-grade AI solutions. His expertise lies in inference optimization—balancing performance, cost, and latency for large-scale deployments. Outside of work, he enjoys exploring nature and spending time with family and friends.

Jarryd Konar is a Senior Cloud Support Engineer at AWS, based in Cape Town, South Africa. He specializes in helping customers architect, optimize, and operate AI/ML and generative AI workloads in the cloud. Jarryd works closely with customers to implement best practices across the AWS AI/ML service portfolio, turning complex technical requirements into practical, scalable solutions. He is passionate about building sustainable and secure AI systems that empower both customers and teams.

Melanie Li, PhD, is a Senior Generative AI Specialist Solutions Architect at AWS based in Sydney, Australia, where her focus is on working with customers to build solutions using state-of-the-art AI/ML tools. She has been actively involved in multiple generative AI initiatives across APJ, harnessing the power of LLMs. Prior to joining AWS, Dr. Li held data science roles in the financial and retail industries.

Saurabh Trikande is a Senior Product Manager for Amazon Bedrock and Amazon SageMaker Inference. He is passionate about working with customers and partners, motivated by the goal of democratizing AI. He focuses on core challenges related to deploying complex AI applications, inference with multi-tenant models, cost optimizations, and making the deployment of generative AI models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.

Jared Dean is a Principal AI/ML Solutions Architect at AWS. Jared works with customers across industries to develop machine learning applications that improve efficiency. He is interested in all things AI, technology, and BBQ.