Adoption of generative AI inference has grown as organizations build more operational workloads that use AI capabilities in production at scale. To help customers scale their generative AI applications, Amazon Bedrock offers cross-Region inference (CRIS) profiles. CRIS is...
This post is cowritten with Abdullahi Olaoye, Curtice Lockhart, and Nirmal Kumar Juluru from NVIDIA.
We are excited to announce that NVIDIA’s Nemotron 3 Nano is now available as a fully managed and serverless model in Amazon Bedrock. This follows our earlier announcement at AWS re:Invent...
Organizations increasingly deploy custom large language models (LLMs) on Amazon SageMaker AI real-time endpoints using their preferred serving frameworks—such as SGLang, vLLM, or TorchServe—to help gain greater control over their deployments, optimize costs, and align with compliance requirements. However, this flexibility introduces a critical...
As your conversational AI initiatives evolve, developing Amazon Lex assistants becomes increasingly complex. When multiple developers work on the same shared Lex instance, they run into configuration conflicts, overwritten changes, and slower iteration cycles. Scaling Amazon Lex development requires isolated environments, version control, and automated deployment...
This post is cowritten by Jeremy Jacobson and Rado Fulek from Ricoh.
This post demonstrates how enterprises can overcome document processing scaling limits by combining generative AI, serverless architecture, and standardized frameworks. Ricoh engineered a repeatable, reusable framework using the AWS GenAI Intelligent Document Processing...
Call center analytics play a crucial role in improving customer experience and operational efficiency. With foundation models (FMs), you can improve the quality and efficiency of call center operations and analytics. Organizations can use generative AI to assist human customer support agents and managers...