In December 2025, we announced the availability of reinforcement fine-tuning (RFT) on Amazon Bedrock, starting with support for Nova models. This was followed by extended support for open-weight models such as OpenAI GPT OSS 20B and Qwen 3 32B in February 2026. RFT...
Nemotron 3 Super is now available as a fully managed, serverless model on Amazon Bedrock, joining the Nemotron Nano models already available on the platform.
With NVIDIA Nemotron open models on Amazon Bedrock, you can accelerate innovation and deliver tangible...
Despite a wide array of Nova customization offerings, customizing models and transitioning between platforms has traditionally been intricate, requiring technical expertise, infrastructure setup, and considerable time investment. This gap between potential and practical application is precisely what we aimed to address. Nova...
We thank Greg Pereira and Robert Shaw from the llm-d team for their support in bringing llm-d to AWS.
In the agentic and reasoning era, large language models (LLMs) generate 10x more tokens, and consume correspondingly more compute, through complex reasoning chains compared to single-shot replies. Agentic AI...
EAGLE is the state-of-the-art method for speculative decoding in large language model (LLM) inference, but its autoregressive drafting creates a hidden bottleneck: the more tokens you speculate, the more sequential forward passes the drafter needs. Eventually that overhead eats into your gains. P-EAGLE...
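The trade-off described above can be sketched with a simple cost model (this is an illustrative model of speculative decoding in general, not P-EAGLE's actual method; the acceptance rate `a` and relative drafter cost `c` are assumed parameters, not measured values):

```python
# Toy model: speedup of speculative decoding versus draft length k,
# assuming each drafter forward pass costs a fraction c of one
# target-model pass and each drafted token is accepted independently
# with probability a.

def expected_accepted(a: float, k: int) -> float:
    # Expected tokens produced per verification cycle (including the
    # target model's bonus token): (1 - a^(k+1)) / (1 - a).
    return (1 - a ** (k + 1)) / (1 - a)

def speedup(a: float, c: float, k: int) -> float:
    # Cost per cycle: k sequential drafter passes plus one target pass.
    return expected_accepted(a, k) / (k * c + 1)

if __name__ == "__main__":
    a, c = 0.8, 0.05  # hypothetical values for illustration
    for k in (1, 2, 4, 8, 16, 32):
        print(f"k={k:2d}  speedup={speedup(a, c, k):.2f}")
```

Under these assumed parameters the speedup rises with the draft length, peaks, and then declines, since expected accepted tokens saturate while the sequential drafting cost keeps growing linearly in k.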
This post is a collaboration between AWS, NVIDIA, and Heidi.
Automatic speech recognition (ASR), often called speech-to-text (STT), is becoming increasingly critical across industries like healthcare, customer service, and media production. While pre-trained models offer strong capabilities for general speech, fine-tuning for specific domains and...