GenAI News

Build an offline feature store using Amazon SageMaker Unified Studio and SageMaker Catalog

Building and managing machine learning (ML) features at scale is one of the most critical and complex challenges in modern data science workflows. Organizations often struggle with fragmented feature pipelines, inconsistent data definitions, and redundant engineering efforts across teams....

How Workhuman built multi-tenant self-service reporting using Amazon Quick Sight embedded dashboards

This post is cowritten with Ilija Subanovic and Michael Rice from Workhuman. Workhuman’s customer service and analytics team were drowning in one-time reporting requests from seven million users worldwide—a common challenge with legacy reporting tools at scale. Business intelligence (BI)...

P-EAGLE: Faster LLM inference with Parallel Speculative Decoding in vLLM

EAGLE is the state-of-the-art method for speculative decoding in large language model (LLM) inference, but its autoregressive drafting creates a hidden bottleneck: the more tokens that you speculate, the more sequential forward passes the drafter needs. Eventually those overhead...

Secure AI agents with Policy in Amazon Bedrock AgentCore

Deploying AI agents safely in regulated industries is challenging. Without proper boundaries, agents that access sensitive data or execute transactions can pose significant security risks. Unlike traditional software, an AI agent chooses actions to achieve a goal by invoking...

Improve operational visibility for inference workloads on Amazon Bedrock with new CloudWatch metrics for TTFT and Estimated Quota Consumption

As organizations scale their generative AI workloads on Amazon Bedrock, operational visibility into inference performance and resource consumption becomes critical. Teams running latency-sensitive applications must understand how quickly models begin generating responses. Teams managing high-throughput workloads must understand how...

Fine-tuning NVIDIA Nemotron Speech ASR on Amazon EC2 for domain adaptation

This post is a collaboration between AWS, NVIDIA and Heidi.  Automatic speech recognition (ASR), often called speech-to-text (STT) is becoming increasingly critical across industries like healthcare, customer service, and media production. While pre-trained models offer strong capabilities for general speech,...
spot_img

latest articles