UC San Diego Lab Advances Generative AI Research With NVIDIA DGX...

The Hao AI Lab research team at the University of California San Diego, at the forefront of pioneering AI model innovation, recently...

Into the Omniverse: OpenUSD and NVIDIA Halos Accelerate Safety for Robotaxis,...

Editor’s note: This post is part of Into the Omniverse, a series focused on how developers, 3D practitioners and enterprises can transform their workflows...

Tracking and managing assets used in AI development with Amazon SageMaker AI 

Building custom foundation models requires coordinating multiple assets across the development lifecycle such as data assets, compute infrastructure, model architecture and frameworks, lineage, and...

Track machine learning experiments with MLflow on Amazon SageMaker using Snowflake...

A user can conduct machine learning (ML) data experiments in data environments, such as Snowflake, using the Snowpark library. However, tracking these experiments across...

Governance by design: The essential guide for successful AI scaling

Picture this: Your enterprise has just deployed its first generative AI application. The initial results are promising, but as you plan to scale across...

Unlocking video understanding with TwelveLabs Marengo on Amazon Bedrock

Media and entertainment, advertising, education, and enterprise training content combines visual, audio, and motion elements to tell stories and convey information, making it far...

How Tata Power CoE built a scalable AI-powered solar panel inspection...

This post is co-written with Vikram Bansal from Tata Power, and Gaurav Kankaria and Omkar Dhavalikar from Oneture. The global adoption of solar energy is rapidly...

Checkpointless training on Amazon SageMaker HyperPod: Production-scale training with faster fault...

Foundation model training has reached an inflection point where traditional checkpoint-based recovery methods are becoming a bottleneck to efficiency and cost-effectiveness. As models grow...

Adaptive infrastructure for foundation model training with elastic training on SageMaker...

Modern AI infrastructure serves multiple concurrent workloads on the same cluster, from foundation model (FM) pre-training and fine-tuning to production inference and evaluation. In...

NVIDIA Acquires Open-Source Workload Management Provider SchedMD

NVIDIA today announced it has acquired SchedMD — the leading developer of Slurm, an open-source workload management system for high-performance computing (HPC) and AI...