Model Overview

Description

GR00T-N1.6-Rheo-PickNPlace is a vision-language-action (VLA) model fine-tuned for surgical instrument handling preparation in the Isaac for Healthcare Rheo workflow. It picks a sterilized box from a shelf and places it on a cart using a G1 embodiment. This model is ready for commercial/non-commercial use.

License/Terms of Use

Governing Terms: Your usage of the GR00T-N1.6-Rheo-PickNPlaceTray model is governed by the NVIDIA License.
You are responsible for ensuring that your use of NVIDIA provided models complies with all applicable laws.

Deployment Geography

Global

Use Case

This model is intended for Rheo simulation workflows focused on surgical instrument handling (pick-and-place of a sterilized box from shelf to cart). It is not intended for real-world clinical deployment.

Release Date

Hugging Face (03/10/2026) via https://huggingface.co/nvidia/GR00T-N1.6-Rheo-PickNPlaceTray/tree/main

Reference(s)

NVIDIA Isaac-GR00T N1.6
Isaac for Healthcare

Model Architecture

Architecture Type: Vision Language Action model
Network Architecture: GR00T N1.6
This model was developed based on GR00T N1.6.
Number of model parameters: 3 billion

Computational Load

Cumulative Compute: 2.45×10^19 FLOPs (hardware-based calculation using single NVIDIA H100 NVL for training)

Estimated Energy and Emissions for Model Training: 5.37 kWh, 0.00217 tCO₂e
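
As a quick sanity check, the reported energy and emissions figures imply a grid emission factor, and the compute and energy figures together imply an energy efficiency. The sketch below only performs that arithmetic. The variable names are illustrative, and both derived quantities are inferences from the reported numbers, not values stated in this card.

```python
# Reported values from the model card.
cumulative_flops = 2.45e19   # FLOPs
energy_kwh = 5.37            # kWh
emissions_tco2e = 0.00217    # metric tons CO2e

# Derived quantities (not stated in the card; computed here for illustration).
emission_factor_kg_per_kwh = emissions_tco2e * 1000.0 / energy_kwh
joules_per_kwh = 3.6e6
flops_per_joule = cumulative_flops / (energy_kwh * joules_per_kwh)

print(f"Implied emission factor: {emission_factor_kg_per_kwh:.3f} kg CO2e/kWh")
print(f"Implied efficiency: {flops_per_joule:.3e} FLOPs per joule")
```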

Input(s)

Input Type(s): Vision, State, Language Instruction
Input Format(s):

Vision: RGB images (uint8)
State: Floating point
Language Instruction: String

Input Parameters:

Vision: Two-Dimensional (2D)
State: One-Dimensional (1D)
Language Instruction: One-Dimensional (1D)

Other Properties Related to Input:

Vision: Raw 480x640 uint8 RGB frames from robot head camera; training preprocessing uses shortest_edge=256 with crop_fraction=0.95 (albumentations).
State: 1x31 vector.
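
The preprocessing parameters above determine the image size the model sees during training. The following is a minimal sketch of that geometry, assuming the resize preserves aspect ratio and scales the shorter side to `shortest_edge`, and that the crop keeps `crop_fraction` of each resized side. The helper name `preprocessed_size` and the rounding convention are assumptions, not taken from the training code or from albumentations.

```python
def preprocessed_size(height, width, shortest_edge=256, crop_fraction=0.95):
    """Return ((resized_h, resized_w), (crop_h, crop_w)) for one frame.

    Assumptions: aspect-preserving resize of the shorter side to
    `shortest_edge`, then a crop keeping `crop_fraction` of each side,
    with sizes rounded to the nearest integer.
    """
    scale = shortest_edge / min(height, width)
    resized = (round(height * scale), round(width * scale))
    crop = (round(resized[0] * crop_fraction), round(resized[1] * crop_fraction))
    return resized, crop


# Raw head-camera frame size from the card: 480 rows by 640 columns.
resized, crop = preprocessed_size(480, 640)
print("resized:", resized, "cropped:", crop)
```

Under these assumptions, a 480x640 frame is resized to 256x341 and cropped to 243x324. The actual pipeline may round differently.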

Output(s)

Output Type(s): Actions
Output Format(s): Continuous-value vectors
Output Parameters: Two-Dimensional (2D), 16x32 tensor
Other Properties Related to Output: Continuous-value vectors correspond to different motor controls on the robot embodiment.
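
The 16x32 output can be handled as a sequence of per-step command vectors. The sketch below assumes the first dimension (16) indexes consecutive time steps and the second (32) indexes motor-control channels. The card does not state this ordering, and `split_action_chunk` is a hypothetical helper, not part of any GR00T API.

```python
def split_action_chunk(actions, horizon=16, action_dim=32):
    """Validate a nested-list action tensor and return one tuple per time step."""
    if len(actions) != horizon:
        raise ValueError(f"expected {horizon} rows, got {len(actions)}")
    steps = []
    for t, row in enumerate(actions):
        if len(row) != action_dim:
            raise ValueError(f"row {t}: expected {action_dim} values, got {len(row)}")
        steps.append(tuple(float(v) for v in row))
    return steps


# Placeholder with the documented shape (all zeros, not real model output).
dummy_actions = [[0.0] * 32 for _ in range(16)]
steps = split_action_chunk(dummy_actions)
print(len(steps), len(steps[0]))
```

Checking the shape before sending commands to a controller catches a transposed or truncated tensor early.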

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration

Runtime Engine(s): PyTorch 2.8.0

Supported Hardware Microarchitecture Compatibility:

NVIDIA Ampere
NVIDIA Blackwell
NVIDIA Hopper

Preferred/Supported Operating System(s):

Linux (Ubuntu 22.04/24.04 LTS)

Model Version(s)

GR00T-N1.6-Rheo-PickNPlace

Training Datasets, Testing, and Evaluation Datasets

Manual teleoperation and IsaacLab mimic generation.

Training Dataset

Total Size: 120 samples
Text Training Data Size: Less than a Billion Tokens
Video Training Data Size: Less than 10,000 Hours
Non-Audio, Image, Text Training Data Size:

Image/Video Data: RGB video frames from robot head camera (640x480 pixels)
Text Data: 120 language instruction strings by human labelling
Action Data: 120 episodes of robot action trajectories (state observations and action sequences)

Data Modality:

Text
Video
Action

Data Collection Method by dataset: Automatic/Sensors
Labeling Method by dataset: Human

Data Properties:
Quantity: 120 simulation samples
Modalities: Multi-modal data consisting of (i) RGB video frames, (ii) text-based language instructions, (iii) robot state observations
Nature of Content: Data from Isaac Sim simulation environment collected in Isaac Lab mimic; no personal data or copyright-protected content; data represents surgical instrument manipulation tasks
Linguistic Characteristics: Language instructions describing surgical instrument preparation

Sensor(s):
Vision sensors: RGB cameras (robot head-mounted) capturing 640x480 pixel images in simulation
Action sensors: Motor sensors on G1 embodiment

Testing Datasets

Data Collection Method by dataset: Not Applicable
Labeling Method by dataset: Not Applicable
Data Properties: The evaluation was performed in simulation using the Isaac for Healthcare Rheo workflow. The testing data consists of dynamically generated episodes of the pick-and-place task.

Evaluation Datasets

Inference

Engine: PyTorch
Test Hardware: NVIDIA RTX 5880 Ada Generation
Inference mode / Latency / Memory: PyTorch 92.4 ± 1.3 ms, 8 GB
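
A latency figure in the "mean ± standard deviation" form above can be obtained by timing repeated calls after a warm-up phase. The card does not describe how its measurement was taken, so the following is only a generic sketch using the Python standard library. `measure_latency_ms` and `fake_policy_step` are hypothetical names, and the stand-in workload is not the model.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=3, runs=20):
    """Time `fn()` and return (mean_ms, std_ms) over `runs` calls after warm-up."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)


def fake_policy_step():
    # Hypothetical stand-in for one model forward pass; replace with a real call.
    sum(i * i for i in range(10_000))


mean_ms, std_ms = measure_latency_ms(fake_policy_step)
print(f"{mean_ms:.2f} ± {std_ms:.2f} ms")
```

When timing a PyTorch model on a GPU, synchronize the device (for example with `torch.cuda.synchronize()`) before reading the clock, because CUDA kernels run asynchronously.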

Limitations

This model was trained on data from the Isaac for Healthcare Rheo workflow, so it is expected to perform well only in that specific operating room environment. It is not expected to generalize to different robot platforms, environments, or surgical procedures outside of the trained domain.

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards.

Please make sure you have proper rights and permissions for all input image and video content; if image or video includes people, personal health information, or intellectual property, the image or video generated will not blur or maintain proportions of image subjects included.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
