Model Overview
Description
GR00T-N1.6-Rheo-PickNPlace is a vision-language-action (VLA) model fine-tuned for surgical instrument handling in the Isaac for Healthcare Rheo workflow. It picks a sterilized box from a shelf and places it on a cart using a G1 embodiment. This model is ready for commercial/non-commercial use.
License/Terms of Use
Governing Terms: Your usage of the GR00T-N1.6-Rheo-PickNPlaceTray model is governed by the NVIDIA License.
You are responsible for ensuring that your use of NVIDIA provided models complies with all applicable laws.
Deployment Geography
Global
Use Case
This model is intended for Rheo simulation workflows focused on surgical instrument handling (pick-and-place of a sterilized box from shelf to cart). It is not intended for real-world clinical deployment.
Release Date
Hugging Face (03/10/2026) via https://huggingface.co/nvidia/GR00T-N1.6-Rheo-PickNPlaceTray/tree/main
Reference(s)
- NVIDIA Isaac-GR00T N1.6
- Isaac for Healthcare
Model Architecture
Architecture Type: Vision Language Action model
Network Architecture: GR00T N1.6
This model was developed based on GR00T N1.6.
Number of model parameters: 3 billion
Computational Load
Cumulative Compute: 2.45×10^19 FLOPs (hardware-based calculation using single NVIDIA H100 NVL for training)
Estimated Energy and Emissions for Model Training: 5.37 kWh, 0.00217 tCO₂e
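The reported compute, energy, and emissions figures can be related with simple arithmetic. The sketch below only derives ratios from the three numbers stated above; the function names are illustrative, and the card does not state which grid carbon intensity was used, so the derived value is an implied figure rather than a documented one.

```python
# Figures copied from this model card.
TOTAL_FLOPS = 2.45e19       # cumulative training compute
ENERGY_KWH = 5.37           # estimated training energy
EMISSIONS_TCO2E = 0.00217   # estimated training emissions, tonnes CO2e


def implied_carbon_intensity(energy_kwh: float, emissions_t: float) -> float:
    """Return the kg CO2e per kWh implied by the two reported figures."""
    return emissions_t * 1000.0 / energy_kwh


def flops_per_kwh(total_flops: float, energy_kwh: float) -> float:
    """Return the training FLOPs delivered per kWh of energy."""
    return total_flops / energy_kwh


intensity = implied_carbon_intensity(ENERGY_KWH, EMISSIONS_TCO2E)
efficiency = flops_per_kwh(TOTAL_FLOPS, ENERGY_KWH)
print(f"Implied carbon intensity: {intensity:.3f} kg CO2e/kWh")
print(f"Compute per unit energy: {efficiency:.3e} FLOPs/kWh")
```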
Input(s)
Input Type(s): Vision, State, Language Instruction
Input Format(s):
- Vision: RGB images (uint8)
- State: Floating point
- Language Instruction: String
Input Parameters:
- Vision: Two-Dimensional (2D)
- State: One-Dimensional (1D)
- Language Instruction: One-Dimensional (1D)
Other Properties Related to Input:
- Vision: Raw 480x640 uint8 RGB frames from robot head camera; training preprocessing uses shortest_edge=256 with crop_fraction=0.95 (albumentations).
- State: 1x31 vector.
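To make the preprocessing parameters concrete, the following sketch computes the frame sizes they would produce for a 480x640 input. It assumes the resize to shortest_edge=256 preserves aspect ratio and is applied before a crop that keeps crop_fraction=0.95 of each side; the order, the rounding, and the helper names are assumptions, not taken from the training code.

```python
def resize_shortest_edge(height: int, width: int, shortest_edge: int) -> tuple:
    """Scale (height, width) so the shorter side equals shortest_edge."""
    short = min(height, width)
    new_h = round(height * shortest_edge / short)
    new_w = round(width * shortest_edge / short)
    return new_h, new_w


def fractional_crop_size(height: int, width: int, crop_fraction: float) -> tuple:
    """Return the crop size that keeps crop_fraction of each side."""
    return round(height * crop_fraction), round(width * crop_fraction)


# Raw head-camera frame size from this card: 480 rows by 640 columns.
resized = resize_shortest_edge(480, 640, shortest_edge=256)
cropped = fractional_crop_size(*resized, crop_fraction=0.95)
print("after resize:", resized)
print("after crop:", cropped)
```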
Output(s)
Output Type(s): Actions
Output Format(s): Continuous-value vectors
Output Parameters: Two-Dimensional (2D), 16x32 tensor
Other Properties Related to Output: Continuous-value vectors correspond to different motor controls on the robot embodiment.
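The input and output shapes listed above can be captured as a small shape contract for checking data before and after inference. This is a minimal sketch: the dictionary keys (`head_camera_rgb`, `state`) and the channels-last image layout are hypothetical, and the actual key names expected by the GR00T policy interface are not given in this card.

```python
# Shapes as documented in this card. Key names and the channels-last
# layout (height, width, channels) are assumptions for illustration.
INPUT_SHAPES = {
    "head_camera_rgb": (480, 640, 3),  # raw uint8 RGB frame
    "state": (1, 31),                  # floating-point state vector
}
OUTPUT_SHAPE = (16, 32)                # continuous-value action tensor


def shape_mismatches(observation_shapes: dict) -> list:
    """Return a list of human-readable problems; empty if all shapes match."""
    problems = []
    for key, expected in INPUT_SHAPES.items():
        actual = observation_shapes.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected}, got {actual}")
    return problems


def is_valid_action(action_shape: tuple) -> bool:
    """Check that a predicted action tensor has the documented shape."""
    return tuple(action_shape) == OUTPUT_SHAPE


good = {"head_camera_rgb": (480, 640, 3), "state": (1, 31)}
bad = {"head_camera_rgb": (480, 640, 3), "state": (31,)}
print(shape_mismatches(good))
print(shape_mismatches(bad))
```

Passing shape tuples rather than arrays keeps the check independent of any particular array library; with NumPy or PyTorch the same function can be fed each array's `.shape` converted to a tuple.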
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration
Runtime Engine(s): PyTorch 2.8.0
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Ampere
- NVIDIA Blackwell
- NVIDIA Hopper
Preferred/Supported Operating System(s):
- Linux (Ubuntu 22.04/24.04 LTS)
Model Version(s)
GR00T-N1.6-Rheo-PickNPlace
Training Datasets, Testing, and Evaluation Datasets
Manual teleoperation and IsaacLab mimic generation.
Training Dataset
Total Size: 120 samples
Text Training Data Size: Less than a Billion Tokens
Video Training Data Size: Less than 10,000 Hours
Non-Audio, Image, Text Training Data Size:
Image/Video Data: RGB video frames from robot head camera (640x480 pixels)
Text Data: 120 language instruction strings by human labelling
Action Data: 120 episodes of robot action trajectories (state observations and action sequences)
Data Modality:
- Text
- Video
- Action
Data Collection Method by dataset: Automatic/Sensors
Labeling Method by dataset: Human
Data Properties:
Quantity: 120 simulation samples
Modalities: Multi-modal data consisting of (i) RGB video frames, (ii) text-based language instructions, (iii) robot state observations
Nature of Content: Data from Isaac Sim simulation environment collected in Isaac Lab mimic; no personal data or copyright-protected content; data represents surgical instrument manipulation tasks
Linguistic Characteristics: Language instructions describing surgical instrument preparation
Sensor(s):
Vision sensors: RGB cameras (robot head-mounted) capturing 640x480 pixel images in simulation
Action sensors: Motor sensors on G1 embodiment
Testing Datasets
Data Collection Method by dataset: Not Applicable
Labeling Method by dataset: Not Applicable
Data Properties: The evaluation was performed in simulation using the Isaac for Healthcare Rheo workflow. The testing data consists of dynamically generated episodes of the pick-and-place task.
Evaluation Datasets
Data Collection Method by dataset: Not Applicable
Labeling Method by dataset: Not Applicable
Data Properties: The evaluation was performed in simulation using the Isaac for Healthcare Rheo workflow. The testing data consists of dynamically generated episodes of the pick-and-place task.
Inference
Engine: PyTorch
Test Hardware: NVIDIA RTX 5880 Ada Generation
Inference mode / Latency / Memory: PyTorch / 92.4 ± 1.3 ms / 8 GB
Limitations
This model was trained only on data from the Isaac for Healthcare Rheo workflow, so it is expected to perform well only in that specific simulated operating room environment. It is not expected to generalize to other robot platforms, environments, or surgical procedures outside the trained domain.
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards.
Please make sure you have proper rights and permissions for all input image and video content; if image or video includes people, personal health information, or intellectual property, the image or video generated will not blur or maintain proportions of image subjects included.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.