I am facing a critical issue with a dedicated Inference Endpoint where the container unexpectedly re-initializes while it is already in the RUNNING state and actively processing API requests.
Problem description:
- The endpoint shows status RUNNING
- An API request is sent and processing starts
- During processing, the endpoint restarts / re-initializes automatically
- The request fails with 500 Internal Server Error
- After re-initialization, the endpoint becomes RUNNING again
Additional observations:
- The endpoint logs do not show any error, crash, or exception before the restart
- There is no warning or failure message prior to entering the Initializing state
- GPU and CPU utilization remain well below maximum (verified via analytics)
- Memory usage also appears stable
Based on this, it does not seem to be related to:
- OOM kills
- memory limits
- GPU/CPU resource exhaustion
Deployment details:
- This endpoint uses a custom Docker image
- The inference system is a custom pipeline
- Multiple models are involved in the workflow
- Models are dynamically loaded and offloaded (weights moved in/out of GPU/CPU memory) depending on the requested operation, since different models are used sequentially within a single request (a simplified sketch of this pattern follows below)
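For context, the load/offload logic inside the pipeline roughly follows the pattern below. This is a heavily simplified sketch: the placeholder `nn.Linear` modules and the function names stand in for the actual (much larger) models and pipeline code.

```python
import torch
import torch.nn as nn

# Placeholder models; the real pipeline uses several large pretrained models.
MODELS = {
    "stage_a": nn.Linear(1024, 1024),
    "stage_b": nn.Linear(1024, 256),
}

def run_stage(name: str, inputs: torch.Tensor) -> torch.Tensor:
    """Move one model's weights onto the GPU, run it, then offload again."""
    model = MODELS[name].to("cuda")      # load weights into GPU memory
    try:
        with torch.no_grad():
            return model(inputs.to("cuda")).to("cpu")
    finally:
        model.to("cpu")                  # offload weights back to CPU memory
        torch.cuda.empty_cache()         # release cached GPU allocations

def handle_request(batch: torch.Tensor) -> torch.Tensor:
    # Different models are used sequentially within a single request.
    hidden = run_stage("stage_a", batch)
    return run_stage("stage_b", hidden)
```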
Impact:
- Ongoing inference jobs are terminated mid-execution
- API clients receive 500 errors
- Production usage becomes unreliable despite stable traffic patterns
Questions:
- Under what conditions can an inference endpoint automatically re-initialize while in the RUNNING state?
- Could this be related to:
  - platform-level container restarts?
  - autoscaling mechanisms?
- Are there recommended configurations or best practices to prevent restarts / re-initialization during long-running inference requests (a simplified sketch of the serving entrypoint follows below), especially when using:
  - custom Docker images
  - multi-model pipelines
  - dynamic model loading/offloading?
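For reference, the serving entrypoint inside the custom image roughly follows the shape below. This is only a simplified illustration: FastAPI, the route names, and `run_pipeline` are placeholders for the actual server code.

```python
# Simplified stand-in for the custom container's server; the framework,
# routes, and pipeline call are placeholders for the real implementation.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InferenceRequest(BaseModel):
    operation: str   # selects which models the pipeline loads/offloads
    payload: dict

def run_pipeline(operation: str, payload: dict) -> dict:
    # Stand-in for the multi-model pipeline sketched above.
    return {"operation": operation, "echo": payload}

@app.get("/health")
def health():
    # Lightweight liveness/readiness check; returns immediately.
    return {"status": "ok"}

@app.post("/predict")
def predict(request: InferenceRequest):
    # A single request can take a while here, since several models are
    # loaded onto the GPU, run, and offloaded again before responding.
    return {"result": run_pipeline(request.operation, request.payload)}
```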
This issue is currently blocking stable production usage, so any guidance or investigation would be greatly appreciated.
Regards