Reduce latency and cost by skipping retries on non-retryable LLM errors

We now classify common non-retryable conditions (e.g., 4xx responses, payload too large, context limit exceeded) and skip retries for them in both sync and async flows. Failures surface sooner, no compute is spent on requests that cannot succeed, and logs are no longer cluttered with repeated attempts. This improves reliability without any changes to your code.
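
To illustrate the idea, here is a minimal Python sketch of a retry wrapper that gives up immediately on errors classified as non-retryable. It is not the product's implementation. The names LLMError, NON_RETRYABLE_STATUS, NON_RETRYABLE_CODES, is_retryable, and call_with_retries are hypothetical, and the specific status codes and error code listed are assumptions chosen for the example rather than the actual classification rules.

```python
import time

# Assumed classification for illustration only; the real rules may differ.
# 429 (rate limit) and 5xx are deliberately left out so they stay retryable.
NON_RETRYABLE_STATUS = {400, 401, 403, 404, 413, 422}
NON_RETRYABLE_CODES = {"context_length_exceeded"}  # hypothetical error code


class LLMError(Exception):
    """Hypothetical error type carrying an HTTP status and an optional error code."""

    def __init__(self, status, code=None):
        super().__init__(f"LLM request failed: status={status}, code={code}")
        self.status = status
        self.code = code


def is_retryable(err):
    """Return False for errors that another attempt cannot fix."""
    if err.code in NON_RETRYABLE_CODES:
        return False
    return err.status not in NON_RETRYABLE_STATUS


def call_with_retries(fn, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff only on retryable errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except LLMError as err:
            # Fail fast on non-retryable errors, or when attempts are exhausted.
            if not is_retryable(err) or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

In this sketch, a call that raises LLMError(413) fails after a single attempt, while one that raises LLMError(503) is retried up to max_attempts times.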

Details:

Consistent behavior across orchestration paths and providers
Automatic optimization; no configuration required

Who this is for: Teams running production LLM workloads at scale who want to minimize wasted cycles and speed up incident triage.