LLM API Router Service

LLM API Router Service: The current implementation

This service acts as a central point for routing requests to different Large Language Models, ensuring high availability and resilience.

Components

  1. LLMApiRouter: The main service entry point. It orchestrates the overall process.

    • Configuration: Loads configuration settings, including a list of available LLM services, timeouts, and retry parameters. The configuration is provided through an LLMApiRouterConfiguration object (a sketch of both configuration classes follows this list).
    • OpenAiModelFactory: Used to create instances of chat models for each LLM service.
    • LLMService instances: Each instance represents an individual LLM service; the LLMApiRouter maintains a list of these.
    • CounterMetric: Used for tracking metrics such as LLM timeouts.
  2. LLMService: Represents a single LLM service.

    • Operational Status: Tracks whether the service is currently available (operational flag).
    • LLM Client: Contains a client for interacting with the LLM API (built using the OpenAiModelFactory). Two client instances are kept:
      • shortModel for requests that require fast execution
      • defaultModel for requests that need more processing time.
    • Configuration: Stores specific configuration parameters for the LLM (API URL, API Key, model name, timeouts, etc.) via LLMServiceConfiguration.
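
For orientation, the two configuration objects might look roughly like the sketch below. The field names are assumptions derived from the description above, not the actual definitions (the two records are shown together for brevity):

```java
import java.time.Duration;
import java.util.List;

// Hypothetical sketch only; field names are assumptions based on the
// description above, not the real definitions.
record LLMApiRouterConfiguration(
        List<LLMServiceConfiguration> services, // one entry per LLM service
        int maxRetries                          // retry parameter for autoRetry
) {}

record LLMServiceConfiguration(
        String apiUrl,           // base URL of the LLM API
        String apiKey,
        String modelName,
        Duration shortTimeout,   // used by shortModel (fast requests)
        Duration defaultTimeout  // used by defaultModel (longer requests)
) {}
```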

Workflow

  1. Initialization:

    • The LLMApiRouter initializes a list of LLMService instances based on the provided configuration.
    • Each LLMService creates an LLM client using the OpenAiModelFactory.
  2. Request Processing (processQuery or generateSummary):

    • A client sends a request (either a Prompt or a list of Message objects) to the LLMApiRouter.
    • The LLMApiRouter selects the first currently operational LLMService. If no services are available, it throws an AllLLMServicesUnavailableException.
    • The request is routed to the selected LLMService.
      • This call is wrapped in the autoRetry method (described under Automatic Retry below).
      • The shortTimeout parameter is set based on the endpoint:
        • true for processQuery
        • false for generateSummary

Note

The service selection mechanism is implemented in the currentService property. This property is evaluated on every access, so it should always return an LLMService that is currently flagged as operational.
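
A minimal sketch of what such a selector could look like, assuming a services list field, an isOperational accessor, and a no-argument exception constructor (all three names are assumptions):

```java
// Hypothetical sketch: evaluated on every access, so a service that has just
// flagged itself as non-operational is skipped on the next call.
public LLMService getCurrentService() {
    return services.stream()
            .filter(LLMService::isOperational)
            .findFirst()
            .orElseThrow(AllLLMServicesUnavailableException::new);
}
```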

  3. LLMService Processing:

    • The LLMService uses its LLM client (shortModel or defaultModel, determined by the caller) to send the request to the LLM API.
    • The LLM API processes the request and returns a response.
    • The LLMService wraps the response in an LlmServiceResponse object (including the service ID) and returns it to the LLMApiRouter.
  4. Automatic Retry:

    • If a timeout is encountered by an LLMService, it sets its operational flag to false and throws a ServiceUnavailableException.
    • The autoRetry method catches the exception and automatically retries the request (see the first sketch after this list).
      • The LLMApiRouter selects the first available service anew on each attempt.
  5. Service Health Checks:

    • A scheduled task (@Scheduled) periodically checks the availability of each LLMService (see the second sketch after this list).
    • For each LLMService that is not operational, a latency test is performed using a simple API call.
    • If the latency test succeeds, the LLMService is marked as operational. If it fails, the service remains unavailable and the health check is retried later.
    • A map of the operational status of each service can be retrieved through the status property of the LLMApiRouter.
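
The retry loop from step 4 might be sketched as follows; the generic signature, the maxRetries field, and the assumption that ServiceUnavailableException is unchecked are all illustrative, not taken from the actual code:

```java
import java.util.function.Function;

// Hypothetical sketch of autoRetry: every attempt re-evaluates the current
// service, so a service that failed on the previous attempt is skipped.
private <T> T autoRetry(Function<LLMService, T> call) {
    ServiceUnavailableException lastFailure = null;
    for (int attempt = 0; attempt < maxRetries; attempt++) {
        // Throws AllLLMServicesUnavailableException once no service is left.
        LLMService service = getCurrentService();
        try {
            return call.apply(service);
        } catch (ServiceUnavailableException e) {
            lastFailure = e; // the service has already flagged itself as down
        }
    }
    throw lastFailure; // all retries exhausted
}
```

The shortTimeout flag from step 2 would be captured inside the call closure, since the caller decides whether shortModel or defaultModel serves the request.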
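
Step 5 could be implemented as a Spring scheduled task roughly like this; the delay value and the latencyTest/setOperational names are assumptions:

```java
import org.springframework.scheduling.annotation.Scheduled;

// Hypothetical sketch of the periodic health check; only services currently
// flagged as non-operational are probed.
@Scheduled(fixedDelay = 30_000) // interval is an assumption
public void checkServiceHealth() {
    for (LLMService service : services) {
        if (service.isOperational()) {
            continue; // healthy services are left alone
        }
        try {
            service.latencyTest();        // simple API call against the LLM
            service.setOperational(true); // revive the service on success
        } catch (Exception e) {
            // Still unreachable; the next scheduled run will retry.
        }
    }
}
```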

Error Handling

  • Timeouts: The LLMService uses configurable timeouts for API calls. If a timeout occurs, a ServiceUnavailableException is thrown, and the health check mechanism will attempt to revive the service later. This exception is handled internally by autoRetry.
  • AllLLMServicesUnavailableException: Thrown by the LLMApiRouter when all configured LLM services are unavailable. The request cannot be processed, and the exception should be handled by the caller (see the example below).
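
From the caller's point of view, only the aggregate failure is visible. A minimal, hypothetical usage example (the content() accessor and the fallback behaviour are assumptions, and application-specific):

```java
public String answer(Prompt prompt) {
    try {
        return llmApiRouter.processQuery(prompt).content(); // content() is assumed
    } catch (AllLLMServicesUnavailableException e) {
        return "The assistant is temporarily unavailable."; // degrade gracefully
    }
}
```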

New features

Model preference

It can be useful to select a different provider and/or model depending on the task (lower latency, lower cost, etc.).

Currently, the router will always select the first operational LLMService from the list, regardless of the task, and the caller has no influence over this behaviour.

Proposal 1:

  • The LLMService should contain a property that can be queried in the service selector to determine whether the service is appropriate for the task.
  • The purpose of the call should be communicated to the service by adding new endpoints or by adding parameters to the current endpoints (see the sketch below).
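
A rough illustration of Proposal 1; the TaskType enum and the supports method are hypothetical names, not part of the current code:

```java
// Hypothetical sketch of Proposal 1: each service advertises the tasks it is
// suited for, and the selector filters on that property.
public enum TaskType { QUERY, SUMMARY }

public LLMService selectService(TaskType task) {
    return services.stream()
            .filter(LLMService::isOperational)
            .filter(service -> service.supports(task)) // the new queryable property
            .findFirst()
            .orElseThrow(AllLLMServicesUnavailableException::new);
}
```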

Proposal 2:

  • The LLMApiRouter should maintain multiple service lists, based on an appropriate grouping of the different providers/models.
  • The desired service group should be communicated to the service by adding new endpoints or by adding parameters to the current endpoints (see the sketch below).
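
Proposal 2, sketched the same way; the group names and the map layout are placeholders:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Proposal 2: one service list per group, with the
// caller naming the desired group instead of describing the task.
public enum ServiceGroup { LOW_LATENCY, LOW_COST, DEFAULT }

private Map<ServiceGroup, List<LLMService>> serviceGroups; // built at startup

public LLMService selectService(ServiceGroup group) {
    return serviceGroups.getOrDefault(group, List.of()).stream()
            .filter(LLMService::isOperational)
            .findFirst()
            .orElseThrow(AllLLMServicesUnavailableException::new);
}
```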

Prompt service

This feature should be implemented outside of the LLMApiRouter service.