# LLM API Router Service: The current implementation
This service acts as a central point for routing requests to different Large Language Models, ensuring high availability and resilience.
## Components

- `LLMApiRouter`: The main service entry point. It orchestrates the overall process.
  - Configuration: Loads configuration settings, including a list of available LLM services, timeouts, and retry parameters. The configuration is provided through an `LLMApiRouterConfiguration` object.
  - `OpenAiModelFactory`: Used to create instances of chat models for each LLM service.
  - `LLMService` instances: Represent each individual LLM service. The `LLMApiRouter` maintains a list of these.
  - `CounterMetric`: Used for tracking metrics such as LLM timeouts.
- `LLMService`: Represents a single LLM service.
  - Operational Status: Tracks whether the service is currently available (`operational` flag).
  - LLM Client: Contains a client for interacting with the LLM API (built using the `OpenAiModelFactory`). It has two instances:
    - `shortModel` for requests that require fast execution
    - `defaultModel` for requests that need more processing time
  - Configuration: Stores the service-specific configuration parameters for the LLM (API URL, API key, model name, timeouts, etc.) via `LLMServiceConfiguration`.
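The pieces above fit together roughly as follows. This is a minimal sketch based solely on this document: the field names, accessors, and the `ChatModel` stand-in are assumptions, not the actual source.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;

/** Stand-in for the chat client produced by OpenAiModelFactory (assumed shape). */
interface ChatModel {
    String chat(String prompt);
}

/** Service-specific settings; any field beyond those named above is an assumption. */
record LLMServiceConfiguration(String id, String apiUrl, String apiKey, String modelName,
                               Duration shortTimeout, Duration defaultTimeout) {}

class LLMService {
    private final LLMServiceConfiguration config;
    private final ChatModel shortModel;   // fast execution, used by processQuery
    private final ChatModel defaultModel; // longer processing, used by generateSummary
    // Cleared on timeout, restored later by the scheduled health check.
    private final AtomicBoolean operational = new AtomicBoolean(true);

    LLMService(LLMServiceConfiguration config, ChatModel shortModel, ChatModel defaultModel) {
        this.config = config;
        this.shortModel = shortModel;
        this.defaultModel = defaultModel;
    }

    String id()              { return config.id(); }
    ChatModel shortModel()   { return shortModel; }
    ChatModel defaultModel() { return defaultModel; }
    boolean isOperational()  { return operational.get(); }
    void setOperational(boolean value) { operational.set(value); }
}
```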
## Workflow

- Initialization:
  - The `LLMApiRouter` initializes a list of `LLMService` instances based on the provided configuration.
  - Each `LLMService` creates an LLM client using the `OpenAiModelFactory`.
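A sketch of this step, reusing the stub types from the previous sketch. The `create(...)` factory method is an assumption; the real `OpenAiModelFactory` API is not shown in this document.

```java
import java.time.Duration;
import java.util.List;

/** Assumed factory shape: one chat client per (configuration, timeout) pair. */
interface OpenAiModelFactory {
    ChatModel create(LLMServiceConfiguration config, Duration timeout);
}

/** Router configuration: at minimum, the ordered list of service configurations. */
record LLMApiRouterConfiguration(List<LLMServiceConfiguration> services) {}

class LLMApiRouter {
    private final List<LLMService> services;

    LLMApiRouter(LLMApiRouterConfiguration routerConfig, OpenAiModelFactory factory) {
        // One LLMService per configured provider, kept in configuration order.
        this.services = routerConfig.services().stream()
                .map(cfg -> new LLMService(cfg,
                        factory.create(cfg, cfg.shortTimeout()),    // shortModel
                        factory.create(cfg, cfg.defaultTimeout()))) // defaultModel
                .toList();
    }
}
```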
- Request Processing (`processQuery` or `generateSummary`):
  - A client sends a request (either a `Prompt` or a list of `Message` objects) to the `LLMApiRouter`.
  - The `LLMApiRouter` selects the first currently `operational` `LLMService`. If no services are available, it throws an `AllLLMServicesUnavailableException`.
  - The request is routed to the selected `LLMService`.
    - This call is wrapped in the `autoRetry` method.
    - The `shortTimeout` parameter is set based on the endpoint: `true` for `processQuery`, `false` for `generateSummary`.
Note: The service selection mechanism is implemented in the `currentService` property. This property is evaluated on each access, so it should always return an `LLMService` that is flagged as `operational`.
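That selection could look like the sketch below, reusing the `LLMService` stub from above. The body of `currentService` is an assumption consistent with the behaviour described here, not the actual implementation.

```java
import java.util.List;

class AllLLMServicesUnavailableException extends RuntimeException {
    AllLLMServicesUnavailableException() {
        super("No operational LLM service is available");
    }
}

class ServiceSelection {
    // Re-evaluated on every call, mirroring the property access described above.
    static LLMService currentService(List<LLMService> services) {
        return services.stream()
                .filter(LLMService::isOperational)
                .findFirst()
                .orElseThrow(AllLLMServicesUnavailableException::new);
    }
}
```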
- `LLMService` Processing:
  - The `LLMService` uses its LLM client (`shortModel` or `defaultModel`, determined by the caller) to send the request to the LLM API.
  - The LLM API processes the request and returns a response.
  - The `LLMService` wraps the response in an `LlmServiceResponse` object (including the service ID) and returns it to the `LLMApiRouter`.
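The per-service call path might look like this; the exact shape of `LlmServiceResponse` and the use of plain `String` prompts are simplifying assumptions.

```java
/** Assumed response wrapper: the answer plus the ID of the service that produced it. */
record LlmServiceResponse(String serviceId, String content) {}

class LLMServiceCall {
    static LlmServiceResponse process(LLMService service, String prompt, boolean shortTimeout) {
        // The caller's shortTimeout flag picks the client, as described above.
        ChatModel model = shortTimeout ? service.shortModel() : service.defaultModel();
        return new LlmServiceResponse(service.id(), model.chat(prompt));
    }
}
```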
- Automatic Retry:
  - If a timeout is encountered by an `LLMService`, it sets its `operational` flag to false and throws a `ServiceUnavailableException`.
  - The `autoRetry` method catches the exception and automatically retries the request.
  - The `LLMApiRouter` selects the first available service independently for each attempt, so a retry fails over to the next operational service (see the sketch below).
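A possible shape for `autoRetry`. The attempt limit is a hypothetical parameter (the document mentions retry parameters in the configuration but not their form), and an `AllLLMServicesUnavailableException` raised during service selection propagates to the caller unchanged.

```java
import java.util.function.Supplier;

class ServiceUnavailableException extends RuntimeException {}

class AutoRetry {
    static <T> T autoRetry(Supplier<T> attempt, int maxAttempts) {
        ServiceUnavailableException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                // The attempt re-selects currentService internally, so a service
                // flagged non-operational by a previous timeout is skipped here.
                return attempt.get();
            } catch (ServiceUnavailableException e) {
                last = e; // the failing service has already cleared its operational flag
            }
        }
        if (last == null) throw new IllegalArgumentException("maxAttempts must be >= 1");
        throw last; // retries exhausted; the exact behaviour here is not specified in this document
    }
}
```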
- Service Health Checks:
  - A scheduled task (`@Scheduled`) periodically checks the availability of each `LLMService`.
  - For each `LLMService` that is not `operational`, a latency test is performed using a simple API call.
  - If the latency test succeeds, the `LLMService` is marked as `operational`. If it fails, the service remains unavailable and the health check is retried later.
  - A map of the `operational` status of each service can be retrieved with the `status` property of the `LLMApiRouter`.
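A sketch of the scheduled task, assuming Spring's `@Scheduled` (which the annotation in the text suggests). The interval property and the `latencyTest` probe are illustrative, not the actual configuration key or test.

```java
import java.util.List;
import org.springframework.scheduling.annotation.Scheduled;

class LLMServiceHealthCheck {
    private final List<LLMService> services;

    LLMServiceHealthCheck(List<LLMService> services) {
        this.services = services;
    }

    // The delay property name and default are illustrative only.
    @Scheduled(fixedDelayString = "${llm.router.health-check-delay:60000}")
    void checkServices() {
        for (LLMService service : services) {
            if (service.isOperational()) continue; // only probe flagged services
            try {
                latencyTest(service);
                service.setOperational(true); // probe succeeded: revive the service
            } catch (RuntimeException e) {
                // Still down: leave the flag cleared and retry on the next run.
            }
        }
    }

    private void latencyTest(LLMService service) {
        service.shortModel().chat("ping"); // placeholder for the simple API call
    }
}
```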
## Error Handling

- Timeouts: The `LLMService` uses configurable timeouts for API calls. If a timeout occurs, a `ServiceUnavailableException` is thrown, and the health check mechanism will attempt to revive the service later. This exception is handled internally by `autoRetry`.
- `AllLLMServicesUnavailableException`: Thrown by the `LLMApiRouter` when all configured LLM services are unavailable. The request cannot be processed, and the exception should be handled by the caller.
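On the caller side, that contract could be used as below. The `SummaryRouter` facade, the `generateSummary(String)` signature, and the fallback text are all illustrative, not part of the actual API.

```java
/** Hypothetical facade over the router's summary endpoint. */
interface SummaryRouter {
    LlmServiceResponse generateSummary(String text);
}

class SummaryClient {
    String summarize(SummaryRouter router, String text) {
        try {
            return router.generateSummary(text).content();
        } catch (AllLLMServicesUnavailableException e) {
            // Every configured provider is down; degrade gracefully instead of failing.
            return "Summary temporarily unavailable";
        }
    }
}
```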
## New features

### Model preference

It can be useful to select a different provider and/or model depending on the task (lower latency, lower cost, etc.). Currently, the router always selects the first operational `LLMService` from the list, regardless of the task, and the caller has no influence over this behaviour.
Proposal 1:

- The `LLMService` should contain a property that can be queried in the service selector to determine whether the service is appropriate for the task (see the sketch below).
- The purpose of the call should be communicated to the service by adding new endpoints, or by adding parameters to the current endpoints.
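One possible reading of Proposal 1, reusing the exception stub from earlier: each service advertises the tasks it is suited for, and the selector filters on both the `operational` flag and that property. `TaskType` and `supportedTasks()` are hypothetical names.

```java
import java.util.List;
import java.util.Set;

enum TaskType { QUERY, SUMMARY } // hypothetical purposes communicated by the caller

interface TaskAwareLLMService {
    boolean isOperational();
    Set<TaskType> supportedTasks(); // the queryable property from Proposal 1

    static TaskAwareLLMService selectFor(List<TaskAwareLLMService> services, TaskType task) {
        return services.stream()
                .filter(TaskAwareLLMService::isOperational)
                .filter(s -> s.supportedTasks().contains(task))
                .findFirst()
                .orElseThrow(AllLLMServicesUnavailableException::new);
    }
}
```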
Proposal 2:

- The `LLMApiRouter` should have multiple service lists, based on an appropriate grouping of the different providers/models (see the sketch below).
- The desired service group should be communicated to the service by adding new endpoints, or by adding parameters to the current endpoints.
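And one possible reading of Proposal 2, reusing the earlier stubs: one ordered service list per group, with the group name supplied by the caller. The group names and map layout are hypothetical.

```java
import java.util.List;
import java.util.Map;

class GroupedLLMApiRouter {
    // e.g. "low-latency" -> [serviceA, serviceB], "low-cost" -> [serviceC, serviceA]
    private final Map<String, List<LLMService>> serviceGroups;

    GroupedLLMApiRouter(Map<String, List<LLMService>> serviceGroups) {
        this.serviceGroups = serviceGroups;
    }

    LLMService currentService(String group) {
        return serviceGroups.getOrDefault(group, List.of()).stream()
                .filter(LLMService::isOperational)
                .findFirst()
                .orElseThrow(AllLLMServicesUnavailableException::new);
    }
}
```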
### Prompt service

This feature should be implemented outside of the `LLMApiRouter` service.