Add LLM API Router Service

2025-09-26 03:42:51 +02:00
commit 56edddc6aa

LLM-API-Router-Service.md

## LLM API Router Service: The current implementation
This service acts as a central point for routing requests to different Large
Language Models, ensuring high availability and resilience.
### Components
1. **`LLMApiRouter`:** The main service entry point. It orchestrates the overall process.
* **Configuration:** Loads configuration settings, including a list of available LLM services, timeouts, and retry parameters. The
configuration is provided through a `LLMApiRouterConfiguration` object.
* **`OpenAiModelFactory`:** Used to create instances of chat models for each LLM service.
* **`LLMService` instances:** Represents each individual LLM service. The `LLMApiRouter` maintains a list of these.
* **`CounterMetric`:** Used for tracking metrics such as LLM timeouts.
2. **`LLMService`:** Represents a single LLM service.
* **Operational Status:** Tracks whether the service is currently available (`operational` flag).
* **LLM Clients:** Each service holds two clients for interacting with the LLM API (built using the `OpenAiModelFactory`):
* `shortModel` for requests that require fast execution
* `defaultModel` for requests that need more processing time.
* **Configuration:** Stores specific configuration parameters for the LLM (API URL, API Key, model name, timeouts, etc.) via
`LLMServiceConfiguration`.
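As a rough sketch, the component structure above might look like the following. This is illustrative only: the real client types come from the OpenAI client library, and field names beyond those mentioned in the text (e.g. the timeout fields, the placeholder `ChatModel`) are assumptions.

```java
// Illustrative sketch: ChatModel stands in for the factory-produced client type.
class ChatModel {}

class OpenAiModelFactory {
    ChatModel create(LLMServiceConfiguration config, long timeoutMillis) {
        return new ChatModel(); // real factory would build an OpenAI chat client
    }
}

class LLMServiceConfiguration {
    final String apiUrl, apiKey, modelName;
    final long shortTimeoutMillis, defaultTimeoutMillis; // assumed field names
    LLMServiceConfiguration(String apiUrl, String apiKey, String modelName,
                            long shortTimeoutMillis, long defaultTimeoutMillis) {
        this.apiUrl = apiUrl;
        this.apiKey = apiKey;
        this.modelName = modelName;
        this.shortTimeoutMillis = shortTimeoutMillis;
        this.defaultTimeoutMillis = defaultTimeoutMillis;
    }
}

class LLMService {
    final LLMServiceConfiguration config;
    final ChatModel shortModel;   // fast-execution requests
    final ChatModel defaultModel; // requests needing more processing time
    volatile boolean operational = true;

    LLMService(LLMServiceConfiguration config, OpenAiModelFactory factory) {
        this.config = config;
        this.shortModel = factory.create(config, config.shortTimeoutMillis);
        this.defaultModel = factory.create(config, config.defaultTimeoutMillis);
    }
}
```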
### Workflow
1. **Initialization:**
* The `LLMApiRouter` initializes a list of `LLMService` instances based on the provided configuration.
* Each `LLMService` creates an LLM client using the `OpenAiModelFactory`.
2. **Request Processing (`processQuery` or `generateSummary`):**
* A client sends a request (either a `Prompt` or a list of `Message` objects) to the `LLMApiRouter`.
* The `LLMApiRouter` selects the first currently `operational` `LLMService`. If no services are available, it throws an
`AllLLMServicesUnavailableException`.
* The request is routed to the selected `LLMService`.
* This call is wrapped in the `autoRetry` method (see step 4 below).
> **Note** The service selection mechanism is implemented in the `currentService` property.
> This property is evaluated on each access, so it should always return an `LLMService` that is flagged as `operational`.
3. **`LLMService` Processing:**
* The `LLMService` uses its LLM client (`shortModel` or `defaultModel`, determined by the caller) to send the request to the LLM API.
* The LLM API processes the request and returns a response.
* The `LLMService` wraps the response in an `LlmServiceResponse` object (including the service ID) and returns it to the `LLMApiRouter`.
4. **Automatic Retry:**
* If a timeout is encountered by an `LLMService`, it sets its `operational` flag to `false` and throws a `ServiceUnavailableException`.
* The `autoRetry` method catches the exception and automatically retries the request.
* The `LLMApiRouter` selects the first available service independently for each attempt, so a retry may be served by a different backend.
5. **Service Health Checks:**
* A scheduled task (`@Scheduled`) periodically checks the availability of each `LLMService`.
* For each `LLMService` that is not `operational`, a latency test is performed using a simple API call.
* If the latency test succeeds, the `LLMService` is marked as `operational`. If it fails, the service remains unavailable and the health
check is retried later.
* A map of the `operational` status of each service can be retrieved with the `status` property of the `LLMApiRouter`.
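The selection, retry, and health-check steps above could be sketched roughly as follows. Everything here is an illustrative assumption: the OpenAI client, metric counters, and Spring's `@Scheduled` wiring are omitted, the latency test is injected as a predicate, and in the real code the service flags itself non-operational rather than the retry loop doing it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

class ServiceUnavailableException extends RuntimeException {}
class AllLLMServicesUnavailableException extends RuntimeException {}

class LLMService {
    final String id;
    volatile boolean operational = true;
    LLMService(String id) { this.id = id; }
}

class LLMApiRouter {
    private final List<LLMService> services;         // order defines priority
    private final Predicate<LLMService> latencyTest; // stands in for the simple API call
    private final int maxAttempts;

    LLMApiRouter(List<LLMService> services, Predicate<LLMService> latencyTest, int maxAttempts) {
        this.services = services;
        this.latencyTest = latencyTest;
        this.maxAttempts = maxAttempts;
    }

    // Evaluated on every access: the first service still flagged operational.
    LLMService currentService() {
        return services.stream()
                .filter(s -> s.operational)
                .findFirst()
                .orElseThrow(AllLLMServicesUnavailableException::new);
    }

    // Each attempt re-selects the current service, so a later attempt can
    // land on a different backend than the one that just timed out.
    <R> R autoRetry(Function<LLMService, R> call) {
        ServiceUnavailableException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            LLMService service = currentService();
            try {
                return call.apply(service);
            } catch (ServiceUnavailableException e) {
                service.operational = false; // real code: the service flags itself
                last = e;
            }
        }
        throw last;
    }

    // In the real service this method runs periodically via @Scheduled.
    void healthCheck() {
        for (LLMService s : services) {
            if (!s.operational && latencyTest.test(s)) {
                s.operational = true; // revive after a successful latency test
            }
        }
    }

    Map<String, Boolean> status() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (LLMService s : services) result.put(s.id, s.operational);
        return result;
    }
}
```

Note how `autoRetry` calls `currentService()` inside the loop: a failed attempt marks the service down, so the next iteration naturally falls through to the next operational backend.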
### Error Handling
* **Timeouts:** The `LLMService` uses configurable timeouts for API calls. If a timeout occurs, a `ServiceUnavailableException` is thrown, and
the health check mechanism will attempt to revive the service later.
This exception is handled internally by `autoRetry`.
* **`AllLLMServicesUnavailableException`:** Thrown by `LLMApiRouter` when all configured LLM services are unavailable. The request cannot be processed and the exception should be handled by the caller.
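Since `ServiceUnavailableException` is absorbed internally by `autoRetry`, the only failure a caller has to handle is the all-unavailable case. A minimal, hypothetical example (the fallback message and helper are not part of the actual service):

```java
class AllLLMServicesUnavailableException extends RuntimeException {}

class Caller {
    // Hypothetical caller-side handling: degrade gracefully when no backend is up.
    static String queryWithFallback(java.util.function.Supplier<String> processQuery) {
        try {
            return processQuery.get();
        } catch (AllLLMServicesUnavailableException e) {
            return "LLM backends are currently unavailable; please retry later";
        }
    }
}
```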
## New features
### Model preference
It can be useful to select a different provider and/or model depending on the task (lower latency, lower cost, etc.).
Currently, the router will always select the first operational `LLMService` from the list, regardless of the task, and the caller has no influence over this behaviour.
#### Proposal 1:
* The `LLMService` should contain a property that can be queried in the service selector to determine whether the service is appropriate for the task.
* The purpose of the call should be communicated to the router by adding new endpoints, or by adding parameters to the current endpoints.
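Proposal 1 could be sketched like this. The `TaskType` enum, the `supportedTasks` property, and the task parameter on the selector are all assumptions introduced for illustration:

```java
import java.util.List;
import java.util.Set;

// Hypothetical task categories the caller could declare.
enum TaskType { LOW_LATENCY, LOW_COST, DEFAULT }

class AllLLMServicesUnavailableException extends RuntimeException {}

class LLMService {
    final String id;
    final Set<TaskType> supportedTasks; // assumed new property on LLMService
    volatile boolean operational = true;
    LLMService(String id, Set<TaskType> supportedTasks) {
        this.id = id;
        this.supportedTasks = supportedTasks;
    }
}

class LLMApiRouter {
    private final List<LLMService> services;
    LLMApiRouter(List<LLMService> services) { this.services = services; }

    // The task type would arrive via a new endpoint or request parameter.
    LLMService currentService(TaskType task) {
        return services.stream()
                .filter(s -> s.operational && s.supportedTasks.contains(task))
                .findFirst()
                .orElseThrow(AllLLMServicesUnavailableException::new);
    }
}
```

A nice property of this variant is that the single ordered service list is preserved; the task filter simply narrows it per request.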
#### Proposal 2:
* The `LLMApiRouter` should have multiple service lists based on appropriate grouping of the different providers/models.
* The desired service group should be communicated to the router by adding new endpoints, or by adding parameters to the current endpoints.
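Proposal 2 could be sketched as follows; the group names and the map-based layout are illustrative assumptions:

```java
import java.util.List;
import java.util.Map;

class AllLLMServicesUnavailableException extends RuntimeException {}

class LLMService {
    final String id;
    volatile boolean operational = true;
    LLMService(String id) { this.id = id; }
}

class LLMApiRouter {
    // Hypothetical grouping of providers/models, e.g. "fast" vs "cheap".
    private final Map<String, List<LLMService>> serviceGroups;
    LLMApiRouter(Map<String, List<LLMService>> serviceGroups) {
        this.serviceGroups = serviceGroups;
    }

    // The group name would arrive via a new endpoint or request parameter.
    LLMService currentService(String group) {
        return serviceGroups.getOrDefault(group, List.of()).stream()
                .filter(s -> s.operational)
                .findFirst()
                .orElseThrow(AllLLMServicesUnavailableException::new);
    }
}
```

Compared with Proposal 1, this keeps per-request selection trivial but duplicates a service's entry when it belongs to several groups, and the existing health-check loop would need to iterate over all groups.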
### Prompt service
This feature should be implemented outside of the `LLMApiRouter` service.