commit 56edddc6aab95b071016926851756db2bc8232b3
Author: midnight
Date:   Fri Sep 26 03:42:51 2025 +0200

    Add LLM API Router Service

diff --git a/LLM-API-Router-Service.md b/LLM-API-Router-Service.md
new file mode 100644
index 0000000..b75460f
--- /dev/null
+++ b/LLM-API-Router-Service.md
@@ -0,0 +1,82 @@
+## LLM API Router Service: The current implementation
+
+This service acts as a central point for routing requests to different Large
+Language Models, ensuring high availability and resilience.
+
+### Components
+
+1. **`LLMApiRouter`:** The main service entry point. It orchestrates the overall process.
+   * **Configuration:** Loads configuration settings, including a list of available LLM services, timeouts, and retry parameters. The configuration is provided through an `LLMApiRouterConfiguration` object.
+   * **`OpenAiModelFactory`:** Used to create instances of chat models for each LLM service.
+   * **`LLMService` instances:** Represent the individual LLM services. The `LLMApiRouter` maintains a list of these.
+   * **`CounterMetric`:** Used for tracking metrics such as LLM timeouts.
+
+2. **`LLMService`:** Represents a single LLM service.
+   * **Operational Status:** Tracks whether the service is currently available (`operational` flag).
+   * **LLM Client:** Contains a client for interacting with the LLM API (built using the `OpenAiModelFactory`). Two client instances are kept:
+     * `shortModel` for requests that require fast execution,
+     * `defaultModel` for requests that need more processing time.
+   * **Configuration:** Stores the configuration parameters specific to the LLM (API URL, API key, model name, timeouts, etc.) via `LLMServiceConfiguration`.
+
+### Workflow
+
+1. **Initialization:**
+   * The `LLMApiRouter` initializes a list of `LLMService` instances based on the provided configuration.
+   * Each `LLMService` creates an LLM client using the `OpenAiModelFactory`.
+
+2. **Request Processing (`processQuery` or `generateSummary`):**
+   * A client sends a request (either a `Prompt` or a list of `Message` objects) to the `LLMApiRouter`.
+   * The `LLMApiRouter` selects the first currently `operational` `LLMService`. If no services are available, it throws an `AllLLMServicesUnavailableException`.
+   * The request is routed to the selected `LLMService`.
+   * This call is wrapped in the `autoRetry` method (see the first sketch after this list).
+
+> **Note:** The service selection mechanism is implemented in the `currentService` property.
+> This property is evaluated on each access, so it should always return an `LLMService` that is flagged as `operational`.
+
+3. **`LLMService` Processing:**
+   * The `LLMService` uses its LLM client (`shortModel` or `defaultModel`, determined by the caller) to send the request to the LLM API.
+   * The LLM API processes the request and returns a response.
+   * The `LLMService` wraps the response in an `LlmServiceResponse` object (including the service ID) and returns it to the `LLMApiRouter`.
+
+4. **Automatic Retry:**
+   * If an `LLMService` encounters a timeout, it sets its `operational` flag to `false` and throws a `ServiceUnavailableException`.
+   * The `autoRetry` method catches the exception and automatically retries the request.
+   * The `LLMApiRouter` selects the first available service for each attempt independently.
+
+5. **Service Health Checks:**
+   * A scheduled task (`@Scheduled`) periodically checks the availability of each `LLMService` (see the second sketch after this list).
+   * For each `LLMService` that is not `operational`, a latency test is performed using a simple API call.
+   * If the latency test succeeds, the `LLMService` is marked as `operational`. If it fails, the service remains unavailable and the health check is retried later.
+   * A map of the `operational` status of each service can be retrieved through the `status` property of the `LLMApiRouter`.
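+
+To make the workflow concrete, here is a minimal Kotlin sketch of the selection and retry loop. It is an illustration, not the actual implementation: only the names come from the description above, while the stub types, the method signatures, and the exact shape of the `autoRetry` helper are assumptions.
+
+```kotlin
+// Stub types standing in for the real ones; their shapes are assumptions.
+class Prompt(val text: String)
+class LlmServiceResponse(val serviceId: String, val content: String)
+class ServiceUnavailableException : RuntimeException()
+class AllLLMServicesUnavailableException : RuntimeException()
+
+interface LLMService {
+    var operational: Boolean
+    fun processQuery(prompt: Prompt): LlmServiceResponse
+}
+
+// Retries until the block succeeds. Termination relies on each failing
+// service flagging itself non-operational: once none are left, the
+// selector throws AllLLMServicesUnavailableException instead of looping.
+private fun <T> autoRetry(block: () -> T): T {
+    while (true) {
+        try {
+            return block()
+        } catch (e: ServiceUnavailableException) {
+            // The failing service already set operational = false;
+            // the next attempt selects the next available service.
+        }
+    }
+}
+
+class LLMApiRouter(private val services: List<LLMService>) {
+
+    // Evaluated on each access: the first service still flagged operational.
+    private val currentService: LLMService
+        get() = services.firstOrNull { it.operational }
+            ?: throw AllLLMServicesUnavailableException()
+
+    fun processQuery(prompt: Prompt): LlmServiceResponse =
+        autoRetry { currentService.processQuery(prompt) }
+}
+```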
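+
+Similarly, the health check described in step 5 could look roughly like the following Spring-style sketch, reusing the stub types above. The check interval, the "ping" prompt, and the class name are assumptions made for illustration.
+
+```kotlin
+import org.springframework.scheduling.annotation.Scheduled
+import org.springframework.stereotype.Component
+
+// Assumes scheduling is enabled (@EnableScheduling) elsewhere.
+@Component
+class LLMServiceHealthCheck(private val services: List<LLMService>) {
+
+    // The interval is an assumption; the real value comes from configuration.
+    @Scheduled(fixedDelay = 60_000)
+    fun reviveUnavailableServices() {
+        services.filterNot { it.operational }.forEach { service ->
+            try {
+                // Latency test via a simple API call.
+                service.processQuery(Prompt("ping"))
+                service.operational = true
+            } catch (e: Exception) {
+                // Still failing: stays non-operational, retried next run.
+            }
+        }
+    }
+}
+```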
+
+### Error Handling
+
+* **Timeouts:** The `LLMService` uses configurable timeouts for API calls. If a timeout occurs, a `ServiceUnavailableException` is thrown, and the health check mechanism will attempt to revive the service later. This exception is handled internally by `autoRetry`.
+* **`AllLLMServicesUnavailableException`:** Thrown by the `LLMApiRouter` when all configured LLM services are unavailable. The request cannot be processed, and the exception should be handled by the caller.
+
+## New features
+
+### Model preference
+
+It can be useful to select a different provider and/or model depending on the task (lower latency, lower cost, etc.).
+
+Currently, the router always selects the first operational `LLMService` from the list, regardless of the task, and the caller has no influence over this behaviour.
+
+#### Proposal 1:
+* The `LLMService` should contain a property that can be queried in the service selector to determine whether the service is appropriate for the task.
+* The purpose of the call should be communicated to the service by adding new endpoints, or by adding parameters to the current endpoints.
+
+#### Proposal 2:
+* The `LLMApiRouter` should maintain multiple service lists based on an appropriate grouping of the different providers/models (see the sketch at the end of this document).
+* The desired service group should be communicated to the service by adding new endpoints, or by adding parameters to the current endpoints.
+
+### Prompt service
+
+This feature should be implemented outside of the `LLMApiRouter` service.
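+
+Sketch for Proposal 2 (referenced above): reusing the stub types and the `autoRetry` helper from the earlier sketches, a grouped router could look roughly like this. The `ServiceGroup` enum, the grouping map, and the extra `group` parameter are hypothetical names introduced only for illustration.
+
+```kotlin
+// Hypothetical grouping of providers/models by task profile.
+enum class ServiceGroup { LOW_LATENCY, LOW_COST, DEFAULT }
+
+class GroupedLLMApiRouter(
+    private val groups: Map<ServiceGroup, List<LLMService>>,
+) {
+    // Same first-operational selection as before, but per group.
+    private fun currentService(group: ServiceGroup): LLMService =
+        groups[group].orEmpty().firstOrNull { it.operational }
+            ?: throw AllLLMServicesUnavailableException()
+
+    // The desired group would arrive via a new endpoint or a parameter
+    // on the existing endpoints, as described in Proposal 2.
+    fun processQuery(
+        prompt: Prompt,
+        group: ServiceGroup = ServiceGroup.DEFAULT,
+    ): LlmServiceResponse =
+        autoRetry { currentService(group).processQuery(prompt) }
+}
+```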