Add LLM API Router Service

2025-09-26 03:42:51 +02:00
commit 56edddc6aa

LLM-API-Router-Service.md

## LLM API Router Service: The current implementation
This service acts as a central point for routing requests to different Large
Language Models, ensuring high availability and resilience.
### Components
1. **`LLMApiRouter`:** The main service entry point. It orchestrates the overall process.
* **Configuration:** Loads configuration settings, including a list of available LLM services, timeouts, and retry parameters. The
configuration is provided through a `LLMApiRouterConfiguration` object.
* **`OpenAiModelFactory`:** Used to create instances of chat models for each LLM service.
* **`LLMService` instances:** Represents each individual LLM service. The `LLMApiRouter` maintains a list of these.
* **`CounterMetric`:** Used for tracking metrics such as LLM timeouts.
2. **`LLMService`:** Represents a single LLM service.
* **Operational Status:** Tracks whether the service is currently available (`operational` flag).
* **LLM Clients:** Each service holds two clients for interacting with the LLM API (built using the `OpenAiModelFactory`):
* `shortModel` for requests that require fast execution
* `defaultModel` for requests that need more processing time.
* **Configuration:** Stores specific configuration parameters for the LLM (API URL, API Key, model name, timeouts, etc.) via
`LLMServiceConfiguration`.
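As a rough sketch, the component structure above might look like the following. This is illustrative only: the real client types come from the OpenAI client library, and field names beyond those mentioned in the text (e.g. the timeout fields, the placeholder `ChatModel`) are assumptions.

```java
// Illustrative sketch: ChatModel stands in for the factory-produced client type.
class ChatModel {}

class OpenAiModelFactory {
    ChatModel create(LLMServiceConfiguration config, long timeoutMillis) {
        return new ChatModel(); // real factory would build an OpenAI chat client
    }
}

class LLMServiceConfiguration {
    final String apiUrl, apiKey, modelName;
    final long shortTimeoutMillis, defaultTimeoutMillis; // assumed field names
    LLMServiceConfiguration(String apiUrl, String apiKey, String modelName,
                            long shortTimeoutMillis, long defaultTimeoutMillis) {
        this.apiUrl = apiUrl;
        this.apiKey = apiKey;
        this.modelName = modelName;
        this.shortTimeoutMillis = shortTimeoutMillis;
        this.defaultTimeoutMillis = defaultTimeoutMillis;
    }
}

class LLMService {
    final LLMServiceConfiguration config;
    final ChatModel shortModel;   // fast-execution requests
    final ChatModel defaultModel; // requests needing more processing time
    volatile boolean operational = true;

    LLMService(LLMServiceConfiguration config, OpenAiModelFactory factory) {
        this.config = config;
        this.shortModel = factory.create(config, config.shortTimeoutMillis);
        this.defaultModel = factory.create(config, config.defaultTimeoutMillis);
    }
}
```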
### Workflow
1. **Initialization:**
* The `LLMApiRouter` initializes a list of `LLMService` instances based on the provided configuration.
* Each `LLMService` creates an LLM client using the `OpenAiModelFactory`.
2. **Request Processing (`processQuery` or `generateSummary`):**
* A client sends a request (either a `Prompt` or a list of `Message` objects) to the `LLMApiRouter`.
* The `LLMApiRouter` selects the first currently `operational` `LLMService`. If no services are available, it throws an
`AllLLMServicesUnavailableException`.
* The request is routed to the selected `LLMService`.
* This call is wrapped in the `autoRetry` method (see step 4 below).
> **Note** The service selection mechanism is implemented in the `currentService` property.
> This property is evaluated on each access, so it should always return an `LLMService` that is flagged as `operational`.
3. **`LLMService` Processing:**
* The `LLMService` uses its LLM client (`shortModel` or `defaultModel`, determined by the caller) to send the request to the LLM API.
* The LLM API processes the request and returns a response.
* The `LLMService` wraps the response in an `LlmServiceResponse` object (including the service ID) and returns it to the `LLMApiRouter`.
4. **Automatic Retry:**
* If a timeout is encountered by an `LLMService`, it sets its `operational` flag to `false` and throws a `ServiceUnavailableException`.
* The `autoRetry` method catches the exception and automatically retries the request.
* The `LLMApiRouter` selects the first available service independently for each attempt, so a retry may be served by a different backend.
5. **Service Health Checks:**
* A scheduled task (`@Scheduled`) periodically checks the availability of each `LLMService`.
* For each `LLMService` that is not `operational`, a latency test is performed using a simple API call.
* If the latency test succeeds, the `LLMService` is marked as `operational`. If it fails, the service remains unavailable and the health
check is retried later.
* A map of the `operational` status of each service can be retrieved with the `status` property of the `LLMApiRouter`.
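The selection, retry, and health-check steps above could be sketched roughly as follows. Everything here is an illustrative assumption: the OpenAI client, metric counters, and Spring's `@Scheduled` wiring are omitted, the latency test is injected as a predicate, and in the real code the service flags itself non-operational rather than the retry loop doing it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

class ServiceUnavailableException extends RuntimeException {}
class AllLLMServicesUnavailableException extends RuntimeException {}

class LLMService {
    final String id;
    volatile boolean operational = true;
    LLMService(String id) { this.id = id; }
}

class LLMApiRouter {
    private final List<LLMService> services;         // order defines priority
    private final Predicate<LLMService> latencyTest; // stands in for the simple API call
    private final int maxAttempts;

    LLMApiRouter(List<LLMService> services, Predicate<LLMService> latencyTest, int maxAttempts) {
        this.services = services;
        this.latencyTest = latencyTest;
        this.maxAttempts = maxAttempts;
    }

    // Evaluated on every access: the first service still flagged operational.
    LLMService currentService() {
        return services.stream()
                .filter(s -> s.operational)
                .findFirst()
                .orElseThrow(AllLLMServicesUnavailableException::new);
    }

    // Each attempt re-selects the current service, so a later attempt can
    // land on a different backend than the one that just timed out.
    <R> R autoRetry(Function<LLMService, R> call) {
        ServiceUnavailableException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            LLMService service = currentService();
            try {
                return call.apply(service);
            } catch (ServiceUnavailableException e) {
                service.operational = false; // real code: the service flags itself
                last = e;
            }
        }
        throw last;
    }

    // In the real service this method runs periodically via @Scheduled.
    void healthCheck() {
        for (LLMService s : services) {
            if (!s.operational && latencyTest.test(s)) {
                s.operational = true; // revive after a successful latency test
            }
        }
    }

    Map<String, Boolean> status() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (LLMService s : services) result.put(s.id, s.operational);
        return result;
    }
}
```

Note how `autoRetry` calls `currentService()` inside the loop: a failed attempt marks the service down, so the next iteration naturally falls through to the next operational backend.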
### Error Handling
* **Timeouts:** The `LLMService` uses configurable timeouts for API calls. If a timeout occurs, a `ServiceUnavailableException` is thrown, and
the health check mechanism will attempt to revive the service later.
This exception is handled internally by `autoRetry`.
* **`AllLLMServicesUnavailableException`:** Thrown by `LLMApiRouter` when all configured LLM services are unavailable. The request cannot be processed and the exception should be handled by the caller.
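Since `ServiceUnavailableException` is absorbed internally by `autoRetry`, the only failure a caller has to handle is the all-unavailable case. A minimal, hypothetical example (the fallback message and helper are not part of the actual service):

```java
class AllLLMServicesUnavailableException extends RuntimeException {}

class Caller {
    // Hypothetical caller-side handling: degrade gracefully when no backend is up.
    static String queryWithFallback(java.util.function.Supplier<String> processQuery) {
        try {
            return processQuery.get();
        } catch (AllLLMServicesUnavailableException e) {
            return "LLM backends are currently unavailable; please retry later";
        }
    }
}
```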
## New features
### Model preference
It can be useful to select a different provider and/or model depending on the task (lower latency, lower cost, etc.).
Currently, the router will always select the first operational `LLMService` from the list, regardless of the task, and the caller has no influence over this behaviour.
#### Proposal 1:
* The `LLMService` should contain a property that can be queried in the service selector to determine whether the service is appropriate for the task.
* The purpose of the call should be communicated to the router by adding new endpoints, or by adding parameters to the current endpoints.
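Proposal 1 could be sketched like this. The `TaskType` enum, the `supportedTasks` property, and the task parameter on the selector are all assumptions introduced for illustration:

```java
import java.util.List;
import java.util.Set;

// Hypothetical task categories the caller could declare.
enum TaskType { LOW_LATENCY, LOW_COST, DEFAULT }

class AllLLMServicesUnavailableException extends RuntimeException {}

class LLMService {
    final String id;
    final Set<TaskType> supportedTasks; // assumed new property on LLMService
    volatile boolean operational = true;
    LLMService(String id, Set<TaskType> supportedTasks) {
        this.id = id;
        this.supportedTasks = supportedTasks;
    }
}

class LLMApiRouter {
    private final List<LLMService> services;
    LLMApiRouter(List<LLMService> services) { this.services = services; }

    // The task type would arrive via a new endpoint or request parameter.
    LLMService currentService(TaskType task) {
        return services.stream()
                .filter(s -> s.operational && s.supportedTasks.contains(task))
                .findFirst()
                .orElseThrow(AllLLMServicesUnavailableException::new);
    }
}
```

A nice property of this variant is that the single ordered service list is preserved; the task filter simply narrows it per request.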
#### Proposal 2:
* The `LLMApiRouter` should have multiple service lists based on appropriate grouping of the different providers/models.
* The desired service group should be communicated to the router by adding new endpoints, or by adding parameters to the current endpoints.
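Proposal 2 could be sketched as follows; the group names and the map-based layout are illustrative assumptions:

```java
import java.util.List;
import java.util.Map;

class AllLLMServicesUnavailableException extends RuntimeException {}

class LLMService {
    final String id;
    volatile boolean operational = true;
    LLMService(String id) { this.id = id; }
}

class LLMApiRouter {
    // Hypothetical grouping of providers/models, e.g. "fast" vs "cheap".
    private final Map<String, List<LLMService>> serviceGroups;
    LLMApiRouter(Map<String, List<LLMService>> serviceGroups) {
        this.serviceGroups = serviceGroups;
    }

    // The group name would arrive via a new endpoint or request parameter.
    LLMService currentService(String group) {
        return serviceGroups.getOrDefault(group, List.of()).stream()
                .filter(s -> s.operational)
                .findFirst()
                .orElseThrow(AllLLMServicesUnavailableException::new);
    }
}
```

Compared with Proposal 1, this keeps per-request selection trivial but duplicates a service's entry when it belongs to several groups, and the existing health-check loop would need to iterate over all groups.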
### Prompt service
This feature should be implemented outside of the `LLMApiRouter` service.