# Add LLM API Router Service

`LLM-API-Router-Service.md`

## LLM API Router Service: The current implementation

This service acts as a central point for routing requests to different Large Language Models, ensuring high availability and resilience.

### Components

1. **`LLMApiRouter`:** The main service entry point. It orchestrates the overall process.
    * **Configuration:** Loads configuration settings, including the list of available LLM services, timeouts, and retry parameters. The configuration is provided through an `LLMApiRouterConfiguration` object.
    * **`OpenAiModelFactory`:** Used to create instances of chat models for each LLM service.
    * **`LLMService` instances:** Represent the individual LLM services. The `LLMApiRouter` maintains a list of these.
    * **`CounterMetric`:** Used for tracking metrics such as LLM timeouts.

2. **`LLMService`:** Represents a single LLM service.
    * **Operational Status:** Tracks whether the service is currently available (`operational` flag).
    * **LLM Client:** A client for interacting with the LLM API, built using the `OpenAiModelFactory`. There are two instances:
        * `shortModel` for requests that require fast execution
        * `defaultModel` for requests that need more processing time
    * **Configuration:** Stores the LLM-specific configuration parameters (API URL, API key, model name, timeouts, etc.) via `LLMServiceConfiguration`.

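The two configuration objects described above might look roughly like this. This is a minimal sketch: the field names (`id`, `apiUrl`, `timeoutMillis`, `maxRetries`, etc.) are assumptions for illustration, not the actual API.

```java
import java.util.List;

// Hypothetical shape of the per-service configuration; real field names may differ.
record LLMServiceConfiguration(
        String id,          // service identifier, used in responses and metrics
        String apiUrl,      // base URL of the LLM API
        String apiKey,      // credential for the API
        String modelName,   // model to request
        long timeoutMillis  // per-call timeout
) {}

// Hypothetical shape of the router-level configuration.
record LLMApiRouterConfiguration(
        List<LLMServiceConfiguration> services, // ordered by selection priority
        int maxRetries                          // attempts made by autoRetry
) {}
```

Keeping the service list ordered in the configuration is what makes "select the first operational service" a priority scheme.
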
### Workflow

1. **Initialization:**
    * The `LLMApiRouter` initializes a list of `LLMService` instances based on the provided configuration.
    * Each `LLMService` creates an LLM client using the `OpenAiModelFactory`.

2. **Request Processing (`processQuery` or `generateSummary`):**
    * A client sends a request (either a `Prompt` or a list of `Message` objects) to the `LLMApiRouter`.
    * The `LLMApiRouter` selects the first currently `operational` `LLMService`. If no services are available, it throws an `AllLLMServicesUnavailableException`.
    * The request is routed to the selected `LLMService`.
    * This call is wrapped in the `autoRetry` method.

    > **Note:** The service selection mechanism is implemented in the `currentService` property.
    > This property is evaluated on each access, so it should always return an `LLMService` that is flagged as `operational`.

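The selection logic behind `currentService` can be sketched as follows. The class and member names are illustrative (the real code models this as a computed property), but the behaviour matches the description: every access re-scans the list, so a service revived by the health check becomes eligible again automatically.

```java
import java.util.List;

// Illustrative stand-ins for the real classes.
class LLMService {
    final String id;
    volatile boolean operational = true;
    LLMService(String id) { this.id = id; }
}

class AllLLMServicesUnavailableException extends RuntimeException {}

class LLMApiRouter {
    private final List<LLMService> services; // ordered by priority

    LLMApiRouter(List<LLMService> services) { this.services = services; }

    // Evaluated on each call: returns the first operational service,
    // or fails if every service is currently down.
    LLMService currentService() {
        return services.stream()
                .filter(s -> s.operational)
                .findFirst()
                .orElseThrow(AllLLMServicesUnavailableException::new);
    }
}
```
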
3. **`LLMService` Processing:**
    * The `LLMService` uses its LLM client (`shortModel` or `defaultModel`, as determined by the caller) to send the request to the LLM API.
    * The LLM API processes the request and returns a response.
    * The `LLMService` wraps the response in an `LlmServiceResponse` object (including the service ID) and returns it to the `LLMApiRouter`.

4. **Automatic Retry:**
    * If an `LLMService` encounters a timeout, it sets its `operational` flag to false and throws a `ServiceUnavailableException`.
    * The `autoRetry` method catches the exception and automatically retries the request.
    * The `LLMApiRouter` selects the first available service for each attempt independently.

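A minimal sketch of what an `autoRetry` wrapper like this could look like; the method signature and the retry limit are assumptions. The key point from the description above is that the service is chosen *inside* the retried call, so each attempt independently picks the first service that is operational at that moment.

```java
import java.util.function.Supplier;

// Illustrative exception type matching the description above.
class ServiceUnavailableException extends RuntimeException {}

class Retry {
    // Re-runs `call` up to `maxAttempts` times on ServiceUnavailableException.
    // The failing service has already flagged itself non-operational, so the
    // next attempt's service selection naturally skips it.
    static <T> T autoRetry(Supplier<T> call, int maxAttempts) {
        ServiceUnavailableException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (ServiceUnavailableException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // all attempts exhausted
    }
}
```
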
5. **Service Health Checks:**
    * A scheduled task (`@Scheduled`) periodically checks the availability of each `LLMService`.
    * For each `LLMService` that is not `operational`, a latency test is performed using a simple API call.
    * If the latency test succeeds, the `LLMService` is marked as `operational`. If it fails, the service remains unavailable and the health check is retried later.
    * A map of the `operational` status of each service can be retrieved through the `status` property of the `LLMApiRouter`.

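The health check loop above can be sketched like this. The `latencyTest` method name is an assumption, and the probe itself is stubbed out; in the real service the check method would carry Spring's `@Scheduled` annotation.

```java
import java.util.List;

// Illustrative stand-in for a service whose probe is stubbed out.
class LLMService {
    volatile boolean operational = true;

    // A simple API call used as a latency probe (stubbed here).
    boolean latencyTest() { return true; }
}

class HealthChecker {
    private final List<LLMService> services;

    HealthChecker(List<LLMService> services) { this.services = services; }

    // In the real service this would be annotated with @Scheduled.
    void checkServices() {
        for (LLMService s : services) {
            if (!s.operational && s.latencyTest()) {
                s.operational = true; // revive; failures are retried on the next run
            }
        }
    }
}
```
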
### Error Handling

* **Timeouts:** The `LLMService` uses configurable timeouts for API calls. If a timeout occurs, a `ServiceUnavailableException` is thrown, and the health check mechanism will attempt to revive the service later. This exception is handled internally by `autoRetry`.
* **`AllLLMServicesUnavailableException`:** Thrown by the `LLMApiRouter` when all configured LLM services are unavailable. The request cannot be processed, and the exception should be handled by the caller.

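Since `AllLLMServicesUnavailableException` means no provider can take the request, a caller would typically catch it and degrade gracefully. A sketch, where the `Router` interface and the fallback message are purely illustrative:

```java
// Illustrative exception type matching the description above.
class AllLLMServicesUnavailableException extends RuntimeException {}

class Caller {
    // Hypothetical minimal view of the router from the caller's side.
    interface Router { String processQuery(String prompt); }

    static String askWithFallback(Router router, String prompt) {
        try {
            return router.processQuery(prompt);
        } catch (AllLLMServicesUnavailableException e) {
            // Every provider is down: return a user-facing fallback
            // instead of propagating the error.
            return "The assistant is temporarily unavailable.";
        }
    }
}
```
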
## New features

### Model preference

It can be useful to select a different provider and/or model depending on the task (lower latency, lower cost, etc.).

Currently, the router always selects the first operational `LLMService` from the list, regardless of the task, and the caller has no influence over this behaviour.

#### Proposal 1

* The `LLMService` should expose a property that can be queried in the service selector to determine whether the service is appropriate for the task.
* The purpose of the call should be communicated to the service by adding new endpoints, or by adding parameters to the current endpoints.

#### Proposal 2

* The `LLMApiRouter` should maintain multiple service lists, based on an appropriate grouping of the different providers/models.
* The desired service group should be communicated to the service by adding new endpoints, or by adding parameters to the current endpoints.

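To make the two proposals concrete, here is one hypothetical shape Proposal 2 could take: the router keeps a service list per group, and the caller names the group. The group names and the `currentService(group)` signature are invented for illustration only.

```java
import java.util.List;
import java.util.Map;

// Illustrative stand-ins for the real classes.
class LLMService {
    final String id;
    volatile boolean operational = true;
    LLMService(String id) { this.id = id; }
}

class AllLLMServicesUnavailableException extends RuntimeException {}

class GroupedRouter {
    // One priority-ordered service list per group, e.g. "low-cost", "high-quality".
    private final Map<String, List<LLMService>> groups;

    GroupedRouter(Map<String, List<LLMService>> groups) { this.groups = groups; }

    // Same first-operational selection as today, restricted to the requested group.
    LLMService currentService(String group) {
        return groups.getOrDefault(group, List.of()).stream()
                .filter(s -> s.operational)
                .findFirst()
                .orElseThrow(AllLLMServicesUnavailableException::new);
    }
}
```

Compared with Proposal 1, this keeps the selection logic unchanged and moves the task-awareness entirely into configuration, at the cost of maintaining several lists.
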
### Prompt service

This feature should be implemented outside of the `LLMApiRouter` service.