Overview
This article describes how to implement an AI Chat API in AWS. It covers use cases, requirements, architectures, and detailed designs.
Use Case
In the SaaS version of Typee, an authorized user can use a preset model without setting up any configuration. An admin first needs to set up a model, which includes a model name, a backing model, an API key for the backing model, and whether to use training data. Once the admin finishes the setup and publishes the model, authorized users can see the new model in the front end and start using it. The admin can see the usage of each user.
Requirements
- Admin can add and delete a model.
- Admin can publish and unpublish a model.
- A model includes an ID/name, a backing model, an API key for the backing model, a status, and training data.
- Authorized users can see and use published models.
- Admin can track token usage of each authorized user.
- Streaming support is nice to have.
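The requirements above can be sketched as a small in-memory registry. This is only an illustration of the add/delete and publish/unpublish lifecycle, not the actual Typee implementation: the names `PresetModel` and `ModelRegistry` are hypothetical, and a real service would persist the records instead of keeping them in a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

DRAFT = "Draft"
PUBLISHED = "Published"


@dataclass
class PresetModel:
    """A preset model as listed in the requirements (field names are assumptions)."""
    model_id: str
    backing_model_id: str
    backing_model_api_key: str
    status: str = DRAFT
    training_data_id: Optional[str] = None


class ModelRegistry:
    """Hypothetical in-memory stand-in for the model store."""

    def __init__(self) -> None:
        self._models: Dict[str, PresetModel] = {}

    def add(self, model: PresetModel) -> None:
        if model.model_id in self._models:
            raise ValueError(f"model {model.model_id!r} already exists")
        self._models[model.model_id] = model

    def delete(self, model_id: str) -> None:
        del self._models[model_id]

    def publish(self, model_id: str) -> None:
        self._models[model_id].status = PUBLISHED

    def unpublish(self, model_id: str) -> None:
        self._models[model_id].status = DRAFT

    def list_published(self) -> List[str]:
        # Return only IDs so the backing model's API key is never
        # exposed to authorized users.
        return [m.model_id for m in self._models.values() if m.status == PUBLISHED]
```

A new model starts as a draft, so authorized users only see it after the admin publishes it.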
Designs
Architectures
There are two candidate architectures, because API Gateway and Lambda are limited in their support for streaming.
Without Streaming Support
The current architecture of the Typee API is shown below.
However, in this setup API Gateway doesn’t support SSE, and a Lambda function behind it has to return its whole response at once. This means that if I use the current architecture, I can’t support the streaming feature.
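To make the limitation concrete, here is a minimal sketch of a non-streaming chat handler in the Lambda proxy-integration style. The request shape (`{"prompt": ...}`), the response shape (`{"reply": ...}`), and the `call_backing_model` helper are assumptions made for illustration; the point is only that the handler returns one complete body.

```python
import json
from typing import Any, Callable, Dict


def call_backing_model(prompt: str) -> str:
    """Hypothetical placeholder for the request to the backing model."""
    return f"echo: {prompt}"


def _json_response(status: int, payload: Dict[str, Any]) -> Dict[str, Any]:
    # Response shape expected by an API Gateway Lambda proxy integration.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(
    event: Dict[str, Any],
    context: Any,
    complete: Callable[[str], str] = call_backing_model,
) -> Dict[str, Any]:
    body = json.loads(event.get("body") or "{}")
    prompt = body.get("prompt", "")
    if not prompt:
        return _json_response(400, {"error": "prompt is required"})
    # The full completion is produced before anything is returned,
    # so the user waits for the entire answer.
    reply = complete(prompt)
    return _json_response(200, {"reply": reply})
```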
Pros:
- Same architecture as the existing API. I don’t need to create other resources.
- No new packages, less effort.
- Built-in Authentication
Cons:
- Can’t support streaming. Users have to wait for the full response from the AI.
Given that Typee is at an early stage and streaming is a nice-to-have feature, this architecture is the best fit for Typee for now.
With Streaming Support
The architecture above is able to support streaming because:
- ALB supports SSE
- There is no restriction on service running on EC2
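As a rough sketch of what the service on EC2 would send, the snippet below formats model output as Server-Sent Events. The helper names and the `[DONE]` end-of-stream sentinel are assumptions for illustration; the framing itself (each `data:` line, with a blank line ending the event, served as `text/event-stream`) follows the SSE format.

```python
from typing import Iterable, Iterator

SSE_CONTENT_TYPE = "text/event-stream"
DONE_SENTINEL = "[DONE]"  # assumed end-of-stream marker, not part of SSE itself


def sse_event(data: str) -> str:
    """Encode one SSE event; multi-line data becomes several data: lines."""
    lines = data.split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE event per token as soon as the token is available."""
    for token in tokens:
        yield sse_event(token)
    yield sse_event(DONE_SENTINEL)
```

A web framework running on EC2 would write each yielded chunk to the response as it is produced, so the user sees the answer grow instead of waiting for the full text.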
Pros:
- Supports streaming.
Cons:
- New resources and packages are required.
- More effort.
- Built-in authentication doesn’t support the Authorization header. It uses a cookie to store API keys.
Detailed Designs
DynamoDB Model
Preset Model Design
Key Type | Key Name | Value Type | Comment |
---|---|---|---|
Partition Key | UserId | String | Admin’s UserId |
Sort Key | ModelId | String | |
Attribute | BackingModelId | String | |
Attribute | BackingModelApiKey | String | |
Attribute | Status | String | Draft/Published |
Attribute | CreatedTime | String | |
Attribute | TrainingDataId | String | Used to look up relevant text from the vector database |
Attribute | LastUpdatedTime | String | Nullable |
Optional Secondary Indexes: UserId_LastUpdatedTime
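The table above could be expressed as a DynamoDB table definition roughly as follows. This is a sketch under assumptions: the table name `PresetModels` is hypothetical, the optional index is modeled as a local secondary index (it shares the UserId partition key, but a global secondary index would also work), and on-demand billing is chosen only for simplicity.

```python
from typing import Any, Dict, Optional

TABLE_NAME = "PresetModels"  # hypothetical table name


def preset_model_table_definition() -> Dict[str, Any]:
    """Keyword arguments in the shape accepted by DynamoDB's CreateTable."""
    return {
        "TableName": TABLE_NAME,
        # Only key attributes (table and index keys) are declared here.
        "AttributeDefinitions": [
            {"AttributeName": "UserId", "AttributeType": "S"},
            {"AttributeName": "ModelId", "AttributeType": "S"},
            {"AttributeName": "LastUpdatedTime", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "UserId", "KeyType": "HASH"},
            {"AttributeName": "ModelId", "KeyType": "RANGE"},
        ],
        "LocalSecondaryIndexes": [
            {
                "IndexName": "UserId_LastUpdatedTime",
                "KeySchema": [
                    {"AttributeName": "UserId", "KeyType": "HASH"},
                    {"AttributeName": "LastUpdatedTime", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def preset_model_item(
    user_id: str,
    model_id: str,
    backing_model_id: str,
    backing_model_api_key: str,
    created_time: str,
    status: str = "Draft",
    training_data_id: Optional[str] = None,
) -> Dict[str, Dict[str, str]]:
    """Build an item in DynamoDB's low-level attribute-value format."""
    item = {
        "UserId": {"S": user_id},
        "ModelId": {"S": model_id},
        "BackingModelId": {"S": backing_model_id},
        "BackingModelApiKey": {"S": backing_model_api_key},
        "Status": {"S": status},
        "CreatedTime": {"S": created_time},
    }
    if training_data_id is not None:
        item["TrainingDataId"] = {"S": training_data_id}
    # LastUpdatedTime is nullable, so it is omitted until the first update.
    return item
```

With boto3, the definition could be passed as `boto3.client("dynamodb").create_table(**preset_model_table_definition())`.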
Usage Tracking Design
Key Type | Key Name | Value Type | Comment |
---|---|---|---|
Partition Key | UserId | String | |
Sort Key | UsageId | String | Total/Total-ModelId/requestId. When UsageId is Total, TokenCount holds the user’s total token count |
Attribute | TokenCount | Number | |
Attribute | ModelId | String | |
Attribute | Status | String | Draft/Published |
Attribute | CreatedTime | String | |
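One way to read the UsageId scheme is that every request writes three rows for the user: the overall total, a per-model total, and the request itself. The sketch below illustrates that reading with an in-memory dictionary. The `UsageTable` class is hypothetical, and treating `Total-ModelId` as a per-model running total is an assumption. In DynamoDB, the two running totals could be incremented with an `UpdateItem` call that uses an `ADD` update expression.

```python
from collections import defaultdict
from typing import Dict, Tuple

TOTAL = "Total"


def usage_ids(model_id: str, request_id: str) -> Tuple[str, str, str]:
    """Sort-key values touched by one request (assumed interpretation)."""
    return (TOTAL, f"{TOTAL}-{model_id}", request_id)


class UsageTable:
    """Hypothetical in-memory stand-in keyed by (UserId, UsageId)."""

    def __init__(self) -> None:
        self.items: Dict[Tuple[str, str], int] = defaultdict(int)

    def record(self, user_id: str, model_id: str, request_id: str, token_count: int) -> None:
        # Add the tokens to the overall total, the per-model total,
        # and the row for this individual request.
        for usage_id in usage_ids(model_id, request_id):
            self.items[(user_id, usage_id)] += token_count

    def total(self, user_id: str) -> int:
        return self.items.get((user_id, TOTAL), 0)

    def total_for_model(self, user_id: str, model_id: str) -> int:
        return self.items.get((user_id, f"{TOTAL}-{model_id}"), 0)
```

Keeping the totals as rows under the same partition key lets the admin read a user's usage with a single lookup instead of summing every request.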
Reference
Typee Chat: https://typee.chat
Typee Chat Deploy: https://deploy.typee.chat