Overview

This article describes how to implement an AI Chat API on AWS. It covers use cases, requirements, architectures, and detailed designs.

Use Case

In the SaaS version of Typee, an authorized user can use a preset model without setting up any configuration. The admin first needs to set up a model, which includes a model name, a backing model, an API key for the backing model, and whether to use training data. Once the admin finishes the setup and publishes the model, authorized users can see the new model in the front end and start using it. The admin can see the usage of each user.

Requirements

Admin can add and delete a model.
Admin can publish and unpublish a model.
A model includes an ID/name, a backing model, an API key for the backing model, a status, and training data.
Authorized users can see and use published models.
Admin can track token usage of each authorized user.
Streaming support is nice to have but not required.

Designs

Architectures

There are two candidate architectures, because API Gateway and Lambda have limited support for streaming.

Not Supporting Streaming

The current architecture of the Typee API is shown below.

However, API Gateway doesn’t support SSE, and Lambda has to return the entire response at once. This means that if I keep the current architecture, I can’t support streaming.

Pros:

Same architecture as the existing API, so I don’t need to create other resources.
No new packages, so less effort.
Built-in authentication.

Cons:

Can’t support streaming. Users have to wait for the full response from the AI.

Given that Typee is at an early stage and streaming is a nice-to-have feature, this architecture is the best fit for Typee for now.
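To make the non-streaming behavior concrete, here is a minimal sketch of a handler that returns the whole reply in a single response. It assumes a Lambda proxy integration behind API Gateway. The `call_backing_model` stub and the `modelId`, `prompt`, `reply`, and `tokenCount` field names are hypothetical and not part of the Typee API.

```python
import json


def call_backing_model(model_id: str, prompt: str) -> dict:
    """Hypothetical stand-in for the real call to the backing model."""
    reply = f"[{model_id}] echo: {prompt}"
    return {"text": reply, "token_count": len(reply.split())}


def error(status: int, message: str) -> dict:
    """Build an error response in the Lambda proxy integration format."""
    return {"statusCode": status, "body": json.dumps({"error": message})}


def handler(event: dict, context: object) -> dict:
    """Return the complete reply at once; nothing is sent until it is ready."""
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return error(400, "invalid JSON")
    if not isinstance(body, dict):
        return error(400, "request body must be a JSON object")

    model_id = body.get("modelId")
    prompt = body.get("prompt")
    if not model_id or not prompt:
        return error(400, "modelId and prompt are required")

    # The user waits here until the backing model has produced the full reply.
    result = call_backing_model(model_id, prompt)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(
            {"reply": result["text"], "tokenCount": result["token_count"]}
        ),
    }
```

The token count returned by the stub is what the usage tracking design later in this article would record for the request.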

Supporting Streaming

The architecture above is able to support streaming because:

ALB supports SSE
There is no restriction on services running on EC2

Pros:

Supports streaming

Cons:

New resources and packages are required.
More effort.
Built-in authentication doesn’t support the Authorization header. It uses a cookie to store API keys.
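For the streaming option, the service on EC2 would emit the reply as Server-Sent Events. The sketch below only shows the SSE wire format, independent of any web framework. The `delta` payload field and the closing `done` event are assumptions made for illustration.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Format one Server-Sent Event.

    Each line of data gets its own "data:" field, and a blank line
    terminates the event.
    """
    lines = []
    if event:
        lines.append(f"event: {event}")
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"


def stream_reply(chunks: Iterable[str]) -> Iterator[str]:
    """Yield one SSE event per model chunk, then a final "done" event."""
    for chunk in chunks:
        yield sse_event(json.dumps({"delta": chunk}))
    # Assumed convention: tell the client that the reply is complete.
    yield sse_event("{}", event="done")


if __name__ == "__main__":
    for frame in stream_reply(["Hel", "lo"]):
        print(frame, end="")
```

A response built this way is served with `Content-Type: text/event-stream`, and the client can render each chunk as soon as it arrives instead of waiting for the full reply.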

Detailed Designs

DynamoDB Model

Preset Model Design

Key Type	Key Name	Value Type	Comment
Partition Key	UserId	String	Admin’s UserId
Sort Key	ModelId	String
Attribute	BackingModelId	String
Attribute	BackingModelApiKey	String
Attribute	Status	String	Draft/Published
Attribute	CreatedTime	String
Attribute	TrainingDataId	String	Used to look up relevant text from the vector database
Attribute	LastUpdatedTime	String	Nullable

Optional Secondary Indexes: UserId_LastUpdatedTime
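The table above could map to code roughly as follows. This is a sketch under assumptions: the `PresetModel` class and `set_status_params` helper are hypothetical names, unpublishing is assumed to set the status back to Draft, and the returned dictionary is meant to be passed as keyword arguments to boto3’s `Table.update_item`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PresetModel:
    user_id: str                  # partition key: the admin's UserId
    model_id: str                 # sort key
    backing_model_id: str
    backing_model_api_key: str
    created_time: str
    status: str = "Draft"         # Draft or Published
    training_data_id: Optional[str] = None
    last_updated_time: Optional[str] = None

    def to_item(self) -> dict:
        """Convert to a DynamoDB item, omitting nullable attributes that are unset."""
        item = {
            "UserId": self.user_id,
            "ModelId": self.model_id,
            "BackingModelId": self.backing_model_id,
            "BackingModelApiKey": self.backing_model_api_key,
            "Status": self.status,
            "CreatedTime": self.created_time,
        }
        if self.training_data_id is not None:
            item["TrainingDataId"] = self.training_data_id
        if self.last_updated_time is not None:
            item["LastUpdatedTime"] = self.last_updated_time
        return item


def set_status_params(user_id: str, model_id: str, published: bool, now: str) -> dict:
    """Build update parameters to publish or unpublish an existing model."""
    return {
        "Key": {"UserId": user_id, "ModelId": model_id},
        # Status is aliased to avoid clashing with DynamoDB reserved words.
        "UpdateExpression": "SET #s = :s, LastUpdatedTime = :t",
        "ExpressionAttributeNames": {"#s": "Status"},
        "ExpressionAttributeValues": {
            ":s": "Published" if published else "Draft",
            ":t": now,
        },
        # Fail instead of silently creating a model that was never set up.
        "ConditionExpression": "attribute_exists(ModelId)",
    }
```

With a boto3 table resource, publishing would then look like `table.update_item(**set_status_params(admin_id, model_id, True, now))`, where `table`, `admin_id`, `model_id`, and `now` are supplied by the caller.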

Usage Tracking Design

Key Type	Key Name	Value Type	Comment
Partition Key	UserId	String
Sort Key	UsageId	String	Total, Total-ModelId, or a requestId. When UsageId is Total, TokenCount should hold the user’s total token count
Attribute	TokenCount	Number
Attribute	ModelId	String
Attribute	Status	String	Draft/Published
Attribute	CreatedTime	String
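One way to keep the three kinds of usage rows in sync is to issue one atomic `ADD` per row for every request. The sketch below builds the parameters for boto3’s `Table.update_item` and replays them against an in-memory dictionary to show the effect. The helper names are hypothetical, and applying an update per request is an assumption about how the counters are maintained.

```python
from typing import Dict, List, Tuple


def usage_updates(user_id: str, model_id: str, request_id: str, token_count: int) -> List[dict]:
    """Build one update per usage row touched by a single request."""
    updates = []
    for usage_id in ("Total", f"Total-{model_id}", request_id):
        updates.append(
            {
                "Key": {"UserId": user_id, "UsageId": usage_id},
                # ADD creates the attribute if it is missing, otherwise increments it.
                "UpdateExpression": "ADD TokenCount :n",
                "ExpressionAttributeValues": {":n": token_count},
            }
        )
    return updates


def apply_updates(table: Dict[Tuple[str, str], int], updates: List[dict]) -> None:
    """In-memory stand-in for the table, mimicking the effect of ADD."""
    for update in updates:
        key = (update["Key"]["UserId"], update["Key"]["UsageId"])
        increment = update["ExpressionAttributeValues"][":n"]
        table[key] = table.get(key, 0) + increment


if __name__ == "__main__":
    table: Dict[Tuple[str, str], int] = {}
    apply_updates(table, usage_updates("u1", "m1", "req-1", 10))
    apply_updates(table, usage_updates("u1", "m2", "req-2", 5))
    for key, count in sorted(table.items()):
        print(key, count)
```

Because every row for a user shares the same partition key, the admin can read a user’s overall total, per-model totals, and per-request records from a single partition.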

Reference

Typee Chat: https://typee.chat

Typee Chat Deploy: https://deploy.typee.chat
