Overview

This article describes how to implement an AI Chat API in AWS. It covers use cases, requirements, architectures, and detailed designs.

Use Case

In the SaaS version of Typee, an authorized user can use a preset model without setting up any configuration. An admin first needs to set up a model, which includes a model name, a backing model, an API key for the backing model, and whether to use training data. Once the admin finishes the setup and publishes the model, authorized users can see the new model in the front end and start using it. The admin can see the usage of each user.

Requirements

  1. Admin can add and delete a model.
  2. Admin can publish and unpublish a model.
  3. A model includes an ID/name, a backing model, an API key for the backing model, a status, and training data.
  4. Authorized users can see and use published models.
  5. Admin can track the token usage of each authorized user.
  6. Streaming support is preferred but not required.
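
Requirements 1 to 4 can be read as a small model lifecycle. The following is a minimal sketch of that lifecycle, not the actual Typee implementation. `PresetModel`, `ModelRegistry`, and the in-memory dictionary are hypothetical names used for illustration, and the sketch assumes that authorized users only need to see model IDs, never the API key.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class ModelStatus(str, Enum):
    DRAFT = "Draft"
    PUBLISHED = "Published"


@dataclass
class PresetModel:
    model_id: str
    backing_model_id: str
    backing_model_api_key: str
    status: ModelStatus = ModelStatus.DRAFT
    training_data_id: Optional[str] = None


class ModelRegistry:
    """In-memory stand-in for the model store (hypothetical)."""

    def __init__(self) -> None:
        self._models: Dict[str, PresetModel] = {}

    # Requirement 1: add and delete
    def add(self, model: PresetModel) -> None:
        self._models[model.model_id] = model

    def delete(self, model_id: str) -> None:
        self._models.pop(model_id, None)

    # Requirement 2: publish and unpublish
    def publish(self, model_id: str) -> None:
        self._models[model_id].status = ModelStatus.PUBLISHED

    def unpublish(self, model_id: str) -> None:
        self._models[model_id].status = ModelStatus.DRAFT

    # Requirement 4: users see published models only.
    # Assumption: only IDs are returned, so the API key stays server-side.
    def list_published(self) -> List[str]:
        return [
            m.model_id
            for m in self._models.values()
            if m.status is ModelStatus.PUBLISHED
        ]
```

A new model starts as a draft, so it stays hidden from users until the admin publishes it.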

Designs

Architectures

There are two candidate architectures because API Gateway and Lambda are limited in their support for streaming.

Without Streaming Support

The current architecture of the Typee API is shown below.

However, API Gateway doesn’t support SSE (server-sent events), and Lambda has to return the whole response at once. This means that if I use the current architecture, I can’t support streaming.
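
To illustrate what "the whole response at once" means, here is a minimal sketch of a non-streaming handler. It assumes the Lambda proxy integration response shape (`statusCode`, `headers`, `body`). `call_backing_model` and the `modelId`/`prompt` request fields are hypothetical placeholders, not part of the Typee API.

```python
import json
from typing import Any, Dict


def call_backing_model(model_id: str, prompt: str) -> Dict[str, Any]:
    """Hypothetical stand-in for the real backing model call."""
    text = f"[{model_id}] echo: {prompt}"
    return {"text": text, "token_count": len(prompt.split())}


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    # The request body arrives as a JSON string (or None) in proxy integration.
    body = json.loads(event.get("body") or "{}")
    model_id = body.get("modelId")
    prompt = body.get("prompt")
    if not model_id or not prompt:
        return {
            "statusCode": 400,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"error": "modelId and prompt are required"}),
        }

    # The full reply is collected before anything is returned,
    # so the user sees nothing until the model has finished.
    result = call_backing_model(model_id, prompt)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(
            {"reply": result["text"], "tokenCount": result["token_count"]}
        ),
    }
```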

Pros:

  1. Same architecture as the existing API, so I don’t need to create other resources.
  2. No new packages, so less effort.
  3. Built-in authentication.

Cons:

  1. No streaming support. Users have to wait for the full response from the AI.

Given that Typee is at an early stage and streaming is a nice-to-have feature, this architecture is the best fit for Typee for now.

Supporting Streaming

The architecture above can support streaming because:

  1. ALB supports SSE.
  2. There are no restrictions on services running on EC2.
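
As a rough illustration of what a service on EC2 would send through the ALB, the sketch below formats reply chunks as SSE events. It is not tied to any web framework. `sse_event` and `stream_reply` are hypothetical helper names, and the `[DONE]` end marker is an assumed convention rather than part of the SSE format.

```python
from typing import Iterable, Iterator


def sse_event(data: str) -> str:
    """Format one SSE event.

    Each line of the payload gets its own "data:" field, and a blank
    line terminates the event.
    """
    lines = data.split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"


def stream_reply(chunks: Iterable[str]) -> Iterator[str]:
    """Yield one SSE event per model chunk, then an end marker."""
    for chunk in chunks:
        yield sse_event(chunk)
    # Assumption: the client treats "[DONE]" as the end of the reply.
    yield sse_event("[DONE]")
```

The web server would write these strings to a response with `Content-Type: text/event-stream`, flushing after each event so the user sees the reply as it is generated.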

Pros:

  1. Supports streaming.

Cons:

  1. New resources and packages are required.
  2. More effort.
  3. Built-in authentication doesn’t support the Authorization header. It uses cookies to store API keys.

Detailed Designs

DynamoDB Model

Preset Model Design

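| Key Type      | Key Name            | Value Type | Comment                                                |
|---------------|---------------------|------------|--------------------------------------------------------|
| Partition Key | UserId              | String     | Admin’s UserId                                         |
| Sort Key      | ModelId             | String     |                                                        |
| Attribute     | BackingModelId      | String     |                                                        |
| Attribute     | BackingModelApiKey  | String     |                                                        |
| Attribute     | Status              | String     | Draft/Published                                        |
| Attribute     | CreatedTime         | String     |                                                        |
| Attribute     | TrainingDataId      | String     | Used to look up related text from the vector database  |
| Attribute     | LastUpdatedTime     | String     | Nullable                                               |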

Optional Secondary Indexes: UserId_LastUpdatedTime
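
The key design above could translate into a table definition like the following sketch. It only builds the parameter dictionary that would be passed to boto3's `create_table`, and does not call AWS. The table name `PresetModel`, the on-demand billing mode, and the choice of a local (rather than global) secondary index are assumptions, since the design does not specify them.

```python
from typing import Any, Dict


def preset_model_table_params(table_name: str = "PresetModel") -> Dict[str, Any]:
    """Build create_table parameters for the preset model table (sketch)."""
    return {
        "TableName": table_name,  # assumed name
        "KeySchema": [
            {"AttributeName": "UserId", "KeyType": "HASH"},    # partition key
            {"AttributeName": "ModelId", "KeyType": "RANGE"},  # sort key
        ],
        # Only key attributes (table and index) are declared here;
        # the other attributes are schemaless in DynamoDB.
        "AttributeDefinitions": [
            {"AttributeName": "UserId", "AttributeType": "S"},
            {"AttributeName": "ModelId", "AttributeType": "S"},
            {"AttributeName": "LastUpdatedTime", "AttributeType": "S"},
        ],
        # Assumption: a local secondary index, because it shares the
        # table's partition key (UserId).
        "LocalSecondaryIndexes": [
            {
                "IndexName": "UserId_LastUpdatedTime",
                "KeySchema": [
                    {"AttributeName": "UserId", "KeyType": "HASH"},
                    {"AttributeName": "LastUpdatedTime", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumed billing mode
    }
```

Because LastUpdatedTime is nullable, items that lack it would not appear in the index, so queries on the index would return only models that have been updated at least once.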

Usage Tracking Design

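| Key Type      | Key Name    | Value Type | Comment                                                                                              |
|---------------|-------------|------------|------------------------------------------------------------------------------------------------------|
| Partition Key | UserId      | String     |                                                                                                      |
| Sort Key      | UsageId     | String     | One of Total, Total-ModelId, or a requestId. When UsageId is Total, TokenCount holds the total count |
| Attribute     | TokenCount  | Number     |                                                                                                      |
| Attribute     | ModelId     | String     |                                                                                                      |
| Attribute     | Status      | String     | Draft/Published                                                                                      |
| Attribute     | CreatedTime | String     |                                                                                                      |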
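
One way to read the UsageId design is that each request writes three items for the user: the overall total, a per-model total, and a per-request record. The sketch below simulates that with an in-memory dictionary. `usage_ids` and `record_usage` are hypothetical names, and treating `Total-ModelId` as a per-model running total is my interpretation of the table, not something the design states explicitly.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# (UserId, UsageId) -> TokenCount; stands in for the DynamoDB table.
UsageTable = DefaultDict[Tuple[str, str], int]


def usage_ids(model_id: str, request_id: str) -> List[str]:
    """Sort keys touched by a single request."""
    return ["Total", f"Total-{model_id}", request_id]


def record_usage(
    table: UsageTable,
    user_id: str,
    model_id: str,
    request_id: str,
    token_count: int,
) -> None:
    """Add the request's tokens to the totals and store the request itself."""
    for usage_id in usage_ids(model_id, request_id):
        table[(user_id, usage_id)] += token_count


table: UsageTable = defaultdict(int)
record_usage(table, "user-1", "model-a", "req-1", 10)
record_usage(table, "user-1", "model-b", "req-2", 5)
```

With this layout, the admin could read a user's total usage with a single lookup on (UserId, Total) instead of summing every request. In DynamoDB, the increment could be expressed as an `UpdateItem` call with an `ADD` action on TokenCount.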

Reference

Typee Chat: https://typee.chat

Typee Chat Deploy: https://deploy.typee.chat