Models


Models are specialized agents that provide a standardized interface to external AI services, such as Large Language Models (LLMs) and image generation platforms. They encapsulate the complexity of API communication, so developers can work with different AI providers through one consistent contract.

The AIGNE Framework defines a base Model class, which is extended by two primary specializations: ChatModel for text-based conversational AI and ImageModel for image generation tasks. These abstractions are the foundation upon which higher-level agents like AIAgent and ImageAgent are built.

Core Concepts

The Model layer is designed to streamline interactions with different AI providers. Instead of writing provider-specific code for each service (like OpenAI, Anthropic, or Google Gemini), you interact with the standardized ChatModel or ImageModel interface. The AIGNE framework, through specific model packages (e.g., @aigne/openai), handles the translation between this standard format and the provider's native API.

This design offers several key advantages:

  • Provider Agnostic: Swap out underlying AI models with minimal code changes. For example, you can switch from OpenAI's GPT-4 to Anthropic's Claude 3 by simply changing the model instantiation.
  • Standardized Data Structures: All models use consistent input and output schemas (ChatModelInput, ImageModelOutput, etc.), simplifying data handling and agent composition.
  • Simplified API: The models provide a clean, high-level API that abstracts away the nuances of authentication, request formatting, and error handling for each external service.
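The provider-agnostic idea can be illustrated with a minimal sketch. Everything in it is a simplified, hypothetical stand-in defined locally (`SketchChatModel`, `StubProviderA`, `StubProviderB`, `ask`); it does not use the real AIGNE classes or any provider package, and only shows how calling code can depend on a shared contract instead of a provider.

```typescript
// Simplified, hypothetical stand-ins for the framework's types (not the real AIGNE API).
interface SketchMessage {
  role: "system" | "user" | "agent" | "tool";
  content: string;
}

interface SketchChatInput {
  messages: SketchMessage[];
}

interface SketchChatOutput {
  text: string;
  model: string;
}

// The shared contract that every provider adapter implements.
interface SketchChatModel {
  invoke(input: SketchChatInput): Promise<SketchChatOutput>;
}

// Stub adapters. A real adapter would translate the input into the provider's native request.
class StubProviderA implements SketchChatModel {
  async invoke(input: SketchChatInput): Promise<SketchChatOutput> {
    const last = input.messages[input.messages.length - 1];
    return { text: `echo: ${last ? last.content : ""}`, model: "stub-a" };
  }
}

class StubProviderB implements SketchChatModel {
  async invoke(input: SketchChatInput): Promise<SketchChatOutput> {
    const last = input.messages[input.messages.length - 1];
    return { text: `ECHO: ${last ? last.content.toUpperCase() : ""}`, model: "stub-b" };
  }
}

// Application code depends only on the shared contract, never on a provider.
async function ask(model: SketchChatModel, question: string): Promise<string> {
  const output = await model.invoke({
    messages: [{ role: "user", content: question }],
  });
  return `[${output.model}] ${output.text}`;
}

// Switching providers is a one-line change at the call site.
ask(new StubProviderA(), "hello").then(console.log);
ask(new StubProviderB(), "hello").then(console.log);
```

Because `ask` only knows about `SketchChatModel`, replacing one stub with the other requires no change to the function itself.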

The following diagram illustrates the relationship between the base Agent, the Model abstractions, and the external AI services they connect to.


ChatModel Abstraction

The ChatModel is an abstract class designed for interfacing with Large Language Models (LLMs). It provides a structured way to handle conversational interactions, including multi-turn dialogues, tool usage, and structured data extraction.

ChatModelInput

The ChatModelInput interface defines the data structure for requests sent to a language model. It standardizes how messages, tools, and other configurations are passed.

messages
ChatModelInputMessage[]
required

An array of message objects that form the conversation history and the current prompt.

responseFormat
ChatModelInputResponseFormat

Specifies the desired format for the model's output, such as plain text or structured JSON based on a provided schema.

tools
ChatModelInputTool[]

A list of available tools (functions) that the model can request to call to perform actions or retrieve information.

toolChoice
ChatModelInputToolChoice

Controls how the model uses the provided tools. It can be set to "auto", "none", or "required", or it can force a specific function call.

modelOptions
ChatModelInputOptions

A container for provider-specific options, such as temperature, topP, or parallelToolCalls.

outputFileType
'local' | 'file'

Specifies the desired format for any file-based outputs, either as a local file path (local) or a base64-encoded string (file).
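As a rough illustration of how these fields fit together, the sketch below builds one request object. The types are local approximations of the fields listed above; the nested shapes of `ChatModelInputTool` and the object form of `ChatModelInputToolChoice` are assumptions, and `buildWeatherRequest` and the `get_weather` tool are hypothetical names introduced only for this example.

```typescript
// Local approximations of the documented shapes. Nested types are assumptions for
// illustration; the framework's real definitions may differ.
type Role = "system" | "user" | "agent" | "tool";

interface ChatModelInputMessage {
  role: Role;
  content?: string;
}

interface ChatModelInputTool {
  // Assumed shape: a named function with a JSON Schema description of its parameters.
  type: "function";
  function: { name: string; description?: string; parameters: Record<string, unknown> };
}

type ChatModelInputToolChoice =
  | "auto"
  | "none"
  | "required"
  // Assumed shape for forcing one specific function.
  | { type: "function"; function: { name: string } };

interface ChatModelInput {
  messages: ChatModelInputMessage[];
  tools?: ChatModelInputTool[];
  toolChoice?: ChatModelInputToolChoice;
  modelOptions?: { temperature?: number; topP?: number; parallelToolCalls?: boolean };
  outputFileType?: "local" | "file";
}

// Build a request that offers one tool and lets the model decide whether to call it.
function buildWeatherRequest(question: string): ChatModelInput {
  return {
    messages: [
      { role: "system", content: "You are a concise weather assistant." },
      { role: "user", content: question },
    ],
    tools: [
      {
        type: "function",
        function: {
          name: "get_weather", // hypothetical tool name
          description: "Look up the current weather for a city.",
          parameters: {
            type: "object",
            properties: { city: { type: "string" } },
            required: ["city"],
          },
        },
      },
    ],
    toolChoice: "auto",
    modelOptions: { temperature: 0.2 },
  };
}

const request = buildWeatherRequest("What is the weather in Paris?");
console.log(JSON.stringify(request, null, 2));
```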

ChatModelInputMessage

Each message in the messages array follows a defined structure.

role
'system' | 'user' | 'agent' | 'tool'
required

The role of the message sender. system provides instructions, user represents user input, agent is for model responses, and tool is for the output of a tool call.

content
string | UnionContent[]

The content of the message. It can be a simple string or an array for multimodal content, combining text and images (FileUnionContent).

toolCalls
object[]

Used in an agent message to indicate one or more tool calls initiated by the model.

toolCallId
string

Used in a tool message to link the tool's output back to the corresponding toolCalls request.
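The link between `toolCalls` and `toolCallId` is easiest to see in a short conversation history. The sketch below is a local approximation: the source types `toolCalls` as `object[]`, so the `id` and `function` fields on each call are assumptions, and `pendingToolCallIds` is a hypothetical helper that finds calls the conversation has not answered yet.

```typescript
// Local approximation of the documented message shape. The `id` and `function`
// fields on a tool call are assumptions for illustration.
interface ToolCall {
  id: string;
  function: { name: string; arguments: Record<string, unknown> };
}

interface ChatModelInputMessage {
  role: "system" | "user" | "agent" | "tool";
  content?: string;
  toolCalls?: ToolCall[];
  toolCallId?: string;
}

// Return the ids of tool calls that have no matching tool message yet.
function pendingToolCallIds(messages: ChatModelInputMessage[]): string[] {
  const answered = new Set<string>();
  for (const m of messages) {
    if (m.role === "tool" && m.toolCallId !== undefined) answered.add(m.toolCallId);
  }
  const pending: string[] = [];
  for (const m of messages) {
    if (m.role !== "agent") continue;
    for (const call of m.toolCalls ?? []) {
      if (!answered.has(call.id)) pending.push(call.id);
    }
  }
  return pending;
}

const history: ChatModelInputMessage[] = [
  { role: "user", content: "Weather in Paris and Rome?" },
  {
    role: "agent",
    toolCalls: [
      { id: "call_1", function: { name: "get_weather", arguments: { city: "Paris" } } },
      { id: "call_2", function: { name: "get_weather", arguments: { city: "Rome" } } },
    ],
  },
  { role: "tool", toolCallId: "call_1", content: "{\"tempC\": 18}" },
];

console.log(pendingToolCallIds(history)); // only call_2 is still unanswered
```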

ChatModelOutput

The ChatModelOutput interface standardizes the response received from a language model.

text
string

The text-based content of the model's response.

json
object

The JSON object returned by the model when responseFormat is set to "json_schema".

toolCalls
ChatModelOutputToolCall[]

An array of tool call requests made by the model. Each object includes the function name and arguments.

usage
ChatModelOutputUsage

An object containing token usage statistics, including inputTokens and outputTokens.

model
string

The identifier of the model that generated the response.

files
FileUnionContent[]

An array of files generated by the model, if any.
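One plausible way for calling code to consume this structure is sketched below. The output type is a local approximation of the fields above; the nested shape of `ChatModelOutputToolCall` is an assumption, and `nextStep` and `totalTokens` are hypothetical helpers. The order of the checks (tool calls, then JSON, then text) is a design choice for this example, not documented framework behavior.

```typescript
// Local approximation of the documented output shape; nested field names for
// tool calls are assumptions for illustration.
interface ChatModelOutputToolCall {
  id: string;
  function: { name: string; arguments: Record<string, unknown> };
}

interface ChatModelOutput {
  text?: string;
  json?: object;
  toolCalls?: ChatModelOutputToolCall[];
  usage?: { inputTokens: number; outputTokens: number };
  model?: string;
}

type NextStep =
  | { kind: "run_tools"; names: string[] }
  | { kind: "structured"; data: object }
  | { kind: "text"; text: string };

// Decide what the caller should do next with a model response.
function nextStep(output: ChatModelOutput): NextStep {
  if (output.toolCalls && output.toolCalls.length > 0) {
    return { kind: "run_tools", names: output.toolCalls.map((c) => c.function.name) };
  }
  if (output.json !== undefined) {
    return { kind: "structured", data: output.json };
  }
  return { kind: "text", text: output.text ?? "" };
}

// Total tokens reported for one call, or 0 when no usage was reported.
function totalTokens(output: ChatModelOutput): number {
  return output.usage ? output.usage.inputTokens + output.usage.outputTokens : 0;
}

const reply: ChatModelOutput = {
  text: "It is 18 degrees in Paris.",
  usage: { inputTokens: 42, outputTokens: 9 },
  model: "example-model",
};

console.log(nextStep(reply), totalTokens(reply));
```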

ImageModel Abstraction

The ImageModel is an abstract class for interfacing with image generation models. It provides a simplified contract for creating or editing images based on textual prompts.

ImageModelInput

The ImageModelInput interface defines the request structure for an image generation task.

prompt
string
required

A textual description of the desired image.

image
FileUnionContent[]

An optional array of input images, used for tasks like image editing or creating variations.

n
number

The number of images to generate. Defaults to 1.

outputFileType
'local' | 'file'

Specifies whether the output images should be saved as local files (local) or returned as base64-encoded strings (file).

modelOptions
ImageModelInputOptions

A container for provider-specific options, such as image dimensions, quality, or style presets.
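The sketch below shows what an image editing request might look like and how the documented default of `n = 1` could be applied. The types are local approximations; the `size` and `quality` fields inside `modelOptions` are hypothetical examples of provider-specific options, and `normalizeImageInput`, including its validation rules, is an illustrative helper, not part of the framework.

```typescript
// Local approximation of the documented input shape. The modelOptions fields
// (size, quality) are hypothetical examples of provider-specific options.
type FileUnionContent =
  | { type: "local"; path: string }
  | { type: "url"; url: string }
  | { type: "file"; data: string };

interface ImageModelInput {
  prompt: string;
  image?: FileUnionContent[];
  n?: number;
  outputFileType?: "local" | "file";
  modelOptions?: { size?: string; quality?: string };
}

// Validate a request and fill in the documented default of n = 1.
// The validation rules themselves are assumptions made for this sketch.
function normalizeImageInput(input: ImageModelInput): ImageModelInput & { n: number } {
  if (input.prompt.trim() === "") {
    throw new Error("prompt is required");
  }
  const n = input.n ?? 1;
  if (!Number.isInteger(n) || n < 1) {
    throw new Error("n must be a positive integer");
  }
  return { ...input, n };
}

const editRequest = normalizeImageInput({
  prompt: "Replace the sky with a sunset",
  image: [{ type: "url", url: "https://example.com/photo.png" }],
  outputFileType: "file",
  modelOptions: { size: "1024x1024" },
});

console.log(editRequest.n);
```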

ImageModelOutput

The ImageModelOutput interface defines the response structure from an image generation service.

images
FileUnionContent[]
required

An array of the generated images. The format of each element depends on the outputFileType specified in the input.

usage
ChatModelOutputUsage

An object containing usage statistics, which may include token counts or other provider-specific metrics.

model
string

The identifier of the model that generated the images.

File Content Types

Models handle various forms of file inputs for multimodal tasks through the FileUnionContent type. This discriminated union allows files to be represented in three ways:

  • LocalContent: Represents a file stored on the local filesystem.
    • type: "local"
    • path: The absolute path to the file.
  • UrlContent: Represents a file accessible via a public URL.
    • type: "url"
    • url: The URL of the file.
  • FileContent: Represents a file as a base64-encoded string.
    • type: "file"
    • data: The base64-encoded content of the file.

The Model base class includes a transformFileType method that can automatically convert between these formats as needed, simplifying file handling across different agents and model providers.
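Code that receives a FileUnionContent value can branch on the `type` field. The sketch below defines the three variants locally from the descriptions above and narrows on the discriminant; `describeFile` is a hypothetical helper. It deliberately does not call transformFileType, whose signature is not shown here.

```typescript
// Local definitions mirroring the three documented variants.
interface LocalContent { type: "local"; path: string }
interface UrlContent { type: "url"; url: string }
interface FileContent { type: "file"; data: string }
type FileUnionContent = LocalContent | UrlContent | FileContent;

// Narrow on the `type` discriminant; the `never` branch makes the compiler
// flag this function if a fourth variant is ever added.
function describeFile(file: FileUnionContent): string {
  switch (file.type) {
    case "local":
      return `local file at ${file.path}`;
    case "url":
      return `remote file at ${file.url}`;
    case "file":
      return `inline file, ${file.data.length} base64 characters`;
    default: {
      const unreachable: never = file;
      throw new Error(`unknown file type: ${JSON.stringify(unreachable)}`);
    }
  }
}

const files: FileUnionContent[] = [
  { type: "local", path: "/tmp/cat.png" },
  { type: "url", url: "https://example.com/cat.png" },
  { type: "file", data: "aGVsbG8=" },
];

for (const f of files) console.log(describeFile(f));
```

Inside each `case`, TypeScript knows which variant it is handling, so `file.path`, `file.url`, and `file.data` are each only accessible where they exist.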

Summary

The ChatModel and ImageModel abstractions are core components that make the AIGNE Framework flexible and provider-agnostic. They provide a stable, standardized interface for interacting with a wide range of external AI services.

  • To learn how to use these models in practice, see the documentation for the AI Agent and Image Agent.
  • For details on configuring specific providers like OpenAI, Anthropic, or Google Gemini, refer to the guides in the Models section.