Google Gemini


This guide provides instructions for configuring and using Google's Gemini models within the AIGNE Framework via the @aigne/gemini package. It covers API key setup, model selection, and the specific features available for chat, image, and video generation.

The @aigne/gemini package offers seamless integration with Google's advanced AI capabilities, including the multimodal Gemini models and the Imagen text-to-image models, through a consistent interface within the AIGNE ecosystem.

Features#

  • Google API Integration: Provides a direct interface to Google's Gemini, Imagen, and Veo API services.
  • Chat Completions: Supports all available Gemini chat models for conversational AI.
  • Image Generation: Integrates with both Imagen and Gemini models for image generation and editing.
  • Video Generation: Leverages Google's Veo models for text-to-video, image-to-video, and frame interpolation tasks.
  • Multimodal Support: Natively handles inputs combining text, images, audio, and video.
  • Function Calling: Supports Gemini's function calling capabilities to interact with external tools.
  • Streaming Responses: Enables real-time data processing for more responsive applications.
  • Type-Safe: Includes comprehensive TypeScript typings for all APIs and model configurations.

Installation#

Install the required packages using your preferred package manager.

npm install @aigne/gemini @aigne/core

Configuration#

To authenticate requests, you must provide a Google API key. This can be done by setting an environment variable, which the framework will automatically detect.

Environment Variable

export GEMINI_API_KEY="your-google-api-key"

Alternatively, you can pass the apiKey directly in the model's constructor.

Chat Completions#

The GeminiChatModel class is used for conversational interactions.

Basic Usage#

The following example demonstrates how to instantiate and invoke the GeminiChatModel.

Chat Model Usage

import { GeminiChatModel } from "@aigne/gemini";

const model = new GeminiChatModel({
  // API key is optional if the GEMINI_API_KEY environment variable is set.
  apiKey: "your-api-key",
  // Specify the model. Defaults to 'gemini-2.0-flash'.
  model: "gemini-1.5-flash",
  modelOptions: {
    temperature: 0.7,
  },
});

const result = await model.invoke({
  messages: [{ role: "user", content: "Hi there, introduce yourself" }],
});

console.log(result);

Example Response

{
  "text": "Hello from Gemini! I'm Google's helpful AI assistant. How can I assist you today?",
  "model": "gemini-1.5-flash",
  "usage": {
    "inputTokens": 12,
    "outputTokens": 18
  }
}

Streaming Responses#

For real-time applications, you can process response chunks as they arrive by enabling streaming.

Streaming Example

import { isAgentResponseDelta } from "@aigne/core";
import { GeminiChatModel } from "@aigne/gemini";

const model = new GeminiChatModel({
  apiKey: "your-api-key",
  model: "gemini-1.5-flash",
});

const stream = await model.invoke(
  {
    messages: [{ role: "user", content: "Hi there, introduce yourself" }],
  },
  { streaming: true }
);

let fullText = "";
const json = {};

for await (const chunk of stream) {
  if (isAgentResponseDelta(chunk)) {
    const text = chunk.delta.text?.text;
    if (text) fullText += text;
    if (chunk.delta.json) Object.assign(json, chunk.delta.json);
  }
}

console.log(fullText);

Chat Model Parameters#

messages
array
required
The conversation history. Each message object contains a 'role' and 'content'.
tools
array
A list of available function tools for the model to call.
toolChoice
string | object
Controls how the model uses tools. Can be 'auto', 'required', 'none', or a specific tool.
responseFormat
object
Specifies the desired output format, such as structured JSON.
model
string
The model to use (e.g., 'gemini-1.5-pro', 'gemini-1.5-flash').
temperature
number
Controls randomness (0-1). Higher values produce more creative responses.
topP
number
Nucleus sampling parameter (0-1).
topK
number
Top-k sampling parameter.
frequencyPenalty
number
Reduces the likelihood of repeating tokens.
presencePenalty
number
Encourages the model to introduce new topics.
reasoningEffort
string | number
For thinking models (e.g., Gemini 2.5), sets the token budget for reasoning. Can be 'minimal', 'low', 'medium', 'high' or a specific token count.
modalities
array
Specifies the desired response modalities, such as ['TEXT'], ['IMAGE'], or ['TEXT', 'IMAGE'].
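
The tools and toolChoice parameters above enable function calling. The sketch below shows one plausible shape for a tool definition, modeled on the common JSON-Schema function-tool convention; the exact schema accepted by GeminiChatModel is an assumption here and should be checked against the package's TypeScript typings.

```typescript
// Hypothetical sketch: a weather-lookup tool in the common JSON-Schema
// function-tool shape. The exact schema accepted by GeminiChatModel is an
// assumption; consult the @aigne/gemini typings before relying on it.
const getWeatherTool = {
  type: "function",
  function: {
    name: "get_weather",
    description: "Look up the current weather for a city",
    parameters: {
      type: "object",
      properties: {
        city: { type: "string", description: "City name, e.g. 'Paris'" },
      },
      required: ["city"],
    },
  },
};

// This would be passed to invoke alongside the messages, e.g.:
// await model.invoke({ messages, tools: [getWeatherTool], toolChoice: "auto" });
console.log(getWeatherTool.function.name);
```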

Image Generation#

The GeminiImageModel class supports generating and editing images using both specialized Imagen models and multimodal Gemini models.

Basic Image Generation#

This example generates an image using an Imagen model.

Image Generation

import { GeminiImageModel } from "@aigne/gemini";

const model = new GeminiImageModel({
  apiKey: "your-api-key",
  model: "imagen-4.0-generate-001", // Default Imagen model
});

const result = await model.invoke({
  prompt: "A serene mountain landscape at sunset with golden light",
  n: 1,
});

console.log(result);

Example Response

{
  "images": [
    {
      "type": "file",
      "data": "iVBORw0KGgoAAAANSUhEUgAA...",
      "mimeType": "image/png"
    }
  ],
  "usage": { "inputTokens": 0, "outputTokens": 0 },
  "model": "imagen-4.0-generate-001"
}

Image Editing with Gemini Models#

Multimodal Gemini models can edit existing images based on a text prompt.

Image Editing

import { GeminiImageModel } from "@aigne/gemini";

const model = new GeminiImageModel({
  apiKey: "your-api-key",
  model: "gemini-2.0-flash-exp", // Gemini model for editing
});

const result = await model.invoke({
  prompt: "Add vibrant flowers in the foreground",
  image: [
    {
      type: "url",
      url: "https://example.com/original-image.png",
    },
  ],
  n: 1,
});

console.log(result.images); // Array of edited images

Image Model Parameters#

Parameters vary depending on the model family used.

Common Parameters#

| Parameter | Type | Description |
| --- | --- | --- |
| prompt | string | Required. A text description of the desired image. |
| model | string | The model to use. Defaults to imagen-4.0-generate-001. |
| n | number | The number of images to generate. Defaults to 1. |
| image | array | For Gemini models, an array of reference images for editing. |

Imagen Model Parameters#

| Parameter | Type | Description |
| --- | --- | --- |
| seed | number | A random seed for reproducible results. |
| safetyFilterLevel | string | The content moderation safety filter level. |
| personGeneration | string | Controls settings for generating images of people. |
| outputMimeType | string | The output image format (e.g., image/png). |
| negativePrompt | string | A description of what to exclude from the image. |
| imageSize | string | The dimensions of the generated image (e.g., "1024x1024"). |
| aspectRatio | string | The aspect ratio of the image (e.g., "16:9"). |
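
Several Imagen parameters can be combined in a single call. The sketch below shows one plausible input shape, with field names taken from the table above; the values are illustrative, and the object would be passed to the image model's invoke method.

```typescript
// Illustrative input combining Imagen parameters from the table above.
// Values are examples only; this object would be passed to model.invoke(...).
const imagenInput = {
  prompt: "A lighthouse on a rocky coast at dawn",
  negativePrompt: "people, text, watermarks",
  aspectRatio: "16:9",
  seed: 42,
  outputMimeType: "image/png",
  n: 2,
};
console.log(imagenInput.prompt);
```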

Gemini Model Parameters#

| Parameter | Type | Description |
| --- | --- | --- |
| temperature | number | Controls randomness (0.0 to 1.0). |
| maxOutputTokens | number | The maximum number of tokens in the response. |
| topP | number | Nucleus sampling parameter. |
| topK | number | Top-k sampling parameter. |
| safetySettings | array | Custom safety settings for content generation. |
| seed | number | A random seed for reproducible results. |
| systemInstruction | string | System-level instructions to guide the model. |

Video Generation#

The GeminiVideoModel class uses Google's Veo models to generate videos from text or images.

Basic Video Generation#

Text-to-Video

import { GeminiVideoModel } from "@aigne/gemini";

const videoModel = new GeminiVideoModel({
  apiKey: "your-api-key",
  model: "veo-3.1-generate-preview",
});

const result = await videoModel.invoke({
  prompt: "A serene lake with mountains in the background, gentle waves rippling",
  aspectRatio: "16:9",
  size: "720p",
  seconds: "8",
});

console.log(result);

Example Response

{
  "videos": [
    {
      "type": "file",
      "data": "base64-encoded-video-data...",
      "mimeType": "video/mp4",
      "filename": "timestamp.mp4"
    }
  ],
  "usage": { "inputTokens": 0, "outputTokens": 0 },
  "model": "veo-3.1-generate-preview",
  "seconds": 8
}

Advanced Video Generation#

Veo models also support image-to-video and frame interpolation.

  • Image-to-Video: Provide a prompt and a source image to animate a static picture.
  • Frame Interpolation: Provide a prompt, a starting image, and an ending lastFrame to generate a smooth transition between them.

Image-to-Video

const result = await videoModel.invoke({
  prompt: "Animate this image with gentle movement, clouds drifting slowly",
  image: {
    type: "url",
    url: "https://example.com/input-image.png",
  },
  seconds: "8",
});
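
Frame interpolation takes both a starting image and a lastFrame. The sketch below shows a plausible input shape under that assumption, with field names from the video model parameters; the URLs are placeholders, and the object would be passed to videoModel.invoke.

```typescript
// Illustrative frame-interpolation input: the model generates a transition
// from `image` to `lastFrame`. URLs are placeholders; this object would be
// passed to videoModel.invoke(...).
const interpolationInput = {
  prompt: "A smooth dissolve from day to night over the same skyline",
  image: { type: "url", url: "https://example.com/first-frame.png" },
  lastFrame: { type: "url", url: "https://example.com/last-frame.png" },
  aspectRatio: "16:9",
  seconds: "8",
};
console.log(interpolationInput.lastFrame.url);
```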

Video Model Parameters#

prompt
string
required
A text description of the desired video content.
model
string
The Veo model to use. Defaults to 'veo-3.1-generate-preview'.
aspectRatio
string
Video aspect ratio, either '16:9' (default) or '9:16'.
size
string
Video resolution, either '720p' (default) or '1080p'.
seconds
string
Video duration in seconds: '4', '6', or '8' (default).
image
object
A reference image for image-to-video or the first frame for interpolation.
lastFrame
object
The last frame for frame interpolation.
referenceImages
array
Additional reference images for video generation (Veo 3.1 only).
negativePrompt
string
A description of what to avoid in the video.

Further Reading#

For complete API details, refer to the official documentation.