Google Gemini


This guide provides instructions for configuring and using Google's Gemini models within the AIGNE Framework via the @aigne/gemini package. It covers API key setup, model selection, and the specific features available for chat, image, and video generation.

The @aigne/gemini package offers seamless integration with Google's advanced AI capabilities, including the multimodal Gemini models and the Imagen text-to-image models, through a consistent interface within the AIGNE ecosystem.

Features#

  • Google API Integration: Provides a direct interface to Google's Gemini, Imagen, and Veo API services.
  • Chat Completions: Supports all available Gemini chat models for conversational AI.
  • Image Generation: Integrates with both Imagen and Gemini models for image generation and editing.
  • Video Generation: Leverages Google's Veo models for text-to-video, image-to-video, and frame interpolation tasks.
  • Multimodal Support: Natively handles inputs combining text, images, audio, and video.
  • Function Calling: Supports Gemini's function calling capabilities to interact with external tools.
  • Streaming Responses: Enables real-time data processing for more responsive applications.
  • Type-Safe: Includes comprehensive TypeScript typings for all APIs and model configurations.

Installation#

Install the required packages using your preferred package manager.

npm install @aigne/gemini @aigne/core

Configuration#

To authenticate requests, you must provide a Google API key. This can be done by setting an environment variable, which the framework will automatically detect.

Environment Variable

export GEMINI_API_KEY="your-google-api-key"

Alternatively, you can pass the apiKey directly in the model's constructor.

Chat Completions#

The GeminiChatModel class is used for conversational interactions.

Basic Usage#

The following example demonstrates how to instantiate and invoke the GeminiChatModel.

Chat Model Usage

import { GeminiChatModel } from "@aigne/gemini";

const model = new GeminiChatModel({
  // API key is optional if the GEMINI_API_KEY environment variable is set.
  apiKey: "your-api-key",
  // Specify the model. Defaults to 'gemini-2.0-flash'.
  model: "gemini-1.5-flash",
  modelOptions: {
    temperature: 0.7,
  },
});

const result = await model.invoke({
  messages: [{ role: "user", content: "Hi there, introduce yourself" }],
});

console.log(result);

Example Response

{
  "text": "Hello from Gemini! I'm Google's helpful AI assistant. How can I assist you today?",
  "model": "gemini-1.5-flash",
  "usage": {
    "inputTokens": 12,
    "outputTokens": 18
  }
}

Streaming Responses#

For real-time applications, you can process response chunks as they arrive by enabling streaming.

Streaming Example

import { isAgentResponseDelta } from "@aigne/core";
import { GeminiChatModel } from "@aigne/gemini";

const model = new GeminiChatModel({
  apiKey: "your-api-key",
  model: "gemini-1.5-flash",
});

const stream = await model.invoke(
  {
    messages: [{ role: "user", content: "Hi there, introduce yourself" }],
  },
  { streaming: true }
);

let fullText = "";
const json = {};

for await (const chunk of stream) {
  if (isAgentResponseDelta(chunk)) {
    const text = chunk.delta.text?.text;
    if (text) fullText += text;
    if (chunk.delta.json) Object.assign(json, chunk.delta.json);
  }
}

console.log(fullText);

Chat Model Parameters#

messages
array
required
The conversation history. Each message object contains a 'role' and 'content'.
tools
array
A list of available function tools for the model to call.
toolChoice
string | object
Controls how the model uses tools. Can be 'auto', 'required', 'none', or a specific tool.
responseFormat
object
Specifies the desired output format, such as structured JSON.
model
string
The model to use (e.g., 'gemini-1.5-pro', 'gemini-1.5-flash').
temperature
number
Controls randomness (0-1). Higher values produce more creative responses.
topP
number
Nucleus sampling parameter (0-1).
topK
number
Top-k sampling parameter.
frequencyPenalty
number
Reduces the likelihood of repeating tokens.
presencePenalty
number
Encourages the model to introduce new topics.
reasoningEffort
string | number
For thinking models (e.g., Gemini 2.5), sets the token budget for reasoning. Can be 'minimal', 'low', 'medium', 'high' or a specific token count.
modalities
array
Specifies the desired response modalities, such as ['TEXT'], ['IMAGE'], or ['TEXT', 'IMAGE'].
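
The tools and toolChoice parameters above enable function calling. The sketch below shows one plausible shape for a tool definition, modeled on the common JSON-Schema function-tool convention; the exact schema accepted by GeminiChatModel is an assumption here and should be checked against the package's TypeScript typings.

```typescript
// Hypothetical sketch: a weather-lookup tool in the common JSON-Schema
// function-tool shape. The exact schema accepted by GeminiChatModel is an
// assumption; consult the @aigne/gemini typings before relying on it.
const getWeatherTool = {
  type: "function",
  function: {
    name: "get_weather",
    description: "Look up the current weather for a city",
    parameters: {
      type: "object",
      properties: {
        city: { type: "string", description: "City name, e.g. 'Paris'" },
      },
      required: ["city"],
    },
  },
};

// This would be passed to invoke alongside the messages, e.g.:
// await model.invoke({ messages, tools: [getWeatherTool], toolChoice: "auto" });
console.log(getWeatherTool.function.name);
```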

Image Generation#

The GeminiImageModel class supports generating and editing images using both specialized Imagen models and multimodal Gemini models.

Basic Image Generation#

This example generates an image using an Imagen model.

Image Generation

import { GeminiImageModel } from "@aigne/gemini";

const model = new GeminiImageModel({
  apiKey: "your-api-key",
  model: "imagen-4.0-generate-001", // Default Imagen model
});

const result = await model.invoke({
  prompt: "A serene mountain landscape at sunset with golden light",
  n: 1,
});

console.log(result);

Example Response

{
  "images": [
    {
      "type": "file",
      "data": "iVBORw0KGgoAAAANSUhEUgAA...",
      "mimeType": "image/png"
    }
  ],
  "usage": { "inputTokens": 0, "outputTokens": 0 },
  "model": "imagen-4.0-generate-001"
}

Image Editing with Gemini Models#

Multimodal Gemini models can edit existing images based on a text prompt.

Image Editing

import { GeminiImageModel } from "@aigne/gemini";

const model = new GeminiImageModel({
  apiKey: "your-api-key",
  model: "gemini-2.0-flash-exp", // Gemini model for editing
});

const result = await model.invoke({
  prompt: "Add vibrant flowers in the foreground",
  image: [
    {
      type: "url",
      url: "https://example.com/original-image.png",
    },
  ],
  n: 1,
});

console.log(result.images); // Array of edited images

Image Model Parameters#

Parameters vary depending on the model family used.

Common Parameters#

| Parameter | Type | Description |
| --- | --- | --- |
| prompt | string | Required. A text description of the desired image. |
| model | string | The model to use. Defaults to imagen-4.0-generate-001. |
| n | number | The number of images to generate. Defaults to 1. |
| image | array | For Gemini models, an array of reference images for editing. |

Imagen Model Parameters#

| Parameter | Type | Description |
| --- | --- | --- |
| seed | number | A random seed for reproducible results. |
| safetyFilterLevel | string | The content moderation safety filter level. |
| personGeneration | string | Controls settings for generating images of people. |
| outputMimeType | string | The output image format (e.g., image/png). |
| negativePrompt | string | A description of what to exclude from the image. |
| imageSize | string | The dimensions of the generated image (e.g., "1024x1024"). |
| aspectRatio | string | The aspect ratio of the image (e.g., "16:9"). |
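
Several Imagen parameters can be combined in a single call. The sketch below shows one plausible input shape, with field names taken from the table above; the values are illustrative, and the object would be passed to the image model's invoke method.

```typescript
// Illustrative input combining Imagen parameters from the table above.
// Values are examples only; this object would be passed to model.invoke(...).
const imagenInput = {
  prompt: "A lighthouse on a rocky coast at dawn",
  negativePrompt: "people, text, watermarks",
  aspectRatio: "16:9",
  seed: 42,
  outputMimeType: "image/png",
  n: 2,
};
console.log(imagenInput.prompt);
```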

Gemini Model Parameters#

| Parameter | Type | Description |
| --- | --- | --- |
| temperature | number | Controls randomness (0.0 to 1.0). |
| maxOutputTokens | number | The maximum number of tokens in the response. |
| topP | number | Nucleus sampling parameter. |
| topK | number | Top-k sampling parameter. |
| safetySettings | array | Custom safety settings for content generation. |
| seed | number | A random seed for reproducible results. |
| systemInstruction | string | System-level instructions to guide the model. |

Video Generation#

The GeminiVideoModel class uses Google's Veo models to generate videos from text or images.

Basic Video Generation#

Text-to-Video

import { GeminiVideoModel } from "@aigne/gemini";

const videoModel = new GeminiVideoModel({
  apiKey: "your-api-key",
  model: "veo-3.1-generate-preview",
});

const result = await videoModel.invoke({
  prompt: "A serene lake with mountains in the background, gentle waves rippling",
  aspectRatio: "16:9",
  size: "720p",
  seconds: "8",
});

console.log(result);

Example Response

{
  "videos": [
    {
      "type": "file",
      "data": "base64-encoded-video-data...",
      "mimeType": "video/mp4",
      "filename": "timestamp.mp4"
    }
  ],
  "usage": { "inputTokens": 0, "outputTokens": 0 },
  "model": "veo-3.1-generate-preview",
  "seconds": 8
}

Advanced Video Generation#

Veo models also support image-to-video and frame interpolation.

  • Image-to-Video: Provide a prompt and a source image to animate a static picture.
  • Frame Interpolation: Provide a prompt, a starting image, and an ending lastFrame to generate a smooth transition between them.

Image-to-Video

const result = await videoModel.invoke({
  prompt: "Animate this image with gentle movement, clouds drifting slowly",
  image: {
    type: "url",
    url: "https://example.com/input-image.png",
  },
  seconds: "8",
});
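
Frame interpolation takes both a starting image and a lastFrame. The sketch below shows a plausible input shape under that assumption, with field names from the video model parameters; the URLs are placeholders, and the object would be passed to videoModel.invoke.

```typescript
// Illustrative frame-interpolation input: the model generates a transition
// from `image` to `lastFrame`. URLs are placeholders; this object would be
// passed to videoModel.invoke(...).
const interpolationInput = {
  prompt: "A smooth dissolve from day to night over the same skyline",
  image: { type: "url", url: "https://example.com/first-frame.png" },
  lastFrame: { type: "url", url: "https://example.com/last-frame.png" },
  aspectRatio: "16:9",
  seconds: "8",
};
console.log(interpolationInput.lastFrame.url);
```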

Video Model Parameters#

prompt
string
required
A text description of the desired video content.
model
string
The Veo model to use. Defaults to 'veo-3.1-generate-preview'.
aspectRatio
string
Video aspect ratio, either '16:9' (default) or '9:16'.
size
string
Video resolution, either '720p' (default) or '1080p'.
seconds
string
Video duration in seconds: '4', '6', or '8' (default).
image
object
A reference image for image-to-video or the first frame for interpolation.
lastFrame
object
The last frame for frame interpolation.
referenceImages
array
Additional reference images for video generation (Veo 3.1 only).
negativePrompt
string
A description of what to avoid in the video.

Further Reading#

For complete API details, refer to the official documentation.