Google Gemini
This guide provides instructions for configuring and using Google's Gemini models within the AIGNE Framework via the @aigne/gemini package. It covers API key setup, model selection, and the specific features available for chat, image, and video generation.
The @aigne/gemini package offers a seamless integration with Google's advanced AI capabilities, including the Gemini multimodal models and the Imagen text-to-image models, providing a consistent interface within the AIGNE ecosystem.
Features#
- Google API Integration: Provides a direct interface to Google's Gemini, Imagen, and Veo API services.
- Chat Completions: Supports all available Gemini chat models for conversational AI.
- Image Generation: Integrates with both Imagen and Gemini models for image generation and editing.
- Video Generation: Leverages Google's Veo models for text-to-video, image-to-video, and frame interpolation tasks.
- Multimodal Support: Natively handles inputs combining text, images, audio, and video.
- Function Calling: Supports Gemini's function calling capabilities to interact with external tools (see the sketch after this list).
- Streaming Responses: Enables real-time data processing for more responsive applications.
- Type-Safe: Includes comprehensive TypeScript typings for all APIs and model configurations.
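Function calling is not demonstrated elsewhere in this guide, so the following is a hypothetical sketch only: the tools field and its JSON-schema layout are assumptions about the invoke input, not confirmed @aigne/gemini API. Consult the package's TypeScript typings for the exact structure.

Function Calling Sketch (hypothetical)
import { GeminiChatModel } from "@aigne/gemini";
const model = new GeminiChatModel({
  // Assumes the GEMINI_API_KEY environment variable is set.
  model: "gemini-1.5-flash",
});
// Hypothetical tool definition: the field names below are assumptions.
const result = await model.invoke({
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Look up the current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    },
  ],
});
// If the model decides to call the tool, the result should carry the
// requested call (name and arguments) rather than plain text.
console.log(result);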
Installation#
Install the required packages using your preferred package manager.
npm install @aigne/gemini @aigne/core
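The same packages can also be installed with yarn or pnpm:

yarn add @aigne/gemini @aigne/core
pnpm add @aigne/gemini @aigne/core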
Configuration#
To authenticate requests, you must provide a Google API key. This can be done by setting an environment variable, which the framework will automatically detect.
Environment Variable
export GEMINI_API_KEY="your-google-api-key"

Alternatively, you can pass the apiKey directly in the model's constructor:
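Constructor API Key
import { GeminiChatModel } from "@aigne/gemini";
// The key passed here is used for this model instance; otherwise the
// GEMINI_API_KEY environment variable is read.
const model = new GeminiChatModel({ apiKey: "your-google-api-key" });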
Chat Completions#
The GeminiChatModel class is used for conversational interactions.
Basic Usage#
The following example demonstrates how to instantiate and invoke the GeminiChatModel.
Chat Model Usage
import { GeminiChatModel } from "@aigne/gemini";
const model = new GeminiChatModel({
// API key is optional if the GEMINI_API_KEY environment variable is set.
apiKey: "your-api-key",
// Specify the model. Defaults to 'gemini-2.0-flash'.
model: "gemini-1.5-flash",
modelOptions: {
temperature: 0.7,
},
});
const result = await model.invoke({
messages: [{ role: "user", content: "Hi there, introduce yourself" }],
});
console.log(result);

Example Response
{
"text": "Hello from Gemini! I'm Google's helpful AI assistant. How can I assist you today?",
"model": "gemini-1.5-flash",
"usage": {
"inputTokens": 12,
"outputTokens": 18
}
}

Streaming Responses#
For real-time applications, you can process response chunks as they arrive by enabling streaming.
Streaming Example
import { isAgentResponseDelta } from "@aigne/core";
import { GeminiChatModel } from "@aigne/gemini";
const model = new GeminiChatModel({
apiKey: "your-api-key",
model: "gemini-1.5-flash",
});
const stream = await model.invoke(
{
messages: [{ role: "user", content: "Hi there, introduce yourself" }],
},
{ streaming: true }
);
let fullText = "";
const json = {};
for await (const chunk of stream) {
if (isAgentResponseDelta(chunk)) {
const text = chunk.delta.text?.text;
if (text) fullText += text;
if (chunk.delta.json) Object.assign(json, chunk.delta.json);
}
}
console.log(fullText);
Chat Model Parameters#
The constructor accepts the following options, as shown in the examples above.

| Parameter | Type | Description |
|---|---|---|
| `apiKey` | `string` | Your Google API key. Optional if the `GEMINI_API_KEY` environment variable is set. |
| `model` | `string` | The model to use. Defaults to `gemini-2.0-flash`. |
| `modelOptions` | `object` | Generation options such as `temperature`. |
Image Generation#
The GeminiImageModel class supports generating and editing images using both specialized Imagen models and multimodal Gemini models.
Basic Image Generation#
This example generates an image using an Imagen model.
Image Generation
import { GeminiImageModel } from "@aigne/gemini";
const model = new GeminiImageModel({
apiKey: "your-api-key",
model: "imagen-4.0-generate-001", // Default Imagen model
});
const result = await model.invoke({
prompt: "A serene mountain landscape at sunset with golden light",
n: 1,
});
console.log(result);

Example Response
{
"images": [
{
"type": "file",
"data": "iVBORw0KGgoAAAANSUhEUgAA...",
"mimeType": "image/png"
}
],
"usage": { "inputTokens": 0, "outputTokens": 0 },
"model": "imagen-4.0-generate-001"
}
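The data field shown above is base64-encoded. A minimal sketch for writing a generated image to disk with Node's fs module, continuing from the generation example (the output.png filename is arbitrary):

Save Image to Disk
import fs from "node:fs";
// result comes from the model.invoke call above; images[0].data holds
// the base64-encoded image payload.
const [image] = result.images;
fs.writeFileSync("output.png", Buffer.from(image.data, "base64"));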
Image Editing with Gemini Models#
Multimodal Gemini models can edit existing images based on a text prompt.
Image Editing
import { GeminiImageModel } from "@aigne/gemini";
const model = new GeminiImageModel({
apiKey: "your-api-key",
model: "gemini-2.0-flash-exp", // Gemini model for editing
});
const result = await model.invoke({
prompt: "Add vibrant flowers in the foreground",
image: [
{
type: "url",
url: "https://example.com/original-image.png",
},
],
n: 1,
});
console.log(result.images); // Array of edited images

Image Model Parameters#
Parameters vary depending on the model family used.
Common Parameters#
| Parameter | Type | Description |
|---|---|---|
| `prompt` | `string` | Required. A text description of the desired image. |
| `model` | `string` | The model to use. Defaults to `imagen-4.0-generate-001`. |
| `n` | `number` | The number of images to generate. Defaults to `1`. |
| `image` | `array` | For Gemini models, an array of reference images for editing. |
Imagen Model Parameters#
| Parameter | Type | Description |
|---|---|---|
| `seed` | `number` | A random seed for reproducible results. |
| `safetyFilterLevel` | `string` | The content moderation safety filter level. |
| `personGeneration` | `string` | Controls settings for generating images of people. |
| `outputMimeType` | `string` | The output image format (e.g., `image/png`). |
| `negativePrompt` | `string` | A description of what to exclude from the image. |
| `size` | `string` | The dimensions of the generated image (e.g., `"1024x1024"`). |
| `aspectRatio` | `string` | The aspect ratio of the image (e.g., `"16:9"`). |
Gemini Model Parameters#
| Parameter | Type | Description |
|---|---|---|
| `temperature` | `number` | Controls randomness (0.0 to 1.0). |
| `maxOutputTokens` | `number` | The maximum number of tokens in the response. |
| `topP` | `number` | Nucleus sampling parameter. |
| `topK` | `number` | Top-k sampling parameter. |
| `safetySettings` | `object` | Custom safety settings for content generation. |
| `seed` | `number` | A random seed for reproducible results. |
| `systemInstruction` | `string` | System-level instructions to guide the model. |
Video Generation#
The GeminiVideoModel class uses Google's Veo models to generate videos from text or images.
Basic Video Generation#
Text-to-Video
import { GeminiVideoModel } from "@aigne/gemini";
const videoModel = new GeminiVideoModel({
apiKey: "your-api-key",
model: "veo-3.1-generate-preview",
});
const result = await videoModel.invoke({
prompt: "A serene lake with mountains in the background, gentle waves rippling",
aspectRatio: "16:9",
size: "720p",
seconds: "8",
});
console.log(result);

Example Response
{
"videos": [
{
"type": "file",
"data": "base64-encoded-video-data...",
"mimeType": "video/mp4",
"filename": "timestamp.mp4"
}
],
"usage": { "inputTokens": 0, "outputTokens": 0 },
"model": "veo-3.1-generate-preview",
"seconds": 8
}

Advanced Video Generation#
Veo models also support image-to-video and frame interpolation.
- Image-to-Video: Provide a `prompt` and a source `image` to animate a static picture.
- Frame Interpolation: Provide a `prompt`, a starting `image`, and an ending `lastFrame` to generate a smooth transition between them (see the sketch after the Image-to-Video example).
Image-to-Video
const result = await videoModel.invoke({
prompt: "Animate this image with gentle movement, clouds drifting slowly",
image: {
type: "url",
url: "https://example.com/input-image.png",
},
seconds: "8",
});
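Frame interpolation follows the same pattern. The sketch below is based on the parameters described above; the two image URLs are placeholders:

Frame Interpolation
// Generates a smooth transition from a starting image to an ending lastFrame.
const interpolated = await videoModel.invoke({
  prompt: "A smooth transition from day to night over the same skyline",
  image: {
    type: "url",
    url: "https://example.com/first-frame.png",
  },
  lastFrame: {
    type: "url",
    url: "https://example.com/last-frame.png",
  },
  seconds: "8",
});
console.log(interpolated.videos);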
Video Model Parameters#
The invoke method accepts the following options, as shown in the examples above.

| Parameter | Type | Description |
|---|---|---|
| `prompt` | `string` | Required. A text description of the desired video. |
| `model` | `string` | The Veo model to use (e.g., `veo-3.1-generate-preview`). |
| `image` | `object` | A source image for image-to-video generation. |
| `lastFrame` | `object` | An ending frame for frame interpolation. |
| `aspectRatio` | `string` | The aspect ratio of the video (e.g., `"16:9"`). |
| `size` | `string` | The output resolution (e.g., `"720p"`). |
| `seconds` | `string` | The duration of the video in seconds (e.g., `"8"`). |

Further Reading#
For complete API details, refer to the official documentation.