
Welcome to our comprehensive guide on building with generative AI! Before we get our hands dirty, let's understand the two core concepts we'll be working with: Large Language Models (LLMs) and APIs.

An LLM is a powerful AI model trained on a massive amount of text and code. This training allows it to understand, generate, and process human language with remarkable fluency. Think of it as a super-intelligent digital brain for text.

An API (Application Programming Interface) is a set of rules that allows different applications to talk to each other. In our case, the API is the bridge that lets your code send a request (such as a prompt) to an LLM and receive a response (such as generated text). Using an API means you don't need to host or manage the massive model on your own computer.
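That request/response bridge can be sketched as plain JSON sent over HTTP. The snippet below is a minimal sketch, assuming the public `generateContent` REST endpoint and the model name `gemini-1.5-flash` (check the current documentation for exact endpoint versions and model names). It only builds the request body; actually sending it requires an API key appended as a query parameter.

```python
import json

# Assumed endpoint shape; verify the API version and model name in the docs.
API_URL = ("https://generativelanguage.googleapis.com"
           "/v1beta/models/gemini-1.5-flash:generateContent")

def build_request(prompt: str) -> dict:
    """Build the JSON body for a simple text prompt."""
    return {"contents": [{"parts": [{"text": prompt}]}]}

body = build_request("Explain what an API is in one sentence.")
print(json.dumps(body, indent=2))
```

POSTing that body to `API_URL` (with `?key=YOUR_API_KEY`) is all it takes for your code to "talk" to the model; the heavy lifting happens on Google's servers, not your machine.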

How to Build with Generative AI

Part 1: Getting Started with Large Language Models

Google has launched Gemma, a family of lightweight, open AI models designed to compete with offerings like Meta's Llama and other readily available models. The move aims to democratize AI development and foster innovation across the broader developer community.
This tutorial concludes by showing how to build specialized AI assistants, or "Gems," using prompt engineering with a general-purpose model like Gemini. A Gem is a custom assistant that excels at a specific task, exhibiting domain expertise, specialized behavior, and a consistent persona. This is achieved through a detailed system prompt that acts as the AI's "DNA," defining its persona, goals, constraints, and output format: the AI's role (e.g., financial analyst), its function, its limitations (e.g., never giving investment advice), and the desired response structure. A "Financial Analyst Gem" example demonstrates how a well-crafted system prompt turns a general chatbot into a focused, specialized assistant. Mastering prompt engineering unlocks the Gemini API's flexibility, letting you create a wide range of AI assistants.
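The "DNA" idea above can be made concrete by assembling a system prompt from its four ingredients: role, function, constraints, and response format. This is a minimal sketch; the `build_gem_prompt` helper and the exact wording are hypothetical, and the resulting string would typically be supplied as the model's system instruction (for example, the `system_instruction` parameter in the Python SDK) rather than prepended to every user message.

```python
def build_gem_prompt(role, function, constraints, response_format):
    """Assemble a system prompt (the Gem's 'DNA') from its core parts."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"You are {role}. {function}\n"
        f"Constraints:\n{constraint_lines}\n"
        f"Format every answer as: {response_format}"
    )

# Illustrative "Financial Analyst Gem" system prompt.
financial_analyst = build_gem_prompt(
    role="a meticulous financial analyst",
    function="You summarize company filings and explain financial metrics in plain language.",
    constraints=[
        "Never give personalized investment advice.",
        "Always state which figures or filings you are summarizing.",
    ],
    response_format="a short summary followed by a bulleted list of key metrics",
)
print(financial_analyst)
```

The payoff of this structure is consistency: every conversation with the Gem starts from the same persona and limits, so responses stay in character without re-explaining the task in each user prompt.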
Gemini, a multimodal model, processes text, images, videos, and audio. Multimodal prompts combine different data types (e.g., an image and a question) in a single request. The API accepts an array of "parts," each a text string or inline data (like a base64-encoded image). A Python example demonstrates sending an image and text prompt to the Gemini API to analyze the image. Best practices include specific instructions, placing images before text in the `contents` array, and using the correct MIME type for image data.
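The "array of parts" structure described above can be sketched as follows. This is a minimal illustration with placeholder image bytes, not a real image; note that the REST API's JSON may expect camelCase field names (`inlineData`, `mimeType`), while the snake_case names below follow the Python SDK convention, so verify against the current docs. Per the best practice above, the image part is placed before the text part.

```python
import base64

def build_multimodal_request(image_bytes: bytes, mime_type: str, question: str) -> dict:
    """Build a request body pairing an image with a question, image first."""
    return {
        "contents": [{
            "parts": [
                # Image part first, as recommended for image+text prompts.
                {"inline_data": {
                    "mime_type": mime_type,          # e.g. "image/png" or "image/jpeg"
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
                # Text part with the question about the image.
                {"text": question},
            ]
        }]
    }

# Placeholder bytes stand in for a real image file read from disk.
body = build_multimodal_request(b"\x89PNG...", "image/png",
                                "What is shown in this image?")
```

In practice you would read `image_bytes` with `open("photo.png", "rb").read()` and POST the body to the same `generateContent` endpoint used for text-only prompts.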

Understanding Gemini Models

A quickstart on making your first request to the Gemini API to generate text.
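Once your first request succeeds, the generated text comes back nested inside the response JSON. The sketch below assumes the `candidates[0].content.parts[0].text` response shape used by the `generateContent` endpoint; the `sample` dict is an illustrative stand-in, not real API output.

```python
def extract_text(response: dict) -> str:
    """Pull the generated text out of a generateContent response."""
    return response["candidates"][0]["content"]["parts"][0]["text"]

# Illustrative response shape (not real API output).
sample = {"candidates": [{"content": {"parts": [{"text": "Hello from Gemini!"}]}}]}
print(extract_text(sample))  # -> Hello from Gemini!
```

Real responses include additional fields (such as safety ratings and finish reasons), so production code should check that `candidates` is non-empty before indexing into it.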

The world of AI development is moving at an incredible pace, and Google's Gemini API is continuously evolving to provide developers with more powerful, flexible, and efficient tools.