
Usage

Features and usage examples for @built-in-ai/web-llm with AI SDK v6

Basic Text Generation

Streaming Text

import { streamText } from "ai";
import { webLLM } from "@built-in-ai/web-llm";

const result = streamText({
  model: webLLM("Qwen3-0.6B-q0f16-MLC"),
  prompt: "Invent a new holiday and describe its traditions.",
});

for await (const textPart of result.textStream) {
  console.log(textPart);
}

Non-streaming Text

import { generateText } from "ai";
import { webLLM } from "@built-in-ai/web-llm";

const result = await generateText({
  model: webLLM("Qwen3-0.6B-q0f16-MLC"),
  prompt: "Invent a new holiday and describe its traditions.",
});

console.log(result.text);

Download Progress Tracking

The first time a WebLLM model is used, it has to be downloaded to the browser. Track the download progress to improve the user experience:

import { streamText } from "ai";
import { webLLM } from "@built-in-ai/web-llm";

const model = webLLM("Qwen3-0.6B-q0f16-MLC");
const availability = await model.availability();

if (availability === "unavailable") {
  // WebLLM requires WebGPU, which this browser doesn't support
  throw new Error("Browser doesn't support WebLLM");
}

if (availability === "downloadable") {
  await model.createSessionWithProgress((progress) => {
    console.log(`Download: ${progress.text}`);
  });
}

// Model is ready
const result = streamText({
  model,
  messages: [{ role: "user", content: "Hello!" }],
});

Tool Calling

For best tool-calling results, use a reasoning model such as Qwen3.

The webLLM model supports tool calling with multi-step execution:

import { streamText, tool, stepCountIs } from "ai";
import { webLLM } from "@built-in-ai/web-llm";
import { z } from "zod";

const result = streamText({
  model: webLLM("Qwen3-0.6B-q0f16-MLC"),
  messages: [{ role: "user", content: "What's the weather in San Francisco?" }],
  tools: {
    weather: tool({
      description: "Get the weather in a location",
      inputSchema: z.object({
        location: z.string().describe("The location to get the weather for"),
      }),
      execute: async ({ location }) => ({
        location,
        temperature: 72 + Math.floor(Math.random() * 21) - 10,
      }),
    }),
  },
  stopWhen: stepCountIs(5), // allow up to 5 steps (tool calls + final answer)
});

// Consume the stream so the tool loop runs to completion
for await (const textPart of result.textStream) {
  console.log(textPart);
}

Tool execution approval is also supported: set needsApproval on a tool to pause before it executes.
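
Below is a minimal sketch of the tool-definition side, assuming needsApproval takes a plain boolean (per AI SDK v6 it can also be an async predicate over the tool input); how the approval request is surfaced to the user is left to your UI layer:

import { streamText, tool } from "ai";
import { webLLM } from "@built-in-ai/web-llm";
import { z } from "zod";

const result = streamText({
  model: webLLM("Qwen3-0.6B-q0f16-MLC"),
  messages: [{ role: "user", content: "What's the weather in Berlin?" }],
  tools: {
    weather: tool({
      description: "Get the weather in a location",
      inputSchema: z.object({ location: z.string() }),
      // Ask the user before this tool executes; an async predicate
      // over the tool input is also possible here.
      needsApproval: true,
      execute: async ({ location }) => ({ location, temperature: 18 }),
    }),
  },
});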

Tool Calling with Structured Output

import { Output, ToolLoopAgent, tool } from "ai";
import { webLLM } from "@built-in-ai/web-llm";
import { z } from "zod";

const agent = new ToolLoopAgent({
  model: webLLM("Qwen3-0.6B-q0f16-MLC"),
  tools: {
    weather: tool({
      description: "Get the weather in a location",
      inputSchema: z.object({ city: z.string() }),
      execute: async ({ city }) => {
        // ...
      },
    }),
  },
  output: Output.object({
    schema: z.object({
      summary: z.string(),
      temperature: z.number(),
      recommendation: z.string(),
    }),
  }),
});

const { output } = await agent.generate({
  prompt: "What is the weather in San Francisco and what should I wear?",
});

Structured Output

Generate structured JSON output with schema validation:

Using generateText

import { generateText, Output } from "ai";
import { webLLM } from "@built-in-ai/web-llm";
import { z } from "zod";

const { output } = await generateText({
  model: webLLM("Qwen3-0.6B-q0f16-MLC"),
  output: Output.object({
    schema: z.object({
      recipe: z.object({
        name: z.string(),
        ingredients: z.array(
          z.object({ name: z.string(), amount: z.string() }),
        ),
        steps: z.array(z.string()),
      }),
    }),
  }),
  prompt: "Generate a lasagna recipe.",
});

console.log(output.recipe);

Using streamText

import { streamText, Output } from "ai";
import { webLLM } from "@built-in-ai/web-llm";
import { z } from "zod";

const { partialOutputStream } = streamText({
  model: webLLM("Qwen3-0.6B-q0f16-MLC"),
  output: Output.object({
    schema: z.object({
      recipe: z.object({
        name: z.string(),
        ingredients: z.array(
          z.object({ name: z.string(), amount: z.string() }),
        ),
        steps: z.array(z.string()),
      }),
    }),
  }),
  prompt: "Generate a lasagna recipe.",
});

// Log partial objects as the JSON streams in
for await (const partialObject of partialOutputStream) {
  console.log(partialObject);
}

Web Worker Usage

For better performance, run models off the main thread:

1. Create worker.ts

import { WebWorkerMLCEngineHandler } from "@built-in-ai/web-llm";

const handler = new WebWorkerMLCEngineHandler();
self.onmessage = (msg: MessageEvent) => {
  handler.onmessage(msg);
};

2. Use the worker

import { streamText } from "ai";
import { webLLM } from "@built-in-ai/web-llm";

const model = webLLM("Qwen3-0.6B-q0f16-MLC", {
  worker: new Worker(new URL("./worker.ts", import.meta.url), {
    type: "module",
  }),
});

const result = streamText({
  model,
  messages: [{ role: "user", content: "Hello!" }],
});

for await (const chunk of result.textStream) {
  console.log(chunk);
}
