Add AI Brains to Any App Without API Keys or Sign-Ups
How we replaced OpenAI, Anthropic, and other APIs with one local agent that just works: no token counting, no billing, no headaches.
The problem: LLM integration is boring
Imagine you're building an application: you need text generation, data analysis, or just a "smart" backend. The first thing that comes to mind is grabbing an OpenAI API key and writing a wrapper.
Well, here's what actually happens in real life:
- Key management – where to store them, how to rotate them.
- Billing and limits – someone in the dev environment forgets about token limits, and the bill spikes.
- Proxies and blocks – in some countries, APIs aren't accessible without workarounds.
- Prompt formatting – each provider has their own quirks.
- Error handling – rate limits, timeouts, retry logic.
- Tooling – which tools to add, how to call them, how to combine them.
We ran into this when automating large-scale content generation; we needed to process thousands of terms, each requiring web search, HTML generation, and strict formatting rules. Using commercial APIs would mean either burning through the budget or building an abstraction layer thicker than the business logic itself. The task wasn't expensive or important enough to drag in a full agent SDK, but we wanted it to work 24/7.
The solution: use an open-source agent as a black box
We based our approach on OpenCode, a CLI agent that runs locally and can handle small, straightforward tasks. No registration or keys, only a binary that takes a text task and returns results.
The core approach: Instead of writing an HTTP client for an API, we run opencode as a subprocess, give it instructions via stdin/arguments, and read the result from a file or stdout. The agent decides which model to use, performs web searches if needed, and formats the output itself.
Key advantage: You don't pay for tokens. For tasks like ours (generating content following specific rules), this is the difference between "a few dollars per run" and "practically free." As long as opencode allows working with free models, it's great. If opencode ever restricts this, you can switch to free models from OpenRouter (which would require an OpenRouter key).
How it works under the hood
The architecture is dead simple:
Your app → subprocess(opencode run "...") → result file
Here's the minimal Python wrapper class we use in production:

```python
import subprocess
import uuid
import os


class OpencodeRunner:
    def __init__(self, prompt_file='prompt.md'):
        self.prompt_file = prompt_file

    def run_generation(self, keyword):
        output_path = f"/tmp/result_{uuid.uuid4()}.html"

        # Build the command: term + prompt file content + instruction to save to a file
        cmd = f'opencode run "Term: {keyword}. $(cat {self.prompt_file}). Save to {output_path}"'

        # Important: pass the config so that the agent doesn't ask for confirmations
        env = os.environ.copy()
        env['OPENCODE_CONFIG'] = '/app/opencode-config.json'

        process = subprocess.Popen(
            cmd,
            shell=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            env=env
        )

        # Stream output in real time (useful for debugging)
        for line in process.stdout:
            print(line, end='', flush=True)
        process.wait()

        # Read the result
        with open(output_path, 'r') as f:
            content = f.read()
        os.remove(output_path)
        return content
```

The opencode-config.json configuration is minimal:
```json
{
  "$schema": "https://opencode.ai/config.json",
  "permission": "allow"
}
```

The `permission: allow` setting is critically important because it tells the agent not to wait for user confirmation on every action. Without it, everything hangs in headless mode.
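With the config in place, each task boils down to one shell command. As a sketch, the command string the wrapper assembles can be factored out into a pure helper (build_opencode_cmd is our illustrative name, not part of opencode):

```python
def build_opencode_cmd(keyword, prompt_file, output_path):
    # $(cat ...) is expanded by the shell at run time, inlining the prompt file
    return f'opencode run "Term: {keyword}. $(cat {prompt_file}). Save to {output_path}"'


# Example: the exact string that would be handed to the shell
cmd = build_opencode_cmd("statistical significance", "prompt.md", "/tmp/out.html")
print(cmd)
```

Keeping command assembly in a separate function also makes it trivial to unit-test without spawning the agent.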
Docker deployment: just drop in the binary
To add AI brains to any service, just copy one file into the image:
```dockerfile
FROM python:3.11-slim

# Copy agent binary
COPY bin/opencode /usr/local/bin/opencode
RUN chmod +x /usr/local/bin/opencode

# Copy config with auto-allow
COPY opencode-config.json /app/opencode-config.json
ENV OPENCODE_CONFIG=/app/opencode-config.json

# Your application
COPY . /app
WORKDIR /app
CMD ["python", "main.py"]
```

❗️ Important note
The opencode binary is too large for GitHub, so it's better not to keep it in the repo. Before building, copy it from the system where it's installed:
```bash
mkdir -p bin && cp $(which opencode) bin/opencode
```

You can download the latest opencode during the Docker build, but if you're building on a machine where opencode is already installed, copying it locally is much faster and simpler.
Production patterns: what we learned in practice
1. Rate limiting and graceful degradation
Throughout our agent's operation, we haven't hit any limits. However, we noticed something interesting: at some point, opencode, with its free models, starts working very slowly, sometimes at a rate of 1 token per second or even less. Apparently, this is how they implement a queue concept: instead of hitting the agent with limits, they just put it in a batch queue. In our case, this is actually better; let it work slowly as long as it works 24/7.
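We haven't needed a hard guard in production, but if you want to cap how long a throttled run can stall a worker, a retry-with-timeout sketch (run_with_timeout is our hypothetical helper; the demo uses a stand-in command instead of a real opencode run):

```python
import subprocess


def run_with_timeout(cmd, timeout_s=1800, retries=2):
    # Retry a shell command, treating a run that exceeds timeout_s as a soft
    # failure; a slow batch queue often resolves itself on a later attempt.
    for attempt in range(retries + 1):
        try:
            proc = subprocess.run(
                cmd, shell=True, capture_output=True, text=True, timeout=timeout_s
            )
            if proc.returncode == 0:
                return proc.stdout
        except subprocess.TimeoutExpired:
            pass  # agent likely throttled to ~1 token/s; try again
    raise RuntimeError(f"command failed after {retries + 1} attempts")


# Demo with a stand-in command instead of a real opencode invocation
out = run_with_timeout("echo done", timeout_s=10)
print(out)
```

In our setup we deliberately skip the timeout and let slow runs finish, since nothing downstream is waiting on them.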
2. Streaming output
When the agent performs web searches or processes complex queries, it can work for minutes. There's no point waiting for full completion; stream stdout to logs. This helps you understand what's happening and debug hangs.
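The wrapper class above already streams, but adding timestamps makes hangs stand out as visible gaps in the logs. A sketch (stream_command is our name; the demo uses a stand-in command that emits two lines, as opencode does during a search):

```python
import subprocess
import time


def stream_command(cmd):
    # Stream a subprocess's stdout line by line, prefixing each line with a
    # timestamp so a stall shows up as a gap rather than a silent wait.
    proc = subprocess.Popen(
        cmd, shell=True, stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT, text=True
    )
    lines = []
    for line in proc.stdout:
        print(f"[{time.strftime('%H:%M:%S')}] {line}", end='', flush=True)
        lines.append(line)
    proc.wait()
    return lines


# Stand-in for a long-running opencode run
captured = stream_command("echo searching; echo writing")
```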
3. Prompt files instead of inline strings
Don't stuff prompts into code. Store them in .md files, use $(cat file.md) when calling. This lets you:
- Version prompts through Git
- Edit without rebuilding the container
- Use multi-line instructions with formatting
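If shell quoting inside $(cat ...) ever bites you, an alternative is to read the prompt file in Python and hand the agent a pre-assembled string. A sketch (compose_prompt is our hypothetical helper; the demo writes a throwaway prompt file):

```python
from pathlib import Path


def compose_prompt(keyword, prompt_path, output_path):
    # Read the prompt file in Python rather than via $(cat ...) shell
    # expansion, so quoting inside the markdown can't break the command.
    instructions = Path(prompt_path).read_text().strip()
    return f"Term: {keyword}. {instructions}. Save to {output_path}"


# Demo with a throwaway prompt file
Path("/tmp/term-prompt.md").write_text("Write a concise HTML description")
prompt = compose_prompt("p-value", "/tmp/term-prompt.md", "/tmp/p-value.html")
print(prompt)
```

The file still lives in Git and can be edited without rebuilding the container; only the inlining mechanism changes.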
When this approach works perfectly
This architecture isn't a silver bullet, but it's ideal for:
✅ Batch processing with fuzzy logic
- SEO content generation from templates
- Data classification and tagging
- Extracting structured information from unstructured sources
✅ Simple autonomous agents
- Systems that run 24/7 unattended, where the risk is low and the output isn't business-critical
- Processing task queues where there's no rush
- Services where latency isn't critical (not real-time chat)
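The batch-queue pattern above can be sketched in a few lines (process_batch is a hypothetical helper; in real use the handler would be the wrapper's run_generation, here replaced by a stand-in):

```python
import queue


def process_batch(keywords, handler):
    # Drain a work queue sequentially; a slow agent only stretches the wall
    # clock, which is fine for 24/7 batch jobs with no latency requirement.
    q = queue.Queue()
    for kw in keywords:
        q.put(kw)
    results = {}
    while not q.empty():
        kw = q.get()
        results[kw] = handler(kw)  # e.g. OpencodeRunner().run_generation(kw)
    return results


# Stand-in handler instead of a real agent call
batch = process_batch(["alpha", "beta"], lambda kw: f"<html>{kw}</html>")
print(batch)
```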
Example agent output
Below is the agent's reasoning for a term generation task by keyword. Notice how the agent makes web requests and decides on its own how many queries to make, where to send them, and what to search for.
```
Found 1 tasks. Processing batch...
Running command: opencode run "Term: statistical significance. $(cat docs/prompts/term-prompt-ext.md). Save the result to a file /tmp/term_9617574d-484d-4358-910a-c391d26624fb.html"

I'll help you create a factual description of "statistical significance" following your detailed instructions. Let me first search for current information about this term.

Search Web search: statistical significance definition context usage challenges 2025 2026

Now let me search for more specific information about common challenges and related topics for statistical significance.

Search Web search: "statistical significance" common challenges p-value limitations misuse research

Now let me search for related topics to complete the description.

Search Web search: statistical significance related topics hypothesis testing p-value confidence interval effect size

Write tmp/term_9617574d-484d-4358-910a-c391d26624fb.html

Based on the research I've conducted, I'll now create the factual description of "Statistical Significance" following your detailed instructions. Let me write the HTML file:

The HTML file has been created successfully at `/tmp/term_9617574d-484d-4358-910a-c391d26624fb.html`. The description follows all your specified formatting requirements, uses American English, stays within the 1500-character limit, and provides factual information about statistical significance based on reputable sources.
```

Bottom line: brains for your app in 5 minutes
We went from "we should integrate OpenAI" to a production system that generates hundreds of pages of content daily, without a single API key in the code, without sign-ups, and without hitting limits. The secret is to stop thinking about LLMs as external services with HTTP APIs and start thinking about them as local command-line tools; in other words, apply an agentic mindset.
Minimum starter kit:
- Download the opencode binary.
- Create opencode-config.json with "permission": "allow".
- Write a subprocess wrapper (the code above works).
- Add the binary to your Dockerfile.
- Profit.
No keys or sign-ups: only working AI in your application.
