───✦───✦───✦───✦───✦───✦───✦───✦───

Prelude

I was on LinkedIn, half-distracted, when I stumbled onto a post by Arpit Bhayani about Firecracker.

At first, it was just another tech post in the feed, but something about it stuck. Firecracker wasn’t a shiny new framework or a library with yet another abstraction. It was raw infrastructure — the kind of thing that quietly powers the world but rarely gets talked about outside of deep engineering circles.

And that made me curious.

We all use serverless platforms all the time — AWS Lambda, Vercel, Cloudflare Workers — without stopping to think about what’s really happening. You write a function, hit deploy, and magically it’s running somewhere in the cloud. But that “somewhere” intrigued me. What’s actually happening in that invisible layer where a request triggers a fresh environment, your code spins up in milliseconds, and then disappears without a trace?

How do these systems manage to cold-start fast enough to feel instant, stay secure when thousands of strangers’ code runs side by side, and still scale up or down as if by sleight of hand?

I didn’t want the marketing answer.

So I gave myself a small challenge: take Firecracker, strip away the cloud-scale buzzwords, and try to build something around it with my own hands. No production deadlines, no polish — just an excuse to learn.

That’s how Ignis 🔥 was born.

About Firecracker

Firecracker is an open-source virtualization technology developed by Amazon Web Services (AWS). It is designed to provide lightweight, secure, and efficient virtual machines tailored for containerized and serverless workloads. It powers AWS Lambda and AWS Fargate, as well as multi-tenant serverless platforms such as Koyeb and Fly.io.

Let's get started

So, like any other project, I went to Firecracker's GitHub to learn about it. Firecracker is written in Rust. Now, I know Rust, but I wouldn't use it for writing backends. Why? Because I don't wanna fight the borrow checker when I'm just trying to get something to work.

Lucky for me, the Firecracker folks maintain Go bindings, along with a pretty well-maintained official reference tool built on top of them: firectl.

A couple of warnings: Firecracker only runs on Linux (no WSL support, sorry) for now. It also relies on the KVM kernel module and needs read/write access to /dev/kvm.

So let's get started: I followed the quickstart guide and, after a little bit of 💻💻, had Firecracker up and running.

Project Structure

The project, named Ignis (inspired by the Latin word for fire, akin to Firecracker), supports multiple programming languages through dedicated runtime environments. Currently, it accommodates Python and Go, with an extensible architecture for adding more languages.

The complete source code is available on GitHub.

Here is a schema of the global architecture:

Project Schema

The primary API, developed in Golang, manages the core business logic. Code submissions are queued via NATS for asynchronous processing, with execution handled by virtual machines.

Implementation Details

The system is divided into three components:

  • Ignis-Agent
  • Ignis-VMM
  • Ignis-Backend

Ignis-Agent

The Ignis-Agent runs within the microVM as an HTTP API built with Echo in Golang. It compiles and executes code for supported languages.
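As a quick mental model, here is a rough sketch of the agent's shape. The port, routes, and handler bodies are assumptions for illustration, not the exact Ignis code:

package main

import (
    "net/http"

    "github.com/labstack/echo/v4"
)

func main() {
    e := echo.New()

    // Health endpoint the VMM polls before treating the microVM as "warm"
    e.GET("/health", func(c echo.Context) error {
        return c.String(http.StatusOK, "ok")
    })

    // Run endpoint: receives the code payload and hands it to the language dispatch
    e.POST("/run", func(c echo.Context) error {
        // The real agent parses the request and calls the language handler here
        return c.JSON(http.StatusOK, map[string]string{"message": "stub"})
    })

    e.Logger.Fatal(e.Start(":8080"))
}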

Multi-Language Architecture in Ignis-Agent

A key feature of Ignis is its language-agnostic design, which employs distinct root filesystem (rootfs) images for each supported language.

A central language dispatch mechanism routes requests to the appropriate handler based on the specified language:

// Call language handler
switch toID[req.Language] {
case Python:
    return pythonHandler(c, req)
case Golang:
    return golangHandler(c, req)
default:
    return echo.NewHTTPError(http.StatusBadRequest, "Invalid language")
}

This dispatch ensures efficient handling of diverse languages within a unified agent framework.
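For context, toID maps the language string from the request to an internal identifier. A minimal sketch of what that could look like (the exact values in Ignis may differ):

// Hypothetical language identifiers and lookup map behind the dispatch
type languageID int

const (
    Unknown languageID = iota
    Python
    Golang
)

var toID = map[string]languageID{
    "python": Python,
    "go":     Golang,
}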

The execution engine captures detailed metrics, such as duration and memory usage:

func execCmd(c echo.Context, program string, arg ...string) error {
    var execStdOut, execStdErr bytes.Buffer
    cmd := exec.Command(program, arg...)
    cmd.Stdout = &execStdOut
    cmd.Stderr = &execStdErr

    start := time.Now()
    err := cmd.Run()
    elapsed := time.Since(start)

    // ProcessState is nil if the command never started (e.g. the binary is
    // missing), so guard before reading peak memory usage from rusage.
    var maxRSS int64
    if cmd.ProcessState != nil {
        maxRSS = cmd.ProcessState.SysUsage().(*syscall.Rusage).Maxrss
    }

    if err != nil {
        return c.JSON(http.StatusBadRequest, runCRes{
            Message:      "Failed to run",
            Error:        err.Error(),
            Stdout:       execStdOut.String(),
            Stderr:       execStdErr.String(),
            ExecDuration: elapsed.Milliseconds(),
            MemUsage:     maxRSS,
        })
    }
    return c.JSON(http.StatusOK, runCRes{
        Message:      "Success",
        Stdout:       execStdOut.String(),
        Stderr:       execStdErr.String(),
        ExecDuration: elapsed.Milliseconds(),
        MemUsage:     maxRSS,
    })
}

While effective, this approach boils down to spawning external commands, which isn't the most elegant option and brings its own security considerations.

To prepare the VM image, the agent is preinstalled, and an OpenRC service ensures it starts on boot.

The rootfs is constructed using techniques from the Firecracker documentation, involving an ext4 image mounted in a Docker container with Alpine.

Ignis-VMM

This component bridges the API and microVMs, managing job reception, VM lifecycles for multiple languages, agent communication, and status updates.

Jobs are received asynchronously via NATS. To minimize latency—given boot times of 1-2 seconds—a pool of pre-warmed VMs is maintained per language.
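As a rough sketch of the receiving side (the subject name and connection setup are assumptions, not the exact Ignis code), the worker subscribes to the queue and hands each decoded job to the pool-aware runner:

import (
    "context"
    "encoding/json"

    "github.com/nats-io/nats.go"
    log "github.com/sirupsen/logrus"
)

func subscribeJobs(ctx context.Context, nc *nats.Conn, poolManager *LanguagePoolManager) error {
    // "jobs.submit" is an illustrative subject name
    _, err := nc.Subscribe("jobs.submit", func(msg *nats.Msg) {
        var job benchJob
        if err := json.Unmarshal(msg.Data, &job); err != nil {
            log.WithError(err).Error("Failed to decode job payload")
            return
        }
        // Each job runs in its own goroutine against the language pools
        go job.run(ctx, poolManager)
    })
    return err
}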

Multi-Language Architecture in Ignis-VMM

Dynamic Language Discovery

At startup, the system scans the agent directory to identify available languages automatically:

func discoverAvailableLanguages() ([]string, error) {
    files, err := ioutil.ReadDir("agent")
    if err != nil {
        return nil, fmt.Errorf("failed to read agent directory: %w", err)
    }
    var languages []string
    for _, file := range files {
        if file.IsDir() {
            continue
        }
        // Look for rootfs-<language>.ext4 files
        if strings.HasPrefix(file.Name(), "rootfs-") && strings.HasSuffix(file.Name(), ".ext4") {
            // Extract language from filename: rootfs-python.ext4 -> python
            language := strings.TrimPrefix(file.Name(), "rootfs-")
            language = strings.TrimSuffix(language, ".ext4")
            languages = append(languages, language)
        }
    }
    if len(languages) == 0 {
        return nil, fmt.Errorf("no rootfs images found in agent directory")
    }
    return languages, nil
}

Language Pool Manager

Ignis utilizes a multi-language pool management system rather than a single VM pool:

type LanguagePoolManager struct {
    pools map[string]chan runningFirecracker
    mutex sync.RWMutex
}
 
func NewLanguagePoolManager() *LanguagePoolManager {
    return &LanguagePoolManager{
        pools: make(map[string]chan runningFirecracker),
    }
}
 
func (lpm *LanguagePoolManager) GetPool(language string) (chan runningFirecracker, error) {
    lpm.mutex.RLock()
    pool, exists := lpm.pools[language]
    lpm.mutex.RUnlock()
    if !exists {
        return nil, fmt.Errorf("no pool found for language: %s", language)
    }
    return pool, nil
}
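GetPool's counterpart, AddPool, is called at startup and essentially registers a buffered channel sized to the pool capacity. A rough sketch (the actual implementation may differ):

func (lpm *LanguagePoolManager) AddPool(language string, size int) {
    lpm.mutex.Lock()
    defer lpm.mutex.Unlock()
    // One buffered channel per language acts as the pool of warm VMs
    lpm.pools[language] = make(chan runningFirecracker, size)
}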

This approach provides:

  • Dedicated pools of pre-warmed VMs for each language.
  • Opportunities for language-specific optimizations.
  • Isolation of resources across different runtimes.
  • Scalability tailored to demand per language.

Language-Specific VM Creation:

// Create a VMM with a specific language rootfs and start the VM
func createAndStartVMForLanguage(ctx context.Context, language string) (*runningFirecracker, error) {
    vmmID := xid.New().String()
    baseRootFS := getRootfsPath(language)
    // Check if the rootfs file exists
    if _, err := os.Stat(baseRootFS); os.IsNotExist(err) {
        return nil, fmt.Errorf("rootfs for language %s not found: %s", language, baseRootFS)
    }
    // Prepare a dedicated writable copy of the root filesystem for this VM instance
    if err := copyFile(baseRootFS, "/tmp/rootfs-"+vmmID+".ext4"); err != nil {
        return nil, fmt.Errorf("failed to prepare rootfs for language %s: %w", language, err)
    }
    fcCfg, err := getFirecrackerConfig(vmmID)
    if err != nil {
        log.Errorf("Error: %s", err)
        return nil, err
    }
    // ... VM creation logic
}
 
func getRootfsPath(language string) string {
    return filepath.Join("agent", fmt.Sprintf("rootfs-%s.ext4", language))
}

Multi-Language Pool Management

Pools are created for each language, with dedicated goroutines for replenishment:

// Create pools for each language and start pool fillers
const poolSize = 1 // Number of VMs per language pool
for _, language := range languages {
    poolManager.AddPool(language, poolSize)
    // Start the pool filler for this language
    pool, _ := poolManager.GetPool(language)
    go fillLanguageVMPool(ctx, language, pool)
}

The pool filler ensures each language always has warm VMs available.

It operates as an infinite loop in a goroutine, adding a VM to the pool once it is ready, which is verified by polling the health endpoint of the Ignis-Agent.
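In sketch form (waitForAgentHealth is a hypothetical helper name, and the real fillLanguageVMPool may differ in details):

func fillLanguageVMPool(ctx context.Context, language string, pool chan runningFirecracker) {
    for {
        select {
        case <-ctx.Done():
            return // the worker is shutting down
        default:
        }

        vm, err := createAndStartVMForLanguage(ctx, language)
        if err != nil {
            log.WithError(err).WithField("language", language).Error("Failed to start VM")
            time.Sleep(time.Second)
            continue
        }

        // waitForAgentHealth is a hypothetical helper that polls the agent's
        // health endpoint inside the microVM until it answers.
        if err := waitForAgentHealth(ctx, vm); err != nil {
            vm.shutDown()
            continue
        }

        // Only a verified-healthy VM is handed to the pool (blocks when the pool is full)
        pool <- *vm
    }
}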

Job Execution with Language Routing

When a job is received, it is routed to the correct language pool:

func (job benchJob) run(ctx context.Context, poolManager *LanguagePoolManager) {
    log.WithField("job", job).Info("Handling job")
    // Get the appropriate pool for this job's language
    pool, err := poolManager.GetPool(job.Language)
    if err != nil {
        log.WithField("language", job.Language).WithError(err).Error("No pool available for language")
        q.setjobFailed(ctx, job, agentExecRes{Error: fmt.Sprintf("No pool available for language %s: %v", job.Language, err)})
        return
    }
    // Get a ready-to-use microVM from the language-specific pool
    vm := <-pool
    // Defer cleanup of VM and VMM
    defer vm.shutDown()
    // Execute the job on the language-specific VM
    reqJSON, err := json.Marshal(agentRunReq{
        ID:       job.ID,
        Language: job.Language,
        Code:     job.Code,
        Variant:  "TODO",
    })
    // ... rest of execution logic
}

To initiate a microVM, the worker performs several steps:

  • Copies the language-specific “golden image” of the rootfs to a temporary file dedicated to the microVM.
  • Configures the firecracker.Config.
  • Creates a firecracker.Machine.
  • Starts the machine instance.

The resulting runningFirecracker struct is stored in the channel, containing elements such as the VM’s IP address for agent communication and the firecracker.Machine and context for shutdown.
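To make the configuration step concrete, here is a hedged sketch of what getFirecrackerConfig might produce with the firecracker-go-sdk. Kernel path, CPU, and memory values are illustrative, not the exact Ignis settings:

import (
    firecracker "github.com/firecracker-microvm/firecracker-go-sdk"
    "github.com/firecracker-microvm/firecracker-go-sdk/client/models"
)

func getFirecrackerConfig(vmmID string) (firecracker.Config, error) {
    return firecracker.Config{
        SocketPath:      "/tmp/firecracker-" + vmmID + ".sock",
        KernelImagePath: "agent/vmlinux", // illustrative path to the guest kernel
        KernelArgs:      "console=ttyS0 reboot=k panic=1 pci=off",
        Drives: []models.Drive{{
            DriveID:      firecracker.String("1"),
            PathOnHost:   firecracker.String("/tmp/rootfs-" + vmmID + ".ext4"),
            IsRootDevice: firecracker.Bool(true),
            IsReadOnly:   firecracker.Bool(false),
        }},
        NetworkInterfaces: []firecracker.NetworkInterface{{
            // The CNI plugins allocate a tap device and an IP from the fcnet network below
            CNIConfiguration: &firecracker.CNIConfiguration{
                NetworkName: "fcnet",
                IfName:      "veth0",
            },
        }},
        MachineCfg: models.MachineConfiguration{
            VcpuCount:  firecracker.Int64(1),
            MemSizeMib: firecracker.Int64(256),
        },
    }, nil
}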

Networking is facilitated by CNI plugins. On the host, a configuration file is required in /etc/cni/conf.d/fcnet.conflist:

{
  "name": "fcnet",
  "cniVersion": "1.0.0",
  "plugins": [
    {
      "type": "ptp",
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.127.0/24",
        "resolvConf": "/etc/resolv.conf"
      }
    },
    {
      "type": "tc-redirect-tap"
    }
  ]
}

The CNI plugin binaries also need to be installed in /opt/cni/bin: ptp and host-local come from the containernetworking/plugins project, and tc-redirect-tap from awslabs.

Also figured out a tc-redirect-tap bug

The setup initially seemed straightforward: I prepared a language-specific root filesystem, launched the microVM, and configured networking parameters via a CNI configuration file (fcnet.conflist). However, an unexpected version conflict disrupted the process.

The first error appeared when using the default CNI version 1.0.0:

incompatible CNI versions; config is "1.0.0", plugin supports ["0.3.0", "0.3.1", "0.4.0", "1.1.0"]

This error indicated that the tc-redirect-tap plugin did not support version 1.0.0, despite it being a standard choice in recent CNI tools. Following the error’s guidance, I updated the fcnet.conflist to use version 0.4.0, hoping to resolve the mismatch. Instead, a new error emerged:

plugin type="tc-redirect-tap" failed (add): unsupported CNI result version "1.1.0"

This was puzzling—the plugin was returning a result version (1.1.0) that conflicted with the configured version (0.4.0). After digging deeper, I discovered the issue stemmed from the plugin’s logic, which always returned the latest implemented CNI version (1.1.0), regardless of the configuration.

This caused a compatibility issue with the firecracker-go-sdk, which relied on an older CNI library version (1.0.1) that didn’t recognize 1.1.0.

So I opened an issue for this: awslabs/tc-redirect-tap/issues/98.

The maintainers promptly identified the bug and released fixes in two pull requests (#99 and #106).

These updates corrected the plugin’s version handling to respect the configured CNI version and ensured compatibility with 1.0.0.

Ignis-Backend

This is a Gin backend implemented in Go, providing CRUD operations for jobs and their updates, along with the integration for pushing jobs to and receiving results from the NATS queue.
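As a rough sketch of the queue integration (the subject name mirrors the one used by the worker above, but the exact backend code may differ):

import (
    "encoding/json"

    "github.com/nats-io/nats.go"
)

// publishJob pushes a submitted job onto the queue for the Ignis-VMM worker to pick up.
func publishJob(nc *nats.Conn, job benchJob) error {
    payload, err := json.Marshal(job)
    if err != nil {
        return err
    }
    // "jobs.submit" is an illustrative subject name
    return nc.Publish("jobs.submit", payload)
}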

That’s it.

Possible Improvements

  1. Rootfs Optimization: Reduce the image sizes even further by building custom Linux images.
  2. Snapshotting: Try snapshotting the booted initial state and restoring from it instead of spinning up fresh VMs every time.
  3. Fix the memory balloon problem: Firecracker doesn't give memory it has claimed back to the host until the VM is killed.
  4. Investigate the occasional random crashes and VMs that fail to start.

Conclusion

Ignis is not production-grade, nor was it meant to be. It’s a prototype that sits somewhere between a playground and a proof-of-concept.

If you’re curious, the code’s here on GitHub.

And if you’ve ever thought “I wonder how serverless actually works under the hood?” — try pointing Firecracker at a rootfs and booting your first micro-VM. You’ll learn a ton about virtualization, container runtimes, and the beautiful complexity of modern cloud infrastructure.

───✦───✦───✦───✦───✦───✦───✦───✦───