From a Flight to Victory: How I Won the FutureStack GenAI Hackathon

From a Flight to Victory: How I Won the FutureStack GenAI Hackathon

Hackathon
AI
DevOps
Docker
Cerebras
Meta
Open Source

A story of building SRE Sentinel - an AI-powered DevOps copilot that won the Docker MCP Gateway track among 250+ submissions

From a Flight to Victory: How I Won the FutureStack GenAI Hackathon

A story of building SRE Sentinel - an AI-powered DevOps copilot that won the Docker MCP Gateway track

"Sometimes the best work happens when you finally get the chance to build what you've been planning."

When I first heard about the FutureStack GenAI Hackathon by WeMakeDevs on September 25th, 2025, I knew I wanted to participate. The hackathon featured three incredible tracks with cutting-edge AI technologies from Meta, Cerebras, and Docker, with a total prize pool of $15,000+.

The Real Beginning

I had been thinking about the project idea since I learned about the hackathon, but life got in the way. Between a heavy workload and some personal emergencies, I couldn't start working on it. The hackathon began on September 29th, and days passed by.

Finally, on October 2nd, I was on a flight from Pune to Delhi. With the deadline just three days away on October 5th, I knew it was now or never. I opened my laptop mid-flight and started coding.

Coding on the Flight

The hackathon had three tracks:

  • Cerebras Track: Best use of Cerebras API ($5,000 + Interview Opportunity)
  • Meta Track: Best use of Llama models ($5,000 + Coffee Chat with Engineers)
  • Docker Track: Most creative use of Docker MCP Gateway ($5,000)

I had three days to build something that could compete with the other 250+ submissions.

The Idea: SRE Sentinel

As someone who has worked with containerized applications, I know how painful it can be when things go wrong at 3 AM. Servers crash, databases refuse to connect, memory leaks bring down entire services. You have to wake up, dig through logs, figure out what went wrong, and manually fix it.

What if we could automate all of that?

SRE Sentinel is an AI-powered monitoring and self-healing system that:

  • Monitors Docker containers in real-time
  • Detects anomalies using AI
  • Analyzes root causes automatically
  • Fixes problems without human intervention
  • Explains what happened in simple terms

I used all three hackathon sponsors' technologies:

  • Cerebras for lightning-fast anomaly detection (1,800+ tokens per second)
  • Meta's Llama 4 Scout for deep root cause analysis
  • Docker MCP Gateway for secure, automated container operations

SRE Sentinel Architecture

Building in Three Days

Day 1: Core Monitoring System

I built the monitoring foundation using Python:

  • Connected to Docker and watched containers in real-time
  • Streamed logs and collected metrics
  • Used Docker events API to discover new containers automatically

The key innovation was using Cerebras AI for anomaly detection. Most systems use slow, rule-based approaches. Cerebras processes AI models at incredible speeds - analyzing hundreds of log lines in under a second.

# AI-powered anomaly detection
anomaly = self.cerebras.detect_anomaly(
    log_chunk=recent_logs,
    service_name=service_name,
    context=container_stats
)

AI Log Analysis

Day 2: MCP Gateway Integration

Docker's Model Context Protocol (MCP) Gateway is a new technology that lets AI systems securely control Docker containers. Instead of giving AI direct server access, the MCP Gateway acts as a secure intermediary.

Here's how the MCP architecture works:

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  SRE Sentinel   │────▶│  MCP Gateway    │────▶│  MCP Servers    │────▶│  Docker Socket  │
│  (Python Brain) │     │  (Coordinator)  │     │  (Node.js)      │     │  /var/run/...   │
└─────────────────┘     └─────────────────┘     └─────────────────┘     └─────────────────┘
                                                                                    │
                                                                                    ▼
                                                                           ┌─────────────────┐
                                                                           │ Docker Daemon   │
                                                                           │ Controls:       │
                                                                           │ - demo-api      │
                                                                           │ - demo-postgres │
                                                                           └─────────────────┘

The Flow:

  1. SRE Sentinel (Python AI) detects an issue → sends request to MCP Gateway
  2. MCP Gateway validates and routes request to appropriate MCP Server
  3. MCP Server executes Docker commands via Docker Socket
  4. Docker Daemon performs the actual container operation
  5. Response flows back through the chain

I built two custom MCP servers:

  1. Docker Control Server: Restart containers, check health status, update resource limits, execute diagnostic commands

  2. Config Patcher Server: Update environment variables and patch container settings

The key security feature: The AI never touches Docker directly. Everything goes through the secure MCP Gateway with full audit trails. MCP servers control Docker itself, not the application containers directly.

MCP Gateway Security

Day 3: Root Cause Analysis

When SRE Sentinel detects a critical problem, it:

  1. Gathers comprehensive context (logs, configurations, metrics, environment variables)
  2. Sends everything to Meta's Llama 4 Scout for deep analysis
  3. Gets back detailed root cause analysis with suggested fixes
  4. Executes fixes through the MCP Gateway
  5. Verifies the system is healthy
  6. Generates a human-friendly explanation

Llama 4 Scout's large context window means it can analyze thousands of log lines, understand your entire system configuration, and make intelligent decisions.

analysis = self.llama.analyze_root_cause(
    anomaly_summary=anomaly.summary,
    full_logs=all_logs,
    docker_compose=docker_compose,
    environment_vars=environment_vars,
    container_stats=container_stats,
    available_tools=available_tools
)

The Architecture

Workflow Infographic

1. Real-Time Monitoring

  • Watches Docker containers with the label sre-sentinel.monitor=true
  • Collects logs, metrics (CPU, memory, network, disk), and events
  • Uses Docker events API for instant discovery
  • Publishes data to a Redis event bus for real-time dashboards

2. Lightning-Fast Anomaly Detection

  • Uses Cerebras AI running at 1,800 tokens/second
  • Analyzes recent logs every few seconds
  • Detects crashes, errors, performance issues, and warnings
  • Classifies severity (LOW, MEDIUM, HIGH, CRITICAL)

3. Deep Root Cause Analysis

  • Powered by Meta's Llama 4 Scout
  • Considers full system context
  • Understands relationships between services
  • Generates step-by-step reasoning

4. Automated Remediation

  • Executes fixes through Docker MCP Gateway
  • Secure, audited operations
  • Can restart containers, update configs, modify resources
  • Full validation before and after

5. Health Verification & Reporting

  • Checks if fixes actually worked
  • Monitors container health after remediation
  • Generates human-friendly incident reports

The Demo

I created a demo environment with a Node.js API connected to PostgreSQL. Here's the complete incident resolution flow:

[Issue Occurs] → demo-api can't connect to database
        ↓
[Monitor] SRE Sentinel detects anomaly in logs (< 5s)
        ↓
[Analyze] Cerebras classifies severity: CRITICAL
        ↓
[Deep Analysis] Llama 4 Scout analyzes full context
        ↓
[Identify] Root cause: Wrong database credentials
        ↓
[Fix] Via MCP Gateway → Docker restart demo-api
        ↓
[Verify] Check container health & logs
        ↓
[Report] Generate human-friendly explanation
        ↓
[Resolved] ✓ Total time: ~30 seconds

When you run the break script, SRE Sentinel:

  1. Detects the database connection failure in < 5 seconds
  2. Analyzes thousands of log lines with Llama 4 Scout
  3. Identifies the root cause (wrong database credentials)
  4. Automatically restarts the container via MCP Gateway
  5. Verifies everything is healthy
  6. Generates a stakeholder-friendly explanation

All in under 30 seconds. Zero human intervention required.

Dashboard Monitoring

Key Learnings

Speed Matters in AI

Cerebras processes AI inferences at 1,800+ tokens per second. This meant I could analyze logs in real-time - the difference between catching a problem early or having a full outage.

Context is Everything

Llama 4 Scout's large context window was crucial. It could understand the entire history, read docker-compose configurations, see environment variables, and make connections a human SRE would make.

Security Through Architecture

The MCP Gateway separates AI from direct Docker access, providing full audit trails, parameter validation, session-based security, and no credential management in AI prompts.

The Results

On October 16th, 2025, I got the news - SRE Sentinel had won the Docker MCP Gateway track!

Among 250+ submissions, my project stood out because it:

  • Used all three sponsor technologies innovatively
  • Solved a real problem that every DevOps team faces
  • Demonstrated cutting-edge AI capabilities
  • Showed secure, practical implementation
  • Worked end-to-end with a live demo

Technical Details

The Stack

  • Backend: Python 3.11 with asyncio for concurrency
  • AI: Cerebras (via OpenRouter) for anomaly detection, Llama 4 Scout for analysis
  • Container Management: Docker SDK, MCP Gateway
  • Event Bus: Redis with pub/sub
  • API: FastAPI with WebSockets
  • Dashboard: React with real-time updates
  • MCP Servers: Node.js with @modelcontextprotocol/sdk

Key Innovations

  1. Dynamic Tool Discovery: The system discovers available MCP tools at runtime
  2. Intelligent Secret Redaction: Uses Cerebras to identify and redact secrets before AI analysis
  3. Incident Deduplication: Tracks active incidents to avoid duplicates
  4. Priority-Based Fix Execution: Executes high-priority fixes first with verification
  5. Event-Driven Architecture: Everything is an event - observable and testable

Performance

  • Anomaly Detection: < 2 seconds for 200 log lines
  • Root Cause Analysis: 5-10 seconds for full incident context
  • Fix Execution: 1-3 seconds per action
  • Total Incident Resolution: 20-45 seconds from detection to verification

Performance Metrics

Acknowledgments

Collaboration Acknowledgments

This wouldn't have been possible without:

  • WeMakeDevs for organizing an amazing hackathon
  • Cerebras for providing the fastest AI inference I've ever used
  • Meta for Llama models that made intelligent analysis possible
  • Docker for the innovative MCP Gateway architecture
  • The open-source community for incredible tools and libraries

Final Thoughts

When I started coding on that flight on October 2nd, I had no idea that three days later I'd be submitting a project that would win. SRE Sentinel demonstrates how AI can fundamentally change DevOps - detecting problems before users notice, understanding full infrastructure context, applying fixes automatically, and explaining everything clearly.

If you're thinking about your own ideas, my advice is simple: Start building. Don't wait for the perfect moment. Sometimes you just need to open your laptop and start coding - even if it's on a flight to Delhi.


Resources

Connect


This blog post chronicles my journey building SRE Sentinel for the FutureStack GenAI Hackathon (September 29 - October 5, 2025). The project won the Docker MCP Gateway track among 250+ submissions.

Comments (0)

Loading comments...