A story of building SRE Sentinel - an AI-powered DevOps copilot that won the Docker MCP Gateway track among 250+ submissions

From a Flight to Victory: How I Won the FutureStack GenAI Hackathon

"Sometimes the best work happens when you finally get the chance to build what you've been planning."

I first heard about the FutureStack GenAI Hackathon by WeMakeDevs on September 25th, 2025, and knew right away that I wanted to participate. It featured three tracks built around AI technologies from Meta, Cerebras, and Docker, with a total prize pool of $15,000+.

The Real Beginning

I had been thinking about the project idea since I learned about the hackathon, but life got in the way. Between a heavy workload and some personal emergencies, I couldn't start working on it. The hackathon began on September 29th, and the first few days slipped by.

Finally, on October 2nd, I was on a flight from Pune to Delhi. With the deadline just three days away on October 5th, I knew it was now or never. I opened my laptop mid-flight and started coding.

[Image: Coding on the Flight]

The hackathon had three tracks:

Cerebras Track: Best use of Cerebras API ($5,000 + Interview Opportunity)
Meta Track: Best use of Llama models ($5,000 + Coffee Chat with Engineers)
Docker Track: Most creative use of Docker MCP Gateway ($5,000)

I had three days to build something that could compete in a field of 250+ submissions.

The Idea: SRE Sentinel

As someone who has worked with containerized applications, I know how painful it can be when things go wrong at 3 AM. Servers crash, databases refuse to connect, memory leaks bring down entire services. You have to wake up, dig through logs, figure out what went wrong, and manually fix it.

What if we could automate all of that?

SRE Sentinel is an AI-powered monitoring and self-healing system that:

Monitors Docker containers in real-time
Detects anomalies using AI
Analyzes root causes automatically
Fixes problems without human intervention
Explains what happened in simple terms

I used all three hackathon sponsors' technologies:

Cerebras for lightning-fast anomaly detection (1,800+ tokens per second)
Meta's Llama 4 Scout for deep root cause analysis
Docker MCP Gateway for secure, automated container operations

[Image: SRE Sentinel Architecture]

Building in Three Days

Day 1: Core Monitoring System

I built the monitoring foundation using Python:

Connected to Docker and watched containers in real-time
Streamed logs and collected metrics
Used Docker events API to discover new containers automatically

The key innovation was using Cerebras for anomaly detection. Most monitoring systems rely on rigid, rule-based checks that only catch failures someone anticipated. Cerebras serves model inference fast enough to analyze a batch of around 200 log lines in under two seconds.

# AI-powered anomaly detection
anomaly = self.cerebras.detect_anomaly(
    log_chunk=recent_logs,
    service_name=service_name,
    context=container_stats
)
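
The `recent_logs` argument above implies some kind of rolling window over each container's output. A minimal sketch of one way to keep such a window, assuming a fixed line budget per service; `LogBuffer` and its method names are hypothetical and not taken from the SRE Sentinel code:

```python
from collections import deque


class LogBuffer:
    """Keeps only the most recent log lines for one service (hypothetical helper)."""

    def __init__(self, max_lines: int = 200) -> None:
        # A deque with maxlen silently drops the oldest line once it is full.
        self._lines = deque(maxlen=max_lines)

    def append(self, line: str) -> None:
        self._lines.append(line.rstrip("\n"))

    def chunk(self) -> str:
        """Return the buffered lines as one newline-joined string."""
        return "\n".join(self._lines)


buffer = LogBuffer(max_lines=3)
for line in ["boot ok", "listening on :3000", "db connect failed", "retrying"]:
    buffer.append(line)

# Only the three newest lines survive; "boot ok" has been evicted.
recent_logs = buffer.chunk()
```

Bounding the buffer keeps each detection request small and predictable, which matters when the analysis runs every few seconds.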

[Image: AI Log Analysis]

Day 2: MCP Gateway Integration

The Docker MCP Gateway is a new piece of tooling built around the Model Context Protocol (MCP). In this project it lets the AI control Docker containers securely: instead of giving the AI direct server access, the gateway acts as an intermediary.

Here's how the MCP architecture works:

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  SRE Sentinel   │────▶│  MCP Gateway    │────▶│  MCP Servers    │────▶│  Docker Socket  │
│  (Python Brain) │     │  (Coordinator)  │     │  (Node.js)      │     │  /var/run/...   │
└─────────────────┘     └─────────────────┘     └─────────────────┘     └─────────────────┘
                                                                                    │
                                                                                    ▼
                                                                           ┌─────────────────┐
                                                                           │ Docker Daemon   │
                                                                           │ Controls:       │
                                                                           │ - demo-api      │
                                                                           │ - demo-postgres │
                                                                           └─────────────────┘

The Flow:

SRE Sentinel (Python AI) detects an issue → sends request to MCP Gateway
MCP Gateway validates and routes request to appropriate MCP Server
MCP Server executes Docker commands via Docker Socket
Docker Daemon performs the actual container operation
Response flows back through the chain
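
To make the first step of that flow concrete: MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with a tool name and an arguments object. The sketch below only builds such a message. The tool name `restart_container` and the argument `container_name` are assumptions for illustration, since the post does not list the actual tool names, and the transport to the gateway is left out.

```python
import itertools
import json

_request_ids = itertools.count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),  # lets the caller match the response
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool and argument names, not confirmed by the project.
request = build_tool_call("restart_container", {"container_name": "demo-api"})
```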

I built two custom MCP servers:

Docker Control Server: Restart containers, check health status, update resource limits, execute diagnostic commands
Config Patcher Server: Update environment variables and patch container settings

The key security feature: the AI never touches Docker directly. Everything goes through the MCP Gateway with full audit trails. The MCP servers in turn talk to the Docker daemon; they do not reach into the application containers on their own.

[Image: MCP Gateway Security]

Day 3: Root Cause Analysis

When SRE Sentinel detects a critical problem, it:

Gathers comprehensive context (logs, configurations, metrics, environment variables)
Sends everything to Meta's Llama 4 Scout for deep analysis
Gets back detailed root cause analysis with suggested fixes
Executes fixes through the MCP Gateway
Verifies the system is healthy
Generates a human-friendly explanation

Llama 4 Scout's large context window means it can analyze thousands of log lines, understand your entire system configuration, and make intelligent decisions.

# Deep root cause analysis with the full incident context
analysis = self.llama.analyze_root_cause(
    anomaly_summary=anomaly.summary,
    full_logs=all_logs,
    docker_compose=docker_compose,
    environment_vars=environment_vars,
    container_stats=container_stats,
    available_tools=available_tools
)
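
Because this call ships environment variables to a model, secrets should be masked first. The post says SRE Sentinel uses Cerebras to identify secrets; the sketch below swaps in a much simpler key-name heuristic purely for illustration. The pattern list and the `redact_env` helper are assumptions, not the project's implementation.

```python
import re

# Key-name fragments that usually indicate a secret (illustrative, not exhaustive).
SENSITIVE_KEY = re.compile(r"(PASSWORD|SECRET|TOKEN|API_KEY|PRIVATE_KEY)", re.IGNORECASE)


def redact_env(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env with values of sensitive-looking keys masked."""
    return {
        key: "[REDACTED]" if SENSITIVE_KEY.search(key) else value
        for key, value in env.items()
    }


environment_vars = redact_env({
    "POSTGRES_PASSWORD": "hunter2",
    "POSTGRES_HOST": "demo-postgres",
    "NODE_ENV": "production",
})
```

A key-name heuristic misses secrets embedded in values, such as credentials inside a connection URL, which is one reason a model-based pass like the one the post describes can catch more.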

The Architecture

[Image: Workflow Infographic]

1. Real-Time Monitoring

Watches Docker containers with the label sre-sentinel.monitor=true
Collects logs, metrics (CPU, memory, network, disk), and events
Uses Docker events API for instant discovery
Publishes data to a Redis event bus for real-time dashboards
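
The label-based discovery can be sketched as a small filter over decoded Docker events. Container events carry the container's labels under `Actor.Attributes`, so checking for `sre-sentinel.monitor=true` needs no extra lookup. The function names here are hypothetical, and `watch` assumes a `docker.DockerClient` from the Docker SDK for Python; it is defined but not called, so the snippet runs without a Docker daemon.

```python
MONITOR_LABEL = "sre-sentinel.monitor"


def is_monitored_start(event: dict) -> bool:
    """True if a decoded Docker event is the start of a labelled container."""
    if event.get("Type") != "container" or event.get("Action") != "start":
        return False
    labels = event.get("Actor", {}).get("Attributes", {})
    return labels.get(MONITOR_LABEL) == "true"


def watch(client) -> None:
    """Print monitored containers as they start.

    `client` is assumed to be a docker.DockerClient, e.g. docker.from_env().
    """
    filters = {"type": "container", "event": "start"}
    for event in client.events(decode=True, filters=filters):
        if is_monitored_start(event):
            print("now monitoring", event["Actor"]["Attributes"].get("name"))


# A trimmed-down event in the shape the Docker events API emits.
sample_event = {
    "Type": "container",
    "Action": "start",
    "Actor": {
        "ID": "abc123",
        "Attributes": {"name": "demo-api", "sre-sentinel.monitor": "true"},
    },
}
```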

2. Lightning-Fast Anomaly Detection

Uses Cerebras inference running at 1,800+ tokens per second
Analyzes recent logs every few seconds
Detects crashes, errors, performance issues, and warnings
Classifies severity (LOW, MEDIUM, HIGH, CRITICAL)
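
One way to represent those four severity levels is an ordered enum, so that "at least HIGH" style comparisons work directly. This is a sketch under assumptions: the post only says that critical problems trigger deep analysis, so the default threshold and the fallback to LOW for unrecognized labels are illustrative choices.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def parse_severity(label: str) -> Severity:
    """Map a model's text label to a Severity, falling back to LOW if unknown."""
    try:
        return Severity[label.strip().upper()]
    except KeyError:
        return Severity.LOW


def needs_root_cause_analysis(
    severity: Severity, threshold: Severity = Severity.CRITICAL
) -> bool:
    """Decide whether an anomaly is severe enough for the slower deep analysis."""
    return severity >= threshold
```

Falling back to LOW keeps a malformed model response from triggering automated fixes; a stricter system might raise an error instead.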

3. Deep Root Cause Analysis

Powered by Meta's Llama 4 Scout
Considers full system context
Understands relationships between services
Generates step-by-step reasoning

4. Automated Remediation

Executes fixes through Docker MCP Gateway
Secure, audited operations
Can restart containers, update configs, modify resources
Full validation before and after

5. Health Verification & Reporting

Checks if fixes actually worked
Monitors container health after remediation
Generates human-friendly incident reports
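
Verification after a fix usually comes down to polling a health probe for a bounded time. A minimal sketch, assuming the probe is any callable that returns True once the container is healthy; the function name, attempt count, and delay are hypothetical defaults rather than values from the project:

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns True or the attempts are exhausted."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # no point sleeping after the final attempt
    return False


# Simulated container that becomes healthy on the third probe.
probe_results = iter([False, False, True])
recovered = wait_until_healthy(lambda: next(probe_results), sleep=lambda _: None)
```

Passing `sleep` in as a parameter keeps the loop testable without real delays.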

The Demo

I created a demo environment with a Node.js API connected to PostgreSQL. Here's the complete incident resolution flow:

[Issue Occurs] → demo-api can't connect to database
        ↓
[Monitor] SRE Sentinel detects anomaly in logs (< 5s)
        ↓
[Analyze] Cerebras classifies severity: CRITICAL
        ↓
[Deep Analysis] Llama 4 Scout analyzes full context
        ↓
[Identify] Root cause: Wrong database credentials
        ↓
[Fix] Via MCP Gateway → Docker restart demo-api
        ↓
[Verify] Check container health & logs
        ↓
[Report] Generate human-friendly explanation
        ↓
[Resolved] ✓ Total time: ~30 seconds

When you run the break script, SRE Sentinel:

Detects the database connection failure in < 5 seconds
Analyzes thousands of log lines with Llama 4 Scout
Identifies the root cause (wrong database credentials)
Automatically restarts the container via MCP Gateway
Verifies everything is healthy
Generates a stakeholder-friendly explanation

All in about 30 seconds. Zero human intervention required.

[Image: Dashboard Monitoring]

Key Learnings

Speed Matters in AI

Cerebras serves inference at 1,800+ tokens per second. That let me analyze logs in near real time, which can be the difference between catching a problem early and having a full outage.

Context is Everything

Llama 4 Scout's large context window was crucial. It could understand the entire history, read docker-compose configurations, see environment variables, and make connections a human SRE would make.

Security Through Architecture

The MCP Gateway separates the AI from direct Docker access. That gives you full audit trails, parameter validation, and session-based security, and it keeps credentials out of AI prompts.
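
To illustrate what parameter validation and an audit trail can look like in principle, here is a small sketch of an allowlist check that records every attempted call. It does not describe how the Docker MCP Gateway works internally; the tool names, argument names, and log format are all hypothetical.

```python
import json
import time

# Hypothetical allowlist: tool name -> permitted argument names.
ALLOWED_TOOLS = {
    "restart_container": {"container_name"},
    "update_env_vars": {"container_name", "env"},
}

audit_log: list[str] = []


def validate_and_audit(tool: str, arguments: dict) -> bool:
    """Accept a call only if the tool and all its argument names are allowlisted."""
    allowed_args = ALLOWED_TOOLS.get(tool)
    accepted = allowed_args is not None and set(arguments) <= allowed_args
    # Every attempt is recorded, including rejected ones.
    audit_log.append(json.dumps({
        "ts": time.time(),
        "tool": tool,
        "arguments": arguments,
        "accepted": accepted,
    }))
    return accepted


ok = validate_and_audit("restart_container", {"container_name": "demo-api"})
rejected = validate_and_audit("exec_shell", {"cmd": "rm -rf /"})
```

A real audit log would also need to avoid storing secret argument values in plain text.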

The Results

On October 16th, 2025, I got the news - SRE Sentinel had won the Docker MCP Gateway track!

Among 250+ submissions, my project stood out because it:

Used all three sponsor technologies innovatively
Solved a real problem that every DevOps team faces
Demonstrated cutting-edge AI capabilities
Showed secure, practical implementation
Worked end-to-end with a live demo

Technical Details

The Stack

Backend: Python 3.11 with asyncio for concurrency
AI: Cerebras (via OpenRouter) for anomaly detection, Llama 4 Scout for analysis
Container Management: Docker SDK, MCP Gateway
Event Bus: Redis with pub/sub
API: FastAPI with WebSockets
Dashboard: React with real-time updates
MCP Servers: Node.js with @modelcontextprotocol/sdk

Key Innovations

Dynamic Tool Discovery: The system discovers available MCP tools at runtime
Intelligent Secret Redaction: Uses Cerebras to identify and redact secrets before AI analysis
Incident Deduplication: Tracks active incidents to avoid duplicates
Priority-Based Fix Execution: Executes high-priority fixes first with verification
Event-Driven Architecture: Everything is an event - observable and testable
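
Of these, incident deduplication is the easiest to show in a few lines. A minimal sketch, assuming an incident is identified by service name plus anomaly type; the class and the choice of key are hypothetical, since the post does not say how SRE Sentinel identifies duplicates.

```python
class IncidentTracker:
    """Tracks open incidents so repeated detections do not spawn duplicates."""

    def __init__(self) -> None:
        self._active = set()

    def open(self, service: str, anomaly_type: str) -> bool:
        """Return True for a new incident, False if one is already active."""
        key = (service, anomaly_type)
        if key in self._active:
            return False
        self._active.add(key)
        return True

    def resolve(self, service: str, anomaly_type: str) -> None:
        """Mark an incident as closed so a later recurrence is treated as new."""
        self._active.discard((service, anomaly_type))


tracker = IncidentTracker()
first = tracker.open("demo-api", "db_connection_failure")   # new incident
second = tracker.open("demo-api", "db_connection_failure")  # duplicate, ignored
tracker.resolve("demo-api", "db_connection_failure")
third = tracker.open("demo-api", "db_connection_failure")   # recurrence after resolve
```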

Performance

Anomaly Detection: < 2 seconds for 200 log lines
Root Cause Analysis: 5-10 seconds for full incident context
Fix Execution: 1-3 seconds per action
Total Incident Resolution: 20-45 seconds from detection to verification

[Image: Performance Metrics]

Acknowledgments

[Image: Collaboration Acknowledgments]

This wouldn't have been possible without:

WeMakeDevs for organizing an amazing hackathon
Cerebras for providing the fastest AI inference I've ever used
Meta for Llama models that made intelligent analysis possible
Docker for the innovative MCP Gateway architecture
The open-source community for incredible tools and libraries

Final Thoughts

When I started coding on that flight on October 2nd, I had no idea that three days later I'd be submitting a project that would win. SRE Sentinel demonstrates how AI can fundamentally change DevOps - detecting problems before users notice, understanding full infrastructure context, applying fixes automatically, and explaining everything clearly.

If you're thinking about your own ideas, my advice is simple: Start building. Don't wait for the perfect moment. Sometimes you just need to open your laptop and start coding - even if it's on a flight to Delhi.

Resources

GitHub: SRE Sentinel Repository
Demo Video:
Docker MCP Gateway: Official Documentation

Connect

X/Twitter: @aryankumar877
LinkedIn: Aryan Kumar
GitHub: aryan877

This blog post chronicles my journey building SRE Sentinel for the FutureStack GenAI Hackathon (September 29 - October 5, 2025). The project won the Docker MCP Gateway track among 250+ submissions.
