How We Fixed the “First Web Container is Unhealthy” Error: A DNS Deep Dive


The Error That Nearly Broke Our Deployment

Three hours into our Kamal deployment, we were stuck in a loop:

ERROR Failed to boot web on {ip_address}
INFO First web container is unhealthy on {ip_address}, not booting any other roles

The container would start, but Kamal’s health check kept failing. After 30 seconds, Kamal would kill the container
and retry, creating an endless loop.

We spent hours debugging deployment scripts, PostgreSQL configurations, and Rails settings. The fix turned out to
be much simpler: DNS configuration.

The Root Cause: Broken DNS Resolution

What Was Happening

When Kamal tried to verify container health, it performed this sequence:

  1. Container starts → my_app-web-abc123 boots
  2. The proxy (kamal-proxy) requests the /up endpoint
  3. DNS lookup → Resolve my_app-web-abc123 to an IP address
  4. Health check fails → DNS resolution times out or fails
  5. Container killed → Kamal marks it as unhealthy

The DNS Failure

The kamal-proxy container’s /etc/resolv.conf showed:

nameserver 127.0.0.53
search members.linode.com
options edns0 trust-ad ndots:0

Problem: 127.0.0.53 is the stub listener that systemd-resolved opens on the host’s loopback interface. Inside a
container, 127.0.0.53 points at the container’s own loopback, where nothing is listening.

When kamal-proxy tried to resolve my_app-web-abc123:

  • It queried 127.0.0.53 (systemd-resolved)
  • The query failed with “connection refused”
  • Health check failed
  • Container was killed
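
A quick way to spot this condition is to scan a resolv.conf file for the stub address. The following is a minimal sketch rather than part of our setup script; the function name uses_host_stub_resolver and the /tmp path in the comment are illustrative.

```bash
# Return 0 if the given resolv.conf-style file lists the systemd-resolved
# stub address (127.0.0.53) as a nameserver, 1 otherwise.
# $1: path to the file to inspect
uses_host_stub_resolver() {
  awk '$1 == "nameserver" && $2 == "127.0.0.53" { found = 1 }
       END { exit !found }' "$1"
}

# Example (assumes a container named kamal-proxy):
#   docker exec kamal-proxy cat /etc/resolv.conf > /tmp/proxy-resolv.conf
#   uses_host_stub_resolver /tmp/proxy-resolv.conf && echo "stub resolver in use"
```

Copying the file out first keeps the check independent of whatever tools the proxy image happens to ship.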

The Solution: Proper Docker DNS Configuration

What We Fixed

We configured Docker’s DNS settings in /etc/docker/daemon.json:

{
  "dns": ["127.0.0.11", "8.8.8.8", "1.1.1.1"]
}
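
Applying the setting can be scripted. The sketch below is one possible approach, not a copy of our script: the function name write_docker_dns_config is hypothetical, and it assumes daemon.json holds no other settings, because it overwrites the whole file (after saving a .bak copy).

```bash
# Write the "dns" setting to a daemon.json path, keeping a backup of any
# existing file. Assumption: no other keys in the file need to be preserved.
# $1: target path (normally /etc/docker/daemon.json)
write_docker_dns_config() {
  local target="$1"
  if [ -f "$target" ]; then
    cp "$target" "$target.bak"
  fi
  cat > "$target" <<'EOF'
{
  "dns": ["127.0.0.11", "8.8.8.8", "1.1.1.1"]
}
EOF
}

# Example (run as root, then restart Docker so the daemon rereads the file):
#   write_docker_dns_config /etc/docker/daemon.json
#   systemctl restart docker
```

Docker reads daemon.json when the daemon starts, so the change takes effect only after a restart.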

Why This Works

1. 127.0.0.11 (Docker’s Internal DNS) – First Priority

  • Resolves container hostnames automatically
  • Handles inter-container communication
  • Available to containers on user-defined Docker networks

2. 8.8.8.8 (Google DNS) – Second Priority

  • Resolves external domains (APIs, gems, etc.)
  • Fast and reliable
  • Global infrastructure

3. 1.1.1.1 (Cloudflare DNS) – Third Priority

  • Privacy-focused external DNS
  • Backup if 8.8.8.8 fails
  • No query logging

How Docker Uses This

Docker’s DNS resolution order:

  1. Try 127.0.0.11 (internal) → container names
  2. If that fails → 8.8.8.8 (external) → domains
  3. If that fails → 1.1.1.1 (external) → domains

The IPv4/IPv6 Issue

While debugging, we discovered another subtle problem:

The IPv6 Trap

The server setup script used:

SERVER_IP=$(curl -s ifconfig.me || echo "ip_address_goes_here")

Problem: ifconfig.me returned an IPv6 address:

2600:3c03::...

This IPv6 address was used in PostgreSQL’s pg_hba.conf:

host my_app_production my_app_user 2600:3c03.../32 md5

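This rule did not behave as intended and caused authentication failures. A /32 mask identifies a single host only
for IPv4 (a single IPv6 host is /128), and a rule written for an IPv6 address does not match clients that connect
over IPv4.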

The Fix

Force IPv4 detection:

SERVER_IP=$(curl -s -4 ifconfig.me || echo "ip_address_goes_here")

The -4 flag makes curl resolve and connect over IPv4 only, so ifconfig.me reports the server’s IPv4 address, which
is what the /32 rule in pg_hba.conf expects.
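
The fallback string in that command would still end up in pg_hba.conf if curl failed, so it can help to validate the result before using it. This is a minimal sketch under that assumption; is_ipv4 and detect_server_ip are hypothetical helper names, not functions from our script.

```bash
# Return 0 if $1 is a dotted-quad IPv4 address with every octet in 0-255.
is_ipv4() {
  printf '%s\n' "$1" | awk -F. '
    NF != 4 { exit 1 }
    { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }
  '
}

# Print the public IPv4 address, or fail instead of returning a placeholder.
detect_server_ip() {
  local candidate
  candidate=$(curl -s -4 ifconfig.me) || return 1
  is_ipv4 "$candidate" || return 1
  printf '%s\n' "$candidate"
}

# Example:
#   SERVER_IP=$(detect_server_ip) || { echo "could not detect IPv4 address" >&2; exit 1; }
```

Failing loudly here avoids writing a placeholder or an IPv6 address into a rule that carries an IPv4 mask.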

The PostgreSQL Network Isolation Issue

The Problem

Kamal attaches its containers to a separate Docker network (172.18.0.0/16), while PostgreSQL listens on the host’s
default Docker bridge network (172.17.0.0/16).

The firewall only allowed 172.17.0.0/16:

5432/tcp  ALLOW  172.17.0.0/16
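
The mismatch is easy to confirm with a little arithmetic. The sketch below checks whether an address falls inside a CIDR block; ip_to_int and ip_in_cidr are hypothetical helper names, and 172.18.0.5 is a made-up container address on the Kamal network.

```bash
# Convert a dotted-quad IPv4 address to its 32-bit integer value.
ip_to_int() {
  local old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 if address $1 lies inside CIDR block $2 (for example 172.17.0.0/16).
ip_in_cidr() {
  local addr net bits mask
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Example:
#   ip_in_cidr 172.18.0.5 172.17.0.0/16 || echo "not covered by the firewall rule"
```

A container at 172.18.0.5 is outside 172.17.0.0/16, so the existing rule does not apply to its traffic.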

The Fix

Add the Kamal network to both firewall and PostgreSQL config:

Firewall (ufw):

sudo ufw allow from 172.18.0.0/16 to any port 5432

PostgreSQL (pg_hba.conf):

host my_app_production my_app_user 172.18.0.0/16 md5
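
Because a setup script may run more than once, the rule is best appended only when it is missing. The following is a sketch of that idea; ensure_pg_hba_rule is a hypothetical helper name, and the path in the example is the PostgreSQL 16 location used on our server.

```bash
# Append a pg_hba.conf "host" rule unless an identical line already exists.
# $1: path to pg_hba.conf  $2: database  $3: user  $4: client CIDR
ensure_pg_hba_rule() {
  local file="$1"
  local rule="host $2 $3 $4 md5"
  # -x matches whole lines only, -F treats the rule as a literal string
  grep -qxF "$rule" "$file" || printf '%s\n' "$rule" >> "$file"
}

# Example (run as root, then reload PostgreSQL so it rereads the file):
#   ensure_pg_hba_rule /etc/postgresql/16/main/pg_hba.conf \
#     my_app_production my_app_user 172.18.0.0/16
#   systemctl reload postgresql
```

PostgreSQL applies pg_hba.conf changes on reload, so a full restart is not required.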

Complete Fix in Our Setup Script

IPv4 Fix

SERVER_IP=$(curl -s -4 ifconfig.me || echo "ip_address_goes_here")

Kamal Network Firewall Rule

sudo ufw allow from 172.18.0.0/16 to any port 5432

PostgreSQL Kamal Network Rule

host $DB_NAME $DB_USER 172.18.0.0/16 md5

Docker DNS Configuration

{
  "dns": ["127.0.0.11", "8.8.8.8", "1.1.1.1"]
}

Key Takeaways

  1. DNS is Critical for Container Orchestration
    • Always configure Docker’s DNS properly
    • Include both internal and external DNS servers
    • Test DNS resolution from containers
  2. Network Isolation Matters
    • Docker networks are isolated by default
    • PostgreSQL must allow connections from every Docker network its clients run on
    • Firewall rules must allow the same subnets
  3. IPv4 vs IPv6 Can Break Things
    • pg_hba.conf masks depend on the address family (/32 is one IPv4 host, /128 is one IPv6 host)
    • Force IPv4 when detecting server IPs
    • Test both IPv4 and IPv6 connectivity
  4. Health Checks are Essential
    • The /up endpoint is critical for Kamal
    • DNS must work for health checks to succeed
    • Timeout settings matter (30s default)

Troubleshooting DNS Issues

If you encounter “First web container is unhealthy”:

  1. Check Container Logs
      docker logs my_app-web-abc123
  2. Check kamal-proxy Logs
      docker logs kamal-proxy 2>&1 | grep -i healthcheck
  3. Test DNS Resolution
      # From inside the kamal-proxy container
      docker exec kamal-proxy getent hosts my_app-web-abc123
      docker exec kamal-proxy getent hosts google.com
  4. Verify DNS Configuration
      # Check daemon.json
      cat /etc/docker/daemon.json

      # Check the container's resolv.conf
      docker exec kamal-proxy cat /etc/resolv.conf
  5. Check PostgreSQL Connectivity
      # From the kamal network (-it lets psql prompt for the password)
      docker run --rm -it --network kamal postgres:16 psql \
        -h 172.17.0.1 -U my_app_user -d my_app_production -c "SELECT 1"

Results

After implementing all fixes:

  • ✅ DNS resolution works (internal and external)
  • ✅ Health checks pass (kamal-proxy can reach containers)
  • ✅ PostgreSQL connections work (from both Docker networks)
  • ✅ Deployments succeed (consistent, reliable)
  • ✅ IPv4 detection works (no IPv6 issues)

Final Thoughts

The “First web container is unhealthy” error can be a DNS configuration issue, not a deployment or application
problem.

By understanding how Docker networks work, how DNS resolution functions, and how PostgreSQL authentication works, we can prevent this issue from ever occurring again.

Key files to review:

  • /etc/docker/daemon.json – Docker DNS configuration
  • /etc/postgresql/16/main/pg_hba.conf – PostgreSQL authentication
  • /etc/ufw/user.rules – Firewall rules

The fix is now automated in our setup script, ensuring new servers have proper DNS and network configuration from
day one.
