How We Fixed the “First Web Container is Unhealthy” Error: A DNS Deep Dive

The Error That Nearly Broke Our Deployment

Three hours into our Kamal deployment, we were stuck in a loop:

ERROR Failed to boot web on {ip_address}
  INFO First web container is unhealthy on {ip_address}, not booting any other roles

The container would start, but Kamal’s health check kept failing. After 30 seconds, Kamal would give up, stop the
container, and fail the deploy, and every redeploy ended the same way.

We spent hours debugging deployment scripts, PostgreSQL configurations, and Rails settings. The fix turned out to
be much simpler: DNS configuration.

The Root Cause: Broken DNS Resolution

What Was Happening

When Kamal tried to verify container health, it performed this sequence:

Container starts → my_app-web-abc123 boots
Proxy (kamal-proxy in Kamal 2; Kamal 1 used Traefik) tries to check the /up endpoint
DNS lookup → the proxy resolves my_app-web-abc123 to an IP address
Health check fails → DNS resolution times out or fails, so /up is never reached
Container stopped → after the timeout, Kamal marks it as unhealthy and stops it
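The proxy’s check boils down to polling /up until it answers or a deadline passes. The sketch below approximates that loop so the failure can be reproduced by hand; wait_until_healthy is a hypothetical helper, not part of Kamal, and it only approximates what the proxy does.

```shell
# Hypothetical helper that approximates a readiness check: run a probe
# command repeatedly until it succeeds or TIMEOUT seconds have passed.
# Usage: wait_until_healthy TIMEOUT INTERVAL COMMAND [ARGS...]
wait_until_healthy() {
  limit=$1
  interval=$2
  shift 2
  deadline=$(( $(date +%s) + limit ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0  # probe succeeded: treat the container as healthy
    fi
    sleep "$interval"
  done
  return 1      # deadline passed: treat the container as unhealthy
}
```

For example, from a container attached to the kamal network (where the container name resolves), wait_until_healthy 30 1 curl -fsS http://my_app-web-abc123:80/up mirrors the 30-second window. Port 80 is an assumption; substitute your app_port.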

The DNS Failure

The proxy container’s /etc/resolv.conf showed:

nameserver 127.0.0.53
search members.linode.com
options edns0 trust-ad ndots:0

Problem: 127.0.0.53 is the host’s systemd-resolved stub listener, bound to the host’s loopback interface. Inside a
container, 127.0.0.0/8 is the container’s own loopback, so nothing answers there!

When the proxy tried to resolve my_app-web-abc123:

It queried 127.0.0.53 (systemd-resolved)
The query failed with “connection refused”
Health check failed
Container was killed
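To spot this from the host, it helps to classify the nameservers a container actually uses. A minimal sketch; classify_nameservers and its output labels are hypothetical, and it only inspects addresses, it doesn’t send queries.

```shell
# Classify each nameserver in a resolv.conf file ("-" reads stdin).
# 127.0.0.11 is Docker's embedded DNS server. Any other loopback address
# is the container's own loopback, so a host-side stub such as
# systemd-resolved's 127.0.0.53 cannot answer from inside a container.
# Prints "ADDRESS CLASS" per nameserver; returns 1 if any is unreachable.
classify_nameservers() {
  awk '
    $1 == "nameserver" {
      if ($2 == "127.0.0.11") {
        class = "docker-embedded"
      } else if ($2 ~ /^127\./ || $2 == "::1") {
        class = "unreachable-loopback"
        bad = 1
      } else {
        class = "external"
      }
      print $2, class
    }
    END { exit (bad ? 1 : 0) }
  ' "${1:-/etc/resolv.conf}"
}
```

Usage: docker exec kamal-proxy cat /etc/resolv.conf | classify_nameservers - would have flagged 127.0.0.53 as unreachable-loopback.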

The Solution: Proper Docker DNS Configuration

What We Fixed

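We configured Docker’s DNS settings in /etc/docker/daemon.json, then restarted the Docker daemon (sudo systemctl
restart docker) and redeployed so the containers were recreated with the new settings: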

{
  "dns": ["127.0.0.11", "8.8.8.8", "1.1.1.1"]
}
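A setup script can generate this file instead of hand-editing it. A minimal sketch; render_daemon_json is a hypothetical helper, and it emits only the dns key, so any other settings already in daemon.json would need merging by hand.

```shell
# Hypothetical helper: print a daemon.json body whose "dns" key lists the
# given servers in order. Assumes plain IP addresses (no JSON escaping).
# Example: render_daemon_json 8.8.8.8 1.1.1.1
#   prints {"dns": ["8.8.8.8", "1.1.1.1"]}
render_daemon_json() {
  list=""
  for server in "$@"; do
    # Join entries with ", ", quoting each one.
    list="${list:+$list, }\"$server\""
  done
  printf '{"dns": [%s]}\n' "$list"
}
```

Usage: render_daemon_json 127.0.0.11 8.8.8.8 1.1.1.1 | sudo tee /etc/docker/daemon.json > /dev/null reproduces the configuration above on a single line; restart Docker afterward.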

Why This Works

1. 127.0.0.11 (Docker’s Internal DNS) – First Priority

Resolves container names and network aliases on the same network
Makes container-to-container lookups (such as proxy → web container) work
Available on user-defined networks such as kamal, not on the default bridge

2. 8.8.8.8 (Google DNS) – Second Priority

Resolves external domains (APIs, gems, etc.)
Fast and reliable
Global infrastructure

3. 1.1.1.1 (Cloudflare DNS) – Third Priority

Privacy-focused external DNS
Backup if 8.8.8.8 doesn’t respond
Short-lived query logs, per Cloudflare’s published privacy policy

How Docker Uses This

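How a lookup flows from a container on a user-defined network such as kamal:

127.0.0.11 (internal, listed in the container’s /etc/resolv.conf) → answers container names directly
Any other name → forwarded by 127.0.0.11 to 8.8.8.8 (external) → domains
If 8.8.8.8 doesn’t respond → 1.1.1.1 (external) → domains

The fallback only happens when a server doesn’t respond; a “no such name” answer is final, not a cue to try the
next server. Docker also puts 127.0.0.11 into containers on user-defined networks on its own, so the entries that
replace the host’s unreachable 127.0.0.53 are the external ones.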

The IPv4/IPv6 Issue

While debugging, we discovered another subtle problem:

The IPv6 Trap

The server setup script used:

SERVER_IP=$(curl -s ifconfig.me || echo "ip_address_goes_here")

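Problem: ifconfig.me reports the address a request arrives from, and on our dual-stack server curl connected over
IPv6, so it returned an IPv6 address: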

2600:3c03::...

This IPv6 address was used in PostgreSQL’s pg_hba.conf:

host my_app_production my_app_user 2600:3c03.../32 md5

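This rule didn’t do what the script intended, because the script assumed an IPv4 address. A /32 mask denotes a
single host only for IPv4; on an IPv6 address it keeps just the first 32 bits (the entire 2600:3c03::/32 block),
and a single IPv6 host needs /128. PostgreSQL rejected the connections this rule was meant to allow, and we saw
authentication failures.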
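If a script has to accept either address family, the mask has to follow the address. A minimal sketch; host_cidr is a hypothetical helper, and it assumes its input is already a valid address.

```shell
# Hypothetical helper: append the single-host prefix length that matches
# the address family, so a pg_hba.conf rule covers exactly one host.
# IPv4 -> /32, IPv6 -> /128.
host_cidr() {
  case $1 in
    *:*) printf '%s/128\n' "$1" ;;  # a colon means IPv6
    *)   printf '%s/32\n' "$1" ;;
  esac
}
```

Usage: echo "host my_app_production my_app_user $(host_cidr "$SERVER_IP") md5" builds a rule with the right mask for whichever address the script detected.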

The Fix

Force IPv4 detection:

SERVER_IP=$(curl -s -4 ifconfig.me || echo "ip_address_goes_here")

The -4 flag makes curl connect over IPv4, so ifconfig.me reports the server’s IPv4 address, which matches the
IPv4-style /32 rules the script writes.
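The fallback only triggers when curl itself fails, so an unexpected response would still slip through. A minimal sketch that also validates the result; is_ipv4 and detect_server_ipv4 are hypothetical helpers, and the 10-second --max-time is an arbitrary choice.

```shell
# Return 0 if $1 is a dotted-quad IPv4 address (four numeric octets, 0-255).
is_ipv4() {
  printf '%s\n' "$1" | awk -F. '
    NF != 4 { exit 1 }
    {
      for (i = 1; i <= 4; i++)
        if ($i !~ /^[0-9]+$/ || $i + 0 > 255) exit 1
    }
  '
}

# Ask ifconfig.me over IPv4 (-4) and keep the answer only if it is IPv4;
# otherwise fall back to the same placeholder the setup script uses.
detect_server_ipv4() {
  ip=$(curl -s -4 --max-time 10 ifconfig.me)
  if is_ipv4 "$ip"; then
    printf '%s\n' "$ip"
  else
    printf '%s\n' "ip_address_goes_here"
  fi
}
```

Usage: SERVER_IP=$(detect_server_ipv4).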

The PostgreSQL Network Isolation Issue

The Problem

Kamal attaches app containers to its own user-defined Docker network, kamal (172.18.0.0/16 on our server), while
PostgreSQL runs on the host and is reached at 172.17.0.1, the gateway of Docker’s default bridge network
(172.17.0.0/16).

The firewall only allowed 172.17.0.0/16:

5432/tcp  ALLOW  172.17.0.0/16
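The 172.18.0.0/16 subnet is simply what Docker assigned on our server. A minimal sketch for reading the real value; network_subnet is a hypothetical helper around docker network inspect.

```shell
# Print the subnet(s) Docker assigned to a network (default: kamal), one per
# line. Docker picks the subnet when the network is created, so it can differ
# between servers; reading it beats hard-coding 172.18.0.0/16.
network_subnet() {
  docker network inspect \
    --format '{{range .IPAM.Config}}{{println .Subnet}}{{end}}' \
    "${1:-kamal}"
}
```

Usage, assuming the network has a single IPv4 subnet: KAMAL_SUBNET=$(network_subnet kamal), then use "$KAMAL_SUBNET" in the ufw and pg_hba.conf rules.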

The Fix

Add the Kamal network to both firewall and PostgreSQL config:

Firewall (ufw):

sudo ufw allow from 172.18.0.0/16 to any port 5432

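PostgreSQL (pg_hba.conf; run sudo systemctl reload postgresql afterward, since the file is only reread on reload or
restart):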

host my_app_production my_app_user 172.18.0.0/16 md5
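Appending the rule from a setup script is safer when re-runs don’t duplicate it. A minimal sketch; add_pg_hba_rule is a hypothetical helper. PostgreSQL uses the first pg_hba.conf line that matches, so an earlier reject rule for the same client would still take precedence over an appended one.

```shell
# Append "host DB USER CIDR md5" to a pg_hba.conf-style file unless that
# exact line is already present, so re-running the script is harmless.
# Usage: add_pg_hba_rule FILE DB USER CIDR   (needs write access to FILE)
add_pg_hba_rule() {
  rule="host $2 $3 $4 md5"
  if grep -qxF "$rule" "$1"; then
    return 0  # already there
  fi
  # Start on a fresh line if the file doesn't end with a newline.
  if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
    printf '\n' >> "$1"
  fi
  printf '%s\n' "$rule" >> "$1"
}
```

Usage, as root: add_pg_hba_rule /etc/postgresql/16/main/pg_hba.conf "$DB_NAME" "$DB_USER" 172.18.0.0/16, followed by systemctl reload postgresql.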

Complete Fix in Our Setup Script

IPv4 Fix

SERVER_IP=$(curl -s -4 ifconfig.me || echo "ip_address_goes_here")

Kamal Network Firewall Rule

sudo ufw allow from 172.18.0.0/16 to any port 5432

PostgreSQL Kamal Network Rule

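echo "host $DB_NAME $DB_USER 172.18.0.0/16 md5" | sudo tee -a /etc/postgresql/16/main/pg_hba.conf > /dev/null
sudo systemctl reload postgresql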

Docker DNS Configuration

sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
  "dns": ["127.0.0.11", "8.8.8.8", "1.1.1.1"]
}
EOF
sudo systemctl restart docker
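Writing daemon.json straight into place has one sharp edge: if the file isn’t valid JSON, dockerd refuses to start after the restart. A minimal sketch of a guarded install step; install_daemon_json is a hypothetical helper, and it assumes python3 is available on the server for validation.

```shell
# Hypothetical helper: copy SOURCE to TARGET (default /etc/docker/daemon.json)
# only if SOURCE parses as JSON, because dockerd won't start with a malformed
# daemon.json. Assumes python3 is installed (json.tool does the parsing).
# Usage: install_daemon_json SOURCE [TARGET]   (run as root for the real path)
install_daemon_json() {
  src=$1
  dest=${2:-/etc/docker/daemon.json}
  if ! python3 -m json.tool "$src" >/dev/null 2>&1; then
    echo "refusing to install $src: not valid JSON" >&2
    return 1
  fi
  cp "$src" "$dest"
}
```

Usage, as root: write the JSON to a temporary file first, then run install_daemon_json /tmp/daemon.json && systemctl restart docker.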

Key Takeaways

DNS is Critical for Container Orchestration
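- Make sure containers use DNS servers they can actually reach; the host’s 127.0.0.53 stub isn’t reachable from inside them
- Cover both internal names (Docker’s embedded DNS at 127.0.0.11) and external domains (public resolvers such as 8.8.8.8)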
- Test DNS resolution from containers
Network Isolation Matters
- Docker networks are isolated by default
- PostgreSQL (pg_hba.conf) must allow connections from every Docker network your app containers use
- Firewall rules must match
IPv4 vs IPv6 Can Break Things
- Match pg_hba.conf masks to the address family: /32 is one IPv4 host, /128 is one IPv6 host
- Force IPv4 when detecting server IPs
- Test both IPv4 and IPv6 connectivity
Health Checks are Essential
- The /up endpoint is critical for Kamal
- DNS must work for health checks to succeed
- Timeout settings matter (30s default)

Troubleshooting DNS Issues

If you encounter “First web container is unhealthy”:

Check Container Logs
docker logs my_app-web-abc123
Check Traefik/Kamal Proxy Logs
docker logs kamal-proxy 2>&1 | grep -i health

Test DNS Resolution

# From inside the proxy container
docker exec kamal-proxy getent hosts my_app-web-abc123
docker exec kamal-proxy getent hosts google.com
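When several names need checking, a small loop gives a pass/fail summary. A minimal sketch; check_container_dns is a hypothetical helper, and it assumes the container image ships getent.

```shell
# Resolve each name from inside a container and report OK/FAIL per name.
# Assumes getent exists in the container image.
# Usage: check_container_dns CONTAINER NAME...
check_container_dns() {
  container=$1
  shift
  failed=0
  for name in "$@"; do
    if docker exec "$container" getent hosts "$name" >/dev/null 2>&1; then
      echo "OK   $name"
    else
      echo "FAIL $name"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: check_container_dns kamal-proxy my_app-web-abc123 google.com. A FAIL on the container name alone points at internal resolution; a FAIL on google.com alone points at the upstream servers.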

Verify DNS Configuration

# Check daemon.json
cat /etc/docker/daemon.json

# Check container's resolv.conf
docker exec kamal-proxy cat /etc/resolv.conf

Check PostgreSQL Connectivity

# From the kamal network (-it lets psql prompt for the password)
docker run --rm -it --network kamal postgres:16 psql \
  -h 172.17.0.1 -U my_app_user -d my_app_production -c "SELECT 1"
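When psql fails, it helps to separate network problems from authentication problems. Per the libpq documentation, pg_isready doesn’t need valid credentials to report server status, so it passes even when pg_hba.conf would reject the login. A minimal sketch; diagnose_pg is a hypothetical helper.

```shell
# Distinguish "can't reach PostgreSQL" from "reached it but was rejected".
# pg_isready needs no valid credentials, so success here plus a failing psql
# points at pg_hba.conf; failure here points at ufw or listen_addresses.
# Usage: diagnose_pg NETWORK HOST
diagnose_pg() {
  if docker run --rm --network "$1" postgres:16 \
       pg_isready -h "$2" -p 5432 >/dev/null 2>&1; then
    echo "reachable: if psql still fails, check pg_hba.conf"
  else
    echo "unreachable: check ufw rules and listen_addresses"
    return 1
  fi
}
```

Usage: diagnose_pg kamal 172.17.0.1.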

Results

After implementing all fixes:

✅ DNS resolution works (internal and external)
✅ Health checks pass (the proxy can reach containers)
✅ PostgreSQL connections work (from both Docker networks)
✅ Deployments succeed (consistent, reliable)
✅ IPv4 detection works (no IPv6 issues)

Final Thoughts

The “First web container is unhealthy” error can be a DNS configuration issue, not a deployment or application
problem.

By understanding how Docker networks isolate containers, how containers resolve names, and how PostgreSQL decides which connections to accept, we can keep this issue from coming back.

Key files to review:

/etc/docker/daemon.json – Docker DNS configuration
/etc/postgresql/16/main/pg_hba.conf – PostgreSQL authentication
/etc/ufw/user.rules – Firewall rules added with ufw allow (or run sudo ufw status numbered)

The fix is now automated in our setup script, ensuring new servers have proper DNS and network configuration from
day one.
