How We Fixed the “First Web Container is Unhealthy” Error: A DNS Deep Dive


The Error That Nearly Broke Our Deployment

Three hours into our Kamal deployment, we were stuck in a loop:

ERROR Failed to boot web on {ip_address}
INFO First web container is unhealthy on {ip_address}, not booting any other roles

The container would start, but Kamal’s health check kept failing. After 30 seconds, Kamal would kill the container
and retry, creating an endless loop.

We spent hours debugging deployment scripts, PostgreSQL configurations, and Rails settings. The fix turned out to
be much simpler: DNS configuration.

The Root Cause: Broken DNS Resolution

What Was Happening

When Kamal tried to verify container health, it performed this sequence:

  1. Container starts → my_app-web-abc123 boots
  2. kamal-proxy tries to check the /up endpoint
  3. DNS lookup → Resolve my_app-web-abc123 to an IP address
  4. Health check fails → DNS resolution times out or fails
  5. Container killed → Kamal marks it as unhealthy

The DNS Failure

The kamal-proxy container’s /etc/resolv.conf showed:

nameserver 127.0.0.53
search members.linode.com
options edns0 trust-ad ndots:0

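Problem: 127.0.0.53 is the stub resolver that systemd-resolved runs on the host. Inside a container, that
address points at the container’s own loopback interface, where nothing is listening, so it is not reachable
from Docker containers.

When kamal-proxy tried to resolve my_app-web-abc123: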

  • It queried 127.0.0.53 (systemd-resolved)
  • The query failed with “connection refused”
  • Health check failed
  • Container was killed
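
The check we ended up doing by eye can be sketched in a few lines of Python. This is only an illustration: the function name and the sample text are made up for this post, and it assumes that 127.0.0.11 (Docker’s embedded resolver) is the one loopback address that is fine to see inside a container.

```python
import ipaddress

# Docker's embedded resolver address; assumed here to be the only valid loopback entry.
DOCKER_EMBEDDED_DNS = "127.0.0.11"


def unreachable_nameservers(resolv_conf_text):
    """Return loopback nameservers that a container cannot use."""
    flagged = []
    for line in resolv_conf_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0] != "nameserver":
            continue
        try:
            address = ipaddress.ip_address(parts[1])
        except ValueError:
            continue  # skip malformed entries
        if address.is_loopback and parts[1] != DOCKER_EMBEDDED_DNS:
            flagged.append(parts[1])
    return flagged


# Sample text copied from the resolv.conf shown above.
sample = """nameserver 127.0.0.53
search members.linode.com
options edns0 trust-ad ndots:0
"""
print(unreachable_nameservers(sample))
```

Feeding it the output of docker exec kamal-proxy cat /etc/resolv.conf gives a quick yes/no answer before digging any deeper.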

The Solution: Proper Docker DNS Configuration

What We Fixed

We configured Docker’s DNS settings in /etc/docker/daemon.json:

{
  "dns": ["127.0.0.11", "8.8.8.8", "1.1.1.1"]
}
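
If daemon.json already holds other settings, the dns key has to be merged in rather than written over the whole file. A minimal Python sketch of that merge, assuming the file is plain JSON; the function name and the log-driver entry are stand-ins for illustration, not part of our setup script:

```python
import json


def with_dns_servers(daemon_json_text, servers):
    """Return daemon.json text with the "dns" key set, keeping other keys."""
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    config["dns"] = list(servers)
    return json.dumps(config, indent=2)


# Hypothetical existing content; only the "dns" key is added.
existing = '{"log-driver": "json-file"}'
print(with_dns_servers(existing, ["127.0.0.11", "8.8.8.8", "1.1.1.1"]))
```

Docker reads daemon.json at startup, so the daemon needs a restart (for example, sudo systemctl restart docker) before the change takes effect.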

Why This Works

1. 127.0.0.11 (Docker’s Internal DNS) – First Priority

  • Resolves container hostnames automatically
  • Handles inter-container communication
  • Always available inside Docker networks

2. 8.8.8.8 (Google DNS) – Second Priority

  • Resolves external domains (APIs, gems, etc.)
  • Fast and reliable
  • Global infrastructure

3. 1.1.1.1 (Cloudflare DNS) – Third Priority

  • Privacy-focused external DNS
  • Backup if 8.8.8.8 fails
  • No query logging

How Docker Uses This

Docker’s DNS resolution order:

  1. Try 127.0.0.11 (internal) → container names
  2. If that fails → 8.8.8.8 (external) → domains
  3. If that fails → 1.1.1.1 (external) → domains

The IPv4/IPv6 Issue

While debugging, we discovered another subtle problem:

The IPv6 Trap

The server setup script used:

SERVER_IP=$(curl -s ifconfig.me || echo "ip_address_goes_here")

Problem: ifconfig.me returned an IPv6 address:

2600:3c03::...

This IPv6 address was used in PostgreSQL’s pg_hba.conf:

host my_app_production my_app_user 2600:3c03.../32 md5

PostgreSQL had issues with this IPv6 address, causing authentication failures.

The Fix

Force IPv4 detection:

SERVER_IP=$(curl -s -4 ifconfig.me || echo "ip_address_goes_here")

The -4 flag ensures we always get an IPv4 address, which PostgreSQL handles reliably.
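
One detail worth spelling out: in pg_hba.conf, a /32 mask names a single host only for IPv4. The single-host mask for an IPv6 address is /128. Here is a small Python sketch that picks the right suffix for whatever address was detected. The function name is made up, and the sample addresses come from the documentation ranges, not from our server.

```python
import ipaddress


def single_host_cidr(address_text):
    """Return the address with the prefix length that matches exactly one host."""
    address = ipaddress.ip_address(address_text.strip())
    prefix = 32 if address.version == 4 else 128
    return f"{address}/{prefix}"


# Documentation-range addresses used purely as examples.
print(single_host_cidr("203.0.113.10"))
print(single_host_cidr("2001:db8::1"))
```

A check like this in the setup script would also catch the case where the address lookup returns something that is not an IP address at all, because ipaddress.ip_address raises ValueError for it.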

The PostgreSQL Network Isolation Issue

The Problem

Kamal uses a separate Docker network (172.18.0.0/16) for containers, while PostgreSQL is on the host’s Docker
bridge network (172.17.0.0/16).

The firewall only allowed 172.17.0.0/16:

5432/tcp  ALLOW  172.17.0.0/16

The Fix

Add the Kamal network to both firewall and PostgreSQL config:

Firewall (ufw):

sudo ufw allow from 172.18.0.0/16 to any port 5432

PostgreSQL (pg_hba.conf):

host my_app_production my_app_user 172.18.0.0/16 md5
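
To see why the original rule blocked the app, it helps to test an address against the allowed ranges. A rough Python sketch; 172.18.0.5 is a hypothetical container address on the Kamal network, and the function name is only for illustration:

```python
import ipaddress


def is_allowed(client_ip, allowed_networks):
    """Return True if client_ip falls inside any of the allowed CIDR ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(net) for net in allowed_networks)


# Before the fix: only the default bridge range was allowed.
print(is_allowed("172.18.0.5", ["172.17.0.0/16"]))
# After the fix: the Kamal network range is allowed as well.
print(is_allowed("172.18.0.5", ["172.17.0.0/16", "172.18.0.0/16"]))
```

The same test applies to both the ufw rule and the pg_hba.conf entry, since each one matches on the client’s source address.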

Complete Fix in our setup script

IPv4 Fix

SERVER_IP=$(curl -s -4 ifconfig.me || echo "ip_address_goes_here")

Kamal Network Firewall Rule

sudo ufw allow from 172.18.0.0/16 to any port 5432

PostgreSQL Kamal Network Rule

host $DB_NAME $DB_USER 172.18.0.0/16 md5

Docker DNS Configuration

{
  "dns": ["127.0.0.11", "8.8.8.8", "1.1.1.1"]
}

Key Takeaways

  1. DNS is Critical for Container Orchestration
    • Always configure Docker’s DNS properly
    • Include both internal and external DNS servers
    • Test DNS resolution from containers
  2. Network Isolation Matters
    • Docker networks are isolated by default
    • PostgreSQL must allow connections from all Docker networks
    • Firewall rules must match
  3. IPv4 vs IPv6 Can Break Things
    • Our PostgreSQL rules were written for IPv4 addresses and masks
    • Force IPv4 when detecting server IPs
    • Test both IPv4 and IPv6 connectivity
  4. Health Checks are Essential
    • The /up endpoint is critical for Kamal
    • DNS must work for health checks to succeed
    • Timeout settings matter (30s default)

Troubleshooting DNS Issues

If you encounter “First web container is unhealthy”:

  1. Check Container Logs
      docker logs my_app-web-abc123
  2. Check kamal-proxy Logs
      docker logs kamal-proxy | grep -i healthcheck
  3. Test DNS Resolution
      # From inside the kamal-proxy container
      docker exec kamal-proxy getent hosts my_app-web-abc123
      docker exec kamal-proxy getent hosts google.com
  4. Verify DNS Configuration
      # Check daemon.json
      cat /etc/docker/daemon.json

      # Check container's resolv.conf
      docker exec kamal-proxy cat /etc/resolv.conf
  5. Check PostgreSQL Connectivity
      # From kamal network
      docker run --rm --network kamal postgres:16 psql \
        -h 172.17.0.1 -U my_app_user -d my_app_production -c "SELECT 1"
Results

After implementing all fixes:

  • ✅ DNS resolution works (internal and external)
  • ✅ Health checks pass (kamal-proxy can reach containers)
  • ✅ PostgreSQL connections work (from both Docker networks)
  • ✅ Deployments succeed (consistent, reliable)
  • ✅ IPv4 detection works (no IPv6 issues)

Final Thoughts

The “First web container is unhealthy” error can be a DNS configuration issue, not a deployment or application
problem.

By understanding how Docker networks work, how DNS resolution functions, and how PostgreSQL authentication works, we can prevent this issue from ever occurring again.

Key files to review:

  • /etc/docker/daemon.json – Docker DNS configuration
  • /etc/postgresql/16/main/pg_hba.conf – PostgreSQL authentication
  • /etc/ufw/user.rules – Firewall rules

The fix is now automated in our setup script, ensuring new servers have proper DNS and network configuration from
day one.

Setting up a cron job on Mac

In this example, we set up a job that runs daily at 8:00 am and clears the Rails logs.

Step 1 – create a file with the commands we want to run (delete_logs.sh):

#!/bin/sh

cd ~/code/rails_project
/Users/username/.rbenv/shims/bundle exec rake log:clear

Step 2 – set up the cron scheduler:

$ env EDITOR=vim crontab -e

# Add the line to run the script at 8am daily:

0 8 * * * sh ~/scripts/delete_logs.sh >/tmp/stdout.log 2>/tmp/stderr.log
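
The five fields in that line are minute, hour, day of month, month, and day of week. To make the matching concrete, here is a deliberately simplified Python sketch. It handles only "*" and plain integers and requires every field to match, which is narrower than real cron (no ranges, lists, or steps). The function name is made up for illustration.

```python
from datetime import datetime


def cron_matches(expression, moment):
    """Check a five-field cron expression limited to "*" and plain integers."""
    fields = expression.split()
    actual = [
        moment.minute,
        moment.hour,
        moment.day,
        moment.month,
        (moment.weekday() + 1) % 7,  # cron counts Sunday as 0
    ]
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


print(cron_matches("0 8 * * *", datetime(2024, 1, 1, 8, 0)))
print(cron_matches("0 8 * * *", datetime(2024, 1, 1, 8, 1)))
```

So "0 8 * * *" matches exactly one minute per day: 08:00.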

Phoenix and Ecto. First Steps

Valid as of Phoenix 1.2

Show routes:
mix phoenix.routes

Generating resources:
mix phoenix.gen.html Post posts --no-model
mix phoenix.gen.json Post posts

Ecto
Types

  • :string
  • :integer
  • :map
  • :binary_id
  • :float
  • :boolean

Writing Queries
Two ways

  1. Query (keyword syntax)
    import Ecto.Query

    query =
      from p in MyApp.Post,
        where: like(p.title, "%Stuff%"),
        select: p

  2. Expression (pipe syntax, also needs import Ecto.Query)
    MyApp.Post
    |> where(title: "Stuff")
    |> limit(1)
    

Making changes – https://hexdocs.pm/ecto/Ecto.Changeset.html
changeset = Post.changeset(post, %{title: "updated"})
Repo.update(changeset)
Repo.delete(post)

Migrations
Generate a migration only (does not generate a schema module):
mix ecto.gen.migration [migration_name] -r [repo]

Generate both a schema module and a migration:
mix phoenix.gen.model [Schema] [table] [fields] -r [repo]

Run/Rollback migration:
mix ecto.migrate -r [repo]
mix ecto.rollback -r [repo]

To avoid specifying repo all the time:
config :my_app, ecto_repos: [MyApp.Repo]

Setting up Elixir with Atom on Linux

Sublime is a pretty nice editor for working with Elixir, but I just can’t get the autocomplete plugin to work. So, giving Atom a try.

Install the following packages:

atom-elixir – https://atom.io/packages/atom-elixir
language-elixir – https://atom.io/packages/language-elixir
linter (apm install linter) – https://atom.io/packages/linter
linter-elixirc – https://atom.io/packages/linter-elixirc
script – https://atom.io/packages/script

I use script to build within Atom (shift-ctrl-b). The only issue is that it seems to use the top-level folder of your project as the base path. So, if you call something like Code.load_file("file.exs") from one of your nested directories, it will try to load the file from the top-level directory.

Addendum for Mac
brew install elixir
brew install erlang

Set elixir path for autocomplete-elixir (if using Atom):
/usr/local/bin/elixir

Parsing jQuery Unobtrusive Validation Parameters

Client-side parsing of unobtrusive jQuery validation parameters. Assume we are setting up an unobtrusive validator named “validatorname” and want to pass “parameterfromserver” from the server:

$.validator.unobtrusive.adapters.add('validatorname', ['parameterfromserver'], function(options) {
	options.rules['validatorname'] = {
		parameterfromserver: options.params['parameterfromserver']
	};
	options.messages['validatorname'] = options.message;
});
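$.validator.addMethod('validatorname', function(value, element, parameters) {
	var hereIsOurParameter = parameters.parameterfromserver;
	// The validation logic that uses hereIsOurParameter goes here.
	// Return true when the value is valid, false otherwise.
	return true;
});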

One Hand Typing

If you have to type with one hand, AutoHotkey is a free option to turn your regular keyboard into a half mirror keyboard, where each key on one side is mirrored to the other (activate it by pressing Space key first).

Here’s a script, copied from AutoHotkey forum with slight modifications:

#SingleInstance, force
; Half-QWERTY: One-handed Typing - version 3a
; http://www.autohotkey.com/forum/viewtopic.php?p=228783#228783
;
; HalfKeyboard invented by Matias Corporation between 1992 and 1996
; Originally coded in AutoHotkey by jonny in 2004
; Many thanks to Chris for helping him out with this script.
; Capslock hacks and `~ remap to '" by Watcher
; This implementation was done by mbirth in 2007
;
; version 3a script, mod by hugov:
; 2008-10-31:
; - mixed with "Capitalize letters after 1 second hold" at request of Calibran
; http://www.autohotkey.com/forum/post-228311.html#228311
; just tested very briefly so try at your own peril :-)
;

KeyIsDown = 0
UpperDelay = 300
UpperDelay *= -1

RegRead KLang, HKEY_CURRENT_USER, Keyboard Layout\Preload, 1
StringRight KLang, KLang, 4
If (!KLang)
KLang := A_Language

If (KLang = "0407") {
; 0407 DE_de QWERTZ mirror set
original := "^12345qwertasdfgyxcvb"
mirrored := "ß09876poiuzölkjh-.,mn"
} Else If (KLang = "040c" || KLang = "040C") {
; 040c FR_fr AZERTY mirror set
original := "²&é" . """" . "'(azertqsdfgwxcvb" ; split up string for better
mirrored := ")àç" . "_" . "è-poiuymlkjh!:;,n" ; human readability
} Else {
; 0409 US_us QWERTY mirror set
original := "``" . "12345qwertasdfgzxcvb" ; split up string for better
mirrored := "'" . "09876poiuy;lkjh/.,mn" ; human readability
}

; Now define all hotkeys
Loop % StrLen(original)
{
c1 := SubStr(original, A_Index, 1)
c2 := SubStr(mirrored, A_Index, 1)
Hotkey Space & %c1%, DoHotkey
Hotkey Space & %c2%, DoHotkey
Hotkey %c1%, KeyDown
Hotkey %c1% UP, KeyUP
Hotkey %c2%, KeyDown ;
Hotkey %c2% UP, KeyUP ;
}

return

; This key may help, as the space-on-up may get annoying, especially if you type fast.
Control & Space::Suspend

; Not exactly mirror but as close as we can get, Capslock enter, Tab backspace.
Space & CapsLock::Send {Enter}
Space & Tab::Send {Backspace}

; If spacebar didn't modify anything, send a real space keystroke upon release.
+Space::Send {Space}
Space::Send {Space}

; General purpose
DoHotkey:
StartTime := A_TickCount
StringRight ThisKey, A_ThisHotkey, 1
i1 := InStr(original, ThisKey)
i2 := InStr(mirrored, ThisKey)
If (i1+i2 = 0) {
MirrorKey := ThisKey
} Else If (i1 > 0) {
MirrorKey := SubStr(mirrored, i1, 1)
} Else {
MirrorKey := SubStr(original, i2, 1)
}

Modifiers := ""
If (GetKeyState("LWin") || GetKeyState("RWin")) {
Modifiers .= "#"
}
If (GetKeyState("Control")) {
Modifiers .= "^"
}
If (GetKeyState("Alt")) {
Modifiers .= "!"
}
If (GetKeyState("Shift") + GetKeyState("CapsLock", "T") = 1) {
; only add if Shift is held OR CapsLock is on (XOR) (both held down would result in value of 2)
Modifiers .= "+"
}

If (KeyIsDown < 1 or ThisKey <> LastKey)
{
KeyIsDown := True
LastKey := ThisKey
Send %Modifiers%{%MirrorKey%}
SetKeyDelay, 65535
SetTimer, ReplaceWithUpperMirror, %UpperDelay%
}

Return

MenuShowKeyboardLayout:
IfWinNotExist, HalfKeyboard - permanent keyboard layout
{
Gui, +Owner +Toolwindow +AlwaysOnTop
Gui, Add, picture, x0 y0 w310 h104, %A_ScriptDir%\halfkeyboard_help.png
Gui, Show, w310 h104 NA, HalfKeyboard - permanent keyboard layout
Menu, Tray, Check, &Show Keyboard layout
}
Else
{
Gosub, GuiClose
}
Return

GuiClose:
Menu, Tray, UnCheck, &Show Keyboard layout
Gui, Destroy
Return

MenuExit:
ExitApp
Return

KeyDown:
Key:=A_ThisHotkey
If (KeyIsDown < 1 or Key <> LastKey)
{
KeyIsDown := True
LastKey := Key
Send %Key%
SetKeyDelay, 65535
SetTimer, ReplaceWithUpper, %UpperDelay%
}
Return

KeyUp:
Key:=A_ThisHotkey
SetTimer, ReplaceWithUpper, Off
SetTimer, ReplaceWithUpperMirror, Off
KeyIsDown := False
Return

ReplaceWithUpper:
SetKeyDelay, -1
Send {Backspace}+%LastKey%
Return

ReplaceWithUpperMirror:
SetKeyDelay, -1
Send {Backspace}+%MirrorKey%
Return

Exceptionless Self-Hosting Tips

Exceptionless is an awesome open source real-time error, feature, and log reporting solution that can be self-hosted for free (they offer very reasonably priced hosted plans as well). Here are some tips on self-hosting in a Windows environment.

Use these detailed instructions to set it up. They are quite good, but they leave out a few details that you may find helpful when self-hosting.

Elastic Search

When setting up Elastic Search, it’s a good idea to change the default location of the data folder. It can help avoid an accidental loss of your data when updating to the next version of Elastic Search. The data path can be set in the elasticsearch.yml – “path.data” property.

Rename your cluster name – “cluster.name” property. This can help prevent someone from accidentally joining your cluster.

To directly query Elastic Search, you can install Sense chrome plugin and run queries inside Chrome.

Another useful ES GUI plugin: elasticsearch-gui
To install, run cmd from ES installation bin folder: [~/elasticsearch] $ bin/plugin --install jettro/elasticsearch-gui
To access it, go to http://yourServer:9200/_plugin/gui/index.html#/dashboard (assuming default ES port 9200)

If you want to set up backups for ES, first set the repo location in elasticsearch.yml
path.repo: ["backupLocation"]

From the Sense plugin in Chrome, run the following query to create a repo for your backups:

PUT /_snapshot/exceptionless_backup
{
  "type": "fs",
  "settings": {
    "compress": true,
    "location": "backupLocation"
  }
}

When you want to create a snapshot, run:

PUT /_snapshot/exceptionless_backup/yourSnapshotName?wait_for_completion=true

To delete an old snapshot:

DELETE /_snapshot/exceptionless_backup/yourSnapshotName
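
If you would rather script these calls than type them into Sense, the three requests can be described as plain data first. This is a rough Python sketch that only builds the method, path, and body for each call and sends nothing. The function name and the snapshot name in the example are placeholders.

```python
import json


def snapshot_requests(repository, location, snapshot_name):
    """Build (method, path, body) tuples for the snapshot calls shown above."""
    repo_body = json.dumps(
        {"type": "fs", "settings": {"compress": True, "location": location}}
    )
    base = f"/_snapshot/{repository}"
    return [
        ("PUT", base, repo_body),  # register the backup repo
        ("PUT", f"{base}/{snapshot_name}?wait_for_completion=true", None),  # create
        ("DELETE", f"{base}/{snapshot_name}", None),  # delete an old snapshot
    ]


for method, path, body in snapshot_requests(
    "exceptionless_backup", "backupLocation", "nightly_snapshot"
):
    print(method, path, body or "")
```

Any HTTP client can then send these to your ES server (port 9200 by default).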

Removing Old Data
Exceptionless provides a job that automatically removes data older than a given retention period. But if you self-host, the retention period is indefinite. If you prefer to have Exceptionless remove old data automatically, you can modify the retention period directly in Elastic Search. Run the following ES query to set the retention period to 45 days:

POST /organizations-v1/organization/YourOrganizationId/_update
{
  "doc": {
    "retention_days": 45
  }
}

You can use Elasticsearch-gui plugin to figure out your organization Id.

Weirdness with IE 11.
For some reason IE 11 did not want to auto refresh itself. So, you could add these headers at IIS level:
Cache-Control:no-cache, no-store
Pragma:no-cache
This comes at the price of more hits to your server. But I couldn’t figure out the IE issue.

Redis
If you use Redis (recommended), this is how you specify the connection string:

<add name="RedisConnectionString" connectionString="serverName:6379" />

The important part is that the server name has no http:// prefix.