Securing OpenClaw in the Cloud: A Defense-in-Depth Walkthrough
Personally, I fear the damage an unrestrained AI agent can do far too much to ever let one handle my personal credentials. Nonetheless, I stumbled across this great video from Kai Lentit (https://www.youtube.com/watch?v=40SnEd1RWUU) that, alongside some good comedy, gives an unironically good tutorial for a secure VPS deployment of OpenClaw. I had been looking for an excuse to experiment with Terraform and Ansible for a while, since I have not had the opportunity to use them at work, so I decided to replicate the general process described in the video with Terraform and Ansible instead of configuring everything manually.
By “secure” I mean, of course, as secure as these agents can be. You still need to exercise immense caution when using them and, ideally, create emails, accounts, etc. solely for use with OpenClaw that have no access to sensitive information.
The Threat Model
Before diving into controls, it helps to name what we’re defending against:
- Credential theft: bot tokens, API keys, and secrets leaking or being exfiltrated.
- Unauthorized access: someone gaining a shell on the VM.
- Brute-force attacks: automated SSH login attempts (they start within minutes of a public IP going live).
- Lateral movement: an attacker who compromises the application escalating to root or pivoting to other services.
- Unpatched vulnerabilities: known CVEs sitting unpatched because no one remembered to run apt upgrade.
Every choice below maps back to one or more of these threats.
Layer 1: Network Perimeter — NSG + WireGuard
The design
The Azure Network Security Group (NSG) only allows two inbound ports:
| Port | Protocol | Purpose |
|---|---|---|
| 22/tcp | SSH | Bootstrap only — removed after first Ansible run |
| 51820/udp | WireGuard | VPN tunnel |
Both rules scope the source address to a configurable CIDR (allowed_ssh_cidr), so even during initial setup, only your IP can reach the VM.
security_rule {
  name                       = "SSH-Bootstrap"
  priority                   = 1001
  direction                  = "Inbound"
  access                     = "Allow"
  protocol                   = "Tcp"
  source_port_range          = "*"
  destination_port_range     = "22"
  source_address_prefix      = var.allowed_ssh_cidr
  destination_address_prefix = "*"
}
After Ansible finishes, the SSH-Bootstrap rule is deleted from the Terraform config and terraform apply closes port 22 for good. From that point on, the only way to reach the machine is through the WireGuard tunnel.
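To make the before/after behavior concrete, here is a small Python sketch of how I think about NSG evaluation: rules are checked in ascending priority order, the first match decides, and inbound internet traffic that matches nothing is denied. This is a toy model, not Azure’s implementation. The Rule class, the WireGuard rule’s priority of 1002, and the 203.0.113.7 admin address are all assumptions made for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int      # lower number is evaluated first
    protocol: str      # "Tcp" or "Udp"
    port: int
    source_cidr: str   # stands in for var.allowed_ssh_cidr
    access: str = "Allow"


def evaluate(rules, source_ip, protocol, port):
    """Return the decision of the first matching rule, else "Deny"."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if (rule.protocol == protocol and rule.port == port
                and ip_address(source_ip) in ip_network(rule.source_cidr)):
            return rule.access
    return "Deny"  # models the default deny for unmatched internet traffic


ADMIN_CIDR = "203.0.113.7/32"  # hypothetical admin IP (documentation range)
bootstrap = [
    Rule("SSH-Bootstrap", 1001, "Tcp", 22, ADMIN_CIDR),
    Rule("WireGuard", 1002, "Udp", 51820, ADMIN_CIDR),  # priority is assumed
]
# After the first Ansible run, the bootstrap rule is removed.
hardened = [r for r in bootstrap if r.name != "SSH-Bootstrap"]

print(evaluate(bootstrap, "203.0.113.7", "Tcp", 22))    # Allow
print(evaluate(bootstrap, "198.51.100.9", "Tcp", 22))   # Deny
print(evaluate(hardened, "203.0.113.7", "Tcp", 22))     # Deny
print(evaluate(hardened, "203.0.113.7", "Udp", 51820))  # Allow
```

The third call is the important one: once the bootstrap rule is gone, even the admin IP can no longer reach port 22 from the internet.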
Why it matters
A publicly reachable SSH port is one of the most heavily scanned services on the internet. By removing it entirely post-bootstrap and funneling all management traffic through WireGuard, we eliminate the most common entry vector. The VPN uses modern cryptography (Noise protocol framework, Curve25519, ChaCha20-Poly1305) and has a tiny attack surface: packets that fail authentication are silently dropped, so port scanners don’t even get a response.
Layer 2: Host Firewall — UFW Default Deny
The NSG is a cloud-level control. Defense in depth means we don’t trust a single layer, so the VM itself runs UFW with a default-deny incoming policy:
- name: Set UFW default deny incoming
  community.general.ufw:
    direction: incoming
    default: deny
Only three ports are punched through:
- 22/tcp — temporarily, for bootstrap (mirrors the NSG rule).
- 2222/tcp from 10.100.0.0/24 only — SSH, but only from the WireGuard subnet.
- 51820/udp — WireGuard itself.
Even if someone misconfigures the NSG or Azure adds a surprise rule, UFW independently blocks everything else. Two firewalls, two layers, two chances to catch a mistake.
IPv6 is disabled entirely
- name: Disable IPv6 in sysctl
  ansible.posix.sysctl:
    name: "{{ item }}"
    value: "1"
  loop:
    - net.ipv6.conf.all.disable_ipv6
    - net.ipv6.conf.default.disable_ipv6
    - net.ipv6.conf.lo.disable_ipv6
IPv6 is a common blind spot. Many firewall rules only cover IPv4, leaving IPv6 wide open. Since OpenClaw doesn’t need IPv6, disabling it at the kernel level removes the entire class of risk.
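If you want to confirm the setting actually took effect, the kernel exposes each sysctl as a file under /proc/sys. The following is a minimal verification sketch and not part of the project. The ipv6_disabled function and its conf_root parameter are names I made up, and a missing file is reported as “not confirmed” rather than as a pass.

```python
from pathlib import Path

KEYS = ("all", "default", "lo")  # the three entries set by the Ansible task


def ipv6_disabled(conf_root="/proc/sys/net/ipv6/conf"):
    """Return {entry: bool}; True only where disable_ipv6 reads "1"."""
    result = {}
    for key in KEYS:
        flag = Path(conf_root) / key / "disable_ipv6"
        # A missing file means we could not confirm the setting.
        result[key] = flag.exists() and flag.read_text().strip() == "1"
    return result


if __name__ == "__main__":
    for name, disabled in ipv6_disabled().items():
        print(f"{name}: {'disabled' if disabled else 'not confirmed'}")
```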
Layer 3: SSH Hardening
SSH is the admin’s front door, so it gets its own hardening pass:
Port 2222
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers claw
MaxAuthTries 3
LoginGraceTime 30
X11Forwarding no
Breaking this down:
- Non-standard port (2222): Not security by itself, but it filters out the vast majority of automated scanners that only hit port 22. Combined with the VPN requirement, it’s another obstacle in the chain.
- Key-only authentication: Passwords can be guessed. SSH keys can’t be brute-forced in any practical sense. disable_password_authentication = true is also set at the Terraform level on the VM resource itself, enforcing this from the moment the machine boots.
- No root login: Even if an attacker steals the claw user’s private key, they land in a non-root shell. Privilege escalation requires sudo, which is logged.
- AllowUsers claw: Only one user can SSH in. The azureuser bootstrap account is locked out of SSH entirely after hardening.
- MaxAuthTries 3 + LoginGraceTime 30: Three wrong attempts and the connection drops. 30 seconds to authenticate or get disconnected. This limits brute-force velocity even before fail2ban kicks in.
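A quick way to catch regressions in these settings is to audit the config text against the expected values. The sketch below is a simplified illustration with names of my own choosing (EXPECTED, parse_sshd_config, audit). It follows sshd’s rule that the first value found for a keyword wins, but it ignores Match blocks and Include files, so on a real host the output of sshd -T is the more trustworthy source.

```python
EXPECTED = {
    "port": "2222",
    "permitrootlogin": "no",
    "passwordauthentication": "no",
    "pubkeyauthentication": "yes",
    "allowusers": "claw",
    "maxauthtries": "3",
    "logingracetime": "30",
    "x11forwarding": "no",
}


def parse_sshd_config(text):
    """Parse "Keyword value" lines; keywords are case-insensitive, first value wins."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        settings.setdefault(parts[0].lower(), value)
    return settings


def audit(text, expected=EXPECTED):
    """Return {keyword: (expected, actual)} for every mismatch or missing keyword."""
    actual = parse_sshd_config(text)
    return {k: (v, actual.get(k)) for k, v in expected.items() if actual.get(k) != v}


sample = "Port 2222\nPermitRootLogin no\nPasswordAuthentication yes\n"
for keyword, (want, got) in audit(sample).items():
    print(f"{keyword}: expected {want!r}, found {got!r}")
```

An empty result from audit means every directive in the list above is present with the expected value.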
Two separate SSH key pairs
The project uses two different keys for two different trust levels:
- id_rsa: the bootstrap key. Used by Terraform and Ansible during initial provisioning. After the WireGuard tunnel is up and port 22 is closed, this key becomes useless.
- claw_ed25519: the operational key. Used for day-to-day access via the VPN tunnel. Ed25519 is a modern, fast, and secure algorithm with smaller key sizes than RSA.
This separation means compromising the bootstrap key after deployment gives an attacker nothing — there’s no port to use it on.
Layer 4: fail2ban — Brute-Force Protection
Even behind a VPN, defense in depth means assuming the VPN could be bypassed:
[sshd]
enabled = true
port = 2222
maxretry = 3
bantime = 3600
findtime = 600
Three failed login attempts within 10 minutes trigger a 1-hour IP ban. This runs on the host, independent of the NSG and UFW. If an attacker somehow reaches port 2222, they get three guesses, and then they’re firewalled at the OS level.
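The interaction between maxretry, findtime, and bantime is easier to see as code. This is a toy model of the counting logic only. The BanTracker class is my own invention, and real fail2ban works by parsing log files and inserting firewall rules, which this sketch does not attempt.

```python
from collections import defaultdict, deque

MAXRETRY = 3     # failures that trigger a ban
FINDTIME = 600   # seconds: window in which failures are counted
BANTIME = 3600   # seconds: how long a ban lasts


class BanTracker:
    def __init__(self):
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self.banned_until = {}              # ip -> time at which the ban expires

    def is_banned(self, ip, now):
        return self.banned_until.get(ip, 0) > now

    def record_failure(self, ip, now):
        """Record a failed login; return True if this failure triggers a ban."""
        window = self.failures[ip]
        window.append(now)
        # Forget failures that fell out of the FINDTIME window.
        while window and now - window[0] > FINDTIME:
            window.popleft()
        if len(window) >= MAXRETRY:
            self.banned_until[ip] = now + BANTIME
            window.clear()
            return True
        return False


tracker = BanTracker()
ip = "198.51.100.9"  # hypothetical attacker address (documentation range)
print(tracker.record_failure(ip, 0))    # False
print(tracker.record_failure(ip, 100))  # False
print(tracker.record_failure(ip, 200))  # True: third failure within 600 s
print(tracker.is_banned(ip, 3000))      # True
print(tracker.is_banned(ip, 3800))      # False: ban expired at 200 + 3600
```

One thing the model makes obvious: an attacker who spaces attempts more than findtime apart never accumulates three failures in the window, which is one reason fail2ban is a layer here and not the whole defense.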
Layer 5: Least-Privilege Application Isolation
The OpenClaw bot doesn’t run as root. It doesn’t even run as a generic user with broad permissions. The systemd unit enforces strict sandboxing:
[Service]
User=claw
Group=claw
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=full
ProtectHome=read-only
LimitNOFILE=4096
What each directive does:
| Directive | Effect |
|---|---|
| User=claw | Process runs as an unprivileged user |
| NoNewPrivileges=true | The process and its children can never gain new privileges (blocks setuid, capability escalation) |
| PrivateTmp=true | The service gets its own /tmp, invisible to other processes |
| ProtectSystem=full | /usr, /boot, and /etc are mounted read-only for this service |
| ProtectHome=read-only | Home directories are visible but not writable |
| LimitNOFILE=4096 | Caps the number of open file descriptors, limiting resource exhaustion attacks |
If the Node.js process is compromised (say, through a dependency supply-chain attack), the attacker lands in a sandbox where they can’t modify system files, can’t escalate privileges, and can’t write to anyone’s home directory. This is the difference between “the bot got popped” and “the server got popped.”
Layer 6: Secrets Management
Application credentials live in /etc/openclaw.env:
- name: Create OpenClaw env file
  ansible.builtin.copy:
    dest: /etc/openclaw.env
    owner: root
    group: claw
    mode: "0640"
    force: false
Key details:
- Owned by root, group-readable by claw, so the claw user (and therefore the bot process) can read the file but can’t modify it. An attacker who compromises the bot can read the current secrets but can’t plant a backdoor in the env file.
- force: false, so Ansible won’t overwrite existing credentials on re-runs. This prevents accidental secret wipes during configuration updates.
- EnvironmentFile= in systemd, so secrets are loaded into the process environment by systemd rather than read from disk at runtime by the application. This avoids the common mistake of passing secrets as command-line arguments, which by default any local user can read from /proc/<pid>/cmdline; the environment in /proc/<pid>/environ is readable only by the process owner and root.
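File permissions are easy to loosen by accident (a stray chmod, a restore from backup), so a periodic check is cheap insurance. Below is a minimal sketch using only the standard library. The check_env_file function and its parameters are hypothetical names, and it checks the owner and mode bits but not the group, which you would compare against the claw group’s GID on a real host.

```python
import os
import stat


def check_env_file(path, expected_mode=0o640, expected_uid=0):
    """Return a list of problems with the file's owner and permission bits."""
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)  # keep only the permission bits
    problems = []
    if st.st_uid != expected_uid:
        problems.append(f"owned by uid {st.st_uid}, expected {expected_uid}")
    if mode & stat.S_IRWXO:
        problems.append("accessible by other users")
    if mode & stat.S_IWGRP:
        problems.append("writable by group")
    if mode != expected_mode:
        problems.append(f"mode is {mode:04o}, expected {expected_mode:04o}")
    return problems


if __name__ == "__main__":
    target = "/etc/openclaw.env"
    if os.path.exists(target):
        for problem in check_env_file(target):
            print(f"{target}: {problem}")
```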
Layer 7: Automatic Security Updates
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
};
Unattended-Upgrade::Automatic-Reboot "false";
Security patches are applied automatically, daily, without human intervention. Only security-channel updates are pulled — no surprise major version bumps. Automatic reboots are disabled to avoid unexpected downtime; kernel updates that require a reboot will wait for a maintenance window.
This is a pragmatic trade-off: the vast majority of Ubuntu CVEs are fixed by package updates that don’t require a reboot; the small remainder can be handled during scheduled maintenance.
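With automatic reboots off, something has to notice when a reboot is actually pending. On Ubuntu, packages that need one create /var/run/reboot-required and list themselves in /var/run/reboot-required.pkgs. Here is a small sketch that reads those two files; the pending_reboot function name is my own, and how you surface the result (cron mail, a monitoring check) is left open.

```python
from pathlib import Path


def pending_reboot(flag="/var/run/reboot-required",
                   pkgs="/var/run/reboot-required.pkgs"):
    """Return (reboot_needed, sorted list of packages that requested it)."""
    flag_path, pkgs_path = Path(flag), Path(pkgs)
    if not flag_path.exists():
        return False, []
    packages = pkgs_path.read_text().split() if pkgs_path.exists() else []
    return True, sorted(set(packages))  # the file can list a package twice


if __name__ == "__main__":
    needed, packages = pending_reboot()
    if needed:
        print("Reboot pending for:", ", ".join(packages) or "(unknown packages)")
    else:
        print("No reboot pending")
```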
Layer 8: Backups
- name: Schedule daily backup of /opt/openclaw
  ansible.builtin.cron:
    name: "openclaw daily backup"
    minute: "0"
    hour: "3"
    job: "rsync -a /opt/openclaw /backups/openclaw"
    user: claw
Backups aren’t traditionally a “security” control, but they’re essential for recovery from ransomware, accidental deletion, or a compromised deployment. The backup directory has restricted permissions (0750), and the backup runs as the claw user — not root — limiting what a compromised backup job could access.
Layer 9: Infrastructure as Code — The Meta-Security Layer
Everything described above is codified in Terraform and Ansible. This is itself a security property:
- Auditability: Every security control is version-controlled. You can diff changes, review PRs, and trace when a firewall rule was added or removed.
- Reproducibility: A compromised server can be torn down and rebuilt from scratch in minutes. terraform destroy && terraform apply, run the playbook, restore from backup. No hand-configured snowflake servers with mystery configs.
- Drift detection: terraform plan shows if someone manually changed the NSG rules outside of code. Ansible’s declarative tasks ensure the OS config matches the playbook on every run.
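Drift detection is straightforward to automate because terraform plan accepts a -detailed-exitcode flag: exit code 0 means no changes, 1 means an error, and 2 means the plan contains changes. The wrapper below is a sketch I have not wired into the project. The function names are mine, and note that exit code 2 also fires for config edits you simply haven’t applied yet, not only for out-of-band changes.

```python
import subprocess

EXIT_MEANING = {
    0: "in sync",
    1: "error",
    2: "drift or pending changes",
}


def interpret_plan_exit(code):
    """Map a `terraform plan -detailed-exitcode` exit status to a label."""
    return EXIT_MEANING.get(code, "unknown")


def check_drift(workdir="."):
    """Run terraform plan in workdir; return (label, plan output)."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit(result.returncode), result.stdout
```

Calling check_drift from a scheduled job and alerting on anything other than "in sync" would turn the drift-detection bullet into something that runs without anyone remembering to do it.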
The Full Picture
When you zoom out, the security architecture forms concentric rings:
Internet
└─ Azure NSG (port whitelist + IP restriction)
   └─ WireGuard VPN (encrypted tunnel, crypto-authenticated)
      └─ UFW host firewall (default deny, subnet-scoped SSH)
         └─ SSH hardening (key-only, non-root, allowlist)
            └─ fail2ban (brute-force rate limiting)
               └─ systemd sandboxing (least privilege)
                  └─ File permissions (secrets isolation)
                     └─ OpenClaw application
No single layer is “the” security. Each one assumes the layer above it could fail. That’s defense in depth — and it’s what turns a “running a bot on a VM” project into something you can actually trust in production.
The full source is available on GitHub. This project is a rough implementation and may contain vulnerabilities I didn’t account for. If you spot something that could be improved, open an issue.