Securing OpenClawd in the Cloud: A Defense-in-Depth Walkthrough

Personally, I fear the damage an unrestrained AI agent can do far too much to ever let one handle my personal credentials. Nonetheless, I stumbled across this great video from Kai Lentit (https://www.youtube.com/watch?v=40SnEd1RWUU), which is an unironically good tutorial for a secure VPS deployment of OpenClaw, with some good comedy thrown in. I had been looking for an excuse to experiment with Terraform and Ansible for a while, something I have not had the opportunity to do at work, so I decided to replicate the process described in the video using Terraform and Ansible instead of configuring everything manually.

By “secure” I mean, of course, as secure as these agents can be. You still need to exercise immense caution when using OpenClaw and, ideally, create email addresses, accounts, etc. solely for its use, with no access to sensitive information.

The Threat Model

Before diving into controls, it helps to name what we’re defending against:

Credential theft — bot tokens, API keys, and secrets leaking or being exfiltrated.
Unauthorized access — someone gaining a shell on the VM.
Brute-force attacks — automated SSH login attempts (they start within minutes of a public IP going live).
Lateral movement — an attacker who compromises the application escalating to root or pivoting to other services.
Unpatched vulnerabilities — known CVEs sitting unpatched because no one remembered to run apt upgrade.

Every choice below maps back to one or more of these threats.

Layer 1: Network Perimeter — NSG + WireGuard

The design

The Azure Network Security Group (NSG) only allows two inbound ports:

Port/Protocol	Service	Purpose
22/tcp	SSH	Bootstrap only — removed after first Ansible run
51820/udp	WireGuard	VPN tunnel

Both rules scope the source address to a configurable CIDR (allowed_ssh_cidr), so even during initial setup, only your IP can reach the VM.

security_rule {
  name                       = "SSH-Bootstrap"
  priority                   = 1001
  direction                  = "Inbound"
  access                     = "Allow"
  protocol                   = "Tcp"
  source_port_range          = "*"
  destination_port_range     = "22"
  source_address_prefix      = var.allowed_ssh_cidr
  destination_address_prefix = "*"
}

After Ansible finishes, the SSH-Bootstrap rule is deleted from the Terraform config and terraform apply closes port 22 for good. From that point on, the only way to reach the machine is through the WireGuard tunnel.
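
The effect of scoping a rule to allowed_ssh_cidr can be sketched in a few lines of Python. This is only a model of the source-address match, not Azure's rule engine; the function name and the example addresses (taken from the reserved documentation ranges) are assumptions for illustration.

```python
import ipaddress

# Hypothetical value for var.allowed_ssh_cidr: a single admin IP as a /32.
ALLOWED_SSH_CIDR = "203.0.113.7/32"


def source_allowed(source_ip: str, allowed_cidr: str) -> bool:
    """Return True if source_ip falls inside the allowed CIDR block."""
    network = ipaddress.ip_network(allowed_cidr)
    return ipaddress.ip_address(source_ip) in network


if __name__ == "__main__":
    for ip in ("203.0.113.7", "198.51.100.20"):
        verdict = "allowed" if source_allowed(ip, ALLOWED_SSH_CIDR) else "dropped"
        print(f"{ip}: {verdict}")
```

With a /32, exactly one address matches; widening the prefix (for example to a /24) widens the set of machines that can reach the bootstrap port.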

Why it matters

A publicly reachable SSH port is one of the most heavily scanned services on the internet. By removing it entirely post-bootstrap and funneling all management traffic through WireGuard, we eliminate one of the most common entry vectors. The VPN uses modern cryptography (Noise protocol framework, Curve25519, ChaCha20-Poly1305) and has a tiny attack surface: unauthenticated packets are silently dropped, so port scanners don’t even get a response.

Layer 2: Host Firewall — UFW Default Deny

The NSG is a cloud-level control. Defense in depth means we don’t trust a single layer, so the VM itself runs UFW with a default-deny incoming policy:

- name: Set UFW default deny incoming
  community.general.ufw:
    direction: incoming
    default: deny

Only three ports are punched through:

22/tcp — temporarily, for bootstrap (mirrors the NSG rule).
2222/tcp from 10.100.0.0/24 only — SSH, but only from the WireGuard subnet.
51820/udp — WireGuard itself.

Even if someone misconfigures the NSG or Azure adds a surprise rule, UFW independently blocks everything else. Two firewalls, two layers, two chances to catch a mistake.
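
A rough Python model of the host firewall's decision may make the default-deny behavior concrete. It assumes the three rules listed in this section, and it assumes that the 22/tcp and 51820/udp rules accept any source at the host level (the playbook excerpt does not say; the NSG does the IP scoping). Rule and ufw_allows are illustrative names, not part of UFW.

```python
import ipaddress
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    port: int
    proto: str
    source: Optional[str]  # CIDR string, or None for "from anywhere"


# The three allow rules described in this section.
RULES = [
    Rule(22, "tcp", None),               # bootstrap only (assumed unscoped here)
    Rule(2222, "tcp", "10.100.0.0/24"),  # SSH from the WireGuard subnet
    Rule(51820, "udp", None),            # WireGuard itself
]


def ufw_allows(port: int, proto: str, src_ip: str, rules=RULES) -> bool:
    """Return True if any allow rule matches; otherwise fall through to deny."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        if rule.port != port or rule.proto != proto:
            continue
        if rule.source is None or addr in ipaddress.ip_network(rule.source):
            return True
    return False  # default deny incoming
```

In this model a connection to 2222/tcp from a tunnel address such as 10.100.0.2 is accepted, while the same port from a public address, or any port not on the list, falls through to the deny.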

IPv6 is disabled entirely

- name: Disable IPv6 in sysctl
  ansible.posix.sysctl:
    name: "{{ item }}"
    value: "1"
  loop:
    - net.ipv6.conf.all.disable_ipv6
    - net.ipv6.conf.default.disable_ipv6
    - net.ipv6.conf.lo.disable_ipv6

IPv6 is a common blind spot. Many firewall rules only cover IPv4, leaving IPv6 wide open. Since OpenClaw doesn’t need IPv6, disabling it at the kernel level removes the entire class of risk.

Layer 3: SSH Hardening

SSH is the admin’s front door, so it gets its own hardening pass:

Port 2222
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers claw
MaxAuthTries 3
LoginGraceTime 30
X11Forwarding no

Breaking this down:

Non-standard port (2222) — Not security by itself, but it filters out the vast majority of automated scanners that only hit port 22. Combined with the VPN requirement, it’s another obstacle in the chain.
Key-only authentication — Passwords can be guessed. SSH keys can’t be brute-forced in any practical sense. disable_password_authentication = true is also set at the Terraform level on the VM resource itself, enforcing this from the moment the machine boots.
No root login — Even if an attacker steals the claw user’s private key, they land in a non-root shell. Privilege escalation requires sudo, which is logged.
AllowUsers claw — Only one user can SSH in. The azureuser bootstrap account is locked out of SSH entirely after hardening.
MaxAuthTries 3 + LoginGraceTime 30 — Three wrong attempts and the connection drops. 30 seconds to authenticate or get disconnected. This limits brute-force velocity even before fail2ban kicks in.
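
One way to confirm that these directives are actually in place is a small audit script. The sketch below is an assumption of mine, not part of the project: it parses sshd_config-style text and reports any directive that differs from the values shown above. It ignores Match blocks and Include files, so treat it as a sanity check only. On a real host the input could also come from sshd -T, which prints the effective configuration.

```python
# Expected values copied from the sshd_config excerpt in this section.
EXPECTED = {
    "port": "2222",
    "permitrootlogin": "no",
    "passwordauthentication": "no",
    "pubkeyauthentication": "yes",
    "allowusers": "claw",
    "maxauthtries": "3",
    "logingracetime": "30",
    "x11forwarding": "no",
}

HARDENED = """\
Port 2222
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers claw
MaxAuthTries 3
LoginGraceTime 30
X11Forwarding no
"""


def parse_sshd_config(text: str) -> dict:
    """Parse 'Keyword value' lines; keywords are treated case-insensitively."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        keyword, value = parts[0].lower(), parts[1].strip()
        # sshd uses the first value it obtains for a keyword, so do the same.
        settings.setdefault(keyword, value)
    return settings


def audit(text: str) -> dict:
    """Return {keyword: actual value or None} for each deviating setting."""
    actual = parse_sshd_config(text)
    return {k: actual.get(k) for k, want in EXPECTED.items() if actual.get(k) != want}
```

An empty result from audit(HARDENED) means every directive matches; a stock configuration would show up as a dictionary of deviations, with None marking directives that are missing altogether.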

Two separate SSH key pairs

The project uses two different keys for two different trust levels:

id_rsa — the bootstrap key. Used by Terraform and Ansible during initial provisioning. After the WireGuard tunnel is up and port 22 is closed, this key becomes useless.
claw_ed25519 — the operational key. Used for day-to-day access via the VPN tunnel. Ed25519 is a modern, fast, and secure algorithm with smaller key sizes than RSA.

This separation means compromising the bootstrap key after deployment gives an attacker nothing — there’s no port to use it on.

Layer 4: fail2ban — Brute-Force Protection

Even behind a VPN, defense in depth means assuming the VPN could be bypassed:

[sshd]
enabled = true
port = 2222
maxretry = 3
bantime = 3600
findtime = 600

Three failed login attempts within 10 minutes (findtime = 600) trigger a 1-hour ban (bantime = 3600) on the source IP. This runs on the host, independent of the NSG and UFW. If an attacker somehow reaches port 2222, they get three guesses, and then they’re firewalled at the OS level.
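
The interaction of maxretry, findtime, and bantime can be sketched as a sliding window per source IP. This is a simplified model for illustration, not fail2ban's actual implementation; BanTracker and its methods are hypothetical names, and timestamps are plain seconds.

```python
from collections import defaultdict, deque

MAXRETRY = 3    # failures allowed inside the window
FINDTIME = 600  # window length in seconds
BANTIME = 3600  # ban duration in seconds


class BanTracker:
    """Track failed logins per IP and ban once the window fills up."""

    def __init__(self, maxretry=MAXRETRY, findtime=FINDTIME, bantime=BANTIME):
        self.maxretry = maxretry
        self.findtime = findtime
        self.bantime = bantime
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self.banned_until = {}              # ip -> time the ban expires

    def is_banned(self, ip: str, now: float) -> bool:
        return self.banned_until.get(ip, 0) > now

    def record_failure(self, ip: str, now: float) -> bool:
        """Record one failed login; return True if it triggers a ban."""
        window = self.failures[ip]
        window.append(now)
        # Forget failures that are older than the findtime window.
        while window and now - window[0] > self.findtime:
            window.popleft()
        if len(window) >= self.maxretry:
            self.banned_until[ip] = now + self.bantime
            window.clear()
            return True
        return False
```

Three failures at t=0, 100, and 200 seconds trip the ban on the third attempt, while failures spaced more than ten minutes apart never accumulate.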

Layer 5: Least-Privilege Application Isolation

The OpenClaw bot doesn’t run as root. It doesn’t even run as a generic user with broad permissions. The systemd unit enforces strict sandboxing:

[Service]
User=claw
Group=claw
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=full
ProtectHome=read-only
LimitNOFILE=4096

What each directive does:

Directive	Effect
`User=claw`	Process runs as an unprivileged user
`NoNewPrivileges=true`	The process and its children can never gain new privileges through `execve()` (setuid/setgid bits and file capabilities have no effect)
`PrivateTmp=true`	The service gets its own `/tmp`, invisible to other processes
`ProtectSystem=full`	`/usr`, `/boot`, and `/etc` are mounted read-only for this service
`ProtectHome=read-only`	Home directories are visible but not writable
`LimitNOFILE=4096`	Caps the number of open file descriptors, limiting resource exhaustion attacks

If the Node.js process is compromised (say, through a dependency supply-chain attack), the attacker lands in a sandbox where they can’t modify system files, can’t escalate privileges, and can’t modify other users’ home directories. This is the difference between “the bot got popped” and “the server got popped.”

Layer 6: Secrets Management

Application credentials live in /etc/openclaw.env:

- name: Create OpenClaw env file
  ansible.builtin.copy:
    # src: or content: is not shown in this excerpt; the copy module needs one of them
    dest: /etc/openclaw.env
    owner: root
    group: claw
    mode: "0640"
    force: false

Key details:

Owned by root, group-readable by claw — The claw user (and therefore the bot process) can read the file, but can’t modify it. An attacker who compromises the bot can read the current secrets but can’t plant a backdoor in the env file.
force: false — Ansible won’t overwrite existing credentials on re-runs. This prevents accidental secret wipes during configuration updates.
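EnvironmentFile= in systemd - Secrets are loaded into the process environment by systemd when the service starts, so the application never has to open the file itself. This avoids the common mistake of passing secrets as command-line arguments, which any local user can see in /proc/<pid>/cmdline or ps output; the environment in /proc/<pid>/environ is readable only by the process owner and root.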
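
To see why mode 0640 with owner root and group claw gives the bot read-only access, here is a minimal sketch of the classic Unix permission check. The uid and gid numbers are made up for the example, and the model deliberately ignores root's ability to bypass permission bits as well as ACLs.

```python
import stat

ROOT_UID = 0
CLAW_UID, CLAW_GID = 1001, 1001  # hypothetical ids for the claw user and group
OTHER_UID, OTHER_GID = 1002, 1002  # hypothetical unrelated local user


def file_access(mode: int, owner_uid: int, group_gid: int, uid: int, gids: set):
    """Return (can_read, can_write) for a non-root caller.

    Exactly one permission class applies: owner, else group, else other.
    """
    if uid == owner_uid:
        read_bit, write_bit = stat.S_IRUSR, stat.S_IWUSR
    elif group_gid in gids:
        read_bit, write_bit = stat.S_IRGRP, stat.S_IWGRP
    else:
        read_bit, write_bit = stat.S_IROTH, stat.S_IWOTH
    return bool(mode & read_bit), bool(mode & write_bit)


if __name__ == "__main__":
    mode = 0o640
    print("claw :", file_access(mode, ROOT_UID, CLAW_GID, CLAW_UID, {CLAW_GID}))
    print("other:", file_access(mode, ROOT_UID, CLAW_GID, OTHER_UID, {OTHER_GID}))
```

The group class grants read but not write, and everyone outside the claw group gets nothing, which matches the behavior described in the first bullet above.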

Layer 7: Automatic Security Updates

Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
};
Unattended-Upgrade::Automatic-Reboot "false";

Security patches are applied automatically, daily, without human intervention. Only security-channel updates are pulled — no surprise major version bumps. Automatic reboots are disabled to avoid unexpected downtime; kernel updates that require a reboot will wait for a maintenance window.

This is a pragmatic trade-off: most security fixes arrive as package updates that take effect without a reboot, and the remainder (kernel updates, for example) can be handled during scheduled maintenance.
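
With automatic reboots off, it helps to know when a reboot is actually pending. On Ubuntu, packages that need one create the flag file /var/run/reboot-required and list themselves in /var/run/reboot-required.pkgs. The sketch below simply checks for those files; the function names are my own, and wiring the result into monitoring is left open.

```python
from pathlib import Path

REBOOT_FLAG = Path("/var/run/reboot-required")
REBOOT_PKGS = Path("/var/run/reboot-required.pkgs")


def reboot_pending(flag: Path = REBOOT_FLAG) -> bool:
    """Return True if an installed update is waiting for a reboot."""
    return flag.exists()


def pending_packages(pkgs_file: Path = REBOOT_PKGS) -> list:
    """Return the sorted, de-duplicated package names that requested a reboot."""
    if not pkgs_file.exists():
        return []
    lines = pkgs_file.read_text().splitlines()
    return sorted({line.strip() for line in lines if line.strip()})


if __name__ == "__main__":
    if reboot_pending():
        print("Reboot required by:", ", ".join(pending_packages()) or "unknown")
    else:
        print("No reboot pending.")
```

Run from cron or a login script, a check like this turns "wait for a maintenance window" into something that is hard to forget.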

Layer 8: Backups

- name: Schedule daily backup of /opt/openclaw
  ansible.builtin.cron:
    name: "openclaw daily backup"
    minute: "0"
    hour: "3"
    job: "rsync -a /opt/openclaw /backups/openclaw"
    user: claw

Backups aren’t traditionally a “security” control, but they’re essential for recovery from ransomware, accidental deletion, or a compromised deployment. The backup directory has restricted permissions (0750), and the backup runs as the claw user — not root — limiting what a compromised backup job could access.

Layer 9: Infrastructure as Code — The Meta-Security Layer

Everything described above is codified in Terraform and Ansible. This is itself a security property:

Auditability — Every security control is version-controlled. You can diff changes, review PRs, and trace when a firewall rule was added or removed.
Reproducibility — A compromised server can be torn down and rebuilt from scratch in minutes. terraform destroy && terraform apply, run the playbook, restore from backup. No hand-configured snowflake servers with mystery configs.
Drift detection — terraform plan shows if someone manually changed the NSG rules outside of code. Ansible’s declarative tasks ensure the OS config matches the playbook on every run.
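
Conceptually, drift detection is a comparison between what the code declares and what actually exists. The sketch below illustrates that idea with plain dictionaries; it is not how terraform plan works internally, and the rule names and fields (including the manually added Allow-RDP rule) are hypothetical.

```python
def detect_drift(declared: dict, live: dict) -> dict:
    """Return every rule whose declared and live definitions differ.

    A value of None means the rule is missing on that side.
    """
    drift = {}
    for name in sorted(declared.keys() | live.keys()):
        want, have = declared.get(name), live.get(name)
        if want != have:
            drift[name] = {"declared": want, "live": have}
    return drift


# Hypothetical post-bootstrap state: only WireGuard is declared in code...
DECLARED = {
    "WireGuard": {"port": "51820", "protocol": "Udp", "access": "Allow"},
}
# ...but someone added a rule by hand in the cloud console.
LIVE = {
    "WireGuard": {"port": "51820", "protocol": "Udp", "access": "Allow"},
    "Allow-RDP": {"port": "3389", "protocol": "Tcp", "access": "Allow"},
}

if __name__ == "__main__":
    for name, diff in detect_drift(DECLARED, LIVE).items():
        print(f"{name}: declared={diff['declared']} live={diff['live']}")
```

The hand-added rule is the only entry reported, which is the kind of signal a plan run surfaces before the next apply reverts it.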

The Full Picture

When you zoom out, the security architecture forms concentric rings:

Internet
  └─ Azure NSG (port whitelist + IP restriction)
       └─ WireGuard VPN (encrypted tunnel, crypto-authenticated)
            └─ UFW host firewall (default deny, subnet-scoped SSH)
                 └─ SSH hardening (key-only, non-root, allowlist)
                      └─ fail2ban (brute-force rate limiting)
                           └─ systemd sandboxing (least privilege)
                                └─ File permissions (secrets isolation)
                                     └─ OpenClaw application

No single layer is “the” security. Each one assumes the layer above it could fail. That’s defense in depth — and it’s what turns a “running a bot on a VM” project into something you can actually trust in production.

The full source is available on GitHub. This project is a rough implementation and may contain vulnerabilities I didn’t account for. If you spot something that could be improved, open an issue.
