# Network boot on the provisioning LXC (eth1 = LAN, eth0 = WAN)

The provisioning LXC can provide **network boot** (PXE-style) and **internet access** to devices connected on **eth1**, while **eth0** is used as WAN for the LXC itself.

## Roles

| Interface | Role | Typical config |
|-----------|------|-----------------|
| **eth0** | WAN | DHCP or static; default route; internet for the LXC |
| **eth1** | LAN (provisioning) | Static e.g. `10.20.50.1/24`; DHCP server + TFTP server; NAT so clients get internet via eth0 |

Devices plugged into the same network as **eth1** (e.g. reTerminals with network boot enabled) will:

1. Get an IP via **DHCP** (from the LXC on eth1).
2. Get **TFTP** boot files (Raspberry Pi firmware: `start4.elf`, `fixup4.dat`, kernel, etc.) for network boot.
3. Have **internet** via NAT through the LXC (eth0).

## What you need on the LXC

1. **DHCP server** on eth1 only (e.g. **dnsmasq**), handing out addresses in e.g. `10.20.50.100`–`10.20.50.200` and advertising the TFTP server (next-server = LXC’s eth1 IP).
2. **TFTP server** (dnsmasq can provide this) with **TFTP root** containing Raspberry Pi 4 / CM4 boot files.
3. **IP forwarding** and **NAT** (nftables or iptables) so traffic from `10.20.50.0/24` is masqueraded out **eth0**.

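As a rough illustration of items 1 and 2, the sketch below writes a minimal dnsmasq drop-in for the provisioning LAN. It is not the configuration produced by the setup script, which may differ; the function name `write_netboot_conf`, the target file name, and the `12h` lease time are assumptions made for this example.

```bash
# Hypothetical helper (not part of the repo): write a minimal dnsmasq
# drop-in that serves DHCP and TFTP on the provisioning LAN.
write_netboot_conf() {
  conf_path=$1
  cat > "$conf_path" <<'EOF'
# Listen on the provisioning LAN only
interface=eth1
bind-interfaces
# Address pool from this document; the 12h lease time is an assumption
dhcp-range=10.20.50.100,10.20.50.200,12h
# Built-in TFTP server and its root directory
enable-tftp
tftp-root=/srv/tftpboot
# PXE service entry as used in Raspberry Pi's network-boot tutorial
pxe-service=0,"Raspberry Pi Boot"
EOF
}
```

A possible use on the LXC would be `write_netboot_conf /etc/dnsmasq.d/netboot.conf` followed by `systemctl restart dnsmasq`; the file name under `/etc/dnsmasq.d/` is arbitrary.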
## One-time setup (inside the LXC)

From your machine, run the setup script **on the LXC** (replace with your LXC IP if different):

```bash
# From the repo (script runs inside the LXC)
./emmc-provisioning/scripts/setup-network-boot-on-lxc.sh root@10.130.60.141
```

Or SSH into the LXC and run the script there:

```bash
ssh root@10.130.60.141
# Copy or rsync the emmc-provisioning tree into the container, then:
bash /path/to/setup-network-boot-on-lxc.sh
```

The script will:

- Install **dnsmasq** (DHCP + TFTP).
- Configure dnsmasq to listen only on **eth1**, with a DHCP range and TFTP root.
- Create `/srv/tftpboot` and **fetch Raspberry Pi 4 boot files from GitHub** (raspberrypi/firmware, `boot/` folder) if not already present.
- Enable **IPv4 forwarding** and **NAT** (nftables) so clients on eth1 use eth0 for internet.
- Enable and start the **dnsmasq** service.

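For orientation, the forwarding and NAT step could look roughly like the sketch below. It only prints an nftables ruleset; the function name `print_nat_ruleset` and the table name `provisioning_nat` are made up for this example, and the ruleset the setup script actually installs may be different.

```bash
# Hypothetical helper: print an nftables ruleset that masquerades traffic
# from the provisioning LAN out of the WAN interface.
print_nat_ruleset() {
  wan_if=${1:-eth0}             # WAN interface of the LXC
  lan_net=${2:-10.20.50.0/24}   # provisioning LAN behind eth1
  cat <<EOF
table ip provisioning_nat {
  chain postrouting {
    type nat hook postrouting priority 100; policy accept;
    ip saddr $lan_net oifname "$wan_if" masquerade
  }
}
EOF
}
```

To apply it by hand, one could run `sysctl -w net.ipv4.ip_forward=1` and then `print_nat_ruleset | nft -f -` as root on the LXC. Neither command makes the change persistent across reboots.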
## Proxmox: adding eth1 to the LXC

If you create the container by hand or want a second interface:

1. On the **Proxmox host**, add a second network device to the container, e.g.:

   ```bash
   pct set <CTID> --net1 name=eth1,bridge=vmbr1,ip=10.20.50.1/24
   ```

   Use the bridge that corresponds to the physical LAN where reTerminals are connected (e.g. `vmbr1` or a dedicated provisioning bridge).

2. Inside the LXC, ensure **eth1** has a static address (e.g. in `/etc/network/interfaces`):

   ```
   auto eth1
   iface eth1 inet static
       address 10.20.50.1/24
   ```

Your current LXC already has eth0 (10.130.60.141) and eth1 (10.20.50.1); the setup script only adds DHCP, TFTP, and NAT.

## After setup: reTerminal network boot

1. Set the reTerminal **boot order** so that network boot is attempted (e.g. `BOOT_ORDER=0x21`; see cloud-init/first-boot). The Raspberry Pi bootloader reads the digits right to left, so `0x21` tries SD/eMMC (`1`) first and then network (`2`); use `0x12` to try network before eMMC.
2. Connect the reTerminal to the **same network as the LXC’s eth1** (e.g. 10.20.50.0/24).
3. Power on; it will get an IP via DHCP and load boot files via TFTP from the LXC.
4. For **provisioning** (Backup/Deploy), the netboot environment must run **network-client/provisioning-client.sh** with `PROVISIONING_SERVER=http://10.20.50.1:5000` so it talks to the dashboard on the LXC.

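Because `BOOT_ORDER` is evaluated one hex digit at a time starting from the rightmost digit, it is easy to misread. The sketch below decodes a value into the order the bootloader would try; `decode_boot_order` is a helper written for this document, not a script from the repo, and the mode names follow the Raspberry Pi bootloader documentation.

```bash
# Hypothetical helper: print the boot modes of a BOOT_ORDER value in the
# order they are attempted (rightmost hex digit first).
decode_boot_order() {
  value=${1#0x}
  value=${value#0X}
  out=""
  while [ -n "$value" ]; do
    digit=${value#"${value%?}"}   # last remaining digit
    value=${value%?}              # drop it for the next round
    case "$digit" in
      0) name="SD-DETECT" ;;
      1) name="SD/eMMC" ;;
      2) name="NETWORK" ;;
      3) name="RPIBOOT" ;;
      4) name="USB-MSD" ;;
      5) name="BCM-USB-MSD" ;;
      6) name="NVME" ;;
      7) name="HTTP" ;;
      e|E) name="STOP" ;;
      f|F) name="RESTART" ;;
      *) name="UNKNOWN($digit)" ;;
    esac
    out="${out:+$out }$name"
  done
  printf '%s\n' "$out"
}
```

With this helper, `decode_boot_order 0x21` prints `SD/eMMC NETWORK`, and `decode_boot_order 0xf12` prints `NETWORK SD/eMMC RESTART`.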
## TFTP boot files (Raspberry Pi 4 / CM4)

The setup script **automatically downloads** the official Raspberry Pi firmware `boot/` folder from GitHub (https://github.com/raspberrypi/firmware) into `/srv/tftpboot` when `start4cd.elf` is missing. No manual copy is needed.

To refresh or populate TFTP without re-running the full setup:

```bash
./emmc-provisioning/scripts/populate-tftpboot-from-git.sh root@<LXC-IP>
```

(Remove `/srv/tftpboot/start4cd.elf` on the LXC first if you want a full re-fetch.)

The TFTP root contains e.g. `start4cd.elf`, `fixup4cd.dat`, `config.txt`, `cmdline.txt`, `kernel8.img`, and other boot files. For a custom kernel or initramfs (e.g. for provisioning), add or replace files in `/srv/tftpboot` and adjust `config.txt` / `cmdline.txt` as needed.

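A quick way to confirm that the TFTP root holds the files named above is a small presence check. The sketch below is an illustration only: `check_tftp_root` is a hypothetical helper, and the file list is just the examples mentioned in this section, not an exhaustive list of what a given kernel or board needs.

```bash
# Hypothetical helper: report which of the example boot files are missing
# (or empty) in the TFTP root. Returns non-zero if any are missing.
check_tftp_root() {
  tftp_root=${1:-/srv/tftpboot}
  missing=0
  for f in start4cd.elf fixup4cd.dat config.txt cmdline.txt kernel8.img; do
    if [ ! -s "$tftp_root/$f" ]; then
      echo "missing or empty: $tftp_root/$f"
      missing=1
    fi
  done
  return "$missing"
}
```

On the LXC, `check_tftp_root && echo "TFTP root looks complete"` would print the confirmation only when all five files exist and are non-empty.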
## DHCP leases

On the LXC, dnsmasq stores DHCP leases in **`/var/lib/misc/dnsmasq.leases`** (Debian/Ubuntu default). To see which devices got an IP on the provisioning LAN:

```bash
# On the LXC (or via SSH)
cat /var/lib/misc/dnsmasq.leases
```

Each line is: *expiry_epoch MAC IP hostname client_id*. Example: `1734567890 aa:bb:cc:dd:ee:ff 10.20.50.101 reterminal 01:aa:bb:cc:dd:ee:ff`

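If only the MAC, IP, and hostname columns are of interest, the lease file can be trimmed with a short awk filter. This is a sketch under the field layout described above; `list_leases` is a name chosen for this example, and dnsmasq writes `*` when a client sent no hostname.

```bash
# Hypothetical helper: print "MAC IP hostname" for each lease line.
list_leases() {
  lease_file=${1:-/var/lib/misc/dnsmasq.leases}
  # Skip lines with fewer than four fields (e.g. a DHCPv6 "duid" line)
  awk 'NF >= 4 {
    host = ($4 == "*") ? "(no hostname)" : $4
    printf "%s %s %s\n", $2, $3, host
  }' "$lease_file"
}

# Usage on the LXC: list_leases
# Usage with another file: list_leases /path/to/dnsmasq.leases
```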
---

## Testing network boot

1. **Prerequisites**
   - reTerminal has **BOOT_ORDER=0x21** (network boot in the boot order; digits are read right to left, so with `0x21` network is tried after SD/eMMC). Check on the device:
     `ssh pi@<device-ip> 'bash -s' < emmc-provisioning/scripts/check-network-boot-priority.sh`
   - LXC network-boot options are **enabled**: on the LXC run
     `/opt/cm4-provisioning/toggle-network-boot-dhcp.sh status` → should print `enabled`. If not: `toggle-network-boot-dhcp.sh enable`
   - reTerminal is on the **same LAN as the LXC’s eth1** (e.g. 10.20.50.0/24), Ethernet connected.

2. **Power cycle the reTerminal** (or reboot if it’s already running). It will request DHCP, get options 66/67 (TFTP server + boot file), then fetch boot files from the LXC via TFTP.

3. **What “working” looks like**
   - **On the LXC**: a new lease appears in `/var/lib/misc/dnsmasq.leases` (device MAC + IP in 10.20.50.x).
   - If the netboot environment runs **provisioning-client.sh** and registers with the dashboard: the device appears under **“Device detected (Network)”** on the dashboard (`http://<LXC-IP>:5000`), and you can choose Backup/Deploy.
   - If you only use “plain” Pi netboot (no custom initramfs/provisioning client): you just see the DHCP lease and the device loading files via TFTP; it may boot to a minimal kernel/initramfs or NFS root depending on your TFTP config.

4. **Quick test without a reTerminal**
   - From a Linux host on the same VLAN as eth1, run
     `sudo dhclient -v eth0` (or your interface) and check that you get an IP in 10.20.50.x and, if netboot is enabled, that the DHCP reply includes option 66 (TFTP server) and 67 (boot file).
   - Or on the LXC run `tcpdump -i eth1 -n port 67 or port 68 or port 69` and power on the reTerminal: you should see DHCP (Discover/Offer/Request/Ack) and then TFTP read requests.

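The "new lease appears" check can be scripted by polling the lease file for a specific MAC address. The sketch below is one possible way to do that; `wait_for_lease` and its 60-second default timeout are assumptions for this example, not part of the provisioning scripts.

```bash
# Hypothetical helper: wait until a MAC shows up in the dnsmasq lease file,
# then print its IP. Returns non-zero if the timeout (seconds) expires.
wait_for_lease() {
  mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')   # compare in lower case
  lease_file=${2:-/var/lib/misc/dnsmasq.leases}
  timeout=${3:-60}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    ip=$(awk -v mac="$mac" 'tolower($2) == mac { print $3; exit }' \
      "$lease_file" 2>/dev/null)
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}

# Usage on the LXC, right before power cycling the device:
#   wait_for_lease aa:bb:cc:dd:ee:ff && echo "device got a lease"
```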
---

## Monitoring on the LXC

| What to check | How |
|--------------|-----|
| **Network boot enabled?** | `/opt/cm4-provisioning/toggle-network-boot-dhcp.sh status` → `enabled` or `disabled` |
| **DHCP leases** | `cat /var/lib/misc/dnsmasq.leases`: lists MAC, IP, hostname for devices that got an IP from dnsmasq on eth1 |
| **dnsmasq (DHCP/TFTP) running** | `systemctl status dnsmasq` or `service dnsmasq status` |
| **TFTP root present** | `ls -la /srv/tftpboot/`: should contain e.g. `start4cd.elf`, `fixup4cd.dat`, `config.txt`, `kernel8.img` |
| **Live DHCP/TFTP traffic** | `tcpdump -i eth1 -n port 67 or port 68 or port 69` (67/68 = DHCP, 69 = TFTP). Run while powering on a device. |
| **Dashboard: network devices** | Open `http://<LXC-IP>:5000`; under “Device detected (Network)” you see devices that have called `POST /api/register-device` (only if your netboot environment runs the provisioning client). |
| **Registered devices (raw)** | `cat /var/lib/cm4-provisioning/network_devices.json` (if the dashboard uses the default path): list of MAC, IP, action. |

Optional: enable verbose dnsmasq logging to see every DHCP request. Add to a config in `/etc/dnsmasq.d/` (e.g. `log-queries.conf`): `log-dhcp` (detailed DHCP transactions), `log-queries` (DNS queries), and `log-facility=/var/log/dnsmasq.log`, then create the log file and run `systemctl restart dnsmasq`; dnsmasq does not re-read its configuration on reload. Check your distro’s dnsmasq doc for log location.

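The optional logging setup can be captured in a small drop-in file. The sketch below writes one; `write_dnsmasq_logging_conf` is a hypothetical helper, and the log path is the example from this section, which may not match your distro's defaults.

```bash
# Hypothetical helper: write a dnsmasq drop-in that enables verbose logging.
write_dnsmasq_logging_conf() {
  conf_path=$1
  log_path=${2:-/var/log/dnsmasq.log}
  cat > "$conf_path" <<EOF
# Log DHCP transactions in detail
log-dhcp
# Log DNS queries
log-queries
# Send dnsmasq logs to a dedicated file instead of syslog
log-facility=$log_path
EOF
}

# Usage on the LXC (file name is an example):
#   write_dnsmasq_logging_conf /etc/dnsmasq.d/log-queries.conf
#   touch /var/log/dnsmasq.log
#   systemctl restart dnsmasq
```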
---

## Summary

| Component | Where | Purpose |
|-------------|--------|--------|
| eth0 | LXC | WAN; LXC’s internet |
| eth1 | LXC | LAN; 10.20.50.1/24; DHCP + TFTP |
| dnsmasq | LXC | DHCP (on eth1) + TFTP |
| TFTP root | LXC | e.g. `/srv/tftpboot` with RPi boot files |
| NAT | LXC | 10.20.50.0/24 → eth0 so LAN has internet |