nearxos 7e1bf8a4c2 Add DHCP network boot management to API and UI
Enhance the dashboard API with new endpoints for managing DHCP network boot options, allowing devices to enable or disable network boot via POST requests. Update the device action handling to include a 'reboot' action, specifically for network devices. Modify the home.html template to display the current state of network boot and provide a button for disabling it. Update provisioning scripts to disable network boot after deployment or backup completion, ensuring devices boot from eMMC on the next startup. Improve user feedback and error handling throughout the changes.
2026-02-20 17:05:38 +02:00

# How network boot deployment works
This describes the full flow from power-on to eMMC deploy/backup when using **network boot** with the provisioning LXC.

---
## Overview
1. **reTerminal** is set to try **network boot first** (EEPROM `BOOT_ORDER=0x12`).
2. It is connected to the **same LAN as the LXC's eth1** (e.g. 10.20.50.0/24).
3. On power-on it gets an IP via **DHCP** and loads **boot files via TFTP** from the LXC.
4. The **netboot environment** (kernel + rootfs) runs **provisioning-client.sh**, which registers with the **dashboard** and polls for an action.
5. In the **dashboard** you see the device under “Device detected (Network)” and choose **Deploy** or **Backup**.
6. The device performs the action (download image → write eMMC, or read eMMC → upload), then you can reboot to run from eMMC.
---
## Step-by-step
### 1. LXC (provisioning server)
- **eth0** = WAN (e.g. 10.130.60.141); gives the LXC internet access.
- **eth1** = LAN (e.g. 10.20.50.1/24):
  - **dnsmasq**: DHCP on eth1 (e.g. pool 10.20.50.100–10.20.50.200) and **TFTP**, with next-server = 10.20.50.1 and boot file = `start4cd.elf`.
  - **TFTP root** `/srv/tftpboot`: Raspberry Pi 4/CM4 boot files (from GitHub: start4cd.elf, fixup4cd.dat, kernel8.img, etc.).
  - **NAT**: traffic from 10.20.50.0/24 is masqueraded out of eth0, so netbooted devices have internet access if needed.

The **dashboard** (Flask) runs in the LXC and is reachable from the LAN at e.g. `http://10.20.50.1:5000`. The **golden image** for Deploy lives at `/var/lib/cm4-provisioning/golden.img` (in the same LXC, or bind-mounted from the host).
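
As a rough illustration of the dnsmasq side, the sketch below renders the two pieces of configuration this setup implies: an always-on part (DHCP on eth1 plus TFTP) and a separate netboot part carrying options 66/67, so that the latter can be switched off without stopping DHCP. It is a hedged sketch, not the contents of **setup-network-boot-on-lxc.sh**: the exact directive set, the 12h lease time and the split into two strings are assumptions, so compare it against **NETWORK-BOOT-LXC.md** before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanConfig:
    # Example values from this document; adjust for your LAN.
    interface: str = "eth1"
    server_ip: str = "10.20.50.1"
    pool_start: str = "10.20.50.100"
    pool_end: str = "10.20.50.200"
    lease: str = "12h"  # assumption: the lease time is not specified in the doc
    tftp_root: str = "/srv/tftpboot"
    boot_file: str = "start4cd.elf"


def base_snippet(cfg: LanConfig) -> str:
    """Always-on part: DHCP on the LAN interface plus dnsmasq's built-in TFTP server."""
    return "\n".join([
        f"interface={cfg.interface}",
        f"dhcp-range={cfg.pool_start},{cfg.pool_end},{cfg.lease}",
        "enable-tftp",
        f"tftp-root={cfg.tftp_root}",
    ]) + "\n"


def netboot_snippet(cfg: LanConfig) -> str:
    """Toggleable part: option 66 (TFTP server name) and option 67 (boot file name)."""
    return "\n".join([
        f"dhcp-option=66,{cfg.server_ip}",
        f"dhcp-option=67,{cfg.boot_file}",
    ]) + "\n"


if __name__ == "__main__":
    cfg = LanConfig()
    print(base_snippet(cfg), end="")
    print(netboot_snippet(cfg), end="")
```

Keeping the netboot options in their own string (and, on the LXC, their own file) is what lets them be removed independently of the DHCP pool.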
### 2. reTerminal (device)
- **EEPROM**: `BOOT_ORDER=0x12` (network first, then SD/eMMC; the bootloader reads the digits right to left). This can be set by a cloud-init first-boot step on an already-flashed device.
- **Network**: Ethernet connected to the same segment as the LXC's **eth1** (e.g. same switch/VLAN as 10.20.50.0/24).
- On **power-on**:
  1. The Pi 4/CM4 firmware does **DHCP** on the wired interface.
  2. The DHCP reply gives: IP (e.g. 10.20.50.100), **next-server (TFTP)** = 10.20.50.1, **boot filename** = start4cd.elf.
  3. The device **TFTP**s the boot files from the LXC (start4cd.elf, fixup4cd.dat, kernel, DTB, etc.).
  4. It boots the **kernel** (and optionally an initramfs or NFS root). That environment must have **network**, **curl**, and **provisioning-client.sh**.
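
`BOOT_ORDER` is read one hex digit at a time, starting from the rightmost digit. The short sketch below decodes a value into the order in which the bootloader tries each mode. The mode table is a subset of the digits listed in the Raspberry Pi bootloader documentation, and the code only illustrates the encoding; it does not read or write the EEPROM.

```python
# Subset of the boot-mode digits used by the Raspberry Pi 4 / CM4 bootloader.
BOOT_MODES = {
    0x1: "SD/eMMC",
    0x2: "NETWORK",
    0x3: "RPIBOOT",
    0x4: "USB-MSD",
    0x5: "BCM-USB-MSD",
    0x6: "NVME",
    0xE: "STOP",
    0xF: "RESTART",
}


def decode_boot_order(value: int) -> list[str]:
    """Return boot modes in the order the bootloader tries them.

    The least significant hex digit is tried first, then the next one
    to its left, and so on.
    """
    modes = []
    while value:
        digit = value & 0xF  # rightmost remaining hex digit
        modes.append(BOOT_MODES.get(digit, f"UNKNOWN(0x{digit:x})"))
        value >>= 4          # move on to the next digit to the left
    return modes


if __name__ == "__main__":
    for order in (0x12, 0x21, 0xF12):
        print(f"BOOT_ORDER=0x{order:X}: {' -> '.join(decode_boot_order(order))}")
```

For example, `0x12` decodes to network, then SD/eMMC, while `0x21` decodes to SD/eMMC, then network.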
### 3. Netboot root / environment
The **TFTP**-loaded kernel (and optional initramfs/NFS root) must end up in an environment where:
- The device has an IP on the same LAN as the LXC (already from DHCP).
- **provisioning-client.sh** is present and run (e.g. from init, a login script, or a systemd service).
- **PROVISIONING_SERVER** is set to the dashboard URL on the LXC's LAN IP, e.g.
  `PROVISIONING_SERVER=http://10.20.50.1:5000`

So the “netboot environment” is one of the following:
- A **custom initramfs** (recommended): build it with **network-boot-initramfs/build.sh**, copy **initrd.img** to the TFTP root, and add `initramfs initrd.img followkernel` to **config.txt**. The initramfs brings up the network and runs the provisioning client. See **network-boot-initramfs/README.md**.
- A **minimal rootfs** (e.g. over NFS) that runs the client script at boot.
- Any other setup that gets the client running with a network connection and the correct `PROVISIONING_SERVER`.
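
If you go the initramfs route, the `config.txt` change is a one-line edit that is worth making idempotent. A minimal sketch, assuming `config.txt` sits in the TFTP root next to `initrd.img` (adjust the path if your layout differs); `ensure_initramfs_line` is a hypothetical helper, not part of **network-boot-initramfs/build.sh**.

```python
from pathlib import Path

INITRAMFS_LINE = "initramfs initrd.img followkernel"


def ensure_initramfs_line(config_txt: Path, line: str = INITRAMFS_LINE) -> bool:
    """Append the initramfs directive to config.txt unless it is already present.

    Returns True if the file was changed, False if the line was already there.
    """
    text = config_txt.read_text() if config_txt.exists() else ""
    existing = [ln.strip() for ln in text.splitlines()]
    if line in existing:
        return False
    # Make sure the new directive starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    config_txt.write_text(text + line + "\n")
    return True
```

For example, `ensure_initramfs_line(Path("/srv/tftpboot/config.txt"))` adds the line on the first run and leaves the file untouched on later runs.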
### 4. Provisioning client (on the device)
- **provisioning-client.sh**:
  1. **Registers**: `POST /api/register-device` with its MAC and IP.
  2. **Polls**: `GET /api/device-action-poll?mac=...` every few seconds.
  3. When the dashboard returns **action = deploy** (with a **url**), it downloads the image from **url** and writes it with `dd of=/dev/mmcblk0`.
  4. When the dashboard returns **action = backup** (with an **upload_url**), it reads the eMMC with `dd if=/dev/mmcblk0` and POSTs the stream to **upload_url**.
  5. It then calls `POST /api/action-done` and exits; after a deploy you can reboot into eMMC.
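
The client's protocol can be sketched in Python as a register, poll, act, report loop. This illustrates the flow and is not **provisioning-client.sh** itself: the JSON request bodies (`mac`, `ip`, and the `action-done` payload) and the shape of an idle poll reply are assumptions; only the endpoint paths and the `action`, `url` and `upload_url` fields come from this document. The disk I/O is passed in as `deploy` and `backup` callables, which on the device would stream `url` into `/dev/mmcblk0` and `/dev/mmcblk0` to `upload_url`.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable


def http_post_json(url: str, payload: dict) -> None:
    # Assumption: the dashboard accepts JSON bodies on its POST endpoints.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30):
        pass


def http_get_json(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def run_client(
    server: str,
    mac: str,
    ip: str,
    deploy: Callable[[str], None],   # streams the image at url onto the eMMC
    backup: Callable[[str], None],   # streams the eMMC to upload_url
    get_json: Callable[[str], dict] = http_get_json,
    post_json: Callable[[str, dict], None] = http_post_json,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Register, poll until an action arrives, run it, then report completion."""
    post_json(f"{server}/api/register-device", {"mac": mac, "ip": ip})
    poll_url = f"{server}/api/device-action-poll?" + urllib.parse.urlencode({"mac": mac})
    while True:
        reply = get_json(poll_url)
        action = reply.get("action")
        if action == "deploy":
            deploy(reply["url"])
        elif action == "backup":
            backup(reply["upload_url"])
        else:
            sleep(interval)  # nothing chosen in the dashboard yet
            continue
        # The server disables the DHCP netboot options when it receives this call.
        post_json(f"{server}/api/action-done", {"mac": mac, "action": action})
        return action
```

Injecting `get_json`, `post_json` and `sleep` keeps the loop testable without a dashboard or an eMMC device.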
### 5. Dashboard (your actions)
- You open the dashboard at `http://10.20.50.1:5000` (or the LXC's WAN IP if you're not on the provisioning LAN).
- Under **“Device detected (Network)”** you see the device (identified by MAC).
- You click **Deploy**, **Backup**, or **Disable network boot**.
- **Deploy** / **Backup**: the dashboard sets the action and URL; the client runs `dd` and `curl`, then calls **POST /api/action-done**, which **disables the DHCP network-boot options** on the LXC so the device boots from eMMC on the next reboot. There is no need to unplug the Ethernet cable.
- **Disable network boot**: turns off DHCP options 66/67 (TFTP server name, boot file name) on the LXC. The DHCP server keeps running; devices simply stop receiving netboot information and boot from local storage (eMMC) next time. Use this when you don't want to deploy or back up: the netbooted device can then be rebooted into eMMC.
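
One way to implement such a toggle, assuming the options 66/67 live in their own dnsmasq snippet, is to write or delete that file and restart dnsmasq. A minimal Python sketch of that idea follows; the snippet path and the restart command are assumptions, and the real logic lives in **/opt/cm4-provisioning/toggle-network-boot-dhcp.sh**, which may work differently.

```python
import subprocess
from pathlib import Path
from typing import Callable

# Hypothetical snippet location; the real toggle script may use another path.
DEFAULT_SNIPPET = Path("/etc/dnsmasq.d/netboot-options.conf")
NETBOOT_OPTIONS = "dhcp-option=66,10.20.50.1\ndhcp-option=67,start4cd.elf\n"


def restart_dnsmasq() -> None:
    # dnsmasq re-reads its configuration files only on restart.
    subprocess.run(["systemctl", "restart", "dnsmasq"], check=True)


def set_network_boot(enabled: bool, snippet: Path = DEFAULT_SNIPPET,
                     reload: Callable[[], None] = restart_dnsmasq) -> None:
    """Enable: write options 66/67. Disable: remove them. DHCP itself keeps running."""
    if enabled:
        snippet.write_text(NETBOOT_OPTIONS)
    else:
        snippet.unlink(missing_ok=True)  # disabling twice is harmless
    reload()


def network_boot_status(snippet: Path = DEFAULT_SNIPPET) -> str:
    return "enabled" if snippet.exists() else "disabled"
```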
---
## Data flow summary
| Stage | Where | What happens |
|-------------|--------------|--------------|
| Boot | reTerminal | DHCP (get IP + next-server + boot file), then TFTP (load start4cd.elf, kernel, etc.). |
| Boot | reTerminal | Kernel (and netboot root) start; run **provisioning-client.sh** with `PROVISIONING_SERVER=http://10.20.50.1:5000`. |
| Register | Device → LXC | POST /api/register-device (MAC, IP). |
| Poll | Device → LXC | GET /api/device-action-poll?mac=... every 5 s. |
| Your choice | You → LXC | In dashboard: click Deploy or Backup for that device. |
| Deploy | LXC → device | Client GETs image URL, streams to `dd of=/dev/mmcblk0`. |
| Backup | Device → LXC | Client reads the eMMC with `dd if=/dev/mmcblk0` and POSTs the stream to upload_url. |
| After | Device → LXC | Client calls **POST /api/action-done**; server disables DHCP netboot options. |
| After | reTerminal | Reboot; device boots from eMMC (no netboot advertised). |
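
Seen from the LXC, the same flow is a small per-device state machine. The in-memory model below is hypothetical and is not the dashboard's Flask code. It keeps a chosen action pending until `action-done` arrives (whether the real dashboard does this or clears it on first poll is an assumption) and calls an injected `disable_netboot` hook at that point.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Device:
    mac: str
    ip: str
    pending: Optional[dict] = None  # e.g. {"action": "deploy", "url": ...}


@dataclass
class ProvisioningState:
    """Hypothetical in-memory model of the dashboard side of the data flow."""
    disable_netboot: Callable[[], None]
    devices: dict[str, Device] = field(default_factory=dict)

    def register(self, mac: str, ip: str) -> None:
        # POST /api/register-device: add the device or refresh its IP.
        dev = self.devices.get(mac)
        if dev is None:
            self.devices[mac] = Device(mac, ip)
        else:
            dev.ip = ip

    def choose(self, mac: str, action: dict) -> None:
        # You click Deploy or Backup; raises KeyError for an unknown MAC.
        self.devices[mac].pending = action

    def poll(self, mac: str) -> dict:
        # GET /api/device-action-poll: the pending action, if any.
        dev = self.devices.get(mac)
        if dev is None or dev.pending is None:
            return {"action": None}
        return dev.pending

    def action_done(self, mac: str) -> None:
        # POST /api/action-done: clear the action and stop advertising netboot.
        self.devices[mac].pending = None
        self.disable_netboot()
```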
---
## What you need in place
- **LXC**: eth1 = 10.20.50.1/24, dnsmasq (DHCP + TFTP on eth1; netboot options 66/67 in a separate snippet so they can be toggled), `/srv/tftpboot` with RPi 4 boot files, NAT for 10.20.50.0/24 via eth0. Toggle script **/opt/cm4-provisioning/toggle-network-boot-dhcp.sh** (enable/disable/status). Dashboard running, `golden.img` present for Deploy.
See **NETWORK-BOOT-LXC.md** and **setup-network-boot-on-lxc.sh**.
- **reTerminal**: EEPROM boot order = network first; Ethernet on 10.20.50.0/24; netboot environment that runs **provisioning-client.sh** with `PROVISIONING_SERVER=http://10.20.50.1:5000`.
- **Netboot root**: must provide network, curl, and the client script (NFS, initramfs, or custom root).

The **TFTP** setup only gets the Pi to boot a kernel (and optional root). The **provisioning** (Deploy/Backup) is done by that kernel's environment running **provisioning-client.sh** against the dashboard on the LXC.
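
A quick pre-flight check for the LXC side of this checklist can be as simple as testing that the files named in this document exist. A minimal sketch; the path list is illustrative and not exhaustive (the dnsmasq configuration, NAT rules and the running dashboard are not checked), and the `root` parameter is a hypothetical convenience so the check can be pointed at a staging tree.

```python
from pathlib import Path

# Paths named in this document; not an exhaustive list of requirements.
REQUIRED_FILES = [
    "/srv/tftpboot/start4cd.elf",
    "/srv/tftpboot/fixup4cd.dat",
    "/srv/tftpboot/kernel8.img",
    "/var/lib/cm4-provisioning/golden.img",
    "/opt/cm4-provisioning/toggle-network-boot-dhcp.sh",
]


def missing_files(root: Path = Path("/"), required: list[str] = REQUIRED_FILES) -> list[str]:
    """Return the required paths that are not regular files under root."""
    return [p for p in required if not (root / p.lstrip("/")).is_file()]


if __name__ == "__main__":
    for path in missing_files():
        print(f"missing: {path}")
```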