# Network boot troubleshooting: no DHCP/TFTP during boot, only after OS is up

If you run **tcpdump** during power-on but see **no DHCP/TFTP traffic during boot**, and only see traffic **after** the device has booted to the OS, the reTerminal is almost certainly **not on the same L2 segment as the LXC's eth1**.

## What’s going on

- The Pi’s **bootloader** (EEPROM) sends DHCP Discover on the Ethernet port when it tries network boot.
- That request only reaches interfaces on the **same VLAN / same bridge** (same cable/switch segment).
- dnsmasq in the LXC listens only on **eth1** (provisioning LAN).
- If the reTerminal is plugged into the **main office LAN** (or the same segment as the LXC’s **eth0**), the netboot DHCP **never reaches eth1**, so you see no DHCP/TFTP on eth1 during boot.
- After the OS boots, it uses the same Ethernet port and gets an IP from the main LAN; you then see traffic (e.g. on eth0 or from the device’s new IP). That’s why you only see traffic “after the device boots to OS”.

## What to do

### 1. Confirm which interface sees the boot-time DHCP

On the LXC, run tcpdump on **both** interfaces in two terminals (or run one in the background):

```bash
# Terminal 1: provisioning LAN (where netboot should happen)
tcpdump -i eth1 -n -e port 67 or port 68 or port 69

# Terminal 2: WAN / main LAN
tcpdump -i eth0 -n -e port 67 or port 68 or port 69
```

Then **power off** the reTerminal and **power it on**. Watch where DHCP (and TFTP) appear:

- If you see DHCP **only on eth0** during boot → the reTerminal is on the same segment as **eth0**, not eth1. Netboot is then not using your LXC’s dnsmasq; the device may get an IP from another DHCP server and fall back to eMMC boot.
- If you see DHCP **on eth1** during boot → the reTerminal is on the provisioning segment; you should then see TFTP (port 69) as well.

### 2. Fix: put the reTerminal on the same segment as eth1

- The reTerminal’s Ethernet cable must be connected to the **provisioning** segment: the same VLAN or bridge as the LXC’s **eth1** (e.g. 10.20.50.0/24).
- On Proxmox, eth1 is often on a **dedicated bridge** (e.g. `vmbr1`). The reTerminal must be plugged into a switch port that belongs to that same bridge/VLAN.
- If you have one physical switch: either put the LXC’s eth1 and the reTerminal in the same VLAN, or use a dedicated “provisioning” port group / switch.

### 3. Sanity check: same port as reTerminal

- Plug a **laptop** (or another device) into the **same port** (or same VLAN) as the reTerminal.
- Run: `sudo dhclient -v <interface>` (or let it get DHCP automatically).
- If you get an IP in **10.20.50.x** → that segment is your provisioning LAN (eth1); the reTerminal should netboot from there.
- If you get a different range (e.g. 192.168.x.x) → that segment is **not** the provisioning LAN; move the reTerminal’s cable or VLAN to the segment where 10.20.50.x is served.

## Summary table

| Symptom | Likely cause | Action |
|---------|--------------|--------|
| No DHCP/TFTP on eth1 during boot; traffic only after OS | reTerminal on different segment than eth1 | Plug reTerminal into same VLAN/bridge as LXC eth1 (provisioning LAN) |
| DHCP on eth0 during boot, none on eth1 | reTerminal on same segment as eth0 | Move reTerminal to provisioning segment (same as eth1) |
| No DHCP on any interface during boot | Cable unplugged, `BOOT_ORDER` not trying network first, or device not attempting netboot | Check cable, confirm `BOOT_ORDER` puts network first (`0x12`; the digits are read right to left, so `0x21` tries eMMC first), power cycle with the cable plugged in before power |

---

## I only see DHCP Request/Reply, and the client already has 10.20.50.x

If your tcpdump on **eth1** shows something like:

```text
0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 88:a2:9e:xx:xx:xx
10.20.50.1.67 > 10.20.50.147.68: BOOTP/DHCP, Reply
```

and no TFTP follows, that is most likely **not** the bootloader; it is the **OS** DHCP client (renewal or re-request).
The client already has **10.20.50.147**, so this happens **after** the device has booted to the OS.

- **Bootloader** (network boot): sends **DHCP Discover** (client 0.0.0.0, no IP yet), then you see **Offer**, **Request**, **Ack**, then **TFTP (port 69)** for start4cd.elf, kernel, etc.
- **OS**: sends **DHCP Request** (renew/rebind, often already with an IP or requesting a known one), then **Reply**; no Discover, no TFTP.

tcpdump’s default one-line output only prints the BOOTP opcode (`Request` or `Reply`), so add `-v` to see the actual DHCP message type (Discover, Offer, Request, ACK).

So the device **is** on the right segment (eth1, 10.20.50.x). The problem is that you are not seeing the **bootloader’s** DHCP/TFTP during the first seconds after power-on.

**What to do:**

1. **Start tcpdump before power-on**
   Run `tcpdump -i eth1 -n -e port 67 or port 68 or port 69` on the LXC, **then** power off the reTerminal, wait a few seconds, and power it on. Capture from the first second. Look for:
   - **Discover** (client 0.0.0.0 → broadcast) at the very start → that’s the bootloader.
   - **TFTP (port 69)** right after DHCP Ack → bootloader loading files.
2. If you **never** see Discover or TFTP, only Request/Reply after the OS is up, then the bootloader is either not attempting network boot or is giving up (e.g. link not ready, timeout) and booting from eMMC. Try a full power-off (mains or PSU), wait 10 s, then power on with tcpdump already running.
3. Confirm that `BOOT_ORDER` puts network first on the device (the bootloader reads the digits right to left, so that is `0x12`; `0x21` tries eMMC first) and that Ethernet is connected before power-on.

---

## reTerminal DM: serial console vs USB boot (rpiboot)

**The serial console is not on the same USB as rpiboot.**

| Port / interface | Purpose |
|------------------|---------|
| **USB Type-C** (next to boot-mode switch) | Power, and **rpiboot** when eMMC is disabled (USB device mode). No serial console here. |
| **40-pin GPIO header** (UART) | **Serial console.** Use a USB‑to‑serial adapter; connect its **RX** to **GPIO 14 / TXD (Pin 8)**, its **TX** to **GPIO 15 / RXD (Pin 10)**, and **GND** to any ground pin (e.g. Pin 6). |

**Baud rate:**

- **Bootloader (BOOT_UART=1):** use **115200** 8N1.
  This is the Pi EEPROM/bootloader debug output (network boot attempts, DHCP, TFTP, errors).
- **OS serial login:** some Seeed docs use **9600** for getty; many Pi images use **115200**. If you only care about bootloader messages, use **115200**.

So: use the **USB‑C cable** only for power and rpiboot. For the serial console, use a **USB‑to‑serial adapter** on the **GPIO header** at **115200** to see bootloader output.

---

## Serial shows "Boot mode: SD (01)" and no network attempt

If the bootloader serial output shows something like:

```text
Boot mode: SD (01) order 2
```

and you **never** see a line about network (e.g. "Trying DHCP", "TFTP", or "Boot mode: NET (02)"), then the bootloader is **not** attempting network boot for this boot. It goes straight to SD/eMMC (01). That matches “no DHCP during boot, only after OS”.

**Possible causes:**

1. **BOOT_ORDER wrong, not applied, or not read**
   From the running OS, run `sudo vcgencmd bootloader_config` and check `BOOT_ORDER` (and optionally `NET_BOOT_MAX_RETRIES`, `DHCP_TIMEOUT`, `TFTP_IP`). The bootloader reads `BOOT_ORDER` from the rightmost digit, so `0x21` tries SD/eMMC (`1`) first and network (`2`) only if that fails, which is exactly what `Boot mode: SD (01) order 2` shows; network first with eMMC fallback is `0x12`. If you see different or missing values, the EEPROM config in use at boot may be different (e.g. old EEPROM, or update not applied on cold boot).
2. **Network tried but failed before any DHCP**
   The bootloader may try network, fail very early (e.g. no link, or timeout before sending DHCP), then fall back to SD without printing a “Trying network” line. Slower link-up (switch, cable) can cause this. Increasing `DHCP_TIMEOUT` and `NET_BOOT_MAX_RETRIES` (and setting `TFTP_IP`) gives the best chance.
3. **CM4 / carrier quirk**
   On some CM4 carriers the bootloader may skip or shorten the network attempt. Serial is the only way to see what it actually does; if you never see any network-related line, treat it as “network not attempted” for that boot.
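To make the first cause easier to check, here is a minimal sketch that decodes a `BOOT_ORDER` value into the order the bootloader tries, based on the documented rule that the value is read one hex digit at a time starting from the right. The function names `decode_boot_order` and `boot_order_from_config` are hypothetical helpers for this guide, not part of any Raspberry Pi tool, and only the most common boot modes are named.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are assumptions made for this guide.

# Print the boot modes in the order the bootloader tries them.
# BOOT_ORDER is read from the least-significant (rightmost) hex digit.
decode_boot_order() {
  local value="${1#0[xX]}" out="" digit name i
  for (( i = ${#value} - 1; i >= 0; i-- )); do
    digit="${value:i:1}"
    case "$digit" in
      1)   name="SD/eMMC" ;;
      2)   name="NETWORK" ;;
      3)   name="RPIBOOT" ;;
      4)   name="USB-MSD" ;;
      6)   name="NVME" ;;
      e|E) name="STOP" ;;
      f|F) name="RESTART" ;;
      *)   name="mode-0x$digit" ;;   # modes this sketch does not name
    esac
    out+="${out:+ -> }$name"
  done
  printf '%s\n' "$out"
}

# Extract the BOOT_ORDER value from bootloader config text on stdin.
# On the device: sudo vcgencmd bootloader_config | boot_order_from_config
boot_order_from_config() {
  sed -n 's/^BOOT_ORDER=\(0[xX][0-9a-fA-F]*\).*/\1/p' | tail -n 1
}
```

For example, `decode_boot_order 0x21` prints `SD/eMMC -> NETWORK`, while `decode_boot_order 0x12` prints `NETWORK -> SD/eMMC`. On the device, `decode_boot_order "$(sudo vcgencmd bootloader_config | boot_order_from_config)"` shows the order actually configured.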
**What to try:**

- Re-apply the EEPROM config with network first (`BOOT_ORDER=0x12`; the bootloader reads the digits right to left, so `0x21` tries eMMC first) and longer timeouts (as in NETWORK-BOOT-TROUBLESHOOTING), then do a **full power cycle** (unplug power 10+ s, then power on) with serial connected. Watch from the first character for any “NET”, “DHCP”, “TFTP” or “order” line.
- For a one-off test you can set `BOOT_ORDER=0x2` (network only). If network fails, the device won’t boot (no fallback to SD). Use it only to confirm whether the bootloader tries network and what it prints; then set it back to a value with an eMMC fallback (e.g. `0x12`).

If the full serial log never shows "NET", "DHCP", or "TFTP" and goes straight to "Boot mode: SD (01) order 2", trying `BOOT_ORDER=0x2` (network only) once will force a network attempt and should produce DHCP/TFTP messages on serial.

---

## Boot stops after start4.elf ("PCI0 reset" then nothing)

### What’s actually going on

The **EEPROM bootloader** only does TFTP for config.txt, start4.elf, and fixup4.dat. It then **starts the GPU firmware (start4.elf)** and **stops the network**. The **kernel and initrd are loaded by the GPU firmware**, not by the EEPROM: after “Starting start4.elf”, the GPU is supposed to bring the network back up and TFTP kernel8.img, cmdline.txt, and initrd.img. If you never see TFTP for kernel8.img/initrd.img and the log stops at “PCI0 reset”, the GPU stage is not doing that.

Common causes:

1. **Config not seen by the GPU**: The config the EEPROM fetched (e.g. from `0d1ddbda/config.txt`) must contain `kernel=kernel8.img` and `initramfs initrd.img followkernel`. If that file was a symlink or truncated, the GPU may not see those lines. Use a **real copy** of the full config in the serial dir (see ensure script below).
2. **No visibility into the GPU**: The EEPROM logs stop at “PCI0 reset”; the next step is inside the GPU firmware. To see GPU messages (e.g. network bring-up, TFTP, or errors), add **`uart_2ndstage=1`** to config.txt so the GPU logs to the UART. Then power-cycle and watch for lines like `MESS:... genet: LINK STATUS` or TFTP activity.
3. **Firmware/board quirk**: On some boards or firmware versions the GPU netboot path can fail silently. Ensuring the latest Pi 4/CM4 boot files are in the TFTP root and trying **start4cd.elf** + **fixup4cd.dat** (or leaving defaults) is worth a try.

If the serial log shows **TFTP** for config.txt, start4.elf, fixup4.dat, then **"Starting start4.elf"**, **"Stopping network"**, **"PCI0 reset"**, and **no** TFTP requests for **kernel8.img** or **initrd.img**, use the checks below.

**Fix on the LXC:** ensure `/srv/tftpboot/config.txt` contains the following (and that `0d1ddbda/config.txt` is a real copy with the same content):

```ini
enable_uart=1
kernel=kernel8.img
initramfs initrd.img followkernel
uart_2ndstage=1
```

`enable_uart=1` is required for the kernel serial console when netbooting (otherwise the firmware can set 8250.nr_uarts=0). `uart_2ndstage=1` makes the GPU firmware log to the UART so you see **MESS:** lines after "PCI0 reset" (e.g. network bring-up, TFTP, or errors).

You can run:

```bash
# On the LXC (or from your machine)
ssh root@<lxc-host> 'bash -s' < emmc-provisioning/scripts/ensure-tftpboot-config-kernel-initrd.sh
```

Also ensure the TFTP root has **kernel8.img** and **initrd.img** (and the serial subdir has symlinks or copies). Then power-cycle the device; you should see TFTP_GET for kernel8.img and initrd.img, and then the kernel and initramfs (e.g. rescue shell or provisioning client) run.

**If it still stops after “PCI0 reset”:**

- Make sure **`uart_2ndstage=1`** is in the TFTP config.txt (root and serial copy). Re-run the ensure script so the serial dir gets the updated config, then power-cycle. Watch the serial log for **MESS:** lines from the GPU (e.g. `genet: LINK STATUS`, TFTP, or errors). That shows whether the GPU is bringing the network up and trying to load the kernel.
- On the LXC, confirm the config the device gets has the right size and content:
  `ssh root@<lxc-host> 'wc -c /srv/tftpboot/0d1ddbda/config.txt && grep -E "kernel|initramfs|uart_2ndstage" /srv/tftpboot/0d1ddbda/config.txt'`

---

## Kernel loads but serial stops at "Baud rate change done" (no rescue shell)

If you see the GPU load kernel8.img and initrd.img, then **"Baud rate change done..."** and nothing else (no rescue shell, no kernel messages), the kernel is likely hanging very early because of a **missing or invalid Device Tree**. The GPU log may show **`dterror: Failed to load Device Tree file '?'`**.

The GPU loads files from the **serial-prefix** dir (e.g. `0d1ddbda/`). If the **.dtb** files (e.g. `bcm2711-rpi-cm4.dtb`, `bcm2711-rpi-cm4-io.dtb`) are only in the TFTP root and not in that dir, the firmware can fail to load the right DTB and the kernel gets no valid device tree.

**Fix:** Ensure the TFTP root has the Pi 4/CM4 DTB files (from the [Raspberry Pi firmware](https://github.com/raspberrypi/firmware) `boot/` folder) and that each **serial-prefix** dir has symlinks to them. Re-run the ensure script (it now links `*.dtb` into each serial dir):

```bash
ssh root@<lxc-host> 'bash -s' < emmc-provisioning/scripts/ensure-tftpboot-config-kernel-initrd.sh
```

If the TFTP root has no `*.dtb` files, populate it from the Pi firmware (e.g. run `populate-tftpboot-from-git.sh` or copy `bcm2711-rpi-cm4.dtb`, `bcm2711-rpi-cm4-io.dtb`, and other `bcm2711*.dtb` from the firmware repo into `/srv/tftpboot`), then run the ensure script again and power-cycle the device.

**Serial stops at "Baud rate change done" (no kernel/initramfs output):** On Pi 4/CM4 netboot, the firmware can force **8250.nr_uarts=0**, which disables the kernel serial driver so you get no console after the GPU handoff ([raspberrypi/firmware#1575](https://github.com/raspberrypi/firmware/issues/1575)). The workaround is **`enable_uart=1`** in config.txt (within the first 4KB).
The ensure script adds it; re-run the script so the root and serial-prefix configs have it, then power-cycle. Keep serial at **115200** baud.

---

## TFTP "file .../SERIAL/start4.elf not found": serial-number prefix

The Pi bootloader may request files under a path named after the board serial number (e.g. `0d1ddbda/start4.elf`). If the TFTP root has no such subdirectory, those requests fail and the bootloader falls back to the root (e.g. `start4.elf`). To avoid "not found" for the first requests, create the serial directory on the LXC and symlink the boot files:

```bash
# On the LXC (replace 0d1ddbda with your Pi's serial from vcgencmd or serial output)
mkdir -p /srv/tftpboot/0d1ddbda
cd /srv/tftpboot/0d1ddbda
for f in start4.elf start4cd.elf start.elf fixup4.dat fixup4cd.dat config.txt cmdline.txt kernel8.img initrd.img; do
  # Link only the files that actually exist in the TFTP root
  [ -f "../$f" ] && ln -sf "../$f" "$f"
done
```

After that, the bootloader’s first TFTP requests succeed. On this setup the directory for serial `0d1ddbda` had already been created.

---

## Stuck in network-only boot (BOOT_ORDER=0x2): get back to Raspbian and change boot order

If you set **BOOT_ORDER=0x2** (network only) for testing, the device will never try eMMC. To get back to Raspbian and set **BOOT_ORDER=0x1** or **0x21**, use **rescue mode**: the network boot chain loads the provisioning initramfs; with a special kernel cmdline it drops to a shell so you can mount eMMC and run **rpi-eeprom-config** from the eMMC install.
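To illustrate how a cmdline flag such as `provisioning_rescue=1` can switch an initramfs into a rescue shell, here is a hedged sketch of the detection step. It is not the project's actual init script: the function name `cmdline_has_flag` is hypothetical, and it is written in bash to match the other examples here, whereas a BusyBox-based initramfs would need a plain `sh` equivalent.

```bash
#!/usr/bin/env bash
# Sketch only: not the real initramfs code; the function name is an assumption.

# Return 0 if the kernel command line contains the exact word $1.
# $2 is the file to read (defaults to /proc/cmdline) so it can be tested offline.
cmdline_has_flag() {
  local flag="$1" file="${2:-/proc/cmdline}" word
  local -a words=()
  # The cmdline is a single space-separated line; split it into words.
  read -r -a words < "$file" || true
  for word in "${words[@]}"; do
    [[ "$word" == "$flag" ]] && return 0
  done
  return 1
}

# Possible use inside an init script:
#   if cmdline_has_flag provisioning_rescue=1; then
#     echo "=== RESCUE MODE (provisioning_rescue=1) ==="
#     exec /bin/sh
#   fi
```

Matching whole words rather than substrings avoids treating a longer, different flag (for example `provisioning_rescue=10`) as a match.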
### Prerequisites

- **Initramfs with rescue support**: Build the initramfs (it includes `/rescue-eeprom.sh`) and copy it to the LXC TFTP root and into the serial dir:

  ```bash
  cd emmc-provisioning/network-boot-initramfs && ./build.sh
  scp initrd.img root@<lxc-host>:/srv/tftpboot/
  ssh root@<lxc-host> 'cp /srv/tftpboot/initrd.img /srv/tftpboot/0d1ddbda/ 2>/dev/null || true'
  ```

- **TFTP config**: Ensure `/srv/tftpboot/config.txt` (and thus `0d1ddbda/config.txt` if it’s a symlink) has `kernel=kernel8.img` and `initramfs initrd.img followkernel` so the full kernel+initrd chain runs.

### Steps

1. **On the LXC**, enable rescue for this device by serving a cmdline that includes **provisioning_rescue=1**. The Pi loads `0d1ddbda/cmdline.txt`; replace that with a **real file** (not a symlink) so this device gets the rescue cmdline:

   ```bash
   # On the LXC (replace 0d1ddbda with your Pi serial if different)
   CD="/srv/tftpboot/0d1ddbda"
   rm -f "$CD/cmdline.txt"
   # Same as root cmdline plus rescue flag (one line, space-separated)
   tr '\n' ' ' < /srv/tftpboot/cmdline.txt > "$CD/cmdline.txt"
   printf ' provisioning_rescue=1\n' >> "$CD/cmdline.txt"
   ```

2. **Power on the reTerminal** (or reboot). It will network boot, load kernel + initramfs, and **rescue mode** will start a shell (serial or console). You should see: `=== RESCUE MODE (provisioning_rescue=1) ===`

3. **In the rescue shell**, run the helper to mount eMMC and run the EEPROM config from the eMMC install:

   ```bash
   /rescue-eeprom.sh
   ```

   In the editor that opens, set **BOOT_ORDER=0x1** (eMMC only) or **0x21** (eMMC first, then network; the bootloader reads the digits right to left). Save and exit the editor.

4. **Reboot** from the rescue shell:

   ```bash
   reboot
   ```

   The bootloader will apply the EEPROM update and on the next boot use the new order (eMMC only with 0x1, or eMMC then network with 0x21).

5. **On the LXC**, restore the normal cmdline for the device so the next network boot runs the provisioning client, not rescue:

   ```bash
   rm -f /srv/tftpboot/0d1ddbda/cmdline.txt
   ln -s ../cmdline.txt /srv/tftpboot/0d1ddbda/cmdline.txt
   ```

See also **NETWORK-BOOT-LXC.md** for setup and monitoring.
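
Because the rescue procedure above edits the per-device cmdline twice (a real file with the rescue flag, then back to a symlink), it can help to wrap both directions in small functions. The following is a minimal sketch under stated assumptions: `enable_rescue`, `disable_rescue`, and the `TFTP_ROOT` variable are hypothetical names introduced here, and the layout (`<root>/cmdline.txt` plus `<root>/<serial>/cmdline.txt`) is the one described in this guide.

```bash
#!/usr/bin/env bash
# Sketch only: function and variable names are assumptions for this guide.
TFTP_ROOT="${TFTP_ROOT:-/srv/tftpboot}"

# Serve a rescue cmdline to one device. $1 = Pi serial, e.g. 0d1ddbda
enable_rescue() {
  local dir="$TFTP_ROOT/$1" base
  mkdir -p "$dir"
  rm -f "$dir/cmdline.txt"
  # Flatten the root cmdline to one line and drop trailing spaces
  base="$(tr '\n' ' ' < "$TFTP_ROOT/cmdline.txt" | sed 's/ *$//')"
  # Write a real file (not a symlink) with the rescue flag appended
  printf '%s provisioning_rescue=1\n' "$base" > "$dir/cmdline.txt"
}

# Go back to the shared cmdline so the provisioning client runs again.
disable_rescue() {
  local dir="$TFTP_ROOT/$1"
  rm -f "$dir/cmdline.txt"
  ln -s ../cmdline.txt "$dir/cmdline.txt"
}
```

On the LXC this would be used as `enable_rescue 0d1ddbda` before the rescue boot and `disable_rescue 0d1ddbda` afterwards; setting `TFTP_ROOT` to a scratch directory lets you try it without touching the live TFTP root.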