Network boot troubleshooting: no DHCP/TFTP during boot, only after OS is up
If you run tcpdump during power-on but see no DHCP/TFTP traffic during boot, and only see traffic after the device has booted to the OS, the reTerminal is almost certainly not on the same L2 segment as the LXC's eth1.
What’s going on
- The Pi’s bootloader (EEPROM) sends DHCP Discover on the Ethernet port when it tries network boot.
- That request only reaches interfaces on the same VLAN / same bridge (same cable/switch segment).
- dnsmasq in the LXC listens only on eth1 (provisioning LAN).
- If the reTerminal is plugged into the main office LAN (or the same segment as the LXC’s eth0), the netboot DHCP never reaches eth1 — so you see no DHCP/TFTP on eth1 during boot.
- After the OS boots, it uses the same Ethernet port and gets an IP from the main LAN; you then see traffic (e.g. on eth0 or from the device’s new IP). That’s why you only see traffic “after the device boots to OS”.
What to do
1. Confirm which interface sees the boot-time DHCP
On the LXC, run tcpdump on both interfaces in two terminals (or run one in background):
# Terminal 1: provisioning LAN (where netboot should happen)
tcpdump -i eth1 -n -e port 67 or port 68 or port 69
# Terminal 2: WAN / main LAN
tcpdump -i eth0 -n -e port 67 or port 68 or port 69
Then power off the reTerminal and power it on. Watch where DHCP (and TFTP) appear:
- If you see DHCP only on eth0 during boot → the reTerminal is on the same segment as eth0, not eth1. So netboot is not using your LXC’s dnsmasq; the device may get an IP from another DHCP server and fall back to eMMC boot.
- If you see DHCP on eth1 during boot → the reTerminal is on the provisioning segment; you should then see TFTP (port 69) as well.
2. Fix: put the reTerminal on the same segment as eth1
- The reTerminal’s Ethernet cable must be connected to the provisioning segment: the same VLAN or bridge as the LXC’s eth1 (e.g. 10.20.50.0/24).
- On Proxmox, eth1 is often on a dedicated bridge (e.g. vmbr1). The reTerminal must be plugged into a switch port that belongs to that same bridge/VLAN.
- If you have one physical switch: either put the LXC’s eth1 and the reTerminal in the same VLAN, or use a dedicated “provisioning” port group / switch.
3. Sanity check: same port as reTerminal
- Plug a laptop (or another device) into the same port (or same VLAN) as the reTerminal.
- Run sudo dhclient -v <interface> (or let it get DHCP automatically).
- If you get an IP in 10.20.50.x → that segment is your provisioning LAN (eth1); the reTerminal should netboot from there.
- If you get a different range (e.g. 192.168.x.x) → that segment is not the provisioning LAN; move the reTerminal’s cable or VLAN to the segment where 10.20.50.x is served.
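The range check in step 3 can be scripted. The following is a minimal sketch; the function name in_provisioning_lan is hypothetical, and it assumes the provisioning LAN is 10.20.50.0/24 as in the examples above.

```shell
# Succeeds (exit 0) if the given IPv4 address is in the provisioning LAN.
# Assumption: the provisioning LAN is 10.20.50.0/24, as used in this document.
in_provisioning_lan() {
  case "$1" in
    10.20.50.*) return 0 ;;   # address (with or without /24 suffix) is in range
    *)          return 1 ;;   # any other range, e.g. 192.168.x.x
  esac
}
```

On the laptop, the address to test can be taken from the fourth field of ip -4 -o addr show dev <interface>, for example: in_provisioning_lan 10.20.50.147 && echo "provisioning LAN".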
Summary table
| Symptom | Likely cause | Action |
|---|---|---|
| No DHCP/TFTP on eth1 during boot; traffic only after OS | reTerminal on different segment than eth1 | Plug reTerminal into same VLAN/bridge as LXC eth1 (provisioning LAN) |
| DHCP on eth0 during boot, none on eth1 | reTerminal on same segment as eth0 | Move reTerminal to provisioning segment (same as eth1) |
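| No DHCP on any interface during boot | Cable unplugged, BOOT_ORDER does not include network (2), a bootable eMMC is tried before network, or device not attempting netboot | Check cable, check BOOT_ORDER (digits are read right to left, so 0x21 tries SD/eMMC before network), power cycle with cable in before power |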
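The table can be turned into a quick check over two saved captures. This is only a sketch: diagnose_segment and count_dhcp are hypothetical names, and it assumes each capture was saved as text (for example by redirecting the tcpdump commands above to a file) and covers only the boot window, not traffic after the OS is up.

```shell
# Count lines that tcpdump labelled as BOOTP/DHCP in a saved text capture.
count_dhcp() {
  # grep -c exits non-zero when nothing matches or the file is missing
  n=$(grep -c 'BOOTP/DHCP' "$1" 2>/dev/null) || n=0
  printf '%s\n' "$n"
}

# diagnose_segment ETH1_CAPTURE ETH0_CAPTURE
# Prints the matching row of the summary table.
diagnose_segment() {
  on_eth1=$(count_dhcp "$1")
  on_eth0=$(count_dhcp "$2")
  if [ "$on_eth1" -gt 0 ]; then
    echo "eth1: device is on the provisioning segment"
  elif [ "$on_eth0" -gt 0 ]; then
    echo "eth0 only: move the device to the provisioning segment"
  else
    echo "none: check cable and BOOT_ORDER, power-cycle with cable connected"
  fi
}
```

Usage, assuming the captures were written to eth1.txt and eth0.txt: diagnose_segment eth1.txt eth0.txt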
I only see DHCP Request/Reply, and the client already has 10.20.50.x
If your tcpdump on eth1 shows something like:
0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 88:a2:9e:xx:xx:xx
10.20.50.1.67 > 10.20.50.147.68: BOOTP/DHCP, Reply
that is not the bootloader — it is the OS DHCP client (renewal or re-request). The client already has 10.20.50.147, so this happens after the device has booted to the OS.
- Bootloader (network boot): sends DHCP Discover (client 0.0.0.0, no IP yet), then you see Offer, Request, Ack, then TFTP (port 69) for start4cd.elf, kernel, etc.
- OS: sends DHCP Request (renew/rebind, often already with an IP or requesting a known one), then Reply — no Discover, no TFTP.
So the device is on the right segment (eth1, 10.20.50.x). The problem is that you are not seeing the bootloader’s DHCP/TFTP during the first seconds after power-on.
What to do:
- Start tcpdump before power-on
  On the LXC, run:
  tcpdump -i eth1 -n -e port 67 or port 68 or port 69
  Then power off the reTerminal, wait a few seconds, and power it on. Capture from the first second. Look for:
  - Discover (client 0.0.0.0 → broadcast) at the very start → that’s the bootloader.
  - TFTP (port 69) right after DHCP Ack → bootloader loading files.
- If you never see Discover or TFTP, only Request/Reply after the OS is up, then the bootloader is either not attempting network boot or is giving up (e.g. link not ready, timeout) and booting from eMMC. Try a full power-off (mains or PSU), wait 10 s, then power on with tcpdump already running.
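- Confirm BOOT_ORDER on the device (sudo vcgencmd bootloader_config) and that Ethernet is connected before power-on. The bootloader reads BOOT_ORDER digits right to left, so 0x21 tries SD/eMMC (1) first and network (2) only if that fails; with a bootable OS on eMMC you will not see bootloader DHCP unless network comes first (e.g. 0xf12).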
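One caveat when reading the capture: tcpdump's one-line output labels packets by BOOTP opcode (Request/Reply), which may not distinguish a DHCP Discover from a DHCP Request. Adding -v makes tcpdump print the DHCP message type option, which can then be tallied. The sketch below assumes the verbose output contains a "DHCP-Message ...: <Type>" line per packet; the function name dhcp_message_types is hypothetical.

```shell
# Read `tcpdump -v` text on stdin and print "<Type> <count>" per DHCP message type.
# Assumption: each packet's option 53 is printed on a line containing
# "DHCP-Message", followed by ": Discover", ": Offer", ": Request", ": ACK", etc.
dhcp_message_types() {
  grep -oE 'DHCP-Message[^:]*: [A-Za-z]+' |   # keep only the message-type option
    sed 's/.*: //' |                          # strip everything before the type
    sort | uniq -c |                          # count each type
    awk '{print $2, $1}'                      # print as "Type count"
}
```

Usage, with the capture started before power-on: tcpdump -i eth1 -n -v -l port 67 or port 68 > boot.txt, then dhcp_message_types < boot.txt. A Discover entry in the first seconds suggests the bootloader (or another client with no lease) is asking for an address.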
reTerminal DM: serial console vs USB boot (rpiboot)
The serial console is not on the same USB as rpiboot.
| Port / interface | Purpose |
|---|---|
| USB Type-C (next to boot-mode switch) | Power, and rpiboot when eMMC is disabled (USB device mode). No serial console here. |
| 40-pin GPIO header (UART) | Serial console. Use a USB‑to‑serial adapter; connect its RX to GPIO 14 (TXD, Pin 8), its TX to GPIO 15 (RXD, Pin 10), and its GND to any GND pin (e.g. Pin 6). |
Baud rate:
- Bootloader (BOOT_UART=1): use 115200 8N1. This is the Pi EEPROM/bootloader debug output (network boot attempts, DHCP, TFTP, errors).
- OS serial login: some Seeed docs use 9600 for getty; many Pi images use 115200. If you only care about bootloader messages, use 115200.
So: use the same USB‑C cable only for power and rpiboot. For serial console, use a USB‑to‑serial adapter on the GPIO header at 115200 to see bootloader output.
Serial shows "Boot mode: SD (01)" and no network attempt
If the bootloader serial output shows something like:
Boot mode: SD (01) order 2
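and you never see a line about network (e.g. "Trying DHCP", "TFTP", or "Boot mode: NET (02)"), then the bootloader is not attempting network boot for this boot. It goes straight to SD/eMMC (01). That matches “no DHCP during boot, only after OS”. Note that the bootloader reads BOOT_ORDER digits right to left, so with BOOT_ORDER=0x21 starting with SD (01) is expected: network (2) is only tried if SD/eMMC boot fails, and the trailing "order 2" is consistent with network being queued next.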
Possible causes:
- BOOT_ORDER not applied or not read
  From the running OS, confirm:
  sudo vcgencmd bootloader_config
  and check that BOOT_ORDER=0x21 is set (and optionally NET_BOOT_MAX_RETRIES, DHCP_TIMEOUT, TFTP_IP). If you see different or missing values, the EEPROM config in use at boot may be different (e.g. old EEPROM, or update not applied on cold boot).
- Network tried but failed before any DHCP
  The bootloader may try network, fail very early (e.g. no link, or timeout before sending DHCP), then fall back to SD without printing a “Trying network” line. Slower link-up (switch, cable) can cause this. Increasing DHCP_TIMEOUT and NET_BOOT_MAX_RETRIES (and setting TFTP_IP) gives the best chance.
- CM4 / carrier quirk
  On some CM4 carriers the bootloader may skip or shorten the network attempt. Serial is the only way to see what it actually does; if you never see any network-related line, treat it as “network not attempted” for that boot.
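When checking BOOT_ORDER it helps to spell out the sequence it encodes. The sketch below assumes the documented bootloader behaviour that digits are read right to left, with 1 = SD/eMMC, 2 = network, 3 = rpiboot, 4 = USB mass storage, 6 = NVMe, e = stop and f = restart; the function names are hypothetical.

```shell
# Extract the BOOT_ORDER value from `vcgencmd bootloader_config` output on stdin.
boot_order_from_config() {
  sed -n 's/^BOOT_ORDER=//p'
}

# Print the boot modes in the order the bootloader tries them.
# Digits are consumed from the least significant (rightmost) nibble first.
decode_boot_order() {
  hex=${1#0x}
  out=""
  while [ -n "$hex" ]; do
    digit=${hex#"${hex%?}"}   # last character of the remaining string
    hex=${hex%?}              # drop that character
    case "$digit" in
      1)   name="SD" ;;
      2)   name="NETWORK" ;;
      3)   name="RPIBOOT" ;;
      4)   name="USB-MSD" ;;
      6)   name="NVME" ;;
      e|E) name="STOP" ;;
      f|F) name="RESTART" ;;
      *)   name="mode-$digit" ;;   # modes not covered by this sketch
    esac
    out="${out:+$out }$name"
  done
  printf '%s\n' "$out"
}
```

For example, decode_boot_order "$(sudo vcgencmd bootloader_config | boot_order_from_config)" prints SD NETWORK for 0x21, which is why a device with a bootable eMMC shows "Boot mode: SD (01)" first.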
What to try:
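- Re-apply the EEPROM config with network first and timeouts (as in NETWORK-BOOT-TROUBLESHOOTING), then do a full power cycle (unplug power 10+ s, then power on) with serial connected. BOOT_ORDER digits are read right to left, so a network-first order ends in 2 (e.g. 0xf12: network, then SD/eMMC, then restart). Watch from the first character for any “NET”, “DHCP”, “TFTP” or “order” line.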
- For a one-off test you can set BOOT_ORDER=0x2 (network only). If network fails, the device won’t boot (no fallback to SD). Use this only to confirm whether the bootloader tries network and what it prints; then set it back to 0x21.
- If the full serial log never shows "NET", "DHCP", or "TFTP" and goes straight to "Boot mode: SD (01) order 2", trying BOOT_ORDER=0x2 (network only) once will force a network attempt and should produce DHCP/TFTP messages on serial.
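If the serial session is saved to a file, the check for network-related lines can be done with a filter instead of by eye. A minimal sketch, assuming the log is plain text and that the keywords named above ("NET", "DHCP", "TFTP") appear in upper case; netboot_activity is a hypothetical name.

```shell
# Read a saved bootloader serial log on stdin and print network-related lines.
# Exits non-zero when there are none, i.e. network was not attempted
# (or at least not logged) for that boot.
netboot_activity() {
  grep -E 'NET|DHCP|TFTP'
}
```

Usage, assuming the log was saved as serial.log: netboot_activity < serial.log || echo "no network attempt logged".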
Boot stops after start4.elf ("PCI0 reset" then nothing)
If the serial log shows TFTP for config.txt, start4.elf, fixup4.dat, then "Starting start4.elf", "Stopping network", "PCI0 reset", and no TFTP requests for kernel8.img or initrd.img, the bootloader is not loading the kernel. That usually means config.txt in the TFTP root does not have the kernel and initramfs lines.
Fix on the LXC: ensure /srv/tftpboot/config.txt contains (and that 0d1ddbda/config.txt is a symlink to it or has the same content):
kernel=kernel8.img
initramfs initrd.img followkernel
You can run:
# On the LXC (or from your machine)
ssh root@<LXC-IP> 'bash -s' < emmc-provisioning/scripts/ensure-tftpboot-config-kernel-initrd.sh
Also ensure the TFTP root has kernel8.img and initrd.img (and the serial subdir has symlinks or copies). Then power-cycle the device; you should see TFTP_GET for kernel8.img and initrd.img, then the kernel and initramfs (e.g. rescue shell or provisioning client) run.
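For reference, the kind of check such a script needs to perform can be sketched as follows. This is not the contents of ensure-tftpboot-config-kernel-initrd.sh; the function names are hypothetical, and the sketch assumes config.txt has no conditional sections (such as [pi4]) at the end that would capture the appended lines.

```shell
# ensure_line FILE LINE: append LINE to FILE unless an identical line exists.
ensure_line() {
  file=$1 line=$2
  # Nothing to do if an identical line is present (-x whole line, -F literal)
  if grep -qxF -- "$line" "$file" 2>/dev/null; then
    return 0
  fi
  # If the file does not end with a newline, add one so lines do not fuse
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
}

# Make sure config.txt asks the firmware to load the kernel and the initramfs.
ensure_kernel_initrd() {
  ensure_line "$1" 'kernel=kernel8.img'
  ensure_line "$1" 'initramfs initrd.img followkernel'
}
```

Usage on the LXC: ensure_kernel_initrd /srv/tftpboot/config.txt. Running it twice leaves the file unchanged the second time.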
TFTP "file .../SERIAL/start4.elf not found" — serial-number prefix
The Pi bootloader may request files under a path named after the board serial number (e.g. 0d1ddbda/start4.elf). If the TFTP root has no such subdirectory, those requests fail and the bootloader falls back to the root (e.g. start4.elf). To avoid "not found" for the first requests, on the LXC create the serial directory and symlink the boot files:
# On the LXC (replace 0d1ddbda with your Pi's serial from vcgencmd or serial output)
mkdir -p /srv/tftpboot/0d1ddbda
cd /srv/tftpboot/0d1ddbda
for f in start4.elf start4cd.elf start.elf fixup4.dat fixup4cd.dat config.txt cmdline.txt kernel8.img initrd.img; do
  # Link only the files that actually exist in the TFTP root
  [ -f "../$f" ] && ln -sf "../$f" "$f"
done
After that, the bootloader’s first TFTP requests succeed. The device already had this directory created for serial 0d1ddbda.
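To confirm that a TFTP directory (the root or the serial subdirectory) really contains what the bootloader will ask for, a small check along these lines can help. It is a sketch with a hypothetical function name; the file list is passed in by the caller.

```shell
# check_tftp_dir DIR FILE...
# Prints "ok" if every FILE exists in DIR, otherwise "missing: <names>".
check_tftp_dir() {
  dir=$1
  shift
  missing=""
  for f in "$@"; do
    # -e follows symlinks, so a dangling symlink counts as missing
    [ -e "$dir/$f" ] || missing="${missing:+$missing }$f"
  done
  if [ -n "$missing" ]; then
    printf 'missing: %s\n' "$missing"
    return 1
  fi
  printf 'ok\n'
}
```

Usage on the LXC: check_tftp_dir /srv/tftpboot/0d1ddbda config.txt start4.elf fixup4.dat kernel8.img initrd.img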
Stuck in network-only boot (BOOT_ORDER=0x2): get back to Raspbian and change boot order
If you set BOOT_ORDER=0x2 (network only) for testing, the device will never try eMMC. To get back to Raspbian and set BOOT_ORDER=0x1 or 0x21, use rescue mode: the network boot chain loads the provisioning initramfs; with a special kernel cmdline it drops to a shell so you can mount eMMC and run rpi-eeprom-config from the eMMC install.
Prerequisites
- Initramfs with rescue support: build the initramfs (it includes /rescue-eeprom.sh) and copy it to the LXC TFTP root and into the serial dir:
  cd emmc-provisioning/network-boot-initramfs && ./build.sh
  scp initrd.img root@<LXC-IP>:/srv/tftpboot/
  ssh root@<LXC-IP> 'cp /srv/tftpboot/initrd.img /srv/tftpboot/0d1ddbda/ 2>/dev/null || true'
- TFTP config: ensure /srv/tftpboot/config.txt (and thus 0d1ddbda/config.txt if it’s a symlink) has kernel=kernel8.img and initramfs initrd.img followkernel so the full kernel+initrd chain runs.
Steps
1. On the LXC, enable rescue for this device by serving a cmdline that includes provisioning_rescue=1. The Pi loads 0d1ddbda/cmdline.txt; replace that with a real file (not a symlink) so this device gets the rescue cmdline:
   # On the LXC (replace 0d1ddbda with your Pi serial if different)
   CD="/srv/tftpboot/0d1ddbda"
   rm -f "$CD/cmdline.txt"
   # Same as root cmdline plus rescue flag (one line, space-separated)
   tr '\n' ' ' < /srv/tftpboot/cmdline.txt > "$CD/cmdline.txt"
   printf ' provisioning_rescue=1\n' >> "$CD/cmdline.txt"
2. Power on the reTerminal (or reboot). It will network boot, load kernel + initramfs, and rescue mode will start a shell (serial or console). You should see:
   === RESCUE MODE (provisioning_rescue=1) ===
3. In the rescue shell, run the helper to mount eMMC and run the EEPROM config from the eMMC install:
   /rescue-eeprom.sh
   In the editor that opens, set BOOT_ORDER=0x1 (eMMC only) or 0x21 (eMMC first, then network; the bootloader reads the digits right to left). Save and exit the editor.
4. Reboot from the rescue shell:
   reboot
   The bootloader will apply the EEPROM update and on the next boot use the new order (eMMC only with 0x1, or eMMC then network with 0x21).
5. On the LXC, restore the normal cmdline for the device so the next network boot runs the provisioning client, not rescue:
   rm -f /srv/tftpboot/0d1ddbda/cmdline.txt
   ln -s ../cmdline.txt /srv/tftpboot/0d1ddbda/cmdline.txt
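The cmdline handling in step 1 can also be written as a small idempotent helper, so that running it twice does not add the flag twice. This is a sketch under the assumption that the rescue flag is exactly provisioning_rescue=1 as described above; make_rescue_cmdline is a hypothetical name.

```shell
# make_rescue_cmdline FILE
# Prints the contents of FILE as a single line with provisioning_rescue=1
# appended exactly once.
make_rescue_cmdline() {
  base=$(tr '\n' ' ' < "$1" |
    sed -e 's/provisioning_rescue=[^ ]*//g' \
        -e 's/  */ /g' -e 's/^ //' -e 's/ $//')   # drop old flag, tidy spaces
  printf '%s provisioning_rescue=1\n' "$base"
}
```

Usage on the LXC, after removing the symlink: make_rescue_cmdline /srv/tftpboot/cmdline.txt > /srv/tftpboot/0d1ddbda/cmdline.txt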
See also NETWORK-BOOT-LXC.md for setup and monitoring.