Boot process (suggestion)

FIXME

Errors during flashing, or faulty kernel or root filesystem (RFS) images, usually result in an unusable (“bricked”) system. To avoid this, two versions of the kernel and the RFS are stored in flash memory. One kernel and one RFS belong together and are referred to as a “bank”. The two banks are identified as A and B.

Which bank to boot from is determined during the boot process; usually this is the so-called “current bank”. The kernel is copied from that bank to RAM, the kernel arguments for the RFS (root=…) are determined, and the kernel is started.

During a firmware update, the bank that is not currently running is always the one that is flashed. Without the two banks, the update would have to overwrite the flash that the kernel currently has mounted as its root file system.

After flashing, the current bank is changed and the system is rebooted.

If the new firmware proves faulty, the system automatically switches back to the old bank and boots it. This ensures that a faulty update does not make the system unusable.

U-Boot selects the bank to boot as follows:

  1. U-Boot initialises the hardware.
  2. U-Boot executes the command sequence stored in $bootcmd, which by default is usually run boot_flash.
  3. First the variable $failCount is checked. If its value is 0, the current bank is booted, as stored in the variable $curBank (A or B).
  4. If the value of $failCount is greater than 1, it is decremented by one and the current bank is booted as well.
  5. If $failCount contains the value 1, it is set to 0, $curBank is set to the other bank, and that bank is booted.
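The decision in steps 3 to 5 can be sketched as a small shell function. This is only a model of the logic for illustration, not the actual U-Boot script; the function names select_bank and other_bank are made up for this sketch.

```shell
#!/bin/sh
# Model of the bank selection in steps 3 to 5 (not the real U-Boot macro).
# select_bank FAILCOUNT CURBANK prints "<new failCount> <bank to boot>".

other_bank() {
    # Return the bank that is not the one given (A <-> B).
    case "$1" in
        A) echo B ;;
        B) echo A ;;
        *) echo "unknown bank: $1" >&2; return 1 ;;
    esac
}

select_bank() {
    fail_count=$1
    cur_bank=$2

    if [ "$fail_count" -eq 0 ]; then
        # Bank is marked good: boot it unchanged.
        :
    elif [ "$fail_count" -gt 1 ]; then
        # Bank not yet confirmed: use up one boot attempt.
        fail_count=$((fail_count - 1))
    elif [ "$fail_count" -eq 1 ]; then
        # Attempts exhausted: reset the counter and fall back.
        fail_count=0
        cur_bank=$(other_bank "$cur_bank") || return 1
    else
        echo "invalid failCount: $fail_count" >&2
        return 1
    fi

    echo "$fail_count $cur_bank"
}
```

For example, select_bank 3 A prints "2 A" (one attempt used, still bank A), while select_bank 1 A prints "0 B" (fallback to the other bank).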

After booting, the application confirms or rejects the running bank:

  1. Once the application has booted, it must decide whether everything is OK, e.g. whether all processes are running, whether there are data inconsistencies, etc.
  2. If everything is OK, the U-Boot environment variable $failCount is set to 0 by calling boot-control mark-active-bank-good.
  3. If an error occurs, the application can immediately force a switch to the other, hopefully still working, bank with boot-control set-other-bank-active.
  4. If the application does not even get that far, for example because the kernel has already crashed, the countdown mechanism described above takes effect and eventually switches to the other bank.
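A start-up hook for this confirmation might look like the following sketch. The two boot-control subcommands are the ones named above; the function name confirm_boot, the BOOT_CONTROL override variable and the idea of passing the health check as a command are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: confirm or reject the running bank after start-up.
# Usage: confirm_boot <health-check-command> [args...]
# BOOT_CONTROL may be set to a stand-in command for dry runs;
# it defaults to the boot-control tool.

confirm_boot() {
    # "$@" is the caller's health check; its exit status decides.
    if "$@"; then
        # Healthy: reset $failCount to 0 so this bank stays active.
        "${BOOT_CONTROL:-boot-control}" mark-active-bank-good
    else
        # Unhealthy: switch banks now instead of waiting for the
        # failCount countdown in U-Boot.
        "${BOOT_CONTROL:-boot-control}" set-other-bank-active
    fi
}
```

The health check itself is application specific; any command that exits with 0 on success can be passed, for example a script that verifies that all required processes are running.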

A firmware update proceeds as follows:

  1. The bank on which the new images will be installed is determined by boot-control other-bank $(boot-control get-running-bank).
  2. install-img <kernel-img> kernel $bank and install-img <rootfs-img> rootfs $bank install the images.
  3. Finally, the active bank is switched with boot-control mark-active-bank $bank.
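Put together, the update steps could be wrapped in one function, as in the sketch below. It uses only the commands named above; the function name update_firmware, the override variables BOOT_CONTROL and INSTALL_IMG, and the stop-on-first-error behaviour are assumptions of this sketch.

```shell
#!/bin/sh
# Sketch of the update steps using the commands named in the text.
# Usage: update_firmware <kernel-img> <rootfs-img>
# BOOT_CONTROL and INSTALL_IMG may point to stand-ins for a dry run.

update_firmware() {
    kernel_img=$1
    rootfs_img=$2
    bc=${BOOT_CONTROL:-boot-control}
    inst=${INSTALL_IMG:-install-img}

    # Determine the bank that is not currently running.
    running=$("$bc" get-running-bank) || return 1
    bank=$("$bc" other-bank "$running") || return 1

    # Install both images to that bank; stop on the first failure
    # so that a half-written bank is never activated.
    "$inst" "$kernel_img" kernel "$bank" || return 1
    "$inst" "$rootfs_img" rootfs "$bank" || return 1

    # Only now make the freshly written bank the active one.
    "$bc" mark-active-bank "$bank"
}
```

Switching the active bank last means that a failed installation leaves the running bank selected.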

The mechanism described above is currently implemented for NOR flash-based systems. The following changes and enhancements to the current TQMa28 system are necessary for it to work with eMMC flash:

  • The partitioning of eMMC has to be changed
  • The Linux U-Boot environment tools (fw_printenv, fw_setenv) have to be adapted
  • The install mechanism of the rootfs has to be changed from dd if=image of=/dev/mtdx to mkfs.ext3 …; mount …; tar -C /mnt/ -xf …
  • In general it is not possible to cache the rootfs archive on the device. This means that the validity of the archive file is only certain after the rootfs has been installed. FIXME consequence???
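The changed rootfs install sequence might be sketched as follows. The target partition and the archive are passed in by the caller because the text does not specify them; the function name install_rootfs_emmc, the RUN dry-run prefix, the MNT override and the final umount are assumptions of this sketch, not part of the described mechanism.

```shell
#!/bin/sh
# Sketch of an eMMC rootfs install following the mkfs/mount/tar sequence.
# Usage: install_rootfs_emmc <partition> <rootfs-archive>
# Set RUN=echo to print the commands instead of executing them.

install_rootfs_emmc() {
    dev=$1
    archive=$2
    mnt=${MNT:-/mnt}

    # Create a fresh ext3 file system on the target partition.
    ${RUN:-} mkfs.ext3 "$dev" || return 1
    ${RUN:-} mount "$dev" "$mnt" || return 1

    # Unpack the archive; a corrupt archive only shows up at this point.
    if ${RUN:-} tar -C "$mnt" -xf "$archive"; then
        status=0
    else
        status=1
    fi

    # Always unmount again (assumption: not mentioned in the text).
    ${RUN:-} umount "$mnt" || status=1
    return "$status"
}
```

Because the function returns a non-zero status when extraction fails, a caller could refrain from switching the active bank in that case.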

Last modified: 2022/08/04 15:02