Flash write bug fix

Looks like I finally have a fix for the flash write issue, thanks to some help from Espressif. Embarrassingly it turns out to be my fault, but I’m not too proud to admit that. So what was the problem? Since SDK v1.5.1 people have been reporting intermittent failed OTA updates and I’ve been able to reproduce it easily myself. This only seemed to occur when using wifi for long sustained writes.

So what was happening? Occasionally the network receive would get a much larger packet than normal. In my latest testing they are usually around 1436 bytes, but occasionally one would arrive that was 5744 (this is the only large value I have seen myself, but presumably other values could occur). The flash is erased in 4k sectors and the rBoot code erases them as required, checking if the current data block will fit in the current sector and if not erasing the next one too. Where I went wrong was to assume that a receive would always be less than the 4k sector size, having never seen one before SDK v1.5.1 that was anywhere near that large. When one of these very large packets arrives it could span 3 sectors. The second sector would get erased correctly but the third sector would not. The flash write command would not return an error when it tried to write to the non-erased sector, so no fault was noticed at this time. Then on the next write that third sector, now being the first sector for that next write, gets erased and the new data written part way through it (where it should be).

Moral of the story. Don’t make assumptions and don’t ignore the edges cases – I’m usually pretty good at this second point, but occasionally I seem to need reminding. In this case I had thought about it and added code that detected a chunk over 4k that would need more erasing. As I didn’t think this could ever happen I merely put a comment in the if statement to say what would need to be done in that scenario, if I’d thrown an error at that point instead this problem would have been easily diagnosed. However, I did also assume that the flash write would fail if it tried to write to flash that was not erased, so I expected to see an error of some kind.

Why did it suddenly happen at v1.5.1? I don’t know that but presumably Espressif made some change in the SDK that makes this more likely to occur. While playing with some code for Pete Scargill I did manage to reproduce the problem with v1.4.0 so it wasn’t impossible for it to happen there, but I never had any reports of it there previously. I also found that the timers in Pete’s code made it more likely to happen in my testing, so I suspect the extra processing these caused was impacting the performance of the network stack and causing more packets to be bunched together and delivered as larger chunks to the application. Further testing showed RF interference could also cause the same result.

The fix is available in the rBoot GitHub repo, and has also been updated in Sming.

Flash write bug in Espressif SDK v1.5.1+

Users of rBoot have reported OTA problems when using Espressif SDK v1.5.1 which I have been able to reproduce on this version and v1.5.2. The problem appears to be in the sdk flash write functions. While most writes work fine some will occasional fails to fully write (leaving small areas of blank data (0xff)). To make matters worse no error is reported from the write function and attempts to read the flash back for verification immediately afterwards return the correct data (presumably they are serviced by the rom cache). This is very similar to the problem seen when trying to perform an OTA with insufficient power. I once had a similar report from a user using just a 100mA power supply and the problem was fixed by using a proper power supply. This makes me wonder if the new SDK causes the device to draw more power, although supplying extra power does not fix the problem in this case.

I have created a test case that doesn’t use rBoot but simply downloads a file and writes it to flash (which is of course very similar to what rBoot does) to demonstrate the problem to Espressif. I have yet to hear back from them. If you encounter this problem please add your support to my bug report. I have no reason to believe this bug will be limited to rBoot users and I assume SDK bootloader users and anyone else writing to the flash from an application (possibly only in long sustained writes) will also have this problem.

This is a good opportunity to encourage rBoot users to enable the irom checksum option, to help detect badly flashed roms at boot time. See the readme for more information.

New rBoot version allows temporary boot

I’ve just pushed an update to rBoot that allows 2-way communication between rBoot and the running user application. This is something I had though about previously, and I mentioned it in a previous post, but nobody had actually asked for it until a couple of weeks ago. The main use of this is to allow the application to request rBoot to perform a temporary boot to a different rom (i.e. not the one identified in the config, which would normally be booted). This helps to make updating safer, because you can perform an OTA and only temporarily switch to the new rom, until you are happy to update the config and make this the standard. rBoot is already safer for updating than the SDK bootloader, if you enable to irom checksum option, but this new functionality also guards against valid but buggy roms that simply don’t work properly once booted.

How you decide that your rom is good enough to switch to booting it by default is up to you of course. Perhaps if the rom is able to boot, connect to wifi and stays up for 5 minutes, that would be deemed sufficient. Another option would be to have the user manually initiate the change of default rom once they are happy with the way it runs.

You can also get information about the boot from your application such as the boot mode (standard, temporary or GPIO), and the currently running rom. Previously the running rom could be determined by reading the config, but that would not work for a temporary boot.

This new functionality makes use of the ESP8266’s RTC data area and to use it uncomment the #define BOOT_RTC_ENABLED line in rboot.h. See the documentation in the GitHub rBoot repository and the updated sample project for example use.

One other change made at the same time is that GPIO-rom selection is now an optional feature and not compiled in by default. Please enable the appropriate #define in rboot.h if you wish to use this feature.

rBoot now in Sming

rBoot has now been integrated into Sming. This includes rBoot itself (allowing the bootloader to be built alongside a user app) and a new Sming specific rBoot OTA class. The sample app from my GitHub repo has been improved and included under the name Basic_rBoot, it demonstrates OTA updates, big flash support and multiple spiffs images.

The old sample app will be removed from my repo shortly. If you want to make use of rBoot in Sming it’s now easier than ever – just use release v1.3.0 onwards, or clone the Sming master branch, and take a look at the sample project.

rBoot now supports Sming for ESP8266

Although it’s always been possible to use Sming compiled apps with rBoot it wasn’t easy. I’ve shared Makefiles and talked a few people through it previously on the esp8266.com forum, but now there is a new sample project on GitHub to help everyone do it.

The sample demonstrates:

  • Compiling a basic app (similar to the rBoot sample for the regular sdk).
  • Big flash support, allowing up to 4 roms each up to 1mb in size on an ESP12.
  • Over-the-air (OTA) updates.
  • Spiffs support, with a different filesystem per app rom.

Spiffs support depends on a patch to Sming, for which there is a pull request pending to have it included properly. Probably the most common way I envisage this being used is a pair of app roms (to allow for easy OTA updates ) with a separate spiffs file system each, but rBoot is flexible enough to let you lay out your flash however you want to.

Suggested layout for 4mb flash:

0x000000 rboot
0x001000 rboot config
0x002000 rom0
0x100000 spiffs0
0x1fc000 (4 unused sectors*)
0x200000 (2 unused sectors†)
0x202000 rom1
0x300000 spiffs1
0x3fc000 sdk config (last 4 sectors)

* The small unused section at the top of the second mb means the same size spiffs can be used for spiffs0 and spiffs1. The top of the fourth mb (where spiffs1 sits) is reserved for the sdk to store config.
† The small unused section at the start of the third mb mirrors the space used by rBoot at the start of the first mb. This means only one rom needs to be produced, that can be used in either slot, because it will be of the same size and have the same linker rom address.

Memory map limitation affecting rBoot

I’ve become aware of a serious limitation that testing should should have found, if I’d done a bit more of it! The ESP8266 only memory maps the the first 8 Mbit of the SPI flash. rBoot doesn’t use memory mapped flash, it uses SPI reads, so rBoot itself is fine with any size of flash. The problem comes if you try to put an irom section beyond 8 MBit (i.e. from address 0x40300000) – code there won’t be accessible at run time!

I’m quite disappointed about that. It’s something that rBoot can’t overcome, it’s a limitation of the ESP8266 design. You can still use the extra space on larger chips for something, like logging data, or for storing a filesystem for your resources. You can also still use rBoot OTA to flash these resources by dedicating a non-bootable rom slot to them. Your code would need to access them with SPI reads rather than through the memory mapping though.

Update: this problem has been largely mitigated, see here.

rBoot tutorial for ESP8266 – OTA updates

Ok, hope you’re still with me after my previous massive post. Now I’m going to show you how to perform an over-the-air (OTA) update with rBoot. I’ve covered all the background already, so this should be pretty straight forward as long as you have a simple two rom rBoot setup running.

Now add rboot.h, rboot-ota.h and rboot-ota.c to your project and call the rboot_ota_start function. How you invoke the OTA update code is up to you, the sample project on GitHub (now updated) has a simple command line interface over the UART allowing the user to enter the command “ota”.

rboot_ota_start takes an rboot_ota struct with the options for the update.

typedef struct {
	uint8 ip[4];
	uint16 port;
	uint8 *request;
	uint8 rom_slot;
	ota_callback callback;
} rboot_ota;
  • ip is the ip address of the web server to download the new rom from.
  • port is the web server port (usually 80).
  • request is a complete http request which will be sent to the web server, this may seem like a slightly odd way to do it, but it gives you full control over what is sent and it’s the same way the SDK OTA update works.
  • rom_slot is the number of the rom slot on the flash to update, starting at zero. In our two rom example that will be either 0 or 1 (and the opposite to the one we are currently running).
  • callback is a user function that will be called when the update is completed (either success or failure), it is passed a pointer to the rboot_ota structure and a bool to indicate success or failure. This is where you will then switch to the new rom (using rboot_set_current_rom) and restart the device.

Example

static void ICACHE_FLASH_ATTR OtaUpdate_CallBack(void *arg, bool result) {

	char msg[40];
	rboot_ota *upServer = (rboot_ota*)arg;

	if(result == true) {
		// success, reboot
		os_sprintf(msg, "Firmware updated, rebooting to rom %d...\r\n", upServer->rom_slot);
		uart0_send(msg);
		rboot_set_current_rom(upServer->rom_slot);
		system_restart();
	} else {
		// fail, cleanup
		uart0_send("Firmware update failed!\r\n");
		os_free(upServer->request);
		os_free(upServer);
	}
}

static const uint8 ota_ip[] = {192,168,7,5};
#define HTTP_HEADER "Connection: keep-alive\r\nCache-Control: no-cache\r\nUser-Agent: rBoot-Sample/1.0\r\nAccept: */*\r\n\r\n"

static void ICACHE_FLASH_ATTR OtaUpdate() {

	uint8 slot;
	rboot_ota *ota;

	// create the update structure
	ota = (rboot_ota*)os_zalloc(sizeof(rboot_ota));
	os_memcpy(ota->ip, ota_ip, 4);
	ota->port = 80;
	ota->callback = (ota_callback)OtaUpdate_CallBack;
	ota->request = (uint8 *)os_zalloc(512);

	// select rom slot to flash
	slot = rboot_get_current_rom();
	if (slot == 0) slot = 1; else slot = 0;
	ota->rom_slot = slot;

	// actual http request
	os_sprintf((char*)ota->request,
		"GET /%s HTTP/1.1\r\nHost: "IPSTR"\r\n" HTTP_HEADER,
		(slot == 0 ? "rom0.bin" : "rom1.bin"),
		IP2STR(ota->ip));

	// start the upgrade process
	if (rboot_ota_start(ota)) {
		uart0_send("Updating...\r\n");
	} else {
		uart0_send("Updating failed!\r\n\r\n");
		os_free(ota->request);
		os_free(ota);
	}

}

It’s really that simple, that’s all you need to add to your application to be able to perform OTA updates. You might want to put in a version check, so it only updates if there is a new version, but that’s up to you.

Web Server

All that remains is to drop rom0.bin and rom1.bin in the root of your web server. Obviously you can change where it looks for the files with a small tweak to the code above.

rBoot – A new boot loader for ESP8266

As promised here is my new boot loader for the ESP8266 – rBoot.

Advantages over SDK supplied bootloader:

  • Open source (written in C) – this is the big one.
  • Supports any number of roms.
  • Roms can be different sizes.
  • Rom slots can be used for resource storage as well as bootable apps (and benefit from the OTA update system).
  • Can use the full size of the SPI flash (see below).
  • Rom slots can be altered after deployment (with care!).
  • Earlier rom validation (less prone to errors).
  • Can try multiple backup roms (without needing to reboot).
  • Rom selection by GPIO (e.g. hold down a button when powering on to start a recovery rom).
  • Wastes no stack space  (SDK boot loader uses 144 bytes).
  • Documented config structure (easy to configure from user code).

Disadvantages over SDK supplied bootloader:

  • Not compatible with sdk libupgrade (but equivalent source included, based on open source copy shipped with earlier SDKs, so you can easily update your existing OTA app use this new code).
  • Requires you to think slightly more about your linker scripts, rather than just using the pair supplied with the SDK (but it’s not really that difficult – if you’re programming in C it’ll be well within your capabilities).

Problems common to both:

  • You still need to relink user code against multiple different linker scripts depending where you intend to place it on the flash, because the memory mapped position of the .irom0.text section needs to be known at link time. This also prevents you moving roms around at will once they have been compiled.
  • Only 8MBit of flash can be memory mapped at a time (the SDK bootloader allows at most the first 2 x 8Mbit chunks to be used for roms, rBoot doesn’t have this limit, on a 32MBit flash you can have 4 x 8MBit roms), see memory mapping imitation for more details.

Source code

I’ve decided to start putting my source code on GitHub, it’ll be easier to maintain keep my blog tidier.

https://github.com/raburton