Hello everyone, and welcome to our Logic Technology Roundtable, in cooperation with Tuxera, on Techniques to Safeguard Critical Data for IoT.
My name is Gevorg Melikdjanjan. At Logic Technology I am responsible for Tuxera’s flash file system and flash management solutions and I will be today’s moderator.
Our guest for the Roundtable is Thom Denholm – Technical Product Manager at Tuxera.
With today’s growing number of autonomous systems, preserving decision-quality data is becoming all the more imperative.
Across different use cases, under varying environmental conditions, or simply as devices age, problems may surface that were not visible before. That is not acceptable for devices containing sensitive data.
So over the course of this roundtable we hope to learn more about techniques to protect critical systems from data corruption in embedded devices where power loss may occur.
“So let’s start our conversation, Thom – what do you think is the most important technique?”
Thanks Gevorg.
I’ve been working with file systems specifically since 2000, and DOS system calls before that.
In my experience, the number one cause of data loss is overwriting that data.
“Seems simple, but I’m sure there is more to it”
True, but hear me out. In order to modify a file, you have to overwrite some existing data. If the power is lost, you have no idea what state that block is in. Did all of your new data get to the media? Some of it, or none of it? So that file has to be considered corrupted, or at least require examination by the person or program that created it – and examination of potentially corrupted data isn’t in the normal job description.
“So what if you don’t modify files?”
Not modifying files, or changing the way you use the file system, is an option for that user data – but no guarantee overall. For the use case of appending to a file, there is still a modification happening – to the file metadata. The same problem occurs – if power is interrupted while updating that block, what next? Is it done, partial, or undone? Now we have to rely on the file system to decide what is a good file and what isn’t – again, that usually isn’t in the design. If the file is in a sub-folder, another metadata update is there for the higher level folder, etc.
These are single-block updates, so they can be guaranteed with what are known as atomic operations at the media level. Either the entire block is written or none of it is – a much more predictable situation.
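To make the overwrite hazard concrete, here is a minimal sketch in C. The block_read()/block_write() names are invented for illustration, and I’m assuming a single block write is atomic at the media level:

```c
/* Hypothetical single-block update sketch. block_read()/block_write()
 * are invented names; assume block_write() of one block is atomic at
 * the media level (all or nothing), and len <= BLOCK_SIZE. */
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096u

int block_read(uint32_t blockno, void *buf);         /* driver-provided */
int block_write(uint32_t blockno, const void *buf);  /* driver-provided */

/* UNSAFE: power loss mid-write leaves the block in an unknown state. */
int update_in_place(uint32_t blockno, const void *data, size_t len)
{
    uint8_t buf[BLOCK_SIZE];
    if (block_read(blockno, buf) != 0) return -1;
    memcpy(buf, data, len);
    return block_write(blockno, buf);   /* old data is already gone */
}

/* SAFER: write the new copy elsewhere, then atomically flip a pointer
 * block that records which copy is current. The old block stays intact
 * until the pointer update -- itself a single atomic block write. */
int update_copy_on_write(uint32_t oldblk, uint32_t freeblk,
                         uint32_t ptrblk, const void *data, size_t len)
{
    uint8_t buf[BLOCK_SIZE];
    if (block_read(oldblk, buf) != 0) return -1;
    memcpy(buf, data, len);
    if (block_write(freeblk, buf) != 0) return -1;   /* old copy untouched */

    uint8_t ptr[BLOCK_SIZE] = {0};
    memcpy(ptr, &freeblk, sizeof freeblk);           /* "current" pointer */
    return block_write(ptrblk, ptr);                 /* the atomic switch */
}
```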
“So an atomic block update and no overwrite solves the problem?”
Yes, in a way. Changing the way you work and choosing the right media can help. Did you also provide the application, and is it written the way you expect? There is another way, however – transaction points.
The simple way to describe a transaction point is a database. When you perform a series of operations on a database that are designed to go together, they either all happen or none of them do. That’s an atomic operation, just like we were talking about earlier.
A file system can perform a series of operations, then issue a transaction point. Once that transaction point is successful, all of the operations happen. If power is lost, none of the operations have occurred. To do this, the file system cannot overwrite the existing data, the “known good” data.
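As a minimal sketch of what that looks like from application code – modeled on the POSIX-like API of our Reliance Edge product, though the exact names and signatures may differ by version:

```c
/* Sketch of grouping operations under one transaction point, modeled on
 * the Reliance Edge POSIX-like API (names/signatures may vary by version). */
#include <redposix.h>

int save_settings(const void *cfg, uint32_t cfglen,
                  const void *log, uint32_t loglen)
{
    int32_t fd1 = red_open("/cfg/settings.bin",
                           RED_O_WRONLY | RED_O_CREAT | RED_O_TRUNC);
    int32_t fd2 = red_open("/log/events.bin", RED_O_WRONLY | RED_O_CREAT);
    if (fd1 < 0 || fd2 < 0) return -1;

    red_write(fd1, cfg, cfglen);   /* error handling omitted for brevity */
    red_write(fd2, log, loglen);
    red_close(fd1);
    red_close(fd2);

    /* Until this call succeeds, none of the writes above are visible.
     * If power fails first, the volume mounts in its previous good state. */
    return red_transact("/");
}
```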
Speaking of power loss scenarios, perhaps an interesting question for the listeners: what kind of measures has your development team taken to protect the data stored in your applications in the case of a sudden power loss? You could be using non-volatile memory, a battery back-up system, or maybe some other solution.
Please type in your comments in the chat box. We will address them at the end of the talk.
“So Thom, to come back to your point about the transaction point settings in a file system, this is regarding the design of the Reliance family of file systems, right?”
Yes. The Reliance family uses a master block and two metaroots to track which system state is the working state and which is the known good state. Those few sectors take up much less space than a Linux file system journal, and yet provide much more. Here are three common operations; we will collect them into one atomic operation by issuing a transaction point when they are complete. Once that is done, the erased file is listed in the free memory pool, the metadata and another block have moved to a new location in this newly modified file, and this new file has been created at the end. Notice the file system didn’t overwrite the original block locations – on NAND flash it cannot anyhow. If power had been lost, or the system had crashed, before the transaction point completed, the system would start up in the original state. No chkdsk or journal scan and rebuild is necessary.
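To picture the mechanism, here is a conceptual sketch of the two-metaroot idea in C. This is purely illustrative – it is not the actual Reliance on-media format, and every name here is invented:

```c
/* Conceptual sketch of a two-metaroot commit scheme -- NOT the actual
 * Reliance on-media layout, just the idea: two metaroot slots alternate,
 * and a sequence number plus checksum say which one is "known good". */
#include <stddef.h>
#include <stdint.h>

struct metaroot {
    uint64_t sequence;   /* higher = newer                        */
    uint32_t root_block; /* where the file system tree starts     */
    uint32_t crc;        /* a metaroot with a bad CRC is ignored  */
};

int crc_ok(const struct metaroot *mr);   /* checksum verify -- illustrative */

/* At mount, pick the newest metaroot that passes its checksum. A commit
 * writes all new data/metadata to free blocks first, then writes ONE new
 * metaroot into the older slot -- a single atomic block update. */
const struct metaroot *pick_current(const struct metaroot mr[2])
{
    int v0 = crc_ok(&mr[0]);
    int v1 = crc_ok(&mr[1]);
    if (v0 && v1) return (mr[0].sequence > mr[1].sequence) ? &mr[0] : &mr[1];
    if (v0) return &mr[0];
    if (v1) return &mr[1];
    return NULL;   /* neither valid: media never successfully formatted */
}
```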
“Alright, thank you. I’d like to come back to the list of techniques. What is copy-on-write?”
To talk about that, I need to go into detail about NAND flash media, and that could take the rest of our time. Briefly, though:
((next slide)) Visualizing NAND Blocks
This is an abstract view of NAND media, with two important objects – the write page and the erase block.
(click) The minimum size you can erase is an entire erase block. I won’t go into the circuit-level details, but this is a slow process.
(click) The minimum size you can write is a full write page.
This write page can’t be overwritten or appended to. Any update means writing a replacement page, then marking the original obsolete. This is known as “copy on write”.
So a file system which is also copy-on-write makes a good match with NAND flash based media.
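In code terms, a flash translation layer handles an “update” roughly like this – a simplified sketch, not FlashFX Tera’s actual implementation:

```c
/* Simplified copy-on-write page update, as a flash translation layer
 * might do it -- a sketch, not FlashFX Tera's actual implementation. */
#include <stdint.h>

#define NUM_LOGICAL_PAGES 1024u

/* Map from logical page number to physical page; the old physical page
 * is reclaimed later, when its whole erase block is erased. */
static uint32_t page_map[NUM_LOGICAL_PAGES];

uint32_t nand_alloc_free_page(void);                     /* driver-provided */
int      nand_program_page(uint32_t phys, const void *data);
void     nand_mark_obsolete(uint32_t phys);

int cow_update_page(uint32_t logical, const void *newdata)
{
    /* NAND pages cannot be rewritten in place: program a fresh page... */
    uint32_t newphys = nand_alloc_free_page();
    if (nand_program_page(newphys, newdata) != 0) return -1;

    /* ...then retire the old one. The logical page "moves". */
    uint32_t oldphys = page_map[logical];
    page_map[logical] = newphys;
    nand_mark_obsolete(oldphys);   /* erased later, a whole block at a time */
    return 0;
}
```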
“This is handled by the firmware”
That’s right, the firmware for managed flash memory like eMMC, UFS, SD and SSD. For raw NAND flash, we have software which does this – FlashFX Tera.
“And it takes care of everything?”
Not quite. It handles wear leveling, bad block management, and customization of things like block size and error handling. But the file system can still cause atrocious write amplification by writing small byte counts to large write pages.
One of the most important considerations with flash media is that it doesn’t last forever. Unlike other media, it has a very predictable lifetime. Correctable bit errors grow, and the media uses replacement blocks until, after roughly 10,000 cycles, the bit errors become too frequent to correct.
Lifetime comes back to how frequently you write, and how many of those writes are complete versus partial. Put another way, the goal is low write amplification for the highest lifetime. Tuxera file systems are designed for minimal write amplification, and we really do use flash-friendly operation at all levels.
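To put a number on that – an illustrative calculation, not a measured figure: write amplification is the physical bytes programmed divided by the logical bytes the application wrote. If an application flushes a 64-byte log record and the file system programs a full 4,096-byte page each time, that record costs 4096 / 64 = 64x write amplification. Buffer 64 such records into one page write and the ratio comes back toward 1x – and the media lasts correspondingly longer.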
“Do you have an interesting example of a use-case where you have extended the lifetime of a device?”
We do – a power-meter use case. This customer was doing lifetime testing on their existing Linux ext4 file system, which gave them 18 years of lifetime. Their requirement was 20 years.
So they applied Tuxera’s Reliance Nitro file system to the same tests, and out of the box it gave them 22 years of lifetime.
Still, Tuxera’s support team thought they could do better, so they made some adaptations to the file system and eventually ended up with more than 30 years of lifetime.
Remember that their requirement was 20 years, and with Nitro they ended up with more than 30 years of simulated lifetime.
Thanks for that example. So let’s go back to the wear-related failures – I think this is where you left off.
That is a problem with wear-related failures, yes, but it also comes back to complete system integration.
The folks at Tesla designed a solution that they thought would last 10 years, using a standard SOC that met their needs. The recent report demonstrates that they didn’t have a complete picture of how frequently the OS and applications would write log data. Those small packets were flushed immediately, compounding the problem by creating large write amplification.
They tried some short-term solutions – generating less log data and reducing the need to flush – but the damage was already done. The flash media on a Model S with 70% of its usable lifetime consumed is heavily worn, and the roughly 3 years of lifetime predicted to remain for that part aren’t going to stretch to 8 years, no matter what the software does.
Later designs incorporated a larger part (with effectively more lifetime in “terabytes written”) and a better profile for data logging operations – or in other words, the complete solution we are talking about here.
“What are some of the other integration points?”
Testing is a major factor, and a design which takes those tests into account. That’s the process direction of Automotive SPICE, which we use for our Reliance Edge Assure product. This provides complete traceability from the design to the code to the test suite, with every line tested.
Working closely with the board or SOC manufacturer is another important point. Logic provides boards from XXXXXXXXXX, YYYYYYYYY, and helps make sure that the software and hardware are integrated.
Validation of the design can be provided by LDRA.
“The last item is security”
We’ve been talking about safeguarding critical data, and really all data is critical data to someone. Secure data, though, is an entirely different matter. Now we get into legal requirements and jurisdictions, and that data has to be managed properly.
At the simplest level that could mean encryption – but encryption by itself is not the complete solution. There is no lasting protection through obscurity.
“So what level of encryption does the Reliance FS family provide?”
Currently we have an encryption plug-in, and on Linux we work with encryption at two different layers. The first is dm-crypt: you can use it to encrypt an entire block device, and we make that possible even for raw flash on Linux.
The raw flash file systems on Linux don’t have the dm-crypt option, because they don’t sit on a block device.
At the file system layer, the Android folks at Google have been focusing on fscrypt, which means you can encrypt separate folders and separate files within the file system. In fact, you can have different encryption keys for different subfolders. We programmed that into Reliance Nitro – although they have updated it a couple of times with different kernel releases, so that’s something we have to keep up with. Ultimately we are one of five file systems on Linux that support fscrypt today.
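For the curious, setting a per-directory policy on Linux looks roughly like this. It’s a sketch using the kernel’s fscrypt ioctl; adding the master key with FS_IOC_ADD_ENCRYPTION_KEY is omitted, and the details vary by kernel version:

```c
/* Sketch: mark a directory for fscrypt file-based encryption on Linux.
 * Assumes a v2-policy kernel and that the master key was already added
 * with FS_IOC_ADD_ENCRYPTION_KEY (omitted); details vary by kernel. */
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fscrypt.h>

int encrypt_directory(const char *path,
                      const uint8_t key_id[FSCRYPT_KEY_IDENTIFIER_SIZE])
{
    struct fscrypt_policy_v2 policy;
    memset(&policy, 0, sizeof policy);
    policy.version = FSCRYPT_POLICY_V2;
    policy.contents_encryption_mode  = FSCRYPT_MODE_AES_256_XTS;
    policy.filenames_encryption_mode = FSCRYPT_MODE_AES_256_CTS;
    policy.flags = FSCRYPT_POLICY_FLAGS_PAD_32;
    memcpy(policy.master_key_identifier, key_id,
           FSCRYPT_KEY_IDENTIFIER_SIZE);

    int fd = open(path, O_RDONLY | O_DIRECTORY);  /* must be an empty dir */
    if (fd < 0)
        return -1;
    int rc = ioctl(fd, FS_IOC_SET_ENCRYPTION_POLICY, &policy);
    close(fd);
    /* files created under this directory are now encrypted transparently */
    return rc;
}
```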
“Is that encryption only at directory level?”
Directories are sub-folders, and my understanding is that you encrypt every file within the folder.
But you can build a tree out of them: an encrypted folder, an unencrypted folder, or a different encrypted folder with a different encryption key on it.
“So is this only available for Linux OS?”
That’s correct – currently the libraries are Linux ones, the fscrypt library, and again, we have adapted the file system to call out to that encryption library to do the work for us.
That same hook is available on our non-Linux ports, though – for Reliance Nitro, and Reliance Edge for that matter.
There are hardware designs that have encryption at the hardware level – an actual chip that manages encryption. If you had a driver that worked with that hardware layer, we could, in a services project, incorporate that driver into the file system. The file system, which already has the hooks to call out, could then call out to the hardware chip on your design.
There is no general solution yet, because there is no general interface for those drivers yet.
“So speaking of ports, what if you implement the FS into a Linux OS and after a while you switch to a different operating system but still want to use the same Reliance FS, how does that work?”
At a simple level, the core of each file system we write is portable across all of the different OS ports that we have. For instance, Wind River VxWorks and Linux use the same core code. They actually use the same on-media format as well. So if a design was originally done on Linux and they later decide to switch to VxWorks, or vice versa, they can change the implementation at the driver level – at their OS level – and they don’t need to change what’s on the stored media.
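As an illustration of what “changing the implementation at the driver level” means, here is a hypothetical porting-layer interface – the names are invented, and this is not Tuxera’s actual porting layer:

```c
/* Hypothetical porting-layer sketch: the portable file system core talks
 * to the OS only through a small driver interface like this, so the same
 * core -- and the same on-media format -- can run on Linux, VxWorks, etc.
 * Names invented for illustration; not Tuxera's actual layer. */
#include <stdint.h>

struct block_driver {
    int (*read)(void *ctx, uint64_t sector, uint32_t count, void *buf);
    int (*write)(void *ctx, uint64_t sector, uint32_t count, const void *buf);
    int (*flush)(void *ctx);   /* barrier: all prior writes on media */
    void *ctx;                 /* OS- or board-specific state        */
};

/* Porting to a new OS means supplying these functions (plus mutex, timer,
 * and memory hooks); the core code and the media format stay unchanged. */
```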
The other factor is, for instance, a hypervisor or shared-storage solution, where you are writing to a given block of data from two different applications or designs – one could be Linux, one could be VxWorks or GHS Integrity – and they can all share the same common format. Before, the only option for that was a standard format like exFAT. The problem is that exFAT does not provide the data integrity that the Reliance family of file systems provides.
“So with a hypervisor, can you still use the FFX Tera in order to communicate with the flash?”
You could.
I think what you would end up with in that design is one of the hypervisor applications being entirely devoted to the flash. Flash media really requires a single owner to control it, because you’ll end up writing and then polling, waiting for something to finish.
If you had two different applications writing, they might be doing different things, both start polling, and one would catch the completion of the other. You just can’t do that. So really, the control at the FlashFX Tera level would probably live in one thread, and that would provide a common block interface that anything could use from that point. It’s just a block device, like a hard drive or an SSD.
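Here’s a sketch of that single-owner idea – hypothetical names, with pthreads standing in for what would be a dedicated guest or thread in a hypervisor design:

```c
/* Sketch: serialize all access to the flash manager so only one context
 * ever drives the program/poll cycle. Hypothetical names; pthreads used
 * for illustration -- in a hypervisor this would be one dedicated guest. */
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t flash_lock = PTHREAD_MUTEX_INITIALIZER;

int ffx_write_sectors(uint64_t sector, uint32_t count, const void *buf);

/* Every client goes through this wrapper, so a write-then-poll sequence
 * can never interleave with another client's and "catch" its completion. */
int shared_block_write(uint64_t sector, uint32_t count, const void *buf)
{
    pthread_mutex_lock(&flash_lock);
    int rc = ffx_write_sectors(sector, count, buf);   /* invented name */
    pthread_mutex_unlock(&flash_lock);
    return rc;
}
```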
“Okay thank you Thom, I think we left off at the part of security where the data has to be securely removed”
So we talked about security as encrypting and protecting the files, but we are also focused on security when removing that secure data. When you need to remove it, it needs to go now. You can’t queue the removal up so that it happens maybe a day later, or even an hour later, because the device could be in somebody else’s hands by that point.
I’m speaking on this topic at the Embedded World conference in a few hours. If you don’t have a conference pass but this topic is relevant to you, please reach out to our team here at Logic. We can provide you with the whitepaper I wrote on this topic, and I would be glad to meet with your design team to go over the points in my presentation.
Thank you very much Thom for the interesting presentation. I hope it has provided some useful insights to the people attending this talk today.
So now let’s check if we have some questions in the chat box. Earlier in the talk I asked if you could share your experiences on measures that you have in place in case of a power-loss. You may also unmute your mic and simply comment or ask a question, if you want. I see that there are no further questions.
I want to thank everyone for their attendance. Thank you Thom Denholm for joining us.
If you haven’t signed up for one of our other sessions, feel free to do so, and if you happen to have any questions after this session or want to receive more information, you can contact us.
Once again, thank you for your time and for being with us – we hope to speak to you soon!