3.4 Storage

All embedded systems require at least one form of persistent storage to start even the earliest stages of the boot process. Most systems, including embedded Linux systems, continue to use this same initial storage device for the rest of their operation, either to execute code or to access data. In comparison to traditional embedded software, however, Linux's use imposes greater requirements on the embedded system's storage hardware, both in terms of size and organization.

The size requirements were discussed in Chapter 1, and an overview of the typical storage device configurations was provided in Chapter 2. We will discuss the actual organization further in Chapter 7 and Chapter 8. For the moment, let us take a look at the persistent storage devices supported by Linux. In particular, we'll discuss the level of support provided for these devices and their typical use with Linux.

3.4.1 Memory Technology Devices

In Linux terminology, memory technology devices (MTDs) include all memory devices, such as conventional ROM, RAM, flash, and M-Systems' DiskOnChip (DOC). As explained by Michael Barr in Programming Embedded Systems in C and C++ (O'Reilly), such devices have their own capabilities, particularities, and limitations. Hence, to program and use an MTD device in their systems, embedded system developers traditionally use tools and methods specific to that type of device.

To avoid, as much as possible, having different tools for different technologies and to provide common capabilities among the various technologies, the Linux kernel includes the MTD subsystem. This provides a unified and uniform layer that enables a seamless combination of low-level MTD chip drivers with higher-level interfaces called user modules, as seen in Figure 3-1. These "user modules" should not be confused with kernel modules or any sort of user-land software abstraction. The term "MTD user module" refers to software modules within the kernel that enable access to the low-level MTD chip drivers by providing recognizable interfaces and abstractions to the higher levels of the kernel or, in some cases, to user space.

Figure 3-1. The MTD subsystem

MTD chip drivers register with the MTD subsystem by providing a set of predefined callbacks and properties in the mtd_info structure to the add_mtd_device( ) function. The callbacks an MTD driver has to provide are called by the MTD subsystem to carry out operations such as erase, read, write, and sync. The following is a list of MTD chip drivers already available:

DiskOnChip

These are the drivers for M-Systems' DOC technology. Currently, Linux supports the DOC 1000, DOC 2000, and DOC Millennium.

Common Flash Interface (CFI)

CFI is a specification developed by Intel, AMD, and other flash manufacturers. All CFI-compliant flash components have their configuration and parameters stored directly on the chip. Hence, the software interfaces for their detection, configuration, and use are standardized. The kernel includes code to detect and support CFI chips.

As the CFI specification allows for different commands to be made available by different chips, the kernel also includes support for two types of command sets implemented by two different chip families, Intel/Sharp and AMD/Fujitsu.

JEDEC

The JEDEC Solid State Technology Association (http://www.jedec.org/) is a standardization body. Its standards include a set for flash chips. It is also responsible for handing out identification numbers for such devices. Although the JEDEC flash standard is rendered obsolete by CFI, some chips still feature JEDEC compliance. The MTD subsystem supports the probing and configuration of such devices.

Non-DOC NAND flash

The most popular form of packaging for NAND flash is M-Systems' DOC devices. There are, however, other types of NAND flash chips on the market. The MTD subsystem supports a number of such devices using a separate driver from the DOC drivers. For a complete list of the devices supported by this driver, look in the include/linux/mtd/nand-ids.h file in the kernel sources.

Old non-CFI flash

Some flash chips are not CFI compliant, and some aren't even JEDEC compliant. The MTD subsystem therefore provides drivers that manipulate such devices according to their manufacturers' specifications. The devices supported in this fashion are non-CFI AMD-compatible flash chips, pre-CFI Sharp chips, and non-CFI JEDEC devices. Keep in mind, however, that these drivers are not updated as frequently as the drivers for more commonly used devices such as DOC or CFI memory devices.

RAM, ROM, and absent chips

The MTD subsystem provides drivers to access conventional RAM and ROM chips, mapped in a system's physical address space, as MTD devices. Since some of these chips may be connected to the system using a socket or some similar connector that lets you remove the chip, the MTD subsystem also provides a driver that can be used to preserve the registration order of the MTD device nodes in case one of the devices is removed and is therefore absent from the system.

Uncached RAM

If there is any system RAM that your CPU cannot cache, you can use this memory as an MTD device during normal system operation. Of course, the information stored on such a medium will be lost when the system's power is turned off.

Virtual devices for testing and evaluation

When adding or testing MTD support for your board's devices, you may sometimes want to test the operation of the user modules independently from the chip drivers. To this end, the MTD subsystem contains two MTD drivers that emulate real MTD hardware: a driver that emulates an MTD device using memory from the system's virtual address space, and another that emulates an MTD device using a normal block device.
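The registration mechanism described before this list, in which a chip driver hands the MTD subsystem a structure of properties and callbacks, can be illustrated with a small user-space model. This is only a sketch in Python, not kernel code: the MtdInfo class, the register_mtd() function, the registered list, and the make_ram_chip() helper are all hypothetical names chosen for illustration, and the real mtd_info structure carries many more fields than the handful shown here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MtdInfo:
    """Toy stand-in for mtd_info: device properties plus driver callbacks."""
    name: str
    size: int                                  # total size in bytes
    erasesize: int                             # erase block size in bytes
    read: Callable[[int, int], bytes]          # read(offset, length)
    write: Callable[[int, bytes], None]        # write(offset, data)
    erase: Callable[[int, int], None]          # erase(offset, length)
    sync: Callable[[], None]                   # flush pending operations


registered: List[MtdInfo] = []                 # hypothetical subsystem registry


def register_mtd(mtd: MtdInfo) -> int:
    """Hypothetical counterpart of add_mtd_device( ); returns the device index."""
    registered.append(mtd)
    return len(registered) - 1


def make_ram_chip(name: str, size: int, erasesize: int) -> MtdInfo:
    """Build a 'chip driver' whose callbacks operate on a RAM buffer."""
    mem = bytearray(b"\xff" * size)            # erased flash reads as 0xFF

    def read(offset: int, length: int) -> bytes:
        return bytes(mem[offset:offset + length])

    def write(offset: int, data: bytes) -> None:
        mem[offset:offset + len(data)] = data

    def erase(offset: int, length: int) -> None:
        mem[offset:offset + length] = b"\xff" * length

    def sync() -> None:
        pass                                   # nothing is buffered in this model

    return MtdInfo(name, size, erasesize, read, write, erase, sync)


index = register_mtd(make_ram_chip("ram0", 64 * 1024, 4096))
device = registered[index]
device.write(0, b"boot")                       # the "subsystem" calls the driver
```

The point of the design is that code above the registry only ever calls read, write, erase, and sync through the structure, so it does not need to know which kind of chip sits underneath.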

Since there is no universally agreed upon physical address location for MTD devices, the MTD subsystem requires customized mapping drivers^[14] to be able to see and manage the MTD devices present in a given system. As some systems and development boards have known MTD device configurations, the kernel contains a number of specific mapping drivers for a variety of such systems. It also contains a generic driver for accessing CFI flash chips on systems that have no specific mapping driver. If there are no appropriate mapping drivers for your system's memory devices, you may need to create a new one using existing ones as examples. The existing mapping drivers are found in the drivers/mtd/maps/ directory of the kernel sources.

^[14] A mapping driver is a special kind of MTD driver whose main task is to provide MTD chip drivers with the physical location of the MTD devices in the system and a set of functions for accessing these physical devices.

As with other kernel device drivers, an MTD chip driver can manage many instances of the same device. If you have two identical AMD CFI-compliant flash chips in your system, for instance, they might be managed as separate MTD devices by a single instance of the CFI driver, depending on their setup.^[15] To further facilitate customization of the storage space available in MTD devices, the MTD subsystem also allows for memory devices to be divided into multiple partitions. Much like hard disk partitions, each MTD partition is then accessible as a separate MTD device and can store data in formats entirely different from those of other partitions on the same device. In practice, as we saw in Chapter 2, memory devices are often divided in many partitions, each serving a specific purpose.

^[15] Identical chips placed on system buses are often arranged to appear as a single large chip.
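To make the idea of MTD partitions more concrete, here is a minimal user-space sketch of how a single memory device can be carved into independently addressed regions. It is a model only: the Partition and PartitionedFlash classes, the 4 MB device size, and the bootloader/kernel/rootfs layout are assumptions made for the example and do not reflect any particular board or the kernel's own partitioning code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Partition:
    name: str
    offset: int        # start of the partition within the parent device
    size: int          # length of the partition in bytes


class PartitionedFlash:
    """One backing device exposed as several independently addressed regions."""

    def __init__(self, size: int, partitions: List[Partition]) -> None:
        end = 0
        for part in sorted(partitions, key=lambda p: p.offset):
            if part.offset < end or part.offset + part.size > size:
                raise ValueError("partition %s overlaps or exceeds the device" % part.name)
            end = part.offset + part.size
        self.mem = bytearray(b"\xff" * size)
        self.parts: Dict[str, Partition] = {p.name: p for p in partitions}

    def _translate(self, name: str, offset: int, length: int) -> int:
        """Map a partition-relative offset to an offset in the parent device."""
        part = self.parts[name]
        if offset < 0 or offset + length > part.size:
            raise ValueError("access outside partition %s" % name)
        return part.offset + offset

    def read(self, name: str, offset: int, length: int) -> bytes:
        start = self._translate(name, offset, length)
        return bytes(self.mem[start:start + length])

    def write(self, name: str, offset: int, data: bytes) -> None:
        start = self._translate(name, offset, len(data))
        self.mem[start:start + len(data)] = data


# Hypothetical layout for a 4 MB flash device.
flash = PartitionedFlash(0x400000, [
    Partition("bootloader", 0x000000, 0x040000),
    Partition("kernel",     0x040000, 0x100000),
    Partition("rootfs",     0x140000, 0x2C0000),
])
flash.write("kernel", 0, b"zImage")            # offset 0 of "kernel" is 0x40000 on the chip
```

Each partition sees its own offset 0, and accesses that would spill into a neighboring partition are rejected, which is what lets different partitions hold data in entirely different formats.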

Once the MTD chip drivers are properly configured for a system's memory devices, the storage space available on each MTD device can be managed by an MTD user module. The user module enforces a storage format on the MTD devices it manages, and it provides, as described earlier, interfaces and abstractions recognized by higher-level kernel components. It is important to note that MTD user modules are not fully interoperable with all MTD drivers. In fact, certain MTD user modules may not be usable with certain MTD drivers because of technical or even legal limitations. Until recently, for example, it was impossible to use the JFFS2 user module with any form of NAND flash, including DOC devices, because JFFS2 did not deal with NAND flash chip particularities. At the time of this writing, work is under way to fix the situation, and JFFS2 may actually be usable with NAND devices by the time you read this. The following list describes the existing MTD user modules and their characteristics:

JFFS2

JFFS2 is a successor and a complete rewrite by Red Hat of the JFFS discussed below. As its name implies, the Journalling Flash File System Version 2 (JFFS2) implements a journalling filesystem on the MTD device it manages. In contrast with other memory device storage schemes, it does not attempt to provide a translation layer that enables the use of a traditional filesystem with the device. Instead, it implements a log-structured filesystem directly on the MTD device. The filesystem structure itself is recreated in RAM at mount time by JFFS2 through a scan of the MTD device's log content.

In addition to its log-structured filesystem, JFFS2 implements wear leveling and data compression on the MTD device it manages, while providing power down reliability.

Power down reliability is crucial to embedded systems, because they may lose power at any time. The system must then gracefully restart and be capable of restoring a filesystem's content without requiring outside intervention. If your Linux, or even Windows, workstation has ever lost power accidentally, you probably had to wait for the system to check the filesystems' integrity upon rebooting and may have even been prompted to perform some checks manually. Usually, this is a situation that is not acceptable for an embedded system. JFFS2 avoids these problems; it can gracefully recover regardless of power failures. Note, however, that it does not guarantee rollback of interrupted filesystem operations. If an application had called write( ) to overwrite old data with new data, for example, it is possible that the old data may have been partially overwritten and that the new data was not completely committed. Both data sets are then lost. Your system should be built to check on startup for this type of failure.

Wear leveling is necessary, because flash devices have a limited number of erases per block, which is often 100,000 but may differ between manufacturers. Once this limit is reached, the block's correct operation is not guaranteed by the manufacturer. To avoid using some blocks more than others and thereby shortening the life of the device, JFFS2 implements an algorithm that ensures uniform usage of all the blocks on the flash device, hence leveling the wear of its blocks.

Because flash hardware is usually more expensive and slower than RAM hardware, it is desirable to compress the data stored on flash devices and then decompress it to RAM before using it. This is precisely what JFFS2 does. For this reason, eXecute In Place (XIP)^[16] is not possible with JFFS2.

^[16] XIP is the ability to execute code directly from ROM without copying it to RAM.

JFFS2 has been widely adopted as the filesystem of choice for MTD devices. The Familiar project, http://familiar.handhelds.org/, for instance, uses JFFS2 to manage the flash available in Compaq's iPAQ.

As I said earlier, though JFFS2 cannot currently be used with NAND devices, including DOC devices, this is under development and may be available by the time you read this. Meanwhile, JFFS2 can be used with other types of MTD devices and is even sometimes used with CompactFlash devices, which actually behave as IDE hard drives connected to the system's IDE interface.

NFTL

The NAND Flash Translation Layer (NFTL) implements a virtual block device on NAND flash chips. As seen in Figure 3-1, a disk-style filesystem, such as FAT or ext2, must then be used to store and retrieve data from an NFTL-managed MTD device.

It is important to note that M-Systems holds patents on the algorithms implemented by NFTL and, as such, permits the use of these algorithms only with licensed DOC devices. Though NFTL is itself reliable in case of power failure, you would need to use a journalling filesystem over NFTL to make your system's storage power-failure proof. An embedded system that crashes while running ext2 over NFTL, for example, would require a filesystem integrity check on startup, much like a normal Linux workstation.

JFFS

The Journalling Flash File System (JFFS) was originally developed by Axis Communications AB in Sweden and was aimed at embedded systems as a crash/power down-safe filesystem. Though JFFS has reportedly been used with NAND devices (a feature likely to be available in JFFS2 by the time you read this), it has largely been replaced by JFFS2.

FTL

The Flash Translation Layer implements a virtual block device on NOR flash chips. As with NFTL, a "real" filesystem must then be used to manage the data on the FTL-handled device.

FTL, too, is subject to patents. In the U.S., it may be used only on PCMCIA hardware. Instead of using FTL on NOR flash chips, you may want to go with JFFS2 directly, as it is not hampered by any patents and is a better fit for the task.

Char device

This user module enables character device-like access to MTD devices. Using it, each MTD device can be directly manipulated as a character device, in the Unix sense. It is mostly useful for the initial setup of an MTD device. As we'll see in Chapter 7, there is a specific way in which reading and writing to this char device must be done for the data involved to be valid. Before you write to the char device, for example, the underlying MTD device usually must be erased first.

Caching block device

This user module provides a block device interface to MTD devices. The usual workstation and server filesystems can then be used on these devices. Although this module is of little use in production embedded systems, which require features such as those provided by JFFS2, it is useful for writing data to flash partitions without having to explicitly erase the content of the partition beforehand. It may also be used for setting up systems whose filesystems will be mounted read-only in the field.

This module is called the "caching" block device user module, because it works by caching blocks in RAM, modifying them as requested, erasing the proper MTD device block, and then rewriting the modified block. There is, of course, no power failure reliability to be found here.

Read-only block device

The read-only block device user module provides the exact same capabilities as the caching block device user module, except that no RAM caching is implemented. All filesystem content is therefore read-only.
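The erase-before-write rule mentioned for the char device, and the cache, modify, erase, and rewrite cycle performed by the caching block device, can both be seen in a small model. The sketch below is a user-space simulation under simplifying assumptions: the FlashModel class, the cached_write() function, and the 16-byte erase block are hypothetical, and the model assumes the common flash behavior in which programming can only turn 1 bits into 0 bits while an erase resets a whole block to 0xFF.

```python
ERASE_SIZE = 16  # hypothetical, unrealistically small erase block


class FlashModel:
    """Flash-like memory: programming clears bits, erasing resets a block."""

    def __init__(self, blocks: int) -> None:
        self.mem = bytearray(b"\xff" * (blocks * ERASE_SIZE))
        self.erase_counts = [0] * blocks

    def program(self, offset: int, data: bytes) -> None:
        # A raw write can only change 1 bits to 0 bits.
        for i, byte in enumerate(data):
            self.mem[offset + i] &= byte

    def erase(self, block: int) -> None:
        start = block * ERASE_SIZE
        self.mem[start:start + ERASE_SIZE] = b"\xff" * ERASE_SIZE
        self.erase_counts[block] += 1

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.mem[offset:offset + length])


def cached_write(flash: FlashModel, offset: int, data: bytes) -> None:
    """Read-modify-erase-write, one erase block at a time."""
    pos = 0
    while pos < len(data):
        block = (offset + pos) // ERASE_SIZE
        start = block * ERASE_SIZE
        inner = offset + pos - start
        chunk = data[pos:pos + ERASE_SIZE - inner]
        cache = bytearray(flash.read(start, ERASE_SIZE))  # cache the block in RAM
        cache[inner:inner + len(chunk)] = chunk           # modify the cached copy
        flash.erase(block)                                # erase the flash block
        flash.program(start, bytes(cache))                # rewrite the whole block
        pos += len(chunk)


flash = FlashModel(4)
flash.program(0, b"AAAA")
flash.program(0, b"BBBB")              # no erase first: bits are ANDed together
corrupted = flash.read(0, 4)           # neither "AAAA" nor "BBBB"
cached_write(flash, 0, b"BBBB")        # the caching path erases for us
cached_write(flash, 14, b"WXYZ")       # spans blocks 0 and 1
```

In this model, a power failure between the erase and the rewrite would lose the entire cached block, which is the reason no power failure reliability should be expected from this approach. The erase_counts list also shows that every small write costs a full block erase.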

As you can see, the MTD subsystem is quite rich and elaborate. Even though its use is complicated by the rules that govern the proper matching of MTD user modules with MTD chip drivers, it is fairly flexible and is effective in providing uniform and unified access to memory devices. The Memory Technology Device Subsystem project web site is found at http://www.linux-mtd.infradead.org/ and contains documentation regarding the programming API for implementing MTD user modules and MTD chip drivers. It also contains information regarding the MTD mailing list and a fairly elaborate MTD-JFFS-HOWTO by Vipin Malik.

In Chapter 7, we will continue our discussion of the MTD subsystem and will detail the setup and configuration instructions for using MTD devices in your embedded system.
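Before leaving the MTD subsystem, the wear leveling idea described for JFFS2 earlier in this section can be made concrete with a toy comparison. This is not JFFS2's actual algorithm, which is considerably more involved; it is a deliberately naive sketch, with hypothetical function names, that contrasts always rewriting the same block with always choosing the least-erased block. The 100,000 figure is the typical per-block erase limit quoted earlier.

```python
ERASE_LIMIT = 100_000   # typical per-block erase limit quoted in the text
BLOCKS = 8              # hypothetical number of erase blocks


def naive_rewrites(blocks: int, rewrites: int) -> list:
    """Always reuse block 0: every erase lands on the same block."""
    counts = [0] * blocks
    counts[0] = rewrites
    return counts


def leveled_rewrites(blocks: int, rewrites: int) -> list:
    """Always erase the least-worn block (ties go to the lowest index)."""
    counts = [0] * blocks
    for _ in range(rewrites):
        target = min(range(blocks), key=lambda b: counts[b])
        counts[target] += 1
    return counts


naive = naive_rewrites(BLOCKS, 1000)
leveled = leveled_rewrites(BLOCKS, 1000)

# Rewrites possible before some block reaches its erase limit.
lifetime_naive = ERASE_LIMIT             # block 0 absorbs every erase
lifetime_leveled = ERASE_LIMIT * BLOCKS  # erases are spread over all blocks
```

With eight blocks, spreading the erases evenly multiplies the number of rewrites the device can absorb by eight in this model. A real implementation must also cope with blocks holding rarely changed data, which this sketch ignores.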

3.4.2 ATA-ATAPI (IDE)

The AT Attachment (ATA)^[17] was developed in 1986 by three companies: Imprimis, Western Digital, and Compaq. It was initially used only by Compaq but eventually became quite popular when Conner Peripherals began providing its IDE drives through retail stores. By 1994, ATA was an ANSI standard. Different versions of the standard have since been developed allowing faster transfer rates and enhanced capabilities. Along the way, the ATA Packet Interface (ATAPI) was developed by CD-ROM manufacturers with the help of Western Digital and Oak Technology. ATAPI allows for CD-ROM and tape devices to be accessible through the ATA interface using SCSI-like command packets. Today ATA and ATAPI are developed and maintained by ANSI, NCITS, and T13.

^[17] Although it is often referred to as "IDE," which stands for Integrated Drive Electronics, "ATA" is the real name of this interface.

Although only a fraction of traditional embedded systems ever need a permanent storage medium providing as much storage space as an IDE hard disk can, many embedded systems use a very popular ATA-compliant flash device, CompactFlash. Contrary to the flash devices discussed in Section 3.4.1, the CompactFlash's storage space can be accessed only using the ATA interface. Hence, from the software's perspective, and indeed from the hardware's perspective, it is indistinguishable from a small-sized IDE drive. Note that CompactFlash cards can also be accessed through CompactFlash-to-PCMCIA adapters. We will discuss the use of CompactFlash devices with Linux further in Chapter 7. Meanwhile, keep in mind that not all CompactFlash devices have the proper characteristics for use in embedded systems. In particular, some CompactFlash devices do not tolerate power failure, and may be permanently damaged following such a failure.

In embedded Linux systems, IDE and most other types of disks are usually set up as in a workstation or server. Typically, the disk holds the OS bootloader, the root filesystem, and possibly a swap partition. In contrast to most workstations and servers, however, not all embedded system monitors and bootloaders are ATA-capable. In fact, as we'll see in Chapter 9, most bootloaders are not ATA/IDE-capable. If you want to use an IDE disk in your system and an ATA-capable monitor or bootloader is not present in your system's flash, you need to have the kernel present in flash or in ROM with the boot monitor so that it may be accessible at system startup. You then have to configure your boot monitor to use this kernel on startup in order to have access to the IDE disk. In this case, you can still configure your root filesystem and swap partition to be on the IDE disk.

Linux's support for the ATA interface is quite extensive and mature. The ATA subsystem, located in the drivers/ide directory of the kernel sources, includes support, and sometimes bug fixes, for many chipsets. This support spans many architectures. In addition, the kernel supports PCMCIA IDE devices and provides a SCSI-emulation driver for use with ATAPI devices. The latter can be used in conjunction with a SCSI driver to control an ATAPI device for which there is still no existing ATAPI native driver. Though it is no longer necessary since the 2.5 kernel development series, this functionality was mostly useful to users with workstations equipped with CD-RW drives, since the tools available to operate these devices in Linux used to require that the underlying hardware be SCSI.

Given the importance of ATA/IDE support, most modifications and updates posted to the kernel mailing list are directly integrated into the kernel. This contrasts with other subsystems where maintainers provide a separate up-to-date version through the subsystem's project web site, while the kernel contains a stable version that is updated every so often when the maintainers send a patch or, more commonly, a set of patches to Linus. There are, however, ATA/IDE-related tools, primarily hdparm and fdisk, maintained outside the kernel, mainly because they are user tools and are not required for the kernel's normal operation. hdparm gets and sets IDE hard disk parameters using the ioctl( ) commands supported by ATA/IDE drivers in the kernel. fdisk is used to view and modify disk partitions. If you have ever installed Linux on a workstation, you are probably already familiar with fdisk. Note that this utility is not limited to IDE hard disks and can be used with SCSI disks, too.
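To give a feel for the kind of ioctl( ) request a tool such as hdparm issues, the following sketch asks a disk driver for its geometry. HDIO_GETGEO and the hd_geometry layout come from the kernel's linux/hdreg.h header; everything else, including the function names and the use of Python rather than C, is an assumption made for illustration. The struct format relies on native alignment matching the kernel's structure on the machine where the script runs, and get_geometry() needs read permission on a real block device.

```python
import fcntl
import struct

HDIO_GETGEO = 0x0301       # request number from <linux/hdreg.h>
GEO_FORMAT = "BBHL"        # struct hd_geometry: heads, sectors, cylinders, start


def decode_geometry(raw: bytes) -> dict:
    """Unpack a raw hd_geometry buffer into named fields."""
    heads, sectors, cylinders, start = struct.unpack(GEO_FORMAT, raw)
    return {"heads": heads, "sectors": sectors,
            "cylinders": cylinders, "start": start}


def get_geometry(device_path: str) -> dict:
    """Ask the driver behind device_path for its geometry via ioctl( )."""
    buf = bytearray(struct.calcsize(GEO_FORMAT))
    with open(device_path, "rb") as dev:
        fcntl.ioctl(dev, HDIO_GETGEO, buf)   # the kernel fills buf in place
    return decode_geometry(bytes(buf))


# Decoding can be tried without hardware by packing a synthetic buffer.
sample = decode_geometry(struct.pack(GEO_FORMAT, 16, 63, 1024, 0))
```

On a system with an IDE disk, calling get_geometry("/dev/hda") as root would be the hardware-backed equivalent, where /dev/hda is merely the conventional name of the first IDE disk and may not exist on your system.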

The main starting point for information on Linux's ATA/IDE capabilities is the Linux ATA Development Project web site located at http://www.linux-ide.org/. In addition to providing access to the ATA-related user tools, it provides links to many resources relevant to ATA. Also of importance is the ide.txt file located in the Documentation directory of the kernel sources, which contains information on the kernel's support for IDE devices and how to configure the kernel to properly access such devices.

Several non-Linux-specific ATA/IDE resources are available both online and in print. PC Hardware in a Nutshell by Robert Bruce Thompson and Barbara Fritchman Thompson (O'Reilly) contains a full chapter on IDE and SCSI hard disk interfaces, including a comparison of these interfaces. Although the discussion centers on high-level issues, it is a good introduction to the world of ATA/IDE and may be helpful in choosing a hard disk interface. For a more in-depth discussion, you may want to have a look at the Enhanced IDE FAQ, available from http://www.faqs.org/, which contains tips and tricks resulting from the cumulative knowledge available on the comp.sys.ibm.pc.hardware.storage newsgroup. Finally, if you really want to know all the ins and outs of the ATA interface, purchase the relevant standards documents from ANSI. Before you do so, however, be sure to read the relevant portions of the kernel's sources, as they, too, often contain hard-to-find information.

3.4.3 SCSI

As described in Section 3.2.8, the use of SCSI storage devices in embedded Linux systems is limited. When used, these devices are set up and configured in much the same way they would be in a server. You may therefore follow the instructions and recommendations provided in any appropriate system administration book or online manual. The documentation and resources mentioned in Section 3.2.8 are, of course, still recommended. As an introduction to SCSI storage devices, PC Hardware in a Nutshell (O'Reilly) contains a brief discussion of SCSI storage devices and a comparison with ATA/IDE.