Let's take a look at the kernel's operation once it's installed on your target and ready to run. Because the algorithms and underlying source code are the same for embedded and regular systems, the kernel behaves almost exactly as it would on a workstation or a server. For this reason, other books and online material on the subject, such as Linux Device Drivers and Understanding the Linux Kernel from O'Reilly, are much more appropriate for finding in-depth explanations of the kernel. There are, nevertheless, aspects that are particular to embedded Linux systems or that warrant particular emphasis.
The Linux kernel is a very stable and mature piece of software. This, however, does not mean that it or the hardware it relies on never fails. Linux Device Drivers covers issues such as oops messages and system hangs. In addition to keeping these issues in mind during your design, you should think about the most common form of kernel failure: the kernel panic.
When a fatal error occurs and is caught by the kernel, it stops all processing and emits a kernel panic message. There are many reasons a kernel panic can occur. One of the most frequent is forgetting to tell the kernel where its root filesystem is located. In that case, the kernel boots normally and panics when it tries to mount its root filesystem.
The only means of recovery in case of a kernel panic is a complete system reboot. For this reason, the kernel accepts a boot parameter that indicates the number of seconds it should wait after a kernel panic to reboot. If you would like the kernel to reboot one second after a kernel panic, for instance, you would pass the following sequence as part of the kernel's boot parameters: panic=1.
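To make the parameter format concrete, here is a minimal userspace sketch that extracts the panic timeout from a command-line string. It is not kernel code: the helper names find_boot_param and panic_timeout_from_cmdline are hypothetical, and the sketch assumes parameters are separated by single spaces and written as name=value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): scan a space-separated kernel
 * command line for "name=value" and return a pointer to the value, or
 * NULL if the parameter is absent. A match must start at the beginning
 * of a parameter, so "panic=" is not found inside "mypanic=".
 */
const char *find_boot_param(const char *cmdline, const char *name)
{
    size_t len;
    const char *p;

    assert(cmdline != NULL && name != NULL);
    len = strlen(name);
    p = cmdline;

    while (*p != '\0') {
        /* Skip the separators between parameters. */
        while (*p == ' ')
            p++;
        if (strncmp(p, name, len) == 0 && p[len] == '=')
            return p + len + 1;
        /* Advance to the end of the current parameter. */
        while (*p != '\0' && *p != ' ')
            p++;
    }
    return NULL;
}

/*
 * Return the reboot delay requested with panic=N, or 0 when the
 * parameter is missing. Zero is the kernel's default and means that
 * no automatic reboot takes place.
 */
int panic_timeout_from_cmdline(const char *cmdline)
{
    const char *value = find_boot_param(cmdline, "panic");

    return value != NULL ? atoi(value) : 0;
}
```

With an illustrative command line such as "console=ttyS0,115200 root=/dev/mtdblock2 panic=1" (the console and root values are made up for this example), panic_timeout_from_cmdline returns 1. The same find_boot_param helper could be used to check that a root= parameter is present before a boot configuration is deployed.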
Depending on your setup, however, a simple reboot may not be sufficient. In the case of our control module, for instance, a simple reboot may even be dangerous, since the chemical or mechanical process being controlled may get out of hand. For this reason, we need to change the kernel's panic function to notify a human operator who could then use emergency manual procedures to control the process. Of course, the actual panic behavior of your system depends on the type of application your system is used for.
The code for the kernel's panic function, panic(), is in the kernel/panic.c file in the kernel's sources. The first observation to be made is that the panic function's default output goes to the console.[6] Since your system may not even have a terminal, you may want to modify this function according to your particular hardware. An alternative to the terminal, for example, would be to write the actual error string in a special section of flash memory that is specifically set aside for this purpose. At the next reboot, you would be able to retrieve the text information from that flash section and attempt to solve the problem.
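One possible shape for such a flash-backed log is sketched below as ordinary userspace C that operates on a memory buffer standing in for the reserved flash section. Everything in it is an assumption made for illustration: the record layout, the PANIC_LOG_MAGIC value, the additive checksum, and the function names are hypothetical, and a real implementation would go through your board's flash driver rather than memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PANIC_LOG_MAGIC   0x50414E43u  /* arbitrary marker, ASCII "PANC" */
#define PANIC_LOG_MSG_MAX 128

/* Hypothetical layout of the record kept in the reserved flash section. */
struct panic_log_record {
    uint32_t magic;                    /* identifies a valid record */
    uint32_t length;                   /* bytes used in msg, no NUL */
    uint32_t checksum;                 /* simple sum of the msg bytes */
    char     msg[PANIC_LOG_MSG_MAX];
};

static uint32_t panic_log_checksum(const char *msg, uint32_t length)
{
    uint32_t sum = 0;
    uint32_t i;

    for (i = 0; i < length; i++)
        sum += (unsigned char)msg[i];
    return sum;
}

/* Panic path: store the (possibly truncated) message in the record. */
void panic_log_store(struct panic_log_record *rec, const char *msg)
{
    size_t len;

    assert(rec != NULL && msg != NULL);
    len = strlen(msg);
    if (len > PANIC_LOG_MSG_MAX)
        len = PANIC_LOG_MSG_MAX;

    memset(rec, 0, sizeof(*rec));
    memcpy(rec->msg, msg, len);
    rec->length = (uint32_t)len;
    rec->checksum = panic_log_checksum(rec->msg, rec->length);
    rec->magic = PANIC_LOG_MAGIC;      /* set last: marks the record complete */
}

/*
 * Next boot: copy the saved message into out (NUL-terminated) and return
 * its length, or return -1 if the section holds no valid record or the
 * output buffer is too small.
 */
int panic_log_load(const struct panic_log_record *rec, char *out, size_t out_size)
{
    assert(rec != NULL && out != NULL);
    if (rec->magic != PANIC_LOG_MAGIC || rec->length > PANIC_LOG_MSG_MAX)
        return -1;
    if (rec->checksum != panic_log_checksum(rec->msg, rec->length))
        return -1;
    if (out_size < (size_t)rec->length + 1)
        return -1;

    memcpy(out, rec->msg, rec->length);
    out[rec->length] = '\0';
    return (int)rec->length;
}
```

The magic number and checksum let the boot-time reader distinguish a complete record from an erased or partially written section, which matters because a panicking system may lose power before the write finishes.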
[6] The console is the main terminal to which all system messages are sent.
Whether you are interested in the actual text message or not, you can register your own panic function with the kernel. This function will be called by the kernel's panic function in the event of a kernel panic and can be used to carry out such things as signaling an emergency.
The list that holds the functions called by the kernel's own panic function is panic_notifier_list. The notifier_chain_register function is used to add an item to this list. Conversely, notifier_chain_unregister is used to remove an item from this list.
The location of your own panic function has little importance, but the registration of this function must be done during system initialization. In our case, we add a mypanic.c file in the kernel/ directory of the kernel sources and modify that directory's Makefile accordingly. Here is the mypanic.c for our control module:
    #include <linux/kernel.h>
    #include <linux/init.h>
    #include <linux/notifier.h>

    static int my_panic_event(struct notifier_block *,
                              unsigned long,
                              void *);

    static struct notifier_block my_panic_block = {
            .notifier_call = my_panic_event,
            .next          = NULL,
            .priority      = INT_MAX
    };

    int __init register_my_panic(void)
    {
            printk("Registering buzzer notifier \n");

            notifier_chain_register(&panic_notifier_list,
                                    &my_panic_block);

            return 0;
    }

    void ring_big_buzzer(void)
    {
            ...
    }

    static int my_panic_event(struct notifier_block *this,
                              unsigned long event,
                              void *ptr)
    {
            ring_big_buzzer();

            return NOTIFY_DONE;
    }

    module_init(register_my_panic);
The module_init(register_my_panic); statement ensures that the register_my_panic function is called during the kernel's initialization without requiring any modification of the kernel's startup functions. The registration function adds my_panic_block to the list of other blocks in the panic notifier list. The notifier_block structure has three fields. The first field is the function to be called, the second is a pointer to the next notifier block, and the third is the priority of this block. In our case, we want the highest possible priority, hence the use of INT_MAX.
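The effect of the priority field can be seen in a small userspace model of a notifier list. This is a sketch under stated assumptions, not the kernel's source: the model_chain_* functions and the two example handlers are hypothetical names, the structure only mirrors the three fields described above, and the model assumes the list is kept sorted by descending priority and ignores the handlers' return values.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define NOTIFY_DONE 0   /* mirrors the kernel constant's name and value */

/* Userspace stand-in with the three fields described in the text. */
struct notifier_block {
    int (*notifier_call)(struct notifier_block *self,
                         unsigned long event, void *data);
    struct notifier_block *next;
    int priority;
};

/* Insert nb so that the list stays sorted by descending priority. */
void model_chain_register(struct notifier_block **list,
                          struct notifier_block *nb)
{
    assert(list != NULL && nb != NULL);
    while (*list != NULL && (*list)->priority >= nb->priority)
        list = &(*list)->next;
    nb->next = *list;
    *list = nb;
}

/* Remove nb; return 0 on success, -1 if it was not registered. */
int model_chain_unregister(struct notifier_block **list,
                           struct notifier_block *nb)
{
    assert(list != NULL && nb != NULL);
    while (*list != NULL) {
        if (*list == nb) {
            *list = nb->next;
            nb->next = NULL;
            return 0;
        }
        list = &(*list)->next;
    }
    return -1;
}

/* Call every registered block in list order; return how many ran. */
int model_chain_call(struct notifier_block *list,
                     unsigned long event, void *data)
{
    int calls = 0;

    while (list != NULL) {
        struct notifier_block *next = list->next;

        list->notifier_call(list, event, data);
        calls++;
        list = next;
    }
    return calls;
}

/* Example handlers: each appends one character to the trace string in data. */
static void record_call(void *data, char tag)
{
    char *trace = data;
    size_t n = strlen(trace);

    trace[n] = tag;
    trace[n + 1] = '\0';
}

int buzzer_event(struct notifier_block *self, unsigned long event, void *data)
{
    (void)self;
    (void)event;
    record_call(data, 'B');
    return NOTIFY_DONE;
}

int logger_event(struct notifier_block *self, unsigned long event, void *data)
{
    (void)self;
    (void)event;
    record_call(data, 'L');
    return NOTIFY_DONE;
}
```

If a block using logger_event is registered with priority 0 and a block using buzzer_event is then registered with priority INT_MAX, the buzzer block ends up at the head of the list and model_chain_call leaves "BL" in the trace buffer: the highest-priority handler runs first regardless of registration order.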
In case of kernel panic, my_panic_event is called as part of the kernel's notification of all panic functions. In turn, this function calls on ring_big_buzzer, which contains code to start a loud alarm to attract the human operator's attention to the imminent problem.