Monday, July 18, 2011

Dealing with Kernel Failure

The text below is from the book Building Embedded Linux Systems by Karim Yaghmour. I have copied it here for self-reference.

The Linux kernel is a very stable and mature piece of software. This does not mean, however, that it or the hardware it relies on never fails. Linux Device Drivers covers issues such as oops messages and system hangs. In addition to keeping these issues in mind during your design, you should think about the most common form of kernel failure, the kernel panic. When a fatal error occurs and is caught by the kernel, it stops all processing and emits a kernel panic message.

There are many reasons a kernel panic can occur. One of the most frequent is forgetting to tell the kernel where its root filesystem is (the root= boot parameter). In that case, the kernel will boot normally and then panic when it tries to mount its root filesystem.

The only means of recovery from a kernel panic is a complete system reboot. For this reason, the kernel accepts a boot parameter that indicates the number of seconds it should wait after a kernel panic before rebooting. If you would like the kernel to reboot one second after a kernel panic, for instance, you would pass the following as part of the kernel's boot parameters: panic=1.
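To check which panic timeout a given boot command line requests, something like the following could be used. This is only a minimal sketch: the function name panic_timeout and the sample command line are my own, not from the book. On a running system the command line can be read from /proc/cmdline, and the value currently in effect is exposed in /proc/sys/kernel/panic.

```python
from typing import Optional


def panic_timeout(cmdline: str) -> Optional[int]:
    """Return the value of the last panic=N parameter, or None if absent.

    Hypothetical helper for illustration; it only parses a string and
    does not touch the running kernel.
    """
    timeout = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "panic" and sep:
            try:
                timeout = int(value)
            except ValueError:
                # Ignore a malformed value such as "panic=abc".
                pass
    return timeout


# Sample command line (made up for this example).
sample = "console=ttyS0,115200 root=/dev/mtdblock2 panic=1"
print(panic_timeout(sample))
```

A return value of None means the parameter was not given, in which case the kernel default applies (0, meaning it waits forever instead of rebooting).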

The code for the kernel's panic function, panic(), is in the kernel/panic.c file in the kernel's sources. The first observation to be made is that the panic function's default output goes to the console. If your system does not even have a terminal, however, you may want to modify this function to suit your particular hardware. An alternative to the terminal, for example, would be to write the actual error string to a special section of flash memory that is set aside specifically for this purpose. At the next reboot, you would be able to retrieve the text from that flash section and attempt to solve the problem.
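The flash idea above can be sketched as a small record layout: a marker, a length, a checksum, and then the error string. The real thing would be C code in the kernel's panic path writing through your flash driver; the Python below only simulates the layout with a bytearray standing in for the reserved flash section. The names MAGIC, REGION_SIZE, save_panic_message and load_panic_message, and the record format itself, are assumptions of mine and not something the book or the kernel defines.

```python
import struct
import zlib
from typing import Optional

MAGIC = b"PNIC"                   # hypothetical marker for a valid record
HEADER = struct.Struct("<4sHI")   # magic, payload length, CRC32 of payload
REGION_SIZE = 256                 # assumed size of the reserved flash section


def erased_region(size: int = REGION_SIZE) -> bytearray:
    """Simulate an erased flash section (erased flash reads as 0xFF)."""
    return bytearray(b"\xff" * size)


def save_panic_message(region: bytearray, message: str) -> None:
    """Store the message at the start of the region, truncating if needed."""
    room = min(len(region) - HEADER.size, 0xFFFF)
    payload = message.encode("utf-8", errors="replace")[:room]
    header = HEADER.pack(MAGIC, len(payload), zlib.crc32(payload))
    region[:HEADER.size] = header
    region[HEADER.size:HEADER.size + len(payload)] = payload


def load_panic_message(region: bytearray) -> Optional[str]:
    """Return the stored message, or None if no valid record is present."""
    if len(region) < HEADER.size:
        return None
    magic, length, crc = HEADER.unpack_from(region, 0)
    if magic != MAGIC or length > len(region) - HEADER.size:
        return None
    payload = bytes(region[HEADER.size:HEADER.size + length])
    if zlib.crc32(payload) != crc:
        return None
    return payload.decode("utf-8", errors="replace")


# "Panic" time: store the string. "Next boot": read it back.
flash = erased_region()
save_panic_message(flash, "Kernel panic - not syncing: VFS: Unable to mount root fs")
print(load_panic_message(flash))
```

The checksum is there so that a half-written record (power lost in the middle of the write) is rejected at the next boot instead of being read back as garbage.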


