Kernel Panics and Oopses

A kernel oops is not the same thing as a panic. When the kernel panics, it cannot continue running and the system must be restarted. When an oops occurs, the kernel may be able to continue operating. In some cases an oops leads to a panic if something vital was affected. Oopses in device drivers do not normally cause panics; however, they may leave the system in a semi-usable state.

Oopses are caused by the kernel dereferencing an invalid pointer. In a user-space program this would normally cause a segmentation fault, also known as a segfault, from which the program normally cannot recover. When it occurs in the kernel, however, it is called an oops and does not necessarily leave the kernel unusable. An oops can be caused by both hardware problems and kernel programming errors.

A frequently asked question is why a Linux system does not save a crash dump when the kernel panics. There are several extensions which enable crash dumps on Linux boxes. Red Hat's NetDump facility allows dumps to be made over the network to a specified dump server. Red Hat has more recently introduced a diskdump facility for certain disk subsystem hardware.

The main reason Linux does not save crash dumps by default is the nature of the x86 hardware architecture. When the kernel panics, a dump must be written without kernel support. With an OS running on dedicated hardware, such as Solaris on SPARC, this is not difficult to achieve. A SPARC system will save the contents of the system memory and then write it to disk upon subsequent bootup. The PC BIOS has no means of preserving the state of memory when the system is rebooted, so there is no reliable way to save a crash dump.
A typical oops looks like this:

    Unable to handle kernel NULL pointer dereference at virtual address 00000014
    *pde = 00000000
    Oops: 0000
    CPU:    0
    EIP:    0010:[]
    EFLAGS: 00210213
    eax: 00000000   ebx: c6155c6c   ecx: 00000038   edx: 00000000
    esi: c672f000   edi: c672f07c   ebp: 00000004   esp: c6155b0c
    ds: 0018   es: 0018   ss: 0018
    Process tar (pid: 2293, stackpage=c6155000)
    Stack: c672f000 c672f07c 00000000 00000038 00000060 00000000 c6d7d2a0 c6c79018
           00000001 c6155c6c 00000000 c6d7d2a0 c017eb4f c6155c6c 00000000 00000098
           c017fc44 c672f000 00000084 00001020 00001000 c7129028 00000038 00000069
    Call Trace: [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]
    Code: 8b 40 14 ff d0 89 c2 8b 06 83 c4 10 01 c2 89 16 8b 83 8c 01

The oops displays the type of error that occurred, in this case "Unable to handle kernel NULL pointer dereference". The value after "Oops:" is the x86 page-fault error code; 0000 here indicates a kernel-mode read of a page that was not present. When multiple oopses occur, only the first can be relied upon, because later ones are often side effects of the first. The EIP shows the code segment and instruction address that were being executed. Also printed are the contents of the CPU's registers and a stack backtrace. The call trace is the list of functions the process was in when the oops occurred.

The numerical data here is nearly useless for debugging purposes on its own, because it is unique to the kernel it was running on. The only way to decipher the addresses is through the symbol map, typically /boot/System.map, which maps function names to their numeric addresses. In kernels prior to 2.6, the ksymoops utility was used to rewrite the numeric addresses into human-readable function names. As of 2.6, ksymoops is no longer used (see Documentation/oops-tracing.txt in a 2.6 kernel tree). The klogd daemon, which is responsible for passing kernel messages on to syslogd, performs the same lookups from the System.map that ksymoops used to do. The oops that gets sent to /var/log/messages is therefore the one to use for debugging.
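The System.map lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how ksymoops or klogd are actually implemented. The map excerpt is invented (the addresses and the example_* symbol names are hypothetical); a real lookup would read the System.map that matches the running kernel. The two addresses resolved at the end are values taken from the stack dump above that look like kernel text addresses; whether they are really return addresses is an assumption.

```python
import bisect

# Hypothetical excerpt in System.map format: "<hex address> <type> <symbol>".
# These addresses and symbol names are invented for illustration only.
SAMPLE_MAP = """\
c0100000 T _text
c017e900 T example_read_block
c017fb00 T example_write_block
c0180400 T example_flush
"""


def load_symbols(text):
    """Parse System.map-style text into (address, name) pairs sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _symbol_type, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return (name, offset) for the nearest symbol at or below address, else None."""
    addrs = [addr for addr, _name in symbols]
    index = bisect.bisect_right(addrs, address) - 1
    if index < 0:
        return None  # address lies below the first symbol in the map
    base, name = symbols[index]
    return name, address - base


symbols = load_symbols(SAMPLE_MAP)

# Two values from the stack dump that fall inside the (hypothetical) map range.
for value in (0xC017EB4F, 0xC017FC44):
    name, offset = resolve(symbols, value)
    print(f"{value:08x} -> {name}+0x{offset:x}")
```

The result is the familiar symbol+offset form. The lookup only makes sense against the map of the exact kernel that produced the oops, since a different build places the same functions at different addresses.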