Tuesday, February 18, 2014

Linux OS: What happens inside a system call? Or, how does Linux handle a system call?

In simple words: a system call is an interface between kernel space and user space. User-space applications often need to interact with kernel resources, and a system call is how they do it. System calls that exchange data through user-supplied pointers rely on the kernel functions copy_to_user()/copy_from_user() to move that data safely across the boundary.

It is not possible for user-space applications to execute kernel code directly. They cannot simply make a function call to a method existing in kernel-space because the kernel exists in a protected memory space. If applications could directly read and write to the kernel’s address space, system security and stability would be nonexistent.

System calls have a defined behavior. For example, the system call getpid() is defined to return an integer that is the current process’s PID.
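The defined behavior can be observed from user-space. The following is a minimal sketch, assuming a Linux system with glibc (or a compatible C library) that provides syscall() and SYS_getpid; the helper names pid_via_wrapper, pid_via_raw_syscall, and check_getpid_contract are made up for this illustration.

```c
#define _GNU_SOURCE
#include <assert.h>        /* assert() */
#include <sys/syscall.h>   /* SYS_getpid */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* getpid(), syscall() */

/* Ask the kernel for the PID through the C library wrapper. */
pid_t pid_via_wrapper(void)
{
    return getpid();
}

/* Ask for the same value by issuing the system call directly. */
pid_t pid_via_raw_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}

/* Hypothetical self-check: whichever path is used, the defined
   behavior is the same, so both values must agree. */
void check_getpid_contract(void)
{
    pid_t pid = pid_via_wrapper();
    assert(pid > 0);
    assert(pid == pid_via_raw_syscall());
}
```

Calling check_getpid_contract() from main() should pass silently: the caller only relies on the documented result, not on how the kernel computes it.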

The implementation of this syscall in the kernel is simple:

SYSCALL_DEFINE0(getpid)
{
        return task_tgid_vnr(current); // returns current->tgid
}

Note that the definition says nothing of the implementation. The kernel must provide the intended behavior of the system call but is free to do so with whatever implementation it wants, as long as the result is correct. Of course, this system call is as simple as they come, and there are not too many other ways to implement it. SYSCALL_DEFINE0 is simply a macro that defines a system call with no parameters (hence the 0).

The expanded code looks like this:

asmlinkage long sys_getpid(void)


First, note the asmlinkage modifier on the function definition. This is a directive to tell the compiler to look only on the stack for this function’s arguments. This is a required modifier for all system calls.

Second, the function returns a long. For compatibility between 32- and 64-bit systems, system calls defined to return an int in user-space return a long in the kernel.

Third, note that the getpid() system call is defined as sys_getpid() in the kernel. This is the naming convention taken with all system calls in Linux: System call bar() is implemented in the kernel as function sys_bar().
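To make the naming convention concrete, here is a user-space toy that mimics the expansion. It is only a sketch: TOY_SYSCALL_DEFINE0 and toy_sys_getpid are hypothetical names invented for this example, and the real kernel macro does considerably more than paste a prefix onto the name.

```c
#include <assert.h>      /* assert() */
#include <sys/types.h>   /* pid_t */
#include <unistd.h>      /* getpid() */

/* Hypothetical, simplified stand-in for the kernel's SYSCALL_DEFINE0:
   glue a prefix onto the name and fix the return type to long. */
#define TOY_SYSCALL_DEFINE0(name) long toy_sys_##name(void)

/* The preprocessor turns the next line into:
       long toy_sys_getpid(void)                                    */
TOY_SYSCALL_DEFINE0(getpid)
{
    /* User-space stand-in for task_tgid_vnr(current). */
    return (long)getpid();
}

/* Hypothetical self-check: the generated function exists under the
   prefixed name and returns the PID widened to long. */
void check_toy_expansion(void)
{
    assert(toy_sys_getpid() == (long)getpid());
}
```

The token-pasting operator ## is what maps bar to a function named with the prefix plus bar, which mirrors how bar() ends up as sys_bar() in the kernel.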

Switch to Kernel Mode from User-space
User-space applications must somehow signal to the kernel that they want to execute a system call and have the system switch to kernel mode, where the kernel can execute the system call on behalf of the application. The mechanism to signal the kernel is a software interrupt: incur an exception, and the system switches to kernel mode and executes the exception handler. The exception handler, in this case, is the system call handler. On x86, the defined software interrupt is interrupt number 128, which is incurred via the int $0x80 instruction. It triggers a switch to kernel mode and the execution of exception vector 128, which is the system call handler. The system call handler is the aptly named function system_call(). It is architecture-dependent; on x86-64 it is implemented in assembly in entry_64.S.

More recent x86 processors added a feature known as sysenter, which provides a faster, more specialized way of trapping into the kernel to execute a system call than the int instruction. Support for this feature was quickly added to the kernel. Regardless of how the system call handler is invoked, the important notion is that user-space causes an exception or trap to enter the kernel.
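The trap can also be issued by hand instead of going through the C library. The sketch below makes assumptions that are not in the text above: on x86-64 it uses the syscall instruction (the 64-bit counterpart of int $0x80 and sysenter) with GCC/Clang inline assembly, and on any other architecture it simply falls back to the library's syscall() function. The name trap_getpid is made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>        /* assert() */
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* getpid(), syscall() */

/* Enter the kernel directly and ask for the current PID. */
long trap_getpid(void)
{
#if defined(__x86_64__)
    long ret;
    /* rax carries the system call number in and the result out.
       The syscall instruction overwrites rcx and r11, so they are
       listed as clobbered. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_getpid)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Assumption: on other architectures, let libc issue the trap. */
    return syscall(SYS_getpid);
#endif
}

/* Hypothetical self-check: the hand-made trap and the library
   wrapper must report the same PID. */
void check_trap_getpid(void)
{
    assert(trap_getpid() == (long)getpid());
}
```

Whichever instruction is used, the pattern is the same: load a register, execute one trapping instruction, and read the result back once the kernel returns.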

What are System Call Numbers?
In Linux, each system call is assigned a syscall number. This is a unique number that is used to reference a specific system call. When a user-space process executes a system call, the syscall number identifies which syscall was executed; the process does not refer to the syscall by name. The syscall number is passed to the kernel in a CPU register (eax on x86) before the trap is triggered.
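A short sketch of calling by number rather than by name, assuming Linux with a C library that exposes the SYS_* constants in <sys/syscall.h>. The helpers invoke_by_number and print_syscall_numbers are hypothetical. The printed values differ between architectures, so no expected output is shown.

```c
#define _GNU_SOURCE
#include <assert.h>        /* assert() */
#include <stdio.h>         /* printf() */
#include <sys/syscall.h>   /* SYS_getpid, SYS_getppid */
#include <unistd.h>        /* syscall(), getpid(), getppid() */

/* Invoke an argument-less system call purely by its number. */
long invoke_by_number(long nr)
{
    assert(nr >= 0);       /* valid syscall numbers are not negative */
    return syscall(nr);
}

/* Show the numbers this architecture assigns to two system calls. */
void print_syscall_numbers(void)
{
    printf("getpid  is syscall number %ld\n", (long)SYS_getpid);
    printf("getppid is syscall number %ld\n", (long)SYS_getppid);
}
```

For example, invoke_by_number(SYS_getppid) should return the same value as getppid(): the kernel only ever sees the number, never the name.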


Parameter Passing
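Parameters are passed to the kernel in registers, just like the syscall number. On 32-bit x86, for example, the registers ebx, ecx, edx, esi, and edi hold the first five parameters. The return value is sent back to user-space via a register as well (eax on x86).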
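Here is a minimal sketch of a system call that takes parameters and hands back a result, assuming Linux and the C library's syscall() function, which loads the registers on the caller's behalf. The name raw_write is made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>        /* assert() */
#include <errno.h>         /* errno, EBADF */
#include <string.h>        /* memcmp() */
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* syscall(), pipe(), read() */

/* Issue write(2) by number. The three parameters (fd, buffer address,
   length) are placed in registers before the trap, and the kernel's
   result comes back in a register. The syscall() wrapper converts a
   kernel error code into a return value of -1 plus errno. */
long raw_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return syscall(SYS_write, fd, buf, len);
}
```

Writing five bytes into the write end of a pipe should return 5, and the same bytes can then be read from the other end. Passing an invalid descriptor such as -1 should return -1 with errno set to EBADF.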

System call context
When a program executes a system call or triggers an exception, it enters kernel-space. At this point, the kernel is said to be “executing on behalf of the process” and is in process context. Note that the kernel stays in process context for the duration of a system call, even though a software interrupt (trap) was used to change the processor mode.

Reference:
Linux Kernel Development by Robert Love
