Thursday, February 20, 2014

Linux Kernel: Bottom Halves

The job of bottom halves is to perform any interrupt-related work not performed by the interrupt handler. In an ideal world, this is nearly all the work, because you want the interrupt handler to perform as little work (and in turn be as fast) as possible. By offloading as much work as possible to the bottom half, the interrupt handler can return control of the system to whatever it interrupted as quickly as possible. Nonetheless, the interrupt handler must perform some of the work. For example, it almost assuredly needs to acknowledge to the hardware the receipt of the interrupt, and it may need to copy data to or from the hardware.

SOFTIRQS

Softirqs are a set of statically defined bottom halves that can run simultaneously on any processor; even two of the same type can run concurrently. Softirqs cannot be created dynamically: the set of registered softirqs is fixed at compile time. Because the same softirq can run simultaneously on different processors, shared data needs careful locking. Softirqs are useful when performance is critical, such as with networking.

The kernel enforces a limit of 32 registered softirqs; in the current kernel, however, only nine exist. A softirq never preempts another softirq on the same processor. The only event that can preempt a softirq is an interrupt.
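For reference, the nine softirqs correspond roughly to the enum below; this is a sketch of the list given in Linux Kernel Development for 2.6-era kernels, and the exact set varies by version (see <linux/interrupt.h>). Lower-numbered entries run first.

enum {
        HI_SOFTIRQ = 0,   /* high-priority tasklets */
        TIMER_SOFTIRQ,    /* kernel timers */
        NET_TX_SOFTIRQ,   /* network transmit */
        NET_RX_SOFTIRQ,   /* network receive */
        BLOCK_SOFTIRQ,    /* block device completion */
        TASKLET_SOFTIRQ,  /* regular tasklets */
        SCHED_SOFTIRQ,    /* scheduler */
        HRTIMER_SOFTIRQ,  /* high-resolution timers */
        RCU_SOFTIRQ,      /* RCU processing */
        NR_SOFTIRQS
};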

Softirqs are most often raised from within interrupt handlers. The interrupt handler performs the basic hardware-related work, raises the softirq, and then exits.
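As a sketch, registering and raising a softirq looks like this. FOO_SOFTIRQ and the surrounding names are hypothetical; a real softirq would need its own entry added to the enum above at compile time, since softirqs cannot be created dynamically.

#include <linux/init.h>
#include <linux/interrupt.h>

/* Runs in interrupt context when the kernel processes pending softirqs;
 * it must not sleep. */
static void foo_softirq_action(struct softirq_action *h)
{
        /* deferred, non-hardware-critical work goes here */
}

/* At initialization time, register the handler for our index. */
static int __init foo_init(void)
{
        open_softirq(FOO_SOFTIRQ, foo_softirq_action);
        return 0;
}

/* In the interrupt handler: do the minimal hardware work, mark the
 * softirq pending, and return. */
static irqreturn_t foo_irq_handler(int irq, void *dev_id)
{
        /* ...acknowledge the hardware... */
        raise_softirq(FOO_SOFTIRQ);
        return IRQ_HANDLED;
}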

On return from handling a hardware interrupt, the kernel checks for pending softirqs and invokes:
do_softirq();

The softirq then runs and picks up where the interrupt handler left off. Hence the softirq runs in interrupt context.

TASKLETS

Tasklets are flexible, dynamically created bottom halves built on top of softirqs. Two different tasklets can run concurrently on different processors, but two of the same type of tasklet cannot run simultaneously.

Tasklets are represented by two softirqs: HI_SOFTIRQ and TASKLET_SOFTIRQ. The only difference between the two is that HI_SOFTIRQ-based tasklets run before TASKLET_SOFTIRQ-based tasklets.

As with softirqs, tasklets cannot sleep. This means you cannot use semaphores or other blocking functions in a tasklet. Tasklets also run with all interrupts enabled, so you must take precautions (for example, disable interrupts and obtain a lock) if your tasklet shares data with an interrupt handler. Unlike softirqs, however, two of the same tasklet never run concurrently, although two different tasklets can run at the same time on two different processors. If your tasklet shares data with another tasklet or softirq, you need to use proper locking.

After a tasklet is scheduled, it runs once at some time in the near future. If the same tasklet is scheduled again before it has had a chance to run, it still runs only once. If it is already running, for example on another processor, the tasklet is rescheduled and runs again. As an optimization, a tasklet always runs on the processor that scheduled it, making better use of the processor's cache.
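A minimal tasklet sketch using the classic API of this era (my_tasklet and my_tasklet_handler are illustrative names):

#include <linux/interrupt.h>

/* Runs in softirq context with interrupts enabled; it must not sleep. */
static void my_tasklet_handler(unsigned long data)
{
        /* deferred work goes here */
}

/* Statically declare the tasklet; the final 0 is the argument that will
 * be passed to the handler as 'data'. */
static DECLARE_TASKLET(my_tasklet, my_tasklet_handler, 0);

/* From the interrupt handler, schedule it to run soon on this CPU:
 *
 *      tasklet_schedule(&my_tasklet);
 */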

KSOFTIRQD

Softirq (and thus tasklet) processing is aided by a set of per-processor kernel threads. These kernel threads help process softirqs when the system is overwhelmed with them. Because tasklets are implemented using softirqs, the following discussion applies equally to softirqs and tasklets.

Problem Statement:
On return from interrupt, the kernel merely looks at all pending softirqs and executes them as normal. If any softirqs reactivate themselves, however, they will not run until the next time the kernel handles pending softirqs. This is most likely not until the next interrupt occurs, which can mean a lengthy wait before any new (or reactivated) softirqs are executed.

In designing softirqs, the kernel developers realized that some sort of compromise was needed. The solution ultimately implemented in the kernel is to not immediately process reactivated softirqs. Instead, if the number of pending softirqs grows excessive, the kernel wakes up a family of kernel threads to handle the load. These threads run with the lowest possible priority (a nice value of 19), which ensures they do not run in lieu of anything important. There is one thread per processor, each named ksoftirqd/n, where n is the processor number.

On a two-processor system, you would have ksoftirqd/0 and ksoftirqd/1. Having a thread on each processor ensures an idle processor, if available, can always service softirqs. After the threads are initialized, they run a tight loop similar to this:

for (;;) {
        /* If nothing is pending, yield the processor until woken. */
        if (!softirq_pending(cpu))
                schedule();

        set_current_state(TASK_RUNNING);

        /* Drain pending softirqs, yielding whenever something more
         * important needs the processor. */
        while (softirq_pending(cpu)) {
                do_softirq();
                if (need_resched())
                        schedule();
        }

        set_current_state(TASK_INTERRUPTIBLE);
}

If any softirqs are pending (as reported by softirq_pending()), ksoftirqd calls do_softirq() to handle them; otherwise, the thread marks itself interruptible and sleeps until it is needed again.



LOCKING BETWEEN HARD IRQS AND SOFTIRQS/TASKLETS

If a hardware irq handler shares data with a softirq, you have two concerns: first, the softirq processing can be interrupted by a hardware interrupt, and second, the critical region could be entered by a hardware interrupt on another CPU. This is where spin_lock_irq() is used. It disables interrupts on the local CPU and then grabs the lock; spin_unlock_irq() does the reverse.
The irq handler does not need to use spin_lock_irq(), because the softirq cannot run while the irq handler is running: it can use spin_lock(), which is slightly faster. The only exception is if a different hardware irq handler uses the same lock: spin_lock_irq() will stop that handler from interrupting us.
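A sketch of this rule, with illustrative names; a tasklet and an irq handler share a counter protected by one lock:

#include <linux/interrupt.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(shared_lock);
static unsigned long shared_count;

/* Softirq/tasklet side: disable local interrupts before locking, or the
 * irq handler could interrupt us and deadlock on this CPU. */
static void dev_tasklet_fn(unsigned long data)
{
        spin_lock_irq(&shared_lock);
        shared_count++;
        spin_unlock_irq(&shared_lock);
}

/* Hard irq side: interrupts are already off here, so plain spin_lock()
 * suffices (unless another irq handler shares this lock). */
static irqreturn_t dev_irq_handler(int irq, void *dev_id)
{
        spin_lock(&shared_lock);
        shared_count++;
        spin_unlock(&shared_lock);
        return IRQ_HANDLED;
}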

This works perfectly on uniprocessor (UP) kernels as well: the spin lock vanishes, and the macro simply becomes local_irq_disable() (include/asm/smp.h), which protects you from the softirq/tasklet/BH being run.

spin_lock_irqsave() (include/linux/spinlock.h) is a variant that saves whether interrupts were already on or off in a flags word, which is then passed to spin_unlock_irqrestore().

This means that the same code can be used inside a hard irq handler (where interrupts are already off) and in softirqs (where disabling interrupts is required).
Note that softirqs (and hence tasklets and timers) are run on return from hardware interrupts, so spin_lock_irq() also stops these. In that sense, spin_lock_irqsave() is the most general and powerful locking function.
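A short sketch of the irqsave variant, reusing shared_lock and shared_count from the sketch above:

unsigned long flags;

spin_lock_irqsave(&shared_lock, flags);      /* save irq state, disable irqs, lock */
shared_count++;                              /* critical section, safe from any context */
spin_unlock_irqrestore(&shared_lock, flags); /* unlock, restore the saved irq state */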


WORKQUEUES
Work queues are a different form of deferring work from what we have looked at so far. Work queues defer work into a kernel thread; this bottom half always runs in process context. Thus, code deferred to a work queue has all the usual benefits of process context. Most important, work queues are schedulable and can therefore sleep.

The prototype for the work queue handler is:

void work_handler(struct work_struct *work);

A worker thread executes this function, and thus, the function runs in process context. By default, interrupts are enabled and no locks are held. If needed, the function can sleep. Note that, despite running in process context, the work handlers cannot access user-space memory because there is no associated user-space memory map for kernel threads.
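A minimal sketch using the work_struct-based interface (my_work and my_work_handler are illustrative names):

#include <linux/delay.h>
#include <linux/workqueue.h>

/* Executed by a worker thread in process context; sleeping is allowed. */
static void my_work_handler(struct work_struct *work)
{
        msleep(10);     /* legal here, unlike in softirqs and tasklets */
}

/* Statically declare the work item and bind it to its handler. */
static DECLARE_WORK(my_work, my_work_handler);

/* From the interrupt handler, hand the work to the default worker threads:
 *
 *      schedule_work(&my_work);
 */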

Locking between work queues or other parts of the kernel is handled just as with any other process context code. This makes writing work handlers much easier. Work queues involve the highest overhead of the bottom halves because they involve kernel threads and, therefore, context switching. This is not to say that they are inefficient, but in light of thousands of interrupts hitting per second (as the networking subsystem might experience), other methods make more sense. For most situations, however, work queues are sufficient.

How are workqueues scheduled?
1. There is a global pool of worker threads attached to each CPU in the system. When a work item is enqueued, it is passed to one of these threads at the right time (as deemed by the workqueue code). One interesting implication of this is that tasks submitted to the same workqueue on the same CPU may now execute concurrently (see the sketch after this list).

2. When the first workqueue task is submitted, a worker thread is created to execute it. As long as that task continues to run, other tasks wait. But as soon as the running task blocks on some resource, the scheduler notifies the workqueue code and another thread is created to run the next task. The workqueue manager will create as many threads as needed (up to a limit) to keep the CPU busy, but it tries to have only one task actually running at any given time.

3. The handling of CPU hotplugging is interesting. If a CPU is being taken offline, the system needs to move all work off that CPU as quickly as possible. To that end, the workqueue manager responds to a hot-unplug notification by creating a special "trustee" manager on a CPU that is sticking around. The trustee takes over responsibility for the workqueue running on the doomed CPU, executing tasks until they are all gone and the workqueue can be shut down. Meanwhile, the CPU can go offline without waiting for the workqueue to drain.
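Under this scheme, code that needs its own queue simply allocates one and lets the shared worker pools do the rest. A sketch using alloc_workqueue() and the my_work item declared earlier:

#include <linux/init.h>
#include <linux/workqueue.h>

static struct workqueue_struct *my_wq;

static int __init my_wq_init(void)
{
        /* A max_active of 0 selects the default per-CPU concurrency limit. */
        my_wq = alloc_workqueue("my_wq", 0, 0);
        if (!my_wq)
                return -ENOMEM;

        queue_work(my_wq, &my_work);    /* run my_work on this queue */
        return 0;
}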

THREADED IRQS

Threaded interrupt handling moves most of an interrupt handler into a dedicated kernel thread: the hard irq half runs in interrupt context and does only the minimum needed to quiesce the hardware, while the threaded half runs in process context and is therefore allowed to sleep.
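A sketch of the threaded-irq interface, request_threaded_irq(); the my_* names are illustrative:

#include <linux/interrupt.h>

/* Hard irq half: interrupt context; ack/mask the device and hand off. */
static irqreturn_t my_hardirq(int irq, void *dev_id)
{
        /* ...acknowledge the hardware... */
        return IRQ_WAKE_THREAD;         /* wake the handler thread */
}

/* Threaded half: runs in its own kernel thread and may sleep. */
static irqreturn_t my_thread_fn(int irq, void *dev_id)
{
        return IRQ_HANDLED;
}

/* At setup time:
 *
 *      request_threaded_irq(irq, my_hardirq, my_thread_fn,
 *                           IRQF_ONESHOT, "my_dev", my_dev);
 */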

References:
https://lwn.net/Articles/355700/
Linux Kernel Development by Robert Love
