Tuesday, April 12, 2016

The Classic Lost Wake up in Linux Kernel

The lost wake-up problem arises out of a race condition that occurs while a thread/process goes to conditional sleep. It is a classic problem in operating systems.

Consider two threads/processes, A and B. 
  • Thread/Process A is processing from a list, consumer.
  • The thread/process B is adding to this list, producer. 
  • When the list is empty, thread/process A sleeps.
  • Thread/Process B wakes A up when it appends anything to the list. 
And the code looks like as below :

 Process/Thread A (Processing the List):

1  spin_lock(&list_lock);
2  if(list_empty(&list_head)) {
3      spin_unlock(&list_lock);  
4      set_current_state(TASK_INTERRUPTIBLE);  
5      schedule();
6      spin_lock(&list_lock);
7  }
8
9  /* Rest of the code ... */
10 spin_unlock(&list_lock);


Process/Thread B( Adding to the List ):

100  spin_lock(&list_lock);
101  list_add_tail(&list_head, new_node);
102  spin_unlock(&list_lock);
103  wake_up_process(processa_task);

There is one problem with this situation as described below. 

STEP1 :  It may happen that after process A executes line 3 but before it executes line 4,thread/process B is scheduled on another processor. In other words. the Process/Thread A is not blocked however thread/process B is scheduled before the Line 4 Marked in RED is executed.

STEP2  : In this timeslice, thread/process B executes all its instructions, 100 through 103. Thus, it performs a wake-up on thread/process A, which has not yet gone to sleep. 

STEP3 :  Now, thread/process A sets the state to TASK_INTERRUPTIBLE and goes to sleep.

Thus, a wake up from thread/process B is lost or it was not processed at all. 
This is known as the lost wake-up problem. The thread/process A sleeps, even though there are nodes available on the list.

SOLUTION to the problem :- We need to re-write the thread or Process A  so that it doesn't misses any wake up.

Process A:

1  set_current_state(TASK_INTERRUPTIBLE);
2  spin_lock(&list_lock);
3  if(list_empty(&list_head)) {
4         spin_unlock(&list_lock);
5         schedule();
6         spin_lock(&list_lock);
7  }
8  set_current_state(TASK_RUNNING);
9
10 /* Rest of the code ... */
11 spin_unlock(&list_lock);

How did this solve problem?

1. The default state of Thread/Process A is Interruptible and after executing step 4 [ Marked RED] before the schedule ,the wake up is called by thread/process B hence the thread/process A is put to TASK_RUNNING state. Hence the wake up call of thread/process A for thread/process B is not missed.


2. If a wake-up is delivered by thread/process B at any point after the check for list_empty is made, the state of thread/process A automatically is changed to TASK_RUNNING. Hence, the call to schedule() does not put thread/process A to sleep; it merely schedules it out for a while.


No comments:

Post a Comment