Tuesday, September 23, 2014

C Programming : Is a number power of '2'?

#include <stdio.h>


int isPowerOfTwo(unsigned int x);

int main()
{
        int num = 0;
        int k = 0;

        printf("enter a number: ");
        scanf("%d", &num);
        printf("\nIsPowerOfTwo = %d\n", isPowerOfTwo(num));

        while (1) {
                printf("\nDo you want to continue? 1 or 0\n");
                scanf("%d", &k);

                if (k) {
                        printf("enter a number: ");
                        scanf("%d", &num);
                        printf("\nIsPowerOfTwo = %d\n", isPowerOfTwo(num));
                } else {
                        break;
                }
        }
        return 0;
}

int isPowerOfTwo(unsigned int x)
{
        return ((x > 0) && ((x & (~x + 1)) == x));
}
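Note that ~x + 1 is the two's-complement negation of x, so x & (~x + 1) isolates the lowest set bit; it equals x exactly when x has a single bit set. An equivalent, widely used variant clears the lowest set bit instead; a minimal sketch:

/* A power of two has exactly one set bit, so clearing the
 * lowest set bit (x & (x - 1)) must leave zero. */
int isPowerOfTwoAlt(unsigned int x)
{
        return (x > 0) && ((x & (x - 1)) == 0);
}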

Saturday, September 20, 2014

C Programming : Run Length Encoding for a given string


#include <stdio.h>
#include <string.h>

int main()
{
        int cnt = 0;
        int i = 0, j = 0, k = 0;
        char str[1000] = "llllggg$$$$$$(((((((@@@@@@@aaiiiiiahaag;;;;;a,,,,,////)))gsssdsssggggggggkkkkkkjjj";

        char str_enc[2000];        /* worst case is ~2x the input ("1a1b...") */
        char str_n[10] = {0};
        char val = '\0';

        val = str[0];
        for (i = 0; str[i] != '\0'; i++) {
                if (val == str[i]) {
                        cnt++;
                } else {
                        /* run ended: emit the count followed by the character */
                        val = str[i];
                        snprintf(str_n, sizeof(str_n), "%d", cnt);
                        k = 0;
                        while (str_n[k] != '\0')
                                str_enc[j++] = str_n[k++];
                        str_enc[j++] = str[i - 1];
                        memset(str_n, 0, sizeof(str_n));
                        cnt = 1;
                }
        }

        /* emit the final run */
        snprintf(str_n, sizeof(str_n), "%d", cnt);
        k = 0;
        while (str_n[k] != '\0')
                str_enc[j++] = str_n[k++];
        str_enc[j++] = str[i - 1];
        str_enc[j] = '\0';

        printf("%s\n", str_enc);
        return 0;
}

OUTPUT
4l3g6$7(7@2a5i1a1h2a1g5;1a5,4/3)1g3s1d3s8g6k3j

Interview Questions : ARM and General Embedded System


General Embedded System
1. State the differences between an FIQ and an IRQ.
2. Describe the steps to bring up a new IP component. How do you write a device driver for it?
3. What should you not do in an ISR?

ARM
1. Explain the different ARM registers, especially SPSR and CPSR.
2. Explain the various processor modes.
3. What is the use of Thumb instructions?
4. Explain how the interrupt controller works in ARM.


Interview Questions : C Programming and Data Structures

1. State the use of 'volatile'. What happens when the cache is disabled in a system?
2. Write a program to reverse a linked list.
3. Difference between declaration and definition.
4. Explain pointer to const and constant pointer.
5. Give the declaration of a function pointer for a function which takes no arguments but returns a pointer to char.
6. Dynamically allocate a two-dimensional array.
7. Difference between k++; and ++k; in terms of efficiency.
8. How to find the size of an unknown structure without using the sizeof operator. Assume that you only know the name of the struct, not its members.
9. How to find or update the 'n'th byte of any integer.
10. How to set 'n' bits of a number from the 'p'th position.
11. How to find the 'n'th node from the last node of a singly linked list while traversing from the head.
12. How to find whether two linked lists merge, and if so, where they merge.
13. How to find whether there is a cycle in a linked list.
14. How to determine whether a system is little endian or big endian?
15. How to change the endianness of a number?
16. How to rotate 'n' bits of a number?
17. How to find whether a number is a power of 2 or not?
18. Find the number of set bits in a number without a loop.
19. How to toggle 'n' bits of a number from the 'p'th position.
20. Declare a function pointer which takes a pointer to an array and also returns a pointer to an array.
21. What are 'variadic' functions?
22. How would you find a pattern in a string?
23. Explain how the compiler manages an inline function. What is the difference between inline functions and macros?

Thursday, August 7, 2014

C Programming : Number of ones in the binary representation of an integer, without a loop


#include <stdio.h>
//#define DEBUG

/* Note :-
 * char array notation :- ((unsigned char *)&number)[p]
 * Byte - 8 bits.
 * e58e9bc0 - hex, 4 bytes
 * 32 bits - 8 hex digits
 * each hex digit is 32/8 = 4 bits
 * "f" in hex is 1111
 */

int number_of_ones(int j);

#define byte_extract(number,p)   ((unsigned char *)&number)[p]

/* array[i] holds the number of set bits in the 4-bit value i */
int array[16] = {0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4};
 
int main()
{
        int j = 0;

        printf("/******** THIS PROGRAM GIVES NUMBER OF ONES IN A NUMBER,");
        printf("LOOPS ARE NOT USED FOR CODE EFFICIENCY ****************/\n");

        printf("Enter a number(Decimal Format) : ");
        scanf("%d", &j);

#ifdef DEBUG
        printf("%d,%d,%d,%d,%d,%d,%d,%d\n",
               array[(byte_extract(j,0) & 0xf)], array[((byte_extract(j,0) & 0xf0) >> 4)],
               array[(byte_extract(j,1) & 0xf)], array[((byte_extract(j,1) & 0xf0) >> 4)],
               array[(byte_extract(j,2) & 0xf)], array[((byte_extract(j,2) & 0xf0) >> 4)],
               array[(byte_extract(j,3) & 0xf)], array[((byte_extract(j,3) & 0xf0) >> 4)]);
#endif

        printf("Number of ones in the number 0x%x(Hex) are %d\n", j, number_of_ones(j));
        return 0;
}


int number_of_ones(int j)
{
     return  (array[(byte_extract(j,0) & 0xf)] + array[((byte_extract(j,0) & 0xf0) >> 4)] +
              array[(byte_extract(j,1) & 0xf)] + array[((byte_extract(j,1) & 0xf0) >> 4)] +
              array[(byte_extract(j,2) & 0xf)] + array[((byte_extract(j,2) & 0xf0) >> 4)] +
              array[(byte_extract(j,3) & 0xf)] + array[((byte_extract(j,3) & 0xf0) >> 4)] );
}


Sunday, February 23, 2014

C Programming : Reverse bits in a Number

/* Function to reverse bits of num */
unsigned int reverseBits(unsigned int num)
{
    unsigned int  NO_OF_BITS = sizeof(num) * 8;
    unsigned int reverse_num = 0, i, temp;

    for (i = 0; i < NO_OF_BITS; i++)
    {
        temp = (num & (1 << i));
        if(temp)
            reverse_num |= (1 << ((NO_OF_BITS - 1) - i));
    }

    return reverse_num;
}

OR 

unsigned int reverseBits(unsigned int num)
{
    /* temp accumulates the reversed bits of num */
    unsigned int temp = num;
    int i = 0;

    for (i = (sizeof(num) * 8 - 1); i > 0; i--)
    {
        temp = temp | (num & 1);
        temp <<= 1;
        num >>= 1;
    }
    temp = temp | (num & 1);
    return temp;
}
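A quick way to sanity-check either version (the test values are mine, not from the post):

#include <stdio.h>

unsigned int reverseBits(unsigned int num);

int main()
{
    /* 0x1 reversed in 32 bits is 0x80000000 */
    printf("0x%x -> 0x%x\n", 1u, reverseBits(1u));
    /* 0x12345678 reversed is 0x1e6a2c48 */
    printf("0x%x -> 0x%x\n", 0x12345678u, reverseBits(0x12345678u));
    return 0;
}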

Friday, February 21, 2014

ARM : How to minimize Interrupt Latency?


Interrupt-driven embedded systems have to fight a battle with interrupt latency—the interval of time from an external interrupt request signal being raised to the first fetch of an instruction of a specific interrupt service routine (ISR). Interrupt latency depends on a combination of hardware and software. System architects must balance the system design to handle multiple simultaneous interrupt sources and minimize interrupt latency. If the interrupts are not handled in a timely manner, then the system will exhibit slow response times. Software handlers have two main methods to minimize interrupt latency.

The first method is to use a nested interrupt handler, which allows further interrupts to occur even when currently servicing an existing interrupt (see Figure Below). This is achieved by re-enabling the interrupts as soon as the interrupt source has been serviced (so it won’t generate more interrupts) but before the interrupt handling is complete. Once a nested interrupt has been serviced, then control is relinquished to the original interrupt service routine.

However I would like to quote Jack Ganssle here :-
"No matter how carefully you build the application, you'll be turning interrupts off frequently. Even code that never issues a "disable interrupt" instruction does, indeed, disable them often. For, every time a hardware event issues an interrupt request, the processor itself does an automatic disable, one that stays in effect till you explicitly re-enable them inside of the ISR. Count on skyrocketing latency as a result.

Of course, on many processors we don't so much as turn interrupts off as change priority levels. A 68K receiving an interrupt on level 5 will prohibit all interrupts at this and lower levels till our code explicitly re-enables them in the ISR. Higher priority devices will still function, but latency for all level 1 to 5 devices is infinity till the code does its thing.

So, in an ISR re-enable interrupts as soon as possible. When reading code one of my "rules of thumb" is that code which does the enable just before the return is probably flawed. Most of us were taught to defer the interrupt enable till the end of the ISR. But that prolongs latency unacceptably. Every other interrupt (at least at or below that priority level) will be shut down till the ISR completes. Better: enter the routine, do all of the non-reentrant things (like handling hardware), and then enable interrupts. Run the rest of the ISR, which manages reentrant variables and the like with interrupts on. You'll reduce latency and increase system performance.

The downside might be a need for more stack space if that same interrupt can re-invoke itself. There's nothing wrong with this in a properly designed and reentrant ISR, but the stack will grow till all pending interrupts get serviced"
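A minimal C sketch of that advice, assuming hypothetical device registers and interrupt enable helpers (none of these names come from the article):

void uart_isr(void)
{
        /* Non-reentrant part first: read the data and silence the
         * source so it cannot immediately re-interrupt. */
        char c = UART_DATA_REG;         /* hypothetical device register */
        UART_INT_ACK_REG = 1;           /* hypothetical acknowledge register */

        enable_interrupts();            /* latency for other sources ends here */

        /* Reentrant part runs with interrupts on. */
        enqueue_rx_byte(c);             /* hypothetical reentrant queue helper */
}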



The second method involves prioritization. You program the interrupt controller to ignore interrupts of the same or lower priority than the interrupt you are handling, so only a higher-priority task can interrupt your handler; you then re-enable interrupts. The processor spends time in the lower-priority interrupts until a higher-priority interrupt occurs. Therefore higher-priority interrupts have a lower average interrupt latency than the lower-priority interrupts, which reduces latency by speeding up the completion time on the critical, time-sensitive interrupts.

Reference:-
http://www.ganssle.com/articles/interruptlatency.htm
ARM system developer's guide.

ARM Architecture Basics

The ARM core uses a RISC architecture. RISC is a design philosophy aimed at delivering simple but powerful instructions that execute within a single cycle at a high clock speed. The RISC philosophy concentrates on reducing the complexity of instructions performed by the hardware, because it is easier to provide greater flexibility and intelligence in software rather than hardware. As a result, a RISC design places greater demands on the compiler. In contrast, the traditional complex instruction set computer (CISC) relies more on the hardware for instruction functionality, and consequently the CISC instructions are more complicated.

RISC philosophy
1. Instructions — RISC processors have a reduced number of instruction classes. These classes provide simple operations that can each execute in a single cycle. The compiler or programmer synthesizes complicated operations (for example, a divide operation) by combining several simple instructions.

Each instruction is a fixed length to allow the pipeline to fetch future instructions before decoding the current instruction. In contrast, in CISC processors the instructions are often of variable size and take many cycles to execute.

2. Pipelines — The processing of instructions is broken down into smaller units that can be executed in parallel by pipelines. Ideally the pipeline advances by one step on each cycle for maximum throughput. Instructions can be decoded in one pipeline stage. There is no need for an instruction to be executed by a mini-program called microcode, as on CISC processors.

3. Registers — RISC machines have a large general-purpose register set. Any register can contain either data or an address. Registers act as the fast local memory store for all data processing operations. In contrast, CISC processors have dedicated registers for specific purposes.

4. Load-store architecture — The processor operates on data held in registers. Separate load and store instructions transfer data between the register bank and external memory. Memory accesses are costly, so separating memory accesses from data processing provides an advantage because you can use data items held in the register bank multiple times without needing multiple memory accesses. In contrast, with a CISC design the data processing operations can act on memory directly.

ARM Register Set
The processor can operate in seven different modes, which we will introduce shortly. All the registers are 32 bits in size. There are up to 18 active registers: 16 data registers and 2 processor status registers. The data registers are visible to the programmer as r0 to r15. The ARM processor has three registers assigned to a particular task or special function: r13, r14, and r15. They are frequently given different labels to differentiate them from the other registers.

■ Register r13 is traditionally used as the stack pointer (sp) and stores the head of the stack in the current processor mode.
■ Register r14 is called the link register (lr) and is where the core puts the return address  whenever it calls a subroutine.
■ Register r15 is the program counter (pc) and contains the address of the next instruction to be fetched by the processor.

Depending upon the context, registers r13 and r14 can also be used as general-purpose registers, which can be particularly useful since these registers are banked during a processor mode change. However, it is dangerous to use r13 as a general register when the processor is running any form of operating system because operating systems often assume that r13 always points to a valid stack frame. 

In ARM state the registers r0 to r13 are orthogonal—any instruction that you can apply to r0 you can equally well apply to any of the other registers. However, there are instructions that treat r14 and r15 in a special way. In addition to the 16 data registers, there are two program status registers: cpsr and spsr (the current and saved program status registers, respectively). The register file contains all the registers available to a programmer. Which registers are visible to the programmer depends upon the current mode of the processor.



Figure above shows all 37 registers in the register file. Of those, 20 registers are hidden from a program at different times. These registers are called banked registers and are identified by the shading in the diagram. They are available only when the processor is in a particular mode;

for example, abort mode has banked registers r13_abt, r14_abt and spsr_abt. Banked registers of a particular mode are denoted by an underscore followed by the mode mnemonic, e.g. _abt.

Every processor mode except user mode can change mode by writing directly to the mode bits of the cpsr. All processor modes except system mode have a set of associated banked registers that are a subset of the main 16 registers. A banked register maps one-to-one onto a user mode register. If you change processor mode, a banked register from the new mode will replace an existing register.

For example, when the processor is in the interrupt request mode, the instructions you execute still access registers named r13 and r14. However, these registers are the banked registers r13_irq and r14_irq. The user mode registers r13_usr and r14_usr are not affected by the instruction referencing these registers. A program still has normal access to the other registers r0 to r12. The processor mode can be changed by a program that writes directly to the cpsr (the processor core has to be in privileged mode) or by hardware when the core responds to an exception or interrupt. The following exceptions and interrupts cause a mode change: reset, interrupt request, fast interrupt request, software interrupt, data abort, prefetch abort, and undefined instruction. Exceptions and interrupts suspend the normal execution of sequential instructions and jump to a specific location.

Another important feature to note is that the cpsr is not copied into the spsr when a mode change is forced due to a program writing directly to the cpsr. The saving of the cpsr only occurs when an exception or interrupt is raised.

Current Program Status Register
The ARM core uses the cpsr to monitor and control internal operations. The cpsr is a dedicated 32-bit register and resides in the register file. Figure below shows the basic layout of a generic program status register. Note that the shaded parts are reserved for future expansion. The cpsr is divided into four fields, each 8 bits wide: flags, status, extension, and control. In current designs the extension and status fields are reserved for future use. The control field contains the processor mode, state, and interrupt mask bits. The flags field contains the condition flags. Some ARM processor cores have extra bits allocated. For example, the J bit, which can be found in the flags field, is only available on Jazelle-enabled processors, which execute 8-bit instructions. It is highly probable that future designs will assign extra bits for the monitoring and control of new features.

Condition flags
Q - Saturation: the result causes an overflow and/or saturation
V - oVerflow: the result causes a signed overflow
C - Carry: the result causes an unsigned carry
Z - Zero: the result is zero, frequently used to indicate equality
N - Negative: bit 31 of the result is a binary 1

Processor Modes 
The processor mode determines which registers are active and the access rights to the cpsr register itself. Each processor mode is either privileged or non-privileged: A privileged mode allows full read-write access to the cpsr. Conversely, a non-privileged mode only allows read access to the control field in the cpsr but still allows read-write access to the condition flags. There are seven processor modes in total: six privileged modes (abort, fast interrupt request, interrupt request, supervisor, system, and undefined) and one non-privileged mode (user). The processor enters abort mode when there is a failed attempt to access memory. Fast interrupt request and interrupt request modes correspond to the two interrupt levels available on the ARM processor. Supervisor mode is the mode that the processor is in after reset and is generally the mode that an operating system kernel operates in. System mode is a special version of user mode that allows full read-write access to the cpsr. Undefined mode is used when the processor encounters an instruction that is undefined or not supported by the implementation. User mode is used for programs and applications.
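As a small illustration of cpsr access, the mode bits live in the bottom five bits and are read with the MRS instruction. A sketch using GCC inline assembly (the wrapper names are mine, not from the book):

/* Read the current program status register. Any mode may read it;
 * only privileged modes may modify it with MSR. */
static inline unsigned int read_cpsr(void)
{
        unsigned int cpsr;
        __asm__ volatile("mrs %0, cpsr" : "=r"(cpsr));
        return cpsr;
}

/* Bottom five bits of the cpsr hold the processor mode,
 * e.g. 0x13 for supervisor (SVC) mode. */
#define MODE_MASK 0x1f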


Use of System mode for exception handling

Corruption of the link register can be a problem when handling multiple exceptions of the same type. ARMv4 and later architectures include a privileged mode called System mode to overcome this problem. System mode shares the same registers as User mode, it can run tasks that require privileged access, and exceptions no longer overwrite the link register. Note that System mode cannot be entered by an exception; the exception handlers modify the CPSR to enter System mode.

Embedded Hardware : I2C

Features of the I2C-bus(From the Specification)
1.Only two bus lines are required; a serial data line (SDA) and a serial clock line (SCL).

2.Each device connected to the bus is software addressable by a unique address and simple master/slave relationships exist at all times; masters can operate as master-transmitters or as master-receivers.


3.It is a true multi-master bus including collision detection and arbitration to prevent data corruption if two or more masters simultaneously initiate data transfer.


4.Serial, 8-bit oriented, bidirectional data transfers can be made at up to 100 kbit/s in the Standard-mode, up to 400 kbit/s in the Fast-mode, up to 1Mbit/s in Fast-mode Plus, or up to 3.4 Mbit/s in the High-speed mode.


5.On-chip filtering rejects spikes on the bus data line to preserve data integrity.


6.The number of ICs that can be connected to the same bus is limited only by a maximum bus capacitance. More capacitance may be allowed under some conditions.


"I2C communication uses a 7bit address space with 16 reserved addresses, so a theoretical maximum of 112 nodes can communicate on the same bus.In practice, however, the number of nodes is limited by the specified total bus capacitance of 400pF, which restricts communication distances to a few meters"



The bus has two roles for nodes, master and slave:
A master generates the clock and the slave addresses, and initiates and ends data transactions. A slave receives the clock and addresses, and responds to requests from the master.

The master initiates a transaction by creating a start condition, followed by the 7-bit address of the slave with which it wishes to communicate. This is followed by a single read/write bit, representing whether the master wishes to write to (0) or to read from (1) the slave. The master then releases the SDA line to allow the slave to acknowledge the receipt of data.

The slave responds with an acknowledge bit (ACK) by pulling SDA low during the entire high time of the ninth clock pulse on SCL, after which the master continues in either transmit or receive mode (according to the R/W bit sent), while the slave continues in the complementary mode (receive or transmit, respectively). The address and the 8-bit data bytes are sent most significant bit first. The start condition is indicated by a high-to-low transition of SDA while SCL is high. The stop condition is created by a low-to-high transition of SDA while SCL is high.

If the master writes to a slave, it repeatedly sends a byte, with the slave sending an ACK bit. In this case, the master is in master-transmit mode and the slave is in slave-receive mode.

If the master reads from a slave, it repeatedly receives a byte from the slave, while acknowledging (ACK) the receipt of every byte but the last one. In this situation, the master is in master-receive mode and the slave is in slave-transmit mode.

The master ends the transmission with a stop bit, or may send another start bit to maintain bus control for further transfers. When writing to a slave, a master mainly operates in transmit mode and only changes to receive mode when receiving acknowledgment from the slave. When reading from a slave, the master starts in transmit mode and then changes to receive mode after sending a read request (R/W bit = 1) to the slave. The slave continues in the complementary mode until the end of a transaction. Note that the master ends a reading sequence by not acknowledging (NACK) the last byte received. This procedure resets the slave state machine and allows the master to send the stop command.
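A bit-banged master transmit of a single byte makes the sequence concrete. This is only a sketch: SDA_HIGH()/SDA_LOW()/SCL_HIGH()/SCL_LOW()/SDA_READ() are assumed open-drain GPIO helpers with appropriate delays, not a real API:

/* Start condition: SDA falls while SCL is high. */
static void i2c_start(void)
{
        SDA_HIGH(); SCL_HIGH();
        SDA_LOW();               /* high-to-low on SDA while SCL is high */
        SCL_LOW();
}

/* Clock out one byte, MSB first, then sample the slave's ACK. */
static int i2c_write_byte(unsigned char byte)
{
        int i, ack;

        for (i = 7; i >= 0; i--) {
                if (byte & (1 << i)) SDA_HIGH(); else SDA_LOW();
                SCL_HIGH();      /* data is sampled while SCL is high */
                SCL_LOW();
        }

        SDA_HIGH();              /* release SDA for the ninth clock */
        SCL_HIGH();
        ack = (SDA_READ() == 0); /* slave pulls SDA low to ACK */
        SCL_LOW();
        return ack;
}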

Clock Synchronization

Under normal conditions, only one master device generates the clock signal, SCL. During the arbitration procedure, however, there are two or more masters, and the clock must be synchronized so that the data output can be compared. The wired-AND property of SCL means that the device that first generates a low period on SCL (device #1) overrules the other devices. At this high-to-low transition, the clock generators of the other devices are forced to start their own low period. SCL is held low by the device with the longest low period. The other devices that finish their low periods must wait for SCL to be released before starting their high periods. A synchronized signal on SCL is obtained, where the slowest device determines the length of the low period and the fastest device determines the length of the high period.

If a device pulls down the clock line for a longer time, the result is that all clock generators must enter the wait state. In this way, a slave slows down a fast master, and the slow device creates enough time to store a received data word or to prepare a data word to be transmitted.


Arbitration

Several I2C multi-masters can be connected to the same I2C bus and operate concurrently. By constantly monitoring SDA and SCL for start and stop conditions, they can determine whether the bus is currently idle or not. If the bus is busy, masters delay pending I2C transfers until a stop condition indicates that the bus is free again.
However, it may happen that two masters start a transfer at the same time. During the transfer, the masters constantly monitor SDA and SCL. If one of them detects that SDA is low when it should actually be high, it assumes that another master is active and immediately stops its transfer. This process is called arbitration.

What are the limitations of the I2C interface?
  • Half-duplex communication: data is transmitted in only one direction at a time (because of the single data line).
  • Since the bus is shared by many devices, debugging an I2C bus (detecting which device is misbehaving) is pretty difficult.
  • The I2C bus is shared by multiple slave devices; if any one of these slaves misbehaves (pulls either SCL or SDA low for an indefinite time), the bus stalls and no further communication takes place.
  • I2C uses resistive pull-ups for its bus, which limits the bus speed.
  • Bus speed is directly dependent on the bus capacitance, meaning longer I2C bus traces will limit the bus speed.




Booting ARM Linux - Russell King

Just a copy paste from a kernel doc....


                        Booting ARM Linux
                        =================

Author: Russell King
Date  : 18 May 2002

The following documentation is relevant to 2.4.18-rmk6 and beyond.

In order to boot ARM Linux, you require a boot loader, which is a small program that runs before the main kernel.  The boot loader is expected to initialise various devices, and eventually call the Linux kernel, passing information to the kernel.

Essentially, the boot loader should provide (as a minimum) the following:

1. Setup and initialize the RAM.
2. Initialize one serial port.
3. Detect the machine type.
4. Setup the kernel tagged list.
5. Load initramfs.
6. Call the kernel image.


1. Setup and initialize RAM
---------------------------
Existing boot loaders: MANDATORY
New boot loaders: MANDATORY

The boot loader is expected to find and initialize all RAM that the kernel will use for volatile data storage in the system. It performs this in a machine dependent manner. (It may use internal algorithms to automatically locate and size all RAM, or it may use knowledge of the RAM in the machine, or any other method the boot loader designer sees fit.)

2. Initialize one serial port
-----------------------------
Existing boot loaders: OPTIONAL, RECOMMENDED
New boot loaders: OPTIONAL, RECOMMENDED

The boot loader should initialize and enable one serial port on the target. This allows the kernel serial driver to automatically detect which serial port it should use for the kernel console (generally used for debugging purposes, or communication with the target).
As an alternative, the boot loader can pass the relevant 'console=' option to the kernel via the tagged lists, specifying the port and serial format options as described in Documentation/kernel-parameters.txt.

3. Detect the machine type
--------------------------
Existing boot loaders: OPTIONAL
New boot loaders: MANDATORY

The boot loader should detect the machine type it's running on by some method. Whether this is a hard coded value or some algorithm that looks at the connected hardware is beyond the scope of this document. The boot loader must ultimately be able to provide a MACH_TYPE_xxx value to the kernel (see linux/arch/arm/tools/mach-types).

4. Setup boot data
------------------
Existing boot loaders: OPTIONAL, HIGHLY RECOMMENDED
New boot loaders: MANDATORY

The boot loader must provide either a tagged list or a dtb image for passing configuration data to the kernel. The physical address of the boot data is passed to the kernel in register r2.

4a. Setup the kernel tagged list
--------------------------------
The boot loader must create and initialize the kernel tagged list. A valid tagged list starts with ATAG_CORE and ends with ATAG_NONE. The ATAG_CORE tag may or may not be empty. An empty ATAG_CORE tag has the size field set to '2' (0x00000002). The ATAG_NONE tag must set the size field to zero.

Any number of tags can be placed in the list. It is undefined whether a repeated tag appends to the information carried by the previous tag, or whether it replaces the information in its entirety; some tags behave as the former, others the latter.

The boot loader must pass at a minimum the size and location of the system memory, and the root filesystem location. Therefore, the minimum tagged list should look like:

        +-----------+
base -> | ATAG_CORE |  |
        +-----------+  |
        | ATAG_MEM  |  | increasing address
        +-----------+  |
        | ATAG_NONE |  |
        +-----------+  v

The tagged list should be stored in system RAM.

The tagged list must be placed in a region of memory where neither the kernel decompressor nor the initrd 'bootp' program will overwrite it. The recommended placement is in the first 16KiB of RAM.
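A sketch of how a boot loader might build this minimum list in C. The tag constants mirror the kernel's arch/arm/include/asm/setup.h; the memory base and size here are made-up example values:

#define ATAG_CORE 0x54410001
#define ATAG_MEM  0x54410002
#define ATAG_NONE 0x00000000

/* Build the minimum list (CORE, MEM, NONE) at 'base'.
 * Each tag starts with two words: size-in-words, then the tag id. */
static void setup_tags(unsigned int *base)
{
        unsigned int *params = base;    /* e.g. RAM start + 0x100 */

        params[0] = 2;                  /* empty ATAG_CORE: size field '2' */
        params[1] = ATAG_CORE;
        params += 2;

        params[0] = 4;                  /* header (2 words) + data (2 words) */
        params[1] = ATAG_MEM;
        params[2] = 0x04000000;         /* hypothetical: 64 MiB of RAM */
        params[3] = 0x80000000;         /* hypothetical: RAM base address */
        params += 4;

        params[0] = 0;                  /* ATAG_NONE: size field zero */
        params[1] = ATAG_NONE;
}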


4b.Setup the device tree
-------------------------
The boot loader must load a device tree image (dtb) into system RAM at a 64-bit aligned address and initialize it with the boot data. The kernel will look for the dtb magic value of 0xd00dfeed at the dtb physical address to determine if a dtb has been passed instead of a tagged list. The dtb format is documented in Documentation/devicetree/booting-without-of.txt.
The boot loader must pass at a minimum the size and location of the system memory, and the root filesystem location. The dtb must be placed in a region of memory where the kernel decompressor will not overwrite it, whilst remaining within the region which will be covered by the kernel's low-memory mapping.

A safe location is just above the 128MiB boundary from start of RAM.

5. Load initramfs.
------------------
Existing boot loaders:OPTIONAL
New boot loaders:OPTIONAL

If an initramfs is in use then, as with the dtb, it must be placed in a region of memory where the kernel decompressor will not overwrite it, while also remaining within the region which will be covered by the kernel's low-memory mapping.

A safe location is just above the device tree blob which itself will be loaded just above the 128MiB boundary from the start of RAM as recommended above.

6. Calling the kernel image
---------------------------
Existing boot loaders: MANDATORY
New boot loaders: MANDATORY

There are two options for calling the kernel zImage. If the zImage is stored in flash, and is linked correctly to be run from flash, then it is legal for the boot loader to call the zImage in flash directly.

The zImage may also be placed in system RAM and called there. The kernel should be placed in the first 128MiB of RAM. It is recommended that it is loaded above 32MiB in order to avoid the need to relocate prior to decompression, which will make the boot process slightly faster.

When booting a raw (non-zImage) kernel the constraints are tighter. In this case the kernel must be loaded at an offset into system memory equal to TEXT_OFFSET - PAGE_OFFSET.

In any case, the following conditions must be met:

1. Quiesce all DMA capable devices so that memory does not get corrupted by bogus network packets or disk data. This will save you many hours of debug.

2. CPU register settings:
   r0 = 0
   r1 = machine type number discovered in (3) above
   r2 = physical address of tagged list in system RAM, or
        physical address of device tree block (dtb) in RAM

3. CPU mode: All forms of interrupts must be disabled (IRQs and FIQs).

For CPUs which do not include the ARM virtualization extensions, the CPU must be in SVC mode. (A special exception exists for Angel.)

CPUs which include support for the virtualization extensions can be entered in HYP mode in order to enable the kernel to make full use of these extensions.

This is the recommended boot method for such CPUs, unless the virtualisation extensions are already in use by a pre-installed hypervisor.
If the kernel is not entered in HYP mode for any reason, it must be entered in SVC mode.

4. Caches, MMUs: The MMU must be off. The instruction cache may be on or off. The data cache must be off.


If the kernel is entered in HYP mode, the above requirements apply to the HYP mode configuration in addition to the ordinary PL1 (privileged kernel modes) configuration.


In addition, all traps into the hypervisor must be disabled, and PL1 access must be granted for all peripherals and CPU resources for which this is architecturally possible. Except for entering in HYP mode, the system configuration should be such that a kernel which doesn't include support for the virtualization extensions can boot correctly without extra help.

5. The boot loader is expected to call the kernel image by jumping directly to the first instruction of the kernel image. On CPUs supporting the ARM instruction set, the entry must be made in ARM state, even for a Thumb-2 kernel. On CPUs supporting only the Thumb instruction set, such as Cortex-M class CPUs, the entry must be made in Thumb state.
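In C, a boot loader commonly satisfies the register conditions above by exploiting the AAPCS calling convention, which passes the first three arguments in r0-r2. A sketch (the function and variable names are mine):

typedef void (*kernel_entry_fn)(unsigned long zero,
                                unsigned long mach_type,
                                unsigned long atags_or_dtb);

static void call_kernel(unsigned long entry_addr,
                        unsigned long mach_type,
                        unsigned long boot_data_phys)
{
        kernel_entry_fn kernel = (kernel_entry_fn)entry_addr;

        /* Board specific: quiesce DMA, disable IRQs/FIQs, and turn
         * off the MMU and data cache before reaching this point. */
        kernel(0, mach_type, boot_data_phys);  /* r0=0, r1=mach, r2=boot data */
        /* never returns */
}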

Thursday, February 20, 2014

Linux Kernel : Bottom Halves

The job of bottom halves is to perform any interrupt-related work not performed by the interrupt handler. In an ideal world, this is nearly all the work, because you want the interrupt handler to perform as little work (and in turn be as fast) as possible. By offloading as much work as possible to the bottom half, the interrupt handler can return control of the system to whatever it interrupted as quickly as possible. Nonetheless, the interrupt handler must perform some of the work. For example, the interrupt handler almost assuredly needs to acknowledge to the hardware the receipt of the interrupt. It may need to copy data to or from the hardware.

SOFTIRQS

Softirqs are a set of statically defined bottom halves that can run simultaneously on any processor; even two of the same type can run concurrently. Softirqs cannot be created or destroyed dynamically, and they need special locking because the same softirq can run simultaneously on different processors. Softirqs are useful when performance is critical, such as with networking. The number of registered softirqs is statically determined at compile time and cannot be changed dynamically.

The kernel enforces a limit of 32 registered softirqs; in the current kernel, however, only nine exist. A softirq never preempts another softirq. The only event that can preempt a softirq is an interrupt.

Softirqs are most often raised from within interrupt handlers. The interrupt handler performs the basic hardware-related work, raises the softirq, and then exits.

When processing interrupts, the kernel invokes:
do_softirq();

The softirq then runs and picks up where the interrupt handler left off. Hence the softirq runs in interrupt context.
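Inside the kernel, registering and raising a softirq looks roughly like this. MY_SOFTIRQ is a hypothetical entry in the static softirq enum (real ones include TASKLET_SOFTIRQ and NET_RX_SOFTIRQ); note that the LKD-era open_softirq() also took a third data argument:

static void my_softirq_handler(struct softirq_action *a)
{
        /* runs in interrupt context: must not sleep */
}

static int __init my_init(void)
{
        open_softirq(MY_SOFTIRQ, my_softirq_handler);
        return 0;
}

static irqreturn_t my_irq_handler(int irq, void *dev)
{
        /* ... acknowledge the hardware ... */
        raise_softirq(MY_SOFTIRQ);      /* mark pending; runs on interrupt exit */
        return IRQ_HANDLED;
}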

TASKLETS

Tasklets are flexible, dynamically created bottom halves built on top of softirqs. Two different tasklets can run concurrently on different processors, but two of the same type of tasklet cannot run simultaneously.

Tasklets are represented by two softirqs:
HI_SOFTIRQ and TASKLET_SOFTIRQ. The only difference between these types is that the HI_SOFTIRQ-based tasklets run prior to the TASKLET_SOFTIRQ-based tasklets.

As with softirqs, tasklets cannot sleep. This means you cannot use semaphores or other blocking functions in a tasklet. Tasklets also run with all interrupts enabled, so you must take precautions (for example, disable interrupts and obtain a lock) if your tasklet shares data with an interrupt handler. Unlike softirqs, however, two of the same tasklets never run concurrently—although two different tasklets can run at the same time on two different processors. If your tasklet shares data with another tasklet or softirq, you need to use proper locking.

After a tasklet is scheduled, it runs once at some time in the near future. If the same tasklet is scheduled again before it has had a chance to run, it still runs only once. If it is already running, for example on another processor, the tasklet is rescheduled and runs again. As an optimization, a tasklet always runs on the processor that scheduled it—making better use of the processor's cache.
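Declaring and scheduling a tasklet takes only a few lines (the handler and data value are mine; this is the classic DECLARE_TASKLET(name, func, data) interface described in LKD):

static void my_tasklet_handler(unsigned long data)
{
        /* deferred work; runs in softirq context, so no sleeping */
}

DECLARE_TASKLET(my_tasklet, my_tasklet_handler, 0);

static irqreturn_t my_irq_handler(int irq, void *dev)
{
        /* ... acknowledge the hardware ... */
        tasklet_schedule(&my_tasklet);  /* runs once, soon, on this CPU */
        return IRQ_HANDLED;
}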

ksoftirqd for both tasklet and softirq
Softirq (and thus tasklet) processing is aided by a set of per-processor kernel threads. These kernel threads help in the processing of softirqs when the system is overwhelmed with softirqs. Because tasklets are implemented using softirqs, the following discussion applies equally to softirqs and tasklets.

Problem Statement : 
On return from interrupt, the kernel merely looks at all pending softirqs and executes them as normal. If any softirqs reactivate themselves, however, they will not run until the next time the kernel handles pending softirqs. This is most likely not until the next interrupt occurs, which can equate to a lengthy amount of time before any new (or reactivated) softirqs are executed.

In designing softirqs, the kernel developers realized that some sort of compromise was needed. The solution ultimately implemented in the kernel is to not immediately process reactivated softirqs. Instead, if the number of softirqs grows excessive, the kernel wakes up a family of kernel threads to handle the load. The kernel threads run with the lowest possible priority (nice value of 19), which ensures they do not run in lieu of anything important. There is one thread per processor. The threads are each named ksoftirqd/n, where n is the processor number.

On a two-processor system, you would have ksoftirqd/0 and ksoftirqd/1. Having a thread on each processor ensures an idle processor, if available, can always service softirqs. After the threads are initialized, they run a tight loop similar to this:

for (;;) {
        if (!softirq_pending(cpu))
                schedule();
        set_current_state(TASK_RUNNING);
        while (softirq_pending(cpu)) {
                do_softirq();
                if (need_resched())
                        schedule();
        }
        set_current_state(TASK_INTERRUPTIBLE);
}

If any softirqs are pending (as reported by softirq_pending()), ksoftirqd calls do_softirq() to handle them.



Locking Between Hard IRQ and Softirqs/Tasklets :

If a hardware irq handler shares data with a softirq, you have two concerns. Firstly, the softirq processing can be interrupted by a hardware interrupt, and secondly, the critical region could be entered by a hardware interrupt on another CPU. This is where spin_lock_irq() is used. It is defined to disable interrupts on that cpu, then grab the lock. spin_unlock_irq() does the reverse.
The irq handler does not need to use spin_lock_irq(), because the softirq cannot run while the irq handler is running: it can use spin_lock(), which is slightly faster. The only exception would be if a different hardware irq handler uses the same lock: spin_lock_irq() will stop that from interrupting us.

This works perfectly for UP as well: the spin lock vanishes, and this macro simply becomes local_irq_disable() (include/asm/smp.h), which protects you from the softirq/tasklet/BH being run.

spin_lock_irqsave() (include/linux/spinlock.h) is a variant which saves whether interrupts were on or off in a flags word, which is passed to spin_unlock_irqrestore().

This means that the same code can be used inside a hard irq handler (where interrupts are already off) and in softirqs (where the irq disabling is required).
Note that softirqs (and hence tasklets and timers) are run on return from hardware interrupts, so spin_lock_irq() also stops these. In that sense, spin_lock_irqsave() is the most general and powerful locking function.


WORKQUEUES
Work queues are a different form of deferring work from what we have looked at so far. Work queues defer work into a kernel thread—this bottom half always runs in process context. Thus, code deferred to a work queue has all the usual benefits of process context. Most important, work queues are schedulable and can therefore sleep.

The prototype for the work queue handler is  : 

void work_handler(void *data);

A worker thread executes this function, and thus, the function runs in process context. By default, interrupts are enabled and no locks are held. If needed, the function can sleep. Note that, despite running in process context, the work handlers cannot access user-space memory because there is no associated user-space memory map for kernel threads.

Locking between work queues or other parts of the kernel is handled just as with any other process context code. This makes writing work handlers much easier. Work queues involve the highest overhead because they involve kernel threads and, therefore, context switching. This is not to say that they are inefficient, but in light of thousands of interrupts hitting per second (as the networking subsystem might experience), other methods make more sense. For most situations, however, work queues are sufficient.
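Using the interface quoted above (this is the older, pre-2.6.20 workqueue API that LKD describes, where the handler receives a void *data; the names are mine):

static void my_work_handler(void *data)
{
        /* runs in process context via a worker thread: may sleep */
}

DECLARE_WORK(my_work, my_work_handler, NULL);

static irqreturn_t my_irq_handler(int irq, void *dev)
{
        schedule_work(&my_work);        /* queue on the default events/n thread */
        return IRQ_HANDLED;
}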

How workqueues are scheduled?
1. There is a global pool of threads attached to each CPU in the system. When a work item is enqueued, it will be passed to one of the global threads at the right time (as deemed by the workqueue code). One interesting implication of this is that tasks submitted to the same workqueue on the same CPU may now execute concurrently.

2. When the first workqueue task is submitted, a thread running under the workqueue scheduler class is created to execute it. As long as that task continues to run, other tasks will wait. But as soon as the running task blocks on some resource, the scheduler will notify the workqueue code and another thread will be created to run the next task. The workqueue manager will create as many threads as needed (up to a limit) to keep the CPU busy, but it tries to only have one task actually running at any given time.

3. The handling of CPU hotplugging is interesting. If a CPU is being taken offline, the system needs to move all work off that CPU as quickly as possible. To that end, the workqueue manager responds to a hot-unplug notification by creating a special "trustee" manager on a CPU which is sticking around. That trustee takes over responsibility for the workqueue running on the doomed CPU, executing tasks until they are all gone and the workqueue can be shut down. Meanwhile, the CPU can go offline without waiting for the workqueue to drain.

THREADED IRQS

Reference :- 
https://lwn.net/Articles/355700/
Linux Kernel Development by Robert Love.

Linux Kernel Preemption

The Linux kernel, unlike most other Unix variants and many other operating systems, is a fully preemptive kernel. In non-preemptive kernels, kernel code runs until completion. That is, the scheduler cannot reschedule a task while it is in the kernel—kernel code is scheduled cooperatively, not preemptively. Kernel code runs until it finishes (returns to user-space) or explicitly blocks.

In the 2.6 kernel, however, the Linux kernel became preemptive: It is now possible to preempt a task at any point, so long as the kernel is in a state in which it is safe to reschedule. So when is it safe to reschedule? The kernel can preempt a task running in the kernel so long as it does not hold a lock. That is, locks are used as markers of regions of nonpreemptibility. Because the kernel is SMP-safe, if a lock is not held, the current code is reentrant and capable of being preempted.

Changes in Linux kernel to support Preemption

The first change in supporting kernel preemption was the addition of a preemption counter, preempt_count, to each process's thread_info. This counter begins at zero and increments once for each lock that is acquired and decrements once for each lock that is released. When the counter is zero, the kernel is preemptible. Upon return from interrupt, if returning to kernel-space, the kernel checks the values of need_resched and preempt_count. If need_resched is set and preempt_count is zero, then a more important task is runnable, and it is safe to preempt. Thus, the scheduler is invoked. If preempt_count is nonzero, a lock is held, and it is unsafe to reschedule. In that case, the interrupt returns as usual to the currently executing task. When all the locks that the current task is holding are released, preempt_count returns to zero. At that time, the unlock code checks whether need_resched is set. If so, the scheduler is invoked. Enabling and disabling kernel preemption is sometimes required in kernel code.
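The corresponding interface is a pair of nesting-aware macros; a minimal usage sketch:

preempt_disable();      /* increments preempt_count: no preemption from here */

/* e.g. touch per-CPU data without being preempted or migrated */

preempt_enable();       /* decrements preempt_count; if it reaches zero and
                         * need_resched is set, the scheduler is invoked now */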

Kernel preemption can occur:
1. When an interrupt handler exits, before returning to kernel-space
2. When kernel code becomes preemptible again
3. If a task in the kernel explicitly calls schedule()
4. If a task in the kernel blocks (which results in a call to schedule())

Reference:-
Linux Kernel Development by Robert Love

Operating System : Introduction to Reentrancy

Virtually every embedded system uses interrupts; many support multitasking or multithreaded operations. These sorts of applications can expect the program's control flow to change contexts at just about any time. When that interrupt comes, the current operation gets put on hold and another function or task starts running. What happens if functions and tasks share variables? Disaster surely looms if one routine corrupts another's data.
By carefully controlling how data is shared, we create reentrant functions, those that allow multiple concurrent invocations that do not interfere with each other. The word pure is sometimes used interchangeably with reentrant.
Like so many embedded concepts, reentrancy came from the mainframe era, in the days when memory was a valuable commodity. In those days compilers and other programs were often written to be reentrant, so a single copy of the tool lived in memory, yet was shared by perhaps a hundred users. Each person had his or her own data area, yet everyone running the compiler quite literally executed the identical code. As the operating system changed contexts from user to user it swapped data areas so one person's work didn't affect any other. Share the code, but not the data.
In the embedded world a routine must satisfy the following conditions to be reentrant:
1. It uses all shared variables in an atomic way, unless each is allocated to a specific instance of the function.
2. It does not call non-reentrant functions.
3. It does not use the hardware in a non-atomic way.
Atomic variables
Both the first and last rules use the word atomic, which comes from the Greek word meaning indivisible. In the computer world, atomic means an operation that cannot be interrupted. Consider the assembly language instruction:
mov ax,bx
Since nothing short of a reset can stop or interrupt this instruction, it's atomic. It will start and complete without any interference from other tasks or interrupts.
The first part of Rule 1 requires the atomic use of shared variables. Suppose two functions each share the global variable foobar. Function A contains:
temp = foobar;
temp += 1;
foobar = temp;
This code is not reentrant, because foobar is used non-atomically. That is, it takes three statements to change its value, not one. The foobar handling is not indivisible; an interrupt can come between these statements and switch context to the other function, which then may also try and change foobar. Clearly there's a conflict; foobar will wind up with an incorrect value, the autopilot will crash, and hundreds of screaming people will wonder "why didn't they teach those developers about reentrancy?"
Suppose, instead, Function A looks like:
foobar += 1;
Now the operation is atomic, right? An interrupt cannot suspend processing with foobar in a partially changed state, so the routine is reentrant.
Except... do you really know what your C compiler generates? On an x86 processor that statement might compile to:
mov ax,[foobar]
inc ax
mov [foobar],ax
which is clearly not atomic, and so not reentrant. The atomic version is:
inc [foobar]
The moral is to be wary of the compiler. Assume it generates atomic code and you may find "60 Minutes" knocking at your door.
The second part of the first reentrancy rule reads "...unless each is allocated to a specific instance of the function." This is an exception to the atomic rule that skirts the issue of shared variables.
An "instance" is a path through the code. There's no reason a single function can't be called from many other places. In a multitasking environment, it's quite possible that several copies of the function may indeed be executing concurrently. (Suppose the routine is a driver that retrieves data from a queue; many different parts of the code may want queued data more or less simultaneously). Each execution path is an "instance" of the code. Consider:
int foo;
void some_function(void) {
    foo++;
}
foo is a global variable whose scope exists outside that of the function. Even if no other routine uses foo, some_function can trash the variable if more than one instance of it runs at any time.
C and C++ can save us from this peril. Use automatic variables. That is, declare foo inside of the function. Then, each instance of the routine will use a new version of foo created from the stack, as follows:
void some_function(void) {
    int foo;
    foo++;
}
Another option is to dynamically assign memory (using malloc), again so each incarnation uses a unique data area. The fundamental reentrancy problem is thus avoided, as it's impossible for multiple instances to modify a common version of the variable.
Keeping code reentrant
What are our best options for eliminating non-reentrant code? The first rule of thumb is to avoid shared variables. Globals are the source of endless debugging woes and failed code. Use automatic variables or dynamically allocated memory.
Yet globals are also the fastest way to pass data around. It's not always possible to entirely eliminate them from real time systems. So, when using a shared resource (variable or hardware) we must take a different sort of action.
The most common approach is to disable interrupts during non-reentrant code. With interrupts off, the system suddenly becomes a single-process environment. There will be no context switches. Disable interrupts, do the non-reentrant work, and then turn interrupts back on.
In the Linux kernel, we have spinlocks, whose variants disable preemption and/or interrupts on the local CPU before global data structures are written. The kernel also provides mutexes and semaphores which can do the job; read the other articles in this blog for their usage in detail.
Shutting interrupts down does increase system latency, reducing its ability to respond to external events in a timely manner. A kinder, gentler approach is to use a mutex (also known as binary semaphore) to indicate when a resource is busy. Mutexes are simple on-off state indicators whose processing is inherently atomic. These are often used as "in-use" flags to have tasks idle when a shared resource is not available.
Nearly every commercial real-time operating system includes mutexes. If this is your way of achieving reentrant code, by all means use an RTOS.
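As a user-space illustration of the mutex approach (POSIX threads; the shared counter is a made-up example):

#include <pthread.h>

static int shared_counter;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Reentrant: the read-modify-write of shared_counter is made atomic
 * with respect to other threads by the mutex. */
void increment_counter(void)
{
        pthread_mutex_lock(&lock);
        shared_counter++;
        pthread_mutex_unlock(&lock);
}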
Recursion
No discussion of reentrancy is complete without mentioning recursion, if only because there's so much confusion between the two.
A function is recursive if it calls itself. That's a classic way to remove iteration from many sorts of algorithms. Given enough stack space, this is a perfectly valid (though tough to debug) way to write code. Since a recursive function calls itself, clearly it must be reentrant to avoid trashing its variables. So all recursive functions must be reentrant, but not all reentrant functions are recursive.
Reference:-
http://www.embedded.com/electronics-blogs/beginner-s-corner/4023308/Introduction-to-Reentrancy
Jack G. Ganssle is a lecturer and consultant on embedded development issues, and a regular contributor to Embedded Systems Programming. Contact him at jack@ganssle.com.

Wednesday, February 19, 2014

What is the difference between interrupt/software interrupt and exception context?


An interrupt is an asynchronous event that is typically generated by hardware (e.g., I/O devices), not in sync with processor instruction execution. Exceptions, by contrast, are synchronous events generated when the processor detects a predefined condition while executing an instruction.

An interrupt handler may be interrupted by another interrupt handler, and so on. An interrupt handler may defer an exception handler, but an exception handler never defers an interrupt handler. The only exception possible in kernel mode is the page fault.

A software interrupt (or programmed exception) occurs at the request of the programmer; for example, the int 0x80 instruction used to invoke system calls in Linux. Software interrupts are considered exceptions because they are synchronous.


Hardware interrupt: This interrupt is caused by some external device, such as a request to start an I/O or the occurrence of a hardware failure.
Software interrupt: This interrupt can be invoked with the help of the INT instruction. The programmer triggers this event, which immediately stops execution of the program and passes execution over to the INT handler. The INT handler is usually a part of the operating system and determines the action to be taken, e.g. output to the screen, execute a file, etc.
Thus a software interrupt, as its name suggests, is driven by a software instruction, and a hardware interrupt is the result of external causes.
What Robert Love says about Exceptions...
In OS texts, exceptions are often discussed at the same time as interrupts. Unlike interrupts, exceptions occur synchronously with respect to the processor clock. Indeed, they are often called synchronous interrupts. Exceptions are produced by the processor while executing instructions either in response to a programming error (for example, divide by zero) or abnormal conditions that must be handled by the kernel (for example, a page fault). Because many processor architectures handle exceptions in a similar manner to interrupts, the Linux kernel infrastructure for handling the two is similar.

Linux Kernel : Semaphores

Semaphores in Linux are sleeping locks. When a task attempts to acquire a semaphore that is unavailable, the semaphore places the task onto a wait queue and puts the task to sleep. The processor is then free to execute other code. When the semaphore becomes available, one of the tasks on the wait queue is awakened so that it can then acquire the semaphore.

Important Points
1. Semaphores are well suited to locks that are held for a long time.

2. Because a thread of execution sleeps on lock contention, semaphores must be obtained only in process context because interrupt context is not schedulable.

3. You can (although you might not want to) sleep while holding a semaphore because you will not deadlock when another process acquires the same semaphore. (It will just go to sleep and eventually let you continue.)

4. You cannot hold a spin lock while you acquire a semaphore, because you might have to sleep while waiting for the semaphore, and you cannot sleep while holding a spin lock.

5. Unlike spin locks, semaphores do not disable kernel preemption and, consequently, code holding a semaphore can be preempted. This means semaphores do not adversely affect scheduling latency.
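Typical usage in process context looks like this (the classic <linux/semaphore.h> interface; sem and the critical region are placeholders):

static struct semaphore sem;

/* at init time: sema_init(&sem, 1);  a count of 1 makes it binary */

static int do_protected_work(void)
{
        if (down_interruptible(&sem))   /* may sleep; fails if a signal arrives */
                return -EINTR;

        /* critical region ... */

        up(&sem);                       /* release; wakes one waiting task */
        return 0;
}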

Linux Kernel : Spinlocks

SPINLOCKS
A spinlock is a lock that can be held by at most one thread of execution. If a thread of execution attempts to acquire a spin lock while it is already held, which is called contended, the thread busy loops - spins - waiting for the lock to become available. If the lock is not contended, the thread can immediately acquire the lock and continue. The spinning prevents more than one thread of execution from entering the critical region at any one time. The same lock can be used in multiple locations, so all access to a given data structure, for example, can be protected and synchronized.

The fact that a contended spinlock causes threads to spin (essentially wasting processor time) while waiting for the lock to become available is salient. This behavior is the point of the spin lock. It is not wise to hold a spin lock for a long time. This is the nature of the spinlock: a lightweight single-holder lock that should be held for short durations. An alternative behavior when the lock is contended is to put the current thread to sleep and wake it up when the lock becomes available. Then the processor can go off and execute other code. This incurs a bit of overhead - most notably the two context switches required to switch out of and back into the blocking thread, which is certainly a lot more code than the handful of lines used to implement a spinlock. Therefore, it is wise to hold spinlocks for less than the duration of two context switches. Because most of us have better things to do than measure context switches, just try to hold the lock for as little time as possible.

"Spinlocks waste processor time hence cannot be held for longer duration, whereas other locks that sleep and go for a context switch instead has extra overhead of two context switches" 

Here is below the usage of a normal spinlocks.

spin_lock(&mr_lock);
/* critical region ... */
spin_unlock(&mr_lock);

The lock can be held simultaneously by at most only one thread of execution. Consequently, only one thread is allowed in the critical region at a time.This provides the needed protection from concurrency on multiprocessing machines. 

On uniprocessor machines the lock itself compiles away; the calls simply disable and enable kernel preemption. If the kernel is also non-preemptive, the spinlocks are not required at all, and the whole functionality is compiled away completely, as if it didn't exist.

Warning: Spin Locks Are Not Recursive!
Unlike spin lock implementations in other operating systems and threading libraries, the Linux kernel’s spin locks are not recursive. This means that if you attempt to acquire a lock you already hold, you will spin, waiting for yourself to release the lock. But because you are busy spinning, you will never release the lock and you will deadlock. Be careful!

A variant of spinlocks can be used in interrupt handlers, whereas semaphores/mutexes cannot be, as they would sleep in the handler. The interrupt handler spins until the lock becomes available; sleeping in an interrupt handler is not allowed. It is also important to understand that this kind of spinlock needs to disable interrupts on the local CPU before being used in an interrupt handler; otherwise an interrupt on the same CPU could try to take the lock a second time and deadlock. If another interrupt occurs on a different CPU at that point and spins on the same lock, it does not prevent the lock holder on this CPU from releasing it.

spin_lock_irqsave(&mr_lock, flags);
/* critical region ... */
spin_unlock_irqrestore(&mr_lock, flags);


Another variant of spinlock which should be avoided: 
If you always know before the fact that interrupts are initially enabled, there is no need to restore their previous state. You can unconditionally enable them on unlock.

In those cases, spin_lock_irq() and spin_unlock_irq() are optimal:


DEFINE_SPINLOCK(mr_lock);

spin_lock_irq(&mr_lock);
/* critical section ... */
spin_unlock_irq(&mr_lock);

As the kernel grows in size and complexity, it is increasingly hard to ensure that interrupts are always enabled in any given code path in the kernel. Use of spin_lock_irq() therefore is not recommended. If you do use it, you had better be positive that interrupts were originally enabled.


Note: On uniprocessor machines, the lock doesn't exist for any variant of spinlocks, and in a non-preemptive, uniprocessor environment the whole spinlock functionality is compiled away as if it didn't exist.

Debugging Spinlocks
The configure option CONFIG_DEBUG_SPINLOCK enables a handful of debugging checks in the spin lock code. For example, with this option the spin lock code checks for the use of uninitialized spin locks and unlocking a lock that is not yet locked. When testing your code, you should always run with spin lock debugging enabled. For additional debugging of lock life cycles, enable CONFIG_DEBUG_LOCK_ALLOC.

Whenever you write kernel code, you should ask yourself these questions:

1. Is the data global? Can a thread of execution other than the current one access it?


2. Is the data shared between process context and interrupt context? Is it shared between two different interrupt handlers?


3. If a process is preempted while accessing this data, can the newly scheduled process access the same data?

4. Can the current process sleep (block) on anything? If it does, in what state does that leave any shared data?

5. What prevents the data from being freed out from under me?

6. What happens if this function is called again on another processor?


7. Given the preceding points, how am I going to ensure that my code is safe from concurrency?