42.4. Xtensa Relaxation

When an instruction operand is outside the range allowed for that particular instruction field, as can transform the code to use a functionally-equivalent instruction or sequence of instructions. This process is known as relaxation. This is typically done for branch instructions because the distance of the branch targets is not known until assembly-time. The Xtensa assembler offers branch relaxation and also extends this concept to function calls, MOVI instructions and other instructions with immediate fields.

42.4.1. Conditional Branch Relaxation

When the target of a branch is too far away from the branch itself, i.e., when the offset from the branch to the target is too large to fit in the immediate field of the branch instruction, it may be necessary to replace the branch with a branch around a jump. For example,

beqz    a2, L

may result in:

    bnez.n  a2, M
    j L
M:

(The BNEZ.N instruction would be used in this example only if the density option is available. Otherwise, BNEZ would be used.)

42.4.2. Function Call Relaxation

Function calls may require relaxation because the Xtensa immediate call instructions (CALL0, CALL4, CALL8 and CALL12) provide a PC-relative offset of only 512 Kbytes in either direction. For larger programs, it may be necessary to use indirect calls (CALLX0, CALLX4, CALLX8 and CALLX12) where the target address is specified in a register. The Xtensa assembler can automatically relax immediate call instructions into indirect call instructions. This relaxation is done by loading the address of the called function into the callee's return address register and then using a CALLX instruction. So, for example:

call8 func

might be relaxed to:

    .literal .L1, func
    l32r    a8, .L1
    callx8  a8

Because the addresses of targets of function calls are not generally known until link-time, the assembler must assume the worst and relax all the calls to functions in other source files, not just those that really will be out of range. The linker can recognize calls that were unnecessarily relaxed, but it can only partially remove the overhead introduced by the assembler.

Call relaxation has a negative effect on both code size and performance, so this relaxation is disabled by default. If a program is too large and some of the calls are out of range, function call relaxation can be enabled using the -longcalls command-line option or the longcalls directive (refer to Section 42.5.3 longcalls).

42.4.3. Other Immediate Field Relaxation

The MOVI machine instruction can only materialize values in the range from -2048 to 2047. Values outside this range are best materalized with L32R instructions. Thus:

movi a0, 100000

is assembled into the following machine code:

    .literal .L1, 100000
    l32r a0, .L1

The L8UI machine instruction can only be used with immediate offsets in the range from 0 to 255. The L16SI and L16UI machine instructions can only be used with offsets from 0 to 510. The L32I machine instruction can only be used with offsets from 0 to 1020. A load offset outside these ranges can be materalized with an L32R instruction if the destination register of the load is different than the source address register. For example:

    l32i a1, a0, 2040

is translated to:

    .literal .L1, 2040
    l32r a1, .L1
    addi a1, a0, a1
    l32i a1, a1, 0

If the load destination and source address register are the same, an out-of-range offset causes an error.

The Xtensa ADDI instruction only allows immediate operands in the range from -128 to 127. There are a number of alternate instruction sequences for the generic ADDI operation. First, if the immediate is 0, the ADDI will be turned into a MOV.N instruction (or the equivalent OR instruction if the code density option is not available). If the ADDI immediate is outside of the range -128 to 127, but inside the range -32896 to 32639, an ADDMI instruction or ADDMI/ADDI sequence will be used. Finally, if the immediate is outside of this range and a free register is available, an L32R/ADD sequence will be used with a literal allocated from the literal pool.

For example:

    addi    a5, a6, 0
    addi    a5, a6, 512
    addi    a5, a6, 513
    addi    a5, a6, 50000

is assembled into the following:

    .literal .L1, 50000
    mov.n   a5, a6
    addmi   a5, a6, 0x200
    addmi   a5, a6, 0x200
    addi    a5, a5, 1
    l32r    a5, .L1
    add     a5, a6, a5