How to Load Constants in Assembly for ARM Architecture

    ARM is a 32-bit CPU architecture where every instruction is 32 bits long. Any constants which are part of an instruction must be encoded within the 32 bits of the given instruction and this naturally limits the range of constants that can be represented in one instruction. This post will show you how we can deal with these limitations and how the latest revision of the ARM architecture (ARMv7) provides a simple and efficient solution.

    Most arithmetic and logical ARM instructions accept 3 parameters:

  • The destination: always a register.
  • Operand 1: always a register.
  • Operand 2: a register, an immediate constant value or a shifted register.

    We'll cover shifted registers in a future post. For now, we're only interested in the constants. Examples of such instructions are:

    add    r0, r1, r2    @ r0 = r1 + r2
    sub    r0, r1, #3    @ r0 = r1 - 3

    To fit in the instruction, an Operand 2 immediate must be expressible as an 8-bit value rotated right by an even number of bits between 0 and 30 (inclusive). This allows for constants such as 0xFF (0xFF rotated right by 0), 0xFF00 (0xFF rotated right by 24) or 0xF000000F (0xFF rotated right by 4).
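
    The rule above is easy to check mechanically. The following Python sketch is only a model of the encoding, not assembler code; the function names rotate_right and encode_operand2 are made up for this illustration. It tries every even rotation and reports the 8-bit value and rotation that produce the constant, if any.

```python
MASK32 = 0xFFFFFFFF


def rotate_right(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def encode_operand2(constant):
    """Return (imm8, rotation) such that imm8 rotated right by `rotation`
    equals `constant`, or None if no such pair exists.

    Hypothetical helper: it models the rule described in the text.
    """
    constant &= MASK32
    for rotation in range(0, 32, 2):  # even rotations 0, 2, ..., 30
        # Rotating left by `rotation` undoes a rotate-right by `rotation`.
        imm8 = rotate_right(constant, (32 - rotation) % 32)
        if imm8 <= 0xFF:
            return imm8, rotation
    return None


if __name__ == "__main__":
    for value in (0xFF, 0xFF00, 0xF000000F, 0xFF0, 0x12345678):
        print(hex(value), encode_operand2(value))
```

    Under this model, the three constants from the paragraph above come back as 0xFF with rotations 0, 24 and 4, while 0x12345678 has no encoding.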

    Operand 2 immediates are also valid immediates for mov instructions, making it possible to move constant values into registers without performing any other computation:

    mov    r0, #0xFF0    @ r0 = 0xFF0

    In software - especially in languages like C - constants tend to be small. When they are not small they tend to be bit masks. Operand 2 immediates provide a reasonable compromise between constant coverage and encoding space; most common constants can be encoded directly.

Loading a Constant from Memory

    What happens if you need a constant which cannot be expressed as an Operand 2 immediate? The constant has to be moved into a register before use and there are many ways to do so. The traditional solution is to load the constant from memory.

    Loading a value from memory requires a pointer to the memory location of the value. Pointers need to be held in a register, so we seem to be back to the same problem: an extra register is needed. However, in ARM, the program counter (pc) can generally be used like any other register and can therefore serve as the base pointer for a load operation. This allows you to store the constant relative to the instruction loading the constant. Loading the constant into a register then becomes something like this:

    ldr    r0, [pc, #offset]

    Here #offset is the offset in bytes of the constant relative to the program counter (pc). When executing an ARM instruction, pc reads as the address of the current instruction plus 8. #offset can take any value between -4095 and +4095 (inclusive).
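
    As a worked example of that arithmetic, here is a small Python sketch. The function name literal_offset and the addresses used are invented for the illustration; only the "+8" and the -4095..+4095 range come from the text.

```python
PC_READ_AHEAD = 8   # in ARM state, pc reads as the current instruction + 8
MAX_OFFSET = 4095   # magnitude limit of the ldr immediate offset


def literal_offset(instruction_address, literal_address):
    """Return the #offset for `ldr rX, [pc, #offset]`.

    Hypothetical helper: raises ValueError when the literal is too far
    away to be reached from the instruction.
    """
    offset = literal_address - (instruction_address + PC_READ_AHEAD)
    if not -MAX_OFFSET <= offset <= MAX_OFFSET:
        raise ValueError(f"literal out of range: offset {offset}")
    return offset


if __name__ == "__main__":
    # An ldr at 0x1000 whose constant sits two instructions later, at 0x1008.
    print(literal_offset(0x1000, 0x1008))
```

    A constant placed 8 bytes after the ldr gets an offset of 0, because pc already points 8 bytes ahead of the instruction being executed.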

    Knowing where to store the constants in memory (and keeping track of them) can be a tedious task. Thankfully most assemblers provide pseudo-instructions to simplify the operation. For example, in GNU assembler you can write this:

returns_0x12345678:
    ldr    r0, =0x12345678
    bx    lr               @ function return

    The above will assemble to this:

returns_0x12345678:
    ldr    r0, [pc, #0]    @ remember pc is 8 bytes ahead
    bx    lr               @ function return
    .word    0x12345678

    In fact the ldr = pseudo-instruction is a bit more clever than it looks: it checks whether the given constant can be represented by an Operand 2 immediate and generates a mov instruction if it can. A mov instruction is faster than an ldr instruction because the constant does not have to be read from memory, and it saves memory because no literal has to be stored.
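
    The decision the assembler makes can be modelled roughly as follows. This is a Python sketch under assumptions, not the assembler's actual code: the names fits_operand2 and expand_ldr_pseudo are hypothetical, and the mvn case goes beyond the text above (the GNU assembler documentation says it uses MOV or MVN when either can generate the constant).

```python
MASK32 = 0xFFFFFFFF


def fits_operand2(constant):
    """True if `constant` is an 8-bit value rotated right by an even amount."""
    constant &= MASK32
    for rotation in range(0, 32, 2):
        rotated_left = ((constant << rotation) | (constant >> (32 - rotation))) & MASK32
        if rotated_left <= 0xFF:
            return True
    return False


def expand_ldr_pseudo(constant):
    """Return (kind, value) describing how `ldr rX, =constant` could expand.

    kind is "mov", "mvn" or "literal" (a pc-relative load from a literal pool).
    """
    constant &= MASK32
    if fits_operand2(constant):
        return "mov", constant
    inverted = ~constant & MASK32
    if fits_operand2(inverted):
        # Assumption beyond the article: mvn writes the bitwise NOT of its operand.
        return "mvn", inverted
    return "literal", constant


if __name__ == "__main__":
    for value in (0xFF0, 0xFFFFFFFF, 0x12345678):
        print(hex(value), expand_ldr_pseudo(value))
```

    In this model 0xFF0 becomes a mov, 0xFFFFFFFF becomes an mvn of 0, and 0x12345678 falls back to a load from a literal pool.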

Loading a Constant from the Instruction Stream: The ARMv7 Way

    As mentioned earlier, there are other ways to load a constant. In the latest version of the ARM architecture, ARMv7, two new instructions were introduced to improve the situation:

  • movw, or move wide, will move a 16-bit constant into a register, implicitly zeroing the top 16 bits of the target register.
  • movt, or move top, will move a 16-bit constant into the top half of a given register without altering the bottom 16 bits.

    Now moving an arbitrary 32-bit constant is as simple as this:

    movw      r0, #0x5678      @ r0 = 0x00005678
    movt      r0, #0x1234      @ r0 = (r0 & 0x0000FFFF) | 0x12340000 (=0x12345678)

    Note that the order matters since movw will zero the upper 16 bits. Here again the GNU assembler provides some syntactic sugar: the prefixes :upper16: and :lower16: allow you to extract the corresponding half from a 32-bit constant:

    .equ    label, 0x12345678
    movw    r0, #:lower16:label
    movt    r0, #:upper16:label
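
    To make the effect of the two instructions and of the :lower16: and :upper16: prefixes concrete, here is a small Python model. The functions below only mimic the register updates described above and are named after the instructions for readability; the starting value 0xDEADBEEF is an arbitrary stand-in for stale register contents.

```python
def lower16(value):
    """Model of :lower16: -- the bottom 16 bits of a 32-bit constant."""
    return value & 0xFFFF


def upper16(value):
    """Model of :upper16: -- the top 16 bits of a 32-bit constant."""
    return (value >> 16) & 0xFFFF


def movw(register, imm16):
    """Write imm16 to the bottom half and zero the top half."""
    return imm16 & 0xFFFF


def movt(register, imm16):
    """Write imm16 to the top half and keep the bottom half."""
    return (register & 0x0000FFFF) | ((imm16 & 0xFFFF) << 16)


def load_constant(value, r0=0xDEADBEEF):
    """movw followed by movt, starting from arbitrary register contents."""
    r0 = movw(r0, lower16(value))
    r0 = movt(r0, upper16(value))
    return r0


if __name__ == "__main__":
    print(hex(load_constant(0x12345678)))
    # Wrong order: movw runs last and wipes the top half written by movt.
    print(hex(movw(movt(0xDEADBEEF, 0x1234), 0x5678)))
```

    The second call shows why the order matters: running movt first and movw second leaves only 0x00005678 in the register.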

    While this approach takes two instructions, it does not require any extra space to store the constant, so the movw/movt method and the ldr method end up using the same amount of memory (two 32-bit words in either case). Memory bandwidth is precious, and the movw/movt approach avoids an extra read on the data side, a read that could also have missed the cache.

    If you know you can use it, movw/movt is the recommended way to load a 32-bit constant. However, if the constant can be encoded as an 8-bit immediate, rotated right if necessary, use it as an Operand 2 directly and avoid tying up an extra register.

Two additional notes:

        1: According to ARMv7-M_ARM.pdf, page 121, "Modified Immediate", it's only possible to shift an 8-bit constant left by between 0 and 24 bits, so the example given above, 0xF000000F, cannot be encoded on ARM Cortex-M. However, it's also possible to use a constant in one of these forms:

%00000000abcdefgh00000000abcdefgh, %abcdefgh00000000abcdefgh00000000 or %abcdefghabcdefghabcdefghabcdefgh.

        2: In some cases, it might be quicker to load 32-bit immediate values from memory instead of using MOVW+MOVT. This occurs if you have two consecutive load instructions that can be pipelined. If pipelining is possible, the first load instruction takes 2 clock cycles and the next takes 1 clock cycle, for 3 in total. This requires the two load instructions to be right next to each other, without any other instructions in between; otherwise each load takes 2 clock cycles. With MOVW+MOVT, each instruction takes 1 clock cycle, so loading two constants takes 4 clock cycles instead of 3. But if you're only loading a single 32-bit immediate value, I recommend using MOVW+MOVT.
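
    The Cortex-M rule from note 1 can be sketched in the same style as a check in Python. This is a model of the description in that note (an 8-bit value shifted left by 0 to 24 bits, or one of the three repeated-byte patterns), not a full Thumb-2 encoder, and the function name is invented for the illustration.

```python
MASK32 = 0xFFFFFFFF


def fits_thumb2_modified_immediate(constant):
    """True if `constant` matches one of the forms listed in note 1."""
    constant &= MASK32

    # An 8-bit value shifted left by 0..24 bits (no wrap-around).
    for shift in range(0, 25):
        if (constant & ~(0xFF << shift) & MASK32) == 0:
            return True

    low_byte = constant & 0xFF
    high_byte = (constant >> 8) & 0xFF
    repeated_patterns = (
        low_byte * 0x00010001,   # %00000000abcdefgh00000000abcdefgh
        high_byte * 0x01000100,  # %abcdefgh00000000abcdefgh00000000
        low_byte * 0x01010101,   # %abcdefghabcdefghabcdefghabcdefgh
    )
    return constant in repeated_patterns


if __name__ == "__main__":
    for value in (0xFF000000, 0xF000000F, 0x00AB00AB, 0xABABABAB):
        print(hex(value), fits_thumb2_modified_immediate(value))
```

    In this model 0xF000000F is rejected, because it would need a rotation that wraps around the register, while 0x00AB00AB and 0xABABABAB are accepted through the repeated-byte forms.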

original post at https://community.arm.com/processors/b/blog/posts/how-to-load-constants-in-assembly-for-arm-architecture

