The MIPS R4000 has one addressing mode:
Register indirect with displacement.
LW rd, disp16(rs) ; rd = *( int32_t*)(rs + disp16)
LH rd, disp16(rs) ; rd = *( int16_t*)(rs + disp16)
LHU rd, disp16(rs) ; rd = *(uint16_t*)(rs + disp16)
LB rd, disp16(rs) ; rd = *( int8_t*)(rs + disp16)
LBU rd, disp16(rs) ; rd = *( uint8_t*)(rs + disp16)
The load instructions load an aligned word, halfword, or byte
from the address specified by adding the 16-bit signed displacement
to the source register (known as the "base register").¹
By convention, the displacement can be omitted, in which case it is
taken to be zero.
The LH and LB instructions sign-extend the loaded value to 32 bits;
the LHU and LBU versions zero-extend it.
There are corresponding aligned store instructions.
SW rs, disp16(rd) ; *( int32_t*)(rd + disp16) = (int32_t)rs
SH rs, disp16(rd) ; *( int16_t*)(rd + disp16) = (int16_t)rs
SB rs, disp16(rd) ; *( int8_t*)(rd + disp16) = ( int8_t)rs
In all cases, if the effective address turns out not to be
suitably aligned, an alignment fault occurs.
Windows NT handles the alignment fault by performing the access
with the unaligned memory access instructions (which we'll see next time)
and then resuming execution.
The overhead of the emulation swamps the cost of having done it correctly
in the first place,
so if you know that the address may be unaligned,
then you are far better off using the unaligned memory access instructions
instead of having the kernel fix it up for you.
The assembler emulates absolute addressing with the help of the
at (assembler temporary) register.
For example, the pseudo-instruction
LW rd, global_variable
loads an aligned word from a global variable.
Let A be the address of the global variable,
and let
YYYY = (int16_t)(A & 0xFFFF)
and
XXXX = (A − YYYY) >> 16
Then the assembler generates the following two instructions:
LUI at, XXXX
LW rd, YYYY(at)
Note that if the bottom 16 bits of the address
are 0x8000 or greater,
then YYYY is negative,
and XXXX is one greater than the upper 16 bits
of the address.
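As a worked example, take a hypothetical address whose low half is exactly 0x8000, the smallest value that triggers the adjustment, since (int16_t)0x8000 is -32768:

```latex
\begin{aligned}
A &= \texttt{0x12348000} \\
\mathrm{YYYY} &= (\texttt{int16\_t})\,\texttt{0x8000} = -\texttt{0x8000} \\
\mathrm{XXXX} &= (A - \mathrm{YYYY}) \gg 16 = \texttt{0x12350000} \gg 16 = \texttt{0x1235} \\
(\mathrm{XXXX} \ll 16) + \mathrm{YYYY} &= \texttt{0x12350000} - \texttt{0x8000} = \texttt{0x12348000} = A
\end{aligned}
```

So the assembler would emit LUI at, 0x1235 followed by LW rd, -32768(at), and XXXX is one more than the upper half 0x1234.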
Another pseudo-instruction is
LW rd, imm32(rs)
You may want to do this if indexing a global array.
A straightforward implementation of the pseudo-instruction would be
LUI at, XXXX ; load high part
ADDIU at, at, YYYY ; add in the low part
ADDU at, at, rs ; add in the byte offset
LW rd, (at) ; load the word
but this can be shortened by one instruction
by merging the fixed offset YYYY
into the displacement of the effective address calculation in the LW.
The result is
LUI at, XXXX
ADDU at, at, rs
LW rd, YYYY(at)
While the assembler emulation is convenient,
it may not be the most efficient.
If you are accessing the global variable more than once,
or if you are accessing more than one variable within the same 64KB region,
you can share the LUI instruction among them.
For example, suppose global1 and global2
are close enough together that both addresses produce the same XXXX,
meaning that they lie in the same 64KB window of memory.
; lazy version of global2 = global1 + 1
LW r1, global1
ADDIU r1, r1, 1
SW r1, global2
This expands to
LUI at, XXXX
LW r1, YYYY(at)
ADDIU r1, r1, 1
LUI at, XXXX
SW r1, ZZZZ(at)
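Here, ZZZZ is the low part of global2's address, computed the same way as YYYY.
Since both LUI instructions load the same XXXX,
you can factor it out into a register
that you reuse for the entire section of code.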
; sneakier version of global2 = global1 + 1
LUI r2, XXXX
LW r1, YYYY(r2)
ADDIU r1, r1, 1
SW r1, ZZZZ(r2)
; can keep using r2 to access other variables in the block
In theory, you could even store constants in your data segment and load them from there,
but since loading a 32-bit constant into a register takes at most two instructions,
which is no more than loading it from memory would take,
you probably won't bother.
Next time, we'll look at unaligned access.
¹ In earlier versions of the MIPS architecture,
there was a load delay slot:
The value retrieved by a load instruction was not available
until two instructions later.
That means that in the sequence
LW r1, (r2) ; load word from r2 into r1
ADDIU r3, r1, 1 ; r3 = r1 + 1
the ADDIU instruction operated on the old
value of r1, not the value that was loaded from memory.
If you want to add 1 to the value loaded from memory, you need
to insert some other instruction in the load delay slot:
LW r1, (r2) ; load word from r2 into r1
NOP ; load delay slot
ADDIU r3, r1, 1 ; r3 = r1 + 1
The load delay slot was removed in MIPS II, so the R4000, a MIPS III processor, doesn't have one.
On the R4000,
if you try to access the value of a register immediately after
loading it, the processor stalls until the value becomes ready.
Sure, the stall is bad, but it's better than running ahead with
the wrong value!