Memory: load & store

First, what is memory? Picture a huge row of numbered storage slots, each holding one byte (a byte is 8 bits). The number of a slot is its address. Registers are the CPU's handful of fast slots; memory is far larger but slower, and it holds everything that doesn't fit in registers, like arrays and strings. To work on a value in memory you load it into a register, change it, then store it back.

Two instructions do the moving: lw (load word) copies a word from memory into a register, and sw (store word) copies a register back out to memory. Both write the address as offset(register).

Memory in RISC-V is byte-addressed: every byte has its own sequential address. Because a word is 32 bits, or 4 bytes, you step memory pointers by 4 to reach the next word, which is why array offsets go 0, 4, 8, and so on. On real hardware, loading a word from an address that isn't a multiple of 4 can be slow or even trap; this simulator allows it, but keeping words 4-aligned is the habit to build.

Load / store shape
lw t0, 0(a1)    # t0 = the word at the address held in a1
sw t0, 4(a1)    # the word at address a1+4 = t0

The offset(register) form means: take the address sitting in the register and add the offset to it. In RISC-V the offset is a signed 12-bit constant, so it can range from -2048 to 2047. So 0(a1) is exactly the address in a1, and 4(a1) is 4 bytes further along. That's how you reach neighbouring values without changing the base register.
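
Here is a minimal sketch of one base register reaching three neighbouring words. It assumes the simulator accepts the same .data, la and exit ecall conventions used by the array example in this section. The label trio is hypothetical, chosen only for illustration.

Three neighbours, one base register
.data
trio: .word 5, 6, 0   # hypothetical label: two inputs and a slot for their sum

.text
la  a1, trio      # a1 = address of the first word
lw  t0, 0(a1)     # t0 = 5  (word at a1 + 0)
lw  t1, 4(a1)     # t1 = 6  (word at a1 + 4)
add t0, t0, t1    # t0 = 11
sw  t0, 8(a1)     # word at a1 + 8 = 11
li  a7, 10        # exit, same convention as the array example
ecall

Note that a1 holds the same address from start to finish; only the offsets differ.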

Reserve named memory in the data section with .word (one or more 32-bit values) or .space N (N blank bytes). Each word is 4 bytes, so consecutive items sit at offsets 0, 4, 8, …
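
As a small illustration of the two directives side by side, the sketch below loads an initialised .word, changes it, and stores the result into room reserved with .space. The labels seed and slot are hypothetical, and the print and exit ecall numbers are assumed to behave as they do in the array example in this section.

Load, change, store into reserved space
.data
seed: .word 41       # one initialised word
slot: .space 4       # 4 reserved bytes: room for exactly one word

.text
la   t1, seed
lw   t0, 0(t1)     # t0 = 41
addi t0, t0, 1     # t0 = 42
la   t2, slot
sw   t0, 0(t2)     # slot now holds 42
lw   a0, 0(t2)     # read it back to check the store
li   a7, 1         # print integer
ecall              # prints 42
li   a7, 10        # exit
ecall

Because seed occupies exactly 4 bytes, slot starts 4 bytes after it and stays word-aligned.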

Sum an array in memory
.data
nums: .word 10, 20, 30, 40   # four words

.text
la t1, nums      # t1 = address of the array
li t2, 4         # count
li t0, 0         # running sum

loop:
  beq t2, zero, done
  lw  t3, 0(t1)    # load current element
  add t0, t0, t3   # add to sum
  addi t1, t1, 4   # advance to next word (4 bytes)
  addi t2, t2, -1  # one fewer left
  j loop

done:
  mv a0, t0        # a0 = the total (100), the value to print
  li a7, 1         # service 1: print integer
  ecall
  li a7, 10        # service 10: exit
  ecall

Walk the loop. la t1, nums puts the array's address in t1, t2 counts the 4 elements left, and t0 holds the running sum. Each pass: beq t2, zero, done stops when no elements remain; lw t3, 0(t1) loads the element t1 points at, add t0, t0, t3 adds it to the sum, addi t1, t1, 4 moves the pointer on by one word, and addi t2, t2, -1 drops the count. After all four (10 + 20 + 30 + 40), done prints 100.
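
The loop advances its pointer by 4 on every pass. The same arithmetic reaches any single element directly: its address is the base address plus 4 times its index. The sketch below is one way to do that using only instructions already seen in this section, doubling the index twice with add to multiply it by 4. The choice of index 2 is arbitrary.

Fetch one element by index
.data
nums: .word 10, 20, 30, 40

.text
la  t1, nums       # t1 = base address of the array
li  t2, 2          # index of the element we want (the third one)
add t2, t2, t2     # index * 2
add t2, t2, t2     # index * 4 = byte offset (8)
add t1, t1, t2     # t1 = base + byte offset
lw  a0, 0(t1)      # a0 = 30
li  a7, 1          # print integer
ecall              # prints 30
li  a7, 10         # exit
ecall

Pointer stepping suits a walk over every element; computing the address from an index suits a jump to one element.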

💡
Switch the simulator readout between HEX and DEC (the toggle in its toolbar) to read addresses and values in whichever base makes the moment clearer.