Memory: load & store
First, what is memory? Picture a huge row of numbered storage slots, each holding one byte (a byte is 8 bits). The number of a slot is its address. Registers are the CPU's handful of fast slots; memory is far larger but slower, and it holds everything that doesn't fit in registers, like arrays and strings. To work on a value in memory you load it into a register, change it, then store it back.
Registers are few; memory is vast. You move a word from memory into a register with lw (load word) and back with sw (store word). The address is written offset(register).
Memory in RISC-V is byte-addressed: every byte has its own sequential address. Because a word is 32 bits, or 4 bytes, you step memory pointers by 4 to reach the next word, which is why array offsets go 0, 4, 8, and so on. On real hardware, loading a word from an address that isn't a multiple of 4 can be slow or even trap; this simulator allows it, but keeping words 4-aligned is the habit to build.
    lw t0, 0(a1)    # t0 = memory at address a1
    sw t0, 4(a1)    # memory at address a1+4 = t0
The offset(register) form means: take the address sitting in the register and add the small offset number to it. So 0(a1) is exactly the address in a1, and 4(a1) is 4 bytes further along. That's how you step through neighbouring values without changing the base register.
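As a small sketch of the offset form, the program below reads two neighbouring words through one unchanged base register and stores their sum in a third slot. The label pair and its values are made up for illustration. The .data, .word and la lines are explained in the next paragraph, and the ecall numbers are assumed to follow the same convention as the array program later in this section.

```asm
        .data
pair:   .word 7, 35, 0           # two inputs and one slot for the result

        .text
        la   a1, pair            # a1 = base address (never changes below)
        lw   t0, 0(a1)           # t0 = word at a1      (7)
        lw   t1, 4(a1)           # t1 = word at a1 + 4  (35)
        add  t2, t0, t1          # t2 = 7 + 35 = 42
        sw   t2, 8(a1)           # word at a1 + 8 = 42
        lw   a0, 8(a1)           # read it back to confirm the store
        li   a7, 1               # print the integer in a0
        ecall
        li   a7, 10              # exit
        ecall
```

If the store worked, this prints 42. Only the offsets change between the three memory accesses; a1 keeps pointing at the first word throughout.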
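Reserve named memory in the data section by putting a label in front of .word (one or more 32-bit values) or .space N (N reserved bytes, zero-filled by most assemblers). The label stands for the address of the first byte. Each word is 4 bytes, so consecutive items sit at offsets 0, 4, 8, …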
    .data
nums:
    .word 10, 20, 30, 40    # four words

    .text
    la   t1, nums           # t1 = address of the array
    li   t2, 4              # count
    li   t0, 0              # running sum
loop:
    beq  t2, zero, done
    lw   t3, 0(t1)          # load current element
    add  t0, t0, t3         # add to sum
    addi t1, t1, 4          # advance to next word (4 bytes)
    addi t2, t2, -1         # one fewer left
    j    loop
done:
    mv   a0, t0             # print the total (100)
    li   a7, 1
    ecall
    li   a7, 10
    ecall
Walk the loop. la t1, nums puts the array's address in t1, t2 counts the elements still to process (4 at the start), and t0 holds the running sum. Each pass: beq t2, zero, done leaves the loop when no elements remain; lw t3, 0(t1) loads the element t1 points at, add t0, t0, t3 adds it to the sum, addi t1, t1, 4 moves the pointer on by one word, and addi t2, t2, -1 drops the count. After all four elements are added (10 + 20 + 30 + 40), the code at done copies the sum into a0 and prints 100, then the second ecall ends the program.
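The same pointer-stepping loop can also write results back to memory, which completes the load, change, store pattern described at the top of this section. The sketch below doubles each element of nums into a buffer reserved with .space. The label out and the choice of doubling are illustrative assumptions, not part of the original program.

```asm
        .data
nums:   .word 10, 20, 30, 40     # source array
out:    .space 16                # room for four result words (4 * 4 bytes)

        .text
        la   t1, nums            # t1 = read pointer
        la   t4, out             # t4 = write pointer
        li   t2, 4               # elements left
loop:
        beq  t2, zero, done
        lw   t3, 0(t1)           # load the current element
        add  t3, t3, t3          # change it: double the value
        sw   t3, 0(t4)           # store the result into out
        addi t1, t1, 4           # advance both pointers by one word
        addi t4, t4, 4
        addi t2, t2, -1          # one fewer left
        j    loop
done:
        la   t4, out             # back to the start of the buffer
        lw   a0, 12(t4)          # fourth result word: 40 doubled
        li   a7, 1               # print the integer in a0
        ecall
        li   a7, 10              # exit
        ecall
```

This prints 80. Because nums occupies exactly 16 bytes, out starts on a multiple of 4 and every sw stays word-aligned.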