Typical RISC CPUs deliberately cut down the number of addressing modes directly supported by the instruction set, in favour of letting the programmer construct their own addressing modes and of speeding up the execution of each instruction. Part of the speedup comes from fixing the size of each instruction, so that it's easy to decode and execute several in parallel. In fact, usually there are only register-direct (the address is held in a register), register-offset, and register-offset-with-update modes, plus immediate-operand loads, adds and bitwise operations. Compare that with CISC CPUs' typical support for absolute and indirect addressing modes, and occasionally even pre-and-post-indexed-indirect-with-offset (which, IIRC, the later versions of the 68K do support).
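To make the three memory addressing modes concrete, here is a minimal Python sketch of how each one computes its effective address. It assumes a PowerPC-style 16-bit signed offset; the function names and the dictionary used as a register file are hypothetical, chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers


def sign_extend16(value):
    """Treat the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def ea_register(regs, base):
    """Register-direct: the address is simply the base register's contents."""
    return regs[base] & MASK32


def ea_register_offset(regs, base, offset16):
    """Register-offset: base register plus a signed 16-bit offset."""
    return (regs[base] + sign_extend16(offset16)) & MASK32


def ea_register_offset_update(regs, base, offset16):
    """Register-offset-with-update: as above, then write the address back."""
    ea = ea_register_offset(regs, base, offset16)
    regs[base] = ea  # the base register now holds the address just used
    return ea


regs = {"r9": 0x1000}
print(hex(ea_register(regs, "r9")))
print(hex(ea_register_offset(regs, "r9", 8)))
print(hex(ea_register_offset_update(regs, "r9", 0xFFFC)), hex(regs["r9"]))
```

The update form is what makes walking through an array or a stack cheap: the pointer adjustment comes for free with the memory access.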
RISC CPUs usually have far more registers than their CISC cousins, and single-cycle execution of simple instructions. So you can construct an absolute addressing mode by loading the desired address into a spare register (using load-immediate instructions) and then issuing a normal register-direct memory operation, and that might actually take fewer cycles than a CISC CPU executing the equivalent absolute addressing mode. On the PowerPC, an arbitrary 32-bit absolute load might be:
Code:
li r9,ADDRlo : oris r9,r9,ADDRhi : lwz r8,0(r9)
taking a total of 5 cycles for a cache hit, during which other instructions can execute in parallel. (Strictly, li sign-extends its 16-bit immediate, so this exact pair only works when ADDRlo is below 0x8000; lis followed by ori handles any address.) Recent versions of ARM have the 16-bit immediate instructions MOVW and MOVT, which work similarly to li and oris; older ones may require up to four instructions to construct an arbitrary 32-bit constant in a register, or they can do a PC-relative load (r15 is the PC, so you can do an offset load from there with a normal load instruction) to bring in a constant packed in at assembly time. Similarly, you can construct indirect addressing modes using sequences of direct memory accesses, possibly interspersed with arithmetic.
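As a rough illustration of both ideas, the Python sketch below models building a 32-bit address from two 16-bit halves, and then a memory-indirect-with-offset load built from two plain loads. The helper names and the dictionary standing in for memory are hypothetical; the only architectural facts assumed are that li sign-extends its immediate, lis places it in the upper half, and ori/oris OR it into the lower/upper half.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers


def li(imm16):
    """Load a 16-bit immediate, sign-extended to 32 bits."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def lis(imm16):
    """Load a 16-bit immediate into the upper half, lower half zero."""
    return (imm16 & 0xFFFF) << 16


def ori(reg, imm16):
    """OR a 16-bit immediate into the lower half."""
    return (reg | (imm16 & 0xFFFF)) & MASK32


def oris(reg, imm16):
    """OR a 16-bit immediate into the upper half."""
    return (reg | ((imm16 & 0xFFFF) << 16)) & MASK32


def build_low_first(addr):
    """li then oris: only correct when bit 15 of addr is clear."""
    return oris(li(addr & 0xFFFF), addr >> 16)


def build_high_first(addr):
    """lis then ori: correct for any 32-bit addr."""
    return ori(lis(addr >> 16), addr & 0xFFFF)


def load_indirect_with_offset(memory, regs, base, inner, outer):
    """Emulate a memory-indirect mode with two ordinary offset loads."""
    pointer = memory[(regs[base] + inner) & MASK32]  # first load: fetch pointer
    return memory[(pointer + outer) & MASK32]        # second load: fetch operand


print(hex(build_low_first(0x12345678)), hex(build_high_first(0x12345678)))
print(hex(build_low_first(0x1234ABCD)), hex(build_high_first(0x1234ABCD)))

memory = {0x2004: 0x3000, 0x3008: 42}  # word-addressed toy memory
print(load_indirect_with_offset(memory, {"r9": 0x2000}, "r9", 4, 8))
```

The second print line is where the two orderings diverge, since the low half 0xABCD has its top bit set. The indirect load costs two memory accesses either way; the RISC version just makes them visible as separate instructions that can be scheduled around.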
The basic ARM cores use fewer transistors than typical 32-bit CISC CPUs. They can do that because they aren't stuffed with complex addressing modes. At the same time, they're considerably faster, even if you run the comparison at the same clock speed and run a benchmark that would theoretically benefit from complex addressing modes.
To illustrate this point concretely, the ARM3 of 1989 and the early versions of the 80386 both used a 1.5µm process, and are actually broadly similar in size; the ARM3 is about 20% larger in terms of transistor count. But the ARM3 includes a 4KB cache and associated tag RAMs on-die, which accounts for the great majority of those transistors, while the 80386 has none of that - it's *all* core. The 1.5µm 80386 was limited to 12MHz; the 1.5µm ARM3 runs up to 25MHz, and takes fewer cycles on average per instruction because it has an actual pipeline architecture, while the 386 is still basically a microcoded CPU.