Hmm... now you have opened a can of worms.
Depending on the speed of your CPU, and on the propagation delay of your ALU (and incrementer/decrementer etc.), there are different approaches to this.
C74 deliberately aims for speed. IIRC it has 2:1 multiplexers at the inputs of the registers PC, S, ADH/ADL,
so each register can be written either from the ALU output or from the incrementer/decrementer (which is fed by the CPU internal address bus),
at the falling edge of PHI2.
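As a rough illustration of that multiplexer arrangement, here is a small behavioural sketch in Python. It is not taken from the C74 schematics: the names (`MuxedRegister`, `sel_alu`, `incdec`) are made up for this example, and all it models is a 16-bit register that picks one of two sources and latches it on the falling edge of PHI2.

```python
class MuxedRegister:
    """16-bit register with a 2:1 mux at its input (hypothetical model)."""

    def __init__(self, value=0):
        self.value = value & 0xFFFF
        self._phi2 = 1  # previous PHI2 level, used for edge detection

    def clock(self, phi2, sel_alu, alu_out, incdec_out, load=True):
        """Latch the selected source on the falling edge of PHI2."""
        falling = self._phi2 == 1 and phi2 == 0
        self._phi2 = phi2
        if falling and load:
            # The 2:1 mux: ALU output or incrementer/decrementer output.
            source = alu_out if sel_alu else incdec_out
            self.value = source & 0xFFFF
        return self.value


def incdec(address, up=True):
    """Incrementer/decrementer fed by the internal address bus."""
    return (address + (1 if up else -1)) & 0xFFFF


pc = MuxedRegister(0x12FF)
pc.clock(1, False, 0, incdec(pc.value))        # PHI2 high: nothing latched
pc.clock(0, False, 0, incdec(pc.value))        # falling edge: PC <- PC + 1
print(hex(pc.value))
```

A jump would use the same register with `sel_alu=True`, so the ALU result is latched instead of the incremented address.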
;---
In theory, the idea of using a 16-bit ALU that runs at twice the speed of the external CPU bus, incrementing/decrementing registers
during PHI2=low and doing data calculations during PHI2=high, sounds nice, but the timing of the CPU's
external address bus has to be considered:
If the address changes too quickly after the falling edge of PHI2, write cycles might get corrupted.
If the address changes too slowly after the falling edge of PHI2, the address decoding logic outside the CPU has less time to do its job.
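To put those two constraints side by side, here is a minimal back-of-the-envelope check in Python. It is a simplified model and not a substitute for a datasheet: the function name, its parameters and all the numbers are assumptions made up for illustration, and things like data setup time are ignored.

```python
def address_timing(t_cycle_ns, t_addr_delay_ns, t_hold_needed_ns,
                   t_decode_ns, t_access_ns):
    """Check both constraints for one bus cycle (simplified model).

    All times are in nanoseconds, measured from the falling edge of PHI2.
    """
    # Too fast: the old address must stay put long enough for the write
    # that just ended to complete cleanly.
    hold_ok = t_addr_delay_ns >= t_hold_needed_ns
    # Too slow: whatever is left of the cycle after the address settles
    # has to cover decoding plus the access time of the selected device.
    slack_ns = t_cycle_ns - t_addr_delay_ns - t_decode_ns - t_access_ns
    return hold_ok, slack_ns


# Made-up numbers for a 1 MHz bus, purely for illustration.
hold_ok, slack_ns = address_timing(1000, 30, 10, 40, 150)
print(hold_ok, slack_ns)
```

A negative `slack_ns` would mean the decoder and the selected device no longer fit into the cycle; `hold_ok == False` would mean the address moves before the previous write is safely finished.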
Back in 1994, Bradford J. Rodriguez introduced the PISC1:
One could replace the box labelled REG FILE with dual port RAM,
but the architecture would have to be modified a bit, because the 6502 has an 8-bit data bus and a 16-bit address bus.
Unfortunately, the only manufacturer of that sort of dual port RAM nowadays seems to be IDT, and it's hard to tell how long these chips will stay in production.
;---
If your 8-bit ALU were really fast and your CPU external bus really slow, so that you could do 4 clock cycles inside the CPU for every bus cycle,
then with a few tricks this would give a rather simple architecture like in my first TTL CPU (the mill would resemble something like a PIC16C54 microcontroller),
but back then I didn't know what I know now.
Also, the ALU would be sitting idle for at least half of the time, and the microcode would get somewhat bloated because the CPU core
has to run 4 times faster than the external bus (that is, 4 microcode cycles per 6502 machine cycle).
;---
A different approach would be to use
74ALS867/74ALS869 8-bit synchronous up/down counters,
but they are very exotic and expensive, and it's hard to tell how long these chips will stay in production.
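To show what the counter approach boils down to, here is a small Python sketch of two generic 8-bit synchronous up/down counters cascaded into one 16-bit register such as PC. It does not model the actual 74ALS867/74ALS869 pins or mode inputs; the class and method names are invented for this example.

```python
class UpDownCounter8:
    """Generic 8-bit synchronous up/down counter with parallel load."""

    def __init__(self, value=0):
        self.value = value & 0xFF

    def carry_out(self, up):
        """True when the next count in this direction wraps around."""
        return self.value == (0xFF if up else 0x00)

    def clock(self, up=True, enable=True, load=None):
        if load is not None:
            self.value = load & 0xFF
        elif enable:
            self.value = (self.value + (1 if up else -1)) & 0xFF


class Counter16:
    """Two 8-bit counters cascaded, low byte enabling the high byte."""

    def __init__(self, value=0):
        self.lo = UpDownCounter8(value)
        self.hi = UpDownCounter8(value >> 8)

    @property
    def value(self):
        return (self.hi.value << 8) | self.lo.value

    def clock(self, up=True):
        # Sample the carry before the clock edge, as synchronous logic would.
        carry = self.lo.carry_out(up)
        self.lo.clock(up)
        self.hi.clock(up, enable=carry)


pc = Counter16(0x12FF)
pc.clock()            # count up across the page boundary
print(hex(pc.value))
```

The point of the sketch is that the register increments or decrements itself, so neither the ALU nor a separate incrementer is needed for PC or S.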