GARTHWILSON wrote:
Using a data stack [...] I wonder how the ZP memory maps might be different in the Apple II and Commodore if their kernels had taken advantage of this.
It's easy to imagine that, as time went on, new features got added piecemeal -- and the list of z-pg requirements just grew and grew. A unified organization such as a stack might've largely avoided that. IOW a new design needn't include the horrible congestion seen in old designs.
Quote:
full address decoding [...] would reduce the maximum clock speed, unless I resorted to programmable logic.
As I said, each case needs to be examined individually. To decode z-pg, what you need is a big, wide OR (or NOR) gate. Some systems have programmable logic, and a big, wide OR/NOR is no problem. Other systems -- those using slower memory, peripherals and clock -- have more relaxed timing, and it's OK to replace that big gate with some SSI equivalent circuit. You'll still benefit from the reduced cycle count of I/O in z-pg. But I admit programmable logic may be the only answer if you're pushing for maximum clock rates.
One very tidy decoding solution is the 74ALS679 -- a configurable 13-input gate which has no trouble plunking the 16 registers of a VIA, for instance, into z-pg for you. It can easily be configured as a big, wide NOR (but it won't achieve 8.5ns max prop delay). Another way to get a wide NOR (14-input) is with both sections of a 74F260 feeding into an 'ACT138 (only one output would be used). Or a 74FCT521 might serve as part of the scheme.
Quote:
my workbench computer would lose 76 bytes of ZP
Well, nothing says all the I/O has to be mapped to z-pg. You could map some of the less critical devices elsewhere -- and, if the "elsewhere" location is judiciously chosen, there'll be zero added complexity in the decode circuitry. A 50:50 split between Page 0 and Page 2 works nicely, for example. The circuit is virtually identical to that required for putting it all in pg 0.
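From the software side, a split like that might look something like the sketch below. Everything in it is an assumption for illustration: the addresses are made up, and I'm assuming a decode where the wide gate ignores A9, so that A9 simply picks between the Page 0 device and the Page 2 device. The mnemonics are the usual Rockwell/WDC ones; some assemblers want the bit number written differently.

```asm
; Sketch only: every address here is hypothetical.
; Assumed decode: the I/O select ignores A9, and A9 then picks Page 0 or Page 2.
VIA1    = $0010         ; time-critical VIA, in z-pg
VIA2    = $0210         ; less critical VIA, same low byte but in Page 2
PB      = 0             ; 6522 register offset of the port B data register

        SMB0 VIA1+PB    ; 5~  z-pg device: one instruction sets the bit
        LDA #%00000001  ; 2~  Page 2 device needs a bit mask in A ...
        TSB VIA2+PB     ; 6~  ... and an abs-mode read-modify-write
```

The less critical device still works exactly as it would anywhere else in the map; it just doesn't get the z-pg shortcuts.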
FWIW, my own experience is with my workbench computer -- the Kimklone. It has a Rockwell 'C02 running at 5 MHz, and the I/O decoding uses a 74HC679. There are three VIAs (48 bytes total) mapped into z-pg, and I have oodles of space left over since the only other thing in z-pg is the Forth data stack.
Quote:
your substitute code was of course covering all (or most of, since it did not preserve P) the bases, but we usually try approach the code in a way that needs less of that.
I agree that if we're careful then we need "less of that." I had mixed feelings about including PHA & PLA in my code examples. Still, your SPI code snippet can run a lot faster if the bits being banged are mapped to z-page. (Edit: also expect further improvements in the posts to follow.)
Code:
SEND_BYT:
                                ; save 3~; don't PHA
        CLK_DN                  ; save 3~; RMB z-pg replaces LDA #1 and TRB abs. (details in Garth's post below)
                                ; save 2~; don't TSX
                                ; save 2~; don't LDA #2
        FOR_Y 8, DOWN_TO, 0     ;
            ASL A               ; save 5~; ASL abs,X is 7~ but ASL A is only 2~
            IF_C_CLR            ;
                RMB VIA3PB      ; save 1~. RMB z-pg replaces TRB abs.
            ELSE_               ;
                SMB VIA3PB      ; save 1~. SMB z-pg replaces TSB abs.
            END_IF
            CLK_HI_PULSE        ; save 2~. INC DEC are each 1~ faster for z-pg than for abs.
        NEXT_Y                  ;
                                ; save 4~; don't PLA
        RTS                     ;
The speedup we see here is from shorter instructions, from not having to push/pop A, and from not having a bit mask held in A. I hope I'm correct in assuming CLK_HI_PULSE assembles an INC abs followed by DEC abs (or vice versa).
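To make the per-instruction savings concrete, here is a sketch of the data-bit and clock-pulse steps written out as plain 65C02 instructions, first with the port outside z-pg and then with it inside. Both addresses are hypothetical. The bit assignments (bit 0 = clock, bit 1 = data) are only my reading of the #1 and #2 masks mentioned in the comments above, and the INC/DEC pair is the assumed expansion of CLK_HI_PULSE. Some assemblers write the bit instructions as SMB 1,addr rather than SMB1 addr.

```asm
; Sketch only: both addresses are hypothetical, not taken from the thread.
VIA_ABS = $6000         ; port B of a VIA decoded outside z-pg (assumed address)
VIA_ZP  = $10           ; the same port decoded into z-pg (assumed address)

; --- I/O outside z-pg: a mask must sit in A, and every access is abs ---
        LDA #2          ; 2~  data-bit mask (done once, before the loop)
        TSB VIA_ABS     ; 6~  set the data bit   (TRB VIA_ABS, also 6~, clears it)
        INC VIA_ABS     ; 6~  clock high (bit 0 goes from 0 to 1)
        DEC VIA_ABS     ; 6~  clock low again

; --- I/O in z-pg: no mask, so A is free to hold the byte being shifted ---
        SMB1 VIA_ZP     ; 5~  set the data bit   (RMB1 VIA_ZP, also 5~, clears it)
        INC VIA_ZP      ; 5~  clock high
        DEC VIA_ZP      ; 5~  clock low again
```

That's 1~ saved on each read-modify-write. The bigger win is indirect: with A no longer tied up holding a mask, the byte can be shifted with ASL A instead of being shifted in memory.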
FWIW, it's possible to optimize away use of Y as a loop counter, but that's not pertinent to I/O in z-pg so I'll leave it alone. BTW I sure do like the structured macros!
cheers
Jeff
[Edit for clarity, and better understanding of the CLK_DN macro. Tweak code comments and emphasize the split-I/O-area option.]