6502.org Forum

All times are UTC




Posted: Sun May 15, 2022 1:13 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8504
Location: Midwestern USA
Dr Jefyll wrote:
...unlike PBR, DBR does not get a new value placed in it when an interrupt occurs.

Like the effects of TDC and TSC, DB behavior is something a 65C816 assembly language programmer can never afford to forget. :D

It’s somewhat unfortunate that the only way to access DB is through the stack with an 8-bit push or pull. That can make for some downright awkward programming at times. For that reason, the programs I have written for my POC units all start with PHK and PLB and don’t touch DB beyond that. That allows static data structures that are assembled as part of the program, e.g., lookup tables, strings, etc., to be accessed with 16-bit addressing without having to know in which bank the program is running. Run-time data, e.g., user input, that is not on direct page and could be anywhere in address space is accessed using 32-bit pointers (really 24 bits, but handled as 32 bits) that make the program bank-agnostic on data accesses.

Most functions (subroutines) in my programs use the stack for local workspace. In those cases, I usually redirect DP to SP+1 after reserving the workspace and then employ direct page addressing modes. Doing so makes the function bank-agnostic and adds significant flexibility with no significant downside.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Sun May 15, 2022 1:57 am
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
tmr4 wrote:
Still to be considered though is what I do when I exceed bank 2 for data. Still thinking about that.
If you set up a 3-byte pointer in Direct Page you can use "Direct Indirect Long" address mode to reach the entire 16 MB address range. In fact there are four modes that can reach all 16 MB (without relying on DBR):

    - Direct Indirect Long
    - Direct Indirect Long indexed with Y
    - Absolute Long
    - Absolute Long indexed with X
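
As a rough illustration of how the two indirect long modes arrive at an address, here is a small Python model (a sketch, not 65C816 code). The `memory` dict, the `direct_indirect_long` helper, and the example pointer at direct page offset $10 are all invented for the illustration.

```python
def direct_indirect_long(memory, dp_base, offset, y=0):
    """Model the effective address of LDA [offset] and LDA [offset],Y.

    memory  : dict mapping bank-0 addresses to byte values (hypothetical)
    dp_base : contents of the direct page register
    offset  : 8-bit direct page offset taken from the instruction
    y       : Y index; leave at 0 for the unindexed mode
    """
    ptr = (dp_base + offset) & 0xFFFF          # the pointer itself lives in bank $00
    lo = memory[ptr]
    hi = memory[(ptr + 1) & 0xFFFF]
    bank = memory[(ptr + 2) & 0xFFFF]
    base = (bank << 16) | (hi << 8) | lo       # 24-bit little-endian pointer
    return (base + y) & 0xFFFFFF               # indexing can carry into the next bank


# Pointer to $05FFF0 stored at direct page offset $10 (example values only).
memory = {0x0010: 0xF0, 0x0011: 0xFF, 0x0012: 0x05}
print(hex(direct_indirect_long(memory, 0x0000, 0x10)))          # 0x5fff0
print(hex(direct_indirect_long(memory, 0x0000, 0x10, y=0x20)))  # 0x60010
```

The second call shows the index carrying out of bank $05 into bank $06, which is why these modes do not depend on DBR.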

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Sun May 15, 2022 2:02 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8504
Location: Midwestern USA
Dr Jefyll wrote:
Hmm, several new posts. I'm falling behind! :)

Gotta quit goofing off, Jeff. :D

Quote:
...can either of you clarify this preference for I/O being in Bank $00?

I do it for several reasons:

  1. Programming convenience. None of my programs directly touch hardware. They all go through BIOS APIs, which makes programs smaller and more immune to future memory map changes. Also, API calls are bank-agnostic—a program running in any bank can access the API without having to know anything about where the kernel is located in memory or any other aspect of the machine’s architecture. Since the API is called via software interrupts (COP), at least part of the kernel must be in bank $00. If the I/O is there as well, accessing it in the API primitives can be done with 16-bit addressing, resulting in smaller and faster-executing code.

    I should note that use of API calls isn't mandatory. Nothing stops a program from reinventing the wheel and touching hardware. Of course, that could cause system fatality, but that's a different topic.

  2. Glue logic convenience. Hardware-wise, the 816 treats bank $00 differently than any other bank. In any system in which ROM and I/O are wait-stated, the wait-stating logic is simpler to design if ROM and I/O are in bank $00. Bank $00 is easily detected in logic, making it relatively simple to avoid unnecessarily wait-stating RAM. Another benefit is mirroring of ROM and I/O in the extended address space is avoided.

  3. Contiguous extended RAM. If I/O and ROM are only in bank $00, all address space beyond $00FFFF is contiguous RAM, assuming the RAM is physically present. That confers significant flexibility with where applications can be loaded and executed, plus makes large run-time data structures practical and easy to address. This won't be the case if ROM and/or I/O are in an extended bank.

Quote:
But I don't see much value in choosing Bank $00 for I/O because of software issues, even in the context of an ISR.

I do. :)

Quote:
It's true that long accesses incur a penalty, and of course we'd prefer to avoid that. But I'm doubtful that having I/O appear in Bank $00 is of much use for avoiding that penalty.

Not bank $00 specifically. If the code responsible for I/O accesses is in the same bank as the I/O then all accesses can be with absolute addressing. Otherwise, long addressing will be required. This distinction can become quite important in manipulating hardware at the bit level, since the handy TRB and TSB instructions only have DP and absolute addressing modes. In passing, I should note that placing hardware on DP can complicate things more than one might think, as well as have little effect on performance.

That said, refer to the above about the value of keeping RAM above $00FFFF contiguous for programming purposes.

Quote:
I'm drawing a blank on your suggestion to "split your ISR between bank $00 and the I/O bank and JML to the rest of the ISR."

My comment had to do with having only the front end of the ISR in bank $00 ROM and the rest of the ISR in some other bank. In that case, a JML is necessary to get to the main body of the ISR, as JML is one of the two program flow instructions that can direct execution to an arbitrary bank.

Quote:
And can you supply a reference for your statement that long accesses using R-M-W instructions will incur multiple clock cycle penalties per instruction? Long address modes do mean the instruction will include a three-byte address, and certainly it costs an extra cycle to fetch that additional byte. But AFAIK the penalty only applies once, even in the case of a R-M-W

Sorry, I mistyped that one—a “senior moment.” :? R-M-W instructions, such as INC or ROL don’t have long addressing modes.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Last edited by BigDumbDinosaur on Wed Sep 28, 2022 2:22 am, edited 1 time in total.

Posted: Sun May 15, 2022 2:20 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
tmr4 wrote:
I'm using the 65C51 and the transmitter can't be interrupt-driven.

Non-WDC ones can, and even WDC's can be with the help of a '22. See this post.

Quote:
As I've seen you advise, my ISR is short.

Not so short though that something in the background program has to, in effect, poll to see if the ISR ran. Also, the point is to make the ISR finish quickly. Even a super long ISR might still always finish quickly if only a short portion of it gets executed before exiting, with the portion chosen depending on the conditions.

The biggest disadvantage to having to use long addressing for I/O seems to be that you don't get the TRB, TSB, BIT, INC, and DEC instructions. (INC and DEC are nice for toggling bit 0 of a port.)

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Sun May 15, 2022 4:11 am
Joined: Sat Feb 19, 2022 10:14 pm
Posts: 147
Dr Jefyll wrote:
If you set up a 3-byte pointer in Direct Page you can use "Direct Indirect Long" address mode to reach the entire 16 MB address range. In fact there are four modes that can reach all 16 MB (without relying on DBR):

    - Direct Indirect Long
    - Direct Indirect Long indexed with Y
    - Absolute Long
    - Absolute Long indexed with X

Thanks. Direct Indirect Long is what I needed.


Posted: Sun May 15, 2022 5:08 am
Joined: Sat Feb 19, 2022 10:14 pm
Posts: 147
GARTHWILSON wrote:
Not so short though that something in the background program has to, in effect, poll to see if the ISR ran.

My ISR takes a byte from either the ACIA or the PS/2 keyboard controller and places it in a circular buffer similar to the one you recommend in your primer. My ISR is about 47 bytes long.

GARTHWILSON wrote:
The biggest disadvantage to having to use long addressing for I/O seems to be that you don't get the TRB, TSB, BIT, INC, and DEC instructions. (INC and DEC are nice for toggling bit 0 of a port.)

My ISR uses 16-bit absolute addressing. My data bank register is set to bank 2 at start up. That's where my I/O and RW data reside.


Posted: Sun May 15, 2022 5:32 am
Joined: Sat Feb 19, 2022 10:14 pm
Posts: 147
BigDumbDinosaur wrote:
  • Contiguous extended RAM. If I/O and ROM are only in bank $00, all address space beyond $00FFFF is contiguous RAM, assuming the RAM is physically present. That confers significant flexibility with where applications can be loaded and executed, plus makes large run-time data structures practical and easy to address. This won't be the case if ROM and/or I/O are in an extended bank.

There's more than one way to do that. For example, my ROM and I/O are contiguous from $FF00-$200FF. My extended RAM is also contiguous from $20100-$7FFFF.


Posted: Sun May 15, 2022 5:46 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8504
Location: Midwestern USA
tmr4 wrote:
There's more than one way to do that. For example, my ROM and I/O are contiguous from $FF00-$200FF. My extended RAM is also contiguous from $20100-$7FFFF.

ROM and I/O from $00FF00 to $0200FF? That's a strange arrangement in my book. I/O can be mapped into 2K in most cases. Where's all that other space going?

What are your plans for bank $00 RAM?

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Sun May 15, 2022 6:00 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Some interesting design choices here. I like the idea of keeping the hardware simple - and an effect of that is that the memory map might not be so simple, because optimising two things at once doesn't always work out.

I'd like to see this play out with I/O in bank 2 - it's not what we normally see, and it might have some interesting tradeoffs. But, I'm not sure of the advantage of having the I/O share with RAM in this bank - I see that it means DBR can point to both, but it also means that you now have another special area of RAM: bank 0 is always special, now bank 2 is special, and the rest of your RAM is all one piece. If you sketch out what your I/O routines will look like, and how you might load a block of data from an I/O device into working RAM, what consequences do you see?

It does feel to me that worrying about extra single-cycle costs is potentially a mistake, especially at higher clock speeds. The overall latency, and overall elapsed time, of the ISR, is what matters, and it's not usually quite so critical. One might be worrying about a tenth of a microsecond.
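
To put a number on that, here is the arithmetic as a tiny Python sketch. The 10 MHz clock rate is an assumption picked for illustration; it is simply the rate at which one cycle lasts a tenth of a microsecond.

```python
def cycle_time_us(clock_hz):
    """Duration of one clock cycle in microseconds."""
    return 1e6 / clock_hz


# 10 MHz is an assumed clock rate, not a figure from this thread.
extra_cycles = 1
print(extra_cycles * cycle_time_us(10_000_000))   # 0.1
```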


Posted: Sun May 15, 2022 6:53 am
Joined: Sat Feb 19, 2022 10:14 pm
Posts: 147
BigDumbDinosaur wrote:
tmr4 wrote:
There's more than one way to do that. For example, my ROM and I/O are contiguous from $FF00-$200FF. My extended RAM is also contiguous from $20100-$7FFFF.

ROM and I/O from $00FF00 to $0200FF? That's a strange arrangement in my book. I/O can be mapped into 2K in most cases. Where's all that other space going?

I have 256 bytes of I/O from $20000-$200FF and about 64k of ROM from $FF00-$1FFFF. I don't need that much ROM, but like you mentioned, this keeps RAM in one contiguous range from $20100-$7FFFF. I/O is in bank 2 vs bank 1 because that's where my RW data is and this allows I/O access with 16-bit absolute addressing.
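
For anyone who wants to check the map, here is a small Python sketch that tabulates it. The I/O, ROM and extended RAM ranges are the ones given above; the bank 0 RAM range of $0000-$FEFF is taken from the PLD listing later in the thread. The helper names `region_of` and `size_of` are made up for the sketch.

```python
# Regions as (name, first address, last address).
REGIONS = [
    ("RAM",   0x00000, 0x0FEFF),
    ("ROM",   0x0FF00, 0x1FFFF),
    ("IO",    0x20000, 0x200FF),
    ("EXRAM", 0x20100, 0x7FFFF),
]


def region_of(addr):
    """Return the name of the region containing addr, or None if unmapped."""
    for name, first, last in REGIONS:
        if first <= addr <= last:
            return name
    return None


def size_of(name):
    """Return the size of the named region in bytes."""
    for region, first, last in REGIONS:
        if region == name:
            return last - first + 1
    raise KeyError(name)


# Each region starts right where the previous one ends: no gaps, no overlaps.
for (_, _, prev_last), (_, first, _) in zip(REGIONS, REGIONS[1:]):
    assert first == prev_last + 1

print(region_of(0x00FF00), size_of("ROM"))    # ROM 65792
print(region_of(0x020080), size_of("IO"))     # IO 256
print(region_of(0x020100), size_of("EXRAM"))  # EXRAM 392960
```

The ROM figure of 65,792 bytes is 64 KB plus the one page at $FF00, which matches "about 64k of ROM".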

BigDumbDinosaur wrote:
What are your plans for bank $00 RAM?

Direct page, return stack and Forth data stack, likely multiple instances of each. At this point in my design it doesn't seem reasonable to waste bank 0 features on things that can be accommodated elsewhere. Of course I'm just starting out, so I'm not locked into anything yet.


Posted: Sun May 15, 2022 7:49 am
Joined: Sat Feb 19, 2022 10:14 pm
Posts: 147
BigEd wrote:
If you sketch out what your I/O routines will look like, and how you might load a block of data from an I/O device into working RAM, what consequences do you see?

Most I/O routines are done. ACIA and PS/2 keyboard controller ISRs in bank 0 save to circular buffers in bank 2. Input routines (getc and get_blk for example) access these for input. Block data is saved from the serial buffer to a block buffer in bank 2. Output (putc) writes directly to serial which is connected to a smart display that also provides SD card access. I haven't made put_blk yet. I need an editor first.
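
For readers unfamiliar with the pattern, the ISR-to-getc handoff can be sketched in Python along these lines. This is a generic ring buffer and not the actual routines; the `RingBuffer` class, its method names, and the policy of dropping a byte when the buffer is full are all assumptions made for the sketch.

```python
class RingBuffer:
    """Single-producer, single-consumer byte queue with wrapping indices."""

    def __init__(self, size=256):
        self.data = bytearray(size)
        self.size = size
        self.head = 0   # next slot the (simulated) ISR writes
        self.tail = 0   # next slot the foreground reads

    def put(self, byte):
        """Called by the simulated ISR. Returns False if the buffer is full."""
        nxt = (self.head + 1) % self.size
        if nxt == self.tail:
            return False            # full: drop the byte rather than overwrite
        self.data[self.head] = byte
        self.head = nxt
        return True

    def get(self):
        """Called by the foreground (getc). Returns None if the buffer is empty."""
        if self.head == self.tail:
            return None
        byte = self.data[self.tail]
        self.tail = (self.tail + 1) % self.size
        return byte


buf = RingBuffer(size=4)            # tiny size so the full case is easy to see
for ch in b"abcd":
    buf.put(ch)                     # the fourth put is rejected: one slot stays empty
print([buf.get() for _ in range(4)])   # [97, 98, 99, None]
```

Only the producer writes `head` and only the consumer writes `tail`, which is what lets the foreground read the buffer without disabling interrupts in the usual form of this design.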

BigEd wrote:
It does feel to me that worrying about extra single-cycle costs is potentially a mistake, especially at higher clock speeds. The overall latency, and overall elapsed time, of the ISR, is what matters, and it's not usually quite so critical. One might be worrying about a tenth of a microsecond.

I suppose this refers to other comments as counting clock cycles isn't one of my concerns. I started my Forth system as token threaded so speed was never one of my goals. I'm now doing direct threaded but I'm still not worried about speed.


Posted: Sun May 15, 2022 8:10 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
tmr4 wrote:
Direct page, return stack and Forth data stack, likely multiple instances of each. At this point in my design it doesn't seem reasonable to waste bank 0 features on things that can be accommodated elsewhere. Of course I'm just starting out, so I'm not locked into anything yet.
and
Quote:
I started my Forth system as token threaded so speed was never one of my goals. I'm now doing direct threaded but I'm still not worried about speed.

I will be interested to see your token threading. I can envision a token-threaded Forth, but haven't ever seen one. Even with ITC or DTC however, Forth program memory usage is so efficient that if you're working by yourself, I'm sure you'll never come close to filling a bank, even with stacks and I/O in the same bank. :D If you do token-threading, it'll be all the more compact. My '02 ITC Forth is rather full-featured, unlike fig. It takes something like 24KB of ROM, and then I have 16KB of RAM for compiling applications on the fly, and I've never needed anywhere near that much except for when I wanted large data arrays, and I would want many megabytes for some of those. In my '816 Forth, non-bank-0 RAM is only for these large data arrays, not program memory.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Sun May 15, 2022 1:30 pm
Joined: Sat Feb 19, 2022 10:14 pm
Posts: 147
GARTHWILSON wrote:
I will be interested to see your token threading.

I started developing with TTC to save memory on a 6502 system that only had 32k RAM and 16k ROM. I gave it up after seeing that expanding the number of tokens past my initial quantity of 256 required more memory than I was saving with the TTC construct.

GARTHWILSON wrote:
I'm sure you'll never come close to filling a bank, even with stacks and I/O in the same bank.

I agree. With the memory I gained with my PLD address decoder I can fit everything in 64k. So bank 0 would be fine for it with a 65816, but where's the fun in that. With the 65816, I'm basically treating bank 0 as the zero page on the 6502 and using the next two banks for the ROM and RAM that filled the rest of the 6502's 64k address space. I'm just tripling the size of it all. I then have another 320k extended RAM on top of that. What more could I ask for?

I really liked TTC and might have stuck with it if I'd had the nearly inexhaustible memory available with the 65816. Right now I'm more interested in the experience than in saving a byte or a cycle. If I try to optimize I'll never get anything done, and I already have too much of a tendency to get sidetracked.


Posted: Sun May 15, 2022 3:20 pm
Joined: Sat Feb 19, 2022 10:14 pm
Posts: 147
I've tried some of the refinements that have been suggested without success:

akohlbecker wrote:
For clarity/safety you might want to write RAM = Address:[0000..FEFF]. Even though it works in this file, since you're doing RAM & !ROM for RAM_CS, you're one mistake away (forgetting to &!ROM) from enabling both chip selects at the same time. Same thing for EXRAM = Address:[20100..7FFFF]. Then you can also do !RAM_CS = RAM # EXRAM which is easier to read

BigDumbDinosaur wrote:
If possible, negative logic should not be used to drive outputs. If an output is considered true when low, declare the pin as inverted, e.g.:

Code:
Pin 20 = !OE;

...and:

Code:
OE       = CLK & RW;

In other words, let the PLD's hardware do the output negation and write your logic as positive. This form will use fewer product terms than the way you’ve got it.


My new code:
Code:
Device   g22V10 ;

/* Input */
Pin 1        = CLK;
Pin 2        = RW;
Pin [3..10]  = [A15..A8];
Pin 11       = A16;
Pin 22       = A17;
Pin 13       = VIA_IRQ1;
Pin 14       = A18;
Pin 15       = ACIAs_IRQ;

/* Output */
Pin 16 = !CLKB;
Pin 17 = !WE;
Pin 18 = !ROM_CS;
Pin 19 = !RAM_CS;
Pin 20 = !OE;
Pin 21 = !IO_CS;
Pin 23 = !IRQ;

/* Local Variables */
FIELD Address = [A18..A8];

RAM       = Address:[0000..FEFF];
ROM       = Address:[FF00..1FFFF];
IO        = Address:[20000..200FF];
EXRAM     = Address:[20100..7FFFF];

CLKB     = CLK;
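WE       = CLK & !RW;   /* write strobe: pin goes low during writes (RW low) */
OE       = CLK & RW;    /* output enable: pin goes low during reads (RW high) */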
RAM_CS   = RAM # EXRAM;
ROM_CS   = ROM;
IO_CS    = IO;
IRQ      = VIA_IRQ1 & ACIAs_IRQ;

RAM_CS needs 70 pterms, but I have only 16 on that pin. That is essentially the same as my original problem, and it is why I used overlapping ranges.

With the PLD I'm using I can't use logic directly mapping the single page of ROM and IO that I want. My solution has the downside of selecting RAM when writing to ROM, but it works for my system. In fact it may be the only way to achieve my goal with this PLD. Anyone up for the challenge to prove otherwise?

As an aside, I don't think these refinements did much (or anything?) to help WINCUPL use fewer terms. Perhaps it's smart enough to get over my bad coding.

Also, note that my IRQ logic is surely off, but I didn't bother with it as these refinements didn't help solve the original problem. Perhaps it shows: negations play tricks with my brain. I'm still looking askew at the negated pins and corresponding logic and scratching my head. It's just not clearer for me what's going to come out of the pin. Including the negation in the logic seems clearer. For example
Code:
CLKB = !CLK

clearly states that CLKB is the inverted clock. In the refined code above I have
Code:
CLKB = CLK
but it's active negative from the pin setting
Code:
Pin 16 = !CLKB
I'm all for it if it helps it fit in the PLD, but if not, I'll go for what fits in my brain.


Posted: Sun May 15, 2022 3:43 pm
Joined: Sat Feb 19, 2022 10:14 pm
Posts: 147
Just for reference in my challenge above, the following code also fails but only needs 18 product terms for RAM_CS, versus 70 in the code with non-overlapping address regions. 16 is the max on a pin for the ATF22V10C.

Code:
Device   g22V10;

/* Input */
Pin 1        = CLK;
Pin 2        = RW;
Pin [3..10]  = [A15..A8];
Pin 11       = A16;
Pin 22       = A17;
Pin 13       = VIA_IRQ1;
Pin 14       = A18;
Pin 15       = ACIAs_IRQ;

/* Output */
Pin 16 = CLKB;
Pin 17 = WE;
Pin 18 = ROM_CS;
Pin 19 = RAM_CS;
Pin 20 = OE;
Pin 21 = IO_CS;
Pin 23 = IRQ;

/* Local Variables */
FIELD Address = [A18..A8];

/* Logic */
RAM       = Address:[0000..FFFF];
ROM       = Address:[FF00..1FFFF];
EXRAM     = Address:[20000..7FFFF];
IO        = Address:[20000..200FF];

CLKB      = !CLK;
!WE       = CLK & !RW;
!OE       = CLK & RW;
!RAM_CS   = (RAM & !ROM) # (EXRAM & !IO);
!ROM_CS   = ROM & RW;
!IO_CS    = IO;
IRQ       = VIA_IRQ1 & ACIAs_IRQ;
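
As a sanity check on the logic only (it says nothing about how many product terms the fitter needs), the two RAM_CS formulations in this thread can be compared page by page in Python. The sketch assumes a range match on the A18..A8 field ignores the low address byte, so each of the 2048 field values stands for one 256-byte page. The function names are made up for the sketch.

```python
PAGES = range(0x800)          # 11 address bits A18..A8, one value per 256-byte page


def in_range(page, first, last):
    """True if the page lies within [first, last]; the low address byte is ignored."""
    return (first >> 8) <= page <= (last >> 8)


def ram_cs_non_overlapping(page):
    """RAM_CS = RAM # EXRAM, using the non-overlapping ranges."""
    ram = in_range(page, 0x00000, 0x0FEFF)
    exram = in_range(page, 0x20100, 0x7FFFF)
    return ram or exram


def ram_cs_overlapping(page):
    """RAM_CS asserted when (RAM & !ROM) # (EXRAM & !IO), using overlapping ranges."""
    ram = in_range(page, 0x00000, 0x0FFFF)
    rom = in_range(page, 0x0FF00, 0x1FFFF)
    exram = in_range(page, 0x20000, 0x7FFFF)
    io = in_range(page, 0x20000, 0x200FF)
    return (ram and not rom) or (exram and not io)


selected = [p for p in PAGES if ram_cs_non_overlapping(p)]
assert all(ram_cs_overlapping(p) == ram_cs_non_overlapping(p) for p in PAGES)
print(len(selected), len(PAGES) - len(selected))   # 1790 258
```

Both versions select the same 1790 pages and leave the same 258 pages (ROM plus the I/O page) unselected, so any difference between them is in how they minimize, not in what they decode.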

