All times are UTC

Posted: Mon Apr 20, 2015 7:16 pm
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
nyef wrote:
If you're willing to presume that all "program" code resides in one 64k bank, and data defaults to that same bank, then you only need to look to a bank override to cover a range of cycles from some point in the "future", call it 2-4 cycles ahead of your prefix instruction, until the next SYNC (opcode fetch). That sounds very doable in terms of the one-cycle "boring" NOPs on a 'C02. The per-instruction cycle details then get covered by suitable assembler macros.
Yup. It simplifies things a lot if code is only allowed to execute from one bank. Overlays could be stored in other banks, but they'd have to be copied over before they could execute. It depends what you want. My modified MOS KIM-1 could execute from either bank, but the circuit lost some simplicity to allow that.

nyef wrote:
I'm also reminded of reading about the software interface of an early version of the macintosh
Wow! -- very entertaining. Those guys are definitely using a shift-register time bomb! :twisted: In this excerpt from Burrell's chart, I've marked (in red) how the lower address bus is used as a bit-map that specifies time delay.
Attachment: burrell dma annotated excerpt.jpg



White Flame wrote:
pointers can simply be 24-bit (with "wasted" bits in the middle byte)
Thanks for the comment. And fair enough -- that'll work. It means we're no longer dealing with linear addresses, but those aren't necessarily always superior, and we haven't looked at the pros & cons.

The main advantage IMO is in facilitating pointer arithmetic. E.g., when stepping through an array of items each $1000 in size, linear addresses let us find the address of the next or previous item simply by adding $1000 to, or subtracting $1000 from, the address of the current item. Although something similar is possible with the other scheme (wasted bits in the middle), every arithmetic operation needs to be followed by a check and possible adjustment for overflow/underflow into the unused bits. People's tastes & priorities vary, but to me it's undesirable to always be guarding against bugs from that source. If the hardware of the expansion scheme allows linear addressing, then an address can be passed from one function to another with no burden on either end.
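To make the overflow check concrete, here is a minimal Python sketch (not 6502 code). It assumes a 16K window, so offsets are 14 bits, and a hypothetical "holed" format with the bank number in the top byte and bits 14-15 of the lower word unused. The names to_holed, to_linear and holed_add are made up for illustration, and only addition is handled; subtraction would need the mirror-image borrow fix-up.

```python
WINDOW_BITS = 14                          # assumed 16K window
OFFSET_MASK = (1 << WINDOW_BITS) - 1      # low 14 bits hold the offset
HOLE_MASK = 0x3 << WINDOW_BITS            # bits 14-15 are the unused "hole"

def to_holed(linear):
    """Pack a linear address: bank in bits 16 and up, 14-bit offset below, hole left at zero."""
    return ((linear >> WINDOW_BITS) << 16) | (linear & OFFSET_MASK)

def to_linear(holed):
    """Inverse of to_holed; anything sitting in the hole bits is ignored."""
    return ((holed >> 16) << WINDOW_BITS) | (holed & OFFSET_MASK)

def holed_add(a, b):
    """Add two holed values, then move any spill out of the hole and into the bank field."""
    raw = a + b
    spill = (raw >> WINDOW_BITS) & 0x3    # whatever overflowed into bits 14-15
    return (raw & ~HOLE_MASK) + (spill << 16)

item = to_holed(0x7800)                   # 0x013800: bank 1, offset $3800
naive = item + 0x1000                     # 0x014800: the carry is stuck in the hole
fixed = holed_add(item, 0x1000)           # 0x020800: bank 2, offset $0800
```

With linear addresses the same step is just 0x7800 + 0x1000; the extra fix-up in holed_add is the per-operation cost being described.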

I concede the example of a linked list was a poor choice. We won't likely be doing pointer arithmetic on those links :) (unless the object is gonna get relocated on the fly).

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Last edited by Dr Jefyll on Sun Apr 26, 2015 1:08 am, edited 1 time in total.

Posted: Mon Apr 20, 2015 7:23 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
I had an idea! BMI is the missing ingredient: it can check bit 7 of a byte. When we do pointer arithmetic, we compute the bottom byte, then the middle byte, then use BMI to see if we've overflowed. If we have, we clear bit 7 and carry into the top byte calculation.

So, what we need is a 32k memory window, which we can conveniently place at 0x4000. So, unfortunately, we also have to add 0x40 to the middle byte to form a pointer - but there can't be any overflow, so we just do that and no need to twiddle the top byte.

Does this mean that the cost of handling 23-bit pointers with a 32k memory window isn't so bad?
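A rough Python model of that idea, for anyone who wants to play with the numbers. It assumes the pointer is stored as three bytes (lo, mid, hi) with bit 7 of mid always clear, and that the offset being added is two bytes with bit 7 of its mid byte clear too. The function names are invented for this sketch.

```python
WINDOW_BASE_HI = 0x40   # window assumed to sit at 0x4000 and be 32K long

def add_offset(ptr, delta):
    """Add a two-byte offset to a (lo, mid, hi) pointer whose mid byte has bit 7 clear."""
    lo = ptr[0] + delta[0]
    carry = lo >> 8
    mid = ptr[1] + delta[1] + carry   # at most 0x7F + 0x7F + 1 = 0xFF, so no carry out of bit 7
    hi = ptr[2]
    if mid & 0x80:                    # the condition BMI would test
        mid &= 0x7F                   # clear bit 7 ...
        hi = (hi + 1) & 0xFF          # ... and carry into the top byte
    return (lo & 0xFF, mid, hi)

def cpu_view(ptr):
    """Return (page number, 16-bit CPU address inside the window)."""
    lo, mid, hi = ptr
    return hi, ((mid + WINDOW_BASE_HI) << 8) | lo   # mid <= 0x7F, so adding 0x40 cannot overflow
```

For example, add_offset((0xF0, 0x7F, 0x02), (0x20, 0x00)) gives (0x10, 0x00, 0x03), which cpu_view turns into page 3, address 0x4010.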


Posted: Mon Apr 20, 2015 10:50 pm

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
Okay, the "use an '816" suggestions aren't what I had in mind. Since I was deliberately vague, I guess I'll expand on my idea:

I was thinking of a minimal component 65C02 design that is the cheapest possible (goal would be < $35, but I wouldn't get my hopes up), but still provides:
VIA (and all pins of Port A/B and control are available for GPIO)
UART of some sort
64kB of RAM or more
Bootstrap from (serial) EEPROM into memory so max speed 14MHz is possible.
No PALs/GALs (may have no choice but to relax this requirement and just get someone else to program them for me for the time being).

Aside from TTL glue, the cost of chips is static (need 'C02, VIA, RAM, EEPROM). So the bootstrap and UART can be handled by a microcontroller with enough pins. So making the cheapest board becomes a task of optimizing for space on the PCB. 64kB RAM is rare, so I would go with 128kB to save space. But I feel bad that 64kB gets left unused :P. Since I'll have a microcontroller already though, I figured one pin could be used for bank switching of some sort. But that may cause compatibility problems with existing code (don't reinvent the wheel now :P) when subroutines between banks need to talk to each other, based on the assumptions my design makes. To get ideas about how previous existing systems handle bank switching was the purpose of my original post.

An '816 system automatically implies at least 2 extra TTL support chips, and more than that since RDY qualification and VDA/VPA qualification are necessary.

EDIT: I know Garth provides a minimum 6502 SBC on his website. However, his use of EPROM/EEPROM precludes running at high speed, unless the EPROM is shadowed as part of boot-up routines (and wait states are added). Additionally, parallel EPROMs take up a bunch of real estate. :P

I probably shouldn't have been so abstract with my intentions, because now the stuff I'm worrying about is nearly off topic XD.


Last edited by cr1901 on Tue Apr 21, 2015 12:50 am, edited 1 time in total.

Posted: Mon Apr 20, 2015 10:55 pm
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
BigEd wrote:
Does this mean that the cost of handling 23-bit pointers with a 32k memory window isn't so bad?
Hmm, BMI might be handy as long as it's just a two-byte number being added to a three-byte number. But as you say the idea has other limitations. Here (below) might be the best general solution. (16K window illustrated, located at $8000. Could easily be located at $4000 instead.)

Code:
CLC
LDA numberA_loByte
ADC numberB_loByte
STA indirect_ptr

LDA numberA_midByte
ORA #$C0          ;ensure carry will get set if there's overflow out of the lower bits
ADC numberB_midByte
ORA #$80          ;set bit 15 (ADC leaves bits 15,14 = 00 on overflow); carry is untouched
AND #$BF          ;leave bits 15,14 = 10 (binary)
STA indirect_ptr+1

LDA numberA_hiByte
ADC numberB_hiByte
STA page_register

LDA (indirect_ptr) ;do the access

This isn't horrible; in fact it only adds 6 cycles (three immediate-mode instructions) to an addition you were probably going to do anyway. I guess I'm uncomfortable with the "wasted bits" approach (maybe it's nicer to call them "unused") because addresses in that format constitute a new data type, and type-mismatch bugs become a possibility; also perhaps the need for code to do conversions.

The fact we have expanded-memory hardware guarantees there's a new data type. But I'd rather keep those dealings at the very lowest levels of my code -- perhaps make it a rule never to pass an unused-bits address between functions. But I admit there may be cases where such a rule might reasonably be bent.
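For checking the arithmetic away from the hardware, the byte-wise addition can be modelled in a few lines of Python. This is a model written for illustration, not a transcription of the listing: it assumes both operands are (lo, mid, hi) triples with only the low six bits of mid in use, a 16K window at $8000, and it forces bits 7 and 6 of the middle byte to 10 after the add so the pointer lands in the window whether or not the six offset bits overflowed. The name add_windowed is made up.

```python
def add_windowed(a, b):
    """a and b are (lo, mid, hi) with mid in 0..0x3F. Returns (page, cpu_pointer)."""
    s = a[0] + b[0]
    lo, carry = s & 0xFF, s >> 8

    s = (a[1] | 0xC0) + b[1] + carry   # set the two unused bits so overflow ripples out as carry
    mid, carry = s & 0xFF, s >> 8
    mid = (mid | 0x80) & 0xBF          # force bits 7,6 to binary 10: the window at $8000

    page = (a[2] + b[2] + carry) & 0xFF
    return page, (mid << 8) | lo
```

Adding (0x20, 0x00, 0x00) to (0xF0, 0x3F, 0x01) crosses a 16K boundary and returns page 2, pointer 0x8010; adding (0x00, 0x05, 0x00) to (0x00, 0x10, 0x00) stays inside the window and returns page 0, pointer 0x9500.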



Edited to add:
Quote:
I was thinking of a minimal component 65C02 design that is the cheapest possible
cr1901 wrote:
No PALs/GALs (may have no choice but to relax this requirement and just get someone else to program them for me for the time being).
Well, we need glue -- which a PAL/GAL can do -- and we need some means (a microcontroller?) to boot the EEPROM contents into RAM. Why not let the PAL/GAL do the glue and the EEPROM-to-RAM bootup? (Hmm, you may have to go with a CPLD instead of PAL/GAL, but still.)


Quote:
An '816 system automatically implies at least 2 extra ttl support chips.
If you're including a '245 transceiver, I think that's dispensable on a system with just the '816, a RAM and VIA. (A bit-banged SPI link off the VIA will give you the UART. Might save space too.) Any other needs (address latch etc) would fit in the CPLD. But going with a 'C02 might be what your budget needs -- aren't they quite a lot more affordable? And a 'C02 memory-expansion could be part of the CPLD.

-- Jeff


Last edited by Dr Jefyll on Tue Apr 21, 2015 1:18 am, edited 1 time in total.

Posted: Tue Apr 21, 2015 1:13 am

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
5V-tolerant CPLDs are becoming hard to come by, so that would necessitate using level shifters, since at 5V the 'C02 doesn't understand that 3.3V is valid TTL :(. Not necessarily a huge deal: 4 10-input level shifters are plenty for a 65xx bus, and don't take up much room. Yes, that is certainly a possibility; it could include bank switching logic as well. This is meant to be a secondary, surface-mount tutorial project for me.


Posted: Tue Apr 21, 2015 1:28 am
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
cr1901 wrote:
5V-tolerant CPLDs are becoming hard to come by
What about mixing a 3V CPLD with 'C02, and no level shifter? When 'C02s first appeared on the scene, TTL compatibility was de rigueur, and with that came the assumption of 5V operation. The docs I've seen (e.g., Rockwell) don't mention 3 volt operation. But I betcha a Rockwell 'C02 would be perfectly content at 3V -- albeit with a cost in max clock speed.

WDC 'C02s are specified for low-voltage operation. But an old Rockwell (for instance) is gonna save some dough, I think. Rated only 4 MHz, though.

The Rockwell will be in a 40-pin DIP, which incurs cost in the form of board space. A WDC cpu can be ordered in PLCC. So a 3V WDC cpu in PLCC (VIA, too?) might be just the ticket. And it might as well be an '816, since a WDC 'C02 costs almost as much.


Posted: Tue Apr 21, 2015 6:25 am

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
Indeed, VIA is available in PLCC (and supposedly QFP). This actually should be doable in 5 chips: '816 or '02, 128kB RAM, VIA, CPLD (address decoding, control, and UART), and EEPROM (serial or otherwise). And of course, headers for I/O, clock, power, and reset, and UART :D.

There, I designed a SBC in my head, am I a 6502 pro yet (no)?

Now that I've got that sorted out, I'll make a more earnest attempt to parse the bank schemes provided in this thread. I agree that linear addresses are better; it's bad enough that in modern software one can easily have individual data structures > 64kB.


Posted: Tue Apr 21, 2015 8:41 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
cr1901, this is a good abstract discussion about banking schemes - which, for the most part, probably turn out to be more complex than your intentions in building a small fast system. Thanks for kicking it off! But, umm, can I suggest that you start up a thread to cover your 64 or 128k machine?

Jeff, thanks for doing the work of writing some real code! I fear you may be right: applying a 3-byte offset to a base pointer needs more attention to detail than I'd thought. If we must have a hole in our addressing, perhaps a 1-bit hole at the top of the second byte is as good as it gets. It's compatible with a 16/16/16/16k banking scheme too, if we arrange two consecutive pages mapped into two consecutive slots.

Just in passing: it may be that BIT does a good job of helping us deal with a 2-bit hole at the top of the second byte.

All this discussion reminds me of Acorn's Turbo machine, for internal use only, which has the great advantage of both extending existing addressing modes to 3-byte pointers and being backward compatible (because it's easy to fill the page of third bytes with zeroes). See previous discussion at
viewtopic.php?f=4&t=1465
and a previous discussion on the Apple III at
viewtopic.php?p=35550#p35550


Posted: Tue Apr 21, 2015 9:09 pm

Joined: Tue Jul 24, 2012 2:27 am
Posts: 674
Once you start getting into issues of large pointer arithmetic, that's not something you're going to want to be coding in straight 6502 assembly. Either it'll be a heavily macro-ified assembly, or a higher level language.

Going to read or write a random address will involve setting the bank byte, and likely doing (zp),y addressing (since you're likely dealing with more than a single byte), so you're going to be doing special operations at least every 256 bytes anyway. Heck, even with a large 32KB window into linear space, if your data structure is only 8 bytes but starts at $007ffe in the flat space, you're going to have to do banking operations in the middle of that data access. Again, you really don't want objects spanning those bank boundaries no matter how "flat" you make it, unless you have a slow, high-level language that means you don't have to deal with details lower level than large pointers.
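The boundary case is easy to check with a small Python sketch. It assumes the flat space is carved into consecutive 32KB banks seen through one window; spans_window_boundary and split_access are hypothetical helper names, not part of any existing tool.

```python
WINDOW_SIZE = 0x8000   # 32KB window, as in the example above

def spans_window_boundary(flat_addr, size, window=WINDOW_SIZE):
    """True if an object of `size` bytes at `flat_addr` touches more than one bank."""
    return flat_addr // window != (flat_addr + size - 1) // window

def split_access(flat_addr, size, window=WINDOW_SIZE):
    """Yield one (bank, offset, length) piece per bank the object touches."""
    while size > 0:
        bank, offset = divmod(flat_addr, window)
        length = min(size, window - offset)
        yield bank, offset, length
        flat_addr += length
        size -= length
```

For the 8-byte structure at $007ffe, list(split_access(0x007FFE, 8)) comes out as two pieces: 2 bytes at offset $7FFE of bank 0, then 6 bytes at offset 0 of bank 1, which is exactly the mid-access bank switch being warned about.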

Many years ago, I had a concept called SoftMMU for the 6502, back when LUnix was a thing. In order to support banking mechanisms and the REU (DMA-based copying in & out of the 6502's space), as well as being reasonable to work with in hand-coded assembly, I decided that it would be built on (zp),y access, 3-byte pointers in zeropage (of which (zp),y used the lower two) and 256-byte windows into expanded RAM, simulated or real. When .Y under/overflowed, you JSR to bump to the next page. When you loaded a new 24-bit pointer into zp, you JSR to refresh the banking, copy memory around, and/or move the zp pointer to the new location. Effectively, the zp pointer was opaque as far as the program is concerned; SoftMMU would manage where it actually pointed in the 6502's address space, to match the request for the 24-bit address. Plus, you could have multiple pointers (though requiring JSR before accessing a different one) for ease of use. This was also compatible with unexpanded systems, where the pointer would always simply point to a unique page in the standard address space.
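A toy Python model of the access pattern described above, purely as an illustration and not a reconstruction of SoftMMU itself. It assumes a flat bytearray stands in for expanded RAM, that each pointer maps one page-aligned 256-byte window, and that the low byte of the address is carried in a Y-like index. The class and method names are invented for the sketch.

```python
class WindowedPointer:
    """One 24-bit pointer viewed through a 256-byte window (hypothetical model)."""

    def __init__(self, memory, address):
        self.memory = memory              # bytearray standing in for expanded RAM
        self.load(address)

    def load(self, address):
        """Stands in for the call made after loading a new 24-bit pointer."""
        self.page, self.y = divmod(address, 256)

    def bump(self):
        """Stands in for the call made when the index overflows: map the next page."""
        self.page += 1

    def read_next(self):
        value = self.memory[self.page * 256 + self.y]
        self.y = (self.y + 1) & 0xFF      # 8-bit index, wraps like .Y
        if self.y == 0:
            self.bump()
        return value
```

Calling code never computes where the window sits; it only calls load and read_next, which is the sense in which the pointer is opaque to the program.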

Of course, this line of thinking is to get a single programming model which is compatible with multiple memory expansion schemes, as opposed to one optimal scheme that can be hardcoded and optimized. However, when it comes down to actually using such a scheme, the indexing modes of the 6502 mean 256-byte spans of memory fall into place naturally.

But again, back to the most important point, this is if you want to optimize for easier random access across the entire expanded memory space. If you want data overlays for smaller more independent self-contained parts of a program, or multitasking by bank switching, then far data access can be much more complicated, because it's not the common case, and programs can effectively still work in 16-bit addressing.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor

