6502.org Forum

All times are UTC




Posted: Sun Dec 12, 2021 4:13 pm

Joined: Mon Aug 30, 2021 11:52 am
Posts: 251
Location: South Africa
A quick update on this project.

My pull request was accepted (after a bit of back and forth). I still don't believe I can be the first person to run into the propagation/timing issue I had, and that means there must be a better (besterest practices) way of simulating MCUs. For now everything works and seems quite stable so I'm going to call the 65C816 part of the simulation complete.

I'll do a general public release when the next version of Logisim Evolution is released (it should have my changes in it).

I've also coded up all the other ICs I think I'm likely to use (greyscale link).

Image

Unfortunately due to the way Logisim's libraries are laid out everything has to go into a single folder and that will be forevermore 'WDC'.

If anyone has any suggestions for ICs I should add (or useful chips I should be using) feel free to post them here.


Posted: Sun Dec 12, 2021 4:53 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
AndrewP wrote:
I'm going to ultimately be using five 65816s [...] I'm going to need to use an asymmetric duty cycle with the low part of the cycle running at an effective 25Mhz.
Very intriguing! Indeed, the asymmetric clock suggests to me that the five processors would take turns accessing a single memory array. (Hm, but in that case it would be the high phase of the clock that'd be comparatively short.)

Care to give us any spoilers about the master plan you have in mind? :)

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Sun Dec 12, 2021 8:32 pm

Joined: Mon Aug 30, 2021 11:52 am
Posts: 251
Location: South Africa
Dr Jefyll wrote:
Very intriguing! Indeed, the asymmetric clock suggests to me that the five processors would take turns accessing a single memory array. (Hm, but in that case it would be the high phase of the clock that'd be comparatively short.)
That's spot on! :D

The problem is I need a lot of time to do address decoding (like a lot, a lot), and that's why I need to start working with the address as soon as possible. A bunch of latching later and I should be able to get the write out to 'main' memory (and the read in, in time to present to the 65816) in that last short phase.

Dr Jefyll wrote:
Care to give us any spoilers about the master plan you have in mind?
Sure, very hand wavey but... I mentioned somewhere that a lot of the inspiration for this project was taken from the Commander X16. And in particular the VERA graphics chip. However the VERA (whilst being a great piece of engineering) feels a bit too specific and I've never really been a fan of FPGAs. With that in mind I set out to see if I could find (or build) an off the shelf solution, something with no bespoke components at all. And I wanted to be able to generate about 30 million pixels per second. Because apparently I'm a masochist.

A back-of-the-napkin calculation should tell you even five 65816s running at ridiculous frequencies would not even be close to generating that number of pixels.
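
As a rough check of that napkin calculation, here is a minimal sketch. The 14 MHz clock and one byte per pixel are assumptions for illustration; the 7 cycles per byte is the cost of the 65816's MVN/MVP block move, which is about the cheapest way to copy bytes purely in software.

```python
# Back-of-the-napkin pixel throughput for software-only copying.
# CLOCK_HZ and "one byte per pixel" are assumptions, not design figures.
CLOCK_HZ = 14_000_000            # assumed CPU clock
CYCLES_PER_BYTE_MVN = 7          # 65816 MVN/MVP block move: 7 cycles per byte
NUM_CPUS = 5
TARGET_PIXELS_PER_SEC = 30_000_000


def software_pixels_per_sec(cpus: int, clock_hz: int, cycles_per_pixel: int) -> int:
    """Upper bound on pixels moved per second if every cycle is spent copying."""
    return cpus * clock_hz // cycles_per_pixel


best_case = software_pixels_per_sec(NUM_CPUS, CLOCK_HZ, CYCLES_PER_BYTE_MVN)
print(f"best case: {best_case:,} pixels/s "
      f"({best_case / TARGET_PIXELS_PER_SEC:.0%} of target)")
```

Under these assumptions five CPUs move about 10 million bytes per second, roughly a third of the target, and that is before any per-pixel work such as a transparency test.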

I was initially thinking I could have a 65816 drive a DMA controller but I quickly ran into two problems. The first being DMA controllers don't seem to be a thing any more (or those that do exist are really expensive); and the second being I needed to perform an operation in the controller (a complete impossibility). I've finally settled on a single line blitter made out of 4-bit adders with the entire setup* for each line being driven by a 65816; no clipping or anything, those calculations are all done in software. That gives me 2 pixels for every 3 clock cycles. Much faster than banging them individually but still too slow.

And that's how I ended up with five 65816s. Three of them drive three identical line blitter circuits, one does sound and SD card IO and the last is an actual user programmable CPU that will also handle slow devices. I'll eventually post all the various pieces here but first there's a bit of design and hardware testing that still needs to happen. Fun times!


*That's source address, destination address, line length, transparency compare index, palette selection index, transparency enable and 4-bit enable.
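
A software model may make the footnote's setup list more concrete. This is only a sketch: the field names are hypothetical, the palette and 4-bit fields are carried but not interpreted (the thread does not say how they are applied), and the "skip the write when the pixel equals the transparency index" rule is the behaviour described for the blitter elsewhere in this thread.

```python
from dataclasses import dataclass


@dataclass
class LineSetup:
    """Per-line blitter setup; field names are illustrative, not from the design."""
    source: int
    dest: int
    length: int
    transparency_index: int
    palette_index: int          # carried only; semantics not modelled here
    transparency_enable: bool
    four_bit_enable: bool       # carried only; semantics not modelled here


def blit_line(mem: bytearray, setup: LineSetup) -> None:
    """Copy one line, skipping writes for transparent pixels."""
    for i in range(setup.length):
        pixel = mem[setup.source + i]
        if setup.transparency_enable and pixel == setup.transparency_index:
            continue                      # transparent: leave destination as is
        mem[setup.dest + i] = pixel


# Four source pixels (index 0 is transparent) over a background of 9s.
memory = bytearray([1, 0, 2, 0, 9, 9, 9, 9])
blit_line(memory, LineSetup(source=0, dest=4, length=4, transparency_index=0,
                            palette_index=0, transparency_enable=True,
                            four_bit_enable=False))
print(list(memory[4:]))
```

Clipping is deliberately absent, matching the statement that those calculations are done in software before the setup is written.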


Posted: Sun Dec 12, 2021 11:53 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
Oooo -- sounds like great fun! Thanks for letting us in on the master plan (even if it is a bit hand wavy). :P

I'm a little puzzled as to the nature of this decoding that'll supposedly take so long. If it's just a simple address decode, I think you'll find the scheme in this post sufficiently speedy, even if all 24 bits need to be included.

AndrewP wrote:
DMA controllers don't seem to be a thing any more (or those that do exist are really expensive); and the second being I needed to perform an operation in the controller (a complete impossibility).
Have you considered building your own DMA controller? Even using discrete logic it'd be doable. And it could perform operations, too (within reason).

I'm just throwing some ideas out there. And I'm not familiar with Commander X16 and the VERA graphics chip, so I'm purely guessing as to how the various bits from memory need to get mutilated and massaged before (presumably) getting written back to memory.

Quote:
I've finally settled on a single line blitter made out of 4-bit adders with the entire setup* for each line being driven by a 65816
If I'm not rushing you too much, maybe it'd be good if you could summarize for us the task that's to be accomplished. Then, if you please, another summary of your triple-barreled solution. In other words, first the "what" then the "how." :) A block diagram would be helpful, too.

Is it your intention that the three blitter 816's will access the shared memory not only for pixel data but for all fetches (including code)? If it's code, too, then that's potentially a serious bottleneck, so it'd be worth considering other options. For example, even a small amount of private RAM for each processor would speed things up immensely. (And the '265 is one way you could provide a small amount of RAM, although I don't know whether it could easily be coaxed to cooperate in such a scheme.)

I'm not especially advocating the '265, but there's another feature I hope you're aware of. According to the datasheet, "The Parallel Interface Bus (PIB) is used to communicate instructions and data to and from task oriented processors, smart peripherals, co-processors, and parallel processors." AIUI, the PIB is a gateway to connect two otherwise independent systems. It may be attractive to have a single '265 acting as a slave or sub-processor for the sound and SD card IO (even if there are no other 265's in your system).

-- Jeff

Posted: Mon Dec 13, 2021 5:47 pm

Joined: Mon Aug 30, 2021 11:52 am
Posts: 251
Location: South Africa
Dr Jefyll wrote:
I'm a little puzzled as to the nature of this decoding that'll supposedly take so long.
More handwaverey! Although in this case the answer is: 'kinda virtual memory'. And this is also a problem that will need its own thread sometime...

Basically I want to allow user programs to have their own stacks entirely transparent to each other. To do this I remap accesses to bank zero based on the currently set user program latch. For example user program $FF bank zero might map to bank $02, so when address 00:801B appears on the bus it will be dispatched to memory address 02:801B instead. Whereas user program $FE maps to bank $04, so address 00:801B is dispatched to 04:801B.

A more general solution is to take a 16-bit number, say $FF00, with the upper 8 bits being the user program number and the lower 8 bits being the bank, and use it as the address of a memory location that contains $02. That value is then slipped into the actual bank address going out to main memory. This means I could remap $FF01 to main memory bank 03, et cetera...

And now that I've way over-explained that: I'm basically treating the first bank of kernal RAM (funnily enough that is 128KB of private RAM per 65816) as a 16-bit to 8-bit map. For every memory access I have to do an additional memory access and, well, that makes for pretty slow decoding. It does seem to work in the simulation but the proof will be once I've built the thing. If it doesn't work then I'm chalking it up to experience and an interesting tangent that I didn't really mean to follow.
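
The lookup described above can be sketched in software as follows. This is only a model under stated assumptions: `BANK_MAP`, `set_mapping` and `translate` are hypothetical names, and the real design performs this step as a RAM read in hardware, not in code.

```python
# 64K-entry map: index = (user program << 8) | logical bank, value = physical bank.
BANK_MAP = bytearray(0x10000)


def set_mapping(user_program: int, logical_bank: int, physical_bank: int) -> None:
    """Record which physical bank a (user program, logical bank) pair maps to."""
    BANK_MAP[(user_program << 8) | logical_bank] = physical_bank


def translate(user_program: int, address: int) -> int:
    """Replace the bank byte of a 24-bit address with the mapped physical bank."""
    logical_bank = (address >> 16) & 0xFF
    offset = address & 0xFFFF
    physical_bank = BANK_MAP[(user_program << 8) | logical_bank]
    return (physical_bank << 16) | offset


# The examples from the post: $FF00 -> bank $02, $FE00 -> bank $04, $FF01 -> bank $03.
set_mapping(0xFF, 0x00, 0x02)
set_mapping(0xFE, 0x00, 0x04)
set_mapping(0xFF, 0x01, 0x03)

print(f"{translate(0xFF, 0x00801B):06X}")
print(f"{translate(0xFE, 0x00801B):06X}")
```

The extra table read per access is visible in `translate`: every main-memory access first costs one lookup, which is the decoding delay being discussed.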

An interesting problem is that kernal mode can only be entered via interrupt - it's VPB going low that sets the kernal mode flip-flop. And once in kernal mode bank zero is the real bank zero, which means that the interrupt stack has been left in a different bank; a bit of a problem for RTI. The solution (so far) is to ensure that every bank that contains program code contains the interrupt handler routine. Just the last few bytes of it, at the same location in the user bank, so that when the kernal mode flip-flop is cleared and memory addresses get remapped into that user bank the final RTS is still called. More problems occur when IRQs happen in kernal mode - I need to keep track of how many IRQs deep I am to ensure I RTI from the right stack. And I have an NMI problem that I haven't solved yet; I intend to drive the process scheduler using NMIs and all hell breaks loose if that occurs whilst any other interrupt is occurring or being returned from...

Dr Jefyll wrote:
Have you considered building your own DMA controller? Even using discrete logic it'd be doable. And it could perform operations, too (within reason).
Yup, that's kinda what my line-blitter thing is. It started life as (the very imaginatively named) "Memory Copier". The only operation it performs is a transparency index test using an SN74F521 and then skips the write when equal.

Dr Jefyll wrote:
Is it your intention that the three blitter 816's will access the shared memory not only for pixel data but for all fetches (including code)?
It is, and yet *another* thread I need to make all its own is that timing circuit. And this one really needs to be presented with diagrams because it's simple in theory but hard to describe. An unhelpful comment might be: think of using a single 'LVC163 counting into a single 'LVC138 to generate each/all of the 65816's clock cycles; that will cause them to be 'staggered' into their own memory access window in time. It's a bit more complicated than that but that's the gist of it.
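
The counter-into-decoder idea can be modelled roughly like this. It is a behavioural sketch only, assuming the counter wraps after five states (one per CPU) and that each active-low decoder output marks one CPU's memory window; the actual circuit is said above to be more involved.

```python
NUM_CPUS = 5  # assumed: one decoder output per 65816


def decode_138(select: int) -> list[int]:
    """Model of a 3-to-8 decoder: active-low outputs Y0..Y7, one low at a time."""
    return [0 if i == select else 1 for i in range(8)]


def memory_windows(ticks: int):
    """Yield the first NUM_CPUS decoder outputs for each tick of the fast clock."""
    count = 0
    for _ in range(ticks):
        yield decode_138(count)[:NUM_CPUS]
        count = (count + 1) % NUM_CPUS   # assumption: counter reloads after the last slot


for tick, outputs in enumerate(memory_windows(6)):
    owner = outputs.index(0)             # the CPU whose window is open on this tick
    print(tick, outputs, f"CPU {owner}")
```

Each tick opens exactly one window, so the CPUs never contend for the shared memory; the cost is that each one sees the memory only every fifth tick.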

Dr Jefyll wrote:
"The Parallel Interface Bus (PIB) ... a single '265 acting as a slave or sub-processor.
I must admit I tend to think of the '265 as being a slow and expensive '816. However when I looked at the datasheet I either didn't see (or didn't understand) the parallel interface bus at the time. Having another single (slow) MPU slave is a great idea, thanks! It gets rid of so much discrete circuit complexity and it's probably more cost effective in the long run too.

And so where to from here? My next post in *this* thread is unfortunately going to be a Logisim related one because there has been an issue on the Logisim team side :( Project wise I'll post up the timing circuit in Hardware next, the what and the how :D and keep on keeping on from there.


Posted: Tue Dec 14, 2021 2:25 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
AndrewP wrote:
Basically I want to allow user programs to have their own stacks entirely transparent to each other. To do this I remap accesses to bank zero based on the currently set user program latch. For example user program $FF bank zero might map to bank $02, so when address 00:801B appears on the bus it will be dispatched to memory address 02:801B instead. Whereas user program $FE maps to bank $04, so address 00:801B is dispatched to 04:801B.
Hmmm, alright... But it doesn't sound too terribly time consuming. If I'm understanding properly then the circuit below is one possible way to manage the matter.

As you know, a '157 mux can output either of its two input words, and it will output all zeros when Enable# goes high. The 3-input gates have a maximum prop delay of 3 ns or less. And the circuit can be simplified slightly, as gates A and B do the same as A' and B'.
Attachment: wavery01.png
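
For readers without the schematic to hand, the '157 behaviour described above can be modelled in a few lines. This sketch covers only the multiplexer itself (select low passes word A, select high passes word B, Enable# high forces zeros); it does not attempt to reproduce the gates in the attached circuit, and the function names are illustrative.

```python
def mux_157(a: int, b: int, select: int, enable_n: int) -> int:
    """Quad 2-to-1 mux: 4-bit output is A or B, or all zeros when Enable# is high."""
    if enable_n:
        return 0
    return (b if select else a) & 0xF


def mux_8bit(a: int, b: int, select: int, enable_n: int) -> int:
    """Two '157s side by side handle one 8-bit bank byte."""
    low = mux_157(a & 0xF, b & 0xF, select, enable_n)
    high = mux_157(a >> 4, b >> 4, select, enable_n)
    return (high << 4) | low


# Hypothetical use: choose between two candidate bank bytes, or force bank 0.
print(f"{mux_8bit(0x02, 0x04, select=0, enable_n=0):02X}")
print(f"{mux_8bit(0x02, 0x04, select=1, enable_n=0):02X}")
print(f"{mux_8bit(0x02, 0x04, select=1, enable_n=1):02X}")
```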


Quote:
A more general solution is to take a 16bit number, say $FF00 with the upper 8 bits being the user program number and the lower 8 bits being the bank, and then use that as an address read into a memory location that contains $02 and then slip that into the actual bank address going out to main memory.
And you mentioned 128KB of private RAM per 65816 -- not 64K.

So, I assume it's the Kernal Mode signal that provides the additional address bit, like this:
Attachment: wavery02.png
This approach takes good advantage of the cheap, fast RAMs that are available nowadays. What's missing in the diagram is circuitry for writing to the RAM. Dual-port RAMs are not so cheap; but, with some clever glue applied, a conventional RAM would suffice.

What I'm wondering is why you feel so many CPUs are required. I understand your explanation of the staggered clock phases. But to me it'd seem preferable to have one very busy CPU (and perhaps a DMA unit). If the decode delays truly were as bad as you supposed then multiple CPUs would start to make sense. But I'm not convinced the decode delays are so terribly severe. Just my two cents worth!

Quote:
An interesting problem is that kernal mode can only be entered via interrupt - it's VPB going low that sets the kernal mode flip-flop.
Okay, and of course VPB doesn't go low until after PB, PC, and P have been pushed to stack.

Would it help your cause if the circuitry could be made aware of the interrupt before the pushes to stack occur? This is possible for both hardware and software interrupts, hardware interrupts being easier to detect. The tipoff for hardware interrupts is the failure of PC to advance as it usually does following an opcode fetch cycle (VPA=VDA=1). If PC doesn't advance then A0 will fail to toggle, and this is easily detected. It tells you the opcode will be discarded and instead an interrupt sequence is about to commence.
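
The tipoff described above can be expressed as a check over a bus trace. This is a behavioural sketch under assumptions: each cycle is represented as a hypothetical `(vpa, vda, a0)` tuple sampled once per clock, and real hardware would do the comparison with a latch and a gate, not code.

```python
def discarded_opcode_fetches(cycles: list[tuple[int, int, int]]) -> list[int]:
    """Return indices of opcode fetches after which A0 failed to toggle.

    An opcode fetch is a cycle with VPA=VDA=1. Normally PC advances, so A0 on
    the next cycle differs; if it does not, an interrupt sequence is starting.
    """
    hits = []
    for i in range(len(cycles) - 1):
        vpa, vda, a0 = cycles[i]
        next_a0 = cycles[i + 1][2]
        if vpa and vda and next_a0 == a0:
            hits.append(i)
    return hits


trace = [
    (1, 1, 0),  # opcode fetch at an even address
    (1, 0, 1),  # operand fetch: A0 toggled, normal instruction
    (1, 1, 0),  # opcode fetch at an even address
    (0, 0, 0),  # A0 did not toggle: opcode discarded, interrupt begins
    (0, 1, 1),  # stack push
]
print(discarded_opcode_fetches(trace))
```

Because the check fires one cycle after the opcode fetch, the circuitry learns of the interrupt before any of the stack pushes happen.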

-- Jeff

Posted: Tue Dec 14, 2021 9:08 am

Joined: Mon Aug 30, 2021 11:52 am
Posts: 251
Location: South Africa
Image
Dr Jefyll wrote:
...so many CPUs are required.
The short and only slightly facetious answer to this is: because I want to see if it's possible. The longer answer is more of a question. Can a 65x based computer be built that is useful and usable nowadays? How much processing power is needed? Probably more than one 65816 and if so then: how many can I reasonably have talking to shared memory before the control circuitry gets stupidly complicated? The original solution was four but it looks like I might be able to squeeze in five. And, importantly, can each 65816 be generally usable when it's not driving graphics or sound or whatnot?

The answer to all of that is: I think so. Taking off the nostalgia glasses - or trying to anyway. The W65C816S6TQG-14 seems to be a really capable chip and it's a chip that is still easy to understand. And hopefully it will culminate in a computer I can show someone and say "See this is the awesome, impressive thing it does, and this is how it does it". It's not just magical Intel Inside that is complicated beyond my ability to understand.

A bit more of the "why so many CPUs?" is that in my day-to-day job the multi-threaded / multi-processor thing is where I specialise. I should be able to bring that expertise over to software for this project too.

Dr Jefyll wrote:
Okay, and of course VPB doesn't go low until after PB, PC, and P have been pushed to stack. ... Would it help your cause if the circuitry could be made aware of the interrupt before
Absolutely! I think, without a redesign of the simulation yet, that that would be a good solution to all my interrupt problems.

Dr Jefyll wrote:
So, I assume it's the Kernal Mode signal that provides the additional address bit, like this:
Kinda but I also want to be able to read and write to it as normal memory when in Kernal mode. And that makes things complicated. I'm not sure it's worth the complexity yet so I'll quote your schematic when I post a thread on the private memory workings. On that, the reason I'm using 128K private SRAM is because I have a bunch of IS63WV1288DBLL-10TLIs knocking about. Quite fortunate as that gives me 64K for the kernal and 64K for the User-Bank map.


Posted: Tue Dec 14, 2021 3:20 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
Glad you're finding that circuit snippet interesting. The '157 can be a tad confusing, and I've marked a few typos in the diagram below. I didn't check the whole thing but it's clear you've gotten the idea. And BTW please attach images with your post, rather than using a third party site like Imgur. You'll find it more convenient, and there's also the issue of permanence. Images on a third party site tend to disappear after a while, reducing the value of a forum thread for future readers.
Attachment: vWj3OQ6.png


Quote:
Dr Jefyll wrote:
Okay, and of course VPB doesn't go low until after PB, PC, and P have been pushed to stack. ... Would it help your cause if the circuitry could be made aware of the interrupt before
Absolutely! I think, without a redesign of the simulation yet, that that would be a good solution to all my interrupt problems.
Would it be helpful to also detect software interrupts -- ie, BRK and COP? To detect these, you need to actually snoop the opcodes ingested by the processor. In fact you can detect any opcode you like, and respond with all sorts of shenanigans, including the simple and the sophisticated!

Quote:
How much processing power is needed? Probably more than one 65816 [...] the multi-threaded / multi-processor thing is where I specialise.
Okay, I get the multi-threaded thing, and hats off to you -- it's a provocative goal to pursue. But multiple threads don't necessarily require multiple processors. And in this case it's unclear how much of a performance boost multiple processors would provide.

65xx processors have no internal cache, and their throughput is directly affected by how much memory bandwidth you can give them. Adding more processors won't speed things up unless there's also additional memory bandwidth. One way to provide that would be to give each processor some private memory, and to have it access shared memory only when necessary (ie, for shared data). But I think you told me the shared memory would provide each processor's needs for both code and data, and from that I infer that the total system memory bandwidth hasn't increased. That being so, I don't see how total throughput can increase (other than the small advantage of faster task switching because of having multiple register sets that don't require saving). Am I missing something?
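
The bandwidth argument can be put as a one-line model. It is a deliberate simplification, assuming every CPU cycle is a memory access (no cache, as noted above) and ignoring arbitration overhead; the rates used are illustrative, not measured.

```python
def system_accesses_per_sec(n_cpus: int, cpu_rate: int, shared_memory_rate: int) -> int:
    """Total useful memory accesses per second when all CPUs share one memory."""
    return min(n_cpus * cpu_rate, shared_memory_rate)


CPU_RATE = 14_000_000  # assumed accesses/s a single CPU can issue

# Memory no faster than one CPU: extra processors add nothing.
print(system_accesses_per_sec(1, CPU_RATE, 14_000_000))
print(system_accesses_per_sec(5, CPU_RATE, 14_000_000))

# Memory five times faster than one CPU: five processors can all be fed.
print(system_accesses_per_sec(5, CPU_RATE, 70_000_000))
```

The model shows both sides of the discussion: with memory that is only as fast as one CPU the total is capped, while memory several times faster than any single CPU lets the total scale with the processor count.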

It seems to me you'd do well to increase bandwidth by giving each processor some private memory that doesn't require the other processors to wait or run at a lower frequency. Or, you could eliminate the complexity and just concentrate on making a single processor run as fast as possible. (For the pixel bopping, I mean. It may still make sense to have a separate processor -- a '265? -- doing the sound and SD card access. But I picture that as being only very loosely coupled to the system -- ie, it would have its own memory.)

-- Jeff

Posted: Tue Dec 14, 2021 5:10 pm

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1399
Location: Scotland
Quote:
Quote:
How much processing power is needed? Probably more than one 65816 [...] the multi-threaded / multi-processor thing is where I specialise.
Okay, I get the multi-threaded thing, and hats off to you -- it's a provocative goal to pursue. But multiple threads don't necessarily require multiple processors. And in this case it's unclear how much of a performance boost multiple processors would provide.


Expectations...

It's an interesting idea that we might be able to replace a modern desktop system with a "retro-new" build. I have thought about it many times but I was also "back there in the day" - well at least from '78 onwards.

What can I do today with my 16MHz '816 system with 512KB of RAM...

I can edit, compile and run high level language programs locally. It's nowhere near as fast or slick as I can on my Linux desktop but the principle is the same - on my desktop I use a text editor (vim), Makefiles and compilers. (C, also a BASIC desktop environment I wrote myself).

On RubyOS I can edit with my own nano-like editor, compile and run BCPL programs. I'm working on the BASIC... I could cross-compile C but I'm not that interested in that. I currently do have to cross-assemble native '816 code, but I'm also working on an assembler in BCPL.

On my Linux desktop I can access filing systems - and under RubyOS I have the same, but not everything is supported yet (no random access files, no pipes).

Graphics - yes to both, but it's old-school under RubyOS - graphics are actually serialised to a "smart" terminal. Think Tektronix 4014, although it's actually more like a BBC Micro. A local display of 640x480x8bpp would be great, but memory bandwidth and size (300KB of RAM) is a limitation, as is the hardware to derive a video signal. Taking it down to 1bpp then needs 38KB of RAM, which is manageable, plus associated local hardware and software. (Or I have a separate graphics processor - e.g. something like the "VERA" in the X16 project or the "Blit", which was a separate "smart" graphics terminal c1982.) My own "smart terminal" is a Linux application using the SDL library and I aim to make it a bit Blit-like eventually. Blit, Tek 4014 and RubyTerm all use async serial; the only difference is the codes and ability of the terminal.
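
The framebuffer figures quoted above follow directly from the display geometry; a small sketch to reproduce them (the helper name is illustrative):

```python
def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed to hold one full frame at the given depth."""
    return width * height * bits_per_pixel // 8


full_colour = framebuffer_bytes(640, 480, 8)
monochrome = framebuffer_bytes(640, 480, 1)
print(full_colour, full_colour / 1024)   # bytes, KiB
print(monochrome, monochrome / 1024)     # bytes, KiB
```

At 8bpp the frame is 307,200 bytes (300 KiB), well over half of a 512KB machine, while 1bpp needs 38,400 bytes.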

The missing thing ... Speed. It's slow. Would a multi processor '816 system speed it up? Who knows - the limitation right now is that all my OS is written in BCPL and that's compiled to a bytecode that's interpreted on the '816. Linux is C compiled to native. What could we do on a multi processor system to speed things up? (And having worked for and been involved in parallel processing myself for some time, I'm still struggling to work out how it could help unless someone writes a lot of code to make it work) One processor per peripheral might be a start - then it's down to comms. Shared memory, message passing, etc. but making an editor use more than one core? A multi-pass compiler may benefit if you can get enough overlap. Anything doing searches - spell checkers, etc. could benefit a lot if written well.

(and on the spell chequer note, see: https://prog21.dadgum.com/29.html )

The obvious missing things might be network and a web browser. A modern web browser is an operating system in itself now.

And we're obviously not going to be doing high level data analysis, image processing, etc. at least not in a sensible time span... However VisiCalc was a thing on the Apple II in '79 and that spewed forth a whole new raft of computation methods - and computation needs! Same for AppleWorks and WordStar? A ROFF-like program is workable, but would you use that or prefer MS Office (or LibreOffice)? I'm a LaTeX person myself and I do remember running that on old Sun3's with a 16MHz 68030 and just 4MB of RAM in the late 80's, but it wasn't fast then...

I do enjoy dabbling with my RubyOS - multi-threading, single user - it reminds me of the old PDP11/40 I first ran Unix on in 1980 (although RubyOS is slower) and I'm slowly writing similar utilities and features (it's hard to not make anything look like Unix these days though) but could I move my entire work-flow over to it? Unlikely....

But a great project though!

Cheers,

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Posted: Tue Dec 14, 2021 8:05 pm

Joined: Mon Aug 30, 2021 11:52 am
Posts: 251
Location: South Africa
Dr Jefyll wrote:
I've marked a few typos in the diagram below.
Thanks! ... and I even drew the bar on EN, you'd think I would have twigged that something was a bit off :?
Dr Jefyll wrote:
please attach images with your post, rather than using a third party site like Imgur.
Can do. And this is kinda why I'm posting to this thread... I'm trying to work out how to make attached images appear in the text rather than at the bottom.

Attachment: AttachmentOfDoom.png

Hmnmm, I'm going to have to play a bit more to make things look pretty. Is there any way to not have the attachment preview itself? Oo! Maybe [EDIT: Right click and open image in a new tab - it's now hosted on this forum] [MORE EDIT: Nope, the forum software didn't like hosting the image without it being an attachment in this post.]

Dr Jefyll wrote:
KimKlone is a microcomputer I built in the 1980s
Wait! You built the KimKlone? I hadn't put two-and-two together and realised you were that Jeff. I can't think of anyone better to help with detecting software interrupt instructions... and with that neat little segue...
Dr Jefyll wrote:
Would it be helpful to also detect software interrupts -- ie, BRK and COP?
Again, absolutely! I'm assuming it's going to be something along the lines of VPA & VDA & PHI2-is-about-to-go-low & (D == 0 | D == 2). If you could give me a few pointers on how to practically do that, that'd be great. I get the feeling there may be timing gremlins in there. (Actually I had a walk after typing this and realised I only need to detect the BRK or COP before the stack manipulation cycles start so the timing might not be that tricky).
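
The condition in the paragraph above translates directly into a predicate. This is a sketch only: it assumes the data bus is sampled once per cycle at a valid moment, so the PHI2 timing term is left out, and the function name is illustrative. The opcode values are those of the 65816: BRK is $00 and COP is $02.

```python
BRK_OPCODE = 0x00
COP_OPCODE = 0x02


def is_software_interrupt(vpa: int, vda: int, data: int) -> bool:
    """True when an opcode fetch (VPA=VDA=1) reads BRK or COP."""
    return bool(vpa and vda) and data in (BRK_OPCODE, COP_OPCODE)


print(is_software_interrupt(1, 1, 0x00))  # BRK fetched as an opcode
print(is_software_interrupt(1, 1, 0x02))  # COP fetched as an opcode
print(is_software_interrupt(1, 0, 0x00))  # $00 as an operand byte, not an opcode
print(is_software_interrupt(1, 1, 0xEA))  # NOP
```

Qualifying on VPA and VDA together matters because $00 and $02 are also common operand values; only an opcode fetch should trigger the detector.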

Dr Jefyll wrote:
Adding more processors won't speed things up unless there's also additional memory bandwidth. One way to provide that would be to give each processor some private memory...
or to use fast SRAM! I have started putting together a new thread on that and I will come back and edit its link into here later. [EDIT - Done]

I think I can make all five 65816s run completely transparently to each other in the same main memory. And then, given that that works, I should have been more specific and mentioned that I specialise in parallel multi-processing. Five 65816s running concurrently at 13MHz (best case) is some serious computing power. (I say, typing this on my Core i5 8265.)

In terms of graphics line blitting it should be either the 65816 or the line blitter using the same 'window' of memory access, i.e. if one blitter is reading from main memory the other 65816s or blitters will be unaffected. However if the 65816 driving that specific blitter tries to use main memory then it should be ABORTed. I haven't simulated anything I've said here so I'm not sure what I'm not sure about; this could all be impossible. It's unlikely that a 65816 would use main memory whilst it's driving a blitter because it can only do that from kernal mode and in that case it's using its own private memory.


Last edited by AndrewP on Wed Dec 15, 2021 2:29 pm, edited 5 times in total.

Posted: Tue Dec 14, 2021 8:17 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
To inline attached images, attach each one first, and then put the cursor where you want the image, and click "Place inline." You can do this with lots of images in the same post. If possible, please have them scaled such that we don't have to do a lot of panning to see the whole image. I always scale mine to the minimum number of pixels needed to clearly show what's needed. So for example, if you use a camera that gives you 4096 dots across (probably 12 megapixels), and you don't need more than 640 or 800 across to convey the needed information clearly, please scale it down accordingly. I use gimp which is free, and just takes a few seconds to crop and scale appropriately.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Tue Dec 14, 2021 8:34 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
Andrew, I think this topic and the links in it will be of interest, regarding multiprocessing:
Theoretical question - Multiple CPU's

Posted: Wed Dec 15, 2021 8:34 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Nearby, AndrewP asks:
AndrewP wrote:
Another question I've been asking is "what level of completeness should I have before posting?". And for this I don't have a good answer. Too soon and this project could get stuck on paper being endlessly re-designed; too late and I either present what I've done fait accompli or I get blocked on something I just cannot do on my own.


My approach, I think, would be to try to distinguish the design from the implementation. You'll still be iterating, but your aim is to have clarity on several different levels, and not to mix the levels:
- what should this system be able to do
- how must this system be organised
- what components and circuits will be used, and how

As noted upthread, the idea of hooking up multiple '02 or '816s to the same shared memory has come up before. It's an interesting and important difference in this case, if you intend to use a RAM which can be accessed multiple times faster than any one processor. In this case, adding a processor might indeed add to the amount of useful work you can do. It remains the case that someone - the system designer, the operating system designer, or the application programmer - will have to find a way to break their problem into several parallelisable chunks, in order to keep more than one processor busy.

Although this isn't quite the thread title for this discussion, now I look at it, your new thread didn't feel like quite the right one for these comments of mine. As they say, Naming Things is one of the hard problems of computer science. In this case, naming threads.

Your answer to
- what should this system be able to do
is probably something like
- get useful work out of multiple CPUs simultaneously
and your preferred tactic - your proposed organisation - is to use shared memory, but memory that's significantly faster than the CPUs.

Once you can set down these kinds of statements about your project, you can avoid the side-discussions about other goals and other ways of achieving your goal.


PostPosted: Wed Dec 15, 2021 5:19 pm 

Joined: Mon Aug 30, 2021 11:52 am
Posts: 251
Location: South Africa
drogon wrote:
It's an interesting idea that we might be able to replace a modern desktop system with a "retro-new" build. I have thought about it many times ...
And that's the challenge with 65x-based systems: if I'm being honest with myself, they're mostly nostalgic retro to me, the one I'd like to build included. The Commander X16 leans heavily towards the VIC-20 and C64 crowd (both of which I find myself in) but it's not a useful machine. It's a machine I will buy, enjoy tinkering with and then set aside. The Foenix C256 is probably the closest thing to a useful retro computer but I don't know how to buy one, and the last time I saw a price it was going to cost me something like ten thousand Rand; that's not throwaway money. The Mega65 seems a bit academic to me and I've never felt compelled by it.

So back to modern retro systems. What do I think makes one useful? Or rather, let me ask: what makes one appealing? I appreciate a nano-like editor (I like nano myself) but if I showed RubyOS to my young nephew or niece they would not understand and would not be interested. I think if there were discrete graphics chips still in production there would be a lot more interest in 65x-based systems. I think media and entertainment are a big part of what makes a retro system appealing to anyone without a background in microprocessors. I still play games on MAME because I enjoy them, but if I break out my C64 it's for nostalgia. As you've said, the missing thing is speed: a modern retro system needs to be fast. And that touches on the design point BigEd has mentioned further down.

drogon wrote:
The obvious missing things might be network and a web browser. A modern web browser is an operating system in itself now.
Agreed. My complaint is that browsers are now so complicated that they basically prevent any new competition.


PostPosted: Wed Dec 15, 2021 6:00 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
AndrewP wrote:
Dr Jefyll wrote:
Would it be helpful to also detect software interrupts -- ie, BRK and COP?
Again absolutely! I'm assuming it's going to be something along the lines of VPA & VDA & PHI2-is-about-to-go-low & (D == 0 | D == 2).
Attachment: wavery00.png (schematic excerpt, 14.9 KiB; image not reproduced here)
Alright, to explain I've adapted a schematic excerpt from another project. This logic watches for opcodes $50 (BVC) and $42 (WDM) being executed, but minor changes would allow you to look for BRK and COP instead. And because we need to be sure the opcode really is executing, we mustn't get fooled by the beginning of an interrupt sequence, in which case an opcode will be fetched but ignored. Therefore we keep a lookout for the beginning of an interrupt sequence by monitoring A0 as explained upthread. If A0 fails to toggle following an opcode fetch then we know an interrupt is about to be recognized.
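
For anyone who would rather see the condition as code than as gates, here is a minimal software sketch of the same test, applied to a recorded bus trace. It assumes one sample per Phi2 cycle, taken as Phi2 falls. It also assumes the usual 65C816 convention that VPA and VDA both high marks an opcode fetch. The function names and the trace format are hypothetical, made up for this illustration; they are not part of the schematic.

```python
BRK, COP = 0x00, 0x02  # 65C816 software-interrupt opcodes


def is_opcode_fetch(vpa: bool, vda: bool) -> bool:
    """VPA and VDA both high indicates an opcode fetch on the 65C816."""
    return vpa and vda


def executed_opcodes(trace, watched=(BRK, COP)):
    """Yield (cycle_index, opcode) for watched opcodes that really execute.

    trace is a list of (address, data, vpa, vda) tuples, one per Phi2 cycle.
    An opcode fetch only counts if A0 toggles on the following cycle; if A0
    stays put, the fetched opcode is being discarded because a hardware
    interrupt sequence is starting. The final cycle of the trace cannot be
    judged, since there is no following cycle to compare against.
    """
    for i in range(len(trace) - 1):
        addr, data, vpa, vda = trace[i]
        next_addr = trace[i + 1][0]
        a0_toggled = bool((addr ^ next_addr) & 1)
        if is_opcode_fetch(vpa, vda) and data in watched and a0_toggled:
            yield i, data


if __name__ == "__main__":
    trace = [
        (0x8000, 0x00, True, True),    # BRK fetched, next cycle reads $8001
        (0x8001, 0xEA, True, False),   # signature byte
        (0x9000, 0x02, True, True),    # COP fetched, but A0 does not toggle
        (0x9000, 0x02, False, False),  # interrupt sequence has begun
    ]
    print(list(executed_opcodes(trace)))
```

Passing `watched=(0x50, 0x42)` instead would look for BVC and WDM, as the schematic does.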

What we have is a pipeline which advances on the falling edge of Phi2.

  • In cycle 1 the opcode is being fetched from memory.
  • In cycle 2 the opcode appears on the output of the '574 and is decoded by the '238 and the gates.
  • The output of the gates moves down the pipeline on the following cycles, appearing successively on the outputs of the individual flipflops.

Variations are possible, of course, but this explains how hardware interrupts can be detected prior to the vector fetch. It also explains how execution of any specific opcode can be detected.
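
The cycle-by-cycle behaviour described above can be modelled roughly as follows. This is only a sketch under stated assumptions: the class name, the number of flip-flop stages (two here) and the way the opcode-fetch qualifier gates the latch are all hypothetical, since the schematic itself is not reproduced in the text. The A0 check is left out to keep the timing easy to follow.

```python
class OpcodePipeline:
    """Toy model of a latch + decoder + flip-flop chain clocked by Phi2 falling."""

    def __init__(self, watched=(0x50, 0x42), stages=2):
        self.watched = set(watched)    # opcodes to detect (BVC and WDM by default)
        self.latch = None              # stands in for the '574 octal register
        self.flops = [False] * stages  # stands in for the flip-flop chain

    def falling_edge(self, data, vpa, vda):
        """Advance one step on a falling edge of Phi2.

        Returns the flip-flop outputs as they appear during the NEXT cycle.
        """
        # Decode what the latch held during the cycle that just ended
        # (the job done by the '238 and the gates).
        decoded = self.latch in self.watched
        # Shift the decode result down the chain, one stage per edge.
        self.flops = [decoded] + self.flops[:-1]
        # Capture the data bus only if this cycle was an opcode fetch.
        self.latch = data if (vpa and vda) else None
        return tuple(self.flops)


if __name__ == "__main__":
    p = OpcodePipeline()
    print(p.falling_edge(0x42, True, True))    # end of cycle 1: WDM latched
    print(p.falling_edge(0x00, True, False))   # end of cycle 2: decode enters stage 0
    print(p.falling_edge(0x00, False, False))  # end of cycle 3: moves to stage 1
    print(p.falling_edge(0x00, False, False))  # end of cycle 4: pipeline is clear
```

Each later stage gives a copy of the "opcode detected" signal delayed by one more cycle, which is what lets the surrounding logic act at a chosen point in the instruction.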

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html

