6502.org Forum

All times are UTC
Posted: Sun Jan 13, 2019 7:48 pm
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1382
drogon wrote:
Chromatix wrote:
It doesn't get all that warm as it is, without even a heatsink - so adding liquid cooling wouldn't change much. Ultimately, small .6µm chips like this tend not to be limited by thermals, but by fundamental wire and switching delays.


And to add to the "lack of heat" thing, I've been running a WDC 65C02 at 16 MHz for a month or two now. It's been on more or less all the time and "just works" when I use it. It's basically running at ambient. Faster? Who knows - my system is just on stripboard and a double-sided PCB and I'm continually amazed it runs at that speed at all. It does have 10 ns static RAM though.

-Gordon


True, the latest CMOS versions don't generate any significant heat, based on power consumption versus clock frequency (compared to their NMOS relatives). My last SBC uses an Atmel ATF22V10CQZ PLD, which, by itself, draws much more current than all of the other components combined. Still, you really need an infrared thermometer to see that it's running slightly warmer than room temperature.

So, I can't see how water cooling would have any real effect on performance. Perhaps using a Peltier device, which would have a cold side, could drop the temperature below ambient and perhaps might allow for a faster clock, providing the rest of the components can keep up. Yet another experiment to think about :shock:

_________________
Regards, KM
https://github.com/floobydust


Posted: Sun Jan 13, 2019 8:23 pm
Joined: Thu Nov 27, 2014 7:07 pm
Posts: 47
Location: Ocala, Fl, USA
I completely agree with the use of COP; however, the SuperCPU does not make use of it (a NOP), so it never made sense to even consider it. I disagree (to some extent) that the API is "heavyweight", as it is the only alternative to BRK on the SuperCPU. It's tight, but for sure someone can make it better. The use of a 16-byte local stack costs nearly nothing to implement and, as used on the SuperCPU, runs at a speedy 20 MHz. It also makes a nice, even 32-byte frame. Params are in X/Y, while the function # is passed in A, with return values in A, X, or Y. The real limitation is I/O, which must slow the processor to 1 MHz, mirror, and sync in order to do any communicating, given that all the original chips were used.

Addressing static RAM, there is 128K of it in the SuperCPU, and it definitely runs faster than the old DRAMs that they used to fill in the rest of the address space. You can still move up to 3 bytes per 1 MHz cycle. Is it true that a modern DRAM is too slow for something running at 14-20 MHz? I don't know, but wouldn't think so.

So, much thanks so far to everyone, and it would be great to see this all happen some day soon. We just want to put it out there. I've been up to my ears with our new house, currently building a fully dynamic geothermal system capable of keeping our house, greenhouse, and orchard (as needed) at a beautiful 73 deg. F all year round (Florida). It really helps to be only 26' above sea level, btw. Our water table is at 30". So, as soon as we're 100% in (March or so), the computer project goes into high gear.

Attached is an article describing the (mostly) unknown boot process for a Native-mode OS in the SuperCPU.


Attachments:
SuperBoot v1_revised.pdf [983.61 KiB]
Posted: Sun Jan 13, 2019 8:30 pm
Joined: Sat Jun 04, 2016 10:22 pm
Posts: 483
Location: Australia
drogon wrote:
Seems I can't access it without a Facebook account. Ah well. Will it be described on another medium?


I second this.
I'm definitely interested as well. I don't think I'll be of any use, but I do want to be able to spectate if I can't. (I kinda wish I could say more, but I'd rather not derail the thread)


Posted: Sun Jan 13, 2019 9:58 pm
Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
BigEd wrote:
I like the idea of using COP though, to get around the inter-bank memory reference problem

What problem is this, exactly?


Posted: Sun Jan 13, 2019 10:07 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
As BDD explained it, for the COP handler in bank 0 to read the COP instruction's operand, it needs to determine and access the bank holding the COP opcode.


Posted: Sun Jan 13, 2019 10:24 pm
Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
SDRAM (and its descendants) is fairly explicitly designed for use with a cache. You have to feed it a row address, then wait, then a column address, then wait again, then you get a whole burst of data out, 16 bits per cycle, for 8 cycles in a row (IIRC), if you have a single chip. There may even be a minimum clock speed in the tens of MHz. That's a very poor fit for the 6502 bus, though you could doubtless construct something to bridge the gap if you had to.

Going back a step to the last of the "asynchronous" DRAMs, which are much more similar to what classic micros used, EDO is usually quoted as having 60ns latency, which on the face of it is just about sufficient for 14MHz with a little margin for glue logic. I quickly found an old datasheet for a 4Mx4 60ns device with 5V and 3.3V supply options.

However, driving EDO DRAM correctly is quite a lot more complicated than driving SRAM; consecutive accesses in the same page (where you only need a /CAS strobe) are definitely fast enough (even for 20MHz), but you'll need to insert wait-states to cope with page changes (where a /RAS then a /CAS strobe are needed) and possibly also for handling refresh (which also means extra glue logic). This also means that you'll want program, stack and data to be on different DRAM devices, or to keep one or more of these in a smaller SRAM device, otherwise page changes will be very frequent and lead to a lot of wait-state cycles lost.
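
To put rough numbers on that, here is a minimal sketch in C. It assumes the whole access plus a hypothetical glue-logic allowance has to fit inside one clock period, and it ignores RAS precharge and the full DRAM cycle time, so it is optimistic. The function names and the 10 ns glue figure used in the note below are my own illustration, not taken from any datasheet.

```c
#include <stdint.h>

/* Clock period in picoseconds for a clock frequency given in kHz. */
uint32_t period_ps(uint32_t clock_khz) {
    return (uint32_t)(1000000000ULL / clock_khz);
}

/* Number of extra whole clock periods needed so that the access time
 * plus a glue-logic allowance fits. Zero means no wait-state needed. */
uint32_t wait_states(uint32_t clock_khz, uint32_t access_ps, uint32_t glue_ps) {
    uint32_t period = period_ps(clock_khz);
    uint32_t need = access_ps + glue_ps;
    if (need <= period) {
        return 0;
    }
    /* ceil((need - period) / period) using integer arithmetic */
    return (need - period + period - 1) / period;
}
```

With a 60 ns part and an assumed 10 ns of glue, wait_states(14000, 60000, 10000) gives 0 and wait_states(20000, 60000, 10000) gives 1, which matches the "just about sufficient for 14MHz" estimate for a full /RAS then /CAS access.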

A related option is VRAM, which is best described as FPM/EDO DRAM with an extra port for sequential output. This extra port is intended for driving a video signal without having to continuously interleave access with the CPU.


Posted: Sun Jan 13, 2019 10:45 pm
Joined: Thu Mar 10, 2016 4:33 am
Posts: 176
As for the discussion of a new 65816 computer which seems to be interleaved into this discussion, I'm also working on something like this. I suspect my progress will be a lot slower than other people here, but I've at least got to a stage where I think I have a design that I like. With any group project that may come up from time to time there are a lot of different goals, plus we do this for the fun of it, not necessarily for the results, so working on things ourselves is often more rewarding. Having said that, I think I'll take decades to get where I want, so some sort of joint project makes sense too.

I've settled on using an FPGA as the core glue of the system, hooked up to a real 65816. I've got that running in breadboard form and it was remarkably easy. Currently I'm running the 816 at 3.3 V for ease of integration with the FPGA, which is a Spartan-6. I've got a whole pile of level converters; I just need the time to wire them up and test out 5 V operation. I was going to stay with 3.3 V for the system but there are a lot of I/O chips that I want to use that are 5 V only, plus I should get more speed out of the 816 at 5 V. The clock is fed from the FPGA so it's easy to change the speed.

As for memory, I'm using a 64MB LPDDR RAM on an FPGA dev board currently. This means I can fill up the entire memory space with RAM. In the future I've got ideas about implementing a paged MMU and that would open up the rest of the RAM with each process having up to a 16MB window into the full RAM. The RAM is accessed through a controller on the FPGA, which will also handle video output, reading directly from the RAM. Another far distant idea I had was to include two 65816s on the board, running in interleaved cycles. This was so I could explore multiprocessing in software. Interestingly Rockwell had a microcontroller that had two 6502 cores running opposite cycles, but they went even further and shared the ALU and instruction decode (I think), so you basically got two CPUs for the cost of double registers.

The other feature I've worked on that I think has worked out well is having no ROM, but loading the ROM image from the same SPI flash that is used to initialise the FPGA. On startup the FPGA holds the 65816 in reset while it copies 64k of data from the SPI flash (which has leftover space) into Bank 0. This makes the system easy to program (just use the FPGA programming tools) and avoids slow ROM.

So the core of the system is relatively simple. I chose the Spartan-6 because it has a QFP package, which while not simple to solder, can at least be done at home. The RAM will most likely be an MT48LC64M8, which is not too hard to work with. This is all very similar to Arlet's 6502 Sandbox, for which he kindly gave me the schematics and Verilog code, and that got me started.


Posted: Sun Jan 13, 2019 10:48 pm
Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Quote:
Interestingly Rockwell had a microcontroller that had two 6502 cores running opposite cycles, but they went even further and shared the ALU and instruction decode (I think), so you basically got two CPUs for the cost of double registers.

Nowadays this is called "simultaneous multithreading" (SMT).


Posted: Mon Jan 14, 2019 6:49 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8390
Location: Midwestern USA
BitWise wrote:
In UNIX only the low level API functions (e.g. open, read, write, fork, kill, getpid, etc.) are accessed by an interrupting 'syscall' mechanism. The higher level API functions (e.g. fopen, fread, fwrite, printf, etc.) are all just library functions accessed by a subroutine call which makes them much more efficient.

However, the library APIs that handle I/O ultimately must make a "syscall" in order to accomplish anything. So the OS trap is not being avoided, but merely made somewhat opaque.
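
A small C sketch of that layering, under stated assumptions: fake_sys_write() is a hypothetical stand-in for the kernel trap (it only counts how often it is entered), and the 64-byte buffer and all names are mine, not taken from any real libc. It shows a library routine that is cheap to call many times but still ends up trapping into the "kernel" to get bytes out.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64               /* hypothetical library buffer size */

unsigned trap_count;              /* how many times the "kernel" was entered */
size_t bytes_at_kernel;           /* how many bytes the "kernel" received */

/* Hypothetical stand-in for the system call trap. */
size_t fake_sys_write(const char *data, size_t len) {
    (void)data;
    trap_count++;
    bytes_at_kernel += len;
    return len;
}

/* Library-level buffered stream, reached by an ordinary subroutine call. */
struct stream {
    char buf[BUF_SIZE];
    size_t used;
};

/* Hand whatever is buffered to the kernel in a single trap. */
void stream_flush(struct stream *s) {
    if (s->used > 0) {
        fake_sys_write(s->buf, s->used);
        s->used = 0;
    }
}

/* Copy into the buffer; trap only when the buffer fills up. */
void stream_put(struct stream *s, const char *data, size_t len) {
    while (len > 0) {
        size_t room = BUF_SIZE - s->used;
        size_t n = len < room ? len : room;
        memcpy(s->buf + s->used, data, n);
        s->used += n;
        data += n;
        len -= n;
        if (s->used == BUF_SIZE) {
            stream_flush(s);
        }
    }
}
```

Writing 100 single bytes through stream_put() and then flushing costs two traps instead of 100, so the library call is more efficient on average, but the trap is still there underneath.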

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Mon Jan 14, 2019 7:54 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8390
Location: Midwestern USA
BigEd wrote:
My inclination though would always be to use the accumulator to select the api call, and use X and Y for the first two parameters...

When UNIX was ported to the MC68000, the system call API was arranged so the index was loaded into register D0, which would mirror the idea of loading it into the 65C816's accumulator. Although the 68000 has more general purpose registers than the '816 they are not used for parameter-passing. It's all done on the stack.

In my "treatise" on 65C816 interrupt processing, I presented an API calling procedure in which the index is passed in the accumulator and parameters are passed via the stack. The call is invoked with COP—the signature byte is ignored. The example is one of taking the ANSI C library code that would open a file and seeing how it would be handled in 65C816 assembly language.

Code:
    /* create & open a new file in ANSI C */

    #include <fcntl.h>                 /* declares creat() */

    char fname[] = "/usr/bdd/newfile"; /* pathname */

    int main(void) {
        int fd;                        /* file descriptor */
        fd = creat(fname,0664);        /* create & open file */
        return(fd);                    /* return file descriptor to caller */
    }

If the C library were running on an '816, we might see a subroutine such as the following for the creat() API call:

Code:
;create new file...
;
         pea #$01b4            ;push file mode to stack
         pea #$41d7            ;push pathname pointer
         sep #%00100000        ;select 8 bit accumulator
         lda #$08              ;creat() API index
         cop $00               ;transfer execution to kernel
         bcs _error_           ;kernel API returned an error
;
         rts                   ;file created & opened
;
_error_  ...error processing...

Within the kernel, front-end code such as the following would be executed when the COP instruction is encountered:

Code:
;KERNEL API FRONT END — EXECUTED IN RESPONSE TO A COP INSTRUCTION
;
;    ——————————————————————————————————————————————————————————————————
;    .A must be loaded with the 8 bit API index prior to executing COP.
;    ——————————————————————————————————————————————————————————————————
;
icop     rep #%00110000        ;16 bit registers
         pha                   ;save .A for return access
         phx                   ;preserve .X &...
         phy                   ;.Y if necessary
         cli                   ;restart IRQs
         and #$00ff            ;mask noise in .B (16 bit mask)
         beq icop01            ;API index cannot be zero
;
         dec a                 ;zero-align API index
         cmp #maxapi           ;index in range (16 bit comparison)?
         bcs icop01            ;no, error
;
         asl a                 ;double API index for...
         tax                   ;API dispatch table offset
         sta apioff            ;save offset &...
         jmp (apidptab,x)      ;run appropriate code
;
;
;    invalid API index error processing...
;
icop01   ...handle invalid API index...
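
For readers more at home in C, the same dispatch logic can be modelled roughly as follows. This is only a sketch: MAXAPI, the three dummy handlers and the -1 error return are hypothetical, and the register saves and CLI have no C equivalent here. The ASL in the assembly doubles the index because each table entry is two bytes; C array indexing does that scaling implicitly.

```c
#include <stdint.h>

#define MAXAPI 3                        /* hypothetical number of API entries */

typedef int (*api_fn)(void);

/* Dummy handlers standing in for real kernel services. */
int api_one(void)   { return 1; }
int api_two(void)   { return 2; }
int api_three(void) { return 3; }

const api_fn apidptab[MAXAPI] = { api_one, api_two, api_three };

/* a is the 16-bit accumulator as seen after REP; only the low byte counts.
 * Returns the handler's result, or -1 for an invalid API index. */
int dispatch(uint16_t a) {
    uint16_t index = a & 0x00FF;        /* and #$00ff : mask noise in .B */
    if (index == 0) {
        return -1;                      /* beq icop01 : zero is not valid */
    }
    index--;                            /* dec a : zero-align the index */
    if (index >= MAXAPI) {
        return -1;                      /* cmp #maxapi / bcs icop01 */
    }
    return apidptab[index]();           /* asl a / tax / jmp (apidptab,x) */
}
```

For example, dispatch(0xAB02) runs the second handler because the upper byte is masked off, while dispatch(0) and dispatch(4) both take the error path.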

The theory for this procedure is there are API calls that require three or more parameters. In some cases, the parameter count could be variable. For example, opening a file in Linux can be qualified with some additional flags to tell the kernel to create the file if it doesn't exist. In fact, a call to open() could involve anywhere from two to seven parameters.

Logically, such a requirement could be handled by pushing the file-handling parameters along with a parameter that tells the API code how many parameters are to be processed. In my opinion, doing this with a mix of register loads and stack pushes would quickly become unwieldy. None of the UNIX implementations about which I have internal knowledge take that approach; they all use the stack for parameters and a single register for the API index. Such a philosophy makes sense for the sake of uniformity. If all APIs use only the stack for parameter-passing a common front- and back-end mechanism can be used in all cases. As the '816 has instructions for conveniently managing the stack, it seems sensible to me to take advantage of them.

Quote:
A pity that the COP's operand isn't much use.

The signature byte can be gotten by preserving DB, loading DB with the stack copy of PB, fetching the stack copy of PC, copying it to .X and decrementing it. An <addr>,X load would then be made to get the signature byte, where <addr> would be $0000 (not $00, which would be direct page and result in the load coming from bank $00). It's major hoop-jumping just to get one byte. Better, I think, to supply the API index in a register or on the stack.
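
The address arithmetic behind that can be written out in C as a sketch. It assumes the stacked PC points at the byte following the signature, and it treats memory as one flat array indexed by 24-bit address; the function names are mine and purely illustrative.

```c
#include <stdint.h>

/* 24-bit address of the COP signature byte, given the copies of PB and PC
 * that the COP interrupt pushed onto the stack. */
uint32_t cop_signature_addr(uint8_t stacked_pb, uint16_t stacked_pc) {
    /* Step back one byte; uint16_t arithmetic wraps inside the bank,
     * just as the 16-bit index register would. */
    uint16_t offset = (uint16_t)(stacked_pc - 1);
    return ((uint32_t)stacked_pb << 16) | offset;
}

/* Fetch the signature byte from a flat memory image. */
uint8_t cop_signature(const uint8_t *mem, uint8_t stacked_pb, uint16_t stacked_pc) {
    return mem[cop_signature_addr(stacked_pb, stacked_pc)];
}
```

So for a COP executed in bank $02 with a stacked PC of $1234, the signature byte lives at $021233. In C that is a shift and an OR; on the '816 it takes the DB juggling described above.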

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Mon Jan 14, 2019 8:38 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
I have a feeling that in register-rich architectures, it's common to pass some parameters in registers, and use the stack only for calls which need a lot of parameters. Certainly I'd be inclined to use X and Y for something (possibly because I'm used to stack accesses being awkward, possibly because Acorn's BBC MOS does things this way.)

I feel we may have diverged quite a way from satpro's head post. I suspect the answer there is to start new threads for new questions and new observations, to present ideas and collect feedback. It's difficult to keep a thread to a single question or purpose.

Any concrete suggestion for a 65816 kernel API is likely to be workable, or to be fixable with small tweaks. So, rather than chasing perfection, or universal acclaim, go ahead. An implementation is far more decisive and useful than a design!

There are surely useful things also to be said about memory systems for a 20 MHz design - might be better to summarise and discuss in a new thread. There's lots of prior art.

And indeed, jds, it would be good for you and for everyone if you started a thread about your design.


Posted: Mon Jan 14, 2019 3:47 pm
Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
Is there any advantage to the COP mechanism over simply a JSL to a common entry routine (with the API # on the stack/accumulator) beyond not having to have a hard coded entry routine value?

The IIGS OS uses the JSL technique.


Posted: Mon Jan 14, 2019 4:30 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
If you use any kind of absolute addresses, you're fixing the binary API. It might be that you later want to run the same software on a machine with a different address map. Using COP (or BRK) makes the whole OS float. Likewise, it might allow for some kind of hardware protection, at a later point.

It's a kind of late binding, which seems generally to be felt to be a good thing.


Posted: Mon Jan 14, 2019 8:17 pm
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8510
Location: Southern California
The COP vector itself is an absolute address anyway. You could have one address to JSL to that doesn't change, but whose contents change with version. At least COP doesn't further burden the IRQ ISR like BRK does on the '02.
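
In C terms, the fixed-address idea looks something like the sketch below, with a function pointer standing in for the vector. All names and return values are hypothetical; the point is only that the application is bound to the slot, not to the routine behind it.

```c
typedef int (*os_entry_fn)(int api_index);

/* Two hypothetical OS versions with different entry code. */
int os_entry_v1(int api_index) { return 100 + api_index; }
int os_entry_v2(int api_index) { return 200 + api_index; }

/* The one location applications know about; only its contents change. */
os_entry_fn os_vector = os_entry_v1;

/* Application-side call: always goes through the slot. */
int app_call(int api_index) {
    return os_vector(api_index);
}
```

Swapping os_vector to os_entry_v2 changes what app_call(8) reaches without touching the caller, which is the same late binding the COP vector provides.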

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Mon Jan 14, 2019 8:58 pm
Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
BigDumbDinosaur wrote:
The main value in using COP for API calls is nothing special has to be done to call an API from another bank. All of that is handled for you due to COP being just another interrupt.

I guess this is my real question. I'm still not sure what the problem is here.

Is the problem that the caller simply doesn't have to distinguish between JSR and JSL?

