MCL65+
-
MicroCoreLabs
- Posts: 62
- Joined: 05 Oct 2017
Re: MCL65+
I have an update on the MCL65+ on my blog: https://microcorelabs.wordpress.com/202 ... 65-update/
I was able to run a number of VIC20 cartridge games such as Jungle Hunt, Pac-Man, and Star Trek, among others, in an effort to shake out issues with the correctness and implementation of the MCL65+'s emulated 6502 core as well as its bus interface. If there are one or more "killer apps" you can suggest or would like to see, please let me know and I will try!
The next things I would like to do are running in an accelerated mode and trying it on an Apple II+.
Depending on the level of interest, I also need to figure out a way to distribute the bare PCBs... I can either handle it, or could possibly use a third party. Any suggestions? They use all through-hole components and the software environment is the Arduino IDE (Teensyduino actually), so it should be inexpensive and easy to duplicate this project!
Thanks,
-Ted
Re: MCL65+
MicroCoreLabs wrote:
I have an update on the MCL65+ on my blog: https://microcorelabs.wordpress.com/202 ... 65-update/
Depending on the level of interest, I also need to figure out a way to distribute the bare PCBs... I can either handle it, or could possibly use a third party. Any suggestions? They use all through-hole components and the software environment is the Arduino IDE (Teensyduino actually), so it should be inexpensive and easy to duplicate this project!
Thanks,
-Ted
EDIT: I agree with BigED that Tindie is an excellent route if you just want to make small batches here and there and sell them.
Last edited by rpiguy2 on Wed Jan 06, 2021 4:00 pm, edited 1 time in total.
Re: MCL65+
It will be very interesting to see the acceleration in effect! (Might be good for chess, for example, or for fractals, or possibly for adventure games which spend time searching and compressing.)
If you have a few surplus PCBs, you could offer them here. But beyond that I'd go along with the idea of sharing the files. If you want a shopfront kind of idea, perhaps try Tindie - there are several 6502 PCBs on there already. Or perhaps collaborate with an existing site, like thefuturewas8bit.com
Over on stardot, it's common for people to sell on ebay but also sell directly to forum members, often at a slightly better price.
-
MicroCoreLabs
- Posts: 62
- Joined: 05 Oct 2017
Re: MCL65+
I uploaded the MCL65+ project to GitHub: https://github.com/MicroCoreLabs/Projec ... L65%2B.ino
This is the code/sketch I have been using on my VIC20. I did not include any of the cartridge ROMs; however, they are easy to add if you wish. You can run the cartridges and expansion memories out of the Teensy's internal RAM in either cycle-accurate or accelerated mode. You can also simply populate the complete 64K RAM array with your own code image, which is how I ran all of the popular 6502 opcode tests.
The bus interface is abstracted from the emulated CPU core so you can run the emulation with your own top level interface rather than use the 6502 bus signals. The core is cycle accurate and supports most of the undocumented opcodes which are needed to run some applications.
I was able to accelerate some VIC20 games and applications and got interesting results. When the cartridge, RAM, and ZeroPage/Stack address ranges were accelerated, some games actually ran extremely well: at the original speed they were too slow, but when accelerated they became more fun to play. This includes Pac-Man, Donkey Kong, and a few others. Defender was a little too fast when accelerated... I was not able to accelerate the VIC20's BASIC by much. When I accelerated the ZeroPage and Stack I got about a 15% improvement in speed, but when I tried accelerating the BASIC or KERNAL ranges I got garbage on the screen, so there appear to be some timing dependencies in the VIC20 BASIC.
I believe this makes it the world's fastest VIC20...
Next will be the Apple II+...
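For readers who want a picture of what "accelerating an address range" means in practice, here is a minimal sketch in C. It is not taken from the MCL65+ sketch on GitHub; the table layout, the names `range_table` and `mode_for`, and the specific address ranges are assumptions chosen only for illustration. The idea is that each access is looked up by address and either served from internal RAM at full speed or sent out as a real bus cycle.

```c
#include <stddef.h>
#include <stdint.h>

/* How an address is serviced by the emulated core. */
typedef enum {
    MODE_EXTERNAL,    /* real bus cycle at the host's clock speed */
    MODE_ACCELERATED  /* served from internal RAM at full speed   */
} access_mode_t;

typedef struct {
    uint16_t      first;  /* first address in the range (inclusive) */
    uint16_t      last;   /* last address in the range (inclusive)  */
    access_mode_t mode;
} range_t;

/* Hypothetical layout: zero page plus stack, and one cartridge window. */
static const range_t range_table[] = {
    { 0x0000, 0x01FF, MODE_ACCELERATED },  /* zero page and stack      */
    { 0xA000, 0xBFFF, MODE_ACCELERATED },  /* assumed cartridge window */
};

/* Anything not listed falls back to a real, cycle-accurate bus access. */
static access_mode_t mode_for(uint16_t addr)
{
    for (size_t i = 0; i < sizeof range_table / sizeof range_table[0]; i++) {
        if (addr >= range_table[i].first && addr <= range_table[i].last)
            return range_table[i].mode;
    }
    return MODE_EXTERNAL;
}
```

Defaulting to the external bus keeps I/O and video accesses on the real hardware unless a range is explicitly opted in, which matches the observation above that accelerating the wrong ranges produces garbage on the screen.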
Re: MCL65+
Thanks for sharing your code!
Getting bursts of full-speed activity when external bus cycles aren't needed should give a nice speedup. Any idea how fast your full-speed 6502 emulation is, on this 600MHz ARM device?
(Any cycle which doesn't need to read or write I/O or video memory should be ripe for full-speed operation, although you will need to model any memory mapping that the platform does. You can also speed up reads of video memory, if you keep a fast copy of it.)
Re: MCL65+
MicroCoreLabs wrote:
I uploaded the MCL65+ project to GitHub: https://github.com/MicroCoreLabs/Projec ... L65%2B.ino
...
...
-
MicroCoreLabs
- Posts: 62
- Joined: 05 Oct 2017
Re: MCL65+
Quote:
Getting bursts of full-speed activity when external bus cycles aren't needed should give a nice speedup. Any idea how fast your full-speed 6502 emulation is, on this 600MHz ARM device?
-
MicroCoreLabs
- Posts: 62
- Joined: 05 Oct 2017
Re: MCL65+
Quote:
I just took a look at the other projects you put up on GitHub and they are very impressive.
-
MicroCoreLabs
- Posts: 62
- Joined: 05 Oct 2017
Re: MCL65+
I have yet another update: https://microcorelabs.wordpress.com/202 ... -apple-ii/
I'm happy to say that the MCL65+ 6502 emulation also seems to work in the Apple II+.
I was able to run in cycle-accurate mode, and then I began mirroring address ranges from system DRAM into the core, where reads are served by the internal RAM array, writes pass through to the system memory, and both are cycle accurate. After that I emulated all RAM and BIOS ROMs using the accelerated internal array, which runs at 600MHz. I was able to just leave the I/O and video memory ranges as cycle accurate while the rest was accelerated!
The results were pretty dramatic! I did some simple tests in BASIC where I measured about a 6X speed improvement when running in accelerated mode over the 1MHz cycle-accurate mode...
Cycle accurate mode: https://www.youtube.com/watch?v=UuSb7mr ... e=youtu.be
Accelerated mode: https://www.youtube.com/watch?v=rvJsCMR ... e=youtu.be
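The mirroring step described in this post can be sketched roughly as follows. This is an illustration of the data flow only, not the actual MCL65+ code: `internal_ram`, `host_ram`, `bus_write`, and the counter are hypothetical names, the host memory is simulated with a plain array, and bus timing is not modelled at all.

```c
#include <stdint.h>

static uint8_t  internal_ram[0x10000];  /* fast copy held inside the microcontroller */
static uint8_t  host_ram[0x10000];      /* stand-in for the host's own memory        */
static unsigned bus_write_count;        /* how many writes went out on the bus       */

/* Placeholder for a real external bus write cycle. */
static void bus_write(uint16_t addr, uint8_t value)
{
    host_ram[addr] = value;
    bus_write_count++;
}

/* Writes update the mirror and also pass through to the host memory. */
static void mirrored_write(uint16_t addr, uint8_t value)
{
    internal_ram[addr] = value;
    bus_write(addr, value);
}

/* Reads are served from the mirror, so no data has to come back from the bus. */
static uint8_t mirrored_read(uint16_t addr)
{
    return internal_ram[addr];
}
```

A mirror like this stays correct only as long as every write to the range goes through the emulated CPU, and it has to be seeded with the host's existing contents before reads are trusted.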
Re: MCL65+
MicroCoreLabs wrote:
I have yet another update: https://microcorelabs.wordpress.com/202 ... -apple-ii/
I'm happy to say that the MCL65+ 6502 emulation also seems to work in the Apple II+.
The results were pretty dramatic! I did some simple tests in BASIC where I measured about a 6X speed improvement when running in accelerated mode over the 1MHz cycle-accurate mode...
Cycle accurate mode: https://www.youtube.com/watch?v=UuSb7mr ... e=youtu.be
Accelerated mode: https://www.youtube.com/watch?v=rvJsCMR ... e=youtu.be
There was a Saturn Accelerator card, along with some clones, that will work on the Apple II+, but they typically ran at 3.57MHz.
-
MicroCoreLabs
- Posts: 62
- Joined: 05 Oct 2017
Re: MCL65+
It's actually more than ten times faster... My initial test program spent a lot of time printing numbers to the screen, which goes onto the 1MHz bus and slows it down. If I just print once every 1,000 iterations it is significantly faster because the complete program runs out of internal memory... I have not yet looked at the bus while this program is running, but there might be additional BASIC or OS housekeeping cycles that are also slowing it down. It's a 600MHz processor, so I would expect it to emulate the 6502 quite fast...
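A rough way to see why screen printing dominates is an Amdahl-style estimate. The function below is a back-of-the-envelope model, not a measurement: `bus_fraction` is the share of the original cycles that still have to run on the 1MHz bus, and `internal_cost` is the assumed time of an internally served cycle relative to one bus cycle. Both parameters and the function name are hypothetical.

```c
/* Estimated speedup over a fully cycle-accurate run.
 * bus_fraction  : share of cycles that still use the slow external bus (0..1)
 * internal_cost : time of an internal cycle relative to a bus cycle (e.g. 0.01)
 */
static double estimated_speedup(double bus_fraction, double internal_cost)
{
    return 1.0 / (bus_fraction + (1.0 - bus_fraction) * internal_cost);
}
```

With these assumptions, if 10% of the cycles still hit the bus and internal cycles cost 1% of a bus cycle, the estimate is roughly 9x, so reducing how often the program touches the screen matters far more than making the internal cycles faster.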
-
nollkolltroll
- Posts: 14
- Joined: 07 Jan 2021
Re: MCL65+
A great implementation, very interesting to see the source!
I was scratching my head for a while on how you managed without level converters on the address bus, but of course they are outputs. Only inputs will fry on 5V from the system.
Might be nice to explain somewhere in the manual for future readers.
I do wonder though, if all systems will accept the 3.3V levels of the Teensy 4.1. Some might have pull-ups/downs that might interfere?
Did I miss an explanation of why there are separate data-in/out instead of a multi-directional level converter on 8 Teensy-pins?
This is the opposite of what I try to do, using a Teensy 4.1 as a generic I/O-chip. CPU, memory and bank-circuitry in real HW. Very nice to compare the solutions we both have done to similar problems.
/NollKollTroll
-
MicroCoreLabs
- Posts: 62
- Joined: 05 Oct 2017
Re: MCL65+
Yes, because there are no pull-ups on the address lines on the motherboards it is safe to use the Teensy's 3.3V outputs directly. All other signals which were bidirectional or inputs to the Teensy needed voltage converters since it is not 5V tolerant.
Regarding the separate data input and output busses: I first explored using a Teensy4.0, which is a smaller board with fewer I/Os, but it would have needed multiplexed address and/or data signals in addition to a bidirectional data bus. One challenge of the project was being able to, on the same clock edge, sample the ready line and read data from the last transaction, and then generate the 16 address lines for the next bus cycle within the allowed time. I found that multiplexing signals and turning the data bus around took too long to make timing. The Teensy4.1 has enough I/Os for non-multiplexed address, data-in, and data-out busses, so timing was easier to achieve, but not completely...
The Teensy4.x boards do not have parallel input/output busses, so one must either write or read one bit at a time which, even at 600MHz, adds up when you have so many signals. What I needed to do was to perform direct accesses to the GPIO registers of the Teensy's microcontroller. The mapping from register bits to pins is not sequential, so a number of masks and shifts were also required. With these direct accesses and non-multiplexed address and data busses I was able to make timing on the bus.
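To illustrate the "masks and shifts" point, here is a small C sketch that packs and unpacks a data byte whose bits sit in non-adjacent positions of a 32-bit port register. The bit positions are invented for the example (D0..D3 at register bits 16..19, D4..D7 at bits 24..27) and are not the MCL65+ pin mapping; on real hardware `port_value` would come from a single read of a GPIO data register rather than from a function argument.

```c
#include <stdint.h>

/* Hypothetical layout: D0..D3 live in register bits 16..19,
 * D4..D7 live in register bits 24..27. */

/* Collect the eight scattered register bits into one data byte. */
static uint8_t gather_data(uint32_t port_value)
{
    return (uint8_t)(((port_value >> 16) & 0x0Fu) |   /* bits 16..19 -> D0..D3 */
                     ((port_value >> 20) & 0xF0u));   /* bits 24..27 -> D4..D7 */
}

/* Spread a data byte back out to its register bit positions. */
static uint32_t scatter_data(uint8_t value)
{
    return ((uint32_t)(value & 0x0Fu) << 16) |        /* D0..D3 -> bits 16..19 */
           ((uint32_t)(value & 0xF0u) << 20);         /* D4..D7 -> bits 24..27 */
}
```

Moving groups of adjacent bits with one mask and one shift each is what makes a single register access so much cheaper than reading or writing eight pins individually.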
- Sheep64
- In Memoriam
- Posts: 311
- Joined: 11 Aug 2020
- Location: A magnetic field
Re: MCL65+
Your hardware and software are almost everything I wanted to achieve.
If you want further quick wins for acceleration, I suggest auto-identification of the host and automatic breakpoints for the host's integer and floating point multiply routines. In the trivial case, JSR to a fixed address would reduce to one multiply. However, that fixed address is specific to each version of each host and may be affected by page bank registers which also vary by host.
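As a sketch of what such a breakpoint could look like: when the emulated program counter reaches the entry point of a known multiply routine, the emulator computes the product natively and then behaves as if the routine had executed its RTS. Everything specific here is hypothetical, including the entry address, the zero-page operand and result locations, and the `cpu_t` structure; a real hook would also have to reproduce whatever registers and flags the original routine leaves behind.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t pc;            /* program counter                     */
    uint8_t  sp;            /* stack pointer (stack lives in $01xx) */
    uint8_t  mem[0x10000];  /* emulated 64K address space          */
} cpu_t;

/* Placeholder addresses, not taken from any real ROM. */
#define MUL_ENTRY   0xD000u  /* entry point of the host's multiply routine */
#define MUL_ARG_A   0x00FAu  /* 8-bit operand A                            */
#define MUL_ARG_B   0x00FBu  /* 8-bit operand B                            */
#define MUL_RESULT  0x00FCu  /* 16-bit result, low byte first              */

/* Returns true if the hook fired and the routine was replaced. */
static bool try_fast_multiply(cpu_t *cpu)
{
    if (cpu->pc != MUL_ENTRY)
        return false;

    uint16_t product = (uint16_t)(cpu->mem[MUL_ARG_A] * cpu->mem[MUL_ARG_B]);
    cpu->mem[MUL_RESULT]     = (uint8_t)(product & 0xFFu);
    cpu->mem[MUL_RESULT + 1] = (uint8_t)(product >> 8);

    /* Emulate RTS: pull the return address and resume one byte past it. */
    uint8_t lo = cpu->mem[0x0100 + (uint8_t)(cpu->sp + 1)];
    uint8_t hi = cpu->mem[0x0100 + (uint8_t)(cpu->sp + 2)];
    cpu->sp = (uint8_t)(cpu->sp + 2);
    cpu->pc = (uint16_t)((((uint16_t)hi << 8) | lo) + 1);
    return true;
}
```

The emulator would call `try_fast_multiply` before fetching each opcode; as noted above, the entry address differs per host and ROM version, so a table of hooks keyed by an identified host would be needed in practice.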
You might want to implement bit-banged cell networking between 6502 hosts. This would be a spiritual successor to the 6854 used in AppleTalk and Acorn's Econet (unsuccessfully licensed by Commodore). Indeed, it would be quite special to see a common network protocol running across Commodore, Apple, Atari and Acorn. It should be possible to implement a common network interface using, for example, JSR $FFFE as another hook. From here, it should be fairly easy to implement text chat or similar.
If you could make a USD50 Commander X16, Foenix C256 or any reasonable subset thereof, that would be extremely popular. The Commander has a trivial page banking scheme and is loosely compatible with VIC20. The Foenix has a memory mapped FPU. Your implementation is likely to be faster and cheaper.
My first message to this forum contains an outline solution for the necessary multiplexing. (Circuit. State machine. Apologies in advance for color highlights. The meta-data shows that I drew these diagrams six months before I joined the forum. I have subsequently seen superfluous use of color mentioned at least five times.)
Ignoring SYNC and RDY, the minimal implementation requires 11 microcontroller pins and seven chips: one buffer chip for interrupt lines, two (or more) latches for address, latch and buffer for data, one 74x138 to orchestrate and one glue logic chip. With one additional microcontroller pin, one 74x138 and one glue logic chip, it is possible to implement an additional 8 bit data bus with 64 bit address-space or larger. Four or more variants may use common firmware.
Curiously, from your research, the larger Teensy 4.1 may be preferable for 8 bit implementation and the smaller Teensy 4.0 may be preferable for 32 bit or 64 bit extension. Either way, I strongly recommend your work to randyhyde as the basis of a 32 bit or 64 bit extension similar to my own.
MicroCoreLabs on Sat 9 Jan 2021 wrote:
I first explored using a Teensy4.0 which is a smaller board that has less IO's, but would have needed multiplexed address and/or data signals in addition to having a bidirectional data bus.
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: MCL65+
Sheep64 wrote:
... The meta-data shows that I drew these diagrams six months before I joined the forum. I have subsequently seen superfluous use of color mentioned at least five times ...
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!
Mike B. (about me) (learning how to github)