
All times are UTC




Posted: Fri Apr 14, 2023 6:54 pm

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
BigEd wrote:
If we're looking at the time to write the whole screen - to clear it, say - what's the corresponding time on a C64, or on a Beeb? I suspect it's more milliseconds than a frame's worth.


I just did a quick check using jsbeeb.

Empty loop takes 0.25 seconds for 1000 iterations

Code:
10TIME=0
20FORI%=1TO1000
30REM
40NEXT:PRINTTIME


Then I replaced the REM with CLS... It's hardly a benchmark, but good enough for comparing....

In mode 7 (1KB framebuffer) the time goes to 440. Subtract the loop overhead and it's 414, or 4.14 seconds, so about 4ms per clear (roughly 250 clears/sec). In mode 0 (20KB framebuffer) it's 5532, or 55.07 seconds after subtracting the overhead, so 55ms per clear or about 18 clears/sec.
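For anyone repeating the test: BBC BASIC's TIME counts in centiseconds, so the figures above convert as in this small Python sketch. It assumes the 25 cs empty-loop overhead applies to both modes; with that figure mode 7 comes out at 4.15ms rather than 4.14ms, which makes no practical difference.

```python
# BBC BASIC's TIME ticks in centiseconds (1/100 s).
LOOP_OVERHEAD_CS = 25      # empty FOR...NEXT loop, 1000 iterations
ITERATIONS = 1000


def per_clear_ms(total_cs, overhead_cs=LOOP_OVERHEAD_CS, n=ITERATIONS):
    """Milliseconds spent in one CLS after removing the empty-loop cost."""
    net_seconds = (total_cs - overhead_cs) / 100.0
    return net_seconds / n * 1000.0


for mode, total in (("MODE 7", 440), ("MODE 0", 5532)):
    ms = per_clear_ms(total)
    print(f"{mode}: {ms:.2f} ms per clear, about {1000.0 / ms:.0f} clears/sec")
```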

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Posted: Fri Apr 14, 2023 7:02 pm

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
Proxy wrote:
i'd say something like the IDT7201 would be a better fit. it's a unidirectional FIFO, 2 of those allow for fast bidirectional communication. with a status byte to read out the flags of the FIFOs it would take up 2 addresses on both sides.
on a side note i have been kinda fantasizing about a little cluster project with these FIFOs to create a ring network of multiple 65xx computers that would directly plug into each other. but so far it's nothing but a block diagram.


And FWIW (sort of continuing more off-topic, sorry): the idea of using a FIFO to talk between two 6502s is not new. Acorn did it back in the early 80's with the BBC Micro. It had a "Tube" interface, which was essentially a full-speed bus interface designed to go into a custom chip with a number of FIFOs, used by the 2nd processor to communicate back to the host processor - which at that point was nothing more than a "smart" terminal with access to screen, keyboard, disk, network and so on.

There were many other 2nd processors too - Z80, 32016 and of course the very first ARM. In these enlightened days a Pi Zero emulates all of them and more. If you want a 300MHz 6502, then it's there - or a PDP11, etc ...

And that's partly where I got the idea for my Ruby system too - in my case the 'host' is a combination of the ATmega which does disk and serial and the screen/keyboard is my Linux desktop with the '816 being the 'work' CPU...

For daisy-chained FIFOs you may wish to look up the Inmos Transputer and the whole communicating sequential processes thing... I have support for inter-process communication in my BCPL system, with plans to extend it to inter-board if/when I work out the right means to do so ...

And diving more into my past (and more off-topic) - Once upon a time I worked for a Supercomputer company who used Transputers and we had 2 graphics cards - applications would talk to the graphics card via transputer links to generate high resolution full colour video, so offloading video has been a thing, at least for me, since the 80's ...

-Gordon

Posted: Fri Apr 14, 2023 8:30 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
@revaldinho's CPC-Cplink project might be of interest. It uses through-hole DIPs, so that's 4 nibble-wide FIFO chips (74HC40105), but it does the job of connecting two systems.
https://github.com/revaldinho/cpc-cplink/wiki


Posted: Fri Apr 14, 2023 8:51 pm

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
BigEd wrote:
@revaldinho's CPC-Cplink project might be of interest. It uses through-hole DIPs, so that's 4 nibble-wide FIFO chips (74HC40105), but it does the job of connecting two systems.
https://github.com/revaldinho/cpc-cplink/wiki


Ah, neat.

I suspect that FIFOs are a little underrated - or maybe underutilised in projects... So rather than use a VIA-style one byte, strobe + ack interface, if I had a FIFO that was deep enough for the longest graphics transaction (defining a font character - 10 bytes?) then the 65xx could simply blast the data over, get on with something else, then poll for FIFO empty before sending the next lot over - hopefully allowing for some overlap in processing on both sides...
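A toy Python model of the "blast it over, then poll for empty" idea, not a hardware design. The Fifo class, its 16-byte depth and the one-byte-per-step drain rate of the graphics side are assumptions chosen to match the 10-byte font-definition example; the point is that the host only waits while the previous burst is still being consumed, instead of handshaking every byte.

```python
from collections import deque


class Fifo:
    """Toy model of a hardware FIFO with EMPTY and FULL flags."""

    def __init__(self, depth):
        self.depth = depth
        self.data = deque()

    @property
    def empty(self):
        return not self.data

    @property
    def full(self):
        return len(self.data) >= self.depth

    def write(self, byte):
        # A real FIFO typically ignores writes while FULL is asserted;
        # the model raises instead so the lost byte is visible.
        if self.full:
            raise OverflowError("FIFO full: byte would be lost")
        self.data.append(byte)

    def read(self):
        return self.data.popleft()


def send_transactions(fifo, transactions, drain_per_step=1):
    """Blast each transaction into the FIFO, then poll EMPTY before the next one.

    Each polling iteration stands for one time step in which the graphics side
    drains up to drain_per_step bytes. Returns (bytes the graphics side
    received, number of polling steps the host spent waiting)."""
    received, wait_steps = [], 0
    for txn in transactions:
        while not fifo.empty:                 # poll the EMPTY flag
            for _ in range(min(drain_per_step, len(fifo.data))):
                received.append(fifo.read())
            wait_steps += 1
        for byte in txn:                      # burst write, no per-byte handshake
            fifo.write(byte)
    while not fifo.empty:                     # graphics side finishes the last burst
        received.append(fifo.read())
    return received, wait_steps
```

The depth only has to cover the largest single transaction; anything deeper just lets the host queue more work before it has to poll.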

-Gordon

Posted: Sat Apr 15, 2023 12:32 pm

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
drogon wrote:
I suspect that FIFOs are a little underrated - or maybe underutilised in projects... So rather than use a VIA-style one byte, strobe + ack interface, if I had a FIFO that was deep enough for the longest graphics transaction (defining a font character - 10 bytes?) then the 65xx could simply blast the data over, get on with something else, then poll for FIFO empty before sending the next lot over - hopefully allowing for some overlap in processing on both sides...

-Gordon

hmm, i was thinking of something similar. if you're going for high resolution and low color depth, like monochrome 640x400 (80 bytes per line), then a 256 Byte FIFO can store around 3 lines worth of data. you could have an NMI trigger using the "Half Full" flag on the IDT7200, so the CPU would fill the FIFO to full, save where it left off in the video buffer, and then return from the interrupt.
plus a FIFO has the benefit of being asynchronous, so the video circuit and CPU don't need to share a clock. you could have a 25MHz Pixel clock but have the CPU running at a more stable 16-20MHz.
overall, thanks to a FIFO the CPU wouldn't constantly need to keep up with the video circuit, it just needs to be fast enough on average to keep the FIFO filled with data.
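A rough Python simulation of the half-full refill scheme, assuming the IDT7200's 256-byte depth, the 80-byte lines of 640x400 monochrome, and that the video side takes one byte per step. nmi_refill and simulate are hypothetical names, and the refill threshold is modelled simply as "at or below half full" rather than the exact flag timing from the datasheet.

```python
from collections import deque

BYTES_PER_LINE = 640 // 8      # monochrome 640x400: 1 bit per pixel -> 80 bytes per line
LINES = 400
FIFO_DEPTH = 256               # IDT7200 depth (the 9th bit is unused here)

LINES_IN_FIFO = FIFO_DEPTH / BYTES_PER_LINE   # 3.2 lines


class RefillState:
    """What the NMI handler keeps between interrupts: where it left off."""

    def __init__(self, framebuffer):
        self.fb = framebuffer
        self.pos = 0


def nmi_refill(state, fifo, depth=FIFO_DEPTH):
    """Hypothetical NMI handler: top the FIFO up to full, remember the position."""
    while len(fifo) < depth:
        fifo.append(state.fb[state.pos])
        state.pos = (state.pos + 1) % len(state.fb)   # wrap at the end of the frame


def simulate(framebuffer, steps, depth=FIFO_DEPTH):
    """Video side reads one byte per step; an NMI fires when the FIFO drops to half."""
    fifo, state, nmis, out = deque(), RefillState(framebuffer), 0, []
    nmi_refill(state, fifo, depth)            # prime the FIFO before scan-out starts
    for _ in range(steps):
        out.append(fifo.popleft())
        if len(fifo) <= depth // 2:           # half-full threshold reached -> NMI
            nmi_refill(state, fifo, depth)
            nmis += 1
    return out, nmis
```

Over one 32000-byte frame the handler runs 250 times, each time topping up 128 bytes, which is the "fast enough on average" requirement in concrete terms.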

but of course, that means you still run into the same bandwidth bottleneck of the CPU not being able to copy data fast enough at higher resolutions, even with tricks like using the W65C02's $_3 illegal instruction column as a fast 4-bit output port.
as an example: let's say you want 320x200 @ 256 colors using 640x480 as a base, powered by a 65816 @ 16MHz.
Each line would take 320 Bytes, so you would need a larger FIFO like the IDT7202 (1024 Bytes), which can hold ~3 lines worth of data.
A complete 640x480 line is 800 pixels long including blanking, which at a 25MHz pixel clock is 32µs (the same as 400 double-width pixels at 12.5MHz)... which translates to around 512 CPU cycles.
looking at those 3 lines of data that fit into the FIFO, you need to be able to load 960 Bytes within 1536 clock cycles to be able to run the display (= 0.625 Bytes/CPU cycle).
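The same line-budget arithmetic as a short Python sketch, using round figures (25MHz rather than VGA's 25.175MHz) and, as in the example, assuming each 320-byte line is sent once per scanline with no line doubling.

```python
def cycles_per_line(total_pixels, pixel_clock_mhz, cpu_mhz):
    """CPU cycles available during one scanline (blanking included)."""
    line_time_us = total_pixels / pixel_clock_mhz
    return line_time_us * cpu_mhz


def bytes_per_cycle_needed(bytes_per_line, total_pixels, pixel_clock_mhz, cpu_mhz):
    """Average bytes the CPU must push into the FIFO per CPU cycle."""
    return bytes_per_line / cycles_per_line(total_pixels, pixel_clock_mhz, cpu_mhz)


full_rate = cycles_per_line(800, 25.0, 16)    # 800 total pixels at 25 MHz
half_rate = cycles_per_line(400, 12.5, 16)    # 400 double-width pixels at 12.5 MHz
need = bytes_per_cycle_needed(320, 800, 25.0, 16)
print(full_rate, half_rate, need)             # 512.0 512.0 0.625
```

A 1 byte/cycle DMA path would leave 1 / 0.625 = 1.6 times the required rate as headroom.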

another idea is to use a FIFO paired with a tiny DMA-like Controller that can directly copy bytes from memory to the FIFO (so the CPU doesn't have direct access to the FIFO). such a circuit would likely fit into a small CPLD and reach speeds of 1 Byte/CPU Cycle. which would be fast enough for the example above.


overall i agree, FIFOs are pretty underutilized and there seems to be a lot of cool stuff you could do with them


Posted: Sat Apr 15, 2023 12:35 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Maybe a little OT, but I'm pretty sure if you look at games on 6502 systems they don't even try to write every pixel on every frame: they make efforts to modify less, less often, and efficiently. Some recent games are open source, and some historical games have been thoroughly reverse-engineered, so there's material out there to learn from.


Posted: Sat Apr 15, 2023 1:09 pm

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
true, but i feel like it makes sense to assume the worst-case timings for updating the screen (which is writing to all of it).
plus a lot of 8-bit consoles had at least some hardware acceleration like sprites and basic scrolling. for example scrolling on the NES requires you to write some bytes to VRAM every few frames or so and update some registers, while a DIY system with a framebuffer has to manually move every byte/pixel.

even with modern 65xx parts being literally 20x faster than the systems of old, without hardware acceleration of some kind you're just gonna be stuck at the same level of graphical power as really really old systems.


Posted: Sat Apr 15, 2023 5:44 pm

Joined: Mon Aug 30, 2021 11:52 am
Posts: 287
Location: South Africa
This is going to be a bit of a disjointed post because I started to reply to Sean last night and then ran off to play Bridge before I posted. And then continued replying but had to run off to the radio club this morning.

And now there are a whole bunch of other good posts, so I'm probably going to end up replying in pieces as I read through them.

Sean wrote:
have you considered using the Hitachi HD63484 ACRTC?
I must admit I had never even heard of it up until now. If it was still in production I'm pretty certain I would have used it, as would a lot of retro machines I'd think.

Another... 'thing' I'm trying to do is use only off-the-shelf parts. So no SID chips, no AY-3-8910 for sound. Basically if I want to do anything complicated for audio or video I'm left with the 65C02 and 65C816. Nothing else usable seems to be in production anymore.

I don't see 74LVC ICs going anywhere so that's predominantly what I'm using. A few 74ABT*, 74LCX and 74VHCT ICs too but only because they're cheaper and still perform fine for what I need; an LVC variant would work at least as well.

Slightly less pleasant are the 74F and FCT ICs I've had to use. They're still mass produced but tend to be slower, more expensive or power hungry. I would kill for a LVC193 instead of an FCT191. Likewise: why are there not LVC versions of the F283, F521, F20, F21 and F30? That would be so. very. useful. but I guess they're not generally used anymore :cry: .

Interestingly - and because it's mentioned above: I do use two kinds of HCT IC. One is the HCT40105 that buffers PS2 device input and the other is an HCT4040 that I tick with a 32kHz clock oscillator to get quarter(?) seconds or something**. I did originally use a 7201 to buffer keyboard and mouse input, but because it's such an expensive chip I used one for both the keyboard and the mouse and had to use its 9th bit to distinguish which was which. And it was just an absolute pain to keep one device off the 7201's input when the other was coincidentally writing. In the end it was cheaper and easier to use four HCT40105s instead. And speed really isn't an issue with PS2 devices.

* The SN74ABT ICs are typically not cheaper than their LVC equivalents, but RS Components mispriced some at a hundredth of what they should have been, so I bought the entire stock and now have more SN74ABT245s than I know what to do with. Really useful though.
** This will turn out to be not true in another post further down.


Posted: Sat Apr 15, 2023 6:58 pm

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
Proxy wrote:
true, but i feel like it makes sense to assume the worst-case timings for updating the screen (which is writing to all of it).
plus a lot of 8-bit consoles had at least some hardware acceleration like sprites and basic scrolling. for example scrolling on the NES requires you to write some bytes to VRAM every few frames or so and update some registers, while a DIY system with a framebuffer has to manually move every byte/pixel.

even with modern 65xx parts being literally 20x faster than the systems of old, without hardware acceleration of some kind you're just gonna be stuck at the same level of graphical power as really really old systems.


Hardware scrolling isn't difficult... for a certain level of difficulty! It was done back in the day on some systems too (BBC Micro?). You need the video generator to have a programmable "base" address and to be able to wrap the address output after a certain number of screen lines. Then you can scroll up by simply adding 8 lines' worth of width to the base address and clearing what was the top 8 lines, which are now the bottom 8 lines (lines being pixel-high lines, assuming an 8-pixel-high font - change as required). Horizontal scrolling can be achieved in a similar manner.

This also allows for double buffering in a fairly straightforward way.

It does slightly complicate the screen addressing as the base addresses change every time you scroll, but it's predictable, so can be catered for.
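A minimal Python model of base-address scrolling with wraparound. The 80x256-byte screen (640x256 at 1bpp) is an assumption; the point is that scrolling one text row costs a single base-register update plus clearing 8 lines (640 bytes), not moving the whole 20KB, and that every address calculation goes through the same base-plus-wrap step, which is the "predictable" complication mentioned above.

```python
WIDTH_BYTES = 80                 # bytes per pixel line (assumed: 640 px at 1 bpp)
HEIGHT = 256                     # pixel lines on screen (assumed)
SCREEN_BYTES = WIDTH_BYTES * HEIGHT
FONT_HEIGHT = 8                  # 8-pixel-high font, as in the post


class ScrollingScreen:
    def __init__(self):
        self.ram = bytearray(SCREEN_BYTES)
        self.base = 0            # the video generator's programmable start address

    def addr(self, x_byte, y):
        """Logical (x, y) -> physical address, wrapping the way the video hardware does."""
        return (self.base + y * WIDTH_BYTES + x_byte) % SCREEN_BYTES

    def scroll_up_one_text_row(self):
        # Bump the base address by 8 lines' worth of width...
        self.base = (self.base + FONT_HEIGHT * WIDTH_BYTES) % SCREEN_BYTES
        # ...then clear the old top 8 lines, which now appear at the bottom.
        for y in range(HEIGHT - FONT_HEIGHT, HEIGHT):
            for x in range(WIDTH_BYTES):
                self.ram[self.addr(x, y)] = 0

    def scanout(self):
        """The bytes the video generator would fetch, starting at base and wrapping."""
        return bytes(self.ram[(self.base + i) % SCREEN_BYTES] for i in range(SCREEN_BYTES))
```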

A 2D block-move engine can be used as a "cheap" sprite engine - you copy the screen RAM of the PxQ pixel area off-screen, copy the sprite to the same location, then to move the sprite, you undo the background copy and redo the sprite copy. It's fiddly but can work (it's how I do it in my Raspberry Pi bare-metal system, using the internal DMA engine to do a 2D memory move - it manages a few hundred small sprites in a full HD frame without too much issue on a Pi Zero). You can also use it to print bitmapped font text to the video RAM blindingly fast.
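A sketch of the save-under sprite technique using a software 2D block move in Python. blit and Sprite are illustrative names, the sprite is drawn opaque (no transparent pixels), and one byte per pixel is assumed; a hardware blitter would perform the same row-by-row copies with a source stride and a destination stride.

```python
def blit(src, src_stride, sx, sy, dst, dst_stride, dx, dy, w, h):
    """2D block move: copy a w x h rectangle of bytes between two buffers."""
    for row in range(h):
        s = (sy + row) * src_stride + sx
        d = (dy + row) * dst_stride + dx
        dst[d:d + w] = src[s:s + w]


class Sprite:
    def __init__(self, pixels, w, h):
        self.pixels, self.w, self.h = pixels, w, h
        self.saved = bytearray(w * h)   # off-screen copy of the background
        self.pos = None

    def draw(self, screen, stride, x, y):
        blit(screen, stride, x, y, self.saved, self.w, 0, 0, self.w, self.h)  # save under
        blit(self.pixels, self.w, 0, 0, screen, stride, x, y, self.w, self.h)  # draw sprite
        self.pos = (x, y)

    def erase(self, screen, stride):
        if self.pos is not None:
            x, y = self.pos
            blit(self.saved, self.w, 0, 0, screen, stride, x, y, self.w, self.h)  # restore
            self.pos = None

    def move(self, screen, stride, x, y):
        self.erase(screen, stride)      # undo the background copy...
        self.draw(screen, stride, x, y) # ...and redo the sprite copy elsewhere
```

With several overlapping sprites, they have to be erased in the reverse order they were drawn, or a restored background can wipe out another sprite.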

I also use the 2D block move for scrolling rather than try to fiddle with the hardware on the Pi thingy.

So I'd suggest working on a single cycle read or write 2D block move "blit" engine... Beats the 7 cycles per byte on the '816 block move instructions....

https://en.wikipedia.org/wiki/Blitter
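Rough numbers for why the per-byte cost matters, assuming Proxy's 16MHz 65816 and a 320x200 8bpp screen (64000 bytes). The 7 cycles per byte is the MVN/MVP cost quoted above; the single-cycle figure is for a hypothetical blitter.

```python
def copy_time_ms(nbytes, cycles_per_byte, cpu_mhz):
    """Time to move nbytes at a fixed cost per byte, in milliseconds."""
    return nbytes * cycles_per_byte / (cpu_mhz * 1000.0)


SCREEN = 320 * 200                      # 64000 bytes at 8 bpp (assumed example)
mvn = copy_time_ms(SCREEN, 7, 16)       # MVN/MVP: 7 cycles per byte
blit = copy_time_ms(SCREEN, 1, 16)      # hypothetical single-cycle blitter
frame_ms = 1000.0 / 60                  # one 60 Hz frame
print(mvn, blit, round(frame_ms, 1))    # 28.0 4.0 16.7
```

So a full-screen copy with MVN takes longer than a 60Hz frame, while a single-cycle engine fits comfortably.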

The alternative involves switching the RAM area the live video beam is fetched from to the sprite area when it reaches a particular XY position. This gives nice fast sprites that float over the background, but I feel the video hardware then becomes more complex, especially in TTL. However, one person's hardware is another's software, so ...

And also, given sufficient code you can do a surprising amount. How about the Bad Apple video at 640x512x1 (dithered) @ 50fps with 44kHz sound run live on a 2MHz BBC Micro computer?

https://www.youtube.com/watch?v=D_ta5QxBSMk

(I've seen it live - pretty amazing - very extreme)

-Gordon

Posted: Sat Apr 15, 2023 7:31 pm

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
that HD63484 ACRTC is looking pretty damn good. Utsource has lots of them in stock it seems, at pretty good prices too.
plus you can apparently chain them together... though i still don't fully understand how that would work in terms of VRAM and accessing it.
But besides that, i'm a bit confused about its operating speed.
for example if you were to try and generate 640x480 @ 8bpp, each memory access would grab 2 bytes worth of data (16-bit VRAM bus, so 2 pixels), and it takes 2 cycles per memory access. that works out to 1 pixel per cycle, which at a pixel clock of 25MHz would mean the chip would need to run at 25MHz as well.
that's a bit of an issue considering that it's rated for at most 8MHz... and it boasts resolutions of 4096x4096 @ 1bpp or 1024x1024 @ 16bpp. but how? are they just running the display itself much slower? or is it just handwavy marketing "technically it can do this, but no one does" nonsense?
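Proxy's bandwidth argument as a Python sketch. The 16-bit VRAM bus and 2 cycles per access are the figures from the post, not checked against the HD63484 datasheet; the calculation simply shows how the required controller clock scales with colour depth at a 25MHz pixel clock.

```python
def required_clock_mhz(pixel_clock_mhz, bpp, bus_bits=16, cycles_per_access=2):
    """Controller clock needed to fetch pixels as fast as the display consumes them."""
    pixels_per_access = bus_bits // bpp          # 16-bit bus at 8 bpp -> 2 pixels
    accesses_per_us = pixel_clock_mhz / pixels_per_access
    return accesses_per_us * cycles_per_access


for bpp in (1, 2, 4, 8, 16):
    print(bpp, required_clock_mhz(25.0, bpp))
# 1 -> 3.125, 2 -> 6.25, 4 -> 12.5, 8 -> 25.0, 16 -> 50.0
```

On those assumptions an 8MHz part keeps up with a 25MHz pixel clock only at 1 or 2bpp, which may be part of the answer to the advertised resolutions: large 1bpp modes, or much slower pixel clocks.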

either way there seem to be pretty much no projects around this chip online, so i'd love to see someone throw something together and see how well it works.

drogon wrote:
Hardware scrolling isn't difficult... for a certain level of difficulty! It was done back in the day on some systems too (BBC Micro?). You need the video generator to have a programmable "base" address and to be able to wrap the address output after a certain number of screen lines. Then you can scroll up by simply adding 8 lines' worth of width to the base address and clearing what was the top 8 lines, which are now the bottom 8 lines (lines being pixel-high lines, assuming an 8-pixel-high font - change as required). Horizontal scrolling can be achieved in a similar manner.

Yes, vertical scrolling can be done by offsetting all memory accesses by some fixed amount multiplied by the number of bytes that make up 1 line of pixels (or one line of characters). but horizontal scrolling is basically impossible if your frame buffer is stored linearly in memory, which it likely is in basic video circuits.

what i was thinking for both vertical and horizontal scrolling is making a canvas that is larger than the screen, and map that linearly into memory. the screen is then defined as a movable section within the canvas' limits. so now you got 2 sets of coordinates. the local screen coordinates that say where everything is relative to the screen, and the global screen coordinates which say where the screen sits on the canvas.
so to get the memory address you want to access you need to add the local and global screen coordinates together, and then convert the result to a linear address.
ultimately it's just 1 step more than when the screen itself is directly mapped to memory... but by adjusting the global screen coordinates you scroll around however you want.
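A minimal Python sketch of the canvas/viewport addressing described above, assuming one byte per pixel, a 1024x512 canvas and a 320x200 screen (all illustrative). to_address adds the local (screen) and global (viewport) coordinates and linearises the result; scroll moves the viewport and clamps it to the canvas.

```python
CANVAS_W, CANVAS_H = 1024, 512   # assumed canvas size in pixels, 1 byte per pixel
SCREEN_W, SCREEN_H = 320, 200    # visible screen


def to_address(local_x, local_y, view_x, view_y, base=0):
    """Screen-relative (local) + viewport-on-canvas (global) -> linear address."""
    cx = view_x + local_x
    cy = view_y + local_y
    return base + cy * CANVAS_W + cx


def scroll(view_x, view_y, dx, dy):
    """Move the viewport by (dx, dy), keeping the screen inside the canvas."""
    nx = max(0, min(CANVAS_W - SCREEN_W, view_x + dx))
    ny = max(0, min(CANVAS_H - SCREEN_H, view_y + dy))
    return nx, ny
```

Choosing a power-of-two canvas width turns the cy * CANVAS_W multiply into a shift, which matters if the address adder lives in hardware.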

of course smooth scrolling is a completely different can of worms that requires bit manipulation, accessing a variable number of bytes per line, and a lot of headaches.

drogon wrote:
So I'd suggest working on a single cycle read or write 2D block move "blit" engine... Beats the 7 cycles per byte on the '816 block move instructions....

hmm, i wonder how useful a CPLD-based blitter would be if it's limited to moving whole bytes only, since if 1 byte contains multiple pixels you wouldn't be able to move individual columns of pixels.
it might take a lot of counters and adders to pull off a blitter, so i'm not sure if you could squeeze one into a CPLD in the first place.


Posted: Sat Apr 15, 2023 7:56 pm

Joined: Mon Aug 30, 2021 11:52 am
Posts: 287
Location: South Africa
drogon wrote:
Sounds like you're after the holy grail of retro graphics devices. Something that might fit the period - but at the same time you're after today's resolution and colours...
I really didn't mean to! :shock:

What started me on this journey was kinda the PE6502. But a bit before that I'd been idly wanting to build a computer I could understand. Initially I thought of using a RISC-V processor and just as quickly went "Nope". It was far too complicated. Then the PE6502 made me decide to go back to my 6502 roots and early discussion around the X16 at the time was still about using the '816.

So I decided fine, I'll build a Commodore 64-type thing with off-the-shelf sound and graphics chips and a bunch more memory. How hard could that possibly be? I mean, megabyte single-IC SRAMs are a thing. I can use a GAL for address decoding. Slap the sound and video chips in there somewhere. And, boom, a retro-ish computer I can understand.

That didn't work out so well.

Not only did I think electronics was a magical world of things just working but I also soon found out there are no in-production sound or graphics chips that I could use. Fortunately I didn't know what I didn't know and - having a background in the game development industry - set out to build a simple frame buffer device that could do buffer flipping because that's what I'd always done on PCs. I simulated everything in Logisim, prototyped the bits I wasn't sure about and discovered that it actually seemed doable. About three years passed in that last sentence but it does actually seem doable.

If I can design a solid hi-res frame buffer video device that hobbyists can build, troubleshoot and use in retro projects then I'm happy. Couple that with a hardware pixel blitter device to handle the heavy-lifting of pixel drawing and then I'm even happier. Ultimately I'm still working towards a full modern 'home computer' style system but that's years away and I'm stuck in the software side at the moment.

Can I provide this holy grail? I really hope so because I need it for my own system!

drogon wrote:
Thinking (out loud) about options - a separate CPU and video RAM - how to implement it - a high speed parallel interface (Pair of VIAs?) or alternate phases of the clock on a shared memory system? Dual-port RAM (expensive!)
I have a partly drafted post where I'm going to try and share my thoughts on this implementation. It's quite long, so I'll post it later tomorrow or Monday.

drogon wrote:
The solution of "throw a Pi/Pico/ESP32/Arduino/Propeller at it" is popular for other projects.
I use the Pi and Pi Pico extensively for testing and prototyping things, as well as some CPLDs, but I just don't like the thought of using them in my final system. I haven't managed to verbalise this dislike well but I suspect it's something that most on this forum could understand even if they don't agree.


Posted: Sat Apr 15, 2023 8:03 pm

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
Proxy wrote:
drogon wrote:
So I'd suggest working on a single cycle read or write 2D block move "blit" engine... Beats the 7 cycles per byte on the '816 block move instructions....

hmm, i wonder how useful a CPLD-based blitter would be if it's limited to moving whole bytes only, since if 1 byte contains multiple pixels you wouldn't be able to move individual columns of pixels.
it might take a lot of counters and adders to pull off a blitter, so i'm not sure if you could squeeze one into a CPLD in the first place.


I'd like to think that the custom IC/ASIC they were made from in 1982 may be possible today with a reasonable CPLD. Sub-byte moves are tricky, though, and if it were me I'd try to avoid them because of the read/modify/write cycle needed from the CPU to set a pixel when using 1, 2 or 4 bits per pixel. However, the Xerox Alto, where the original blitter term came from, did it at the bit level in 1973, but I've no idea what that looked like hardware-wise - I imagine a "Titanic" of TTL rather than a small raft...
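The read/modify/write step mentioned above, sketched in Python for a packed frame buffer. It assumes the leftmost pixel of each byte is in the most significant bits (layouts differ between machines); set_pixel is a hypothetical helper, not any particular system's routine.

```python
def set_pixel(fb, stride_bytes, x, y, colour, bpp):
    """Read-modify-write one pixel in a packed 1/2/4/8 bpp frame buffer."""
    if bpp not in (1, 2, 4, 8):
        raise ValueError("bpp must be 1, 2, 4 or 8")
    pixels_per_byte = 8 // bpp
    addr = y * stride_bytes + x // pixels_per_byte
    # Leftmost pixel in the high bits: shift grows as x moves left within the byte.
    shift = (pixels_per_byte - 1 - x % pixels_per_byte) * bpp
    mask = ((1 << bpp) - 1) << shift
    old = fb[addr]                                        # read
    new = (old & ~mask & 0xFF) | ((colour << shift) & mask)   # modify
    fb[addr] = new                                        # write
```

At 8bpp the mask covers the whole byte and the read becomes redundant, which is why byte-granular blitting is so much simpler.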

Cheers,

-Gordon

Posted: Sat Apr 15, 2023 8:25 pm

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
AndrewP wrote:
What started me on this journey was kinda the PE6502. But a bit before that I'd been idly wanting to build a computer I could understand. Initially I thought of using a RISC-V processor and just as quickly went "Nope". It was far too complicated. Then the PE6502 made me decide to go back to my 6502 roots and early discussion around the X16 at the time was still about using the '816.


Going off-topic (again), the RISC-V is remarkably easy to use. Nice to write assembly code for, great C compiler support, fast ... To learn about it I wrote an emulator for it in the BCPL that runs on my Ruby 816 system, then re-wrote the bytecode VM that runs my BCPL system, then ported my entire OS over to it - running on my emulator on the '816. It worked very well, if a little slow.

However, real RISC-V CPUs are almost impossible to actually get. They're either small 32-bit systems designed for embedded use - say 32KB of RAM - or 64-bit multi-core systems with GBs of RAM aimed at Linux. The best I've found so far is the ESP32-C3 with 400KB of RAM.

Unless you go down the FPGA route - which I'm trying, but currently don't have enough spare brain cells for... But if someone could come up with a real retro system in e.g. a DIP-64 package (you don't need all 32 bits of address) then who knows.

-Gordon

Posted: Sat Apr 15, 2023 8:59 pm

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
to throw my 2 cents in, a while back i designed a RISC-V CPU in Logisim that is very basic.
no extensions, no pipelining, an 8-bit data bus, and a custom Interrupt system since the official one is too complex for me to implement.
of course because of the 8-bit data bus, execution is rather slow, each instruction takes between 4 and 8 cycles to complete.

the Interrupt system is very bare bones, a single interrupt input, and an extra instruction that can return from an interrupt.
when an interrupt occurs the CPU finishes the current instruction, saves the PC to a hidden register, jumps to address $00000100, and disables interrupts.
executing the RTI Instruction simply loads the value from the hidden register back into the PC, and re-enables interrupts.
there is no manual way to disable interrupts, you'd have to use external circuitry to prevent the signal from reaching the CPU to simulate an internal "disable flag".
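A behavioural model of the custom interrupt scheme in Python, written from the description above only: it is not the official RISC-V trap/CSR mechanism, and the class and method names are invented. Because there is a single hidden return register and interrupts are disabled on entry, interrupts cannot nest.

```python
IRQ_VECTOR = 0x00000100


class InterruptUnit:
    """Model of the custom scheme: one IRQ input, a hidden PC register, and RTI."""

    def __init__(self):
        self.pc = 0
        self.saved_pc = 0        # hidden register, not visible to software
        self.enabled = True      # no instruction can clear this directly

    def step(self, irq_line, next_pc):
        """Finish the current instruction (PC becomes next_pc), then maybe take the IRQ."""
        self.pc = next_pc
        if irq_line and self.enabled:
            self.saved_pc = self.pc      # return address = next instruction
            self.pc = IRQ_VECTOR
            self.enabled = False         # disabled until RTI

    def rti(self):
        """The extra instruction: restore PC from the hidden register, re-enable."""
        self.pc = self.saved_pc
        self.enabled = True
```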

the plan was to someday get this thing on an FPGA and maybe even add an extension like Compressed Instructions, which would help with performance.
I'll append the whole project, including microcode and the "information" text file which basically just contains some of my thought process about instruction decoding... it also has some CSR (Control/Status Register) stuff in it, which was the point when i decided to not bother and just do a custom interrupt system.


Attachments:
RISC-V Logisim.zip [120.01 KiB]
Posted: Sun Apr 16, 2023 8:01 am

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 983
Location: Potsdam, DE
There is a question which needs to be considered with any graphics system: does the main processor share memory with it, and write directly to that memory to generate an image, or is the graphics unit a stand-alone part which is communicated with by e.g. SPI, serial, or a shared memory block with commands to be performed (e.g. draw a dot here; draw a line from here to here; print this text in font_x, size_y, base line and bounding box...)?
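To make the second option concrete, here is a sketch of a command stream in Python. The opcodes, field widths and little-endian byte order are invented for illustration; the post names the kinds of commands, not an encoding.

```python
import struct

# Hypothetical opcodes: the post names the operations, not an encoding.
CMD_PLOT = 0x01
CMD_LINE = 0x02


def plot(x, y, colour):
    """Host side: 'draw a dot here' as opcode + 16-bit x, y + 8-bit colour."""
    return struct.pack("<BHHB", CMD_PLOT, x, y, colour)


def line(x0, y0, x1, y1, colour):
    """Host side: 'draw a line from here to here'."""
    return struct.pack("<BHHHHB", CMD_LINE, x0, y0, x1, y1, colour)


def decode(stream):
    """Graphics side: yield (name, args) tuples from a byte stream."""
    i = 0
    while i < len(stream):
        op = stream[i]
        if op == CMD_PLOT:
            yield ("plot", struct.unpack_from("<HHB", stream, i + 1))
            i += 6
        elif op == CMD_LINE:
            yield ("line", struct.unpack_from("<HHHHB", stream, i + 1))
            i += 10
        else:
            raise ValueError(f"unknown opcode {op:#x}")
```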

Either way, it can all get very complicated very quickly.

My own inclination is to use a hardware approach in which the graphics system does little more than generate a byte address and sync pulses, but I appreciate that this is not the view that many would take. There does seem little sense in designing something incredibly complex when the answer is probably 'bolt a pi on the side'. But that said, there's what I do in the day job, with its own constraints and limitations, and there's what I do for fun...

Neil

