6502.org Forum
It is currently Fri May 10, 2024 4:33 am

All times are UTC




[ 143 posts ]
Posted: Sat Aug 27, 2022 7:43 pm
Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
Alright, the bootloader works! Both my assembly and C environments are functional now, too!

Attachment: ttermpro_G89EieLjCO.jpg

And as the first actual program: the Mandelbrot set in beautiful grayscale text mode, at a resolution of 700x200 "pixels".
It's obviously not perfect; this image took an hour to render, and there is some noise and that weird line.
But it does run!
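
The program itself isn't shown, but a text-mode "grayscale" Mandelbrot generally comes down to an escape-time loop plus a ramp of increasingly dense characters. Here is a minimal C sketch of that idea; the ramp, the plotted region, the iteration limit and all function names are arbitrary assumptions, not what ran on the board:

```c
#include <stdio.h>

/* Escape-time iteration count for c = cr + ci*i, capped at max_iter. */
int mandel_iter(double cr, double ci, int max_iter) {
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;  /* real part of z^2 + c */
        zi = 2.0 * zr * zi + ci;            /* imaginary part of z^2 + c */
        zr = t;
        n++;
    }
    return n;
}

/* Map an iteration count onto a light-to-dense character ramp. */
char shade(int n, int max_iter) {
    static const char ramp[] = " .:-=+*#%@";
    const int levels = (int)sizeof ramp - 1;       /* 10 characters, not counting '\0' */
    if (n >= max_iter) return ramp[levels - 1];    /* never escaped: densest character */
    return ramp[n * (levels - 1) / max_iter];
}

/* Print a cols x rows character image of the region [-2.5, 1] x [-1, 1]. */
void mandel_print(int cols, int rows, int max_iter) {
    for (int y = 0; y < rows; y++) {
        for (int x = 0; x < cols; x++) {
            double cr = -2.5 + 3.5 * x / cols;
            double ci = -1.0 + 2.0 * y / rows;
            putchar(shade(mandel_iter(cr, ci, max_iter), max_iter));
        }
        putchar('\n');
    }
}
```

On a PC, `mandel_print(80, 24, 64);` prints a small version. On a CPU without an FPU, fixed-point arithmetic is the usual substitute for the doubles.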


Posted: Mon Sep 26, 2022 12:44 pm
Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
Small update: I made a GitHub repo with all the important files for this project, but I still need to update it, since I've changed some of the logic and the ROM code.

In terms of software, I've been fighting with a FAT32 file system for an SD card (on an Arduino module) that is hooked up to VIA Port A.
I've ported FatFs, a highly portable FAT32 C library, and after writing some bit-banging routines for the SPI interface it works perfectly fine: I can open/create files and access their contents. But its ~30kB program size is not ideal for actual programs...
So I tried option 2: fat32_x16, the FAT32 library used in the Commander X16, written entirely in assembly (~5kB program size). All you really need to supply is your own byte transfer function, but I haven't been able to get it running.
I know it's written for the 65C02, but I feel like it should still run perfectly fine on a 65816 with all registers set to 8-bit mode, as long as the assembler knows it should assemble for the 65816. Using some debugging print statements I can see the code sending commands to the SD card, but the card never responds (it always reads back $FF). With the same print statements added to the FatFs C program, the SD card does respond:
Code:
C program:
Sending: 40, Receiving: FF <- Command $40
Sending: 00, Receiving: FF
Sending: 00, Receiving: FF
Sending: 00, Receiving: FF
Sending: 00, Receiving: FF <- 32-bit Operand, all 0
Sending: 95, Receiving: FF <- CRC checksum
Sending: FF, Receiving: FF
Sending: FF, Receiving: 01 <- SD Card responds
Sending: FF, Receiving: FF

Assembly program:
40>>FF <- Command $40
00>>FF
00>>FF
00>>FF
00>>FF <- 32-bit Operand, all 0
95>>FF <- CRC checksum
FF>>FF
FF>>FF
FF>>FF
...    <- SD Card never responds
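
For reference, both logs show the standard SD-card CMD0 (GO_IDLE_STATE) frame in SPI mode: 0x40 | command index, a 32-bit argument sent MSB first, and 0x95, the valid CRC byte for CMD0. After that the host keeps clocking out 0xFF until an R1 byte with bit 7 clear comes back (0x01 = in idle state). A hedged C sketch of that exchange; `spi_xfer_fn`, `sd_command` and `fake_card_xfer` are hypothetical names, and the fake card only replays the C log so the logic can be run on a PC:

```c
#include <stdint.h>

/* Byte-exchange hook; on the real machine this would be the bit-banged transfer. */
typedef uint8_t (*spi_xfer_fn)(uint8_t out);

/* Send one SPI-mode SD command frame and poll for the R1 response.
   Returns R1 (bit 7 clear), or 0xFF if the card never answered. */
uint8_t sd_command(spi_xfer_fn xfer, uint8_t cmd, uint32_t arg, uint8_t crc) {
    xfer((uint8_t)(0x40 | (cmd & 0x3F)));  /* start bits 01 + 6-bit command index */
    xfer((uint8_t)(arg >> 24));            /* argument, MSB first */
    xfer((uint8_t)(arg >> 16));
    xfer((uint8_t)(arg >> 8));
    xfer((uint8_t)arg);
    xfer(crc);                             /* CRC7 plus end bit, e.g. 0x95 for CMD0 */
    /* The card may take up to 8 filler bytes before R1 appears. */
    for (int i = 0; i < 8; i++) {
        uint8_t r1 = xfer(0xFF);
        if ((r1 & 0x80) == 0) return r1;
    }
    return 0xFF;
}

/* Hypothetical stand-in card: 0xFF for the 6 frame bytes and the first poll,
   then R1 = 0x01 once, matching the C log above; 0xFF forever after. */
static int fake_pos = 0;
uint8_t fake_card_xfer(uint8_t out) {
    (void)out;
    return (fake_pos++ == 7) ? 0x01 : 0xFF;
}
```

Calling `sd_command(fake_card_xfer, 0, 0, 0x95)` returns 0x01 the first time; a second call sees only 0xFF, which is exactly the "never responds" pattern from the assembly log.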

I thought one possibility could be that the CPU is just too fast for the SPI interface, which would explain why the slower C version works. So I slowed the CPU down, both with delay loops and by swapping the oscillator, but neither helped. Plus, SD cards in SPI mode are specified for clock rates up to ~25MHz, and a bit-banged clock is far below that, so the CPU speed shouldn't cause issues anyway. The VIA pins are also initialized correctly, exactly like in the C program.

So the only other thing I can think of is that something is wrong with my transfer code:
Code:
SPI_SCK     = %00000001
SPI_MOSI    = %00000010
SPI_MISO    = %10000000

SPI_SCK_N   = %11111110
SPI_MOSI_N  = %11111101
SPI_MISO_N  = %01111111

spi_read:
    LDA #$FF
spi_transfer:
    ; debug $0A
    ; JSR PRINT_H8
    ; debug '>'
    PHX
    PHY                 ; Save X and Y
    LDY #8
    LDX #0              ; Clear the Output Byte
@1: ASL A               ; Shift the MSB -> C
    PHA                 ; Save the Input Byte
    BCC @2              ; Check the Carry flag
    LDA f:VIA_ORA
    ORA #SPI_MOSI       ; if it's 1 set the MOSI line high
    STA f:VIA_ORA
    BRA @3
@2: LDA f:VIA_ORA
    AND #SPI_MOSI_N     ; if it's 0 set the MOSI line low
    STA f:VIA_ORA
@3: NOP
    LDA f:VIA_ORA
    ORA #SPI_SCK        ; Set the Clock line high
    STA f:VIA_ORA
    LDA f:VIA_ORA
    AND #SPI_MISO       ; Get the MISO line
    ASL A               ; And put it into the Carry flag
    TXA                 ; Get the Output Byte into A
    ROL A               ; Shift C -> LSB of the Output Byte
    TAX                 ; And Save the Output Byte again
    LDA f:VIA_ORA
    AND #SPI_SCK_N      ; Set the Clock line low again
    STA f:VIA_ORA
    PLA                 ; Restore the Input Byte for the next loop
    DEY                 ; Decrement the Counter by 1
    BNE @1              ; And Check if it reached 0
    ; debug '>'
    LDA #SPI_IDLE
    STA f:VIA_ORA
    TXA                 ; If it did reach 0, get the Output Byte
    ; JSR PRINT_H8
    PLY
    PLX                 ; And restore X and Y
RTS

All the debug prints are commented out, and yes, I did check that all of the symbols like VIA_ORA are correct. Here's the C version for comparison:

Code:
uint8_t SPI_transfer(uint8_t value){
   volatile uint8_t out = 0;
   
   for (int8_t i = 7; i > -1; i--){
      writeMOSI(bitread(value, i));   // Write the input bit to the MOSI line
      writeSCK(1);               // set the clock high
      bitwrite(out, i, getMISO());   // read the MISO line and write the bit to the output
      writeSCK(0);               // set the clock low
   }
   // printf("Sending: %X, Receiving: %X\n", value, out);
   return out;
}

writeMOSI and writeSCK just read the VIA_ORA register, set/clear the specific bit, and write the value back. getMISO just reads the VIA_ORA register and masks out all other bits. The assembly version should work basically the same way, just inline instead of as separate functions.
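
To make the comparison concrete, here is one way those helpers could look in C, run against a software loopback (MOSI wired back to MISO) so the bit order can be checked on a PC. `via_ora` is a plain variable standing in for the VIA's Port A register, and the loopback inside `writeSCK()` is test scaffolding only; neither comes from the original code, and the helper bodies are a sketch of the description above rather than the actual implementation:

```c
#include <stdint.h>

/* Bit masks from the assembly listing above. */
#define SPI_SCK  0x01u
#define SPI_MOSI 0x02u
#define SPI_MISO 0x80u

/* Hypothetical stand-in for Port A; on the board this is the VIA's ORA register. */
static uint8_t via_ora = 0;

/* Read-modify-write: read ORA, change one bit, write it back. */
static void writeMOSI(int bit) {
    uint8_t v = via_ora;
    via_ora = bit ? (uint8_t)(v | SPI_MOSI) : (uint8_t)(v & ~SPI_MOSI);
}

static void writeSCK(int bit) {
    uint8_t v = via_ora;
    via_ora = bit ? (uint8_t)(v | SPI_SCK) : (uint8_t)(v & ~SPI_SCK);
    /* Test scaffolding only: emulate a MOSI->MISO wire latched on the rising edge. */
    if (bit) {
        via_ora = (via_ora & SPI_MOSI) ? (uint8_t)(via_ora | SPI_MISO)
                                       : (uint8_t)(via_ora & ~SPI_MISO);
    }
}

static int getMISO(void) {
    return (via_ora & SPI_MISO) ? 1 : 0;
}

/* MSB-first transfer: set MOSI, raise SCK, sample MISO, lower SCK. */
uint8_t spi_transfer(uint8_t value) {
    uint8_t out = 0;
    for (int i = 7; i >= 0; i--) {
        writeMOSI((value >> i) & 1);
        writeSCK(1);
        out = (uint8_t)(out | (getMISO() << i));
        writeSCK(0);
    }
    return out;
}
```

With the loopback in place, every byte should come back unchanged (e.g. 0x40 and 0x95 from the CMD0 frame), which is a quick way to rule out shift-direction mistakes before looking at the real card.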

Maybe someone can see something wrong with my assembly routine, or something else I'm missing. I think I'll just have to hook up my oscilloscope and see what exactly is happening on the SPI lines.

On another note, I've also been making plans for a VGA card: 320x200 @ 256 colors, with 64kB of VRAM mapped from $FF0000 to $FFFFFF.
The main problem is accessing the VRAM, as the card and the CPU run at different clock speeds. I already have some ideas for write-only interfaces, for example:
The video circuit only accesses VRAM every second cycle, so when the CPU writes to VRAM, the write circuit would use the wait state signal to hold the CPU clock high until the next available VRAM cycle, write the data to the specified address, and then release the CPU clock again. On average I think that would make a VRAM write ~3 cycles long. At 20MHz, using the MVN/MVP instructions (7 cycles per byte) plus that wait would give 1 byte every ~10 cycles, i.e. a transfer rate of ~1.9MB/s. Sadly that crushes my dreams of 60 FPS video output, as rewriting a full 320x200 frame 60 times a second needs 320 x 200 x 60 = 3,840,000 bytes/s (~3.7MB/s) of bandwidth. But it should still be good enough for basic drawings, demos, and GUIs.


Anyway, I just thought I'd post a little update on what I've been doing with this thing for the past month or so.


Posted: Mon Sep 26, 2022 1:52 pm
Joined: Thu May 28, 2009 9:46 pm
Posts: 8178
Location: Midwestern USA
Is perchance the 65C02 assembly language version using any of the Rockwell extensions, e.g., BBR, SMB, etc? The opcodes for those map to completely different operations on the 65C816. The 816’s emulation mode is not exactly like a real 65C02, as the 816-specific instructions work in either mode, although with varying effects.
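
To illustrate the clash: all 32 Rockwell bit opcodes live in the $x7 and $xF columns, which the 65C816 uses for the long-addressing forms of ORA/AND/EOR/ADC/STA/LDA/CMP/SBC. A small C sketch that decodes both meanings of such an opcode (`describe_rockwell_opcode` is a made-up helper, not part of any assembler):

```c
#include <stdint.h>
#include <stdio.h>

/* Writes the 65C02 (Rockwell/WDC) and 65C816 meanings of `op` into the buffers.
   Returns 0 if `op` is not one of the $x7 / $xF Rockwell bit opcodes. */
int describe_rockwell_opcode(uint8_t op, char *c02, size_t c02_len,
                             char *c816, size_t c816_len) {
    static const char *const alu[8] = {"ORA", "AND", "EOR", "ADC",
                                       "STA", "LDA", "CMP", "SBC"};
    unsigned hi = op >> 4, lo = op & 0x0F;
    if (lo != 0x7 && lo != 0xF) return 0;
    unsigned bit = hi & 7;
    /* 65C02: $07-$77 RMBn, $87-$F7 SMBn, $0F-$7F BBRn, $8F-$FF BBSn */
    if (lo == 0x7)
        snprintf(c02, c02_len, "%s%u", hi < 8 ? "RMB" : "SMB", bit);
    else
        snprintf(c02, c02_len, "%s%u", hi < 8 ? "BBR" : "BBS", bit);
    /* 65C816: even rows are [dp] / long, odd rows the Y / X indexed variants */
    const char *mode = (lo == 0x7) ? ((hi & 1) ? "[dp],Y" : "[dp]")
                                   : ((hi & 1) ? "long,X" : "long");
    snprintf(c816, c816_len, "%s %s", alu[hi >> 1], mode);
    return 1;
}
```

For example, $07 is RMB0 on a 65C02 but ORA [dp] on the 816, and $FF is BBS7 versus SBC long,X, so a stray bit instruction silently becomes a completely different memory access.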

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Mon Sep 26, 2022 3:14 pm
Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
If there were any instructions that the 65816 doesn't know, then the assembler would've simply thrown unknown instruction errors since it's set to assemble for the 65816 specifically.
But I did also manually check if there were any Rockwell instructions, and there weren't any :twisted: .


Posted: Mon Sep 26, 2022 4:55 pm
Joined: Thu May 28, 2009 9:46 pm
Posts: 8178
Location: Midwestern USA
Proxy wrote:
If there were any instructions that the 65816 doesn't know, then the assembler would've simply thrown unknown instruction errors since it's set to assemble for the 65816 specifically.

But I did also manually check if there were any Rockwell instructions, and there weren't any :twisted: .

Something else to watch out for is the behavior of direct page. In emulation mode, indexing past $FF causes an effective wrap to $00, same as a real 65C02. In native mode, that doesn't happen. I’ve seen a few 65C02 programs that intentionally :?: indexed beyond $FF. Needless to say, that technique won’t fly with the 816.
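
A simplified C model of that difference, covering plain `dp,X` / `dp,Y` indexing only (the indirect modes have their own quirks). It assumes the documented rule that the emulation-mode wrap applies only when the low byte of D is zero; `dp_indexed_ea` is a hypothetical name:

```c
#include <stdint.h>

/* Effective bank-0 address of a direct-page indexed access such as LDA dp,X.
   D = direct page register, dp = 8-bit operand, index = X or Y. */
uint16_t dp_indexed_ea(uint16_t D, uint8_t dp, uint16_t index, int emulation) {
    if (emulation && (D & 0x00FF) == 0) {
        /* Emulation mode with page-aligned D: wraps inside the page like a 65C02. */
        return (uint16_t)((D & 0xFF00) | ((dp + index) & 0xFF));
    }
    /* Native mode (or emulation with D not page-aligned): carries past $FF. */
    return (uint16_t)(D + dp + index);
}
```

With D = $0000, `LDA $F0,X` and X = $20 reads $0010 in emulation mode but $0110 in native mode, which is exactly the kind of access a 65C02 program relying on the wrap would get wrong.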

Posted: Mon Sep 26, 2022 9:25 pm
Joined: Fri Apr 15, 2022 1:56 pm
Posts: 45
Location: San Antonio, TX, USA
Proxy wrote:
Code:
    ...
    LDA #SPI_IDLE
    STA f:VIA_ORA
    ....

A long shot, but are you really intending to overwrite all of the ORA bits with the SPI_IDLE value (vs. just the SCK and MOSI bits)? Maybe the chip select line isn't being set correctly at the right time.

Otherwise I can't see issues with the code presented. I'd agree the oscilloscope is a good next step. For me it was a big help with getting the interface code working for an SPI graphical display.
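
A masked version of that final write would look roughly like this in C. `SPI_CS` and its bit position are invented for illustration, and `spi_set_idle` is a hypothetical name; the point is a read-modify-write that touches only SCK and MOSI (on the 65816 that is LDA / AND / ORA / STA instead of a plain LDA # / STA):

```c
#include <stdint.h>

#define SPI_SCK  0x01u  /* from the assembly listing */
#define SPI_MOSI 0x02u  /* from the assembly listing */
#define SPI_CS   0x04u  /* hypothetical: wherever the card's chip select really is */

/* Return the new Port A value: SCK and MOSI take their bits from `idle`,
   every other bit (e.g. chip select) keeps its current state. */
uint8_t spi_set_idle(uint8_t port, uint8_t idle) {
    const uint8_t mask = SPI_SCK | SPI_MOSI;
    return (uint8_t)((port & ~mask) | (idle & mask));
}
```

Even if `idle` has every bit set, a chip select that is currently low (card selected) stays low.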


Posted: Fri Sep 30, 2022 10:01 pm
Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
Of course... as soon as I hooked up the oscilloscope, the assembly code started working.
And now, even without the scope, both the assembly and the C version seem to be working consistently.

Hmm... oh well.
I'll just not think about it too much.
Now I just have to see how the whole file opening/reading stuff works with this library.


Posted: Sun Oct 02, 2022 9:05 pm
Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
Alright, I somehow broke the file system code while trying to 65816-fy it, and now when I open and read from a file, instead of the actual file contents it gives me the volume name...
Which is very... interesting, but also clearly wrong.
So I think that's a good stopping point for now.

For now I'll focus on adding some debugging features to the ROM (like a BRK handler that prints out the contents of all registers), and then I'll start making the schematic for the video card.
While thinking about the interface, I came across the IDT720x, a 256/512/1024-word-deep, 9-bit-wide unidirectional FIFO buffer. 3 of these in parallel are enough for the address and data bus.
The circuit for handling the FIFO would be similar to the one using the tri-state buffers, but the write circuit would only wait-state the CPU if it tries to write to the FIFO while it's full (or half full for the 1kB version).
So basically the video card would have an onboard write buffer of 256/512/1024 entries, and as long as the CPU doesn't fill it up, the CPU won't be slowed down.
Since the FIFO is asynchronous, I can have the video side read from it as fast as possible: 1 byte every 2 cycles @ 25MHz means 12.5 million bytes/s (~11.9MB/s).
That is much faster than the CPU's ~2.72MB/s using the MVN/MVP instructions (20MHz / 7 cycles per byte).
So in theory the CPU should be able to continuously copy data into the buffer without it ever filling up. Is that actually true, or did I screw up the math somewhere?
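
The claim can be sanity-checked with a crude time-stepped simulation: the CPU pushes one byte every 7 cycles at 20MHz (350ns, MVN/MVP with no wait states) and the video side pops one every 2 cycles at 25MHz (80ns). The 1ns step and the name `fifo_peak_fill` are just for this sketch; when the FIFO is full the writer simply skips its slot, as a rough stand-in for a wait state:

```c
#include <stdint.h>

/* One writer and one reader on independent clocks, periods in nanoseconds.
   Returns the highest fill level reached during `duration_ns`. */
unsigned fifo_peak_fill(unsigned write_period_ns, unsigned read_period_ns,
                        unsigned depth, unsigned long duration_ns) {
    unsigned fill = 0, peak = 0;
    for (unsigned long t = 0; t < duration_ns; t++) {
        if (t % write_period_ns == 0 && fill < depth) {
            fill++;                     /* CPU pushes one byte */
            if (fill > peak) peak = fill;
        }
        if (t % read_period_ns == 0 && fill > 0)
            fill--;                     /* video side pops one byte */
    }
    return peak;
}
```

Over 1ms, `fifo_peak_fill(350, 80, 256, 1000000)` never sees more than one entry in the FIFO, so under these assumptions the math holds. Swapping the periods (`fifo_peak_fill(80, 350, 256, 1000000)`) fills it to all 256 entries instead, which is the case where the depth and the Full flag start to matter.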

Either way, these chips seem like the perfect interface for this. I'll just have to see if I can fit all of this into a single CPLD's worth of pins (ATF1508).
Any thoughts about other interfaces or such are always welcome!


Posted: Sat Oct 22, 2022 2:30 pm
Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
OK, I redid some things: I'm now going for 640x400 monochrome. I think a higher resolution is going to be more useful than lots of colors... since I likely won't run any good-looking games on this thing anyway.
But while I was redoing the VGA logic, I got some ideas about hardware acceleration for basic things like drawing lines, triangles, and rectangles, filling shapes, etc.
But I won't be able to do any of those in a CPLD... so I did some googling and came across this thread:
viewtopic.php?f=4&t=6517
which talks about using a 65C02 to directly generate a VGA signal (H-Sync, V-Sync, and video out).

Which got me thinking: what if I had a 65C02 share the video memory bus with the video circuit? Then the CPU could handle all of the interfacing with the main system and do "hardware" acceleration like mentioned above, while the video circuit handles the VGA side and periodically (every 8 cycles) reads a byte from memory.
Something like this:
Attachment: draw.io_cBuIjnpWoS.png

In terms of memory, there will be 64kB of RAM, of which 31.75kB (starting at $8000) are used by the video circuit (the frame itself is 640 x 400 / 8 = 32,000 bytes), while the rest can be used by the video-side CPU for data and programs.
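
For a sense of what the video-side 65C02 would be doing for "hardware" line drawing, here is an integer-only Bresenham line into a 640x400 1-bit framebuffer, written in C for readability (on the card it would be assembly). The bit order (bit 7 = leftmost pixel) and all names are assumptions, not something decided in the thread:

```c
#include <stdint.h>
#include <stdlib.h>

#define FB_WIDTH  640
#define FB_HEIGHT 400
#define FB_STRIDE (FB_WIDTH / 8)   /* 80 bytes per row, 8 pixels per byte */

/* Set one pixel; assumes bit 7 of each byte is the leftmost pixel. */
static void set_pixel(uint8_t *fb, int x, int y) {
    if (x < 0 || x >= FB_WIDTH || y < 0 || y >= FB_HEIGHT) return;
    /* y * 80 would be two shifts and an add on the 65C02 */
    fb[y * FB_STRIDE + (x >> 3)] |= (uint8_t)(0x80u >> (x & 7));
}

/* Bresenham line: each step needs only adds, subtracts, a shift and compares. */
void draw_line(uint8_t *fb, int x0, int y0, int x1, int y1) {
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    for (;;) {
        set_pixel(fb, x0, y0);
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
}
```

Drawing from (0,0) to (7,0) fills the first framebuffer byte with $FF; a 45-degree line sets one bit per row, moving one bit to the right each time.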

Now, there are 2 ways I can handle both devices being on the same bus:
1. Bus Sharing:
The video circuit accesses the bus when PHI2 is low, and the CPU accesses the bus when PHI2 is high (maybe using the BE pin by connecting it to PHI2).

2. Cycle Stealing:
Whenever the video circuit accesses the bus, it stops the CPU (pulling RDY and BE low). This means the CPU only runs for 7 out of 8 cycles during the visible part of the screen; doing the math, that works out to an average CPU speed of 92.38% of full speed.
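
The 92.38% figure is reproduced if the CPU runs at the pixel clock, the frame is 800 x 525 clocks in total (standard 640x480@60 timing), and 80 fetches happen on each of the 400 displayed lines. A C sketch of that calculation, with those timing numbers as assumptions and `cpu_share` as a made-up name:

```c
/* Fraction of CPU cycles left when one cycle is stolen per 8-pixel fetch. */
double cpu_share(unsigned h_total, unsigned v_total,
                 unsigned h_visible, unsigned v_visible) {
    unsigned long total  = (unsigned long)h_total * v_total;            /* clocks per frame */
    unsigned long stolen = (unsigned long)(h_visible / 8) * v_visible;  /* one per fetched byte */
    return 1.0 - (double)stolen / (double)total;
}
```

`cpu_share(800, 525, 640, 400)` gives 1 - 32000/420000, about 0.9238. With 640x400@70 timing (800 x 449 clocks per frame) instead, the same formula gives about 91.1%.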

I'm not very confident that I can get Bus Sharing to work, because I don't know if the BE pin is fast enough to take the CPU off the bus when PHI2 goes low, or if you even need the BE pin in the first place.
So Cycle Stealing might be much easier to get working, especially at 25MHz, and the ~7.6% performance loss likely won't be very noticeable either.

Another thing I have to do is use a second CPLD (likely an ATF1502 or 1504) as a ROM, since a regular flash chip would be too slow for a 25MHz CPU. Plus, I can't really do wait states (without a lot of extra logic) because the video circuit peeks into the bus every now and then. I'll see how many bytes of ROM I can squeeze out of a 1504; 32 or 64 bytes should be enough to bootload it through the FIFOs.

Speaking of which, I decided to still use the IDT FIFOs, because they still seem really good for this scenario, except this time there are only 2, pointed in opposite directions to allow for 2-way communication.
In addition, both sides have a read-only "Status" register that just outputs the "Empty" flag of the receiving FIFO and the "Full" flag of the sending FIFO (and the same for the other side).
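
A PC-side model of how either CPU might use that status register, with a software ring buffer standing in for each IDT FIFO. The bit positions, the active-high polarity and all names are hypothetical (the IDT720x's own EF/FF outputs are active-low):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status bits, active-high here for readability. */
#define STATUS_RX_EMPTY 0x01u
#define STATUS_TX_FULL  0x02u

/* Software stand-in for one 256-deep IDT7200-style FIFO. */
typedef struct {
    uint8_t buf[256];
    unsigned head;   /* index of the oldest entry */
    unsigned count;  /* entries currently stored */
} fifo_t;

/* One side of the link: the FIFO it reads from and the FIFO it writes to. */
typedef struct {
    fifo_t *rx;
    fifo_t *tx;
} side_t;

/* What that side's read-only status register would return. */
uint8_t status_reg(const side_t *s) {
    uint8_t st = 0;
    if (s->rx->count == 0) st |= STATUS_RX_EMPTY;
    if (s->tx->count == sizeof s->tx->buf) st |= STATUS_TX_FULL;
    return st;
}

/* Returns false instead of writing into a full FIFO; real code would
   simply poll the status register until the Full bit clears. */
bool try_send(side_t *s, uint8_t byte) {
    if (status_reg(s) & STATUS_TX_FULL) return false;
    fifo_t *f = s->tx;
    f->buf[(f->head + f->count) % sizeof f->buf] = byte;
    f->count++;
    return true;
}

/* Returns false if there is nothing to read yet. */
bool try_recv(side_t *s, uint8_t *byte) {
    if (status_reg(s) & STATUS_RX_EMPTY) return false;
    fifo_t *f = s->rx;
    *byte = f->buf[f->head];
    f->head = (f->head + 1) % sizeof f->buf;
    f->count--;
    return true;
}
```

The main system's side uses the video-to-main FIFO as `rx` and the main-to-video FIFO as `tx`, and the video CPU's side is the mirror image, so each CPU only ever needs to look at its own status register.
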
The main system also gets the ability to send a reset signal to the video CPU, in case it crashes or gets stuck in an infinite loop. I might also add the ability to send an NMI to abort the current operation without fully resetting it.


Anyway, I think this is a pretty cool idea. I'll flesh it out a bit more and see if I can get anything going in KiCad.


Posted: Sat Oct 22, 2022 5:52 pm
Joined: Sat Jan 02, 2016 10:22 am
Posts: 197
You should be able to squeeze a pretty decent amount of code into a CPLD.

I've fitted 49 bytes into a 22V10 before.

As an experiment, I recompiled that same 49-byte ROM to target an ATF1502, where it fills 23 of the 32 logic cells.

Interestingly, targeting the 1504 or 1508 it filled 25 of the 64/128 cells.

That hints at the possibility of a couple of hundred bytes in the 1508.


Posted: Sat Oct 22, 2022 7:13 pm
Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
When I tried to fit a 32-byte boot-up code into the 1508 (alongside the other stuff), it went from 115 to 161 macrocells, more than the 128 the 1508 has.
So that's why I decided to use a separate CPLD just for the ROM. Plus, I like having some spare room in the 1508 in case I need or want to change things later that could affect the macrocell usage.
Testing a ROM filled with random numbers, 128 bytes seem to fit into a 1504, using 54/64 macrocells. Ironically, having Quartus optimize for area makes it go up to 61/64.

128 bytes is enough for a simple bootloader (or a test program to check if the VGA circuit and the CPU are working at the same time without the FIFOs).


Posted: Sat Oct 22, 2022 7:34 pm
Joined: Fri Dec 21, 2018 1:05 am
Posts: 1076
Location: Albuquerque NM USA
I’m generally able to fit a 64-byte ROM plus serial port, decoding for memory, compact flash, and miscellaneous registers in a 64-macrocell CPLD like the EPM7064 or ATF1504. I do design in schematic, so I have more flexibility with allocation of resources.
Bill


Posted: Sat Oct 22, 2022 9:55 pm
Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
Well, sadly I'm not that good at optimizing hardware designs...
Just the H-/V-Sync circuit plus the logic required to build the output address take up 80 macrocells.
There's very likely a lot of room for improvement there.

The sync circuit works by using 2x 10-bit counters, one for the pixels in a row (X coordinate) and one for the rows (Y coordinate).
The H-/V-Sync signals are then generated from those counters using AND gates and JK flip-flops. This is also used to generate the read signal while the visible part of the screen is being drawn.
The X and Y values are used to calculate the address that holds the data for the next 8 pixels, using this:

Address = 1 + X + (Y * 80)

And since 80 = 16 + 64, the multiplication can be turned into 2 shift operations and an add, like this:

Address = 1 + X + ((Y * 16) + (Y * 64))

Note that only the upper 7 bits of X are used in this; the lower 3 bits select which bit of the already-read byte is output to the VGA connector.

Also, if the output address equals 32000 (1 byte past the end of the video data), it gets set to 0 instead: since that's the last memory access of the frame, it has to load the first byte of the new frame. The multiplexer that does this takes up 22 macrocells. Ouch!
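
As a quick check of the arithmetic, here is a C model of that address logic. It is software only (the real thing is CPLD logic), and `next_fetch_addr` is a made-up name:

```c
#include <stdint.h>

/* Address of the byte holding the next 8 pixels: 1 + (x >> 3) + y*80,
   with y*80 built as (y << 4) + (y << 6). The fetch one byte past the end
   of the frame (32000) is folded back to 0 so the next frame starts at byte 0. */
uint16_t next_fetch_addr(uint16_t x, uint16_t y) {
    uint16_t addr = (uint16_t)(1u + (x >> 3) + (y << 4) + (y << 6));
    return addr == 32000u ? 0 : addr;
}
```

At (0,0) it returns 1, at the start of row 1 it returns 81, and the last group of row 399 (x = 632) produces 32000, which becomes 0.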

I really need to find a better way to generate the output address without having memory accesses outside the visible area... Any ideas or links to examples would be welcome.


Posted: Sun Oct 23, 2022 5:41 pm
Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
This is the current board design i came up with:
Attachment: kicad_edSjAUQcDb.png

It's ~12cm long and ~8.4cm tall, so a bit longer than the SBC itself, but that shouldn't be an issue.

This one is likely going to be a 2-layer board (still black, to match the SBC), so I left a bit more space between the ICs and tried to keep the decoupling caps away from the center of the board, since most data lines are going to run there and the caps would likely just make it harder for the autorouter.

And speaking of routing, I still need to finalize the pin assignments on both CPLDs and the RAM ICs. It's also likely missing some components like pull-up resistors. I'll triple-check everything anyway before doing the routing.

Just thought I'd drop this small update.


Posted: Tue Oct 25, 2022 12:06 pm
Joined: Tue Sep 03, 2002 12:58 pm
Posts: 298
Proxy wrote:
i really need to find a better way to generate the output address without having memory accesses outside the visible area... any ideas or links to examples would be welcome


I have no experience with the device you're using, and don't know what sorts of designs it likes or dislikes. But I can offer you some alternatives that might improve things (or might not).

The first thing I'd do is separate the address counter from the VGA timing. That would need more state, but a lot less logic (all of those adders can go). Clear the address counter somewhere before you fetch the first byte, and you won't need to replace 32000 with 0 (you also won't need to add 1 to the address).

VGA timing is often in multiples of 8 pixels, so the low three bits of the horizontal counter can usually be ignored. My FPGA design has separate flags for horizontal and vertical blanking, which get set and cleared when the corresponding counter reaches a particular value. Both must be clear for it to display. You could also have a "memory enabled" state which is set and cleared one memory cycle before the horizontal display area (ignoring the vertical count), and qualify that with "not in vertical blanking" before you use it.

Generating those "set" and "clear" signals should be very cheap in a CPLD - a single macrocell each?
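
A C model of that suggestion, purely to show the idea: the address register is only ever cleared (during vertical blanking) or incremented (once per 8-pixel fetch), and the blanking flags are set and cleared at fixed counter values, yet every fetch still lands on x/8 + y*80 without the multiply-add or the 32000-to-0 multiplexer. The timing constants and names are assumptions, and porches and sync pulses are left out:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed timing: 800 x 525 clocks per frame, 640x400 display area at the
   top-left of the counters (real timing puts porches and sync in between). */
#define H_TOTAL   800u
#define V_TOTAL   525u
#define H_VISIBLE 640u
#define V_VISIBLE 400u

/* Runs one frame. *matches is cleared if any fetch address differs from
   x/8 + y*80. Returns the number of fetches made. */
unsigned long simulate_frame(bool *matches) {
    uint16_t addr = 0;                    /* separate address counter */
    bool hblank = false, vblank = false;  /* set/cleared at fixed counts */
    unsigned long fetches = 0;
    *matches = true;
    for (unsigned y = 0; y < V_TOTAL; y++) {
        if (y == 0) vblank = false;
        if (y == V_VISIBLE) vblank = true;
        for (unsigned x = 0; x < H_TOTAL; x++) {
            if (x == 0) hblank = false;
            if (x == H_VISIBLE) hblank = true;
            if (vblank) {
                addr = 0;                 /* clear well before the next frame */
            } else if (!hblank && (x & 7u) == 0) {
                if (addr != (x >> 3) + y * 80u) *matches = false;
                addr++;                   /* fetch, then advance */
                fetches++;
            }
        }
    }
    return fetches;
}
```

Running it gives 32000 fetches per frame, all matching the original formula, and no fetch ever happens outside the visible area.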

