
All times are UTC




Post new topic Reply to topic  [ 59 posts ]  Go to page 1, 2, 3, 4  Next
Posted: Sun Nov 19, 2023 4:16 pm

Joined: Sun Sep 24, 2023 3:45 pm
Posts: 47
Hello everyone!
I've got a Ben Eater 6502 and video card kit set up, and I'm doing a Bad Apple demo for it:
https://github.com/Fifty1Ford/Ben-Eater-Bad-Apple

I'm getting 16 frames a second on average right now. There's nothing to even out the frame rate and no vsync in use; I'm just pushing pixels as fast as I can.

The only change from the stock kit is extra bypass caps and power wires so I can clock the CPU at 5 MHz.
Since the VGA circuit halts the CPU while it draws, that means I get about 1.3 MHz effective CPU speed.

The video setup is straight bitmapped video: 1 byte per pixel, 100x64 pixels (6,400 bytes).
This is stored in an 8K area starting at $2000. Each row starts on a 128-byte boundary, so the last 28 bytes of each row are unused.

My encoder does Differential Run Length Encoding.
1 byte holds the data, 1 byte holds the run length.
Because it is a 6-bit/64-color video setup, I use value 65 to mean 'Skip' for unchanged pixels.
I also just added an optimization for the Bad Apple source video.
Since the source only uses black, white, and two greys, I now use a lookup table of all possible combinations of 3 pixels.
It means that I write 3 pixels even if only 1 or 2 changed, but it reduced my file size by over a megabyte, to just over 3 megabytes for the entire video.
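Four shades across three pixels gives 4^3 = 64 combinations, which exactly fills a 6-bit field. Below is a minimal sketch of the packing. It assumes black/grey/grey/white map to the 6-bit values shown, which depend on how the video card's resistor DAC is wired, and it assumes an arbitrary 2-bits-per-pixel index layout. The post doesn't say how these indices are kept apart from single-color values and the Skip marker, so that part is not modeled.

```python
# Assumed 6-bit RRGGBB values for the four source shades (hypothetical wiring).
SHADES = [0b000000, 0b010101, 0b101010, 0b111111]  # black, dark grey, light grey, white

def pack3(p0, p1, p2):
    """Pack three shade indices (0-3 each) into one table index (0-63)."""
    return (p0 << 4) | (p1 << 2) | p2

# Decoder-side table: index -> the three pixel bytes it stands for.
TRIPLES = [(SHADES[i >> 4], SHADES[(i >> 2) & 3], SHADES[i & 3]) for i in range(64)]

def encode_pixels(shade_indices):
    """Group a run of shade indices into 3-pixel table indices, padding the tail with black."""
    padded = list(shade_indices) + [0] * (-len(shade_indices) % 3)
    return [pack3(*padded[i:i + 3]) for i in range(0, len(padded), 3)]

print(encode_pixels([3, 3, 3, 0]))  # [63, 0]
print(TRIPLES[49])                  # (63, 0, 21)
```

The cost noted above shows up directly here: a single changed pixel still forces the whole 3-pixel group to be written.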

Encoding is now at an average of 507 bytes a frame, or 12.6:1 compression.

With this setup currently at 16 FPS with my unrolled decoder...
I'm looking for any suggestions on what to look at next to increase the frame rate.

If I change my SD card initialization routine/hardware to use the CA2 handshake on port reads, a byte read should take 99 cycles instead of 184. That saves 85 cycles per byte, or 43,095 cycles a frame (85 x 507) if my math is right?
That should help quite a bit, I think. But what next?
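The per-frame arithmetic can be checked, and turned into a rough frame-rate estimate, with a few lines of Python. This is a naive sketch. It assumes the 507-byte average and the 1.3 MHz effective clock from the post, and that nothing except the SD read cost changes. The result is therefore an optimistic ceiling rather than a prediction.

```python
CPU_HZ = 1_300_000      # effective CPU clock quoted in the post
BYTES_PER_FRAME = 507   # average encoded frame size quoted in the post

def estimate(old_cycles_per_byte, new_cycles_per_byte, fps_now):
    """Naive estimate: only the SD read cost changes, everything else stays fixed."""
    saved_cycles = (old_cycles_per_byte - new_cycles_per_byte) * BYTES_PER_FRAME
    frame_ms_now = 1000 / fps_now
    saved_ms = saved_cycles / CPU_HZ * 1000
    new_frame_ms = frame_ms_now - saved_ms
    return saved_cycles, new_frame_ms, 1000 / new_frame_ms

saved, frame_ms, fps = estimate(184, 99, 16)
print(saved, round(fps))  # 43095 34
```

The saved 43,095 cycles are about 33 ms per frame at 1.3 MHz. Against the current 62.5 ms frame time, that would put an upper bound of roughly 34 FPS on this change alone.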

With these limitations what else should I be looking into?

Any other fast decompression resources out there?
Lots of 6502-related compression is tape- or disk-based 'FAST LOADER' code, which is not really concerned with filling the screen as smoothly and quickly as possible but with speeding up slow disk loads. Both move lots of data, but I need to spend so much time updating bytes/pixels that there is not much time left for decoding.

Any guess as to the maximum FPS when updating a 6K screen buffer with a 1.3 MHz 6502?

Any help greatly appreciated!


Posted: Sun Nov 19, 2023 4:55 pm

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
NormalLuser wrote:

Any guess as to the maximum FPS when updating a 6K screen buffer with a 1.3 MHz 6502?

Any help greatly appreciated!


It's possible to get 50 fps on a 2 MHz 6502 (well, as I understand it, it's 2 frames at 25 fps, interlaced to give 640x512x1 resolution, dithered) plus 44 kHz audio... The video buffer is 20KB.

It's quite impressive to watch and listen.

It could be argued that the technique used is cheating, but it is one large 6502 program that beam-races the display. The compression technique generates 6502 code directly; it's not code decompressing data. One man's code is another man's data and all that.

It works by having a 16-byte block of RAM shared between the 6502 and the device (FPGA, USB, PC) that feeds the code into it; the last instruction is a JMP to the start of the 16-byte region. It runs on a circa-1981 BBC Micro.

https://www.youtube.com/watch?v=D_ta5QxBSMk&t=42s

So that might be the holy grail but just doing what you're doing is an achievement in itself - well done!

To emulate that? Well, have your compressor output inline 6502 code (LDA #NN; STA $1234; STA $1225; STA ... etc.) for every value of NN, then repeat. The trick might be getting hold of the vsync signal so you can use it to load the next frame of code from storage...
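Here is a minimal sketch of that idea as a PC-side compressor step. It assumes you already have the changed screen addresses and their new values for one frame. It groups the stores by value so each value is loaded only once, and it emits raw 6502 machine code: $A9 is LDA immediate, $8D is STA absolute (address stored low byte first), and $60 is RTS. Ending with RTS is an assumption for a standalone routine; the BBC demo instead ends each 16-byte window with a JMP.

```python
LDA_IMM = 0xA9  # LDA #nn      (2 bytes, 2 cycles)
STA_ABS = 0x8D  # STA $hhll    (3 bytes, 4 cycles)
RTS = 0x60      # hypothetical terminator for a standalone frame routine

def emit_frame_code(changes):
    """changes: dict of screen address -> new byte value.
    Returns (machine_code, cycles) for straight-line code writing every change,
    with stores grouped by value so each distinct value is loaded once."""
    by_value = {}
    for addr, value in sorted(changes.items()):
        by_value.setdefault(value, []).append(addr)
    code = bytearray()
    cycles = 0
    for value in sorted(by_value):
        code += bytes([LDA_IMM, value])
        cycles += 2
        for addr in by_value[value]:
            code += bytes([STA_ABS, addr & 0xFF, addr >> 8])  # little-endian address
            cycles += 4
    code.append(RTS)  # RTS cycles not counted
    return bytes(code), cycles

code, cycles = emit_frame_code({0x2000: 0x3F, 0x2001: 0x3F, 0x2080: 0x00})
print(code.hex(" "), cycles)  # a9 00 8d 80 20 a9 3f 8d 00 20 8d 01 20 60 16
```

Each changed byte costs 3 bytes of code and 4 cycles, plus 2 bytes and 2 cycles per distinct value. The approach trades storage bandwidth for almost zero decode logic.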

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Posted: Sun Nov 19, 2023 5:28 pm

Joined: Sun Sep 24, 2023 3:45 pm
Posts: 47
Pretty neat! Thanks!
Though "This runs on a standard machine using a PC as a coprocessor connected to the external Tube port. Acorn added the Tube port to their 8-bit machines specifically for coprocessors."
There is a PC involved, streaming the commands to it, unfortunately.
I have been thinking about adding some routines to the decoder like 'clear/color screen', and of course I need to add commands to beep PB7 so I can get some music.
I'm kind of stuck with the SD card for the moment, and I don't think it has the throughput to stream all the asm commands?
But maybe it does? This really gives me something to think about!
Thanks again!


Posted: Sun Nov 19, 2023 6:00 pm

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
NormalLuser wrote:
Pretty neat! Thanks!
Though "This runs on a standard machine using a PC as a coprocessor connected to the external Tube port. Acorn added the Tube port to their 8-bit machines specifically for coprocessors."
There is a PC involved streaming the commands to it unfortunately.
I have been thinking about adding some routines to the decoder like 'clear/color screen' and of course I need to add commands to beep PB7 so I can get some music.
I'm kind of stuck with the SD card for the moment, and I don't think it has the throughput to stream all the asm commands?
But maybe it does? This really gives me something to think about!
Thanks again!


A PC streaming 6502 code... An FPGA streaming 6502 code... SD card streaming 6502 code... Not sure there is a real difference at the end of the day.

But if you want a pure version, look for the Teletext Bad Apple video. It's a basic 80x75 pixel resolution (I think: 40x25 characters of Teletext 2x3-pixel graphics). There is an Acorn Electron version too...

-Gordon



Posted: Sun Nov 19, 2023 6:49 pm

Joined: Fri Dec 21, 2018 1:05 am
Posts: 1117
Location: Albuquerque NM USA
Running BadApple on retro computers is a good challenge. It tests the graphics subsystem and how it connects to the rest of the computer. I've tried two approaches. One is a bit-banged I2C 128x64 OLED display: because the image data is small (1K), bit-banging can reach 20 frames/sec. The other approach is creating pseudo-graphic fonts for a text-based VGA display such that all permutations of 2x2 blocks are mapped in the font tables. This way a 64x48 text VGA display can quadruple its effective resolution to 128x96 (2x in each direction). It is a form of decompression: a frame of text data is 3K bytes, but it is expanded to 12K of pseudo-graphics. The data are stored on a CF disk so it can support fast data transfer.
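Mapping 2x2 pixel blocks to characters is a small bit-packing exercise. Each block has 4 pixels, so there are 2^4 = 16 glyphs to define. Below is a sketch of the frame-to-text conversion. It assumes a 1-bit frame stored row-major and a font whose glyph n lights its top-left, top-right, bottom-left and bottom-right quadrants according to bits 3, 2, 1 and 0 of n. That glyph order is a hypothetical choice.

```python
def to_block_chars(frame, width, height):
    """frame: flat list of 0/1 pixels, row-major, width x height (both even).
    Returns a flat list of character codes 0-15, one per 2x2 block."""
    chars = []
    for by in range(0, height, 2):
        for bx in range(0, width, 2):
            tl = frame[by * width + bx]
            tr = frame[by * width + bx + 1]
            bl = frame[(by + 1) * width + bx]
            br = frame[(by + 1) * width + bx + 1]
            chars.append((tl << 3) | (tr << 2) | (bl << 1) | br)  # hypothetical glyph order
    return chars

print(to_block_chars([1, 0, 0, 1], 2, 2))  # [9]
```

For a 128x96 frame this produces 64x48 = 3,072 character codes, the 3K per frame mentioned above.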

You can search for "badapple" on 6502.org and find a number of entries.
Bill


Posted: Mon Nov 20, 2023 2:18 pm

Joined: Sun Sep 24, 2023 3:45 pm
Posts: 47
Thanks for these ideas!
My big challenge is that I have none of the tricks everyone else seems to use.
I don't have any form of character mode at all, my pixels take up a full byte each, and my only mass storage is an SD card bit-banged on a VIA!

As a basic metric, my current fast routine to fill the 100x64 pixel/byte screen looks like this:

FillScreen:
TXA ;Color stored in X
;Start one less than needed because of INX below
LDX #255 ;i.e. 255+1 wraps to zero
FillScreenLoop:
INX ;This is unrolled so that there is a STA for each row of the screen.
;Display mapping has 28 bytes at end of each row unused.
;Display location = $2000
STA Display, x
STA Display +$80,x
STA Display +$100,x
STA Display +$180,x
STA Display +$200,x
..... <etc for all 64 lines > .....
STA Display +$1F80,x
CPX #98
BEQ FillScreenLoopEnd
JMP FillScreenLoop
FillScreenLoopEnd:
RTS

This routine takes 32,581 cycles to fill the 6,400 pixels on the screen.
That's 5.09 cycles a pixel. Other than a 19,200-byte routine in ROM that is a fully unrolled STA $2000, STA $2001, etc., I don't think you can get much faster?

With that in mind, my 1.3 MHz CPU can at most fill the screen with a single color 39.9 times a second.
Any decode logic or time spent reading data eats away at that base fill rate.
With that thinking in mind:
Since I get 16 FPS on average, I must be spending almost 24 FPS worth of fill rate on reading and decoding. I know I can find some improvement there!
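The fill-rate ceiling is the effective clock divided by the cycles per fill. The quick check below uses the 1.3 MHz figure from the thread. It also compares against the fully unrolled straight-line version mentioned above, which uses one 4-cycle STA abs per pixel instead of a 5-cycle STA abs,X and has no loop overhead. The straight-line cycle count is a sketch from documented 6502 timings (TXA 2, STA abs 4, RTS 6), not a measured figure.

```python
CPU_HZ = 1_300_000
PIXELS = 100 * 64

def fills_per_second(cycles_per_fill):
    return CPU_HZ / cycles_per_fill

measured_loop = 32_581                 # emulator figure for the unrolled-by-row loop above
straight_line = 2 + PIXELS * 4 + 6     # TXA, one STA abs per pixel, RTS
rom_bytes = PIXELS * 3                 # each STA abs is 3 bytes

print(round(fills_per_second(measured_loop), 1))   # 39.9
print(straight_line, rom_bytes)                    # 25608 19200
print(round(fills_per_second(straight_line), 1))   # 50.8
```

So full unrolling would raise the ceiling by roughly a quarter, at the cost of about 19 KB of ROM.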
I have been crunching numbers, and improving how fast I read SD card bits might gain me a bit more performance than I had originally thought. I know it can be done; I'm just procrastinating on touching the SD card initialization, since it is more or less a magical incantation as far as I can tell. I don't really want to translate High Elvish, but it must be done!

This is such an interesting challenge; it forces you to think about your hardware and software in ways that are outside the norm (for me). Balancing the need for speed against the need to decompress, and both against quality, is a strange area to deal with.
I am enjoying this challenge!


Posted: Mon Nov 20, 2023 3:08 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
NormalLuser wrote:
I am enjoying this challenge!
Good to hear! And it's enjoyable just being a spectator. :)

As for "improving how fast I read SD card bits," here is a link that might be helpful if you're using SPI mode. That post and the ones that surround it deal with maximizing bit-bang performance for a VIA driving SPI. Thanks in part to Garth's contributions, my code is faster than anything similar I've seen. He and I both take pains in regard to the bit assignments -- ie, which port bit each of the various SPI lines attaches to -- as this can open the door to some shortcuts in the code.

Speaking of code, here's a copy of the routine you posted, but using the Code tags for better presentation.
Code:
FillScreen:
TXA ;Color stored in X
;Start one less than needed because of INX below
LDX #255 ;i.e. 255+1 wraps to zero
FillScreenLoop:
INX ;This is unrolled so that there is a STA for each row of the screen.
;Display mapping has 28 bytes at end of each row unused.
;Display location = $2000
STA Display, x
STA Display +$80,x
STA Display +$100,x
STA Display +$180,x
STA Display +$200,x
..... <etc for all 64 lines > .....
STA Display +$1F80,x
CPX #98
BEQ FillScreenLoopEnd
JMP FillScreenLoop
FillScreenLoopEnd:
RTS
My first impulse was to rearrange things to minimize the number of times a ,x instruction would result in a page crossing. Then -- doh! :oops: -- I remembered that, for a write instruction, the extra cycle always applies, whether or not a page crossing occurs...

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Mon Nov 20, 2023 6:33 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
A small correction regarding frame rates: you shouldn't think of the difference between 16fps and 40fps as being 24fps, as that can be quite misleading. When profiling, it is better to think in frame times. Inverting everything, 16fps is 1000ms/16 frames = 62.5ms/frame, and 40fps is 1000ms/40 frames = 25ms/frame, so your code is about 37.5ms (62.5ms - 25ms) slower per frame than your baseline screen clear routine.
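The same arithmetic in code form, also converting the gap into CPU cycles at the 1.3 MHz effective clock quoted earlier in the thread:

```python
CPU_HZ = 1_300_000

def frame_ms(fps):
    return 1000 / fps

extra_ms = frame_ms(16) - frame_ms(40)      # 62.5 - 25.0
extra_cycles = extra_ms / 1000 * CPU_HZ
print(extra_ms, round(extra_cycles))        # 37.5 48750
```

Frame times add up and frame rates don't, so roughly 48,750 cycles per frame is the budget currently going to work beyond a plain screen fill.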

However, I think you are using a differential compression scheme? In that case most frames do not need to redraw the whole screen at once; they only need to update the parts that have changed, so it's not a great comparison. This is why cmorley's demo can run at essentially 50fps (640x256) on a 2MHz processor: it goes straight to the regions that have changed and only writes those.

Is your goal to improve this in software, sticking within the constraints of your existing hardware, or are you also interested in improving the hardware?

The main thing I would do to speed things up is interleave the CPU operations with the video memory reads, rather than pausing the CPU until the blanking periods. That's a modification to the video card which you might be trying to avoid, as at some point it stops being the same circuit (though you've already made a fair few changes of your own, I know!). It would result in (and require) the CPU running at 5MHz, to match the rate at which the video circuit accesses memory, so you'd see roughly a 3x speed-up.

Also regarding the SD card, if you're willing to change the hardware, then it's definitely worth choosing VIA pins more carefully and/or using the shift register, at least for reads. However, another option is to use a more bespoke circuit altogether, rather than a VIA, which could be very fast indeed.

Also, don't be scared of the SD card init sequence - it's mostly just sending and receiving bytes, so if you change the lines around and fix the read and write routines, then it should still work fine.


Posted: Mon Nov 20, 2023 7:10 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Just to note, there is I think a small optimisation possible in the screen clear code. Something like this, perhaps... which will surely contain one or two unimportant(!) off-by-one errors. It only saves a small handful of cycles, at best (I haven't counted)

Code:
FillScreen:
TXA ;Color stored in X
;We process half the lines in each loop so we can branch back within the range of a branch
;We decrement the counter so we don't need a comparison in each iteration
LDX #97
FillScreenLoop1:
;This is unrolled so that there is a STA for each row of the screen.
;Display mapping has 28 bytes at end of each row unused.
;Display location = $2000
;First loop processes just the odd lines
STA Display, x
STA Display +$100,x
STA Display +$200,x
..... <etc for all 32 odd lines > .....
STA Display +$1F00,x
DEX
BNE FillScreenLoop1
LDX #97
FillScreenLoop2:
;This time all the even lines
STA Display +$80,x
STA Display +$180,x
..... <etc for all 32 even lines > .....
STA Display +$1F80,x
DEX
BNE FillScreenLoop2
RTS


Edit: oops, need BPL, or to hoist the DEX, to get the right values


Last edited by BigEd on Tue Nov 21, 2023 11:51 am, edited 1 time in total.

Posted: Mon Nov 20, 2023 11:06 pm

Joined: Sun Apr 26, 2020 3:08 am
Posts: 357
I did the Bad Apple video for the Apple II. For compression, I wrote a program that just marks the changes between frames. The video playback then only stores the bytes that changed between each frame. The video is stored as: vertical byte, horizontal byte, # of bytes to change, screen bytes. There is no need to use FILL or draw the whole line, and many lines won't need any changes at all, saving a lot of unnecessary redrawing. I used the double-hi-res screen, which gives 560x192, and can get 60 FPS at 4 MHz on most screen transitions, with the occasional screen complicated enough that I only get 30 FPS.
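Here is a minimal sketch of that change-list encoder. The record layout [row, column, count, bytes...] follows the description above. The sketch assumes both frames are plain row-major byte arrays; real Apple II hi-res memory is interleaved, which this ignores. It also caps each record at 255 bytes so the count fits in one byte.

```python
def diff_records(prev, curr, width):
    """Compare two frames (flat byte lists, row-major) and emit one record per run
    of changed bytes within a row: [row, col, count, *new_bytes]."""
    records = []
    height = len(curr) // width
    for row in range(height):
        base = row * width
        col = 0
        while col < width:
            if prev[base + col] == curr[base + col]:
                col += 1                      # unchanged byte: nothing to store
                continue
            start = col
            while col < width and col - start < 255 and prev[base + col] != curr[base + col]:
                col += 1                      # extend the run of changed bytes
            run = curr[base + start: base + col]
            records.append([row, start, len(run), *run])
    return records

prev = [0] * 8
curr = [0, 1, 1, 0,
        0, 0, 0, 2]
print(diff_records(prev, curr, 4))  # [[0, 1, 2, 1, 1], [1, 3, 1, 2]]
```

Rows with no changes produce no records at all, which is where the skipped redrawing comes from.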


Posted: Tue Nov 21, 2023 12:07 am

Joined: Sun Sep 24, 2023 3:45 pm
Posts: 47
Thanks for the idea!
So very close for code you just cranked out without testing!
Sorry to say that it skips the column 0 pixels (X = 0) on the screen.

I already used a routine that took me forever to figure out pretty much like yours.
(Did I mention that what you cranked out in a comment section reply took me forever to figure out??)

I tried even/odd, but instead split the top of the screen from the bottom so that it follows the VGA scanline for the least on-screen artifacting. The 'window blind' effect is pretty neat, though.

FillScreen:
TXA ;Color stored in X
LDX #100 ;one more than needed because of DEX below
FillScreenLoop:
DEX ;DEX up here so we also clear column 0
STA Display, x
STA Display +$80,x
<etc>
STA Display +$F80,x
BNE FillScreenLoop ;Z flag is still from DEX (STA doesn't change flags)

LDX #100 ;one more than needed because of DEX below
FillScreenLoop2:
DEX ;DEX up here so we also clear column 0
;Doing $3000, so Display + $1000
STA Display +$1000,x
<etc>
STA Display +$1F80,x
BNE FillScreenLoop2 ;Z flag is still from DEX
RTS

This uses 33,010 cycles.

The 'fast' routine I posted doesn't break it into two loops and only takes 32,910 cycles.
That's 100 fewer.
What is lost by doing the whole:
CPX #99
BEQ FillScreenLoopEnd
JMP FillScreenLoop
instead of:
BNE FillScreenLoop
at the end of the loop is gained back by only having 1 loop instead of 2.
1 cycle is saved per column, for the 100 saved cycles.
Not much, but a win!
(Did I mention I spent WAY too much time on this?)
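A small cycle model shows where those 100 cycles come from. It uses documented 6502 timings: LDX #/TXA/INX/DEX/CPX # = 2, STA abs,X = 5, JMP abs = 3, RTS = 6, and branches at 2 cycles not taken or 3 taken, assuming no page crossings. It reproduces the 33,010 figure for the two-loop version exactly. For the single loop it lands 2 cycles under the quoted 32,910, so treat it as an approximation of what the emulator measured.

```python
# Documented 6502/65C02 cycle counts for the instructions in the two routines.
TXA = INX = DEX = LDX_IMM = CPX_IMM = 2
STA_ABS_X = 5            # indexed stores always take 5 cycles
JMP_ABS = 3
RTS = 6
TAKEN, NOT_TAKEN = 3, 2  # branch timings, assuming no page crossing

def single_loop(columns, rows=64):
    """INX / 64 x STA abs,X / CPX # / BEQ / JMP, one pass over all rows."""
    body = INX + rows * STA_ABS_X + CPX_IMM
    repeat = body + NOT_TAKEN + JMP_ABS   # BEQ falls through, JMP loops back
    last = body + TAKEN                   # BEQ taken on the final column
    return TXA + LDX_IMM + (columns - 1) * repeat + last + RTS

def two_loops(columns, rows=64):
    """DEX / 32 x STA abs,X / BNE, run once for each half of the screen."""
    body = DEX + (rows // 2) * STA_ABS_X
    half = LDX_IMM + (columns - 1) * (body + TAKEN) + body + NOT_TAKEN
    return TXA + 2 * half + RTS

print(two_loops(100), single_loop(100))  # 33010 32908
```

Per column, the single loop spends 9 cycles on loop control (INX, CPX, BEQ, JMP). The split version spends 5 (DEX, BNE) in each of its two loops, so 10 in total, which is the 1 cycle per column described above.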

:)

Now that I am getting into 6502 assembly I find that I love how you can code for space or speed and all the ways you can do so much with so little!


Posted: Tue Nov 21, 2023 2:32 am

Joined: Sun Sep 24, 2023 3:45 pm
Posts: 47
Wow!
Thanks for the responses everyone!
Yes, I first want to get a version running with no hardware changes needed other than the SD card adapter, a couple of wires, and some capacitors (and a speaker). That gets you a 1.4 MHz-ish equivalent CPU (5 MHz, halted ~71% of the time), vsync IRQ, and PB7 square-wave audio.

I am using Differential Run Length Encoding with a 3-pixel dictionary lookup. That means a 2-byte 'packet' holds a value (a 3-pixel pattern, a single non-B/W/grey color, or a Skip) and a repeat count of up to 255.
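Here is a model of a decoder for that packet stream, as a minimal sketch. The Skip value 65 and the linear 8K buffer that wraps back to the start come from the thread. Three things are assumptions for illustration: values 128-191 selecting a 3-pixel pattern, the count for a pattern meaning repeats of the whole triple, and the shade values.

```python
SCREEN_SIZE = 0x2000    # the 8K buffer at $2000, treated as one linear stream
SKIP = 65               # 'skip' marker value from the post
TRIPLE_BASE = 128       # hypothetical: values 128-191 select one of 64 3-pixel patterns
SHADES = [0b000000, 0b010101, 0b101010, 0b111111]  # assumed black, greys, white

def expand_triple(index):
    """Turn a 0-63 pattern index into the three pixel bytes it stands for."""
    return [SHADES[index >> 4], SHADES[(index >> 2) & 3], SHADES[index & 3]]

def decode(packets, screen, ptr=0):
    """Apply (value, count) packets to the screen buffer and return the new pointer.
    The pointer wraps at 8K, so no explicit 'go to address' command is needed."""
    for value, count in packets:
        if value == SKIP:
            ptr = (ptr + count) % SCREEN_SIZE       # unchanged pixels: just advance
        elif value >= TRIPLE_BASE:
            pattern = expand_triple(value - TRIPLE_BASE)
            for _ in range(count):                  # count = repeats of the 3-pixel pattern
                for px in pattern:
                    screen[ptr] = px
                    ptr = (ptr + 1) % SCREEN_SIZE
        else:
            for _ in range(count):                  # run of a single 6-bit color
                screen[ptr] = value
                ptr = (ptr + 1) % SCREEN_SIZE
    return ptr

screen = bytearray(SCREEN_SIZE)
ptr = decode([(SKIP, 10), (TRIPLE_BASE + 49, 2), (5, 3)], screen)
print(ptr, list(screen[10:19]))  # 19 [63, 0, 21, 63, 0, 21, 5, 5, 5]
```

The only state the decoder keeps is the pointer, which matches the 'dumb decoder' goal described next.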

My 6502 decoder is very dumb and I am trying to keep it that way by not having it make many decisions.

All it does is fill the 8K that makes up the VGA screen buffer.
When it is full, it goes back to the beginning.
But it can also skip bytes by just adding to the pointer.
So an unchanged frame uses up 64 bytes of skips but takes very little time to process/show (no vsync yet).

Just eyeballing with WinMerge shows that on 'normal' frames about 200 bytes differ between frames. Sometimes almost nothing changes, sometimes the whole frame does.
That tracks with my 500-byte average per encoded frame.
I need to locate myself somehow in the frame, so instead of logic that compares every data byte to see if it is a 'go to this 16-bit RAM location' command, I said 'forget it!' and just stream it out as a 'skip', then add the length of the skip to the screen location pointer in zero page. In frames with a lot of changes it is no issue, since there will be fewer than 255 bytes between changes anyway, and on low-change frames I have cycles to spare.

Doing it like this lets me avoid any logic locating myself on the screen, I just stream skips, 3 pixel lookups, color pixels, and run lengths and blast it on the screen buffer as quick as I can.

Right now I think the obvious low-hanging fruit is the time spent reading bytes from the SD card?

I put the SD card in CMD18 ($52, READ_MULTIPLE_BLOCK) mode, and then all I do is read the bytes off.
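For reference, $52 is simply the SPI-mode command byte for CMD18 (0x40 | 18). Below is a sketch of how a full 6-byte command frame is built, including the CRC7 byte (polynomial x^7 + x^3 + 1). In SPI mode the card only checks the CRC for CMD0 and CMD8 unless CRC checking is turned on with CMD59, so a dummy CRC is normally fine for CMD18. The argument is a block number on SDHC/SDXC cards and a byte address on standard-capacity cards. The helper names are hypothetical.

```python
def crc7(data):
    """CRC-7 (x^7 + x^3 + 1), processed MSB first, as used in SD command frames."""
    crc = 0
    for byte in data:
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            top = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if bit ^ top:
                crc ^= 0x09
    return crc

def sd_command(index, arg):
    """Build a 6-byte SPI command: 01 + 6-bit index, 32-bit argument, CRC7 + end bit."""
    frame = bytes([0x40 | index]) + arg.to_bytes(4, "big")
    return frame + bytes([(crc7(frame) << 1) | 1])

print(hex(sd_command(0, 0)[-1]))   # 0x95, the familiar CMD0 CRC byte
print(hex(sd_command(18, 0)[0]))   # 0x52, the CMD18 command byte
```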

Currently I have (for reasons) the data-read line (MISO) on pin 7 of Port A and the other signals on Port B. That means writing bits or toggling CS won't mess with the clock pulse on CA2 that I hope to use (but don't yet).
I don't care in the least how slow or awkward writing to the SD card is as long as I can wire it up to the VIA somehow without extra hardware and then read just as fast as possible.

My current read code looks like this:
Code:
.R1loop:;Read Bit
  lda #SD_MOSI                ; enable card (CS low), set MOSI (resting state), SCK low
  sta  VIA_PORTB

  lda #SD_MOSI | SD_SCK       ; toggle the clock high
  sta  VIA_PORTB

  lda VIA_PORTA                   ; read next bit
  ROL
  ROL RLECount
;7 more times for a byte


This takes 184 cycles a byte according to the Kowalski simulator.
If I figure out how to clock the SD card off CA2, so that reads from Port A clock the card, I should be able to just do this:

Code:
  lda VIA_PORTA                   ; read next bit
  ROL
  ROL RLECount
;7 more times for a byte


This takes 88 cycles a byte (edited).
Pretty big improvement!
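Both per-byte figures can be reproduced by adding up the documented 6502 cycle counts for one bit and multiplying by eight. The totals only come out to 184 and 88 if RLECount is in zero page (ROL zp = 5 cycles), so that is assumed here.

```python
# Documented 6502/65C02 cycle counts for the instructions in the two read loops.
CYCLES = {
    "LDA #imm": 2,
    "STA abs": 4,
    "LDA abs": 4,
    "ROL A": 2,
    "ROL zp": 5,   # assumes RLECount lives in zero page
}

def cycles_per_byte(per_bit_sequence):
    """Cycles to read one byte when the same instruction sequence runs once per bit."""
    return 8 * sum(CYCLES[op] for op in per_bit_sequence)

bitbang_clock = ["LDA #imm", "STA abs", "LDA #imm", "STA abs", "LDA abs", "ROL A", "ROL zp"]
ca2_clock = ["LDA abs", "ROL A", "ROL zp"]

print(cycles_per_byte(bitbang_clock), cycles_per_byte(ca2_clock))  # 184 88
```

Letting CA2 generate the clock removes the two load/store pairs per bit, which accounts for more than half of the original cost.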

I am trying to do it but I don't have the SD card initializing yet.

Regardless, sometime after I get this version working I plan on interleaving the CPU with the VGA clock and then trying to also stream digitized audio. At that point I might need to add hardware to speed up the SD card, or just go ahead and use an 8-bit CF card reader.

But I really want an easy 'turn-key' version of this demo so anyone with a BE-style 6502/VGA setup can just grab a $3 SD card reader, scrounge up a couple of wires and caps (and a speaker), and show off the system.
I'm looking for anything I can do without hardware changes at this point.
Thanks everyone!
You can see it here btw:
https://www.reddit.com/r/beneater/comments/17yj1by/my_6502_badapple_demo_needed_more_frames_per/


Posted: Tue Nov 21, 2023 11:57 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Oops, sorry my BNE was wrong - BPL would work, or, as you did, hoisting the DEX would work too.

But I think there's an error in your experiment. Your first posted code will loop from 0 to 97 inclusive. My code (as fixed) will run from 97 to 0 inclusive. Your version of my code runs from 99 down to 0 which is why I think it runs slower - it's doing more work. As far as I can think it through, the simpler code at the bottom of the loop should save 5 cycles per iteration. The extra cost of having two loops which run half the distance is just a few cycles, once.


Posted: Tue Nov 21, 2023 2:37 pm

Joined: Sun Sep 24, 2023 3:45 pm
Posts: 47
BigEd wrote:
Oops, sorry my BNE was wrong - BPL would work, or, as you did, hoisting the DEX would work too.

But I think there's an error in your experiment. Your first posted code will loop from 0 to 97 inclusive. My code (as fixed) will run from 97 to 0 inclusive. Your version of my code runs from 99 down to 0


Yeah, you caught that! It was a mistake in the first post. The screen is 100 pixels wide by 64 tall, with 28 unused bytes per row.
I realized my error when I tested it on hardware and left junk on the screen.

I ran both versions through a 6502 emulator to get the cycle count and adjusted the loop count to fill the screen but maybe I made a mistake. I'll take another look at it. Thanks!


Posted: Tue Nov 21, 2023 4:31 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
NormalLuser wrote:
If I figure out how to get the SD card clocked off CA2 and make reads from Port A clock the SD card I should be able to just do this:

Code:
  lda VIA_PORTA                   ; read next bit
  ROL
  ROL RLECount
;7 more times for a byte


This takes 88 cycles a byte (edited).
Pretty big improvement!

I think an interesting option could be putting MISO on bit 0 of port A, and the clock on CA2 automatically pulsing low on each read of port A. Then you can read a bit like this:

Code:
    ASL
    ora VIA_PORTA


i.e. shift A left, then OR in the bottom bit of port A, with other bits set as zero outputs or inputs pulled low. The "ora" operation will automatically send CA2 low and then high again, clocking in the next bit.

The next thing I wanted to suggest was thinking of the data as a bitstream rather than a byte stream. You're in control of the compression algorithm and playback code, and the SD card is just sending you a series of bits, so the byte boundaries don't actually matter. You could for example use '0' to represent that the next operation is a skip and '1' to represent that it's a sequence of data to write at the current location. Follow that with a run length or skip amount, possibly using a variable length encoding to support short or long runs efficiently, and after that if the initial bit was a 1, a stream of pixel data. Each pixel only requires one bit, so you repeat the above two instructions to read the next bit and then write it out to the video memory. In fact if you only care about a single bit, you don't need the ASL - that's only needed if you want to collect multiple bits one after another. So you can just "lda VIA_PORTA" and then branch based on whether it was zero or not.
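Here is a small round-trip sketch of the bitstream idea. The exact format is hypothetical. Each operation starts with one flag bit (0 = skip, 1 = write). The length follows as an Elias gamma code, which is one of many possible variable-length codes and requires lengths of at least 1. Writes then carry one bit per pixel. Bits are held in a Python list here; on the 6502 each one would come from a single port read.

```python
def gamma_encode(n, bits):
    """Append the Elias gamma code for n >= 1: (len - 1) zeros, then n in binary."""
    binary = bin(n)[2:]
    bits.extend([0] * (len(binary) - 1))
    bits.extend(int(b) for b in binary)

def gamma_decode(bits, pos):
    """Read one Elias gamma code starting at pos; return (value, new_pos)."""
    zeros = 0
    while bits[pos] == 0:
        zeros += 1
        pos += 1
    value = 0
    for _ in range(zeros + 1):
        value = (value << 1) | bits[pos]
        pos += 1
    return value, pos

def encode_ops(ops):
    """ops: list of ('skip', n) or ('write', [pixel bits])."""
    bits = []
    for kind, payload in ops:
        if kind == "skip":
            bits.append(0)
            gamma_encode(payload, bits)
        else:
            bits.append(1)
            gamma_encode(len(payload), bits)
            bits.extend(payload)          # one bit per pixel
    return bits

def decode_ops(bits):
    ops, pos = [], 0
    while pos < len(bits):
        flag = bits[pos]
        length, pos = gamma_decode(bits, pos + 1)
        if flag == 0:
            ops.append(("skip", length))
        else:
            ops.append(("write", bits[pos:pos + length]))
            pos += length
    return ops

ops = [("skip", 37), ("write", [1, 0, 1]), ("skip", 1)]
bits = encode_ops(ops)
print(len(bits), decode_ops(bits) == ops)  # 21 True
```

Short runs cost only a few bits, while a long skip such as 37 still fits in 12 bits including the flag.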

Another option for the pixel data is to output a mask of pixels whose values need to change - so a zero bit means no change to the pixel, and a set bit means you swap the state of the pixel. But it may be faster to just read more data from the SD card rather than read-modify-write the video memory.
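The change-mask variant can be sketched with 1-bit pixels. The mask is the XOR of the old and new frames, and applying it toggles exactly the pixels that differ. On the real byte-per-pixel screen, 'toggle' would mean a read-modify-write per set bit: load the pixel, EOR with the black/white difference ($3F if black is $00 and white is $3F, an assumption), and store it back. That is the extra cost mentioned above.

```python
def change_mask(prev, curr):
    """1 where a 1-bit pixel flips between frames, 0 where it stays the same."""
    return [p ^ c for p, c in zip(prev, curr)]

def apply_mask(screen, mask):
    """Read-modify-write: toggle every pixel whose mask bit is set."""
    return [s ^ m for s, m in zip(screen, mask)]

prev = [0, 1, 1, 0]
curr = [1, 1, 0, 0]
mask = change_mask(prev, curr)
print(mask, apply_mask(prev, mask) == curr)  # [1, 0, 1, 0] True
```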

I'll think about it more later, I just wanted to share those ideas for now - I think the bitstream approach could have a big impact.

Edit: I had a play with this, and this sort of code seems to work - at least I was able to initialise the card this way, I've left the details of the init sequence out though as they're as in your code above:

Code:
; SD card interface test
;
; using bitbanging for NormalLuser
;
; SD_CS = CB2
; SD_SCK = CA2
; SD_MISO = PA0
; SD_MOSI = PB7

SD_BIT_MISO = $1    ; on Port A
SD_BIT_MOSI = $80   ; on Port B

; The low nybble sets CA2 (SD_SCK) in pulse output mode
;
; The high nybble sets CB2 (SD_CS) to output high or low
; CS is active-low, so "on" means low and "off" means high
;
PCR_CS_OFF = $ea  ; CB2 high (CS deasserted), CA2 pulse output mode
PCR_CS_ON = $ca   ; CB2 low (CS asserted), CA2 pulse output mode

...

sd_init:
    ; Set CA2 to output pulses on reads from Port A, CS unasserted
    lda #PCR_CS_OFF : sta VIA_PCR

    ; Disable latching and shift register
    stz VIA_ACR

    ; Set Port A to input on PA0, output on other pins; and output zero
    stz VIA_PORTA
    lda #$ff-SD_BIT_MISO : sta VIA_DDRA

    ; Set Port B to output on PB7 (MOSI), initially outputting high
    lda #SD_BIT_MOSI : sta VIA_PORTB : sta VIA_DDRB

    ; First we have to let the card initialise by sending it at
    ; least 80 clock pulses with CS and MOSI high
    ;
    ; We do this by sending 10 bytes of $ff - 80 bits - with
    ; CS unasserted

    lda #PCR_CS_OFF : sta VIA_PCR

    ldx #16 ; a few extras
initloop:
    jsr sd_readbyte
    dex
    bne initloop
...

sd_readbyte:
    ; Receive a byte into A, high bit first
    lda #SD_BIT_MOSI : sta VIA_PORTB ; set MOSI

    bit VIA_PORTA ; toggle the clock once at the start

    lda VIA_PORTA : asl
    ora VIA_PORTA : asl
    ora VIA_PORTA : asl
    ora VIA_PORTA : asl
    ora VIA_PORTA : asl
    ora VIA_PORTA : asl
    ora VIA_PORTA : asl
    ora VIA_PORTANH       ; read last bit without causing a clock pulse

    rts

sd_writebyte:
    ; Send A, high bit first
    sta VIA_PORTB : bit VIA_PORTA : asl
    sta VIA_PORTB : bit VIA_PORTA : asl
    sta VIA_PORTB : bit VIA_PORTA : asl
    sta VIA_PORTB : bit VIA_PORTA : asl
    sta VIA_PORTB : bit VIA_PORTA : asl
    sta VIA_PORTB : bit VIA_PORTA : asl
    sta VIA_PORTB : bit VIA_PORTA : asl
    sta VIA_PORTB : bit VIA_PORTA

    rts

