6502 MIDI file player

lordbubsy · Post by **lordbubsy** » Thu Dec 27, 2018 12:23 pm

6502 MIDI file player

I’ve build a MIDI interface adapter for my single board computer. The output TXDB of my SC26C92 DUART is connected to the TTL MIDI input of a General Midi wavetable synthesizer (Roland SCB-55). I have 500kHz connected to IP5 which generates the needed 31,250 baud. Helpful schematics can be found at:
http://serdaco.com/files/BUILDINSTRUCTI ... LBOARD.pdf

The hardware and software is working, and I can send midi messages to serial port B.
Ultimately I’d like to program an interrupt based MIDI file player which plays standard GM (General MIDI) files.

I researched how MIDI file format 0 works and want to tackle the task in separate steps.

• A conversion program from VLQ (variable length quantity) to 32-bit value.
• A zero page pointer to grab MIDI data.
• A way to separate delta times and MIDI events.
• A lookup table to determine how many bytes each MIDI event takes.
• A counter to compare with delta times and execute MIDI events.
• Send the appropriate bytes to UART B at the right times.
• Make the whole process interrupt driven.

The first step: decoding variable length quantity (for MIDI) in W65C02S assembly.

Pseudo code:

1. Initialize the variable which will hold the value. Set it to 0. We'll call this variable 'result'.
2. Read the next byte of the Variable Length quantity from the MIDI file.
3. Shift all of the bits in 'result' 7 places to the left. (ie, Multiply 'result' by 128).
4. Logically OR 'result' with the byte that was read in, but first mask off bit #7 of the byte. (ie, AND the byte with hexadecimal 7F before you OR with 'result'. But make sure you save the original value of the byte for the test in the next step).
5. Test if bit #7 of the byte is set. (ie, Is the byte AND hexadecimal 80 equal to hexadecimal 80)? If so, loop back to step #2. Otherwise, you're done, and 'result' now has the appropriate value.

Code: Select all

        !cpu    65C02

*       =       $0400

;variables
result
        !byte   $00,$00,$00,$00 ;4 byte value

start
        stz     result          ;initialize result
        stz     result+1
        stz     result+2
        stz     result+3

        ldx     #0              ;initialize byte counter
lb1     ldy     #7
lb2     asl     result          ;shift all bits to the left
        rol     result+1        ;preserving the carry
        rol     result+2
        rol     result+3
        dey                     ;7 times?
        bne     lb2             ;no? repeat
        lda     midi,x          ;load byte from midi data (msb first)
        bpl     lb3             ;finish reading bytes if bit 7=0
        and     #$7f            ;mask 7th bit
        ora     result          ;add the value to the 7 freed bits of result
        sta     result
        inx
        bra     lb1             ;get the next byte from midi
lb3     ora     result          ;add the value to the 7 freed bits of result
        sta     result
        brk
*       =       $0500
midi
        !byte   $81,$80,$00,$00 ;81,80,00 vlq equal to $4000

I’ve tested with various values, and it seems to work as it should. But perhaps the routine can be optimized? I want to save zero page space, so I didn’t put the variables there.

Sources:
https://web.archive.org/web/20051129113 ... e/vari.htm

http://www.music.mcgill.ca/~ich/classes ... ormat.html

Chromatix · Post by **Chromatix** » Thu Dec 27, 2018 7:31 pm

Two ideas immediately jump out at me:

First, you are doing a multiple-precision left shift by 7 bits using a loop of shifts by one bit. You can instead do it by shifting one whole byte to the left and one bit to the right, without the loop.

Code: Select all

ROR result+3
LDA result+2
ROR A
STA result+3
LDA result+1
ROR A
STA result+2
LDA result
ROR A
STA result+1
LDA #0
ROR A
STA result

The above code may look longer and slower than four ROLs, but it only has to be executed once instead of seven times. You can therefore also delete the three instructions for handling that loop.

The other thing I would try to do is move that multi-precision shift off the top of the loop (where it's executed unconditionally, including on the first iteration where you know the contents of the variable to be zero) and to the bottom where you know that a shift is actually needed. So try this:

Code: Select all

	STZ result
	STZ result+1
	STZ result+2
	STZ result+3
	LDX #0
lb1:
	LDA midi,X
	BPL lb3
	AND #$7F
	TAY
	ROR result+3
	LDA result+2
	ROR A
	STA result+3
	LDA result+1
	ROR A
	STA result+2
	TYA
	ORA result
	ROR A
	STA result+1
	LDA #0
	ROR A
	STA result
	INX
	BRA lb1
lb3:
	ORA result
	STA result
	BRK

This reduces the total number of multi-precision shifts required by one. In the probably-common case where the value being decoded is less than 128, you don't need any at all, which is a very big timesaver in percentage terms - from 221 cycles in your code, down to 17 cycles in mine (measured over LDX to final STA).

lordbubsy · Post by **lordbubsy** » Thu Dec 27, 2018 11:04 pm

I got both suggestions working:

Code: Select all

start
        stz     result          ;initialize result
        stz     result+1
        stz     result+2
        stz     result+3

        ldx     #0              ;initialize byte counter
lb1     lda     midi,x          ;load byte from midi data (msb first)
        bpl     lb3             ;finish reading bytes if bit 7=0
        and     #$7f            ;mask 7th bit
        ora     result          ;add the value to result
        sta     result
        ror     result+3        ;bring bit 0 to carry
        lda     result+2        ;load previous byte
        ror                     ;shift to the right and put carry in 7th bit
        sta     result+3        ;put it in byte to the left
        lda     result+1
        ror
        sta     result+2
        lda     result
        ror
        sta     result+1
        lda     #0
        ror
        sta     result
        inx
        bra     lb1             ;get the next byte from midi
lb3     ora     result          ;add the value to the 7 freed bits of result
        sta     result
        brk

I’m not sure about the…

Code: Select all

   ASL A
   TAY
.
.
.
   TYA
   ROR A

…though.

Chromatix · Post by **Chromatix** » Thu Dec 27, 2018 11:06 pm

I actually edited the code to improve it after my initial post. Do try the revised version.

lordbubsy · Post by **lordbubsy** » Thu Dec 27, 2018 11:30 pm

I will, tomorrow.
How do you count the cycles? With Kowalski Simulator?

Chromatix · Post by **Chromatix** » Thu Dec 27, 2018 11:39 pm

I just look up how many cycles each instruction uses and add them up. It's somewhat easier on a simple centrally-controlled CPU like the 6502 than on a modern superscalar CPU. There's a table in the WDC datasheet, or: http://www.oxyron.de/html/opcodesc02.html

I've memorised a lot of the common ones; most "implicit" instructions requiring no memory accesses take 2 cycles, most absolute and absolute-indexed ones take 4, and branches take 3 cycles if branch taken, 2 if not taken. Read-modify-write ops take 6 cycles, as do JSR and RTS.

lordbubsy · Post by **lordbubsy** » Thu Dec 27, 2018 11:56 pm

OK, I thought you use some “magic” tool to quickly measure the cycle times.

Granted, there aren’t that many instructions, but still…

I tried it out, and it worked. Your last edited version is pretty much the same as the one that I got working. The only difference is that you use TAY to store ‘result’ while I store the accu directly into result, and then retrieve it later. But TAY is quicker.

Thanks for the improvements, it’s a lot quicker now.

lordbubsy · Post by **lordbubsy** » Sat Dec 29, 2018 12:51 pm

I’ve come to the point to decoding various events that can occur in a MIDI file. There are a fairly amount of documentations on the net, but this is one I chose as a reference.
http://somascape.org/midi/tech/mfile.html#events

I came up with this code:

Code: Select all

;-------;-------;-------;-------;-------;-------;-------;-------
DecodeEvent
                lda     (mfp)           ;load byte from midi data (msb first)
                cmp     #$ff            ;
                beq     MetaEvent_
                cmp     #$f0            ;
                beq     SysexEvent_
                cmp     #$f7            ;
                beq     SysexEvent_
                pha                     ;preserving the identifier
                lsr                     ;for midi events only the left nibble is needed
                lsr
                lsr
                lsr
                cmp     #$08            ;
                beq     NoteOff_
                cmp     #$09            ;
                beq     NoteOn_
                cmp     #$0a            ;
                beq     PolyPressure_
                cmp     #$0b            ;
                beq     Controller_
                cmp     #$0c            ;
                beq     ProgChange_
                cmp     #$0d            ;
                beq     ChanPressure_
                cmp     #$0e            ;
                beq     PitchBend_

                ldy     #$55            ;error code for debugging
                brk

MetaEvent_      jmp     MetaEvent
SysexEvent_     jmp     SysexEvent
NoteOff_        jmp     NoteOff
NoteOn_         jmp     NoteOn
PolyPressure_   jmp     PolyPressure
Controller_     jmp     Controller
ProgChange_     jmp     ProgChange
ChanPressure_   jmp     ChanPressure
PitchBend_      jmp     PitchBend

;-------;-------;-------;-------;-------;-------;-------;-------
MetaEvent
                rts
;-------;-------;-------;-------;-------;-------;-------;-------
SysexEvent
                rts
;-------;-------;-------;-------;-------;-------;-------;-------
NoteOff
                rts
;-------;-------;-------;-------;-------;-------;-------;-------
NoteOn
                rts
;-------;-------;-------;-------;-------;-------;-------;-------
PolyPressure
                rts
;-------;-------;-------;-------;-------;-------;-------;-------
Controller
                rts
;-------;-------;-------;-------;-------;-------;-------;-------
ProgChange
                rts
;-------;-------;-------;-------;-------;-------;-------;-------
ChanPressure
                rts
;-------;-------;-------;-------;-------;-------;-------;-------
PitchBend
                rts

It’s just a start and doesn’t cover every event, but is this an adequate and readable way to go (in programming technique)?

Chromatix · Post by **Chromatix** » Sat Dec 29, 2018 9:05 pm

It'll work, but a jump table is often a better way to deal with tokenised dispatch. I would classify meta/sysex events as being less common than note on/off events, so specialising them that early doesn't help. Also you might as well reload the accumulator from the original location rather than saving it on the stack; it's 2 cycles quicker.

Code: Select all

DecodeEvent:
	LDA (mfp)
	AND #$F0
	LSR
	LSR
	LSR	; yes only three, because each jump address is 2 bytes
	TAX
	LDA (mfp)
	JMP (DecodeTable,X)
DecodeTable:
	.word	Dummy
	.word	Dummy
	.word	Dummy
	.word	Dummy
	.word	Dummy
	.word	Dummy
	.word	Dummy
	.word	Dummy
	.word	NoteOff
	.word	NoteOn
	.word	PolyPressure
	.word	Controller
	.word	ProgChange
	.word	ChanPressure
	.word	PitchBend
	.word	MetaSysex
[...etc...]

Another option is to organise your decoding as a binary tree instead of a linear search. In this way you can dispatch a 16-way table using 4 tests and branches consistently, instead of 1-16 of them. The first branch can be done without an explicit test instruction, because LDA sets the N flag to bit 7 of the value loaded. Use BIT and BNE/BEQ to test the remaining bits.

lordbubsy · Post by **lordbubsy** » Sun Dec 30, 2018 12:47 pm

I most likely have to consider your suggestions. MIDI type 0 files are playing now, ignoring meta and sysex events. I noticed that a more complex MIDI file stalls when there are a lot of messages with a delta time of 0.
I have attached the source file if someone is interested. The player is barely working though. The only system specific routines are those for the DUART I/O and timer interrupt.

lordbubsy · Post by **lordbubsy** » Mon Dec 31, 2018 1:55 pm

I’ve implemented the jump table, which results in a cleaner and faster code and works fine.

The midi file plays in an interrupt routine. Till now I’m entering the time (microseconds) per interrupt manually. To determine the interrupt time I have to calculate it from two variable, given by the midi file itself.
a. uspqn, from a tempo event at the beginning of a track (at time = 0), which is in microseconds per quarter note and covers a maximum of 24 bits.
b. tpqn, from the last two bytes of header chunk, which is in ticks per quarter note and covers a maximum of 15 bits.

What I need is to calculate the counter timer preset value tcp, which covers a maximum of 16 bits. What I did is to layout the formula so that it’s in a computer friendly form, i.e. big integers.

Formulas:
baud = tpqn / uspqn
ctp = 3.6864MHz / (2*baud)
ctp = (3686400 x uspqn) / (2000000 x tpqn)
ctp = (36864 x uspqn) / (20000 x tpqn)
Bytes needed:
2bytes x 3bytes / 2bytes x 2bytes
5bytes / 4bytes

To calculate this in ML I need a multiplication of 2bytes x 3bytes with a product of 5bytes,
and a multiplication of 2bytes x 2bytes with a product of 4bytes,
and division of 5bytes / 4bytes, the quotient has to be 16 bits.

I’ve found a multiplication routine here:
http://codebase64.org/doku.php?id=base: ... it_product
How can it be modified to 16 x 24 bits?
And I found a division routine here:
http://codebase64.org/doku.php?id=base: ... bit_result
How can it be modified to 40 / 32 bits?

I’d like to focus first on the multiplication.

Code: Select all

;16-bit multiply with 32-bit product 
;took from 6502.org
 
multiplier      = $f7 
multiplicand    = $f9 
product         = $fb 
 
mult16          lda     #$00
                sta     product+2       ; clear upper bits of product
                sta     product+3 
                ldx     #$10            ; set binary count to 16 
shift_r         lsr     multiplier+1    ; divide multiplier by 2 
                ror     multiplier
                bcc     rotate_r 
                lda     product+2       ; get upper half of product and add multiplicand
                clc
                adc     multiplicand
                sta     product+2
                lda     product+3 
                adc     multiplicand+1
rotate_r        ror                     ; rotate partial product 
                sta     product+3 
                ror     product+2
                ror     product+1 
                ror     product 
                dex
                bne     shift_r 
                rts

lordbubsy · Post by **lordbubsy** » Mon Dec 31, 2018 4:08 pm

OK, this doesn’t seem to work, so I’ll take an example from Leventhal’s book.

Code: Select all

;8-bit multiply with 16-bit product
;took from 6502 Assembly Language Programming by Lance A. Leventhal

        !cpu    w65c02

*       =       $0400

multiplicand    = $00   ;8 bit
multiplier      = $01   ;8 bit 
product         = $02   ;16 bit
;               = $03

                lda     #$00            ;lsb's of product= zero
                sta     product+1       ;msb's of product= zero
                ldx     #$08            ;number of bits in multiplier= 8
shift
                asl                     ;shift product left one bit (lsb)
                rol     product+1       ;shift product left one bit (msb) with carry
                asl     multiplier      ;shift multiplier left
                bcc     chcnt           ;no addition if next bit is zero
                clc                     ;add multiplicand to product, lsb with carry 0
                adc     multiplicand
                bcc     chcnt
                inc     product+1       ;with carry if necessary
chcnt
                dex                     ;loop until 8 bits are multiplied
                bne     shift
                sta     product         ;store lsb's of product
                brk

Now I'll try to expand this to 16 x 24 bit.

Dr Jefyll · Post by **Dr Jefyll** » Mon Dec 31, 2018 4:26 pm

lordbubsy wrote:

I need a multiplication of 2bytes x 3bytes with a product of 5bytes

Is speed important? In this post I talk about a speedup to an already-efficient multiply routine by Bruce ( dclxvi ).

I mention it here because the nature of the speedup is such that it becomes increasingly effective as the word size increases. As written, it's just 2bytes x 2bytes. But the technique ( illustrated in the animation here ) is even more attractive for your application, which includes a 3byte multiplicand.

Nice to see someone wrassling with some MIDI, BTW !

cheers
Jeff

lordbubsy · Post by **lordbubsy** » Mon Dec 31, 2018 5:09 pm

No, speed is absolutely not an issue for now, as it is just performed once before playing the midi file.
Perhaps at a later time it could be attractive to speed it up (to accommodate tempo changes during playing the midi file), but for now I’m having a hard time to wrap my head around the 8 x 8 bit multiplication shown above, and expand it to 16 x 8 bit, then 16 x 16 bit and eventually 24 x 16 bit. I know these are just basics, but I’m having a good time trying to figure it out.

At the moment I’m trying to expand it to 16 x 8 bit, but I can’t get it to work.

Code: Select all

        !cpu    w65c02

*       =       $0400

multiplicand    = $00   ;16 bit
multiplier      = $02   ;8 bit 
product         = $03   ;24 bit

                lda     #$00            ;product = zero
                sta     product+1       ;product = zero
                sta     product+2       ;product = zero
                ldx     #$08            ;number of bits in multiplier = 8
shift
                asl                     ;shift product left one bit (lsb)
                rol     product+1       ;shift product left one bit (msb) with carry
                rol     product+2       ;shift product left one bit (msb) with carry

                asl     multiplier      ;shift multiplier left
                bcc     chcnt           ;no addition if next bit is zero

                clc                     ;add multiplicand to product, lsb with carry 0
                adc     multiplicand
                sta     product
                lda     product+1
                adc     multiplicand+1
                sta     product+1

                bcc     chcnt
                inc     product+2       ;with carry if necessary
chcnt
                dex                     ;loop until 8 bits are multiplied
                bne     shift
                sta     product         ;store lsb's of product
                brk

Chromatix · Post by **Chromatix** » Mon Dec 31, 2018 11:53 pm

First I'll observe that 36864/20000 simplifies losslessly to 1152/625, dividing both sides by 32. This is likely to simplify the programming problem by making the intermediate values smaller.

To implement the multiplications, observe that 1152 = $480 which has only two bits set (2^10 + 2^7), and 625 is $271 which means it can be written as (2^9 + 2^7 - 2^4 + 2^0). This is a big advantage of multiplying by constants, as you can easily write a series of shifts and adds (or subtractions) without branching. Since the numerator multiplicand has a lot of unset bits to the right, you can simply not bother shifting those up to begin with, as long as you remember to compensate later; this reduces the number of bytes needed to store the scaled uspqn to 4. So the equation expands to (in pseudo-C syntax):

ctp = (((uspqn << 3) + uspqn) / ((tpqn << 9) + (tpqn << 7) - (tpqn << 4) + tpqn)) << 7;

Don't actually write the above in C, because it'll lose low bits of precision in the intermediates, since C doesn't do fixed-point arithmetic. In assembly, we *can* produce the extra low bits during the division, and then (as we did earlier) shift left 7 by shifting right 1. In fact we can produce even more bits, and use them to "dither" the timer for greater long-term timing precision.

The division routine now needs to take 32-bit unsigned numerator and denominator, and I think should produce a 16.16 fixed-point result (also unsigned) and detect overflow. Overflow occurs when the denominator, shifted 16 places left, is still smaller than the (unshifted) numerator, which in turn is impossible if the denominator is larger than 16 bits itself. We can test for this quite easily before starting the division proper:

Code: Select all

uspqn = $1A	; uint24
tpqn = $1D	; uint16 but $1F is used as temporary

numer = $20	; uint32
denom = $24	; uint32
quot = $28	; u16.16
remain = $2C	; workspace

scaling1:
	; numer = uspqn * 9
	LDA uspqn+0
	ASL A
	STA numer+0
	LDA uspqn+1
	ROL A
	STA numer+1
	LDA uspqn+2
	ROL A
	STA numer+2
	LDA #0
	ROL A
	STA numer+3
	ASL numer+0
	ROL numer+1
	ROL numer+2
	ROL numer+3
	ASL numer+0
	ROL numer+1
	ROL numer+2
	ROL numer+3
	LDA numer+0
	ADC uspqn+0
	STA numer+0
	LDA numer+1
	ADC uspqn+1
	STA numer+1
	LDA numer+2
	ADC uspqn+2
	STA numer+2
	BCC scaling2
	INC numer+3

scaling2:
	; denom = tpqn * $271
	; init denom with (tpqn << 7)
	STZ denom+3
	LDA tpqn+1
	LSR A
	STA denom+2
	LDA tpqn+0
	ROR A
	STA denom+1
	LDA #0
	ROR A
	; add raw tpqn
	ADC tpqn+0
	STA denom+0
	LDA denom+1
	ADC tpqn+1
	STA denom+1
	BCC :+
	INC denom+2
:	; add (tpqn << 9)
	ASL tpqn+0
	ROL tpqn+1
	LDA #0
	ROL A
	STA tpqn+2
	LDA denom+1
	ADC tpqn+0
	STA denom+1
	LDA denom+2
	ADC tpqn+1
	STA denom+2
	LDA denom+3
	ADC tpqn+2
	STA denom+3
	; subtract (tpqn << 4)
	ASL tpqn+0
	ROL tpqn+1
	ROL tpqn+2
	ASL tpqn+0
	ROL tpqn+1
	ROL tpqn+2
	ASL tpqn+0
	ROL tpqn+1
	ROL tpqn+2
	SEC
	LDA denom+0
	SBC tpqn+0
	STA denom+0
	LDA denom+1
	SBC tpqn+1
	STA denom+1
	LDA denom+2
	SBC tpqn+2
	STA denom+2
	BCS divison
	DEC denom+3

division:
	; check for overflow before actually dividing
	LDA denom+3
	BNE no_overflow
	LDA denom+2
	BNE no_overflow
	LDA numer+3
	CMP denom+1
	BCS overflow
	BMI no_overflow
	LDA numer+2
	CMP denom+0
	BMI no_overflow

overflow:
	; handle by returning the maximum valid counter value
	LDA #$FF
	STA quot+3
	STA quot+2
	STZ quot+1
	STZ quot+0
	RTS

no_overflow:
	; initialise remainder workspace with shifted-down numerator
	STZ remain+3
	STZ remain+2
	LDA numer+3
	STA remain+1
	LDA numer+2
	STA remain+0

I'll let you carry on from there.

6502 MIDI file player

6502 MIDI file player

Re: 6502 MIDI file player

Re: 6502 MIDI file player

Re: 6502 MIDI file player

Re: 6502 MIDI file player

Re: 6502 MIDI file player

Re: 6502 MIDI file player

Decoding various events in a MIDI file

Re: 6502 MIDI file player

Re: 6502 MIDI file player

Re: 6502 MIDI file player

Re: 6502 MIDI file player

Re: 6502 MIDI file player

Re: 6502 MIDI file player

Re: 6502 MIDI file player