FT816 Core

Rob Finch · Post by **Rob Finch** » Thu Nov 27, 2014 8:37 am

Work has been started on a 65c816 compatible core called FT816. I've managed to get a small test program working for it.
Sources for the '816 core are in Github: http://github.com/robfinch/Cores/tree/m ... T816/trunk

There is a sample LED/Switch test system called FT816Sys.v as a top module.
Next level is an mpu module which has chip select decoding, called FT816mpu.v
Next level down is the cpu itself, FT816.v

Code:

	cpu		W65C816S
	.org	$E000

start:
	clc					; switch to '816 mode
	xce
	rep		#$30		; set 16 bit regs & mem
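	ndx 	16			; tell the assembler about the register widths
	mem		16
	lda		#$0070		; program chip selects for I/O
	sta		$F000		; at $007000
	lda		#$0071		; second chip select, presumably at $007100
	sta		$F002
	ldy		#$0000
.st0001:
	ldx		#$0000
.st0002:
	inx					; delay: spin until X wraps back to zero
	bne		.st0002
	lda		$7100		; copy the input port (presumably the switches) ...
	sta		$7000		; ... to the output port (presumably the LEDs)
	iny
	bra		.st0001

	.org	$FFFC		; reset vector
	dw		$E000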

MichaelM · Post by **MichaelM** » Thu Nov 27, 2014 12:54 pm

Rob:

Thanks for sharing. Had a look at the project. There's certainly a lot of work that's already been done. I'm certainly looking forward to reading about your progress on this project.

Rob Finch · Post by **Rob Finch** » Thu Nov 27, 2014 11:45 pm

Quote:

There's certainly a lot of work that's already been done.

I was able to make use of the '816 portion of a previous core (RTF65003) in order to speed development. Large parts of the core were 'already done' and working. One of the nice things about working in an HDL rather than with real chips, is it's easier to cut-and-paste. One can leverage the use of existing projects.

The core is fairly large, but it should fit into an slx9 or xc3s500 part (it's about 5000 LUTs). It's also fairly fast. I've got it running @64MHz. It outputs chip selects for slower memory and I/O parts, where the speed of the addressed area is controllable. The chip-select speed options are 1/32 and 1/4 of the clock rate.

MichaelM · Post by **MichaelM** » Fri Nov 28, 2014 2:47 am

Rob:

I noticed that your approach made many of the address and ALU calculations in parallel. From that I gathered that you were optimizing for speed rather than area/size. I think that you are succeeding in meeting your apparent goal by getting the part to operate around 64MHz. I think it will make the core quite attractive, especially for the speed hounds among us.

I also saw that you were using a shift register for the clock divider. That's a nice touch. It is a simple and effective approach to get your 1/32 and 1/4 clocks for the addressed devices. It is certainly easier to maintain the speed of the core by using shift registers instead of counters for dividers. The extra logic required for counting and decoding is far less efficient than your shift register approach.
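
A minimal behavioural sketch of the shift-register idea, in Python rather than Verilog. It assumes a one-hot ring counter: a single 1 circulates through an N-bit register and the slow-clock enable is just one tap, so no adder or comparator sits in the path. The name `ring_divider` and the choice of tap 0 are illustrative and are not taken from the FT816 source.

```python
def ring_divider(length, cycles):
    """Model a one-hot ring counter of `length` bits for `cycles` clock ticks.

    Returns one 0/1 enable value per tick; the enable is bit 0 of the register.
    """
    reg = 1                       # one-hot start state: only bit 0 set
    mask = (1 << length) - 1
    enables = []
    for _ in range(cycles):
        enables.append(reg & 1)
        # rotate left by one position; the top bit wraps around to bit 0
        reg = ((reg << 1) | (reg >> (length - 1))) & mask
    return enables


quarter = ring_divider(4, 16)     # 1/4 rate: one enable pulse every 4 ticks
slow = ring_divider(32, 64)       # 1/32 rate: one enable pulse every 32 ticks
print(quarter)
print(sum(slow), "pulses in 64 ticks")
```

In hardware each stage is a single flip-flop fed by its neighbour, which is why this style tends to hold up well at high clock rates.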

I noticed that you made an update earlier today. It appeared related to the return address differences between BRK traps, interrupts, and subroutine calls. I had thought about making my 65C02 core compatible with the idiosyncratic behavior of the 6502/65C02 for subroutine calls, BRK traps, and interrupts. Instead, I opted to make the return addresses for all three the same so that my RTS and RTI microroutines both increment the return address by 1 before the fetch of the instruction.

Is that what you were attempting to do, and if so, why not make RTS and RTI behave in a common manner with respect to the return address?
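
For reference, a small Python model of the stock 6502/65C02 return-address behaviour being discussed. The function names are made up for illustration; the arithmetic is the documented behaviour: JSR pushes the address of its own last byte and RTS adds 1, while BRK and hardware interrupts push an address that RTI uses unchanged.

```python
def jsr_pushed(opcode_addr):
    # JSR abs is 3 bytes long; the address of its LAST byte is pushed
    return (opcode_addr + 2) & 0xFFFF

def rts_target(pulled):
    # RTS adds 1 to the pulled address before fetching
    return (pulled + 1) & 0xFFFF

def brk_pushed(opcode_addr):
    # BRK pushes opcode address + 2, skipping the signature byte
    return (opcode_addr + 2) & 0xFFFF

def irq_pushed(next_instr_addr):
    # a hardware interrupt pushes the address of the next instruction
    return next_instr_addr & 0xFFFF

def rti_target(pulled):
    # RTI resumes at the pulled address unchanged
    return pulled & 0xFFFF


print(hex(rts_target(jsr_pushed(0xE000))))   # resumes after the 3-byte JSR
print(hex(rti_target(brk_pushed(0xE000))))   # resumes after BRK + signature
print(hex(rti_target(irq_pushed(0xE010))))   # resumes at the interrupted instruction
```

The unified scheme described above would instead have all three push "return address minus 1" and have both RTS and RTI add 1.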

Rob Finch · Post by **Rob Finch** » Fri Nov 28, 2014 6:33 am

Quote:

I noticed that your approach made many of the address and ALU calculations in parallel. From that I gathered that you were optimizing for speed rather than area/size. I think that you are succeeding in meeting your apparent goal by getting the part to operate around 64MHz. I think it will make the core quite attractive, especially for the speed hounds among us.

Yes, that's a speed optimization, and also a coding simplicity optimization (development time). The core could probably be made more compact by making better reuse of some of the components.

Quote:

Is that what you were attempting to do, and if so, why not make RTS and RTI behave in a common manner with respect to the return address?

I was worried about breaking existing software. I think I've got the behaviour the same as the 6502/65816. There are some pieces of software that pull and increment the return address. For instance to access inline parameters. It cost extra logic to remain 100% software compatible. One thing that's different is the PC increment. I think on the '816 only the low order 16 bits increment. In FT816 all 24 bits increment. This means that software that relies on wrapping around at the end of a bank is broken.
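
The PC difference can be sketched in a few lines of Python. The function names are hypothetical; the first models the 65816, where the program counter wraps within the current bank, and the second models the flat 24-bit increment described for FT816.

```python
def next_pc_65816(pbr, pc):
    # only the 16-bit PC increments; the program bank register is untouched
    return pbr, (pc + 1) & 0xFFFF

def next_pc_flat24(pbr, pc):
    # bank and PC treated as one 24-bit counter
    addr = (((pbr << 16) | pc) + 1) & 0xFFFFFF
    return addr >> 16, addr & 0xFFFF


print(next_pc_65816(0x01, 0xFFFF))   # stays in bank 1, PC wraps to 0
print(next_pc_flat24(0x01, 0xFFFF))  # carries into bank 2
```

The two only disagree on the last byte of a bank, so only code that runs off the end of a bank notices.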

Rob Finch · Post by **Rob Finch** » Sat Nov 29, 2014 7:32 am

Numerous bug fixes have taken place over the past couple of days. A sign the core is still in early development. But it is running code. An attempt to get strings to display onscreen is currently in progress. Clearscreen works but strings are coming out partially garbled.

Triple byte incrementing pointers are on the table tonight. On the 816 when the memory is set to 16 bits, data operations become 16 bit. That includes the memory based operations like INC and ROR. A nice to have feature would be triple-byte increments (24 bit) for zero page pointers. The question is how to implement. My thought is to have a range of zero page reserved that automatically responds to increment and decrement operations with 24 bit operations rather than 16 bit ones. Suppose the range was $20 to $2F. INC $20 would increment across three bytes ($20,$21 and $22), rather than two. It sounds simple to do, but it requires manipulating 24 bit values in the core. It might be nice to have 24 bit shifts / rotates as well.

Dr Jefyll · Post by **Dr Jefyll** » Sat Nov 29, 2014 11:29 pm

Rob Finch wrote:

It sounds simple to do, but it requires manipulating 24 bit values in the core.

When you mentioned triple-byte increments, I misunderstood, maybe, and pictured an operation performed byte-serially in three pieces. Doing it that way means if there's no carry you can take an early exit and save cycles (or one, at least). If the triple-byte value lives in 8-bit external RAM, that idea seems like a win. But I think you're anticipating Direct Page will be in on-chip RAM, is that right? So you can grab 24 bits at once? If 16 bits at once is easier, you could still break the 24-bit operation down into two pieces.
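
A minimal Python sketch of the byte-serial approach with an early exit, assuming a little-endian 24-bit value in byte-wide memory. `inc24_byte_serial` is a hypothetical helper, not anything from the core; it returns how many bytes had to be touched so the cycle saving is visible.

```python
def inc24_byte_serial(mem, addr):
    """Increment the little-endian 24-bit value at mem[addr..addr+2] in place.

    Returns the number of bytes that had to be read and written.
    """
    touched = 0
    for i in range(3):
        touched += 1
        mem[addr + i] = (mem[addr + i] + 1) & 0xFF
        if mem[addr + i] != 0:
            break                 # no carry out of this byte: early exit
    return touched


mem = bytearray(0x100)
mem[0x20:0x23] = bytes([0xFF, 0xFF, 0x00])
print(inc24_byte_serial(mem, 0x20), mem[0x20:0x23].hex())
```

Only one increment in 256 carries out of the low byte, so on average the early exit skips the upper two bytes almost every time.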

But I wonder whether, deep in its heart, FT816 wants to be a native 24-bit machine. (Or 32/24-bit.) I mean with downgraded 8/16-bit modes to match the 65816.

-- Jeff

Rob Finch · Post by **Rob Finch** » Sun Nov 30, 2014 2:21 am

Quote:

When you mentioned triple-byte increments, I misunderstood, maybe, and pictured an operation performed byte-serially in three pieces. Doing it that way means if there's no carry you can take an early exit and save cycles (or one, at least). If the triple-byte value lives in 8-bit external RAM, that idea seems like a win. But I think you're anticipating Direct Page will be in on-chip RAM, is that right? So you can grab 24 bits at once? If 16 bits at once is easier, you could still break the 24-bit operation down into two pieces.

Yes, the increment is byte serial. I modified the core to skip the store on a RMW instruction if the high-byte didn't change.
I think I'm scrapping the triple-byte increment mode in favor of a couple of 24-bit counter I/O devices located in zero page. I coded the counters at the MPU level, leaving the CPU alone. With the CPU clock so fast, 24-bit counters rather than 16-bit ones would be more useful. It's also possible to trigger a count cycle in software, so the counters could be used as interpretive pointers as well. I have one counter operating as a down counter to generate periodic interrupts. It takes 19 bits to divide down to 100Hz from the cpu clock.
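
The counter width can be worked out with a couple of lines of Python. This is only arithmetic under an assumption: the post does not say which clock actually feeds the counter, so the frequencies below are examples. `divider_bits` is a hypothetical helper.

```python
def divider_bits(clock_hz, tick_hz):
    """Return (reload value, bits needed) for a down counter giving tick_hz."""
    reload = clock_hz // tick_hz          # clock cycles per tick
    return reload, (reload - 1).bit_length()


for mhz in (32, 50, 64):
    reload, bits = divider_bits(mhz * 1_000_000, 100)
    print(f"{mhz} MHz -> reload {reload}, {bits} bits")
```

By this arithmetic a 19-bit count covers 100Hz ticks for clocks up to roughly 52.4MHz, and a full 64MHz clock would need 20 bits. Either way the value fits comfortably in a 24-bit counter and would not fit in a 16-bit one.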

Quote:

But I wonder whether, deep in its heart, FT816 wants to be a native 24-bit machine. (Or 32/24-bit.) I mean with downgraded 8/16-bit modes to match the 65816.

I have to resist temptations on that one. The thought of full 24 bit registers did cross my mind.

I've managed to find some free opcode space in the 65816 instruction set - it's the branches. A branch displacement of $FF is a no-no, so the branch opcodes could be reused if the displacement is $FF, to mean something else. I've thought of using them as prefix codes and maybe stealing some from Michael's cpu core.

Rob Finch · Post by **Rob Finch** » Mon Dec 01, 2014 3:25 am

The $FF branch displacements have been allocated to long forms for the branches. Code below shows how it's working. Long branching has been put to use in a parser for terminal emulation. Long branching has been made a core parameter should it not be desired. The following code was assembled with the Finitron 65816 assembler.

Code:

   5306 00E0E0                             DisplayChar:
   5307 00E0E0 29 FF 00                     	AND		#$0FF
   5308 00E0E3 24 3C                        	BIT		EscState
   5309 00E0E5 30 FF 87 00                  	LBMI	processEsc
   5310 00E0E9 C9 08 00                     	CMP		#BS
   5311 00E0EC F0 FF 31 01                  	LBEQ	doBackSpace
   5312 00E0F0 C9 91 00                     	CMP		#$91			; cursor right
   5313 00E0F3 F0 FF 7F 01                  	LBEQ	doCursorRight
   5314 00E0F7 C9 93 00                     	CMP		#$93			; cursor left
   5315 00E0FA F0 FF 84 01                  	LBEQ	doCursorLeft
   5316 00E0FE C9 90 00                     	CMP		#$90			; cursor up
   5317 00E101 F0 FF 86 01                  	LBEQ	doCursorUp
   5318 00E105 C9 92 00                     	CMP		#$92			; cursor down
   5319 00E108 F0 FF 88 01                  	LBEQ	doCursorDown
   5320 00E10C C9 99 00                     	CMP		#$99			; delete
   5321 00E10F F0 FF 35 01                  	LBEQ	doDelete
   5322 00E113 C9 0D 00                     	CMP		#CR
   5323 00E116 F0 44                        	BEQ		doCR
   5324 00E118 C9 0A 00                     	CMP		#LF
   5325 00E11B F0 44                        	BEQ		doLF
   5326 00E11D C9 94 00                     	CMP		#$94
   5327 00E120 F0 FF 46 01                  	LBEQ	doCursorHome	; cursor home
   5328 00E124 C9 1B 00                     	CMP		#ESC
   5329 00E127 D0 05                        	BNE		.0003
   5330 00E129 64 3C                        	STZ		EscState		; put a -1 in the escape state
   5331 00E12B C6 3C                        	DEC		EscState
   5332 00E12D 60                           	RTS
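
The encoding in the listing above can be decoded with a short Python sketch. Two things here are assumptions inferred from the listing bytes rather than confirmed from the FT816 source: that the 16-bit displacement is relative to the address following the 4-byte instruction, and that it is signed like BRL. `branch_target` is a hypothetical helper.

```python
def branch_target(addr, code):
    """Return (length, target) for a conditional branch whose opcode is at addr.

    `code` holds the instruction bytes, starting with the opcode.
    """
    disp8 = code[1]
    if disp8 == 0xFF:
        # long form: $FF escape, then a 16-bit little-endian displacement
        disp = code[2] | (code[3] << 8)
        if disp & 0x8000:
            disp -= 0x10000       # assumed signed, as for BRL
        length = 4
    else:
        # ordinary short form: signed 8-bit displacement
        disp = disp8 - 0x100 if disp8 & 0x80 else disp8
        length = 2
    return length, (addr + length + disp) & 0xFFFF


# LBMI processEsc and BEQ doCR, bytes copied from the listing
print([hex(v) for v in branch_target(0xE0E5, bytes([0x30, 0xFF, 0x87, 0x00]))])
print([hex(v) for v in branch_target(0xE116, bytes([0xF0, 0x44]))])
```

A plain short branch with displacement $FF would land on its own operand byte (opcode address + 1), which is why that value is free to act as an escape.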

The FT816 test system runs Supermon816 but there are some display issues still. These are believed to be a software problem with the terminal emulation and not a softcore problem. That is not to say the core doesn't have bugs - the latest fix was to JMP (indirect) - but the core seems more stable the last day or so.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Mon Dec 01, 2014 6:16 am

Rob Finch wrote:

The FT816 test system runs Supermon816 but there are some display issues still. These are believed to be a software problem with the terminal emulation and not a softcore problem. That is not to say the core doesn't have bugs - the latest fix was to JMP (indirect) - but the core seems more stable the last day or so.

The download version of SuperMon 816 has a WYSE 60 driver in it. The mumbo-jumbo that makes it work is well-commented and can be readily changed to support a different terminal type.

barrym95838 · Post by **barrym95838** » Mon Dec 01, 2014 7:34 am

Rob Finch wrote:

Code:

   5306 00E0E0                             DisplayChar:
   5307 00E0E0 29 FF 00                     	AND		#$0FF
   5308 00E0E3 24 3C                        	BIT		EscState
   5309 00E0E5 30 FF 87 00                  	LBMI	processEsc
   5310 00E0E9 C9 08 00                     	CMP		#BS
                                                   ...

How exactly does this work, Rob?

I don't know how your native mode works, so I have to ask the following (please forgive my ignorance if it's already been explained elsewhere):

1) The 8-bit BIT $3C instruction ANDs the value in A with the 8-bit value in $3C, using it to set or clear Z. It transfers the contents of bits 6 and 7 in $3C to V and N, respectively, right?

2) The 16-bit BIT $3C instruction would then AND the value in A with the 16-bit value in $3C and $3D, little-endian style, and use it to set or clear Z. It would then transfer the contents of bits 14 and 15 of the contents of $3C and $3D to V and N, respectively, right?

3) Where are bits 14 and 15 of the 16-bit value "in" $3C? Are they not in bits 6 and 7 of $3D? I don't know how you implemented your ZP variable EscState, but is it possible that your terminal emulation bug lies in this vicinity?

Mike

Rob Finch · Post by **Rob Finch** » Mon Dec 01, 2014 10:00 am

Quote:

The download version of SuperMon 816 has a WYSE 60 driver in it. The mumbo-jumbo that makes it work is well-commented and can be readily changed to support a different terminal type.

I was referring to a probable bug in my own code to emulate a WYSE 60 terminal, not Supermon. I had read through the comments. Supermon816 works great. I found one display bug in the text video controller I had set up - readback of character attribute codes wasn't working. Nothing to do with the '816 emulation.

Quote:

How exactly does this work, Rob?

I don't know how your native mode works, so I have to ask the following (please forgive my ignorance if it's already been explained elsewhere):

1) The 8-bit BIT $3C instruction ANDs the value in A with the 8-bit value in $3C, using it to set or clear Z. It transfers the contents of bits 6 and 7 in $3C to V and N, respectively, right?

2) The 16-bit BIT $3C instruction would then AND the value in A with the 16-bit value in $3C and $3D, little-endian style, and use it to set or clear Z. It would then transfer the contents of bits 14 and 15 of the contents of $3C and $3D to V and N, respectively, right?

3) Where are bits 14 and 15 of the 16-bit value "in" $3C? Are they not in bits 6 and 7 of $3D? I don't know how you implemented your ZP variable EscState, but is it possible that your terminal emulation bug lies in this vicinity?

You are right on with #1 and #2. There are no modes to this core - it's a straight '816 emulation. So it's supposed to work exactly like the '816 would. Accumulator and memory are both set to 16 bits before the subroutine is called. There are two bytes of zero page reserved for the EscState flag. $3C,$3D. The bit test is using the MSB as a flag that an escape sequence is present. The flag is set to -1 later in code if an ESC char is present. The program does get into processing escape sequences, as things like reverse video, and cursor on/off seem to work. It's just that the display goes nuts when I try running Supermon's disassemble function. It does disassemble code, but it's all over the screen rather than being neatly laid out. I could post the code, but it's a little long.
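
A small Python model of the BIT flag behaviour agreed on above, for a direct-page operand. `bit_flags` is a hypothetical helper; the $3C/$3D addresses for EscState come from the post.

```python
def bit_flags(a, mem, addr, sixteen_bit):
    """Return the (N, V, Z) flags produced by BIT with a direct-page operand."""
    if sixteen_bit:
        operand = mem[addr] | (mem[addr + 1] << 8)   # little-endian word
        top = 15
    else:
        operand = mem[addr]
        top = 7
    n = (operand >> top) & 1          # N copies the operand's top bit
    v = (operand >> (top - 1)) & 1    # V copies the next bit down
    z = int((a & operand) == 0)       # Z reflects A AND operand
    return n, v, z


mem = bytearray(0x100)
mem[0x3C], mem[0x3D] = 0xFF, 0xFF     # EscState = -1 as a 16-bit value
print(bit_flags(0x001B, mem, 0x3C, True))

mem[0x3C], mem[0x3D] = 0x80, 0x00     # only bit 7 of the low byte set
print(bit_flags(0x001B, mem, 0x3C, True), bit_flags(0x1B, mem, 0x3C, False))
```

The second case is the one Mike's question #3 is probing: a flag stored only in the low byte is invisible to N when the test is 16 bits wide.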

Rob Finch · Post by **Rob Finch** » Mon Dec 01, 2014 10:09 am

Code for the terminal emulation.
Vars:
- VIDBUF is the text video memory, laid out PC style with high byte attribute, and low byte screen char
- VideoPos is an index into video memory.
- NormAttr is the normal display attribute in use
- VIDREGS is the text controller's video register set; registers +13 and +14 hold an index for the cursor position

Code:

;------------------------------------------------------------------------------
; Display a character on the screen device
;------------------------------------------------------------------------------
;
DisplayChar:
	AND		#$0FF
	BIT		EscState
	LBMI	processEsc
	CMP		#BS
	LBEQ	doBackSpace
	CMP		#$91			; cursor right
	LBEQ	doCursorRight
	CMP		#$93			; cursor left
	LBEQ	doCursorLeft
	CMP		#$90			; cursor up
	LBEQ	doCursorUp
	CMP		#$92			; cursor down
	LBEQ	doCursorDown
	CMP		#$99			; delete
	LBEQ	doDelete
	CMP		#CR
	BEQ		doCR
	CMP		#LF
	BEQ		doLF
	CMP		#$94
	LBEQ	doCursorHome	; cursor home
	CMP		#ESC
	BNE		.0003
	STZ		EscState		; put a -1 in the escape state
	DEC		EscState
	RTS
.0003:
	ORA		NormAttr
	PHA
	LDA		VideoPos
	ASL
	TAX
	PLA
	STA		VIDBUF,X
	LDA		CursorX
	INA
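	CMP		#$56		; hex $56 = 86; the other column tests use decimal 55/56
	BNE		.0001
	STZ		CursorX
	LDA		CursorY
	CMP		#$30		; hex $30 = 48; the other row tests use decimal 30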
	BEQ		.0002
	INA
	STA		CursorY
	BRL		SyncVideoPos
.0002:
	JSR		SyncVideoPos
	BRL		ScrollUp
.0001:
	STA		CursorX
	BRL		SyncVideoPos
doCR:
	STZ		CursorX
	BRL		SyncVideoPos
doLF:
	LDA		CursorY
	CMP		#30
	LBEQ	ScrollUp
	INA
	STA		CursorY
	BRL		SyncVideoPos

processEsc:
	LDX		EscState
	CPX		#-1
	BNE		.0006
	CMP		#'T'	; clear to EOL
	BNE		.0003
	LDA		VideoPos
	ASL
	TAX
	LDY		CursorX
.0001:
	CPY		#55
	BEQ		.0002
	LDA		#' '
	ORA		NormAttr
	STA		VIDBUF,X
	INX
	INX
	INY
	BNE		.0001
.0002:
	STZ		EscActive	; the other exits from processEsc store to EscState
	RTS
.0003:
	CMP		#'W'
	BNE		.0004
	STZ		EscState
	BRL		doDelete
.0004:
	CMP		#'`'
	BNE		.0005
	LDA		#-2
	STA		EscState
	RTS
.0005:
	CMP		#'('
	BNE		.0008
	LDA		#-3
	STA		EscState
	RTS
.0008:
	STZ		EscState
	RTS
.0006:
	CPX		#-2
	BNE		.0007
	STZ		EscState
	CMP		#'1'
	LBEQ	CursorOn
	CMP		#'0'
	LBEQ	CursorOff
	RTS
.0007:
	CPX		#-3
	BNE		.0009
	CMP		#ESC
	BNE		.0008
	LDA		#-4
	STA		EscState
	RTS
.0009:
	CPX		#-4
	BNE		.0010
	CMP		#'G'
	BNE		.0008
	LDA		#-5
	STA		EscState
	RTS
.0010:
	CPX		#-5
	BNE		.0008
	STZ		EscState
	CMP		#'4'
	BNE		.0011
	LDA		NormAttr
	; Swap the high nybbles of the attribute
	XBA				
	SEP		#$30		; set 8 bit regs
	NDX		8			; tell the assembler
	MEM		8
	ROL
	ROL
	ROL
	ROL
	REP		#$30		; set 16 bit regs
	NDX		16			; tell the assembler
	MEM		16
	XBA
	AND		#$FF00
	STA		NormAttr
	RTS
.0011:
	CMP		#'0'
	BNE		.0012
	LDA		#$BF00		; Light Grey on Dark Grey
	STA		NormAttr
	RTS
.0012:
	LDA		#$BF00		; Light Grey on Dark Grey
	STA		NormAttr
	RTS

doBackSpace:
	LDY		CursorX
	BEQ		.0001		; Can't backspace anymore
	LDA		VideoPos
	ASL
	TAX
.0002:
	LDA		VIDBUF,X
	STA		VIDBUF-2,X
	INX
	INX
	INY
	CPY		#56
	BNE		.0002
.0003:
	LDA		#' '
	ORA		NormAttr
	STA		VIDBUF,X
	DEC		CursorX
	BRL		SyncVideoPos
.0001:
	RTS

; Deleting a character does not change the video position so there's no need
; to resynchronize it.

doDelete:
	LDY		CursorX
	LDA		VideoPos
	ASL
	TAX
.0002:
	CPY		#55
	BEQ		.0001
	LDA		VIDBUF+2,X
	STA		VIDBUF,X
	INX
	INX
	INY
	BRA		.0002
.0001:
	LDA		#' '
	ORA		NormAttr
	STA		VIDBUF,X
	RTS

doCursorHome:
	LDA		CursorX
	BEQ		doCursor1
	STZ		CursorX
	BRA		SyncVideoPos
doCursorRight:
	LDA		CursorX
	CMP		#55
	BEQ		doRTS
	INA
doCursor2:
	STA		CursorX
	BRA		SyncVideoPos
doCursorLeft:
	LDA		CursorX
	BEQ		doRTS
	DEA
	BRA		doCursor2
doCursorUp:
	LDA		CursorY
	BEQ		doRTS
	DEA
	BRA		doCursor1
doCursorDown:
	LDA		CursorY
	CMP		#30
	BEQ		doRTS
	INA
doCursor1:
	STA		CursorY
	BRA		SyncVideoPos
doRTS:
	RTS

HomeCursor:
	LDA		#0
	STZ		CursorX
	STZ		CursorY

; Synchronize the absolute video position with the cursor co-ordinates.
;
SyncVideoPos:
	LDA		CursorY
	ASL
	TAX
	LDA		LineTbl,X
	CLC
	ADC		CursorX
	STA		VideoPos
	STA		VIDREGS+13		; Update the position in the text controller
	RTS

Rob Finch · Post by **Rob Finch** » Fri Oct 02, 2015 6:55 pm

Fixed a subtle bug in the core today. The stack pointer high byte was not being set to 01h when a switch to emulation mode occurs. Instead the core forced stack address calculations in emulation mode to use '01h' as the stack page without modifying SPH. This bug did not appear to break any software that was tested so far. However when switching back to native mode from emulation mode, the SPH register didn't contain an 01h. Instead it would contain the previous value set in native mode. It is now fixed to switch to page 01h.
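
The difference between the old and fixed behaviour can be modelled in Python. The function names are hypothetical and the stack pointer is treated as a plain 16-bit integer; the point is only that both versions address the same stack bytes while in emulation mode and diverge once native mode is re-entered.

```python
def enter_emulation_old(sp):
    # old behaviour as described: SPH keeps whatever native mode left in it
    return sp & 0xFFFF

def enter_emulation_fixed(sp):
    # fixed behaviour: SPH really is set to $01 on the switch
    return 0x0100 | (sp & 0x00FF)

def stack_address(sp, emulation):
    # emulation-mode stack accesses are always formed in page 1
    return (0x0100 | (sp & 0x00FF)) if emulation else (sp & 0xFFFF)


native_sp = 0x1FF0
old, fixed = enter_emulation_old(native_sp), enter_emulation_fixed(native_sp)
print(hex(stack_address(old, True)), hex(stack_address(fixed, True)))
print(hex(stack_address(old, False)), hex(stack_address(fixed, False)))
```

That matches the observation that nothing broke in emulation mode: the wrong SPH only shows up after switching back to native mode.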

I have not tested this fix yet. I need to upgrade my toolset to be Windows 10 compatible.

Rob Finch · Post by **Rob Finch** » Fri Oct 30, 2015 3:51 pm

I've copied the FT816 core and am adding 32-bit support to create the FT832 core. It will be backwards compatible. As suggested by others on the board, the FT832 will have all 32-bit registers in native mode. The program bank register is turned into a code segment register, and the data bank register is turned into a data segment register. To switch to emulation mode from native mode, a long jump (JML) instruction will be used that specifies both the program counter and the code segment. In this case the code segment value will be $FFFFFFFF to switch to 8-bit emulation or $FFFFFFFE to switch to 16-bit emulation. A jump instruction is used because the code segment needs to be zeroed on the switch and the program's execution address forced to a known value. In native 32-bit mode the eight-bit displacement of an instruction will be shifted left twice before being put to use. This will allow more efficient use of zero page (extending it to 1kB). Also, in native mode, long address mode instructions use 32-bit addresses rather than 24-bit ones. Otherwise the instruction set remains the same. It is necessary to flip bits in the status register to get byte or 16-bit operations from native mode.
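
The 1kB figure follows directly from the shift. A tiny Python sketch, with a hypothetical helper name and ignoring whatever base register is added to the offset:

```python
def dp_offset_native32(disp8):
    # the 8-bit displacement is shifted left twice, i.e. multiplied by 4
    return (disp8 & 0xFF) << 2


reach = dp_offset_native32(0xFF) + 4   # 256 slots of 4 bytes each
print(hex(dp_offset_native32(0x3C)), reach)
```

Each displacement then names a 4-byte-aligned slot, which suits 32-bit variables but means individual bytes inside a slot cannot be addressed through this mode.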
