
All times are UTC




PostPosted: Sat Feb 19, 2022 9:44 pm 

Joined: Sun Nov 27, 2011 12:03 pm
Posts: 229
Location: Amsterdam, Netherlands
After the 'full' and 'small' versions of this core, now follows the 'tiny' one, with its own interesting memory / performance tradeoff.

All three variations have a memory footprint of 524288 * n + 2048 * 5 + 64 bits. The full and small versions have an n of 7 and 4 respectively (the small needs fewer 64 KB blocks, but delays absolute addressed instructions by one cycle), while the tiny has an n of 1, so it requires little more than a regular core, yet still benchmarks at 165% (the full and small both do over 200%). The 165% can get higher still, because some penalty cycles can probably be prevented (although that may decrease FMax).
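To make the footprint formula concrete, here is a quick sketch (in Python rather than the core's Verilog; `footprint_bits` is just an illustrative name) evaluating it for the three variants:

```python
def footprint_bits(n):
    # 524288 bits = one 64 KB x 8 block; 2048 * 5 + 64 bits of fixed overhead
    return 524288 * n + 2048 * 5 + 64

for name, n in [("full", 7), ("small", 4), ("tiny", 1)]:
    bits = footprint_bits(n)
    print(f"{name}: {bits} bits = {bits / 8 / 1024:.1f} KiB")

# the tiny version's overhead beyond a plain 64 KB memory
extra = footprint_bits(1) - 524288
print(f"extra: {extra} bits = {extra / 524288:.1%} of 64 KB")
```

The roughly 2% extra for 'tiny' is the overhead figure quoted below for the speculative zero page and stack reads.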

So, with 2% memory overhead (which allows speculative reads from zero page and the stack, so the related instructions still take only 1 cycle), and the need for true dual-ported memory (either two reads or one write), the 'tiny' version of the core still performs 65% better than a real 65C02. Also, where the full and small top out at 200 MHz on a Stratix V, the tiny manages 240 MHz (currently resulting in a 400 MHz benchmark), which puts it right up there with the other, slightly more resource-efficient cores.


Last edited by Windfall on Tue Feb 22, 2022 10:01 pm, edited 1 time in total.

PostPosted: Mon Feb 21, 2022 7:26 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10800
Location: England
Nice illustration of how microarchitectural refinements sometimes do, and sometimes don't, pay off against max frequency. (And of course those payoffs shift around depending on implementation techniques and technology.)


PostPosted: Tue Feb 22, 2022 9:56 pm 

Joined: Sun Nov 27, 2011 12:03 pm
Posts: 229
Location: Amsterdam, Netherlands
BigEd wrote:
Nice illustration of how microarchitectural refinements sometimes do, and sometimes don't, pay off against max frequency. (And of course those payoffs shift around depending on implementation techniques and technology.)

Yes. After sufficient optimisation, architectural changes do tend to result in the same performance ...

I've now published the 'tiny' version on my website, alongside slightly changed 'full' and 'small' versions. Running the Klaus Dormann 6502 verification tests, obtained from https://github.com/mungre/beeb6502test, revealed one bug (the SBC V flag was wrong) and two largely irrelevant discrepancies: an issue with P bits 5 and 4, which I did not fix because it affects Fmax, and an issue with (not-EA) NOP sizes, which I fixed.

Since 'tiny' can make do with one copy of main memory, but requires it to be 3 x 8 = 24 bits wide (for shared code/data accesses), one 'byte twist' is needed: use true dual-ported memory, read two consecutive 16-bit words (disregarding address bit 0), and select the right 24 bits (according to address bit 0).

The published 'tiny' is 200 MHz Fmax, 200% benchmark (instead of 240 MHz Fmax, 165% benchmark, maybe that one follows some time).


PostPosted: Wed Feb 23, 2022 8:38 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10800
Location: England
Windfall wrote:
Since 'tiny' can make do with one copy of main memory, but requires it to be 3 x 8 = 24 bits wide (for shared code/data accesses), one 'byte twist' is needed: use true dual ported memory, read two consecutive 16-bit words (disregarding address bit 0), select the right 24 bits (according to address bit 0).

Very interesting... but I must be missing something. I can't quite see how you get the sequential 24 bits you want. Could you elaborate a little please? How wide is the dual ported memory? Is it addressed by bytes? Does your 24 bit read take just one cycle total?


PostPosted: Wed Feb 23, 2022 9:42 am 

Joined: Tue Sep 03, 2002 12:58 pm
Posts: 298
If it were 32Kx16, I can see how it might work: BRAM is dual-ported, so you can do two independent reads at the same time. If you want to read 3 bytes from address x, you read from address floor(x/2) on one port and floor(x/2)+1 on the other. Combine those into a 32-bit word and use either bits 0-23 or bits 8-31.
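A minimal Python model of that two-port read (my own sketch; `read24`, the little-endian word packing, and the sample data are illustrative assumptions, not anyone's actual RTL):

```python
def read24(mem16, x):
    """Read 3 consecutive bytes starting at byte address x from a
    32K x 16 memory (little-endian words), using two read ports."""
    w0 = mem16[(x >> 1) & 0x7FFF]         # port A: floor(x/2)
    w1 = mem16[((x >> 1) + 1) & 0x7FFF]   # port B: floor(x/2) + 1
    word32 = w0 | (w1 << 16)              # combine into a 32-bit word
    # even address: bits 0-23; odd address: bits 8-31
    return (word32 >> (8 * (x & 1))) & 0xFFFFFF

# sample contents: bytes 0x10, 0x21, ... packed into 16-bit words
data = [0x10, 0x21, 0x32, 0x43, 0x54, 0x65]
mem16 = [data[2 * i] | (data[2 * i + 1] << 8) for i in range(3)]
```

Reading at an odd address shifts the combined word right by one byte, which is exactly the "select the right 24 bits according to address bit 0" step.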

I don't see how a 24 bit wide memory works though. How do you translate a 16 bit address into the BRAM address without dividing by 3?

My own plan was to have four 8-bit memories, each providing 8 bits of a 32 bit word, but with independent addresses. Each memory can access either floor(x/4) or floor(x/4)+1. By controlling which memories get which address, and a bit of shuffling of the resulting data, you can get four consecutive bytes from any address. The address selection and shuffling would add a little complexity and probably reduce the maximum clock speed. I was hoping that needing fewer cycles for each instruction would compensate for that - I was particularly interested in the ability to have one instruction being read while the previous instruction wrote to memory (also, fetching 32 bits in one go means most of the time you get the opcode of the following instruction a cycle early, which might be helpful).
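The four-bank plan above can be modelled the same way (again a hypothetical Python sketch): each bank gets either floor(x/4) or floor(x/4)+1 depending on x mod 4, and a rotation of the results puts the bytes back in order.

```python
def read32(banks, x):
    """Fetch 4 consecutive bytes starting at byte address x from four
    independently addressed 8-bit banks, where byte a lives in
    banks[a % 4][a // 4]."""
    base, rot = x >> 2, x & 3
    # banks below the rotation point need the next row: floor(x/4) + 1
    data = [banks[b][base + (1 if b < rot else 0)] for b in range(4)]
    # rotate so element k is the byte at address x + k
    return [data[(rot + k) & 3] for k in range(4)]

# sample contents: 16 bytes striped across the four banks
flat = list(range(0x20, 0x30))
banks = [[flat[4 * i + b] for i in range(4)] for b in range(4)]
```

The per-bank address selection and the output rotation are the "bit of shuffling" that would cost some clock speed.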

I probably won't be pursuing this any further though, as Windfall has already achieved most of what I was hoping to.


PostPosted: Wed Feb 23, 2022 12:19 pm 

Joined: Sun Nov 27, 2011 12:03 pm
Posts: 229
Location: Amsterdam, Netherlands
BigEd wrote:
Very interesting... but I must be missing something. I can't quite see how you get the sequential 24 bits you want. Could you elaborate a little please? How wide is the dual ported memory? Is it addressed by bytes? Does your 24 bit read take just one cycle total?

You map the 64K x 8 address (r) onto 32K x 16 memory. Read 16 bits from r[15:1] + 0 and r[15:1] + 1, then use r[0] to extract the right 24 bits.

The setup in my 6502 Second Processor implementation is as below (how the TDP_W macro couples to memory should be self-explanatory, and imagine backslash-linefeed continuation in the macro part ...).

Code:
`define RAM_IN_FPGA_TDP_W(ABITS, INAME, RENABLEA, RADDRESSA, RDATAA, RENABLEB, RADDRESSB, RDATAB, WENABLE, WADDRESS, WDATA, ROMFILE)

ram_in_fpga_tdp # (.DATA_BITS(16), .ADDRESS_BITS(ABITS-1), .RAM_FILE(ROMFILE)) INAME
(
  .a_clock(ram_clock), .a_address(WENABLE ? WADDRESS[ABITS-1:1] : RADDRESSA), .ar_enable(RENABLEA), .ar_data(RDATAA), .aw_enable(WENABLE), .aw_data({ WDATA, WDATA }), .aw_byte(WADDRESS[0] ? 2'b10 : 2'b01),
  .b_clock(ram_clock), .b_address(                                RADDRESSB), .br_enable(RENABLEB), .br_data(RDATAB)
);

[... snipped ...]

reg         peek_16x24_1_skew;
wire [15:0] peek_16x24_1_data_0;
wire [15:0] peek_16x24_1_data_1;

always @(posedge ram_clock)
  peek_16x24_1_skew <= peek_16x24_address_1[0];

`RAM_IN_FPGA_TDP_W(16, peek_16x24_1_1, peek_16x24_enable_1, peek_16x24_address_1[15:1] + 0, peek_16x24_1_data_0, peek_16x24_enable_1, peek_16x24_address_1[15:1] + 1, peek_16x24_1_data_1, peek_16x24_write_1, write_16x8_address, write_16x8_data, "reco6502_main_0_w.hex")

assign peek_16x24_data_1 = peek_16x24_1_skew ? { peek_16x24_1_data_1, peek_16x24_1_data_0[15:8] } : { peek_16x24_1_data_1[7:0], peek_16x24_1_data_0 };


Last edited by Windfall on Wed Feb 23, 2022 12:32 pm, edited 1 time in total.

PostPosted: Wed Feb 23, 2022 12:30 pm 

Joined: Sun Nov 27, 2011 12:03 pm
Posts: 229
Location: Amsterdam, Netherlands
John West wrote:
My own plan was to have four 8-bit memories, each providing 8 bits of a 32 bit word, but with independent addresses. Each memory can access either floor(x/4) or floor(x/4)+1. By controlling which memories get which address, and a bit of shuffling of the resulting data, you can get four consecutive bytes from any address.

There is usually no need to shuffle data. Only addresses.


PostPosted: Wed Feb 23, 2022 1:46 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10800
Location: England
Ah, thanks - the unavoidable thing, then, is to add one to an address - you can't just mask off the last bit - but it might well be that the (time) cost of the increment is no problem.


PostPosted: Wed Feb 23, 2022 2:06 pm 

Joined: Sun Nov 27, 2011 12:03 pm
Posts: 229
Location: Amsterdam, Netherlands
BigEd wrote:
Ah, thanks - the unavoidable thing, then, is to add one to an address - you can't just mask off the last bit - but it might well be that the (time) cost of the increment is no problem.

It will take a bit of time, like everything in front of it (multiplexing, other address calculations). The optimizer may be able to combine it with other additions (a +1 could simply become a carry in of another addition).

The addition can be avoided completely by doing the +1 read on another memory block with all its contents shifted down by 1 byte. This is what I do in several places in the 6502 Second Processor. Of course, the write address then incurs a -1. But that often turns out to be cheaper.
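The shifted-copy trick can be sketched as follows (a hypothetical Python model, not the actual Second Processor code): the second copy holds the same contents shifted down by one word, so the read side needs no adder at all, and the -1 moves to the write side.

```python
class ShiftedRam:
    """Two copies of the same memory: 'a' is normal, 'b' holds the
    contents shifted down by one word, so reading words r and r+1
    needs no +1 on any read address."""
    def __init__(self, size):
        self.a = [0] * size
        self.b = [0] * size          # b[i] mirrors a[i + 1]

    def write(self, w, value):
        self.a[w] = value
        if w > 0:
            self.b[w - 1] = value    # the write side pays the -1

    def read_pair(self, r):
        # both ports use the same address r; no increment needed
        return self.a[r], self.b[r]  # equals a[r], a[r + 1]
```

After writing words 3 and 4, `read_pair(3)` returns both with no increment in the read path, which is where the cycle time matters.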


PostPosted: Wed Feb 23, 2022 3:10 pm 

Joined: Tue Sep 03, 2002 12:58 pm
Posts: 298
There's a limited set of places that the addresses can come from. If the +1 is in the critical path, you might be able to duplicate those sources and move it elsewhere: two units handling indexed addressing modes, with one adding an extra carry; two PCs, one a byte ahead of the other; and so on.

Of course, if the critical path is one of those other places, that would end up making it slower. It needs careful reading of the timing reports.


PostPosted: Wed Feb 23, 2022 3:50 pm 

Joined: Sun Nov 27, 2011 12:03 pm
Posts: 229
Location: Amsterdam, Netherlands
John West wrote:
There's a limited set of places that the addresses can come from. If the +1 is in the critical path, you might be able to duplicate those sources and have move it elsewhere. So two units handling indexed addressing modes, with one adding an extra carry. Two PCs, one a byte ahead of the other. And so on.

If you build for speed only, the optimizer could very well do that, yes. It could duplicate logic as needed while weighing the benefits against the routing costs (if any), and pick the best one. As long as you don't use 'synthesis keep' it will do its own combinatorial optimization anyway, and probably do a much better job than you ever could :D






