6502.org Forum

All times are UTC




Posted: Sun Oct 15, 2023 8:01 pm

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 983
Location: Potsdam, DE
You may have noticed a recent theme in which I've been investigating ways in which a person might build an SBC and program it simply, in situ, using a PC as both the programmer and the terminal.

The first attempt used a parallel eeprom - basically Grant's design with a couple of minor changes - and worked well but required the eeprom removing and reprogramming every time the code changed... the SBC works fine but the pins on the eeprom can take a battering, and it required me to build an eeprom programmer (nucleo, some ram, and a ZIF socket) and write two lots of software to get the code in - one lot for the nucleo and one for the programmer running under linux. It works, but lacks elegance.

The most recent attempt was the Neolithic Romless in which the code du jour in hex file form is massaged into a binary format and squirted over a serial link - the same link being used later for normal serial comms. The only three disadvantages I can think of at the moment are (a) it's volatile, so you have to reload the code at every re-power; (b) it lacks a way to reset the processor without it dropping into reload mode; and (c) ideally any code running should wait for a wake-up character from the terminal to avoid any early characters being missed.

So here's another thought... an SPI serial eeprom or flash could be programmed in-circuit (isolated by resistors perhaps) using a simple programmer - there are lots available on ebay, or e.g. an arduino or a nucleo could be used. Obviously the 6502 can't execute in place from an SPI memory, but a similar approach might be used to the Neolithic Romless and output a serial stream to a serial shift register, with a counter to maintain the address...
  • The command code for a 'read' on every SPI memory I've come across is 0x03, sent high bit first, followed by two address bytes. The fourth byte is the memory read at that location, and if the chip select is kept low, it will autoincrement and read forever, irrespective of the data sent.
  • Generating that 0x03 is simply an AND of the bits Q1 and Q2 of a counter driven by the CLK signal; this produces the necessary two bits every eight clocks.
  • If the next two bits Q3 and Q4 are NORed together, and that output is ANDed with Q1 and Q2, then an output 0x03, 0x00, 0x00, 0x00 is repeated ad nauseam. (Cycles 0, 1, 2, 3).
  • The output data from the SPI is valid starting at cycle 3 - assuming that the device is 64kB or less - and can be clocked into a suitable serial shift register. The time to generate a write pulse would be in the first half clock of each cycle; after that the address counter can be incremented.
  • The address counter needs to initially reset, and stay reset until cycle 3, after which it can increment at CLK/8. A sufficiently high count (power of 2, so say A14 rising for a 16kB transfer) can be used to disable the counter, and to enable the processor.
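
The gating described in the list above can be checked with a short simulation. This is only a sketch: it assumes a free-running binary counter whose Q0 is the least significant bit, that the gate output drives the memory's data-in pin one bit per clock, and that the memory sees the bits most significant first. Clock-edge alignment is ignored, and the function names are invented for illustration.

```python
def command_bit(count):
    """Gate output for a given counter value: Q1 AND Q2 AND NOT(Q3 OR Q4)."""
    q1 = (count >> 1) & 1
    q2 = (count >> 2) & 1
    q3 = (count >> 3) & 1
    q4 = (count >> 4) & 1
    nor_q3_q4 = 1 - (q3 | q4)
    return q1 & q2 & nor_q3_q4


def command_bytes(n_bytes):
    """Collect the serial output into bytes, first bit sent = bit 7."""
    result = []
    for byte_index in range(n_bytes):
        value = 0
        for bit in range(8):
            value = (value << 1) | command_bit(byte_index * 8 + bit)
        result.append(value)
    return result


if __name__ == "__main__":
    # Eight byte times: the four-byte pattern repeats every 32 clocks.
    print([f"0x{b:02X}" for b in command_bytes(8)])
```

Under these assumptions the output is high only for the last two bit times of cycle 0, which is the 0x03, 0x00, 0x00, 0x00 pattern described.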

The nice thing about this is that the transfer is ridiculously fast: at an 8MHz clock, you're looking at 1ms per kB (if I got my sums right) and you could fill this with whatever... Basic and a monitor or some sort of filing system from a memory card or just a simple boot loader. What's in the SPI is non-volatile but easily externally programmed.
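
The sums can be checked in a few lines. This sketch assumes one bit per clock (plain single-bit SPI), three bytes of command and address before the first data byte, and no other overhead; the function name is made up for the example.

```python
def load_time_seconds(payload_bytes, clock_hz, command_bytes=3):
    """Time to clock the read command plus the payload out of the SPI memory."""
    total_bits = 8 * (command_bytes + payload_bytes)
    return total_bits / clock_hz


if __name__ == "__main__":
    for kib in (1, 2, 4, 8, 16):
        t = load_time_seconds(kib * 1024, 8_000_000)
        print(f"{kib:2d} kB: {t * 1000:.3f} ms")
```

At 8MHz each byte takes 1µs, so 1024 bytes come to a little over 1ms and a full 16kB block to roughly 16.4ms.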

I still need to think about the timings involved, but I suspect this is a five or six chip solution.

Thoughts?

Neil


Posted: Sun Oct 15, 2023 10:34 pm

Joined: Fri Dec 21, 2018 1:05 am
Posts: 1117
Location: Albuquerque NM USA
Bootstrapping from a serial EPROM is an interesting idea, and particularly relevant for the 6502. The Z80 world always needs mass storage for CP/M, so bootstrapping from CF seems obvious, but the 6502 world seldom needs mass storage, or it can be a slower SD card. It is more difficult to bootstrap from an SD card, so a serial EPROM may be a better approach for a very compact, ROM-less design.

I assume you want the design to be entirely in TTL logic? One of my earliest retro hobby projects after retirement was a serial bootstrap 68000 using AT24C256, but it used a CPLD for serial EPROM interface. The CPLD design was all in schematic but it has been 6-7 years since I looked at it closely. It may be interesting to see how the state machine can be realized in TTL logic and apply it to 6502. Possibly a disk-less version of CRC65.

Bill

PS, by the way, it took about 1/2 second to read the content of AT24C256 (32KB) into DRAM.


Posted: Mon Oct 16, 2023 5:27 am

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 983
Location: Potsdam, DE
Quote:
I assume you want the design to be entirely in TTL logic?

You know me so well :mrgreen:

Yes, I'm looking at a ttl/hc discrete logic design. The state machine is very simple, I think: an external reset, er, resets everything and starts the fast counter (borrowing the system clock, probably); a 393 counter for the six states required to generate the command signal (a couple of extra stages beforehand will increase the load time but might be handy to generate phased signals if needed); a couple more 393 for the address counter, a flip flop to enable/disable things, and a couple of gates. I think that the connection to the bus can be via resistors - 47k feels like a good number.

I'll stick up a preliminary design soon when I've had more thought on it.

Neil


Posted: Mon Oct 16, 2023 7:09 am

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
In another thread about this, there was a suggestion to use 2 SPI Flash devices as a sort of micro-sequencer to control the signals... My thoughts are where might this end - 3 devices might implement a better micro-sequencer to control the signals and gate data into the CPU to synthesize memory write commands, or 4 - a 4-bit wide ROM, 5, 6, 7 or ... 8? Where might it end... And at what point does one person's hardware become another's software...

It's an interesting project and thought line though...

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Posted: Mon Oct 16, 2023 7:59 am

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 983
Location: Potsdam, DE
I don't think it needs to be that complex: The same CS signal that the SPI needs to see can conveniently both enable the SPI and disable the 65c02 (and reset it) in the same way as Neolithic Romless does. The only vaguely tricky bit I think is to reset the address counter on cycle four, and then it can be stopped at any power of two when the Qx line goes high.

It's a state machine that spends most of its life asleep, just like me...

Neil


Posted: Mon Oct 16, 2023 8:09 am

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 983
Location: Potsdam, DE
Looking at how some of the ARM microcontrollers do XIP from SPI flash, it looks like they do something essentially similar: they dump either the complete flash contents, or selected chunks of it, to on-chip ram. But they have the SPI hardware (single, dual, quad, or sometimes octal) on chip, and some sort of memory management too; probably more complex than we'd need for a 6502.

I think the significant thing about using SPI is this ability to read sequential addresses until stopped. If you want random access to an SPI flash you'd need to send three bytes and receive one for each random location (read instruction, two bytes of address - three for larger parts - and the data itself). That's thirty-two clocks per CPU cycle: A 32MHz clock is probably not a problem for discrete HC or AHC logic, but would rather restrict the operating speed of the 6502. Obviously that gets faster for wider parts - an octal SPI uses (IIRC) four output pins on two edges - but the controlling logic is definitely getting out of hand for discrete logic.
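
As a rough check on that figure, the clock count per random read and the CPU speed it implies can be worked out as below. The sketch assumes single-bit SPI with one bit per clock, one command byte, and that chip-select handling costs nothing extra; the function names are hypothetical.

```python
def clocks_per_random_read(address_bytes=2):
    """SPI clocks to fetch one byte: command + address bytes + data byte."""
    return 8 * (1 + address_bytes + 1)


def max_cpu_hz(spi_clock_hz, address_bytes=2):
    """Fastest CPU clock if every CPU cycle needs a fresh random read."""
    return spi_clock_hz / clocks_per_random_read(address_bytes)


if __name__ == "__main__":
    print(clocks_per_random_read())          # two address bytes
    print(clocks_per_random_read(3))         # larger parts, three address bytes
    print(max_cpu_hz(32_000_000) / 1e6, "MHz")
```

With two address bytes a 32MHz SPI clock supports about 1MHz at the CPU; a part needing three address bytes takes 40 clocks per byte and so slows things further.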

Neil

edit: XIP = execute in place


Posted: Mon Oct 16, 2023 10:02 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Ah, so you could construct a bootstrap approach which only ever reads sequentially from the SPI device? Some kind of straightline code for the 6502 to execute, or some kind of serial stream (perhaps with gaps between bytes) which feeds a serial to parallel RAM loader engine?

Any extra mileage in using a nibble wide SPI device so you have cycle by cycle control of 4 wires?


Posted: Mon Oct 16, 2023 10:28 am

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 983
Location: Potsdam, DE
I don't think there is; you'd need pretty much the same control for the nibble-wide as for the device it's controlling.

For this design I'm just looking at a straightforward dump-a-block-of-code to the top of ram (so it always includes the reset vectors) with a block size 1k-2k-4k-8k-16k, and then running that code. One option (proof of concept) is the full 16k, starting with basic at 0xc000 and some sort of monitor above that, ideally able to download e.g. a hex file directly.

That's the great thing about this hobby: I don't have to make it work to spec first time...

Neil


Posted: Mon Oct 16, 2023 10:34 am

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
A month or so ago I was thinking about trying a hybrid between this sort of thing and Jeff's minimal three-wire system - in particular using the CPU to generate the addresses to write to. If you arrange for your regular address decoding to serve bytes sequentially from SPI EEPROM, e.g. for all addresses above $ff00 regardless of the address accessed, you can first stream a sequence of instructions to load a short second stage loader into RAM (e.g. in the stack), then chain to that and it can read subsequent bytes more efficiently to fill the rest of the RAM.

The clock would need to be stretched during EEPROM accesses to give time to deserialise the data. 4-bit outputs on the EEPROMs would remove the need for stretching - two EEPROMs next to each other can provide all 8 bits at once.

The first stage could just be:

Code:
ldx #5 : txs          ; S=5, so the six pushes below land at $0105 down to $0100
lda #$?? : pha
lda #$?? : pha
lda #$?? : pha
lda #$?? : pha
lda #$?? : pha
lda #$?? : pha
jmp $100

This costs 24 bytes (and cycles) of slow SPI EEPROM access to write 6 bytes into RAM, but those 6 bytes forming the second stage loader can stream in the third stage code more efficiently - one byte (and cycle) of EEPROM access per byte loaded into RAM:

Code:
loop:
    lda $fff8
    pha
    bra loop

This will fill the rest of the stack and end up overwriting the offset in the "bra loop" at the end, so execution can continue into the third stage code. The third stage code then has the whole stack page to live in and can stream more data to specific locations elsewhere in RAM.
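
To see where the pushed bytes land, here is a rough model of the stack page. It assumes standard 6502 push behaviour (PHA stores at $0100+S and then decrements S), 65C02 opcodes for the second-stage loop, and a stack pointer that starts at 5 so the six loader bytes occupy $0100-$0105. The class and function names are invented for the sketch.

```python
# Second-stage loop assembled at $0100: lda $fff8 / pha / bra $0100
LOADER = [0xAD, 0xF8, 0xFF, 0x48, 0x80, 0xFA]
BRA_OFFSET_INDEX = 5  # offset byte of "bra loop" sits at $0105


class StackPage:
    """Page 1 of RAM plus the 8-bit stack pointer."""

    def __init__(self, s):
        self.page = [None] * 256  # None marks a byte never written
        self.s = s & 0xFF

    def pha(self, value):
        self.page[self.s] = value      # store at $0100 + S
        self.s = (self.s - 1) & 0xFF   # then decrement, wrapping at zero


def run_first_stage(initial_s=5):
    """Push the loader last byte first, as the lda/pha pairs would."""
    stack = StackPage(initial_s)
    for byte in reversed(LOADER):
        stack.pha(byte)
    return stack


def run_second_stage(stack, stream):
    """Push streamed bytes until the bra offset has been overwritten."""
    consumed = 0
    for byte in stream:
        target = stack.s
        stack.pha(byte)
        consumed += 1
        if target == BRA_OFFSET_INDEX:
            break
    return consumed


if __name__ == "__main__":
    stack = run_first_stage()
    print(stack.page[0:6] == LOADER, hex(stack.s))
    print(run_second_stage(stack, [0xEA] * 300))
```

With these assumptions the loop consumes 251 streamed bytes before the branch offset at $0105 is replaced. Starting the model with S=6 instead leaves $0100 unwritten, which shows up as None.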

I think the circuit for this would consist of the SPI EEPROM and shift register to deserialise it, connected to the data bus; and driving the SPI EEPROM you'd possibly need a counter to count the 8 bits and stretch the clock until the result is ready, and a bit more glue. The dual 4-bit EEPROM version wouldn't need the counter, I think, so could be very compact.

Jeff posted an example circuit to drive such EEPROMs here, which I think he's tested: viewtopic.php?p=76975#p76975 He had to use a delay circuit to send the right signal to the EEPROMs - I think some other brands of EEPROM wouldn't require that though.

Edit: Or a PLD as plasmo said :)


Posted: Mon Oct 16, 2023 1:58 pm

Joined: Fri Dec 21, 2018 1:05 am
Posts: 1117
Location: Albuquerque NM USA
Following on from what BigEd and gfoot have said, I am intrigued by the idea of a multi-stage bootstrap, with the serial EPROM feeding instructions to be executed by the 6502, which build a small program in RAM to load an Intel Hex loader that in turn loads a bigger program. This way the 6502 is always in control: no bus arbitration logic to fiddle with, no address generation, and no bidirectional address bus. It is simpler logic. I've built one with a parallel adapter (FT245) and no CPLD, just two TTL glue-logic chips. So if I convert the serial EPROM to parallel data, the rest of the circuit may be the same.

I would still use CPLD in TTL-only designs because it supports rapid prototyping to check out a concept and develop the necessary supporting software, then the actual circuit can be implemented in TTL logic.

@gfoot, the stacking approach looks interesting. I was using lots of LDA #val, STA base,x, INX to create a program in RAM then jump to it.
Bill


Posted: Mon Oct 16, 2023 3:51 pm

Joined: Wed Aug 21, 2019 6:10 pm
Posts: 217
barnacle wrote:
You may have noticed a recent theme in which I've been investigating ways in which a person might build an SBC and program it simply, in situ, using a PC as both the programmer and the terminal. ...

  • ...
  • The address counter needs to initially reset, and stay reset until cycle 3, after which it can increment at CLK/8. A sufficiently high count (power of 2, so say A14 rising for a 16kB transfer) can be used to disable the counter, and to enable the processor

The nice thing about this is that the transfer is ridiculously fast: at an 8MHz clock, you're looking at 1ms per kB (if I got my sums right) and you could fill this with whatever... Basic and a monitor or some sort of filing system from a memory card or just a simple boot loader. What's in the SPI is non-volatile but easily externally programmed. ...


Note that if you use something like a 74x299 with tri-state output and support for daisy-chaining, you can shift out 24 bits of data onto three daisy-chained serial shift registers, with one being attached to the data bus and the other two being attached to A0-A15. When /USR_CE is pulled low to output the shift register contents and BE is pulled low to float the data bus, that also floats the address bus for the USR_AL and USR_AH. With a two stage bootloader, you could have a couple of toggle switches read on two VIA GPIO to select one of four available bootup images, and so once the first stage bootloader is written and debugged, it can be left in place.

An advantage of this is that the stop condition can be dead simple ... if it is considered to be enough to write to RAM in the top 32K, then an A15 value of 0 is the "end of load" marker.
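
A software model of that stop condition might look like the following. It is a sketch only: the record layout (data byte, then address low, then address high) is an assumption about how the image would be laid out in the FlashROM, and the function name is invented.

```python
def load_records(stream):
    """Apply 3-byte (data, addr_lo, addr_hi) records until A15 reads 0."""
    ram = {}
    for i in range(0, len(stream) - 2, 3):
        data, addr_lo, addr_hi = stream[i:i + 3]
        address = (addr_hi << 8) | addr_lo
        if address & 0x8000 == 0:   # A15 low is the "end of load" marker
            break
        ram[address] = data
    return ram


if __name__ == "__main__":
    image = bytes([
        0xA9, 0x00, 0xC0,   # write $A9 to $C000
        0x42, 0xFC, 0xFF,   # write $42 to $FFFC
        0x00, 0x00, 0x00,   # A15 = 0: stop here
        0x55, 0x00, 0x80,   # never reached
    ])
    for address, value in sorted(load_records(image).items()):
        print(f"${address:04X} <- ${value:02X}")
```

The appeal is that no byte counter is needed: the image itself says where each byte goes and when to stop.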

Or you could just load a first stage bootloader from the serial FlashROM with /RESET and BE low, which then proceeds to execute when /RESET is released, using the system resources to read the full boot image from another location in the serial FlashROM and away you go. You only need 3 65c22 GPIO to access a dedicated serial shift ROM through bit banging. That means you only need a pair of USR shift registers, USR_DATA and USR_ADDRL, and A8-A15 are pulled high through a resistor network, so the first stage bootloader is always written to the $FFxx page.

Similar to the single stage bootloader, if 127 bytes is enough for the first stage bootloader, the "end of load" trigger can be A7=0.

Then use an /S /R flip-flop (or a pair of NAND gates used to build an /S /R flip-flop) with the /Reset attached to the /Reboot button and the /Set attached to the USR_ADDRL_IO7 line (which is pulled high with a resistor so the process is not stopped unless the /USR_ADDRL_CE = 0).

If 64 bytes are enough for the first stage bootloader, you can include both a framing mark and an end of load mark at the top of the USR_ADDRL. Clear the pair of serial shift registers when the boot process starts, and have the top bit of the address data always 1, for framing, and use the second highest bit of the address data as the end of load marker when low. When Q7 goes from 0 to 1, the read for the FlashROM is paused and the inverse is passed through to /USR_OE and through a pulldown resistor to /RAM_CS. Then perhaps NAND(USR_ADDRL_IO7,PHI2) is passed to RAM_R/W through a pulldown resistor, to accomplish the write.


Last edited by BruceRMcF on Mon Oct 16, 2023 4:27 pm, edited 2 times in total.

Posted: Mon Oct 16, 2023 3:53 pm

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Most program code accesses are sequential, and only break that habit if some kind of branch instruction is executed. Accesses to non-program addresses don't need to break the sequence from the SPI. So even without caching SPI reads in RAM, some kind of execute-in-place is relatively feasible for nearly any CPU, including the 6502 (but more easily the '816, because it's easier to filter out genuine program accesses), without needing to send a fresh command/address sequence for every byte. You could expect an 8x slowdown, with an increased branch penalty but with data accesses and internal-operation cycles being nearly "free".
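
A crude cost model illustrates the point. The sketch below assumes single-bit SPI, that only program fetches touch the SPI stream (data accesses are simply left out of the trace), and that any non-sequential fetch costs a full command-plus-address restart; the function name and the example traces are hypothetical.

```python
def spi_clocks_for_fetches(addresses, address_bytes=2):
    """Total SPI clocks for a trace of program-fetch addresses."""
    restart_cost = 8 * (1 + address_bytes + 1)  # command + address + data byte
    clocks = 0
    expected = None
    for address in addresses:
        if address == expected:
            clocks += 8             # next sequential byte just streams out
        else:
            clocks += restart_cost  # branch: resend the read command
        expected = address + 1
    return clocks


if __name__ == "__main__":
    straight = [0x8000, 0x8001, 0x8002]
    branchy = [0x8000, 0x8001, 0x9000, 0x9001]
    print(spi_clocks_for_fetches(straight))
    print(spi_clocks_for_fetches(branchy))
```

Straight-line fetches settle at 8 clocks per byte, while each taken branch costs 32, which is where the increased branch penalty comes from.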

The original ARM chips exploited this generally sequential access pattern to optimise DRAM access, with an explicit "sequential access" output pin to hint to the memory controller that it could hold /RAS active and perform the next access with only a (much faster) /CAS cycle. This sequential access mode was also used for the LDM/STM instructions, and for cacheline fills on the cache-equipped ARM610 and ARM250. This was, I think, long before the concept of XIP from a serial ROM was relevant.

I'm still interested in (a little later) trying the idea of using the CPU as an address generator, but directly converting SPI or RS-232 reads into RAM writes, with the CPU mostly ignorant of this data and just executing NOPs. At the end of this loading sequence, the CPU would be reset a second time and start running normally.


Posted: Mon Oct 16, 2023 4:30 pm

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
barnacle wrote:
The first attempt used a parallel eeprom - basically Grant's design with a couple of minor changes - and worked well but required the eeprom removing and reprogramming every time the code changed... the SBC works fine but the pins on the eeprom can take a battering, and it required me to build an eeprom programmer (nucleo, some ram, and a ZIF socket) and write two lots of software to get the code in - one lot for the nucleo and one for the programmer running under linux. It works, but lacks elegance.


when i started with my first 65C02 SBC i had similar issues, my solution was to just have the ROM contain a basic serial bootloader to load programs into RAM and execute them.
that way you basically never need to touch the ROM again unless you want to change the bootloader or add new features to the ROM (which in my case never happened, i just ended up shrinking it down to 256B to have more RAM for programs).

of course similar to using the 8153, you need to load a program into it after every power-up. but honestly it never bothered me as i switch between projects pretty rapidly and constantly do small changes to test them out and then do more changes. so loading directly into RAM is still overall faster than programming either a parallel or in-system SPI ROM.
my current 65816 SBC still works like that too! (at least until i get some mass storage going)

.

anyways i got an alternative idea to avoid the need to remove the ROM from the board to reprogram it.
let the CPU do it.
for a parallel EEPROM or Flash chip this is pretty simple and requires pretty much no extra logic, (assuming the WE pin is hooked up) as you should just be able to have the CPU rewrite its own ROM, eliminating the need for an external programmer after the initial programming (unless you accidentally brick it).

but even when you do choose the SPI EEPROM, you could still hook its pins to a VIA or similar to allow the CPU to access the ROM and program itself.
then you would have 2 options of programming it, either with an external programmer (which is safer) or by having the CPU do it by loading a file over a serial terminal or similar (which is more risky but likely quicker/more convenient if you don't already have an arduino or something ready to go)


Posted: Mon Oct 16, 2023 5:35 pm

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 983
Location: Potsdam, DE
Of course, one approach is to massage the data desired such that it matches that for the 8153 system, and simply squirt data out of the SPI and into the 8153, reducing in the best mathematical mode the problem to one that has already been solved...

Some interesting ideas going on here, though some are possibly either more complex than I'm looking for or not really in the spirit of what I want.

Neil


Posted: Mon Oct 16, 2023 6:00 pm

Joined: Fri Dec 21, 2018 1:05 am
Posts: 1117
Location: Albuquerque NM USA
Because a serial EPROM is small (DIP8), what I did was to fit two serial EPROMs, with a jumper selecting which one to boot from. The serial EPROM is also disconnected from the system once it is loaded, so the jumper can be switched to the other serial EPROM while the system is powered up. This way, you can boot from one serial EPROM, switch, then program the second one.

Another approach is to have a physical Mode switch that decides whether the board is a programmer (boots from the serial or parallel port) or a computer (boots from on-board parallel flash). In programmer mode, new application software and flash programming software can be loaded via the serial or parallel port to burn the application software into the on-board parallel flash. In computer mode, the parallel flash boots the new application software. This approach is simple enough to implement with a couple of TTL glue-logic chips and a DPDT switch.
Bill

