
All times are UTC




PostPosted: Sun Nov 18, 2018 6:13 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10828
Location: England
I thought this odd machine might be of interest here - it's been mentioned previously elsewhere.

The main idea with the VIPER, other than it being useful in safety-critical realtime control, was that it should be verifiably correct, as should the programs which ran on it. That's a tall order, but I think it's the reason why it was a very simple machine.

- 32 bits wide
- word-addressed, just 20 bits of address space, with separate i/o space
- only four registers, including the program counter (which is only 20 bits wide)
- only one condition bit
- no interrupts
- all instructions exactly one word long, with space for a 20 bit literal
- most instructions operate on the accumulator
- two registers can act as index registers, using only positive (unsigned) offsets
- no subroutine call or return instructions
- one of the index registers can act as link register, which helps to implement a subroutine mechanism
- if anything untoward happens, the machine halts and raises an error pin - the system takes appropriate action such as using an alternate CPU, restarting, or starting a different program.
- implementable in 4000-5000 "logic array cells"

The CPU was described as a fifth the complexity of its peers, and was to be programmed using the newly designed language Newspeak, which had program correctness as a primary goal.

There's a description of the machine here:
For more, perhaps see the links in the thread mentioned before.

I haven't thought deeply about what it might be like to program VIPER at the lowest level, but my feeling is that having just A, X, Y where Y is used for subroutine calls is going to feel quite constrained. Having just one status bit means that most branches will be preceded by a comparison - the implicit setting of Z and N which you get in the 6502 will be missing.

It's interesting, I think, to design a machine for real time and not put interrupts in. No doubt the machine is fit for purpose, and the programmer has to reckon for themselves how, and when, to deal with external events and deadlines. It's a different way of working. You can imagine a system of any complexity might need several VIPERs, each dealing with one aspect of the system. Or perhaps if the machine is fast enough compared to the time constants from the outside world, it can service several aspects in some kind of round-robin fashion.


PostPosted: Sun Nov 18, 2018 9:19 pm 

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Ed:

Thanks for bringing up the topic of VIPER. I am considering an implementation of this processor with only minor modifications.

For the past few years, I have been developing real-time systems that do not use interrupts. The key concept that enabled these systems, and for which I had to get buy-in from the developers, was that all functions had to run using a Run-To-Completion (RTC) execution model. In other words, functions cannot block on I/O, inter-process communications, etc. Any I/O that is not ready to accept or provide data causes the function to suspend immediately.

This approach essentially requires all functions to be implemented as state machines so that the function can be restarted at the same point where it suspended operations. A super loop calls each function in some defined manner. If a scheduling algorithm other than round-robin is required, it can be implemented within the context of the super loop. For those functions which are based on timed events, a separate function and dedicated timer HW were used to provide the required timer events; determinism was very good. This simple approach has been very successful in my applications, and I am pretty sure it can be applied to a system utilizing a VIPER-derived microprocessor.
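As a rough illustration of the run-to-completion idea described above, here is a minimal Python sketch. The `EchoTask` class, its two states, and the queue-based "I/O" are all hypothetical names invented for the example; they are not taken from any real system mentioned in this thread. The point is only that each call to `step()` does a bounded amount of work and returns at once if its I/O is not ready.

```python
from collections import deque


class EchoTask:
    """Hypothetical task: copies words from an input queue to an output queue.

    It is written as a two-state machine, so step() never waits for I/O.
    """

    def __init__(self, rx, tx, tx_capacity):
        self.rx = rx                    # input queue (stands in for a device)
        self.tx = tx                    # output queue (stands in for a device)
        self.tx_capacity = tx_capacity  # assumed size of the output buffer
        self.state = "WAIT_RX"
        self.pending = None

    def step(self):
        if self.state == "WAIT_RX":
            if not self.rx:             # input not ready: suspend immediately
                return
            self.pending = self.rx.popleft()
            self.state = "WAIT_TX"
        elif self.state == "WAIT_TX":
            if len(self.tx) >= self.tx_capacity:  # output full: suspend
                return
            self.tx.append(self.pending)
            self.pending = None
            self.state = "WAIT_RX"


def super_loop(tasks, passes):
    """Round-robin super loop: call every task once per pass."""
    for _ in range(passes):
        for task in tasks:
            task.step()


# Small demonstration: the task stalls in WAIT_TX once the output is full,
# and resumes from that same state when space becomes available.
rx, tx = deque([1, 2, 3]), deque()
task = EchoTask(rx, tx, tx_capacity=2)
super_loop([task], passes=10)
```

A different scheduling policy would only change `super_loop`; the tasks themselves would stay the same.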

The instruction format for VIPER, described well in the referenced document by Dr. Pygott, is essentially that of a microprogrammed state machine. The instruction provides control fields, much like the instruction of the Data General Nova and Eclipse minicomputers, that can be applied directly to the ALU. A simple finite state machine is all that is needed to fetch the single memory operand that is required for most instructions except the shift/rotate instructions.

The lack of interrupts also helps in the implementation of complex operations that may require two or more instructions. For example, VIPER lacks a stack pointer, but X can be pressed into service to implement a stack in software. With interrupts, a software implementation of a stack would be more difficult. A push or pop operation, which in a more commonly-defined microprocessor is implemented in the critical region provided by the instruction itself, would require the programmer / compiler to implement a critical region around the instructions that perform the push / pop operation.

In VIPER, a push operation can be performed using two sequential instructions: (1) a write of a register (primarily A) to a location in memory pointed to by the base / offset provided in the instruction plus the contents of an Index register (X or Y); (2) an increment / decrement of the index register used in the previous instruction. A pop operation can be similarly performed. If VIPER supported interrupts, the first limitation would be the loss of the Y register, since it would need to be used to provide the link register, and the second would be a requirement that the push/pop instruction sequences have to be enclosed by instructions that mask and enable the interrupts. Without supporting interrupts, the programmer / programming language system is free to perform the push / pop operations at any time without taking any special precautions.

In fact, I can see several ways for a VIPER-based system to handle interrupts in much the same manner as more commonly implemented processors. My approach may require that the I/O subsystem be more capable than the simple devices that most of us use. I/O devices with sufficient buffering and some rudimentary processing may be a necessary condition, but such capabilities are already found in most I/O devices today: USB, Ethernet, etc. Only simple I/O devices like SPI, I2C, UARTs, etc. may have to be supported by a simple, dedicated FSM-based controller when used with VIPER.

The way the Inmos transputer handled interrupts seems to have been derived from the interrupt-free design principle of the VIPER. The transputer only accepted interrupts whenever a computation was "completed". This was defined by the execution of certain instructions such as control flow / branch and store instructions. (Note: it was a bit more complicated than this, but I don't have the transputer compiler writer's guide handy at the moment to look up all of the "interrupt" points.)

If the RTC mode of program / function execution proves to be unable to support the desired event handling latency, then it is fairly easy to automatically insert event flag checking instruction sequences throughout a program. It is possible to insert these event checking sequences at the end of each "basic block", after each subprogram call, etc. In real-time control systems, truly asynchronous events are avoided as much as possible. The asynchronous events and the associated processing are made as synchronous as possible.
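The event-flag polling described above might look something like the following Python sketch. It assumes the "basic blocks" are plain callables and the event flags live in a dictionary; `run_with_polling`, `flags`, and `handlers` are illustrative names only, not part of any VIPER toolchain.

```python
def run_with_polling(blocks, flags, handlers):
    """Run basic blocks in order, checking event flags after each block."""
    for block in blocks:
        block()  # straight-line code, runs to completion
        # Snapshot the flags that are set, then service each one.
        for name in [n for n, is_set in flags.items() if is_set]:
            flags[name] = False   # acknowledge before handling
            handlers[name]()


# Demonstration: the second block raises an event (as hardware might),
# and it is serviced before the third block starts.
log = []
flags = {"rx": False}


def block1():
    log.append("b1")


def block2():
    log.append("b2")
    flags["rx"] = True


def block3():
    log.append("b3")


run_with_polling([block1, block2, block3], flags,
                 {"rx": lambda: log.append("rx")})
```

The worst-case latency for an event is then bounded by the longest basic block plus the handlers ahead of it, which is something that can be analysed statically.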

Thus, I think there is too much reliance on interrupts to capture and process events in many types of real-time systems. Truly asynchronous events will have to be handled in a manner consistent with their relationship to the cyclic real-time tasks that they affect. In many cases, the data produced by non-fault asynchronous events are synchronized to the basic cyclic tasks performed by the system. Thus, without thinking too deeply on the matter, asynchronous events destined for processing by a cyclic real-time task could be detected at the completion of the processing and I/O posting of that cyclic real-time task. In essence, the asynchronous event is processed synchronously.

_________________
Michael A.


PostPosted: Sun Nov 18, 2018 9:54 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10828
Location: England
Some interesting thoughts there! I may have to have a quick look into the transputer's interrupt story.

I realise I failed to link to another description of VIPER:

One other note on interrupts: in the development of our OPC machines, at one point we needed two instructions for push and pop operations to support subroutine calls. We used macros for this, but the interesting thing is that we found we didn't need to make any sort of critical section. The two instructions can safely be hit by an interrupt, if the ordering is right: for a push, first adjust the stack pointer, and then write the value. For a pop, read the value, and then adjust the stack pointer. This way there's always a valid stack pointer which can accept data in the case of an interrupt: it might stand guard over an empty slot, but that's never a problem.
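To make the ordering argument above concrete, here is a small Python simulation. It is only a sketch under stated assumptions: a descending stack, a toy `Machine` class, and an `interrupt()` method that saves one word on the same stack and then restores the stack pointer. None of these names come from the OPC sources, and the thread does not say which way the OPC stack grows.

```python
class Machine:
    """Toy word-addressed memory with a descending software stack."""

    def __init__(self, size=16):
        self.mem = [0] * size
        self.sp = size          # sp points at the top item; stack starts empty

    def interrupt(self):
        """Simulated handler: pushes one word of context, then pops it."""
        self.sp -= 1
        self.mem[self.sp] = 0xDEAD
        self.sp += 1


def push_safe(m, value, irq=False):
    m.sp -= 1                   # 1: adjust the stack pointer (reserve a slot)
    if irq:
        m.interrupt()           # interrupt lands between the two instructions
    m.mem[m.sp] = value         # 2: write the value


def push_unsafe(m, value, irq=False):
    m.mem[m.sp - 1] = value     # 1: write below the stack pointer
    if irq:
        m.interrupt()           # handler overwrites the unreserved slot
    m.sp -= 1                   # 2: adjust the stack pointer too late


def pop_safe(m, irq=False):
    value = m.mem[m.sp]         # 1: read while the slot is still reserved
    if irq:
        m.interrupt()
    m.sp += 1                   # 2: release the slot
    return value


good = Machine()
push_safe(good, 42, irq=True)   # value survives the interrupt

bad = Machine()
push_unsafe(bad, 42, irq=True)  # value is clobbered by the handler
```

In the safe ordering the handler only ever writes below a stack pointer that already covers the slot in use, so the worst case is an empty reserved slot, as described above.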


PostPosted: Sun Nov 18, 2018 10:56 pm 

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1435
Location: Scotland
BigEd wrote:
Some interesting thoughts there! I may have to have a quick look into the transputer's interrupt story.


I worked with transputers for a while. Quite interesting little things. Most of my work was in C or assembler and not occam though. The C compiler we used allowed creation of parallel processes.

Transputers had a single interrupt input. To use it, you created a high priority process and then got that process to wait on the event input channel. When the event triggered, the high priority process would start (if possible; if another high priority process was running at the time, that would run to completion first).

The interrupt stored the low priority registers, etc. in fixed memory locations, then started the high priority process. This store could take some time if the FPU was in use on a T800 (or, I think, if you were doing a block-move).

Fairly simple really, but not that predictable. You could also have a high priority process wait on a timer, which you could use to implement timed events, etc. However, you were subject to jitter caused by the FPU, etc. We ported Minix to the transputer at one point using that timer 'interrupt' as part of the system.

I'm not sure the transputer was ever really intended for real-time stuff though.

Interesting stuff about VIPER too. (I worked at RSRE in the early 90's, but you were escorted to one room and never knew what the next room was doing.) Reminds me of Gigatron - no interrupts there, yet by effective cycle counting and clever clock speed selection they get it to display VGA and run your program...

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


PostPosted: Mon Nov 19, 2018 3:56 am 

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
BigEd wrote:
The two instructions can safely be hit by an interrupt, if the ordering is right: for a push, first adjust the stack pointer, and then write the value. For a pop, read the value, and then adjust the stack pointer.

That is a great observation. When considering the order of the operations, I did not consider this effect.

However, I think the issue at hand is interrupts. I think that interrupts are a good solution for some situations, but in many instances, they are a source of many problems. As you pointed out, my example regarding using critical sections to protect the push / pop operations is not adversely affected by interrupts.

So let's take it a bit further. Let's suppose interrupt handling was to be added to the VIPER. What changes would be required to the architecture? Would those changes require additional instructions and registers to be added?

As I see the architecture, 30+ years into its future, I am pleasantly surprised by its power. At first glance, three 32-bit registers, a 20-bit program counter, and a single status bit do seem a bit constraining. However, since there is no restriction preventing the destination register of an indexed addressing mode instruction from being the index register itself, the VIPER can easily implement indirect addressing. It requires two instructions: (1) one to load the pointer, and (2) a second to load the operand. The level of indirection does not have to end there, either: the number of pointer loads that can be performed is essentially unlimited. One use for this type of operation is following the STATIC LINK to locate a local variable of an enclosing function, as is common with Pascal. Another would be to walk function tables, linked lists, etc.
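The pointer-chasing idea above can be modelled in a few lines of Python. This is a sketch, not VIPER code: memory is a plain list, `load_indexed` stands in for an indexed load whose destination is the index register, and the frame layout (static link in word 0, locals after it) is an assumption made for the example.

```python
def load_indexed(mem, offset, index):
    """Model of an indexed load: returns mem[offset + index]."""
    return mem[offset + index]


def follow_static_links(mem, frame, levels, link_offset=0):
    """Repeat X := mem[link_offset + X] to climb 'levels' enclosing frames."""
    x = frame
    for _ in range(levels):
        x = load_indexed(mem, link_offset, x)  # destination is the index itself
    return x


def load_outer_local(mem, frame, levels, var_offset):
    """Fetch a local variable from a frame 'levels' scopes out."""
    x = follow_static_links(mem, frame, levels)
    return load_indexed(mem, var_offset, x)    # A := mem[var_offset + X]


# Three nested frames at word addresses 10, 20 and 30 (assumed layout:
# word 0 = static link to the enclosing frame, word 1 = first local).
mem = [0] * 40
mem[30] = 20     # innermost frame links to the frame at 20
mem[20] = 10     # which links to the outermost frame at 10
mem[11] = 111    # first local of the outermost frame
mem[21] = 222    # first local of the middle frame

outermost_local = load_outer_local(mem, frame=30, levels=2, var_offset=1)
middle_local = load_outer_local(mem, frame=30, levels=1, var_offset=1)
```

Each loop iteration corresponds to one load instruction, so reaching a variable N scopes out costs N + 1 loads in this model.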

As I've been trying to understand the instruction set, I've been relating the VIPER instruction format to 6502/65C02/M65C02A instructions and capabilities. With the exception of the PSW and the automatic setting of the NVZC bits during various instructions, I've not found any missing "functionality" that cannot be duplicated by very short instruction sequences. Stack operations are easily performed in two instructions, which appears to be consistent with what you guys found sufficient for the OPC machines.

As I find more of these nuggets in the VIPER instruction format, I am beginning to think that the VIPER was a victim of the available technology and available budget at the time of its introduction. The small, structured ASIC technology used in the first generation implementation limited its performance because a HW multiplier was not included. When its performance was compared to that of a VAX11/730, it was found to perform its assigned function well, but it did not provide adequate margin to support additional functions.

I think this analysis was a bit of an apples versus oranges comparison. Furthermore, I have serious doubts about whether regulatory agencies would look favorably on a safety-critical product that may be expected to take on additional functions during its life-cycle. Perhaps the reviewers were pointing out that, at that moment, the VAX11 showed more promise to allow future products to integrate more functions into a single processor than did the VIPER. However, we all know what happened to the VAX, and that microprocessors have consistently improved their performance with every succeeding generation.

Another aspect of the review was that the reviewers wanted to use floating point arithmetic, and did not particularly care for the fixed point arithmetic required with the VIPER. In fact, the report I read on this subject indicates that the fixed point arithmetic used by VIPER exhibited results with lower variance than the floating point results of the minicomputer to which it was being compared for all but two of the parameters examined. Floating point arithmetic allows programmers greater freedom from detailed analysis compared to fixed point arithmetic. In my view of floating point, this is a point that does not receive a lot of attention. To make fixed point work correctly and reliably, greater understanding of the process is required. The additional time required for this is time well spent. Invariably there will be software bugs, so for safety-critical SW, I would like to know that the developers really understood the problem they were working on.

_________________
Michael A.


PostPosted: Mon Nov 19, 2018 8:42 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10828
Location: England
Yes, to some extent I agree with your views on fixed point, especially for embedded purposes as opposed to general scientific computing. It's still a huge boost to have a hardware multiply - which they didn't have the resources to implement, although they say they left room in the opcode map. And, to a lesser extent, hardware divide. These days, even on the smallest FPGA, we could probably add those in. (In a CPLD, if VIPER fits at all, it would be much more cramped.)

It's clear that when making, say, satellites, the project plan allows for lots of time to get things right, and enough money to get the best facilities and people. Perhaps the same is true in big science and perhaps in military development. Less so in consumer electronics! And so we use compilers, and floating point, and a different set of tactics.

I'm guessing that the implementations of VIPER from back in the day would not have overlapped fetch and execution - if so, there's quite a potential for a performance boost right there. I think we shouldn't be concerned too much about using short instruction sequences to get the effect of a more complex instruction set: it's a RISCy kind of idea. A macro assembler makes all the difference.

Speaking of instruction sequences, I'm very taken with the idea of multiply-step and divide-step instructions, which put some additional hardware in place but still leave the software some work to do. I notice the transputer's floating point square root was a sequence of four instructions, ostensibly to improve the interrupt latency cost of such a complex instruction. I think I saw them say that the 20MHz transputer still manages microsecond-scale latency. That seems to me a good way to look at it: interrupt latency should not be measured in clock cycles, but in wall clock time.


PostPosted: Mon Nov 19, 2018 8:50 am 

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1435
Location: Scotland
BigEd wrote:

Speaking of instruction sequences, I'm very taken with the idea of multiply-step and divide-step instructions, which put some additional hardware in place but still leave the software some work to do. I notice the transputer's floating point square root was a sequence of four instructions, ostensibly to improve the interrupt latency cost of such a complex instruction. I think I saw them say that the 20MHz transputer still manages microsecond-scale latency. That seems to me a good way to look at it: interrupt latency should not be measured in clock cycles, but in wall clock time.


I've just dug out some of my old transputer manuals - on a T800, interrupt latency (that is to say, the time from a high priority process waiting on a channel to it starting when that channel is 'ready') can be up to 75 cycles (at 20 MHz) or 3.5 µs, and without the FP unit in use it's down to 2.5 µs.

The i860 (risc) had separate instructions to pump the floating point pipeline (and it needed 8 or 9 instructions, from memory), which it ran concurrently with the integer unit, however taking an interrupt on them was something you never, ever wanted to cope with.

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


PostPosted: Mon Nov 19, 2018 10:02 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10828
Location: England
Thanks - that's not quite down at the single-microsecond level, but presumably it was felt to be good enough. I see now that they describe the "typical" latency as being sub-microsecond.


PostPosted: Mon Nov 19, 2018 10:11 am 

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1435
Location: Scotland
BigEd wrote:
Thanks - that's not quite down at the single-microsecond level, but presumably it was felt to be good enough. I see now that they describe the "typical" latency as being sub-microsecond.


They also claimed parts up to 30 MHz - I don't recall ever seeing anything much over 20 MHz, although it's been a long time.

(For about 8 years, I worked for a company called Meiko in Bristol - founded by some former Inmos people. Lots of transputers and other stuff.)

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


PostPosted: Mon Nov 19, 2018 11:43 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10828
Location: England
Meiko? Excellent! I did see some 25MHz transputers a few weeks ago, in a Meiko Computing Surface. I'm not sure about 30MHz, but a google book search does show up a few mentions.


PostPosted: Mon Nov 19, 2018 11:58 am 

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1435
Location: Scotland
BigEd wrote:
Meiko? Excellent! I did see some 25MHz transputers a few weeks ago, in a Meiko Computing Surface. I'm not sure about 30MHz, but a google book search does show up a few mentions.


Ah, Jim Austin's. I visited him last year and saw that. Must make the effort to visit him and another friend up there again.

Nice bit of nostalgia for me, and I think that's the one I installed in RSRE, although it might not be - there were a few built in that configuration. Lots of 40 MHz i860 boards, video boards and a weird custom interface board to an Ampex tape drive to do analysis of SAR tapes. No bootable media though. That was one of the last generation CS1's before Meiko moved onto Sparcs, although that one has a Sparc based host system (actually a Sparc station 1 with s-bus transputer link interface and backplane bus master circuitry, from what I recall). The i860s were there because the transputers by then (92?) were simply too slow. The transputers were used as nothing more than smart comms chips.

As for 30 MHz transputers - the reality was that they were just not keeping up - the T9000 was delayed and customers were demanding faster systems, more memory, etc. and by then it was too late. Nice though it is, it was never going to compete with the sheer CPU speed of the newer Intel chips at that time, and the concurrent/parallel programming model just didn't fit the customers who all too regularly said "I have this 30 year old FORTRAN code - make it go faster"...

I doubt anything bootable still exists for them, sadly. (Although the reality is that a modern smartphone has about a billion times more compute power than that system PLUS all of Jim's Crays put together...)

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


PostPosted: Mon Nov 19, 2018 12:42 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10828
Location: England
Ah, you've been to Jim's sheds too - I do recommend a visit (alternate Saturdays) to anyone able to get to that part of Yorkshire and who is sufficiently attracted to old computing kit. And able-bodied - it's by no means accessible and is a series of physical challenges to get around.

It turns out I've written elsewhere about the (long, arduous, sad) T9000 story.


PostPosted: Mon Nov 19, 2018 5:04 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10828
Location: England
revaldinho reminded me of the OPC7, our DIY minimal machine which seems like an interesting parallel to VIPER. I'm pretty sure there's no inheritance from VIPER into OPC7. But in both cases they seem to remind the programmer of the 6502: in one case a simplified and in the other case an extended form.

VIPER:
    32 bits
    A, X, Y, PC and B as a status bit
    20 bit physical address with unsigned indexed addressing available.
    Separate I/O and data spaces.
    All instructions single word, same format with 20 bit literal field.
    No stack.
    No interrupts.
    One of the two index registers can act as a link register to help with subroutine calls.
    Full set of conditional jumps.

OPC7:
    32 bits
    16 registers (zero, PC, and 14 general purpose) and an 8 bit status register.
    20 bit physical address with sign-extended indexed addressing.
    Separate I/O and data spaces.
    All instructions single word, in two formats: one has 20 bit literal field, the other has 16 bit.
    No stack.
    Software interrupts.
    Any general-purpose register can act as a link register to support subroutine calls.
    All instructions predicated (3 bits supporting 6 conditions), which is also the mechanism for conditional jumps.

I'd quite like to compare the implementation complexity but I don't have the information to hand. Clearly OPC7 has much more programmer-visible state: 16 x 32 is 512 bits, which is more than twice the total state in VIPER. OPC7 is described in 68 lines of verilog.
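For what it's worth, the state comparison above can be worked through using only the registers listed in this post. This is a back-of-envelope count, not a statement about either implementation: it assumes VIPER's programmer-visible state is exactly A, X, Y, PC and B, and it counts OPC7 the same way the post does (16 registers of 32 bits, ignoring the status register and the fact that the zero register holds no real state).

```python
# VIPER: three 32-bit registers (A, X, Y), a 20-bit PC, and the B bit.
viper_bits = 3 * 32 + 20 + 1

# OPC7: counted as 16 registers of 32 bits each, as in the post.
opc7_bits = 16 * 32

ratio = opc7_bits / viper_bits
print(f"VIPER: {viper_bits} bits, OPC7: {opc7_bits} bits, ratio: {ratio:.2f}")
```

On these assumptions VIPER comes to 117 bits, so "more than twice" is comfortably true; the ratio is a little over four.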

Links: OPC7 spec and discussion.

