 Post subject: Mainframe Class Hardware
Posted: Mon Jan 25, 2010 9:15 pm

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
I was bored yesterday, and started googling around for the performance of various well-received mainframe computers "back in the day."

To my shock and horror, I discovered two things:

* VAX 11/780 (one of my personal favorite computers to work with, BTW) ran at 1 MIPS.

* IBM System/370-158 ran at 1 MIPS.

These two computers were world-class machines, both considered mainframes (the VAX was marketed as a "super-mini", but let's be realistic, it is a freakin' mainframe!), and both of which ran large-scale enterprises for decades.

I remember (quite fondly) using the 11/780 while attending MVCC. I remember a whole class of people using the computer concurrently during our lab session, and I know for a fact no less than four concurrent labs were in session with mine. That's a total of no less than (20 students times 4 labs) 80 concurrent, interactive user sessions, plus the printers, running compilers, and so on.

All this, on a CPU that is, in essence, a 1MHz CPU.

**WOW**.

The key, of course, is two-fold -- you need an operating system that is designed for real-time processing, which VMS certainly was. Absolutely no I/O API was blocking at the kernel level, and any blocking APIs you might have used were always implemented as user-level libraries that abstracted the QIO interface on your behalf. Additionally, the underlying hardware was virtually entirely built around DMA.
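
A minimal sketch of that layering, in Python rather than anything VMS-specific: the only primitive is a non-blocking "queue this request" call, and the blocking call is an ordinary user-level wrapper that waits on a completion event. The names `Device`, `queue_io` and `blocking_io` are made up for illustration and are not the real QIO interface.

```python
import queue
import threading

class Device:
    """Toy device: a worker thread completes queued requests in the background."""

    def __init__(self):
        self._requests = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            data, on_done = self._requests.get()
            on_done(data.upper())  # stand-in for the hardware doing the transfer

    def queue_io(self, data, on_done):
        """Non-blocking primitive: enqueue the request and return at once."""
        self._requests.put((data, on_done))

def blocking_io(device, data):
    """User-level wrapper: queue the request, then sleep until it completes."""
    done = threading.Event()
    result = []

    def on_done(value):
        result.append(value)
        done.set()

    device.queue_io(data, on_done)
    done.wait()
    return result[0]

print(blocking_io(Device(), "hello"))  # prints "HELLO"
```

The caller that wants to overlap work with I/O uses `queue_io` directly; only the caller that asks for blocking behavior pays for it.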

I should point out that the I/O architecture on "real Mainframes" like the System/3*0 series is, pretty much, indistinguishable from contemporary buses like USB or (especially) Firewire -- DMA driven, high throughput, heavily reliant on interrupts, heavy reliance on CRCs or, in some cases, even ECC codes, etc.

So, this got me thinking about the 6502 and 65816. Here are processors that are capable of keeping up with these otherwise awesome behemoths. Due to the 8-bit bus width, you'll need to drive the clock speed correspondingly higher. A 4MHz 6502 _should_ be able to compare, more or less, in the same ballpark as a VAX 11/780 or S/370-158 in terms of raw processing power (single-threaded). A 3MHz 65816 should be able to compare as well (multitasking).

BTW, I qualified the 6502 and 65816 as single- or multi-tasking because of the ease with which you can switch tasks on each. The 6502 isn't quite so easy -- to run comparable applications on a 6502, assuming an infinite memory space at your disposal, you still have to physically copy the stack and zero pages. (However, you can use external logic to remap those pages.) The 65816 allows you to use a (larger) stack located anywhere in bank 0, and bank-0-configurable "direct" page as well. Heck, the 65816 even has the same address space as a System/360, and like the later versions of the 360, can be augmented with an external address translation unit.
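
To make the difference concrete, here is a rough Python model of the two task switches. It is only a sketch: it ignores saving A/X/Y and the status register, and the `cpu` and `task` dictionaries are hypothetical stand-ins for real register state.

```python
PAGE = 256

def switch_6502(memory, saved_out, saved_in):
    """Plain 6502: zero page ($0000-$00FF) and the stack ($0100-$01FF) sit at
    fixed addresses, so a switch copies both pages out and the next task's in."""
    saved_out[:] = memory[0:2 * PAGE]
    memory[0:2 * PAGE] = saved_in
    return 4 * PAGE  # bytes moved: two pages out plus two pages in

def switch_65816(cpu, task):
    """65816: repoint the direct-page (D) and stack (S) registers at the task's
    own areas in bank 0; nothing is copied."""
    cpu["D"] = task["direct_page"]
    cpu["S"] = task["stack_pointer"]
    return 0  # bytes moved

memory = bytearray(64 * 1024)
moved = switch_6502(memory, bytearray(2 * PAGE), bytearray(2 * PAGE))
print(moved)  # prints 1024
```

At a few cycles per byte, the 1024 bytes moved on the 6502 cost thousands of cycles per switch, which is why the external page-remapping logic mentioned above is attractive.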

Maybe, when I'm done with my current array of projects, I'll resume work on the Kestrel and build it out as a "personal mainframe," relying on a plurality of I/O processors to implement high I/O throughput. Why? Well, why not? :) It wouldn't be a hobby if I had a reason for it.


Posted: Mon Jan 25, 2010 10:04 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10827
Location: England
Interesting thoughts, but note that a VAX MIPS is an odd unit from the Whetstone benchmark:
Quote:
VAX MIPS - The geometric mean of Millions of Operations Per Second for the sections covering fixed point arithmetic, if then else and assignments, multiplied by five. Such a calculation for the DEC VAX 11/780, accepted as running at 1 Million Instructions Per Second, produces approximately 1.0 MIPS.


That page seems to give a 6502 (running BASIC, which isn't quite fair) a score of about 3 milli-MIPS


Posted: Mon Jan 25, 2010 10:09 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10827
Location: England
Mind you, I'm with you on the nostalgia for multi-user VMS (although it could be slow when things were busy)

If we ran textual apps, with a page of stack, a page of direct access, 64k of code and 64k of data, we could get 100 processes on a fully equipped 65816 - let's say 10-20 users?
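
A back-of-envelope check of that figure, as a small Python sketch. It assumes each process gets one whole 64 KiB bank for code and one for data, that stack and direct pages must come out of bank 0, and that bank 0 itself is not handed out as a code or data bank; the `reserved_banks` parameter is an illustrative knob, not anything from real hardware.

```python
BANK = 64 * 1024     # one 65816 bank
TOTAL_BANKS = 256    # 24-bit address space = 16 MiB
PAGE = 256

def max_processes(reserved_banks=1):
    """Upper bound on processes, each with a code bank, a data bank,
    one stack page and one direct page."""
    by_banks = (TOTAL_BANKS - reserved_banks) // 2  # two banks per process
    by_bank0_pages = (BANK // PAGE) // 2            # two bank-0 pages per process
    return min(by_banks, by_bank0_pages)

print(max_processes())  # prints 127
```

So the ceiling is around 127 before the kernel, vectors and I/O take their share, which makes roughly 100 processes a plausible round number.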


Posted: Mon Jan 25, 2010 11:18 pm

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
BigEd wrote:
That page seems to give a 6502 (running BASIC, which isn't quite fair) a score of about 3 milli-MIPS


According to the same website, that's still 6x faster than a Z-80 clocked 3.5x as fast, also in BASIC. ;)

So, you figure a factor of 15 improvement in performance between BASIC and hand-written assembly, and you're looking at something close to 0.045 MWIPS/MHz. At 4MHz, you're at an estimated 0.18 MWIPS, or about 1/5th the speed of a VAX.
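
The arithmetic behind that estimate, spelled out as a short Python sketch. The inputs are the figures from this thread (about 3 milli-MIPS in BASIC, a guessed 15x gain from assembly), and it assumes the BASIC score was measured at roughly 1 MHz so that it can be treated as a per-MHz figure.

```python
def estimate_mwips(basic_score, asm_speedup, clock_mhz):
    """Scale a BASIC benchmark score to hand-written assembly at a given clock."""
    per_mhz = basic_score * asm_speedup
    return per_mhz, per_mhz * clock_mhz

per_mhz, at_4mhz = estimate_mwips(0.003, 15, 4)
print(round(per_mhz, 3), round(at_4mhz, 2))  # prints 0.045 0.18
```

That lands between one fifth and one sixth of the VAX's nominal 1.0, so the whole estimate stands or falls on the assumed 15x factor.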

BTW, according to Wikipedia, the 11/780's CPU clock cycle was 1200ns (1.2 microseconds), so what this boils down to is a very CISC processor with instructions expressly designed to make the VAX look good on that benchmark. Thus, the VAX 11/780 really does execute at approximately 1 MIPS (I'm not sure how it recovers performance from a slower clock, but apparently it's known to do so).

(And this isn't a new practice, either; Intel often used custom-designed C compilers, kept in-house, to produce code that was specifically optimized to make benchmarks look good.)


Posted: Tue Jan 26, 2010 12:22 am

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
kc5tja wrote:
I was bored yesterday, and started googling around for the performance of various well-received mainframe computers "back in the day."...


I know you said "well received" & "back in the day" mainframe, but it reminded me of the Cray Supercomputers, which I know are not in the same category as a standard old school mainframe... but I thought I'd throw it out there anyway for sh*ts and giggles.

http://en.wikipedia.org/wiki/Supercomputer

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


Posted: Tue Jan 26, 2010 12:38 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8214
Location: Midwestern USA
kc5tja wrote:
...Intel often used custom-designed C compilers, kept in-house, to produce code that was specifically optimized to make benchmarks look good.

They brought that "technique" to a peak during the early days of the AMD Athlon. It worked until someone (Tom Pabst?) used a processor-agnostic compiler to compile the same benchmarks and test on both AMD and Intel offerings. At that point the Pentium III wasn't looking so good. :)

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Tue Jan 26, 2010 12:41 am

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
My experience with a Cray supercomputer was over a 9600bps link. ;)

Supercomputers definitely aren't mainframes -- they just don't have the I/O capacity that literally defines the mainframe category. But, they have a ton of bandwidth to another resource -- RAM -- and their ability to crunch numbers is (or rather, was) jaw dropping.

Contemporary home computer evolution has tended towards the supercomputer end of the scale, but has not really addressed the I/O problem very much until fairly recently.

Indeed, the first home computer that came close to being a personal mainframe (as such) was the Commodore-Amiga. Here was a machine built around the 68000 (putting it firmly between PDP-11 and VAX in relative performance) and which had 32 I/O slave units fed by DMA. No, that is not a typo -- I said 32 DMA channels, all of which could operate concurrently, albeit with some cycle-stealing from the CPU.

You had DMA channels for:
* 6 bitplanes of video. (2 planes steal from the CPU)
* 2 channels per sprite, times 8 sprites (total: 16; 4 sprites steal from the CPU)
* 4 channels for the blitter (configurable whether it steals from the CPU)
* 1 channel for the Copper
* 4 audio channels.
* 1 channel for floppy disk.
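
For anyone checking the count, the list above does add up to 32. A trivial Python tally, using only the numbers given in the list (the dictionary keys are just labels):

```python
amiga_dma_channels = {
    "bitplanes": 6,
    "sprites": 2 * 8,  # 2 channels per sprite, 8 sprites
    "blitter": 4,
    "copper": 1,
    "audio": 4,
    "floppy": 1,
}

total = sum(amiga_dma_channels.values())
print(total)  # prints 32
```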

On top of that, exec.library (the AmigaOS kernel) was structured almost identically to VMS, where all I/O was non-blocking by default, and it made extensive use of message ports ("mail boxes" in VMS terms) and signal bits ("event bits" in VMS terms). Support for "asynchronous subroutines" appeared to be planned, as fields in kernel structures and APIs existed for them, but they apparently were never finished, and consequently, never used to my knowledge.

Of course, by today's standards, the Amiga seems quaint. ;)


Posted: Tue Jan 26, 2010 12:52 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8214
Location: Midwestern USA
kc5tja wrote:
Of course, by today's standards, the Amiga seems quaint. ;)

By today's standards, a 1 GHz Athlon seems downright ancient. :)

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Tue Jan 26, 2010 1:01 am

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
BigDumbDinosaur wrote:
kc5tja wrote:
Of course, by today's standards, the Amiga seems quaint. ;)

By today's standards, a 1 GHz Athlon seems downright ancient. :)


Indeed -- it seems so strange to me that I was running the original Athlon 800MHz slot-A processor up until early last year. ;)


Posted: Tue Jan 26, 2010 1:03 am

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
I never had an experience with a Cray, but I do remember from years ago he was making his computers using wirewrap. I can still picture those wires running all over.... Gotta be an inductive nightmare. But I guess he mastered parallel processing at lower speeds back then.

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


Posted: Tue Jan 26, 2010 2:23 am

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
80MHz, IIRC, was his machine's typical top speed. Although it was wire-wrapped (it ought not be more inductive than a printed circuit board if routed with care), he also exploited the cylindrical design of his computers to minimize the length of his wiring.

One thing Cray didn't believe in, strangely enough, was memory management units and similar technologies like paging or segmentation. The hardware took too long to function for the performance levels he was working towards. As a result, a write to a bogus pointer could take down the entire supercomputer.

In practice, this wasn't such an issue, since most supercomputers were devoted to one, and only one, computational task (e.g., simulating nuclear processes, simulating the weather, airflow over a wing, etc), which often took weeks.


Posted: Tue Jan 26, 2010 9:42 am

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
Quote:
IBM System/370-158 ran at 1 MIPS

I learnt to program on an ICL clone (a System 4/30) of the IBM/360 as a kid and the opcodes are still burnt into my brain.

Although these machines execute comparatively slowly they have extremely complex instructions that operate on multi-byte values. For example the 360 has BCD arithmetic that can operate on two values each 1 to 16 bytes long (31 digits + a sign). A further instruction ('translate and edit under mask') can take a BCD result and convert it into a printable string with commas, decimal points, leading zero suppression and floating currency characters. A task that would take hundreds of 6502 instructions to replicate.
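
To give a feel for how much work that one instruction hides, here is a small Python sketch of the same job: unpack a packed-decimal value and format it with commas, a decimal point and a floating currency sign. It only illustrates the result, not the 360's actual edit-pattern mechanism; `unpack_bcd` and `edit` are made-up names, and the two implied decimal places are an assumption.

```python
def unpack_bcd(packed):
    """Packed decimal: two digits per byte, with the final nibble as the sign
    (0xB or 0xD = negative; anything else treated as positive)."""
    nibbles = []
    for byte in packed:
        nibbles += [byte >> 4, byte & 0x0F]
    sign = nibbles.pop()
    value = 0
    for digit in nibbles:
        value = value * 10 + digit
    return -value if sign in (0x0B, 0x0D) else value

def edit(packed, decimals=2, currency="$"):
    """Format with thousands separators, suppressed leading zeros and a
    currency sign placed directly before the first digit."""
    value = unpack_bcd(packed)
    whole, frac = divmod(abs(value), 10 ** decimals)
    text = f"{currency}{whole:,}.{frac:0{decimals}d}"
    return "-" + text if value < 0 else text

print(edit(bytes.fromhex("0123456C")))  # prints $1,234.56
```

Every loop iteration here is work the mainframe does inside a single instruction, while a 6502 has to spell it out a nibble at a time.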

This ability to perform complex operations by iterating in the microcode rather than in the application code gives the impression that they are executing far faster than they really are.

I remember the I/O being very DMA based. All devices were identified by a channel nybble and a device nybble. Our machine could execute a DMA transfer of up to 256 bytes to each channel simultaneously. We never used this capability fully - when a local bank donated the machine to my school they didn't train anyone to use the job control language that was used to run it. We ended up writing our own single user mainframe operating system. On an IBM 360/70/90 the I/O instructions are privileged and can only be used by the operating system - we used them all over our code.

The 4/30 was the lowest machine in the range and didn't have the full set of instructions - instructions were physically implemented by cards in the back of the CPU cabinet. We had a lot of spaces where bigger machines in the series, like 4/70, would have cards. The CPU trapped illegal instructions and I wrote code to emulate most of the ones we lacked including the brain damaged base 16 IBM floating point operations.
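
For reference, the base 16 format in question packs a sign bit, a 7-bit excess-64 exponent (a power of 16, not 2) and a 24-bit fraction into one word. A short Python sketch of decoding the single-precision form follows; `ibm32_to_float` is just an illustrative name, and this is a decoder only, not the trap-and-emulate code described above.

```python
def ibm32_to_float(word):
    """Decode a 32-bit IBM hexadecimal floating point value.

    Layout: 1 sign bit, 7-bit excess-64 exponent (power of 16),
    24-bit fraction with the radix point to its left.
    """
    sign = -1.0 if word & 0x80000000 else 1.0
    exponent = (word >> 24) & 0x7F
    fraction = (word & 0x00FFFFFF) / float(1 << 24)
    return sign * fraction * 16.0 ** (exponent - 64)

print(ibm32_to_float(0x41100000))  # prints 1.0
print(ibm32_to_float(0xC276A000))  # prints -118.625
```

Because the exponent steps in powers of 16, a normalized fraction can carry up to three leading zero bits, so the effective precision wobbles from value to value.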

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


Posted: Tue Jan 26, 2010 9:54 am

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
ElEctric_EyE wrote:
I never had an experience with a Cray, but I do remember from years ago he was making his computers using wirewrap. I can still picture those wires running all over.... Gotta be an inductive nightmare. But I guess he mastered parallel processing at lower speeds back then.

The Cray I and II are really relatively simple 16-bit CPUs with very fast multi-stage pipelined floating point units - like an 80387 on steroids. The operating system can multi-task but it was normally avoided because of the time taken to drain and refill the pipelines.

I worked for a firm that wrote geological oil reservoir simulators in the late 80's. We had a version of our code for the Cray 2 where all the FORTRAN for the main calculation was arranged to be easily converted to pipelineable operations. When I moved to investment banking in the 90's, Crays were popular on Wall St for overnight valuation of mortgage backed securities, essentially another long running simulation task.

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


Posted: Tue Jan 26, 2010 6:17 pm

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
BitWise wrote:
A further instruction ('translate and edit under mask') can take a BCD result and convert it into a printable string with commas, decimal points, leading zero suppression and floating currency characters. A task that would take hundreds of 6502 instructions to replicate.


You can regain this performance with I/O devices tailored to the task. For example, write a BCD string to a set of input registers, and read out the translated equivalent from output registers. With a programmable DMA channel, you can even vectorize this process.


Posted: Tue Jan 26, 2010 8:18 pm

Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214
kc5tja wrote:
...
* VAX 11/780 (one of my personal favorite computers to work with, BTW) ran at 1 MIPS.
...
All this, on a CPU that is, in essence, a 1MHz CPU.

**WOW**.

The key, of course, is two-fold -- you need an operating system that is designed for real-time processing, which VMS certainly was. Absolutely no I/O API was blocking at the kernel level, and any blocking APIs you might have used were always implemented as user-level libraries that abstracted the QIO interface on your behalf. Additionally, the underlying hardware was virtually entirely built around DMA.

I should point out that the I/O architecture on "real Mainframes" like the System/3*0 series is, pretty much, indistinguishable from contemporary buses like USB or (especially) Firewire -- DMA driven, high throughput, heavily reliant on interrupts, heavy reliance on CRCs or, in some cases, even ECC codes, etc.

So, this got me thinking about the 6502 and 65816. Here are processors that are capable of keeping up with these otherwise awesome behemoths. Due to the 8-bit bus width, you'll need to drive the clock speed correspondingly higher. A 4MHz 6502 _should_ be able to compare, more or less, in the same ballpark as a VAX 11/780 or S/370-158 in terms of raw processing power (single-threaded). A 3MHz 65816 should be able to compare as well (multitasking).
...


The VAX 11/780 runs at closer to 500 KIPS IIRC.

I don't think a 4 MHz 6502 will compare against a 500 KIPS VAX 11/780. You are comparing 6502 instructions one-to-one against VAX instructions, which is a serious mistake.

The VAX 11/780 has 32-bit registers; the 6502 has 8-bit registers. A 32-bit register load on a VAX is one instruction; on a 6502 it would require at least eight, and if the 11/780 is using a complex addressing mode, it would require even more. So it's probably more reasonable to assume you'd need at least a 12 MHz 6502 or 65816 to match a VAX 11/780.

Also, the 6502 would use considerably more bandwidth performing an equivalent amount of work because the instruction fetch bandwidth would be much higher, due to the larger number of instructions being fetched, even though each instruction is individually smaller.
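
A rough way to see where a figure like 12 MHz comes from, as a Python sketch. It assumes the 32-bit "register" lives in zero page, so the load is four LDA/STA pairs (eight instructions), and that a 500 KIPS VAX averages 2 microseconds per instruction. The cycle counts are the standard 6502 ones (3 for zero-page LDA/STA, 4 for absolute LDA); `required_mhz` is a hypothetical helper.

```python
LDA_ZP, STA_ZP = 3, 3  # cycles for zero-page load and store
LDA_ABS = 4            # cycles for absolute-address load

def required_mhz(load_cycles, store_cycles, vax_kips=500, width_bytes=4):
    """Clock needed for a byte-at-a-time 32-bit move to finish in the time
    the VAX spends on one average instruction."""
    cycles = width_bytes * (load_cycles + store_cycles)
    vax_us_per_instruction = 1000.0 / vax_kips  # 500 KIPS -> 2.0 microseconds
    return cycles / vax_us_per_instruction

print(required_mhz(LDA_ZP, STA_ZP))   # prints 12.0
print(required_mhz(LDA_ABS, STA_ZP))  # prints 14.0
```

Under those assumptions the zero-page case matches the 12 MHz estimate, and anything with fancier addressing pushes the number higher.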

Toshi

