6502.org Forum

All times are UTC




Posted: Fri Oct 23, 2020 3:05 pm
Joined: Thu Apr 11, 2019 7:22 am
Posts: 40
This is more of a question for the "anycpu" forum, but I figured I might get better replies here.

I came across this page by chance, http://daveshacks.blogspot.com/2015/12/inside-alu-of-armv1-first-arm.html, on the reverse engineering of the ARM1 processor, and I was confused by the fact that the ALU circuit does not seem to use any kind of carry lookahead. According to the article, it is made of 1-bit slices that are simply chained together, with the carry and the zero flag propagated directly from one bit to the next. This is apparently the circuit:

Attachment: alu.PNG
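
To make "1-bit slices chained together" concrete, here is a minimal Python sketch of a 32-bit ripple-carry adder that also passes a zero flag from bit to bit. It models addition only and does not reproduce the gate-level circuit in the article; the names `full_adder_slice` and `ripple_add` are made up for this illustration.

```python
WIDTH = 32  # ARM1 word size

def full_adder_slice(a, b, carry_in, zero_in):
    """One hypothetical 1-bit slice: returns (sum_bit, carry_out, zero_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    # The zero flag stays set only while every sum bit so far is 0.
    zero_out = zero_in & (sum_bit ^ 1)
    return sum_bit, carry_out, zero_out

def ripple_add(x, y, carry_in=0):
    """Chain WIDTH slices; carry and zero pass from each bit to the next."""
    result, carry, zero = 0, carry_in, 1
    for i in range(WIDTH):
        s, carry, zero = full_adder_slice((x >> i) & 1, (y >> i) & 1, carry, zero)
        result |= s << i
    return result, carry, zero
```

The loop is the point: bit 31 cannot settle until the carry has travelled through all 31 slices below it, which is why the chain length sets the worst-case delay.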


So, I don't really understand why this does not cause an unacceptable delay in the design. My only guess is that, since this was a 3-stage pipelined processor, the fetch and decode stages may already have taken longer, so the ALU was in fact not critical (?)

Any ideas?


Last edited by joanlluch on Sun Oct 25, 2020 8:19 am, edited 1 time in total.

Post subject: Re: ARM1 ALU
Posted: Fri Oct 23, 2020 3:45 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
The ARM1 did not have caches, so most likely the off-chip memory access is the long pole in the tent.


Post subject: Re: ARM1 ALU
Posted: Fri Oct 23, 2020 3:52 pm
Joined: Thu Apr 11, 2019 7:22 am
Posts: 40
Arlet wrote:
The ARM1 did not have caches, so most likely the off-chip memory access is the long pole in the tent.
So, does this imply that, in effect, they didn't really care about the ultimate performance of the ALU?


Post subject: Re: ARM1 ALU
Posted: Fri Oct 23, 2020 4:07 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Indeed. As long as the ALU is not the slowest part of the design, it doesn't need to be faster.


Post subject: Re: ARM1 ALU
Posted: Fri Oct 23, 2020 4:15 pm
Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
In practice, ripple carry is still useful for small groups of bits within an overall carry-lookahead design. The lookahead circuitry might make use of the extra Zero signal that the ALU generates.
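
A minimal Python sketch of that mixed arrangement, assuming 4-bit groups: the sum ripples inside each group, while the carry between groups comes from per-group generate/propagate signals. The function names and the group size are illustrative assumptions, not taken from any ARM design, and Python's `+` stands in for the short ripple chain inside a group.

```python
WIDTH, GROUP = 32, 4  # GROUP size is an assumption for illustration

def group_signals(a, b):
    """Generate/propagate for one GROUP-bit block (a, b are GROUP-bit ints)."""
    generate, propagate = 0, 1
    for i in range(GROUP):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, p = ai & bi, ai ^ bi
        generate = g | (p & generate)  # block produces a carry on its own
        propagate &= p                 # block passes a carry-in straight through
    return generate, propagate

def add_with_group_lookahead(x, y, carry_in=0):
    mask = (1 << GROUP) - 1
    result, carry = 0, carry_in
    for k in range(0, WIDTH, GROUP):
        a, b = (x >> k) & mask, (y >> k) & mask
        g, p = group_signals(a, b)
        # Inside the block the sum still ripples from the block's carry-in.
        result |= ((a + b + carry) & mask) << k
        # The next block's carry-in is derived from G/P, not from the ripple chain.
        carry = g | (p & carry)
    return result, carry
```

In hardware the group carries would be evaluated by dedicated lookahead logic rather than in a sequential loop; the loop here only shows that `G | (P & carry_in)` yields the same carries as a full ripple.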

I notice that the polarity of the Carry output is different from that of the input - so there is definitely something between bit slices, even if it's just an inverter.


Post subject: Re: ARM1 ALU
Posted: Fri Oct 23, 2020 4:22 pm
Joined: Thu Apr 11, 2019 7:22 am
Posts: 40
Chromatix wrote:
In practice, ripple carry is still useful for small groups of bits within an overall carry-lookahead design. The lookahead circuitry might make use of the extra Zero signal that the ALU generates.

I notice that the polarity of the Carry output is different from that of the input - so there is definitely something between bit slices, even if it's just an inverter.

According to the article "The eagle-eyed will also notice that the Carry propagation and Zero calculation circuits alternates slightly between each bit, with b0, b2, etc identical, and b1, b3, etc. identical. The end result is the same but the reason for the difference is to keep the execution path as fast as possible by eliminating an inverter per bit; note that the Carry Out and Zero Out signals are opposite polarity to the inputs."
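
One way such an alternating chain can work, sketched in Python under my own assumptions rather than traced from the ARM1 netlist: each slice uses a single inverting gate, so the carry flips polarity at every bit, and the odd slices compensate by working on inverted operands. This relies on the majority function being self-dual. The function names are hypothetical.

```python
def inverting_majority(a, b, c):
    """A single inverting gate: NOT of the majority of three bits."""
    return 1 - ((a & b) | (c & (a | b)))

def carry_chain_alternating(x, y, carry_in=0, width=32):
    """Return the carry out of a width-bit add using one inverting gate per bit."""
    signal = carry_in  # true polarity entering bit 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        if i % 2 == 0:
            # Even slice: true carry in, inverted carry out.
            signal = inverting_majority(a, b, signal)
        else:
            # Odd slice: inverted carry in, true carry out. Majority is
            # self-dual, so feeding inverted operands undoes the inversion.
            signal = inverting_majority(1 - a, 1 - b, signal)
    # After an even number of slices the signal is back in true polarity.
    return signal if width % 2 == 0 else 1 - signal
```

Compared with a chain that restores polarity at every bit, this saves one inverter delay per slice, which matches the motivation given in the quoted passage.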


Post subject: Re: ARM1 ALU
Posted: Fri Oct 23, 2020 4:37 pm
Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
I see. Well there's only two gates per bit in the carry chain, so that's a total of 64 gate delays across the 32-bit word. That seems acceptable if the gate delay is only about 1ns.
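
The arithmetic behind that estimate, as a small Python sketch. The 1 ns gate delay is the assumed figure from the post, not a measured ARM1 value, and `ripple_chain_delay_ns` is a name made up for this example.

```python
def ripple_chain_delay_ns(bits=32, gates_per_bit=2, gate_delay_ns=1.0):
    """Worst-case carry propagation time through a ripple chain."""
    return bits * gates_per_bit * gate_delay_ns

two_gate = ripple_chain_delay_ns()                 # two gates per bit
one_gate = ripple_chain_delay_ns(gates_per_bit=1)  # hypothetical one gate per bit

print(f"2 gates/bit: {two_gate} ns, 1 gate/bit: {one_gate} ns")
```

With these assumptions the worst case is 64 ns, or 32 ns if each bit costs only one gate delay. Whether that matters depends on how it compares with the rest of the cycle, such as the memory access time.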


Post subject: Re: ARM1 ALU
Posted: Fri Oct 23, 2020 4:44 pm
Joined: Thu Apr 11, 2019 7:22 am
Posts: 40
Chromatix wrote:
I see. Well there's only two gates per bit in the carry chain, so that's a total of 64 gate delays across the 32-bit word. That seems acceptable if the gate delay is only about 1ns.
Well, having two gates per bit for the carry chain is always the case, so I guess that's not the difference with other designs. It most likely implies that there was no need for a faster ALU in that design.


Post subject: Re: ARM1 ALU
Posted: Fri Oct 23, 2020 6:39 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
It is quite surprising that ARM1 has a ripple carry. Even more surprising, the 32 bit barrel shifter executes in the same cycle, slowing down one of the inputs to the ALU. And as far as I can tell (BUT NO!), ARM2 and then ARM3 haven't put in any faster an adder, even though the target clock speed goes up and in the case of ARM3 there's an on-chip cache. (But I may be missing something, and I believe at some point the barrel shifts cost an extra cycle on execution.)

Edit: this presentation says ARM2 has a 4-bit carry lookahead structure.
Edit: and also says ARM1 use of a complex gate means only one gate delay per bit.

It's certainly about balance though: if the CPU and external RAM speeds are linked, and the RAM speed is limiting, the CPU need not be faster and probably shouldn't be, otherwise someone spent some effort and maybe some area or power needlessly.

Some interesting reading if you can find these papers:

Furber, S., & Thomas, A. (1990). ARM3 — a study in design for compatibility. Microprocessors and Microsystems, 14(6), 407–415. doi:10.1016/0141-9331(90)90113-a

Furber, S. (1988). The advantages of RISC architectures. Computer Standards & Interfaces, 8(1), 29–35. doi:10.1016/0920-5489(88)90073-6


Post subject: Re: ARM1 ALU
Posted: Fri Oct 23, 2020 6:54 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Maybe another factor is that the ARM1 was designed by a small and inexperienced team. Looking at a picture of the chip, the ALU takes up a nice regular rectangular space, lining up very well with the register file and barrel shifter. Adding irregular structures, investing more time, and adding risk of introducing a mistake, was probably not worth it.


Post subject: Re: ARM1 ALU
Posted: Fri Oct 23, 2020 7:16 pm
Joined: Thu Apr 11, 2019 7:22 am
Posts: 40
BigEd wrote:
It is quite surprising that ARM1 has a ripple carry. Even more surprising, the 32 bit barrel shifter executes in the same cycle, slowing down one of the inputs to the ALU. And as far as I can tell (BUT NO!), ARM2 and then ARM3 haven't put in any faster an adder, even though the target clock speed goes up and in the case of ARM3 there's an on-chip cache. (But I may be missing something, and I believe at some point the barrel shifts cost an extra cycle on execution.)
I had a similar reaction when I found that the ARM1 had a ripple carry; I certainly did not expect it. The barrel shifter does not cost an extra cycle: shifting one operand is an embedded feature of many instructions, and these instructions definitely do not take longer.

BigEd wrote:
Edit: this presentation says ARM2 has a 4-bit carry lookahead structure.
Edit: and also says ARM1 use of a complex gate means only one gate delay per bit.

Thanks for that. It's quite interesting to learn that:

- the ARM1 had a ripple carry (page 21)
- the ARM2 had a 4-bit carry lookahead (from page 23)
- the ARM6 had a carry-select adder (from page 28)
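
For readers unfamiliar with the last of these, here is a minimal Python sketch of the carry-select idea: each block computes its sum for both possible carry-in values up front, and the real carry only selects between them. The 8-bit block size and the function name are assumptions for illustration and are not taken from the ARM6 design; `width` is assumed to be a multiple of `block`.

```python
def carry_select_add(x, y, carry_in=0, width=32, block=8):
    """Add two width-bit values block by block, selecting precomputed sums."""
    mask = (1 << block) - 1
    result, carry = 0, carry_in
    for k in range(0, width, block):
        a, b = (x >> k) & mask, (y >> k) & mask
        # Both candidate sums exist before the block's carry-in is known.
        sum0, sum1 = a + b, a + b + 1
        chosen = sum1 if carry else sum0  # the real carry only drives a mux
        result |= (chosen & mask) << k
        carry = chosen >> block
    return result, carry
```

The cost is duplicated adder hardware per block; the gain is that the carry path between blocks is a multiplexer select rather than a full ripple through the block.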


Post subject: Re: ARM1 ALU
Posted: Fri Oct 23, 2020 7:20 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Anyone interested in digging further into ARM1 layout/circuit, send me a PM! (There's a version of the visualARM which is a bit more like the visual6502 than the public version. Some JavaScript skills would be useful, for finding and highlighting signals.)


Attachment: Screen Shot 2020-10-23 at 20.17.36.png
Post subject: Re: ARM1 ALU
Posted: Fri Oct 23, 2020 7:46 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
That rather good technical presentation is one of five by Przemyslaw Bakowski - grab a copy while you can:


There's also a large and technical book by Steve Furber, "ARM System-on-Chip Architecture", which covers the ALU structure in the various ARMs, along with a great deal more.


Attachments: ARM-datapath.png, ARM1-ripple.png
Post subject: Re: ARM1 ALU
Posted: Sat Oct 24, 2020 9:07 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Another interesting read on the innards of ARM1, by Ken Shirriff:
http://www.righto.com/2015/12/reverse-e ... or-of.html

