Hackaday posted about this remarkable study: a 256-processor supercomputer using hundreds of thousands of TTL devices.
There's a 300-page PDF that goes into great detail about the design.
It uses many small ROMs as well as 74S381 ALU devices, 74283 adders, and 74S182 look-ahead carry generators. It directly supports single-cycle add, subtract and multiply of single-precision floats, and handles double-precision operations as multi-step sequences. So, if I have it right, that's a 24x24-bit multiply in 400 nanoseconds.
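To make the ALU and look-ahead carry pairing a little more concrete, here is a rough behavioral sketch in Python of how 4-bit ALU slices and a look-ahead carry generator cooperate on a 16-bit add. Each slice reports a group generate (G) and group propagate (P) signal, and the look-ahead unit computes every carry between slices in parallel from those, instead of waiting for carries to ripple through. This is only an illustration of the general technique under my own assumptions: the function names are hypothetical, the logic is active-high, the real chips' function-select inputs and pinouts are ignored, and I'm not claiming this is how the study wires its 24-bit datapath.

```python
# Behavioral sketch (assumption-laden): four 4-bit ALU slices in the spirit of
# the 74S381, plus one look-ahead carry unit in the spirit of the 74S182.
# Active-high logic; real pin polarities and function selects are ignored.

def slice_gp(a: int, b: int) -> tuple[int, int]:
    """Group generate/propagate for one 4-bit slice (independent of carry-in)."""
    g_bits = a & b  # bit i generates a carry on its own
    p_bits = a | b  # bit i passes an incoming carry along
    g = 0
    for i in range(4):  # carry out of bit i, assuming carry-in to the slice is 0
        g = ((g_bits >> i) & 1) | (((p_bits >> i) & 1) & g)
    p = 1 if p_bits == 0xF else 0  # every bit propagates
    return g, p


def lookahead(g: list[int], p: list[int], c0: int):
    """Carries into slices 1..3 plus block G/P, all from two-level logic."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    block_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
               | (p[3] & p[2] & p[1] & g[0]))
    block_p = p[3] & p[2] & p[1] & p[0]
    return (c1, c2, c3), block_g, block_p


def add16(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """Add two 16-bit values; return (16-bit sum, carry out)."""
    a_n = [(a >> (4 * i)) & 0xF for i in range(4)]
    b_n = [(b >> (4 * i)) & 0xF for i in range(4)]
    # Phase 1: every slice computes G and P at the same time.
    gp = [slice_gp(x, y) for x, y in zip(a_n, b_n)]
    g = [t[0] for t in gp]
    p = [t[1] for t in gp]
    # Phase 2: the look-ahead unit produces all inter-slice carries at once.
    (c1, c2, c3), block_g, block_p = lookahead(g, p, cin)
    carries = [cin, c1, c2, c3]
    # Phase 3: each slice forms its 4-bit sum using its own carry-in.
    result = 0
    for i in range(4):
        result |= ((a_n[i] + b_n[i] + carries[i]) & 0xF) << (4 * i)
    cout = block_g | (block_p & cin)
    return result, cout
```

Calling `add16(0xFFFF, 1)` returns `(0, 1)`: slice 0 generates and every slice propagates, so the look-ahead unit raises all three inter-slice carries at once. A wider datapath would add more slices and a second level of look-ahead, which is what the block-level G and P outputs are for.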