Some of you are probably thinking, "Finally, something other than multiplexers!" This design uses the "shift and add" method of multiplying. If you're not sure what that is, go Google it and read up (this page isn't concerned with teaching common algorithms), and if you still have questions, leave them in the comment box below. Anyway, there are four main parts to this circuit: Multiplier Shifter, Multiplicand Shifter, Partial Product Adder, and Clocking & Control. Multiplication is initiated when /MUL0WR goes low and is complete when /MULDNE goes low. The multiplier requires no other signals to complete a multiplication. Although this design was debugged using an 8-bit version of the circuit, it is easily expanded by adding shift registers, adders, gates for zero detection, and delay circuitry in the control section. A testable version of this circuit can be found in the Testable Circuits section.
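For reference, here is a rough software model of the shift-and-add loop. It is only a sketch, not a description of the actual wiring: the function name, the `width` parameter, and the early exit when the remaining multiplier bits are all zero are assumptions (the early exit is one reading of the "zero detection" gates mentioned above).

```python
def shift_add_multiply(multiplicand: int, multiplier: int, width: int = 8):
    """Unsigned shift-and-add multiply. Returns (product, cycles_used).

    Hypothetical model: `width` stands in for the register width, and the
    loop stops early once the multiplier has no 1 bits left (assumed
    behaviour of the zero-detection logic).
    """
    mask = (1 << width) - 1
    multiplicand &= mask
    multiplier &= mask
    product = 0
    cycles = 0
    while multiplier != 0:
        if multiplier & 1:            # low bit set: add the shifted multiplicand
            product += multiplicand
        multiplicand <<= 1            # multiplicand shifter moves left
        multiplier >>= 1              # multiplier shifter moves right
        cycles += 1
    return product, cycles


if __name__ == "__main__":
    print(shift_add_multiply(11, 9))
```

In this model the cycle count never exceeds `width`, which lines up with a maximum of one cycle per multiplier bit.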

Now that you've been excited by seeing something other than multiplexers, we will move on to "optimizing" the circuit.

The 32-bit version of this circuit takes a maximum of 32 cycles to compute a 32-bit product. If this circuit were clocked by a 1 MHz signal instead of being asynchronous, it would require a maximum of 32 µs, or 32,000 ns. Surely this time could be reduced significantly? Yes it can, at a cost of many more ICs, but the circuit will be much simpler and easier to expand upon.

Let's take a look at a circuit that was designed some time ago (as can be seen by the fact that it's only 16 bits wide), the multiplier.


     1011   (11)
   x 1001   (9)
     ----
     1011
    0000
   0000
  1011
  -------
  1100011   (99)

This is what we will implement, using... multiplexers (and a few adders)!

We will test each bit of the multiplier to see if it is a 0 or a 1, and gate either all zeros or the multiplicand through the multiplexers accordingly: all zeros if the bit is a 0, the multiplicand if it is a 1. Since we only need to decide between two pieces of data we will be using 74157 Quad 2-to-1 multiplexers, and thus will need eight to cover an entire 32-bit DWORD. This is done simultaneously for each bit of the multiplier, so we will require a total of 32 * 8 = 256 multiplexers. Now you might say that the propagation delay through 256 muxes will be far greater than 32,000 ns. Well, we aren't doing things in a serial fashion here; we will be using these in parallel.
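To make the gating step concrete, here is a small software sketch of what the multiplexers do. The function name and the `width` parameter are made up for illustration; each list entry stands for the output of one bank of muxes before any shifting is applied.

```python
def partial_products(multiplicand: int, multiplier: int, width: int = 4):
    """Model one bank of 2-to-1 muxes per multiplier bit (hypothetical helper).

    Entry i is the multiplicand if bit i of the multiplier is 1, else 0.
    No shifting happens here; the offsets are applied at the adders.
    """
    mask = (1 << width) - 1
    rows = []
    for bit in range(width):
        select = (multiplier >> bit) & 1      # the mux select line
        rows.append((multiplicand & mask) if select else 0)
    return rows


if __name__ == "__main__":
    rows = partial_products(0b1011, 0b1001)
    for i, row in enumerate(rows):
        print(f"bit {i}: {row:04b}")
    # Adding the rows with their bit offsets gives the product.
    print(sum(row << i for i, row in enumerate(rows)))
```

Every row is computed independently of the others, which is why the delay is that of a single mux rather than all 256 in series.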

Each pair of muxes (a "mux" here meaning the bank of eight ICs that handles one bit of the multiplier) is followed by a 32-bit adder to calculate a partial product. The outputs from the "low order" mux are fed to all 32 inputs of the adder, while the outputs from the "high order" mux are fed to the adder with an offset of 1 bit. This first stage of adding requires eight 74283 4-bit adders per mux pair over 16 mux pairs, for a total of 128. After the first stage we will just be adding pairs of partial products.
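The pairing scheme can be sketched in software as follows. This is a simplified model under stated assumptions: the function name is hypothetical, `width` must be a power of two of at least 2, and Python integers have no fixed width, so the sketch says nothing about how wide the later adder stages need to be or how carries are handled in hardware.

```python
def parallel_multiply(multiplicand: int, multiplier: int, width: int = 32) -> int:
    """Model the mux rows followed by a tree of pairwise adders.

    Assumes `width` is a power of two >= 2 (hypothetical restriction
    that keeps the pairing even at every stage).
    """
    mask = (1 << width) - 1
    a = multiplicand & mask
    # Mux stage: one row per multiplier bit, either the multiplicand or zero.
    rows = [a if (multiplier >> i) & 1 else 0 for i in range(width)]
    # First adder stage: low-order row plus high-order row offset by 1 bit.
    sums = [rows[i] + (rows[i + 1] << 1) for i in range(0, width, 2)]
    # Later stages: add neighbouring sums, doubling the offset each time.
    offset = 2
    while len(sums) > 1:
        sums = [sums[i] + (sums[i + 1] << offset) for i in range(0, len(sums), 2)]
        offset *= 2
    return sums[0]


if __name__ == "__main__":
    print(parallel_multiply(0b1011, 0b1001, width=4))
    # First-stage IC count from the text: 16 mux pairs * 8 adders each.
    print(16 * 8)
```

With 32 rows the tree is five adder stages deep (16, 8, 4, 2, then 1 adder), instead of up to 32 sequential add cycles.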

Below is a 4-bit version so you get the idea of how the circuit works.