Scalable hardware-algorithms for binary prefix sums
We address the problem of designing efficient and scalable hardware-algorithms for computing the sum and prefix sums of a w^k-bit sequence (k ≥ 2), using as basic building blocks linear arrays of at most w^2 shift switches, where w is a small power of 2. An immediate consequence of this feature is that in our designs broadcasts are limited to buses of length at most w^2. We adopt a VLSI delay model in which the "length" of a bus is proportional to the number of devices on the bus. We begin by discussing a hardware-algorithm that computes the sum of a w^k-bit binary sequence in the time of 2k - 2 broadcasts, while the corresponding prefix sums can be computed in the time of 3k - 4 broadcasts. Quite remarkably, even though our hardware-algorithm uses only linear arrays of size at most w^2, the total number of broadcasts involved is less than three times the number required by an "ideal" design. We then go on to propose a second hardware-algorithm, operating in pipelined fashion, that computes the sum of a kw^2-bit binary sequence in the time of 3k + ⌈log_w k⌉ - 3 broadcasts. Using this design, the corresponding prefix sums can be computed in the time of 4k + ⌈log_w k⌉ - 5 broadcasts.
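The following is a small software reference model that may help readers check what the hardware computes; it is a sketch under stated assumptions, not the authors' design. It assumes a shift switch cyclically shifts a single unary signal down one row (mod w) when its input bit is 1 and passes it straight through when the bit is 0, so a row of such switches exposes prefix sums modulo w. The function names (`shift_switch_row`, `blocked_prefix_sums`, `broadcasts`, `ceil_log`) are hypothetical. `blocked_prefix_sums` is a generic two-level scan over blocks of bounded length, used only to illustrate composing short arrays; it is not the paper's hardware-algorithm. `broadcasts` simply tabulates the broadcast counts stated in the abstract, assuming the bracket around log_w k denotes the ceiling.

```python
from itertools import accumulate


def shift_switch_row(bits, w):
    """Hypothetical model of a linear array of shift switches.

    A unary signal enters on row 0. Each switch with bit 1 shifts it down
    one row cyclically (mod w); a switch with bit 0 passes it unchanged.
    The row on which the signal leaves switch i is (b_0 + ... + b_i) mod w.
    """
    row, rows = 0, []
    for b in bits:
        row = (row + b) % w
        rows.append(row)
    return rows


def blocked_prefix_sums(bits, block):
    """Generic two-level scan (illustration only, not the paper's method):
    prefix sums inside each block of length `block`, then add the total
    of all earlier blocks."""
    out, offset = [], 0
    for start in range(0, len(bits), block):
        local = list(accumulate(bits[start:start + block]))
        out.extend(offset + x for x in local)
        offset += local[-1]
    return out


def ceil_log(w, k):
    """Smallest integer c with w**c >= k, computed without floating point."""
    c, p = 0, 1
    while p < k:
        p *= w
        c += 1
    return c


def broadcasts(w, k):
    """Broadcast counts as stated in the abstract (ceiling assumed)."""
    lg = ceil_log(w, k)
    return {
        "sum_wk_bits": 2 * k - 2,
        "prefix_wk_bits": 3 * k - 4,
        "sum_kw2_bits_pipelined": 3 * k + lg - 3,
        "prefix_kw2_bits_pipelined": 4 * k + lg - 5,
    }


if __name__ == "__main__":
    w, k = 4, 3
    bits = [(i * i + i) % 3 % 2 for i in range(w ** k)]
    exact = list(accumulate(bits))
    # Arrays never longer than w**2, mirroring the bus-length bound.
    assert blocked_prefix_sums(bits, w ** 2) == exact
    # Each switch row exposes the prefix sums modulo w.
    assert shift_switch_row(bits, w) == [s % w for s in exact]
    print(broadcasts(w, k))
```

With w = 4 and k = 3, the stated formulas give 4 and 5 broadcasts for the sum and prefix sums of a 64-bit sequence, and 7 and 8 for the pipelined design on a 48-bit sequence.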
Lin, R., Nakano, K., Olariu, S., Pinotti, M. C., Schwing, J. L., & Zomaya, A. Y. (2000). Scalable hardware-algorithms for binary prefix sums. IEEE Transactions on Parallel and Distributed Systems, 11(8), 838–850. https://doi.org/10.1109/71.877941
IEEE Transactions on Parallel and Distributed Systems
Copyright © 2000, IEEE