Scalable hardware-algorithms for binary prefix sums
Document Type
Article
Department or Administrative Unit
Computer Science
Publication Date
8-2000
Abstract
We address the problem of designing efficient and scalable hardware-algorithms for computing the sum and prefix sums of a $w^k$-bit ($k \geq 2$) sequence using as basic building blocks linear arrays of at most $w^2$ shift switches, where $w$ is a small power of 2. An immediate consequence of this feature is that in our designs broadcasts are limited to buses of length at most $w^2$. We adopt a VLSI delay model where the "length" of a bus is proportional to the number of devices on the bus. We begin by discussing a hardware-algorithm that computes the sum of a $w^k$-bit binary sequence in the time of $2k-2$ broadcasts, while the corresponding prefix sums can be computed in the time of $3k-4$ broadcasts. Quite remarkably, in spite of the fact that our hardware-algorithm uses only linear arrays of size at most $w^2$, the total number of broadcasts involved is less than three times the number required by an "ideal" design. We then go on to propose a second hardware-algorithm, operating in pipelined fashion, that computes the sum of a $kw^2$-bit binary sequence in the time of $3k+\lceil \log_w k \rceil-3$ broadcasts. Using this design, the corresponding prefix sums can be computed in the time of $4k+\lceil \log_w k \rceil-5$ broadcasts.
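For readers unfamiliar with the problem, the following is a minimal software sketch of what "binary prefix sums" means, together with the broadcast counts the abstract states for the first design. It is not the authors' hardware-algorithm: the shift-switch arrays and bus broadcasts are not modeled, and the function names (`prefix_sums`, `blockwise_prefix_sums`, `first_design_broadcasts`) are hypothetical. The block-wise variant only illustrates the general idea of working on bounded-size pieces (here, blocks no larger than the square of w) and carrying a running count between them.

```python
from itertools import accumulate


def prefix_sums(bits):
    """Reference definition: output i is the number of 1s among bits[0..i]."""
    return list(accumulate(bits))


def blockwise_prefix_sums(bits, block_size):
    """Same result, computed on blocks of at most block_size bits.

    Illustrative assumption only: a software stand-in for processing
    bounded-size pieces, not a model of the hardware design in the paper.
    """
    out = []
    offset = 0  # number of 1s seen in all earlier blocks
    for start in range(0, len(bits), block_size):
        local = list(accumulate(bits[start:start + block_size]))
        out.extend(offset + x for x in local)
        offset += local[-1]
    return out


def first_design_broadcasts(k):
    """Broadcast counts stated in the abstract for a w**k-bit input, k >= 2."""
    if k < 2:
        raise ValueError("the abstract assumes k >= 2")
    return {"sum": 2 * k - 2, "prefix_sums": 3 * k - 4}


if __name__ == "__main__":
    w, k = 4, 2  # hypothetical parameters: w is a small power of 2
    bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]  # w**k = 16 bits
    assert blockwise_prefix_sums(bits, w * w) == prefix_sums(bits)
    print(prefix_sums(bits))
    print(first_design_broadcasts(k))
```

With the counts above, an input of w^3 bits (k = 3) would take 4 broadcasts for the sum and 5 for the prefix sums under the first design.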
Recommended Citation
Lin, R., Nakano, K., Olariu, S., Pinotti, M. C., Schwing, J. L., & Zomaya, A. Y. (2000). Scalable hardware-algorithms for binary prefix sums. IEEE Transactions on Parallel and Distributed Systems, 11(8), 838–850. https://doi.org/10.1109/71.877941
Journal
IEEE Transactions on Parallel and Distributed Systems
Rights
Copyright © 2000, IEEE
Comments
This article was originally published in IEEE Transactions on Parallel and Distributed Systems. The article from the publisher can be found at https://doi.org/10.1109/71.877941.
Due to copyright restrictions, this article is not available for free download from ScholarWorks @ CWU.