STL-provided algorithms and typical user-written algorithms use one or more iterators as arguments and traverse, build, and update containers using these iterators. In providing parallel implementations of these algorithms it would be useful to design the implementations in such a way that different strategies of work distribution can be made a part of the algorithm implementation. In the sequential world, STL reduces N*M*K possible implementations for N data types, M containers and K algorithms to N+M+K implementations. In the parallel world, given P strategies of work distribution, we would like to extend STL so that N*M*K*P implementations are reduced to N+M+K+P implementations. This can be done with the support of per-thread parallel iterators and strategy classes discussed below.
Data parallel versions of sequential algorithms have a group of threads participating in the algorithm. The parallel implementation of a sequential algorithm is a template function with one template parameter in addition to its sequential counterpart. This parameter represents the Strategy class. The parallel algorithm function accepts an extra parameter as compared to the sequential counterpart. This parameter is the strategy object which is an instance of the Strategy class. In the body of the parallel algorithm, the strategy object is used to convert each of the sequential iterators to per-thread parallel iterators. The per-thread parallel iterator traverses the container in such a way that it touches those parts of the container for which the thread that this iterator corresponds to is responsible. When each thread has computed a partial result the results are composed through a reduction operation.(See figure 4 and compare with figure 3). (Note that all STL algorithms cannot be written exactly this way. Mainly because the composition of the partial results may not be an associative or commutative operation for a particular style of work distribution.)
Figure 3: Typical forms of
sequential algorithms in STL style: The sequential
algorithm takes in some iterators (the one in the figure takes 3) and
other input parameters,
traverses the iterators sequentially and produces an output.
Figure 4: In a typical
parallel algorithm strategy objects are used to convert
the sequential iterators to per-thread parallel iterators which are
used in the partial algorithms by each thread. The results computed by
each of the threads in the partial algorithms are composed to produce
the final output.