This section describes our experience applying our system to portions of two PLAPACK applications: a Cholesky factorization program and a code for solving Lyapunov equations [4].
For these experiments, our compiler performs all analysis automatically.
Except for inlining, we perform the transformations manually according to the
strategy described in Section . While our compiler is not
yet complete, the individual transformations are all well understood. Since
the analysis and the overall compilation strategy are the enabling
technologies behind these results, our manual transformations should not
affect the results. The PLAPACK annotations were written by a person
who is not a member of the PLAPACK implementation team.
For purposes of comparison, the baseline programs were supplied by the
PLAPACK group and written using the cleanest PLAPACK interface. The
hand-optimized programs were written by PLAPACK experts. All results were
obtained on a 40-node Cray T3E.
To gather these results, we annotated 29 of PLAPACK's 113 externally visible routines, yielding a 323-line annotation file. Our Broadway-optimized results focus on customizing a single PLAPACK routine, PLA_Trsm(), which is common to both the Cholesky and Lyapunov applications. The hand-optimized Lyapunov program was not limited to this scope. Details of the hand-optimized Cholesky program can be found in the literature [3].
Our annotations mimicked the hand optimizations by defining an abstract
interpretation for describing the distribution of PLAPACK objects, leading to
optimizations like those described in Section . (Unlike the
example in Figure , we did not define the Contents property.)
The basic idea is that while most PLAPACK
procedures are designed to accept any type of view, the actual parameters
often have special distributions. When this information is propagated into
the procedure, it yields a variety of specialization opportunities.
Uncovering these opportunities requires the compiler to analyze multiple
layers of nested procedure calls. It is the encapsulation of these layered
routines that makes the unoptimized routines both general and inefficient.
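
The idea in the preceding paragraph can be illustrated with a minimal sketch. It is not the Broadway annotation language and not PLAPACK code: the property values (`UNKNOWN`, `LOCAL`, `DISTRIBUTED`), the routine names (`solve`, `update`, `copy`) and the specialized variants are all hypothetical, chosen only to show how a distribution fact known at an outer call site might be carried through several layers of nested calls and used to select a cheaper variant at each layer.

```python
from enum import Enum


class Dist(Enum):
    """Hypothetical abstract values for a view's distribution (not PLAPACK's own names)."""
    UNKNOWN = "unknown"          # nothing known: only the general routine is safe
    LOCAL = "local"              # assumed: the view resides on a single processor
    DISTRIBUTED = "distributed"  # assumed: the view spans several processors


def merge(a, b):
    """Combine facts from two call sites; disagreement loses the information."""
    return a if a is b else Dist.UNKNOWN


# Hypothetical layered library: each routine forwards its view to the next layer.
CALLS = {"solve": ["update"], "update": ["copy"], "copy": []}

# Hypothetical specialized variants, valid only for the stated distribution.
SPECIALIZED = {
    ("update", Dist.LOCAL): "update_local",
    ("copy", Dist.LOCAL): "copy_local",
}


def specialize(routine, dist):
    """Propagate `dist` down the nested calls and pick a variant at each layer."""
    name = SPECIALIZED.get((routine, dist), routine)
    return (name, [specialize(callee, dist) for callee in CALLS[routine]])
```

In this sketch, `specialize("solve", Dist.LOCAL)` selects the local variants in both inner layers, whereas `specialize("solve", Dist.UNKNOWN)` leaves every layer on its general routine. The `merge` function shows why facts must agree across call sites: two callers passing differently distributed views leave only the general routine available.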