Next: Performance Evaluation Up: An Annotation Language for Libraries Previous: The Broadway Compiler

   
Results with PLAPACK

This section describes our experience applying our system to portions of two PLAPACK applications: a Cholesky factorization program and a code for solving Lyapunov equations [4].

For these experiments, our compiler performs all analysis automatically. Except for inlining, we apply the transformations manually, following the strategy described in Section [*]. Although our compiler is not yet complete, the individual transformations are all well understood. Because the analysis and the overall compilation strategy are the enabling technologies behind these results, performing the transformations by hand should not affect the results. The PLAPACK annotations were written by a person who is not a member of the PLAPACK implementation team. For purposes of comparison, the baseline programs were supplied by the PLAPACK group and written using the cleanest PLAPACK interface. The hand-optimized programs were written by PLAPACK experts. All results were obtained on a 40-node Cray T3E.

To gather these results, we annotated 29 of PLAPACK's 113 externally visible routines, yielding a 323-line annotation file. Our Broadway-optimized results focus on customizing a single PLAPACK routine, PLA_Trsm(), which is common to both the Cholesky and Lyapunov applications. The hand-optimized Lyapunov program was not limited to this scope. Details of the hand-optimized Cholesky program can be found in the literature [3].

Our annotations mimicked the hand optimizations by defining an abstract interpretation that describes the distribution of PLAPACK objects, leading to optimizations like those described in Section [*]. (Unlike the example in Figure [*], we did not define the Contents property.) The basic idea is that although most PLAPACK procedures are designed to accept any type of view, the actual parameters often have special distributions. When this information is propagated into a procedure, it exposes a variety of specialization opportunities. Uncovering these opportunities requires the compiler to analyze multiple layers of nested procedure calls. It is the encapsulation of these layered routines that makes the unoptimized routines both general and inefficient.
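
The idea of propagating a known distribution through layers of general-purpose routines can be illustrated with a minimal sketch. This is not the Broadway compiler or the annotation language. The routine names, the three abstract values, and the dispatch table below are all hypothetical, chosen only to show how a known property lets an analysis resolve run-time tests across nested calls.

```python
# Hypothetical abstract values for a "distribution" property of an object.
LOCAL, DISTRIBUTED, UNKNOWN = "local", "distributed", "unknown"

# Hypothetical layered library: each general routine tests the distribution
# of its argument and dispatches to a more specific routine or kernel.
# None of these names are real PLAPACK routines.
ROUTINES = {
    "solve": {LOCAL: "solve_local", DISTRIBUTED: "solve_parallel"},
    "solve_local": {LOCAL: "kernel_seq", DISTRIBUTED: "gather_then_seq"},
    "solve_parallel": {LOCAL: "kernel_seq", DISTRIBUTED: "kernel_par"},
}


def specialize(routine, dist):
    """Return the chain of calls that remains once `dist` is known.

    If the distribution is UNKNOWN, the run-time test in the outermost
    routine cannot be resolved, so no specialization takes place.
    """
    path = [routine]
    while routine in ROUTINES and dist != UNKNOWN:
        routine = ROUTINES[routine][dist]  # resolve this layer's test
        path.append(routine)
    return path


if __name__ == "__main__":
    print(specialize("solve", LOCAL))
    print(specialize("solve", UNKNOWN))
```

In this toy setting, a call whose argument is known to be local collapses through two layers of dispatch to a single sequential kernel, while a call with an unknown distribution must keep the general entry point and its run-time tests.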



 
Samuel Z. Guyer
1999-08-25