Source code can be processed like plain text, but with a parser for the programming language, source code can be queried much more easily. LAPIS includes a Java parser, so the examples that follow are in Java.
Unlike other systems for querying and processing source code, TC operates on regions in the source text, not on an abstract syntax tree. At the text level, the user can get substantial mileage from knowing only a few general types of regions identified by the parser, such as Statement, Comment, Expression, and Method, and using text constraints to specialize them. For example, our parser identifies Comment regions but does not distinguish the "documentation comments" that the javadoc utility can extract automatically. Figure 8 shows a Java method preceded by a documentation comment.
/**
 * Convert a local filename to a URL.
 * For example, if the filename is "C:\FOO\BAR\BAZ",
 * the resulting URL is "file:/C:/FOO/BAR/BAZ".
 * @param file File to convert
 * @return URL corresponding to file
 */
public static URL FileToURL (File file) throws MalformedURLException {
    return new URL ("file:" + toURLDelimiters (file.getAbsolutePath ()));
}
Figure 8: A Java method with a documentation comment.
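Documentation comments are distinguished by their opening delimiter "/**", so a text constraint can pick them out of the parser's Comment regions:
DocComment = Comment starts with "/**";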
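To make the region view concrete, here is a minimal Java sketch of the DocComment constraint. It is not LAPIS code: regions are modeled as half-open [start, end) character offsets into the source string, and the hypothetical blockComments scanner is a crude stand-in for the parser's Comment regions (it only finds /* ... */ comments and ignores string literals and // comments). DocComment is then just a filter over those regions.

```java
import java.util.ArrayList;
import java.util.List;

class DocComments {
    // A region is a half-open span [start, end) of character offsets in the source text.
    record Region(int start, int end) {
        String text(String source) { return source.substring(start, end); }
    }

    // Hypothetical stand-in for the parser's Comment regions: finds /* ... */ block
    // comments only, without handling string literals or // line comments.
    static List<Region> blockComments(String source) {
        List<Region> result = new ArrayList<>();
        int from = 0;
        while (true) {
            int open = source.indexOf("/*", from);
            if (open < 0) break;
            int close = source.indexOf("*/", open + 2);
            if (close < 0) break;                  // unterminated comment: stop scanning
            result.add(new Region(open, close + 2));
            from = close + 2;
        }
        return result;
    }

    // DocComment = Comment starts with "/**";
    static List<Region> docComments(String source) {
        List<Region> result = new ArrayList<>();
        for (Region c : blockComments(source)) {
            if (c.text(source).startsWith("/**")) result.add(c);
        }
        return result;
    }

    public static void main(String[] args) {
        String src = "/* plain */ int x;\n/** Documented. */ void f() {}";
        for (Region r : docComments(src)) {
            System.out.println(r.text(src));
        }
    }
}
```

The point of the design is that DocComment needs no new parser support: it reuses the generic Comment regions and adds one textual condition.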
A similar technique can distinguish public methods from non-public ones:
PublicMethod = Method starts with "public";
In this case, however, the accuracy of the pattern depends on programmer convention, since attributes like public may appear in any order in a method declaration, not necessarily first. All of the following method declarations are equivalent in Java:
public static synchronized void f ()
static public synchronized void f ()
synchronized static public void f ()
If necessary, the user can deal with this problem by adjusting the pattern (e.g., Method starts with Line contains "public") or by relying on the Java parser to identify attribute regions (e.g., Method contains Attribute contains "public"). In practice, however, it is often more convenient to rely on typographic conventions, such as public always appearing first, than to modify the parser for every contingency. Since text constraints can express such conventions, they can also be used to enforce them, if desired.
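The trade-off can be sketched in Java on the three equivalent declarations above. This is an illustration, not how LAPIS implements either pattern: startsWithPublic mirrors "Method starts with "public"", while hasPublicModifier approximates "Method contains Attribute contains "public"" by scanning whitespace-separated modifier tokens in any order. The modifier set is Java's method modifiers; the class and method names are hypothetical, and annotations are only skipped when they contain no spaces. violatesConvention shows how the same two checks could enforce the "public first" convention.

```java
import java.util.List;
import java.util.Set;

class PublicModifier {
    // Java's method modifiers; anything else ends the modifier list.
    private static final Set<String> MODIFIERS = Set.of(
        "public", "protected", "private", "abstract", "static", "final",
        "synchronized", "native", "strictfp", "default");

    // PublicMethod = Method starts with "public";  (depends on the "public first" convention)
    static boolean startsWithPublic(String declaration) {
        return declaration.stripLeading().startsWith("public");
    }

    // Approximates "Method contains Attribute contains "public"": checks every modifier
    // token before the return type, in whatever order the programmer wrote them.
    static boolean hasPublicModifier(String declaration) {
        for (String token : declaration.trim().split("\\s+")) {
            if (token.startsWith("@")) continue;      // skip simple annotations such as @Override
            if (!MODIFIERS.contains(token)) break;    // first non-modifier token ends the list
            if (token.equals("public")) return true;
        }
        return false;
    }

    // Enforcing the convention: public methods whose declaration does not start with "public".
    static boolean violatesConvention(String declaration) {
        return hasPublicModifier(declaration) && !startsWithPublic(declaration);
    }

    public static void main(String[] args) {
        List<String> declarations = List.of(
            "public static synchronized void f ()",
            "static public synchronized void f ()",
            "synchronized static public void f ()");
        for (String d : declarations) {
            System.out.printf("%-40s startsWith=%b modifier=%b violates=%b%n",
                d, startsWithPublic(d), hasPublicModifier(d), violatesConvention(d));
        }
    }
}
```

Only the first declaration satisfies the convention-based pattern, while the modifier scan accepts all three, which is exactly the gap the convention check reports.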
We can use DocComment and PublicMethod to find public methods that need documentation:
PublicMethod but not just after DocComment;
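A minimal Java sketch of this query follows, assuming that "B just after A" holds when only whitespace separates the end of A from the start of B; that reading of the operator is an assumption here, not a statement of LAPIS semantics. Regions are again [start, end) offsets, and the hypothetical regionOf helper builds them by hand where a parser would normally supply them.

```java
import java.util.ArrayList;
import java.util.List;

class UndocumentedMethods {
    record Region(int start, int end) {}

    // Assumed semantics: b is just after a if only whitespace lies between a's end and b's start.
    static boolean justAfter(String source, Region b, Region a) {
        if (a.end() > b.start()) return false;
        return source.substring(a.end(), b.start()).isBlank();
    }

    // PublicMethod but not just after DocComment;
    static List<Region> undocumented(String source, List<Region> publicMethods,
                                     List<Region> docComments) {
        List<Region> result = new ArrayList<>();
        for (Region m : publicMethods) {
            boolean documented = false;
            for (Region d : docComments) {
                if (justAfter(source, m, d)) { documented = true; break; }
            }
            if (!documented) result.add(m);
        }
        return result;
    }

    // Hypothetical helper: the region covering the first occurrence of a snippet.
    static Region regionOf(String source, String snippet) {
        int i = source.indexOf(snippet);
        return new Region(i, i + snippet.length());
    }

    public static void main(String[] args) {
        String src = "/** Adds. */\npublic int add(int a, int b) { return a + b; }\n"
                   + "public int sub(int a, int b) { return a - b; }\n";
        List<Region> methods = List.of(
            regionOf(src, "public int add(int a, int b) { return a + b; }"),
            regionOf(src, "public int sub(int a, int b) { return a - b; }"));
        List<Region> docs = List.of(regionOf(src, "/** Adds. */"));
        for (Region m : undocumented(src, methods, docs)) {
            System.out.println(src.substring(m.start(), m.end()));
        }
    }
}
```

Here sub is reported: the text between the doc comment and sub contains the whole add method, so sub is not just after any DocComment.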
Text constraints are also useful for defining custom structure inside source code. Java documentation comments can include various kinds of fields, such as @param, @return, and @exception, which can be defined as follows:
DocField = starts with delimiter "@", in DocComment;
ParamDoc = DocField, starts with "@param";
ReturnDoc = DocField, starts with "@return";
ExceptionDoc = DocField, starts with "@exception";
Using this structure, we can find methods whose documentation is incomplete in various ways. For example, the following expression finds public methods that take parameters but whose documentation comment has no @param field:
PublicMethod contains FormalParameter, just after (DocComment but not contains ParamDoc);
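The field definitions and the final query can be approximated in Java as follows. This is a sketch under stated assumptions, not LAPIS's implementation: following the javadoc convention for block tags, a DocField is assumed to begin on a line whose first character after leading whitespace and asterisks is "@" and to run until the next such line; "contains FormalParameter" is approximated by a non-empty parenthesized parameter list; and the caller is assumed to have already paired each method with the doc comment just before it. ParamDocCheck and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class ParamDocCheck {
    // DocField = starts with delimiter "@", in DocComment;
    static List<String> docFields(String docComment) {
        List<String> fields = new ArrayList<>();
        if (docComment.length() < 5 || !docComment.startsWith("/**") || !docComment.endsWith("*/")) {
            return fields;
        }
        String body = docComment.substring(3, docComment.length() - 2); // drop "/**" and "*/"
        StringBuilder current = null;
        for (String line : body.split("\n")) {
            // Remove the leading whitespace and asterisks that decorate comment lines.
            String text = line.replaceFirst("^\\s*\\*+", "").trim();
            if (text.startsWith("@")) {
                if (current != null) fields.add(current.toString());
                current = new StringBuilder(text);          // a new field starts at "@"
            } else if (current != null && !text.isEmpty()) {
                current.append(' ').append(text);           // continuation of the current field
            }
        }
        if (current != null) fields.add(current.toString());
        return fields;
    }

    // ParamDoc = DocField, starts with "@param";
    static boolean hasParamDoc(String docComment) {
        for (String field : docFields(docComment)) {
            if (field.startsWith("@param")) return true;
        }
        return false;
    }

    // Stand-in for "contains FormalParameter": non-blank text between "(" and ")".
    static boolean hasFormalParameter(String methodDeclaration) {
        int open = methodDeclaration.indexOf('(');
        int close = methodDeclaration.indexOf(')', open + 1);
        return open >= 0 && close > open
            && !methodDeclaration.substring(open + 1, close).isBlank();
    }

    // PublicMethod contains FormalParameter, just after (DocComment but not contains ParamDoc);
    static boolean missingParamDoc(String docComment, String methodDeclaration) {
        return hasFormalParameter(methodDeclaration) && !hasParamDoc(docComment);
    }

    public static void main(String[] args) {
        String partial = "/** Adds two numbers.\n * @return the sum\n */";
        System.out.println(docFields(partial));
        System.out.println(missingParamDoc(partial, "public int add(int a, int b)"));
        System.out.println(missingParamDoc(partial, "public void reset()"));
    }
}
```

ReturnDoc and ExceptionDoc would be analogous filters over docFields, so other kinds of incomplete documentation, such as a missing @return, can be checked the same way.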