Plain text has less explicit structure than HTML, so text constraints for plain text typically refer to delimiters like punctuation marks and line breaks. Consider the following example of processing email messages. Several airlines distribute weekly email announcing low-price airfares. An excerpt from one message (from US Airways) is shown in Figure 7.
Roundtrip Fares Departing From BOSTON, MA To
--------------------------------------------------
$109 INDIANAPOLIS, IN
$89 PITTSBURGH, PA

Roundtrip Fares Departing From PHILADELPHIA, PA To
--------------------------------------------------
$79 BUFFALO, NY
$89 CLEVELAND, OH
$89 COLUMBUS, OH
$89 DAYTON, OH
$89 DETROIT, MI
$79 PITTSBURGH, PA
$79 RICHMOND/WMBG., VA
$79 SYRACUSE, NY
Figure 7: Excerpt from an email message announcing cheap airfares.
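Each table in the message can be described by its delimiters, using BlankLine, which is identified by a built-in parser:
Table = starts with delimiter "Roundtrip Fares Departing From", ends with delimiter BlankLine;
The rows of the table can be found using Line, also identified by the built-in parser: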
Flight = Line starts with "\$" in Table;
Fare = Number just after "\$" in Flight;
The origin and destination cities can be described in terms of their boundaries:
Origin = just after delimiter "From", just before delimiter "To", in Line at start of Table;
Destination = just after Fare, in Flight;
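For readers more familiar with conventional scripting, the same boundaries can be approximated with regular expressions. The following Python sketch is not the text-constraints system itself; the function name `parse_tables`, the dictionary layout, and the shortened `MESSAGE` sample are assumptions made for illustration.

```python
import re

# Shortened sample in the layout of Figure 7 (illustrative data only).
MESSAGE = """\
Roundtrip Fares Departing From BOSTON, MA To
--------------------------------------------------
$109 INDIANAPOLIS, IN
$89 PITTSBURGH, PA

Roundtrip Fares Departing From PHILADELPHIA, PA To
--------------------------------------------------
$79 BUFFALO, NY
$89 CLEVELAND, OH
"""

HEADING = "Roundtrip Fares Departing From"


def parse_tables(text):
    """Return one dict per table, with its origin and its flights."""
    tables = []
    # Table: starts with the heading delimiter, ends at a blank line.
    for block in re.split(r"\n\s*\n", text):
        lines = block.strip().splitlines()
        if not lines or not lines[0].startswith(HEADING):
            continue
        # Origin: just after "From", just before "To", in the first line.
        heading = re.search(r"From\s+(.*?)\s+To\s*$", lines[0])
        origin = heading.group(1) if heading else None
        flights = []
        for line in lines[1:]:
            # Flight: a line starting with "$".
            # Fare: the number just after "$".
            # Destination: the text just after the fare.
            row = re.match(r"\$(\d+)\s+(.*\S)", line)
            if row:
                flights.append(
                    {"fare": int(row.group(1)), "destination": row.group(2)}
                )
        tables.append({"origin": origin, "flights": flights})
    return tables


tables = parse_tables(MESSAGE)
print(tables[0]["origin"])      # BOSTON, MA
print(tables[0]["flights"][1])  # {'fare': 89, 'destination': 'PITTSBURGH, PA'}
```

Note how much of the regular-expression version is bookkeeping (splitting blocks, skipping the dashed rule) that the constraint definitions leave implicit.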
Using these definitions, we can readily filter the message for flights of interest, e.g. from Boston to Pittsburgh:
Flight, contains Destination contains "PITTSBURGH", in Table contains Origin contains "BOSTON";
The expression for the flight's origin is somewhat convoluted because flights (which are rows of the table) do not contain the origin as a field, but rather inherit it from the heading of the table. This example demonstrates, however, that useful structure can be described and queried with a small set of relational operators.
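The way a flight inherits its origin from the table heading can also be seen in a hand-written, line-by-line scan. This is a minimal Python sketch under assumed names (`find_flights`, `SAMPLE`), not part of the system described here: the scan remembers the most recent heading and attaches its origin to each flight row that follows, until a blank line ends the table.

```python
HEADING = "Roundtrip Fares Departing From"

# Shortened sample in the layout of Figure 7 (illustrative data only).
SAMPLE = """\
Roundtrip Fares Departing From BOSTON, MA To
--------------------------------------------------
$109 INDIANAPOLIS, IN
$89 PITTSBURGH, PA

Roundtrip Fares Departing From PHILADELPHIA, PA To
--------------------------------------------------
$79 BUFFALO, NY
$79 PITTSBURGH, PA
"""


def find_flights(text, origin_word, destination_word):
    """Return (origin, destination, fare) for each matching flight row."""
    current_origin = None  # origin inherited from the latest heading
    matches = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith(HEADING):
            # Heading row: keep the text between "From" and the final "To".
            origin = line[len(HEADING):].strip()
            if origin.endswith("To"):
                origin = origin[:-2].strip()
            current_origin = origin
        elif not line:
            # A blank line ends the current table.
            current_origin = None
        elif line.startswith("$") and current_origin is not None:
            fare_text, _, destination = line[1:].partition(" ")
            if not fare_text.isdigit():
                continue  # not a simple whole-dollar fare; skip the row
            destination = destination.strip()
            if origin_word in current_origin and destination_word in destination:
                matches.append((current_origin, destination, int(fare_text)))
    return matches


print(find_flights(SAMPLE, "BOSTON", "PITTSBURGH"))
# [('BOSTON, MA', 'PITTSBURGH, PA', 89)]
```

The explicit `current_origin` variable plays the role that the nested "in Table contains Origin" constraint plays in the query above.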