# load and clean May search logs
1. M = load '/logs/may05' as (user, query, time);
2. M = filter M by not isURL(query);
3. M = filter M by not isBot(user);

# determine frequent queries in May
4. M_groups = group M by query;
5. M_frequent = filter M_groups by COUNT(M) > 10^4;

# load and clean June search logs
6. J = load '/logs/june05' as (user, query, time);
7. J = filter J by not isURL(query);
8. J = filter J by not isBot(user);

# determine June add-ons to May frequent queries
9. J_sub = foreach J generate query, flatten(Subphrases(query)) as subphr;
10. eureka = join J_sub by subphr, M_frequent by query;
11. addons = foreach eureka generate Residual(J_sub::query, J_sub::subphr) as residual;

# count add-on occurrences, and filter by count
12. addon_groups = group addons by residual;
13. counts = foreach addon_groups generate residual, COUNT(addons) as count;
14. frequent_addons = filter counts by count > 10^5;
15. store frequent_addons into 'myoutput.txt';

Line 1 specifies the filename and schema of the May query log. Lines 2 and 3 filter out search queries that consist of URLs or are made by suspected "bots" (the filters are governed by the custom Boolean functions isURL and isBot, which have been manually ordered to optimize performance). Lines 4-5 identify frequent queries in May. Lines 6-8 are identical to Lines 1-3, but for the June log. Lines 9-10 match sub-phrases in the June log (enumerated via a custom set-valued function Subphrases) against frequent May queries. Line 11 then extracts the add-on portion of the query using a custom function Residual (e.g., "olympics" is an add-on to the frequent May query "New York"). Lines 12-14 count the number of occurrences of each add-on term, and filter out add-ons that did not occur frequently. Line 15 specifies that the output should be written to a file called myoutput.txt.
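The text names Subphrases and Residual but does not define them, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that queries are split on whitespace, that a sub-phrase is any contiguous run of words shorter than the whole query, and that the residual is whatever remains after removing the first occurrence of the sub-phrase. The names `subphrases` and `residual` are hypothetical stand-ins for the UDFs used in Lines 9 and 11.

```python
from typing import Set


def subphrases(query: str) -> Set[str]:
    """Return every contiguous word sequence of the query except the full query.

    Assumption: excluding the full query guarantees a non-empty residual.
    """
    words = query.split()
    n = len(words)
    return {
        " ".join(words[i:j])
        for i in range(n)
        for j in range(i + 1, n + 1)
        if j - i < n  # keep proper sub-phrases only
    }


def residual(query: str, subphr: str) -> str:
    """Return the words of query left after removing the first occurrence of subphr."""
    words = query.split()
    sub = subphr.split()
    k = len(sub)
    for i in range(len(words) - k + 1):
        if words[i:i + k] == sub:
            return " ".join(words[:i] + words[i + k:])
    return query  # subphr does not occur; nothing is removed


if __name__ == "__main__":
    q = "new york olympics"
    print(sorted(subphrases(q)))
    print(residual(q, "new york"))
```

Under these assumptions, the June query "new york olympics" yields the sub-phrase "new york", which would join against a frequent May query in Line 10 and leave "olympics" as the add-on in Line 11.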
In general, Pig Latin programs express acyclic dataflows in a step-by-step fashion using variable assignments (the variables on the left-hand side denote sets of records). Each step performs one of: (1) data input or output, e.g., Lines 1, 6, 15; (2) relational-algebra-style transformations, e.g., Lines 10, 14; (3) custom processing, e.g., Lines 9, 11, governed by user-defined functions (UDFs). A complete description of the language is omitted here due to space constraints; see [22].
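To make the step-by-step style concrete, here is a minimal in-memory Python sketch of Lines 1-5, where each assignment corresponds to one dataflow step. It is an illustration under stated assumptions, not how Pig executes the program: the log is a small hard-coded list rather than a file, the threshold is lowered from 10^4 to 1, and `is_url` and `is_bot` are hypothetical stand-ins for the isURL and isBot UDFs, whose real logic is not given in the text.

```python
from collections import defaultdict


def is_url(query):
    # Hypothetical stand-in for the isURL UDF.
    return query.startswith(("http://", "https://", "www."))


def is_bot(user):
    # Hypothetical stand-in for the isBot UDF.
    return user.startswith("bot")


def frequent_queries(records, threshold):
    """Emulate Lines 2-5 on an iterable of (user, query, time) tuples."""
    m = [r for r in records if not is_url(r[1])]   # Line 2: drop URL queries
    m = [r for r in m if not is_bot(r[0])]         # Line 3: drop suspected bots
    m_groups = defaultdict(list)                   # Line 4: group by query
    for r in m:
        m_groups[r[1]].append(r)
    # Line 5: keep groups whose record count exceeds the threshold
    return {q: recs for q, recs in m_groups.items() if len(recs) > threshold}


# Stand-in for Line 1 (data input): a tiny hard-coded log.
log = [
    ("alice", "new york", 1),
    ("bob", "new york", 2),
    ("bot7", "new york", 3),
    ("carol", "www.example.com", 4),
    ("dave", "pig latin", 5),
]
frequent = frequent_queries(log, threshold=1)
print(sorted(frequent))  # prints ['new york']
```

The filters play the role of custom processing, while the grouping and count-based filter correspond to the relational-algebra-style transformations.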