get_free_page()
then used to return pages that had already been cleared
without having to waste time clearing the pages when they were requested.
The kernel compile with this "optimization" took nearly twice
as long to complete because of cache misses.
Measurements with LmBench showed performance decreases as well. Hardware
counters confirmed that the performance loss from clearing pages was due to
additional cache misses.
We repeated the experiment, this time uncaching the pages before clearing
them and not adding them to the list of cleared pages. This let us
see how much of a penalty clearing the pages incurred on its own, without the
effect of using those pages to speed up get_free_page(). There was no
performance loss or gain. This makes sense: the data cache was not
affected because the pages being cleared were uncached, and even after being
cleared they weren't used to speed up get_free_page(). The number of
cache misses didn't change from that of the kernel without the page-clearing
modifications.
When the cache was turned off for pages being cleared and those pages were used in
get_free_page(), the system became much faster. This kept the cache free of
the useless entries that were added whenever get_free_page() had to clear
the page itself, since the code requesting the page never read those values
(it shouldn't read them anyway).
This suggests that it might be worthwhile to turn off the data and
instruction caches during the idle task to avoid polluting them with
any accesses done in the idle task. There's no need to worry about the idle
task executing quickly; we're only concerned with switching out of it
quickly when another task needs to run, so caching isn't necessary.
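To put a rough size on the pollution caused by clearing a page through the cache, assume 4 KB pages, 32-byte cache lines and a 16 KB data cache that allocates a line on every write. These figures are typical of PowerPC 60x parts but are assumptions here, since this section does not state them. A single page cleared through the cache then touches

```latex
\frac{4096\ \text{bytes}}{32\ \text{bytes/line}} = 128\ \text{lines},
\qquad
\frac{128 \times 32\ \text{bytes}}{16384\ \text{bytes}} = \frac{1}{4}
```

so, under these assumptions, one cached page clear can displace up to a quarter of the data cache with lines whose cleared values the requesting code never reads.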
We must
always ensure that the overhead of an optimization doesn't outweigh any
potential improvement in performance [11] [4].
In this case we did not incur significant overhead when clearing pages. In
fact, all data structures used to keep track of the cleared pages are
lock-free and interrupts are left enabled, so the idle task can never keep
control of the processor any longer than it would if it were not clearing
pages. Even when calling get_free_page(), the only overhead is a check to
see if there are any pre-cleared pages available. Our measurements with
page clearing on but without adding the pages to the list still paid for
that check in get_free_page(), so any such overhead would have shown up.
This is important since the idle task runs quite often, even on a system heavily
loaded with users compiling, editing, and reading mail, because such work
involves a lot of I/O that must be waited for.
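The lock-free bookkeeping and the single check in get_free_page() described above can be sketched in C11 as follows. This is a minimal illustration, not the kernel's code: it assumes exactly one producer (the idle task) and one consumer (get_free_page()), and the names cleared_list, idle_clear_one(), alloc_dirty_page() and get_free_page_sketch(), the 4 KB page size and the eight-slot capacity are all hypothetical. Real uncached clearing needs an uncached mapping, which portable C cannot express, so a plain memset stands in for it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE     4096  /* assumption: 4 KB pages */
#define CLEARED_SLOTS 8     /* hypothetical capacity; a power of two */
#define POOL_PAGES    16    /* size of the toy page pool below */

/* Hypothetical list of pre-cleared pages: a single-producer/single-consumer
 * ring. Only the idle task advances tail and only get_free_page_sketch()
 * advances head, so no lock is needed and interrupts can stay enabled. */
struct cleared_list {
    void *slot[CLEARED_SLOTS];
    atomic_size_t head;  /* next entry to hand out (consumer-owned) */
    atomic_size_t tail;  /* next free entry (producer-owned) */
};

static bool cleared_has_room(struct cleared_list *l)
{
    size_t tail = atomic_load_explicit(&l->tail, memory_order_relaxed);
    /* acquire: the consumer has finished reading any slot it released */
    size_t head = atomic_load_explicit(&l->head, memory_order_acquire);
    return tail - head < CLEARED_SLOTS;
}

/* Producer: publish a zeroed page; the caller must have checked for room. */
static void cleared_push(struct cleared_list *l, void *page)
{
    size_t tail = atomic_load_explicit(&l->tail, memory_order_relaxed);
    l->slot[tail % CLEARED_SLOTS] = page;
    /* release: slot and page contents become visible before the new tail */
    atomic_store_explicit(&l->tail, tail + 1, memory_order_release);
}

/* Consumer: return a pre-cleared page, or NULL if none is available. */
static void *cleared_pop(struct cleared_list *l)
{
    size_t head = atomic_load_explicit(&l->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&l->tail, memory_order_acquire);
    if (head == tail)
        return NULL;  /* the one extra check on the allocation path */
    void *page = l->slot[head % CLEARED_SLOTS];
    atomic_store_explicit(&l->head, head + 1, memory_order_release);
    return page;
}

/* Toy stand-in for the kernel's free-page pool (hypothetical). */
static unsigned char pool[POOL_PAGES][PAGE_SIZE];
static size_t pool_used;

static void *alloc_dirty_page(void)
{
    if (pool_used == POOL_PAGES)
        return NULL;
    memset(pool[pool_used], 0xA5, PAGE_SIZE);  /* simulate stale contents */
    return pool[pool_used++];
}

/* One step of idle-task work. In the kernel the zeroing would go through an
 * uncached mapping; memset stands in for it here. */
static bool idle_clear_one(struct cleared_list *l)
{
    if (!cleared_has_room(l))
        return false;  /* enough pages are already pre-cleared */
    void *page = alloc_dirty_page();
    if (page == NULL)
        return false;
    memset(page, 0, PAGE_SIZE);
    cleared_push(l, page);  /* room cannot vanish: only we add entries */
    return true;
}

/* Hypothetical get_free_page(): prefer a pre-cleared page, else clear one. */
static void *get_free_page_sketch(struct cleared_list *l)
{
    void *page = cleared_pop(l);
    if (page != NULL)
        return page;
    page = alloc_dirty_page();
    if (page != NULL)
        memset(page, 0, PAGE_SIZE);  /* slow path: cleared through the cache */
    return page;
}
```

Because each index has exactly one writer, neither side ever waits on the other, and when no cleared page is ready the consumer's extra cost is a single comparison. If get_free_page() could run concurrently on several processors, this two-index ring would not be safe and a multi-consumer scheme would be needed instead.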