Where does the time go?

------------------------------------------------------------------------------
your basic HTTP GET request
------------------------------------------------------------------------------

Rough initial measurements:

C client (fastget), Apache server, raw GET:
    shovel -> shovel    4.8 ms
    plow   -> shovel    5.1 ms
Java client (HttpExp), Apache server, raw GET:
    shovel -> shovel    6.1 ms
    plow   -> shovel    5.9 ms
    
A GET is a 35-byte data request, and the reply runs about 364 bytes
(counting headers plus the 100-byte payload).
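For reference, the shape of the transaction can be sketched in a few lines
of Python (fastget and HttpExp themselves aren't shown here; the URL, the
stand-in server, and the port are all made up, with only the 100-byte
payload taken from the measurements above):

```python
# Minimal sketch of one raw-GET transaction: fresh TCP connection,
# one request, read the reply until the server closes. The local
# server thread stands in for Apache; everything it does is assumed.
import socket
import threading
import time

PAYLOAD = b"x" * 100  # 100-byte payload, as in the measurements

def serve_once(srv):
    """Accept one connection, read the request, reply, close."""
    conn, _ = srv.accept()
    conn.recv(4096)                      # the small GET request
    headers = (b"HTTP/1.0 200 OK\r\n"
               b"Content-Length: 100\r\n\r\n")
    conn.sendall(headers + PAYLOAD)      # headers + payload in one write
    conn.close()                         # server closes first, as in the trace

srv = socket.socket()
srv.bind(("127.0.0.1", 0))               # hypothetical local stand-in server
srv.listen(1)
port = srv.getsockname()[1]
threading.Thread(target=serve_once, args=(srv,), daemon=True).start()

t0 = time.perf_counter()
cli = socket.create_connection(("127.0.0.1", port))   # 3-way handshake
cli.sendall(b"GET /file100 HTTP/1.0\r\n\r\n")         # made-up URL
reply = b""
while chunk := cli.recv(4096):                        # read until FIN
    reply += chunk
cli.close()
elapsed_ms = (time.perf_counter() - t0) * 1000

print(f"{len(reply)} bytes back in {elapsed_ms:.2f} ms")
```

Loopback numbers from this sketch aren't comparable to the plow/shovel
figures, but the sequence of events (connect, request, reply, teardown)
is the same one the trace below shows on the wire.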

The wire transaction looks like this:

The 3-way TCP handshake:
    12:55:21.787242 plow.cs.dartmouth.edu.50689 > shovel.cs.dartmouth.edu.80:
        S 3757274290:3757274290(0) win 8760 <mss 1460> (DF)
    12:55:21.787310 shovel.cs.dartmouth.edu.80 > plow.cs.dartmouth.edu.50689:
        S 3659477862:3659477862(0) ack 3757274291 win 8760 <mss 1460> (DF)
    12:55:21.787800 plow.cs.dartmouth.edu.50689 > shovel.cs.dartmouth.edu.80:
        . ack 1 win 8760 (DF)

There goes the GET (35 bytes), followed by its ack (not piggybacked):
    12:55:21.788412 plow.cs.dartmouth.edu.50689 > shovel.cs.dartmouth.edu.80:
        P 1:36(35) ack 1 win 8760 (DF)
    12:55:21.788562 shovel.cs.dartmouth.edu.80 > plow.cs.dartmouth.edu.50689:
        . ack 36 win 8760 (DF)

Here comes the reply (364 bytes):
    12:55:21.790619 shovel.cs.dartmouth.edu.80 > plow.cs.dartmouth.edu.50689:
        P 1:365(364) ack 36 win 8760 (DF)

shovel closes (FIN) the connection; plow acks it twice -- once for the
data (ack 365), once for the FIN itself (ack 366) -- followed by a FIN of
its own (when the application closes its end?), and shovel acks that
final FIN.
    12:55:21.791289 shovel.cs.dartmouth.edu.80 > plow.cs.dartmouth.edu.50689:
        F 365:365(0) ack 36 win 8760 (DF)
    12:55:21.791575 plow.cs.dartmouth.edu.50689 > shovel.cs.dartmouth.edu.80:
        . ack 365 win 8760 (DF)
    12:55:21.791664 plow.cs.dartmouth.edu.50689 > shovel.cs.dartmouth.edu.80:
        . ack 366 win 8760 (DF)
    12:55:21.792121 plow.cs.dartmouth.edu.50689 > shovel.cs.dartmouth.edu.80:
        F 36:36(0) ack 366 win 8760 (DF)
    12:55:21.792185 shovel.cs.dartmouth.edu.80 > plow.cs.dartmouth.edu.50689:
        . ack 37 win 8760 (DF)
    
That's three packets for the handshake, two to send the request, one to
send the reply, and five to tear down the connection: eleven packets in
all for one 100-byte fetch.

There are five half-round-trips (dependencies that incur a network latency),
although not all may be visible to the application.

Ethernet latency for 100 bytes is on the order of 0.1ms on a 10 Mb/s wire
(100 bytes of payload plus framing is roughly 1000 bits, or 0.1ms). So
let's say the ethernet is responsible for 0.5-1.1ms of delay, and the
rest is application and network stack delay.
Going remote, the C client's time goes up by about 0.3ms, while the Java
client's actually goes down by about 0.2ms.
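The serialization figures can be checked with a quick sketch; the 10 Mb/s
line rate and the 38 bytes of per-frame overhead (preamble, MAC header,
FCS, interframe gap) are assumptions consistent with the arithmetic above:

```python
# Back-of-envelope serialization delay on the wire.
# Assumptions: 10 Mb/s Ethernet, 38 bytes of per-frame overhead
# (8 preamble + 14 MAC header + 4 FCS + 12 interframe gap).
ETHERNET_BPS = 10_000_000
FRAME_OVERHEAD = 38

def serialization_ms(payload_bytes):
    bits = (payload_bytes + FRAME_OVERHEAD) * 8
    return bits / ETHERNET_BPS * 1000

small = serialization_ms(100)   # a ~100-byte packet: roughly 0.11 ms
reply = serialization_ms(364)   # the 364-byte reply: roughly 0.32 ms
print(f"100B: {small:.2f} ms, 364B: {reply:.2f} ms")
```

Four small packets plus the one larger reply on the critical path comes
to roughly 0.75 ms, comfortably inside the 0.5-1.1ms budget above.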

In the local case, the cpu is:
    C:       0% idle, 35% user, 65% kernel
    Java:    0% idle, 35% user, 65% kernel
In the remote case, the cpus are:
    C:      56% idle,  2% user, 42% kernel   client
             8% idle, 31% user, 60% kernel   server
    Java:   46% idle, 12% user, 41% kernel   client
            20% idle, 20% user, 57% kernel   server

So what happens?
Both cases are CPU-bound, with most of the work in the kernel (the IP
stack, apparently). The Java case consumes a little more (10% absolute)
user CPU on the client, presumably because it has a longer call path from
the C libraries up through JNI to the Java code. That throttles the
client, giving the server a little more breathing room (3% less kernel,
10% less user), which would explain why Java's remote response time comes
in below its local one.

So Java's lower throughput is due to its longer call path in the
application, which shouldn't surprise us much. Otherwise the
characteristics are very similar.
Java slows down most in the local case, and of course that's because
client and server share one CPU there, so Java's extra CPU appetite
makes the CPU-boundedness problem worse.
