Content distribution applications typically require data consistency and reliability. TCP has successfully satisfied these requirements for unicast connectivity; a TCP-equivalent reliable transport protocol for multicast communication has been the subject of active research in recent years [12]. In an ALMI multicast group, the end-to-end reliability problem still exists; however, its causes differ greatly from those in IP multicast. In ALMI, unicast TCP connections provide data reliability on a hop-by-hop basis, which implies that packet losses due to network congestion and transmission errors are eliminated. Instead, packet losses in ALMI are mainly caused by multicast tree transitions, transient network link failures, or node failures.
In ALMI, implosion and exposure control happens naturally: the protocol efficiently aggregates requests and retransmits data without the need for router support or knowledge of session topology. Upon loss detection, a session member sends a request onto the interface from which data is received. Requests are then aggregated at each hop so that only one of them escapes the loss subtree. When applications can buffer data or regenerate data from disk, retransmission can happen locally. In this case, the node above the lossy link retransmits data to the requesting subtree. Otherwise, when the upstream node has reset its application naming state (explained below) and can no longer retransmit data locally, a NODATA packet is sent back to the requestor, i.e., the head of the loss subtree. The requestor then initiates an out-of-band connection directly to the source, and subsequent requests and retransmissions are conducted over this out-of-band connection. In both local and out-of-band retransmission, upon receiving retransmitted packets, the requestor forwards them to downstream requestors. The out-of-band connection is torn down after the request is fulfilled. The choice of out-of-band requests over relaying requests and retransmissions hop by hop is motivated by ALMI's loss characteristics: losses are infrequent but usually happen in bulk. Typically, once a node loses its connection, it takes about 3 round-trip times to reconnect to the multicast tree and detect packet losses. Although relaying requests all the way up to the source can sometimes aggregate more independent loss requests higher up the tree, it adds per-hop processing and transmission delay for each request and retransmission packet, and also disrupts the normal data distribution process. In contrast, an out-of-band connection separates data distribution from retransmissions and incurs much less processing delay.
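The request handling described above can be illustrated with a minimal sketch. It is not ALMI's implementation: the class `Member`, its methods `on_request`, `serve`, and `on_reply`, the message tuples, and the `fetch_from_source` callback (standing in for the out-of-band connection to the source) are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

Reply = Tuple[str, int, Optional[bytes]]  # (kind, sequence number, payload)


@dataclass
class Member:
    """Hypothetical session member; not ALMI's actual data structures."""
    name: str
    buffer: Dict[int, bytes] = field(default_factory=dict)      # packets still held
    waiting: Dict[int, Set[str]] = field(default_factory=dict)  # seq -> requesting children

    def on_request(self, child: str, seq: int) -> Optional[Tuple[str, int]]:
        """Record a child's request; only the first one per packet goes upstream."""
        first = seq not in self.waiting
        self.waiting.setdefault(seq, set()).add(child)
        return ("REQUEST", seq) if first else None  # None: aggregated, nothing sent

    def serve(self, seq: int) -> Reply:
        """Run at the node above the lossy link: retransmit locally or report NODATA."""
        if seq in self.buffer:
            return ("DATA", seq, self.buffer[seq])
        return ("NODATA", seq, None)

    def on_reply(self, reply: Reply,
                 fetch_from_source: Callable[[int], bytes]) -> List[Tuple[str, int, bytes]]:
        """Run at the head of the loss subtree: obtain the packet, then forward it."""
        kind, seq, payload = reply
        if kind == "NODATA":
            # Assumption: the callback models the out-of-band connection to the source.
            payload = fetch_from_source(seq)
        assert payload is not None
        self.buffer[seq] = payload
        # Forward the retransmitted packet to every downstream requestor.
        return [(child, seq, payload) for child in sorted(self.waiting.pop(seq, set()))]
```

In this sketch, two children requesting the same packet produce a single upstream request, and the head of the loss subtree falls back to the source only when the upstream node answers NODATA.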
In addition, ALMI uses ACKs to synchronize data reception state
across members. This is necessary for applications that require total
reliability but have limited buffer space. Before resetting their
buffers, members need to ensure that all buffered packets have been
correctly received by all members. An ACK is a list of
(source, sequence number) pairs, where the sequence number is the
highest contiguous sequence number received locally from that data
source. ACKs are initiated at leaf nodes and sent upstream towards
the root. At each intermediate node, once a member has received ACKs
from all its children, it forwards upstream an ACK containing the
minimum sequence number for each source. When the ACK reaches the
root, it is multicast back downstream and resets every node's state
to the common minimum. A member is then free to release all buffered
packets with sequence numbers less than the minimum. The frequency of
the ACK process depends on both the data rate and the smallest buffer
space at a member application.
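
The ACK aggregation can be sketched as a per-source minimum computed bottom-up over the tree. This is a minimal illustration under stated assumptions, not the protocol's actual code: the names `merge_acks`, `aggregate`, and `prune` are hypothetical, each node's own reception state is assumed to be included in the minimum it forwards, and a source missing from an ACK is assumed to mean that nothing has been received from it (sequence number 0).

```python
from typing import Dict, List

Ack = Dict[str, int]  # source -> highest contiguous sequence number received


def merge_acks(local: Ack, child_acks: List[Ack]) -> Ack:
    """Per-source minimum over a node's own state and its children's ACKs."""
    acks = [local, *child_acks]
    sources = set().union(*acks)
    # Assumption: a source absent from an ACK counts as sequence number 0.
    return {s: min(a.get(s, 0) for a in acks) for s in sorted(sources)}


def aggregate(node: str, children: Dict[str, List[str]], state: Dict[str, Ack]) -> Ack:
    """ACK a node forwards upstream once all of its children have reported."""
    child_acks = [aggregate(c, children, state) for c in children.get(node, [])]
    return merge_acks(state[node], child_acks)


def prune(buffer: Dict[int, bytes], minimum: int) -> Dict[int, bytes]:
    """Drop buffered packets whose sequence number is less than the minimum."""
    return {seq: data for seq, data in buffer.items() if seq >= minimum}
```

Calling `aggregate` on the root yields the common minimum that would be multicast back downstream; each member can then apply `prune` to its buffer for each source.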