Sunday, December 2, 2012

Multicast Oddities

The operation of multicast is standardized, well-defined, and well-understood.

Or is it?  (Insert ominous music here)

RFCs and other standards mostly apply to packets as they traverse the network.  Network administrators take care of multicast infrastructure so that software developers usually don't have to worry about the subtleties of multicast connectivity and routing.  However, what the operating system does inside a host is often not governed by formal standards.  There are some corner cases related to multi-homed hosts where operating systems behave in non-intuitive or inconsistent ways that the software developer must be aware of.

In the scenarios shown below, we tested multicast connectivity with a pair of simple multicast tools: "msend" (for sending) and "mdump" (for receiving).  See mtools for source and binaries for these programs.  In all cases, the msends and mdumps use the same multicast group and destination port.  The testing was done on four operating systems: Linux, Solaris-10, FreeBSD, and Windows.
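For readers who want to reproduce the setups without mtools, here is a minimal sketch of how a sender and receiver can each be pinned to a specific interface with the standard sockets API. This is not the mtools source; the function names (make_mreq, open_receiver, open_sender) are made up for illustration, and the interface is identified by its local IPv4 address.

```python
import socket
import struct


def make_mreq(group: str, iface_addr: str) -> bytes:
    """Pack a struct ip_mreq: the multicast group, then the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_addr)


def open_receiver(group: str, port: int, iface_addr: str) -> socket.socket:
    """Receiver in the spirit of mdump: join `group` on one chosen interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Let several receivers on the same host bind the same port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # The join is per interface; this is what decides which network the
    # receiver is "on" in the scenarios.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_mreq(group, iface_addr))
    return sock


def open_sender(iface_addr: str, ttl: int = 1) -> socket.socket:
    """Sender in the spirit of msend: transmit out of one chosen interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                    socket.inet_aton(iface_addr))
    # A TTL of 1 keeps packets on the local network; a routed setup needs more.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    struct.pack("B", ttl))
    return sock
```

A sender would then call something like open_sender("10.1.0.5", ttl=2).sendto(b"hello", ("239.1.2.3", 12000)), where the addresses and port are placeholders for your own networks.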


SCENARIO 1: UN-ROUTED

In this scenario, two interfaces on the same host are used, one for sending multicast packets and the other for receiving them.  In particular, this scenario assumes that the two networks attached to the interfaces are not routed to each other, at least not for multicast.

The results of testing the scenario are what you would expect: the mdump does not receive the traffic.  This is true for all four operating systems.

No surprise here, right?  Well hang on tight, things are about to get more interesting.



SCENARIO 2: ROUTED

In this scenario, the two networks are routed.  That is, multicast packets sent to network 1 are routed to network 2.  One would expect that the mdump would now receive the packets, right?

It does for Solaris-10, FreeBSD, and Windows.  However, when tested with Linux, the mdump stays silent.  We ran "netstat -g" and the kernel thinks it is joined to the multicast group on Network 2, and a "tcpdump" (with promiscuous mode turned OFF) indicates that the packets are in fact being received by the kernel.  But for some reason, the Linux kernel is discarding those packets.

I, for one, consider this to be a bug in Linux.  It violates application portability.  That is, if the mdump is run on a separate machine connected to network 2, the packets are received just fine.  But run it on the same machine as the sender, and it stops working.



SCENARIO 3: ROUTED, EXTRA MDUMP

In this scenario, a second mdump (mdump2) is added, joining the group on network 1.  One would expect both mdumps to receive the multicast, and in fact they both do.  So the only surprise here is that mdump2 appears to have "fixed" the Linux bug in scenario 2.



SCENARIO 4: UN-ROUTED, EXTRA MDUMP

Things here get interesting again.  We go back to an un-routed scenario, but keep the second mdump.  One would expect mdump2 to receive the data but not the original mdump.

This is what happens for Solaris-10 and Windows.  For Linux and FreeBSD, *both* mdumps receive the data.  I consider this to be a bug for Linux and FreeBSD (why should an additional mdump change the behavior?), but I consider it less serious than scenario 2's problem.  In scenario 4, the problem is receiving more packets than expected, which could be filtered out by the application.
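One possible way for an application to do that filtering on Linux is to ask the kernel which interface each datagram arrived on, using the IP_PKTINFO socket option, and drop anything from the wrong one. The sketch below is Linux-specific and has not been run against the scenario 4 setup, so treat the idea that it separates the two mdumps as an assumption. The helper names are hypothetical, and the fallback value 8 for IP_PKTINFO is the Linux constant, used only when the Python socket module does not export it.

```python
import socket
import struct

# Linux value is 8; newer Python versions export the constant directly.
IP_PKTINFO = getattr(socket, "IP_PKTINFO", 8)


def enable_pktinfo(sock: socket.socket) -> None:
    """Ask the kernel to attach a struct in_pktinfo to every received datagram."""
    sock.setsockopt(socket.IPPROTO_IP, IP_PKTINFO, 1)


def arrival_ifindex(ancdata):
    """Return the arrival interface index from recvmsg() ancillary data, or None."""
    for level, ctype, data in ancdata:
        if level == socket.IPPROTO_IP and ctype == IP_PKTINFO:
            # struct in_pktinfo: int ipi_ifindex; in_addr ipi_spec_dst; in_addr ipi_addr
            ifindex, _spec_dst, _dst = struct.unpack("i4s4s", data[:12])
            return ifindex
    return None


def recv_on_interface(sock: socket.socket, wanted_ifindex: int, bufsize: int = 65535):
    """Receive one datagram; return (data, sender), or None if it came in elsewhere."""
    data, ancdata, _flags, sender = sock.recvmsg(bufsize, socket.CMSG_SPACE(12))
    if arrival_ifindex(ancdata) != wanted_ifindex:
        return None
    return data, sender
```

The wanted index can be looked up with socket.if_nametoindex("eth1"), where "eth1" stands in for whatever the interface is called on your host.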

So, when dealing with multi-homed hosts, it is usually advisable to stick to a single interface if the networks are routed, or to use different multicast groups on each network if the networks are not routed.
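If the group assignments live in configuration, the "different groups on each network" rule is easy to check mechanically. Here is a small sketch, assuming a hypothetical plan that maps each local interface address to the groups joined on it; the function name and the plan layout are invented for this example.

```python
import ipaddress
from collections import defaultdict


def shared_groups(plan):
    """Find multicast groups that are joined on more than one interface.

    `plan` maps a local interface address to an iterable of group addresses.
    Returns {group: sorted list of interfaces} for every group that is reused.
    """
    users = defaultdict(set)
    for iface, groups in plan.items():
        for group in groups:
            addr = ipaddress.ip_address(group)
            if not addr.is_multicast:
                raise ValueError(f"{group} is not a multicast address")
            users[str(addr)].add(iface)
    return {g: sorted(ifaces) for g, ifaces in users.items() if len(ifaces) > 1}
```

An empty result means no group is shared between interfaces, which is the layout recommended above for un-routed networks.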

(BTW, this article was first published in 2009 on a different, now deleted, blog.)
