
You don't really want to, but if you configure all of the LAG participants on the path to do round-robin or similar balancing rather than hashing based on addresses, a single flow can carry more data than an individual link can. You'll also be pretty likely to get out-of-order data, and TCP receivers will exercise their reassembly buffers, which will kill performance, and you'll rapidly wish you hadn't done all that configuration work. If you do need more than one link's worth of throughput, you'll almost always do better by running multiple flows, but you may still need to configure your network so it hashes in a way that gives you diverse paths between two hosts; the defaults might not give you diversity even across different flows.
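To make the difference concrete, here is a minimal sketch of the two balancing styles. The CRC32-over-addresses hash and the 4-link LAG are assumptions for illustration only; real switches and NICs use their own hash functions and field selections.

```python
import zlib

NUM_LINKS = 4  # assumed LAG width, for illustration only


def hash_link(src_ip, dst_ip, src_port, dst_port, num_links=NUM_LINKS):
    """Pick a link from the flow's addresses and ports.

    Every packet of the same flow produces the same key, so the whole
    flow is pinned to one link and stays in order.
    CRC32 is a stand-in here, not what any particular switch uses.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % num_links


def round_robin_links(num_packets, num_links=NUM_LINKS):
    """Assign consecutive packets to consecutive links, ignoring flows."""
    return [i % num_links for i in range(num_packets)]


# Eight packets of a single flow: hashing keeps them all on one link.
one_flow = [hash_link("10.0.0.1", "10.0.0.2", 40000, 443) for _ in range(8)]

# Eight different flows (varying source port): each flow is pinned to
# some link, but nothing guarantees the flows spread evenly.
many_flows = [hash_link("10.0.0.1", "10.0.0.2", 40000 + i, 443) for i in range(8)]

# Round-robin spreads even a single flow over every link.
rr = round_robin_links(8)

print("one flow, hashed:  ", one_flow)
print("many flows, hashed:", many_flows)
print("round-robin:       ", rr)
```

The hashed single flow never uses more than one link, which is why it is capped at one link's throughput; round-robin uses all of them, at the cost of ordering.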


The out-of-order data is the key bit.

How do these guys get the data in order and we don't?


Consider that a QSFP28 module uses four 25 Gbps lanes to support sending one single 100 Gbps flow. So electronics do exist that can easily do what you are asking. I think it is just the economics of doing it for the various ports on a switch, lack of a standard, etc.


SFP/QSFP/PCIe, etc. combine multiple lanes that originate from a physical bundle of limited size, so the transmitters can easily share a single clock source. The wire protocol includes additional signalling that lets the receiver know how to recombine the bits coming over each lane in the correct order.

In contrast, Ethernet link aggregation lets you combine ports that can be arbitrarily far apart -- maybe not even within the same rack (see MC-LAG). Ethernet link aggregation doesn't add any encapsulation or sequencing information to the data flows it manages.

You can imagine an alternate mechanism which added a small header to each packet with sequence numbers; the other end of the aggregation would then have to sort the packets back into order and remove that header.
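A rough sketch of what that imagined mechanism might look like on the receiving side. Everything here is hypothetical (the `add_header` helper, the `Resequencer` class, an unbounded sequence counter); it is not something 802.1AX defines, and a real design would also need timeouts for lost packets and a bounded buffer.

```python
import heapq


def add_header(payloads):
    """Sender side: tag each packet with a sequence number (the 'small header')."""
    return [(seq, payload) for seq, payload in enumerate(payloads)]


class Resequencer:
    """Receiver side: hold early packets until the gap before them is filled."""

    def __init__(self):
        self.next_seq = 0   # next sequence number we are allowed to release
        self.pending = []   # min-heap of (seq, payload) that arrived early

    def receive(self, seq, payload):
        """Accept one packet from any link; return payloads now deliverable in order."""
        heapq.heappush(self.pending, (seq, payload))
        released = []
        # Release the longest in-order run starting at next_seq,
        # stripping the header as we go.
        while self.pending and self.pending[0][0] == self.next_seq:
            _, ready = heapq.heappop(self.pending)
            released.append(ready)
            self.next_seq += 1
        return released


tagged = add_header([b"a", b"b", b"c", b"d"])
# Pretend the links delivered the packets in this scrambled order.
scrambled = [tagged[2], tagged[0], tagged[1], tagged[3]]

rx = Resequencer()
delivered = [rx.receive(seq, payload) for seq, payload in scrambled]
print(delivered)
```

The buffering is the cost: packet 2 sits in memory until 0 and 1 show up, so the slowest link sets the latency for everything.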


Plus the NIC/PHY is likely assuming only a small range of propagation delay differences between the lanes/links.

Probably falls down if one link is 1cm and the other is 100km.

A LAG could be done with different media/speeds, though perhaps not likely in practice.


You can calculate that using the deskew buffer sizes from here:

https://www.ieee802.org/3/df/public/23_01/0130/ran_3df_03_23...

That said, I fully expect such an order of magnitude difference to overwhelm the deskew buffer.
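For the skew side of that comparison, a back-of-the-envelope calculation for the 1 cm vs 100 km case. The fiber refractive index of 1.47 is an assumed typical value, and the 25 Gbps lane rate is taken from the QSFP28 example upthread; the deskew buffer size itself has to come from the linked deck.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458
FIBER_INDEX = 1.47  # assumed typical refractive index of optical fiber


def propagation_delay_s(length_m, index=FIBER_INDEX):
    """One-way propagation delay over a fiber of the given length."""
    return length_m * index / SPEED_OF_LIGHT_M_PER_S


def skew_bits(short_m, long_m, lane_rate_bps):
    """Bits one lane sends while the other lane's data is still in flight."""
    skew_s = propagation_delay_s(long_m) - propagation_delay_s(short_m)
    return skew_s * lane_rate_bps


skew_s = propagation_delay_s(100_000) - propagation_delay_s(0.01)
bits = skew_bits(0.01, 100_000, 25e9)

print(f"skew: {skew_s * 1e6:.0f} microseconds")
print(f"bits to buffer per lane at 25 Gbps: {bits / 1e6:.1f} Mbit")
```

That works out to roughly half a millisecond of skew, or on the order of 12 Mbit per lane that the receiver would have to hold to realign the lanes.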


> A LAG could be done with different media/speeds, though perhaps not likely in practice.

802.1AX indeed requires all members of a LAG to use the same speed.


> How do these guys get the data in order and we don't?

LAGs stripe traffic across links at the packet level, whereas QSFP/OSFP lanes do so at the bit level.

Different sized packets on different LAG links will take different amounts of time to transmit. So when striping bits, you effectively have a single ordered queue, whereas when striping packets across links, there are multiple independent queues.
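A toy model of the multiple-independent-queues point, assuming a burst of packets all queued at time zero, round-robin striping, equal-speed links, and identical propagation delay (so only serialization time differs). The `arrival_order` function is purely illustrative.

```python
def arrival_order(packet_sizes_bytes, num_links, link_rate_bps):
    """Stripe packets round-robin over links; return sequence numbers in arrival order.

    Each link is its own FIFO queue: a packet finishes once everything
    ahead of it on the same link, plus the packet itself, is serialized.
    """
    link_free_at = [0.0] * num_links  # time at which each link's queue drains
    arrivals = []
    for seq, size in enumerate(packet_sizes_bytes):
        link = seq % num_links
        finish = link_free_at[link] + size * 8 / link_rate_bps
        link_free_at[link] = finish
        arrivals.append((finish, seq))
    # Sort by finish time; ties keep the original send order.
    return [seq for _, seq in sorted(arrivals)]


# Mixed sizes over two 1 Gbps links: the small packets on link 1
# overtake the full-size packets on link 0.
mixed = arrival_order([1500, 64, 1500, 64], num_links=2, link_rate_bps=1e9)

# The same packets over a single link: one queue, so order is preserved.
single = arrival_order([1500, 64, 1500, 64], num_links=1, link_rate_bps=1e9)

print("two links:", mixed)
print("one link: ", single)
```

With two links the receiver sees packets 1 and 3 before 0 and 2, which is exactly the reordering a TCP receiver then has to absorb.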



