This has been going on in the HFT space for a number of years. FPGAs are used to parse data feeds because the sheer volume of quotes overwhelms most systems.
In fact, after moving the networking stack into userland and using InfiniBand networking gear, it's probably the third most common optimization I've seen or heard of for HFT systems.
Someone had asked about the number of quotes that need to be parsed. From Forbes:
> Mr. Hunsader: The new world is now a war between machines. For some perspective, in 1999 at the height of the tech craze, there were about 1,000 quotes per second crossing the tape. Fast forward to 2013 and that number has risen exponentially to 2,000,000 per second.
Keep in mind that the "tape" is the slow SIP line that exchanges use to keep prices in sync and to serve customers that don't use the exchanges' direct feeds. I.e., it aggregates the quotes from all venues and throws away a lot of them, either because they can't be processed in time or because they didn't change the top-of-book quote.
With 40+ venues from which an HFT fund can take feeds, 2,000,000 per second is a fraction of what a cutting-edge HFT firm has to parse to keep up with all of them.
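To put those figures in perspective, some back-of-the-envelope arithmetic (the 5x multiplier is an assumption for illustration, not a measured figure):

```python
# Back-of-the-envelope arithmetic on the rates above (illustrative only).
SIP_RATE = 2_000_000                 # quotes/sec crossing the tape (2013 figure)
interarrival_ns = 1e9 / SIP_RATE     # mean gap between consecutive quotes
print(f"mean inter-arrival on the SIP: {interarrival_ns:.0f} ns")

# Direct feeds carry everything the SIP throws away. If, hypothetically,
# the 40+ venues' raw feeds collectively run at 5x the SIP's volume, a
# firm consuming all of them must keep pace with:
VENUE_MULTIPLIER = 5                 # assumed for illustration, not measured
direct_rate = SIP_RATE * VENUE_MULTIPLIER
print(f"aggregate direct-feed rate: {direct_rate:,} quotes/sec")
```

At 2M quotes/sec the mean gap between quotes is only 500 ns, and that's the already-thinned consolidated tape.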
The typical setup is that you run strategies across multiple machines, with a gateway machine that directs each quote to the appropriate machine. The biggest problem is the speed at which the quotes arrive.
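A minimal sketch of that gateway idea, assuming a symbol-keyed dispatch (the Quote shape and worker count are invented for illustration, not any particular firm's design):

```python
# Illustrative gateway dispatch: route each quote to a fixed worker by
# symbol, so one machine sees every update for the symbols it trades.
import zlib
from collections import namedtuple

Quote = namedtuple("Quote", "symbol bid ask")

NUM_WORKERS = 4  # assumed; real setups vary

def route(quote: Quote) -> int:
    # A stable (unsalted) hash keeps a symbol pinned to one worker across
    # restarts; Python's built-in hash() is salted per process, so use crc32.
    return zlib.crc32(quote.symbol.encode()) % NUM_WORKERS

# All AAPL quotes land on the same worker, in arrival order.
assert route(Quote("AAPL", 100.01, 100.02)) == route(Quote("AAPL", 99.99, 100.00))
```

Pinning a symbol to one machine also sidesteps cross-machine ordering problems: each strategy sees a consistent, ordered view of its book.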
Unlike a web request, which you can take 300 milliseconds to parse and answer, if you don't parse and respond to a quote in under 10-20 microseconds you've already lost.
So the FPGA transition is about making sure there is never a backlog of quotes or any pause in the handling of bursty quotes. This can't be overstated. Margins are squeezed so tightly now that your algo will appear to be working fine until a big burst of quotes happens and your machines can't keep up; when the dust settles 20 seconds later, you'll find you lost $5,000, which might be your entire day's profit from that one symbol/algo pair.
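Some simple queueing arithmetic shows why bursts hurt even when average capacity looks fine (all numbers are invented for the example):

```python
# Why bursts hurt: if arrivals briefly exceed your service rate, a backlog
# builds, and you're trading on stale prices until it drains.
def peak_backlog(burst_rate, service_rate, burst_ms):
    """Queue depth at the end of a burst, starting from an empty queue."""
    return max(0, (burst_rate - service_rate) * burst_ms // 1000)

# A 50 ms burst at 3M quotes/s against a handler that does 1M quotes/s:
peak = peak_backlog(3_000_000, 1_000_000, 50)   # quotes queued at burst end
drain_ms = peak * 1000 // 1_000_000             # time to catch up afterwards
print(f"peak backlog: {peak:,} quotes; ~{drain_ms} ms behind the market")
```

A 100,000-quote backlog means roughly 100 ms of trading against prices that no longer exist, against a 10-20 microsecond budget. That's the window where the $5,000 disappears.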
I'm curious if these kinds of setups are also used in other high-freq scenarios.
For instance, I could imagine using techniques like a userland network stack and exclusive core reservation in services like WhatsApp. I think they're currently on a highly customized Erlang stack and are able to handle huge numbers of queries per machine. Any insiders here with a good background story?
I'm curious about what volumes we're talking about here. I'm not an electrical engineer, so I don't understand when FPGAs are needed (or justify the development cost) over distributed software systems.
It's usually the latency requirements that rule out distributed systems. The total amount of data isn't particularly crazy; it's just that it comes in very concentrated bursts.
Avoiding system call overhead and the GPL. I suspect most of the performance improvement has nothing to do with being in userspace per se; people use that term as shorthand for a whole collection of techniques like polling, batching, and not performing routing/filtering.
Close, but it really has nothing at all to do with the GPL. It has to do with reducing context switches (from userspace to kernelspace) and reducing hardware interrupts. When the name of the game is latency, context switches really hurt you. Batching is awful for latency and great for throughput, FYI.
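A rough sketch of the polling idea, using an ordinary nonblocking socket as a stand-in (real systems spin on a kernel-bypass NIC's receive ring via tools like openonload or vma, not a Python socket):

```python
# Busy-polling instead of blocking: no sleeping in the kernel, no
# interrupt-driven wakeup, no scheduler latency on the receive path.
# The cost is a core pinned at 100%. A socketpair stands in for the
# NIC feed; this is an illustration of the trade-off only.
import socket

rx, tx = socket.socketpair()
rx.setblocking(False)

def busy_poll(sock):
    while True:
        try:
            return sock.recv(4096)   # returns immediately if data is there
        except BlockingIOError:
            continue                 # nothing yet; spin rather than sleep

tx.send(b"quote:AAPL,100.01x100.02")
msg = busy_poll(rx)
print(msg)
```

A blocking `recv()` would instead park the thread in the kernel and pay an interrupt plus a scheduler wakeup when data arrives, which is exactly the latency the parent is describing.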
In fact, openonload[1] and vma[1], two of the most common vendor-provided kernel-bypass tools in use, are both open source!
Here's a quick, but surprisingly accurate description of a common HFT setup: http://www.forbes.com/sites/quora/2014/01/07/what-is-the-tec...