This is the first article in Big Data Profiles, a series profiling individuals and teams who have successfully delivered large-volume, data-driven systems.
Martin Thompson is a high-performance and low-latency specialist, and one of the creators of the Disruptor. While at LMAX and Betfair, Martin built and tested systems capable of handling hundreds of thousands of transactions per second with response times measured in microseconds. Because these systems served betting and trading businesses, uninterrupted service under extreme load was crucial.
Every transaction counts
When dealing with financial systems, even a single lost transaction can create bad publicity, call the viability of the whole system into question, and lead customers to seek legal recourse through regulators. Replaying failed transactions later may not be possible, because the business opportunity has already been missed. The ability to explain and prove system behaviour (for instance, why a series of transactions was carried out in a particular order) and fast reporting on historical data stretching into the distant past are concerns that have to be baked in from the very start. Verifying that specific invariants are maintained is central to functional testing.
One of Martin’s teams had great success using build pipelines, as described in Dave Farley and Jez Humble’s book Continuous Delivery: a series of stages (build and unit tests, integration and acceptance tests, tests of cross-functional requirements, exploratory tests) through which the software travels and is exercised under increasingly production-like configurations and environments. The pipeline was geared towards giving the team feedback as quickly as possible. For instance, certain acceptance tests were moved into earlier stages and acted as canaries, catching critical regressions that only manifested during complex interactions between components as soon as they were introduced.
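The fail-fast ordering described above can be sketched in a few lines. The stage names and the `Stage` and `run_pipeline` helpers below are hypothetical illustrations, not the team's actual tooling, and each stage's check is stubbed as a simple function that reports pass or fail.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Stage:
    """A hypothetical pipeline stage: a name plus a check that returns True on success."""
    name: str
    check: Callable[[], bool]


def run_pipeline(stages: Sequence[Stage]) -> str:
    """Run stages in order and stop at the first failure (fail fast)."""
    for stage in stages:
        if not stage.check():
            return f"FAILED at: {stage.name}"
    return "PASSED"


# Stubbed checks for illustration only: the promoted canary simulates a regression.
stages = [
    Stage("build and unit tests", lambda: True),
    Stage("canary acceptance tests", lambda: False),  # promoted from a later stage
    Stage("integration and acceptance tests", lambda: True),
    Stage("cross-functional requirement tests", lambda: True),
    Stage("exploratory tests", lambda: True),
]

print(run_pipeline(stages))  # FAILED at: canary acceptance tests
```

Because the canary runs straight after the unit tests, the regression is reported without waiting for the slower, more production-like stages to run.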
The team also institutionalised ‘dogfooding’ by holding an internal competition after every iteration using the latest production version. The friendly contest between users pushed the system in novel and unexpected ways and uncovered problems (such as bottlenecks and exploits) that were not caught even during the exploratory testing stages.
Martin has a word of warning: teams that set out to build high-throughput, low-latency systems need the appropriate architectural skills and experience (for instance, to pick a suitable database technology or to avoid obvious performance bottlenecks), as well as the ability to write code that doesn’t go “against the grain” of the underlying hardware. Applying the YAGNI principle to such decisions may box the team in and could lead to expensive rework or embarrassing failure.
To read more profiles in this series, follow @benilov.