Big Data Profiles: interview with Martin Thompson

This is the first article in Big Data Profiles, a series profiling individuals and teams who have successfully delivered large-volume, data-driven systems.

Martin Thompson is a high-performance and low-latency specialist, and one of the creators of the Disruptor. While at LMAX and Betfair, Martin built and tested systems capable of handling hundreds of thousands of transactions per second with response times in the microseconds. The business domains of these systems (betting, trading) meant that uninterrupted service under extreme load was crucial.

Every transaction counts

When dealing with financial systems, even a single lost transaction can create bad publicity, put the viability of the whole system in question, and cause customers to seek legal recourse through regulators. Repeating failed transactions at a later point may not be possible, as the business opportunity has already been missed. The ability to explain and prove system behaviour (for instance, why a series of transactions was carried out in a particular order) and performant reporting on historical data (stretching into the distant past) are concerns that have to be baked into the design from the very start. Verifying that specific invariants are maintained is a central part of the functional testing.

Pipelines

One of Martin’s teams had great success using build pipelines (as described in Dave Farley and Jez Humble’s Continuous Delivery book): a series of stages (build and unit tests, integration tests and acceptance tests, tests of cross-functional requirements, exploratory tests) through which the software travels and is exercised under increasingly production-like configurations and environments. The pipeline was geared towards providing feedback to the team as quickly as possible; for instance, certain acceptance tests were moved into earlier test stages and acted as canaries, catching critical regressions (that only showed up during complex interactions between components) as soon as they were introduced.

Dogfooding

The team also institutionalised ‘dogfooding’ by holding an internal competition after every iteration using the latest production version. The friendly contest between users pushed the system in novel and unexpected ways and uncovered problems (such as bottlenecks and exploits) that were not caught even during the exploratory testing stages.

Martin has a word of warning: teams that undertake building high-throughput/low-latency systems need to have the appropriate architectural skills and experience (in order to, for instance, pick an appropriate database technology or avoid obvious performance bottlenecks), as well as the ability to write code that doesn’t go “against the grain” of the underlying hardware. Applying the YAGNI principle to such decisions may box the team in and could lead to expensive rework or embarrassing failure.

Release of DbFit 2.0.0 RC1

I am pleased to announce the release of DbFit 2.0.0 RC1. The binary and manual can be downloaded from the new DbFit homepage.

What’s included in this release?

What’s coming next

  • Combine DbFit fixtures with .NET FitNesse fixtures (.NET standalone mode).
  • A refresh of the documentation and tutorial.
  • Better documentation for those unfamiliar with FitNesse.

Failing fast in Rails with a smoke test initializer

The Rails app that I’m working on has a dependency on ImageMagick. In fact, if ImageMagick is either missing or broken in the app’s environment, the app is beyond repair and Rails shouldn’t even start.

To achieve this, I have added an initializer under config/initializers/dependency_tests.rb:

require 'minitest/unit'
require 'stringio'

# A smoke test for the app's external binary dependencies
class DependencyTests < MiniTest::Unit::TestCase
  def test_imagemagick_6_is_present
    assert_match(/Version: ImageMagick 6/, `convert`.split("\n")[0], 'ImageMagick 6 not present!')
  end
end

# Run the checks at boot, capturing minitest's output instead of printing it
MiniTest::Unit.output = StringIO.new
runner = MiniTest::Unit.new
runner.run
unless ENV['SKIP_TESTS']
  raise MiniTest::Unit.output.string if runner.failures + runner.errors > 0
end

Note: this particular example is specific to Ruby 1.9.3 (in the way that minitest is used), but it can easily be adapted to other versions of Ruby.

If the test fails, its error messages are raised on the console and Rails refuses to start. This approach has already saved me quite a few minutes of troubleshooting (most recently yesterday, when I was bringing everything up to scratch after the Mountain Lion upgrade).
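
The same pattern extends to any other binary the app cannot live without; each dependency simply gets its own test method. As a sketch (the Ghostscript check below is purely hypothetical, not something this particular app actually needs):

require 'minitest/unit'

class DependencyTests < MiniTest::Unit::TestCase
  def test_imagemagick_6_is_present
    assert_match(/Version: ImageMagick 6/, `convert`.split("\n")[0], 'ImageMagick 6 not present!')
  end

  # hypothetical second check: fail just as fast if Ghostscript goes missing
  def test_ghostscript_is_present
    assert_match(/\A\d+\.\d+/, `gs --version`.strip, 'Ghostscript not present!')
  end
end

Because only the raise is guarded by ENV['SKIP_TESTS'], setting that variable lets the app boot even when a check fails, which can be handy in environments where a given dependency genuinely doesn't matter.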

Making the most of Scala's XML templating

The Problem

On one of my current projects, the system receives a web request from a consumer, makes further requests to upstream services and collates a response. For the sake of illustration, let’s say that one of these upstream systems was IMDB and its response looked like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>  
<film id="12345">  
 <name>Avatar 5</name>  
 <actors>
 <actor>Tom Cruise</actor>
 <actor>Angelina Jolie</actor>  
 <actor>Johnny Depp</actor>  
 </actors>  
 <releasedates>  
 <releasedate country="fr-FR">18th December 2015</releasedate>  
 <releasedate country="de-DE">29th December 2015</releasedate>  
 <releasedate country="en-UK">1st December 2015</releasedate>  
 <releasedate country="en-US">4th July 2015</releasedate>  
 </releasedates>  
</film>  

Let’s assume that we want to verify that our system handles the above XML correctly. Because we want to avoid external dependencies in our integration tests, we fake IMDB out and build the fake’s response inline within the test:

fakes.imdb().toReturn(  
 film()  
 .withId("12345")  
 .withName("Avatar 5")  
 .withActors("Tom Cruise", "Angelina Jolie", "Johnny Depp")  
 .withAReleaseDate("fr-FR", "18th December 2015")  
 .withAReleaseDate("de-DE", "29th December 2015")  
 .withAReleaseDate("en-UK", "1st December 2015")  
 .withAReleaseDate("en-US", "4th July 2015")  
)  

(here film() is just a method that invokes new FilmBuilder())

We need some code which satisfies this API and produces the appropriate XML.

The Java Version

The original (Java) implementation looks like this:

import java.util.LinkedHashMap;
import java.util.Map;

public class FilmBuilder extends Builder {
  private String id = "";
  private String[] actors;
  private Map<String, String> releaseDates = new LinkedHashMap<String, String>();
  private String name;

  public String build() {
    String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n" + "<film id=\"" + id + "\">\n";
    xml += " <name>" + name + "</name>\n";
    if (actors != null) {
      xml += "<actors>";
      for (String actor: actors) {
        xml += "<actor>" + actor + "</actor>";
      }
      xml += "</actors>";
    }
    xml += " <releasedates>\n";
    xml = renderTextThing(xml, releaseDates, "releasedate");
    xml += " </releasedates>\n" + "</film>";
    return xml;
  }

  private String renderTextThing(String primePlaceXml, Map<String, String> textMap, String elementName) {
    for (Map.Entry<String, String> entry : textMap.entrySet()) {
      primePlaceXml += "<" + elementName + " country='" + entry.getKey() + "'>" + entry.getValue() + "</" + elementName + ">";
    }
    return primePlaceXml;
  }

  public FilmBuilder withId(String id) {
    this.id = id;
    return this;
  }

  public FilmBuilder withActors(String... actors) {
    this.actors = actors;
    return this;
  }

  public FilmBuilder withAReleaseDate(String country, String releaseDate) {
    this.releaseDates.put(country, releaseDate);
    return this;
  }

  public FilmBuilder withName(String name) {
    this.name = name;
    return this;
  }
}

This satisfies the requirements but is very meat and two veg (and ain’t really much of a looker). Perhaps we can improve on it.

Enter Scala’s XML Support

Scala supports XML natively (when trying to figure out the ins and outs, Daniel Spiewak’s blog post was the most useful documentation I could find online).

The Scala port of the FilmBuilder looks like this:

import java.util.LinkedHashMap
import java.util.Map
import xml.Unparsed

class FilmBuilder extends Builder {
  // JavaConversions lets the java.util.LinkedHashMap below be mapped over like a Scala collection
  import scala.collection.JavaConversions._

  def build: String = {
    """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>""" +
      <film id={id}>
        <name>{name}</name>  
        {
          if (actors != null) {
            <actors>{actors.map(name => <actor>{Unparsed(name)}</actor>)}</actors>
          }
        }  
       <releasedates>  
         {releaseDates.map(kv => <releasedate country={kv._1}>{kv._2}</releasedate>)}
       </releasedates>  
     </film>
  }

  def withId(id: String): FilmBuilder = { this.id = id; this }
  def withName(name: String): FilmBuilder = { this.name = name; this }
  def withActors(actors: String*): FilmBuilder = { this.actors = actors; this }
  def withAReleaseDate(language: String, dateString: String): FilmBuilder = {
    this.releaseDates.put(language, dateString); this
  }

  private var id: String = null
  private var name: String = null
  private var actors: Seq[String] = null
  private val releaseDates: Map[String, String] =
    new LinkedHashMap[String, String]
}

Interesting features and gotchas to mention:

  • Your XML needs to be valid markup. Your IDE’s compiler will be an angry red until you put that closing tag in. This is a nice contrast to the Java solution’s string building approach, which is very susceptible to leaving a dangling tag open somewhere accidentally.
  • You can use conditionals in your template. This is particularly useful when you have optional tags in your XML.
  • Collections get flattened correctly inline (which is pretty awesome). This means that you only really have to generate an enumeration of XML nodes and let Scala do the rest.
  • Inline strings get helpfully escaped. This does mean that if you have an inline string <tag>...</tag>, it will in fact be rendered as &lt;tag&gt;...&lt;/tag&gt;, which may not be exactly what you wanted. To prevent this from happening, you need to wrap your string in an Unparsed (I’ve done this above for the actor tag; see also the sketch after this list).
  • I like how you can recognise the XML structure in the Scala version. That isn’t really possible in the Java variant, with all its string-building noise.
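
To make the conditional, flattening and escaping points concrete, here is a small self-contained sketch (not taken from the project above; the element names and values are made up for illustration):

import scala.xml.{NodeSeq, Unparsed}

object XmlTemplatingGotchas extends App {
  val actors = Seq("Tom Cruise", "Angelina Jolie")
  val tagline: String = null // pretend this optional field wasn't supplied

  val xml =
    <film id="12345">
      {
        // conditional content: returning NodeSeq.Empty from the else branch
        // keeps the optional element out of the rendered output entirely
        if (tagline != null) <tagline>{tagline}</tagline> else NodeSeq.Empty
      }
      <actors>{
        // a sequence of nodes is flattened inline: one <actor> element per entry
        actors.map(a => <actor>{a}</actor>)
      }</actors>
      <escaped>{
        // a plain string is escaped, so this renders as &lt;b&gt;bold&lt;/b&gt;
        "<b>bold</b>"
      }</escaped>
      <unescaped>{
        // Unparsed passes the markup through verbatim
        Unparsed("<b>bold</b>")
      }</unescaped>
    </film>

  println(xml)
}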