DDD East Anglia 2017 In Review

Saturday 23 Sep 2017 at 21:00
Conferences  |  conferences development dotnet ddd


This past Saturday, 16th September 2017, the fourth DDD East Anglia event took place in Cambridge.  DDD East Anglia is a relatively new addition to the DDD event line-up but now it’s fourth event sees it going from strength to strength.


I’d made the long journey to Cambridge on the Friday evening and stayed in a local B&B to be able to get to the Hills Road College where DDD East Anglia was being held on the Saturday.  I arrived bright and early, registered at the main check-in desk and then proceeded to the college’s recital room just around the corner from the main building for breakfast refreshments.

After some coffee, it was soon time to head back to the main college building and up the stairs to the room where the first session of the day would commence.  My first session was to be Joseph Woodward’s Building A Better Web API With CQRS.


Joseph starts his session by defining CQRS.  It’s an acronym, standing for Command Query Responsibility Segregation.  Fundamentally, it’s a pattern for splitting the “read” models from your “write” models within a piece of software.  Joseph points out that we should beware when googling for CQRS as google seems to think it’s a term relating to cars!

CQRS was first coined by Greg Young and it’s very closely related to a prior pattern called CQS (Command Query Separation), originally coined by Bertrand Meyer which states that every method should either be a command which performs an action, or a query which returns data to the caller, but never both.  CQS primarily deals with such separations at a very micro level, whilst CQRS primarily deals with the separations at a broader level, usually along the seams of bounded contexts.  Commands will mutate state and will often be of a “fire and forget” nature.  They will usually return void from the method call.  Queries will return state and, since they don’t mutate any state are idempotent and safe.  We learn that CQRS is not an architectural pattern, but is more of a programming style that simply adheres to the the separation of the commands and queries.

Joseph continues by asking what’s the problem with some of our existing code that CQRS attempts to address.   We look at a typical IXService (where X is some domain entity in a typical business application):

public class ICustomerService
     void MakeCustomerPreferred(int customerId);
     Customer GetCustomer(int customerId);
     CustomerSet GetCustomersWithName(string name);
     CustomerSet GetPreferredCustomers();
     void ChangeCustomerLocale(int cutomerId, Locale newLocale);
     void CreateCustomer(Customer customer);
     void EditCustomerDetails(CustomerDetails customerDetails);

The problem here is that the interface ends up growing and growing and our service methods are simply an arbitrary collection of queries, commands, actions and other functions that happen to relate to a Customer object.  At this point, Joseph shares a rather insightful quote from a developer called Rob Pike who stated:

“The bigger the interface, the weaker the abstraction”

And so with this in mind, it makes sense to split our interface into something a bit smaller.  Using CQRS, we can split out and group all of our "read" methods, which are our CQRS queries, and split out and group our "write" methods (i.e. Create/Update etc.) which are our CQRS commands.  This will simply become two interfaces in the place of one, an ICustomerReadService and an ICustomerWriteService.

There's good reasons for separating our concerns along the lines of reads vs writes, too.  Quite often, since reads are idempotent, we'll utilise heavy caching to prevent us from making excessive calls to the database and ensure our application can return data in as efficient a manner as possible, whilst our write methods will always hit the database directly.  This leads on to the ability to have entirely different back-end architectures between our reads and our writes throughout the entire application.  For example, we can scale multiple read replica databases independently of the database that is the target for writes.  They could even be entirely different database platforms.

From the perspective of Web API, Joseph tells us how HTTP verbs and CQRS play very nicely together.  The HTTP verb GET is simply one of our read methods, whilst the verbs PUT, POST, DELETE etc. are all of our write concerns.  Further to this, Joseph looks at how we can often end up with MVC or WebAPI controllers that require services to be injected into them and often our controller methods end up becoming bloated from having additional concerns embedded within them, such as validation.  We then look at the command dispatcher pattern as a way of supporting our separation of reads and writes and also as a way of keeping our controller action methods lightweight.

There are two popular frameworks that implement the command dispatcher pattern in the .NET world. MediatR and Brighter.  Both frameworks allow us to define our commands using a plain old C# object (that implements specific interfaces provided by the framework) and also to define a "handler" to which the commands are dispatched for processing.  For example:

public class CreateUserCommand : IRequest
     public string EmailAddress { get; set; }
     // Other properties...

public class CreateUserCommandHandler : IAsyncRequestHandler<CreateUserCommand>
     public CreateUserCommandHandler(IUserRepository userRepository, IMapper mapper)
          _userRepository = userRepository;
          _mapper = mapper;

     public Task Handle(CreateUserCommand command)
          var user = _userRepository.Map<CreateUserCommand, UserEntity>(command);
          await _userRepository.CreateUser(user);

Using the above style of defining commands and handlers along with some rudimentary configuration of the framework to allow specific commands and handlers to be connected, we can move almost all of the required logic for reading and writing out of our controllers and into independent, self-contained classes that perform a single specific action.  This enables further decoupling of the domain and business logic from the controller methods, ensuring the controller action methods remain incredibly lightweight:

public class UserController : Controller
     private readonly IMediator _mediator;
     public UserController(IMediator mediator)
          _mediator = mediator;
     public async Task Create(CreateUserCommand user)
          await _mediator.Send(user);

Above, we can see that the Create action method has been reduced down to a single line.  All of the logic of creating the entity is contained inside the handler class and all of the required input for creating the entity is contained inside the command class.

Both the MediatR and Brighter libraries allow for request and post-request pre-processors.  This allows defining another class, again deriving from specific interfaces/base classes within the framework, which will be invoked before the actual handler class or immediately afterwards.  Such pre-processing if often a perfect place to put cross-cutting concerns such as validation:

public class CreateUserCommandValidation : AbstractValidation<CreateUserCommand>
     public CreateUserCommandValidation()
          RuleFor(x => x.EmailAddress).NotEmpty().WithMessage("Please specify an email address");

The above code shows some very simple example validation, using the FluentValidationlibrary, that can be hooked into the command dispatcher framework's request pre-processing to firstly validate the command object prior to invoking the handler, and thus saving the entity to the database.

Again, we've got a very nice and clean separation of concerns with this approach, with each specific part of the process being encapsulated within it's own class.  The input parameters, the validation and the actual creation logic.

Both MediatR and Brighter have an IPipelineBehaviour interface, which allows us to write code that hooks into arbitrary places along the processing pipeline.  This allows us to implement other cross-cutting concerns such as logging.  Something that's often required at multiple stages of the entire processing pipeline.

At this point, Joseph shares another quote with us.  This one's from Uncle Bob:

"If your architecture is based on frameworks then it cannot be based on your use cases"

From here, Joseph turns his talk to discuss how we might structure our codebases in terms of files and folders such that separation of concerns within the business domain that the software is addressing are more clearly realised.  He talks about a relatively new style of laying out our projects called Feature Folders (aka Feature Slices).

This involves laying out our solutions so that instead of having a single top-level "Controllers" folder, as is common in almost all ASP.NET MVC web applications, we instead have multiple folders named such that they represent features or specific areas of functionality within our software.  We then have the requisite Controllers, Views and other folders underneath those.   This allows different areas of the software to be conceptually decoupled and kept separate from the other areas.  Whilst this is possible in ASP.NET MVC today, it's even easier with the newer ASP.NET Core framework, and a NuGet package called AddFeatureFolders already exists that enables this exact setup within ASP.NET Core.

Joseph wraps up his talk by suggesting that we take a look at some of his own code on GitHub for the DDD South West website (Joseph is one of the organisers for the DDD South West events) as this has been written using the the CQRS pattern along with using feature folders for layout.


After Joseph's talk it's time for a quick coffee break, so we head back to the recital room around the corner from the main building for some liquid refreshment.  This time also accompanied by some very welcome biscuits!

After our coffee break, it's time to head back to the main building for the next session.  This one was to be Bart Read's Client-Side Performance For Back-End Developers.


Bart's session is all about how to maximise performance of client-side script using tools and techniques that we might employ when attempting to troubleshoot and improve the performance of our back-end, server-side code.  Bart starts by stating that he's not a client-side developer, but is more of a full stack developer.  That said, as a full stack developer, one is expected to perform at least some client-side development work from time to time.  Bart continues that in other talks concerning client-side performance, the speakers tend to focus on the page load times and page lifecycle, which whilst interesting and important, is a more a technology-centric way of looking at the problem.  Instead, Bart says that he wants to focus on RAIL, which was originally coined by Google.  This is an acronym for Response, Animation, Idle and Load and is a far more user-centric way of looking at the performance (or perhaps even just perceived performance) issue.  In order to explore this topic, Bart states that he learnt JavaScript and built his own arcade game site, Arcade.ly, which uses extensive JavaScript and other resources as part of the site.

We first look at Response.  For this we need to build a very snappy User Interface so that the user feels that the application is responding to them immediately.  Modern web applications are far more like desktop applications written using either WinForms or WPF than ever and users are very used to these desktop applications being incredibly responsive, even if lots of processing is happening in the background.  One way to get around this is to use "fake" pages.  These are pages that load very fast, usually without lots of dynamic data on them, that are shown to the user whilst asynchronous JavaScript loads the "real" page in the background.  Once fully loaded, the real page can be gracefully shown to the user.

Next, we look at Animation. Bart reminds us that animations help to improve the user perception of responsiveness of your user interface.  Even if your interface is performing some processing that takes a few milliseconds to run, loading and displaying an animation that the user can view whilst that processing to going on will greater enhance the perceived performance of the complete application.  We need to ensure that our animations always run at 60 fps (frames per second), anything less than this will cause them to look jerky and is not a good user experience.  Quite often, we need to perform some computation prior to invoking our animation and in this scenario we should ensure that the computation is ideally performed in less than 10 milliseconds.

Bart shares a helpful online utility called CanvasMark which provides benchmarking for HTML5 Canvas rendering.  This can be really useful in order to test the animations and graphics on your site and how they perform on different platforms and browsers

Bart then talks about using the Google Chrome Task Manager to monitor the memory usage of your page.  A lot of memory can be consumed by your page's JavaScript and other large resources.  Bart talks about his own arcade.ly site which uses 676MB of memory.  This might be acceptable on a modern day desktop machine, but it will not work so well on a more constrained mobile device.  He states that after some analysis of the memory consumption, most of the memory was consumed by the raw audio that was decompressed from loaded compressed audio in order to provide sound effects for the game.  By gracefully degrading the quality and size of the audio used by the site based upon the platform or device that is rendering the site, performance was vastly improved.

Another common pitfall is in how we write our JavaScript functions.  If we're going to be creating many instances of a JavaScript object, as can happen in a game with many individual items on the screen, we shouldn't attach functions directly to the JavaScript object as this creates many copies of the same function.  Instead, we should attach the function to the object prototype, creating a single copy of the function, which is then shared by all instances of the object and thus saving a lot of memory.  Bart also warns us to be careful of closures on our JavaScript objects as we may end up capturing far more than we actually need.

We now move onto Idle.   This is all about deferring work as the main concern for our UI is to respond to the user immediately.  One approach to this is to use Web Workers to perform work at a later time.  In Bart's case, he says that he wrote his own Task Executor which creates an array of tasks and uses the builtin JavaScript setTimeout function to slowly execute each of the queued tasks.  By staggering the execution of the queued tasks, we prevent the potential for the browser to "hang" with a white screen whilst background processing is being performed, as can often happen if excessive tasks and processing is performed all at once.

Finally, we look at Load.  A key take away of the talk is to always use HTTP/2 if possible.  Just by switching this on alone, Bart says you'll see a 20-30% improvement in performance for free.  In order to achieve this, HTTP/2 provides us with request multiplexing, which bundles requests together meaning that the browser can send multiple requests the the server in one go.  These requests won't necessarily respond any quicker, but we do save on the latency overhead we would incur if sending of each request separately.  HTTP/2 also provides server push functionality, stream priority and header compression.  It also has protocol encryption, which whilst not an official part of the HTTP/2 specification, is currently mandated by all browsers that support the HTTP/2 protocol, effectively making encryption compulsory.  HTTP/2 is widely supported across all modern browsers on virtually all platforms, with only Opera Mini being only browser without full support, and HTTP/2 is also fully supported within most of today's programming frameworks.  For example, the .NET Framework has supported HTTP/2 since version 4.6.0.  One other significant change when using HTTP/2 is that we no longer need to "bundle" our CSS and JavaScript resources.  This also applies to "spriting" of icons as a single large image.

Bart moves on to talk about loading our CSS resources and he suggests that one very effective approach is to inline the bare minimum CSS we would require to display and render our "above the fold" content with the rest of the CSS being loaded asynchronously.  The same applies to our JavaScript files, however, there can be an important caveat to this.  Bart explains how he loads some of his JavaScript synchronously, which itself negatively impacts performance, however, this is required to ensure that the asynchronously loaded 3rd-party JavaScript - over which you have no control - does not interfere with your own JavaScript as the 3rd-party JavaScript is loaded at the very last moment whilst Bart's own JavaScript is loaded right up front.  We should look into using DNS Prefetch to force the browser to perform DNS Lookups ahead of time for all of the domains that our site might reference for 3rd party resources.  This incurs a one off small performance impact as the page first loads, but makes subsequent requests for 3rd party content much quicker.

Bart warns us not to get too carried away putting things in the HEAD section of our pages and instead we should focus on getting the "above the fold" content to be as small as possible, ideally it should be all under 15kb, which is the size of data that can fit in a single HTTP packet.  Again, this is a performance optimization that may not have noticeable impact on desktop browsers, but can make a huge difference on mobile devices, especially if they're using a slow connection.  We should always check the payload size of our sites and ensure that we're being as efficient as possible and not sending more data than is required.  Further to this, we should use content compression if our web server supports it.  IIS has supported content compression for a long time now, however, we should be aware of a bug that affects IIS version 8 and possibly version 9 which turns off compression for chunked content. This bug was fixed in IIS version 10.

If we're using libraries or frameworks in our page, ensure we only deliver the required parts.  Many of today's libraries are componentized, allowing the developer to only include the parts of the library/framework that they actually need and use.  Use Content Delivery Networks if you're serving your site to many different geographic areas, but also be aware that, if your audience is almost exclusively located in a single geographic region, using a CDN could actually slow things down.  In this case, it's better to simply serve up your site directly from a server located within that region.

Finally, Bart re-iterates.  It's all about Latency.   It's latency that slows you down significantly and any performance optimizations that can be done to remove or reduce latency will improve the performance, or perceived performance, of your websites.


After Bart's talk, it's time for another coffee break.  We head back to the recital room for further coffee and biscuits and after a short while, it's time for the 3rd session of the day and the last one prior to lunch.  This session is to be a Visual Note Taking Workshop delivered by Ian Johnson.

As Ian's session was an actual workshop, I didn't take too many notes but instead attempted to take my notes visually using the technique of Sketch-Noting that Ian was describing.

Ian first states that Sketch-Noting is still mostly about writing words.  He says that most of us, as developers using keyboards all day long, have pretty terrible hand writing so we simply need to practice more at it.  Ian suggests to avoid all caps words and cursive writing, using a simple font and camel cased lettering (although all caps is fine for titles and headings).  Start bigger to get the practice of forming each letter correctly, then write smaller and smaller as you get better at it.  You'll need this valuable skill since Sketch-Noting requires you to be able to write both very quickly and legibly. 

At this point, I put my laptop away and stopped taking written notes in my text editor and tried to actually sketch-note the rest of Ian's talk, which gave us many more pointers and advice on how to construct our Sketch Notes.  I don't consider myself artistic in the slightest, but Ian insists that Sketch Notes don't really rely on artistic skill, but more on the skill of being able to capture the relevant information from a fast-moving talk.  I didn't have proper pens for my Sketch Note and had to rely solely on my biro, but here in all its glory is my very first attempt at a Sketch Note:


After Ian's talk was over, it was time for lunch.  All the attendees reconvened in the recital room where we could help ourselves to the lunch kindly provided by the conference organizers and paid for by the sponsors.  Lunch was the usual brown bag affair consisting of a sandwich, some crisps a chocolate bar, a piece of fruit and a can of drink.  I took the various items for my lunch and the bag and proceeded to wander just outside the recital room to a small green area with some tables.   It was at this point that the weather decided to turn against us and is started raining very heavily.  I made a hasty retreat back inside the recital room where it was warm and dry and proceeded to eat my lunch there.


There were some grok talks taking place over the lunch time, but considering the weather and the fact the the grok talk were taking place in the theatre room which was the further point from the recital room, I decided against attending them and chose to remain warm and dry instead.

After lunch, it was time to head back to the main building for the next session, this one was to be Nathan Gloyn's Microservices - What I've Learned After A Year Building Systems.


Nathan first introduces himself and states that he's a contract developer.  As such, he's been involved in two different projects over the previous 12 months that have been developed using a microservices architecture.  We first asked to consider the question of why should we use microservices?  In Nathan's experience so far, he says, Don't!  In qualifying that statement, Nathan states that microservices are ideal if you need only part of a system to scale, however, for the majority of applications, the benefits to adopting a microservices architecture doesn't outweigh the additional complexity that is introduced.

Nathan state that building a system composed of microservices requires a different way of thinking.  With more monolithic applications, we usually scale them by scaling out - i.e. we use the same monolithic codebase for the website and simply deploy it to more machines which all target the same back-end database.  Microservices don't really work like this, and need to be individually small and simple.  They may even have their own individual database just for the individual service.

Microservices are often closely related to Domain-driven Design's Bounded Contexts so it's important to correctly identify the business domain's bounded contexts and model the microservices after those.  Failure to do this runs the risk that you'll create a suite of mini-monoliths rather than true microservices.

Nathan reminds us that we are definitely going to need a messaging system for an application built with a microservice architecture.  It's simply not an option not to use one as virtually all user interactions will be performed across multiple services.  Microservices are, by their very nature, entirely distributed.  Even simple business processes can often require multiple services and co-ordination of those services.  Nathan says that it's important not to build any messaging into the UI layer as you'll end up coupling the UI to the domain logic and the microservice which is something to be avoided.  One option for a messaging system is NServiceBus, which is what Nathan is currently using, however many other options exist.   When designing the messaging within the system, it's very important to give consideration to versioning of messages and message contracts.  Building in versioning from the beginning ensures that you can deploy individual microservices independently rather than being forced to deploy large parts of the system - perhaps multiple microservices - together if they're all required to use the exact same message version.

We next look at the difference between "fat" versus "thin" services.  Thin services generally only deal with data that they "own", if the thin service needs other data for processing, they must request it from the other service that owns that data.  Fat services, however, will hold on to data (albeit temporarily) that actually "belongs" to other services in order to perform their own processing.  This results in coupling between the services, however, the coupling of fat and thin services is different as fat services are coupled by data whereas thin services are coupled by service.

With microservices, cross-cutting concerns such as security and logging become even more important than ever.  We should always ensure that security is built-in to every service from the very beginning and is treated as a first class citizen of the service.  Enforcing the use of HTTPS across the board (i.e. even when running in development or test environments as well as production) helps to enforce this as a default choice.

We then look at how our system's source code can be structured for a microservices based system.  It's possible to use either one source control repository or multiple and there's trade-offs against both options.  If we use a single repository, that's really beneficial during the development phase of the project, but is not so great when it comes to deployment.  On the other hand, using multiple repositories, usually separated by microservice, is great for deployment since each service can be easily integrated and deployed individually, but it's more cumbersome during the development phase of the project.

It's important to remember that each microservice can be written using it's own technology stack and that each service could use an entirely different stack to others.  This can be beneficial if you have different team with different skill sets building the different services, but it's important to remember to you'll need to constantly monitor each of thetechnology stacks that you use for security vulnerabilities and other issues that may arise or be discovered over time.  Obviously, the more technology stacks you're using, the more time-consuming this will be.

It's also important to remember that even when you're building a microservices based system, you will still require shared functionality that will be used by multiple services.  This can be built into each service or can be separated out to become a microservice in it's own right depending upon the nature of the shared functionality.

Nathan talks about user interfaces to microservice based systems.  These are often written using a SPA framework such as Angular or React.  They'll often go into their own repository for independent deployment, however, you should be very careful that the front-end user interface part of the system doesn't become a monolith in itself.  If the back-end is nicely separated into microservice based on domain bounded contexts, the front-end should be broken down similarly too.

Next we look at testing of a microservice based system.  This can often be a double-edged sword as it's fairly easy to test a single microservice with its known good (or bad) inputs and outputs, however, much of the real-world usage of the system will be interactions that span multiple services so it's important to ensure that you're also testing the user's path through multiple services.  This can be quite tricky and there's no easy way to achieve this.  It's often done using automated integration testing via the user interface, although you should also ensure you do test the underlying API separately to ensure that security can't be bypassed.

Configuration of the whole system can often be problematic with a microservice based system.  For this reason, it's usually best to use a separate configuration management system rather than trying to implement things like web.config transforms for each service.  Tools like Consul or Spring Cloud Config are very useful here.

Data management is also of critical importance.  It should be possible to change data within the system's data store without requiring a deployment.  Database migrations are a key tool is helping with this.  Nathan mentions both Entity Framework Migrations and also FluentMigrator as two good choices.  He offers a suggestion for things like column renames and suggests that instead of a migration that renames the column, create a whole new column instead.  That way, if the change has to be rolled back, you can simply remove the new column, leaving the old column (with the old name) in place.  This allows other services that may not be being deployed to continue to use the old column until they're also updated.

Nathan then touches on multi-tenancy within microservice based systems and says that if you use the model of a separate database per tenant, this can lead to a huge explosion of databases if your microservices are using multiple databases for themselves.  It's usually much more manageable to have multi-tenancy by partitioning tenant data within a single database (or the database for each microservice).

Next, we look at logging and monitoring within our system.  Given the distributed nature of a microservice based system, it's important to be able to log and understand the complete user interaction even though logging is done individually by individual microservices.  To facilitate understanding the entire end-to-end interaction we can use a CorrelationID for this.  It's simply a unique identifier that travels through all services, passed along in each message and written to the logs of each microservice.  When we look back at the complete set of logs, combined from the disparate services, we can use the CorrelationID to correlate the log messages into a cohesive whole.  With regard to monitoring, it's also critically important to monitor the entire system and not just the individual services.  It's far more important to know how healthy the entire system is rather than each service, although monitoring services individually is still valuable.

Finally, Nathan shares some details regarding custom tools.  He says that, as a developer on a microservice based system, you will end up building many custom tools.  These will be such tools as bulk data loading whereby getting the data into the database requires processing by a number of different services and cannot simply be directly loaded into the database.  He says that despite the potential downsides of working on such systems, building the custom tools can often be some of the more enjoyable parts of building the complete system.

After Nathan's talk it was time for the last coffee break of the day, after which it was time for the final day's session.  For me, this was Monitoring-First Development by Benji Weber.


Benji starts his talk by introducing himself and says that he's mostly a Java developer but that he done some .NET and also writes JavaScript as well.  He works at an "ad-tech" company in London.  He wants to first start by talking about Extreme Programming as this is a style of programming that he uses in his current company.  We look at the various practices within Extreme Programming (aka XP) as many of these practices have been adopted within wider development styles, even for teams that don't consider themselves as using XP.  Benji says that, ultimately, all of the XP practices boils down to one key thing - Feedback.  They're all about getting better feedback, quicker.  Benji's company uses full XP with a very collaborative environment, collectively owning all the code and the entire end-to-end process from design/coding through to production, releasing small changes very frequently.

As part of this style adopted by the teams, it's lead onto the adoption of something they term Monitor-Driven Development.  This is simply the idea that monitoring of all parts of a software system is the core way to get feedback on that system, both when the system is being developed and as the system is running in production.  Therefore, all new feature development starts by asking the question, "How will we monitor this system?" and then ensuring that the ability to deeply monitor all aspects of the system being developed is a front and centre concern throughout the development cycle.

To illustrate how the company came to adopt this methodology, Benji shares three short stories with us.  The first started with an email to the development team with the subject of "URGENT".  It was from some sales people trying to demonstrate some part of the software and they were complaining that the graphs on the analytics dashboard weren't loading for them.  Benji state that this was a feature that was heavily tested in development so the error seemed strange.   After some analysis into the problem, it was discovered that data was the root cause of the issue and that the development team had underestimated the way in which the underlying data would grow due to users doing unanticipated things on the system, which the existing monitoring that the team had in place didn't highlight.  The second story involves the discovery that 90% of the traffic their software was serving from the CDN was HTTP 500 server errors!  Again, after analysis it was discovered that the problem lay in some JavaScript code that a recently released new version of Internet Explorer was interpreting different from the old version and that this new version was caused the client-side JavaScript to continually make requested to a non-existent URL.  The third story involves a report from an irate client that the adverts being served up by the company's advert system was breaking the client's own website.  Analysis showed that this was again caused by a JavaScript issue and a line of code of self = this; that was incorrectly written using a globally-scoped variable, thereby overwriting variables that the client's own website relied upon.  The common theme throughout all of the stories was that the behaviour of the system had changed, even though no code had changed.   Moreover, all of the problems that arose from the changed behaviour were first discovered by the system's user and not the development team.

Benji references Google's own Site Reliability Engineering book (available to read online for free) which states that 70% of the reasons behind things breaking is because you've changed something.  But this leaves a large 30% of the time where the reasons are something that's outside of your control.  So how did Benji approach improving his ability to detect and respond to issues?  He started by looking at causes vs problems and concluded that they didn't have enough coverage of the problems that occurred.  Benji tells us that it's almost impossible to get sufficient coverage of the causes since there's an almost infinite number of things that could happen that could cause the problem.

To get better coverage of the problems, they universally adopted the "5 whys" approach to determining the root issues.  This involves starting with the problem and repeated asking "why?" to each cause to determine the root cause.  An example is, monitoring is hard. Why is it hard since we don't have the same issue when using Test-Driven Development during coding?  But TDD follows a Red - Green - Refactor cycle so you can't really write untestable code etc.


So Benji decided to try to apply the Test-Driven Development principles to monitoring.  Before even writing the feature, they start by determining how the feature will be monitored, then only after determining this, they start work on writing the feature ensuring that the monitoring is not negatively impacted.  In this way, the monitoring of the feature becomes the failing unit test that the actual feature implementation must make "go green".

Benji shares an example of show this is implemented and says that the "failing test" starts with a rule defined within their chosen monitoring tool, Nagios.  This rule could be something like "ensure that more adverts are loaded than reported page views", whereby the user interface is checked for specific elements or a specific response rendering.  This test will show as a failure within the monitoring system as the feature has not yet been implemented, however, upon implementation of the feature, the monitoring test will eventually pass (go green) and there we have a correct implementation of the feature driven by the monitoring system to guide it.  Of course, this monitoring system remains in place with these tests increasing over time and becoming an early warning system should any part of the software, within any environment, start to show any failures.  This ensure that the development team are the first to know of any potential issues, rather than the users being first to know.

Benji says they use a pattern called the Screenplay pattern for their UI based tests.  It's an evolution of the Page Objects pattern and allows highly decoupled tests as well as bringing the SOLID principles to the tests themselves.  He also states that they make use of Feature Toggles not only when implementing new features and functionality but also when refactoring existing parts of the system.  This allows them to test new monitoring scenarios without affecting older implementations.  Benji states that it's incredibly important to follow a true Red - Green - Refactor cycle when implementing monitoring rules and that you should always see your monitoring tests failing first before trying to make them pass/go green.

Finally, Benji says that adopting a monitoring-driven development approach ultimately helps humans too.  It helps in future planning and development efforts as it builds awareness of how and what to think about when designing new systems and/or functionality.


After Benji's session was over, it was time for all the attendees to gather back in the theatre room for the final wrap-up by the organisers and the prize draw.  After thanking the various people involved in making the conference what it is (sponsors, volunteers, organisers etc.) it was time for the prize draw.  There were some good prizes up for grabs, but alas, I wasn’t to be a winner on this occasion.  The DDD East Anglia 2017 event had been a huge success and it was all the more impressive given that the organisers shared the story that their original venue had pulled out only 5 weeks prior to the event!  The new venue had stepped in at the last minute and held an excellent conference which was  completely seamless to the attendees.  We would never have known of the last minute panic had it not been shared with us.  Here's looking forward to the next DDD East Anglia event next year.