DDD South West 2017 In Review

IMG_20170506_090757This past Saturday 6th May 2017, the seventh iteration of the DDD South West conference was held in Bristol, at the Redcliffe Sixth Form Centre.

This was my second DDD South West conference that I’d attended, having attended the prior DDD South West conference in 2015 (they’d had a year off in 2016), so returning the 2017, the DDD South West team had put on another fine DDD conference.

Since the conference is in Bristol, I’d stayed over the evening before in Gloucester in order to break up the long journey down to the south west.  After an early breakfast on the Saturday morning, I made the relatively short drive from Gloucester to Bristol in order to attend the conference.

IMG_20170506_085557Arriving at the Redcliffe centre, we were signed in and made our way to the main hall where we were treated to teas, coffees and danish pastries.  I’d remembered these from the previous DDDSW conference, and they were excellent and very much enjoyed by the arriving conference attendees.

After a short while in which I had another cup of coffee and the remaining attendees arrived at the conference, we had the brief introduction from the organisers, running through the list of sponsors who help to fund the event and some other house-keeping information.  Once this was done it was time for us to head off to the relevant room to attend our first session.  DDDSW 7 had 4 separate tracks, however, for 2 of the event slots they also had a 5th track which was the Sponsor track – this is where one of the technical people from their primary sponsor gives a talk on some of the tech that they’re using and these are often a great way to gain insight into how other people and companies do things.

I heading off down the narrow corridor to my first session.  This was Ian Cooper’s 12-Factor Apps In .NET.

IMG_20170506_092930Ian starts off my introducing the 12 factors, these are 12 common best practices for building web based applications, especially in the modern cloud-first world.  There’s a website for the 12 factors that details them.  These 12 factors and the website itself were first assembled by the team behind the Heroku cloud application platform as the set of best practices that most of the most successful applications running on their platform had.

Ian states how the 12 factors cover 3 broad categories: Design, Build & Release and Management.  We first look at getting setup with a new project when you’re a new developer on a team.  Setup automation should be declarative.  It should ideally take no more than 10 minutes for a new developer to go from a clean development machine to a working build of the project.  This is frequently not the case for most teams, but we should work to reduce the friction and the time it takes to get developers up and running with the codebase.  Ian also state that a single application, although it may consist of many different individual projects and “moving parts” should remain together in a single source code repository.  Although you’re keeping multiple distinct parts of the application together, you can still easily build and deploy the individual parts independently based upon individual .csproj files.

Next, we talk about application servers.  We should try to avoid application servers if possible.  What is an application server?  Well, it can be many things, but the most obvious one in the world of .NET web applications is Internet Information Server itself.  This is an application server and we often require this for our code to run within it.  IIS has a lot of configuration of it’s own and so this means that our code is then tied to and dependent upon that specifically configured application server.  An alternative to this is have our code self-host, which is entirely possible with ASP.NET Web API and ASP.NET Core.  By self-hosting and running our endpoint of a specific port to differentiate it from other services that may also potentially run on the same server, we ensure our code is far more portable, and platform agnostic.

Ian then talks about backing services.  These are the concerns such as databases, file systems etc. that our application will inevitably need to talk to.  But we need to ensure that we treat them as the potentially ephemeral systems that they are, and therefore accept that such a system may not even have a fixed location.  Using a service discovery service, such as Consul.io, is a good way to remove our application’s dependence on a a required service being in a specific place.

Ian mentions the port and adapters architecture (aka hexagonal architecture) for organising our codebase itself.  He likes this architecture as it’s not only a very clean way to separate concerns, and keep the domain at the heart of the application model, it also works well in the context of a “12-factor” compliant application as the terminology (specifically around the use of “ports”) is similar and consistent.  We then look at performance of our overall application.  Client requests to our front-end website should be responded to in around 200-300 milliseconds.  If, as part of that request, there’s some long-running process that needs to be performed, we should offload that processing to some background task or external service, which can update the web site “out-of-band” when the processing is complete, allowing the front-end website to render the initial page very quickly.

This leads on to talking about ensuring our services can start up very quickly, ideally no more than a second or two, and should shutdown gracefully also.  If we have slow startup, it’s usually because of the need to built complex state, so like the web front-end itself, we should offload that state building to a background task.  If our service absolutely needs that state before it can process incoming requests, we can buffer or queue early request that the service receives immediately after startup until our background initialization is complete.  As an aside, Ian mentions supervisorD as a handy tool to keep our services and processes alive.  One great thing about keeping our startup fast and lean, and our shutdown graceful, we essentially have elastic scaling with that service!

Ian now starts to show us some demo code that demonstrates many of the 12 best practices within the “12-factors”.  He uses the ToDoBackend website as a repository of sample code that frequently follows the 12-factor best practices.  Ian specifically points out the C# ASP.NET Core implementation of ToDoBackend as it was contributed by one of his colleagues.  The entire ToDoBackend website is a great place where many backend frameworks from many different languages and platforms can be compared and contrasted.

Ian talks about the ToDoBackend ASP.NET Core implementation and how it is built to use his own libraries Brighter and Darker.  These are open-source libraries implementing the command dispatcher/processor pattern which allow building applications compliant with the CQRS principle, effectively allowing a more decomposed and decoupled application.  MediatR is a similar library that can be used for the same purposes.  The Brighter library wraps the Polly library which provides essential functionality within a highly decomposed application around improved resilience and transient fault handling, so such things as retries, the circuit breaker pattern, timeouts and other patterns are implemented allowing you to ensure your application, whilst decomposed, remains robust to transient faults and errors – such as services becoming unavailable.

Looking back at the source code, we should ensure that we explicitly isolate and declare our application’s dependencies.  This means not relying on pre-installed frameworks or other library code that already exists on a given machine.  For .NET applications, this primarily means depending on only locally installed NuGet packages and specifically avoiding referencing anything installed in the GAC.  Other dependencies, such as application configuration, and especially configuration that differs from environment to environment – i.e. database connection strings for development, QA and production databases, should be kept within the environment itself rather than existing within the source code.  We often do this with web.config transformations although it’s better if our web.config merely references external environment variables with those variables being defined and maintained elsewhere. Ian mentions HashiCorp’s Vault project and the Spring Boot projects as ways in which you can maintain environment variables and other application “secrets”.  An added bonus of using such a setup is that your application secrets are never added to your source code meaning that, if your code is open source and available on something like GitHub, you won’t be exposing sensitive information to the world!

Finally, we turn to application logging.  Ian states how we should treat all of our application’s logs as event streams.  This means we can view our logs as a stream of aggregated, time-ordered events from the application and should flow continuously so long as the application is running.  Logs from different environments should be kept as similar as possible. If we do both of these things we can store our logs in a system such as Splunk, AWS CloudWatch Logs, LogEntries or some other log storage engine/database.  From here, we can easily manipulate and visualize our application’s behaviour from this log data using something like the ELK stack.

After a few quick Q&A’s, Ian’s session was over and it was time to head back to the main hall where refreshments awaited us.  After a nice break with some more coffee and a few nice biscuits, it was quickly time to head back through the corridors to the rooms where the sessions were held for the next session.  For this next session, I decided to attend the sponsors track, which was Ed Courtenay’s “Growing Configurable Software With Expression<T>

IMG_20170506_105002Ed’s session was a code heavy session which used a single simple application based around filtering a set of geographic data against a number of filters.  The session looked at how configurability of the filtering as well as performance could be improved with each successive iteration refactoring the application to using an ever more “expressive” implementation.  Ed starts by asking what we define as configurable software.  We all agree that software that can change it’s behaviour at runtime is a good definition, and Ed says that we can also think of configurable software as a way to build software from individual building blocks too.  We’re then introduced to the data that we’ll be using within the application, which is the geographic data that’s freely available from geonames.org.

From here, we dive straight into some code.  Ed shows us the IFilter interface that exposes a single method which provide a filter function:

using System;

namespace ExpressionDemo.Common
{
    public interface IFilter
    {
        Func<IGeoDataLocation, bool> GetFilterFunction();
    }
}

The implementation of the IFilter interface is fairly simple and combines a number of actual filter functions that filter the geographic data by various properties, country, classification, minimum population etc.

public Func<IGeoDataLocation, bool> GetFilterFunction()
{
    return location => FilterByCountryCode(location) && FilterByClassification(location)
    && FilterByPopulation(location) && FilterByModificationDate(location);
}

Each of the actual filtering functions is initially implemented as a simple loop, iterating over the set of allowed data (i.e. the allowed country codes) and testing the currently processed geographic data row against the allowed values to determine if the data should be kept (returned true from the filter function) or discarded (returning false):

private bool FilterByCountryCode(IGeoDataLocation location)
{
	if (!_configuration.CountryCodes.Any())
		return true;
	
	foreach (string countryCode in _configuration.CountryCodes) {
		if (location.CountryCode.Equals(countryCode, StringComparison.InvariantCultureIgnoreCase))
			return true;
	}
	return false;
}

Ed shows us his application with all of his filters implemented in such a way and we see the application running over a reduced geographic data set of approximately 10000 records.  We process all of the data in around 280ms, however Ed tells us that it's really a very naive implementation and with only a small improvement in the filter implementations, we can improve this.  From here, we look at the same filters, this time implemented as a Func<T, bool>:

private Func<IGeoDataLocation, bool> CountryCodesExpression()
{
	IEnumerable<string> countryCodes = _configuration.CountryCodes;

	string[] enumerable = countryCodes as string[] ?? countryCodes.ToArray();

	if (enumerable.Any())
		return location => enumerable
			.Any(s => location.CountryCode.Equals(s, StringComparison.InvariantCultureIgnoreCase));

	return location => true;
}

We're doing the exact same logic, but instead of iterating over the allow country codes list, we're being more declarative and simply returning a Func<> which performs the selection/filter for us.  All of the other filters are re-implemented this way, and Ed re-runs his application.  This time, we process the exact same data in around 250ms.  We're starting to see an improvement.  But we can do more.  Instead of returning Func<>'s we can go even further and return an Expression.

We pause looking at the code to discuss Expressions for a moment.  These are really "code-as-data".  This means that we can decompose our algorithm that performs some specific type of filtering and can then express that algorithm as a data structure or tree.  This is a really powerful way or expressing our algorithm and allows our application simply functionally "apply" our input data to the expression rather than having to iterate or loop over lists as we had in the very first implementation.  What this means for our algorithms is increased flexibility in the construction of the algorithm, but also increase performance.  Ed shows us the implementation filter using an Expression:

private Expression<Func<IGeoDataLocation, bool>> CountryCodesExpression()
{
	var countryCodes = _configuration.CountryCodes;

	var enumerable = countryCodes as string[] ?? countryCodes.ToArray();
	if (enumerable.Any())
		return enumerable.Select(CountryCodeExpression).OrElse();

	return location => true;
}

private static Expression<Func<IGeoDataLocation, bool>> CountryCodeExpression(string code)
{
	return location => location.CountryCode.Equals(code, StringComparison.InvariantCultureIgnoreCase);
}

Here, we've simply taken the Func<> implementation and effectively "wrapped" the Func<> in an Expression.  So long as our Func<> is simply a single expression statement, rather than a multi-statement function wrapped in curly brackets, we can trivially turn the Func<> into an

Expression:

public class Foo
{
	public string Name
	{
		get { return this.GetType().Name; }
	}
}

// Func<> implementation
Func<Foo,string> foofunc = foo => foo.Name;
Console.WriteLine(foofunc(new Foo()));

// Same Func<> as an Expression
Expression<Func<Foo,string>> fooexpr = foo => foo.Name;
var func = fooexpr.Compile();
Console.WriteLine(func(new Foo()));

Once again, Ed refactors all of his filters to use Expression functions and we again run the application.  And once again, we see an improvement in performance, this time processing the data in around 240ms.  But, of course, we can do even better!

The next iteration of the application has us again using Expressions, however, instead of merely wrapping the previously defined Func<> in an Expression, this time we're going to write the Expressions from scratch.  At this point, the code does admittedly become less readable as a result of this, however, as an academic exercise in how to construct our filtering using ever more lower-level algorithm implementations, it's very interesting.  Here's the country filter as a hand-rolled Expression:

private Expression CountryCodeExpression(ParameterExpression location, string code)
{
	return StringEqualsExpression(Expression.Property(location, "CountryCode"), Expression.Constant(code));
}

private Expression StringEqualsExpression(Expression expr1, Expression expr2)
{
	return Expression.Call(typeof (string).GetMethod("Equals",
		new[] { typeof (string), typeof (string), typeof (StringComparison) }), expr1, expr2,
		Expression.Constant(StringComparison.InvariantCultureIgnoreCase));
}

Here, we're calling in the Expression API and manually building up the expression tree with our algorithm broken down into operators and operands.  It's far less readable code, and it's highly unlikely you'd actually implement your algorithm in this way, however if you're optimizing for performance, you just might do this as once again we run the application and observe the results.  It is indeed slightly faster than the previous Expression function implementation, taking around 235ms.  Not a huge improvement over the previously implementation, but an improvement nonetheless.

Finally, Ed shows us the final implementation of the filters.  This is the Filter Implementation Bridge.  The code for this is quite complex and hairy so I've not reproduced it here.  It is, however, available in Ed's Github repository which contains the entire code base for the session.

The Filter Implementation Bridge involves building up our filters as Expressions once again, but this time we go one step further and write the resulting code out to a pre-compiled assembly.  This is a separate DLL file, written to disk, which contains a library of all of our filter implementations.  Our application code is then refactored to load the filter functions from the external DLL rather than expecting to find them within the same project.  Because the assembly is pre-compiled and JIT'ed, when we invoke the assembly's functions, we should see another performance improvement.  Sure enough, Ed runs the application after making the necessary changes to implement this and we do indeed see an improvement in performance.  This time processing the data set in around 210ms.

Ed says that although we've looked at performance and the improvements in performance with each successive refactor, the same refactorings have also introduced flexibility in how our filtering is implemented and composed.  By using Expressions, we can easily change the implementation at run-time.  If we have these expressions exported into external libraries/DLL's, we could fairly trivially decide to load a different DLL containing a different function implementation.  Ed uses the example of needing to calculate VAT and the differing ways in which that calculation might need to be implemented depending upon country.  By adhering to a common interface and implementing the algorithm as a generic Expression, we can gain both flexibility and performance in our application code.

After Ed’s session it time for another refreshment break.  We heading back through the corridors to the main hall for more teas, coffees and biscuits.  Again, with only enough time for a quick cup of coffee, it was time again to trek back to the session rooms for the 3rd and final session before lunch.  This one was Sandeep Singh’s “JavaScript Services: Building Single-Page Applications With ASP.NET Core”.

IMG_20170506_120854Sandeep starts his talk with an overview of where Single Page Applications (SPA’s) are currently.  There’s been an explosion of frameworks in the last 5+ years that SPA’s have been around, so there’s an overwhelming choice of JavaScript libraries to choose from for both the main framework (i.e. Angular, Aurelia, React, VueJS etc.) and for supporting JavaScript libraries and build tools.  Due to this, it can be difficult to get an initial project setup and having a good cohesion between the back-end server side code and the front-end client-side code can be challenging.  JavaScriptServices is a part of the ASP.NET Core project and aims to simplify and harmonize development for ASP.NET Core developers building SPA’s.  The project was started by Steve Sanderson, who was the initial creator of the KnockoutJS framework.

IMG_20170506_121721Sandeep talks about some of the difficulties with current SPA applications.  There’s often the slow initial site load whilst a lot of JavaScript is sent to the client browser for processing and eventual display of content, and due to the nature of the application being a single page and using hash-based routing for each page’s URL, and each page being loaded because of running some JavaScript, SEO (Search Engine Optimization) can be difficult.  JavaScriptServices attempts to improve this by combining a set of SPA templates along with some SPA Services, which contains WebPack middleware, server-side pre-rendering of client-side code as well as routing helpers and the ability to “hot swap” modules on the fly.

WebPack is the current major module bundler for JavaScript, allowing many disparate client side asset files, such as JavaScript, images, CSS/LeSS,Sass etc. to be pre-compiled/transpiled and combined in a bundle for efficient delivery to the client.  The WebPack middleware also contains tooling similar to dotnet watch, which is effectively a file system watcher and can rebuild, reload and re-render your web site at development time in response to real-time file changes.  The ability to “hot swap” entire sets of assets, which is effectively a whole part of your application, is a very handy development-time feature.  Sandeep quickly shows us a demo to explain the concept better.  We have a simple SPA page with a button, that when clicked will increment a counter by 1 and display that incrementing count on the page.  If we edit our Angular code which is called in response to the button click, we can change the “increment by 1” code to say “increment by 10” instead.  Normally, you’d have to reload your entire Angular application before this code change was available to the client browser and the built-in debugging tools there.  Now, however, using JavaScript services hot swappable functionality, we can edit the Angular code and see it in action immediately, even continuing the incrementing counter without it resetting back to it’s initial value of 0!

Sandeep continues by discussing another part of JavaScriptServices which is “Universal JavaScript”, also known as “Isomorphic JavaScript”.  This is JavaScript which is shared between the client and the server.  How does JavaScriptServices deal with JavaScript on the server-side?   It uses a .NET library called Microsoft.AspNetCore.NodeServices which is a managed code wrapper around NodeJS.  Using this functionality, the server can effectively run the JavaScript that is part of the client-side code, and send pre-rendered pages to the client.  This is a major factor in being able to improve the initial slow load time of traditional SPA applications.  Another amazing benefit of this functionality is the ability to run an SPA application in the browser even with JavaScript disabled!  Another part of the JavaScriptServices technology that enables this functionality is RoutingServices, which provides handling for static files, and a “fallback” route for when explicit routes aren’t declared.  This means that a request to something like http://myapp.com/orders/ would be rendered on the client-side via the client JavaScript framework code (i.e. Angular) after an AJAX request to retrieve the mark-up/data from the server, however, if client-side routing is unavailable to process such a route (in the case that JavaScript is disabled in the browser, for example) then the request is sent to the server so that the server may render the same page (perhaps requiring client-side JavaScript which is now processed on the server!) on the server before sending the result back to the client via the means of a standard HTTP(s) request/response cycle.

I must admit, I didn’t quite believe this, however, Sandeep quickly setup some more demo code and showed us the application working fine with a JavaScript enabled browser, whereby each page would indeed be rendered by the JavaScript of the client-side SPA.  He then disabled JavaScript in his browser and re-loaded the “home” page of the application and was able to demonstrate still being able to navigate around the pages of the demo application just as smoothly (due to the page’s having been rendered on the server and sent to the client without needing further JavaScript processing to render them).  This was truly mind blowing stuff.  Since the server-side NodeServices library wraps NodeJS on the server, we can write server-side C# code and call out to invoke functionality within any NodeJS package that might be installed.  This means that if have a NodeJS package that, for example, renders some HTML markup and converts it to a PDF, we can generate that PDF with the Node package from C# code, sending the resulting PDF file to the client browser via a standard ASP.NET mechanism.  Sandeep wraps up his talk by sharing some links to find out more about JavaScriptServices and he also provides us with a link to his GitHub repository for the demo code he’s used in his session.

IMG_20170506_130734After Sandeep’s talk, it was time to head back to the main conference area as it was time for lunch.  The food at DDDSW is usually something special, and this year was no exception.  Being in the South West of England, it’s obligatory to have some quintessentially South West food – Pasties!  A choice of Steak or Cheese and Onion pasty along with a packet of crisps, a chocolate bar and a piece of fruit and our lunches were quite a sizable portion of food.  After the attendees queued for the pasties and other treats, we all found a place to sit in the large communal area and ate our delicious lunch.

IMG_20170506_130906The lunch break at DDDSW was an hour and a half and usually has some lightning or grok talks that take place within one of the session rooms over the lunch break.  Due to the sixth form centre not being the largest of venues (some of the session rooms were very full during some sessions and could get quite hot) and the weather outside turning into being a rather nice day and keeping us all very warm indeed, I decided that it would be nice to take a walk around outside to grab some fresh air and also to help work off a bit of the ample lunch I’d just enjoyed.

I decided a brief walk around the grounds of the St. Mary Redcliffe church would be quite pleasant and so off I went, capturing a few photos along the way.  After walking around the grounds, I still had some time to kill so decided to pop down to the river front to wander along there.  After a few more minutes, I stumbled across a nice old fashioned pub and so decided that a cheeky half pint of ale would go down quite nice right about now!  I ordered my ale and sat down in the beer garden overlooking the river watching the world go by as I quaffed!  A lovely afternoon lunch break it was turning out to be.

IMG_20170506_134245-EFFECTSAfter a little while longer it was time to head back to the Redcliffe sixth form centre for the afternoon’s sessions.  I made my way back up the hill, past the church and back to the sixth form centre.  I grabbed a quick coffee in the main conference hall area before heading off down the corridor to the session room for the first afternoon session.  This one was Joel Hammond-Turner’s “Service Discovery With Consul.io for the .NET Developer

IMG_20170506_143139Joel’s session is about the Consul.io software used for service discovery.  Consul.io is a distributed application that allows applications running on a given machine, to query consul for registered services and find out where those services are located either on a local network or on the wider internet.  Consul is fully open source and cross-platform.  Joel tells us the Consul is a service registry – applications can call Consul’s API to register themselves as an available service at a known location – and it’s also service discovery – applications can run a local copy of Consul which will synchronize with other instances of Consul on other machines in the same network to ensure the registry is up-to-date and can query Consul for a given services’ location.  Consul also includes a built-in DNS Server, a Key/Value store and a distributed event engine.  Since Consul requires all of these features itself in order to operate, they are exposed to the users of Consul for their own use.  Joel mentions how his own company, Landmark, is using Consul as part of a large migration project aimed at migrating a 30 year old C++ legacy system to a new event-driven distributed system written in C# and using ASP.NET Core.

Joel talks about distributed systems and they’re often architected from a hardware or infrastructure perspective.  Often, due to each “layer” of the solution requiring multiple servers for failover and redundancy, you’ll need a load balancer to direct requests to this layer to a given server.  If you’ve got web, application and perhaps other layers, this often means load balancers at every layer.  This is not only potentially expensive – especially if your load balancers are separate physical hardware – it also complicates your infrastructure.  Using Consul can help to replace the majority of these load balancers since Consul itself acts as a way to discover a given service and can, when querying Consul’s built-in DNS server, randomize the order in which multiple instances of a given service are reported to a client.  In this way, requests can be routed and balanced between multiple instances of the required service.  This means that load balancing is reduced to a single load balancer at the “border” or entry point of the entire network infrastructure.

Joel proceeds by showing us some demo code.  He starts by downloading Consul.  Consul runs from the command line.  It has a handy built-in developer mode where nothing is persisted to disk, allowing for quick and easy evaluation of the software.  Joel say how Consul command line API is very similar to that of the source control software, Git, where the consul command acts as an introducer to other commands allowing consul to perform some specific function (i.e. consul agent –dev starts the consul agent service running in developer mode).

The great thing with Consul is that you’ll never need to know where on your network the consul service is itself.  It’s always running on localhost!  The idea is that you have Consul running on every machine on localhost.  This way, any application can always access Consul by looking on localhost on the specific Consul port (8300 by default).  Since Consul is distributed, every copy of Consul will synchronize with all others on the network ensuring that the data is shared and registered services are known by each and every running instance of Consul.  The built in DNS Server will provide your consuming app with the ability to get the dynamic IP of the service my using a DNS name of something like [myservicename].services.consul.  If you've got multiple servers for that services (i.e. 3 different IP's that can provide the service) registered, the built-in DNS service will automagically randomize the order in which those IP addresses are returned to DNS queries - automatic load-balancing for the service!

Consuming applications, perhaps written in C#, can use some simple code to both register as a service with Consul and also to query Consul in order to discover services:

public void Configure(IApplicationBuilder app, IHostingEnvironment env, ILoggerFactory loggerFactory, IApplicationLifetime lifetime)
{
	loggerFactory.AddConsole(Configuration.GetSection("Logging"));
	loggerFactory.AddDebug();
	app.UseMvc();
	var serverAddressFeature = (IServerAddressesFeature)app.ServerFeatures.FirstOrDefault(f => f.Key ==typeof(IServerAddressesFeature)).Value;
	var serverAddress = new Uri(serverAddressFeature.Addresses.First());
	// Register service with consul
	var registration = new AgentServiceRegistration()
	{
		ID = $"webapi-{serverAddress.Port}",
		Name = "webapi",
		Address = $"{serverAddress.Scheme}://{serverAddress.Host}",
		Port = serverAddress.Port,
		Tags = new[] { "Flibble", "Wotsit", "Aardvark" },
		Checks = new AgentServiceCheck[] {new AgentCheckRegistration()
		{
			HTTP = $"{serverAddress.Scheme}://{serverAddress.Host}:{serverAddress.Port}/api/health/status",
			Notes = "Checks /health/status on localhost",
			Timeout = TimeSpan.FromSeconds(3),
			Interval = TimeSpan.FromSeconds(10)
		}}
	};
	var consulClient = app.ApplicationServices.GetRequiredService<IConsulClient>();
	consulClient.Agent.ServiceDeregister(registration.ID).Wait();
	consulClient.Agent.ServiceRegister(registration).Wait();
	lifetime.ApplicationStopping.Register(() =>
	{
		consulClient.Agent.ServiceDeregister(registration.ID).Wait();
	});
}

Consul’s services can be queried based upon an exact service name or can be queries based upon a tag (registered services can assign multiple arbitrary tags upon registration to themselves to aid discovery).

As well as service registration and discovery, Consul also provides service monitoring.  This means that Consul itself will monitor the health of your service so that should one of the instances of a given service become unhealthy or unavailable, Consul will prevent that service’s IP Address from being served up to consuming clients when queried.

Joel now shared with us some tips for using Consul from within a .NET application.  He says to be careful of finding Consul registered services with .NET’s built-in DNS resolver.  The reason for this is that .NET’s DNS Resolver is very heavily cached, and it may serve up stale DNS data that may not be up to date with the data inside Consul’s own DNS service.  Another thing to be aware of is that Consul’s Key/Value store will always store values as byte arrays.  This can sometimes be slightly awkward if we’re storing mostly strings inside, however, it’s trivial to write a wrapper to always convert from a byte array when querying the Key/Value store.  Finally, Joel tells us about a more advanced feature of Consul which is the “watches”.  Theses are effectively events that Consul will fire when Consul’s own data changes.  A good use for this would be to have some code that runs in response to one of these events that can re-write the border load balancer rules to provide you with a fully automatic means of keeping your network infrastructure up-to-date and discoverable.

In wrapping up, Joel shares a few links to his GitHub repositories for both his demo code used during his session and the slides.

IMG_20170506_154655After Joel’s talk it was time for another refreshment break.  This one being the only one of the afternoon was accompanied by another DDD South West tradition – cream teas!  These went down a storm with the conference attendees and there was so many of them that many people were able to have seconds.  There was also some additional fruit, crisps and chocolate bars left over from lunch time, which made this particular break quite gourmand.

After a few cups of coffee which were required to wash down the excellent cream teas and other snack treats, it was time to head back down the corridor to the session rooms for the final session of the day.  This one was Naeem Sarfraz’s “Layers, Abstractions & Spaghetti Code: Revisiting the Onion Architecture”.

Naeem’s talk was a retrospective of the last 10+ years of .NET software development.  Not only a retrospective of his own career, but he posits that it’s quite probably a retrospective of the last 10 years of many people’s careers.  Naeem starts by talking about standards.  He asks us to cast our minds back to when we started new jobs earlier in our career and that perhaps one of the first things that we had to do in starting in the job was to read page after page of “standards” documents for the new team that we’d joined.  Coding standards, architecture standards etc. How times have changed nowadays where the focus is less on onerous documentation but more of expressive code that is easier to understand as well as such practices as pair programming allowing new developers to get up to speed with a new codebase quickly without the need to read large volumes of documentation.

IMG_20170506_160127Naeem then moves on to show us some of the code that he himself wrote when he started his career around 10 years ago.  This is typical of a lot of code that many of us write when we first started developing, and has methods which are called in response to some UI event that has business logic, database access and web code (redirections etc.) all combined in the same method!

Naeem says he quickly realised that this code was not great and that he needed to start separating concerns.  At this time, it seemed that N-Tier became the new buzzword and so newer code written was compliant with an N-Tier architecture.  A UI layer that is decoupled from all other layers and only talks to the Business layer which in turn only talks to the Data layer.  However, this architecture was eventually found to be lacking too.

Naeem stops and asks us about how we decompose our applications.  He says that most of the time, we end up with technical designs taking priority over everything else.  We also often find that we can start to model our software by starting at the persistence (database) layer first, this frequently ends up bleeding into the other layers and affects the design of them.  This is great for us as techies, but it’s not the best approach for the business.  What we need to do is start to model our software on the business domain and leave technical designs for a later part of the modelling process.

Naeem shows us some more code from his past.  This one is slightly more separated, but still has user interface elements specifically designed so that rows of values from a grid of columns are designed in such a way that they can be directly mapped to a DataTable type, allowing easy interaction with the database.  This is another example of a data-centric approach to the entire application design.

IMG_20170506_162337We then move on to look at another way of designing our application architecture.  Naeem tells us how he discovered the 4+1 approach, which has us examining the software we’re building by looking at it from different perspectives.  This helps to provide a better, more balanced view of what we’re seeking to achieve with it’s development.

The next logical step along the architecture path is that of Domain-Driven Design.  This takes the approach of placing the core business domain, described with a ubiquitous language – which is always in the language of the business domain, not the language of any technical implementation – at the very heart of the entire software model.  One popular way of using domain-driven design (DDD) is to adopt an architecture called “Onion architecture”.  This is also known as “Hexagonal architecture” or “Ports and adapters architecture”. 

We look at some more code, this time code that is compliant with DDD.  However, on closer inspection, it really isn’t.  He says how this code adopts all the right names for a DDD compliant implementation, however, it’s only the DDD vernacular that’s been adopted and the code still has a very database-centric, table-per-class type of approach.  The danger here is that we can appear to be following DDD patterns, but we’re really just doing things like implementing excessive interfaces on all repository classes or implementing generic repositories without really separating and abstracting our code from the design of the underlying database.

Finally, we look at some more code, this time written by Greg Young and adopting a CQRS based approach.  The demo code is called SimplestPossibleThing and is available at GitHub.  This code demonstrates a much cleaner and abstracted approach to modelling the code.  By using a CQRS approach, reads and writes of data are separated also, with the code implementing Commands - which are responsible for writing data and Queries – which are responsible for reading data.  Finally, Naeem points us to a talk given by Jimmy Bogard which talks about architecting application in “slices not layers”.  Each feature is developed in isolation from others and the feature development includes a “full stack” of development (domain/business layer, persistence/database layer and user interface layers) isolated to that feature.

IMG_20170506_170157After Naeem’s session was over, it was time for all the attendees to gather back in the main conference hall for the final wrap-up by the organisers and the prize draw.  After thanking the various people involved in making the conference what it is (sponsors, volunteers, organisers etc.) it was time for the prize draw.  There were some good prizes up for grabs, but alas, I wasn’t to be a winner on this occasion.

Once again, the DDDSW conference proved to be a huge success.  I’d had a wonderful day and now just had the long drive home to look forward to.  I didn’t mind, however, as it had been well worth the trip.  Here’s hoping the organisers do it all all over again next year!

SqlBits 2017 In Review

IMG_20170408_073513This past Saturday 8th April 2017, the annual SqlBits conference took place in the International Centre in Telford, Shropshire.  The event is a four day conference, with the first three days being a paid conference and the final day, the Saturday, always being a free community day.

I’d had to get up quite early for this event, setting my alarm for 5:30am to allow me to get my all-important cup of coffee before setting off for the approximately 90 mile journey to arrive at the conference for the opening time of 7:30am!

IMG_20170408_074118After arriving at the venue, which on this occasion had free car parking in the venue’s ample car park, I heading indoors to search for the registration booth.  It appeared that I’d arrived slightly early as I joined a small crowd that was gathered in one of the venue’s corridors just outside the entrance to a large hall where the event itself was taking place.  After a short wait, we were informed that we could enter the hall and proceed to the registration booths on the way in.

After registering and receiving my badge, conference programme and goodie bag, I heading to the catering area for another all-important cup of coffee and some breakfast.

IMG_20170408_074857This year, having recently become vegetarian, I knew the almost obligatory bacon butties would not be an option, so I quickly acquired a cup of coffee (the most important thing) and searched for the vegetarian breakfast option.  The choice was of either a fruit smoothie or a fruit bowl.  I selected the fruit bowl and made my way to one of the many tables dotted around the venue to consume my breakfast and take a peek through the goodie bag!

After a short while, an announcement came over a venue wide loudspeaker system to tell the attendees that the first sessions would be starting in 10 minutes.  This year, there were eight tracks of talks, each one presented in one of eight separate domes scattered throughout the venue.  I quickly finished my coffee, collected my things and made my way to Dome 3 for my first session, Ust Oldfield’s “A Deep Dive Into Data Lakes”.

IMG_20170408_081230Ust first introduced himself as a consultant working for Adatis who provide Business Intelligence (BI) consultancy services.  He says that, as a company, they were fully conversant with standard data warehouses, but needed to move forward in order to understand the relatively recent phenomenon of data lakes on the Microsoft Azure platform.

Ust asks the audience who already knows about Data Lakes, not many of us do, so he asks if we’re familiar with Hadoop – the Apache Foundations distributed computing framework – to which a few more people are familiar.  Ust explains that, under the hood, Azure’s Data Lake is a combination of distributed Azure BLOB storage (and so can work with any file type or size) with Hadoop overlaid on top to provide the distributed compute capability.

Azure Data Lake (hereafter referred to as ADL) uses things called “Extents” to contain its data, which are 250MB blocks that all storage is divided into.  Ust explains that ADL uses the “lambda” architecture and allows users to perform computations and queries using a language called U-SQL, which he says is like a cross between T-SQL used in SQL Server and C#.  All of the files that are added to a data lake can be set to automatically expire and be deleted, and so ADL contains functionality to allow some automated maintenance of the data held within it.

All data warehouses and lakes in ADL go through three stages which are the ingestion of raw data, and enrichment phase (where the data is verified, de-duplicated, cleaned and augmented with additional data from other sources) and finally is curated and presented for user consumption.  U-SQL scripts specify how the raw input data is transformed into output data and U-SQL includes both traditional ways to select and filter raw data, similar to how SQL Server would provide such functionality, but also includes other methods of transforming data more specific to distributed data sets, such as MapReduce

ADL provides a dashboard within the Azure website where ADL can be accessed and scripts created and run against the ADL data, however, there is also Microsoft Visual Studio tooling available so that many of the ADL functions can be accessed through Visual Studio.  One very interesting feature is that U-SQL scripts that would normally be confined to running within Azure can be downloaded to a local machine and debugged using Microsoft Visual Studio and it’s important to note that some functionality of ADL can only be accessed via the ADL Visual Studio tooling.

When performing queries against ADL data, U-SQL scripts are split and parallelized into multiple “vertexes”, which are the discrete units of computation within ADL.  Each vertex can be independent or dependent upon a previous vertex completing it’s computation.  You can manage vertexes and their dependencies within ADL, but this is a piece of functionality only available within the ADL Visual Studio tooling.

Ust shows us a demo of some U-SQL queries running over some sample ADL data.  He demonstrates how, despite ADL like almost all Azure features is a pay-as-you-go service, reworking your queries to be longer running queries that use fewer ADLAU’s (Azure Data Lake Analysis Units – the discrete single compute/billing units within ADL) can actually save you a lot of money.  This is due to how ADL charges are calculated, meaning that it’s far more expensive to use more ADLAU’s than it is to use fewer but for a longer time.

Ust shows us some tips around using “partition elimination”, which is a mechanism whereby data is pre-filtered prior to being distributed and computed upon by your standard U-SQL scripts.  Partition Elimination is best implemented with a deliberately defined file naming system (i.e. MyLogs_2017_05_01.txt, MyLogs_2017_05_16.txt etc.)  Using such a mechanism, you can filter the data files to be included within the U-SQL compute based upon partial filename matches and wildcards (i.e. you could process MyLogs for the month of April 2017 with a pre-filter such as MyLogs_2017_04_**.txt).  Ust tells us some more about the ADL data and the requirements for its storage.  He says that indexes are mandatory within ADL data, but that we can only have a single clustered index on each table.  Currently, ADL does not support non-clustered indexes however this is something that may come in the future.

Finally, Ust talks about data “skew”, which is the mechanism of how your dataset is distributed throughout the cluster for computing.  Data can be split for processing based upon a round-robin technique, which guarantees an even distribution of data across all nodes in the cluster, but does not guarantee that similar data will be kept together and processed by the same node.  This can cause a performance degradation of the compute function as potentially separate nodes must communicate much more to transfer related data when it’s on multiple nodes.  The other technique for data distribution is to split the data based upon a hash.  This guarantees that related data will be kept together on the same node – thus potentially improving the compute performance – but can now no longer guarantee that the data is evenly distributed across all the nodes in the cluster.  This means that some nodes will have significantly more work to perform than other nodes which can again impact overall compute performance.  Therefore, it’s essential that you understand the general “shape” of your raw data in order to maximise the compute performance – and thus the overall cost – of your ADL service.

After Ust’s session, we had a quick coffee break and I grabbed another cup of coffee.  There was just enough time to drink my coffee and take a very quick look around the main hall of the venue before I had to make my way back to Dome 3 for the next session.  This one was Hugo Kornelis’s “Normalization Beyond Third-Normal Form”.

IMG_20170408_093319Hugo starts his sessions by reminding us of some key concepts that we need to be aware of when performing any data modelling activity.  He talks about the “Universe Of Discourse” which is the view of reality as defined by the data/software model, it’s not necessarily the view of actual reality.  We then look at the purpose of database normalization.  We recall that normalization is the process of organising our data into columns and tables in such a way as to reduce redundancy and improve data integrity.  Hugo points out that normalization’s purpose is not to prevent incorrect data but to prevent impossible, inconsistent or business rule violating data.  We can’t stop the user from entering false data into the name column, but we can prevent them from providing us with a non-date value for their birthdate.  Hugo also reminds us that normalization is never performed at a database level, only at a table level.  It’s perfectly possible to have a database that, across it’s many tables, contains multiple forms of normalization.

Next, we look at what defines normalization.  Hugo tells us that it’s based upon Functional Dependency.  This is a constraint that dictates that for every value in Column A in a relationship, there is exactly none, or one value for Column B (i.e. A –> B).  Column A can actually be a composite of multiple actual columns (i.e. {A,B} –> C) and Hugo gives the conference-specific example of a SqlBits Dome number and a chair number which can define the exact name of the attendee sitting there.  It’s possible that the composite can exist on the other side of the relationship (i.e. A –> {B,C}) however this can be reduced to two constraints of A -> B and A –> C. 

Hugo reminds us of 3rd normal form.  This is the most “popular” normal form that many people take their databases to and then stop there.  3rd normal form (3NF) states that, in a given table, every non-key column is dependent upon the key, the whole key and nothing but the key (so help me Codd!).  We can use an algorithm called “Bernstein’s Algorithm for Synthesis of a 3rd Normal Form Schema” to help us create a database schema that is guaranteed to be in 3rd normal form, so long as all of our functional dependencies are known up front.  Hugo also mentions Boyce-Codd normal form, which is based upon 3rd normal form but extends the requirement that all columns in a table, including key columns, must be dependent upon the key.  When all columns in a table are dependent upon the key, there should usually be no duplicated data within that table’s row.

Hugo proceeds by detailing something called Elementary Key normal form.  This is perhaps a little known and used type of normal form, based upon 3rd normal form but where the constraint is defined as only non-elementary columns being dependent upon the key.  So what is an non-elementary column?  Well, it’s where functional dependencies such as {A,B} –> C does not have the reverse dependencies of either C –> A or C –> B.  This can also be expressed as where every full non-trivial functional dependency of the form A –> B, then either A is a key or B is (a part of) an elementary key.  Hugo explains that, in practice, Elementary Key normal form is almost identical to 3rd normal form.

From here, Hugo takes us into the more elaborate normal forms.  We start with 4th normal form. 4th normal form, unlike the lower normal forms, is less concerned with functional dependencies, but rather with multi-valued dependencies.  These are best explained with an example.  Hugo uses a table representing the availability of experts to discuss SQL problems on given days of the week:

Day Expert Subject
Monday Jim Design
Monday Jim Tuning
Tuesday Jim Debugging
Tuesday Fred Design

Looking at this data, we can infer the following fact: On Monday, you can ask Jim about Design.  From this fact, we can further infer two additional facts: On Monday, you can ask Jim questions, and Jim knows about Design.  In looking at the two facts that we’ve inferred, we can see that it is not possible to work backwards and infer the first original fact merely from the two subsequent facts.  This is a violation of 4th normal form.  In order to make this data compliant with 4NF, we must separate the information regarding days of the week and subjects into different tables, each table then becomes compliant with 4th normal form:

Expert Day
Jim Monday
Jim Tuesday
Fred Tuesday
 
Expert Subject
Jim Design
Jim Tuning
Jim Debugging
Fred Design

After 4NF we move on to look at 5th normal form.  5th normal form is based upon 4th normal form but extends the rules to dictate that there must be no “join dependencies” between the columns except based upon key.   A join dependency is effectively the ability to take a single table, split it into multiple tables and be able to recreate the original table by constructing a query that joins the split tables back into one.  In practice, a table being in 5NF effectively means that if a column has the same value in multiple rows and to remove the value from the table requires the removal of multiple columns then our table is not compliant with 5th normal form. 5NF is so closely related to 4NF that it’s very rare for a table compliant with 4NF to not also be compliant with 5NF.

Expert
Jim
Fred
 
Day
Monday
Tuesday
 
Subject
Design
Tuning
Debugging

Hugo briefly touches upon 6th normal form.  He starts by stating that 6th normal form is very hard to find in practice, being far more an academic curiosity.  6NF is based upon 5NF but further constrains the join dependencies to state that no join dependencies, even those implied by key, are allowed to exist within the table.  This effectively means that there can never be any NULL values within any columns of a 6NF table.  There would be no need for NULLs as we could simply remove the entire row.  The primary reason we don’t see tables and especially entire databases that conform to 6th normal form is that 6NF largely implies that our entire data schema is modelled using a very large number of tables with each table having only a key column and a data value column.  Today’s real-world database platforms are simply not optimised to operate with such a data schema and so data normalisation to this level is rarely, if ever, performed in the real world.

Hugo next talks about Optimal Normal Form.  This is based upon 6NF but prevents the “splitting” of tables if “elementary fact types” would be split. Elementary fact types are multiple columns that would have to remain together in a single table to ensure integrity of data.  Again, optimal normal form is very rarely found in the real-world.

IMG_20170408_105616Finally, Hugo talks about a entirely different type of data normalization, and this is known as Domain/Key Normal Form.  Domain Key Normal Form (DKNF) is not based upon functional dependencies like all other forms of normalization, but is instead based solely upon domain constrains and key constraints.  Domains in this context refers to the range of values that are allowed within the given column.  These are not the values allowed by the data type of the underlying column, but rather the values allowed by the business logic of the domain.  An example that violates DKNF could be shown as follows with a school report card and grade for students whereby the score is a value between 0 and 100, and the status of FAIL or PASS (FAIL for scores below 50):

Student Score Status
James 78 PASS
William 63 PASS
David 48 FAIL
Timothy 57 PASS

From the table above, we can see that it would be possible to enter a value of FAIL in the Status column for the row containing James’ name.  The database constraints would not prevent us from doing this, however, we would be violating our business rules that state that Scores greater than or equal to 50 are a PASS status.  In order to correct this data so as not to violate DKNF, we would change it as follows by splitting into two tables:

Student Score
James 78
William 63
David 48
Timothy 57
 
Status Minimum Score Maximum Score
FAIL 0 49
PASS 50 100

By splitting the data, we ensure that business logic is captured and no table data can violate the domain rules.  An interesting side-effect of complying with DKNF is that you’ll also comply with 5NF too.  The relevance of DKNF, despite being a very different form or normalization that other forms, is that data integrity against business rules can now be expressed and enforced from the database design alone, something that has traditionally been enforced only within application code that is responsible for reading and writing data to and from the database.  It should be noted, however, that compliance with DKNF isn’t always possible and depends very much on the business domain.

After this, Hugo’s session was complete and it was time for another short coffee break.  I quickly grabbed another coffee from one of the numerous catering stands dotted throughout the venue and checked my programme for the Dome that I would need to head towards for the next session.  That was Dome 4 and Conor Cunningham’s “SQL Server vNext and SQL Azure – Upcoming Features”.

IMG_20170408_111016

Conor’s talk was originally intended to be given by Lindsey Allen, however, a scheduling mix-up had resulting in Lindsay being unable to give the presentation.  Instead we were provided with some excellent content from Conor Cunningham who is the Principal Software Architect for Microsoft on the SQL Server Query Processor Team.  Conor is here to tell us all about the new features that will be coming in the upcoming versions of the on-premise SQL Server product as well as SQL Azure.

Firstly, Conor tells us that both the on-premise SQL Server product and the SQL Azure product share the exact same codebase.  SQL Azure has a monthly release cadence and so is always the first product to receive new SQL Server functionality and have that functionality available to the public whilst on-premise SQL Server currently has a release cadence of approximately 1 year and so receives the same features from SQL Azure in each of it’s subsequent public releases.

A big feature coming in SQL Server vNext (the official marketing title is not yet decided) is the ability to run it on Linux.  This isn’t just a version of SQL Server that’s specially built for Linux, but the exact same binaries that run the Windows version of SQL Server.  Microsoft has built an abstraction layer, known as a “PAL” (or Platform Abstraction Layer) which is used to align all operating system or platform specific code in one place and allow the rest of the codebase to stay operating system agnostic.  Moreover, SQL Server when run on Linux will effectively be SQL Server running inside a Docker container.  Previously, SQL Server has relied on Windows Server Failover Clustering (WSFC) to provide clustering capability to SQL Server, however, as part of the work required to allow SQL Server to run on Linux, this is being abstracted away to allow 3rd party cluster management software to be used.  Initially, SQL Server on Linux will support an open-source product called Pacemaker, however more cluster management product support will follow in time.

IMG_20170408_113255There have been big improvements within the In-Memory Tables features of SQL Server.  These improvements mean that In-Memory tables, which were previously constrained in how they operated compared to normal disk-based tables, will now operate much closer to how standard tables operate, supporting many more features including JSON support, CROSS Apply, CASE statements amongst others.

IMG_20170408_113256Another major set of improvement work within SQL Server vNext are improvements in the area of ColumnStore indexes.  ColumnStore indexes are perhaps one of the best new features to be added to SQL Server in recent years and allow potentially significant performance enhancements for queries on tables using such an index.  ColumnStore index now have support for BLOB column data types and the index itself is now compressed, reducing space and storage requirements as well as improving performance.  Further, rebuilds of ColumnStore indexes will now no longer cause significant blocking of the tables upon which the index is being rebuilt, meaning that users of the database are no longer severely negatively impacted by such rebuilds.

IMG_20170408_114604SQL Server vNext also includes advancements in “Adaptive Query Processing”.  This is a major new area of functionality in SQL Server and will receive even more improvements in future versions of SQL Server beyond vNext.  Adaptive Query Processing is a series of algorithms that work within SQL Server’s Query Processor in order to improve query performance by analysing query plans, SQL Server data and other meta-data.  It aims to improve query performance without introducing any query degradation from incorrect query plan optimisations.  It does this dynamically adjusting joins (i.e. switching from hash joins to merge or loop joins, or vice-versa), adjusting memory grants in order to ensure efficient allocation of memory without under or over allocating and interleaving compilation and execution for the most complex queries in order to maximise their performance.

IMG_20170408_120426Another major new feature of SQL Server vNext will be support for Graph Databases.  Graph databases are highly specialised databases that have their data in graph structures, using nodes, edges and properties in which to store their data allowing for semantic querying of data.  Common applications of graph databases are for querying large graphs of data such as those found inside a social network.  Graph data and the ability to efficiently query it makes questions such as “How many friends of Person A are also friends of Person B?” and “Which friends of Person A are also friends of the friends of Person B?” very easy to answer, something a relational database would have difficulty in achieving in an efficient manner.  SQL Server vNext’s support for graph databases promises to offer full CRUD support for node and edge creation, query language extensions to allow querying of graph data as well as allowing queries to span both standard relational SQL Server data and graph data at the same time.

Conor continues his exploration of the new SQL Azure features by telling us about a new feature in SQL Azure that can automatically create indexes for table columns inferred from usage of the column within queries, the maximum database size has also improved, now supporting databases up to 4TB in size.  There’s also some long-awaited improvements to the syntax of the T-SQL query language itself.  There are new string concatenation and aggregation functions as well as a TRIM function (finally!).  New Japanese collation families have been added also and new bulk insert operations have been added to support specific new standards such as RFC 4180 CSV file formats.

IMG_20170408_123009After a brief Q&A at the end of Conor’s session it was time for another refreshment/toilet break.  I decided I was all coffee’d out by this point, having had around 4 or 5 cups of coffee so far, and it still only being just past 12pm.  There was one more session before the break for lunch and so after consulting my conference programme, I headed off to Dome 2 for the somewhat light-hearted session that was Denny Cherry’s “Things You Should Never Do In SQL Server”.

Denny introduced himself first and indicated that this session was to be a bit lighter than other sessions in the day, being a look at some of things that you should not do rather than the things you should.  As such, he reminded us that everything on his slides was wrong!

He started by talking about the enforcement of data integrity in SQL Server and tells us that we really shouldn’t use things like triggers, stored procedures or even application code to enforce data integrity.  SQL Server is a fully relational database and we can leverage what SQL Server is good at by designing our schemas to provide such integrity for us.  Denny talks about a book that he reviewed as a technical editor, which was so bad that he implored the publisher to completely scrap the book.  One of the pearls of wisdom in this book, he says, was the recommendation to use 32-bit editions of SQL Server for “local offices” reserving the 64-bit edition of SQL Server only for large corporations.  Don’t do this.  We all run on 64 bit operating systems today, so where available, we should always be running 64 bit application software, too.  He states that recommendations from third party software vendors should always be questioned, too.

Next up is migrating databases.  Something seen quite frequently is the “copy database wizard” to migrate databases from one server to another.  This is error prone and simply not as good an option as something like log shipping, which has been around for decades is a very robust and mature technique for performing migrations.  Then we look at the account under which SQL Server will run.  Whilst it’s true that very old versions of SQL Server (pre-2005 versions at least) required local administrator privileges in order to run, modern versions of SQL Server do not require such special privileges at all.  SQL Server does require some additional permissions above those usually found in a standard local user account, but not many more.  Always run with the minimum permissions you need.

Next we look at SQL Profiler.  It’s a great tool for debugging issues on a SQL Server instance, but it should never really be connected to your production database.  This can negatively impact performance of the database.  It’s far better to use it against either a local server or an offline backup or staging server.  Moreover, the very latest versions of SQL Server have functionality that SQL Profiler unfortunately doesn’t support.

Denny then moves on to look at the SQL queries we write.  He says that it’s really not worth the effort to ensure that SQL is written in a cross-platform manner (i.e. ANSI SQL).  Whilst it’ll work, of course, you’re really giving up a lot of functionality and performance improvements that have been built into the platform specific dialects of SQL used on each database platform.  Using SQL that is written specifically for the platform you’re targeting will always allow you to write code in the most performant manner.  Moreover, it’s incredibly rare to have to need your SQL written in a cross-platform way as it’s incredibly rare to actually want to ever migrate your databases to an entirely new database platform.

Denny then looks at the some anti-patterns with data itself.  He states that you should always use NULL where there’s an absence of data, and not values such as empty strings, minimum dates (i.e. 01/01/1990) and other “magic” values.  He also says that you should never blindly design your database schema to a specific level of normalization.  You should always consider the application that the data will support and the required performance of that data and domain design and then design your database schema accordingly.  Next we talk about transaction logs.  Denny says he’s seen a number of people simply deleting transaction logs in order to reclaim disk space.  This is a bad idea, and if you find you really need more disk space, you should simply buy more disk space rather than severely impact your ability to recover your database from crashes by deleting the transaction log.  On the subject of transaction logs, Denny reminds us not to use RAID 5 for our disk array that will store the transaction log.  RAID 5 is not optimized for write intensive operations – which the transaction log requires – and so the performance will suffer as a result.  Also, never ever use AUTO_SHRINK to automatically reclaim disk space.  Whilst this does reclaim disk space, the negatives of doing this far outweigh the positives.

Next we look at columns and schema.  Denny reminds us to always use the correct and most appropriate data type for our data, and always be aware of the kinds of data we’ll be working with.  For example, in the US, the zip code (equivalent to postcode in the UK) is entirely numeric – it’s a 5 digit number.  An integer column might seem appropriate here, but some states (Maine) have their zip code start with two zeroes and it must always be written in 5 digit format with these leading zeroes.  Further, zip codes / post codes are not always numeric once you move beyond the USA, so it’s highly likely you’ll need to support alphabetic characters in there too.  Also, don’t assume that certain values will never change and therefore use them as a primary key.  Some developers have previously used a US Social Security Number as a primary key thinking that it’ll never change, and whilst it very rarely does, it’s not guaranteed not to.

Some developers believe that views will improve performance, however, they really don’t.  And don’t ever be tempted to use nested views as they are considered evil and incredibly difficult to debug.  Don’t require RDP access to a SQL Server in order to run queries against it, it’s not only a security risk, but it’s simply not required.  In thinking about the permissions that you can grant to users and other objects in SQL Server, always ensure that you grant the minimum amount of permissions required.  Also, don’t ever revoke permissions from the built-in database roles, such as the public role.  Denny talks about a time that an over-zealous auditor at a client insisted that certain permissions be removed from the public database role (this was a permission allowing access to the underlying Windows Registry).  When revoked this caused users within the public role to be entirely unable to log in to the SQL Server due to SQL Server’s own requirement to access the Windows Registry when a user logs in!  And with that in mind, like third-party vendors, don’t ever listen to external auditors about what permissions you should or should not assign/revoke on your SQL Server.

IMG_20170408_133339After Denny’s session, it was time for lunch.  All of the attendees gathered in the main hall and headed towards one of the 4 main catering points throughout the venue.  As a vegetarian, there was a rather nice option which was Butternut squash & Sage tortellini with roasted Mediterranean vegetables.  This was followed by either a Strawberry bavarois or a warm chocolate brownie with toffee sauce.  Well, I couldn’t resist the chocolate brownie, so after collecting my meal along with some freshly squeezed orange juice to wash it all down, I found an empty spot at one of the many tables and ate my lunch.

After my lunch I decided to take a stroll around the grounds of the International Centre as it had become such a lovely sunny day outside.  The morning’s sessions had been great but intensive, so this gentle stroll allowed me to clear my head, get some fresh air, enjoy the sunshine and put myself back in the right frame of mind required for the final two sessions of the day.  I made my way back inside the venue as the lunch break drew to a close and once again consulted my programme to determine the correct dome for the next session.  This time it was Dome 6 for Richard Douglas’s “Understanding the Transaction Log”.  After taking my seat, the speaker announced that he wasn’t actually Richard at all, as unfortunately, Richard was suffering from food poisoning so this talk was to be given by one of Richard’s colleagues, John Martin.

IMG_20170408_142519John starts by saying that the transaction log is not just all about backups.  There are numerous and varied uses for the transaction log, and it’s integral to a well-running and performant SQL Server.  John talks about the three recovery models for the transaction log.  There’s “Simple”, “Full” and “Bulk” modes and John is keen to point out that even when using Simple mode, everything is still logged to the transaction log.  There may not be as much detail as could be found if using Full or Bulk modes, but everything is still there.  One of the golden rules is to only ever have one transaction log file for each database.  You can have numerous data files, however, transaction logs should be kept within a single file.  This improves performance and makes recovery and backups much easier.  John reminds us what the various modes do.  Simple mode is effectively auto-maintenance and auto-shrinking of our log – SQL Server will take care of all of this itself.  This may or may not be a good thing depending upon your use case.  In Full mode, SQL Server will not perform any of it’s own maintenance or shrinking at all.  You’re responsible for doing this.  This is very often the better choice as you will know your own use cases better than SQL Server does and so can schedule such maintenance for the most convenient and appropriate times.

IMG_20170408_144507John reminds us of the set of operations within SQL Sever that are “minimally logged”, even when operating in Full recovery mode.  Many common operations , such as “SELECT * INTO”, the CREATE / ALTER & DROP’ping of indexes etc. are all minimally logged in the transaction log, and this minimal logging impacts your ability to perform a complete “point-in-time” restore of your database in the event of a crash.

Select_InsertAfter this we look at the general process by which a standard SELECT or INSERT statement is processed by SQL Server, we can see how there’s a lot of moving parts to the entire process flow and how the transaction log is central to ensuring that SQL Server is able to provide the D (Durability) from the ACID properties that we require from our database engine.  John reminds us that it’s not until our data is fully persisted to the transaction log file that SQL Server considers the data persisted and committed to the database as it’s only after this step that the data could be recovered in the event of a server crash.

John moves on to talk about the internals of the transaction log and how SQL Server uses the space within the log file.  Transaction log files are split into multiple VLFs (Virtual Log File).  These VLF’s exist within the same single physical file on disk.  A VLF can be in either an active or inactive state, and at any given time, there’s always at least one active VLF.  SQL Server will manage the creation of new VLFs as the transaction log grows over time, but it’s possible to control some of how VLF’s are created yourself.  The creation of too many VLF’s within a file is thought to be a bad thing and there’s various discussions about how best to manage this.  Ordinarily, you will have either 4, 8 or 16 VLF’s inside the transaction log file and the VLF is sized appropriately based upon the “chunk” size (chunks are the size by which the transaction log physical file grows on disk).

We look at the makeup of a VLF and John tells us that within a VLF there’s many “log blocks”.  Log Blocks are between 512 bytes and 60KB in size, and a VLF will have as many log blocks as is required to fill the VLF space.  John states how it’s better to try to keep the log blocks at the largest possible size of 60KB as performance is better with fewer, but larger log blocks rather than with many smaller log blocks.  Inside the log block are the individual log records.  Each log record is identified by a unique Log Sequence Number (LSN).  The log sequence number is made up from the VLF sequence number, the log block number and the log record number.

John talks about how the transaction log grows over time.  First we look at log growth when using the “Simple” recovery mode.  SQL Server will periodically instigate a “checkpoint”.  This is the flushing of dirty pages of data within memory to disk and any VLF’s that have no open transactions within them are cleared down.  Note that this does not reclaim any disk space from the VLF, however.  If additional logs are added SQL Server moves through the VLF’s within the file looking for space to write the log records, primarily looking for currently inactive VLFs that have been previously cleared and reusing those. If we reach the end of the file, we “wrap around” to the beginning of the file searching for inactive VLF’s where we can write the transactions.  If no inactive VLFs are found, we must increase the size of the physical transaction log file.  This negatively impacts the server, which has to pause all activity whilst the physical file is extended on disk.  Then we look at log growth with “Full” or “Bulk” recovery mode, which is almost the same as log growth with the simple recovery mode, but instead of a checkpoint, we have a transaction file backup occurring instead which ensures we have much more transaction data available for a full recovery of the state of the server if required.

IMG_20170408_152331At this point, John talks about how we can make things go faster with SQL Server by improving transaction log performance.  We start firstly with a good I/O architecture – consider the balance of reads and writes of your data and select an appropriate RAID strategy that’s optimized for your own use case, and remember that using a bigger RAID cache is always better.  Within the architecture of your SQL Server itself, it’s best if you can determine the actual required size of your physical transaction log file at the beginning.  This is difficult to predict, but if you can get close, your server will perform better as a result.  We should be aware of such things as page splits, and particularly the “evil” variety as not all page splits are bad.  These bad splits can have serious negative performance impacts.  John also cautions against using “delayed durability” which is effectively asynchronous transaction writes.  These cause the server to consider data persisted even though it’s not yet fully written to disk.  Depending upon your application, delayed durability could be ok, but if your system must never lose a single transaction then don’t use it.  One time when it might be appropriate to temporarily enable delayed durability is during large scale purging of data from the database.  John tells us to keep an eye on our indexes.  Too many of those means that each one is a discrete data structure that must be updated and persisted to disk individually which can hurt performance.

Finally John tells us about monitoring the SQL Server transaction log file and we can use both external tools such as PERFMON for that as well as built-in SQL Server system stored procedures such as the sys.dm_io_virtual_file_stats stored procedure.  Regularly reviewing these monitors and logs can help identify bottlenecks within the server and highlight the areas where issues may arise.

IMG_20170408_153231After John’s session, it was time for the final afternoon coffee break, which this time was accompanied by a rather nice selection of cakes!  After a cup of coffee and a cake or two (the scones and carrot cake was particularly nice!) it time to consult the conference programme one last time to determine the dome for the final session of the day.

This time it was Dome 3 for Emanuele Zanchettin’s “Performance Tips For Faster SQL Queries”.

IMG_20170408_160333Emanuele starts his session by talking about debugging.  He reminds us that debugging database queries often starts at the application layer with developers digging through C# code, but he states that debugging sometimes also stops there too.  We need to be aware that we often need to debug down into SQL Server itself.  With that, Emanuele asks the attendees if they’d rather have a talk full of slides or a talk full of real-world demos.  The attendees unanimously vote for demos, so from here Emanuele opens up his SQL Server Management Studio tool and begins to show us some T-SQL code.

He first creates a demo database which includes at least one table of over 2 million rows.  He writes some simple SELECT queries that contain a few joined tables.  We run the queries and see that they execute in a very short space of time, however, in looking at the execution plan generated for our query, we can see that we can make the query perform even better.  So Emanuele’s first tip is to make sure you always check the execution plans generated for your queries as they can often indicate where an additional index on the source tables would greatly improve query performance.

IMG_20170408_073649Unfortunately, it was at this point that I had to leave Emanuele’s session in order to make an early start on my journey back home. I missed the end of Emanuele’s session as well as the final conference wrap-up and prize giving session, but I’d had a fantastic day at another incredibly well-run, well-organised and very informative SQLBits event.

As ever, I can’t wait until the next one!

DDD North 2016 In Review

IMG_20161001_083505

On Saturday, 1st October 2016 at the University of Leeds, the 6th annual DDD North event was held.  After a great event last year, at the University of Sunderland in the North East, this year’s event was held in Leeds as is now customary for the event to alternate between the two locations each year.

After arriving and collecting my badge, it was a short walk to the communal area for some tea and coffee to start the day.  Unfortunately, there were no bacon butties or Danish pastries this time around, but I’d had a hearty breakfast before setting off on the journey to Leeds anyway.

The first session of the day was Pete Smith’sThe Three Problems with Software Development”.   Pete starts by talking about Conway’s Game of Life and how this game is similar to how software development often works, producing complex behaviours from simple building blocks.  Pete says how his talk will examine some “heuristics” for software development, a sort of “series of steps” for software development best practice.

IMG_20161001_093811Firstly, we look at the three problems themselves.  Problem Number 1 is about dividing and breaking down software components.  Pete tells us that this isn’t just code or software components themselves, but can also relate to people and teams and how they are “broken down”.  Problem Number 2 is how to choose effective tools, processes and approaches to your software development and Problem Number 3 is effective communication.

Looking at problem number 1 in more detail, Pete talks about “reasons for change”.  He says that we should always endeavour to keep things together that need to change together.  He shows an example of two simple web pages of lists of teachers and of students.  The ASP.NET MVC view’s mark-up for both of these view is almost identical.  As developers we’d be very tempted to abstract this into a single MVC view and only alter, using variables, the parts that differ between teachers and students, however, Pete suggests that this is not a good approach.  Fundamentally, teachers and students are not the same thing, so even if the MVC views are almost identical and we have some amount of repetition, it’s good to keep them separate – for example, if we need to add specific abilities to only one of the types of teachers or students, having two separate views makes that much easier.

Nest we look at how we can best identify reasons for change.  We should look at what parts of an application get deployed together, we should also look at the domain and the terminology used – are two different domain entities referred to by the same name?  Or two different names for the same entity?  We should consider the “ripple effect” of change – what something changes, what else has to change?  Finally, the main thing to examine is logic vs intent.  Logic is the code and behaviour and can (and should) be refactored and reused, however, intent should never be reused or refactored (in the previous example, the teachers and students were “intents” as they represent two entirely different things within the domain).

In looking at Problem Number 2 is more details, Pete says that we should promote good change practices.  We should reduce coupling at all layer in the application and the entire software development process, but don’t over-abstract.  We need to have strong test coverage for this when done in the software itself.  Not necessarily 100% test coverage, but a good suite of robust tests.  Pete says that in large organisations we should try to align teams with the reasons for change, however, in smaller organisations, this isn’t something that you’d need to worry about to much as the team will be much smaller anyway.

Next, Pete makes the strong suggestion that MVC controllers that do very little - something generally considered to be a good thing - is “considered harmful”!  What he really means is that blanket advice is considered harmful – controllers should, generally, do as little as they need to but they can be larger if they have good reasons for it.  When we’re making choices, it’s important to remain dogmatic.  Don’t forget about the trade-offs and don’t get taken in by the “new shiny” things.  Most importantly, when receiving advice, always remember the context of the advice.  Use the right tool for the job and always read differing viewpoints for any subject to gain a more rounded understanding of the problem.  Do test the limits of the approaches you take, learn from your mistakes and always focus on providing value.

In examining Problem Number 3, Pete talks about communication and how it’s often impaired due to the choice of language we use in software development.  He talks about using the same names and terminology for essentially different things.  For example, in the context of ASP.NET MVC, we have the notion of a “controller”, however, Angular also has the notion of a “controller” and they’re not the same thing.  Pete also states how terminology like “serverless architecture” is a misnomer as it’s not serverless and how “devops”, “agile” etc. mean different things to different people!  We should always say what we mean and mean what we say! 

Pete talks about how code is communication.  Code is read far more often than it’s written, so therefore code should be optimized for reading.  Pete looks at some forms of communication and states that things like face-to-face communication, pair programming and even perhaps instant messaging are often the better forms of communication rather than things like once-a-day stand-ups and email.  This is because the best forms of communication offer instant feedback.  To improve our code communication, we should eliminate implicit knowledge – such as not refactoring those teacher and student views into one view.  New programmers would expect to be able to find something like a TeacherList.cshtml file within the solution.  Doing this helps to improve discovery, enabling new people to a codebase to get up to speed more quickly.  Finally, Pete repeats his important point of focusing on refactoring the “logic” of the application and not the “intent”.

Most importantly, the best thing we can do to communicate better is to simply listen.  By listening more intently, we ensure that we have the correct information that we need and we can learn from the knowledge and experience of others.

IMG_20161001_103012After Pete’s talk it was time to head back to the communal area for more refreshments.  Tea, coffee, water and cans of coke were all available.  After suitable further watering, it was time to head back to the conference rooms for the next session.  This one was John Stovin’sThinking Functionally”.

John’s talk was held in one of the smaller rooms and was also one of the rooms located farthest away from the communal area.  After the short walk to the room, I made it there with only a few seconds to spare prior to the start of the talk, and it was standing room only in the room!

John starts his talk by mentioning how the leap from OO (Object-Oriented) programming to functional programming is similar to the leap from procedural programming to OO itself.  It’s a big paradigm shift!  John mentions how most of today’s non-functional languages are designed to closely mimic the way the computer itself processes machine code in the “von Neumann” style.  That is to say that programs are really just a big series of steps with conditions and branches along the way.  Functional programming helps in the attempt to break free from this by expressing programs as pure functions – a series of functions, similar to mathematical functions, that take an input and produce an output.

John mentions how, when writing functional programs, it’s important to try your best to keep your functions “pure”.  This means that the function should have no side-effects.  For example a function that writes something to the console is not pure, since the side-effect is the output on the console window.  John states that even throwing an exception from a function is a side-effect in itself!

IMG_20161001_104700

We should also endeavour to always keep our data immutable.  This means that we never try to assign a new value to a variable once it has already been initialized with a value – it’s a single assignment.  Write once but read many.  This helps us to reason about our data better as it improves readability and guarantees thread-safety of the data.  To change data in a functional program, we should perform an atomic “copy-and-modify” operation which creates a copy of the data,  but with our own changes applied.

In F#, most variables are immutable by default, and F# forces you to use a qualifier keyword, mutable, in order to make a variable mutable.  In C#, however, we’re not so lucky.  We can “fake” this, though, by wrapping our data in a type (class) – i.e. a money type, and only accepting values in the type’s constructor, ensuring all properties are either read-only or at least have a private setter.  Class methods that perform some operation on the data should return a whole new instance of the type.

We move on to examine how Functional Programming eradicates nulls.  Variables have to be assigned a value at declaration, and due to not being able to reassign values thanks to immutability, we can’t create a null reference.  We’re stuck with nulls in C#, but we can alleviate that somewhat via the use of such techniques as the Null Object Pattern, or even the use of an Option<T> type.  John continues saying that types are fundamental to F#.  It has real tuple and records – which are “multiplicative” types and are effectively aggregates of other existing types, created by “multiplying” those existing types together – and also discriminating unions which are “additive” types which are created by “summing” other existing types together.  For example, the “multiplicative” types aggregate or combine other types – a Tuple can contain two (or more) other types which are (e.g.) string and int, and a discriminated union, as an “additive” type, can act as the sum total of all of it’s constituent types, so a discriminated union of an int and a boolean can represent all of the possible values of an int AND all of the possible values of a boolean.

John continues with how far too much C# code is written using granular primitive types and that in F#, we’re encouraged to make all of our code based on types.  So, for example, a monetary amount shouldn’t be written as simply a variable of type decimal or float, but should be wrapped in a strong Money type, which can enforce certain constraints around how that type is used.  This is possible in C# and is something we should all try to do more of.  John then shows us some F# code declaring an F# discriminated union:

type Shape =
| Rectangle of float * float
| Circle of float

He states how this is similar to the inheritance we know in C#, but it’s not quite the same.  It’s more like set theory for types!

IMG_20161001_111727John continues by discussing pattern matching.  He says how this is much richer in F# than the kind-of equivalent if() or switch() statements in C# as pattern matching can match based upon the general “shape” of the type.  We’re told how functional programming also favours recursion over loops.  F#’s compiler has tail recursion, where the compiler can re-write the function to pass additional parameters on a recursive call and therefore negate the need to continually add accumulated values to the stack as this helps to prevent stack overflow problems.   Loops are problematic in functional programming as we need a variable for the loop counter which is traditionally re-assigned to with every iteration of the loop – something that we can’t due in F# due to variable immutability.

We continue by looking at lists and sequences.  These are very well used data structures in functional programming.  Lists are recursive structures and are either an empty list or a “head” with a list attached to it.  We iterate over the list by taking the “head” element with each pass – kind of like popping values off a stack.  Next we look at higher-order functions.  These are simply functions that take another function as a parameter, so for example, virtually all of the LINQ extension methods found in C# are higher-order functions (i.,e. .Where, .Select etc.) as these functions take a lambda function to act as a predicate.  F# has List and Seq and the built-in functions for working with these are primarily Filter() and Map().  These are also higher-order functions.  Filter takes a predicate to filter a list and Map takes a Func that transforms each list element from one type to another.

John goes on to mention Reactive Extensions for C# which is a library for composing asynchronous and event-based programs using observable sequences and LINQ-style query operators.  These operators are also higher-order functions and are very “functional” in their architecture.  The Reactive Extensions (Rx) allow composability over events and are great for both UI code and processing data streams.

IMG_20161001_113323John then moves on to discuss Railway-oriented programming.  This is a concept whereby all functions both accept and return a type which is of type Result<TSuccess, TFailure>.  All functions return a “Result<T,K>” type which “contains” a type that indicates either success or failure.  Functions are then composable based upon the types returned, and execution path through code can be modified based upon the resulting outcome of prior functions.

Using such techniques as Railway-oriented programming, along with the other inherent features of F#, such as a lack of null values and immutability means that frequently programs are far easier to reason about in F# than the equivalent program written in C#.  This is especially true for multi-threaded programs.

Finally, John recaps by stating that functional languages give a level of abstraction above the von Neumann architecture of the underlying machine.  This is perhaps one of the major reasons that FP is gaining ground in recent years as machine are now powerful enough to allow this (previously, old-school LISP programs – LISP being one of the very first functional languages originally design back in 1958 - often used purpose built machines to run LISP sufficiently well).  John recommends a few resources for further reading – one is the F# for Fun and Profit website.

After John’s session, it was time for a further break and additional refreshment.  Since John’s session had been in a small room and one which was farthest away from the communal area where the refreshments where, and given that my next session was still in this very same conference room, I decided that I’d stay where I was and await the next session, which was Matteo Emili’s “I Read The Phoenix Project And I Loved It. Now What?”

IMG_20161001_115558

Matteo’s session was all about introducing a “devops” culture into somewhere that doesn’t yet have such a culture.  The Phoenix Project is a development “novel” which tells a story of doing just such a thing.  Matteo starts by mentioning The Phoenix Project book and how it’s a great book.  I  must concur that the book is very good, having read it myself only a few weeks before attending DDD North.  Matteo that asks that, if we’d read the book and would like to implement it’s ideas into our own places of work, we should be very careful.  It’s not so simple, and you can’t change an entire company overnight, but you can start to make small steps towards the end goal.

There are three critical concepts that cause failure and a breakdown in an effective devops culture.  They are bottlenecks, lack of communication and boundaries between departments.  In order to start with the introduction of a devops culture, you need to start “out-of-band”.  This means you’ll need to do something yourself, without the backing of your team, in order to prove a specific hypothesis.  Only when you’re sure something will work should you then introduce the idea to the team.

Starting with bottlenecks, the best way to eliminate them is to automate everything that can be automated.  This reduces human error, is entirely repeatable, and importantly frees up time and people for other, more important, tasks.  Matteo reminds us that we can’t change what we can’t measure and in the loop of “build-measure-learn”, the most important aspect is measure.  We measure by gathering metrics on our automations and our process using logging and telemetry and it’s only from these metrics will we know whether we’re really heading in the right direction and what is really “broken” or needs improvement.  We should gather insights from our users as well by utilising such tools and software as Google Analytics, New Relic, Splunk & HockeyApp for example.  Doing this leads to evidence based management allowing you to use real world numbers to drive change.

IMG_20161001_120736

Matteo explains that resource utilisation is key.  Don’t bring a whole new change management process out of the blue.  Use small changes that generate big wins and this is frequently done “out-of-band”.  One simple thing that can be done to help break down boundaries between areas of the company is a company-wide “stand up”.  Do this once a week, and limit it to 1-2 minutes per functional area.  This greatly improves communication and helps areas understand each other better.  The implementation of automation and the eradication of boundaries form the basis of the road to continuous delivery. 

We should ensure that our applications are properly packaged to allow such automation.  MSDeploy is such a tool to help enable this.  It’s an old technology, having first been released around 2003, but it’s seeing a modern resurgence as it can be heavily utilised with Azure.  Use an infrastructure-as-code approach.  Virtual Machines, Servers, Network topology etc. should all be scripted and version controlled.  This allows automation.  This is fair easy to achieve with cloud-based infrastructure in Azure by using Azure ARM or by using AWS CloudFormation with Amazon Web Services.  Some options for achieving the same thing with on-premise infrastructure are Chef, Puppet or even Powershell Desired State Configuration.  Databases are often neglected with regard to DevOps scenarios, however, by using version control and performing small, incremental changes to database infrastructure and the usage of packages (such as SQL Server’s DACPAC files), this can help to move Database Lifecycle Management into a DevOps/continuous delivery environment.

This brings us to testing.  We should use test suites to ensure our scripts and automation is correct and we must remember the golden rule.  If something is going to fail, it must fail fast.  Automated and manual testing should be used to ensure this.  Accountability is important so tests are critical to the product, and remediation (recovery from failure) should be something that is also automated.

Matteo summarises, start with changing people first, then change the processes and the tools will follow.  Remember, automation, automation, automation!  Finally, tackle the broader technical side and blend individual competencies to the real world requirements of the teams and the overall business.

IMG_20161001_083329After Matteo’s session, it was time for lunch.  All of the attendees reconvened in the communal area where we were treated to a selection of sandwiches and packets of crisps.  After selecting my lunch, I found a vacant spot in the corner of the rather small communal area (which easily filled to capacity once all of the different sessions had finished and all of the conference’s attendees descending on the same space) to eat it.  Since lunch break was 1.5 hours and I’d eaten my lunch within the first 20 minutes, I decided to step outside to grab some fresh air.  It was at this point I remembered a rather excellent little pub just 2 minutes walk down the road from the university venue hosting the conference.  Well, never one to pass up the opportunity of a nice pint of real ale, I heading off down the road to The Pack Horse.

IMG_20161001_133433Once inside, I treated myself to lovely pint of Laguna Seca from a local brewery, Burley Street Brewhouse, and settled down in the quiet pub to enjoy my pint and reflect on the morning’s sessions.  During the lunch break, there are usually some grok talks being held, which are are 10-15 minute long “lightning” talks, which attendees can watch whilst they enjoy their lunch.  Since DDD North was held very close to the previous DDD Reading event (only a matter of a few weeks apart) and since the organisers were largely the same for both events, I had heard that the grok talks would be largely the same as those that had taken place, and which I’d already seen, at DDD Reading only a matter of weeks prior.  Due to this, I decided the pub was a more attractive option over the lunch time break!

After slowly drinking and savouring my pint, it was time to head back to the university’s mechanical engineering department and to the afternoon sessions of DDD North 2016.

The afternoon’s first session was, luckily, in one of the “main” lecture halls of the venue, so I didn’t have too far to travel to take my seat for Bart Read’sHow To Speed Up .NET & SQL Server Apps”.

Bart’s session is al about performance.  Performance of our application’s code and performance of the databases that underlie our application.  Bart starts by introducing himself and states that, amongst other things, he was previously an employee of Red Gate, who make quite a number of SQL Server tools so paying close attention to performance monitoring in something that Bart has done for much of his career.

IMG_20161001_142359He states that we need to start with measurement.  Without this, we can’t possibly know where issues are occurring within our application.  Surprisingly, Bart does say that when starting to measure a database-driven application, many of the worst areas are not within the code itself, and are almost always down in the database layer.  This may be from an errant query or general lack of helpful database additions (better indexes etc.)

Bart mentions the tools that he himself uses as part of his general “toolbox” for performance analysis of an application stack.  ANTS Memory Profiler from Red Gate will help analyse memory consumption issues.  dotMemory from JetBrains is another good choice in the same area.  ANTS Performance Profiler from Red Gate will help analyse the performance of .NET code and monitor it’s CPU consumption.  Again, JetBrains have dotTrace in the same space.  There’s also the lesser known .NET Memory Profiler which is a good option.  For network monitoring, Bart uses Wireshark.  For general testing tools, Bart recommends BlazeMeter (for load testing) and Neustar.

Bart also stresses the importance of the ongoing usage of production monitoring tools.  Services such as New Relic, AppDynamics etc. can provide ongoing metrics for your running application when it’s live in production and are invaluable to understand exactly how your application is behaving in a production environment.

arithabortBart shares a very handy tip regarding usage of SQL Server Management Studio for general debugging of SQL Server queries.  He states that we should always UNCHECK the SET ARITHABORT option inside SSMS’s options menu.  Doing this prevents SQL Server from aborting any queries that perform arithmetic overflows or divide-by-zero operations, meaning that your query will continue to run, giving you a much clearer picture of what the query is actually doing (and how long it takes to run).

From here, Bart shares with us 3 different real-world performance scenarios that he has been involved in, how he went about diagnosing the performance issues and how he fixed them.

The first scenario was helping a client’s customer support team who were struggling as it was taking them 40 seconds to retrieve one of their customer’s details from their system when on a support phone call.  The architecture of the application was a ASP.NET MVC web application in C# and using NHibernate to talk to 2 different SQL Server instances - one server was a primary and the other, a linked server.

Bart started by using ANTS Performance Profiler on the web layer and was able to highlight “hotspots” of slow running code, precisely in the area where the application was calling out to the database.  From here, Bart could see that one of the SQL queries was taking 9 seconds to complete.  After capturing the exact SQL statement that was being sent to the database, it was time to fire up SSMS and use SQL Server Profiler in order to run that SQL statement and gain further insight into why it was taking so long to run.

IMG_20161001_144719After some analysis, Bart discovered that there was a database View on the primary SQL Server that was pulling data from a table on the linked server.  Further, there was no filtering on the data pulled from the linked server, only filtering on the final result set after multiple tables of data had been combined.  This meant that the entire table’s data from the linked server was being pulled across the network to the primary server before any filtering was applied, even though not all of the data was required (the filtering discarded most of it).  To resolve the problem, Bart added a simple WHERE clause to the data that was being selected from the linked server’s table and the execution time of the query went from 9 seconds to only 100 milliseconds!

Bart moves on to tell us about the second scenario.   This one had a very similar application architecture as the first scenario, but the problem here was a creeping increase in memory usage of the application over time.  As the memory increased, so the performance of the application decreased and this was due to the .NET garbage collector having to examine more and more memory in order to determine which objects to garbage collect.  This examination of memory takes time.  For this scenario, Bart used ANTS Memory Profiler to show specific objects that were leaking memory.  After some analysis, he found it was down to a DI (dependency injection) container (in this case, Windsor) having an incorrect lifecycle setting for objects that it created and thus these objects were not cleaned up as efficiently as they should have been.  The resolution was to simply configure the DI container to correctly dispose of unneeded objects and the excessive memory consumption disappeared.

IMG_20161001_150655From here, we move onto the third scenario.  This was a multi-tenanted application where each customer had their own database.  It was an ASP.NET Web application but used a custom ADO layer written in C++ to access the database.  Bart spares us the details, but tells us that the problem was ultimately down to locking, blocking and deadlocking in the database.  Bart uses this to remind us of the various concurrency levels in SQL Server.  There’s object level concurrency and row level concurrency, and when many people are trying to read a row that’s concurrently being written to, deadlocks can occur.  There’s many different solution available for this and one such solution is to use a READ COMMITED SNAPSHOT isolation level on the database.  This uses TempDB to help “scale” the demands against the database, so it’s important that the TempDB is stored on a fast storage medium (a fast SSD drive for example).  The best solution is a more disciplined ordering of object access and this is usually implemented with a Unit Of Work pattern, but Bart tells us that this is difficult to achieve with SQL Server.

Finally, Bart tells us all about scenario number four.  The fundamental problem with this scenario was networking, and more specifically it was down to network latency that was killing the application’s performance.  The application architecture here was not a problem as the application was using Virtual Machines running on VMWare’s vSphere with lots and lots of CPU and Memory to spare.  The SQL Server was running on bare metal to ensure performance of the database layer.  Bart noticed that the problem manifested itself when certain queries were run.  Most of the time, the query would complete in less than 100ms, but occasionally spikes of 500-600ms could be seen when running the exact same query.  To diagnose this issue, Bart used WireShark on both ends of the network, that is to say on the application server where the query originated and on the database server where the data was stored, however, as soon as Wireshark was attached to the network, the performance problem disappeared!

This ultimately turned out to be an incorrect setting on the virtual NIC as Bart could see the the SQL Server was sending results back to the client in only 1ms, however, it was a full 500ms to receive the results when measured from the client (application) side of the network link.  It was disabling the “receive side coalescing” setting that fixed the problem.  Wireshark itself temporarily disables this setting, hence the problem disappearing when Wireshark was attached.

IMG_20161001_152003Bart finally tells us that whilst he’s mostly a server-side performance guy, he’s made some general observations about dealing with client-side performance problems.  These are generally down to size of payload, chattiness of the client-side code, garbage collection in JavaScript code and the execution speed of JavaScript code.  He also reminds us that most performance problems in database-driven applications are usually found at the database layer, and can often be fixed with simple things like adding more relevant indexes, adding stored procedures and utilising efficient cached execution plans.

After Bart’s session, it was time for a final refreshment break before the final session of the day.  For me, the final session was Gary McClean Hall’s “DDD: the God That Failed

Gary starts his session by acknowledging that the title is a little clickbait-ish as his talk started life as a blog post he had previously written.  His talk is all about Domain Driven Design (DDD) and how he implemented DDD when he was working within the games industry.  Gary mentions that he’s the author of the book, “Adaptive Code via C#” and that when we he was working in the game industry, he had worked on the Championship Manager 2008 game.

Gary’s usage of DDD in game development started when there was a split between two companies involved in the Championship Manager series of games.  In the fall out of the split, one company kept the rights to the name, and the other company kept the codebase!  Gary was with the company that had the name but no code and they needed to re-create the game, which had previously been many years in development, in a very compressed timescale of only 12 months.

IMG_20161001_155048Gary starts with a definition of DDD.   It is modelling for complicated domains.  Gary is keen to stress the word “complicated”.  Therefore, we need to be able to identify what exactly is a complicated domain.  In order to help with this, it’s often best to create a “DDD Maturity Model” for the domain in which we’re working.  This is a series of topics which can be further expanded upon with the specifics for that topic (if any) within out domain.  The topics are:

The Domain
Domain Entity Behaviour
Decoupled Domain
Aggregate Roots
Domain Events
CQRS
Bounded Contexts
Polyglotism

By examining the topics in the list above and determining the details for those topics within our own domain, we can evaluate our domain and it’s relative complexity and thus its suitability to be modelled using DDD.

IMG_20161001_155454Gary continues by showing us a typical structure of a Visual Studio solution that purports to follow the Domain Driven Design pattern.  He states that he sees many such solutions configured this way, but it’s not really DDD and usually represent a very anaemic domain.  Anaemic domain models are collections of classes that are usually nothing more than properties with getters and setters, but little to no behaviour.  This type of model is considered an anti-pattern as they offer very low cohesion and high coupling.

If you’re working with such a domain model, you can start to fix things.  Looking for areas of the domain that can benefit from better types rather than using primitive types is a good start.  A classic example of this is a class to represent money.  Having a “money” class allows better control over the scale of the values you’re dealing with and can also encompass currency information as well.  This is preferable to simply passing values around the domain as decimals or ints.

Commonly, in the type of anaemic domain model as detailed above, there are usually repositories associated with entity models within the domain, and it’s usually a single repository per entity model.  This is also considered an anti-pattern as most entities within the domain will be heavily related and thus should be persisted together in the same transaction.  Of course, the persistence of the entity data should be abstracted from the domain model itself.

Gary then touches upon an interested subject, which is the decoupling within a DDD solution.  Our ASP.NET views have ViewModels, our domain has it’s Domain Models and the persistence (data) layer has it’s own data models.  One frequent piece of code plumbing that’s required here is extensive mapping between the various models throughout the layers of the application.  In this regard, Gary suggests reading Mark Seemann’s article, “Is layering worth the mapping?”  In this article, Mark suggests that the best way to avoid having to perform extensive mapping is to move less data around between the layers of our application.  This can sometimes be accomplished, but depending upon the nature of the application, this can be difficult to achieve.

IMG_20161001_160741_1So, looking back at the “repository-per-entity” model again, we’re reminded that it’s usually the wrong approach.  In order to determine the repositories of our domain, we need to examine the domain’s “Aggregate Roots”.  A aggregate root is the top-level object that “contains” additional other child objects within the domain.  So, for example, a class representing a Customer could be an aggregate root.  Here, the customer would have zero, one or more Order classes as children, and each Order class could have one or more OrderItems as children, with each OrderItem linking out to a Product class.  It’s also possible that the Product class could be considered an aggregate root of the domain too, as the product could be the “root” object that is retrieved within the domain, and the various order items across multiple orders for many different customers  could be retrieved as part of the product’s object graph.

To help determine the aggregate roots within our domain, we first need to examine and determine the bounded contexts.  A bounded context is a conceptually related set of objects within the domain that will work together and make sense for some of the domain’s behaviours.  For example, the customer, order, orderitem and product classes above could be considered part of a “Sales” context within the domain.  It’s important to note that a single domain entity can exist in more than one bounded context, and it’s frequently the case that the actually objects within code that represent that domain entity can be entirely different objects and classes from one bounded context to the next.  For example, within the Sales bounded context, it’s possible that only a small subset of the product data is required, therefore the Product class within the Sales bounded context has a lot less properties/data than the Product class in a different bounded context – for example, there could be a “Catalogue” context, with the Product entity as it’s aggregate root, but this Product object is different from the previous one and contains significantly more properties/data.

IMG_20161001_161509The number of different bounded contexts you have within your domain determines the domain’s breadth.  The size of the bounded contexts (i.e. the number of related objects within it) determines the domains depth.  The size of a given bounded context’s depth determines the importance of that area of the domain to the user of the application.

Bounded contexts and the aggregate roots within them will need to communicate with one another in order that behaviour within the domain can be implemented.  It’s important to ensure that aggregate roots and especially bounded contexts are not coupled to each other, so communication is performed using domain events.  Domain events are an event that is raised by one aggregate root or bounded context’s entity that is broadcast to the rest of the domain.  Other entities within other bounded contexts or aggregate roots will subscribe to the domain events that they may be interested in, in order for them to respond to actions and behaviour in other areas of the domain.  Domain events in a .NET application are frequently modelled using the built-in events and delegates functionality of the .NET framework, although there are other options available such as the Reactive Extensions library as well as specific patterns of implementation.

IMG_20161001_161830

One difficult area of most applications, and somewhere where the “pure” DDD model may break down slightly is search.  Many different applications will require the ability to search across data within the domain, and frequently search is seen as a cross-cutting concern as the result data returned can be small amounts of data from many different aggregates and bounded contexts in one amalgamated data set.  One approach that can be used to mitigate this is the CQRS – Command and Query Responsibility Segregation pattern.

Essentially, this pattern states that the models and code that we use to read data does not necessarily have to be the same models and code that we use to write data.  In fact, most of the time, these models and code should be different.  In the case of requiring a search across disparate data within the DDD-modelled domain, it’s absolutely fine to forego the strict DDD model and to create a specific “view” – this could be a database stored procedure or a database view – that retrieves the exact cross-cutting data that you need.  Doing this prevents using the DDD model to specifically create and hydrate entire aggregate roots of object graphs (possibly across multiple different bounded contexts) as this is something that could be a very expensive operation as most of the retrieved data wouldn’t be required.

Gary reminds us that DDD aggregates can still be painful when using a relational database as the persistence storage due to the impedance mismatch of the domain models in code and the tables within the database.  It’s worth examining Document databases or Graph databases as the persistent storage as these can often be a better choice. 

Finally, we learn that DDD is frequently not justified in applications that are largely CRUD based or for applications that make very extensive use of data queries and reports (especially with custom result sets).  Therefore, DDD is mostly appropriate for those applications that have to model a genuinely complex domain with specific and complex domain objects and behaviours and where a DDD approach can deliver real value.

IMG_20161001_165949After Gary’s session was over, it was time for all of the attendees to gather in the largest of the conference rooms for the final wrap-up.  There were only a few prize give-aways on this occasion, and after those were awarded to the lucky attendees who had their feedback forms drawn at random, it was time to thank the great sponsors of the event, without whom there simply wouldn’t be a DDD North.

I’d had a great time at yet another fantastic DDD event, and am already looking forward to the next one!