DDD 12 In Review

IMG_20170610_085641On Saturday 10th June 2017 in sunny Reading, the 12th DeveloperDeveloperDeveloper event was held at Microsoft’s UK headquarters.  The DDD events started at the Microsoft HQ back in 2005 and after some years away and a successful revival in 2016, this year’s DDD event was another great occasion.

I’d travelled down to Reading the evening before, staying over in a B&B in order to be able to get to the Thames Valley Park venue bright and early on the Saturday morning.  I arrived at the venue and parked my car in the ample parking spaces available on Microsoft’s campus and headed into Building 3 for the conference.  I collected my badge at reception and was guided through to the main lounge area where an excellent breakfast selection awaited the attendees.  Having already had a lovely big breakfast at the B&B where I was staying, I simply decided on another cup of coffee, but the excellent breakfast selection was very well appreciated by the other attendees of the event.

IMG_20170610_085843

I found a corner in which to drink my coffee and double-check the agenda sheets that we’d been provided with as we entered the building.   There’d been no changes since I’d printed out my own copy of the agenda a few nights earlier, so I was happy with the selections of talks that I’d chosen to attend.   It’s always tricky at the DDD events as there are usually at least 3 parallel tracks of talks and invariably there will be at least one or two timeslots where multiple talks that you’d really like to attend will overlap.

IMG_20170611_104806Occasionally, these talks will resurface at another DDD event (some of the talks and speakers at this DDD in Reading had given the same or similar talks in the recently held DDD South West in Bristol back in May 2017) and so, if you’re really good at scheduling, you can often catch a talk at a subsequent event that you’d missed at an earlier one!

As I finished off my coffee, more attendees turned up and the lounge area was now quite full.  It wasn’t too much longer before we were directed by the venue staff to make our way to the relevant conference rooms for our first session of the day.

Wanting to try something a little different, I’d picked Andy Pike’s Introducing Elixir: Self-Healing Applications at ZOMG scale as my first session.

Andy started his sessions by talking about where Elixir, the language came from.  It’s a new language that is built on top of the Erlang virtual machine (BEAM VM) and so has it’s roots in Erlang.  Like Erlang, Elixir compiles down to the same bytecode that the VM ultimately runs.  Erlang was originally created at the Ericsson company in order to run their telecommunications systems.  As such, Erlang is built on top of a collection of middleware and libraries known as OTP or Open Telecom Platform.  Due to its very nature, Erlang and thus Elixir has very strong concurrency, fault tolerance and distributed computing at it’s very core.  Although Elixir is a “new” language in terms perhaps of it’s adoption, it’s actually been around for approximately 5 years now and is currently on version 1.4 with version 1.5 not too far off in the future.

Andy tells of how Erlang is used by WhatsApp to run it’s global messaging system.  He provides some numbers around this.  WhatsApp have 1.2 billion users, process around 42 billion messages per day and can manage to handle around 2 million connections on each server!   That’s some impressive performance and concurrency figures and Andy is certainly right when he states that very few other, if any, platforms and languages can boast such impressive statistics.

Elixir is a functional language and its syntax is heavily inspired by Ruby.  The Elixir language was first designed by José Valim, who was a core contributor in the Ruby community.  Andy talks about the Elixir REPL that ships in the Elixir installation which is called “iex”.  He shows us some slides of simple REPL commands showing that Elixir supports all the same basic intrinsic types that you’d expect of any modern language, integers, strings, tuples and maps.  At this point, things look very similar to most other high level functional (and perhaps some not-quite-so-functional) languages, such as F# or even Python.

Andy then shows us something that appears to be an assignment operator, but is actually a match operator:

a = 1

Andy explains that this is not assigning 1 to the variable ‘a’, but is “matching” a against the constant value 1.  If ‘a’ has no value, Elixir will bind the right-hand side operand to the variable, then perform the same match.  An alternative pattern is:

^a = 1

which performs the matching without the initial binding.  Andy goes on to show how this pattern matching can work to bind variables to values in a list.  For example, given the code:

success = { :ok, 42 }
{ ;ok, result } = success

this will bind the value of 42 to the variable result and subsequently perform the match of the tuple to the variable ‘success’ which now matches.  We’re told how the colon in front of the ok variable makes it an “atom”.  This is similar to a constant, where the variables name is it’s own value.

IMG_20170610_095341Andy shows how Elixir’s code is grouped into functions and functions can be contained within modules.  This is influenced by Ruby and how it also groups it’s code.  We then move on to look at lists.  These are handled in a very similar way to most other functional languages in that a list is merely seen as a “head” and a “tail”.  The “head” is the first value in the list and the “tail” is the entire rest of the list.  When processing the items in a list in Elixir, you process the head and then, perhaps recursively, call the same method passing the list “tail”.  This allows a gradual shortening of the list as the “head” is effectively removed with each pass through the list.  In order for such recursive processing to be performant, Elixir includes tail-call optimisation which allows the compiler to eliminate the necessity of maintaining state through each successive call to the method.  This is possible when the last line of code in the method is the recursive call.

Elixir also has guard clauses built right into the language.  Code such as:

def what_is(x) when is_number(x) and X > 0 do…

helps to ensure that code is more robust by only being invoked when ‘x’ is not only a number but also has some specific value too.  Andy states that, between the usage of such guard clauses and pattern matching, you can probably eliminate around 90-95% of all conditionals within your code (i.e. if x then y).

Elixir is very expressive within it’s allowed characters for function names, so functions can (and often do) have things like question marks in their name.  It’s a convention of the language that methods that return a Boolean value should end in a question mark, something shared with Ruby also, i.e. String.contains? "elixir of life", "of"  And of course, Elixir, like most other functional languages has a pipe operator(|>) which allows the piping of the result of one function call into the input of another function call, so instead of writing:

text = "THIS IS MY STRING"
text = String.downcase(text)
text = String.reverse(text)
IO.puts text

Which forces us to continually repeat the “text” variable, we can instead write the same code like this:

text = "THIS IS MY STRING"
|> String.downcase
|> String.reverse
|> IO.puts

Andy then moves on to show us an element of the Elixir language that I found particular intriguing, doctests.  Elixir makes function documentation a first class citizen within the language and not only does the doctest code provide documentation for the function – Elixir has a function, h, that when passed another function name as a parameter, will display the help for that function – but also serves as a unit test for the function, too!  Here’s a sample of an Elixir function containing some doctest code:

defmodule MyString do
   @doc ~S"""
   Converts a string to uppercase.

   ## Examples
       iex> MyString.upcase("andy")
       "ANDY"
   """
   def upcase(string) do
     String.upcase(string)
   end
end

The doctest code not only shows the textual help text that is shown if the user invokes the help function for the method (i.e. “h MyString”) but the examples contained within the help text can be executed as part of a doctest unit test for the MyString method:

defmodule MyStringTest do
   use ExUnit.Case, async: true
   doctest MyString
end

This above code uses the doctest code inside the MyString method to invoke each of the provided “example” calls and assert that the output is the same as that defined within the doctest!

After taking a look into the various language features, Andy moves on talk about the real power of Elixir which it inherits from it’s Erlang heritage – processes.  It’s processes that provide Erlang, and thus Elixir with the ability to scale massively, provide it with fault-tolerance and its highly distributed features.

Wen Elixir functions are invoked, they can effectively be “wrapped” within a process.  This involves spawning a process that contains the function.  Processes are not the same as Operating System processes, but are much more lightweight and are effectively only a C struct that contains a pointer to the function to call, some memory and mailbox (which will hold messages sent to the function).  Processes have a Process ID (PID) and will, once spawned, continue to run until the function contained within terminates or some error or exception occurs.  Processes can communicate with other processes by passing messages to those processes.  Here’s an example of a very simple module containing a single function and how that function can be called by spawning a separate process:

defmodule HelloWorld do
     def greet do
		IO.puts "Hello World"
     end
end

HelloWorld.greet                 # This is a normal function call.
pid = spawn(HelloWorld, :greet)  # This spawns a process containing the function

Messages are sent to processes by invoking the “send” function, providing the PID and the parameters to send to the function:

send pid, { :greet, “Andy” }

This means that invoking functions in processes is almost as simple as invoking a local function.

Elixir uses the concept of schedulers to actually execute processes.  The Beam VM will supply one scheduler per core of CPU available, giving the ability to run highly concurrently.  Elixir also uses supervisors as part of the Beam VM which can monitor processes (or even monitor other supervisors) and can kill processes if they misbehave in unexpected ways.  Supervisors can be configured with a “strategy”, which allows them to deal with errant processes in specific ways.  One common strategy is one_for_one which means that if a given process dies, a single new one is restarted in it’s place.

Andy then talks about the OTP heritage of Elixir and Erlang and from this there is a concept of a “GenServer”.  A GenServer is a module within Elixir that provides a consistent way to implement processes.  The Elixir documentation states:

A GenServer is a process like any other Elixir process and it can be used to keep state, execute code asynchronously and so on. The advantage of using a generic server process (GenServer) implemented using this module is that it will have a standard set of interface functions and include functionality for tracing and error reporting. It will also fit into a supervision tree.

The GenServer provides a common set of interfaces and API’s that all processes can adhere to, allowing common conventions such as the ability to stop a process, which is frequently implemented like so:

GenServer.cast(pid, :stop)

Andy then talks about “nodes”.  Nodes are separate actual machines that can run Elixir and the Beam VM and these nodes can be clustered together.  Once clustered, a node can start a process not only on it’s own node, but on another node entirely.  Communication between processes, irrespective of the node that the process is running on is handled seamlessly by the Beam VM itself.  This provides Elixir solutions great scalability, robustness and fault-tolerance.

Andy mentions how Elixir has it’s own package manager called “hex” which gives access to a large range of packages providing lots of great functionality.  There’s “Mix”, which is a build tool for Elixir, OTP Observer for inspection of IO, memory and CPU usage by a node, along with “ETS”, which is an in-memory key-value store, similar to Redis, just to name a few.

Andy shares some book information for those of us who may wish to know more.  He suggests “Programming Elixir” and “Programming Phoenix”, both part of the Pragmatic Programmers series and also “The Little Elixir & OTP Guidebook” from Manning.

Finally, Andy wraps up by sharing a story of a US-based Sports news website called “Bleacher Report”.  The Bleacher Report serves up around 1.5 billion pages per day and is a very popular website both in the USA and internationally.  Their entire web application was originally built on Ruby on Rails and they required approximately 150 servers in order to meet the demand for the load.  Eventually, they re-wrote their application using Elixir.  They now serve up the same load using only 5 servers.  Not only have they reduced their servers by an enormous factor, but they believe that with 5 servers, they’re actually over-provisioned as the 5 servers are able to handle the load very easily.  High praise indeed for Elixir and the BeamVM.  Andy has blogged about his talk here.

After Andy’s talk, it was time for the first coffee break.  Amazingly, there were still some breakfast sandwiches left over from earlier in the morning, which many attendees enjoyed.  Since I was still quite full from my own breakfast, I decided a quick cup of coffee was in order before heading back to the conference rooms for the next session.  This one was Sandeep Singh’s Goodbye REST; Hello GraphQL.

IMG_20170610_103642

Sandeep’s session is all about the relatively new technology of GraphQL.  It’s a query language for your API comprising of a server-side runtime for processing queries along with a client-side framework providing an in-browser IDE, called GraphiQL.  One thing that Sandeep is quick to point out is that GraphQL has nothing to do with Graph databases.  It can certainly act as a query layer over the top of a graph database, but can just as easily query any kind of underlying data such as RDBMS’s through to even flat-file data.

Sandeep first talks about where we are today with API technologies.  There’s many of them, XML-RPC, REST, ODATA etc. but they all have their pros and cons.  We explore REST in a bit more detail, as that’s a very popular modern day architectural style of API.  REST is all about resources and working with those resources via nouns (the name of the resource) and verbs (HTTP verbs such as POST, GET etc.), there’s also HATEOAS if your API is “fully” compliant with REST.

Sandeep talks about some of the potential drawbacks with a REST API.   There’s the problem of under-fetching.  This is seen when a given application’s page is required to show multiple resources at once, perhaps a customer along with recently purchased products and a list of recent orders.  Since this is three different resources, we would usually have to perform three distinct different API calls in order to retrieve all of the data required for this page, which is not the most performant way of retrieving the data.  There’s also the problem of over-fetching.  This is where REST API’s can take a parameter to instruct them to include additional data in the response (i.e. /api/customers/1/?include=products,orders), however, this often results in data that is additional to that which is required.  We’re also exposing our REST endpoint to potential abuse as people can add arbitrary inclusions to the endpoint call.  One way to get around the problems of under or over fetching is to create ad-hoc custom endpoints that retrieve the exact set of data required, however, this can quickly become unmaintainable over time as the sheer number of these ad-hoc endpoints grows.

IMG_20170610_110327GraphQL isn’t an architectural style, but is a query language that sits on top of your existing API. This means that the client of your API now has access to a far more expressive way of querying your API’s data.  GraphQL responses are, by default, JSON but can be configured to return XML instead if required, the input queries themselves are similarly structured to JSON.  Here’s a quick example of a GraphQL query and the kind of data the query might return:

{
	customer {
		name,
		address
	}
}
{
	"data" : {
		"customer" : {
			"name" : "Acme Ltd.",
			"address" : ["123 High Street", "Anytown", "Anywhere"]
		}
	}
}

GraphQL works by sending the query to the server where it’s translated by the GraphQL server-side library.  From here, the query details are passed on to code that you have written on the server in order that the query can be executed.  You’ll write your own types and resolvers for this purpose.  Types provide the GraphQL queries with the types/classes that are available for querying – this means that all GraphQL queries are strongly-typed.  Resolvers tell the GraphQL framework how to turn the values provided by the query into calls against your underlying API.  GraphQL has a complete type system within it, so it supports all of the standard intrinsic types that you’d expect such as strings, integers, floats etc. but also enums, lists and unions.

Sandeep looks at the implementations of GraphQL and explains that it started life as a NodeJS package.  It’s heritage is therefore in JavaScript, however, he also states that there are many implementations in numerous other languages and technologies such as Python, Ruby, PHP, .NET and many, many more.  Sandeep says how GraphQL can be very efficient, you’re no longer under or over fetching data, and you’re retrieving the exact fields you want from the tables that are available.  GraphQL allows you to version your queries and types also by adding deprecated attributes to fields and types that are no longer available.

IMG_20170610_110046We take a brief looks at the GraphiQL GUI client which is part of the GraphQL client side library.  It displays 3 panes within your browser showing the schemas, available types and fields and a pane allowing you to type and perform ad-hoc queries.  Sandeep explains that the schema and sample tables and fields are populated within the GUI by performing introspection over the types configured in your server-side code, so changes and additions there are instantly reflected in the client.  Unlike REST, which has obvious verbs around data access, GraphQL doesn't really have these.  You need introspection over the data to know how you can use that data, however, this is a very good thing.  Sandeep states how introspection is at the very heart of GraphQL – it is effectively reflection over your API – and it’s this that leads to the ability to provide strongly-typed queries.

We’re reminded that GraphQL itself has no concept of authorisation or any other business logic as it “sits on top of” the existing API.  Such authorisation and business logic concerns should be embedded within the API or a lower layer of code.  Sandeep says that the best way to think of GraphQL is like a “thin wrapper around the business logic, not the data layer”.

IMG_20170610_111157GraphQL is not just for reading data!  It has full capabilities to write data too and these are known as mutations rather than queries.  The general principle remains the same and a mutation is constructed using JSON-like syntax and sent to the server for resolving to a custom method that will validate the data and persist it the data store by invoking the relevant API endpoint.  Sandeep explains how read queries can be nested, so you can actually send one query to the server that actually contains syntax to perform two queries against two different resources.    GraphQL has a concept of "loaders".  These can batch up actual queries to the database to prevent issues when asking for such things all Customers including Orders.  Doing something like this normally results in a N+1 issue where by Orders are retrieved by issuing a separate query for each customer, resulting in degraded performance.  GraphQL loaders works by enabling the rewriting of the underlying SQL that can be generated for retrieving the data so that all Orders are retrieved for all of the required Customers in a single SQL statement.  i.e. Instead of sending queries like the following to the database:

SELECT CustomerID, CustomerName FROM Customer
SELECT OrderID, OrderNumber, OrderDate FROM Order WHERE CustomerID = 1
SELECT OrderID, OrderNumber, OrderDate FROM Order WHERE CustomerID = 2
SELECT OrderID, OrderNumber, OrderDate FROM Order WHERE CustomerID = 3

We will instead send queries such as:

SELECT CustomerID, CustomerName FROM Customer
SELECT OrderID, OrderNumber, OrderDate FROM Order WHERE CustomerID IN (1,2,3)

IMG_20170610_110644Sandeep then looks at some of the potential downsides of using GraphQL.  Caching is somewhat problematic as you can no longer perform any caching at the network layer as each query is now completely dynamic.  Despite the benefits, there are also performance considerations if you intend to use GraphQL as your underlying data needs to be correctly structured in order to work with GraphQL in the most efficient manner.  GraphQL Loaders should also be used to ensure N+1 problems don’t become an issue.  There’s security considerations too. You shouldn’t expose anything that you don’t want to be public since everything through the API is available to GraphQL and you’ll need to be aware of potentially malicious queries that attempt to retrieve too much data.  One simple solution to such queries is to use a timeout.  If a given query takes longer than some arbitrarily defined timeout value, you simply kill the query, however this may not be the best approach.  Other approaches taken by big websites currently using such functionality is to whitelist all the acceptable queries.  If a query is received that isn’t in the whitelist, it doesn’t get run.  You also can’t use HTTP codes to indicate status or contextual information to the client.  Errors that occur when processing the GraphQL query are contained within the GraphQL response text itself which is returned to the client with 200 HTTP success code.  You’ll need to have your own strategy for exposing such data to the user in a friendly way.   Finally Sandeep explains that, unlike other querying technologies such as ODATA, GraphQL has no intrinsic ability to paginate data.  All data pagination must be built into your underlying API and business layer, GraphQL will merely pass the paging data – such as page number and size – onto the API and expect the API to correctly deal with limiting the data returned.

IMG_20170610_103123After Sandeep’s session, it’s time for another coffee break.  I quickly grabbed another coffee from the main lounge area of the conference, this time accompanied by some rather delicious biscuits, before consulting my Agenda sheet to see which room I needed to be in for my next session.  Turned out that the next room I needed to be in was the same one I’d just left!  After finishing my coffee and biscuits I headed back to Chicago 1 for the next session and the last one before the lunch break.  This one was Dave Mateer’s Fun With Twitter Stream API, ELK Stack, RabbitMQ, Redis and High Performance SQL.

Dave’s talk is really all about performance and how to process large sets of data in the most efficient and performant manner.  Dave’s talk is going to be very demo heavy and so to give us a large data set to work with, Dave starts by looking at Twitter, and specifically, it’s Stream API.  Dave explains that the full Twitter firehose, which is reserved only for Twitter’s own use, currently has 1.3 million tweets per second flowing through it.  As a consumer, you can get access to a deca-firehose which contains 1/10th of the full firehose (i.e. 130000 tweets per second) but this costs money, however, Twitter does expose a freely available Stream API although it’s limited to 50 tweets per second.  This is still quite a sizable amount of data to process.

IMG_20170610_120247Dave starts by showing us a demo of a simple C# console application that uses the Twitter Stream API to gather real-time tweets for the hashtag of #scotland which are then echoed out to the console.   Our goal is to get the tweet data into a SQL Server database as quickly and as efficiently as possible.  Dave now says that to simulate a larger quantity of data, he’s going to read in a number of pre-saved text files containing tweet data that he’s previously collected.  These files represent around 6GB of raw tweet data, containing approximately 1.9 million tweets.  He then adds to the demo to start saving the tweets into SQL Server.  Dave mentions that he's using Dapper to access SQL Server and that previously he tried such things using Entity Framework, which was admittedly some time in the past, but that it was a painful experience and not very performant.   Dave likes Dapper as it's a much simpler abstraction over the database, so therefore much more performant.  It's also a lot easier to optimize your queries when using Dapper as the abstraction isn’t so great and you’re not hiding the implementation details too much.

IMG_20170610_121958Next, Dave shows us a Kibana interface.  As well as writing to SQL Server, he's also saving the tweets to a log file using SeriLog and then using LogStash to send those logs to ElasticSearch allowing viewing the raw log data with Kibana (also known as the ELK stack).  Dave then shows us how easy it is to really leverage the power of tools like Kibana by creating a dashboard for all of the data.

From here, Dave begins to talk about performance and just how fast we can process the large quantity of tweet data.  The initial run of the application, which was simply reading in each tweet from the file and performing an INSERT via Dapper to insert the data to the SQL Server database was able to process approximately 420 tweets per second.  This isn’t bad, but it’s not a great level of performance.  Dave digs out SQL Server Profiler to see where the bottle-necks are within the data access code, and this shows that there are some expensive reads – the data is normalized so that the user that a tweet belongs to is stored in a separate table and looked up when needed.  It’s decided that adding indexes on the relevant columns used for the lookup data might speed up the import process.  Sure enough, after adding the indexes and re-running the tweet import, we improve from 420 to 1600 tweets per second.  A good improvement, but not an order of magnitude improvement.  Dave wants to know if we can change the architecture of our application and go even faster.  What if we want to try to achieve a 10x level of increase in performance?

Dave states that, since his laptop has multiple cores, we should start by changing the application architecture to make better use of parallel processing across all of the available cores in the machine.  We set up an instance of the RabbitMQ message queue, allowing us to read the tweet data in from the files and send it to a RabbitMQ queue.  The queue is explicitly set to durable and each message is set to persistent in order to ensure we have the ability to continue where we left off from in the event of a crash or server failure.  From here, we can have multiple instances of another application that pull messages off the queue, leveraging RabbitMQ’s ability to effectively isolate each of the client consumers, ensuring that the same message is not sent to more than one client.  Dave then sets up Redis.  This will be used for "lookups" that are required when adding tweet data.  So users (and other data) is added to the DB first, then all data is cached in Redis, which is an in-memory key/value store and is often used for caching scenarios.  As tweets are added to the RabbitMQ message queue, required ID/Key lookups for Users and other data are done using the Redis cache rather than performing a SQL Server query for the data, thus improving the performance.  Once processed and the relevant values looked up from Redis, Dave uses SQL Server Bulk Copy to get the data into SQL Server itself.  SQL Server Bulk Copy provides a significant performance benefit over using standard INSERT T-SQL statements.   For Dave’s purposes, he Bulk Copies the data into temporary tables within SQL Server and then, at the end of the import, runs a single T-SQL statement to copy the data from the temporary tables to the real tables. 

IMG_20170610_125524Having re-architected the solution in this manner, Dave then runs his import of 6GB of tweet data again.  As he’s now able to take advantage of the multiple CPU cores available, he runs 3 console applications in parallel to process all of the data.  Each console application completes their jobs within around 30 seconds, and whilst they’re running, Dave shows us the Redis dashboard which is indicating that Redis is receiving around 800000 hits to the cache per second!  The result, ultimately, is that the application’s performance has increased from processing around 1600 tweets per second to around 20000!  An impressive improvement indeed!

Dave then looks at some potential downsides of such re-architecture and performance gains.  He shows how he’s actually getting duplicated data imported into his SQL Server and this is likely due to race conditions and concurrency issues at one or more points within the processing pipeline.  Dave then quickly shows us how he’s got around this problem with some rather ugly looking C# code within the application (using temporary generic List and Dictionary structures to pre-process the data).  Due to the added complexity that this performance improvement brings, Dave argues that sometimes slower is actually faster in that, if you don't absolutely really need so much raw speed, you can remove things like Redis from the architecture, slowing down the processing (albeit to a still acceptable level) but allowing a lot of simplification of the code.

IMG_20170610_130715After Dave’s talk was over, it was time for lunch.  The attendees made their way to the main conference lounge area where we could choose between lunch bags of sandwiches (meat, fish and vegetarian options available) or a salad option (again, meat and vegetarian options).

Being a recently converted vegetarian, I opted for a lovely Greek Salad and then made my way to the outside area along with, it would seem, most of the other conference attendees.

IMG_20170610_132208It had turned into a glorious summer’s day in Reading by now and the attendees and some speakers and other conference staff were enjoying the lovely sunshine outdoors while we ate our lunch.  We didn’t have too long to eat, though, as there were a number of Grok talks that would be taking place inside one of the major conference rooms over lunchtime.  After finishing my lunch (food and drink was not allowed in the session rooms themselves) I heading back towards Chicago 1 where the Grok Talks were to be held.

I’d managed to get there early enough in order to get a seat (these talks are usually standing room only) and after settling myself in, we waited for the first speaker.  This was to be Gary Short with a talk about Markov, Trump and Countering Radicalisation.

Gary starts by asking, “What is radicalisation?”.  It’s really just persuasion – being able to persuade people to hold an extreme view of some information.  Counter-radicalisation is being able to persuade people to hold a much less extreme belief.  This is hard to achieve, and Gary says that it’s due to such things as Cognitive Dissonance and The “Backfire” Effect (closely related to Confirmation Bias) – two subjects that he doesn’t have time to go into in a 10 minute grok talk, but he suggests we Google them later!  So, how do we process text to look for signs of radicalisation?  Well, we start with focus groups and corpus's of known good text as well as data from previously de-radicalised people (Answers to questions like “What helped you change?” etc.)  Gary says that Markov chains are used to process the text.  They’re a way of “flowing” through a group of words in such a way that the next word is determined based upon statistical data known about the current word and what’s likely to follow it.  Finally, Gary shows us a demo of some Markov chains in action with a C# console application that generates random tweet-length sentences based upon analysis of a corpus of text from Donald Trump.  His application is called TweetLikeTrump and is available online.

The next Grok talk is Ian Johnson’s Sketch Notes.  Ian starts by defining Sketch Notes.  They’re visual notes that you can create for capturing the information from things like conferences, events etc.   He states that he’s not an artist, so don’t think that you can’t get started doing Sketch Noting yourself.  Ian talks about how to get started with Sketch Notes.  He says it’s best to start by simply practising your handwriting!   Find a clear way of quickly and clearly writing down text using pen & paper and then practice this over and over.  Ian shared his own structure of a sketch note, he places the talk title and the date in the top left and right corners and the event title and the speaker name / twitter handle in the bottom corners.  He goes on to talk more about the component parts of a sketch note.   Use arrows to create a flow between ideas and text that capture the individual concepts from the talk.  Use colour and decoration to underline and underscore specific and important points – but try to do this when there’s a “lull” in the talk and you don’t necessarily have to be concentrating on the talk 100% as it’s important to remember that the decoration is not as important as the content itself.  Ian shares a specific example.  If the speaker mentions a book, Ian would draw a little book icon and simply write the title of the book next to the icon.  He says drawing people is easy and can be done with a simple circle for the head and no more than 4 or 5 lines for the rest of the body.  Placing the person’s arms in different positions helps to indicate expression.  Finally, Ian says that if you make a mistake in the drawing (and you almost certainly will do, at least when starting out) make a feature of it!  Draw over the mistake to create some icon that might be meaningful in the entire context of the talk or create a fancy stylised bullet point and repeat that “mistake” to make it look intentional!  Ian has blogged about his talk here.

Next up is Christos Matskas’s grok talk on Becoming an awesome OSS contributor.   Christos starts by asking the audience who is using open source software?  After a little thought, virtually everyone in the room raises their hand as we’re all using open source software in one way or another.  This is usually because we'll be working on projects that are using at least one NuGet package, and NuGet packages are mostly open source.  Christos shares the start of his own journey into becoming an OSS contributor which started with him messing up a NuGet Package restore on his first day at a new contract.  This lead to him documenting the fix he applied which was eventually seen by the NuGet maintainers and he was offered to write some documentation for them.  Christos talks about major organisations using open source software including Apple, Google, Microsoft as well as the US Department of Defence and the City of Munich to name but a few.  Getting involved in open source software yourself is a great way to give back to the community that we’re all invariably benefiting from.  It’s a great way to improve your own code and to improve your network of peers and colleagues.  It’s also great for your CV/Resume.  In the US, its almost mandatory that you have code on a GitHub profile.  Even in the UK and Europe, having such a profile is not mandatory, but is a very big plus to your application when you’re looking for a new job.  You can also get free software tools to help you with your coding.  Companies such as JetBrains, RedGate, and many, many others will frequently offer many of their software products for free or a heavily discounted price for open source projects.   Finally, Christos shares a few websites that you can use to get started in contributing to open source, such as up-for-grabs.net, www.firsttimersonly.com and the twitter account @yourfirstPR.

The final grok talk is Rik Hepworth’s Lability.  Lability is a Powershell module, available via Github, that uses Azure’s DSC (Desired State Configuration) feature to facilitate the automated provisioning of complete development and testing environments using Windows Hyper-V.  The tool extends the Powershell DSC commands to add metadata that can be understood by the Lability tool to configure not only the virtual machines themselves (i.e. the host machine, networking etc.) but also the software that is deployed and configured on the virtual machines.  Lability can be used to automate such provisioning not only on Azure itself, but also on a local development machine.

IMG_20170610_141738After the grok talks were over, I had a little more time available in the lunch break to grab another quick drink before once again examining the agenda to see where to go for the first session of the afternoon, and penultimate session of the day.  This one was Ian Russell’s Strategic Domain-Driven Design.

Ian starts his talk by mentioning the “bible” for Domain Driven Design and that’s the book by Eric Evans of the same name.  Ian asks the audience who has read the book to which quite a few people raise their hands.  He then asks who has read past Chapter 14, to which a lot of people put their hands down.  Ian states that, if you’ve never read the book, the best way to get the biggest value from the book is to read the last 1/3rd of the book first and only then return to read the first 2/3rds!

So what is Domain-Driven Design?  It’s an abstraction of reality, attempting to model the real-world we see and experience before us in a given business domain.  Domain-Driven Design (DDD) tries to break down the domain into what is the “core” business domain – these are the business functions that are the very reason for a business’s being – and other domains that exist that are there to support the “core” domain.  One example could be a business that sells fidget spinners.  The actual domain logic involved with selling the spinners to customers would be the core domain, however, the same company may need to provide a dispute resolution service for unhappy customers.  The dispute resolution service, whilst required by the overall business, would not be the core domain but would be a supporting or generic domain.

DDD has a ubiquitous language.  This is the language of the business domain and not the technical domain.  Great care should be taken to not use technical terms when discussing domains with business stake holders, and the terms within the language should be ubiquitous across all domains and across the entire business.  Having this ubiquitous language reduces the chance of ambiguity and ensures that everyone can relate to specific component parts of the domain using the same terminology.  DDD has sub-domains, too.  These are the core domain – the main business functionality, the supporting domains – which exist solely to support the core, and generic domains - such as the dispute resolution domain, which both supports the core domain but is also generic and could apply to other businesses too.  DDD has bounded contexts.  These are like sub-domains but don’t necessarily map directly to sub-domains.  They are explicitly boundaries from other areas of the business.  Primarily, bounded contexts can be developed in software independently from each other.  They could be written by different development teams and could even use entirely different architectures and technologies in their construction.

Ian talks about driving out the core concepts by creating a “shared kernel”.  These are the concepts that exist in the overlaps between bounded contexts.  These concepts don’t have to be shared between the bounded contexts and they may be different – the concept of an “account” might mean different things within the finance bounded context to the concept of an “account” within the shipping bounded context, for example.  Ian talks about the concept of an “anti-corruption layer” as part of each bounded context.  This allows bounded contexts to communicate with items from the shared kernel but where those concepts actually do differ between contexts, the anti-corruption layer will prevent corruption from incorrect implementations of concepts being passed to it.  Ian mentions domain events next.  He says that these are something that are not within Eric Evans’ book but are often documented in other DDD literature.  Domain events are essentially just ”things that occur” within the domain. For example, a new customer registered on the company’s website is an domain event.  Events can be created by users, by the passage of time, by external system, or even by other domain events.

All of this is Strategic domain-driven design.  It’s the modelling and understanding of the business domain without ever letting any technological considerations interfere with the modelling and understanding.  It’s simply good architectural practice and the understanding of how different parts of the business domain interact with, and communicate with other parts of the business domain.

Ian suggests that there’s three main ways in which to achieve strategic domain-driven design.  These are Behavioural Driven Design (BDD), Prototyping and Event Storming.  BDD involves writing acceptance tests for our software application using a domain-specific language within our application code that allows the tests to use the ubiquitous language of the domain, rather than the explicitly technical language of the software solution.  BDD facilitates engagement with business stakeholders in the development of the acceptance tests that form part of the software solution.  One common way to do this is to run three amigos sessions which allow developers, QA/testers and domain experts to write the BDD tests, usually in the standard Given-When-Then style.   Prototyping consists of producing images and “wireframes” that give an impression of how a completed software application could look and feel.  Prototypes can be low-fidelity, just simple static images and mock-ups, but it’s better if you can produce high-fidelity prototypes, which allows varying levels of interaction with the prototype. Tools such as Balsamiq and InVision amongst others can help with the production of high-fidelity prototypes.  Event storming is a particular format of meeting or workshop that has developers and domain experts collaborating on the production of a large paper artefact that contains numerous sticky notes of varying colours.  The sticky notes’ colour represent various artifacts within the domain such as events, commands, users, external systems and others.  Sticky notes are added, amended, removed and moved around the paper on the wall by all meeting attendees.  The resulting sticky notes tends to naturally cluster into the various bounded contexts of the domain, allowing the overall domain design to emerge.  If you run your own Event Storming session, Ian suggests to start by trying to drive out the domain events first, and for each event, attempt to first work backwards to find the cause or causes of that event, then work forwards, investigating what this event causes - perhaps further events or the requirement for user intervention etc.

IMG_20170610_145159Ian rounds off his talk by sharing some of his own DDD best practices.  We should strive for creative collaboration between developers and domain experts at all stages of the project, and a fostering of an environment that allows exploration and experimentation in order to find the best model for the domain, which may not be the first model that is found.  Ian states that determining the explicit context boundaries are far more important than finding a perfect model, and that the focus should always be primarily on the core domain, as this is the area that will bring the greatest value to the business.

After Ian’s talk was over, it was time for another coffee break, the last of the day.  I grabbed a coffee and once again checked my agenda sheet to see where to head for the final session of the day.   This last session was to be Stuart Lang’s Async in C#, the Good, the Bad and the Ugly.

IMG_20170610_153023Stuart starts his session by asking the audience to wonder why he’s doing a session on Async in C#' today.  It’s 2017 and Async/Await has been around for over 6 years!  Simply put, Stuart says that, whilst Async/Await code is fairly ubiquitous these days, there are still many mistakes made with some implementations and the finer points of asynchronous code are not as well understood.  We learn how Async is really an abstraction, much like an ORM tool.  If you use it in a naïve way, it’s easy to get things wrong.

Stuart mentions Async’s good parts.  We can essentially perform non-blocking waiting for background I/O operations, thus allowing our user interfaces to remain responsive.  But then comes the bad parts.  It’s not always entirely clear what is asynchronous code.  If we call an Async method in someone else’s library, there’s no guarantee that what we’re really executing is asynchronous code, even if the method returns a Task<T>.  Async can sometimes leads to the need to duplicate code, for example, when your code has to provide both asynchronous and synchronous versions of each method.  Also, Async can’t be used everywhere.  It can’t be used in a class constructor or inside a lock statement.

IMG_20170610_155205One of the worst side-effects of poor Async implementation is deadlocks.  Stuart states that, when it comes to Async, doing it wrong is worse than not doing it at all!  Stuart shows us a simple Async method that uses some Task.Delay methods to simulate background work.  We then look at what the Roslyn compiler actually translates the code to, which is a large amount of code that effectively implements a complete state machine around the actual method’s code. 

IMG_20170610_160038Stuart then talks about Synchronization Context.  This is an often misunderstood part of Async that allows awaited code, that can be resumed on a different thread, to synchronize with other threads.  For example, if awaited code needs to update some element on the user interface in a WinForms or WPF application, it will need to synchronize that change to the UI thread, it can’t be performed by a thread-pool thread that the awaited code would be running on.  Stuart talks about blocking on Async code, for example:

var result = MyAsyncMethod().Result;

We should try to never do this!   Doing so can cause code within the Async method to block on the synchronization context as awaited code within the Async method may be trying to restart on the same synchronization context that is already blocked by the "outer" code that is performing the .Result on the main Async method.

Stuart then shows us some sample code that runs as an ASP.NET page with the Async code being called from within the MVC controller.  He outputs the threads used by the Async code to the browser, and we then examine variations of the code to see how each awaited piece of code uses either the same or different threads.  One way of overcoming the blocking issue when using .Result at the end of an Async method call is to write code similar to:

var result = Task.Run(() => MyAsyncMethod().Result).Result;

It's messy code, but it works.  However, code like this should be heavily commented because if someone removes that outer Task.Run(); the code will start blocking again and will fail miserably.

Stuart then talks about the vs-threading library which provides a JoinableTaskFactory.   Using code such as:

jtf.Run(() => MyAsyncMethod())

ensures that awaited code resumes on the same thread that's blocked, so Stuart shows the output from his same ASP.NET demo when using the JoinableTaskFactory and all of the various awaited blocks of code can be seen to always run and resume on the same thread.

IMG_20170610_163019Finally, Stuart shares some of the best practices around deadlock prevention.  He asserts that the ultimate goal for prevention has to be an application that can provide Async code from the very “bottom” level of the code (i.e. that which is closest to I/O) all the way up to the “top” level, where it’s exposed to other client code or applications.

IMG_20170610_164756After Stuart’s talk is over, it’s time for the traditional prize/swag giving ceremony.  All of the attendees start to gather in the main conference lounge area and await the attendees from the other sessions which are finishing shortly afterwards.  Once all sessions have finished and the full set of attendees are gathered, the main organisers of the conference take time to thank the event sponsors.  There’s quite a few of them and, without them, there simply wouldn’t be a DDD conference.  For this, the sponsors get a rapturous round of applause from the attendees.

After the thanks, there’s a small prize giving ceremony with many of the prizes being given away by the sponsors themselves.  Like most times, I don’t win a prize – given that there was only around half a dozen prizes, I’m not alone!

It only remained for the organisers to announce the next conference in the DDD calendar, which although doesn’t have a specific date at the moment, will take place in October 2017.  This one is DDD North.  There’s also a surprise announcement of a DDD in Dublin to be held in November 2017 which will be a “double capacity” event, catering for around 600 – 800 conference delegates.  Now there’s something to look forward to!

So, another wonderful DDD conference was wrapped up, and there’s not long to wait until it’s time to do it all over again!

UPDATE (20th June 2017):

As last year, Kevin O'Shaughnessy was also in attendance at DDD12 and he has blogged his own review and write-up of the various sessions that he attended.  Many of the sessions attended by Kevin were different than the sessions that I attended, so check out his blog for a fuller picture of the day's many great sessions.

 

DDD South West 2017 In Review

IMG_20170506_090757This past Saturday 6th May 2017, the seventh iteration of the DDD South West conference was held in Bristol, at the Redcliffe Sixth Form Centre.

This was my second DDD South West conference that I’d attended, having attended the prior DDD South West conference in 2015 (they’d had a year off in 2016), so returning the 2017, the DDD South West team had put on another fine DDD conference.

Since the conference is in Bristol, I’d stayed over the evening before in Gloucester in order to break up the long journey down to the south west.  After an early breakfast on the Saturday morning, I made the relatively short drive from Gloucester to Bristol in order to attend the conference.

IMG_20170506_085557Arriving at the Redcliffe centre, we were signed in and made our way to the main hall where we were treated to teas, coffees and danish pastries.  I’d remembered these from the previous DDDSW conference, and they were excellent and very much enjoyed by the arriving conference attendees.

After a short while in which I had another cup of coffee and the remaining attendees arrived at the conference, we had the brief introduction from the organisers, running through the list of sponsors who help to fund the event and some other house-keeping information.  Once this was done it was time for us to head off to the relevant room to attend our first session.  DDDSW 7 had 4 separate tracks, however, for 2 of the event slots they also had a 5th track which was the Sponsor track – this is where one of the technical people from their primary sponsor gives a talk on some of the tech that they’re using and these are often a great way to gain insight into how other people and companies do things.

I heading off down the narrow corridor to my first session.  This was Ian Cooper’s 12-Factor Apps In .NET.

IMG_20170506_092930Ian starts off my introducing the 12 factors, these are 12 common best practices for building web based applications, especially in the modern cloud-first world.  There’s a website for the 12 factors that details them.  These 12 factors and the website itself were first assembled by the team behind the Heroku cloud application platform as the set of best practices that most of the most successful applications running on their platform had.

Ian states how the 12 factors cover 3 broad categories: Design, Build & Release and Management.  We first look at getting setup with a new project when you’re a new developer on a team.  Setup automation should be declarative.  It should ideally take no more than 10 minutes for a new developer to go from a clean development machine to a working build of the project.  This is frequently not the case for most teams, but we should work to reduce the friction and the time it takes to get developers up and running with the codebase.  Ian also state that a single application, although it may consist of many different individual projects and “moving parts” should remain together in a single source code repository.  Although you’re keeping multiple distinct parts of the application together, you can still easily build and deploy the individual parts independently based upon individual .csproj files.

Next, we talk about application servers.  We should try to avoid application servers if possible.  What is an application server?  Well, it can be many things, but the most obvious one in the world of .NET web applications is Internet Information Server itself.  This is an application server and we often require this for our code to run within it.  IIS has a lot of configuration of it’s own and so this means that our code is then tied to and dependent upon that specifically configured application server.  An alternative to this is have our code self-host, which is entirely possible with ASP.NET Web API and ASP.NET Core.  By self-hosting and running our endpoint of a specific port to differentiate it from other services that may also potentially run on the same server, we ensure our code is far more portable, and platform agnostic.

Ian then talks about backing services.  These are the concerns such as databases, file systems etc. that our application will inevitably need to talk to.  But we need to ensure that we treat them as the potentially ephemeral systems that they are, and therefore accept that such a system may not even have a fixed location.  Using a service discovery service, such as Consul.io, is a good way to remove our application’s dependence on a a required service being in a specific place.

Ian mentions the port and adapters architecture (aka hexagonal architecture) for organising our codebase itself.  He likes this architecture as it’s not only a very clean way to separate concerns, and keep the domain at the heart of the application model, it also works well in the context of a “12-factor” compliant application as the terminology (specifically around the use of “ports”) is similar and consistent.  We then look at performance of our overall application.  Client requests to our front-end website should be responded to in around 200-300 milliseconds.  If, as part of that request, there’s some long-running process that needs to be performed, we should offload that processing to some background task or external service, which can update the web site “out-of-band” when the processing is complete, allowing the front-end website to render the initial page very quickly.

This leads on to talking about ensuring our services can start up very quickly, ideally no more than a second or two, and should shutdown gracefully also.  If we have slow startup, it’s usually because of the need to built complex state, so like the web front-end itself, we should offload that state building to a background task.  If our service absolutely needs that state before it can process incoming requests, we can buffer or queue early request that the service receives immediately after startup until our background initialization is complete.  As an aside, Ian mentions supervisorD as a handy tool to keep our services and processes alive.  One great thing about keeping our startup fast and lean, and our shutdown graceful, we essentially have elastic scaling with that service!

Ian now starts to show us some demo code that demonstrates many of the 12 best practices within the “12-factors”.  He uses the ToDoBackend website as a repository of sample code that frequently follows the 12-factor best practices.  Ian specifically points out the C# ASP.NET Core implementation of ToDoBackend as it was contributed by one of his colleagues.  The entire ToDoBackend website is a great place where many backend frameworks from many different languages and platforms can be compared and contrasted.

Ian talks about the ToDoBackend ASP.NET Core implementation and how it is built to use his own libraries Brighter and Darker.  These are open-source libraries implementing the command dispatcher/processor pattern which allow building applications compliant with the CQRS principle, effectively allowing a more decomposed and decoupled application.  MediatR is a similar library that can be used for the same purposes.  The Brighter library wraps the Polly library which provides essential functionality within a highly decomposed application around improved resilience and transient fault handling, so such things as retries, the circuit breaker pattern, timeouts and other patterns are implemented allowing you to ensure your application, whilst decomposed, remains robust to transient faults and errors – such as services becoming unavailable.

Looking back at the source code, we should ensure that we explicitly isolate and declare our application’s dependencies.  This means not relying on pre-installed frameworks or other library code that already exists on a given machine.  For .NET applications, this primarily means depending on only locally installed NuGet packages and specifically avoiding referencing anything installed in the GAC.  Other dependencies, such as application configuration, and especially configuration that differs from environment to environment – i.e. database connection strings for development, QA and production databases, should be kept within the environment itself rather than existing within the source code.  We often do this with web.config transformations although it’s better if our web.config merely references external environment variables with those variables being defined and maintained elsewhere. Ian mentions HashiCorp’s Vault project and the Spring Boot projects as ways in which you can maintain environment variables and other application “secrets”.  An added bonus of using such a setup is that your application secrets are never added to your source code meaning that, if your code is open source and available on something like GitHub, you won’t be exposing sensitive information to the world!

Finally, we turn to application logging.  Ian states how we should treat all of our application’s logs as event streams.  This means we can view our logs as a stream of aggregated, time-ordered events from the application and should flow continuously so long as the application is running.  Logs from different environments should be kept as similar as possible. If we do both of these things we can store our logs in a system such as Splunk, AWS CloudWatch Logs, LogEntries or some other log storage engine/database.  From here, we can easily manipulate and visualize our application’s behaviour from this log data using something like the ELK stack.

After a few quick Q&A’s, Ian’s session was over and it was time to head back to the main hall where refreshments awaited us.  After a nice break with some more coffee and a few nice biscuits, it was quickly time to head back through the corridors to the rooms where the sessions were held for the next session.  For this next session, I decided to attend the sponsors track, which was Ed Courtenay’s “Growing Configurable Software With Expression<T>

IMG_20170506_105002Ed’s session was a code heavy session which used a single simple application based around filtering a set of geographic data against a number of filters.  The session looked at how configurability of the filtering as well as performance could be improved with each successive iteration refactoring the application to using an ever more “expressive” implementation.  Ed starts by asking what we define as configurable software.  We all agree that software that can change it’s behaviour at runtime is a good definition, and Ed says that we can also think of configurable software as a way to build software from individual building blocks too.  We’re then introduced to the data that we’ll be using within the application, which is the geographic data that’s freely available from geonames.org.

From here, we dive straight into some code.  Ed shows us the IFilter interface that exposes a single method which provide a filter function:

using System;

namespace ExpressionDemo.Common
{
    public interface IFilter
    {
        Func<IGeoDataLocation, bool> GetFilterFunction();
    }
}

The implementation of the IFilter interface is fairly simple and combines a number of actual filter functions that filter the geographic data by various properties, country, classification, minimum population etc.

public Func<IGeoDataLocation, bool> GetFilterFunction()
{
    return location => FilterByCountryCode(location) && FilterByClassification(location)
    && FilterByPopulation(location) && FilterByModificationDate(location);
}

Each of the actual filtering functions is initially implemented as a simple loop, iterating over the set of allowed data (i.e. the allowed country codes) and testing the currently processed geographic data row against the allowed values to determine if the data should be kept (returned true from the filter function) or discarded (returning false):

private bool FilterByCountryCode(IGeoDataLocation location)
{
	if (!_configuration.CountryCodes.Any())
		return true;
	
	foreach (string countryCode in _configuration.CountryCodes) {
		if (location.CountryCode.Equals(countryCode, StringComparison.InvariantCultureIgnoreCase))
			return true;
	}
	return false;
}

Ed shows us his application with all of his filters implemented in such a way and we see the application running over a reduced geographic data set of approximately 10000 records.  We process all of the data in around 280ms, however Ed tells us that it's really a very naive implementation and with only a small improvement in the filter implementations, we can improve this.  From here, we look at the same filters, this time implemented as a Func<T, bool>:

private Func<IGeoDataLocation, bool> CountryCodesExpression()
{
	IEnumerable<string> countryCodes = _configuration.CountryCodes;

	string[] enumerable = countryCodes as string[] ?? countryCodes.ToArray();

	if (enumerable.Any())
		return location => enumerable
			.Any(s => location.CountryCode.Equals(s, StringComparison.InvariantCultureIgnoreCase));

	return location => true;
}

We're doing the exact same logic, but instead of iterating over the allow country codes list, we're being more declarative and simply returning a Func<> which performs the selection/filter for us.  All of the other filters are re-implemented this way, and Ed re-runs his application.  This time, we process the exact same data in around 250ms.  We're starting to see an improvement.  But we can do more.  Instead of returning Func<>'s we can go even further and return an Expression.

We pause looking at the code to discuss Expressions for a moment.  These are really "code-as-data".  This means that we can decompose our algorithm that performs some specific type of filtering and can then express that algorithm as a data structure or tree.  This is a really powerful way or expressing our algorithm and allows our application simply functionally "apply" our input data to the expression rather than having to iterate or loop over lists as we had in the very first implementation.  What this means for our algorithms is increased flexibility in the construction of the algorithm, but also increase performance.  Ed shows us the implementation filter using an Expression:

private Expression<Func<IGeoDataLocation, bool>> CountryCodesExpression()
{
	var countryCodes = _configuration.CountryCodes;

	var enumerable = countryCodes as string[] ?? countryCodes.ToArray();
	if (enumerable.Any())
		return enumerable.Select(CountryCodeExpression).OrElse();

	return location => true;
}

private static Expression<Func<IGeoDataLocation, bool>> CountryCodeExpression(string code)
{
	return location => location.CountryCode.Equals(code, StringComparison.InvariantCultureIgnoreCase);
}

Here, we've simply taken the Func<> implementation and effectively "wrapped" the Func<> in an Expression.  So long as our Func<> is simply a single expression statement, rather than a multi-statement function wrapped in curly brackets, we can trivially turn the Func<> into an

Expression:

public class Foo
{
	public string Name
	{
		get { return this.GetType().Name; }
	}
}

// Func<> implementation
Func<Foo,string> foofunc = foo => foo.Name;
Console.WriteLine(foofunc(new Foo()));

// Same Func<> as an Expression
Expression<Func<Foo,string>> fooexpr = foo => foo.Name;
var func = fooexpr.Compile();
Console.WriteLine(func(new Foo()));

Once again, Ed refactors all of his filters to use Expression functions and we again run the application.  And once again, we see an improvement in performance, this time processing the data in around 240ms.  But, of course, we can do even better!

The next iteration of the application has us again using Expressions, however, instead of merely wrapping the previously defined Func<> in an Expression, this time we're going to write the Expressions from scratch.  At this point, the code does admittedly become less readable as a result of this, however, as an academic exercise in how to construct our filtering using ever more lower-level algorithm implementations, it's very interesting.  Here's the country filter as a hand-rolled Expression:

private Expression CountryCodeExpression(ParameterExpression location, string code)
{
	return StringEqualsExpression(Expression.Property(location, "CountryCode"), Expression.Constant(code));
}

private Expression StringEqualsExpression(Expression expr1, Expression expr2)
{
	return Expression.Call(typeof (string).GetMethod("Equals",
		new[] { typeof (string), typeof (string), typeof (StringComparison) }), expr1, expr2,
		Expression.Constant(StringComparison.InvariantCultureIgnoreCase));
}

Here, we're calling in the Expression API and manually building up the expression tree with our algorithm broken down into operators and operands.  It's far less readable code, and it's highly unlikely you'd actually implement your algorithm in this way, however if you're optimizing for performance, you just might do this as once again we run the application and observe the results.  It is indeed slightly faster than the previous Expression function implementation, taking around 235ms.  Not a huge improvement over the previously implementation, but an improvement nonetheless.

Finally, Ed shows us the final implementation of the filters.  This is the Filter Implementation Bridge.  The code for this is quite complex and hairy so I've not reproduced it here.  It is, however, available in Ed's Github repository which contains the entire code base for the session.

The Filter Implementation Bridge involves building up our filters as Expressions once again, but this time we go one step further and write the resulting code out to a pre-compiled assembly.  This is a separate DLL file, written to disk, which contains a library of all of our filter implementations.  Our application code is then refactored to load the filter functions from the external DLL rather than expecting to find them within the same project.  Because the assembly is pre-compiled and JIT'ed, when we invoke the assembly's functions, we should see another performance improvement.  Sure enough, Ed runs the application after making the necessary changes to implement this and we do indeed see an improvement in performance.  This time processing the data set in around 210ms.

Ed says that although we've looked at performance and the improvements in performance with each successive refactor, the same refactorings have also introduced flexibility in how our filtering is implemented and composed.  By using Expressions, we can easily change the implementation at run-time.  If we have these expressions exported into external libraries/DLL's, we could fairly trivially decide to load a different DLL containing a different function implementation.  Ed uses the example of needing to calculate VAT and the differing ways in which that calculation might need to be implemented depending upon country.  By adhering to a common interface and implementing the algorithm as a generic Expression, we can gain both flexibility and performance in our application code.

After Ed’s session it time for another refreshment break.  We heading back through the corridors to the main hall for more teas, coffees and biscuits.  Again, with only enough time for a quick cup of coffee, it was time again to trek back to the session rooms for the 3rd and final session before lunch.  This one was Sandeep Singh’s “JavaScript Services: Building Single-Page Applications With ASP.NET Core”.

IMG_20170506_120854Sandeep starts his talk with an overview of where Single Page Applications (SPA’s) are currently.  There’s been an explosion of frameworks in the last 5+ years that SPA’s have been around, so there’s an overwhelming choice of JavaScript libraries to choose from for both the main framework (i.e. Angular, Aurelia, React, VueJS etc.) and for supporting JavaScript libraries and build tools.  Due to this, it can be difficult to get an initial project setup and having a good cohesion between the back-end server side code and the front-end client-side code can be challenging.  JavaScriptServices is a part of the ASP.NET Core project and aims to simplify and harmonize development for ASP.NET Core developers building SPA’s.  The project was started by Steve Sanderson, who was the initial creator of the KnockoutJS framework.

IMG_20170506_121721Sandeep talks about some of the difficulties with current SPA applications.  There’s often the slow initial site load whilst a lot of JavaScript is sent to the client browser for processing and eventual display of content, and due to the nature of the application being a single page and using hash-based routing for each page’s URL, and each page being loaded because of running some JavaScript, SEO (Search Engine Optimization) can be difficult.  JavaScriptServices attempts to improve this by combining a set of SPA templates along with some SPA Services, which contains WebPack middleware, server-side pre-rendering of client-side code as well as routing helpers and the ability to “hot swap” modules on the fly.

WebPack is the current major module bundler for JavaScript, allowing many disparate client side asset files, such as JavaScript, images, CSS/LeSS,Sass etc. to be pre-compiled/transpiled and combined in a bundle for efficient delivery to the client.  The WebPack middleware also contains tooling similar to dotnet watch, which is effectively a file system watcher and can rebuild, reload and re-render your web site at development time in response to real-time file changes.  The ability to “hot swap” entire sets of assets, which is effectively a whole part of your application, is a very handy development-time feature.  Sandeep quickly shows us a demo to explain the concept better.  We have a simple SPA page with a button, that when clicked will increment a counter by 1 and display that incrementing count on the page.  If we edit our Angular code which is called in response to the button click, we can change the “increment by 1” code to say “increment by 10” instead.  Normally, you’d have to reload your entire Angular application before this code change was available to the client browser and the built-in debugging tools there.  Now, however, using JavaScript services hot swappable functionality, we can edit the Angular code and see it in action immediately, even continuing the incrementing counter without it resetting back to it’s initial value of 0!

Sandeep continues by discussing another part of JavaScriptServices which is “Universal JavaScript”, also known as “Isomorphic JavaScript”.  This is JavaScript which is shared between the client and the server.  How does JavaScriptServices deal with JavaScript on the server-side?   It uses a .NET library called Microsoft.AspNetCore.NodeServices which is a managed code wrapper around NodeJS.  Using this functionality, the server can effectively run the JavaScript that is part of the client-side code, and send pre-rendered pages to the client.  This is a major factor in being able to improve the initial slow load time of traditional SPA applications.  Another amazing benefit of this functionality is the ability to run an SPA application in the browser even with JavaScript disabled!  Another part of the JavaScriptServices technology that enables this functionality is RoutingServices, which provides handling for static files, and a “fallback” route for when explicit routes aren’t declared.  This means that a request to something like http://myapp.com/orders/ would be rendered on the client-side via the client JavaScript framework code (i.e. Angular) after an AJAX request to retrieve the mark-up/data from the server, however, if client-side routing is unavailable to process such a route (in the case that JavaScript is disabled in the browser, for example) then the request is sent to the server so that the server may render the same page (perhaps requiring client-side JavaScript which is now processed on the server!) on the server before sending the result back to the client via the means of a standard HTTP(s) request/response cycle.

I must admit, I didn’t quite believe this, however, Sandeep quickly setup some more demo code and showed us the application working fine with a JavaScript enabled browser, whereby each page would indeed be rendered by the JavaScript of the client-side SPA.  He then disabled JavaScript in his browser and re-loaded the “home” page of the application and was able to demonstrate still being able to navigate around the pages of the demo application just as smoothly (due to the page’s having been rendered on the server and sent to the client without needing further JavaScript processing to render them).  This was truly mind blowing stuff.  Since the server-side NodeServices library wraps NodeJS on the server, we can write server-side C# code and call out to invoke functionality within any NodeJS package that might be installed.  This means that if have a NodeJS package that, for example, renders some HTML markup and converts it to a PDF, we can generate that PDF with the Node package from C# code, sending the resulting PDF file to the client browser via a standard ASP.NET mechanism.  Sandeep wraps up his talk by sharing some links to find out more about JavaScriptServices and he also provides us with a link to his GitHub repository for the demo code he’s used in his session.

IMG_20170506_130734After Sandeep’s talk, it was time to head back to the main conference area as it was time for lunch.  The food at DDDSW is usually something special, and this year was no exception.  Being in the South West of England, it’s obligatory to have some quintessentially South West food – Pasties!  A choice of Steak or Cheese and Onion pasty along with a packet of crisps, a chocolate bar and a piece of fruit and our lunches were quite a sizable portion of food.  After the attendees queued for the pasties and other treats, we all found a place to sit in the large communal area and ate our delicious lunch.

IMG_20170506_130906The lunch break at DDDSW was an hour and a half and usually has some lightning or grok talks that take place within one of the session rooms over the lunch break.  Due to the sixth form centre not being the largest of venues (some of the session rooms were very full during some sessions and could get quite hot) and the weather outside turning into being a rather nice day and keeping us all very warm indeed, I decided that it would be nice to take a walk around outside to grab some fresh air and also to help work off a bit of the ample lunch I’d just enjoyed.

I decided a brief walk around the grounds of the St. Mary Redcliffe church would be quite pleasant and so off I went, capturing a few photos along the way.  After walking around the grounds, I still had some time to kill so decided to pop down to the river front to wander along there.  After a few more minutes, I stumbled across a nice old fashioned pub and so decided that a cheeky half pint of ale would go down quite nice right about now!  I ordered my ale and sat down in the beer garden overlooking the river watching the world go by as I quaffed!  A lovely afternoon lunch break it was turning out to be.

IMG_20170506_134245-EFFECTSAfter a little while longer it was time to head back to the Redcliffe sixth form centre for the afternoon’s sessions.  I made my way back up the hill, past the church and back to the sixth form centre.  I grabbed a quick coffee in the main conference hall area before heading off down the corridor to the session room for the first afternoon session.  This one was Joel Hammond-Turner’s “Service Discovery With Consul.io for the .NET Developer

IMG_20170506_143139Joel’s session is about the Consul.io software used for service discovery.  Consul.io is a distributed application that allows applications running on a given machine, to query consul for registered services and find out where those services are located either on a local network or on the wider internet.  Consul is fully open source and cross-platform.  Joel tells us the Consul is a service registry – applications can call Consul’s API to register themselves as an available service at a known location – and it’s also service discovery – applications can run a local copy of Consul which will synchronize with other instances of Consul on other machines in the same network to ensure the registry is up-to-date and can query Consul for a given services’ location.  Consul also includes a built-in DNS Server, a Key/Value store and a distributed event engine.  Since Consul requires all of these features itself in order to operate, they are exposed to the users of Consul for their own use.  Joel mentions how his own company, Landmark, is using Consul as part of a large migration project aimed at migrating a 30 year old C++ legacy system to a new event-driven distributed system written in C# and using ASP.NET Core.

Joel talks about distributed systems and they’re often architected from a hardware or infrastructure perspective.  Often, due to each “layer” of the solution requiring multiple servers for failover and redundancy, you’ll need a load balancer to direct requests to this layer to a given server.  If you’ve got web, application and perhaps other layers, this often means load balancers at every layer.  This is not only potentially expensive – especially if your load balancers are separate physical hardware – it also complicates your infrastructure.  Using Consul can help to replace the majority of these load balancers since Consul itself acts as a way to discover a given service and can, when querying Consul’s built-in DNS server, randomize the order in which multiple instances of a given service are reported to a client.  In this way, requests can be routed and balanced between multiple instances of the required service.  This means that load balancing is reduced to a single load balancer at the “border” or entry point of the entire network infrastructure.

Joel proceeds by showing us some demo code.  He starts by downloading Consul.  Consul runs from the command line.  It has a handy built-in developer mode where nothing is persisted to disk, allowing for quick and easy evaluation of the software.  Joel say how Consul command line API is very similar to that of the source control software, Git, where the consul command acts as an introducer to other commands allowing consul to perform some specific function (i.e. consul agent –dev starts the consul agent service running in developer mode).

The great thing with Consul is that you’ll never need to know where on your network the consul service is itself.  It’s always running on localhost!  The idea is that you have Consul running on every machine on localhost.  This way, any application can always access Consul by looking on localhost on the specific Consul port (8300 by default).  Since Consul is distributed, every copy of Consul will synchronize with all others on the network ensuring that the data is shared and registered services are known by each and every running instance of Consul.  The built in DNS Server will provide your consuming app with the ability to get the dynamic IP of the service my using a DNS name of something like [myservicename].services.consul.  If you've got multiple servers for that services (i.e. 3 different IP's that can provide the service) registered, the built-in DNS service will automagically randomize the order in which those IP addresses are returned to DNS queries - automatic load-balancing for the service!

Consuming applications, perhaps written in C#, can use some simple code to both register as a service with Consul and also to query Consul in order to discover services:

public void Configure(IApplicationBuilder app, IHostingEnvironment env, ILoggerFactory loggerFactory, IApplicationLifetime lifetime)
{
	loggerFactory.AddConsole(Configuration.GetSection("Logging"));
	loggerFactory.AddDebug();
	app.UseMvc();
	var serverAddressFeature = (IServerAddressesFeature)app.ServerFeatures.FirstOrDefault(f => f.Key ==typeof(IServerAddressesFeature)).Value;
	var serverAddress = new Uri(serverAddressFeature.Addresses.First());
	// Register service with consul
	var registration = new AgentServiceRegistration()
	{
		ID = $"webapi-{serverAddress.Port}",
		Name = "webapi",
		Address = $"{serverAddress.Scheme}://{serverAddress.Host}",
		Port = serverAddress.Port,
		Tags = new[] { "Flibble", "Wotsit", "Aardvark" },
		Checks = new AgentServiceCheck[] {new AgentCheckRegistration()
		{
			HTTP = $"{serverAddress.Scheme}://{serverAddress.Host}:{serverAddress.Port}/api/health/status",
			Notes = "Checks /health/status on localhost",
			Timeout = TimeSpan.FromSeconds(3),
			Interval = TimeSpan.FromSeconds(10)
		}}
	};
	var consulClient = app.ApplicationServices.GetRequiredService<IConsulClient>();
	consulClient.Agent.ServiceDeregister(registration.ID).Wait();
	consulClient.Agent.ServiceRegister(registration).Wait();
	lifetime.ApplicationStopping.Register(() =>
	{
		consulClient.Agent.ServiceDeregister(registration.ID).Wait();
	});
}

Consul’s services can be queried based upon an exact service name or can be queries based upon a tag (registered services can assign multiple arbitrary tags upon registration to themselves to aid discovery).

As well as service registration and discovery, Consul also provides service monitoring.  This means that Consul itself will monitor the health of your service so that should one of the instances of a given service become unhealthy or unavailable, Consul will prevent that service’s IP Address from being served up to consuming clients when queried.

Joel now shared with us some tips for using Consul from within a .NET application.  He says to be careful of finding Consul registered services with .NET’s built-in DNS resolver.  The reason for this is that .NET’s DNS Resolver is very heavily cached, and it may serve up stale DNS data that may not be up to date with the data inside Consul’s own DNS service.  Another thing to be aware of is that Consul’s Key/Value store will always store values as byte arrays.  This can sometimes be slightly awkward if we’re storing mostly strings inside, however, it’s trivial to write a wrapper to always convert from a byte array when querying the Key/Value store.  Finally, Joel tells us about a more advanced feature of Consul which is the “watches”.  Theses are effectively events that Consul will fire when Consul’s own data changes.  A good use for this would be to have some code that runs in response to one of these events that can re-write the border load balancer rules to provide you with a fully automatic means of keeping your network infrastructure up-to-date and discoverable.

In wrapping up, Joel shares a few links to his GitHub repositories for both his demo code used during his session and the slides.

IMG_20170506_154655After Joel’s talk it was time for another refreshment break.  This one being the only one of the afternoon was accompanied by another DDD South West tradition – cream teas!  These went down a storm with the conference attendees and there was so many of them that many people were able to have seconds.  There was also some additional fruit, crisps and chocolate bars left over from lunch time, which made this particular break quite gourmand.

After a few cups of coffee which were required to wash down the excellent cream teas and other snack treats, it was time to head back down the corridor to the session rooms for the final session of the day.  This one was Naeem Sarfraz’s “Layers, Abstractions & Spaghetti Code: Revisiting the Onion Architecture”.

Naeem’s talk was a retrospective of the last 10+ years of .NET software development.  Not only a retrospective of his own career, but he posits that it’s quite probably a retrospective of the last 10 years of many people’s careers.  Naeem starts by talking about standards.  He asks us to cast our minds back to when we started new jobs earlier in our career and that perhaps one of the first things that we had to do in starting in the job was to read page after page of “standards” documents for the new team that we’d joined.  Coding standards, architecture standards etc. How times have changed nowadays where the focus is less on onerous documentation but more of expressive code that is easier to understand as well as such practices as pair programming allowing new developers to get up to speed with a new codebase quickly without the need to read large volumes of documentation.

IMG_20170506_160127Naeem then moves on to show us some of the code that he himself wrote when he started his career around 10 years ago.  This is typical of a lot of code that many of us write when we first started developing, and has methods which are called in response to some UI event that has business logic, database access and web code (redirections etc.) all combined in the same method!

Naeem says he quickly realised that this code was not great and that he needed to start separating concerns.  At this time, it seemed that N-Tier became the new buzzword and so newer code written was compliant with an N-Tier architecture.  A UI layer that is decoupled from all other layers and only talks to the Business layer which in turn only talks to the Data layer.  However, this architecture was eventually found to be lacking too.

Naeem stops and asks us about how we decompose our applications.  He says that most of the time, we end up with technical designs taking priority over everything else.  We also often find that we can start to model our software by starting at the persistence (database) layer first, this frequently ends up bleeding into the other layers and affects the design of them.  This is great for us as techies, but it’s not the best approach for the business.  What we need to do is start to model our software on the business domain and leave technical designs for a later part of the modelling process.

Naeem shows us some more code from his past.  This one is slightly more separated, but still has user interface elements specifically designed so that rows of values from a grid of columns are designed in such a way that they can be directly mapped to a DataTable type, allowing easy interaction with the database.  This is another example of a data-centric approach to the entire application design.

IMG_20170506_162337We then move on to look at another way of designing our application architecture.  Naeem tells us how he discovered the 4+1 approach, which has us examining the software we’re building by looking at it from different perspectives.  This helps to provide a better, more balanced view of what we’re seeking to achieve with it’s development.

The next logical step along the architecture path is that of Domain-Driven Design.  This takes the approach of placing the core business domain, described with a ubiquitous language – which is always in the language of the business domain, not the language of any technical implementation – at the very heart of the entire software model.  One popular way of using domain-driven design (DDD) is to adopt an architecture called “Onion architecture”.  This is also known as “Hexagonal architecture” or “Ports and adapters architecture”. 

We look at some more code, this time code that is compliant with DDD.  However, on closer inspection, it really isn’t.  He says how this code adopts all the right names for a DDD compliant implementation, however, it’s only the DDD vernacular that’s been adopted and the code still has a very database-centric, table-per-class type of approach.  The danger here is that we can appear to be following DDD patterns, but we’re really just doing things like implementing excessive interfaces on all repository classes or implementing generic repositories without really separating and abstracting our code from the design of the underlying database.

Finally, we look at some more code, this time written by Greg Young and adopting a CQRS based approach.  The demo code is called SimplestPossibleThing and is available at GitHub.  This code demonstrates a much cleaner and abstracted approach to modelling the code.  By using a CQRS approach, reads and writes of data are separated also, with the code implementing Commands - which are responsible for writing data and Queries – which are responsible for reading data.  Finally, Naeem points us to a talk given by Jimmy Bogard which talks about architecting application in “slices not layers”.  Each feature is developed in isolation from others and the feature development includes a “full stack” of development (domain/business layer, persistence/database layer and user interface layers) isolated to that feature.

IMG_20170506_170157After Naeem’s session was over, it was time for all the attendees to gather back in the main conference hall for the final wrap-up by the organisers and the prize draw.  After thanking the various people involved in making the conference what it is (sponsors, volunteers, organisers etc.) it was time for the prize draw.  There were some good prizes up for grabs, but alas, I wasn’t to be a winner on this occasion.

Once again, the DDDSW conference proved to be a huge success.  I’d had a wonderful day and now just had the long drive home to look forward to.  I didn’t mind, however, as it had been well worth the trip.  Here’s hoping the organisers do it all all over again next year!

SqlBits 2017 In Review

IMG_20170408_073513This past Saturday 8th April 2017, the annual SqlBits conference took place in the International Centre in Telford, Shropshire.  The event is a four day conference, with the first three days being a paid conference and the final day, the Saturday, always being a free community day.

I’d had to get up quite early for this event, setting my alarm for 5:30am to allow me to get my all-important cup of coffee before setting off for the approximately 90 mile journey to arrive at the conference for the opening time of 7:30am!

IMG_20170408_074118After arriving at the venue, which on this occasion had free car parking in the venue’s ample car park, I heading indoors to search for the registration booth.  It appeared that I’d arrived slightly early as I joined a small crowd that was gathered in one of the venue’s corridors just outside the entrance to a large hall where the event itself was taking place.  After a short wait, we were informed that we could enter the hall and proceed to the registration booths on the way in.

After registering and receiving my badge, conference programme and goodie bag, I heading to the catering area for another all-important cup of coffee and some breakfast.

IMG_20170408_074857This year, having recently become vegetarian, I knew the almost obligatory bacon butties would not be an option, so I quickly acquired a cup of coffee (the most important thing) and searched for the vegetarian breakfast option.  The choice was of either a fruit smoothie or a fruit bowl.  I selected the fruit bowl and made my way to one of the many tables dotted around the venue to consume my breakfast and take a peek through the goodie bag!

After a short while, an announcement came over a venue wide loudspeaker system to tell the attendees that the first sessions would be starting in 10 minutes.  This year, there were eight tracks of talks, each one presented in one of eight separate domes scattered throughout the venue.  I quickly finished my coffee, collected my things and made my way to Dome 3 for my first session, Ust Oldfield’s “A Deep Dive Into Data Lakes”.

IMG_20170408_081230Ust first introduced himself as a consultant working for Adatis who provide Business Intelligence (BI) consultancy services.  He says that, as a company, they were fully conversant with standard data warehouses, but needed to move forward in order to understand the relatively recent phenomenon of data lakes on the Microsoft Azure platform.

Ust asks the audience who already knows about Data Lakes, not many of us do, so he asks if we’re familiar with Hadoop – the Apache Foundations distributed computing framework – to which a few more people are familiar.  Ust explains that, under the hood, Azure’s Data Lake is a combination of distributed Azure BLOB storage (and so can work with any file type or size) with Hadoop overlaid on top to provide the distributed compute capability.

Azure Data Lake (hereafter referred to as ADL) uses things called “Extents” to contain its data, which are 250MB blocks that all storage is divided into.  Ust explains that ADL uses the “lambda” architecture and allows users to perform computations and queries using a language called U-SQL, which he says is like a cross between T-SQL used in SQL Server and C#.  All of the files that are added to a data lake can be set to automatically expire and be deleted, and so ADL contains functionality to allow some automated maintenance of the data held within it.

All data warehouses and lakes in ADL go through three stages which are the ingestion of raw data, and enrichment phase (where the data is verified, de-duplicated, cleaned and augmented with additional data from other sources) and finally is curated and presented for user consumption.  U-SQL scripts specify how the raw input data is transformed into output data and U-SQL includes both traditional ways to select and filter raw data, similar to how SQL Server would provide such functionality, but also includes other methods of transforming data more specific to distributed data sets, such as MapReduce

ADL provides a dashboard within the Azure website where ADL can be accessed and scripts created and run against the ADL data, however, there is also Microsoft Visual Studio tooling available so that many of the ADL functions can be accessed through Visual Studio.  One very interesting feature is that U-SQL scripts that would normally be confined to running within Azure can be downloaded to a local machine and debugged using Microsoft Visual Studio and it’s important to note that some functionality of ADL can only be accessed via the ADL Visual Studio tooling.

When performing queries against ADL data, U-SQL scripts are split and parallelized into multiple “vertexes”, which are the discrete units of computation within ADL.  Each vertex can be independent or dependent upon a previous vertex completing it’s computation.  You can manage vertexes and their dependencies within ADL, but this is a piece of functionality only available within the ADL Visual Studio tooling.

Ust shows us a demo of some U-SQL queries running over some sample ADL data.  He demonstrates how, despite ADL like almost all Azure features is a pay-as-you-go service, reworking your queries to be longer running queries that use fewer ADLAU’s (Azure Data Lake Analysis Units – the discrete single compute/billing units within ADL) can actually save you a lot of money.  This is due to how ADL charges are calculated, meaning that it’s far more expensive to use more ADLAU’s than it is to use fewer but for a longer time.

Ust shows us some tips around using “partition elimination”, which is a mechanism whereby data is pre-filtered prior to being distributed and computed upon by your standard U-SQL scripts.  Partition Elimination is best implemented with a deliberately defined file naming system (i.e. MyLogs_2017_05_01.txt, MyLogs_2017_05_16.txt etc.)  Using such a mechanism, you can filter the data files to be included within the U-SQL compute based upon partial filename matches and wildcards (i.e. you could process MyLogs for the month of April 2017 with a pre-filter such as MyLogs_2017_04_**.txt).  Ust tells us some more about the ADL data and the requirements for its storage.  He says that indexes are mandatory within ADL data, but that we can only have a single clustered index on each table.  Currently, ADL does not support non-clustered indexes however this is something that may come in the future.

Finally, Ust talks about data “skew”, which is the mechanism of how your dataset is distributed throughout the cluster for computing.  Data can be split for processing based upon a round-robin technique, which guarantees an even distribution of data across all nodes in the cluster, but does not guarantee that similar data will be kept together and processed by the same node.  This can cause a performance degradation of the compute function as potentially separate nodes must communicate much more to transfer related data when it’s on multiple nodes.  The other technique for data distribution is to split the data based upon a hash.  This guarantees that related data will be kept together on the same node – thus potentially improving the compute performance – but can now no longer guarantee that the data is evenly distributed across all the nodes in the cluster.  This means that some nodes will have significantly more work to perform than other nodes which can again impact overall compute performance.  Therefore, it’s essential that you understand the general “shape” of your raw data in order to maximise the compute performance – and thus the overall cost – of your ADL service.

After Ust’s session, we had a quick coffee break and I grabbed another cup of coffee.  There was just enough time to drink my coffee and take a very quick look around the main hall of the venue before I had to make my way back to Dome 3 for the next session.  This one was Hugo Kornelis’s “Normalization Beyond Third-Normal Form”.

IMG_20170408_093319Hugo starts his sessions by reminding us of some key concepts that we need to be aware of when performing any data modelling activity.  He talks about the “Universe Of Discourse” which is the view of reality as defined by the data/software model, it’s not necessarily the view of actual reality.  We then look at the purpose of database normalization.  We recall that normalization is the process of organising our data into columns and tables in such a way as to reduce redundancy and improve data integrity.  Hugo points out that normalization’s purpose is not to prevent incorrect data but to prevent impossible, inconsistent or business rule violating data.  We can’t stop the user from entering false data into the name column, but we can prevent them from providing us with a non-date value for their birthdate.  Hugo also reminds us that normalization is never performed at a database level, only at a table level.  It’s perfectly possible to have a database that, across it’s many tables, contains multiple forms of normalization.

Next, we look at what defines normalization.  Hugo tells us that it’s based upon Functional Dependency.  This is a constraint that dictates that for every value in Column A in a relationship, there is exactly none, or one value for Column B (i.e. A –> B).  Column A can actually be a composite of multiple actual columns (i.e. {A,B} –> C) and Hugo gives the conference-specific example of a SqlBits Dome number and a chair number which can define the exact name of the attendee sitting there.  It’s possible that the composite can exist on the other side of the relationship (i.e. A –> {B,C}) however this can be reduced to two constraints of A -> B and A –> C. 

Hugo reminds us of 3rd normal form.  This is the most “popular” normal form that many people take their databases to and then stop there.  3rd normal form (3NF) states that, in a given table, every non-key column is dependent upon the key, the whole key and nothing but the key (so help me Codd!).  We can use an algorithm called “Bernstein’s Algorithm for Synthesis of a 3rd Normal Form Schema” to help us create a database schema that is guaranteed to be in 3rd normal form, so long as all of our functional dependencies are known up front.  Hugo also mentions Boyce-Codd normal form, which is based upon 3rd normal form but extends the requirement that all columns in a table, including key columns, must be dependent upon the key.  When all columns in a table are dependent upon the key, there should usually be no duplicated data within that table’s row.

Hugo proceeds by detailing something called Elementary Key normal form.  This is perhaps a little known and used type of normal form, based upon 3rd normal form but where the constraint is defined as only non-elementary columns being dependent upon the key.  So what is an non-elementary column?  Well, it’s where functional dependencies such as {A,B} –> C does not have the reverse dependencies of either C –> A or C –> B.  This can also be expressed as where every full non-trivial functional dependency of the form A –> B, then either A is a key or B is (a part of) an elementary key.  Hugo explains that, in practice, Elementary Key normal form is almost identical to 3rd normal form.

From here, Hugo takes us into the more elaborate normal forms.  We start with 4th normal form. 4th normal form, unlike the lower normal forms, is less concerned with functional dependencies, but rather with multi-valued dependencies.  These are best explained with an example.  Hugo uses a table representing the availability of experts to discuss SQL problems on given days of the week:

Day Expert Subject
Monday Jim Design
Monday Jim Tuning
Tuesday Jim Debugging
Tuesday Fred Design

Looking at this data, we can infer the following fact: On Monday, you can ask Jim about Design.  From this fact, we can further infer two additional facts: On Monday, you can ask Jim questions, and Jim knows about Design.  In looking at the two facts that we’ve inferred, we can see that it is not possible to work backwards and infer the first original fact merely from the two subsequent facts.  This is a violation of 4th normal form.  In order to make this data compliant with 4NF, we must separate the information regarding days of the week and subjects into different tables, each table then becomes compliant with 4th normal form:

Expert Day
Jim Monday
Jim Tuesday
Fred Tuesday
 
Expert Subject
Jim Design
Jim Tuning
Jim Debugging
Fred Design

After 4NF we move on to look at 5th normal form.  5th normal form is based upon 4th normal form but extends the rules to dictate that there must be no “join dependencies” between the columns except based upon key.   A join dependency is effectively the ability to take a single table, split it into multiple tables and be able to recreate the original table by constructing a query that joins the split tables back into one.  In practice, a table being in 5NF effectively means that if a column has the same value in multiple rows and to remove the value from the table requires the removal of multiple columns then our table is not compliant with 5th normal form. 5NF is so closely related to 4NF that it’s very rare for a table compliant with 4NF to not also be compliant with 5NF.

Expert
Jim
Fred
 
Day
Monday
Tuesday
 
Subject
Design
Tuning
Debugging

Hugo briefly touches upon 6th normal form.  He starts by stating that 6th normal form is very hard to find in practice, being far more an academic curiosity.  6NF is based upon 5NF but further constrains the join dependencies to state that no join dependencies, even those implied by key, are allowed to exist within the table.  This effectively means that there can never be any NULL values within any columns of a 6NF table.  There would be no need for NULLs as we could simply remove the entire row.  The primary reason we don’t see tables and especially entire databases that conform to 6th normal form is that 6NF largely implies that our entire data schema is modelled using a very large number of tables with each table having only a key column and a data value column.  Today’s real-world database platforms are simply not optimised to operate with such a data schema and so data normalisation to this level is rarely, if ever, performed in the real world.

Hugo next talks about Optimal Normal Form.  This is based upon 6NF but prevents the “splitting” of tables if “elementary fact types” would be split. Elementary fact types are multiple columns that would have to remain together in a single table to ensure integrity of data.  Again, optimal normal form is very rarely found in the real-world.

IMG_20170408_105616Finally, Hugo talks about a entirely different type of data normalization, and this is known as Domain/Key Normal Form.  Domain Key Normal Form (DKNF) is not based upon functional dependencies like all other forms of normalization, but is instead based solely upon domain constrains and key constraints.  Domains in this context refers to the range of values that are allowed within the given column.  These are not the values allowed by the data type of the underlying column, but rather the values allowed by the business logic of the domain.  An example that violates DKNF could be shown as follows with a school report card and grade for students whereby the score is a value between 0 and 100, and the status of FAIL or PASS (FAIL for scores below 50):

Student Score Status
James 78 PASS
William 63 PASS
David 48 FAIL
Timothy 57 PASS

From the table above, we can see that it would be possible to enter a value of FAIL in the Status column for the row containing James’ name.  The database constraints would not prevent us from doing this, however, we would be violating our business rules that state that Scores greater than or equal to 50 are a PASS status.  In order to correct this data so as not to violate DKNF, we would change it as follows by splitting into two tables:

Student Score
James 78
William 63
David 48
Timothy 57
 
Status Minimum Score Maximum Score
FAIL 0 49
PASS 50 100

By splitting the data, we ensure that business logic is captured and no table data can violate the domain rules.  An interesting side-effect of complying with DKNF is that you’ll also comply with 5NF too.  The relevance of DKNF, despite being a very different form or normalization that other forms, is that data integrity against business rules can now be expressed and enforced from the database design alone, something that has traditionally been enforced only within application code that is responsible for reading and writing data to and from the database.  It should be noted, however, that compliance with DKNF isn’t always possible and depends very much on the business domain.

After this, Hugo’s session was complete and it was time for another short coffee break.  I quickly grabbed another coffee from one of the numerous catering stands dotted throughout the venue and checked my programme for the Dome that I would need to head towards for the next session.  That was Dome 4 and Conor Cunningham’s “SQL Server vNext and SQL Azure – Upcoming Features”.

IMG_20170408_111016

Conor’s talk was originally intended to be given by Lindsey Allen, however, a scheduling mix-up had resulting in Lindsay being unable to give the presentation.  Instead we were provided with some excellent content from Conor Cunningham who is the Principal Software Architect for Microsoft on the SQL Server Query Processor Team.  Conor is here to tell us all about the new features that will be coming in the upcoming versions of the on-premise SQL Server product as well as SQL Azure.

Firstly, Conor tells us that both the on-premise SQL Server product and the SQL Azure product share the exact same codebase.  SQL Azure has a monthly release cadence and so is always the first product to receive new SQL Server functionality and have that functionality available to the public whilst on-premise SQL Server currently has a release cadence of approximately 1 year and so receives the same features from SQL Azure in each of it’s subsequent public releases.

A big feature coming in SQL Server vNext (the official marketing title is not yet decided) is the ability to run it on Linux.  This isn’t just a version of SQL Server that’s specially built for Linux, but the exact same binaries that run the Windows version of SQL Server.  Microsoft has built an abstraction layer, known as a “PAL” (or Platform Abstraction Layer) which is used to align all operating system or platform specific code in one place and allow the rest of the codebase to stay operating system agnostic.  Moreover, SQL Server when run on Linux will effectively be SQL Server running inside a Docker container.  Previously, SQL Server has relied on Windows Server Failover Clustering (WSFC) to provide clustering capability to SQL Server, however, as part of the work required to allow SQL Server to run on Linux, this is being abstracted away to allow 3rd party cluster management software to be used.  Initially, SQL Server on Linux will support an open-source product called Pacemaker, however more cluster management product support will follow in time.

IMG_20170408_113255There have been big improvements within the In-Memory Tables features of SQL Server.  These improvements mean that In-Memory tables, which were previously constrained in how they operated compared to normal disk-based tables, will now operate much closer to how standard tables operate, supporting many more features including JSON support, CROSS Apply, CASE statements amongst others.

IMG_20170408_113256Another major set of improvement work within SQL Server vNext are improvements in the area of ColumnStore indexes.  ColumnStore indexes are perhaps one of the best new features to be added to SQL Server in recent years and allow potentially significant performance enhancements for queries on tables using such an index.  ColumnStore index now have support for BLOB column data types and the index itself is now compressed, reducing space and storage requirements as well as improving performance.  Further, rebuilds of ColumnStore indexes will now no longer cause significant blocking of the tables upon which the index is being rebuilt, meaning that users of the database are no longer severely negatively impacted by such rebuilds.

IMG_20170408_114604SQL Server vNext also includes advancements in “Adaptive Query Processing”.  This is a major new area of functionality in SQL Server and will receive even more improvements in future versions of SQL Server beyond vNext.  Adaptive Query Processing is a series of algorithms that work within SQL Server’s Query Processor in order to improve query performance by analysing query plans, SQL Server data and other meta-data.  It aims to improve query performance without introducing any query degradation from incorrect query plan optimisations.  It does this dynamically adjusting joins (i.e. switching from hash joins to merge or loop joins, or vice-versa), adjusting memory grants in order to ensure efficient allocation of memory without under or over allocating and interleaving compilation and execution for the most complex queries in order to maximise their performance.

IMG_20170408_120426Another major new feature of SQL Server vNext will be support for Graph Databases.  Graph databases are highly specialised databases that have their data in graph structures, using nodes, edges and properties in which to store their data allowing for semantic querying of data.  Common applications of graph databases are for querying large graphs of data such as those found inside a social network.  Graph data and the ability to efficiently query it makes questions such as “How many friends of Person A are also friends of Person B?” and “Which friends of Person A are also friends of the friends of Person B?” very easy to answer, something a relational database would have difficulty in achieving in an efficient manner.  SQL Server vNext’s support for graph databases promises to offer full CRUD support for node and edge creation, query language extensions to allow querying of graph data as well as allowing queries to span both standard relational SQL Server data and graph data at the same time.

Conor continues his exploration of the new SQL Azure features by telling us about a new feature in SQL Azure that can automatically create indexes for table columns inferred from usage of the column within queries, the maximum database size has also improved, now supporting databases up to 4TB in size.  There’s also some long-awaited improvements to the syntax of the T-SQL query language itself.  There are new string concatenation and aggregation functions as well as a TRIM function (finally!).  New Japanese collation families have been added also and new bulk insert operations have been added to support specific new standards such as RFC 4180 CSV file formats.

IMG_20170408_123009After a brief Q&A at the end of Conor’s session it was time for another refreshment/toilet break.  I decided I was all coffee’d out by this point, having had around 4 or 5 cups of coffee so far, and it still only being just past 12pm.  There was one more session before the break for lunch and so after consulting my conference programme, I headed off to Dome 2 for the somewhat light-hearted session that was Denny Cherry’s “Things You Should Never Do In SQL Server”.

Denny introduced himself first and indicated that this session was to be a bit lighter than other sessions in the day, being a look at some of things that you should not do rather than the things you should.  As such, he reminded us that everything on his slides was wrong!

He started by talking about the enforcement of data integrity in SQL Server and tells us that we really shouldn’t use things like triggers, stored procedures or even application code to enforce data integrity.  SQL Server is a fully relational database and we can leverage what SQL Server is good at by designing our schemas to provide such integrity for us.  Denny talks about a book that he reviewed as a technical editor, which was so bad that he implored the publisher to completely scrap the book.  One of the pearls of wisdom in this book, he says, was the recommendation to use 32-bit editions of SQL Server for “local offices” reserving the 64-bit edition of SQL Server only for large corporations.  Don’t do this.  We all run on 64 bit operating systems today, so where available, we should always be running 64 bit application software, too.  He states that recommendations from third party software vendors should always be questioned, too.

Next up is migrating databases.  Something seen quite frequently is the “copy database wizard” to migrate databases from one server to another.  This is error prone and simply not as good an option as something like log shipping, which has been around for decades is a very robust and mature technique for performing migrations.  Then we look at the account under which SQL Server will run.  Whilst it’s true that very old versions of SQL Server (pre-2005 versions at least) required local administrator privileges in order to run, modern versions of SQL Server do not require such special privileges at all.  SQL Server does require some additional permissions above those usually found in a standard local user account, but not many more.  Always run with the minimum permissions you need.

Next we look at SQL Profiler.  It’s a great tool for debugging issues on a SQL Server instance, but it should never really be connected to your production database.  This can negatively impact performance of the database.  It’s far better to use it against either a local server or an offline backup or staging server.  Moreover, the very latest versions of SQL Server have functionality that SQL Profiler unfortunately doesn’t support.

Denny then moves on to look at the SQL queries we write.  He says that it’s really not worth the effort to ensure that SQL is written in a cross-platform manner (i.e. ANSI SQL).  Whilst it’ll work, of course, you’re really giving up a lot of functionality and performance improvements that have been built into the platform specific dialects of SQL used on each database platform.  Using SQL that is written specifically for the platform you’re targeting will always allow you to write code in the most performant manner.  Moreover, it’s incredibly rare to have to need your SQL written in a cross-platform way as it’s incredibly rare to actually want to ever migrate your databases to an entirely new database platform.

Denny then looks at the some anti-patterns with data itself.  He states that you should always use NULL where there’s an absence of data, and not values such as empty strings, minimum dates (i.e. 01/01/1990) and other “magic” values.  He also says that you should never blindly design your database schema to a specific level of normalization.  You should always consider the application that the data will support and the required performance of that data and domain design and then design your database schema accordingly.  Next we talk about transaction logs.  Denny says he’s seen a number of people simply deleting transaction logs in order to reclaim disk space.  This is a bad idea, and if you find you really need more disk space, you should simply buy more disk space rather than severely impact your ability to recover your database from crashes by deleting the transaction log.  On the subject of transaction logs, Denny reminds us not to use RAID 5 for our disk array that will store the transaction log.  RAID 5 is not optimized for write intensive operations – which the transaction log requires – and so the performance will suffer as a result.  Also, never ever use AUTO_SHRINK to automatically reclaim disk space.  Whilst this does reclaim disk space, the negatives of doing this far outweigh the positives.

Next we look at columns and schema.  Denny reminds us to always use the correct and most appropriate data type for our data, and always be aware of the kinds of data we’ll be working with.  For example, in the US, the zip code (equivalent to postcode in the UK) is entirely numeric – it’s a 5 digit number.  An integer column might seem appropriate here, but some states (Maine) have their zip code start with two zeroes and it must always be written in 5 digit format with these leading zeroes.  Further, zip codes / post codes are not always numeric once you move beyond the USA, so it’s highly likely you’ll need to support alphabetic characters in there too.  Also, don’t assume that certain values will never change and therefore use them as a primary key.  Some developers have previously used a US Social Security Number as a primary key thinking that it’ll never change, and whilst it very rarely does, it’s not guaranteed not to.

Some developers believe that views will improve performance, however, they really don’t.  And don’t ever be tempted to use nested views as they are considered evil and incredibly difficult to debug.  Don’t require RDP access to a SQL Server in order to run queries against it, it’s not only a security risk, but it’s simply not required.  In thinking about the permissions that you can grant to users and other objects in SQL Server, always ensure that you grant the minimum amount of permissions required.  Also, don’t ever revoke permissions from the built-in database roles, such as the public role.  Denny talks about a time that an over-zealous auditor at a client insisted that certain permissions be removed from the public database role (this was a permission allowing access to the underlying Windows Registry).  When revoked this caused users within the public role to be entirely unable to log in to the SQL Server due to SQL Server’s own requirement to access the Windows Registry when a user logs in!  And with that in mind, like third-party vendors, don’t ever listen to external auditors about what permissions you should or should not assign/revoke on your SQL Server.

IMG_20170408_133339After Denny’s session, it was time for lunch.  All of the attendees gathered in the main hall and headed towards one of the 4 main catering points throughout the venue.  As a vegetarian, there was a rather nice option which was Butternut squash & Sage tortellini with roasted Mediterranean vegetables.  This was followed by either a Strawberry bavarois or a warm chocolate brownie with toffee sauce.  Well, I couldn’t resist the chocolate brownie, so after collecting my meal along with some freshly squeezed orange juice to wash it all down, I found an empty spot at one of the many tables and ate my lunch.

After my lunch I decided to take a stroll around the grounds of the International Centre as it had become such a lovely sunny day outside.  The morning’s sessions had been great but intensive, so this gentle stroll allowed me to clear my head, get some fresh air, enjoy the sunshine and put myself back in the right frame of mind required for the final two sessions of the day.  I made my way back inside the venue as the lunch break drew to a close and once again consulted my programme to determine the correct dome for the next session.  This time it was Dome 6 for Richard Douglas’s “Understanding the Transaction Log”.  After taking my seat, the speaker announced that he wasn’t actually Richard at all, as unfortunately, Richard was suffering from food poisoning so this talk was to be given by one of Richard’s colleagues, John Martin.

IMG_20170408_142519John starts by saying that the transaction log is not just all about backups.  There are numerous and varied uses for the transaction log, and it’s integral to a well-running and performant SQL Server.  John talks about the three recovery models for the transaction log.  There’s “Simple”, “Full” and “Bulk” modes and John is keen to point out that even when using Simple mode, everything is still logged to the transaction log.  There may not be as much detail as could be found if using Full or Bulk modes, but everything is still there.  One of the golden rules is to only ever have one transaction log file for each database.  You can have numerous data files, however, transaction logs should be kept within a single file.  This improves performance and makes recovery and backups much easier.  John reminds us what the various modes do.  Simple mode is effectively auto-maintenance and auto-shrinking of our log – SQL Server will take care of all of this itself.  This may or may not be a good thing depending upon your use case.  In Full mode, SQL Server will not perform any of it’s own maintenance or shrinking at all.  You’re responsible for doing this.  This is very often the better choice as you will know your own use cases better than SQL Server does and so can schedule such maintenance for the most convenient and appropriate times.

IMG_20170408_144507John reminds us of the set of operations within SQL Sever that are “minimally logged”, even when operating in Full recovery mode.  Many common operations , such as “SELECT * INTO”, the CREATE / ALTER & DROP’ping of indexes etc. are all minimally logged in the transaction log, and this minimal logging impacts your ability to perform a complete “point-in-time” restore of your database in the event of a crash.

Select_InsertAfter this we look at the general process by which a standard SELECT or INSERT statement is processed by SQL Server, we can see how there’s a lot of moving parts to the entire process flow and how the transaction log is central to ensuring that SQL Server is able to provide the D (Durability) from the ACID properties that we require from our database engine.  John reminds us that it’s not until our data is fully persisted to the transaction log file that SQL Server considers the data persisted and committed to the database as it’s only after this step that the data could be recovered in the event of a server crash.

John moves on to talk about the internals of the transaction log and how SQL Server uses the space within the log file.  Transaction log files are split into multiple VLFs (Virtual Log File).  These VLF’s exist within the same single physical file on disk.  A VLF can be in either an active or inactive state, and at any given time, there’s always at least one active VLF.  SQL Server will manage the creation of new VLFs as the transaction log grows over time, but it’s possible to control some of how VLF’s are created yourself.  The creation of too many VLF’s within a file is thought to be a bad thing and there’s various discussions about how best to manage this.  Ordinarily, you will have either 4, 8 or 16 VLF’s inside the transaction log file and the VLF is sized appropriately based upon the “chunk” size (chunks are the size by which the transaction log physical file grows on disk).

We look at the makeup of a VLF and John tells us that within a VLF there’s many “log blocks”.  Log Blocks are between 512 bytes and 60KB in size, and a VLF will have as many log blocks as is required to fill the VLF space.  John states how it’s better to try to keep the log blocks at the largest possible size of 60KB as performance is better with fewer, but larger log blocks rather than with many smaller log blocks.  Inside the log block are the individual log records.  Each log record is identified by a unique Log Sequence Number (LSN).  The log sequence number is made up from the VLF sequence number, the log block number and the log record number.

John talks about how the transaction log grows over time.  First we look at log growth when using the “Simple” recovery mode.  SQL Server will periodically instigate a “checkpoint”.  This is the flushing of dirty pages of data within memory to disk and any VLF’s that have no open transactions within them are cleared down.  Note that this does not reclaim any disk space from the VLF, however.  If additional logs are added SQL Server moves through the VLF’s within the file looking for space to write the log records, primarily looking for currently inactive VLFs that have been previously cleared and reusing those. If we reach the end of the file, we “wrap around” to the beginning of the file searching for inactive VLF’s where we can write the transactions.  If no inactive VLFs are found, we must increase the size of the physical transaction log file.  This negatively impacts the server, which has to pause all activity whilst the physical file is extended on disk.  Then we look at log growth with “Full” or “Bulk” recovery mode, which is almost the same as log growth with the simple recovery mode, but instead of a checkpoint, we have a transaction file backup occurring instead which ensures we have much more transaction data available for a full recovery of the state of the server if required.

IMG_20170408_152331At this point, John talks about how we can make things go faster with SQL Server by improving transaction log performance.  We start firstly with a good I/O architecture – consider the balance of reads and writes of your data and select an appropriate RAID strategy that’s optimized for your own use case, and remember that using a bigger RAID cache is always better.  Within the architecture of your SQL Server itself, it’s best if you can determine the actual required size of your physical transaction log file at the beginning.  This is difficult to predict, but if you can get close, your server will perform better as a result.  We should be aware of such things as page splits, and particularly the “evil” variety as not all page splits are bad.  These bad splits can have serious negative performance impacts.  John also cautions against using “delayed durability” which is effectively asynchronous transaction writes.  These cause the server to consider data persisted even though it’s not yet fully written to disk.  Depending upon your application, delayed durability could be ok, but if your system must never lose a single transaction then don’t use it.  One time when it might be appropriate to temporarily enable delayed durability is during large scale purging of data from the database.  John tells us to keep an eye on our indexes.  Too many of those means that each one is a discrete data structure that must be updated and persisted to disk individually which can hurt performance.

Finally John tells us about monitoring the SQL Server transaction log file and we can use both external tools such as PERFMON for that as well as built-in SQL Server system stored procedures such as the sys.dm_io_virtual_file_stats stored procedure.  Regularly reviewing these monitors and logs can help identify bottlenecks within the server and highlight the areas where issues may arise.

IMG_20170408_153231After John’s session, it was time for the final afternoon coffee break, which this time was accompanied by a rather nice selection of cakes!  After a cup of coffee and a cake or two (the scones and carrot cake was particularly nice!) it time to consult the conference programme one last time to determine the dome for the final session of the day.

This time it was Dome 3 for Emanuele Zanchettin’s “Performance Tips For Faster SQL Queries”.

IMG_20170408_160333Emanuele starts his session by talking about debugging.  He reminds us that debugging database queries often starts at the application layer with developers digging through C# code, but he states that debugging sometimes also stops there too.  We need to be aware that we often need to debug down into SQL Server itself.  With that, Emanuele asks the attendees if they’d rather have a talk full of slides or a talk full of real-world demos.  The attendees unanimously vote for demos, so from here Emanuele opens up his SQL Server Management Studio tool and begins to show us some T-SQL code.

He first creates a demo database which includes at least one table of over 2 million rows.  He writes some simple SELECT queries that contain a few joined tables.  We run the queries and see that they execute in a very short space of time, however, in looking at the execution plan generated for our query, we can see that we can make the query perform even better.  So Emanuele’s first tip is to make sure you always check the execution plans generated for your queries as they can often indicate where an additional index on the source tables would greatly improve query performance.

IMG_20170408_073649Unfortunately, it was at this point that I had to leave Emanuele’s session in order to make an early start on my journey back home. I missed the end of Emanuele’s session as well as the final conference wrap-up and prize giving session, but I’d had a fantastic day at another incredibly well-run, well-organised and very informative SQLBits event.

As ever, I can’t wait until the next one!