DDD North 2016 In Review

IMG_20161001_083505

On Saturday 1st October 2016, the 6th annual DDD North event was held at the University of Leeds.  After a great event last year at the University of Sunderland in the North East, this year’s event returned to Leeds, as it’s now customary for the event to alternate between the two locations each year.

After arriving and collecting my badge, it was a short walk to the communal area for some tea and coffee to start the day.  Unfortunately, there were no bacon butties or Danish pastries this time around, but I’d had a hearty breakfast before setting off on the journey to Leeds anyway.

The first session of the day was Pete Smith’s “The Three Problems with Software Development”.  Pete starts by talking about Conway’s Game of Life and how this game is similar to how software development often works, producing complex behaviours from simple building blocks.  Pete says his talk will examine some “heuristics” for software development, a sort of “series of steps” for software development best practice.

IMG_20161001_093811

Firstly, we look at the three problems themselves.  Problem Number 1 is about dividing and breaking down software components.  Pete tells us that this isn’t just code or software components themselves, but can also relate to people and teams and how they are “broken down”.  Problem Number 2 is how to choose effective tools, processes and approaches to your software development and Problem Number 3 is effective communication.

Looking at problem number 1 in more detail, Pete talks about “reasons for change”.  He says that we should always endeavour to keep things together that need to change together.  He shows an example of two simple web pages showing lists of teachers and of students.  The ASP.NET MVC view’s mark-up for both of these views is almost identical.  As developers we’d be very tempted to abstract this into a single MVC view and only alter, using variables, the parts that differ between teachers and students, however, Pete suggests that this is not a good approach.  Fundamentally, teachers and students are not the same thing, so even if the MVC views are almost identical and we have some amount of repetition, it’s good to keep them separate – for example, if we need to add specific abilities to only one of the types of teachers or students, having two separate views makes that much easier.
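As a rough illustration of the point (the class names here are my own, not from Pete’s example), each list keeps its own strongly-typed view model and view, even though they start out looking nearly identical:

using System.Collections.Generic;

// Deliberately separate models for two separate "intents", even though the shapes match today.
public class TeacherListViewModel
{
    public IList<string> Names { get; set; } = new List<string>();
    // Teacher-only concerns (e.g. subjects taught) can be added later without touching students.
}

public class StudentListViewModel
{
    public IList<string> Names { get; set; } = new List<string>();
    // Student-only concerns (e.g. enrolment year) live here.
}

// Each controller action then renders its own view (TeacherList.cshtml / StudentList.cshtml)
// rather than a single shared, parameterised view.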

Next, we look at how we can best identify reasons for change.  We should look at what parts of an application get deployed together, and we should also look at the domain and the terminology used – are two different domain entities referred to by the same name?  Or are there two different names for the same entity?  We should consider the “ripple effect” of change – when something changes, what else has to change?  Finally, the main thing to examine is logic vs intent.  Logic is the code and behaviour and can (and should) be refactored and reused, however, intent should never be reused or refactored (in the previous example, the teachers and students were “intents” as they represent two entirely different things within the domain).

Looking at Problem Number 2 in more detail, Pete says that we should promote good change practices.  We should reduce coupling at all layers of the application and across the entire software development process, but not over-abstract.  We need strong test coverage to support this within the software itself.  Not necessarily 100% test coverage, but a good suite of robust tests.  Pete says that in large organisations we should try to align teams with the reasons for change, however, in smaller organisations this isn’t something that you’d need to worry about too much as the team will be much smaller anyway.

Next, Pete makes the provocative suggestion that MVC controllers that do very little – something generally considered to be a good thing – are “considered harmful”!  What he really means is that blanket advice is considered harmful – controllers should, generally, do as little as they need to, but they can be larger if there’s a good reason for it.  When we’re making choices, it’s important to remain pragmatic rather than dogmatic.  Don’t forget about the trade-offs and don’t get taken in by the “new shiny” things.  Most importantly, when receiving advice, always remember the context of the advice.  Use the right tool for the job and always read differing viewpoints on any subject to gain a more rounded understanding of the problem.  Do test the limits of the approaches you take, learn from your mistakes and always focus on providing value.

In examining Problem Number 3, Pete talks about communication and how it’s often impaired due to the choice of language we use in software development.  He talks about using the same names and terminology for essentially different things.  For example, in the context of ASP.NET MVC, we have the notion of a “controller”, however, Angular also has the notion of a “controller” and they’re not the same thing.  Pete also states how terminology like “serverless architecture” is a misnomer as it’s not serverless and how “devops”, “agile” etc. mean different things to different people!  We should always say what we mean and mean what we say! 

Pete talks about how code is communication.  Code is read far more often than it’s written, so code should be optimized for reading.  Pete looks at some forms of communication and states that things like face-to-face communication, pair programming and even perhaps instant messaging are often better forms of communication than things like once-a-day stand-ups and email.  This is because the best forms of communication offer instant feedback.  To improve our code communication, we should eliminate implicit knowledge – such as not refactoring those teacher and student views into one view.  New programmers would expect to be able to find something like a TeacherList.cshtml file within the solution.  Doing this helps to improve discovery, enabling people new to a codebase to get up to speed more quickly.  Finally, Pete repeats his important point of focusing on refactoring the “logic” of the application and not the “intent”.

Most importantly, the best thing we can do to communicate better is to simply listen.  By listening more intently, we ensure that we have the correct information that we need and we can learn from the knowledge and experience of others.

IMG_20161001_103012

After Pete’s talk it was time to head back to the communal area for more refreshments.  Tea, coffee, water and cans of coke were all available.  After suitable further watering, it was time to head back to the conference rooms for the next session.  This one was John Stovin’s “Thinking Functionally”.

John’s talk was held in one of the smaller rooms and was also one of the rooms located farthest away from the communal area.  After the short walk to the room, I made it there with only a few seconds to spare prior to the start of the talk, and it was standing room only in the room!

John starts his talk by mentioning how the leap from OO (Object-Oriented) programming to functional programming is similar to the leap from procedural programming to OO itself.  It’s a big paradigm shift!  John mentions how most of today’s non-functional languages are designed to closely mimic the way the computer itself processes machine code in the “von Neumann” style.  That is to say that programs are really just a big series of steps with conditions and branches along the way.  Functional programming helps in the attempt to break free from this by expressing programs as pure functions – a series of functions, similar to mathematical functions, that take an input and produce an output.

John mentions how, when writing functional programs, it’s important to try your best to keep your functions “pure”.  This means that the function should have no side-effects.  For example a function that writes something to the console is not pure, since the side-effect is the output on the console window.  John states that even throwing an exception from a function is a side-effect in itself!
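To make the distinction concrete, here’s a minimal C# sketch of my own (not John’s code): the first method is pure, the second has two side-effects – the console output and the possibility of a thrown exception.

using System;

public static class PurityExamples
{
    // Pure: the result depends only on the inputs and there are no side-effects,
    // so the same arguments always give the same answer.
    public static decimal AddVat(decimal net, decimal rate) => net * (1 + rate);

    // Impure: writing to the console is a side-effect, and throwing on bad input is a side-effect too.
    public static decimal AddVatAndLog(decimal net, decimal rate)
    {
        if (rate < 0) throw new ArgumentOutOfRangeException(nameof(rate));
        var gross = net * (1 + rate);
        Console.WriteLine($"Gross is {gross}");
        return gross;
    }
}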

IMG_20161001_104700

We should also endeavour to always keep our data immutable.  This means that we never try to assign a new value to a variable once it has already been initialized with a value – it’s a single assignment.  Write once but read many.  This helps us to reason about our data better as it improves readability and guarantees thread-safety of the data.  To change data in a functional program, we should perform an atomic “copy-and-modify” operation which creates a copy of the data,  but with our own changes applied.

In F#, variables are immutable by default, and F# forces you to use a qualifier keyword, mutable, in order to make a variable mutable.  In C#, however, we’re not so lucky.  We can “fake” this, though, by wrapping our data in a type (class) – e.g. a Money type – accepting values only in the type’s constructor and ensuring all properties are either read-only or at least have a private setter.  Class methods that perform some operation on the data should return a whole new instance of the type.
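A minimal sketch of that approach (my own illustration of a simple Money type, not code from the talk):

using System;

// Values are supplied once via the constructor, properties are read-only, and operations
// return a new instance (the "copy-and-modify" idea) rather than mutating this one.
public sealed class Money
{
    public decimal Amount { get; }
    public string Currency { get; }

    public Money(decimal amount, string currency)
    {
        if (string.IsNullOrWhiteSpace(currency))
            throw new ArgumentException("A currency is required.", nameof(currency));
        Amount = amount;
        Currency = currency;
    }

    public Money Add(Money other)
    {
        if (other.Currency != Currency)
            throw new InvalidOperationException("Cannot add amounts in different currencies.");
        return new Money(Amount + other.Amount, Currency); // new instance, original untouched
    }
}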

We move on to examine how functional programming eradicates nulls.  Variables have to be assigned a value at declaration and, since immutability means they can never be reassigned, we can’t end up with a null reference.  We’re stuck with nulls in C#, but we can alleviate that somewhat via such techniques as the Null Object pattern, or even the use of an Option<T> type.  John continues, saying that types are fundamental to F#.  It has real tuples and records – “multiplicative” types, which are effectively aggregates of other existing types, created by “multiplying” those existing types together – and also discriminated unions, which are “additive” types, created by “summing” other existing types together.  The “multiplicative” types combine other types – a tuple can contain two (or more) other types, e.g. a string and an int, and holds both at once – whereas a discriminated union, as an “additive” type, holds exactly one of its constituent cases at a time, so a discriminated union of an int and a boolean can represent any of the possible values of an int OR any of the possible values of a boolean, and the number of possible values is the sum of the two.
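For the C# side, an Option<T> can be as small as the sketch below (my own, hugely simplified compared to what libraries such as language-ext provide); the point is that callers are forced to handle the “no value” case explicitly rather than risking a NullReferenceException:

using System;

public struct Option<T>
{
    private readonly T _value;
    public bool HasValue { get; }

    private Option(T value) { _value = value; HasValue = true; }

    public static Option<T> Some(T value) => new Option<T>(value);
    public static Option<T> None => default(Option<T>);

    // The only way to get at the value is to say what should happen in both cases.
    public TResult Match<TResult>(Func<T, TResult> some, Func<TResult> none)
        => HasValue ? some(_value) : none();
}

Usage then reads along the lines of maybeName.Match(n => n, () => "unknown"), making the missing-value branch impossible to forget.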

John continues with how far too much C# code is written using granular primitive types and that in F#, we’re encouraged to make all of our code based on types.  So, for example, a monetary amount shouldn’t be written as simply a variable of type decimal or float, but should be wrapped in a strong Money type, which can enforce certain constraints around how that type is used.  This is possible in C# and is something we should all try to do more of.  John then shows us some F# code declaring an F# discriminated union:

type Shape =
| Rectangle of float * float
| Circle of float

He states how this is similar to the inheritance we know in C#, but it’s not quite the same.  It’s more like set theory for types!
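For comparison, here is roughly how that closed set of cases tends to be approximated in C# (a sketch of my own, not John’s code) – an abstract base type with a fixed set of nested subclasses:

public abstract class Shape
{
    private Shape() { }   // the private constructor keeps the set of cases closed

    public sealed class Rectangle : Shape
    {
        public Rectangle(double width, double height) { Width = width; Height = height; }
        public double Width { get; }
        public double Height { get; }
    }

    public sealed class Circle : Shape
    {
        public Circle(double radius) { Radius = radius; }
        public double Radius { get; }
    }
}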

IMG_20161001_111727

John continues by discussing pattern matching.  He says how this is much richer in F# than the roughly equivalent if() or switch() statements in C#, as pattern matching can match based upon the general “shape” of the type.  We’re told how functional programming also favours recursion over loops.  The F# compiler supports tail recursion, where a tail-recursive function (typically passing an accumulator as an additional parameter on each recursive call) can be rewritten by the compiler so that it doesn’t keep adding frames to the stack, which helps to prevent stack overflow problems.  Loops are problematic in functional programming as we need a variable for the loop counter, which is traditionally re-assigned with every iteration of the loop – something that we can’t do in F# due to variable immutability.

We continue by looking at lists and sequences.  These are heavily used data structures in functional programming.  Lists are recursive structures: a list is either an empty list or a “head” element with the rest of the list attached to it.  We iterate over the list by taking the “head” element with each pass – a bit like popping values off a stack.  Next we look at higher-order functions.  These are simply functions that take another function as a parameter, so, for example, virtually all of the LINQ extension methods found in C# are higher-order functions (e.g. .Where(), .Select() etc.) as they take a lambda function as a parameter.  F# has List and Seq, and the key built-in functions for working with these are filter and map – also higher-order functions.  Filter takes a predicate to filter a list and map takes a function that transforms each list element from one type to another.
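As a quick illustration of higher-order functions using nothing but standard LINQ (my own example, not John’s):

using System;
using System.Collections.Generic;
using System.Linq;

public static class HigherOrderDemo
{
    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5 };

        // Where is a higher-order function: it takes a predicate function as its parameter
        // (the equivalent of F#'s filter)...
        var evens = numbers.Where(n => n % 2 == 0);

        // ...and Select takes a transforming function (the equivalent of F#'s map).
        var labels = evens.Select(n => $"Number {n}");

        Console.WriteLine(string.Join(", ", labels));   // Number 2, Number 4
    }
}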

John goes on to mention Reactive Extensions for C# which is a library for composing asynchronous and event-based programs using observable sequences and LINQ-style query operators.  These operators are also higher-order functions and are very “functional” in their architecture.  The Reactive Extensions (Rx) allow composability over events and are great for both UI code and processing data streams.
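To give a flavour of the style, here’s a tiny sketch of my own using the System.Reactive NuGet package (not code from the talk):

using System;
using System.Reactive.Linq;

public static class RxSketch
{
    public static void Main()
    {
        // Observable.Interval produces a value every half second; Where and Select are the same
        // higher-order operators as in LINQ, but applied to a stream of events over time.
        IDisposable subscription = Observable
            .Interval(TimeSpan.FromMilliseconds(500))
            .Where(tick => tick % 2 == 0)
            .Select(tick => $"Even tick {tick}")
            .Subscribe(Console.WriteLine);

        Console.ReadLine();     // let a few values through before disposing
        subscription.Dispose();
    }
}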

IMG_20161001_113323

John then moves on to discuss railway-oriented programming.  This is a concept whereby functions accept and return a Result<TSuccess, TFailure> type which “contains” either a success value or a failure.  Functions are then composable based upon the types returned, and the execution path through the code can be altered based upon the outcome of prior functions.
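The exact shape of the Result type varies between libraries, but a bare-bones C# sketch (my own assumption as to the shape, not John’s code) looks something like this – Bind is what lets failures skip subsequent steps and stay on the “failure track”:

using System;

public class Result<TSuccess, TFailure>
{
    public TSuccess Success { get; }
    public TFailure Failure { get; }
    public bool IsSuccess { get; }

    private Result(TSuccess success, TFailure failure, bool isSuccess)
    {
        Success = success;
        Failure = failure;
        IsSuccess = isSuccess;
    }

    public static Result<TSuccess, TFailure> Ok(TSuccess value) =>
        new Result<TSuccess, TFailure>(value, default(TFailure), true);

    public static Result<TSuccess, TFailure> Fail(TFailure error) =>
        new Result<TSuccess, TFailure>(default(TSuccess), error, false);

    // Composes two steps: a success flows into the next function, a failure bypasses it entirely.
    public Result<TNext, TFailure> Bind<TNext>(Func<TSuccess, Result<TNext, TFailure>> next) =>
        IsSuccess ? next(Success) : Result<TNext, TFailure>.Fail(Failure);
}

A pipeline then reads as a chain, e.g. Validate(request).Bind(Save).Bind(SendConfirmation) (hypothetical function names), with a failure from any step carried through to the end untouched.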

Using such techniques as Railway-oriented programming, along with the other inherent features of F#, such as a lack of null values and immutability means that frequently programs are far easier to reason about in F# than the equivalent program written in C#.  This is especially true for multi-threaded programs.

Finally, John recaps by stating that functional languages give a level of abstraction above the von Neumann architecture of the underlying machine.  This is perhaps one of the major reasons that FP is gaining ground in recent years, as machines are now powerful enough to allow this (previously, old-school LISP programs – LISP being one of the very first functional languages, originally designed back in 1958 – often needed purpose-built machines to run sufficiently well).  John recommends a few resources for further reading – one is the F# for Fun and Profit website.

After John’s session, it was time for a further break and additional refreshment.  Since John’s session had been in a small room, one which was farthest away from the communal area where the refreshments were, and given that my next session was in this very same conference room, I decided that I’d stay where I was and await the next session, which was Matteo Emili’s “I Read The Phoenix Project And I Loved It. Now What?”

IMG_20161001_115558

Matteo’s session was all about introducing a “devops” culture into somewhere that doesn’t yet have such a culture.  The Phoenix Project is a development “novel” which tells the story of doing just such a thing.  Matteo starts by mentioning The Phoenix Project book and how it’s a great book.  I must concur that the book is very good, having read it myself only a few weeks before attending DDD North.  Matteo then asks that, if we’ve read the book and would like to implement its ideas in our own places of work, we should be very careful.  It’s not so simple, and you can’t change an entire company overnight, but you can start to make small steps towards the end goal.

There are three critical concepts that cause failure and a breakdown in an effective devops culture.  They are bottlenecks, lack of communication and boundaries between departments.  In order to start with the introduction of a devops culture, you need to start “out-of-band”.  This means you’ll need to do something yourself, without the backing of your team, in order to prove a specific hypothesis.  Only when you’re sure something will work should you then introduce the idea to the team.

Starting with bottlenecks, the best way to eliminate them is to automate everything that can be automated.  This reduces human error, is entirely repeatable, and importantly frees up time and people for other, more important, tasks.  Matteo reminds us that we can’t change what we can’t measure, and in the loop of “build-measure-learn”, the most important aspect is measure.  We measure by gathering metrics on our automation and our processes using logging and telemetry, and it’s only from these metrics that we’ll know whether we’re really heading in the right direction and what is really “broken” or needs improvement.  We should gather insights from our users as well by utilising such tools and software as Google Analytics, New Relic, Splunk & HockeyApp, for example.  Doing this leads to evidence-based management, allowing you to use real-world numbers to drive change.

IMG_20161001_120736

Matteo explains that resource utilisation is key.  Don’t bring in a whole new change management process out of the blue; use small changes that generate big wins, frequently done “out-of-band”.  One simple thing that can be done to help break down boundaries between areas of the company is a company-wide “stand up”.  Do this once a week, and limit it to 1-2 minutes per functional area.  This greatly improves communication and helps areas understand each other better.  The implementation of automation and the eradication of boundaries form the basis of the road to continuous delivery.

We should ensure that our applications are properly packaged to allow such automation, and MSDeploy is one tool to help enable this.  It’s not a new technology, but it’s seeing a modern resurgence as it can be heavily utilised with Azure.  Use an infrastructure-as-code approach: virtual machines, servers, network topology etc. should all be scripted and version controlled, as this is what allows automation.  This is fairly easy to achieve with cloud-based infrastructure by using Azure ARM templates, or by using AWS CloudFormation with Amazon Web Services.  Some options for achieving the same thing with on-premise infrastructure are Chef, Puppet or even PowerShell Desired State Configuration.  Databases are often neglected in DevOps scenarios, however, by using version control, performing small, incremental changes to database infrastructure, and using packages (such as SQL Server’s DACPAC files), Database Lifecycle Management can be brought into a DevOps/continuous delivery environment.

This brings us to testing.  We should use test suites to ensure our scripts and automation are correct, and we must remember the golden rule: if something is going to fail, it must fail fast.  Automated and manual testing should be used to ensure this.  Accountability is important, so tests are critical to the product, and remediation (recovery from failure) should also be automated.

Matteo summarises: start by changing people first, then change the processes, and the tools will follow.  Remember, automation, automation, automation!  Finally, tackle the broader technical side and blend individual competencies to the real-world requirements of the teams and the overall business.

IMG_20161001_083329

After Matteo’s session, it was time for lunch.  All of the attendees reconvened in the communal area where we were treated to a selection of sandwiches and packets of crisps.  After selecting my lunch, I found a vacant spot in the corner of the rather small communal area (which easily filled to capacity once all of the different sessions had finished and all of the conference’s attendees descended on the same space) to eat it.  Since the lunch break was 1.5 hours and I’d eaten my lunch within the first 20 minutes, I decided to step outside to grab some fresh air.  It was at this point I remembered a rather excellent little pub just 2 minutes’ walk down the road from the university venue hosting the conference.  Well, never one to pass up the opportunity of a nice pint of real ale, I headed off down the road to The Pack Horse.

IMG_20161001_133433

Once inside, I treated myself to a lovely pint of Laguna Seca from a local brewery, Burley Street Brewhouse, and settled down in the quiet pub to enjoy my pint and reflect on the morning’s sessions.  During the lunch break, there are usually some grok talks being held, which are 10-15 minute long “lightning” talks that attendees can watch whilst they enjoy their lunch.  Since DDD North was held very close to the previous DDD Reading event (only a matter of a few weeks apart), and since the organisers were largely the same for both events, I had heard that the grok talks would be largely the same as those that had taken place, and which I’d already seen, at DDD Reading only a matter of weeks prior.  Due to this, I decided the pub was a more attractive option over the lunch time break!

After slowly drinking and savouring my pint, it was time to head back to the university’s mechanical engineering department and to the afternoon sessions of DDD North 2016.

The afternoon’s first session was, luckily, in one of the “main” lecture halls of the venue, so I didn’t have too far to travel to take my seat for Bart Read’s “How To Speed Up .NET & SQL Server Apps”.

Bart’s session is all about performance.  Performance of our application’s code and performance of the databases that underlie our application.  Bart starts by introducing himself and states that, amongst other things, he was previously an employee of Red Gate, who make quite a number of SQL Server tools, so paying close attention to performance monitoring is something that Bart has done for much of his career.

IMG_20161001_142359

He states that we need to start with measurement.  Without this, we can’t possibly know where issues are occurring within our application.  Surprisingly, Bart does say that when starting to measure a database-driven application, many of the worst areas are not within the code itself, and are almost always down in the database layer.  This may be from an errant query or a general lack of helpful database additions (better indexes etc.).

Bart mentions the tools that he himself uses as part of his general “toolbox” for performance analysis of an application stack.  ANTS Memory Profiler from Red Gate will help analyse memory consumption issues.  dotMemory from JetBrains is another good choice in the same area.  ANTS Performance Profiler from Red Gate will help analyse the performance of .NET code and monitor its CPU consumption.  Again, JetBrains have dotTrace in the same space.  There’s also the lesser-known .NET Memory Profiler which is a good option.  For network monitoring, Bart uses Wireshark.  For general testing tools, Bart recommends BlazeMeter (for load testing) and Neustar.

Bart also stresses the importance of the ongoing usage of production monitoring tools.  Services such as New Relic, AppDynamics etc. can provide ongoing metrics for your running application when it’s live in production and are invaluable to understand exactly how your application is behaving in a production environment.

arithabort

Bart shares a very handy tip regarding the use of SQL Server Management Studio for general debugging of SQL Server queries.  He states that we should always UNCHECK the SET ARITHABORT option inside SSMS’s options menu.  With this option off, SQL Server won’t abort queries that perform arithmetic overflows or divide-by-zero operations, so your query will continue to run and, since most client libraries also connect with ARITHABORT off by default, SSMS will behave much more like your application does – giving you a much clearer picture of what the query is actually doing (and how long it takes to run).

From here, Bart shares with us four different real-world performance scenarios that he has been involved in, how he went about diagnosing the performance issues and how he fixed them.

The first scenario was helping a client’s customer support team who were struggling as it was taking them 40 seconds to retrieve one of their customer’s details from their system when on a support phone call.  The architecture of the application was an ASP.NET MVC web application in C#, using NHibernate to talk to 2 different SQL Server instances – one server was a primary and the other a linked server.

Bart started by using ANTS Performance Profiler on the web layer and was able to highlight “hotspots” of slow running code, precisely in the area where the application was calling out to the database.  From here, Bart could see that one of the SQL queries was taking 9 seconds to complete.  After capturing the exact SQL statement that was being sent to the database, it was time to fire up SSMS and use SQL Server Profiler in order to run that SQL statement and gain further insight into why it was taking so long to run.

IMG_20161001_144719

After some analysis, Bart discovered that there was a database View on the primary SQL Server that was pulling data from a table on the linked server.  Further, there was no filtering on the data pulled from the linked server, only filtering on the final result set after multiple tables of data had been combined.  This meant that the entire table’s data from the linked server was being pulled across the network to the primary server before any filtering was applied, even though not all of the data was required (the filtering discarded most of it).  To resolve the problem, Bart added a simple WHERE clause to the data that was being selected from the linked server’s table and the execution time of the query went from 9 seconds to only 100 milliseconds!

Bart moves on to tell us about the second scenario.   This one had a very similar application architecture as the first scenario, but the problem here was a creeping increase in memory usage of the application over time.  As the memory increased, so the performance of the application decreased and this was due to the .NET garbage collector having to examine more and more memory in order to determine which objects to garbage collect.  This examination of memory takes time.  For this scenario, Bart used ANTS Memory Profiler to show specific objects that were leaking memory.  After some analysis, he found it was down to a DI (dependency injection) container (in this case, Windsor) having an incorrect lifecycle setting for objects that it created and thus these objects were not cleaned up as efficiently as they should have been.  The resolution was to simply configure the DI container to correctly dispose of unneeded objects and the excessive memory consumption disappeared.
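For illustration only, the kind of lifestyle configuration involved looks something like the Castle Windsor registration below; the interface and class names are hypothetical and the exact registrations in Bart’s scenario aren’t known, but the principle is that choosing the wrong lifestyle (or never releasing components) keeps whole object graphs alive far longer than intended:

using Castle.MicroKernel.Registration;
using Castle.Windsor;

public interface IUnitOfWork { }
public class UnitOfWork : IUnitOfWork { }
public interface IOrderService { }
public class OrderService : IOrderService { }

public static class ContainerSetup
{
    public static IWindsorContainer Build()
    {
        var container = new WindsorContainer();

        container.Register(
            // Scoped to the web request, so instances are released when the request ends...
            Component.For<IUnitOfWork>().ImplementedBy<UnitOfWork>().LifestylePerWebRequest(),

            // ...rather than an unintended singleton (or an unreleased transient) that
            // accumulates state for the lifetime of the application.
            Component.For<IOrderService>().ImplementedBy<OrderService>().LifestyleTransient());

        return container;
    }
}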

IMG_20161001_150655

From here, we move onto the third scenario.  This was a multi-tenanted application where each customer had their own database.  It was an ASP.NET Web application but used a custom ADO layer written in C++ to access the database.  Bart spares us the details, but tells us that the problem was ultimately down to locking, blocking and deadlocking in the database.  Bart uses this to remind us of the various concurrency levels in SQL Server.  There’s object-level concurrency and row-level concurrency, and when many people are trying to read a row that’s concurrently being written to, deadlocks can occur.  There are many different solutions available for this, and one such solution is to use the READ COMMITTED SNAPSHOT isolation level on the database.  This uses TempDB to store row versions and so help “scale” the demands against the database, so it’s important that TempDB is stored on a fast storage medium (a fast SSD drive, for example).  The best solution is a more disciplined ordering of object access, usually implemented with a Unit Of Work pattern, but Bart tells us that this is difficult to achieve with SQL Server.

Finally, Bart tells us all about scenario number four.  The fundamental problem with this scenario was networking, and more specifically it was down to network latency that was killing the application’s performance.  The application architecture here was not a problem as the application was using Virtual Machines running on VMWare’s vSphere with lots and lots of CPU and Memory to spare.  The SQL Server was running on bare metal to ensure performance of the database layer.  Bart noticed that the problem manifested itself when certain queries were run.  Most of the time, the query would complete in less than 100ms, but occasionally spikes of 500-600ms could be seen when running the exact same query.  To diagnose this issue, Bart used Wireshark on both ends of the network, that is to say on the application server where the query originated and on the database server where the data was stored, however, as soon as Wireshark was attached to the network, the performance problem disappeared!

This ultimately turned out to be an incorrect setting on the virtual NIC, as Bart could see that the SQL Server was sending results back to the client in only 1ms, however, it was a full 500ms before the results were received when measured from the client (application) side of the network link.  It was disabling the “receive side coalescing” setting that fixed the problem.  Wireshark itself temporarily disables this setting, hence the problem disappearing when Wireshark was attached.

IMG_20161001_152003

Bart finally tells us that whilst he’s mostly a server-side performance guy, he’s made some general observations about dealing with client-side performance problems.  These are generally down to size of payload, chattiness of the client-side code, garbage collection in JavaScript code and the execution speed of JavaScript code.  He also reminds us that most performance problems in database-driven applications are usually found at the database layer, and can often be fixed with simple things like adding more relevant indexes, adding stored procedures and utilising efficient cached execution plans.

After Bart’s session, it was time for a final refreshment break before the final session of the day.  For me, the final session was Gary McClean Hall’s “DDD: the God That Failed”.

Gary starts his session by acknowledging that the title is a little clickbait-ish as his talk started life as a blog post he had previously written.  His talk is all about Domain Driven Design (DDD) and how he implemented DDD when he was working within the games industry.  Gary mentions that he’s the author of the book “Adaptive Code via C#” and that, when he was working in the games industry, he had worked on the Championship Manager 2008 game.

Gary’s usage of DDD in game development started when there was a split between two companies involved in the Championship Manager series of games.  In the fall out of the split, one company kept the rights to the name, and the other company kept the codebase!  Gary was with the company that had the name but no code and they needed to re-create the game, which had previously been many years in development, in a very compressed timescale of only 12 months.

IMG_20161001_155048

Gary starts with a definition of DDD.  It is modelling for complicated domains.  Gary is keen to stress the word “complicated”.  Therefore, we need to be able to identify what exactly is a complicated domain.  In order to help with this, it’s often best to create a “DDD Maturity Model” for the domain in which we’re working.  This is a series of topics which can be further expanded upon with the specifics for that topic (if any) within our domain.  The topics are:

The Domain
Domain Entity Behaviour
Decoupled Domain
Aggregate Roots
Domain Events
CQRS
Bounded Contexts
Polyglotism

By examining the topics in the list above and determining the details for those topics within our own domain, we can evaluate our domain and its relative complexity and thus its suitability to be modelled using DDD.

IMG_20161001_155454

Gary continues by showing us a typical structure of a Visual Studio solution that purports to follow the Domain Driven Design pattern.  He states that he sees many such solutions configured this way, but it’s not really DDD and usually represents a very anaemic domain.  Anaemic domain models are collections of classes that are usually nothing more than properties with getters and setters, but little to no behaviour.  This type of model is considered an anti-pattern as it offers very low cohesion and high coupling.

If you’re working with such a domain model, you can start to fix things.  Looking for areas of the domain that can benefit from better types rather than using primitive types is a good start.  A classic example of this is a class to represent money.  Having a “money” class allows better control over the scale of the values you’re dealing with and can also encompass currency information as well.  This is preferable to simply passing values around the domain as decimals or ints.

Commonly, in the type of anaemic domain model as detailed above, there are usually repositories associated with entity models within the domain, and it’s usually a single repository per entity model.  This is also considered an anti-pattern as most entities within the domain will be heavily related and thus should be persisted together in the same transaction.  Of course, the persistence of the entity data should be abstracted from the domain model itself.

Gary then touches upon an interesting subject, which is the decoupling within a DDD solution.  Our ASP.NET views have ViewModels, our domain has its Domain Models and the persistence (data) layer has its own data models.  One frequent piece of code plumbing that’s required here is extensive mapping between the various models throughout the layers of the application.  In this regard, Gary suggests reading Mark Seemann’s article, “Is layering worth the mapping?”  In this article, Mark suggests that the best way to avoid having to perform extensive mapping is to move less data around between the layers of our application.  This can sometimes be accomplished, but depending upon the nature of the application, it can be difficult to achieve.

IMG_20161001_160741_1

So, looking back at the “repository-per-entity” model again, we’re reminded that it’s usually the wrong approach.  In order to determine the repositories of our domain, we need to examine the domain’s “Aggregate Roots”.  An aggregate root is the top-level object that “contains” additional other child objects within the domain.  So, for example, a class representing a Customer could be an aggregate root.  Here, the customer would have zero, one or more Order classes as children, and each Order class could have one or more OrderItems as children, with each OrderItem linking out to a Product class.  It’s also possible that the Product class could be considered an aggregate root of the domain too, as the product could be the “root” object that is retrieved within the domain, and the various order items across multiple orders for many different customers could be retrieved as part of the product’s object graph.
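In code, the Customer aggregate described above might be sketched like this (the names are illustrative and mine, not Gary’s) – the children are reachable only through the root, and the whole graph is loaded and saved together by a single CustomerRepository:

using System.Collections.Generic;

public class Customer
{
    private readonly List<Order> _orders = new List<Order>();

    public int Id { get; private set; }
    public IReadOnlyCollection<Order> Orders => _orders;

    public Order PlaceOrder()
    {
        var order = new Order();
        _orders.Add(order);
        return order;
    }
}

public class Order
{
    private readonly List<OrderItem> _items = new List<OrderItem>();
    public IReadOnlyCollection<OrderItem> Items => _items;

    public void AddItem(int productId, int quantity) => _items.Add(new OrderItem(productId, quantity));
}

public class OrderItem
{
    public OrderItem(int productId, int quantity) { ProductId = productId; Quantity = quantity; }
    public int ProductId { get; }
    public int Quantity { get; }
}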

To help determine the aggregate roots within our domain, we first need to examine and determine the bounded contexts.  A bounded context is a conceptually related set of objects within the domain that will work together and make sense for some of the domain’s behaviours.  For example, the Customer, Order, OrderItem and Product classes above could be considered part of a “Sales” context within the domain.  It’s important to note that a single domain entity can exist in more than one bounded context, and it’s frequently the case that the actual objects within code that represent that domain entity can be entirely different objects and classes from one bounded context to the next.  For example, within the Sales bounded context it’s possible that only a small subset of the product data is required, therefore the Product class within the Sales bounded context has far fewer properties/data than the Product class in a different bounded context – for example, there could be a “Catalogue” context, with the Product entity as its aggregate root, but this Product object is different from the previous one and contains significantly more properties/data.
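That idea of the “same” entity being modelled differently per context can be as simple as two classes in two namespaces (the property names below are illustrative, not from Gary’s talk):

using System.Collections.Generic;

namespace Sales
{
    // Only what the Sales context needs in order to put a product on an order.
    public class Product
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public decimal UnitPrice { get; set; }
    }
}

namespace Catalogue
{
    // The Catalogue context treats Product as its aggregate root and carries far more data.
    public class Product
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public string LongDescription { get; set; }
        public string Manufacturer { get; set; }
        public List<string> ImageUrls { get; set; } = new List<string>();
    }
}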

IMG_20161001_161509

The number of different bounded contexts you have within your domain determines the domain’s breadth.  The size of the bounded contexts (i.e. the number of related objects within them) determines the domain’s depth.  The depth of a given bounded context determines the importance of that area of the domain to the users of the application.

Bounded contexts and the aggregate roots within them will need to communicate with one another in order that behaviour within the domain can be implemented.  It’s important to ensure that aggregate roots, and especially bounded contexts, are not coupled to each other, so communication is performed using domain events.  A domain event is an event raised by an entity within one aggregate root or bounded context and broadcast to the rest of the domain.  Entities within other bounded contexts or aggregate roots will subscribe to the domain events that they may be interested in, in order for them to respond to actions and behaviour in other areas of the domain.  Domain events in a .NET application are frequently modelled using the built-in events and delegates functionality of the .NET framework, although there are other options available, such as the Reactive Extensions library, as well as specific patterns of implementation.
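A minimal sketch of that events-and-delegates option (my own illustration; the types are hypothetical): an entity in one context raises the event, and a handler in another context subscribes to it without the two being directly coupled.

using System;

public class OrderPlacedEventArgs : EventArgs
{
    public OrderPlacedEventArgs(int orderId) { OrderId = orderId; }
    public int OrderId { get; }
}

public class Order
{
    public event EventHandler<OrderPlacedEventArgs> OrderPlaced;

    public void Place(int orderId)
    {
        // ...the domain behaviour for placing the order would go here...
        OrderPlaced?.Invoke(this, new OrderPlacedEventArgs(orderId));
    }
}

// Lives in another bounded context; it knows about the event, not about how Sales works internally.
public class StockAllocator
{
    public void Subscribe(Order order) =>
        order.OrderPlaced += (sender, e) => Console.WriteLine($"Allocating stock for order {e.OrderId}");
}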

IMG_20161001_161830

One difficult area of most applications, and somewhere where the “pure” DDD model may break down slightly, is search.  Many different applications will require the ability to search across data within the domain, and frequently search is seen as a cross-cutting concern as the result data returned can be small amounts of data from many different aggregates and bounded contexts in one amalgamated data set.  One approach that can be used to mitigate this is the CQRS – Command Query Responsibility Segregation – pattern.

Essentially, this pattern states that the models and code that we use to read data does not necessarily have to be the same models and code that we use to write data.  In fact, most of the time, these models and code should be different.  In the case of requiring a search across disparate data within the DDD-modelled domain, it’s absolutely fine to forego the strict DDD model and to create a specific “view” – this could be a database stored procedure or a database view – that retrieves the exact cross-cutting data that you need.  Doing this prevents using the DDD model to specifically create and hydrate entire aggregate roots of object graphs (possibly across multiple different bounded contexts) as this is something that could be a very expensive operation as most of the retrieved data wouldn’t be required.
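One common way the read side ends up being expressed (an illustration of my own, not Gary’s code) is a flat DTO shaped exactly like the search results, populated straight from a database view or stored procedure, bypassing the domain model entirely:

using System.Collections.Generic;

// Cross-cutting, read-only data pulled from several aggregates/bounded contexts at once.
public class ProductSearchResult
{
    public int ProductId { get; set; }
    public string ProductName { get; set; }
    public string CustomerName { get; set; }
    public decimal LastOrderedPrice { get; set; }
}

public interface IProductSearchQuery
{
    // The write side keeps its rich DDD model; this read-side query never hydrates aggregate roots.
    IReadOnlyList<ProductSearchResult> Search(string searchTerm);
}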

Gary reminds us that DDD aggregates can still be painful when using a relational database as the persistence storage due to the impedance mismatch of the domain models in code and the tables within the database.  It’s worth examining Document databases or Graph databases as the persistent storage as these can often be a better choice. 

Finally, we learn that DDD is frequently not justified in applications that are largely CRUD based or for applications that make very extensive use of data queries and reports (especially with custom result sets).  Therefore, DDD is mostly appropriate for those applications that have to model a genuinely complex domain with specific and complex domain objects and behaviours and where a DDD approach can deliver real value.

IMG_20161001_165949

After Gary’s session was over, it was time for all of the attendees to gather in the largest of the conference rooms for the final wrap-up.  There were only a few prize give-aways on this occasion, and after those were awarded to the lucky attendees who had their feedback forms drawn at random, it was time to thank the great sponsors of the event, without whom there simply wouldn’t be a DDD North.

I’d had a great time at yet another fantastic DDD event, and am already looking forward to the next one!

DDD 11 In Review

IMG_20160903_084627

This past Saturday, 3rd September 2016, the 11th DDD (DeveloperDeveloperDeveloper) conference was held at Microsoft’s UK HQ in Reading.  Although I’ve been to a number of DDD events in recent years, this was my first time at the original DDD event (aka Developer Day aka DDD Reading) which spawned all of the other localised DDD events.

IMG_20160903_090335

After travelling the evening before and staying overnight in a hotel in Swindon, I set off bright and early to make the 1 hour drive to Reading.  After arriving and checking in, collecting my badge along the way, it was time to grab a coffee and one of the hearty breakfast butties supplied.  Coffee and sausage sandwich consumed, it was time to familiarise myself with the layout of the rooms.  There were 4 parallel tracks of talks, and there had also been a room change from the printed agendas that we received upon checking in.  After finding the new rooms and consulting my agenda sheet, it was time for me to head off to the first talk of the day.  This was Gary Short’s “How to make your bookie cry”.

With a promise of showing us all how to make money on betting exchanges and also how to “beat the bookie”, Gary’s talk was an interesting proposition and commanded a full room of attendees.  Gary’s session is all about machine learning and how data science can help us do many things, including making predictions on horse races in an attempt to beat the bookie.  Gary starts by giving the fundamental steps of machine learning – Predict – Measure – Analyze – Adjust.  But, we start with measure as we need some data to set us off on our way.

IMG_20160903_093802

Gary states that bookie odds in the UK are expressed as fractions and that this hides the inherent probabilities of each horse winning in a given race.  Bookies ultimately will make a profit on a given race as the probabilities of all of the horses add up to more than 1!  So, we can beat the bookie if we build a better data model.  We do this with data.  We can purchase horse racing data, which means we’re already at a loss given the cost of the data, or we can screen scrape it from a sports website, such as BBC Sport.  Gary shows us a demo of some Python code used to scrape the data from the BBC website.  He states that Python is one of two “standard” languages used within Data Science, the other language being R.  After scraping a sufficiently sized dataset over a number of days, we can analyze that data by building a Logistic Regression Model.  Gary shows how to use the R language to achieve this, ultimately giving us a percentage likelihood of a given horse winning a new race based upon its past results, its weight and the jockey riding it.

Gary next explains a very important consideration within Data Science known as The Turkey Paradox.  You’re a turkey on a farm, you have to decide if today you’re going to get fed or go to market.  If your data model only has the data points of being fed at 9am for the last 500 days, you’ll never be able to predict if today is the day you go to market - as it’s never happened before.  There is a solution to this - it’s called Active Learning or Human in the Loop learning.   But.  It turns out humans are not very good at making decisions.

Gary next explains the differences between System 1 and System 2 thinking.  System 2 is very deliberate actions - you think first and deliberately make the action.  System 1 is reflexive - when you put your hand on a hot plate, you pull it away without even thinking.  It uses less of the brain.  System 1 is our “lizard brain” from the days when we were cavemen.  And it takes precedence over System 2.  Gary talks about the types of System 1 thinking.  There’s Cognitive Dissonance – holding onto a belief in the face of mounting contrary evidence.  Another is bait-and-switch – substituting a less favourable option after being “baited” with a more favourable one, and yet another type is the “halo effect” – beautiful things are believed to be desirable.  We need to ensure that, when using human-in-the-loop additions to our data model, we don’t fall foul of these problems.

IMG_20160903_092511

Next, we explore Bayes’ theorem: a theorem describing how the conditional probability of each of a set of possible causes for a given observed outcome can be computed from knowledge of the probability of each cause and the conditional probability of the outcome of each cause.  Gary uses this theorem over our horse racing data model to demonstrate Bayesian inference, using prior probabilities to predict future ones.  This is using the raw scraped data, with no human-in-the-loop additions, but we can add our own additions which become prior probabilities and can be used to compute further probabilities using Bayes’ theorem.
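In symbols (my own summary of the standard statement of the theorem, not a slide from the talk), for a hypothesis H such as “this horse wins” and observed evidence E:

P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}

Here P(H) is the prior probability and P(H | E) is the posterior – the updated probability once the evidence has been taken into account.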

Gary concludes that, once we’ve acquired, trained and analyzed our data model, we can beat the bookie if our odds are shorter than the bookie’s.  Another way is not to beat the bookie at all!  We can make money simply by beating other gamblers.  We can do this using betting exchanges – backing and laying bets and getting other gamblers to bet against your prediction of the outcome of an event.  Finally, you can also profit from “trading arbitrage” – whereby the clever placing of bets when two different bookies have the same event outcome at two different odds can produce a profit from the difference between those odds.
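As a toy illustration of the arbitrage idea (the numbers here are made up, not from Gary’s talk): if the implied probabilities across two bookies sum to less than 1, staking in proportion to those probabilities locks in a profit whichever outcome occurs.

using System;

public static class ArbitrageSketch
{
    public static void Main()
    {
        double oddsA = 2.10;   // decimal odds on outcome A at bookie 1
        double oddsB = 2.10;   // decimal odds on outcome B at bookie 2 (a two-outcome event)

        double impliedTotal = 1 / oddsA + 1 / oddsB;             // ~0.952, i.e. less than 1
        Console.WriteLine($"Implied probability total: {impliedTotal:F3}");

        double bank = 100;
        double stakeA = bank * (1 / oddsA) / impliedTotal;       // ~50.00
        double stakeB = bank * (1 / oddsB) / impliedTotal;       // ~50.00

        // Either outcome returns the same amount, which is more than the 100 staked in total.
        Console.WriteLine($"Return if A wins: {stakeA * oddsA:F2}");   // ~105.00
        Console.WriteLine($"Return if B wins: {stakeB * oddsB:F2}");   // ~105.00
    }
}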

IMG_20160903_104403

After a short coffee break, it was onto the second session of the day, which was Ali Kheyrollahi’s “Microservice Architecture at ASOS”.  Ali first explains the background of the ASOS company where he works.  They’re a Top 35 online retailer, within the Top 10 of online fashion retailers, they have a £1.5 billion turnover and, for their IT, they process around 10000 requests per second.  Ali states that ASOS is at its core a technology company, and it’s through this that they succeed with IT – you’ve got to be a great tech business, not just a great tech function.  Tech drives the agenda and doesn’t chase the rest of the business.

Ali asks “Why Microservices?” and states that it’s really about scaling the people within the business, not just the tech solution.  Through decoupling the entire solution, you decentralise decision making.  Core services can be built in their own tech stack by largely independent teams.  It allows fast and frequent releases and deployments of separate services.  You reduce the complexity of each service, although, Ali does admit that you will, overall, increase the complexity of the overall solution.

The best way to achieve all of this is through committed people.  Ali shows a slide which mentions the German army’s “Auftragstaktik”, which is a method of commanding in which the commander gives subordinate leaders a specific mission, a timescale of achievement and the forces required to meet the goal; however, the individual leaders are free to engage their own subordinates’ services as they see fit.  It’s about telling them how to think, not what to think.  He also shares a quote from “The Little Prince” that embodies this thinking: “If you wish to build a ship, do not divide the men into teams and send them to the forest to cut wood. Instead, teach them to long for the vast and endless sea.”  If you wish to succeed with IT, and Microservices in particular, you have to embrace this culture.  Ali states that a “triangle” of domain modelling, people and a good operating model really all adds up to successful architecture.

Ali hands over to his colleague Dave Green who talks about how ASOS, like many companies, started with a legacy monolithic system.  And like most others, they had to work with this system as it stood – they couldn’t just throw it out and start over again; it was, after all, handling nearly £1 billion in transactions per year.  However, despite fixing some of the worst performance problems of the monolithic system, they ultimately concluded that it would be easier and cheaper to build a new system than to fix the old one.  Dave explains how they have a 2-tier IT system within the company – there’s the enterprise domain and the digital domain.  The enterprise domain is primarily focused on buying off-the-shelf software to run the Finance, HR and other aspects of the business.  They’re project-led.  Then there’s the digital domain, much more agile, product-led and focused on building solutions rather than buying them.

Ali states how ASOS is a strategic partner with Microsoft and is heavily invested in cloud technology, specifically Microsoft’s Azure platform.  He suggests that ASOS may well be the largest single Azure user this side of the Atlantic ocean!  He talks about the general tech stack, which is C#, using TeamCity for building and Octopus Deploy for deployment.  There’s also lots of other tech used, however, and other teams are making use of Scala, R, and other languages where it’s appropriate.  The database stack is primarily SQL Server, but they also use Redis and MongoDB.

IMG_20160903_111122

Ali talks about one of the most important parts of building a distributed microservice-based solution – the LMA stack – that’s Logging, Monitoring and Alerting.  All microservices are built to adhere to some core principles.  All queries and commands use an HTTP API, but there are no message brokers or ESB-style pseudo-microservices.  These exist outside of the services, but never inside.  For the logging, Ali states how logging is inherent within all parts of every service, however, they do most logging and instrumentation whenever there is any kind of I/O – network, file system or database reads and writes.  As part of their logging infrastructure, they use Woodpecker, which is a queue and topic monitoring solution for Azure Service Bus.

All of the logs and Woodpecker output are fed into a log collector and processor.  They don’t use LogStash for this, which is a popular component, but instead use ConveyorBelt.  This plays better with Azure and some of the Azure-specific implementation and storage of certain log data.  Both LogStash and ConveyorBelt, however, have the same purpose – to quickly collect and push log data to ElasticSearch.  From here, they use the popular Kibana product to visualise that data.  So rather than an ELK stack (ElasticSearch, LogStash, Kibana), it’s an ECK stack (ElasticSearch, ConveyorBelt, Kibana).

Ali concludes his talk by discussing lessons learnt.  He says, if you’re in the cloud, build for failure as the cloud is a jungle!  Network latency and failures add up, so it’s important to understand and optimize the time from the user to the data.  With regard to operating in the cloud in general: ignore the hype and trust no one.  Test, measure, adopt/drop, monitor and engage with your provider.  It’s difficult to manage platform costs, so get automation and monitoring of the cloud infrastructure in place to prevent developers creating erroneous VMs that they forget to switch off!  Finally, distributed computing is hard, and geo-distribution is even harder.  Expect to roll up your sleeves.  Maturity in some areas can be low and is changing rapidly.

IMG_20160903_115726

After Ali’s talk there was another coffee break in the communal area before we all headed off to the 3rd session of the day.  For me, this was Mark Rendle’s “Somewhere over the Windows”.  Mark’s talk revolved around .NET Core and its ability to run cross-platform.  He opened by suggesting that, being the rebel he is, he thought he’d come to Microsoft UK HQ and give a talk about how to move away from Windows and onto a Linux OS!

Mark starts by saying that Windows is great, but a lot of the intrinsic parts of Windows that we use as developers, such as IIS and .NET, are far too deeply tied to a specific version of Windows.  Mark gives the example that IIS has only just received support for HTTP/2, but that it’s only the version of IIS contained within the not-yet-released Windows Server 2016 that’ll support it.  He says that, unfortunately, Windows gets stuck in a rut for around 4 years at a time, and every 4 years Microsoft’s eco-system has to try to catch up with everybody else with a new version of Windows.

.NET Core will help us as developers to break away from this and stop getting stuck in a rut.  .NET Core runs on Windows, Linux and Mac OS X.  It’s self-contained so that you can simply ship a folder containing your application’s files and the .NET Core runtime files, and it’ll all “just work”.  Mark mentions ASP.NET Core, which actually started the whole “core” thing at Microsoft, and they then decided to go for it with everything else.  ASP.NET Core is a ground-up rewrite, merges MVC and Web API into a unified whole and has its own built-in web server, Kestrel, which is incredibly fast.  Mark says how his own laptop has now been running Linux Mint for the last 1.5 years and how he’s been able to continue being a “.NET developer” despite not using Windows as his main, daily OS.

Mark talks about how, in this brave new world, we’re going to have to get used to the CLI – Command Line Interface.  Although some graphical tooling exists, the slimmed-down .NET Core will take us back to the days of developing and creating our projects and files from the CLI.  Mark says he uses Guake as his CLI of choice on his Linux Mint install.  Mark talks about Yeoman – the scaffolding engine used to bootstrap ASP.NET Core projects.  It’s a node package, and Mark admits that pretty much all web development these days, irrespective of platform, is pretty much dependent on node and its npm package manager.  Even Microsoft’s own TypeScript is a node package.  Mark shows creating a new ASP.NET Core application using Yeoman.  The Yeoman script creates the files/folders, does a dotnet restore command to restore NuGet packages, then does a Bower restore to restore front-end (i.e. JavaScript) packages from Bower.

Mark says that tooling was previously an issue when developing on Linux, but it’s now better.  There’s Visual Studio 2015 Update 3 for Windows only, but there are also Project Rider and Xamarin Studio, which can run on Linux and in which .NET Core code can be developed.  For general editors, there’s VS Code, Atom, Sublime Text 3, Vim or Emacs!  VS Code and Atom are both based on Electron.

Mark moves on to discuss logging in an application.  In .NET Core it’s a first-class citizen as the framework contains a LoggerFactory.  It’ll write to STDOUT and STDERR and therefore it works equally well on Windows and Linux.  This is an improvement over the previous types of logging we could achieve, which would often result in writing to Windows-only log stores (for example, the Windows Event Log).
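A minimal console-logging sketch of my own (note that the logging API has moved on a little since the 1.0 timeframe of the talk; this uses the current LoggerFactory.Create shape and needs the Microsoft.Extensions.Logging.Console package):

using System;
using Microsoft.Extensions.Logging;

public static class Program
{
    public static void Main()
    {
        // Logs go to the console, i.e. STDOUT, so the same code behaves identically on
        // Windows and Linux (and plays nicely with Docker-based log collection).
        using (var loggerFactory = LoggerFactory.Create(builder => builder.AddConsole()))
        {
            var logger = loggerFactory.CreateLogger("Demo");
            logger.LogInformation("Application started on {OS}", Environment.OSVersion);
        }
    }
}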

Next, Mark moves on to discuss Docker.  He says that the ability to run your .NET Core apps on a lightweight and fast web server such as NGINX, inside a Docker container, is one of the killer reasons to move to and embrace the Linux platform as a .NET developer.  Mark first gives the background of “what is Docker?”  Docker provides “containers”, which are like small, lightweight VMs (virtual machines).  The processes within them run on the host OS, but they’re isolated from other processes in other containers.  Docker containers use a “layered” file system.  What this means is that Docker images – the blueprints from which container instances are created – can be layered on top of each other.  So, we can get NGINX as a Docker image, which can act as a “base” image upon which you can “layer” additional images of your own; your web application becomes a subsequent layered image, and together they form a single running Docker container, with a nicely preconfigured NGINX instance from the base image for free!  Microsoft even provide a “base” image for ASP.NET Core which is based upon Debian 8.  Mark suggests using jwilder/nginx-proxy as the base NGINX image.  Mark talks about how IIS is the de-facto standard web server for Windows, but nowadays NGINX is the de-facto standard for Linux.  We need to use NGINX as Kestrel (the default web server for ASP.NET Core) is not a production web server and isn’t “hardened”.  NGINX is a production web server, hardened for this purpose.

To prevent baking configuration settings into the Docker image (say, database connection strings) we can use Docker Compose.  This allows us to pass in various environment settings at the time we run the Docker container.  It uses YAML.  It also allows you to easily specify the various command line arguments that you might otherwise need to pass to Docker when running an image (e.g. -p 5000:5000, which maps port 5000 in the container to port 5000 on the host).

Mark then shows us a demo of getting an ELK stack (ElasticSearch, LogStash & Kibana) up and running.  The ASP.NET Core application can simply write its logs to the console, which on Linux is STDOUT.  There is then a LogStash input, Gelf, that will grab anything written to STDOUT, process it and store it within LogStash.  This is then immediately visible to Kibana for visualisation!

Mark concludes that, ultimately, the main benefits of the “new way” with .NET and ASP.NET Core are the same as the fundamental benefits of the whole Linux/Unix philosophy that has been around for years: compose your applications (and even your OS) out of many small programs that are each designed to do only one thing and to do it well.

IMG_20160903_131124After Mark’s session, which slightly overran, it was time for lunch.  Lunch at DDD 11 was superb.  I opted for the chicken salad rather than a sandwich, and very delicious (and filling) it was too, with a large portion of chicken contained within.  This was accompanied by some nice crisps, a chocolate bar, an apple and some flavoured water to wash it all down with!

I ate my lunch on the steps just outside the building, however, the imminently approaching rain soon started to fall and it put a stop to the idea of staying outside in the fresh air for very long! IMG_20160903_132320  That didn’t matter too much as not long after we’d managed to eat our food we were told that the ubiquitous “grok talks” would be starting in one of the conference rooms very soon.

I finished off my lunch and headed towards the conference room where the grok talks were being held.   I was slightly late arriving, and by the time I got there all available seating was taken, with only standing room left!  I’d missed the first of the grok talks, given by Rik Hepworth about Azure Resource Templates, however I’d seen a more complete talk by Rik on the same subject at DDD North the previous year.   Unfortunately, I also missed most of the following grok talk by Andrew Fryer, which discussed Power BI – previously a stand-alone product, but now hosted within Azure.

I did catch the remaining two grok talks, the first of which was Liam Westley’s “What is the point of Microsoft?”  Liam’s talk is about how Microsoft is a very different company today to what it was only a few short years ago.  He starts by talking about how far Microsoft has come in recent years, and how many of its positions today are complete reversals of previously held ones – one major example being Microsoft’s attitude towards open source software.  Steve Ballmer, the previous Microsoft CEO, famously stated that Linux was a “cancer”, yet the current Microsoft is embracing Linux both on Azure and for its .NET development tools.    Liam states that Microsoft’s future is very much in the cloud, and that they’re investing heavily in Azure.  Liam shows some slides which acknowledge that Amazon has the largest share of the public cloud market (over 50%) whilst Azure currently has only around 9%, but that this figure is growing all the time.  He also talks about how Office 365 is a big driver for Microsoft's cloud and that we should just accept that Office has “won” (i.e. over LibreOffice, OpenOffice etc.).  Liam wraps up his quick talk with something rather odd – a slide showing a book about creating cat crafts from cat hair!

The final grok talk was by Ben Hall, who introduced us very briefly to an interesting website that he’s created called Katacoda.  The website is an interactive learning platform that aims to help developers learn all about new and interesting technologies right from within their browser!  It allows developers to test out and play with a variety of new technologies (such as Docker, Kubernetes, Git, CoreOS, CI/CD with Jenkins etc.) in an interactive CLI inside the browser.  He says it’s completely free and that they’re adding to the number of “labs” on offer all the time.

IMG_20160903_143625After the grok talks, there was a little more time to grab some refreshments prior to the first session of the afternoon, and penultimate session of the day, João “Jota” Pedro Martins’ “Azure Service Fabric and the Actor Model”.  Jota’s session is all about Azure Service Fabric, what it is and how it can help you with distributed applications in the cloud.  Azure Service Fabric is a “PaaS v2” (Platform as a Service) which supports both stateful and stateless services using the Actor model.  It’s a platform for applications that are “born in the cloud”.  So what is the Actor Model?  Well, it’s a model of concurrent computation that treats “actors” – distinct, independent units of code – as the fundamental, core primitives of an application.  An application is composed of numerous actors, and these actors communicate with each other via messages rather than method calls.  Azure Service Fabric is built into Azure, but it’s also downloadable for free and can be used not only within Microsoft’s Azure cloud, but also inside the clouds of other providers, such as Amazon’s AWS.  IMG_20160903_143735Azure Service Fabric is battle-hardened, with Microsoft’s long-standing “Project Orleans” work at its core.

The “fabric” part of the name is effectively the “cluster” of nodes that run as part of the Service Fabric framework; this is usually based upon a minimum configuration of 1 primary node with at least 2 secondary nodes, but can be configured in numerous other ways.  The application’s “actors” run inside these nodes and communicate with each other via message passing.  Nodes are grouped into replica sets and will balance load between themselves and fail over from one node to another if a node becomes unresponsive, taking “votes” upon who the primary node will be when required.  Your microservices within Service Fabric can be any executable process that you can run, such as an ASP.NET website, a C# class library, a NodeJS application or even a Java application running inside a JVM.  Currently Azure Service Fabric doesn’t support Linux, but support for that is being developed.

Your microservices can be stateless or stateful.  Stateless services are simpler as there’s no state to store, so messages consumed by the service are self-contained.  Stateful services can store state inside Service Fabric itself, and Service Fabric will take care of making sure that the stored state is replicated across nodes, ensuring availability in the event of a node failure.  Service Fabric clusters can be upgraded with zero downtime: part of the cluster can respond to messages from a previous version of your microservice whilst other parts of the cluster, those that have already had the microservice upgraded to the new version, process messages from the new version.  You can create a simple 5 node cluster on your own local development machine by downloading Azure Service Fabric using the Microsoft Web Platform Installer.

IMG_20160903_145754Jota shows us a quick demo, creating a Service Fabric solution within Visual Studio.  It has 2 projects within the solution: one is the actual project for your service and the other is effectively metadata to help Service Fabric know how to instantiate and control your service (i.e. how many nodes within the cluster etc.).  Service Fabric exposes a Reliable Services API and, built on top of this, a Reliable Actors API.  It’s by implementing the interfaces from the Reliable Actors API that we create our own reliable services.  Actors operate in an asynchronous and single-threaded way and act, effectively, as singletons. Requests to an actor are serialized and processed one after the other, and the runtime platform manages the lifetime and lifecycle of the individual actors.  Because of this, the whole system must expect that messages can be received by actors in a non-deterministic order.

Actors can implement timers (i.e. perform some action every X seconds), but “normal” timers will die if the actor on a specific node dies and has to fail over to another node.  You can instead use an IActorReminder, which effectively allows the same timer-based action processing but will survive and continue to work if an actor has to fail over to another node.  Jota reminds us that the Actor Model isn’t appropriate to all circumstances and types of application: for example, if you have some deep, long-running logic processing that must remain in memory with lots of data and state, it’s probably not suited to the Actor Model, but if your processing can be broken down into smaller, granular chunks which can handle and process the messages sent to them in any arbitrary order, and you want to maximize the easy scalability of your application, then Actors are a great model.  Remember, though, that since actors communicate via messages – which are passed over the network – you will have to contend with some latency.

IMG_20160903_151942Service Fabric contains an ActorProxy class.  The ActorProxy will retry failed sent messages, but there are no “at-least-once” delivery guarantees - if you wish to ensure this, you'll need to make your actors idempotent so they can safely receive the same message multiple times.  It's also important to remember that concurrency is only turn-based: actors process messages one at a time in the order they receive them, which may not be the order they were sent in.  Jota talks about the built-in StateManager class of Service Fabric, which is how Service Fabric deals with persisting state for stateful services.  The StateManager has GetStateAsync and SetStateAsync methods which allow stateful actors to persist any arbitrary state (so long as it’s serializable).  One interesting observation is that the state is only persisted when the method that calls SetStateAsync has finished running; the state is not persisted immediately upon calling the SetStateAsync method!

Finally, Jota wraps up his talk with a brief summary.  He mentions how Service Fabric actors have behaviour and (optionally) state, are run in a performant, enterprise-ready scalable environment and are especially suited to web session state, shopping cart or any other scenarios with independent objects with their own lifetime, state and behaviour.  He does say that existing applications would probably need significant re-architecture to take advantage of Service Fabric, and that the Service Fabric API has some niggles which can be improved.

IMG_20160905_211919After João’s session, there’s time for one final quick refreshments break, which included a table full of various crisps, fruit and chocolate which had been left over from the excess lunches earlier in the afternoon as well as a lovely selection of various individually-wrapped biscuits!

Before long it was time for the final session of the day: Joseph Woodward’s “Building Rich Client Applications with AngularJS2”.

Joe’s talk first takes us through the differences between AngularJS 1 and Angular 2.  He states that when AngularJS 1 was first developed, back in 2010, there wasn’t even any such thing as NodeJS!  AngularJS 1 was great for its time, but it had its share of problems.  It was written before ECMAScript 6/2015 was a de-facto standard in client-side scripting and therefore couldn’t benefit from classes, modules, promises or web components.  Eventually, though, the world changed and, with both the introduction and ratification of ECMAScript 6 and the arrival of NodeJS, client-side development was pushed forward massively.  We now had module loaders and a component-driven approach to client-side web development, manifested by frameworks such as Facebook’s React which started to push the idea of uni-directional data flow.

IMG_20160903_155951Joe mentions how, with the advent of Angular2, the entire architecture is now component based.  It’s simpler too: the controllers, scopes and directives of Angular1 are all now replaced with Components in Angular2, and the Services and Factories of Angular1 are now just Services in Angular2.  It is much more modular and has first class support for mobile, the desktop and the web, being built on top of the ECMAScript 6 standard.

Joe mentions how Angular2 is written in Microsoft’s TypeScript language, a superset of JavaScript, that adds better type support and other benefits often found in more strongly-typed languages, such as interfaces.  He states that, since Angular2 itself is written in TypeScript, it’s best to write your own applications, which target Angular2, in TypeScript too.  Doing this allows for static analysis of your code (thus enforcing types etc.) as well as elimination of dead code via tree shaking which becomes a huge help when writing larger-scale applications.

Joe examines the Controller model used in Angular1 and talks about how controllers could communicate arbitrarily with pretty much any other controller within your application.  As your application grows larger, this becomes problematic as it becomes more difficult to reason about how events are flowing through your application.  This is especially true when trying to find the source of code that performs UI updates as these events are often cascaded through numerous layers of controllers.  In Angular2, however, this becomes much simpler as the component structure is that of a tree.  The tree is evaluated starting at the top and flowing down through the tree in a predictable manner.

IMG_20160903_160258_1In Angular2, Services take the place of the Services and Factories of Angular1, and Joe states how they’re really just JavaScript classes decorated with some additional attributes.  Joe further discusses how the very latest Release Candidate version of Angular2, RC6, has introduced the @NgModule directive.  NgModules allow you to build your application by acting as a container for a collection of services and components.  These are grouped together to form the module, and your application can then be built as a collection of one or more modules.  Joe talks about how components in Angular2 can be “nested”, allowing one parent component to contain the definition of further child components.  Data can flow between the parent and child components and this is all encapsulated from other components “outside”.

Next, Joe shows us some demos using a simple Angular2 application which displays a web page with a textbox and a number of other labels/boxes that are updated with the content of the textbox when that content changes.  The code is very simple for such a simple app, however, it shows how clearly defined and structured an Angular2 application can be.  Joe then changes the value of how many labels are created on the webpage to 10000 just to see how Angular2 copes with updating 10000 elements.  Although there’s some lag, as would be expected when performing this many independent updates, the performance isn’t too bad at all.

IMG_20160903_163209Finally, Joe talks about the future of Angular2.  The Angular team are going to improve static analysis and ensure that only used and reachable code is included within the final minified JavaScript file.  There’ll be better tooling to generate much of the “plumbing” around creating an Angular2 application, as well as improvements around building and testing Angular2 applications.  Joe explains that this is a clear message that Angular2 is not just a framework but a complete platform and that, although some developers were upset when Angular2 totally "changed the game" with no clear upgrade path from Angular1, leaving a lot of Angular1 developers feeling left out, Google insist that Angular2 is developed in such a way that it can evolve incrementally over time as web technologies evolve, so there shouldn’t be the same kind of wholesale “break from the past” re-development in the future of Angular as a platform.  Indeed, Google themselves are re-writing their AdWords product (an important product generating significant revenue for Google) using their own Dart language with Angular2 as the platform.  And with that, Joe’s session came to an end.  He was so impressed with the size of his audience, though, that he insisted on taking a photo of us all, just to prove to his wife that he was talking to a big crowd!

After this final session of the day it was time for all the attendees to gather in the communal area for the customary “closing ceremony”.  This involved big thanks to all of the sponsors of the event as well as a prize draw for numerous goodies.  Unfortunately, I didn’t win anything in the prize draws, but I’d had a brilliant time at my first DDD in Reading.  Here’s hoping that they continue the “original” DDDs well into the future.

PANO_20160903_090157

UPDATE: Kevin O’Shaughnessy has also written a blog post reviewing his experience at DDD 11, which is an excellent read.  Apart from the session by Mark Rendle, Kevin attended entirely different sessions to me, so his review is well worth a read to get a fuller picture of the entire DDD event.

SQLBits 2016 In Review

IMG_20160507_072440On 7th May 2016 in Liverpool, the 15th SQLBits event took place in the new Liverpool Exhibition Centre.  The event had actually been running since Wednesday 4th, however, as with all other SQLBits events, the Saturday is a free, community day.

This particular SQLBits was rather special, as Microsoft had selected the event as the UK launch event for SQL Server 2016.  As such the entire conference had a very large Microsoft presence.

Since the event was in my home town, I didn’t have too far to travel to get to the venue.  That said, I did have to set my alarm for 6am (a full 45 minutes earlier than I usually do on a working weekday!) to ensure I could get the two different trains required to get me to the venue in good time.  The Saturday is jam-packed with content and, as such, the event opened at the eye-watering time of 7:30am!

IMG_20160507_072440After arriving at the venue just as it was opening at 7:30am, I headed straight to the registration booth to confirm my registration and collect my conference lanyard.  Once that was collected, it was time to head into the main hall.  The theme for this year’s SQLBits was “SQLBits in Space”, so the rooms where the sessions would take place were giant inflatable white domes dotted around the main hall.  In between and around the domes there was plenty of space and plenty of sci-fi themed objects.

After a short while, the venue staff started to wheel out the morning refreshments of tea & coffee, shortly followed by the obligatory bacon, sausage and egg sandwiches!

After enjoying the delicious breakfast, it was soon time to head off to the relevant “dome” for the first session of the day.  The SQLBits Saturday event had 9 different tracks, so choosing which talk to attend was difficult and there were always bound to be clashes of interesting content throughout the day.  For the first session, I decided to attend Aaron Bertrand’s T-SQL: Bad Habits and Best Practices.

Aaron’s talk is all about the various bad habits that we can pick up when writing T-SQL code, and the myths that have built up around certain approaches to achieving specific things with T-SQL.  Aaron starts by stating that we should ensure we don’t make blind assumptions about anything in SQL Server.  We can’t always say that a seek is better than a scan (or vice-versa) or that a clustered index is better than a non-clustered one.  It always depends.  The first big myth we encounter is around the claim that using SELECT * to retrieve all columns from a table is bad practice (instead of naming all columns individually).  It can indeed be bad practice, as we don’t know exactly which columns we’ll be getting back – for example, columns added in future will start appearing in the query results – however, it’s often also stated that it’s bad practice because SQL Server has to look up the database metadata to figure out the column names.  The reality is that SQL Server will do this anyway, even with named columns!

Next, Aaron shows us a little tip using SQL Server Management Studio.  It’s something that many audience members already knew, but it was new to me. He showed how you can drag-and-drop the “Columns” node from the left-hand treeview into a query window and it will add a comma-separated list of all of the table’s columns to the query text!

Aaron continues by warning us about omitting explicit lengths from varchar/nvarchar data types.  Without specifying explicit lengths, varchars can very easily be truncated to a single character as this simple T-SQL shows:

DECLARE @x VARCHAR = 'testing';   -- no length specified, so this is VARCHAR(1)
SELECT [myCol] = @x;              -- returns just 't'

We’re told that we should always use the correct data types for our data!  This may seem obvious, but many times we see people storing dates as varchars (strings) simply to preserve the exact formatting that they’re using.  This is a presentation concern, though, and doing this means we lose the ability to perform correct sorting and date arithmetic on the values.  Also, avoid using a datatype such as MONEY simply because it sounds appropriate.  MONEY is a particularly bad example and should always be replaced with decimal.
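
To illustrate the sorting point, here’s a minimal sketch (the table variables and sample values are hypothetical, not from Aaron’s talk) showing how dates stored as strings sort alphabetically rather than chronologically, whereas a proper DATE column sorts as you’d expect:

-- Dates stored as strings sort alphabetically, not chronologically
DECLARE @StringDates TABLE (OrderDate VARCHAR(10));
INSERT INTO @StringDates VALUES ('02/01/2016'), ('15/06/2015'), ('30/12/2014');
SELECT OrderDate FROM @StringDates ORDER BY OrderDate;   -- '02/01/2016' sorts first!

-- A real DATE column sorts (and can be compared) correctly
DECLARE @RealDates TABLE (OrderDate DATE);
INSERT INTO @RealDates VALUES ('2016-01-02'), ('2015-06-15'), ('2014-12-30');
SELECT OrderDate FROM @RealDates ORDER BY OrderDate;     -- oldest to newest, as expected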

Aaron reminds us to always explicitly use a schema prefix when referencing tables and other SQL Server objects within our queries (i.e. use [dbo].[TableName] rather than just [TableName]).  Doing this ensures that, if two different users of our query have different default schemas, there won’t be any strange potential side-effects to our query.

We’re reminded not to abuse the ORDER BY clause.  Using ORDER BY with an ordinal column number after it can easily break if columns are added, removed or their order in the schema is altered.  Be aware of the myth that tables have a “natural order” – they don’t.  Omitting an ORDER BY clause may appear to return the data in the same order each time, however, this can easily change if additional indexes are added to the table.
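
As a minimal sketch of the point (the table and columns here are hypothetical), the ordinal form silently changes meaning if someone reorders or inserts columns in the SELECT list, whereas the named form does not:

-- Fragile: "2" means whatever happens to be the second column in the SELECT list
SELECT CustomerId, Surname, Forename FROM dbo.Customers ORDER BY 2;

-- Robust: the intent survives changes to the column list
SELECT CustomerId, Surname, Forename FROM dbo.Customers ORDER BY Surname;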

We should always use the SET NOCOUNT ON directive as this cuts down on noisy chatter in our application’s communication with SQL Server, but make sure you always test this first.  Applications built using older technologies, such as the original ADO from the Classic ASP era, can be reliant upon the additional row-count message being returned when NOCOUNT is off.
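
In practice this usually just means switching it on at the top of each stored procedure, something like the hypothetical sketch below (the procedure, table and columns are made-up names):

CREATE PROCEDURE dbo.GetOrdersForCustomer
    @CustomerId INT
AS
BEGIN
    SET NOCOUNT ON;   -- suppress the "(n row(s) affected)" messages for each statement

    SELECT OrderId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId;
END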

Next, Aaron highlights the cost of poorly written date/range queries.  He tells us that we shouldn’t use non-sargable expressions on a column – for example, if we use a WHERE clause which does something like WHERE YEAR([DateColumn]) = 2016, SQL Server will not be able to utilise any indexes that may exist on that column and will have to scan the entire table to compute the YEAR() function for the date column in question – a very expensive operation.  We’re told not to use the BETWEEN keyword as it’s imprecise – does BETWEEN include the boundary values or only everything between them?  It’s far better to explicitly use greater-than and less-than clauses for date ranges – e.g.  WHERE [OrderDate] >= '1 Feb 2016' AND [OrderDate] < '1 March 2016'. This ensures we’re not incorrectly including outlying boundary values (a datetime of exactly midnight on 1st March belongs to March, but would be swept up by a BETWEEN range ending on that date).  Regarding dates, we should also be aware of date format strings.  Formatting a date with many of the available format strings can give entirely different values under different language settings. The only two “safe” formats which work the same across all languages are YYYYMMDD and the full ISO 8601 date format, “YYYY-MM-DDTHH:MM:SS”.
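
A minimal sketch of the sargability point (table, column and data are hypothetical): the two queries below return the same rows, but only the second can seek on an index over OrderDate:

-- Non-sargable: the YEAR() function must be evaluated for every row, forcing a scan
SELECT COUNT(*)
FROM dbo.Orders
WHERE YEAR(OrderDate) = 2016;

-- Sargable: an open-ended range can use an index seek on OrderDate
SELECT COUNT(*)
FROM dbo.Orders
WHERE OrderDate >= '20160101'
  AND OrderDate <  '20170101';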

Aaron continues by reminding us to use the MERGE statement wisely.  We must remember that it effectively turns two statements into one, which can potentially mess with triggers, especially if they rely on @@ROWCOUNT.  Next up is cursors.  We shouldn’t default to using a cursor if we can help it.  Sometimes it can be difficult to think in set-based terms to avoid the cursor, but it’s worth the investment of time to see if some computation can be performed in a set-based way.  If you must use a cursor, it’s almost always best to apply the LOCAL FAST_FORWARD qualifier to the cursor definition, as the vast majority of the cursors we’ll use are “firehose” cursors (i.e. we iterate over each row of data once, from start to end, in a forward-only manner).  Remember that applying no options to the cursor definition effectively means the cursor is defined with the default options, which are rather “heavy-handed” and not always the most performant.
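
If a cursor really is unavoidable, a lightweight declaration looks something like this hypothetical sketch (table and column names are made up):

DECLARE @OrderId INT;

-- LOCAL FAST_FORWARD: a cheap, read-only, forward-only "firehose" cursor
DECLARE OrderCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT OrderId FROM dbo.Orders;

OPEN OrderCursor;
FETCH NEXT FROM OrderCursor INTO @OrderId;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- ... process @OrderId here ...
    FETCH NEXT FROM OrderCursor INTO @OrderId;
END

CLOSE OrderCursor;
DEALLOCATE OrderCursor;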

We’re reminded that we should always use sp_executesql when executing dynamic SQL rather than the EXEC() statement.  sp_executesql allows the use of strongly-typed parameters (although unfortunately not for dynamic table or column names), which reduces the chances of SQL injection.  It’s not complete protection against injection attacks, but it’s better than nothing.  We’re also warned against wrapping sub-queries in CASE or COALESCE: COALESCE expands into a CASE expression within the query plan, which means SQL Server will effectively evaluate the inner query twice.  Aaron also asks that we remember to use semi-colons to separate our SQL statements.  It protects against future edits to the query/code and ensures atomic statements continue to operate as intended.
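
A minimal sketch of the parameterised approach (the object names here are hypothetical):

-- Dynamic SQL with a strongly-typed parameter instead of string concatenation
DECLARE @sql NVARCHAR(MAX) =
    N'SELECT OrderId, OrderDate FROM dbo.Orders WHERE CustomerId = @CustomerId;';

EXEC sys.sp_executesql
    @sql,
    N'@CustomerId INT',
    @CustomerId = 42;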

Aaron says that we should not abuse the COUNT() function.  We very often write code such as:

IF (SELECT COUNT(*) FROM [SomeTable]) > 0 THEN …..

when it’s really much more efficient to write:

IF EXISTS (SELECT 1 FROM [SomeTable]) THEN….

We don’t really need the count in the first query so there’s no reason to use it.  Moreover, if you do really need a table count, it’s much better to query the sys.partitions table to get the count:

-- Do this:
SELECT SUM(rows) FROM sys.partitions WHERE index_id IN (0,1)
AND object_id = (SELECT object_id FROM sys.tables WHERE name = 'Addresses')
-- Instead of this:
SELECT COUNT(*) FROM Addresses

Aaron’s final two points are to ensure we don’t overuse the NOLOCK hint, and to be careful with transactions.  NOLOCK can feel like a magic “go-faster stripes” turbo button for your query, but it can produce inaccurate results.  That’s fine if, for example, you only need a “ballpark” row count, however, it’s almost always better to use the READ COMMITTED SNAPSHOT isolation level for your query instead.  This must be tested, though, as it can place a heavy load on tempdb.  Finally, we should remember to wrap every data-modifying query we run with a BEGIN TRANSACTION and a COMMIT/ROLLBACK.  Remember – SQL Server doesn’t have an “undo” button!  It’s perfectly fine to simply BEGIN a transaction when writing ad-hoc queries in SQL Server Management Studio, even if we don’t explicitly close it straight away; the transaction will remain open for as long as the connection does, so we can always manually perform the commit or the rollback at a slightly later point in time.
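
For ad-hoc changes in Management Studio, that habit looks something like the hypothetical sketch below (table, column and values are made up): run the change inside a transaction, check the damage, then commit or roll back by hand:

BEGIN TRANSACTION;

UPDATE dbo.Addresses
SET PostCode = 'L1 1AA'
WHERE AddressId = 12345;

-- Check the rows affected / re-query the data, then run ONE of these manually:
-- COMMIT TRANSACTION;
-- ROLLBACK TRANSACTION;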

And with that, Aaron’s session was over.  An excellent and informative start to the day.

IMG_20160507_073553After a coffee break, during which there were some leftover breakfast bacon sandwiches available for those who fancied a second breakfast, it was time to head off to the next session.  For this one, I’d chosen something a little leftfield.  This wasn’t a session based directly upon technology, but rather one based upon employment within the field of technology.  This was Alex Whittle’s Permy, Contractor Or Freelance.

Alex’s session was focused on how we might get employed within the technology sector, the various options open to us in gaining meaningful employment and the pros and cons associated with each approach.

Alex starts his talk by introducing himself and talking us through his own career history so far.  He started as an employee developer, then a team lead and then director of software before branching out on his own to become a contractor.  From there, he became a freelancer and finally started his own consultancy company.

Alex talks about an employer’s expectations for the various types of working relationship.  For permanent employees, the focus is very much on your overall personality, attitude and ability to learn.  Employers are making a long term bet with a permanent employee.  For contractors, it’s your existing experience in a given technology or specific industry that will appeal most to the client.  They’re looking for someone who can deliver without needing any training “on-the-job”, although you’ll get time to “figure it out” whilst you’re there.  You’ll also mostly have “tech-level” conversations with your client, so you largely avoid the politics that can come with a permanent role.  Finally, as a freelancer, you’ll be engaged because of your technical expertise and your ability to deliver quickly.  You’re expected to be a business expert too, and your engagement will revolve around “senior management/CxO” level conversations with the client.

Alex moves on to discuss the various ways of marketing yourself for each type of working relationship.  For permanent employees it’s recruitment agencies, LinkedIn and keeping your CV up to date.  Your main selling point is your stability, so your CV needs to show a list of jobs with good lengths of tenure for each one.  One or two shorter tenures are acceptable, but you’ll need to be able to explain them well to a prospective employer.  For contractors, it’s much the same avenues for marketing – recruitment agencies, LinkedIn and a good CV – but here the focus is quite different.  Firstly, a contractor’s CV can be much longer than a permanent employee’s CV, which is usually limited to 3 pages.  A contractor’s CV can be up to 4-6 pages long and should highlight relevant technical and industry experience as well as show contract extensions and renewals (although older roles should be in summary only).  For freelancers, it’s not really about your CV at all.  Clients are not interested in you per se, they’re interested in your company.  This is where company reputation and your ability to really sell the company itself have the biggest impact.  For all working relationships, one of the biggest factors is networking.  Networking will lead to contacts, which will lead to roles.  Never underestimate the power of simply speaking to people!

We then move on to talk about cash flow in the various types of working relationship.  Alex states how, for permanent employees, there’s long term stability, holiday and sickness pay and also a pension.  It’s the “safest” and lowest stress option.  For contractors, cash flow has medium term stability.  There’s no holiday or sickness pay and you’d need to pay for your own pension.  You need to build a good cash buffer of at least 6 months’ living expenses, but you can probably get started on the contracting road with only 1 or 2 months of cash buffer.  Finally, the freelance option is the least “safe” and has the least stability of cash flow.  It’s often very “spiky” and can consist of short periods of good income interspersed with longer periods of little or no income.  For this reason, it’s essential to build a cash buffer of at least 12 months’ living expenses, although the quieter income periods can be mitigated by taking on short term contracts.

Alex shares details of the time when he quit his permanent job to go contracting.  He says he sent around 20-30 CVs to various contract jobs per week for the first 3 weeks but didn’t get a single interview.  A helpful recruiter eventually told him that it was probably largely to do with the layout of his CV.  This recruiter spent 30 minutes with him on the phone, helping him to reformat his CV, after which he sent out another 10 CVs to various contract roles and got almost 10 interviews as a result!

We continue by looking at the differences in accounting requirements between the various working types.  As a permanent employee, there’s nothing to worry about at all here; it’s all sorted for you as a PAYE employee.  As a contractor, you’ll send out invoices, usually once a month, but since you’ll rarely have more than one client at a time, the invoicing requirements are fairly simple.  You will need to do real-time PAYE returns as you’ll be both a director and employee of your Ltd. company, and you’ll need to perform year-end tax returns and quarterly VAT returns, however, you can use the flat-rate VAT scheme if it’s applicable to you.  This can boost your income as you charge your clients VAT at 20% but only have to pay 14.5% to HMRC!  As a freelancer, you’ll also be sending out invoices, however, you may have more than one client at a time so you may have multiple invoices per month, thereby requiring better management of them (software such as Xero or QuickBooks can help here).  One useful tip that Alex shares at this point is that, as a freelancer, it can be very beneficial to join the Federation of Small Businesses (FSB) as they can help to pay for things like tax investigations, should you ever receive one.

Alex then talks about how you can, as an independent contractor, either operate as a sole trader, work through an umbrella company, or run your own limited company.  A limited company is usually the best route to go down, as limited companies are entirely separate legal entities, so you’re more protected personally (although not from things like malpractice); however, the previous tax efficiency of company dividends that used to be enjoyed by Ltds no longer applies, as that loophole in the law has been closed.  As a sole trader, you are the company – the same legal entity – so you can’t be VAT registered and you’re not personally protected from liability.  When working through an umbrella company, you become a permanent employee of the umbrella company. They invoice on your behalf and pay your PAYE.  This affords you the same protection as any other employee and takes away some of the management of invoicing etc., however, this is probably the least cost efficient way of working since the umbrella company will take a cut of your earnings.

We then move on to the thorny issue of IR35. This is legislation designed to catch contractors who are really operating more as “disguised employees”.  IR35 is constantly evolving and its application by HMRC can be inconsistent.  The best ways to mitigate being “caught” inside IR35 are to perform tasks that an employee does not do.  For example, advertising your business differentiates you from an employee; ensuring your contracts have a “right of substitution” (whereby the actual person performing the work can be changed) helps; having multiple contracts at any one time – whilst sometimes difficult for a contractor to achieve – can greatly help; as can showing that you are taking on risk (especially financial risk), along with being able to show that you don’t receive any benefits from the engagement that an employee would (for example, sick pay).

Finally, Alex asks, “When should you change?”  He puts a number of questions forward that we’d each need to answer for ourselves.  Are you happy with your current way of working?  Understand the relative balance of income versus stress from the various working practices.  Define your goals regarding work/life balance.  Ask yourself why you would want to change, how do you stand to benefit?  Where do you live?  Be aware that very often, contracts may not be readily available in your area, and that you may need to travel considerable distance (or even stay away from home during the working week), and finally, Alex asks that you ask yourself, “Are you good enough?”.  Alex closes by re-stating the key takeaways.  Enjoy your job, figure out your goals, increase your profile, network, remember that change can be good – but only for the right reasons, and start now – don’t wait.

IMG_20160507_105119

After another coffee break following Alex’s session, it’s time for the next one.  This one was Lori Edwards’ SQL Server Statistics – What are the chances?

Lori opens by asking “What are statistics?”.  Just as Indexes provide a “path” to find some data, usually based upon a single column, statistics contain information relating to the distribution of data within a column across the entire set of rows within the table.  Statistics are always created when you create an index, but you can create statistics without needing an index.

Statistics can help with predicates in your SQL Server queries.  Predicates are the conditions within your WHERE or ORDER BY clauses.  Statistics contain information about density, which refers to the number of unique values in the column, along with cardinality, which refers to the uniqueness of a given value.  There are a number of different ways to create statistics: you can simply add an index, you can use the AUTO CREATE STATISTICS and CREATE STATISTICS directives, or you can use the sp_createstats system stored procedure.  If you’re querying on a column, statistics for that column will be automatically created for you if they don’t already exist, however, if you anticipate heavy querying utilising a given column, it’s best to ensure that statistics are created ahead of time.
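
A minimal sketch of creating statistics ahead of time (the statistics, table and column names are hypothetical):

-- Explicitly create single-column statistics ahead of heavy querying
CREATE STATISTICS st_Orders_OrderDate
ON dbo.Orders (OrderDate);

-- Or rely on the database creating column statistics automatically as queries need them
ALTER DATABASE CURRENT SET AUTO_CREATE_STATISTICS ON;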

Statistics are quite small and don’t take up as much space as indexes.  You can view statistics by running the sp_helpstats system stored procedure, or by querying the sys.stats system view or the sys.dm_db_stats_properties dynamic management function.  The best way of examining statistics, though, is to use the database console command DBCC SHOW_STATISTICS.  When viewing statistics, low density values indicate a high level of uniqueness.  The statistics histogram shows a lot of data: RANGE_HI_KEY is the upper bound key value of each histogram step, whilst RANGE_ROWS indicates how many rows fall between that step’s upper bound and the previous one.
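
For reference, examining the statistics behind an index looks something like this (the object and index names are hypothetical):

-- Show the header, density vector and histogram for one index's statistics
DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_OrderDate');

-- Or just the histogram, which shows how values are distributed across the column
DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_OrderDate') WITH HISTOGRAM;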

The SQL Server Query Optimizer uses statistics heavily to generate the optimized query plan.  Note, though, that optimized query plans aren’t necessarily optimal for every situation; they’re the most optimal general purpose plans.  The optimizer’s purpose is to come up with a good plan, fast, and statistics are necessary for this to happen.  To make the most of the cardinality estimates derived from statistics, it’s best to ensure you use parameters in your queries and stored procedures, use temp tables where necessary and keep column orders consistent.  Table variables and table-valued parameters can negatively affect cardinality estimates.  Whether the query optimizer selects a serial or parallel plan can be affected by cardinality, as can the choice to use an index seek versus an index scan.  Join algorithms (i.e. hash match, nested loops etc.) can also be affected.

From here, the query optimizer will decide how much memory it thinks it needs for a given plan, so memory grants are important.  Memory grants are effectively the cost of the operation multiplied by the number of rows that the operation is performed against, therefore, it’s important for the query optimizer to have accurate row count data from the statistics. 

2016-06-07 21_42_25-I5xLz.png (766×432)One handy tip that Lori shares is in interpreting some of the data from the “yellow pop-up” box when hovering over certain parts of a query plan in SQL Server Management Studio.  She states how the “Estimated Number Of Rows” is what the table’s statistics say there are, whilst the “Actual Number Of Rows” are what the query plan actually encountered within the table.  If there’s a big discrepancy between these values, you’ll probably need to update the statistics for the table!

Statistics are automatically updated by SQL Server, although they’re only updated after a certain amount of data has been added or updated within the table.  You can manually update statistics yourself by calling the sp_updatestats system stored procedure.

By default, tables inside a database will have AUTO UPDATE STATISTICS switched on, which is what causes the statistics to be updated automatically by SQL Server occasionally – usually after around 500 rows or 20% of the size of the table have been added/modified.  It’s usually best to leave this turned on, however, if you’re dealing with a table that contains a very large number of rows and has either many new rows added or many rows modified, it may be better to turn off the automatic updating of statistics and perform the updates manually after either a specific number of modifications or at certain appropriate times.
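
Where you do take manual control, the refresh itself is straightforward; a hypothetical sketch (table name made up):

-- Refresh the statistics on a single table (FULLSCAN reads all the data rather than sampling it)
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- Or refresh statistics across the whole database
EXEC sys.sp_updatestats;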

Finally, it’s important to remember that whenever statistics are updated or recomputed, any execution plans built on those statistics that were previously cached will be invalidated.  They’ll need to be recompiled and re-cached.

After Lori’s session, there’s another quick coffee break, and then it’s on to the next session.  This one was Mark Broadbent’s Lock, Block & Two Smoking Barrels.  Mark’s session focused on SQL Server locks.  Different types of locks, how they’re acquired and how to best design our queries and applications to ensure we don’t lock data for any longer than we need to.

Mark first talks about SQL Server’s transactions.  He explains that transactions are not committed to the transaction log immediately.  They are processed through in-memory buffers first before being flushed to disk.  Moreover, the log buffer needs to fill to a certain size before it gets flushed to disk, so there’s always a possibility of executing a COMMIT TRANSACTION statement yet the transaction not being visible within the transaction log until some time later.  The transaction being available in the transaction log is the D in ACID – Durability – but Mark highlights that it’s really delayed durability.

IMG_20160507_125814Next, Mark talks about concurrency versus correctness. He reminds us of some of the laws of concurrency control.  The first is that concurrent execution should not cause application programs to malfunction. The second is that concurrent execution should not have lower throughput or higher response times than serial execution.  To balance concurrency and correctness, SQL Server uses isolation, and there are numerous isolation levels available to us, all of which offer differing trade-offs between concurrency and correctness.

Mark continues by stating that SQL Server attempts to perform our queries in as serial a manner as possible, and it uses a technique called transaction interleaving in order to achieve this between multiple concurrent and independent transactions.  Isolation levels attempt to solve the interleaving dependency problems.  They can’t completely cure them, but they can reduce the issues caused by interleaving dependencies.  Isolation levels can be set at the statement, transaction or session levels.  There are 4 types defined by the ANSI standards, but SQL Server 2005 (and above) offer a fifth level.  It’s important to remember that not all isolation levels can be used everywhere, for example, the FILESTREAM data type is limited in the isolation levels that it supports.
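
Setting an isolation level is a one-liner; here’s a hypothetical sketch (the basket table is just a stand-in name, and SNAPSHOT only works once the database option mentioned in the comment has been enabled):

-- Choose the isolation level for the current session before starting work
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;   -- requires ALLOW_SNAPSHOT_ISOLATION on the database

BEGIN TRANSACTION;
    SELECT quantity FROM dbo.basket;        -- reads a transactionally consistent snapshot
COMMIT TRANSACTION;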

We’re told that SQL Server’s locks are two-phased – they are considered so if every LOCK is succeeded by an UNLOCK.  SQL Server has different levels of locks, and they can exist at various levels of granularity, from row locks to page locks all the way up to table locks.  When SQL Server has to examine existing locks in order to acquire a new or additional lock, it will only ever compare locks on the same resource.  This means that row locks are only ever compared to other row locks, page locks to other page locks and table locks to other table locks; they’re all separate.  That said, SQL Server will automatically perform lock escalation when certain conditions occur, so when a statement has acquired more than around 5,000 locks of row or page type, SQL Server will escalate those locks to a single table level lock.  Table locks are the least granular kind of lock and are very bad for performance and concurrency within SQL Server – essentially, the one query that holds the table level lock prevents any other query from accessing that table.  For this reason, it’s important to ensure that our queries are written in such a way as to minimize the locks that they need, and to ensure that when they do require locks, those locks are as granular as they can be.  Update locks co-ordinate multiple writers against the same table and/or rows: they’re compatible with shared locks but not with other update locks or exclusive locks, so it’s worth bearing in mind how many concurrent writes we attempt to make to our data.

Mark continues to show us some sample query code that demonstrates how some simple looking queries can cause concurrency problems and can result in lost updates to our data.  For example, Mark shows us the following query:

BEGIN TRAN T1
DECLARE @newquantity INT
SELECT @newquantity = quantity FROM basket
SET @newquantity = @newquantity + 1
UPDATE some_other_table SET quantity = @newquantity
COMMIT TRAN T1

The above query can fail badly, with the required update being lost, if multiple running transactions perform this query at the same time.  This is due to transaction interleaving: two SELECTs can happen simultaneously and acquire the same quantity value, but the two UPDATEs get performed in interleaved transactions, which means that the second UPDATE to run is using stale data, effectively “overwriting” the first UPDATE (so the final quantity value is one less than it should be).  The solution to this problem is to perform the quantity increment in-line within the UPDATE statement itself:

BEGIN TRAN T1
UPDATE some_other_table SET quantity = t2.newquantity FROM (SELECT quantity + 1 AS newquantity FROM basket) t2
COMMIT TRAN T1

Reducing the number of statements needed to perform some given function on our data is always the best approach.  It means our queries are as granular as they can be, providing better atomic isolation and thereby reducing the need for transaction interleaving.

IMG_20160507_133930After Mark’s session was over, it was time for lunch.  Lunch at the SQLBits conferences in previous years has always been excellent with a number of choices of hot, cooked food being available and this year was no different.  There was a choice of 3 meals, cottage pie with potato wedges, Moroccan chicken with couscous or a vegetarian option (I’m not quite sure what that was, unfortunately), each of which could be finished off with one of a wide selection of cakes and desserts!

IMG_20160507_123500I elected to go for the Moroccan chicken, which was delicious, and plumped for a very nice raspberry ripple creamy yoghurt.  An excellent lunch, as ever!

During lunch, I managed to catch up with a few old friends and colleagues who had also attended the conference, as well as talking to a few newly made acquaintances whilst wandering around the conference floor and the various sponsors stands.

After a good wander around, during which I was able to acquire ever more swag from the sponsors, it was soon time for the afternoon’s sessions.  There were only two more sessions left in the day and, it now being around 14:30 after the late lunch hour was over, I headed off to find the correct “dome” for the first of them, Erland Sommarskog’s Dynamic Search Conditions.

Erland’s talk will highlight the best approaches when dealing with dynamic WHERE and ORDER BY clauses in SQL Server queries, something that I’m sure most developers have had to deal with at some time or another.  For this talk, Erland will use his own Northgale database, which is the same schema as Microsoft’s old Northwind database, but with a huge amount of additional data added to it!

Erland first starts off by warning us about filtered indexes – indexes that themselves have a WHERE condition attached to them (i.e. WHERE value <> [somevalue]) – as these tend not to play very well with dynamic queries.  Erland continues by talking about how SQL Server deals with parameters to queries.  It performs parameter “sniffing” to determine how best to optimize a static query by closely examining the actual parameter values we’re supplying.  Erland shows us both a bad and a good pattern for optional search conditions:  WHERE xxx = ISNULL(@xxx, xxx) versus WHERE (xxx = @xxx OR @xxx IS NULL).  He explains how the ISNULL version gives the wrong results when the column itself contains NULLs, since NULL is never equal to NULL.  We’re told that the SQL Server query optimizer doesn’t look at the body of the stored procedure itself, so it has no way of knowing if any parameters we pass in are altered or modified by the stored procedure’s code.  For this reason, SQL Server must generate a query plan that is optimized for any and all possible values, which is likely to be somewhat suboptimal for most of the parameter combinations we actually supply.  It’s for this reason that the execution plan can show things like an index scan against an index on a “FromDate” datetime column even if that parameter is not being passed to the stored procedure.  When we’re supplying only a subset of parameters for a stored procedure with many optional parameters, it’s often best to use the OPTION (RECOMPILE) hint to force a recompilation of the query every time it’s called.  This way, the execution plan is regenerated based upon the exact parameters in use for that call.  It’s important to note, however, that recompiling queries is an expensive operation, so it’s best to measure exactly how often you’ll need to perform such queries.  If you’re calling this query very frequently, you may well get the best performance from using purely dynamic SQL.
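
A minimal sketch of that optional-parameter pattern combined with OPTION (RECOMPILE) – the procedure, table and column names here are hypothetical, not from Erland’s demo:

CREATE PROCEDURE dbo.SearchOrders
    @CustomerId INT  = NULL,
    @FromDate   DATE = NULL,
    @ToDate     DATE = NULL
AS
SELECT OrderId, CustomerId, OrderDate
FROM dbo.Orders
WHERE (CustomerId = @CustomerId OR @CustomerId IS NULL)
  AND (OrderDate >= @FromDate   OR @FromDate   IS NULL)
  AND (OrderDate <  @ToDate     OR @ToDate     IS NULL)
OPTION (RECOMPILE);   -- the plan is rebuilt for the parameters actually supplied on each call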

Erland then moves on to discuss dynamically ordering data.  He states that a CASE expression inside the ORDER BY clause is the best way to achieve this, for example: ORDER BY CASE @sortcolumn WHEN ‘OrderID’ THEN [Id] END, CASE @sortcolumn WHEN ‘OrderDate’ THEN [Date] END ... etc.  This is a great way to achieve sorting by dynamic columns, however, there’s a gotcha with this method: you have to be very careful of datatype differences between the columns in the different CASE expressions, as mixing them in a single CASE can often lead to errors.
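
Keeping each sort column in its own CASE expression, as in the hypothetical sketch below (table and columns are made up), is one way to side-step those datatype clashes:

DECLARE @sortcolumn NVARCHAR(30) = N'OrderDate';

SELECT OrderId, OrderDate, CustomerName
FROM dbo.Orders
ORDER BY
    CASE @sortcolumn WHEN N'OrderID'      THEN OrderId      END,   -- each CASE keeps its own datatype
    CASE @sortcolumn WHEN N'OrderDate'    THEN OrderDate    END,
    CASE @sortcolumn WHEN N'CustomerName' THEN CustomerName END;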

Next, we look at the permissions required to use such dynamic SQL.  Erland says it’s important to remember that any user who wishes to run a dynamic query will require permissions on the underlying table(s) upon which the dynamic query is based.  This differs from (say) a stored procedure, where the user only needs permission to execute the stored procedure and not necessarily on the underlying tables it uses.  One trick that can be used to get somewhat the best of both of these approaches is to use the sp_executesql system stored procedure.  Using this will create a nameless stored procedure from your query, cache it and execute it.  The cached plan can then be re-used on subsequent calls to the query, with the nameless stored procedure being identified by a hash of the query text itself.

Another good point that Erland makes is that all SQL Server objects (tables, functions etc.) referenced within a dynamic query should always be prefixed with the full schema name and not just referenced by the object name (i.e. use [dbo].[SomeTable] rather than [SomeTable]).  This is important because different users who run your dynamic SQL code could have different default schemas – if they do and you haven’t specified the schema explicitly, the query could fail.

Erland also mentions that one very handy tip with dynamic queries is to always include a @debug input parameter of datatype bit, with a default value of 0 (off).  Passing in a value of 1 (on) ensures that code such as IF @debug PRINT @sql will run, allowing you to output the actual T-SQL query generated by the dynamic code.  Erland says that you will need this eventually, so it’s always best to build it in from the start.
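
In its simplest form, that convention looks like this small, self-contained sketch (the statement being built is just a stand-in):

DECLARE @debug BIT = 1;
DECLARE @sql NVARCHAR(MAX) = N'SELECT 1 AS Demo;';

IF @debug = 1
    PRINT @sql;              -- show the generated T-SQL so it can be inspected and run by hand

EXEC sys.sp_executesql @sql;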

When building up your dynamic WHERE clause, one tricky detail is knowing whether to prepend an AND to a condition: the first condition of the WHERE clause won’t need it, but the second and subsequent ones will.  One simple way around this is to make all of the dynamically added conditions the 2nd or higher numbered condition by statically creating the first WHERE clause condition in your code as something benign such as “WHERE 1 = 1”.  This, of course, matches all records, and every subsequently added condition can then always be prefixed with an AND, for example, IF @CustomerPostCode IS NOT NULL THEN @sql += “ AND PostCode LIKE …..”.  It’s also important to always pass parameters into the dynamic SQL rather than concatenating values (i.e. avoid doing @sql += ‘ AND OrderId = ‘ + @OrderId), as concatenation will mess with the query optimizer and your generated queries will be less efficient overall as a result.  Moreover, raw concatenation of values is a vector for SQL injection attacks.  For the same reason, you should always translate the values passed into your stored procedure for use in WHERE and ORDER BY clauses: map the passed parameter value to a specific hard-coded value that you explicitly control, rather than using the passed-in value directly.
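
Putting those pieces together, a hypothetical sketch of the “WHERE 1 = 1” builder with properly parameterised values (table, columns and sample values are all made up) might look like this:

DECLARE @CustomerPostCode NVARCHAR(10) = N'L1%',
        @OrderId          INT          = NULL;

DECLARE @sql NVARCHAR(MAX) =
    N'SELECT OrderId, OrderDate FROM dbo.Orders WHERE 1 = 1';

-- Every optional condition can now be prefixed with AND unconditionally
IF @CustomerPostCode IS NOT NULL
    SET @sql += N' AND PostCode LIKE @CustomerPostCode';
IF @OrderId IS NOT NULL
    SET @sql += N' AND OrderId = @OrderId';

-- Values travel as typed parameters, never as concatenated strings
EXEC sys.sp_executesql
    @sql,
    N'@CustomerPostCode NVARCHAR(10), @OrderId INT',
    @CustomerPostCode = @CustomerPostCode,
    @OrderId          = @OrderId;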

Occasionally, it can be a useful optimization to inline some WHERE clause values in order to force a separate query plan to be cached.  This is useful in the scenario where, for example, you're querying by order city and 60% of all orders are in the same city: you can inline that one city value to get a cached plan just for that city and a different single cached plan for all other cities.

Finally, for complex grouping, aggregation and the dynamic selection of the columns returned from the query, Erland says it is often easier, and more robust, to construct these kinds of queries in the client application code rather than in a dynamic-SQL-producing stored procedure.  One caveat is to ensure that you construct the entirety of your query client-side (or entirely server-side if you must) – don’t try to mix and match by performing some of it client-side and some server-side.

IMG_20160507_154920And with this, Erland’s session on dynamic SQL search conditions is complete.  After yet another short coffee break, we’re ready for the final session of what has been a long, but information-packed day.  And for the final session, I decided to attend Simon D’Morias’ “What is DevOps For Databases?”

Simon starts with explaining the term “DevOps” and reminds us that it’s the blurring of lines between the two traditionally separate disciplines of development and operations.  DevOps means that developers are far closer to the “operations” side of applications which frequently means getting involved with deployments, infrastructure and a continuous delivery process.  DevOps is about automation of application integration and deployment, provably and reliably.

Simon shows the three pillars upon which a successful DevOps process is built: Develop, Deploy & Measure.  We develop some software, deploy it and then measure the performance and reliability of the deployment.   From this measurement we can plan better, and thus feed the results back into the next iteration of the cycle.  We’re told that to make these iterations work successfully, we need to keep changes small. With small changes, rather than larger ones, we can keep deployment simple and fast.  It allows us to gather frequent feedback on the process and allows continuous improvement of the deployment process itself.  With the teams behind the software (development, operations etc.) being more integrated, there’s a greater spread of knowledge about the software itself and the changes made to it in a given development/deployment cycle, which improves early feedback.  Automation also ensures that the deployment is made easier and thus contributes to better and earlier feedback.

When it comes to databases, DBAs and other database professionals are frequently nervous about automating any changes to production databases, however, by keeping changes small and to a minimum within a given deployment cycle, and by having a continuously improving, robust process for performing that deployment, we can ensure that each change is less risky than a single large change or upgrade to the system.  Continuous deployment also allows for detecting failures fast, which is a good thing to have.  We don’t want failures caused by changes to take a long time to surface before we’re made aware of them.  Failing fast allows easy rollback, and the reliability of the process enables automation, which further reduces risk.  Naturally, monitoring plays a large part in this, and a comprehensive monitoring infrastructure allows detection of issues and failures and allows reliability to improve over time which, again, further reduces risk.

Simon moves on to discuss the things that can break DevOps.  Unreliability is one major factor that can break a DevOps process, as even something running at 95% reliability is no good – that 5% failure rate will kill you.  Requiring approval within the deployment chain (i.e. some manual approval, governance or compliance process) will break continuity and is a potential bottleneck for a successful DevOps deployment iteration too.  A greater “distance” between the development, operations and other teams will impact their ability to be knowledgeable about the changes being made and deployed.  This will negatively impact the teams’ ability to troubleshoot any issues in the process, hindering the improvement of reliability.

IMG_20160507_164439It can often be difficult to know where to start with moving to an automated and continuous DevOps process.  The first step is to ensure we can “build ourselves a pipeline to live” – a complete end-to-end automated continuous integration and delivery process.  There are numerous tools available to help with this, and SQL Server itself assists in this regard with the ability to package SQL Server objects into a DACPAC package.  Simon insists that attempting to proceed with only a partial implementation of this process will not work – it’s an all or nothing endeavour. Automating deployments to development and test environments, but not to the production environment (as some more nervous people may be inclined to do), is like building only half a bridge across a chasm.  Half a bridge is no bridge at all!

Simon concludes by showing us a quick demo of a simple continuous deployment process using Visual Studio to make some local database changes, which are committed to version control using Git and then pushed to Visual Studio Team Services (previously known as Visual Studio Online) which performs the “build” of the database objects and packages this into a DACPAC package.  This package is then automatically pushed to an Azure DB for deployment.

Finally, Simon suggests that one of the best ways to ensure that our continuous deployment process is consistent and reliable is to ensure that there are minimal differences (ideally, no differences) between our various environments (development, test, staging, production etc.), and especially between our staging and production environments.

After Simon’s session was over, it was time for all of the conference attendees to gather in the main part of the exhibition hall and listen as one of the conference organisers read out the names of those who had won prizes by filling in forms and entering competitions run by each of the conference sponsors.  I didn’t win a prize and, in truth, had entered very few competitions, having been far too busy either attending the many sessions or drinking copious amounts of coffee in between them!  Once the prizes were all dished out, it was sadly time for yet another fantastic SQLBits conference to end.  It had been a long but brilliant day at another superbly organised and run event.  Here’s hoping next year’s conference is even better!