DDD 12 In Review

On Saturday 10th June 2017 in sunny Reading, the 12th DeveloperDeveloperDeveloper event was held at Microsoft’s UK headquarters.  The DDD events started at the Microsoft HQ back in 2005 and, after some years away and a successful revival in 2016, this year’s DDD event was another great occasion.

I’d travelled down to Reading the evening before, staying over in a B&B in order to be able to get to the Thames Valley Park venue bright and early on the Saturday morning.  I arrived at the venue, parked my car in the ample parking spaces available on Microsoft’s campus and headed into Building 3 for the conference.  I collected my badge at reception and was guided through to the main lounge area, where an excellent breakfast selection awaited the attendees.  Having already had a lovely big breakfast at the B&B, I simply decided on another cup of coffee, but the spread was very well appreciated by the other attendees of the event.

I found a corner in which to drink my coffee and double-check the agenda sheets that we’d been provided with as we entered the building.  There’d been no changes since I’d printed out my own copy of the agenda a few nights earlier, so I was happy with the selection of talks I’d chosen to attend.  It’s always tricky at the DDD events, as there are usually at least 3 parallel tracks of talks and invariably there will be at least one or two timeslots where multiple talks that you’d really like to attend overlap.

Occasionally, these talks will resurface at another DDD event (some of the talks and speakers at this DDD in Reading had given the same or similar talks at the recently held DDD South West in Bristol back in May 2017) and so, if you’re really good at scheduling, you can often catch a talk at a subsequent event that you’d missed at an earlier one!

As I finished off my coffee, more attendees turned up and the lounge area was now quite full.  It wasn’t too much longer before we were directed by the venue staff to make our way to the relevant conference rooms for our first session of the day.

Wanting to try something a little different, I’d picked Andy Pike’s Introducing Elixir: Self-Healing Applications at ZOMG scale as my first session.

Andy started his session by talking about where Elixir, the language, came from.  It’s a new language built on top of the Erlang virtual machine (the BEAM VM) and so has its roots in Erlang.  Like Erlang, Elixir compiles down to the same bytecode that the VM ultimately runs.  Erlang was originally created at Ericsson in order to run their telecommunications systems.  As such, Erlang is built on top of a collection of middleware and libraries known as OTP, or Open Telecom Platform.  Due to its very nature, Erlang, and thus Elixir, has very strong concurrency, fault tolerance and distributed computing at its very core.  Although Elixir is a “new” language in terms of its adoption, it’s actually been around for approximately 5 years now and is currently on version 1.4, with version 1.5 not too far off in the future.

Andy tells of how Erlang is used by WhatsApp to run its global messaging system, and he provides some numbers around this: WhatsApp has 1.2 billion users, processes around 42 billion messages per day and can handle around 2 million connections on each server!  Those are some impressive performance and concurrency figures, and Andy is certainly right when he states that very few other platforms and languages, if any, can boast such statistics.

Elixir is a functional language and its syntax is heavily inspired by Ruby.  The Elixir language was first designed by José Valim, who was a core contributor in the Ruby community.  Andy talks about the Elixir REPL that ships with the Elixir installation, which is called “iex”.  He shows us some slides of simple REPL commands showing that Elixir supports all the basic intrinsic types that you’d expect of any modern language: integers, strings, tuples and maps.  At this point, things look very similar to most other high-level functional (and perhaps some not-quite-so-functional) languages, such as F# or even Python.

Andy then shows us something that appears to be an assignment operator, but is actually a match operator:

a = 1

Andy explains that this is not assigning 1 to the variable ‘a’, but is “matching” ‘a’ against the constant value 1.  If ‘a’ is unbound, Elixir will first bind the right-hand side value to the variable and then perform the match.  An alternative pattern is:

^a = 1

which performs the match without the initial binding.  Andy goes on to show how this pattern matching can be used to bind variables to values within a tuple.  For example, given the code:

success = { :ok, 42 }
{ :ok, result } = success

the second line will match the tuple bound to ‘success’ against the pattern on the left-hand side: the :ok atoms match each other, and the value 42 is bound to the variable ‘result’.  We’re told how the colon in front of ok makes it an “atom”.  This is similar to a constant, where the atom’s name is its own value.

Andy shows how Elixir’s code is grouped into functions, and functions can be contained within modules.  This is influenced by Ruby and how it also groups its code.  We then move on to look at lists.  These are handled in a very similar way to most other functional languages, in that a list is seen as merely a “head” and a “tail”.  The “head” is the first value in the list and the “tail” is the entire rest of the list.  When processing the items in a list in Elixir, you process the head and then, perhaps recursively, call the same function passing the list’s “tail”.  This allows a gradual shortening of the list as the “head” is effectively removed with each pass.  In order for such recursive processing to be performant, Elixir includes tail-call optimisation, which allows the compiler to avoid growing the call stack with each successive call to the function.  This is possible when the last line of code in the function is the recursive call.
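The head/tail recursion pattern can be sketched in Python for anyone unfamiliar with it (note that CPython, unlike the BEAM, performs no tail-call optimisation, so this sketch would hit the recursion limit on very long lists and is illustrative only):

```python
# Head/tail list processing, sketched in Python.
def sum_list(items, acc=0):
    if not items:                      # empty list: all heads consumed, return result
        return acc
    head, *tail = items                # split into head and tail
    return sum_list(tail, acc + head)  # recurse on the shortened tail

print(sum_list([1, 2, 3, 4]))  # 10
```

Because the recursive call is the last thing the function does, a tail-call-optimising runtime like the BEAM can reuse the same stack frame for every pass.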

Elixir also has guard clauses built right into the language.  Code such as:

def what_is(x) when is_number(x) and x > 0 do…

helps to ensure that code is more robust by only allowing the function to be invoked when ‘x’ is not only a number but also greater than zero.  Andy states that, between the use of such guard clauses and pattern matching, you can probably eliminate around 90-95% of all conditionals within your code (i.e. if x then y).

Elixir is very permissive in the characters it allows in function names, so functions can (and often do) have things like question marks in their names.  It’s a convention of the language that functions returning a Boolean value should end in a question mark, something shared with Ruby, i.e. String.contains? "elixir of life", "of".  And of course Elixir, like most other functional languages, has a pipe operator (|>) which allows the piping of the result of one function call into the input of another, so instead of writing:

text = "THIS IS MY STRING"
text = String.downcase(text)
text = String.reverse(text)
IO.puts text

which forces us to continually repeat the “text” variable, we can instead write the same code like this:

"THIS IS MY STRING"
|> String.downcase
|> String.reverse
|> IO.puts
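The same left-to-right threading can be sketched in Python with a small helper (a hypothetical `pipe` function of my own, not part of any standard library):

```python
from functools import reduce

# A tiny pipe helper: threads a value through a series of functions,
# left to right, like Elixir's |> operator.
def pipe(value, *funcs):
    return reduce(lambda acc, f: f(acc), funcs, value)

result = pipe("THIS IS MY STRING", str.lower, lambda s: s[::-1])
print(result)  # gnirts ym si siht
```

Each function receives the previous function’s output, so the intermediate variable disappears, just as it does with Elixir’s pipe.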

Andy then moves on to show us an element of the Elixir language that I found particularly intriguing: doctests.  Elixir makes function documentation a first-class citizen within the language, and not only does the doctest code provide documentation for the function (Elixir has a helper, h, that when passed a function or module name will display the help for it) but it also serves as a unit test for the function, too!  Here’s a sample of an Elixir function containing some doctest code:

defmodule MyString do
   @doc ~S"""
   Converts a string to uppercase.

   ## Examples
       iex> MyString.upcase("andy")
       "ANDY"
   """
   def upcase(string) do
     String.upcase(string)
   end
end

The doctest code not only provides the textual help text that is shown if the user invokes the help function for the module (i.e. “h MyString”), but the examples contained within the help text can be executed as part of a doctest unit test for the MyString module:

defmodule MyStringTest do
   use ExUnit.Case, async: true
   doctest MyString
end

The above code uses the doctest inside the MyString module to invoke each of the provided “example” calls and assert that the output is the same as that defined within the doctest!

After taking a look at the various language features, Andy moves on to talk about the real power of Elixir, which it inherits from its Erlang heritage: processes.  It’s processes that provide Erlang, and thus Elixir, with the ability to scale massively, along with its fault tolerance and highly distributed features.

When Elixir functions are invoked, they can effectively be “wrapped” within a process.  This involves spawning a process that contains the function.  Processes are not the same as operating system processes, but are much more lightweight, being little more than a C struct that contains a pointer to the function to call, some memory and a mailbox (which will hold messages sent to the function).  Processes have a process ID (PID) and will, once spawned, continue to run until the function contained within terminates or some error or exception occurs.  Processes can communicate with other processes by passing messages to them.  Here’s an example of a very simple module containing a single function, and how that function can be called by spawning a separate process:

defmodule HelloWorld do
  def greet do
    IO.puts "Hello World"
  end
end

HelloWorld.greet                     # This is a normal function call.
pid = spawn(HelloWorld, :greet, [])  # This spawns a process containing the function

Messages are sent to processes by invoking the “send” function, providing the PID and the message to deliver to the process’s mailbox:

send pid, { :greet, "Andy" }

This means that invoking functions in processes is almost as simple as invoking a local function.

Elixir uses the concept of schedulers to actually execute processes.  The BEAM VM supplies one scheduler per available CPU core, giving the ability to run highly concurrently.  Elixir also uses supervisors as part of the BEAM VM, which can monitor processes (or even monitor other supervisors) and can kill processes if they misbehave in unexpected ways.  Supervisors can be configured with a “strategy”, which allows them to deal with errant processes in specific ways.  One common strategy is one_for_one, which means that if a given process dies, a single new one is started in its place.
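The one_for_one idea can be sketched outside the BEAM too; here’s a minimal, single-threaded Python sketch of a supervisor restarting a crashing worker (a toy illustration of the strategy, not how OTP supervisors are actually implemented):

```python
# A minimal sketch of a one_for_one restart strategy: if the supervised
# function dies, start a single new run in its place, up to max_restarts.
def supervise(worker, max_restarts=3):
    restarts = 0
    while True:
        try:
            return worker()
        except Exception as exc:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up: too many failures in a row
            print(f"worker died ({exc}); restarting ({restarts}/{max_restarts})")

attempts = {"count": 0}

def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("boom")
    return "ok"

result = supervise(flaky)
print(result)  # ok (after two restarts)
```

Real OTP supervisors add much more, such as restart intensity windows and supervision trees, but the core contract is the same: a dead child is replaced rather than allowed to take the system down.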

Andy then talks about the OTP heritage of Elixir and Erlang, and from this comes the concept of a “GenServer”.  A GenServer is a module within Elixir that provides a consistent way to implement processes.  The Elixir documentation states:

A GenServer is a process like any other Elixir process and it can be used to keep state, execute code asynchronously and so on. The advantage of using a generic server process (GenServer) implemented using this module is that it will have a standard set of interface functions and include functionality for tracing and error reporting. It will also fit into a supervision tree.

The GenServer provides a common set of interfaces and APIs that all processes can adhere to, allowing common conventions such as the ability to stop a process, which is frequently implemented like so:

GenServer.cast(pid, :stop)

Andy then talks about “nodes”.  Nodes are separate actual machines that can run Elixir and the BEAM VM, and these nodes can be clustered together.  Once clustered, a node can start a process not only on its own node, but on another node entirely.  Communication between processes, irrespective of the node that each process is running on, is handled seamlessly by the BEAM VM itself.  This gives Elixir solutions great scalability, robustness and fault tolerance.

Andy mentions how Elixir has its own package manager called “hex”, which gives access to a large range of packages providing lots of great functionality.  There’s “Mix”, which is a build tool for Elixir; the OTP Observer, for inspecting IO, memory and CPU usage on a node; along with “ETS”, which is an in-memory key-value store, similar to Redis, to name just a few.

Andy shares some book information for those of us who may wish to know more.  He suggests “Programming Elixir” and “Programming Phoenix”, both part of the Pragmatic Programmers series and also “The Little Elixir & OTP Guidebook” from Manning.

Finally, Andy wraps up by sharing the story of a US-based sports news website called “Bleacher Report”.  Bleacher Report serves up around 1.5 billion pages per day and is a very popular website both in the USA and internationally.  Their entire web application was originally built on Ruby on Rails, and they required approximately 150 servers in order to meet the demand.  Eventually, they rewrote their application using Elixir.  They now serve the same load using only 5 servers.  Not only have they reduced their server count by an enormous factor, but they believe that, with 5 servers, they’re actually over-provisioned, as those 5 servers handle the load very easily.  High praise indeed for Elixir and the BEAM VM.  Andy has blogged about his talk here.

After Andy’s talk, it was time for the first coffee break.  Amazingly, there were still some breakfast sandwiches left over from earlier in the morning, which many attendees enjoyed.  Since I was still quite full from my own breakfast, I decided a quick cup of coffee was in order before heading back to the conference rooms for the next session.  This one was Sandeep Singh’s Goodbye REST; Hello GraphQL.

Sandeep’s session is all about the relatively new technology of GraphQL.  It’s a query language for your API, comprising a server-side runtime for processing queries along with a client-side framework providing an in-browser IDE called GraphiQL.  One thing that Sandeep is quick to point out is that GraphQL has nothing to do with graph databases.  It can certainly act as a query layer over the top of a graph database, but can just as easily query any kind of underlying data, from RDBMSs through to flat-file data.

Sandeep first talks about where we are today with API technologies.  There are many of them, XML-RPC, REST, OData etc., but they all have their pros and cons.  We explore REST in a bit more detail, as it’s a very popular modern-day architectural style for APIs.  REST is all about resources and working with those resources via nouns (the name of the resource) and verbs (HTTP verbs such as POST, GET etc.); there’s also HATEOAS if your API is “fully” compliant with REST.

Sandeep talks about some of the potential drawbacks of a REST API.  There’s the problem of under-fetching.  This is seen when a given application’s page is required to show multiple resources at once, perhaps a customer along with recently purchased products and a list of recent orders.  Since this is three different resources, we would usually have to perform three distinct API calls in order to retrieve all of the data required for the page, which is not the most performant way of retrieving the data.  There’s also the problem of over-fetching.  This is where REST APIs can take a parameter instructing them to include additional data in the response (i.e. /api/customers/1/?include=products,orders); however, this often returns more data than is actually required.  We’re also exposing our REST endpoint to potential abuse, as people can add arbitrary inclusions to the endpoint call.  One way to get around the problems of under- or over-fetching is to create ad-hoc custom endpoints that retrieve the exact set of data required; however, this can quickly become unmaintainable over time as the sheer number of these ad-hoc endpoints grows.

GraphQL isn’t an architectural style, but a query language that sits on top of your existing API.  This means that the client of your API now has access to a far more expressive way of querying your API’s data.  GraphQL responses are, by default, JSON but can be configured to return XML instead if required; the input queries themselves are structured similarly to JSON.  Here’s a quick example of a GraphQL query and the kind of data the query might return:

{
	customer {
		name,
		address
	}
}
{
	"data" : {
		"customer" : {
			"name" : "Acme Ltd.",
			"address" : ["123 High Street", "Anytown", "Anywhere"]
		}
	}
}

GraphQL works by sending the query to the server where it’s translated by the GraphQL server-side library.  From here, the query details are passed on to code that you have written on the server in order that the query can be executed.  You’ll write your own types and resolvers for this purpose.  Types provide the GraphQL queries with the types/classes that are available for querying – this means that all GraphQL queries are strongly-typed.  Resolvers tell the GraphQL framework how to turn the values provided by the query into calls against your underlying API.  GraphQL has a complete type system within it, so it supports all of the standard intrinsic types that you’d expect such as strings, integers, floats etc. but also enums, lists and unions.
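To make the resolver idea concrete, here’s a toy sketch in Python (not a real GraphQL implementation; the dictionary-based query format and resolver names are my own invention for illustration):

```python
# Toy illustration of GraphQL-style resolvers: a query asks for named
# fields, and each field is answered by a resolver function you supply
# (in a real system these would call your underlying API or database).
resolvers = {
    "customer": {
        "name": lambda cid: "Acme Ltd.",
        "address": lambda cid: ["123 High Street", "Anytown", "Anywhere"],
    }
}

def execute(query, customer_id):
    # query looks like {"customer": ["name", "address"]}
    return {
        entity: {field: resolvers[entity][field](customer_id) for field in fields}
        for entity, fields in query.items()
    }

print(execute({"customer": ["name"]}, 1))
# {'customer': {'name': 'Acme Ltd.'}}
```

The key property this toy shares with real GraphQL is that the client chooses exactly which fields come back, and the server maps each requested field onto a resolver you control.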

Sandeep looks at the implementations of GraphQL and explains that it started life as a NodeJS package.  Its heritage is therefore in JavaScript; however, he also states that there are implementations in numerous other languages and technologies such as Python, Ruby, PHP, .NET and many, many more.  Sandeep says that GraphQL can be very efficient: you’re no longer under- or over-fetching data, and you’re retrieving exactly the fields you want from the types that are available.  GraphQL also allows you to version your queries and types by adding deprecated attributes to fields and types that are no longer available.

We take a brief look at the GraphiQL GUI client, which is part of the GraphQL client-side library.  It displays three panes within your browser, showing the schemas, the available types and fields, and a pane allowing you to type and perform ad-hoc queries.  Sandeep explains that the schema and sample types and fields are populated within the GUI by performing introspection over the types configured in your server-side code, so changes and additions there are instantly reflected in the client.  Unlike REST, which has obvious verbs around data access, GraphQL doesn’t really have these; you need introspection over the data to know how you can use it, and this is a very good thing.  Sandeep states how introspection is at the very heart of GraphQL (it is effectively reflection over your API) and it’s this that leads to the ability to provide strongly-typed queries.

We’re reminded that GraphQL itself has no concept of authorisation or any other business logic as it “sits on top of” the existing API.  Such authorisation and business logic concerns should be embedded within the API or a lower layer of code.  Sandeep says that the best way to think of GraphQL is like a “thin wrapper around the business logic, not the data layer”.

GraphQL is not just for reading data!  It has full capabilities to write data too, and these writes are known as mutations rather than queries.  The general principle remains the same: a mutation is constructed using the JSON-like syntax and sent to the server for resolving to a custom method that will validate the data and persist it to the data store by invoking the relevant API endpoint.  Sandeep explains how read queries can be nested, so you can actually send one query to the server that performs queries against two different resources.  GraphQL also has a concept of “loaders”.  These can batch up the actual queries sent to the database, to prevent issues when asking for something like all Customers including their Orders.  Doing this naively normally results in an N+1 issue, whereby the Orders are retrieved with a separate query for each Customer, degrading performance.  GraphQL loaders work by rewriting the underlying SQL generated for retrieving the data so that the Orders for all of the required Customers are retrieved in a single SQL statement.  i.e. Instead of sending queries like the following to the database:

SELECT CustomerID, CustomerName FROM Customer
SELECT OrderID, OrderNumber, OrderDate FROM Order WHERE CustomerID = 1
SELECT OrderID, OrderNumber, OrderDate FROM Order WHERE CustomerID = 2
SELECT OrderID, OrderNumber, OrderDate FROM Order WHERE CustomerID = 3

We will instead send queries such as:

SELECT CustomerID, CustomerName FROM Customer
SELECT OrderID, OrderNumber, OrderDate FROM Order WHERE CustomerID IN (1,2,3)
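The loader’s batching trick can be sketched in Python; `fetch_orders_for` and the in-memory `orders_table` below are hypothetical stand-ins for a real database call:

```python
# Sketch of the loader idea: collect the customer IDs first, then fetch
# all their orders in one batched query instead of one query per customer.
orders_table = {1: ["A-1"], 2: ["B-1", "B-2"], 3: []}

def fetch_orders_for(customer_ids):
    # One round trip, equivalent to "... WHERE CustomerID IN (1, 2, 3)"
    return {cid: orders_table.get(cid, []) for cid in customer_ids}

customers = [1, 2, 3]
orders_by_customer = fetch_orders_for(customers)  # single batched lookup
print(orders_by_customer[2])  # ['B-1', 'B-2']
```

The essential move is deferring the per-customer lookups until all the keys are known, so N separate queries collapse into one.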

Sandeep then looks at some of the potential downsides of using GraphQL.  Caching is somewhat problematic, as you can no longer perform any caching at the network layer because each query is completely dynamic.  Despite the benefits, there are also performance considerations: your underlying data needs to be correctly structured in order to work with GraphQL in the most efficient manner, and GraphQL loaders should be used to ensure N+1 problems don’t become an issue.  There are security considerations too.  You shouldn’t expose anything that you don’t want to be public, since everything exposed through the API is available to GraphQL, and you’ll need to be aware of potentially malicious queries that attempt to retrieve too much data.  One simple defence against such queries is a timeout: if a given query takes longer than some arbitrarily defined timeout value, you simply kill the query, although this may not be the best approach.  Another approach, taken by big websites currently using GraphQL, is to whitelist all the acceptable queries; if a query is received that isn’t in the whitelist, it doesn’t get run.  You also can’t use HTTP status codes to convey status or contextual information to the client.  Errors that occur when processing the GraphQL query are contained within the GraphQL response text itself, which is returned to the client with a 200 HTTP success code, so you’ll need your own strategy for exposing such errors to the user in a friendly way.  Finally, Sandeep explains that, unlike other querying technologies such as OData, GraphQL has no intrinsic ability to paginate data.  All data pagination must be built into your underlying API and business layer; GraphQL will merely pass the paging data, such as page number and size, on to the API and expect the API to correctly limit the data returned.

After Sandeep’s session, it’s time for another coffee break.  I quickly grabbed another coffee from the main lounge area of the conference, this time accompanied by some rather delicious biscuits, before consulting my agenda sheet to see which room I needed to be in for my next session.  Turned out that the next room I needed to be in was the same one I’d just left!  After finishing my coffee and biscuits I headed back to Chicago 1 for the next session and the last one before the lunch break.  This one was Dave Mateer’s Fun With Twitter Stream API, ELK Stack, RabbitMQ, Redis and High Performance SQL.

Dave’s talk is really all about performance and how to process large sets of data in the most efficient and performant manner.  The talk is going to be very demo-heavy, and so to give us a large data set to work with, Dave starts by looking at Twitter and, specifically, its Stream API.  Dave explains that the full Twitter firehose, which is reserved only for Twitter’s own use, currently has 1.3 million tweets per second flowing through it.  As a consumer, you can pay for access to a deca-firehose, which contains 1/10th of the full firehose (i.e. 130,000 tweets per second); however, Twitter also exposes a freely available Stream API, although it’s limited to 50 tweets per second.  This is still quite a sizeable amount of data to process.

Dave starts by showing us a demo of a simple C# console application that uses the Twitter Stream API to gather real-time tweets for the hashtag #scotland, which are then echoed out to the console.  Our goal is to get the tweet data into a SQL Server database as quickly and as efficiently as possible.  Dave says that, to simulate a larger quantity of data, he’s going to read in a number of pre-saved text files containing tweet data that he’s previously collected.  These files represent around 6GB of raw tweet data, containing approximately 1.9 million tweets.  He then extends the demo to start saving the tweets into SQL Server.  Dave mentions that he’s using Dapper to access SQL Server and that he previously tried such things using Entity Framework which, admittedly some time ago, was a painful experience and not very performant.  Dave likes Dapper as it’s a much simpler abstraction over the database and therefore much more performant.  It’s also a lot easier to optimise your queries when using Dapper, as the abstraction isn’t so thick and you’re not hiding the implementation details too much.

Next, Dave shows us a Kibana interface.  As well as writing to SQL Server, he’s also saving the tweets to a log file using Serilog and then using Logstash to ship those logs to Elasticsearch, allowing the raw log data to be viewed with Kibana (together known as the ELK stack).  Dave then shows us how easy it is to really leverage the power of tools like Kibana by creating a dashboard for all of the data.

From here, Dave begins to talk about performance and just how fast we can process the large quantity of tweet data.  The initial run of the application, which simply read in each tweet from the file and performed an INSERT via Dapper to get the data into the SQL Server database, was able to process approximately 420 tweets per second.  This isn’t bad, but it’s not a great level of performance.  Dave digs out SQL Server Profiler to see where the bottlenecks are within the data access code, and this shows that there are some expensive reads: the data is normalised, so the user that a tweet belongs to is stored in a separate table and looked up when needed.  It’s decided that adding indexes on the relevant columns used for the lookups might speed up the import process.  Sure enough, after adding the indexes and re-running the tweet import, we improve from 420 to 1,600 tweets per second.  A good improvement, but not an order-of-magnitude one.  Dave wants to know if we can change the architecture of our application and go even faster.  What if we want a 10x increase in performance?

Dave states that, since his laptop has multiple cores, we should start by changing the application architecture to make better use of parallel processing across all of the available cores.  We set up an instance of the RabbitMQ message queue, allowing us to read the tweet data in from the files and send it to a RabbitMQ queue.  The queue is explicitly set to durable, and each message is set to persistent, in order to ensure we can continue where we left off in the event of a crash or server failure.  From here, we can have multiple instances of another application pulling messages off the queue, leveraging RabbitMQ’s ability to effectively isolate each of the consuming clients and ensure that the same message is not sent to more than one of them.  Dave then sets up Redis.  This will be used for the “lookups” that are required when adding tweet data: users (and other data) are added to the database first and then cached in Redis, an in-memory key/value store often used for caching scenarios.  As tweets are pulled from the RabbitMQ queue, the ID/key lookups required for users and other data are done against the Redis cache rather than querying SQL Server, improving performance.  Once processed and the relevant values looked up from Redis, Dave uses SQL Server Bulk Copy to get the data into SQL Server itself.  Bulk Copy provides a significant performance benefit over standard INSERT T-SQL statements.  For Dave’s purposes, he bulk copies the data into temporary tables within SQL Server and then, at the end of the import, runs a single T-SQL statement to copy the data from the temporary tables to the real tables.
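The Redis lookup step follows the classic cache-aside pattern; here’s a minimal Python sketch, with a plain dict standing in for Redis and a hypothetical `lookup_user_id` function standing in for the real SQL Server query:

```python
# Cache-aside lookup sketch: a plain dict stands in for Redis, and a
# counter records how many times we fall through to the "database".
cache = {}
db_calls = {"count": 0}

def lookup_user_id(handle):
    if handle in cache:            # cache hit: no database round trip
        return cache[handle]
    db_calls["count"] += 1         # cache miss: hit the database once...
    user_id = hash(handle) % 1000  # ...stand-in for the real SQL query
    cache[handle] = user_id        # ...then populate the cache
    return user_id

first = lookup_user_id("@andy")
second = lookup_user_id("@andy")   # served from the cache
print(db_calls["count"])  # 1
```

At the rates in Dave’s demo, this pattern is what turns hundreds of thousands of per-tweet lookups per second into cache hits instead of SQL Server round trips.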

Having re-architected the solution in this manner, Dave then runs his import of 6GB of tweet data again.  As he’s now able to take advantage of the multiple CPU cores available, he runs 3 console applications in parallel to process all of the data.  Each console application completes its job within around 30 seconds, and whilst they’re running, Dave shows us the Redis dashboard, which indicates that Redis is receiving around 800,000 hits to the cache per second!  The result, ultimately, is that the application’s performance has increased from processing around 1,600 tweets per second to around 20,000!  An impressive improvement indeed!

Dave then looks at some potential downsides of such re-architecture and performance gains.  He shows how he’s actually getting duplicated data imported into SQL Server, likely due to race conditions and concurrency issues at one or more points within the processing pipeline.  Dave then quickly shows us how he’s worked around this problem with some rather ugly-looking C# code within the application (using temporary generic List and Dictionary structures to pre-process the data).  Due to the added complexity that this performance improvement brings, Dave argues that sometimes slower is actually faster: if you don’t absolutely need so much raw speed, you can remove things like Redis from the architecture, slowing down the processing (albeit to a still acceptable level) but allowing a lot of simplification of the code.

After Dave’s talk was over, it was time for lunch.  The attendees made their way to the main conference lounge area where we could choose between lunch bags of sandwiches (meat, fish and vegetarian options available) or a salad option (again, meat and vegetarian options).

Being a recently converted vegetarian, I opted for a lovely Greek Salad and then made my way to the outside area along with, it would seem, most of the other conference attendees.

It had turned into a glorious summer’s day in Reading by now, and the attendees, along with some speakers and other conference staff, were enjoying the lovely sunshine outdoors while we ate our lunch.  We didn’t have too long to eat, though, as there were a number of grok talks taking place inside one of the major conference rooms over lunchtime.  After finishing my lunch (food and drink were not allowed in the session rooms themselves) I headed back towards Chicago 1, where the grok talks were to be held.

I’d managed to get there early enough in order to get a seat (these talks are usually standing room only) and after settling myself in, we waited for the first speaker.  This was to be Gary Short with a talk about Markov, Trump and Countering Radicalisation.

Gary starts by asking, “What is radicalisation?”.  It’s really just persuasion: being able to persuade people to hold an extreme view of some information.  Counter-radicalisation is being able to persuade people to hold a much less extreme belief.  This is hard to achieve, and Gary says that it’s due to such things as cognitive dissonance and the “backfire” effect (closely related to confirmation bias), two subjects that he doesn’t have time to go into in a 10-minute grok talk, but he suggests we Google them later!  So, how do we process text to look for signs of radicalisation?  Well, we start with focus groups and corpora of known-good text, as well as data from previously de-radicalised people (answers to questions like “What helped you change?” etc.).  Gary says that Markov chains are used to process the text.  They’re a way of “flowing” through a group of words such that the next word is determined based upon statistical data known about the current word and what’s likely to follow it.  Finally, Gary shows us a demo of some Markov chains in action, with a C# console application that generates random tweet-length sentences based upon analysis of a corpus of text from Donald Trump.  His application is called TweetLikeTrump and is available online.
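A word-level Markov chain like the one Gary describes can be sketched in a few lines; here’s a minimal Python version (a bigram model over a tiny made-up corpus, not Gary’s actual TweetLikeTrump code):

```python
import random

# A minimal bigram Markov chain: the next word is chosen at random from
# the words observed to follow the current word in the corpus.
corpus = "we will build a wall and we will win and we will make it great"

def build_chain(text):
    words = text.split()
    chain = {}
    for current, following in zip(words, words[1:]):
        chain.setdefault(current, []).append(following)
    return chain

def generate(chain, start, length=8, seed=42):
    rng = random.Random(seed)  # seeded so the output is repeatable
    word, out = start, [start]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:          # dead end: no observed follower
            break
        word = rng.choice(followers)
        out.append(word)
    return " ".join(out)

print(generate(build_chain(corpus), "we"))
```

Because frequent word pairs appear more often in the follower lists, the generated text statistically resembles the corpus, which is exactly what makes the output sound like its source author.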

The next Grok talk is Ian Johnson’s Sketch Notes.  Ian starts by defining Sketch Notes: they’re visual notes that you can create to capture the information from things like conferences, events etc.   He states that he’s not an artist, so don’t think that you can’t get started doing Sketch Noting yourself.  Ian talks about how to get started with Sketch Notes.  He says it’s best to start by simply practising your handwriting!   Find a way of quickly and legibly writing down text using pen & paper, and then practise this over and over.  Ian shares his own structure of a sketch note: he places the talk title and the date in the top left and right corners, and the event title and the speaker name / twitter handle in the bottom corners.  He goes on to talk more about the component parts of a sketch note.   Use arrows to create a flow between the ideas and text that capture the individual concepts from the talk.  Use colour and decoration to underscore specific and important points – but try to do this when there’s a “lull” in the talk and you don’t have to be concentrating on it 100%, as it’s important to remember that the decoration is not as important as the content itself.  Ian shares a specific example: if the speaker mentions a book, Ian will draw a little book icon and simply write the title of the book next to it.  He says drawing people is easy and can be done with a simple circle for the head and no more than 4 or 5 lines for the rest of the body.  Placing the person’s arms in different positions helps to indicate expression.  Finally, Ian says that if you make a mistake in the drawing (and you almost certainly will, at least when starting out), make a feature of it!  Draw over the mistake to create an icon that might be meaningful in the context of the talk, or create a fancy stylised bullet point and repeat that “mistake” to make it look intentional!  Ian has blogged about his talk here.

Next up is Christos Matskas’s grok talk on Becoming an awesome OSS contributor.   Christos starts by asking the audience who is using open source software.  After a little thought, virtually everyone in the room raises their hand, as we’re all using open source software in one way or another – usually because we’re working on projects that use at least one NuGet package, and NuGet packages are mostly open source.  Christos shares the start of his own journey into becoming an OSS contributor, which began with him messing up a NuGet package restore on his first day at a new contract.  This led to him documenting the fix he applied, which was eventually seen by the NuGet maintainers, and he was invited to write some documentation for them.  Christos talks about major organisations using open source software, including Apple, Google and Microsoft, as well as the US Department of Defense and the City of Munich, to name but a few.  Getting involved in open source software yourself is a great way to give back to the community that we’re all invariably benefiting from.  It’s a great way to improve your own code and to improve your network of peers and colleagues.  It’s also great for your CV/Resume.  In the US, it’s almost mandatory that you have code on a GitHub profile.  Even in the UK and Europe, having such a profile is not mandatory, but it is a very big plus to your application when you’re looking for a new job.  You can also get free software tools to help you with your coding: companies such as JetBrains, RedGate, and many, many others will frequently offer their software products for free or at a heavily discounted price to open source projects.   Finally, Christos shares a few websites that you can use to get started in contributing to open source, such as up-for-grabs.net, www.firsttimersonly.com and the twitter account @yourfirstPR.

The final grok talk is Rik Hepworth’s Lability.  Lability is a PowerShell module, available via GitHub, that builds on PowerShell’s DSC (Desired State Configuration) feature to facilitate the automated provisioning of complete development and testing environments using Windows Hyper-V.  The tool extends the PowerShell DSC commands with metadata that the Lability tool understands, configuring not only the virtual machines themselves (the host machine, networking etc.) but also the software that is deployed and configured on those virtual machines.  Lability can be used to automate such provisioning not only on Azure itself, but also on a local development machine.

IMG_20170610_141738After the grok talks were over, I had a little more time available in the lunch break to grab another quick drink before once again examining the agenda to see where to go for the first session of the afternoon, and penultimate session of the day.  This one was Ian Russell’s Strategic Domain-Driven Design.

Ian starts his talk by mentioning the “bible” for Domain-Driven Design: the book of the same name by Eric Evans.  Ian asks the audience who has read the book, to which quite a few people raise their hands.  He then asks who has read past chapter 14, at which point a lot of hands go down.  Ian states that, if you’ve never read the book, the best way to get the most value from it is to read the last third of the book first, and only then return to read the first two-thirds!

So what is Domain-Driven Design?  It’s an abstraction of reality, attempting to model the real world we see and experience before us in a given business domain.  Domain-Driven Design (DDD) tries to break the domain down into the “core” business domain – the business functions that are the very reason for a business’s existence – and the other domains that exist to support the core.  One example could be a business that sells fidget spinners.  The actual domain logic involved with selling the spinners to customers would be the core domain; however, the same company may need to provide a dispute resolution service for unhappy customers.  The dispute resolution service, whilst required by the overall business, would not be the core domain but a supporting or generic domain.

DDD has a ubiquitous language.  This is the language of the business domain, not the technical domain.  Great care should be taken not to use technical terms when discussing domains with business stakeholders, and the terms within the language should be ubiquitous across all domains and across the entire business.  Having this ubiquitous language reduces the chance of ambiguity and ensures that everyone can relate to specific component parts of the domain using the same terminology.  DDD has sub-domains, too: the core domain – the main business functionality; supporting domains – which exist solely to support the core; and generic domains – such as the dispute resolution domain, which supports the core domain but is generic enough that it could apply to other businesses too.  DDD also has bounded contexts.  These are like sub-domains, but don’t necessarily map directly onto them: they are explicit boundaries separating one area of the business from another.  Crucially, bounded contexts can be developed in software independently from each other.  They could be written by different development teams and could even use entirely different architectures and technologies in their construction.

Ian talks about driving out the core concepts by creating a “shared kernel”.  These are the concepts that exist in the overlaps between bounded contexts.  These concepts don’t have to be shared between the bounded contexts, and they may differ – the concept of an “account” might mean something quite different within the finance bounded context than it does within the shipping bounded context, for example.  Ian talks about the concept of an “anti-corruption layer” as part of each bounded context.  This allows bounded contexts to communicate with items from the shared kernel, but where those concepts do differ between contexts, the anti-corruption layer prevents corruption from incorrect implementations of concepts being passed into the context.  Ian mentions domain events next.  He says that these are something not covered in Eric Evans’ book, but they are often documented in other DDD literature.  Domain events are essentially just “things that occur” within the domain: for example, a new customer registering on the company’s website is a domain event.  Events can be created by users, by the passage of time, by external systems, or even by other domain events.
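As a concrete (and entirely hypothetical) sketch of the anti-corruption layer idea: here the shipping context refuses to consume the finance context’s notion of an “account” directly, translating it at the boundary into its own model instead.  All the class and field names below are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class FinanceAccount:            # finance context: an account is money owed
    account_id: str
    balance_pence: int

@dataclass
class ShippingAccount:           # shipping context: an account is where goods go
    account_id: str
    delivery_address: str

class ShippingAntiCorruptionLayer:
    """Translates the foreign (finance) concept at the context boundary,
    so the shipping context never depends on finance's internal model."""
    def __init__(self, address_lookup):
        self._address_lookup = address_lookup

    def to_shipping_account(self, finance_account):
        address = self._address_lookup(finance_account.account_id)
        return ShippingAccount(finance_account.account_id, address)

# A stubbed address lookup stands in for whatever service would supply it.
acl = ShippingAntiCorruptionLayer(lambda account_id: "1 High Street, Reading")
shipped = acl.to_shipping_account(FinanceAccount("A-42", 1999))
print(shipped)
```

If finance later changes its account shape, only the translation layer needs updating; the shipping context’s model stays uncorrupted.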

All of this is strategic domain-driven design.  It’s the modelling and understanding of the business domain without ever letting any technological considerations interfere with that modelling and understanding.  It’s simply good architectural practice, and the understanding of how different parts of the business domain interact and communicate with one another.

Ian suggests that there are three main ways in which to achieve strategic domain-driven design: Behaviour-Driven Development (BDD), prototyping and event storming.  BDD involves writing acceptance tests for our software application using a domain-specific language within our application code that allows the tests to use the ubiquitous language of the domain, rather than the explicitly technical language of the software solution.  BDD facilitates engagement with business stakeholders in the development of the acceptance tests that form part of the software solution.  One common way to do this is to run “three amigos” sessions, which allow developers, QA/testers and domain experts to write the BDD tests together, usually in the standard Given-When-Then style.   Prototyping consists of producing images and “wireframes” that give an impression of how a completed software application could look and feel.  Prototypes can be low-fidelity – just simple static images and mock-ups – but it’s better if you can produce high-fidelity prototypes, which allow varying levels of interaction.  Tools such as Balsamiq and InVision, amongst others, can help with the production of high-fidelity prototypes.  Event storming is a particular format of meeting or workshop that has developers and domain experts collaborating on the production of a large paper artefact containing numerous sticky notes of varying colours.  The sticky notes’ colours represent various artefacts within the domain, such as events, commands, users, external systems and others.  Sticky notes are added, amended, removed and moved around the paper on the wall by all meeting attendees.  The resulting sticky notes tend to cluster naturally into the various bounded contexts of the domain, allowing the overall domain design to emerge.
If you run your own event storming session, Ian suggests starting by trying to drive out the domain events first and, for each event, attempting first to work backwards to find the cause or causes of that event, then forwards, investigating what the event itself causes – perhaps further events, or the requirement for user intervention etc.
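To make the Given-When-Then style concrete, here is a hypothetical acceptance test for the fidget spinner shop mentioned earlier, written as a plain Python sketch.  Real BDD tooling (SpecFlow, Cucumber and the like) would let the Given/When/Then phrasing live outside the code, but the structure is the same:

```python
# A toy domain model, named in the ubiquitous language of the (fictional) shop.
class Basket:
    def __init__(self):
        self.items = []

    def add(self, product, quantity):
        self.items.append((product, quantity))

    def total_items(self):
        return sum(quantity for _, quantity in self.items)

def test_customer_adds_spinners_to_basket():
    # Given an empty basket
    basket = Basket()
    # When the customer adds two fidget spinners
    basket.add("fidget spinner", 2)
    # Then the basket contains two items
    assert basket.total_items() == 2

test_customer_adds_spinners_to_basket()
```

Note that the test reads in domain terms (customer, basket, fidget spinner), not technical ones – which is exactly what lets the three amigos write it together.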

IMG_20170610_145159Ian rounds off his talk by sharing some of his own DDD best practices.  We should strive for creative collaboration between developers and domain experts at all stages of the project, and for an environment that allows exploration and experimentation in order to find the best model for the domain – which may not be the first model that is found.  Ian states that determining the explicit context boundaries is far more important than finding a perfect model, and that the focus should always be primarily on the core domain, as this is the area that will bring the greatest value to the business.

After Ian’s talk was over, it was time for another coffee break, the last of the day.  I grabbed a coffee and once again checked my agenda sheet to see where to head for the final session of the day.   This last session was to be Stuart Lang’s Async in C#, the Good, the Bad and the Ugly.

IMG_20170610_153023Stuart starts his session by asking the audience to wonder why he’s doing a session on Async in C# today.  It’s 2017 and Async/Await has been around for over 6 years!  Simply put, Stuart says that, whilst Async/Await code is fairly ubiquitous these days, there are still many mistakes made in some implementations, and the finer points of asynchronous code are not as well understood.  We learn how Async is really an abstraction, much like an ORM tool.  If you use it in a naïve way, it’s easy to get things wrong.

Stuart mentions Async’s good parts.  We can perform non-blocking waits for background I/O operations, allowing our user interfaces to remain responsive.  But then come the bad parts.  It’s not always entirely clear which code is actually asynchronous.  If we call an Async method in someone else’s library, there’s no guarantee that what we’re really executing is asynchronous code, even if the method returns a Task&lt;T&gt;.  Async can sometimes lead to the need to duplicate code, for example, when your code has to provide both asynchronous and synchronous versions of each method.  Also, Async can’t be used everywhere: it can’t be used in a class constructor or inside a lock statement.

IMG_20170610_155205One of the worst side-effects of poor Async implementation is deadlocks.  Stuart states that, when it comes to Async, doing it wrong is worse than not doing it at all!  Stuart shows us a simple Async method that uses some Task.Delay methods to simulate background work.  We then look at what the Roslyn compiler actually translates the code to, which is a large amount of code that effectively implements a complete state machine around the actual method’s code. 
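The generated C# is too long to reproduce here, but the shape of the rewrite can be loosely illustrated with a Python generator: each await point becomes a state at which the method is suspended and later resumed.  This is only an analogy for the idea, not the code Roslyn actually emits:

```python
# A hand-rolled "state machine" loosely mirroring what the C# compiler
# generates for an async method: each yield below stands in for an await.
def my_async_method():
    # state 0: run up to the first await, then suspend
    result_a = yield "Task.Delay(100)"
    # state 1: resumed with the awaited result; run to the next await
    result_b = yield "Task.Delay(200)"
    # state 2: resumed again; run to completion
    return [result_a, result_b]

# A trivial "scheduler" that drives the state machine to completion.
machine = my_async_method()
awaited = machine.send(None)            # start; yields the first awaited task
results = []
try:
    while True:
        awaited = machine.send(f"completed {awaited}")
except StopIteration as finished:
    results = finished.value
print(results)  # → ['completed Task.Delay(100)', 'completed Task.Delay(200)']
```

The point of the analogy is that the “method” never blocks a thread while suspended: the scheduler is free to do other work between resumptions, which is exactly what the real .NET state machine enables.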

IMG_20170610_160038Stuart then talks about the Synchronization Context.  This is an often misunderstood part of Async that allows awaited code, which can be resumed on a different thread, to synchronize with other threads.  For example, if awaited code needs to update an element on the user interface in a WinForms or WPF application, it will need to marshal that change onto the UI thread, as it can’t be performed by the thread-pool thread that the awaited code may be running on.  Stuart talks about blocking on Async code, for example:

var result = MyAsyncMethod().Result;

We should try to never do this!   Doing so can cause a deadlock, as awaited code within the Async method may be trying to resume on the same synchronization context that is already blocked by the “outer” code that is performing the .Result on the main Async method.
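The deadlock mechanics here are specific to .NET’s synchronization context, but the underlying rule – don’t block synchronously inside an asynchronous context – crosses languages.  As a loose Python analogy (my example, not Stuart’s), asyncio simply refuses the equivalent of .Result when an event loop is already running, raising an error rather than silently deadlocking:

```python
import asyncio

async def my_async_method():
    await asyncio.sleep(0.01)
    return "done"

# Bridging from synchronous code into async -- roughly what .Result does.
# This is fine here, because no event loop is running on this thread yet.
result = asyncio.run(my_async_method())
print(result)  # done

async def outer():
    # The same blocking bridge *inside* async code is refused outright:
    # asyncio.run() raises RuntimeError instead of deadlocking.
    try:
        asyncio.run(my_async_method())
        return "no error"
    except RuntimeError:
        return "refused"

print(asyncio.run(outer()))  # refused
```

Either way – deadlock in .NET, RuntimeError in Python – the lesson is the same: keep asynchronous code asynchronous all the way up the call chain.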

Stuart then shows us some sample code that runs as an ASP.NET page with the Async code being called from within the MVC controller.  He outputs the threads used by the Async code to the browser, and we then examine variations of the code to see how each awaited piece of code uses either the same or different threads.  One way of overcoming the blocking issue when using .Result at the end of an Async method call is to write code similar to:

var result = Task.Run(() => MyAsyncMethod().Result).Result;

It's messy code, but it works.  However, code like this should be heavily commented, because if someone removes that outer Task.Run(), the code will start blocking again and will fail miserably.

Stuart then talks about the vs-threading library, which provides a JoinableTaskFactory.   Using code such as:

jtf.Run(() => MyAsyncMethod());

ensures that awaited code resumes on the same thread that’s blocked.  Stuart shows the output from the same ASP.NET demo when using the JoinableTaskFactory, and all of the various awaited blocks of code can be seen to run and resume on the same thread.

IMG_20170610_163019Finally, Stuart shares some of the best practices around deadlock prevention.  He asserts that the ultimate goal for prevention has to be an application that can provide Async code from the very “bottom” level of the code (i.e. that which is closest to I/O) all the way up to the “top” level, where it’s exposed to other client code or applications.

IMG_20170610_164756After Stuart’s talk is over, it’s time for the traditional prize/swag giving ceremony.  All of the attendees gather in the main conference lounge area to await the attendees from the other sessions, which finish shortly afterwards.  Once all sessions have finished and the full set of attendees is gathered, the main organisers of the conference take time to thank the event sponsors.  There are quite a few of them and, without them, there simply wouldn’t be a DDD conference.  For this, the sponsors get a rapturous round of applause from the attendees.

After the thanks, there’s a small prize giving ceremony, with many of the prizes being given away by the sponsors themselves.  As is usually the case, I didn’t win a prize – but given that there were only around half a dozen prizes, I was far from alone!

It only remained for the organisers to announce the next conference in the DDD calendar which, although it doesn’t have a specific date at the moment, will take place in October 2017.  This one is DDD North.  There’s also a surprise announcement of a DDD in Dublin to be held in November 2017, which will be a “double capacity” event, catering for around 600 – 800 conference delegates.  Now there’s something to look forward to!

So, another wonderful DDD conference was wrapped up, and there’s not long to wait until it’s time to do it all over again!

UPDATE (20th June 2017):

As with last year, Kevin O'Shaughnessy was also in attendance at DDD12, and he has blogged his own review and write-up of the various sessions that he attended.  Many of the sessions Kevin attended were different from those I attended, so check out his blog for a fuller picture of the day's many great sessions.


DDD 11 In Review

IMG_20160903_084627This past Saturday, 3rd September 2016, the 11th DDD (DeveloperDeveloperDeveloper) conference was held at Microsoft’s UK HQ in Reading.  Although I’ve been to a number of DDD events in recent years, this was my first time at the original DDD event (aka Developer Day, aka DDD Reading), which spawned all of the other localised DDD events.

IMG_20160903_090335After travelling down the evening before and staying overnight in a hotel in Swindon, I set off bright and early to make the 1 hour drive to Reading.  After arriving and checking in, collecting my badge along the way, it was time to grab a coffee and one of the hearty breakfast butties supplied.  Coffee and sausage sandwich consumed, it was time to familiarise myself with the layout of the rooms.  There were 4 parallel tracks of talks, and there had also been a room change from the printed agendas that we received upon checking in.  After finding the new rooms and consulting my agenda sheet, it was time for me to head off to the first talk of the day.  This was Gary Short’s “How to make your bookie cry”.

With a promise of showing us all how to make money on betting exchanges and also how to “beat the bookie”, Gary’s talk was an interesting proposition and commanded a full room of attendees.  Gary’s session is all about machine learning and how data science can help us do many things, including making predictions on horse races in an attempt to beat the bookie.  Gary starts by giving the fundamental steps of machine learning – Predict – Measure – Analyze – Adjust.  But we start with measure, as we need some data to set us off on our way.

IMG_20160903_093802Gary states that bookie odds in the UK are expressed as fractions, and that this hides the inherent probability of each horse winning a given race.  Bookies will ultimately make a profit on a given race, as the probabilities of all of the horses add up to more than 1!  So, we can beat the bookie if we build a better data model.  We do this with data.   We can purchase horse racing data – which means we’re already at a loss, given the cost of the data – or we can screen-scrape it from a sports website, such as BBC Sport.  Gary shows us a demo of some Python code used to scrape the data from the BBC website.  He states that Python is one of the two “standard” languages used within data science, the other being R.  After scraping a sufficiently sized dataset over a number of days, we can analyze that data by building a logistic regression model.  Gary shows how to use the R language to achieve this, ultimately giving us a percentage likelihood of a given horse winning a new race based upon its past results, its weight and the jockey riding it.
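Gary’s point about fractional odds hiding probabilities is easy to check with a few lines of Python.  The odds below are made up for illustration, but for any real race you’ll find the implied probabilities sum to more than 1 – and that excess, the “overround”, is the bookie’s built-in margin:

```python
def implied_probability(fractional_odds):
    """A 'num/den' fraction pays num for every den staked, implying a
    win probability of den / (num + den): e.g. 2/1 implies 1/3."""
    num, den = (int(part) for part in fractional_odds.split("/"))
    return den / (num + den)

# Made-up fractional odds for a five-horse race.
race = ["2/1", "3/1", "4/1", "5/1", "6/1"]
book = sum(implied_probability(odds) for odds in race)
overround = book - 1.0   # the bookmaker's margin
print(f"book = {book:.4f}, overround = {overround:.1%}")
```

If your model’s probabilities for the same race sum to less than the bookie’s book, your odds are “shorter” than his – which is the condition Gary gives for being able to beat him.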

Gary next explains a very important consideration within data science known as the Turkey Paradox.  You’re a turkey on a farm; you have to decide whether today you’re going to get fed or go to market.  If your data model only has the data points of being fed at 9am for the last 500 days, you’ll never be able to predict that today is the day you go to market – as it’s never happened before.  There is a solution to this: it’s called Active Learning, or Human-in-the-Loop learning.   But it turns out humans are not very good at making decisions.

Gary next explains the differences between System 1 and System 2 thinking.  System 2 covers very deliberate actions – you think first and deliberately make the action.  System 1 is reflexive – when you put your hand on a hot plate, you pull it away without even thinking.  It uses less of the brain.  System 1 is our “lizard brain” from the days when we were cavemen, and it takes precedence over System 2.  Gary talks about the types of System 1 thinking.  There’s cognitive dissonance – holding onto a belief in the face of mounting contrary evidence.  Another is bait-and-switch – substituting a less favourable option after being “baited” with a more favourable one – and yet another is the “halo effect” – beautiful things are believed to be desirable.  We need to ensure that, when using human-in-the-loop additions to our data model, we don’t fall foul of these problems.

IMG_20160903_092511Next, we explore Bayes’ theorem: a theorem describing how the conditional probability of each of a set of possible causes for a given observed outcome can be computed from knowledge of the probability of each cause and the conditional probability of the outcome given each cause.  Gary uses this theorem over our horse racing data model to demonstrate Bayesian inference, using prior probabilities to predict future ones.  This is using the raw scraped data, with no human-in-the-loop additions, but we can add our own additions, which become prior probabilities and can be used to compute further probabilities using Bayes’ theorem.
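A toy worked example of the theorem, with numbers invented purely for illustration (nothing here comes from Gary’s actual data model): suppose a horse’s prior win rate is 10%, it carried top weight in 30% of its wins, and it carries top weight in 20% of all its runs.  Bayes’ theorem then updates the prior like so:

```python
# All probabilities below are made-up illustrative figures.
p_win = 0.10                  # P(win): the prior, from past results
p_topweight_given_win = 0.30  # P(top weight | win)
p_topweight = 0.20            # P(top weight) across all runs

# Bayes' theorem:
# P(win | top weight) = P(top weight | win) * P(win) / P(top weight)
p_win_given_topweight = p_topweight_given_win * p_win / p_topweight
print(round(p_win_given_topweight, 2))  # 0.15
```

Observing top weight has shifted our belief from a 10% to a 15% chance of a win – and each human-in-the-loop addition becomes another prior that can be updated the same way.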

Gary concludes that, once we’ve acquired, trained and analyzed our data model, we can beat the bookie if our odds are shorter than the bookie’s.  Another way is not to beat the bookie at all!  We can make money simply by beating other gamblers.  We can do this using betting exchanges – backing and laying bets and getting other gamblers to bet against your prediction of the outcome of an event.  Finally, you can also profit from “arbitrage trading” – whereby the clever placing of bets when two different bookies offer the same event outcome at two different odds can produce a profit from the difference between those odds.

IMG_20160903_104403After a short coffee break, it was onto the second session of the day, which was Ali Kheyrollahi’s “Microservice Architecture at ASOS”.  Ali first explains the background of the ASOS company where he works.  They’re a top 35 online retailer and within the top 10 of online fashion retailers; they have a £1.5 billion turnover and, for their IT, they process around 10,000 requests per second.  Ali states that ASOS is, at its core, a technology company, and it’s through this that they succeed with IT – you’ve got to be a great tech business, not just a great tech function.  Tech drives the agenda and doesn’t chase the rest of the business.

Ali asks “Why microservices?” and states that it’s really about scaling the people within the business, not just the tech solution.  Through decoupling the entire solution, you decentralise decision making.  Core services can be built in their own tech stack by largely independent teams.  It allows fast and frequent releases and deployments of separate services.  You reduce the complexity of each service although, Ali does admit, you will increase the complexity of the solution as a whole.

The best way to achieve all of this is through committed people. Ali shows a slide which mentions the German army’s “Auftragstaktik”: a method of commanding in which the commander gives subordinate leaders a specific mission, a timescale for achievement and the forces required to meet the goal; however, the individual leaders are free to engage their own subordinates’ services as they see fit.  It’s about telling them how to think, not what to think.  He also shares a quote from “The Little Prince” that embodies this thinking: “If you wish to build a ship, do not divide the men into teams and send them to the forest to cut wood. Instead, teach them to long for the vast and endless sea.”  If you wish to succeed with IT, and microservices in particular, you have to embrace this culture.  Ali states that a “triangle” of domain modelling, people and a good operating model all adds up to successful architecture.

Ali hands over to his colleague Dave Green, who talks about how ASOS, like many companies, started with a legacy monolithic system.  And like most others, they had to work with this system as it stood – they couldn’t just throw it out and start over again; it was, after all, handling nearly £1 billion in transactions per year.  However, despite fixing some of the worst performance problems of the monolithic system, they ultimately concluded that it would be easier and cheaper to build a new system than to fix the old one.  Dave explains how they have a 2-tier IT system within the company: there’s the enterprise domain and the digital domain.  The enterprise domain is primarily focused on buying off-the-shelf software to run the Finance, HR and other aspects of the business; they’re project-led.  Then there’s the digital domain: much more agile, product-led and focused on building solutions rather than buying them.

Ali states that ASOS is a strategic partner with Microsoft and is heavily invested in cloud technology, specifically Microsoft’s Azure platform.  He suggests that ASOS may well be the largest single Azure user this side of the Atlantic Ocean!  He talks about the general tech stack, which is C#, using TeamCity for building and Octopus Deploy for deployment.  There’s lots of other tech used too, however, and other teams are making use of Scala, R and other languages where appropriate.  The database stack is primarily SQL Server, but they also use Redis and MongoDB.

IMG_20160903_111122Ali talks about one of the most important parts of building a distributed microservice-based solution – the LMA stack – that’s Logging, Monitoring and Alerting.  All microservices are built to adhere to some core principles.  All queries and commands use an HTTP API, but there are no message brokers or ESB-style pseudo-microservices: these exist outside of the services, but never inside.  For the logging, Ali states how logging is inherent within all parts of every service; however, they do most logging and instrumentation wherever there is any kind of I/O – network, file system or database reads and writes.  As part of their logging infrastructure they use Woodpecker, which is a queue and topic monitoring solution for Azure Service Bus.

All of the logs and Woodpecker output are fed into a log collector and processor.  They don’t use Logstash for this, which is a popular component, but instead use ConveyorBelt.  This plays better with Azure and the Azure-specific implementation and storage of certain log data.  Both Logstash and ConveyorBelt, however, have the same purpose – to quickly collect and push log data to Elasticsearch.  From here, they use the popular Kibana product to visualise that data.  So rather than an ELK stack (Elasticsearch, Logstash, Kibana), it’s an ECK stack (Elasticsearch, ConveyorBelt, Kibana).

Ali concludes his talk by discussing lessons learnt.  He says, if you’re in the cloud, build for failure, as the cloud is a jungle!  Network latency and failures add up, so it’s important to understand and optimise the time from the user to the data.  With regard to operating in the cloud in general: ignore the hype and trust no one.  Test, measure, adopt/drop, monitor and engage with your provider.  It’s difficult to manage platform costs, so put automation and monitoring of the cloud infrastructure in place to prevent developers creating erroneous VMs that they forget to switch off!  Finally, distributed computing is hard, and geo-distribution is even harder.  Expect to roll up your sleeves. Maturity in some areas can be low, and is changing rapidly.

IMG_20160903_115726After Ali’s talk there was another coffee break in the communal area before we all headed off to the 3rd session of the day.  For me, this was Mark Rendle’s “Somewhere over the Windows”.  Mark’s talk revolved around .NET Core and its ability to run cross-platform.  He opened by suggesting that, being the rebel he is, he thought he’d come to Microsoft’s UK HQ and give a talk about how to move away from Windows and onto a Linux OS!

Mark starts by saying that Windows is great, but a lot of the intrinsic parts of Windows that we use as developers, such as IIS and .NET, are far too deeply tied into a specific version of Windows.  Mark gives the example that IIS has only just received support for HTTP/2, but that it’s only the version of IIS contained within the not-yet-released Windows Server 2016 that’ll support it.  He says that, unfortunately, Windows gets stuck in a rut for around 4 years at a time, and every 4 years Microsoft’s ecosystem has to try to catch up with everybody else via a new version of Windows.

.NET Core will help us as developers to break away from this rut.  .NET Core runs on Windows, Linux and Mac OS X.  It’s self-contained, so you can simply ship a folder containing your application’s files and the .NET Core runtime files, and it’ll all “just work”.  Mark mentions ASP.NET Core, which actually started the whole “Core” thing at Microsoft, after which they decided to go for it with everything else.  ASP.NET Core is a ground-up rewrite; it merges MVC and Web API into a unified whole and has its own built-in web server, Kestrel, which is incredibly fast.  Mark says his own laptop has now been running Linux Mint for the last year and a half, and he’s been able to continue being a “.NET developer” despite not using Windows as his main, daily OS.

Mark talks about how, in this brave new world, we’re going to have to get used to the CLI – the command line interface.  Although some graphical tooling exists, the slimmed-down .NET Core will take us back to the days of developing and creating our projects and files from the CLI.  Mark says he uses Guake as his terminal of choice on his Linux Mint install.  Mark talks about Yeoman – the scaffolding engine used to bootstrap ASP.NET Core projects.  It’s a node package, and Mark admits that pretty much all web development these days, irrespective of platform, is dependent on node and its npm package manager.  Even Microsoft’s own TypeScript is a node package.  Mark shows creating a new ASP.NET Core application using Yeoman.  The Yeoman script creates the files and folders, runs a dotnet restore command to restore NuGet packages, then runs a bower restore to restore front-end (i.e. JavaScript) packages from Bower.

Mark says that tooling was previously an issue with developing on Linux, but it’s now much better.  There’s Visual Studio 2015 Update 3, which is Windows-only, but there’s also Project Rider and Xamarin Studio, which can run on Linux and in which .NET Core code can be developed.  For general editors, there’s VS Code, Atom, Sublime Text 3, Vim or Emacs! VS Code and Atom are both based on Electron.

Mark moves on to discuss logging in an application.  In .NET Core it’s a first-class citizen, as the framework contains a LoggerFactory.  It’ll write to STDOUT and STDERR and therefore works equally well on Windows and Linux. This is an improvement over the previous types of logging we could achieve, which would often result in writing to Windows-only log stores (for example, the Windows Event Log).

Next, Mark moves on to discuss Docker.  He says that the ability to run your .NET Core apps on a lightweight and fast web server such as NGINX, inside a Docker container, is one of the killer reasons to move to and embrace the Linux platform as a .NET developer.  Mark first gives the background of “what is Docker?”  Docker provides “containers”, which are like small, lightweight VMs (Virtual Machines). The processes within them run on the host OS, but they’re isolated from the processes in other containers.  Docker containers use a “layered” file system.  This means that Docker “images” – the blueprints from which container instances are created – can be layered on top of each other.  So we can take NGINX as a “base” image and layer additional images of our own on top of it: our web application becomes a subsequent layer, and together the layers form a single running Docker container, with a nicely preconfigured NGINX instance from the base image for free!  Microsoft even provide a base image for ASP.NET Core, which is based upon Debian 8. Mark suggests using jwilder/nginx-proxy as the base NGINX image.  Mark talks about how IIS is the de-facto standard web server for Windows, but NGINX is the de-facto standard for Linux.  We need to use NGINX because Kestrel (the default web server for ASP.NET Core) is not a production web server and isn’t “hardened”; NGINX is a production web server, hardened for this purpose.
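The layering idea can be seen in a minimal Dockerfile sketch (image tags and file names here are illustrative and era-specific, not taken from the talk):

```dockerfile
# Each instruction below adds a layer on top of the base ASP.NET Core image
FROM microsoft/aspnetcore:1.0

# Copy the published application output into the image (another layer)
WORKDIR /app
COPY ./publish .

# Kestrel listens on port 5000 inside the container
EXPOSE 5000
ENTRYPOINT ["dotnet", "MyApp.dll"]
```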

To prevent baking configuration settings (say, database connection strings) into the Docker image, we can use Docker Compose.  This allows us to pass various environment settings in at the time we run the Docker container.  It uses YAML.  It also allows you to easily specify the various command-line arguments that you might otherwise need to pass to Docker when running an image (e.g. -p 5000:5000, which binds port 5000 in the container to port 5000 on the local host).
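A minimal docker-compose.yml along these lines might look as follows (service names, image names and the connection string are hypothetical examples, not from the talk):

```yaml
version: '2'
services:
  web:
    image: myapp:latest
    ports:
      - "5000:5000"    # host port 5000 -> container port 5000
    environment:
      # ASP.NET Core reads "__" as a configuration section separator
      - ConnectionStrings__Default=Server=db;Database=myapp
  db:
    image: postgres:9.6
```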

Mark then shows us a demo of getting an ELK stack (Elasticsearch, Logstash & Kibana) up and running.  The ASP.NET Core application can simply write its logs to its console, which on Linux is STDOUT.  A Logstash input, using the GELF format, grabs anything written to STDOUT, processes it and stores it in Elasticsearch.  This is then immediately visible to Kibana for visualisation!
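A minimal Logstash pipeline for this kind of setup might look something like the following (a hedged sketch – ports and hosts are illustrative; the gelf input would typically receive container output via Docker’s gelf logging driver):

```
input {
  gelf {
    port => 12201            # default GELF UDP port
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]   # Kibana then visualises this index
  }
}
```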

Mark concludes that, ultimately, the main benefits of the “new way” with .NET and ASP.NET Core are the same as the fundamental benefits of the whole Linux/Unix philosophy that has been around for years: compose your applications (and even your OS) out of many small programs that are designed to do only one thing and to do it well.

IMG_20160903_131124After Mark’s session, which slightly overran, it was time for lunch.  Lunch at DDD 11 was superb.  I opted for the chicken salad rather than a sandwich, and very delicious (and filling) it was too, with a large portion of chicken contained within.  This was accompanied by some nice crisps, a chocolate bar, an apple and some flavoured water to wash it all down with!

I ate my lunch on the steps just outside the building; however, the imminently approaching rain soon started to fall and put a stop to the idea of staying outside in the fresh air for very long! IMG_20160903_132320  That didn’t matter too much, as not long after we’d finished our food we were told that the ubiquitous “grok talks” would soon be starting in one of the conference rooms.

I finished off my lunch and headed towards the conference room where the grok talks were being held.  I was slightly late arriving, and by the time I got there all available seating was taken, with only standing room left!  I’d missed the first of the grok talks, given by Rik Hepworth, about Azure Resource Templates; however, I’d seen a more complete talk by Rik on the same subject at DDD North the previous year.  Unfortunately, I also missed most of the following grok talk by Andrew Fryer, which discussed Power BI, previously a stand-alone product but now hosted within Azure.

I did catch the remaining two grok talks, the first of which was Liam Westley’s “What is the point of Microsoft?”  Liam’s talk is about how Microsoft is a very different company today to what it was only a few short years ago.  He starts by talking about how far Microsoft has come in recent years, and how many of its positions today are complete reversals of those previously held – one major example being Microsoft’s attitude towards open source software.  Steve Ballmer, the previous Microsoft CEO, famously stated that Linux was a “cancer”; the current Microsoft, however, is embracing Linux both on Azure and for its .NET development tools.  Liam states that Microsoft’s future is very much in the cloud, and that they’re investing heavily in Azure.  Liam shows some slides which acknowledge that Amazon has the largest share of the public cloud market (over 50%) whilst Azure currently has only around 9%, but that this figure is growing all the time.  He also talks about how Office 365 is a big driver for Microsoft’s cloud and that we should just accept that Office has “won” (i.e. is better than LibreOffice, OpenOffice etc.).  Liam wraps up his quick talk with something rather odd – a slide showing a book about creating cat crafts from cat hair!

The final grok talk was by Ben Hall, who introduced us very briefly to an interesting website he’s created called Katacoda.  The website is an interactive learning platform that aims to help developers learn all about new and interesting technologies from right within their browser!  It allows developers to test out and play with a variety of new technologies (such as Docker, Kubernetes, Git, CoreOS, CI/CD with Jenkins etc.) in an interactive, browser-based CLI.  He says it’s completely free and that they’re adding to the number of “labs” on offer all the time.

IMG_20160903_143625After the grok talks, there was a little more time to grab some refreshments prior to the first session of the afternoon, and penultimate session of the day, João “Jota” Pedro Martins’ “Azure Service Fabric and the Actor Model”.  Jota’s session is all about Azure Service Fabric: what it is and how it can help you with distributed applications in the cloud.  Azure Service Fabric is a “PaaS v2” (Platform-as-a-Service) which supports both stateful and stateless services using the Actor model.  It’s a platform for applications that are “born in the cloud”.  So what is the Actor Model?  It’s a model of concurrent computation that treats “actors” – distinct, independent units of code – as the fundamental, core primitives of an application.  An application is composed of numerous actors, and these actors communicate with each other via messages rather than method calls.  Azure Service Fabric is built into Azure, but it’s also downloadable for free and can be used not only within Microsoft’s Azure cloud, but also inside the clouds of other providers, such as Amazon’s AWS.  IMG_20160903_143735Azure Service Fabric is battle-hardened, and has Microsoft’s long-standing “Project Orleans” at its core.

The “fabric” part of the name is effectively the “cluster” of nodes that run as part of the Service Fabric framework.  This is usually based upon a minimum configuration of one primary node with at least two secondary nodes, but can be configured in numerous other ways.  The application’s “actors” run inside these nodes and communicate with each other via message passing.  Nodes are grouped into replica sets and will balance load between themselves and fail over from one node to another if a node becomes unresponsive, taking “votes” on which node will be primary when required.  Your microservices within Service Fabric can be any executable process that you can run, such as an ASP.NET website, a C# class library, a NodeJS application or even a Java application running inside a JVM.  Currently Azure Service Fabric doesn’t support Linux, but support for it is being developed.

Your microservices can be stateless or stateful.  Stateless services are simple, as there’s no state to store; messages consumed by the service are self-contained.  Stateful services can store state inside Service Fabric itself, and Service Fabric will take care of replicating the stored state data across nodes, ensuring availability in the event of a node failure.  Service Fabric clusters can be upgraded with zero downtime: part of the cluster can keep responding to messages from a previous version of your microservice whilst other parts of the cluster – those where the microservice has already been upgraded – process messages from the new version.  You can create a simple five-node cluster on your own local development machine by downloading Azure Service Fabric using the Microsoft Web Platform Installer.

IMG_20160903_145754Jota shows us a quick demo, creating a Service Fabric solution within Visual Studio.  It has two projects within the solution: one is the actual project for your service and the other is effectively metadata to help Service Fabric know how to instantiate and control your service (i.e. how many nodes within the cluster etc.).  Service Fabric exposes a Reliable Services API, and built on top of this is a Reliable Actors API.  It’s by implementing the interfaces from the Reliable Actors API that we create our own reliable services.  Actors operate in an asynchronous and single-threaded way, and effectively act as singletons. Requests to an actor are serialized and processed one after the other, and the runtime platform manages the lifetime and lifecycle of the individual actors.  Because of this, the whole system must expect that messages can be received by actors in a non-deterministic order.

Actors can implement timers (i.e. perform some action every X seconds), but “normal” timers will die if the actor on a specific node dies and has to fail over to another node.  You can instead use an IActorReminder, which effectively allows the same timer-based action processing but will survive and continue to work if an actor has to fail over to another node.  Jota reminds us that the Actor Model isn’t appropriate in all circumstances or for all types of application.  For example, if you have some deep, long-running logic processing that must remain in memory with lots of data and state, it’s probably not suited to the Actor Model; but if your processing can be broken down into smaller, granular chunks which can handle and process the messages sent to them in any arbitrary order, and you want to maximize the easy scalability of your application, then actors are a great model.  Remember, though, that since actors communicate via messages – which are passed over the network – you will have to contend with some latency.

IMG_20160903_151942Service Fabric contains an ActorProxy class.  The ActorProxy will retry failed messages, but there are no exactly-once delivery guarantees – a retried message may arrive more than once – so you'll need to ensure your actors are idempotent and can safely receive the same message multiple times.  It's also important to remember that concurrency is only turn-based: actors process messages one at a time in the order they receive them, which may not be the order they were sent in.  Jota talks about the built-in StateManager class of Service Fabric, which is how Service Fabric deals with persisting state for stateful services.  The StateManager has GetStateAsync and SetStateAsync methods which allow stateful actors to persist any arbitrary state (so long as it’s serializable).  One interesting observation is that the state is only persisted when the method that calls SetStateAsync has finished running; the state is not persisted immediately upon calling the SetStateAsync method!
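The turn-based, idempotent behaviour described above can be sketched in plain TypeScript (an illustrative model with hypothetical names, not the Reliable Actors API):

```typescript
// A toy actor: processes one message per "turn" and deduplicates by
// message id, so a redelivered (retried) message has no extra effect.
type Message = { id: string; amount: number };

class CounterActor {
  private seen = new Set<string>(); // ids already processed
  private total = 0;                // would be persisted state in a real actor

  // Turn-based concurrency: messages are handled serially, one at a time.
  receive(msg: Message): void {
    if (this.seen.has(msg.id)) return; // duplicate delivery: ignore
    this.seen.add(msg.id);
    this.total += msg.amount;
  }

  get value(): number { return this.total; }
}

const actor = new CounterActor();
actor.receive({ id: "m1", amount: 5 });
actor.receive({ id: "m2", amount: 3 });
actor.receive({ id: "m1", amount: 5 }); // retried message, ignored
console.log(actor.value); // 8
```

Because handling is keyed on the message id rather than its payload, the retry by the proxy layer becomes harmless.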

Finally, Jota wraps up his talk with a brief summary.  He mentions how Service Fabric actors have behaviour and (optionally) state, run in a performant, enterprise-ready and scalable environment, and are especially suited to web session state, shopping carts or any other scenario involving independent objects with their own lifetime, state and behaviour.  He does say that existing applications would probably need significant re-architecture to take advantage of Service Fabric, and that the Service Fabric API has some niggles which could be improved.

IMG_20160905_211919After João’s session, there was time for one final quick refreshment break, which included a table full of various crisps, fruit and chocolate left over from the excess lunches earlier in the afternoon, as well as a lovely selection of individually-wrapped biscuits!

Before long it was time for the final session of the day: Joseph Woodward’s “Building Rich Client Applications with AngularJS 2”.

Joe’s talk first takes us through the differences between AngularJS 1 and 2.  He states that, when AngularJS 1 was first developed back in 2010, there wasn’t even any such thing as NodeJS!  AngularJS 1 was great for its time, but did have its share of problems.  It was written before ECMAScript 6/2015 was a de-facto standard in client-side scripting, so it couldn’t benefit from classes, modules, promises or web components.  Eventually, though, the world changed, and with both the introduction and ratification of ECMAScript 6 and the introduction of NodeJS, client-side development was pushed forward massively.  We now had module loaders and a component-driven approach to client-side web development, manifested in frameworks such as Facebook’s React, which started to push the idea of unidirectional data flow.

IMG_20160903_155951Joe mentions how, with the advent of Angular2, the entire architecture is now component-based.  It’s simpler too: the controllers, scopes and directives of Angular1 are all replaced with Components in Angular2, and the Services and Factories of Angular1 are now just Services in Angular2.  It is much more modular and has first-class support for mobile, the desktop and the web, being built on top of the ECMAScript 6 standard.

Joe mentions how Angular2 is written in Microsoft’s TypeScript language, a superset of JavaScript that adds better type support and other benefits often found in more strongly-typed languages, such as interfaces.  He states that, since Angular2 itself is written in TypeScript, it’s best to write your own applications targeting Angular2 in TypeScript too.  Doing this allows for static analysis of your code (thus enforcing types etc.) as well as the elimination of dead code via tree shaking, which becomes a huge help when writing larger-scale applications.
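As a small illustration of what that static typing buys you (names here are hypothetical, not from the talk):

```typescript
// An interface describes the shape of the data; the TypeScript compiler
// rejects any call that doesn't satisfy it, before the code ever runs.
interface User {
  id: number;
  name: string;
}

function greet(user: User): string {
  return `Hello, ${user.name} (#${user.id})`;
}

console.log(greet({ id: 1, name: "Ada" })); // Hello, Ada (#1)

// The next line would be a compile-time error, caught by static analysis:
// greet({ id: "1", name: "Ada" });  // 'id' must be a number
```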

Joe examines the controller model used in Angular1 and talks about how controllers could communicate arbitrarily with pretty much any other controller within your application.  As your application grows larger, this becomes problematic, as it becomes more difficult to reason about how events are flowing through it.  This is especially true when trying to find the source of code that performs UI updates, as these events are often cascaded through numerous layers of controllers.  In Angular2, however, this becomes much simpler, as the component structure is a tree.  The tree is evaluated starting at the top, flowing down through the children in a predictable manner.
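That top-down, tree-shaped flow can be sketched in plain TypeScript (a hypothetical model of the idea, not Angular2’s actual API):

```typescript
// A change is applied at the root and pushed down the component tree
// exactly once, so every update follows a single, predictable path.
class Component {
  constructor(
    public name: string,
    public children: Component[] = [],
    public input: string = ""
  ) {}

  // Push data from parent to children, top to bottom, recording the order.
  update(input: string, visited: string[]): void {
    this.input = input;
    visited.push(this.name);
    for (const child of this.children) {
      child.update(input, visited);
    }
  }
}

const tree = new Component("App", [
  new Component("Header"),
  new Component("Body", [new Component("List")]),
]);

const order: string[] = [];
tree.update("new data", order);
console.log(order); // [ 'App', 'Header', 'Body', 'List' ]
```

Contrast this with Angular1, where any controller could update any other, making the equivalent trace effectively arbitrary.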

IMG_20160903_160258_1In Angular2, Services take the place of the Services and Factories of Angular1, and Joe states that they’re really just JavaScript classes decorated with some additional attributes.  Joe further discusses how the very latest Release Candidate of Angular2, RC6, has introduced the @NgModule directive.  NgModules allow you to build your application by acting as a container for a collection of services and components.  These are grouped together to form the module, and your application can then be built as a collection of one or more modules.  Joe talks about how components in Angular2 can be “nested”, allowing one parent component to contain the definitions of further child components.  Data can flow between the parent and child components, and this is all encapsulated from other components “outside”.

Next, Joe shows us some demos using a simple Angular2 application which displays a web page with a textbox and a number of other labels/boxes that are updated with the content of the textbox whenever that content changes.  The code is very simple for such a simple app; however, it shows how clearly defined and structured an Angular2 application can be.  Joe then increases the number of labels created on the web page to 10,000, just to see how Angular2 copes with updating that many elements.  Although there’s some lag, as would be expected when performing that many independent updates, the performance isn’t too bad at all.

IMG_20160903_163209Finally, Joe talks about the future of Angular2.  The Angular team are going to improve static analysis and ensure that only used, reachable code is included within the final minified JavaScript file.  There’ll be better tooling to generate much of the “plumbing” involved in creating an Angular2 application, as well as improvements around building and testing Angular2 applications.  Joe explains that this is a clear message that Angular2 is not just a framework but a complete platform.  Although some developers were upset when Angular2 totally “changed the game” with no clear upgrade path from Angular1, leaving a lot of Angular1 developers feeling left out, Google insist that Angular2 is developed in such a way that it can evolve incrementally over time as web technologies evolve, so there shouldn’t be the same kind of wholesale “break from the past” in the future of Angular as a platform.  Indeed, Google themselves are re-writing their AdWords product (an important product generating significant revenue for Google) using their own Dart language with Angular2 as the platform.  And with that, Joe’s session came to an end.  He was so impressed with the size of his audience, though, that he insisted on taking a photo of us all, just to prove to his wife that he was talking to a big crowd!

After this final session of the day it was time for all the attendees to gather in the communal area for the customary “closing ceremony”.  This involved big thanks to all of the sponsors of the event as well as a prize draw for numerous goodies.  Unfortunately, I didn’t win anything in the prize draws, but I’d had a brilliant time at my first DDD in Reading.  Here’s hoping that they continue the “original” DDDs well into the future.

PANO_20160903_090157

UPDATE: Kevin O’Shaughnessy has also written a blog post reviewing his experience at DDD 11, which is an excellent read.  Apart from the session by Mark Rendle, Kevin attended entirely different sessions to me, so his review is well worth a read to get a fuller picture of the entire DDD event.