DDD North 2014 In Review

This past Saturday, 18th October 2014, saw another DDD (Developer, Developer, Developer) event.  This one was the 4th annual DDD North event, this year held at the University of Leeds.

After arriving and signing in, I proceeded through the corridors to the communal area where we were all greeted with a cup of coffee (or tea) and a nice Danish pastry!  It’s always a nice surprise to get a nice cake with your morning coffee, so although I wasn’t really hungry as I’d recently eaten a large breakfast, I decided that a Danish Pastry covered in sweet, sweet icing was too much of a temptation to be able to refuse!  After this delightful breakfast, I headed down the corridor for the first of the day’s sessions.

The first session of the day is Liam Westley’s “An Actor’s Life For Me”, which talks about parallel processing with multiple threads using the Task Parallel Library and utilising the Actor Model.  Liam introduces the Actor model and states it was first described by Carl Hewitt as early as 1973.  The dilemma we have for parallel processing is due to shared state, causing us to lock around areas of memory where multiple threads may try to access that state.  The Actor model solves this by not having shared state within the system, instead having each process take stateless data that is not shared and output stateless data to the next process in the processing pipeline.  Liam uses an analogy of making a cup of tea and the steps involved in that whilst also getting an itch that needs scratching.  The itch (and thus the scratch) can happen during any of the tea-making steps, so the number of possible ways of interleaving tea-making and scratching grows exponentially.

Liam talks about how CPUs have been multi-threaded and multi-core for many years now, first arriving around the same time as .NET v1.0, whilst in the same time frame, our developer tools haven’t really kept up.  .NET 1.0 pretty much gave us raw access to how Windows handles threads using the ThreadPool, which meant managing multiple threads and sharing state between them was very difficult.  .NET 2.0 gave us a SynchronizationContext, but multi-threaded programming was still very hard.  Eventually, we got the much simplified Async & Await keywords, but now we have the Task Parallel Library, whose Dataflow blocks provide us with the Actor pattern.  This basically allows us to write our code in individual “blocks” which are essentially black boxes sharing no state with any other block.  We can then chain these blocks together into a processing pipeline, giving us the ability to perform some computational process without sharing state.

Liam then shows us a demo of a console application which produces an MD5 hash for a number of large files in a folder.  The first iteration of the demo shows this happening without using the Task Parallel Library (TPL) and so performs no parallel processing and simply processes each file, one at a time on a single thread, taking some time to complete.  The second iteration Liam shows us uses the TPL, but still only works in a single-threaded manner by wrapping the hash calculation function as a TPL ActionBlock.  This iteration does the same as the single-threaded version, as again, no parallel processing is occurring.  The final iteration runs in a multi-threaded manner by simply setting the MaxDegreeOfParallelism property of the block configuration (ExecutionDataflowBlockOptions).  What’s really amazing about these ActionBlocks is that they inherently and implicitly handle all input and output buffering and queuing by themselves.  This means we can post many items into the processing pipeline at a faster rate than they can be processed, and the TPL will handle the queuing for us.
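This isn’t Liam’s code or the TPL Dataflow API, but the idea of “same function, configurable degree of parallelism, queuing handled for you” can be sketched in a few lines of Python.  The names md5_hex, hash_all and max_parallelism are my own, and the “files” are just in-memory byte strings.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a byte string as lowercase hex."""
    return hashlib.md5(data).hexdigest()


def hash_all(items: dict, max_parallelism: int = 1) -> dict:
    """Hash every named payload using at most max_parallelism worker threads.

    The executor queues submitted work internally, so the caller can hand
    over items faster than the workers can process them.
    """
    with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
        futures = {name: pool.submit(md5_hex, data) for name, data in items.items()}
        return {name: future.result() for name, future in futures.items()}
```

Calling hash_all(files, 1) behaves like the single-threaded iteration of the demo, and raising max_parallelism is the only change needed to spread the work across more threads.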

Liam next talks about separating the processing and calculating of the file hashes by performing these in a TransformBlock rather than an ActionBlock, and only using ActionBlocks to print the hash value to the UI.  The output of the TransformBlock (the hash value and the filename) is passed to the ActionBlock in the processing pipeline.

Liam then introduces the BufferBlock.  This acts as a propagator between other blocks and a FIFO queue of data.  Liam talks about how, in our example, we can add a BufferBlock in front of all of the TransformBlocks which will effectively evenly distribute the “load” as we provide the files to be hashed between the TransformBlocks. 

Next, Liam shows how we can use the LinkTo method, which accepts a predicate and so allows us to filter which messages are passed along to the next block in the processing pipeline.  This could be used (for example) to hash files of different types by different TransformBlocks (i.e. an MP3 file is processed differently than an MP4 file etc.).  Liam also introduces the TransformManyBlock, which produces an IEnumerable of outputs from a single input.  This means we no longer have to have our own loop through each of the files to be processed; instead, we can simply pass in the folder and have the block emit its files as a complete IEnumerable collection.
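To make the linking idea concrete, here is a small Python sketch of blocks that route each result to the first linked target whose predicate accepts it, plus a “many” block that fans one input out into several outputs.  Block, ManyBlock and link_to are my own illustrative names, not the TPL Dataflow types, and unlike the real library this sketch simply drops anything that no link accepts.

```python
class Block:
    """A processing step that forwards each result to the first linked
    target whose predicate accepts it (unmatched results are dropped here)."""

    def __init__(self, func):
        self.func = func
        self.links = []          # (target, predicate) pairs, in link order

    def link_to(self, target, predicate=lambda item: True):
        self.links.append((target, predicate))

    def post(self, item):
        for result in self._results(item):
            for target, predicate in self.links:
                if predicate(result):
                    target.post(result)
                    break

    def _results(self, item):
        return [self.func(item)]


class ManyBlock(Block):
    """Like Block, but func returns an iterable and every element is forwarded."""

    def _results(self, item):
        return self.func(item)


mp3s, others = [], []
lister = ManyBlock(lambda folder: folder)    # assumption: a "folder" is a list of names
mp3_sink = Block(mp3s.append)
other_sink = Block(others.append)
lister.link_to(mp3_sink, lambda name: name.endswith(".mp3"))
lister.link_to(other_sink)                   # catch-all for everything else
lister.post(["a.mp3", "b.mp4", "c.mp3"])
```

After the post call, mp3s holds the two MP3 names and others holds the MP4, without the caller ever looping over the files itself.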

Finally, Liam mentions both the BroadcastBlock and the BatchBlock.  The BroadcastBlock is effectively a pub/sub mechanism as used in Message Buses etc. which allows fanning-out of the messages and broadcasting to other blocks.  The BatchBlock allows batching of inputs before passing the messages along the processing pipeline.
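As a rough illustration of those two behaviours (again plain Python with names of my own choosing, not the TPL types), broadcasting hands every message to every subscriber, whilst batching groups messages before passing them on.  Emitting a final, shorter batch is a choice made in this sketch.

```python
def broadcast(item, subscribers):
    """Fan-out: deliver the same item to every subscriber callback."""
    for subscriber in subscribers:
        subscriber(item)


def batch(items, size):
    """Group incoming items into lists of `size`; the final list may be shorter."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == size:
            yield current
            current = []
    if current:
        yield current
```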

All in all, Liam’s talk was very informative and shows just how far we’ve come in our ability to relatively easily and simply perform parallel processing in a multi-threaded manner, taking advantage of all of the cores available to us on a modern day machine.  Liam’s demo code has been made available on GitHub for those interested in learning more.


The next talk is Ian Cooper’s “Not Just Layers! – What can pipelines and events do for you?”, which is a talk about Data Flow Architectures, and specifically Pipelines and Events.  Ian first talks about general software architecture and how processes evolve from basic application of a skill through to adoption of genuine craftsmanship and best-practices.  Software Architecture has many styles, but a single style can be explained as a series of components and connectors.  Components are the individual parts of an architecture that do something and the connectors are how multiple components talk to each other.

Ian states that Data Flow architectures are driven more by behaviour than by state, and says that functional languages (such as F#) are better suited to behaviourally modelled architecture, whereas object oriented (OO) languages like C# are better suited to solving state driven processes and architectures.

Ian uses the KWIC (Keyword in context) algorithm, which is how Unix indexes text in its man pages, as the reference for the session.

Ian talks about pipes and filters, and states that it’s a flow of data processing along a pipeline of specific stages.  A push pipeline “pushes” tasks along the pipeline.  It usually consists of a pump at the front, which pushes data into the pipeline, followed by a series of filters, which are the processing tasks, with each preceding filter responsible for pushing the data to the succeeding filter in the pipeline.  There’s also usually a sink at the end that provides the final end result.  There are also pull pipelines, of which .NET’s LINQ is an example, in which each filter further along the pipeline pulls the data from the previous filter, rather than the previous filter pushing the data on.
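The push/pull distinction is easy to see in code.  The following is my own minimal Python sketch (not Ian’s demo): in the push version each filter calls the next stage, whilst in the pull version nothing runs until the consumer iterates.

```python
def make_filter(transform, next_handler):
    """Build a push filter: transform the item, then push it to the next stage."""
    def handle(item):
        next_handler(transform(item))
    return handle


def pump(source, first):
    """The pump drives a push pipeline by pushing every source item into it."""
    for item in source:
        first(item)


def pull_pipeline(source):
    """A pull pipeline: each generator pulls from the one before it on demand."""
    stripped = (item.strip() for item in source)    # nothing runs yet
    return (item.upper() for item in stripped)      # the consumer pulls on iteration


results = []                                        # the sink collects final output
pipeline = make_filter(str.strip, make_filter(str.upper, results.append))
pump(["  push ", " pipeline "], pipeline)
```

Both versions apply the same two filters; the only difference is which end of the pipeline is in control.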

Ian mentions how pipes and filters architecture is similar to a batch sequence architecture (See below for the subtle difference between them).  He talks about how errors that may happen in a long-running sequence and that need the entire processing stream to be undone are better suited to a batch sequence architecture than a pipes and filters architecture, due to the more disconnected nature of the pipes and filters architecture.

Ian talks about parallel execution and the potential pub/sub problem of consumers awaiting data and not knowing when the entire workload is completed.  If individual steps are either faster or slower than the preceding or succeeding steps in the chain, this can cause problems with either no data, or too much data to process.  The solution to this problem is to introduce a “buffer” in between steps within the chain.  Such things as Message Queues (i.e. MSMQ, RabbitMQ etc) or in-memory caching mechanisms (such as those provided by tools like Redis) can offer this.

Ian then shows us an in-memory demo of a program using the pipes and filters architecture.  Ian states that, ideally, filters in a pipeline shouldn’t really know about other filters, but it’s okay for them to be aware of an abstraction of the filter that’s next in the pipeline, just not the concrete instance of that filter.  Ian uses the KWIC algorithm for the demo code.  Ian shows the same demo using the manual pipeline and filters, and also a LINQ implementation.  The LINQ example has its filters implemented as fluent method calls simply chained together (i.e. TextLines.Shift(x=>x).RemoveNoise(x => x).Sort() etc.).  Ian then shows the same example as written in F#.  Using F#’s pipeline operator “|>”, the pipeline is even simpler to see from the code that implements it.
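For anyone who hasn’t met KWIC before, here is a rough Python take on the shift / remove noise / sort chain.  It’s a sketch of the general algorithm rather than Ian’s implementation, and the noise word list is an assumption of mine.

```python
NOISE_WORDS = {"a", "an", "the", "of", "and"}   # assumed list, for illustration


def shift(lines):
    """Yield every circular shift of the words in each line."""
    for line in lines:
        words = line.split()
        for i in range(len(words)):
            yield " ".join(words[i:] + words[:i])


def remove_noise(lines):
    """Drop shifts whose first word is a noise word."""
    for line in lines:
        if line.split()[0].lower() not in NOISE_WORDS:
            yield line


def kwic(lines):
    """Compose the filters; sorting is the step that finally pulls the data through."""
    return sorted(remove_noise(shift(lines)), key=str.lower)
```

Because shift and remove_noise are generators, this is a pull pipeline in the same spirit as the LINQ version: each filter only knows about the sequence it is handed, not about the filter that produced it.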

Ian shows us the demo code using a message queue (using MSMQ behind the scenes).  This shows a pull based pipeline where each filter down the chain pulls messages from a message queue to which messages are posted by the preceding filter in the pipeline chain.  Ian also shows us the pipeline running in a parallel manner, using the Task Parallel Library.  Each filter has distinct Inputs and Outputs defined as BlockingCollection<T> allowing the data to flow in and out, but to be blocked on the individual thread if the next filter in the pipeline isn’t ready to receive that data.
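The closest thing Python’s standard library has to that arrangement is a bounded queue.Queue between threads, so here is a sketch of the same shape.  The stage and run_pipeline functions and the DONE sentinel are my own names; the sentinel is one simple way of telling downstream filters that the workload is complete.

```python
import queue
import threading

DONE = object()  # sentinel telling the next stage that no more data will arrive


def stage(transform, inbox, outbox):
    """Run one filter on its own thread, reading from inbox and writing to outbox."""
    def run():
        while True:
            item = inbox.get()              # blocks until the previous stage has data
            if item is DONE:
                outbox.put(DONE)            # pass completion down the chain
                return
            outbox.put(transform(item))     # blocks while the outbox is full
    thread = threading.Thread(target=run)
    thread.start()
    return thread


def run_pipeline(items, transforms, capacity=2):
    """Chain the transforms with bounded queues and collect the final output."""
    queues = [queue.Queue(maxsize=capacity) for _ in range(len(transforms) + 1)]
    threads = [stage(t, queues[i], queues[i + 1]) for i, t in enumerate(transforms)]
    results = []

    def collect():
        while (item := queues[-1].get()) is not DONE:
            results.append(item)

    collector = threading.Thread(target=collect)
    collector.start()
    for item in items:
        queues[0].put(item)                 # the pump: blocks if the first filter is busy
    queues[0].put(DONE)
    for thread in threads + [collector]:
        thread.join()
    return results
```

The small capacity is deliberate: a fast filter is held back by a slow neighbour instead of flooding it, which is the buffering behaviour described above.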

Finally, Ian talks about Batch Sequences and how they differ slightly from a pipes and filters architecture.  He talks about how Batch Sequencing was done many years ago, with magnetic tapes being passed from one reel-to-reel processing machine to the next!  The main difference between Batch Sequence and Pipes and Filters is that in a batch sequence, each filter has to complete the entire workload of data before passing everything as output to the next filter in the chain.  By contrast, pipes and filters will have each filter process only one small piece of work or one individual piece of data before passing it down the processing chain.  This means that true pipes and filters is much better suited to being parallelized than a batch sequence architecture.


The next session is Richard Tasker’s “BDD and why you should be doing it”.  Richard starts by introducing BDD (Behaviour Driven Development) and where it originated.  It was first proposed by Dan North as a “solution” to some of the failings of TDD such as: Where do you start with TDD? What to test and what not to test? and How much to test in one go?

Richard then talks about his first exposure to understanding BDD.  This started with writing expressive names for standard unit tests.  This helps understand what the test is testing and thus, what the code is doing, i.e. the expression of a behaviour of the code.  It’s from here that we can see how we can make the mental leap from testing and exercising small methods of code to testing the more user-centric behaviour of the overall application.

Richard shows a series of Database Entity Relationship diagrams as the first mechanism he used to design an application used to model car parts and their relation to vehicles.  This had to go through a number of iterations to fully realise the entities involved and their relationships to each other and it wasn’t the most effective way to achieve the overall design.  Using a series of User Stories which could be turned into BDD tests was the way forward.

Richard next introduced the MoSCoW method as the way in which he started writing his BDD tests.  Using this method combined with the new style of user story templates emphasises the behaviour and business function.  Instead of writing “As a <type of user> I want <some functionality> so that <some benefit>”, we instead write, “In order to <achieve some value>, as a <type of user>, I should have <some functionality>”.  The last part of the user story gets the relevant must/should/could/won’t wording in order to help achieve effective prioritization with the customer.

Richard then introduces SpecFlow as his BDD tool of choice.  He shows a simple demo of a single SpecFlow acceptance test, backed by a number of standard unit tests.  Richard says that you probably don’t want to do this for every individual tiny part of your application as this can lead to an abundance of unit tests and further lead to a test maintenance burden.  To help solve this, Richard talks about Decision Frameworks, of which a popular one is called “Cynefin”.  It defines states of Obvious, Chaotic, Complex and Complicated.  Each area of the application and discrete pieces of functionality can be assessed to see which of the four Cynefin states they may fall into.  From here, we can decide how many or how few BDD Acceptance tests are best utilised for that feature to deliver the best return on investment.  Richard says that Acceptance tests are often best used in Complicated & Complex states, but are often less useful in Obvious & Chaotic states.

Richard closes his session with “why” we should be doing BDD.  He talks about many of the benefits of adopting BDD and says that it is a great helper for teams that are new to TDD.  Richard says that BDD helps to reduce communication barriers between the developers and other technical professionals and the perhaps less technical business stakeholders and that BDD also helps with prioritizing which features should be implemented before others.  BDD also helps with naming things and defining the specific behaviours of our application in a more user-oriented way and also helps to define the meaning of “done”. 


After Richard’s talk, it was lunchtime.  Lunch was served in the same communal area where we’d all gathered earlier at breakfast time and consisted of a rather nice sandwich, a bag of crisps and a drink.  It was nice that all three ingredients could be chosen by each individual attendee from a selection available.

After enjoying this very nice lunch, I decided to skip the Grok talks (these are short, 10 minute talks that generally happen over lunchtime at the various DDD conferences) and get some fresh air outside.  That didn’t last too long, as I found the Pack Horse pub just down the road from the area of the university used for the conference.  This is a pub belonging to a small local microbrewery called The Burley Street Brewhouse.  I decided I had to go in and sneak a cheeky pint of bitter as a lunchtime treat.  It was indeed a lovely pint and afterwards, I headed back to the university and to the DDD North conference.  I went back in via an entrance close to the communal area still housing some conference attendees and realised that a number of sandwiches and crisps were still available for any attendee that wanted 2nd helpings!  As I was still a bit peckish after my liquid refreshment (and knowing that I wouldn’t be eating until quite late in the evening at the after conference Geek Dinner), I decided to go for seconds!  After enjoying my second helpings, I headed off for the first session of the afternoon.


The first afternoon session is Andrew MacDonald’s “CQRS & Event Sourcing”.  Andrew first talks about the how & why of starting development in a brand new project.  Andrew has his own development project, treevue.com, for which he decided to try out CQRS and event sourcing as they were two new interesting techniques that Andrew believed could help with the development of his software.  treevue.com is a web product which offers virtual data rooms.  Andrew talks about the benefits of CQRS & Event sourcing such as allowing a truly abstracted data storage model, providing domain driven design without noise and that separating reads and writes to the data model via CQRS could open up new possibilities for the software.  Andrew states that it’s not appropriate for everything and quotes Udi Dahan who said that most people who have used CQRS shouldn’t have done so!

CQRS is Command Query Responsibility Segregation and allows commands (processes that alter our data) to be separate from and entirely distinct from Queries (processes that only read our data but don’t change it).  The models behind each of these can be entirely different, even when referring to the same domain entities, so the data model for (for example) a Customer type can have a different design when reading than when writing.

Andrew talks about the overall architecture of a system that employs CQRS vs. one that doesn’t.  Without CQRS, reads and writes flow through the same layers of our application.  With CQRS, we can have entirely different architectures for reading vs. writing.  Usually the writing architecture is similar to the entire non-CQRS architecture, flowing through many layers including data access, validation layers etc., but often the reading architecture uses a much flatter set of layers to read the data as concerns such as validation are generally not required in this context.  The two separate reading and writing stacks can often even connect to separate databases which provide “eventual consistency” with each other.  This also means reading and writing can scale independently of each other, and given that many apps read far more than write, this can be invaluable.
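To give a flavour of the split, here is a tiny Python sketch of my own (it is not Andrew’s code, and RenameCustomer and CustomerStore are hypothetical names).  The write side validates a command and updates the full entity, whilst the read side is nothing more than a lookup against a flat model shaped for display.  For simplicity the read model is updated synchronously here, whereas with separate databases it would only be eventually consistent.

```python
from dataclasses import dataclass, field


@dataclass
class RenameCustomer:          # a command: expresses intent to change data
    customer_id: int
    new_name: str


@dataclass
class CustomerStore:
    """Hypothetical in-memory stand-in for the write and read databases."""
    write_model: dict = field(default_factory=dict)   # full entities
    read_model: dict = field(default_factory=dict)    # flat display strings

    def handle(self, command: RenameCustomer) -> None:
        """Write side: validate, update the entity, then project to the read model."""
        name = command.new_name.strip()
        if not name:
            raise ValueError("name must not be empty")
        entity = self.write_model.setdefault(
            command.customer_id, {"id": command.customer_id})
        entity["name"] = name
        self.read_model[command.customer_id] = f"#{command.customer_id} {name}"

    def display_name(self, customer_id: int) -> str:
        """Read side: a query with no validation or domain logic, only a lookup."""
        return self.read_model[customer_id]
```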

Andrew then introduces Event Sourcing which, whilst separate and different from CQRS, does play well with it.  Andrew shows a typical relational model of a purchase order with multiple purchase order line item types related to it and a separate shipping info type attached.  This model only allows us to see the state of the order and its data as it stands right now.  Event sourcing shows the timeline of events against the purchase order as each alteration to the entity is stored separately in an event queue/database.  For example, a line item is added with an (incorrect) quantity of 4, but is then corrected by a later event deducting 2 from the line item, leaving a line item with a correct quantity of 2.  This provides us with the ability to not only see how the data looks “right now”, but to be able to create the entire state of the entity model at any given point in time.
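The line item example translates almost directly into code.  This is a minimal Python sketch of the idea with names of my own (QuantityChanged, replay), not anything from Andrew’s implementation: state is never stored, only derived by replaying events up to a chosen point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantityChanged:
    """One immutable event: a line item's quantity moved by `delta`."""
    item: str
    delta: int


def replay(events, upto=None):
    """Rebuild line-item quantities by applying the first `upto` events in order."""
    quantities = {}
    for event in events[:upto]:
        quantities[event.item] = quantities.get(event.item, 0) + event.delta
    return quantities


history = [
    QuantityChanged("widget", +4),   # added with the wrong quantity
    QuantityChanged("widget", -2),   # later correction
]
```

Replaying the whole history gives the current quantity of 2, whilst replay(history, upto=1) shows the order as it looked before the correction.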

Andrew then proceeds to talk about Azure’s role in his treevue app and how he’s utilised Azure’s Table Storage as a first class citizen.  He then shows us a quick demo and some code using EventProcessors and CommandProcessors which effectively implement the CQRS pattern. 

Finally, Andrew shows how he uses something called a “snapshot” when reading domain aggregates, which is effectively a caching layer used to improve performance around building the domain aggregate models from the various events that make up a specific state of the model as at a certain point in time.  This is particularly important when running applications in the cloud and using such technology as Azure Table Storage, as this will only serve back a maximum of 1000 rows per query before you, as the developer, have to make further requests for more data.  Andrew points out that the demo code is available on GitHub for those interested in diving deeper and learning more from his own implementation.
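A snapshot can be sketched in the same way.  In this Python illustration (my own, with a hypothetical (version, state) pair standing in for whatever Andrew actually stores), loading an aggregate starts from the snapshot and replays only the events recorded after it.

```python
def apply(state, event):
    """Apply one (item, delta) event to a copy of the state."""
    item, delta = event
    updated = dict(state)
    updated[item] = updated.get(item, 0) + delta
    return updated


def load_aggregate(events, snapshot=None):
    """Rebuild state from a snapshot plus only the events recorded after it.

    `snapshot` is a hypothetical (version, state) pair, where version is the
    number of events already folded into state.
    """
    version, state = snapshot if snapshot else (0, {})
    for event in events[version:]:
        state = apply(state, event)
    return state


def take_snapshot(events):
    """Capture the current state together with the event count it covers."""
    return len(events), load_aggregate(events)
```

With a long event history, the saving is that only the tail of the stream needs to be fetched and applied, which matters when each query returns a limited number of rows.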


The final session for today is David Whitney’s “Lessons Learnt running a public API”.  David is a freelance consultant who has worked for many companies writing large public APIs.  The work used for reference during David’s talk is what he did with Just Giving.  David states how the project to build the Just Giving API grew so large that the API eventually became the company’s biggest revenue stream.

David’s talk is a fast-paced set of tips, tricks and lessons that he has personally learned over the many years working with clients developing their large public-facing APIs.

David starts by stating that your API is your public facing contract to the world, and that it will live or die by the strength of its documentation.  If it’s bad, people will write bad implementations, and you can’t blame them when that happens.  Documentation for APIs can either be created first, which then drives the design of the API, or it can be performed the other way around, where you write the API first and document it afterwards.  Either approach is viable, so long as documentation does indeed exist and is sufficiently comprehensive to allow your consumers to build quality implementations of your API.  David says it’s often best to host the docs with the API itself so that if you hit the API endpoint with a web browser as a human user, you’ll serve up the API documentation.

David states that the DTOs returned from API calls should provide “examples” of themselves.  This is a simple mechanism that lets users “discover” your API and helps them to understand just how they should use it.  Code such as this:

public interface IProvideAnExampleOf<TMyself>
{
    ExampleOf<TMyself>[] BuildExample();
}

public class ExampleOf<T>
{
    public string Description { get; set; }
    public T Example { get; set; }

    public ExampleOf(string description, T example)
    {
        Description = description;
        Example = example;
    }
}

will enable your API to provide examples of itself to your users.  David states that anything you can do to help your API consumers will greatly cut down the inevitable avalanche of help requests that will hit you.

Following on from individual examples, it’s good to have your API and its documentation provide “recipes” for how to use large sections of your API and how to call discrete service endpoints in a coherent chain in order to achieve a specific outcome.  Recipes help your users to “fall into the pit of success”.  Providing things like a complete web application, ideally written in multiple languages, that exercises various parts of your API is even better.

David next talks about versioning of your API, and says that it’s something you have to ensure you have a policy on from Day 1.  Retrofitting versioning is very hard and often leads to broken or awkward implementations.  Adding version numbers to the URI is perhaps the easiest to achieve, but it’s not really the best approach.  It’s far better to add the API version in the HTTP header.
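David didn’t show code for this, so the following is only a sketch of one common way of carrying the version in a header: a versioned media type in the Accept header.  The vnd.example media type, the default version and the requested_version function are all assumptions of mine.

```python
import re

DEFAULT_VERSION = 1
# Assumed media type convention, e.g. "application/vnd.example.v2+json".
_VERSION_PATTERN = re.compile(r"application/vnd\.example\.v(\d+)\+\w+")


def requested_version(headers):
    """Read the API version from the Accept header, falling back to the default."""
    accept = {key.lower(): value for key, value in headers.items()}.get("accept", "")
    match = _VERSION_PATTERN.search(accept)
    return int(match.group(1)) if match else DEFAULT_VERSION
```

The appeal of this approach is that the URI of a resource stays the same across versions, and clients that send no version at all still get a well-defined default.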

He continues by talking about modifying existing API calls.  Don’t.  Just don’t do it at any cost!  If you really must, you can add additional data to the return values of your API endpoints, but you must never change or remove anything that’s already there.  You must also never rename anything.  If you need to do any of this, use a new version.  This leads into Content Types, and here David states that you’ll really need to provide all the different content types that people will realistically use.  Whilst many web developers today see JSON as the de-facto standard, many companies – especially large enterprises – are still using XML as their de-facto standard.  Your API is going to have to support both.  David also mentions that JSONP is another, growing, standard that you may well have to support, but be careful if you do, as you’ll need to be mindful of possible errors around CORS (Cross Origin Resource Sharing), which is the mechanism that allows resources such as JavaScript to be called from domains other than the one where the resource is hosted.

David talks about the importance of making Statistics for your API available and public.  You need to ensure you’re gathering performance and other statistics on every method call.  One possibility is returning some statistics back to the consumer directly in the HTTP response header after every request to your API, such as the server name that serviced the consumer’s request.  This is especially useful if you’ve got a large server farm and need help debugging service call issues.  Also you should ensure you publicly expose your statistics in a dashboard via status updates, uptime pages and more.  For one, it’ll help you deflect any criticisms that your performance is broken, and it’ll provide consumers with confidence that your API is up, that it stays up and that you’re on top of maintaining this.  (Unless, of course, your performance really is broken in which case that same fancy dashboard will give you visibility for diagnosing and correcting the issue!).  David next mentions the importance of a good staging server for user testing.  Don’t simply expose an internal “test” server that you may have cobbled together.  David relates first hand experience of just how difficult it can be getting users to stop using your “test” server after you’ve allowed them access!

The next part of the session focuses on the overall approach to design of your API.  David stresses that it’s good to go back and read the original documentation on RESTful architecture, written by Roy Fielding as a doctoral dissertation back in the year 2000.  Further, it’s important to lean on existing conventions – always return canonical URIs rather than relative ones and always supply IDs and URIs when returning data that refers to any domain or service entity.  As well as ensuring you follow existing standards, it’s also important to investigate new, emerging standards too.  Keeping an eye on standards such as HAL (Hypertext Application Language) and JSON API means that, should they quickly become mainstream, you can adapt your API to support them.

David continues his session by talking about the cardinal sins of API design.  First thing you must never do is this:

{
    "PageType": 1,
    "SomeText": "This is some text"
}

What, exactly, is PageType 1?  We’re talking, of course, about magic numbers.  Don’t do it.  This forces your consumers to go off and look it up in the documentation, and whilst that documentation should definitely exist, there’s no reason why you can’t provide a more meaningful value to your consumer.  You have to think like a consumer at all times and try to imagine the applications they’re going to build using your API.  Also, don’t ever ask a user for data that your API itself can’t supply, i.e. don’t ever request some specific identifier for a resource if you don’t provide that identifier when returning that resource in other requests.  Build your services RESTfully, don’t build XML-RPC with SOAP envelopes.  Be resource oriented, and always ensure you use the correct HTTP verbs for all of your services’ actions – especially understand the difference between POST & PUT.
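One easy way of avoiding the magic number is to serialise the name rather than the value.  This is a Python sketch of my own; the talk never said what PageType 1 actually meant, so the Campaign and Profile names are invented purely for illustration.

```python
import json
from enum import Enum


class PageType(Enum):          # hypothetical names; the source never says what 1 means
    CAMPAIGN = 1
    PROFILE = 2


def to_response(page_type, text):
    """Serialise the enum by name so consumers never see a bare number."""
    return json.dumps({"PageType": page_type.name.title(), "SomeText": text})
```

A consumer receiving "PageType": "Campaign" can make sense of the response without reaching for the documentation.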

Make sure you understand multi-tenancy and how that will impact the design and implementation of your API.  Good load balancers and proxies can balance based on request headers, so it’s really easy and useful to provide multi-tenancy in this manner.  Also ensure you use a good sandbox environment for testing and don’t forget to implement good rate limiting!   Users and consumers will make mistakes in their code and you don’t want them to take down your service when they do.

David talks about error handling and says you should validate everything you can when requests are made to your API.  Try to return errors in batches if possible, and always make sure that error messages are useful and readable.  Similar to the magic numbers above, don’t return only an arcane error code to your consumers and force them to have to cross reference it from deep within your documentation.
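Returning errors in batches just means collecting every problem before responding, rather than bailing out at the first one.  Here’s a small Python sketch of that; the donation payload, field names and error codes are hypothetical and not taken from the Just Giving API.

```python
def validate_donation(payload):
    """Collect every problem with the request instead of stopping at the first.

    Field names and error codes here are hypothetical.
    """
    errors = []
    amount = payload.get("amount")
    if (not isinstance(amount, (int, float)) or isinstance(amount, bool)
            or amount <= 0):
        errors.append({"field": "amount", "code": "invalid_amount",
                       "message": "amount must be a number greater than zero"})
    if not payload.get("currency"):
        errors.append({"field": "currency", "code": "missing_currency",
                       "message": "currency is required, e.g. GBP"})
    return errors
```

Each error carries a code for programs and a readable message for people, so the consumer can fix everything in one round trip.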

David moves onto authentication for your API and states that this is an area that can get a bit painful.  Basic HTTP Auth will get you going, and can be sufficient if your API is (and will remain) fairly small scale.  However, if your API is large or likely to grow to a larger scale, and especially if your API will be used by users via third-parties, you’ll quickly grow out of Basic Auth and need something more robust.  He says that OpenAuth is the best worst alternative.  It provides good security but can be painful to implement.  Fortunately, there are many third-party providers out there to whom you can outsource your authorisation concerns.

David then discusses providing support for your API to your users.  He says the best approach is to simply put it all out there in the public domain.  This provides transparency which is a good thing, but can also encourage a “self-service” model where people within the community will start to help provide answers and solutions to other community members.  Something as simple as a Google Group or a tag on Stack Overflow can get you started.

David closes his session by stating that, as your API grows over time, always ensure that you’re never attempting to serve only a single customer.  Keep your API clean and generic and it will remain useful to all consumers, rather than compromising that usefulness for just a minority of users.  And finally, if your API is or will become a first-class product for your business, just as the Just Giving API became for them, make sure you have a full product team within your business to deal with its day to day operation and its ongoing maintenance and development.  It’s all too easy to think that the API isn’t strictly a “product” due to its highly technical and slightly opaque nature, however, doing so would be a mistake.


After David’s session, we all congregated in the main lecture theatre for the wrap up presentation from Andy Westgarth, one of the conference organisers.  This involved thanking the very generous sponsors of the event as without them there simply wouldn’t be a DDD conference, and it also involved a prize giving session – the prizes consisting of books, T-shirts, some Visual Studio headphones and a main prize of a Surface Pro 3!

After the excellent day, I headed to the pub which was very conveniently located immediately across the road from the venue entrance.  I had a few hours to kill until the Geek Dinner which was to be held later that evening at Pizza Express in Leeds’ Corn Exchange.  I enjoyed a couple of pints of Leeds Pale Ale before heading off to the Pizza Express venue for my dinner.

The Geek Dinner was attended by approximately 40 people and a fantastic time was had by all.  I was sat close to one of the day’s earlier speakers, Andrew MacDonald, and we had a good old chin wag about past projects, work, and life as a software developer in general.

Overall, the DDD North 2014 event and the Geek Dinner afterwards were a fantastic success, and a great time was had by all.  Andy promised that there’d be another one in 2015, which will be held back up in the North-East of England due to the alternating location of DDD North, so here’s looking forward to another wonderful DDD North conference in 2015.

DDD East Anglia 2014 Review

Well, it’s that time of year again when a few DDD events come around.  This past Saturday saw the 2nd ever DDD East Anglia, bigger and better than last year’s inaugural event.

I’d set off on the previous night and stayed over on the Friday night in Kettering.  I availed myself of Kettering town centre’s seemingly only remaining open pub, The Old Market Inn (the Cherry Tree two doors down was closed for refurbishment) and enjoyed a few pints before heading back to my B&B.  The following morning, after a hearty breakfast, I set off on the approximately 1 hour journey into Cambridge and to the West Road Concert Hall, the venue for this year’s DDD East Anglia.

After arriving at the venue and registering, I quickly grabbed a cup of water before heading off across the campus to the lecture rooms and the first session of the day.

The first session is David Simner’s “OWIN, Katana and ASP.NET vNext – Eliminating the pain of IIS”.  David starts by summing up the existing problems with Microsoft’s IIS Server, ranging from its cryptic error messages when simply trying to create or add a new website through to differing versions with differing support for features on differing OS versions.  For example, only IIS 8+ supports WebSockets, and IIS 8 requires Windows 8; it can’t be installed on lower versions of Windows.

David continues by calling out “http.sys” - the core of servicing web requests on Windows.  It’s a kernel-space driver that handles the request, looks at the host headers, url etc. and then finds the user space process that will then service the request.  It’s also responsible for dealing with the cryptography layer for SSL packets.  Although http.sys is the “core” of IIS, Microsoft has opened up http.sys to allow other people to use it directly without going through IIS.

David mentions how some existing technologies already support “self-hosting”, meaning they can service http requests without requiring IIS.  These technologies include WebAPI, SignalR etc.  However, the problem with this self-hosting is that these technologies can’t interoperate this way.  For example, SignalR doesn’t work within WebAPI’s self-hosting.

David continues by introducing OWIN and Katana.  OWIN is the Open Web Interface for .NET and Katana is a Microsoft implementation of OWIN.  Since OWIN is open and anyone can write their own implementation of it, this opens up the entire “web processing” service on Windows and allows us to both remove the dependence on IIS as well as have many differing technologies easily interoperate within the OWIN framework.  New versions of IIS will effectively be OWIN “hosts” as well as Katana being an OWIN host.  Many other implementations written by independent parties could potentially exist, too.

David asks why we should care about all of this, and states that OWIN just “gets out your way”: the framework doesn’t hinder you when you’re trying to do things.  He says it simply “does what you want” and that it does this due to its rich eco-system and community providing many custom developments for hosts, middleware, servers and adapters.  (Middleware is the layer that provides a web development framework, e.g. ASP.NET MVC, NancyFX etc., and an adapter is something like System.Web, which serves to pass the raw data from the request coming through http.sys to the middleware layer.)
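To make the “open interface” idea a little more concrete, here’s a minimal sketch of my own (not something David showed) built around the single delegate that the OWIN specification defines: an environment dictionary goes in and a Task comes out.  It uses only the base class library, so no Katana packages are involved.  The key names "owin.RequestPath" and "owin.ResponseStatusCode" come from the OWIN specification, but OwinSketch, NotFoundApp, HelloMiddleware and Handle are names I’ve made up purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// The OWIN application delegate: environment dictionary in, Task out.
using AppFunc = System.Func<System.Collections.Generic.IDictionary<string, object>, System.Threading.Tasks.Task>;

public static class OwinSketch
{
    // A terminal "application" that answers every request with a 404.
    public static Task NotFoundApp(IDictionary<string, object> env)
    {
        env["owin.ResponseStatusCode"] = 404;
        return Task.CompletedTask;
    }

    // Middleware is simply a function from the next AppFunc to a new AppFunc.
    // This one answers "/hello" itself and passes everything else along.
    public static AppFunc HelloMiddleware(AppFunc next)
    {
        return env =>
        {
            if ((string)env["owin.RequestPath"] == "/hello")
            {
                env["owin.ResponseStatusCode"] = 200;
                return Task.CompletedTask;
            }
            return next(env);
        };
    }

    // Hypothetical stand-in for a host: builds an environment,
    // runs the pipeline and reports the resulting status code.
    public static int Handle(AppFunc app, string path)
    {
        var env = new Dictionary<string, object> { ["owin.RequestPath"] = path };
        app(env).GetAwaiter().GetResult();
        return (int)env["owin.ResponseStatusCode"];
    }
}
```

Composing OwinSketch.HelloMiddleware(OwinSketch.NotFoundApp) and passing it to Handle gives a 200 for "/hello" and a 404 for anything else.  Because everything is expressed in plain framework types, any host and any middleware that agree on this one delegate can be plugged together, which is where the interoperability comes from.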

The 2nd half of David’s talk is a demo of writing a simple web application (using VS 2013) that runs on top of OWIN/Katana.  David creates a standard “Web Application” in VS2013, but immediately pulls in the NuGet package OwinHost (this is actually Katana!).  To use Katana, we need a class with the “magic” name of “Startup”, which Katana looks for at startup and runs.  The Startup class has a single void method called Configuration that takes an IAppBuilder argument.  This method runs once per application run and exists to configure the OWIN middleware layer.  This can include such calls as:

var config = new HttpConfiguration(); // configure Web API routes etc. here
app.UseWebApi(config);
app.Use<MyCustomMiddleware>(); // MyCustomMiddleware: your own class that inherits from OwinMiddleware

David starts with writing a test that checks for access to a non-existent page and ensures it returns a 404 error.  In order to perform this test, we can use the WebApp.Start method (part of Microsoft.Owin.Hosting, the Katana implementation of an OWIN host), which allows the test method to effectively start the web processing “process” in code.  The test can then perform things like:

var httpClient = new HttpClient();
var result = await httpClient.GetAsync("http://localhost:5555");
// Assert.Equal takes the expected value first, then the actual one
Assert.Equal(HttpStatusCode.NotFound, result.StatusCode);

Using OWIN in this way, though, can lead to flaky tests due to how TCP ports work within Windows: even when the code has finished executing, it can be a while before Windows will “tear down” the TCP port and allow other code to re-use it.  To get around this, we can use another NuGet package, Microsoft.Owin.Testing, which allows us to effectively bypass sending the http request to an actual TCP port and process it directly in memory.  This means our tests don’t even need to use an actual URL!
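The general idea of answering an http request in memory, without a TCP port ever being opened, can be sketched with nothing but HttpClient and a custom message handler.  To be clear, this is not the testing package itself and not David’s code; InMemoryHandler and InMemoryDemo are hypothetical names, and the pretend “application” inside the handler only knows about the root path.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler: answers requests in-process instead of opening a socket.
public sealed class InMemoryHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Only "/" exists in this pretend application; everything else is a 404.
        var status = request.RequestUri.AbsolutePath == "/"
            ? HttpStatusCode.OK
            : HttpStatusCode.NotFound;
        return Task.FromResult(new HttpResponseMessage(status) { RequestMessage = request });
    }
}

public static class InMemoryDemo
{
    public static async Task<HttpStatusCode> GetStatusAsync(string path)
    {
        // The base address is only a placeholder; nothing ever listens on it.
        using (var client = new HttpClient(new InMemoryHandler())
               { BaseAddress = new Uri("http://localhost/") })
        {
            var response = await client.GetAsync(path);
            return response.StatusCode;
        }
    }
}
```

A test can await InMemoryDemo.GetStatusAsync("/no-such-page") and assert on HttpStatusCode.NotFound.  Since no port is involved, there is nothing for Windows to tear down between test runs.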

David shows how easy it is to write your own middleware layer.  This consists of his own custom class (inheriting from OwinMiddleware) which contains a single method that invokes the next “task” in the middleware processing chain, but then returns to the same method to check that we didn’t take too long to process that next step.  This is easily done as each piece of middleware processing is an async Task, allowing us to do things like:

// inside the Invoke(IOwinContext context) override; Next is the following middleware in the chain
return Next.Invoke(context).ContinueWith(_ => LogIfWeTookTooLong(context));
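Here’s a small, self-contained sketch of that “call the next step, then check how long it took” pattern.  It deliberately avoids the OWIN types so that it runs without any packages: the next step is just a Func<Task>, and TimingSketch, InvokeWithTiming and Warnings are names I’ve invented for the example rather than anything from David’s demo.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class TimingSketch
{
    // Collected warnings; a real application would write to its logger instead.
    public static readonly List<string> Warnings = new List<string>();

    // Wraps the next step in the chain and checks its duration once it completes.
    public static Task InvokeWithTiming(string name, Func<Task> next, TimeSpan limit)
    {
        var stopwatch = Stopwatch.StartNew();
        return next().ContinueWith(_ =>
        {
            stopwatch.Stop();
            if (stopwatch.Elapsed > limit)
            {
                lock (Warnings)
                {
                    Warnings.Add(name + " took " + stopwatch.ElapsedMilliseconds + " ms");
                }
            }
        });
    }
}
```

Calling TimingSketch.InvokeWithTiming("slow", () => Task.Delay(50), TimeSpan.FromMilliseconds(1)) adds one warning once the returned task completes.  One caveat with this shape is that ContinueWith runs its continuation even if the wrapped task fails, so the original exception is not passed on; an async/await version with a try/finally would be a reasonable alternative if that matters.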

Ultimately, the aim with OWIN and Katana is to make EVERYTHING X-copy-able.  Literally no more installing or separately configuring things like IIS.  It can all be done within code to configure your application, which can then be simply x-copied from one place to another.


The next session up is Pete Smith’s “Beyond Responsive Design – UI for the Modern Web Application”

Pete starts by reminding us how we first built web applications for the desktop, then the mobile phone market exploded and we had to make our web apps work well on mobile phones, each of which had their own screen sizes/resolutions etc.  Pete talks about how normal desktop-designed web apps don’t really look good on constrained mobile phone screens.  We first tried to solve it with responsive design, but that often leads to having to support multiple code bases, one for desktop and one for mobile.  Pete says that there are many problems with web apps.  What do we do with all the screen space on a big desktop screen?  There are no real design guidelines or principles.

Pete starts to look at design paradigms on mobile apps and shows how menus work on Android using the Hamburger button that allows a menu to slide out from the side of the screen.  This is doable due to Android devices often having fairly large screens for a mobile device.  However, menus on iPhones (for example), where the screen is much narrower, don’t slide out from the side of the screen but rather slide up from the bottom.  Pete continues through other UI design patterns like dialogs, header bars and property sheets and how they exist for the same reasons, but are implemented entirely differently on desktops and each different mobile device.  Pete states that some of these design patterns work well, such as hamburger menus and flyout property sheets (notifications); however, some don’t work so well, such as dialogs that purposely don’t fill the entire mobile device screen, but keep a small border around the dialog.  Pete says that screen real estate is at a premium on a mobile device, so why intentionally reserve a section of the screen that’s not used?

The homogeneous approach to modern web app development is to use design patterns that work well on both desktop and mobile devices.  Pete uses the new Azure portal as an example, with its concept of “blades” of information that fly out and stack horizontally, but scroll vertically independently of each other.  This is a design paradigm that works well on the desktop and also translates well to mobile device “pages” (think of how Android “pages” have header bars that have back and forward buttons).

Pete then shows us a demo of a fairly simple mock-up of the DDD East Anglia website and shows how the exact same design patterns of a hamburger menu (that flies in from the left) and “property sheets” that fly in from the right (used for speaker bios etc.) work exactly the same (with responsive design for the widths etc.) on both a desktop web app and on mobile devices such as an iPad.

Pete shows us the code for his sample application, showing some LESS stylesheets, which he says are invaluable for laying out an application like this as the actual page layout is best achieved by absolutely positioning many of the page elements (the hamburger menu, the header bar, the left-hand menu etc.) using LESS mixins.  The main page uses HTML5 semantic markup and simply includes the headerbar and the menu icons on it, the left-hand menu (that by default is visible on devices with an appropriate width) and an empty <main> section that will contain the individual pages that will be loaded dynamically with JavaScript.

Pete finishes by showing a “full-blown” application that he’s currently writing for his client company to show that this set of design paradigms does indeed scale to a complete large application!  Pete is so passionate about bringing a comprehensive set of working design guidelines and paradigms to the wider masses that he’s started his own open working group to do this, called OWAG – The Open Web Apps Group.  They can be found at:  http://www.github.com/owag


The next session is Matt Warren’s “Performance is a feature!” which tells us that performance of our applications is a first-class feature which should be treated the same as usability and all other basic functionality of our application.  Performance can be applied at every layer of our application from the UI right down to the database or even the “raw metal” of our servers.  However, Matt’s talk focuses on extracting the best performance from the .NET CLR (Common Language Runtime).  Matt does briefly touch upon the raw metal, which he calls the “Mechanical Sympathy” layer, and suggests looking into the Disruptor pattern, which allows certain systems (for example, high frequency trading applications) to scale to processing many millions of messages per second!

Matt uses Stack Overflow as a good example of a company taking performance very seriously, and cites Jeff Atwood’s blog post, “Performance is a feature”, as well as some humorous quotations (see images) as something that can provide inspiration for improvement.

Matt starts by asking Why does performance matter?, What do we need to know? and When do we need to optimize performance?

The Why starts by stating that it can save us money.  If we’re hosting in the cloud where we pay per hour, we can save money by extracting more performance from fewer resources.  Matt continues to say that we can also save power by increasing performance (and money too as a result) and furthermore, bad performance can lead to broken applications or lost customers if our applications are slow.

Matt does suggest that we need to be careful and land somewhere in the middle of the spectrum between “optimizing everything all the time” (which can back us into a corner) and “don’t optimize anything” (the extreme end of the “premature optimization is the root of all evil” approach).  Matt mentions various quotes by famous software architects, such as Rico Mariani from Microsoft who states “Never give up your performance accidentally”.

Matt continues with the “What”.  He starts by saying that “averages are bad” (such as “average response time”); we need to look at the edge cases and the outlier values.  We also need useful and meaningful metrics and numbers around how we can measure our performance.  For web site response times, we can say that most users should see pages load in 0.5 to 1.5 seconds, and that almost no-one should wait longer than 3 seconds.  However, how do we define “almost no-one”?  We need absolute numbers to ensure we can accurately measure and profile our performance.  Matt also states that there’s a known fact that if only 1% of pages take (for example) more than 3 seconds to load, much more than 1% of users will be affected by this!
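Both of those points are easy to see with a few lines of code.  The sketch below is my own illustration rather than anything from Matt’s slides: Percentile uses the simple nearest-rank method, and ChanceOfSlowPage assumes that every page load is independently slow with the same probability, which is a simplification.  The names LatencyStats, Percentile and ChanceOfSlowPage are made up for the example.

```csharp
using System;
using System.Linq;

public static class LatencyStats
{
    // Nearest-rank percentile: the smallest sample that has at least
    // p percent of all samples at or below it.
    public static double Percentile(double[] samples, double p)
    {
        var sorted = samples.OrderBy(x => x).ToArray();
        int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
        return sorted[Math.Max(rank, 1) - 1];
    }

    // Chance that a user who loads `pageViews` pages hits at least one slow page,
    // assuming each load is independently slow with probability `slowFraction`.
    public static double ChanceOfSlowPage(double slowFraction, int pageViews)
    {
        return 1.0 - Math.Pow(1.0 - slowFraction, pageViews);
    }
}
```

Take 100 page loads where 98 take 0.5 seconds and 2 take 10 seconds.  The average is about 0.69 seconds and the median is 0.5 seconds, both of which look healthy, yet the 99th percentile is 10 seconds.  Likewise, if 1% of page loads are slow and a typical visitor views 20 pages, ChanceOfSlowPage(0.01, 20) comes out at roughly 0.18, so around 18% of visitors would see at least one slow page under that assumption.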

Matt continues with the When?  He says that we absolutely need to measure our performance within our production environment.  This is totally necessary to ensure that we’re measuring based upon “real-world” usage of our applications and everything that entails. 

Matt talks about the How? of performance.  It’s all about measuring.  Measure, measure, measure!  Matt mentions the Stack Overflow-developed “MiniProfiler” for measuring where the time is spent when rendering a complete webpage, as well as OpServer, which will profile and measure the actual servers that serve up and process our application.  Matt talks about micro-benchmarking, which is profiling small individual parts of our code, often just a single method.  He warns to be careful of the GC (Garbage Collector) as this can and will interfere with our measurements, and shows some code involving forcing a GC.Collect() before timing the code (usually using a Stopwatch instance), which can help.  He states that allocating memory is cheap, but cleaning up after memory is released isn’t.  Another tool that can help with this is Microsoft’s “PerfView” tool, which can be run on the server and will show (amongst lots of other useful information) how and where the Garbage Collector is being called to clean up after you.
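I didn’t note down Matt’s exact code, but a minimal sketch of that kind of micro-benchmark might look something like the following.  The MicroBenchmark class and its Measure method are my own hypothetical names, and the warm-up call is an addition of mine so that JIT compilation isn’t counted in the timing.

```csharp
using System;
using System.Diagnostics;

public static class MicroBenchmark
{
    public static TimeSpan Measure(Action action, int iterations)
    {
        action(); // warm-up call so JIT compilation is not part of the timing

        // Start from a clean heap so that earlier allocations are less likely
        // to trigger a collection in the middle of the timed loop.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Usage is along the lines of MicroBenchmark.Measure(() => SomeMethodUnderTest(), 100000), where SomeMethodUnderTest is whatever you want to time.  Forcing a collection up front doesn’t stop the GC from running during the loop if the code under test allocates heavily, so it reduces the noise rather than removing it.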

Matt finishes up by saying that static classes, although often frowned upon for other reasons, can really help with performance improvements.  He says to not be afraid to write your own tools, citing Stack Overflow’s “Dapper” and “Jil” tools to perform their own database access and JSON processing, which has been, performance-wise, far better for them than other similar tools that are available.  He says the main thing, though, is to “know your platform”.  For us .NET developers, this is the CLR, and understanding its internals on a fundamental and deep level is essential for really maximizing the performance of our own code that runs on top of it.  Matt talks, finally, about how the team at Microsoft learned a lot of performance lessons when building the Roslyn compiler and how some seemingly unnecessary code can greatly help performance.  One example was a method writing to a log file and that adding .ToString() to int values before passing to the logger can prevent boxing of the values, thus having a beneficial knock-on effect on the Garbage Collector.
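To show what that last example looks like in code, here’s a small sketch of the two call shapes.  It’s an illustration under my own assumptions rather than the Roslyn team’s code: BoxingSketch and its methods are hypothetical, with Format standing in for a logger that accepts loosely-typed arguments.

```csharp
using System;

public static class BoxingSketch
{
    // Hypothetical logger: a format string plus loosely-typed arguments.
    public static string Format(string template, params object[] args)
    {
        return string.Format(template, args);
    }

    public static string LogBoxed(int count)
    {
        // The int is boxed here: a small heap object is allocated just to wrap it.
        return Format("Processed {0} items", count);
    }

    public static string LogUnboxed(int count)
    {
        // A string is already a reference type, so no box is created.
        return Format("Processed {0} items", count.ToString());
    }
}
```

Both methods produce exactly the same text.  Whether the second form is a net win depends on the runtime and on how the logger formats its arguments, so it’s something to measure rather than assume.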


After Matt’s talk it was time for lunch.  As is the custom at these events, lunch was the usual brown-bag affair with a sandwich, a packet of crisps, some fruit and a bottle of water.  There were some grok talks happening over lunch in the main concert hall, and I managed to catch one given by Iris Classon on Windows Universal application development which is developing XAML based applications for both Windows desktop and Windows Phone.



After lunch is Mark Rendle’s “The vNext Big Thing – ASP.NET shrinks down and grows up”.  Mark’s talk is all about the next version of ASP.NET that is currently in development at Microsoft.  The entire redevelopment is based around slimming down ASP.NET and making the entire framework as modular and composable as possible.  This is largely as a response to other web frameworks that already offer this kind of platform, such as NodeJS.  Mark even calls it NodeCS!

Mark states that they’re making a minimalist framework and runtime and that it’s all being developed as fully open source.  It’s built so that everything is shippable as a NuGet package, and it’s all being written to use runtime compilation using the new Roslyn compiler.  One of the many benefits that this will bring is the ability to “hot-swap” components and assemblies that make up a web application without ever having to stop and re-start the application!  Mark gives the answer to “Why are Microsoft doing this?” by stating that it’s all about helping versioning of .NET frameworks, making the ASP.NET framework modular, so you only need to install the bits you need, and improving the overall performance of the framework.

The redevelopment of ASP.NET starts with a new CLR.  This is the “CoreCLR”.  This is a cut-down version of the existing .NET CLR and strips out everything that isn’t entirely necessary for the most “core” functions.  There’s no “System.Web” in the ASP.NET vNext version.  This means that there’s no longer any integrated pipeline and it also means that there’s no longer any ASP.NET WebForms!

As part of this complete re-development effort, we’ll get a brand new version of ASP.NET MVC.  This will be ASP.NET MVC 6.  The major new element to MVC 6 will be the “merging” of MVC and WebAPI.  They’ll now be both one and the same thing.  They’ll also be built to be very modular and MVC will finally become fully asynchronous just as WebAPI has been for some time already.  Due to this, one interesting thing to note is that the ubiquitous “Controller” base class that all of our MVC controllers have always inherited from is now entirely optional!

Mark continues by taking a look at another part of the complete ASP.NET re-boot.  Along with new MVC’s and WebAPI’s, we’ll also get a brand new version of the Entity Framework ORM.  This is Entity Framework 7 and most notable about this is that the entire notion of database first (or designer-driven) database mapping is going away entirely!  It’s code-first only!  There’ll also be no ADO.NET and Entity Framework will now finally feature first-class support for non-SQL databases (i.e. NoSQL/Document databases, Azure Tables).

The new version of ASP.NET will bring with it lots of command line tooling, and there’s also going to be first class support for both Mac and Linux.  The goal, à la NodeJS, is to be able to write your entire application in something as simple as a text editor, with all of the application and configuration code in simple text-based code files.  Of course, the next version of Visual Studio (codenamed Visual Studio 14) will have full support for the new ASP.NET platform.  Mark also details how the configuration of applications developed with ASP.NET vNext will no longer use XML (or even a web.config).  They’ll use the currently popular JSON format instead, inside a new “config.json” file.

Mark proceeds by showing us a quick demo of the various new command line tools which are all named starting with the letter K.  There’s KVM, which is the K Version Manager and is used for managing different versions of the .NET runtime and framework.  Then there is KPM which is the K Package Manager, and operates similar to many other package managers, such as NodeJS’s “npm”, and allows you to install packages and individual components of the ASP.NET stack.  The final command line tool is K itself.  This is the K Runtime, and its command line executable is simply called “K”.  It is a small, lightweight process that is the runtime core of ASP.NET vNext itself. 

Mark then shows us a very quick sample website that consists of nothing more than 2-3 lines of JSON configuration, only 1 line of real actual code (a call to app.UseStaticFiles() within the Startup class’s “Configure” method) and a single file of static html, and the thing is up and running, writing the word “Hurrah” to the page.  The Startup.cs class is effectively a single class replacement for the entire web.config and the entire contents of the App_Start folder!  The Configure method of the Startup class is effectively a series of calls to various .UseXXX methods on the app object.


Mark shows us where all the source code is.  It’s all right there on public GitHub repositories and the current compiled binaries and packages can be found on myget.org.  Mark closes the talk by showing the same simple web app from before, but now demonstrating that this web app, written using the “alpha” bits from ASP.NET vNext, can be run on an Azure website instance quite easily.  He commits his sample code to a GitHub repository that is linked to auto-deploy to a newly created Azure website and lets us watch as Azure pulls down all the required NuGet packages, compiles his simple web application in real-time and spins up the website in his browser!


The final talk of the day is Barbara Fusinska’s “Architecture – Why so serious?” talk.  This talk is about Barbara’s belief that all software developers should be architects too.  She starts by asking “What is architecture?”.  There are a number of answers to this question, depending upon who you ask.  Network distribution, Software Components, Services, APIs, Infrastructure, Domain Design.  All of these and more can be a part of architecture.

Barbara says her talk will be given by showing a simple demo application called “Let’s go out” which is a simple scheduler application.  She will show how architecture has permeated all the different parts of the application.  Barbara starts with the “basics”.  She broaches the subject of application configuration and says how it’s best to start as you mean to go on by using an IoC Container to manage the relationships and dependencies between objects within the application.

She continues by saying that one of the biggest and most fundamental problems of virtually all applications is how to pass data between the code of our application and the database, and vice-versa.  She mentions ORMs and suggests that the traditional large ORMs are often far too complicated and can frequently bog us down with complexity.  She suggests that the more modern Micro-ORMs (of which there are Dapper, PetaPoco & Massive amongst others) offer a better approach and are a much more lightweight layer between the code and the data.  Micro-ORMs “bring SQL to the front” which is, after all, what we use to talk to our database.  Barbara suggests that it’s often better to not attempt to entirely abstract the SQL away or attempt to hide it too much, as can often happen with a larger, more fully-featured ORM tool.  On the flip-side, Barbara says that full-blown ORMs will provide us with an implicit unit of work pattern implementation and are better suited to Domain driven design within the database layer.  For Barbara’s demo application, she uses Mark Rendle’s Simple.Data micro-ORM.

Barbara says that the Repository pattern is really an anti-pattern and that it doesn’t really do much for your application.  She talks about how repositories often will end up with many, many methods that are effectively doing very similar things, and are used in only one place within our application.  For example, we often end up with “FindCustomersByID”, “FindCustomersByName”, “FindCustomerByCategory” etc. that all effectively select data from the customers database table and only differ by how we filter the customers.

Barbara shows how her own “read model” is a single class that deals with only reading data from the database and actually lives very close to the code that will use it, often an MVC controller action.  This is similar to a CQRS pattern and the read model is very separate and distinct from the domain model.  Barbara shows how she uses a “command pattern” to provide the unit of work and the identity pattern for the ORM.  Barbara talks about the Services within her application and how these are very much all based upon the domain model.  She talks about only exposing a method to perform some functionality, rather than exposing properties for example.  This applies not just to the user, but to other programmers who might have access to our classes.  She makes the property accessors private to the class and only allows access to them via a public method.  She shows how her application allows moving a schedule entry, but the business rules should only allow it to be moved forward in time.  Exposing DateTime properties would allow setting any dates and times, including those in the past, thus violating the domain rules.  By only allowing these properties to be set via a public method, which performs this domain validation, the setting of the dates and times can be better controlled.
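As a rough sketch of that idea (my own reconstruction, not Barbara’s actual code), a schedule entry could keep its setter private and expose a single method that enforces the “forward in time only” rule.  The ScheduleEntry class, its Start property and the MoveTo method are names I’ve assumed for the example, and I’ve left the getter public so the value can still be read.

```csharp
using System;

public class ScheduleEntry
{
    // Readable from outside, but only this class can change it.
    public DateTime Start { get; private set; }

    public ScheduleEntry(DateTime start)
    {
        Start = start;
    }

    // The only way to change the date; enforces the "forward in time only" rule.
    public void MoveTo(DateTime newStart)
    {
        if (newStart <= Start)
        {
            throw new InvalidOperationException(
                "A schedule entry can only be moved forward in time.");
        }
        Start = newStart;
    }
}
```

With this shape, any attempt to move an entry to an earlier date fails at the one place where the rule lives, instead of relying on every caller to remember to check.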

Barbara says that the Command pattern is actually a better approach than using Services as it can greatly reduce dependencies within things like MVC Controllers.  Consider a controller whose purpose is to provide a mechanism to work with Customers, the orders placed by those customers and the activity on those orders.  Rather than having dependencies on multiple services like this:

public MyCustomerOrderController(ICustomerService customerService, IOrderService orderService, IActivityService activityService)

we can, instead, “wrap” these services up into commands.  These commands will, internally, use multiple services to implement a single domain “command” like so:

public MyCustomerOrderController(IAddActivityToCustomerOrderCommand addActivityCommand)

This provides a single domain command to perform the specific domain action.  It means that the MVC Controller that’s used for the UI that allows customers to be added to activities only has one dependency, the Command class itself.
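To flesh that out a little, here’s a self-contained sketch of how such a command might wrap the three services.  Only the interface and controller names come from the talk; the methods on the interfaces (Exists, BelongsTo, Record, Execute), the validation rules and the FakeServices class are all assumptions of mine, and the controller is a plain class rather than a real MVC controller so that the example needs no packages.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical service interfaces; the method names are invented for this sketch.
public interface ICustomerService { bool Exists(int customerId); }
public interface IOrderService { bool BelongsTo(int orderId, int customerId); }
public interface IActivityService { void Record(int orderId, string activity); }

public interface IAddActivityToCustomerOrderCommand
{
    void Execute(int customerId, int orderId, string activity);
}

// The command hides the three services behind one domain-level operation.
public class AddActivityToCustomerOrderCommand : IAddActivityToCustomerOrderCommand
{
    private readonly ICustomerService _customers;
    private readonly IOrderService _orders;
    private readonly IActivityService _activities;

    public AddActivityToCustomerOrderCommand(
        ICustomerService customers, IOrderService orders, IActivityService activities)
    {
        _customers = customers;
        _orders = orders;
        _activities = activities;
    }

    public void Execute(int customerId, int orderId, string activity)
    {
        if (!_customers.Exists(customerId))
            throw new ArgumentException("Unknown customer.");
        if (!_orders.BelongsTo(orderId, customerId))
            throw new ArgumentException("Order does not belong to this customer.");
        _activities.Record(orderId, activity);
    }
}

// The controller is left with a single constructor dependency.
public class MyCustomerOrderController
{
    private readonly IAddActivityToCustomerOrderCommand _addActivityCommand;

    public MyCustomerOrderController(IAddActivityToCustomerOrderCommand addActivityCommand)
    {
        _addActivityCommand = addActivityCommand;
    }

    public void AddActivity(int customerId, int orderId, string activity)
    {
        _addActivityCommand.Execute(customerId, orderId, activity);
    }
}

// In-memory stand-in for all three services, handy for trying the sketch out.
public class FakeServices : ICustomerService, IOrderService, IActivityService
{
    public readonly List<string> Recorded = new List<string>();
    public bool Exists(int customerId) { return customerId == 1; }
    public bool BelongsTo(int orderId, int customerId) { return orderId == 10 && customerId == 1; }
    public void Record(int orderId, string activity) { Recorded.Add(orderId + ": " + activity); }
}
```

The three services haven’t gone away, of course; they’ve moved behind the command.  The benefit is that the controller, and any test written against it, now only needs to know about one thing.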


With the final session over, it was time to head back to the main concert hall to wrap up the day’s proceedings, thank all those who were involved in the event and to distribute the prizes, generously donated by the various event sponsors.  No prizes for me this time around, although some very lucky attendees won quite a few prizes each!

After the wrap up there was a drinks reception in the same concert hall building, however, I wasn’t able to attend this as I had to set off on the long journey back home.  It was another very successful DDD event, and I can’t wait until they do it all over again next year!

DDD North 2013 In Review


On Saturday 12th October 2013, in a slightly wet and windy Sunderland, the 3rd DDD North Developer conference took place.  DDD North events are free one day conferences for .NET and the wider development community, run by developers for developers.  This was the 3rd DDDNorth, and my 3rd DDD event in general (I’d missed the first DDD North, but did get to attend DDD East Anglia earlier this year) and this year’s DDDNorth was better than ever.


The day started when I arrived at the University Of Sunderland campus.  I was travelling from Newcastle after having travelled to the North-East on the Friday evening beforehand.  I’m lucky in that I have in-laws in Newcastle so was staying with them for the duration of the weekend making the journey to Sunderland fairly easy.  Well, not that easy.  I don’t really know Sunderland so I’d had to use my Sat-Nav which was great until we got close to the City Centre at which point my Sat-Nav took me on an interesting journey around Sunderland’s many roundabouts! :)


I eventually arrived at the Sir Tom Cowie Campus at the University of Sunderland and parked my car, thanks to the free (and ample) car parking provided by the university.


I’d arrived reasonably early for the registration, which opened at 8:30am, however there was still a small queue which I dutifully joined to wait to be signed in.  Once I was signed in, it was time to examine the goodie bag that had been handed to me upon entrance to see what was inside.  There was some promotional material from some of the great sponsors of the event as well as a pen (very handy, as I always forget to bring pens to these events!) along with other interesting swag (the pen-cum-screwdriver was a particularly interesting item).


The very next task was to find breakfast!  Again, thanks to some of the great sponsors of DDDNorth, the organisers were able to put on some sausage and bacon breakfast rolls for the attendees.  This was a very welcome addition to the catering that was provided last time around at DDD North.



Once the bacon roll had been acquired, I was off to find perhaps the most important part of the morning’s requirements.  Caffeine.  Now equipped with a bacon roll and a cup of coffee, I was ready for the long but very exciting day of sessions ahead of me.


DDD North is somewhat larger than DDD East Anglia (although the latter will surely grow over time) so whereas DDD East Anglia had 3 parallel tracks of sessions, DDD North has 5!  This can frequently lead to difficulties in deciding which session to attend but it is really testament to the variety and quality of the sessions at DDD North.  So, having made the difficult choices of which sessions to attend, I headed off to the room for my first session.


The first session up was Phil Trelford’s F# Eye 4 the C# Guy.  This session was one of three sessions during the day dedicated to F#.  Phil’s session was aimed at developers currently using C# and he starts off by saying that, although F# offers some advantages over C#, there’s no “one true language” and it’s often the correct approach to use a combination of languages (both C# and F#) within a single application.  Phil goes on to talk about the number and variety of companies that are currently using and taking advantage of the features of F#.  F# was used within Halo 3 for the multi-player component which uses a self-improving machine learning algorithm to monitor, rate and intelligently match players with similar abilities together in games.  This same algorithm was also tweaked and later used within the Bing search engine to match adverts to search queries.  Phil also shares with us a quotation from a company called Kaggle who were previously predominantly a C# development team and who moved a lot of their C# code to F# with great success.  They said that their F# was “consistently shorter, easier to read, easier to refactor and contained far fewer bugs” compared to the equivalent C# code.


Phil talks about the features of the F# language next.  It’s statically typed and multi-paradigm.  Phil states that it’s not entirely a functional language, but is really “functional first” and is also object-oriented.  It’s also completely open source!  Phil’s next step is to show a typical class in C#, the standard Person class with Name and Age properties:


public class Person
{
    private string _Name;
    private int _Age;

    public Person(string name, int age)
    {
        _Name = name;
        _Age = age;
    }

    public string Name
    {
        get { return _Name; }
        set { _Name = value; }
    }

    public int Age
    {
        get { return _Age; }
        set { _Age = value; }
    }

    public override string ToString()
    {
        return string.Format("{0} {1}", _Name, _Age);
    }
}


Phil’s point here is that although this is a simple class with only two properties, the number of times that the word “name” or “age” is repeated is excessive.  Phil calls this the “Local Government Pattern” as everything has to be declared in triplicate! :)  Here’s the same class, with the same functionality, but written in F#:


namespace People

type Person (name, age) = 
    member person.Name = name
    member person.Age = age

    override person.ToString() = 
        sprintf "%s %d" name age


Much shorter, and with far less repetition.  But it can get far better than that.  Here’s the same class again (albeit minus the .ToString() override) in a single line of F#:


type Person = { Name: string; Age: int }


Phil continues his talk to discuss how, being a fully-fledged, first class citizen of a language in the .NET world, F# code and components can fully interact with C# components, and vice-versa.  F# also has the full extent of the .NET Framework at its disposal, too.  Phil shows some more F# code, this one being something called a “discriminated union”:


type Shape = 
      | Circle of float 
      | Square of float 
      | Rectangle of float * float 

I’d come across discriminated unions before, but as an F# newbie, I only barely understood them.  Something that really helped me at least, as a C# guy, was when Phil explained the IL that is generated from the code.  In the above example, the Shape class is defined as an abstract base class and the Circle, Square and Rectangle classes are concrete implementations of the abstract Shape class!  Although thinking of these unions as base and derived classes isn’t strictly true when thinking of F# and its functional-style paradigm, it certainly helped me in mentally mapping something in F# back to the equivalent concept in C# (or a more OOP-style language).
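For fellow C# folk, here’s roughly how I picture that mapping, written out by hand.  This is only a mental model and not the exact code the F# compiler emits (the real thing includes extra members such as tags and factory methods).  I’m assuming a square carries a single side length and a rectangle carries a width and a height, and the property names and the Area helper are my own additions.

```csharp
using System;

public abstract class Shape
{
    // Private constructor: only the nested cases below can derive from Shape,
    // which mimics the closed set of cases in a discriminated union.
    private Shape() { }

    public sealed class Circle : Shape
    {
        public double Radius { get; }
        public Circle(double radius) { Radius = radius; }
    }

    public sealed class Square : Shape
    {
        public double Side { get; }
        public Square(double side) { Side = side; }
    }

    public sealed class Rectangle : Shape
    {
        public double Width { get; }
        public double Height { get; }
        public Rectangle(double width, double height) { Width = width; Height = height; }
    }

    // The C# counterpart of pattern matching over the union's cases.
    public static double Area(Shape shape)
    {
        if (shape is Circle c) return Math.PI * c.Radius * c.Radius;
        if (shape is Square s) return s.Side * s.Side;
        if (shape is Rectangle r) return r.Width * r.Height;
        throw new ArgumentException("Unknown shape.", nameof(shape));
    }
}
```

Seeing how much ceremony is needed to get something close to a four-line F# type definition goes a long way towards explaining the appeal of discriminated unions.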


Phil continues by mentioning some of the best ways to get up to speed with the F# language.  One of Phil’s favourite methods for complete F# newbies, is the F# Koans GitHub repository.  Based upon the Ruby Koans, this repository contains “broken” F# code that is covered by a number of unit tests.  You run the unit tests to see them fail and your job is to “fix” the broken code, usually by “filling in the blanks” that are purposely left there, thereby allowing the test to pass.  Each time you fix a test, you learn a little more about the F# syntax and the language constructs.  I’ve already tried the first few of these and they’re a really good mechanism for a beginner to use to get to grips with F#.  Phil states that he uses these Koans to train new hires in F# for the company he works for.  Phil also gives a special mention to the tryfsharp.org website which also allows newbies to F# to play with the language.  What’s special about tryfsharp.org is that you can try out the F# language entirely from within your web-browser, needing no other installed software on your PC.  It even contains full IntelliSense!


Phil’s talk continues with a discussion of meta-programming and F#’s “quotations”.  These are similar to C#’s Expressions but more powerful.  They’re a more advanced subject (and worthy of a talk all of their own no doubt) but effectively allow you to represent F# code in an expression tree which can be evaluated at runtime.  From here, we dive into BDD and testing of F# code in general.  Phil talks about a BDD library (his own, called TickSpec) and how even text-based BDD test definitions are much more terse within F# than the equivalent C# BDD definitions (See the TickSpec homepage for some examples of this).  Not only that, but Phil shows a remarkable ability to debug his BDD text-based definitions within the Visual Studio IDE, including setting breakpoints, running the program in debug mode and breaking in his BDD text file!  He also tells a story of how he was able, with a full suite of unit and BDD tests wrapped around the code, to convert a 30,000+ line C# code base into a 200 line F# program that not only perfectly replicated the behaviour of the C# program, but was actually able to deliver even more, all within less than 1/100th of the lines of code!


Phil shows us his “Cellz” spreadsheet application written in F# next.  He says it’s only a few hundred lines of code and is a good example of a medium sized F# program.  He also states that his implementation of the code that parses and interprets user-defined functions within the spreadsheet “cell” is sometimes as little as one line of code!  We all ponder as to whether Excel’s implementations are as succinct! :)  As well as Cellz, there are a number of other projects of Phil’s that he tells us about.  One is a mocking framework, similar to C#’s Moq library, which of course, had to be called Foq!  There is also a “Mario” style game that we are shown that was created with the FunScript set of type providers allowing JavaScript to be created from F# code.  Phil also shows us a PacMan clone, running in the browser, created with only a few hundred lines of F# code.


Nearing the end of Phil’s talk, he shows us some further resources for our continued education, pointing out a number of books that cover the F# language.  Some specific recommendations are “Programming F#” as well as Phil’s own book (currently in early-access), “F# Deep Dives” which is co-authored by Tomas Petricek (whom I’d seen give an excellent talk on F# at DDD East Anglia).  Finally, Phil mentions that, although F# is a niche language with far fewer F# programmers than C# programmers, it’s a language that can command some impressive salaries! :)  Phil shows us a slide that indicates the UK average salary of F# programmers is almost twice that of a C# programmer.  So, there may not be as much demand for F# at the moment, but with that scarcity comes great rewards! :)


Overall, Phil’s talk was excellent and very enlightening.  It certainly helped me as a predominantly C# developer to get my head around the paradigm shift that is functional programming.  I’ve only scratched the surface so far, but I’m now curious to learn much more.



After a quick coffee break back in the main hall of the campus (during which time I was able to snaffle a sausage baguette which had been left over from the morning breakfast!), I headed off to one of the largest rooms being used during the entire conference for my next session.  This one was Kendall Miller’s Scaling Systems: Architectures That Grow.



Kendall opens his talk by saying that the session will be entirely technology agnostic.  He says that what he’s about to talk about are concepts that can apply right across the board and across the complete technology spectrum.  In fact, the concepts that Kendall is about to discuss regarding scalability, in terms of how to achieve it and the things that can prevent you achieving it, are not only technology agnostic, but they haven’t changed in over 30 years of computing!


Kendall first asks, “What is scalability?”  Scaling is the ability for a system to cope under a certain demand.  That demand is clearly different for different systems.  Kendall shows us some slides that differentiate between the “big boys” such as Amazon, Microsoft, Twitter etc., who are scaling to anything between 30-60 million unique visitors per day and those of us mere mortals that only need to scale to a few thousand or even hundred users per day.  If we have a website that needs to handle 25,000 unique visitors per day, we can calculate that this is approximately 125,000 pages per day.  In the USA, there’s around 11 “high traffic” hours (these are the daytime hours, but spread across the many time zones of North America).  This gives us a requirement of around 12,000 pages/hour, and that divides down to only 3.3 pages per second.  This isn’t such a large amount to expect of our webserver and, in the grand scheme of things, is effectively “small fry” and should be easily achievable in any technology.  If we’re thinking about how we need our own systems to scale, it’s important to understand what we’re aiming for.  We may actually not need all that much scalability!  Scalability costs money, so we clearly don’t need to aim for scalability to millions of daily visitors to our site if we’re only likely to ever attract a few thousand.
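
The back-of-the-envelope sums above are easy to reproduce.  Here’s a small sketch (in Python, chosen only for brevity); the figure of 5 pages per visitor is my own inference from 125,000 pages for 25,000 visitors, and the function name is made up for illustration:

```python
def traffic_estimate(visitors_per_day, pages_per_visitor, high_traffic_hours):
    """Return (pages per day, pages per hour, pages per second)."""
    pages_per_day = visitors_per_day * pages_per_visitor
    pages_per_hour = pages_per_day / high_traffic_hours
    pages_per_second = pages_per_hour / 3600
    return pages_per_day, pages_per_hour, pages_per_second


# 25,000 visitors, an assumed 5 pages each, spread over 11 busy hours.
pages_per_day, pages_per_hour, pages_per_second = traffic_estimate(25_000, 5, 11)
```

Unrounded, this works out at roughly 11,400 pages per hour and a little over 3 pages per second; rounding the hourly figure up to 12,000, as Kendall’s slide did, gives the 3.3 quoted above.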


We then ask, “What is availability?”  Availability is having a request completed within a given amount of time.  It’s important to think about the different systems that this can apply to and the relative time that users of those systems will expect for a request to be completed.  For example, simply accessing a website from your browser is a request/response cycle that’s expected to be completed within a very short amount of time.  Delays here can turn people away from your website.  Contrast this with (for example) sending an email.  Here, it’s expected that email delivery won’t necessarily be instantaneous and the ability of the “system” in question to respond to the user’s request can take longer.  Of course, it’s expected that the email will eventually be delivered otherwise the system couldn’t be said to be “available”!


Regarding websites, Kendall mentions that in order to achieve scalability we need only concern ourselves with the dynamic pages.  Our static content should be inherently scalable in this day and age as scaling static content has long been a “solved problem”.  Geo-located CDNs can help in this regard and have been used for a long time.  Kendall tells us that achieving scalability is simple in principle, but obviously much harder to implement in practice.  That said, once we understand the principles required for scalability, we can seek to ensure our implementations adhere to them.


There are only 3 things required to make us scale.  And there’s only 1 thing that prevents us from scaling!


Kendall then introduces the 4 principles we need to be aware of:  ACD/C. 


This acronym is explained as Asynchronicity, Caching, Distribution & Consistency.  The first three are the principles which, when applied, give us scalability.  The last one, Consistency (or at least the need for our systems to remain in a consistent internal state) is the one that will stand in the way of scalability.  Kendall goes on to elaborate on each of the 4 principles, but he also re-orders them in the order in which they should be applied when attempting to implement scalability in a system that perhaps has none already.  We need to remember that scalability isn’t finite and that we need to ensure we work towards a scalability goal that makes sense for our application and its demands.


Kendall first introduces us to our own system’s architecture.  All systems have this architecture he says…!   Must admit, it’s a fairly popular one:




Kendall then talks about the principles we should apply, and the order in which we should apply them to an existing system in order to add scalability.


The first principle to add to a system is Caching.  Caching is almost always the easiest to start with to introduce some scalability in a system/application that needs it.  Caching is saving or storing the results of earlier work so that it can be reused at some later point in time.  After all, the very best performing queries are those ones that never have to be run!  Sometimes, caching alone can prevent around 99% of processing that really needn’t be done (i.e. a request for a specific webpage may well serve up the same page content over a long period of time, thus multiple requests within that time-scale can serve up the cached content).  Caching should be applied in front of everything that is time consuming and it’s easiest to apply in a left-to-right order (working from adding a cache in front of the web server, through to adding one in front of the application server, then finally the database server).


Once in place, the caches can use very simple strategies, as these can be incredibly effective despite their simplicity.  Microsoft’s Entity Framework uses a strategy that removes all cached entries as soon as a user commits a write (add/update/delete) to the database.  Whilst on the surface it may seem excessive to eradicate the entire cache, it’s really not, as in the vast majority of systems, reads from the database outnumber writes by an order of magnitude.  For this reason, the cache is still incredibly effective and is still extensively re-used in real-world usage.  We’re reminded that applications ask lots of repeated questions.  Stateless applications even more so, but the answers to these questions rarely change.  Authoritative information, such as the logged on user’s name, is expensive to repeatedly query for as it’s required so often.  Such information is the prime candidate to be cached.
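
To make the “throw everything away on a write” strategy concrete, here’s a minimal sketch in Python.  It is not how Entity Framework implements anything; the class name, the hit/miss counters and the dictionary standing in for a database are all my own assumptions:

```python
class ReadThroughCache:
    """Serve repeated reads from memory and clear everything on any write."""

    def __init__(self, load, store):
        self._load = load      # function that reads from the slow store
        self._store = store    # function that writes to the slow store
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = self._load(key)
        self._entries[key] = value
        return value

    def write(self, key, value):
        self._store(key, value)
        self._entries.clear()  # crude, but nothing cached can now be stale


# A plain dictionary stands in for the database here.
database = {"user:1": "Alice"}
cache = ReadThroughCache(load=database.__getitem__, store=database.__setitem__)

first = cache.read("user:1")    # miss: goes to the "database"
second = cache.read("user:1")   # hit: served from memory
cache.write("user:1", "Bob")    # the write wipes the whole cache
third = cache.read("user:1")    # miss again, but returns the fresh value
```

Because reads vastly outnumber writes in most systems, even a cache this blunt spends most of its life answering from memory.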


An interesting point that Kendall makes here is to question the conventional wisdom that “the fewest lines of code is the fastest”.  He says that very often, that’s not really the case as very few lines of code in a method that is doing a lot of work implies that much of your processing is being off-loaded to other methods or classes that are doing your work for you.  This can often slow things down, especially if those other methods and/or classes are not specifically built to utilise cached data.  Very often, having more lines of code in a method can actually be the faster approach as your method is in total control of all of the processing work that needs to be done.  You’re doing all of the work yourself and so can ensure that the processing uses your newly cached data rather than expecting to have to read (or re-read it) from disk or database.


Distribution is the next thing to tackle after Caching.  Distribution is spreading the load around multiple servers and having many things doing your work for you rather than just one.  It’s important to note that the less state that’s held within your system, the better (and wider) you can distribute the load.  If we think of session state in a web application, such state will often prevent us from being able to fulfil user requests by any one of many different webservers.  We’re often in a position where we’ll require at least “Server Affinity” (also known as “sticky sessions”) to ensure that each specific user’s requests are always fulfilled by the same server in a given session.  Asynchronous code can really help here as it means that processing can be offloaded to other servers to be run in the background whilst the processing of the main work can continue to be performed in the foreground without having to wait for the response from the background processes.
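
One simple way to picture server affinity is to derive the server from the session identifier, so that the same session always lands on the same machine.  This is only a sketch under my own assumptions (real load balancers more commonly use cookies, and the server names here are invented):

```python
import hashlib

SERVERS = ["web-1", "web-2", "web-3"]  # hypothetical server names


def server_for(session_id, servers=SERVERS):
    """Map a session to a server using a stable hash of its identifier."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


first_request = server_for("session-abc")
second_request = server_for("session-abc")
```

The obvious weakness is the one Kendall is warning about: the more state a server holds for a session, the less freely requests can be moved elsewhere.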


Distribution is hardest when it comes to the database.  Databases, and indeed other forms of storage, are fundamentally state and scaling state is very difficult.  This is primarily due to the need to keep that state consistent across its distributed load.  This is the same consistency, or the requirement of consistency, that can hinder all manner of scalability and is one of the core principles.  One technique of scaling your storage layer is to use something called “Partitioned Storage Zones”.  These are similar to the server affinity (or sticky sessions) used on the web server when state needs to be maintained except that storage partitioning is usually more permanent.  We could have 5 separate database servers and split out (for example) 50 customers across those 5 database servers with 10 customers on each server.  We don’t need to synchronize the servers as any single given customer will only ever use the one server to which they’ve been permanently assigned.
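
The 50 customers across 5 servers example can be sketched as a one-off assignment that is then treated as permanent.  The round-robin rule and all of the names below are my own assumptions rather than anything Kendall prescribed:

```python
from collections import Counter


def assign_customers(customer_ids, servers):
    """Assign each customer to one server, round-robin, exactly once."""
    return {
        customer_id: servers[position % len(servers)]
        for position, customer_id in enumerate(customer_ids)
    }


servers = [f"db-{n}" for n in range(1, 6)]               # 5 database servers
customers = [f"customer-{n:02d}" for n in range(1, 51)]  # 50 customers

assignment = assign_customers(customers, servers)
load = Counter(assignment.values())                      # customers per server
```

Every query for a given customer then goes straight to assignment[customer_id], so the five servers never need to talk to each other.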


After distribution comes Asynchronicity.  Asynchronicity (or Async for short) is always the hardest to implement and so is the last one to be attempted in order to provide scalability.  Async is the decoupling of operations to ensure that the minimum amount of work is performed within the “critical path” of the system.  The critical path is the processing that occurs to fulfil a user’s request end-to-end.  A user request to a web server for a given resource will require processing of the request, retrieval and processing of data before returning to the user.  If the retrieval and processing of data requires significant and time-consuming computation, it would be better if the user was not “held up” whilst waiting for the computation to complete, but for the response to be sent to the user in a more expedient fashion, with the results of the intensive computation delivered to the user at a later point in time.  Work should always be “queued” in this manner so that load is smoothed out across all servers and applications within the system.
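
A minimal sketch of getting work off the critical path, assuming a single in-process queue and one background worker (a real system would more likely use a durable queue shared between servers, and the job itself is invented):

```python
import queue
import threading

jobs = queue.Queue()
results = {}


def worker():
    """Background worker: pulls queued jobs and does the slow computation."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel value: shut the worker down
            jobs.task_done()
            break
        job_id, numbers = job
        results[job_id] = sum(n * n for n in numbers)  # stand-in for slow work
        jobs.task_done()


def handle_request(job_id, numbers):
    """Critical path: queue the work and answer the user straight away."""
    jobs.put((job_id, numbers))
    return f"accepted {job_id}"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

reply = handle_request("report-1", [1, 2, 3])  # returns without computing
jobs.join()                                    # here we wait, only to demo the result
jobs.put(None)
thread.join()
```

The user gets the reply immediately; the result turns up in results once the worker has caught up.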


One interesting Async technique, which is used by Amazon for their “recommendation” engine, is “Speculative Execution”.  This is some asynchronous processing that happens even though the user may never have explicitly requested such processing or may never even be around to see the results of such processing.  This is a perfectly legitimate approach and, whilst seemingly contrary to the notion of not doing any work unless you absolutely have to, “speculative execution” can actually yield performance gains.  It’s always done asynchronously so it’s never blocking the critical path of work being performed, and if the user does eventually require the results of the speculative execution, it’ll be pre-computed and cached so that it can be delivered to the user incredibly quickly.  Another good async technique is “scheduled requests”.  These are simply specific requests from the user for some computation work to be done, but the request is queued and the processing is performed at some later point in time.  Some good examples of these techniques are an intensive report generation request from the user that will have its results available later, or a “nightly process” that runs to compute some daily total figures (for example, the day’s financial trading figures).  When requested the next day, the previous day’s figures do not need to be computed in real-time at all and the system can simply retrieve the results from cache or persistent storage. This obviously improves the user’s perception of the overall speed of the system.  Amazon uses an interesting trick that actually goes against async in that they actually “wait” for an order’s email confirmation to be sent before displaying the order confirmation web page to the user.  It’s one of only a few areas of Amazon’s site that specifically isn’t async and is very intentionally done this way as the user’s perception of an order being truly finalized is of receiving the confirmation email in their inbox!


Kendall next talks about the final principle, which of the 4 principles is the one that actually prevents scalability, or at least complicates it significantly.  It’s the principle of Consistency.  Consistency is the degree to which all parties within the system observe some state that exists within the system at the same time.  Of course, the other principles of distribution and asynchronicity that help to provide scalability will directly impact the consistency of a system.  With this in mind, we need to recognize that scalability and scaling is very much about compromise.


There are numerous consistency challenges when scaling a system.  Singleton data structures (such as a numbering system that must remain contiguous) are particularly challenging as having multiple separate parts of a system that can generate the next number in sequence would require locking and synchronicity around the number generation in order to prevent the same number being used twice.  Kendall also talks about state that can be held at two separate endpoints of a process, such as a layer that reads some data from a database, and how this must be shared consistently.  Changes made to the database after the data has been read must ideally be communicated to the layer that previously read that data.  Within the database context, this consistency extends to ensuring multiple database servers are kept consistent in the data that they hold and queries across partitioned datasets must be kept in sync.  All of these consistency challenges will cause compromise within the system.  However, consistency can be achieved if the other 3 principles (Caching, Distribution & Async) are themselves implemented in a consistent manner and work towards the same goals.
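
The numbering example is easy to demonstrate within a single process.  In this sketch (class and function names are my own) a lock around the “read, then increment” step is what stops two threads handing out the same number:

```python
import threading


class SequenceGenerator:
    """Hands out contiguous numbers; the lock serialises every caller."""

    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()

    def next_number(self):
        with self._lock:
            number = self._next
            self._next += 1
            return number


def issue_concurrently(generator, workers=8, per_worker=500):
    """Have several threads draw numbers, then return them all, sorted."""
    buckets = [[] for _ in range(workers)]

    def run(bucket):
        for _ in range(per_worker):
            bucket.append(generator.next_number())

    threads = [threading.Thread(target=run, args=(bucket,)) for bucket in buckets]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(number for bucket in buckets for number in bucket)


issued = issue_concurrently(SequenceGenerator())
```

The price is that every caller queues behind that one lock, and once the callers live on different servers the “lock” becomes a network round trip, which is exactly why this kind of state fights against scaling.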


Finally, Kendall discusses how we can actually implement all of these concepts within a real-world system.  The key to this is to test your existing system and gather as many timings and metrics as you possibly can.  Remember, scaling is about setting a realistic target that makes sense for your application.  Once armed with metrics and diagnostic data, we can set specific targets that our scalability must reach.  This could be something like, “all web pages must return to the user within 500ms”.  You would then start to implement, working from left to right within your architecture, and implementing the principles in the order of simplicity and which will provide the biggest return on investment. Caching first, then Distribution, finally Async.  But, importantly, when you hit your pre-defined target, you stop.  You’re done.
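
The “stop when you hit the target” rule can be sketched as a tiny loop.  Everything numeric here is invented for illustration (the baseline, the target and especially the speed-up factors); in a real system you would re-measure after each change rather than assume a factor:

```python
STEPS = ["caching", "distribution", "async"]  # the order Kendall recommends


def steps_needed(baseline_ms, target_ms, speedups):
    """Apply each principle in order until the response time meets the target."""
    current_ms = baseline_ms
    applied = []
    for step in STEPS:
        if current_ms <= target_ms:
            break                            # target met: stop, we're done
        current_ms = current_ms / speedups[step]
        applied.append(step)
    return applied, current_ms


# Hypothetical speed-up factors, purely to exercise the loop.
assumed_speedups = {"caching": 5, "distribution": 2, "async": 2}

modest_goal = steps_needed(2000, 500, assumed_speedups)
harder_goal = steps_needed(2000, 150, assumed_speedups)
```

With a 500ms target, caching alone is enough and the other two principles are never touched; only the tougher 150ms target needs all three.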



After another coffee break back in the main hall, during which time I was able to browse through the various stalls set up by the conference’s numerous sponsors, chat with some of the folks running those stalls, and even grab myself some of the swag that was spread around, it was time for the final session before lunch.  This one was Matthew Steeples’ “You’ve Got Your Compiler In My Service”.


Matthew’s talk was about the functionality and features that the upcoming Microsoft Roslyn project will offer to .NET developers.  Roslyn is a “compiler-as-a-service”.  This means that the C# compiler offered by Roslyn will be available to be interacted with via other C# code.  Traditionally, compilers (and the existing C# compiler is no exception) are effectively “black boxes” and operate in one direction only.  Raw source code is fed in at one end and, after some “magic” happens in the middle, compiled executable binary code comes out from the other end.  In the case of the C# compiler, it’s actually IL code that gets output, ready to be JIT’ed by the .NET runtime.  But once that IL is output, there’s really no simple way to return from the IL back to the original source code.  Roslyn will change that.


Roslyn represents a deconstruction of the existing C# compiler.  It exposes all of the compiler’s functionality publicly, allowing a developer to use Roslyn to construct new C# code with C# code!  Traditional compilers will follow a series of steps to convert the raw text-based source code into something that the compiler can understand in order to convert it into working machine code.  These steps can vary from one compiler to another, but generally begin with a step that breaks down the text into individual tokens: the keywords that the compiler understands as being part of the language, as well as user-defined variable names, literals and punctuation.  This is known as “lexical analysis”.  This is followed by “syntax analysis” (or “parsing”), which is the understanding of (and verification against) the syntactical rules of the language.  Next comes the “semantic analysis” which is the checking of the semantics of the language’s expressions (for example, ensuring that the expression within an if statement’s condition evaluates to a boolean).  Finally, after all of this analysis, “code generation” can take place.
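
These phases aren’t unique to C#.  As an illustration only (this is Python’s own standard library, not Roslyn), the lexical analysis, syntax analysis and code generation steps can each be run by hand on a one-line program.  Python does very little compile-time semantic checking, so that phase has no counterpart in this sketch:

```python
import ast
import io
import tokenize

source = "total = 1 + 2\n"

# Lexical analysis: split the raw text into tokens.
significant = (tokenize.NAME, tokenize.OP, tokenize.NUMBER)
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type in significant
]

# Syntax analysis: build a tree that follows the grammar of the language.
tree = ast.parse(source)
statement = tree.body[0]          # an assignment whose value is a binary operation

# Code generation: turn the tree into something executable (bytecode here).
code = compile(tree, "<demo>", "exec")

# Run the generated code and pick up the variable it defines.
namespace = {}
exec(code, namespace)
total = namespace["total"]
```

Having each phase available as an ordinary function call is, in miniature, what “compiler-as-a-service” means.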


Roslyn, on the other hand, takes a different approach, and effectively turns the compiler of both the C# and VB languages into a large object model, exposing an API that programmers can easily interact with (For example: An object called “CatchClauseSyntax” exists within the Roslyn.Compilers.CSharp namespace that effectively represents the “catch” clause from within the try..catch block).


Creating code via Roslyn is achieved by creating a top-level object known as a Syntax Tree.  Syntax Trees contain a vast hierarchy of child objects, literally as a tree data structure and usually contain multiple Compilation Units (a compilation unit is a single class or “module” of code).  Each compilation unit, in turn, contains further objects and nodes that represent (for example) a complete C# class, starting with the class declaration itself including its scope and modifiers, drilling down to the methods (and their scoping and modifiers) and ultimately the individual lines of code contained within.  These syntax trees ultimately represent an entire C# (or VB!) program and can either be declared and created within other C# code, or parsed from raw text.  Specifically, Syntax Trees have three important attributes.  They represent all of the source code in full fidelity meaning every keyword, every variable name, every operator.  In fact, they’ll represent everything right down to the whitespace.  The second important attribute of a Syntax Tree is that, due to the first attribute, they’re completely reversible.  This means that code parsed from a raw text file into the SyntaxTree object model, is completely reversible back to the raw text source code.  The third and final attribute is that of immutability.  Once created, Syntax Trees cannot be changed.  This means they’re completely thread-safe.


Syntax Trees break down all source code into only three types of object: Nodes, Tokens and Trivia.  Nodes are syntactic constructs of the language like declarations, statements, clauses and expressions.  Nodes generally also act as parent objects for other child objects and nodes within the Syntax Tree.  Tokens are the individual language grammar keywords but can also be identifiers, literals and punctuation.  Tokens have properties that represent (for example) their type (a token representing a string literal in code will have a property that represents the fact that the literal is of type string) as well as other meta-data for the token, but tokens can never be parents of other objects within the Syntax Tree.  Finally, trivia is everything else within the source code: the largely insignificant text such as whitespace, comments, pre-processor directives etc.
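
To get a feel for why keeping the trivia makes a tree reversible, here’s a toy lexer in Python.  It has nothing to do with Roslyn’s real implementation (there are no nodes at all, and the Piece type and regular expressions are my own invention); it simply sorts every character of the input into either a token or a piece of trivia:

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)          # frozen: a Piece cannot be changed once created
class Piece:
    kind: str                    # "token" or "trivia"
    text: str


TRIVIA = r"\s+|//[^\n]*"             # whitespace and line comments
TOKEN = r'"[^"\n]*"|\w+|[^\w\s]'     # string literals, words, single punctuation
PATTERN = re.compile(f"(?P<trivia>{TRIVIA})|(?P<token>{TOKEN})")


def lex(source):
    """Split source into tokens and trivia without discarding a character."""
    return [Piece(match.lastgroup, match.group()) for match in PATTERN.finditer(source)]


source = 'Console.WriteLine("Hello World"); // greet\n'
pieces = lex(source)

tokens = [piece.text for piece in pieces if piece.kind == "token"]
round_trip = "".join(piece.text for piece in pieces)
```

Because nothing is thrown away, joining the pieces back together gives exactly the original text, and because each Piece is frozen, the list can be shared between threads without anything changing underneath a reader.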


The following bit of C# code shows how we can use Roslyn to parse a literal text representation of a simple “Hello World” application:


var tree = SyntaxTree.ParseText(@"
    using System;
    namespace HelloRoslyn
    {
        class Program
        {
            static void Main(string[] args)
            {
                Console.WriteLine(""Hello World"");
            }
        }
    }");

Once this code has been executed, the tree variable will hold a complete syntax tree that represents the entire program as defined in the string literal.  Once created, the tree variable’s syntax tree can be executed (i.e. the “Hello World” program can be run), it can be turned into IL (Intermediate Language), or turned back into the same source code!


The following C# code is the equivalent of the code above, except that here we’re not just parsing from the raw source code text, we’re actually creating and building up the syntax tree by hand using the built-in Roslyn objects that represent the various facets of the C# language:


using System;
using Roslyn.Compilers.CSharp;

namespace HelloRoslyn
  class Program
    static void Main()
      string program = Syntax.CompilationUnit(
        usings: Syntax.List(Syntax.UsingDirective(name: Syntax.ParseName("System"))),
        members: Syntax.List<MemberDeclarationSyntax>(
            name: Syntax.ParseName("HelloRoslyn"),
            members: Syntax.List<MemberDeclarationSyntax>(
                identifier: Syntax.Identifier("Program"),
                members: Syntax.List<MemberDeclarationSyntax>(
                    returnType: Syntax.PredefinedType(Syntax.Token(SyntaxKind.VoidKeyword)),
                    modifiers: Syntax.TokenList(Syntax.Token(SyntaxKind.StaticKeyword)),
                    identifier: Syntax.ParseToken("Main"),
                    parameterList: Syntax.ParameterList(),
                    bodyOpt: Syntax.Block(
                      statements: Syntax.List<StatementSyntax>(
                              kind: SyntaxKind.MemberAccessExpression,
                              expression: Syntax.IdentifierName("Console"),
                              name: Syntax.IdentifierName("WriteLine"),
                              operatorToken: Syntax.Token(SyntaxKind.DotToken)),
                              arguments: Syntax.SeparatedList(
                                  expression: Syntax.LiteralExpression(
                                    kind: SyntaxKind.StringLiteralExpression,
                                    token: Syntax.Literal("\"Hello world\"", "Hello world")

Phew!  That’s quite some code there to create the Syntax Tree for a simple “Hello World” console application!  Although Roslyn can be quite verbose, and building up syntax trees in code can be incredibly cumbersome, the functionality offered by Roslyn is incredibly powerful.  So, why on earth would we need this kind of functionality?


Well, one current simple usage of Roslyn is to create a “plug-in” for the Visual Studio IDE.  This plug-in can interact with the source code editor window to dynamically interrogate the current user edited source and perform alterations.  These could be refactoring and code generation, similar to the functionality that’s currently offered by the ReSharper or JustCode tools.  Of course, those tools can perform a myriad of interactions with the code editor windows of the Visual Studio IDE, however they probably currently have to implement their own parsing and translation engine over the code that’s edited by the user.  Roslyn makes this incredibly easy to accomplish within your own plug-in utilities.  Other usages of Roslyn include the ability for an application to dynamically “inject” code into itself.  At this point Matthew shows us a demo of a simple Windows Forms application with a simple textbox on the form.  He proceeds to type out a C# class declaration into the form’s textbox.  He ensures that this class declaration implements a specific interface that the Windows Forms application already knows about.  Once entered, the running WinForms app can take the raw text from the textbox, and using Roslyn, convert this text into a Syntax Tree.  This Syntax Tree can then be invoked as actual code, as though it were simply a part of the running application.  In this case, Matthew’s example has an interface that defines a single “GetDate” method that returns a string.  Matthew types his class into the WinForms textbox and returns the current Date and Time in the current locale.  This is then executed and invoked by the running application and the result is displayed on the Form.  Matthew then shows how the code within the textbox can be easily altered to return the same Date and Time but in the UTC time zone.  One click of a button and the new code is parsed, interpreted and invoked using Roslyn to immediately show the new result on the Windows Form.
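
The shape of Matthew’s demo (take some text, compile it at runtime, then call it through an interface the host already knows about) can be sketched in Python using its built-in compile and exec.  This is an analogy only, not Roslyn, and the DateProvider class, get_date method and load_provider helper are names I’ve made up:

```python
# Stands in for the text typed into the textbox.
PLUGIN_SOURCE = '''
from datetime import datetime, timezone

class DateProvider:
    def get_date(self):
        return datetime.now(timezone.utc).isoformat()
'''


def load_provider(source, class_name="DateProvider"):
    """Compile the source text and return an instance of the class it defines."""
    namespace = {}
    exec(compile(source, "<textbox>", "exec"), namespace)
    provider = namespace[class_name]()
    # The "interface" the host knows about: an object with a get_date() method.
    if not callable(getattr(provider, "get_date", None)):
        raise TypeError("the supplied code does not provide get_date()")
    return provider


provider = load_provider(PLUGIN_SOURCE)
stamp = provider.get_date()      # the current date and time in UTC, as a string
```

Needless to say, running whatever text a user types in is only sensible in a demo, or somewhere you fully trust the person doing the typing.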


Roslyn, as a new C# compiler, is itself written in C#.  One of the current complexities with the Roslyn toolkit is that the current C# compiler, which is written in C++, doesn’t entirely conform to the C# specification.  This makes it fairly tricky to reproduce the compiler in accordance with the C# specification, and the current dilemma is whether Roslyn should embrace the C# specification entirely (thus making it slightly incompatible with the existing C# compiler) or whether to faithfully reproduce the existing C# compiler’s behaviour even though it doesn’t strictly conform to the specification.


Matthew wraps up his talk with a summary of the Roslyn compiler’s abilities, which are extensive and powerful despite it still only being a CTP (Community Technology Preview) of the final functionality, and offers the link to the area on MSDN where you can download Roslyn and learn all about this new “compiler-as-a-service” which will, eventually, become a standard part of Visual Studio and C# (and VB!) development in general.



After Matthew’s talk it was time for lunch.  Lunch at DDD North this year was just as great as last year.  We all wandered off to the main entrance hall where the staff of the venue were frantically trying to put out as many bags with a fantastic variety of sandwiches, fruit and chocolate bars as they could before the hordes of hungry developers came along to whisk them away.  The catering really was excellent as it was possible to pre-order specific lunches for those with specific dietary requirements, as well as ensuring there was a wide range of vegetarian options available too.


I examined the available options, which took a little while as I, too, have specific dietary requirements in that I’m a fussy bugger as I don’t like mayonnaise!  It took a little while to find a sandwich that didn’t come loaded with mayo, but after only a short while, I found one.  And a lovely sandwich it was too!  Along with my crisps, chocolate and fruit, I found a place to sit down and quietly eat my lunch whilst contemplating the quantity and quality of the information I’d learned so far.



During the lunch break, there were a number of “grok talks” taking place in the largest of the lecture theatres that were being used for the conference (this was the same theatre where Kendall Miller had given his talk earlier).  Whilst I always try to take in at least one or two (if not all) of the grok talks that take place during the DDD (and other) conferences, unfortunately on this occasion I was too busy stuffing my face, wandering around the main hall and browsing the many sponsors’ stands as well as chatting away to some old and new friends that I’d met up with there.  By the time I realised the grok talks were taking place, it was too late to attend.


After a lovely lunch, it was time for the first of the afternoon’s sessions, one of two remaining in the day.  This session saw us gathering in one of the lecture halls only to find that the projector had decided to stop working.  The DDD volunteers tried frantically to get the thing working again, but ultimately, it proved to be a futile endeavour.  Eventually, we were told to head across the campus to the other building that was being used for the conference and to a “spare room”, apparently reserved for such an eventuality.


After a brisk, but slightly soggy walk across the campus forecourt (the weather at this point was fairly miserable!) we entered the David Goldman Informatics Centre and trundled our way to the spare room.  We quickly sat ourselves down and the speaker quickly set himself up as we were now running slightly behind schedule.  So, without further ado, we kicked off the first afternoon session which was MongoDB For C# Developers, given by Simon Elliston Ball.


Simon’s talk was an introduction to the MongoDB NoSQL database and specifically how we as C# developers can utilise the functionality provided by MongoDB.  Mongo is a document-oriented database and stores its data as a collection of key/value pairs within a document.  These documents are then stored together as collections within a database.  A document can be thought of as a single row in an RDBMS database table, the collection of documents can be thought of as the table itself, and multiple collections are grouped together as a database, although this analogy isn’t strictly correct.  This is very different from the relational structure you can find in today’s popular database systems such as Microsoft’s SQL Server, Oracle, MySQL & IBM’s DB2, to name just a few of them.  Document-oriented databases usually store their data represented in JSON format, and in the case of MongoDB, it uses a flavour of JSON known as BSON, which is Binary JSON.  An example JSON document could be something as simple as:


{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25
}


However, the same document could be somewhat more complex, like this:


{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": 10021
    },
    "phoneNumbers": [
        {
            "type": "home",
            "number": "212 555-1234"
        },
        {
            "type": "fax",
            "number": "646 555-4567"
        }
    ]
}


This gives us an ability that RDBMS databases don’t have, and that’s the ability to nest multiple values for a single “key” in a single document.  An RDBMS would require multiple tables joined together by a foreign key in order to represent this kind of data structure, but for document-oriented databases, this is fairly standard.  Furthermore, MongoDB is a schema-less database, which means that documents within the same collection don’t even need to have the same structure.  We could take our two JSON examples from above and safely store them within the exact same collection in the same database!  Of course, we have to be careful when we’re reading them back out again, especially if we’re trying to deserialize the JSON into a C# class.  Importantly, as MongoDB uses BSON rather than JSON, it can offer strong typing of the values that are assigned to keys.  Within the .NET world, the MongoDB client framework allows decorating POCO classes with annotations that will aid in the mapping between the .NET data types and the BSON data types.
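To make the “be careful when reading them back out” point a little more concrete, here is a minimal sketch of my own (not something Simon showed).  It deliberately avoids the MongoDB driver altogether: documents are modelled as plain dictionaries purely for illustration, and the Person class and ToPerson helper are hypothetical names.  It simply shows one way of mapping two differently-shaped documents onto the same C# class without falling over when a key is missing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical POCO; only some documents carry an address.
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int Age { get; set; }
    public string City { get; set; }   // stays null when the document has no address
}

public static class SchemalessDemo
{
    // Maps a loosely-shaped document onto Person, tolerating a missing "address" key.
    public static Person ToPerson(IDictionary<string, object> doc)
    {
        var person = new Person
        {
            FirstName = (string)doc["firstName"],
            LastName = (string)doc["lastName"],
            Age = (int)doc["age"]
        };

        object address;
        if (doc.TryGetValue("address", out address))
        {
            var nested = (IDictionary<string, object>)address;
            person.City = (string)nested["city"];
        }
        return person;
    }

    public static void Main()
    {
        var simple = new Dictionary<string, object>
        {
            { "firstName", "John" }, { "lastName", "Smith" }, { "age", 25 }
        };
        var complex = new Dictionary<string, object>
        {
            { "firstName", "John" }, { "lastName", "Smith" }, { "age", 25 },
            { "address", new Dictionary<string, object> { { "city", "New York" } } }
        };

        foreach (var doc in new[] { simple, complex })
        {
            var p = ToPerson(doc);
            Console.WriteLine("{0} {1}: {2}", p.FirstName, p.LastName, p.City ?? "(no address)");
        }
    }
}
```

With the real driver the same concern would presumably be handled by the mapping annotations mentioned above rather than by hand, but the underlying idea is the same: the reading code decides what to do about fields that aren’t there.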


So, given this incredible flexibility of a document-oriented database, what are the downsides?  Well, there are no joins within MongoDB.  This means we can’t join documents (or records) from one collection with another as you could do with different tables within an RDBMS.  If your data is very highly relational, a document-oriented database is probably not the right choice, but a lot of data structures can be represented by documents.  MongoDB allows an individual document to be up to 16MB in size, and given that we can have multiple values for a given key within the document, we can probably represent an average hierarchical data/object graph using a single document.


Simon makes a comparison between MongoDB and another popular document-oriented database, RavenDB.  Simon highlights how RavenDB, being the newer document-oriented database, offers ACID-compliance and transactions that stretch over multiple documents.  He states that MongoDB’s transactions are only per document.  MongoDB’s replication supports a Master-Slave configuration, whereas Raven’s replication is Master-Master.  On the other hand, MongoDB supports being used from within many different languages, with native client libraries for JavaScript, Java, Python, Ruby, .NET, Scala, Erlang and many more.  RavenDB is effectively .NET only (at least as far as native client libraries go); however, RavenDB does offer a REST-based API and is thus callable from any language that can reach a URI.


Simon continues by telling us about how we can get to play with MongoDB as C# developers.  The native C# MongoDB client library is distributed as a NuGet package which is easily installable from within any Visual Studio project.  The NuGet package contains the client library which enables easy access to a MongoDB Server instance from .NET, as well as containing types that provide the aforementioned annotations to decorate your POCO classes to enable easy mapping of your .NET types to the MongoDB BSON types.  Once installed, accessing some data within a MongoDB database can be performed quite easily:


var client = new MongoClient(connectionString);
var server = client.GetServer(); 
var database = server.GetDatabase("MyDatabase");
var collection = database.GetCollection("MyCollection");


One of the nice things with MongoDB is that we don’t have to worry about explicitly closing or disposing of the resources that we’ve acquired with the above code.  Once these objects fall out of scope, the MongoDB client library will automatically close the database connection and release the connection back to the connection pool.  Of course, this can be done explicitly too, but it’s nice to know that failure to do so won’t leak resources.


Simon explains that all of Mongo’s operations are as “lazy” as they possibly can be, thus in the code above, we’re only going to hit the database to retrieve the documents from “MyCollection” once we start iterating over the collection variable.  The code above shows the simplest case, which gives us access to all of the documents within a collection.  We can compose more complex queries in a number of ways, but perhaps the way that will be most familiar to C# developers is with a LINQ-style query:


var readQuery = Query<Person>.EQ(p => p.PersonID, 2);
Person thePerson = personCollection.FindOne(readQuery);

This style of query allows retrieving a strongly-typed “Person” object using a Lambda expression as the argument to the EQ function of the Query object.  The resulting configured query object is then passed to the .FindOne method of the collection to allow retrieval of one specific Person object based upon the predicate of the query.  The newer versions of MongoDB support most of the available LINQ operators and expressions and collections can easily be exposed to the client code as an IQueryable:


var query =
   from person in personCollection.AsQueryable()
   where person.LastName == "Smith"
   select person;

foreach (var person in query)
{
    // ....[snip]....
}
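The “lazy” behaviour Simon described is really the same deferred execution that C# developers already know from iterators and LINQ.  The following is just a sketch of that idea in plain C#, with no MongoDB involved: FindAll here is a hypothetical stand-in for a collection query, and the Fetches counter pretends to count round trips to the server.

```csharp
using System;
using System.Collections.Generic;

public static class LazyDemo
{
    // Counts how many items have actually been "fetched".
    public static int Fetches;

    // Hypothetical stand-in for a collection query: the body does not run
    // until the caller starts enumerating the result.
    public static IEnumerable<string> FindAll()
    {
        foreach (var name in new[] { "Smith", "Jones", "Brown" })
        {
            Fetches++;          // pretend this is a round trip to the server
            yield return name;
        }
    }

    public static void Main()
    {
        var cursor = FindAll();                  // nothing fetched yet
        Console.WriteLine(Fetches);              // prints 0

        foreach (var name in cursor)             // enumeration triggers the work
        {
            if (name == "Jones") break;          // stop early; "Brown" is never fetched
        }
        Console.WriteLine(Fetches);              // prints 2
    }
}
```

Nothing happens when the query is defined; the work only starts once the foreach begins, and stopping early means the remaining items are never touched.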


We can also create cursors to iterate over an entire collection of documents using the MongoCursor object:


MongoCursor<Person> personCursor = personCollection.FindAll();
personCursor.Skip = 100;
personCursor.Limit = 10;

foreach (var person in personCursor)
{
    // .....[snip]....
}

Simon further explains how Mongo’s Update operations are trivially simple to perform too, often merely requiring the setting of the object properties, and calling the .Save method against the collection, passing in the updated object:


person.LastName = "Smith";
personCollection.Save(person);

Simon tells us that MongoDB supports something known as “write concerns”.  This mechanism allows us to return control to our code only after the master database and all slave servers have been successfully updated with our changes.  Without these write concerns, control will return to our code before the changes have persisted across all database servers, returning control to our code after only the master server has been updated whilst the slaves continue to update asynchronously in the background.  Unlike most RDBMS systems, UPDATEs to MongoDB will, by default, only ever affect one document, and this is usually the first document that the update query finds.  If you wish to perform a multi document update, you must explicitly tell MongoDB to perform such an update.


As stated earlier, documents are limited to 16MB in size however MongoDB provides a way to store a large “blob” of data (for example, if you needed to store a video file) using a technology called GridFS.  GridFS sits on top of MongoDB and allows you to store a large amount of binary data in “chunks”, even if this data exceeds the 16MB document limit.  Large files are committed to the database with a simple command such as:


database.GridFS.Upload(filestream, "mybigvideo.wmv");


This will upload the large video file to the database, which will break down the file into many small chunks.  Querying and retrieving this data is as simple as retrieving a normal document, and the database and the database driver are responsible for re-combining all of the chunks of the file to allow you to retrieve the file correctly with no further work required on the developer’s behalf.
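To get a feel for what “breaking a file down into chunks and re-combining them” means, here is a small sketch of the general idea in plain C#.  To be clear, this is not how GridFS itself is implemented and it uses no MongoDB code at all; the Split and Join helpers are hypothetical, and the 256-byte chunk size is chosen only to keep the example tiny.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkingDemo
{
    // Splits data into pieces of at most chunkSize bytes, preserving order.
    public static List<byte[]> Split(byte[] data, int chunkSize)
    {
        var chunks = new List<byte[]>();
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int length = Math.Min(chunkSize, data.Length - offset);
            var chunk = new byte[length];
            Array.Copy(data, offset, chunk, 0, length);
            chunks.Add(chunk);
        }
        return chunks;
    }

    // Re-assembles the original bytes by concatenating the chunks in order.
    public static byte[] Join(IEnumerable<byte[]> chunks)
    {
        return chunks.SelectMany(c => c).ToArray();
    }

    public static void Main()
    {
        var file = new byte[1000];
        new Random(42).NextBytes(file);

        var chunks = Split(file, 256);   // illustrative chunk size only
        var restored = Join(chunks);

        Console.WriteLine(chunks.Count);                    // prints 4 (256 + 256 + 256 + 232 bytes)
        Console.WriteLine(file.SequenceEqual(restored));    // prints True
    }
}
```

The important property is that the chunks are stored with enough ordering information to stitch the file back together exactly, which is the job the database and driver take on for you.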


MongoDB supports GeoSpatial functionality which allows querying location and geographic data for results that are “near” or within a certain distance of a specific geographic location:


var database = server.GetDatabase("MyDatabase");
var collection = database.GetCollection("MyCollection");
var query = Query.EQ("Landmarks.LandMarkType", new BsonString("Statue"));
double lat = 54.9117468;
double lon = -1.3737675;
var earthRadius = 6378.0; // km
var rangeInKm = 100.0; // km
var options = GeoNearOptions
              .SetMaxDistance(rangeInKm / earthRadius /* to radians */)
              .SetSpherical(true); // radians only make sense for a spherical query
// x is the longitude and y is the latitude; 10 caps the number of results
var results = collection.GeoNear(query, lon, lat, 10, options);

The above code sample would find up to 10 documents within the collection that have a LandMarkType of Statue and which are also within 100 kilometres of our defined Latitude and Longitude position.
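The “divide by the Earth’s radius” trick in that sample is worth a quick illustration.  The sketch below is my own and is independent of MongoDB: it converts a distance in kilometres to radians using the same 6378 km radius, and uses the standard haversine formula to check whether a second point falls within range.  The GeoDemo class, its method names and the approximate Newcastle coordinates are all assumptions made for the example.

```csharp
using System;

public static class GeoDemo
{
    const double EarthRadiusKm = 6378.0;   // same figure as the sample above

    // Converts a distance along the Earth's surface into an angle in radians.
    public static double KmToRadians(double km)
    {
        return km / EarthRadiusKm;
    }

    // Great-circle distance between two points using the haversine formula.
    public static double DistanceKm(double lat1, double lon1, double lat2, double lon2)
    {
        double dLat = ToRad(lat2 - lat1);
        double dLon = ToRad(lon2 - lon1);
        double a = Math.Pow(Math.Sin(dLat / 2), 2)
                 + Math.Cos(ToRad(lat1)) * Math.Cos(ToRad(lat2)) * Math.Pow(Math.Sin(dLon / 2), 2);
        return 2 * EarthRadiusKm * Math.Asin(Math.Sqrt(a));
    }

    static double ToRad(double degrees)
    {
        return degrees * Math.PI / 180.0;
    }

    public static void Main()
    {
        // The coordinates from the sample, and Newcastle (approximate, assumed values)
        double d = DistanceKm(54.9117468, -1.3737675, 54.9783, -1.6178);
        Console.WriteLine("{0:F1} km apart; within 100 km: {1}", d, d <= 100.0);
        Console.WriteLine("100 km is {0:F4} radians", KmToRadians(100.0));
    }
}
```

In the real query the database does this distance work for us against a geospatial index; the sketch is only there to show where the radians figure comes from.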


MongoDB also supports the ability to query and transform data using a “MapReduce”  algorithm.  MapReduce is a very powerful way in which a large set of data can be filtered, sorted (the “map” part) and summarised (the “reduce” part) using hand-crafted map and reduce functions.  These functions are written in JavaScript and are interpreted by the MongoDB database engine, which contains a full JavaScript interpreter and execution engine.  Using this MapReduce mechanism, a developer can perform many of the same kinds of complicated “grouping” and aggregation queries that RDBMS systems perform.  For example, the following sample query would iterate over the collection within the database and sum the count of documents, grouped together by the key:


var map =
    "function() {" +
    "    for (var key in this) {" +
    "        emit(key, { count : 1 });" +
    "    }" +
    "}";

var reduce =
    "function(key, emits) {" +
    "    var total = 0;" +
    "    for (var i in emits) {" +
    "        total += emits[i].count;" +
    "    }" +
    "    return { count : total };" +
    "}";

var mr = collection.MapReduce(map, reduce);
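For those of us more at home with LINQ than with JavaScript, it can help to see the same map and reduce steps expressed in C#.  The sketch below is only an in-memory analogy of what the two functions above do (it isn’t run by MongoDB, and the CountKeys helper and dictionary-based documents are my own assumptions): the “map” emits a count of 1 for every key in every document, and the “reduce” sums those counts per key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MapReduceDemo
{
    // "Map": emit a (key, 1) pair for every field name in every document.
    // "Reduce": sum the emitted counts for each distinct key.
    public static Dictionary<string, int> CountKeys(IEnumerable<IDictionary<string, object>> docs)
    {
        return docs
            .SelectMany(doc => doc.Keys.Select(key => new { Key = key, Count = 1 }))
            .GroupBy(e => e.Key)
            .ToDictionary(g => g.Key, g => g.Sum(e => e.Count));
    }

    public static void Main()
    {
        var docs = new List<IDictionary<string, object>>
        {
            new Dictionary<string, object> { { "firstName", "John" }, { "age", 25 } },
            new Dictionary<string, object> { { "firstName", "Jane" }, { "address", "21 2nd Street" } }
        };

        // firstName appears in both documents; age and address in one each
        foreach (var pair in CountKeys(docs))
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }
}
```

The difference, of course, is that MongoDB runs the JavaScript versions inside the database engine against the whole collection, rather than pulling every document back to the client first.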

Finally, Simon wraps up his talk by telling us about a Glimpse plug-in that he’s authored himself which can greatly help to understand exactly what is going on between the client-side code that talks to the MongoDB client library and the actual requests that are sent to the server, as well as being able to inspect the resulting responses.


After a short trip back across the campus to grab a coffee in the other building that contains the main entrance hall, as well as an array of crisps, chocolate and fruit (these were the “left-overs” from the lunch bags of earlier in the afternoon!) to keep us developers well fed and watered, I trundled back across the campus to the same David Goldman Informatics Centre building I’d been in previously to watch the final session of the day.  This session was another F# session (F# was a popular subject this year) called “You’ve Learned The Basics Of F#, What’s Next?” and given by Ian Russell.


The basis of Ian’s talk was to examine two specific features of F# that Ian thought offered a fantastic amount of productivity over other languages, and especially over other .NET languages.  These two features were Type Providers and the MailboxProcessor.


First up, Ian takes a look at Type Providers.  First introduced in F# 3.0, Ian starts by explaining that Type Providers provide type inference over third party data.  What this essentially means is that a type provider for something like (say) a database can give the F# IDE type inference over what types you’ll be working with from the database as soon as you’ve typed in the line of code that specifies the connection string!  Take a look at the sample code below:


open System.Linq
open Microsoft.FSharp.Data.TypeProviders
type SqlConnection =
    SqlDataConnection<ConnectionString = @"Data Source=.\sql2008r2;Initial Catalog=chinook;Integrated Security=True">

let db = SqlConnection.GetDataContext()

let table =
    query { for r in db.Artist do
            select r }


The really important line of code from the sample above is this one:


query { for r in db.Artist do

Note the db.Artist part.  There’s no type within the code that defines what artist is.  The FSharp Data Type Provider has asynchronously and in the background of the IDE quietly opened the SQL Server connection as soon as the connection string was specified in the code.  It’s examined the database referred to in the connection string and it has automatically generated the types based upon the tables and their columns within the database!


Ian highlights the fact that F#’s SQL Server type provider requires no mapping code to go from F# types in code to SQL Server entities.  The equivalent C# code using Entity Framework would be significantly more verbose.


Ian also shows how it’s easy to take the “raw” types captured by the type provider and wrap them up into a nicer pattern, in this case a repository:


type ChinookRepository () =
    member x.GetArtists () =
        use context = SqlConnection.GetDataContext()
        query { for g in context.Artist do
                select g }
        |> Seq.toList

let artists =
    ChinookRepository().GetArtists()


Ian explains how F# supports a “query” syntax that is very similar to (but much better than) C# and LINQ’s query syntax, i.e.:


from x in y select new { TheID = x.Id, TheName = x.FirstName }


The reason that F#’s query syntax is far superior is that F# allows you to define your own query syntax keywords.  For example, you can define your own keyword, “top”, which would implement “Select Top X” style functionality.  This effectively allows you to define your own DSL (Domain-Specific Language) within F#!


After the data type provider, Ian goes on to show us how the same functionality of early-binding and type inference to a third-party data source works equally well with local CSV data in a file.  He shares the following code with us:


open FSharp.Data

let csv = new CsvProvider<"500-uk.csv">()

let data =
    csv.Data // the parsed rows (exposed as csv.Rows in later versions of FSharp.Data)
    |> Seq.iter (fun t -> printf "%s %s\n" t.``First Name`` t.``Last Name``)


This code shows how you can easily express the columns from the CSV that you wish to work with by simply specifying the column name as a property of the type.  The actual type of this data is inferred from the data itself (numeric, string etc.) however, you can always explicitly specify the types should you desire.  Ian also shows how the exact same mechanism can even pull down data from an internet URI and infer strong types against it:


open FSharp.Data

let data = WorldBankData.GetDataContext()

data.Countries.``United Kingdom``.Indicators.``Central government debt, total (% of GDP)``
|> Seq.maxBy fst


The above code shows how simple and easy it is to consume data from the World Bank’s online data store in a strong, type inferred way.


This is all made possible thanks to the FSharp.Data library which is available as a NuGet package and is fully open-source and available on GitHub.  This library has the type providers for the World Bank and Freebase online data sources already built-in along with generic type providers for dealing with any CSV, JSON or XML file.  Ian tells us about a type provider that’s currently being developed to generically work against any REST service and will type infer the required F# objects and properties all in real-time simply from reading the data retrieved by the REST service.  Of course, you can create your own type providers to work with your own data sources in a strongly-typed, eagerly-inferred magical way! 


After this quick lap around type providers, Ian moves on to show us another well used and very useful feature of F#, the MailboxProcessor.  A MailboxProcessor is also sometimes known as an “Agent” (this name is frequently used in other functional languages) and effectively provides a stateless, dedicated message queue.  The MailboxProcessor consists of a lightweight message queue (the mailbox) and a message handler (the processor).  For code interacting with the MailboxProcessor, it’s all asynchronous: code can post messages to the message queue asynchronously (or synchronously if you prefer).  Internally, however, the MailboxProcessor itself will only process its messages in a strictly synchronous manner and in a strict FIFO (First In, First Out) order, one message at a time.  This helps to maintain consistency of the queue.  Due to the MailboxProcessor exposing its messages asynchronously (but maintaining strict synchronicity internally), we don’t need to acquire locks when we’re dealing with the messages going in or coming out.  So, why is the MailboxProcessor so useful?


Well, Ian shows us a sample chat application that consists of simply posting messages to a MailboxProcessor.  The entire functionality of the chat application is contained within a single type/class:


type ChatMessage =
  | GetContent of AsyncReplyChannel<string>
  | SendMessage of string

// Agent is the conventional alias for F#'s MailboxProcessor
type Agent<'T> = MailboxProcessor<'T>

let agent = Agent<_>.Start(fun agent ->
  let rec loop messages = async {

    // Pick next message from the mailbox
    let! msg = agent.Receive()
    match msg with
    | SendMessage msg ->
        // Add message to the list & continue
        return! loop (msg :: messages)

    | GetContent reply ->
        // Generate HTML with messages
        let sb = new StringBuilder()
        sb.Append("<ul>\n") |> ignore
        for msg in messages do
          sb.AppendFormat(" <li>{0}</li>\n", msg) |> ignore
        sb.Append("</ul>") |> ignore
        // Send it back as the reply
        reply.Reply(sb.ToString())
        return! loop messages }
  loop [] )

agent.Post(SendMessage "Welcome to F# chat implemented using agents!")
agent.Post(SendMessage "This is my second message to this chat room...")



The code above creates a single type (ChatRoom) that encapsulates all of the functionality required to “post” and “receive” messages from a MailboxProcessor – effectively mimicking the back and forth chat messages of a chat room.  Further code shows how this can be exposed over a webpage by utilising a HttpListener with another type:


let root = @"C:\Temp\Demo.ChatServer\"
let cts = new CancellationTokenSource()

// HttpListener.Start, request.InputString and response.Reply are helper
// extensions from the demo (not shown here), not part of the .NET framework
HttpListener.Start("http://localhost:8082/", (fun (request, response) -> async {
  match request.Url.LocalPath with
  | "/post" ->
      // Send message to the chat room
      room.SendMessage(request.InputString)
      response.Reply("OK")
  | "/chat" ->
      // Get messages from the chat room (asynchronously!)
      let! text = room.AsyncGetContent()
      response.Reply(text)
  | s ->
      // Handle an ordinary file request
      let file =
        root + (if s = "/" then "chat.html" else s.ToLower())
      if File.Exists(file) then
        let typ = contentTypes.[Path.GetExtension(file)]
        response.Reply(typ, File.ReadAllBytes(file))
      else
        response.Reply(sprintf "File not found: %s" file) }),
   cts.Token)



This code shows how an F# type can be written to create a server which listens on a specific HTTP address and port and accepts messages to URL endpoints as part of the HTTP payload.  These messages are stored within the internal MailboxProcessor and subsequently retrieved to display on the webpage.  We can imagine two (or more) separate users with the same webpage open in their browsers, with each person’s messages both echoed back to themselves and shown in every other user’s browser.


Ian has actually coded up such a web application, with a slightly nicer UI, and ends off his demonstrations of the power of the MailboxProcessor by firing up two separate browsers on the same machine (mimicking two different users) and showing how chat messages from one user instantly and easily appear on the other user’s browser.  Amazingly, there’s a minimum of JavaScript involved in this demo, and even the back-end code that maintains the list of users and the list of messages is no more than a few screens full!
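For C# developers wondering what the nearest equivalent might look like, the core idea of the MailboxProcessor (a queue that any thread can post to, drained by a single consumer in FIFO order) can be roughly approximated with a BlockingCollection.  This is only my own sketch of the concept, not anything Ian showed, and it lacks the async niceties of the F# version; the ChatAgent class and its members are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical C# analogue of an agent: one queue, one consumer.
public class ChatAgent
{
    private readonly BlockingCollection<string> _mailbox = new BlockingCollection<string>();
    private readonly List<string> _messages = new List<string>();   // only touched by the consumer
    private readonly Task _processor;

    public ChatAgent()
    {
        // A single long-running consumer drains the mailbox, one message at a time, in FIFO order.
        _processor = Task.Factory.StartNew(() =>
        {
            foreach (var msg in _mailbox.GetConsumingEnumerable())
                _messages.Add(msg);
        }, TaskCreationOptions.LongRunning);
    }

    // Any thread may post; the BlockingCollection takes care of the synchronisation.
    public void Post(string message)
    {
        _mailbox.Add(message);
    }

    // Stops accepting messages, waits for the queue to drain and returns the message log.
    public IList<string> Complete()
    {
        _mailbox.CompleteAdding();
        _processor.Wait();
        return _messages;
    }
}

public static class AgentDemo
{
    public static void Main()
    {
        var agent = new ChatAgent();
        agent.Post("Welcome to the chat!");
        agent.Post("This is my second message...");

        foreach (var line in agent.Complete())
            Console.WriteLine(line);
    }
}
```

Because only the single consumer ever touches the message list, the calling code never needs to take a lock of its own, which is the same property that makes the F# agent so pleasant to work with.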


Ian wrapped up his talk by recapping the power of both Type Providers and the MailboxProcessor, and how both techniques build upon your existing F# knowledge and make the consumption and processing of data incredibly easy.



After Ian’s talk it was time for the final announcements of the day and the prize give away!  We all made our way back to the main building, and to the largest room, the Tom Cowie Lecture Theatre.


After a short while all of the DDD North attendees, along with the speakers and sponsors, had assembled in the lecture theatre.  The main organiser of DDD North, Andy Westgarth, gave a short speech thanking the attendees and the sponsors.  I’d like to offer my thanks to the sponsors here also, because as Andy said, if it wasn’t for them there wouldn’t be a DDD North.  After Andy’s short speech a number of the sponsors took to the microphone to both offer their thanks to the organisers of the event and to give away some prizes!  One of the first was Rachel Hawley who had been manning the Telerik stand all day, and who led the call for applause and thanks for Andy and his team.  After Rachel had given away a prize, Steve from Tinamous was up to thank everyone involved and to give away more prizes.  After Steve had given away his prize, Andy mentioned that Steve had generously put some money behind the bar for the after event Geek Dinner that was taking place later in the evening and that everyone’s first drink was on him!   Thanks Steve!


Steve was followed by representatives from the NDC Conference, a representative from Sage and various other speakers and sponsor staff, all giving a quick speech to thank the organisers and to state how much they’ve enjoyed sponsoring such a great community event as DDD North.


Of course, each of these sponsors had prizes to give away.  Each time, Andy would offer a bag of our feedback forms which we’d submitted at the end of each session and the sponsor would draw out a winning entry.  As is usual for me, I didn’t win anything, however, lots of people did and there were some great prizes on offer, including a stack of various books, some Camtasia software licenses along with a complete copy of Visual Studio Premium with MSDN!


Andy then gave a final closing speech, thanking everyone again and telling us that, although there’s no confirmed date or location for the next DDD North, it will definitely happen and it’ll be in a North-West location, as the intention is to alternate the location each time between a North East location and one in the North West in order to cover the entire “north” of England.


And with that, another fantastic DDD North event was over…   Except that it wasn’t.  Not quite yet!   Make It Sunderland and Sunderland Software City had agreed to host a “drinks reception” at the Sunderland Software City offices!  The organisers of DDD North had laid on a free bus transfer service for the short ride from Sunderland University to the location of the Sunderland Software City offices closer to Sunderland city centre.  Since I was in the car, I made the short 10-minute drive to the Sunderland Software City offices.  Of course, being in the car meant that my drinking was severely limited.


Around 80 of the 300+ attendees from DDD North made the trip to the drinks reception and we were treated to a small bar with two hand-pulled ales from the Maxim Brewery.  One was the famous Double Maxim and the other, Swedish Blonde.  Two fine ales and they were free all night long, for as long as the cask lasted (or at least for the 2 hours that the drinks reception event lasted)!


Being a big fan of real ales, it was at this point that I was kicking myself for having brought the car with me to DDD North.  I could have relatively easily taken the Metro train service from Newcastle to Sunderland, but alas, I was not to know this fantastic drinks reception would be so great or that there would be copious amounts of real-ale on offer.  In hindsight though, it was probably for the best that my ability to drink the endless free ale was curtailed!  :)



I made my way to a comfy seating area and was joined by Phil Trelford, who had given the first talk of the day that I attended and who I had been chatting with off and on throughout the day, and also Sean Newham.  Later we were joined by another guy whose name I forget (sorry).  We chatted about various things and had a really fun time.  It was here that Phil showed us an F# Type Provider that he and his friend had written in a moment of inspiration that mimics the old “Choose Your Own Adventure” style books from the 1980s by offering up the entire story within the Visual Studio IDE!


Not only were we supplied with free drinks for the evening, we were also supplied with a seemingly endless amount of nibbles, hors d’oeuvres and tiny desserts and cakes.  These were brought to us with alarming frequency and never seemed to end!   Not that I’m complaining… Oh no.  They were delicious, but there was a real fear that the sheer amount of these lovely nibbles would ruin everyone’s appetite for the impending Geek Dinner.


There’s a tradition at DDD events to have a “geek dinner” after the event where attendees that wish to hang around can all go to a local restaurant and have their evening dinner together.  I’d never been to one of these geek dinners before, but on this occasion, I was able to attend.  Andy had selected a Chinese buffet restaurant, the Panda Oriental Buffet, mainly because it was a very short walk from the Sunderland Software City offices, and also presumably because they use Windows Azure to host their website!


After the excellent drinks reception was finished, we all wandered along the high street in Sunderland city centre to the restaurant.  It took a little while for us all to be seated, but we were all eventually in and were able to enjoy some nice Chinese food and continue to chat with fellow geeks and conference attendees.  I managed to speak with a few new faces, some guys who worked at Sage in Newcastle, some guys who worked at Black Marble in Yorkshire and a few other guys who’d travelled from Leeds.


After the meal, and with a full belly, I bid goodbye to my fellow geeks and set off back towards my car, which I’d left parked outside the Sunderland Software City offices, to head back to what was my home for that weekend: my in-laws’ place in Newcastle, a relatively short drive (approx. 30-40 minutes) away.


And so ended another great DDD event.  DDD North 2013 was superb.  The talks and the speakers were superb, and Andy and his team of helpers had, once again, arranged a conference with superb organisation. So, many thanks to those involved in putting on this conference, and of course, thanks to the sponsors without whom there would be no conference.  Here’s looking forward to another great DDD North in 2014.    I can’t wait!