DDD South West 2018 In Review
On Saturday 21st April 2018, at the Redcliffe Sixth Form centre, the 8th annual DDD South West conference took place.
This was my 3rd DDD South West conference overall and was another fantastic event in the DDD conference calendar.
I'd driven down to Bristol on the Friday evening and stayed overnight in a local hotel before making my way bright and early to the Redcliffe Sixth Form centre where the conference was held.
I'd not been able to source breakfast prior to arriving at the Redcliffe centre; however, that didn't matter, as the conference organisers and sponsors had provided us with a lovely selection of Danish pastries and coffee. An excellent start to the day.
After the initial introductions from the organisers, it was time to head off to the various lecture rooms for the first session of the day. For me, this was to be Ian Russell's Outside-In TDD.
Ian starts by asking who in the audience has read the book "Growing Object-Oriented Software Guided By Tests" as a lot of Ian's session is based upon the guidance found within. The book, and the concept of Outside-In TDD is based upon a flavour of test-driven development known as "London-school" TDD, which is contrasted with the perhaps more well-known "Classic" school of TDD (often called the Chicago or Detroit school).
The difference between the two approaches is that the "Classic" school concentrates on state-based testing and triangulation, whereas the "London" school concentrates on interaction and end-to-end testing.
Ian talks about Acceptance tests and how they test all of your code end-to-end, but ideally not the code you don't "own". This style of test, in conjunction with suites of unit tests, delivers a truly outside-in level of testing, starting at the outer boundaries of your code and ensuring that the complete feature is tested.
We look at the four phases of acceptance tests. These are fixture setup, exercise the system-under-test (SUT), result verification and fixture teardown. We also learn of the FIRST mnemonic for unit tests - Fast, Isolated, Repeatable, Self-Validating and Timely. Each unit test should follow the "triple-A" pattern (Arrange, Act, Assert) and contain one logical assertion per test (note that this is not necessarily one single assertion in code, but one logical assertion of some functionality or result). Importantly, we must treat our test code with respect. It's just as important as the production code that it's testing.
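The triple-A shape is easy to see in a small test. Here's a minimal sketch using Python's unittest (the talk itself was .NET-focused; the Account class below is a hypothetical stand-in, not code from the session):

```python
import unittest

class Account:
    """A hypothetical, minimal class used only to illustrate the pattern."""
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

class AccountTests(unittest.TestCase):
    def test_deposit_increases_balance(self):
        # Arrange: set up the fixture
        account = Account()
        # Act: exercise the system under test (SUT)
        account.deposit(100)
        # Assert: one logical assertion about the outcome
        self.assertEqual(account.balance, 100)
```

Run with `python -m unittest`. Fixture teardown, the fourth phase, is handled implicitly by the framework in this simple case.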
Ian moves on to look at mocking. We learn that the difference between a Mock and a Stub is that Mocks are primarily for verifying behaviour whilst Stubs are for returning known data (i.e. verifying state).
From here we can see that an "inside-out" approach to testing has us performing triangulation and verifying state, whereas an "outside-in" approach has us validating behaviours and interactions between our classes and functions. Importantly, we adopt a "tell, don't ask" approach to testing our code, meaning that we don't ask an object to tell us its state (as per state-based testing used in Classic TDD) but rather, we tell the object what to do and assert that our methods were called.
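The mock/stub distinction can be shown in a few lines. This is an illustrative sketch using Python's unittest.mock, not code from the session:

```python
from unittest.mock import Mock

# A stub returns known data, so we can verify state...
repo = Mock()
repo.find_balance.return_value = 100
assert repo.find_balance("acc-1") == 100   # state-based check

# ...whereas a mock verifies behaviour: we *tell* the object what
# to do and then assert that the expected interaction took place.
notifier = Mock()
notifier.send("statement printed")
notifier.send.assert_called_once_with("statement printed")
```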
Ian talks about a "walking skeleton". This is defined by Alistair Cockburn:
A Walking Skeleton is a tiny implementation of the system that performs a small end-to-end function. It need not use the final architecture, but it should link together the main architectural components. The architecture and the functionality can then evolve in parallel.
When performing Outside-In TDD, we should be building our system from a series of walking skeletons for each feature. Each "skeleton" is tested with outside-in acceptance tests and additional unit tests. We start with a failing acceptance test and then perform a standard red, green, refactor TDD cycle with our unit tests until our acceptance test passes. Start with an interface, then mock it and add unit tests, look for collaborators which can be other classes and/or methods. As you find the collaborators, create interfaces and mocks for those along with relevant unit tests. Continue building out until all collaborators have been identified and your suite of unit tests are all green. From here, your acceptance test which is testing the behaviour of the feature, should also pass.
One downside to this approach, especially when coming from a more traditional TDD background, is that the acceptance test can be "red" for a long time. This is in direct contradiction to "classic" TDD which states that no more production code must be written until the test is fixed to "go green".
To demonstrate this technique, Ian then shows us some actual code for a Bank Kata. The code is available on GitHub. The Kata consists of modelling a basic Bank with Deposit, Withdraw and PrintStatement functions. Ian starts with a simple acceptance test that performs some calls to the various methods of deposit, withdraw and print statement and asserts that the final result is as expected.
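I don't have Ian's exact code to hand, but an outside-in sketch of the kata might look something like the following (Python; the repository and printer collaborators are hypothetical names, discovered from the acceptance test and mocked here):

```python
from unittest.mock import Mock, call

class Account:
    """Hypothetical outside-in design: the Account *tells* its
    collaborators what to do rather than exposing a balance to query."""
    def __init__(self, repository, printer):
        self.repository = repository
        self.printer = printer

    def deposit(self, amount):
        self.repository.add(amount)

    def withdraw(self, amount):
        self.repository.add(-amount)

    def print_statement(self):
        for amount in self.repository.all():
            self.printer.print_line(amount)

# Acceptance-style test driving the feature from the outside in;
# real collaborator implementations emerge as the skeleton is fleshed out.
repository, printer = Mock(), Mock()
repository.all.return_value = [1000, -100]
account = Account(repository, printer)
account.deposit(1000)
account.withdraw(100)
account.print_statement()
repository.add.assert_has_calls([call(1000), call(-100)])
printer.print_line.assert_has_calls([call(1000), call(-100)])
```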
In examining the acceptance test code, we can already see the initial collaborators that would be required. The first one is an Account against which amounts can be deposited or withdrawn. We can also see that there's likely to be a requirement for persisting data and something that will print our statement so we dive deeper to understand the classes and objects that might be needed to support this.
Ian shows us his unit tests which use example-based testing rather than exhaustive property-based tests. They're asserting that methods were called with relevant input data (i.e. behaviour) rather than testing on state.
Ian asks us to ponder why we'd build our software this way. If we have an Account object, it's easy to think that we may have a GetBalance method on it. However, this is asking rather than telling, and the whole goal is to force our thinking the other way around (outside-in) and support "tell, don't ask".
Ian summarises his session. Find the collaborators, verify the interactions, TDD all the way, design at each stage and accept that your acceptance tests will be failing for a while.
After Ian's session, it's time for our first coffee break back in the main recreation room of the sixth form centre. After a quick cup of coffee, it's time to head back to the lecture rooms for the next session of the morning. This one is Steve Gordon's Docker For .NET Developers.
Steve first tells us what Docker is and explains that it's a containerization platform. So what is a containerization platform? Well, we can think of containers as a kind of lightweight Virtual Machine (VM). It's not strictly correct, but it gives an idea. Docker containers differ from VMs in that they don't contain an operating system layer. The host OS provides a shared kernel on top of which the Docker "engine" runs. On top of the engine sit the containers. These can contain a complete environment for running an application, including all of its dependencies and runtimes - everything except the underlying operating system itself.
As a result of not needing the complete operating system within the container, unlike a virtual machine, Docker containers are much more lightweight. This also means they take up far less space on disk and represent a much smaller unit of deployment. They are also more reliable and quicker to start-up.
Steve talks about the story of how his employers, Madgex, have adopted Docker. The primary reason for adoption was the creation of a new reporting application called "Insights". The application was a greenfield development which was written using a microservices-based approach, using .NET Core and ASP.NET Core with a front-end written in Vue.js. Since each discrete piece of the application's functionality was isolated in its own service, the microservices were perfect candidates for containerization.
Steve tells us how the front-end developer workflow in the legacy system was quite clunky, requiring pulling down an entire monolithic Visual Studio solution which in turn required a complete build before work could begin. With the introduction of Docker containers, front-end developer workflow simply required the "pulling" of a pre-built Docker container from an online registry and work could begin.
Docker on Windows requires Windows 10 Pro. This is due to Docker leveraging the in-built Hyper-V virtualisation platform that's included within the Pro edition of Windows. Steve tells us that there's an alternative solution to running Docker on Windows called Docker Toolbox, which leverages Oracle's VirtualBox virtualisation platform and so can be run on Windows 10 Home edition. This, however, is considered a legacy platform and can be quite tricky to get working, so Steve highly recommends upgrading to Windows 10 Pro if you want to use Docker on Windows.
Steve then shows us a quick demo of containerizing a very simple .NET Core API application. It's very quick and easy to do with the help of a simple text file called a "Dockerfile" which tells Docker the steps it needs to perform to build any required application code and how to package those build artefacts into a container. There's also an entry point to tell Docker which process should be started when the container is launched, and which Docker will monitor to ensure the health of the container.
Docker containers consist of "layers". Effectively, each section of Steve's dockerfile creates a layer and all of the layers are combined together to form the final container image. This allows containers to be rebuilt by simply rebuilding only those layers that have changed. This means that a .NET Core application, which will have an entire layer just for the .NET runtime, would not need this layer to be rebuilt when the application code changes, only the layer containing the application code would need to be rebuilt to reconstruct a new container.
As well as a dockerfile, we can have a docker-compose file. This allows "composing" multiple Docker containers into one holistic service - i.e. a non-public API and a public API may be two separate containers but are "composed" together into a single holistic artefact which is deployed and started as one logical item.
Steve continues with the concept of orchestration. He says that when you're running docker containers in production, you'll need a way of managing multiple containers to support scaling, service discovery, health monitoring and more. One very popular orchestration system is Kubernetes. Since Steve's company was still getting used to Docker itself and was also using Amazon's Web Services to host their application, they decided to leverage Amazon's Elastic Container Service (ECS) instead. This gives you orchestration-as-a-service. AWS ECS is effectively free and you only pay for the underlying EC2 virtual machine instances that run your Docker containers. You can still use auto-scaled EC2 instances with ECS, so if one EC2 VM gets too full of containers, AWS can automatically create another EC2 instance to scale-up to meet the load. The EC2 instances used with AWS ECS are created from an Amazon supplied image that contains an operating system, the Docker engine and an ECS agent. The AWS ECS agent can talk to an AWS Load Balancer to expose the container to the outside world and allow traffic to hit those containers.
Steve tells us how he performs unit tests against his container as part of the dockerfile script. This allows him to have a failing container build if any unit tests fail. An interesting way to get advanced notification of issues with not just the application, but the deployment artefacts too.
Finally, Steve tells us that Docker does have a learning curve, but that this learning curve can be shallow. He suggests starting with a very basic dockerfile and enhancing it in stages. Many teams start their transition to containers by containerizing something internal such as a build machine. Using pre-built containers from the Docker Registry can help to get started. He says that AWS ECS can be a little complex to get going with, and balancing of containers in production requires some deep thinking regarding memory and CPU requirements of the underlying virtual machine instances, but perseverance here will pay off. Steve also suggests using CloudFormation templates to ease the set-up. Steve's slides for his talk are online, and available at http://bit.ly/dockerslides.
After Steve's session was over it was time for another coffee break. This time, we were treated to some snacks and fruit along with our coffee. After another quick cup of coffee and a packet of crisps, it was time to head back to the rooms for the last of the morning's sessions. This one was Andrew Chaa's Life Of A Garage Quant.
Andrew starts by telling us a little about his background. He's a contractor who is currently working at JustEat in London, but he'd previously worked at such places as Barclays where, despite being a developer, he was surrounded by traders. It was from overhearing their conversations that he got interested in trading. Andrew tells us that his talk will be all about trading with cryptocurrencies. He says that the main objective is not necessarily to make money but not to lose it!
Andrew first talks about BitCoin. He asks whether we should be buying BitCoin given its volatile nature. Some people say yes and others say no. But Andrew considers that BitCoin could be a future currency of the world, could easily increase in value, and could change how we think about and use money. These things are all possible, but not guaranteed; however, Andrew did find that BitCoin was very amenable to trading.
We first look at two different types of trading, Quantitative and Qualitative. Andrew tells us that investors such as Warren Buffett are the quintessential Qualitative investors. They buy based on the intrinsic value of a stock and with an eye to seeing the value of the stock rise over time due to company growth. On the other side, we have Larry R. Williams, who is the quintessential Quantitative investor. He buys based upon the "movement" of a stock within a shorter period of time, without too much regard to the overall value of the company to whom the stock belongs.
Andrew continues to examine trading using BitCoin. It's available 24 hours a day with no minimum deposit, no risk of leverage (required borrowing) and can be traded in small units, but it has a high volatility. Andrew tells us that BitCoin is not an "upward moving asset", unlike a house whose value tends to rise over time. Company shares can generally be the same (so long as the company remains in business). So, this lack of upward movement will drive our strategy when trading BitCoins. When trading with BitCoin, we don't hoard or retain the asset, we only hold it for a short while. We monitor the price on a daily (or more frequent) basis, and when the price appears to be going up, we buy BitCoins. When the price starts to fall, we sell. Andrew himself holds his BitCoin assets for only around 24 hours, and never more than one week.
Andrew talks about Maximum Drawdown. This is the maximum loss from a peak to a trough of a portfolio. The maximum drawdown for BitCoin is 93%, which further highlights the volatility of BitCoin as an asset. Andrew states that one of the issues to overcome when trading is that of loss aversion. This is the psychological phenomenon that makes us value not losing some money over gaining the same amount of money. Although it's instinctive to feel this, we shouldn't listen to it.
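Maximum drawdown is straightforward to compute from a price series. A small sketch in Python (the price series in the example is made up for illustration):

```python
def max_drawdown(prices):
    """Largest peak-to-trough fall, expressed as a fraction of the peak."""
    peak = prices[0]
    worst = 0.0
    for price in prices:
        peak = max(peak, price)                  # track the running peak
        worst = max(worst, (peak - price) / peak)  # fall from that peak
    return worst

# A made-up series that falls from a peak of 100 to a trough of 7:
assert max_drawdown([50, 100, 40, 7, 30]) == 0.93
```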
We talk about the various BitCoin exchanges where we can perform our trades. Andrew says that he uses GDAX which is a brand of CoinBase aimed at more serious traders and doesn't charge for making individual trade transactions (but does charge for withdrawing BitCoin from the exchange in the form of fiat currency).
Andrew then talks about how to define our trading strategy. He mentions backtesting which involves looking at historical data for a stock to test out theories or strategies to see the likelihood of a loss or a gain. We're reminded that even successful, professional traders have only a 30-40% "win" ratio on their trades. Of course, it's about ensuring that the gains are as large as can be, whilst the losses are minimized.
We look at the concept of a candlestick chart which helps to see the difference between the opening and closing prices on a given day. Of course, we always want to buy low and sell high, but what is a low price? We need to follow the price momentum in order to find out. It's also worth noting that, generally, when a price is high and is on an upwards trajectory, it goes up even further. The equivalent usually applies when a price is on a downwards trajectory, it tends to drop even further.
Andrew suggests using a five day moving average. If the current price is lower than the five day moving average, then sell. If higher, then buy. In calculating the moving average, Andrew waits until there's an upwards or downwards "trend" for two days in a row before actually making the trade transaction to buy or sell. This allows smoothing out anomalies where the price can dip (or rise) for only one day before bouncing back to its "normal" level.
He also uses something called a stop order. With every "buy" transaction, he calculates the price minus 20%. If the value of his stock should ever drop below this, then he will sell that stock, irrespective of the moving average as this helps to limit losses. Andrew states that he calculated his five day moving average on a daily basis, but the GDAX API provides for 15 minute updates, so he could calculate on a more frequent basis if he wanted to.
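As I understood them, Andrew's rules can be sketched roughly as follows. This is my paraphrase of the strategy in Python, not his actual code, and the exact trigger conditions are assumptions:

```python
def sma(prices, window=5):
    """Simple moving average of the last `window` prices."""
    return sum(prices[-window:]) / window

def signal(prices, window=5, confirm_days=2):
    """'buy' if the price has closed above its moving average for
    `confirm_days` consecutive days, 'sell' if below, else 'hold'."""
    def side(days_ago):
        history = prices[:len(prices) - days_ago]
        return "buy" if history[-1] > sma(history, window) else "sell"
    sides = {side(i) for i in range(confirm_days)}
    return sides.pop() if len(sides) == 1 else "hold"

def stop_hit(entry_price, current_price, stop_fraction=0.20):
    """Stop order: sell regardless of the moving average once the
    price falls more than `stop_fraction` below the entry price."""
    return current_price < entry_price * (1 - stop_fraction)
```

For example, a steadily rising series produces a "buy", a falling one a "sell", and a one-day dip against the trend a "hold" until it's confirmed.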
We briefly look at other cryptocurrencies such as LiteCoin and Ethereum. Andrew also trades using these currencies and believes that they're better for trading with than BitCoin. BitCoin, due to its popularity, can be quite a "noisy" stock (meaning that the price can fluctuate quite wildly) whereas LiteCoin and Ethereum, being less popular, are much more predictable.
Andrew tells us about his own software that he created to help him perform all of his cryptocurrency trades. It's an open-source piece of software called CoinSong, and is available on GitHub. We also examine some alternative strategies for trading. One is Daily Momentum, where we calculate the range of the price fluctuation throughout the prior day, from the lowest point to the highest. We then buy if the price is over a certain figure, usually the range multiplied by either 0.5, 0.8 or 1, and we then always sell at the end of the day. This is a successful strategy of Larry Williams.
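The daily momentum idea can be sketched too. This is one common formulation of a Williams-style range breakout, not Andrew's actual CoinSong code, and the parameter names are my own:

```python
def breakout_price(prev_low, prev_high, today_open, k=0.5):
    """Buy trigger: today's open plus k times yesterday's range
    (k is typically 0.5, 0.8 or 1)."""
    return today_open + k * (prev_high - prev_low)

def should_buy(price, prev_low, prev_high, today_open, k=0.5):
    """Buy once the price breaks above the trigger; the position
    is then always sold at the end of the day."""
    return price > breakout_price(prev_low, prev_high, today_open, k)
```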
We also look at how strategies can be combined. For example, we can use a moving average strategy along with the daily momentum. Combining these strategies will limit the possible gains, but also limit losses too. It's also possible to use certain "noise filtering" strategies in combination with all of the previously discussed strategies as a way of smoothing out the big spikes in the data.
Finally, Andrew wraps up his talk by pointing us to his GitHub account where we can find the slides for his talk.
After Andrew's talk it was time for lunch. The lunch at the DDD South West conferences is usually quite special and again this year they didn't disappoint. We were treated to a delicious hot pastie, either cheese and onion or traditional steak, along with some water and a packet of crisps. The DDD South West pasties are quite famous these days!
I took my lunch and sat on one of the many comfortable seats dotted around the main room of the sixth form centre. The day had started off quite nicely and, although there'd been a small downpour of rain, it had now brightened up again.
The conference organisers had arranged for a few quick lightning talks to take place over the lunch break, which is customary for the DDD events, however, after finishing my delicious (and very filling) lunch, I decided it was time to go for a little walk outside and make the most of the rare sunshine!
After my walk, I made my way back inside to the sixth form centre in readiness for the first of the afternoon's sessions. This one was to be Yan Cui's Serverless Design Patterns.
Yan introduces himself and says that he works for a company called DAZN which is like a "Netflix for sports". As part of his job, Yan has developed many "serverless" components as part of DAZN's overall technology solution.
We start by asking "What is Serverless?" and consider that many people will answer that there's still a server somewhere! Yan shares a quote from Gojko Adzic who says:
"It's serverless in the same way WiFi is wireless"
Yan believes that "serverless" software and components are truly the future and he shares another quote from a prominent business thinker, Simon Wardley, who says:
"serverless will fundamentally change how we think about and use technology and write code"
We look at FaaS - Functions as a service. Amazon's Web Services offers Lambda for this, and Azure has Azure Functions. You no longer have to worry about maintenance of anything in the infrastructure, only the function itself, and you only pay for what you use. Functions-as-a-service offer better scalability, are cheaper than running a server and offer resilience and redundancy all for free and out-of-the-box.
When doing serverless development, having an event-driven architecture within your solution is effectively forced on you. Your entire code is written to respond to events and messages passed around the system, rather than direct function calls. Yan states that this is a good thing.
Yan looks at a few patterns that help with the development of serverless code and applications. The first pattern is CRON. Yan uses an example for AWS and states that we can use AWS's CloudWatch events to invoke an AWS Lambda function to perform some other actions within our infrastructure. This could be something like spinning up additional servers. This pattern is frequently used by development teams to manage cloud costs by shutting down development and QA environments when no one is going to be using them (usually overnight), and spinning them back up the next morning.
Next we look at AWS Cognito, which offers federated identities, and can work to make all of your authentication and authorisation for your application serverless. We look at AWS S3 storage and see how we can create "data lakes" within S3 buckets. AWS's machine learning services can then process that data. If we have a lot of data that needs to be constantly fed into the data lake, we can make use of AWS Kinesis to create a firehose of data that "streams" into S3. From here, we can use AWS's Athena interactive query service to almost instantly analyse that data.
Yan moves on to the next pattern to examine, and that is Background Processing. We look at how, when a client calls your API, you can usually return a result immediately; however, if there is some heavy processing needed as part of that API call, we should return a 202 status (Accepted) along with a new location, after dispatching a background worker to do the actual work. The client then polls the new location which is, ultimately, where the result of the background processing task will be.
We should use a created date on the message that gets sent to the background processor. This allows us to ensure that, if something goes catastrophically wrong, the client can still be notified that they're never going to get a result, rather than have them poll the result location forever. We can use AWS API Gateway to invoke a Lambda function that writes messages to an AWS Simple Queue Service (SQS) queue. A background process can poll for these messages and perform the required actions when they arrive. Unfortunately, it's not currently possible to invoke an AWS Lambda directly from the receipt of an SQS message, but this is functionality that is apparently coming soon from Amazon. Yan suggests that, although AWS's SNS (Simple Notification Service) exists, it's a poor choice for invoking background processes from messages, as spikes of load are then pushed to downstream systems. It's far better to amortise messages so that the message handling process can group multiple copies of the same message into one single message that is subsequently passed to other downstream systems.
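The shape of the background-processing pattern can be sketched with in-memory stand-ins. In the AWS setup described, the pieces below would be API Gateway, SQS and a result store; here they are plain Python data structures for illustration only:

```python
import time
import uuid

# In-memory stand-ins for the queue and the result store.
queue, results = [], {}

def handle_request(payload):
    """API handler: accept the work, enqueue it, and return 202 plus a
    location to poll. The created_at timestamp lets pollers detect and
    give up on jobs that will never complete."""
    job_id = str(uuid.uuid4())
    queue.append({"id": job_id, "payload": payload,
                  "created_at": time.time()})
    return 202, f"/results/{job_id}"

def background_worker():
    """Worker: drain the queue and publish results for pollers to find."""
    while queue:
        msg = queue.pop(0)
        results[msg["id"]] = msg["payload"].upper()  # the "heavy" work

def poll(location):
    """Client side: poll the location until the result appears."""
    job_id = location.rsplit("/", 1)[-1]
    return (200, results[job_id]) if job_id in results else (404, None)
```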
We move on to look at another serverless design pattern. This one is PubSub, or Publisher-Subscriber. AWS's SNS service is a good choice for this as subscribers will frequently want to receive all messages created by the publisher. AWS Kinesis is also a good choice to implement PubSub as it can make invocations to multiple lambda functions, and perform retries in the event that a given Lambda function fails, and collate the results from all functions into one.
In looking at AWS Lambda functions, we need to be careful with having Lambda invoke other AWS Services as failures can be quite messy. For example, if a downstream process invoked from the Lambda fails, is the Lambda responsible for performing retries (or resource clean-up) against the service? It's not always an easy or clear choice.
Our last pattern to look at is Sagas. These are for long running processes and transactions. A good example of this is booking a holiday. We need to book a flight first, then maybe a car, and then the hotel. Each booking is a discrete event in itself; however, the entire process is effectively wrapped in an implicit transaction. If we fail at any point within the process, we'll need to roll back each individual booking to cancel out of the entire process. AWS provides Step Functions, which are a great fit for implementing Sagas. Each "step" within a step function configuration can have its own retry policy as part of the overall workflow, and step functions have a very long lifetime. They only "time out" after one year!
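The saga idea itself is simple to sketch. A minimal Python version (the booking steps and compensations below are placeholders, and a real implementation would use Step Functions as described):

```python
class Saga:
    """Run steps in order; on failure, run the compensating actions
    for the steps that already succeeded, in reverse order."""
    def __init__(self):
        self.steps = []        # (action, compensation) pairs
        self.completed = []    # compensations for completed steps

    def add_step(self, action, compensation):
        self.steps.append((action, compensation))

    def run(self):
        for action, compensation in self.steps:
            try:
                action()
                self.completed.append(compensation)
            except Exception:
                for comp in reversed(self.completed):
                    comp()     # roll back what already happened
                return False
        return True

# Example: the car hire fails, so the flight booking is compensated.
booked = []
saga = Saga()
saga.add_step(lambda: booked.append("flight"),
              lambda: booked.remove("flight"))
def book_car():
    raise RuntimeError("no cars available")
saga.add_step(book_car, lambda: None)
saga.run()   # returns False and "flight" is removed from booked
```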
Finally, Yan shares a tip for improving cold start times on AWS Lambda functions, something that many of the attendees of Yan's talk have had issues with in the past. He says the best thing to do is to look at ways to reduce the memory requirements of the function itself. Memory requirements of AWS Lambda functions are closely tied to the CPU requirements also, so a bigger memory requirement means that AWS will allocate sufficient memory along with more CPU. This increased resource allocation can take time, so reduction of memory requirements also reduces CPU requirements and means that AWS can use smaller underlying resources which can be started up quicker.
After Yan's talk, it was time for the afternoon break. Another DDD South West tradition is cream teas in the afternoon, and again we were not disappointed this year.
There were plenty of cream teas to go around, thanks to the sponsors of the event, so we all enjoyed at least a couple of scones with our afternoon tea and coffee.
After the afternoon break was over, it was time for the final session of the day. This one was to be Jess Panni's Lessons learned ingesting, securing and processing 200 million messages per day.
Jess introduces himself and says that he works for Endjin, one of the sponsors of the DDD event itself. Endjin are a consultancy who work with numerous clients and build varied technological solutions for them. As part of this work, Jess's talk tells us how a solution for one of their clients involved building a system that could scale to process millions of messages per day.
The scale of the system was to process 200 million messages per day. This was 60GB of data per day and 21.3TB of data per year. It was a fairly big system! Since this system was hosted in the cloud, Jess talks about the infrastructure and cloud services that they used to accomplish this. They looked at Azure Event Hubs. This is a service that can stream "cloud-scale" amounts of data, so it seemed like a logical choice. They then needed to store this data once ingested. The client knew that they wanted to retain all of this data, but didn't (yet) know what they wanted to do with it, so Azure Data Lake was chosen as a way to retain all of the ingested data. Azure Data Lake also allows ad-hoc querying of the data, so it was doubly useful.
Jess tells us that Endjin take a "swiss cheese" approach when working with their clients. They look at what the worst possible things that can happen are and then design barriers that act as layers of defence to help prevent and mitigate those possibilities. Application Insights is used at each stage within ingest, prepare, analyse, publish, and the consumption of "slices" of data, to provide monitoring and telemetry, helping to detect issues with the infrastructure.
Jess talks about one of the big problems that they needed to overcome with certain clients: clients who refuse to allow automated deployments into the cloud. In this case, it's difficult to keep secrets and certificates secure. They overcame this by building a "certificate exchange". This is a system that runs on-premise and requires two people to log on and authenticate with it in order for it to talk to an Azure-hosted API and securely request the required certificate from Azure. The certificate is sent back and stored within the secure certificate store in the on-premise system. This then acts as a "relay" which, after the previous authentication dance, can securely send data from the on-premise system to the Azure Event Hub using the AMQP protocol.
In order to achieve the required throughput, they used multiple instances of "event senders" within the on-premise system that send to multiple different partitions within the Azure Event Hubs. Jess equates this to lanes on a motorway. Increasing the number of lanes, increases the amount of traffic that can be sent at the same time. Azure Event Hubs can have up to 32 partitions on the standard payment tier, but Jess explains that it's possible to get this extended by "asking Microsoft nicely". Of course, this would only be required if you needed serious throughput, even more than that of Endjin's requirements.
Another aspect of the system was encryption. They required encryption of the data at rest and, at the time of building, Azure Event Hubs didn't support such encryption, so they had to encrypt the data on-premise and push encrypted data to the Azure Event Hub. This, in turn, invoked an Azure Function which then decrypted the data again before sending it along to the Azure Data Lake, whereupon it could be encrypted again using the built-in support that Azure Data Lake has for encrypting its data.
Jess tells us how they actually built their code that orchestrated most of this. They made heavy use of the Task Parallel Library (TPL) and its Dataflow components, to add large amounts of parallelism and concurrency when processing the data. They created local buffers to batch their data into sizeable chunks and push the batch of data to Azure in an efficient way. Jess talks about some of the security considerations in their own code. Because they stored some required secrets inside Azure KeyVault, they needed to cache retrieved keys locally for some amount of time as Azure KeyVault will limit the amount of requests for a secret/key that can be made within a certain timeframe. Without this caching, the retrieval of these secrets could easily have become a real bottleneck within the application.
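The secret-caching idea is essentially a time-to-live cache in front of the rate-limited store. A sketch in Python (the class and parameter names are my own, not Endjin's code, and `fetch` stands in for whatever callable actually retrieves the secret from Key Vault):

```python
import time

class SecretCache:
    """Cache secrets for `ttl` seconds so a rate-limited store (such as
    Azure Key Vault) isn't hit on every request."""
    def __init__(self, fetch, ttl=300.0):
        self.fetch = fetch
        self.ttl = ttl
        self._cache = {}   # name -> (value, fetched_at)

    def get(self, name):
        entry = self._cache.get(name)
        if entry and time.monotonic() - entry[1] < self.ttl:
            return entry[0]                       # still fresh
        value = self.fetch(name)                  # expired or missing
        self._cache[name] = (value, time.monotonic())
        return value
```

Without something like this, every worker in the highly parallel pipeline would hammer Key Vault directly and quickly hit its request limits.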
Finally, Jess talks about Azure Data Lake Analytics, which is a serverless analytics processor for Azure Data Lake storage. It's a logical way to analyse and massively parallel process data, allowing highly complex queries to be performed in relatively quick time.
So, what are the lessons that were learned? Jess says that it's important to design your Azure Data Lake Storage data taxonomy for discoverability, security and processing. It's important to minimise access by assigning permissions at the folder level and to use Azure Active Directory groups for security.
After Jess's session was over, it was time for all the attendees to gather in the main sixth form recreation area for the final wrap-up of the day.
There were some final messages and the announcement that there would be pizza and drinks for those who wished to partake over at the JustEat offices, a short walk away. There were the usual prize giveaways too. Unfortunately, I didn't win anything, but there were some very nice prizes which were won by some very happy attendees.
And after that, it was all over. Another brilliant DDD South West event. Many thanks and congratulations to all involved in putting on such a great conference. I'd had a great time, and I'm really looking forward to doing it all again next year!