Setting up Jenkins on Windows with Git, Mercurial and SSH

This guide will detail the steps required to correctly setup and configure Jenkins on Windows using both Git and Mercurial as the version control tools and using SSH with both in order to authenticate with repositories hosted on the BitBucket service.

  • Download and install Jenkins, Git & Mercurial to their default locations. Ensure you get the 64-bit versions of all of these tools.
  • First, we need to create an SSH key pair, using OpenSSH which comes bundled with Git, that will allow Git to communicate with Bitbucket via SSH.
  • Next, we’ll configure OpenSSH (which is used by Git), so follow the steps under the section “Set Up SSH for Git” from here: https://confluence.atlassian.com/bitbucket/set-up-ssh-for-git-728138079.html
  • This primarily involves creating a new ssh keypair from a Git Bash shell, using ssh-keygen, ensuring the resulting keys are stored in your user’s “home” directory (which on Windows is usually, C:\Users\xxxxx\ - where xxxx is your logged on Windows username) within an .ssh directory and that you have a config file within that same folder that tells Git/SSH which key to use for a specific host (i.e. Host bitbucket.org
    IdentityFile ~/.ssh/<privatekeyfile>
    )
  • Next, we need to configure Mercurial. Since Mercurial is a more “windows-y” tool, by default, it wants to use PuTTY (and its related tools of Plink and Pageant), however, we’re going to tell Mercurial to use OpenSSH instead. Normally, you would edit a mercurial.ini file inside the installation folder of TortoiseHg (usually, C:\Program Files\TortoiseHg\) however, this didn’t seem to work for me, as Hg insisted on pulling the required config from a different file which constantly overrode anything I had set in the mercurial.ini file! The file in question that is likely to need to be edited is C:\Program Files\TortoiseHg\hgrc.d\Paths.rc Within this file, you’ll need to add or amend the [ui] section to configure ssh:
    [ui]
    ssh = ssh -2 -C -x
  • Note that the above assumes that the path for SSH (which, since it’s installed with Git, is usually C:\Program Files\Git\usr\bin. If this path isn’t added to your PATH environment variable for SYSTEM (not for a specific user) then you’ll need to add it. Open Control Panel, go to “System”, click “Advanced System Settings” in the left-hand menu, then click the button “Environment Variables” on the resulting dialog. Remember to edit the System Variables not the ones for your user).
  • Jenkins, when installed on Windows, is by default configured to run as a Windows Service. The service is configured to run under the Local System account. As a result of this, Mercurial will invoke OpenSSH in this context, and so OpenSSH will now look to the Local System account’s home directory for it’s SSH keys (Git seems perfectly happy looking for the keys in the user’s home folder). So, take the entire .ssh folder from the user’s home folder (the same folder as used earlier when creating the ssh-keys initially) and copy them to the Local System account’s home folder. Where is this? Well, it’s not within the C:\Users\ area, not even as a hidden/system folder, oh no, that would be too logical. So, Windows, in it’s infinite wisdom decides to place the Local System account’s home folder here: C:\Windows\System32\config\systemprofile\
  • After this, you should be able to start Jenkins and add some build jobs. When creating the jobs, you’ll set the relevant Source Code Management to wither Git or Mercurial, and you’ll specify an ssh:// protocol address for the Repository URL . Note that you do NOT need to specify anything within the credentials (i.e. leave it set to it’s default value of --none--) section here as when Jenkins “shells” out to the relevant SCM tool, that tool will be given an ssh:// address to connect to and that tool’s configuration will look to see how it is configured to access ssh:// URLs. It’s that configuration that will provide the necessary credentials to connect to the remote repository.

Beware NuGet’s Filename Encoding!

The other day, I was troubleshooting some issues that had occurred on a deployment of some code to a test server.  Large parts of the application were simply not working after deployment, however, the (apparently) same set of code and files worked just fine on my local development machine.

After much digging, the problem was finally discovered as being how NuGet handles and packages files with non-standard characters in the filename.

It seems that NuGet will ASCII-encode certain characters within filename, such as spaces, @ symbols etc.  This is usually not a problem as NuGet itself will correctly decode the filenames again when extracting (or installing) the package, so for example, a file named:

READ ME.txt

within your solution will be encoded inside the .nupkg file as:

READ%20ME.txt

And once installed / extracted again using NuGet will get it’s original filename back.  However, there’s a big caveat around this.  We’re told that NuGet’s nupkg files are “just zip files” and that simply renaming the file to have a .zip extension rather than a .nupkg extension allows the file to be opened using 7-Zip or any other zip archive tool.  This is all fine, except that if you extract the contents of a nupkg file using an archiving utility like 7-zip, any encoded filenames will retain their encoding and will not be renamed back to the original, correct, filename!

It turns out that my deployment included some manual steps which included the manual extraction of a nupkg file using 7-Zip.  It also turns out that my nupkg package contained files with @ symbols and spaces in some of the filenames.  These files were critical to the functioning of the application, and when manually extracting them from the package, the filenames were left in an encoded format meaning the application could not load the files as it was looking for the files with their correct (non-encoded) filenames.

SQLBits 2016 In Review

IMG_20160507_072440On 7th May 2016 in Liverpool, the 15th annual SQLBits event took place in the new Liverpool Exhibition Centre.  The event had actually been running since Wednesday 4th, however, as with all other SQLBits events, the Saturday is a free, community day.

This particular SQLBits was rather special, as Microsoft had selected the event as the UK launch event for SQL Server 2016.  As such the entire conference had a very large Microsoft presence.

Since the event was in my home town, I didn’t have too far to travel to get to the venue.  That said, I did have to set my alarm for 6am (a full 45 minutes earlier than I usually do on a working weekday!) to ensure I could get the two different trains required to get me to the venue in good time.  The Saturday day is jam packed with content and as such, the event opened at the eye-watering time of 7:30am!

IMG_20160507_072440After arriving at the venue just as it was opening at 7:30am, I heading straight to the registration booth to confirm my registration and collect my conference lanyard.  Once collected, it was time to head into the main hall.  The theme for this years SQLBits was “SQLBits in Space” so the entire hall had the various rooms where the sessions would take place as giant inflatable white domes.  In between the domes and around the main hall there was plenty of space and sci-fi themed objects.

After a short while, the venue staff started to wheel out the morning refreshments of tea & coffee, shortly followed by the obligatory bacon, sausage and egg sandwiches!

After enjoying the delicious breakfast, it was soon time to head off the the relevant “dome” for the first session of the day.  The SQLBits Saturday event had 9 different tracks, so choosing what talk to attend was difficult and there was always bound to be clashes of interesting content throughout the day.  For the first session, I decided to attend Aaron Bertrand’s T-SQL: Bad Habits and Best Practices.

Aaron’s talk is all about the various bad habits that we can sometimes pick up when writing T-SQL code and also the myths that have built up around certain approaches to achieving specific things with T-SQL.  Aaron starts by stating that we should ensure that we don’t make blind assumptions about anything in SQL Server.  We can’t always say that a seek is better than a scan (or vice-versa) or that a clustered index is better than a non-clustered one.  It always depends.  The first big myth we encounter is that it’s often stated that using SELECT * when retrieving all columns from a database is bad practice (instead of naming all columns individually).  This can be bad practice as we don’t know exactly what columns we’ll be getting – e.g. future added columns will be returned in the query, however, it’s often stated that another reason it’s bad practice is due to SQL Server having to look up the database meta data to figure out the column names.  The reality is that SQL Server will do this anyway, even with named columns!

Next, Aaron shows us a little tip using SQL Server Management Studio.  It’s something that many audience members already knew, bit it was new to me. He showed how you can drag-and-drop the “Columns” node from the left-hand treeview into a query window and it will add a comma-separated list of all of the tables columns to the query text!

Aaron continues by warning us about omitting explicit lengths from varchar/nvarchar data types.  Without specifying explicit lengths, varchars can very easily be truncated to a single character as this simple T-SQL shows:

DECLARE @x VARCHAR = 'testing'; 
SELECT [myCol] = @x;

We’re told that we should always use the correct data types for our data!  This may seem obvious, but many times we see people storing dates as varchars (strings) simply to ensure they can preserve the exact formatting that they’re using.  This is a presentation concern, though, and doing this means we lose the ability to perform correct sorting and date arithmetic on the values.  Also, avoid using datatype such as MONEY simply because it sounds appropriate.  MONEY is a particularly bad example and should always be replaced with decimal

Aaron reminds us to always explicitly use a schema prefix when referencing tables and SQL Server objects within our queries (i.e. Use [dbo].[TableName] rather than just [TableName]).  Doing this ensure that, if two different users of our query have different default schemas, there won’t be any strange potential side-effects to our query.

We’re reminded not to abuse the ORDER BY clause.  Using ORDER BY with an Ordinal column number after it can easily break if columns are added, removed or their order in the schema altered.  Be aware of the myth that tables have a “natural order”, they don’t.  Omitting an ORDER BY clause may appear to order the data the same way each time, however, this can easily change if additional indexes are added to the table.

We should always use the SET NOCOUNT ON directive as this cut down on noisy chatter in our application’s communication with SQL Server, but make sure you always test this first.  Applications built using older technologies, such as the original ADO from the Classic ASP era can be reliant upon the additional count message being returned when NOCOUNT is off.

Next, Aaron highlights the cost of poorly written date / range queries.  He tells us that we shouldn’t use non-sargable expressions on a column – for example, if we use a WHERE clause which does something like WHERE YEAR([DateCoulmn]) = 2016, SQL Server will not be able to utilise any indexes that may exist on that column and will have to scan the entire table to compute the YEAR() function for the date column in question – a very expensive operation.  We’re told not use use the BETWEEN keyword as it’s imprecise – does BETWEEN include the boundary conditions or only everything between them?  It’s far better to explicitly use a greater than and less than clause for date ranges – e.g.  WHERE [OrderDate] > ‘1 Feb 2016’ AND [OrderDate] < '1 March 2016'. This ensures we’re not incorrectly including outlying boundary values (i.e. midnight on 28th Feb which is actually 1st March!).  Regarding dates, we should also be aware of date format strings.  Formatting a date with many date format strings can give entirely different values for different languages. The only two “safe” format strings which work the same across all languages are YYYYMMDD and the full ISO 8601 Date format string, “YYYY-MM-DDTHH:MM:SS”.

Aaron continues by reminding us to use the MERGE statement wisely.  We must remember that it effectively turns two statements into one, but this can potentially mess with triggers, especially if they rely on @@ROWCOUNT.  Next up is cursors.  We shouldn’t default to using a cursor if we can help it.  Sometimes, it can be difficult to think in set-based terms to avoid the cursor, but it’s worth the investment of time to see if some computation can be performed in a set-based way.  If you must use a cursor, it’s almost always best to apply the LOCAL FAST_FORWARD qualifier on our cursor definition as the vast majority of cursors we’ll use are “firehose” cursors (i.e. we iterate over each row of data once from start to end in a forward-only manner).  Remember that applying no options to the cursor definition effectively means the cursor is defined with the default options, which are rather “heavy-handed” and not always the most performant.

We’re reminded that we should always use sp_executesql when executing dynamic SQL rather than using the EXEC() statement.  sp_executesql allows the use of strongly-typed parameters (although unfortunately not for dynamic table or column names) which reduces the chances of SQL injection.  It’s not complete protection against injection attacks, but it’s better than nothing.  We’re also reminded not use to CASE or COALESCE in sub-queries.  COALESCE turns into a CASE statement within the query plan which means SQL Server will effectively evaluate the inner query twice.  Aaron asks that we remember to use semi-colons to separate our SQL statements.  It protects against future edits to the query/code and ensures atomic statements continue to operate in that way.

Aaron says that we should not abuse the COUNT() function.  We very often write code such as:

IF (SELECT COUNT(*) FROM [SomeTable]) > 0 THEN …..

when it’s really much more efficient to write:

IF EXISTS (SELECT 1 FROM [SomeTable]) THEN….

We don’t really need the count in the first query so there’s no reason to use it.  Moreover, if you do really need a table count, it’s much better to query the sys.partitions table to get the count:

-- Do this:
SELECT SUM(rows) FROM sys.partitions where index_id IN (0,1)
AND object_id = (SELECT object_id FROM sys.tables WHERE name = 'Addresses')
-- Instead of this:
SELECT COUNT(*) FROM Addresses

Aaron’s final two points are to ensure we don’t overuse the NOLOCK statement.  It’s a magic “go-faster stripes” turbo button for your query but it will produce inaccurate results.  This is fine if, for example, you only  need a “ballpark” row count, however, it’s almost always better to use a scope-levelled READ COMMITED SNAPSHOT isolation level for your query instead.  This must be tested, though, as this can place a heavy load on the tempdb.  Finally, we should remember to always wrap every query we do with a BEGIN TRANSACTION and a COMMIT/ROLLBACK transaction.  Remember – SQL Server doesn’t have an “undo” button!  And it’s perfectly fine to simply BEGIN a transaction when writing ad-hoc queries in SQL Server Management Studio, even if we don’t explicitly close it straight away.  The transaction will remain so long as the connection remains open, so we can always manually perform the commit or the rollback at a slightly later point in time.

And with that, Aaron’s session was over.  An excellent and informative start to the day.

IMG_20160507_073553After a coffee break, during which time there was some left over breakfast bacon sandwiches available for those people who fancied a second breakfast, it was time to head off to the next session.  For this one, I’d chosen something a little leftfield.  This wasn’t a session directly based upon technology, but rather was a session based upon employment within the field of technology.  This was Alex Whittle’s Permy, Contractor Or Freelance.

Alex’s session was focused on how we might get employed within the technology sector, the various options open to us in gaining meaningful employment and the pros and cons associated with each approach.

Alex starts his talk by introducing himself and talking us through his own career history so far.  He started as an employee developer, then a team lead and then director of software before branching out on his own to become a contractor.  From there, he became a freelancer and finally started his own consultancy company.

Alex talks about an employer’s expectations for the various types of working relationship.  For permanent employees, the focus is very much on your overall personality, attitude and ability to learn.  Employers are making a long term bet with a permanent employee.  For contractors, it’s your existing experience in a given technology or specific industry that will appeal most to the client.  They’re looking for someone who can deliver without needing any training “on-the-job” although you’ll get time to “figure it out” whilst you’re there.  You’ll also have “tech-level” conversations with your client, so largely avoiding the politics that can come with a permanent role.  Finally, as a freelancer, you’ll be engaged because of your technical expertise and your ability to deliver quickly.  You’re expected to be a business  expert too and you’re engagement will revolve around “senior management/CxO” level conversations with the client.

Alex moves on to discuss the various ways of marketing yourself based upon the working relationship.  For permanent employees its recruitment agencies, LinkedIn and keeping your CV up to date.  You’re main marketing point is your stability so you’re CV needs to show a list of jobs with good lengths of tenure for each one.  One or two shorter tenures is acceptable, but you’ll need to be able to potentially explain it well to a prospective employer.  For contractors, it’s much the same avenues for marketing, recruitment agencies, LinkedIn and a good CV, but here the focus is quite different.  Firstly, a contractor’s CV can be much longer than a permanent employee’s CV, which is usually limited to 3 pages.  A contractors CV can be up to 4-6 pages long and should highlight relevant technical and industry experience as well as show contract extensions and renewals (although older roles should be in summary only).  For freelancers, it’s not really about your CV at all.  Clients are now not interesting in you per-say, they’re interested in your company.  This is where company reputation and you’re ability to really sell the company itself has the biggest impact.  For all working relationships, one of the biggest factors is networking.  Networking will lead to contacts, which will lead to roles.  Never underestimate the power of simply speaking to people!

We then move on to talk about cash flow in the various types of working relationship.  Alex states how for permanent employees, there’s long term stability, holiday and sickness pay and also a pension.  It’s the “safest” and lowest stress option.  For contractors, cash flow has medium term stability.  There’s no holiday or sickness pay and you’d need to pay for your own pension.  You need to build a good cash buffer of at least 6 months living expenses, but you can probably get started on the contracting road with only 1 or 2 months of cash buffer.  Finally, the freelance option is the least “safe” and has the least stability of cash flow.  It’s often very “spiky” and can range of short periods of good income interspersed with longer periods of little or no income.  For this reason, it’s essential to build a cash buffer of at least 12 months living expenses, although the quieter income periods can be mitigated by taking on short term contracts.

Alex shares details on the time when he had quit his permanent job to go contracting.  He says he sent around 20-30 CV’s to various contract job per week for the first 3 weeks but didn’t get a single interview.  A helpful recruiter eventually told him that it was probably largely to do with the layout of his CV.  This recruiter spent 30 minutes with him on the phone, helping him to reformat his CV after which he sent out another 10 CV to various contract roles and got almost 10 interviews as a result!

We continue by looking into differences in accounting structures between the various working types.  As a permanent employee, there’s nothing to worry about at all here, it’s all sorted for you as a PAYE employee.  As a contractor, you’ll send out invoices usually once a month, but since you’ll rarely have more than one client at a time, the invoicing requirements are fairly simple.  You will need to do real-time PAYE returns as you’ll be both a director and employee of your Ltd. company and you’ll need to perform year-end tax returns and quarterly VAT returns, however, you can use the flat-rate VAT scheme if it’s applicable to you.  This can boost your income as you charge your clients VAT at 20% but only have to pay 14.5% to HMRC!  As a freelancer, you’ll also be sending out invoices, however, you may have more than one client at a time so you may have multiple invoices per month thereby requiring better management of them (such software as Xero or Quickbooks can help here).  One useful tip that Alex shares at this point is that, as a freelancer, it can be very beneficial to join the Federation of Small Businesses (FSB) as they can help to pay for things like tax investigations, should you ever receive one.

Alex then talks about how you can, as an independent contractor, either operate as a sole-trader, work for an umbrella company, or can run your own Limited company.  Limited company is usually the best route to go down as Limited companies are entirely separate legal entities so you’re more protected personally (although not from things like malpractice), however, the previous tax efficiency of company dividends that used to be enjoyed by Ltd’s no longer applies due to the loophole in the law being closed.  As a sole trader, you are the company – the same legal entity, so you can’t be VAT registered and your not personally protected from liability.  When working for an umbrella company, you become a permanent employee of the umbrella company. They invoice on your behalf and pay your PAYE.  This affords you the same protection as any other employee and takes away some of the management of invoicing etc. however, this is probably the least cost efficient way of working since the umbrella company will take a cut of your earnings.

We then move onto the thorny issue of IR35. This is legislation that designed to catch contractors who are really operating more as “disguised employees”.  IR35 is constantly evolving and application by HMRC can be inconsistent.  The best ways to mitigate being “caught” inside of IR35 legislation are to perform tasks that an employee does not do.  For example, advertising your business differentiates you from an employee, ensuring your contracts have a “right of substitution” (whereby the actual worker/person performing the work can be changed), having multiple contracts at any one time – whilst sometimes difficult for a contractor to achieve - can greatly help, showing that you are taking on risk (especially financial risk) along with being able to show that you don’t receive any benefits from the engagement as an employee would do (for example, no sick pay).

Finally, Alex asks, “When should you change?”  He puts a number of questions forward that we’d each need to answer for ourselves.  Are you happy with your current way of working?  Understand the relative balance of income versus stress from the various working practices.  Define your goals regarding work/life balance.  Ask yourself why you would want to change, how do you stand to benefit?  Where do you live?  Be aware that very often, contracts may not be readily available in your area, and that you may need to travel considerable distance (or even stay away from home during the working week), and finally, Alex asks that you ask yourself, “Are you good enough?”.  Alex closes by re-stating the key takeaways.  Enjoy your job, figure out your goals, increase your profile, network, remember that change can be good – but only for the right reasons, and start now – don’t wait.

IMG_20160507_105119

After another coffee break following Alex’s session, it’s time for the next one.  This one was Lori Edwards’ SQL Server Statistics – What are the chances?

Lori opens by asking “What are statistics?”.  Just as Indexes provide a “path” to find some data, usually based upon a single column, statistics contain information relating to the distribution of data within a column across the entire set of rows within the table.  Statistics are always created when you create an index, but you can create statistics without needing an index.

Statistics can help with predicates in your SQL Server queries.  Predicates are the conditions within your WHERE or ORDER BY clauses.  Statistics contain information about density, which refers to the number of unique values in the column along with cardinality which refers to the uniqueness of a given value.  There’s a number of different ways to create statistics, you can simply add an index, you can use AUTO CREATE STATISTICS and CREATE STATISTICS directives as well as using a system stored procedure, sp_createstats.  If you’re querying on a column, statistics for that column will be automatically created for you if they don’t already exist, however, if you anticipate heavy querying utilising a given column, it’s best to ensure that statistics are created ahead of time.

Statistics are quite small and don’t take up as much space as indexes.  You can view statistics by running the sp_helpstats system stored procedure or you can query the sys.stats system table or even the sys.dm_db_stats table.  The best way of examining statistics, though, is to use the database console command, DBCC SHOW_STATISTICS.  When viewing statistics, low density values indicate a low level of uniqueness.  Statistics histograms show a lot of data, RANGE_HI_KEY is the highest key value, whilst RANGE_ROWS indicates how many rows there are between the HI_KEYS in different column values.

The SQL Server Query Optimizer uses statistics heavily to generate the optimized query plan.  Note, though, that optimized query plans are necessarily optimal for every situation, they’re the most optimal general purpose plans.  It’s purpose is to come up with a good plan, fast, and statistics are necessary for this to be able to happen.  To make the most of the cardinality estimates from statistics, it’s best ensure you use parameters to queries and stored procedures, use temp tables where necessary and keep column orders consistent.  Table variables and table-valued parameters can negatively affect cardinality.  Whether the query optimizer selects a serial or parallel plan can be affected by cardinality, as can the choice to use an index seek versus an index scan.  Join algorithms (i.e. hash match, nested loops etc.) can also be affected.

From here, the query optimizer will decide how much memory it thinks it needs for a given plan, so memory grants are important.  Memory grants are effectively the cost of the operation multiplied by the number of rows that the operation is performed against, therefore, it’s important for the query optimizer to have accurate row count data from the statistics. 

2016-06-07 21_42_25-I5xLz.png (766×432)One handy tip that Lori shares is in interpreting some of the data from the “yellow pop-up” box when hovering over certain parts of a query plan in SQL Server Management Studio.  She states how the “Estimated Number Of Rows” is what the table’s statistics say there are, whilst the “Actual Number Of Rows” are what the query plan actually encountered within the table.  If there’s a big discrepancy between these values, you’ll probably need to update the statistics for the table!

Statistics are automatically updating by SQL Server, although, they’re only updated after a certain amount of data has been added or updated within the table.  You can manually update statistics yourself by calling the sp_updatestats system stored procedure.

By default, tables inside a database will have AUTO UPDATE STATISTICS switched on, which is what causes the statistics to be updated automatically by SQL Server occasionally – usually after around 500 rows or 20% of the size of the table have been added/modified.  It’s usually best to leave this turned on, however, if you’re dealing with a table that contains a very large number of rows and has either many new rows added or many rows modified, it may be better to turn off the automatic updating of statistics and perform the updates manually after either a specific number of modifications or at certain appropriate times.

Finally, it’s important to remember that whenever statistics are updated or recomputed, any execution plans built on those statistics that were previously cached will be invalidated.  They’ll need to be recompiled and re-cached.

After Lori’s session, there’s another quick coffee break, and then it’s on to the next session.  This one was Mark Broadbent’s Lock, Block & Two Smoking Barrels.  Mark’s session focused on SQL Server locks.  Different types of locks, how they’re acquired and how to best design our queries and applications to ensure we don’t lock data for any longer than we need to.

Mark first talks about SQL Server’s transactions.  He explains that transactions are not committed to the the transaction logs immediately.  They are processed through in-memory buffers first before being flushed to disk.  Moreover, the logs need to grow to a certain size before they get flushed to disk so there’s always a possibility of executing a COMMIT TRANSACTION statement yet the transaction isn’t visible within the transaction log until sometime later.  The transaction being available in the transaction log is the D in ACID – Durability, but Mark highlights that it’s really delayed durability.

IMG_20160507_125814Next, Mark talks about concurrency versus correctness. He reminds us of some of the laws of concurrency control.  The first is that concurrent execution should not cause application programs to malfunction. The second is that concurrent execution should not have lower throughput or higher response times than serial execution.  To balance concurrency and correctness, SQL Server uses isolation, and there are numerous isolation levels available available to us, all of which offer differing levels of concurrency versus correctness.

Mark continues by stating that SQL Server attempts to perform our queries in as serial a manner as possible, and it uses a technique called transaction interleaving in order to achieve this between multiple concurrent and independent transactions.  Isolation levels attempt to solve the interleaving dependency problems.  They can’t completely cure them, but they can reduce the issues caused by interleaving dependencies.  Isolation levels can be set at the statement, transaction or session levels.  There are 4 types defined by the ANSI standards, but SQL Server 2005 (and above) offer a fifth level.  It’s important to remember that not all isolation levels can be used everywhere, for example, the FILESTREAM data type is limited in the isolation levels that it supports.

We’re told how SQL Server’s locks are two-phased and are considered so if every LOCK is succeeded by an UNLOCK.  SQL Server has different levels of locks, and they can exist at a various levels of granularity from row locks, to page locks all the way up to table locks.  When SQL Server has to examine existing locks in order to acquire a new or additional lock, it will only ever compare locks on the same resource.  This means that row locks are only ever compared to other row locks, page locks compared to other page locks and table locks to other table locks.  They’re all separate.  That said, SQL Server will automatically perform lock escalation when certain conditions occur, so when SQL Server has acquired more than 5000 other locks of either row or page type, it will escalate those locks to a single table level lock.  Table locks are the least granular kind of lock and a very bad for performance and concurrency within SQL Server – basically the one query that holds the table level lock prevents any other query from accessing that table.  For this reason, it’s important to ensure that our queries are written in such a way as to minimize the locks that they need, and to ensure that when they do require locks that those locks as granular as can be.  Update locks will allow multiple updates against the same table and/or rows.  They’re compatible with shared locks but not other update locks or exclusive locks so it’s worth bearing in mind how many concurrent writes we attempt to make to our data.

Mark continues to show us some sample query code that demonstrates how some simple looking queries can cause concurrency problems and can result in lost updates to our data.  For example, Mark shows us the following query:

BEGIN TRAN T1
SELECT @newquantity = quantity FROM basket
SET @newquantity = @newquantity + 1
UPDATE some_other_table SET quantity = @newquantity
COMMIT TRAN T1

The above query can fail badly, with the required UPDATE being lost if multiple running transaction perform this query at the same time.  This is due to transaction interleaving.  This results in two SELECTs which happen simultaneously and acquire the quantity value, but the two UPDATEs get performed in interleaved transactions which means that the second UPDATE that runs is using stale data to update, effectively “overwriting” the first UPDATE (so the final newquantity value is one less than it should be).  The solution to this problem is to perform the quantity incrementing in-line within the UPDATE statement itself:

BEGIN TRAN T1
UPDATE some_other_table SET quantity = t2.newquantity FROM (SELECT quantity + 1 FROM basket) t2
COMMIT TRAN T1

Reducing the number of statements needed to perform some given function on our data is always the best approach.  It means our queries are being as granular as they can be, proving us with better atomic isolation and thereby reducing the necessity to interleave transactions.

IMG_20160507_133930After Mark’s session was over, it was time for lunch.  Lunch at the SQLBits conferences in previous years has always been excellent with a number of choices of hot, cooked food being available and this year was no different.  There was a choice of 3 meals, cottage pie with potato wedges, Moroccan chicken with couscous or a vegetarian option (I’m not quite sure what that was, unfortunately), each of which could be finished off with one of a wide selection of cakes and desserts!

IMG_20160507_123500I elected to go for the Moroccan chicken, which was delicious, and plumped for a very nice raspberry ripple creamy yoghurt.  An excellent lunch, as ever!

During lunch, I managed to catch up with a few old friends and colleagues who had also attended the conference, as well as talking to a few newly made acquaintances whilst wandering around the conference floor and the various sponsors stands.

After a good wander around, during which I was able to acquire ever more swag from the sponsors, it was soon time for the afternoon’s sessions.  There were only two more sessions left within the day, it now being around 14:30 after the late lunch hour was over, I headed off to find the correct “dome” for the first of the afternoon’s sessions, Erland Sommarskog’s Dynamic Search Conditions.

Erland’s talk will highlight the best approaches when dealing with dynamic WHERE and ORDER BY clauses in SQL Server queries, something that I’m sure most developers have had to deal with at some time or another.  For this talk, Erland will use his own Northgale database, which is the same schema as Microsoft’s old Northwind database, but with a huge amount of additional data added to it!

Erland first starts off by warning us about filtered indexes.  These are indexes that themselves have a WHERE condition attached to them (i.e. WHERE value <> [somevalue]) as these tend not to play very well with dynamic queries.  Erland continues by talking about how SQL Server will deal with parameters to queries.  It will perform parameter “sniffing” to determine how best to optimize a static query by closely examining the actual parameters we’re supplying.  Erland shows us both a good and bad example:  WHERE xxx = ISNULL(@xxx,xxx) versus WHERE xxx = (xxx = @xxx OR xxx IS NULL).  He explains how the intended query will fail if you use ISNULL in this situation.  We’re told how the SQL Server query optimizer doesn’t look at the stored procedure itself, so it really has no way of knowing if any parameters we pass in to it are altered or modified in any way by the stored procedures code.  For this reason, SQL Server must generate a query plan that is optimized for any and all possible values.  This is likely to be somewhat suboptimal for most of the parameters we’re likely to supply.  It’s for this reason that the execution plan can show things like an index scan against an index on a “FromDate” datetime column even if that parameter is not being passed to the stored procedure.  When we’re supplying only a subset of parameters for a stored procedure with many optional parameters, it’s often best to use the OPTION RECOMPILE statement to force a recompilation of the query every time it’s called.  This way, the execution plan is regenerated based upon the exact parameters in use for that call.  It’s important to note, however, that recompiling queries is an expensive operation, so it’s best to measure exactly how often you’ll need to perform such queries.  If you’re calling this query very frequently, you may well get the best performance from using purely dynamic SQL.

Erland then moves on to discuss dynamically ordering data.  He states that the CASE statement inside the ORDER BY clause is the best way to achieve this, for example: ORDER BY CASE @sortcolumn WHEN ‘OrderID’ THEN [Id] END, CASE @sortcolumn = ‘OrderDate’ THEN [Date] END…..etc.  This is a great way to achieve sorting my dynamic columns, however, there’s a gotcha with this method and that is that you have to be very careful of datatype differences between the columns in the different case clauses as this can often lead to error.

Next, we look at the permissions required in order to use such dynamic SQL and Erland says that it’s important to remember that any user who wishes to run such a dynamic query will require permissions to access the underlying table(s) upon which the dynamic query is based.  This differs from (say) a stored procedure where the user only need permissions to the stored procedure and not necessarily the underlying table upon which the stored procedure is based.  One trick that can be used to gain somewhat the best of both of these approaches is to use the sp_executesql system stored procedure.  Using this will create a nameless stored procedure from your query, it will cache it and execute it.  The stored cache can then be re-used on subsequent calls to the query with the nameless stored procedure being identified based upon a hash of the the query content itself.

Another good point that Erland mentions is to ensure that all SQL server objects (tables, functions etc.) referenced within a dynamic query should always be prefixed with the full schema name and not just referenced by the object name (i.e. use [dbo].[SomeTable] rather than [SomeTable]).  This is important as different users who run your dynamic SQL code could be using different default schemas – if they are and you haven’t specified the schema explicitly, the query will fail.

Erland also mentions that one very handy tip with dynamic queries is to always include a @debug input parameter of datatype bit, that can have a default setting of 0 (off).  It’ll allow you to always specify this parameter and pass in a value of 1 (on) to ensure that code such as IF @debug PRINT @sql will be run allowing you to output the actual T-SQL query generated by the dynamic code.  Erland says that you will need this eventually, so it’s always best to build it in from the start.

When building up your dynamic WHERE clause, one tricky condition is to know whether to add an AND at the beginning of the condition if you’re adding the 2nd or higher condition (the first condition of the WHERE clause won’t need the AND to be prepended of course).  One simple way around this is to make it so that all of the dynamically added WHERE clauses are always the 2nd or higher numbered condition by statically creating the first WHERE clause condition in your code as something benign such as “WHERE 1 = 1”.  This, of course, matches all records and all subsequently added WHERE clauses can always be prefixed with an AND, for example, “IF @CustomerPostCode THEN @sql += “ AND Postcode LIKE …..”, also it’s important to always add parameters into the dynamic SQL rather than concatenating values (i.e. avoid doing @sql += ‘ AND OrderId = ‘” + @OrderId + “’) as this will mess with the query optimizer and your generated queries will be less efficient overall as a result.  Moreover, raw concatenation of values can be a vector for SQL injection attacks.  For this same reason, you should always translate the values that you’ll use for WHERE and ORDER BY clauses that are passed into your stored procedure.  Translate the passed parameter value to a specific hard-coded value that you explicitly control.  Don’t just use the passed in parameter value directly.

Occasionally, it can be a useful optimization to inline some WHERE clause values in order to force a whole new query plan to be cached.  This is useful in the scenario when, for example, you're querying by order city and 60% of all orders are in same city.  You can inline that one city value to have a cached plan just for that city and a different single cached plan for all other cities.

Finally, for complex grouping, aggregation and the dynamic selection of the columns returned from the query, Erland says is often easiest to and more robust to construct these kind of queries in the client application code rather than in a dynamic SQL producing stored procedure.  One caveat around this is to ensure that you perform the entirety of your query client-side (or entirely server-side if you must) – don’t try to mix and match by performing some client-side and some server-side.

IMG_20160507_154920And with this, Erland’s session on dynamic SQL search conditions is complete.  After yet another short coffee break, we’re ready for the final session of what has been a long, but information-packed day.  And for the final session, I decided to attend Simon D’Morias’ “What is DevOps For Databases?”

Simon starts with explaining the term “DevOps” and reminds us that it’s the blurring of lines between the two traditionally separate disciplines of development and operations.  DevOps means that developers are far closer to the “operations” side of applications which frequently means getting involved with deployments, infrastructure and a continuous delivery process.  DevOps is about automation of application integration and deployment, provably and reliably.

Simon shows the three pillars upon which a successful DevOps process is built.  Develop, Deploy & Measure.  We develop some software, deploy it and the then measure the performance and reliability of the deployment.   From this measurement we can better plan and can thus feed this back into the next iteration of the cycle.  We’re told that to make these iterations work successfully, we need to keep changes small. From small changes, rather than larger ones, we can keep deployment simple and fast.  It allows us to gather frequent feedback on the process and allows continuous improvement of the deployment process itself.  With the teams behind the software (development, operations etc.) being more integrated, there’s a greater spread of knowledge about the software itself, the changes to the software in a given development/deployment cycle which improves early feedback.  Automation of these systems also ensures that the deployment is made easier and thus also contributes to better and earlier feedback.

When it comes to databases, DBA’s and other database professionals are frequently nervous about automating any changes to production databases, however, by keeping changes small and to a minimum within a given deployment cycle, and by having a continuously improving robust process for performing that deployment, we can ensure that each change is less risky than if we performed a single large change or upgrade to the system.  Continuous deployments also allow for detecting failures fast, which is a good thing to have.  We don’t want failures caused by changes to take a long time before they surface and we’re made aware of them.  Failing fast allows easy rollback and reliability of the process enables automation which further reduces risk.  Naturally, monitoring plays a large part of this and a comprehensive monitoring infrastructure allows detection of issues and failures and allows improves in reliability over time which, again, further reduces risk.

Simon moves on to discuss the things that can break DevOps.  Unreliability is one major factor that can break a DevOps process as even something running at 95% reliability is no good.  That 5% failure rate will kill you.  Requiring approval within the deployment chain (i.e. some manual approval, governance or compliance process) will break continuity and is a potential bottleneck for a successful DevOps deployment iteration also.  A greater “distance” between the development, operations and other teams will impact their ability to be knowledgeable about the changes being made and deployed.  This will negatively impact the team’s ability to troubleshoot and issues in the process, hindering the improvement of reliability.

IMG_20160507_164439It can often be difficult to know where to start with moving to an automated and continuous DevOps process.  The first step to to ensure we can “build ourselves a pipeline to live” – this is a complete end-to-end automated continuous integration process.  There are numerous tools available to help with this and SQL Server itself assists in this regard with the ability to package SQL server objects into a DACPAC package.  Simon insists that attempting to proceed with only a partial implementation of this process will not work.  It’s an all or nothing endeavour. Automating deployments to development and test environments, but not to the production environment (as some more nervous people may be inclined to do) is like building only half and bridge across a chasm!  Half a bridge is no bridge at all!

Simon concludes by showing us a quick demo of a simple continuous deployment process using Visual Studio to make some local database changes, which are committed to version control using Git and then pushed to Visual Studio Team Services (previously known as Visual Studio Online) which performs the “build” of the database objects and packages this into a DACPAC package.  This package is then automatically pushed to an Azure DB for deployment.

Finally, Simon suggests that one of the the best ways to ensure that our continuous deployment process is consistent and reliable is to ensure that there are minimal differences (ideally, no differences) between our various environments (development, test, staging, production etc.), and especially between our staging and production environments.

After Simon’s session was over, it was time for all of the conference attendees to gather in the main part of the exhibition hall and listen as one of the conference organisers read out those people who had won prizes by filling in forms and entering competitions run by each of the conference sponsors.  I didn’t win a prize, and actually, had entered very few competitions having been far too busy either attending the many sessions or drinking copious amounts of coffee in between them!  Once the prizes were all dished out, it was time for yet another fantastic SQLBits conference to sadly end.  It had been a long, but fantastic day at another superbly organised and run SQLBits conference.  Here’s hoping next year’s conference is even better!