Dude, Where’s My Gold Code?

Let me tell you about a conversation I had yesterday with some developers in a team. They needed help with their database CI/CD process, and so I wanted to see the database solutions they were using.

Dev1: We don’t have solutions at the moment.

Me: OK, how come?

Dev1: Because we are making so many changes right now, there seemed to be little point in creating solutions, so we’re just using scripts.

[At this point, I bite my tongue and choose not to reply with ‘If you’re continuously making changes, wouldn’t it make sense to continuously integrate and continuously deploy the changes? I mean, you’d hardly be taxing the definition of the word “continuously”, now would you?’]

Me: So how are you making changes?

Dev1: Oh well, we’re just using scripts to deploy the changes now.

Me: And where are the scripts?

Dev1: They’re in the wiki.

Dev2: No they’re not, because we don’t put anything in the wiki that is changing regularly.

Dev1: Oh. So where are they?

[Dev2 stares blankly].

Dev1: So why do you want solutions?

Me: To help you build a release pipeline.

Dev2: But we’re making lots of changes regularly.

[Again, me, keeping cool in spite of the blindingly obvious contradiction presented to me.]

Me: But let’s get something built to help you make those changes regularly.

Dev1: OK, I’ll get something together by the end of the week. In the meantime we’re too busy making changes to get this together…

Sound familiar? I’m sure many of you have been in a situation like this before. It’s not the first time I’ve had a conversation like this either.

Now I could go on about how the business doesn’t buy into maintenance, or Continuous Integration, or Continuous Delivery. Or I could talk about how CI/CD needs to be pushed by the devs, and how the associated tasks need to be items in the sprint to make sure that they are completed and audited. And I could talk about how the “build once, deploy many” process reduces the number of bugs and speeds up the deployment process. And you know, it is great that when wearing my DBA hat I have had to write wacky scripts because dev teams haven’t been deploying to environments in a controlled manner, but I’d really rather be focusing on infrastructure. All these points are well documented.

But these points are consequences of implementing (or not) Continuous Delivery and Continuous Deployment. The most fundamental point of adopting CI/CD is getting your code into some sort of source control. It all starts there: getting the code into source control, writing pre- and post-deployment scripts that are re-runnable, developers checking in changes, building the code, and taking the resulting artefact (in SSDT terms, a dacpac) and deploying that dacpac to your database. And then taking the same dacpac and deploying it again and again, all the way up to production. This process can be achieved in an hour. Actually, in 55 minutes! And AdventureWorks can be tough to automate! OK, it may take longer than that, but getting a repository of your code that can be audited, rolled back, tagged, branched, deployed, merged and so on is the first step towards achieving Database DevOps.
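To make “build once, deploy many” a bit more concrete, here is a minimal sketch in Python that builds one SqlPackage publish command per environment, every one of them pointing at the same dacpac. The server names, the database name and the artefact path are all hypothetical, and a real pipeline would most likely pass credentials and a publish profile as well; treat it as an illustration of the idea rather than a finished deployment step.

```python
from typing import List, Tuple


def publish_command(dacpac_path: str, server: str, database: str) -> List[str]:
    """Build the SqlPackage arguments that publish one dacpac to one target."""
    return [
        "SqlPackage",
        "/Action:Publish",
        f"/SourceFile:{dacpac_path}",
        f"/TargetServerName:{server}",
        f"/TargetDatabaseName:{database}",
    ]


def promotion_plan(dacpac_path: str, targets: List[Tuple[str, str]]) -> List[List[str]]:
    """Return one publish command per environment, all using the same artefact."""
    return [publish_command(dacpac_path, server, db) for server, db in targets]


# Hypothetical environments, listed in promotion order.
TARGETS = [("dev-sql01", "Sales"), ("test-sql01", "Sales"), ("prod-sql01", "Sales")]

# The dacpac is built once; only the target changes between environments.
PLAN = promotion_plan("artefacts/Sales.dacpac", TARGETS)

if __name__ == "__main__":
    for command in PLAN:
        # On a build agent, subprocess.run(command, check=True) would execute this.
        print(" ".join(command))
```

The point of the design is that nothing is rebuilt between environments: the artefact that was tested in dev is the artefact that reaches production.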

Where Will SSDT Create My Database?

SSDT is great for many things, but one thing it’s not great at is giving the user control over where data and log files are created. It uses the predefined variables DefaultDataPath and DefaultLogPath. These variables are system-defined and cannot be overridden; they take the default locations that are set on the instance we are deploying to. Humph. And I’d wager that if these are null then it will just use wherever the system databases live. This is not particularly useful for anyone.

You may think “Well OK, I’ll just use a pre-deployment script”. Nice try. I thought so too. But as it turns out, pre-deployment scripts aren’t pre-deployment scripts per se. If you add a pre-deployment script to create a database, it’ll be run AFTER the initial database creation (go on, try it yourself if you don’t believe me, but I have saved you the trouble already!)

Clearly, we have to create a shell database outside of SSDT, so that we control where its files, including the primary filegroup, are created. If you’re using something like Octopus Deploy, TeamCity or VSTS then it’s simple enough to create a step that will create this shell database for you. Then everything else (logins, users, tables etc) can be created using SSDT. And if you have more files/filegroups to be created, then you can use SSDT to set these up; what I’m talking about is the initial creation of the database.

There’s no shortage of options as to how to do this: you could have a SQL file with the CREATE statements run, and this would work, providing you have the correct “if not exists create db” setup, but this may not be particularly useful for re-use.
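As a rough illustration of the “if not exists create db” approach, here is a minimal Python sketch that generates a re-runnable CREATE DATABASE batch with explicit data and log file locations. The function names, the file naming convention (`<db>.mdf` and `<db>_log.ldf`) and the example paths are my own assumptions; sizes, growth settings and extra filegroups are deliberately left out.

```python
import ntpath  # Windows-style path joining, regardless of where this script runs


def quote_identifier(name: str) -> str:
    """Wrap a name in square brackets, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(value: str) -> str:
    """Return a T-SQL Unicode string literal, escaping single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def create_shell_database_sql(db_name: str, data_dir: str, log_dir: str) -> str:
    """Build a batch that creates the database only if it does not exist yet."""
    data_file = ntpath.join(data_dir, db_name + ".mdf")
    log_file = ntpath.join(log_dir, db_name + "_log.ldf")
    return "\n".join([
        f"IF DB_ID({quote_literal(db_name)}) IS NULL",
        "BEGIN",
        f"    CREATE DATABASE {quote_identifier(db_name)}",
        f"    ON PRIMARY (NAME = {quote_literal(db_name)}, "
        f"FILENAME = {quote_literal(data_file)})",
        f"    LOG ON (NAME = {quote_literal(db_name + '_log')}, "
        f"FILENAME = {quote_literal(log_file)});",
        "END",
    ])


if __name__ == "__main__":
    # Hypothetical database name and drive layout.
    print(create_shell_database_sql("Sales", r"D:\Data", r"L:\Logs"))
```

Because of the DB_ID check, running the generated batch a second time does nothing, so it is safe to leave in the pipeline as a permanent first step.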

If you want to go for something that can be re-used, then I think SMO is your friend. It can be downloaded via the Feature Pack, so practically any box can have it deployed (and what I mean by this is that you don’t need a full SQL Server install to have access to SMO), it works well with PowerShell, and there’s no shortage of examples online of how to create a script that does what you need. This post from Allen White gives a good overview of SMO in PowerShell. And if you’re using something like Octopus, then this script also gives a good start on creating a database using SMO. Where I think this step template in the Octopus library falls short is that it only creates the database, and does not set up the files and filegroups. But if you combine Allen’s scripts with the idea of the step template, then you can pretty much see where I am coming from. It’s also worth setting the recovery model using SMO rather than worrying about it in SSDT.
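If SMO isn’t to hand, the recovery model can also be set with plain T-SQL in the same step that creates the shell database. The sketch below is that T-SQL alternative, not the SMO approach described above: a small Python helper (the function name is hypothetical) that validates the requested model and builds the ALTER DATABASE statement.

```python
VALID_RECOVERY_MODELS = {"FULL", "BULK_LOGGED", "SIMPLE"}


def set_recovery_model_sql(db_name: str, model: str) -> str:
    """Build an ALTER DATABASE statement for one of the three recovery models."""
    normalised = model.strip().upper()
    if normalised not in VALID_RECOVERY_MODELS:
        raise ValueError(f"Unknown recovery model: {model!r}")
    # Bracket-quote the database name, escaping any closing bracket.
    quoted_name = "[" + db_name.replace("]", "]]") + "]"
    return f"ALTER DATABASE {quoted_name} SET RECOVERY {normalised};"


if __name__ == "__main__":
    # Hypothetical database name.
    print(set_recovery_model_sql("Sales", "simple"))
```

Rejecting anything outside the three known models up front means a typo in a pipeline variable fails the step before anything is sent to the server.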

I’m not wild about the use of scripts outside of the database to deploy the database; it implies “magic”, or behind-the-scenes changes that some people won’t know exist. And usually I’d advocate the use of SSDT pre/post deploy scripts to manage everything within the database. But databases are hard to deploy: they’re not a bunch of dlls that can be rolled back. A minor error early in the setup of a deployment pipeline can cause massive problems further down the process. And file creation is the first thing you do. What if the log file is created on the C drive? How hard can it be to move log files? Very hard is how. And creating a secondary log file to compensate is awful. And trying to remove that first log file? All these issues could be avoided with a bit of forethought.

Remember, it’s no good assuming that the instance you are deploying to has everything set up the way you like it. You have control over this stuff when you are using CI/CD properly, and you have to take responsibility for database deployments.