This post has now moved to my new site.
Going meta again this week, as I blog about blogging in other places and a certain mistake I made not two weeks ago.
It Must Be True, It’s Written On The Internet
One of the reasons I write a blog is that I’m more likely to remember something if I write it down. So a blog is a useful place. And usually I’m very meticulous in my research and provide examples to confirm that what I believe to be the case actually is the case. But on occasion I will get it wrong. And this week I got it wrong. Big time. It is to do with defining the files when creating a database. And whilst I was right that you cannot override the system variables for where files will be written, I overlooked the fact that you can define the locations yourself. Look at that. I got it so wrong I’m using the trifecta of bold, underline and italics to make the point.
Below is a screenshot of an mdf and ldf file being created in a different location. So a pre-model script or a deployment contributor is not really necessary: you just need to define the location in your publish.xml file and parameterise the file path yourself.
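As a rough sketch of what I mean (the variable name and path below are my own, purely for illustration, not taken from a real project): declare a SQLCMD variable in the publish profile and reference it from the file definition in the database project:

```xml
<!-- MyDb.publish.xml (illustrative): DataFilePath and its value are
     assumptions of mine; each environment's profile supplies its own. -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <ItemGroup>
    <SqlCmdVariable Include="DataFilePath">
      <Value>E:\SqlData\MyDb.mdf</Value>
    </SqlCmdVariable>
  </ItemGroup>
</Project>
```

The project’s file definition can then use $(DataFilePath), and the path varies per publish profile rather than being baked in.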
Sup Dawg, I Heard You Like Reading my Blogs
So, earlier this year I started working at Sabin.IO, the consultancy run by Simon Sabin, he of SQLBits, amongst other distinctions. You may have noticed that around the turn of the year I started posting fewer DBA-type posts and focused more on Database DevOps. DBA work is exactly what I was doing not so long ago, but invariably I’ve come back to Database DevOps for a few reasons:
- You can actually genuinely make a difference helping people improve their database deployment story
- I was burnt by (and burnt out by) too many chaotic databases, where the definition of “gold code” was whatever was in live, but there was no control over who or what got deployed there.
But to quote a friend of mine: “there’s nothing negative in pointing out problems. It’s only negative to point out problems and do nothing about them.” And very true that is. So I’m helping my former DBA colleagues by advocating the CI/CD database deployment process.
Anyway, there are a few blog posts over on Sabin.IO written by yours truly about some exciting new features in SQLPackage, and how to leverage those same features through the DacFx API. So why not head over there and give them a read?
And so as One Door Closes, Another Opens….
Empty platitudes aside, this will be my last blog post here. 4 years, 325 posts, over 100,000 views, not bad going. But I haven’t stopped blogging: I will in fact be posting at Sabin IO. Because I’ve not only been working at Sabin IO, I’ve actually gone permanent as of the end of October. And seeing as:
- I like to blog
- I like a wide audience to read it
It makes sense for me to post at Sabin.IO.
This site will hang around for a while, but since the name is a company name it’ll revert to a “.wordpress.com” address, and some of the content will appear on the Sabin.IO blog.
See ya pals!
If you are developing your databases in SSDT, then chances are you are familiar with LocalDB. LocalDB is essentially a stripped-down version of the database engine that runs within Visual Studio, and whenever you hit F5 the database is deployed to LocalDB… sometimes.
You see, LocalDB is in fact pretty much SQL Server Express, the free edition of SQL Server. Not that there’s anything wrong with that, except that SQL Server Express does not support all features: full text search, for example. But what else is missing? There are no performance counters, but is that too much of an issue?
In fact, no Enterprise feature is available in LocalDB, but you can fix this problem with one simple trick: download SQL Server Developer Edition for free! Yep, this is a new Microsoft, one which will give you the full-fat features at zero cost. This announcement was made some months ago, but it surprises me how many people are not aware of it.
The data compression feature in SQL Server can help reduce the size of the database as well as improve the performance of I/O-intensive workloads, especially on data warehouses. This performance boost in I/O is offset against the extra CPU resource required to compress/decompress the data whilst it is exchanged with the application. With data warehouses, the guideline is to compress all objects, as there is typically spare CPU capacity whilst storage and memory are at a premium, so Microsoft’s recommendation is to compress all objects to the highest level. However, if no objects are compressed at all, it is better to take the cautious approach and evaluate each database object individually, and its effect on the workload, particularly if CPU headroom is limited. Continue reading “SQL Server Compression 101”
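As a hedged example of that cautious, object-by-object approach (the table name dbo.FactSales is my own invention): you can estimate the saving for an object before committing to compressing it.

```sql
-- Illustrative only: dbo.FactSales is a made-up table name.
-- Estimate what PAGE compression would save on this object...
EXEC sp_estimate_data_compression_savings
     @schema_name = 'dbo',
     @object_name = 'FactSales',
     @index_id = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';

-- ...and, if the CPU trade-off looks acceptable, rebuild compressed.
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);
```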
Part one of this post concerned the notion of SSDT and Pre-Model Scripts. This post relates to an actual implementation of executing a SQL script using Octopus Deploy. My pull request on Github was approved, so my step template can be exported from the Octopus Library.
The only part that I feel requires some explanation is the SQL scripts and the option to select a previous step. If you’ve read part one of this post you’ll know that you keep the pre-model scripts in a folder in the database solution. These are not compiled (i.e. they are not part of the build) but are included in the output and packaged up in the NuGet package. When the NuGet package is unpacked, it’s then a case of locating the scripts and executing them.
First, the order: extract the NuGet package that includes the dacpac and the pre-model scripts, and then add the execute SQL step:
In the step template for execute SQL, there are Octopus parameters. These define the step in which the scripts were unpackaged, and the final path to the scripts themselves:
Lately I’ve been thinking a lot about script modules in Octopus Deploy. Script modules, for those of you not in the know, are a collection of PowerShell cmdlets or functions that exist outside of a script step and can be used across multiple steps across different projects/environments. They don’t have to contain functions exclusively; they can also contain code that will get executed upon the initial import of the module. The issue here is that these script modules are imported for every step in a process. So unless the code you write is idempotent, you’ll probably cause an error in the process. This is by design.
So at first I thought “put nothing in script modules”, because they’re pretty pointless, right? Anything you write will pertain to a step exclusively, so why would you need to share code across steps? Turns out quite a bit of code is shared across steps, particularly checking and validating parameters.
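To illustrate the sort of shared code I mean, here is a minimal, made-up script module (the file and function names are mine); because it only defines a function, importing it for every step in the process has no side effects:

```powershell
# ValidationHelpers.psm1 (illustrative, my own naming): safe to import
# repeatedly because it contains definitions only, no executing code.
function Assert-RequiredParameter {
    param(
        [string]$Name,
        [string]$Value
    )
    if ([string]::IsNullOrWhiteSpace($Value)) {
        throw "Required parameter '$Name' was not supplied."
    }
}
```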
Then, as step templates got more sophisticated they got harder to read, so then I thought “put everything in script modules!” It aided the legibility of the step templates and allowed them to focus mainly on setting up a process, using the script modules to execute the functions.
There is however an issue with this: script modules are the “magic” of a step template. If you’re not aware that a step template requires a script module, how can that dependency be communicated? You could mention in the step template’s description that a script module is required, but this is still a manual process that relies on everyone remembering it, or even buying into it.
There is also another issue: I had to make a change to a step template that required a change to the script module. The step template in question was in heavy use, so careful planning was required to make sure that the change to the step template and the script module was not a breaking change. Basically, whereas a step template has to be explicitly updated for a given step in a deployment pipeline, the same cannot be said for a script module: whatever is the latest version of the script module is what a user gets. And if you make a few changes to the step template in an iterative manner, you have to be sure that if anyone updates the step template then all the versions match one another. Continue reading “When To Use Octopus Deploy Script Modules”
Two of my most recent posts have concerned pre-model scripts: those scripts that need to be run before the dacpac is compared against the production database. These are sometimes necessary, usually because SSDT may not produce a script that is optimal. One such example is index creation: no index is created “ONLINE”, and this can be a problem if the database being deployed is in use during the operation. It can be even worse if the table is particularly large.
To be fair to the SSDT team, I can see why this is the case: some editions of SQL Server have the online index feature, some don’t. So one solution may be to write a pre-model script that creates the index with the online operation included. And while there’s nothing wrong with this, there is an alternative: deployment contributors. Continue reading “The Alternative to Pre-Model Scripts”
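As a sketch of the pre-model approach (the table and index names are my own, for illustration), the script creates the index online and guards itself so it is re-runnable:

```sql
-- Illustrative pre-model script: names are invented, not real objects.
IF NOT EXISTS (SELECT 1 FROM sys.indexes
               WHERE name = 'IX_Orders_CustomerId'
                 AND object_id = OBJECT_ID('dbo.Orders'))
BEGIN
    -- ONLINE = ON keeps the large, busy table available during the
    -- build; note this is an Enterprise edition feature.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
        ON dbo.Orders (CustomerId)
        WITH (ONLINE = ON);
END
```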
Earlier this week I posted about the need to create the database as a pre-model task: that is, creating the database before we run sqlpackage/DacFx against a server, so that the generated script itself does not create the database.
Going a step further, pre-deploy scripts in SSDT are executed before the main diff script is executed, but not before the diff script is generated. This is an important fact to understand. If you want scripts to be executed before the diff is generated, then you need pre-model scripts.
How you go about this is up to you, but there are a few rules that must be followed:
Idempotent: a big word, borrowed from maths. But don’t let that put you off. What it means in this context is that a script can be re-run and the same result happens. So in the context of altering a primary key, the pseudo-code would look something like this:
if database exists then
if table exists then
if primary key name eq “oldname” then
drop primary key then
add primary key with “newname”
and then you add primary key “newname” to the solution. That way the drop-and-add part of the script will only ever run once, or again against an older version of the database that still has the old primary key. The “newname” of the primary key guarantees that this will be the case.
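In actual T-SQL, that pseudo-code might look something like this (the table and constraint names are mine, purely for illustration):

```sql
-- Re-runnable: the drop/add only fires while the old key still exists.
IF EXISTS (SELECT 1 FROM sys.key_constraints
           WHERE name = 'PK_Customer_Old'
             AND parent_object_id = OBJECT_ID('dbo.Customer'))
BEGIN
    ALTER TABLE dbo.Customer DROP CONSTRAINT PK_Customer_Old;
    ALTER TABLE dbo.Customer
        ADD CONSTRAINT PK_Customer_New PRIMARY KEY (CustomerId);
END
```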
Stored in the Solution: within database solutions you can add scripts that are part of the solution but are not executed:
So if you have the scripts in the solution and they are re-runnable, you can be confident that they will run against older versions of the database. It also keeps a level of audit and history of the database.
Now, how to run the scripts? As with the create database script, SMO is probably the easiest way to do this. You could use SQLCMD, or Invoke-SqlCmd; it’s entirely up to you.
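If you do go the Invoke-SqlCmd route, a minimal sketch might look like this (the folder name and variables are assumptions of mine):

```powershell
# Execute every pre-model script in the folder, in file-name order.
# Assumes the SqlServer PowerShell module is installed, and that
# $serverInstance and $database are supplied by the deployment tool.
$scripts = Get-ChildItem -Path '.\PreModelScripts' -Filter '*.sql' |
           Sort-Object Name
foreach ($script in $scripts) {
    Invoke-Sqlcmd -ServerInstance $serverInstance `
                  -Database $database `
                  -InputFile $script.FullName
}
```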
Now, if you want to use SMO and you are deploying using Octopus Deploy, then you’re in (some) luck: there are scripts available in the Octopus Library that do something like this, but you may well have to alter them to execute a folder full of SQL scripts. Fortunately, I have already completed this, and I will add my script to the GitHub solution so that it is available in the Library. When it is available, assuming it meets the quality control, I’ll blog about how to use it.
Let me tell you about a conversation I had yesterday with some developers in a team. They needed help with their database CI/CD process, and so I wanted to see the database solutions they were using.
Dev1: We don’t have solutions at the moment.
Me: OK, how come?
Dev1: Because we are making so many changes right now, there seemed to be little point in creating solutions, so we’re just using scripts.
[At this point, I bite my tongue and choose not to reply with ‘If you’re continuously making changes, wouldn’t it make sense to continuously integrate and continuously deploy those changes? I mean, you’d hardly be taxing the definition of the word “continuously”… now would you…’]
Me: So how are you making changes?
Dev1: Oh well, we’re just using scripts to deploy the changes now.
Me: And where are the scripts?
Dev1: They’re in the wiki.
Dev2: No they’re not, because we don’t put anything in the wiki that changes regularly.
Dev1: Oh. So where are they?
[Dev2 stares blankly].
Dev1: So why do you want solutions?
Me: To help you build a release pipeline.
Dev2: But we’re making lots of changes regularly.
[Again, me, keeping cool in spite of the blindingly obvious contradiction presented to me.]
Me: But let’s get something built to help you make those changes regularly.
Dev1: OK, I’ll get something together by the end of the week. In the meantime we’re too busy making changes to get this together…
Sound familiar? I’m sure many of you have been in a situation like this before. It’s not the first time I’ve had a conversation like this either.
Now, I could go on about how the business doesn’t buy into maintenance, or Continuous Integration, or Continuous Delivery. Or I could talk about how CI/CD needs to be pushed by the devs, and that associated tasks require items in the sprint to make sure they are completed and audited. And I could talk about how the “build once, deploy many” process reduces the number of bugs and speeds up the deployment process. And sure, when wearing my DBA hat I have had to write wacky scripts because dev teams haven’t been deploying to environments in a controlled manner, but I’d really rather be focusing on infrastructure. All these points are well documented.
But these points are consequences of implementing (or not implementing) Continuous Delivery and Continuous Deployment. The most fundamental point of adopting CI/CD is getting your code into some sort of source control. It all starts with getting the code into source control, writing pre- and post-deployment scripts that are re-runnable, developers checking in changes, building the code, and taking the resulting artefact (in SSDT terms, a dacpac) and deploying that dacpac to your database. And then taking the same dacpac and deploying it again and again, all the way up to production. This process can be achieved in an hour. Actually, in 55 minutes! And AdventureWorks can be tough to automate! OK, it may take longer than that, but getting a repository of your code that can be audited, rolled back, tagged, branched, deployed, merged etc. is the first step towards achieving Database DevOps.
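To make “build once, deploy many” concrete, here is a sketch of the two commands involved (project, server and database names are placeholders of mine):

```powershell
# Build once: the .sqlproj produces a dacpac artefact.
msbuild .\MyDb.sqlproj /p:Configuration=Release

# Deploy many: the same dacpac goes to test, then to production.
sqlpackage /Action:Publish /SourceFile:.\bin\Release\MyDb.dacpac `
           /TargetServerName:test-sql01 /TargetDatabaseName:MyDb
sqlpackage /Action:Publish /SourceFile:.\bin\Release\MyDb.dacpac `
           /TargetServerName:prod-sql01 /TargetDatabaseName:MyDb
```

The point is that the artefact is built exactly once; only the target changes between environments.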
SSDT is great for many things, but one thing it’s not great at is giving the user control over data and log file creation. It uses the predefined variables DefaultDataPath and DefaultLogPath. These variables are system-defined and cannot be overridden; instead, it uses the default locations that are set on the instance we are deploying to. Humph. And I’d wager that if these are null then it will just use the location of the system databases. This is not particularly useful for anyone.
You may think, “Well, OK, I’ll just use a pre-deployment script”. Nice try. I thought so too. But as it turns out, pre-deployment scripts aren’t pre-deployment scripts per se. If you add a pre-deployment script to create a database, it’ll be run AFTER the initial database creation (go on, try it yourself if you don’t believe me, but I have saved you the trouble already!)
Clearly, we have to create a shell database outside of SSDT, so that we can control where the files, and the primary filegroup, are created. If you’re using something like Octopus Deploy, TeamCity or VSTS, then it’s simple enough to create a step that will create this shell database for you. Then everything else (logins, users, tables etc.) can be created using SSDT. And if you have more files/filegroups to be created, then you can use SSDT to set these up; what I’m talking about is the initial creation of the database.
There’s no shortage of options as to how to do this: you could run a SQL file with the CREATE statements, and this would work, provided you have the correct “if not exists create database” logic, but this may not be particularly useful for re-use.
If you want to go for something that can be re-used, then I think SMO is your friend. It can be downloaded via the SQL Server Feature Pack, so practically any box can have it deployed (by which I mean you don’t need a full SQL Server install to have access to SMO), it works well with PowerShell, and there’s no shortage of examples online of how to create a script that does what you need. This post from Allen White gives a good overview of SMO in PowerShell. And if you’re using something like Octopus, then this script also gives a good start on creating a database using SMO. Where I think that step template in the Octopus Library falls short is that it only creates the database, and does not set up the files and filegroups. But if you combine Allen’s scripts with the idea of the step template, then you can pretty much see where I am coming from. It’s also worth setting the recovery model using SMO rather than worrying about it in SSDT.
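As a hedged sketch of that combined idea (the server name, paths, file names and recovery model are all assumptions of mine), creating the shell database with SMO might look like this:

```powershell
# Load SMO and connect (illustrative server/database names).
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.Smo') | Out-Null
$server = New-Object Microsoft.SqlServer.Management.Smo.Server 'MyServer'
$db     = New-Object Microsoft.SqlServer.Management.Smo.Database ($server, 'MyDb')

# Primary filegroup and data file, placed exactly where we want them.
$fg = New-Object Microsoft.SqlServer.Management.Smo.FileGroup ($db, 'PRIMARY')
$db.FileGroups.Add($fg)
$dataFile = New-Object Microsoft.SqlServer.Management.Smo.DataFile ($fg, 'MyDb_Data')
$dataFile.FileName = 'E:\SqlData\MyDb.mdf'
$fg.Files.Add($dataFile)

# Log file on its own drive, not wherever the instance defaults point.
$logFile = New-Object Microsoft.SqlServer.Management.Smo.LogFile ($db, 'MyDb_Log')
$logFile.FileName = 'F:\SqlLogs\MyDb_log.ldf'
$db.LogFiles.Add($logFile)

# Set the recovery model here too, rather than via SSDT.
$db.RecoveryModel = [Microsoft.SqlServer.Management.Smo.RecoveryModel]::Simple
$db.Create()
```

Once this shell exists, the SSDT deployment never has to decide where the files go.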
I’m not wild about the use of scripts outside of the database solution to deploy the database; it implies “magic”, or behind-the-scenes changes that some people won’t know exist. And usually I’d advocate the use of SSDT pre/post-deploy scripts to manage everything within the database. But databases are hard to deploy: they’re not a bunch of dlls that can be rolled back. A minor error early in the setup of a deployment pipeline can cause massive problems further down the process. And file creation is the first thing you do. What if the log file is created on the C drive? How hard can it be to move log files? Very hard is how. And creating a secondary log file to compensate is awful. And trying to remove that first log file? All these issues could be avoided with a bit of forethought.
Remember, it’s no good assuming that the instance you are deploying to has everything set up the way you like it. You have control over this stuff when you are using CI/CD properly, and you have to take responsibility for database deployments.