Update: turns out this job doesn’t actually execute (d’oh!) so I made a stored proc. Read the updated post by clicking this linky
One of our databases is exclusively used for staging tables of raw data, before it is extracted and transformed by a SSIS package and inserted into our data warehouse. These tables are large and pretty much useless after one run. The database has over 1.5TB of storage, but even after a few days of heavy month end data loading, it can fill up quickly. There’s no real warning for when the database does fill up either, as we have expanded the Files in the Filegroup to fill out the disks, so we don’t slow down the process from autogrow. Because these tables are so throwaway, we have a SQL Agent job that deletes anything older than three days at midnight. There’s pros and cons at running it at this time, but seeing as our data-loading window is 24/7 it seems as good a time as any.
As the DROP TABLE statement won’t accept a parameter, you can’t do DROP TABLE @tablename. You can however build a string of a complete T-SQL statement and EXECUTE it. I use a CURSOR, with READ_ONLY, to loop through the SELECT statement of the tables I want to delete. The READ_ONLY keyword allows for a faster and safer execution as no lock is required; we are only running a select statement. My select is pretty simple; using sys.sysobjects I get a list of the tables that were created more than 72 hours old.
USE <db_name> GO DECLARE @tableName VARCHAR(100) DECLARE @sql VARCHAR(max) DECLARE db_cursor CURSOR READ_ONLY FOR SELECT name AS tableName FROM sys.objects WHERE create_date > DATEADD (hh, -72, CAST(GETDATE() AS datetime)) OPEN db_cursor FETCH NEXT FROM db_cursor INTO @tableName WHILE @@FETCH_STATUS = 0 BEGIN SET @sql = 'DROP TABLE ' + @tableName EXEC (@sql) --PRINT @sql FETCH NEXT FROM db_cursor INTO @tableName END CLOSE db_cursor DEALLOCATE db_cursor
We need to be careful when just getting a bunch of names for tables and dropping them that way; if you put the hour at a sufficiently high number you will find all manner of sys tables listed there that you really don’t want to delete. There’s two ways I thought about dealing with this:
- Add an “AND” clause to the “SELECT” to filter out anything you might not want to delete. As these are always staging tables for the same load, I was able to narrow it down by writing the following:
SELECT name AS tableName FROM sys.objects WHERE create_date > DATEADD (hh, -72, CAST(GETDATE() AS datetime)) and name LIKE 'TempRun%'
- Alternately, narrowing down the search to only tables that are defined as user type tables.
SELECT name AS tableName FROM sys.objects WHERE create_date > DATEADD (hh, -72, CAST(GETDATE() AS datetime)) and type = 'U'
It’s surprising to see how different the Query Execution Plan is for either of these, and I was surprised to see which one was theoretically quicker (even though in the case of either of these queries, speed is not really an issue). There’s a print option commented out which you can switch in and comment out the EXECUTE @SQL line if you want to have a play around at first.
Once you’ve chosen which method you want to do it’s a case of creating a new SQL Agent Job. I’ve added the steps here in a gallery for completeness. Click on the first image to scroll through the gallery.