Compressing Open Row Groups

If you’re using, or at least read up on Clustered Columnstore Indexes then you’ll know that the data is stored in segments. You’ll also know that there are 4 states the segments can be in at any one time:

  1. OPEN
  2. CLOSED
  3. INVISIBLE
  4. COMPRESSED

When segments are in the OPEN/CLOSED states they are delta-stores. Delta stores are immediately up-datable and are currently uncompressed. So at this stage the delta stores will not have the space saving advantages of columnstore. The db engine creates a new delta store whenever it needs one to handle inserts, closes them when full (have 1048576 rows). OPEN and CLOSED delta stores can be directly updated and deleted, unlike COMPRESSED segments (see below). You can have more than one OPEN delta store at any one time.

When a delta store has been compressed the state is changed to COMPRESSED. A row store is not directly up-datable. So when we UPDATE or DELETE a row that is stored in a COMPRESSED row group the row itself is not affected. w/r/t a DELETE the row to be deleted is marked for deletion in a delete bitmap. w/r/t an UPDATE the row to be updated is marked for deletion in a delete bitmap and a new row is inserted in an OPEN row group. The task of cleaning up these deleted rows is not managed by the db engine; a re-build of the index is required to remove the rows marked for deletion. Performance will deteriorate if these rows are not purged and data not re-organised.

INVISIBLE is a transitory state between CLOSED and COMPRESSED. The invisible state is when the Tuple Mover is in the process of compressing a closed delta store. During this phase the Tuple Mover does not prevent the data from being read, and inserts are not blocked, but UPDATES and DELETES are. The Tuple Mover compresses closed delta-stores into row-groups and runs as a background process roughly every 5 minutes.

Open row groups are uncompressed heaps, the size of which cannot be easily measured. And they will stay that way until they are filled up. Imagine you are using partitioning, then it’s unlikely you’ll ever fill up open row groups as row groups are split by partition, to keep switching as a meta data activity. In fact even if you are not using partitioning you can have more than one open row group per table, which could completely abrogate the benefits of clustered columnstore indexes.

So… what are we to do?!

Well you could rebuild the table/partition, but this takes a significant amount of time. However reading Books Online for indexes there seems to be an undocumented option mentioned in the comments for clustered columnstore indexes called ‘COMPRESS_ALL_ROW_GROUPS’. Including this option on a reorganise index command causes the open row groups to be included and compressed with the rest of the row groups.

Let’s run a quick demo on this:


--using new AdventureWorks DW 2014
use AdventureWorksDW2014
GO

--create a new table from existing table
select top 0 *into dbo.factinternetsales_new from [dbo].[FactInternetSales]
GO

--a clustered columnstore index
CREATE CLUSTERED COLUMNSTORE INDEX [CCI_factinternetsales_new] ON [dbo].[factinternetsales_new] WITH (DROP_EXISTING = OFF)
GO

--insert data from table we used to create new table
insert into dbo.factinternetsales_new
select top 60000 *
from [dbo].[FactInternetSales]
GO

Now we’re all set up let’s check the state of the segment:


--use the catalog views to check the state the index is in
select * from sys.column_store_row_groups
where object_id = object_id('dbo.factinternetsales_new');
GO

2014-09-05 19_16_05-SQLQuery10

Rebuild the index with the option COMPRESS_ALL_ROW_GROUPS = ON


--rebuild the index using the undocumented option COMPRESS_ALL_ROW_GROUPS
alter index CCI_factinternetsales_new on factinternetsales_new
Reorganize with (COMPRESS_ALL_ROW_GROUPS = ON);
GO

2014-09-05 19_18_20-SQLQuery11

Let’s also try this specifying a particular partition: if you’ve followed my previous post How To Split Non-Empty Partitions When A Clustered Columnstore Index Exists on The Table
then you can follow at home using the database created in that demo as this is the one I am using to test here. (I’ve included the script to create the table and the data at the end of this post.)


use Phoebix
GO

insert into dbo.B_Phoebix
select top 100000 * from dbo.B_Phoebix
where id = 13258
GO

insert into dbo.B_Phoebix
select top 100000 * from dbo.B_Phoebix
where id = 13278
GO

select * from sys.column_store_row_groups
where object_id = object_id('dbo.B_Phoebix')
and partition_number IN (4,7)
GO

Inserting extra values into partitions that are fully compressed, we will create new open row groups.

2014-09-05 20_35_43-SQLQuery13- Microsoft SQL Server

Ok, let’s specify partition 4 only when rebuilding. We should expect to see only one of the row groups compressed.


alter index [B_Custard] on [dbo].[B_Phoebix]
Reorganize partition = 4
with (COMPRESS_ALL_ROW_GROUPS = ON);
GO

select * from sys.column_store_row_groups
where object_id = object_id('dbo.B_Phoebix')
and partition_number IN (4,7)
GO

2014-09-05 20_37_20-SQLQuery14

Cool! So now we’re able to reorganise indexes and compress any segments that are currently open, even to a partition level. This is however no substitute for properly implementing Clustered Columnstore Indexes to keep rebuilds/reorganizes to a minimum.

Appendix: Script to create test CCI Database with partitioning enabled.

Some notes:

  • The script below will create the database and the objects necessary. I’m going to state the obvious, but read through the scripts first before you run them; things like file locations on boxes are always different.
  • This database has two tables sharing the same partition schema, which itself is on its own secondary filegroup. The script will create 7 partitions with distinct ranges. Let’s populate some of those partitions with data. This is going to load into staging tables and switch the tables into the partitions, so we can be confident that everything is set up correctly.
/*
***************
CREATE DATABASE
***************
*/

CREATE DATABASE [Phoebix]
CONTAINMENT = NONE
ON PRIMARY
( NAME = N'Phoebix', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\Phoebix.mdf' , SIZE = 5120KB , FILEGROWTH = 1024KB )
LOG ON
( NAME = N'Phoebix_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\Phoebix_log.ldf' , SIZE = 2048KB , FILEGROWTH = 10%)
GO
ALTER DATABASE [Phoebix] SET COMPATIBILITY_LEVEL = 120
GO
ALTER DATABASE [Phoebix] SET READ_WRITE
GO
ALTER DATABASE [Phoebix] SET RECOVERY SIMPLE
GO
ALTER DATABASE [Phoebix] SET MULTI_USER
GO

/*
***************
CREATE FILEGROUPS
***************
*/

USE [Phoebix]
GO
IF NOT EXISTS (SELECT name FROM sys.filegroups WHERE is_default=1 AND name = N'PRIMARY') ALTER DATABASE [Phoebix] MODIFY FILEGROUP [PRIMARY] DEFAULT
GO

use Phoebix
go

ALTER DATABASE Phoebix ADD FILEGROUP [FGPhoebix]

/*
***************
CREATE FILES
***************
*/

ALTER DATABASE Phoebix ADD FILE
(NAME = N'Phoebix_F01', FILENAME = N'S:\Data\PhoebixData\Phoebix1.ndf', SIZE = 5GB, FILEGROWTH = 1GB)
TO FILEGROUP [FGPhoebix]
ALTER DATABASE Phoebix ADD FILE
(NAME = N'Phoebix_F02', FILENAME = N'S:\Data\PhoebixData\Phoebix2.ndf', SIZE = 5GB, FILEGROWTH = 1GB)
TO FILEGROUP [FGPhoebix]

/*
***************
CREATE PARTITION FUNCTION
***************
*/

USE Phoebix
GO
CREATE PARTITION FUNCTION pfnPhoebix (int) AS RANGE LEFT FOR VALUES (12116, 12146, 12765, 13258, 13267, 13278, 13441)
GO

/*
***************
CREATE PARTITION SCHEMA
***************
*/

USE Phoebix
GO
CREATE PARTITION SCHEME pscPhoebix AS PARTITION pfnPhoebix ALL TO (FGPhoebix)
GO

/*
***************
CREATE TABLES
***************
*/

use Phoebix
go

CREATE TABLE dbo.B_Phoebix
(
Id INT NOT NULL,
MonthEndDate DATE NOT NULL,
memberId BIGINT NOT NULL,
BSKey VARCHAR (4) NOT NULL,
value FLOAT NOT NULL
) ON pscPhoebix (Id) WITH (DATA_COMPRESSION = PAGE);

GO

CREATE TABLE dbo.S_Phoebix
(
Id INT NOT NULL,
MonthEndDate DATE NOT NULL,
memberId BIGINT NOT NULL,
SbId INT NOT NULL,
flow FLOAT NOT NULL,
vesting FLOAT NULL
) ON pscPhoebix (Id) WITH (DATA_COMPRESSION = PAGE);
GO

/*
***************
CREATE CLUSTERED COLUMNSTORE INDEXES
***************
*/

CREATE CLUSTERED COLUMNSTORE INDEX [B_Custard] ON [dbo].[B_Phoebix] WITH (DROP_EXISTING = OFF)
GO
CREATE CLUSTERED COLUMNSTORE INDEX [S_Custard] ON [dbo].[S_Phoebix] WITH (DROP_EXISTING = OFF)
GO

/*
****************
CREATE LOAD DATA
****************
*/

CREATE TABLE dbo.B_Phoebix_Load
(
Id INT NOT NULL,
MonthEndDate DATE NOT NULL,
memberId BIGINT NOT NULL,
BSKey VARCHAR (4) NOT NULL,
value FLOAT NOT NULL
) WITH (DATA_COMPRESSION = PAGE);

GO

CREATE TABLE dbo.S_Phoebix_Load
(
Id INT NOT NULL,
MonthEndDate DATE NOT NULL,
memberId BIGINT NOT NULL,
SbId INT NOT NULL,
flow FLOAT NOT NULL,
vesting FLOAT NULL
) WITH (DATA_COMPRESSION = PAGE);

GO
declare @i as int;
set @i = 1;
begin tran
while @i <= 3200000
begin
INSERT INTO dbo.B_Phoebix_Load (Id, monthEndDate, memberId, bsKey, value)
values
( 13258, GETDATE(), RAND ()*12, 'FFOS', RAND ()*13),
( 13267, DATEADD(MONTH,1,GETDATE()), RAND ()*12, 'FFOS', RAND ()*13),
( 13278, DATEADD(MONTH,2,GETDATE()), RAND ()*12, 'FFOS', RAND ()*13)
INSERT INTO dbo.S_Phoebix_Load (Id, monthEndDate, memberId, sbId, flow, vesting)
values
( 13258, GETDATE(), RAND ()*12, RAND ()*50, RAND ()*13, RAND ()*13),
( 13267, DATEADD(MONTH,1,GETDATE()), RAND ()*12, RAND ()*50, RAND ()*13, RAND ()*13),
( 13278, DATEADD(MONTH,2,GETDATE()), RAND ()*12, RAND ()*50, RAND ()*13, RAND ()*13)
set @i = @i + 1;
end;
commit;

/*
*************************
LOAD DATA INTO PARTITIONS USING STAGING TABLES
*************************
*/

/****
RUN 1
****/

-- create the empty rowset from the table we will switch into
select top 0 * into dbo.B_Phoebix_13258 from dbo.B_Phoebix
select top 0 * into dbo.S_Phoebix_13258 from dbo.S_Phoebix

--do ourselves a favour and make this temporary table as small as possible
--and add page compression
ALTER TABLE dbo.B_Phoebix_13258 REBUILD WITH (DATA_COMPRESSION = PAGE)
ALTER TABLE dbo.S_Phoebix_13258 REBUILD WITH (DATA_COMPRESSION = PAGE)

--empty rowset is created on primary by default
--move to the filegroup the partitioned table is on
ALTER TABLE dbo.B_Phoebix_13258 ADD CONSTRAINT CONS_dbo_B_Phoebix_13258 PRIMARY KEY (Id) on FGPhoebix
ALTER TABLE dbo.S_Phoebix_13258 ADD CONSTRAINT CONS_dbo_S_Phoebix_13258 PRIMARY KEY (Id) on FGPhoebix

-- we don't need it (or can even keep it as we are adding CCI)
ALTER TABLE dbo.B_Phoebix_13258 DROP CONSTRAINT CONS_dbo_B_Phoebix_13258
ALTER TABLE dbo.S_Phoebix_13258 DROP CONSTRAINT CONS_dbo_S_Phoebix_13258

--add constraint on temp table so that we can switch in to main table
ALTER TABLE dbo.B_Phoebix_13258 WITH CHECK ADD CONSTRAINT CK_B_Phoebix_13258 CHECK (([Id]=(13258)))
ALTER TABLE dbo.S_Phoebix_13258 WITH CHECK ADD CONSTRAINT CK_s_Phoebix_13258 CHECK (([Id]=(13258)))

--add data to temp table from loading tables
--remember IRL this would be SSIS or HPC or data loader of choice
INSERT INTO dbo.B_Phoebix_13258 (Id, monthEndDate, memberId, bsKey, value)
select Id, monthEndDate, memberId, bsKey, value
from dbo.B_Phoebix_Load WITH (NOLOCK)
where id = 13258

INSERT INTO dbo.S_Phoebix_13258 (Id, monthEndDate, memberId, sbId, flow, vesting)
select Id, monthEndDate, memberId, sbId, flow, vesting
from dbo.S_Phoebix_Load WITH (NOLOCK)
where id = 13258

--add CCI
CREATE CLUSTERED COLUMNSTORE INDEX [PK_B_Phoebix_13258] ON dbo.B_Phoebix_13258 WITH (DROP_EXISTING = OFF)
CREATE CLUSTERED COLUMNSTORE INDEX [PK_S_Phoebix_13258] ON dbo.S_Phoebix_13258 WITH (DROP_EXISTING = OFF)

--switch in to main table
ALTER TABLE dbo.B_Phoebix_13258 SWITCH TO dbo.B_Phoebix PARTITION $PARTITION.pfnPhoebix(13258)
ALTER TABLE dbo.S_Phoebix_13258 SWITCH TO dbo.S_Phoebix PARTITION $PARTITION.pfnPhoebix(13258)

DROP TABLE dbo.B_Phoebix_13258
DROP TABLE dbo.S_Phoebix_13258

/****
RUN 2
****/

-- create the empty rowset from the table we will switch into
select top 0 * into dbo.B_Phoebix_13267 from dbo.B_Phoebix
select top 0 * into dbo.S_Phoebix_13267 from dbo.S_Phoebix

--do ourselves a favour and make this temporary table as small as possible
--and add page compression
ALTER TABLE dbo.B_Phoebix_13267 REBUILD WITH (DATA_COMPRESSION = PAGE)
ALTER TABLE dbo.S_Phoebix_13267 REBUILD WITH (DATA_COMPRESSION = PAGE)

--empty rowset is created on primary by default
--move to the filegroup the partitioned table is on
ALTER TABLE dbo.B_Phoebix_13267 ADD CONSTRAINT CONS_dbo_B_Phoebix_13267 PRIMARY KEY (Id) on FGPhoebix
ALTER TABLE dbo.S_Phoebix_13267 ADD CONSTRAINT CONS_dbo_S_Phoebix_13267 PRIMARY KEY (Id) on FGPhoebix

-- we don't need it (or can even keep it as we are adding CCI)
ALTER TABLE dbo.B_Phoebix_13267 DROP CONSTRAINT CONS_dbo_B_Phoebix_13267
ALTER TABLE dbo.S_Phoebix_13267 DROP CONSTRAINT CONS_dbo_S_Phoebix_13267

--add constraint on temp table so that we can switch in to main table
ALTER TABLE dbo.B_Phoebix_13267 WITH CHECK ADD CONSTRAINT CK_B_Phoebix_13267 CHECK (([Id]=(13267)))
ALTER TABLE dbo.S_Phoebix_13267 WITH CHECK ADD CONSTRAINT CK_s_Phoebix_13267 CHECK (([Id]=(13267)))

--add data to temp table from loading tables
--remember IRL this would be SSIS or HPC or data loader of choice
INSERT INTO dbo.B_Phoebix_13267 (Id, monthEndDate, memberId, bsKey, value)
select Id, monthEndDate, memberId, bsKey, value
from dbo.B_Phoebix_Load WITH (NOLOCK)
where id = 13267

INSERT INTO dbo.S_Phoebix_13267 (Id, monthEndDate, memberId, sbId, flow, vesting)
select Id, monthEndDate, memberId, sbId, flow, vesting
from dbo.S_Phoebix_Load WITH (NOLOCK)
where id = 13267

--add CCI
CREATE CLUSTERED COLUMNSTORE INDEX [PK_B_Phoebix_13267] ON dbo.B_Phoebix_13267 WITH (DROP_EXISTING = OFF)
CREATE CLUSTERED COLUMNSTORE INDEX [PK_S_Phoebix_13267] ON dbo.S_Phoebix_13267 WITH (DROP_EXISTING = OFF)

--switch in to main table
ALTER TABLE dbo.B_Phoebix_13267 SWITCH TO dbo.B_Phoebix PARTITION $PARTITION.pfnPhoebix(13267)
ALTER TABLE dbo.S_Phoebix_13267 SWITCH TO dbo.S_Phoebix PARTITION $PARTITION.pfnPhoebix(13267)

DROP TABLE dbo.B_Phoebix_13267
DROP TABLE dbo.S_Phoebix_13267
/****
RUN 3
****/

-- create the empty rowset from the table we will switch into
select top 0 * into dbo.B_Phoebix_13278 from dbo.B_Phoebix
select top 0 * into dbo.S_Phoebix_13278 from dbo.S_Phoebix

--do ourselves a favour and make this temporary table as small as possible
--and add page compression
ALTER TABLE dbo.B_Phoebix_13278 REBUILD WITH (DATA_COMPRESSION = PAGE)
ALTER TABLE dbo.S_Phoebix_13278 REBUILD WITH (DATA_COMPRESSION = PAGE)

--empty rowset is created on primary by default
--move to the filegroup the partitioned table is on
ALTER TABLE dbo.B_Phoebix_13278 ADD CONSTRAINT CONS_dbo_B_Phoebix_13278 PRIMARY KEY (Id) on FGPhoebix
ALTER TABLE dbo.S_Phoebix_13278 ADD CONSTRAINT CONS_dbo_S_Phoebix_13278 PRIMARY KEY (Id) on FGPhoebix

-- we don't need it (or can even keep it as we are adding CCI)
ALTER TABLE dbo.B_Phoebix_13278 DROP CONSTRAINT CONS_dbo_B_Phoebix_13278
ALTER TABLE dbo.S_Phoebix_13278 DROP CONSTRAINT CONS_dbo_S_Phoebix_13278

--add constraint on temp table so that we can switch in to main table
ALTER TABLE dbo.B_Phoebix_13278 WITH CHECK ADD CONSTRAINT CK_B_Phoebix_13278 CHECK (([Id]=(13278)))
ALTER TABLE dbo.S_Phoebix_13278 WITH CHECK ADD CONSTRAINT CK_s_Phoebix_13278 CHECK (([Id]=(13278)))

--add data to temp table from loading tables
--remember IRL this would be SSIS or HPC or data loader of choice
INSERT INTO dbo.B_Phoebix_13278 (Id, monthEndDate, memberId, bsKey, value)
select Id, monthEndDate, memberId, bsKey, value
from dbo.B_Phoebix_Load WITH (NOLOCK)
where id = 13278

INSERT INTO dbo.S_Phoebix_13278 (Id, monthEndDate, memberId, sbId, flow, vesting)
select Id, monthEndDate, memberId, sbId, flow, vesting
from dbo.S_Phoebix_Load WITH (NOLOCK)
where id = 13278

--add CCI
CREATE CLUSTERED COLUMNSTORE INDEX [PK_B_Phoebix_13278] ON dbo.B_Phoebix_13278 WITH (DROP_EXISTING = OFF)
CREATE CLUSTERED COLUMNSTORE INDEX [PK_S_Phoebix_13278] ON dbo.S_Phoebix_13278 WITH (DROP_EXISTING = OFF)

--switch in to main table
ALTER TABLE dbo.B_Phoebix_13278 SWITCH TO dbo.B_Phoebix PARTITION $PARTITION.pfnPhoebix(13278)
ALTER TABLE dbo.S_Phoebix_13278 SWITCH TO dbo.S_Phoebix PARTITION $PARTITION.pfnPhoebix(13278)

DROP TABLE dbo.B_Phoebix_13278
DROP TABLE dbo.S_Phoebix_13278

Author: Richie Lee

Full time computer guy, part time runner. Full time Dad, part time blogger. Knows a thing or two about Pokémon. Knows too much about SQL Agent. Writer of fractured sentences. Maker of the best damn macaroni cheese you've ever tasted.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s