Who Got the Func Part 1: String Functions Using T-SQL

Introduction

In this post I’m going to delve into splitting the results from a single variable or cell into two separate cells using both case statements and the built in string functions of T-SQL. The object of this post is to demonstrate using the built in string functions together to extract the values from within a string.

Back in May I posted a SQL query that coalesced the RangeValues of a partition schema into one column: lines 49 – 57

The results of the Range Value column will either be one of the three:

  • Highest Range Value (eg >22411)
  • Middle Range Value  (eg >22410 AND <=22411)
  • Lowest Range Value (eg <=4638)

Let’s say I needed to split the results back out into the start of the range value and the end of the range value into 2 separate columns. I realise that putting all that effort in joining only to split it is a terrible waste of resource, and so I want to stress the point that I am only using this as an example of splitting the expression using the built in T-SQL string functions. The point is that we will have 3 different expressions with differing lengths depending on the length of the range value we will need to split consistently, and I thought that these expressions would provide a good opportunity in demonstrating some of the string functions in T-SQL that can be used to achieve this.

For testing and demonstration purposes, rather than query the table I’m going to declare a variable and then enter the relative expression:


DECLARE @EXPR NVARCHAR (20)
set @EXPR = '>22400 AND <=22411'

Here is the full SQL below that Iam going to be using: By casting the results as an INT we’re also preventing any of the math symbols from being entered erroneously. I’m going to break it down firstly into the start range, then go through each condition to explain my workings. I will then do the same with the End Range.


DECLARE @EXPR NVARCHAR (20)
set @EXPR = '>2240 AND <=22411'

-->22411
-->22410 AND <=22411
--<=4638

select
case when patindex('%AND%',@expr) = 0 AND charindex('=',@expr) = 0
THEN cast(substring(@expr, 2, len(@expr) -1) as int)
else
CASE when patindex('%AND%',@expr) = 0
THEN cast(substring(@expr, 3, len(@expr) -2) as int)
ELSE  cast(substring(@expr, 2, charindex(' ',@expr) -2) as int )
END
end startRange
,
case when patindex('%AND%',@expr) = 0 AND charindex('=',@expr) = 0
THEN 0
ELSE CASE when patindex('%AND%',@expr) = 0 THEN cast(substring(@expr, charindex('=',@expr) + 1, len(@expr) - charindex('=',@expr)) as int)
ELSE cast(substring(@expr, charindex('=', @expr) + 1, len(@expr) - charindex('=',@expr)) as int )
END
END endRange

Splitting the First Range Value

The first part deals with getting the lower value as Start Range:


select
case when patindex('%AND%',@expr) = 0 AND charindex('=',@expr) = 0
 THEN cast(substring(@expr, 2, len(@expr) -1) as int)
 else
 CASE when patindex('%AND%',@expr) = 0
 THEN cast(substring(@expr, 3, len(@expr) -2) as int)
 ELSE cast(substring(@expr, 2, charindex(' ',@expr) -2) as int )
 END
 end startRange

Let’s break this down condition by condition:

select
case when patindex('%AND%',@expr) = 0 AND charindex('=',@expr) = 0
 THEN cast(substring(@expr, 2, len(@expr) -1) as int)

the first case is using the PATINDEX string function to look for the pattern ‘%AND%’. Using the CHARINDEX we are looking for the  ‘=’ sign.

  • PATINDEX returns the starting position of the first occurrence of a pattern in a specified expression, or zeros if the pattern is not found. Just like when we are using LIKE, we use the ‘%’ as a wildcard search in the PATINDEX.
  • CHARINDEX Searches an expression for another expression and returns its starting position if found. If it does not find a starting position, it will return 0.

We are looking for these NOT to exist by expecting 0 to be returned for both of these cases. IF 0 is returned for both of these, then we can be sure that the Highest Range Value is the expression that will be found. We only want the number which means returning only part of the string, so we use SUBSTRING to to return us the value beginning at position 2, and subtract the ‘>’ character by using len(@expr-1) as an int.

  • SUBSTRING returns part of a character from an expression, with a start to determine which character we begin our return from, and the length that we return.

Let’s look at the 2nd case:


else
 CASE when patindex('%AND%',@expr) = 0
 THEN cast(substring(@expr, 3, len(@expr) -2) as int)

Again, we’re using the PATINDEX to look for the pattern ‘%AND%’. If it does not find it then it will return a 0. Unlike the previous case we are looking for the ‘=’ as this case statement is looking for the Lowest Range Value. We could if we wanted to use CHARINDEX to find the ‘=’ sign by adding to the case “AND charindex(‘=’,@expr) = 2”. However, we know that if the PATINDEX returns 0 we will find the expression and we can manipulate it accordingly because we covered not finding the ‘=’ in the previous expression.

Again we only want the numbers, so we use SUBSTRING to return the string starting at position 3 and use LEN to subtract the first 2 characters (<=) from the expression.

There’s one more expression to take into account, however as we have cases in place to deal with the first two we know that we will end up with the Middle Range Value. So we only need to remove the characters to leave us the number. In this case, as it is the start range we  are looking for, we want to remove the math symbols as well as the upper number.


ELSE cast(substring(@expr, 2, charindex(' ',@expr) -2) as int )
 END
 end startRange

Using SUBSTRING, we are specifying that the RETURN will begin on the 2nd character, which will always be the start of the Lower Range Value. Using CHARINDEX, we find the value of the starting position for the first ” ” (space) in the expression. This will return us 7. We then subtract 2 from that value to give us 5, which means that from the starting position of the 2nd character in the string, we will return 5 characters, which gives us the whole number. This also works if the number is only four characters long.

Splitting to Get The End Range

So now we’ve covered the start range of the expression, we move onto the end range:


case when patindex('%AND%',@expr) = 0 AND charindex('=',@expr) = 0
THEN 0
ELSE CASE when patindex('%AND%',@expr) = 0 THEN cast(substring(@expr, charindex('=',@expr) + 1, len(@expr) - charindex('=',@expr)) as int)
ELSE cast(substring(@expr, charindex('=', @expr) + 1, len(@expr) - charindex('=',@expr)) as int )
END
END endRange

The first case here is exactly the same as the first case statement in the start range.

case when patindex('%AND%',@expr) = 0 AND charindex('=',@expr) = 0
THEN 0

if it does not find “AND” and does not find the “=” symbol then it is the Highest Range Value expression. This time however it will return “0” as there is no higher range value (because the lowest range value is the highest range we are partitioning by. If we want to increase the range value then we have to increase the number of partitions by splitting the highest.)


ELSE CASE when patindex('%AND%',@expr) = 0 THEN cast(substring(@expr, charindex('=',@expr) + 1, len(@expr) - charindex('=',@expr)) as int)

The first condition is looking for AND not to exist then we know it will be the Lowest Range Value, as again the first CASE has accounted for the ‘=’ sign not to exist. In order to return just the number we use CHARINDEX to find the ‘=’ sign. We then add 1 from this position to give us our starting position. We then determine the length by taking the total length of @expr (in our example, 6) and from this subtracting the value of  the CHARINDEX value of ‘=’, which gives us 2.

ELSE cast(substring(@expr, charindex('=', @expr) + 1, len(@expr) - charindex('=',@expr)) as int )
END
END endRange

Finally, nearly there: so we know that any @expr that has reached this case will be the Middle Range Value. So we are using a SUBSTRING beginning at the ‘=’ symbol again using CHARINDEX to indicate the start of the SUBSTRING. From here we know that the next character will be the beginning of the higher number, so we add 1 to the SUBSTRING starting value. We then get the length of @expr and subtract the length of the @expr up to the ‘=’ sign again. So in the example above the first character after the ‘=’ is 2. And subtracting the index position of the ‘=’ sign 13 from the total length gives us 5, which is the length of the higher value. The end of the SUBSTRING will be 5 characters from the beginning.

Summary

These examples have only used a 3 of the string functions that are available using T-SQL. Yet by combining them I have been able to manipulate strings to organise the data. Clearly this is not something that you’s use in a SSIS package or any other bulk loading, but if you’re using DMV’s to extract data and present it in a legible manner.

Author: Richie Lee

Full time computer guy, part time runner. Full time Dad, part time blogger. Pokémon Nut. Writer of fractured sentences. Maker of the best damn macaroni cheese you've ever tasted.

1 thought on “Who Got the Func Part 1: String Functions Using T-SQL”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s