posted on Tuesday, January 11, 2005 11:50 AM
by
amachanic
Counting occurrences of a substring within a string
I have absolutely no idea why anyone wants to do this, but I keep answering the same question in forums: "How do I count the occurrences of a substring [note: usually comma] within a string?"
In an effort to thwart carpal tunnel syndrome, I have created the Ultimate Substring Occurrence Counting UDF.
... And here it is:
CREATE FUNCTION dbo.GetSubstringCount
(
@InputString TEXT,
@SubString VARCHAR(200),
@NoisePattern VARCHAR(20)
)
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
RETURN
(
SELECT COUNT(*)
FROM dbo.Numbers N
WHERE
SUBSTRING(@InputString, N.Number, LEN(@SubString)) = @SubString
AND PATINDEX(@NoisePattern, SUBSTRING(@InputString, N.Number + LEN(@SubString), 1)) = 0
AND PATINDEX(@NoisePattern, SUBSTRING(@InputString, N.Number - 1, 1)) = 0
)
END
First note: You need (regular readers, you guessed it) a numbers table.
Okay, so what's it do? Simply put, it returns the number of times @SubString appears within @InputString. But wait! -- Act now and you will receive an additional bonus feature at no extra cost! Can you feel the love?
The @NoisePattern parameter allows the user to put the UDF into "exact match" mode.
For instance, let's say you have a big string containing some text about automobile manufacturers, and for some reason (again, I have no clue why people need this functionality -- fill me in if you do!) you want to count the number of occurrences of the word "auto", but not the number of occurrences of other forms of the word, e.g. "automobile" or .... some word that ends in "auto" (if such a word exists).
By specifying a pattern for @NoisePattern of characters that shouldn't be adjacent to your word, you're telling the UDF that any other characters are safe. Leaving the parameter empty means that all occurrences of the substring will be counted. Examples:
SELECT dbo.GetSubstringCount('We like the autos, the autos that go boom.', 'auto', '')
-- Returns 2
SELECT dbo.GetSubstringCount('Autos are fun. I like to drive my auto.', 'auto', '')
-- Also returns 2
SELECT dbo.GetSubstringCount('Autos are fun. I like to drive my auto.', 'auto', '%[a-z]%')
-- Only returns 1 -- The exact match must not have adjacent alphabetic characters