posted on Tuesday, January 11, 2005 11:50 AM by amachanic

Counting occurrences of a substring within a string

I have absolutely no idea why anyone wants to do this, but I keep answering the same question in forums: "How do I count the occurrences of a substring [note: usually comma] within a string?"

In an effort to thwart carpal tunnel syndrome, I have created the Ultimate Substring Occurrence Counting UDF.

... And here it is:

CREATE FUNCTION dbo.GetSubstringCount
(
	@InputString TEXT, 
	@SubString VARCHAR(200),
	@NoisePattern VARCHAR(20)
)
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
	RETURN 
	(
		SELECT COUNT(*)
		FROM dbo.Numbers N
		WHERE
		SUBSTRING(@InputString, N.Number, LEN(@SubString)) = @SubString
			AND PATINDEX(@NoisePattern, SUBSTRING(@InputString, N.Number + LEN(@SubString), 1)) = 0
			AND PATINDEX(@NoisePattern, SUBSTRING(@InputString, N.Number - 1, 1)) = 0
	)
END

First note: You need (regular readers, you guessed it) a numbers table.

Okay, so what's it do? Simply put, it returns the number of times @SubString appears within @InputString. But wait! -- Act now and you will receive an additional bonus feature at no extra cost! Can you feel the love?

The @NoisePattern parameter allows the user to put the UDF into "exact match" mode.

For instance, let's say you have a big string containing some text about automobile manufacturers, and for some reason (again, I have no clue why people need this functionality -- fill me in if you do!) you want to count the number of occurrences of the word "auto", but not the number of occurrences of other forms of the word, e.g. "automobile" or .... some word that ends in "auto" (if such a word exists).

By specifying a pattern for @NoisePattern of characters that shouldn't be adjacent to your word, you're telling the UDF that any other characters are safe. Leaving the parameter empty means that all occurrences of the substring will be counted. Examples:

SELECT dbo.GetSubstringCount('We like the autos, the autos that go boom.', 'auto', '')
-- Returns 2

SELECT dbo.GetSubstringCount('Autos are fun.  I like to drive my auto.', 'auto', '')
-- Also returns 2

SELECT dbo.GetSubstringCount('Autos are fun.  I like to drive my auto.', 'auto', '%[a-z]%')
-- Only returns 1 -- The exact match must not have adjacent alphabetic characters

Comments

# Two new utility UDFs posted @ Tuesday, January 11, 2005 11:52 AM

Two new utility UDFs posted

amachanic

# Counting occurrences of a substring within a string @ Thursday, July 13, 2006 12:23 AM

Originally posted here.

I have absolutely no idea why anyone wants to do this, but I keep
answering...

Anonymous

# Counting occurrences of a substring within a string @ Monday, January 08, 2007 2:36 PM

Originally posted here . I have absolutely no idea why anyone wants to do this, but I keep answering

Anonymous