posted on Monday, October 25, 2004 12:45 PM by amachanic

Pattern-based replacement UDF

As a personal challenge, I decided to write a UDF that will work just like T-SQL's REPLACE() function, but using patterns as input.

The first question: How does REPLACE() handle overlapping patterns?

SELECT REPLACE('babab', 'bab', 'c')

-------------------------------------------------- 
cab

(1 row(s) affected)


SELECT REPLACE('bababab', 'bab', 'c')

-------------------------------------------------- 
cac

(1 row(s) affected)

It appears that SQL Server parses the input string from left to right, replacing the first instance of the replacement string, and then continues parsing to the right.

Next question: How to do the replacement on a pattern? As it turns out, this is somewhat trickier than I initially thought. A replacement requires a starting point -- easy to find using PATINDEX -- and an end point. But there is no function for finding the last character of a pattern. So you'll see that the UDF loops character-by-character, testing PATINDEX, in order to find the end of the match. This is useful for situations like:

SELECT dbo.PatternReplace('baaa', 'ba%', 'c')

-- We know that the match starts at character 1... but where does it end?

Anyway, enough background, here's the code:

CREATE FUNCTION dbo.PatternReplace
(
   @InputString VARCHAR(4000),
   @Pattern VARCHAR(100),
   @ReplaceText VARCHAR(4000)
)
RETURNS VARCHAR(4000)
AS
BEGIN
   DECLARE @Result VARCHAR(4000) SET @Result = ''
   -- First character in a match
   DECLARE @First INT
   -- Next character to start search on
   DECLARE @Next INT SET @Next = 1
   -- Length of the total string -- 8001 if @InputString is NULL
   DECLARE @Len INT SET @Len = COALESCE(LEN(@InputString), 8001)
   -- End of a pattern
   DECLARE @EndPattern INT
 
   WHILE (@Next <= @Len) 
   BEGIN
      SET @First = PATINDEX('%' + @Pattern + '%', SUBSTRING(@InputString, @Next, @Len))
      IF COALESCE(@First, 0) = 0 --no match - return
      BEGIN
         SET @Result = @Result + 
            CASE --return NULL, just like REPLACE, if inputs are NULL
               WHEN  @InputString IS NULL
                     OR @Pattern IS NULL
                     OR @ReplaceText IS NULL THEN NULL
               ELSE SUBSTRING(@InputString, @Next, @Len)
            END
         BREAK
      END
      ELSE
      BEGIN
         -- Concatenate characters before the match to the result
         SET @Result = @Result + SUBSTRING(@InputString, @Next, @First - 1)
         SET @Next = @Next + @First - 1
 
         SET @EndPattern = 1
         -- Find start of end pattern range
         WHILE PATINDEX(@Pattern, SUBSTRING(@InputString, @Next, @EndPattern)) = 0
            SET @EndPattern = @EndPattern + 1
         -- Find end of pattern range
         WHILE PATINDEX(@Pattern, SUBSTRING(@InputString, @Next, @EndPattern)) > 0
               AND @Len >= (@Next + @EndPattern - 1)
            SET @EndPattern = @EndPattern + 1

         --Either at the end of the pattern or @Next + @EndPattern = @Len
         SET @Result = @Result + @ReplaceText
         SET @Next = @Next + @EndPattern - 1
      END
   END
   RETURN(@Result)
END

... And here's how you run it, with some sample outputs showing that it does, indeed, appear to work:

SELECT dbo.PatternReplace('babab', 'bab', 'c')
                                                   
-------------------------------------------------- 
cab

(1 row(s) affected)


SELECT dbo.PatternReplace('babab', 'b_b', 'c')
                                                   
-------------------------------------------------- 
cab

(1 row(s) affected)


SELECT dbo.PatternReplace('bababe', 'b%b', 'c')

                                                   
-------------------------------------------------- 
cabe

(1 row(s) affected)

Hopefully this will help someone, somewhere. I haven't found any use for it yet :)

Thanks to Steve Kass for posting some single-character replacement code which I based this UDF on.


Update, January 10, 2005: Thanks to Frank Kalis, I've tracked down some problems with the original UDF. The version posted here has been fixed and now should respond identically to the T-SQL REPLACE function when NULLs or non-pattern-based arguments are passed in. The following example pairs should return the same values (and do, at this point!)

SELECT dbo.PatternReplace(NULL, '', 'abc')
SELECT REPLACE(NULL, '', 'abc')

SELECT dbo.PatternReplace('abc', '', NULL)
SELECT REPLACE('abc', '', NULL)

SELECT dbo.PatternReplace('abc', NULL, '')
SELECT REPLACE('abc', NULL, '')

SELECT dbo.PatternReplace('abc', 'b', '')
SELECT REPLACE('abc', 'b', '')

SELECT dbo.PatternReplace('adc', 'b', '')
SELECT REPLACE('adc', 'b', '')

Comments

# re: Pattern-based replacement UDF @ Tuesday, January 10, 2006 4:31 PM

This UDF does exactly what I need. I'm working with an older legacy application, where the phone numbers are all stored as straight varchars.

Since all the formatting is left intact (which would be how I would have handled it,) I needed a way to search for a phone number based upon just the non-formatting characters.

So, I want to be able to find the phone number "555-1212" regardless of whether it's stored as "(800) 555-1212", "800-555-1212", "800 555 1212" or whatever else format it could be entered in.

Using the PatternReplace() function, I'm able to do:

where
dbo.PatternReplace([PhoneNo], '[^0-9]', '') like '%5551212%'

This will return all the phone numbers I'm looking for.

Some time I'll re-architect this entire application and fix the phone number problem, but I needed a solution that I could implement quickly.

Dan G. Switzer, II

# Pattern-based replacement UDF @ Thursday, July 13, 2006 12:07 AM

Originally posted here.
As a personal challenge, I decided to write a UDF that will work just
like...

Anonymous

# Pattern-based replacement UDF @ Monday, January 08, 2007 2:39 PM

Originally posted here . As a personal challenge, I decided to write a UDF that will work just like T-SQL's

Anonymous