Tuesday, 28 July 2020

Find Pattern And Its Anagrams In A Text

Suppose we need to find if some text contains a pattern or any of its anagrams. I will explain how you can do it in O(n) time.
Let's consider the example below:
Text: cedaabd
Pattern: abad

Here are all possible 4-letter anagrams of abad:
aabd aadb abad abda adab adba baad bada bdaa daab daba dbaa

So we have 12 patterns to match. That is not feasible. Instead we will match count of each unique characters in pattern with that of current text window.
We will follow the below steps with above example.

Pattern character count:
a b c d e
2 1 0 1 0

Why are we counting c & e though these characters are not present in pattern? We want to have a fixed set of characters to compare to.
Otherwise, when we take a character from text, we will have to see first if the character is present in the pattern & then increment it. We want to avoid extra comparison.
Suppose, our text only contains english letters, so we know our table length will be 26. For our example, we have considered 5 characters a, b, c, d, e.

Now let's first consider first 4 characters from text & increment it in each step.

t[1] to t[4]
a b c d e
1 0 1 1 1
no match with pattern count, increment t.

t[2] to t[5]
a b c d e
2 0 0 1 1
no match with pattern count, increment t.

t[3] to t[6]
a b c d e
2 1 0 1 0
count matched with pattern count, so we found a match, increment t to check next character.

t[4] to t[7]
a b c d e
2 1 0 1 0
count matched with pattern count, so we found another match, this is also end of text.

As you can see we have found 2 matches in the text which are anagrams of the pattern. So what is the worst time complexity? If we recalculate text character count in each step, that will require m operations.
So time complexity would be O(mn). But we can optimize that with sliding window technique. Once count has been calculated for the first time, we can do the following:
Decrement first character count from the previous value.
Add current character count to the previous value.

For example, t[1] to t[4] count:
a b c d e
1 0 1 1 1

Now t[2] to t[5] can be counted as below:
decrement t[1] count from above. t[1] is c. So table would look like this:
a b c d e
1 0 0 1 1
Increment t[5] value. t[5] is a. So final table is
a b c d e
2 0 0 1 1

So with two operations we can recalculate hash in each step & it doesn't depend on m. So time complexity becomes O(n).

Another thing to consider is the time taken to compare values of pattern table with text table in each step. But as table length is fixed (here it is 5) & it doesn't depend on m or n, we can consider it as constant.
So overall time complexity would be O(n).

No comments:

Post a comment

How To Solve "Caused by: org.hibernate.HibernateException: Missing table" When Table Is Present In Database

If you are using JPA or Hibernate directly and got that exception while starting your application, there is one obvious reason for that. You...