Monday, 27 July 2020

KMP Algorithm Explained In Plain English

Here I am just trying to explain KMP algorithm in plain english. I will also explain worst time complexity & why it is O(m + n).

We will take two examples, one with no repeatable character in the pattern & another with repeatable characters in the pattern.


Text:    aaaabaabab

Pattern: aaaaa


First, we need to build a proper prefix suffix table. Let me explain the basic idea. We will take each substring of the pattern.

Here these are "a","aa", "aaa", "aaaa". We are skipping "aaaaa" as that is the whole pattern. If we get a match for that, we are done.

For each of this substring, we will check if there are some consecutive characters which are present in the beginning & also at the end.

So why do we need that? Idea is simple. Suppose, we matched "aaaa", then we had a mismatch. We know "aaa" is common in the beginning & at the end.

As we matched last three "a", we know that starting three "a" are already matched. So we can skip them when we start matching again & directly go to 4th character. Let's see below.

Here is the prefix suffix table for the example above. It's easy to understand. All characters are same. So proper prefix suffix length will be substring length - 1.

index   1 2 3 4

value   0 1 2 3

Now let'e match it.


t[1] = p[1], match length 1, increment both t & p.

t[2] = p[2], match length 2, increment both t & p.

t[3] = p[3], match length 3, increment both t & p.

t[4] = p[4], match length 4, increment both t & p.

t[5] ≠ p[5], match failed, we won't move t in that case, prefix[4] = 3 from the table, so we set p to prefix[4] + 1 = 4.

t[5] ≠ p[4], match failed, we won't move t in that case, prefix[3] = 2 from the table, so we set p to prefix[3] + 1 = 3.

t[5] ≠ p[3], match failed, we won't move t in that case, prefix[2] = 1 from the table, so we set p to prefix[2] + 1 = 2.

t[5] ≠ p[2], match failed, we won't move t in that case, prefix[1] = 0 from the table, as we set p to prefix[1] + 1 = 1.

t[5] ≠ p[1], match failed, p is at the first character, so we can't decrement it & increment only t.

t[6] = p[1], match length 1, increment both t & p.

t[7] = p[2], match length 2, increment both t & p.

t[8] ≠ p[3], match failed, we won't move t in that case, prefix[2] = 1 from the table, so we set p to prefix[2] + 1 = 2.

t[8] ≠ p[2], match failed, we won't move t in that case, prefix[1] = 0 from the table, so we set p to prefix[1] + 1 = 1.

t[8] ≠ p[1], match failed, p is at the first character, so we can't decrement it & increment only t.

t[9] = p[1], match length 1, increment both t & p.

t[10] ≠ p[2], match failed, we won't move t in that case, prefix[1] = 0, from the table, so we set p to prefix[1] + 1 = 1.

t[10] ≠ p[1], match failed, p is at the first character, so we can't decrement it & increment only t, but t is at the last position, so matching ends here without finding the pattern.


So as you can see, t is never decremented. So if text length is n, we have a part of the complexity as n.

But p is decremented. If pattern length is m, is worst case time complexity O(mn)?

No. If you look at the above steps carefully, p was decrement 4 times consecutively after it was matched 4 times. Then p was decremented 2 times after it was matched 2 times. At last it was decremented 1 time after it was matched 1 time.

So basically to have n negative steps, you need n positive steps & n positive steps means you are done traversing through the whole text.

So worst time complexity is n + n = 2n which is O(n).


The time complexity of building proper prefix suffix table is O(m). Probably I will write a separate blog post on that. Here I want to focus on the actual search algorithm of KMP.

So total time complexity of KMP algorithm is O(m + n).


In our example above, we took a pattern with all repeatable characters. Now let's take an example where all characters are different.


Text: aaaabaabab

Pattern: abcde

Prefix suffix table is simple. There is no match among the characters in the pattern. So all values are 0.

index 1 2 3 4

value 0 0 0 0

Now let's match it.


t[1] = p[1], match length is 1, increment both t & p.

t[2] ≠ p[2], match failed, we won't increment t in that case, prefix[1] = 0 from the table, so we set p to prefix[1] + 1 = 1.

t[2] = p[1], match length 1, increment both t & p.

t[3] ≠ p[2], match failed, we won't increment t in that case, prefix[1] = 0 from the table, so we set p to prefix[1] + 1 = 1.

t[3] = p[1], match length 1, increment both t & p.

t[4] ≠ p[2], match failed, we won't increment t in that case, prefix[1] = 0 from the table, so we set p to prefix[1] + 1 = 1.

t[4] = p[1], match length 1, increment both t & p.

t[5] = p[2], match length 2, increment both t & p.

t[6] ≠ p[3], match failed, we won't increment t in that case, prefix[2] = 0 from the table, so we set p to prefix[2] + 1 = 1.

t[6] = p[1], match length 1, increment both t & p.

t[7] ≠ p[2], match failed, we won't increment t in that case, prefix[1] = 0 from the table, so we set p to prefix[1] + 1 = 1.

t[7] = p[1], match length 1, increment both t & p.

t[8] = p[2], match length 2, increment both t & p.

t[9] ≠ p[3], match failed, we won't increment t in that case, prefix[2] = 0 from the table, so we set p to prefix[2] + 1 = 1.

t[9] = p[1], match length 1, increment both t & p.

t[10] = p[2], match length 2, increment both t & p, but t is at the last position, so matching ends here without finding the pattern.


If you notice carefully above, you can see the same thing. To decrement p, we need to match pattern characters at least same number of times beforehand.

So search time complexity is 2n which translates into O(n).

If we take prefix suffix table construction time into account, then total time complexity would be O(m+n).


No comments:

Post a comment

How To Solve "Caused by: org.hibernate.HibernateException: Missing table" When Table Is Present In Database

If you are using JPA or Hibernate directly and got that exception while starting your application, there is one obvious reason for that. You...