KMP string matching algorithm

KMP 算法本质

This is a very good KMP algorithm article if you can read Chinese: KMP algorithm.  But it still doesn't clear state what known information is used to reduce comparison. I figured this out by myself.  The "known information" is the longest unique substring.



Last Step










The naive  solution what it fails to match ' ' with 'D', it will reset j = 0, and i = 5 to re-match the whole string from beginning. Therefore the "known information" S[5] == P[1] == B, S[6] == P[2] == C,
S[7] == P[3] == D, and S[8] == P[4] == A is discarded.  The KMP algorithm is utilizing this info to reduce comparison. Since we know P[0] != P[1] != P[2] != P[3] and P[0] == P[4] == A,  there is no necessary to compare P[0] with S[5], S[6], S[7], S[8]. Why? Because we know S[5] == P[1] == B, and P[0] != P[1], therefore, P[0] != S[5], similarly, P[0] cannot be S[6], S[7] either. Since P[0] == P[4] == A, we cannot skip S[8], that's why we start with S[8] as the next comparison point. The next array used to calculate the S[8] position.














Comments

Popular posts from this blog

zhitongguigu

Array Part2

Two pointers (Part 1)