Saturday, December 31, 2011

interview street challenge

The problems are not easy and require some thought, but not terribly hard as I can solve it on my own with enough time.

One problem is, given two string P[1500], Q[1500], and integer K<=1500, find max length L such that P has a substr of length L, Q has a substr of length L and the two substr have at most K char differ.

I know Knuth-Morris-Pratt but does not help much here. As I have just read tourist's solution on TC SRM 528, binary search seems work here. If best substr has length L, then there is also a feasible length L-1. So we can binary search for the maximum length L. To check a particular L is valid or not, we can check all substr of P and Q with length L. But this might be too expensive, since there are 1500 substr from P and 1500 substr from Q. So we need to compare 1500*1500 pairs, and L might be as large as 1500. 1500^3 is too much. However we can make some observation.

After we compare P[0..L) to Q[0..L), we can get
P[1..L+1), Q[1..L+1),
P[2..L+2), Q[2..L+2),
each in amortized O(1) time.
So we only need to check 1500+1500 pairs in 1500 steps and this is quite fast. The algorithm then becomes 10^7 roughly.

Another problem is computing the expected sum. This is easy if you know linearity of expectation and use birthday paradox or balls-and-bins to compute each individual expectation and add them up. The interesting part is that I printed ans with .3lf while the problem asks 2 digits after decimal point. My solution failed because of this. And I have to change it to .2lf to make it Accepted.

The testcase is a bit weak as each problem has about 10 cases, and they did only a simple diff.

The nice thing is that you can write your solution in a number of different languages.

update: findstring can be solved with suffix array, although you cannot have vector as suffix as this uses too much memory.

update: kdifference is an easy problem as you only need to check a[i]+K and a[i]-K for each i.

permutation game is another easy one, standard combinatorial game, and since N<=15, you can use a 16-bit mask to keep track of unused numbers. dp with memorization works. Now I have 5 problems solved. The scoring system is a bit strange, people with more problems are ranked below me, maybe I had a few 50pt problems, but they are actually easy.

Rank Hacker Score Submissions Solved
79 lantimilan 364.15 12 5

update Jan 4: String similarity accepted. Although I haven't complete the proof for pref[] computation.
Proof complete.

let q[j] be the border[] of string s.
let p[i] be the pref[] of string s.

Claim 1: If q[j] hits pos=i, then pos[i] has a prefix of length l=q[j].

Claim 2: If pos=i is hit by any q[j], then the max q[j] is equal to pref[i].
Proof: if some other q[j2] overshoot pos[i] and results in a longer pref[i], then q[j] can be extended longer. The implication is that we can compute all pos hit by some q[j]. These are called essential positions. The next step is to use the essential positions as markers to compute other nonessential positions.

Claim 3: If pos[i] not covered by any marker pos[k], then pref[i]=0.
Proof: any prefix > 0 must be contained in some q[j] and all q[j] are shown in p[k], since any pos[i] covered by some q[j] must be covered by some p[k].

Claim 4: If two p[k1], p[k2] both cover pos[i], then the closer one, let it be k2, is better, that is, give a longer prefix.
Proof: every prefix from k1 is contained in the segment defined by k2.

Claim 5: the computation of p[i] for non-marker i are correct.
Proof: From induction we can assume previous entries are correct, so we are done with pref[i-k] term. For pref[k]-(i-k) term, if pref[i] goes beyond the end of pref[k], then it either end up at some q[j], which is impossible because i is non-marker, or it went into another marker k' and pref[k'] gave a longer prefix. But we already argued that if this is the case, k' must be closer to i than k, this contradicts with the choice of k being the closest marker to i from the left.

For marker it is easy: pref[i] = max q[j] such that q[j] hits pos[i], j-i+1 = q[j]

For non-markers: The formula is pref[i] = min(pref[i-k], pref[k]-(i-k))
Explanation: p[k] needs to cover pos[i], that is, k + p[k] -1 >=i, the remain length for pos[i] is pref[k]-(i-k), however, we need to observe pref[i-k] since not all segment may be used, otherwise pref[i-k] could be longer.

4 comments:

  1. Hi, I don't understand your solution for the first problem (substring diff), It would be great if you elaborate a bit more...

    ReplyDelete
  2. Hi,i didn't understand the solution of your (substringdiff) problem.It would be grate if you please elaborate.
    I think we have to compare p[0...L] with all q[0...L],q[1...L+1],q[2...L+2],......q[n-l....n];
    here n=length of the string 'q'.

    ReplyDelete
  3. hi,i think we have to compare each p[0..L],p[1..L+1],..p[n-L..L] with q[0..L],q[1..L+1],..q[n-L..L] but you are doing just p[0..L] with q[0..L] only .I think its not sufficient.

    ReplyDelete
  4. Substring Diff这题复杂度应该是O(n^2logn)吧

    ReplyDelete