algorithms and random thoughts: December 2011

Saturday, December 31, 2011

interview street challenge

The problems are not easy and require some thought, but not terribly hard as I can solve it on my own with enough time.

One problem is, given two string P[1500],　Q[1500], and integer K<=1500, find max length L such that P has a substr of length L, Q has a substr of length L and the two substr have at most K char differ.

I know Knuth-Morris-Pratt but does not help much here. As I have just read tourist's solution on TC SRM 528, binary search seems work here. If best substr has length L, then there is also a feasible length L-1. So we can binary search for the maximum length L. To check a particular L is valid or not, we can check all substr of P and Q with length L. But this might be too expensive, since there are 1500 substr from P and 1500 substr from Q. So we need to compare 1500*1500 pairs, and L might be as large as 1500. 1500^3 is too much. However we can make some observation.

After we compare P[0..L) to Q[0..L), we can get
P[1..L+1), Q[1..L+1),
P[2..L+2), Q[2..L+2),
each in amortized O(1) time.
So we only need to check 1500+1500 pairs in 1500 steps and this is quite fast. The algorithm then becomes 10^7 roughly.

Another problem is computing the expected sum. This is easy if you know linearity of expectation and use birthday paradox or balls-and-bins to compute each individual expectation and add them up. The interesting part is that I printed ans with .3lf while the problem asks 2 digits after decimal point. My solution failed because of this. And I have to change it to .2lf to make it Accepted.

The testcase is a bit weak as each problem has about 10 cases, and they did only a simple diff.

The nice thing is that you can write your solution in a number of different languages.

update: findstring can be solved with suffix array, although you cannot have vector as suffix as this uses too much memory.

update: kdifference is an easy problem as you only need to check a[i]+K and a[i]-K for each i.

permutation game is another easy one, standard combinatorial game, and since N<=15, you can use a 16-bit mask to keep track of unused numbers. dp with memorization works. Now I have 5 problems solved. The scoring system is a bit strange, people with more problems are ranked below me, maybe I had a few 50pt problems, but they are actually easy.



Rank  Hacker  Score  Submissions  Solved

79  lantimilan  364.15  12  5

update Jan 4: String similarity accepted. Although I haven't complete the proof for pref[] computation.
Proof complete.

let q[j] be the border[] of string s.
let p[i] be the pref[] of string s.

Claim 1: If q[j] hits pos=i, then pos[i] has a prefix of length l=q[j].

Claim 2: If pos=i is hit by any q[j], then the max q[j] is equal to pref[i].
Proof: if some other q[j2] overshoot pos[i] and results in a longer pref[i], then q[j] can be extended longer. The implication is that we can compute all pos hit by some q[j]. These are called essential positions. The next step is to use the essential positions as markers to compute other nonessential positions.

Claim 3: If pos[i] not covered by any marker pos[k], then pref[i]=0.
Proof: any prefix > 0 must be contained in some q[j] and all q[j] are shown in p[k], since any pos[i] covered by some q[j] must be covered by some p[k].

Claim 4: If two p[k1], p[k2] both cover pos[i], then the closer one, let it be k2, is better, that is, give a longer prefix.
Proof: every prefix from k1 is contained in the segment defined by k2.

Claim 5: the computation of p[i] for non-marker i are correct.
Proof: From induction we can assume previous entries are correct, so we are done with pref[i-k] term. For pref[k]-(i-k) term, if pref[i] goes beyond the end of pref[k], then it either end up at some q[j], which is impossible because i is non-marker, or it went into another marker k' and pref[k'] gave a longer prefix. But we already argued that if this is the case, k' must be closer to i than k, this contradicts with the choice of k being the closest marker to i from the left.

For marker it is easy: pref[i] = max q[j] such that q[j] hits pos[i], j-i+1 = q[j]

For non-markers: The formula is pref[i] = min(pref[i-k], pref[k]-(i-k))
Explanation: p[k] needs to cover pos[i], that is, k + p[k] -1 >=i, the remain length for pos[i] is pref[k]-(i-k), however, we need to observe pref[i-k] since not all segment may be used, otherwise pref[i-k] could be longer.

Friday, December 30, 2011

TC SRM 528

p250
you get length=10 for free and the only extra you get is by cutting a 20 into 10 and 10, with 1 cut you get 2 items. So sort the sticks by length and first process length with multiples of 10, then process others. I made a mistake forgot to check avail cuts >=0 and got challenged.

p500
Given a string with 'o' and 'x', find number of ways that splits into two identical substrings. string length =40.

The algorithm works by noticing that substr has length =20. So you can guess all 2^20 substr. For each of the guess, compute dp[i][j] as number of ways to match first[0..i) and second[0..j) to s[0..i+j). dp[n/2][n/2] is the answer for this guess. And ans is the sum of all guesses. Need to prune the guess with wrong number of 'o' and 'x', otherwise will TLE.

p1000
Given a[] of n=50 integers, each a[i]<=2000. P1 and P2 are two integers 50<= P1,P2 <=2000.
Find number of ways to get as many (P1,P2) pairs from a[].

Since the feasible solution is monotone. If we can make k pairs, we can also make k-1 pairs.
So we can binary search ans. low=0, high=2000*50/(50+50)=1000.

dp[n][p] is max #P2 if we use a[0..n-1] and we have #P1=p.

When update dp[i][j], we iterate all possible usage of a[i].
a[i] can offer #P1 from k=0 to a[i]/P1. the corresponding P2 offer is from 0 to (a[i]-k*P1)/P2

set all dp[i][j] to -1
dp[0][0] is 0 since we got 0 P2 if we do not use any of a[] and our P1 is 0
let i=0 to n
let j=0 to mid // if dp[i][j]>=0
let k=0 to a[i]/P1 // a[i] offer P1 as k
dp[i][j+k] updated by dp[i][j] + min(avail[i][j], mid-k) // min is due to a[i] cannot offer more than maxP1 - its own offer of P1

a valid answer is dp[n][mid] >= mid
and take the best over all mid we got the answer.

Monday, December 26, 2011

CF #99, prob 138

problem B: given an integer n, find two permutations of the two copies of n such that the sum will end with maximum number of zeros. n has at most 10^5 digits.

I read the problem wrong and was thinking the sum needs to have maximum number of zeros which could be in any position. Then the problem becomes much harder as we now have to deal with sum of tens and sum<=8 to space out the tens, in addition to the trailing block of 0s, 10, and 9s.

Spent two days on it and couldn't find a way to implement the idea. Only after reading the solution did I realize the maximum is on trailing zeros.

problem C
there are n<=10^5 trees and m=10^4 mushrooms, all in one line. each tree has position a[i], height h[i], prob fall left l[i] and probability fall right r[i], both given as an integer between 1 and 100. Each mushroom has a position b[i] and a power p[i]. Each tree may fall left or right with the given probability, a mushroom would survive if no tree hits it. A tree will hit a[i]-h[i] to a[i]-1 with probability l[i], and hit a[i]+1 to a[i]+h[i] with probability r[i]. The problem asks for the total survival mushroom power.

After some thought it seems we need to sort trees and mushrooms. Each tree can be seen as two intervals. And we need to keep track of current probability of killing given what trees are covering this position. This can be implemented as a queue. Each new start will contribute to the current probability by multiplying (1-l[i]) or (1-r[i]). Each exit will remove that contribution. Each mushroom can then be calculate using current probability. However, there is a catch. When there are many trees, the accumulated probability can be very small and then never come back due to underflow. The rescue is from the probability being represented as integral percentage. So we can keep an array of [1..100] for counts of percentages. then we merely incr/decr counts as new start or exit occurs.

Wednesday, December 21, 2011

CF down again?

received 504 gateway error.

Just found that it is back.

Finished prob 136 A-E.

next_permutation() tries to generate lexico next permutation from current. It returns FALSE if cannot find a lexico greater perm. So if you start with [1 3 2]. The perm [1 2 3] never gets generated. To be sure, first sort your vector to be lexico minimum.

problem E is about a game and you have to find an explicit strategy.

Sunday, December 18, 2011

CF #98, prob 137

I had 4 problems AC in this div2 contest and rating got up a little bit. Ranked 89 overall.

A, B are trivial problems and didn't take much thought.

C is asking how many intervals are included inside another interval. All (left,right) are distinct. It is all too natural to sort them by left, then from increasing left, use current interval to include as many follow up intervals as possible. Then move to next not included interval. The observation is that, if [a2,b2] is included in [a1,b1], then any [ai,bi] included in [a2,b2] is also included in [a1,b1] so we are not missing anything by throwing away [a2,b2]. On the other hand, any [ai,bi] not included in [a1,b1] cannot be included in any of [a2,b2].

D is a DP problem. Given a string of length 500, find a way to break it into at most k<=500 palindromes. Let dp(npal,pos) be the cost of the solution to break string [0,pos) into at most npal palindromes. The answer is then dp(k,n). The recurrence is dp(npal,pos) = min of dp(npal-1, p) + cost to make str[p,pos) a palindrome where p range from 0 to pos-1. Base case is true when pos=0 and npal>=0.

E It does not look difficult but couldn't figure out an algorithm fast enough. the problem is that given a string of length <= 10^5, find all maximum length substr with nvowel <= 2*nconsonant. It looks apparent that you need O(n) or O(nlogn) algorithm, the question is how?

After two days I finally got a solution that works. I don't know why I am so determined to solve it and I have much confidence that I will get it, although several attempts failed.

Let vow[n], con[n] be the count of vow and con from 0 to pos.
The first observation is that vow[n] and con[n] change at most 1 as we move along the string. So we can compute vow[n] and con[n] in one scan of the string.

Now consider the substr with maximum length. If it does not reach pos=0 or pos=n-1, then it is bounded inside the string. Therefore, it must be the case that nvow=2*ncon in this substring, otherwise it can get longer by expand left or right. Let the start be i and end+1 be j. we know that vow[j]-vow[i]=2*(con[j]-con[i]), rearrange it gives vow[i]-2*con[i]=vow[j]-2*con[j]. So we can build an array diff[i]=pair(vow[i]-2*con[i], i) and sort diff[]. Those with same diff[i] value will be consecutive after sorting and for elements with same diff[i], we seek the index of first and last, and currlen=last-first is a candidate for maxlen.

After that we check the maxlen if we begin from pos=0, and maxlen if we begin from pos=n-1.

After we have maxlen known, we can check all substr of length maxlen to see if they qualify as a solution, that is nvow<=2*ncon. This check can be done in O(n) time.

Overall the algorithm is O(n) time except sorting the diff[] array.

With even a simple implementation, my solution runs within 60ms in CF machine.

Approaches that does not quite work:
1. binary search maxlen. Unfortunately there is no monotone property as a valid substr of length k does not include a valid substr of length k-1.
2. start from a pos with a consonant and build a substring. The problem is that you could hit a block of vows and then a block of cons. So each pos might require O(n), which amounts to O(n^2).

Saturday, December 17, 2011

TC SRM 527

p275

It is p275 so would be a bit harder than the normal p250. The problem is:
Find a tree with n nodes that achieves maximum score. The score of a tree is the sum of score of all nodes, and the score of a node depends on its degree. You are given an array scores[0..n-1] with each element between 0 and 10000, a node with degree d would have score=scores[d-1].

Thought for quite some time but no good idea. For a tree you can try recursively remove one leaf and work on the rest. Another line of attack is to notice the total deg of a tree is 2(n-1). Enumerate a node with deg=1 to n-1 and recurse on the (n-1,d-deg). However, there is no guarantee that there is a node with deg.

Archon.JK submitted this problem within 5min!

I think I understand the problem now, after reading Archon.JK's solution.

his code looks like this
dp[1][0] = 0, u[] = [0, score[]]
for i=1 to n
for j=1 to i-1
for p=1 to n-1
for q=0 to n-2
dp[i][p] = max(dp[i][p], dp[j][p-1]+dp[i-j][q]-u[p-1]-u[q]+u[p]+u[q+1])

and the answer is max of dp[n][0] to dp[n][n-1]

dp[i][j] means best score for a tree with i nodes and one of them has deg=j
The idea is to split a tree with i nodes and a node with deg=p into two subtrees, one with j nodes and the other with i-j nodes, split at deg=p, so one subtree now has a node with deg=p-1 and the other one has deg=q. After merge they become deg=p and deg=q+1.

One possible improvement is to split a tree at a leaf, at node with deg=d. The code looks like this

dp[1][0]=0;
for i=2 to n
for d=1 to i-1
dp[i][d] = max( , dp[i-1][d-1]+dp[1][0]-u[d-1]-u[0]+u[d]+u[1])
we also need to update dp[i][1] since we now have a deg=1 node
dp[i][1] = max( , dp[i][d])

Now the answer is dp[n][1]

// As I was writing this one, I realized how difficult it is to write a solution. That explains why so few people would bother to write solution or summary and even for those who do, it gets delayed or incomplete

p450
Given the rows of a matrix, in order from 0 to m-1. Each row is a string of "0,1,?"
the columns are given as well, in random order. The problem is to find a matrix that agrees with both given rows and columns, and among all feasible ones, find the lexico smallest matrix. No more than 30 rows and 30 columns.

No good idea. As usual I have no idea on level 2 problem.

OK, lyrically's solution makes sense now.
Since we want lexico minimum solution, for each pos, from row 0 to m-1, col 0 to n-1, if cell is '?', we try '0', if the resulted subproblem has a feasible solution, then we fix this '?' to '0', otherwise we put a '1' here.

The subproblem is a bipartite matching problem, cardinality only, without weights. We can construct col[0] to col[n-1] from the rows. If there is a perfect match to the permuted columns, then we have a feasible solution.

Actually, for TC problems, many times if it looks like you need to try all permutations, there is a good chance that you can solve it with bipartite matching.

TODO: both topcoder.com and arena are down currently, need to verify solution.

p250 and p500 PASSED.
Two bugs in implementation of augment, one is that left is 0..n-1 and right is n..2n-1. and when augment, if from left to right, try all neighbors, if right to left, need to follow the matched vertex on the left.

Currently use BFS. The code looks rather ad hoc.

Wednesday, December 14, 2011

CF #96 div2 problem 133

prob133 D

Not a difficult problem since there are only 2500*4*2=10^4 configurations and just iterate them all, but made several mistakes.

1. Didn't realize when CP flips, the current pixel/cell also changes.
2. wrote something like pair.second.first=xxx; and then pair.second.first=xxx again, and I meant to say pair.second.second.

prob133 E

The problem gives a string with T or F, T means turn around and F means forward one step. A turtle moves in a 1D line and you must change exactly 'change' symbols in the command string, with the goal to maximize the distance of the final position.

After checking greedy and DP, it seems you can do DFS to check all valid configurations, (pos, index, change, dir). And there are only 10^6 confs. One catch is that change-2*i can be used.

Now one question:

How to suppress the gcc warning to gets function should not be used. Seems it comes from the linker.

Tuesday, December 6, 2011

TC SRM 526

p250 is an easy one and the only one that I submitted. However a mistake in
ducks[c]=1 instead of the correct ducks[r] = 1 cost all points. Now back to 1254, close to div2 now.

p500 is a combinatorial game and seems not difficult, however I cannot get the last sample case to pass. The problem is about a game. Two players, A and B, remove pebbles from a pile with n pebbles initially. A moves first. A can remove between 1 to K pebbles from the pile, but the remaining number must be a prime. B can remove between 1 to K pebbles, but the remaining number must be a composite (>=2 and not prime). Now decide the value of the game. If a player wins, it tries to win with minimum steps, if lose, it tries to maximize steps. The natural way is to calc(N,turn) to check whether the current turn wins. However each move needs to check K possible numbers and there are 2N configurations, for a total of 10^6^2 = 10^12, which will certainly TLE. But there has to be something about primes, can you check less configurations or check each configuration faster?

algorithms and random thoughts