Tuesday, May 15, 2012

classics revisited

There are always something new to things that you thought you already know, and google codejam just proved it.

round 1B, problem C, you are given exactly 500 positive integers, each no more than 10^12, find two subsets with equal sum.

round 1C, problem C, you are given two lists, list A has a sequence of of toys, and list B has a sequence of of boxes. You try to pack as many boxes with toys. Each toy can only be put into a box of same type. And the two lists have to go in order, each time you either fit a toy into a box, or discard a toy, or discard a box. Constraints: cnt is at most 10^12, type is no more than 100, list A and B has length no more than 100.

The round 1B problem looks like the NP-hard problem subset sum and it seems impossible to deal with 500 items with value as large as 10^12. The well-known solutions either enumerates all 2^n subsets or build a table with dp[n][sum]. Neither of them works here. However, read the constraints again. It says EXACTLY 500 items, why didn't it say at most 500? Why having more items helps?

OK, you know that each item is no more than 10^12, so you do not have many different sums, in fact, no more than 500*10^12=500*10^14. On the other hand you have 2^500 = 10^150 different subsets which is way bigger.

The solution builds on all 6-element subsets. The number of sums is no more than 6*10^12, and the number of sets is 500^6 = 10^16, so you have a good chance to hit two subsets with same sum if you pick two random subsets. And the algorithm simply shuffle the input integers, and keep enumerating 6-element subsets until you have two subsets with equal sum. The time limit is 8min which is more than enough. My naive implementation runs in 4min.

The round 1C problem is about longest common subsequence, or LCS, except that the two lists are a bit long, could be 100*10^12 items, and the usual dp approach just won't work. However you can observe the special arrangement of this problem and tweak the algorithm a bit. The recurrence looks like this:

dp[i][j] = max(dp[i][j-1], dp[i-1][j], dp[i-1][j-1] + 1)
And the last case applies only when type at a[i] matches type at b[j]. In this problem, if type of a[i] and b[j] mismatch, then we do the same since the optimal solution end at a[i],b[j] has to discard either a[i] or b[j]. However, if they do agree on type, then we have to run this particular type from i to some prior position x, and run from j to some prior position y, so we update dp[i][j] by dp[x][y] + min(count of type in a[x+1..i], count of type in b[y+1..j]) and the count can be obtained by maintaining a prefix sum.

And that was it. The large case runs in 20s.

4 comments:

  1. hey Li Yan,
    for the GRID WALKING problem in interviewstreet, could you paste you output for this input...
    1
    4 4
    5 5 5 5
    9 9 9 9
    Mine is 560. Am not sure if am missing any special case in the challenge. I did the DP and calculations with Pascal triangle. Yet am missing something basic here(i passed 1/11 cases). Could you help me out... ?

    ReplyDelete
  2. [lyan@localhost interviewstreet]$ g++ gridwalk.cpp
    [lyan@localhost interviewstreet]$ cat >in
    1
    4 4
    5 5 5 5
    9 9 9 9
    [lyan@localhost interviewstreet]$ ./a.out <in
    4096

    ReplyDelete
  3. thanks a lot Li Yan, I got it now.
    Turned out that, I combinated the steps instead of permuting.

    ReplyDelete