Sunday, May 20, 2012

TC SRM 541

Another bad performance: I failed the 250 and got a score of 0.

One thing to learn:

When you need 1<<shift with a large shift count, the expression overflows silently because the literal 1 is an int. You can write 1LL<<shift, but it is easy to forget to put LL after the 1. A safer way:

long long val = 1;               // 64-bit from the start, so the shift cannot overflow
while (val < cap) val <<= 1;     // smallest power of two >= cap

Friday, May 18, 2012

codechef may12, problem MEDIAN

After 10 submissions, with multiple WA and TLE, I finally got problem MEDIAN in the codechef may12 challenge accepted.

The problem is: given a sequence of 0s and 1s with at least one 1 and n<=30, find the minimum number of steps to make them all 1. In each step you can choose a subarray with #1 >= #0 and flip all its 0s to 1.

OK, n<=30 means you can represent every state as a bitmask that fits into an int. Then you run a BFS from the initial configuration and look for the all-ones state (1<<n)-1. The hard part, at least to me, is to implement each transition efficiently. A naive approach is to try every subarray; there are O(n^2) of them, and adding them all to the queue will TLE for sure, since it generates a lot of useless states. An improvement comes from observing that if a feasible interval (one with #1 >= #0) is contained in another feasible interval, there is no point in trying the subinterval. This leaves O(n) next states instead of O(n^2). However, if you insist on finding the feasible interval starting at each pos=0..n-1, it still takes O(n) time per position, so O(n^2) time to find them all. Maybe there is a standard way to do it in linear time; I do not know one, but my own O(n) approach is below.
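To make this concrete, here is a minimal sketch of the BFS (the function name solveMedian and the exact structure are mine); for simplicity it finds, for each left end p1, the rightmost feasible right end with a plain O(n^2) scan per state rather than the O(n) sweep described next.

#include <map>
#include <queue>
using namespace std;

// conf: bit i set means a[i] == 1; returns the minimum number of steps
int solveMedian(int n, int conf) {
    int target = (1 << n) - 1;
    map<int,int> dist;                 // state -> number of steps so far
    queue<int> q;
    dist[conf] = 0; q.push(conf);
    while (!q.empty()) {
        int cur = q.front(); q.pop();
        if (cur == target) return dist[cur];
        for (int p1 = 0; p1 < n; ++p1) {
            int ones = 0, zeros = 0, best = -1;
            for (int p2 = p1; p2 < n; ++p2) {      // extend to the right
                if (cur >> p2 & 1) ++ones; else ++zeros;
                if (ones >= zeros) best = p2;      // rightmost feasible end for this p1
            }
            if (best < 0) continue;
            // flip all 0s in [p1..best] to 1
            int next = cur | (((1 << (best - p1 + 1)) - 1) << p1);
            if (!dist.count(next)) {
                dist[next] = dist[cur] + 1;
                q.push(next);
            }
        }
    }
    return -1;                          // unreachable when at least one 1 exists
}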

My idea is based on this observation: start at p1=0, then look for the rightmost p2 such that diff = #1 - #0 is minimized. Once we have found p2, we check diff. If diff < 0, we are done: add the next state to the queue and update its distance. If diff == 0, we can try p2+1 in the next iteration to look for a new p2. Otherwise diff > 0, and we move p1 to the right, stopping when diff = 0. If we end up with diff > 0, there is no feasible interval within [p1..n-1], and we stop. Otherwise we have diff = 0, so we record this interval, and in the next iteration we start with p2+1. Clearly this is O(n) time and produces at most O(n) next states.

Now we prove that diff > 0 implies no feasible interval exists within [p1..n-1]. Since diff > 0, either all of [p1..p2] are 0, or position p2 must be a 1; otherwise we could take p2-1 and get a smaller diff. But if p2 is a 1, then moving p1 up to p2 must yield a feasible interval, because p2 by itself is a feasible interval. Therefore all of [p1..p2] are 0. However, for a feasible interval to exist within [p1..n-1], there must be some 1 in that range, and including that position in [p1..p2] would give a smaller diff, which is a contradiction. So we get a somewhat surprising result: if we fail to find a feasible interval within [p1..n-1], then all of [p1..n-1] must be 0.


Implementation notes:
1. I forgot to clear the map between testcases.
2. I was using a[i] directly instead of the a[i] value decoded from the bitmask conf.
3. To run all cases, the standard way is:
int ncase; cin >> ncase;
while(ncase--)
   solve();

User        Mem   Time  Lang
bb_1        1.6M  0.00  C
adge1       2.8M  0.01  C++ 4.3.2
lantimilan  2.8M  0.02  C++ 4.3.2

Tuesday, May 15, 2012

classics revisited

There is always something new to learn about things you thought you already knew, and Google Code Jam just proved it.

Round 1B, problem C: you are given exactly 500 positive integers, each no more than 10^12; find two subsets with equal sum.

Round 1C, problem C: you are given two lists, where list A is a sequence of toys and list B is a sequence of boxes, each entry being a type with a count. You try to pack as many toys into boxes as possible. A toy can only be put into a box of the same type, and the two lists have to be consumed in order: each time you either fit a toy into a box, discard a toy, or discard a box. Constraints: each count is at most 10^12, there are at most 100 types, and lists A and B each have length at most 100.

The round 1B problem looks like the NP-hard subset-sum problem, and it seems impossible to deal with 500 items with values as large as 10^12. The well-known solutions either enumerate all 2^n subsets or build a table dp[n][sum]; neither works here. However, read the constraints again: it says EXACTLY 500 items. Why didn't it say at most 500? Why does having more items help?

OK, each item is no more than 10^12, so there are not that many different sums; in fact, no more than 500*10^12 = 5*10^14 of them. On the other hand, you have 2^500, about 10^150, different subsets, which is way bigger.

The solution builds on 6-element subsets. The number of possible sums is no more than 6*10^12, while the number of 6-element subsets is C(500,6), about 2*10^13, so two of them must share a sum, and by the birthday paradox you expect to hit such a pair after only a few million random picks. The algorithm simply shuffles the input integers and keeps enumerating 6-element subsets until two of them have an equal sum. The time limit is 8 min, which is more than enough; my naive implementation runs in 4 min.
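Here is a minimal sketch of that search (names are mine, and for simplicity it samples random 6-element subsets instead of enumerating them after a shuffle); the two returned index sets may overlap, in which case the shared elements can be dropped from both sides.

#include <algorithm>
#include <cstdlib>
#include <map>
#include <utility>
#include <vector>
using namespace std;

// returns the indices of two 6-element subsets of a[] with equal sum
pair<vector<int>, vector<int> > findEqualSums(const vector<long long>& a) {
    int n = a.size();
    map<long long, vector<int> > seen;         // sum -> one 6-subset with that sum
    while (true) {
        vector<int> idx;
        while ((int)idx.size() < 6) {          // pick 6 distinct random indices
            int i = rand() % n;
            if (find(idx.begin(), idx.end(), i) == idx.end()) idx.push_back(i);
        }
        sort(idx.begin(), idx.end());
        long long sum = 0;
        for (int i : idx) sum += a[i];
        auto it = seen.find(sum);
        if (it != seen.end() && it->second != idx) return make_pair(it->second, idx);
        seen[sum] = idx;
    }
}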

The round 1C problem is about the longest common subsequence (LCS), except that the two lists are a bit long, up to 100*10^12 items once the counts are expanded, so the usual dp approach just won't work. However, you can exploit the special structure of this problem and tweak the algorithm a bit. The standard recurrence looks like this:

dp[i][j] = max(dp[i][j-1], dp[i-1][j], dp[i-1][j-1] + 1)
The last case applies only when the type at a[i] matches the type at b[j]. In this problem, if the types of a[i] and b[j] mismatch, we do the same, since an optimal solution ending at a[i], b[j] has to discard either a[i] or b[j]. If they do agree on type, then this type must run from some prior position x up to i in A, and from some prior position y up to j in B, so we also update dp[i][j] with dp[x][y] + min(count of this type in a[x+1..i], count of this type in b[y+1..j]); the counts can be obtained from prefix sums.
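A hedged sketch of that dp (the run-length-encoded inputs typeA/cntA and typeB/cntB, and the plain O(n^2 m^2) loops, are my own simplification; prefix sums would speed up the inner counting):

#include <algorithm>
#include <vector>
using namespace std;

// typeA[i], cntA[i] describe the i-th entry of list A (same for B)
long long maxPacked(const vector<int>& typeA, const vector<long long>& cntA,
                    const vector<int>& typeB, const vector<long long>& cntB) {
    int n = typeA.size(), m = typeB.size();
    // dp[i][j] = best number of packed toys using a[1..i] and b[1..j]
    vector<vector<long long> > dp(n + 1, vector<long long>(m + 1, 0));
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= m; ++j) {
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);      // discard a[i] or b[j]
            if (typeA[i - 1] != typeB[j - 1]) continue;
            int t = typeA[i - 1];
            long long sumA = 0;
            for (int x = i; x >= 1; --x) {                   // run of type t ending at i in A
                if (typeA[x - 1] == t) sumA += cntA[x - 1];
                long long sumB = 0;
                for (int y = j; y >= 1; --y) {               // run of type t ending at j in B
                    if (typeB[y - 1] == t) sumB += cntB[y - 1];
                    dp[i][j] = max(dp[i][j], dp[x - 1][y - 1] + min(sumA, sumB));
                }
            }
        }
    return dp[n][m];
}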

And that was it. The large case runs in 20s.

Saturday, May 12, 2012

TCO 2B

Much easier than 2A, although I ended up with -25.

p250: just keep the known ones in a set and do the processing.

You can use the max among the known,
or use some prefix of the unknown and maxKnown for the rest,
or use several copies of all the unknown plus a prefix + maxKnown.

My bug? I wrote
for(int k=1; k<=N && r>0; ++k, r-=k, ans++)
but in r-=k I need the value of k before the ++.
A hard lesson that cost me my p250.
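One way to fix it (a sketch, assuming the intent was to subtract the old k each round): the comma operator evaluates left to right, so do r-=k before ++k.

for(int k=1; k<=N && r>0; r-=k, ++k, ans++)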

p500: the game is completely determined by move[0]. After deciding the partition of the move[0] numbers, the rest is just a dp to pick the best subset. The implementation is nontrivial, and I would have needed maybe 15 more minutes to complete it.

Saturday, May 5, 2012

CodeSprint TwitchTV

This is an Interview Street codesprint challenge and I started really late, sometime around 3pm, while the contest ran from 12 to 4pm. Luckily I got the second one fairly quickly, and then realized that the first one is bipartite matching. The third one is Floyd-Warshall with a greedy step. I don't know about the 4th one. It surprised me a bit that my bipartite matching code passed on the first submission, after checking only the trivial test case.

Travel: the only problem I did not have a solution for.
UPDATE: Apparently it is Bellman-Ford (I don't know why I keep calling it Boyer-Moore, which is for string matching), the dynamic programming shortest-path algorithm. The only tricky point is that you need to pay a toll, so let node[i] be the gold you have at i; then the update step from i to j is
node[j] = max(node[j], (node[i]-edge[i][j]+s[j]) /2)
The problem statement is not very clear on this. Here node[i] is the maximum gold you can have at node i. Also note that you cannot accumulate unbounded gold at any node: the maximum among all node[i], say node[k], will not improve once it has passed its own s[k], so the updating process cannot continue indefinitely.
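A sketch of the fixed-point relaxation under that rule (my assumptions: edge[i][j] is the toll on edge i->j, or -1 if the edge is absent; s[j] is the gold offered at j; node[] starts with the initial gold at the start node and -1 everywhere else; you must be able to afford the toll):

#include <vector>
using namespace std;

void relax(vector<long long>& node, const vector<vector<long long> >& edge,
           const vector<long long>& s) {
    int n = node.size();
    bool improved = true;
    while (improved) {                            // Bellman-Ford style sweeps to a fixed point
        improved = false;
        for (int i = 0; i < n; ++i) {
            if (node[i] < 0) continue;            // node i not reached yet
            for (int j = 0; j < n; ++j) {
                if (edge[i][j] < 0 || node[i] < edge[i][j]) continue;
                long long cand = (node[i] - edge[i][j] + s[j]) / 2;
                if (cand > node[j]) { node[j] = cand; improved = true; }
            }
        }
    }
}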

Arrangement: the problem is about finding a Hamiltonian path, and with N=1000 there is no hope of solving it exactly. That is why the problem says it is an approximation problem. I didn't have time to implement a backtracking solution, but it would be interesting to try some heuristics. Given 3 more hours, I could have done better.

Anyway, I am happy with my score, and interestingly enough, this particular company seems to care a lot about graph algorithms.

Update: I finished a naive DFS backtracking solution to the last problem, Arrangement, which asks for a Hamiltonian path in a given graph. Surprisingly, it got Accepted.
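A minimal sketch of such a naive backtracking search (the adjacency-matrix representation and the names are mine):

#include <vector>
using namespace std;

bool dfs(int u, int depth, const vector<vector<bool> >& adj,
         vector<bool>& used, vector<int>& path) {
    path.push_back(u);
    used[u] = true;
    if (depth == (int)adj.size()) return true;     // every node is on the path
    for (int v = 0; v < (int)adj.size(); ++v)
        if (adj[u][v] && !used[v] && dfs(v, depth + 1, adj, used, path)) return true;
    used[u] = false;                               // backtrack
    path.pop_back();
    return false;
}

vector<int> hamiltonianPath(const vector<vector<bool> >& adj) {
    int n = adj.size();
    vector<bool> used(n, false);
    vector<int> path;
    for (int start = 0; start < n; ++start) {      // try every start node
        if (dfs(start, 1, adj, used, path)) return path;
        path.clear();
    }
    return vector<int>();                          // no Hamiltonian path found
}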

The score board and my rank:

Rank  Hacker      Country  Score   Submissions  Solved
31    lantimilan  USA      300.00  4            3

Name            Points  Statistics
Super Bomb      100     40/580
Matching        100     43/630
Centre of Mass  100     47/297
Travel          100     29/63
Arrangement     100     14/271


Codeforces Round #118 (Div. 2)

This one had some easy problems; I got lucky in googling problem D, and no one solved E. It was my highest rank in Div 2, and I got a huge boost in rating.

Candidate Master lantimilan

Contest rating: 1735


Standings row: rank 10, lantimilan, United States (USA), 4194 points: 492 (00:04), 924 (00:19), 1290 (00:35), 1488 (01:04)

Friday, May 4, 2012

AM-GM inequality

In today's codeforces match, there is a problem that asks for x, y, z such that
x + y + z = S for a given S,
maximizing x^a * y^b * z^c.

peter50216 pointed out that the AM-GM inequality applies to this problem.

For a, b, c > 0, apply weighted AM-GM (think of a copies of x/a, b copies of y/b, and c copies of z/c):
S / (a+b+c) = (a*(x/a) + b*(y/b) + c*(z/c)) / (a+b+c) >= ((x/a)^a * (y/b)^b * (z/c)^c)^(1/(a+b+c))
and equality holds when x/a = y/b = z/c = (x+y+z)/(a+b+c) = S/(a+b+c), so the maximum of x^a * y^b * z^c is attained at x = a*S/(a+b+c), y = b*S/(a+b+c), z = c*S/(a+b+c).
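As a quick sanity check (my own numbers): with S = 6, a = 1, b = 2, c = 3, the formula gives x = 1, y = 2, z = 3, and x^a * y^b * z^c = 1 * 4 * 27 = 108, while the even split x = y = z = 2 only gives 2 * 4 * 8 = 64.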


Thursday, May 3, 2012

bash if

The if construct in bash is a convenient tool if you know how to use it.

Conditions go inside [ ], which is equivalent to the test command.

== and related operators compare strings.

-gt, -ne, and friends do numeric comparisons.

-a and -o are logical and/or.

if [ -f /var/log/message ]; then echo "message exists"; fi

[ "$(whoami)" != 'root' ] && (echo you are using non-privileged account; exit 1)

test "$(whoami)" != 'root'  && (echo you are using non-privileged account; exit 1)

Wednesday, May 2, 2012

A silly RSA implementation

A while ago I had a homework assignment to implement RSA, where each letter is raised to the e-th power modulo N = pq, with e the public key. Decryption raises each integer in the ciphertext C to the d-th power, where d is the secret key. The problem? There are only 256 ASCII characters, and if each char is mapped to a distinct integer, then you do NOT need e or d or any key to decrypt: you only need the mapping from char to integer, which anyone can rebuild by encrypting all 256 characters with the public key.
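A minimal sketch of that attack (the toy numbers and names are mine; it assumes N fits in 32 bits so the products below cannot overflow 64-bit arithmetic):

#include <cstdio>
#include <map>
using namespace std;
typedef unsigned long long ull;

ull powmod(ull b, ull e, ull m) {        // fast modular exponentiation, m < 2^32
    ull r = 1 % m;
    b %= m;
    while (e) {
        if (e & 1) r = r * b % m;
        b = b * b % m;
        e >>= 1;
    }
    return r;
}

int main() {
    ull e = 17, N = 3233;                // toy public key, N = 61 * 53
    map<ull,int> codebook;               // ciphertext -> plaintext byte
    for (int c = 0; c < 256; ++c) codebook[powmod(c, e, N)] = c;
    ull cipher = powmod('A', e, N);      // someone encrypts 'A' with the public key
    printf("recovered: %c\n", codebook[cipher]);   // prints 'A', no secret key needed
    return 0;
}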

One remedy is to group adjacent chars into blocks: if you group letters in blocks of 10, the mapping would need 256^10 = 2^80 entries, which is much harder to tabulate.

Strongly Connected Components

This two-pass DFS algorithm is Kosaraju's (I keep misattributing it to Tarjan, whose algorithm finds SCCs in a single DFS pass using lowlinks), and I keep forgetting how it works. Now that I have had to look it up again, here are some intuitions.

SCC for directed graph.

Let us begin with a DFS tree. You know that if you can reach an ancestor of yours, then both of you are in the same component. To check whether some node y is an ancestor of x: y is an ancestor exactly when dfsbegin[y] < dfsbegin[x] and dfsfinish[x] < dfsfinish[y], that is, x starts later and finishes earlier, so whoever has an interval that covers your interval is your ancestor.

Now consider a node y: we know that y can reach every x that is its descendant. If, starting from x, there is also a path back to y, then we can assign x to y's component. However, we do not want to start a search from every x, as that is too slow. Here comes the reverse-edge trick. A path from x to y in the original graph is a path from y to x if we reverse every edge. After reversing, every tree edge and forward edge becomes a back edge, and every previous back edge becomes a tree or forward edge, but that does not bother us, since those are not the troublesome ones when finding components. The annoying edges are the cross edges, which must go from right to left in the original graph; after reversing, they go from left to right. So if we finish the components on the right before working on the left, then whenever we hit a cross edge we find a node already assigned to a different component. One way to achieve this is to process nodes in decreasing order of dfsfinish[x].
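A minimal sketch of this two-pass scheme (Kosaraju style; the names are mine): pass 1 records nodes in order of dfsfinish on the original graph, pass 2 runs DFS on the reversed graph in decreasing finish order.

#include <vector>
using namespace std;

void dfs1(int u, const vector<vector<int> >& g, vector<bool>& seen, vector<int>& order) {
    seen[u] = true;
    for (int v : g[u]) if (!seen[v]) dfs1(v, g, seen, order);
    order.push_back(u);                            // recorded at finish time
}

void dfs2(int u, const vector<vector<int> >& rg, vector<int>& comp, int c) {
    comp[u] = c;
    for (int v : rg[u]) if (comp[v] == -1) dfs2(v, rg, comp, c);
}

vector<int> scc(const vector<vector<int> >& g) {   // returns the component id of each node
    int n = g.size();
    vector<vector<int> > rg(n);                    // graph with every edge reversed
    for (int u = 0; u < n; ++u)
        for (int v : g[u]) rg[v].push_back(u);
    vector<bool> seen(n, false);
    vector<int> order;
    for (int u = 0; u < n; ++u) if (!seen[u]) dfs1(u, g, seen, order);
    vector<int> comp(n, -1);
    int c = 0;
    for (int i = n - 1; i >= 0; --i)               // decreasing dfsfinish order
        if (comp[order[i]] == -1) dfs2(order[i], rg, comp, c++);
    return comp;
}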