algorithms and random thoughts: 2015

Sunday, December 13, 2015

Indeed prime codespring

https://www.hackerrank.com/contests/indeed-prime-codesprint/challenges

Problem D flatland-roads

Given an undirected connected graph with n = 10^5 vertices and m = 2*10^5 edges, and P = 15. Find out the number of vertices reachable from v using at most P bridges, for every vertex v.

1. All bridges can be identified by dfs. Let d[u] be the time that vertex u is discovered in dfs, and low[u] be the d[] value of the vertex w with minimum d[] value, such that w is adjacent to either u or one of its descendants. In undirected graph, dfs traversal only has tree edge and back edge, and only tree edge may become a bridge. It is easy to see that edge (u,v) is a bridge if and only if u is a parent of v and d[v] = low[v].

2. If we remove all bridges, each component behaves the same in counting. So we can collapse nodes in the same component into a super vertex, and the results graph is a tree!

3. Now that we have a tree, we can compute the numbers recursively. A few book-keeping in place. Let
reach[c][k] be the number of vertices that supernode c may reach with at most k superedges (bridges).
all[c][k] be the number of vertices that supernode c may reach with at most k superedges, either via its child, or via its parent.

Base case: reach[c][0] = all[c][0] = supernode[c].size
Recurisve step:
reach[c][k] = supernode[c].size + sum_{child} reach[child][k-1]
all[c][k] = reach[c][k] + all[parent][k-1] - reach[c][k-2]
we need to subtract the path from parent back to c so that we do not double count.

The dependency is a bit tricky to figure out, and I did the dumb thing of DP with memoization.

code here

Problem C needs an AVL tree or Red-Black tree with each node keeps a count of nodes in its subtree, and I really need to write one myself. Also a good place to test my implementation, if I ever wrote one. Writing RB-tree is tricky.

Problem E, I do not know.

Monday, September 14, 2015

CF 319, hackerrank worldcup

div2
problem B
Given n = 10^6 integers and integer m <= 1000, check whether a subset of the integers sum up to a number that is a multiple of m.

This looks like subset sum but is actually not. I had several false starts and one route that looks promising is that if we are not looking for sum, instead we look for subtraction, then it is easy, just find two numbers with same remainder modulu m. However we need sum here. The solution is actually simple, keep a map of remainders that you have seen already, and update with next element, either you use it or not, then you get an updated list of remainders that you can make with element 1..k. In the end you just check whether you have hit remainder = 0.

Hackerrank university worldcup

swapping bridges: the key is that with n nodes and n directed edges, the graph is a collection of disjoint cycles. Proof by induction.

worldcup game: it appears to be a PSPACE-complete problem, and thus insurmountable. However, we are dealing with a tree here and the answer is, one player takes the biggest subtree and the other player takes the rest. So the answer would be the same if each player is trying to minimize the opponent's score. One dfs on the tree is enough.

Bishop war: at the current row, you only care your previous row, yet each cell can attack left or right or both, thus you need 2 bits to encode the state, 0,1,2,3. This makes the encode/decode slightly more complicated. And there is one catch, you need to skip invalid previous row to speed things up. For my case, a speed up of 7s to .5s was observed.

Alien's age: Haven't solved yet but looks it is a binary search on the max age, and to do the check, you can use 4 types of footprints to run through the matrix, mark cells as don't care if you already covered them. And use count of 0,1,2 for none, taken, don't care. You also need prefix sum to make it fast to query a range of column or diagonal.

Saturday, February 7, 2015

The web technology - CGI

In the old days, web servers were serving static content only, that is, when a browser requests www.example.com/mypage.html, there is actually a file named mypage.html and the server will fetch the page and return it to the client in its response.

Then people want more dynamic stuff and CGI is the first solution. When a browser requests example.com/cgi-bin/myscript.pl, the server is not returning the script itself, instead, the server will run myscript.pl and send back the output. The binary/script is normally put in a directory called cgi-bin, and perl is a popular language to write that script. However, you can put anything in cgi-bin, anything executable, for example, a c++ binary, provided that the server knows how to execute it.

This approach launches a separate process for each request and has several consequences. Forking a process and then shut it down is expensive, especially when the actual computation is cheap and quick.

Competing technologies available, Java servlet, Microsoft ASP.

Thursday, January 29, 2015

status update

I have graduated from UC Riverside and now works as a software enginner.

facebook hacker cup round 2

I was lucky enough to get into round 2, but it was a disaster.
After 3 hours, I had 10pt problem submitted, but I know somehow it would fall. End up getting 0 points.

lazy sort: the observation is that at any time, the partial solution must be a consecutive segment of (1..n), or (n..1). For some reason I was confusing it with a harder problem, where you can pop_front and pop_back, in addition to push_back and push_front.
The solution processes the given list, and keep track of current min and max, and the start and end of remain list. Now if there is a solution, then either the head or the tail of the remaining list is min-1 or max+1, and only one of the four possibility holds. So you can deterministically follow that one, or return false.

all critical: the problem asks for expectation. As usual there are two possible approaches, one is using definition, that is, we compute an infinite sum, pr(X=n) * n for n = 0 to inf. It takes a dynamic programming dp(s, l), meaning the probability to get s success in the first l rounds. dp(0, 0) = 1.0, dp(s, l) could come from dp(k, l-1) for k = 0, 1, 2, ..., s, that is, dp(s, l) = sum_{k} dp(k, l-1) * (N-k, s-k) because in round l-1 we had k success, and there can be at most N = 20 success. In round l we must have s - k success, and that must come from N-k candidates. Once we have dp(N, l) for l = 1, 2, ..., n for some large enough n, we can compute expectation as sum_{l} [dp(N,l) - dp(N, l-1)] * l. The problem asks for a precision of 1e-5, so L = 1e6 should be enough.
The other approach uses recursive idea. Let E[n] be the expected number of trials when there are exactly n possible success. Now consider the first trial. We could have 0, 1, ..., n success. If we have 0 success, then we need another E[n] trials, if we have 1 success, then we need another E[n-1] trials, ..., therefore
E[n] = 1 + E[n] * (1-p)^n + E[n-1] * C(n,1) * p * (1-p)^n-1 + ... + E[n-k] * C(n,k) * p^k * (1-p)^(n-k) + ... + E[0] * p^n
You should convince yourself that E[0] = 0. Then it is simple to calculate E[n] for n = 1 to N, and E[N] is the answer.

auto complete strikes back: The first step is to build a trie with all words. Then we solve the problem recursively on the tree. Consider a subtree rooted at T, let cost(T, k) be the cost to get k words if we only use words in this subtree. Then cost(root, K) is our answer. Obviously cost(T, 0) is 0. To get k words, we need to get k1 from child c1, k2 from child c2, ..., such that k1 + k2 + ... = k, and the cost is the sum of the child subtree's cost. A special case is that when T is a word, then the sum of k1, k2, ... maybe k-1. The cost to get the word in T is the number of links to follow to get to node T, which can be stored when we build the trie, call it the depth. Another case that warrants special treatment is k = 1, in this case we can have a cost of T.depth because T must have at least one word in its subtree and to get one word we only need to arrive at node T and no further.
One implementation challenge is to manage memory. Both new/delete and use a global array to allocate node will work, but my mistake to put a local array of size 26*100 caused stack overflow and segmentation fault, and it takes quite some time to find out the culprit.

problem D is supposed to be a complicated segment tree problem. To be continued.

algorithms and random thoughts