algorithms and random thoughts

Sunday, March 13, 2016

bash

http://mywiki.wooledge.org/BashSheet#Syntax

If you need to run several commands in background, do it like this:

./cmd1 args1 & ./cmd2 args2 & ./cmd3 args3

; is used to separate commands executed synchronously.

Saturday, March 5, 2016

install mesos

Turns out to be much more work than expected.

I had to install a few packages manually because my ubuntu is too old, and apt-get cannot find anything.

libapr and its friends
svn headers
libcurl
and cyrus sasl

final command line is:
../configure LD_LIBRARY_PATH=/usr/local/lib SASL_PATH=/usr/local/lib/sasl2 --with-apr=/usr/local/apr --with-svn=/usr/local --with-curl=/usr/local --with-sasl=/usr/local

just typed make and it would take a while to finish.

Thursday, February 4, 2016

Ramsey's theorem

The problem is classic, given 6 people, either there is a group of 3 who know each other, or there is a group of 3 who do not know each other, assuming the known relationship is bidirectional.

Proof from West's Graph Theory book.
Use pigeon-hole principle, among 6 nodes, in G and ~G, at least one of them have deg 3, because deg_G(v) + deg_~G(v) = 5. Let's say it is G, and node v has deg >= 3. Now we consider 3 neighbors of v, none of them have an edge between them, otherwise we have a triangle. It follows that the three neighbors form a triple with no edges.

Tuesday, February 2, 2016

Facebook Hacker Cup Round 2

Problem B: Carnival Coins.

Problem: You play a game with N coins. Each round you can use some number of coins and flip all of them. If the number of heads is at least K, then you win a point. You keep playing until all N coins are used. Calculate the expected number of points, assuming you play optimally. Flipping a single coin turns head with probability P.

Constraints: N <= 3000, K <= N, 0 <= P <= 1

First, we can calculate the probability prob[n][k], getting exactly k head when flipping n coins. Then it is easy to get cum[n] which is the probability to get at least K heads with n coins.

prob[n][k] = P * prob[n-1][k-1] + (1-P) * prob[n-1][k]
and prob[0][0] = 1.0

cum[n] = prob[n][K] + prob[n][K+1] + ... + prob[n][K]

Next, we need to compute dp[n], which is the expected number of points with n coins, if play optimally.
dp[n] could be cum[n] if we flip all n coins at once. Or we could split n coins into two subsets, one with j coins and the other with n-j coins, for j = 1 to n-1. Then dp[n] is the max among cum[n], dp[1] + dp[n-1], ..., dp[n-1] + dp[1]

Implementation is straightforward.

During contest, I missed both observation. Was calculating prob[n][k] using binomial coefficients and N=3000 is simply too big for double. Didn't see the recurrence. Also not seeing the simple dp to split coins into two subproblems.

===========================================
Problem C : Snakes and Ladders

This problem looks intimidating, as N is 200000, so we need a O(nlogn) solution. What I have observed during contest:
1. It might help to find ladders with same height and no higher ladders in between. This is because for such ladders, any pair can hang a snake.
2. To compute the sum (X_j - Xi)^2, we can actually compute this sum in linear time.

Looking at the sum again, it is actually simple.
sum_(1<=i= sum_(1<=i= (n-1) * sum_(1<=i<=n) X_i^2 - second

while second = (sum_(1<=i<=n) X_i)^2 - sum_(1<=i<=n) X_i^2

To partition snakes into adjacent sets, we process each ladder in order of their x-value, and maintain a stack of previous heights in decreasing order. For each ladder, it kills all previous smaller heights, and append the current height to end of stack.

Sunday, December 13, 2015

Indeed prime codespring

https://www.hackerrank.com/contests/indeed-prime-codesprint/challenges

Problem D flatland-roads

Given an undirected connected graph with n = 10^5 vertices and m = 2*10^5 edges, and P = 15. Find out the number of vertices reachable from v using at most P bridges, for every vertex v.

1. All bridges can be identified by dfs. Let d[u] be the time that vertex u is discovered in dfs, and low[u] be the d[] value of the vertex w with minimum d[] value, such that w is adjacent to either u or one of its descendants. In undirected graph, dfs traversal only has tree edge and back edge, and only tree edge may become a bridge. It is easy to see that edge (u,v) is a bridge if and only if u is a parent of v and d[v] = low[v].

2. If we remove all bridges, each component behaves the same in counting. So we can collapse nodes in the same component into a super vertex, and the results graph is a tree!

3. Now that we have a tree, we can compute the numbers recursively. A few book-keeping in place. Let
reach[c][k] be the number of vertices that supernode c may reach with at most k superedges (bridges).
all[c][k] be the number of vertices that supernode c may reach with at most k superedges, either via its child, or via its parent.

Base case: reach[c][0] = all[c][0] = supernode[c].size
Recurisve step:
reach[c][k] = supernode[c].size + sum_{child} reach[child][k-1]
all[c][k] = reach[c][k] + all[parent][k-1] - reach[c][k-2]
we need to subtract the path from parent back to c so that we do not double count.

The dependency is a bit tricky to figure out, and I did the dumb thing of DP with memoization.

code here

Problem C needs an AVL tree or Red-Black tree with each node keeps a count of nodes in its subtree, and I really need to write one myself. Also a good place to test my implementation, if I ever wrote one. Writing RB-tree is tricky.

Problem E, I do not know.

Monday, September 14, 2015

CF 319, hackerrank worldcup

div2
problem B
Given n = 10^6 integers and integer m <= 1000, check whether a subset of the integers sum up to a number that is a multiple of m.

This looks like subset sum but is actually not. I had several false starts and one route that looks promising is that if we are not looking for sum, instead we look for subtraction, then it is easy, just find two numbers with same remainder modulu m. However we need sum here. The solution is actually simple, keep a map of remainders that you have seen already, and update with next element, either you use it or not, then you get an updated list of remainders that you can make with element 1..k. In the end you just check whether you have hit remainder = 0.

Hackerrank university worldcup

swapping bridges: the key is that with n nodes and n directed edges, the graph is a collection of disjoint cycles. Proof by induction.

worldcup game: it appears to be a PSPACE-complete problem, and thus insurmountable. However, we are dealing with a tree here and the answer is, one player takes the biggest subtree and the other player takes the rest. So the answer would be the same if each player is trying to minimize the opponent's score. One dfs on the tree is enough.

Bishop war: at the current row, you only care your previous row, yet each cell can attack left or right or both, thus you need 2 bits to encode the state, 0,1,2,3. This makes the encode/decode slightly more complicated. And there is one catch, you need to skip invalid previous row to speed things up. For my case, a speed up of 7s to .5s was observed.

Alien's age: Haven't solved yet but looks it is a binary search on the max age, and to do the check, you can use 4 types of footprints to run through the matrix, mark cells as don't care if you already covered them. And use count of 0,1,2 for none, taken, don't care. You also need prefix sum to make it fast to query a range of column or diagonal.

Saturday, February 7, 2015

The web technology - CGI

In the old days, web servers were serving static content only, that is, when a browser requests www.example.com/mypage.html, there is actually a file named mypage.html and the server will fetch the page and return it to the client in its response.

Then people want more dynamic stuff and CGI is the first solution. When a browser requests example.com/cgi-bin/myscript.pl, the server is not returning the script itself, instead, the server will run myscript.pl and send back the output. The binary/script is normally put in a directory called cgi-bin, and perl is a popular language to write that script. However, you can put anything in cgi-bin, anything executable, for example, a c++ binary, provided that the server knows how to execute it.

This approach launches a separate process for each request and has several consequences. Forking a process and then shut it down is expensive, especially when the actual computation is cheap and quick.

Competing technologies available, Java servlet, Microsoft ASP.