A factor graph is a bipartite graph corresponding to the factorization of a function. Given a factorization of a function G(X1, X2, ... , Xn),
G(X1, X2, ... , Xn) = ∏j=1,...,m ƒj(Sj)       (1)
where Sj ⊆ {X1, X2, ... , Xn}, the corresponding factor graph G = (X,F,E) consists of variable vertices X = {X1,X2, ... , Xn}, factor vertices F = {ƒ1, ƒ2, ... , ƒm}, and edges E. The edges depend on the factorization as follows: if Xk ∈ Sj, then there is an undirected edge between the factor vertex ƒj and the variable vertex Xk.
For example, consider a function that factorizes as follows: G(X1,X2,X3) = ƒ1(X1)ƒ2(X1,X2)ƒ3(X1,X2)ƒ4(X2,X3). The corresponding factor graph representation is given below.
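The edge rule can be sketched in a few lines of Python. This is only an illustration: the dictionary `scopes` and the function `build_factor_graph` are hypothetical names, and each factor is represented solely by its scope Sj.

```python
# Hypothetical representation: each factor name maps to its scope S_j.
scopes = {
    "f1": ("X1",),
    "f2": ("X1", "X2"),
    "f3": ("X1", "X2"),
    "f4": ("X2", "X3"),
}


def build_factor_graph(scopes):
    """Return (variables, factors, edges) of the bipartite factor graph."""
    variables = sorted({v for scope in scopes.values() for v in scope})
    factors = sorted(scopes)
    # One undirected edge (factor, variable) whenever the variable is in the scope.
    edges = [(f, v) for f in factors for v in scopes[f]]
    return variables, factors, edges


variables, factors, edges = build_factor_graph(scopes)
for factor, variable in edges:
    print(factor, "--", variable)
```

For the example factorization this yields seven edges, one per occurrence of a variable in a factor.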
Note that a factor graph is a tree if it does not contain any cycle. Thus, the factor graph of the above example is not a tree, because X1 and X2 are both connected to ƒ2 and ƒ3. However, if we merge ƒ2(X1,X2)ƒ3(X1,X2) into a single factor, the resulting factor graph is a tree. In the following sections, we will see how factor graphs can be combined with belief propagation algorithms to efficiently compute certain characteristics of the function G(X1, X2, ... , Xn), e.g., the marginal distributions or the largest joint a posteriori probability. Belief propagation algorithms are generally exact for trees, but approximate for graphs with cycles.
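Whether a given factorization produces a cycle can be checked mechanically. The sketch below uses a union-find structure over the edges; the function `has_cycle` and the merged factor name `f23` are hypothetical, chosen only for this illustration.

```python
def has_cycle(scopes):
    """Detect a cycle in the factor graph using union-find over its edges."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for factor, scope in scopes.items():
        for var in scope:
            # Tag nodes so factor and variable names can never collide.
            a, b = find(("f", factor)), find(("v", var))
            if a == b:  # endpoints already connected: this edge closes a cycle
                return True
            parent[a] = b
    return False


original = {"f1": ("X1",), "f2": ("X1", "X2"), "f3": ("X1", "X2"), "f4": ("X2", "X3")}
merged = {"f1": ("X1",), "f23": ("X1", "X2"), "f4": ("X2", "X3")}
print(has_cycle(original))  # True: X1 - f2 - X2 - f3 - X1
print(has_cycle(merged))    # False: merging f2 and f3 removes the loop
```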
In this project, we use factor graphs to solve the following problem: find a setting of the variables {X1, X2, ... , Xn} that jointly maximizes G(X1, X2, ... , Xn) in Equation 1. In other words, we want to determine the vector Xmax that maximizes the joint distribution, so that
Xmax = arg max X G(X)       (2)
for which the corresponding value of the joint distribution will be given by
G(Xmax) = max X G(X)       (3)
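As a reference point for Equations 2 and 3, the maximization can be done by exhaustive enumeration on small problems. The sketch below is a baseline only: the binary factor tables are hypothetical values for a three-factor tree, and `brute_force_max` is an illustrative name. Its cost grows exponentially with the number of variables, which is what message passing avoids.

```python
import itertools

# Hypothetical binary tables for G(X1,X2,X3) = f1(X1) f2(X1,X2) f4(X2,X3).
domains = {"X1": (0, 1), "X2": (0, 1), "X3": (0, 1)}
factors = [
    (("X1",), {(0,): 0.4, (1,): 0.6}),
    (("X1", "X2"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.7}),
    (("X2", "X3"), {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 0.8}),
]


def brute_force_max(domains, factors):
    """Return (Xmax, G(Xmax)) by evaluating G on every joint assignment."""
    variables = list(domains)
    best_value, best_assignment = float("-inf"), None
    for values in itertools.product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        g = 1.0
        for scope, table in factors:
            g *= table[tuple(assignment[v] for v in scope)]
        if g > best_value:
            best_value, best_assignment = g, assignment
    return best_assignment, best_value


x_max, g_max = brute_force_max(domains, factors)
print(x_max, g_max)
```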
In the next section, we shall use a version of the belief propagation (BP) algorithm to solve the optimization problem described above. The algorithm finds the exact solution if the graph is a tree.
First, we shall describe the belief propagation algorithm to solve the above mentioned optimization problem when the factor graph is a tree. This is also referred to as the Max-sum algorithm and can be viewed as an application of dynamic programming in the context of graphical models.
Since the product of many small probabilities may lead to numerical underflow, it is convenient to work with the logarithm of the joint distribution. The logarithm is a monotonically increasing function and hence, the max operator and the logarithm can be interchanged.
ln(maxX G(X)) = maxX ln G(X)       (4)
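A small numerical experiment illustrates both points. The numbers are arbitrary and chosen only to force underflow in double precision.

```python
import math

# 2000 factors of 0.01 give 1e-4000, far below the smallest positive double.
probs = [0.01] * 2000

product = 1.0
for p in probs:
    product *= p

log_sum = sum(math.log(p) for p in probs)

print(product)  # 0.0: the product underflows
print(log_sum)  # about -9210.34: the sum of logarithms is still representable

# Equation 4 on a toy set of candidate values of G.
candidates = [0.2, 0.5, 0.3]
assert math.log(max(candidates)) == max(math.log(c) for c in candidates)
```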
Therefore, taking the logarithm has the effect of replacing the products in the right hand side of Equation 1 by the summation of logarithmic factors. Now, the optimization problem stated in Equation 2 can be solved by passing messages from the variable nodes to factor nodes and vice versa. In the following equations, we denote a factor node by ƒ ∈ F = {ƒ1, ƒ2, ... , ƒm}, and a variable node by X ∈ X = {X1, X2, ... , Xn}.
μƒ → X (X) = max X1, ... , Xk [ln ƒ(X, X1, ... , Xk) + ∑ Xi ∈ N(ƒ)\X  μ Xi → ƒ (Xi)]       (5)
μX → ƒ (X) = ∑ ƒl ∈ N(X)\ƒ   [μƒl → X (X)]       (6)
Here, N(.) represents the set of neighbors of a node in the factor graph. The initial messages sent by the leaf nodes are as follows.
μƒ → X (X) = ln ƒ(X)       (7)
μX → ƒ (X) = 0       (8)
With the Max-sum algorithm described above, the maximum value of ln G (and hence of G) is computed at the root node X of the factor tree; any choice of root node X ∈ X at the beginning of the algorithm yields the same result. The corresponding values of the variables {X1, X2, ... , Xn} that attain the maximum of G can be obtained by keeping track, for each maximization in Equation 5, of which values of the variables gave rise to the maximum, and then backtracking from the root.
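The following is a minimal sketch of Equations 5 to 8 with backtracking, not a definitive implementation. It assumes a tree-structured factor graph, strictly positive factor values (so that the logarithm is defined), and at most one occurrence of a variable per factor. The tables and the name `max_sum_tree` are hypothetical.

```python
import itertools
import math

domains = {"X1": (0, 1), "X2": (0, 1), "X3": (0, 1)}
factors = [  # hypothetical tables for f1(X1) f2(X1,X2) f4(X2,X3)
    (("X1",), {(0,): 0.4, (1,): 0.6}),
    (("X1", "X2"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.7}),
    (("X2", "X3"), {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 0.8}),
]


def max_sum_tree(domains, factors, root):
    """Max-sum on a tree-structured factor graph; returns (argmax, max ln G)."""
    backptr = {}  # (factor index, value of parent variable) -> best other values

    def var_to_factor(var, parent):
        # Equation 6; reduces to Equation 8 (all zeros) when var is a leaf.
        total = {x: 0.0 for x in domains[var]}
        for j, (scope, _) in enumerate(factors):
            if j != parent and var in scope:
                incoming = factor_to_var(j, var)
                for x in total:
                    total[x] += incoming[x]
        return total

    def factor_to_var(j, var):
        # Equation 5; reduces to Equation 7 when the factor is a leaf.
        scope, table = factors[j]
        others = [v for v in scope if v != var]
        incoming = {v: var_to_factor(v, j) for v in others}
        message = {}
        for x in domains[var]:
            best, best_vals = -math.inf, ()
            for vals in itertools.product(*(domains[v] for v in others)):
                assign = dict(zip(others, vals))
                assign[var] = x
                score = math.log(table[tuple(assign[v] for v in scope)])
                score += sum(incoming[v][assign[v]] for v in others)
                if score > best:
                    best, best_vals = score, vals
            message[x] = best
            backptr[(j, x)] = dict(zip(others, best_vals))
        return message

    root_msg = var_to_factor(root, None)
    assignment = {root: max(root_msg, key=root_msg.get)}
    stack = [(root, None)]
    while stack:  # backtrack from the root through the stored maximizers
        var, parent = stack.pop()
        for j, (scope, _) in enumerate(factors):
            if j != parent and var in scope:
                chosen = backptr[(j, assignment[var])]
                assignment.update(chosen)
                stack.extend((v, j) for v in chosen)
    return assignment, root_msg[assignment[root]]


assignment, log_max = max_sum_tree(domains, factors, root="X3")
print(assignment, math.exp(log_max))
```

Calling the function with a different root, for example `root="X1"`, returns the same maximum, in line with the statement above.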
The Max-sum algorithm converges and generates the exact solution for trees. For factor graphs that contain loops, nearly the same algorithm is used; it is sometimes called loopy belief propagation. The procedure must be adjusted slightly because such graphs might not contain any leaves. To resolve this, all variable messages are initialized to 0 in the logarithmic domain (equivalently, to 1 in the product domain), and then the same message passing technique as above is applied, updating all messages at every iteration. The precise conditions under which the loopy belief propagation algorithm converges are still an open problem; it is known that the algorithm converges to a correct solution on graphs containing a single loop. However, there exist graphs on which it fails to converge, or on which it oscillates between multiple states over repeated iterations. Therefore, we usually terminate the algorithm after a finite number of steps if it fails to converge.
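A minimal sketch of the loopy variant in the logarithmic domain is given below. Several choices are assumptions made for this illustration rather than part of the description above: messages are normalized by subtracting their maximum so that they stay bounded, convergence is declared when no message changes by more than `tol`, and the assignment is decoded by maximizing each variable's summed incoming messages. The tables for the four-factor example are hypothetical.

```python
import itertools
import math

domains = {"X1": (0, 1), "X2": (0, 1), "X3": (0, 1)}
factors = [  # hypothetical tables for f1(X1) f2(X1,X2) f3(X1,X2) f4(X2,X3)
    (("X1",), {(0,): 0.4, (1,): 0.6}),
    (("X1", "X2"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.7}),
    (("X1", "X2"), {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.4, (1, 1): 0.6}),
    (("X2", "X3"), {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 0.8}),
]


def normalize(msg):
    top = max(msg.values())
    return {x: m - top for x, m in msg.items()}


def loopy_max_sum(domains, factors, max_iters=50, tol=1e-9):
    """Synchronous loopy max-sum; returns (assignment, converged flag)."""
    edges = [(j, v) for j, (scope, _) in enumerate(factors) for v in scope]
    # All messages start at 0 in the log domain.
    f2v = {e: {x: 0.0 for x in domains[e[1]]} for e in edges}
    v2f = {e: {x: 0.0 for x in domains[e[1]]} for e in edges}
    converged = False
    for _ in range(max_iters):
        new_f2v, new_v2f = {}, {}
        for j, v in edges:  # every update reads only the previous iteration
            scope, table = factors[j]
            others = [u for u in scope if u != v]
            out = {}
            for x in domains[v]:
                best = -math.inf
                for vals in itertools.product(*(domains[u] for u in others)):
                    assign = dict(zip(others, vals))
                    assign[v] = x
                    score = math.log(table[tuple(assign[u] for u in scope)])
                    score += sum(v2f[(j, u)][assign[u]] for u in others)
                    best = max(best, score)
                out[x] = best
            new_f2v[(j, v)] = normalize(out)  # Equation 5
            new_v2f[(j, v)] = normalize({     # Equation 6
                x: sum(f2v[(k, w)][x] for k, w in edges if w == v and k != j)
                for x in domains[v]
            })
        change = max(
            abs(new[e][x] - old[e][x])
            for new, old in ((new_f2v, f2v), (new_v2f, v2f))
            for e in edges for x in domains[e[1]]
        )
        f2v, v2f = new_f2v, new_v2f
        if change < tol:
            converged = True
            break
    assignment = {
        v: max(domains[v],
               key=lambda x: sum(f2v[(k, w)][x] for k, w in edges if w == v))
        for v in domains
    }
    return assignment, converged


assignment, converged = loopy_max_sum(domains, factors)
print(assignment, converged)
```

The `max_iters` cap plays the role of the finite termination described above: if the messages oscillate, the function still returns, with `converged` set to `False`.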
The time complexity of the loopy belief propagation algorithm is O(nd²T), where n is the number of variables, d is the maximum number of states of a variable and T is the number of iterations. Therefore, a sequential implementation of this algorithm can have a very high computational cost due to the large number of messages that need to be sent before the algorithm either converges or is terminated. However, at each iteration of the loopy belief propagation algorithm, the message updates can be done in parallel, since each message update depends only on the messages received from the neighbors in the previous iteration. Thus the algorithm is well suited for a parallel implementation. In this project, we shall implement a parallel version of the loopy belief propagation algorithm.
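The independence of the updates within one iteration can be illustrated with Python's standard thread pool. This is a sketch of the idea only and not the implementation used in the project: the pairwise table and incoming messages are hypothetical, and in standard CPython threads mainly demonstrate that the updates do not depend on each other rather than provide a speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def factor_to_var(task):
    """One pairwise max-sum update: out[x] = max_y (log_f[y][x] + incoming[y]).

    The double loop over x and y is the source of the d^2 term in the cost.
    """
    log_f, incoming = task
    return [
        max(log_f[y][x] + incoming[y] for y in range(len(incoming)))
        for x in range(len(log_f[0]))
    ]


# Hypothetical snapshot of the previous iteration: one task per message.
log_f = [[math.log(0.9), math.log(0.1)], [math.log(0.3), math.log(0.7)]]
tasks = [
    (log_f, [0.0, 0.0]),
    (log_f, [math.log(0.4), math.log(0.6)]),
]

sequential = [factor_to_var(t) for t in tasks]
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = list(pool.map(factor_to_var, tasks))

print(parallel == sequential)  # True: the order of evaluation does not matter
```

Because every task reads only the frozen messages of the previous iteration, the same structure carries over to process pools or GPU kernels.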