## Update: Extracting Information from Noisy Observations

This is meant as a follow-up for the last post's motivation. In summary we have a bipartite graph $G=(U,V,E)$ with bi-adjacency matrix $A$ and with $|U|\approx rk,$ $|V| \approx ck,$ $|E|\approx rck,$ $\mathsf{deg}(u)\approx c,$ $\mathsf{deg}(v)\approx r,$ and a sequence of graphs $(G_n)$ where $G_1=G$ and $G_{n+1}$ is the bipartite graph on the vertex sets $U^{n+1},V^{n+1}$ defined by the bi-adjacency matrix:

$$\underbrace{A\otimes A \dots \otimes A}_{n \text{ times}}.$$

$U$ represents inputs to a system, $V$ represents outputs to a system, and edges represent possible input-output pairs. We hope to cluster $G_n$ into approximately $k^n$ mostly-disjoint vertex clusters of roughly equal size. I have attempted to attack this problem with the various techniques, the most promising being spectral clustering, which I discuss in this mathoverflow post.

Update (12/5/2016): The conjecture is false in general, demonstrated by the counterexample in my answer to my own question in the mathoverflow link. I don't think I will dedicate any more time to this problem in the future, at least from the perspective of graph theory.

## The Planar Separator Theorem won't Work

A variation of the planar separator theorem states that any planar graph with $v$ vertices and maximum degree $d$ is guaranteed to have an edge separator of size $O(\sqrt{d\cdot v})$ or less. An analogous result guarantees that graphs which can be embedded in a 2-manifold of genus $g$ (equivalently, a sphere with $g$ handles) have an edge separator of size $O(\sqrt{d\cdot v \cdot g})$, where the two subgraphs contain no more than $1/2+\varepsilon$ edges.

It was my thinking that perhaps this type of result could guarantee that such a clustering exists, but I no longer believe this is the case. We want to cut $G_n$ into $k^n$ roughly-equally-sized pieces, and a principal goal is that the number of edges we cut be small relative to the total amount of edges in the graph. One strategy might be to apply the above splitting technique $\log(k^n)=n\log(k)$ times. If we assume each $G_n$ can be embedded in a surface of genus $g_n$, this guarantees we can find a cut of size:

$$C_n = O(n\cdot \log(k) \sqrt{\mathrm{max}(c^n,k^n)\cdot (rc+rk)^n\cdot g_n}).$$

If $g_n$ grows slowly enough in $n$, then $C_n/|E_i| \approx C_n/(rck)^n\rightarrow 0.$ If this held, a channel use would be in one of $k^n$ mutually disjoint input-output classes with up to negligible probability. The problem with this argument is that the effectiveness of clustering does not depend on the constant $\log(k)$. If the argument held, we could just as easily apply the clustering theorem enough times to divide the graph into more than $k^n$ input-output classes, which would imply that as sequence length grows, eventually there are more than $n\cdot I(X;Y)$ bits of information about inputs in arbitrary noisy observations. This is impossible, so I supsect that the planar separator theorem cannot help us.