
ORIGINAL RESEARCH article

Front. Genet., 15 October 2020
Sec. Evolutionary and Population Genetics
This article is part of the Research Topic Algebraic and Geometric Phylogenetics

Phylogenetic Networks as Circuits With Resistance Distance

Stefan Forcey* and Drew Scalzo
  • Department of Mathematics, The University of Akron, Akron, OH, United States

Phylogenetic networks are notoriously difficult to reconstruct. Here we suggest that it can be useful to view unknown genetic distance along edges in phylogenetic networks as analogous to unknown resistance in electric circuits. This resistance distance, well-known in graph theory, turns out to have nice mathematical properties which allow the precise reconstruction of networks. Specifically we show that the resistance distance for a weighted 1-nested network is Kalmanson, and that the unique associated circular split network fully represents the splits of the original phylogenetic network (or circuit). In fact, this full representation corresponds to a face of the balanced minimal evolution polytope for level-1 networks. Thus, the unweighted class of the original network can be reconstructed by either the greedy algorithm neighbor-net or by linear programming over a balanced minimal evolution polytope. We begin study of 2-nested networks with both minimum path and resistance distance, and include some counting results for 2-nested networks.

1. Introduction

Consider an electrical circuit: a network made of wires joining resistors in parallel and in sequence, with some portion hidden inside an opaque box. It is not always possible to determine that portion by testing the visible leads. However, we prove here that if the hidden portion has a particular form made of connected cycles, and we can test the resistance between all the pairs of leads, then the lengths and connected structure of the cycles in the circuit are uniquely determined. The mathematics used to recover that circuit is more typically found in work on phylogenetic networks.

Modeling heredity as the flow of genetic information suggests that mutations in DNA might be analogous to resistance in an electrical circuit. The weights of edges in a phylogenetic network can represent genetic distances: if we have the genomes of the two endpoints of an edge then we can use a model of mutation rates to calculate a real number distance. For several edges that form the unique path between two taxon-labeled leaves, the total distance is the sum of those edge weights. Paths between leaves are only unique if the network is a tree. When paths between leaves are not unique, one option is to take the distance to be that of the minimum length path. This option may correspond to a parsimonious approach, assuming the least complicated history. This minimum path length distance is studied for instance in Forcey and Scalzo (2020a).

Instead, however, a greater weight of an edge could represent a greater loss of information. Dividing and rejoining of edges illustrates events such as speciation, recombination, or hybridization. If the genetic information of an ancestor genome can be shared among descendants, and then collaboratively recovered upon hybridization, then a different metric than minimum distance may be appropriate. Here we consider weighted phylogenetic networks with the resistance distance, or resistance metric. The distance between two leaves of the network is found by considering the edge weights as electrical resistance, obeying Ohm's law. The metric resistance distance for all nodes (not only leaves) of a graph is introduced in Klein and Randić (1993), and studied closely in subsequent papers such as Yang and Klein (2015) and Yang and Klein (2019). To study graphs, the resistance of each edge is often assumed to have unit value, but the definitions allow any weight. We review the definitions in section 2.2.3.

In Curtis et al. (1998) and Curtis and Morrow (1990, 1991), the authors study circular planar graphs with boundary nodes that are analogous to the leaves of our phylogenetic networks. They consider resistance values (or conductivity) on the edges. They prove that complete information about the linear map which transforms electric current values at each boundary node to electric current values at all the edges can be used in some cases to recover the resistance values. In our applications there is no way to know the complete map of boundary currents to edge currents. However, we seek only to recover the graphical structure of the network, not the original edge weights.

In Ejov et al. (2019), the authors consider the entire set of resistance distances (again using unit values for edges) between any pair of nodes (not only leaves). They show that this metric is useful for discovering Hamiltonian cycles via algorithms for the Traveling Salesman problem. There is a close connection to our applications, since the algorithm neighbor-net can be used as a greedy approach to the Traveling Salesman problem, as shown in Levy and Pachter (2011).

1.1. Main Results and Overview

In section 2, we start by reviewing Ohm's law and resistance distance. Then we review the relevant definitions of mathematical phylogenetics, many taken from other sources to help make this paper self-contained. In section 3, we state and prove the main results for 1-nested phylogenetic networks N. The upshot is that when the distances between taxa are effective resistances based on unknown connections, then using well known methods we can recover an unweighted circular split network, which gives us the precise class of (unweighted) 1-nested phylogenetic network. Specifically, this recovery is via the (greedy) algorithm neighbor-net as described in Theorem 3.3 or linear programming; see Theorem 4.5.

Several features of the resistance distance seem exactly suited to phylogenetic networks with weighted edges. First, from Theorem 3.1, the resistance distance of 1-nested phylogenetic networks is Kalmanson, allowing the circular split network to be uniquely reconstructed from the measured distances. Second, from Theorem 3.2, that reconstructed circular split network always displays precisely the same splits as the original network. As a consequence, the trivial splits which are the traditional final edges to the leaves of a phylogenetic network are automatically guaranteed to be represented in the split network—this is a condition beyond the basic Kalmanson condition. Finally, triangular subgraphs are interchangeable with three-edge stars when measuring resistance distance. This is known as the Y-Δ transform, pictured in Figure 2. The Y-Δ equivalence mirrors the fact that triangles in a phylogenetic network, when attached via bridges to the rest of the network, are indistinguishable from degree-three tree-like vertices by the linear functionals used for balanced minimal evolution. As well, the split networks are bipartite, so triangle free.

In section 4, we review the balanced minimal evolution polytopes, and show how our results can be interpreted geometrically, in Theorems 4.2 and 4.5. In section 5, we point out some interesting counterexamples and limiting cases, and conjecture about how to extend our results to more complicated networks. Section 5.1 contains some new results on 2-nested networks with regard to the minimum path distance. Finally in section 6, we consider qualifications of experimental distance measurements in phylogenetics that would give justification for assuming the resistance analogy to be valid in practice.

2. Definitions and Cited Results

We start by reviewing some equations from electric circuit theory.

2.1. Electricity

Given a conductive circuit with a power supply, the materials have resistance R and the power causes a current I. The classic Ohm and Kirchhoff equations include: R = V/I and I = I1 + I2. The first depends on the conductive material; it must be experimentally verified. It relates the resistance in a circuit to the constant voltage drop over the circuit and the constant current in all of the circuit. The second states that total current must equal the sum of circuit-parallel portions of that current after a branching in the circuit. Together, these rules imply the law for total resistance RT for a pair of circuit-parallel resistances R1, R2. We have RT = R1R2/(R1 + R2), which we refer to as Ohm's law for parallel resistance. Also, the voltage drop over a closed circuit must equal the total voltage: this implies that resistors in series are summed to find the total resistance. We illustrate the basic calculation of a total resistance in Figure 1. We illustrate the implied Y-Δ equivalence in Figure 2.
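The series and parallel rules above can be sketched in a few lines of Python. This is only an illustration: the helper names `series` and `parallel` and the numeric values are hypothetical, and the example assumes a circuit shaped like the one in Figure 1, with R1 and R2 in parallel followed by R3 and R4 in series.

```python
def series(*resistances):
    """Total resistance of resistors connected in series: the values add."""
    return sum(resistances)


def parallel(r1, r2):
    """Total resistance of two circuit-parallel resistors: R1*R2 / (R1 + R2)."""
    return r1 * r2 / (r1 + r2)


# Hypothetical values (not taken from the article).
r1, r2, r3, r4 = 2.0, 2.0, 3.0, 4.0

# R1 and R2 in parallel, then R3 and R4 in series with that pair.
r_total = series(parallel(r1, r2), r3, r4)
print(r_total)
```

Nesting the two helpers in this way handles any circuit that can be reduced step by step to series and parallel pieces.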


Figure 1. The total resistance from i to k is $R_{i,k} = \frac{R_1R_2}{R_1+R_2} + R_3 + R_4$. On the left is the circuit itself, on the right we see it as a pairwise circuit within a phylogenetic network. Here we have chosen i to be an outgroup, so the network is rooted at the top and the downward direction is forward in time.


Figure 2. The two networks shown here have identical resistance between any two corresponding pairs of nodes at the three corners. Here $R_a = \frac{R_1R_2}{R_1+R_2+R_3}$, $R_b = \frac{R_2R_3}{R_1+R_2+R_3}$, and $R_c = \frac{R_1R_3}{R_1+R_2+R_3}$.
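The Y-Δ equivalence can be checked numerically. The sketch below assumes a labeling that the figure itself may not use: the star arm at each corner is the product of the two triangle edges meeting at that corner, divided by the sum of all three. The function names and numeric values are hypothetical.

```python
def parallel(r1, r2):
    """Total resistance of two circuit-parallel resistors."""
    return r1 * r2 / (r1 + r2)


def delta_to_star(r1, r2, r3):
    """Star resistances (Ra, Rb, Rc) equivalent to a triangle with edges R1, R2, R3."""
    total = r1 + r2 + r3
    return r1 * r2 / total, r2 * r3 / total, r1 * r3 / total


# Hypothetical triangle edge resistances.
r1, r2, r3 = 1.0, 2.0, 3.0
ra, rb, rc = delta_to_star(r1, r2, r3)

# Assumed corner labels: a joins edges R1 and R2, b joins R2 and R3, c joins R1 and R3.
# In the triangle, the direct edge between two corners is in parallel with
# the two-edge path through the third corner.
delta_ab = parallel(r2, r1 + r3)
delta_bc = parallel(r3, r1 + r2)
delta_ac = parallel(r1, r2 + r3)

# In the star, the path between two corners passes through the center.
star_ab, star_bc, star_ac = ra + rb, rb + rc, ra + rc

print(delta_ab, star_ab)
print(delta_bc, star_bc)
print(delta_ac, star_ac)
```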

2.2. Phylogenetic Definitions

Many of the definitions and notes here are repeated (sometimes verbatim) from Forcey and Scalzo (2020a) for the sake of self-containment. For further reference, see Steel (2016) and Gambette et al. (2017).

A split A|B is a bipartition of [n] = {1, …, n}. That is, A and B are non-empty disjoint subsets whose union is [n]. The two parts of a split are often called clades. If one clade of a split has only a single element, we call that split trivial. A split system is a set s of splits of [n] which contains all the trivial splits. We say a split system s refines another split system s′ when s ⊇ s′. In this paper all graphs are simple (no multi-edges) and connected.

Definition 2.1. An (unrooted) phylogenetic network on [n] is a simple connected graph with:

 i. Labeled leaves: n degree-1 vertices, labeled bijectively with the elements of [n],

ii. Unlabeled nodes: all these must have degree larger than 2.

For the remainder of the paper, all phylogenetic networks are assumed to be unrooted and without any edge directions.

A split A|B is displayed by a phylogenetic network N when there is (at least) one subset of edges of N whose deletion (keeping all nodes) results in two connected components with A and B their respective sets of labeled leaves. We call that collection of edges a minimal cut displaying the split when the collection contains no proper subset displaying the same split. A bridge is a single edge which displays a split. A trivial bridge displays a trivial split. A phylogenetic tree is a cycle-free phylogenetic network, so every edge is a bridge. Figure 3 shows examples of splits displayed, for the trees and their two generalizations described here: phylogenetic networks and split networks. Recall that a cycle in a graph is a path of edges that does not revisit any nodes except for the node at which it starts and ends. The following is defined in Gambette et al. (2017):


Figure 3. Modified from a figure in Forcey and Scalzo (2020a). In a phylogenetic tree t, on the left, splits are always single edges. The highlighted edge is the split {2, 3}|{1, 4, 5, 6, 7, 8, 9}. That same split is a pair of edges making a minimal cut in the 1-nested phylogenetic network N, center. Finally on the right, that same split is a set of parallel edges in a circular split network s.

Definition 2.2. An unrooted phylogenetic network N is called 1-nested when each edge of N is contained in at most one cycle, and N is triangle-free—all cycles are of length greater than 3 edges.

A 1-nested phylogenetic network can be drawn in the plane with its leaves on the exterior, which is referred to as outer planarity. We consider two 1-nested networks to be split-equivalent if they display the same set of splits. See Figure 4 for examples. Twisting a phylogenetic network around a bridge (reflecting one side through the line of the bridge), or around a cut-point node, does not change the list of splits. Any cyclic order of the leaves seen around the exterior in some representative drawing of a 1-nested phylogenetic network is said to be consistent with that split system. Figure 7 shows examples. A binary phylogenetic network is one in which the unlabeled nodes each have degree 3. A phylogenetic network N refines another network M, written N ≥ M, when the splits displayed by M are a subset of those displayed by N. Several of these terms are exhibited in Figure 7. Next we review the definition of another generalization of a phylogenetic tree.


Figure 4. A trio of equivalent 1-nested phylogenetic networks. All display the same set of splits. The highlighted edges display the same split in each network.

Definition 2.3. A split network displaying a split system s on [n] is an embedding in Euclidean space of a simple connected graph, also called s, with the following:

  i. Labeled leaves: n degree 1 nodes are bijectively labeled by [n].

 ii. Unlabeled nodes: these have degree larger than 1.

iii. A partition of the set of edges: the parts of this partition are called split-classes. There is one split-class for each split A|B in the system. It is required that for any two leaves, the set of edges on a shortest path between them intersects each split-class in at most one edge, and that the set of splits thus traversed is the same for any shortest path between those two leaves.

iv. The split-class of edges corresponding to a split A|B comprises a minimal cut displaying that split: deletion of those edges results in two connected components with respective labeled leaves A and B.

The resulting bipartite graph is often shown with each class of edges embedded as a set of equal length parallel line segments. (Note: here parallel means geometrically parallel.) Alternate definitions use colors; the edges in a split-class are colored alike, as in Dress et al. (2012) and Steel (2016). A split-class of size one is a bridge. Two split networks are defined to be equivalent when they represent the same split system.

Definition 2.4. A circular split system is a split system which allows the embedding of a representative split network in the plane, with the labeled nodes all on the exterior, and thus arranged in a circular order. We refer to these representatives as circular split networks.

Just as for phylogenetic networks, twisting a circular split network around a bridge (reflecting one side through the line of the bridge), or around a cut-point node, does not change the list of splits. Any cyclic order of the leaves seen in some representative circular split network is said to be consistent with that split system. Two circular split networks are equivalent if they display the same split system. For instance see Figure 5. The following lemma is from Forcey and Scalzo (2020a), included here without proof for the terminology that will be useful in the next section.


Figure 5. Modified from a figure in Forcey and Scalzo (2020a). A trio of equivalent split networks. All three represent the same split system. The highlighted edges display the same split in each network. The third is the invariant exterior subgraph of all three.

Lemma 2.5. Given a circular split network s, the nodes and edges adjacent to the exterior of the graph are a subgraph which is invariant: that is, this exterior subgraph will be identical to the exterior subgraph of any circular split network representing the same set of splits as s.

Again for example see Figure 5. Introduced in Forcey and Scalzo (2020a) is a subclass of circular split networks.

Definition 2.6 (Forcey and Scalzo, 2020a). An outer-path circular split system is a split system whose representative circular split networks have shortest paths between pairs of leaves which can all be chosen to lie on the exterior of the diagram, that is, using only edges adjacent to the exterior.

For examples, see Figure 6.


Figure 6. From Forcey and Scalzo (2020a). From the left, N and N′ are outer-path circular split networks. In contrast, M and M′ are non-outer-path circular split networks.

2.2.1. Functions for Unweighted Networks

The definitions in this section are repeated from Forcey and Scalzo (2020a), but originate in Gambette et al. (2017).

Definition 2.7. For a 1-nested phylogenetic network N define Σ(N) to be the circular split system made up of the splits displayed by N. Thus the map Σ takes a 1-nested phylogenetic network and outputs the set of splits displayed by N.
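For small examples, the map Σ can be sketched by brute force. The code below is an illustration, not the construction used in the cited sources: it deletes every set of one or two edges and records the leaf bipartition whenever exactly two components remain. It relies on the assumption that, when each edge lies in at most one cycle, every minimal cut is either a bridge or a pair of edges from the same cycle. The network, node labels, and function names are hypothetical.

```python
from itertools import combinations


def components(nodes, edges):
    """Return the connected components (as sets of nodes) of an undirected graph."""
    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, parts = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, part = [start], {start}
        while stack:
            x = stack.pop()
            for y in adjacency[x]:
                if y not in part:
                    part.add(y)
                    stack.append(y)
        seen |= part
        parts.append(part)
    return parts


def displayed_splits(edges, leaves):
    """Splits A|B displayed by deleting one or two edges (keeping all nodes)."""
    nodes = {v for e in edges for v in e}
    splits = set()
    for size in (1, 2):
        for cut in combinations(edges, size):
            remaining = [e for e in edges if e not in cut]
            parts = components(nodes, remaining)
            if len(parts) != 2:
                continue
            a = frozenset(parts[0] & leaves)
            b = frozenset(parts[1] & leaves)
            if a and b:
                splits.add(frozenset((a, b)))
    return splits


# Hypothetical 1-nested network: a 4-cycle a-b-c-d with one leaf on each cycle node.
edges = [(1, "a"), (2, "b"), (3, "c"), (4, "d"),
         ("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
splits = displayed_splits(edges, {1, 2, 3, 4})
for split in splits:
    print([sorted(part) for part in split])
```

For this example the output consists of the four trivial splits together with {1, 2}|{3, 4} and {1, 4}|{2, 3}; the split {1, 3}|{2, 4} is not displayed.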

In Gambette et al. (2017), it is shown that Σ(N) is a circular split system, since it can be represented by a circular split network, also referred to as Σ(N). Examples of representations of Σ(N) are seen in Figure 8. Note that since the bridges in a split network are invariant, every representation of Σ(N) will have the same bridges: these will match the maximal set of bridges of any representation of N. The range of Σ will be referred to as the faithfully phylogenetic circular split networks.

From Forcey and Scalzo (2020a) and Gambette et al. (2017), we repeat an algorithm for drawing a circular split network to represent Σ(N). Each split of N must correspond to a class of parallel edges in Σ(N). The simplest representing network would just subdivide the edges of N to make a class for each split, but we show how to construct a representative which makes the splits more visible via bridges and parallelograms. For m ≥ 5, each m-cycle in N is replaced by an m-marguerite: a collection of exactly m² − 4m parallelograms arranged in a circle, each sharing sides with two neighbors, specifically organized as follows: each node of the original m-cycle is replaced by a rhombus, and then each edge of the cycle is replaced by m − 5 parallelograms in a row. The rows are attached to the rhombi along adjacent edges of each rhombus, so that the whole arrangement has m(m − 5) sides on the interior of the original m-cycle, and m(m − 3) sides on the exterior. Bridges are attached to the m remaining degree-2 vertices, one at each of the rhombi that replaced the original m nodes of the cycle.
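As a quick check of the parallelogram count, assuming the m rhombi are counted among the parallelograms, the m rhombi together with the m rows of m − 5 parallelograms give:

```latex
\underbrace{m}_{\text{rhombi}} \;+\; \underbrace{m(m-5)}_{\text{rows of parallelograms}} \;=\; m^2 - 4m .
```

For example, m = 5 gives five rhombi and no further parallelograms, and m = 6 gives 6 + 6 = 12.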

Now for a function that takes circular split networks to 1-nested phylogenetic networks. This function is shown to exist in Gambette et al. (2017), and described on the split networks which are images of the function Σ. In Durell and Forcey (2020) and Forcey and Scalzo (2020a), we define the general function L as follows:

Definition 2.8. For a circular split system s, define L(s) to be the smoothed exterior subgraph of a representative split network s. Thus, L takes a circular split system (with a given representation) and outputs a 1-nested phylogenetic network. The operation of L is easy to describe as (1) erasing the interior edges of split network s and (2) smoothing, which here refers to removing any degree-2 nodes that are seen in the exterior subgraph. Such a node is removed, but the two edges adjacent to it are joined to form a single edge.

Recall that the nodes and edges adjacent to the exterior of a circular split network are an invariant subgraph for the split system, so the function L is well-defined on split systems. Examples exhibiting Σ and L are in Figures 7, 8.


Figure 7. Here we see (A) a split system s, with only the non-trivial splits listed (trivial splits are assumed to be included), (B) a circular split network representing s, (C) the exterior subgraph of s as a step in the process of applying L, (D) the output 1-nested phylogenetic network N = L(s), (E) the split system Σ(N) displayed by N, again showing only the non-trivial splits, and (F) a representative circular split network also referred to as Σ(N). We see that Σ(N) ≥ s, and that the cyclic orders (1, 2, 3, 4, 5, 6) and (1, 2, 4, 3, 5, 6) are both consistent with N and with s.


Figure 8. Examples of the functions Σ and L.

Remark 2.9. Note that by its construction, L preserves bridges and cut-point nodes. When restricted to phylogenetic trees, the functions L and Σ are both the identity. In Forcey and Scalzo (2020a), several other properties of the two functions are listed, in the process of showing that L and Σ form a Galois reflection, as in Erné et al. (1993). These include the facts that L is surjective but not injective, Σ is injective but not surjective, and that L ○ Σ is the identity map.

2.2.2. Weights and Metrics

We continue to repeat definitions from Forcey and Scalzo (2020a). Weighted networks can be constructed in two distinct ways: by assigning non-negative real numbers to splits or to edges.

Definition 2.10. A weighted phylogenetic network N has non-negative real numbers assigned to its edges, described by a weight function wN.

Definition 2.11. A weighted split network s has non-negative weights assigned to each split, by a weight function ws. Equivalently, ws assigns a weight to every edge, with the requirement that each edge in a (geometrically parallel) split-class of s has the same weight.

Definition 2.12. For a weighted phylogenetic network N, or a weighted split network s, we denote by $\overline{N}$, respectively $\overline{s}$, the unweighted networks found by forgetting the weights.

As in Forcey and Scalzo (2020a): a pairwise distance function assigns a non-negative real number to each pair of values from [n]. We call the lexicographically listed outputs for distinct pairs a distance vector d, with entries denoted $d_{ij} = d(i, j) = d(j, i)$ for each pair of taxa i, j ∈ [n] (also known as a dissimilarity matrix, or discrete metric when obeying the metric axioms).

Definition 2.13. A distance vector d is Kalmanson, or circular decomposable, when there exists a cyclic order of [n] such that for any subsequence (i, j, k, l) of that order, d obeys this condition:

$$\max\{d_{ij}+d_{kl},\; d_{jk}+d_{il}\} \le d_{ik}+d_{jl}.$$
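The Kalmanson condition can be tested directly on a small distance vector. The following sketch is illustrative only; the function name `is_kalmanson`, the tolerance, and the example distances (a four-leaf tree with unit edge weights and cherries {1, 2} and {3, 4}) are hypothetical. It checks every four-element subsequence of a given order; rotating a cyclic order permutes the terms of the inequality without changing it, so checking subsequences of one linear listing suffices.

```python
from itertools import combinations


def is_kalmanson(d, order, tol=1e-9):
    """Check max{d_ij + d_kl, d_jk + d_il} <= d_ik + d_jl for all i, j, k, l in order.

    d maps frozenset({x, y}) to the distance between x and y;
    order is a tuple listing the taxa in the proposed cyclic order.
    """
    def dist(x, y):
        return d[frozenset((x, y))]

    for i, j, k, l in combinations(order, 4):
        crossing = dist(i, k) + dist(j, l)
        if max(dist(i, j) + dist(k, l), dist(j, k) + dist(i, l)) > crossing + tol:
            return False
    return True


# Hypothetical tree metric: unit-weight quartet tree with split {1, 2}|{3, 4}.
d = {frozenset(pair): value for pair, value in {
    (1, 2): 2, (3, 4): 2, (1, 3): 3, (1, 4): 3, (2, 3): 3, (2, 4): 3}.items()}

print(is_kalmanson(d, (1, 2, 3, 4)))
print(is_kalmanson(d, (1, 3, 2, 4)))
```

The order (1, 2, 3, 4) satisfies the condition, while (1, 3, 2, 4) does not, because the cherries are no longer contiguous in that order.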

Definition 2.14. Given a weighted split system s on [n] we can derive a metric ds on [n],

$$d_s(i,j)=\sum_{i\in A,\; j\in B} w_s(A|B)$$

where the sum is over all splits of s with i in one part and j in the other. The metric is often referred to as the distance vector ds.
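A minimal sketch of this metric, assuming a weighted split system is stored as a list of ((A, B), weight) pairs; the storage format, the function name `split_distance`, and the example weights are hypothetical.

```python
def split_distance(weighted_splits, i, j):
    """Sum the weights of the splits A|B that separate i from j."""
    total = 0.0
    for (a, b), weight in weighted_splits:
        if (i in a and j in b) or (i in b and j in a):
            total += weight
    return total


# Hypothetical weighted split system on [4]: all trivial splits plus {1, 2}|{3, 4},
# each with weight 1.
weighted_splits = [
    (({1}, {2, 3, 4}), 1.0),
    (({2}, {1, 3, 4}), 1.0),
    (({3}, {1, 2, 4}), 1.0),
    (({4}, {1, 2, 3}), 1.0),
    (({1, 2}, {3, 4}), 1.0),
]

print(split_distance(weighted_splits, 1, 2))
print(split_distance(weighted_splits, 1, 3))
```

Leaves 1 and 2 are separated only by their two trivial splits, while leaves 1 and 3 are also separated by {1, 2}|{3, 4}.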

It is well-known that Kalmanson metrics are in one-to-one correspondence with weighted circular split networks. Specifically, from Steel (2016), and as repeated in Forcey and Scalzo (2020a), we have the following:

Lemma 2.15. A distance vector d is Kalmanson with respect to a circular order c if and only if d = ds for a unique weighted circular split system s (not necessarily containing all trivial splits), with each split A|B of s having both parts contiguous in that circular order c.

Definition 2.16. We define the minimum path distance vector dN for a weighted 1-nested phylogenetic network N, where

$$d_N(i,j)=\min_{p}\Big\{\sum_{e\in p} w_N(e) \;\Big|\; p \text{ is a path connecting } i,j\Big\}$$

where the minimum is over paths p from leaf i to leaf j, and each sum is over edges in one of those paths. Examples are calculated in Figures 9, 17.
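One way to compute the minimum path distance is Dijkstra's algorithm, sketched below. This is a swapped-in standard technique, not a method described in the text; the function name, the network (a 4-cycle with one heavy edge and a unit-weight leaf on each cycle node), and all weights are hypothetical.

```python
import heapq
from itertools import count


def min_path_distance(weighted_edges, source, target):
    """Length of a minimum-weight path between two nodes (Dijkstra's algorithm)."""
    adjacency = {}
    for u, v, w in weighted_edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))
    tie = count()  # tie-breaker so the heap never compares node labels
    best = {source: 0.0}
    heap = [(0.0, next(tie), source)]
    while heap:
        dist, _, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, w in adjacency[node]:
            candidate = dist + w
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, next(tie), neighbor))
    raise ValueError("no path between the given nodes")


# Hypothetical weighted network: cycle a-b-c-d with edge d-a of weight 5.
weighted_edges = [
    ("1", "a", 1.0), ("2", "b", 1.0), ("3", "c", 1.0), ("4", "d", 1.0),
    ("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0), ("d", "a", 5.0),
]

print(min_path_distance(weighted_edges, "1", "3"))
print(min_path_distance(weighted_edges, "1", "4"))
```

Between leaves 1 and 4 the path through b and c (total weight 5) is shorter than the path using the heavy edge d-a (total weight 7), so the heavy edge does not contribute to the distance.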


Figure 9. Example of the action of Sw. Here dN = (4, 5, 6.5, 6.5, 7, 4, 3, 4.5, 4.5, 5, 4, 3.5, 3.5, 4, 5, 1, 3.5, 6.5, 3.5, 6.5, 7).

2.2.3. Resistance Distance

Now we define a new kind of pairwise distance function on the leaves of a phylogenetic network. Isolating sections of circuit-parallel paths between two leaves allows the Ohm relations, together with the Y-Δ transformation, to be used to find the effective resistance between those leaves. A simplifying fact is that the resistance between two leaves only depends on the resistances of edges that are in paths between those leaves. (We use the term pairwise circuit Pij to refer to the edges that are in any path between leaves i, j. For example see Figure 15.)

There is a well-known alternate method for calculating effective resistances. As defined in Klein and Randić (1993) and Bapat (2004) the resistance distance matrix for a graph G with n total vertices (leaves and non-leaf nodes) is given by:

$$\Omega_{ij}=\Gamma^{-1}_{ii}+\Gamma^{-1}_{jj}-2\Gamma^{-1}_{ij}$$

where Γ = L + 1/n, the Laplacian matrix of G plus the n × n matrix with 1/n for every entry. Our resistance distance for phylogenetic networks uses entries of the matrix Ω.

Definition 2.17. We define the resistance distance vector $d^R_N$ for a positive weighted phylogenetic network N, where $d^R_N(i,j)$ is the resistance distance on the graph between leaves i and j. That is, $d^R_N(i,j)=\Omega_{ij}$ for leaves i and j. The distance can also be calculated using the basic relations of Ohm's law. Examples of the resistance distance vector are in Figures 10, 20, 22.
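The matrix formula can be sketched in plain Python. Two points are assumptions of this sketch rather than statements from the text: the weighted Laplacian is built from conductances (reciprocals of the edge resistances), as is standard for effective resistance, and the inverse is computed by a simple Gauss-Jordan elimination. The function names, network, and weights are hypothetical.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def resistance_matrix(nodes, weighted_edges):
    """Omega[u, v] = G_uu + G_vv - 2 G_uv, where G is the inverse of L + 1/n."""
    index = {v: k for k, v in enumerate(nodes)}
    n = len(nodes)
    gamma = [[1.0 / n] * n for _ in range(n)]  # start from the all-1/n matrix
    for u, v, resistance in weighted_edges:
        conductance = 1.0 / resistance  # assumption: Laplacian weights are conductances
        a, b = index[u], index[v]
        gamma[a][a] += conductance
        gamma[b][b] += conductance
        gamma[a][b] -= conductance
        gamma[b][a] -= conductance
    g = invert(gamma)
    return {(u, v): g[index[u]][index[u]] + g[index[v]][index[v]]
            - 2 * g[index[u]][index[v]]
            for u in nodes for v in nodes}


# Hypothetical network: cycle a-b-c-d with edge d-a of resistance 5,
# and a unit-resistance leaf edge at each cycle node.
nodes = ["1", "2", "3", "4", "a", "b", "c", "d"]
weighted_edges = [
    ("1", "a", 1.0), ("2", "b", 1.0), ("3", "c", 1.0), ("4", "d", 1.0),
    ("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0), ("d", "a", 5.0),
]
omega = resistance_matrix(nodes, weighted_edges)
print(omega[("1", "3")])
```

For leaves 1 and 3, the series and parallel rules give 1 + (2 · 6)/(2 + 6) + 1 = 3.5, which the matrix computation should reproduce up to rounding.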


Figure 10. Example of the function Rw which takes a weighted phylogenetic network and outputs the split network associated to its resistance distance. Neighbor-net can be run on input $d^R_N$, or the results of Theorem 3.2 can be used: for instance the value 0.95 for the split {1, 2, 3, 7}|{4, 5, 6} is found by 95(1)/(95+1+1+1+1+1). Here $d^R_N$ = (3.99, 4.96, 6.41, 6.41, 6.84, 3.99, 2.99, 4.46, 4.46, 4.91, 3.96, 3.49, 3.49, 3.96, 4.91, 1, 3.49, 6.34, 3.49, 6.34, 6.75).

2.2.4. Weighted Functions

We next define functions between the weighted split networks and the weighted phylogenetic networks. As previously explained in Durell and Forcey (2020) and Forcey and Scalzo (2020a), we begin by extending the function L to a weighted version Lw.

Definition 2.18 (Forcey and Scalzo, 2020a). For a weighted circular split network s we define Lw(s) to be the 1-nested phylogenetic network $L(\overline{s})$ (the smoothed exterior subgraph of the unweighted version of s), with weighted edges. The weight of an edge in the image is found by summing the weights of splits which contribute to that edge. Let ps(e) be the set of splits A|B of s, such that A|B is represented by edges in s one of which is used to form the edge e in L(s). [Recall that e in L(s) is formed by smoothing a path of edges from the exterior subgraph of s.] If ws is the weight function on s then the weight function on Lw(s) is:

$$w_{L_w(s)}(e)=\sum_{A|B\,\in\, p_s(e)} w_s(A|B).$$

By this definition we have the following (from Forcey and Scalzo, 2020a):

Lemma 2.19. $\overline{L_w(s)} = L(\overline{s})$.

For an example of Lw see Figure 18. From Forcey and Scalzo (2020a), we have the fact that the minimum path distance is Kalmanson for planar networks. Therefore, as in that source, we can make the following:

Definition 2.20. Given a weighted unrooted phylogenetic network N that can be drawn on the plane with leaves on the exterior, we define Sw(N) to be the unique weighted circular split network with the same minimum path distance vector as N. That is, $d_N = d_{S_w(N)}$. This image is calculable, for instance, as the circular split network $S_w(N)=\mathcal{N}(d_N)$, where $\mathcal{N}$ is the neighbor-net algorithm defined by Bryant et al. (2007) and implemented as in Splits-Tree (Huson, 1998). Thus, to find Sw(N) we first calculate the minimum path distance vector, dN, and then use any algorithm (such as neighbor-net) to find the split network.

For an example see Figure 9. Another example of Sw, on a 2-nested network, is in Figure 18. When we restrict to the domain of weighted circular split networks arising from weighted 1-nested networks, the codomain of Sw is the outer-path circular split networks, and the distance vector is preserved by the map Lw. Specifically from Forcey and Scalzo (2020a) we have:

Lemma 2.21. For any weighted 1-nested phylogenetic network N, if s = Sw(N) then s is outer-path and thus $d_{L_w(s)} = d_s$.

Sw is defined using the minimum path distance metric. Similarly, since Theorem 3.1 will show that the resistance distance is Kalmanson, Lemma 2.15 allows us to make the following definition using resistance distance.

Definition 2.22. For a weighted 1-nested phylogenetic network N we define Rw(N) to be the unique weighted circular split network corresponding to the resistance distance $d^R_N$. This image is calculable, for instance, as the circular split network $R_w(N)=\mathcal{N}(d^R_N)$, where $\mathcal{N}$ is the neighbor-net algorithm defined by Bryant et al. (2007) and implemented as in Splits-Tree (Huson, 1998). The algorithm neighbor-net is guaranteed to produce Rw(N) using input $d^R_N$.

There are several algorithms for finding the unique circular split system associated to a Kalmanson network; here we recommend neighbor-net and its implementation in Huson (1998). That algorithm finds both the circular split network and its weighting. However, for a weighted 1-nested phylogenetic network N, due to our Theorems 3.1 and 3.2 we can calculate the weighted circular split network Rw(N) directly, bypassing both the calculation of the metric and the use of neighbor-net. The function Rw is shown by example in Figure 10. For another example, on a 2-nested network that happens to be Kalmanson, see Figure 22.

Remark 2.23. When restricted to phylogenetic trees, the functions Lw and Sw are both the identity, and Sw = Rw. In Forcey and Scalzo (2020a), several other properties are listed, in the process of showing that Lw and Sw form a Galois coreflection when restricted to weighted 1-nested phylogenetic networks and outer-path circular split networks. These include the facts that Lw is injective but not surjective, Sw is surjective but not injective, and Sw ○ Lw is the identity map.

3. Kalmanson Networks

The main result in this section is that the resistance metric is Kalmanson for 1-nested phylogenetic networks, and that the unique associated split network has the same exterior form as the original 1-nested phylogenetic network. First we show that $d^R_N$ obeys the Kalmanson condition: there exists a circular ordering of [n] such that for all i < j < k < l in that ordering,

$$\max\{d^R_N(i,j)+d^R_N(k,l),\; d^R_N(j,k)+d^R_N(i,l)\} \le d^R_N(i,k)+d^R_N(j,l).$$

Theorem 3.1. Given a 1-nested phylogenetic network N with positive weighted edges and n leaves, the resistance metric on its leaves is Kalmanson.

Proof: The cyclic order that we need to exist in order to demonstrate the Kalmanson property is found by choosing any cyclic order of [n] consistent with N. That is, we choose an outer planar drawing of N and use the induced cyclic order of the leaves arranged around the exterior of that drawing.

Begin by noting that for each pair of the four leaves i, j, k, l there is a sub-graph, called the pairwise circuit, for instance Pik, made of all the edges which are part of any path between those two leaves. The pairwise circuit will contain perhaps some cycles: it will in fact be a series of cycles connected by paths. We are especially interested in the intersection I of the two “crossing” pairwise circuits, I = Pik ∩ Pjl. There are three basic cases to consider.

Case 1: The intersection I is a single cycle. Here the four leaves i, j, k, l have pairwise circuits that reach the cycle I at four different nodes. Notice that each of the three sums in the Kalmanson condition adds two pairwise circuits which together include all four of the smaller pairwise circuits from each of the four leaves i, j, k, l to the node of I closest to that respective leaf. We will call those closest nodes vi, vj, vk, vl. The three sums in the Kalmanson condition all share some terms in common: those which come from the weighted edges in pairwise paths between the four leaves and the respective nodes vi, vj, vk, vl. Discarding these common terms, we are left with terms that come from the weighted edges in I. Thus, the only differences between the three sums in the Kalmanson condition arise from the different contributions of the cycle I. We denote by a, b, c, d the cumulative edge weights between the four nodes vi, vj, vk, vl, following the cyclic order. For instance, in Figure 11, a is the weight of the edge between vl and vi and b is the sum of the weights on edges of I between the nodes vi and vj.


Figure 11. Case 1 of Theorem 3.1: the highlighted edges are the intersection I of the pairwise circuits between leaves i, k and j, l.

Thus, we can write the sums explicitly:

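$$
\begin{aligned}
d_N^R(i,j)+d_N^R(k,l) &= d_N^R(i,v_i)+d_N^R(j,v_j)+d_N^R(k,v_k)+d_N^R(l,v_l)\\
&\quad+\frac{b(a+d+c)}{a+b+c+d}+\frac{d(a+b+c)}{a+b+c+d};\\
d_N^R(j,k)+d_N^R(i,l) &= d_N^R(i,v_i)+d_N^R(j,v_j)+d_N^R(k,v_k)+d_N^R(l,v_l)\\
&\quad+\frac{c(a+b+d)}{a+b+c+d}+\frac{a(b+c+d)}{a+b+c+d};\\
d_N^R(i,k)+d_N^R(j,l) &= d_N^R(i,v_i)+d_N^R(j,v_j)+d_N^R(k,v_k)+d_N^R(l,v_l)\\
&\quad+\frac{(a+d)(b+c)}{a+b+c+d}+\frac{(a+b)(c+d)}{a+b+c+d}.
\end{aligned}
$$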

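After discarding the common terms, we consider just the remaining sums of fractions. All the edge weights are positive, and the denominators of all three are the same. Expanded, the three numerators are ab + ad + bc + cd + 2bd, ab + ad + bc + cd + 2ac, and ab + ad + bc + cd + 2ac + 2bd, respectively. Thus the third sum has a numerator larger than either of the first two: it exceeds the first by 2ac and the second by 2bd.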
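The comparison in Case 1 can be checked with exact arithmetic. The following is a minimal Python sketch and not part of the original proof; the helper names `parallel` and `cycle_sums` and the sample arc weights are illustrative assumptions. It evaluates the contributions of the cycle I to the three sums and confirms that the crossing sum exceeds the other two by 2ac and 2bd over the common denominator.

```python
from fractions import Fraction


def parallel(r1, r2):
    """Effective resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def cycle_sums(a, b, c, d):
    """Contributions of the cycle I to the three sums in the Kalmanson condition.

    a, b, c, d are the cumulative arc weights v_l-v_i, v_i-v_j, v_j-v_k, v_k-v_l.
    """
    ij_kl = parallel(b, a + c + d) + parallel(d, a + b + c)
    jk_il = parallel(c, a + b + d) + parallel(a, b + c + d)
    ik_jl = parallel(b + c, a + d) + parallel(c + d, a + b)
    return ij_kl, jk_il, ik_jl


# Sample weights (an arbitrary illustrative choice).
a, b, c, d = (Fraction(w) for w in (1, 2, 3, 4))
total = a + b + c + d
ij_kl, jk_il, ik_jl = cycle_sums(a, b, c, d)

# The crossing sum is strictly larger, by 2ac/total and 2bd/total respectively.
assert ik_jl - ij_kl == 2 * a * c / total
assert ik_jl - jk_il == 2 * b * d / total
print(ij_kl, jk_il, ik_jl)
```

Using `Fraction` rather than floating point keeps the comparison exact, so the strict inequalities are not blurred by rounding.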

Case 2: The intersection I is a series of cycles containing at least two cycles. In this case there are two possible ways that the inequalities are satisfied, depending on which pair of consecutive leaves (i, j or j, k) reach the same end of I, that is, have their attaching nodes ($v_i, v_j$ or $v_j, v_k$) at the same end of I. In Figure 12 we choose i, j to do so, on the left-hand cycle; the other option is similar. The equality $d_N^R(i,k)+d_N^R(j,l)=d_N^R(i,l)+d_N^R(j,k)$ can be checked visually, since the two sums end up using precisely the same effective resistances. That is, both sums have all terms in common: the portions from the paths outside of I, as in Case 1, and the summands contributed by I, which are the terms:

$$
\frac{c(a+b)}{a+b+c}+\frac{a(b+c)}{a+b+c}+\frac{x(w+y)}{w+x+y}+\frac{w(x+y)}{w+x+y}.
$$

The inequality dNR(i,k)+dNR(j,l)>dNR(i,j)+dNR(k,l) (for the subcase where again i, j reach the same end of I) is easily checked. Here, after discarding the terms in common, the larger sum contains more terms than the smaller (from the parts of I not in the pairwise circuits for i, j and k, l). As well, when the smaller sum has terms with denominator matching a term in the larger, the numerator is indeed larger in the latter. For instance, in Figure 12, after discarding the common terms contributed by the paths outside of I, the sum dNR(i,j)+dNR(k,l) has the sum contributed by I:

$$
\frac{b(a+c)}{a+b+c}+\frac{y(w+x)}{w+x+y}.
$$

The numerator here is exceeded by the sum contributed by I in dNR(i,k)+dNR(j,l) as just listed above. Finally, notice that there are sub-cases of Case 2 in which the smaller sum will have fewer or no terms at all contributed by I; these occur when I includes a path at one end or at both ends. See Figure 13 for example.

Figure 12. Case 2 of Theorem 3.1: the highlighted edges are the intersection I of the pairwise circuits between leaves i, k and j, l.

Figure 13. Case 2 of Theorem 3.1 continued. Here vi = vj and vk = vl.

Case 3: The intersection I is a path. In this case it is quickly verified that the Kalmanson inequality is satisfied as an equality. See Figure 14 for example.

Figure 14. Case 3 of Theorem 3.1.

The fact that effective resistance distance is a Kalmanson metric immediately suggests that it would be a good candidate for modeling weighted phylogenetic networks. First there is the intuition from experience that if two pathways of heredity exist, the ancestor individual or species will have more in common with the extant individual or species. Thus, mutations in the genetic code play the role of resistors to the flow of information.

Secondly, Kalmanson metrics are known to be the only example for which each metric is represented uniquely by a circular split system, as seen in Lemma 2.15. In the case of the resistance distance, the associated unique split network has an additional advantage: it is guaranteed to represent faithfully every split displayed by the original 1-nested network.

Theorem 3.2. Given a 1-nested phylogenetic network N with positive weighted edges and n leaves, and letting dNR be the resistance metric on the n leaves, then the unique associated split network Rw(N)=N(dNR) displays precisely the same splits as displayed by N.

Proof: A split A|B can be displayed by N in three possible ways: by a single bridge e with weight w(e), by a pair of edges both in the same cycle c with respective weights $a_c$ and $x_c$, or in more than one way. Let the weight of a specific display of a split in N be w(e) in the first case and $(a_c x_c)/z_c$ in the second case, where $z_c$ is the sum of all the weights in the cycle. We claim: if the split A|B in $\Sigma(\overline{N})$ is assigned the sum of the weights of all distinct displays of that split in N, then the resulting distance metric d from the weighted split network thus constructed is indeed $d_N^R$. Since Theorem 3.1 shows that $d_N^R$ is Kalmanson, we will then conclude that the weighted split network thus constructed is equal to the unique split network corresponding to $d_N^R$, as found for instance by the algorithm neighbor-net.

First we check that the claim holds. Consider the pairwise circuit Pij in N for a given pair i, j of leaves. It will be a series of paths and cycles, as seen for example in Figure 15. Thus each cycle c in Pij will be split into two circuit-parallel paths pc and qc of respective lengths p, q. Both paths begin and end at the two nodes where that cycle is attached to the rest of the series. Now the resistance distance dNR(i,j) will be the sum of the weights of the (non-circuit-parallel) paths, and of the effective resistances of the circuit-parallel paths. Specifically, every weighted edge of Pij not in a cycle will contribute its weight to the sum, and every weighted edge in a cycle of Pij will appear in one of two factors in the numerator of the term giving the effective resistance from those circuit-parallel paths. We see that

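$$
\begin{aligned}
d_N^R(i,j) &= \sum_{e \in P_{ij}} w(e) + \sum_{c \in P_{ij}} \frac{(c_1+\cdots+c_p)(c_{p+1}+\cdots+c_{p+q})}{c_1+\cdots+c_{p+q}}\\
&= \sum_{e \in P_{ij}} w(e) + \sum_{c \in P_{ij}} \; \sum_{c_m \in p_c,\; c_r \in q_c} \frac{c_m c_r}{z_c}
\end{aligned}
$$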

where $c_1, \dots, c_p$ and $c_{p+1}, \dots, c_{p+q}$ are the edge weights along the two circuit-parallel paths of a cycle $c \in P_{ij}$, with $z_c = c_1 + \cdots + c_{p+q}$ being the total weight of c. That is, we expand the numerator of each term from a cycle. Now, the distance metric corresponding to the weighted split network we constructed using $\Sigma(\overline{N})$ has distance

$$
d(i,j)=\sum_{\substack{A|B \in N \\ i \in A,\; j \in B}} w(A|B).
$$

Now splits in N, and thus in Σ(N¯), which separate leaves i, j are precisely those displayed by a bridge in Pij or by a pair of circuit-parallel edges in a cycle of Pij. Thus, using the weights for splits (as stated above):

$$
w(A|B)=\sum_{A|B \text{ disp. by } e} w(e)+\sum_{A|B \text{ disp. by } c_m, c_r \in c} \frac{c_m c_r}{z_c}
$$

in the split metric, gives us the desired claim: d=dNR.

Figure 15. The highlighted subgraph is the pairwise circuit Pij.

Then we conclude that since the weighted circular split network associated to the original Kalmanson metric dNR is the unique such network where the split metric equals the original Kalmanson metric, then N(dNR) will have precisely the splits of N and thus of Σ(N¯). 
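The claim in the proof of Theorem 3.2 can be checked on a small example. The sketch below is illustrative only: the network (one cycle of five nodes, each carrying a single leaf on a bridge), its weights, and the helper names `resistance`, `normalize`, and `split_metric` are assumptions made for the example. It assigns each split the sum of the weights of its displays, as in the proof, and confirms that the resulting split metric equals the resistance distance.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical single-cycle network: node t carries leaf t on a bridge of weight
# pendant[t]; cycle[t] is the weight of the cycle edge from node t to node t+1 (mod n).
cycle = [Fraction(w) for w in (2, 3, 1, 4, 5)]
pendant = [Fraction(w) for w in (1, 1, 2, 1, 3)]
n = len(cycle)
z = sum(cycle)  # total weight z_c of the cycle


def resistance(i, j):
    """Resistance distance between leaves i < j: two bridges plus two parallel arcs."""
    arc = sum(cycle[i:j])
    return pendant[i] + pendant[j] + arc * (z - arc) / z


def normalize(side):
    """Name a split by the side that does not contain leaf 0."""
    side = frozenset(side)
    return frozenset(range(n)) - side if 0 in side else side


# Each split receives the sum of the weights of all of its displays.
splits = {}
for t in range(n):  # displays by a bridge, weight w(e)
    key = normalize({t})
    splits[key] = splits.get(key, 0) + pendant[t]
for m, r in combinations(range(n), 2):  # displays by a pair of cycle edges
    key = normalize(range(m + 1, r + 1))
    splits[key] = splits.get(key, 0) + cycle[m] * cycle[r] / z


def split_metric(i, j):
    """Sum of the weights of the splits separating leaves i and j."""
    return sum(w for side, w in splits.items() if (i in side) != (j in side))


for i, j in combinations(range(n), 2):
    assert split_metric(i, j) == resistance(i, j)
print(len(splits), "weighted splits reproduce the resistance distance")
```

In this example each trivial split is displayed twice, once by a bridge and once by a pair of adjacent cycle edges, which is the "more than one way" case of the proof.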

Remark: The fact that we can take a weighted 1-nested phylogenetic network N and build a weighted circular split network s which has the same metric, ds=dNR, implies another proof that the resistance distance is Kalmanson. Since the circular split network is planar, and the split metric on it is the same as the minimum path network on it, that metric is guaranteed to be Kalmanson. However, our original proof has the advantage that we see which of the inequalities are strict, and which are actually equalities.

The first important implication of these theorems is that the resistance distance on any 1-nested phylogenetic network N is precisely represented by a unique circular split network N(dNR). Exactly all the splits displayed by the original N are present in N(dNR). Thus the function L applied to the unweighted version of N(dNR) returns the unweighted version of N itself.

Theorem 3.3. Given weighted 1-nested N, we have that N(dNR)¯=Σ(N¯). Thus, L(N(dNR))¯=N¯.

Proof: The first equality follows directly from Theorem 3.2, since neighbor-net is guaranteed to output the splits of the unique circular split network associated to the Kalmanson metric given by the resistance distance, which is indeed all the splits displayed by the network N. Then from Forcey and Scalzo (2020a), we have the second equality since L ○ Σ is shown there to be the identity map. 

The first application implied by this result is that when using neighbor-net on a measured distance matrix, if we assume that it reflects a resistance distance, we can always recover the form of the original network. The weights of splits in the output of neighbor-net are interesting in their own right: they are terms in the expansion of the calculated resistance distance. However, the first advantage we see is that the original unweighted phylogenetic network can be directly recovered by taking the exterior of the output of neighbor-net.

As an alternative to neighbor-net, there are polytopes which can serve as the domain for linear programming that finds the best-fit 1-nested phylogenetic network for a measured distance matrix.

4. Resistance Distance and Polytopes

In Durell and Forcey (2020), the authors describe a new family of polytopes. This family lies between the Symmetric Traveling Salesman Polytope [STSP(n)] and the Balanced Minimum Evolution Polytope [BME(n)]. Our polytopes are called the level-1 network polytopes BME(n, k) for 0 ≤ k ≤ n − 3. All have dimension $\binom{n}{2}-n$. In Forcey and Scalzo (2020a), we looked at implications of the Galois connections studied there for these polytopes, especially using Sw, the function based on minimum path distance. It turns out that if we assume an input distance metric represents the resistance distance on a 1-nested phylogenetic network N, then the result of neighbor-net or of linear programming on a BME polytope is a network accurately showing all the splits of N. Also, neighbor-net is statistically consistent, as shown in Bryant et al. (2007). Therefore, as a measured set of pairwise distances approaches the resistance distance of N, the output of neighbor-net will approach the faithfully phylogenetic circular split network N. This is in contrast to minimum path distance where some genetic connections are assumed to be negligible, and then are lost in the output of neighbor-net. However, the theorems about minimum path distance, specifically Theorems 8, 9, and 11 of Durell and Forcey (2020), play an important role in the proof of Theorem 4.5 here. Here we repeat some of the same introductory definitions and remarks and then extend the results to resistance distance.

Definition 4.1. For a binary, 1-nested phylogenetic network N, (weighted or unweighted) the vector x(N) is defined to have lexicographically ordered components xij(N) for each unordered pair of distinct leaves i, j ∈ [n] as follows:

$$
x_{ij}(N)=\begin{cases} 2^{k-b_{ij}} & \text{if there exists a cyclic order } c \text{ consistent with } N \text{ with } i, j \text{ adjacent in } c,\\ 0 & \text{otherwise,} \end{cases}
$$

where k is the number of bridges in N and bij is the number of bridges traversed in a path from i to j. For example, see Figure 16.

Figure 16. Using N from Figure 10, we find N′ as in the proof of Theorem 4.5. In the vector x(N) the first component is $x_{1,2}=2^{1-0}$, since there are no non-trivial bridges traversed between leaves 1 and 2. As well, there are two consistent circular orders, with leaves 1 and 2 adjacent, found by twisting around the single non-trivial bridge. The 19th entry is $x_{5,6}=2^{1-1}$, since the path between leaves 5 and 6 traverses the non-trivial bridge. Here the minimum path distance vector is:
$d_{N'}$ = (3.99, 4.96, 6.43, 6.43, 6.84, 3.99, 2.99, 4.46, 4.46, 4.93, 3.96, 3.49, 3.49, 3.96, 4.93, 1, 3.49, 6.34, 3.49, 6.34, 6.75).

The convex hull of all the x(N) such that binary N has n leaves and k nontrivial bridges is the level-1 network polytope BME(n, k). As shown in Durell and Forcey (2020), the vertices of BME(n, k) are precisely the vectors x(N) for N binary with n leaves and k nontrivial bridges. In light of Theorems 3.1 and 3.2, we can now characterize the vertices in terms of resistance distance:

Theorem 4.2. Every 1-nested phylogenetic network found as an image L[Rw(N)¯] gives rise to a face of BME(n, k) for some k. In particular, the vertices of the polytopes BME(n, k) correspond to images L[Rw(N)¯] which exhibit k non-trivial bridges, for weighted 1-nested networks N with n leaves and such that any node not in a cycle has degree three.

Proof: The image Rw(N) will faithfully represent all splits, as seen in Theorem 3.2. Thus, Rw(N)¯ will be faithfully phylogenetic, in the range of Σ. Specifically, the function Rw will introduce bridges that separate all cycles, thus ensuring that any node in a cycle will have degree three. Therefore, if the non-cycle nodes of N are degree three, L[Rw(N)¯] will be a binary unweighted 1-nested phylogenetic network.

Also as shown in Durell and Forcey (2020) and repeated in Forcey and Scalzo (2020a), an equivalent definition of the vector x(N) is the vector sum of the vertices of the STSP(n) which correspond to cyclic orders consistent with N. Recall that the vertices of STSP(n) are the incidence vectors x(c) for each cyclic order c of n, where the i, j component is 1 for i and j adjacent in the order c, 0 otherwise. This equivalent definition for binary 1-nested phylogenetic networks may also be applied to any 1-nested phylogenetic network:

Lemma 4.3. For a 1-nested phylogenetic network N, the vector x(N) is equal to $\sum_c x(c)$, where the sum is over all cyclic orders c of [n] consistent with N.

We point out, for the sake of attribution, that for phylogenetic trees t (with nodes of any degree), Lemma 4.3 with N = t gives a formula for x(t) that agrees with the definition of the coefficient nt in Semple and Steel (2004), in the proof of Theorem 4.2 of that paper.
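Definition 4.1 and Lemma 4.3 can be compared by brute force on a small tree. The following sketch is an illustration under stated assumptions: the example is the 5-leaf caterpillar tree with cherries {1, 2} and {4, 5}, a cyclic order is taken to be consistent with the tree when each bridge split occupies consecutive positions in the order, and the helper names `is_arc` and `cyclic_orders` are hypothetical.

```python
from itertools import combinations, permutations

n = 5
leaves = range(1, n + 1)
# One side of each non-trivial bridge split of the caterpillar tree ((1,2),3,(4,5)).
bridges = [frozenset({1, 2}), frozenset({4, 5})]


def is_arc(order, side):
    """True if the leaves in `side` occupy consecutive positions in the cyclic order."""
    flags = [leaf in side for leaf in order]
    changes = sum(flags[t] != flags[(t + 1) % len(order)] for t in range(len(order)))
    return changes == 2


def cyclic_orders(n):
    """Yield each cyclic order of 1..n once, up to rotation and reflection."""
    for rest in permutations(range(2, n + 1)):
        if rest[0] < rest[-1]:  # keep one order from each mirror pair
            yield (1,) + rest


consistent = [c for c in cyclic_orders(n) if all(is_arc(c, s) for s in bridges)]

# Lemma 4.3: sum the incidence vectors x(c) over the consistent cyclic orders.
x_sum = {pair: 0 for pair in combinations(leaves, 2)}
for c in consistent:
    for t in range(n):
        i, j = sorted((c[t], c[(t + 1) % n]))
        x_sum[(i, j)] += 1

# Definition 4.1: 2^(k - b_ij), where b_ij counts the bridges separating i and j.
k = len(bridges)
x_def = {}
for i, j in combinations(leaves, 2):
    b_ij = sum((i in s) != (j in s) for s in bridges)
    x_def[(i, j)] = 2 ** (k - b_ij)

assert x_sum == x_def
print([x_sum[pair] for pair in combinations(leaves, 2)])
```

For a tree every pair of leaves is adjacent in some consistent order, so the "otherwise" branch of Definition 4.1 does not arise in this example.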

In Forcey and Scalzo (2020a), it is shown that the minimum path distance vector for a 1-nested phylogenetic network may be seen as a linear functional, and that it is minimized over the BME(n, k) polytope. Specifically,

Theorem 4.4. Given any weighted 1-nested phylogenetic network N with n leaves, the product $x(\hat{N}) \cdot d_N$ is minimized over BME(n, k) precisely for the unweighted binary 1-nested networks $\hat{N}$ with k bridges such that $\overline{S_w(N)} \le \Sigma(\hat{N})$.

Here N^ is used to denote a variable binary 1-nested phylogenetic network, taking values from the set of networks which refine Sw(N)¯. By this refinement we mean taking values from the set of networks with a superset of the set of splits displayed by Sw(N)¯. Now we can extend that result to resistance distances. In fact it becomes stronger: binary networks can be directly recovered even when they have long edges, since the action of Rw preserves all splits. Precisely, we have:

Theorem 4.5. The minimum of x(N)·dNR is achieved at the face of BME(n, k) with vertices x(N^), for unweighted binary networks N^ with k bridges such that N^ refines N¯.

Proof: We claim that $x(N) \cdot d_N^R$ is the same as $x(N) \cdot d_{N'}$ for $N' = L_w(R_w(N))$. That is because leaves which are adjacent in some circular order consistent with N, and thus with $R_w(N)$, have distance between them equal to the sum of the weights of the splits that separate them. Since those leaves are adjacent, the shortest path of splits between them will lie on the exterior of $R_w(N)$. In fact, for adjacent i, j, an edge with weight a of a cycle on the path between them contributes $\frac{a(b+c+d+\cdots)}{a+b+c+d+\cdots}$ to $d_N^R(i,j)$, where the other edges of that cycle have weights b, c, d, .... Bridges e between them contribute their weights w(e). These values are the same as those for the splits displayed between i, j, seen in the proof of Theorem 3.2. Therefore:

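$$
\begin{aligned}
x(N) \cdot d_N^R &= \sum_c x(c) \cdot d_N^R\\
&= \sum_c x(c) \cdot d_s, \text{ for } s = R_w(N)\\
&= \sum_c x(c) \cdot d_{N'}, \text{ for } N' = L_w(R_w(N))\\
&= x(N) \cdot d_{N'}.
\end{aligned}
$$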

We know from Theorems 8, 9, and 11 of Durell and Forcey (2020) that for any weighted 1-nested phylogenetic network M with n leaves, the product $x(\hat{M}) \cdot d_M$ is minimized over BME(n, k) precisely for binary networks $\hat{M}$ with k bridges such that $\overline{M} \le \hat{M}$.

Thus, in our case $x(\hat{N}) \cdot d_{N'}$ is minimized over BME(n, k) precisely for the unweighted binary networks $\hat{N}$ with k bridges such that $\overline{N'} \le \hat{N}$. Here $\overline{N'} = \overline{N}$, since $\overline{L_w(R_w(N))} = \overline{N}$. The inequality here is refinement.

For example, compare Figures 10 and 16. It is easily checked that although $d_{N'} \neq d_N^R$, we have $x(N) \cdot d_{N'} = x(N) \cdot d_N^R = 51.4$.

The implication then is that using either linear programming on BME(n, 0) or neighbor-net, assuming that the resistance metric is valid, the resulting split network gives the true exterior form of the original 1-nested phylogenetic network.

5. 2-Nested Networks, Counterexamples, and Conjectures

In this section, we examine functions between 1-nested and 2-nested networks, and circular split networks. We point out, via examples, how well the various distance measurements distinguish or fail to distinguish between network types. Then we make some conjectures based on observations.

5.1. 2-Nested Networks

Toward the end of Gambette et al. (2017), the authors ask: is it possible to characterize split systems induced by more complex uprooted networks such as 2-nested networks (i.e., networks obtained from 1-nested networks by adding a chord to a cycle)? At first we interpret this question to be about the result of applying Sw. That is, we specialize the question to asking more specifically which kinds of split systems correspond to 2-nested networks, via assigning them a weighting, finding the minimum path distance, and then finding the unique corresponding circular split network? The question is still open, but we begin by carefully defining 2-nested networks and making some initial observations.

Definition 5.1. For N an unrooted phylogenetic network, if every edge of N is part of at most two cycles, we call it a 2-nested network. By this definition, 2-nested networks contain 1-nested networks as a subset, which in turn contain 0-nested networks, which are phylogenetic trees. By strict k-nested networks we mean k-nested but not (k − 1)-nested. We will add the extra descriptor of triangle-free-ness explicitly when desired.

A weighted 2-nested network is shown in Figure 17, with its minimum path distance vector.

Figure 17. The minimum path distance vector for the weighted 2-nested network N is dN = (4, 7, 5, 8, 7, 5, 7, 10, 9, 7, 13, 12, 10, 9, 3). Note that d14 = 5, for example, referring to the shortest distance between leaves 1 and 4.

The first case we note is that weighted 2-nested networks often have images under Sw that are not outer-path circular split networks. For instance see Figure 18. Therefore, by Lemma 5.4, 2-nested networks can lead to split networks distinct from those induced via Sw from 1-nested networks. Also, applying Sw and then Lw in sequence will produce a weighted 1-nested network that has a different distance vector than the original.

Figure 18. Here the output of Sw(N) is a non-outer-path circular split network, and its image under Lw has a distance vector that does not match the original: for instance dN(1, 4) = 4 but the distance from 1 to 4 in Lw[Sw(N)] is 5.

However, not all weighted 2-nested networks lead to distinct images from the 1-nested networks, under Sw. In fact we have the following:

Theorem 5.2. For every weighted 1-nested network M, there exists some (not unique) weighted 2-nested network N such that the minimum path distance vectors coincide: dM = dN.

Proof: Consider a 1-nested network M with positive values for its edges and a 2-nested network N that has the same exterior subgraph. Let N also have the same positive values for its exterior edges, but a positive value for its internal chord large enough such that on paths of least distance the internal chord of the 2-nested network is never used. Therefore, both networks will have the same distance vector dM = dN.
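The construction in this proof can be illustrated numerically. The sketch below is an assumption-laden example rather than the authors' computation: the small network, its weights, and the function name `min_path_vector` are hypothetical. It adds a chord to a 4-cycle, once with a weight equal to the whole cycle weight and once with a small weight, and compares the minimum path distance vectors.

```python
from itertools import combinations


def min_path_vector(n_nodes, edges, leaves):
    """Minimum path distances between all pairs of leaves (Floyd-Warshall)."""
    inf = float("inf")
    dist = [[0 if u == v else inf for v in range(n_nodes)] for u in range(n_nodes)]
    for u, v, w in edges:
        dist[u][v] = dist[v][u] = min(dist[u][v], w)
    for mid in range(n_nodes):
        for u in range(n_nodes):
            for v in range(n_nodes):
                dist[u][v] = min(dist[u][v], dist[u][mid] + dist[mid][v])
    return [dist[i][j] for i, j in combinations(leaves, 2)]


leaves = [0, 1, 2, 3]
# 1-nested network M: leaves 0..3 hang off the 4-cycle on nodes 4-5-6-7.
edges_M = [(0, 4, 1), (1, 5, 1), (2, 6, 1), (3, 7, 1),
           (4, 5, 2), (5, 6, 3), (6, 7, 2), (7, 4, 1)]
cycle_weight = 2 + 3 + 2 + 1

# 2-nested network N: same exterior, plus a chord too heavy to lie on a shortest path.
edges_N = edges_M + [(4, 6, cycle_weight)]
# For contrast, a light chord that does shorten some paths.
edges_light = edges_M + [(4, 6, 1)]

d_M = min_path_vector(8, edges_M, leaves)
d_N = min_path_vector(8, edges_N, leaves)
d_light = min_path_vector(8, edges_light, leaves)

assert d_M == d_N        # the heavy chord is invisible to minimum path distance
assert d_M != d_light    # a light chord changes the distance vector
print(d_M, d_light)
```

Taking the chord weight equal to the total cycle weight is one convenient way to guarantee that no shortest path uses the chord; any sufficiently large positive value would serve.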

5.1.1. Counting 2-Nested Networks

We begin by counting the unweighted binary, triangle-free, 2-nested networks. The numbers of such networks with n leaves are 6, 120, and 2,790 for n = 4, 5, 6, respectively.

First, consider structures with 4 leaves (n = 4). We start by considering the unlabeled pictures, and then count the ways to assign the values 1, …, 4 to the leaves. In fact, we can simplify further by finding the unlabeled 1-nested networks and showing the potential locations of chords simultaneously in each picture. There is one such unlabeled picture for n = 4 as shown in Figure 19, with two possible internal chords. There are $3!/2$ ways to arrange the leaves before choosing a chord. Therefore, the total number of unweighted binary triangle-free 2-nested networks with n = 4 leaves is $(2)\frac{3!}{2}=6$.

Figure 19. For n = 4, there is only one exterior structure with two internal chords possible (as seen by the dotted lines above). For n = 5, there exist two exterior structures. For n = 6 there are six such structures, labeled (a–f).

For n = 5 the possible internal structures are shown in Figure 19. There are 5 possible internal chords for one structure, and 2 possible internal chords for the other. The number of ways to arrange the leaves of the first structure is n!, and of the second structure is (n − 1)! (since the first is not rotationally symmetric). However, reading the leaves clockwise and counterclockwise yields the same arrangement, so we must then divide by 2 to eliminate half of the arrangements counted. Finally, if there is a bridge connecting any components of the structure, we divide by 2 for the twisting around that bridge. The counting for each n = 5 structure in Figure 19 is as follows:

$$
\frac{5(2)}{2}\cdot\frac{4!}{2}=60,
$$
$$
\frac{4(1)}{2}\cdot\frac{5!}{2}\cdot\frac{1}{2}=60.
$$

The total number of networks for n = 5 is 60 + 60 = 120.

For n = 6 the counting for each structure is as follows (from a to f as pictured in Figure 19):

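$$
\begin{aligned}
\text{(a)}\quad & \frac{6(3)}{2}\cdot\frac{5!}{2}=540,\\
\text{(b)}\quad & (2)(2)\cdot\frac{4(1)}{2}\cdot\frac{6!}{2}\cdot\frac{1}{2}\cdot\frac{1}{2}=720,\\
\text{(c)}\quad & \frac{4(1)}{2}\cdot\frac{6!}{2}\cdot\frac{1}{4}\cdot\frac{1}{2}=90,\\
\text{(d)}\quad & \frac{5(2)}{2}\cdot\frac{6!}{2}\cdot\frac{1}{2}=900,\\
\text{(e)}\quad & \frac{4(1)}{2}\cdot\frac{6!}{2}\cdot\frac{1}{4}=180,\\
\text{(f)}\quad & \frac{4(1)}{2}\cdot(6!)\cdot\frac{1}{4}=360.
\end{aligned}
$$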

The total number of networks for n = 6 is 540 + 720 + 90 + 900 + 180 + 360 = 2,790. Notice for (f), reading the labels clockwise is not equivalent to reading them counterclockwise due to the tree structures. This means we just consider 6! and not $6!/2$. We ask whether there is a general formula for the number of binary triangle-free 2-nested networks with n leaves. Alternatively, we might look for a 2-variable formula. In Durell and Forcey (2020), there is a 2-variable formula for binary triangle-free 1-nested networks with n leaves and k non-trivial bridges, which may serve as a model:

$$
\binom{n-3}{k}\frac{(n+k-1)!}{(2k+2)!!}.
$$
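The arithmetic of this subsection can be re-checked with a short script. This is a sketch under assumptions: the factor lists encode the displayed products as read here, with the chord count of an m-cycle taken to be m(m − 3)/2 and the symmetry factors for structure (e) taken to be those that reproduce the stated value 180; the names `chords` and `one_nested` are hypothetical. The 2-variable formula is read as a binomial coefficient times (n + k − 1)!/(2k + 2)!!, using the identity (2k + 2)!! = 2^(k+1) (k + 1)!.

```python
from fractions import Fraction
from math import comb, factorial as f


def chords(m):
    """Number of chords of an m-cycle, m(m-3)/2 (matches 5(2)/2, 4(1)/2, 6(3)/2)."""
    return Fraction(m * (m - 3), 2)


half, quarter = Fraction(1, 2), Fraction(1, 4)

# Products per structure, in the order used in the text.
counts = {
    4: [chords(4) * Fraction(f(3), 2)],
    5: [chords(5) * Fraction(f(4), 2),
        chords(4) * Fraction(f(5), 2) * half],
    6: [chords(6) * Fraction(f(5), 2),                       # (a)
        2 * 2 * chords(4) * Fraction(f(6), 2) * half * half,  # (b)
        chords(4) * Fraction(f(6), 2) * quarter * half,       # (c)
        chords(5) * Fraction(f(6), 2) * half,                 # (d)
        chords(4) * Fraction(f(6), 2) * quarter,              # (e), assumed factors
        chords(4) * f(6) * quarter],                          # (f)
}
totals = {n: sum(products) for n, products in counts.items()}


def one_nested(n, k):
    """Binary triangle-free 1-nested networks with n leaves and k non-trivial bridges."""
    double_factorial = 2 ** (k + 1) * f(k + 1)  # (2k+2)!!
    return comb(n - 3, k) * f(n + k - 1) // double_factorial


assert totals == {4: 6, 5: 120, 6: 2790}
print(totals, [one_nested(6, k) for k in range(4)])
```

As a consistency check on the reading of the formula, `one_nested(6, 3)` gives 105, the number of binary trees on 6 leaves.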

5.2. Indistinguishable Weightings

Resistance distance metrics on a 1-nested phylogenetic network are not in bijection with edge weightings, but the split-equivalence class is an invariant of those edge weights. That is, if two networks N and N′ have the same resistance distance metric $d_N^R=d_{N'}^R$, this does not imply that N = N′, but it does imply that $\overline{N}=\overline{N'}$. The latter fact is implied by Lemma 2.15 and the theorems of section 3, and we can see the former fact via counterexample. In Figure 20, we show two weighted phylogenetic networks with four leaves, called N and N′. Their resistance distances between leaves are identical:

$$
d_N^R=d_{N'}^R=\left(\frac{122}{23},\frac{178}{23},\frac{108}{23},\frac{198}{23},\frac{168}{23},\frac{176}{23}\right).
$$

Figure 20. Two weighted phylogenetic networks with identical resistance distances for their leaves.

Note that we do see that $\overline{N}=\overline{N'}$. There are 7 split-classes of 1-nested phylogenetic networks on four leaves, and our theorems show that none of the other six classes can be given edge weights that yield this same resistance distance metric on four leaves.

5.3. Non-Kalmanson Networks

Not all resistance distances are Kalmanson, even when restricted to phylogenetic networks. For a counterexample, consider the network N formed by having six leaves attached to the six vertices of the complete bipartite graph K3,3, pictured in Figure 21.

Figure 21. A phylogenetic network with non-Kalmanson resistance distance. All the edge lengths are 1.

The resistance distance metric for complete bipartite graphs is found in Klein and Randić (1993). Recall that $K_{m,n}$ is the graph join of two edgeless graphs, $K_{m,n}=\overline{K}_m+\overline{K}_n$, here with unit weight for each edge. Then the resistance distance on $K_{m,n}$ is 2/n for two vertices in the part of size m (they have no edge between them, being both the same color), and (m + n − 1)/mn for vertices with an edge between them (Klein and Randić, 1993). For our example N, let the two (same-colored) parts of the graph (3 nodes each, say red and blue) be attached to the leaves {1, 2, 3} and {4, 5, 6}, respectively. Letting each edge have weight 1, we find the resistance distance between any two leaves attached to the same colored part is 2 + 2/3 = 8/3, while the distance between any two leaves, with one attached to each part, is 2 + 5/9 = 23/9. In any circular order of the leaves, there will be a sub-sequence i, j, k, l where the first two leaves i, j are attached to the same color, and the second two k, l are both attached to the other color. Thus $d_N^R(i,j)+d_N^R(k,l)=16/3=48/9$, which is larger than $d_N^R(i,k)+d_N^R(j,l)=46/9$. This counterexample raises the question of necessary conditions for a network with resistance distance to be Kalmanson.
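The counterexample can be verified exactly by solving the circuit equations. The following is a sketch, not the authors' computation: the function `effective_resistance` (which grounds one terminal and solves the reduced Laplacian system over exact fractions), the helper `is_kalmanson`, and the node numbering are assumptions made for illustration. It reproduces the values 8/3 and 23/9 and then checks every circular order of the six leaves.

```python
from fractions import Fraction
from itertools import combinations, permutations


def effective_resistance(n_nodes, edges, s, t):
    """Effective resistance between nodes s and t; edges are (u, v, resistance).

    Node t is grounded, a unit current is injected at s, and the reduced
    Laplacian system is solved by Gauss-Jordan elimination; the potential
    at s is then the effective resistance.
    """
    keep = [u for u in range(n_nodes) if u != t]
    row = {u: r for r, u in enumerate(keep)}
    m = len(keep)
    aug = [[Fraction(0)] * (m + 1) for _ in range(m)]
    for u, v, res in edges:
        g = 1 / Fraction(res)  # conductance
        for p, q in ((u, v), (v, u)):
            if p != t:
                aug[row[p]][row[p]] += g
                if q != t:
                    aug[row[p]][row[q]] -= g
    aug[row[s]][m] = Fraction(1)
    for col in range(m):
        piv = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        head = aug[col][col]
        aug[col] = [x / head for x in aug[col]]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return aug[row[s]][m]


# K_{3,3} on nodes 0..5 (red 0,1,2; blue 3,4,5); leaf with label u+1 is node u+6.
edges = [(u, v, 1) for u in (0, 1, 2) for v in (3, 4, 5)]
edges += [(u, u + 6, 1) for u in range(6)]
d = {}
for i, j in combinations(range(1, 7), 2):
    d[i, j] = d[j, i] = effective_resistance(12, edges, i + 5, j + 5)


def is_kalmanson(order):
    """Check the Kalmanson condition for every i, j, k, l taken in this circular order."""
    for i, j, k, l in combinations(order, 4):
        crossing = d[i, k] + d[j, l]
        if d[i, j] + d[k, l] > crossing or d[i, l] + d[j, k] > crossing:
            return False
    return True


assert d[1, 2] == Fraction(8, 3) and d[1, 4] == Fraction(23, 9)
# Fixing leaf 1 first covers every circular order up to rotation.
assert not any(is_kalmanson((1,) + rest) for rest in permutations(range(2, 7)))
print(d[1, 2], d[1, 4])
```

Exact fractions are used so that the comparison of 48/9 with 46/9 does not depend on floating-point tolerance.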

5.4. Outer Planarity

We conjecture that outer planarity is a sufficient condition for the Kalmanson property: that if a weighted phylogenetic network can be drawn in the plane with its leaves on the exterior, then its resistance distance is Kalmanson. We note that this condition is not necessary: it can be checked that the complete graph K5 with unit edges has the Kalmanson property.

5.5. Faithfully Phylogenetic Kalmanson Distance Vectors

Following the terminology in Definition 2.7, we call a Kalmanson distance vector d faithfully phylogenetic if the unique circular split network associated to d is in the range of Σ (after forgetting weights). We conjecture that faithfully phylogenetic Kalmanson distance vectors always arise from resistance distances. Specifically we conjecture that if d is faithfully phylogenetic, then d=dNR for some weighted phylogenetic network N. Note that not all Kalmanson distance vectors arise from resistance distances, simply due to the fact that not all circular split networks are in the range of Σ.

5.6. 2-Nested Kalmanson Networks

A special case of 5.5 is the conjecture that 2-nested phylogenetic networks have Kalmanson resistance distance. For instance in Figure 22 we show a simple 2-nested network N whose resistance distance is clearly Kalmanson: in fact it is the same resistance distance as possessed by the shown 1-nested network.

Figure 22. Two weighted phylogenetic networks with identical resistance distances for their leaves, and their common split network.

5.7. Indistinguishable Weightings and Invariants

We conjecture that for every weighted 2-nested network there is a weighted 1-nested network with matching resistance distance. Again see Figure 22. However, in light of the above conjecture 5.4, we conjecture that the exterior shape of networks is an invariant of resistance distance: specifically that if any two outer planar networks N, N′ have $d_N^R=d_{N'}^R$ then $L(\overline{R_w(N)})=L(\overline{R_w(N')})$.

5.8. Limiting Case

Consider when an edge in a cycle of N has a very large weight, or high resistance. As this weight grows, the limit of Lw(Rw(N)) approaches a network with that edge being deleted entirely. We see this by considering any two circuit-parallel paths with resistance R1 and R2 the first of which uses an edge with variable weight w (all other weights constant). Then letting w → ∞ implies R1 → ∞ and thus R1R2/(R1 + R2) approaches R2 by L'Hospital's rule. Thus, as w goes to ∞ we see that the resistance distances using those circuit-parallel paths reduce to the path distances, and so the distance metric from that network approaches one without that edge. This is similar to the way in which Sw, which uses the minimal path distance on N, serves to delete some edges as seen in Figure 10.

6. Distance Measures

A question is raised about the mathematics which precedes the work described in this paper: what sort of measurement should actually yield the experimental resistance distances in a real example? What should play the role of attaching the ohmmeter to pairs of wires? Usually, DNA sequences of length m are aligned (a multi-step problem of its own) and then the number of disagreeing sites is counted. Let p be the proportion of disagreements to the length m of the sequence: p = (m − c)/m, where c is the number of correct, matching sites. Then there is a selection of mutation models, such as the simplest Jukes-Cantor model, which predict a distance D which is the expected total number of mutations. Experimentally we find that distance D as a function of the observed disagreements. Alternately we could choose D from the list of evolutionary models: for instance

$$
D=K=-\frac{1}{2}\ln\left((1-2p-q)\sqrt{1-2q}\right)
$$

for Kimura's two parameter model. Alternatively, one could use alignment-free measures such as the k-mer distance measures described in Allman et al. (2017).

Here, we would want a distance D = R which is summed when in sequence but obeys the Ohm equations. The answer will depend both on the model of mutation we choose and the model of recombination we choose. For instance, $D=-\frac{3}{4}\ln\left(1-\frac{4}{3}p\right)$ for the Jukes-Cantor model, as described in Jukes and Cantor (1969). Rewriting using p = (m − c)/m we have:

$$
D(c)=\frac{3}{4}\ln\left(\frac{3m}{4c-m}\right).
$$

D has the graph in Figure 23. The c-axis is explained by the fact that in the Jukes-Cantor model, mutations of the 4 nucleotides A, G, T, C can replace any letter with another, including a self replacement. This implies that the smallest number of matching sites is m/4, while the largest is m. We can use D for the resistance distance only if there is experimental evidence that for circuit-parallel paths we have $D = D_1D_2/(D_1 + D_2)$, where $D_1(c_1)$ and $D_2(c_2)$ are the distances for each path, in expected numbers of mutations as a function of correct matching sites. There are certainly some features of D that look promising, including the shape of its graph: resistance typically ranges from 0 to infinity. Assuming that the formula for D over the circuit-parallel paths does hold, when one of the circuit-parallel resistances is infinite, say $D_1 \to \infty$, we see that $D \to D_2$. Similarly, as $c_1 \to m/4$, we have that c, the number of correct sites after recombination, approaches $c_2$.

Figure 23. Calculated Jukes-Cantor distance D as a function of the number of matching sites c in aligned sequences of length m.

When both branches have the same distance D1 = D2, and it obeys Ohm's law, we see the total resistance D = D1/2. Using the formula for D(c) and D1(c1) and solving for c we get the following function, graphed in Figure 24:

$$
c=\frac{m}{4}+\sqrt{3\left(\frac{m}{4}c_1-\left(\frac{m}{4}\right)^2\right)}.
$$

Thus, as a first check the geneticist could compare two genomes and their hybrid genome with a common ancestor. When the two are close to the same distance from the common ancestor (both have c1 matching sites), then the pair (c1, c) for c the number of matches between the hybrid and the common ancestor might fit the parabola as seen in Figure 24. If that fit is achieved, then it would be reasonable to apply the theorems of this paper.
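The relation between c and c1 can be checked numerically against the Jukes-Cantor formula. The sketch below is illustrative: the function names `jc_distance` and `hybrid_matches` and the sample values of m and c1 are assumptions, and the parallel-resistance rule for D is taken as a hypothesis, exactly as in the text. It confirms that the predicted c satisfies D(c) = D(c1)/2.

```python
import math


def jc_distance(c, m):
    """Jukes-Cantor distance D(c) = (3/4) ln(3m / (4c - m)), for m/4 < c <= m."""
    return 0.75 * math.log(3 * m / (4 * c - m))


def hybrid_matches(c1, m):
    """Predicted matching sites c when two equal branches (c1 matches each)
    combine like parallel resistors, so that D(c) = D(c1) / 2."""
    q = m / 4
    return q + math.sqrt(3 * (q * c1 - q * q))


m = 1000  # hypothetical sequence length
for c1 in (400, 700, 950, 1000):
    c = hybrid_matches(c1, m)
    # Under the parallel rule, the hybrid's distance is half that of each branch.
    assert math.isclose(jc_distance(c, m), jc_distance(c1, m) / 2, abs_tol=1e-12)
    print(c1, round(c, 2))
```

Observed pairs (c1, c) could be compared with `hybrid_matches` in this way; a systematic departure would suggest that the Jukes-Cantor distance does not combine by the parallel rule for the data at hand.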

Figure 24. On the left is a simple parallel circuit with identical resistance on each branch. If the resistance is the Jukes-Cantor distance and obeys the Ohm laws, then the number c of matching sites at the end of the circuit will depend on the number c1 of correct matching sites at the end of each branch before recombination.

Data Availability Statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author/s.

Author Contributions

SF did the majority of research and writing. DS: student researcher, contributed most of section 5.1, 2-nested networks, from his thesis, especially the combinatorics. All authors contributed to the article and approved the submitted version.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We are thankful for excellent proofreading and suggestions from the reviewers and for conversations with Jim Stasheff and Robert Kotiuga. This manuscript has been released as a pre-print at arxiv.org/abs/2007.13574 (Forcey and Scalzo, 2020b).

References

Allman, E. S., Rhodes, J. A., and Sullivant, S. (2017). Statistically consistent k-mer methods for phylogenetic tree reconstruction. J. Comput. Biol. 24, 153–171. doi: 10.1089/cmb.2015.0216

Bapat, R. B. (2004). Resistance matrix of a weighted graph. Commun. Math. Comp. Chem. 50, 73–82. Available online at: http://match.pmf.kg.ac.rs/electronic_versions/Match50/match50_73-82.pdf

Bryant, D., Moulton, V., and Spillner, A. (2007). Consistency of the neighbor-net algorithm. Algorith. Mol. Biol. 2:8. doi: 10.1186/1748-7188-2-8

Curtis, E. B., Ingerman, D., and Morrow, J. A. (1998). Circular planar graphs and resistor networks. Lin. Algeb. Appl. 283, 115–150. doi: 10.1016/S0024-3795(98)10087-3

Curtis, E. B., and Morrow, J. A. (1990). Determining the resistors in a network. SIAM J. Appl. Math. 50, 918–930. doi: 10.1137/0150055

Curtis, E. B., and Morrow, J. A. (1991). The Dirichlet to Neumann map for a resistor network. SIAM J. Appl. Math. 51, 1011–1029. doi: 10.1137/0151051

Dress, A., Huber, K. T., Koolen, J., Moulton, V., and Spillner, A. (2012). Basic Phylogenetic Combinatorics. Cambridge: Cambridge University Press.

Durell, C., and Forcey, S. (2020). Level-1 phylogenetic networks and their balanced minimum evolution polytopes. J. Math. Biol. 80, 1235–1263. doi: 10.1007/s00285-019-01458-w

Ejov, V., Filar, J. A., Haythorpe, M., Roddick, J. F., and Rossomakhine, S. (2019). A note on using the resistance-distance matrix to solve Hamiltonian cycle problem. Annals Oper. Res. 261, 393–399. doi: 10.1007/s10479-017-2571-7

Erné, M., Koslowski, J., Melton, A., and Strecker, G. E. (1993). “A primer on Galois connections,” in Papers on General Topology and Applications (Madison, WI, 1991), Volume 704 (New York, NY: New York Acad. Sci), 103–125.

Forcey, S., and Scalzo, D. (2020a). Galois connections for phylogenetic networks and their polytopes. J. Algeb. Comb. doi: 10.1007/s10801-020-00974-z

Forcey, S., and Scalzo, D. (2020b). Phylogenetic networks as circuits with resistance distance. arXiv [Preprint]. Available online at: https://arxiv.org/pdf/2007.13574.pdf

Gambette, P., Huber, K. T., and Scholz, G. E. (2017). Uprooted phylogenetic networks. Bull. Math. Biol. 79, 2022–2048. doi: 10.1007/s11538-017-0318-x

Huson, D. H. (1998). Splits-tree: analyzing and visualizing evolutionary data. Bioinformatics 14, 68–73. doi: 10.1093/bioinformatics/14.1.68

Jukes, T., and Cantor, C. (1969). “Evolution of protein molecules,” in Mammalian Protein Metabolism, ed H. N. Munro (New York, NY: Academic Press), 21–132.

Klein, D. J., and Randić, M. (1993). Resistance distance. J. Math. Chem. 12, 81–95. doi: 10.1007/BF01164627

Levy, D., and Pachter, L. (2011). The neighbor-net algorithm. Adv. Appl. Math. 47, 240–258. doi: 10.1016/j.aam.2010.09.002

Semple, C., and Steel, M. (2004). Cyclic permutations and evolutionary trees. Adv. Appl. Math. 32, 669–680. doi: 10.1016/S0196-8858(03)00098-8

Steel, M. (2016). Phylogeny—Discrete and Random Processes in Evolution Vol. 89 of CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM).

Yang, Y., and Klein, D. J. (2015). Resistance distance-based graph invariants of subdivisions and triangulations of graphs. Discrete Appl. Math. 181, 260–274. doi: 10.1016/j.dam.2014.08.039

Yang, Y., and Klein, D. J. (2019). Two-point resistances and random walks on stellated regular graphs. J. Phys. A 52:075201. doi: 10.1088/1751-8121/aaf8e7

Keywords: phylogenetic network, resistance, linear program (LP), polytope, circuit

Citation: Forcey S and Scalzo D (2020) Phylogenetic Networks as Circuits With Resistance Distance. Front. Genet. 11:586664. doi: 10.3389/fgene.2020.586664

Received: 23 July 2020; Accepted: 07 September 2020;
Published: 15 October 2020.

Edited by:

Ruriko Yoshida, Naval Postgraduate School, United States

Reviewed by:

Abraham Martin Del Campo, Centro de Investigación en Matemáticas, Mexico
Benjamin Keith Hollering, North Carolina State University, United States

Copyright © 2020 Forcey and Scalzo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Stefan Forcey, sforcey@uakron.edu

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.