- Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
Graph summarization has received much attention lately, with various works tackling the challenge of defining pooling operators on data regions with arbitrary structures. These contrast with the grid-like structures encountered in image inputs, where techniques such as max-pooling have been sufficient to show empirical success. In this work, we merge the Mapper algorithm with the expressive power of graph neural networks to produce topologically grounded graph summaries. We demonstrate the suitability of Mapper as a topological framework for graph pooling by proving that Mapper is a generalization of pooling methods based on soft cluster assignments. Building upon this, we show how easily novel pooling algorithms can be designed, obtaining results competitive with other state-of-the-art methods. Additionally, we use our method to produce GNN-aided visualisations of attributed complex networks.
1 Introduction
The abundance of relational information in the real world and the success of deep learning techniques have brought renewed interest in learning from graph-structured data. Efforts in this direction (Sperduti, 1994; Goller and Kuchler, 1996; Gori et al., 2005; Scarselli et al., 2009; Bruna et al., 2014; Li et al., 2015) have been primarily focused on replicating, within relational data domains, the hierarchy of convolutional filters and pooling operators that has achieved success in computer vision. In contrast to image processing applications, where the signal is defined on a grid-like structure, designing graph coarsening (pooling) operators is a much more difficult problem, due to the arbitrary structure typically present in graphs.
In this work, we introduce Structural Deep Graph Mapper (SDGM)1—an adaptation of Mapper (Singh et al., 2007), an algorithm from the field of Topological Data Analysis (TDA) (Chazal and Michel, 2017), to graph domains. First, we prove that SDGM is a generalization of pooling methods based on soft cluster assignments, which include state-of-the-art algorithms like minCUT (Bianchi et al., 2019) and DiffPool (Ying et al., 2018). Building upon this topological perspective of graph pooling, we propose two pooling algorithms leveraging fully differentiable and fixed PageRank-based “lens” functions, respectively. We demonstrate that these operators achieve results competitive with other state-of-the-art pooling methods on graph classification benchmarks. Furthermore, we show how our method offers a means to flexibly visualize graphs and the complex data living on them through a GNN “lens” function.
2 Related Work
In this section, we review the existing work in the two broad areas that our method is part of: graph pooling (also termed hierarchical representation learning) and network visualisation.
2.1 Graph Pooling
Pooling algorithms have already been considerably explored within GNN frameworks for graph classification. Luzhnica et al. (2019) propose a topological approach to pooling, which coarsens the graph by aggregating its maximal cliques into new clusters. However, cliques are local topological features, whereas our methods leverage a global perspective of the graph during pooling. Two paradigms distinguish themselves among learnable pooling layers: Top-k pooling based on a learnable ranking (Gao and Ji, 2019), and learning the cluster assignment (Ying et al., 2018) with additional entropy and link prediction losses for more stable training (DiffPool). Following these two trends, several variants and incremental improvements have been proposed. The Top-k approach is explored in conjunction with jumping-knowledge networks (Cangea et al., 2018), attention (Huang et al., 2019; Lee et al., 2019) and self-attention for cluster assignment (Ranjan et al., 2019). Similarly to DiffPool, the method suggested by Bianchi et al. (2019) uses several loss terms to enforce clusters with strongly connected nodes, similar sizes and orthogonal assignments. A different approach is also proposed by Ma et al. (2019), who leverage spectral clustering.
2.2 Graph Visualization
Graph visualization is a vast topic in network science. We therefore refer the reader to existing surveys, for a complete view of the field (Nobre et al., 2019; von Landesberger et al., 2011; Beck et al., 2017), and focus here only on methods that, similarly to ours, produce node-link-based visual summaries through the aggregation of static graphs. Previous methods rely on grouping nodes into a set of predefined motifs (Dunne and Shneiderman, 2013), modules (Dwyer et al., 2013) or clusters with basic topological properties (Batagelj et al., 2010). Recent approaches have considered attribute-driven aggregation schemes for multivariate networks. For instance, PivotGraph (Wattenberg, 2006) groups the nodes based on categorical attributes, while van den Elzen and van Wijk. (2014) propose a more sophisticated method using a combination of manually specified groupings and attribute queries. However, these mechanisms are severely constrained by the simple types of node groupings allowed and the limited integration between graph topology and attributes. Closest to our work, Mapper-based summaries for graphs have recently been considered by Hajij et al. (2018). Despite the advantages provided by Mapper, their approach relies on hand-crafted graph-theoretic “lenses,” such as the average geodesic distance, graph density functions or eigenvectors of the graph Laplacian. Not only are these functions unable to fully adapt to the graph of interest, but they are also computationally inefficient and do not take into account the attributes of the graph.
3 Background and Formal Problem Statement
3.1 Formal Problem Statement
Consider a dataset $\mathcal{D} = \{(G_i, y_i)\}_{i=1}^{N}$ whose samples are formed by a graph $G_i = (V_i, E_i)$, equipped with a matrix of node features $\mathbf{X}_i$, and an associated label $y_i$. The graph classification task is to learn a mapping from the graphs to their labels. Along the way, we are also interested in producing coarsened versions of the input graphs, which serve as intermediate representations for pooling and as topologically faithful summaries for visualization.
We briefly review the Mapper (Singh et al., 2007) algorithm, with a focus on graph domains (Hajij et al., 2018). We first introduce the required mathematical background.
Definition 3.1: Let $X$ and $Y$ be topological spaces, $f : X \to Y$ a continuous function, and $\mathcal{U} = (U_i)_{i \in I}$ a cover of $Y$ (a collection of sets whose union includes $Y$). The pull back cover $f^*(\mathcal{U})$ of $X$ induced by $(f, \mathcal{U})$ is the collection of sets $f^{-1}(U_i)$, $i \in I$. Splitting each set of $f^*(\mathcal{U})$ into its connected components yields the refined pull back cover.
Definition 3.2: Let X be a topological space with an open cover $\mathcal{U} = (U_i)_{i \in I}$. The 1-skeleton of the nerve $\mathcal{N}(\mathcal{U})$ of the cover is the graph with one node for each cover set $U_i$ and an edge between two nodes $i, j$ if and only if $U_i \cap U_j \neq \emptyset$.
3.2 Mapper
Given a topological space X, a carefully chosen lens function $f : X \to Y$ and a cover $\mathcal{U}$ of $Y$, Mapper produces a summary of $X$ by computing the 1-skeleton of the nerve of the refined pull back cover $f^*(\mathcal{U})$.
Mapper for Graphs. More recently, Hajij et al. (2018) considered the case when the input topological space is a graph $G = (V, E)$. The lens is a real-valued function $f : V \to \mathbb{R}$ over the vertices, the cover $\mathcal{U}$ is a collection of overlapping intervals over the real line, and the refined pull back is obtained by splitting each pull back set into the connected components of the subgraph it induces in $G$. The summary is then the 1-skeleton of the nerve of this refined cover, with one node per cluster and edges between clusters sharing vertices.
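To make the procedure concrete, below is a minimal sketch of Mapper on a graph, assuming a scalar lens normalized to $[0, 1]$, a cover of $n$ equally sized intervals with fractional overlap $g$, and connected-component clustering. The function and variable names are our own illustration, not the authors' implementation.

```python
# A minimal sketch of Mapper on a graph (not the authors' implementation).
import networkx as nx


def graph_mapper(G, lens, n=4, g=0.2):
    """G: networkx graph, lens: dict node -> scalar in [0, 1].
    Returns the Mapper summary of G and the list of clusters."""
    length = 1.0 / (n - (n - 1) * g)   # interval length so the cover spans [0, 1]
    step = length * (1 - g)            # distance between interval left ends
    clusters = []                      # each cluster is a frozenset of nodes
    for i in range(n):
        lo, hi = i * step, i * step + length
        pulled_back = [v for v in G if lo <= lens[v] <= hi]
        # Refine the pull back set into its connected components.
        for comp in nx.connected_components(G.subgraph(pulled_back)):
            clusters.append(frozenset(comp))
    # 1-skeleton of the nerve: clusters sharing at least one node are linked.
    M = nx.Graph()
    M.add_nodes_from(range(len(clusters)))
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if clusters[i] & clusters[j]:
                M.add_edge(i, j)
    return M, clusters
```

Here the lens is passed as a dictionary from nodes to scalars; in the methods discussed below, these values would come from a graph-theoretic function or a GNN.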
However, the graph version of Mapper described above has two main limitations. Firstly, the graph-theoretic functions considered for f are rather limited, not taking into account the signals which are typically defined on the graph in signal processing tasks, such as graph classification. Secondly, by using a pull back cover only over the graph vertices, as opposed to a cover of the entire graph, the method relies exclusively on the lens function to capture the structure of the graph and the edge-connections between the clusters. This may end up discarding valuable structural information, as we later show in Section 7.7.
4 Structural Deep Graph Mapper
Structural Graph Mapper. One of the disadvantages of the graph version of Mapper (described in the background section) is that its output does not explicitly capture the connections between the resulting collections of clusters. This is primarily because the lens function f is defined only over the set of vertices V and, consequently, the resulting pull-back cover only covers V. In contrast, one should aim to obtain a cover for the graph G, which automatically includes the edges. While this could be resolved by considering a lens function over the geometric realization of the graph, handling only a finite set of vertices is computationally convenient.
To balance these trade-offs, we add an extra step to the Mapper algorithm. Concretely, we extend the refined pull back cover into a cover over both nodes and edges. Given the set of refined clusters $(C_i)_{i \in I}$, we attach to each cluster the edges having at least one endpoint inside it, obtaining the extended cover sets $\bar{C}_i = C_i \cup \{(u, v) \in E \mid u \in C_i \text{ or } v \in C_i\}$. The output of Structural Mapper is the 1-skeleton of the nerve of this extended cover, so two clusters are now connected whenever they share nodes or edges.
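Under this reading of the extended cover, the structural step can be sketched as follows; `clusters` is the refined pull back cover produced by a routine like `graph_mapper` above, and the names are again hypothetical.

```python
# A sketch of the structural step: each edge is attached to every cluster
# containing at least one endpoint, so clusters become linked when they share
# nodes (semantic connections) or edges (structural connections).
import networkx as nx


def structural_nerve(G, clusters):
    """clusters: list of frozensets of nodes (the refined pull back cover)."""
    extended = []
    for C in clusters:
        edges = frozenset(e for e in G.edges if e[0] in C or e[1] in C)
        extended.append((C, edges))
    M = nx.Graph()
    M.add_nodes_from(range(len(extended)))
    for i in range(len(extended)):
        for j in range(i + 1, len(extended)):
            nodes_i, edges_i = extended[i]
            nodes_j, edges_j = extended[j]
            if (nodes_i & nodes_j) or (edges_i & edges_j):
                M.add_edge(i, j)
    return M
```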
Remark 1: We note that Structural Mapper, unlike the original Mapper method, encodes two types of relationships via the edges of the output graph. The semantic connections highlight a similarity between clusters, according to the lens function (that is, two clusters have common nodes—as before), while structural connections show how two clusters are connected (namely, two clusters have at least one edge in common). This latter type of connection is the result of considering the extended cover over the edges. The two types of connections are not mutually exclusive because two clusters might have both nodes and edges in common. We now broadly outline our proposed method, using the three main degrees of freedom of the Mapper algorithm to guide our discussion: the lens function, the cover, and the clustering algorithm.
4.1 Lens
The lens is a function $f : V \to \mathbb{R}^d$ over the nodes, where the output dimension $d$ is kept low (typically one or two) so that a simple cover of the output space can be constructed. In SDGM, we parameterize the lens with a GNN, trained either in a supervised fashion (for example, as a node classifier) or in an unsupervised one, optionally followed by a dimensionality reduction step.
Unlike the traditional graph theoretic lens functions proposed by Hajij et al. (2018), GNNs can naturally learn to integrate the features associated with the graph and its topology, while also scaling computationally to large, complex graphs. Additionally, visualisations can be flexibly tuned for the task of interest by adjusting the lens, for instance by changing the training objective of the GNN or the dimensionality of its output.
4.2 Cover
The cover $\mathcal{U}$ determines the resolution and the connectivity of the summary. For a one-dimensional lens, we use a cover of $n$ equally sized intervals over the real line with a fixed fractional overlap $g$ between consecutive intervals; for a two-dimensional lens, we analogously use a grid of overlapping 2D cells. More cover sets produce a more granular summary, while a larger overlap increases the connectivity of the output graph.
Another choice that we employ for designing differentiable pooling algorithms is a set of RBF kernels, where the second arguments of kernel functions are distributed over the real line. We introduce this in detail in Section 5.2.
4.3 Clustering
Clustering statistically approximates the (topological) connected components of the cover sets (Singh et al., 2007). On graphs, a natural choice, which we adopt for visualisations, is to take the connected components of the subgraph induced by the nodes in each pull back set, as in Hajij et al. (2018).
However, relying on the topological connected components introduces certain challenges when the aim is to obtain a coarsened graph. Many real-world graphs comprise thousands of connected components, and this number is a lower bound on the number of connected components of the graph produced by Mapper. Therefore, when aggressive coarsening is required, as in pooling, we replace connected-component clustering with trivial clustering, treating each pull back set as a single cluster.
We broadly refer to this instance of Structural Graph Mapper, with the choices described above, as Structural Deep Graph Mapper (SDGM). We summarize it step-by-step in the cartoon example in Figure 1 and encourage the reader to refer to it.
FIGURE 1. A cartoon illustration of Structural Deep Graph Mapper (SDGM) where, for simplicity, a graph neural network (GNN) approximates a “height” function over the nodes in the plane of the diagram. The input graph (A) is passed through the GNN, which maps the vertices of the graph to a real number (the height) (B–C). Given a cover of the image of the lens formed of overlapping intervals, the intervals are pulled back to overlapping sets of nodes, which are refined into their connected components; these components become the nodes of the output summary, connected whenever they share nodes or edges of the input graph.
5 Structural Graph Mapper for Pooling
We begin this section by introducing several theoretical results, which provide a connection between our version of Mapper and other graph pooling algorithms. We then use these results to show how novel pooling algorithms can be designed.
5.1 Relationship to Graph Pooling Methods
An early suggestion that Mapper could be suitable for graph pooling is given by the fact that it constitutes a generalization of binary spectral clustering, as observed by Hajij et al. (2018). This link is a strong indicator that Mapper can compute “useful” clusters for pooling. We formally restate this observation below and provide a short proof.
Proposition 5.1: Let L be the Laplacian of a graph $G = (V, E)$ and let $\mathbf{l}_2$ be the eigenvector corresponding to its second-smallest eigenvalue (the Fiedler vector (Fiedler, 1973)). Then the spectral bi-partition of $G$ given by the sign of the entries of $\mathbf{l}_2$ coincides with the output of Mapper with lens $f(v) = \mathbf{l}_2(v)$, the two-set cover $\mathcal{U} = \{(-\infty, 0), [0, +\infty)\}$ and trivial clustering.
Lemma 5.1: The adjacency matrix of the 1-skeleton of the nerve of the extended cover is $\mathbf{A}^* = \mathrm{sign}(\mathbf{S}^\top \hat{\mathbf{A}} \mathbf{S})$, where $\hat{\mathbf{A}}$ is the adjacency matrix of $G$ with self-loops and $\mathbf{S} \in \{0, 1\}^{|V| \times K}$ is the binary cluster assignment matrix induced by the cover, with $\mathbf{S}_{vi} = 1$ if and only if node $v$ belongs to cluster $i$.
Proposition 5.2: Structural Mapper generalizes pooling based on soft cluster assignments. Concretely, the coarsened graph with adjacency $\mathrm{sign}(\mathbf{S}^\top \hat{\mathbf{A}} \mathbf{S})$ produced by a soft cluster assignment method with assignment matrix $\mathbf{S}$ has the same topology as the output of Structural Mapper with lens $f(v) = \mathbf{S}_v$ (the assignment distribution of node $v$), cover $\mathcal{U} = (U_i)_{i \leq K}$ over the space of assignment distributions with $U_i = \{\mathbf{x} \mid \mathbf{x}_i > 0\}$, and trivial clustering.
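As a quick sanity check of the correspondence between the nerve of the extended cover and $\mathrm{sign}(\mathbf{S}^\top \hat{\mathbf{A}} \mathbf{S})$, consider the toy example below (a sketch assuming a path graph with self-loops and a hard, binary assignment matrix).

```python
# Toy numpy check of Lemma 5.1 on a path graph 0-1-2 with self-loops.
import numpy as np

A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])
S = np.array([[1, 0],          # cluster 0 = {0, 1}, cluster 1 = {2};
              [1, 0],          # the clusters share no nodes, only the edge (1, 2)
              [0, 1]])
A_star = np.sign(S.T @ A @ S)
print(A_star)                  # [[1 1], [1 1]]: a structural connection appears
```

The off-diagonal entry is non-zero even though the clusters are disjoint as node sets, illustrating a purely structural connection.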
5.2 Differentiable Mapper Pooling
The main challenge for making pooling via Mapper differentiable is to differentiate through the pull back computation. To address this, we replace the cover of n overlapping intervals over the real line, described in the previous section, with a cover formed of $n$ overlapping RBF kernels
$$K_i(x) = \exp\left(-\frac{(x - \mu_i)^2}{\delta}\right),$$
where the lens function $f$ is a GNN whose output is passed through a sigmoid so that it lies in the unit interval, and the kernel centres $\mu_i$ are equally spaced over $[0, 1]$. The normalized kernel responses $\mathbf{S}_{vi} \propto K_i(f(v))$ act as a soft cluster assignment matrix, from which the pooled features and adjacency are computed as $\mathbf{X}' = \mathbf{S}^\top \mathbf{X}$ and $\mathbf{A}' = \mathbf{S}^\top \mathbf{A} \mathbf{S}$, as in other soft cluster assignment methods. We refer to this operator as Differentiable Mapper Pooling (DMP).
In Figure 2 we show two examples of RBF kernel covers for the output space. The scale of the kernel, δ, determines the amount of overlap between the cover elements. At bigger scales, there is a higher overlap between the clusters, as shown in the two plots. Because the line is one-dimensional, a point on the unit interval can only be part of a small number of clusters (that is, the kernels for which the value is greater than zero), assuming the scale δ is not too large. Therefore, DMP can be seen as a DiffPool variant where the low-entropy constraint on the cluster assignment is satisfied topologically, rather than by a loss function enforcing it.
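A minimal differentiable sketch of this construction is given below, assuming the lens values have already been computed by a sigmoid-terminated GNN; the row normalization of the kernel responses into a soft assignment is our reading of the method, and all names are illustrative.

```python
# A differentiable sketch of the RBF-kernel cover (illustrative, not the
# authors' implementation).
import torch


def dmp_pool(x, adj, lens_values, n=8, delta=0.1):
    """x: [N, d] features, adj: [N, N] adjacency, lens_values: [N] in [0, 1]."""
    mu = torch.linspace(0.0, 1.0, n)                              # kernel centres
    K = torch.exp(-(lens_values.unsqueeze(1) - mu) ** 2 / delta)  # [N, n] responses
    S = K / K.sum(dim=1, keepdim=True)                            # soft assignment
    return S.t() @ x, S.t() @ adj @ S, S                          # pooled X, A, and S
```

Because the whole computation is composed of differentiable tensor operations, gradients flow from the pooled graph back into the GNN lens.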
FIGURE 2. Two covers of RBF kernels with different scales: the smaller scale produces little overlap between the cover elements, while the larger scale increases the overlap between them.
5.3 Mapper-Based PageRank Pooling
To evaluate the effectiveness of the differentiable pooling operator, we also consider a fixed and scalable non-differentiable lens function given by the PageRank (Page et al., 1999) of the nodes, $f(v) = \mathrm{PR}(v)$. The PageRank vector is computed with the power iteration
$$\mathrm{PR}^{(t+1)}_i = \sum_{j \in N(i)} \frac{\mathrm{PR}^{(t)}_j}{|N(j)|},$$
where $N(i)$ denotes the set of neighbours of node $i$ and the scores are initialized uniformly; the iteration converges to the principal eigenvector of the corresponding transition matrix (Demmel, 1995). The cover is formed of $n$ overlapping intervals over the image of the lens, where consecutive intervals overlap in a fixed percentage $g$ of their length, and pooling then proceeds as in Structural Mapper, with each resulting cluster collapsed into a single node of the coarsened graph. We refer to this operator as Mapper-based PageRank (MPR) pooling.
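A sketch of the PageRank lens under the assumptions above (simple power iteration without damping and uniform initialization); clipping the degrees at one is our own safeguard against isolated nodes.

```python
# A sketch of the PageRank lens used by MPR (illustrative names).
import numpy as np


def pagerank_lens(adj, iters=100):
    """adj: [N, N] binary adjacency matrix. Returns scores rescaled to [0, 1]."""
    n = adj.shape[0]
    pr = np.full(n, 1.0 / n)           # uniform initialization
    deg = np.maximum(adj.sum(axis=1), 1)
    for _ in range(iters):
        pr = adj @ (pr / deg)          # PR_i <- sum_{j in N(i)} PR_j / |N(j)|
    return (pr - pr.min()) / (pr.max() - pr.min() + 1e-9)
```

The rescaled scores can then be combined with the interval cover from the Mapper sketch above to obtain the MPR-pooled graph.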
5.4 Model
For the graph classification task, each example G is represented by a tuple $(\mathbf{A}, \mathbf{X})$, where $\mathbf{A}$ is the adjacency matrix and $\mathbf{X}$ the matrix of node features. The model interleaves message passing blocks with the proposed pooling layers (DMP or MPR) and produces a graph-level representation $\mathbf{h}_G = \mathrm{Readout}(\mathbf{X}^{(T)})$, where the readout averages the features of the nodes in the final coarsened graph and $\mathbf{h}_G$ is passed to a final classifier.
We note that either of these pooling operators could readily be adapted to the recently proposed message passing simplicial neural networks (MPSNs) (Bodnar et al., 2021) as a tool for coarsening simplicial complexes by dropping the 1-skeleton operator after computing the nerve. We leave this endeavor for future work.
5.5 Complexity
The topology of the output graph can be computed in $O(|V| + |E|)$ time for a fixed cover: each node belongs to at most a constant number of overlapping cover sets, so the pull back is linear in $|V|$; refining each pull back set into connected components is linear in the size of its induced subgraph; and the semantic and structural connections of the nerve are found with a single pass over the nodes and edges of the input graph.
6 Pooling Experiments
6.1 Tasks
We illustrate the applicability of the Mapper-GNN synthesis within a pooling framework by evaluating DMP and MPR in a variety of settings: social networks (IMDB-Binary, IMDB-Multi, Reddit-Binary, Reddit-Multi-5k), a scientific collaboration network (Collab) and chemical or biological data (D&D, Mutag, NCI1, Proteins) (Kersting et al., 2016).
6.2 Experimental Setup
We adopt a 10-fold cross-validation approach to evaluating the graph classification performance of DMP, MPR and other competitive state-of-the-art methods. The random seed was set to zero for all experiments (with respect to dataset splitting, shuffling and parameter initialisation), in order to ensure a fair comparison across architectures. All models were trained on a single Titan Xp GPU, using the Adam optimiser (Kingma and Ba, 2014) with early stopping on the validation set, for a maximum of 30 epochs. We report the classification accuracy using 95% confidence intervals calculated for a population size of 10 (the number of folds).
6.3 Baselines
We compare the performance of DMP and MPR to two other pooling methods with which we identify mathematical connections: minCUT (Bianchi et al., 2019) and DiffPool (Ying et al., 2018). Additionally, we include Graph U-Net (Gao and Ji, 2019) in our evaluation, as it has been shown to yield competitive results while performing pooling from the perspective of a learnable node ranking; we denote this approach by Top-k in the remainder of this section. The non-pooling baselines evaluated are the WL kernel (Shervashidze et al., 2011), a “flat” model (2 MP steps and global average pooling) and an average-readout linear classifier.
We optimize both DMP and MPR with respect to the cover cardinality n, the cover overlap (δ for DMP, overlap percentage g for MPR), learning rate and hidden size. The Top-k architecture is evaluated using the code provided in the official repository, where separate configurations are defined for each of the benchmarks. The minCUT architecture is represented by the sequence of operations described by Bianchi et al. (2019): MP(32)-pooling-MP(32)-pooling-MP(32)-GlobalAvgPool, followed by a linear softmax classifier. The MP(32) block represents a message-passing operation performed by a graph convolutional layer with 32 hidden units:
$$\mathbf{X}^{(t+1)} = \mathrm{ReLU}(\tilde{\mathbf{A}} \mathbf{X}^{(t)} \mathbf{W}_m + \mathbf{X}^{(t)} \mathbf{W}_s),$$
where $\tilde{\mathbf{A}} = \mathbf{D}^{-\frac{1}{2}} \mathbf{A} \mathbf{D}^{-\frac{1}{2}}$ is the symmetrically normalised adjacency matrix, $\mathbf{D}$ is the degree matrix, and $\mathbf{W}_m$, $\mathbf{W}_s$ are learnable weight matrices for the message and skip connections, respectively.
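For concreteness, the MP block can be sketched as the following module, with the caveat that the exact form of the skip connection is our reconstruction rather than a verbatim transcription of Bianchi et al. (2019).

```python
# A sketch of the MP(32) block under the reconstruction above.
import torch
import torch.nn as nn


class MP(nn.Module):
    def __init__(self, in_dim, out_dim=32):
        super().__init__()
        self.w_m = nn.Linear(in_dim, out_dim, bias=False)  # message weights
        self.w_s = nn.Linear(in_dim, out_dim, bias=False)  # skip weights

    def forward(self, x, adj_norm):
        # adj_norm: symmetrically normalised adjacency D^{-1/2} A D^{-1/2}
        return torch.relu(adj_norm @ self.w_m(x) + self.w_s(x))
```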
6.4 Evaluation Procedure
The best procedure for evaluating GNN pooling layers remains a matter of debate in the graph machine learning community. One may consider a fixed GNN architecture with a different pooling layer for each baseline; alternatively, the whole architecture can be optimized for each type of pooling layer. The first option, more akin to the typical procedure for evaluating pooling layers in CNNs on image domains, is used in papers like minCUT (Bianchi et al., 2019). The second option is more particular to GNNs and it is employed, for instance, by DiffPool (Ying et al., 2018). In this work, we choose the latter option for our evaluation.
We argue that for non-Euclidean domains, such as graph ones, the relationships between the nodes of the pooled graph and the ones of the input graph are semantically different from one pooling method to another. This is because pooling layers have different behaviors and may interact in various ways with the interleaved convolutional layers. Therefore, evaluating the same architecture with only the pooling layer(s) swapped is restrictive and might hide the benefits of certain operators. For example, Top-k pooling (one of our baselines) simply drops nodes from the input graph, instead of computing a smaller number of clusters from all nodes. Assume we fix the pooled graph to have only one node. Then Top-k would only select one node from the original graph. In contrast, DiffPool would combine the information from the entire graph in a single node. DiffPool would thus have access to additional information with respect to Top-k, so it would be unfair to conclude that one model is better than the other in such a setting. These differences implicitly affect the features of the output graph at that layer, which in turn affect the next pooling layer, as its computation depends on the features. This can have a cascading effect on the overall performance of the model. One might also argue that this procedure makes the evaluated models more homogeneous and, therefore, easier to compare. While this is true, the conclusions one can draw from such a comparison are much more limited because they are restricted to the particular architecture that was chosen.
For this reason, we have either run models with hyperparameters as previously reported by the authors, or optimized them ourselves end-to-end, where applicable. The best-performing configurations were (Appendix A details the hyperparameter search):
• MPR—learning rate
• D&D, Collab, Reddit-Binary, Reddit-Multi-5K: cover sizes
• Proteins, NCI1: cover sizes
• Mutag, IMDB-Binary, IMDB-Multi: cover sizes
• DMP—learning rate
• Proteins: cover sizes
• Others: cover sizes
• Top-k—specific dataset configurations, as provided in the official GitHub repository3;
• minCUT—learning rate
• DiffPool—learning rate
• Flat: hidden size 32.
6.5 Pooling Results
The graph classification performance obtained by these models is reported in Table 1. MPR ranks either first or second on all social datasets, and it achieves accuracy scores within a small margin of the best-performing models on the remaining benchmarks. Overall, both Mapper-based operators are competitive with the state-of-the-art pooling methods considered.
TABLE 1. Results obtained on classification benchmarks. Accuracy measures with 95% confidence intervals are reported. The highest result is bolded and the second highest is underlined. The first four columns are molecular graph datasets, while the others are social graphs. Our models perform competitively with other state-of-the-art models.
7 Mapper for Visualisations
Graph pooling methods and summarized graph visualisations methods can be seen as two sides of the same coin, since both aim to condense the information in the graph. We now turn our attention to the latter.
7.1 Visualisations in Supervised Learning
The first application of SDGM is in a supervised learning context, where the lens is given by a GCN (Kipf and Welling, 2016) node classifier trained to predict the probability that each node of a synthetic social network is a spammer. The predicted probability then acts as a one-dimensional lens over the nodes, and the resulting summary is shown in Figure 3.
FIGURE 3. SDGM visualization using as a lens function the GNN-predicted probability that a node in the network is a spammer. Panel (A) is colored with the average predicted spam probability in each cluster, whereas panel (B) is colored by the proportion of true spammers in each cluster.
Through the central cluster node, the SDGM visualization correctly shows how spammers occupy an essential place in the network, while non-spammers tend to form many smaller disconnected communities. When labels are available, we also produce visualisations augmented with ground-truth information. These visualisations can provide a label-driven understanding of the graph. For instance, in Figure 3 (right) we color each node of the SDGM visualization according to the most frequent class in the corresponding cluster. This second visualization, augmented with the ground-truth information, can also be used to compare with the model predictions.
7.2 Visualization in Unsupervised Learning
The second application corresponds to an unsupervised learning scenario, where the challenge is obtaining a parameterization of the graph in the absence of labels. This is the typical use case for unsupervised graph representation learning models (Chami et al., 2020). The approach we follow is to train a model to learn node embeddings in an unsupervised fashion using Deep Graph Infomax (DGI) (Veličković et al., 2018) and to reduce the resulting embedding space to two dimensions with t-SNE, which then acts as the lens of SDGM.
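A sketch of this unsupervised lens, assuming an $[N, d]$ matrix of node embeddings produced by a trained DGI-style model is already available:

```python
# A sketch of the unsupervised lens: reduce node embeddings to a 2D lens.
from sklearn.manifold import TSNE


def unsupervised_lens(embeddings):
    """embeddings: [N, d] array of node embeddings. Returns a 2D lens in [0, 1]^2."""
    lens_2d = TSNE(n_components=2).fit_transform(embeddings)
    # Rescale each axis to [0, 1] so that a grid of overlapping 2D cells
    # (the cover) can be laid over the output space.
    mins, maxs = lens_2d.min(axis=0), lens_2d.max(axis=0)
    return (lens_2d - mins) / (maxs - mins + 1e-9)
```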
7.3 Qualitative Evaluation
In this section, we qualitatively compare SDGM against the two best-performing graph theoretic lens functions proposed by Hajij et al. (2018), on the Cora and CiteSeer (Sen et al., 2008) and PubMed (Yang et al., 2016) citation networks. Namely, we compare against a PageRank (Page et al., 1999) lens function and a graph density function computed with an RBF kernel over the pairwise geodesic distances between the nodes.
As depicted by Figure 4, SDGM successfully summarizes many of the properties of the graphs that are also reflected by full graph visualisations. For instance, on Cora, Genetic Algorithms (in dark orange) are shown to be primarily connected to Reinforcement Learning (orange). At the same time, related classes that largely overlap in the full visualisation—Probabilistic Methods and Neural Networks (NNs) on Cora or Information Retrieval (IR) and ML on CiteSeer—are connected in the SDGM plot. In contrast, the baselines do not have the same level of granularity and fail to capture many such properties. Both PageRank and the graph density function tend to focus on the classes with the highest number of nodes, such as the IR class on CiteSeer or the NNs class on Cora, while largely de-emphasizing other classes.
FIGURE 4. Qualitative comparison between SDGM (first column), Mapper with an RBF graph density function (Hajij et al., 2018) (second), and Mapper with a PageRank function (Hajij et al., 2018) (third). Graphviz (Gansner and North, 2000) visualizations of the graph cores (fourth column) are added for reference. The rows show plots for Cora, CiteSeer, and PubMed, respectively. The graphs are colored based on the most frequent class in each cluster to aid the comparison. SDGM with an unsupervised lens makes all of the dataset classes appear clearly separated in the visualization. This does not happen in the baseline visualisations, which mainly focus on the class with the highest number of nodes in each graph.
7.3.1 Limitations
The proposed visualisations also present certain limitations. In an unsupervised learning setting, in the absence of any labels or attributes for coloring the graph, the nodes have to be colored based on a colourmap associated with the abstract embedding space, thus affecting the interpretability of the visualisations. In contrast, even though the graph theoretic lens functions produce lower quality visualisations, their semantics are clearly understood mathematically. This is, however, a drawback shared even by some of the most widely used data visualization methods, such as t-SNE or UMAP (McInnes et al., 2018). In what follows, we present additional visualisations and ablation studies.
7.4 Ablation Study for Dimensionality Reduction
We study how the choice of the dimensionality reduction method for the unsupervised visualisations affects the output. To test this, we consider the following dimensionality reduction methods: t-SNE (van der Maaten and Hinton, 2008), UMAP (McInnes et al., 2018), IsoMap (Tenenbaum et al., 2000) and PCA. We use the same model as in Section 7.2 and 81 2D cells for the cover of all models. The overlap was set after fine-tuning to 0.2 for t-SNE and UMAP, and to 0.1 for the other two models. Figure 5 displays the four visualisations. As expected, t-SNE and UMAP produce more visually pleasing outputs, due to their superior ability to capture variation in the GNN embedding space. However, the features highlighted by all visualisations are largely similar, generally indicating the same relations between clusters. This demonstrates that the GNN embedding space is robust to the choice of the dimensionality reduction method.
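The ablation can be reproduced with a loop of the following shape, where `dgi_embeddings.npy` is a hypothetical file holding the trained DGI node embeddings and `umap-learn` is the third-party package providing UMAP.

```python
# A sketch of the dimensionality-reduction ablation loop (illustrative names).
import numpy as np
import umap
from sklearn.manifold import TSNE, Isomap
from sklearn.decomposition import PCA

embeddings = np.load("dgi_embeddings.npy")   # hypothetical [N, d] embedding file
reducers = {
    "t-SNE": TSNE(n_components=2),
    "UMAP": umap.UMAP(n_components=2),
    "Isomap": Isomap(n_components=2),
    "PCA": PCA(n_components=2),
}
# Each reduced 2D space becomes the lens of a separate SDGM summary, using a
# grid cover with overlap 0.2 (t-SNE, UMAP) or 0.1 (Isomap, PCA), as above.
lenses = {name: r.fit_transform(embeddings) for name, r in reducers.items()}
```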
FIGURE 5. Ablation for dimensionality reduction methods; left–right, top–bottom: t-SNE, PCA, Isomap, UMAP. While t-SNE and UMAP produce slightly better visualisations, the graph features displayed by the visualisations are roughly consistent across all of the four dimensionality reduction techniques.
7.5 Ablation for the Unsupervised Lens
To better understand the impact of GNNs on improving the quality of the Mapper visualisations, we perform an ablation study on the type of unsupervised lens functions used within Mapper. The first model we consider is simply the identity function taking as input only graph features. The second model is a randomly initialized DGI model. Despite the apparent simplicity of a randomly initialized model, it was shown that such a method produces reasonably good embeddings, often outperforming other more sophisticated baselines (Veličković et al., 2018). Finally, we use our trained DGI model from Section 7.2. For all models, we perform a t-SNE reduction of their embedding space to obtain a 2D output space and use 81 overlapping cells that cover this space. An overlap of 0.2 is used across all models.
The three resulting visualisations are depicted in Figure 6. The identity model and the untrained DGI model do not manage to exploit the structure of the dataset, and neither performs particularly well. In contrast, the trained DGI model emphasizes all the classes in the visualization, together with their main interactions.
FIGURE 6. Ablation for different types of unsupervised lenses (identity, untrained DGI, trained DGI). The trained DGI model significantly improves the quality of the visualisations.
7.6 Hierarchical Visualisations
One of the most powerful features of Mapper is the ability to produce multi-resolution visualisations through the flexibility offered by the cover hyperparameters. As described in Section 4, having a higher number of cells covering the output space results in more granular visualisations containing more nodes, while a higher overlap between these cells results in increased connectivity. We highlight these trade-offs in Figure 7, where we visualize the Cora citation network using nine combinations of cells and overlaps. These kinds of hierarchical visualisations can help one identify the persistent features of the graph: features that appear across several of the resolutions are likely intrinsic to the graph, rather than artifacts of a particular choice of cover.
FIGURE 7. Hierarchical visualisations of the Cora citation network using various numbers of cover cells and degrees of overlap. Rows (top–bottom) use different overlaps (g) between the intervals, while columns use different numbers of cover cells.
7.7 The Importance of Capturing Structural Information
In this section, we revisit the synthetic spammer dataset to illustrate the importance of capturing structural information via the edge-refined pull back cover operator. To that end, we compare SDGM with a version using the usual refined pull back cover as in Hajij et al. (2018), while using the same lens function for both (a GCN classifier). We refer to the latter as DGM. The visualisations produced by the two models are included in Figure 8. We note that while both models capture the large cluster of spammers at the center of the network and the smaller communities of non-spammers, DGM does not capture the structural relationships between spammers and non-spammers, since it encodes only semantic relations.
FIGURE 8. DGM (A) vs SDGM (B) visualization of the synthetic spammer dataset. DGM does not capture important relational information between spammers and non-spammers.
8 Conclusion
We have introduced Deep Graph Mapper, a topologically grounded method for producing informative graph visualisations with the help of GNNs. We have shown these visualisations are not only useful for understanding various graph properties, but can also aid in visually identifying classification mistakes. Additionally, we have proved that Mapper is a generalization of soft cluster assignment methods, effectively providing a bridge between graph pooling and the TDA literature. Based on this connection, we have proposed two Mapper-based pooling operators: a simple one that scores nodes using PageRank and a differentiable one that uses RBF kernels to simulate the cover. Our experiments show that both layers yield architectures competitive with several state-of-the-art methods on graph classification benchmarks.
Data Availability Statement
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding authors.
Author Contributions
CB and CC have contributed equally in designing the model. CB has performed the visualization experiments and proved the theoretical results. CC has performed the pooling experiments. PL is the senior author.
Funding
CC is supported by EPSRC NRAG/465 NERC CDT Dream (grant no. NE/M009009/1). PL is funded by EPSRC.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgments
We would like to thank both reviewers for their constructive feedback and useful iterations on the manuscript. We would like to thank Petar Veličković, Ben Day, Felix Opolka, Simeon Spasov, Alessandro Di Stefano, Duo Wang, Jacob Deasy, Ramon Viñas, Alex Dumitru and Teodora Reu for their constructive comments. We are also grateful to Teo Stoleru for helping with the diagrams.
Footnotes
1Code to reproduce models and experimental results is available at https://github.com/crisbodnar/dgm.
2Note that one or more {embedding
3https://github.com/HongyangGao/Graph-U-Nets/blob/48aa171b16964a2466fceaf4cb06fc940d649294/run_GUNet.sh
References
Batagelj, V., Didimo, W., Liotta, G., Palladino, P., and Patrignani, M. (2010). Visual Analysis of Large Graphs Using (X,Y)-Clustering and Hybrid Visualizations. In 2010 IEEE Pacific Visualization Symposium (PacificVis), 209–216.
Beck, F., Burch, M., Diehl, S., and Weiskopf, D. (2017). A Taxonomy and Survey of Dynamic Graph Visualization. Comp. Graphics Forum 36, 133–159. doi:10.1111/cgf.12791
Bianchi, F. M., Grattarola, D., and Alippi, C. (2019). Mincut Pooling in Graph Neural Networks. arXiv preprint arXiv:1907.00481.
Bodnar, C., Frasca, F., Wang, Y. G., Otter, N., Montúfar, G., Liò, P., et al. (2021). Weisfeiler and Lehman Go Topological: Message Passing Simplicial Networks. arXiv preprint arXiv:2103.03212.
Bruna, J., Zaremba, W., Szlam, A., and Lecun, Y. (2014). Spectral Networks and Locally Connected Networks on Graphs. ICLR.
Cangea, C., Veličković, P., Jovanović, N., Kipf, T., and Liò, P. (2018). Towards Sparse Hierarchical Graph Classifiers. arXiv preprint arXiv:1811.01287.
Carriere, M., Michel, B., and Oudot, S. (2018). Statistical Analysis and Parameter Selection for Mapper. J. Machine Learn. Res. 19, 478–516.
Carrière, M., and Oudot, S. (2018). Structure and Stability of the One-Dimensional Mapper. Found. Comput. Math. 18, 1333–1396. doi:10.1007/s10208-017-9370-z
Chami, I., Abu-El-Haija, S., Perozzi, B., Ré, C., and Murphy, K. (2020). Machine Learning on Graphs: A Model and Comprehensive Taxonomy. arXiv preprint arXiv:2005.03675.
Chazal, F., and Michel, B. (2017). An Introduction to Topological Data Analysis: Fundamental and Practical Aspects for Data Scientists. arXiv preprint arXiv:1710.04019.
Demmel, J. (1995). UC Berkeley CS267 - Lecture 20: Partitioning Graphs without Coordinate Information II.
Dey, T. K., Mémoli, F., and Wang, Y. (2017). Topological Analysis of Nerves, Reeb Spaces, Mappers, and Multiscale Mappers. In Symposium on Computational Geometry.
Dunne, C., and Shneiderman, B. (2013). Motif Simplification. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’13. New York, NY, USA: Association for Computing Machinery, 3247–3256. doi:10.1145/2470654.2466444
Dwyer, T., Riche, N. H., Marriott, K., and Mears, C. (2013). Edge Compression Techniques for Visualization of Dense Directed Graphs. IEEE Trans. Vis. Comput. Graphics 19, 2596–2605. doi:10.1109/TVCG.2013.151
Fiedler, M. (1973). Algebraic Connectivity of Graphs. Czech. Math. J. 23, 298–305. doi:10.21136/cmj.1973.101168
Gansner, E. R., and North, S. C. (2000). An Open Graph Visualization System and its Applications to Software Engineering. Softw. Pract. Exper. 30, 1203–1233. doi:10.1002/1097-024x(200009)3011<1203::aid-spe338>3.0.co;2-n
Gao, H., and Ji, S. (2019). Graph U-Nets. In International Conference on Machine Learning, 2083–2092.
Goller, C., and Kuchler, A. (1996). Learning Task-dependent Distributed Representations by Backpropagation through Structure. ICNN.
Gori, M., Monfardini, G., and Scarselli, F. (2005). A New Model for Learning in Graph Domains. ICNN.
Hajij, M., Wang, B., and Rosen, P. (2018). MOG: Mapper on Graphs for Relationship Preserving Clustering. arXiv preprint arXiv:1804.11242.
Huang, J., Li, Z., Li, N., Liu, S., and Li, G. (2019). AttPool: Towards Hierarchical Feature Representation in Graph Convolutional Networks via Attention Mechanism. In Proceedings of the IEEE International Conference on Computer Vision. 6480–6489.
Kersting, K., Kriege, N. M., Morris, C., Mutzel, P., and Neumann, M. (2016). Benchmark Data Sets for Graph Kernels.
Kingma, D. P., and Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.
Kipf, T. N., and Welling, M. (2016). Semi-Supervised Classification with Graph Convolutional Networks. arXiv preprint arXiv:1609.02907.
Lee, J., Lee, I., and Kang, J. (2019). Self-Attention Graph Pooling. In International Conference on Machine Learning. 3734–3743.
Li, Y., Tarlow, D., Brockschmidt, M., and Zemel, R. (2015). Gated Graph Sequence Neural Networks. arXiv preprint arXiv:1511.05493.
Luzhnica, E., Day, B., and Lio, P. (2019). Clique Pooling for Graph Classification. arXiv preprint arXiv:1904.00374.
Ma, Y., Wang, S., Aggarwal, C. C., and Tang, J. (2019). Graph Convolutional Networks with EigenPooling. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 723–731.
McInnes, L., Healy, J., and Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. ArXiv e-prints.
Nobre, C., Meyer, M., Streit, M., and Lex, A. (2019). The State of the Art in Visualizing Multivariate Networks. Comp. Graphics Forum 38, 807–832. doi:10.1111/cgf.13728
Page, L., Brin, S., Motwani, R., and Winograd, T. (1999). The PageRank Citation Ranking: Bringing Order To the Web. Tech. Rep. Stanford InfoLab.
Ranjan, E., Sanyal, S., and Talukdar, P. P. (2019). ASAP: Adaptive Structure Aware Pooling for Learning Hierarchical Graph Representations. arXiv preprint arXiv:1911.07979.
Scarselli, F., Gori, M., Ah Chung Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. (2009). Computational Capabilities of Graph Neural Networks. IEEE Trans. Neural Netw. 20, 81–102. doi:10.1109/TNN.2008.2005141
Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., and Eliassi-Rad, T. (2008). Collective Classification in Network Data. AIMag 29, 93. doi:10.1609/aimag.v29i3.2157
Shervashidze, N., Schweitzer, P., Van Leeuwen, E. J., Mehlhorn, K., and Borgwardt, K. M. (2011). Weisfeiler-Lehman Graph Kernels. J. Machine Learn. Res. 12, 2539–2561.
Singh, G., Mémoli, F., and Carlsson, G. E. (2007). Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition. In Eurographics Symposium on Point-Based Graphics (SPBG), 91–100.
Tenenbaum, J. B., Silva, V. d., and Langford, J. C. (2000). A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science 290, 2319–2323. doi:10.1126/science.290.5500.2319
van den Elzen, S., and van Wijk, J. J. (2014). Multivariate Network Exploration and Presentation: From Detail to Overview via Selections and Aggregations. IEEE Trans. Vis. Comput. Graphics 20, 2310–2319. doi:10.1109/tvcg.2014.2346441
van der Maaten, L., and Hinton, G. (2008). Visualizing Data Using T-SNE. J. Machine Learn. Res. 9, 2579–2605.
Veličković, P., Fedus, W., Hamilton, W. L., Liò, P., Bengio, Y., and Hjelm, R. D. (2018). Deep Graph Infomax. arXiv preprint arXiv:1809.10341.
von Landesberger, T., Kuijper, A., Schreck, T., Kohlhammer, J., van Wijk, J. J., Fekete, J.-D., et al. (2011). Visual Analysis of Large Graphs: State-Of-The-Art and Future Research Challenges. Comp. Graphics Forum 30, 1719–1749. doi:10.1111/j.1467-8659.2011.01898.x
Wattenberg, M. (2006). Visual Exploration of Multivariate Graphs. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’06. New York, NY, USA: Association for Computing Machinery, 811–819. doi:10.1145/1124772.1124891
Yang, Z., Cohen, W. W., and Salakhutdinov, R. (2016). Revisiting Semi-supervised Learning with Graph Embeddings. ICML.
Ying, Z., You, J., Morris, C., Ren, X., Hamilton, W., and Leskovec, J. (2018). Hierarchical Graph Representation Learning with Differentiable Pooling. In Advances in Neural Information Processing Systems, 4800–4810.
Appendix
A Model Architecture and Hyperparameters.
We additionally performed a hyperparameter search for DiffPool on hidden sizes
• all datasets: cover sizes
• D&D: learning rate
• Proteins: learning rate
Keywords: mapper, graph neural networks, pooling, graph summarization, graph classification
Citation: Bodnar C, Cangea C and Liò P (2021) Deep Graph Mapper: Seeing Graphs Through the Neural Lens. Front. Big Data 4:680535. doi: 10.3389/fdata.2021.680535
Received: 14 March 2021; Accepted: 24 May 2021;
Published: 16 June 2021.
Edited by:
Umberto Lupo, École Polytechnique Fédérale de Lausanne, Switzerland
Reviewed by:
Mustafa Hajij, Santa Clara University, United States
Stanislav Schmidt, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Copyright © 2021 Bodnar, Cangea and Liò. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Cristian Bodnar, cb2015@cam.ac.uk; Cătălina Cangea, ccc53@cam.ac.uk