- ¹ Department of Mathematics, University of Pisa, Pisa, Italy
- ² Institute of Information Science and Technologies “A. Faedo”, National Research Council of Italy (CNR), Pisa, Italy
- ³ Department of Mathematics, University of Bologna, Bologna, Italy
- ⁴ Alma Mater Research Center on Applied Mathematics, University of Bologna, Bologna, Italy
- ⁵ Alma Mater Research Institute for Human-Centered Artificial Intelligence, University of Bologna, Bologna, Italy
- ⁶ Research Centre on Electronic Systems for the Information and Communication Technology, University of Bologna, Bologna, Italy
- ⁷ ENEA Centro Ricerche Bologna, Bologna, Italy
Group Equivariant Operators (GEOs) are a fundamental tool in the research on neural networks, since they make available a new kind of geometric knowledge engineering for deep learning, which can exploit symmetries in artificial intelligence and reduce the number of parameters required in the learning process. In this paper we introduce a new method to build non-linear GEOs and non-linear Group Equivariant Non-Expansive Operators (GENEOs), based on the concepts of symmetric function and permutant. This method is particularly interesting because of the good theoretical properties of GENEOs and the ease of use of permutants to build equivariant operators, compared to the direct use of the equivariance groups we are interested in. In our paper, we prove that the technique we propose works for any symmetric function, and benefits from the approximability of continuous symmetric functions by symmetric polynomials. A possible use in Topological Data Analysis of the GENEOs obtained by this new method is illustrated.
1. Introduction
In recent years, the theory of equivariant operators has become a topic of great interest to the scientific community, since these operators make it possible to exploit symmetries explicitly in deep learning and artificial intelligence (Mallat, 2012, 2016; Bengio et al., 2013; Zhang et al., 2015; Anselmi et al., 2016, 2019; Cohen and Welling, 2016; Worrall et al., 2017), thereby reducing the number of parameters required in the learning process. In particular, group equivariant non-expansive operators (GENEOs) have recently been proposed as elementary components for building new kinds of neural networks, benefiting from good mathematical properties, such as compactness and convexity, under suitable assumptions on the space of data and with respect to the choice of appropriate topologies (Bergomi et al., 2019). Compactness guarantees total boundedness, i.e., for every ε > 0 we can find a finite set $\{F_1, \ldots, F_k\}$ of GENEOs such that any other GENEO has a distance less than ε from at least one $F_i$. This property opens the way to the search for methods to effectively build such a representative set, leading us to look for new techniques to produce GENEOs.
GENEOs are grounded in Topological Data Analysis (TDA) and allow us to shift the attention from the data to the observers who process them, and to the properties of invariance and simplification associated with those observers. The use of these operators is justified by the fact that in most cases we are not directly interested in the data, but in approximating the experts' behavior in the presence of given data (Frosini, 2016). Since different agents can have different reactions in the presence of the same data, it is clear that data analysis has to be based on each pair (data, observer) rather than on data alone. From the point of view of AI, the focus on GENEOs corresponds to the rising interest in the so-called “explainable deep learning” (Rudin, 2019; Carrieri et al., 2021; Hicks et al., 2021), which looks for methods and techniques that can be understood by humans.
GENEOs transform data according to two properties. First of all, they are equivariant with respect to the action of a given transformation group, i.e., they commute with such a group. Secondly, they do not increase the distance between data. This kind of regularity is frequently found in applications, since in several cases the operators we use are required to simplify the metric structure of data. We can obviously imagine particular applications where this condition is locally violated, but the usual long-term goal is to produce representations that are much simpler and more meaningful than the original data, thereby leading us to assume that the considered (compositions of any sufficiently long chain of) operators are non-expansive. This assumption is not only of use to simplify the information we have to manage, but it is also fundamental in the proof that the space of group equivariant non-expansive operators is compact (and hence finitely approximable), provided that the space of data is compact with respect to a suitable topology (Bergomi et al., 2019). This statement becomes false if we renounce non-expansivity.
The use of GENEOs is not limited to machine learning. Another important reason for the study of these operators follows from the relationship between GENEOs and TDA. We indeed know that TDA and Persistent Homology allow for a qualitative and efficient geometric study of the data space, but suffer from some important limitations, since Persistent Homology alone is not able to distinguish between some functions. Fortunately, the joint use of TDA and GENEOs overcomes this difficulty in the discrimination of data (Frosini, 2016; Frosini and Jabłoński, 2016; Bergomi et al., 2019). In other words, GENEOs are able to preserve information on the data that would have been lost through TDA alone.
Another interesting aspect of GENEOs is that we can also look at them as operators that change the pseudo-metrics we use in data comparison. If the real-valued functions φ1, φ2 represent the data we have to compare and a GENEO F is given, we can replace the max-norm distance ||φ1 − φ2||∞ with the new pseudo-metric ||F(φ1) − F(φ2)||∞. In this approach, F is not seen as a map that transforms the data we are considering, but as a new way of comparing data. We will see in section 6 that the availability of non-linear GENEOs can indeed produce more flexible pseudo-metrics.
Last but not least, a theory of GENEOs could be a relevant tool in the investigation of the role of internal conflicts in AI. We know that the availability of procedures that emulate intelligence opens the way to the appearance of contradictions, conflicts and unexpected behaviors (Frosini, 2009). This phenomenon cannot be ignored in the mathematical study of AI. The use of a precise geometric formalization of components in machine learning could be of great help in facing and analyzing this emerging problem.
However, the main reason for the research on GENEOs is a shift of interest from the spaces of data to the topological and geometric analysis of the spaces of observers of the data. This naturally leads us to the problem of the efficient approximation of observers. Such an approximation requires large and dense sets of GENEOs to be available, each one representing a possible data-observer interaction. Therefore, since non-linear interactions between observers and data are of great importance in applications, new techniques to build non-linear GENEOs are needed. The main contribution of this paper is a new method to produce non-linear GENEOs through the concepts of symmetric function and permutant, thereby extending the procedure illustrated in Botteghi et al. (2020) for the building of linear GENEOs. In this way, we strictly expand the set of operators we can use in applications.
The concept of permutant comes into play when a set Φ of functions from a space X to ℝ and a group G of permutations of X are given. The set Φ represents the space of signals we are interested in, and is assumed to be preserved by right composition with elements of G. If two signals φ1, φ2 ∈ Φ are obtained from each other by right composition with an element g ∈ G, we say that they are equivalent with respect to G, just as happens when two images are considered equivalent if there exists an isometry changing one into the other. In this setting, a permutant is defined as a finite set H of Φ-preserving permutations of X that is stable under the conjugation action h ↦ g ◦ h ◦ g−1 of any element of G on H (Camporesi et al., 2018).
This paper shows that when a symmetric function and a permutant for the equivariance group G are available, we can easily build a (non-linear) GENEO with respect to G (section 3). This fact justifies the theoretical and practical importance of permutants. Our long-term purpose is to develop an effective theory for the approximation of observers and agents via GENEOs in a topological-geometrical setting, thus extending the use of these operators in deep learning. While this goal is challenging, we think that our approach could lead us to regard GENEOs as elementary components in the building of a new kind of neural network. This idea is justified by at least two reasons. First of all, deep learning could benefit from using components that are guaranteed to be equivariant with respect to given groups of transformations and are grounded in a well-founded topological theory, thereby allowing neural nets to save time in the learning process and to take advantage of techniques developed in TDA. Secondly, an engineering based on GENEOs would be much more transparent, because of the intrinsic interpretability of its components.
The reader could wonder why building GENEOs via permutants should be better than building them by other methods (for example by integrating on the equivariance group G). The key point is that in many applications there exist permutants whose size is much smaller than the size of the equivariance group. In these cases, the approaches based on permutants can be much simpler than the ones based on G. We observe that permutants encode part of the information represented by the data equivalence expressed by G. Of course, by deciding to build GENEOs via permutants we implicitly accept losing some information about such a data equivalence, and make a compromise between the computational complexity and the analytical power of the operators we are interested in. The reader can understand this tradeoff by thinking about the limit case given by a permutant containing just the identity permutation id of X. While the singleton {id} is indeed a (trivial) permutant, it does not give any information about the equivariance group G we are considering, since {id} is a permutant for any group of Φ-preserving permutations of X. However, if we consider a larger and larger set H of Φ-preserving permutations of X, the set of groups admitting H as a permutant becomes smaller and smaller. In other words, larger permutants make the identification of G easier.
This article is part of an extensive research on permutants. In Botteghi et al. (2020) it has been proved that each linear G-equivariant non-expansive operator can be produced by a weighted summation associated with a suitable weighted permutant, provided that the group G acts transitively on a finite signal domain. This paper opens the way to the study of the natural conjecture that each non-linear G-equivariant non-expansive operator can be produced (or at least well approximated) by applying our new technique to suitable symmetric functions and permutants, provided that the group G acts transitively on a finite signal domain. This probably non-trivial problem will be addressed in subsequent papers, building on the results obtained in this article.
The outline of the paper is as follows. In section 2, we recall the main definitions in our mathematical setting. In section 3, we show how to associate a group equivariant operator (GEO) with a symmetric function. Section 4 is devoted to the approximation of a generic continuous symmetric function by a polynomial in the elementary symmetric functions, and in section 5, we finally show how to associate a GENEO with such a polynomial. Section 6 highlights the benefits of our approach.
For more details and proofs about the results and concepts illustrated in section 2 we refer the interested reader to the papers (Frosini, 2016; Frosini and Jabłoński, 2016; Frosini and Quercioli, 2017; Camporesi et al., 2018; Bergomi et al., 2019). The other sections present our new results about the construction of non-linear GENEOs via symmetric functions and permutants.
2. Mathematical Setting
Let X be a non-empty set and consider a non-empty, compact subspace Φ of the normed vector space $(\mathbb{R}^X_b, \|\cdot\|_\infty)$, where $\mathbb{R}^X_b$ is the set of all bounded real-valued functions with domain X, and $\|\varphi\|_\infty := \sup_{x \in X} |\varphi(x)|$. We can think of the functions in Φ as the data, i.e., the measurements provided by our measuring instruments (or by any operator), and of X as the space where the measurements are made. Sometimes the functions in Φ are also referred to as admissible filtering functions or admissible signals. We now recall the usual setting for the introduction of group equivariant non-expansive operators. We endow Φ with the topology induced by the uniform convergence distance

$$D_\Phi(\varphi_1, \varphi_2) := \|\varphi_1 - \varphi_2\|_\infty.$$

At this stage, X is only a set. We endow X with the topology induced by the pseudo-metric

$$D_X(x_1, x_2) := \sup_{\varphi \in \Phi} |\varphi(x_1) - \varphi(x_2)|.$$

The idea behind this definition is that two points x1, x2 ∈ X are considered different only if they are taken to different values by at least one admissible filtering function.
We recall that a pseudo-metric space is a generalization of a metric space in which the distance between two distinct points can be zero. Moreover, a function f from a pseudo-metric space (P1, d1) to a pseudo-metric space (P2, d2) is called non-expansive if

$$d_2(f(x), f(y)) \le d_1(x, y)$$

for every x, y ∈ P1.
Remark 2.1. Every function φ ∈ Φ is non-expansive with respect to the pseudo-metric DX on X and the Euclidean metric on ℝ. Therefore, each function φ ∈ Φ is continuous with respect to these topologies.
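The pseudo-metric on X and the non-expansivity stated in Remark 2.1 can be illustrated on a toy case. The following is a minimal sketch, assuming a finite set X = {0, 1, 2} and a hypothetical finite family of signals stored as tuples (the names `PHIS`, `pseudo_metric` and `signals_are_non_expansive` are ours and are used for illustration only); on a finite family the supremum is a maximum.

```python
from itertools import product

def pseudo_metric(phis, x1, x2):
    """D_X(x1, x2): largest gap |phi(x1) - phi(x2)| over the admissible signals."""
    return max(abs(phi[x1] - phi[x2]) for phi in phis)

def signals_are_non_expansive(phis, dist):
    """Check |phi(x) - phi(y)| does not exceed D_X(x, y) for every signal and pair."""
    return all(dist[(x, y)] >= abs(phi[x] - phi[y])
               for phi in phis for (x, y) in dist)

# Hypothetical toy data: X = {0, 1, 2}; each signal is the tuple (phi(0), phi(1), phi(2)).
X = range(3)
PHIS = [(0.0, 0.0, 1.0), (0.5, 0.5, 0.2), (1.0, 1.0, 0.0)]

# Table of all pairwise pseudo-distances on X.
D = {(x, y): pseudo_metric(PHIS, x, y) for x, y in product(X, X)}
```

In this toy family no signal separates the points 0 and 1, so their pseudo-distance vanishes although the points are distinct, which is exactly the situation allowed by a pseudo-metric.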
Since Φ is compact, the topology induced by the pseudo-metric DX coincides with the initial topology τin on X with respect to Φ (see Theorem 2.1 in Bergomi et al., 2019, Supplementary Methods). We recall that the initial topology is the coarsest topology on X which makes each function in Φ continuous. Moreover, the compactness of Φ implies that if X is complete then it is also compact (see Theorem 2.2 in Bergomi et al., 2019, Supplementary Methods). In this work, we assume that X is complete, and therefore compact with respect to the topology induced by DX. The image of X through the filtering functions is denoted by Im(Φ) and is defined as

$$\mathrm{Im}(\Phi) := \{\varphi(x) : \varphi \in \Phi,\ x \in X\}.$$
The following result will be of use in section 5.
Proposition 2.2. If X and Φ are compact, Im(Φ) is compact with respect to the Euclidean topology.
Proof. Let us consider the function γ : Φ × X → ℝ such that γ(φ, x) := φ(x). The space Φ × X is compact with respect to the product topology, which we recall is induced by the sum pseudo-distance. Since the continuous image of a compact space is compact, γ(Φ × X) = Im(Φ), and every non-expansive function is continuous, it is sufficient to prove that γ is non-expansive. Given φ1, φ2 ∈ Φ and x1, x2 ∈ X, we have that

$$|\gamma(\varphi_1, x_1) - \gamma(\varphi_2, x_2)| = |\varphi_1(x_1) - \varphi_2(x_2)| \le |\varphi_1(x_1) - \varphi_1(x_2)| + |\varphi_1(x_2) - \varphi_2(x_2)| \le D_X(x_1, x_2) + D_\Phi(\varphi_1, \varphi_2).$$

We have proved that γ is non-expansive, and therefore Im(Φ) is compact.
Definition 2.3 (Chachólski et al., 2020). A Φ-operation is a function g : X → X such that, for every φ ∈ Φ, the composition φ ◦ g also belongs to Φ.
Definition 2.4. A Φ-operation g is invertible if there is a Φ-operation h such that g ◦ h = h ◦ g = idX.
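We denote the collection of all invertible Φ-operations by AutΦ(X). In other words,

$$\mathrm{Aut}_\Phi(X) := \{g : X \to X \text{ bijective} \mid \varphi \circ g \in \Phi \text{ and } \varphi \circ g^{-1} \in \Phi \text{ for every } \varphi \in \Phi\}.$$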
We note that AutΦ(X) is a group with respect to the usual composition operation.
Definition 2.5. A perception pair is an ordered pair (Φ, G) where $\Phi \subseteq \mathbb{R}^X_b$ and G is a subgroup of AutΦ(X).
As an example, (Φ, AutΦ(X)) is always a perception pair.
Remark 2.6. When a perception pair (Φ, G) is given, each element g ∈ G acts on the set Φ by right composition, taking each function φ ∈ Φ to the function φ ◦ g.
2.1. Group Equivariant Non-Expansive Operators
Definition 2.7. Let us consider two perception pairs (Φ, G), (Ψ, H) and a homomorphism T : G → H. Each map F : Φ → Ψ such that F is T-equivariant (i.e., F(φ ◦ g) = F(φ) ◦ T(g) for every φ ∈ Φ, g ∈ G) is called a Group Equivariant Operator (GEO) with respect to T.
Definition 2.8. Let us consider two perception pairs (Φ, G), (Ψ, H) and a homomorphism T : G → H. Each map F : Φ → Ψ such that F is T-equivariant (i.e., F(φ ◦ g) = F(φ) ◦ T(g) for every φ ∈ Φ, g ∈ G) and non-expansive (i.e., ||F(φ) − F(ψ)||∞ ≤ ||φ − ψ||∞ for every φ, ψ ∈ Φ) is called a Group Equivariant Non-Expansive Operator (GENEO) with respect to T.
After fixing two perception pairs (Φ, G), (Ψ, H) and a homomorphism T : G → H, we will use the symbol $\mathcal{F}^{\mathrm{all}}_T$ to denote the collection of all GENEOs with respect to T between such perception pairs. We endow $\mathcal{F}^{\mathrm{all}}_T$ with the topology induced by the metric $D_{\mathrm{GENEO}}(F_1, F_2) := \sup_{\varphi \in \Phi} \|F_1(\varphi) - F_2(\varphi)\|_\infty$. For a more in-depth study of the GENEO topology, we refer the reader to Bergomi et al. (2019). We stress that the non-expansivity of the operators is pivotal for two reasons. The first reason is that we want our operators to simplify the data metric, i.e., not to introduce complexity into the data. The second reason is that non-expansivity allows us to prove the compactness of the space $\mathcal{F}^{\mathrm{all}}_T$, provided that Φ and Ψ are compact with respect to the distances DΦ, DΨ (see Theorem 7 in Bergomi et al., 2019). If we remove the assumption that our operators are non-expansive, this property of compactness does not hold anymore. As an example, let Φ = Ψ be equal to the set of all constant functions from ℝ to [0, 1], and G = H be the trivial group containing just the identity permutation of ℝ. We observe that Φ, X = ℝ and G are compact with respect to the topologies we have defined on them. Let us now consider the sequence (Fn) of GEOs from Φ to Φ with respect to the identity homomorphism idG : G → G, defined by setting for every function φ ∈ Φ and every positive integer n. It is easy to check that for every positive integer m, and hence the sequence (Fn) does not admit any converging subsequence. This implies that the space of all GEOs from Φ to Φ with respect to idG is not compact. The compactness of $\mathcal{F}^{\mathrm{all}}_T$ is a key property in applications, since it guarantees that such a space can be approximated by a finite set.
If G = H and T = idG, we can say that F is a G-equivariant map. From now on, we will make these assumptions, and use the terms GEO and GENEO with reference to this setting.
2.2. Permutants
Definition 2.9. Let SX be the set of permutations of X. For each g ∈ G, the map cg : SX → SX taking each s ∈ SX to g ◦ s ◦ g−1 is called the conjugation action of g ∈ G on SX. For every subset H of SX, we denote the set cg(H) by the symbol gHg−1.
Definition 2.10 (Camporesi et al., 2018). A finite set H ⊆ AutΦ(X) is called a permutant for G if either H = ∅ or gHg−1 = H for every g ∈ G.
Remark 2.11. In general, a permutant is not a normal subgroup of G. Indeed we require neither that H is a group nor that H is a subset of G. We observe that the sets ∅ and {idX} are trivial permutants for any subgroup G of AutΦ(X). Both G and AutΦ(X) are also permutants for G, provided that they are finite groups.
Example 2.12. Let Φ be the set of all functions φ : X = S1 = {(x, y) ∈ ℝ2|x2 + y2 = 1} → [0, 1] that are non-expansive with respect to the Euclidean distances on S1 and [0, 1]. Let us consider the group G of all isometries of ℝ2 that take S1 to itself. If h is the clockwise rotation of ℓ radians for a fixed ℓ ∈ ℝ, then the set H = {h, h−1} is a permutant for G.
Example 2.13. Let Φ be the set of all functions φ : X = S1 = {(x, y) ∈ ℝ2|x2 + y2 = 1} → [0, 1] that are non-expansive with respect to the Euclidean distances on S1 and [0, 1]. Let G be the group generated by the reflection with respect to the axis x = 0. If ρ is the clockwise rotation of π/2 around the origin (0, 0), then the set is a permutant for G.
Example 2.14. Let us consider the set X of the vertices of a cube in ℝ3, and assume that Φ is the set of all functions from X to [0, 1]. Let G be the group of the orientation-preserving isometries of ℝ3 that take X to X. Let π1, π2, π3 be the three planes that contain the center of mass of X and are parallel to a face of the cube. Let hi : X → X be the orthogonal symmetry with respect to πi, for i ∈ {1, 2, 3}. We have that the set H = {h1, h2, h3} is an orbit under the conjugation action of G on AutΦ(X), and therefore a permutant for G.
Remark 2.15. If the group G is Abelian, every finite subset of G is a permutant for G, since the conjugation action of G on AutΦ(X) is just the identity.
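On a finite set X, the permutant condition gHg−1 = H of Definition 2.10 can be checked by brute force. The sketch below assumes X = {0, 1, 2}, Φ equal to the set of all functions from X to [0, 1] (so that every permutation of X is Φ-preserving), and permutations stored as tuples; the helper names are ours and purely illustrative.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q, i.e. x -> p[q[x]]."""
    return tuple(p[q[x]] for x in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)

def is_permutant(H, G):
    """Check that g H g^{-1} = H for every g in G (the empty set passes trivially)."""
    H = set(H)
    return all({compose(compose(g, h), inverse(g)) for h in H} == H for g in G)

S3 = list(permutations(range(3)))        # equivariance group G = S3
THREE_CYCLES = [(1, 2, 0), (2, 0, 1)]    # closed under conjugation by S3
ONE_TRANSPOSITION = [(1, 0, 2)]          # conjugation moves it to other transpositions
IDENTITY_ONLY = [(0, 1, 2)]              # the trivial permutant {id}
```

The set of the two 3-cycles passes the check even though it is neither a group nor the whole of G, while a single transposition fails because its conjugates are the other transpositions.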
Remark 2.16. In this section, the symbol || · ||∞ has been used to denote the max-norm of functions. With a slight abuse of notation, in the rest of the paper such a symbol will be also used to denote the max-norm of points of ℝm, i.e., $\|(a_1, \ldots, a_m)\|_\infty := \max_{1 \le i \le m} |a_i|$.
3. Building GEOs From Symmetric Functions
Definition 3.1. Let C be a symmetric subset of ℝn, i.e., a subset C such that π(C) = C for every permutation π of the coordinates. A function f : C → ℝ is said to be symmetric on C if its value is the same no matter the order of its arguments. That is,

$$f(a_1, \ldots, a_n) = f(a_{\pi(1)}, \ldots, a_{\pi(n)})$$

for every (a1, …, an) ∈ C and every permutation π of the set {1, …, n}.
Proposition 3.2. Let f be a continuous real-valued symmetric function defined on a compact symmetric subset K of ℝn. Then f is the restriction of a continuous real-valued symmetric function defined on ℝn.
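Proof. The Tietze Extension Theorem (Dugundji, 1966) implies that f can be extended to a continuous function $\hat{f} : \mathbb{R}^n \to \mathbb{R}$. If Sn is the symmetric group over the set {1, …, n}, we can easily check that the function $\bar{f}(a_1, \ldots, a_n) := \frac{1}{n!} \sum_{\pi \in S_n} \hat{f}(a_{\pi(1)}, \ldots, a_{\pi(n)})$ has the wanted property.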
Proposition 3.2 guarantees that the concept of continuous real-valued symmetric function defined on a compact symmetric subset K of ℝn coincides with the concept of restriction to K of a continuous real-valued symmetric function defined on ℝn.
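Let $\mathcal{S} : \mathbb{R}^n \to \mathbb{R}$ be a symmetric function. If $H = \{h_1, \ldots, h_n\}$ is a non-empty permutant for G ⊆ AutΦ(X), then we can define an operator $\mathcal{S}_H : \Phi \to \mathbb{R}^X_b$ by setting, for any φ ∈ Φ,

$$\mathcal{S}_H(\varphi) := \mathcal{S}(\varphi \circ h_1, \ldots, \varphi \circ h_n),$$

where $\mathcal{S}(\varphi \circ h_1, \ldots, \varphi \circ h_n)(x) := \mathcal{S}(\varphi(h_1(x)), \ldots, \varphi(h_n(x)))$ for every x ∈ X.
Proposition 3.3. If $\mathcal{S} : \mathbb{R}^n \to \mathbb{R}$ is a symmetric function and G ⊆ AutΦ(X), then $\mathcal{S}_H$ is a GEO from Φ to $\mathbb{R}^X_b$ with respect to the identity homomorphism idG : G → G.
Proof. For every g ∈ G, it holds that

$$\mathcal{S}_H(\varphi \circ g) = \mathcal{S}(\varphi \circ g \circ h_1, \ldots, \varphi \circ g \circ h_n) = \mathcal{S}(\varphi \circ h_{\pi_g(1)} \circ g, \ldots, \varphi \circ h_{\pi_g(n)} \circ g) = \mathcal{S}(\varphi \circ h_1 \circ g, \ldots, \varphi \circ h_n \circ g) = \mathcal{S}_H(\varphi) \circ g,$$

where πg is the permutation of {1, …, n} such that $h_{\pi_g(i)} = g \circ h_i \circ g^{-1}$, that is g ◦ hi = hπg(i) ◦ g, and the third equality follows from the symmetry of $\mathcal{S}$. Therefore, $\mathcal{S}_H(\varphi \circ g) = \mathcal{S}_H(\varphi) \circ g$, and hence $\mathcal{S}_H$ is a GEO.
Corollary 3.4. If $\mathcal{S} : \mathbb{R}^n \to \mathbb{R}$ is a symmetric function and its restriction to Im(Φ)n is non-expansive, then $\mathcal{S}_H$ is a GENEO from Φ to $\mathbb{R}^X_b$ with respect to idG.
Proof. It is sufficient to prove the non-expansivity of $\mathcal{S}_H$ with respect to the max-norms on Φ and $\mathbb{R}^X_b$, since the group equivariance is already granted by Proposition 3.3. If φ, ψ ∈ Φ, then for every x ∈ X

$$|\mathcal{S}_H(\varphi)(x) - \mathcal{S}_H(\psi)(x)| = |\mathcal{S}(\varphi(h_1(x)), \ldots, \varphi(h_n(x))) - \mathcal{S}(\psi(h_1(x)), \ldots, \psi(h_n(x)))| \le \max_{1 \le i \le n} |\varphi(h_i(x)) - \psi(h_i(x))| \le \|\varphi - \psi\|_\infty.$$

In conclusion, $\|\mathcal{S}_H(\varphi) - \mathcal{S}_H(\psi)\|_\infty \le \|\varphi - \psi\|_\infty$ and $\mathcal{S}_H$ is a GENEO.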
So far we have shown how to construct GEOs associated with a symmetric function. These operators are actually GENEOs if the function they are associated with is non-expansive. In the next sections, we will show how to adapt this concept to build GENEOs even in the presence of symmetric functions that are not non-expansive.
We stress that our approach requires no integration over the (possibly infinite and large) group G, but just the availability of a permutant and the computation of a symmetric function. This approach generalizes the method introduced in Camporesi et al. (2018), concerning the symmetric function $\mathcal{S}(a_1, \ldots, a_n) := \frac{1}{n} \sum_{i=1}^{n} a_i$.
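The construction of this section can be tried out numerically on a small case. The following is a minimal sketch, assuming X = {0, 1, 2}, G = S3, the permutant made of the two 3-cycles, and signals stored as tuples; the two test functions and all helper names are hypothetical choices of ours, not taken from the paper. It builds the operator φ ↦ S(φ ◦ h1, …, φ ◦ hn) and checks the equality F(φ ◦ g) = F(φ) ◦ g on a few sample signals.

```python
from itertools import permutations

def act(phi, g):
    """Right composition phi∘g for a signal and a permutation stored as tuples."""
    return tuple(phi[g[x]] for x in range(len(phi)))

def operator_from(S, H):
    """Build phi -> S(phi∘h_1, ..., phi∘h_n), evaluated pointwise on X."""
    def F(phi):
        return tuple(S(*(phi[h[x]] for h in H)) for x in range(len(phi)))
    return F

def is_equivariant(F, G, signals):
    """Check F(phi∘g) == F(phi)∘g for every g in G on the sample signals."""
    return all(F(act(phi, g)) == act(F(phi), g) for g in G for phi in signals)

G = list(permutations(range(3)))       # G = S3 acting on X = {0, 1, 2}
H = [(1, 2, 0), (2, 0, 1)]             # the two 3-cycles: a permutant for S3
SIGNALS = [(0.1, 0.5, 0.9), (0.0, 1.0, 0.25), (0.7, 0.7, 0.2)]

def symmetric(a, b):
    """A symmetric, non-linear function of two variables (illustrative choice)."""
    return a * b + max(a, b)

def asymmetric(a, b):
    """A function that is NOT symmetric, used as a negative control."""
    return a - b

F_sym = operator_from(symmetric, H)
F_asym = operator_from(asymmetric, H)
```

With the symmetric function the check succeeds, while the non-symmetric control fails as soon as g is a transposition, since conjugation by a transposition swaps the two 3-cycles. No sum or integral over G is needed to define the operator; G only appears in the verification.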
4. Approximating Symmetric Functions With Symmetric Polynomials
Let us now explore the concept of approximation of symmetric functions by symmetric polynomials. For more details, we refer the reader to Davidson and Donsig (2009), Blum-Smith and Coskey (2017). In the sequel, we will denote the symmetric group over the set {1, …, n} as Sn. Let K be a compact metric space, and $C(K)$ be the vector space of continuous real-valued functions on K. With a slight abuse of notation, in the following we will identify each polynomial with the function it represents, restricted to the domain we are considering. Furthermore, if I is a finite subset of ℕn, we will say that a polynomial $p(y_1, \ldots, y_n) = \sum_{(k_1, \ldots, k_n) \in I} c_{k_1, \ldots, k_n}\, y_1^{k_1} \cdots y_n^{k_n}$ is symmetric if π(I) = I, and $c_{k_1, \ldots, k_n} = c_{k_{\pi(1)}, \ldots, k_{\pi(n)}}$ for every multi-index (k1, …, kn) ∈ I and every permutation π ∈ Sn.
Definition 4.1 (Davidson and Donsig, 2009). A subset A of $C(K)$ is an algebra if it is a vector subspace of $C(K)$ that is closed under multiplication (i.e., if f, g ∈ A then f · g ∈ A). A set S of functions on K separates points if for each pair of distinct points s, t ∈ K there is a function f ∈ S such that f(s) ≠ f(t). A set S of functions on K vanishes at s ∈ K if f(s) = 0 for all f ∈ S.
Theorem 4.2 (Stone-Weierstrass Theorem; Davidson and Donsig, 2009). An algebra A of continuous real-valued functions on a compact metric space K that separates points and does not vanish at any point is dense in $C(K)$ with respect to the max-norm referred to the domain K.
Corollary 4.3 (Davidson and Donsig, 2009). Let K be a compact subset of ℝn. The algebra of all polynomials p(y1, …, yn) in n variables is dense in $C(K)$ with respect to the max-norm referred to the domain K.
This theorem allows us to approximate a continuous symmetric function by a polynomial with arbitrary accuracy, provided that K is a compact subset of ℝn. However, this is not exactly what we need, as we would like such a polynomial to be symmetric. This can be obtained by a symmetrization of the previously found polynomial, as shown by the next proposition, which proves that any symmetric continuous function on a compact and symmetric domain can be approximated with arbitrary precision by a symmetric polynomial.
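Proposition 4.4. Let K be a compact subset of ℝn, satisfying the property π(K) = K for every π ∈ Sn. If $\mathcal{S} : K \to \mathbb{R}$ is the restriction to K of a continuous symmetric function and || · ||∞ is the max-norm referred to the domain K, then for every ε > 0 there exists a symmetric polynomial q in n variables such that $\|\mathcal{S} - q\|_\infty \le \varepsilon$.
Proof. From Corollary 4.3 it follows that there exists a polynomial p : ℝn → ℝ such that $\|\mathcal{S} - p\|_\infty \le \varepsilon$. Let us now define the symmetric polynomial $q(a_1, \ldots, a_n) := \frac{1}{n!} \sum_{\pi \in S_n} p(a_{\pi(1)}, \ldots, a_{\pi(n)})$. If a = (a1, …, an) ∈ K, we define aπ = (aπ(1), …, aπ(n)) for every permutation π ∈ Sn. Then, since $\mathcal{S}(a) = \mathcal{S}(a_\pi)$ and aπ ∈ K for every π ∈ Sn,

$$|\mathcal{S}(a) - q(a)| = \left| \frac{1}{n!} \sum_{\pi \in S_n} \mathcal{S}(a_\pi) - \frac{1}{n!} \sum_{\pi \in S_n} p(a_\pi) \right| \le \frac{1}{n!} \sum_{\pi \in S_n} |\mathcal{S}(a_\pi) - p(a_\pi)| \le \frac{1}{n!} \cdot n! \cdot \varepsilon = \varepsilon.$$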
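The symmetrization step used in this proof can be checked numerically. The sketch below is only an illustration under our own assumptions: the symmetric target |a − b| on K = [0, 1]², an arbitrary non-symmetric polynomial `p` (not a best approximant), and a uniform grid standing in for K. It averages `p` over all permutations of its arguments and compares the sampled max-norm errors.

```python
from itertools import permutations, product
from math import factorial, isclose

def symmetrize(p, n):
    """Return q(a) = (1/n!) * sum over permutations pi of p(a_pi)."""
    perms = list(permutations(range(n)))
    def q(*a):
        return sum(p(*(a[i] for i in perm)) for perm in perms) / factorial(n)
    return q

def sup_error(f, g, points):
    """Max-norm distance between f and g, sampled on the given points."""
    return max(abs(f(*a) - g(*a)) for a in points)

def target(a, b):
    """A continuous symmetric function on [0, 1]^2 (illustrative choice)."""
    return abs(a - b)

def p(a, b):
    """An arbitrary, non-symmetric polynomial playing the role of the approximant."""
    return 0.9 * a + 0.2 * b * b - 0.3 * a * b

q = symmetrize(p, 2)

# A symmetric grid in K = [0, 1]^2: it is mapped to itself by swapping coordinates.
GRID = list(product([i / 10 for i in range(11)], repeat=2))
ERR_P = sup_error(target, p, GRID)
ERR_Q = sup_error(target, q, GRID)
```

Because the grid is invariant under swapping coordinates, the inequality in the proof applies point by point, so the sampled error of `q` cannot exceed that of `p` (up to rounding).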
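Definition 4.5. The elementary symmetric polynomials in the n variables a1, …, an, also called elementary symmetric functions, are defined as:

$$\sigma_1(a_1, \ldots, a_n) := \sum_{i=1}^{n} a_i, \qquad \sigma_2(a_1, \ldots, a_n) := \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} a_i a_j, \qquad \ldots, \qquad \sigma_n(a_1, \ldots, a_n) := a_1 a_2 \cdots a_n.$$

In general, for k ∈ {1, …, n},

$$\sigma_k(a_1, \ldots, a_n) := \sum_{J \subseteq \{1, \ldots, n\},\ |J| = k}\ \prod_{j \in J} a_j.$$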
We now recall an important result in the theory of symmetric polynomials:
Theorem 4.6 (Fundamental Theorem on Symmetric Polynomials; Blum-Smith and Coskey, 2017). Any symmetric polynomial in n variables a1, …, an is representable in a unique way as a polynomial in the elementary symmetric polynomials σ1, …, σn.
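As a concrete instance of this theorem, the power sums of three variables can be rewritten through σ1, σ2, σ3 by the classical Newton identities. The following sketch, with helper names of our own choosing, evaluates both sides on integer points so that the comparison is exact.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(k, a):
    """sigma_k(a): sum of the products of all k-element sub-tuples of a."""
    return sum(prod(c) for c in combinations(a, k))

def power_sum(k, a):
    """p_k(a) = a_1^k + ... + a_n^k, a symmetric polynomial."""
    return sum(x ** k for x in a)

def power_sum_2_via_sigma(a):
    """p_2 = sigma_1^2 - 2*sigma_2."""
    s1, s2 = elementary_symmetric(1, a), elementary_symmetric(2, a)
    return s1 ** 2 - 2 * s2

def power_sum_3_via_sigma(a):
    """p_3 = sigma_1^3 - 3*sigma_1*sigma_2 + 3*sigma_3 (three variables)."""
    s1, s2, s3 = (elementary_symmetric(k, a) for k in (1, 2, 3))
    return s1 ** 3 - 3 * s1 * s2 + 3 * s3

POINT = (2, -1, 3)
```

For the point (2, −1, 3) one has σ1 = 4, σ2 = 1 and σ3 = −6, and both representations give 14 for the sum of squares and 34 for the sum of cubes.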
Remark 4.7. It is a well-known fact (see Rao, 2005; Davidson and Donsig, 2009; Blum-Smith and Coskey, 2017) that the proofs of Theorems 4.2 and 4.6 are constructive. This means that, if K is a compact symmetric subset of ℝn, and the restriction $\mathcal{S} : K \to \mathbb{R}$ of a continuous symmetric function is given, we are able to effectively approximate $\mathcal{S}$ with an error less than ε by the restriction to K of an explicitly defined polynomial in the elementary symmetric functions.
In conclusion, if an equivariance group G is chosen and a GEO F is built by applying Proposition 3.3 to a continuous symmetric function $\mathcal{S} : \mathbb{R}^n \to \mathbb{R}$ and a permutant $H = \{h_1, \ldots, h_n\}$, we can approximate F in the following way, provided that X and Φ are compact. First of all, we can approximate the continuous function $\mathcal{S}$ by a polynomial p : ℝn → ℝ, with an arbitrarily small error ε on the symmetric set Im(Φ)n, which is guaranteed to be compact by Proposition 2.2. Then, we can consider the symmetric polynomial $q(a_1, \ldots, a_n) := \frac{1}{n!} \sum_{\pi \in S_n} p(a_{\pi(1)}, \ldots, a_{\pi(n)})$. Finally, we can consider the GEO F′ defined by setting $F'(\varphi) := q(\varphi \circ h_1, \ldots, \varphi \circ h_n)$ for every φ ∈ Φ. Since H ⊆ AutΦ(X), $\|F(\varphi) - F'(\varphi)\|_\infty \le \varepsilon$ for any φ ∈ Φ, and hence the operator F′ can be chosen arbitrarily close to F.
5. Building GENEOs From Polynomials in the Elementary Symmetric Functions
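Proposition 2.2 shows that Im(Φ) is compact. Moreover, the equality π(Im(Φ)n) = Im(Φ)n trivially holds for every π ∈ Sn. Therefore, Proposition 4.4 and the Fundamental Theorem on Symmetric Polynomials guarantee that the restriction to Im(Φ)n of any continuous symmetric function can be approximated arbitrarily well by the restriction to Im(Φ)n of a polynomial $\mathcal{P}$ in the elementary symmetric functions, defined as

$$\mathcal{P}(a_1, \ldots, a_n) := \sum_{k_1=0}^{m_1} \cdots \sum_{k_n=0}^{m_n} c_{k_1, \ldots, k_n}\, \sigma_1(a_1, \ldots, a_n)^{k_1} \cdots \sigma_n(a_1, \ldots, a_n)^{k_n},$$

where mi ∈ ℕ for every i ∈ {1, …, n}, ck1, …, kn ∈ ℝ for every k1 ∈ {0, …, m1}, …, kn ∈ {0, …, mn} and σi is the i-th elementary symmetric polynomial for every i ∈ {1, …, n}. From Proposition 3.3, we already know that the associated operator $\mathcal{P}_H$ is a GEO. We can indeed obtain a GENEO by applying Corollary 3.4 to a suitable multiple of $\mathcal{P}$. In the sequel, we will need the following constants:
Let us consider a non-empty permutant $H = \{h_1, \ldots, h_n\}$ for G ⊆ AutΦ(X). We can define an operator $\overline{\mathcal{P}}_H : \Phi \to \mathbb{R}^X_b$ by setting

$$\overline{\mathcal{P}}_H(\varphi) := \frac{1}{C}\, \mathcal{P}(\varphi \circ h_1, \ldots, \varphi \circ h_n)$$

for any φ ∈ Φ, where $\mathcal{P}(\varphi \circ h_1, \ldots, \varphi \circ h_n)(x) := \mathcal{P}(\varphi(h_1(x)), \ldots, \varphi(h_n(x)))$ for every x ∈ X and C is the constant defined in (5.4).
Theorem 5.1. If $\mathcal{P}$ is a polynomial in the n elementary symmetric functions, then $\overline{\mathcal{P}}_H$ is a GENEO from Φ to $\mathbb{R}^X_b$ with respect to idG.
Proof. The thesis immediately follows from Corollary 3.4, once it is proved that the restriction of $\frac{1}{C}\mathcal{P}$ to Im(Φ)n is non-expansive. For every , by applying Lemma 1.5 in Appendix 1, we have that
We have shown that the restriction of $\frac{1}{C}\mathcal{P}$ to Im(Φ)n is non-expansive and therefore the associated operator $\overline{\mathcal{P}}_H$ is a GENEO.□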
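The role of the rescaling can be seen on the simplest non-linear case, the elementary symmetric function σ2(a1, a2) = a1 · a2. The sketch below assumes X = {0, 1, 2}, signals with values in [0, 1], and the permutant made of the two 3-cycles of S3. The scale 2 used here is our own elementary bound, obtained from |a1a2 − b1b2| ≤ |a1 − b1| · |a2| + |b1| · |a2 − b2| ≤ 2 · max(|a1 − b1|, |a2 − b2|) on [0, 1]²; it is not claimed to coincide with the constant C of (5.4).

```python
import random

H = [(1, 2, 0), (2, 0, 1)]   # the two 3-cycles of S3, a permutant for S3

def product_operator(phi, scale):
    """phi -> (1/scale) * (phi∘h_1) * (phi∘h_2), evaluated pointwise on X."""
    return tuple(phi[H[0][x]] * phi[H[1][x]] / scale for x in range(3))

def sup_dist(u, v):
    """Max-norm distance between two signals stored as tuples."""
    return max(abs(a - b) for a, b in zip(u, v))

def is_non_expansive(scale, pairs, tol=1e-12):
    """Check ||F(p) - F(q)|| does not exceed ||p - q|| on the sample pairs."""
    return all(
        sup_dist(p, q) + tol
        >= sup_dist(product_operator(p, scale), product_operator(q, scale))
        for p, q in pairs
    )

rng = random.Random(0)  # fixed seed so the sample is reproducible

def random_signal():
    """A random signal from X = {0, 1, 2} to [0, 1]."""
    return tuple(rng.random() for _ in range(3))

PAIRS = [(random_signal(), random_signal()) for _ in range(1000)]
# A pair of nearby constant signals, where the unscaled product expands distances.
PAIRS.append(((1.0, 1.0, 1.0), (0.9, 0.9, 0.9)))
```

On the last pair the unscaled operator sends two signals at distance 0.1 to signals at distance about 0.19, so it is a GEO but not a GENEO, whereas the operator divided by 2 passes the check on the whole sample.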
Example 5.2. Let us consider the setting of Example 2.12 and the polynomial . Then the operator
is a GENEO.
Example 5.3. Let us consider the setting of Example 2.13 and the polynomial . In this case, the operator
is a GENEO.
Example 5.4. Let us consider the setting of Example 2.14 and the polynomial . Then the operator
is a GENEO.
Remark 5.5. It is worth noticing that if we replace the constant C, defined as in (5.4), with any constant not smaller than C, the operator defined as in Theorem 5.1 is still a GENEO. However, it should be specified that the larger the constant is, the more difficult it becomes to distinguish different signals, since their distance is smaller. For this reason, a larger constant means that more information is lost when applying the GENEO. Nonetheless, in some cases rough constants which are easier to compute may be preferred. For this reason, we present here a possible alternative to C. Let and , where is defined as in (5.1). We define the constant
We can easily show that . Therefore, the operator , obtained by replacing C with in the definition of and applying Theorem 5.1, is still a GENEO. In Example 5.2, , which is far larger than C = 40. In Example 5.3, , while C = 1,040. Finally, in Example 5.4 and C = 162. We stress that the constant C defined in (5.4) is optimal if we do not add any further assumption. We can realize this by applying Theorem 5.1 to the symmetric function $a_1 \mapsto a_1$ in just one variable, provided that Φ is the collection of all non-expansive functions from X := [0, 1] to itself, and we set both G and the permutant H equal to the trivial group containing only the identity of X. On the one hand, it can be easily checked that in this case the GEO defined by applying Proposition 3.3 is the identity map, and hence a GENEO. Therefore, any constant C′ that could replace C in the definition of the operator of Theorem 5.1 must be not smaller than 1 in order to preserve the non-expansivity of that operator. On the other hand, we can immediately see that C = 1. It follows that C′ ≥ C.
6. GENEOs Increase Our Ability to Distinguish Data
In this section, we illustrate a few examples showing that our approach can produce useful non-linear GENEOs and increase our ability to distinguish data. As already discussed in the Introduction, it is of fundamental importance to make a large number of GENEOs available in machine learning, as each of them models a data-observer pair. The results presented in this paper could be of great help in the task of extending the set of available GENEOs and the consequent possibilities of using them in applications.
The next example shows that our new method indeed extends the approach introduced in Botteghi et al. (2020).
Example 6.1. Let us set X = {1, 2, 3} and Φ equal to the set of all functions from X to [0, 1]. In this case, AutΦ(X) = S3, and we define G = S3. We now consider the symmetric function and the permutant H = {(1, 2, 3), (1, 3, 2)} =:{h1, h2}. From Theorem 5.1 we get that is a GENEO. Such an operator is not linear, and hence it cannot be obtained by the method described in Botteghi et al. (2020).
In next Example 6.2 we illustrate the synergy between GENEOs and TDA. TDA is mainly grounded on Persistent Homology (PH), which is an algebraic topological theory devised to describe “holes” in geometrical data, focusing on their persistence under the action of noise. In particular, TDA takes benefit from the topological comparison of data by means of persistence diagrams, which are the main tools in PH. For more details about TDA and PH, we refer the interested reader to Edelsbrunner and Harer (2008), Edelsbrunner and Morozov (2013). Example 6.2 shows that the use of new GENEOs increases our ability of distinguishing data by persistence diagrams.
Example 6.2. Let us consider the functions φ(x) = |sin(x)| and ψ(x) = sin²(x), both belonging to Φ, where Φ is the space of all 1-Lipschitz functions from the unit circle S¹ to [0, 1] and the invariance group G is composed of all rotations of S¹. If we look at Φ only through Persistent Homology, then φ and ψ are indistinguishable, since they induce the same persistence diagram (see Figure 1). Let us now consider the GENEOs F1 = id : Φ → Φ and , where is the clockwise rotation through a angle. We highlight the fact that the joint use of TDA, F1 and F2 does not allow us to distinguish φ from ψ, since the functions F1(φ) and F1(ψ) have the same persistence diagram, and the same happens for F2(φ) and F2(ψ) (see Figures 2, 3). However, if we consider the symmetric function σ2 and the permutant , from Theorem 5.1 we get that is a non-linear GENEO. We observe that the joint use of TDA and allows us to distinguish φ and ψ, since the persistence diagrams of the functions and are different from each other (see Figure 4).
Figure 4. and have different persistence diagrams, and hence they are distinguishable by Persistent Homology. The bottleneck distance between their persistence diagrams is 0.0625.
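The mechanism behind Example 6.2 can be reproduced numerically. The Python sketch below samples φ and ψ on the circle, computes the finite points of their 0-dimensional sublevel-set persistence diagrams with a small union-find routine, and then repeats the computation after applying an operator of the form f ↦ σ2(f, f ∘ ρ)/C. Several choices here are assumptions of ours and are not taken from the text: the restriction to degree 0, the sampling size `N`, the rotation ρ by a quarter turn, the constant C = 2 (which keeps the product of two 1-Lipschitz functions with values in [0, 1] both 1-Lipschitz and non-expansive), and all function names. The resulting diagrams therefore need not coincide with those shown in the figures.

```python
import math

def persistence0(values, tol=1e-9):
    """Finite 0-dim sublevel-set persistence pairs of a function sampled
    on a cycle graph (vertex i is adjacent to i - 1 and i + 1 modulo n)."""
    n = len(values)
    parent = [None] * n          # None marks a vertex not yet inserted
    birth = {}                   # birth value of the component created at a vertex

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    pairs = []
    for v in sorted(range(n), key=lambda i: values[i]):
        parent[v] = v
        birth[v] = values[v]
        for u in ((v - 1) % n, (v + 1) % n):
            if parent[u] is None:
                continue
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            # Elder rule: the component born later dies at the current value.
            young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
            if values[v] - birth[young] > tol:
                pairs.append((round(birth[young], 6), round(values[v], 6)))
            parent[young] = old
    return sorted(pairs)

N = 400                          # number of samples on the circle (assumed)
SHIFT = N // 4                   # illustrative rotation by a quarter turn
C = 2.0                          # assumed normalizing constant

def rotated_product(f):
    """Hypothetical operator f -> f(x) * f(rho(x)) / C on sampled data."""
    return [f[k] * f[(k + SHIFT) % N] / C for k in range(N)]

phi = [abs(math.sin(2 * math.pi * k / N)) for k in range(N)]
psi = [math.sin(2 * math.pi * k / N) ** 2 for k in range(N)]

print(persistence0(phi), persistence0(psi))
print(persistence0(rotated_product(phi)))
print(persistence0(rotated_product(psi)))
```

In this setting φ and ψ share the same critical values (minima 0 and maxima 1), which is why their diagrams coincide, whereas the two products |sin(2x)|/4 and sin²(2x)/8 have different maxima and are therefore separated by their diagrams.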
Finally, we stress another important aspect of the GENEOs. The use of GENEOs can be seen as a methodology for changing the max-norm metric that we use to compare functions in Φ. We can indeed change the distance ||φ1 − φ2||∞ into the pseudo-metric ||F(φ1) − F(φ2)||∞, where F is a GENEO. In Example 6.3 we show how the pseudo-metrics associated with non-linear GENEOs are much more flexible than those generated by linear operators, thus guaranteeing a wider range of applications.
Example 6.3. Let Φ be the set of all functions from the two-element set X = {A, B} to [0, 1]. The functions in Φ can be described by ordered pairs (φ(A), φ(B)). In this setting, the group G is composed of the permutations of two elements. As usual, we endow Φ with the metric DΦ(φ1, φ2) = ||φ1 − φ2||∞, so that the distance between any two distinct functions among (0, 0), (0, 1), (1, 0), (1, 1) is 1. Suppose that we need a GENEO F such that the pseudo-metric ||F(φ1) − F(φ2)||∞ vanishes between functions with a null component, while keeping positive the distance between (1, 1) and the other three functions. No linear GENEO can induce such a pseudo-metric: if a linear transformation maps (1, 0) and (0, 1) to (0, 0), it must also map (1, 1) = (1, 0) + (0, 1) to (0, 0). In contrast, the GENEO associated with the elementary symmetric function σ2(a1, a2) = a1 · a2 and with the permutant H = G induces a pseudo-metric with the desired property.
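The pseudo-metric of Example 6.3 can be tabulated directly. The Python sketch below builds the operator φ ↦ σ2(φ ∘ id, φ ∘ s)/C, where s swaps A and B, and compares the original and the induced distances on the four functions considered above. The constant C = 2 is an assumption of ours: it suffices for non-expansivity on [0, 1], since |a1a2 − b1b2| ≤ |a1 − b1| + |a2 − b2|, but it is not necessarily the constant prescribed by Theorem 5.1. All identifiers are hypothetical.

```python
from itertools import product

X = ("A", "B")
identity = {"A": "A", "B": "B"}
swap = {"A": "B", "B": "A"}
C = 2.0   # assumed normalizing constant, see the remark above

def geneo(phi):
    """Map phi (a dict X -> [0, 1]) to sigma_2(phi∘id, phi∘swap) / C."""
    return {x: phi[identity[x]] * phi[swap[x]] / C for x in X}

def sup_dist(f, g):
    """Max-norm distance between two functions on X."""
    return max(abs(f[x] - g[x]) for x in X)

# The four functions (0, 0), (0, 1), (1, 0), (1, 1), indexed by their values.
corners = {v: dict(zip(X, v)) for v in product((0.0, 1.0), repeat=2)}

for u, v in product(corners, repeat=2):
    if u < v:
        original = sup_dist(corners[u], corners[v])
        induced = sup_dist(geneo(corners[u]), geneo(corners[v]))
        print(u, v, original, induced)
```

The induced distance is 0 among the three functions with a null component and 1/2 between (1, 1) and each of them, while the original distance is 1 in every case.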
7. Conclusions
In our paper, we have introduced a new method to build GENEOs, grounded on the concepts of symmetric function and permutant. Our main goal is to build a good theory of GENEOs, making available methods to define and use these operators in machine learning. The main advantage of our approach is that it requires no integration over the (possibly large or even infinite) group G, but just the availability of a permutant and the computation of a symmetric function. Many lines of research still remain to be explored in this field. For example, the reader can observe that in section 3 the requirement that the function is symmetric could be weakened without losing the property that is a GEO. It would indeed be sufficient to assume that is invariant when we apply to its argument any permutation corresponding to the permutation of H associated with the conjugation action h ↦ g ◦ h ◦ g−1 that is defined by any g ∈ G. We have decided to postpone the research concerning this extension of the theory, since it would have added a dependence on H of the choice of , thereby introducing some technicalities. Another line of research that we would like to explore in the future is the possibility of adapting our approach to permutant measures, which are a sort of extension of the concept of permutant to the case in which H has infinite cardinality. However, the most challenging problem we will have to face is likely to be the proof or disproof of the natural conjecture that each non-linear G-equivariant non-expansive operator can be produced (or at least well approximated) by applying our new technique to suitable symmetric functions and permutants, provided that the group G acts transitively on a finite signal domain. We plan to devote some subsequent papers to these topics.
Data Availability Statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.
Author Contributions
PF devised the project. All authors contributed to the manuscript. All authors read and approved the final manuscript.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Acknowledgments
This research has been partially supported by INdAM-GNSAGA. FC thanks Davide Moroni and Maria Antonietta Pascali for their helpful advice and support.
Supplementary Material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frai.2022.786091/full#supplementary-material
References
Anselmi, F., Evangelopoulos, G., Rosasco, L., and Poggio, T. (2019). Symmetry-adapted representation learning. Pattern Recogn. 86, 201–208. doi: 10.1016/j.patcog.2018.07.025
Anselmi, F., Rosasco, L., and Poggio, T. (2016). On invariance and selectivity in representation learning. Inform. Inference J. IMA 5, 134–158. doi: 10.1093/imaiai/iaw009
Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828. doi: 10.1109/TPAMI.2013.50
Bergomi, M. G., Frosini, P., Giorgi, D., and Quercioli, N. (2019). Towards a topological-geometrical theory of group equivariant non-expansive operators for data analysis and machine learning. Nat. Mach. Intell. 1, 423–433. doi: 10.1038/s42256-019-0087-3
Blum-Smith, B., and Coskey, S. (2017). The fundamental theorem on symmetric polynomials: history's first whiff of Galois theory. College Math. J. 48, 18–29. doi: 10.4169/college.math.j.48.1.18
Botteghi, S., Brasini, M., Frosini, P., and Quercioli, N. (2020). On the finite representation of group equivariant operators via permutant measures. arXiv preprint arXiv:2008.06340.
Camporesi, F., Frosini, P., and Quercioli, N. (2018). “On a new method to build group equivariant operators by means of permutants,” in 2nd International Cross-Domain Conference for Machine Learning and Knowledge Extraction (CD-MAKE), eds A. Holzinger, P. Kieseberg, A. M. Tjoa, and E. Weippl (Hamburg: Springer International Publishing), 265–272. doi: 10.1007/978-3-319-99740-7_18
Carrieri, A. P., Haiminen, N., Maudsley-Barton, S., Gardiner, L.-J., Murphy, B., Mayes, A. E., et al. (2021). Explainable AI reveals changes in skin microbiome composition linked to phenotypic differences. Sci. Rep. 11:4565. doi: 10.1038/s41598-021-83922-6
Chachólski, W., Gregorio, A. D., Quercioli, N., and Tombari, F. (2020). Landscapes of data sets and functoriality of persistent homology. arXiv preprint arXiv:2002.05972.
Cohen, T., and Welling, M. (2016). “Group equivariant convolutional networks,” in International Conference on Machine Learning (New York, NY: ICML), 2990–2999.
Davidson, K., and Donsig, A. (2009). Real Analysis and Applications: Theory in Practice. Undergraduate Texts in Mathematics. New York, NY: Springer. doi: 10.1007/978-0-387-98098-0
Edelsbrunner, H., and Harer, J. (2008). “Persistent homology–a survey,” in Surveys on Discrete and Computational Geometry, eds J. E. Goodman, J. Pach, and R. Pollack (Providence, RI: American Mathematical Society), 257–282. doi: 10.1090/conm/453/08802
Edelsbrunner, H., and Morozov, D. (2013). “Persistent homology: theory and practice,” in European Congress of Mathematics (Zürich: European Mathematical Society), 31–50. doi: 10.4171/120-1/3
Frosini, P. (2009). Does intelligence imply contradiction? Cogn. Syst. Res. 10, 297–315. doi: 10.1016/j.cogsys.2007.07.009
Frosini, P. (2016). “Towards an observer-oriented theory of shape comparison: position paper,” in Proceedings of the Eurographics 2016 Workshop on 3D Object Retrieval, 3DOR '16 (Goslar: Eurographics Association), 5–8.
Frosini, P., and Jabłoński, G. (2016). Combining persistent homology and invariance groups for shape comparison. Discrete Comput. Geom. 55, 373–409. doi: 10.1007/s00454-016-9761-y
Frosini, P., and Quercioli, N. (2017). “Some remarks on the algebraic properties of group invariant operators in persistent homology,” in 1st International Cross-Domain Conference for Machine Learning and Knowledge Extraction (CD-MAKE), eds A. Holzinger, P. Kieseberg, A. M. Tjoa, and E. Weippl (Reggio: Springer International Publishing), 14–24. doi: 10.1007/978-3-319-66808-6_2
Hicks, S. A., Isaksen, J. L., Thambawita, V., Ghouse, J., Ahlberg, G., Linneberg, A., et al. (2021). Explaining deep neural networks for knowledge discovery in electrocardiogram analysis. Sci. Rep. 11:10949. doi: 10.1038/s41598-021-90285-5
Mallat, S. (2012). Group invariant scattering. Commun. Pure Appl. Math. 65, 1331–1398. doi: 10.1002/cpa.21413
Mallat, S. (2016). Understanding deep convolutional networks. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 374:20150203. doi: 10.1098/rsta.2015.0203
Rao, N. V. (2005). The Stone-Weierstrass theorem revisited. Am. Math. Monthly 112, 726–729. doi: 10.1080/00029890.2005.11920244
Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215. doi: 10.1038/s42256-019-0048-x
Worrall, D. E., Garbin, S. J., Turmukhambetov, D., and Brostow, G. J. (2017). “Harmonic networks: Deep translation and rotation equivariance,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (Honolulu, HI: IEEE), 7168–7177. doi: 10.1109/CVPR.2017.758
Zhang, C., Voinea, S., Evangelopoulos, G., Rosasco, L., and Poggio, T. (2015). “Discriminative template learning in group-convolutional networks for invariant speech representations,” in Interspeech-2015 (Dresden: International Speech Communication Association), 3229–3233. doi: 10.21437/Interspeech.2015-650
Keywords: GENEO, permutant, symmetric function, persistence diagram, persistent homology, machine learning
Citation: Conti F, Frosini P and Quercioli N (2022) On the Construction of Group Equivariant Non-Expansive Operators via Permutants and Symmetric Functions. Front. Artif. Intell. 5:786091. doi: 10.3389/frai.2022.786091
Received: 29 September 2021; Accepted: 18 January 2022;
Published: 15 February 2022.
Edited by:
Fabio Anselmi, Baylor College of Medicine, United States
Reviewed by:
Remco Duits, Eindhoven University of Technology, Netherlands
Kelin Xia, Nanyang Technological University, Singapore
Copyright © 2022 Conti, Frosini and Quercioli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Patrizio Frosini, patrizio.frosini@unibo.it