Skip to main content

ORIGINAL RESEARCH article

Front. Artif. Intell., 15 February 2022
Sec. Machine Learning and Artificial Intelligence
This article is part of the Research Topic Symmetry as a Guiding Principle in Artificial and Brain Neural Networks View all 11 articles

On the Construction of Group Equivariant Non-Expansive Operators via Permutants and Symmetric Functions

\nFrancesco Conti,Francesco Conti1,2Patrizio Frosini,,,
Patrizio Frosini3,4,5,6*Nicola Quercioli,Nicola Quercioli3,7
  • 1Department of Mathematics, University of Pisa, Pisa, Italy
  • 2Institute of Information Science and Technologies “A. Faedo”, National Research Council of Italy (CNR), Pisa, Italy
  • 3Department of Mathematics, University of Bologna, Bologna, Italy
  • 4Alma Mater Research Center on Applied Mathematics, University of Bologna, Bologna, Italy
  • 5Alma Mater Research Institute for Human-Centered Artificial Intelligence, University of Bologna, Bologna, Italy
  • 6Research Centre on Electronic Systems for the Information and Communication Technology, University of Bologna, Bologna, Italy
  • 7ENEA Centro Ricerche Bologna, Bologna, Italy

Group Equivariant Operators (GEOs) are a fundamental tool in the research on neural networks, since they make available a new kind of geometric knowledge engineering for deep learning, which can exploit symmetries in artificial intelligence and reduce the number of parameters required in the learning process. In this paper we introduce a new method to build non-linear GEOs and non-linear Group Equivariant Non-Expansive Operators (GENEOs), based on the concepts of symmetric function and permutant. This method is particularly interesting because of the good theoretical properties of GENEOs and the ease of use of permutants to build equivariant operators, compared to the direct use of the equivariance groups we are interested in. In our paper, we prove that the technique we propose works for any symmetric function, and benefits from the approximability of continuous symmetric functions by symmetric polynomials. A possible use in Topological Data Analysis of the GENEOs obtained by this new method is illustrated.

1. Introduction

In recent years, the theory of equivariant operators has become a topic of great interest to the scientific community, since these operators allow to make explicit the use of symmetries in deep learning and artificial intelligence (Mallat, 2012, 2016; Bengio et al., 2013; Zhang et al., 2015; Anselmi et al., 2016, 2019; Cohen and Welling, 2016; Worrall et al., 2017), thereby reducing the number of parameters required in the learning process. In particular, group equivariant non-expansive operators (GENEOs) have been recently proposed as elementary components for building new kinds of neural networks, benefiting from good mathematical properties, such as compactness and convexity, under suitable assumptions on the space of data and with respect to the choice of appropriate topologies (Bergomi et al., 2019). In particular, compactness guarantees total-boundedness, i.e., for every ε > 0 we can find a finite set F={F1,,Fs} of GENEOs such that any other GENEO has a distance less than ε from at least one Fi. This property opens the way to the search for methods to effectively build such a representative set F, leading us to look for new techniques to produce GENEOs.

GENEOs are grounded in Topological Data Analysis (TDA) and allow to shift the attention from the data to the observers who process them, and to the properties of invariance and simplification associated with those observers. The use of these operators is justified by the fact that in most of the cases we are not directly interested in data, but in approximating the experts' behavior in the presence of given data (Frosini, 2016). Since different agents can have different reactions in the presence of the same data, it is clear that data analysis has to be based on each pair (data, observer) rather than on data alone. From the point of view of AI, the focus on GENEOs corresponds to the rising interest in the so called “explainable deep learning” (Rudin, 2019; Carrieri et al., 2021; Hicks et al., 2021), which looks for methods and techniques that can be understood by humans.

GENEOs transform data according to two properties. First of all, they are equivariant with respect to the action of a given transformation group, i.e., they commute with such a group. Secondly, they do not increase the distance between data. This kind of regularity is frequently found in applications, since in several cases the operators we use are required to simplify the metric structure of data. We can obviously imagine particular applications where this condition is locally violated, but the usual long-term goal is to produce representations that are much simpler and more meaningful than the original data, thereby leading us to assume that the considered (compositions of any sufficiently long chain of) operators are non-expansive. This assumption is not only of use to simplify the information we have to manage, but it is also fundamental in the proof that the space of group equivariant non-expansive operators is compact (and hence finitely approximable), provided that the space of data is compact with respect to a suitable topology (Bergomi et al., 2019). This statement becomes false if we renounce non-expansivity.

The use of GENEOs is not limited to machine learning. Another important reason for the study of these operators follows from the relationship between GENEOs and TDA. We indeed know that TDA and Persistent Homology allow for a qualitative and efficient geometric study of the data space, but suffer from some important limitations, since Persistent Homology alone is not able to distinguish between some functions. Fortunately, the joint use of TDA and GENEOs overcomes this difficulty in the discrimination of data (Frosini, 2016; Frosini and Jabłoński, 2016; Bergomi et al., 2019). In other words, GENEOs are able to preserve information on the data that would have been lost through TDA alone.

Another interesting aspect of GENEOs is that we can also look at them as operators that change the pseudo-metrics we use in data comparison. If the real-valued functions φ1, φ2 represent the data we have to compare and a GENEO F is given, we can replace the max-norm distance ||φ1 − φ2|| with the new pseudo-metric d(φ1,φ2) :=||F(φ1)-F(φ2)||. In this approach, F is not seen as a map that transforms the data we are considering, but as a new way of comparing data. We will see in section 6 that the availability of non-linear GENEOs can indeed produce more flexible pseudo-metrics.

Last but not least, a theory of GENEOs could be a relevant tool in the investigation of the role of internal conflicts in AI. We know that the availability of procedures that emulate intelligence opens the way to the appearance of contradictions, conflicts and unexpected behaviors (Frosini, 2009). This phenomenon cannot be ignored in the mathematical study of AI. The use of a precise geometric formalization of components in machine learning could be of great help in facing and analyzing this emerging problem.

However, the main reason for the research about GENEOs follows from a shift of interest from the spaces of data to the topological and geometric analysis of the spaces of observers of the data. This fact naturally leads us to the problem of the efficient approximation of observers. Such an approximation requires to make available large and dense sets of GENEOs, each one representing a possible data-observer interaction. Therefore, since non-linear interactions between observers and data are of great importance in applications, new techniques to build non-linear GENEOs are needed. The main contribution of this paper consists in introducing a new method to produce non-linear GENEOs through the concepts of symmetric function and permutant, thereby extending the procedure illustrated in Botteghi et al. (2020) for the building of linear GENEOs. In this way, we strictly expand the set of operators we can use in applications.

The concept of permutant comes into play when a set Φ of functions from a space X to ℝ and a group G of permutations of X are given. The set Φ represents the space of signals we are interested in, and is assumed to be preserved by right composition with elements of G. If two signals φ1, φ2 ∈ Φ are obtained from each other by right composition with an element gG, we say that they are equivalent with respect to G, just as happens when two images are considered equivalent if there exists an isometry changing one into the other. In this setting, a permutant is defined as a finite set H of Φ-preserving permutations of X that is stable under the conjugation action hghg−1 of any element of G on H (Camporesi et al., 2018).

This paper shows that when a symmetric function and a permutant for the equivariance group G are available, we can easily build a (non-linear) GENEO with respect to G (section 3). This fact justifies the theoretical and practical importance of permutants. Our long-term purpose is the one of developing an effective theory for the approximation of observers and agents via GENEOs in a topological-geometrical setting, so extending the use of these operators in deep learning. While this goal is challenging, we think that our approach could lead to think of GENEOs as elementary components in the building of a new kind of neural networks. This idea is justified by at least two reasons. First of all, deep learning could benefit from using components that are guaranteed to be equivariant with respect to given groups of transformations and are grounded in a well founded topological theory, thereby allowing neural nets to save time in the learning process and to take advantage of techniques developed in TDA. Secondly, an engineering based on GENEOs would be much more transparent, because of the intrinsic interpretability of its components.

The reader could wonder why building GENEOs via permutants should be better than building them by other methods (for example by integrating on the equivariance group G). The key point is that in many applications some permutants exist, whose size is much smaller than the size of the equivariance group. In these cases, the approaches based on permutants can be much simpler than the ones based on G. We observe that permutants encode part of the information represented by the data equivalence expressed by G. Of course, by deciding to build GENEOs via permutants we implicitly accept to lose some information about such a data equivalence, and make a compromise between the computational complexity and the analytical power of the operators we are interested in. The reader can understand this tradeoff by thinking about the limit case given by a permutant containing just the identical permutation id of X. While the singleton {id} is indeed a (trivial) permutant, it does not give any information about the equivariance group G we are considering, since {id} is a permutant for any group of Φ-preserving permutations of X. However, if we consider a larger and larger set H of Φ-preserving permutations of X, the set of groups admitting H as a permutant becomes smaller and smaller. In other words, larger permutants make easier the identification of G.

This article is part of an extensive research on permutants. In Botteghi et al. (2020) it has been proved that each linear G-equivariant non-expansive operator can be produced by a weighted summation associated with a suitable weighted permutant, provided that the group G transitively acts on a finite signal domain. This paper opens the way to the research about the natural conjecture that each non-linear G-equivariant non-expansive operator can be produced (or at least well approximated) by applying our new technique to suitable symmetric functions and permutants, provided that the group G transitively acts on a finite signal domain. This probably non-trivial problem will be attacked in following papers, grounding on the results obtained in this article.

The outline of the paper is as follows. In section 2, we recall the main definitions in our mathematical setting. In Section 3, we show how to associate a group equivariant operator (GEO) with a symmetric function. Section 4 is devoted to the approximation of a generic continuous symmetric function by a polynomial in the elementary symmetric functions, and in section 5, we finally show how to associate a GENEO with such a polynomial. Section 6 highlights the benefits of our approach.

For more details and proofs about the results and concepts illustrated in section 2 we refer the interested reader to the papers (Frosini, 2016; Frosini and Jabłoński, 2016; Frosini and Quercioli, 2017; Camporesi et al., 2018; Bergomi et al., 2019). The other sections present our new results about the construction of non-linear GENEOs via symmetric functions and permutants.

2. Mathematical Setting

Let X be a non-empty set and consider a non-empty, compact subspace Φ of the normed vector space (bX,||·||), where bX is the set of all bounded real-valued functions with domain X, and φ :=supxX|φ(x)|. We can think of the functions in Φ as the data, i.e., the measurements provided by our measuring instruments (or by any operator), and of X as the space where the measurements are made. Sometimes the functions in Φ are also referred as admissible filtering functions or admissible signals. We now recall the usual setting for the introduction of group equivariant non-expansive operators. We endow Φ with the topology induced by the uniform convergence distance

DΦ(φ1,φ2) :=φ1-φ2.

At this stage, X is only a set. We endow X with the topology induced by the pseudo-metric

DX(x1,x2) :=supφΦ|φ(x1)-φ(x2)|.

The idea behind this definition is that two points x1, x2X are considered different only if they are taken to different values by at least one admissible filtering function.

We recall that a pseudo-metric space is a generalization of a metric space in which the distance between two distinct points can be zero. Moreover, a function f from a pseudo-metric space (P1, d1) to a pseudo-metric space (P2, d2) is called non-expansive if

d2(f(x),f(y))d1(x,y)

for every x, yP1.

Remark 2.1. Every function φ ∈ Φ is non-expansive with respect to the pseudo-metric DX on X and the Euclidean metric on. Therefore, each function φ ∈ Φ is continuous with respect to these topologies.

Since Φ is compact, the topology induced by the pseudo-metric DX coincides with the initial topology τin on X with respect to Φ (see Theorem 2.1 in Bergomi et al., 2019, Supplementary Methods). We recall that the initial topology is the coarsest topology on X which makes each function in Φ continuous. Moreover, the compactness of Φ implies that if X is complete then it is also compact (see Theorem 2.2 in Bergomi et al., 2019, Supplementary Methods). In this work, we assume that X is complete, and therefore compact with respect to the topology induced by DX. The image of X through the filtering functions is denoted by Im(Φ) and is defined as

Im(Φ)={φ(x) s.t. φΦ,xX}.

The following result will be of use in section 5.

Proposition 2.2. If X and Φ are compact, Im(Φ) is compact with respect to the Euclidean topology.

Proof. Let us consider the function γ : Φ × X → ℝ such that γ(φ, x) : = φ(x). The space Φ × X is compact with respect to the product topology, which we recall is induced by the sum pseudo-distance. Since the continuous image of a compact is compact, γ(Φ × X) = Im(Φ), and every non-expansive function is continuous, it is sufficient to prove that γ is non-expansive. Given φ1, φ2 ∈ Φ and x1, x2X, we have that

|γ(φ1,x1)-γ(φ2,x2)|=|φ1(x1)-φ2(x2)|             =|φ1(x1)-φ1(x2)+φ1(x2)-φ2(x2)|             supφΦ|φ(x1)-φ(x2)|             +supxX|φ1(x)-φ2(x)|             =DX(x1,x2)+DΦ(φ1,φ2).

We have proved that γ is non-expansive, and therefore Im(Φ) is compact.

Definition 2.3. Chachólski et al. (2020) A Φ-operation is a function g : XX such that, for every φ ∈ Φ, the composition φg also belongs to Φ.

Definition 2.4. A Φ-operation g is invertible if there is a Φ-operation h such that gh = hg = idX.

We denote the collection of all invertible Φ-operations by AutΦ(X). In other words,

AutΦ(X)={g : XX|g is a bijection, and φg,  φg-1Φfor every φΦ}.

We note that AutΦ(X) is a group with respect to the usual composition operation.

Definition 2.5. A perception pair is an ordered pair (Φ, G) where ΦbX and G is a subgroup of AutΦ(X).

As an example, (Φ, AutΦ(X)) is always a perception pair.

Remark 2.6. When a perception pair (Φ, G) is given, each element gG acts on the set Φ by right composition, taking each function φ ∈ Φ to the function φ ◦ g.

2.1. Group Equivariant Non-Expansive Operators

Definition 2.7. Let us consider two perception pairs (Φ, G), (Ψ, H) and a homomorphism T : GH. Each map F : Φ → Ψ such that F is T-equivariant (i.e., F(φg) = F(φ) ◦ T(g) for every φ ∈ Φ, gG) is called a Group Equivariant Operator (GEO) with respect to T.

Definition 2.8. Let us consider two perception pairs (Φ, G), (Ψ, H) and a homomorphism T : GH. Each map F : Φ → Ψ such that F is T-equivariant (i.e., F(φ ◦ g) = F(φ) ◦ T(g) for every φ ∈ Φ, gG) and non-expansive (i.e., ||F(φ) − F(ψ)|| ≤ φ − ψ for every φ, ψ ∈ Φ) is called a Group Equivariant Non-Expansive Operator (GENEO) with respect to T.

After fixing two perception pairs (Φ, G), (Ψ, H) and a homomorphism T : GH, we will use the symbol FTall to denote the collection of all GENEOs with respect to T between such perception pairs. We endow FTall with the topology induced by the metric DGENEO(F1,F2):=supφΦDΦ(F1(φ),F2(φ)). For a more in-depth study of the GENEO topology, we refer the reader to Bergomi et al. (2019). We stress that the non-expansivity of the operators is pivotal for two reasons. The first reason is that we want our operators to simplify the data metric, i.e., not to introduce complexity into the data. The second reason is that non-expansivity allows us to prove the compactness of the space FTall, provided that Φ and Ψ are compact with respect to the distances DΦ, DΨ (see Theorem 7 in Bergomi et al., 2019). If we remove the assumption that our operators are non-expansive, this property of compactness does not hold anymore. As an example, let Φ = Ψ be equal to the set of all constant functions from ℝ to [0, 1], and G = H be the trivial group containing just the identity permutation of ℝ. We observe that Φ, X = ℝ and G are compact with respect to the topologies we have defined on them. Let us now consider the sequence (Fn) of GEOs from Φ to Φ with respect to the identity homomorphism idG : GG, defined by setting Fn(φ):=φn for every function φ ∈ Φ and every positive integer n. It is easy to check that limnDGENEO(Fm,Fn)=1 for every positive integer m, and hence the sequence (Fn) does not admit any converging subsequence. This implies that the space of all GEOs from Φ to Φ with respect to idG is not compact. The compactness of FTall is a key property in applications, since it guarantees that such a space can be approximated by a finite set.

If G = H and T = idG, we can say that F:ΦbX is a G-equivariant map. From now on, we will make these assumptions, and use the terms GEO and GENEO with reference to this setting.

2.2. Permutants

Definition 2.9. Let SX be the set of permutations of X. For each gG, the map cg : SXSX taking each sSX to gsg−1 is called the conjugation action of gG on SX. For every subset H of SX, we denote the set cg(H) by the symbol gHg−1.

Definition 2.10. Camporesi et al. (2018) A finite set H ⊆ AutΦ(X) is called a permutant for G if either H = ∅ or gHg−1 = H for every g ∈ G.

Remark 2.11. In general, a permutant is not a normal subgroup of G. Indeed we require neither that H is a group nor that H is a subset of G. We observe that the setsand {idX} are trivial permutants for any subgroup G of AutΦ(X). Both G and AutΦ(X) are also permutants for G, provided that they are finite groups.

Example 2.12. Let Φ be the set of all functions φ : X = S1 = {(x, y) ∈ ℝ2|x2 + y2 = 1} → [0, 1] that are non-expansive with respect to the Euclidean distances on S1 and [0, 1]. Let us consider the group G of all isometries of ℝ. If h is the clockwise rotation of ℓ radians for a fixed ℓ ∈ ℝ, then the set H = {h, h−1} is a permutant for G.

Example 2.13. Let Φ be the set of all functions φ : X = S1 = {(x, y) ∈ ℝ2|x2 + y2 = 1} → [0, 1] that are non-expansive with respect to the Euclidean distances on S1 and [0, 1]. Let G be the group generated by the reflection with respect to the axis x = 0. If ρ is the clockwise rotation of π/2 around the origin (0, 0), then the set H={idS1,ρ,ρ2,ρ3} is a permutant for G.

Example 2.14. Let us consider the set X of the vertices of a cube in3, and assume that Φ is the set of all functions from X to [0, 1]. Let G be the group of the orientation-preserving isometries of3 that take X to X. Let π1, π2, π3 be the three planes that contain the center of mass of X and are parallel to a face of the cube. Let hi : XX be the orthogonal symmetry with respect to πi, for i ∈ {1, 2, 3}. We have that the set H = {h1, h2, h3} is an orbit under the conjugation action of G on AutΦ(X), and therefore a permutant for G.

Remark 2.15. If the group G is Abelian, every finite subset of G is a permutant for G, since the conjugation action of G on AutΦ(X) is just the identity.

Remark 2.16. In this section, the symbol || · || has been used to denote the max-norm of functions. With a slight abuse of notation, in the rest of the paper such a symbol will be also used to denote the max-norm of points ofm, i.e., ||(α1,,αm)||:=max1im|αi|.

3. Building GEOs From Symmetric Functions

Definition 3.1. Let C be a symmetric subset ofn, i.e., a subset C such that π(C) = C for every permutation π of the coordinates. A function f : C → ℝ is said to be symmetric on C if its value is the same no matter the order of its arguments. That is,

f(a1,,an)=f(aπ(1),,aπ(n))

for every (a1, …, an) ∈ C and every permutation π of the set {1, …, n}.

Proposition 3.2. Let f be a continuous real-valued symmetric function defined on a compact symmetric subset K ofn. Then f is the restriction of a continuous real-valued symmetric function f- defined onn.

Proof. The Tietze Extension Theorem (Dugundji, 1966) implies that f can be extended to a continuous function f^ : n. If Sn is the symmetric group over the set {1, …, n}, we can easily check that the function f-(a1,,an) : =1n!πSnf^(aπ(1),,aπ(n)) has the wanted property.

Proposition 3.2 guarantees that the concept of continuous real-valued symmetric function defined on a compact symmetric subset K of ℝn coincides with the concept of restriction to K of a continuous real-valued symmetric function defined on ℝn.

Let S : n be a symmetric function. If H={hi}i=1n is a non-empty permutant for G ⊆ AutΦ(X), then we can define an operator SH : ΦbX by setting, for any φ ∈ Φ,

SH(φ) :=S(φh1,,φhn),

where S(φh1,,φhn)(x) : =S((φh1)(x),,(φhn)(x)) for every xX.

Proposition 3.3. If S : n is a symmetric function and G ⊆ AutΦ(X), then SH is a GEO from Φ to bX with respect to the identity homomorphism idG : GG.

Proof. For every gG, it holds that

SH(φg)=S(φgh1,,φghn)    =S((φhπg(1))g,,(φhπg(n))g)    =S((φh1)g,,(φhn)g)    =S(φh1,,φhn)g    =SH(φ)g,

where πg is the permutation such that ghig-1=hπg(i), that is ghi = hπg(i)g. Therefore, SH(φg)=SH(φ)g, and hence SH is a GEO.

Corollary 3.4. If S : n is a symmetric function and its restriction to Im(Φ)n is non-expansive, then SH is a GENEO from Φ to bX with respect to idG.

Proof. It is sufficient to prove the non-expansivity of SH with respect to the max-norms on Φ and bX, since the group equivariance is already granted by Proposition 3.3. If φ, ψ ∈ Φ, then for every xX

|(SH(φ))(x)-(SH(ψ))(x)|=|S(φh1,,φhn)(x)-S(ψh1,,ψhn)(x)|((φh1)(x)-(ψh1)(x),,(φhn)(x)-(ψhn)(x))=max1in|(φhi)(x)-(ψhi)(x)|max1inφhi-ψhi=φ-ψ.

In conclusion, ||SH(φ)-SH(ψ)||||φ-ψ|| and SH is a GENEO.

So far we have shown how to construct GEOs associated with a symmetric function. These operators are actually GENEOs if the function they are associated with is non-expansive. In the next sections, we will show how to adapt this concept to build GENEOs even in the presence of symmetric functions that are not non-expansive.

We stress that our approach requires no integration over the (possibly infinite and large) group G, but just the availability of a permutant and the computation of a symmetric function. This approach generalizes the method introduced in Camporesi et al. (2018), concerning the symmetric function S(a1,,an)=1ni=1nai.

4. Approximating Symmetric Functions With Symmetric Polynomials

Let us now explore the concept of approximation of symmetric functions by symmetric polynomials. For more details, we refer the reader to Davidson and Donsig (2009), Blum-Smith and Coskey (2017). In the sequel, we will denote the symmetric group over the set {1, …, n} as Sn. Let K be a compact metric space, and C(K) be the vector space of continuous real-valued functions on K. With a slight abuse of notation, in the following we will confuse each polynomial with the function it represents, restricted to the domain we are considering. Furthermore, if I is a finite subset of ℕn, we will say that a polynomial (k1,,kn)Ick1,,kny1k1··ynkn is symmetric if π(I) = I, and ck1, …, kn = cπ(k1), …, π(kn) for every multi-index (k1, …, kn) ∈ I and every permutation π ∈ Sn.

Definition 4.1. Davidson and Donsig (2009) A subset A of C(K) is an algebra if it is a vector subspace of C(K) that is closed under multiplication (i.e., if f, gA then f · gA). A set S of functions on K separates points if for each pair of points s, tK there is a function fS such that f(s) ≠ f(t). A set S of functions on K vanishes at sK if f(s) = 0 for all f ∈ S.

Theorem 4.2 (Stone - Weierstrass Theorem). Davidson and Donsig (2009) An algebra A of continuous real-valued functions on a compact metric space K that separates points and does not vanish at any point is dense in C(K) with respect to the max-norm referred to the domain K.

Corollary 4.3. Davidson and Donsig (2009) Let K be a compact subset ofn. The algebra of all polynomials p(y1, …, yn) in n variables is dense in C(K) with respect to the max-norm referred to the domain K.

This theorem allows us to approximate a continuous symmetric function S : K by a polynomial with arbitrary accuracy, provided that K is a compact subset of ℝn. However, this is not exactly what we need, as we would like such a polynomial to be symmetric. This can be obtained by a symmetrization of the previously found polynomial, as shown by the next proposition, which proves that any symmetric continuous function on a compact and symmetric domain can be approximated with arbitrary precision by a symmetric polynomial.

Proposition 4.4. Let K be a compact subset ofn, verifying the property π(K) = K for every π ∈ Sn. If S|K : K is the restriction to K of a continuous symmetric function S : n and || · || is the max-norm referred to the domain K, then for every ε > 0 there exists a symmetric polynomial q in n variables such that ||S|K-q|K||ε.

Proof. From Corollary 4.3 it follows that there exists a polynomial p : ℝn → ℝ such that ||S|K-p|K||ε. Let us now define the symmetric polynomial q(a1,,an) : =1n!πSnp(aπ(1),,aπ(n)). If a = (a1, …, an) ∈ K, we define aπ = (aπ(1), …, aπ(n)) for every permutation π ∈ Sn. Then

S|K-q|K=maxaK|S(a)-q(a)|        =maxaK|S(a)-1n!πSnp(aπ)|        =maxaK|1n!πSnS(a)-1n!πSnp(aπ)|        1n!maxaKπSn|S(a)-p(aπ)|        =1n!maxaKπSn|S(aπ)-p(aπ)|        =1n!πSnmaxaK|S(aπ)-p(aπ)|        =1n!πSnmaxaK|S(a)-p(a)|        =1n!πSnS|K-p|K        1n!πSnε=ε.

Definition 4.5. The elementary symmetric polynomials in the n variables a1, …, an, also called elementary symmetric functions, are defined as:

σ1 :=a1++anσ2 :=a1·a2+a1·a3++an-1·an=1i<jnai·ajσr :=1i1<i2<<irnai1·ai2··air=1i1<i2<<irnj=i1irajσn :=a1·a2··an.

We now recall an important result in the theory of symmetric polynomials:

Theorem 4.6. (Fundamental Theorem on Symmetric Polynomials). Blum-Smith and Coskey (2017) Any symmetric polynomial in n variables a1, …, an is representable in a unique way as a polynomial in the elementary symmetric polynomials σ1, …, σn.

Remark 4.7. It is a well know fact (see Rao, 2005; Davidson and Donsig, 2009; Blum-Smith and Coskey, 2017) that the proofs of Theorems 4.2 and 4.6 are constructive. This means that, if K is a compact symmetric subset ofn, and the restriction S|K of a continuous symmetric function S : n is given, we are able to effectively approximate S|K with an error less than ε by the restriction to K of an explicitly defined polynomial in the elementary symmetric functions.

In conclusion, if an equivariance group G is chosen and a GEO F is built by applying Proposition 3.3 to the continuous symmetric function S : n, we can approximate F in the following way, provided that X and Φ are compact. First of all, we can approximate the continuous function S by a polynomial p : ℝn → ℝ, with an arbitrarily small error ε on the symmetric set Im(Φ)n, which is guaranteed to be compact by Proposition 2.2. Then, we can consider the symmetric polynomial q(a1,,an) : =1n!πSnp(aπ(1),,aπ(n)). Finally, we can consider the GEO F′ defined by setting F(φ) : =q(φh1,,φhn) for every φ ∈ Φ. Since H ⊆ AutΦ(X), ||F(φ)-F(φ)||=maxxX|S(φ(h1(x)),,φ(hn(x)))-q(φ(h1(x)),,φ(hn(x)))|||S|Im(Φ)n-q|Im(Φ)n||εfor any φ ∈ Φ, and hence the operator F′ can be chosen arbitrarily close to F.

5. Building GENEOs From Polynomials in the Elementary Symmetric Functions

Proposition 2.2 shows that Im(Φ) is compact. Moreover, the equality π (Im(Φ)n) = Im(Φ)n trivially holds for every π ∈ Sn. Therefore, Proposition 4.4 and the Fundamental Theorem on Symmetric Polynomials guarantee that the restriction to Im(Φ)n of any continuous symmetric functions can be approximated arbitrarily well by the restriction to Im(Φ)n of a polynomial in the elementary symmetric functions, defined as

S~(a1,,an)=k1=0m1kn=0mnck1,,kni=1nσiki(a1,,an),

where mi ∈ ℕ for every i ∈ {1, …, n}, ck1, …, kn ∈ ℝ for every k1 ∈ {0, …, m1}, …, kn ∈ {0, …, mn} and σi is the i-th elementary symmetric polynomial for every i ∈ {1, …, n}. From Proposition 3.3, we already know that the associated operator is a GEO. We can indeed obtain a GENEO by applying Corollary 3.4 to a suitable multiple of S~. In the sequel, we will need the following constants:

MIm(Φ)n :=maxαIm(Φ)nα=maxφΦφ    (5.1)
M1 :=max1in{ki(ni)kiiMIm(Φ)niki-1}    (5.2)
M2 :=max1in{(ni)kiMIm(Φ)niki}n-1    (5.3)
C=nk1=0m1kn=0mn|ck1,,kn|M1M2,    (5.4)

Let us consider a non-empty permutant H={hi}i=1n for G ⊆ AutΦ(X). We can define an operator S^H : ΦbX by setting

S^H(φ) :=1CS~(φh1,,φhn)

for any φ ∈ Φ, where S~(φh1,,φhn)(x) : =S~((φh1)(x),,(φhn)(x)) for every xX and C is the constant defined in (5.4).

Theorem 5.1. If S~ is a polynomial in the n elementary symmetric functions, then S^H is a GENEO from Φ to bX with respect to idG.

Proof. The thesis immediately follows from Corollary 3.4, once it is proved that the restriction of 1CS~ to Im(Φ)n is non-expansive. For every α=(α1,,αn),β=(β1,,βn)Im(Φ)n, by applying Lemma 1.5 in Appendix 1, we have that

|1CS~(α)-1CS~(β)|=|1Ck1=0m1kn=0mnck1,,kni=1nσiki(α)-1Ck1=0m1kn=0mnck1,,kni=1nσiki(β)|1Ck1=0m1kn=0mn|ck1,,kn||i=1nσiki(α)-i=1nσiki(β)|1Ck1=0m1kn=0mn|ck1,,kn|nα-βM1M2=(1Cnk1=0m1kn=0mn|ck1,,kn|M1M2)α-β=α-β.

We have shown that 1CS~ is non-expansive and therefore the associated operator S^H is a GENEO.

Example 5.2. Let us consider the setting of Example 2.12 and the polynomial S~(a1,a2)=-σ1(a1,a2)+σ22(a1,a2)-3σ1(a1,a2)σ2(a1,a2)=-a1-a2+a12a22-3a12a2-3a1a22. Then the operator

S^H(φ)=140(-(φh)-(φh-1)+(φ2h)·(φ2h-1)   -3(φ2h)·(φh-1)-3(φh)·(φ2h-1))

is a GENEO.

Example 5.3. Let us consider the setting of Example 2.13 and the polynomial S~(a1,a2,a3,a4)=σ1(a1,a2,a3,a4)+σ4(a1,a2,a3,a4)=a1+a2+a3+a4+a1a2a3a4. In this case, the operator

S^H(φ)=11040(φ+(φρ)+(φρ2)+(φρ3)   +φ·(φρ)·(φρ2)·(φρ3))

is a GENEO.

Example 5.4. Let us consider the setting of Example 2.14 and the polynomial S~(a1,a2,a3)=σ2(a1,a2,a3)·σ3(a1,a2,a3)=a12a22a3+a12a2a32+a1a22a32. Then the operator

S^H(φ)=1162((φ2h1)·(φ2h2)·(φh3)   +(φ2h1)·(φh2)·(φ2h3)   +(φh1)·(φ2h2)·(φ2h3))

is a GENEO.

Remark 5.5. It is worth noticing that if we replace the constant C, defined as in (5.4), with any constant C-C, the operator S^H defined as in Theorem 5.1 is still a GENEO. However, it should be specified that the larger the constant C- is, the more difficult it becomes to distinguish different signals, since their distance is smaller. For this reason, a larger C- means that more information is lost when applying the GENEO. Nonetheless, in some cases rough constants which are easier to compute may be preferred. For this reason, we present here a possible alternative to C. Let c=max|ck1,,kn|,m=maxjmj,μ=j=1n(mj+1) and M-Im(Φ)n=max{MIm(Φ)n,1}, where MIm(Φ)n is defined as in (5.1). We define the constant

C-=cμn2m(nn2)mnM-Im(Φ)nmn2-1.    (5.5)

We can easily show that C-C. Therefore, the operator S-H, obtained by replacing C with C- in the definition of S^H and applying Theorem 5.1, is still a GENEO. In Example 5.2, C-=2,304, which is far larger than C = 40. In Example 5.3, C-=82,944, while C = 1, 040. Finally, in Example 5.4 C-=972 and C = 162. We stress that the constant C defined in (5.4) is optimal if we do not add any further assumption. We can realize this by applying Theorem 5.1 to the symmetric function S~(a1)=σ1(a1)=a1 in just one variable, provided that Φ is the collection of all non-expansive functions from X: = [0, 1] to itself, and we set both G and the permutant H equal to the trivial group containing only the identity of X. On the one hand, it can be easily checked that in this case the GEO S~H defined by applying Proposition 3.3 is the identity map, and hence a GENEO. Therefore, any constant C′ that could replace C in the definition of S^H must be not smaller than 1 in order to preserve the non-expansivity of S^H. On the other hand, we can immediately see that C=M1=M2=MIm(Φ)n=1. It follows that C′ ≥ C.

6. GENEOs Increase Our Ability to Distinguish Data

In this section, we illustrate a few examples showing that our approach can produce useful non-linear GENEOs and increase our ability to distinguish data. As already discussed in the Introduction, it is of fundamental importance to make a large number of GENEOs available in machine learning, as each of them models a data-observer pair. The results presented in this paper could be of great help in the task of extending the set of available GENEOs and the consequent possibilities of using them in applications.

The next example shows that our new method indeed extends the approach introduced in Botteghi et al. (2020).

Example 6.1. Let us set X = {1, 2, 3} and Φ equal to the set of all functions from X to [0, 1]. In this case, AutΦ(X) = S3, and we define G = S3. We now consider the symmetric function S~(a1,a2)=a1·a2=σ2(a1,a2) and the permutant H = {(1, 2, 3), (1, 3, 2)} =:{h1, h2}. From Theorem 5.1 we get that S^H(φ)=(φh1)·(φh2)4 is a GENEO. Such an operator is not linear, and hence it cannot be obtained by the method described in Botteghi et al. (2020).

In next Example 6.2 we illustrate the synergy between GENEOs and TDA. TDA is mainly grounded on Persistent Homology (PH), which is an algebraic topological theory devised to describe “holes” in geometrical data, focusing on their persistence under the action of noise. In particular, TDA takes benefit from the topological comparison of data by means of persistence diagrams, which are the main tools in PH. For more details about TDA and PH, we refer the interested reader to Edelsbrunner and Harer (2008), Edelsbrunner and Morozov (2013). Example 6.2 shows that the use of new GENEOs increases our ability of distinguishing data by persistence diagrams.

Example 6.2. Let us consider the following functions: φ(x) = |sin(x)| and ψ(x) = sin(x)2 ∈ Φ, where Φ is the space of all 1-Lipschitz functions from the unit circle S1 to [0, 1] and the invariance group G is composed of all rotations of S1. If we are looking at Φ only through Persistent Homology, then φ and ψ are indistinguishable, since they induce the same persistence diagram (see Figure 1). Let us now consider the GENEOs F1 = id : Φ → Φ and F2(φ)=φρπ2, where ρπ2 is the clockwise rotation through a π2 angle. We highlight the fact that the joint use of TDA, F1 and F2 does not allow us to distinguish φ from ψ, since the functions F1(φ) and F1(ψ) have the same persistence diagram, and the same happens for F2(φ) and F2(ψ) (see Figures 2, 3). However, if we consider the symmetric function σ2 and the permutant H={idS1,ρπ2}, from Theorem 5.1 we get that S^H(φ) : =14(φ·(φρπ2)) is a non-linear GENEO. We observe that the joint use of TDA and S^H allows us to distinguish φ and ψ, since the persistence diagrams of the functions S^H(φ) and S^H(ψ) are different from each other (see Figure 4).

FIGURE 1
www.frontiersin.org

Figure 1. The functions φ and ψ have the same persistence diagram.

FIGURE 2
www.frontiersin.org

Figure 2. F1(φ) and F1(ψ) have the same persistence diagram.

FIGURE 3
www.frontiersin.org

Figure 3. F2(φ) and F2(ψ) have the same persistence diagram.

FIGURE 4
www.frontiersin.org

Figure 4. S^H(φ) and S^H(ψ) have different persistence diagrams, and hence they are distinguishable by Persistent Homology. The bottleneck distance between their persistence diagrams is 0.0625.

Finally, we stress another important aspect of the GENEOs. The use of GENEOs can be seen as a methodology for changing the max-norm metric that we use to compare functions in Φ. We can indeed change the distance ||φ1 − φ2|| into the pseudo-metric ||F1) − F2)||, where F is a GENEO. In Example 6.3 we show how the pseudo-metrics associated with non-linear GENEOs are much more flexible than those generated by linear operators, thus guaranteeing a wider range of applications.

Example 6.3. Let us consider Φ as the set of all functions from the set with just two elements X = {A, B} to [0, 1]. The functions in Φ can be described by ordered pairs (φ(A), φ(B)). In this setting, the group G is composed of the permutations of two elements. As usual, in Φ we have the metric DΦ1, φ2) = ||φ1 − φ2||, therefore the distance between the functions (0, 0), (0, 1), (1, 0), (1, 1) is always 1. Suppose that we need a GENEO F such that the pseudo-metric ||F1) − F2)|| vanishes between functions with a null component, while maintaining positive the distance between (1, 1) and the other three functions belonging to Φ. No linear GENEO can induce such a pseudo-metric, since if a linear transformation maps (1, 0) and (0, 1) to (0, 0), it must also map (1, 1) to (0, 0). It is worth noticing that through the GENEO associated with the elementary symmetric function σ2(a1, a2) = a1 · a2 and with the permutant H = G, we can obtain a pseudo-metric with the desired property.

7. Conclusions

In our paper, we have introduced a new method to build GENEOs, grounded on the concepts of symmetric function and permutant. Our main goal is the one of building a good theory of GENEOs, making available methods to define and use these operators in machine learning. The main advantage of our approach is the fact that it requires no integration over the (possibly infinite and large) group G, but just the availability of a permutant and the computation of a symmetric function. Many lines of research still remain to be explored in this field. For example, the reader can observe that in section 3 the requirement that the function S is symmetric could be weakened without losing the property that SH is a GEO. It would be indeed sufficient to assume that S is invariant when we apply to its argument any permutation corresponding to the permutation of H associated with the conjugation action hghg−1 that is defined by any gG. We have decided to postpone the research concerning this extension of the theory, since it would have added a dependence on H of the choice of S, thereby introducing some technicalities. Another line of research that we would like to explore in the future is the possibility of adapting our approach to permutant measures, which are a sort of extension of the concept of permutant to the case that H has infinite cardinality. However, the most challenging problem we will have to face is likely to be the proof or disproof of the natural conjecture that each non-linear G-equivariant non-expansive operator can be produced (or at least well approximated) by applying our new technique to suitable symmetric functions and permutants, provided that the group G transitively acts on a finite signal domain. We plan to devote some subsequent papers to these topics.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

Author Contributions

PF devised the project. All authors contributed to the manuscript. All authors read and approved the final manuscript.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

This research has been partially supported by INdAM-GNSAGA. FC thanks Davide Moroni and Maria Antonietta Pascali for their helpful advise and support.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frai.2022.786091/full#supplementary-material

References

Anselmi, F., Evangelopoulos, G., Rosasco, L., and Poggio, T. (2019). Symmetry-adapted representation learning. Pattern Recogn. 86, 201–208. doi: 10.1016/j.patcog.2018.07.025

CrossRef Full Text | Google Scholar

Anselmi, F., Rosasco, L., and Poggio, T. (2016). On invariance and selectivity in representation learning. Inform. Inference J. IMA 5, 134–158. doi: 10.1093/imaiai/iaw009

CrossRef Full Text | Google Scholar

Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828. doi: 10.1109/TPAMI.2013.50

PubMed Abstract | CrossRef Full Text | Google Scholar

Bergomi, M. G., Frosini, P., Giorgi, D., and Quercioli, N. (2019). Towards a topological-geometrical theory of group equivariant non-expansive operators for data analysis and machine learning. Nat. Mach. Intell. 1, 423–433. doi: 10.1038/s42256-019-0087-3

CrossRef Full Text | Google Scholar

Blum-Smith, B., and Coskey, S. (2017). The fundamental theorem on symmetric polynomials: history's first whiff of Galois theory. College Math. J. 48, 18–29. doi: 10.4169/college.math.j.48.1.18

CrossRef Full Text | Google Scholar

Botteghi, S., Brasini, M., Frosini, P., and Quercioli, N. (2020). On the finite representation of group equivariant operators via permutant measures. arXiv preprint arXiv:2008.06340.

Google Scholar

Camporesi, F., Frosini, P., and Quercioli, N. (2018). “On a new method to build group equivariant operators by means of permutants,” in 2nd International Cross-Domain Conference for Machine Learning and Knowledge Extraction (CD-MAKE), eds A. Holzinger, P. Kieseberg, A. M. Tjoa, and E. Weippl (Hamburg: Springer International Publishing), 265–272. doi: 10.1007/978-3-319-99740-7_18

CrossRef Full Text | Google Scholar

Carrieri, A. P., Haiminen, N., Maudsley-Barton, S., Gardiner, L.-J., Murphy, B., Mayes, A. E., et al. (2021). Explainable AI reveals changes in skin microbiome composition linked to phenotypic differences. Sci. Rep. 11:4565. doi: 10.1038/s41598-021-83922-6

PubMed Abstract | CrossRef Full Text | Google Scholar

Chachólski, W., Gregorio, A. D., Quercioli, N., and Tombari, F (2020). Landscapes of data sets and functoriality of persistent homology. arXiv preprint arXiv:2002.05972.

Google Scholar

Cohen, T., and Welling, M. (2016). “Group equivariant convolutional networks,” in International Conference on Machine Learning (New York, NY: ICML), 2990–2999.

Google Scholar

Davidson, K., and Donsig, A. (2009). Real Analysis and Applications: Theory in Practice. Undergraduate Texts in Mathematics. New York, NY: Springer. doi: 10.1007/978-0-387-98098-0

CrossRef Full Text | Google Scholar

Dugundji, J. (1966). Topology. Boston, MA: Allyn and Bacon.

Edelsbrunner, H., and Harer, J. (2008). “Persistent homology–a survey,” in Surveys on Discrete and Computational Geometry, eds J. E. Goodman, J. Pach, and R. Pollack (Providence, RI: American Mathematical Society), 257–282. doi: 10.1090/conm/453/08802

CrossRef Full Text | Google Scholar

Edelsbrunner, H., and Morozov, D. (2013). “Persistent homology: theory and practice,” in European Congress of Mathematics (Zürich: European Mathematical Society), 31–50. doi: 10.4171/120-1/3

CrossRef Full Text | Google Scholar

Frosini, P. (2009). Does intelligence imply contradiction? Cogn. Syst. Res. 10, 297–315. doi: 10.1016/j.cogsys.2007.07.009

CrossRef Full Text | Google Scholar

Frosini, P. (2016). “Towards an observer-oriented theory of shape comparison: position paper,” in Proceedings of the Eurographics 2016 Workshop on 3D Object Retrieval, 3DOR '16 (Goslar: Eurographics Association), 5–8.

Google Scholar

Frosini, P., and Jabłoński, G. (2016). Combining persistent homology and invariance groups for shape comparison. Discrete Comput. Geom. 55, 373–409. doi: 10.1007/s00454-016-9761-y

CrossRef Full Text | Google Scholar

Frosini, P., and Quercioli, N. (2017). “Some remarks on the algebraic properties of group invariant operators in persistent homology,” in 1st International Cross-Domain Conference for Machine Learning and Knowledge Extraction (CD-MAKE), eds A. Holzinger, P. Kieseberg, A. M. Tjoa, and E. Weippl (Reggio: Springer International Publishing), 14–24. doi: 10.1007/978-3-319-66808-6_2

CrossRef Full Text | Google Scholar

Hicks, S. A., Isaksen, J. L., Thambawita, V., Ghouse, J., Ahlberg, G., Linneberg, A., et al. (2021). Explaining deep neural networks for knowledge discovery in electrocardiogram analysis. Sci. Rep. 11:10949. doi: 10.1038/s41598-021-90285-5

PubMed Abstract | CrossRef Full Text | Google Scholar

Mallat, S. (2012). Group invariant scattering. Commun. Pure Appl. Math. 65, 1331–1398. doi: 10.1002/cpa.21413

PubMed Abstract | CrossRef Full Text | Google Scholar

Mallat, S. (2016). Understanding deep convolutional networks. Philos. Trans. R. Soc A Math. Phys. Eng. Sci. 374:20150203. doi: 10.1098/rsta.2015.0203

PubMed Abstract | CrossRef Full Text | Google Scholar

Rao, N. V. (2005). The Stone-Weierstrass theorem revisited. Am. Math. Mnthly 112, 726–729. doi: 10.1080/00029890.2005.11920244

CrossRef Full Text | Google Scholar

Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215. doi: 10.1038/s42256-019-0048-x

CrossRef Full Text | Google Scholar

Worrall, D. E., Garbin, S. J., Turmukhambetov, D., and Brostow, G. J. (2017). “Harmonic networks: Deep translation and rotation equivariance,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (Honolulu, HI: IEEE), 7168–7177. doi: 10.1109/CVPR.2017.758

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, C., Voinea, S., Evangelopoulos, G., Rosasco, L., and Poggio, T. (2015). “Discriminative template learning in group-convolutional networks for invariant speech representations,” in Interspeech-2015 (Dresden: International Speech Communication Association), 3229–3233. doi: 10.21437/Interspeech.2015-650

CrossRef Full Text | Google Scholar

Keywords: GENEO, permutant, symmetric function, persistence diagram, persistent homology, machine learning

Citation: Conti F, Frosini P and Quercioli N (2022) On the Construction of Group Equivariant Non-Expansive Operators via Permutants and Symmetric Functions. Front. Artif. Intell. 5:786091. doi: 10.3389/frai.2022.786091

Received: 29 September 2021; Accepted: 18 January 2022;
Published: 15 February 2022.

Edited by:

Fabio Anselmi, Baylor College of Medicine, United States

Reviewed by:

Remco Duits, Eindhoven University of Technology, Netherlands
Kelin Xia, Nanyang Technological University, Singapore

Copyright © 2022 Conti, Frosini and Quercioli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Patrizio Frosini, cGF0cml6aW8uZnJvc2luaUB1bmliby5pdA==

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.