- 1Department of Mathematics, University of Bologna, Bologna, Italy
- 2Department of Electrical, Electronic, and Information Engineering (DEI) and WiLab-National Laboratory for Wireless Communications, National Inter-University Consortium for Telecommunications (CNIT), University of Bologna, Bologna, Italy
- 3Department of Mathematics, Royal Institute of Technology (KTH), Stockholm, Sweden
In this article, we propose a topological model to encode partial equivariance in neural networks. To this end, we introduce a class of operators, called P-GENEOs, that change data expressed by measurements, respecting the action of certain sets of transformations, in a non-expansive way. If the set of transformations acting is a group, we obtain the so-called GENEOs. We then study the spaces of measurements, whose domains are subjected to the action of certain self-maps and the space of P-GENEOs between these spaces. We define pseudo-metrics on them and show some properties of the resulting spaces. In particular, we show how such spaces have convenient approximation and convexity properties.
1 Introduction
Over the past decade, several geometric techniques have been incorporated into Deep Learning (DL), giving rise to the new field of Geometric Deep Learning (GDL) (Cohen and Welling, 2016; Masci et al., 2016; Bronstein et al., 2017). This geometric approach to deep learning is exploited with a dual purpose. On one hand, geometry provides a common mathematical framework to study neural network architectures. On the other hand, a geometric bias, based on prior knowledge of the data set, can be incorporated into DL models. In this second case, GDL models take advantage of the symmetries imposed by an observer who encodes and elaborates the data. The general blueprint of many deep learning architectures is modeled by group equivariance to encode such properties. If we consider measurements on a data set and a group encoding their symmetries, i.e., transformations taking admissible measurements to admissible measurements (for example, rotations or translations of an image), group equivariance is the property guaranteeing that such symmetries are preserved after applying an operator (e.g., a layer in a neural network) to the observed data. In particular, let us assume that the input measurements Φ, the output measurements Ψ and, respectively, their symmetry groups G and H are given. Then the agent F: Φ → Ψ is T-equivariant if F(φg) = F(φ)T(g), for any φ in Φ and any g in G, where T is a group homomorphism from G to H. In the theory of Group Equivariant Non-Expansive Operators (GENEOs) (Camporesi et al., 2018; Bergomi et al., 2019; Cascarano et al., 2021; Bocchi et al., 2022, 2023; Conti et al., 2022; Frosini et al., 2023; Micheletti, 2023), as in many other GDL models, the collection of all symmetries is represented by a group, but in some applications, the group axioms do not necessarily hold, since real-world data rarely follow strict mathematical symmetries due to noise, incompleteness, or symmetry-breaking features. As an example, we can consider a data set that contains images of digits and the group of rotations as the group acting on it. Rotating an image of the digit “6” by a straight angle returns an image that the user would most likely interpret as “9”. At the same time, we may want to be able to rotate the digit “6” by small angles while preserving its meaning (see Figure 1).
Figure 1. Example of a symmetry-breaking feature. Applying a rotation g of π/4, the digit “6” preserves its meaning (left). The rotation g⁴ of π is, instead, not admissible, since it transforms the digit “6” into the digit “9” (right).
It is then desirable to extend the theory of GENEOs by relaxing the hypotheses on the sets of transformations. The main aim of this article is to give a generalization of the results obtained for GENEOs to a new mathematical framework, where the property of equivariance is maintained only for some transformations of the measurements, encoding a partial equivariance with respect to the action of the group of all transformations. To this end, we introduce the concept of Partial Group Equivariant Non-Expansive Operator (P-GENEO).
In this new model, there are some substantial differences with respect to the theory of GENEOs:
1. The user chooses two sets of measurements in input: the one containing the original measurements and another set that encloses the admissible variations of such measurements, defined in the same domain. For example, in the case where the function that represents the digit “6” is being observed, we define an initial space that contains this function and another space that contains certain small rotations of “6” but excludes all the others.
2. Instead of considering a group of transformations, we consider a set containing only those that do not change the meaning of our data, i.e., only those associating with each original measurement another one inside the set of its admissible variations. Therefore, by choosing the initial spaces, the user defines also which transformations of the data set, given by right composition, are admissible and which ones are not.
3. We define partial GENEOs, or P-GENEOs, as a generalization of GENEOs. P-GENEOs are operators that respect the two sets of measurements in input and the set of transformations relating them. The term “partial” refers to the fact that the set of transformations does not necessarily need to be a group.
With these assumptions in mind, we will extend the results proven in the study by Bergomi et al. (2019) and Quercioli (2021a) for GENEOs. We will define suitable pseudo-metrics on the spaces of measurements, the set of transformations, and the set of non-expansive operators. Building on their induced topological structures, we prove compactness and convexity of the space of P-GENEOs under the assumption that the function spaces are compact and convex. These are useful properties from a computational point of view. For example, compactness guarantees that the space can be approximated by a finite set. Moreover, convexity allows us to take the convex combination of P-GENEOs in order to generate new ones.
2 Related work
The main motivation for our study is that observed data rarely follow strict mathematical symmetries. This may be due, for example, to the presence of noise in data measurements. The idea of relaxing the hypothesis of equivariance in GDL and data analysis is not novel, as shown by the recent increase in the number of publications in this area (see, for example, Weiler and Cesa, 2019; Finzi et al., 2021; Romero and Lohit, 2022; van der Ouderaa et al., 2022; Wang et al., 2022; Chachólski et al., 2023).
We identify two main ways to transform data via operators that are not strictly equivariant due to the lack of strict symmetries of the measurements. On one hand, one could define approximately equivariant operators. These are operators for which equivariance holds up to a small perturbation. In this case, given two groups G and H acting on the spaces of measurements Φ and Ψ, respectively, and a homomorphism between them, T: G → H, we say that F: Φ → Ψ is ε-equivariant if, for any g ∈ G and φ ∈ Φ, ||F(φg) − F(φ)T(g)||∞ ≤ ε. Alternatively, when defining operators transforming the measurements of certain data sets, equivariance may be substituted by partial equivariance. In this case, equivariance is guaranteed for a subset of the groups acting on the space of measurements, with no guarantees for this subset to be a subgroup. Among the previously cited articles about relaxing the property of equivariance in DL, the approach by Finzi et al. (2021) is closer to an approximate equivariance model. Here, the authors use a Bayesian approach to introduce an inductive bias in their network that is sensitive to approximate symmetry. The authors of Romero and Lohit (2022) utilize a partial equivariance approach, where a probability distribution is defined and associated with each group convolutional layer of the architecture, and the parameters defining it are either learnt, to achieve equivariance, or partially learnt, to achieve partial equivariance. The importance of choosing equivariance with respect to different acting groups on each layer of the CNN was actually first observed in the study by Weiler and Cesa (2019) for the group of Euclidean isometries in ℝ².
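To make the two notions concrete, the following minimal sketch (our own toy illustration in Python/NumPy, not taken from the cited works) measures the equivariance defect ‖F(φg) − F(φ)T(g)‖∞ for two operators on 1D cyclic signals, with G = H the group of cyclic shifts and T the identity homomorphism: a circular moving average is exactly equivariant, while a variant that zeroes the signal near the borders has a strictly positive defect and is therefore only approximately (ε-)equivariant.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
phi = rng.random(n)                      # a sampled measurement phi: X -> R, with X = Z/nZ

def shift(f, k):                         # right action of a cyclic shift g: (phi g)(x) = phi(x + k)
    return np.roll(f, -k)

def moving_average(f, w=5):              # F: circular moving average, exactly shift-equivariant
    kernel = np.ones(w) / w
    return np.convolve(np.concatenate([f, f[:w - 1]]), kernel, mode="valid")

def masked_average(f, w=5):              # F_eps: same operator, but the borders are zeroed out
    out = moving_average(f, w)
    out[:w] = 0.0
    out[-w:] = 0.0
    return out

k = 7                                    # the shift g; T is the identity homomorphism here
for name, F in [("moving_average", moving_average), ("masked_average", masked_average)]:
    # equivariance defect ||F(phi g) - F(phi) T(g)||_inf
    defect = np.max(np.abs(F(shift(phi, k)) - shift(F(phi), k)))
    print(f"{name}: equivariance defect = {defect:.4f}")
```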
The point of view of this article is closer to the latter. Our P-GENEOs are indeed operators that preserve the action of certain sets ruling the admissibility of the transformations of the measurements of our data sets. Moreover, non-expansiveness plays a crucial role in our model. This is, in fact, the feature allowing us to obtain compactness and approximability in the space of operators, distinguishing our model from the existing literature on equivariant machine learning.
3 Mathematical setting
3.1 Data sets and operations
Consider a set X and the normed vector space (ℝ^X_b, ∥·∥∞), where ℝ^X_b is the space of all bounded real-valued functions on X and ∥·∥∞ is the usual uniform norm, i.e., ∥f∥∞ := sup_{x∈X}|f(x)| for any f ∈ ℝ^X_b. On the set X, the space of transformations is given by elements of Aut(X), i.e., the group of bijections from X to itself. Then, we can consider the right group action of Aut(X) on ℝ^X_b defined as follows (we represent composition as a juxtaposition of functions):

ℝ^X_b × Aut(X) → ℝ^X_b,   (φ, s) ↦ φs.
Remark 3.1. For every s ∈ Aut(X), the map φ ↦ φs from ℝ^X_b to itself preserves the distances. In fact, for any φ1, φ2 ∈ ℝ^X_b, by bijectivity of s, we have that

∥φ1s − φ2s∥∞ = sup_{x∈X}|φ1(s(x)) − φ2(s(x))| = sup_{y∈X}|φ1(y) − φ2(y)| = ∥φ1 − φ2∥∞.
In our model, our data sets are represented as two sets Φ and Φ′ of bounded real-valued measurements on X. In particular, X represents the space where the measurements can be made, Φ is the space of permissible measurements, and Φ′ is a space which Φ can be transformed into, without changing the interpretation of its measurements after a transformation is applied. In other words, we want to be able to apply some admissible transformations on the space X so that the resulting changes in the measurements in Φ are contained in the space Φ′. Thus, in our model, we consider operations on X in the following way:
Definition 3.2. A (Φ, Φ′)-operation is an element s of Aut(X) such that, for any measurement φ ∈ Φ, the composition φs belongs to Φ′. The set of all (Φ, Φ′)-operations is denoted by AutΦ,Φ′(X).
Remark 3.3. We can observe that the identity function idX is an element of AutΦ,Φ′(X) if and only if Φ ⊆ Φ′.
For any s ∈ AutΦ,Φ′(X), the restriction to Φ of the map φ ↦ φs takes values in Φ′, since φs ∈ Φ′ for any φ ∈ Φ. We can then consider the restriction of the action map defined above (for simplicity, we will continue to denote this restriction in the same way):

Φ × AutΦ,Φ′(X) → Φ′,

where (φ, s) ↦ φs, for every s ∈ AutΦ,Φ′(X) and every φ ∈ Φ.
Definition 3.4. Let X be a set. A perception triple is a triple (Φ, Φ′, S) with Φ, Φ′ ⊆ ℝ^X_b and S ⊆ AutΦ,Φ′(X). The set X is called the domain of the perception triple and is denoted by dom(Φ, Φ′, S).
Example 3.5. Given X = ℝ², consider two rectangles R and R′ in X. Assume Φ: = {φ: X → [0, 1]: supp(φ) ⊆ R} and Φ′: = {φ′:X → [0, 1]: supp(φ′) ⊆ R′}. We recall that, if we consider a function f: X → ℝ, the support of f is the set of points in the domain, where the function does not vanish, i.e., supp(f) = {x ∈ X | f(x) ≠ 0}. Consider S as the set of translations that bring R into R′. The triple (Φ, Φ′, S) is a perception triple. If Φ represents a set of gray level images, S determines which translations can be applied to our pictures.
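In a discretized setting, Definition 3.2 can at least be spot-checked numerically. The sketch below is a small finite stand-in for Example 3.5 (the grid size, the rectangles, and the use of cyclic translations as bijections are our own assumptions): it verifies that a suitable translation sends every sampled measurement of Φ into Φ′, while a different translation does not.

```python
import numpy as np

# Minimal finite stand-in for Example 3.5: X is a 16x16 cyclic grid,
# Phi are [0,1]-valued images supported in R, Phi' are those supported in R'.
N = 16
R  = (slice(2, 6),  slice(2, 6))     # support rectangle for Phi
Rp = (slice(8, 12), slice(8, 12))    # support rectangle for Phi'

def random_measurement(rect, rng):
    """A random element of the function space supported in `rect`."""
    phi = np.zeros((N, N))
    phi[rect] = rng.random((rect[0].stop - rect[0].start, rect[1].stop - rect[1].start))
    return phi

def compose(phi, di, dj):
    """phi∘s for the cyclic translation s(i, j) = ((i+di) % N, (j+dj) % N)."""
    return np.roll(phi, shift=(-di, -dj), axis=(0, 1))

def supported_in(phi, rect):
    mask = np.zeros((N, N), dtype=bool)
    mask[rect] = True
    return np.all(phi[~mask] == 0)

def is_operation(di, dj, samples):
    """Spot-check of Definition 3.2 on finitely many measurements (not a proof)."""
    return all(supported_in(compose(phi, di, dj), Rp) for phi in samples)

rng = np.random.default_rng(1)
samples = [random_measurement(R, rng) for _ in range(5)]
print(is_operation(-6, -6, samples))   # True: this translation sends R-supported images into R'
print(is_operation(1, 0, samples))     # False: this one is not a (Phi, Phi')-operation
```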
3.2 Pseudo-metrics on data sets
In our model, considering a generic set X, data are represented by a space Ω ⊆ ℝ^X_b of bounded real-valued functions. We endow the real line ℝ with the usual Euclidean metric and the space X with the extended pseudo-metric D_X^Ω induced by Ω:

D_X^Ω(x1, x2) := sup_{ω∈Ω}|ω(x1) − ω(x2)|
for every x1, x2 ∈ X. The choice of this pseudo-metric over X means that two points can only be distinguished if they assume different values for some measurements. For example, if Φ contains only a constant function and X contains at least two points, the distance between any two points of X is always null.
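Once Ω is replaced by a finite sample of measurements, the pseudo-metric above can be evaluated directly. The following sketch (our own illustration; the chosen measurements are arbitrary) computes D_X^Ω on a pair of points and shows that a single constant measurement cannot separate any two points.

```python
import numpy as np

def pseudo_metric(Omega, x1, x2):
    """D_X^Omega(x1, x2) = max over the sampled measurements of |omega(x1) - omega(x2)|."""
    return max(abs(omega(x1) - omega(x2)) for omega in Omega)

# A few measurements on X = R^2 (a finite sample standing in for Omega).
Omega = [lambda x: np.sin(x[0]), lambda x: x[0] * x[1], lambda x: np.exp(-x[1] ** 2)]
print(pseudo_metric(Omega, (0.0, 1.0), (0.5, 1.0)))

# With a single constant measurement, every pair of points is at distance zero.
print(pseudo_metric([lambda x: 1.0], (0.0, 1.0), (3.0, -2.0)))   # 0.0
```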
The pseudo-metric space XΩ := (X, D_X^Ω) can be considered as a topological space with the basis

BΩ := {B(x, r) : x ∈ X, r > 0},   where B(x, r) := {y ∈ X : D_X^Ω(x, y) < r},
and the induced topology is denoted by τΩ. The reason for considering a topological space X, rather than just a set, follows from the need of formalizing the assumption that data are stable under small perturbations.
Remark 3.6. In our case, there are two collections of functions Φ and Φ′ in ℝ^X_b representing our data, both of which induce a topology on X. Hence, in the model, we consider two pseudo-metric spaces XΦ and XΦ′ with the same underlying set X. If Φ ⊆ Φ′, the topologies τΦ and τΦ′ are comparable and, in particular, τΦ′ is finer than τΦ.
Now, given a set Ω ⊆ ℝ^X_b, we will prove a result about the compactness of the pseudo-metric space XΩ. Before proceeding, let us recall the following lemma (e.g., see Gaal, 1964):
Lemma 3.7. Let (P,d) be a pseudo-metric space. The following conditions are equivalent:
1. P is totally bounded;
2. Every sequence in P admits a Cauchy subsequence.
Theorem 3.8. If Ω is totally bounded, XΩ is totally bounded.
Proof: By Lemma 3.7, it will suffice to prove that every sequence in X admits a Cauchy subsequence with respect to the pseudo-metric D_X^Ω. Consider a sequence (xi)i∈ℕ in XΩ and a real number ε > 0. Since Ω is totally bounded, we can find a finite subset Ωε = {ω1, …, ωn} such that for every ω ∈ Ω there exists ωr ∈ Ωε for which ||ω−ωr||∞ < ε. We can now consider the real sequence (ω1(xi))i∈ℕ, which is bounded since ω1 ∈ ℝ^X_b. From the Bolzano-Weierstrass Theorem, it follows that we can extract a convergent subsequence (ω1(xih))h∈ℕ. Again, we can extract from (ω2(xih))h∈ℕ another convergent subsequence (ω2(xiht))t∈ℕ. Repeating the process, we are able to extract a subsequence of (xi)i∈ℕ, which for simplicity of notation we denote by (xij)j∈ℕ, such that (ωk(xij))j∈ℕ is a convergent, and hence Cauchy, sequence in ℝ for every k ∈ {1, …, n}. Since Ωε is finite, we can find an index j0 ∈ ℕ such that, for any k ∈ {1, …, n} and any ℓ, m ≥ j0,

|ωk(xiℓ) − ωk(xim)| < ε.

Furthermore, we have that, for any ω ∈ Ω, any ωk ∈ Ωε, and any ℓ, m ∈ ℕ,

|ω(xiℓ) − ω(xim)| ≤ |ω(xiℓ) − ωk(xiℓ)| + |ωk(xiℓ) − ωk(xim)| + |ωk(xim) − ω(xim)| ≤ 2∥ω − ωk∥∞ + |ωk(xiℓ) − ωk(xim)|.

We observe that the choice of j0 depends only on ε and Ωε, not on k. Then, choosing an ωk ∈ Ωε such that ||ωk−ω||∞ < ε, we get |ω(xiℓ)−ω(xim)| < 3ε for every ω ∈ Ω and every ℓ, m ≥ j0. Then,

D_X^Ω(xiℓ, xim) = sup_{ω∈Ω}|ω(xiℓ) − ω(xim)| ≤ 3ε   for every ℓ, m ≥ j0.

Then (xij)j∈ℕ is a Cauchy sequence in XΩ. By Lemma 3.7, the statement holds.
Corollary 3.9. If Ω is totally bounded and XΩ is complete, XΩ is compact.
Proof: From Theorem 3.8, we have that XΩ is totally bounded, and since by hypothesis it is also complete, it is compact.
Now, we will prove that the choice of the pseudo-metric on X makes the functions in Ω non-expansive.
Definition 3.10. Consider two pseudo-metric spaces (P, dP) and (Q, dQ). A non-expansive function from (P, dP) to (Q, dQ) is a function f: P → Q such that dQ(f(p1), f(p2)) ≤ dP(p1, p2) for any p1, p2 ∈ P.
We denote as NE(P, Q) the space of all non-expansive functions from (P, dP) to (Q, dQ).
Proposition 3.11. Ω ⊆ NE(XΩ, ℝ).
Proof: For any x1, x2 ∈ X and any ω ∈ Ω, we have that

|ω(x1) − ω(x2)| ≤ sup_{ω′∈Ω}|ω′(x1) − ω′(x2)| = D_X^Ω(x1, x2),

i.e., ω is non-expansive from XΩ to ℝ.
Then, the topology on X induced by D_X^Ω naturally makes the measurements in Ω continuous. In particular, since the previous results hold for a generic Ω ⊆ ℝ^X_b, they are also true for Φ and Φ′ in our model.
Remark 3.12. Assume that (Φ, Φ′, S) is a perception triple. A function φ′ ∈ Φ′ may not be continuous from XΦ to ℝ, and a function φ ∈ Φ may not be continuous from XΦ′ to ℝ. In other words, the topology on X induced by the pseudo-metric of one of the function spaces does not necessarily make the functions in the other continuous.
Example 3.13. Assume X = ℝ and, for every a, b ∈ ℝ, define the functions φa: X → ℝ and φ′b: X → ℝ by setting

φa(x) := 1 if x ≤ a, φa(x) := 0 otherwise,    φ′b(x) := 1 if x ≥ b, φ′b(x) := 0 otherwise.

Suppose Φ := {φa : a ≥ 0} and Φ′ := {φ′b : b ≤ 0}. Consider the symmetry with respect to the y-axis, i.e., the map s(x) = −x. Surely, s ∈ AutΦ,Φ′(X), since φas = φ′−a for every a ≥ 0. We can observe that the function φ1 ∈ Φ is not continuous from XΦ′ to ℝ; indeed D_X^Φ′(0, 2) = 0, but |φ1(0)−φ1(2)| = 1.
However, if Φ ⊆ Φ′, we have that the functions in Φ are also continuous on XΦ′, indeed:
Corollary 3.14. If Φ ⊆ Φ′, then Φ ⊆ NE(XΦ′, ℝ).
Proof: By Proposition 3.11, the statement trivially holds since Φ ⊆ Φ′ ⊆ NE(XΦ′, ℝ).
3.3 Pseudo-metrics on the space of operations
Proposition 3.15. Every element of AutΦ,Φ′(X) is non-expansive from XΦ′ to XΦ.
Proof: Considering a bijection s ∈ AutΦ,Φ′(X), we have that

D_X^Φ(s(x1), s(x2)) = sup_{φ∈Φ}|φ(s(x1)) − φ(s(x2))| = sup_{ψ∈Φs}|ψ(x1) − ψ(x2)| ≤ sup_{φ′∈Φ′}|φ′(x1) − φ′(x2)| = D_X^Φ′(x1, x2)

for every x1, x2 ∈ X, where Φs = {φs : φ ∈ Φ}. Then, s ∈ NE(XΦ′, XΦ) and the statement is proved.
Now, we are ready to put more structure on AutΦ,Φ′(X). Considering a set Ω ⊆ ℝ^X_b of bounded real-valued functions, we can endow the set Aut(X) with a pseudo-metric D_Aut^Ω inherited from Ω:

D_Aut^Ω(s1, s2) := sup_{ω∈Ω}∥ωs1 − ωs2∥∞
for any s1, s2 in Aut(X).
Remark 3.16. Analogously to what happens in Remark 3.6 for X, the sets Φ and Φ′ can endow Aut(X) with two possibly different pseudo-metrics D_Aut^Φ and D_Aut^Φ′. In particular, we can consider AutΦ,Φ′(X) as a pseudo-metric subspace of Aut(X) with the induced pseudo-metrics.
Remark 3.17. We observe that, for any s1, s2 in Aut(X),

D_Aut^Ω(s1, s2) = sup_{ω∈Ω} sup_{x∈X}|ω(s1(x)) − ω(s2(x))| = sup_{x∈X} sup_{ω∈Ω}|ω(s1(x)) − ω(s2(x))| = sup_{x∈X} D_X^Ω(s1(x), s2(x)).
In other words, the pseudo-metric D_Aut^Ω, which is based on the action of the elements of Aut(X) on the set Ω, is exactly the usual uniform pseudo-metric between self-maps of the space XΩ.
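On a finite domain, both sides of the identity in Remark 3.17 can be computed exactly. The sketch below (our own finite toy example) represents measurements as arrays and elements of Aut(X) as permutations, and checks that sup_{ω∈Ω}∥ωs1 − ωs2∥∞ coincides with sup_{x∈X} D_X^Ω(s1(x), s2(x)).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
X = np.arange(n)
Omega = [rng.random(n) for _ in range(4)]        # measurements on a finite X, stored as arrays
s1, s2 = rng.permutation(n), rng.permutation(n)  # two elements of Aut(X)

def D_X(Omega, x, y):                            # D_X^Omega(x, y)
    return max(abs(om[x] - om[y]) for om in Omega)

# D_Aut^Omega(s1, s2) = sup_omega || omega s1 - omega s2 ||_inf   (omega s = omega composed with s)
d_aut = max(np.max(np.abs(om[s1] - om[s2])) for om in Omega)

# the same quantity written as the uniform pseudo-metric between s1 and s2 as maps into X_Omega
d_unif = max(D_X(Omega, s1[x], s2[x]) for x in X)

print(np.isclose(d_aut, d_unif))                 # True
```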
3.4 The space of operations
Since we are only interested in transformations of functions in Φ, it would be natural to just endow AutΦ,Φ′(X) with the pseudo-metric D_Aut^Φ. However, it is sometimes necessary to consider the pseudo-metric D_Aut^Φ′ in order to guarantee the continuity of the composition of elements in AutΦ,Φ′(X), whenever it is admissible. Consider two elements s, t in AutΦ,Φ′(X) such that st is still an element of AutΦ,Φ′(X), i.e., for every function φ ∈ Φ, we have that φst ∈ Φ′. Then, for any φ ∈ Φ we have that

φ(st) = (φs)t, with φs ∈ Φ′.

Therefore, t is also an element of AutΦs,Φ′(X). By definition, Φs is contained in Φ′ for every s ∈ AutΦ,Φ′(X), and this justifies the choice of considering on AutΦ,Φ′(X) also the pseudo-metric D_Aut^Φ′. We have shown, in particular, that if s, t are elements of AutΦ,Φ′(X) such that st is still an element of AutΦ,Φ′(X), then t is an element of AutΦs,Φ′(X), which is an implication of the following proposition:
Proposition 3.18. Let s ∈ AutΦ,Φ′(X) and t ∈ Aut(X). Then, st ∈ AutΦ,Φ′(X) if and only if t ∈ AutΦs,Φ′(X).
Proof: If the composition st belongs to AutΦ,Φ′(X), we have already proven that t ∈ AutΦs,Φ′(X). On the other hand, if t ∈ AutΦs,Φ′(X), we have that (φs)t ∈ Φ′ for every φ ∈ Φ. Since φ(st) = (φs)t, it follows that φ(st) ∈ Φ′ for every φ ∈ Φ. Therefore, st ∈ AutΦ,Φ′(X) and the statement is proved.
Remark 3.19. Let s, t ∈ AutΦ,Φ′(X). We can observe that if s ∈ AutΦ(X), then Φs ⊆ Φ and hence, by Proposition 3.18, st ∈ AutΦ,Φ′(X).
Lemma 3.20. Consider r, s, t ∈ Aut(X). For any Ω ⊆ ℝ^X_b, it holds that

D_Aut^Ω(sr, tr) = D_Aut^Ω(s, t).

Proof: Since the right action of r preserves the distances (Remark 3.1), we have that:

D_Aut^Ω(sr, tr) = sup_{ω∈Ω}∥(ωs)r − (ωt)r∥∞ = sup_{ω∈Ω}∥ωs − ωt∥∞ = D_Aut^Ω(s, t).
Lemma 3.21. Consider r, s ∈ Aut(X) and t ∈ AutΦ,Φ′(X). It holds that

D_Aut^Φ(tr, ts) ≤ D_Aut^Φ′(r, s).

Proof: Since Φt ⊆ Φ′, we have that:

D_Aut^Φ(tr, ts) = sup_{φ∈Φ}∥(φt)r − (φt)s∥∞ ≤ sup_{φ′∈Φ′}∥φ′r − φ′s∥∞ = D_Aut^Φ′(r, s).
Let Π be the set of all pairs (s, t) with s, t ∈ AutΦ,Φ′(X) such that st ∈ AutΦ,Φ′(X). We endow Π with the pseudo-metric

D_Π((s1, t1), (s2, t2)) := D_Aut^Φ(s1, s2) + D_Aut^Φ′(t1, t2)
and the corresponding topology.
Proposition 3.22. The function from Π to AutΦ,Φ′(X) that maps (s, t) to st is non-expansive (with respect to D_Π and D_Aut^Φ), and hence continuous.
Proof: Consider two elements (s1, t1), (s2, t2) of Π. By Lemma 3.20 and Lemma 3.21,

D_Aut^Φ(s1t1, s2t2) ≤ D_Aut^Φ(s1t1, s2t1) + D_Aut^Φ(s2t1, s2t2) = D_Aut^Φ(s1, s2) + D_Aut^Φ(s2t1, s2t2) ≤ D_Aut^Φ(s1, s2) + D_Aut^Φ′(t1, t2) = D_Π((s1, t1), (s2, t2)).
Therefore, the statement is proved.
Let Υ be the set of all s ∈ AutΦ,Φ′(X) with s−1 ∈ AutΦ,Φ′(X).
Proposition 3.23. The function from Υ to Υ that maps s to s−1 is non-expansive (with respect to D_Aut^Φ′ on the domain and D_Aut^Φ on the codomain), and hence continuous.
Proof: Consider two bijections s1, s2 ∈ Υ. Because of Lemma 3.20 and Lemma 3.21, we obtain that

D_Aut^Φ(s1−1, s2−1) = D_Aut^Φ(s1−1s1, s2−1s1) = D_Aut^Φ(s2−1s2, s2−1s1) ≤ D_Aut^Φ′(s2, s1) = D_Aut^Φ′(s1, s2).
We have previously defined the map

Φ × AutΦ,Φ′(X) → Φ′,

where (φ, s) ↦ φs, for every s ∈ AutΦ,Φ′(X) and every φ ∈ Φ.
Proposition 3.24. The function Φ × AutΦ,Φ′(X) → Φ′, (φ, s) ↦ φs, is continuous by choosing the pseudo-metric D_Aut^Φ on AutΦ,Φ′(X).
Proof: We have that

∥φ1s1 − φ2s2∥∞ ≤ ∥φ1s1 − φ2s1∥∞ + ∥φ2s1 − φ2s2∥∞ = ∥φ1 − φ2∥∞ + ∥φ2s1 − φ2s2∥∞ ≤ ∥φ1 − φ2∥∞ + D_Aut^Φ(s1, s2)

for any φ1, φ2 ∈ Φ and any s1, s2 ∈ AutΦ,Φ′(X). This proves that the map is continuous.
Now, we can give a result about the compactness of AutΦ,Φ′(X) under suitable assumptions.
Proposition 3.25. If Φ and Φ′ are totally bounded, AutΦ,Φ′(X) is totally bounded with respect to the pseudo-metric D_Aut^Φ.
Proof: Consider a sequence (si)i∈ℕ in AutΦ,Φ′(X) and a real number ε > 0. Since Φ is totally bounded, we can find a finite subset Φε = {φ1, …, φn} such that for every φ ∈ Φ there exists φr ∈ Φε for which ||φ−φr||∞ < ε. Now, consider the sequence (φ1si)i∈ℕ in Φ′. Since Φ′ is also totally bounded, from Lemma 3.7 it follows that we can extract a Cauchy subsequence (φ1sih)h∈ℕ. Again, we can extract from (φ2sih)h∈ℕ another Cauchy subsequence (φ2siht)t∈ℕ. Repeating the process for every k ∈ {1, …, n}, we are able to extract a subsequence of (si)i∈ℕ, which for simplicity of notation we denote by (sij)j∈ℕ, such that (φksij)j∈ℕ is a Cauchy sequence in Φ′ for every k ∈ {1, …, n}.
Since Φε is finite, we can find an index j0 ∈ ℕ such that, for any k ∈ {1, …, n} and any ℓ, m ≥ j0,

∥φksiℓ − φksim∥∞ < ε.   (3.4.1)
Furthermore, we have that for any φ ∈ Φ, any φk ∈ Φε, and any ℓ, m ∈ ℕ,

∥φsiℓ − φsim∥∞ ≤ ∥φsiℓ − φksiℓ∥∞ + ∥φksiℓ − φksim∥∞ + ∥φksim − φsim∥∞ = 2∥φ − φk∥∞ + ∥φksiℓ − φksim∥∞.
We observe that the choice of j0 in (3.4.1) depends only on ε and Φε, not on φ. Then, choosing a φk ∈ Φε such that ||φk−φ||∞ < ε, we get ||φsiℓ−φsim||∞ < 3ε for every φ ∈ Φ and every ℓ, m ≥ j0. Hence, for every ℓ, m ≥ j0,

D_Aut^Φ(siℓ, sim) = sup_{φ∈Φ}∥φsiℓ − φsim∥∞ ≤ 3ε.
Therefore, (sij)j∈ℕ is a Cauchy sequence in AutΦ,Φ′(X) with respect to D_Aut^Φ. By Lemma 3.7, the statement holds.
Corollary 3.26. Assume that S ⊆ AutΦ,Φ′(X). If Φ and Φ′ are totally bounded and S is complete, then S is also compact.
Proof: From Proposition 3.25, AutΦ,Φ′(X) is totally bounded, and hence so is its subset S. Since by hypothesis S is also complete, the statement holds.
4 The space of P-GENEOs
In this section, we introduce the concept of Partial Group Equivariant Non-Expansive Operator (P-GENEO). P-GENEOs allow us to transform data sets, preserving symmetries and distances and maintaining the acceptability conditions of the transformations. We will also describe some topological results about the structure of the space of P-GENEOs and some techniques used for defining new P-GENEOs in order to populate the space of P-GENEOs.
Definition 4.1. Let X, Y be sets and (Φ, Φ′, S), (Ψ, Ψ′, Q) be perception triples with domains X and Y, respectively. Consider a triple of functions (F, F′, T) with the following properties:
• F: Φ → Ψ, F′:Φ′ → Ψ′, T: S → Q;
• For any s, t ∈ S such that st ∈ S it holds that T(st) = T(s)T(t);
• For any s ∈ S such that s−1 ∈ S it holds that T(s−1) = T(s)−1;
• (F, F′, T) is equivariant, i.e., F′(φs) = F(φ)T(s) for every φ ∈ Φ, s ∈ S.
The triple (F, F′, T) is called a perception map or a Partial Group Equivariant Operator (P-GEO) from (Φ, Φ′, S) to (Ψ, Ψ′, Q).
In Remark 3.3, we observed that idX ∈ AutΦ,Φ′(X) if Φ ⊆ Φ′. Then, we can consider a perception triple (Φ, Φ′, S) with Φ ⊆ Φ′ and idX ∈ S. Now, we will show how a P-GEO from this perception triple behaves.
Lemma 4.2. Consider two perception triples (Φ, Φ′, S) and (Ψ, Ψ′, Q) with domains X and Y, respectively, and with Φ ⊆ Φ′ and idX ∈ S. Let (F, F′, T) be a P-GEO from (Φ, Φ′, S) to (Ψ, Ψ′, Q). Then, Ψ ⊆ Ψ′ and T(idX) = idY.
Proof: Since (F, F′, T) is a P-GEO, by definition, we have that, for any s, t ∈ S such that st ∈ S, T(st) = T(s)T(t). Since idX ∈ S, then

T(idX) = T(idX idX) = T(idX)T(idX),

and hence, composing with T(idX)−1 ∈ Aut(Y), T(idX) = idY. Since idY = T(idX) ∈ Q ⊆ AutΨ,Ψ′(Y), by Remark 3.3, we have that Ψ ⊆ Ψ′.
Proposition 4.3. Consider two perception triples (Φ, Φ′, S) and (Ψ, Ψ′, Q) with domains X and Y, respectively, and with Φ ⊆ Φ′ and idX ∈ S. Let (F, F′, T) be a P-GEO from (Φ, Φ′, S) to (Ψ, Ψ′, Q). Then F′(φ) = F(φ) for every φ ∈ Φ.
Proof: Since (F, F′, T) is a P-GEO, it is equivariant, and by Lemma 4.2, we have that

F′(φ) = F′(φ idX) = F(φ)T(idX) = F(φ) idY = F(φ)
for every φ ∈ Φ.
Definition 4.4. Assume that (Φ, Φ′, S) and (Ψ, Ψ′, Q) are perception triples. If (F, F′, T) is a perception map from (Φ, Φ′, S) to (Ψ, Ψ′, Q) and F, F′ are non-expansive, i.e.,

∥F(φ1) − F(φ2)∥∞ ≤ ∥φ1 − φ2∥∞   and   ∥F′(φ′1) − F′(φ′2)∥∞ ≤ ∥φ′1 − φ′2∥∞

for every φ1, φ2 ∈ Φ and every φ′1, φ′2 ∈ Φ′, then (F, F′, T) is called a Partial Group Equivariant Non-Expansive Operator (P-GENEO).
In other words, a P-GENEO is a triple (F, F′, T) such that F, F′ are non-expansive and the following diagram commutes for every s ∈ S:

Φ --·s--> Φ′
|          |
F          F′
↓          ↓
Ψ --·T(s)--> Ψ′

i.e., F′(φs) = F(φ)T(s) for every φ ∈ Φ.
Remark 4.5. We can observe that a GENEO (see Bergomi et al., 2019) can be represented as a special case of P-GENEO, considering two perception triples (Φ, Φ′, S), (Ψ, Ψ′, Q) such that Φ = Φ′, Ψ = Ψ′, and the subsets containing the invariant transformations S and Q are groups (and then the map T: S → Q is a homomorphism). In this setting, a P-GENEO (F, F′, T) is a triple where the operators F, F′ are equal to each other (because of Proposition 4.3), and the map T is a homomorphism. Hence, instead of the triple, we can simply write the pair (F, T) that is a GENEO.
Considering two perception triples, we typically want to study the space of all P-GENEOs between them with the map T fixed. Therefore, when the map T is fixed and specified, we will simply consider pairs of operators (F, F′) instead of triples (F, F′, T), and we say that (F, F′) is a P-GENEO associated with or with respect to the map T. Moreover, in this case, we indicate the property of equivariance of the triple (F, F′, T) writing that the pair (F, F′) is T-equivariant.
Example 4.6. Let X = ℝ². Take a real number ℓ > 0. In X, consider the square Q1: = [0, ℓ] × [0, ℓ] and its translation Q′1 := Q1 + a by a vector a = (a1, a2) ∈ ℝ². Analogously, let us consider a real number 0 < ε < ℓ/2 and two squares inside Q1 and Q′1, namely Q2: = [ε, ℓ−ε] × [ε, ℓ−ε] and Q′2 := Q2 + a, as shown in Figure 2.
Consider the following function spaces in ℝ^X_b:

Φ := {φ: X → [0, 1] : supp(φ) ⊆ Q1},   Φ′ := {φ′: X → [0, 1] : supp(φ′) ⊆ Q′1},
Ψ := {ψ: X → [0, 1] : supp(ψ) ⊆ Q2},   Ψ′ := {ψ′: X → [0, 1] : supp(ψ′) ⊆ Q′2}.
Let S := {sa}, where sa ∈ Aut(X) is defined by sa(x) := x − a, so that precomposition with sa translates the support of an image by the vector a = (a1, a2). The triples (Φ, Φ′, S) and (Ψ, Ψ′, S) are perception triples. This example could model the translation of two nested gray-scale images. We now want to build an operator between these images in order to obtain a transformation that commutes with the selected translation. We can consider the triple of functions (F, F′, T) defined as follows: F: Φ → Ψ is the operator that maintains the output of functions in Φ at points of Q2 and sets them to zero outside it; analogously, F′:Φ′ → Ψ′ is the operator that maintains the output of functions in Φ′ at points of Q′2 and sets them to zero outside it; and T := idS. Therefore, the triple (F, F′, T) is a P-GENEO from (Φ, Φ′, S) to (Ψ, Ψ′, S). It turns out that the maps are non-expansive, and the equivariance holds:

F′(φsa) = F(φ)T(sa) = F(φ)sa
for any φ ∈ Φ. From the point of view of application, we are considering two square images and their translations, and we apply an operator that “cuts” the images, taking into account only the part of the image that interests the observer. This example justifies the definition of P-GENEO as a triple of operators (F, F′, T), without requiring F and F′ to be equal in the possibly non-empty intersection of their domains. In fact, if φ is a function contained in Φ ∩ Φ′, its image via F and F′ may be different.
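The following sketch discretizes Example 4.6 (the grid size, the concrete translation vector, and the use of a cyclic shift of a finite grid in place of a translation of the plane are our own assumptions). The operator F simply multiplies an image by the indicator function of the inner square, and both the equivariance F′(φsa) = F(φ)sa and the non-expansiveness of F are verified numerically.

```python
import numpy as np

# Discretized sketch of Example 4.6 (grid size, translation vector and the cyclic shift
# used in place of a translation of the plane are our own assumptions).
N, L, eps = 32, 20, 4
a = (6, 9)                                          # translation vector a = (a1, a2)

def indicator(rows, cols):
    m = np.zeros((N, N))
    m[rows[0]:rows[1], cols[0]:cols[1]] = 1.0
    return m

chi_Q1  = indicator((0, L), (0, L))                 # Q1 = [0, L] x [0, L]
chi_Q2  = indicator((eps, L - eps), (eps, L - eps)) # Q2, the inner square
chi_Q2p = np.roll(chi_Q2, a, axis=(0, 1))           # Q2' = Q2 + a

def compose_sa(phi):
    """phi ∘ sa, where sa(x) = x - a, i.e. the image phi translated by the vector a."""
    return np.roll(phi, a, axis=(0, 1))

F  = lambda phi: phi * chi_Q2    # keep the values on Q2, set them to zero outside
Fp = lambda phi: phi * chi_Q2p   # same operation with respect to Q2'

rng = np.random.default_rng(3)
phi = rng.random((N, N)) * chi_Q1                   # a random element of Phi
psi = rng.random((N, N)) * chi_Q1

# equivariance: F'(phi sa) = F(phi) sa
print(np.allclose(Fp(compose_sa(phi)), compose_sa(F(phi))))          # True
# non-expansiveness of F on a pair of measurements
print(np.max(np.abs(F(phi) - F(psi))) <= np.max(np.abs(phi - psi)))  # True
```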
4.1 Methods to construct P-GENEOs
Starting from a finite number of P-GENEOs, we will illustrate some methods to construct new P-GENEOs. First of all, the composition of two P-GENEOs is still a P-GENEO.
Proposition 4.7. Given two composable P-GENEOs, (F1, F′1, T1) from (Φ, Φ′, S) to (Ψ, Ψ′, Q) and (F2, F′2, T2) from (Ψ, Ψ′, Q) to (Ω, Ω′, K), their composition defined as

(F, F′, T) := (F2 ◦ F1, F′2 ◦ F′1, T2 ◦ T1)

is a P-GENEO from (Φ, Φ′, S) to (Ω, Ω′, K).
Proof: First, one could easily check that the map T = T2 ◦ T1 respects the second and the third property of Definition 4.1. Therefore, it remains to verify that F(Φ) ⊆ Ω, F′(Φ′) ⊆ Ω′ and the properties of equivariance and non-expansiveness are maintained.
1. Since F1(Φ) ⊆ Ψ and F2(Ψ) ⊆ Ω, we have that F(Φ) = (F2 ◦ F1)(Φ) = F2(F1(Φ)) ⊆ F2(Ψ) ⊆ Ω. Analogously, F′(Φ′) ⊆ Ω′.
2. Since (F1, F′1, T1) and (F2, F′2, T2) are equivariant, (F, F′, T) is equivariant. Indeed, for every φ ∈ Φ and every s ∈ S, we have that

F′(φs) = F′2(F′1(φs)) = F′2(F1(φ)T1(s)) = F2(F1(φ))T2(T1(s)) = F(φ)T(s).
3. Since F1 and F2 are non-expansive, F is non-expansive; indeed, for every φ1, φ2 ∈ Φ, we have that

∥F(φ1) − F(φ2)∥∞ = ∥F2(F1(φ1)) − F2(F1(φ2))∥∞ ≤ ∥F1(φ1) − F1(φ2)∥∞ ≤ ∥φ1 − φ2∥∞.
Analogously, F′ is non-expansive.
Given a finite number of P-GENEOs with respect to the same map T, we illustrate a general method to construct a new operator as a combination of them. Given two sets X and Y, consider a set Ω ⊆ ℝ^X_b, a finite set {H1, …, Hn} of functions from Ω to ℝ^Y_b and a map L: ℝn → ℝ, where ℝn is endowed with the norm ∥(x1, …, xn)∥∞ := max{|x1|, …, |xn|}. We define L*(H1, …, Hn): Ω → ℝ^Y_b as

L*(H1, …, Hn)(ω) := L ∘ (H1(ω), …, Hn(ω))

for any ω ∈ Ω, where the function L ∘ (H1(ω), …, Hn(ω)): Y → ℝ is defined by setting

L ∘ (H1(ω), …, Hn(ω))(y) := L(H1(ω)(y), …, Hn(ω)(y))
for any y ∈ Y. Now, consider two perception triples (Φ, Φ′, S) and (Ψ, Ψ′, Q) with domains X and Y, respectively, and a finite set {(F1, F′1), …, (Fn, F′n)} of P-GENEOs between them associated with the map T: S → Q. We can consider the functions L*(F1, …, Fn) and L*(F′1, …, F′n), defined as before, and state the following result.
Proposition 4.8. Assume that L is non-expansive. If L*(F1, …, Fn)(Φ) ⊆ Ψ and L*(F′1, …, F′n)(Φ′) ⊆ Ψ′, then (L*(F1, …, Fn), L*(F′1, …, F′n)) is a P-GENEO from (Φ, Φ′, S) to (Ψ, Ψ′, Q) with respect to T.
Proof: By hypothesis, L*(F1, …, Fn)(Φ) ⊆ Ψ and L*(F′1, …, F′n)(Φ′) ⊆ Ψ′, so we just need to verify the properties of equivariance and non-expansiveness.
1. Since (F1, F′1), …, (Fn, F′n) are T-equivariant, for any φ ∈ Φ and any s ∈ S, we have that:

L*(F′1, …, F′n)(φs) = L ∘ (F′1(φs), …, F′n(φs)) = L ∘ (F1(φ)T(s), …, Fn(φ)T(s)) = (L ∘ (F1(φ), …, Fn(φ)))T(s) = L*(F1, …, Fn)(φ)T(s).
Therefore, (L*(F1, …, Fn), L*(F′1, …, F′n)) is T-equivariant.
2. Since F1, …, Fn and L are non-expansive, for any φ1, φ2 ∈ Φ, we have that:

∥L*(F1, …, Fn)(φ1) − L*(F1, …, Fn)(φ2)∥∞ = sup_{y∈Y}|L(F1(φ1)(y), …, Fn(φ1)(y)) − L(F1(φ2)(y), …, Fn(φ2)(y))| ≤ sup_{y∈Y} max{|F1(φ1)(y) − F1(φ2)(y)|, …, |Fn(φ1)(y) − Fn(φ2)(y)|} ≤ max{∥F1(φ1) − F1(φ2)∥∞, …, ∥Fn(φ1) − Fn(φ2)∥∞} ≤ ∥φ1 − φ2∥∞.
Hence, L*(F1, …, Fn) is non-expansive. Analogously, since F′1, …, F′n and L are non-expansive, L*(F′1, …, F′n) is non-expansive.
Therefore, (L*(F1, …, Fn), L*(F′1, …, F′n)) is a P-GENEO from (Φ, Φ′, S) to (Ψ, Ψ′, Q) with respect to T.
Remark 4.9. The above result describes a general method to build new P-GENEOs, starting from a finite number of known P-GENEOs via non-expansive maps. Some examples of such non-expansive maps are the maximum function, the power mean, and the convex combination (for further details, see Frosini and Quercioli, 2017; Quercioli, 2021a,b).
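As a numerical sanity check of Proposition 4.8, the sketch below (a toy setting of our own in which Φ = Φ′ and Ψ = Ψ′ consist of all bounded cyclic signals, i.e., the GENEO case of Remark 4.5) combines two shift-equivariant non-expansive operators through the maximum function, which is non-expansive on ℝn with the max-norm, and verifies equivariance and non-expansiveness of the combined operator on random inputs.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 64

def shift(f, k):                       # the action of a cyclic shift on a signal
    return np.roll(f, -k)

def F1(f):                             # circular moving average of width 3 (non-expansive, shift-equivariant)
    return (f + np.roll(f, 1) + np.roll(f, -1)) / 3.0

def F2(f):                             # circular "erosion": pointwise min over a width-3 window
    return np.minimum(f, np.minimum(np.roll(f, 1), np.roll(f, -1)))

def F_max(f):                          # combination through L = max, as in Remark 4.9
    return np.maximum(F1(f), F2(f))

phi, psi = rng.random(n), rng.random(n)
k = 11

# equivariance of the combined operator (here T is the identity on the group of shifts)
print(np.allclose(F_max(shift(phi, k)), shift(F_max(phi), k)))                 # True
# non-expansiveness: ||F_max(phi) - F_max(psi)||_inf <= ||phi - psi||_inf
print(np.max(np.abs(F_max(phi) - F_max(psi))) <= np.max(np.abs(phi - psi)))    # True
```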
4.2 Compactness and convexity of the space of P-GENEOs
Given two perception triples, under some assumptions on the data sets, it is possible to show two useful features in applications: compactness and convexity. These two properties guarantee, on the one hand, that the space of P-GENEOs can be approximated by a finite subset of them, and, on the other hand, that a convex combination of P-GENEOs is again a P-GENEO.
First, we define a metric on the space of P-GENEOs. Let X, Y be sets and consider two sets Ω ⊆ ℝ^X_b and Δ ⊆ ℝ^Y_b; we can define the distance

D_NE(F1, F2) := sup_{ω∈Ω}∥F1(ω) − F2(ω)∥∞
for every F1, F2 ∈ NE(Ω, Δ).
The metric DP-GENEO on the space ℱT of all the P-GENEOs between the perception triples (Φ, Φ′, S) and (Ψ, Ψ′, Q) associated with the map T is defined as

DP-GENEO((F1, F′1), (F2, F′2)) := max{D_NE(F1, F2), D_NE(F′1, F′2)}

for every (F1, F′1), (F2, F′2) ∈ ℱT.
4.2.1 Compactness
Before proceeding, we need to prove that the following result holds:
Lemma 4.10. If (P, dP), (Q, dQ) are compact metric spaces, NE(P, Q) is compact.
Proof: Theorem 5 in the study by Li et al. (2012) implies that NE(P, Q) is relatively compact, since it is an equicontinuous space of maps. Hence, it will suffice to show that NE(P, Q) is closed. Considering a sequence (Fi)i∈ℕ in NE(P, Q) uniformly converging to a map F: P → Q, we have that

dQ(F(p1), F(p2)) = lim_{i→∞} dQ(Fi(p1), Fi(p2)) ≤ dP(p1, p2)
for every p1, p2 ∈ P. Therefore, F ∈ NE(P, Q). It follows that NE(P, Q) is closed.
Consider two perception triples (Φ, Φ′, S) and (Ψ, Ψ′, Q), with domains X and Y, respectively, and the space ℱT of P-GENEOs between them associated with the map T: S → Q. The following result holds:
Theorem 4.11. If Φ, Φ′, Ψ and Ψ′ are compact, ℱT is compact with respect to the metric DP-GENEO.
Proof: By definition, ℱT ⊆ NE(Φ, Ψ) × NE(Φ′, Ψ′). Since Φ, Φ′, Ψ and Ψ′ are compact, by Lemma 4.10, the spaces NE(Φ, Ψ) and NE(Φ′, Ψ′) are also compact, and then, by Tychonoff's Theorem, the product NE(Φ, Ψ) × NE(Φ′, Ψ′) is also compact with respect to the product topology. Hence, to prove our statement, it suffices to show that ℱT is closed. Let us consider a sequence ((Fi, F′i))i∈ℕ of P-GENEOs in ℱT, converging to a pair (F, F′) ∈ NE(Φ, Ψ) × NE(Φ′, Ψ′). Since (Fi, F′i) is T-equivariant for every i ∈ ℕ and the action of Q on Ψ is continuous (see Proposition 3.24), (F, F′) belongs to ℱT. Indeed, we have that

F′(φs) = lim_{i→∞} F′i(φs) = lim_{i→∞} Fi(φ)T(s) = F(φ)T(s)

for every s ∈ S and every φ ∈ Φ. Hence, ℱT is a closed subset of a compact set, and then, it is also compact.
4.2.2 Convexity
Assume that Ψ, Ψ′ are convex. Let (F1, F′1), …, (Fn, F′n) ∈ ℱT and consider an n-tuple (a1, …, an) of real numbers with ai ≥ 0 for every i ∈ {1, …, n} and a1 + ⋯ + an = 1. We can define two operators FΣ: Φ → Ψ and F′Σ: Φ′ → Ψ′ as

FΣ(φ) := a1F1(φ) + ⋯ + anFn(φ),   F′Σ(φ′) := a1F′1(φ′) + ⋯ + anF′n(φ′)

for every φ ∈ Φ, φ′ ∈ Φ′. We notice that the convexity of Ψ and Ψ′ guarantees that FΣ and F′Σ are well defined.
Proposition 4.12. (FΣ, F′Σ) belongs to ℱT.
Proof: By hypothesis, for every i ∈ {1, …, n}, (Fi, F′i, T) is a perception map, and then:

F′Σ(φs) = a1F′1(φs) + ⋯ + anF′n(φs) = a1F1(φ)T(s) + ⋯ + anFn(φ)T(s) = (a1F1(φ) + ⋯ + anFn(φ))T(s) = FΣ(φ)T(s)

for every φ ∈ Φ and every s ∈ S. Furthermore, since Fi(Φ) ⊆ Ψ for every i ∈ {1, …, n} and Ψ is convex, also FΣ(Φ) ⊆ Ψ. Analogously, the convexity of Ψ′ implies that F′Σ(Φ′) ⊆ Ψ′. Therefore (FΣ, F′Σ, T) is a P-GEO. It remains to show the non-expansiveness of FΣ and F′Σ. Since Fi is non-expansive for any i, for every φ1, φ2 ∈ Φ, we have that

∥FΣ(φ1) − FΣ(φ2)∥∞ ≤ a1∥F1(φ1) − F1(φ2)∥∞ + ⋯ + an∥Fn(φ1) − Fn(φ2)∥∞ ≤ (a1 + ⋯ + an)∥φ1 − φ2∥∞ = ∥φ1 − φ2∥∞.
Analogously, since every F′i is non-expansive, for every φ′1, φ′2 ∈ Φ′, we have that

∥F′Σ(φ′1) − F′Σ(φ′2)∥∞ ≤ ∥φ′1 − φ′2∥∞.
Therefore, we have proven that (FΣ, F′Σ, T) is a P-GEO with FΣ and F′Σ non-expansive. Hence, it is a P-GENEO.
Then, the following result holds:
Corollary 4.13. If Ψ, Ψ′ are convex, the set ℱT is convex.
Proof: It is sufficient to apply Proposition 4.12 for n = 2 by setting a1 = t, a2 = 1−t for 0 ≤ t ≤ 1.
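The convex combination of Proposition 4.12 can be checked in the same spirit. In the toy setting below (again our own, with Φ = Φ′ and Ψ = Ψ′ given by all bounded cyclic signals), the operator FΣ = a1F1 + a2F2 remains shift-equivariant and non-expansive.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, t = 64, 11, 0.3                    # signal length, a cyclic shift, a convex weight

shift = lambda f: np.roll(f, -k)
F1 = lambda f: (f + np.roll(f, 1) + np.roll(f, -1)) / 3.0   # a shift-equivariant, non-expansive operator
F2 = lambda f: np.minimum(f, np.roll(f, 1))                 # another one
F_sum = lambda f: t * F1(f) + (1 - t) * F2(f)               # the convex combination F_Sigma

phi, psi = rng.random(n), rng.random(n)

# F_Sigma is still equivariant ...
print(np.allclose(F_sum(shift(phi)), shift(F_sum(phi))))                       # True
# ... and still non-expansive
print(np.max(np.abs(F_sum(phi) - F_sum(psi))) <= np.max(np.abs(phi - psi)))    # True
```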
5 P-GENEOs in applications
The importance of equivariance with respect to a group is becoming clear and widespread in many machine learning applications used for drug design, traffic forecasting, object recognition, and detection (see, e.g., Bronstein et al., 2021; Gerken et al., 2023). In some situations, however, requiring equivariance with respect to a whole group could even become an obstacle to the correct learning process of an equivariant neural network. In the following, we describe a possible application to optical character recognition (OCR), in which partial equivariance might be better suited than equivariance. Consider a planar transformation that deforms characters. One may notice that if such a transformation is performed too many times, the letter may lose or change its meaning, as shown in Figure 3. Another example is given by a reparameterization of the domain of a sound message. While a limited contraction or dilation of the domain can preserve the meaning attributed to the sound, an iterated application of the same transformation can radically change the perceived message.
Furthermore, experiments performed in the study by Weiler and Cesa (2019) have shown that tuning the level of equivariance in each layer of a neural network may increase the performance of the model. This tuning is, however, performed manually. The next step, taken in the study by Romero and Lohit (2022), is to learn the level of equivariance of each layer directly from data, possibly restricting to certain subsets whenever full equivariance prevents a good classification performance. The authors of Romero and Lohit (2022) test their result on MNIST. In applications of this type, the use of P-GENEOs could allow partial equivariance to be framed within a precise mathematical model.
6 Conclusion
In this article, we proposed a generalization of some known results in the theory of GENEOs to a new mathematical framework, where the collection of all symmetries is represented by a subset of a group of transformations. We introduced P-GENEOs and showed that they are a generalization of GENEOs. We defined pseudo-metrics on the space of measurements and on the space of P-GENEOs and studied their induced topological structures. Under the assumption that the function spaces are compact and convex, we showed compactness and convexity of the space of P-GENEOs. In particular, compactness guarantees that any operator can be approximated by a finite number of operators belonging to the same space, while convexity allows us to build new P-GENEOs by taking convex combinations of P-GENEOs. Compactness and convexity together ensure that every strictly convex loss function on the space of P-GENEOs admits a unique global minimum. Given a collection of P-GENEOs, we presented a general method to construct new P-GENEOs as combinations of the initial ones.
Data availability statement
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.
Author contributions
LF: Writing – original draft. PF: Writing – original draft, Writing – review & editing. NQ: Writing – original draft, Writing – review & editing. FT: Writing – original draft, Writing – review & editing.
Funding
The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This research has been partially supported by INdAM-GNSAGA. FT was supported by the Wallenberg AI, Autonomous System and Software Program (WASP) funded by Knut and Alice Wallenberg Foundation. PF and NQ carried out this work in the framework of the CNIT National Laboratory WiLab and the WiLab-Huawei Joint Innovation Center.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Bergomi, M. G., Frosini, P., Giorgi, D., and Quercioli, N. (2019). Towards a topological-geometrical theory of group equivariant non-expansive operators for data analysis and machine learning. Nat. Mach. Intellig. 1, 423–433. doi: 10.1038/s42256-019-0087-3
Bocchi, G., Botteghi, S., Brasini, M., Frosini, P., and Quercioli, N. (2023). On the finite representation of linear group equivariant operators via permutant measures. Ann. Mathem. Artif. Intellig. 91, 465–487. doi: 10.1007/s10472-022-09830-1
Bocchi, G., Frosini, P., Micheletti, A., Pedretti, A., Gratteri, C., Lunghini, F., et al. (2022). GENEOnet: a new machine learning paradigm based on Group Equivariant Non-Expansive Operators. An application to protein pocket detection. arXiv.
Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., and Vandergheynst, P. (2017). Geometric deep learning: going beyond euclidean data. IEEE Signal Process. Mag. 34, 18–42. doi: 10.1109/MSP.2017.2693418
Bronstein, M. M., Bruna, J., Cohen, T., and Veličković, P. (2021). Geometric deep learning: grids, groups, graphs, geodesics, and gauges. arXiv.
Camporesi, F., Frosini, P., and Quercioli, N. (2018). “On a new method to build group equivariant operators by means of permutants,” in Machine Learning and Knowledge Extraction: Second IFIP TC 5, TC 8/WG 8.4, 8.9, TC 12/WG 12.9 International Cross-Domain Conference, CD-MAKE 2018. Hamburg, Germany: Springer, 265–272.
Cascarano, P., Frosini, P., Quercioli, N., and Saki, A. (2021). On the geometric and Riemannian structure of the spaces of group equivariant non-expansive operators. arXiv.
Chachólski, W., De Gregorio, A., Quercioli, N., and Tombari, F. (2023). Symmetries of data sets and functoriality of persistent homology. Theory Appl. Categor. 39, 667–686.
Cohen, T., and Welling, M. (2016). “Group equivariant convolutional networks,” in International Conference on Machine Learning. PMLR, 2990–2999. Available online at: jmlr.org
Conti, F., Frosini, P., and Quercioli, N. (2022). On the construction of group equivariant non-expansive operators via permutants and symmetric functions. Front. Artif. Intellig. 5, 16. doi: 10.3389/frai.2022.786091
Finzi, M., Benton, G., and Wilson, A. G. (2021). “Residual pathway priors for soft equivariance constraints,” in Advances in Neural Information Processing Systems (Cambridge, MA: The MIT Press), 30037–30049.
Frosini, P., Gridelli, I., and Pascucci, A. (2023). A probabilistic result on impulsive noise reduction in topological data analysis through group equivariant non-expansive operators. Entropy 25, 1150. doi: 10.3390/e25081150
Frosini, P., and Quercioli, N. (2017). “Some remarks on the algebraic properties of group invariant operators in persistent homology,” in Machine Learning and Knowledge Extraction, eds. A. Holzinger, P. Kieseberg, A. M. Tjoa, and E. Weippl. Cham. Springer International Publishing, 14–24.
Gerken, J., Aronsson, J., Carlsson, O., Linander, H., Ohlsson, F., Petersson, C., et al. (2023). Geometric deep learning and equivariant neural networks. Artif. Intell. Rev. 56, 1–58. doi: 10.1007/s10462-023-10502-7
Li, R., Zhong, S., and Swartz, C. (2012). An improvement of the Arzelà-Ascoli theorem. Topol. Appl. 159, 2058–2061. doi: 10.1016/j.topol.2012.01.014
Masci, J., Rodolà, E., Boscaini, D., Bronstein, M. M., and Li, H. (2016). “Geometric deep learning,” in SIGGRAPH ASIA 2016 Courses. New York, NY: Association for Computing Machinery, 1–50.
Micheletti, A. (2023). A new paradigm for artificial intelligence based on group equivariant non-expansive operators. Eur. Math. Soc. Mag. 128, 4–12. doi: 10.4171/mag/133
Quercioli, N. (2021a). On the Topological Theory of Group Equivariant Non-Expansive Operators (PhD thesis). Bologna: Alma Mater Studiorum - Università di Bologna. Available online at: http://amsdottorato.unibo.it/9770/. (accessed July 12, 2023).
Quercioli, N. (2021b). “Some new methods to build group equivariant non-expansive operators in TDA,” in Topological Dynamics and Topological Data Analysis, 229–238.
Romero, D. W., and Lohit, S. (2022). Learning partial equivariances from data. Adv. Neural Inform. Proc. Syst. 35, 36466–36478.
van der Ouderaa, T., Romero, D. W., and van der Wilk, M. (2022). Relaxing equivariance constraints with non-stationary continuous filters. Adv. Neural Inform. Proc. Syst. 35, 33818–33830.
Wang, R., Walters, R., and Yu, R. (2022). “Approximately equivariant networks for imperfectly symmetric dynamics,” in Proceedings of the 39th International Conference on Machine Learning. PMLR, 23078–23091. Available online at: jmlr.org
Keywords: partial-equivariant neural network, P-GENEO, pseudo-metric space, compactness, convexity
Citation: Ferrari L, Frosini P, Quercioli N and Tombari F (2023) A topological model for partial equivariance in deep learning and data analysis. Front. Artif. Intell. 6:1272619. doi: 10.3389/frai.2023.1272619
Received: 04 August 2023; Accepted: 27 November 2023;
Published: 21 December 2023.
Edited by:
Fabio Anselmi, University of Trieste, Italy
Reviewed by:
Maurizio Parton, G. d'Annunzio University of Chieti and Pescara, Italy
Filippo Maggioli, Sapienza University of Rome, Italy
Copyright © 2023 Ferrari, Frosini, Quercioli and Tombari. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Nicola Quercioli, nicola.quercioli@gmail.com