About this Research Topic
While lacking a precise definition, big data problems typically share three main characteristics: large volume, high velocity and heterogeneity of the generated datasets.
The focus of this Research Topic will be on learning problems arising from big data over networks, i.e., massive datasets with an intrinsic graph (or network) structure. An intrinsic network or graph structure is present in a wide range of big data applications:
• Social Networks. Nodes represent individuals; Edges represent acquaintance relations.
• Sensor Networks. Nodes represent sensors; Edges represent sensor connections.
• Communication Networks. Nodes represent transceivers; Edges represent communication channels.
• Physical Universe. Nodes represent events in space-time; Edges represent particles moving in space-time.
• Biological Networks. Nodes represent proteins; Edges connect proteins within the same protein families.
As another key characteristic, besides their network structure, the massive datasets occurring in many applications are generated by processes with only very few degrees of freedom. Recordings of the human voice, for instance, can be modeled concisely via the motion of a few vocal folds, and images of human faces can be represented efficiently by modeling the action of only a few facial muscles. This intrinsic low-dimensional structure of seemingly high-dimensional objects is captured by sparse models, which represent high-dimensional data using only a small number of parameters or degrees of freedom. Signal processing based on sparse models has proven extremely powerful within the recent framework of compressed sensing.
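As a purely illustrative aside, the following sketch shows the sparse-model idea in its simplest compressed-sensing form: a high-dimensional vector with only a few nonzero entries is approximately recovered from a small number of random linear measurements by solving an l1-regularized least-squares (LASSO) problem with iterative soft-thresholding. All problem dimensions, the step size and the regularization weight are assumptions chosen for the example, not quantities prescribed by this Research Topic.

```python
# Minimal sparse-recovery sketch (illustrative assumptions throughout):
# a length-200 signal with only 5 nonzero entries is approximately recovered
# from 60 random linear measurements by iterative soft-thresholding (ISTA)
# applied to the LASSO objective 0.5*||A x - y||^2 + lam*||x||_1.
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 200, 60, 5                       # ambient dim., measurements, sparsity

x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

A = rng.standard_normal((m, n)) / np.sqrt(m)   # random measurement matrix
y = A @ x_true                                 # compressed measurements

lam = 0.05                                 # l1 regularization weight (assumed)
step = 1.0 / np.linalg.norm(A, 2) ** 2     # step size <= 1/L, with L = ||A||_2^2

x = np.zeros(n)
for _ in range(500):
    grad = A.T @ (A @ x - y)               # gradient of the quadratic term
    z = x - step * grad                    # gradient step
    x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold

print("relative recovery error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```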
Algorithms confronted with big data, on the other hand, consume different types of resources, e.g., hardware, communication bandwidth and energy. A key challenge in big data applications is that these computational resources are limited relative to the volume and velocity of the data. Characterizing the effect of such resource constraints is highly non-trivial. The available results fall short in several regards: they are not tailored to sparse graph signal models and do not apply to general fully distributed learning algorithms in which the processing nodes communicate over several rounds or iterations. In particular, the study of fully decentralized learning methods for graph-structured data (big data over networks), which must also take into account constraints on link quality and bit rates, requires new tools.
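To make the notion of fully decentralized learning over several communication rounds concrete, here is a minimal sketch (not a method proposed by this Topic): each node of a ring network holds local least-squares data and alternates neighbor averaging with a local gradient step, i.e., decentralized gradient descent. The ring topology, Metropolis-style mixing weights, problem sizes and step size are illustrative assumptions.

```python
# Decentralized gradient descent sketch over a ring network (illustrative):
# each node i holds local data (A_i, y_i) and, per communication round,
# averages its estimate with its two ring neighbors and takes a local
# gradient step on its least-squares objective.
import numpy as np

rng = np.random.default_rng(1)
num_nodes, dim, samples_per_node = 10, 5, 20

w_true = rng.standard_normal(dim)          # common model all nodes try to learn
A = [rng.standard_normal((samples_per_node, dim)) for _ in range(num_nodes)]
y = [A_i @ w_true for A_i in A]            # local (noise-free) observations

# Mixing weights for a ring: each node averages with its two neighbors.
W = np.zeros((num_nodes, num_nodes))
for i in range(num_nodes):
    for j in ((i - 1) % num_nodes, (i + 1) % num_nodes):
        W[i, j] = 1.0 / 3.0
    W[i, i] = 1.0 - W[i].sum()

X = np.zeros((num_nodes, dim))             # one local estimate per node
alpha = 0.01                               # step size (assumed)
for _ in range(500):
    X_mix = W @ X                          # communication: neighbor averaging
    grads = np.stack([A[i].T @ (A[i] @ X[i] - y[i]) for i in range(num_nodes)])
    X = X_mix - alpha * grads              # local gradient step at each node

print("max node error:", np.abs(X - w_true).max())
```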
This Research Topic aims at furthering our understanding of the fundamental limits and tradeoffs of, as well as efficient methods for, learning from big data under constraints on the available algorithmic resources. We expect that extending compressed sensing from sparse vectors to graph signals defined over complex networks will yield insight into the fundamental limits of learning from big data. For the development of efficient learning algorithms, we consider scalable convex optimization methods as the right tool. We also expect this Topic to boost efforts toward generalizing the analysis and implementation of convex optimization methods for sparse models to the big data regime, including implementations in modern distributed programming frameworks.
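As one simple, hedged illustration of moving from sparse vectors to signals defined over a network, the sketch below recovers a graph signal from observations at only a few nodes by replacing vector sparsity with a graph-smoothness (Laplacian) prior; the resulting convex quadratic problem has a closed-form solution. The graph, sampling set and regularization weight are assumptions chosen for the example.

```python
# Graph-signal recovery sketch (illustrative): observe a smooth graph signal
# at only a few nodes and recover it by solving the convex problem
#   min_x  ||x_S - y_S||^2 + lam * x^T L x,
# where L is the graph Laplacian; the minimizer is available in closed form.
import numpy as np

rng = np.random.default_rng(2)
n = 30                                     # number of nodes (assumed)

# Chain backbone plus a few random extra edges; Laplacian L = D - A.
Adj = np.zeros((n, n))
for i in range(n - 1):
    Adj[i, i + 1] = Adj[i + 1, i] = 1.0
for _ in range(10):
    i, j = rng.choice(n, 2, replace=False)
    Adj[i, j] = Adj[j, i] = 1.0
L = np.diag(Adj.sum(axis=1)) - Adj

# Smooth ground-truth signal: a mixture of the lowest Laplacian eigenvectors.
eigvals, eigvecs = np.linalg.eigh(L)
x_true = eigvecs[:, :3] @ rng.standard_normal(3)

S = rng.choice(n, 8, replace=False)        # observe only 8 of the 30 nodes
M = np.zeros((n, n))
M[S, S] = 1.0                              # diagonal sampling (masking) operator
y = M @ x_true

lam = 0.1                                  # smoothness weight (assumed)
x_hat = np.linalg.solve(M + lam * L, y)    # closed-form minimizer

print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```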
Keywords: Sparse Models, Convex Optimization, Machine Learning, Complex Networks, Compressed Sensing
Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.