- 1Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada
- 2School of Information Technology, Carleton University, Ottawa, ON, Canada
- 3Department of Electrical, Computer, and Biomedical Engineering, Toronto Metropolitan University, Toronto, ON, Canada
Sixth generation (6G) networks are expected to enable immersive communications and bridge the physical and the virtual worlds. Integrating extended reality, holography, and haptics, immersive communications will revolutionize how people work, entertain, and communicate by enabling lifelike interactions. However, the unprecedented demand for data transmission rate and the stringent requirements on latency and reliability create challenges for 6G networks to support immersive communications. In this survey article, we present the prospect of immersive communications and investigate emerging solutions to the corresponding challenges for 6G. First, we introduce use cases of immersive communications in the fields of entertainment, education, and healthcare. Second, we present the concepts of immersive communications, including extended reality, haptic communication, and holographic communication, their basic implementation procedures, and their requirements on networks in terms of transmission rate, latency, and reliability. Third, we summarize the potential solutions for addressing the challenges from the aspects of communication, computing, and networking. Finally, we discuss future research directions and conclude this study.
1. Introduction
Ever since its birth, communication technology has been a symbol of the modernization of human society, and the evolution of communication technology has accompanied the advance of civilization. The commercialization of electrical telegraph and telephone during the second industrial revolution boosted globalization by facilitating finance and trade overseas (Wenzlhuemer, 2013). The debut of vehicle-mounted mobile radio systems (“car phones”) and the analog first generation (1G) mobile telecommunication systems from the 1950s to 1980s enabled voice calls on the go (del Peral-Rosado et al., 2018). The second generation (2G) mobile communication systems, which introduced roaming and preliminary data services in the form of text messages, emerged amidst and as a part of the third industrial revolution (i.e., the digital revolution) (Billström et al., 2006). Then, the next two decades witnessed the proliferation of mobile Internet and mobile multimedia services brought by the third and fourth generation (3G and 4G) mobile communication technology, which revolutionized how people communicate and changed the world. Nowadays, the fifth generation (5G) mobile communication systems are reshaping industries by facilitating the fourth industrial revolution (i.e., Industry 4.0) toward smart inter-connectivity and automation (Chen K.-C. et al., 2021).
Accustomed to the convenience brought by the latest communication technologies, many people may not realize that ordinary daily activities such as video calls or Zoom meetings were nothing more than science fiction merely three decades ago. Indeed, from the so-called “telephot” in the pioneering novel “Ralph 124C 41+” to the video call scene in the classic movie “Back to the Future,” the simultaneous transmission of live image and sound was considered a “technology of the future” for most of the twentieth century (Fowler et al., 1986; Gooday, 2005). When the fantasy of the past has become a reality, a question that naturally arises is: what will be the next revolutionary form of communications, potentially in the era of the sixth generation (6G)? Fortunately, we may again find clues in science fiction, with examples ranging from the famous scene of Princess Leia's three-dimensional (3D) holographic message in “Star Wars” (Conti, 2008) to the virtual world “OASIS” in the metaverse presented in the recent film “Ready Player One” (Sparkes, 2021). The fact that such scenes created a long-lasting influence on a vast audience reflects people's desire for more lifelike, immersive, and interactive communications (Xu et al., 2022).
Unfolding exactly as depicted in science fiction or not, immersive communications will come to reality and shift the current communication paradigm in three aspects. First, rather than two-dimensional (2D) images displayed on a flat screen, immersive communications will deliver 3D images with parallax information. Second, in addition to audiovisual information, immersive communications will involve haptic information. Third, the pursuit of immersive experiences will further blur the boundary between the physical and the virtual worlds, allowing new forms of interactions across the two worlds. These paradigm shifts can significantly enrich the communication experiences of users and enable a plethora of new use cases such as 3D telepresence (Yu et al., 2021), ultra-realistic online interactive sports (Next G Alliance, 2022), and immersive learning in education (Pellas et al., 2020), to name a few. In particular, immersive communications can also enable human-machine collaboration in industrial environments and propel the next industrial revolution, i.e., Industry 5.0 (Leng et al., 2022; Maddikunta et al., 2022). As a result, immersive communications are expected to have a profound influence on the landscape of communication industries and impact how people study, work, and entertain in the years to come.
Motivated by the potential of immersive communications, scientists and engineers around the world have been working on the development of related technologies, products, and platforms. Significant progress has been made in recent years, including but not limited to advancements in sensor systems and data capture techniques (Dahiya et al., 2019; Meyer et al., 2022), data processing and computing frameworks (Petkov et al., 2022; Qian et al., 2022; Song et al., 2022), and rendering and display devices (Hirayama et al., 2019; Schmitz et al., 2020; Xiong et al., 2021). The development of some components of immersive communications is progressing faster than that of others, leading to the establishment of testbeds, prototypes, or even commercial products. Virtual reality (VR), as an example, has gained popularity, especially in the gaming industry (Jung et al., 2020). Devices such as VR headsets and haptic glove development kits are available in the market (Kugler, 2021; Chen et al., 2022), while researchers are building testbeds for extended reality (XR) (Huzaifa et al., 2022) and human-machine interaction with haptic feedback (Gokhale et al., 2020). An example of recent development in immersive communications is the VirtualCube system, a 3D video conference system capable of synthesizing remote and local participants so that they appear in the same environment (Zhang Y. et al., 2022). In addition, a research team in Germany is exploring VR-based full-body avatars for training police forces while evaluating their stress level and response to threats (Caserman et al., 2022).
As the aforementioned progress and efforts are paving the way for realizing immersive communications, advancements in communication and networking technologies will be indispensable. Despite the advent of 5G systems and the accompanying advancements in network capabilities, there are still many challenges to achieving immersive communications in various aspects of communications, networking, and computing. The data rate required to transmit live 3D images can be so high, e.g., on the level of terabits per second (Tbps), that even 5G cannot support it, especially for high-resolution and 360° videos. The required end-to-end delay for delivering haptic information can be as low as a few milliseconds for a satisfactory user experience (Maier and Ebrahimzadeh, 2019; Sim et al., 2021). The synchronization of data streams from multiple cameras or sensors and that of audiovisual and haptic information in data transmission also create new challenges. The storing and processing of massive data for immersive communications demand new architectures and techniques for caching and computing (Glushakov et al., 2020; Liu et al., 2021; Taleb et al., 2021). Moreover, artificial intelligence (AI) is necessary both for supporting applications such as human-machine collaboration and user viewpoint/gesture prediction, and for orchestrating network resources to satisfy the demanding requirements of immersive communications (Maier et al., 2018; Tataria et al., 2021; Zawish et al., 2022). Since the realization of immersive communications can require integrated support for enhanced mobile broadband (eMBB) and ultra-reliable low-latency communications (URLLC) (Pang et al., 2022), which is beyond the capability of 5G, researchers look forward to breakthroughs in immersive communications in the era of 6G. 
Targeting 2030 for large scale 6G deployments, the 3rd Generation Partnership Project (3GPP) plans to start 6G studies in 2024 and complete its first 6G standard in 2028 (Ericsson, 2022), while the International Telecommunication Union (ITU)'s “IMT for 2030 and beyond” timeline aims at completing IMT-2030 specifications in 2029–2030 (Yrjölä et al., 2022).
Recognizing the importance of immersive communications, the research community in communications, networking, and computer science is expanding its effort in this field. Several recent review and survey articles can be found in the literature, among which some present the state of the art in immersive communications, while others envision the next steps. Most of these articles focus on a specific aspect, such as supporting 360°/holographic video streaming (Yaqoob et al., 2020; Huang et al., 2022), evaluating the immersive experience of users (Gao et al., 2022a), analyzing the effects of user motions on network performance in XR (Chukhno et al., 2022), enabling the metaverse use case (Tang et al., 2022; Wang et al., 2022b; Xu et al., 2022), or facilitating distributed implementation of VR (Morín et al., 2022). Different from the above works, we present a comprehensive survey of immersive communications in this article. With a focus on the communication, networking, and computing perspectives, we review a large number of publications, especially the latest works in communications, networking, and computer science, to present the representative use cases, the recent developments, the technical challenges, and the potential solutions related to immersive communications in the era of 6G communications. Specifically, we focus on immersive communications by looking into its three main forms, i.e., XR, haptic communication, and holographic communication, in the remainder of this article. Section 2 introduces representative use cases of immersive communications to illustrate its promising prospect. Section 3 presents the concepts, basic implementation procedures, and requirements of XR, haptic communication, and holographic communication to paint an overall picture of immersive communications. Section 4 focuses on the challenges and the state-of-the-art solutions toward realizing each of the three forms of immersive communications.
Section 5 discusses some open issues regarding immersive communications in 6G, and Section 6 concludes this article. A list of the main acronyms used in this article is given in Table 1.
2. Use cases
There are many potential use cases for immersive communications, relating to both commercial and enterprise scenarios and ranging from gaming to industrial control. In this section, we detail five representative use cases to illustrate the promising prospect of immersive communications. A list of representative use cases is given in Table 2.
2.1. Immersive gaming and entertainment
XR provides the ultimate gaming and entertainment experience by presenting convincing gaming environments through XR devices such as VR headsets or smartphones. Players can interact with each other without feeling a barrier between the virtual and the physical worlds (Bastug et al., 2017). XR devices display the virtual world of the game to players and capture their actions such as eye movements to allow them to interact with the virtual world (Elbamby et al., 2018b). With the success of advanced XR gaming consoles and headsets, e.g., Oculus and PlayStation VR, as well as games and platforms, e.g., Pokemon Go and Roblox, game developers are striving to offer more flexible XR experiences with wireless XR devices (Maimone and Wang, 2020). Through wireless XR devices, players can interact freely with other players or virtual objects, e.g., in XR sporting (Kim et al., 2018). Furthermore, haptic communication devices can be combined with XR to significantly enhance the immersive gaming experience. Transducer arrays, which can be attached to XR devices, can capture haptic data from players. As a result, XR devices can fuse haptic information into the virtual world and provide haptic feedback to players by mapping motions in the game to players' sensations. Players can use haptic devices, such as gloves, to control objects in the game (Hashimoto and Ishibashi, 2006) or synchronize their sensations with other players (Mauve, 2000).
2.2. Telesurgery
In telesurgery, surgeons remotely manipulate robotic arms to operate on patients by utilizing control panels and low-latency display of the surgical scenes. Telesurgery is beneficial in removing the barrier of distance between surgeons and patients, tackling the scarcity of surgeons in remote or difficult-to-reach areas such as the countryside, battlefields, and spacecraft, and facilitating the collaboration of surgeons at different locations (Choi et al., 2018; Mohan et al., 2021). The assistance of robotic arms can enhance the performance of surgeries by detecting and canceling out the physiological tremors of surgeons' hand motions (Kumar et al., 2020), performing delicate surgical operations, and minimizing the surgical incision areas to reduce blood loss and incision-related complications (Diana and Marescaux, 2015). To guarantee the performance of surgeons, the display of surgical scenes to them should be highly precise and informative. To this end, 3D video of the surgical scenes with depth information can be displayed to the surgeons, e.g., by using passive polarized glasses, and an eye-tracking mechanism can be used to quickly center the visual display on the area that the surgeon is viewing (Stark et al., 2015). In addition, augmented reality (AR) can be leveraged to overlay medical images such as ultrasound images and computed tomography (CT) images onto the video of surgical scenes (Liu X. et al., 2016). Besides visual information, haptic information in the surgeries, such as the texture of tissues and the tension in tying surgical sutures, can be captured by the haptic devices on the robotic arms and then transmitted to and reproduced by the haptic devices at the surgeons' side (El Rassi and El Rassi, 2020; Patel et al., 2022).
2.3. Immersive learning
Immersive learning integrates emerging technologies, including XR and haptic technologies, into teaching to provide students or trainees with an interactive and engaging learning experience (Laamarti et al., 2014; Affan et al., 2021). During the recent COVID-19 pandemic, traditional methods of teaching, e.g., online courses, encountered the problem of engaging students in the learning process (Jumreornvong et al., 2020; Fitzek et al., 2021). To this end, immersive learning, as a potential solution to boost student engagement, is receiving increasing attention, especially from primary and secondary schools. With immersive learning, avatars of students and teachers can be created in the virtual world (Gupta et al., 2019), and each student is allowed to interact with the avatars of teachers and other students via the senses of sight, hearing, and touch. Such interactions can keep students' attention in the learning process. Immersive learning is categorized as either asynchronous or synchronous. Training some skills, such as sports skills and cooperative tele-operation skills for industrial robots, requires real-time interactions, which can encourage active participation in the learning process (Kaluschke et al., 2021; Lee et al., 2021). Utilizing XR, haptic communication, and holographic communication technologies, teachers can check whether the moves and actions of their students are correct and provide immediate corrections if not, regardless of their physical distance from each other. For the skills that do not need real-time interactions, information regarding teachers' positions, velocities, and applied forces can be recorded and displayed to students via XR and haptic devices asynchronously (Tan et al., 2020). Such a “record-and-replay” strategy can allow a much larger number of students to learn at their own pace, despite the absence of real-time interactions (Yokokohji et al., 1996a,b; Steinbach et al., 2018).
2.4. Holographic teleconference
Teleconference is a convenient choice for users to remotely collaborate with each other. In the current video teleconferencing, remote participants can only be displayed on flat screens, which results in a very different perception in a virtual conference from that in an on-site conference. In order to provide an immersive experience in teleconferences, holographic teleconferences depict realistic 3D presence for people by projecting 3D images of remote participants as holograms (Jiang et al., 2021; Siemonsma and Bell, 2022; Zhang Y. et al., 2022; Zhou F. et al., 2022). Specifically, when a remote participant joins the holographic teleconference, 3D visual information and the corresponding audio information of the participant can be captured by multiple sensors, transmitted, and then reconstructed as a hologram on the side of other participants to provide 3D audiovisual information for interactions among participants (Strinati et al., 2019). In this case, holographic teleconference can reduce the impact on participants of the separation between the virtual and the physical worlds. In addition to the audio and video information, participants in a holographic teleconference are able to obtain haptic information from others to achieve an immersive experience with the sense of physical contacts (Tataria et al., 2021). For example, a participant with haptic sensors can sense a handshake with others, thereby enabling an immersive experience similar to in-person interactions.
2.5. Metaverse
A metaverse provides fully immersive and self-sustaining virtual spaces that merge the physical and digital worlds (Wang et al., 2022b). In the metaverse, users can have avatars as digital representations in simulated or imaginary environments, such as games and virtual cities. Through XR devices including phones or laptops, users can interact with digital avatars, other digital objects, and virtual environments. Metaverses require the synchronization between the physical and the digital worlds through two main information flows. One of them is from the physical world to digital worlds, in which sensors and actuators capture user activity so that the behaviors of a user in the physical world are reflected via their avatars in a digital world. The other is from digital worlds to the physical world, including the interactions among avatars, other digital objects, and metaverse services in the virtual environments. As a result of advanced networking technologies, big data analysis, blockchain, and AI, metaverses are expected to provide human-centric content for users to enable immersive social experiences (Heath, 2021), online collaborations (Suzuki et al., 2020), etc.
3. Immersive communications: Concepts and requirements
The use cases for immersive communications and their potential importance in 6G are intuitive. Understanding immersive communications beyond the use cases, however, requires answers to the question “what are immersive communications?”. Since the research of immersive communications is in an early stage, there is no commonly-agreed definition yet.
We consider immersive communications as a communication paradigm along with the supporting technologies that allow users to have lifelike experiences in the physical world, the virtual world, or both, with interactions via 3D audiovisual and/or haptic information exchange. In this section, we focus on the three main forms of immersive communications as illustrated in Figure 1, i.e., XR, haptic communication, and holographic communication.¹ By introducing the concept, basic implementation procedure, and the network requirements for each of the three forms, we aim to sketch an overall picture of immersive communications. The requirements of representative immersive communications use cases are illustrated in Figure 2 and also summarized in Table 3.
3.1. Extended reality
In this subsection, we introduce the concept of XR and investigate two representative XR technologies: VR and AR. Then, we examine their implementation procedure and service requirements for 6G.
3.1.1. Concept
XR covers a range of technologies, including VR, AR, mixed reality (MR), and everything in between (Hu et al., 2020). In general, XR combines the physical and virtual worlds through extensive video processing and data fusion. Using XR devices, users can interact with virtual avatars and access XR content. Under the umbrella of XR, a variety of technologies are defined depending on the level of virtuality. Two representative technologies in XR are AR and VR. With the lowest level of virtuality, AR focuses on constructing artificial objects according to the objects (e.g., buildings, faces, or vehicles) residing in the physical world and enabling users to interact with them. Conversely, with the highest level of virtuality, VR creates an entirely artificial scenery and allows users to interact with the objects in a completely artificial environment generated by the headsets. In MR, the concepts of VR and AR can be combined to create different levels of virtuality. In spite of the variety of XR technologies, the methods to provide immersive experiences to users are similar, which combine sensory data with virtual environments to produce artificial sceneries, from either the physical or virtual worlds, using headsets or portable display devices.
The first VR flight simulator was developed in the 1970s to train pilots for flights without exposing them to the risks of flying (Earnshaw, 1993). In the early stage, VR headsets were cumbersome, and processing VR content required large supercomputers. Nowadays, VR technologies have gained momentum due to recent advances in computing and display technologies. The headsets, such as Oculus head-mounted displays and HTC Vive, are affordable and can support ultra-high resolutions (3,840 × 2,160 in Pimax 8K) and refresh rates (up to 120 Hz) (Hu et al., 2020). Most VR content is processed and rendered by user devices. Rendering content with a high level of virtuality requires extensive computing power. For a VR headset, a console is required to supply additional computing power to the headset, while a wired connection restricts the user to a workstation. Therefore, wireless VR is the primary focus of VR research now (Elbamby et al., 2018a). In addition, multi-sensory XR, as another future vision of XR, integrates human senses and perception, including the visual, auditory, olfactory, and tactile senses, into XR content, enabling a truly immersive experience. This requires the confluence of multiple disciplines, including AI, computer vision, biology, ultra-low-latency networking, etc., while linking the real and virtual worlds (Hu et al., 2021; Wang and Li, 2022).
3.1.2. Basic implementation procedure
While XR comprises several technologies with different levels of virtuality, its implementation procedure can be summarized into three steps: content transmission, rendering, and feedback collection. For each of the above three steps, communication networks can play an important role.
In the step of content transmission, VR content generated by VR content providers is transmitted from content servers to VR devices. VR devices play 360° spherical videos, which can be mapped to equirectangular videos. During playback, these equirectangular videos are mapped onto a sphere, in which the user is situated at the center, to provide a 3D stereoscopic experience. The key feature of VR video is the ultra-high spatial resolution. A VR video has a resolution of up to 12K (11,520 × 6,480), while the conventional video normally has a resolution of 4K or less. Transmitting full equirectangular videos from content servers requires an ultra-high data rate. Thus, tile-based transmission is usually adopted in VR video delivery. As shown in Figure 3, a content server can divide equirectangular videos spatio-temporally into video chunks, i.e., tiled videos, and only the tiled videos within a user's field-of-view (FoV) are delivered (Son et al., 2018; Yadav and Ooi, 2020). In this way, VR content can be delivered in a significantly reduced data size. However, the tile-based solution requires VR headsets to detect and estimate user viewpoints to determine the region of FoV. Content servers should select which tiled videos to deliver to users based on both the user's current viewpoint and network conditions (Zare et al., 2016). In terms of AR, AR devices generate raw content using local sensors, such as cameras in smartphones (Ren et al., 2020a). In contrast to VR devices, which download content from a content server, AR devices can upload raw content to the server for further processing. Specifically, raw videos captured by AR devices are clipped into frames with a specific image format, and those frames can be offloaded to the server. The processed content is then delivered to and played on the AR devices.
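To make the tile-based delivery idea concrete, the following minimal Python sketch selects the tiles of an equirectangular frame that overlap a user's FoV. It is an illustration under stated assumptions rather than the method of any cited work: the function name `tiles_in_fov`, the 8 × 4 tile grid, and the 90° × 90° FoV are hypothetical, and the FoV is approximated as a rectangle in yaw/pitch space, which ignores the distortion of the equirectangular projection near the poles.

```python
import math


def tiles_in_fov(yaw_deg, pitch_deg, h_fov_deg=90.0, v_fov_deg=90.0,
                 cols=8, rows=4):
    """Return the sorted (row, col) tiles that overlap the field of view.

    yaw_deg is in [-180, 180) and pitch_deg in [-90, 90]; row 0 is the top
    of the frame and column 0 starts at yaw = -180 degrees. All defaults are
    illustrative assumptions, not values taken from the survey.
    """
    tile_w = 360.0 / cols
    tile_h = 180.0 / rows

    # Vertical extent of the FoV, clamped at the poles.
    top = min(90.0, pitch_deg + v_fov_deg / 2)
    bottom = max(-90.0, pitch_deg - v_fov_deg / 2)
    row_first = int((90.0 - top) // tile_h)
    row_last = min(rows - 1, math.ceil((90.0 - bottom) / tile_h) - 1)

    # Horizontal extent, shifted so that yaw = -180 maps to 0 degrees.
    left = yaw_deg - h_fov_deg / 2 + 180.0
    right = yaw_deg + h_fov_deg / 2 + 180.0
    col_first = math.floor(left / tile_w)
    col_last = math.ceil(right / tile_w) - 1

    selected = set()
    for r in range(row_first, row_last + 1):
        for c in range(col_first, col_last + 1):
            selected.add((r, c % cols))  # modulo handles wrap-around in yaw
    return sorted(selected)


if __name__ == "__main__":
    tiles = tiles_in_fov(yaw_deg=0.0, pitch_deg=0.0)
    print(tiles, f"{len(tiles)} of {8 * 4} tiles requested")
```

With these assumed settings, a user looking straight ahead needs 4 of the 32 tiles, which illustrates why FoV-based delivery can reduce the transmitted data size. A practical system would typically also fetch a margin of neighboring tiles to tolerate viewpoint estimation errors.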
In the step of content rendering, tiled VR videos transmitted to VR devices are stitched together, and computing resources are required to project 2D stereoscopic videos to 3D stereoscopic videos, i.e., generating two different videos for the left and right eyes respectively. This content rendering step can be performed on VR devices once all the required content has been received. In addition, due to the limited computing capability of VR devices, the workload of content rendering can be offloaded to adjacent edge servers enabled by mobile edge computing (MEC) (Sukhmani et al., 2018; Dang and Peng, 2019; Dai et al., 2020). Content processing and rendering are more complex in AR than in VR; the AR processing procedure is shown in Figure 4. Once the raw AR content, i.e., video frames, is captured by an AR device, a location tracking step determines the device's location and position according to the captured frames. Then, a mapping step establishes a virtual coordinate of the environment based on the result of the tracker, and an object recognition step detects the objects to process in the video frames (Qiao et al., 2018; Ren et al., 2019). Based on the identified objects, the augmented data is retrieved from the local cache or network servers and attached to the frames accordingly. Specifically, a template matching step attaches the augmented data to the frames, and an annotation rendering step renders the processed frames at AR devices. The computing workload for conducting the above functions can be fully or partially offloaded from AR devices to network servers to minimize computing latency or improve energy efficiency at AR devices.
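The offloading trade-off described above can be sketched with a simple latency comparison. The Python example below is a hedged illustration and not a model taken from the cited works: it considers only full offloading and only latency (energy is ignored), and the `Task` fields, function names, and all numeric values are hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A hypothetical AR processing task (all fields are assumptions)."""
    input_bits: float   # size of the captured frame to upload
    output_bits: float  # size of the processed result to download
    cpu_cycles: float   # computation required by the task


def local_latency(task, device_hz):
    """Time to process the task entirely on the AR device."""
    return task.cpu_cycles / device_hz


def offload_latency(task, edge_hz, uplink_bps, downlink_bps):
    """Upload time + edge processing time + download time."""
    return (task.input_bits / uplink_bps
            + task.cpu_cycles / edge_hz
            + task.output_bits / downlink_bps)


def should_offload(task, device_hz, edge_hz, uplink_bps, downlink_bps):
    """Offload only if doing so is strictly faster than local processing."""
    return (offload_latency(task, edge_hz, uplink_bps, downlink_bps)
            < local_latency(task, device_hz))


if __name__ == "__main__":
    frame_task = Task(input_bits=8e6, output_bits=1e6, cpu_cycles=2e9)
    print(should_offload(frame_task, device_hz=1e9, edge_hz=1e10,
                         uplink_bps=1e8, downlink_bps=1e8))
```

The sketch shows that offloading pays off only when the link is fast enough for the transmission delay to stay below the computing time saved at the edge server. Partial offloading and energy-aware decisions would require a richer model.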
After receiving and playing XR content, XR devices collect user feedback to select the content to deliver next. VR and AR devices have similar methods for feedback collection, with sensors or cameras attached to the devices to capture users' actions and motions. Moreover, VR requires additional feedback regarding the user's viewpoint. A user's viewpoint determines which tiled videos to deliver to render the FoV of the user. The viewpoint can be captured by motion tracking modules on a VR device. Additionally, motion emulation can be used to simulate a user's viewpoint movement on VR devices. VR devices can request the content proactively based on the emulation results to avoid performance degradation, such as rebuffering (Yao et al., 2017). In addition, for interactive applications such as XR gaming, the sensors connected to XR devices, such as inertial measurement units (IMUs), haptic gloves, etc., gather inputs from the users. Depending on the inputs, the XR devices can either process the inputs locally or upload the inputs to content servers for computing and updating.
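As a rough illustration of proactive content requests, the sketch below extrapolates a user's yaw angle from the two most recent head-tracking samples. Linear extrapolation is used here only as a simple stand-in; it is not the motion emulation approach of the cited work, and the function name `predict_yaw` and the sample format are hypothetical assumptions.

```python
def predict_yaw(samples, horizon_s):
    """Linearly extrapolate the yaw angle `horizon_s` seconds ahead.

    `samples` is a list of (time_s, yaw_deg) pairs, oldest first, with at
    least two entries and strictly increasing timestamps. Angles are in
    degrees and the result is wrapped to [-180, 180).
    """
    (t0, y0), (t1, y1) = samples[-2], samples[-1]
    # Shortest signed angular difference, so 179 -> -179 counts as +2 degrees.
    delta = (y1 - y0 + 180.0) % 360.0 - 180.0
    rate = delta / (t1 - t0)  # angular velocity in degrees per second
    predicted = y1 + rate * horizon_s
    return (predicted + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    history = [(0.0, 170.0), (0.1, 175.0)]
    print(predict_yaw(history, horizon_s=0.2))
```

The predicted viewpoint could then be used to decide which tiled videos to request ahead of time, at the cost of wasted bandwidth when the prediction is wrong.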
3.1.3. Requirements
In general, XR has stringent latency requirements for accurate and smooth content playback based on user motions. In terms of VR, motion-to-photon (MTP) delay is the most important delay metric, which measures the time difference between the user's viewpoint movement and corresponding reflections at the output of the VR headset. If the MTP delay is larger than 20 ms, VR users may feel spatially disoriented and dizzy, referred to as VR sickness (Yao et al., 2017). Current VR industries target lower MTP delay (below 15 ms) for ideal user experience (Mangiante et al., 2017). In addition, for VR applications requiring extensive interactions, the requirement of response time for rendering the interactions into VR content can be longer than the MTP delay requirement. For example, in VR gaming, a latency of up to 50 ms for responding to player actions can be noticeable yet currently acceptable (Zhang et al., 2017). In terms of AR, the content is mainly captured by local devices. The MTP delay in AR can be minimized by playing the raw content captured by AR devices before the content is processed. However, users' immersive experiences can be adversely affected by delayed processing for rendering the user's motions into AR content. The delay requirements for reproducing user interactions in AR content are 75 ms for online gaming and 250 ms for telemetry based on the sensitivity of the human vestibular system (Mohan et al., 2020).
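The MTP thresholds quoted above can be turned into a simple budget check. In the Python sketch below, the 20 ms and 15 ms limits come from the text, while the decomposition of the MTP delay into tracking, network, rendering, and display components, as well as the example values, are hypothetical assumptions for illustration.

```python
MTP_SICKNESS_MS = 20  # above this, users may experience VR sickness
MTP_TARGET_MS = 15    # industry target for an ideal user experience


def classify_mtp(components_ms):
    """Sum per-stage delays (in ms) and classify the total MTP delay.

    `components_ms` maps a stage name to its delay; the stage names are
    illustrative and not prescribed by the survey.
    """
    total = sum(components_ms.values())
    if total < MTP_TARGET_MS:
        label = "meets target"
    elif total <= MTP_SICKNESS_MS:
        label = "tolerable"
    else:
        label = "risk of VR sickness"
    return total, label


if __name__ == "__main__":
    budget = {"tracking": 2, "network": 5, "rendering": 4, "display": 3}
    print(classify_mtp(budget))
```

Such a breakdown makes explicit that the network receives only a fraction of the overall MTP budget, since tracking, rendering, and display also consume part of it.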
Furthermore, in order to achieve low content delivery latency, an ultra-high data transmission rate is required for delivering XR content. Specifically, users view VR videos on headsets placed a few centimeters from their faces. Therefore, high-resolution videos are required for VR applications to improve user experience. Although tile-based content transmission can reduce the data size in VR content delivery, data rate requirements can still be 2.35 gigabits per second (Gbps) or above for VR video delivery, which is more than 100 times higher than the data rate for current high-definition video streaming (Mangiante et al., 2017). For interactive XR applications, such as VR gaming and AR, extensive video processing is required. The computing capability of both network servers and user devices dominates the performance of interactive XR applications, and limited computing capability in the network can be another bottleneck for XR content delivery (Elbamby et al., 2018b).
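A back-of-the-envelope calculation helps to show the scale of the data rate problem. The Python sketch below computes the uncompressed bit rate of a 12K (11,520 × 6,480) video and compares it with the 2.35 Gbps figure quoted above. The color depth of 24 bits per pixel and the frame rate of 60 frames per second are assumptions, and the 2.35 Gbps figure is not necessarily tied to the 12K resolution, so the resulting ratio is only an order-of-magnitude indication.

```python
def raw_bitrate_bps(width, height, bits_per_pixel, fps):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


# Resolution from the text; bits_per_pixel and fps are assumed values.
RAW_12K_BPS = raw_bitrate_bps(11_520, 6_480, bits_per_pixel=24, fps=60)

# Reduction factor needed to fit the raw stream into 2.35 Gbps.
REDUCTION_FACTOR = RAW_12K_BPS / 2.35e9

if __name__ == "__main__":
    print(f"raw 12K rate: {RAW_12K_BPS / 1e9:.1f} Gbps")
    print(f"reduction needed for 2.35 Gbps: about {REDUCTION_FACTOR:.0f}x")
```

Under these assumptions, the raw stream exceeds 100 Gbps, so compression and tile-based delivery together would need to reduce the data volume by a factor of several tens.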
3.2. Haptic communication
In this subsection, we first provide the concepts of haptics and haptic communication. Then, we detail the implementation procedure and service requirements of haptic communication in the 6G era.
3.2.1. Concept
The term haptics initially referred to interactions between humans and objects in the physical world that involve the sense of touch, e.g., swiping a phone screen (Steinbach et al., 2012). The development of tele-operation technologies over the past few decades has expanded the definition of haptics to all forms of interactions involving the sense of touch, including interactions between humans and virtual objects in the virtual world or the tele-operated machines in the physical world (O'malley and Gupta, 2008; Tan et al., 2020). The information conveying the sense of touch in such interactions is referred to as haptic information. The sense of touch relates to different types of mechanoreceptors in human skin and muscles, and the haptic information can be broadly classified into tactile and kinesthetic information (Abiri et al., 2019). Specifically, tactile information is related to the sense of surface texture, friction, and temperature felt by the human skin when in contact with objects, and kinesthetic information is related to the sense of position and motion of limbs along with the associated forces (Srinivasan and Basdogan, 1997; Steinbach et al., 2012). A device that supports haptic interactions and the transmission of haptic information is referred to as a haptic interface (HI) or haptic device. Existing HIs can be broadly categorized into graspable, wearable, and touchable HIs. Generally, graspable HIs are mainly used for capturing and displaying kinesthetic information; wearable HIs are mainly used for capturing and displaying tactile information; and touchable HIs can be used in both kinesthetic and tactile information capture and display (Culbertson et al., 2018). An HI comprises haptic sensors and haptic actuators responsible for capturing and displaying haptic information, respectively (Antonakoglou et al., 2018).
An HI can capture and display a variety of haptic information, and the number of independent coordinates used by the HI to specify the haptic information is referred to as the degrees of freedom (DoF) of the HI (Promwongsa et al., 2020).
Haptic communication refers to the process in which humans communicate and interact through the sense of touch over a communication network (Steinbach et al., 2012). The communication network supporting haptic communication is referred to as the Tactile Internet in some existing works (Ali-Yahiya and Monnet, 2022). With the use of HIs and the transmission of haptic information over communication networks, users can interact with virtual objects in the virtual world or remotely operate machines in the physical world (Steinbach et al., 2012). The transmission of haptic information can be unilateral, bilateral, or multilateral, depending on the number of users participating in the haptic communication. In the cases of one user manipulating a remote machine or two users interacting with each other, the haptic communication is unilateral (i.e., an HI either sends or receives haptic information) or bilateral (i.e., an HI both sends and receives haptic information). In other cases, haptic information can be transmitted multilaterally, e.g., in cooperative tele-operations involving multiple users. This is because the behavior of each user may have an effect on other users, resulting in interconnections and couplings in the exchanges of haptic information (Feth et al., 2009; Shahbazi et al., 2018). Since haptic communication centers on humans, some studies examine the human-in-the-loop nature of haptic communication and predict a paradigm shift from content delivery to skillset delivery, as a result of the emergence of haptic communication (Simsek et al., 2016; Ali-Yahiya and Monnet, 2022).
3.2.2. Basic implementation procedure
The implementation procedure of haptic communication depends on how the haptic information is transmitted. For bilateral haptic communication, the implementation procedure mainly consists of four steps: haptic information acquisition, data reduction, data transmission, and haptic display, as shown in Figure 5. In the first step, haptic information, including tactile and kinesthetic information, can be acquired by haptic sensors in HIs. In terms of tactile information, force sensors, thermistors, and laser scanners are mainly used in the measurement or evaluation of friction and hardness, warmth, and macroscopic roughness, respectively (Lederman and Klatzky, 2009; Fishel and Loeb, 2012; Okamoto et al., 2012; Liu et al., 2017). Haptic sensors such as IMUs are responsible for the acquisition of kinesthetic information, e.g., tracking the position, velocity, and angular velocity of sensors positioned at different parts of a human (Steinbach et al., 2018). The haptic sensors of interest can be dynamically selected, and only the haptic information captured by the selected haptic sensors needs to be collected for efficient haptic information acquisition (Van Den Berg et al., 2017). Due to the potentially high DoF of an HI, data reduction is adopted in the second step to reduce the amount of haptic data without noticeably degrading the users' immersive experience. Specifically, waveform-based representation and feature extraction algorithms can be used in the compression of tactile information, and perceptual coding techniques based on the perceptual masking phenomenon can be applied for compressing kinesthetic information (Steinbach et al., 2010; Jayasankar et al., 2021). In addition, predictive methods (also called predictive coding techniques) can be leveraged to reduce the amount of transmitted haptic data by inferring upcoming haptic information (Steinbach et al., 2018). 
Haptic data reduction can be carried out at either HIs or network servers (Steinbach et al., 2012; Fitzek et al., 2021). Existing methods of haptic data reduction are detailed in Section 4.2. In the third step, the haptic data can be transmitted over a communication network, resulting in a haptic data stream between two HIs. The haptic data stream can consist of multiple haptic data substreams, each of which corresponds to a type of haptic information. Data traffic patterns and QoS requirements can vary across different haptic data substreams due to the differences in the sensitivity of human perception, such as reaction time and the range of perception (Fitzek et al., 2021). The respective QoS requirements of haptic data substreams should be satisfied, and the haptic data substreams should be synchronized in transmission. Moreover, a haptic data stream should be synchronized with audiovisual data streams in the case of immersive communications involving multiple modalities (Cizmeci et al., 2017). In the last step, i.e., haptic display, haptic actuators in an HI stimulate human mechanoreceptors to create realistic haptic sensations when the HI receives haptic data (Wang et al., 2019). In general, haptic display includes tactile display, e.g., adjusting the temperature, and kinesthetic display, e.g., creating motion and changing muscle tension (Pacchierotti et al., 2017; Steinbach et al., 2018; Ozioko et al., 2020). When haptic data transmission is unreliable or delayed, predictive methods can be leveraged at the receiver side to estimate the haptic data that is not received in time, so that the haptic display remains smooth.
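To make the data reduction and receiver-side estimation steps more concrete, the following minimal sketch illustrates one widely discussed form of perceptual data reduction, a relative (Weber-law style) deadband, paired with a simple hold-last-value estimate at the receiver. This particular scheme is an assumption chosen for illustration and is not prescribed by the procedure above; the function names and the 10% threshold are hypothetical.

```python
def deadband_filter(samples, threshold=0.1):
    """Return the (index, value) pairs that would be transmitted.

    Illustrative assumption: a sample is sent only when it differs from the
    last transmitted value by more than `threshold` times the magnitude of
    that value, i.e., when the change is assumed to be perceivable.
    """
    sent = []
    last = None
    for i, x in enumerate(samples):
        if last is None or abs(x - last) > threshold * abs(last):
            sent.append((i, x))
            last = x
    return sent


def hold_last_reconstruct(sent, length):
    """Receiver-side estimate: repeat the most recent update (zero-order hold)."""
    out = []
    updates = iter(sent)
    pending = next(updates, None)
    current = None
    for i in range(length):
        if pending is not None and pending[0] == i:
            current = pending[1]
            pending = next(updates, None)
        out.append(current)
    return out


if __name__ == "__main__":
    force = [1.0, 1.05, 1.2, 1.25, 2.0]  # hypothetical force samples
    updates = deadband_filter(force)
    print(updates)
    print(hold_last_reconstruct(updates, len(force)))
```

A more elaborate predictor (e.g., linear extrapolation) could replace the hold-last-value step; the sketch only shows where sender-side reduction and receiver-side estimation sit relative to each other.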
In the case of multilateral haptic communication, three additional steps take place besides the aforementioned four steps, especially for cooperative tele-operation applications (Feth et al., 2009). The implementation procedure of multilateral haptic communication is shown in Figure 6, and the three additional steps are highlighted with green rectangles. First, even if there is no direct haptic interaction between two users, they can still share haptic information (Takagi et al., 2017). For example, the information on tensile strength, texture, and depth of the tissue can be shared among surgeons to facilitate their collaboration in telesurgery. The data format and content of the transmitted haptic information in such haptic information sharing may differ from those of the transmitted haptic information in direct haptic interactions (Shahbazi et al., 2018). Second, it is necessary to properly fuse the haptic information from multiple users, e.g., the weighted sum, when their behaviors affect other users (Fujimoto et al., 2008; Thanh et al., 2012). Third, when one user's behavior affects multiple users at the same time, distributing haptic information to multiple users according to their different behaviors is required to achieve precise haptic display for individual users, e.g., different reaction forces are applied to tele-operators (Chen et al., 2016).
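The weighted-sum fusion mentioned above can be sketched as follows. This is only an illustration under stated assumptions: each user contributes a force vector of the same dimension, the weights are chosen by the application, and the name `fuse_forces` is hypothetical rather than taken from the cited works.

```python
def fuse_forces(forces, weights):
    """Weighted sum of per-user force vectors.

    `forces` is a list of equal-length tuples (one per user) and `weights`
    holds one scalar per user. Both are illustrative assumptions.
    """
    if not forces or len(forces) != len(weights):
        raise ValueError("need one weight per force vector")
    dims = len(forces[0])
    if any(len(f) != dims for f in forces):
        raise ValueError("all force vectors must have the same dimension")
    return tuple(
        sum(w * f[d] for f, w in zip(forces, weights)) for d in range(dims)
    )


if __name__ == "__main__":
    # Two hypothetical operators with equal authority over a shared tool.
    print(fuse_forces([(1.0, 0.0, 2.0), (3.0, 4.0, 0.0)], [0.5, 0.5]))
```

Unequal weights could, for instance, give a trainer more authority than a trainee; how the weights are set is outside the scope of this sketch.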
3.2.3. Requirements
The data transmission rate requirement of haptic communication is determined by the packet rate and size of haptic data. The packet rate is the number of packets transmitted by an HI per second, which depends on the information update rate. For the smoothness and fidelity of haptic perception, haptic information typically needs to be updated at a rate above 1,000 times per second (Choi and Tan, 2004). If each update of haptic information is packetized and transmitted, the corresponding packet rate of haptic data is above 1,000 packets per second (Xu et al., 2015). The packet size of haptic data largely depends on the DoF of the haptic data (Holland et al., 2019). For kinesthetic data, controlling one movable component (e.g., a joint) on a tele-operator (e.g., a robotic arm) needs six coordinates to be specified to achieve 6 DoF, with three coordinates specifying the translational motion in the 3D space and the other three specifying the rotational motion, i.e., roll, pitch, and yaw (Promwongsa et al., 2020). Since a human hand consists of multiple movable components (e.g., finger joints and wrist joints), its kinesthetic data can be described by a 24-DoF model (Cobos et al., 2008). In addition, for reproducing tactile information with high fidelity, a dense array of haptic sensors/actuators needs to be deployed on a user (Hoggan et al., 2007). For example, for reproducing vibrotactile data, four actuators are deployed around one fingertip (Baik et al., 2020). As a result, tactile data can involve even higher DoF than kinesthetic data (Holland et al., 2019). The packet size of 1-DoF, 10-DoF and 100-DoF haptic data is about 8, 80 and 800 bytes, respectively, and the specific data transmission rate requirement can be derived accordingly (Holland et al., 2019).
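As a rough worked example of this derivation, the sketch below multiplies the packet rate by the packet size. It assumes 8 bytes of payload per DoF (consistent with the 8-, 80-, and 800-byte figures quoted above), one packet per update at 1,000 updates per second, and no protocol header overhead; the function name is hypothetical.

```python
def haptic_payload_rate_bps(dof, packets_per_second=1000, bytes_per_dof=8):
    """Payload-only data rate in bit/s for a haptic data stream.

    Assumptions for illustration: each DoF contributes `bytes_per_dof`
    bytes to every packet, and packet headers are ignored.
    """
    packet_size_bytes = dof * bytes_per_dof
    return packet_size_bytes * 8 * packets_per_second


if __name__ == "__main__":
    for dof in (1, 10, 100):
        rate = haptic_payload_rate_bps(dof)
        print(f"{dof:>3}-DoF: {rate / 1e3:.0f} kbit/s")
```

Under these assumptions, the payload rates are 64 kbit/s, 640 kbit/s, and 6.4 Mbit/s for 1-, 10-, and 100-DoF data, respectively; header overhead at 1,000 packets per second would add to these figures.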
The delay tolerance of haptic communication can be as low as 1 ms since the packet rate of haptic data can be above 1,000 packets per second (Fettweis et al., 2014). In practice, the delay requirement of haptic communication is determined by factors including the perceptual sensitivity of receivers, the dynamics of haptic interaction, and specific operation or interaction. First, higher perceptual sensitivity for haptic information generally indicates the need for a higher packet rate and thus a stricter delay requirement (Chaudhuri and Bhardwaj, 2018). For example, while touring a virtual museum of natural history, archaeologists can have a stricter delay requirement than the majority of visitors due to their higher perception sensitivities of artifacts and specimens. Second, similarly, higher dynamics of haptic interaction generally call for a higher packet rate and a lower delay. Specifically, the delay requirement when such dynamics is high (e.g., in tele-soccer), medium (e.g., in telerehabilitation) and low (e.g., in tele-maintenance) is 1–10 ms, 10–100 ms, and 100–1,000 ms, respectively (Holland et al., 2019). Moreover, for the same use case, the delay requirement can vary with the dynamics of the interaction. For example, in tele-training, the delay requirement when the trainee is being assessed and corrected by the trainer is 1–10 ms; the delay requirement when the trainee is observing the illustration of the trainer is 1–100 ms (Holland et al., 2019). Third, a delay below 2 ms is required for remote machine manipulation, while a delay below 50 ms is required for remote machine monitoring (Aijaz and Sooriyabandara, 2018).
The reliability of haptic communication can be evaluated in terms of bit error rate, packet loss rate, delay-bound violation probability, or prediction error when haptic data prediction is adopted (Promwongsa et al., 2020). The reliability requirement depends on factors such as the specific communication scenario and whether or not haptic data reduction is used. First, in terms of delay-bound violation probability, the reliability of haptic communication in immersive gaming is required to be above 99.9% (Holland et al., 2019). In contrast, when critical operation tasks are performed based on haptic information, higher reliability of haptic communication is required. For example, a reliability above 99.999% is required for haptic communication in telesurgery and remote machine manipulation (Aijaz and Sooriyabandara, 2018; Gupta et al., 2019). Second, when haptic data reduction is adopted, the same packet loss or bit error rate can cause more degradation in the haptic information (Steinbach et al., 2010). As a result, the use of haptic data reduction can result in a stricter requirement for the reliability of haptic communication. For example, a reliability above 99.999% is required in immersive gaming when haptic data reduction is adopted (Holland et al., 2019).
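To illustrate how reliability in the delay-bound sense could be checked against such targets, the sketch below computes the fraction of packets delivered within a delay bound from a list of measured delays. The representation of lost packets as `None`, the function names, and the sample values are assumptions made for illustration.

```python
def delay_bound_reliability(delays_ms, bound_ms):
    """Fraction of packets delivered within `bound_ms`.

    `delays_ms` holds one entry per transmitted packet; a lost packet is
    represented by None (an illustrative convention) and counts as a
    violation of the delay bound.
    """
    if not delays_ms:
        raise ValueError("no packets to evaluate")
    on_time = sum(1 for d in delays_ms if d is not None and d <= bound_ms)
    return on_time / len(delays_ms)


def meets_target(delays_ms, bound_ms, target):
    """True if the measured reliability reaches the target, e.g., 0.999."""
    return delay_bound_reliability(delays_ms, bound_ms) >= target


if __name__ == "__main__":
    # Hypothetical trace: 998 on-time packets, one late packet, one loss.
    trace = [0.5] * 998 + [12.0, None]
    print(delay_bound_reliability(trace, bound_ms=10.0))
    print(meets_target(trace, bound_ms=10.0, target=0.999))
```

Note that verifying a 99.999% target empirically requires far more than 100,000 observed packets for the estimate to be meaningful.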
3.3. Holography and holographic communication
In this subsection, we introduce holography and holographic communication, beginning with the concept and different types of holography, followed by the basic implementation procedure of holographic communication, and ending with the data transmission rate and delay requirements.
3.3.1. Concept
As the name suggests, holographic communication depends on holography technology, which has made significant progress in the past decade. There are different stages in the development of holography technology. Optical holography generates holograms via recording and recreating optical wavefronts, and the corresponding holograms are recorded interference patterns (e.g., on photographic emulsions) of an “object wave” and a “reference wave.” When the recorded interference pattern is illuminated by the reference wave, a 3D light field can be recreated using diffraction. The original idea of the hologram was developed in the 1940s, and a real breakthrough was made in the 1960s thanks to the development of the laser (Gabor, 1972). Later, with advances in electronic devices, digital holography emerged, which uses image sensors to capture interference patterns. In digital holography, recording is done optically, while a 3D image is reproduced via numerical calculation of light wave diffraction using methods such as Fourier transform (Tahara et al., 2018). The latest development of holography is computer-generated holography, in which both the interference pattern and the 3D image in display are generated digitally using a computer (Sahin et al., 2021). With computer-generated holography, the object to be displayed does not have to be physically present, which yields great flexibility at the cost of high computational complexity (Shimobaba et al., 2022). Despite the advances in recent years, generating dynamic 3D holograms in real time remains challenging. As a result, alternative approaches to displaying 3D images have emerged, which are sometimes referred to as “false holography.” Such approaches use glass panes or other “tricks” to create illusions of 3D images (Jones et al., 2007; Kerrigan, 2018). Among the false holography techniques, volumetric display has attracted significant interest in the fields of computer-aided design and medical imaging (Favalora, 2005). 
Volumetric display, an umbrella term for many different techniques, renders volume-filling 3D images via the generation, absorption, and scattering of illumination in a confined space, e.g., a cube or cone (Yang et al., 2016). The study of volumetric display is active with exciting experiments (Smalley et al., 2018), and commercial products are also available (Gibney, 2019). Other approaches to imitate 3D display include the use of multiple projectors and a human-size retroreflective cylinder (Gotsch et al., 2018). For example, a circular multi-projector array can be implemented for a light field cylindrical display to differentiate perceived images from different viewing angles.
Based on either true holography or “false holography,” holographic communication is about transferring data representing dynamic 3D images of physical objects over a network and displaying the objects in 3D at the receiver. Integrating 3D data capturing, processing, transmission, and rendering, holographic communication is expected to enable exciting new services in 6G (Strinati et al., 2019; Clemm et al., 2020). At the moment, there is no consensus on the scope of holographic communication in the literature, and some researchers consider the transferring and rendering of 3D data in AR/VR as a type of holographic communication (Essaili et al., 2022). In this review, holographic communication refers to data transfer for autostereoscopic 3D display, i.e., 3D images that can be viewed by the naked eye without the aid of eyewear or headsets and, ideally, are different when viewed from different positions, angles, or tilts. The 3D display at the receiver can be rendered via true holography, false holography such as volumetric display, or other techniques as long as the objective of autostereoscopic 3D display is achieved. Similar to existing multimedia communications, the content of holographic communication can be either generated in real time or recorded, and the communication mode can be unicast, multicast, or broadcast.
3.3.2. Basic implementation procedure
Although various approaches for holographic communication differ in the implementation procedure, the general process includes the steps of data capture, processing, transmission, and rendering. This is illustrated in Figure 7.
Except for computer-generated holography, a capture system is required to record 3D images of a physical object. An ideal capture system for holographic communication would capture the light field, i.e., all the information of each light ray, in the target scene (Apostolopoulos et al., 2012). In practice, capture is conducted with visual sensors such as a camera array (Nakamura et al., 2019) or light detection and ranging (LIDAR) sensors (Fratz et al., 2021). The depth information of the object of interest is either directly captured (e.g., in the case of a capture system with LIDAR sensors) or computed in the subsequent data processing step (e.g., in the case of a capture system with a camera array). The performance of the visual capture system depends on factors such as the number of sensors and the camera sampling rate (Apostolopoulos et al., 2012).
In the data processing step, the depth information of target objects in the scene is computed (if not directly captured), and the output from capture sensors is fused to form a composite 3D representation of the captured scene (Javidi et al., 2005). For example, in digital holography, a computer can process 2D images taken from different angles and tilts by a camera array to form a single 3D representation of the captured scene (Essaili et al., 2022). The fusion of images may help achieve visualization enhancement in the rendered 3D images such as improvement in the resolution and contrast (Javidi et al., 2005), and it can be conducted either solely at the transmitter side or with the help of an edge server. In addition, the data processing step is responsible for the compression of the fused data to speed up the transmission and reconstruction, and reduce the required data transmission rate and storage in holographic communication (Kurbatova et al., 2015; Cheremkhin and Kurbatova, 2019). The compressed data for the 3D representation is then encoded and transmitted over a network.
At the receiver side, the received data is decoded using one or multiple chosen codecs and decompressed. The captured scene is then reconstructed, possibly with the help of an edge server, and rendered on a display device. An ideal display device for holographic communication would regenerate the light field in the captured scene to create an illusion that the user is placed in the scene. In practice, creating such an illusion is difficult as it requires each point (e.g., each pixel) of the display device to emanate different light rays in different directions. However, given the limitations of human perception, the feeling of visual immersion can be created by using equipment such as a cylindrical light field display (Gotsch et al., 2018), a persistence of vision (PoV) display (Gately et al., 2011), or a static volumetric display device (Kumagai et al., 2021). Such devices render 3D images by using a large curved display to fill the user's FoV, exploiting the phenomenon of a lingering afterimage on the retina, and dynamically turning voxels on and off in a confined 3D space, respectively, among other methods for creating illusions of 3D images.
It is worth noting that holographic communication may also involve audio data capture, processing, and rendering. In such a case, capturing the sound field in the target scene and ensuring audio and video synchronization are important for users to enjoy an immersive holographic communication experience (Apostolopoulos et al., 2012).
3.3.3. Requirements
Holograms mainly come in two types, namely volumetric-based holograms and image-based holograms. The transmission of the two types of holograms requires different data rates, ranging from hundreds of Mbps up to Tbps (Clemm et al., 2020). For volumetric-based holograms, a physical object is represented as a set of 3D pixels or voxels, such as a point cloud. Transmitting a point cloud targeting an object requires a data rate on the level of hundreds of Mbps to several Gbps, depending on the resolution of the 3D content (FG-NET2030, 2020). For example, to fully represent a human, the point cloud in each frame typically consists of 10^5–10^6 points, while each point needs 15 bytes of data to represent the color and 3D coordinate of the point. In the case of 30 frames per second, the data rate requirement is between 300 Mbps and 3 Gbps (Selinis et al., 2020; Essaili et al., 2022). For image-based holograms, such as light-field video (LFV), an object is represented by an array of images captured at different angles, tilts, and/or positions. An LFV-based hologram can be more precise as compared with a volumetric-based hologram, especially in high resolution when a large number of images from different tilts, angles, and positions are used per frame (Jiang et al., 2021). For example, if the 3D representation of an object requires a separate image every 0.3°, a hologram with an FoV angle range of 30° and a tilt range of 10° needs 3,300 separate 2D images. In order to transmit an LFV-based hologram for a human-sized object, the required data rate should be between 100 Gbps and 2 Tbps (Clemm et al., 2020).
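The two numerical examples above can be reproduced with the short sketch below. It assumes uncompressed transmission of 15 bytes per point at 30 frames per second for the point cloud, and it assumes that the number of light-field images is the product of the whole numbers of 0.3° steps that fit in the angle and tilt ranges; the function names are hypothetical.

```python
import math


def point_cloud_rate_bps(points_per_frame, bytes_per_point=15, fps=30):
    """Uncompressed point-cloud data rate in bit/s."""
    return points_per_frame * bytes_per_point * 8 * fps


def lfv_image_count(fov_deg, tilt_deg, step_deg):
    """Number of 2D images per frame, assuming one image per angular step.

    Illustrative assumption: only whole steps are counted in each
    dimension; the small epsilon guards against floating-point round-off.
    """
    eps = 1e-9
    n_angle = math.floor(fov_deg / step_deg + eps)
    n_tilt = math.floor(tilt_deg / step_deg + eps)
    return n_angle * n_tilt


if __name__ == "__main__":
    for points in (10**5, 10**6):
        print(points, "points:", point_cloud_rate_bps(points) / 1e6, "Mbit/s")
    print("LFV images per frame:", lfv_image_count(30, 10, 0.3))
```

With 100,000 and 1,000,000 points per frame, the uncompressed rates are 360 Mbit/s and 3.6 Gbit/s, respectively, which is of the same order as the range quoted above, and the image count for the 30° by 10° example is 100 × 33 = 3,300.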
To support real-time holographic communication, the overall delay, including data capturing, processing, transmission, and rendering delay, should be <100 ms (He et al., 2023). In addition to low delay, synchronization is important to holographic communication. Generally, the hologram of objects or humans may be sampled by multiple sensors from different angles and different distances. In this case, data from different sensors should be synchronized in transmission (Strinati et al., 2019). Taking holographic teleconference as an example, as multiple participants can join the teleconference from different locations, multi-source synchronization is necessary for them to have good quality of experience (QoE) in holographic communication. Otherwise, a part of the rendered hologram can be slightly ahead of or behind the rest of the hologram for some users, resulting in poor QoE (Lesniak and Tucker, 2018). Moreover, holographic communication can involve multi-sensory information, e.g., the haptic, audio, and video information (Taleb et al., 2021). In this case, the synchronization of different sensory information in transmission is also important for users to see the hologram, hear the voice, and receive touch-sensory feedback from others without a degradation of the immersive experience due to out-of-sync issues. For holographic communication involving the transmission of audiovisual and haptic data, the tolerable difference in the delay of different types of data should be lower than 80 ms for satisfactory QoE (Montagud et al., 2018).
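As a simple illustration of the inter-stream synchronization requirement, the sketch below computes the largest difference between the end-to-end delays of several media streams and compares it with the 80 ms tolerance quoted above. The stream names, delay values, and function names are hypothetical.

```python
def max_skew_ms(stream_delays_ms):
    """Largest pairwise difference between per-stream end-to-end delays."""
    values = list(stream_delays_ms.values())
    if not values:
        raise ValueError("no streams given")
    return max(values) - min(values)


def in_sync(stream_delays_ms, tolerance_ms=80.0):
    """True if the delay difference across streams stays below the tolerance."""
    return max_skew_ms(stream_delays_ms) < tolerance_ms


if __name__ == "__main__":
    # Hypothetical per-stream delays in milliseconds.
    delays = {"video": 90.0, "audio": 40.0, "haptic": 15.0}
    print(max_skew_ms(delays), in_sync(delays))
```

In a real system, a receiver could use such a check to decide how long to buffer the faster streams before playout, at the cost of additional end-to-end delay.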
4. Immersive communications: Challenges and solutions
After introducing the concepts, implementation procedures, and requirements of immersive communications, we now discuss challenges in XR, haptic communication, and holographic communication, as well as the state-of-the-art solutions, with the most important ones summarized in Figure 8. Note that our review here focuses on the challenges and solutions related to the communication, computing, and networking aspects of immersive communications.
4.1. Extended reality
The main challenge of XR is delivering the required content to users on time, given the limited transmission resources and computing capability in a network. A variety of network functions and resources contribute to the performance of content delivery. Systematic solutions involving data processing, rendering, transmission, etc., have been developed to address these challenges. We summarize the solutions for implementing XR in three aspects: content selection, transmission improvement, and computing optimization.
4.1.1. Content selection
The fundamental step in supporting XR applications is to identify which content needs to be processed and transmitted. This step focuses on minimizing the overall data size of the content to deliver at the cost of tolerable performance degradation, thus reducing the delivery time.
In VR services, proactive content delivery is commonly used to meet MTP delay requirements. Thus, in tile-based content transmission, the primary research challenge is how to predict user viewpoints accurately so as to determine which tiled videos to deliver to users. The prediction of user viewpoints can be achieved by sequential learning and data analysis methods based on the user's viewpoint trajectory, such as linear regression (He et al., 2018; Nasrabadi et al., 2020), and long short-term memory (LSTM) (Hou et al., 2018). A lightweight viewpoint prediction function can be deployed at the VR headset for local viewpoint prediction. Alternatively, the viewpoint trajectory can be updated to a network server (e.g., edge server), in which a more advanced machine learning model can be applied for accurate prediction (Hou et al., 2021). If the viewpoints are predicted by the network server, the prediction can be conducted based on not only current viewpoint trajectories for a group of users (Sun et al., 2020) but also the historical viewpoint trajectory data to further improve the prediction accuracy (Xu Y. et al., 2018; Feng et al., 2019). Although viewpoint prediction enables proactive tile-based content delivery, perfect prediction cannot be achieved due to the dynamics of user viewpoint movement. Even if viewpoints are known in advance, dynamic network environments such as data traffic load and processing time require adaptive resource management to ensure playback performance. With stochastic decision-making methods, such as reinforcement learning, it is possible to identify the dynamics of user viewpoint movement and determine which tiled videos to deliver to the corresponding VR device (Hu F. et al., 2022). In addition, the portion of tiled videos with different video qualities transmitted in a given time interval can be adjusted according to the viewpoint movement of a user. 
Increasing the portion of low-quality videos can improve the robustness against viewpoint prediction errors, while increasing the portion of high-quality videos can improve the QoE of the user. The optimal tradeoff between the robustness and the QoE is evaluated for VR video delivery in Hu M. et al. (2022).
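A minimal sketch of viewpoint prediction by linear regression, followed by tile selection, is given below. It assumes that only the yaw angle is predicted, that yaw samples are unwrapped (no jump at 360°), and that the panorama is split into equal-width yaw tiles; the function names, the 100° FoV, and the 30° tile width are illustrative assumptions rather than values from the cited works.

```python
import math


def predict_yaw(times_s, yaws_deg, t_future_s):
    """Least-squares line fit through (time, yaw) samples, evaluated at t_future_s."""
    n = len(times_s)
    if n < 2 or n != len(yaws_deg):
        raise ValueError("need at least two (time, yaw) samples")
    mean_t = sum(times_s) / n
    mean_y = sum(yaws_deg) / n
    var_t = sum((t - mean_t) ** 2 for t in times_s)
    if var_t == 0:
        raise ValueError("time stamps must not all be equal")
    slope = sum(
        (t - mean_t) * (y - mean_y) for t, y in zip(times_s, yaws_deg)
    ) / var_t
    return mean_y + slope * (t_future_s - mean_t)


def tiles_in_view(center_yaw_deg, fov_deg=100.0, tile_width_deg=30.0):
    """Indices of the yaw tiles that overlap the predicted field of view."""
    n_tiles = int(360 // tile_width_deg)
    first = math.floor((center_yaw_deg - fov_deg / 2) / tile_width_deg)
    last = math.floor((center_yaw_deg + fov_deg / 2) / tile_width_deg)
    # The modulo wraps tile indices around the 360-degree panorama.
    return sorted({k % n_tiles for k in range(first, last + 1)})


if __name__ == "__main__":
    yaw = predict_yaw([0.0, 0.1, 0.2, 0.3], [10.0, 12.0, 14.0, 16.0], 0.5)
    print(yaw, tiles_in_view(yaw))
```

To hedge against prediction errors, the tiles returned here could be requested in high quality while the remaining tiles are requested in low quality, which corresponds to the robustness and QoE tradeoff discussed above.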
AR devices capture raw content, i.e., video frames, which can be offloaded to network servers for prompt content processing. Once the raw content is offloaded, the server detects and processes the objects within video frames captured by users' cameras, then returns the processed content to the AR devices. Though it is easier to satisfy the MTP delay requirement in AR than VR, enabling accurate and rapid content processing (e.g., object detection) by network servers requires sufficient bandwidth to provide low-latency two-way transmission for satisfactory QoE. To balance transmission bandwidth usage for computing offloading and content processing performance, current solutions mainly focus on using machine learning techniques to adjust the number of frames offloaded by an AR device per unit time, based on the network environment and AR device movement. Specifically, offloading more video frames to a network server can improve object detection accuracy, especially when the AR device moves quickly and generates new content frequently. However, the bandwidth usage increases accordingly due to a large number of frames to offload (Liu Q. et al., 2018). Taking AR device mobility and network dynamics into account, adaptive frame rate adjustment is investigated in Chen N. et al. (2021). A deep reinforcement learning approach is used to study how mobility dynamics affect AR service performance and to determine the optimal uploading frame rate for maximal object detection accuracy and playback fluency.
XR content is expected to be further enriched in the era of 6G. Digital twins can incorporate AI to collect environmental information, characterize physical objects, and construct digital models of the physical objects accordingly. Digital models from digital twins can be used for XR applications as a new type of XR content that can be accessed by XR devices (Zhang Z. et al., 2022). For example, in an industrial Internet-of-Things scenario, designers and workers can use XR devices to interact with the digital models of machines and products in a simulated virtual environment. In addition, XR devices can collect the interactions from designers and workers. Based on the interactions, digital twins can adaptively configure their settings, such as data collection frequency (Aheleroff et al., 2021). The combination of XR and digital twins can support emerging applications such as the metaverse. However, synchronization among the physical world, digital twins, and XR content requires considerable network resources. Game theoretic methods are adopted in Han et al. (2022) to adjust the synchronization rate between the physical world and digital twins based on the demand of virtual service providers that provide content to XR devices. A network slicing-based solution is proposed for providing metaverse services (Liu et al., 2022), which allocates multi-dimensional resources for content synchronization to improve the fidelity of digital twins and the QoE of XR users.
4.1.2. Transmission improvement
As discussed in Section 3.1.3, the main bottleneck for VR video delivery is a limited data rate. Therefore, a straightforward solution to overcome the bottleneck is to increase the data rate with advanced communication techniques. As a key technology in 5G, millimeter wave (mmWave) communications can facilitate VR content delivery due to their high data rate and ultra-low propagation latency (Abari et al., 2016). In 6G, the transmission rate can be further improved by the physical layer technologies of terahertz (THz) transmission and intelligent reflecting surface (IRS), which can be applied in VR video delivery (Chaccour et al., 2020; Du et al., 2020). However, communication links using ultra-high frequency bands, such as mmWave and THz, are prone to outage as they require line-of-sight (LoS) channels. Physical obstacles in the environment, including the user's body, may break the communication links and severely degrade the communication quality. To address this issue, a sub-6 GHz frequency band can be used as a backup if the mmWave or THz bands do not provide satisfactory channel quality. However, dynamic frequency band switching can result in a time-varying data transmission rate, thereby degrading the content delivery performance. The work (Liu et al., 2019b) models communication link state transitions corresponding to switching different frequency bands (e.g., mmWave and sub-6 GHz bands) in VR content delivery as a Markov chain. Content processing policies are adjusted to compensate for transmission delays when channel state transitions occur. In addition to adapting to channel dynamics, the reliability of mmWave or THz communication links can be improved by establishing multiple communication links between a device and several edge servers for VR content delivery (Gu et al., 2022; Yang P. et al., 2022). 
In addition, on the link layer, IEEE 802.11 has introduced a new amendment, IEEE 802.11be Extremely High Throughput (EHT), also known as Wi-Fi 7, to support high-throughput and low-latency video applications, including XR, through aggregating multiple transmission bands, exploiting MIMO enhancements, and enabling multi-AP coordination (Deng et al., 2020).
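The idea of modeling frequency band switching as a Markov chain can be illustrated with a two-state example, in which a link alternates between an mmWave state and a sub-6 GHz fallback state. The sketch below is not the model of the cited work; the per-slot transition probabilities, the data rates, and the function names are all assumptions made for illustration.

```python
def stationary_two_state(p_mm_to_sub6, p_sub6_to_mm):
    """Stationary probabilities (mmWave, sub-6 GHz) of a two-state Markov chain.

    In steady state, the probability flow between the two states balances:
    pi_mm * p_mm_to_sub6 = pi_sub6 * p_sub6_to_mm.
    """
    total = p_mm_to_sub6 + p_sub6_to_mm
    if total <= 0:
        raise ValueError("at least one transition probability must be positive")
    pi_mm = p_sub6_to_mm / total
    return pi_mm, 1.0 - pi_mm


def mean_rate_mbps(p_mm_to_sub6, p_sub6_to_mm, rate_mm_mbps, rate_sub6_mbps):
    """Long-run average data rate under the stationary distribution."""
    pi_mm, pi_sub6 = stationary_two_state(p_mm_to_sub6, p_sub6_to_mm)
    return pi_mm * rate_mm_mbps + pi_sub6 * rate_sub6_mbps


if __name__ == "__main__":
    # Hypothetical values: blockage occurs with probability 0.1 per slot,
    # and the mmWave link recovers with probability 0.3 per slot.
    print(stationary_two_state(0.1, 0.3))
    print(mean_rate_mbps(0.1, 0.3, rate_mm_mbps=2000.0, rate_sub6_mbps=200.0))
```

The long-run average hides the short-term rate drops that matter for VR playback, which is why content processing policies may need to react to individual state transitions rather than to the average rate alone.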
On the network layer, a network virtualization-based solution is proposed for VR content delivery, in which network controllers can create private logical networks for VR applications to satisfy their service requirements and dynamically adapt the routing schemes according to the mode of content delivery (i.e., unicast or multicast) (Huawei Technologies Co., Ltd). Transmission protocols can also be designed according to the features of VR content delivery. A transmission protocol based on quick UDP Internet connections (QUIC) is proposed in Yen et al. (2019) to prioritize important tiled videos, such as the videos in the center of the user's FoV or the videos to be played soon, in transmission over a QUIC connection, in order to minimize the ratio of missing tiles when playing VR videos.
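The prioritization idea can be sketched independently of any particular transport API. The example below orders tiles so that those with the earliest playback deadline are sent first and, among tiles with the same deadline, those closest to the center of the FoV are sent first. This ordering rule, the `Tile` fields, and the function name are assumptions for illustration and do not reproduce the cited protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    tile_id: str
    angle_from_center_deg: float  # angular distance from the FoV center
    playback_deadline_s: float    # time at which the tile is needed


def transmission_order(tiles):
    """Sort tiles by playback deadline first, then by closeness to the FoV center."""
    return sorted(
        tiles,
        key=lambda t: (t.playback_deadline_s, t.angle_from_center_deg),
    )


if __name__ == "__main__":
    tiles = [
        Tile("edge", 40.0, 1.0),
        Tile("center", 5.0, 1.0),
        Tile("next-segment", 0.0, 2.0),
    ]
    print([t.tile_id for t in transmission_order(tiles)])
```

In a stream-multiplexed transport, such an ordering could be mapped to per-stream priorities so that a delayed low-priority tile does not block a high-priority one.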
4.1.3. Computing optimization
Supporting wireless XR requires networks to have sufficient computing capability for processing and rendering the content, especially for interactive applications such as VR gaming. Processing the content locally at the XR devices can be time-consuming and energy-inefficient due to their limited computing capability. Instead, the computing workload can be fully or partially offloaded to network servers, and multi-tier computing can be a potential solution to reduce computing time and bandwidth consumption when providing computing services to XR devices. Accordingly, computing strategies should be based on the features of diverse network servers to improve resource utilization and service performance.
In MEC, edge servers can provide additional computing capability for resource-limited devices to reduce content processing latency for mobile XR content delivery. Specifically, in VR, edge servers can project monoscopic videos to stereoscopic videos when content is transmitted from the content provider's cloud server to VR devices. Such MEC-assisted content delivery can reduce bandwidth consumption compared to delivering stereoscopic videos from the cloud server directly, and computing time can be reduced compared to projecting the videos at the local devices (Mangiante et al., 2017). In AR, devices can offload captured content to an edge server to minimize processing latency (Siriwardhana et al., 2021). In addition, edge servers can cache the processed XR content to further reduce the content delivery and processing time (Sukhmani et al., 2018). Joint computing, caching, and communication resource management for VR video delivery is investigated in Dang and Peng (2019) and Sun et al. (2019), which study the tradeoffs between computing and caching resource allocation for minimizing content delivery delay, given stochastic content processing time and popularity. Deep reinforcement learning methods are adopted to allocate computing resources at an edge server for individual content delivery requests in Liu and Deng (2021) and Liu et al. (2019b), aiming to minimize content delivery delay while adapting to dynamic network environments and user viewpoint movement. The work (Liao et al., 2021) further investigates trusted caching collaboration for multiple edge servers in supporting VR/AR content delivery. A distributed caching scheme is proposed to optimize the cache space and policy for edge servers while incentivizing edge servers to participate in edge caching through verification schemes in the blockchain.
Nonetheless, the computing capability at edge servers may not always be sufficient for processing XR content. Compared to cloud servers, edge servers usually have limited storage resources for caching XR content. Targeting 6G, a multi-tier computing architecture provides a potential solution for further accelerating XR content delivery by coordinating computing and storage resources among cloud servers, fog servers (e.g., servers at the gateway), and edge servers across the network. By integrating computing resources across the entire network, content processing workloads can be optimally distributed among multiple servers, and storage capacity among servers can be utilized to satisfy offloaded computing demands. However, optimizing XR performance by multi-tier computing can be complicated when there is a multitude of computing offloading and caching options to choose from. The computing and caching resource coordination between the cloud server and edge servers is studied in Al-Abbasi et al. (2019) and Mehrabi et al. (2021). Based on the information of a static network environment, e.g., transmission rate and XR computing demand, these works formulate the coordination problem using mixed integer nonlinear programming. Considering dynamic network environments and user mobility, the work (Zhou C. et al., 2022) utilizes digital twins of end users to characterize network dynamics and statuses. The meta-learning method is adopted to jointly allocate computing and caching resources at servers on different tiers of a network for context-based applications, including XR, based on the captured network statuses from digital twins. The attention of users on the virtual objects in XR content is predicted in Du et al. (2022) by an alternating least squares method, and a computing resource allocation scheme is proposed to prioritize processing of the virtual objects that attract more user attention.
In addition to jointly allocating computing and caching resources at network servers, computing performance can be further enhanced by scheduling computing tasks at edge servers. Edge servers can provide location-based content to users, which can contribute to computing optimization for XR applications. Specifically, in AR, users at close locations may offload and require similar content, and therefore, raw content offloaded from the nearby users can be processed together for improving computing efficiency (Jia and Liang, 2018). Furthermore, rendering pipelines can be optimized based on real-time communication and computing performance of network servers and local devices when part of the workloads for content rendering are offloaded. A collaborative rendering pipeline is investigated in Xie et al. (2021), which dynamically arranges the execution order of sub-tasks in content rendering on both the edge server and XR devices, based on network characteristics, to facilitate parallel computing and improve content rendering efficiency. In addition, joint computing and communication resource management for efficiently supporting multiple users in a virtual world is investigated in Ren et al. (2020b). Device-to-device links are enabled to allow each AR device to leverage the computing resources of nearby AR devices for lightweight pre-processing of the captured frames to further improve computing resource utilization in the network.
4.2. Haptic communication
The main challenge in haptic communication is to satisfy the stringent delay and reliability requirements in the delivery of haptic data, especially when the data packet rate is high. To tackle this challenge, solutions have been developed in three aspects, including haptic data reduction to reduce the packet size or the packet rate, advanced communication and networking techniques to reduce delay and improve reliability, and haptic data prediction to compensate for excessive delay and packet loss over communication networks.
4.2.1. Haptic data reduction
To improve the fidelity of haptic perception, the number of haptic sensors/actuators deployed on an HI has been increasing (Steinbach et al., 2018). For example, electronic skin (e-skin) can be attached to prosthetic limbs for sensing haptic information, or to human skin for virtual social interaction (Dahiya, 2019; Yu et al., 2019). To reproduce the function of human skin, sensors/actuators need to be densely deployed on e-skin, for example, 25 sensors/actuators per cm² (Liu et al., 2020). In addition, the required packet rate for haptic data can be higher than 1,000 packets per second (Orlosky et al., 2017). As a result, with a large number of devices and a high packet rate, the required data transmission rate of haptic communication can be high. To tackle this challenge, one solution is haptic data reduction, which is to reduce the packet size or rate of haptic data.
For reducing the packet size of haptic data, floating-point compression in the time domain or quantization of haptic data in the frequency domain can be exploited. In floating-point compression, one degree of freedom in the haptic information (e.g., the direction of the translational movement along an axis) can be represented by a 32-bit floating-point number, and only the bits different from those in the previous haptic data are transmitted (You and Sung, 2008). Using time-frequency transformation algorithms such as the discrete cosine transform, a sequence of haptic data packets in the time domain can be transformed into data in the frequency domain, which are then quantized and transmitted (Tanaka and Ohnishi, 2009; Zeng et al., 2020). For reducing the packet rate of haptic data, the perceptual masking phenomenon is widely exploited, which suggests that a human cannot perceive a difference in haptic information below the just-noticeable difference (JND). According to Weber's law, the JND of haptic information is proportional to the currently perceived value of the information, and the proportion is referred to as the Weber fraction (Steinbach et al., 2018). In this regard, the perceptual haptic reduction method is to transmit an updated haptic data packet only when the difference from the previously transmitted value is larger than a threshold (e.g., the JND) (Steinbach et al., 2010). In addition, the perceptual masking phenomenon in both time and frequency domains can be jointly exploited to achieve a higher data reduction ratio and lower data deviation (Wei et al., 2022). Moreover, by jointly evaluating the difference of the haptic information in terms of all the DoF among consecutive data packets, the perceptual haptic reduction can be further improved (Steinbach et al., 2012).
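The perceptual reduction rule based on Weber's law can be sketched for a single DoF as follows. This is a minimal illustration under stated assumptions: the Weber fraction of 0.1 is a placeholder, the function name is hypothetical, and the threshold is taken relative to the last transmitted sample. It is not the exact scheme of any cited work.

```python
def deadband_reduce(samples, weber_fraction=0.1):
    """Return the (index, value) pairs that would be transmitted.

    A sample is sent only if it differs from the last transmitted sample
    by more than weber_fraction * |last transmitted sample|. The first
    sample is always sent. Note that if the last transmitted value is 0,
    the threshold is 0, so any nonzero change is transmitted.
    """
    if not samples:
        return []
    sent = [(0, samples[0])]
    reference = samples[0]
    for i, x in enumerate(samples[1:], start=1):
        if abs(x - reference) > weber_fraction * abs(reference):
            sent.append((i, x))
            reference = x  # the receiver now holds this value
    return sent


if __name__ == "__main__":
    force = [1.0, 1.05, 1.08, 1.2, 1.25, 1.5]  # hypothetical force samples
    packets = deadband_reduce(force)
    print(packets)
    print(f"packet rate reduced to {len(packets)}/{len(force)} of the original")
```

With this rule, the receiver simply holds the last received value between updates, so the perceived error stays within the chosen threshold as long as no transmitted packet is lost.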
The use of haptic data reduction should adapt to the type of haptic data, the delay requirement, and the reliability requirement for haptic communication. First, haptic data can exhibit different Weber fractions in the JND, e.g., 7–15% for force data and 13–28% for stiffness data, which results in different thresholds in perceptual haptic reduction (Chaudhuri and Bhardwaj, 2018). Second, data reduction in the frequency domain results in high processing delay since it is based on a sequence of data packets in the time domain. It is suitable for use cases with high delay tolerance, such as the passive perception and exploration of remote/virtual objects (Sachs et al., 2018). In contrast, data reduction in the time domain, implemented in real time, is suitable for use cases with low delay tolerance such as immersive gaming, which involves extensive interactions between the players (Holland et al., 2019). Third, haptic data reduction may not be suitable for use cases requiring high reliability. As discussed in Section 3.2.3, with the use of haptic data reduction, the required reliability of haptic communication increases. In this regard, for use cases with a high-reliability requirement (e.g., 99.999% for telesurgery), the reliability requirement can be difficult to satisfy if haptic data reduction is used.
4.2.2. Communication and networking solutions
To satisfy the ideal communication delay of below 1 ms for haptic communication, physical-layer delay of <0.1 ms is desired (Aijaz et al., 2016). For reducing queuing delay, haptic data may be allowed to preempt the data of other types in the downlink transmission (Ji et al., 2018). For uplink transmissions, non-orthogonal multiple access (NOMA) can improve spectrum efficiency and reduce channel access delay of haptic devices (Budhiraja et al., 2019). In addition, a grant-based user scheduling mechanism can take 0.3–0.4 ms for exchanging the scheduling request and transmission grant (Ji et al., 2018). Besides such delay, the signaling overhead, resulting from network control or grant-based scheduling, reduces the efficiency of data transmission (Ding et al., 2021). Therefore, grant-free user scheduling, which periodically pre-reserves transmission resources, has been exploited to avoid the time-consuming request and grant exchange, and the same resources can be pre-reserved for multiple haptic devices to improve resource utilization (Ali et al., 2021; Gao J. et al., 2021). To reduce the delay due to packet retransmissions, interference in multiple access should be properly managed. In grant-free NOMA, the interference can be managed by device activity detection (Ye et al., 2019) and successive interference cancellation (SIC) (Abbas et al., 2019). Rate-splitting multiple access (RSMA) encodes message streams intended for multiple devices into common streams and private streams based on available channel state information (CSI), and a device jointly decodes the common streams and the private stream intended for it, which can achieve flexible interference management and high robustness to imperfect CSI (Dizdar et al., 2020).
For improving the communication reliability of haptic communication, several approaches have been adopted in the literature. First, considering the small size of a haptic data packet, short block-length channel codes with strong error correction capabilities, such as low-density parity-check (LDPC) codes and short polar codes, have been investigated for haptic communication (Miloslavskaya and Vucetic, 2020; Yuan et al., 2022). Second, spatial diversity can be exploited by massive multiple-input and multiple-output (MIMO), IRS, and multi-connectivity techniques (Tarneberg et al., 2017; Tang et al., 2020; Anwar et al., 2021). Third, time diversity can be exploited by retransmission schemes such as K-repetition, in which a haptic device can automatically transmit K repetitions of a packet over consecutive slots, thereby avoiding the delay caused by waiting for a retransmission request from the receiver (Yang et al., 2021). NOMA can improve retransmission efficiency where the transmit power of a device can be optimized to retransmit the required minimum redundant bits for satisfying the reliability requirement (Kotaba et al., 2019, 2021).
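The reliability gain of K-repetition can be illustrated with a short calculation. The sketch below assumes, as a simplification, that each repetition is lost independently with the same probability; real channels exhibit correlated losses across consecutive slots, so the result is optimistic. The function names and numeric values are hypothetical.

```python
def residual_loss(p_loss, k):
    """Probability that all k repetitions are lost, assuming independent losses."""
    if not 0.0 < p_loss < 1.0:
        raise ValueError("p_loss must lie in (0, 1)")
    if k < 1:
        raise ValueError("k must be at least 1")
    return p_loss ** k


def min_repetitions(p_loss, target_reliability):
    """Smallest K such that 1 - p_loss**K meets the target reliability."""
    if not 0.0 < target_reliability < 1.0:
        raise ValueError("target_reliability must lie in (0, 1)")
    k = 1
    while 1.0 - residual_loss(p_loss, k) < target_reliability:
        k += 1
    return k


if __name__ == "__main__":
    # Hypothetical per-transmission loss probability of 1% and a
    # 99.999% reliability target.
    print(min_repetitions(0.01, 0.99999))
```

The cost of the extra reliability is that each packet occupies K slots, which is one reason the retransmission efficiency improvements mentioned above are of interest.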
To guarantee low delay and high reliability for haptic communication, network slicing, which allows multiple isolated virtual networks to be constructed over a shared physical network infrastructure, has been exploited (Polachan et al., 2020). The perceptual masking phenomenon of haptic information, as introduced in Section 4.2.1, can be exploited to accurately capture the maximum tolerable delay of haptic communication requests, which facilitates resource reservation in the network slice for haptic communication (Ge et al., 2019). For multiple tele-operation slices, diverse stability control capabilities of tele-operators in the presence of delay should be considered for customized transmission resource reservation (Liu S. et al., 2018). Moreover, by exploiting AI-based learning methods, traffic patterns of haptic devices can be accurately captured, and efficient resource reservation can then be facilitated (Shen et al., 2020).
4.2.3. Haptic data prediction
The delay requirement of haptic communication can impose a constraint on the distance between two users. For example, to satisfy a delay requirement of 10 ms, the distance between a transmitter HI and a receiver HI must be smaller than 3,000 km since the propagation speed is upper-bounded by the speed of light. This can create an issue for applications such as VR gaming with haptic interactions of players across continents. In addition, it is impossible to eliminate the loss of data packets or the violation of delay requirement in haptic communication (Aijaz and Sooriyabandara, 2018). To improve user experience considering the above facts, haptic data prediction can be exploited.
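The 3,000 km figure follows from a one-line bound. The calculation assumes that the entire 10 ms budget is spent on one-way propagation and takes the speed of light as $c \approx 3 \times 10^{8}$ m/s:

```latex
d_{\max} = c \,\tau
         = \left(3 \times 10^{8}\ \mathrm{m/s}\right) \times \left(10 \times 10^{-3}\ \mathrm{s}\right)
         = 3 \times 10^{6}\ \mathrm{m}
         = 3{,}000\ \mathrm{km}
```

Any processing, queuing, or transmission delay consumes part of the budget $\tau$, so the achievable distance in practice is smaller than this bound.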
For haptic data prediction, model-based or model-free prediction algorithms can take historical haptic data and other correlated data as the input. In tele-operation, the force feedback from the tele-operator is predicted by evaluating the previous force feedback through an auto-regressive model (Sakr et al., 2007). In the tele-operated needle insertion, the force/torque feedback from the patient is predicted by inputting the force/torque commands of the surgeon to the hidden Markov model (HMM) (Boabang et al., 2020). Audiovisual data collected in the interaction with a surface material are input to a neural network-based semantic learning algorithm to predict the texture of the surface material (Wei et al., 2021).
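As an illustration of model-based prediction from historical haptic data, the sketch below fits a first-order auto-regressive model, $x_t \approx a\,x_{t-1}$, by least squares and extrapolates it a few steps ahead. The model order, the absence of an intercept, and the fitting method are assumptions made here for brevity; the source does not specify the model used in Sakr et al. (2007).

```python
def fit_ar1(series):
    """Least-squares estimate of a in x[t] = a * x[t-1] (no intercept)."""
    if len(series) < 2:
        raise ValueError("need at least two samples")
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    if den == 0:
        raise ValueError("series has no energy to fit")
    return num / den


def predict_ar1(series, steps=1):
    """Predict the sample `steps` ahead of the last observation."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    a = fit_ar1(series)
    return (a ** steps) * series[-1]


if __name__ == "__main__":
    force_feedback = [8.0, 4.0, 2.0, 1.0]  # hypothetical decaying force samples
    print(fit_ar1(force_feedback))
    print(predict_ar1(force_feedback, steps=2))
```

A higher-order model or a learned predictor would replace `fit_ar1` and `predict_ar1` without changing how the prediction is consumed.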
Haptic data can be predicted either at the receiver side or at the transmitter side to compensate for an excessive delay or packet loss. The receiver can predict the haptic data from the transmitter when an excessive delay occurs (Maier and Ebrahimzadeh, 2019). For example, digital twin-based prediction can be used by the receiver for low-latency interactions (El Saddik, 2018). Alternatively, the transmitter can predict its future haptic data and transmit the predicted data to compensate for the transmission delay (Hou et al., 2019). In this case, predicting whether haptic interaction is about to occur can help determine whether the haptic data prediction and the subsequent transmission are necessary (Mondal et al., 2020).
Haptic data prediction algorithms, such as AI-based ones, can be computing-intensive. To this end, they can be implemented using computing resources in the network to satisfy the stringent delay requirements (Simsek et al., 2016; Sukhmani et al., 2018). In a tele-operation scenario, each of the two interacting haptic devices is associated with one edge server which caches the haptic interaction data, trains and implements the LSTM network-based prediction algorithm, and delivers the predicted haptic data to its associated haptic device (Li X. et al., 2021). Furthermore, with close proximity, auxiliary robots can be deployed around haptic devices to implement haptic data prediction and deliver the results to the devices using device-to-device (D2D) communications (Yu et al., 2022).
In addition to compensating for the delay or packet loss, haptic data prediction can be used to reduce the packet rate of haptic data (Antonakoglou et al., 2018). Specifically, the haptic transmitter can implement the haptic data prediction and evaluate the prediction deviation, and only transmit the data when the prediction deviation is higher than the JND of the receiver. If the haptic data has not been transmitted, the receiver can predict it based on the prediction algorithm shared with the transmitter.
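The shared-predictor scheme described above can be sketched as follows. The predictor here is a simple linear extrapolation from the last two reconstructed samples, and the JND is modeled as a fixed Weber fraction of the current sample. Both are assumptions chosen for brevity, and all names are hypothetical. The key point is that the transmitter mirrors the receiver's state, so both sides always compute the same prediction.

```python
def predict(history):
    """Shared predictor: linear extrapolation from the last two reconstructed samples."""
    if len(history) >= 2:
        return 2 * history[-1] - history[-2]
    return history[-1]


def transmit(samples, weber_fraction=0.1):
    """Transmitter side: return {index: value} for the packets actually sent."""
    history = []  # mirror of the signal reconstructed at the receiver
    packets = {}
    for i, x in enumerate(samples):
        if not history:
            send = True  # the first sample is always sent
        else:
            send = abs(x - predict(history)) > weber_fraction * abs(x)
        if send:
            packets[i] = x
            history.append(x)
        else:
            history.append(predict(history))
    return packets


def receive(packets, n):
    """Receiver side: rebuild n samples, predicting whenever no packet arrived."""
    history = []
    for i in range(n):
        history.append(packets[i] if i in packets else predict(history))
    return history


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 4.0, 5.0, 5.0]  # hypothetical haptic samples
    sent = transmit(samples)
    print(sent)
    print(receive(sent, len(samples)))
```

The sketch assumes lossless delivery of the packets that are sent. If a transmitted packet is lost, the two histories diverge, which is consistent with the earlier observation that data reduction raises the reliability requirement.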
4.3. Holographic communication
In holographic communication, users are able to view 3D holograms from different angles, tilts, and positions. As a result, a hologram synthesized with information from more viewpoints can produce more detailed and continuous visual information for users, thereby creating a more realistic immersive experience (Liu et al., 2019a). This however requires the transmission of a large amount of data. The main challenge in holographic communication is its stringent data rate and delay requirements. In this subsection, we focus on potential solutions for tackling this challenge in the aspects of data processing, communication, and networking.
4.3.1. Content selection, compression, and prediction
A high data rate is essential for holographic communication, and the demand for data rate can vary from hundreds of Mbps to several Tbps depending on the type of transmitted data, e.g., volumetric-based or image-based holograms. One way to relax the data rate requirement is to reduce the data size, for example, by transmitting only the most essential parts of a hologram through viewpoint-based content selection in holographic communication (Clemm et al., 2020). Since some parts of the hologram may not be observed depending on the user's viewpoint and position, as well as the presence of obstacles, those parts may not need to be transmitted. However, two issues remain even with the selective transmission. First, for an immersive experience in holographic communication, 6 DoF (yaw, pitch, roll, up/down, left/right, forward/backward) need to be considered when a user views a hologram, which makes content selection based on the user's viewpoint a complex problem. Second, without head-mounted devices such as VR headsets, tracking the position and viewpoint of the user is challenging and requires mechanisms such as full-body tracking (Xu W. et al., 2018) or eye tracking (Zhang X. et al., 2019).
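A highly simplified form of viewpoint-based content selection is to keep only the points of a point cloud that fall inside a viewing cone around the user's gaze direction. The sketch below makes several assumptions that a practical system would not: it ignores occlusion by obstacles, ignores roll, treats the field of view as a circular cone, and takes the user's position and gaze direction as given. All names are hypothetical.

```python
import math


def select_visible(points, eye, view_dir, half_angle_deg):
    """Return the points lying within a cone of the given half-angle around view_dir.

    points:   iterable of (x, y, z) tuples
    eye:      (x, y, z) position of the viewer
    view_dir: (x, y, z) gaze direction (need not be normalized)
    """
    norm = math.sqrt(sum(c * c for c in view_dir))
    if norm == 0:
        raise ValueError("view_dir must be a nonzero vector")
    direction = [c / norm for c in view_dir]
    cos_limit = math.cos(math.radians(half_angle_deg))

    kept = []
    for p in points:
        offset = [p[i] - eye[i] for i in range(3)]
        dist = math.sqrt(sum(c * c for c in offset))
        if dist == 0:
            continue  # point coincides with the eye; direction undefined
        cos_angle = sum(offset[i] * direction[i] for i in range(3)) / dist
        if cos_angle >= cos_limit:
            kept.append(p)
    return kept


if __name__ == "__main__":
    cloud = [(0, 0, 5), (1, 0, 5), (5, 0, 1), (0, 0, -5)]
    print(select_visible(cloud, eye=(0, 0, 0), view_dir=(0, 0, 1), half_angle_deg=45))
```

Handling all 6 DoF and occlusion requires a full view frustum and visibility test, which is part of what makes the selection problem complex.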
Another solution for reducing the required data rate is to apply data compression. For a 2D real-time video, current media codecs can achieve compression ratios from 250:1 to 1,000:1 (Selinis et al., 2020; Essaili et al., 2022). Similarly, format conversion and data compression can be applied to reduce the data size in holographic communication. The authors in Mekuria et al. (2017) propose a lossy real-time color-encoding method by exploiting the inter-frame redundancy of point clouds. Moreover, considering the strong correlation among different views in a hologram, multi-view coding (MVC) for LFV-based streaming is proposed in Xiang et al. (2016), which improves the compression rate by analyzing both horizontal and vertical correlations of images in adjacent angles and tilts. Meanwhile, many efforts have been made by standardization groups for the compression of holograms. For example, the Moving Picture Experts Group (MPEG) defined the video point cloud compression (V-PCC) by converting point clouds into two separate video sequences that capture the geometry and texture information, respectively (Schwarz et al., 2019). The Joint Photographic Experts Group (JPEG) intended to provide a standard representation framework to facilitate the compression of LFV-based or point cloud-based content for holographic communication (Schelkens et al., 2019). Different codecs for hologram compression are evaluated in Amirpour et al. (2021), in which the authors study the compression and restructure of holograms.
Retransmissions due to data packet loss result in additional delay. To avoid the retransmission delay, the lost data packets can be recovered based on predicted data according to historical information of an object such as its trajectory. For example, packets can be recovered from an LSTM-based prediction of human actions and movements in 3D (Liu J. et al., 2016) or a short-term prediction by analyzing the actions, movements, or gestures of users (Manolova et al., 2021). By predicting content, data packets can be generated at the receiver side in the event of packet loss to reduce the delay in holographic communication (Strinati et al., 2019).
4.3.2. Communication and networking solutions
In addition to data processing, some communication and networking solutions have been investigated for satisfying delay and data rate requirements of holographic communication, including computing architecture, transport protocols, and physical layer technologies.
In holographic communication, data captured from different sensors needs to be processed to form a 3D representation of the object, which is then rendered and reconstructed at the receiver side (Javidi et al., 2005). However, the limited computing capability of local devices may lead to a long processing delay due to the high workload of data fusion and rendering (Hu et al., 2017). Cloud computing is introduced to support high computing workloads for data processing in holographic communication. However, transmitting massive data to the cloud may result in a high communication delay (Wang K. et al., 2022), which is not suitable for real-time holographic communication. One promising solution is to offload computing tasks to MEC servers for data processing, since MEC servers possess considerable computing capability and are placed close to users (Gupta et al., 2021). Thanks to network function virtualization (NFV), functions such as data fusion, data compression, and data rendering can be virtualized and flexibly deployed at MEC servers. In this case, captured data from different sources can be aggregated, fused, and synchronized at an MEC server before rendering (Qian et al., 2022). Moreover, a multi-tier computing scheme is proposed for 6G networks, which can be utilized for holographic communication by integrating computing resources at cloud servers, MEC servers, and local devices, to achieve a low delay for data transmission and high computing capacity for data processing with collaboration among different servers (Yang et al., 2018; Wang K. et al., 2022). By integrating computing resources on different tiers, content can be processed at different servers to effectively utilize computing resources, and flexible computing resource management should be developed to facilitate multi-tier computing for holographic communication. 
For example, split rendering is introduced for an MEC server and a local device to cooperatively decode and render holograms according to the content (Essaili et al., 2022).
To satisfy the stringent delay and high reliability requirements of holographic communication, transport layer optimizations are also crucial. Current transport protocols, such as transmission control protocol (TCP) and user datagram protocol (UDP), can hardly satisfy the requirements of holographic communication. To improve the reliability and delay performance in real-time communication, new protocols based on UDP are introduced, such as QUIC, the transport underlying HTTP/3 (Seufert et al., 2019). Currently, the research on QUIC mainly focuses on traditional 2D video streaming services, while QUIC can serve as a potential solution for holographic communication, providing a quality-managed low-delay streaming option (Clemm et al., 2020). Moreover, the transmission of a hologram may consist of multiple substreams corresponding to different viewpoints, while the QoS requirement and the priority of each substream may be different. In this case, the transmission of the most essential substreams needs to be prioritized. To achieve this target, a new transport protocol is designed in Rozen-Schiff et al. (2021) for holographic communication to satisfy different QoS requirements of different flows by providing flow-level granular control. In addition, an adaptive retransmission mechanism based on TCP is designed to reduce retransmissions by analyzing and differentiating packets (Clemm et al., 2020). For example, only important data, such as the data used for rendering the part of the hologram in the center of the user's FoV, will be retransmitted if the related packets are lost.
Finally, physical layer technologies are important for supporting a high data rate for holographic communication. In order to transmit high-resolution LFV-based holograms, holographic communication requires a data rate of several Tbps, which current 5G networks cannot support (David and Berndt, 2018; Shahraki et al., 2021). Featuring higher frequency and larger bandwidth compared with mmWave in 5G, THz communications have the potential to support holographic communication with Tbps-level data rate (Chen et al., 2019; Elayan et al., 2019). To overcome the severe propagation loss of THz communication, dense deployment of access points and extremely narrow beams can be adopted to improve connection density and communication reliability (Zhang Z. et al., 2019). Considering the absorption and reflection properties in the THz regime (Aazhang et al., 2019), the deployment of the THz base stations and the prediction of user motion require further investigation to provide sustainable LoS links for holographic communication (Chaccour et al., 2022). In addition to THz communications, visible light communication (VLC) can provide an alternative solution for holographic communication by providing large available bandwidth (Beysens et al., 2021). Featuring a high transmission data rate (Strinati et al., 2019) and accurate positioning (Li et al., 2015), VLC can potentially support holographic communication as well as user tracking in an indoor environment. The coordination of THz communication and VLC is studied in Wang et al. (2022a) for providing a reliable service with a high data rate.
Table 4 provides a summary of the solutions discussed in this section as well as their limitations or costs.
5. Immersive communication: Open issues and future directions
Despite an increasing number of studies and solutions for supporting XR, haptic communication, and holographic communication, many open issues must be addressed before immersive communications can become widespread. To name a few, synchronization of multi-modal communications, user QoE modeling and enhancement, and intelligent network management for immersive communications remain challenging problems. In this section, we present some major open issues in immersive communications and potential future directions to address these issues.
5.1. Multi-modal communications
While immersive communications have the potential to enhance user engagement and facilitate immersive interactions, effective network resource management for ensuring synchronized multi-modal perception in highly dynamic network environments is an open issue. The synchronization of multi-modal perception consists of two aspects: inter-stream (cross-modal) and intra-stream. First, the transmission of auditory, visual, and haptic data results in multiple data streams that should be synchronized in order to prevent motion sickness. For example, the time interval between perceived visual and tactile movement should not exceed 1 ms (Van Den Berg et al., 2017). Second, to enhance the immersive experience, a data stream can include multiple data substreams corresponding to different sensations, e.g., temperature and pressure, which also need synchronization. Data substreams corresponding to different DoF of an HI should be synchronized to maintain the stable perception of simultaneity, and data substreams transmitted from LIDAR sensors placed at different locations should be synchronized to render a 3D hologram precisely. There are many works that enable either intra-stream or inter-stream synchronization from the perspective of a single network layer (Cizmeci et al., 2017; Zhang et al., 2018). However, in order to synchronize multi-modal perception, both network-related and application-related information is necessary. This is because network resource management for multi-modal communications is affected by not only different data packet formats, data traffic patterns, and QoS requirements, but also different sensitivities of human perception. The cross-layer design of network protocols for multi-modal communications, which can support information sharing among different layers for efficient use of network resources, is a potential solution (Kumar and Muhammad, 2018; She et al., 2020). 
A higher-layer approach to synchronizing multi-modal information can benefit from information on network conditions at lower layers, e.g., adaptively changing the priority of modalities in transport-layer multiplexing according to real-time physical-layer data rates. In addition, lower-layer approaches can take into account application-related information for efficient network resource management, e.g., timely adjusting the amount of radio resources allocated to a user in response to the dynamic sensitivity of the user's perception. Since multi-modal perception data in immersive communications can include personal biometric information of individual users, privacy challenges can arise in the transmission and processing of such data, such as biometric data leakage or profiling (Shen et al., 2021b).
5.2. AI-native immersive communications
AI techniques have demonstrated outstanding performance in identifying data correlations and analyzing device dynamics. As a result, some application functions using AI techniques, i.e., AI-enabled functions, have been developed for exploring unknown device states in immersive communications, such as viewpoint prediction for VR devices and haptic data prediction (Wu et al., 2022). To support the increasing service demands of immersive communications in 6G, AI-enabled functions will be deployed at network servers, i.e., cloud and edge servers (Li M. et al., 2021). Accordingly, the network should support the entire lifecycle of AI for the functions, including data collection, data pre-processing, AI model training, inference, and AI model evaluation. With AI-enabled functions as built-in components for supporting immersive communications, several potential future research directions should be investigated. First, AI-enabled functions can be configured according to network management policies for supporting immersive communications. For example, in haptic communication, the prediction horizon, i.e., the time window for the predicted information, of tactile and kinesthetic information can be adjusted to adapt to real-time network transmission and computing delay, AI-based prediction accuracy, and service reliability requirements. Second, efficient data management schemes can be developed, in which low-signaling-overhead and grant-free network management can be achieved by sharing the data obtained from AI-enabled functions. For example, in VR video delivery, network controllers can use a viewpoint prediction model or results from a viewpoint prediction function and allocate sufficient downlink communication resources to users with highly dynamic viewpoint movements.
Additionally, effective resource management solutions should be developed to support AI model training in real time, so that AI-enabled functions can be updated according to user behavior dynamics, where sufficient network resources should be allocated for supporting data collection and processing at edge and cloud servers. When supporting AI-native immersive communications, essential security issues should be addressed. For example, data poisoning and model poisoning attacks can lead to biased or incorrect results by injecting false samples into the training datasets and by uploading crafted local AI models in federated learning, respectively (Khisamova et al., 2019).
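The prediction-horizon adjustment described for haptic communication can be illustrated with a short sketch. The Python snippet below picks the shortest horizon that both covers the measured transmission-plus-computing delay and satisfies an accuracy target. The function name, the candidate horizons, and the accuracy table are assumptions made for illustration; in practice the accuracy at each horizon would come from evaluating the prediction model.

```python
def select_horizon(candidates_ms, delay_ms, accuracy_at, min_accuracy):
    """Pick the shortest horizon that covers the delay and is accurate enough.

    candidates_ms: horizons the prediction function supports (hypothetical).
    delay_ms:      measured transmission plus computing delay.
    accuracy_at:   mapping from horizon to evaluated prediction accuracy.
    min_accuracy:  accuracy required by the service reliability target.
    Returns None when no candidate satisfies both conditions.
    """
    for horizon in sorted(candidates_ms):
        if horizon >= delay_ms and accuracy_at[horizon] >= min_accuracy:
            return horizon
    return None


# Hypothetical evaluation results: accuracy degrades as the horizon grows.
accuracy_at = {5: 0.99, 10: 0.97, 20: 0.92, 40: 0.80}
candidates_ms = list(accuracy_at)

light_load = select_horizon(candidates_ms, delay_ms=4 + 3,
                            accuracy_at=accuracy_at, min_accuracy=0.95)
heavy_load = select_horizon(candidates_ms, delay_ms=9 + 6,
                            accuracy_at=accuracy_at, min_accuracy=0.95)
```

In this example, a 7 ms delay is covered by the 10 ms horizon, while a 15 ms delay cannot be covered at the required accuracy, so the function returns `None`. Such an outcome could signal that more network resources or a relaxed accuracy target is needed.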
5.3. Time-sensitive and deterministic networking
The existing solutions mentioned in Section 4 can help reduce transmission delay in immersive communications. However, satisfying the stringent delay and reliability requirements of XR, haptic communication, and holographic communication, especially ms-level end-to-end delay, remains a challenge. Fortunately, the ongoing efforts of 3GPP, IEEE, and IETF in supporting time-sensitive networking (TSN) and deterministic networking (DetNet) (Messenger, 2018; Nasrallah et al., 2019) provide solutions to meet the requirements of immersive communications (Rost and Kolding, 2022). The current efforts largely focus on the link and network layers (i.e., layers 2 and 3) and mostly target industrial networks (Rost and Kolding, 2022). Therefore, the corresponding solutions may not be readily applicable to all use cases of immersive communications. Potential future directions of TSN and DetNet for immersive communications include the following. First, a comprehensive solution integrating existing TSN and DetNet designs for delay minimization can be important to immersive communications. For example, the joint design of coordinated sensing/capturing and communication (on the physical layer), traffic shaping and scheduling (on the link layer), flow identification and packet treatment (on the network layer), and viewpoint/haptic data prediction (on the application layer) can help reduce the end-to-end delay in immersive communications. Second, instead of treating different data streams in a multi-modal communication session separately, joint prioritization and resource orchestration for different types of data given their respective delay and jitter requirements is another promising direction. Third, integrating environment-aware and service-oriented network management paradigms can potentially enable TSN and DetNet for immersive communications.
An example is to incorporate adaptive radio access network (RAN) function splitting, network slicing, and AI-driven network management to minimize delay and jitter by customizing for a specific service and adapting to the network environment.
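To make the idea of jointly prioritizing multi-modal data streams concrete, the sketch below places packets of all modalities in a single queue and serves them in earliest-deadline-first (EDF) order. This is a generic, non-preemptive EDF illustration, not a TSN or DetNet mechanism. The fixed per-packet service time, the function name, and the packet values are assumptions.

```python
import heapq


def edf_schedule(packets, service_ms):
    """Serve packets from all modalities in one earliest-deadline-first queue.

    packets:    list of (arrival_ms, deadline_ms, modality) tuples.
    service_ms: assumed fixed transmission time per packet.
    Returns a list of (modality, finish_ms, met_deadline) in serving order.
    """
    pending = sorted(packets)  # ordered by arrival time
    ready = []                 # min-heap keyed by absolute deadline
    now, i, served = 0.0, 0, []
    while i < len(pending) or ready:
        if not ready and pending[i][0] > now:
            now = pending[i][0]  # link idle: jump to the next arrival
        # Admit every packet that has arrived by the current time.
        while i < len(pending) and pending[i][0] <= now:
            arrival, deadline, modality = pending[i]
            heapq.heappush(ready, (deadline, arrival, modality))
            i += 1
        deadline, _, modality = heapq.heappop(ready)
        now += service_ms  # non-preemptive transmission
        served.append((modality, now, now <= deadline))
    return served


# Hypothetical packets: (arrival_ms, absolute deadline_ms, modality).
packets = [
    (0.0, 20.0, "video"),
    (0.0, 2.0, "haptic"),
    (0.5, 10.0, "audio"),
]
served = edf_schedule(packets, service_ms=1.0)
```

Here the haptic packet is sent first, the audio packet that arrives during its transmission overtakes the waiting video packet, and all three deadlines are met. A per-stream first-in-first-out treatment would not exploit the different delay budgets in this way.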
5.4. QoE-oriented networking
While QoS provisioning from a network perspective benefits the transmission of XR content, haptic information, and holograms, as detailed in Section 4, evaluating and guaranteeing individual users' QoE is crucial in providing them with an immersive experience. This is because many factors, besides communication network conditions, can affect user experience in immersive communications, including coding, compression, and human perception. Therefore, QoE-oriented networking from users' perspective is a promising network management paradigm to support immersive communications in the 6G era, including two potential aspects: personalized QoE modeling and QoE-oriented network resource management. First, existing works on immersive communications are limited in personalizing QoE models for individual users. Conventional QoE modeling is based on either subjective tests or objective quality assessments (Tasaka, 2022). The former, conducted in relatively static laboratory environments, is costly and inapplicable in dynamic network environments, whereas the latter, evaluated by empirical human perception models, does not differentiate individual users (Barakabitze et al., 2019; Ruan and Xie, 2021). Finding a way to model personalized QoE while adapting to dynamic network environments remains an open issue. Second, managing network resources to guarantee the QoE of individual users in immersive communications necessitates user-level information. Even if several users request the same service, they may have different resource demands for improving their QoE (Kougioumtzidis et al., 2022). For example, due to the difference in the sensitivity of haptic perception, e.g., reaction time, the haptic sensors of interest and the scan time for each haptic sensor may differ in supporting different users, yielding different communication and computing resource demands (Coutinho and Boukerche, 2022).
In the 6G era, the paradigm of digital twins can be a potential solution for QoE-oriented networking. Specifically, individual users can be characterized by creating user digital twins, including user data profiles that contain extensive well-organized user data, and a variety of digital twin functions that support flexible and customized data collection and analysis (Shen et al., 2021a). Both personalized QoE modeling and QoE-oriented network resource management for immersive communications can benefit from extensive, timely, and fine-grained user-level information (Wang et al., 2021). Although QoE-oriented networking can provide users with immersive experiences based on the preferences and features of individual users, privacy issues, such as unconscionable behavioral profiling and improper uses of the profiles, should be addressed when collecting and processing data with user preference information (Nguyen et al., 2021).
5.5. Collaborative multi-tier computing
Research on multi-tier computing is still at a nascent stage (Yang, 2019). In the 6G era, collaborative multi-tier computing can be a promising computing paradigm by leveraging the various characteristics of computing servers, such as service coverage and resource capacity. There are two research directions to facilitate immersive communications. First, computing tasks corresponding to different steps of immersive communications can be executed on different computing servers. Different steps of immersive communications may have different network resource demands, e.g., I/O-intensive data fusion tasks and CPU-intensive data encoding tasks require different communication and computing resources (Gao H. et al., 2021). Selecting proper computing servers for each step based on the features of computing servers and the resource demands of the step is beneficial for satisfying stringent QoS requirements of immersive communications. Second, context data management across computing servers at different tiers plays an important role in supporting immersive communications. A significant percentage of computing tasks in immersive communications will be stateful, meaning that context data are required during task execution, e.g., volumetric media objects or holograms in rendering (Gao et al., 2022b; Zhou C. et al., 2022). When stateful computing tasks are executed on a computing server, the required context data should be either stored locally on the computing server or downloaded remotely from other computing servers. As a result, managing context data, e.g., selecting proper computing servers from different tiers to proactively store context data based on the computing task arrival and mobility patterns of individual users, will have a significant impact on the performance of immersive communications. 
While collaborative multi-tier computing provides more options for context data management than conventional MEC, the coordination of computing servers at different tiers can significantly complicate the problem of context data management. In addition, establishing reliable trust relationships between computing servers and among computing servers and users, as well as measuring the credibility of users, is an open and important research direction in collaborative multi-tier computing for immersive communications (Shen et al., 2022).
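The first direction above, selecting a computing server for each step according to its resource demands, can be sketched as follows. The snippet estimates the delay of running a task on each server as propagation delay plus input transfer time plus computation time, and picks the minimum. The delay model, the class and function names, and all numeric values are assumptions for illustration. Queueing, dependencies between steps, and context data placement are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    cpu_hz: float    # available computing rate in cycles per second
    link_bps: float  # rate for moving task input to this server
    rtt_s: float     # round-trip propagation delay to this server


@dataclass
class Task:
    name: str
    input_bits: float  # amount of input data to transfer
    cycles: float      # computing cycles required


def estimated_delay(task, server):
    """Simplified delay model: propagation + transfer + computation."""
    return (server.rtt_s
            + task.input_bits / server.link_bps
            + task.cycles / server.cpu_hz)


def place(tasks, servers):
    """Assign each task to the server with the smallest estimated delay."""
    return {
        t.name: min(servers, key=lambda s: estimated_delay(t, s)).name
        for t in tasks
    }


# Hypothetical two-tier setup: a nearby edge server and a larger cloud server.
servers = [
    Server("edge", cpu_hz=10e9, link_bps=1e9, rtt_s=0.002),
    Server("cloud", cpu_hz=100e9, link_bps=100e6, rtt_s=0.02),
]
tasks = [
    Task("data_fusion", input_bits=80e6, cycles=1e9),  # I/O-intensive
    Task("encoding", input_bits=8e6, cycles=50e9),     # CPU-intensive
]
placement = place(tasks, servers)
```

Under these assumed values, the I/O-intensive fusion task is placed at the edge, where transferring its large input is fast, while the CPU-intensive encoding task is placed in the cloud, where the larger computing capacity outweighs the slower link.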
5.6. New network architecture
Network architecture innovation is indispensable for a widespread realization of immersive communications, and innovations building on recent developments for 6G architecture are promising future directions. The need for new architectures manifests in several aspects. First, the computing-intensive nature of immersive communications, rooted in processing and compressing 3D data, predicting viewpoints and haptic data, and reconstructing 3D objects, demands a network architecture with extensive computing resources and reliable computing service provisioning. As a result, a heterogeneous network with multi-tier computing architecture (Yang, 2019; Zhou C. et al., 2022), featuring on-demand and collaborative computing task offloading and scheduling across the network, is important to immersive communications yet open to investigation at the moment. Second, as networks become increasingly complex and the requirements of immersive communications become exceedingly stringent, supporting immersive communications in 6G requires a network architecture with unprecedented scalability, flexibility, and adaptivity. A 6G architecture integrating digital twins, network slicing, and pervasive AI (Shen et al., 2021a) can be a foundation for immersive communications. Third, considering the diverse delay requirements of different XR, haptic communication, and holographic communication use cases, the Open-RAN (O-RAN) architecture featuring real-time, near-real-time, and non-real-time layers can benefit service differentiation in RAN management for immersive communications (Abdalla et al., 2022). Last, considering different user preferences and diverse user devices, a new architecture enabling user-centric networking, such as the everyone-centric architecture in Yang Y. et al. (2022), has the potential to empower immersive communications.
However, as none of the above architectures is developed specifically for immersive communications, new designs and customizations based on them for supporting immersive communications are open for investigation.
6. Conclusion
In this article, we have delved into immersive communications toward 6G and presented a comprehensive review of the related concepts, representative use cases, technical challenges and potential solutions, and future directions. Focusing on XR, haptic communication, and holographic communication, we have illustrated their general procedures, network requirements, and recent developments in the context of a vision for 6G. Despite abundant emerging use cases and exciting recent advancements, we have shown that many challenges are yet to be conquered before the envisioned prosperity of immersive communications can occur. In particular, the exceedingly stringent transmission rate, delay, and reliability requirements, further complicated by the multi-modal and computing-intensive features of immersive communications, indicate the necessity of an unprecedented amount of communication and computing resources as well as novel paradigms such as multi-tier computing and user-centric networking.
To respond to the challenges posed by supporting immersive communications and promote further research, we have presented various solutions and future directions in this survey. From physical-layer technologies such as Terahertz communications to application-layer solutions such as user behavior prediction, advances in each layer will contribute to the realization of immersive communications. Meanwhile, new paradigms envisioned for 6G, such as QoE-oriented networking and AI-native communications, represent promising future directions for researchers in the field to explore.
The paradigm shift to immersive communications is truly exciting and inspiring, especially when viewed in the context of the evolution toward 6G. Many opportunities exist, and more will emerge for researchers and engineers in the fields of communications, networking, and computer science to realize immersive communications. We hope this review inspires further interest among fellow researchers and provides fundamental knowledge on related research, thereby contributing to this much-anticipated paradigm shift and making immersive communications the next reality.
Author contributions
All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.
Funding
This work was financially supported by research grants from the Natural Sciences and Engineering Research Council of Canada.
Acknowledgments
The authors would like to thank Dr. Dongxiao Liu for his helpful comments related to the security and privacy issues in immersive communications.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Footnotes
1. Note that the three forms may co-exist since a use case may involve more than one form, and additional forms of immersive communications may exist or emerge.
2. Haptic communication and the Tactile Internet are related as a service and a medium as in the case of voice over IP (VoIP) services and the Internet (Aijaz et al., 2016).
3. In unilateral haptic communication, either the step of haptic information acquisition or the step of haptic display is skipped depending on whether an HI is sending or receiving haptic information.
4. Note that the term “holographic communication” is also used in the literature of massive MIMO and IRS but with a different and unrelated meaning (Dardari and Decarli, 2021).
References
Aazhang, B., Ahokangas, P., Alves, H., Alouini, M.-S., Beek, J., Benn, H., et al. (2019). Key Drivers and Research Challenges for 6G Ubiquitous Wireless Intelligence (White Paper). Oulu: 6G Flagship; University of Oulu.
Abari, O., Bharadia, D., Duffield, A., and Katabi, D. (2016). “Cutting the cord in virtual reality,” in Proceedings of the 15th ACM Workshop on Hot Topics in Networks (Atlanta, GA: ACM), 162–168.
Abbas, R., Shirvanimoghaddam, M., Li, Y., and Vucetic, B. (2019). A novel analytical framework for massive grant-free NOMA. IEEE Trans. Commun. 67, 2436–2449. doi: 10.1109/TCOMM.2018.2881120
Abdalla, A. S., Upadhyaya, P. S., Shah, V. K., and Marojevic, V. (2022). Toward next generation open radio access networks-what O-RAN can and cannot do! IEEE Netw. doi: 10.1109/MNET.108.2100659
Abiri, A., Pensa, J., Tao, A., Ma, J., Juo, Y.-Y., Askari, S. J., et al. (2019). Multi-modal haptic feedback for grip force reduction in robotic surgery. Sci. Rep. 9, 5016. doi: 10.1038/s41598-019-40821-1
Affan, M., Khan, H., Ahmed, S., Uddin, R., and Shirazi, M. A. (2021). “Haptic-enabled virtual laboratory for hands-on E-learning: a technology for and beyond the pandemic ERA,” in IEEE International Conference on Robotics and Automation in Industry (Rawalpindi: IEEE), 1–6.
Aheleroff, S., Xu, X., Zhong, R. Y., and Lu, Y. (2021). Digital twin as a service (DTaaS) in industry 4.0: an architecture reference model. Adv. Eng. Inform. 47, 1–15. doi: 10.1016/j.aei.2020.101225
Ahmad, A. S., Alomaier, A. T., Elmahal, D. M., Abdlfatah, R. F., and Ibrahim, D. M. (2021). EduGram: education development based on hologram technology. Int. J. Online Biomed. Eng. 17, 32–49. doi: 10.3991/ijoe.v17i14.27371
Aijaz, A., Dohler, M., Aghvami, A. H., Friderikos, V., and Frodigh, M. (2016). Realizing the tactile Internet: haptic communications over next generation 5G cellular networks. IEEE Wireless Commun. 24, 82–89. doi: 10.1109/MWC.2016.1500157RP
Aijaz, A., and Sooriyabandara, M. (2018). The tactile Internet for industries: a review. Proc. IEEE 107, 414–435. doi: 10.1109/JPROC.2018.2878265
Al-Abbasi, A. O., Aggarwal, V., and Ra, M.-R. (2019). Multi-tier caching analysis in CDN-based over-the-top video streaming systems. IEEE/ACM Trans. Network. 27, 835–847. doi: 10.1109/TNET.2019.2900434
Ali, R., Zikria, Y. B., Bashir, A. K., Garg, S., and Kim, H. S. (2021). URLLC for 5G and beyond: requirements, enabling incumbent technologies and network intelligence. IEEE Access 9, 67064–67095. doi: 10.1109/ACCESS.2021.3073806
Amirpour, H., Pinheiro, A. M., Fonseca, E., Ghanbari, M., and Pereira, M. (2021). Quality evaluation of holographic images coded with standard codecs. IEEE Trans. Multimedia 24, 3256–3264. doi: 10.1109/TMM.2021.3096059
Antonakoglou, K., Xu, X., Steinbach, E., Mahmoodi, T., and Dohler, M. (2018). Toward haptic communications over the 5G tactile Internet. IEEE Commun. Surveys Tutorials 20, 3034–3059. doi: 10.1109/COMST.2018.2851452
Anwar, W., Kumar, A., Franchi, N., and Fettweis, G. (2021). Physical layer abstraction for multi-connectivity communications: modeling and analysis. IEEE Trans. Wireless Commun. 21, 1779–1793. doi: 10.1109/TWC.2021.3106771
Apostolopoulos, J. G., Chou, P. A., Culbertson, B., Kalker, T., Trott, M. D., and Wee, S. (2012). The road to immersive communication. Proc. IEEE 100, 974–990. doi: 10.1109/JPROC.2011.2182069
Baik, S., Han, I., Park, J.-M., and Park, J. (2020). “Multi-fingertip vibrotactile array interface for 3D virtual interaction,” in Proceedings of the IEEE Haptics Symposium (Crystal City, VA: IEEE), 898–903.
Barakabitze, A. A., Barman, N., Ahmad, A., Zadtootaghaj, S., Sun, L., Martini, M. G., et al. (2019). QoE management of multimedia streaming services in future networks: a tutorial and survey. IEEE Commun. Surveys Tutorials 22, 526–565. doi: 10.1109/COMST.2019.2958784
Bastug, E., Bennis, M., Medard, M., and Debbah, M. (2017). Toward interconnected virtual reality: opportunities, challenges, and enablers. IEEE Commun. Mag. 55, 110–117. doi: 10.1109/MCOM.2017.1601089
Beysens, J., Wang, Q., Van den Abeele, M., and Pollin, S. (2021). “BlendVLC: a cell-free VLC network architecture empowered by beamspot blending,” in IEEE INFOCOM 2021-IEEE Conference on Computer Communications (Vancouver, BC: IEEE), 1–10.
Billström, O., Cederquist, L., Ewerbring, M., Sandegren, G., and Uddenfeldt, J. (2006). Fifty years with mobile phones - from novelty to no. 1 consumer product. Ericsson Rev. 3, 101–106.
Boabang, F., Glitho, R., Elbiaze, H., Belqami, F., and Alfandi, O. (2020). “A framework for predicting haptic feedback in needle insertion in 5G remote robotic surgery,” in Proceedings of the IEEE 17th Annual Consumer Communications and Networking Conference (Las Vegas, NV: IEEE), 1–6.
Budhiraja, I., Tyagi, S., Tanwar, S., Kumar, N., and Rodrigues, J. J. P. C. (2019). Tactile Internet for smart communities in 5G: an insight for NOMA-based solutions. IEEE Trans. Ind. Inform. 15, 3104–3112. doi: 10.1109/TII.2019.2892763
Carroll, M., and Yildirim, C. (2021). “The effect of body-based haptic feedback on player experience during VR gaming,” in International Conference on Human-Computer Interaction (Springer), 163–171.
Caserman, P., Schmidt, P., Gobel, T., Zinnacker, J., Kecke, A., and Gobel, S. (2022). Impact of full-body avatars in immersive multiplayer virtual reality training for police forces. IEEE Trans. Games 14, 706–714. doi: 10.1109/TG.2022.3148791
Chaccour, C., Soorki, M. N., Saad, W., Bennis, M., and Popovski, P. (2020). “Risk-based optimization of virtual reality over Terahertz reconfigurable intelligent surfaces,” in IEEE International Conference on Communications (Dublin: IEEE), 1–6.
Chaccour, C., Soorki, M. N., Saad, W., Bennis, M., Popovski, P., and Debbah, M. (2022). Seven defining features of terahertz (THz) wireless systems: a fellowship of communication and sensing. IEEE Commun. Surveys Tutorials 24, 967–993. doi: 10.1109/COMST.2022.3143454
Chaudhuri, S., and Bhardwaj, A. (2018). “Predictive sampler design for haptic signals,” in Kinesthetic Perception, ed J. Kacprzyk (Singapore: Springer), 29–53.
Chen, J., Lu, J.-L., and Ochiai, Y. (2022). “Simulation object edge haptic feedback in virtual reality based on dielectric elastomer,” in Proceedings of the International Conference on Human-Computer Interaction (Springer), 3–9.
Chen, K., Kamezaki, M., Katano, T., Ishida, T., Seki, M., Ichiryu, K., et al. (2016). “Analysis of operation strategy in a multi-operator control system for four-arm disaster response robot OCTOPUS,” in IEEE/SICE International Symposium on System Integration (Sapporo: IEEE), 514–519.
Chen, K.-C., Lin, S.-C., Hsiao, J.-H., Liu, C.-H., Molisch, A. F., and Fettweis, G. P. (2021). Wireless networked multirobot systems in smart factories. Proc. IEEE 109, 468–494. doi: 10.1109/JPROC.2020.3033753
Chen, N., Quan, S., Zhang, S., Qian, Z., Jin, Y., Wu, J., et al. (2021). Cuttlefish: Neural configuration adaptation for video analysis in live augmented reality. IEEE Trans. Parallel Distribut. Syst. 32, 830–841. doi: 10.1109/TPDS.2020.3035044
Chen, Z., Ma, X., Zhang, B., Zhang, Y., Niu, Z., Kuang, N., et al. (2019). A survey on Terahertz communications. China Commun. 16, 1–35. doi: 10.23919/JCC.2019.09.001
Cheremkhin, P., and Kurbatova, E. (2019). Wavelet compression of off-axis digital holograms using real/imaginary and amplitude/phase parts. Sci. Rep. 9, 1–13. doi: 10.1038/s41598-019-44119-0
Choi, P. J., Oskouian, R. J., and Tubbs, R. S. (2018). Telesurgery: past, present, and future. Cureus 10, 2716. doi: 10.7759/cureus.2716
Choi, S., and Tan, H. (2004). “Effect of update rate on perceived instability of virtual haptic texture,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (Sendai: IEEE), 1–6.
Chukhno, O., Galinina, O., Andreev, S., Molinaro, A., and Iera, A. (2022). Interplay of user behavior, communication, and computing in immersive reality 6G applications. IEEE Commun. Mag. 1, 1–7. doi: 10.1109/MCOM.009.2200238
Chun, D. M., Karimi, H., and Sañosa, D. J. (2022). Traveling by headset: immersive VR for language learning. CALICO J. 39, 21306. doi: 10.1558/cj.21306
Cizmeci, B., Xu, X., Chaudhari, R., Bachhuber, C., Alt, N., and Steinbach, E. (2017). A multiplexing scheme for multimodal teleoperation. ACM Trans. Multimedia Comput. Commun. Appl. 13, 1–28. doi: 10.1145/3063594
Clemm, A., Vega, M. T., Ravuri, H. K., Wauters, T., and De Turck, F. (2020). Toward truly immersive holographic-type communication: challenges and solutions. IEEE Commun. Mag. 58, 93–99. doi: 10.1109/MCOM.001.1900272
Cobos, S., Ferre, M., Uran, M. S., Ortego, J., and Pena, C. (2008). “Efficient human hand kinematics for manipulation tasks,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (Nice: IEEE), 2246–2251.
Conti, J.-P. (2008). In search of the princess Leia effect. Eng. Technol. 3, 66–69. doi: 10.1049/et:20081613
Coutinho, R. W., and Boukerche, A. (2022). Design of edge computing for 5G-enabled tactile Internet-based industrial applications. IEEE Commun. Mag. 60, 60–66. doi: 10.1109/MCOM.001.21261
Culbertson, H., Schorr, S. B., and Okamura, A. M. (2018). Haptics: the present and future of artificial touch sensation. Ann. Rev. Control Rob. Auton. Syst. 1, 385–409. doi: 10.1146/annurev-control-060117-105043
Dahiya, R. (2019). E-skin: from humanoids to humans [point of view]. Proc. IEEE 107, 247–252. doi: 10.1109/JPROC.2018.2890729
Dahiya, R., Yogeswaran, N., Liu, F., Manjakkal, L., Burdet, E., Hayward, V., et al. (2019). Large-area soft e-skin: the challenges beyond sensor designs. Proc. IEEE 107, 2016–2033. doi: 10.1109/JPROC.2019.2941366
Dai, J., Zhang, Z., Mao, S., and Liu, D. (2020). A view synthesis-based 360° VR caching system over MEC-enabled C-RAN. IEEE Trans. Circ. Syst. Video Technol. 30, 3843–3855. doi: 10.1109/TCSVT.2019.2946755
Dang, T., and Peng, M. (2019). Joint radio communication, caching, and computing design for mobile virtual reality delivery in fog radio access networks. IEEE J. Select. Areas Commun. 37, 1594–1607. doi: 10.1109/JSAC.2019.2916486
Dardari, D., and Decarli, N. (2021). Holographic communication using intelligent surfaces. IEEE Commun. Mag. 59, 35–41. doi: 10.1109/MCOM.001.2001156
David, K., and Berndt, H. (2018). 6G vision and requirements: Is there any need for beyond 5G? IEEE Vehicular Technol. Mag. 13, 72–80. doi: 10.1109/MVT.2018.2848498
del Peral-Rosado, J. A., Raulefs, R., López-Salcedo, J. A., and Seco-Granados, G. (2018). Survey of cellular mobile radio localization methods: from 1G to 5G. IEEE Commun. Surveys Tutorials 20, 1124–1148. doi: 10.1109/COMST.2017.2785181
Deng, C., Fang, X., Han, X., Wang, X., Yan, L., He, R., et al. (2020). IEEE 802.11be Wi-Fi 7: new challenges and opportunities. IEEE Commun. Surveys Tutorials 22, 2136–2166. doi: 10.1109/COMST.2020.3012715
Diana, M., and Marescaux, J. (2015). Robotic surgery. J. Br. Surgery 102, e15-e28. doi: 10.1002/bjs.9711
Ding, J., Nemati, M., Pokhrel, S. R., Park, O.-S., Choi, J., and Adachi, F. (2021). Enabling grant-free URLLC: an overview of principle and enhancements by massive MIMO. IEEE Internet Things J. 9, 384–400. doi: 10.1109/JIOT.2021.3107242
Dizdar, O., Mao, Y., Han, W., and Clerckx, B. (2020). “Rate-splitting multiple access: a new frontier for the PHY layer of 6G,” in 92nd Vehicular Technology Conference (VTC2020-Fall) (Victoria, BC: IEEE), 1–7.
Du, H., Liu, J., Niyato, D., Kang, J., Xiong, Z., Zhang, J., et al. (2022). Attention-aware resource allocation and QoE analysis for metaverse xURLLC services. arXiv preprint arXiv:2208.05438. doi: 10.48550/arXiv.2208.05438
Du, J., Yu, F. R., Lu, G., Wang, J., Jiang, J., and Chu, X. (2020). MEC-assisted immersive VR video streaming over Terahertz wireless networks: a deep reinforcement learning approach. IEEE Internet Things J. 7, 9517–9529. doi: 10.1109/JIOT.2020.3003449
Eckstein, B., Krapp, E., Elsässer, A., and Lugrin, B. (2019). Smart substitutional reality: integrating the smart home into virtual reality. Entertain Comput. 31, 100306. doi: 10.1016/j.entcom.2019.100306
El Rassi, I., and El Rassi, J.-M. (2020). A review of haptic feedback in tele-operated robotic surgery. J. Med. Eng. Technol. 44, 247–254. doi: 10.1080/03091902.2020.1772391
El Saddik, A. (2018). Digital twins: The convergence of multimedia technologies. IEEE Multimedia 25, 87–92. doi: 10.1109/MMUL.2018.023121167
Elayan, H., Amin, O., Shihada, B., Shubair, R. M., and Alouini, M.-S. (2019). Terahertz band: the last piece of RF spectrum puzzle for communication systems. IEEE Open J. Commun. Soc. 1, 1–32. doi: 10.1109/OJCOMS.2019.2953633
Elbamby, M. S., Perfecto, C., Bennis, M., and Doppler, K. (2018a). “Edge computing meets millimeter-wave enabled VR: paving the way to cutting the cord,” in IEEE Wireless Communications and Networking Conference (Barcelona: IEEE), 1–6.
Elbamby, M. S., Perfecto, C., Bennis, M., and Doppler, K. (2018b). Toward low-latency and ultra-reliable virtual reality. IEEE Netw. 32, 78–84. doi: 10.1109/MNET.2018.1700268
Ericsson (2022). 5G Advanced: Evolution Towards 6G. Ericsson White Paper. Available online at: https://www.ericsson.com/49e389/assets/local/reports-papers/white-papers/5g-advanced-evolution-towards-6g.pdf (accessed November 23, 2022).
Essaili, A. E., Thorson, S., Jude, A., Ewert, J. C., Tyudina, N., Caltenco, H., et al. (2022). Holographic Communication in 5G Networks. Ericsson Technology Review. Available online at: https://www.ericsson.com/en/reports-and-papers/ericsson-technology-review/articles/holographic-communication-in-5g-networks (accessed August 25, 2022).
Favalora, G. (2005). Volumetric 3D displays and application infrastructure. Computer 38, 37–44. doi: 10.1109/MC.2005.276
Feng, X., Swaminathan, V., and Wei, S. (2019). Viewport prediction for live 360-degree mobile video streaming using user-content hybrid motion tracking. Proc. ACM Interact. Mobile Wearable Ubiquitous Technol. 3, 1–22. doi: 10.1145/3328914
Feth, D., Tran, B. A., Groten, R., Peer, A., and Buss, M. (2009). “Shared-control paradigms in multi-operator-single-robot teleoperation,” in Human Centered robot Systems: Cognition, Interaction, Technology, eds H. Ritter, G. Sagerer, R. Dillmann, and M. Buss (Berlin; Heidelberg: Springer), 53–62.
Fettweis, G., Boche, H., Wiegand, T., Zielinski, E., Schotten, H., Merz, P., et al. (2014). The Tactile Internet-ITU-T Technology Watch Report. Available online at: https://www.itu.int/dms_pub/itu-t/opb/gen/T-GEN-TWATCH-2014-1-PDF-E.pdf (accessed July 20, 2022).
FG-NET2030 (2020). Representative Use Cases and Key Network Requirements for Network 2030. Available online at: http://handle.itu.int/11.1002/pub/815125f5-en (accessed June 16, 2022).
Fishel, J. A., and Loeb, G. E. (2012). Bayesian exploration for intelligent identification of textures. Front. Neurorob. 6, 4. doi: 10.3389/fnbot.2012.00004
Fitzek, F. H., Li, S.-C., Speidel, S., Strufe, T., Simsek, M., and Reisslein, M. (2021). Tactile Internet: With Human-in-the-Loop. London: Academic Press.
Fowler, M. S., Halprin, A., and Schlichting, J. D. (1986). Back to the future: a model for telecommunications. Federal Commun. Law J. 38, 145–200.
Fratz, M., Seyler, T., Bertz, A., and Carl, D. (2021). Digital holography in production: an overview. Light Adv. Manufact. 2, 283–295. doi: 10.37188/lam.2021.015
Fujimoto, T., Ishibashi, Y., and Sugawara, S. (2008). “Influences of inter-stream synchronization error on collaborative work in haptic and visual environments,” in IEEE Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems (Reno, NV: IEEE), 113–119.
Gao, H., Zhu, X., Guan, Q., Yang, X., Yao, Y., Zeng, W., et al. (2021). cuFSDAF: an enhanced flexible spatiotemporal data fusion algorithm parallelized using graphics processing units. IEEE Trans. Geosci. Remote Sens. 60, 1–16. doi: 10.1109/TGRS.2021.3080384
Gao, J., Zhuang, W., Li, M., Shen, X., and Li, X. (2021). MAC for machine-type communications in industrial IoT–Part I: protocol design and analysis. IEEE Internet Things J. 8, 9945–9957. doi: 10.1109/JIOT.2021.3051181
Gao, Y., Wei, X., Chen, J., and Zhou, L. (2022a). Toward immersive experience: Evaluation for interactive network services. IEEE Netw. 36, 144–150. doi: 10.1109/MNET.121.2100323
Gao, Y., Wu, D., and Zhou, L. (2022b). How to improve immersive experience?. IEEE Trans. Multimedia 1, 1–14. doi: 10.1109/TMM.2022.3199666
Gately, M., Zhai, Y., Yeary, M., Petrich, E., and Sawalha, L. (2011). A three-dimensional swept volume display based on LED arrays. J. Display Technol. 7, 503–514. doi: 10.1109/JDT.2011.2157455
Ge, X., Zhou, R., and Li, Q. (2019). 5G NFV-based tactile Internet for mission-critical IoT services. IEEE Internet Things J. 7, 6150–6163. doi: 10.1109/JIOT.2019.2958063
Gibney, E. (2019). Star wars-style 3D images created from single speck of foam. Nature 575, 272–274. doi: 10.1038/d41586-019-03454-y
Glushakov, M., Zhang, Y., Han, Y., Scargill, T. J., Lan, G., and Gorlatova, M. (2020). “Invited paper: edge-based provisioning of holographic content for contextual and personalized augmented reality,” in Proceedings of the IEEE International Conference on Pervasive Computing and Communications Workshops (Austin, TX: IEEE), 1–6.
Gokhale, V., Kroep, K., Rao, V. S., Verburg, J., and Yechangunja, R. (2020). Tixt: an extensible testbed for tactile Internet communication. IEEE Internet Things Mag. 3, 32–37. doi: 10.1109/IOTM.0001.1900075
Gooday, G. J. (2005). Electrical futures past. Endeavour 29, 150–155. doi: 10.1016/j.endeavour.2005.07.007
Gotsch, D., Zhang, X., Merritt, T., and Vertegaal, R. (2018). “Telehuman2: a cylindrical light field teleconferencing system for life-size 3D human telepresence,” in Proceedings of the CHI Conference on Human Factors in Computing Systems (Montreal, QC: ACM), 1–10.
Gu, Z., Lu, H., Hong, P., and Zhang, Y. (2022). Reliability enhancement for VR delivery in mobile-edge empowered dual-connectivity sub-6 GHz and mmWave HetNets. IEEE Trans. Wireless Commun. 21, 2210–2226. doi: 10.1109/TWC.2021.3110099
Gupta, R., Reebadiya, D., and Tanwar, S. (2021). 6G-enabled edge intelligence for ultra-reliable low latency applications: vision and mission. Comput. Standards Interfaces 77, 103521. doi: 10.1016/j.csi.2021.103521
Gupta, R., Tanwar, S., Tyagi, S., and Kumar, N. (2019). Tactile-Internet-based telesurgery system for healthcare 4.0: An architecture, research challenges, and future directions. IEEE Netw. 33, 22–29. doi: 10.1109/MNET.001.1900063
Han, D.-I., and Jung, T. (2018). “Identifying tourist requirements for mobile AR tourism applications in urban heritage tourism,” in Augmented Reality and Virtual Reality (Manchester: Springer), 3–20.
Han, Y., Niyato, D., Leung, C., Kim, D. I., Zhu, K., Feng, S., et al. (2022). A dynamic hierarchical framework for IoT-assisted digital twin synchronization in the Metaverse. IEEE Internet Things J. 10, 268–284. doi: 10.1109/JIOT.2022.3201082
Harvey, C., Selmanović, E., O'Connor, J., and Chahin, M. (2021). A comparison between expert and beginner learning for motor skill development in a virtual reality serious game. Vis. Comput. 37, 3–17. doi: 10.1007/s00371-019-01702-w
Hashimoto, T., and Ishibashi, Y. (2006). “Group synchronization control over haptic media in a networked real-time game with collaborative work,” in Proceedings of 5th ACM SIGCOMM workshop on Network and System Support for Games (Singapore: ACM), 8-es.
He, J., Qureshi, M. A., Qiu, L., Li, J., Li, F., and Han, L. (2018). “Rubiks: practical 360-degree streaming for smartphones,” in Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services (Munich: ACM), 482–494.
He, L., Liu, K., He, Z., and Cao, L. (2023). Three-dimensional holographic communication system for the metaverse. Opt. Commun. 526, 128894. doi: 10.1016/j.optcom.2022.128894
Heath, A. (2021). Inside Facebook's Metaverse for Work. Available online at: https://www.theverge.com/2021/8/19/22629942/facebook-workrooms-horizon-oculus-vr (accessed November 23, 2022).
Hirayama, R., Martinez Plasencia, D., Masuda, N., and Subramanian, S. (2019). A volumetric display for visual, tactile and audio presentation using acoustic trapping. Nature 575, 320–323. doi: 10.1038/s41586-019-1739-5
Hoggan, E., Anwar, S., and Brewster, S. A. (2007). “Mobile multi-actuator tactile displays,” in Proceedings of the International Workshop on Haptic and Audio Interaction Design (Seoul: Springer), 22–33.
Holland, O., Steinbach, E., Prasad, R. V., Liu, Q., Dawy, Z., Aijaz, A., et al. (2019). The IEEE 1918.1 “tactile Internet” standards working group and its standards. Proc. IEEE 107, 256–279. doi: 10.1109/JPROC.2018.2885541
Hou, X., Dey, S., Zhang, J., and Budagavi, M. (2018). “Predictive view generation to enable mobile 360-degree and VR experiences,” in Proceedings of the 2018 Morning Workshop on Virtual Reality and Augmented Reality Network (Budapest: ACM), 20–26.
Hou, X., Dey, S., Zhang, J., and Budagavi, M. (2021). Predictive adaptive streaming to enable mobile 360-degree and VR experiences. IEEE Trans. Multimedia 23, 716–731. doi: 10.1109/TMM.2020.2987693
Hou, Z., She, C., Li, Y., Zhuo, L., and Vucetic, B. (2019). Prediction and communication co-design for ultra-reliable and low-latency communications. IEEE Trans. Wireless Commun. 19, 1196–1209. doi: 10.1109/ICC.2019.8762045
Hu, F., Deng, Y., and Aghvami, A. H. (2022). Cooperative multigroup broadcast 360° video delivery network: a hierarchical federated deep reinforcement learning approach. IEEE Trans. Wireless Commun. 21, 4009–4024. doi: 10.1109/TWC.2021.3126147
Hu, F., Deng, Y., Saad, W., Bennis, M., and Aghvami, A. H. (2020). Cellular-connected wireless virtual reality: requirements, challenges, and solutions. IEEE Commun. Mag. 58, 105–111. doi: 10.1109/MCOM.001.1900511
Hu, F., Deng, Y., Zhou, H., Jung, T. H., Chae, C.-B., and Aghvami, A. H. (2021). A vision of an XR-aided teleoperation system toward 5G/B5G. IEEE Commun. Mag. 59, 34–40. doi: 10.1109/MCOM.001.2000581
Hu, L., Tian, Y., Yang, J., Taleb, T., Xiang, L., and Hao, Y. (2019). Ready player one: UAV-clustering-based multi-task offloading for vehicular VR/AR gaming. IEEE Netw. 33, 42–48. doi: 10.1109/MNET.2019.1800357
Hu, M., Wang, L., Tan, B., and Jin, S. (2022). Two-Tier 360-degree video delivery control in multiuser immersive communications systems. IEEE Trans. Vehicular Technol. doi: 10.1109/TVT.2022.3219496
Hu, P., Dhelim, S., Ning, H., and Qiu, T. (2017). Survey on fog computing: architecture, key technologies, applications and open issues. J. Netw. Comput. Appl. 98, 27–42. doi: 10.1016/j.jnca.2017.09.002
Huang, Y., Zhu, Y., Qiao, X., Su, X., Dustdar, S., and Zhang, P. (2022). Toward holographic video communications: a promising AI-driven solution. IEEE Commun. Mag. 60, 82–88. doi: 10.1109/MCOM.001.220021
Huawei Technologies Co., Ltd. (2016). Whitepaper on the Bearer Network VR-Oriented Requirement. Available online at: https://www.huawei.com/en/technology-insights/industry-insights/technology/white-papers/whitepaper-on-the-vr-oriented-bearer-network-requirement (accessed August 10, 2022).
Huzaifa, M., Desai, R., Grayson, S., Jiang, X., Jing, Y., Lee, J., et al. (2022). ILLIXR: an open testbed to enable extended reality systems research. IEEE Micro 42, 97–106. doi: 10.1109/MM.2022.3161018
Javidi, B., Ferraro, P., Hong, S.-H., Nicola, S. D., Finizio, A., Alfieri, D., et al. (2005). Three-dimensional image fusion by use of multiwavelength digital holography. Opt. Lett. 30, 144–146. doi: 10.1364/OL.30.000144
Jay, C., Glencross, M., and Hubbold, R. (2007). Modeling the effects of delayed haptic and visual feedback in a collaborative virtual environment. ACM Trans. Comput. Hum. Interact. 14, 8-es. doi: 10.1145/1275511.1275514
Jayasankar, U., Thirumal, V., and Ponnurangam, D. (2021). A survey on data compression techniques: from the perspective of data quality, coding schemes, data type and applications. J. King Saud Univer. Comput. Inf. Sci. 33, 119–140. doi: 10.1016/j.jksuci.2018.05.006
Ji, H., Park, S., Yeo, J., Kim, Y., Lee, J., and Shim, B. (2018). Ultra-reliable and low-latency communications in 5G downlink: physical layer aspects. IEEE Wireless Commun. 25, 124–130. doi: 10.1109/MWC.2018.1700294
Jia, M., and Liang, W. (2018). “Delay-sensitive multiplayer augmented reality game planning in mobile edge computing,” in Proceedings of the 21st ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems (Montreal, QC: ACM), 147–154.
Jiang, W., Han, B., Habibi, M. A., and Schotten, H. D. (2021). The road towards 6G: a comprehensive survey. IEEE Open J. Commun. Soc. 2, 334–366. doi: 10.1109/OJCOMS.2021.3057679
Jones, A., McDowall, I., Yamada, H., Bolas, M., and Debevec, P. (2007). “Rendering for an interactive 360° light field display,” in Proceedings of the International Conference on Computer Graphics and Interactive Techniques (San Diego, CA: ACM), 40-es.
Jumreornvong, O., Yang, E., Race, J., and Appel, J. (2020). Telemedicine and medical education in the age of COVID-19. Acad. Med. 95, 1838–1843. doi: 10.1097/ACM.0000000000003711
Jung, T., tom Dieck, M. C., and Rauschnabel, P. A. (2020). Augmented Reality and Virtual Reality: Changing Realities in a Dynamic World. Cham: Springer.
Kaluschke, M., Yin, M. S., Haddawy, P., Srimaneekarn, N., Saikaew, P., and Zachmann, G. (2021). “A shared haptic virtual environment for dental surgical skill training,” in IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (Lisbon: IEEE), 347–352.
Kavanagh, S., Luxton-Reilly, A., Wuensche, B., and Plimmer, B. (2017). A systematic review of virtual reality in education. Themes Sci. Technol. Educ. 10, 85–119.
Kerrigan, S. (2018). 13 Hologram Projections Around the World that Are More Than Just a Pretty Display. Available online at: https://interestingengineering.com/innovation/13-hologram-projections-around-the-world-that-are-more-than-just-a-pretty-display (accessed June 20, 2022).
Khisamova, Z. I., Begishev, I. R., and Sidorenko, E. L. (2019). Artificial intelligence and problems of ensuring cyber security. Int. J. Cyber Criminol. 13, 564–577. doi: 10.17150/2500-4255.2019.13(4).564-574
Kim, H., Yang, J., Choi, M., Lee, J., Yoon, S., Kim, Y., et al. (2018). “Immersive 360° VR tiled streaming system for esports service,” in Proceedings of the 9th ACM Multimedia Systems Conference (Amsterdam: ACM), 541–544.
Kotaba, R., Manchón, C. N., Balercia, T., and Popovski, P. (2021). How URLLC can benefit from NOMA-based retransmissions. IEEE Trans. Wireless Commun. 20, 1684–1699. doi: 10.1109/TWC.2020.3035517
Kotaba, R., Manchon, C. N., Pratas, N. M. K., Balercia, T., and Popovski, P. (2019). “Improving spectral efficiency in URLLC via NOMA-based retransmissions,” in IEEE International Conference on Communications (ICC) (Shanghai: IEEE), 1–7.
Kougioumtzidis, G., Poulkov, V., Zaharis, Z., and Lazaridis, P. (2022). “QoE assessment aspects for virtual reality and holographic telepresence applications,” in International Conference on Future Access Enablers of Ubiquitous and Intelligent Infrastructures (Springer), 171–180.
Kugler, L. (2021). The state of virtual reality hardware. Commun. ACM. 64, 15–16. doi: 10.1145/3441290
Kumagai, K., Miura, S., and Hayasaki, Y. (2021). Colour volumetric display based on holographic-laser-excited graphics using drawing space separation. Sci. Rep. 11, 1–9. doi: 10.1038/s41598-021-02107-3
Kumar, A., Kumar, S., Kaushik, A., Kumar, A., and Saini, J. (2020). Real time estimation and suppression of hand tremor for surgical robotic applications. Microsyst. Technol. 28, 305–311. doi: 10.1007/s00542-019-04736-1
Kumar, A., and Muhammad, B. (2018). “Multi-sensory framework for holistic communications,” in IEEE International Symposium on Wireless Personal Multimedia Communications (Chiang Rai: IEEE), 399–404.
Kurbatova, E., Cheremkhin, P., Evtikhiev, N., Krasnov, V., and Starikov, S. (2015). Methods of compression of digital holograms. Phys. Procedia 73, 328–332. doi: 10.1016/j.phpro.2015.09.150
Laamarti, F., Eid, M., and El Saddik, A. (2014). An overview of serious games. Int. J. Comput. Games Technol. 2014, 1–15. doi: 10.1155/2014/358152
Lederman, S. J., and Klatzky, R. L. (2009). Haptic perception: a tutorial. Attent. Percept. Psychophys. 71, 1439–1459. doi: 10.3758/APP.71.7.1439
Lee, J., Zhang, X., Park, C. H., and Kim, M. J. (2021). Real-time teleoperation of magnetic force-driven microrobots with 3D haptic force feedback for micro-navigation and micro-transportation. IEEE Rob. Automat. Lett. 6, 1769–1776. doi: 10.1109/LRA.2021.3060708
Leng, J., Sha, W., Wang, B., Zheng, P., Zhuang, C., Liu, Q., et al. (2022). Industry 5.0: prospect and retrospect. J. Manufact. Syst. 65, 279–295. doi: 10.1016/j.jmsy.2022.09.017
Lesniak, K., and Tucker, C. S. (2018). Dynamic rendering of remote indoor environments using real-time point cloud data. J. Comput. Inf. Sci. Eng. 18, 1–11. doi: 10.1115/1.4039472
Li, M., Gao, J., Zhou, C., Shen, X., and Zhuang, W. (2021). Slicing-based artificial intelligence service provisioning on the network edge: balancing AI service performance and resource consumption of data management. IEEE Vehicular Technol. Mag. 16, 16–26. doi: 10.1109/MVT.2021.3114655
Li, T., An, C., Tian, Z., Campbell, A. T., and Zhou, X. (2015). “Human sensing using visible light communication,” in Proceedings of the 21st Annual International Conference on Mobile Computing and Networking (Paris), 331–344.
Li, X., Yuan, Z., Zhao, J., Du, B., Liao, X., and Humar, I. (2021). Edge-learning-enabled realistic touch and stable communication for remote haptic display. IEEE Netw. 35, 141–147. doi: 10.1109/MNET.011.2000255
Liao, S., Wu, J., Li, J., and Konstantin, K. (2021). Information-Centric massive IoT-based ubiquitous connected VR/AR in 6G: a proposed caching consensus approach. IEEE Internet Things J. 8, 5172–5184. doi: 10.1109/JIOT.2020.3030718
Lipton, J. I., Fay, A. J., and Rus, D. (2017). Baxter's homunculus: virtual reality spaces for teleoperation in manufacturing. IEEE Rob. Autom. Lett. 3, 179–186. doi: 10.1109/LRA.2017.2737046
Liu, J., Shahroudy, A., Xu, D., and Wang, G. (2016). “Spatio-temporal LSTM with trust gates for 3D human action recognition,” in European Conference on Computer Vision (Amsterdam: Springer), 816–833.
Liu, Q., Huang, S., Opadere, J., and Han, T. (2018). “An edge network orchestrator for mobile augmented reality,” in IEEE Conference on Computer Communications (Honolulu, HI: IEEE), 756–764.
Liu, S., Li, M., Xu, X., Steinbach, E., and Liu, Q. (2018). “QoE-driven uplink scheduling for haptic communications over 5G enabled tactile Internet,” in Proceedings of the IEEE International Symposium on Haptic, Audio and Visual Environments and Games (Dalian: IEEE), 1–5.
Liu, X., and Deng, Y. (2021). Learning-based prediction, rendering and association optimization for MEC-enabled wireless virtual reality (VR) networks. IEEE Trans. Wireless Commun. 20, 6356–6370. doi: 10.1109/TWC.2021.3073623
Liu, X., Dohler, M., Mahmoodi, T., and Liu, H. (2017). “Challenges and opportunities for designing tactile codecs from audio codecs,” in Proceedings of the European Conference on Networks and Communications (Oulu: IEEE), 1–5.
Liu, X., Kang, S., Plishker, W., Zaki, G., Kane, T. D., and Shekhar, R. (2016). Laparoscopic stereoscopic augmented reality: toward a clinically viable electromagnetic tracking solution. J. Med. Imaging 3, 045001. doi: 10.1117/1.JMI.3.4.045001
Liu, Y., He, W., Wang, Y., and Yang, H. (2019a). Network-assisted neural adaptive naked-eye 3D video streaming over wireless networks. IEEE Access 7, 141363–141373. doi: 10.1109/ACCESS.2019.2944437
Liu, Y., Liu, J., Argyriou, A., and Ci, S. (2019b). MEC-assisted panoramic VR video streaming over millimeter wave mobile networks. IEEE Trans. Multimedia 21, 1302–1316. doi: 10.1109/TMM.2018.2876044
Liu, Y., Zheng, H., Zhao, L., Liu, S., Yao, K., Li, D., et al. (2020). Electronic skin from high-throughput fabrication of intrinsically stretchable lead zirconate titanate elastomer. Research 2020, 1085417. doi: 10.34133/2020/1085417
Liu, Y.-J., Du, H., Niyato, D., Feng, G., Kang, J., and Xiong, Z. (2022). Slicing4Meta: An intelligent integration framework with multi-dimensional network resources for Metaverse-as-a-Service in Web 3.0. ArXiv Preprint. doi: 10.48550/arXiv.2208.06081
Liu, Z., Li, Q., Chen, X., Wu, C., Ishihara, S., Li, J., et al. (2021). Point cloud video streaming: challenges and solutions. IEEE Netw. 35, 202–209. doi: 10.1109/MNET.101.2000364
Maddikunta, P. K. R., Pham, Q.-V., B, P., Deepa, N., Dev, K., Gadekallu, T. R., et al. (2022). Industry 5.0: a survey on enabling technologies and potential applications. J. Ind. Inf. Integrat. 26, 100257. doi: 10.1016/j.jii.2021.100257
Maier, M., and Ebrahimzadeh, A. (2019). Towards immersive tactile Internet experiences: Low-latency FiWi enhanced mobile networks with edge intelligence. J. Opt. Commun. Network. 11, B10-B25. doi: 10.1364/JOCN.11.000B10
Maier, M., Ebrahimzadeh, A., and Chowdhury, M. (2018). The tactile Internet: automation or augmentation of the human? IEEE Access 6, 41607–41618. doi: 10.1109/ACCESS.2018.2861768
Maimone, A., and Wang, J. (2020). Holographic optics for thin and lightweight virtual reality. ACM Trans. Graph. 39, 1–14. doi: 10.1145/3386569.3392416
Makransky, G., and Petersen, G. B. (2021). The cognitive affective model of immersive learning (CAMIL): a theoretical research-based model of learning in immersive virtual reality. Educ. Psychol. Rev. 33, 937–958. doi: 10.1007/s10648-020-09586-2
Mangiante, S., Klas, G., Navon, A., GuanHua, Z., Ran, J., and Silva, M. D. (2017). “VR is on the edge: How to deliver 360° videos in mobile networks,” in Proceedings of the Workshop on Virtual Reality and Augmented Reality Network (Los Angeles, CA: ACM), 30–35.
Manolova, A., Tonchev, K., Poulkov, V., Dixit, S., and Lindgren, P. (2021). Context-aware holographic communication based on semantic knowledge extraction. Wireless Pers. Commun. 120, 2307–2319. doi: 10.1007/s11277-021-08560-7
Mauve, M. (2000). “Consistency in replicated continuous interactive media,” in Proceedings of ACM Conference on Computer Supported Cooperative Work (Philadelphia, PA: ACM), 181–190.
Mehrabi, A., Siekkinen, M., Kämäräinen, T., and Ylä-Jääski, A. (2021). Multi-tier CloudVR: leveraging edge computing in remote rendered virtual reality. ACM Trans. Multimedia Comput. Commun. Appl. 17, 1–24. doi: 10.1145/3429441
Mekuria, R., Blom, K., and Cesar, P. (2017). Design, implementation, and evaluation of a point cloud codec for tele-immersive video. IEEE Trans. Circ. Syst. Video Technol. 27, 828–842. doi: 10.1109/TCSVT.2016.2543039
Messenger, J. L. (2018). Time-sensitive networking: an introduction. IEEE Commun. Stand. Mag. 2, 29–33. doi: 10.1109/MCOMSTD.2018.1700047
Meyer, J., Wilm, T., Fiess, R., Schlebusch, T., Stork, W., and Kasneci, E. (2022). “A holographic single-pixel stereo camera sensor for calibration-free eye-tracking in retinal projection augmented reality glasses,” in Proceedings of the Symposium on Eye Tracking Research and Applications (Seattle, WA: ACM), 1–7.
Miloslavskaya, V., and Vucetic, B. (2020). Design of short polar codes for SCL decoding. IEEE Trans. Commun. 68, 6657–6668. doi: 10.1109/TCOMM.2020.3014946
Mohan, A., Wara, U. U., Shaikh, M. T. A., Rahman, R. M., and Zaidi, Z. A. (2021). Telesurgery and robotics: an improved and efficient era. Cureus 13, 14124. doi: 10.7759/cureus.14124
Mohan, N., Corneo, L., Zavodovski, A., Bayhan, S., Wong, W., and Kangasharju, J. (2020). “Pruning edge research with latency shears,” in Proceedings of the 19th ACM Workshop on Hot Topics in Networks (ACM), 182–189.
Mondal, S., Ruan, L., Maier, M., Larrabeiti, D., Das, G., and Wong, E. (2020). Enabling remote human-to-machine applications with AI-enhanced servers over access networks. IEEE Open J. Commun. Soc. 1, 889–899. doi: 10.1109/OJCOMS.2020.3009023
Montagud, M., Cesar, P., Boronat, F., and Jansen, J. (2018). MediaSync: Handbook on Multimedia Synchronization. Springer.
Morín, D. G., Pérez, P., and Armada, A. G. (2022). Toward the distributed implementation of immersive augmented reality architectures on 5G networks. IEEE Commun. Mag. 60, 46–52. doi: 10.1109/MCOM.001.2100225
Nakamura, T., Yano, T., Watanabe, K., Ishii, Y., Ono, H., Tambata, I., et al. (2019). “360-degree transparent holographic screen display,” in Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference (Los Angeles, CA: ACM), 1–2.
Nasrabadi, A. T., Samiei, A., and Prakash, R. (2020). “Viewport prediction for 360° videos: a clustering approach,” in Proceedings of the 30th ACM Workshop on Network and Operating Systems Support for Digital Audio and Video (Istanbul: ACM), 34–39.
Nasrallah, A., Thyagaturu, A. S., Alharbi, Z., Wang, C., Shao, X., Reisslein, M., et al. (2019). Ultra-Low Latency (ULL) networks: the IEEE TSN and IETF DetNet standards and related 5G ULL research. IEEE Commun. Surveys Tutorials 21, 88–145. doi: 10.1109/COMST.2018.2869350
Next G Alliance (2022). 6G Applications and Use Cases. Available online at: https://www.nextgalliance.org/white_papers/6g-applications-and-use-cases/ (accessed August 16, 2022).
Nguyen, V.-L., Lin, P.-C., Cheng, B.-C., Hwang, R.-H., and Lin, Y.-D. (2021). Security and privacy for 6G: a survey on prospective technologies and challenges. IEEE Commun. Surveys Tutorials 23, 2384–2428. doi: 10.1109/COMST.2021.3108618
O'Malley, M. K., and Gupta, A. (2008). “Haptic interfaces,” in HCI beyond the GUI: Design for haptic, speech, olfactory, and other nontraditional Interfaces, ed P. Kortum (San Francisco, CA: Morgan Kaufmann Publishers), 25–64.
Okamoto, S., Nagano, H., and Yamada, Y. (2012). Psychophysical dimensions of tactile perception of textures. IEEE Trans. Haptics 6, 81–93. doi: 10.1109/TOH.2012.32
Orlosky, J., Kiyokawa, K., and Takemura, H. (2017). Virtual and augmented reality on the 5G highway. J. Inf. Process. 25, 133–141. doi: 10.2197/ipsjjip.25.133
Ornati, M. (2022). “Fashion touch. Surface haptics in fashion E-commerce,” in International Conference on Human Haptic Sensing and Touch Enabled Computer Applications (Hamburg: Springer Nature), 464–467.
Ozioko, O., Karipoth, P., Hersh, M., and Dahiya, R. (2020). Wearable assistive tactile communication interface based on integrated touch sensors and actuators. IEEE Trans. Neural Syst. Rehabil. Eng. 28, 1344–1352. doi: 10.1109/TNSRE.2020.2986222
Pacchierotti, C., Sinclair, S., Solazzi, M., Frisoli, A., Hayward, V., and Prattichizzo, D. (2017). Wearable haptic systems for the fingertip and the hand: taxonomy, review, and perspectives. IEEE Trans. Haptics 10, 580–600. doi: 10.1109/TOH.2017.2689006
Pang, J., Wang, S., Tang, Z., Qin, Y., Tao, X., You, X., et al. (2022). A new 5G radio evolution towards 5G-advanced. Sci. China Inf. Sci. 65, 1–45. doi: 10.1007/s11432-021-3470-1
Patel, R. V., Atashzar, S. F., and Tavakoli, M. (2022). Haptic feedback and force-based teleoperation in surgical robotics. Proc. IEEE 110, 1012–1027. doi: 10.1109/JPROC.2022.3180052
Pellas, N., Dengel, A., and Christopoulos, A. (2020). A scoping review of immersive virtual reality in STEM education. IEEE Trans. Learn. Technol. 13, 748–761. doi: 10.1109/TLT.2020.3019405
Petkov, N., Christoff, N., Manolova, A., Tonchev, K., and Poulkov, V. (2022). “Comparative study of latent-sensitive processing of heterogeneous data in an experimental platform for 3D video holographic communication,” in Proceedings of the Global Conference on Wireless and Optical Technologies (Malaga: IEEE), 1–6.
Polachan, K., Turkovic, B., Prabhakar, T., Singh, C., and Kuipers, F. A. (2020). “Dynamic network slicing for the tactile internet,” in Proceedings of the ACM/IEEE 11th International Conference on Cyber-Physical Systems (Sydney, NSW: IEEE), 129–140.
Promwongsa, N., Ebrahimzadeh, A., Naboulsi, D., Kianpisheh, S., Belqasmi, F., Glitho, R., et al. (2020). A comprehensive survey of the tactile Internet: State-of-the-art and research directions. IEEE Commun. Surveys Tutorials 23, 472–523. doi: 10.1109/COMST.2020.3025995
Qian, P., Huynh, V. S. H., Wang, N., Anmulwar, S., Mi, D., and Tafazolli, R. R. (2022). Remote production for live holographic teleportation applications in 5G networks. IEEE Trans. Broadcast. 68, 451–463. doi: 10.1109/TBC.2022.3161745
Qiao, X., Ren, P., Dustdar, S., and Chen, J. (2018). A new era for Web AR with mobile edge computing. IEEE Internet Comput. 22, 46–55. doi: 10.1109/MIC.2018.043051464
Rega, F., and Saxena, D. (2022). “Free-roam virtual reality: a new avenue for gaming,” in Advances in Augmented Reality and Virtual Reality (Singapore: Springer), 29–34.
Ren, J., He, Y., Huang, G., Yu, G., Cai, Y., and Zhang, Z. (2019). An edge-computing based architecture for mobile augmented reality. IEEE Netw. 33, 162–169. doi: 10.1109/MNET.2018.1800132
Ren, P., Qiao, X., Huang, Y., Liu, L., Dustdar, S., and Chen, J. (2020a). Edge-assisted distributed DNN collaborative computing approach for mobile Web augmented reality in 5G networks. IEEE Netw. 34, 254–261. doi: 10.1109/MNET.011.1900305
Ren, P., Qiao, X., Huang, Y., Liu, L., Pu, C., Dustdar, S., et al. (2020b). Edge AR X5: an edge-assisted multi-user collaborative framework for mobile web augmented reality in 5G and beyond. IEEE Trans. Cloud Comput. 10, 2521–2537. doi: 10.1109/TCC.2020.3046128
Rost, P. M., and Kolding, T. (2022). Performance of integrated 3GPP 5G and IEEE TSN networks. IEEE Commun. Standards Mag. 6, 51–56. doi: 10.1109/MCOMSTD.0001.2000013
Rozen-Schiff, N., Navon, A., Bruckman, L., and Pechtalt, I. (2021). “PRISM based transport: How networks can boost QoS for advanced video services?” in Proceedings of the Workshop on Design, Deployment, and Evaluation of Network-assisted Video Streaming (ACM), 1–7.
Ruan, J., and Xie, D. (2021). A survey on QoE-oriented VR video streaming: some research issues and challenges. Electronics 10, 2155. doi: 10.3390/electronics10172155
Sachs, J., Andersson, L. A., Araújo, J., Curescu, C., Lundsjö, J., Rune, G., et al. (2018). Adaptive 5G low-latency communication for tactile Internet services. Proc. IEEE 107, 325–349. doi: 10.1109/JPROC.2018.2864587
Sahin, E., Stoykova, E., Mäkinen, J., and Gotchev, A. (2021). Computer-generated holograms for 3D imaging: a survey. ACM Comput. Surveys 53, 8444. doi: 10.1145/3378444
Sakr, N., Georganas, N., Zhao, J., and Shen, X. (2007). “Motion and force prediction in haptic media,” in Proceedings of IEEE International Conference on Multimedia and Expo (Beijing: IEEE), 2242–2245.
Schelkens, P., Ebrahimi, T., Gilles, A., Gioia, P., Oh, K.-J., Pereira, F., et al. (2019). JPEG Pleno: Providing representation interoperability for holographic applications and devices. Electron. Telecommun. Res. Inst. J. 41, 93–108. doi: 10.4218/etrij.2018-0509
Schmitz, P., Blut, T., Mattes, C., and Kobbelt, L. (2020). High-fidelity point-based rendering of large-scale 3-D scan datasets. IEEE Comput. Graph. Appl. 40, 19–31. doi: 10.1109/MCG.2020.2974064
Schwarz, S., Preda, M., Baroncini, V., Budagavi, M., Cesar, P., Chou, P. A., et al. (2019). Emerging MPEG standards for point cloud compression. IEEE J. Emerg. Select. Top. Circ. Syst. 9, 133–148. doi: 10.1109/JETCAS.2018.2885981
Selinis, I., Wang, N., Da, B., Yu, D., and Tafazolli, R. (2020). “On the Internet-scale streaming of holographic-type content with assured user quality of experiences,” in IFIP Networking Conference (Paris: IEEE), 136–144.
Seufert, M., Schatz, R., Wehner, N., and Casas, P. (2019). “QUICker or not? -an empirical analysis of QUIC vs TCP for video streaming QoE provisioning,” in Conference on Innovation in Clouds, Internet and Networks and Workshops (Paris: IEEE), 7–12.
Shahbazi, M., Atashzar, S. F., and Patel, R. V. (2018). A systematic review of multilateral teleoperation systems. IEEE Trans. Haptics 11, 338–356. doi: 10.1109/TOH.2018.2818134
Shahraki, A., Abbasi, M., Piran, M., Taherkordi, A., et al. (2021). A comprehensive survey on 6G networks: applications, core services, enabling technologies, and future challenges. arXiv:2101.12475 [cs.NI]. doi: 10.48550/arXiv.2101.12475
She, C., Sun, C., Gu, Z., Li, Y., Yang, C., Poor, H. V., et al. (2020). A tutorial on ultra-reliable and low-latency communications in 6G: integrating domain knowledge into deep learning. ArXiv Preprint. doi: 10.1109/JPROC.2021.3053601
Shen, X., Gao, J., Wu, W., Li, M., Zhou, C., and Zhuang, W. (2021a). Holistic network virtualization and pervasive network intelligence for 6G. IEEE Commun. Surveys Tutorials 24, 1–30. doi: 10.1109/COMST.2021.3135829
Shen, X., Gao, J., Wu, W., Lyu, K., Li, M., Zhuang, W., et al. (2020). AI-assisted network-slicing based next-generation wireless networks. IEEE Open J. Vehicular Technol. 1, 45–66. doi: 10.1109/OJVT.2020.2965100
Shen, X., Huang, C., Liu, D., Xue, L., Zhuang, W., Sun, R., et al. (2021b). Data management for future wireless networks: architecture, privacy preservation, and regulation. IEEE Netw. 35, 8–15. doi: 10.1109/MNET.011.2000666
Shen, X., Liu, D., Huang, C., Xue, L., Yin, H., Zhuang, W., et al. (2022). Blockchain for transparent data management toward 6G. Engineering 8, 74–85. doi: 10.1016/j.eng.2021.10.002
Shimobaba, T., Blinder, D., Birnbaum, T., Hoshi, I., Shiomi, H., Schelkens, P., et al. (2022). Deep-learning computational holography: a review (invited). Front. Photonics 3, 854391. doi: 10.3389/fphot.2022.854391
Siemonsma, S., and Bell, T. (2022). HoloKinect: holographic 3D video conferencing. Sensors 22, 8118. doi: 10.3390/s22218118
Sim, D., Baek, Y., Cho, M., Park, S., Sagar, A. S., and Kim, H. S. (2021). Low-latency haptic open glove for immersive virtual reality interaction. Sensors 21, 3682. doi: 10.3390/s21113682
Simsek, M., Aijaz, A., Dohler, M., Sachs, J., and Fettweis, G. (2016). 5G-enabled tactile Internet. IEEE J. Select. Areas Commun. 34, 460–473. doi: 10.1109/JSAC.2016.2525398
Siriwardhana, Y., Porambage, P., Liyanage, M., and Ylianttila, M. (2021). A survey on mobile augmented reality with 5G mobile edge computing: architectures, applications, and technical aspects. IEEE Commun. Surveys Tutorials 23, 1160–1192. doi: 10.1109/COMST.2021.3061981
Smalley, D., Nygaard, E., Squire, K., Van Wagoner, J., Rasmussen, J., Gneiting, S., et al. (2018). A photophoretic-trap volumetric display. Nature 553, 486–490. doi: 10.1038/nature25176
Son, J., Jang, D., and Ryu, E.-S. (2018). “Implementing motion-constrained tile and viewport extraction for VR streaming,” in Proceedings of the 28th ACM SIGMM Workshop on Network and Operating Systems Support for Digital Audio and Video (Amsterdam: ACM), 61–66.
Song, P., Verinaz-Jadan, H., Howe, C. L., Foust, A. J., and Dragotti, P. L. (2022). Light-field microscopy for the optical imaging of neuronal activity: when model-based methods meet data-driven approaches. IEEE Signal Process Mag. 39, 58–72. doi: 10.1109/MSP.2021.3123557
Speicher, M., Cucerca, S., and Krüger, A. (2017). VRShop: a mobile interactive virtual reality shopping environment combining the benefits of on-and offline shopping. Proc. ACM Interact. Mobile Wearable Ubiquitous Technol. 1, 1–31. doi: 10.1145/3130967
Srinivasan, M. A., and Basdogan, C. (1997). Haptics in virtual environments: taxonomy, research status, and challenges. Comput. Graph. 21, 393–404. doi: 10.1016/S0097-8493(97)00030-7
Stark, M., Pomati, S., D'Ambrosio, A., Giraudi, F., and Gidaro, S. (2015). A new telesurgical platform - preliminary clinical results. Minimally Invasive Therapy Allied Technol. 24, 31–36. doi: 10.3109/13645706.2014.1003945
Steinbach, E., Hirche, S., Ernst, M., Brandi, F., Chaudhari, R., Kammerl, J., et al. (2012). Haptic communications. Proc. IEEE 100, 937–956. doi: 10.1109/JPROC.2011.2182100
Steinbach, E., Hirche, S., Kammerl, J., Vittorias, I., and Chaudhari, R. (2010). Haptic data compression and communication. IEEE Signal Process. Mag. 28, 87–96. doi: 10.1109/MSP.2010.938753
Steinbach, E., Strese, M., Eid, M., Liu, X., Bhardwaj, A., Liu, Q., et al. (2018). Haptic codecs for the tactile Internet. Proc. IEEE 107, 447–470. doi: 10.1109/JPROC.2018.2867835
Strinati, E. C., Barbarossa, S., Gonzalez-Jimenez, J. L., Ktenas, D., Cassiau, N., Maret, L., et al. (2019). 6G: the next frontier: from holographic messaging to artificial intelligence using subterahertz and visible light communication. IEEE Vehicular Technol. Mag. 14, 42–50. doi: 10.1109/MVT.2019.2921162
Sukhmani, S., Sadeghi, M., Erol-Kantarci, M., and El Saddik, A. (2018). Edge caching and computing in 5G for mobile AR/VR and tactile Internet. IEEE Multimedia 26, 21–30. doi: 10.1109/MMUL.2018.2879591
Sun, L., Mao, Y., Zong, T., Liu, Y., and Wang, Y. (2020). “Flocking-based live streaming of 360-degree video,” in Proceedings of the 11th ACM Multimedia Systems Conference (ACM), 26–37.
Sun, Y., Chen, Z., Tao, M., and Liu, H. (2019). Communications, caching, and computing for mobile virtual reality: Modeling and tradeoff. IEEE Trans. Commun. 67, 7573–7586. doi: 10.1109/TCOMM.2019.2920594
Suzuki, S.-N., Kanematsu, H., Barry, D. M., Ogawa, N., Yajima, K., Nakahira, K. T., et al. (2020). Virtual Experiments in metaverse and their applications to collaborative projects: the framework and its significance. Procedia Comput. Sci. 176, 2125–2132. doi: 10.1016/j.procs.2020.09.249
Tahara, T., Quan, X., Otani, R., Takaki, Y., and Matoba, O. (2018). Digital holography and its multidimensional imaging applications: a review. Microscopy 67, 55–67. doi: 10.1093/jmicro/dfy007
Takagi, A., Ganesh, G., Yoshioka, T., Kawato, M., and Burdet, E. (2017). Physically interacting individuals estimate the partner's goal to enhance their movements. Nat. Hum. Behav. 1, 1–6. doi: 10.1038/s41562-017-0054
Taleb, T., Nadir, Z., Flinck, H., and Song, J. (2021). Extremely interactive and low-latency services in 5G and beyond mobile systems. IEEE Commun. Stand. Mag. 5, 114–119. doi: 10.1109/MCOMSTD.001.2000053
Tan, H. Z., Choi, S., Lau, F. W. Y., and Abnousi, F. (2020). Methodology for maximizing information transmission of haptic devices: a survey. Proc. IEEE 108, 945–965. doi: 10.1109/JPROC.2020.2992561
Tanaka, H., and Ohnishi, K. (2009). “Haptic data compression/decompression using DCT for motion copy system,” in Proceedings of the IEEE International Conference on Mechatronics (Malaga: IEEE), 1–6.
Tang, F., Chen, X., Zhao, M., and Kato, N. (2022). The roadmap of communication and networking in 6G for the metaverse. IEEE Wireless Commun. 1, 1–15. doi: 10.1109/MWC.019.2100721
Tang, W., Dai, J. Y., Chen, M. Z., Wong, K.-K., Li, X., Zhao, X., et al. (2020). MIMO transmission through reconfigurable intelligent surface: system design, analysis, and implementation. IEEE J. Select. Areas Commun. 38, 2683–2699. doi: 10.1109/JSAC.2020.3007055
Tarneberg, W., Karaca, M., Robertsson, A., Tufvesson, F., and Kihl, M. (2017). “Utilizing massive MIMO for the tactile Internet: advantages and trade-offs,” in Proceedings of the IEEE International Conference on Sensing, Communication and Networking (San Diego, CA: IEEE), 1–6.
Tasaka, S. (2022). An empirical method for causal inference of constructs for QoE in haptic-audiovisual communications. ACM Trans. Multimedia Comput. Commun. Appl. 18, 1–24. doi: 10.1145/3473986
Tataria, H., Shafi, M., Molisch, A. F., Dohler, M., Sjöland, H., and Tufvesson, F. (2021). 6G wireless systems: vision, requirements, challenges, insights, and opportunities. Proc. IEEE 109, 1166–1199. doi: 10.1109/JPROC.2021.3061701
Thanh, N. T., Jiang, X., Abiko, S., Tsujita, T., Konno, A., and Uchiyama, M. (2012). “Collaborative haptic interaction in virtual environment of multi-operator multi-robot teleoperation systems,” in Proceedings of SICE Annual Conference (Akita: IEEE), 1585–1590.
Van Den Berg, D., Glans, R., De Koning, D., Kuipers, F. A., Lugtenburg, J., Polachan, K., et al. (2017). Challenges in haptic communications over the tactile Internet. IEEE Access 5, 23502–23518. doi: 10.1109/ACCESS.2017.2764181
Velana, M., Sobieraj, S., Digutsch, J., and Rinkenauer, G. (2022). The advances of immersive virtual reality interventions for the enhancement of stress management and relaxation among healthy adults: a systematic review. Appl. Sci. 12, 7309. doi: 10.3390/app12147309
Wang, D., Ohnishi, K., and Xu, W. (2019). Multimodal haptic display for virtual reality: a survey. IEEE Trans. Ind. Electron. 67, 610–623. doi: 10.1109/TIE.2019.2920602
Wang, K., Jin, J., Yang, Y., Zhang, T., Nallanathan, A., Tellambura, C., et al. (2022). Task offloading with multi-tier computing resources in next generation wireless networks. ArXiv Preprint. doi: 10.1109/JSAC.2022.3227102
Wang, X., Liang, C.-J., Menassa, C. C., and Kamat, V. R. (2021). Interactive and immersive process-level digital twin for collaborative human-robot construction work. J. Comput. Civil Eng. 35, 04021023. doi: 10.1061/(ASCE)CP.1943-5487.0000988
Wang, Y., Chen, M., Yang, Z., Saad, W., Luo, T., Cui, S., et al. (2022a). Meta-reinforcement learning for reliable communication in THz/VLC wireless VR networks. IEEE Trans. Wireless Commun. 21, 7778–7793. doi: 10.1109/TWC.2022.3161970
Wang, Y., and Li, Z. (2022). “Living with smell dysfunction: a multi-sensory VR experience,” in ACM SIGGRAPH 2022 Immersive Pavilion (Vancouver, BC: ACM), 1–2.
Wang, Y., Su, Z., Zhang, N., Xing, R., Liu, D., Luan, T. H., et al. (2022b). A survey on metaverse: Fundamentals, security, and privacy. IEEE Commun. Surveys Tutorials. doi: 10.1109/COMST.2022.3202047
Wei, X., Shi, Y., and Zhou, L. (2021). Haptic signal reconstruction for cross-modal communications. IEEE Trans. Multimedia 24, 4514–4525. doi: 10.1109/TMM.2021.3119860
Wei, X., Yao, Y., Wang, H., and Zhou, L. (2022). Perception-aware cross-modal signal reconstruction: from audio-haptic to visual. IEEE Trans. Multimedia. doi: 10.1109/TMM.2022.3194309
Wenzlhuemer, R. (2013). Connecting the Nineteenth-Century World: The Telegraph and Globalization. Cambridge: Cambridge University Press.
Wu, W., Zhou, C., Li, M., Wu, H., Zhou, H., Zhang, N., et al. (2022). AI-native network slicing for 6G networks. IEEE Wireless Commun. 29, 96–103. doi: 10.1109/MWC.001.2100338
Xiang, W., Wang, G., Pickering, M., and Zhang, Y. (2016). Big video data for light-field-based 3D telemedicine. IEEE Netw. 30, 30–38. doi: 10.1109/MNET.2016.7474341
Xie, C., Li, X., Hu, Y., Peng, H., Taylor, M., and Song, S. L. (2021). “Q-VR: system-level design for future mobile collaborative virtual reality,” in Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ACM), 587–599.
Xiong, J., Hsiang, E.-L., He, Z., Zhan, T., and Wu, S.-T. (2021). Augmented reality and virtual reality displays: emerging technologies and future perspectives. Light Sci. Appl. 10, 1–30. doi: 10.1038/s41377-021-00658-8
Xu, M., Ng, W. C., Lim, W. Y. B., Kang, J., Xiong, Z., Niyato, D., et al. (2022). A full dive into realizing the edge-enabled metaverse: visions, enabling technologies, and challenges. IEEE Commun. Surveys Tutorials. doi: 10.1109/COMST.2022.3221119
Xu, W., Chatterjee, A., Zollhöfer, M., Rhodin, H., Mehta, D., Seidel, H.-P., et al. (2018). MonoPerfCap: human performance capture from monocular video. ACM Trans. Graph. 37, 1–15. doi: 10.1145/3181973
Xu, X., Cizmeci, B., Schuwerk, C., and Steinbach, E. (2015). “Haptic data reduction for time-delayed teleoperation using the time domain passivity approach,” in Proceedings of IEEE World Haptics Conference (Evanston, IL: IEEE), 512–518.
Xu, Y., Dong, Y., Wu, J., Sun, Z., Shi, Z., Yu, J., et al. (2018). “Gaze prediction in dynamic 360° immersive videos,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (Salt Lake City, UT: IEEE), 5333–5342.
Yadav, P. K., and Ooi, W. T. (2020). “Tile rate allocation for 360-degree tiled adaptive video streaming,” in Proceedings of the 28th ACM International Conference on Multimedia (Seattle, WA: ACM), 3724–3733.
Yang, L., Dong, H., Alelaiwi, A., and Saddik, A. E. (2016). See in 3D: state of the art of 3D display technologies. Multimed Tools Appl. 75, 17121–17155. doi: 10.1007/s11042-015-2981-y
Yang, L., Wu, D., and Zhou, L. (2021). Heterogeneous stream scheduling for cross-modal transmission. IEEE Trans. Commun. 69, 6037–6049. doi: 10.1109/TCOMM.2021.3086522
Yang, P., Quek, T. Q. S., Chen, J., You, C., and Cao, X. (2022). Feeling of presence maximization: mmWave-enabled virtual reality meets deep reinforcement learning. IEEE Trans. Wireless Commun. 21, 10005–10019. doi: 10.1109/TWC.2022.3181674
Yang, Q., Zhao, Y., Huang, H., Xiong, Z., Kang, J., and Zheng, Z. (2022). Fusing blockchain and AI with metaverse: a survey. IEEE Open J. Comput. Soc. 3, 122–136. doi: 10.1109/OJCS.2022.3188249
Yang, Y. (2019). Multi-tier computing networks for intelligent IoT. Nat. Electron. 2, 4–5. doi: 10.1038/s41928-018-0195-9
Yang, Y., Ma, M., Wu, H., Yu, Q., Zhang, P., You, X., et al. (2022). 6G network AI architecture for everyone-centric customized services. IEEE Netw. doi: 10.1109/MNET.124.2200241
Yang, Y., Wang, K., Zhang, G., Chen, X., Luo, X., and Zhou, M.-T. (2018). MEETS: maximal energy efficient task scheduling in homogeneous fog networks. IEEE Internet Things J. 5, 4076–4087. doi: 10.1109/JIOT.2018.2846644
Yao, R., Heath, T., Davies, A., Forsyth, T., Mitchell, N., and Hoberman, P. (2017). Oculus VR Best Practices Guide. Available online at: http://developer.oculusvr.com/best-practices (accessed July 17, 2022).
Yaqoob, A., Bi, T., and Muntean, G.-M. (2020). A survey on adaptive 360° video streaming: solutions, challenges and opportunities. IEEE Commun. Surveys Tutorials 22, 2801–2838. doi: 10.1109/COMST.2020.3006999
Ye, N., Li, X., Yu, H., Wang, A., Liu, W., and Hou, X. (2019). Deep learning aided grant-free NOMA toward reliable low-latency access in tactile Internet of Things. IEEE Trans. Ind. Inform. 15, 2995–3005. doi: 10.1109/TII.2019.2895086
Yen, S.-C., Fan, C.-L., and Hsu, C.-H. (2019). “Streaming 360° videos to head-mounted virtual reality using DASH over QUIC transport protocol,” in Proceedings of the 24th ACM Workshop on Packet Video (Amherst, MA: ACM), 7–12.
Yokokohji, Y., Hollis, R. L., and Kanade, T. (1996a). “What you can see is what you can feel-development of a visual/haptic interface to virtual environment,” in IEEE 1996 Virtual Reality Annual International Symposium (Santa Clara, CA: IEEE), 46–53.
Yokokohji, Y., Hollis, R. L., Kanade, T., Henmi, K., and Yoshikawa, T. (1996b). “Toward machine mediated training of motor skills. Skill transfer from human to human via virtual environment,” in IEEE International Workshop on Robot and Human Communication (Tsukuba: IEEE), 32–37.
You, Y., and Sung, M. Y. (2008). “Haptic data transmission based on the prediction and compression,” in Proceedings of the IEEE International Conference on Communications (Beijing: IEEE), 1824–1828.
Yrjölä, S., Ahokangas, P., and Matinmikko-Blue, M. (2022). “Visions for 6G futures: a causal layered analysis,” in 2022 Joint European Conference on Networks and Communications and 6G Summit (EuCNC/6G Summit), 535–540.
Yu, B., Cai, Y., Zou, Y., Li, B., and Chen, Y. (2022). Can we improve the information freshness with prediction for cognitive IoT? IEEE Internet Things J. 9, 17577–17591. doi: 10.1109/JIOT.2022.3155717
Yu, K., Gorbachev, G., Eck, U., Pankratz, F., Navab, N., and Roth, D. (2021). Avatars for teleconsultation: effects of avatar embodiment techniques on user perception in 3D asymmetric telepresence. IEEE Trans. Vis. Comput. Graph. 27, 4129–4139. doi: 10.1109/TVCG.2021.3106480
Yu, X., Xie, Z., Yu, Y., Lee, J., Vazquez-Guardado, A., Luan, H., et al. (2019). Skin-integrated wireless haptic interfaces for virtual and augmented reality. Nature 575, 473–479. doi: 10.1038/s41586-019-1687-0
Yuan, Z., Kang, B., Wei, X., and Zhou, L. (2022). Exploring the benefits of cross-modal coding. IEEE Trans. Circ. Syst. Video Technol. 32, 8781–8794. doi: 10.1109/TCSVT.2022.3196586
Zare, A., Aminlou, A., Hannuksela, M. M., and Gabbouj, M. (2016). “HEVC-compliant tile-based streaming of panoramic video for virtual reality applications,” in Proceedings of the 24th ACM International Conference on Multimedia (Amsterdam: ACM), 601–605.
Zawish, M., Dharejo, F. A., Khowaja, S. A., Dev, K., Davy, S., Qureshi, N. M. F., et al. (2022). AI and 6G into the metaverse: fundamentals, challenges and future research trends. ArXiv Preprint. doi: 10.48550/arXiv.2208.10921
Zeng, C., Zhao, T., Liu, Q., Xu, Y., and Wang, K. (2020). “Perception-lossless codec of haptic data with low delay,” in Proceedings of the 28th ACM International Conference on Multimedia (Seattle, WA: ACM), 3642–3650.
Zhang, Q., Liu, J., and Zhao, G. (2018). Towards 5G enabled tactile robotic telesurgery. ArXiv Preprint. doi: 10.48550/arXiv.1803.03586
Zhang, W., Chen, J., Zhang, Y., and Raychaudhuri, D. (2017). “Towards efficient edge cloud augmentation for virtual reality MMOGs,” in Proceedings of the Second ACM/IEEE Symposium on Edge Computing (ACM), 1–14.
Zhang, X., Sugano, Y., Fritz, M., and Bulling, A. (2019). MPIIGaze: Real-world dataset and deep appearance-based gaze estimation. IEEE Trans. Pattern Anal. Mach. Intell. 41, 162–175. doi: 10.1109/TPAMI.2017.2778103
Zhang, Y., Yang, J., Liu, Z., Wang, R., Chen, G., Tong, X., et al. (2022). VirtualCube: an immersive 3D video communication system. IEEE Trans. Vis. Comput. Graph. 28, 2146–2156. doi: 10.1109/TVCG.2022.3150512
Zhang, Z., Wen, F., Sun, Z., Guo, X., He, T., and Lee, C. (2022). Artificial intelligence-enabled sensing technologies in the 5G/Internet of Things era: from virtual reality/augmented reality to the digital twin. Adv. Intell. Syst. 4, 1–23. doi: 10.1002/aisy.202100228
Zhang, Z., Xiao, Y., Ma, Z., Xiao, M., Ding, Z., Lei, X., et al. (2019). 6G wireless networks: vision, requirements, architecture, and key technologies. IEEE Vehicular Technol. Mag. 14, 28–41. doi: 10.1109/MVT.2019.2921208
Zhou, C., Gao, J., Li, M., Shen, X., and Zhuang, W. (2022). Digital twin-empowered network planning for multi-tier computing. J. Commun. Inf. Netw. 7, 221–238. doi: 10.23919/JCIN.2022.9906937
Zhou, F., Li, W., Yang, Y., Feng, L., Yu, P., Zhao, M., et al. (2022). Intelligence-endogenous networks: Innovative network paradigm for 6G. IEEE Wireless Commun. 29, 40–47. doi: 10.1109/MWC.004.00320
Keywords: 6G networks, immersive communications, extended reality, haptic communication, holographic communication
Citation: Shen X, Gao J, Li M, Zhou C, Hu S, He M and Zhuang W (2023) Toward immersive communications in 6G. Front. Comput. Sci. 4:1068478. doi: 10.3389/fcomp.2022.1068478
Received: 13 October 2022; Accepted: 19 December 2022; Published: 11 January 2023.
Edited by: Chintha Tellambura, University of Alberta, Canada
Reviewed by: Jiasi Zhou, Xuzhou Medical University, China; Dimitris Mourtzis, University of Patras, Greece
Copyright © 2023 Shen, Gao, Li, Zhou, Hu, He and Zhuang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Mushu Li, bXVzaHUxLmxpQHJ5ZXJzb24uY2E=