Published on 29.01.24 in Vol 1, No 1 (2024): Jan-Dec
Original Paper
What is Diminished Virtuality? A Directional and Layer-Based Taxonomy for the Reality-Virtuality Continuum
ABSTRACT
The concept of the reality-virtuality (RV) continuum was introduced by Paul Milgram and Fumio Kishino in 1994. It describes a spectrum that ranges from a purely physical reality (the real world) to a purely virtual reality (a completely computer-generated environment), with various degrees of mixed reality in between. This continuum is “realized” by different types of displays to encompass different levels of immersion and interaction, allowing for the classification of different types of environments and experiences. What is often overlooked in this concept is the act of diminishing real objects (or persons, animals, etc) from reality (a diminution), rather than augmenting it (an augmentation). Hence, in this contribution, we propose an update or modification of the RV continuum in which the diminished reality aspect is more prominent. We hope this will help users, especially those who are new to the field, gain a better understanding of the entire extended reality (XR) topic, as well as assist in the decision-making for the hardware (devices) and software or algorithms needed for new diminished reality applications. In addition, we propose a more sophisticated, directional and layer-based taxonomy for the RV continuum that we believe goes beyond mediated and multimediated realities. Finally, we raise the question of whether the RV continuum truly ends on one side with physical reality.
JMIR XR Spatial Comput 2024;1:e52904
doi:10.2196/52904
Introduction
The reality-virtuality (RV) continuum is a concept introduced by Paul Milgram and Fumio Kishino [
] in 1994. It describes a spectrum that ranges from a purely physical reality (the real world) to a purely virtual reality (VR; a completely computer-generated environment), with various degrees of mixed reality (MR) in between. This continuum is “realized” by different types of displays [ ] to encompass different levels of immersion and interaction, allowing for the classification of different types of environments and experiences. The RV continuum helps us understand the varying levels of immersion and interactivity that technology can provide. As technology advances, the boundaries between these immersion levels can become more fluid, and new hybrid experiences can emerge. The continuum is particularly relevant in fields such as VR, augmented reality (AR), and MR, where researchers and developers aim to create more compelling and natural experiences that bridge the gap between the physical and virtual worlds. We used ChatGPT (OpenAI) [ ] to gauge the current state of the RV continuum. According to ChatGPT, the continuum is often divided into several main categories (note that we adapted the ChatGPT results and enhanced them with concrete examples, where necessary; [ ]). The original ChatGPT transcript is shown in [ ].
Diminished Reality
What is often overlooked in this concept is the act of diminishing real objects (or persons, animals, etc) from reality, rather than augmenting the reality with virtual things [
, ]. An introduction to the topic can be found in Cheng et al [ ]. One reason for this is that diminishing something from reality generally requires a sophisticated understanding of the real scene or environment to make the diminution convincing. In AR, the real world is simply overwritten with a virtual object. In diminished reality (DR), however, the real-world part that is augmented or diminished needs to fit seamlessly into the reality around it. In addition, this should all be performed in real time while a user is walking around the real world, and an algorithm has to do the following (note that the first 3 items are part of the Extent of World Knowledge axis of the taxonomy by Milgram and Kishino [ ]; a structural sketch follows the list):

- Detect and track the real object that has to be removed or diminished;
- Perform geometric modeling of the scene and objects to be added or subtracted (preexisting or captured once or in real time);
- Apply the lighting model of the scene to objects added or to part of the revealed scene when something is removed (preexisting or captured once or in real time); and then
- Combine all the previous points together as the scene description for the rendering algorithm.
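As a rough illustration of how these 4 steps interlock, the following sketch wires them into a single per-frame pipeline. All class and method names, as well as the placeholder implementations (a fixed bounding box, a mean-color background fill, a global brightness gain), are hypothetical stand-ins for real tracking, reconstruction, and relighting components, not an existing library API:

```python
import numpy as np

class DiminishedRealityPipeline:
    """Illustrative per-frame DR pipeline; each stage is a crude stand-in
    for a real detection/tracking, reconstruction, and relighting module."""

    def diminish(self, frame: np.ndarray) -> np.ndarray:
        # 1. Detect and track the real object that has to be removed.
        mask = self.detect_and_track(frame)
        # 2. Geometric model of the scene behind the object (preexisting or captured).
        background = self.model_background(frame, mask)
        # 3. Apply the scene's lighting model to the revealed background.
        relit = self.apply_lighting(background, frame)
        # 4. Combine everything into the final rendered frame.
        out = frame.copy()
        out[mask] = relit[mask]
        return out

    def detect_and_track(self, frame: np.ndarray) -> np.ndarray:
        mask = np.zeros(frame.shape[:2], dtype=bool)
        mask[100:200, 150:300] = True  # placeholder detection (fixed box)
        return mask

    def model_background(self, frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
        # Crude stand-in: fill with the mean scene color; a real system would
        # use a reconstructed geometric/texture model of the occluded area.
        background = np.empty_like(frame)
        background[:] = frame.mean(axis=(0, 1)).astype(frame.dtype)
        return background

    def apply_lighting(self, background: np.ndarray, frame: np.ndarray) -> np.ndarray:
        # Global brightness match as a stand-in for a full lighting model.
        gain = frame.mean() / max(background.mean(), 1e-6)
        return np.clip(background * gain, 0, 255).astype(frame.dtype)

# Example call on a dummy frame; a real system would run this on every camera frame.
frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
result = DiminishedRealityPipeline().diminish(frame)
```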
All of this has to be done not only in real time but also with very high precision. The inserted virtual object has to fit seamlessly into the real scene and make sense within it; even minor discrepancies appear as glitches and are noticed immediately by the user, as we recently observed in a DR user study [
]. In fact, we think that diminution and augmentation require fundamentally different technologies. In our opinion, an augmentation may be needed to alter reality at a certain position relative to other (real) objects (eg, displaying a patient’s tumor as an AR hologram on the patient in front of you, at its real position, such as for needle guidance [ ]), but no seamless, semantic fitting is necessary. As soon as a virtual object needs to fit into the scene semantically, we consider this to require diminution. Hence, for augmentation, you only need a volume rendering process with some basic options, such as position, size, and transparency. For diminution, however, fundamentally different additional technologies are needed: the scene has to be analyzed and understood, and a meaningful replacement has to be generated and inserted as an AR hologram. An example would be glasses that are removed from a person standing in front of you.
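To make this distinction concrete: in the simplest case, an augmentation of this kind reduces to alpha-blending a pre-rendered overlay into the camera frame at a given position, size, and transparency, without any semantic understanding of the scene. The following is a minimal sketch of that compositing step (the frame, hologram, position, and alpha values are arbitrary illustration values, not taken from an actual system):

```python
import numpy as np

def augment(frame: np.ndarray, overlay: np.ndarray, x: int, y: int,
            alpha: float = 0.6) -> np.ndarray:
    """Alpha-blend a pre-rendered overlay (eg, a tumor hologram slice)
    onto the camera frame at pixel position (x, y)."""
    out = frame.astype(np.float32)
    h, w = overlay.shape[:2]
    roi = out[y:y + h, x:x + w]
    out[y:y + h, x:x + w] = (1 - alpha) * roi + alpha * overlay.astype(np.float32)
    return np.clip(out, 0, 255).astype(frame.dtype)

# Stand-in inputs: a black camera frame and a plain white "hologram" patch.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
hologram = np.full((64, 64, 3), 255, dtype=np.uint8)
augmented = augment(frame, hologram, x=300, y=200)
```

A diminution cannot stop at this point: the removed region would additionally have to be filled with content inferred from the surrounding scene, as sketched later for the face anonymization example.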
In summary, the user has to get the impression that the real, diminished object does not exist in reality at all [ ]. Besides sophisticated algorithms, this requires a considerable amount of computing power. Fortunately, there has been tremendous progress in both areas in recent years, with deep learning–based approaches and GPUs that can run these kinds of algorithms, even in real time. As a result, DR has already found its way into some applications [ ], such as virtual furniture removal for redecorating purposes (eg, IKEA Kreativ [ ]). Other possible applications for DR include the following:

- Privacy enhancing: In a live video feed, certain objects or information can be blurred or removed in real time to protect sensitive or private data (illustrated by the code sketch after this list).
- Training and education: DR can be used to remove distractions in a learning environment or highlight specific items to focus on.
- Therapeutic applications: For someone with a phobia of spiders, a DR system could recognize spiders in the person’s field of view and diminish or replace them with less threatening images to reduce anxiety. Additionally, sensory overload, which is common in autism, could be mitigated with a DR system that diminishes overstimulating elements of the environment.
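As a concrete illustration of the privacy-enhancing use case, the following sketch blurs detected faces in a live webcam feed using OpenCV’s stock Haar cascade face detector. Blurring a bounding box is, of course, only the crude end of DR (closer to the “black bar” idea discussed later than to a semantic removal), and the camera index and detector parameters are assumptions:

```python
import cv2

# Stock OpenCV face detector; a production DR system would use a far more
# robust detector/tracker and a semantic fill instead of a simple blur.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)  # assumed default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
        # Diminish the sensitive region by replacing it with a heavy blur.
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(
            frame[y:y + h, x:x + w], (51, 51), 0)
    cv2.imshow("privacy-enhanced feed", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```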
Directional and Layer-Based Taxonomy
For all of these aforementioned reasons, we think that DR needs to be more prominent on the RV continuum, as shown in
[ ], without delving deeper into the broad topics of mediated reality [ ] or even multimediated reality [ ]. This will not only assist in the decision-making for the hardware (devices) and software needed for new DR applications but also help unfamiliar users gain a better understanding of the entire extended reality (XR) topic (note that we are addressing this revision of the continuum purely from an application or user point of view [POV], not from the POV of an MR researcher or engineer).

An example application for DR could be the real-time anonymization of a face via XR. There is a huge difference between a device that detects the eye region and simply inpaints a black bar over the eyes (without considering the surrounding facial area) and one that inpaints the eyes with different but meaningful ones that fit perfectly to the surrounding facial area. The black bar approach can probably be performed on a current smartphone, whereas the second approach needs much more sophisticated hardware and computing power, with an integrated GPU that can run a trained deep inpainting neural network in real time (note that a user with an XR headset would generally move around, which also changes the POV on the face to be anonymized, so the inpainting algorithm also has to be executed continuously in real time).

In this context, we also think that the upcoming Apple Vision Pro will push the limits in DR, because it is a video-see-through device that can enable DR to reach its full potential [ ]. In fact, the Digital Crown hardware of the Apple Vision Pro, which also exists for the Apple Watch, should enable us to walk seamlessly along the whole RV continuum (back and forth) and bring medical DR applications to reality, which are currently almost nonexistent [ ]. A potential example of the photo-editing capabilities of newer cell phones as a diminution operation is shown in [ ]. In this medical example, DR enables a skin tumor to be virtually removed from a patient’s face before surgery.
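To make the black bar versus inpainting distinction above concrete, the sketch below contrasts the two anonymization strategies with classical OpenCV tools: the black bar variant merely overwrites pixels, whereas even a simple, non-learned inpainting call already has to propagate the surrounding facial texture into the removed region. A real DR system would replace cv2.inpaint with a trained deep inpainting network running continuously per frame; the input file name and eye-region box are placeholder values, not detector output:

```python
import cv2
import numpy as np

def black_bar(face, eye_box):
    """Naive anonymization: overwrite the eye region with a black bar."""
    x, y, w, h = eye_box
    out = face.copy()
    cv2.rectangle(out, (x, y), (x + w, y + h), (0, 0, 0), -1)  # filled rectangle
    return out

def diminish_eyes(face, eye_box):
    """DR-style anonymization: remove the eye region and fill it from the
    surrounding skin texture (classical Telea inpainting as a stand-in for
    a deep inpainting network)."""
    x, y, w, h = eye_box
    mask = np.zeros(face.shape[:2], dtype=np.uint8)
    mask[y:y + h, x:x + w] = 255
    return cv2.inpaint(face, mask, 5, cv2.INPAINT_TELEA)

face = cv2.imread("face.jpg")    # hypothetical input image
eye_box = (80, 60, 160, 40)      # placeholder eye-region box (x, y, w, h)
if face is not None:
    barred = black_bar(face, eye_box)
    diminished = diminish_eyes(face, eye_box)
```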