End-to-end optimization of prosthetic vision

Jaap de Ruyter van Steveninck et al. J Vis. 2022 Feb 1;22(2):20. doi: 10.1167/jov.22.2.20.

Abstract

Neural prosthetics may provide a promising solution to restore visual perception in some forms of blindness. The restored prosthetic percept is rudimentary compared to normal vision and can be optimized with a variety of image preprocessing techniques to maximize relevant information transfer. Extracting the most useful features from a visual scene is a nontrivial task and optimal preprocessing choices strongly depend on the context. Despite rapid advancements in deep learning, research currently faces a difficult challenge in finding a general and automated preprocessing strategy that can be tailored to specific tasks or user requirements. In this paper, we present a novel deep learning approach that explicitly addresses this issue by optimizing the entire process of phosphene generation in an end-to-end fashion. The proposed model is based on a deep auto-encoder architecture and includes a highly adjustable simulation module of prosthetic vision. In computational validation experiments, we show that such an approach is able to automatically find a task-specific stimulation protocol. The results of these proof-of-principle experiments illustrate the potential of end-to-end optimization for prosthetic vision. The presented approach is highly modular and could be extended to automated dynamic optimization of prosthetic vision for everyday tasks, given any specific constraints, accommodating the individual requirements of the end user.


Figures

Figure 1.
Schematic illustration of a cortical visual neuroprosthesis. The visual environment is captured by a camera and sent to a mobile computer. Electrodes in the brain implant are selectively activated to stimulate neurons in the primary visual cortex (V1). By exploiting the retinotopic organization of V1, a controlled arrangement of phosphenes can be generated to create a meaningful representation of the visual environment.
Figure 2.
Schematic representation of the end-to-end model and its three components. (a) The phosphene encoder finds a stimulation protocol, given an input image. (b) The personalized phosphene simulator maps the stimulation vector into a simulated phosphene vision (SPV) representation. (c) The phosphene decoder receives an SPV image as input and generates a reconstruction of the original image. During training, the reconstruction dissimilarity loss between the reconstructed and original image is backpropagated to the encoder and decoder models. Additional loss components, such as a sparsity loss on the stimulation protocol, can be implemented to train the network for specific constraints.
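
The pipeline in Figure 2 can be summarized in code. The sketch below is a minimal, hypothetical PyTorch rendition of an encoder–simulator–decoder loop; the layer sizes, module names, the Gaussian phosphene rendering, and the plain MSE training step are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class PhospheneEncoder(nn.Module):
    """Maps an input image to a stimulation vector (one value per electrode)."""
    def __init__(self, n_electrodes=650):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
            nn.Linear(32 * 8 * 8, n_electrodes), nn.Sigmoid(),  # activations in [0, 1]
        )

    def forward(self, x):
        return self.net(x)

class PhospheneSimulator(nn.Module):
    """Differentiable, non-trainable renderer: stimulation vector -> simulated phosphene image."""
    def __init__(self, phosphene_maps):
        super().__init__()
        # phosphene_maps: (n_electrodes, H, W) precomputed blobs at retinotopic locations (assumed)
        self.register_buffer("maps", phosphene_maps)

    def forward(self, stim):
        return torch.einsum("be,ehw->bhw", stim, self.maps).unsqueeze(1).clamp(0, 1)

class PhospheneDecoder(nn.Module):
    """Reconstructs the original image from the simulated phosphene percept."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, spv):
        return self.net(spv)

def train_step(encoder, simulator, decoder, image, optimizer):
    """One training step: the reconstruction loss is backpropagated through the
    fixed simulator into both the encoder and the decoder."""
    stim = encoder(image)
    spv = simulator(stim)
    recon = decoder(spv)
    loss = nn.functional.mse_loss(recon, image)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
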
Figure 3.
Results of Experiment 1. The model was trained to minimize mean squared error loss. (a) Training curves indicating the loss on the training and validation datasets during the training procedure. (b) Visualization of the network input (left), the simulated prosthetic vision (middle), and the reconstruction (right).
Figure 4.
Results of Experiment 2. The model was trained on a combination of mean squared error loss and sparsity loss. Thirteen different values for the sparsity weight κ were tested. (a) Visualization of the results for three out of the 13 values for κ. Each row displays the performance metrics for the best-performing model out of five random restarts, and one input image from the validation dataset (left), with the corresponding simulated phosphene representation (middle) and reconstruction (right). (b) Regression plot displaying the sparsity of electrode activation and the reconstruction error in relation to the sparsity weight κ. The red circles indicate the best-performing model for the corresponding sparsity condition, as visualized in panel a.
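
Assuming the sparsity term is an L1 penalty on the stimulation vector (the caption does not specify its exact form), the combined objective of Experiment 2 might look like this:

```python
import torch.nn.functional as F

def combined_loss(recon, target, stim, kappa=0.01):
    """Reconstruction (MSE) loss plus a kappa-weighted sparsity penalty on the
    electrode activations; kappa trades off image fidelity against the
    number/intensity of activated electrodes. The L1 form is an assumption."""
    recon_loss = F.mse_loss(recon, target)
    sparsity_loss = stim.abs().mean()
    return recon_loss + kappa * sparsity_loss
```
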
Figure 5.
Comparison between different values of d for the perceptual reconstruction task that was used in Experiment 3, where d indicates the layer depth for the VGG-based feature loss.
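
A perceptual reconstruction loss of this kind is commonly implemented by comparing feature activations of a fixed, pretrained VGG network. The sketch below assumes VGG16 from torchvision and an illustrative mapping from the depth parameter d to layer cut-offs; the paper's exact loss network and layer choices may differ.

```python
import torch
import torch.nn.functional as F
from torchvision import models

class VGGFeatureLoss(torch.nn.Module):
    """Compares reconstruction and target in VGG16 feature space up to depth d."""
    # Assumed cut-off indices into vgg16.features for d = 1..4 (illustrative choice).
    CUTOFFS = {1: 4, 2: 9, 3: 16, 4: 23}

    def __init__(self, d=3):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features
        self.extractor = torch.nn.Sequential(*list(vgg[: self.CUTOFFS[d]])).eval()
        for p in self.extractor.parameters():
            p.requires_grad = False  # the loss network stays fixed during training

    def forward(self, recon, target):
        # Replicate grayscale inputs to 3 channels for the ImageNet-trained network.
        recon3, target3 = recon.repeat(1, 3, 1, 1), target.repeat(1, 3, 1, 1)
        return F.mse_loss(self.extractor(recon3), self.extractor(target3))
```
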
Figure 6.
Results of Experiment 3. The model was trained on naturalistic stimuli, comparing three reconstruction tasks. (a) Original image. (b) Pixel intensity-based reconstruction task with MSE loss (see Equations 1–3). (c) Perceptual reconstruction task, using VGG feature loss (see Equation 4; d is set equal to 3). (d) Semantic boundary reconstruction task, using weighted BCE loss (see Equation 5) between the reconstruction and the ground-truth semantic boundary label (i.e., a binary, boundary-based version of the ground-truth label from the dataset). (e) Simulated prosthetic percept after conventional image preprocessing with (left) Canny edge detection (Canny, 1986) and (right) holistically nested edge detection (Xie & Tu, 2017).
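
The two ingredients named in panels (d) and (e) could be sketched as follows: a class-weighted binary cross-entropy loss against binary boundary labels, and the conventional Canny baseline. The positive-class weight and the Canny thresholds are assumed values, not taken from the paper.

```python
import cv2
import numpy as np
import torch
import torch.nn.functional as F

def weighted_bce_loss(pred_logits, boundary_target, pos_weight=10.0):
    """Binary cross-entropy on boundary maps; boundary pixels are rare, so they
    are up-weighted (pos_weight is an assumed value). pred_logits are raw,
    unnormalized network outputs."""
    weight = torch.tensor([pos_weight], device=pred_logits.device)
    return F.binary_cross_entropy_with_logits(pred_logits, boundary_target, pos_weight=weight)

def canny_baseline(image_uint8, low=100, high=200):
    """Conventional preprocessing baseline: Canny edge map used as the input to
    the phosphene simulator instead of a learned encoding."""
    edges = cv2.Canny(image_uint8, low, high)
    return edges.astype(np.float32) / 255.0
```
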
Figure 7.
Receiver operating characteristic (ROC) curves for the semantic boundary prediction task in Experiment 3. Our proposed end-to-end method is compared against existing approaches: Canny edge detection (Canny, 1986) and holistically nested edge detection (Xie & Tu, 2017). The specificity (1 − false positive rate), sensitivity, and area under the curve (AUC) of the thresholded predictions are also provided in Table 3.
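
Given flattened predicted boundary probabilities and binary ground-truth labels, the ROC curve and AUC shown in Figure 7 can be computed with scikit-learn; the toy data below is only a placeholder for real boundary maps.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def evaluate_boundary_prediction(y_true, y_score):
    """y_true: flattened binary boundary labels (0/1); y_score: predicted probabilities."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    auc = roc_auc_score(y_true, y_score)
    return fpr, tpr, auc  # sensitivity = tpr, specificity = 1 - fpr at a chosen threshold

# Toy usage with random placeholder data (real inputs would be flattened boundary maps).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_score = np.clip(y_true * 0.6 + rng.random(1000) * 0.4, 0, 1)
fpr, tpr, auc = evaluate_boundary_prediction(y_true, y_score)
print(f"AUC = {auc:.3f}")
```
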
Figure 8.
Results of Experiment 4. The model was trained on naturalistic stimuli with a customized phosphene mapping. (a) Reconstruction performance for the different phosphene resolutions (AUC: area under the receiver operating characteristic curve). (b) Visualization of the phosphene coverage for each resolution (left: 650 phosphenes, middle: 488 phosphenes, right: 325 phosphenes). (c) Validation examples for the training condition with 650 phosphenes.
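
A customized phosphene mapping such as the ones in panel (b) can be represented as one precomputed blob per electrode, which slots into the simulator sketched under Figure 2. The Gaussian shape, image size, and random coordinates below are illustrative assumptions; real mappings would follow the cortical retinotopy.

```python
import torch

def build_phosphene_maps(coords, sigma=2.0, size=128):
    """Precompute one Gaussian blob per electrode at its (assumed) retinotopic
    location; the resulting (n_electrodes, size, size) tensor can be passed to
    the simulator as a customized phosphene mapping."""
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size), indexing="ij")
    maps = []
    for (cx, cy) in coords:
        maps.append(torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2)))
    return torch.stack(maps).float()

# Example: 325 phosphenes at random positions (placeholder coordinates).
coords = torch.randint(0, 128, (325, 2)).tolist()
maps = build_phosphene_maps(coords)
```
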
