Vision Res. 2015 Jun;111(Pt B):182-96. doi: 10.1016/j.visres.2014.10.023. Epub 2014 Nov 3.

Active confocal imaging for visual prostheses

Jae-Hyun Jung et al. Vision Res. 2015 Jun.

Abstract

There are encouraging advances in prosthetic vision for the blind, including retinal and cortical implants and other "sensory substitution devices" that use tactile or electrical stimulation. However, they all have low resolution and a limited visual field, and can display only a few gray levels (limited dynamic range), severely restricting their utility. To overcome these limitations, image processing or the imaging system could emphasize objects of interest and suppress background clutter. We propose an active confocal imaging system based on light-field technology that will enable a blind user of any visual prosthesis to efficiently scan, focus on, and "see" only an object of interest while suppressing interference from background clutter. The system captures three-dimensional scene information using a light-field sensor and displays only the in-focus plane and the objects in it. After capturing a confocal image, a de-cluttering process removes the clutter based on blur differences. In preliminary experiments we verified the positive impact of confocal-based background clutter removal on recognition of objects in low-resolution, limited-dynamic-range simulated phosphene images. Using a custom-made multiple-camera system based on light-field imaging, we confirmed that the concept of a confocal de-cluttered image can be realized effectively.
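
To make the described chain concrete, here is a minimal sketch, assuming a confocal image has already been reconstructed at the plane of the object of interest; the function names, the Sobel-based edge filter, and all parameters are illustrative choices, not the authors' implementation:

```python
# Minimal sketch: edge-filter a confocal image, then compress it to a
# prosthesis-like resolution and dynamic range. Illustrative only.
import numpy as np
from scipy.ndimage import sobel, zoom

def declutter_and_compress(confocal_img, out_shape=(25, 38), levels=2):
    """Edge-filter a confocal image (sharp OI, blurred background),
    then compress to low resolution and low dynamic range."""
    img = confocal_img.astype(float)
    # Blurred background clutter yields weak gradients, so an edge filter
    # naturally keeps mostly the in-focus object of interest.
    edges = np.hypot(sobel(img, axis=0), sobel(img, axis=1))
    edges /= edges.max() + 1e-9
    # Downsample to e.g. 38 x 25 = 950 pixels (width x height).
    small = zoom(edges, (out_shape[0] / edges.shape[0],
                         out_shape[1] / edges.shape[1]))
    # Quantize to the prosthesis dynamic range (levels=2 gives a binary image).
    return np.round(np.clip(small, 0, 1) * (levels - 1)) / (levels - 1)
```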

Keywords: Clutter; Confocal imaging; Light-field; Retinal implant; Sensory substitution device; Visual prosthesis.

Figures

Figure 1
Illustration of the proposed removal of background clutter for visual prostheses. (a) A blind person with a visual prosthesis facing a schematic natural three-dimensional (3D) scene that includes a person in front of a tree and a building behind the tree. (b) The overlapping objects at different depths, which clutter each other, are captured by a head-mounted camera. In the color high-resolution image, the overlapping objects of interest (OIs) can easily be separated perceptually. (c) Following image compression into low resolution (about 1,000 pixels), even with 8-bit grayscale, recognition is severely impacted. (d) A compressed binary image (simulated phosphene vision) at the same low resolution makes it difficult, if not impossible, to recognize the objects. (e) If the background clutter is removed using image processing or other imaging technology, only the OI (e.g., the nearest person) remains, thus improving object recognition through the visual prosthesis.
Figure 2
Comparison of the effect of compression (low resolution and dynamic range) on a conventional (wide DOF, a-d) and a confocal (narrow DOF, e-h) image. (a) A cup in front of a complex background (bookshelves), captured by a conventional camera. When converted into low-resolution (38 × 25, 950 pixels), low-dynamic-range images such as (b) 8-level, (c) 4-level, and (d) binary, the background detail of the wide-DOF image clutters the OI more as the dynamic range decreases. (e) With the scene captured by confocal imaging (narrow DOF) at the selected depth plane, only the OI is in focus and the background is naturally suppressed by blur. However, the background suppression in the compressed images (f-h) is not as apparent as in the original image. As the dynamic range decreases, the natural background-suppression effect of confocal imaging is diminished.
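
For readers who want to reproduce the compression in panels (b)-(d), a hedged sketch follows, assuming an 8-bit grayscale input; the resampling filter and function name are illustrative assumptions:

```python
# Downsample to 38 x 25 (950 pixels) and quantize an 8-bit grayscale image
# to 8, 4, or 2 gray levels, as in Fig. 2b-d. Illustrative only.
import numpy as np
from PIL import Image

def compress_gray(img_u8, size=(38, 25), levels=8):
    """Reduce resolution, then reduce dynamic range by uniform quantization."""
    small = np.asarray(Image.fromarray(img_u8).resize(size, Image.BILINEAR))
    step = 256 // levels          # levels=2 reproduces the binary panel (d)
    return (small // step) * step
```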
Figure 3
Illustration of the impact of confocal de-cluttering. The images in Figs. 2a and 2e were processed by edge detection and are shown here in (a) and (c), respectively. Following compression of the image in (a) into the low resolution and dynamic range of visual prostheses, much background detail remains, cluttering the OI in (b) and making recognition difficult. With the confocal de-cluttered image shown in (c), the edge filtering removes the background clutter and leaves only the OI at the selected depth visible, even with compression, as shown in (d). The latter is easier to recognize, especially with regard to the handle of the cup.
Figure 4
The 20 dataset images, taken with non-confocal conventional photography, compressed to 950 pixels (38 × 25) after the edge detection process. Compressing the edge images results in cluttering of objects and disruption of the borders between the OI and the background. To recognize the OI in these images, higher resolution or dynamic range is required.
Figure 5
The confocal de-cluttered versions of the images shown in Fig. 4, compressed in the same way. With the background clutter removed by confocal de-cluttering, at least a few objects can be recognized, even at this resolution.
Figure 6
A background-cluttered image compressed into different resolutions: (a) 96 pixels, (b) 486 pixels, (c) 950 pixels, (d) 3,290 pixels, (e) 6,370 pixels, (f) 17,876 pixels, (g) 40,344 pixels, and (h) 160,884 pixels. As the resolution increases, the cluttering declines and overlapped outlines are separated. However, recognition of the OI remains difficult at least up to the level shown in (e).
Figure 7
The background de-cluttered image compressed into the same resolution levels as in Fig. 6. Overall, the object is easier to locate and recognize in these images than in those shown in Fig. 6. Although the background clutter is removed at all levels, details of this OI are not easily resolved below level (d). Note that zooming in on the object will improve the resolution and enable recognition at higher levels of compression.
Figure 8
The recognition rates of the 20 objects by the 6 subjects as a function of resolution. The recognition rates started to increase rapidly at about 1,000 (10^3) and about 3,100 (10^3.5) pixels in the background de-cluttered and cluttered conditions, respectively. The recognition rate in the background de-cluttered condition was higher than in the background cluttered condition. Weibull psychometric functions were fitted to the data.
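
A minimal sketch of such a Weibull fit is shown below, assuming a zero guess rate; the data arrays are placeholders, not the paper's measurements:

```python
# Fit a Weibull psychometric function to recognition rate vs. pixel count.
import numpy as np
from scipy.optimize import curve_fit

def weibull(x, threshold, slope):
    """Proportion correct vs. resolution x (pixels), zero guess rate."""
    return 1.0 - np.exp(-(x / threshold) ** slope)

pixels = np.array([96, 486, 950, 3290, 6370, 17876, 40344, 160884], float)
rate = np.array([0.0, 0.05, 0.15, 0.45, 0.70, 0.90, 0.95, 1.00])  # placeholder data
popt, _ = curve_fit(weibull, pixels, rate, p0=[3000.0, 1.0])
# 50% point: solve 1 - exp(-(x/t)^k) = 0.5  ->  x = t * ln(2)^(1/k)
print("50%% threshold ~ %.0f pixels" % (popt[0] * np.log(2) ** (1.0 / popt[1])))
```

The 50% threshold reported in Fig. 9 follows from the fitted parameters by solving the psychometric function for a 0.5 recognition rate, as in the last line above.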
Figure 9
The number of pixels required for a 50% recognition rate by each subject under the background cluttered and de-cluttered conditions. Each marker is slightly offset to prevent overlap. The 50% recognition threshold over all subjects' responses is at a resolution of 8,695 pixels with the cluttered background and 3,532 pixels with the de-cluttered background, as illustrated by the gray bars. The dashed line (at 1,500 pixels) indicates the resolution of current and next-generation visual prostheses.
Figure 10
Details of a simulated elemental image (light-field information) shown in two magnified insets. The simulated scene of Fig. 1 was captured by a simulated (computed) light-field camera composed of a 1 mm pitch lens array behind relay optics and in front of a CCD sensor. Each inset shows a magnified 9 × 10 subset of the elemental image. Each subset represents a different perspective view (with low resolution of 10 × 10 in this simulation) captured by a lenslet in a different position. The total light-field image contains the full 3D information of the scene.
Figure 11
Confocal images (308 × 385) at different depth planes, generated from a simulated elemental image frame obtained computationally (Fig. 10) from the simulated 3-plane scene of Fig. 1. (a) The confocal image at the depth plane of the person (1 m), (b) between the person and the tree (2.5 m), (c) at the tree (4 m), (d) between the tree and the building (6.5 m), and (e) at the building (9 m). Animation 1 in the online supplement shows the confocal image sequence being scanned from near to far in depth.
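
Confocal (synthetically refocused) images like these are commonly computed from the elemental images by shifting and summing the perspective views; the sketch below follows that standard approach, with the depth-to-disparity mapping left as a placeholder for the actual optics:

```python
# Shift-and-sum synthetic refocusing over light-field views. Illustrative only.
import numpy as np

def refocus(views, disparity):
    """views maps lenslet grid coords (u, v) to a 2-D view; disparity is the
    pixel shift per unit lenslet offset for the chosen depth plane."""
    acc = None
    for (u, v), img in views.items():
        shifted = np.roll(np.roll(img, int(round(u * disparity)), axis=1),
                          int(round(v * disparity)), axis=0)
        acc = shifted if acc is None else acc + shifted
    # Objects at the chosen depth align and stay sharp; others average to blur.
    return acc / len(views)
```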
Figure 12
Confocal de-cluttered images (308 × 385) at the depth planes shown in Fig. 11, obtained through Sobel edge detection. Note that although there are only 3 objects at different planes in the original simulated scene, additional depth planes between the objects were also selected (in b and d). These intermediate depth planes (b and d) do not provide as good a result as the confocal de-cluttered images at the object planes (a, c, and e). Animation 2 in the supplement shows the confocal de-cluttered image depth sequence obtained from one elemental image frame.
Figure 13
The confocal de-cluttered images of Fig. 12 compressed to fit the limited resolution of a 980-pixel (28 × 35) visual prosthesis. Animation 3 in the supplement shows the compressed confocal de-cluttered images in sequence, obtained from one elemental image frame.
Figure 14
Effect of zooming, using cropping of the high-resolution confocal image before confocal de-cluttering and compression. (a) The OI in the high-resolution confocal image of Fig. 11a, zoomed by cropping and therefore requiring less compression. (b) The confocal de-cluttered zoomed image shows a higher level of detail. (c) With zoom preceding compression, more detail is preserved in the low-resolution compressed image than in the compressed result without zooming (Fig. 13a).
Figure 15
Estimation of object depth planes: the fraction of overlapping (collocated) edge pixels between the edges of the center-view image and the edges in 200 confocal images reconstructed at 30 mm steps. The first maximum, at 0.6 m from the camera, indicates the location of the objects of interest in front (mainly the camera and the mug in Fig. 16a). The next maximum is around 3 m, which is the distance to the background.
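
A hedged reconstruction of this depth-scoring idea follows, with edge_map and the refocus routine standing in for the paper's actual processing:

```python
# Score candidate depth planes by edge overlap with the center view.
import numpy as np
from scipy.ndimage import sobel

def edge_map(img, thresh=0.2):
    """Binary Sobel edge map; the threshold is an illustrative choice."""
    img = img.astype(float)
    g = np.hypot(sobel(img, axis=0), sobel(img, axis=1))
    return g > thresh * (g.max() + 1e-9)

def depth_scores(center_view, views, disparities, refocus):
    """For each candidate depth (e.g., 200 planes, 30 mm apart), compute the
    fraction of center-view edge pixels collocated with edges of the confocal
    image reconstructed at that depth; local maxima mark object planes."""
    center_edges = edge_map(center_view)
    n = max(center_edges.sum(), 1)
    return np.array([np.logical_and(center_edges,
                                    edge_map(refocus(views, d))).sum() / n
                     for d in disparities])
```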
Figure 16
Results of automatic OI depth plane selection with confocal de-cluttering using a light-field setup. Top row (a–c) shows the center view image, together with the edge image and its resolution-compressed version. Middle row (d–f) shows the same images for the confocal image reconstructed at the 0.6 m distance identified by the detection algorithm. The bottom row (g–i) shows the same results for reconstruction at the other local peak distance of 3 m.
Figure 17
Operating modes of active confocal imaging for visual prostheses. (a) Confocal-extension mode. The user, trying to find an object, reaches and touches around the area where the object is expected to be. The system first detects the tip of the finger or cane and sets the focal distance to a predefined distance in front of it. In this mode, users can see farther than the arm or cane length (hence the designation confocal-extension), and we expect this extended search range to reduce search time. (b) Obstacle-avoidance mode, to be used mainly when walking. The system displays only objects that enter a pre-selected distance range and alerts the user when such an object is detected (moving from location A to B in the figure). The included range may be set very narrow or wider. This mode calls attention to obstacles or hazards that are missed by, or not reachable with, the cane. When an obstacle is detected, the user may execute an avoidance maneuver based on the "visual" information displayed.
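
As an illustration of the obstacle-avoidance mode only (not the authors' code), a depth-range filter might look like the sketch below, where the thresholds and the source of the depth map are assumptions:

```python
# Display only content inside a pre-selected depth range and flag an alert
# when enough pixels enter it. All thresholds are illustrative assumptions.
import numpy as np

def obstacle_view(image, depth_map, near_m=0.8, far_m=1.5, min_pixels=50):
    in_range = (depth_map >= near_m) & (depth_map <= far_m)
    alert = in_range.sum() > min_pixels   # an object has entered the range
    return np.where(in_range, image, 0), alert
```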
