Simulation of thalamic prosthetic vision: reading accuracy, speed, and acuity in sighted humans

Vurro, Milena; Crowell, Anne Marie; Pezaris, John S.

doi:10.3389/fnhum.2014.00816

ORIGINAL RESEARCH article

Front. Hum. Neurosci., 04 November 2014

Sec. Speech and Language

Volume 8 - 2014 | https://doi.org/10.3389/fnhum.2014.00816

Department of Neurosurgery, Massachusetts General Hospital/Harvard Medical School, Boston, MA, USA

Abstract

The psychophysics of reading with artificial sight has received increasing attention as visual prostheses are becoming a real possibility to restore useful function to the blind through the coarse, pseudo-pixelized vision they generate. Studies to date have focused on simulating retinal and cortical prostheses; here we extend that work to report on thalamic designs. This study examined the reading performance of normally sighted human subjects using a simulation of three thalamic visual prostheses that varied in phosphene count, to help understand the level of functional ability afforded by thalamic designs in a task of daily living. Reading accuracy, reading speed, and reading acuity of 20 subjects were measured as a function of letter size, using a task based on the MNREAD chart. Results showed that fluid reading was feasible with appropriate combinations of letter size and phosphene count, and performance degraded smoothly as font size was decreased, with an approximate doubling of phosphene count resulting in an increase of 0.2 logMAR in acuity. Results here were consistent with previous results from our laboratory. Results were also consistent with those from the literature, despite using naive subjects who were not trained on the simulator, in contrast to other reports.

Introduction

Restoring sight to the blind is a challenge that researchers around the globe have been addressing through a variety of approaches, from genetics (e.g., Acland et al., 2001; Beltran et al., 2012), to replacement surgery (e.g., corneal transplant or keratoprosthesis as in Zerbe et al., 2006) to visual prostheses (reviewed in Mertz, 2012; Ong and da Cruz, 2012) such as the device-based approach our laboratory has been investigating (Pezaris and Reid, 2007, 2009; Pezaris and Eskandar, 2009; Bourkiza et al., 2013; Jeffries et al., 2014). Electrically-based visual prostheses operate on the principle that current passed through an electrode implanted in the early stages of the visual pathway (retina, optic nerve, thalamus or primary visual cortex) evokes the sensation of a spot of light or phosphene. An appropriate set of electrodes, the field thus hypothesizes, could be used essentially as a direct-to-brain display to evoke a more complex visual scene through patterned stimulation, allowing researchers and ultimately physicians to bypass the damaged structures and provide restoration of function. Among such devices, the thalamic visual prostheses proposed by our group have the potential to restore high resolution vision and be applied to a wide variety of causes of blindness, from retinal disease, to cancer, to trauma (Pezaris and Reid, 2007).

Recent reviews have discussed the relative merits of different approaches to device-based artificial vision (e.g., Zhou and Greenberg, 2009; Dagnelie, 2011), including the thalamic approach (Pezaris and Eskandar, 2009) we have been pursuing. The LGN, in particular, provides an excellent target, having a well-understood retinotopic map, functional characteristics that closely match the retinal ganglion cell layer, macroscopic segregation of the magno-, parvo-, and koniocellular pathways, and, thanks to the unrelated field of deep brain stimulation (DBS, reviewed in Bronstein et al., 2011), routine surgical access. This last point bears some amplification: DBS has become a common clinical therapy for treatment of movement disorders such as Parkinsonian tremor, providing symptomatic relief through electrical activation of stimulating electrodes placed in structures that are 1–2 cm from the LGN. Given nearly 100,000 patients with DBS implants worldwide (Tierney et al., 2011), safe and reliable surgical access to the mid-brain for stimulating electrodes is, essentially, a solved problem, overcoming the primary barrier to use of the LGN as a target for artificial vision.

The current report continues our efforts to assess the performance of thalamic visual prosthesis designs that intend to implant electrodes in the dorsal lateral geniculate nucleus of the thalamus (Pezaris and Reid, 2009; Bourkiza et al., 2013). In preceding work, we used a simulation of thalamic vision to investigate the relationship between electrode number and visual acuity using a standardized test in sighted humans (Bourkiza et al., 2013). Here, we use a more advanced simulation to examine the same relationship for reading performance, again employing a standardized test in sighted humans. While a few studies have investigated the interaction of device parameters on reading acuity and speed for retinal prosthetic vision (Cha et al., 1992; Humayun, 2001; Hayes et al., 2003; Sommerhalder et al., 2003, 2004; Pérez Fornos et al., 2005, 2011; Dagnelie et al., 2006; Fu et al., 2006), this is the first such study for thalamic prosthetic vision.

Reading is involved in several fundamental tasks both for work and leisure, and is considered an activity of daily living (Dagnelie, 2008). Reading is a more complex ability than letter recognition alone as it requires the integration of several additional cognitive and perceptual processes. A person who is reading must first visually acquire and process the words presented, then match them to stored semantic representations, and finally combine these representations to create a meaningful sentence. To create a standard measurement of reading ability in normal and low-vision subjects, several reading acuity charts (Legge et al., 1989b; Radner et al., 1998) have been developed, of which the Minnesota Reading Acuity (MNREAD) test is one of the most common (Crossland et al., 2008). Consisting of a series of simple three-line sentences, shown in font sizes that decrease proportionally from one sentence to the next, the MNREAD chart is used to measure reading accuracy (the percentage of correctly recognized words), reading acuity (the smallest size of print that the patient can reliably resolve), reading speed (the number of words per minute that are read correctly), and critical print size (the smallest print that a subject can read while maintaining their maximum reading speed).

Although standard tests have been widely used to evaluate reading ability for retinal diseases (Virgili et al., 2004; Cappello et al., 2009; Uppal et al., 2011), glaucoma (Ramulu et al., 2009, 2012; Burton et al., 2012), and aging (e.g., Sass et al., 2006), only three studies on visual prosthesis reading performance have employed a standard reading chart (Humayun, 2001; Hayes et al., 2003; Fu et al., 2006) and only one the MNREAD standard test procedure (Fu et al., 2006), while the rest have used ad-hoc methods (Cha et al., 1992; Sommerhalder et al., 2003, 2004; Pérez Fornos et al., 2005, 2011; Dagnelie et al., 2006). This is unfortunate as variations in test methodologies make it difficult to accurately compare results between studies (see Discussion). The effects of experimental and methodological variation are underscored by Legge and colleagues who, in the past three decades, have shown that, for normal and low vision, reading performance is affected by pixel density to character size ratio (Legge et al., 1985a), contrast (Legge et al., 1987; Rubin and Legge, 1989), font type (Mansfield et al., 1996), spacing and size (Legge et al., 1985b, 1997), word size (Legge et al., 1997), drifting or static text (Legge et al., 1989b), cognitive content of the text (Legge et al., 1989a) and central or peripheral vision (Legge et al., 2001). Since the artificial vision studies reported in the literature employed different values for the parameters investigated by Legge, the results for simulated prostheses have been influenced not only by the experimental parameters (e.g., number of simulated electrodes, electrode drop outs, or phosphene spacing and simulation characteristics) but also by differences in methodology, including stimulus design.

In this study, we sought to investigate the reading accuracy, speed and acuity of sighted humans in a simulation of thalamic prosthetic vision. To accomplish this, we constructed a real-time simulation of artificial vision including current understanding of what recipients of a future thalamic prosthesis are likely to experience based on previous work from our laboratory (Pezaris and Reid, 2007, 2009; Pezaris and Eskandar, 2009). As in our previous report on simulated thalamic vision (Bourkiza et al., 2013), we base our experimental task on a standardized method, here the MNREAD test, to simplify cross-laboratory comparisons.

We tested three phosphene pattern densities spanning anticipated device complexity, and six font sizes spanning 1.0–1.5 logMAR. Text shown directly on the screen without filtering, designated as clear in the remainder of this paper, was used as a control condition. We hypothesized that reading accuracy for simulated prosthetic vision would be similar to clear-text reading for the largest font size combined with high phosphene counts, and would decrease proportionally with letter size. We further expected reading speed to be in general lower than clear-text reading and to vary with letter size and phosphene count. Finally, we anticipated that comparable results could be obtained to our previous work with isolated letter recognition (Bourkiza et al., 2013) but in much less time and with methods more agreeable to subjects.

Methods

Overview

Subjects performed a reading task. Images of the text were manipulated in real-time so as to simulate the perception of a thalamic visual prosthesis wearer. Overall design followed the MNREAD test, including text taken directly from the chart. During testing, two experimental parameters were varied: the viewing condition, either normal viewing as a control or simulated artificial sight using one of three phosphene patterns with approximately 2000, 1000, and 500 electrodes; and the letter size, decreasing from 1.5 to 1.0 logMAR in steps of 0.1 logMAR. By design, participation required a total of about 20 min per subject.

Subjects

Twenty subjects (9 M, 11 F; range 19–50 years of age) participated in the study. Subjects, recruited from students and post-docs at Massachusetts General Hospital (MGH) and the general population, were required to have self-reported normal or corrected-to-normal vision, and be able to read English text. Subjects were assigned pseudonyms for the purpose of anonymizing data collection and received modest monetary compensation for their participation.

Ethics statement

The research protocol used in this study was approved by the MGH Institutional Review Board and adhered to the guidelines of the Declaration of Helsinki. As this study was classified as a minimal risk experiment, verbal consent was obtained from each subject, and was implied by the existence of a data record.

Apparatus

The experimental apparatus consisted of a heads-free binocular gaze tracker with integrated display (TX300, Tobii, Inc.), and two additional computers (M92p, Lenovo, Inc.) running custom-written software for interfacing, behavioral control, and data collection (Figure 1). The gaze tracker provided streaming gaze information at 300 Hz (0.4° accuracy and 0.14° precision) that was received and processed on the interface computer to be made available upon periodic request from the behavioral control computer running the experiment. The behavioral control computer coordinated experimental activities, including computing stimuli and presenting them on the TX300 integrated display, and logged experimental data. A small consumer-grade computer microphone was used to record audio during the experiment for post-hoc blind verification of subject performance. The stimulus display was operated at the native 1920 × 1080 resolution with 60 Hz vertical refresh rate. With the standard viewing distance of 65 cm, the display subtended 43° × 25° of visual angle.
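The display geometry above is enough to convert between degrees of visual angle and screen pixels. The following is a minimal sketch, assuming a flat screen viewed head-on with the eye aligned to the screen center; the function names `deg_to_px` and `px_to_deg` are illustrative and do not come from the experimental software.

```python
import math

# Values taken from the text.
SCREEN_W_PX = 1920    # native horizontal resolution
SCREEN_W_DEG = 43.0   # horizontal extent of the display at the viewing distance

# Viewing distance expressed in pixels, derived from the horizontal extent.
# Assumption: flat screen, viewed head-on, eye aligned with the screen center.
DISTANCE_PX = (SCREEN_W_PX / 2) / math.tan(math.radians(SCREEN_W_DEG / 2))


def deg_to_px(angle_deg: float) -> float:
    """Offset from screen center, in pixels, for a given visual angle."""
    return DISTANCE_PX * math.tan(math.radians(angle_deg))


def px_to_deg(offset_px: float) -> float:
    """Visual angle, in degrees, for a given pixel offset from screen center."""
    return math.degrees(math.atan(offset_px / DISTANCE_PX))
```

Under these assumptions, the 0.4° accuracy quoted for the gaze tracker corresponds to roughly 17 pixels near the center of the display.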

Figure 1

The apparatus was located in a small office with normal levels of lighting. Subjects were seated in front of the TX300 that was placed on a normal desk-height table (see Figure 2). An office chair without casters was used to prevent subjects from moving too far away from the optimal distance from the screen. The experiment control displays were arranged to one side of the TX300 and positioned such that they did not visually distract the subjects. An experimenter was present during all data collection.

Figure 2

Substantial effort was put into optimizing stimulus generation so as to minimize system latency, including careful selection of the video card used for stimulus generation (Asus AMD Radeon HD7750-1GD5-V2, ASUSTeK Computer, Inc.; the HD7750 is not a high-end card, but one that had the best 2-D performance of the dozen or so tested). Typical delays from eye movement to an updated screen image (including monitor screen latency) were 20 ms or less, with a maximum of 37 ms. The longer delays were often seen with the densest phosphene pattern, and rarely, if ever, with the sparser patterns. An image taken from one frame of the screen during the simulation is shown in Figure 3. Details of how the phosphene locations were approximately stabilized on the retina through frame-by-frame gaze-contingent stimulus generation are described in the caption to Figure 4.

Figure 3

Figure 4

Procedure

At the beginning of the experiment each subject was comfortably seated at a distance of about 65 cm from the stimulus display. Specific instructions were given for the calibration and reading tasks which were then performed in that order.

Calibration

The calibration task consisted of a series of small dots that appeared one at a time in a 3-by-3 array of locations spanning the stimulus monitor. Subjects were instructed to, “look as closely and accurately at each dot as possible.” These fixation points were presented in a balanced, interleaved fashion. The first nine presentations (one for each location) were used to trigger TX300 calibration. The subsequent 27 presentations (three for each location) were used to calibrate a second-order non-linear correction in the experimental software that automatically accounted for gain, offset, and minor distortions by fitting a two-dimensional paraboloid to the calibration points (as typically only gain and offset were required, the TX300's output could be used without this additional correction). Upon occasion of a poor TX300 calibration as subjectively assessed by the experimenter (caused, for example, if the subject blinked at an inopportune time), the procedure was repeated.
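A second-order correction of this kind can be sketched as an ordinary least-squares fit. The example below is an illustration only: it assumes a full quadratic basis (constant, linear, squared, and cross terms) fitted separately for the horizontal and vertical coordinates, which the text does not specify, and all function names are hypothetical.

```python
def _basis(x, y):
    """Second-order terms in two variables (a general quadratic surface)."""
    return [1.0, x, y, x * x, x * y, y * y]


def _solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting.
    Assumes the system is non-singular (no zero pivots)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w


def fit_quadratic(measured, target):
    """Least-squares weights mapping measured (x, y) gaze samples to one
    coordinate of the known fixation targets, via the normal equations."""
    rows = [_basis(x, y) for x, y in measured]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return _solve(ata, atb)


def apply_correction(wx, wy, x, y):
    """Corrected (x, y) gaze position from the two fitted weight vectors."""
    b = _basis(x, y)
    return (sum(w * v for w, v in zip(wx, b)),
            sum(w * v for w, v in zip(wy, b)))
```

With 27 samples over a 3-by-3 grid, the six weights per coordinate are overdetermined, so a single noisy fixation has limited influence on the fit.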

Experiment

The experiment consisted of a single block of 48 trials. Each trial in the block was subdivided into a series of four phases, Start, Pre-Stimulus, Reading, and End (see Figure 5). During the Start Phase, a fixation point appeared in the middle of the screen that the subject was required to foveate in order to engage the experiment. Once foveated for the duration of the Pre-Stimulus Phase, the fixation point was extinguished. The Reading Phase then followed with one of the sentences displayed along with an additional dot near the top center of the screen. The subject was required to read the sentence out loud, as quickly and accurately as possible, or to declare their inability to read it. In order to advance to the next sentence, subjects looked at the top center dot. Subjects could take as long as they wanted, consistent with reading quickly and accurately. Once the subject foveated the top center dot for 350 ms, the trial entered the End Phase, the screen was blanked, and a 2000 ms pause provided an intertrial interval before the next trial commenced. An audio recording was made for the entirety of each experiment.
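The gaze-triggered advance can be illustrated with a simple dwell detector. This is a sketch under stated assumptions: the acceptance radius around the dot is not given in the text and is left as a parameter, and the function name `dwell_reached` is hypothetical. The 350 ms dwell time and 300 Hz sample rate come from the text.

```python
import math

DWELL_MS = 350.0        # required fixation time on the advance dot (from the text)
SAMPLE_RATE_HZ = 300.0  # gaze tracker streaming rate (from the text)


def dwell_reached(samples, target, radius_deg,
                  dwell_ms=DWELL_MS, rate_hz=SAMPLE_RATE_HZ):
    """Return the index of the gaze sample at which gaze has remained within
    radius_deg of target for dwell_ms, or None if that never happens.

    samples    -- sequence of (x, y) gaze positions in degrees
    target     -- (x, y) position of the dot in degrees
    radius_deg -- acceptance window (hypothetical; not specified in the text)
    """
    needed = math.ceil(dwell_ms * rate_hz / 1000.0)
    run = 0
    for i, (x, y) in enumerate(samples):
        if math.hypot(x - target[0], y - target[1]) <= radius_deg:
            run += 1
            if run >= needed:
                return i
        else:
            run = 0  # any sample outside the window restarts the dwell timer
    return None
```

At 300 Hz, a 350 ms dwell corresponds to 105 consecutive samples inside the window.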

Figure 5

Each subject was presented an identical sequence of stimuli that varied in experimental condition. Mimicking the presentation used in MNREAD assessment, sentences were shown starting with a large font size and progressing to smaller ones. For each font size, text was presented in four viewing conditions, first in the clear as a control, and then in artificial vision simulation with three sets of phosphene patterns with decreasing phosphene count. The overall progression was therefore from easiest to most difficult in steps. After the full set of combinations was carried out, the conditions were repeated with a fresh set of sentences. Each sentence was presented exactly once, and each combination of viewing condition and font size presented exactly twice for a completed experiment (see Appendix A in Supplementary Material).
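The ordering described above can be made concrete with a short sketch that enumerates the trials. The sentence index used here is a placeholder for illustration only; the actual assignment of sentences to trials is given in Appendix A and is not reproduced.

```python
from itertools import product

FONT_SIZES_LOGMAR = [1.5, 1.4, 1.3, 1.2, 1.1, 1.0]       # largest to smallest
VIEWING_CONDITIONS = ["Clear", "High", "Medium", "Low"]  # easiest to hardest
REPEATS = 2


def build_trial_sequence():
    """Return (repeat, font_size, viewing_condition, sentence_index) tuples.

    Viewing condition varies fastest, then font size, then repeat, so that
    difficulty increases in steps within each pass through the conditions.
    """
    combos = product(range(1, REPEATS + 1), FONT_SIZES_LOGMAR, VIEWING_CONDITIONS)
    # sentence_index is a hypothetical placeholder: one fresh sentence per trial.
    return [(rep, size, view, idx) for idx, (rep, size, view) in enumerate(combos)]
```

Six font sizes, four viewing conditions, and two repeats give the 48 trials of the block, one per sentence.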

Snellen screening

To verify approximately normal vision, each subject was administered a standard Snellen chart task. Subjects were instructed to stand at a mark on the floor 20 feet away from the vertical surface where the Snellen chart was affixed. No attempts were made to control lighting beyond making sure the overhead lights were on to provide consistent, ordinary levels of illumination. The task was performed binocularly at a pace determined by the subject. Corrective lenses were worn if the subject normally used them and preferred to do so. Measurements were all expressed in equivalent logMAR units.

Stimuli

Forty-eight sentences from the MNREAD chart were used to create the experimental stimuli. These sentences are in simple English that is suitable for readers 8 years old or older. Each sentence contains 60 characters including spaces, and from 10 to 13 words. Sentences were displayed across three lines broken as evenly as possible without hyphenation.

Each sentence was rendered in Times New Roman (as available as a system font under Microsoft Windows) with a regular weight and style at one of six font sizes (from 1.5 logMAR down to 1.0 logMAR in steps of 0.1 logMAR, corresponding to the sizes of the lowercase x, see Table 1), in white letters on a black background. Depending on the trial conditions, the sentence was presented on the screen in the clear at the native monitor resolution, without any manipulations, or in a simulation of prosthetic vision using one of three phosphene patterns that varied in the number and density of phosphenes (see Table 2) but shared a common overall center-weighted profile based on previous work (Pezaris and Reid, 2009). The full sequence of stimulus conditions, including the text of each sentence, can be found in Appendix A (Supplementary Material). While some researchers studying low-vision reading use sans-serif fonts, we opted to mimic the readily available MNREAD charts (Lighthouse Low Vision Products, Long Island City, NY) for clinical compatibility; these charts use a serif font in the Times Roman class.

Table 1

Font condition (f)	Font size (logMAR)
1	1.5
2	1.4
3	1.3
4	1.2
5	1.1
6	1.0

Font conditions and sizes.

The variable f is used in the formulas for reading accuracy α, reading speed β, and reading acuity γ described in the Methods Section. The largest font that allowed the MNREAD sentences to properly fit on the monitor as three lines had an x height of 2.9 cm, corresponding to logMAR 1.5 at the viewing distance of 65 cm.
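The correspondence between x-height, viewing distance, and logMAR can be checked with a few lines of arithmetic. This sketch assumes the usual print-size convention in which an x-height subtending 5 arcmin corresponds to 0 logMAR; the function names are illustrative.

```python
import math


def xheight_to_logmar(x_height_cm: float, distance_cm: float) -> float:
    """Print size in logMAR for a given x-height and viewing distance.
    Assumes an x-height of 5 arcmin corresponds to 0 logMAR."""
    angle_rad = 2 * math.atan(x_height_cm / (2 * distance_cm))
    angle_arcmin = math.degrees(angle_rad) * 60
    return math.log10(angle_arcmin / 5.0)


def logmar_to_xheight(logmar: float, distance_cm: float) -> float:
    """Inverse mapping: x-height in cm for a given print size in logMAR."""
    angle_arcmin = 5.0 * 10 ** logmar
    return 2 * distance_cm * math.tan(math.radians(angle_arcmin / 60) / 2)
```

For the values in the text, `xheight_to_logmar(2.9, 65)` gives approximately 1.49, which agrees with the stated 1.5 logMAR to within rounding.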

Table 2

Viewing condition (v)	Phosphene pattern density	Total phosphenes (count)	Central (10°) phosphenes (count)	Electrode spacing (μm)
1*	n/a	n/a	n/a	n/a
2	High	1757	381	375
3	Medium	1029	231	475
4	Low	522	124	600

Viewing conditions and reading acuity.

The first viewing condition, marked with an asterisk (*), is the Clear condition where text is presented without a simulation of artificial vision as a control. The variable v is used in the formulas for reading accuracy α, reading speed β, and reading acuity γ described in the Methods section.

Simulated prosthetic thalamic vision

During the Reading Phase of trials where the stimuli were presented in a simulation of prosthetic vision, the text did not appear directly on the screen, but, rather, a virtual filter was placed in front of the text and the output of the filter shown on the screen to give the illusion of prosthetic vision. The simulation is described in detail in a previous publication (Bourkiza et al., 2013), and is summarized as follows. All current artificial prosthesis designs that include stimulation through sets of microwire contacts provide the recipient with a coarse visual experience made up of a set of isolated pixel-like percepts called phosphenes. The size and distribution of the set of phosphenes for a given device is part of the device design, being a function of both electrode pattern and visual field map in the stimulated structure; the thalamic visual prosthesis forming our line of inquiry is expected to elicit a pattern of phosphenes that is denser toward the center of vision, and, for lower electrode counts, is relatively sparse (Pezaris and Reid, 2009; Bourkiza et al., 2013). The pattern of phosphenes is referenced to retinal coordinates (Pezaris and Reid, 2007) and thus moves about with the direction of gaze, as do after-images or retinal features like the blind spot. Our simulation approximated this effect by taking the instantaneous gaze position from the gaze tracker, translating the position of the set of phosphenes to that location on the monitor, and then activating each phosphene according to the brightness of the image of the text at the corresponding location (details in Figure 4). This can be thought of as similar to looking through a colander held at arm's length: as the colander is moved about, different parts of the scene behind are revealed. For initially conceived devices, such as described by Pezaris and Reid (2007), where an external video camera is mounted on a set of glasses worn by the patient, a mechanism will be needed to read the instantaneous eye position and electronically shift the video left or right and up or down, frame by frame, according to the point of regard.
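The gaze-contingent sampling step can be sketched as follows. This is a simplified illustration, not the simulator's implementation: it assumes nearest-pixel sampling of the text image and treats phosphenes that fall off-screen as inactive, neither of which is specified in the text. The function name and argument layout are hypothetical.

```python
def phosphene_activations(image, offsets_px, gaze_px):
    """Sample image brightness at each phosphene's gaze-shifted position.

    image      -- 2-D list of brightness values in [0, 1], indexed [row][col]
    offsets_px -- (dx, dy) of each phosphene relative to the point of regard
    gaze_px    -- (x, y) current gaze position on the screen, in pixels
    """
    height, width = len(image), len(image[0])
    gx, gy = gaze_px
    activations = []
    for dx, dy in offsets_px:
        # Translate the phosphene pattern so it is centered on the gaze position.
        col, row = int(round(gx + dx)), int(round(gy + dy))
        inside = 0 <= row < height and 0 <= col < width
        # Assumption: phosphenes landing outside the image are left dark.
        activations.append(image[row][col] if inside else 0.0)
    return activations
```

Because the offsets are fixed and only the gaze position changes from frame to frame, the same pattern reveals different parts of the text as the eyes move, which is the colander effect described above.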

Each phosphene was drawn as a white, circular Gaussian, and was assumed to be independent of all other phosphenes. When phosphenes overlapped, they were combined additively, saturating at the maximum brightness available from the monitor. Phosphene size σ (the one-sigma extent of each Gaussian) varied as a linear function of radial distance ρ from the point of regard (or, eccentricity), according to the formula σ = 0.043 ρ + 0.083 (see Discussion), in degrees of visual angle.
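The rendering rule in this paragraph can be sketched in a few lines. The example below assumes a uniform pixels-per-degree scale (a flat-screen approximation) and truncates each Gaussian at three sigma for speed; both are choices made here for illustration. Only the size formula, the additive combination, and the saturation come from the text.

```python
import math


def phosphene_sigma(eccentricity_deg: float) -> float:
    """One-sigma phosphene size in degrees, from the formula in the text."""
    return 0.043 * eccentricity_deg + 0.083


def render(phosphenes, width, height, px_per_deg, gaze_px):
    """Draw phosphenes as additive Gaussians, saturating at brightness 1.0.

    phosphenes -- list of (x_deg, y_deg, activation), relative to point of regard
    px_per_deg -- assumed uniform scale from degrees to pixels
    gaze_px    -- (x, y) gaze position on the screen, in pixels
    """
    frame = [[0.0] * width for _ in range(height)]
    for x_deg, y_deg, activation in phosphenes:
        sigma_px = phosphene_sigma(math.hypot(x_deg, y_deg)) * px_per_deg
        cx = gaze_px[0] + x_deg * px_per_deg
        cy = gaze_px[1] + y_deg * px_per_deg
        reach = 3 * sigma_px  # truncate the Gaussian at three sigma (assumption)
        for row in range(max(0, math.floor(cy - reach)),
                         min(height, math.ceil(cy + reach) + 1)):
            for col in range(max(0, math.floor(cx - reach)),
                             min(width, math.ceil(cx + reach) + 1)):
                d2 = (col - cx) ** 2 + (row - cy) ** 2
                frame[row][col] += activation * math.exp(-d2 / (2 * sigma_px ** 2))
    # Overlapping phosphenes add, then clip at the monitor's maximum brightness.
    return [[min(1.0, v) for v in row] for row in frame]
```

With this formula, a phosphene at 10° eccentricity has a one-sigma size of about 0.51°, roughly six times that of a phosphene at the point of regard.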

Experimental parameter: phosphene pattern density

Three different densities of phosphenes were investigated, all following the global density profile expected from a thalamic visual prosthesis (Pezaris and Reid, 2009), and selected for near-future engineering plausibility. The underlying profile is denser toward the point of regard, and reflects the endogenous acuity profile across the visual scene in monkey LGN. The three sampling densities of the profile, High, Medium, and Low, corresponded to devices with approximately 2000, 1000, and 500 phosphenes spanning the entire visual field (see Figure 3 for an example rendering). The exact phosphene counts, along with the number of phosphenes in central vision, can be seen in Table 2.

Analysis

Three aspects of subject ability in the reading task were analyzed: reading accuracy (the percentage of correctly recognized words), reading speed (the number of correctly recognized words per minute, or WPM), and reading acuity (the subject's visual acuity assessed through reading). Our analysis formulas are generalizations of the published MNREAD formulas (Mansfield et al., 1994), with extensions for the number of font sizes, the step from one font size to the next, and the number of repeated observations. In conditions matching those of a traditional MNREAD task, the expressions reduce exactly to the published formulas. The extensions described provide additional mathematical insight along with the flexibility to support a broader range of conditions with precision, so are presented in detail as a reference for the field.

Reading accuracy

The ability to accurately recognize words is a fundamental parameter reflecting available visual utility. The corresponding quantification of reading accuracy, α, is computed for each trial class (the combination of font f and viewing condition v) as the mean of the N observations of the normalized number of words read correctly in each sentence. For the present experimental design that contains exactly two trials for each (f, v) combination, N is 2. As the standard in the field is to record the number of words missed when scoring performance, α is computed using the number of words in each sentence, n, minus the number of words read incorrectly or not read at all, e, pooled over observations i by trial class (f, v):

$$\alpha_{f,v} = \frac{1}{N} \sum_{i=1}^{N} \frac{n_{f,v,i} - e_{f,v,i}}{n_{f,v,i}}$$

Reading speed

The speed with which words are recognized and vocalized is also a fundamental parameter reflecting available visual utility. The corresponding quantified value of reading speed, β, is computed as the mean of the number of words correctly read in each sentence divided by the time t it took a subject to read them, again pooled over observations i by trial class (f, v):

$$\beta_{f,v} = \frac{1}{N} \sum_{i=1}^{N} \frac{n_{f,v,i} - e_{f,v,i}}{t_{f,v,i}}$$

The value of t for each trial was defined as the time spent in the Reading Phase, and did not include the 350 ms required to activate stepping to the next trial.
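As a worked illustration of the accuracy and speed definitions, the sketch below computes both quantities for one trial class from per-trial word counts, error counts, and reading times. It assumes times are logged in seconds and converted to minutes to give words per minute; the function names and tuple layouts are hypothetical.

```python
def reading_accuracy(trials):
    """Mean fraction of words read correctly for one (f, v) trial class.

    trials -- list of (n_words, n_errors), one entry per observation
    """
    return sum((n - e) / n for n, e in trials) / len(trials)


def reading_speed(trials):
    """Mean correctly read words per minute for one (f, v) trial class.

    trials -- list of (n_words, n_errors, seconds_in_reading_phase)
    Assumption: times are recorded in seconds and converted to minutes here.
    """
    return sum((n - e) / (t / 60.0) for n, e, t in trials) / len(trials)
```

For example, two observations of a 10-word sentence read perfectly in 6 s and a 12-word sentence read with 3 errors in 9 s give an accuracy of 0.875 and a speed of 80 WPM.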

Reading acuity

Subject performance was used to derive a visual acuity measurement γ, following the standard methods for the MNREAD task. The sum of reading accuracies across font sizes was interpreted as the fractional number of size intervals down from the largest font that the subject was effectively reading. As the data have already been pooled over observations, reading acuity is computed by pooling over font sizes f as the base font size, Δ₀ (here 1.5 logMAR), plus the sum of accuracies weighted by the incremental difference in size from one font size to the next, Δ, giving us a result for each viewing condition v:

$$\gamma_v = \Delta_0 + \sum_{f} \alpha_{f,v} \, \Delta_f$$
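The acuity computation described in this paragraph amounts to a weighted sum. The sketch below uses the base size of 1.5 logMAR stated in the text and the uniform step of 0.1 logMAR between font sizes, entered as a negative increment; the function name and the option to pass per-size steps are illustrative.

```python
def reading_acuity(accuracies, base_logmar=1.5, steps=None):
    """Reading acuity in logMAR for one viewing condition.

    accuracies  -- accuracy (0 to 1) at each font size, largest font first
    base_logmar -- base font size (1.5 logMAR in the text)
    steps       -- size increment attached to each font size; defaults to a
                   uniform -0.1 logMAR, negative because sizes decrease
    """
    if steps is None:
        steps = [-0.1] * len(accuracies)
    return base_logmar + sum(a * d for a, d in zip(accuracies, steps))
```

Each fully read font size moves the estimate down by one step, and a partially read size moves it down by the corresponding fraction of a step, so accuracies of 1, 1, 1, 0.5, 0, 0 across the six sizes give 1.15 logMAR.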

Note that values of Δ after Δ₀ are negative because the font sizes are ordered largest to smallest, and that for the general case of uneven font size steps, the values of Δ_f will not be uniform. As a cross-check on the values of γ, reading acuity was also determined from the midpoint of logistic fits to the population data (using asymptote values of 0 and 100%). Logistic fitting is the more robust of the two methods: it has better noise rejection and also accommodates uneven intervals, missing observations, and sampling that stops short of spanning the entire transition band.
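A logistic midpoint estimate of this kind can be sketched with a coarse grid search. The fitting procedure actually used is not described in the text, so the least-squares grid search below is a stand-in chosen for simplicity; the search ranges, grid resolution, and function names are all assumptions. The asymptotes are fixed at 0 and 1 (that is, 0 and 100%), as stated.

```python
import math


def logistic(x, midpoint, slope):
    """Logistic curve with asymptotes fixed at 0 and 1."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))


def fit_midpoint(sizes, accuracies):
    """Least-squares logistic fit by exhaustive grid search.

    sizes      -- font sizes in logMAR
    accuracies -- accuracy (0 to 1) at each size
    Returns (midpoint, slope). Assumed search ranges: midpoint 0 to 2 logMAR
    in steps of 0.005, slope 1 to 60 in integer steps.
    """
    best = None
    for mi in range(401):
        midpoint = mi * 0.005
        for slope in range(1, 61):
            err = sum((logistic(x, midpoint, slope) - a) ** 2
                      for x, a in zip(sizes, accuracies))
            if best is None or err < best[0]:
                best = (err, midpoint, slope)
    return best[1], best[2]
```

Because the fit uses only the observed (size, accuracy) pairs, it applies unchanged to unevenly spaced sizes or to data sets with missing conditions.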

Values of reading accuracy α, reading speed β, and reading acuity γ were calculated for each subject and subsequently combined into population values, which are shown in the figures presented below. Statistical tests were applied as described in the Results Section to assess significance at the p < 0.05 level.