PLoS Comput Biol. 2016 May 18;12(5):e1004948.
doi: 10.1371/journal.pcbi.1004948. eCollection 2016 May.

Neuroprosthetic Decoder Training as Imitation Learning

Josh Merel et al. PLoS Comput Biol.

Abstract

Neuroprosthetic brain-computer interfaces function via an algorithm which decodes neural activity of the user into movements of an end effector, such as a cursor or robotic arm. In practice, the decoder is often learned by updating its parameters while the user performs a task. When the user's intention is not directly observable, recent methods have demonstrated value in training the decoder against a surrogate for the user's intended movement. Here we show that training a decoder in this way is a novel variant of an imitation learning problem, where an oracle or expert is employed for supervised training in lieu of direct observations, which are not available. Specifically, we describe how a generic imitation learning meta-algorithm, dataset aggregation (DAgger), can be adapted to train a generic brain-computer interface. By deriving existing learning algorithms for brain-computer interfaces in this framework, we provide a novel analysis of regret (an important metric of learning efficacy) for brain-computer interfaces. This analysis allows us to characterize the space of algorithmic variants and bounds on their regret rates. Existing approaches for decoder learning have been performed in the cursor control setting, but the available design principles for these decoders are such that it has been impossible to scale them to naturalistic settings. Leveraging our findings, we then offer an algorithm that combines imitation learning with optimal control, which should allow for training of arbitrary effectors for which optimal control can generate goal-oriented control. We demonstrate this novel and general BCI algorithm with simulated neuroprosthetic control of a 26 degree-of-freedom model of an arm, a sophisticated and realistic end effector.
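The training loop described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' implementation: it assumes a 2D cursor, a linear-Gaussian encoding model, a linear decoder, and an oracle that points straight at the goal. All function names, the step size, and the noise and ridge values are hypothetical choices made for illustration. The point it shows is the dataset-aggregation pattern: the cursor moves under the current decoder, the oracle labels every visited state, and the decoder is refit on all data collected so far.

```python
import random

def oracle(pos, goal):
    """Hypothetical intention oracle: unit vector from the cursor to the goal."""
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    norm = (dx * dx + dy * dy) ** 0.5
    return (dx / norm, dy / norm) if norm > 1e-9 else (0.0, 0.0)

def encode(intent, enc, rng, noise=0.1):
    """Toy encoding model (assumed): each neuron is a noisy linear function of intent."""
    return [a * intent[0] + b * intent[1] + rng.gauss(0.0, noise) for a, b in enc]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_decoder(data, n, ridge=1e-3):
    """Ridge least-squares fit of a 2 x n decoder on all aggregated (activity, intent) pairs."""
    G = [[sum(y[i] * y[j] for y, _ in data) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return [solve(G, [sum(y[i] * u[d] for y, u in data) for i in range(n)])
            for d in range(2)]

def decode(W, y):
    """Linear decoder: map neural activity to a 2D cursor velocity."""
    return tuple(sum(w * v for w, v in zip(row, y)) for row in W)

def run_dagger(n_neurons=6, n_reaches=10, steps=30, seed=0):
    rng = random.Random(seed)
    enc = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_neurons)]
    W = [[rng.gauss(0, 0.1) for _ in range(n_neurons)] for _ in range(2)]
    data, losses = [], []
    for _ in range(n_reaches):
        pos = (0.0, 0.0)
        goal = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        loss = 0.0
        for _ in range(steps):
            u = oracle(pos, goal)        # surrogate for the unobserved intent
            y = encode(u, enc, rng)      # simulated neural activity
            v = decode(W, y)             # the cursor moves under the current decoder
            loss += (v[0] - u[0]) ** 2 + (v[1] - u[1]) ** 2
            data.append((y, u))          # aggregate the states the decoder actually visits
            pos = (pos[0] + 0.05 * v[0], pos[1] + 0.05 * v[1])
        losses.append(loss / steps)
        W = fit_decoder(data, n_neurons)  # refit on everything collected so far
    return losses
```

Calling `run_dagger()` returns the mean squared error against the oracle for each reach. In this toy setting the loss on late reaches should be far below the loss under the random initial decoder.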

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. A BCI has an effector, such as a robotic arm, with predefined degrees of freedom.
Given a task objective (e.g. an objective function corresponding to reaching and grasping a target), an intention-oracle can be computed to provide the intended updates to the arm kinematics. The actual trajectory of the arm is evaluated deterministically from the neural activity via the decoder. In practice, the oracle update would be recomputed at each timestep to reflect the instantaneous best movement in the direction of the goal.
Fig 2. Left panel is a cartoon of the cursor task.
The blue cursor is under user control and the user intends to move it towards the green target. On a given reach trajectory, the cursor is decoded according to the current decoder, yielding the path made up of red arrows. At each state, the oracle intention is computed (green arrows), aggregated into the dataset D, and incorporated into the update to the decoder. In the right panel, we compare the performance of the algorithms on a simulation of the cursor task (loss incurred during each trial k). We use Alg 1 with the three update rules discussed (Table 1). Intuitively, OGD makes less efficient use of the data and should be dominated by FTL. Moreover, OGD has additional learning-rate parameters, which were tuned by hand. MA performs least well, though we selected λ sufficiently close to 1 to permit performance to improve gradually (smaller λ leads to more unstable learning). Each update index corresponds to the inclusion of 1 additional reach. The entire learning procedure is simulated 100 times for each algorithm, and error bars are 2 standard errors across the simulations.
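To make the comparison of update rules concrete, here is a minimal sketch on a scalar linear decoder u ≈ w·y. It reads OGD, FTL and MA as online gradient descent, follow-the-leader and moving average. Table 1 is not reproduced on this page, so the exact forms below are assumptions, in particular the MA rule as a λ-weighted blend of the previous decoder with a fit to the newest batch. The function names, learning rate and noise level are also hypothetical.

```python
import random

def batch_fit(batch):
    """Least-squares gain for a scalar linear decoder u ~ w * y on one batch."""
    return sum(y * u for y, u in batch) / sum(y * y for y, _ in batch)

def ogd_update(w, batch, lr=0.1):
    """Online gradient descent: one gradient step on the newest batch's mean squared error."""
    grad = sum(2.0 * (w * y - u) * y for y, u in batch) / len(batch)
    return w - lr * grad

def ftl_update(all_batches):
    """Follow-the-leader: refit on every batch aggregated so far."""
    return batch_fit([pair for batch in all_batches for pair in batch])

def ma_update(w, batch, lam=0.9):
    """Moving average (assumed form): blend the previous decoder with a fit to the newest batch."""
    return lam * w + (1.0 - lam) * batch_fit(batch)

def compare(n_batches=40, batch_size=20, true_w=2.0, seed=0):
    """Run all three rules on the same stream of simulated batches."""
    rng = random.Random(seed)
    w_ogd = w_ma = w_ftl = 0.0
    history = []
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            y = rng.gauss(0.0, 1.0)
            batch.append((y, true_w * y + rng.gauss(0.0, 0.5)))
        history.append(batch)
        w_ogd = ogd_update(w_ogd, batch)
        w_ftl = ftl_update(history)
        w_ma = ma_update(w_ma, batch)
    return w_ogd, w_ftl, w_ma
```

In this sketch FTL uses all past data at every update, OGD uses only the gradient on the newest batch and needs a hand-chosen `lr`, and MA discounts old information geometrically through `lam`. That matches the qualitative ordering the caption describes.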
Fig 3. Left panel is a visualization of 100 3D reach trajectories for a poorly-performing initial decoder (trained on 1 reach).
Right panel visualizes 100 trajectories for a well-performing decoder fit from 20 reaches (approximately at performance saturation for this level of noise). Each trajectory is depicted with yellow corresponding to initial trial time and blue corresponding to end of trial (time normalized to take into account different reach durations). The goals were in random locations, so to superimpose the set of traces, all positions have been shifted relative to the goal such that the goal is always centered. Observe that the initial decoder is essentially random, whereas the learned decoder permits reaches that mostly proceed directly towards the goal (modulo variability inherited from the neural noise). Units here relate to those in Fig 2, but refer to position rather than to the MSE of the corresponding velocity units.
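The goal-centering and time normalization used to superimpose the traces can be sketched as follows. This is an illustrative helper, not code from the paper. The function name and the representation of a trajectory as a list of 3D tuples are assumptions.

```python
def center_on_goal(trajectory, goal):
    """Shift positions so the goal sits at the origin and attach normalized time in [0, 1].

    trajectory: list of position tuples sampled over one reach.
    goal: tuple with the same dimensionality as the positions.
    """
    n = len(trajectory)
    out = []
    for i, p in enumerate(trajectory):
        t = i / (n - 1) if n > 1 else 0.0  # 0 at reach onset, 1 at the end of the trial
        out.append((t, tuple(a - b for a, b in zip(p, goal))))
    return out
```

Reaches of different durations then share a common time axis, which can drive a yellow-to-blue color map, and a common spatial frame centered on the goal.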
Fig 4. Left panel depicts arm model in MuJoCo software and a trajectory of the arm during a simulated closed-loop experiment, after the decoder has learned to imitate the optimal policy (for illustration).
This particular trajectory consists mostly of movement of an elbow joint, followed by slight movements of the middle finger and thumb when near the target. Right panel depicts a comparison of loss (here SSE of decoded joint angular velocities relative to oracle) as a function of reach index for the different update rules (similar to the right panel in Fig 2). In this plot, we consider only the loss for the shoulder, elbow, and wrist DOF, as these are the dominant DOF (curves are similar when other critical joints are included). We see that FTL again gives good performance both in terms of rate of convergence and resulting solution (see Fig 6 or S2 Mov for a sense of the quality of the performance). The entire learning procedure is simulated 50 times for each algorithm, and error bars are 2 standard errors across the simulations.
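The loss and error bars described in this caption can be computed as sketched below. The helper names and the layout of the data (one list of per-DOF angular velocities per timestep) are assumptions for illustration. The standard error uses the usual sample standard deviation divided by the square root of the number of simulations.

```python
import math

def sse(decoded, oracle, dofs):
    """Sum of squared errors between decoded and oracle velocities, restricted to selected DOF.

    decoded, oracle: lists (one entry per timestep) of per-DOF velocity lists.
    dofs: indices of the DOF to include (e.g. shoulder, elbow and wrist).
    """
    return sum((decoded[t][d] - oracle[t][d]) ** 2
               for t in range(len(decoded)) for d in dofs)

def mean_and_2se(values):
    """Mean across simulations and a half-width of 2 standard errors (needs at least 2 values)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # unbiased sample variance
    return mean, 2.0 * math.sqrt(var / n)
```

For each reach index, `sse` would be evaluated once per simulated run, and `mean_and_2se` applied to the resulting list gives the curve value and its error bar.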
Fig 5. Panels depict correlation between “true” encoding model and estimated encoding model parameters as a function of index over reach trajectories (for a single trial).
Each curve corresponds to the correlation for a different DOF. The encoding model parameters are not directly guaranteed to converge. We see, as expected, that the encoding model will improve for specific DOF in proportion to the extent to which those dimensions are relied on to perform the task. Shoulder DOF are crucial for the task, being implicated in most reaches, so are learned rapidly. Wrist and finger joints are relatively less critical for task performance, so are learned more gradually. In the thumb and middle finger panels above, the least well-learned DOF (thumb DOF 3 and mid DOF 3) can be interpreted as the “distal inter-phalangeal joint” (i.e. the small joint near tip of the finger), which is not heavily relied upon in this reach task.
Fig 6. Plots depict reach trajectories of a representative shoulder DOF for 4 paired examples of reaches, from separate re-initializations of the decoder (i.e. different trials).
Left panels show a poorly-performing early decoder (k = 2), and right panels show a well-performing decoder (k = 30). Rows correspond to matched pairs of reaches for different repeats of the experiment. Blue curves correspond to the actual decoded pose of the DOF over time, and red arrows depict the local oracle update (only visualized for a subsampling of timesteps). For the early reaches, observe that the decoder does not always proceed in the intended direction. For the late reaches, observe that actual pose updates are quite consistent with the oracle and trajectories are shorter because the targets are acquired more frequently and more rapidly.
Fig 7. Plots depict decline in performance (i.e. loss between noise-free oracle and decoded intention) with intention noise model mismatch using sum square error (SSE) over the duration of a reach for (left) cursor task and (right) arm reaching task trajectories, comparable to performance curves in Figs 2 and 4 respectively.
In each task, noise performance curves are obtained when the user’s intent is a noisy version of the oracle, captured by a linear combination of intention oracle and a random vector. The noise level is indicated by a noise percentage, corresponding to the magnitude of the noise relative to the intention oracle signal. The effects of the relative noise are not directly comparable across tasks because the noise is distributed over more dimensions in the arm task.
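One way to realize such a noisy intention is sketched below. The caption does not give the exact weighting, so the form used here is an assumption: the oracle plus a random direction whose length is the stated percentage of the oracle's magnitude. The function name is hypothetical.

```python
import math
import random

def noisy_intention(oracle, noise_pct, rng):
    """Return oracle + (noise_pct / 100) * |oracle| * (random unit vector).

    oracle: list of floats (the noise-free intended update).
    noise_pct: noise magnitude as a percentage of the oracle's norm.
    rng: a random.Random instance, passed in for reproducibility.
    """
    raw = [rng.gauss(0.0, 1.0) for _ in oracle]          # isotropic random direction
    raw_norm = math.sqrt(sum(r * r for r in raw))
    magnitude = math.sqrt(sum(o * o for o in oracle))
    scale = (noise_pct / 100.0) * magnitude / raw_norm
    return [o + scale * r for o, r in zip(oracle, raw)]
```

Because the noise vector has a fixed total length, it is spread across more coordinates as the dimensionality grows, which is one reason the same percentage is not directly comparable between the cursor and arm tasks.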
Fig 8. Left panel depicts a cartoon for a 2D projection of the arc-trajectory intention mismatch setting for the cursor task.
Contrary to the assumption that the intention is directly towards the goal (black arrow), the user intention actually is such that it would have induced an arc with initial angle ϕ (green arrow). After training, the decoder partly compensates for the arc-offset, undercompensating initially and overcompensating near the goal (red arrows). Center panel visualizes single trials from trained decoders from the 45° setting (each trace is from a different realization of neural encoding and training). All decoded trajectories have been projected from 3D into 2D and rotated to match the center panel orientation, and trials have a diversity of initial distances from the goal. Time during the trial is depicted from yellow to blue as in Fig 3. Right panel shows performance curves under increasing levels of nonlinear mismatch for the cursor task, trained using FTL (axes comparable to left panel of Fig 7).
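The initial mismatch in the left panel can be illustrated by rotating the goal-directed unit vector (black arrow) by ϕ to obtain the initial intended direction (green arrow). This sketch covers only that initial direction in the 2D projection. How the intention evolves along the arc is not specified here and is not modeled. The function names are hypothetical.

```python
import math

def rotate2d(vec, phi_deg):
    """Rotate a 2D vector counter-clockwise by phi_deg degrees."""
    phi = math.radians(phi_deg)
    c, s = math.cos(phi), math.sin(phi)
    return (c * vec[0] - s * vec[1], s * vec[0] + c * vec[1])

def initial_arc_intention(pos, goal, phi_deg):
    """Unit vector towards the goal, rotated by the initial arc angle phi."""
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return (0.0, 0.0)  # already at the goal: no intended movement
    return rotate2d((dx / norm, dy / norm), phi_deg)
```

With `phi_deg = 45`, a decoder trained against the straight-to-goal oracle sees intentions that are initially 45° away from its training labels, which corresponds to the 45° setting shown in the center panel.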

Cited by

  • Workshops of the Sixth International Brain-Computer Interface Meeting: brain-computer interfaces past, present, and future.
    Huggins JE, Guger C, Ziat M, Zander TO, Taylor D, Tangermann M, Soria-Frisch A, Simeral J, Scherer R, Rupp R, Ruffini G, Robinson DKR, Ramsey NF, Nijholt A, Müller-Putz G, McFarland DJ, Mattia D, Lance BJ, Kindermans PJ, Iturrate I, Herff C, Gupta D, Do AH, Collinger JL, Chavarriaga R, Chase SM, Bleichner MG, Batista A, Anderson CW, Aarnoutse EJ. Brain Comput Interfaces (Abingdon). 2017;4(1-2):3-36. doi: 10.1080/2326263X.2016.1275488. Epub 2017 Jan 30. PMID: 29152523. Free PMC article.

Grants and funding

This work was supported by ONR N00014-16-1-2176 (http://www.onr.navy.mil/) and a Google Research Award (http://research.google.com/university/relations/) to LP. Simons Global Brain Research Awards SCGB#325171 and SCGB#325233 (https://www.simonsfoundation.org/) supported LP and JPC. JPC is supported by a Sloan Research Fellowship (http://www.sloan.org/sloan-research-fellowships/). All authors receive support from the Grossman Center at Columbia University (http://grossmancenter.columbia.edu/), and the Gatsby Charitable Trust (http://www.gatsby.org.uk/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
