Modeling the Dynamics of Multielectrode Array Recordings Using Hidden Markov Models

Micah O'Halloran and Beau Cronin

 

Multielectrode arrays allow the simultaneous recording of the activity of many cultured cells. While it is assumed that these cells form networks during their growth, little is known about the nature of these networks or the dynamical properties of their activity. The specific data examined here is the result of stimulating a single pin in the array at a time. We apply the Hidden Markov Model (HMM) framework in order to characterize the resulting activity patterns. We find that, given appropriate initial conditions, HMMs can be trained to recognize sharp borders between a number of overall firing states. In addition, we show that appropriately trained HMMs can recognize which pin was stimulated given a previously unseen trial. This suggests that 1) stimulation of a particular pin invokes a stereotyped spatial and temporal pattern of activity, and 2) these patterns are distinct from one electrode to the next.

Introduction

Multielectrode arrays (MEAs) are two-dimensional arrays of electrodes. Using the proper cell culturing techniques, neurons can be induced to grow on the surface of the arrays. The electrodes can then be used to record the spiking activity of the neurons, as well as to stimulate those cells at certain locations on the array. The large amounts of data resulting from these recordings prompt many questions and pose a great analytical challenge. What is the pattern of connectivity between the cultured neurons, and how does this connectivity relate to the observed signals? Do the cells project axons across the entire extent of the array, or do they connect preferentially to those cells which are nearby? Do the cultured networks produce stereotyped patterns of activity, or is this activity diffuse and undifferentiated? The present study will attempt to answer this last question.

The data used in this study was collected by Nathan Wilson using Med64 electrode arrays (http://www.med64.com). These arrays contain 64 electrodes in an 8-by-8 square, with electrodes spaced 150 microns apart. Cells were cultured on the array for several days, and then a series of experiments was performed in which pins in the second and seventh columns were stimulated individually (Figure 1). Seven or eight trials were recorded for each stimulation pin; an example of the raw data for a single trial can be seen in Figure 2. Each of the 64 channels was sampled at a rate of 20 kHz.

 

Figure 1: MEA pin layout; highlighted pins are those which were stimulated

 

Figure 2: Single trial raw data from each of the 64 pins; only the first 100 ms is shown

 

This data was first filtered and spikes were extracted for each channel. The spike data was then used to train an HMM for each stimulated pin, which succeeded in extracting the general dynamical properties of the trials. An HMM is a state transition model defined primarily by the number of states represented, the transition probabilities between these states (P), and the probabilities of observing each particular signal given a certain state (Q). These HMMs were then used both to generate a generic template of activity and to recognize which pin was stimulated in a previously unseen data set.

All HMMs were implemented using Kevin Murphy's MATLAB toolbox, and many of our choices regarding how to adapt HMMs to this application closely follow the work of Seidemann et al.

Methods

Spike Extraction

Before the HMM framework can be used to extract information from the MEA data, the raw signals from each of the pins must be converted into discrete spike trains. This is an essential step for two reasons. The first is that the observation sequence O used to train the HMM must be a discrete sequence consisting of only one event at each time step. A natural way of defining an observation at a particular time, Ot, in this sequence is to assign to it the pin number on which a spike occurs at time t (Seidemann et al.). If more than one pin in the array spikes at exactly the same time, one is arbitrarily chosen to be the observation and the others are ignored (this case is rare: <5%). The second reason to extract spikes from the raw data is to reduce the amount of information that must be stored and manipulated during the analysis. The raw MEA data is sampled at 20 kHz and stored as a double-precision floating-point array. Performing spike extraction allows the array to be stored in logical format (spike = 1, no spike = 0), and the sparsity of the resulting spike vector allows for compact storage using MATLAB's sparse command. Typical results on the MEA data yielded compressions greater than 500:1.
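The conversion from a multichannel spike array to a discrete observation sequence can be sketched as follows. The actual analysis was done in MATLAB; this is an illustrative NumPy version, and the choices of dropping silent time steps and breaking the rare ties by taking the lowest-numbered pin are assumptions for the sketch:

```python
import numpy as np

def spikes_to_observations(spikes):
    """Convert a (pins x samples) binary spike array into a discrete
    observation sequence: at each time step with activity, emit the
    pin number of one spiking pin (ties broken by taking the lowest)."""
    obs = []
    for t in range(spikes.shape[1]):
        active = np.flatnonzero(spikes[:, t])
        if active.size:                     # at least one pin spiked
            obs.append(int(active[0]) + 1)  # 1-based pin number; extras ignored
    return obs

# Toy example: 4 pins, 6 samples; sample 1 has a tie between pins 1 and 3
spk = np.array([[0, 1, 0, 0, 0, 0],
                [0, 0, 0, 1, 0, 0],
                [0, 1, 0, 0, 0, 1],
                [0, 0, 0, 0, 0, 0]], dtype=bool)
print(spikes_to_observations(spk))  # [1, 2, 3]
```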

Now that the utility of extracting spikes is clear, the methods employed can be briefly outlined. A representative raw waveform from one pin of the MEA is shown below in Figure 3.

 

Figure 3: Representative raw waveform.

 

The large transient at 10 ms is the result of an external stimulus applied on pin 2 of the MEA; the same transient can be seen on all of the array's pins. The simplest method of extracting a spike from this data would be to set a voltage threshold, say ±0.1 mV, and call any deviation in the wave larger than this level a spike; any deviations that do not exceed the threshold would be attributed to noise. Applying this threshold approach to the waveform shown above would yield poor results for several reasons. These are addressed below, along with the techniques employed to solve them. Links to the MATLAB scripts developed for data cleansing, spike extraction, and other data preparation tasks are provided at the end of the paper.

Deterministic Noise

One immediate non-ideality visible in Figure 3 is a deterministic noise component, apparent from the periodic oscillations in the noise floor after 40 ms. A generic spike train should contain no significant spectral peaks away from DC, so any peaks that are present degrade threshold-based spike detection: the true spikes ride on top of the deterministic noise, and thus have different apparent amplitudes depending on where they occur in the noise's oscillation period. Since amplitudes are central to threshold detection, this noise source must be removed.

To remove spectral peaking, MATLAB's fft command was used to derive the spectrum, and any spectral peaks above a user-defined threshold were removed along with frequency components in a small band around these peaks to account for spectral smearing. Then the ifft command was used to return to the time domain.

White Noise

A second noise source that is inevitably present in the data is white noise from the experimental setup. With a 20 kHz sampling rate, the noise bandwidth of the setup is roughly 10 kHz. Applying signal theory to a generic spike train from this experiment, one can conclude that 5 kHz of bandwidth is probably more than enough for the timing resolution needed to train the HMMs; this choice of bandwidth yields on the order of 0.2 ms resolution in spike timing. By low-pass filtering (LPF) the signal, white noise power can be removed without significantly affecting the spectral components that define the spike train. This was again implemented using the fft and ifft commands.

Low-Frequency Movements

Slow movements in the underlying waveform that are not due to noise can also detrimentally affect the threshold detection algorithm. This type of movement can be seen immediately following the initial stimulus, where there is some residual decaying offset from the transient input. These movements can also be seen when a series of spikes occur close to each other in time. To counter this effect a high-pass filter (HPF) was also applied to the data, with an empirical cutoff set near 300Hz.
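The three cleansing steps above (notching out deterministic spectral peaks, low-pass filtering at 5 kHz, and high-pass filtering near 300 Hz) can all be expressed as a single FFT-domain mask. The sketch below is a NumPy analogue of the MATLAB fft/ifft approach; the peak threshold and notch bandwidth are illustrative assumptions, not the values used in the original scripts:

```python
import numpy as np

def spectral_cleanse(x, fs, lpf=5000.0, hpf=300.0, peak_thresh=None, notch_bw=5.0):
    """FFT-domain cleansing: band-limit the signal to [hpf, lpf] Hz and
    optionally zero out narrow bands around large spectral peaks."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    keep = (f >= hpf) & (f <= lpf)           # combined HPF + LPF mask
    X[~keep] = 0.0
    if peak_thresh is not None:
        peaks = np.flatnonzero(np.abs(X) > peak_thresh)
        for p in peaks:                      # notch out each peak +/- notch_bw
            X[(f >= f[p] - notch_bw) & (f <= f[p] + notch_bw)] = 0.0
    return np.fft.irfft(X, n=len(x))

# 60 Hz interference alone; the 300 Hz HPF removes it entirely
fs = 20000
t = np.arange(fs) / fs
hum = 0.2 * np.sin(2 * np.pi * 60 * t)
clean = spectral_cleanse(hum, fs)
print(np.max(np.abs(clean)) < 0.01)  # True: the hum is below the HPF cutoff
```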

The spectrum of the waveform of Figure 3 is shown in Figure 4, before and after spectral cleansing. The deterministic noise was actually relatively small for this particular example and therefore no significant spectral peaking is seen in the original spectrum. However, other pins did show more significant peaking, making this step necessary.

 

Figure 4: Spectrum of raw and cleansed data.

 

The result of applying these spectral filters is apparent in the time domain as well, as seen in the first two panes of Figure 5. The top pane shows the original signal, and the second pane the filtered signal. The periodic oscillations after 40 ms have clearly been eliminated, as has much of the low-frequency movement near spikes. The oscillations in this case were eliminated by the HPF step, since the original noise is dominated by 60 Hz interference.

After the waveform has been prepped, the final steps are to calculate the standard deviation σ of the noise in the cleansed waveform, set a threshold for spike detection (the empirical value used was ±3.2σ), and then group deviations that occur very close in time into single spiking events. The results of these steps are shown in the bottom two panes of Figure 5. The third pane is a binary vector that is "1" at any point in time where the signal exceeds ±3.2σ, and the fourth is a binary vector of extracted spikes (for details on this part of the implementation, see the script SpikeExtract_Mult.m).
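A minimal sketch of the thresholding and grouping steps (NumPy rather than the original MATLAB; the refractory window used for grouping, and estimating σ directly from the whole trace, are illustrative assumptions — see SpikeExtract_Mult.m for the actual implementation):

```python
import numpy as np

def extract_spikes(x, k=3.2, refractory=20):
    """Threshold at +/- k standard deviations of the cleansed trace and
    merge crossings closer than `refractory` samples into single events.
    Note: sigma here includes the spikes themselves; a robust noise
    estimator could be substituted."""
    sigma = np.std(x)
    crossings = np.flatnonzero(np.abs(x) > k * sigma)
    spikes = np.zeros(len(x), dtype=bool)
    last = -refractory - 1
    for c in crossings:
        if c - last > refractory:  # new event, not part of the previous one
            spikes[c] = True
        last = c
    return spikes

rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(2000)
x[500:504] += 1.0   # one spike-like deviation spanning a few samples
x[1200] -= 1.0      # a second, isolated deviation
ev = extract_spikes(x)
print(np.flatnonzero(ev))  # [ 500 1200]: adjacent crossings grouped into one event
```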

 

Figure 5: Original and cleansed signals, standard deviation detection, and extracted spike train.

 

HMM Application

A Hidden Markov Model (HMM) is a probabilistic model of time series data. A Markov process is one in which the state at time t+1 depends only on the state at time t. An HMM extends this basic principle by including hidden states, which are not directly observable, as depicted schematically in Figure 6.

 

Figure 6: Hidden Markov Model (HMM) Schematic

 

The key parameters of the model are: S, the sequence of hidden states; O, the sequence of observables; p, the stochastic vector of starting-state probabilities; P, the stochastic matrix of hidden-state transitions; and Q, the stochastic matrix of output probabilities given the current hidden state. Optimal parameter values are found using a version of the Expectation-Maximization algorithm (Dempster et al. 1977).

 

 

Name | Description | Definition
-----|-------------|-----------
S | The state sequence through which the model progresses | St = state at time t; n = number of possible states
O | The observation sequence of the model | Ot = observation at time t; x = number of distinct observables
p | A stochastic n-vector containing the probabilities of starting in each state | pi = P(S1 = i)
P | A stochastic n-by-n matrix of the probabilities of transitioning between states | Pij = P(St+1 = j | St = i)
Q | A stochastic n-by-x matrix of the probabilities of outputting each observable given the current hidden state | Qij = P(Ot = j | St = i)

Table 1: HMM Parameters
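Given concrete values for these parameters, generating a template of activity amounts to sampling from the model: draw a hidden state path from p and P, then draw each observation from the corresponding row of Q. A minimal NumPy sketch with illustrative toy parameter values:

```python
import numpy as np

def sample_hmm(pi, P, Q, T, rng=None):
    """Draw a hidden state path and observation sequence of length T
    from an HMM with parameters (pi, P, Q)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(pi)
    states = np.empty(T, dtype=int)
    obs = np.empty(T, dtype=int)
    states[0] = rng.choice(n, p=pi)
    for t in range(T):
        if t > 0:  # transition according to the previous state's row of P
            states[t] = rng.choice(n, p=P[states[t - 1]])
        obs[t] = rng.choice(Q.shape[1], p=Q[states[t]])  # emit from Q
    return states, obs

# Toy 2-state, 2-symbol model with "sticky" transitions
pi = np.array([1.0, 0.0])
P = np.array([[0.99, 0.01], [0.01, 0.99]])
Q = np.array([[0.7, 0.3], [0.3, 0.7]])
s, o = sample_hmm(pi, P, Q, 200, rng=np.random.default_rng(0))
print(len(s), len(o))  # 200 200
```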

 

Our application of the HMM framework closely followed Seidemann et al., who modeled data from six electrodes implanted in the prefrontal cortex of monkeys while the animals performed behavioral tasks. In that application, the experimenters had the advantage of being able to correlate the HMM state transitions with both stimuli and behavioral events. Although we had no behavioral data with which to compare our models, we were able to obtain qualitative confirmation that our models produced state transitions correlated with broad shifts in activity patterns (see, for example, the movie below).

Parameter Initialization

The choices for the number of hidden states and the initial parameter values are crucial to the success of any application of the HMM framework. In practice, this is often performed through a combination of the judicious use of domain-specific knowledge, direct observation of the data, and repeated execution of the training algorithm to find a combination that works.

            For the number of hidden states, we tried values ranging from 2 up to about 8. It should be noted that adding states always improves the achievable likelihood: in the limit, perfect agreement can be reached between model and data by allowing a different state at every time step, and the likelihood measure does not in any way penalize increasing the number of states. As discussed shortly, the solution to this problem was to heavily bias the model toward remaining in the same state from one time step to the next. Once this condition was imposed, transitions became relatively rare and the number of states that were profitably used stabilized at 5; any additional states were visited only rarely, if at all. After this initial exploration, all subsequent models were created with five hidden states.

            The remaining decisions concerned the initial values to be used for the state transition matrix P, the output likelihood matrix Q, and the initial state vector p. Like Seidemann et al., we found that the initial configuration of the P matrix was crucial. In particular, it was necessary to initialize this matrix such that the chance of remaining in the same state from one time step to the next was very high. We obtained good results when the diagonal values of this matrix were initialized to values distributed around 0.99. This had the effect of starting the optimization algorithm in a region of the search space where state transitions occur only rarely. This initialization regime is justified because the neuronal network likely changes its overall activity pattern only rarely over the course of a 200 ms trial. If the P values are instead distributed uniformly, the resulting models change state at nearly every time step, and this extreme sensitivity in turn makes comparison between models impossible.
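The diagonal-heavy initialization described above can be sketched as follows (NumPy; the width of the jitter around 0.99 is an illustrative assumption):

```python
import numpy as np

def init_transition_matrix(n_states=5, diag=0.99, rng=None):
    """Build a row-stochastic initial P whose diagonal entries are
    distributed around `diag`, biasing the model toward self-transitions."""
    rng = np.random.default_rng() if rng is None else rng
    P = np.empty((n_states, n_states))
    for i in range(n_states):
        d = diag + 0.005 * (rng.random() - 0.5)  # jitter around 0.99
        off = rng.random(n_states - 1)
        off = (1.0 - d) * off / off.sum()        # spread the remaining mass
        P[i] = np.insert(off, i, d)              # place d on the diagonal
    return P

P = init_transition_matrix(rng=np.random.default_rng(1))
print(np.allclose(P.sum(axis=1), 1.0))  # True: each row is a distribution
```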

            Unlike the P matrix, we found that the initial values of the Q and p parameters had little effect on the HMMs produced. This makes sense, given the nature of our application. The hidden states have no intrinsic meaning, so there is no reason to believe that any one state will be more likely than another; therefore, the p vector can reasonably be initialized uniformly. Similarly, there is no a priori relationship between the output symbols (i.e., a spike on a given electrode) and any particular state. This means that the Q matrix can also be safely initialized with uniformly distributed values.

HMM Construction

In order to reduce the complexity of the experiments, we selected 6 pins whose stimulation appeared to produce the most spatially and temporally varied activity; all subsequent work was confined to these pins (pins 2, 31, 39, 42, 47, 50). It should be noted that these pins were selected before any models were generated or experiments performed.

Once the state count and initial parameter values were specified as above, HMMs were generated. For each selected pin, multiple HMMs were trained from different randomly generated initial values. Because the learning algorithm performs only local optimization, each of these initializations produced a model with a different log likelihood. The model with the highest log likelihood was selected, and all subsequent experiments were performed using these chosen models. This process resulted in a set of six HMMs, one for each pin. The state machine represented by an example HMM is shown in Figure 7.

 

Figure 7: An example of a state machine produced by an HMM, along with the corresponding state transition matrix P.

Results

Once the "optimal" HMM λj has been trained for each stimulus pin j, several questions can be answered. The most pertinent is whether HMMs are actually an appropriate framework for modeling the dynamics of the MEA response. To determine whether this is the case, the following steps were performed:

 

1.     A new observation sequence Onew resulting from a stimulus on pin j was collected. This observation was not used during the training of the HMM λj.

 

2.     One question that HMMs are well-suited to answer is the probability that the actual state at time t, St, equals state i, given an observation sequence O and an HMM λ (Rabiner 1989). This can be written more compactly as P(St = i | O, λ). These probabilities were calculated at each time step with O = Onew and λ = λj.

 

3.     The resulting probabilities of residing in each state at each time step were plotted. An example from applying this procedure to pin 2 is shown below in Figure 8.

 

Figure 8: Probability of HMM λ2 residing in a given state at each time step, based on the observation sequence Onew.
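The posterior probabilities P(St = i | O, λ) plotted in Figure 8 are computed with the forward-backward algorithm (Rabiner 1989). A self-contained NumPy sketch on a toy two-state model (the parameter values are illustrative, not taken from the trained MEA models):

```python
import numpy as np

def state_posteriors(obs, pi, P, Q):
    """P(S_t = i | O, lambda) for every t, via the forward-backward
    algorithm with per-step normalization for numerical stability."""
    T, n = len(obs), len(pi)
    alpha = np.zeros((T, n))
    beta = np.zeros((T, n))
    alpha[0] = pi * Q[:, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                       # forward pass
        alpha[t] = (alpha[t - 1] @ P) * Q[:, obs[t]]
        alpha[t] /= alpha[t].sum()
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):              # backward pass
        beta[t] = P @ (Q[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta                        # unnormalized posteriors
    return gamma / gamma.sum(axis=1, keepdims=True)

# Two "sticky" states that prefer different output symbols
pi = np.array([0.5, 0.5])
P = np.array([[0.99, 0.01], [0.01, 0.99]])
Q = np.array([[0.9, 0.1], [0.1, 0.9]])
g = state_posteriors([0, 0, 0, 1, 1, 1], pi, P, Q)
print(g[0].argmax(), g[-1].argmax())  # 0 1: the inferred state tracks the symbols
```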

 

Associated with each of the five states of λ2 is a vector defining the probability of a spike occurring at a given MEA pin (the Q matrix is composed of these vectors). The firing probabilities of each pin in each of the possible states are depicted graphically in Figure 9.

 

Figure 9: Firing probabilities of each pin in each state.

 

Using these two pieces of information, a visual comparison of how well the model λ2 predicts the MEA response can be constructed. The linked animation (if you are not using a Windows machine, try the raw version) shows the actual MEA response on top and the predicted response from the HMM λ2 on the bottom. As time advances, the bottom window shows the present state of the HMM and the predicted spatial probability of response over the array in that state. Comparing the regions of activity in the actual MEA response with those in the predicted response shows a strong correlation between the two; the HMM appears to model the dynamics of this response well. The HMM framework also allows us to quantify how well the model fits the data, though not in an absolute sense. This topic is addressed further below.

The other useful question that HMMs are adept at answering is the following: how likely is it that an observation sequence O was produced by a given HMM λj? Mathematically, the probability of interest is P(O|λj). This probability is useful in the MEA experiment in the following way. Suppose an observation sequence O from the array is available, but the pin that was stimulated to elicit the response is unknown. Given the trained HMMs λj, 1 ≤ j ≤ 6, the probabilities P(O|λj) can be easily evaluated, and max{P(O|λj) : 1 ≤ j ≤ 6} predicts which pin was most likely stimulated to produce the given O. Evaluating these probabilities for six fresh observation sequences (one per stimulus pin; none used for HMM training) yielded the results in Figure 10. Each plot in this figure shows the log likelihood that a given observation sequence O was produced by each λj. The most likely HMM to have produced each observation sequence is shown in red. In each case, the HMMs correctly predict which pin was stimulated based only on the observed MEA response.
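This classification step reduces to scoring the sequence under each model with the forward algorithm and taking the argmax of the log likelihoods. A toy NumPy sketch (the two illustrative "models" below simply favor different output symbols; real models would come from training):

```python
import numpy as np

def log_likelihood(obs, pi, P, Q):
    """log P(O | lambda) via the scaled forward algorithm."""
    alpha = pi * Q[:, obs[0]]
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ P) * Q[:, o]
        ll += np.log(alpha.sum())  # accumulate the log of each scale factor
        alpha /= alpha.sum()
    return ll

def classify(obs, models):
    """Pick the stimulation pin whose HMM assigns the highest log likelihood."""
    scores = {pin: log_likelihood(obs, *m) for pin, m in models.items()}
    return max(scores, key=scores.get)

# Two toy models (hypothetical pins 2 and 31) with opposite output preferences
pi = np.array([1.0, 0.0])
P = np.array([[0.95, 0.05], [0.05, 0.95]])
models = {2:  (pi, P, np.array([[0.8, 0.2], [0.2, 0.8]])),
          31: (pi, P, np.array([[0.2, 0.8], [0.8, 0.2]]))}
print(classify([0, 0, 0, 0, 1], models))  # 2: the symbol-0-heavy model wins
```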

This final demonstration is akin to the task normally given to HMMs in speech recognition. While these results show 100% accuracy, only one observation sequence per pin was tested; with proper training on larger data sets, we would nonetheless expect accuracy to remain on the order of 90%.

Figure 10: Most likely HMM to produce a given observation sequence.

Discussion and Conclusion

HMMs have been shown to be capable of 1) generating putative activity patterns which resemble actual neuronal activity for a given stimulus condition, and 2) discriminating between different stimulus conditions based upon the invoked activity patterns. These results suggest that the waves of activity which follow the stimulation of individual electrodes are both highly stereotyped for a certain stimulation location and strongly dependent on the identity of that location. This, in turn, suggests that the cultured network fires in reliable patterns and produces highly repeatable waves of activity.

            Important questions remain, however. In particular, it is not immediately obvious how to interpret the "meaning" of the hidden states produced by each model. As seen in Figure 9, each state corresponds to a particular firing probability for each electrode. This partitioning is not transparent, and each run of the training algorithm produces a set of states with widely diverging firing probability distributions. Because no repeatable division of labor between states emerges across runs, it is difficult to interpret the meaning of the states. It may be fruitful to combine information about the dynamic properties of the networks with the constraints on their connectivity produced by the other project groups. This may allow for an interpretation of the dynamic behavior that is more closely related to the structure of the connections between the cells.

            HMMs are a promising tool for the analysis of multichannel data from cultured cells. The framework can be applied with a minimum of domain-specific assumptions, and it succeeds in classifying complex activity patterns. Based on these preliminary results, the continued application of HMMs to MEA data is warranted.

 

References

  1. Dempster AP, Laird NM, Rubin DB (1977) Maximum Likelihood from Incomplete Data via the EM Algorithm. J Roy Stat Soc 39(1):1-38
  2. Murphy K (2003) Hidden Markov Model (HMM) Toolbox for Matlab. http://www.ai.mit.edu/~murphyk/Software/HMM/hmm.html
  3. Rabiner LR (1989) A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proc IEEE 77(2):257-286
  4. Seidemann E, Meilijson I, Abeles M, Bergman H, Vaadia E (1996) Simultaneously Recorded Single Units in the Frontal Cortex Go through Sequences of Discrete and Stable States in Monkeys Performing a Delayed Localization Task. J Neurosci 16(2):752-768
  5. All data was acquired by Nathan Wilson

Related MATLAB Scripts

 

Data Cleanup and Spike Extraction

 

DataCleansing.m & DataCleansing_Mult.m

 

These scripts are used to examine the effects of filtering on a single pin's spike extraction and spectrum, and were used to create the data presented above. The two versions are for single- and multiple-trial data.

 

SpikeExtract.m & SpikeExtract_Mult.m

 

These are non-graphical versions of the cleansing algorithms with code built around the filtering shell to handle batch processing and extraction of entire data sets from both single and multiple trial data.

 

Decimate.m

 

This script is used to reduce the effective sampling rate of the data if further data compression is desired.

 

visualize.m (courtesy of Jennifer Wang)

 

This script can be used to make movies of array activity, as well as simply visualize array activity (for visualization only, Spk_animate.m is preferred).

 

 

Spk_animate.m

 

This script is used to visualize (flicker-free) spiking activity over the entire array without consuming large amounts of system memory (unlike visualize.m).

 

Rasterize.m & Rasterize_Mult.m

 

These two scripts are used to generate raster plots of the array response using single or multiple trial data sets.

 

Hidden Markov Model Training and Manipulation

 

HMM_train_Mult_2.m

 

This script, in conjunction with the MATLAB HMM toolbox, performs training of HMM models on multiple-trial spike data.

 

IteratePin.m

 

This script was used in conjunction with HMM_train_Mult_2.m to randomly search the available parameter space and find the most likely HMM model to fit a set of multiple-trial spike data.

view_HMM.m

 

This script is used to view the relevant parameters of an HMM model that has been trained using the above two scripts. It was used to generate the plots of Figure 8 and Figure 9.

Extract_Observ.m

 

This script is used when the observation arrays of a set of multiple-trial data are needed.

 

Division of Labor

The division of labor was complex, and in general we felt as if we contributed equal effort. Micah performed the filtering and spike extraction. In addition, his Matlab fu was more powerful so we ended up primarily using his scripts. Beau performed some initial cross-correlation analyses and also acquired the stimulation data from Nathan when it became apparent that the spontaneous data would not meet our needs. Other than that, we worked very closely, both in serial and in parallel, to complete the analysis and the presentation/writeup.