MEA Data Analysis: Information Theoretic Approach

9.290 Midterm Project Report, Spring 2004

Jake Bouvrie and Michael Baker

 

Abstract

Given recordings of neural network spike activity from a 64 channel microelectrode array, we take an information theoretic approach towards inferring connections between small sub-networks of neurons situated within the array. We compare and contrast two distinct methods for discovering such connections: one in which distributions of pairwise channel arrival times are chosen to satisfy a minimum entropy criterion, and another in which we quantify and evaluate the amount of information each node of the array conveys about the others by way of mutual information maximization. We then discuss the resulting connection possibilities, as well as the validity of the connection hypotheses, and finally justify the use of information theory as an attractive alternative to traditional cross-correlation analysis.

 

Introduction

The mammalian nervous system is characterized by a multitude of elements, coupled through their electrochemical signals.  To better understand this system we wish to begin to answer questions concerning how such networks of neurons form, how information is represented in such networks, and how information flows through the network. 

Several techniques have been used for measuring the cellular potential of neurons.  The patch-clamp method permits exploration of the intracellular potential of an individual neuron.  This micro-pipette method has been extended to record from multiple neurons so that the interconnections between them can be determined [7].  Since the method requires individual micro-pipettes to be positioned, it is not typically suitable for recording from a large number of neurons.  Furthermore, it would be difficult to apply patch-clamp techniques to study the development of new connections among a group of dissociated neurons. 

The Multi-Electrode Array (MEA) allows an experimentalist to observe the extracellular potential across a relatively large area.  The array used in this work was 1 mm × 1 mm with 64 electrodes on an 8 × 8 grid.  The large area afforded by the array means the properties of complexes of many neurons can be studied.  Because the measurements are extracellular, potentials from several neurons can contribute to the measured potential at any one electrode.  

In vitro observations of dissociated neurons indicate that a new network of neuronal connections establishes itself within several days of preparing the culture.  The output of the electrodes exhibits spontaneous firing behavior.  Although the exact properties of this new network cannot be completely described by the activity of the channels, several things can be inferred.  First, if connections can be drawn between the electrodes, some implications concerning the physical distances over which connections form can be made.  Second, if it can be established that several channels are causally described by the activity of one other channel, then we can begin to infer the number of connections made. 

We have chosen to draw these connections and infer causality using information theoretic methods.  Information theory is well suited to this task because it applies regardless of our particular choice of random events and provides a quantifiable measure of our certainty, the entropy.

Prior Work

 

The use of information theory as a tool for analyzing neuronal data is nothing new. Several authors [2,3] have applied information maximization in the context of blind source separation, where spike activity from several neurons has been recorded simultaneously and must be separated (or “sorted”) into spikes specific to each neuron on the basis of unique spike signatures. Other investigators [1,4,5] have calculated the mutual information between a known stimulus and the corresponding neuron responses in order to determine the rate of information transfer, while Chechik [6] has investigated communication between a group of neurons firing into a single destination neuron and examined the corresponding information transfer properties of such a configuration.

 

Our approach, however, represents a departure from much of the established work. Because spike data recorded from a given MEA electrode may include spikes from an unknown number of neurons, we cannot match specific spikes to specific pre- and post-synaptic neurons with known spatial locations. Many of the papers above made use of patch-clamp recordings in which the investigators were certain of the source and destination of each spike. In addition, our MEA spike data does not represent a response to a known applied stimulus, making it impossible to compute information transfer rates or symbol efficiencies. We therefore stress that our results must be interpreted with some degree of flexibility; they are not “hard truths” but high-probability suggestions, and the presented analysis is truly “exploratory” in nature.

Spike Extraction from Noisy Data: Methods

By examining the distribution of voltages present in the noisy signal, we can determine several properties relevant to extracting spike symbols. 

Noise Properties

Figure 1 shows a representative histogram of the noise from a channel.  Note that the histogram shape is approximately Gaussian, consistent with our expectation that noise arising from the superposition of many shot processes (binomial in origin) is well approximated by a Gaussian. 

Figure 1: A plot of the histogram of voltages on channel 1 for bic1_file1.  Note the Gaussian fit with σ = 0.003 V.

Knowing a rough distribution for the noise can assist us in determining a reasonable threshold above which data points are likely not associated with noise. 
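To make this concrete, the following Python sketch (not the original analysis code) estimates the noise spread of one channel and derives a candidate detection threshold; the trace name v, the simulated data, and the use of a robust median-based sigma estimate are our own illustrative assumptions.

```python
import numpy as np

# Hypothetical raw trace for one channel (volts); in practice this would be
# loaded from the MEA recording rather than simulated.
v = 0.003 * np.random.randn(200000)

# Robust estimate of the noise sigma: the median absolute deviation is
# barely affected by the rare spike samples mixed into the trace.
mu = np.median(v)
sigma = 1.4826 * np.median(np.abs(v - mu))

# Points beyond a few sigma are unlikely to be noise (e.g. a 5-sigma rule).
threshold = mu + 5.0 * sigma
print(sigma, threshold)
```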

Spike Properties

A number of distinct spike shapes are expected in an MEA recording, corresponding to several different locations on the spiking neurons [8].  Examining the channels, we observe many common spike shapes.  Figure 2 shows 5 representative spikes from our data, overlaid to show their general similarities. 

Figure 2: Spike shapes overlaid to show similarities.  The x-axis is in samples of 50 µs.  Peak amplitudes have been normalized to 1.

As we will show in the next section, knowing the shape of the spikes is useful in detecting spikes. 

Detecting Signals in Noise

Consider the case of a periodic signal, such as a sinusoid, in noise which we wish to detect.  An easy way to automate detection is to employ a comparator that is sensitive to a voltage level.  Since a sinusoid is not a constant voltage level, we can convert it to one: multiplying our noisy signal, xn(t) = s(t) + n(t), by our detector signal, s(t) = A sin(ωt), extracts a constant component,

s(t) xn(t) = A² sin²(ωt) + s(t) n(t) = A²/2 - (A²/2) cos(2ωt) + s(t) n(t).

If we filter out the high-frequency components, we are left with a constant voltage level, A²/2.  This approach is used for both analog and digital radio receivers with amplitude modulation schemes.  Not surprisingly, the same receiver principle can be applied to finding non-periodic signals in noise by means of a matched filter.
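A minimal numerical sketch of this multiply-and-lowpass principle follows; the sample rate, tone frequency, amplitude, and noise level are illustrative choices, not values taken from the recordings.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs, f0, A = 20000.0, 100.0, 1.0              # sample rate, tone frequency, amplitude
t = np.arange(0.0, 1.0, 1.0 / fs)
s = A * np.sin(2 * np.pi * f0 * t)           # detector signal s(t)
xn = s + 0.3 * np.random.randn(t.size)       # noisy observation x_n(t)

product = xn * s                             # A^2/2 - (A^2/2) cos(2*w*t) + noise terms
b, a = butter(4, 50.0 / (fs / 2.0))          # low-pass well below the 2*f0 component
dc = filtfilt(b, a, product)

print(dc.mean())                             # approximately A**2 / 2 = 0.5
```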

The matched filter is a time series that is convolved with the input signal to produce a maximal output when a symbol event occurs.  It can be shown that, for a given symbol shape x[n], the SNR-maximizing filter is a time-reversed copy of x[n]; equivalently, the detector correlates the input with x[n] itself.  We can compute an estimate of x[n] by computing a representative spike shape.  Figure 3 shows the filter used for this purpose.

Figure 3: A representative spike shape used as the matched filter, x[n].
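The sketch below illustrates this matched-filter step under our assumptions; template stands in for the representative spike shape of Figure 3 and v for the raw channel trace, and both names (as well as the toy data) are placeholders rather than variables from the actual analysis.

```python
import numpy as np

def matched_filter(v, template):
    """Correlate the trace with the (normalized) spike template.

    Correlating with the template is equivalent to convolving with its
    time-reversed copy, which maximizes output SNR for that spike shape.
    """
    t = template - template.mean()
    t = t / np.linalg.norm(t)
    return np.correlate(v, t, mode="same")

# Toy inputs: a noisy trace with one embedded spike-like template.
template = np.exp(-0.5 * ((np.arange(40) - 20) / 4.0) ** 2)
v = 0.05 * np.random.randn(5000)
v[1000:1040] += template

y = matched_filter(v, template)
print(int(np.argmax(y)))   # near sample 1020, the centre of the embedded spike
```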

Spike Extraction from Noisy Data: Results

We have no a priori information about the probability distribution of spikes in our data.  We do, however, have some knowledge about the nature of the noise in our system and its statistics. 

Note that the operation of the matched filter on the data preserves the Gaussian character of the noise: because the convolution takes a weighted sum of Gaussian random variables, the output at any sample is again Gaussian.  The distribution mean changes, however, because the sum of the weights in the matched filter is not zero. 

To extract the spike events, we look at the peak levels in the filtered data channel and decide membership in either the spike distribution or the noise distribution.  Figure 4 shows the threshold chosen from the distributions fit to the noise and spike peaks.  The vertical blue line indicates the level at which the two weighted distributions are equal, corresponding to the maximum a posteriori (MAP) boundary for deciding membership in either set.

Figure 4: A histogram of extracted peak amplitudes after the matched filter is applied.  Note that the Gaussian noise is integrated by the filter and remains Gaussian.  A threshold, shown as a vertical line, is used to distinguish the two distributions.
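A small sketch of this MAP threshold choice is given below. The two-Gaussian model follows the description above, but the means, sigmas, and priors are placeholders; in the analysis they would be fit to the peak-amplitude histogram of Figure 4.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

# Two-component model of the filtered peak amplitudes (placeholder values).
mu0, sigma0, p0 = 0.00, 0.05, 0.90    # noise peaks (prior p0)
mu1, sigma1, p1 = 0.60, 0.15, 0.10    # spike peaks (prior p1)

def posterior_gap(x):
    # Zero where the two weighted densities are equal: the MAP boundary.
    return p0 * norm.pdf(x, mu0, sigma0) - p1 * norm.pdf(x, mu1, sigma1)

threshold = brentq(posterior_gap, mu0, mu1)   # root lies between the two means
print(threshold)
```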

Determining Causal Relationships: Methods

Causal relationships among channels can help us understand the emergent neural network in several ways.  First, it indicates the possibility of connection between groups of neurons and the distances in the array over which this occurs.  Second, causal relationships can be useful as indications of hidden sources of excitation which are represented in a common way across the electrodes. 

Measuring channel-wise entropy is an appropriate way to capture such relationships because it quantifies our overall certainty about the presence of a particular condition.  It is essential, however, that the condition we seek to identify be clearly represented in our choice of random variable.

The chosen variable must capture the shared timing information between channels; the variable used in this work is illustrated in Figure 5.  Consider the arrival time of a spike on channel j after a spike on channel i is observed.  If we have near certainty about this arrival time, we have a possible causal connection. 

Figure 5:  Random variable, X, shown as the arrival time on some channel j after a spike on channel i.

Estimating the distribution in each case was hampered by the lack of extensive data on some channels.  As a result, we experimented with approximation methods to obtain smoother distributions.  Ultimately, we discarded implausible event times and binned each distribution into 32 bins.  We then computed the entropy of every pairwise channel distribution.
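As a sketch of how such a pairwise arrival-time distribution could be built (the function name, the 15 ms cutoff used here, and the assumption that spike times are available as sorted arrays are ours, not the original code):

```python
import numpy as np

def arrival_time_distribution(ti, tj, max_delay=0.015, n_bins=32):
    """Binned distribution of the delay from each spike on channel i (times ti,
    sorted, in seconds) to the next spike on channel j (times tj, sorted).
    Delays longer than max_delay are discarded as implausible."""
    delays = []
    for t in ti:
        later = tj[tj > t]
        if later.size and later[0] - t <= max_delay:
            delays.append(later[0] - t)
    hist, _ = np.histogram(delays, bins=n_bins, range=(0.0, max_delay))
    total = hist.sum()
    return hist / total if total > 0 else hist.astype(float)
```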

The measure of entropy is given by

H(X) = - Σ_{x ∈ A_X} P(x) log2 P(x),

where A_X denotes the set of possible outcomes of X.

For a distribution P(X) in which a particular outcome in A_X is almost assured, H(X) is very low.  Conversely, distributions in which P(X) is roughly uniform over A_X have high entropy.  Entropy is measured in bits, where each bit can be thought of as the outcome of a fair coin flip [3].
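A minimal sketch of this entropy computation for a binned distribution (the function name is illustrative):

```python
import numpy as np

def entropy_bits(p):
    """Entropy in bits of a normalized, binned distribution (0 log 0 := 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

print(entropy_bits([1.0] + [0.0] * 31))   # 0.0 bits: the outcome is certain
print(entropy_bits([1.0 / 32] * 32))      # 5.0 bits: uniform over 32 bins
```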

We computed the entropy of this random variable for every channel pair, and sorted the results to find the strongest causal connections following the rule

C(z) = arg min_i H(Xiz),

where Xiz is the arrival-time variable with source channel i and target channel z.

If we accept the connection C(z) only when the corresponding arrival-time entropy falls below 1 bit, we are choosing connections where, more often than not, the pattern of activity on the target channel is a delayed version of the activity on the source channel.  Thus, we can infer how information moves across the set of electrodes over the duration of the experiment.
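The following sketch shows one way this selection rule could be implemented, under the assumption that a matrix H holds the pairwise arrival-time entropies (H[i, z] for source i and target z); the names and structure are ours, not the project code.

```python
import numpy as np

def min_entropy_connections(H, cutoff_bits=1.0):
    """For each target channel z, pick the source channel i minimizing the
    arrival-time entropy H[i, z]; keep the pair only if it beats the cutoff."""
    n = H.shape[0]
    connections = {}
    for z in range(n):
        column = H[:, z].astype(float).copy()
        column[z] = np.inf                       # exclude self-connections
        i = int(np.argmin(column))
        if column[i] < cutoff_bits:
            connections[z] = (i, float(column[i]))   # (source, entropy in bits)
    return connections
```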

Determining Causal Relationships: Results

The sorted entropies for the channel pairs exhibited some expected properties.  First, we found no connection stronger than about 0.6 bits, i.e., no pairwise arrival-time distribution with entropy below roughly 0.6 bits.  This means that under most conditions we do not have complete certainty about the causal relationships in the data.  There are notable exceptions, however, where some channels exhibited almost concomitant firing behavior during certain periods.  Figure 6 shows an example of this behavior for two channels.

Figure 6: Channels where firing occurred almost concomitantly.

The remaining connections stronger than 2 bits (i.e., with arrival-time entropy below 2 bits) are shown as a directed, weighted graph in Figure 7.  Note that there are several super-nodes (27, 62, 57) which appear causally related to several other nodes.  Underlying these may be connections that are exciting activity across the network.  Furthermore, there are other nodes (49, 64) which appear causally related to more than one of these super-nodes.

Figure 7: A directed graph showing the results of the entropy minimization.

It is worth noting that long-range connections have been established between nodes spanning a distance of approximately 1 mm. 

Measuring Mutual Information: Methods

 

As a second approach to uncovering connectivity within the array, we investigated a form of mutual information between pairs of electrodes. Before describing this approach, we first offer a brief explanation of mutual information, and its applicability to MEA data.

 

Generally speaking, mutual information quantifies the information one random variable tells us about another. Seen in the context of array data, mutual information measures the reduction in our uncertainty about one channel given knowledge of another channel’s activity. Thus, for two random variables X and Y, we can write the mutual information I between the two in terms of the conditional entropy:

I(X;Y) = H(X) - H(X|Y),

with I(X;Y) = I(Y;X) and I(X;Y) non-negative, and where we define H(X|Y) as

H(X|Y) = - Σ_{x,y} P(x,y) log2 P(x|y).
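As a sanity check on these definitions, a short sketch computing I(X;Y) = H(X) - H(X|Y) from a joint probability table (the function name and toy table are illustrative):

```python
import numpy as np

def mutual_information_bits(Pxy):
    """I(X;Y) = H(X) - H(X|Y) from a joint table Pxy (rows: x, columns: y)."""
    Pxy = np.asarray(Pxy, dtype=float)
    Px = Pxy.sum(axis=1)
    Py = Pxy.sum(axis=0)
    hx = -np.sum(Px[Px > 0] * np.log2(Px[Px > 0]))
    Py_safe = np.where(Py > 0, Py, 1.0)            # avoid division by zero
    Px_given_y = Pxy / Py_safe[np.newaxis, :]
    mask = Pxy > 0
    h_x_given_y = -np.sum(Pxy[mask] * np.log2(Px_given_y[mask]))
    return hx - h_x_given_y

# Example: X and Y perfectly coupled over two outcomes gives exactly 1 bit.
print(mutual_information_bits([[0.5, 0.0], [0.0, 0.5]]))
```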

In a significant fraction of the literature concerning the application of information theory to neural spike data, mutual information is computed between a known stimulus and the corresponding neuronal response (see for example, [1]). In an attempt to cast our exploratory data analysis problem within this framework, we consider as the “stimulus” spike activity found on one given MEA channel, and as the “response”, spike activity on another. Taking all possible pairs of channels, we compute the resulting mutual information and extract possible connections under the assumption that spikes from groups of neurons under one electrode might cause (or at least contribute to) spiking at another.

 

More formally, we first define the “distribution” Pi(Y) to be the probability of a spike occurring in channel i at time Y=t, computed as a spike-triggered average over a Gaussian time window (here chosen to be 40 ms). An example of such an average is shown for channel 26 of the array in Figure 8.

 

Figure 8: Probability Pi(Y=t) of spiking over a range of times t in milliseconds.
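A sketch of how such a smoothed spiking probability could be formed follows. Treating the 40 ms window width as roughly two standard deviations of the Gaussian, and the names spike_times and t_grid, are our assumptions for illustration.

```python
import numpy as np

def spike_probability(spike_times, t_grid, window_s=0.040):
    """Smoothed spiking probability P_i(Y=t): a Gaussian window is centred on
    each spike time and the result is normalized to sum to one.
    (Interpreting the 40 ms window as ~2 standard deviations is an assumption.)"""
    sigma = window_s / 2.0
    density = np.zeros_like(t_grid, dtype=float)
    for s in spike_times:
        density += np.exp(-0.5 * ((t_grid - s) / sigma) ** 2)
    total = density.sum()
    return density / total if total > 0 else density

t_grid = np.arange(0.0, 1.0, 0.001)                    # 1 s of time at 1 ms steps
p = spike_probability(np.array([0.2, 0.45, 0.8]), t_grid)
```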

 

Next, we define the distribution Pj(X) to be the probability of observing the spike interval time X=x (ms) in channel j, as illustrated in Figure 9. These per-channel distributions of spike arrival times look strikingly like those of a Poisson process; however, we construct them by histogramming the observed times from the spike data and do not assume any particular parametric form.

 

Figure 9: Spike arrival times for a given channel j are collected to form the distribution Pj(X).

 

Lastly, we define the conditional distribution Pij(X|Y) to encode the probability that a spike in channel j occurs X=x milliseconds after a spike at time Y=t in channel i. Note that this is the same distribution defined in the previous approach for each pairwise channel random variable Xij, although we now reinterpret the reference spike as a conditional stimulus. In practice, as with the previous distributions, we estimated it from histograms of observed timing in the recorded data.

 

Given the above definitions, we may now build a joint distribution over “reference” spikes in one channel and the presumed resulting spikes in another by taking the product of the conditional distribution and spike probabilities:

 

Pij(X,Y) = Pij(X|Y) Pi(Y).

 

Rather than compute this joint distribution explicitly, however, we simply use the above definition of conditional entropy to arrive at the mutual information calculation

Iij(X;Y) = H(X) - H(X|Y),

where H(X) is computed from Pj(X), and H(X|Y) from Pij(X|Y) weighted by Pi(Y).

Given Iij for all i and j (which range over the channels of the array), we may now infer connectivity by choosing the connection C(z) for a node z as the node which conveys the maximum amount of information about z:

C(z) = arg max_i Iiz.

The Iiz (over all i) may also be thresholded according to a knee in their sorted values (if present) or some descriptive statistic, in order to decide whether a connection is present at all. Alternatively, one might assume that multiple connections are physically likely, and interpret the mutual information quantities as a sort of confidence in each prospective connection.
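The selection rule and optional threshold could be realized as in the sketch below, assuming a matrix I with entries I[i, z] approximating the information channel i conveys about channel z; the names are illustrative.

```python
import numpy as np

def max_information_connections(I, threshold=None):
    """For each node z, pick the channel i with the largest I[i, z]; an
    optional threshold (e.g. from a knee in the sorted values) decides
    whether the connection is kept at all."""
    n = I.shape[0]
    connections = {}
    for z in range(n):
        column = I[:, z].astype(float).copy()
        column[z] = -np.inf                      # exclude self-information
        i = int(np.argmax(column))
        if threshold is None or column[i] >= threshold:
            connections[z] = (i, float(column[i]))
    return connections
```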

 

Finally, it should be noted that the above mutual information calculation is not, strictly speaking, mutual information, since it is not always the case that Iij(X;Y) = Iij(Y;X). Because the “distribution” P(Y) represents the instantaneous probability of seeing a spike in a particular channel, we average over all time to obtain the final information quantity and cannot simply “swap” P(X) with P(Y). Similarly, it is not always the case that Iij(X;Y) = Iji(X;Y), since different channels might have different arrival time distributions. In practice, the fact that Iij(X;Y) does not exactly equal Iij(Y;X) can be overlooked, and we force the matrix I of pairwise information with entries (I)ij to be symmetric in order to make the computation a “true” mutual information.
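One simple way to force this symmetry, sketched below under the assumption that I is a numpy array of pairwise information values, is to average the matrix with its transpose (the toy values are placeholders).

```python
import numpy as np

# Toy asymmetric information matrix; in the analysis, I[i, j] would hold the
# pairwise information value for channels i and j.
I = np.array([[0.0, 0.8, 0.1],
              [0.6, 0.0, 0.3],
              [0.2, 0.4, 0.0]])

I_sym = 0.5 * (I + I.T)    # average with the transpose to enforce symmetry
```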

 

Measuring Mutual Information: Results

 

In order to ensure that the various distributions described above could be estimated accurately, we chose to consider only those channels with more than 40 recorded spikes, and collected only biologically plausible spike interval times between 5 ms and 15 ms. We found it important to enforce realistic constraints on the time intervals used to build the various distributions in order to prevent spurious timing themes from overpowering other timing patterns and producing false connections (e.g., all nodes connecting to the same node). In addition, confining the analysis to this time window restricts the connections to those that are direct, and does not consider connections that travel through one or more intermediate neurons. A symmetric color-coded matrix (I)ij of mutual information values from the bic1 dataset is shown in Figure 10, where red represents high mutual information and blue low mutual information. The diagonal entries have all been set to zero (dark blue) to exclude trivial self-connections from consideration.

 

Figure 10: For each channel (row), the mutual information with each of the other channels (columns) is shown with red corresponding to higher mutual information, and blue low mutual information. The specific array channels (with more than 40 spikes) are denoted on the left.

 

Given such a matrix, the single strongest connection for each node was identified according to the procedure described under Methods above. In Figure 11, we show a handful of the strongest of these connections among array electrodes, and note that some of the connections found in the first (entropy minimization) approach are rediscovered within the mutual information framework, although the directedness of the graph has been lost due to the symmetry of mutual information.

 

Figure 11: A subset of the strongest connections is shown. Line thickness denotes relative strength, and we note that some of the connections were also found using the first approach.

 

Conclusion & Caveats

 

This project has shown two possible ways of analyzing multi-electrode array spike data within an information theoretic framework. We first proposed to examine distributions of spike arrival times across pairs of MEA channels, and uncovered network connectivity by looking for coherent interspike timing themes in the distributions of pairwise arrival times. The most convincing timing patterns were identified by choosing, for each channel, the distribution with minimum entropy. Our second approach quantified the amount of information each node in the array conveys about each of the others (in the form of mutual information), given distributions of pairwise arrival times, single-channel arrival times, and single-channel spike probabilities. The results of both methods were then used to propose connections between sub-networks of neurons in the array, giving directed and undirected graphs in the first and second approaches, respectively.

 

While both methods uncovered likely connections for each node, they clearly do not discover all connections, and at times they identify possibly spurious ones. This is due in part to the fact that there may be multiple neurons beneath a given electrode, which makes it impossible to clearly identify pre- and post-synaptic neurons. In addition, it is possible that some of the observed spike activity is due to internal sub-network communication and is not related to external sub-networks at all. Lastly, intermediate neurons cannot be identified from the spike timing: spikes may be traveling through multiple neurons rather than along direct paths, while other connections may be lost entirely when a post-synaptic neuron does not receive enough input to fire in response. We also entirely ignore possible inhibitory effects.

 

As an added complication, the neurons from which the spike data was recorded were not firing in direct response to a known stimulus. Thus, we have no time reference for activity, and no “ground truth” with which to compare our conclusions, other than physical (im)plausibility. Furthermore, the spike-promoting agent bic causes flurries of simultaneous spiking shortly after its application, and it is not clear when real cause/effect “communication” begins. Without bic, however, spontaneous spiking data over anything but very long time periods would be too sparse to accurately estimate the required distributions.

 

Despite these difficulties, the information theoretic approaches outlined in this project can at least shed some light on the organization of a cultured neural network. It is also worth noting that information theory can provide a viable alternative to, and explanation for, traditional correlation analysis. Explaining relationships between networks of neurons in terms of bits of information can be more compelling than finding and comparing peaks in hundreds of cross-correlation plots. In the case of binary data vectors, cross-correlations are also somewhat unreliable because, in order to detect coherent patterns, spikes must line up exactly. If the sampling rate is high, this is very unlikely to happen when it ought to, and many patterns might be lost. Estimating distributions of spike times, on the other hand, provides some flexibility or “fuzziness” in the search for timing themes. If a prominent spiking pattern is present in the data, then the distribution of arrival times ought to have comparably low entropy. We therefore argue that a probabilistic, information theoretic approach would in most cases be preferable to correlation analysis.

 

References

 

[1] P. Dayan & L.F. Abbott, Theoretical Neuroscience, Ch. 4, MIT Press, 2001.

 

[2] Amari, S., Cichocki, A. and Yang, H. H., “A New Learning Algorithm for Blind Signal Separation,” Advances in Neural Information Processing Systems (NIPS), 1995.

 

[3] Bell, A. J. and Sejnowski, T. J., “An Information-Maximization Approach to Blind Separation and Blind Deconvolution,” Neural Computation, 7(6):1129-1159, 1995.

 

[4] C.J. Rozell and D.H. Johnson. Examining methods for estimating mutual information in spiking neural systems. In Annual Computational Neuroscience Meeting, Baltimore, MD, July 2004. Submitted.

 

[5] C.J. Rozell, D.H. Johnson, and R.M. Glantz. Measuring information transfer in crayfish sustaining fiber spike generators. Biological Cybernetics, 90(2):89-97, February 2004.

 

[6] G. Chechik, Spike-Timing-Dependent Plasticity and Relevant Mutual Information Maximization. Neural Computation, 15, 1481-1510, 2003.

[7] M. Alreja and G. K. Aghajanian, “Use of the whole-cell patch-clamp method in studies on the role of cAMP in regulating the spontaneous firing of locus coeruleus neurons,” Journal of Neuroscience Methods, 59:67-75, 1995.

[8] E. Claverol-Tinture, J. Pine, Extracellular potentials in low-density dissociated neuronal cultures, Journal of Neuroscience Methods, 117 (2002): 13-21.

 

Division of Labor

Joint Work:            - Conceptual design of methods for entropy analysis and mutual information analysis.

                             - Progress presentations (see hebb.mit.edu/people/neville/mea)

Jake Bouvrie:                   - Mutual Information Analysis Code

                             - Entropy Analysis Code & Entropy Ranking Figures

Michael Baker:        - Maximum A Posteriori Decoder Code

                             - Directed Graph Code for MI and Entropy matrices

Writeup:

- Abstract - Jake

- Introduction – Mike

- Prior Work – Jake

- Spike Extraction from Noisy Data Methods & Results – Mike

- Determining Causal Relationships Methods & Results – Mike

- Measuring Mutual Information Methods & Results – Jake

- Conclusion & Caveats - Jake