
Decoding Emotions from Brain Signals Using Recurrent Neural Networks

Raphaelle Hoang, Shasha Liu, Jos Parayil
[email protected] [email protected] [email protected]

Aaliyah Sayed, Jonathan Yao, Eva Zhang
[email protected] [email protected] [email protected]

Sabar Dasgupta*
[email protected]

New Jersey’s Governor’s School of Engineering and Technology
July 24, 2020

*Corresponding Author

Abstract—Despite playing a large role in human interactions and decisions, emotions are not well understood. Mediums such as music videos, which are mentally stimulating, often elicit a wide range of emotions from viewers. These emotions can appear as electrical signals through electroencephalography (EEG). Being able to classify and quantify an individual’s emotions has useful applications in many fields. This paper uses data from the Dataset for Emotion Analysis using Physiological Signals (DEAP), which recorded the EEG signals of participants as they watched videos specifically chosen to elicit strong emotions; participants then rated their emotions on quantitative scales. A Long Short-Term Memory (LSTM) recurrent neural network was trained to find correlations between EEG data and self-reported emotional scores. The purpose of this study was to create an accurate and efficient method of decoding electrical activity into quantitative emotional values. It was found that valence scores, measures of the degree of happiness, predicted by the neural network matched somewhat closely with the actual valence scores given by the participants. It was concluded that the trained LSTM neural network achieved a reasonable accuracy in predicting valence scores given EEG signals.

I. INTRODUCTION

Emotions are complex states of mind that are affected by a person’s internal and external environment. Due to the influence of various stimuli, it is difficult to interpret emotion via a standardized system. Subject responses are not reliable for evaluating emotion either; people cannot consistently describe the intensity and effect of their emotions. In addition, it is easy to manipulate self-declared emotions, yielding inaccurate results. As a result, researchers have turned greater attention towards methods that do not rely on subject responses, such as identifying electrical signals in the brain. One method of detecting brain signals is electroencephalography (EEG). The process of collecting EEG data is noninvasive, leading to increased focus on, and more data from, this method.

By utilizing EEG datasets, emotions can be accurately and effectively described on a quantitative scale from brain wave patterns, as seen in Figure 1.

Fig. 1. Correlation between frequency strength and emotional state around the head [1]

A reliable emotional analysis tool has medical applications. For instance, control groups can be compared with patients that have mental illnesses to see differences in the emotions generated. Commercial applications are also promising; companies can use EEG data to determine accurate emotional responses to a particular product, since facial expressions and words may be misleading. Further research into training neural networks to correlate EEG graphs and various emotional states will permit reliable detection and relay of human emotions.

The research paper is broken down into four parts. Part 1 covers background regarding emotion detection via EEG, electroencephalography, an overview of neural networks and machine learning, details about LSTM, and a brief explanation of the various Python modules and libraries that were used. Part 2 explains the methodology. Part 3 highlights major results observed from the neural network. Part 4 summarizes the conclusions drawn from the research.

A. Background

To decode emotions from EEG data, machine learning techniques must be used to find patterns that are not easily detected. An artificial neural network, modeled after the human brain, is well suited for this type of detection and classification task. Inputs are passed through layers of neurons in order to produce an output. In a biological neural network, a stimulus elicits electrical signals that are propagated down the axons of neurons and activate other neurons. Similarly, a machine learning neural network accepts inputs, computes functions using those values, and produces a single output: the network’s prediction. The network then calculates its loss, a value that defines how much the network’s prediction varies from the actual result. Following this, the network adjusts the functions that the neurons use to minimize the loss value [2]. The inputs are fed through hidden layers of neurons and eventually to an output layer. For this research, array data from the EEG signals taken from participants are the inputs, and emotional states in the form of valence levels are the outputs.
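The predict-measure-adjust loop described above can be illustrated with a toy example: a single weight fit by gradient descent on a squared-error loss. This is only a minimal sketch of the training principle, not the network used in this study.

```python
# Fit y = 2x with a single weight, minimizing squared-error loss.
w = 0.0
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
for epoch in range(200):
    for x, y in data:
        pred = w * x                     # forward pass: the network's prediction
        loss_grad = 2 * (pred - y) * x   # derivative of (pred - y)^2 w.r.t. w
        w -= 0.01 * loss_grad            # adjust the weight to reduce the loss
# After training, w is close to the true value of 2.0.
```

Real networks repeat the same loop over millions of weights, with backpropagation computing each weight's gradient.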

Emotion may be analyzed in two dimensions: valence and arousal. Valence measures emotion on levels of pleasantness. Arousal measures emotion on levels of excitement. To describe emotions with specificity, a valence-arousal chart can be plotted with valence on the horizontal axis and arousal on the vertical axis, as seen in Figure 2. For example, “depressed” would have both a low valence score (unpleasant) and a low arousal score (low excitement), placing that emotion in the third quadrant of the valence-arousal chart [3].

Fig. 2. Valence-Arousal Chart of Emotions [3]

Being able to detect emotions from electrical activity in the brain provides many useful applications. One example of its use is in the medical field. There has been increased interest in the differences between healthy individuals and those with mental illnesses such as bipolar disorder, depression, or schizophrenia. One possible difference lies in the electrical activity of the brain. A study found altered electrical activity, in the form of lower or higher frequencies in different regions of the brain, in schizophrenic subjects [4]. Networks that can take electrical activity from the brain and translate it into quantitative emotional states can serve as useful tools for researchers. Knowing the emotional state of a patient with a certain illness allows doctors to create more effective therapy methods tailored to alleviate the specific emotion. In addition, researchers can compare various illnesses via emotion generation. Breakthroughs in treating one illness can then be applied to another.

Moreover, companies wishing to test new products can analyze the brain signals of a trial group. The emotional feedback will be more accurate than verbal or written reactions. Such feedback can be applied to the film industry. For instance, an entertainment company may log EEG data from critically acclaimed films and utilize machine learning to determine how high-quality movies made audiences feel. Then, they could have the same focus group watch a comparable unreleased movie to determine if the new movie elicits similar emotions from the focus group. If the emotions from both movies match, the EEG emotion test could be an encouraging indication that the new movie resembles a quality film. If the emotions do not match, the marketing department could use the emotions the focus group felt to identify the appropriate target audience for the movie. Evidently, EEG emotion analysis through machine learning neural networks is an exciting advancement in a variety of fields.

A study conducted by You-Yun Lee and Shulan Hsieh used film clips to classify emotions as positive, neutral, or negative [5]. The group then analyzed brain connectivity as a result of watching clips varying in emotional classification. Brain connectivity is measured using three indices: correlation, coherence, and phase synchronization index. Correlation refers to the strength of the relationship between two brain sites. Coherence relates to how closely two brain sites are working together at a specific frequency. Finally, the phase synchronization index describes the resemblance of the phases of two signals. It was found that negative emotion had greater correlations within the occipital site than neutral or positive emotion with regard to the theta and alpha bands, which are different ranges of frequencies. Positive emotion had greater correlations in the temporal site than neutral emotion, especially in the right hemisphere of the brain. For the theta, alpha, and beta bands, there was higher coherence for negative emotions than positive emotions, with the greatest difference in the right parietal and occipital regions, as seen in Figure 3. Positive emotion was more synchronized than negative emotion at each frequency band, especially at the frontal region. Overall, negative states had higher correlation and coherence than positive states, especially at the occipital and temporal regions. This study indicated that there is a significant connection between different EEG signals for videos varying in perceived emotional output.

Another study by Yimin Hou and Shuaiqi Chen analyzed the relationship between EEG signals and emotional states evoked through music [7]. It was found that there was higher energy at various frequencies in the frontal area when the individual was listening to emotion-evoking music. Theta and alpha energy were especially high in the occipital region, while beta energy was higher around the forehead while listening to joyful music. Alpha signals were more active while listening to sad music.


Fig. 3. Diagram of the various sections of the brain. [6]

Overall, the study concluded that emotional music evokes the highest activity in the alpha band because it changed the most.

B. EEG and Signal Processing

Electroencephalography, or EEG, is a noninvasive method of collecting electrical activity generated by the human brain. During an EEG test, electrodes are mounted to the surface of the scalp at specific locations corresponding to different regions of the brain. The electrodes capture the simultaneous firing of thousands of neurons as voltages. The data is passed through an amplifier, where it is shown as voltage values over time [8].

EEG signal graphs display a mix of many frequency values, which can be classified into frequency bands: delta (1-4 Hz), theta (4-8 Hz), alpha (8-12 Hz), beta (12-25 Hz), and gamma (above 25 Hz) [10]. Delta bands are associated with deep, slow-wave sleep. The theta band and gamma band have been shown to be correlated with cognitive and emotion processing within the hippocampus. If theta activity is dominant and the person is overly emotional and/or depressed, reducing the theta activity often alleviates symptoms. However, there are no differences between negative and positive emotional states in the theta band. The alpha band is correlated with the level of valence, or level of pleasantness, for certain emotions [9].
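The band boundaries above can be expressed as a small lookup function. A minimal sketch (the function name is illustrative; the cutoffs follow the ranges listed in the text):

```python
def eeg_band(freq_hz):
    """Classify a frequency in Hz into one of the standard EEG bands."""
    if freq_hz < 1:
        return None          # below the delta range
    if freq_hz < 4:
        return "delta"       # 1-4 Hz: deep, slow-wave sleep
    if freq_hz < 8:
        return "theta"       # 4-8 Hz: cognitive and emotion processing
    if freq_hz < 12:
        return "alpha"       # 8-12 Hz: correlated with valence
    if freq_hz < 25:
        return "beta"        # 12-25 Hz
    return "gamma"           # above 25 Hz
```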

Emotions are classified by three factors: valence, dominance, and arousal [3]. Valences of emotions are distinguished as being negative, neutral, or positive. Negative emotions are associated with the occipital site, labeled “1” in Figure 3, while positive emotions have a higher correlation within the temporal site, labeled “6” in Figure 3. Dominance refers to whether the feeling is submissive to or in control of the situation, while arousal is the range between how excited or calm the emotion is.

C. Neural Networks and Machine Learning

Neurons relay information to the brain through electrical signals. Linked neurons are able to transmit signals to different parts of the body quickly, allowing for almost instantaneous reactions or thoughts [10]. As seen in Figure 4, recurrent neural networks are a type of artificial neural network that simulates memory recall within complex organisms. Recurrent neural networks can be trained for supervised learning tasks. They are unique because they have the ability to remember past stimuli, whereas regular neural networks cannot remember specific things between different learning tasks. There are three types of layers: the input, hidden, and output layers [2]. The input layer has input nodes that receive information about the task. The hidden layers’ nodes modify the existing data based on the preexisting conditions and guidelines (bounds) that the programmer has set. Output nodes receive data from all of the hidden nodes. Because recurrent neural networks reuse the same parameters at every time step, they make efficient use of space and time. Regular recurrent neural networks suffer from the vanishing gradient problem; therefore, in this experiment, a Long Short-Term Memory recurrent neural network is used to decode emotions based on preexisting EEG data.

Fig. 4. Diagram of the working of a recurrent neural network. [2]

D. Long Short-Term Memory (LSTM)

When someone is asked to recount the events of a movie, they recall memorable moments that have great impact on the movie’s plot: the origins of the main character, plot twists, or character deaths. Details like the clothing of a specific character are irrelevant. This type of information would not help in forming judgements on what the movie is like or whether or not it is worth watching. Similar to how we automatically process which information to remember and which to forget, LSTMs use a process of memory recall and evaluation [11], [12], [13]. LSTMs use filters to remember important information and remove irrelevant portions in order to create a more accurate prediction.

Backpropagation uses a method called the chain rule to train a neural network. After a forward pass through a network, backpropagation performs a backward pass and adjusts the model’s weights and biases, the adjustable parameters of a machine learning model. During backpropagation, recurrent neural networks experience a shrinkage in the gradient, which is calculated during the training of a neural network and used to update the network’s parameters. Shrinkage diminishes the gradient’s ability to update the network’s weights. If the value of the gradient becomes too small, the first layers, and therefore the entire network, are unable to learn. LSTMs are used to counter gradient shrinkage. Using internal gates, the flow of information is regulated so that unimportant data is disposed of.
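The shrinkage can be seen numerically: backpropagating through many time steps multiplies many small derivatives together. The sigmoid function's derivative is at most 0.25, so in the worst case the gradient reaching the earliest layers decays geometrically. A minimal illustration:

```python
# Each backward step through a sigmoid multiplies the gradient by at most 0.25.
grad = 1.0
for step in range(20):   # backpropagate through 20 time steps
    grad *= 0.25         # worst-case sigmoid derivative
print(grad)              # ~9.1e-13: far too small to update the early weights
```

After only 20 steps the gradient has shrunk by a factor of 2^40, which is why plain recurrent networks struggle to learn long-range dependencies.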

Inputs are translated into vectors. A vector is an array of numbers describing a specific combination of properties. The components of a vector can jointly identify a sample; for example, a banana is yellow, curved, and cylindrical. If an object is purple, curved, and cylindrical, it is not a banana. Each vector is passed through a hidden state that also holds previously encountered information.

LSTMs contain two different kinds of non-linear activation functions: tanh activation and sigmoid activation. When a tanh function is plotted on a graph, its values always range from -1 to 1. Tanh activation is efficient at modeling inputs because any input is recognized as strongly positive, strongly negative, or neutral (value close to 0). When a sigmoid function is graphed, all values range from 0 to 1. The sigmoid function is helpful in neural networks because values can be updated or, if the values approach 0, forgotten. A value of 1 is recognized as extremely important and must be kept, while a value of 0 is recognized as unimportant and is therefore discarded [2].

Neural networks employ activation functions to turn the inputs of a node into an appropriate output. The Rectified Linear Unit (ReLU) is an activation function where, if the input is less than or equal to zero, the output is zero; if the input is greater than zero, the output is equal to the input via a linear function [14]. The ReLU function, as seen in Figure 5, is commonly used in neural networks due to its simplicity and ability to output a true zero value; other functions require exponents and can only produce values close to zero. The ReLU function can be represented by the equation R(x) = max(0, x).

Fig. 5. Graphical Representation of the ReLU Function [14]

The sigmoid activation function, as seen in Figure 6, outputs a number between zero and one to retain important inputs while filtering out extraneous information [2]. It turns inputs that are significantly greater than one into a value around one. Inputs significantly less than zero are turned into a value around zero [15]. The outputs are then multiplied by the input, so inputs with outputs of zero are considered extraneous while inputs with outputs of one are kept in their entirety. The equation for the sigmoid function is S(x) = 1 / (1 + e^(-x)).

Fig. 6. Graphical Representation of the Sigmoid Function [15]
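The three activation functions discussed above follow directly from their equations. A minimal NumPy sketch:

```python
import numpy as np

def relu(x):
    # R(x) = max(0, x): zero for non-positive inputs, identity otherwise.
    return np.maximum(0, x)

def sigmoid(x):
    # S(x) = 1 / (1 + e^(-x)): squashes any input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes any input into the range (-1, 1).
    return np.tanh(x)
```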

Softmax activation functions convert outputs into an array of probabilities. The sum of the values in each array is equal to one. The inputs to softmax functions are called logits, which are the outputs of the preceding layer in a neural network. Softmax activation functions are typically used in the last layer of the neural network [16], [17].
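The conversion of logits into a probability array can be sketched as follows (subtracting the maximum logit before exponentiating is a standard numerical-stability step, not something specific to this paper):

```python
import numpy as np

def softmax(logits):
    """Convert a vector of logits into probabilities that sum to one."""
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())   # shift by the max for numerical stability
    return e / e.sum()
```

For example, `softmax([1, 2, 3])` assigns the largest probability to the largest logit while keeping the total at exactly one.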

LSTM cells have three sigmoid gates and two tanh activations. The three sigmoid gates function as the input, output, and forget operations. The tanh activations regulate outputs from the sigmoid gates. Gates containing a sigmoid activation reduce vector values to a range from 0 to 1. Data is retained when it is multiplied by a factor of 1, allowing the data to remain the same and be sent to the cell state [2].

The vector passes through tanh activation, where its values are mapped to between -1 and 1. Information emerges as a new vector with both current and previous information.
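The gate arithmetic described above can be sketched for a single LSTM time step. This is a schematic reconstruction with randomly initialized placeholder weights, not the trained network from this study; a real implementation learns `W` and `b` during training.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step: three sigmoid gates plus tanh activations.

    W[k] maps the concatenated [h_prev, x] to the k-th gate's
    pre-activation; all gates share the hidden size of h_prev.
    """
    z = np.concatenate([h_prev, x])
    f = sigmoid(W[0] @ z + b[0])   # forget gate: what to discard from the cell
    i = sigmoid(W[1] @ z + b[1])   # input gate: what new information to store
    g = np.tanh(W[2] @ z + b[2])   # candidate cell values, squashed to (-1, 1)
    o = sigmoid(W[3] @ z + b[3])   # output gate: what to expose as hidden state
    c = f * c_prev + i * g         # updated cell state (the long-term memory)
    h = o * np.tanh(c)             # updated hidden state (the short-term output)
    return h, c
```

Because the cell state `c` is carried forward through addition rather than repeated multiplication, gradients flowing along it shrink far less than in a plain recurrent network.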

E. Python Modules and Libraries

In this project, Python was used, as it is well suited for machine learning and neural networks because of its available libraries and packages.

• NumPy: NumPy is a package that allows for scientific computing, specifically array manipulation and related features. NumPy was used to produce multi-dimensional arrays to sort the data in the neural network.

• Pickle: Pickle is a Python module that converts an object hierarchy into a byte stream. Pickle was used to obtain information from the .dat files and extract the NumPy arrays.

• Pandas: Pandas is a Python library that is proficient in data manipulation and analysis. Pandas was used for time series analysis of the EEG data.

• PyEEG: PyEEG is a Python module that contains functions for the analysis of brain signals and time series. PyEEG was used to process EEG signals from the DEAP data.

• TensorFlow and Keras: TensorFlow is an open source library that helps to train and develop machine learning models. Keras is imported from TensorFlow; it implements neural network layers and supports recurrent neural networks, such as LSTM networks.

• Matplotlib: Matplotlib is a Python library that visualizes data; it was primarily used to display the data as graphs.

• Seaborn: Seaborn is a Python library built on Matplotlib that visualizes data. It provides a high-level interface for statistical modeling.

• Scikit-Learn: Scikit-Learn is a Python library that contains tools for data analysis. It helped with preprocessing, as well as splitting the data into training and testing sets.

II. METHODOLOGY

A. Data and Materials

The Dataset for Emotion Analysis using Physiological signals, also known as DEAP, is a publicly available dataset in which the emotions and EEG data of 32 participants were recorded. In this study, participants were shown 40 one-minute video clips while their EEG data was recorded [18]. After each video, participants rated their emotions on a scale of 1-9 for arousal, valence, dominance, and personal liking of the video. These self-reported ratings are taken as the ground truth for testing and prediction. The data for each participant consists of a dictionary mapping EEG signals as keys to a four-element score vector of valence, arousal, dominance, and personal liking as values. For our experiment, we chose to focus only on predicting valence scores. All 32 electrodes recorded EEG data at 512 Hz for each music video clip; the data was then downsampled to 128 Hz and filtered with a 4-45 Hz bandpass filter [19], yielding 8064 readings per electrode per trial, as seen in Figures 7 and 8.

Fig. 7. Graph of the voltage of a single electrode position over time. The data was collected for all frequencies of one participant watching one video for the entire minute of measurement.
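Loading one subject's file with Pickle can be sketched as follows. This is a plausible reconstruction, not the authors' code: in the preprocessed Python release of DEAP, each per-subject pickle holds a `data` array (40 videos x 40 channels x 8064 samples) and a `labels` array (40 videos x 4 ratings), and the file path below is hypothetical.

```python
import pickle
import numpy as np

def load_deap_subject(path):
    """Load one subject's preprocessed DEAP .dat file.

    Each file is a Python 2 pickle containing 'data'
    (40 videos x 40 channels x 8064 samples) and 'labels'
    (40 videos x 4: valence, arousal, dominance, liking).
    """
    with open(path, "rb") as f:
        subject = pickle.load(f, encoding="latin1")  # latin1 decodes Py2 pickles
    eeg = np.asarray(subject["data"])
    valence = np.asarray(subject["labels"])[:, 0]    # this study uses only valence
    return eeg, valence

# Hypothetical usage:
# eeg, valence = load_deap_subject("data_preprocessed_python/s01.dat")
```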

B. Preprocessing

In preparation for passing the data into the neural network, the DEAP data was converted into a NumPy array [20] from the raw BioSemi format and underwent a series of further preprocessing steps. First, the EEG readings were passed through a bandpass filter between the frequencies of 4 and 45 Hz, specifically to isolate the frequencies related to emotions (4-8 Hz, 12-16 Hz, and 25-45 Hz). A bandpass filter is a combination of a low-pass and a high-pass filter: the low-pass filter sets the upper bound, while the high-pass filter sets the lower bound to isolate the data. A Butterworth filter, a signal processing filter designed for a maximally flat frequency response in the passband, was used.

Fig. 8. The voltage of a single electrode position over time. The data was collected for all frequencies of one participant watching one video for the entire minute of measurement.
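A 4-45 Hz Butterworth bandpass step can be sketched with SciPy. The filter order and the zero-phase (forward-backward) application are assumptions; only the passband and the 128 Hz sampling rate come from the text.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(signal, low_hz=4.0, high_hz=45.0, fs=128.0, order=4):
    """Butterworth bandpass matching the 4-45 Hz preprocessing described.

    Uses second-order sections for numerical stability; sosfiltfilt runs
    the filter forward and backward so the output has zero phase shift.
    """
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)
```

Applied to a signal containing a 2 Hz and a 10 Hz component, the filter removes the 2 Hz component (below the 4 Hz cutoff) while keeping the 10 Hz one.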

The positions of certain electrodes have a strong correlation with brain activity related to emotions, so the data from the 32 electrodes was further processed so that only the data from 14 electrodes would be considered by the neural network. The electrodes selected were Fp1, AF3, F7, F3, FC5, P7, Pz, O2, P4, P8, CP6, FC6, AF4, and Fz, shown in Figure 9. Fp1 and F3 are most related to mild depression; similarly, Pz is associated with musical stimuli and higher recognition rates [7].

Fig. 9. Visualization of electrode locations on the scalp in the 10/20 system [21]

After this initial preprocessing, the data underwent further processing to be prepared for the LSTM. LSTMs take inputs as sequences of timesteps, so the data for each video was grouped into windows of 256 samples, capturing the band power over 2 seconds of EEG signal. Since the data was sampled at 128 Hz, there are 128 samples per electrode per second of video. A step size of 16 samples was used, so consecutive 2-second windows start 0.125 seconds apart and overlap heavily. These windows were the primary units from which the neural network learned the patterns associated with emotional valence and on which it was tested for accuracy. Each window is a two-dimensional array in which the rows are the 14 electrodes and the columns are the corresponding EEG readings over the 2-second span, as seen in Figure 10. The training set consisted of data from subjects 1, 2, and 3.
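The windowing step can be sketched with NumPy. The window length of 256 samples and step of 16 follow the text; the function and array names are illustrative.

```python
import numpy as np

WINDOW = 256  # 2 s of EEG at 128 Hz
STEP = 16     # consecutive windows start 0.125 s apart

def make_windows(trial, window=WINDOW, step=STEP):
    """Slice one trial (channels x samples) into overlapping 2 s windows.

    Returns an array of shape (num_windows, channels, window), where each
    window shares all but 16 samples with its predecessor.
    """
    channels, samples = trial.shape
    starts = range(0, samples - window + 1, step)
    return np.stack([trial[:, s:s + window] for s in starts])
```

For a 14-electrode trial of 8064 samples this produces (8064 - 256) / 16 + 1 = 489 windows, each of shape 14 x 256.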

Fig. 10. Graph of the voltage generated by 32 electrodes over our chosen window of two seconds. The data was collected for all electrode positions and all frequencies of one participant watching one video for the first two seconds. Each trace corresponds to the voltage over time emitted by an electrode. This graph presents an example of raw data before undergoing preprocessing techniques, highlighting the complexity of EEG data.

One caveat is that while the EEG readings for each video were split into 2-second increments for the LSTM to find patterns among the noisy readings, the valence and arousal scores remain constant across the entire video. For example, seconds 1-2 and 46-47 of video 1 for participant 1 both yield a valence score of 7.71. The nature of the LSTM is to remember information and discover long-term dependencies across the time windows, so the data was split into smaller timeslices to take advantage of this unique memory quality. LSTMs are usually used when both the x and y values are time-dependent, but only the x value is time-dependent in this case.

From here, the x data was normalized in mean and amplitude. In addition, the valence scores were grouped, or bucketed, into one of 10 values to simplify classification.
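These two steps can be sketched as follows. The normalization (zero mean, unit amplitude) follows the text; the exact bucketing scheme is not specified in the paper, so mapping the 1-9 rating range onto 10 equal-width bins is only one plausible choice.

```python
import numpy as np

def normalize(x):
    """Zero-mean, unit-amplitude normalization of an EEG window."""
    x = x - x.mean()
    peak = np.abs(x).max()
    return x / peak if peak > 0 else x

def bucket_valence(score, low=1.0, high=9.0, n_bins=10):
    """Map a continuous rating in [low, high] to a class index in 0..n_bins-1.

    Assumption: equal-width bins over the rating range; the paper only
    states that scores were grouped into one of 10 values.
    """
    idx = int((score - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)
```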

With these preprocessing steps complete, the data was split into a training set and a test set. Across all three subjects, 20% of the videos were added to the test set, while the remaining 80% were added to the training set. The network consisted of seven layers in total. The first five were LSTM layers, which used the ReLU activation function and had 512, 256, 128, 64, and 32 nodes respectively; they were separated by dropout layers of 30%, 50%, 30%, 30%, and 20% respectively. The last hidden layer was a dense layer and used the sigmoid activation function. The final layer had ten nodes and used the softmax activation function to transform the outputs into a probability distribution over the 10 possible valence scores. The network was compiled using a mean squared error (MSE) cost function and the RMSprop optimizer.

The final network was trained for 250 epochs with a batch size of 150 and a learning rate of 0.0001. An epoch is one iteration over the entire data set.
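The architecture described above can be sketched in Keras. This is a plausible reconstruction rather than the authors' code: the layer sizes, dropout rates, loss, optimizer, and training settings follow the text, but the input orientation (256 timesteps x 14 electrodes) and the width of the sigmoid dense layer are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(timesteps=256, features=14, n_classes=10):
    """Sketch of the seven-layer LSTM network described in the text."""
    inputs = keras.Input(shape=(timesteps, features))
    x = layers.LSTM(512, activation="relu", return_sequences=True)(inputs)
    x = layers.Dropout(0.3)(x)
    x = layers.LSTM(256, activation="relu", return_sequences=True)(x)
    x = layers.Dropout(0.5)(x)
    x = layers.LSTM(128, activation="relu", return_sequences=True)(x)
    x = layers.Dropout(0.3)(x)
    x = layers.LSTM(64, activation="relu", return_sequences=True)(x)
    x = layers.Dropout(0.3)(x)
    x = layers.LSTM(32, activation="relu")(x)   # last LSTM returns one vector
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(16, activation="sigmoid")(x)  # hidden width is an assumption
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    # MSE loss and the RMSprop optimizer, as stated in the text.
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-4),
                  loss="mse", metrics=["accuracy"])
    return model

# Training settings from the text:
# model.fit(x_train, y_train, epochs=250, batch_size=150)
```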

III. RESULTS

A maximum training accuracy of 84.3% and a validation accuracy of 90.6% were achieved, as seen in Figure 11.

Fig. 11. Graphical representation of the accuracy of the neural network in predicting emotion valence scores.

A maximum training accuracy of 84.3% and a validation accuracy of 90.6% were achieved, as seen in Figure 11. Based on this high performance on the training and validation sets, a testing accuracy of more than 50% was expected. However, the algorithm achieved a maximum testing accuracy of only 32.3% on the testing set (subjects 4, 5, and 6). Subjects 4, 5, and 6 were not included in the training set, so one explanation for this drastic decrease in accuracy is that the network was unable to generalize across subjects. The valence scores were initially inputted as floats, but that network's training accuracy plateaued at 35% after 300 epochs; due to the low validation accuracy, it was not applied to the testing set. To improve performance, the float valence scores were categorized into 10 discrete values, one bucket per valence score. The addition of several dropout layers prevented some overfitting, but the network could not exceed 33% testing accuracy. A training length of 250 epochs proved sufficient, as the accuracies on the training and validation sets remained around 84.3% and 90.6% and did not increase with more epochs.
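The bucketing step described above can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing code; the sample scores and the assumption of a 1-10 rating range (so that rounding yields ten classes) are hypothetical:

```python
import numpy as np

# Hypothetical continuous self-reported valence scores on an assumed 1-10 scale.
valence = np.array([1.0, 2.6, 4.4, 7.2, 9.7, 10.0])

# Categorize the floats into 10 discrete buckets (classes 0..9), one per
# integer valence score, so the network can be trained as a classifier
# with a softmax output rather than as a regressor on raw floats.
buckets = np.clip(np.round(valence).astype(int) - 1, 0, 9)

print(buckets.tolist())  # → [0, 2, 3, 6, 9, 9]
```

Framing the task as 10-class classification lets the network output a probability per bucket instead of a single continuous estimate, which is what made the softmax output layer and accuracy metric applicable.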

Fig. 12. The graph plots the differences between predicted and actual valence scores for 117 videos. The y-axis represents the percentage of occurrences for each difference value.

Figure 12 visualizes the differences between the valence scores predicted by the neural network and the actual scores reported by the participants for the 117 videos used in testing. For each video, the actual score is subtracted from the predicted score, and the distribution of these differences is shown. X values represent the magnitude and direction of the deviation, and the height of each bar corresponds to the percentage of occurrences of that difference. The line graph superimposed on the bar graph gives a continuous representation of the otherwise discrete distribution. The mode lies at a difference of 0, indicating that exact matches between predicted and actual scores occur most often. Thus, it can be concluded that the LSTM neural network was at least somewhat successful in predicting valence values.
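The difference distribution behind a plot like Figure 12 can be computed in a few lines. This is a sketch with made-up labels, not the study's data:

```python
import numpy as np

# Hypothetical predicted and self-reported valence classes for a few test videos.
predicted = np.array([3, 5, 5, 7, 2, 6, 4, 8])
actual    = np.array([3, 4, 5, 7, 4, 6, 5, 8])

# Signed differences: positive means the network over-predicted valence.
diffs = predicted - actual

# Percentage of occurrences for each difference value, as plotted in Fig. 12.
values, counts = np.unique(diffs, return_counts=True)
percentages = 100.0 * counts / diffs.size
mode = values[np.argmax(counts)]

print(dict(zip(values.tolist(), percentages.tolist())))
print(mode)  # a mode of 0 means exact matches are the most common outcome
```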

Fig. 13. The confusion table compares predicted valence scores to actual valence scores. Darker boxes represent a greater number of matches.

Figure 13 displays a confusion matrix that compares predicted valence scores against actual valence scores. Each number represents the count of occurrences of a predicted-versus-actual pair. For example, the five in the top left indicates that there were five videos for which the actual valence score was 2 but the algorithm predicted 1. For a highly accurate algorithm, there would be a dark diagonal running down the graph from the top left corner, where the x and y values are equal, indicating that predicted scores equal actual scores. Correlation between predicted and actual scores does exist at equal or nearly equal values, meaning that the algorithm was able to predict valence scores correctly, or within one, in many cases. However, a sharp diagonal line cannot be seen due to the low accuracy on the testing set.
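A confusion matrix like the one in Figure 13 is straightforward to build by hand. The sketch below uses hypothetical 0-indexed class labels chosen to mirror the "five in the top left" example (actual score 2, predicted 1):

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes=10):
    """Rows index actual valence classes, columns index predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a, p] += 1
    return cm

# Hypothetical labels: five videos with actual class 1 predicted as class 0
# (i.e. actual score 2, predicted 1, using 0-indexed buckets).
actual    = [1, 1, 1, 1, 1, 2, 3, 3]
predicted = [0, 0, 0, 0, 0, 2, 3, 2]
cm = confusion_matrix(actual, predicted)

# A perfectly accurate classifier would put all counts on the diagonal,
# so the trace counts the exactly correct predictions.
print(cm[1, 0])        # → 5
print(int(np.trace(cm)))
```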

IV. CONCLUSIONS

This project explored the use of EEG data and LSTM neural networks to quantify emotions at a given time. The finished neural network had an accuracy of 32.3% on the testing set and 84.3% on the training set, showing that decoding emotions through EEG data is achievable and may become a more reliable way of obtaining emotional data in the future. Artificial neural networks can simulate the patterns in complex relationships, and as a powerful machine learning tool, LSTMs will likely be helpful in future studies concerning signal decoding. Further studies on training neural networks with EEG data are required to improve upon this project and make it viable in medical and commercial fields.

With more time, the neural network could have been trained to give more accurate predictions. Specifically, a Fourier transform, which can break any waveform down into sinusoids of different frequencies [22], would have been used to convert the EEG signals from the time domain to the frequency domain. Doing so would have yielded graphs clearly demonstrating the intensity of various signal frequencies, which provide more information than the times at which the signals occurred. In addition, a deeper neural network would have been trained using more data: more combinations of electrodes, alternate window sizes, and a greater number of subjects in the training set. More experimentation with preprocessing techniques would also have been carried out to find more efficient and accurate methods; for example, the data was sliced into 2D arrays, but a different slicing may have proved more valuable. The low testing accuracy is attributed to the highly variable nature of EEG readings, which differ not only from person to person but also between electrodes during a single EEG scan, causing the algorithm to overfit to its training subjects. Had it been trained on all thirty-two participants instead of just three, the algorithm might have adapted better to these differences.
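The proposed frequency-domain step can be sketched with NumPy's FFT. The synthetic signal below stands in for an EEG window (DEAP signals are downsampled to 128 Hz); the 10 Hz alpha-band sinusoid is a hypothetical stand-in for real brain activity:

```python
import numpy as np

# Hypothetical EEG window: 128 Hz sampling rate, 2 seconds long,
# containing a 10 Hz (alpha-band) sinusoid.
fs = 128
t = np.arange(0, 2.0, 1.0 / fs)
signal = np.sin(2 * np.pi * 10 * t)

# Fourier transform: move from the time domain to the frequency domain.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)

# The strongest frequency component recovers the 10 Hz sinusoid.
peak = freqs[np.argmax(spectrum)]
print(peak)  # → 10.0
```

Feeding such spectra (or band powers derived from them) to the network, instead of raw time-domain samples, is what the future-work paragraph above proposes.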

ACKNOWLEDGEMENTS

The authors of this paper gratefully acknowledge the following: Project Mentor Sabar Dasgupta for his extensive knowledge of signal processing and neural networks and continued guidance; Head Residential Teaching Assistant and Project Liaison Rajas Karajgikar for his valuable insight and support; Research Coordinator Benjamin Lee for his assistance in conducting proper research; Dean Jean Patrick Antoine, Director of GSET, and Dean Ilene Rosen, Director Emeritus of GSET; Rutgers University and Rutgers School of Engineering for the chance to advance knowledge, explore engineering, and open up new opportunities; Lockheed Martin, the New Jersey Space Consortium, and other corporate sponsors for funding our scientific studies; and lastly NJ GSET Alumni, for their continued participation and support.

REFERENCES

[1] W. Long, J. Zhu, and B. Lu, "Identifying Stable Patterns over Time for Emotion Recognition from EEG," IEEE Transactions on Affective Computing, vol. 10, no. 3, pp. 417-429, 2017. [Image].

[2] "Understanding LSTM Networks," Colah's Blog, 2015. [Online]. Available: https://colah.github.io/posts/2015-08-Understanding-LSTMs/. [Accessed Jul. 18, 2020].

[3] S. Mullin, "Emotional Persuasion 101: What It Is, Why It Matters and How to Use It," Shopify, 2017. [Online]. Available: https://www.shopify.com/blog/emotional-persuasion. [Image]. [Accessed Jul. 18, 2020].

[4] J. M. Morihisa, F. H. Duffy, and R. J. Wyatt, "Brain Electrical Activity Mapping (BEAM) in Schizophrenic Patients," Arch Gen Psychiatry, vol. 40, no. 7, pp. 719-728, 1983. [Abstract].

[5] Y. Lee and S. Hsieh, "Classifying Different Emotional States by Means of EEG-Based Functional Connectivity Patterns," PLoS ONE, vol. 9, no. 1, 2014.

[6] Y. Bhagat, "Diffusion Tensor Imaging of the Human Brain," Heart and Stroke Foundation of Canada, 2007. [Online]. Available: https://www.researchgate.net/figure/1-An-illustration-of-the-human-brain-with-several-regions-labeled-Source-Heart-and_fig5_291346848. [Image]. [Accessed Jul. 7, 2020].

[7] Y. Hou and S. Chen, "Distinguishing Different Emotions Evoked by Music via Electroencephalographic Signals," Hindawi, vol. 2019, 2019.

[8] B. Farnsworth, "What is EEG (Electroencephalography) and How Does It Work?" iMotions, 2020. [Online]. Available: https://imotions.com/blog/what-is-eeg/. [Accessed Jul. 18, 2020].

[9] L. Qian, C. Xi, H. Tom, X. Duo, C. Frederick, and B. James, "Theta Band Activity in Response to Emotional Expressions and Its Relationship with Gamma Band Activity as Revealed by MEG and Advanced Beamformer Source Imaging," Frontiers in Human Neuroscience, vol. 7, p. 940, 2014.

[10] N. T. Carnevale and M. L. Hines, "A Tour of the NEURON Simulation Environment," in The NEURON Book. Cambridge: Cambridge University Press, 2006, pp. 1-31.

[11] R. C. Staudemeyer and E. R. Morris, "Understanding LSTM: A Tutorial into Long Short-Term Memory Recurrent Neural Networks," Cornell University Computer Science, 2019.

[12] C. Nicholson, "A Beginner's Guide to LSTMs and Recurrent Neural Networks," Pathmind, 2019. [Online]. Available: https://pathmind.com/wiki/lstm. [Accessed Jul. 18, 2020].

[13] Y. Fan, Y. Qian, F. Xie, and F. K. Soong, "TTS Synthesis with Bidirectional LSTM Based Recurrent Neural Networks," International Speech Communication Association INTERSPEECH, 2014.

[14] J. Brownlee, "A Gentle Introduction to the Rectified Linear Unit (ReLU)," Machine Learning Mastery, 2019. [Online]. Available: https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/. [Accessed Jul. 19, 2020].

[15] J. Evans, "The Post-Exponential Era of AI and Moore's Law," TechCrunch, 2019. [Online]. Available: https://techcrunch.com/2019/11/10/the-post-exponential-era-of-ai-and-moores-law/. [Accessed Jul. 18, 2020].

[16] "Understand the Softmax Function in Minutes," Medium, 2018. [Online]. Available: https://medium.com/data-science-bootcamp/understand-the-softmax-function-in-minutes-f3a59641e86d. [Accessed Jul. 18, 2020].

[17] S. H. Khor, "The Gentlest Introduction to Tensorflow - Part 4," KD Nuggets, 2017. [Online]. Available: https://www.kdnuggets.com/2017/02/gentlest-introduction-tensorflow-part-4.html. [Accessed Jul. 18, 2020].

[18] Database for Emotion Analysis Using Physiological Signals, "DEAP Dataset." [Online]. Available: http://www.eecs.qmul.ac.uk/mmv/datasets/deap/index.html. [Accessed Jul. 8, 2020].

[19] S. Koelstra, C. Muhl, M. Soleymani, J. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, and I. Patras, "DEAP: A Database for Emotion Analysis Using Physiological Signals," IEEE Transactions on Affective Computing, vol. 3, no. 1, 2012.

[20] S. van der Walt, S. C. Colbert, and G. Varoquaux, "The NumPy Array: A Structure for Efficient Numerical Computation," Computing in Science & Engineering, vol. 13, no. 2, pp. 22-30, 2011.

[21] G. Placidi, P. D. Giamberardino, A. Petracca, M. Spezialetti, and D. Iacoviello, "Classification of Emotional Signals from the DEAP Dataset," Proceedings of the 4th International Congress on Neurotechnology, Electronics and Informatics, 2016. [Image].

[22] P. Bevelacqua, "Introduction to the Fourier Transform," The Fourier Transform, 2020. [Online]. Available: http://www.thefouriertransform.com/introduction. [Accessed Jul. 18, 2020].
