DEEPWATER HORIZON CRISIS: A SOCIAL NETWORK ANALYSIS

DEEPWATER HORIZON CRISIS: A SOCIAL NETWORK ANALYSIS NetworkThink Team A Kaitlin Donohue Yilin Wei Yunjing Yao Subramanian Vellaiyan Jetson Guy

Description: 658 Final Project, Group A: Social Network Analysis of Tweets about BP Oil Spill


  • 2

    1 Introduction

    On April 20, 2010, the BP-operated Mobile Offshore Drilling Unit (MODU) Deepwater Horizon experienced a loss of well control. The events that followed led to 11 deaths, fires, explosions, and ultimately the sinking of the unit. As a result of this disaster, large volumes of liquid and gaseous hydrocarbons leaked into the Gulf of Mexico until the well was finally capped on July 15, 2010 [1]. During the roughly three months before the well was capped, the Deepwater Horizon Crisis, as it has come to be known, is estimated to have discharged 4.9 million barrels of oil into the waters of the Gulf [2]. The event, including its environmental, financial, and political repercussions, was the topic of much debate on the micro-blogging social network Twitter.

    To complete the task put to us by NetworkThink, we examined a set of 70,000 tweets related to the crisis, posted between April 2010 and July 2010. We divided our analyses into segments that examine the relationships within and between the various stakeholders in the Deepwater Horizon Twitter network. The results of our analyses can be found below.

    [1] Republic of the Marshall Islands Office of the Maritime Administrator (2011). Deepwater Horizon Marine Casualty Investigation Report. Accessed 12/11/2014 at <http://www.register-iri.com/forms/upload/Republic_of_the_Marshall_Islands_DEEPWATER_HORIZON_Marine_Casualty_Investigation_Report-Low_Resolution.pdf>
    [2] On Scene Coordinator Report: Deepwater Horizon Oil Spill (2011). Accessed 12/11/2014 at <http://www.uscg.mil/foia/docs/dwh/fosc_dwh_report.pdf>

  • 3

    2 Mention Network

    2.1 Create the mention network

  • 4

    The pictures above show the mention network. Vertex color represents closeness centrality. Vertex shape represents eigenvector centrality: a vertex whose eigenvector centrality is above 0.22 (the average eigenvector centrality) is drawn as a solid triangle; otherwise it is drawn as a solid diamond. Vertex size represents out-degree, so the tweeter who mentioned others most often is drawn largest.

    The first picture shows the tweeters who mentioned others the most; the second picture shows the tweeters who were mentioned the most.

    Tweeters who were mentioned the most: nwf ibrrc whodat35 bpamerica therightblue gohsep

    seachele420 whodat35 winterthur oceanshaman Endrunlv Zbleumoon
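The vertex sizing described above rests on counting mentions per account. As a minimal sketch, assuming the mention data is available as (tweeter, mentioned account) pairs, the two rankings could be derived as follows. The handles and pairs here are hypothetical placeholders, not rows from the dataset, and repeated mentions are counted each time.

```python
from collections import Counter

# Hypothetical (tweeter, mentioned_account) pairs; the real pairs would come
# from joining the TWEET and MENTION tables.
mentions = [
    ("user_a", "org_x"),
    ("user_a", "org_y"),
    ("user_b", "org_x"),
    ("user_c", "org_z"),
    ("user_d", "org_x"),
]

# Out-degree: how many mentions each tweeter made.
out_degree = Counter(source for source, _ in mentions)
# In-degree: how many times each account was mentioned.
in_degree = Counter(target for _, target in mentions)

top_mentioner = out_degree.most_common(1)[0]
top_mentioned = in_degree.most_common(1)[0]
print("mentions others most:", top_mentioner)
print("mentioned most:", top_mentioned)
```

In this toy input, user_a makes two mentions and org_x receives three, so they head the two rankings.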

  • 5

    2.3 Most influential tweeters

    The picture above shows the most influential tweeters across all groups. A tweeter whose betweenness centrality is above the average is drawn as a solid diamond; otherwise it is drawn as a solid square. Similarly, a tweeter whose eigenvector centrality is above the average is colored greenish; otherwise it is colored orange. The most influential tweeters are:

    TWEETER        GROUP
    whodat35       GRASSRT
    seachele420    SOCMOV
    Nwf            CELEB
    Winterthur     GRASSRT
    digiphilE      MEDIA
    Ibrrc          GOV
    Bpamerica      CORP
    humidcity      OTHER
    oil_leaks      SOCMOV

    The CELEB group has the highest aggregate of influential tweeters.
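The report does not say which tool computed betweenness centrality. As an illustration only, the sketch below uses Brandes' algorithm for an unweighted, directed graph and then applies the "above average is a diamond, otherwise a square" rule described above. The four-node graph and the function name `betweenness` are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm; adj maps each node to the nodes it mentions."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies in reverse order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# Hypothetical directed mention graph: a -> b -> c and d -> b.
graph = {"a": ["b"], "b": ["c"], "c": [], "d": ["b"]}
bc = betweenness(graph)
average = sum(bc.values()) / len(bc)
shapes = {v: "diamond" if score > average else "square" for v, score in bc.items()}
print(bc, shapes)
```

Node b lies on both shortest paths that pass through an intermediary (a to c and d to c), so it is the only node scored above the average.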

  • 6

    The histogram above shows evidence of a power-law degree distribution: most tweeters have a low degree, while a small number have a very high degree.

    Communication is more concentrated across the groups.

    [Figure: "Degree Distribution" histogram of frequency by degree; degree axis from 1 to 101, count axis from 0 to 1000.]
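One rough way to check the power-law reading of the histogram is to fit a straight line to the degree counts on log-log axes. The sketch below does this with ordinary least squares. The counts are hypothetical values constructed to follow count = 1000 / degree^2, and a straight-line fit is only a heuristic, not a rigorous statistical test.

```python
import math

# Hypothetical degree histogram: degree -> number of tweeters with that degree.
degree_counts = {1: 1000, 2: 250, 5: 40, 10: 10}

# Take logs of both axes; a power law appears as a straight line.
xs = [math.log(k) for k in degree_counts]
ys = [math.log(n) for n in degree_counts.values()]

# Ordinary least-squares slope of log(count) against log(degree).
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    / sum((x - mean_x) ** 2 for x in xs)
)
print(f"estimated exponent: {slope:.2f}")
```

For these constructed counts the fitted slope is about -2, matching the exponent used to build them.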

  • 7

    The shapes used for each group in the picture above are:

    GRASSRT    Solid Diamond
    OTHER      Solid Square
    SOCMOV     Sphere
    MEDIA      Solid Triangle
    GOV        Disk
    CELEB      Diamond
    CORP       Circle

    3 Hashtag Network

    3.1 Create the Hashtag Network

    Network 1

  • 8

    Network 2

    Nodes: hashtags
    Edges: when two hashtags appear in the same tweet, there is an edge between them
    Node size: number of times the hashtag was mentioned
    Node color: modularity class
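The node and edge definitions above amount to a co-occurrence count. A minimal sketch, assuming each tweet is already reduced to its list of hashtags (the four tweets below are invented for illustration) and that a hashtag is counted once per tweet:

```python
from collections import Counter
from itertools import combinations

# Hypothetical tweets, each reduced to the hashtags it contains.
tweets = [
    ["#bp", "#gulf", "#oil"],
    ["#bp", "#gulf"],
    ["#bp", "#boycottbp"],
    ["#oil"],
]

node_size = Counter()    # number of tweets mentioning each hashtag
edge_weight = Counter()  # number of tweets in which a pair co-occurs

for tags in tweets:
    unique = sorted(set(tags))          # ignore repeats within one tweet
    node_size.update(unique)
    edge_weight.update(combinations(unique, 2))

print(node_size.most_common())
print(edge_weight.most_common())
```

Sorting the hashtags before pairing them keeps each undirected edge under a single key, so ("#bp", "#gulf") and ("#gulf", "#bp") are not counted separately.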

  • 9

    This shows the network after grouping. The pink group accounts for 50.27% of the nodes in this graph.

    3.2 Most frequent hashtags

    We used the following SQL to count the number of times each hashtag was mentioned:

    The hashtags mentioned more than 7000 times are as follows:

    Hashtag         Times     Theme
    #bp             122249    Industry
    #gulf           62614     Environment
    #oil            30866     Environment
    #p2             21839     Opinion
    #environment    21552     Environment
    #blacktide      19467     Environment
    #boycottbp      17139     Opinion
    #eco            16958     Environment
    #tcot           16908     Industry
    #news           15331     News
    #ocean          13080     Environment
    #cdnpoli        13006     Opinion
    #oceans         12438     Environment
    #green          9832      Environment
    #bigoil         9685      Environment
    #bpoilspill     9472      News
    #enbridge       9013      Industry
    #tarsands       8914      News
    #chevron        7852      Industry
    #wildlife       7621      Environment
    #corexit        7366      Measures
    #louisiana      7310      News
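The counting query itself is not reproduced in this transcript. A sketch of what such a query might look like is given below, run through Python's sqlite3 module against an in-memory database. The hashtag table with tweetid and hashtag columns follows the names used in the report's other queries. The rows and the threshold of 1 (in place of 7000) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hashtag (tweetid INTEGER, hashtag TEXT)")
# Hypothetical rows standing in for the real HASHTAG table.
conn.executemany(
    "INSERT INTO hashtag VALUES (?, ?)",
    [(1, "#bp"), (1, "#gulf"), (2, "#bp"), (3, "#bp"), (3, "#oil")],
)

threshold = 1  # the report used 7000 on the full dataset
rows = conn.execute(
    """
    SELECT hashtag, COUNT(*) AS times
    FROM hashtag
    GROUP BY hashtag
    HAVING COUNT(*) > ?
    ORDER BY times DESC, hashtag
    """,
    (threshold,),
).fetchall()
print(rows)
```

Only #bp is mentioned more than once in the toy rows, so it is the only hashtag returned.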

    3.3 Different hashtags by time

    In the following two graphs, the size and color of each label show the number of times that hashtag was mentioned.

    This picture shows the hashtags that were popular in the first half of the period and were also mentioned more than 7000 times in total. Many of them are the familiar hashtags discussed in 3.2.

  • 11

    In the second half of the period, only #enbridge appears. All of the popular environmental hashtags have disappeared.

  • 12

    The two pictures above also substantiate that there was a shift in hashtag usage. It appears that the hashtags grew increasingly negative the longer the spill went on.

    4 Affiliation Network

    4.1 Choices for nodes, edges and their attributes

        select h.hashtag, e.type
        from hashtag as h, tweeter as e, tweet as t
        where h.tweetid = t.tweetid and t.tweeter = e.tweeter

    We used this query to link the hashtag table and the tweeter table through the tweet table. The nodes are hashtags and types. There are seven types (CELEB, CORP, GOV, GRASSRT, MEDIA, OTHER, SOCMOV) and 4139 hashtags in total.

        select type, hashtag, count(hashtag)
        from joint
        group by type, hashtag

    We used this query to count every hashtag within every type. Edges are the relationships between hashtags and tweeters: an edge indicates that a hashtag was used by Twitter users of that type. Edges can be weighted by the number of times each hashtag was used, and edge width can represent that count. Attributes of a type are the name, size, location, and functions of the different organizations. Attributes of a hashtag are its information, combinations, and search keywords.

    4.2 Insights from the affiliation network

        select type, hashtag, count(hashtag)
        from joint
        group by type, hashtag
        order by type, count(hashtag)

    We used this query to list, in order, the number of times each hashtag was used within each tweeter group. Below are some of the hashtags used most often by each group:

    CELEB: #bp 29, #gulf 21, #p2 11, #ocean 7, #cnnhelpgulf 7
    CORP: #bp 179, #39top 5, #oilspill2010 4
    GOV: #louisiana 78, #bp 67, #gulf 33, #nola 29, #uk 23, #weather 19, #usfws 19, #nationalguard 19, #hurricane 19
    GRASSRT: #bp 3250, #p2 2073, #tcot 1691, #gulf 685, #blacktide 587, #bpoilspill 484, #teaparty 395, #ocra 341, #eco 304, #tlot 248, #ocean 232, #topprog 231, #boycottbp 221, #news 219, #oilpocalypse 204, #oil 202, #gop 166, #wildife 159, #environment 159
    SOCMOV: #bp 957, #ocean 675, #gulf 272, #environment 209, #oilpocalypse 188, #p2 169, #blacktide 146, #peakoil 117
    MEDIA: #bp 898, #nola 187, #news 157, #p2 140, #cnn 113, #tcot 110
    OTHER: #bp 562, #p2 224, #blacktide 150, #gulf 149, #tcot 126, #bpoilspill 118

    According to the data, #bp is the most-used hashtag in the CELEB, CORP, GRASSRT, SOCMOV, MEDIA, and OTHER groups, and the second most-used hashtag in the GOV group. #gulf also appears many times in the CELEB, GOV, SOCMOV, and OTHER groups. BP is a British multinational oil and gas company that operates in over 80 countries. It reportedly produces some 3.4 million barrels of oil equivalent per day.
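The queries in 4.1 and 4.2 can be tried end to end on a toy database. The sketch below runs them through Python's sqlite3 module. The column layout of the three tables is inferred from the queries, `joint` is assumed to be the stored result of the first query (modelled here as a view), the rows are hypothetical, and a descending sort is added so that the most-used hashtag comes first within each type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tweeter (tweeter TEXT, type TEXT);
    CREATE TABLE tweet   (tweetid INTEGER, tweeter TEXT);
    CREATE TABLE hashtag (tweetid INTEGER, hashtag TEXT);

    -- Hypothetical rows for illustration only.
    INSERT INTO tweeter VALUES ('user_a', 'MEDIA'), ('user_b', 'CORP');
    INSERT INTO tweet   VALUES (1, 'user_a'), (2, 'user_a'), (3, 'user_b');
    INSERT INTO hashtag VALUES (1, '#bp'), (1, '#gulf'), (2, '#bp'), (3, '#bp');

    -- The report's first query, stored under the name used by its later queries.
    CREATE VIEW joint AS
        SELECT h.hashtag AS hashtag, e.type AS type
        FROM hashtag AS h, tweeter AS e, tweet AS t
        WHERE h.tweetid = t.tweetid AND t.tweeter = e.tweeter;
    """
)

rows = conn.execute(
    """
    SELECT type, hashtag, COUNT(hashtag)
    FROM joint
    GROUP BY type, hashtag
    ORDER BY type, COUNT(hashtag) DESC
    """
).fetchall()
for row in rows:
    print(row)
```

Each returned row is one weighted edge of the affiliation network: a type, a hashtag, and the number of times that type used it.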

    4.3 Visualization

    The labels show the seven types:

    CELEB: orange
    CORP: gray
    GOV: pink
    GRASSRT: red
    MEDIA: green
    SOCMOV: blue
    OTHER: yellow

    Hashtags are black dots. From the visualization above, we can see that some hashtags form clusters; these are hashtags that appear frequently across different types.

    5 Sentiment Analysis

    5.1 Computation method

    Export the TWEET table as an Excel file and save the CONTENT column of the TWEET table as contentwincsv.csv. Use the following code to calculate the polarity and print the lists:

    A programming error pointed to some malformed data in the dataset. Find the malformed rows in Excel and clear them:

    The program then runs smoothly. The output is as follows:

  • 15

    Copy the program output into Excel as the ORIGINAL LIST column, as shown in the following picture, and use formulae to extract CONTENT and POLARITY from ORIGINAL LIST. The polarity of each tweet is as follows:
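The polarity script itself is not reproduced in this transcript, so the method it used is unknown. Purely as an illustration of one common approach, the sketch below scores a tweet by summing word scores from a sentiment lexicon and skips malformed rows instead of failing on them. The lexicon entries, their scores, and the function name `polarity` are all hypothetical.

```python
import re

# Hypothetical mini-lexicon; a real analysis would load a published word list.
LEXICON = {
    "disaster": -3,
    "spill": -2,
    "dead": -3,
    "clean": 2,
    "hope": 2,
    "help": 2,
}

def polarity(text):
    """Sum the lexicon scores of the words in text; None for unusable rows."""
    if not isinstance(text, str) or not text.strip():
        return None  # malformed or empty CONTENT cell
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)

contents = [
    "Oil spill disaster in the Gulf",
    "Volunteers help clean beaches, there is hope",
    "",
    None,
]
for content in contents:
    print(repr(content), polarity(content))
```

Returning None for empty or non-text cells is one way to avoid the kind of programming error described above without editing the source spreadsheet by hand.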

    5.2 Stakeholders expressing negative sentiments

    According to TWEETER:

    Top 10 negative sentiments by tweeter:

    Tweeter            Type      Polarity
    cnneditorchuck     OTHER     -4
    cnnireport         OTHER     -4
    forbesintellect    OTHER     -4
    greenprogress      OTHER     -4
    cnygreg            OTHER     -3
    joenbc             OTHER     -3
    sfkarenmc          OTHER     -3
    politicolnews      MEDIA     -2.5
    datelinenbc        OTHER     -2.5
    wcpblog            SOCMOV    -2.322580645

  • 16

    According to TYPE:

    Top 3 negative sentiments by type:

    Type     Polarity
    CORP     0.292035398
    GOV      0.203401843
    MEDIA    -0.104506232
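The two rankings above are averages of per-tweet polarity, grouped first by tweeter and then by type. A minimal sketch of that aggregation, assuming each row carries a tweeter, the tweeter's type, and one tweet's polarity (the handles and values below are invented):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (tweeter, type, polarity) rows, one per tweet.
rows = [
    ("user_a", "OTHER", -4.0),
    ("user_b", "SOCMOV", -2.0),
    ("user_b", "SOCMOV", -3.0),
    ("user_c", "CORP", 1.0),
    ("user_c", "CORP", 0.0),
]

by_tweeter = defaultdict(list)
by_type = defaultdict(list)
for tweeter, group, score in rows:
    by_tweeter[tweeter].append(score)
    by_type[group].append(score)

mean_by_tweeter = {name: mean(scores) for name, scores in by_tweeter.items()}
mean_by_type = {group: mean(scores) for group, scores in by_type.items()}

# Most negative first.
ranked_tweeters = sorted(mean_by_tweeter.items(), key=lambda item: item[1])
ranked_types = sorted(mean_by_type.items(), key=lambda item: item[1])
print(ranked_tweeters)
print(ranked_types)
```

Averaging per tweet, as here, weights prolific tweeters more heavily within their type than averaging the per-tweeter means would.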

    5.3 Hashtags associated with negative/positive sentiments

    We used the following SQL to select the desired dataset and export it to Excel. We then sorted the data and obtained the hashtags associated with the most negative and most positive sentiments, as follows:

    Most negative:

    Hashtag          Polarity
    #3g              -9
    #att             -9
    #mtr             -9
    #lives           -8
    #p2#hcr          -8
    #wayofliving     -8

    Most positive:

    Hashtag               Polarity
    #failedeconomy        8
    #okaloosaisland       8
    #pain                 8
    #random               8
    #hebrewnational       7.5
    #goodheartandsmart    7
    #jimmybuffett         7
    #paulwatson           7

  • 17

    The following picture presents the popular hashtags.

    5.4 Visualize mention network

    Use SQL to create two CSV files. One contains all edges (TWEET.TWEETER and MENTION.MENTION); the other contains the nodes (TWEET.TWEETER and POLARITY.POLARITY). In the TWEET table, several tweets are likely to share the same tweeter name, and what we need to know is which tweeter connects to which tweeter, so we replace MENTION.TWEETID with TWEET.TWEETER. We then use Gephi to visualize this dataset as follows:
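The two CSV files described above can also be assembled in a few lines of code. The sketch below replaces each mention's tweet ID with the tweeter who wrote that tweet, averages polarity per tweeter for the node file, and writes both files to in-memory strings. The rows are hypothetical, and the `Source`/`Target`/`Id` header names are the ones Gephi's spreadsheet import is assumed to expect.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows: (tweetid, tweeter, polarity) and (tweetid, mentioned account).
tweets = [(1, "user_a", -2.0), (2, "user_b", 1.0), (3, "user_a", -1.0)]
mentions = [(1, "user_b"), (2, "user_c"), (3, "user_b")]

# Swap each mention's tweet ID for the tweeter who wrote that tweet.
tweeter_of = {tweetid: tweeter for tweetid, tweeter, _ in tweets}
edges = [(tweeter_of[tweetid], mentioned) for tweetid, mentioned in mentions]

# One node row per tweeter, carrying that tweeter's average polarity.
scores = defaultdict(list)
for _, tweeter, polarity in tweets:
    scores[tweeter].append(polarity)
node_polarity = {tweeter: sum(v) / len(v) for tweeter, v in scores.items()}

def to_csv(header, rows):
    """Render a header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

edges_csv = to_csv(["Source", "Target"], edges)
nodes_csv = to_csv(["Id", "Polarity"], sorted(node_polarity.items()))
print(edges_csv)
print(nodes_csv)
```

Accounts that are only ever mentioned (user_c here) get no polarity row, since no tweet of theirs was scored.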

  • 18

    In this picture, red nodes represent positive sentiment (high polarity), blue nodes represent negative sentiment (low polarity), and yellow nodes represent mild sentiment. Based on this graph, people's sentiments appear likely to be affected by their neighbors. An individual who receives positive information is likely to be positive, or at least neither strongly positive nor strongly negative. On the other hand, someone who receives both positive and negative reviews is likely to hold a neutral attitude towards the event.

    5.5 Change of sentiments

    We used SQL to generate the dataset, and we created a TIME INTERVAL column that runs from the date the content was posted to ten days later. We then imported this dataset into Gephi as follows. Because we only wanted to observe the change over time, we did not add any edges to this dataset.

  • 19

    We again use blue to represent negative sentiment and red to represent positive sentiment. The whole picture is as follows:

    We then use the TIMELINE function in Gephi to observe the change, as in the following picture. At the beginning there are just 4 nodes, which means that only a few tweets covered the event at first. The number of tweets about the event then increased (video record at https://www.youtube.com/watch?v=G69nA6_HwpM&feature=youtube_gdata_player).

  • 20

    We found that the sentiment did not simply change from negative to positive; it fluctuated. We therefore used SQL to calculate the average daily polarity and made a chart in Excel as follows:

    The polarity fluctuates between -1 and 0.6, which substantiates our observation in Gephi. It is easy to see that the polarity reached its lowest point after June 10th, 2010. We believe the reason is that, during this period, TIME reported "Oil-spill estimate upped again" on June 10th, and The Guardian reported "Obama compares the BP oil spill to 9/11" on June 14th. These influential people and media companies affected people's sentiment towards the event.

    We then used SQL to export the polarity of the different types of tweeters on different dates, and used Excel to draw the following line charts.
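The average daily polarity behind the chart can be expressed as a group-by on the posting date. A minimal sketch, assuming each row is a posting date and one tweet's polarity (the dates and values below are hypothetical, not taken from the dataset):

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical (date posted, polarity) rows, one per tweet.
rows = [
    (date(2010, 6, 9), 0.5),
    (date(2010, 6, 9), -0.5),
    (date(2010, 6, 10), -1.0),
    (date(2010, 6, 10), -2.0),
    (date(2010, 6, 11), 0.0),
]

scores_by_day = defaultdict(list)
for day, score in rows:
    scores_by_day[day].append(score)

# Average polarity per day, in chronological order.
daily = {day: mean(scores_by_day[day]) for day in sorted(scores_by_day)}
lowest_day = min(daily, key=daily.get)

for day, value in daily.items():
    print(day.isoformat(), value)
print("lowest:", lowest_day.isoformat())
```

The same grouping with an extra tweeter-type key would give the per-type series used for the line charts.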

  • 21

    The graph above shows that celebrities have higher polarity when the average polarity is positive and lower polarity when the average polarity is negative.

    This graph shows that, most of the time, corporations' polarity was higher than the average polarity.

    This picture indicates that, in the first half of the period, government accounts held weaker polarity than other groups, such as celebrities. During this period, the trend of the government's polarity ran opposite to the average polarity. In the second half of the period, government accounts held a strongly positive attitude most of the time.

  • 22

    The polarity of the grassroots group fluctuates between -1.2 and 0.8, a narrower range than that of the other groups' polarities.

    Most of the time, the average trend is very close to the media trend. It is possible that media opinion leads common opinion. On April 28 and June 18, the media's polarity differs from the average polarity. On April 28, it was indicated that the flow of oil was five times larger than the first estimate. On June 17 and 18, Hayward was accused, and Moody's lowered BP's credit rating [3].

    Most of the time, the polarity of the social movement group was lower than the average polarity.

    [3] The Guardian: BP oil spill timeline

  • 23

    There is no special connection between the OTHER group's polarity and the average polarity. To summarize, the polarity of each type of tweeter was consistently related to the average polarity, but the relationship differed from type to type.

    6 Implications & Conclusion

    Our examination of the Deepwater Horizon dataset allowed us to gain additional insight into the crisis that we might not have been able to glean otherwise. For instance, while the majority of the press surrounding the crisis focused on the federal and corporate response to the event, these two groups had a rather small social media footprint on Twitter.

    Another lesson came from our sentiment analysis of the dataset. Using negative and positive connotations to examine tweets chronologically gave us a distinct view of the overall network, one we had not seen before. By examining this information over time, we were also able to see how individual tweeters were affected by the sentiment of their neighbors. For example, if an individual was connected to others who held equally strong opposing views (e.g., one very negative and one very positive), that individual tended to maintain a more neutral position. Using sentiment analysis over a small but distinct period of time allowed us to see how individual users on a social network can be swayed by the information they receive from others. Interestingly, some of the most polarizing accounts tended to belong to heavily followed individuals (i.e., those with significant influence), such as media correspondents and corporations. Information produced by these Twitter accounts reached a very large audience, giving them the opportunity to influence many within their network.

    One lesson, which made us question the validity and accuracy of our analyses, involved our implicit trust in the organization of the dataset. Upon closer examination, we realized that much of the labeling of the dataset was incorrect. For example, Twitter handles that should have belonged to the Media category were labeled Other, and vice versa. Below is a table that shows just a handful of these instances:

    Twitter Account     Original Label    Revised Label
    @CBSRadioNews       Other             Media
    @HuffPostHill       Other             Media
    @NBCNightlyNews     Other             Media
    @CDCEmergency       Other             Gov

    Based on these discrepancies, we felt it important to caveat our analyses by stating that we did not scrub the data to correct any labeling inaccuracies. Instead, we completed our work based on the data provided by Topsy.

    That being said, based on our overall analyses of the dataset, we used betweenness as the measure for determining the most influential actors in the Deepwater Horizon network. We did this in two different ways:

    1. We affixed the type from the tweeter table to determine the maximum betweenness centrality for each group.
    2. We separated the mention network into different groups and then measured betweenness centrality to determine which group was most influential.

    Based on these two methods, we found that @nwf is the most influential tweeter in the SOCMOV group. Overall, the most influential tweeters were:

    whodat35, seachele420, Nwf, Winterthur, digiphilE, Ibrrc, Bpamerica, humidcity, oil_leaks

    In addition to what we learned about the dataset, our analyses also taught us a great deal about Twitter and its utility. As part of our analysis, we examined whether Twitter can be considered a medium for companies to disseminate information or a platform for the masses to express their ideas. Of the 675 unique tweeters in this network, 131 (approximately 19%) represent Media outlets, and the remaining 544 represent Celebrities (3), Grassroots Organizations (180), Social Movement Organizations (74), Corporations (4), Government (12), and Other (268), which is composed mainly of average Twitter users.

    Figure: An overall breakdown of the individual groups within the Deepwater Horizon Twitter Network.

    However, while this breakdown suggests that the average Twitter user may be the dominant user of this social media service, as discussed earlier, a closer examination of the Tweeter table shows that there may be some error in the classification scheme used to organize these data. As a result, it would be difficult to base our answer on these categories. Instead, we use the overall number of tweets vs. retweets as a very general breakdown of information dissemination (retweets) vs. expression of new ideas (tweets), an idea put forth by researchers Macskassy & Michelson [4]. In the Deepwater Horizon Twitter network, there were 29,888 original tweets and 42,828 retweets. This breakdown, with roughly 1.4 times as many retweets as original tweets, suggests that Twitter is mainly being used as a means of information dissemination as opposed to the expression of individual ideas.

    [4] Macskassy, S. & Michelson, M. (2011) Why Do People Retweet? Anti-Homophily Wins the Day! Association for the Advancement of Artificial Intelligence. Accessed at:

    [Pie chart "Tweeter Groups": Media 19%, Celebrities 0%, Grassroot 27%, Social Movement 11%, Corporations 1%, Government 2%, Other 40%.]
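The chart percentages can be reproduced from the group counts given in the text. The sketch below divides each count by the stated total of 675 unique tweeters and rounds to the nearest whole percent. Note that the listed counts themselves sum to 672, so dividing by 675 is an assumption that follows the report's stated total.

```python
# Group counts as listed in the report.
counts = {
    "Media": 131,
    "Celebrities": 3,
    "Grassroots": 180,
    "Social Movement": 74,
    "Corporations": 4,
    "Government": 12,
    "Other": 268,
}
stated_total = 675  # total number of unique tweeters stated in the report

# Share of each group, rounded to whole percentage points.
shares = {group: round(100 * n / stated_total) for group, n in counts.items()}
listed_total = sum(counts.values())

for group, share in shares.items():
    print(f"{group}: {share}%")
print("sum of listed counts:", listed_total)
```

The rounded shares match the values shown in the pie chart.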

  • 26

    Figure: Using retweets as a metric for information dissemination, we can see that the majority of the messages from the Deepwater Horizon network serve this purpose.

    Using a similar strategy, we sought to determine whether Social Movement Organizations (SOCMOV) were benefiting from Twitter as a vehicle for a call to action. As the chart below shows, the majority of the SOCMOV accounts in the Deepwater Horizon network relied on information dissemination. This suggests that Social Movements were not using Twitter as their main platform for expressing new ideas, but rather to ensure that a larger target audience could be exposed to their existing ideologies and practices.

    Figure: Using retweets as a metric for information dissemination, we can once again see that SOCMOV are relying on Twitter as a means of perpetuating their message.

    [Pie chart "Tweets vs. Retweets": Tweet 41%, Retweet 59%.]

    [Pie chart "Social Movement Organization Twitter Activity": Tweets 42%, Retweets 58%.]
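The network-wide split between original tweets and retweets follows directly from the two counts reported earlier (29,888 original tweets and 42,828 retweets). A short worked calculation, using only those two numbers:

```python
original_tweets = 29_888  # reported count of original tweets
retweets = 42_828         # reported count of retweets

total = original_tweets + retweets
tweet_share = round(100 * original_tweets / total, 1)
retweet_share = round(100 * retweets / total, 1)
ratio = round(retweets / original_tweets, 2)

print(f"total messages: {total}")
print(f"tweets: {tweet_share}%  retweets: {retweet_share}%")
print(f"retweets per original tweet: {ratio}")
```

This gives about 41% tweets and 59% retweets, consistent with the "Tweets vs. Retweets" chart, and about 1.43 retweets for every original tweet.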

  • 27

    In addition to SOCMOVs, one of the other big players in the Deepwater Horizon crisis was the Government. Although the GOV group did not have many accounts, we used the existing group members and their respective tweets to determine which political party was most active. Within the GOV group, we picked out the following three Twitter accounts, since they were the only accounts present that represented an individual political figure:

    David Vitter: Junior US Senator from Louisiana and a member of the Republican Party. 64 unique tweets and 21 retweets.
    Senator Bob Menendez: Senior US Senator from New Jersey and a member of the Democratic Party. 7 unique tweets and 4 retweets.
    Senator Bernie Sanders: Junior US Senator from Vermont and an Independent. 9 unique tweets (the account is run by staff and not the senator) and 4 retweets.

    While there is an even split in party representation (1/3 each) among the individual actors, other accounts, such as the Senate_GOPS handle, were very active during the oil spill. This account, which provides news updates from Senators and their staff, had 43 distinct tweets and 17 retweets during the timeframe in question. In comparison, the White House (led by President Obama of the Democratic Party) had only 19 total tweets, of which 14 were retweets.

    Figure: Twitter activity in terms of tweets and retweets for the most active GOV Twitter accounts.

    From this breakdown, we can see that the Republican Party seemed to be the most active. The next step was to examine whether this activity was being used as a means of engaging in debate or simply relaying the party's pre-existing frames.

    [Bar chart "GOV Account Activity on Twitter": number of tweets (0 to 70) by account name, with separate series for Tweets and Retweets.]
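The party comparison can be tallied from the account figures quoted above. In the sketch below, the White House's 19 total tweets with 14 retweets are entered as 5 original tweets, and treating the Senate_GOPS handle as a Republican account is an assumption made for this tally.

```python
# (party, original tweets, retweets) per account, from the figures quoted above.
accounts = {
    "David Vitter": ("Republican", 64, 21),
    "Bob Menendez": ("Democratic", 7, 4),
    "Bernie Sanders": ("Independent", 9, 4),
    "Senate_GOPS": ("Republican", 43, 17),  # party assignment is an assumption
    "White House": ("Democratic", 19 - 14, 14),
}

# Total activity (tweets plus retweets) per party.
activity = {}
for party, tweets, retweets in accounts.values():
    activity[party] = activity.get(party, 0) + tweets + retweets

most_active = max(activity, key=activity.get)
print(activity)
print("most active party:", most_active)
```

On these five accounts the Republican total is 145 messages, against 30 for the Democratic accounts and 13 for the Independent account.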

  • 28

    For the most part, tweets from GOV handles were largely retweets. These retweets were used to announce TV show appearances or to publicize articles that featured quotes from Senators and other members of each party. However, two accounts in particular, David Vitter and Senate_GOPS, were very active in producing original tweets that challenged the response from both the President and BP. One possible reason for David Vitter's large social media footprint could have been his close geographic relationship to the spill. As a junior Senator from Louisiana, one of the regions most affected by the oil spill, he was very vocal in assuring his constituents that he was working to enact an appropriate and timely cleanup procedure. Similarly, one possible reason for the activity on the Senate_GOPS handle could have been critique of the White House, which was led by a Democratic president.

    In conclusion, this project allowed us to learn a great deal about the use and impact of Twitter in documenting an event such as an environmental disaster. However, to ensure that the results of analyses of social media data are accurate, it is important to be able to trust the data being used. This means that multiple rounds of data cleaning and peer review should occur before an analysis can be considered useful for reporting purposes. Once this has occurred, data from platforms such as Twitter provide a real-time snapshot of public sentiment and can be incredibly beneficial in disseminating information.