Twitter datamining analysis: Irish General Election 2016

Preface

Data collected and analysed by @EoghanH.

Not interested in how the data was collected and parsed? Scroll down and skip to the section titled “The Results”. Full SQL and CSV downloads are available at the bottom of the page, under the Google Adwords banner ad. If you’re using an ad blocker, it’s below the gaping white space at the bottom. However, if you are using an ad blocker – it would be hugely appreciated if you could switch it off (just this once!) – as the costs for running the dedicated server for datamining, and the API charges for sentiment analysis ran up quite a bill.

Overview – Data Collection & Parsing

On the 7th of February 2016, at 7:03am, I launched a Twitter data miner which began logging tweets relating to the 2016 Irish General Election. The intent was to provide an unbiased, purely data-based view of what the public, politicians, and the media talk about.

Over 320,000 tweets were logged between the launch time, and the chosen end time: 11:59pm on February 22nd, directly after the final “Leaders Debate”, hosted by RTÉ.

From this selection of tweets, a total of 212,551 were deemed relevant, by mentioning either the name of a politician or a political party in their text. Additionally, any tweets that mentioned a high-ranking relevant hashtag (such as #ge16 or #leadersdebate) were also deemed relevant, provided they also touched on a social issue (such as healthcare or taxes). Note that the server was down for roughly 24 hours around the time of TV3’s first Leaders Debate broadcast on Feb 11th; any tweets made during this time were not logged.
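As a rough illustration of the relevance rule described above, here is a minimal Python sketch. The keyword sets are hypothetical stand-ins (the real search terms are in the downloadable keyword file), and plain substring matching is an assumption made for brevity, not necessarily how the original parser worked.

```python
# Hypothetical keyword sets; the real lists are far longer.
PARTY_TERMS = {"fine gael", "labour", "sinn féin"}
HASHTAGS = {"#ge16", "#leadersdebate"}
ISSUE_TERMS = {"healthcare", "taxes", "housing"}


def is_relevant(text):
    """Return True if a tweet names a party/politician, or pairs a tracked
    hashtag with a social issue."""
    lowered = text.lower()
    # Rule 1: any party or politician term makes the tweet relevant.
    if any(term in lowered for term in PARTY_TERMS):
        return True
    # Rule 2: a tracked hashtag only counts if an issue is mentioned too.
    has_hashtag = any(tag in lowered for tag in HASHTAGS)
    has_issue = any(term in lowered for term in ISSUE_TERMS)
    return has_hashtag and has_issue
```

Substring matching of this kind is prone to false positives ("labour" also matches "labourer"), which is the same sort of problem noted further down for the Environment and Water keywords.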

All tweets were then parsed and separated into over 300 different categories, noting which politician/party was talked about, which social issue was mentioned, whether it was a retweet, and so on. Finally, all unique tweets (excluding retweets) were passed to jamiembrown’s Tweet Sentiment Analysis Toolkit, in order to provide an insight into the “tone” of each tweet, i.e. whether a person was talking in a positive light or a negative light.
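The categorisation step could look something like the sketch below: each tweet gets one true/false flag per category, and retweets are filtered out before sentiment analysis. The category names and keywords are made up for illustration; the real analysis used over 300 categories.

```python
# Hypothetical categories and keywords, for illustration only.
CATEGORY_KEYWORDS = {
    "party_labour": ["labour"],
    "issue_housing": ["housing", "homeless"],
    "issue_taxes": ["tax"],
}


def categorise(text, is_retweet):
    """Return one boolean flag per category, plus a retweet flag."""
    lowered = text.lower()
    flags = {
        name: any(keyword in lowered for keyword in keywords)
        for name, keywords in CATEGORY_KEYWORDS.items()
    }
    flags["is_retweet"] = is_retweet
    return flags


def unique_only(tagged_tweets):
    """Keep only original tweets, i.e. the ones that go on to sentiment analysis."""
    return [tweet for tweet in tagged_tweets if not tweet["is_retweet"]]
```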

Tools used: Python, PHP 5.6.18, 140dev Twitter Toolkit, Tweet Sentiment Analysis Toolkit, Plotly, Jason Davies’ Wordcloud Generator, Barry’s Tea

Parties, Media, and Keywords Logged

A full list of keywords is available at the bottom of this page for download. When choosing these keywords, with the limit of 400 search terms imposed by Twitter, I initially opted to log independent candidates for analysis. However, following the completion of these logs, I found that it was wholly unfair to average all independents into one group, as their opinions on different subjects may vary wildly, while parties are typically composed of members whose views broadly align. As such, no analysis has been made available on independent politicians.

Further to this, as very few of the chosen keywords represent Renua, I would also consider it unfair to provide any analysis of their tweets. With only 1,595 of a total of 212,551 tweets (0.75%) relevant to Renua, I do not believe there is enough data to fairly represent their party.
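For anyone wanting to check the Renua figure, the arithmetic is simply the Renua tweet count divided by the total number of relevant tweets:

```python
renua_tweets = 1_595
relevant_tweets = 212_551

# Share of relevant tweets that mention Renua, as a percentage.
share = renua_tweets / relevant_tweets * 100
print(f"{share:.2f}%")  # prints 0.75%
```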

The parties chosen for analysis are as follows:
Fine Gael, Labour, Fianna Fáil, Sinn Féin, AAA-PBP, Social Democrats, Green Party Ireland

For media outlets, the following online publications were chosen for analysis:
RTÉ, Irish Independent, The Journal, Irish Examiner, The Irish Times

Other publications are available as per the CSV/SQL data, but did not have enough data available to make any meaningful analysis.

Social/Economic/Infrastructure Issues

The social/economic/infrastructure issues logged across all tweets were:

  • Taxes
  • Poverty
  • Unemployment
  • Economy
  • JobBridge
  • Education
  • Pensioners
  • Roads
  • Crime
  • Housing
  • Health Care
  • Mental Health
  • Abortion

The following categories were also logged, but did not result in enough data to provide any relevant analysis;

  • Minimum Wage
  • Emigration
  • Refugees/Asylum Seekers
  • Environment (Too many false positives, chosen keywords too broad)
  • Water (Too many false positives, chosen keywords too broad)

The Results

Social/Economic/Infrastructure

[Charts: public and media mentions of each issue]

Parties:

The following gallery is a breakdown of how often each individual party (or party member) mentioned, or commented on any of the above listed issues.

Keyword Domination

The following gallery is an alternative look at the above issues; rather than showing how a party mentioned issues, this gallery focuses on the issues themselves, and how much of each issue’s overall discussion came from each political party. Note that this is not a truly fair representation of how much a party cares about any one issue, as some parties have notably more politicians tweeting than others and, as such, will more easily dominate a keyword.
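To make the "domination" idea concrete, here is a small sketch of how each party's share of an issue could be worked out from a list of (party, issue) mentions. The function name and the input layout are my own assumptions for illustration, not the format of the published SQL/CSV data.

```python
from collections import Counter


def issue_share(mentions):
    """Turn (party, issue) pairs into {issue: {party: fraction of mentions}}."""
    counts = {}
    for party, issue in mentions:
        counts.setdefault(issue, Counter())[party] += 1
    shares = {}
    for issue, per_party in counts.items():
        total = sum(per_party.values())
        shares[issue] = {party: n / total for party, n in per_party.items()}
    return shares


# Made-up mentions: party A tweets about housing twice, party B once.
example = [("A", "housing"), ("A", "housing"), ("B", "housing"), ("B", "taxes")]
shares = issue_share(example)
```

One way to soften the bias mentioned above would be to divide each party's count by the number of accounts tweeting for it before computing shares, though that adjustment was not part of this analysis.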

Public/Media Mentions of Parties

How often are parties, and their associated politicians mentioned by the media, and the public? This was noted to be a critical topic of discussion among the public – with Padraig O’Mara of Medium.com accusing state-owned media of Rigging the Elections, by providing biased, or scattered information on some political parties.

Coupled with this, some parties have been accused of “sockpuppeting” – whereby one person maintains several different social media accounts in order to fake higher levels of support for one party – or having automated social media accounts which would “like”, or vote up positive stories regarding their affiliated party; likewise, voting down any positive stories regarding any other parties.

In order to examine the above, firstly, note how often our chosen media outlets mention our political parties;

Following this, we can now look at how often members of the general public mention any particular party in “unique posts” (excluding retweets). As with our social issues, all party-affiliated accounts are excluded from these results.

[Chart: public mentions of parties, excluding retweets]

Finally, we can examine these numbers inclusive of retweets. If automated or sockpuppeted accounts were at work, we would expect the number of mentions for the party involved to rise dramatically once retweets are included.

Note that this is not by any means an intelligent Turing test; it is simply a comparison of unique tweets vs. retweets. Based on the above information alone, I would not personally be able to accuse any party of any notable amount of sockpuppeting or automation. This analysis could, of course, be “tricked” by any individual avoiding retweets and simply copy/pasting a tweet’s contents.
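The comparison above can be sketched as a ratio of total mentions (including retweets) to unique mentions for each party, with a flag for any party whose ratio is far above the typical one. The cut-off of twice the median is an arbitrary assumption for illustration, and the numbers in the example are invented.

```python
from statistics import median


def retweet_ratios(unique_mentions, total_mentions):
    """Total mentions (including retweets) per unique mention, for each party."""
    return {
        party: total_mentions[party] / unique_mentions[party]
        for party in unique_mentions
        if unique_mentions[party] > 0
    }


def unusually_amplified(ratios, factor=2.0):
    """Parties whose ratio exceeds `factor` times the median ratio."""
    typical = median(ratios.values())
    return sorted(party for party, ratio in ratios.items() if ratio > factor * typical)


# Invented counts, purely to show the mechanics.
ratios = retweet_ratios(
    {"A": 100, "B": 100, "C": 100},
    {"A": 150, "B": 160, "C": 600},
)
flagged = unusually_amplified(ratios)  # ["C"]
```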

Sentiment Analysis

Sentiment analysis is the process of examining a given piece of text and determining whether it is positive, negative, or neutral in tone. The Tweet Sentiment Analysis Toolkit was used to carry out sentiment analysis across our dataset.

Example Tweet sentiment analysis:
“I love cats!” : Positive, with 41% certainty.
“I hate cats!” : Negative, with 48% certainty.

We will use classification to determine the general sentiment of a tweet, rather than weighting. In short, this means that “I love cats!” will return as purely positive, whereas “I hate cats!” will return as purely negative, regardless of the certainty reported. From here, the average sentiment will be between 0 and 1, with 0 being fully negative and 1 being fully positive.
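A minimal sketch of classification-based averaging follows. It assumes each analysed tweet comes back as a (label, certainty) pair and that neutral tweets count as 0.5; both are assumptions for illustration rather than confirmed details of the toolkit or of this analysis.

```python
# Assumption: neutral tweets sit halfway between negative (0) and positive (1).
LABEL_SCORES = {"positive": 1.0, "neutral": 0.5, "negative": 0.0}


def average_sentiment(classified):
    """Average the labels only; the certainty of each label is ignored."""
    if not classified:
        raise ValueError("no tweets to average")
    scores = [LABEL_SCORES[label] for label, _certainty in classified]
    return sum(scores) / len(scores)


# The two cat tweets from the example above cancel each other out.
cats = average_sentiment([("positive", 0.41), ("negative", 0.48)])  # 0.5
```

A weighting approach would instead scale each tweet by its certainty, so a tweet rated positive with 41% certainty would pull the average less than one rated with 90% certainty.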

Note that sentiment analysis was only performed on unique tweets; retweets are excluded from these results.

The Public

Again, sentiment is measured from 0 to 1, with 0 being completely negative and 1 being completely positive. From our data set, we can see that the Labour party receives mostly negative comments, whereas AAA-PBP receives an above-average amount of positive comments. Other parties receive mostly neutral comments.

The Media

The same logic was applied to our chosen media outlets for examination. For an absolutely neutral publication, all bars would read 0.5. Typically, however, anywhere between 0.45 and 0.55 can be considered “mostly neutral” for any publication. In the event of any publication having fewer than five tweets/article links regarding any one particular party, a default value of 0.5 is assumed. An “error margin” line of 100% is run through the data bar to show that not enough data was available to make any analysis.
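The fallback and the "mostly neutral" band described above can be sketched like so. The function names are hypothetical, and each tweet is assumed to have already been scored as 1 (positive) or 0 (negative).

```python
MIN_TWEETS = 5          # below this, there is not enough data
DEFAULT_SCORE = 0.5     # neutral placeholder used when data is lacking


def outlet_score(scores):
    """Return (average, has_enough_data) for one outlet/party pairing."""
    if len(scores) < MIN_TWEETS:
        return DEFAULT_SCORE, False
    return sum(scores) / len(scores), True


def is_mostly_neutral(score):
    """True when the score falls within the 0.45 to 0.55 band."""
    return 0.45 <= score <= 0.55
```

Returning the has_enough_data flag alongside the score is what would let a chart draw the full-height error margin line on bars that only hold the 0.5 placeholder.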

Wordclouds

So, what’s everyone been saying? First up, let’s examine a wordcloud of what the public has been saying. This includes retweets, but excludes the media and any accounts affiliated with political parties.

Congrats to Waterford Whispers, and “whingers” for making the list of most-discussed terms!
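A wordcloud sizes each word by how often it occurs, so the underlying step is a word count. The sketch below counts words across tweets while skipping excluded accounts (media and party-affiliated, in the public wordcloud's case). The stopword list, the account names, and the (author, text) layout are assumptions for illustration only.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "for", "are", "rt"}


def word_frequencies(tweets, excluded_accounts):
    """Count words across (author, text) tweets, skipping excluded authors."""
    counts = Counter()
    for author, text in tweets:
        if author in excluded_accounts:
            continue
        # Keep hashtags and @mentions intact as single tokens.
        words = re.findall(r"[#@]?\w+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts


sample = [
    ("alice", "The whingers are out #ge16"),
    ("rte", "Whingers report"),
    ("bob", "RT whingers again"),
]
freqs = word_frequencies(sample, excluded_accounts={"rte"})
```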

The Parties

Finally, we can take a look at the wordclouds generated by all party-affiliated tweeters.

Conclusion

The data above is not provided with the intent of holding any opinion over any issues, or political parties; instead, it’s intended to provide an unbiased overview of Ireland’s options for the 32nd Dáil, alongside the opinions of the public, and the Irish media.

Regardless of your preferred parties, please remember to get out and vote on the 26th – otherwise, please abandon ye all intentions of complaining once the politician you dislike gets into power.



Download Links:
SQL: (322MB) Download at Mega.nz
CSV: (356.6MB) Download at Mega.nz
Twitter Search Keywords (TXT): Download at Mega.nz
