My Progress on the Final Project

When I first began the project, I knew that I wanted to use a dataset that I am considering for my capstone. It is a dataset of tweets posted by suspected ISIS sympathizers. When I began sketching my ideas, I initially planned on creating a dashboard that would let me look at specific tweets and users and gain more insight into them.

Here are some of my earliest sketches:

I knew that this was lacking and needed help, so I went to office hours, where Professor Field highlighted a potential issue that I knew I had but was struggling to articulate.

In essence, I was viewing the data at a micro level and not a macro level. In other words, I wasn't looking at the dataset as a whole. This led me to take a deeper look at the data and determine whether I could find any patterns in it. To do this, I knew that I wanted to generate more information. I decided to initially generate 6 new features.

description-sentiment
Each entry has a profile description, and I report its sentiment score.
description-subjectivity
For the profile description, I report subjectivity using a pretrained model.

Same as above, but for the actual tweet content:

tweet-sentiment
tweet-subjectivity
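
As a rough sketch of how these four columns could be produced: the model is not named above, so the example below assumes TextBlob as a stand-in scorer, and the field names `description` and `tweet` are hypothetical. Any function that returns a (polarity, subjectivity) pair could be swapped in.

```python
def textblob_scorer(text):
    """Return (polarity, subjectivity) for a string.

    Assumption: TextBlob stands in for the unnamed pretrained model.
    """
    # Imported lazily so a different scorer can be used without TextBlob installed.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def add_scores(row, scorer=textblob_scorer):
    """Copy a row (a dict) and add the four score columns described above."""
    scored = dict(row)
    # "description" and "tweet" are hypothetical field names.
    for field in ("description", "tweet"):
        # Missing or empty text is scored as an empty string.
        polarity, subjectivity = scorer(row.get(field) or "")
        scored[f"{field}-sentiment"] = polarity
        scored[f"{field}-subjectivity"] = subjectivity
    return scored
```

Passing the scorer as an argument keeps the column logic separate from the choice of model, which makes it easier to compare models later.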

Next, I used a Python package called spaCy to perform entity recognition on the tweets, to see if I could pull out any mentions of companies, people, or countries.

Here's an example from spaCy's documentation.

People in tweet
Organizations in tweet
"NORP" (nationalities or religious or political groups) in tweet
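
A minimal sketch of how these three entity columns might be extracted with spaCy. The pipeline name `en_core_web_sm` and the helper names are assumptions for illustration, not necessarily what was used here.

```python
from collections import Counter

# spaCy entity labels for people, organizations, and NORP groups.
LABELS = ("PERSON", "ORG", "NORP")


def load_pipeline(name="en_core_web_sm"):
    """Load a spaCy pipeline. The model name is an assumption."""
    import spacy  # imported lazily; requires spaCy and the model to be installed

    return spacy.load(name)


def entities_by_label(text, nlp):
    """Return {label: [entity text, ...]} for the labels of interest."""
    doc = nlp(text)
    found = {label: [] for label in LABELS}
    for ent in doc.ents:
        if ent.label_ in found:
            found[ent.label_].append(ent.text)
    return found


def count_mentions(texts, nlp, label):
    """Count how often each entity with the given label appears across texts."""
    counts = Counter()
    for text in texts:
        counts.update(entities_by_label(text, nlp)[label])
    return counts
```

For example, `count_mentions(tweets, load_pipeline(), "ORG")` would give a tally of organization mentions, where `tweets` is a hypothetical list of tweet strings.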

Now that I have all this new information, I can recreate the dashboard at the macro level.

The plan is to take features built in the dataset and compare them with the new information that I have generated.

For instance, one thing that I could do is have a location page (location is built into the dataset). On this page, I could display which people are mentioned most frequently in each location. I can also ask whether certain locations have more negative text. For instance, I assume that America is talked about extremely negatively. Is this true?
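
The location questions above could be answered with a simple group-by. This is only a sketch: it assumes each row is a dict with hypothetical keys `location`, `people` (a list of extracted names), and `tweet-sentiment`.

```python
from collections import Counter, defaultdict
from statistics import mean


def top_people_by_location(rows, n=3):
    """Return the n most-mentioned people for each location."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["location"]].update(row["people"])
    return {loc: counter.most_common(n) for loc, counter in counts.items()}


def mean_sentiment_by_location(rows):
    """Return the average tweet sentiment score for each location."""
    scores = defaultdict(list)
    for row in rows:
        scores[row["location"]].append(row["tweet-sentiment"])
    return {loc: mean(values) for loc, values in scores.items()}
```

Sorting the result of `mean_sentiment_by_location` by value would show which locations skew most negative.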

Next Steps

I'm still working on narrowing down the scope and determining which questions I want to ask. Ultimately, I now have the ability to easily query the data in certain ways, and I am looking forward to laying everything out together.

Appendix

Here's some of the quick analysis that I have done.

Mentions of Nationalities
Mentions of Organizations
Mentions of People

I still need to do some cleaning here. The entity recognition appears to be stumped by camel case and some locations.
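
One possible cleaning step for the camel-case problem, as a sketch only: split camel-case words into separate tokens before running entity recognition. The helper name is hypothetical, and whether this actually improves the results on this dataset is untested.

```python
import re

# Zero-width boundary between a lowercase letter and a following uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def split_camel_case(text):
    """Insert a space at each lowercase-to-uppercase boundary."""
    return _CAMEL_BOUNDARY.sub(" ", text)
```

All-caps strings such as acronyms have no lowercase-to-uppercase boundary, so they pass through unchanged.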