Data Management (MIST 4610)
Instructions: Download the file that contains Delta Airlines’s performance data for February 2013. Use sqldf to calculate both the average departure delay and the average arrival delay, in minutes, for the Atlanta airport for each day in February 2013 (Hint: use column DayOfMonth to select the days). Once that is done, use ggvis to graph a scatterplot of the results (Hint: use layer_lines). Your graph should contain two lines: one for average departure delay and one for average arrival delay. Write one or two sentences with your conclusions from the graph.
# Load Delta's February 2013 performance data
url = "http://people.terry.uga.edu/csalge/Delta_2013_2.csv"
t = read.table(url, header = TRUE, sep = ',')

library(sqldf)
library(ggvis)

# Average arrival delay per day for flights landing in Atlanta
a = sqldf("SELECT 'Arrival Delay' AS category, DayofMonth, AVG(ArrDelayMinutes) AS Delay
           FROM t
           WHERE Month = 2 AND Year = 2013 AND Dest = 'ATL'
           GROUP BY DayofMonth")

# Average departure delay per day for flights leaving Atlanta
b = sqldf("SELECT 'Departure Delay' AS category, DayofMonth, AVG(DepDelayMinutes) AS Delay
           FROM t
           WHERE Month = 2 AND Year = 2013 AND Origin = 'ATL'
           GROUP BY DayofMonth")

# Stack both results into one long table so each category becomes its own line
z = sqldf("SELECT category, DayofMonth, Delay FROM b
           UNION
           SELECT category, DayofMonth, Delay FROM a")

# One line per category, colored by category
z %>%
  group_by(category) %>%
  ggvis(~DayofMonth, ~Delay, stroke = ~category) %>%
  layer_lines() %>%
  add_axis('x', title = 'Day of the Month') %>%
  add_axis('y', title = 'Delay in Minutes', title_offset = 50)
The plot shows that daily average departure and arrival delays follow similar patterns, as one would expect. The more interesting feature is the spikes in average delay on certain days: the 4th, 7th, 10th, 22nd, and 26th all show extreme spikes, with smaller spikes around the 13th and 18th. Significant events, such as weather that grounded planes or kept them from landing for long periods of time, likely occurred on these days.
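The spike days above were read off the plot by eye. As a rough cross-check, they could also be flagged numerically. The sketch below is only an illustration under stated assumptions: z_toy is a hypothetical stand-in for z with made-up delay values, flag_spikes is a helper name invented here, and the cutoff of one standard deviation above each category's mean is an arbitrary choice.

```r
# Hypothetical stand-in for the data frame `z`; the delay values are made up
# for illustration and are NOT the real February 2013 averages.
z_toy <- data.frame(
  category   = rep(c("Arrival Delay", "Departure Delay"), each = 6),
  DayofMonth = rep(1:6, times = 2),
  Delay      = c(5, 6, 4, 30, 5, 6,
                 7, 8, 6, 35, 7, 8)
)

# Flag a day as a spike when its delay exceeds its category's mean by more
# than k standard deviations (k = 1 is an arbitrary assumption).
flag_spikes <- function(df, k = 1) {
  # ave() computes the cutoff per category and recycles it to every row
  cutoff <- ave(df$Delay, df$category,
                FUN = function(x) mean(x) + k * sd(x))
  df[df$Delay > cutoff, c("category", "DayofMonth", "Delay")]
}

spikes <- flag_spikes(z_toy)
print(spikes)
```

Calling flag_spikes(z) on the real query result would list candidate spike days for each category, which could then be compared against the days picked out visually.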
Business Intelligence (MIST 5620)
Instructions: Create a Twitter account with a valid phone number. Go to Twitter App, log in, and create an application. Use the information from your account in RStudio to set up access to Twitter's API connection. Choose a topic of your interest and search for the most recent 10,000 tweets in English. Extract the text from your tweets and clean it by removing punctuation, numbers, stop words, and white space. In addition, transform the text to lower case and remove the keyword(s) included in your search. Finally, create a word cloud to get a crude idea of what is recently being said about your chosen topic on Twitter. Note: You will need to install and use four different packages for this assignment: twitteR, RCurl, tm, and wordcloud.
library(plyr)
library(twitteR)
library(stringr)
library(tm)
library(wordcloud)

# Credentials from the Twitter application (left blank here)
consumer_key <- ''
consumer_secret <- ''
access_token <- ''
access_secret <- ''
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

# Search for the most recent 10,000 English tweets on the topic
treat_tweets <- searchTwitter("#treatyoself", n = 10000, lang = "en", resultType = "recent")

# Extract the tweet text and replace non-printable characters with spaces
treat_text <- laply(treat_tweets, function(t) t$getText())
treat_text <- str_replace_all(treat_text, "[^[:graph:]]", " ")

# Build a corpus and clean it
treat_corpus <- Corpus(VectorSource(treat_text))
treat_corpus <- tm_map(treat_corpus, content_transformer(tolower), mc.cores = 1)
treat_corpus <- tm_map(treat_corpus, removePunctuation, mc.cores = 1)
treat_corpus <- tm_map(treat_corpus, removeNumbers, mc.cores = 1)
treat_corpus <- tm_map(treat_corpus, stripWhitespace, mc.cores = 1)
treat_corpus <- tm_map(treat_corpus, removeWords, stopwords("english"), mc.cores = 1)

# Remove the search keyword and the leftover "amp" from HTML-escaped ampersands
treat_corpus <- tm_map(treat_corpus, removeWords, c("treatyoself", "amp"))

# Word cloud of up to 100 words that appear at least 5 times
wordcloud(treat_corpus, max.words = 100, min.freq = 5, random.order = FALSE, colors = brewer.pal(8, "Dark2"))
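The cleaning steps in the script can be tried without Twitter API access. The sketch below mimics them in base R on two made-up tweets. It is an approximation for illustration, not the tm implementation: clean_text is a helper name invented here, and the short stop_words vector is an assumed stand-in for the much longer stopwords("english") list.

```r
# Made-up example tweets (hypothetical; not real search results)
tweets <- c("Bought 2 new shoes today!!! #treatyoself",
            "It is the BEST day of the year &amp; I'm ready #TreatYoSelf")

# Tiny illustrative stop-word list (assumption; tm's list is much longer)
stop_words <- c("the", "of", "is", "it", "im", "i")
# Search keyword plus "amp", the residue of the HTML-escaped ampersand
keywords <- c("treatyoself", "amp")

clean_text <- function(x, drop) {
  x <- tolower(x)                                   # lower case
  x <- gsub("[[:punct:]]", "", x, perl = TRUE)      # remove punctuation
  x <- gsub("[[:digit:]]", "", x, perl = TRUE)      # remove numbers
  pattern <- paste0("\\b(", paste(drop, collapse = "|"), ")\\b")
  x <- gsub(pattern, "", x, perl = TRUE)            # remove whole words in `drop`
  x <- gsub("\\s+", " ", x, perl = TRUE)            # collapse runs of white space
  trimws(x)                                         # trim leading/trailing space
}

cleaned <- clean_text(tweets, c(stop_words, keywords))
print(cleaned)

# Word frequencies: the counts a word cloud would scale its words by
word_counts <- sort(table(unlist(strsplit(cleaned, " "))), decreasing = TRUE)
print(word_counts)
```

Unlike the script, this sketch collapses white space after removing words, so no double spaces are left behind where a stop word or keyword was dropped.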