Instructor

Carolina Salge: MTWRF 9:15 - 11:30am (Sanford Hall 309)

Final Exam: Fri., Jun. 30, 9:15 - 11:30am

Course Description

This course provides a basic understanding of information management in organizations and specifically the design and use of database systems. Topics include data modeling, relational databases, structured query language, and Web access to databases. The course is divided into three sections: (1) data modeling and database design; (2) data analytics; and (3) data management.

The course syllabus is a general plan for the course; deviations announced to the class by the instructor may be necessary.

Prerequisites, Corequisites

MIST 2090

Objectives

Students completing this course will

  1. be able to develop a valid data model for a business system of medium complexity
  2. be able to build and use a relational database
  3. be able to formulate complex relational database queries
  4. be able to use R for data analytics
  5. be aware of the issues related to managing "big data"
  6. be familiar with the principles of managing and exploiting organizational data

Topics

  1. Data modeling and SQL
  2. Relational DBMS
  3. Organizational intelligence technologies
  4. Introduction to R
  5. Data visualization
  6. Exploratory
  7. Big Data
  8. Data integrity

Text

Watson, R. T. (2013). Data management: databases and organizations (6th ed.). Athens, GA: eGreen Press. (text Web site).

Software

We will use MySQL Workbench for data modeling and SQL execution. Macintosh owners can also use Sequel Pro for browsing.

MySQL database

  1. Laptop for individual assignments and exams
    MySQL Community Server

  2. Terry College server for SQL exercises
    Hostname: wallaby.terry.uga.edu, Port: 3306,Username: student, Password: student

R software

R is an open-source software environment for statistical computing and graphics, and RStudio is the interface to R. Download the latest versions for your operating system since you will be required to complete a number of assignments to learn the foundations of R in this course.

Group size

Groups should contain three-four persons.

Academic honesty

As a University of Georgia student, you have agreed to abide by the University's academic honesty policy, "A Culture of Honesty, " and the Student Honor Code. All academic work must meet the standards described in "A Culture of Honesty." Lack of knowledge of the academic honesty policy is not a reasonable explanation for a violation. Questions related to course assignments and the academic honesty policy should be directed to the instructor.

Team work

In this class, you will work in teams. As a result, review a short report on team effectiveness and establish a team agreement (sample agreement). Give the instructor a copy of your team agreement by the end of the first week of classes.

Freeloader policy

It occasionally happens in class and enterprise settings that someone in a group is not prepared to do his/her share. In the case of my classes, I recommend that the team give the freeloader one warning and then fire that person from the team. That person will then do group assignments individually or find another team to join. The team should notify me of the change in team composition immediately. I distribute a form to assess team participation at the end of the semester. If a major disparity in team contribution is reported, I will adjust team project grades.

Laptop policy

Students are welcome to use laptops in class for note taking and completing class exercises, exclusively. If you plan to take notes, please advise and email a copy of the notes at the end of each class.

Assignments

See the class schedule for the due date. The due time is 11:59pm on the due date.

Exercises

Identifier Chapter Exercise
A1 Single Entity
(Rubric)
Etsy wants to keep track of its antique inventory. Each antique item has a name, description, category (e.g., home & living), quantity on hand, price, and an identifier (e.g., DA3249). Antiques are classified as vintage and handmade.
Create the data model with MySQL workbench.
List each product's name, description, and its buy price with a 20% discount.

List the product name, quantity in stock, and market suggested retail price (MSRP) of products with a name ending with Mustang.
A2 One-to-Many Relationship
(Rubric)
BBC News wants to keep track of its correspondents in different countries around the world. Usually, the broadcasting agency sends over a dozen correspondents to a country. Yet, a correspondent may not be assigned to cover more than one country. Each country has a code, name, and particular level of security risk (e.g., Canada has a low risk-level while Brazil has a high one). Make sure to track the id, full name, tenure (e.g., how long the correspondent has been working for BBC news), and email address of each correspondent.
Create the data model for this situation using MySQL workbench.
Count the number of employees in each country.

What is the total revenue of products in the Classic Cars product line?
Note: Product revenue = Price of each product * Quantity ordered.
A3 Many-to-Many Relationship
(Rubric)
Some large cities (e.g., Cologne, Germany) have started a project called car-sharing to reduce the number of cars in the city and also to reduce the cost of owning a car. In this program, a car is partially owned by several people. The number of owners per car may vary. Typically, not more than four people own a particular car. Also, each person can own a different percentage of the car (e.g., Marie can own 10% of a BMW X6 while Jan can own 50% of the same BMW X6), depending on how often he/she uses the car. Each person might also own part of more than one car (e.g., Marie can own 10% of a BMW X6 and 50% of a Ford Mustang). Draw a data model to accurately represent the case here (i.e., draw a model to keep track of which people own which cars and the various percentages that they own).
List the total payment amount for disputed orders by country. Sort and group results by country name.

Report the total revenue for each product shipped on Black Friday in 2004 (November 26). Sort and group results by product name. Hint: See MySQL date functions.
A4 One-to-One and Recursive Relationships
(Rubric)
An airline company has many planes that travel back and forth between different airports. A trip is handled by one pilot and involves one particular plane. Trips depart on a certain day and time from a departing airport, arriving at a different day and time in an arrival airport. Each airport is identified by a code, is located in a particular city and country, and has a maximum plane capacity (e.g., 100). Each plane has a unique name and a date of construction. A plane also has both weight and passenger capacities that cannot be exceeded. Each pilot has a name and an experience level (junior or senior). A senior pilot can supervise one (or many) junior pilots. However, a junior pilot can be supervised by only one senior pilot, if any. Create a database to record which pilots and planes are assigned to which trips, and to record the departure/arrival dates, times, and locations of different trips.
Who is Gerard Bondur's boss? Report the full name and the job title of his boss.

How many employees work for Anthony Bow?
A5 SQL
(Rubric)
What is the difference in total revenue for items ordered in December 2004 when compared to December 2003? Format results to show two decimal places.
A6 Introduction to R
(Rubric)
A file contains details of Twitter users protesting against the realization of the 2014 FIFA World Cup in Brazil. Write an R script to read the file, convert it into a format suitable for use with R (Hint: use the reshape package - see an example here) and report the average number of followers for each Twitter user only if those users have an average number of followers lower than 200. Order the results by average number of followers in descending order. Use head() to show the first six rows. Which user has the highest average number of followers?
A7 Data Visualization
(Rubric)
A file contains Delta’s performance data for February 2013. Use sqldf to calculate both the average departure and average arrival delay in minutes for the Atlanta airport for each day in February 2013 (Hint: use column DayOfMonth to select the days). Once that is done use ggvis to graph a scatterplot of the results (Hint: use layer_lines). Your graph should contain two lines: one for average departure delay and one for average arrival delay. Write one or two sentences with your conclusions from the graph.
A8 Exploratory
(Rubric)
Assume you are a business analyst for Superstore and you have a sense that there are profitability issues in your products. You don’t know exactly how to define the problem nor what factors contribute to the issues. But you want to explore this situation by analyzing the data you’ve received from those kind folks in IT. Import the Coffee Chain dataset and begin to explore by asking:
  • What products are under performing? What correlates with profit?
  • Are there issues with certain product lines, products, markets, pricing structures (margins), and costs?
There are multiple answers. You’ve been given data and you need to find where the problems are. In your report, provide an explanation of what you discovered containing screen shots from your analysis to show the logic of how you reached your conclusions. Convince me that you found the problem. With each screen shot define what question you are asking, what you observe, and why you went where you did next in your step-by-step problem exploration process. Treat this as a problem solving, treasure hunt, or business case. You may want to watch some Exploratory Getting Started Videos as you complete the assignment.

State of the art presentations will be made commencing June 12 (group) (there will be one presentation per class).

A presentation is required from each group on a data management technology or product, with a particular concentration on open source products.

Some suggested topics follow, and you can propose others by contacting the instructor. You should submit your bid for a topic via e-mailing your instructor. When submitting a bid, also specify your team's name. Those who bid early present early.

Topic Team Name Presentation Date
Presto (Facebook's SQL Engine) Big Bang Query (ppt) June 20
Autonomous Vehicles Team One (ppt) June 13
PostgreSQL    
Bitcoin Dawgs (ppt) June 14
MongoDB    
Gig Economy    
MySQL    
Internet of Things The Slequels (pdf) June 13
NoSQL    
Smart Cities Moxie (ppt) June 21
CUBRID Manager    
Chat Bots Managers of Data (ppt) June 19
MariaDB The Databaes (pdf) June 16
Firebird    

Database design and implementation assignment (group)

Many information services have both a web site and an iPhone application that share a common database. The two interfaces enable consumers to select how they want to interact. There are several iPhone grocery list applications and most are missing a corresponding web site. Design a data model to support a grocery list. Review existing grocery list apps and exercise your creativity to determine the data model's entities. Convert the data model to a relational database and write 10 queries to access the database. The queries should demonstrate your breadth of understanding of SQL (i.e., 10 simple queries will not score as well as say 4 simple queries and 6 nontrivial queries).

Follow the guidelines for database projects.

Data modeling is difficult and to facilitate learning a preliminary model must be submitted on the date specified in the schedule.

Grading

Item Points
Exercises 20%
State of the Art Presentation 10%
Database Design Project 25%
Midterm Exam (sample) 20%
Final Exam (sample) 25%
Total 100%
If you are unable to complete an exercise on time or take an exam at the specified time, please advise the instructor as soon as possible so that alternative arrangements can be made.

Schedule - Summer 2017

Class Day Date Chapter Assignment Resubmission
1 M 5-Jun-17 Syllabus - Data (1) and Information (2)    
2 T 6-Jun-17 Single Entity (3) (ppt)    
3 W 7-Jun-17 One-to-Many Relationships (4) (ppt) A1  
4 R 8-Jun-17 Many-to-Many Relationships (5) (ppt) A2 A1
5 F 9-Jun-17 One-to-One and Recursive Relationships (6) [One-to-One and One-to-Many Recursive] (ppt) A3 A2
6 M 12-Jun-17 Sick Day    
7 T 13-Jun-17 One-to-One and Recursive Relationships (6) [One-to-One Recursive and Many-to-Many Recursive] (ppt) A4 A3
8 W 14-Jun-17 Midterm Review: Data Modeling (7) (ppt) & SQL (10) (ppt)   A4
9 R 15-Jun-17 Midterm    
10 F 16-Jun-17 Project Review (Database Preliminary) A5  
11 M 19-Jun-17 Introduction to R - Part One (14) (ppt)   A5
12 T 20-Jun-17 Introduction to R - Part Two (14) (ppt) Database Design Preliminary  
13 W 21-Jun-17 Data Visualization - Part One (15) (ppt)    
14 R 22-Jun-17 Data Visualization - Part Two (15) (ppt) A6  
15 F 23-Jun-17 Exploratory A7 A6
16 M 26-Jun-15 Project Review   A7
17 T 27-Jun-17 Data Integrity (21) (ppt) A8  
18 W 28-Jun-17 Final Exam Review (Data Modeling and SQL) (ppt)   A8
19 R 29-Jun-17 Final Exam Review (Introduction to R, Data Visualization, and Exploratory) (ppt) (data1) (data2) Database Design Project  

Team evaluation

The following form should be submitted by June 29.