(15pts) For following the instructions and submission requirements below:
• Put your name at the top of this file
• All data cleaning should be done using a dplyr chain connected directly to the read.csv() line (only
one read.csv() per Part, you don’t need separate ones for each question)
• You only have to “clean” what you need to answer the questions; I’m not expecting
you to clean every column in every dataset.
• All visualizations should be completed using the ggplot2 package.
• All questions should be answered in a single line (or chain) of code. Do not save intermediate datasets.
• Do not include irrelevant or unnecessary code or output in your final document.
• Submit your Rmd file AND a knitted document to Blackboard.
• Your knitted document must show your code AND output.
• Submit the right documents on the first try.
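As a rough illustration of the “dplyr chain connected directly to the read.csv() line” requirement, the sketch below pipes read.csv() straight into cleaning verbs. The inline CSV text and the column names (borrower, amount) are made up for this example and are not the assignment data.

```r
library(dplyr)

# Hypothetical inline CSV standing in for the real file path or URL
csv_text <- "borrower,amount\n Acme LLC ,$1200\nBeta Inc,$300\n"

# One read.csv() call, with all cleaning chained directly onto it
loans <- read.csv(text = csv_text, stringsAsFactors = FALSE) %>%
  mutate(
    borrower = trimws(borrower),                  # drop stray whitespace
    amount = as.numeric(gsub("[$,]", "", amount)) # strip "$" and "," before converting
  )
```

In the assignment itself, the `text =` argument would be replaced by the data link for that Part.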
Question 1: (5pts) Please load any package needed in this script in the code chunk below.
Part 1 – Paycheck Protection Program
The Paycheck Protection Program (PPP) is a $953 billion business loan program established by the US
government through the Coronavirus Aid, Relief, and Economic Security Act (CARES Act) to help certain
businesses, self-employed workers, sole proprietors, certain nonprofit organizations, and tribal businesses
continue paying their workers.
Loan data is available through the Small Business Administration website. I took the data for all loans over
$150K, filtered for Florida, cleaned it a bit to match topics covered in our course, and am providing you
with it through the link below:
Read in and clean data (10pts)
Question 2: (5pts) Return the total amount of money loaned by month from 2020 to 2021.
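A minimal sketch of a by-month total on toy data, assuming a date column and an amount column. The names date_approved and amount are hypothetical, and the real data may need its dates parsed first.

```r
library(dplyr)

# Toy data with hypothetical column names
toy_loans <- data.frame(
  date_approved = as.Date(c("2020-04-03", "2020-04-28", "2021-01-15")),
  amount = c(100, 250, 400)
)

monthly_totals <- toy_loans %>%
  mutate(month = format(date_approved, "%Y-%m")) %>% # e.g. "2020-04"
  group_by(month) %>%
  summarise(total_amount = sum(amount), .groups = "drop")
```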
Question 3: (5pts) Return the name and total money loaned for Florida’s top PPP lender in 2021 (“top”
= most money loaned).
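One possible way to pick out a “top” group is to filter, aggregate, and then keep the row with the largest total. The sketch below uses toy data, and the column names (lender, date_approved, amount) are assumptions for illustration.

```r
library(dplyr)

# Toy data with hypothetical column names
toy_loans <- data.frame(
  lender = c("Bank A", "Bank B", "Bank A", "Bank B"),
  date_approved = as.Date(c("2020-05-01", "2021-02-10", "2021-03-05", "2021-04-20")),
  amount = c(900, 200, 150, 100)
)

top_lender_2021 <- toy_loans %>%
  filter(format(date_approved, "%Y") == "2021") %>% # keep 2021 loans only
  group_by(lender) %>%
  summarise(total_amount = sum(amount), .groups = "drop") %>%
  slice_max(total_amount, n = 1)                    # lender with the largest total
```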
Question 4: (5pts) What percent of loan money went to borrowers in the city of Miami?
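A share of a total can be computed inside a single summarise() call. This is only a sketch on toy data: the column names (city, amount) are hypothetical, and city names in the real data may need cleaning (case, whitespace) before they match.

```r
library(dplyr)

# Toy data with hypothetical column names
toy_loans <- data.frame(
  city = c("MIAMI", "TAMPA", "MIAMI"),
  amount = c(100, 300, 100)
)

miami_share <- toy_loans %>%
  summarise(pct_miami = 100 * sum(amount[city == "MIAMI"]) / sum(amount))
```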
Question 5: (7pts) Please display the average jobs by business age with a bar plot, with business age
grouped into the following 4 categories: Startup, 0-2 Years, 3+ Years, and Other/Unknown.
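Recoding into a small set of categories is often done with case_when() before aggregating and plotting. In the sketch below, the raw business_age labels and the jobs column are invented for illustration; the actual labels in the data will differ and should be checked first.

```r
library(dplyr)
library(ggplot2)

# Toy data; the raw labels below are hypothetical
toy_loans <- data.frame(
  business_age = c("Startup", "New, 2 years or less",
                   "Existing, more than 2 years", "Unanswered",
                   "Existing, more than 2 years"),
  jobs = c(5, 10, 40, 8, 60)
)

p <- toy_loans %>%
  mutate(age_group = case_when(
    business_age == "Startup" ~ "Startup",
    business_age == "New, 2 years or less" ~ "0-2 Years",
    business_age == "Existing, more than 2 years" ~ "3+ Years",
    TRUE ~ "Other/Unknown" # everything else falls into the catch-all group
  )) %>%
  group_by(age_group) %>%
  summarise(avg_jobs = mean(jobs), .groups = "drop") %>%
  ggplot(aes(x = age_group, y = avg_jobs)) +
  geom_col() +
  labs(x = "Business age", y = "Average jobs")
```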
Question 6: (7pts) Please visualize the total amount borrowed in each zipcode in the city of Coral Gables
with a horizontal bar plot. All zipcodes should be 5 digits long. Sort the bars in descending order, and format
the amount loaned as currency in units of a million (e.g. $40M for $40,000,000).
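The sketch below shows one way to combine the three formatting steps: trimming zipcodes to 5 digits, ordering bars by total, and labelling the axis in millions of dollars. It assumes, purely for illustration, that some zipcodes arrive in ZIP+4 form; the toy data and the column names (zip, amount) are hypothetical. If the real issue is short zipcodes, padding would be needed instead of trimming.

```r
library(dplyr)
library(ggplot2)

# Axis label formatter: 40000000 becomes "$40M"
fmt_millions <- scales::label_dollar(accuracy = 1, scale = 1e-6, suffix = "M")

# Toy data with hypothetical column names
toy_loans <- data.frame(
  zip = c("33134", "33134-5012", "33146"),
  amount = c(2e6, 1e6, 5e6)
)

p <- toy_loans %>%
  mutate(zip = substr(zip, 1, 5)) %>%  # keep the first 5 characters only
  group_by(zip) %>%
  summarise(total_amount = sum(amount), .groups = "drop") %>%
  ggplot(aes(x = total_amount, y = reorder(zip, total_amount))) + # largest bar on top
  geom_col() +
  scale_x_continuous(labels = fmt_millions) +
  labs(x = "Total amount borrowed", y = "Zipcode")
```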
Part 2 – AIAAIC
AIAAIC (AI, Algorithmic, and Automation Incidents and Controversies) is an independent, non-partisan,
public interest initiative that examines and makes the case for real AI, algorithmic, and automation transparency and openness.
AIAAIC is looking to make AI, algorithms, and automation more transparent by:
• Empowering civil society entities including researchers, academics, teachers, NGOs, journalists, and
• Educating end users, citizens, students, and others
• Making the case to policymakers, regulators, and businesses
The data provided details incidents and controversies driven by and relating to artificial intelligence, algorithms,
and automation, and can be read in from:
Read in and clean data (10pts)
Question 7: (7pts) Visualize the number of safety risk incidents by year in the USA with a horizontal bar
chart. Your final plot should look similar to the template below (you’ll have different axis limits):
[Template plot: horizontal bar chart titled “Safety risk incidents are on the rise in the USA”, with x-axis ticks from 0 to 1000]
Question 8: (7pts) Visualize the number of safety risk incidents by year separately for China, USA, and
the UK. Your final plot should look similar to the template below (again, you’ll have different axis limits).
NOTE: This question is potentially difficult. Skip to Part 3 (much easier) and return later if needed.
[Template plot: titled “Safety risk incidents by year”, with one panel each for China, UK, and USA; x-axis “Number of Incidents”, ticks from 0 to 1000 in each panel]
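Separate panels per group are commonly produced with facet_wrap(). The following is a sketch on toy data only: the column names (country, year, risk) and the value "Safety" are assumptions and may not match how the AIAAIC data encodes them.

```r
library(dplyr)
library(ggplot2)

# Toy data with hypothetical column names and values
toy_incidents <- data.frame(
  country = c("USA", "USA", "UK", "China", "China", "USA", "France"),
  year = c(2021, 2022, 2022, 2021, 2021, 2022, 2022),
  risk = c("Safety", "Safety", "Safety", "Safety", "Safety", "Privacy", "Safety")
)

p <- toy_incidents %>%
  filter(risk == "Safety", country %in% c("China", "UK", "USA")) %>%
  count(country, year) %>%                  # incidents per country and year
  ggplot(aes(x = n, y = factor(year))) +
  geom_col() +
  facet_wrap(~ country) +                   # one panel per country
  labs(x = "Number of Incidents", y = NULL,
       title = "Safety risk incidents by year")
```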
Part 3 – Atlanta Purchasing
The data for this section comes from Atlanta’s open data portal, and contains information related to purchasing by Atlanta city government from 2016 to 2018.
The data is stored at the address below.
Read in and clean data (5pts)
Please recreate any two of the following three plots using ggplot. If you need to clean or aggregate data,
you should Your colors do not have to match mine but please do change the defaults.
Question 9 (7pts)
Question 10 (7pts)
[Plot: “Annual spending by expense category”; years 2016, 2017, and 2018; axis “Total Spending (in millions $)”]
[Plot: “Spending by department in 2018”; one entry per city department; axis “Total 2018 Spending (millions $)”, ticks from $0 to $60]
[Plot: panels for GENERAL FUNDS, SPECIAL REVENUE FUNDS, CAPITAL PROJECTS FUNDS, and ENTERPRISE FUNDS; axis label truncated in the source: “Spending (in million”]