## Exploratory Data Analysis 2

STAT100 R Assignment 3: Exploratory Data Analysis 2
You need to submit a Word document or PDF for this assignment. Make sure you do the following:
1. Upload only one file/document in ELMS for R Assignment 3.
2. Include your name in the document in the upper left-hand corner. Under your name, write
STAT 100 and your section number. Write R Assignment 3 centered on the page.
5. Upload the assignment in ELMS before the deadline of Sunday 10/01 at 11:59 PM otherwise it
is considered late. Make sure you save your document on your computer or email it to
yourself so that you keep an electronic copy.
STAT100 R Assignments need to be completed by students in RStudio. Students should refer to the
Tutorial for R Assignment 3 as they are working on this R Assignment. All Tutorials for R Assignments
can be found under the file folder “R Assignments” of the STAT100 ELMS course.
For this assignment you need to use the Course_Data_Set.RData file, which includes data collected
from students enrolled in introductory statistics courses in various colleges and universities in MD. The
population of interest is all students enrolled in introductory statistics courses in the US.
R Assignment 3 has two questions worth 25 points. Read each question carefully and follow all
• For #1.a., 1.b., 2.a. and 2.d., you need to provide the R code that you use to generate the
output or data display and provide the output as displayed in the RStudio Console or the data
display as shown in the Zoom window in RStudio.
• For all other questions, you should type your responses directly in the document you submit
for R Assignment 3. You should NOT provide R code or RStudio output for all other questions.
1. (10 points) Open the Course_Data_Set.RData file in RStudio. You need to generate descriptive
statistics for variables and answer questions related to the distribution of two variables.
a. For the Height_inches variable, use the summary command to generate descriptive
statistics. The Height_inches variable measures the height (in inches) of the students in
the Course Data Set. In the document you upload for this assignment, include:
i. the R code you used to generate the descriptive statistics, including any
comment line of the R code for #1.a.)
ii. an image of the descriptive statistics output in RStudio
b. For the HS_GPA variable, use the summary command to generate descriptive statistics.
The HS_GPA variable measures the high school grade point average (GPA) of the
students in the Course Data Set. In the document you upload for this assignment,
include:
i. the R code you used to generate the descriptive statistics, including any
comment line of the R code for #1.b.)
ii. an image of the descriptive statistics output in RStudio
c. Using the 1.5*(IQR) criterion for outliers, for the distribution of students enrolled in
introductory statistics courses, would a height of 80 inches be considered an outlier?
Show all work.
d. Using the 1.5*(IQR) criterion for outliers, for the distribution of students enrolled in
introductory statistics courses, would a high school GPA of 1.85 be considered an
outlier? Show all work.
2. (15 points) Open the Course_Data_Set.RData file in RStudio. You need to create histograms for
variables and answer questions related to the distribution of the variables.
a. For the Height_inches variable, create two histograms (one with the default number of
bins/intervals, and one with 6 bins/intervals). Your histogram should have proper axis
labels. In the document you upload for this assignment, include:
i. the R code you used to generate the histograms, including any comments
ii. separate images of the histograms from the Plots/Zoom section of RStudio.
IMPORTANT INSTRUCTIONS: the titles of the histograms MUST include your
name. The title of the histogram with default bins MUST read “Distribution of
Student Height with default bins, created by FirstName LastName”. For
example, if Dr. Khurana created the histogram, the title would read
“Distribution of Student Height with default bins, created by Dr. Khurana”. The
title of the histogram with 6 bins MUST read “Distribution of Student Height
with 6 bins, created by FirstName LastName”.
b. Based on the histograms, describe the distribution of student height (shape, center,
spread, outliers). Does your interpretation of the distribution of student height differ
depending on which histogram you examine (number of bins/intervals)?
c. Using only one of the histograms for student height, within which interval does the
median height fall? In your response to this question, do NOT refer to the descriptive
statistics generated for #1.a. Provide the endpoints of the interval, indicate which
histogram you used (number of bins/intervals), and explain how you arrived at your
d. For the HS_GPA variable, create two histograms (one with the default number of
bins/intervals, and one with 8 bins/intervals). In the document you upload for this
assignment, include:
i. the R code you used to generate the histograms, including any comments
ii. separate images of the histograms from the Plots/Zoom section of RStudio.
IMPORTANT INSTRUCTIONS: the titles of the histograms MUST include your
name. The title of the histogram with default bins MUST read “Distribution of
High School GPA with default bins, created by FirstName LastName”. For
example, if Dr. Khurana created the histogram, the title would read
“Distribution of High School GPA with default bins, created by Dr. Khurana”. The
title of the histogram with 8 bins MUST read “Distribution of High School GPA
with 8 bins, created by FirstName LastName”.
e. Based on the histograms, describe the distribution of high school GPA (shape, center,
spread, outliers). Does your interpretation of the distribution of high school GPA differ
depending on which histogram you examine (number of bins/intervals)?
f. Using only one of the histograms for high school GPA, within which interval does the
median high school GPA fall? In your response to this question, do NOT refer to the
descriptive statistics generated for #1.b. Provide the endpoints of the interval, indicate
which histogram you used (number of bins/intervals), and explain how you arrived at