## Regression

Problem Set 1

PSC 2101, Fall 2023. Due Thursday, Sept. 28, 2023.

### Instructions

Answer the questions in the following document. Make sure to include sufficient detail, and format all graphs so that no raw variable names are visible.

You should submit a PDF containing the questions, your finished graphs, and your interpretations. You can copy/paste your graphs from RStudio into a separate file in which you write your interpretations. If you have experience working with R Markdown or Quarto (or want to challenge yourself), you can use this .Rmd file as a template and create each graph in a code chunk. Working in R Markdown is not required for this assignment.

### Number 1: Scatterplots

Open the h117 data (H117_members.csv). Consider the relationship between the partisan lean of a congressional district (proxied by how its population voted in the last presidential election) and the ideology score of its House member. Use the following variables:

• nominate_dim1: House member's first-dimension NOMINATE ideology score for the 117th Congress (2021-2023). More negative scores are more liberal; more positive scores are more conservative.
• biden20: Pres. Biden's vote share in 2020.

A. Which variable is the independent variable? Which is the dependent variable?

B. Generate a scatterplot using the two variables specified above. Interpret what you see in the graph.

C. Generate a new scatterplot that adds a "line of best fit" to your scatterplot from Part B. What additional information does your new scatterplot provide?

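A scatterplot with a line of best fit could be built roughly as follows. This is only a sketch: it assumes the ggplot2 package (the assignment does not name a plotting library), and it uses a small invented data frame in place of H117_members.csv so that it runs on its own. Only the column names come from the assignment, and placing vote share on the horizontal axis is one possible choice that Part A asks you to justify.

```r
library(ggplot2)

# Toy stand-in for read.csv("H117_members.csv"). The values are invented
# for illustration and are NOT taken from the real data.
toy <- data.frame(
  biden20       = c(30, 40, 50, 60, 70),
  nominate_dim1 = c(0.6, 0.4, 0.1, -0.3, -0.5)
)

# Scatterplot plus a least-squares line; labs() hides the raw variable names.
p <- ggplot(toy, aes(x = biden20, y = nominate_dim1)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    x = "Biden vote share, 2020",
    y = "First-dimension NOMINATE score",
    title = "District partisanship and House member ideology"
  )

# The same straight-line fit, inspected numerically (intercept and slope).
fit <- lm(nominate_dim1 ~ biden20, data = toy)
coef(fit)
```

Typing `p` at the console (or calling `print(p)`) draws the plot. With the real data, `toy` would be replaced by the data frame read from the CSV file.
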
### Number 2: Linegraphs

Use the parties data (parties_dwnom.csv) to examine how congressional ideology scores have changed over time.

• nominate_dim1_median: Median first-dimension NOMINATE score for House members belonging to a given party during a given session.
• congress: Number of the congressional session (e.g., the 117th session ran from Jan. 2021 to Jan. 2023).
• party_name: Party name, Democrat or Republican.

A. Generate a linegraph of the median NOMINATE score over time (sessions of Congress, rather than years), with a separate line for each party. Interpret what you see.

B. Add a smoothed line to the graph you created for part A. What information does this line provide?

C. Generate a linegraph of the median NOMINATE score over time without grouping by party. What does
the graph show, and why is/isn’t it useful in this case?
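One way to draw a grouped linegraph with a smoothed line is sketched below. It assumes ggplot2 and uses invented values in place of parties_dwnom.csv; the session numbers and medians are placeholders, and only the column names come from the assignment.

```r
library(ggplot2)

# Toy stand-in for read.csv("parties_dwnom.csv"). Session numbers and
# medians are invented placeholders, NOT real NOMINATE values.
toy <- data.frame(
  congress   = rep(101:108, times = 2),
  party_name = rep(c("Democrat", "Republican"), each = 8),
  nominate_dim1_median = c(
    seq(-0.20, -0.34, length.out = 8),
    seq(0.20, 0.48, length.out = 8)
  )
)

# Mapping party_name to color gives one line (and one smoother) per party.
p <- ggplot(toy, aes(x = congress, y = nominate_dim1_median,
                     color = party_name)) +
  geom_line() +
  geom_smooth(method = "loess", se = FALSE, linetype = "dashed") +
  labs(
    x = "Congress",
    y = "Median first-dimension NOMINATE score",
    color = "Party"
  )
```

Removing `color = party_name` from `aes()` produces the ungrouped version that Part C asks about.
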
### Number 3: Histograms

Use the same variables from the h117 data to examine the distribution of NOMINATE scores among House members.

A. Generate a histogram of House members' NOMINATE scores. Try changing the number or size of the bins, and explain why you settled on the number you did. Interpret what you see in the graph.

B. Generate a histogram of Biden vote share across districts, and interpret the distribution.

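The two usual ways of controlling histogram bins can be compared with a sketch like the following. It assumes ggplot2 and uses simulated numbers rather than the real nominate_dim1 column, so the shape of the toy histogram says nothing about the actual distribution.

```r
library(ggplot2)

set.seed(1)
# Toy stand-in for the nominate_dim1 column: uniform random numbers,
# NOT real ideology scores.
toy <- data.frame(nominate_dim1 = runif(100, min = -1, max = 1))

# Option 1: fix the number of bins.
p_bins <- ggplot(toy, aes(x = nominate_dim1)) +
  geom_histogram(bins = 20) +
  labs(x = "First-dimension NOMINATE score", y = "Number of House members")

# Option 2: fix the width of each bin, in units of the x variable.
p_width <- ggplot(toy, aes(x = nominate_dim1)) +
  geom_histogram(binwidth = 0.1) +
  labs(x = "First-dimension NOMINATE score", y = "Number of House members")
```

The values 20 and 0.1 are arbitrary starting points. Printing both plots side by side makes it easier to see how the bin choice changes the apparent shape.
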
### Number 4: Boxplots

With the same h117 data, compare the distributions of Biden vote share in House districts represented by Democrats and Republicans.

• party_code: Party of House member. 100=Democrat, 200=Republican.

A. Generate a single boxplot of the distribution of Biden vote share across House districts. Interpret the graph, referencing the exact values of the summary statistics that are represented in the boxplot.

B. Now generate side-by-side boxplots for districts represented by Democrats and Republicans, and interpret what you see.

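The summary statistics behind a boxplot can be read off numerically, and the numeric party code needs to become a labeled factor before it can split the boxes. The sketch below shows both steps under stated assumptions: ggplot2 for the plot, and an invented six-row data frame in place of the h117 data.

```r
library(ggplot2)

# Toy stand-in for the h117 data. Values are invented, and the party codes
# simply alternate, so nothing here reflects real districts.
toy <- data.frame(
  biden20    = c(30, 40, 50, 60, 70, 80),
  party_code = rep(c(100, 200), times = 3)
)

# Minimum, first quartile, median, third quartile, maximum.
five_num <- quantile(toy$biden20)

# Values a base-R boxplot draws: whisker ends, hinges, and median.
box_vals <- boxplot.stats(toy$biden20)$stats

# Recode 100/200 into readable labels so no raw codes appear on the axis.
toy$party <- factor(toy$party_code, levels = c(100, 200),
                    labels = c("Democrat", "Republican"))

# Side-by-side boxplots, one per party.
p <- ggplot(toy, aes(x = party, y = biden20)) +
  geom_boxplot() +
  labs(x = "Party of House member", y = "Biden vote share, 2020")
```

Base R's hinges and the quartiles from `quantile()` can differ slightly on small samples. ggplot2's `geom_boxplot()` uses the quartiles, so `quantile()` is the closer match when reporting values from a ggplot2 boxplot.
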
### Number 5: Barplots

Using the h117 data again, compare the partisan balance of states' congressional delegations.

• state_abbrev: 2-letter abbreviation for state name.
• region: Census region of state (Northeast, Midwest, South, West).

A. Generate a side-by-side/grouped barplot of the count of House members from each region belonging to each party. Interpret what you see. What limitations does this graph have?

B. Generate a grid of side-by-side/grouped barplots of the number of House members from each state belonging to each party. Facet by region. Interpret what you see.
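Grouped and faceted barplots could be sketched as follows. The code assumes ggplot2 and a small invented data frame; the party assigned to each toy row is arbitrary and does not describe any real delegation.

```r
library(ggplot2)

# Toy stand-in for the h117 data. Party codes simply alternate and are
# NOT real; only the column names come from the assignment.
toy <- data.frame(
  state_abbrev = c("NY", "NY", "OH", "OH", "TX", "TX", "CA", "CA"),
  region       = c("Northeast", "Northeast", "Midwest", "Midwest",
                   "South", "South", "West", "West"),
  party_code   = rep(c(100, 200), times = 4)
)
toy$party <- factor(toy$party_code, levels = c(100, 200),
                    labels = c("Democrat", "Republican"))

# Counts by region, with party bars placed side by side.
p_region <- ggplot(toy, aes(x = region, fill = party)) +
  geom_bar(position = "dodge") +
  labs(x = "Census region", y = "Number of House members", fill = "Party")

# Counts by state, one panel per region; free_x keeps each panel
# limited to the states in that region.
p_state <- ggplot(toy, aes(x = state_abbrev, fill = party)) +
  geom_bar(position = "dodge") +
  facet_wrap(~ region, scales = "free_x") +
  labs(x = "State", y = "Number of House members", fill = "Party")
```

`geom_bar()` counts rows by default, so no pre-aggregation is needed when each row of the data is one House member.
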