The Method of Least Squares

STAT 20: Introduction to Probability and Statistics

Agenda

  • Announcements
  • Reading Questions
  • Break
  • Worksheet: The Method of Least Squares
  • Break
  • Lab 4: Baseball - Part 1
  • Appendix: more practice!

Announcements

  • Quiz 2 grades out. Please submit your regrade requests by Friday.
  • Portfolio 7 due on Friday (Mon., Tue., Wed., Thu. worksheets).
  • Lab 4: Baseball released today and due next Tuesday. See Ed for any further updates.
  • Do not worry about the video in last night's notes. It is an interesting watch, but the summary of the video in the notes is sufficient for this semester.

Reading Questions

  • Please put your laptops under your desk and your phones away.
  • Write your name, ID, and bubble in Version “A” on your answer sheet.
  • You may work only with those at your table!

Which quantity is minimized in order to find the least squares estimates of \(b_0\) and \(b_1\) in the equation \(\hat{y} = b_0 + b_1x\)?

  • A: \(\sum_{i = 1}^n \left(y_i - \hat{y}_i \right)^2\)
  • B: \(\sum_{i = 1}^n \left(x_i - \hat{x}_i \right)^2\)
  • C: \(\sum_{i = 1}^n \left(y_i - \hat{x}_i \right)^2\)
  • D: \(\sum_{i = 1}^n \left|y_i - \hat{y}_i \right|\)
  • E: \(\sum_{i = 1}^n y_i^2 - \hat{y}_i^2\)
00:40

Read this first.

An analyst for the EPA is working to devise a method to determine whether or not a set of water samples is contaminated with PCB (a carcinogenic compound). They measure the levels of four other compounds in each sample; these compounds are easier to measure, and their presence is highly correlated with PCB. They then separate their water samples into two groups (samples suspected to be contaminated with PCB and samples that are clean) based on the presence/absence of the compounds in each sample.

What type of claim will this analyst be making?

  • A: Prediction (Regression)
  • B: Summary
  • C: Prediction (Classification)
  • D: Causal
  • E: Generalization
00:30

Imagine you have a data frame df containing two numerical variables, \(x\) and \(y\). Select the techniques that are most appropriate for estimating the slope and intercept of the line \(\hat{y} = b_0 + b_1x\) fit to the data in df.

  • A: A bootstrapped confidence interval for the slope/intercept obtained by least squares regression.
  • B: Least squares regression - numerical optimization
  • C: Least squares regression - pen and paper (calculus)
  • D: Hypothesis test
00:45
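
To make the options above concrete, here is a minimal sketch of how least squares estimates could be computed in R, both with the calculus formulas and by numerical optimization. The data frame values and the helper name rss are made up for illustration and are not from the course materials; lm() is included only as a check.

```r
# Toy data (hypothetical values, made up for this sketch).
df <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1)
)

# Residual sum of squares for a candidate line y-hat = b0 + b1 * x.
# b is a vector of length 2: b[1] is the intercept, b[2] is the slope.
rss <- function(b, data) {
  y_hat <- b[1] + b[2] * data$x
  sum((data$y - y_hat)^2)
}

# Pen and paper (calculus): closed-form least squares estimates.
b1_calc <- cov(df$x, df$y) / var(df$x)
b0_calc <- mean(df$y) - b1_calc * mean(df$x)

# Numerical optimization: search for the (b0, b1) that minimizes rss(),
# starting from an arbitrary guess of (0, 0).
fit_optim <- optim(par = c(0, 0), fn = rss, data = df)

# Built-in least squares fit, for comparison.
fit_lm <- lm(y ~ x, data = df)

c(b0 = b0_calc, b1 = b1_calc)
fit_optim$par
coef(fit_lm)
```

The three sets of estimates should agree, with the optim() result matching only approximately because it stops once its search has converged to within a tolerance.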

What might be wrong with the following R function g, which is trying to emulate the function \(g(x) = 2x + 8\)? Select all that apply.

g <- function{x}(
     2x + 8
)
  • A: function should be replaced with g.
  • B: 2x must be written as 2*x in order for the multiplication to work.
  • C: The position of the parentheses and the curly braces should be switched.
00:50
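
For comparison, here is a small sketch of the general shape of an R function definition. The function h, which computes \(h(x) = 5x - 1\), is a hypothetical example and not part of the question above.

```r
# Hypothetical example function (not from the slides): h(x) = 5x - 1.
# Arguments go in parentheses after the keyword `function`;
# the body goes in curly braces.
h <- function(x) {
  5 * x - 1  # multiplication is written explicitly with *
}

h(2)
```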

Break

05:00

Worksheet: The Method of Least Squares

30:00

Break

05:00

Lab 4: Baseball

30:00

Appendix

Concept Questions

Concept Question 1

An engineer working for Waymo self-driving cars is working to solve a problem. When it rains, reflections of other cars in puddles can disorient the self-driving car. Their team is working on a model to determine when the self-driving car is seeing a reflection of a car vs a real car.

Think of a potential response and predictor, and about whether this is a regression or classification problem.

01:00

Concept Question 2

An analyst working for the UC Berkeley Admissions Office is working to help the university decide how many students to send offer letters to. They have a target class size (that fits within the budget and residence halls), but they’re not sure how many students will accept the offer. How many should they admit?

Think of a potential response and predictor, and about whether this is a regression or classification problem.

01:00

Concept Question 3

  • Here is a function f.
f <- function(x, y) {
  y*(x + 3) 
}

What will the following line of code return?

f(3,5)
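
A short sketch of how R matches values to the arguments of f may help when working through this question. It repeats the definition of f so that it runs on its own; the extra calls are added for illustration.

```r
f <- function(x, y) {
  y * (x + 3)
}

# Positional matching: the first value goes to x, the second to y.
f(3, 5)

# Named matching: the same call, written with the arguments in a different order.
f(y = 5, x = 3)

# Swapping the positional values changes which argument receives which value.
f(5, 3)
```

Running the three calls and comparing the results shows that the order of unnamed values matters, while the order of named values does not.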