Lab 10: Arbuthnot

The Data

John Arbuthnot was a polymath who lived in London, England, during the early 18th century. This was a period of rapid growth and modernization for the city, but there was still no Google! The ways of reasoning from data that we take for granted today were nearly absent from life in 18th-century England. Most people reasoned from direct experience, anecdote, or appeals to tradition. Arbuthnot took the big step of realizing that one can learn a lot by bringing together information in a systematic way.

Arbuthnot became interested in the ratio of boys to girls born in the city; specifically, he wanted to know the proportion of births that were girls. How might he find information on sex at birth? It turns out that one institution systematically collected these data: the church! At that time, most children were taken to the nearby parish church and “christened” (given a name in the church) shortly after they were born. The parish churches recorded the name and date of each christening. Using these records, Arbuthnot tabulated, for each year, the number of christenings of children with traditionally male names and with traditionally female names, and he published his findings in 1710. The data from the paper are available as the arbuthnot data frame in the stat20data package.

This week’s lab is meant to be a review of previous topics. It will also show you how far you’ve come!

Questions: Understanding context

On Ed, you can find an image of a page from a parish church’s christening records! Use this snapshot to help you answer the first three questions.

Question 1

What is the unit of observation in the original christening records?

Question 2

List the variables that appear to have been recorded.

Question 3

What do you think the probability is that a newborn child is recorded as a girl? Explain your reasoning in one or two sentences.

Questions: Computing on the data

As mentioned earlier, Arbuthnot’s data frame (arbuthnot), located in the stat20data package, contains aggregated results from christening records across many years.
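If you are not sure where to start, a minimal sketch like the following may help. It assumes the stat20data and dplyr packages are already installed. It loads them and uses dplyr's glimpse() to preview the data frame's size, column names, and first few values before you begin the questions.

```r
# Load the packages for this lab (assumes stat20data and dplyr are installed)
library(stat20data)
library(dplyr)

# Preview the arbuthnot data frame: number of rows and columns,
# each column's name and type, and its first few values
glimpse(arbuthnot)
```

Replacing arbuthnot with present in the same call previews the present-day data.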

Question 4

What does each row in the arbuthnot data frame correspond to?

Question 5

What is the time frame covered by Arbuthnot’s data? You may answer this question with or without code.

Question 6

Write dplyr code to find the year with the greatest number of children christened.

Question 7

Write dplyr code to find the proportion of girls christened in 1700.

Question 8

What is the trend over time in the proportion of girls christened?

part a

Use ggplot to visualize this trend. Label your axes and give the plot a title.

part b

Interpret your visualization in at least two sentences.


A second data frame, present, also located in the stat20data package, contains data similar to the arbuthnot data frame, but for the modern-day United States.

Question 9

At a glance, how does the size of the counts in Arbuthnot’s data compare to the size of the counts in the present-day data? Answer in one to two sentences.

Question 10

Make a plot similar to the one in Question 8 using the present-day data. Label your axes and give the plot a title.

Question 11

Based on the results from the two data frames, how would you answer Arbuthnot’s original question? Answer in at least three sentences. If you have any reservations about using these data to support a claim, you may raise them.