2  {ggplot2} and categorical data

2.1 factor variables

Factors are how R represents categorical data.

There are two kinds of factors:

  • factor - used for nominal data (“Ducks”, “Cats”, “Dogs”)
  • ordered - used for ordinal data (“10-30”, “31-40”, “41-60”)
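
To make the distinction concrete, here is a minimal sketch in base R. The vectors animal and age are made up for illustration and are not part of the pets data.

```r
# Nominal categories: the levels have no inherent order
animal <- factor(c("Dogs", "Cats", "Ducks", "Cats"))
levels(animal)       # levels are sorted alphabetically by default
is.ordered(animal)   # FALSE

# Ordinal categories: state the order explicitly with `levels`
age <- factor(
  c("31-40", "10-30", "41-60", "10-30"),
  levels = c("10-30", "31-40", "41-60"),
  ordered = TRUE
)
is.ordered(age)      # TRUE
age < "41-60"        # comparisons like this only work for ordered factors
```

An ordered factor keeps its level order when plotted, so axis categories and legend entries appear in the order given to levels.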

We’ll manipulate our barplots and add more information using factors.

Here’s the dataset we’ll use to investigate how to work with factors in ggplot2.

2.1.1 Exercise

  • Use the glimpse() function (it is part of the dplyr package, which we load for you) on pets to see the levels for the different categories.
  • Which of the variables are categorical (indicated by <fct> or <ord>)?
Tip
##use glimpse here
glimpse(pets)

There are four categorical variables in this dataset: name, animal, shotsCurrent, and ageCategory.
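
As a cross-check on glimpse(), the factor columns can also be picked out programmatically. This is a minimal sketch that assumes a made-up stand-in data frame, pets_demo. Its values are hypothetical and are not the real pets data.

```r
# Hypothetical stand-in for the pets data (values are invented)
pets_demo <- data.frame(
  name   = factor(c("Morris", "Rex", "Morris")),
  animal = factor(c("cat", "dog", "dog")),
  weight = c(4.1, 20.5, 18.0)
)

# is.factor() is TRUE for both plain and ordered factors
is_categorical <- vapply(pets_demo, is.factor, logical(1))
names(pets_demo)[is_categorical]
```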

2.2 A Basic Barplot using geom_bar()

Now that we understand what categories exist in our dataset, we can begin to visualize them using barplots generated with the geom_bar() geom.

The geom_bar() default is to count the number of values with each factor level. Note that you don’t map to a y-aesthetic here, because the y values are the counts.
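
To see that the bar heights really are counts, the sketch below compares what geom_bar() computes with a plain table() call. The data frame pets_demo is a made-up stand-in (its values are hypothetical), and layer_data() is the ggplot2 helper that returns the data computed for a layer.

```r
library(ggplot2)

# Hypothetical stand-in for the pets data (values are invented)
pets_demo <- data.frame(
  animal = factor(c("cat", "dog", "dog", "cat", "dog"))
)

# Only x is mapped; geom_bar() supplies y by counting rows per level
p <- ggplot(pets_demo, aes(x = animal)) +
  geom_bar()

# The counts ggplot2 computed for the bars, in level order
bar_counts <- layer_data(p)$count

# The same counts from base R
table(pets_demo$animal)
```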

Given this dataset, we might want to ask how many pets have the same name.

Map the name variable to x in the ggplot statement. What is the most popular name?

2.2.1 Exercise

Solution
##show a barplot that counts pets by name
ggplot(pets, aes(x=name)) + geom_bar() + 
    ##theme() lets us angle the x axis text labels so that we can read them;
    ##hjust=1 right-aligns each label under its bar
    theme(axis.text.x = element_text(angle=45, hjust=1))

2.3 Stacked Bars

Let’s see how many of each animal got shots. We can do this by mapping shotsCurrent to fill.

Map shotsCurrent to the fill aesthetic.

2.3.1 Exercise

Tip
#map the right variable in pets to fill
ggplot(pets, aes(x=animal, fill=shotsCurrent)) + 
  geom_bar()

2.4 Quick Quiz

What does setting color to "black" in geom_bar() do? For example:

ggplot(pets, aes(x=animal, fill=shotsCurrent)) + 
  geom_bar(color="black")

If you’re unsure, compare the graph above to the previous graph.




2.5 Proportional Barchart

We may only be interested in the relative proportions between the different categories. Visualizing this is useful for various 2 x 2 tests on proportions.

By setting position = "fill" in geom_bar(), we can show proportions rather than counts: each bar is rescaled to a total height of 1.

Change the position argument in geom_bar() to "fill". What percent of dogs did not receive shots?

2.5.1 Exercise

Tip
ggplot(pets, aes(x=animal,fill=shotsCurrent)) + 
  geom_bar(position= "fill", color="black")
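
The segment heights in a position = "fill" chart are within-bar proportions, so they can be cross-checked with base R. This is a minimal sketch on a made-up stand-in data frame, pets_demo. The values and the "yes"/"no" levels are hypothetical.

```r
# Hypothetical stand-in for the pets data (values are invented)
pets_demo <- data.frame(
  animal       = factor(c("cat", "cat", "dog", "dog", "dog", "dog")),
  shotsCurrent = factor(c("yes", "no", "yes", "yes", "yes", "no"))
)

# Cross-tabulate, then divide each row (animal) by its row total
counts <- table(pets_demo$animal, pets_demo$shotsCurrent)
props  <- prop.table(counts, margin = 1)
props
```

Each row of props sums to 1, which matches the full-height bars in the proportional chart.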

2.6 Dodge those bars!

Instead of stacking, we can also dodge the bars (move the bars so they’re beside each other).

2.6.1 Exercise

Change the position argument in geom_bar() to "dodge".

Tip
ggplot(pets, aes(x=animal, fill=shotsCurrent)) + 
  geom_bar(position="dodge", color="black")

2.7 Faceting a graph

Say you have another factor variable and you want to stratify the plots based on that. You can do that by supplying the name of that variable as a facet. Here, we facet our barplot by shotsCurrent.

You might notice that there are blank spots for the categories in each facet, because by default every facet shares the same x axis. We can remove these by using the scales = "free_x" argument in facet_wrap().

2.7.1 Exercise

Set the scales argument to "free_x". How many animals named “Morris” did not receive shots?

Tip
ggplot(pets, aes(x=name)) + geom_bar() + 
  facet_wrap(facets=~shotsCurrent, scales="free_x") +
  theme(axis.text.x = element_text(angle=45))
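
Counts read off a faceted barplot can be checked against a cross-tabulation. This is a minimal sketch on a made-up stand-in data frame, pets_demo. The names other than “Morris”, the values, and the "yes"/"no" levels are all hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in for the pets data (values are invented)
pets_demo <- data.frame(
  name         = factor(c("Morris", "Rex", "Morris", "Bella", "Rex")),
  shotsCurrent = factor(c("yes", "yes", "no", "no", "yes"))
)

# One panel per level of shotsCurrent; each panel keeps only its own names
p <- ggplot(pets_demo, aes(x = name)) +
  geom_bar() +
  facet_wrap(~shotsCurrent, scales = "free_x")

# Number of panels that ggplot2 actually drew bars in
n_panels <- length(unique(layer_data(p)$PANEL))

# The same per-facet counts as a table: rows are names, columns are facets
name_by_shots <- table(pets_demo$name, pets_demo$shotsCurrent)
name_by_shots["Morris", "no"]
```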

2.8 Super Quick Review

Faceting a graph allows us to:




2.9 Your Task: Bar Charts

Now you’ll put everything you’ve learned together into a single barplot.

Given the pets data.frame:

  • Plot a stacked proportional barchart that shows the ageCategory counts by animal type.
  • Facet this plot by shotsCurrent.

Is the proportion of animals receiving shots the same across each age category?

Think about what to map to x, and what to map to fill, and what position argument you need for geom_bar(). Finally, think about how to facet the variable.

Tip
ggplot(pets, aes(x=ageCategory, fill=animal)) + 
  #what argument goes here?
  geom_bar(position = "fill") +
  facet_wrap(facets=~shotsCurrent, scales="free_x")

2.10 Boxplots

Boxplots allow us to assess distributions of a continuous variable (weight) conditioned on categorical variables (shotsCurrent).

What does this tell us? Is there a difference in weight between animals that received shots and those that did not?
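
A boxplot like the one discussed above could be drawn with a call along the following lines. This is a sketch on a made-up stand-in data frame, pets_demo, so its output says nothing about the real pets data. The values and the "yes"/"no" levels are hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in for the pets data (values are invented)
pets_demo <- data.frame(
  shotsCurrent = factor(c("yes", "yes", "yes", "no", "no", "no")),
  weight       = c(4, 6, 8, 10, 12, 20)
)

# Categorical variable on x, continuous variable on y
p <- ggplot(pets_demo, aes(x = shotsCurrent, y = weight)) +
  geom_boxplot()

# The thick line inside each box is the group median
medians <- tapply(pets_demo$weight, pets_demo$shotsCurrent, median)
medians
```

Because the box marks the median and quartiles rather than the mean, a boxplot comparison is about typical values and spread, not averages.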




2.11 Exercise: Try out geom_boxplot() yourself

Plot a boxplot of weight conditioned on animal. Is there a difference in weight between animal types?

Think about what variables map to what aesthetics.

Tip
ggplot(pets, aes(x= animal, y= weight)) + geom_boxplot()

2.12 Your final task: How heavy are our pets?

  • Visualize weight by ageCategory using geom_boxplot()
  • What do you conclude? Which age category tends to weigh more, judging by the medians?

2.12.1 Exercise

Tip
ggplot(pets, aes(x= ageCategory, y= weight)) + geom_boxplot()

2.13 What you learned in this chapter

  • How to visualize categorical data
  • Two more types of plots: geom_bar() and geom_boxplot()
  • Aesthetics that can be mapped to these geoms (fill, x, y)
  • Options for geom_bar(): position = "fill" (proportional bars) and position = "dodge" (dodged bars)
  • How to stratify your graphs using facet_wrap()
  • More about how to put together a ggplot

2.14 More Resources: