Week 6 Starter File

Author

Biagio Palese

Intermediate Visualizations

Coding is fun!!!

The following sections of the book (R for Data Science) used for the first portion of the course are included in this week:

Going beyond the basics

In the basic data visualization class, we built a solid foundation, learning how to create compelling charts with ggplot2. We explored the essential template of a plot and how to control aesthetics such as axis mapping, color, fill, size, alpha, and shape. We emphasized the importance of understanding your data, focusing on each column's data type and on the chart's objective to make informed decisions when choosing the right chart. By working with distribution, ranking, correlation, and evolution charts, you gained hands-on experience with some of the most commonly used geoms, and you've gotten a taste of how powerful visualizations can be in uncovering insights from your data.

Now, as we transition into the beyond-the-basics data visualization class, we will build on this foundation and take your skills to the next level. We'll cover more advanced topics like static mapping to fix aesthetics to specific values, faceting to create multiple subplots for better comparison, and using multiple geoms in a single plot to enrich your visualizations. The tools covered in this class will open up new ways to explore, present, and gain deeper insights from your data. Get ready to elevate your skills and bring your data to life in ways that will captivate your audience!

Beyond the basics: Chart 1, inspired by the R4DS book

Beyond the basics: Chart 2, inspired by the R4DS book

Load packages

This is a critical task:

  • Every time you open a new R session you will need to load the packages.

  • Failing to do so will result in the most common errors among beginners (e.g., “could not find function ‘x’” or “object ‘y’ not found”).

  • So please always remember to load your packages by running the library() function for each package you will use in that specific session 🤝
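A minimal sketch of this step. It assumes ggplot2 is the package you need for this session. library(tidyverse) would also attach ggplot2, along with dplyr and the other tidyverse packages. The mpg dataset loaded here ships with ggplot2 and has the hwy, drv, and engine-size (displ) columns these notes refer to.

```r
# Load the packages used in this session (run once per new R session).
# Assumption: ggplot2 is all you need; library(tidyverse) also attaches it.
library(ggplot2)

# mpg is a demo dataset bundled with ggplot2: one row per car model
head(mpg)
```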

Ggplot chart template

Important

ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>()
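Here is a minimal sketch of the template with each placeholder filled in. It assumes the mpg dataset bundled with ggplot2, and the object name p is only for illustration.

```r
library(ggplot2)

# <DATA> = mpg
# <MAPPINGS> = engine displacement (displ) on x, highway mileage (hwy) on y
# <GEOM_FUNCTION> = geom_point, i.e. a scatterplot
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Print the object to draw the chart
p
```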

Let’s learn how to complete and extend this template beyond the basics.

Static mapping

So far, we’ve seen that when you map an aesthetic to a variable, the ggplot2 package automatically handles the rest. It dynamically selects an appropriate scale for the aesthetic and even generates a legend to explain the relationship between the variable and its visual representation. For aesthetics like x and y, instead of a legend, ggplot2 creates axis lines with tick marks and labels, which serve as guides, showing how data points correspond to values.
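The sketch below shows the automatic behavior just described, again assuming the mpg data. Mapping class to color inside aes() is enough for ggplot2 to pick a color scale and draw a legend, while displ and hwy get axes with tick marks instead. The object name p_dynamic is hypothetical.

```r
library(ggplot2)

# class is mapped inside aes(): ggplot2 chooses a discrete color scale
# and adds a legend explaining which color stands for which class
p_dynamic <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()
```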

But what if you want more control? What if you want to manually adjust the appearance of your chart to better fit your needs and preferences? Can you do that?

The answer is yes! Let’s explore how.

Important

To set an aesthetic (color, size, shape) manually/statically, set it by name as an argument of your geom function; i.e., it goes outside of aes() and is not mapped to a variable!

Moreover, you need to pick a value that makes sense for that aesthetic:

  • The name of a color as a character string (“blue”).

  • The size of a point in mm (2).

  • The shape of a point as a number (18), see figure below.

List of point shapes and their numbers (source: R4DS book)
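Here is a sketch of the rule above, assuming the mpg data. It sets all three values from the list as arguments of geom_point(). The second plot illustrates a common mistake: putting the constant inside aes(). The object names are hypothetical.

```r
library(ggplot2)

# Static aesthetics are arguments of geom_point(), outside aes():
# every point is blue, size 2 (mm), shape 18 (a filled diamond)
p_static <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "blue", size = 2, shape = 18)

# Pitfall: inside aes(), "blue" is treated as a one-level variable,
# so the points get a default palette color and a legend entry "blue"
p_pitfall <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = "blue"))
```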

Let’s create a few more charts to practice static mapping:

Activity 1 (a & b in class, c & d at home): Charts with static mapping - 5 minutes:

[Write code just below each instruction; finally, use the MS Teams R - Forum channel for help with the in-class activities/homework or if you have other questions]

Knowledge Check 1

Question: What static mapping was used in the chart above?

- answer 1: color, size, shape
- answer 2: color, shape
- answer 3: color, size
- answer 4: fill, size, shape

Faceting

One powerful way to incorporate additional variables into a chart is to map them dynamically to aesthetics inside the aes() function. Especially when working with categorical variables, another effective method is faceting, which splits your plot into multiple subplots, each displaying a subset of the data. This is almost like visually ‘grouping’ your data, similar to how we used group_by() in data manipulation. However, instead of summarizing values, faceting shows the observations within each group in separate charts, making patterns or differences easier to spot.

To facet your chart by a single variable, you can use facet_wrap(), where the variable passed should be discrete. If you want to facet by the combination of two variables, you can apply facet_grid(), allowing you to create a matrix of plots that can reveal deeper insights into your data’s structure.
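Here is a minimal sketch of both functions. It uses the mpg data, but any data frame with discrete columns would work. facet_wrap() takes a one-sided formula (~ variable), and facet_grid() takes rows ~ columns.

```r
library(ggplot2)

# facet_wrap(): one panel per level of a single discrete variable,
# wrapped into 2 rows
p_wrap <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class, nrow = 2)

# facet_grid(): a matrix of panels defined as rows ~ columns,
# here drive train (drv) in rows and number of cylinders (cyl) in columns
p_grid <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl)
```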

Activity 2 (a & b in class, c & d at home): Charts with faceting - 7 minutes:

[Write code just below each instruction; finally, use the MS Teams R - Forum channel for help with the in-class activities/homework or if you have other questions]

More on geometric objects

A geom is the geometric shape a plot uses to represent data. We’ve already discussed how choosing the right geom is essential and how much the different geoms impact the final outcome of your visual. But beyond selecting the right geom, it’s important to understand how aesthetics like color, size, and shape can vary depending on the geom you choose.

Each geom interprets these aesthetics differently, so by changing the geom, you're not just changing the chart type; you're also affecting how the visual elements are displayed, giving your plot more depth and meaning. From now on, when you change the geom, remember that you will probably also need to change the aesthetics you use.

Caution

Every geom function in ggplot2 takes a mapping argument. However, not every aesthetic works with every geom. For example, you can set the shape of a point, but you can't set the “shape” of a line. Conversely, you can set the linetype of a line but not of a point. geom_smooth() will draw a different line, with a different linetype, for each unique value of the variable that you map to linetype.

In the examples above, geom_smooth() separates the cars into three lines based on their drv value, which describes a car’s drive train. This way you can see how the drive train impacts the relationship between hwy and engine size. Remember, 4 stands for four-wheel drive, f for front-wheel drive, and r for rear-wheel drive.

Notice the warning: the shape of the lines doesn’t change, yet ggplot2 still draws three separate lines. The problem is that you can’t tell which one is which.
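The examples mentioned above are not reproduced in these notes, so this is a reconstruction. It assumes they use mpg with displ on x and hwy on y, and it compares mapping drv to linetype with mapping drv to shape inside geom_smooth(). The object names are hypothetical.

```r
library(ggplot2)

# linetype is an aesthetic geom_smooth() understands: one line per drv
# value, each with its own linetype, plus a legend to tell them apart
p_linetype <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(aes(linetype = drv))

# shape is not an aesthetic of geom_smooth(): expect a warning here.
# drv is still discrete, so the data are split into three lines,
# but nothing visual identifies which line is which
p_shape <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(aes(shape = drv))
```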

Multiple geoms on the same chart

One of the powerful features of ggplot2 is the ability to layer multiple geoms on the same chart. This allows you to combine different visual elements and create more insightful visualizations. For example, you might use points to represent individual data values while overlaying a smooth line to show a trend. By layering geoms, you’re able to reveal different aspects of your data in one cohesive view, enriching the story your visualization tells. The flexibility of multiple geoms opens up new ways to highlight patterns, relationships, and trends that might not be as clear with a single geom.

Because we place the mapping inside the ggplot() function, ggplot2 treats it as a global mapping that applies to every geom in the chart. So adding a second geom is as simple as adding a new layer. Let’s see how in the examples below:
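Here is a sketch of layering with a global mapping, assuming the mpg data. Both layers inherit displ and hwy from ggplot(), so neither geom needs its own aes().

```r
library(ggplot2)

# Global mapping in ggplot(): shared by every layer below
p_layers <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +   # layer 1: individual cars
  geom_smooth()    # layer 2: overall trend, same x and y
```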

Important

The order in which you layer geoms significantly affects the final visualization, much like stacking one chart on top of another. The layering sequence matters because elements added later can obscure or enhance the ones added before. Think of it as building up the plot step by step.
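To see the effect of order, you could compare the same two layers stacked both ways. This is a sketch on the mpg data, and the object names are illustrative.

```r
library(ggplot2)

# Points drawn first, then the trend line (and its ribbon) on top of them
p_line_on_top <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()

# Trend line drawn first, then the points on top of it
p_points_on_top <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth() +
  geom_point()
```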

Just as the |> operator in data manipulation allows you to chain operations, the + in ggplot2 lets you seamlessly stack layers in your plot. Understanding this parallel helps you see how both data manipulation and visualization follow a logical flow of transformation and refinement.

Activity 3 (a & b in class, c & d at home): Multiple geoms charts - 7 minutes:

[Write code just below each instruction; finally, use the MS Teams R - Forum channel for help with the in-class activities/homework or if you have other questions]

Ggplot’s real superpower: combining global with local mappings

Combining global and local mappings allows for flexible control over your visualizations. When you define mappings inside a geom function, they are considered local mappings—specific to that layer only. These local mappings can either add to or override the global mappings defined in the main ggplot() call, giving you the power to display different aesthetics across different layers of your chart. This technique is particularly useful when you want certain layers to stand out with unique colors, shapes, or sizes, without affecting the entire plot.
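Here is a sketch of a local mapping that adds to the global one, assuming the mpg data. The object name p_local is hypothetical.

```r
library(ggplot2)

# Global mapping: x and y apply to every layer.
# Local mapping: color = class applies only to geom_point(), so points are
# colored by class while geom_smooth() still draws one line for all cars
p_local <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth()
```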

Beyond aesthetics, this flexibility extends to data as well; you can assign different datasets to individual layers, enabling you to overlay distinct visual representations in a single chart. This dynamic interplay between global and local settings is what makes ggplot2 so versatile, allowing you to tell a more nuanced data story.
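Here is a sketch of a layer with its own data, assuming the mpg data. Highlighting the two-seater cars and styling them in red are illustrative choices. The example uses base R's subset(), and dplyr::filter() would do the same job.

```r
library(ggplot2)

# Hypothetical highlight: only the two-seater cars
two_seaters <- subset(mpg, class == "2seater")

# Layer 1 uses the global data (all cars); layer 2 overrides it locally
# and draws the two-seaters as larger red points on top
p_highlight <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_point(data = two_seaters, color = "red", size = 3)
```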

Activity 4: Unleash ggplot power and control global and local mappings - 7 minutes:

[Write code just below each instruction; finally, use the MS Teams R - Forum channel for help with the in-class activities/homework or if you have other questions]

Important

I won’t be covering or testing you on the material beyond this point. However, I’ve made the “Completing the chart” section below available for those of you who are captivated by the fascinating world of data visualization with ggplot2 and want to explore further.

Completing the chart

The charts we have created so far are definitely insightful and useful. However, they are missing some important final touches. Details make a difference, so now we will learn how to add them. The good news is that, in this case too, adding them means adding a layer to our chart.

While the idea is the same, adding these details increases the complexity of the chart, so if you run into errors, it is important to run one layer at a time.
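Here is one possible way to add final touches, sketched under some assumptions. It uses labs() for the title, subtitle, axis, legend, and caption text and theme_minimal() as an example theme. The label wording is illustrative, and the mpg data is assumed.

```r
library(ggplot2)

# Each finishing touch is just one more layer added with +
p_final <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  labs(
    title = "Engine displacement and highway fuel efficiency",
    subtitle = "Cars with larger engines tend to get fewer miles per gallon",
    x = "Engine displacement (liters)",
    y = "Highway fuel efficiency (miles per gallon)",
    color = "Car class",
    caption = "Data: ggplot2::mpg"
  ) +
  theme_minimal()  # a built-in theme; other theme_*() functions work too
```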

In the past two weeks you have created charts in R and discovered how powerful the ggplot2 package is. If you are passionate about visualizations, try to create similar charts using datasets of your interest. Remember that practice makes perfect. Moreover, you always need to explore and get to know your data before building any models on it. Charts will help you visually explore the variables in your dataset and the relationships among them. Welcome to the magical world of visualizations!

Congratulations on completing another R coding class!