What is an R package?

An R package is a type of R project, in other words, a way to work with R on a specific topic. A package, just like any project, is set up from a folder in your computer that you set up where you store all your files and data. If you develop your own R package it will work similarly to those packages you install and load every time you run an R script (i.e ggplot2, nlmer etc.) when you need a function that does something useful, only this time, you will be the person behind the scenes!

A package can include, functions, data, and all kinds of scripts to process or document your work. As an example, I have developed the package cataract por the EcoLunch presentation. I can load this package and with it I can specify other packages that need to be both installed and loaded with it with the function

devtools::load_all()


When loading the package I can also access functions that I have written myself to perform specific tasks. For example, here is a function that generates facts about Chris H. with different degrees (1-5) of spicyness:

generate_chris_fact(spicyness = 3)
## [1] "Chris has an awesome mustache"


I can also access any data sets I am working with. I can actually save them after being processed and with the variables I want so when I load them they are ready to be played with. As an example, I have added some avengers data to the cataract project:

# just typing the dataset's name when the package is loaded will make the data pop up!
avengers
##               name intelligence power
## 1    black panther           88    66
## 2      black widow           75    32
## 3  captain america           63    65
## 4   captain marvel           80    95
## 5   doctor strange           95    84
## 6           falcon           63    30
## 7            groot            6    75
## 8          hawkeye           70    35
## 9             hulk           10    94
## 10        iron man           99    67
## 11  scarlett witch           96   100
## 12          vision          100    96
## 13  winter soldier           55    64
## 14            thor           60    90


And this is just the tip of the iceberg of the things you can do with your R package! Would you be interested in creating one for yourself? Here are some tips:

Creating your own R package

Okay dude but how in the hell do you actually do this? No problem! Here is a step by step tutorial for you:

Step 1

Have you ever noticed when you open Rstudio that little R sign that says something like Project:(None). Click there and go to New Project.


Step 2

Then this window will pop up, we want a New Directory.



Step 3

Here we will choose R package.



Step 4

Which will open this:



Three things to consider here:

    1. Name of your package: Package naming needs to meet certain formalities. In this link: https://r-pkgs.org/workflows101.html?q=namin#naming you will find them as well as some pragmatic advice on naming. My personal recommendation is to name your packages with a single word that is descriptive enough and that motivates you to work on it. What I do is name my packages using extremely geeky associations which are related with what the package is about in a funny way.
    1. Make sure you create the folder for the package on a directory you know!: You can set it up on your Desktop, that’s fine! But remember it is there!. What I recommend, as kind of a technological tidiness nerd is to create a specialized folder for projects where you can locate all your work!
    1. Don’t forget about GIT! You should have GIT installed in your computer right now (if not go to: https://ggcostoya.github.io/cataract/articles/git.html and check how to do it). If you indeed have it, a little check box with Create git repository should appear in this screen. Be sure to check that box! It will be essential later.

Once you are done, click on Create Project and KABOOM! Congrats! You’ve created your first R package, now it’s time to see what is going on here.

Elements of your R package

As soon as you create your R package you will see that a bunch of folders automatically appear on the folder for it. You can see these in the lower right-hand panel in Rstudio where it says Files, it should look something like this:



Among these files and documents the ones you got to pay attention to are:

.Rproj

The .Rproj file is the source to all of your R package. In this case, as an example, I’ve created the kurama package so in the picture above you see kurama.Rproj. When you create projects of your own the .Rproj file will be your package’s name. If you go to the folder where you created the package and click on it an R session with your package name will appear. Notice also that on the top right hand side it doesn’t say Project:(None) anymore! The name of your package should appear up there.

DESCRIPTION

The DESCRIPTION file contains all the information related to your project. If you open it, it should look like this:

The DESCRIPTION file can include all Dependencies, which are all packages that need to be installed and loaded for your package to run. You will need to add them manually. You can also add information about the package (title, author, description etc.). As an example, here is the DESCRIPTION file for cataract:

Setting up a DESCRIPTION file like this saves you the trouble of having to install and load packages every time. You just need to include them as Dependencies and BOOM! They will be loaded every time you load your R package!

\R

The \R folder will contain all the functions for your package. If you click there as soon as you create your package you’ll see that R has automatically created a function called hello, which is saved on the hello.R file that looks like this:

# Hello, world!
#
# This is an example function named 'hello' 
# which prints 'Hello, world!'.
#
# You can learn more about package authoring with RStudio at:
#
#   http://r-pkgs.had.co.nz/
#
# Some useful keyboard shortcuts for package authoring:
#
#   Install Package:           'Ctrl + Shift + B'
#   Check Package:             'Ctrl + Shift + E'
#   Test Package:              'Ctrl + Shift + T'

hello <- function() {
  print("Hello, world!")
}


The goal here is that you create functions for yourself to make tasks easier and more repeatable. Writing your own functions is a way to avoid repetitions and keep your code short. For example, let’s say that for the avengers data set I showed you earlier I want to determine who would win in a fight between Vision and Thor:

# let's check the data first once again: 
avengers
##               name intelligence power
## 1    black panther           88    66
## 2      black widow           75    32
## 3  captain america           63    65
## 4   captain marvel           80    95
## 5   doctor strange           95    84
## 6           falcon           63    30
## 7            groot            6    75
## 8          hawkeye           70    35
## 9             hulk           10    94
## 10        iron man           99    67
## 11  scarlett witch           96   100
## 12          vision          100    96
## 13  winter soldier           55    64
## 14            thor           60    90
# filter the data to keep only data for vision and thor 
avg_vision_thor <- filter(avengers, name %in% c("vision", "thor"))
# keep the avenger with the highest power value 
winner <- filter(avg_vision_thor, power == max(power))
# extract the name of the winner 
name_winner <- as.character(winner$name)
# and here's the winner! VISION! of course... 
name_winner
## [1] "vision"
# what if I needed to know who would win between scarlett witch and hulk? 
# Redo the entire shannanigan again? HELL NO! 
# I can write a function to automate this process. I can call this function "match_avengers": 

match_avengers <- function(avg_1, avg_2){
  # avg_1 and avg_2 are the arguments of the function,
  # filter avenger data and keep only two specified avengers as well as the most powerful one
  match <- avengers %>% filter(name %in% c(avg_1, avg_2)) %>% filter(power == max(power))
  # get the name of the winner
  winner <- as.character(match$name)
  return(winner)
}
# let's test the function out! SCARLETT WITCH ALL DAY!! 
match_avengers("scarlett witch", "hulk")
## [1] "scarlett witch"


If you check the \R folder for the cataract package here: https://github.com/ggcostoya/cataract/tree/master/R , you’ll see that one of the files there is for the match_avengers function. If it is there I can use the function any time cataract is loaded!

.gitignore

The .gitignore file will specify which files or folders you don’t want GIT tracking version-control style. For example, a good addition to .gitignore is the .Rproj file, it will never change, why keep track of it?

Elements you NEED TO ADD to your R package

Although the package would work with the basic stuff that is generated when you created, to upgrade your R package to the next level you’ll need to create the following folders:

\raw_data

In the \raw_data folder you will save all data that comes fresh from the field or the lab. I am talking about those messy excels, .txt files, .csv files anything! All in there. The \raw_data folder will also contain what I call cleaning scripts. These will have all the data processing, filtering and cleaning necessary for your raw data to become the actual data you’ll use in your scripts.

If you go to: https://github.com/ggcostoya/cataract/tree/master/raw_data, you’ll be to see the raw data for the cataract. As an example I have included some meerkats data a friend of mine used for a project a couple of years ago. Below is an an example cleaning script (called meerkats_data_prep.R):

## Meerkat data preparation
library(here) # load here package
meerkats <- read.table(file.path(here(), "raw_data", "meerkats.txt")) # load the raw meerkat data
# rename some columns to make analysis easier
meerkats <- data.frame(obs_feeds = meerkats$observedNfeeds,
                       obs_hours = meerkats$observedhours,
                       r_2_brood = meerkats$relatedness2brood,
                       group_id = meerkats$socialgroupID,
                       brood_id = meerkats$broodID,
                       delta_mass = meerkats$massChange)

# save meerkat data in "data" folder
save(meerkats, file = file.path(here::here(), "data", "meerkats.RData"))


As you see above, I am reading it, renaming some variables to make it look prettier and easier to handle when running stats on it and finally I am saving it in the \data folder, which I’ll talk about next:

\data

The \data folder will contain the actual data you will use in your scripts (in .RData format) and will be available to you immediately after you load your R package. As an example, I can take a look at the meerkats data without having to load anything because it is saved in the data folder!

head(meerkats)
##   obs_feeds obs_hours r_2_brood  group_id brood_id delta_mass
## 1         0      11.1      0.25 bfb_c_sfi  bfb0802      -0.03
## 2         0      11.2      0.25 bfb_c_sfi  bfb0802       0.28
## 3         0      12.8      0.25 bfb_c_sfi  bfb0802      -0.08
## 4         3      12.6      0.25 bfb_c_sfi  bfb0802       0.31
## 5        16      12.2      0.25 bfb_c_sfi  bfb0802       0.17
## 6        12      11.5      0.25 bfb_c_sfi  bfb0802       0.14


\sandbox

And finally, the \sandbox is where you will save all those scripts in which you are trying out stuff. This is a personal recommendation of mine; a folder where you can save all your playing around with models and data.

Where did you find all this?

Everything I have explained is extracted from the book “R packages” by the most awesome Hadley Wickham, the head developer for Rstudio and co-author of packages as awesome as tidyverse, ggplot2, pkgdown etc. The book is absolutely free and you can find it here: https://r-pkgs.org/. It has soo many more awesome things I did not develop here! Feel free to check it out!


The main man Hadley Wickham himself. I want to be like him when I grow up.