tl;dr
I got curious about R package startup messages, so I grabbed all the special zzz.R files from R packages that are on CRAN and sourced on GitHub. You can jump to the table of results.
Start me up
I learnt recently from Hernando Cortina that his and Amanda Dobbyn’s {multicolor} package prints multicoloured ASCII-art text of the package’s name to the console when you attach it with library(multicolor).
It gave me an itch to scratch: how often are these sorts of startup messages used by R packages? What do people put in them? Is there anything funny in them? Anything nefarious?
A strong attachment
A package may need to run additional code before its functions can work; for example, some options() may need to be set.
There are two times this kind of code can be run: when the package is loaded, which includes namespace-qualified calls like dplyr::select(), or more specifically when the package is attached with library().
To prepare code for running on-load or on-attach, you create the special functions .onLoad() and .onAttach(). These go in a zzz.R file in the R/ directory of your package, because… convention?
The on-attach option is useful for printing messages for the user to see in the console, like the {multicolor} example above. You want this to happen on-attach and not on-load, since you wouldn’t want to print a message every single time your script uses the :: namespace qualifier.
To specify a message in the body of your .onAttach() function, you use packageStartupMessage(). Why not just cat() or message()? Because it allows the user to quell startup messages using suppressPackageStartupMessages().
You can learn more in Hadley Wickham’s R Packages book.
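As a rough illustration of the pieces described above, a zzz.R file might look something like the sketch below. The package name 'mypkg', the option name mypkg.verbose and the message text are all made up for the example, and the option-setting pattern is just one common way to do it.

```r
# R/zzz.R for a hypothetical package called 'mypkg'

# Runs when the namespace is loaded, e.g. via mypkg::some_function()
.onLoad <- function(libname, pkgname) {
  # Hypothetical default option; only set it if the user hasn't already
  defaults <- list(mypkg.verbose = FALSE)
  unset <- !(names(defaults) %in% names(options()))
  if (any(unset)) options(defaults[unset])
  invisible(NULL)
}

# Runs only when the package is attached, e.g. via library(mypkg)
.onAttach <- function(libname, pkgname) {
  packageStartupMessage("Welcome to ", pkgname, "!")
}

# Outside a real package you can call the hooks by hand to see the effect
.onLoad(libname = "", pkgname = "mypkg")

# Emits the startup message
.onAttach(libname = "", pkgname = "mypkg")

# Emits nothing, because the message is suppressed
suppressPackageStartupMessages(.onAttach(libname = "", pkgname = "mypkg"))
```

In a real package you never call these hooks yourself; R calls them for you when the package is loaded or attached.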
As an example, consider the {tidyverse} package, which has some verbose output on attach:
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.4 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
But you can shush it with the suppressPackageStartupMessages() function:[1]
detach("package:tidyverse") # first detach it
suppressPackageStartupMessages(library(tidyverse))
Peace.
So the startup messages of {multicolor} and {tidyverse} do two completely different things: one is fun and frivolous and the other is informative. Isn’t it possible that someone could put ads in the startup message or use it in evil ways? Well, perhaps.
Let’s find out what R package developers put in their startup messages. How many packages even have a zzz.R file and how many of those even contain a packageStartupMessage() call?
Catching some Zs
I understand if all this talk of zzz.R causes you to 💤. In short, if you want to get all the zzz.R files, you can:
- Get a list of R packages on CRAN
- Identify which ones have an associated GitHub repo
- Get the default branch name of each one and construct the possible URL to their zzz.R file
- Contact the possible zzz.R file to see if it exists
- If it exists, download it
- Filter for zzz.R files that contain packageStartupMessage()
Note
If you’re thinking this approach is a bit long-winded, you’re right. As Tim pointed out, we could just extract the info via METACRAN, an unofficial CRAN mirror hosted on GitHub. It even has its own API. I’ll probably give that a go at some point.
We’ve already attached the tidyverse packages, but we’ll also need two more packages:
library(gh) # interact with GitHub API
library(httr) # requests via the internet
Packages
Luckily you can grab info for all current CRAN packages with the very handy CRAN_package_db() function.[2]
cran_pkgs <- as_tibble(tools::CRAN_package_db())
This returns a dataframe containing 18338 rows, where each one is a package, along with 66 variables. We get information like the stuff that’s found in package DESCRIPTION files, but it doesn’t tell us whether a package has a zzz.R file.
One way to do this is to visit the GitHub repo associated with the package, if it has one, and see if a zzz.R exists. Of course, many packages are not on GitHub, but we’re going to ignore those for simplicity.
GitHub repos
A quick way of discovering if a package has a GitHub repo is to check for ‘github.com’ in the BugReports section of its DESCRIPTION file.[3] Again, this doesn’t capture all the possible repos, but is fine for now.
has_repo <- cran_pkgs %>%
  select(Package, BugReports) %>%
  filter(str_detect(BugReports, "github")) %>%
  transmute(
    Package,
    owner_repo = str_extract(
      str_replace_all(paste0(BugReports, "/x"), "//", "/"),
      "(?<=github.com/).*(?=/[a-zA-Z])"
    )
  ) %>%
  separate(owner_repo, c("owner", "repo"), "/") %>%
  filter(!is.na(Package), !is.na(owner), !is.na(repo)) %>%
  distinct(Package, owner, repo) %>%
  arrange(Package)
sample_n(has_repo, 5)
## # A tibble: 5 × 3
## Package owner repo
## <chr> <chr> <chr>
## 1 PowerTOST Detlew PowerTOST
## 2 pdfetch abielr pdfetch
## 3 torchdatasets mlverse torchdatasets
## 4 featureflag szymanskir featureflag
## 5 wordpredictor pakjiddat word-predictor
There were 18338 CRAN packages total and now we have 6691 (36%) that appear to have a GitHub repo.
If you’re wondering why we didn’t just use the package name as the repo name, it’s because they sometimes don’t match, e.g. {baseballDBR} is in a repo called ‘moneyball’.
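If the {stringr} pattern above looks a bit opaque, the same idea can be sketched in base R with capture groups. This is an alternative formulation rather than the code used for this post, and parse_github_url() and the example URLs are invented for illustration.

```r
# Hypothetical helper: pull owner and repo out of GitHub URLs
parse_github_url <- function(url) {
  # First capture group is the owner, second is the repo
  pattern <- "github\\.com/([^/]+)/([^/#?]+)"
  matches <- regmatches(url, regexec(pattern, url))
  rows <- lapply(matches, function(m) {
    if (length(m) == 3) {
      # m[1] is the full match, m[2] and m[3] are the capture groups
      data.frame(owner = m[2], repo = m[3], stringsAsFactors = FALSE)
    } else {
      # No GitHub owner/repo found in this URL
      data.frame(
        owner = NA_character_, repo = NA_character_,
        stringsAsFactors = FALSE
      )
    }
  })
  do.call(rbind, rows)
}

parse_github_url(c(
  "https://github.com/someowner/somerepo/issues",
  "https://github.com/someowner/somerepo",
  "https://example.org/bugs"
))
```

The first two URLs should both resolve to the same owner and repo, while the non-GitHub URL gives a row of NA values that can be filtered out afterwards.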
Now we can use the repo details to build a URL to a potential zzz.R file. This comes in the form https://raw.githubusercontent.com/<owner>/<repo>/<defaultbranch>/R/zzz.R.
Default branch
You’ll notice we don’t yet know the default branch of the package’s GitHub repo. Historically, we could probably have just hard-coded ‘master’, but the automatic default is now ‘main’. And of course, the default branch could be something else entirely.
We can grab the default branch for each repo from the GitHub API using the excellent {gh} package by Gábor Csárdi, Jenny Bryan and Hadley Wickham. You’ll need to do some setup to use it yourself.
The key function is gh(), to which you can pass a GET request for the information we want: GET /repos/{owner}/{repo}. We can iterate for each repo by passing each owner and repo name in turn. It returns a list object with lots of information about the repo.
I’ve created ‘possibly’ function variants with {purrr} so that any errors in the process are handled by returning NA, rather than breaking the loop, which would kill the process.
# Create 'try' function versions
map2_possibly <- possibly(map2, NA_real_)
gh_possibly <- possibly(gh, NA_real_)

# Function: fetch repo details, print message on action
get_repo <- function(owner, repo) {
  cat(paste0("[", Sys.time(), "]"), paste0(owner, "/", repo), "\n")
  gh_possibly("GET /repos/{owner}/{repo}", owner = owner, repo = repo)
}

maybe_zzz <- has_repo %>%
  mutate(
    repo_deets = map2_possibly(owner, repo, get_repo)
  ) %>%
  mutate(
    # Extract one branch name per repo; NA where the API call failed.
    # (Plucking element 1 of the whole column would recycle the first
    # repo's branch across every row.)
    default_branch = map_chr(
      repo_deets, ~pluck(.x, "default_branch", .default = NA_character_)
    ),
    zzz_url = paste0(
      "https://raw.githubusercontent.com/",
      owner, "/", repo, "/", default_branch, "/R/zzz.R"
    )
  )
So now we have a column with the returned repo information, the extracted default branch name and a URL that points to a potential zzz.R file in that repo.
head(maybe_zzz)
## # A tibble: 6 × 6
## Package owner repo repo_deets default_branch zzz_url
## <chr> <chr> <chr> <list> <chr> <chr>
## 1 AATtools Spiritspeak AATtools <gh_rspns … master https://r…
## 2 abbyyR soodoku abbyyR <gh_rspns … master https://r…
## 3 abctools dennisprangle abctools <gh_rspns … master https://r…
## 4 abdiv kylebittinger abdiv <gh_rspns … master https://r…
## 5 abess abess-team abess <gh_rspns … master https://r…
## 6 ABHgenotypeR StefanReuscher ABHgenotypeR <gh_rspns … master https://r…
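To see what a ‘possibly’ wrapper does in isolation, here’s a toy sketch that doesn’t touch the GitHub API. The fetch_branch() function and the repo names are stand-ins invented for the example; only possibly(), map(), map_chr() and pluck() come from {purrr}.

```r
library(purrr)

# Toy stand-in for an API call: errors for one particular repo
fetch_branch <- function(repo) {
  if (repo == "missing/repo") stop("404 Not Found")
  list(full_name = repo, default_branch = "main")
}

# Wrapped version returns NA instead of throwing an error
fetch_possibly <- possibly(fetch_branch, otherwise = NA)

repos <- c("someowner/somerepo", "missing/repo")
deets <- map(repos, fetch_possibly)

# One branch name per repo; NA where the call failed
branches <- map_chr(
  deets, ~pluck(.x, "default_branch", .default = NA_character_)
)
branches
```

The failed call leaves an NA in its slot rather than stopping the iteration, so the successful results are kept.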
Status codes
Now we can check the status code for each of the URLs we’ve built. A return of 200 tells us that the file exists and 404 means it doesn’t.[4] Again, we can prevent the loop breaking on error by creating a ‘possibly’ version of map().
library(httr) # for status_code()

map_possibly <- possibly(map, NA_character_)

maybe_zzz_status <- maybe_zzz %>%
  mutate(
    status = map_possibly(
      zzz_url, ~status_code(GET(.x))
    )
  ) %>%
  unnest(status)
count(maybe_zzz_status, status)
## # A tibble: 2 × 2
## status n
## <int> <int>
## 1 200 1237
## 2 404 5332
Okay, great, we’ve got over a thousand zzz.R files.
Read content
Now we know which packages have a zzz.R file, we can use readLines() to grab their content from their URL, which again we can protect from errors with purrr::possibly().
Note that I’ve created a special version of readLines() that reports to the user the path being checked, but also has a random delay. This is to dampen the impact on GitHub’s servers.
# Function: readLines() but with a pause and message
readLines_delay <- function(path) {
  Sys.sleep(sample(1:3, 1)) # pause for a random 1 to 3 seconds
  cat(paste0("[", Sys.time(), "]"), path, "\n")
  readLines(path, warn = FALSE)
}

readLines_delay_possibly <- possibly(readLines_delay, NA_character_)

fosho_zzz <- maybe_zzz_status %>%
  select(-repo_deets) %>%
  filter(status == 200) %>% # just the files that exist
  mutate(lines = map_possibly(zzz_url, readLines_delay_possibly))
dim(fosho_zzz)
So now we have a dataframe with a row per package and a list-column containing the R code in the zzz.R file.
Startup messages
Finally, we can find out which packages have a packageStartupMessage() call inside their zzz.R.
has_psm <- fosho_zzz %>%
  select(Package, lines) %>%
  unnest(lines) %>%
  filter(str_detect(lines, "packageStartupMessage")) %>%
  mutate(lines = str_remove_all(lines, " ")) %>%
  distinct(Package) %>%
  pull(Package)
fosho_psm <- filter(fosho_zzz, Package %in% has_psm)
So we started with 18338 CRAN packages and have winnowed it down to 467 (3%) that have a call to packageStartupMessage() in their zzz.R.
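The detection step boils down to a plain string search, which can be sketched in base R on some made-up file contents. The package names pkgA and pkgB and their code are invented here purely to show the mechanics.

```r
# Toy stand-ins for downloaded zzz.R files, one character vector per package
zzz_files <- list(
  pkgA = c(
    ".onAttach <- function(libname, pkgname) {",
    "  packageStartupMessage(\"Hello!\")",
    "}"
  ),
  pkgB = c(
    ".onLoad <- function(libname, pkgname) {",
    "  options(pkgB.verbose = FALSE)",
    "}"
  )
)

# TRUE for each package where any line mentions packageStartupMessage
has_message <- vapply(
  zzz_files,
  function(lines) any(grepl("packageStartupMessage", lines, fixed = TRUE)),
  logical(1)
)

names(has_message)[has_message]
```

A search like this is deliberately crude: it will also count calls that have been commented out, and it can’t see startup messages defined in files other than zzz.R.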
Table of results
I could provide a table with all the zzz.R scripts that I collected, but I don’t want to break any licenses by reproducing them all here. Instead, here’s an interactive table that links to the GitHub page for each zzz.R file.
Click for table code
library(reactable)

reactable(
  data = fosho_zzz %>% select(package = Package, owner, url = zzz_url),
  searchable = TRUE,
  paginationType = "jump",
  defaultPageSize = 10,
  columns = list(
    url = colDef(cell = function(value) {
      htmltools::tags$a(href = value, target = "_blank", "zzz.R")
    })
  )
)
Patterns
I had a scan through the scripts and found some frequent uses of packageStartupMessage() to:
- show a basic salutation (e.g. {afex})
- show the version number, a check to see if the user has the latest version, sometimes a prompt to download the latest version for them (e.g. {vistributions}), sometimes a note that the package has been superseded by another (e.g. {drake})
- links to guidance, examples, documentation (e.g. {bayesplot})
- provide a citation or author names (e.g. {unvotes})
- link to issue tracking or bug reporting (e.g. {timeperiodsR})
- check for required supplementary software (e.g. {DALY})
- remind of the need for credentials or keys for packages that access APIs (e.g. {trainR})
- provide terms of use, warranties, licenses, etc (e.g. {emmeans})
I was also interested to see:
- a random tip, so you get something new each time you attach the package (e.g. {shinyjs})
- appeals for GitHub stars (e.g. {sigminer})
- links to purchasable course materials (e.g. {anomalize})
And perhaps the most self-aware were several packages that reminded the user that they can turn off startup messages with suppressPackageStartupMessages() if the messages get too annoying (e.g. {dendextend}).
A few interesting specifics (possible spoiler alerts!):
- {bayestestR} and {sjmisc} display a special Star Wars message on a certain day of the year…
- {SHT} and {symengine} load ASCII art, as does {BetaBit}, which also prompts the user for a game they’d like to play
- {depigner} says ‘Welcome to depigner: we are here to un-stress you!’
- {mde} has a friendly ‘Happy Exploration :)’ salutation and {manymodelr} says ‘Happy Modelling! :)’
- {sjPlot} says ‘#refugeeswelcome’
You can use the interactive table above to reach each of the zzz.R files for these packages, or have a sift through yourself to see what you can find.
Buy my stuff?
Is there a line somewhere? Is it okay to advertise something? You could argue that someone has gone out of their way to release a package for free, so what harm is there in trying to get something back? Or does this approach undermine the whole ‘open’ process?
I know some people find startup messages a bit annoying, but I think it’s easy enough for users to opt out of seeing them with a call to suppressPackageStartupMessages().
Mostly I’m kind of surprised by the lack of abuse of packageStartupMessage() in this sample. Let me know of any cheeky business you might have come across.
Session info
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.1.0 (2021-05-18)
## os macOS Big Sur 10.16
## system x86_64, darwin17.0
## ui X11
## language (EN)
## collate en_GB.UTF-8
## ctype en_GB.UTF-8
## tz Europe/London
## date 2021-10-04
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date lib source
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
## backports 1.2.1 2020-12-09 [1] CRAN (R 4.1.0)
## blogdown 1.4 2021-07-23 [1] CRAN (R 4.1.0)
## bookdown 0.23 2021-08-13 [1] CRAN (R 4.1.0)
## broom 0.7.9 2021-07-27 [1] CRAN (R 4.1.0)
## bslib 0.2.5.1 2021-05-18 [1] CRAN (R 4.1.0)
## cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.1.0)
## cli 3.0.1 2021-07-17 [1] CRAN (R 4.1.0)
## colorspace 2.0-2 2021-06-24 [1] CRAN (R 4.1.0)
## crayon 1.4.1 2021-02-08 [1] CRAN (R 4.1.0)
## crosstalk 1.1.1 2021-01-12 [1] CRAN (R 4.1.0)
## DBI 1.1.1 2021-01-15 [1] CRAN (R 4.1.0)
## dbplyr 2.1.1 2021-04-06 [1] CRAN (R 4.1.0)
## digest 0.6.27 2020-10-24 [1] CRAN (R 4.1.0)
## dplyr * 1.0.7 2021-06-18 [1] CRAN (R 4.1.0)
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
## emo 0.0.0.9000 2021-08-26 [1] Github (hadley/emo@3f03b11)
## evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
## fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
## fontawesome 0.2.2 2021-07-02 [1] CRAN (R 4.1.0)
## forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.1.0)
## fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0)
## generics 0.1.0 2020-10-31 [1] CRAN (R 4.1.0)
## ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.1.0)
## gh * 1.3.0 2021-04-30 [1] CRAN (R 4.1.0)
## glue 1.4.2 2020-08-27 [1] CRAN (R 4.1.0)
## gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.0)
## haven 2.4.3 2021-08-04 [1] CRAN (R 4.1.0)
## hms 1.1.0 2021-05-17 [1] CRAN (R 4.1.0)
## htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.1.0)
## htmlwidgets 1.5.3 2020-12-10 [1] CRAN (R 4.1.0)
## httr * 1.4.2 2020-07-20 [1] CRAN (R 4.1.0)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.1.0)
## jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.1.0)
## knitr 1.34 2021-09-09 [1] CRAN (R 4.1.0)
## lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.1.0)
## lubridate 1.7.10 2021-02-26 [1] CRAN (R 4.1.0)
## magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
## modelr 0.1.8 2020-05-19 [1] CRAN (R 4.1.0)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.0)
## pillar 1.6.2 2021-07-29 [1] CRAN (R 4.1.0)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
## purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.0)
## Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.0)
## reactable * 0.2.3 2020-10-04 [1] CRAN (R 4.1.0)
## reactR 0.4.4 2021-02-22 [1] CRAN (R 4.1.0)
## readr * 2.0.1 2021-08-10 [1] CRAN (R 4.1.0)
## readxl 1.3.1 2019-03-13 [1] CRAN (R 4.1.0)
## reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.0)
## rlang 0.4.11 2021-04-30 [1] CRAN (R 4.1.0)
## rmarkdown 2.10 2021-08-06 [1] CRAN (R 4.1.0)
## rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
## rvest 1.0.1 2021-07-26 [1] CRAN (R 4.1.0)
## sass 0.4.0 2021-05-12 [1] CRAN (R 4.1.0)
## scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.0)
## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0)
## stringi 1.7.4 2021-08-25 [1] CRAN (R 4.1.0)
## stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
## tibble * 3.1.4 2021-08-25 [1] CRAN (R 4.1.0)
## tidyr * 1.1.3 2021-03-03 [1] CRAN (R 4.1.0)
## tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0)
## tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.1.0)
## tzdb 0.1.2 2021-07-20 [1] CRAN (R 4.1.0)
## utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
## vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
## withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0)
## xfun 0.26 2021-09-14 [1] CRAN (R 4.1.0)
## xml2 1.3.2 2020-04-23 [1] CRAN (R 4.1.0)
## yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
##
## [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
[1] Which makes me wonder what the longest R function name is.
[2] I made use of this for the {kevinbacran} package and the associated ‘What’s your Hadley Number?’ app.
[3] I chose the BugReports field rather than the URL field because people put all sorts of things in the latter, like links to websites, etc. BugReports (I think) tends to point to the source on GitHub.
[4] I wrote about status codes as part of the post on my {linkrot} package.