tl;dr
Use ls() on a package name in the form "package:base" to see all the objects it contains. I’ve done this to find the longest (and shortest) function names in base R and the {tidyverse} suite.
Naming things
I try to keep to a few rules when creating function names, like:
- use a verb to make clear the intended action, like get_badge() from {badgr}
- start functions with a prefix to make autocomplete easier, like the dh_*() functions from {dehex}
- try to be descriptive but succinct, like r2cron() from {dialga}
It can be tricky to be succinct. Consider the base R function suppressPackageStartupMessages() [1]: it’s a whopping 30 characters, but all the words are important. Something shortened, like suppPkgStartMsg(), wouldn’t be so clear.
It made me wonder: what’s the longest function name in R? [2]
But! It seems tricky and time consuming to find the longest function name from all R packages. CRAN alone has over 18,000 at time of writing.
A much easier (lazier) approach is to focus on some package subsets. I’ll look at base R and the {tidyverse}.
The long and the short of it
Base R
Certain R packages are built-in and attached by default on startup.
base_names <- sessionInfo()$basePkgs
base_names
## [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
## [7] "base"
How can we fetch all the functions from these packages? We can use ls() to list all their objects, supplying the package name in the format "package:base", for example. Note that I said ‘objects’, not ‘functions’, since it will also return names that refer to things like datasets.
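Since ls() returns every kind of object, one way to narrow the list down to functions is to test each name with is.function(). This is only a minimal sketch, using {base} as the example package; the names base_objects and base_only_fns are made up for illustration and aren't used elsewhere in this post.

```r
# All object names attached from {base}. Names starting with a dot are
# skipped because ls() has all.names = FALSE by default.
base_objects <- ls("package:base")

# Look up each name in the package environment and test whether it is a function
is_fn <- vapply(
  base_objects,
  function(nm) is.function(get(nm, envir = as.environment("package:base"))),
  logical(1)
)

base_only_fns <- base_objects[is_fn]

# The leftovers are non-function objects, such as the constants pi and letters
setdiff(base_objects, base_only_fns)
```

The rest of the post keeps all objects, so the counts below include these non-function names.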
For fun, we can use this as an excuse to demo ‘lambda’ syntax and the dog’s balls approach to function-writing, both introduced in R v4.1. [3]
base_pkgs <- paste0("package:", base_names)
# List each package's objects, then bind them into one data frame
base_fns <- lapply(base_pkgs, ls) |>
  setNames(base_names) |>
  lapply(\(object) as.data.frame(object)) |>
  (\(x) do.call(rbind, x))() # the balls ()()
# rbind() leaves row names like "stats.1"; strip the suffix to get the package
base_fns$package <- gsub("\\.\\d{,4}$", "", row.names(base_fns))
row.names(base_fns) <- NULL
base_fns$nchar <- nchar(base_fns$object)
base_fns <- base_fns[order(-base_fns$nchar), ]
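As an aside, the row-name juggling can be avoided. The sketch below is one possible alternative rather than the approach used in this post: it relies on utils::stack(), which turns a named list into a two-column data frame of values and list names. The object names objs and alt are invented for the example.

```r
# Names of the base packages attached in this session
base_names <- sessionInfo()$basePkgs

# A named list with one character vector of object names per package
objs <- lapply(paste0("package:", base_names), ls)
names(objs) <- base_names

# stack() returns columns 'values' (the object names) and 'ind' (the list names)
alt <- stack(objs)
names(alt) <- c("object", "package")
alt$package <- as.character(alt$package)

# Same finishing steps as before: count characters and sort, longest first
alt$nchar <- nchar(alt$object)
alt <- alt[order(-alt$nchar), ]
head(alt, 3)
```

The trade-off is that stack() hides the reshaping step, which may be less instructive if the aim is to practise lapply() and do.call().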
Of the 2424 objects across these packages, a quick histogram shows that the most frequent character length is under 10, with a tail stretching out to over 30.
hist(
base_fns$nchar,
main = "Character length of base-object names",
xlab = "Number of characters",
las = 1
)
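A histogram groups lengths into bins, so it only shows roughly where the peak is. If you want the exact most common length, a frequency table is one way to get it. This is a small sketch with made-up variable names (all_names, counts, modal_length); the result will depend on your R version, so no output is shown.

```r
# Collect the object names from every attached base package
base_names <- sessionInfo()$basePkgs
all_names <- unlist(lapply(paste0("package:", base_names), ls))

# Tabulate how many names there are of each character length
counts <- table(nchar(all_names))

# The table's names are the lengths; pick the one with the highest count
modal_length <- as.integer(names(counts)[which.max(counts)])
modal_length
```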
Here’s the top 10 by character length.
base_fns_top <- base_fns[order(-base_fns$nchar), ]
rownames(base_fns_top) <- seq_len(nrow(base_fns_top)) # renumber rows from 1
head(base_fns_top, 10)
## object package nchar
## 1 aspell_write_personal_dictionary_file utils 37
## 2 getDLLRegisteredRoutines.character base 34
## 3 getDLLRegisteredRoutines.DLLInfo base 32
## 4 reconcilePropertiesAndPrototype methods 31
## 5 suppressPackageStartupMessages base 30
## 6 as.data.frame.numeric_version base 29
## 7 as.character.numeric_version base 28
## 8 print.DLLRegisteredRoutines base 27
## 9 as.data.frame.model.matrix base 26
## 10 conditionMessage.condition base 26
So there are four objects with names longer than suppressPackageStartupMessages(), though they are rarely used as far as I can tell. The longest is aspell_write_personal_dictionary_file(), which has 37(!) characters. It’s part of the spellcheck functions in {utils}.
It’s interesting to me that it follows some of those rules for function naming that I mentioned earlier. It has a verb, is descriptive and uses a prefix for easier autocomplete; ‘aspell’ refers to the GNU open-source Aspell spellchecker on which it’s based.
I’m intrigued that the function uses snake_case rather than camelCase or dot.case, which seem more prevalent in base functions. You could argue then that the underscores have ‘inflated’ the length by four characters. Similarly, the prefix adds another six characters. So maybe the function could be simplified to writePersonalDictionaryFile(), which is merely 27 characters.
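The arithmetic there is easy to check with nchar(). In this quick sketch the variable names are invented for illustration, and writePersonalDictionaryFile is only the hypothetical shortened name suggested above, not a real function.

```r
long_name <- "aspell_write_personal_dictionary_file"
short_name <- "writePersonalDictionaryFile" # hypothetical camelCase alternative

# Count underscores by comparing lengths with and without them
n_underscores <- nchar(long_name) - nchar(gsub("_", "", long_name, fixed = TRUE))

# The 'aspell' prefix, not counting the underscore that follows it
n_prefix <- nchar("aspell")

# Dropping the underscores and the prefix gives the camelCase length
nchar(long_name) - n_underscores - n_prefix == nchar(short_name)
```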
What about shortest functions? There are many one-character functions in base R.
sort(base_fns[base_fns$nchar == 1, ][["object"]])
## [1] "-" ":" "!" "?" "(" "[" "{" "@" "*" "/" "&" "^" "+" "<" "=" ">" "|" "~" "$"
## [20] "c" "C" "D" "F" "I" "q" "t" "T"
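Some of these will be familiar, like c() to concatenate and t() to transpose. You might wonder why operators and brackets are in here. Remember: everything that happens in R is a function call, so `[`(mtcars, "hp") is the same as mtcars["hp"]. Note that T and F aren’t functions at all: they’re the logical shorthands for TRUE and FALSE, and they appear because ls() lists objects of any kind. I have to admit that stats::C() and stats::D() were new to me.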
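Here’s a short sketch to back that up. It checks that the backticked function form and the usual bracket form give identical results, and then tests a few of the one-character names to see which are really functions. The variable names one_char and is_fn are made up for the example.

```r
# Subsetting with brackets is a call to the `[` function under the hood
identical(`[`(mtcars, "hp"), mtcars["hp"])

# The same goes for arithmetic operators
identical(`+`(1, 2), 1 + 2)

# Not every one-character name is a function, though:
# T and F are logical constants that ls() happens to list
one_char <- c("c", "t", "T", "F")
is_fn <- vapply(
  one_char,
  function(nm) is.function(get(nm, envir = baseenv())),
  logical(1)
)
is_fn
```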
{tidyverse}
How about object names from the {tidyverse}?
To start, we need to attach the packages. Running library(tidyverse) only attaches the core packages of the tidyverse, so we need another approach to attach them all.
One method is to get a vector of the package names with the tidyverse_packages() function and pass it to p_load() from {pacman}, which prevents the need for a separate library() call for each one. [4]
First, here’s the tidyverse packages.
# install.packages("tidyverse") # if not installed
suppressPackageStartupMessages( # in action!
library(tidyverse)
)
tidy_names <- tidyverse_packages()
tidy_names
## [1] "broom" "cli" "crayon" "dbplyr"
## [5] "dplyr" "dtplyr" "forcats" "googledrive"
## [9] "googlesheets4" "ggplot2" "haven" "hms"
## [13] "httr" "jsonlite" "lubridate" "magrittr"
## [17] "modelr" "pillar" "purrr" "readr"
## [21] "readxl" "reprex" "rlang" "rstudioapi"
## [25] "rvest" "stringr" "tibble" "tidyr"
## [29] "xml2" "tidyverse"
And now to load them all.
# install.packages("pacman") # if not installed
library(pacman)
p_load(char = tidy_names)
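If you’d rather not depend on {pacman}, a base R idiom can do a similar job without ever installing anything. This is a sketch, not what this post actually ran: the vector pkgs is a hypothetical stand-in for tidy_names, filled with packages that ship with R so the example runs anywhere.

```r
# Hypothetical stand-in for tidy_names; these packages are bundled with R
pkgs <- c("tools", "splines", "stats4")

# Keep only the packages that are already installed (nothing gets installed)
installed <- pkgs[vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# Attach each one; character.only = TRUE makes library() accept a string
invisible(lapply(installed, library, character.only = TRUE))

# Check that they are now on the search path, ready for ls()
paste0("package:", installed) %in% search()
```

Swapping pkgs for tidy_names should attach the tidyverse packages in the same way, assuming they’re all installed.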
Once again we can ls() over packages in the form "package:dplyr". Now the {tidyverse} is loaded, we might as well use it to run the same pipeline as we did for the base packages.
tidy_pkgs <- paste0("package:", tidy_names)
tidy_fns <- map(tidy_pkgs, ls) |>
set_names(tidy_names) |>
enframe(name = "package", value = "object") |>
unnest(object) |>
mutate(nchar = nchar(object))
So we’re looking at even more objects this time, since the whole tidyverse contains 3018 of them.
The histogram is not too dissimilar to the one for base packages, though the tail is shorter, the shape is arguably more normal-looking and the peak is perhaps slightly closer to 10. The position of the peak could be because of more liberal use of snake_case.
hist(
tidy_fns$nchar,
main = "Character length of {tidyverse} object names",
xlab = "Number of characters",
las = 1
)
Here’s the top 10 by character length.
slice_max(tidy_fns, nchar, n = 10)
## # A tibble: 10 × 3
## package object nchar
## <chr> <chr> <int>
## 1 googlesheets4 vec_ptype2.googlesheets4_formula 32
## 2 googlesheets4 vec_cast.googlesheets4_formula 30
## 3 cli cli_progress_builtin_handlers 29
## 4 rstudioapi getRStudioPackageDependencies 29
## 5 rstudioapi launcherPlacementConstraint 27
## 6 cli ansi_has_hyperlink_support 26
## 7 ggplot2 scale_continuous_identity 25
## 8 ggplot2 scale_linetype_continuous 25
## 9 haven vec_arith.haven_labelled 24
## 10 rstudioapi getActiveDocumentContext 24
The longest two are 32 and 30 characters in length and are both from {googlesheets4}. The help pages say they’re ‘internal {vctrs} methods’. The names of these are long because of the construction: the first part tells us the name of the generic function, e.g. vec_ptype2, and the second part tells us that they apply to the googlesheets4_formula S3 class.
So maybe these don’t really count because they would be executed as vec_ptype2() and vec_cast()? And they’re inflated because they contain the package name, {googlesheets4}, which is quite a long one (13 characters). That would leave cli::cli_progress_builtin_handlers() and rstudioapi::getRStudioPackageDependencies() as the next longest (29 characters). The latter uses camelCase, which is typical of the {rstudioapi} package, so isn’t bulked out by underscores.
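To illustrate why S3 method names get so long, here’s a toy sketch. The generic describe() and the class long_class_name are entirely made up; they’re not from {vctrs} or {googlesheets4}. The method’s name is just the generic and the class joined by a dot, but you only ever type the short generic when calling it.

```r
# A hypothetical generic that dispatches on the class of its first argument
describe <- function(x, ...) UseMethod("describe")

# The method name is built as <generic>.<class>, so it grows with both parts
describe.long_class_name <- function(x, ...) "method for long_class_name"

# An object carrying the made-up class
obj <- structure(list(), class = "long_class_name")

# We call the short generic; R finds the long-named method for us
describe(obj)

# The method's full name is much longer than what we typed
nchar("describe.long_class_name") > nchar("describe")
```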
On the other end of the spectrum, there’s only one function with one character: dplyr::n(). I think it makes sense to avoid single-character functions in non-base packages, because they aren’t terribly descriptive. n() can at least be understood to mean ‘number’.
Instead, here’s all the two-letter functions from the {tidyverse}. Note that many of these are from {lubridate} and are shorthand expressions that make sense in context, like hm() for hour-minute. You can also see some of {rlang}’s operators creep in here, like bang-bang (!!) and the walrus (:=). [5]
filter(tidy_fns, nchar == 2)
## # A tibble: 16 × 3
## package object nchar
## <chr> <chr> <int>
## 1 cli no 2
## 2 dplyr do 2
## 3 dplyr id 2
## 4 lubridate am 2
## 5 lubridate hm 2
## 6 lubridate ms 2
## 7 lubridate my 2
## 8 lubridate pm 2
## 9 lubridate tz 2
## 10 lubridate ym 2
## 11 lubridate yq 2
## 12 magrittr or 2
## 13 rlang := 2
## 14 rlang !! 2
## 15 rlang ll 2
## 16 rlang UQ 2
Both the {dplyr} functions here are no longer intended for use. I’m sad especially for dplyr::do(): the help page says it ‘never really felt like it belong[ed] with the rest of dplyr’ 😢.
In memoriam: do().
Session info
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.1.0 (2021-05-18)
## os macOS Big Sur 10.16
## system x86_64, darwin17.0
## ui X11
## language (EN)
## collate en_GB.UTF-8
## ctype en_GB.UTF-8
## tz Europe/London
## date 2021-11-27
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date lib source
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
## backports 1.2.1 2020-12-09 [1] CRAN (R 4.1.0)
## blogdown 1.4 2021-07-23 [1] CRAN (R 4.1.0)
## bookdown 0.23 2021-08-13 [1] CRAN (R 4.1.0)
## broom * 0.7.9 2021-07-27 [1] CRAN (R 4.1.0)
## bslib 0.3.1 2021-10-06 [1] CRAN (R 4.1.0)
## cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.1.0)
## cli * 3.1.0 2021-10-27 [1] CRAN (R 4.1.0)
## colorspace 2.0-2 2021-06-24 [1] CRAN (R 4.1.0)
## crayon * 1.4.2 2021-10-29 [1] CRAN (R 4.1.0)
## data.table 1.14.0 2021-02-21 [1] CRAN (R 4.1.0)
## DBI 1.1.1 2021-01-15 [1] CRAN (R 4.1.0)
## dbplyr * 2.1.1 2021-04-06 [1] CRAN (R 4.1.0)
## digest 0.6.28 2021-09-23 [1] CRAN (R 4.1.0)
## dplyr * 1.0.7 2021-06-18 [1] CRAN (R 4.1.0)
## dtplyr * 1.1.0 2021-02-20 [1] CRAN (R 4.1.0)
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
## emo 0.0.0.9000 2021-08-26 [1] Github (hadley/emo@3f03b11)
## evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
## fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
## fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
## forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.1.0)
## fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0)
## gargle 1.2.0 2021-07-02 [1] CRAN (R 4.1.0)
## generics 0.1.1 2021-10-25 [1] CRAN (R 4.1.0)
## ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.1.0)
## glue 1.5.0 2021-11-07 [1] CRAN (R 4.1.0)
## googledrive * 2.0.0 2021-07-08 [1] CRAN (R 4.1.0)
## googlesheets4 * 1.0.0 2021-07-21 [1] CRAN (R 4.1.0)
## gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.0)
## haven * 2.4.3 2021-08-04 [1] CRAN (R 4.1.0)
## highr 0.9 2021-04-16 [1] CRAN (R 4.1.0)
## hms * 1.1.1 2021-09-26 [1] CRAN (R 4.1.0)
## htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.0)
## httr * 1.4.2 2020-07-20 [1] CRAN (R 4.1.0)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.1.0)
## jsonlite * 1.7.2 2020-12-09 [1] CRAN (R 4.1.0)
## knitr 1.36 2021-09-29 [1] CRAN (R 4.1.0)
## lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.0)
## lubridate * 1.8.0 2021-10-07 [1] CRAN (R 4.1.0)
## magrittr * 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
## modelr * 0.1.8 2020-05-19 [1] CRAN (R 4.1.0)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.0)
## pacman * 0.5.1 2019-03-11 [1] CRAN (R 4.1.0)
## pillar * 1.6.4 2021-10-18 [1] CRAN (R 4.1.0)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
## purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.0)
## Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.0)
## readr * 2.0.2 2021-09-27 [1] CRAN (R 4.1.0)
## readxl * 1.3.1 2019-03-13 [1] CRAN (R 4.1.0)
## reprex * 2.0.1 2021-08-05 [1] CRAN (R 4.1.0)
## rlang * 0.4.12 2021-10-18 [1] CRAN (R 4.1.0)
## rmarkdown 2.10 2021-08-06 [1] CRAN (R 4.1.0)
## rstudioapi * 0.13 2020-11-12 [1] CRAN (R 4.1.0)
## rvest * 1.0.1 2021-07-26 [1] CRAN (R 4.1.0)
## sass 0.4.0 2021-05-12 [1] CRAN (R 4.1.0)
## scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.0)
## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0)
## stringi 1.7.5 2021-10-04 [1] CRAN (R 4.1.0)
## stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
## tibble * 3.1.6 2021-11-07 [1] CRAN (R 4.1.0)
## tidyr * 1.1.3 2021-03-03 [1] CRAN (R 4.1.0)
## tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0)
## tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.1.0)
## tzdb 0.1.2 2021-07-20 [1] CRAN (R 4.1.0)
## utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
## vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
## withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0)
## xfun 0.26 2021-09-14 [1] CRAN (R 4.1.0)
## xml2 * 1.3.2 2020-04-23 [1] CRAN (R 4.1.0)
## yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
##
## [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
[1] I recently wrote a whole post about package startup messages.
[2] Luke was curious too, so that’s at least two of us. (Luke also noticed that a link to my {linkrot} package was itself rotten, lol.)
[3] My understanding is that a future version of R will allow an underscore as the left-hand-side placeholder, in a similar manner to how the {tidyverse} allows a dot. That will do away with the need for ()(). Also ignore my badly-written base code; I’m trying to re-learn.
[4] In fact, p_load() will attempt installation if the package can’t be found in your library. Arguably, this is poor behaviour; you should always ask the user before installing something on their machine.
[5] Bang-Bang and the Walrus, touring Spring 2022.