tl;dr
It’s ‘d-ply-r’ according to Hadley himself… maybe.
What does it sound like when your computer tries to pronounce R package names? Is this an accessibility issue?
Deep liar
Sometimes I hear a word being spoken and think ‘oh wait, is that how it’s actually pronounced?’
I know people struggle with pronouncing R package names. They’re often hard to parse.
Is {dplyr} ‘dee-ply-arr’ or ‘d’plier’? Is {data.table} ‘data table’ or ‘data-dot-table’?
Speak the truth
How does this affect users of assitive technology? VoiceOver is a macOS accessibility tool that helps people navigate their computers via audio. It reads text on a page. What happens when VoiceOver reads R package names?
I used the say
command at the command line to test this out. For example, you can type say dplyr
to get your machine to interpret and vocalise ‘dplyr’.
You can add flags to the command to read text from an input file (-f
) and then store the audio output (-o
):
say -f input.txt -o output.aiff
I generated some audio of package names being read via say
and embedded them in the sections below. These were:
- the tidyverse
- the top 20 downloads from CRAN
- 20 random CRAN packages
You can download all the text and audio files as a zip file (note that the audio is in .aiff format).
Tidyverse
You can get the tidyverse packages with the tidyverse_packages()
function from the {tidyverse} package.
Click for code
# Fetch the packages of the tidyverse
tidy_pkgs <- tidyverse::tidyverse_packages()
tidy_pkgs <- gsub("\n\\(>=", "", tidy_pkgs) # replace rogue characters
# Add terminal periods so that 'say' pauses between package names
tidy_pkgs <- paste0(tidy_pkgs, ".")
# Write the list to a text file
write.table(
tidy_pkgs,
file = "say_tidy.txt",
row.names = FALSE,
col.names = FALSE
)
# Get say command to read from text file and output an audio file
system("say -f say_tidy.txt -o say_tidy.aiff")
"broom" "cli" "crayon" "dplyr" "dbplyr" "forcats" "ggplot2" "haven"
"hms" "httr" "jsonlite" "lubridate" "magrittr" "modelr" "purrr" "readr"
"readxl" "reprex" "rlang" "rstudioapi" "rvest" "stringr" "tibble"
"tidyr" "xml2" "tidyverse"
CRAN top 20
You can get the top 20 downloads from CRAN in the last month with the cran_top_downloads()
function from the {cranlogs} package.
Click for code
# Fetch the top 20 downloaded packages from CRAN in past month
cran_top_pkgs <- cranlogs::cran_top_downloads(when = "last-month", count = 20)
# Add terminal periods so that 'say' pauses between package names
cran_top_pkgs$package <- paste0(cran_top_pkgs$package, ".")
# Write the list to a text file
write.table(
cran_top_pkgs$package,
file = "say_cran_top.txt",
row.names = FALSE,
col.names = FALSE
)
# Get say command to read from text file and output an audio file
system("say -f say_cran_top.txt -o say_cran_top.aiff")
"magrittr" "aws.s3" "aws.ec2metadata" "rsconnect" "rlang" "Rcpp" "dplyr"
"ggplot2" "ellipsis" "vctrs" "tibble" "digest" "glue" "pillar" "zeallot"
"backports" "stringr" "markdown" "fansi" "stringi"
Random CRAN packages
You can get the full list of packages currently on CRAN with the CRAN_package_db()
function in the {tools} package (part of base R).
Click for code
# Fetch and clean CRAN packages
cran <- tools::CRAN_package_db()
# Select random packages
set.seed(1337)
crandom_pkgs <- sample(cran$Package, size = 20)
# Add terminal periods so that 'say' pauses between package names
crandom_pkgs <- paste0(cran_rand_pkgs, ".")
# Write the list to a text file
write.table(
crandom_pkgs,
file = "~/Desktop/say_cran_random.txt",
row.names = FALSE,
col.names = FALSE
)
# Get say command to read from text file and output an audio file
system("say -f ~/Desktop/say_cran_random.txt -o ~/Desktop/say_cran_random.aiff")
"NScluster" "nlnet" "Bivariate.Pareto" "lisa" "homtest" "glarma" "ttdo"
"flock" "equSA" "coreCT" "WEE" "xtable" "shinyKGode" "DiffNet" "WGCNA"
"aqfig" "Voss" "tidymv" "gogarch" "erp.easy"
Results
Obviously there’s a lot of subjectivity, but what was strange to your ears? To my southern English ears, it seems like there were a few patterns:
- English word pronounced as expected: {haven}, {broom} and {glue}
- American English: {crayon} (‘crain’)
- unexpected English parsing: {lubridate} (‘loobridot’)
- the trouble with ‘tidy’: {tidyr} (‘tid-ear’ instead of ‘tidy-arr’) and {tidyverse} (‘tid-a-verse’ instead of ‘tidy-verse’)
- the trouble wirh ‘r’: {rvest}, {rlang} and {rstudioapi} (‘r’ not pronounced as ‘arr’ in any of these)
- the trouble with ‘read’: {readr} and {readxl} (‘reed’ becomes ‘ree-add’ because the whole thing is being read as one word)
- spelled out: {vctrs} (rather than ‘vectors’ in a New Zealand accent)
- what the actual heck: {ttdo} (I think it tries to pronounce the whole thing)
And what about {dplyr}? Well, it was something like ‘d’pleur’. I’m pretty sure that’s not quite right.
Of course, there are other text-to-speech engines, which may interpret and synthesise words differently. For example, espeak
vocalises {dplyr} as ‘deepler’ which is definitely not right.
If you’re a user of assistive technology, does the way the machine reads the package names impact your pronunciation of the package name?