tl;dr
I presented some slides at the EARL 2022 conference about {a11ytables}: an R package that helps automate the production of reproducible and accessible spreadsheets, with a focus on publication of government statistics.
Counting sheets
The UK government publishes a lot of spreadsheets that contain statistical tables. Compared to each other—and to themselves over time—these files are often:
- inconsistent in structure (e.g. cover or contents sheets are missing)
- inconsistent in style (e.g. different fonts, different shorthand codes for suppressed values)
- inaccessible to users of assistive technology (e.g. they contain blank columns or unannounced footnotes)
Luckily, the government’s Analysis Function released some excellent guidance for releasing statistics in spreadsheets, with particular attention to accessibility.
The government’s grassroots Reproducible Analytical Pipelines (RAP) movement is also growing at pace. RAP’s purpose is to overcome a legacy of fragmented point-and-click processes into code-driven end-to-end pipelines that improve speed, accuracy and reproducibility; including workflows that generate statistical spreadsheets for publication.
It will take time for these approaches to become 100% embedded across government, due to factors like the inevitable inertia that comes with trying to leave legacy processes behind.1
How can we grease the wheels?
Easy does it
A major aim of {a11ytables} is to make it easy for stats producers to more easily complete the last mile of their ‘data-in to spreadsheet-out’ pipeline. As such, the workflow is relatively simple and is composed of only three functions (arguments ignored for brevity):
create_a11ytable() |>
generate_workbook() |>
openxlsx::saveWorkbook()
Basically:
- Pass information and data as arguments to
create_a11ytable()
, which creates a special a11ytables-class dataframe representation of your spreadsheet content - Pass that object to
generate_workbook()
to convert it to a Workbook-class object that applies the required structure and styling - Use
saveWorkbook()
from the {openxlsx} package to write the spreadsheet output to an xlsx file
I recommend that you read the vignettes and function documentation on the package website to better understand how to use {a11ytables} and to learn about its caveats3; I won’t go into depth in this post.
Over-easy does it
I wrote some slides about the package and presented it at the EARL 2022 conference4 in London. Yes, to expose the package, but also to make a wider point about the general importance of reproducibility, accessibility and the power of reusable tools.
You can access the slides for my talk on the web, or find the source on GitHub.
The slides show an example of some tables published by the UK government—the latest UK egg statistics5—and walks through how they might be developed using {a11ytables}.
I wrote the slides in Quarto and made heavy use of {quartostamp}—my package of Quarto helpers exposed as an RStudio Addin—which I wrote about recently. Click ‘settings’ in the hamburger menu (lower left) to go fullscreen, see presenter notes, or get a slide overview.
Session info
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.2.0 (2022-04-22)
## os macOS Big Sur/Monterey 10.16
## system x86_64, darwin17.0
## ui X11
## language (EN)
## collate en_GB.UTF-8
## ctype en_GB.UTF-8
## tz Europe/London
## date 2022-09-07
## pandoc 2.18 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/tools/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## blogdown 1.9 2022-03-28 [1] CRAN (R 4.2.0)
## bookdown 0.26 2022-04-15 [1] CRAN (R 4.2.0)
## bslib 0.3.1 2021-10-06 [1] CRAN (R 4.2.0)
## cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.0)
## digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.0)
## evaluate 0.15 2022-02-18 [1] CRAN (R 4.2.0)
## fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)
## htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.2.0)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.2.0)
## jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.2.0)
## knitr 1.39 2022-04-26 [1] CRAN (R 4.2.0)
## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
## rlang 1.0.2 2022-03-04 [1] CRAN (R 4.2.0)
## rmarkdown 2.14 2022-04-25 [1] CRAN (R 4.2.0)
## rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.0)
## sass 0.4.1 2022-03-23 [1] CRAN (R 4.2.0)
## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)
## stringi 1.7.6 2021-11-29 [1] CRAN (R 4.2.0)
## stringr 1.4.0 2019-02-10 [1] CRAN (R 4.2.0)
## xfun 0.30 2022-03-02 [1] CRAN (R 4.2.0)
## yaml 2.3.5 2022-02-21 [1] CRAN (R 4.2.0)
##
## [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
##
## ──────────────────────────────────────────────────────────────────────────────
Check out the excellent RAP strategy and Goldacre Review for how this might be achieved.↩︎
This blog and everything on it is personal and doesn’t represent government policy in any possible shape or form. Unless the Geospatial Commission needs me for drawing procedural dungeon maps, or whatever.↩︎
Note that the package is not intended for creating perfectly accessible spreadsheets but will help with the bulk of the work needed. Users of the package should refer back to the main spreadsheet guidance or the spreadsheet accessibility checklist after using it to make sure nothing has been missed. Please email analysis.function@ons.gov.uk if you use the package so they can monitor its use and the outputs produced.↩︎
‘Enterprise Applications of the R Language’. Long-time readers may remember that I spoke at EARL in 2018 about the {crosstalk} package, largely through the medium of memes.↩︎
Long-time readers may remember that I’ve used this publication before to demonstrate the {drake} package for workflow reproducibility.↩︎