Code
library(dplyr)
Tyler Clark
October 10, 2022
Whenever possible I try to use graphs and plots to back up my story about data analysis. That said, sometimes you’ve just got use some data tables. That could be summary statistics, regression/ANOVA tables, or maybe wedding the visual aids by creating sparklines or mini-plots within a table.
In 2022 there are no shortage of R libraries for making beautiful graphs and tables, but my most commonly used tools right now are kableExtra and flextable. gt looks promising but it’s still fairly new. I’ve tried using huxtable but for whatever reason it’s never quite clicked with me. DT is a great library but I try to avoid it for rmarkdown/quarto documents due to how large it blows up HTML files. It’s great when I need to include a full dataset in a report though, or in Shiny apps.
I am unaware of any pdf-exclusive r table libraries, but that doesn’t mean they don’t exist. All of the previously mentioned tables can produce HTML tables, but I really only use:
mostly in that order.
kableExtra is my go-to for HTML tables in my documents. I think the default settings look pretty good, and it’s easy to tweak them to get exactly what I want.
iris |>
group_by(Species) |>
summarise(across(.fns=list(mean=mean, sd=sd))) |>
kableExtra::kbl(
col.names = c("Species", rep(c("Mean", "SD"), 4)),
digits=1
) |>
kableExtra::add_header_above(
c(" ",
"Sepal Length"=2,
"Sepal Width"=2,
"Petal Length"=2,
"Petal Width"=2),
align = "c"
) |>
kableExtra::kable_styling(
bootstrap_options = c("hover", "responsive")
)
Sepal Length |
Sepal Width |
Petal Length |
Petal Width |
|||||
---|---|---|---|---|---|---|---|---|
Species | Mean | SD | Mean | SD | Mean | SD | Mean | SD |
setosa | 5.0 | 0.4 | 3.4 | 0.4 | 1.5 | 0.2 | 0.2 | 0.1 |
versicolor | 5.9 | 0.5 | 2.8 | 0.3 | 4.3 | 0.5 | 1.3 | 0.2 |
virginica | 6.6 | 0.6 | 3.0 | 0.3 | 5.6 | 0.6 | 2.0 | 0.3 |
It’s pretty easy to get a nice-looking table, and I really like the “hover” and “responsive” bootstrap options in kableExtra::kable_styling()
. Hover gives a quasi-interactive feel to the table, without having to load all the javascript required for sorting, filtering, etc. DT
is better suited for that level of interactivity, but most tables probably don’t need that.
Flextable also makes pretty nice HTML tables, but they are more static than kableExtra’s.
iris |>
group_by(Species) |>
summarise(across(.fns=list(mean=mean, sd=sd))) |>
flextable::flextable() |>
flextable::set_header_labels(
Sepal.Length_mean = "Mean",
Sepal.Length_sd = "SD",
Sepal.Width_mean = "Mean",
Sepal.Width_sd = "SD",
Petal.Length_mean = "Mean",
Petal.Length_sd = "SD",
Petal.Width_mean = "Mean",
Petal.Width_sd = "SD"
) |>
flextable::add_header_row(
values=c("", "Sepal Length", "Sepal Width", "Petal Length", "Petal Width"),
colwidths = c(1, 2, 2, 2, 2)
) |>
flextable::colformat_double(j=2:9, digits=1) |>
flextable::align(i=1, j=2:9, align="center", part="header")
Sepal Length | Sepal Width | Petal Length | Petal Width | |||||
---|---|---|---|---|---|---|---|---|
Species | Mean | SD | Mean | SD | Mean | SD | Mean | SD |
setosa | 5.0 | 0.4 | 3.4 | 0.4 | 1.5 | 0.2 | 0.2 | 0.1 |
versicolor | 5.9 | 0.5 | 2.8 | 0.3 | 4.3 | 0.5 | 1.3 | 0.2 |
virginica | 6.6 | 0.6 | 3.0 | 0.3 | 5.6 | 0.6 | 2.0 | 0.3 |
One of the big differences between the 2 packages is how you make changes. kableExtra, being based on knitr, leans towards to original R paradigm of a few functions with a lot of internal options, while flextable leans more towards to modern approach of several small functions that have a few options each. Neither approach is better than the other. With many small functions there are more commands that have to be remembered, but they are usually named in a way that easily explains what they do, and they are typically logicial to read and don’t require memorizing (and maintaining!) as much documentation. One drawback is having to nest all of those functions, or store output to variables over and over, but pipe functions (|>
or %>%
) have largely eliminate that problem. Now the only thing to watch for is long strings of spaghetti code, but that’s an issue regardless.
DT is where I turn to in Shiny, or when I need to include a dataset as a table, but I try to avoid that.
It’s a pretty neat library, but I get overwhelmed with all of the options. Especially since many of the options are set by passing HTML/CSS and javascript code directly into the R code. Luckily there are some helpful guides from Rstudio and from the authors
kableExtra and flextable can also make really nice PDF tables as well. For small tables I lean towards flextable just because I think it looks “better” right away. For longer tables I lean towards kableExtra because of some of the row highlighting and spacing it does easily. Let’s see some examples
iris |>
group_by(Species) |>
summarise(across(.fns=list(mean=mean, sd=sd))) |>
kableExtra::kbl(
booktabs = T,
col.names = c("Species", rep(c("Mean", "SD"), 4)),
digits=1
) |>
kableExtra::add_header_above(
c(" ",
"Sepal Length"=2,
"Sepal Width"=2,
"Petal Length"=2,
"Petal Width"=2),
align = "c"
) |>
kableExtra::kable_classic_2()
I ran the above code in a PDF format quarto document with mainfont: Cambria
. I really don’t care for the default LaTeX font. Using the various styling functions in kableExtra speeds up the layout process. I tend not to use them for HTML tables though, just kable_styling()
.
For longer tables in a PDF document I make use of the latex_options
flag in kable_styling()
:
The default of adding a little extra vertical padding every 5 lines, along with the striped
option helps readability.
For this flextable example, I used the exact same code as the previous HTML example, but ran it in a PDF format quarto document. The library really lives up to it’s name!
Longer flextables are where I run into issues though:
evens <- function(x) subset(x, x %% 2 == 0)
fives <- function(x) subset(x, x %% 5 == 0)
irisSubset <- iris |>
head(20)
irisSubset |>
flextable::flextable() |>
flextable::set_header_labels(
Sepal.Length="Sepal Length",
Sepal.Width="Sepal Width",
Petal.Length="Petal Length",
Petal.Width="Petal Width") |>
flextable::align(j=5, align="center", part="all") |>
flextable::bg(i=evens(c(1:length(irisSubset[,1]))), bg="#eeeeee") |>
flextable::padding(i=fives(c(1:length(irisSubset[,1]))), padding.bottom = 20) |>
flextable::autofit()
The color and padding have to be manually defined which adds to code length and complexity. That’s not always a bad thing, but unfortunately not all of this code works in both HTML and PDF. flextable::padding()
is the main issue. Theoretically you could add blank rows or something but that just further clutters up the code when you could just use kableExtra. In general, I find that simpler is better when it come to LaTeX and flextable.
Most of the table libraries are focused on HTML, so outputting to Word usually involves using webshot
to save a png and insert it into the document, or tossing out most of the formatting with a markdown table.
If you use the prefer-html: true
option in the YAML header, kableExtra can output simple markdown tables into a document:
But notice that they lose most of the formatting options that were applied. Also notice that while there is a caption, Word didn’t recognize it as the table caption so there is no numbering or document linking, and that’s even with using Quarto’s tbl-cap
chunk option!
Flextable was designed to generate real MS Office tables though:
These tables look pretty good (if nothing else they’re drawn exactly as I told flextable). The padding
option also works in Word documents. As before, the flextable output all uses the same code for HTML, PDF, and Docx report formats.
In general, when I am making tables in an HTML document I’ll go with kableExtra because it very quickly gets me on the right track. However, there are many, many other options. When I am working with PDF documents I’ll use flextable for the simple stuff and kableExtra when I need better control over the LaTeX code being generated. When I am making Word documents, there’s really no better choice than flextable.
I often create HTML and PDF/Docx reports for the same data. HTML is great for opening on a screen and interacting with the report, while PDF/Word can be more easily emailed, printed, etc. To facilitate my above mentioned table choices, I use the following code early in the document:
When the report is being compiled, the variable is_html
will be TRUE
for HTML documents and FALSE
for all others. Then I can define functions like:
irisSumm <- iris |>
group_by(Species) |>
summarise(across(.fns=list(mean=mean, sd=sd)))
if(is_html){
irisSumm |>
kableExtra::kbl(
col.names = c("Species", rep(c("Mean", "SD"), 4)),
digits=1
) |>
kableExtra::add_header_above(
c(" ",
"Sepal Length"=2,
"Sepal Width"=2,
"Petal Length"=2,
"Petal Width"=2),
align = "c"
) |>
kableExtra::kable_styling(
bootstrap_options = c("hover", "responsive")
)
} else {
irisSumm |>
flextable::flextable() |>
flextable::set_header_labels(
Sepal.Length_mean = "Mean",
Sepal.Length_sd = "SD",
Sepal.Width_mean = "Mean",
Sepal.Width_sd = "SD",
Petal.Length_mean = "Mean",
Petal.Length_sd = "SD",
Petal.Width_mean = "Mean",
Petal.Width_sd = "SD"
) |>
flextable::add_header_row(
values=c("", "Sepal Length", "Sepal Width", "Petal Length", "Petal Width"),
colwidths = c(1, 2, 2, 2, 2)
) |>
flextable::colformat_double(j=2:9, digits=1) |>
flextable::align(i=1, j=2:9, align="center", part="header")
}
evens <- function(x) subset(x, x %% 2 == 0)
fives <- function(x) subset(x, x %% 5 == 0)
irisSubset <- iris |>
head(20)
if(is_html){
irisSubset |>
kableExtra::kbl(
booktabs = T,
col.names = c("Sepal Length", "Sepal Width",
"Petal Length", "Petal Width",
"Species"),
digits=1
) |>
kableExtra::kable_styling(bootstrap_options = c("hover", "responsive", "condensed"))
} else{
irisSubset |>
flextable::flextable() |>
flextable::set_header_labels(
Sepal.Length="Sepal Length",
Sepal.Width="Sepal Width",
Petal.Length="Petal Length",
Petal.Width="Petal Width") |>
flextable::align(j=5, align="center", part="all") |>
flextable::bg(i=evens(c(1:length(irisSubset[,1]))), bg="#eeeeee") |>
flextable::padding(i=fives(c(1:length(irisSubset[,1]))), padding.bottom = 20) |>
flextable::autofit()
}