Using crosstalk with DT in R Markdown

This is a template for using the crosstalk with several grouped datatable widgets to create filterable reports in R Markdown (i.e. in a non-Shiny setting).

This post demonstrates use of the crosstalk extension to coordinate filtering across multiple DT datatable widgets. The process is embedded in a standard R Markdown document, and is an example of server(Shiny)-less dashboarding/reporting.

Objective

I often need to create reports or dashboards that present information across multiple tables from a database. Frequently, this deliverable is best displayed as a filterable view of the tables themselves in a web browser. This can happen when a primary key in one table (e.g. subject ID) appears in multiple rows of another table (e.g. event based lines of therapy), and I don’t want to rectangle the data into a single flat file because it would complicate or mask necessary information.

A seemingly obvious solution here is Shiny, and that’s certainly true when you have large data / are querying in real time against a database. However, if your data are small to mid-sized (MB, not GB), you can achieve the above goal in a single html page from R Markdown, without any Shiny server overhead. The following outlines a workflow I’ve been using to make this possible; something I wanted to document is how to use crosstalk when you’re calling multiple DT datatable widgets in the same filter_select() call.

Simulate data

Let’s start by simulating some data that looks similar to what we would get from a clinical trial or other healthcare-related database. There will be three tables: demographics, treatments, and follow-up. The code to do this is a bit long, and ancillary to the main message of this post, so it’s hidden below (but feel free to toggle and browse it if you wish!).

Toggle simulation code
library(tidyverse)
library(lubridate)

### Function to simulate follow up tables
sim_data_fup <- function(
  n = 100,
  dt_sim_start = "2000-01-01",
  dt_sim_end = "2019-12-31",
  seed = 12345
) {
  set.seed(seed)
  
  dt_sim_start <- as_date(dt_sim_start)
  dt_sim_end <- as_date(dt_sim_end)
  
  # generate patient IDs
  ids <- sample(111111:999999, n, replace = FALSE)

  # simulate the data table
  data_fup <- tibble(
    id = ids,
    ind_event = sample(0:1, n, replace = TRUE, prob = c(.8, .2)),
    dt_first_contact = sample(dt_sim_start:dt_sim_end, n) %>% as_date(),
  ) %>%
    mutate(
      dt_last_contact =
        dt_first_contact + sample(1:3000, n, replace = TRUE),
    ) %>%
    mutate(
      dt_last_contact =
        if_else(dt_last_contact > dt_sim_end, dt_sim_end, dt_last_contact)
    )
  
  data_fup
}

### Function to simulate demographics tables
### Accepts a vector of IDs (usually identified from the follow up table)
sim_data_demo <- function(
  ids, 
  age_range = 18:100,
  race_opts = c("Black", "White")
) {
  n = length(ids)
  
  data_demo <- tibble(
    id = ids,
    age = sample(age_range, n, replace = TRUE),
    sex = rep("F", n),
    race = sample(race_opts, n, replace = TRUE, prob = c(.2, .8)),
    ind_her2 = sample(0:1, n, replace = TRUE, prob = c(.2, .8))
  )  
  
  data_demo
}

### Function to simulate treatment tables
### Accepts a follow-up table for IDs and first/last contact dates
sim_data_trt <- function(
  data_fup,
  n_trt_opts = 4, #the maximum number of treatments any patient can have
  treatment_choices #a character vector of therapy options
) {
  n <- data_fup %>% n_distinct("id")
  n_trt_opts <- 1:n_trt_opts
  
  # initialize patient_trt table; we'll bind rows at the end of each loop iteration
  patient_trt <- tibble()
  
  for(i in 1:n) {
    n_trt <- sample(n_trt_opts, size = 1)
    
    ith_patient <- data_fup %>% slice(i) 
    
    ith_patient_trt <- ith_patient %>%
      select(id) %>%
      slice(rep(1, times = n_trt)) %>% # elongate the table with a row for each treatment
      mutate(
        trt = sample(treatment_choices, size = n_trt, replace = TRUE),
        dt_trt_start = sample(
          seq(ith_patient$dt_first_contact, ith_patient$dt_last_contact, by="days") %>%
            sort(),
          size = n_trt, replace = FALSE
        )
      ) %>%
      arrange(dt_trt_start) %>%
      mutate(
        dt_trt_end = lead(dt_trt_start) - 1,
        dt_trt_end = case_when(
          row_number() == n() ~ 
            sample(seq(last(dt_trt_start), ith_patient$dt_last_contact, by="days"), 1),
          TRUE ~ dt_trt_end
        )
      ) 
    
    patient_trt <- patient_trt %>% bind_rows(ith_patient_trt)
  } 
  
  patient_trt
}

# simulate follow-up data
data_fup <- sim_data_fup(n = 500)
# simulate demographic data
data_demo <- sim_data_demo(ids = data_fup$id)
# simulate treatment data
data_trt <- 
  sim_data_trt(
    data_fup, n_trt_opts = 5, 
    treatment_choices = c(
      "TRASTUZUMAB",
      "PACLITAXEL/CAPECITABINE",
      "PERTUZUMAB",
      "PACLITAXEL",
      "HERCEPTIN",
      "HERCEPTIN/PACLITAXEL"
    )
  )

Ultimately, we end up with 3 tables like the following, which we would like the end user to filter interactively by subject ID.

data_fup
## # A tibble: 500 x 4
##        id ind_event dt_first_contact dt_last_contact
##     <int>     <int> <date>           <date>         
##  1 842910         1 2005-12-04       2006-12-01     
##  2 734307         0 2013-08-17       2019-12-31     
##  3 831334         0 2019-08-01       2019-12-31     
##  4 579848         0 2018-09-01       2019-02-13     
##  5 438865         0 2015-04-18       2017-12-13     
##  6 469101         1 2011-04-29       2014-09-17     
##  7 989456         0 2004-10-11       2009-05-22     
##  8 173478         0 2008-07-07       2010-11-21     
##  9 591277         0 2005-05-16       2005-09-18     
## 10 485156         1 2002-01-13       2006-04-02     
## # … with 490 more rows
data_demo
## # A tibble: 500 x 5
##        id   age sex   race  ind_her2
##     <int> <int> <chr> <chr>    <int>
##  1 842910    62 F     White        1
##  2 734307    30 F     White        1
##  3 831334    99 F     White        1
##  4 579848    48 F     White        1
##  5 438865    36 F     White        1
##  6 469101    55 F     Black        1
##  7 989456    53 F     White        1
##  8 173478    49 F     White        1
##  9 591277    67 F     White        0
## 10 485156    65 F     White        1
## # … with 490 more rows
data_trt
## # A tibble: 1,487 x 4
##        id trt                     dt_trt_start dt_trt_end
##     <int> <chr>                   <date>       <date>    
##  1 842910 HERCEPTIN/PACLITAXEL    2005-12-13   2006-06-23
##  2 842910 PACLITAXEL/CAPECITABINE 2006-06-24   2006-07-09
##  3 842910 PACLITAXEL/CAPECITABINE 2006-07-10   2006-10-23
##  4 842910 PERTUZUMAB              2006-10-24   2006-11-27
##  5 734307 PACLITAXEL              2013-10-02   2015-11-08
##  6 831334 PERTUZUMAB              2019-12-08   2019-12-15
##  7 579848 PERTUZUMAB              2018-11-24   2018-12-28
##  8 438865 PACLITAXEL/CAPECITABINE 2015-08-01   2016-03-17
##  9 438865 PERTUZUMAB              2016-03-18   2016-04-15
## 10 438865 PACLITAXEL              2016-04-16   2016-08-12
## # … with 1,477 more rows

Filter the tables with crosstalk

Since DT is a crosstalk compatible widget, it really only takes a few lines to get the desired outcome. The general strategy is to send your data frame into a new SharedData object with a line like SharedData$new(my_data_frame). Often, a single SharedData object is sufficient for crosstalk to coordinate multiple widgets, such as on the crosstalk examples page. Sometimes though, you might want crosstalk to operate across several data frames with the same key, meaning you need multiple calls to SharedData$new(). The trick here is to use the key and group arguments in SharedData$new(). In this case, key = ~id and we assign the name group = "shared_obj". Next, we pass any of the SharedData objects to filter_select() which recognizes that the object belongs to a group. Something that was a “gotcha” for me: the group argument in filter_select() is not the same as group in SharedData, rather, it’s the values that populate the select box (in this case, id).

All of this comes together as follows:

library(DT)
library(crosstalk)

# set up a group of SharedData objects
shared_demo <- SharedData$new(data_demo, key = ~id, group = "shared_obj")
shared_trt <- SharedData$new(data_trt, key = ~id, group = "shared_obj")
shared_fup <- SharedData$new(data_fup, key = ~id, group = "shared_obj")

# send the group to the filtering selection box
filter_select(
  id = "id-selector", label = "ID", 
  sharedData = shared_demo, group = ~id
)

Demographics

datatable(shared_demo)

Treatment

datatable(shared_trt)

Follow-up

datatable(shared_fup)

Summary

Note that this approach can work particularly well with rmdformats or other R Markdown theming. For example, here is the above workflow embedded in rmdformats::robobook which can be found in this repo with corresponding interactive webpage. Enjoy!


Published by on in R and tagged R markdown and tips using 3265 words.