End-to-End Submissions in R with the Pharmaverse

Pharmaverse Examples

Resource with example datasets

Introduction

Slides

Instructors

  • Daniel Sjoberg (MSK -> Roche/Genentech)
  • Becca Krouse (GSK)
  • Ben Straub (GSK)
  • Ram Ganapathy (Syneos -> Roche/Genentech)

What is this workshop

  • Get data into CDISC standards (ADaM and SDTM domains)

  • Pharmaverse

  • Align pharma industry on a standard process

  • Data collection to submission

  • R packages to support Clinical Reporting in R

  • Create SDTM from raw data (CDASH and non-CDASH formats)

  • Create ADaM datasets from SDTM

  • Multiple ways to create tables

End-to-End

  1. pharmaverse raw
  2. pharmaverse SDTM
  3. pharmaverse ADaM

Exercise 1

Code
# Let's warm-up!

library(dplyr)
library(pharmaverseadam)

# Using dplyr:
#  - From the ADSL dataset:
#   - Subset to the safety population (SAFFL == "Y")
#   - calculate the number of unique subjects in each treatment group (TRT01A)  

# View(pharmaverseadam::adsl)

knitr::kable(
  pharmaverseadam::adsl |> 
    dplyr::filter(SAFFL == "Y") |> 
    dplyr::group_by(TRT01A) |>
    dplyr::summarise(n = dplyr::n_distinct(SUBJID))
)
TRT01A                  n
Placebo                86
Xanomeline High Dose   72
Xanomeline Low Dose    96

SDTM Mapping

Slides

SDTM

Study Data Tabulation Model

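  • Mapping raw data to standards
  • raw (EDC) -> SDTM is the difficult step, because raw data structures vary by EDC system and vendor
  • SDTM is standardized across companies
  • SDTM -> ADaM is comparatively easy, since the input is already standardized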

sdtm.oak package

  • Accommodates varying raw data structures from different EDC systems and vendors

Algorithms

  • Variables with similar mapping algorithms are grouped together
  • 16,000 variables can be grouped into 22 groups
  • Algorithms are the backbone of oak

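assign_no_ct() -> maps a raw variable to a target variable with no controlled terminology

assign_ct() -> 1:1 mapping with controlled terminology

assign_datetime() -> maps collected dates/times to ISO 8601 format

hardcode_ct() -> hardcodes a controlled-terminology value for text printed on the EDC form (e.g., units)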
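
As a rough illustration of what "ISO 8601 format" means for a collected date, here is a base-R sketch. The raw value and its day/month/year layout are made-up assumptions, and this is not how assign_datetime() is implemented; it only shows the target format.

```r
# Hypothetical raw date-time as it might be collected on an EDC form
raw_dtc <- "05/11/2020 14:30"   # assumed layout: dd/mm/yyyy HH:MM

# Parse with an explicit format, then print as ISO 8601 (YYYY-MM-DDThh:mm)
parsed  <- as.POSIXct(raw_dtc, format = "%d/%m/%Y %H:%M", tz = "UTC")
iso_dtc <- format(parsed, "%Y-%m-%dT%H:%M")

iso_dtc
```

Stating the input layout explicitly matters: the same string read as mm/dd/yyyy would give a different date.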

Compared to dplyr

  • do not have to write case_when statements
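
To make the comparison concrete, here is a minimal base-R sketch of the idea behind a controlled-terminology mapping: one lookup table replaces a hand-written case_when() per variable. The column names follow the study_ct table used in this workshop's exercises, the raw values are invented, and how sdtm.oak itself treats unmatched values may differ from this sketch.

```r
# Toy controlled-terminology table (collected text -> submission value)
ct <- data.frame(
  collected_value = c("No", "Yes"),
  term_value      = c("N", "Y"),
  stringsAsFactors = FALSE
)

# Hypothetical raw values; "Unknown" has no entry in the codelist
raw_values <- c("Yes", "No", "Yes", "Unknown")

# A single lookup does the recoding; unmatched values become NA here
mapped <- ct$term_value[match(raw_values, ct$collected_value)]

mapped
```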

Topic Variables

  1. Identifier (ID of record)
  2. Qualifier (what is the variable)
  3. Timing (when was variable collected)

EDC Domains

EDC Domain                 Code
Demographics               DM
Medical History            MH
Adverse Events             AE
Concomitant Medications    CM
Laboratory Results         LB
Vital Signs                VS
Physical Examination       PE
Study Drug Administration  DA
Subject Disposition        DS
Efficacy Assessments       EF
Safety Assessments         SA
Questionnaires             QS
Imaging Assessments        IMG
Randomization              RAND
Protocol Deviations        PD

Example Raw -> SDTM mapping

Code

  • vs Domain example
  • dm domain example
Code
library(sdtm.oak)
library(pharmaverseraw)
library(dplyr)

# AE aCRF - https://github.com/pharmaverse/pharmaverseraw/blob/main/vignettes/articles/aCRFs/AdverseEvent_aCRF.pdf

# Read in Raw dataset ----
ae_raw <- pharmaverseraw::ae_raw

# Generate oak_id_vars ----
ae_raw <- ae_raw %>%
  generate_oak_id_vars(
    pat_var = "PATNUM",
    raw_src = "ae_raw"
  )

# Read in Controlled Terminology
study_ct <-  data.frame(
  codelist_code = c("C66742", "C66742"),
  term_code = c("C49487", "C49488"),
  term_value = c("N", "Y"),
  collected_value = c("No", "Yes"),
  term_preferred_term = c("No", "Yes"),
  term_synonyms = c("No", "Yes"),
  stringsAsFactors = FALSE
)

# Exercise 1 ------------------------------------------------
# Map AETERM from raw_var=IT.AETERM, tgt_var=AETERM
ae <-
  # Derive topic variable
  # Map AETERM using assign_no_ct
  assign_no_ct(
    raw_dat = ae_raw,
    raw_var = "IT.AETERM",
    tgt_var = "AETERM",
    id_vars = oak_id_vars()
  )

# Exercise 2 ------------------------------------------------
# Map AESER from raw_var=IT.AESER, tgt_var=AESER. Codelist code for AESER is C66742
ae <- ae %>%
  # Map AESER using assign_ct
  assign_ct(
    raw_dat = ae_raw,
    raw_var = "IT.AESER",
    tgt_var = "AESER",
    ct_spec = study_ct,
    ct_clst = "C66742",
    id_vars = oak_id_vars()
  )

# Exercise 3 ------------------------------------------------
# Map AESDTH from raw_var=IT.AESDTH, tgt_var=AESDTH. Annotation text is 
# If "Yes" then AESDTH = "Y" else Not Submitted. Codelist code for AESDTH is C66742

ae <- ae %>%
  # Map AESDTH using condition_add & assign_ct, raw_var=IT.AESDTH, tgt_var=AESDTH
  assign_ct(
    raw_dat = condition_add(ae_raw, IT.AESDTH == "Yes"),
    raw_var = "IT.AESDTH",
    tgt_var = "AESDTH",
    ct_spec = study_ct,
    ct_clst = "C66742",
    id_vars = oak_id_vars()
  )
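
The annotation in the last exercise ("If Yes then AESDTH = Y, else Not Submitted") can be read as a conditional mapping. The base-R sketch below shows only that logic, with invented input values; it does not call sdtm.oak and is not a description of how condition_add() works internally.

```r
# Hypothetical collected answers to the "did the event result in death" question
it_aesdth <- c("Yes", "No", NA, "Yes")

# Only records that meet the condition get a value; all others stay missing
aesdth <- ifelse(!is.na(it_aesdth) & it_aesdth == "Yes", "Y", NA_character_)

aesdth
```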

ADaM

ADSL - Subject Level Data

  • Each subject has 1 row/record
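
A quick way to check the one-row-per-subject property is to look for duplicated subject identifiers. This is a small sketch on invented data; the same check could be run on a real ADSL.

```r
# Toy subject-level data; the USUBJID values are made up
adsl_toy <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003"),
  SAFFL   = c("Y", "Y", "N"),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns 0 when every USUBJID appears exactly once
one_row_per_subject <- anyDuplicated(adsl_toy$USUBJID) == 0

one_row_per_subject
```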

ADVS - Vital Signs dataset

  • ADVS is a basic data structure
  • focus is on records, not variables
  1. Apply info from Specs
  2. Derive vars and records
  3. Prepare dataset for submissions

Exercise 3

Code
# Exercise 1
# Update date and time imputation arguments so that any dates or times
# that are imputed are the last month/day of the year and 23:59:59

library(tibble)
library(lubridate)
library(admiral)

posit_mh <- tribble(
  ~USUBJID, ~MHSTDTC,
  1,        "2019-07-18T15:25:40",
  1,        "2019-07-18T15:25",
  1,        "2019-07-18",
  2,        "2024-02",
  2,        "2019",
  2,        "2019---07",
  3,        ""
)

paste0("Problem 1")

[1] "Problem 1"

Code
knitr::kable(
  derive_vars_dtm(
    dataset = posit_mh,
    new_vars_prefix = "AST",
    dtc = MHSTDTC,
    highest_imputation = "M",
    date_imputation = "last",
    time_imputation = "last"
  )
)
USUBJID  MHSTDTC              ASTDTM               ASTDTF  ASTTMF
1        2019-07-18T15:25:40  2019-07-18 15:25:40  NA      NA
1        2019-07-18T15:25     2019-07-18 15:25:59  NA      S
1        2019-07-18           2019-07-18 23:59:59  NA      H
2        2024-02              2024-02-29 23:59:59  D       H
2        2019                 2019-12-31 23:59:59  M       H
2        2019---07            2019-12-31 23:59:59  M       H
3                             NA                   NA      NA
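
The imputed dates in the output can be checked by hand. The base-R sketch below reproduces only the "last day" arithmetic for two of the partial dates; last_day_of_month() is a hypothetical helper written for this illustration, not an admiral function.

```r
# Last calendar day of a year/month: step to the first day of the
# following month, then go back one day
last_day_of_month <- function(year, month) {
  first_of_month <- as.Date(sprintf("%04d-%02d-01", year, month))
  first_of_next  <- seq(first_of_month, by = "month", length.out = 2)[2]
  first_of_next - 1
}

# "2024-02": the day is missing, so impute the last day of February 2024
feb_2024 <- last_day_of_month(2024, 2)

# "2019": month and day are missing, so impute December and its last day
dec_2019 <- last_day_of_month(2019, 12)

format(c(feb_2024, dec_2019))
```

2024 is a leap year, which is why the imputed day for "2024-02" is the 29th.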
Code
# Exercise 2
# Update set_values_to argument for the formula
# MAP Formula: MAP = (SYSBP + 2*DIABP) / 3

ADVS <- tribble(
  ~USUBJID,      ~PARAMCD, ~PARAM,                            ~AVALU,  ~AVAL, ~VISIT,
  "01-701-1015", "DIABP",  "Diastolic Blood Pressure (mmHg)", "mmHg",    51, "BASELINE",
  "01-701-1015", "SYSBP",  "Systolic Blood Pressure (mmHg)",  "mmHg",   121, "BASELINE",
  "01-701-1028", "DIABP",  "Diastolic Blood Pressure (mmHg)", "mmHg",    79, "BASELINE",
  "01-701-1028", "SYSBP",  "Systolic Blood Pressure (mmHg)",  "mmHg",   130, "BASELINE",
) 

paste0("Problem 2")

[1] "Problem 2"

Code
knitr::kable(
  derive_param_computed(
    ADVS,
    by_vars = exprs(USUBJID, VISIT),
    parameters = c("SYSBP", "DIABP"),
    set_values_to = exprs(
      AVAL = (AVAL.SYSBP + 2 * AVAL.DIABP) / 3,
      PARAMCD = "MAP",
      PARAM = "Mean Arterial Pressure (mmHg)",
      AVALU = "mmHg",
    )
  )
)
USUBJID      PARAMCD  PARAM                            AVALU  AVAL       VISIT
01-701-1015  DIABP    Diastolic Blood Pressure (mmHg)  mmHg    51.00000  BASELINE
01-701-1015  SYSBP    Systolic Blood Pressure (mmHg)   mmHg   121.00000  BASELINE
01-701-1028  DIABP    Diastolic Blood Pressure (mmHg)  mmHg    79.00000  BASELINE
01-701-1028  SYSBP    Systolic Blood Pressure (mmHg)   mmHg   130.00000  BASELINE
01-701-1015  MAP      Mean Arterial Pressure (mmHg)    mmHg    74.33333  BASELINE
01-701-1028  MAP      Mean Arterial Pressure (mmHg)    mmHg    96.00000  BASELINE
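
The two derived MAP rows can be verified directly from the formula. This is a plain base-R check using the baseline values from the ADVS tribble, independent of admiral.

```r
# Baseline values taken from the ADVS example data
sysbp <- c("01-701-1015" = 121, "01-701-1028" = 130)
diabp <- c("01-701-1015" = 51,  "01-701-1028" = 79)

# MAP = (SYSBP + 2 * DIABP) / 3
map <- (sysbp + 2 * diabp) / 3

round(map, 5)
```

For subject 01-701-1015 this is (121 + 102) / 3 = 74.33333, and for 01-701-1028 it is (130 + 158) / 3 = 96, matching the table.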

ARDs - Analysis Results Datasets

  • Tabulate and summarise categorical and continuous variables
  • cards computes summary statistics
  • cardx handles statistical analyses

Exercise 4

Code
# ARD Exercise: Adverse Events summaries using {cards}


# Setup: run this first! --------------------------------------------------

# Load necessary packages
library(cards)

# Import & subset data
adsl <- pharmaverseadam::adsl |> 
  dplyr::filter(SAFFL=="Y")

adae <- pharmaverseadam::adae |> 
  dplyr::filter(SAFFL=="Y") |> 
  dplyr::filter(AESOC %in% unique(AESOC)[1:3]) |> 
  dplyr::group_by(AESOC) |> 
  dplyr::filter(AEDECOD %in% unique(AEDECOD)[1:3]) |> 
  dplyr::ungroup()

# Exercise ----------------------------------------------------------------

# A. Calculate the number and percentage of *unique* subjects with at least one AE:
#  - By each SOC (AESOC)
#  - By each Preferred term (AEDECOD) within SOC (AESOC)
# By every combination of treatment group (ARM) 

ard_stack_hierarchical(
  data = adae,
  variables = c(AESOC,AEDECOD),
  by = ARM, 
  id = USUBJID,
  denominator = adsl
) 

   group1 group1_level group2 group2_level variable variable_level stat_name stat_label  stat
1                                          ARM      Placebo        n         n             86
2                                          ARM      Placebo        N         N            254
3                                          ARM      Placebo        p         %          0.339
4                                          ARM      Xanomeli…      n         n             84
5                                          ARM      Xanomeli…      N         N            254
6                                          ARM      Xanomeli…      p         %          0.331
7                                          ARM      Xanomeli…      n         n             84
8                                          ARM      Xanomeli…      N         N            254
9                                          ARM      Xanomeli…      p         %          0.331
10 ARM    Placebo                          AESOC    GASTROIN…      n         n             12
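
The p rows in this output are proportions, not percentages. A short base-R check using the n and N values printed for the ARM rows shows the relationship; it only redoes the arithmetic and does not call cards.

```r
# Counts read from the ARM rows of the output
n_arm <- c(86, 84, 84)   # subjects per arm, in row order
N_all <- 254             # denominator reported in the N rows

# p is stored as a proportion; multiply by 100 only when displaying it
p_arm <- n_arm / N_all

round(p_arm, 3)
```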

Code
# B. [*BONUS*] Modify the code from part A to include overall number/percentage of
# subjects with at least one AE, regardless of SOC and PT

ard_stack_hierarchical(
  data = adae,
  variables = c(AESOC, AEDECOD),
  by = ARM, 
  id = USUBJID,
  denominator = adsl,
  over_variables = TRUE
) 

   group1 group1_level group2 group2_level variable                      variable_level stat_name stat_label  stat
1                                          ARM                           Placebo        n         n             86
2                                          ARM                           Placebo        N         N            254
3                                          ARM                           Placebo        p         %          0.339
4                                          ARM                           Xanomeli…      n         n             84
5                                          ARM                           Xanomeli…      N         N            254
6                                          ARM                           Xanomeli…      p         %          0.331
7                                          ARM                           Xanomeli…      n         n             84
8                                          ARM                           Xanomeli…      N         N            254
9                                          ARM                           Xanomeli…      p         %          0.331
10 ARM    Placebo                          ..ard_hierarchical_overall..  TRUE           n         n             31

tfrmt - Nicely formatting ARDs

Code
# Table Exercise: AE summary table using {tfrmt}

# For this exercise, we will use the AE ARD from the last section to
# create a {tfrmt} table


# Setup: run this first! --------------------------------------------------

## Load necessary packages
library(cards)
library(dplyr)
library(tidyr)
library(tfrmt)

## Import & subset data
adsl <- pharmaverseadam::adsl |> 
  dplyr::filter(SAFFL=="Y")

adae <- pharmaverseadam::adae |> 
  dplyr::filter(SAFFL=="Y") |> 
  dplyr::filter(AESOC %in% unique(AESOC)[1:3]) |> 
  dplyr::group_by(AESOC) |> 
  dplyr::filter(AEDECOD %in% unique(AEDECOD)[1:3]) |> 
  dplyr::ungroup()

## Create AE Summary using cards
ard_ae <- ard_stack_hierarchical(
  data = adae,
  variables = c(AESOC, AEDECOD),
  by = ARM, 
  id = USUBJID,
  denominator = adsl,
  over_variables = TRUE,
  statistic = ~ c("n", "p")
) 


# Exercise ----------------------------------------------------------------

# A. Convert `cards` object into a tidy data frame ready for {tfrmt}. 
#    Nothing to do besides run each step & explore the output!

ard_ae_tidy <- ard_ae |> 
  shuffle_card(fill_hierarchical_overall = "ANY EVENT") |> 
  prep_big_n(vars = "ARM") |> 
  prep_hierarchical_fill(vars = c("AESOC","AEDECOD"),
                         fill_from_left = TRUE) |> 
  dplyr::select(-c(context, stat_label, stat_variable)) 


# B. Create a basic tfrmt, filling in the missing variable names

ae_tfrmt <- tfrmt(
  group = AESOC,
  label = AEDECOD,
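  param = stat_name, # fill: column holding the statistic names
  value = stat, # fill: column holding the statistic values
  column = ARM, # fill: column that defines the table columns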
  body_plan = body_plan(
    frmt_structure(group_val = ".default", label_val = ".default", 
                   frmt_combine(
                     "{n} ({p}%)",
                     n = frmt("xx"),
                     p = frmt("xx", transform = ~ . *100)
                   )
    )
  ),
  big_n = big_n_structure(param_val = "bigN") 
) 

print_to_gt(ae_tfrmt,
            ard_ae_tidy)


# C. Switch the order of the columns so Placebo is last

ae_tfrmt <- ae_tfrmt |> 
  tfrmt(
    col_plan = col_plan(
      starts_with("Xanomeline"),
      "Placebo"
    )
  )  

print_to_gt(ae_tfrmt, ard_ae_tidy)


# D. Add a title and source note for the table

ae_tfrmt <- ae_tfrmt |> 
  tfrmt(
    title = "", # fill
    footnote_plan = footnote_plan(
      footnote_structure("") # fill with footnote text
    ) 
  )

print_to_gt(ae_tfrmt, ard_ae_tidy)
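
To see what the "{n} ({p}%)" pattern in the body plan produces for a single cell, here is a base-R sketch with sprintf(). The n and p values are hypothetical, and this only imitates the end result; tfrmt's own formatting (for example, width and padding from frmt("xx")) is not reproduced.

```r
# Hypothetical statistics for one table cell
n <- 31
p <- 0.122   # proportion, as stored in the ARD

# Scale the proportion to a percentage, round both pieces, and combine them
cell <- sprintf("%.0f (%.0f%%)", n, p * 100)

cell
```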

gtsummary - more tables

  • How to adopt gtsummary at your company
  • Large user base, catch edge cases

teal - helps build Shiny apps

https://insightsengineering.github.io/teal/latest-tag/

How to contribute

  1. use the package
  2. write a blog or create a template
  3. submit issues on GitHub
  4. join as a contributor