End-to-End Submissions in R with the Pharmaverse
Pharmaverse Examples
Introduction
Instructors
- Daniel Sjoberg (MSK -> Roche/Genentech)
- Becca Krouse (GSK)
- Ben Straub (GSK)
- Ram Ganapathy (Syneos -> Roche/Genentech)
What is this workshop?
- Get data into CDISC standards (SDTM and ADaM domains)
- Align the pharma industry on a standard process
- Cover the path from data collection to submission
- Use R packages to support clinical reporting
- Create SDTM from raw data (CDASH and non-CDASH formats)
- Create ADaM datasets from SDTM
- Explore multiple ways to create tables
End-to-End
- pharmaverseraw (raw data)
- pharmaversesdtm (SDTM)
- pharmaverseadam (ADaM)
Exercise 1
Code
# Let's warm-up!
library(dplyr)
library(pharmaverseadam)
# Using dplyr:
# - From the ADSL dataset:
# - Subset to the safety population (SAFFL == "Y")
# - calculate the number of unique subjects in each treatment group (TRT01A)
# View(pharmaverseadam::adsl)
knitr::kable(
  pharmaverseadam::adsl |>
    dplyr::filter(SAFFL == "Y") |>
    dplyr::group_by(TRT01A) |>
    dplyr::summarise(n = dplyr::n_distinct(SUBJID))
)
TRT01A | n |
---|---|
Placebo | 86 |
Xanomeline High Dose | 72 |
Xanomeline Low Dose | 96 |
SDTM Mapping
SDTM - Study Data Tabulation Model
- Maps raw data to standards
- Raw (EDC) to SDTM is the difficult step, because raw data structures vary
- SDTM is standardized across companies
- SDTM -> ADaM is comparatively easy, because the SDTM inputs are standard
sdtm.oak package
- Accommodates varying raw data structures from different EDC systems and vendors
Algorithms
- Variables with similar mapping algorithms are grouped together
- 16,000 variables can be grouped into 22 algorithm groups
- These algorithms are the backbone of sdtm.oak
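- assign_no_ct(): map a raw variable that has no controlled terminology
- assign_ct(): 1:1 mapping using controlled terminology
- assign_datetime(): map collected dates/times to ISO 8601 format
- hardcode_ct(): map a hardcoded value, such as text printed on the eCRF (e.g., units), using controlled terminology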
Compared to dplyr
- no need to write case_when() statements
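The sketch below shows, in base R, why a controlled-terminology (CT) lookup table can replace hand-written case_when() branches. It reuses the column names of the study_ct data frame from the AE exercise, trimmed to the columns it needs. The helper map_ct() is hypothetical. This is not how sdtm.oak implements assign_ct(), which handles more cases than this simple match on collected_value.

```r
# Hypothetical helper: map collected (raw) values to submission values by
# looking them up in a CT table, instead of one case_when() branch per term.
map_ct <- function(raw_values, ct_spec, codelist) {
  # Keep only the terms that belong to the requested codelist
  cl <- ct_spec[ct_spec$codelist_code == codelist, ]
  # match() returns NA for values that are not in the codelist
  cl$term_value[match(raw_values, cl$collected_value)]
}

# Same layout as the study_ct example used with sdtm.oak (fewer columns)
study_ct <- data.frame(
  codelist_code = c("C66742", "C66742"),
  term_code = c("C49487", "C49488"),
  term_value = c("N", "Y"),
  collected_value = c("No", "Yes"),
  stringsAsFactors = FALSE
)

raw_aeser <- c("Yes", "No", "No", NA)
map_ct(raw_aeser, study_ct, "C66742")
# [1] "Y" "N" "N" NA
```

With this approach, adding a new term means adding a row to the CT table, not editing code.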
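Variable roles
- Topic: the focus of the observation (e.g., AETERM, the reported adverse event term)
- Identifier: identifies the record (study, subject, domain, sequence)
- Qualifier: additional information that describes the observation
- Timing: when the observation was collected or occurred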
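EDC Domains (example EDC form codes; some differ from SDTM domain codes, e.g., SDTM uses EX for exposure and DV for protocol deviations)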
EDC Domain | Code |
---|---|
Demographics | DM |
Medical History | MH |
Adverse Events | AE |
Concomitant Medications | CM |
Laboratory Results | LB |
Vital Signs | VS |
Physical Examination | PE |
Study Drug Administration | DA |
Subject Disposition | DS |
Efficacy Assessments | EF |
Safety Assessments | SA |
Questionnaires | QS |
Imaging Assessments | IMG |
Randomization | RAND |
Protocol Deviations | PD |
Example Raw -> SDTM mapping
- VS domain example
- DM domain example
Code
library(sdtm.oak)
library(pharmaverseraw)
library(dplyr)
# AE aCRF - https://github.com/pharmaverse/pharmaverseraw/blob/main/vignettes/articles/aCRFs/AdverseEvent_aCRF.pdf
# Read in Raw dataset ----
ae_raw <- pharmaverseraw::ae_raw
# Generate oak_id_vars ----
ae_raw <- ae_raw %>%
  generate_oak_id_vars(
    pat_var = "PATNUM",
    raw_src = "ae_raw"
  )
# Read in Controlled Terminology
study_ct <- data.frame(
  codelist_code = c("C66742", "C66742"),
  term_code = c("C49487", "C49488"),
  term_value = c("N", "Y"),
  collected_value = c("No", "Yes"),
  term_preferred_term = c("No", "Yes"),
  term_synonyms = c("No", "Yes"),
  stringsAsFactors = FALSE
)
# Exercise 1 ------------------------------------------------
# Map AETERM from raw_var=IT.AETERM, tgt_var=AETERM
ae <-
  # Derive topic variable
  # Map AETERM using assign_no_ct
  assign_no_ct(
    raw_dat = ae_raw,
    raw_var = "IT.AETERM",
    tgt_var = "AETERM",
    id_vars = oak_id_vars()
  )
# Exercise 2 ------------------------------------------------
# Map AESER from raw_var=IT.AESER, tgt_var=AESER. Codelist code for AESER is C66742
ae <- ae %>%
  # Map AESER using ??
  assign_ct(
    raw_dat = ae_raw,
    raw_var = "IT.AESER",
    tgt_var = "AESER",
    ct_spec = study_ct,
    ct_clst = "C66742",
    id_vars = oak_id_vars()
  )
# Exercise 3 ------------------------------------------------
# Map AESDTH from raw_var=IT.AESDTH, tgt_var=AESDTH. Annotation text is
# If "Yes" then AESDTH = "Y" else Not Submitted. Codelist code for AESDTH is C66742
ae <- ae %>%
  # Map AESDTH using condition_add & assign_ct, raw_var=IT.AESDTH, tgt_var=AESDTH
  assign_ct(
    raw_dat = condition_add(ae_raw, IT.AESDTH == "Yes"),
    raw_var = "IT.AESDTH",
    tgt_var = "AESDTH",
    ct_spec = study_ct,
    ct_clst = "C66742",
    id_vars = oak_id_vars()
  )
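The exercises map each SDTM variable in a separate call and rely on id_vars to line the results up. Exercise 3 also maps AESDTH only for the records that meet a condition. The base-R sketch below illustrates both ideas. It assumes, based on sdtm.oak's documentation, that the key columns added by generate_oak_id_vars() are oak_id, raw_source and patient_number. The toy data and the explicit merge are illustrative only and do not reflect sdtm.oak's internals.

```r
# Toy raw AE records. The three key columns imitate the ones assumed to be
# added by sdtm.oak's generate_oak_id_vars(): oak_id, raw_source, patient_number.
ae_raw <- data.frame(
  oak_id = 1:3,
  raw_source = "ae_raw",
  patient_number = c("101", "101", "102"),
  IT.AETERM = c("HEADACHE", "NAUSEA", "RASH"),
  IT.AESDTH = c("No", "Yes", "No"),
  stringsAsFactors = FALSE
)
id_vars <- c("oak_id", "raw_source", "patient_number")

# Step 1: map the topic variable (AETERM) for every raw record
ae <- data.frame(ae_raw[id_vars], AETERM = ae_raw$IT.AETERM, stringsAsFactors = FALSE)

# Step 2: map AESDTH only for records that meet the condition
cond <- ae_raw$IT.AESDTH == "Yes"
aesdth <- data.frame(ae_raw[cond, id_vars], AESDTH = "Y", stringsAsFactors = FALSE)

# Step 3: join the mapped variable onto the target by the id variables;
# records without a match are left as NA (not submitted)
ae <- merge(ae, aesdth, by = id_vars, all.x = TRUE)
ae <- ae[order(ae$oak_id), ]
ae
```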
ADaM
ADSL - Subject Level Data
- Each subject has one row (record)
ADVS - Vital Signs dataset
- ADVS follows the Basic Data Structure (BDS)
- The focus is on records, not variables
- Apply information from the specifications
- Derive variables and records
- Prepare the dataset for submission
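The base-R sketch below makes the BDS layout concrete by reshaping a toy "wide" vital-signs table into a long one. In the long layout there is one record per subject, parameter and visit. The values come from the ADVS exercise. Only a few BDS-style columns (PARAMCD, AVAL) are shown, and this is an illustration, not an admiral workflow.

```r
# Toy vital-signs data in "wide" form: one row per subject and visit
vs_wide <- data.frame(
  USUBJID = c("01-701-1015", "01-701-1028"),
  VISIT = "BASELINE",
  SYSBP = c(121, 130),
  DIABP = c(51, 79),
  stringsAsFactors = FALSE
)

# BDS-style "long" form: one row per subject, parameter and visit,
# with the parameter code in PARAMCD and the analysis value in AVAL
params <- c("SYSBP", "DIABP")
advs <- do.call(rbind, lapply(params, function(p) {
  data.frame(
    USUBJID = vs_wide$USUBJID,
    PARAMCD = p,
    AVAL = vs_wide[[p]],
    VISIT = vs_wide$VISIT,
    stringsAsFactors = FALSE
  )
}))
advs
```

In this layout a new parameter is added as new rows rather than a new column, which is why the focus is on deriving records.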
Exercise 3
Code
# Exercise 1
# Update the date and time imputation arguments so that any imputed dates
# or times use the last possible month/day and 23:59:59
library(tibble)
library(lubridate)
library(admiral)
posit_mh <- tribble(
  ~USUBJID, ~MHSTDTC,
  1, "2019-07-18T15:25:40",
  1, "2019-07-18T15:25",
  1, "2019-07-18",
  2, "2024-02",
  2, "2019",
  2, "2019---07",
  3, ""
)
paste0("Problem 1")
[1] "Problem 1"
Code
knitr::kable(
  derive_vars_dtm(
    dataset = posit_mh,
    new_vars_prefix = "AST",
    dtc = MHSTDTC,
    highest_imputation = "M",
    date_imputation = "last",
    time_imputation = "last"
  )
)
USUBJID | MHSTDTC | ASTDTM | ASTDTF | ASTTMF |
---|---|---|---|---|
1 | 2019-07-18T15:25:40 | 2019-07-18 15:25:40 | NA | NA |
1 | 2019-07-18T15:25 | 2019-07-18 15:25:59 | NA | S |
1 | 2019-07-18 | 2019-07-18 23:59:59 | NA | H |
2 | 2024-02 | 2024-02-29 23:59:59 | D | H |
2 | 2019 | 2019-12-31 23:59:59 | M | H |
2 | 2019---07 | 2019-12-31 23:59:59 | M | H |
3 | | NA | NA | NA |
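The table shows the effect of date_imputation = "last". A missing day becomes the last day of that month (note 2024-02-29, a leap year), and a missing month becomes December. The base-R sketch below reproduces only that date rule, for the simple YYYY and YYYY-MM forms. impute_date_last() is a hypothetical helper, not part of admiral. It ignores times, the highest_imputation cap, and forms such as "2019---07".

```r
# Hypothetical helper: impute a partial ISO 8601 date (YYYY, YYYY-MM or
# YYYY-MM-DD) to the LAST possible day and record which part was imputed.
impute_date_last <- function(dtc) {
  parts <- strsplit(dtc, "-", fixed = TRUE)[[1]]
  year <- as.integer(parts[1])
  if (length(parts) == 1) {
    # Only the year is known: impute month and day -> December 31
    return(list(date = as.Date(sprintf("%d-12-31", year)), flag = "M"))
  }
  month <- as.integer(parts[2])
  if (length(parts) == 2) {
    # Year and month known: last day = first day of next month minus one day
    first <- as.Date(sprintf("%d-%02d-01", year, month))
    next_month <- seq(first, by = "month", length.out = 2)[2]
    return(list(date = next_month - 1, flag = "D"))
  }
  # Complete date: nothing to impute
  list(date = as.Date(dtc), flag = NA_character_)
}

impute_date_last("2024-02")  # date 2024-02-29, flag "D" (leap year)
impute_date_last("2019")     # date 2019-12-31, flag "M"
```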
Code
# Exercise 2
# Update set_values_to argument for the formula
# MAP Formula: MAP = (SYSBP + 2*DIABP) / 3
ADVS <- tribble(
  ~USUBJID, ~PARAMCD, ~PARAM, ~AVALU, ~AVAL, ~VISIT,
  "01-701-1015", "DIABP", "Diastolic Blood Pressure (mmHg)", "mmHg", 51, "BASELINE",
  "01-701-1015", "SYSBP", "Systolic Blood Pressure (mmHg)", "mmHg", 121, "BASELINE",
  "01-701-1028", "DIABP", "Diastolic Blood Pressure (mmHg)", "mmHg", 79, "BASELINE",
  "01-701-1028", "SYSBP", "Systolic Blood Pressure (mmHg)", "mmHg", 130, "BASELINE"
)
paste0("Problem 2")
[1] "Problem 2"
Code
knitr::kable(
  derive_param_computed(
    ADVS,
    by_vars = exprs(USUBJID, VISIT),
    parameters = c("SYSBP", "DIABP"),
    set_values_to = exprs(
      AVAL = (AVAL.SYSBP + 2 * AVAL.DIABP) / 3,
      PARAMCD = "MAP",
      PARAM = "Mean Arterial Pressure (mmHg)",
      AVALU = "mmHg"
    )
  )
)
USUBJID | PARAMCD | PARAM | AVALU | AVAL | VISIT |
---|---|---|---|---|---|
01-701-1015 | DIABP | Diastolic Blood Pressure (mmHg) | mmHg | 51.00000 | BASELINE |
01-701-1015 | SYSBP | Systolic Blood Pressure (mmHg) | mmHg | 121.00000 | BASELINE |
01-701-1028 | DIABP | Diastolic Blood Pressure (mmHg) | mmHg | 79.00000 | BASELINE |
01-701-1028 | SYSBP | Systolic Blood Pressure (mmHg) | mmHg | 130.00000 | BASELINE |
01-701-1015 | MAP | Mean Arterial Pressure (mmHg) | mmHg | 74.33333 | BASELINE |
01-701-1028 | MAP | Mean Arterial Pressure (mmHg) | mmHg | 96.00000 | BASELINE |
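derive_param_computed() adds MAP as new rows rather than a new column. Each result follows the formula MAP = (SYSBP + 2*DIABP) / 3; for subject 01-701-1015 that is (121 + 2*51) / 3 = 223/3 ≈ 74.33. The base-R sketch below reproduces the same arithmetic. It assumes exactly one SYSBP and one DIABP record per subject and visit. This is not how admiral implements the function.

```r
# Toy BDS-style input (same values as the ADVS example above)
advs <- data.frame(
  USUBJID = rep(c("01-701-1015", "01-701-1028"), each = 2),
  PARAMCD = rep(c("DIABP", "SYSBP"), times = 2),
  AVAL = c(51, 121, 79, 130),
  VISIT = "BASELINE",
  stringsAsFactors = FALSE
)

# Put SYSBP and DIABP side by side for each subject/visit (the by variables)
sys <- advs[advs$PARAMCD == "SYSBP", c("USUBJID", "VISIT", "AVAL")]
dia <- advs[advs$PARAMCD == "DIABP", c("USUBJID", "VISIT", "AVAL")]
both <- merge(sys, dia, by = c("USUBJID", "VISIT"), suffixes = c(".SYSBP", ".DIABP"))

# Compute the new parameter and append it as NEW RECORDS
map_rows <- data.frame(
  USUBJID = both$USUBJID,
  PARAMCD = "MAP",
  AVAL = (both$AVAL.SYSBP + 2 * both$AVAL.DIABP) / 3,
  VISIT = both$VISIT,
  stringsAsFactors = FALSE
)
advs <- rbind(advs, map_rows)
advs[advs$PARAMCD == "MAP", ]
```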
ARDs - Analysis Results Datasets
- Tabulate and summarize categorical and continuous variables
- cards: summary statistics
- cardx: statistical analyses (extends cards)
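An ARD stores each statistic as its own row, with columns that say which group, variable and level it belongs to. Formatting into a table happens later, for example with tfrmt. The base-R sketch below builds such a long data frame for a toy dataset. The column names group1, group1_level, variable, variable_level and stat_name mirror the ones cards prints. The result is a simplification, though: it is a plain data frame rather than a cards object, and cards stores stat as a list column.

```r
# Toy subject-level data
adsl <- data.frame(
  USUBJID = sprintf("S%02d", 1:6),
  ARM = c("Placebo", "Placebo", "Placebo", "Drug", "Drug", "Drug"),
  SEX = c("F", "M", "F", "M", "M", "F"),
  stringsAsFactors = FALSE
)

# One row per group level x variable level x statistic (n, N, p)
rows <- list()
for (arm in unique(adsl$ARM)) {
  sub <- adsl[adsl$ARM == arm, ]
  for (lvl in sort(unique(adsl$SEX))) {
    n <- sum(sub$SEX == lvl)  # subjects in this arm with this level
    N <- nrow(sub)            # subjects in this arm
    rows[[length(rows) + 1]] <- data.frame(
      group1 = "ARM", group1_level = arm,
      variable = "SEX", variable_level = lvl,
      stat_name = c("n", "N", "p"),
      stat = c(n, N, n / N),
      stringsAsFactors = FALSE
    )
  }
}
ard <- do.call(rbind, rows)
ard
```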
Exercise 4
Code
# ARD Exercise: Adverse Events summaries using {cards}
# Setup: run this first! --------------------------------------------------
# Load necessary packages
library(cards)
# Import & subset data
adsl <- pharmaverseadam::adsl |>
  dplyr::filter(SAFFL == "Y")

adae <- pharmaverseadam::adae |>
  dplyr::filter(SAFFL == "Y") |>
  dplyr::filter(AESOC %in% unique(AESOC)[1:3]) |>
  dplyr::group_by(AESOC) |>
  dplyr::filter(AEDECOD %in% unique(AEDECOD)[1:3]) |>
  dplyr::ungroup()
# Exercise ----------------------------------------------------------------
# A. Calculate the number and percentage of *unique* subjects with at least one AE:
# - By each SOC (AESOC)
# - By each Preferred term (AEDECOD) within SOC (AESOC)
# By every combination of treatment group (ARM)
ard_stack_hierarchical(
data = adae,
variables = c(AESOC,AEDECOD),
by = ARM,
id = USUBJID,
denominator = adsl
)
Output: an ARD (columns group1, group1_level, group2, group2_level, variable, variable_level, stat_name, ...)
Code
# B. [*BONUS*] Modify the code from part A to include overall number/percentage of
# subjects with at least one AE, regardless of SOC and PT
ard_stack_hierarchical(
data = adae,
variables = c(AESOC, AEDECOD),
by = ARM,
id = USUBJID,
denominator = adsl,
over_variables = TRUE
)
Output: an ARD (columns group1, group1_level, group2, group2_level, variable, ...)
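Two details drive the numbers in this exercise. First, a subject with several AEs in the same SOC is counted once. Second, percentages use the number of subjects in ADSL (the denominator argument), not the number of AE records. The base-R sketch below shows both on toy data, plus an "any event" count like the one part B adds. It only approximates what ard_stack_hierarchical() returns and is not its implementation.

```r
# Toy ADSL (denominator) and ADAE; subject S1 has two AEs in the same SOC
adsl <- data.frame(
  USUBJID = c("S1", "S2", "S3", "S4"),
  ARM = c("Placebo", "Placebo", "Drug", "Drug"),
  stringsAsFactors = FALSE
)
adae <- data.frame(
  USUBJID = c("S1", "S1", "S3"),
  ARM = c("Placebo", "Placebo", "Drug"),
  AESOC = c("CARDIAC DISORDERS", "CARDIAC DISORDERS", "SKIN DISORDERS"),
  stringsAsFactors = FALSE
)

# Denominators: all subjects per arm in ADSL, not AE records
big_n <- table(adsl$ARM)

# Count each subject at most once per arm and SOC
subj_soc <- unique(adae[, c("USUBJID", "ARM", "AESOC")])
n_soc <- aggregate(USUBJID ~ ARM + AESOC, data = subj_soc, FUN = length)
names(n_soc)[3] <- "n"
n_soc$p <- n_soc$n / as.numeric(big_n[n_soc$ARM])

# "Any event": subjects with at least one AE, regardless of SOC
any_ae <- tapply(adae$USUBJID, adae$ARM, function(x) length(unique(x)))
any_ae_p <- any_ae / as.numeric(big_n[names(any_ae)])
n_soc
any_ae_p
```

Without the unique() step, S1 would be counted twice under CARDIAC DISORDERS.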
tfrmt - Nicely formatting ARDs
Code
# Table Exercise: AE summary table using {tfrmt}
# For this exercise, we will use the AE ARD from the last section to
# create a {tfrmt} table
# Setup: run this first! --------------------------------------------------
## Load necessary packages
library(cards)
library(dplyr)
library(tidyr)
library(tfrmt)
## Import & subset data
adsl <- pharmaverseadam::adsl |>
  dplyr::filter(SAFFL == "Y")

adae <- pharmaverseadam::adae |>
  dplyr::filter(SAFFL == "Y") |>
  dplyr::filter(AESOC %in% unique(AESOC)[1:3]) |>
  dplyr::group_by(AESOC) |>
  dplyr::filter(AEDECOD %in% unique(AEDECOD)[1:3]) |>
  dplyr::ungroup()
## Create AE Summary using cards
ard_ae <- ard_stack_hierarchical(
  data = adae,
  variables = c(AESOC, AEDECOD),
  by = ARM,
  id = USUBJID,
  denominator = adsl,
  over_variables = TRUE,
  statistic = ~ c("n", "p")
)
# Exercise ----------------------------------------------------------------
# A. Convert `cards` object into a tidy data frame ready for {tfrmt}.
# Nothing to do besides run each step & explore the output!
ard_ae_tidy <- ard_ae |>
  shuffle_card(fill_hierarchical_overall = "ANY EVENT") |>
  prep_big_n(vars = "ARM") |>
  prep_hierarchical_fill(vars = c("AESOC", "AEDECOD"),
                         fill_from_left = TRUE) |>
  dplyr::select(-c(context, stat_label, stat_variable))
# B. Create a basic tfrmt, filling in the missing variable names
ae_tfrmt <- tfrmt(
  group = AESOC,
  label = AEDECOD,
  param = , # fill
  value = , # fill
  column = , # fill
  body_plan = body_plan(
    frmt_structure(group_val = ".default", label_val = ".default",
      frmt_combine(
        "{n} ({p}%)",
        n = frmt("xx"),
        p = frmt("xx", transform = ~ . * 100)
      )
    )
  ),
  big_n = big_n_structure(param_val = "bigN")
)
print_to_gt(ae_tfrmt,
ard_ae_tidy)
# C. Switch the order of the columns so Placebo is last
ae_tfrmt <- ae_tfrmt |>
  tfrmt(
    col_plan = col_plan(
      "Placebo",
      starts_with("Xanomeline")
    )
  )
print_to_gt(ae_tfrmt, ard_ae_tidy)
# D. Add a title and source note for the table
ae_tfrmt <- ae_tfrmt |>
  tfrmt(
    title = "", # fill
    footnote_plan = footnote_plan(
      footnote_structure("") # fill with footnote text
    )
  )
print_to_gt(ae_tfrmt, ard_ae_tidy)
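In part B, frmt_combine() combines two statistics into one cell. The transform = ~ . * 100 turns the ARD's proportion p into a percentage before formatting. The base-R sprintf() sketch below illustrates that idea. format_n_pct() is hypothetical, and it does not reproduce tfrmt's exact width, padding or rounding rules for the "xx" pattern.

```r
# Hypothetical formatter mirroring the "{n} ({p}%)" pattern:
# n printed as a whole number; p stored as a proportion, scaled by 100 for display
format_n_pct <- function(n, p) {
  sprintf("%d (%.0f%%)", as.integer(n), p * 100)
}

format_n_pct(3, 3 / 86)    # "3 (3%)"
format_n_pct(12, 12 / 86)  # "12 (14%)"
```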
gtsummary - more tables
- How to adopt gtsummary at your company
- Large user base, so edge cases get caught
teal - helps build Shiny apps
How to contribute
- use the package
- write a blog or create a template
- submit issues on GitHub
- join as a contributor