Combining TST Targets with the MSD
combining-with-msd.Rmd
Introduction
This vignette provides an overview for utilizing your
tame_dp()
output by combining them with previous targets
from the MER Structured Dataset (MSD).
Setup
If you haven’t installed R or tameDP
, you should first
refer to the Setup vignette or if you want to
know more about the options for tame_dp()
, see Extracting Targets.
In addition to loading tameDP
, we’re also going to need
to install/load a couple others. We’ll be working with some of the other
packages we installed as dependencies for tameDP
, but we
will need to use two other packages: here
and another
custom OHA package called gophr
. Let’s install bothS
now.
install.packages("here")
install.packages('gophr', repos = c('https://usaid-oha-si.r-universe.dev', 'https://cloud.r-project.org'))
Alright, with gophr
installed to work with the MSD,
let’s get going.
The first step is reading in the from the Target Setting Tool (TST) file. We are going to work with the targets out the target tabs, assuming the PSNUxIM is not yet populated. Remember, Windows filepaths have their slash in the wrong direction. You will need to replace all backslashes (“/") in your filepath with forward slashes (”/“).
#data pack file path
tst_filepath <- here("tools/DataPack_Jupiter_20600101.xlsx")
#process the data pack to extract the targets
df_tst <- tame_dp(tst_filepath)
#glimpse the output
glimpse(df_tst)
We now have all the targets extracted from the TST and stored in a
tidy dataframe called df_tst
. We want to combine this with
the MSD. You will need to download your country’s full MSD from PEPFAR Panorama or Genie
Extract from DATIM. For our analysis, let’s
look across PSNUs so we will want to download the PSNUxIM dataset. Once
you have it downloaded, you will want to use
gophr::read_psd()
to open the MSD.
#msd file path
msd_filepath <- here("../../../Downloads/MER_Structured_Datasets_PSNU_IM_FY45-49_Jupiter.txt")
#read in MSD
df_msd <- read_psd(msd_filepath)
The PSNUxIM file has a few more columns that we need and the TST output had a processed time column that we won’t need.
#extra column names in the MSD that don't exist in the TST
setdiff(names(df_msd), names(df_tst))
#exta columns in the TST that don't exist in the MSD
setdiff(names(df_tst), names(df_msd))
It also has more indicators and disaggregates that aren’t in the Data
Pack. To simplify matters, we can subset, or limit, the dataset to what
we have to work with. Luckily we have a table stored in this packaged
called mer_disagg_mapping
that contains all the indicators
and their disaggregates that we can use to filter the MSD by.
mer_disagg_mapping
Combining the names from our Data Pack data frame
(names(df_tst)
) to limit the columns and using an inner
join to keep only the target disaggregates, we can then append or bind
this with onto the MSD data frame. The TST will also have historic
fiscal year data in addition to to the planning year that will be
duplicated in the MSD so we’ll need to remove that as well.
#remove extra column from the tidy TST
df_tst <- select(df_tst, -source_processed)
#remove extra columns from MSD
df_msd_lim <- select(df_msd, all_of(names(df_tst)))
#subset MSD data to only indicators/disaggs that exist in the TST (inner_join is acting like filter)
df_msd_lim <- df_msd_lim %>%
inner_join(mer_disagg_mapping, by = c("indicator", "standardizeddisaggregate", "numeratordenom"))
#remove duplicative data
df_tst <- df_tst %>%
filter(fiscal_year > max(df_tst$fiscal_year))
#bind TST data frame onto MSD
df_msd_tst <- df_msd_lim %>%
bind_rows(df_tst) %>%
glimpse()
We now have a joined data frame that we can use to compare indicator trends over time. For instance, we can compare how this year’s TX_NEW targets compare with previous few years’ targets across ages.
df_tx_trend <- df_msd_tst %>%
filter(indicator == "TX_NEW",
standardizeddisaggregate == "Age/Sex/HIVStatus") %>%
count(fiscal_year, indicator, ageasentered, wt = targets, name = "targets")
Finally, we can reshape the data to get it into an easy table to view the trends.
df_tx_trend %>%
filter(fiscal_year >= 2048) %>%
pivot_wider(names_from = fiscal_year,
values_from = targets)