Digging a Pit of Success

Data Science Infrastructure in the Office of HIV/AIDS

Tim Essam
Karishma Srikanth

4/19/23

Office of HIV/AIDS

Implement the U.S. President’s Emergency Plan for AIDS Relief (PEPFAR)

About Us

The Strategic Information Branch (GH/OHA/SIEI/SI) provides technical expertise to support and promote data-informed decision making through program monitoring, reporting, and analysis, so that resources are targeted appropriately to achieve HIV epidemic control.

  • Support HQ and missions by building capacity for program monitoring and surveillance to improve HIV/AIDS programs and to provide accountability, oversight and management of programs and partners.

  • Focus on strengthening PEPFAR and USAID program data for use in inter-agency and intra-agency analyses.

  • TL;DR: SI Branch is the data science hub of OHA

Guiding Principles

Organize around shared responsibility and accountability

  • Analyses are well-documented, reproducible, and open to the office

  • Code pushed to a GitHub repo following SI best practices

  • Leverage SI infrastructure (R packages / OHA colors / Tableau Prep Flows)

  • Continuous improvement / open feedback loops (GitHub issues, after action reviews, etc.)

Why this works

  • Predictable data structure, storage, and refresh schedule

  • Critical mass of analysts who code (or are learning)

  • Space for skill development and continuous learning

  • Support from leadership

Key Resources

DATIM

DATIM (DHIS2) captures all of PEPFAR’s monitoring, evaluation and reporting (MER) indicators.

Data intended for import into DATIM must satisfy strict requirements with respect to the format of the data as well as the relationship of the data to the current metadata within the system.

MER structured data sets are available through the platform.

Panorama

Analytic platform built on top of DATIM that hosts dossiers, data tables, and custom applications built in MicroStrategy.

OHA Style Guide

Style guide serves as a tool to define and enhance brand cohesion. Where possible, we preset defaults (font, color, titles, captions) to save colleagues time and cognitive load.

Tools

Core Software

R + RStudio

R + RStudio is our primary analytic tool. Most of the SI infrastructure is based on tidyverse principles and workflows.

GitHub + Git

Use Git locally for version control and GitHub to store packages and analytic code online. This allows for remote collaboration and serves as a default knowledge management platform.

No data are stored on GitHub – only code.

Tableau

OHA maintains 30+ Tableau dashboards. Most are linked to our quarterly data (MER) from DATIM. Tableau is a powerful tool but can quickly create technical debt depending on product ownership.

Excel

Commonly used tool among implementing partners and mission staff. Many core PEPFAR products are built in Excel.

Can be challenging to create reproducible workflows that can scale.

Adobe Illustrator

Vector graphics editor and design program used for enhancing visualizations and communications products.

And the rest

Digging a Pit of Success

Reproducible Workflows

  • Create a repo on GitHub

  • Clone repo to local machine via RStudio Project

  • Run SI setup functions

  • Start munging and push code to repo online when finished

Common Folders

# glamr::setup_gitignore() # ignore certain file extensions
# glamr::setup_readme() # readme with a standard disclaimer
# glamr::folder_setup() # standardized set of folders

# All above functions are wrapped in another function
glamr::si_setup()

[1] "The following directories will be created:"
Data
Images
Scripts
AI
Dataout
Data_public
GIS
Documents
Graphics
markdown
✔ Setting active project to
'C:/Users/tessam/Documents/Github/demo_repo'
✔ Writing 'README.md'
• Modify 'README.md'

After si_setup()

Code Reproducibility through Common Paths

  • To improve reproducibility of our code and encourage collaboration, we created a function that accesses our central data folder paths stored locally in our .Rprofile.

  • This way, when we are collaborating on code as a team, we don’t need to change any paths manually to adjust for different folder paths from machine to machine.

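    library(magrittr)  # provides the %>% pipe

    # Store folder paths once per machine; they are saved in .Rprofile
    glamr::set_paths(folderpath_msd = "~/Documents/Data",
                     folderpath_datim = "~/Documents/DATIM",
                     folderpath_downloads = "~/Downloads")

    # Read the most recent matching file from the stored data folder
    df <- glamr::si_path() %>%
      glamr::return_latest("OU_IM_FY22") %>%
      gophr::read_psd()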
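To make the idea concrete, here is a minimal base-R sketch of the same pattern: a path stored once per session and looked up by name, plus a helper that picks the newest file matching a pattern. The names `set_data_path()`, `get_data_path()`, `latest_file()`, and the option `demo_path_msd` are hypothetical and are not part of glamr. The real functions may work differently; for example, they may select the latest file by a rule other than modification time.

```r
# Hypothetical helpers, not part of glamr: a base-R sketch of the same idea.

# Store a folder path under a named option (in practice this call would
# live in each analyst's .Rprofile so it runs at startup).
set_data_path <- function(path) {
  options(demo_path_msd = path)
  invisible(path)
}

# Retrieve the stored path, failing loudly if it was never set.
get_data_path <- function() {
  path <- getOption("demo_path_msd")
  if (is.null(path)) {
    stop("No data path stored; call set_data_path() first.")
  }
  path
}

# Return the most recently modified file in `folder` whose name matches `pattern`.
latest_file <- function(folder, pattern) {
  files <- list.files(folder, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) {
    stop("No files match pattern: ", pattern)
  }
  files[which.max(file.mtime(files))]
}

# Usage: two files in a temporary folder; the newer one is returned.
demo_dir <- file.path(tempdir(), "demo_msd")
dir.create(demo_dir, showWarnings = FALSE)
old_file <- file.path(demo_dir, "OU_IM_FY22_v1.txt")
new_file <- file.path(demo_dir, "OU_IM_FY22_v2.txt")
writeLines("old", old_file)
writeLines("new", new_file)
Sys.setFileTime(old_file, Sys.time() - 3600)  # make v1 an hour older

set_data_path(demo_dir)
latest <- latest_file(get_data_path(), "OU_IM_FY22")
basename(latest)
```

Because the script only ever refers to the stored path by name, the same code runs unchanged on a machine where the data folder lives somewhere else.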

Discoverable Content I

With around 20 analysts on our team covering over 50 countries, tracking analyses and visualizations can be difficult. To track visualizations, we generate unique reference IDs and embed them in graphics using the glue package.

    # Generate a reference id for a visualization
    (ref_id <- Sys.time() |> digest::sha1() |> substr(start = 1, stop = 8))
[1] "25de83f5"

Discoverable Content II

We use this unique ID in the caption of all visuals, which allows us to search GitHub for the related content.

  # Required libraries
  library(palmerpenguins)
  library(tidyverse)
  library(glue)

  # Create a mock-up plot; ref_id is the reference id generated earlier
  p <- penguins %>%
    summarise(ave_bill_length = mean(bill_length_mm, na.rm = TRUE), .by = "species") %>%
    mutate(species_order = fct_reorder(species, ave_bill_length)) %>%
    ggplot(aes(y = ave_bill_length, x = species_order, fill = glitr::si_palettes$old_rose[1:3])) +
    geom_col(width = 0.5) +
    glitr::si_style_ygrid() +
    scale_fill_identity() +
    labs(caption = glue::glue("SI graph | {ref_id}"),
         title = "Chinstrap penguins have the longest bill length, on average",
         x = NULL, y = "Bill length (in mm)")

Discoverable Content III

  # Print the plot
  p
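The same reference ID can also be looked up in a local clone of a repo. The sketch below uses only base R; `find_ref_id()` and the demo file names are hypothetical, and the second ID is made up purely for illustration. It is one possible way to search, not the team's actual tooling.

```r
# Hypothetical helper: list script files under `root` that contain `ref_id`.
find_ref_id <- function(ref_id, root = ".", pattern = "\\.(R|r|Rmd|qmd)$") {
  files <- list.files(root, pattern = pattern, recursive = TRUE, full.names = TRUE)
  hits <- vapply(
    files,
    function(f) any(grepl(ref_id, readLines(f, warn = FALSE), fixed = TRUE)),
    logical(1)
  )
  unname(files[hits])
}

# Usage: two demo scripts in a temporary folder, only one holds the id.
demo_repo <- file.path(tempdir(), "demo_repo_scripts")
dir.create(demo_repo, showWarnings = FALSE)
writeLines('ref_id <- "25de83f5"', file.path(demo_repo, "01_bill_length.R"))
writeLines('ref_id <- "0a1b2c3d"', file.path(demo_repo, "02_other_plot.R"))  # made-up id

matches <- find_ref_id("25de83f5", root = demo_repo)
basename(matches)
```

Matching on the fixed string (rather than a regular expression) keeps the lookup predictable, since the ID is an exact eight-character token.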

coRps

Our mission is to create an inclusive learning/sharing collaborative within USAID’s Office of HIV/AIDS (OHA), where analysts can gain from others’ analytic experiences, primarily in and around R. The coRps is focused on improving R skills and building a culture of R use for the benefit of OHA.

Tableau Learning Collaborative (TLC)

The Tableau Learning Collaborative is a space where staff across OHA can learn to use Tableau with PEPFAR data, both from OHA Tableau users through didactic training sessions and from each other through practical project examples.

The goal of the TLC is to provide a continuous learning environment and closer collaboration among OHA analysts to improve the quality and standardization of OHA products.

Credits

Images from Unsplash