Implement the U.S. President’s Emergency Plan for AIDS Relief (PEPFAR)
About Us
The Strategic Information Branch (GH/OHA/SIEI/SI) provides technical expertise to support and promote data-informed decision making through program monitoring, reporting, and analysis, so that resources are targeted appropriately to achieve HIV epidemic control.
Support HQ and missions by building capacity for program monitoring and surveillance to improve HIV/AIDS programs and to provide accountability, oversight and management of programs and partners.
Focus on strengthening PEPFAR and USAID program data for use in inter-agency and intra-agency analyses.
TL;DR: SI Branch is the data science hub of OHA
Guiding Principles
Organize around shared responsibility and accountability
Analyses are well-documented, reproducible, and open to the office
Code is pushed to a GitHub repo following SI best practices
Continuous improvement / open feedback loops (GitHub issues, after action reviews, etc.)
Why this works
Predictable data structure, storage, and refresh schedule
Critical mass of analysts who code (or are learning)
Space for skill development and continuous learning
Support from leadership
Key Resources
DATIM
DATIM (DHIS2) captures all of PEPFAR’s monitoring, evaluation, and reporting (MER) indicators.
Data intended for import into DATIM must satisfy strict requirements with respect to the format of the data as well as the relationship of the data to the current metadata within the system.
MER structured data sets are available through the platform.
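The import requirements above come down to two kinds of checks: the file has the expected format, and every identifier it uses exists in the current metadata. The sketch below illustrates the idea in base R. It is not DATIM's actual import specification; the column names, identifiers, and the `validate_import()` helper are all hypothetical.

```r
# Hypothetical metadata: identifiers currently known to the system
metadata <- data.frame(
  data_element = c("DE_001", "DE_002"),
  org_unit     = c("OU_A", "OU_B")
)

# Hypothetical import file; the last row references an unknown data element
import <- data.frame(
  data_element = c("DE_001", "DE_002", "DE_999"),
  org_unit     = c("OU_A", "OU_B", "OU_A"),
  period       = c("2023Q1", "2023Q1", "2023Q1"),
  value        = c(10, 25, 3)
)

# Hypothetical helper: returns a character vector of problems (empty if none)
validate_import <- function(import, metadata) {
  # Format check: all required columns must be present
  required <- c("data_element", "org_unit", "period", "value")
  missing_cols <- setdiff(required, names(import))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ",
         paste(missing_cols, collapse = ", "), call. = FALSE)
  }

  problems <- character(0)

  # Format check: values must be numeric
  if (!is.numeric(import$value)) {
    problems <- c(problems, "Column 'value' must be numeric")
  }

  # Metadata checks: every identifier must already exist in the metadata
  unknown_de <- setdiff(import$data_element, metadata$data_element)
  if (length(unknown_de) > 0) {
    problems <- c(problems,
                  paste("Unknown data elements:", paste(unknown_de, collapse = ", ")))
  }
  unknown_ou <- setdiff(import$org_unit, metadata$org_unit)
  if (length(unknown_ou) > 0) {
    problems <- c(problems,
                  paste("Unknown org units:", paste(unknown_ou, collapse = ", ")))
  }

  problems
}

problems <- validate_import(import, metadata)
print(problems)
```

Returning a vector of problems rather than stopping at the first one lets an analyst fix a whole file in a single pass.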
Panorama
Analytic platform built on top of DATIM that hosts dossiers, data tables, and custom applications built in MicroStrategy.
OHA Style Guide
Style guide serves as a tool to define and enhance brand cohesion. Where possible, we preset defaults (font, color, titles, captions) to save colleagues time and cognitive load.
Tools
Core Software
R + RStudio
R + RStudio is our primary analytic tool. Most of the SI infrastructure is based on tidyverse principles and workflows.
GitHub + Git
Use Git locally for version control and GitHub to store packages and analytic code online. This allows for remote collaboration and serves as a default knowledge management platform.
No data are stored on GitHub – only code.
Tableau
OHA maintains 30+ Tableau dashboards. Most are linked to our quarterly data (MER) from DATIM. Tableau is a powerful tool but can quickly create technical debt depending on product ownership.
Excel
Commonly used tool among implementing partners and mission staff. Many core PEPFAR products are built in Excel.
Can be challenging to create reproducible workflows that can scale.
Adobe Illustrator
Vector graphics editor and design program used for enhancing visualizations and communications products.
And the rest
Digging a Pit of Success
Reproducible Workflows
Create a repo on GitHub
Clone repo to local machine via RStudio Project
Run SI setup functions
Start munging and push code to repo online when finished
Common Folders
# glamr::setup_gitignore() # ignore certain file extensions
# glamr::setup_readme() # readme with a standard disclaimer
# glamr::folder_setup() # standardized set of folders
# All above functions are wrapped in another function
glamr::si_setup()
[1] "The following directories will be created:"
Data
Images
Scripts
AI
Dataout
Data_public
GIS
Documents
Graphics
markdown
✔ Setting active project to 'C:/Users/tessam/Documents/Github/demo_repo'
✔ Writing 'README.md'
• Modify 'README.md'
After si_setup()
Code Reproducibility through Common Paths
To improve reproducibility of our code and encourage collaboration, we created a function that accesses our central data folder paths stored locally in our .Rprofile.
This way, when we are collaborating on code as a team, we don’t need to change any paths manually to adjust for different folder paths from machine to machine.
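The idea can be sketched in base R: each analyst stores a machine-specific folder as an R option in their own .Rprofile, and shared code looks it up through a helper instead of hard-coding a path. This is only an illustration of the pattern, not the SI function itself; the option name `demo_path_msd`, the helper `demo_path()`, and the file name are hypothetical, and `tempdir()` stands in for a real data folder.

```r
# In practice this line would live in each analyst's own .Rprofile,
# pointing at wherever that machine keeps the data (tempdir() is a stand-in)
options(demo_path_msd = file.path(tempdir(), "MER_data"))

# Hypothetical helper: look up the stored path, failing loudly if it is unset
demo_path <- function(name = "demo_path_msd") {
  path <- getOption(name)
  if (is.null(path)) {
    stop("Option '", name, "' is not set; add it to your .Rprofile.",
         call. = FALSE)
  }
  path
}

# Shared analysis code refers to the helper, never to a hard-coded folder
msd_file <- file.path(demo_path(), "example_extract.csv")
print(msd_file)
```

Because the path is resolved at run time from each machine's own settings, the same script runs unchanged for every collaborator.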
With around 20 analysts on our team, covering over 50 countries, tracking analyses and visualizations can be difficult. To track visualizations, we use unique reference IDs that are embedded in graphics using the glue package.
# Generate a reference id for a visualization
(ref_id <- Sys.time() |> digest::sha1() |> substr(start = 1, stop = 8))
[1] "25de83f5"
Discoverable Content II
We use this unique ID in the caption of all visuals, which allows us to search GitHub for content.
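The same lookup also works on a local clone. As a minimal sketch, assuming scripts are plain .R files, the hypothetical helper `find_ref_id()` below scans a folder and returns the scripts that mention a given reference ID; the demo files are throwaway examples written to a temporary folder.

```r
# Hypothetical helper: return the scripts under `dir` that mention a reference id
find_ref_id <- function(ref_id, dir = "Scripts") {
  scripts <- list.files(dir, pattern = "\\.[Rr]$",
                        recursive = TRUE, full.names = TRUE)
  # Flag each script whose text contains the id as a literal string
  has_id <- vapply(
    scripts,
    function(f) any(grepl(ref_id, readLines(f, warn = FALSE), fixed = TRUE)),
    logical(1)
  )
  scripts[has_id]
}

# Demo with two throwaway scripts in a temporary folder
demo_dir <- file.path(tempdir(), "ref_id_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines('ref_id <- "25de83f5"', file.path(demo_dir, "01_plot.R"))
writeLines('ref_id <- "0a1b2c3d"', file.path(demo_dir, "02_table.R"))

matches <- find_ref_id("25de83f5", dir = demo_dir)
print(basename(matches))
```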
# Required libraries
library(palmerpenguins)
library(tidyverse)
library(glue)

# Create a mock-up plot
p <- penguins %>%
  summarise(ave_bill_length = mean(bill_length_mm, na.rm = TRUE), .by = "species") %>%
  mutate(species_order = fct_reorder(species, ave_bill_length)) %>%
  ggplot(aes(y = ave_bill_length, x = species_order,
             fill = glitr::si_palettes$old_rose[1:3])) +
  geom_col(width = 0.5) +
  glitr::si_style_ygrid() +
  scale_fill_identity() +
  labs(caption = glue::glue("SI graph | {ref_id}"),
       title = "Chinstrap penguins have the longest bill length, on average",
       x = NULL, y = "Bill length (in mm)")
Discoverable Content III
# Print the plot
p
coRps
Our mission is to create an inclusive learning and sharing collaborative within USAID’s Office of HIV/AIDS (OHA), where analysts can gain from others’ analytic experiences, primarily in and around R. The coRps is focused on improving R skills and building a culture of R use for the benefit of OHA.
Tableau Learning Collaborative (TLC)
The Tableau Learning Collaborative is a space where staff across OHA can learn to use Tableau with PEPFAR data, both from OHA Tableau users through didactic training sessions and from each other through practical project examples.
The goal of the TLC is to provide a continuous learning environment and closer collaboration among OHA analysts to improve the quality and standardization of OHA products.