9  Storing and accessing PEPFAR data

9.1 Storing PEPFAR Data

When working on any given project, the general advice is typically to store your data in that specific project ’s repository. This advice is a good best practice, but does not translate well in the PEPFAR space for two main reasons: file size and sensitivity. Instead, we recommend storing PEPFAR structured datasets in a centralized location on your machine outside of the project, e.g. C:\Users\spower\Documents\Data.

Our work primarily revolves around using PEPFAR structured datasets, which are large, cumbersome SQLview output files, e.g. OUxIM, PSNU, PSNUxIM, NAT_SUBNAT, and FSD. Given this fact, it makes more sense for us to store these dataset in a central location on our machines rather than duplicating these files in each project. These PEPFAR structured datasets are easily accessible for anyone working in PEPFAR, either from PEPFAR Panorama or through the DATIM Genie.

The second reason for storing these datasets outside of the project is to avoid any risk of posting these non-public data to a public space when using version control. While these structured dataset are aggregated and not individual patient level data, they are not published by PEPFAR to the public and may contain sensitivities when it comes to key populations or when using data at the facility level.

For small (and non-PEPFAR structured) datasets, we recommend storing these data within the project either in the Data and Data_public folder. Another alternative is storing the data on Google Drive in a shared folder and pulling the data down utilizing the Google API via the googledrive or googlesheets4 packages.

9.2 Accessing PEPFAR Data

PEPFAR maintains a number of different structured datasets across MER, EA, Budget, and SIMS. Data are entered into DATIM, the system of record, by partners and goes through an approval process that takes about six weeks after the quarter ends. In-process data can be accessed through DATIM and the DATIM Genie, which will export data in the typical structured manner. Otherwise, datasets are made available on PEPFAR Panorama for download. PEPFAR data can also be pulled directly from DATIM utilizing an API (see the DHIS2 API documentation or our grabr package

As mentioned in the last section, we recommend storing these datasets in a central location on your computer. This creates a problem from a collaboration standpoint, as we can’t just point to the “Data” folders in our project using a relative path; we have to provide a file path that only works on one user’s machine and not another. To solve this dilemma, we use a function in the glamr package called si_paths() which access the paths we have stored locally to where our PEPFAR structured datasets reside. This way, when you pick up a coworker’s code, you don’t have to change any of the file paths, it just works.

Those local paths need to be set once and stored in your .Rprofile. To do so, you will run glamr::set_paths() to store all the relevant paths (you can ignore any that aren’t relevant to you). The below example would be the code I would run in the console indicating where my folders are for the MSD, DATIM files and Downloads.

library(glamr)
set_paths(folderpath_msd = "~/Documents/Data",
  folderpath_datim =  "~/Documents/DATIM",
  folderpath_downloads =  "~/Downloads")

9.3 Additional Resources