Data Extraction from Panorama
Baboyma Kagniniwa
2021-09-23
Source: vignettes/data-extraction-from-panorama.Rmd
Introduction
This vignette provides guidance on how to identify and extract dataset output files stored on an S/FTP site.
Datasets
On a quarterly basis, PEPFAR/Panorama releases global program datasets: Monitoring, Evaluation and Reporting (MER), Financial, SIMS, and Narratives.
Create an active session
Panorama is a protected site, and all users need to authenticate in
order to explore the dashboards. The same is true for data
extraction. To create an active and valid session for all the HTTP
requests, we will use pano_session().
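library(tidyverse)  # provides %>%, filter(), str_detect(), pull(), first(), walk()
load_secrets()
user <- pano_user()
pass <- pano_pwd()
# Create the session reused below as `sess` (argument names assumed to mirror pano_items())
sess <- pano_session(username = user, password = pass)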
Extract content from download page
In order to extract the list of data items on the download page, we
need an active session and the HTML content of the page. This can
be achieved with pano_items(). Under the hood, a valid session is
created, and the HTML elements are extracted (pano_content()) and
parsed (pano_elements()) into a data frame. pano_items() combines
pano_content() and pano_elements() into one function for quick access
to the list of data items on a specific page.
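url <- "https://pepfar-panorama.org/forms/downloads/"
# Locate the MER directory (assumption: its item name starts with "MER")
dir_mer_path <- pano_items(page_url = url, username = user, password = pass) %>%
  filter(str_detect(item, "^MER")) %>%
  pull(path) %>%
  first()
mer_items <- pano_items(page_url = dir_mer_path,
                        username = user,
                        password = pass)
mer_items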
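To make the parsing step more concrete, the sketch below turns a small, made-up HTML listing into a data frame with the same three columns used in this vignette (type, item, path). The markup, the example.org URLs, and the use of rvest are assumptions for illustration; the actual Panorama page structure and the internals of pano_elements() may differ.

```r
library(rvest)

# Hypothetical markup: a stand-in for a directory listing page
html <- '<ul>
  <li class="dir"><a href="https://example.org/downloads/mer/">MER</a></li>
  <li class="file zip_file"><a href="https://example.org/downloads/mer/sample.zip">sample.zip</a></li>
</ul>'

page  <- read_html(html)
nodes <- html_elements(page, "li")

# One row per list entry: CSS class, visible label, and link target
listing <- data.frame(
  type = html_attr(nodes, "class"),
  item = html_text2(nodes),
  path = html_attr(html_element(nodes, "a"), "href"),
  stringsAsFactors = FALSE
)

listing
```

Once a listing is in this tabular shape, it can be filtered by type or item name like any other data frame.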
Download specific items from Panorama
Most data items on the download page of Panorama are listed as
zipped files. To download them to a local directory, we need to use
the pano_download() function. This function is a wrapper around
httr::GET() with the write option set to a local directory.
dest_path <- "../../../Temp/"
# Keep the first zipped PSNU x IM file covering FY19-21
url <- mer_items %>%
  filter(type == "file zip_file",
         str_detect(item, "_PSNU_IM_FY19-21_.*\\.zip$")) %>%  # escape the dot before "zip"
  pull(path) %>%
  first()
url
pano_download(item_url = url, session = sess, dest = dest_path)
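As a rough illustration of the wrapper idea, the helper below saves a URL to a local directory with httr::GET() and httr::write_disk(). The names dest_file_for() and download_item() are hypothetical, and passing the session through the handle argument is an assumption; this is a sketch of the pattern, not the actual implementation of pano_download().

```r
# Hypothetical helpers: a minimal sketch, not the implementation of pano_download()

# Build the local file path from the last segment of the URL
dest_file_for <- function(item_url, dest) {
  file.path(dest, basename(item_url))
}

download_item <- function(item_url, dest, handle = NULL) {
  dest_file <- dest_file_for(item_url, dest)
  # write_disk() streams the response body to a file instead of memory
  res <- httr::GET(item_url,
                   httr::write_disk(dest_file, overwrite = TRUE),
                   handle = handle)  # assumption: the session is an httr handle
  httr::stop_for_status(res)         # raise an error on HTTP 4xx/5xx
  invisible(dest_file)
}

# The path helper can be checked without any network access
dest_file_for("https://example.org/downloads/sample.zip", "Temp")
```

Keeping the path logic in its own function makes it easy to verify where a file will land before any request is sent.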
Download multiple items from Panorama
pano_extract() is good for batch processing, for example to download
all MER datasets from Panorama. This function combines all the above steps into one.
items <- pano_extract(item = "mer",
                      version = "clean",
                      fiscal_year = 2023,
                      quarter = 4,
                      username = user,
                      password = pass,
                      unpack = TRUE)
items
url_items <- items %>%
  filter(type == "file zip_file") %>%
  pull(path) %>%
  first() %>% # remove this to download all zipped files
  walk(~pano_download(item_url = .x,
                      session = sess,
                      dest = dest_path))
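When many files are downloaded in one pass, a single failed request can stop the whole loop. The sketch below is one possible way to keep going and record which URLs failed. download_all() and fake_download() are hypothetical names introduced here for illustration, and the example.org URLs are placeholders.

```r
# Hypothetical helper: call `download_fn` on each URL and record failures
# instead of stopping at the first error
download_all <- function(urls, download_fn) {
  ok <- vapply(urls, function(u) {
    tryCatch({
      download_fn(u)
      TRUE
    }, error = function(e) {
      message("Failed: ", u, " (", conditionMessage(e), ")")
      FALSE
    })
  }, logical(1))
  data.frame(url = urls, downloaded = unname(ok), stringsAsFactors = FALSE)
}

# Stand-in downloader so the example runs offline
fake_download <- function(u) {
  if (grepl("bad", u)) stop("not found")
  invisible(u)
}

res <- download_all(c("https://example.org/a.zip",
                      "https://example.org/bad.zip"),
                    fake_download)
res
```

With the objects from this vignette, the second argument could be something like function(u) pano_download(item_url = u, session = sess, dest = dest_path).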
Download specific MSD / OU-specific items from Panorama
pano_extract_msd() is designed to facilitate the download of MSD
files for specific operating units at a specific org hierarchy
level. For example, download Zambia’s Site x IM datasets from
Panorama.
pano_extract_msd(operatingunit = "Zambia",
                 version = "clean",
                 fiscal_year = 2021,
                 quarter = 3,
                 level = "site",
                 dest_path = NULL)
Download latest MSD / OU-specific items from Panorama
pano_extract_msds() is designed to facilitate the download and
management of the latest MSD files for global and/or specific
operating units. The function also moves existing files to an
Archive folder before downloading the current files. Users can
include or exclude global datasets with the add_global argument
(for example, add_global = TRUE).
pano_extract_msds(operatingunit = "Zambia",
                  archive = TRUE,
                  dest_path = si_path(),
                  username = pano_user(),
                  password = pano_pwd())
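The archiving step described above can be pictured with a small base R sketch. archive_files() is a hypothetical helper and the zip-only pattern is an assumption; how pano_extract_msds() actually selects and moves files may differ.

```r
# Hypothetical helper: move files matching `pattern` from `dir` into `dir/Archive`
archive_files <- function(dir, pattern = "\\.zip$") {
  old <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(old) == 0) return(invisible(character(0)))
  archive_dir <- file.path(dir, "Archive")
  dir.create(archive_dir, showWarnings = FALSE)
  new <- file.path(archive_dir, basename(old))
  moved <- file.rename(old, new)
  invisible(new[moved])  # paths of the files that were moved
}

# Demo in a throwaway directory with one placeholder file
demo_dir <- tempfile("msd_demo")
dir.create(demo_dir)
file.create(file.path(demo_dir, "old_msd.zip"))

archived <- archive_files(demo_dir)
list.files(demo_dir, recursive = TRUE)
```

Moving rather than deleting keeps the previous release available for comparison after the new files arrive.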