This vignette provides best practices for project workflow at USAID/Office of HIV/AIDS, making it easier to get you going and to align expectations across the team.
The first thing we want to do when starting new work is determine whether it fits into an existing bucket or is completely new. All of our R work lives on GitHub to make our work transparent, improve collaboration, and make it easy to access. You can find our repositories for different projects and packages under our GitHub organization, USAID-OHA-SI. If the work exists, you can clone the repo (make a local copy) and start working from there. If, however, it's new work, you'll want to start a new project and repo. We work with RStudio Projects and git/GitHub.
RStudio projects make it straightforward to divide your work into multiple contexts, each with their own working directory, workspace, history, and source documents.
To create an RStudio project, you can navigate to File > New Project or set one up through usethis:
library(usethis)
create_project("~/Documents/project-transform")
This will set you up in RStudio with a self-contained project. In order to collaborate, we also use git and GitHub. For more on integrating git with RStudio, check out Jenny Bryan's Happy Git and GitHub for the useR.
You can also use a UI tool like GitHub Desktop or GitKraken. If you have git installed and integrated with R, you can again use usethis to set everything up. usethis even has some help info on getting started with git if this is your first time.
use_git() #this prompts a restart of your session
use_github(organisation = "USAID-OHA-SI")
The last thing we'll use usethis for is to set up a license for the project. We use the MIT license, which calls for attribution for downstream uses of your work and specifies that the software is provided "as is".
usethis::use_mit_license("Dr. Raj Shah") #add your name
Okay, now that we've gone through the basics of setting up a project, let's turn to some workflow specifics that glamr is going to help us with.
The primary function we're going to run to get us going is si_setup(). If we look under the hood at si_setup, we can see it actually contains three sub-functions.
si_setup
#> function ()
#> {
#> folder_setup()
#> setup_gitignore()
#> setup_readme()
#> }
#> <bytecode: 0x555c12ad3ae8>
#> <environment: namespace:glamr>
The first one, folder_setup(), initiates the main folders we use for our work.
If you clone a repo from GitHub, not all the folders may exist there (git does not track empty folders, and the contents of the data folders are excluded by the .gitignore), so we recommend running folder_setup() on its own to establish the same organization on your local machine. We turned this into a function to ensure uniformity across our projects, so we know what to expect when we pick up someone else's work (or our own work from the distant past).
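The idea behind folder_setup() can be sketched in a few lines of base R. This is a hypothetical helper, not glamr's implementation, and the folder names are an assumption taken from the folders listed in the .gitignore later in this vignette; check folder_setup() itself for the actual list.

```r
# Hypothetical sketch of a folder-setup helper; this is NOT glamr::folder_setup().
# Folder names are assumed from the folders excluded in the .gitignore.
create_project_folders <- function(root = ".",
                                   folders = c("Data", "Dataout", "Images",
                                               "Graphics", "GIS", "AI")) {
  paths <- file.path(root, folders)
  for (p in paths) {
    # Only create folders that are missing, so re-running is harmless
    if (!dir.exists(p)) dir.create(p, recursive = TRUE)
  }
  invisible(paths)
}

# Example: build the structure inside a temporary directory
demo_root <- file.path(tempdir(), "demo-project")
create_project_folders(demo_root)
list.dirs(demo_root, full.names = FALSE, recursive = FALSE)
```

Because missing folders are created and existing ones are left alone, a helper like this is safe to re-run right after cloning a repo.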
The next function that runs is setup_gitignore(), which creates a .gitignore file in your project. The primary purpose of this file is to ensure that our data and outputs are not published to the web. By default, after running si_setup(), most data formats and folders will be kept from being published. Below is what the .gitignore file will look like.
#R basics
.Rproj.user
.Rhistory
.RData
.Ruserdata
#no data
*.csv
*.gz
*.txt
*.rds
*.xlsx
*.xls
*.zip
*.png
*.pptx
*.tfl
*.twb
*.twbx
*.tbs
*.tbm
*.hyper
*.sql
*.parquet
*.svg
*.dfx
*.json
*.dta
*.shp
*.dbf
#nothing from these folders
AI/*
GIS/*
Images/*
Graphics/*
Data/*
Dataout/*
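The essential behavior of a helper like setup_gitignore() is to add ignore rules without duplicating ones that are already there. The function below is a hypothetical base R sketch of that idea, not glamr's code, and the entries passed in are just a small subset of the list above.

```r
# Hypothetical sketch, NOT glamr::setup_gitignore(): add any ignore
# patterns that are not already present in a .gitignore file.
add_gitignore_entries <- function(entries, path = ".gitignore") {
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  to_add <- setdiff(entries, existing)
  if (length(to_add) > 0) {
    # Rewrite the whole file so new lines never merge with an
    # unterminated last line
    writeLines(c(existing, to_add), path)
  }
  invisible(to_add)
}

# Example using a temporary file in place of a real project .gitignore
gi <- tempfile(fileext = ".gitignore")
add_gitignore_entries(c("#no data", "*.csv", "*.rds"), gi)
add_gitignore_entries(c("*.csv", "Data/*"), gi)  # only "Data/*" is new
readLines(gi)
```

Note that real git matching rules (negation with `!`, directory-only patterns, nested .gitignore files) are richer than this line-by-line bookkeeping; the sketch only manages which lines are in the file.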
The last component of si_setup() adds a standard USAID disclaimer to your project's README.md file (and creates this file if it's missing).
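The README step follows the same "add it once" pattern: create README.md if it is missing, then append the disclaimer only if it isn't already there. This is a hypothetical sketch, not glamr's setup_readme(), and the disclaimer string is a placeholder, not the actual USAID disclaimer text.

```r
# Hypothetical sketch, NOT glamr::setup_readme(). The disclaimer below is a
# placeholder; glamr supplies the real USAID wording. Assumes a single-line
# disclaimer string.
add_readme_disclaimer <- function(disclaimer, path = "README.md") {
  # Create an empty README.md if the project does not have one yet
  if (!file.exists(path)) file.create(path)
  current <- readLines(path, warn = FALSE)
  # Skip if the disclaimer is already present, so re-running is harmless
  if (!any(grepl(disclaimer, current, fixed = TRUE))) {
    writeLines(c(current, "", disclaimer), path)
  }
  invisible(path)
}

readme <- tempfile("README", fileext = ".md")
add_readme_disclaimer("DISCLAIMER PLACEHOLDER", readme)
add_readme_disclaimer("DISCLAIMER PLACEHOLDER", readme)  # no duplicate added
```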
From a project standpoint, now you’re good to go.
For the most part, our SI work revolves around the MER Structured Datasets (MSDs), which are large, cumbersome files. A best practice when working on projects is to store the data you use within that project so it's all self-contained. However, since most of our work revolves around a couple of massive datasets (OUxIM, PSNU, PSNUxIM, NAT_SUBNAT, FSD), it makes more sense for us to store these datasets in a central location on our machines rather than in each project.
The problem now is that where I store my MSD files is going to be different from where you store yours. To solve this dilemma, we use a function called si_path(), which accesses the paths we have stored locally, for instance to our MSD or Downloads folders. This way, when you pick up a coworker's code, you don't have to change any of the file paths; it just works.
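The general pattern behind a function like si_path() is that scripts ask for a named location instead of hard-coding a folder. One common way to do that in R is to set options() in .Rprofile and read them back with getOption(). The sketch below uses that mechanism with a hypothetical option name (path_msd_demo); it illustrates the pattern only, and glamr's own option names and internals may differ.

```r
# Hypothetical sketch of a path lookup; NOT glamr::si_path().
# Assumes each user's .Rprofile sets an option such as
#   options(path_msd_demo = "~/Documents/Data")
get_local_path <- function(type = "path_msd_demo") {
  path <- getOption(type)
  if (is.null(path)) {
    stop("No path stored for '", type, "'. Add it to your .Rprofile.",
         call. = FALSE)
  }
  path.expand(path)
}

# In a real setup this line would live in .Rprofile, not in the script
options(path_msd_demo = "~/Documents/Data")
get_local_path()
```

Because scripts only ever call the lookup function, two analysts with different folder layouts can run the same script unchanged; only their .Rprofile entries differ.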
You set these local paths once, and they are stored in your .Rprofile. To do so, run set_paths() to store all the relevant paths (you can ignore any that aren't relevant to you).
set_paths(folderpath_msd = "~/Documents/Data",
folderpath_datim = "~/Documents/DATIM",
folderpath_downloads = "~/Downloads")
Running this will open your .Rprofile, and you'll be prompted to paste the pre-copied code into it. Save the .Rprofile and restart your session.
With that stored, you can use si_path() to return the path to your MSD folder (the default) and then use another glamr function, return_latest(), which looks in the provided folder path for the latest version of a file that matches the pattern you provide. In this case, we pass in a pattern for the OUxIM MSD, and it returns the file path, which we can then pass into readr::read_rds():
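library(glamr)
library(magrittr) # provides the %>% pipe

df <- si_path() %>%                  # MSD folder stored in .Rprofile
  return_latest("OU_IM_FY19") %>%    # latest file matching the pattern
  readr::read_rds()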
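To make the behavior of return_latest() concrete, here is a base R sketch of "find the newest file matching a pattern". It is hypothetical, not glamr's implementation; in particular, it assumes "latest" means the most recent modification time, whereas glamr may use a different criterion. The file names in the example are made up.

```r
# Hypothetical sketch, NOT glamr::return_latest(): return the most recently
# modified file in `folderpath` whose name matches the regex `pattern`.
latest_file <- function(folderpath, pattern) {
  files <- list.files(folderpath, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) {
    stop("No files matching '", pattern, "' in ", folderpath, call. = FALSE)
  }
  files[which.max(file.mtime(files))]
}

# Example: two made-up files with different modification times
demo_dir <- file.path(tempdir(), "msd-demo")
dir.create(demo_dir, showWarnings = FALSE)
f_old <- file.path(demo_dir, "demo_OU_IM_FY19_v1.rds")
f_new <- file.path(demo_dir, "demo_OU_IM_FY19_v2.rds")
file.create(c(f_old, f_new))
Sys.setFileTime(f_old, Sys.time() - 3600)  # make f_old an hour older
latest_file(demo_dir, "OU_IM_FY19")
```

Since list.files() treats `pattern` as a regular expression, patterns containing characters such as `.` or `(` may need escaping.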