How to cut out the SQL middle-person in analytics

One of my current goals in life is to help data analysts cut out the boring, mind-numbing aspects of their work so that they can focus on more interesting, useful and cool stuff.

One common situation I see in analytics teams is where analysts are playing an unnecessary and highly repetitive middle-person role — where business users want certain specific data repeatedly, and it is up to the analyst to manually perform these repetitive SQL queries to extract the data and send it to the client. The same or very similar queries, again and again. Not fun. And actually totally unnecessary with today’s technology.

I love the R ecosystem. One of the reasons I love it is because it allows you to easily do some magical stuff in terms of linking data sources with web-based interaction. In this post I’ll show you how to set up a simple web app using R Shiny which will allow your non-tech savvy clients to the select raw data they want from a SQL database and fetch and download it in an Excel spreadsheet. This is a safe way to take out the middle-person because you can control who can access the Web app and what data it is allowed to retrieve from the database.

I’m going to use a simple scenario here to illustrate the concepts, but this idea can be expanded to be able to deliver a substantial amount of downloadable pre-processed data to your clients.

What you’ll need

  1. R with the shiny, dplyr, openxlsx,rmarkdown and odbc packages installed.
  2. A SQL database from which you would like to retrieve certain data. In this article I will assume it is an Oracle SQL database.
  3. The RStudio Connect document/app hosting platform, or at a minimum access to a Shiny Server.

Our simple example scenario

Let’s assume that you are a data analyst for a chain of vet clinics. Every day you get several requests for data on appointments and appointment cancellations. Two questions are asked: how many appointments did we have for each animal type between these particular years? What % of them were cancelled? You are getting tired and bored of answering these questions again and again using the same SQL queries to your appointments database.

Your appointments database table is called APPTS_DB and it has a number of columns, in particular DATE (in the format DD-MMM-YYYY, eg 01-MAR-2017), ANIMAL_TYPE (eg Cat, Dog), CANCEL_FLAG (binary 1 if the appointment was cancelled and 0 otherwise).

I assume you know about the structure of Shiny apps. In this article I will be writing the Shiny app as an interactive R markdown .Rmd document. For more info on this see here.

Step 1: Turn your SQL query into a function in R

Turning commonly used SQL queries into R functions is a really good idea even if you don’t do anything else in this article. It allows you to perform common queries with a single command in your R session.

For building this into your app, this step can be done in a separate R script that you can load into your Shiny app, or it can be done as a code chunk in the app itself. Let’s assume the latter.

First we set up a connection to the database. I suggest putting your credentials in the system environment if you are going to store code on Github or any other place where others might see it.

``` {r db_connect}
db_conn <- odbc::dbConnect(odbc::odbc(), 
dsn = Sys.getenv("DSN"), # <- database
uid = Sys.getenv("UID"), # <- user id
pwd = Sys.getenv("PASS")) # <- password
```

First we write the SQL query with placeholders and then we use gsub() to swap the placeholders with function arguments and then run the query. We want to set it up to extract the total number of appointments by animal type between a time period commencing with a specified starting year and ending year, and to calculate the cancellation rate. Ideally I would recommend that you do this query in tidy form using dbplyr but I am going to do it in raw SQL here as most readers will be familiar with that.

``` {r query_fn}
# create query function
appt_query <- function (start_yr, end_yr) {
# write query with placeholders
  qry <- "SELECT ANIMAL_TYPE, COUNT(*) NUM_APPTS, 
SUM(CANCEL_FLAG)/COUNT(*) PERC_CANCELLED
FROM APPTS_DB
WHERE EXTRACT(YEAR FROM DATE) BETWEEN start_year AND end_year
GROUP BY ANIMAL_TYPE
ORDER BY ANIMAL_TYPE"
# replace placeholders with function arguments
  qry <- gsub("start_year", start_yr, qry)

qry <- gsub("end_year", end_yr, qry)
# execute query
  odbc::dbGetQuery(db_conn, qry)
}

```

This is quite powerful because it allows the fetching of the information using a simple command like appt_query(2015, 2017). You can also use a simple paste() function to paste together the SQL query but I find using gsub() easier to manage and to detect bugs and errors.

Step 2: Set up Shiny app inputs and download button

Now we set up some simple inputs for the end user, and a download button that they will press to execute the query and download the data they require.

In this simple example, the only inputs required from the user is a start year and end year. We will start our year options back in 2010 and we will end with the current year

``` {r user_input}
# get current year
curr_year <- format(Sys.Date(), "%Y")
# set up panel to select start and end years
mainPanel(
HTML("<br>"),
selectInput("start_yr", "Select start year:",
choices = c(2010:curr_year)),
selectInput("end_yr", "Select end year:",
choices = c(2010:curr_year)),
HTML("<br>"),
uiOutput("downloadExcel")
)
# generate download button confirming the input selection
output$downloadExcel <- renderUI(
downloadButton("downloadData", paste("Download Data for", input$start_yr, "to", input$end_yr, "in Excel"))
)
```

This creates a user input panel with a download button confirming the years selected. This download button will link to a server process called downloadData where we will execute the query and write to an Excel file for download to the user.

Step 3: Writing background code to execute query and download data

Now we write the server process to execute the query and download the data. This has been made simpler by the fact that we have a neat function set up to do the query (see Step 1). We will also set up a neat progress bar so that users can see how the process is progressing. This is not necessary for a simple query which will be done instantly, but if you are building a multiple tab spreadsheet using a number of different queries, it is essential so that users are reassured that something is happening.

``` {r query_and_download}
# use downloadHandler
output$downloadData <- downloadHandler(
# give download file a name 
  filename = function() {
paste0("animal_appt_data_", input$start_yr, "_", input$end_yr, ".xlsx")
},
# download and populate excel file
  content = function(file) {
# create progress bar for user with first step

shiny::withProgress(
min = 0,
max = 1,
value = 0,
{
shiny::incProgress(
amount = 1/2,
message = "Retrieving data from database..."
)

# get data using function
  data <- appt_query(input$start_yr, input$end_yr)
# set up an excel file in a tab called "Appointment Data"
        shiny::incProgress(
amount = 1/2,
message = "Writing to Excel..."
)
  wb <- openxlsx::createWorkbook()
openxlsx::addWorksheet(wb, "Appointment Data")

# design a nice header style so the results look professional
  hs1 <- openxlsx::createStyle(fontColour = "#ffffff", 
fgFill = "#4F80BD",
halign = "left",
valign = "center",
textDecoration = "bold",
border = "TopBottomLeftRight",
wrapText = TRUE)
# write the results into the "Appointment Data" tab with a nice border
  openxlsx::writeData(wb, "Appointment Data", x = data,
startRow = 1, startCol = 1,
borders = "surrounding",
headerStyle = hs1)
# save Excel file and send to download
  openxlsx::saveWorkbook(wb, file, overwrite = TRUE)
  } # <- end progress bar 
) # <- end withProgress
})  # <- close content wrapper and downloadHandler function
```

Step 4: Provide access to clients

Using this method, the clients only have access to data that you have designed for them. If you use the RStudio Connect sharing platform, you can publish this as an app and provide access to only specific individuals or groups, ensuring that only authorised clients have access. If you use Shiny Server Pro, seek advice in how to control user access from user documentation.

Conclusion and Extension

If you are a fluent Shiny user you can likely immediately grasp what is going on here. If not, you may need more training in Shiny before you take this on. This methodology can be extended to be extremely powerful. Dashboards can be designed using ggplot2 or plotly instead of — or in addition to — Excel downloads. Packages like async , future and promises can be used to scale for multiple concurrent users using asychronous programming (for more on that see here). In the most advanced case, an entire flexible business intelligence platform can be built and hosted on Shiny using the initial concepts presented here.

Leave a Reply

%d bloggers like this: