Instructions and Overview

In this lab, you will refer to the plots you created in each of your previous labs to summarize insights and critiques of the dataset you’ve been analyzing. This text will be added to an “About” page on your data dashboard.

Getting Started

Load the relevant libraries

library(tidyverse)
library(lubridate)
library(shiny)
library(shinydashboard)
library(shinyWidgets)

Import and clean example datasets

hospitals <- read.csv("https://opendata.arcgis.com/datasets/6ac5e325468c4cb9b905f1728d6fbf0f_0.csv", stringsAsFactors = FALSE)

world_health_econ <- read.csv("https://raw.githubusercontent.com/lindsaypoirier/STS-115/master/datasets/world_health_econ.csv", stringsAsFactors = FALSE)

cases <- read.csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv", stringsAsFactors = FALSE)

#Do not worry about this block of code for now. The cases data gets a new column appended every day, and each column holds the cumulative confirmed case count as of that day. To get the total cases for each row, we take the value from the most recent (last) date column. The code below does this for us. 
cases <- 
  cases %>% 
  mutate(Total.Cases = 
           cases %>% 
           select(starts_with("X")) %>% 
           pull(-1) #-1 picks the last (most recent) date column
         ) %>%
  select(Province.State, Country.Region, Total.Cases)

hospitals$ZIP <- as.character(hospitals$ZIP)

hospitals$ZIP <- str_pad(hospitals$ZIP, 5, pad = "0") 

is.na(hospitals) <- hospitals == "NOT AVAILABLE"
is.na(hospitals) <- hospitals == -999
is.na(cases) <- cases == ""

hospitals$SOURCEDATE <- ymd_hms(hospitals$SOURCEDATE)
hospitals$VAL_DATE <- ymd_hms(hospitals$VAL_DATE)
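
To see what the cleaning steps above do, here is a minimal sketch on a made-up data frame. The `toy` data frame and its values are assumptions for illustration only; the column names simply mirror the kind of columns found in the hospitals data.

```r
library(stringr)

# Toy data frame standing in for the hospitals data (all values are made up)
toy <- data.frame(
  ZIP  = c(2139, 95616, 501),
  BEDS = c(120, -999, 45),
  TYPE = c("GENERAL ACUTE CARE", "NOT AVAILABLE", "CRITICAL ACCESS"),
  stringsAsFactors = FALSE
)

# ZIP codes read in as numbers lose their leading zeros.
# Convert to character, then pad on the left to 5 characters.
toy$ZIP <- str_pad(as.character(toy$ZIP), 5, pad = "0")

# Mark placeholder values as NA wherever they appear in the data frame
is.na(toy) <- toy == "NOT AVAILABLE"
is.na(toy) <- toy == -999

toy
```

After this runs, the ZIP column holds five-character strings, and the placeholder values in BEDS and TYPE have become NA, so functions like `median(..., na.rm = TRUE)` will skip them instead of treating -999 as a real bed count.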

Import and clean your dataset.

#Copy and paste relevant code from Lab 4 to import your data here.

#Copy and paste relevant code from Lab 4 to clean your data here. This includes any row binding, character removals, conversions in variable type, date formatting, or NA conversions.

Summarize Insights

In about 500 words, describe how your understanding of your topic has advanced as a result of your research. Reference specific statistics, plots, or tables in this section to communicate what key insights you were able to draw from the data. Some things to consider as you are writing this section:

  1. Be sure to place your findings in the appropriate temporal and geographic context. If your dataset is about the average age of homeowners, spans only 2012-2015, and covers only the state of CA, your claims should be limited to that period and place. For example, with the hospitals data:
  • Imprecise: “There are more general acute care hospitals than any other hospital.”
  • Better: “Based on data aggregated from government sources by Oak Ridge National Laboratory since 2012, there are more general acute care hospitals in the United States than any other type of hospital.”
  2. Make a clear distinction between what the data empirically shows you and how you interpret the results. For example, with the hospitals data:
  • Imprecise: “This data indicates that many hospitals do not have enough beds to take on new patients.”
  • Better: “According to this data, the median number of beds at general acute care hospitals in the US is 139. Comparing this to the case rates in the US since January suggests that many hospitals will not have enough beds available to take on the influx of patients.”
  3. Consider your data issues:
  • Imprecise: “From January to present, Covid-19 cases have grown most dramatically in the US.”
  • Better: “From January to present, Covid-19 cases have grown most dramatically in the US. However, since the case data is only as accurate as the portion of the population being tested, and countries are testing at different capacities, this measure may be imprecise.”
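
As a rough sketch of how you might gather the numbers behind a "Better" statement, the code below computes a median alongside the date range and the count of missing values, so the statistic can be reported with its context. The `toy` data frame and its values are made up for illustration, and the column names (TYPE, BEDS, SOURCEDATE) are assumptions modeled on the hospitals data.

```r
library(dplyr)

# Made-up rows standing in for a cleaned dataset
toy <- data.frame(
  TYPE = c("GENERAL ACUTE CARE", "GENERAL ACUTE CARE", "GENERAL ACUTE CARE",
           "PSYCHIATRIC", "PSYCHIATRIC"),
  BEDS = c(100, 139, 250, 40, NA),
  SOURCEDATE = as.Date(c("2012-03-01", "2015-06-15", "2019-11-30",
                         "2014-01-10", "2018-08-05")),
  stringsAsFactors = FALSE
)

# One row per hospital type: the statistic, plus the context needed to report it
summary_by_type <- 
  toy %>% 
  group_by(TYPE) %>% 
  summarize(
    n_hospitals = n(),
    median_beds = median(BEDS, na.rm = TRUE),
    n_missing_beds = sum(is.na(BEDS)),
    earliest = min(SOURCEDATE),
    latest = max(SOURCEDATE)
  )

summary_by_type
```

Reporting the date range and the number of missing values next to the median makes it easier to write a sentence that separates what the data shows from what you infer from it.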
Fill your response here. 

Characterize Knowledge Gaps

In what ways might the data in your dataset be inaccurate? Describe in specific detail how these inaccuracies might affect your data analysis. Reference a specific analysis you’ve completed (either through transformation or plotting), and describe how this issue might impact how you interpret your data analysis.

Fill your response here. 

In what ways might your dataset be incomplete or non-representative of the extent of the issues? Describe in specific detail how the data’s incompleteness might affect your data analysis. Reference a specific analysis you’ve completed (either through transformation or plotting), and describe how this issue might impact how you interpret your data analysis.

Fill your response here. 

In what ways have you had to make assumptions in order to glean insights from your dataset? Describe in specific detail how these assumptions might affect your data analysis. Reference a specific analysis you’ve completed (either through transformation or plotting), and describe how this issue might impact how you interpret your data analysis.

Fill your response here. 

What don’t you know about your data domain that has made it difficult to interpret the data? Describe in specific detail how these cultural gaps might affect your data analysis. Reference a specific analysis you’ve completed (either through transformation or plotting), and describe how this issue might impact how you interpret your data analysis.

Fill your response here. 

BONUS (+1): Using the functions we have learned in R, create one plot that graphically represents one of the issues that you outlined above. It might be a plot that displays the sampling gaps. It might be a plot that showcases where data quality issues are present in your data. It might be a set of plots that show how different results are produced when different assumptions are made. Write a caption for this plot, explaining how it illustrates potential issues with your data analysis.
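
One possible approach, sketched here on made-up data, is to plot the share of missing values in each variable. The `toy` data frame, its values, and the plot labels are assumptions for illustration; your own plot should use your cleaned dataset and whichever issue you described above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up rows with some missing values
toy <- data.frame(
  STATE = c("CA", "CA", "NY", "NY", "TX"),
  BEDS = c(120, NA, 45, NA, NA),
  TYPE = c("GENERAL ACUTE CARE", NA, "CRITICAL ACCESS",
           "GENERAL ACUTE CARE", "PSYCHIATRIC"),
  stringsAsFactors = FALSE
)

# Percent of missing values in each column, reshaped to one row per variable
missing_by_column <- 
  toy %>% 
  summarize(across(everything(), ~ mean(is.na(.x)) * 100)) %>% 
  pivot_longer(everything(), names_to = "variable", values_to = "percent_missing")

# Bar chart of missingness by variable
missing_plot <- 
  ggplot(missing_by_column, aes(x = variable, y = percent_missing)) + 
  geom_col() + 
  labs(x = "Variable", y = "Percent of values missing", 
       title = "Share of missing values by variable")

missing_plot
```

A caption for a plot like this could note which variables are most affected and which of your earlier analyses relied on them.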

#Fill code here.

Continue your shiny app.

Now we will aggregate all of the text that you produced above, along with the text that you produced in Lab 3, into an “About the Data” page on your Shiny app. Follow the instructions below to fill your text into a new page created on the front end of the app.

We won’t be touching the input variables this week. First, copy and paste your input variables from Lab 7.

#======================
#COPY AND PASTE THE INPUT VARIABLES SECTION FROM LAB 7 BELOW
#======================
geo_input_choices <- 
  #REPLACE hospitals BELOW WITH YOUR OWN DATAFRAME
  hospitals %>% 
  #REPLACE STATE BELOW WITH YOUR OWN GEOGRAPHIC VARIABLE
  select(STATE) %>% 
  distinct() %>% 
  #REPLACE STATE BELOW WITH YOUR OWN GEOGRAPHIC VARIABLE
  arrange(STATE)

#COMMENT LINES BELOW IF YOU DO NOT HAVE A TEMPORAL VARIABLE IN YOUR DATAFRAME
date_input_start <- 
  #REPLACE hospitals BELOW WITH YOUR OWN DATAFRAME
  hospitals %>% 
  #REPLACE SOURCEDATE BELOW WITH YOUR OWN TEMPORAL VARIABLE
  summarize(date = min(SOURCEDATE))

date_input_end <- 
  #REPLACE hospitals BELOW WITH YOUR OWN DATAFRAME
  hospitals %>% 
  #REPLACE SOURCEDATE BELOW WITH YOUR OWN TEMPORAL VARIABLE
  summarize(date = max(SOURCEDATE))
#COMMENT LINES ABOVE IF YOU DO NOT HAVE A TEMPORAL VARIABLE IN YOUR DATAFRAME

#======================
#COPY AND PASTE THE INPUT VARIABLES SECTION FROM LAB 7 ABOVE
#======================


ui <- dashboardPage(
  
  #REPLACE 'TITLE HERE' BELOW WITH YOUR OWN TITLE
  dashboardHeader(title = "TITLE HERE"),
  #REPLACE 'TITLE HERE' ABOVE WITH YOUR OWN TITLE
  
  dashboardSidebar(
    sidebarMenu(
      menuItem(, tabName = , icon = icon()),
      menuItem("About the Data", tabName = "about", icon = icon(-sign))
    ),
    selectInput(inputId = "geo_val", label = "Select a geography:", choices = geo_input_choices, selected = geo_input_choices[1]),
    dateRangeInput(inputId = "date_val", label = "Select a date range:", start = date_input_start$date, end = date_input_end$date)
  ),
  
  dashboardBody(
    tabItems(
      tabItem(tabName = ,
              infoBoxOutput("value1", width = 4),
              infoBoxOutput("value2", width = 4),
              infoBoxOutput("value3", width = 4),
              
              box(plotOutput("plot1")),
              box(plotOutput("plot2")),
              box(plotOutput("plot3")),
              box(plotOutput("plot4"))
      ),
      tabItem(tabName = "about",
              tags$h1("Data Source"),
              tags$p("Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium

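
For orientation, here is a minimal, self-contained sketch of a dashboard with a second "About the Data" tab built from static text. It is not the lab's own ui: the object names `ui_sketch` and `server_sketch`, the "dashboard" tab name, the section headings other than "Data Source", and all placeholder text are assumptions. The part that matters is that each `tabName` in `menuItem()` matches the `tabName` of a `tabItem()`, and that the about page is assembled from `tags$h1()` and `tags$p()` calls.

```r
library(shiny)
library(shinydashboard)

# Hypothetical tab names, labels, and placeholder text; swap in your own
ui_sketch <- dashboardPage(
  dashboardHeader(title = "TITLE HERE"),
  dashboardSidebar(
    sidebarMenu(
      menuItem("Dashboard", tabName = "dashboard"),
      menuItem("About the Data", tabName = "about")
    )
  ),
  dashboardBody(
    tabItems(
      # First page: where the info boxes and plots would go
      tabItem(tabName = "dashboard",
              tags$p("Info boxes and plots go here.")
      ),
      # Second page: static text, one heading and paragraph per section
      tabItem(tabName = "about",
              tags$h1("Data Source"),
              tags$p("Describe where the data comes from here."),
              tags$h1("Insights"),
              tags$p("Paste your summary of insights here."),
              tags$h1("Knowledge Gaps"),
              tags$p("Paste your critique of the data here.")
      )
    )
  )
)

# The about page is static, so this sketch needs no server logic
server_sketch <- function(input, output) {}

# Uncomment to launch the app in an interactive R session
# shinyApp(ui_sketch, server_sketch)
```

Long passages can be split across several `tags$p()` calls, one per paragraph, which keeps each string short and easier to edit.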

We won’t be touching the server function this week. You can copy and paste yours from Lab 7 to replace my code below.

server <- function(input, output) {
  
  output$value1 <- renderInfoBox({
    quant_insight1 <- 0
    #Replace '0' above with the code for one of the values you calculated above. Replace 'FILL DESCRIPTION HERE' with a brief description of this number.
    infoBox(quant_insight1, 'FILL DESCRIPTION HERE', icon = icon(, lib = 'glyphicon'), color = )
  })
  
  output$value2 <- renderInfoBox({
    quant_insight2 <- 0
    #Replace '0' above with the code for one of the values you calculated above. Replace 'FILL DESCRIPTION HERE' with a brief description of this number.
    infoBox(quant_insight2, 'FILL DESCRIPTION HERE', icon = icon(, lib = 'glyphicon'), color = )
  })
  
  output$value3 <- renderInfoBox({
    quant_insight3 <- 0
    #Replace '0' above with the code for one of the values you calculated above. Replace 'FILL DESCRIPTION HERE' with a brief description of this number.
    infoBox(quant_insight3, 'FILL DESCRIPTION HERE', icon = icon(, lib = 'glyphicon'), color = )
  })
  
  output$plot1 <- renderPlot({
    hospitals %>% 
      filter(
        STATE == input$geo_val & 
          SOURCEDATE > input$date_val[1] & 
          SOURCEDATE < input$date_val[2]
      ) %>% 
      ggplot(aes(x = TYPE)) + 
      geom_bar()
    #Replace plot above with your own plot.
  })
  
  output$plot2 <- renderPlot({
    hospitals %>% 
      filter(
        STATE == input$geo_val & 
          SOURCEDATE > input$date_val[1] & 
          SOURCEDATE < input$date_val[2]
      ) %>% 
      ggplot(aes(x = TYPE)) + 
      geom_bar()
    #Replace plot above with your own plot.
  })
  
  output$plot3 <- renderPlot({
    hospitals %>% 
      filter(
        STATE == input$geo_val & 
          SOURCEDATE > input$date_val[1] & 
          SOURCEDATE < input$date_val[2]
      ) %>% 
      ggplot(aes(x = TYPE)) + 
      geom_bar()
    #Replace plot above with your own plot.
  })
  
  output$plot4 <- renderPlot({
    hospitals %>% 
      filter(
        STATE == input$geo_val & 
          SOURCEDATE > input$date_val[1] & 
          SOURCEDATE < input$date_val[2]
      ) %>% 
      ggplot(aes(x = TYPE)) + 
      geom_bar()
    #Replace plot above with your own plot.
  })
  
}

shinyApp(ui, server)


