Skip to contents

The covidBR package provides easy access to official COVID-19 data sets of Brazil. All functions’ syntax is based on a simple logic that allows users to download updated data sets for all years. This vignette provides an overview of covidBR.

Installation

You can install the development version of covidBR from GitHub with:

# Development version
# install.packages("devtools")
devtools::install_github("arianacabral/covidBR")

Load packages

Now, let’s load the covidBR and the libraries we’ll use in this vignette.

General usage

The covidBR allows you to obtain the URL on the Covid-19 Portal available for downloading the files.

Daily, the Ministry of Health (Ministério da Saúde) updates and publicly disseminates the database, and, consequently, the download URL is updated.

# Get updated URL to download the public file
download_url <- covidBR::cbr_download_url()

# Print URL
download_url
#> [1] "https://mobileapps.saude.gov.br/esus-vepi/files/unAFkcaNDeXajurGB7LChj8SgQYS2ptm/1b386b80cf23162edda1a942b9832380_HIST_PAINEL_COVIDBR_07nov2022.rar"

Now let’s download the public file available at the obtained URL.

# Download public file
resp <- covidBR::cbr_download(url_file = download_url,
                              dir = here::here("inst/extdata"), #saving to "extradata" folder
                              keep_filename = TRUE)

# Get filename
filename <- fs::path_file(resp$url)

# Print filename
filename

A zip file has been downloaded from the COVID-19 Portal. We use the cbr_unzip() function to unzip it.

# Set absolute directory
abs_path <- stringr::str_c(
  here::here("inst/extdata"),
  filename,
  sep = "/"
)

# Unzip
covidBR::cbr_unzip(zip_file = abs_path,
                   dir = here::here("inst/extdata"), # the directory to extract archive contents
                   keep_zip_file = FALSE) # Delete the compressed file

Simply use the cbr_data() function to combine all files and generate a single dataset.

You can already combine the files by specifying a time range of interest. For example, use the argument date = Sys.Date() to retrieve today’s data.

# Bind multiple files
covidBR.db <- covidBR::cbr_data(dir = here::here("inst/extdata"))  

Available variables

You can view what variables are available using the str() function.

# Display the dataset's structure
str(covidBR.db)
#> Classes 'data.table' and 'data.frame':   5372632 obs. of  17 variables:
#>  $ regiao                 : chr  "Brasil" "Brasil" "Brasil" "Brasil" ...
#>  $ estado                 : chr  "" "" "" "" ...
#>  $ municipio              : chr  "" "" "" "" ...
#>  $ coduf                  : int  76 76 76 76 76 76 76 76 76 76 ...
#>  $ codmun                 : int  NA NA NA NA NA NA NA NA NA NA ...
#>  $ cod_regiao_saude       : int  NA NA NA NA NA NA NA NA NA NA ...
#>  $ nome_regiao_saude      : chr  "" "" "" "" ...
#>  $ data                   : IDate, format: "2020-02-25" "2020-02-26" ...
#>  $ semana_epi             : int  9 9 9 9 9 10 10 10 10 10 ...
#>  $ populacao_tcu2019      : int  210147125 210147125 210147125 210147125 210147125 210147125 210147125 210147125 210147125 210147125 ...
#>  $ casos_acumulado        : num  0 1 1 1 2 2 2 2 3 7 ...
#>  $ casos_novos            : int  0 1 0 0 1 0 0 0 1 4 ...
#>  $ obitos_acumulado       : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ obitos_novos           : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ recuperadosnovos       : int  0 1 1 0 1 1 0 0 1 4 ...
#>  $ em_acompanhamento_novos: int  0 0 0 1 1 1 2 2 2 3 ...
#>  $ interior_metropolitana : int  NA NA NA NA NA NA NA NA NA NA ...
#>  - attr(*, ".internal.selfref")=<externalptr>

Data exploration

You may be interested in specific information, and we will now demonstrate how to work with some case scenarios.

Data exploration by date

For this purpose, we can use the package lubridate. It is an R package that makes it easier to work with dates and times.

See help(package = "lubridate").

How can I get this year’s data?

# Get current date
today <- Sys.Date()

# Print current date
today
#> [1] "2022-11-09"

# Get this year
current_year <- lubridate::year(today)

# Print the current year 
current_year
#> [1] 2022

# Set first day of this year
day_1 <- lubridate::ymd(paste0(current_year,"01","01",sep="-"))

# Print the first day of this year 
day_1
#> [1] "2022-01-01"

# Get this year's data
covidBR.db <- covidBR::cbr_data(dir = here::here("./inst/extdata"),
                                date = c(day_1, 
                                         today)) 
#> Data from 2022-01-01 to 2022-11-09

# Tabular view
covidBR.db %>% 
  dplyr::select(regiao, 
                data,
                casos_acumulado,
                casos_novos, 
                obitos_novos) %>% 
  head(n = 10L) %>% 
  knitr::kable() 
regiao data casos_acumulado casos_novos obitos_novos
Brasil 2022-01-01 22291507 3986 49
Brasil 2022-01-02 22293228 1721 28
Brasil 2022-01-03 22305078 11850 76
Brasil 2022-01-04 22323837 18759 175
Brasil 2022-01-05 22351104 27267 129
Brasil 2022-01-06 22386930 35826 128
Brasil 2022-01-07 22450222 63292 181
Brasil 2022-01-08 22499525 49303 115
Brasil 2022-01-09 22523907 24382 44
Brasil 2022-01-10 22558695 34788 110

How can I get the last six months’ worth of data?

# Get current date
today <- Sys.Date()

# Print current date
today
#> [1] "2022-11-09"

# Get date from 6 months ago
date_6_months_ago <- today - months(6)

# Print date from 6 months ago
date_6_months_ago
#> [1] "2022-05-09"

# Get data from the last 6 months
covidBR.db <- covidBR::cbr_data(dir = here::here("./inst/extdata"),
                                date = c(date_6_months_ago,today)) 
#> Data from 2022-05-09 to 2022-11-09

# Tabular view
covidBR.db %>% 
  dplyr::select(regiao, 
                data,
                casos_acumulado,
                casos_novos, 
                obitos_novos) %>% 
  head(n = 10L) %>% 
  knitr::kable()
regiao data casos_acumulado casos_novos obitos_novos
Brasil 2022-05-09 30574245 9709 53
Brasil 2022-05-10 30594388 20143 198
Brasil 2022-05-11 30617786 23398 126
Brasil 2022-05-12 30639130 21344 125
Brasil 2022-05-13 30664739 25609 139
Brasil 2022-05-14 30682094 17355 92
Brasil 2022-05-15 30688390 6296 46
Brasil 2022-05-16 30701900 13510 69
Brasil 2022-05-17 30728286 26386 229
Brasil 2022-05-18 30741811 13525 103

How can I get the last 30 days’ worth of data?

# Get current date
today <- Sys.Date()

# Print current date
today
#> [1] "2022-11-09"

# Get date from 30 days ago
date_30_days_ago <- today - lubridate::days(30)

# Print date from 30 days ago
date_30_days_ago
#> [1] "2022-10-10"

# Get data from the last 30 days
covidBR.db <- covidBR::cbr_data(dir = here::here("./inst/extdata"),
                                date = c(date_30_days_ago, today)) 
#> Data from 2022-10-10 to 2022-11-09

# Tabular view
covidBR.db %>% 
  dplyr::select(regiao, 
                data,
                casos_acumulado,
                casos_novos, 
                obitos_novos) %>% 
  head(n = 10L) %>% 
  knitr::kable()
regiao data casos_acumulado casos_novos obitos_novos
Brasil 2022-10-10 34724189 4682 26
Brasil 2022-10-11 34731539 7350 86
Brasil 2022-10-12 34736653 5114 63
Brasil 2022-10-13 34739865 3212 43
Brasil 2022-10-14 34746462 6597 75
Brasil 2022-10-15 34748678 2216 9
Brasil 2022-10-16 34749058 380 2
Brasil 2022-10-17 34754590 5532 88
Brasil 2022-10-18 34762873 8283 83
Brasil 2022-10-19 34771320 8447 97

Data visualization

# Obtain data for Brazil
tidycovidBR <- covidBR::cbr_data(
  dir = here::here("inst/extdata"))  %>% 
  dplyr::filter(regiao == "Brasil") 

# Select the interesting columns 
tidycovidBR <- tidycovidBR %>% 
  dplyr::select(c(regiao, data,
                  casos_acumulado:obitos_novos))

# Tabular view 
tidycovidBR %>% 
  head(n = 15L) %>% 
  knitr::kable()
regiao data casos_acumulado casos_novos obitos_acumulado obitos_novos
Brasil 2020-02-25 0 0 0 0
Brasil 2020-02-26 1 1 0 0
Brasil 2020-02-27 1 0 0 0
Brasil 2020-02-28 1 0 0 0
Brasil 2020-02-29 2 1 0 0
Brasil 2020-03-01 2 0 0 0
Brasil 2020-03-02 2 0 0 0
Brasil 2020-03-03 2 0 0 0
Brasil 2020-03-04 3 1 0 0
Brasil 2020-03-05 7 4 0 0
Brasil 2020-03-06 13 6 0 0
Brasil 2020-03-07 19 6 0 0
Brasil 2020-03-08 25 6 0 0
Brasil 2020-03-09 25 0 0 0
Brasil 2020-03-10 34 9 0 0

# Summary
summary(tidycovidBR)
#>     regiao               data            casos_acumulado     casos_novos    
#>  Length:987         Min.   :2020-02-25   Min.   :       0   Min.   :  -573  
#>  Class :character   1st Qu.:2020-10-28   1st Qu.: 5481323   1st Qu.: 11046  
#>  Mode  :character   Median :2021-07-02   Median :18687469   Median : 26749  
#>                     Mean   :2021-07-02   Mean   :16979975   Mean   : 35315  
#>                     3rd Qu.:2022-03-05   3rd Qu.:29041032   3rd Qu.: 49784  
#>                     Max.   :2022-11-07   Max.   :34855492   Max.   :298408  
#>  obitos_acumulado  obitos_novos   
#>  Min.   :     0   Min.   :   0.0  
#>  1st Qu.:158713   1st Qu.: 150.0  
#>  Median :521952   Median : 451.0  
#>  Mean   :408614   Mean   : 697.5  
#>  3rd Qu.:652035   3rd Qu.:1023.0  
#>  Max.   :688395   Max.   :4249.0

# Check for duplicate dates
is_unique <- length(unique(tidycovidBR$data)) == length(tidycovidBR$data)

# Plot data
if(is_unique){
  
  tidycovidBR %>% 
  dplyr::ungroup() %>% 
  tidyr::pivot_longer(
    cols = -c(regiao,data),
    names_to  = "variavel",
    values_to = "valor") %>% 
  dplyr::mutate_at("variavel", 
                   stringr::str_replace_all,
                   pattern = "_",
                   replacement = " ")  %>% 
  ggplot(aes(x = data, y = valor)) + 
  geom_line() + 
  facet_wrap(~ variavel, scales = "free_y", ncol = 1) + 
  labs(x = "Data da notificação", 
       y = "") + 
  theme_minimal() 
  
}