| Title: | Fast Access to Brazilian Public Health Data from 'DATASUS' |
|---|---|
| Description: | Provides fast, in-memory reading of 'DATASUS' 'DBC' files using native 'C' code, along with a catalog of public health data sources, 'FTP' file discovery, caching downloads, and a high-level datasus_fetch() function that lists, downloads, and reads files in a single call. Bundles the 'blast' decompressor from 'zlib' contrib/blast to decode 'PKWare DCL' compressed 'DBC' files and parses 'DBF' records directly for efficient import into tibbles. See the 'DATASUS' file transfer site <https://datasus.saude.gov.br> and Adler (2003) <https://github.com/madler/zlib/tree/master/contrib/blast> for details on the underlying data and compression format. |
| Authors: | Andre Leite [aut, cre], Marcos Wasilew [aut], Hugo Vasconcelos [aut], Carlos Amorin [aut], Diogo Bezerra [aut], Mark Adler [ctb, cph] (Author of bundled blast.c and blast.h from zlib contrib/blast) |
| Maintainer: | Andre Leite <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-06-04 07:46:28 UTC |
| Source: | https://github.com/strategicprojects/datasusr |
Builds candidate FTP directories for a given source and file type. For SIH/SIA, the function selects historical vs. current trees by year/month. For SIM and SINASC, preliminary trees are included when requested.
datasus_build_path(
  source,
  file_type,
  year = NULL,
  month = NULL,
  include_prelim = TRUE
)
source |
Source code (e.g. |
file_type |
File type code (e.g. |
year |
Integer vector of years. |
month |
Integer vector of months. |
include_prelim |
Logical. Include preliminary trees when available (default TRUE). |
A tibble with columns source, file_type, period, and path.
datasus_build_path(source = "SIHSUS", file_type = "RD", year = 2024, month = 1)
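The historical-versus-current selection for SIH can be pictured with a small base R sketch. This is not the package's implementation: `pick_sihsus_tree()` is a hypothetical helper, the `200801_` directory is the one used in the examples in this manual, and both the historical directory name `199201_200712` and the January 2008 cut-off are assumptions.

```r
# Hypothetical helper: choose a SIHSUS directory tree for a given period.
# Assumption: periods before January 2008 live in a historical tree and
# later periods live in the current "200801_" tree.
pick_sihsus_tree <- function(year, month) {
  period <- year * 100L + month          # e.g. 2024, 1 -> 202401
  ifelse(
    period < 200801L,
    "SIHSUS/199201_200712/Dados/",       # assumed historical tree
    "SIHSUS/200801_/Dados/"              # current tree used in this manual
  )
}

paths <- pick_sihsus_tree(year = c(2005L, 2024L), month = c(6L, 1L))
print(paths)
```

Encoding the period as a single integer keeps the comparison vectorised, so one call can resolve many year/month pairs.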
Removes files from the cache directory. By default all cached files are
removed; pass a character vector to files to remove specific paths.
datasus_cache_clear(cache_dir = NULL, files = NULL, verbose = TRUE)
cache_dir |
Optional cache directory. |
files |
Optional character vector of file paths to remove. When |
verbose |
Logical. Emit progress messages (default TRUE). |
A tibble with columns path and removed.
Returns the active cache directory path. Checks, in order: the cache_dir
argument, the DATASUSR_CACHE_DIR environment variable, the
datasusr.cache_dir option, and finally a session-scoped subdirectory of
tempdir(). To opt in to a persistent cache across sessions, set the
DATASUSR_CACHE_DIR environment variable, the datasusr.cache_dir option,
or pass cache_dir explicitly (for example,
tools::R_user_dir("datasusr", "cache")).
datasus_cache_dir(cache_dir = NULL)
cache_dir |
Optional cache directory supplied by the caller. |
A single path string.
datasus_cache_dir()
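The lookup order described for the cache directory can be sketched in base R as follows. `resolve_cache_dir()` is a hypothetical stand-in rather than the package's code, and the name of the session-scoped subdirectory is an assumption.

```r
# Hypothetical re-implementation of the documented lookup order.
resolve_cache_dir <- function(cache_dir = NULL) {
  # 1. An explicit argument wins.
  if (!is.null(cache_dir)) return(cache_dir)

  # 2. Environment variable, if set to a non-empty value.
  env_dir <- Sys.getenv("DATASUSR_CACHE_DIR", unset = "")
  if (nzchar(env_dir)) return(env_dir)

  # 3. R option.
  opt_dir <- getOption("datasusr.cache_dir")
  if (!is.null(opt_dir)) return(opt_dir)

  # 4. Session-scoped fallback (the subdirectory name is an assumption).
  file.path(tempdir(), "datasusr-cache")
}

# Exercise each level of the precedence chain, then restore the session.
old_opts <- options(datasusr.cache_dir = "/tmp/from-option")
Sys.setenv(DATASUSR_CACHE_DIR = "/tmp/from-env")
from_arg <- resolve_cache_dir("/tmp/from-arg")
from_env <- resolve_cache_dir()
Sys.unsetenv("DATASUSR_CACHE_DIR")
from_opt <- resolve_cache_dir()
options(old_opts)
from_tmp <- resolve_cache_dir()
```

Because the fallback lives under tempdir(), it disappears with the R session, which is why a persistent cache needs one of the first three mechanisms.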
Returns a one-row tibble summarising the current cache: directory path, file count, total size, and modification time range.
datasus_cache_info(cache_dir = NULL, verbose = TRUE)
cache_dir |
Optional cache directory. |
verbose |
Logical. Print a summary to the console (default TRUE). |
A tibble with one row.
datasus_cache_info()
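As a rough illustration of what a one-row cache summary involves, the base R sketch below computes a file count, total size, and modification time range for a directory. `summarise_dir()` and its column names are hypothetical; the columns actually returned by the package may differ.

```r
# Hypothetical helper: summarise a directory in a single row.
summarise_dir <- function(dir) {
  files <- list.files(dir, full.names = TRUE)
  info <- file.info(files)
  info <- info[!info$isdir, , drop = FALSE]   # count regular files only
  data.frame(
    cache_dir   = dir,
    n_files     = nrow(info),
    total_bytes = sum(info$size),
    oldest      = if (nrow(info)) min(info$mtime) else as.POSIXct(NA),
    newest      = if (nrow(info)) max(info$mtime) else as.POSIXct(NA)
  )
}

# Build a throwaway directory with two small files and summarise it.
demo_dir <- file.path(tempdir(), "cache-info-demo")
dir.create(demo_dir, showWarnings = FALSE)
writeBin(raw(10), file.path(demo_dir, "a.dbc"))
writeBin(raw(20), file.path(demo_dir, "b.dbc"))
summary_row <- summarise_dir(demo_dir)
print(summary_row)
unlink(demo_dir, recursive = TRUE)
```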
Returns a tibble describing every file currently stored in the cache directory.
datasus_cache_list(cache_dir = NULL)
cache_dir |
Optional cache directory. |
A tibble with columns path, file_name, size_bytes,
modified_time, and accessed_time.
datasus_cache_list()
Selectively removes cached files based on age and/or total size. Older and least-recently-accessed files are removed first.
datasus_cache_prune(
  cache_dir = NULL,
  max_size_bytes = NULL,
  older_than_days = NULL,
  verbose = TRUE
)
cache_dir |
Optional cache directory. |
max_size_bytes |
Maximum total cache size in bytes. When exceeded, the least-recently-accessed files are removed until the cache fits. |
older_than_days |
Age threshold in days. Files with a modification time older than this are removed. |
verbose |
Logical. Emit progress messages (default TRUE). |
A tibble with columns path, removed, and reason.
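The two pruning rules can be sketched as a planning step that never touches the file system. `plan_prune()` is a hypothetical helper written for illustration: it applies the age rule on modification time first and then evicts least-recently-accessed files until the remaining total fits. The order in which the package combines the two rules and the `reason` labels used here are assumptions.

```r
# Hypothetical planner: decide which cached files to drop.
# `files` needs path, size_bytes, modified_time and accessed_time columns.
plan_prune <- function(files, max_size_bytes = NULL, older_than_days = NULL,
                       now = Sys.time()) {
  files$removed <- FALSE
  files$reason <- NA_character_

  # Step 1: age rule, based on modification time.
  if (!is.null(older_than_days)) {
    age_days <- as.numeric(difftime(now, files$modified_time, units = "days"))
    too_old <- age_days > older_than_days
    files$removed[too_old] <- TRUE
    files$reason[too_old] <- "age"
  }

  # Step 2: size rule, evicting least-recently-accessed files first.
  if (!is.null(max_size_bytes)) {
    for (i in order(files$accessed_time)) {
      total <- sum(files$size_bytes[!files$removed])
      if (total <= max_size_bytes) break
      if (!files$removed[i]) {
        files$removed[i] <- TRUE
        files$reason[i] <- "size"
      }
    }
  }
  files[, c("path", "removed", "reason")]
}

now <- as.POSIXct("2026-01-31 00:00:00", tz = "UTC")
cache <- data.frame(
  path          = c("a.dbc", "b.dbc", "c.dbc"),
  size_bytes    = c(100, 200, 300),
  modified_time = now - as.difftime(c(40, 5, 1), units = "days"),
  accessed_time = now - as.difftime(c(2, 4, 1), units = "days")
)
plan <- plan_prune(cache, max_size_bytes = 350, older_than_days = 30, now = now)
print(plan)
```

In this toy cache, a.dbc is dropped for age, b.dbc is dropped because it is the least recently accessed of the remaining files, and c.dbc is kept once the total fits.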
Returns the FTP URLs where documentation files (layouts, data dictionaries)
for each source system can be found. Use datasus_ftp_ls() to list the
actual files at each URL.
datasus_docs_url(source = NULL)
source |
Optional source code to filter (e.g. |
A tibble with columns source and docs_url.
datasus_docs_url()
datasus_docs_url("CNES")

tryCatch({
  # List documentation files for CNES (network required).
  docs <- datasus_docs_url("CNES")
  datasus_ftp_ls(docs$docs_url[[1]], verbose = FALSE)
}, error = function(e) message("FTP unavailable: ", conditionMessage(e)))
Downloads one or many DATASUS files. When use_cache = TRUE, files that
already exist in the cache directory are reused instead of re-downloaded.
datasus_download(
  files = NULL,
  ...,
  dest_dir = NULL,
  overwrite = FALSE,
  timeout = 240,
  use_cache = TRUE,
  cache_dir = NULL,
  refresh = FALSE,
  verbose = TRUE
)
files |
A tibble returned by |
... |
Filters passed to |
dest_dir |
Optional destination directory. When |
overwrite |
Logical. Overwrite existing files (default FALSE). |
timeout |
Timeout in seconds for each download (default 240). |
use_cache |
Logical. Store and reuse downloads in the cache directory (default TRUE). |
cache_dir |
Optional cache directory. |
refresh |
Logical. Force re-download even when a cached file exists (default FALSE). |
verbose |
Logical. Emit progress messages (default TRUE). |
A tibble with a local_file column containing the paths to the
downloaded files, plus a downloaded flag.
tryCatch({
  files <- datasus_list_files(
    source = "SIHSUS", file_type = "RD",
    year = 2024, month = 1, uf = "AC",
    verbose = FALSE
  )
  downloads <- datasus_download(
    files,
    cache_dir = tempdir(),
    verbose = FALSE
  )
}, error = function(e) message("FTP unavailable: ", conditionMessage(e)))
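The cache-reuse behaviour can be illustrated offline with the sketch below. `cached_download()` and `fake_fetch()` are hypothetical: the real transfer is replaced by a stub that writes a few bytes, and only the interplay between an existing cached file and a refresh flag is modelled (the overwrite and dest_dir arguments are left out).

```r
# Hypothetical helper: reuse a cached copy unless `refresh` is TRUE.
# `fetch` stands in for the real download step so the sketch runs offline.
cached_download <- function(url, cache_dir, refresh = FALSE, fetch) {
  dest <- file.path(cache_dir, basename(url))
  downloaded <- FALSE
  if (refresh || !file.exists(dest)) {
    fetch(url, dest)
    downloaded <- TRUE
  }
  data.frame(local_file = dest, downloaded = downloaded)
}

# Stub "download" that just writes four bytes to the destination.
fake_fetch <- function(url, dest) writeBin(raw(4), dest)

demo_cache <- file.path(tempdir(), "download-demo")
dir.create(demo_cache, showWarnings = FALSE)
url <- paste0(
  "ftp://ftp.datasus.gov.br/dissemin/publicos/SIHSUS/200801_/Dados/",
  "RDAC2401.dbc"
)
first  <- cached_download(url, demo_cache, fetch = fake_fetch)
second <- cached_download(url, demo_cache, fetch = fake_fetch)
forced <- cached_download(url, demo_cache, refresh = TRUE, fetch = fake_fetch)
unlink(demo_cache, recursive = TRUE)
```

The first call fetches the file, the second reuses it, and the third fetches again because a refresh was requested.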
A convenience wrapper that lists, downloads, and reads DATASUS files in a single call. Particularly useful for interactive / exploratory work.
datasus_fetch(
  source,
  file_type,
  year = NULL,
  month = NULL,
  uf = NULL,
  ...,
  bind = TRUE,
  include_prelim = TRUE,
  timeout = 240,
  use_cache = TRUE,
  cache_dir = NULL,
  verbose = TRUE
)
source |
Character vector of source codes. |
file_type |
Character vector of file type codes. |
year |
Integer vector of years. |
month |
Integer vector of months (required for monthly sources). |
uf |
Character vector of UF codes (required for UF-scoped sources). |
... |
Additional arguments forwarded to |
bind |
Logical. When |
include_prelim |
Logical. Include preliminary data trees (default TRUE). |
timeout |
Timeout in seconds for FTP and download operations (default 240). |
use_cache |
Logical. Reuse cached downloads (default TRUE). |
cache_dir |
Optional cache directory. |
verbose |
Logical. Emit progress messages (default TRUE). |
A tibble (when bind = TRUE) or a named list of tibbles.
tryCatch({
  # Fetch a small SIHSUS slice into tempdir() (network required).
  df <- datasus_fetch(
    source = "SIHSUS", file_type = "RD",
    year = 2024, month = 1, uf = "AC",
    select = c("uf_zi", "ano_cmpt", "munic_res", "val_tot"),
    cache_dir = tempdir(),
    verbose = FALSE
  )
}, error = function(e) message("FTP unavailable: ", conditionMessage(e)))
Returns the internal catalog of file types. Optionally filtered by source and/or file type codes.
datasus_file_types(source = NULL, file_type = NULL)
source |
Optional character vector of source codes (e.g. |
file_type |
Optional character vector of file type codes (e.g. |
A tibble.
datasus_file_types()
datasus_file_types(source = "SIHSUS")
datasus_file_types(source = "CNES", file_type = "ST")
Fetches a raw directory listing from a DATASUS FTP path.
datasus_ftp_ls(path, timeout = 120, ftp_use_epsv = FALSE, verbose = TRUE)
path |
FTP path or full URL. Relative paths are prefixed with the DATASUS public FTP root. |
timeout |
Timeout in seconds (default 120). |
ftp_use_epsv |
Logical. Passed to curl (default FALSE). |
verbose |
Logical. Emit progress messages (default TRUE). |
A tibble with columns ftp_url and entry.
tryCatch(
  datasus_ftp_ls("SIHSUS/200801_/Dados/", verbose = FALSE),
  error = function(e) message("FTP unavailable: ", conditionMessage(e))
)
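The rule that relative paths are prefixed with the public FTP root can be sketched as below. `normalize_ftp_path()` is a hypothetical helper, and the root is taken from the full URL used in the read_datasus_dbc() example in this manual; how the package itself treats leading slashes or other URL schemes is not specified here.

```r
# Root taken from the full example URL in this manual.
ftp_root <- "ftp://ftp.datasus.gov.br/dissemin/publicos/"

# Hypothetical helper: leave full URLs alone, prefix relative paths.
normalize_ftp_path <- function(path, root = ftp_root) {
  is_url <- grepl("^ftp://", path, ignore.case = TRUE)
  relative <- sub("^/+", "", path)            # avoid a double slash
  ifelse(is_url, path, paste0(root, relative))
}

urls <- normalize_ftp_path(c(
  "SIHSUS/200801_/Dados/",
  "ftp://ftp.datasus.gov.br/dissemin/publicos/SIHSUS/200801_/Dados/"
))
print(urls)
```

Both inputs resolve to the same URL, so callers can pass whichever form is more convenient.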
Downloads and reads tables from the DATASUS territorial data section. The DATASUS FTP publishes a ZIP archive per year containing reference tables for municipalities, health regions, and other geographic divisions used by SUS, in CSV, DBF, and TXT formats.
datasus_get_territory(
  table = "tb_municip",
  year = NULL,
  format = "csv",
  cache_dir = NULL,
  verbose = TRUE
)
table |
Name of the table to read. Common values:
|
year |
Year of the territorial table. Defaults to the current year.
Use |
format |
File format to extract from the ZIP: |
cache_dir |
Optional cache directory. |
verbose |
Logical. Emit progress messages (default TRUE). |
A tibble with column names in snake_case.
tryCatch({
  # Download territorial tables into tempdir() (network required).
  municipios <- datasus_get_territory(
    "tb_municip",
    cache_dir = tempdir(),
    verbose = FALSE
  )
  ufs <- datasus_get_territory(
    "tb_uf",
    cache_dir = tempdir(),
    verbose = FALSE
  )
}, error = function(e) message("FTP unavailable: ", conditionMessage(e)))
Builds candidate file names from the internal catalog and, optionally, validates them against the DATASUS FTP.
datasus_list_files(
  source,
  file_type,
  year = NULL,
  month = NULL,
  uf = NULL,
  include_prelim = TRUE,
  check_exists = TRUE,
  timeout = 120,
  ftp_use_epsv = FALSE,
  verbose = TRUE
)
source |
Character vector of source codes. |
file_type |
Character vector of file type codes. |
year |
Integer vector of years. |
month |
Integer vector of months (required for monthly sources). |
uf |
Character vector of UF codes (required for UF-scoped sources). |
include_prelim |
Logical. Include preliminary data trees (default TRUE). |
check_exists |
Logical. Query the FTP and keep only files that exist (default TRUE). |
timeout |
Timeout in seconds for FTP requests (default 120). |
ftp_use_epsv |
Logical. Passed to curl (default FALSE). |
verbose |
Logical. Emit progress messages (default TRUE). |
A tibble with one row per file, including its FTP URL and metadata.
tryCatch(
  datasus_list_files(
    source = "SIHSUS", file_type = "RD",
    year = 2024, month = 1, uf = c("PE", "PB"),
    verbose = FALSE
  ),
  error = function(e) message("FTP unavailable: ", conditionMessage(e))
)
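To make the idea of candidate file names concrete, the sketch below expands every combination of file type, UF, year, and month into a name. `candidate_names()` is a hypothetical helper, and the pattern (file type, UF, two-digit year, two-digit month, then ".dbc") is inferred from the RDAC2401.dbc file used elsewhere in this manual; other sources may follow different conventions.

```r
# Hypothetical helper: build monthly, UF-scoped candidate names such as
# "RDAC2401.dbc" (file type + UF + two-digit year + two-digit month).
candidate_names <- function(file_type, uf, year, month) {
  grid <- expand.grid(
    file_type = file_type, uf = uf, year = year, month = month,
    stringsAsFactors = FALSE
  )
  sprintf(
    "%s%s%02d%02d.dbc",
    grid$file_type, grid$uf,
    as.integer(grid$year) %% 100L, as.integer(grid$month)
  )
}

candidates <- candidate_names("RD", uf = c("PE", "PB"), year = 2024, month = 1)
print(candidates)
```

Names built this way are only guesses until they are checked against a directory listing, which is the role the check_exists argument plays in the function documented above.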
Returns the available DATASUS data modalities (data, documentation, programs, etc.).
datasus_modalities()
A tibble.
datasus_modalities()
Returns the internal catalog of DATASUS sources available in datasusr.
datasus_sources()
A tibble with one row per source, including its code, description, default scope, and flags for monthly and UF support.
datasus_sources()
Returns a character vector of the 27 Brazilian federative-unit (UF) abbreviations (26 states plus the Federal District) accepted by DATASUS file naming conventions.
datasus_ufs()
A character vector of length 27.
datasus_ufs()
Converts raw byte counts into human-readable strings (e.g. "1.23 MB").
format_bytes(x)
x |
Numeric vector of byte sizes. |
A character vector with formatted sizes.
format_bytes(c(1024, 1048576, NA))
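A byte formatter of this kind can be sketched in a few lines of base R. `format_bytes_sketch()` is a hypothetical stand-in, not the package function: the binary (1024-based) units, the two-decimal format, and the handling of missing values are all assumptions, so its output may differ from what format_bytes() returns.

```r
# Hypothetical formatter; binary (1024-based) units are an assumption.
format_bytes_sketch <- function(x) {
  units <- c("B", "KB", "MB", "GB", "TB")
  # Index of the largest unit whose threshold x has reached (1 = bytes).
  idx <- findInterval(x, 1024^(1:4)) + 1
  out <- sprintf("%.2f %s", x / 1024^(idx - 1), units[idx])
  out[is.na(x)] <- NA_character_            # keep missing sizes missing
  out
}

sizes <- format_bytes_sketch(c(1024, 1048576, NA))
print(sizes)
```

Using findInterval() against exact powers of 1024 avoids the rounding problems that a logarithm-based unit lookup can run into at unit boundaries.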
Reads a DATASUS .dbc file directly into R using native C code for
PKWare DCL decompression and DBF parsing. The function always returns a
tibble and is designed to work well with the tidyverse.
read_datasus_dbc(
  file,
  select = NULL,
  n_max = Inf,
  trim_ws = TRUE,
  encoding = "latin1",
  guess_types = TRUE,
  col_types = NULL,
  parse_dates = FALSE,
  clean_names = TRUE,
  verbose = TRUE
)
file |
Path to a |
select |
Optional character vector of column names to keep. When
|
n_max |
Maximum number of rows to read. Defaults to Inf. |
trim_ws |
Logical. Trim leading/trailing whitespace from character fields (default TRUE). |
encoding |
Encoding of the DBF character fields. Typically |
guess_types |
Logical. Inspect numeric fields to distinguish integer-like columns from double columns (default TRUE). |
col_types |
Optional named character vector of explicit column types.
Supported values: |
parse_dates |
Logical. If |
clean_names |
Logical. If |
verbose |
Logical. Emit progress messages (default TRUE). |
A tibble.
# The example downloads a small DATASUS file into tempdir() and reads it.
# Skipped automatically if the FTP is unreachable.
tmp <- file.path(tempdir(), "RDAC2401.dbc")
url <- paste0(
  "ftp://ftp.datasus.gov.br/dissemin/publicos/SIHSUS/200801_/Dados/",
  "RDAC2401.dbc"
)
ok <- tryCatch(
  {
    utils::download.file(url, tmp, mode = "wb", quiet = TRUE)
    TRUE
  },
  error = function(e) FALSE,
  warning = function(w) FALSE
)
if (ok) {
  # Basic read (column names in snake_case by default)
  x <- read_datasus_dbc(tmp, verbose = FALSE)

  # Select works with either case
  x <- read_datasus_dbc(
    tmp,
    select = c("uf_zi", "ano_cmpt", "val_tot"),
    verbose = FALSE
  )

  # Keep original uppercase names
  x <- read_datasus_dbc(tmp, clean_names = FALSE, verbose = FALSE)

  unlink(tmp)
}
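The note that select works with either case can be illustrated without a .dbc file. The sketch below uses a hypothetical helper, `select_columns()`, on a small hand-made data frame. It assumes that cleaning names amounts to lower-casing the uppercase DBF field names; the package's actual name cleaning may do more.

```r
# Hypothetical helper: match requested columns regardless of case, then
# optionally lower-case the resulting names.
select_columns <- function(df, select = NULL, clean_names = TRUE) {
  if (!is.null(select)) {
    pos <- match(tolower(select), tolower(names(df)))
    if (anyNA(pos)) {
      stop("Unknown column(s): ", paste(select[is.na(pos)], collapse = ", "))
    }
    df <- df[, pos, drop = FALSE]
  }
  if (clean_names) names(df) <- tolower(names(df))
  df
}

# Toy data with DBF-style uppercase field names (values are made up).
raw_df <- data.frame(UF_ZI = "120000", ANO_CMPT = 2024L, VAL_TOT = 10.5)
picked <- select_columns(raw_df, select = c("uf_zi", "VAL_TOT"))
kept   <- select_columns(raw_df, clean_names = FALSE)
print(names(picked))
print(names(kept))
```

Matching on lower-cased names lets callers write the same select vector whether or not they plan to keep the original uppercase names.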