Brazil’s Continuous National Household Sample Survey (PNADC) publishes labor market indicators as rolling (moving) quarters — 3-month moving averages where each published “quarter” shares 2 months with its neighbors. This smoothing hides short-term dynamics: turning points are delayed, seasonal patterns are distorted, and international comparison becomes difficult.
The PNADCperiods package includes a SIDRA mensalization module that recovers exact monthly estimates from rolling quarter data. This vignette explains how to use it.
Each published “quarter” is actually a 3-month moving average:
When unemployment jumps sharply in a single month, the rolling quarter spreads that spike across multiple overlapping periods. The mensalization algorithm inverts this averaging process to recover the true monthly values.
library(PNADCperiods)
# Step 1: Fetch rolling quarter data from SIDRA API
rolling_quarters <- fetch_sidra_rolling_quarters()
# Step 2: Convert to monthly estimates
monthly <- mensalize_sidra_series(rolling_quarters)
# Step 3: Use your monthly data!
head(monthly[, .(anomesexato, m_popocup, m_taxadesocup)])That’s it! You now have monthly estimates starting from January 2012.
fetch_sidra_rolling_quarters()
downloaded 86+ economic indicators from IBGE’s SIDRA API
mensalize_sidra_series() applied
the mensalization formula using pre-computed starting points (bundled
with the package)
The result is a data.table with one row per month
and m_* columns for each mensalized series
The mensalized output contains:
anomesexato: Month identifier (YYYYMM format, e.g.,
201903 = March 2019)m_* columns: Mensalized (monthly) estimates for each
seriesipca100dez1993,
inpc100dez1993 (passed through for deflation)Key series include:
| Column | Description | Unit |
|---|---|---|
m_populacao |
Total population | Thousands |
m_pop14mais |
Population 14+ years | Thousands |
m_popocup |
Employed population | Thousands |
m_popdesocup |
Unemployed population | Thousands |
m_taxadesocup |
Unemployment rate | Percent |
m_taxapartic |
Labor force participation rate | Percent |
m_massahabnominaltodos |
Total nominal wage bill | Millions R$ |
Rate series (like m_taxadesocup) are
derived from mensalized level series when
compute_derived = TRUE (the default). They are computed as
ratios of the mensalized levels, not directly mensalized from the
rolling quarter rates.
Use get_sidra_series_metadata() to explore all 86+
available series:
meta <- get_sidra_series_metadata()
# View series organized by theme
meta[, .N, by = .(theme, theme_category)]
# Filter to specific theme categories
meta[theme_category == "employment_type", .(series_name, description)]The metadata uses a hierarchical taxonomy: theme (top
level, e.g., “labor_market”), theme_category (e.g.,
“employment_type”), and optionally subcategory (e.g.,
“levels”, “rates”).
The mensalization process follows a three-step pipeline:
fetch_sidra_rolling_quarters() downloads data from five
SIDRA tables:
| Table | Content |
|---|---|
| 4093 | Population and labor force |
| 6390 | Income (nominal and real) |
| 6392 | Real income by occupation |
| 6399 | Employment by sector |
| 6906 | Underutilization indicators |
Key columns: anomesfinaltrimmovel (end month of rolling
quarter, YYYYMM), mesnotrim (month position 1/2/3), plus
one column per series.
monthly <- mensalize_sidra_series(rq, verbose = TRUE)
# Compare dimensions
cat("Rolling quarters:", nrow(rq), "rows\n")
cat("Monthly data:", nrow(monthly), "rows\n")The row count is approximately the same (one per month), but the meaning changes from “rolling quarter ending in month X” to “exact estimate for month X”.
# --- VIGNETTE CODE: plot-unemployment ---
library(ggplot2)
monthly[, date := as.Date(paste0(substr(anomesexato, 1, 4), "-",
substr(anomesexato, 5, 6), "-01"))]
ggplot(monthly, aes(x = date, y = m_taxadesocup)) +
geom_line(color = "#1976D2", linewidth = 0.8) +
labs(title = "Monthly Unemployment Rate",
x = NULL, y = "Unemployment Rate (%)")Instead of fetching all 86+ series, filter by theme or theme category:
# Only employment type series
employment <- fetch_sidra_rolling_quarters(theme_category = "employment_type")
# Only wage mass series
wages <- fetch_sidra_rolling_quarters(theme_category = "wage_mass")
# Only labor market theme (includes participation, unemployment, employment types, etc.)
labor <- fetch_sidra_rolling_quarters(theme = "labor_market")For maximum efficiency, request only the series you need:
Some series are rates computed from other series. To fetch only “base” series:
This section can be skipped by users who just need results.
Rolling quarters are 3-month moving averages. If we denote the true monthly value for month \(t\) as \(y_t\), then the rolling quarter value \(x_t\) is:
\[x_t = \frac{y_{t-2} + y_{t-1} + y_t}{3}\]
The mensalization algorithm inverts this relationship to recover \(y_t\) from the sequence of \(x_t\) values.
Step 1: Compute first differences
\[d3_t = x_t - x_{t-1}\]
Step 2: Identify month position (mesnotrim)
Each month has a position within its quarter: - Position 1: Jan, Apr, Jul, Oct - Position 2: Feb, May, Aug, Nov - Position 3: Mar, Jun, Sep, Dec
Step 3: Cumulative sum by position
For each position separately, compute the cumulative sum of first differences, starting from a calibrated “starting point” \(y_0\):
\[y_t = y_0 + \sum_{s \in \text{same position}, s \leq t} d3_s\]
The starting point \(y_0\) is crucial. It determines the level of all subsequent monthly estimates. The package includes pre-computed starting points for 53 series, calibrated during the stable 2013-2019 period.
Starting points are computed by:
The package caches SIDRA API responses in memory during your R session:
# First call: fetches from API (~10 seconds)
rq1 <- fetch_sidra_rolling_quarters()
# Second call with use_cache = TRUE: uses cached data (instant)
rq2 <- fetch_sidra_rolling_quarters(use_cache = TRUE)
# Clear all cached data (force fresh fetch on next call)
clear_sidra_cache()The cache persists until you call clear_sidra_cache() or
restart R.
| Error | Cause | Solution |
|---|---|---|
| “Series not found” | Misspelled series name | Check get_sidra_series_metadata() |
| “API timeout” | SIDRA server slow | Retry; use use_cache = TRUE |
| “No starting points” | Custom series | See Custom Starting Points below |
COVID-19 disruptions (2020): IBGE suspended in-person interviews during the pandemic. Some indicators show unusual patterns in 2020-Q2.
CNPJ series availability: Series based on CNPJ registration (empregadorcomcnpj, contapropriacomcnpj, etc.) are only available from October 2015, when V4019 was introduced.
For users with calibrated PNADC microdata.
Use the bundled starting points (default) unless:
# Load your stacked PNADC microdata (with pnadc_apply_periods weights)
stacked <- readRDS("my_calibrated_pnadc.rds")
# Compute starting points
custom_y0 <- compute_starting_points_from_microdata(
data = stacked,
calibration_start = 201301L,
calibration_end = 201912L,
verbose = TRUE
)
# Use custom starting points
monthly <- mensalize_sidra_series(rq, starting_points = custom_y0)# Step 1: Build crosswalk and calibrate
crosswalk <- pnadc_identify_periods(stacked)
calibrated <- pnadc_apply_periods(
stacked, crosswalk,
weight_var = "V1028",
anchor = "quarter",
calibration_unit = "month"
)
# Step 2: Compute z_ aggregates (monthly totals from microdata)
z_agg <- compute_z_aggregates(calibrated)
# Step 3: Fetch rolling quarters for comparison
rq <- fetch_sidra_rolling_quarters()
# Step 4: Compute starting points
y0 <- compute_series_starting_points(
monthly_estimates = z_agg,
rolling_quarters = rq,
calibration_start = 201301L,
calibration_end = 201912L
)
# Step 5: Use custom starting points
result <- mensalize_sidra_series(rq, starting_points = y0)CNPJ-based series automatically use a later calibration period
(2016-2019) when use_series_specific_periods = TRUE (the
default in compute_series_starting_points()).
bundled <- pnadc_series_starting_points
# Merge and compare
comp <- merge(custom_y0, bundled,
by = c("series_name", "mesnotrim"),
suffixes = c("_custom", "_bundled"))
comp[, rel_diff := abs(y0_custom - y0_bundled) / abs(y0_bundled) * 100]
comp[rel_diff > 1] # Flag series with >1% differenceHow quickly did unemployment rise when COVID-19 hit Brazil? Rolling quarter data obscures these dynamics. Monthly estimates reveal the exact timing.
# --- VIGNETTE CODE: covid-analysis ---
# Fetch all series and mensalize
rq <- fetch_sidra_rolling_quarters()
monthly <- mensalize_sidra_series(rq)
# Filter to COVID period
covid_period <- monthly[anomesexato >= 201901 & anomesexato <= 202212]
# Create date column
covid_period[, date := as.Date(paste0(
substr(anomesexato, 1, 4), "-",
substr(anomesexato, 5, 6), "-01"
))]
# Find peak
peak_month <- covid_period[which.max(m_taxadesocup)]
cat("Peak unemployment:", peak_month$m_taxadesocup, "% in",
format(peak_month$date, "%B %Y"), "\n")Key findings from monthly estimates:
Exact peak timing: Monthly data pinpoints the peak month, while rolling quarters show only a gradual rise
Speed of impact: The monthly series reveals a sharp spike that rolling quarters smooth over 3+ months
Recovery dynamics: Monthly estimates show pauses and reversals in recovery that are hidden in quarterly averages
| Pattern | Meaning | Example |
|---|---|---|
m_ |
Mensalized monthly estimate | m_popocup |
pop* |
Population count | populacao, pop14mais |
*comcart |
With formal contract | empregprivcomcart |
*semcart |
Without formal contract | empregprivsemcart |
*comcnpj |
With CNPJ registration | empregadorcomcnpj |
taxa* |
Rate (percent) | taxadesocup |
nivel* |
Level/ratio (percent) | nivelocup |
rend* |
Income (rendimento) | rendhabnominaltodos |
massa* |
Wage bill (massa salarial) | massahabnominaltodos |
*hab* |
Usually received (habitual) | rendhabnominaltodos |
*efet* |
Actually received (efetivo) | rendefetnominaltodos |
For the complete catalog, use
get_sidra_series_metadata():
meta <- get_sidra_series_metadata()
# Filter by theme category
meta[theme_category == "employment_type", .(series_name, description)]
# Filter by theme and pattern
meta[theme == "labor_market" & grepl("taxa|nivel", series_name),
.(series_name, description)]| Function | Purpose |
|---|---|
fetch_sidra_rolling_quarters() |
Download rolling quarter data from SIDRA API |
fetch_monthly_population() |
Get monthly population estimates |
mensalize_sidra_series() |
Convert rolling quarters to monthly estimates |
get_sidra_series_metadata() |
Explore available series and metadata |
clear_sidra_cache() |
Clear cached API data |
compute_z_aggregates() |
Compute monthly aggregates from calibrated microdata |
compute_series_starting_points() |
Compute \(y_0\) values from aggregates |
compute_starting_points_from_microdata() |
All-in-one \(y_0\) computation |
Bundled data:
pnadc_series_starting_points — pre-computed \(y_0\) for 53 series x 3 month positions
(calibration period: 2013-2019).
vignette("getting-started") — Setting up PNADC
microdata analysisvignette("how-it-works") — The period identification
algorithmvignette("applied-examples") — Applied research
examples