| Title: | Identify Reference Periods in Brazil's PNADC Survey Data |
| Version: | 0.1.1 |
| Description: | Identifies reference periods (months, fortnights, and weeks) in Brazil's quarterly PNADC (Pesquisa Nacional por Amostra de Domicilios Continua) survey data and computes calibrated weights for sub-quarterly analysis. The core algorithm uses IBGE (Instituto Brasileiro de Geografia e Estatistica) 'Parada Tecnica' (technical break) rules combined with respondent birthdates to determine which temporal period each survey observation refers to. Period identification follows a nested hierarchy enforced by construction: fortnights require months, weeks require fortnights. Achieves approximately 97% monthly determination rate with the full series (2012-2025). Strict fortnight and week rates are approximately 9% and 3% respectively, as they cannot leverage cross-quarter panel aggregation. Experimental strategies (probabilistic assignment and UPA (Primary Sampling Unit) aggregation) further improve these determination rates. The package provides adaptive hierarchical weight calibration (4/2/1 cell levels for month/fortnight/week) with period-specific smoothing to produce survey weights calibrated to SIDRA (Sistema IBGE de Recuperacao Automatica) population totals. Also includes a SIDRA mensalization module that converts 86+ official rolling quarter series from the IBGE SIDRA API (Application Programming Interface) into exact monthly estimates, without requiring access to microdata. Hecksher (2020) https://repositorio.ipea.gov.br/handle/11058/9859. |
| License: | MIT + file LICENSE |
| Language: | en-US |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 7.3.2 |
| Depends: | R (≥ 4.1.0) |
| Imports: | data.table (≥ 1.14.0), checkmate (≥ 2.0.0), sidrar (≥ 0.2.9), lubridate (≥ 1.9.4) |
| Suggests: | dplyr, fst, haven, testthat (≥ 3.0.0), knitr, rmarkdown, pkgdown, ggplot2, scales |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| URL: | https://antrologos.github.io/PNADCperiods/, https://github.com/antrologos/PNADCperiods |
| BugReports: | https://github.com/antrologos/PNADCperiods/issues |
| NeedsCompilation: | no |
| Packaged: | 2026-04-15 18:54:35 UTC; antro |
| Author: | Rogerio Barbosa |
| Maintainer: | Rogerio Barbosa <rogerio.barbosa@iesp.uerj.br> |
| Repository: | CRAN |
| Date/Publication: | 2026-04-21 18:42:40 UTC |
PNADCperiods: Identify Reference Periods in Brazil's PNADC Survey Data
Description
The PNADCperiods package provides tools to identify the exact reference period (month, fortnight, or week) of each observation in Brazil's official quarterly household survey, PNADC (Pesquisa Nacional por Amostra de Domicilios Continua, IBGE), enabling analysis of the survey data at sub-quarterly granularity.
Details
The package offers four main capabilities:
- Reference period identification: Determines which temporal period (month, fortnight, or week) within each quarter each survey observation refers to, using IBGE's "Parada Técnica" rules and respondent birthdates. The identification is nested by construction: fortnights require months, and weeks require fortnights.
- Period-specific weight calibration: Adjusts survey weights for sub-quarterly estimates using adaptive hierarchical rake weighting (4/2/1 cell levels for month/fortnight/week respectively).
- Experimental strategies: Probabilistic assignment and UPA aggregation to boost fortnight/week determination rates for sensitivity analysis.
- SIDRA series mensalization: Converts IBGE's rolling quarterly (trimestre móvel) aggregate series to exact monthly estimates.
Determination rates (strict, full series 2012-2025):
Monthly: ~97%
Fortnight: ~9%
Week: ~3%
With smaller datasets, rates may differ (e.g., 8 quarters: ~94%). Experimental strategies (probabilistic + UPA aggregation) further improve these determination rates.
The package is optimized for large datasets (~450,000 rows/sec) and uses pre-computed lookup tables for 20x faster date creation.
Note: Strict fortnight and week determination rates are inherently low because they cannot leverage cross-quarter aggregation like months can. Only birthday constraints within a single quarter are available to narrow the interview window.
The main functions are:
- pnadc_identify_periods: Builds a crosswalk containing month, fortnight, and week reference periods with IBGE calendar-based dates
- pnadc_apply_periods: Applies the crosswalk to any PNADC dataset and optionally calibrates weights
- pnadc_experimental_periods: Applies experimental strategies (probabilistic, UPA aggregation) for improved fortnight/week determination
- get_sidra_series_metadata: Lists 86+ available PNADC rolling quarter series from SIDRA
- fetch_sidra_rolling_quarters: Downloads rolling quarterly data from IBGE SIDRA API
- mensalize_sidra_series: Converts rolling quarters to exact monthly estimates
Author(s)
Rogerio Barbosa rogerio.barbosa@iesp.uerj.br (R package, dashboard, and website; Ceres-IESP/UERJ)
Marcos Hecksher mdhecksher@gmail.com (mensalization methodology; Ipea)
References
HECKSHER, Marcos. "Valor Impreciso por Mes Exato: Microdados e Indicadores Mensais Baseados na Pnad Continua". IPEA - Nota Tecnica Disoc, n. 62. Brasilia, DF: IPEA, 2020. https://portalantigo.ipea.gov.br/portal/index.php?option=com_content&view=article&id=35453
HECKSHER, M. "Cinco meses de perdas de empregos e simulacao de um incentivo a contratacoes". IPEA - Nota Tecnica Disoc, n. 87. Brasilia, DF: IPEA, 2020.
HECKSHER, Marcos. "Mercado de trabalho: A queda da segunda quinzena de marco, aprofundada em abril". IPEA - Carta de Conjuntura, v. 47, p. 1-6, 2020.
IBGE. Manual Basico da Entrevista PNADC (methodology on "Parada Tecnica").
See Also
Useful links:
Report bugs at https://github.com/antrologos/PNADCperiods/issues
Get Month Position in Quarter (mesnotrim)
Description
Given a month (1-12), returns its position in the rolling quarter (1, 2, or 3).
January, April, July, October = position 1
February, May, August, November = position 2
March, June, September, December = position 3
Usage
.get_mesnotrim(month)
Arguments
month |
Integer. Month number (1-12). |
Value
Integer. Position in quarter (1, 2, or 3).
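The mapping above amounts to modular arithmetic on the month number. A minimal sketch in base R, assuming a hypothetical helper name month_position (the internal .get_mesnotrim may be implemented differently):

```r
# Hypothetical helper for illustration; not the package's internal code.
month_position <- function(month) {
  stopifnot(all(month %in% 1:12))
  # Months 1, 4, 7, 10 -> 1; 2, 5, 8, 11 -> 2; 3, 6, 9, 12 -> 3
  ((as.integer(month) - 1L) %% 3L) + 1L
}

month_position(c(1, 5, 12))  # 1 2 3
```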
Clear All SIDRA Caches
Description
Clears all cached SIDRA data (both rolling quarter series and population). Use this if you need to force a fresh download, for example after IBGE updates their data.
Usage
clear_sidra_cache()
Value
Invisibly returns TRUE if any cache was cleared, FALSE if all empty.
Examples
clear_sidra_cache()
Compute Starting Points from Microdata
Description
For advanced users who want to compute custom starting points using their own calibrated microdata estimates.
Usage
compute_series_starting_points(
monthly_estimates,
rolling_quarters,
calibration_start = NULL,
calibration_end = NULL,
scale_factor = 1000,
use_series_specific_periods = TRUE,
verbose = TRUE
)
Arguments
monthly_estimates |
data.table with columns:
|
rolling_quarters |
data.table from |
calibration_start |
Integer. Start of calibration period (YYYYMM). Default NULL uses .PNADC_DATES$DEFAULT_CALIB_START (201301). Note: CNPJ series automatically use CNPJ_CALIB_START (201601) regardless. |
calibration_end |
Integer. End of calibration period (YYYYMM). Default NULL uses .PNADC_DATES$DEFAULT_CALIB_END (201912). |
scale_factor |
Numeric. Scale factor for z_ values (usually 1000). Default 1000. |
use_series_specific_periods |
Logical. If TRUE (default), use series-specific calibration periods for CNPJ series (201601-201912) and cumsum starting dates (201510). Set to FALSE to use uniform calibration for all series. |
verbose |
Logical. Print progress? Default TRUE. |
Details
The starting points (y0) are computed by:
Calculating cumulative variations from SIDRA rolling quarters
Computing backprojection: e0 = z / scale_factor - cum
Averaging e0 by mesnotrim over the calibration period
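The three steps above can be sketched in base R on a toy series. Everything here is an assumption made for illustration: the numbers are simulated, the "microdata" estimate z is simply the true monthly level, and scale_factor is set to 1. It is not the package's implementation.

```r
# Toy inputs (simulated values, for illustration only).
set.seed(1)
n <- 24
month_true <- 100 + cumsum(rnorm(n))       # "true" monthly levels
mesnotrim  <- rep(1:3, length.out = n)     # month position 1, 2, 3
# Rolling-quarter average ending at each month (NA for the first two months)
rq <- as.numeric(stats::filter(month_true, rep(1 / 3, 3), sides = 1))

# Step 1: 3-month variations, cumulated separately within each month position
d3 <- 3 * c(NA, diff(rq))
d3[is.na(d3)] <- 0                         # no variation known for the first three months
cum <- ave(d3, mesnotrim, FUN = cumsum)

# Step 2: backprojection e0 = z / scale_factor - cum
z <- month_true                            # stand-in for calibrated microdata estimates
scale_factor <- 1
e0 <- z / scale_factor - cum

# Step 3: average e0 by month position to obtain y0
y0 <- tapply(e0, mesnotrim, mean)
y0
```

Because z is exact in this toy setup, each y0 equals the level of the first month in its position; with real microdata estimates, averaging over the calibration period smooths out sampling noise.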
Value
data.table with columns:
- series_name
Character. Series name
- mesnotrim
Integer. Month position (1, 2, or 3)
- y0
Numeric. Starting point value
Series-Specific Handling
When use_series_specific_periods = TRUE, the following series receive
special handling for series-specific data availability:
- CNPJ series
empregadorcomcnpj, empregadorsemcnpj, contapropriacomcnpj, contapropriasemcnpj use calibration period 201601-201912 and cumsum starts from 201510 (when V4019 became available)
Examples
## Not run:
rq <- fetch_sidra_rolling_quarters()
z_agg <- compute_z_aggregates(calibrated_data)
y0 <- compute_series_starting_points(z_agg, rq)
monthly <- mensalize_sidra_series(rq, starting_points = y0)
## End(Not run)
Compute Starting Points from Raw PNADC Microdata
Description
Complete workflow to compute y0 starting points from raw PNADC microdata. This is a convenience wrapper that combines period identification, weight calibration, z_ aggregation, and starting point computation.
Usage
compute_starting_points_from_microdata(
data,
calibration_start = NULL,
calibration_end = NULL,
verbose = TRUE
)
Arguments
data |
Stacked PNADC microdata (multiple quarters). Must contain variables
for period identification (see |
calibration_start |
Integer. Start of calibration period (YYYYMM). Default NULL uses .PNADC_DATES$DEFAULT_CALIB_START (201301). |
calibration_end |
Integer. End of calibration period (YYYYMM). Default NULL uses .PNADC_DATES$DEFAULT_CALIB_END (201912). |
verbose |
Print progress messages. |
Details
This function performs the complete workflow:
Build crosswalk via pnadc_identify_periods()
Calibrate weights via pnadc_apply_periods()
Compute z_ aggregates via compute_z_aggregates()
Fetch SIDRA rolling quarters
Compute starting points via compute_series_starting_points()
Value
data.table with columns:
- series_name
Character. Series name
- mesnotrim
Integer. Month position (1, 2, or 3)
- y0
Numeric. Starting point value
Weight Calibration
All months are scaled uniformly to SIDRA monthly population totals.
See Also
pnadc_apply_periods for the weight calibration step
compute_z_aggregates for the z_ aggregation step
compute_series_starting_points for the y0 computation
pnadc_identify_periods for period identification
Examples
## Not run:
stacked <- fst::read_fst("pnadc_stacked.fst", as.data.table = TRUE)
y0 <- compute_starting_points_from_microdata(stacked)
bundled <- pnadc_series_starting_points
comparison <- merge(y0, bundled, by = c("series_name", "mesnotrim"))
## End(Not run)
Compute z_ Aggregates from Monthly Microdata
Description
Computes monthly z_ aggregates from PNADC microdata using calibrated monthly weights, with options for different population scaling approaches.
Usage
compute_z_aggregates(calibrated_data, verbose = TRUE)
Arguments
calibrated_data |
PNADC microdata output from |
verbose |
Print progress messages. |
Details
This function creates z_ indicator variables and aggregates them using the
calibrated weight_monthly from pnadc_apply_periods().
The pnadc_apply_periods() function implements the calibration
methodology as follows:
All months are scaled uniformly to SIDRA monthly population totals.
This function simply aggregates the indicators using the already-calibrated weights.
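As a rough illustration of that aggregation step, the sketch below sums an indicator times the calibrated weight within each month. The toy data, the occupied column, and the z_popocup name are assumptions for the example; only ref_month_yyyymm, weight_monthly, and anomesexato are names taken from this manual.

```r
# Toy microdata (invented values); `occupied` is a hypothetical indicator source.
toy <- data.frame(
  ref_month_yyyymm = c(202301L, 202301L, 202302L, 202302L, 202302L),
  weight_monthly   = c(1000, 1500, 1200, 800, 1000),
  occupied         = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Weighted sum of the indicator within each exact month
z_popocup <- tapply(toy$weight_monthly * toy$occupied, toy$ref_month_yyyymm, sum)

z_agg <- data.frame(
  anomesexato = as.integer(names(z_popocup)),
  z_popocup   = as.numeric(z_popocup)
)
z_agg
```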
Value
data.table with columns:
- anomesexato
Integer YYYYMM month
- z_*
Numeric weighted aggregates for each series (one column per series)
See Also
pnadc_apply_periods for the calibration step
compute_series_starting_points for the y0 computation
Examples
## Not run:
crosswalk <- pnadc_identify_periods(stacked_data)
calibrated <- pnadc_apply_periods(stacked_data, crosswalk,
weight_var = "V1028",
anchor = "quarter",
calibration_unit = "month")
z_agg <- compute_z_aggregates(calibrated)
rq <- fetch_sidra_rolling_quarters()
y0 <- compute_series_starting_points(z_agg, rq)
## End(Not run)
Fetch Monthly Population from SIDRA
Description
Functions to download and transform IBGE's population estimates from SIDRA API for use in monthly weight calibration.
Fetch SIDRA Rolling Quarter Series
Description
Functions to download PNADC labor market indicators from IBGE's SIDRA API. These series are published as rolling quarterly averages (trimestre movel).
Fetch Monthly Population from SIDRA
Description
Downloads population estimates from IBGE SIDRA API (table 6022) and transforms them from moving-quarter to exact monthly values.
Usage
fetch_monthly_population(
start_yyyymm = NULL,
end_yyyymm = NULL,
verbose = TRUE,
use_cache = FALSE,
cache_max_age_hours = 24
)
Arguments
start_yyyymm |
Integer. First month to include (YYYYMM format). If NULL, returns all available months. |
end_yyyymm |
Integer. Last month to include (YYYYMM format). If NULL, returns all available months. |
verbose |
Logical. Print progress messages? Default TRUE. |
use_cache |
Logical. If TRUE, uses cached data if available and not expired. Default FALSE (always fetch fresh data for consistency). Set to TRUE for faster repeated calls during development. |
cache_max_age_hours |
Numeric. Maximum cache age in hours before automatic expiration when use_cache=TRUE. Default 24 hours. |
Details
SIDRA table 6022 provides moving-quarter population estimates. Each value represents the 3-month average centered on the middle month. For example, the value for code 201203 (quarter ending March 2012) represents the population for February 2012.
This function:
Fetches raw moving-quarter data from SIDRA
Transforms to exact monthly values by aligning with middle months
Extrapolates boundary months (first and last) using quadratic regression
The extrapolation uses quadratic regression on population differences to estimate the first month (Jan 2012) and the most recent month.
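The middle-month alignment can be pictured as relabeling each moving-quarter code by the month before its ending month. A small sketch, assuming a hypothetical helper name middle_month and ignoring the boundary extrapolation:

```r
# Illustrative only: map a moving-quarter ending code (YYYYMM) to its middle month.
middle_month <- function(end_yyyymm) {
  # Months counted from year 0, then stepped back by one month
  idx <- (end_yyyymm %/% 100L) * 12L + (end_yyyymm %% 100L - 1L) - 1L
  (idx %/% 12L) * 100L + (idx %% 12L) + 1L
}

middle_month(c(201203L, 201201L))  # 201202 201112
```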
Value
A data.table with columns:
- ref_month_yyyymm: Integer in YYYYMM format
- m_populacao: Monthly population in thousands
Dependencies
This function requires the sidrar package for API access.
Install with: install.packages("sidrar")
See Also
pnadc_apply_periods which uses this function when
calibrate = TRUE
Examples
pop <- fetch_monthly_population()
pop <- fetch_monthly_population(201301, 201912)
Fetch Rolling Quarter Series from SIDRA
Description
Downloads PNADC labor market indicators from IBGE's SIDRA API. These series are published as rolling quarterly averages (trimestre movel), with 12 observations per year.
Usage
fetch_sidra_rolling_quarters(
series = "all",
theme = NULL,
theme_category = NULL,
subcategory = NULL,
exclude_derived = FALSE,
use_cache = FALSE,
verbose = TRUE,
retry_failed = TRUE,
max_retries = 3
)
Arguments
series |
Character vector of series names to fetch, or "all" (default)
for all available series. Use |
theme |
Character vector of themes to filter by. Valid options: "labor_market", "earnings", "demographics", "social_protection", "prices". Use NULL for no filter. |
theme_category |
Character vector of theme categories to filter by. Use NULL for no filter. |
subcategory |
Character vector of subcategories to filter by. Use NULL for no filter. |
exclude_derived |
Logical. If TRUE, exclude series marked as derived (is_derived = TRUE in metadata). Default FALSE for backward compatibility. Derived series (rates) are computed from other series during mensalization, so excluding them saves API calls when fetching for mensalization. |
use_cache |
Logical. Use cached data if available? Default FALSE.
When TRUE, shows the date when data was cached (may be outdated).
Use |
verbose |
Logical. Print progress messages? Default TRUE. |
retry_failed |
Logical. Retry failed series downloads? Default TRUE. |
max_retries |
Integer. Maximum retry attempts per series. Default 3. |
Details
Rolling quarters are labeled by their ending month:
201201 = Nov 2011 - Jan 2012 (mesnotrim = 1)
201202 = Dec 2011 - Feb 2012 (mesnotrim = 2)
201203 = Jan - Mar 2012 (mesnotrim = 3)
201204 = Feb - Apr 2012 (mesnotrim = 1)
etc.
The mesnotrim column indicates the month's position within its rolling
quarter, which is essential for the mensalization algorithm.
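To make the labeling concrete, the sketch below lists the three months covered by a rolling quarter given its ending-month code. The helper name rolling_quarter_months is hypothetical and handles one code at a time.

```r
# Illustrative only: months covered by one rolling quarter (scalar YYYYMM input).
rolling_quarter_months <- function(end_yyyymm) {
  year  <- end_yyyymm %/% 100L
  month <- end_yyyymm %% 100L
  # Months counted from year 0; step back 2, 1, and 0 months from the ending month
  idx <- year * 12L + (month - 1L) - 2:0
  (idx %/% 12L) * 100L + (idx %% 12L) + 1L
}

rolling_quarter_months(201201L)  # 201111 201112 201201
rolling_quarter_months(201204L)  # 201202 201203 201204
```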
Value
A data.table with columns:
- anomesfinaltrimmovel
Integer. YYYYMM of rolling quarter end month
- mesnotrim
Integer. Month position in quarter (1, 2, or 3)
- <series_name>
Numeric. One column per requested series
Rate Limiting
SIDRA API may have rate limits. The function includes automatic retry logic with exponential backoff for failed requests.
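For readers unfamiliar with the pattern, a generic retry loop with exponential backoff might look like the sketch below. The helper with_retry and its base_delay argument are hypothetical; the package's own retry logic may differ in delays and error handling.

```r
# Hypothetical helper illustrating retry with exponential backoff.
with_retry <- function(fun, max_retries = 3, base_delay = 1) {
  for (attempt in seq_len(max_retries + 1)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt > max_retries) stop(result)      # give up after the last retry
    Sys.sleep(base_delay * 2^(attempt - 1))      # wait 1, 2, 4, ... times base_delay
  }
}

# A function that fails twice and then succeeds
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("temporary failure")
  "ok"
}
with_retry(flaky, base_delay = 0)
```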
See Also
get_sidra_series_metadata for available series names and metadata
mensalize_sidra_series to convert to exact months
Examples
rq <- fetch_sidra_rolling_quarters(
series = c("taxadesocup", "popocup", "popdesocup")
)
head(rq)
rq_labor <- fetch_sidra_rolling_quarters(theme = "labor_market")
Get SIDRA Series Metadata
Description
Returns a data.table with metadata for all PNADC rolling quarter series available from IBGE's SIDRA API.
Usage
get_sidra_series_metadata(
series = "all",
theme = NULL,
theme_category = NULL,
subcategory = NULL,
lang = "pt"
)
Arguments
series |
Character vector of series names to retrieve, or "all" (default) for all series. |
theme |
Character vector of themes to filter by. Valid themes: "labor_market", "earnings", "demographics", "social_protection", "prices". Use NULL (default) for no filtering. |
theme_category |
Character vector of theme categories to filter by. Use NULL (default) for no filtering. |
subcategory |
Character vector of subcategories to filter by. Use NULL (default) for no filtering. |
lang |
Character. Language for descriptions: "pt" (Portuguese, default)
or "en" (English). When "en", the |
Value
A data.table with columns:
- series_name
Character. Internal name used in the package
- api_path
Character. SIDRA API path for get_sidra()
- table_id
Integer. SIDRA table number
- variable_id
Integer. SIDRA variable code
- theme
Character. Top-level theme
- theme_category
Character. Middle-level category within theme
- subcategory
Character. Optional subcategory for filtering
- description_pt
Character. Portuguese description
- description_en
Character. English description
- description
Character. Description in the requested language
- unit
Character. Unit of measurement
- unit_label_pt
Character. Unit label in Portuguese
- unit_label_en
Character. Unit label in English
- is_derived
Logical. TRUE if computed from other series
- requires_deflation
Logical. TRUE if needs IPCA deflation
Examples
meta <- get_sidra_series_metadata()
head(meta)
labor <- get_sidra_series_metadata(theme = "labor_market")
unemp <- get_sidra_series_metadata(theme = "labor_market",
theme_category = "unemployment")
meta_en <- get_sidra_series_metadata(series = c("taxadesocup", "popocup"),
lang = "en")
Mensalize SIDRA Rolling Quarter Series
Description
Functions to convert IBGE's rolling quarterly (trimestre movel) series into exact monthly estimates.
Convert Rolling Quarters to Exact Monthly Series
Description
Transforms SIDRA rolling quarterly averages into exact monthly values using the mathematical relationship between consecutive rolling quarters.
Usage
mensalize_sidra_series(
rolling_quarters,
starting_points = NULL,
series = "all",
compute_derived = TRUE,
verbose = TRUE
)
Arguments
rolling_quarters |
data.table from |
starting_points |
Optional data.table with precomputed starting points
(y0 values). If NULL (default), uses bundled |
series |
Character vector of series names to mensalize, or "all" (default) for all series in the input data (except price indices). |
compute_derived |
Logical. Compute derived series (rates, aggregates)? Default TRUE. |
verbose |
Logical. Print progress messages? Default TRUE. |
Details
The algorithm exploits the mathematical property of rolling quarterly averages:
RQ_t - RQ_{t-1} = (Month_t - Month_{t-3}) / 3
This means exact 3-month variations can be extracted from consecutive rolling quarters. By accumulating these variations separately for each month-position (1, 2, or 3), we build cumulative variation series. The only unknown is the starting level for Jan, Feb, and Mar 2012.
Starting points are estimated by:
Computing monthly estimates from calibrated microdata (z_ variables)
Calculating cumulative variations from SIDRA (cum_ variables)
Backprojecting: e0 = z_ - cum_ over calibration period (2013-2019)
Averaging e0 by month position to get y0_ for each position
Final adjustment ensures the average of 3 consecutive mensalized values equals the original rolling quarter value.
Value
A data.table with columns:
- anomesexato
Integer. YYYYMM exact month
- m_*
Numeric. Mensalized value for each series (one column per series)
Starting Points Format
If providing custom starting points, the data.table must have columns:
- series_name: Character. Series name matching rolling_quarters columns
- mesnotrim: Integer (1, 2, or 3). Month position in quarter
- y0: Numeric. Starting point value
Mathematical Foundation
The mensalization algorithm proceeds in steps:
Calculate d3 = 3 * (RQ_t - RQ_{t-1})
Separate d3 by month position: d3m1, d3m2, d3m3
Cumulate separately: cum1, cum2, cum3
Apply starting points: y = y0 + cum
Final adjustment for rolling quarter consistency
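Steps 1 to 4 above can be sketched in a few lines of base R. This is an illustration under simplifying assumptions, not the package's implementation: the series starts at month position 1, the rolling-quarter vector is aligned to months with NA for the first two, the starting points are known exactly, and the final adjustment step is omitted. The function name mensalize_sketch is hypothetical.

```r
# Illustrative sketch of steps 1-4 (final adjustment omitted).
mensalize_sketch <- function(rq, y0) {
  pos <- rep(1:3, length.out = length(rq))  # month position; series starts at position 1
  d3  <- 3 * c(NA, diff(rq))                # step 1: exact 3-month variations
  d3[is.na(d3)] <- 0                        # nothing known for the first three months
  cum <- ave(d3, pos, FUN = cumsum)         # steps 2-3: cumulate within each position
  y0[pos] + cum                             # step 4: add the starting level
}

# Toy monthly series and the rolling quarters it implies
month_true <- c(10, 12, 11, 13, 14, 12, 15, 16, 13)
rq <- as.numeric(stats::filter(month_true, rep(1 / 3, 3), sides = 1))

monthly <- mensalize_sketch(rq, y0 = month_true[1:3])
all.equal(monthly, month_true)  # TRUE
```

With exact starting points the monthly series is recovered in full, which shows why the starting levels are the only unknown in the method.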
See Also
fetch_sidra_rolling_quarters to obtain input data
compute_series_starting_points for custom calibration
Examples
rq <- fetch_sidra_rolling_quarters(
series = c("taxadesocup", "popocup", "popdesocup")
)
monthly <- mensalize_sidra_series(rq)
head(monthly)
Apply Reference Period Crosswalk to PNADC Data
Description
This function takes a crosswalk from pnadc_identify_periods and
applies it to any PNADC dataset (quarterly or annual). It can optionally
calibrate the survey weights to match external population totals at the
chosen temporal granularity (month, fortnight, or week).
Usage
pnadc_apply_periods(
data,
crosswalk,
weight_var,
anchor,
calibrate = TRUE,
calibration_unit = c("month", "fortnight", "week"),
calibration_min_cell_size = 1,
target_totals = NULL,
smooth = FALSE,
keep_all = TRUE,
verbose = TRUE
)
Arguments
data |
A data.frame or data.table with PNADC microdata. Must contain
join keys |
crosswalk |
A data.table crosswalk from |
weight_var |
Character. Name of the survey weight column. Must be specified:
|
anchor |
Character. How to anchor the weight redistribution. Must be specified:
|
calibrate |
Logical. If TRUE (default), calibrate weights to external population totals. If FALSE, only merge the crosswalk without calibration. |
calibration_unit |
Character. Temporal unit for weight calibration.
One of |
calibration_min_cell_size |
Integer. Minimum sample size required in a cell for it to be used in hierarchical raking. Cells smaller than this threshold are collapsed to coarser levels. Default: 1 (use all cells). |
target_totals |
Optional data.table with population targets. If NULL (default), fetches monthly population from SIDRA and derives targets for fortnight/week. Each time period (month, fortnight, or week) is calibrated to the FULL Brazilian population from SIDRA. If providing custom targets, the population column ( |
smooth |
Logical. If TRUE, smooth calibrated weights to remove quarterly artifacts. Smoothing is adapted per time period: monthly (3-period window), fortnight (7-period window), weekly (no smoothing). Default: FALSE. |
keep_all |
Logical. If TRUE (default), keep all observations including those with undetermined reference periods. If FALSE, drop undetermined rows. |
verbose |
Logical. If TRUE (default), print progress messages. |
Details
Merges a reference period crosswalk with PNADC microdata and optionally calibrates survey weights for sub-quarterly analysis.
Weight Calibration
When calibrate = TRUE, the function performs hierarchical rake weighting:
Groups observations by nested demographic/geographic cells
Iteratively adjusts weights so sub-period totals match anchor-period totals
Calibrates final weights against external population totals (FULL Brazilian population)
Optionally smooths weights to remove quarterly artifacts
Population Targets
All time periods (months, fortnights, and weeks) are calibrated to the FULL Brazilian population from SIDRA. This means:
Monthly weights sum to the Brazilian population for that month
Fortnight weights sum to the Brazilian population for the containing month
Weekly weights sum to the Brazilian population for the containing month
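The final scaling to population targets can be pictured as a per-period ratio adjustment. The sketch below uses invented weights and targets and covers only this last step, not the hierarchical raking that precedes it.

```r
# Toy weights and monthly population targets (invented numbers).
month  <- c(202301, 202301, 202302, 202302, 202302)
weight <- c(40, 60, 30, 30, 40)
target <- c("202301" = 200, "202302" = 210)

current <- tapply(weight, month, sum)      # current weighted total per month
key <- as.character(month)
# Scale each weight by target / current total of its month
calibrated <- weight * as.numeric(target[key] / current[key])

tapply(calibrated, month, sum)             # equals the targets: 200 and 210
```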
Hierarchical Raking Levels
The number of hierarchical cell levels is automatically adjusted based on the calibration unit to avoid sparse cell issues:
- "month": 4 levels (age, region, state, post-stratum) - full hierarchy
- "fortnight": 2 levels (age, region) - simplified for lower sample size
- "week": 1 level (age groups only) - minimal hierarchy for sparse data
Anchor Period
The anchor parameter determines how weights are redistributed:
- "quarter": Quarterly totals are preserved and redistributed to months/fortnights/weeks
- "year": Yearly totals are preserved and redistributed to months/fortnights/weeks
Use anchor = "quarter" with quarterly V1028 weights, and
anchor = "year" with annual V1032 weights.
Value
A data.table with the input data plus crosswalk columns:
- ref_month_in_quarter, ref_month_in_year
Month position (1-3 in quarter, 1-12 in year)
- ref_fortnight_in_month, ref_fortnight_in_quarter
Fortnight position (1-2 in month, 1-6 in quarter)
- ref_week_in_month, ref_week_in_quarter
Week position (1-4 in month, 1-12 in quarter)
- ref_month_yyyymm, ref_fortnight_yyyyff, ref_week_yyyyww
Integer period codes
- determined_month, determined_fortnight, determined_week
Logical determination flags
- weight_monthly, weight_fortnight, or weight_weekly
Calibrated weights (if calibrate=TRUE)
See Also
pnadc_identify_periods to build the crosswalk
Examples
## Not run:
crosswalk <- pnadc_identify_periods(pnadc_stacked)
result <- pnadc_apply_periods(
pnadc_2023,
crosswalk,
weight_var = "V1028",
anchor = "quarter"
)
result <- pnadc_apply_periods(
pnadc_annual,
crosswalk,
weight_var = "V1032",
anchor = "year"
)
result <- pnadc_apply_periods(
pnadc_2023,
crosswalk,
weight_var = "V1028",
anchor = "quarter",
calibration_unit = "week"
)
result <- pnadc_apply_periods(
pnadc_2023,
crosswalk,
weight_var = "V1028",
anchor = "quarter",
calibrate = FALSE
)
## End(Not run)
Experimental Period Identification Strategies
Description
Three experimental strategies are available, all properly nested by period:
- probabilistic: For narrow ranges (2 possible periods), classifies based on where most of the date interval falls. Assigns only when confidence exceeds threshold.
- upa_aggregation: Extends strictly identified periods to other observations in the same UPA-V1014 within the quarter, if a sufficient proportion already have strict identification.
- both: Sequentially applies probabilistic strategy first, then UPA aggregation on top. Guarantees identification rate >= max of individual strategies.
Usage
pnadc_experimental_periods(
crosswalk,
strategy = c("probabilistic", "upa_aggregation", "both"),
confidence_threshold = 0.9,
upa_proportion_threshold = 0.5,
verbose = TRUE
)
Arguments
crosswalk |
A crosswalk data.table from |
strategy |
Character specifying which strategy to apply. Options: "probabilistic", "upa_aggregation", "both" |
confidence_threshold |
Numeric (0-1). Minimum confidence required to assign a probabilistic period. Used by probabilistic and combined strategies. Default 0.9. |
upa_proportion_threshold |
Numeric (0-1). Minimum proportion of UPA observations (within quarter) that must have strict identification with consensus for extending to unidentified observations. Default 0.5. |
verbose |
Logical. If TRUE, print progress information. |
Details
Provides experimental strategies for improving period identification rates beyond the standard deterministic algorithm. All strategies respect the nested identification hierarchy: weeks require fortnights, fortnights require months.
Nesting Enforcement
All strategies enforce proper nesting:
Fortnights can only be assigned if month is identified (strictly OR experimentally)
Weeks can only be assigned if fortnight is identified (strictly OR experimentally)
Probabilistic Strategy
For each period type (processed in order: months, then fortnights, then weeks):
Check that the required parent period is identified
If bounds are narrowed to exactly 2 sequential periods, calculate which period contains most of the date interval
Calculate confidence based on the proportion of interval in the likely period (0-1)
Only assign if confidence >= confidence_threshold
For months: aggregates at UPA-V1014 level across all quarters (like strict algorithm).
For fortnights and weeks: works at household level within quarter.
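The confidence rule can be illustrated with a date interval that straddles two sequential periods. In the sketch below, the function name, its arguments, and the day-counting rule are assumptions made for the example; the package may measure the interval differently.

```r
# Illustrative confidence calculation; `boundary` is the first day of the later period.
assign_probable <- function(lower, upper, boundary, threshold = 0.9) {
  days <- seq(lower, upper, by = "day")
  share_later <- mean(days >= boundary)        # proportion of days in the later period
  confidence  <- max(share_later, 1 - share_later)
  period <- if (share_later >= 0.5) "later" else "earlier"
  list(
    period = if (confidence >= threshold) period else NA_character_,
    confidence = confidence
  )
}

# 19 of the 20 possible days fall in the later period, so it is assigned
assign_probable(as.Date("2023-01-20"), as.Date("2023-02-08"),
                boundary = as.Date("2023-01-21"))
```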
UPA Aggregation Strategy
Extends strictly identified periods based on consensus within geographic groups:
- Months: Uses UPA level within quarter
- Fortnights/Weeks: Uses UPA level within quarter (all households in same UPA are interviewed in same fortnight/week within a quarter)
Calculate proportion of observations with strictly identified period
If proportion >= upa_proportion_threshold AND consensus exists, extend
Apply in nested order: months first, then fortnights, then weeks
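A toy version of the consensus rule, for a single period type and a single quarter, is sketched below. The column names upa and strict_fortnight and the helper extend_by_consensus are hypothetical, and the nesting checks are left out.

```r
# Toy crosswalk rows (invented); NA marks an undetermined fortnight.
toy <- data.frame(
  upa = c("A", "A", "A", "A", "B", "B", "B", "B"),
  strict_fortnight = c(1, 1, 1, NA, 2, NA, NA, NA)
)

extend_by_consensus <- function(period, group, threshold = 0.5) {
  ave(period, group, FUN = function(x) {
    known <- x[!is.na(x)]
    share <- length(known) / length(x)          # proportion strictly identified
    consensus <- length(unique(known)) == 1     # all identified rows agree
    if (share >= threshold && consensus) x[is.na(x)] <- known[1]
    x
  })
}

toy$fortnight_extended <- extend_by_consensus(toy$strict_fortnight, toy$upa)
toy
```

Group A (75% identified, all agreeing) is extended to its fourth row, while group B (25% identified) stays unchanged because it falls below the threshold.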
Combined Strategy ("both")
Sequentially applies both strategies to maximize identification:
First, apply the probabilistic strategy (captures observations with narrow date ranges and high confidence)
Then, apply UPA aggregation (extends based on strict consensus within UPA/UPA-V1014 groups)
This guarantees that "both" identifies at least as many observations as either individual strategy alone. The strategies operate independently (UPA aggregation considers only strict identifications), so the result is the union of both strategies.
Integration with Weight Calibration
The output can be passed directly to pnadc_apply_periods() for weight calibration.
The derived columns combine strict and experimental assignments, with strict taking priority. Use the
probabilistic_assignment flag to filter if you only want strict determinations.
Value
A modified crosswalk with additional columns. Output is directly compatible
with pnadc_apply_periods():
- ref_month_in_quarter, ref_month_in_year, ref_month_yyyymm: Month position (combined strict + experimental, strict takes priority)
- ref_fortnight_in_month, ref_fortnight_in_quarter, ref_fortnight_yyyyff: Fortnight position (combined strict + experimental)
- ref_week_in_month, ref_week_in_quarter, ref_week_yyyyww: Week position (combined strict + experimental)
- determined_month, determined_fortnight, determined_week: TRUE if period is assigned (strictly or experimentally)
- determined_probable_month, determined_probable_fortnight, determined_probable_week: TRUE if period was assigned by probabilistic strategy
- probabilistic_assignment: TRUE if any period was assigned experimentally (vs strictly deterministic)
- week_1_start, week_1_end, ..., week_4_start, week_4_end: IBGE week boundaries for the assigned month
Note
These strategies produce "experimental" assignments, not strict determinations.
The standard pnadc_identify_periods() function should be used for
rigorous analysis. Experimental outputs are useful for:
Sensitivity analysis
Robustness checks
Research into identification algorithm improvements
See Also
pnadc_identify_periods to build the crosswalk that this function modifies.
pnadc_apply_periods to apply period crosswalk and calibrate weights.
Examples
## Not run:
crosswalk <- pnadc_identify_periods(pnadc_data)
crosswalk_exp <- pnadc_experimental_periods(
crosswalk,
strategy = "probabilistic",
confidence_threshold = 0.9
)
crosswalk_exp[, .(
strict = sum(!is.na(ref_month_in_quarter) & !probabilistic_assignment),
experimental = sum(probabilistic_assignment, na.rm = TRUE),
total = sum(determined_month)
)]
result <- pnadc_apply_periods(pnadc_data, crosswalk_exp,
weight_var = "V1028", anchor = "quarter")
strict_only <- crosswalk_exp[
probabilistic_assignment == FALSE | is.na(probabilistic_assignment)
]
## End(Not run)
Identify Reference Periods in PNADC Data
Description
PNADC is a quarterly survey, but each interview actually refers to a specific temporal period within the quarter. This function identifies which month, fortnight (quinzena), and week each observation belongs to, enabling sub-quarterly time series analysis.
The algorithm uses a nested identification approach:
-
Phase 1: Identify MONTHS for all observations using:
IBGE's reference week timing rules (the first reference week, which ends on a Saturday, must contain sufficient days)
Respondent birthdates to constrain possible interview dates
UPA-panel level aggregation across ALL quarters (panel design)
Dynamic exception detection (identifies quarters needing relaxed rules)
-
Phase 2: Identify FORTNIGHTS for month-determined observations:
Search space constrained to 2 fortnights within determined month
Household-level aggregation within each quarter
-
Phase 3: Identify WEEKS for fortnight-determined observations:
Search space constrained to ~2 weeks within determined fortnight
Household-level aggregation within each quarter
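The birthdate constraint in Phase 1 can be illustrated with a minimal base R sketch. It assumes the usual age logic: if the reported age equals the survey year minus the birth year, the birthday has already passed, otherwise it has not. The helper name birthdate_bounds and its arguments are hypothetical and not part of the package API; the package's actual rules (including IBGE exceptions) may differ.

```r
# Hypothetical helper (not part of the package API): bound the reference date
# of one respondent inside a quarter using birthdate and reported age.
birthdate_bounds <- function(year, quarter, birth_day, birth_month,
                             birth_year, age) {
  # Calendar limits of the quarter
  q_start <- as.Date(sprintf("%04d-%02d-01", year, 3L * (quarter - 1L) + 1L),
                     format = "%Y-%m-%d")
  q_end <- seq(q_start, by = "3 months", length.out = 2L)[2L] - 1L

  # Birthday in the survey year; NA for unknown codes (99) or impossible dates
  birthday <- as.Date(sprintf("%04d-%02d-%02d", year, birth_month, birth_day),
                      format = "%Y-%m-%d")

  date_min <- q_start
  date_max <- q_end
  if (!is.na(birthday)) {
    if (age == year - birth_year) {
      # Assumption: birthday already happened, so the date is on or after it
      date_min <- max(q_start, birthday)
    } else if (age == year - birth_year - 1L) {
      # Assumption: birthday still to come, so the date is strictly before it
      date_max <- min(q_end, birthday - 1L)
    }
  }
  list(date_min = date_min, date_max = date_max)
}

# Respondent born 15 March 1990, interviewed in 2023 Q1
already_33 <- birthdate_bounds(2023, 1, 15, 3, 1990, 33)
still_32   <- birthdate_bounds(2023, 1, 15, 3, 1990, 32)
unknown    <- birthdate_bounds(2023, 1, 99, 99, 1990, 33)
```

In this sketch a respondent who already reports age 33 must have been interviewed on or after 15 March, which narrows the quarter to its third month; an unknown birthdate leaves the full quarter as the search space. Inconsistent inputs (for example, a birthday outside the quarter that contradicts the reported age) are not handled here.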
Usage
pnadc_identify_periods(data, verbose = TRUE, store_date_bounds = FALSE)
Arguments
data |
A data.frame or data.table with PNADC microdata. Required columns: Ano, Trimestre, UPA, V1008, V1014, V2008, V20081, V20082, V2009
Optional but recommended:
|
verbose |
Logical. If TRUE (default), display progress information. |
store_date_bounds |
Logical. If TRUE, stores date bounds and exception
flags in the crosswalk for optimization when calling
|
Details
Builds a crosswalk containing reference periods (month, fortnight, and week) for PNADC survey data based on IBGE's interview timing rules.
Temporal Granularity
The crosswalk contains three levels of temporal granularity:
-
Month: 3 per quarter, ~97% determination rate
-
Fortnight (quinzena): 6 per quarter, ~9% determination rate
-
Week: 12 per quarter, ~3% determination rate
Cross-Quarter Aggregation (Important!)
For optimal month determination rates, input data should be stacked across multiple quarters (ideally 4+ years). The algorithm leverages PNADC's rotating panel design, in which each UPA-V1014 pair is interviewed in the same relative month position in every quarterly visit, so information from one quarter can determine the month in the others.
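A minimal sketch of the stacking step, using base R and mock data in place of real PNADC files. The make_quarter helper and its values are illustrative only; with data.table loaded, data.table::rbindlist() would be the usual equivalent of the rbind call.

```r
# Mock quarterly extracts standing in for real PNADC files (illustrative values)
make_quarter <- function(year, quarter) {
  data.frame(Ano = year, Trimestre = quarter, UPA = 110000001L,
             V1008 = 1L, V1014 = 1L)
}
quarters <- list(
  make_quarter(2022L, 4L),
  make_quarter(2023L, 1L),
  make_quarter(2023L, 2L)
)

# Stack all quarters into a single table before identification
pnadc_stacked <- do.call(rbind, quarters)

# The same UPA-V1014 pair now appears once per quarterly visit
visits <- aggregate(Trimestre ~ UPA + V1014, data = pnadc_stacked, FUN = length)

# The stacked table is what would be passed on (not run here):
# crosswalk <- pnadc_identify_periods(pnadc_stacked)
```

Passing a single quarter still works, but the month step then has only one visit per UPA-V1014 pair to draw on.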
Fortnight Definition
Fortnights are numbered 1-6 per quarter (2 per month), based on the IBGE reference week calendar (not calendar days). Each IBGE "month" consists of exactly 4 reference weeks (28 days), starting on a Sunday:
Fortnight 1 in month: IBGE weeks 1-2 (days 1-14 of the IBGE month)
Fortnight 2 in month: IBGE weeks 3-4 (days 15-28 of the IBGE month)
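The numbering above implies simple position arithmetic, sketched here in base R. The function names are hypothetical helpers for illustration; the week-in-quarter formula is inferred from the 1-12 range and the 4-weeks-per-month rule.

```r
# Hypothetical helpers illustrating the IBGE position arithmetic

# Weeks 1-2 of the IBGE month form fortnight 1, weeks 3-4 form fortnight 2
fortnight_in_month <- function(week_in_month) {
  ifelse(week_in_month <= 2L, 1L, 2L)
}

# Two fortnights per month, so positions run 1-6 within the quarter
fortnight_in_quarter <- function(month_in_quarter, fortnight_in_month) {
  (month_in_quarter - 1L) * 2L + fortnight_in_month
}

# Four weeks per month, so positions run 1-12 within the quarter
week_in_quarter <- function(month_in_quarter, week_in_month) {
  (month_in_quarter - 1L) * 4L + week_in_month
}

fortnight_in_month(1:4)        # 1 1 2 2
fortnight_in_quarter(3L, 2L)   # 6
week_in_quarter(2L, 3L)        # 7
```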
Value
A data.table crosswalk with columns:
- Ano, Trimestre, UPA, V1008, V1014
Join keys (year, quarter, UPA, household, panel)
- ref_month_in_quarter
Integer. Month position in quarter (1, 2, 3) or NA
- ref_month_in_year
Integer. Month position in year (1-12) or NA
- ref_fortnight_in_month
Integer. Fortnight position in month (1 or 2) or NA
- ref_fortnight_in_quarter
Integer. Fortnight position in quarter (1-6) or NA
- ref_week_in_month
Integer. Week position in month (1-4) or NA
- ref_week_in_quarter
Integer. Week position in quarter (1-12) or NA
- date_min
Date. Lower bound of the interview reference date for the individual. Only returned if store_date_bounds = TRUE
- date_max
Date. Upper bound of the interview reference date for the individual. Only returned if store_date_bounds = TRUE
- week_1_start
Date. Sunday of the IBGE first reference week of the month. Only returned if store_date_bounds = TRUE
- week_1_end
Date. Saturday of the IBGE first reference week of the month. Only returned if store_date_bounds = TRUE
- week_2_start
Date. Sunday of the IBGE second reference week of the month. Only returned if store_date_bounds = TRUE
- week_2_end
Date. Saturday of the IBGE second reference week of the month. Only returned if store_date_bounds = TRUE
- week_3_start
Date. Sunday of the IBGE third reference week of the month. Only returned if store_date_bounds = TRUE
- week_3_end
Date. Saturday of the IBGE third reference week of the month. Only returned if store_date_bounds = TRUE
- week_4_start
Date. Sunday of the IBGE fourth reference week of the month. Only returned if store_date_bounds = TRUE
- week_4_end
Date. Saturday of the IBGE fourth reference week of the month. Only returned if store_date_bounds = TRUE
- month_max_upa
Integer. Maximum month position across UPA-V1014 group (for debugging). Only returned if store_date_bounds = TRUE
- month_min_upa
Integer. Minimum month position across UPA-V1014 group (for debugging). Only returned if store_date_bounds = TRUE
- fortnight_max_hh
Integer. Maximum fortnight position within household (for debugging). Only returned if store_date_bounds = TRUE
- fortnight_min_hh
Integer. Minimum fortnight position within household (for debugging). Only returned if store_date_bounds = TRUE
- week_min_hh
Integer. Minimum week position within household (for debugging). Only returned if store_date_bounds = TRUE
- week_max_hh
Integer. Maximum week position within household (for debugging). Only returned if store_date_bounds = TRUE
- ref_month_yyyymm
Integer. Identified reference month in the format YYYYMM, where MM follows the IBGE calendar. 1 <= MM <= 12
- ref_fortnight_yyyyff
Integer. Identified reference fortnight in the format YYYYFF, where FF follows the IBGE calendar. 1 <= FF <= 24
- ref_week_yyyyww
Integer. Identified reference week in the format YYYYWW, where WW follows the IBGE calendar. 1 <= WW <= 48
- determined_month
Logical. Flags if the month was determined.
- determined_fortnight
Logical. Flags if the fortnight was determined.
- determined_week
Logical. Flags if the week was determined.
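The YYYYMM, YYYYFF, and YYYYWW codes can be split with integer arithmetic. The sketch below uses hypothetical helper names; the mapping from FF and WW back to a month assumes the positions are numbered consecutively within the year (2 fortnights and 4 weeks per month), which is consistent with the stated ranges but not spelled out above.

```r
# Hypothetical helper: split a YYYYPP code into year and within-year position
decode_period <- function(code) {
  list(year = code %/% 100L, position = code %% 100L)
}

# Assumption: FF (1-24) and WW (1-48) are numbered consecutively in the year
month_from_fortnight <- function(ff) (ff + 1L) %/% 2L
month_from_week      <- function(ww) (ww + 3L) %/% 4L

m <- decode_period(202303L)   # year 2023, month 3
f <- decode_period(202324L)   # year 2023, fortnight 24
month_from_fortnight(f$position)   # 12
month_from_week(c(1L, 4L, 5L, 48L)) # 1 1 2 12
```

NA codes (undetermined periods) propagate as NA through these helpers.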
Note
Nested Identification Hierarchy
The algorithm enforces strict nesting by construction:
Fortnights can ONLY be identified for observations with determined months
Weeks can ONLY be identified for observations with determined fortnights
This guarantees: determined_week => determined_fortnight => determined_month
Aggregation Levels
The crosswalk aggregates at different levels:
-
Months: UPA-V1014 level across ALL quarters (PNADC panel design ensures same month position)
-
Fortnights: Household level within quarter only
-
Weeks: Household level within quarter only
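One way to picture group-level aggregation is as an intersection of per-person ranges: each respondent narrows the possible positions, and the group is determined when the intersection collapses to a single value. The base R sketch below is an illustration under that assumption, with made-up column names (month_lo, month_hi) and data; it is not the package's implementation.

```r
# Illustrative data: each row is one respondent with a range of possible
# month positions (hypothetical columns month_lo and month_hi)
obs <- data.frame(
  UPA      = c(1L, 1L, 1L, 2L, 2L),
  V1014    = c(1L, 1L, 1L, 1L, 1L),
  month_lo = c(1L, 2L, 1L, 1L, 1L),
  month_hi = c(3L, 3L, 2L, 2L, 3L)
)

# Intersect the ranges within each UPA-V1014 group:
# the tightest lower bound is the max, the tightest upper bound is the min
grp_lo <- aggregate(month_lo ~ UPA + V1014, data = obs, FUN = max)
grp_hi <- aggregate(month_hi ~ UPA + V1014, data = obs, FUN = min)
grp <- merge(grp_lo, grp_hi, by = c("UPA", "V1014"))

# The month is determined only when a single position remains
grp$ref_month_in_quarter <- ifelse(grp$month_lo == grp$month_hi,
                                   grp$month_lo, NA_integer_)
grp$determined_month <- !is.na(grp$ref_month_in_quarter)
```

Here group 1 narrows to month 2 while group 2 stays undetermined (positions 1 or 2 remain). Pooling more respondents tightens the bounds, which is in line with month identification (pooled across all quarters) reaching far higher rates than fortnights and weeks (pooled within a household and quarter).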
See Also
pnadc_apply_periods to apply the crosswalk and
calibrate weights
Examples
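## Not run:
# Stack several quarters first: month determination relies on the panel
crosswalk <- pnadc_identify_periods(pnadc_stacked)

# Determination rates by granularity
crosswalk[, .(
  month_rate = mean(determined_month),
  fortnight_rate = mean(determined_fortnight),
  week_rate = mean(determined_week)
)]

# Verify the nesting: both checks should return TRUE
crosswalk[determined_fortnight, all(determined_month)]
crosswalk[determined_week, all(determined_fortnight)]

# Apply the crosswalk to the 2023 data and calibrate weights
result <- pnadc_apply_periods(pnadc_2023, crosswalk,
                              weight_var = "V1028",
                              anchor = "quarter")
## End(Not run)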
Starting Points for SIDRA Series Mensalization
Description
Pre-computed starting point values (y0) for mensalizing IBGE's rolling quarterly series into exact monthly estimates.
Usage
data(pnadc_series_starting_points)
Format
A data.table with 159 rows and 3 columns:
- series_name
Character. Name of the SIDRA series (53 series)
- mesnotrim
Integer. Month position in quarter (1, 2, or 3)
- y0
Numeric. Starting point value for this series and position
Details
These starting points were computed from PNADC microdata using the full R package pipeline, ensuring consistency with compute_starting_points_from_microdata:
Weight calibration via pnadc_apply_periods: all months scaled to SIDRA monthly population totals
z_ aggregates computed via compute_z_aggregates using calibrated weight_monthly
Starting points computed via compute_series_starting_points with CNPJ-aware calibration periods
The calibration period (2013-2019) was chosen because:
It includes stable pre-pandemic data
IBGE methodology was consistent during this period
It provides sufficient observations for reliable estimates
CNPJ series (empregadorcomcnpj, empregadorsemcnpj, contapropriacomcnpj, contapropriasemcnpj) use calibration period 2016-2019 with cumulative sum starting from October 2015 due to V4019 variable availability.
Methodology Consistency
The bundled starting points are generated using the same pipeline as
compute_starting_points_from_microdata, ensuring that users
who compute custom starting points will get consistent results.
When to Use Custom Starting Points
The bundled starting points are suitable for most users. Consider computing
custom starting points with compute_starting_points_from_microdata if:
IBGE makes major methodological changes to the PNADC
You need series not included in the bundled data
You want to use a different calibration period
You are working with updated or different microdata
Source
Computed using data-raw/regenerate_starting_points_from_microdata.R
See Also
mensalize_sidra_series which uses this data by default
compute_series_starting_points for custom calibration
Examples
data(pnadc_series_starting_points)
head(pnadc_series_starting_points)
unique(pnadc_series_starting_points$series_name)
SIDRA Series Metadata for PNADC Mensalization
Description
This file contains metadata definitions for all 86 SIDRA series used in the mensalization process. It maps series names to their SIDRA API endpoints, table IDs, variable codes, and hierarchical categorization.
Date Utility Functions
Description
Internal helper functions for date calculations.
This file includes:
Day of week calculation (dow)
ISO 8601 week utilities (Monday-Sunday weeks)
IBGE first Saturday calculation for reference weeks
Month position calculation for quarter mapping
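As an illustration of the first-Saturday calculation, the base R sketch below finds the Saturday that closes the first reference week of a calendar month. The function name and the min_days parameter are hypothetical, and the default of 4 days is an assumption standing in for the "sufficient days" rule; the internal helpers may be implemented differently.

```r
# Hypothetical sketch: Saturday ending the first reference week of a month.
# min_days = 4 is an assumed threshold for "sufficient days" in the month.
first_reference_saturday <- function(year, month, min_days = 4L) {
  first_day <- as.Date(sprintf("%04d-%02d-01", year, month),
                       format = "%Y-%m-%d")
  dow <- as.integer(format(first_day, "%w"))  # 0 = Sunday, ..., 6 = Saturday

  # First Saturday on or after the 1st of the month
  first_sat <- first_day + (6L - dow) %% 7L

  # Its day of month equals the number of days of that Sunday-Saturday week
  # that fall inside the month; skip to the next week if there are too few
  if (as.integer(format(first_sat, "%d")) < min_days) {
    first_sat <- first_sat + 7L
  }
  first_sat
}

# April 2023 starts on a Saturday, so its first week has only 1 day in April
strict_rule  <- first_reference_saturday(2023, 4)
relaxed_rule <- first_reference_saturday(2023, 4, min_days = 1L)

# The matching Sunday (start of that reference week) is 6 days earlier
week_1_start <- strict_rule - 6L
```

The function handles one year and month at a time. Passing a smaller min_days mimics a relaxed rule of the kind the exception detection step refers to.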
Input Validation Utilities
Description
Internal helper functions for validating PNADC input data.
Validate PNADC Input Data
Description
Checks that input data has required columns for the specified processing.
Usage
validate_pnadc(data, check_weights = FALSE, stop_on_error = TRUE)
Arguments
data |
A data.frame or data.table with PNADC microdata |
check_weights |
Logical. If TRUE, also check for weight-related variables. |
stop_on_error |
Logical. If TRUE, stops with an error. If FALSE, returns a validation report list. |
Details
The function performs the following validations:
Checks for required columns for reference period identification: Ano, Trimestre, UPA, V1008, V1014, V2008, V20081, V20082, V2009
Validates year range (2012-2100 for PNADC coverage)
Validates quarter values (must be 1-4)
Validates birth day values (must be 1-31 or 99 for unknown)
Validates birth month values (must be 1-12 or 99 for unknown)
Warns about unusual ages (outside 0-130 range)
If check_weights = TRUE, also validates weight-related columns: V1028, UF, posest, posest_sxi
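The checks listed above can be approximated in a few lines of base R. The sketch below is a simplified stand-in with a hypothetical name (check_pnadc_sketch), not the package's validate_pnadc: it always returns a report rather than stopping, skips the weight and age checks, and treats missing values as passing, which is an assumption.

```r
# Hypothetical, simplified stand-in for the validation logic described above
check_pnadc_sketch <- function(data) {
  required <- c("Ano", "Trimestre", "UPA", "V1008", "V1014",
                "V2008", "V20081", "V20082", "V2009")
  issues <- list()

  # Required columns
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0L) issues$missing_columns <- missing_cols

  # Range check applied only to columns that are present;
  # NA values are treated as passing (assumption)
  check_range <- function(col, allowed, label) {
    if (col %in% names(data)) {
      x <- data[[col]]
      if (!all(is.na(x) | x %in% allowed)) issues[[label]] <<- col
    }
  }
  check_range("Ano", 2012:2100, "invalid_year")
  check_range("Trimestre", 1:4, "invalid_quarter")
  check_range("V2008", c(1:31, 99), "invalid_birth_day")
  check_range("V20081", c(1:12, 99), "invalid_birth_month")

  list(valid = length(issues) == 0L, issues = issues,
       n_rows = nrow(data), n_cols = ncol(data))
}

good <- data.frame(
  Ano = 2023L, Trimestre = 1L, UPA = 110000001L,
  V1008 = 1L, V1014 = 1L,
  V2008 = 15L, V20081 = 3L, V20082 = 1990L, V2009 = 33L
)
bad <- transform(good, Trimestre = 5L)

report_good <- check_pnadc_sketch(good)
report_bad  <- check_pnadc_sketch(bad)
report_incomplete <- check_pnadc_sketch(data.frame(Ano = 2023L, Trimestre = 1L))
```

Collecting all issues in one list before reporting mirrors the stop_on_error = FALSE mode, where the caller inspects the report instead of catching an error.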
Value
If stop_on_error = TRUE, returns invisibly if the data are valid, or stops with an error otherwise.
If stop_on_error = FALSE, returns a list with:
- valid: Logical indicating if data passed all validations
- issues: Named list of validation issues found (empty if none)
- n_rows: Number of rows in input data
- n_cols: Number of columns in input data
- join_keys_available: Character vector of available join key columns
See Also
pnadc_identify_periods which calls this function
internally to validate input data.
Examples
# Minimal valid data (all 9 required columns)
sample_data <- data.frame(
Ano = 2023L, Trimestre = 1L, UPA = 110000001L,
V1008 = 1L, V1014 = 1L,
V2008 = 15L, V20081 = 3L, V20082 = 1990L, V2009 = 33L
)
validate_pnadc(sample_data)
# Data with missing columns returns issues (non-stop mode)
incomplete_data <- data.frame(Ano = 2023L, Trimestre = 1L)
result <- validate_pnadc(incomplete_data, stop_on_error = FALSE)
result$valid # FALSE
result$issues # lists missing columns