---
title: "Utility-analysis taxonomy for personnel selection"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Utility-analysis taxonomy for personnel selection}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```


## Purpose

The package is organised around a diagnostic question: **what kind of selection problem are you analysing?** Many disagreements in the utility-analysis literature arise because a method designed for one type of problem is applied to another. Taylor and Russell (1939) framed utility as a classificatory success-ratio problem. Naylor and Shine (1965) moved the focus to expected criterion gain in standard-deviation units. Brogden (1946, 1949) and Cronbach and Gleser (1965) developed the decision-theoretic monetary formulation. Boudreau (1983, 1991) added economic realism through discounting, taxes, and employee flows. Holling (1998); Sturman (2000, 2001); Thomas, Owen, and Gunst (1977); and Ock and Oswald (2018) each contributed substantive corrections or extensions. The taxonomy used here is intended to make the model-problem match explicit before any number is reported.

```{r}
library(personnelSelectionUtility)
```

## The two-by-two taxonomy

The taxonomy crosses two dimensions:

1. **Criterion scale**: is the criterion treated as continuous/monetary, or dichotomised into success/failure?
2. **Selection structure**: is selection compensatory, based on a composite score, or conjunctive/multiple-hurdle, based on passing multiple cutoffs or stages?

```{r}
model_taxonomy()
```

This yields four practical cells.

| Criterion scale | Compensatory selection | Multiple-hurdle selection |
|---|---|---|
| Dichotomised / classificatory | Taylor-Russell on one predictor or a composite | Thomas-Owen-Gunst multivariate Taylor-Russell |
| Continuous / monetary | Naylor-Shine, Brogden-Cronbach-Gleser, Boudreau, Sturman incremental validity | Stage-wise utility or simulation-based comparisons |

The four cells are not interchangeable. A selection system can have the same predictors but a different appropriate utility model depending on the decision rule. If cognitive ability, conscientiousness, biodata, and interview ratings are added into one composite and applicants are selected top-down, the system is compensatory. If applicants must first pass a cheap screening composite and only then a more expensive interview stage, the system is staged multiple-hurdle. The distinction is not cosmetic. Ock and Oswald (2018) emphasise the trade-off between cost and expected criterion performance: a compensatory composite typically uses more information and yields higher expected criterion performance, whereas a multiple-hurdle system can be cheaper because expensive stages are administered only to applicants who survive earlier screens.

## Why the choice of cell matters

A simple thought experiment makes the point. Suppose two predictors $X_1$ and $X_2$ correlate $.30$ with each other and have validities of $.40$ and $.30$ against a job-performance criterion. If selection is compensatory and applicants are ranked on the equally weighted sum of the standardised predictors, the implied composite validity is $(r_{1y} + r_{2y}) / \sqrt{2 + 2 r_{12}} = .70 / \sqrt{2.60} \approx .43$. If selection is conjunctive on the same two predictors with marginal cutoffs at the top $50\%$ each, the operative quantity is the *joint* selection ratio, which under bivariate normality is about $.30$, materially smaller than $.50$, and which implies a different positive predictive value. Reporting one number when the other is operationally relevant misrepresents the system. The package therefore encourages the analyst to specify the cell first and choose the function second.
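The two quantities in this thought experiment can be checked in a few lines of base R. The chunk below is an independent sketch that assumes standardised, bivariate-normal predictors; the variable names are illustrative and it does not call or reproduce the package's own functions.

```{r}
# Independent base-R sketch; not the package's implementation.
r12 <- .30   # predictor intercorrelation
r1y <- .40   # validity of X1
r2y <- .30   # validity of X2

# Validity of the equally weighted sum of two standardised predictors:
# cor(z1 + z2, y) = (r1y + r2y) / sqrt(2 + 2 * r12)
composite_validity <- (r1y + r2y) / sqrt(2 + 2 * r12)

# Joint selection ratio when both cutoffs sit at the median (z = 0) and the
# predictors are bivariate normal: P(z1 > 0, z2 > 0) = 1/4 + asin(r12) / (2 * pi)
joint_selection_ratio <- 1 / 4 + asin(r12) / (2 * pi)

round(c(composite_validity = composite_validity,
        joint_selection_ratio = joint_selection_ratio), 3)
```

The closed form for the joint selection ratio holds only for median cutoffs; other cutoffs require a bivariate-normal integral.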

## Argument naming and conventions

The package uses readable R argument names while preserving close correspondence with the notation used in the literature.

```{r}
head(argument_glossary(), 12)
```

The most frequent names are:

`base_rate`: population proportion successful before selection; classical Taylor-Russell notation often uses $BR$ or $\phi$.

`selection_ratio`: proportion selected by a single cutoff or composite; classical notation uses $SR$.

`selection_ratios`: vector of marginal selection ratios in multiple-predictor or multiple-stage models.

`joint_selection_ratio`: overall conjunctive selection ratio after all cutoffs.

`validity`: predictor-criterion validity coefficient, usually $r_{xy}$.

`validities`: vector of predictor-criterion correlations.

`sdy`: standard deviation of job performance in monetary or criterion units, usually $SD_y$.

`baseline_validity`: validity of the operating system that the focal procedure is being compared against (Sturman, 2000, 2001).

`n_applicants`, `n_selected`: number of applicants assessed and number of applicants selected.

`tenure`: time horizon, usually in years.

## Recommended workflow

### Step 1: Specify the selection decision

Start by writing down the rule used in practice. Is selection based on a single test, a composite, several simultaneous cutoffs, or a staged process? Do not choose a utility formula before specifying the rule. A common error is to apply Brogden-Cronbach-Gleser to a problem that is operationally a multiple-hurdle decision, which masks both the joint selection ratio and the cost differential between stages.

### Step 2: Specify the criterion scale

If the criterion is success/failure, use Taylor-Russell-style functions: `tr_classic()`, `tr_multivariate()`, or `tr_multivariate_equal_cutoff()`. If the criterion is continuous or monetary, use `naylor_shine()`, `bcg_utility()`, or `boudreau_utility()`. The choice should be driven by how the organisation actually evaluates job performance, not by computational convenience.

### Step 3: Specify the baseline

Following Sturman (2000, 2001), the realistic comparator is rarely random selection. Almost every organisation already operates with some procedure: reference checks, unstructured interviews, biodata. Treating random selection as the implicit baseline inflates the estimated utility of a new procedure by approximately $60\%$ on average (Sturman, 2000, 2001). Use `baseline_validity` in `bcg_utility()` and `boudreau_utility()` whenever the operating procedure is identifiable.
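One common way to write the comparison against an operating procedure, in Brogden-Cronbach-Gleser form, is sketched below. The symbols $N_s$ (number selected), $T$ (tenure), $\bar{z}_x$ (mean standardised predictor score of those selected), and $\Delta C$ (difference in cost between the two procedures) are notation assumed here for illustration, and the expression further assumes that both procedures operate at the same selection ratio. The package's internal parameterisation may differ.

$$
\Delta U = N_s \, T \, (r_{\text{new}} - r_{\text{base}}) \, SD_y \, \bar{z}_x - \Delta C
$$

Setting $r_{\text{base}} = 0$ recovers the comparison against random selection. With $r_{\text{new}} = .35$ and $r_{\text{base}} = .20$, for example, the gain term is $(.35 - .20) / .35 \approx 43\%$ of the value it would take against a random baseline.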

### Step 4: Estimate or triangulate $SD_y$

Holling (1998) shows that $SD_y$ is the central vulnerability of monetary utility analysis. The package implements the four families documented by Holling: cost accounting (`sdy_cost_accounting()`), global percentile judgements (`sdy_percentile()`), proportional rules (`sdy_proportional()`, `sdy_rbn()`), and individualised job-analysis methods (`sdy_crepid()`, `sdy_superior_equivalents()`). Triangulating two or three of these, rather than relying on a single estimate, is the practice supported by the empirical comparisons reported in Bobko, Karren, and Parkington (1983); Becker and Huselid (1992); and Hakstian, Wooley, Woolsey, and Kryger (1991).
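As a rough illustration of what triangulation looks like, the chunk below computes two textbook-style estimates in base R: a global percentile judgement (the average of the judged distances from the 50th percentile to the 85th and 15th percentiles) and a proportional rule (a fixed share of mean salary, set to 40% here). All input values are hypothetical, the `_sketch` names are illustrative, and the calculations are assumptions about the general form of these methods rather than a description of how `sdy_percentile()` or `sdy_proportional()` are implemented.

```{r}
# Hypothetical inputs for illustration only.
judged_value <- c(p15 = 42000, p50 = 60000, p85 = 81000)  # judged annual value of output
mean_salary  <- 55000

# Global percentile judgement: average of the two one-SD distances from the median.
sdy_percentile_sketch <- mean(c(judged_value[["p85"]] - judged_value[["p50"]],
                                judged_value[["p50"]] - judged_value[["p15"]]))

# Proportional rule: SD_y taken as a fixed share of mean salary (40% assumed here).
sdy_proportional_sketch <- 0.40 * mean_salary

# Report the range across methods rather than a single value.
range(c(sdy_percentile_sketch, sdy_proportional_sketch))
```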

### Step 5: Report uncertainty and sensitivity

Ock and Oswald (2018) demonstrate via Monte Carlo simulation that utility estimates exhibit sample-to-sample variability of the same order of magnitude as the mean effect. A point estimate without an interval is therefore not a complete report. The package includes `utility_monte_carlo()` for full uncertainty propagation, `sensitivity_grid()` for exploring how the estimate varies under perturbations of the inputs, and `break_even_validity()` for computing the validity required to break even at given costs. Cronshaw, Alexander, Wiesner, and Barrick (1987) introduced sensitivity and break-even analysis to selection utility precisely because point estimates of $\Delta U$ tend to be reported with implausible precision.
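The logic of uncertainty propagation and break-even analysis can be sketched in base R as follows. The input distributions (a normal distribution for validity and a uniform distribution for $SD_y$), the helper `mean_z_selected()`, and the treatment of cost as a single total are all assumptions made for this illustration; the chunk does not show how `utility_monte_carlo()` or `break_even_validity()` work internally.

```{r}
set.seed(1)

# Mean standardised predictor score of those selected top-down at ratio sr.
mean_z_selected <- function(sr) dnorm(qnorm(1 - sr)) / sr

n_draws <- 5000
# Assumed input uncertainty: validity ~ N(.35, .05), truncated to [0, 1].
validity_draws <- pmin(pmax(rnorm(n_draws, mean = .35, sd = .05), 0), 1)
# Assumed input uncertainty: SD_y uniform between 40000 and 60000.
sdy_draws <- runif(n_draws, min = 40000, max = 60000)

# Brogden-Cronbach-Gleser form evaluated once per draw.
n_selected <- 100; tenure <- 3; cost <- 75000; sr <- .20
delta_u_draws <- n_selected * tenure * validity_draws * sdy_draws *
  mean_z_selected(sr) - cost

# Interval summary instead of a single point estimate.
quantile(delta_u_draws, probs = c(.05, .50, .95))

# Break-even validity: the validity at which the gain term equals the cost,
# evaluated at the midpoint SD_y of 50000.
break_even <- cost / (n_selected * tenure * 50000 * mean_z_selected(sr))
break_even
```

For these inputs the break-even validity is well below $.01$, because the cost is small relative to the gain term.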

## Minimal examples by model family

### Dichotomous success criterion

For a single predictor or a composite that has already collapsed several predictors into one score, use `tr_classic()`.

```{r}
tr_classic(base_rate = .50, selection_ratio = .20, validity = .35)
```

The output includes the full $2 \times 2$ table (true positives, false positives, true negatives, false negatives), the positive predictive value, sensitivity, and specificity. For multiple simultaneous cutoffs, use `tr_multivariate()` with the predictor-criterion correlation matrix.

```{r}
R <- matrix(c(
  1.00, .30, .40,
  .30, 1.00, .35,
  .40, .35, 1.00
), nrow = 3, byrow = TRUE)
tr_multivariate(selection_ratios = c(.50, .50), base_rate = .50, R = R)
```
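For intuition about what the classificatory functions return, the single-predictor case can be reproduced in base R under bivariate normality. The function `tr_sketch()` below is a hypothetical helper written for this illustration; it is independent of `tr_classic()` and is not the package's implementation.

```{r}
# Base-R sketch of the single-predictor Taylor-Russell logic under
# bivariate normality.
tr_sketch <- function(base_rate, selection_ratio, validity) {
  x_cut <- qnorm(1 - selection_ratio)  # predictor cutoff (z units)
  y_cut <- qnorm(1 - base_rate)        # criterion cutoff defining "success"
  # P(X > x_cut, Y > y_cut): integrate P(Y > y_cut | X = x) over the
  # selected region, using Y | X = x ~ N(validity * x, 1 - validity^2).
  integrand <- function(x) {
    dnorm(x) * pnorm((validity * x - y_cut) / sqrt(1 - validity^2))
  }
  true_pos <- integrate(integrand, lower = x_cut, upper = Inf)$value
  c(true_positive  = true_pos,
    false_positive = selection_ratio - true_pos,
    false_negative = base_rate - true_pos,
    true_negative  = 1 - selection_ratio - base_rate + true_pos,
    ppv            = true_pos / selection_ratio)
}

round(tr_sketch(base_rate = .50, selection_ratio = .20, validity = .35), 3)
```

The multiple-cutoff case replaces the one-dimensional integral with a multivariate-normal orthant probability, which base R does not provide directly.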

### Continuous criterion

For expected standardised criterion gain without a monetary unit, use `naylor_shine()`.

```{r}
naylor_shine(validity = .35, selection_ratio = .20)
```

For monetary utility, `bcg_utility()` provides a transparent baseline model.

```{r}
bcg_utility(
  validity = .35,
  selection_ratio = .20,
  sdy = 50000,
  n_selected = 100,
  tenure = 3,
  cost = 75000
)
```
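The two continuous-criterion quantities above share one building block: the mean standardised predictor score of the selected group, which under normality and top-down selection equals the normal ordinate at the cutoff divided by the selection ratio. The base-R sketch below makes that link explicit. It assumes a random-selection baseline and treats `cost` as a single total, which may not match the package's conventions, so the result need not agree exactly with the output of `bcg_utility()`.

```{r}
sr <- .20
validity <- .35

# Ordinate-over-selection-ratio term: mean predictor z-score of those selected.
mean_z_x <- dnorm(qnorm(1 - sr)) / sr

# Naylor-Shine: expected mean criterion z-score of the selected group.
mean_z_y <- validity * mean_z_x

# Brogden-Cronbach-Gleser: scale by SD_y (50000), number selected (100) and
# tenure (3), then subtract the total cost (75000).
delta_u <- 100 * 3 * 50000 * mean_z_y - 75000

c(mean_z_x = mean_z_x, mean_z_y = mean_z_y, delta_u = delta_u)
```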

If the analysis spans several periods or requires discounting, taxes, contribution margins, or employee flows, use `boudreau_utility()`.

```{r}
boudreau_utility(
  validity = .35,
  baseline_validity = .20,
  selection_ratio = .20,
  sdy = 50000,
  n_by_period = c(100, 90, 80),
  contribution_margin = .30,
  tax_rate = .25,
  discount_rate = .08,
  cost_by_period = c(75000, 10000, 10000)
)
```

### Effect-size conversions

In contemporary selection research, particularly when classification or algorithmic models are involved, predictive performance is sometimes reported as the area under the ROC curve (AUC) rather than as a validity coefficient. The package separates three conceptually distinct conversions, following Hanley and McNeil (1982), Rice and Harris (2005), and Salgado (2018). First, `auc_to_rank_biserial()` returns the dominance summary $2 \cdot AUC - 1$, which follows from interpreting AUC as the probability of a favourable ordering of one positive and one negative case (Hanley & McNeil, 1982; Kerby, 2014). Second, `auc_to_d_equal_variance()` converts AUC to Cohen's $d$ under the equal-variance binormal model (Rice & Harris, 2005; Salgado, 2018). Third, `auc_to_point_biserial()` converts that $d$ to a point-biserial correlation for a specified base rate, making the base-rate dependence explicit.

```{r}
auc_to_rank_biserial(.75)
auc_to_d_equal_variance(.75)
auc_to_point_biserial(.75, base_rate = c(.50, .30, .20, .10))
```

These conversions should be used as bridges between reported classification performance and utility-analysis inputs, not as assumption-free substitutes for validation studies. If the selection criterion is binary and the available evidence is AUC, the reporting analyst should state which conversion was used and whether a base rate was assumed.
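The standard formulas behind the three conversions can be sketched in base R. The expressions below are the usual textbook forms under the equal-variance binormal model and are offered as an assumption about what the package functions compute, not as their source code.

```{r}
auc <- .75
base_rate <- c(.50, .30, .20, .10)

# Rank-biserial (dominance) summary.
rank_biserial <- 2 * auc - 1

# Cohen's d under the equal-variance binormal model: AUC = pnorm(d / sqrt(2)).
d <- sqrt(2) * qnorm(auc)

# Point-biserial r implied by d at each base rate p:
# r = d / sqrt(d^2 + 1 / (p * (1 - p))).
point_biserial <- d / sqrt(d^2 + 1 / (base_rate * (1 - base_rate)))

rank_biserial
round(d, 3)
round(setNames(point_biserial, base_rate), 3)
```

The same AUC maps to a smaller point-biserial correlation as the base rate moves away from $.50$, which is why the assumed base rate needs to be reported alongside the converted value.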

## Reporting checklist

A complete utility analysis should report:

1. the selection rule;
2. the criterion scale;
3. the base rate, when a classificatory model is used;
4. the selection ratio;
5. all validity coefficients and their sources;
6. how $SD_y$ was estimated and whether it was triangulated;
7. whether validity and $SD_y$ were corrected for unreliability or range restriction, and which corrections were applied;
8. the baseline comparator (random selection or the operating procedure);
9. costs disaggregated by period and stage, when relevant;
10. the time horizon and discount rate;
11. uncertainty intervals obtained through Monte Carlo or bootstrap methods;
12. sensitivity and break-even analyses for the most uncertain inputs.

Items 8 and 11 are the two most frequently omitted in the empirical literature, and their omission is the principal source of the practitioner scepticism documented by Latham and Whyte (1994); Whyte and Latham (1997); and König, Bösch, Reshef, and Winkler (2013).

## How to proceed in applied work

1. Specify the selection rule before opening the package: single test, composite, simultaneous cutoffs, or staged process.
2. Locate your problem in the $2 \times 2$ taxonomy and use `model_taxonomy()` as a checklist.
3. Use `argument_glossary()` to map your existing notation to the package argument names before writing code.
4. Specify the operating baseline; if it cannot be identified, report this limitation explicitly and treat random-selection results as upper bounds.
5. Triangulate $SD_y$ across at least two methods; report the range, not a single value.
6. Propagate uncertainty with `utility_monte_carlo()` and explore input perturbations with `sensitivity_grid()`; do not report a point estimate without an interval.
7. Use `break_even_validity()` to identify the validity floor at which the new procedure breaks even, and compare it to the lower bound of the validity confidence interval.

## References

Becker, B. E., & Huselid, M. A. (1992). Direct estimates of $SD_y$ and the implications for utility analysis. *Journal of Applied Psychology*, *77*, 227--233.

Bobko, P., Karren, R., & Parkington, J. J. (1983). Estimation of standard deviations in utility analyses: An empirical test. *Journal of Applied Psychology*, *68*, 170--176.

Boudreau, J. W. (1983). Economic considerations in estimating the utility of human resource productivity improvement programs. *Personnel Psychology*, *36*, 551--576.

Boudreau, J. W. (1991). Utility analysis for decisions in human resource management. In M. D. Dunnette & L. M. Hough (Eds.), *Handbook of industrial and organizational psychology* (Vol. 2, pp. 621--745). Consulting Psychologists Press.

Brogden, H. E. (1946). On the interpretation of the correlation coefficient as a measure of predictive efficiency. *Journal of Educational Psychology*, *37*, 65--76.

Brogden, H. E. (1949). When testing pays off. *Personnel Psychology*, *2*, 171--183.

Cronbach, L. J., & Gleser, G. C. (1965). *Psychological tests and personnel decisions* (2nd ed.). University of Illinois Press.

Cronshaw, S. F., Alexander, R. A., Wiesner, W. H., & Barrick, M. R. (1987). Incorporating risk into selection utility: Two models for sensitivity analysis and risk simulation. *Organizational Behavior and Human Decision Processes*, *40*, 270--286.

Hakstian, A. R., Wooley, R. M., Woolsey, L. K., & Kryger, B. R. (1991). Management selection by multiple-domain assessment: II. Utility to the organisation. *Educational and Psychological Measurement*, *51*, 899--911.

Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. *Radiology*, *143*(1), 29--36.

Holling, H. (1998). Utility analysis of personnel selection: An overview and empirical study based on objective performance measures. *Methods of Psychological Research Online*, *3*(1), 5--24.

Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. *Comprehensive Psychology*, *3*, 11.IT.3.1.

König, C. J., Bösch, F., Reshef, A., & Winkler, S. (2013). Human resource managers' attitudes toward utility analysis. *Journal of Personnel Psychology*, *12*, 152--156.

Latham, G. P., & Whyte, G. (1994). The futility of utility analysis. *Personnel Psychology*, *47*, 31--46.

Naylor, J. C., & Shine, L. C. (1965). A table for determining the increase in mean criterion score obtained by using a selection device. *Journal of Industrial Psychology*, *3*, 33--42.

Ock, J., & Oswald, F. L. (2018). The utility of personnel selection decisions: Comparing compensatory and multiple-hurdle selection models. *Journal of Personnel Psychology*, *17*(4), 172--182.

Rice, M. E., & Harris, G. T. (2005). Comparing effect sizes in follow-up studies: ROC area, Cohen's $d$, and $r$. *Law and Human Behavior*, *29*(5), 615--620.

Salgado, J. F. (2018). Transforming the area under the normal curve (AUC) into Cohen's $d$, Pearson's $r_{pb}$, odds-ratio, and natural log odds-ratio: Two conversion tables. *The European Journal of Psychology Applied to Legal Context*, *10*(1), 35--47.

Sturman, M. C. (2000). Implications of utility analysis adjustments for estimates of human resource intervention value. *Journal of Management*, *26*, 281--299.

Sturman, M. C. (2001). Utility analysis for multiple selection devices and multiple outcomes. *Journal of Human Resource Costing and Accounting*, *6*(2), 9--28.

Taylor, H. C., & Russell, J. T. (1939). The relationship of validity coefficients to the practical effectiveness of tests in selection. *Journal of Applied Psychology*, *23*, 565--578.

Thomas, J. G., Owen, D. B., & Gunst, R. F. (1977). Improving the use of educational tests as selection tools. *Journal of Educational Statistics*, *2*(1), 55--77.

Whyte, G., & Latham, G. P. (1997). The futility of utility analysis revisited: When even an expert fails. *Personnel Psychology*, *50*, 601--610.