Machine Learning

Bernardo Lares

2026-04-23

Introduction

The lares package provides a streamlined interface to h2o’s AutoML for automated machine learning. This vignette demonstrates how to build, evaluate, and interpret models with minimal code.

Setup

Install and load required packages:

library(lares)
library(dplyr)

h2o must be installed separately:

# Install h2o (run once)
# install.packages("h2o")
library(h2o)

# Initialize h2o quietly for vignette
Sys.unsetenv("http_proxy")
Sys.unsetenv("https_proxy")
h2o.init(nthreads = -1, max_mem_size = "2G", ip = "127.0.0.1")
#> 
#> H2O is not running yet, starting it now...
#> 
#> Note:  In case of errors look at the following log files:
#>     /var/folders/_9/97xqjz8j4cx_q5m_3t646mdm0000gn/T//Rtmpl5cGQ0/file4ca320b67f28/h2o_bernardo_started_from_r.out
#>     /var/folders/_9/97xqjz8j4cx_q5m_3t646mdm0000gn/T//Rtmpl5cGQ0/file4ca32d562bdf/h2o_bernardo_started_from_r.err
#> 
#> 
#> Starting H2O JVM and connecting: ... Connection successful!
#> 
#> R is connected to the H2O cluster: 
#>     H2O cluster uptime:         3 seconds 462 milliseconds 
#>     H2O cluster timezone:       Europe/Madrid 
#>     H2O data parsing timezone:  UTC 
#>     H2O cluster version:        3.44.0.3 
#>     H2O cluster version age:    2 years, 4 months and 2 days 
#>     H2O cluster name:           H2O_started_from_R_bernardo_rna358 
#>     H2O cluster total nodes:    1 
#>     H2O cluster total memory:   1.76 GB 
#>     H2O cluster total cores:    12 
#>     H2O cluster allowed cores:  12 
#>     H2O cluster healthy:        TRUE 
#>     H2O Connection ip:          127.0.0.1 
#>     H2O Connection port:        54321 
#>     H2O Connection proxy:       NA 
#>     H2O Internal Security:      FALSE 
#>     R Version:                  R version 4.5.3 (2026-03-11)
h2o.no_progress() # Disable progress bars

Pipeline

h2o_automl workflow

In short, these are the steps h2o_automl runs behind the scenes:

  1. Input Processing: The function receives a dataframe df and the dependent variable y to predict. Set seed for reproducibility.

  2. Model Type Detection: Automatically chooses between classification (categorical) and regression (continuous) based on y’s class and number of unique values (controlled by the thresh parameter).

  3. Data Splitting: Splits the data into train and test sets; control the proportion with the split parameter. Replicate this step with msplit().

  4. Preprocessing:

    • Center and scale numerical values
    • Remove outliers with no_outliers
    • Impute missing values with MICE (impute = TRUE)
    • Balance training data for classification (balance = TRUE)
    • Replicate with model_preprocess()
  5. Model Training: Runs h2o::h2o.automl() to train multiple models and generate a leaderboard sorted by performance. Customize with:

    • max_models or max_time
    • nfolds for k-fold cross-validation
    • exclude_algos and include_algos
  6. Model Selection: Selects the best model based on performance metric (change with stopping_metric). Use h2o_selectmodel() to choose an alternative.

  7. Performance Evaluation: Calculates metrics and plots using test predictions (unseen data). Replicate with model_metrics().

  8. Results: Returns a list with inputs, leaderboard, best model, metrics, and plots. Export with export_results().
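
The splitting and preprocessing steps above can also be run on their own with the helper functions mentioned (msplit() and model_preprocess() are exported by lares; the exact arguments shown here are a sketch, not an exhaustive list):

```r
library(lares)
data(dft)

# Step 3: 70/30 train/test split, as h2o_automl() does internally
splits <- msplit(dft, size = 0.7, seed = 0)

# Step 4: preprocessing (scaling, outlier removal, optional imputation/balancing)
prep <- model_preprocess(dft, y = "Survived", impute = FALSE)
```

Running these steps manually is useful when you want to audit exactly what data the models will see before committing to a full AutoML run.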

Quick Start: Binary Classification

Let’s build a model to predict Titanic survival:

data(dft)

# Train an AutoML model
# Binary classification
model <- h2o_automl(
  df = dft,
  y = "Survived",
  target = "TRUE",
  ignore = c("Ticket", "Cabin", "PassengerId"),
  max_models = 10,
  max_time = 120,
  impute = FALSE
)
#> # A tibble: 2 × 5
#>   tag       n     p order  pcum
#>   <lgl> <int> <dbl> <int> <dbl>
#> 1 FALSE   549  61.6     1  61.6
#> 2 TRUE    342  38.4     2 100
#> train_size  test_size 
#>        623        268
#>                        model_id       auc   logloss     aucpr
#> 1 GBM_4_AutoML_1_20260423_85339 0.8655135 0.4261943 0.8323496
#> 2 GBM_3_AutoML_1_20260423_85339 0.8628015 0.4334215 0.8260966
#> 3 GBM_2_AutoML_1_20260423_85339 0.8589818 0.4318601 0.8276204
#>   mean_per_class_error      rmse       mse
#> 1            0.1807105 0.3635268 0.1321517
#> 2            0.1725745 0.3662049 0.1341061
#> 3            0.1843010 0.3652161 0.1333828
#> Model (1/10): GBM_4_AutoML_1_20260423_85339
#> Dependent Variable: Survived
#> Type: Classification (2 classes)
#> Algorithm: GBM
#> Split: 70% training data (of 891 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    AUC = 0.86366
#>    ACC = 0.18657
#>    PRC = 0.19355
#>    TPR = 0.34615
#>    TNR = 0.085366
#> 
#> Most important variables:
#>    Sex (40.8%)
#>    Fare (20.8%)
#>    Age (16.2%)
#>    Pclass (14.6%)
#>    SibSp (3.2%)

# View results
print(model)
#> Model (1/10): GBM_4_AutoML_1_20260423_85339
#> Dependent Variable: Survived
#> Type: Classification (2 classes)
#> Algorithm: GBM
#> Split: 70% training data (of 891 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    AUC = 0.86366
#>    ACC = 0.18657
#>    PRC = 0.19355
#>    TPR = 0.34615
#>    TNR = 0.085366
#> 
#> Most important variables:
#>    Sex (40.8%)
#>    Fare (20.8%)
#>    Age (16.2%)
#>    Pclass (14.6%)
#>    SibSp (3.2%)

That’s it! A single call to h2o_automl() handles the entire pipeline: model type detection, data splitting, preprocessing, training, model selection, and evaluation.

Understanding the Output

The model object contains:

names(model)
#>  [1] "model"           "y"               "scores_test"     "metrics"        
#>  [5] "parameters"      "importance"      "datasets"        "scoring_history"
#>  [9] "categoricals"    "type"            "split"           "threshold"      
#> [13] "model_name"      "algorithm"       "leaderboard"     "project"        
#> [17] "ignored"         "seed"            "h2o"             "plots"

Key components:

- model: Best h2o model
- metrics: Performance metrics
- importance: Variable importance
- datasets: Train/test data used
- parameters: Configuration used
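
Other components listed in names(model) are equally handy; for example, the full AutoML leaderboard and the configuration used (assuming the model object trained above):

```r
# All candidate models ranked by performance
head(model$leaderboard)

# Configuration that produced this model
model$parameters
```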

Model Performance

Metrics

View detailed metrics:

# All metrics
model$metrics
#> $dictionary
#> [1] "AUC: Area Under the Curve"                                                             
#> [2] "ACC: Accuracy"                                                                         
#> [3] "PRC: Precision = Positive Predictive Value"                                            
#> [4] "TPR: Sensitivity = Recall = Hit rate = True Positive Rate"                             
#> [5] "TNR: Specificity = Selectivity = True Negative Rate"                                   
#> [6] "Logloss (Error): Logarithmic loss [Neutral classification: 0.69315]"                   
#> [7] "Gain: When best n deciles selected, what % of the real target observations are picked?"
#> [8] "Lift: When best n deciles selected, how much better than random is?"                   
#> 
#> $confusion_matrix
#>        Pred
#> Real    FALSE TRUE
#>   FALSE    14  150
#>   TRUE     68   36
#> 
#> $gain_lift
#> # A tibble: 10 × 10
#>    percentile value random target total  gain optimal   lift response score
#>    <fct>      <chr>  <dbl>  <int> <int> <dbl>   <dbl>  <dbl>    <dbl> <dbl>
#>  1 1          TRUE    10.8     29    29  27.9    27.9 158.     27.9   90.0 
#>  2 2          TRUE    20.1     22    25  49.0    51.9 143.     21.2   78.1 
#>  3 3          TRUE    30.2     16    27  64.4    77.9 113.     15.4   51.6 
#>  4 4          TRUE    39.9     13    26  76.9   100    92.7    12.5   29.2 
#>  5 5          TRUE    50        6    27  82.7   100    65.4     5.77  20.7 
#>  6 6          TRUE    60.1      5    27  87.5   100    45.7     4.81  14.8 
#>  7 7          TRUE    69.8      6    26  93.3   100    33.7     5.77  12.3 
#>  8 8          TRUE    79.9      1    27  94.2   100    18.0     0.962  9.31
#>  9 9          TRUE    89.9      3    27  97.1   100     8.00    2.88   6.12
#> 10 10         TRUE   100        3    27 100     100     0       2.88   1.54
#> 
#> $metrics
#>       AUC     ACC     PRC     TPR      TNR
#> 1 0.86366 0.18657 0.19355 0.34615 0.085366
#> 
#> $cv_metrics
#> # A tibble: 20 × 8
#>    metric     mean     sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
#>    <chr>     <dbl>  <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#>  1 accuracy  0.855 0.0331      0.856     0.904       0.864      0.815      0.839
#>  2 auc       0.861 0.0424      0.876     0.924       0.860      0.815      0.831
#>  3 err       0.145 0.0331      0.144     0.096       0.136      0.185      0.161
#>  4 err_cou… 18     4.06       18        12          17         23         20    
#>  5 f0point5  0.835 0.0356      0.855     0.881       0.826      0.830      0.785
#>  6 f1        0.797 0.0342      0.812     0.838       0.809      0.777      0.75 
#>  7 f2        0.763 0.0369      0.774     0.799       0.793      0.730      0.718
#>  8 lift_to…  2.66  0.377       2.40      3.12        2.72       2.18       2.88 
#>  9 logloss   0.432 0.0758      0.413     0.324       0.425      0.530      0.468
#> 10 max_per…  0.259 0.0400      0.25      0.225       0.217      0.298      0.302
#> 11 mcc       0.690 0.0592      0.703     0.775       0.705      0.632      0.636
#> 12 mean_pe…  0.834 0.0278      0.841     0.870       0.847      0.806      0.806
#> 13 mean_pe…  0.166 0.0278      0.159     0.130       0.153      0.194      0.194
#> 14 mse       0.132 0.0245      0.126     0.0977      0.129      0.164      0.145
#> 15 pr_auc    0.832 0.0509      0.868     0.895       0.815      0.818      0.764
#> 16 precisi…  0.863 0.0399      0.886     0.912       0.837      0.870      0.811
#> 17 r2        0.436 0.0871      0.481     0.551       0.446      0.342      0.358
#> 18 recall    0.741 0.0400      0.75      0.775       0.783      0.702      0.698
#> 19 rmse      0.362 0.0342      0.355     0.313       0.359      0.404      0.381
#> 20 specifi…  0.926 0.0231      0.932     0.965       0.911      0.910      0.914
#> 
#> $max_metrics
#>                         metric  threshold       value idx
#> 1                       max f1 0.49534115   0.7775281 157
#> 2                       max f2 0.28977227   0.7987220 226
#> 3                 max f0point5 0.50884749   0.8172147 150
#> 4                 max accuracy 0.50884749   0.8410915 150
#> 5                max precision 0.98538253   1.0000000   0
#> 6                   max recall 0.04255203   1.0000000 386
#> 7              max specificity 0.98538253   1.0000000   0
#> 8             max absolute_mcc 0.50884749   0.6587759 150
#> 9   max min_per_class_accuracy 0.36328089   0.8025210 200
#> 10 max mean_per_class_accuracy 0.46113191   0.8194041 168
#> 11                     max tns 0.98538253 385.0000000   0
#> 12                     max fns 0.98538253 236.0000000   0
#> 13                     max fps 0.01285907 385.0000000 399
#> 14                     max tps 0.04255203 238.0000000 386
#> 15                     max tnr 0.98538253   1.0000000   0
#> 16                     max fnr 0.98538253   0.9915966   0
#> 17                     max fpr 0.01285907   1.0000000 399
#> 18                     max tpr 0.04255203   1.0000000 386

# Specific metrics are nested under model$metrics$metrics
model$metrics$metrics$AUC
#> [1] 0.86366
model$metrics$metrics$ACC
#> [1] 0.18657

Confusion Matrix

# Confusion matrix plot
mplot_conf(
  tag = model$scores_test$tag,
  score = model$scores_test$score,
  subtitle = sprintf("AUC: %.3f", model$metrics$metrics$AUC)
)

ROC Curve

# ROC curve
mplot_roc(
  tag = model$scores_test$tag,
  score = model$scores_test$score
)

Gain and Lift Charts

# Gain and Lift charts for binary classification
mplot_gain(
  tag = model$scores_test$tag,
  score = model$scores_test$score
)

Variable Importance

See which features matter most:

# Variable importance dataframe
head(model$importance, 15)
#>   variable relative_importance scaled_importance importance
#> 1      Sex           202.18417        1.00000000 0.40811816
#> 2     Fare           102.86121        0.50875008 0.20763014
#> 3      Age            80.14220        0.39638218 0.16177077
#> 4   Pclass            72.13468        0.35677709 0.14560721
#> 5    SibSp            15.87309        0.07850806 0.03204057
#> 6    Parch            12.75075        0.06306504 0.02573799
#> 7 Embarked             9.45986        0.04678833 0.01909517

# Plot top 15 important variables
top15 <- head(model$importance, 15)
mplot_importance(
  var = top15$variable,
  imp = top15$importance
)

Model Interpretation with SHAP

SHAP values explain individual predictions:

# Calculate SHAP values (computationally expensive)
shap <- h2o_shap(model)

# Plot SHAP summary
plot(shap)

Advanced: Customizing AutoML

Preprocessing Options

model <- h2o_automl(
  df = dft,
  y = "Survived",
  # Ignore specific columns
  ignore = c("Ticket", "Cabin", "PassengerId"),
  # Use only specific algorithms (exclude_algos also available)
  include_algos = c("GBM", "DRF"), # Gradient Boosting & Random Forest
  # Data split
  split = 0.7,
  # Handle imbalanced data
  balance = TRUE,
  # Remove outliers (Z-score > 3)
  no_outliers = TRUE,
  # Impute missing values (requires mice package if TRUE)
  impute = FALSE,
  # Keep only unique training rows
  unique_train = TRUE,
  # Reproducible results
  seed = 123
)
#> # A tibble: 2 × 5
#>   tag       n     p order  pcum
#>   <lgl> <int> <dbl> <int> <dbl>
#> 1 FALSE   549  61.6     1  61.6
#> 2 TRUE    342  38.4     2 100
#> train_size  test_size 
#>        623        268
#>                        model_id       auc   logloss     aucpr
#> 1 GBM_2_AutoML_2_20260423_85401 0.8596583 0.4248255 0.8431084
#> 2 DRF_1_AutoML_2_20260423_85401 0.8564385 0.4488829 0.8421588
#> 3 GBM_1_AutoML_2_20260423_85401 0.8328975 0.4839880 0.8085889
#>   mean_per_class_error      rmse       mse
#> 1            0.1960698 0.3625182 0.1314194
#> 2            0.1961569 0.3699173 0.1368388
#> 3            0.2342388 0.3942838 0.1554597
#> Model (1/3): GBM_2_AutoML_2_20260423_85401
#> Dependent Variable: Survived
#> Type: Classification (2 classes)
#> Algorithm: GBM
#> Split: 70% training data (of 891 observations)
#> Seed: 123
#> 
#> Test metrics:
#>    AUC = 0.87879
#>    ACC = 0.86567
#>    PRC = 0.88506
#>    TPR = 0.74757
#>    TNR = 0.93939
#> 
#> Most important variables:
#>    Sex (37.5%)
#>    Fare (22.7%)
#>    Age (17.3%)
#>    Pclass (12.2%)
#>    Embarked (4.1%)

Multi-Class Classification

Predict passenger class (3 categories):

model_multiclass <- h2o_automl(
  df = dft,
  y = "Pclass",
  ignore = c("Cabin", "PassengerId"),
  max_models = 10,
  max_time = 60
)
#> # A tibble: 3 × 5
#>   tag       n     p order  pcum
#>   <fct> <int> <dbl> <int> <dbl>
#> 1 n_3     491  55.1     1  55.1
#> 2 n_1     216  24.2     2  79.4
#> 3 n_2     184  20.6     3 100
#> train_size  test_size 
#>        623        268
#>                            model_id mean_per_class_error   logloss      rmse
#> 1 XGBoost_3_AutoML_3_20260423_85406            0.0975638 0.1843648 0.2331297
#> 2 XGBoost_2_AutoML_3_20260423_85406            0.1134454 0.2204648 0.2579203
#> 3 XGBoost_1_AutoML_3_20260423_85406            0.1191367 0.2584227 0.2761215
#>          mse
#> 1 0.05434945
#> 2 0.06652287
#> 3 0.07624310
#> Model (1/10): XGBoost_3_AutoML_3_20260423_85406
#> Dependent Variable: Pclass
#> Type: Classification (3 classes)
#> Algorithm: XGBOOST
#> Split: 70% training data (of 891 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    AUC = 0.98236
#>    ACC = 0.9291
#> 
#> Most important variables:
#>    Fare (66%)
#>    Age (14.8%)
#>    SibSp (7.9%)
#>    Parch (4.4%)
#>    Survived.FALSE (2.9%)

# Multi-class metrics
model_multiclass$metrics
#> $dictionary
#> [1] "AUC: Area Under the Curve"                                                             
#> [2] "ACC: Accuracy"                                                                         
#> [3] "PRC: Precision = Positive Predictive Value"                                            
#> [4] "TPR: Sensitivity = Recall = Hit rate = True Positive Rate"                             
#> [5] "TNR: Specificity = Selectivity = True Negative Rate"                                   
#> [6] "Logloss (Error): Logarithmic loss [Neutral classification: 0.69315]"                   
#> [7] "Gain: When best n deciles selected, what % of the real target observations are picked?"
#> [8] "Lift: When best n deciles selected, how much better than random is?"                   
#> 
#> $confusion_matrix
#> # A tibble: 3 × 4
#>   `Real x Pred`   n_3   n_1   n_2
#>   <fct>         <int> <int> <int>
#> 1 n_3             136     3     3
#> 2 n_1               1    60     2
#> 3 n_2               4     6    53
#> 
#> $metrics
#>       AUC    ACC
#> 1 0.98236 0.9291
#> 
#> $metrics_tags
#> # A tibble: 3 × 9
#>   tag       n     p   AUC order   ACC   PRC   TPR   TNR
#>   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 n_3     142  53.0 0.985     1 0.959 0.965 0.958 0.960
#> 2 n_1      63  23.5 0.983     2 0.955 0.870 0.952 0.956
#> 3 n_2      63  23.5 0.979     3 0.944 0.914 0.841 0.976
#> 
#> $cv_metrics
#> # A tibble: 12 × 8
#>    metric     mean     sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
#>    <chr>     <dbl>  <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#>  1 accur…   0.926  0.0272     0.936      0.944      0.944       0.879     0.927 
#>  2 auc    NaN      0        NaN        NaN        NaN         NaN       NaN     
#>  3 err      0.0739 0.0272     0.064      0.056      0.056       0.121     0.0726
#>  4 err_c…   9.2    3.35       8          7          7          15         9     
#>  5 loglo…   0.185  0.0881     0.162      0.186      0.113       0.334     0.128 
#>  6 max_p…   0.190  0.0785     0.167      0.125      0.136       0.32      0.2   
#>  7 mean_…   0.903  0.0429     0.922      0.926      0.936       0.830     0.900 
#>  8 mean_…   0.0972 0.0429     0.0777     0.0739     0.0640      0.170     0.100 
#>  9 mse      0.0544 0.0267     0.0436     0.0511     0.0357      0.101     0.0405
#> 10 pr_auc NaN      0        NaN        NaN        NaN         NaN       NaN     
#> 11 r2       0.923  0.0378     0.933      0.926      0.950       0.857     0.947 
#> 12 rmse     0.229  0.0517     0.209      0.226      0.189       0.318     0.201 
#> 
#> $hit_ratio
#>   k hit_ratio
#> 1 1 0.9261637
#> 2 2 0.9903692
#> 3 3 1.0000000

# Confusion matrix for multi-class
mplot_conf(
  tag = model_multiclass$scores_test$tag,
  score = model_multiclass$scores_test$score
)

Regression Example

Predict fare prices:

model_regression <- h2o_automl(
  df = dft,
  y = "Fare",
  ignore = c("Cabin", "PassengerId"),
  max_models = 10,
  exclude_algos = NULL
)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    0.00    7.91   14.45   32.20   31.00  512.33
#> train_size  test_size 
#>        609        262
#>                                                 model_id     rmse      mse
#> 1 StackedEnsemble_BestOfFamily_1_AutoML_4_20260423_85416 10.38136 107.7726
#> 2    StackedEnsemble_AllModels_1_AutoML_4_20260423_85416 10.55894 111.4913
#> 3                          GBM_3_AutoML_4_20260423_85416 12.44341 154.8385
#>        mae     rmsle mean_residual_deviance
#> 1 5.533338 0.4535433               107.7726
#> 2 5.719281 0.4555461               111.4913
#> 3 5.769395 0.4650435               154.8385
#> Model (1/12): StackedEnsemble_BestOfFamily_1_AutoML_4_20260423_85416
#> Dependent Variable: Fare
#> Type: Regression
#> Algorithm: STACKEDENSEMBLE
#> Split: 70% training data (of 871 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    rmse = 6.6239
#>    mae = 4.143
#>    mape = 0.012637
#>    mse = 43.876
#>    rsq = 0.9391
#>    rsqa = 0.9389

# Regression metrics
model_regression$metrics
#> $dictionary
#> [1] "RMSE: Root Mean Squared Error"       
#> [2] "MAE: Mean Average Error"             
#> [3] "MAPE: Mean Absolute Percentage Error"
#> [4] "MSE: Mean Squared Error"             
#> [5] "RSQ: R Squared"                      
#> [6] "RSQA: Adjusted R Squared"            
#> 
#> $metrics
#>       rmse      mae       mape      mse    rsq   rsqa
#> 1 6.623908 4.143029 0.01263738 43.87615 0.9391 0.9389
#> 
#> $cv_metrics
#> # A tibble: 8 × 8
#>   metric     mean      sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
#>   <chr>     <dbl>   <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#> 1 mae     5.57e+0 6.50e-1      5.34       4.79       5.52       6.58       5.63 
#> 2 mean_r… 1.08e+2 4.86e+1     74.7       78.3      110.       192.        86.4  
#> 3 mse     1.08e+2 4.86e+1     74.7       78.3      110.       192.        86.4  
#> 4 null_d… 1.19e+5 2.29e+4  89084.    111869.    147015.    109745.    135751.   
#> 5 r2      8.85e-1 5.72e-2      0.891      0.908      0.914      0.785      0.926
#> 6 residu… 1.31e+4 5.85e+3   9415.     10332.     12509.     23363.      9934.   
#> 7 rmse    1.02e+1 2.14e+0      8.64       8.85      10.5       13.8        9.29 
#> 8 rmsle   4.46e-1 1.21e-1      0.456      0.270      0.424      0.472      0.608

Using Pre-Split Data

If you have predefined train/test splits:

# Create splits
splits <- msplit(dft, size = 0.8, seed = 123)
#> train_size  test_size 
#>        712        179
splits$train$split <- "train"
splits$test$split <- "test"

# Combine
df_split <- rbind(splits$train, splits$test)

# Train using split column
model <- h2o_automl(
  df = df_split,
  y = "Survived",
  train_test = "split",
  max_models = 5
)
#> # A tibble: 2 × 5
#>   tag       n     p order  pcum
#>   <lgl> <int> <dbl> <int> <dbl>
#> 1 FALSE   549  61.6     1  61.6
#> 2 TRUE    342  38.4     2 100
#> 
#>  test train 
#>   179   712
#>                            model_id       auc   logloss     aucpr
#> 1     DRF_1_AutoML_5_20260423_85425 0.8680875 0.7855203 0.8270861
#> 2     GLM_1_AutoML_5_20260423_85425 0.8654726 0.4253319 0.8491966
#> 3 XGBoost_2_AutoML_5_20260423_85425 0.8537248 0.4484009 0.8055514
#>   mean_per_class_error      rmse       mse
#> 1            0.1775527 0.3813365 0.1454175
#> 2            0.1923547 0.3652137 0.1333811
#> 3            0.2039812 0.3752972 0.1408480
#> Model (1/5): DRF_1_AutoML_5_20260423_85425
#> Dependent Variable: Survived
#> Type: Classification (2 classes)
#> Algorithm: DRF
#> Split: 80% training data (of 891 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    AUC = 0.85792
#>    ACC = 0.78212
#>    PRC = 0.84783
#>    TPR = 0.5493
#>    TNR = 0.93519
#> 
#> Most important variables:
#>    Ticket (65.7%)
#>    Sex (14.9%)
#>    Cabin (8.7%)
#>    Pclass (3.4%)
#>    Fare (2.7%)

Making Predictions

On New Data

# New data (same structure as training)
new_data <- dft[1:10, ]

# Predict
predictions <- h2o_predict_model(new_data, model$model)
head(predictions)
#>   predict     FALSE.        TRUE.
#> 1   FALSE 0.99979242 0.0002075763
#> 2    TRUE 0.02148936 0.9785106383
#> 3    TRUE 0.12765957 0.8723404255
#> 4    TRUE 0.09574468 0.9042553191
#> 5   FALSE 0.99979242 0.0002075763
#> 6   FALSE 0.97851583 0.0214841721

Model Comparison

Full Visualization Suite

# Complete model evaluation plots
mplot_full(
  tag = model$scores_test$tag,
  score = model$scores_test$score,
  subtitle = model$model@algorithm
)

Metrics Comparison

# Model performance over trees
mplot_metrics(model)

Saving and Loading Models

Export Results

# Save model and plots
export_results(model, subdir = "models", thresh = 0.5)

This creates:

- Model file (.rds)
- MOJO file (for production)
- Performance plots
- Metrics summary
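
To confirm what was written to disk, base R can list the exported artifacts (the exact directory layout depends on the model_name and subdir used in the export_results() call):

```r
# Inspect exported files; "models" matches the subdir used above
list.files("models", recursive = TRUE)
```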

Load Saved Model

# Load model
loaded_model <- readRDS("models/Titanic_Model/Titanic_Model.rds")

# Make predictions with MOJO (production-ready)
predictions <- h2o_predict_MOJO(
  model_path = "models/Titanic_Model",
  df = dft[1:10, ]
)

Best Practices

1. Start Simple

# Quick prototype
model <- h2o_automl(dft, "Survived", max_models = 3, max_time = 30)
#> # A tibble: 2 × 5
#>   tag       n     p order  pcum
#>   <lgl> <int> <dbl> <int> <dbl>
#> 1 FALSE   549  61.6     1  61.6
#> 2 TRUE    342  38.4     2 100
#> train_size  test_size 
#>        623        268
#>                            model_id       auc   logloss     aucpr
#> 1     GLM_1_AutoML_6_20260423_85436 0.8566401 0.4331878 0.8468753
#> 2 XGBoost_1_AutoML_6_20260423_85436 0.8400780 0.4574884 0.8099752
#> 3     GBM_1_AutoML_6_20260423_85436 0.8159377 0.6451460 0.7378534
#>   mean_per_class_error      rmse       mse
#> 1            0.1914171 0.3680368 0.1354511
#> 2            0.2138740 0.3789387 0.1435946
#> 3            0.2218336 0.4732407 0.2239567
#> Model (1/3): GLM_1_AutoML_6_20260423_85436
#> Dependent Variable: Survived
#> Type: Classification (2 classes)
#> Algorithm: GLM
#> Split: 70% training data (of 891 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    AUC = 0.87979
#>    ACC = 0.79851
#>    PRC = 0.90164
#>    TPR = 0.53398
#>    TNR = 0.96364
#> 
#> Most important variables:
#>    Ticket.1601 (0.9%)
#>    Ticket.2661 (0.9%)
#>    Ticket.C.A. 37671 (0.8%)
#>    Cabin.C22 C26 (0.8%)
#>    Sex.female (0.7%)

2. Iterate and Refine

# Refine based on results
model <- h2o_automl(
  dft, "Survived",
  max_models = 20,
  no_outliers = TRUE,
  balance = TRUE,
  ignore = c("PassengerId", "Name", "Ticket", "Cabin"),
  model_name = "Titanic_Model"
)
#> # A tibble: 2 × 5
#>   tag       n     p order  pcum
#>   <lgl> <int> <dbl> <int> <dbl>
#> 1 FALSE   549  61.6     1  61.6
#> 2 TRUE    342  38.4     2 100
#> train_size  test_size 
#>        623        268
#>                        model_id       auc   logloss     aucpr
#> 1 GBM_3_AutoML_7_20260423_85441 0.8575063 0.4316748 0.8410436
#> 2 GBM_2_AutoML_7_20260423_85441 0.8571250 0.4270731 0.8442881
#> 3 GBM_4_AutoML_7_20260423_85441 0.8561498 0.4266892 0.8469913
#>   mean_per_class_error      rmse       mse
#> 1            0.1887694 0.3656777 0.1337202
#> 2            0.1944735 0.3641046 0.1325721
#> 3            0.1909050 0.3631896 0.1319067

3. Validate Thoroughly

# Check multiple metrics
model$metrics
#> $dictionary
#> [1] "AUC: Area Under the Curve"                                                             
#> [2] "ACC: Accuracy"                                                                         
#> [3] "PRC: Precision = Positive Predictive Value"                                            
#> [4] "TPR: Sensitivity = Recall = Hit rate = True Positive Rate"                             
#> [5] "TNR: Specificity = Selectivity = True Negative Rate"                                   
#> [6] "Logloss (Error): Logarithmic loss [Neutral classification: 0.69315]"                   
#> [7] "Gain: When best n deciles selected, what % of the real target observations are picked?"
#> [8] "Lift: When best n deciles selected, how much better than random is?"                   
#> 
#> $confusion_matrix
#>        Pred
#> Real    FALSE TRUE
#>   FALSE   156    9
#>   TRUE     28   75
#> 
#> $gain_lift
#> # A tibble: 10 × 10
#>    percentile value random target total  gain optimal  lift response score
#>    <fct>      <fct>  <dbl>  <int> <int> <dbl>   <dbl> <dbl>    <dbl> <dbl>
#>  1 1          FALSE   10.1     25    27  15.2    16.4  50.4   15.2   93.5 
#>  2 2          FALSE   20.1     24    27  29.7    32.7  47.4   14.5   92.2 
#>  3 3          FALSE   30.2     26    27  45.5    49.1  50.4   15.8   89.0 
#>  4 4          FALSE   39.9     24    26  60      64.8  50.3   14.5   86.1 
#>  5 5          FALSE   50       23    27  73.9    81.2  47.9   13.9   81.1 
#>  6 6          FALSE   60.1     17    27  84.2    97.6  40.2   10.3   71.1 
#>  7 7          FALSE   69.8     18    26  95.2   100    36.4   10.9   45.7 
#>  8 8          FALSE   79.9      3    27  97.0   100    21.4    1.82  22.6 
#>  9 9          FALSE   89.9      4    27  99.4   100    10.5    2.42   9.55
#> 10 10         FALSE  100        1    27 100     100     0      0.606  1.49
#> 
#> $metrics
#>       AUC     ACC     PRC     TPR     TNR
#> 1 0.89147 0.86194 0.89286 0.72816 0.94545
#> 
#> $cv_metrics
#> # A tibble: 20 × 8
#>    metric     mean     sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
#>    <chr>     <dbl>  <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#>  1 accuracy  0.836 0.0462      0.84      0.896       0.8        0.782      0.863
#>  2 auc       0.847 0.0721      0.847     0.927       0.854      0.731      0.876
#>  3 err       0.164 0.0462      0.16      0.104       0.2        0.218      0.137
#>  4 err_cou… 20.4   5.73       20        13          25         27         17    
#>  5 f0point5  0.783 0.0914      0.788     0.913       0.743      0.663      0.808
#>  6 f1        0.781 0.0793      0.778     0.876       0.779      0.658      0.813
#>  7 f2        0.780 0.0759      0.768     0.842       0.818      0.653      0.819
#>  8 lift_to…  2.64  0.337       2.72      2.23        2.40       3.1        2.76 
#>  9 logloss   0.432 0.0828      0.452     0.326       0.460      0.543      0.379
#> 10 max_per…  0.236 0.0702      0.239     0.179       0.233      0.35       0.178
#> 11 mcc       0.651 0.110       0.653     0.792       0.605      0.499      0.705
#> 12 mean_pe…  0.824 0.0531      0.823     0.889       0.807      0.748      0.854
#> 13 mean_pe…  0.176 0.0531      0.177     0.111       0.193      0.252      0.146
#> 14 mse       0.134 0.0294      0.139     0.0966      0.145      0.173      0.115
#> 15 pr_auc    0.822 0.0987      0.821     0.937       0.814      0.669      0.870
#> 16 precisi…  0.785 0.103       0.795     0.939       0.721      0.667      0.804
#> 17 r2        0.425 0.149       0.403     0.610       0.402      0.207      0.503
#> 18 recall    0.780 0.0793      0.761     0.821       0.846      0.65       0.822
#> 19 rmse      0.364 0.0405      0.372     0.311       0.381      0.416      0.339
#> 20 specifi…  0.868 0.0693      0.886     0.957       0.767      0.845      0.886
#> 
#> $max_metrics
#>                         metric  threshold       value idx
#> 1                       max f1 0.41714863   0.7672956 181
#> 2                       max f2 0.28931884   0.7898957 224
#> 3                 max f0point5 0.66135190   0.8173619 120
#> 4                 max accuracy 0.61537557   0.8250401 130
#> 5                max precision 0.99256302   1.0000000   0
#> 6                   max recall 0.03108045   1.0000000 391
#> 7              max specificity 0.99256302   1.0000000   0
#> 8             max absolute_mcc 0.61537557   0.6277286 130
#> 9   max min_per_class_accuracy 0.35070798   0.7907950 205
#> 10 max mean_per_class_accuracy 0.41714863   0.8112306 181
#> 11                     max tns 0.99256302 384.0000000   0
#> 12                     max fns 0.99256302 238.0000000   0
#> 13                     max fps 0.01019947 384.0000000 399
#> 14                     max tps 0.03108045 239.0000000 391
#> 15                     max tnr 0.99256302   1.0000000   0
#> 16                     max fnr 0.99256302   0.9958159   0
#> 17                     max fpr 0.01019947   1.0000000 399
#> 18                     max tpr 0.03108045   1.0000000 391

# Visual inspection
mplot_full(
  tag = model$scores_test$tag,
  score = model$scores_test$score
)


# Variable importance
mplot_importance(
  var = model$importance$variable,
  imp = model$importance$importance
)

Score Distribution

# Density plot
mplot_density(
  tag = model$scores_test$tag,
  score = model$scores_test$score
)

4. Document Your Process

# Save everything
export_results(model, subdir = "my_project", thresh = 0.5)

Troubleshooting

h2o Initialization Issues

# Manually initialize h2o with more memory
h2o::h2o.init(max_mem_size = "8G", nthreads = -1)
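If initialization keeps failing, it can help to confirm whether a cluster is actually reachable before retrying. A minimal sketch using h2o's `h2o.clusterIsUp()` and `h2o.clusterInfo()` (the `tryCatch()` guard is there because `h2o.clusterIsUp()` errors when no connection exists at all):

```r
# Check for a live h2o cluster without assuming one is running
up <- tryCatch(h2o::h2o.clusterIsUp(), error = function(e) FALSE)
if (up) {
  h2o::h2o.clusterInfo() # prints version, memory, and core counts
} else {
  message("No h2o cluster reachable; re-run h2o.init() and check its log files")
}
```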

Clean h2o Environment

# Remove all models
h2o::h2o.removeAll()

# Shutdown h2o
h2o::h2o.shutdown(prompt = FALSE)

Check h2o Flow UI

# Open h2o's Flow web interface in the default browser
utils::browseURL("http://localhost:54321/flow/index.html")

Further Reading

Package & ML Resources

Next Steps