Density ratio estimation is described as follows: for given two data samples \(x\) and \(y\) from unknown distributions \(p(x)\) and \(q(y)\) respectively, estimate
\[ w(x) = \frac{p(x)}{q(x)} \]
where \(x\) and \(y\) are \(d\)-dimensional real numbers.
The estimated density ratio function \(w(x)\) can be used in many applications such as anomaly detection [1] and covariate shift adaptation [2]. Other useful applications about density ratio estimation were summarized by Sugiyama et al. (2012) [3].
The package densratio provides a function densratio()
. The function outputs an object that has a function to estimate density ratio.
For example,
set.seed(3)
x <- rnorm(200, mean = 1, sd = 1/8)
y <- rnorm(200, mean = 1, sd = 1/2)
library(densratio)
result <- densratio(x, y)
The function densratio()
estimates the density ratio of \(p(x)\) to \(q(y)\), \[
w(x) = \frac{p(x)}{q(y)} = \frac{\rm{Norm}(1, 1/8)}{\rm{Norm}(1, 1/2)}
\] and provides a function to compute estimated density ratio. The result
object has a function compute_density_ratio()
that can compute the estimated density ratio \(\hat{w}(x) \simeq p(x)/q(y)\) for any \(d\)-dimensional input \(x\) (now \(d=1\)).
new_x <- seq(0, 2, by = 0.05)
w_hat <- result$compute_density_ratio(new_x)
plot(new_x, w_hat, pch=19)
In this case, the true density ratio \(w(x) = p(x)/q(y) = \rm{Norm}(1, 1/8) / \rm{Norm}(1, 1/2)\) can be computed precisely. So we can compare \(w(x)\) with the estimated density ratio \(\hat{w}(x)\).
true_density_ratio <- function(x) dnorm(x, 1, 1/8) / dnorm(x, 1, 1/2)
plot(true_density_ratio, xlim=c(-1, 3), lwd=2, col="red", xlab = "x", ylab = "Density Ratio")
plot(result$compute_density_ratio, xlim=c(-1, 3), lwd=2, col="green", add=TRUE)
legend("topright", legend=c(expression(w(x)), expression(hat(w)(x))), col=2:3, lty=1, lwd=2, pch=NA)
You can install the densratio package from CRAN.
You can also install the package from GitHub.
install.packages("devtools") # If you have not installed "devtools" package
devtools::install_github("hoxo-m/densratio")
The source code for densratio package is available on GitHub at
The package densratio provides a function densratio()
. The function outputs an object that has a function to estimate density ratio.
For data samples x
and y
,
library(densratio)
x <- rnorm(200, mean = 1, sd = 1/8)
y <- rnorm(200, mean = 1, sd = 1/2)
result <- densratio(x, y)
Here, result$compute_density_ratio()
is the function to compute estimated density ratio.
new_x <- seq(0, 2, by = 0.05)
w_hat <- result$compute_density_ratio(new_x)
plot(new_x, w_hat, pch=19)
densratio()
has method
argument that you can pass "uLSIF"
or "KLIEP"
.
uLSIF (unconstrained Least-Squares Importance Fitting) is the default method. This algorithm estimates density ratio by minimizing the squared loss. You can find more information in Hido et al. (2011) [1].
KLIEP (Kullback-Leibler Importance Estimation Procedure) is the anothor method. This algorithm estimates density ratio by minimizing Kullback-Leibler divergence. You can find more information in Sugiyama et al. (2007) [2].
The both methods assume that density ratio are represented by linear model
\[ w(x) = \alpha_1 K(x, c_1) + \alpha_2 K(x, c_2) + ... + \alpha_b K(x, c_b) \]
where
\[ K(x, c) = \exp\left(-\frac{\|x - c\|^2}{2 \sigma ^ 2}\right) \]
is the Gaussian RBF.
densratio()
performs two main jobs:
\(\sigma\) and \(\alpha_i\) are saved into result objects of densratio()
, and used to compute estimated density ratio in compute_density_ratio()
.
You can print()
result objects of densratio()
to see information. Moreover, you can change some conditions to specify arguments of densratio()
.
##
## Call:
## densratio(x = x, y = y, method = "uLSIF")
##
## Kernel Information:
## Kernel type: Gaussian RBF
## Number of kernels: 100
## Bandwidth(sigma): 0.1
## Centers: num [1:100, 1] 0.907 1.093 1.18 1.136 1.046 ...
##
## Kernel Weights(alpha):
## num [1:100] 0.067455 0.040045 0.000459 0.016849 0.067084 ...
##
## Regularization Parameter(lambda): 1
##
## The Function to Estimate Density Ratio:
## compute_density_ratio()
kernel_num
argument. In default, kernel_num = 100
.sigma = "auto"
, the algorithms automatically select the optimal value by cross validation. If you set sigma
a single number, it will be used. If you set a numeric vector, the algorithms select the optimal value in them by cross validation.x
underlying a numerator distribution \(p(x)\). You can find the whole values in result$kernel_info$centers
.result$alpha
.compute_density_ratio()
.So far, the input data samples x
and y
were one dimensional. densratio()
allows to input multidimensional data samples as matrix
.
For example,
library(densratio)
library(mvtnorm)
set.seed(71)
x <- rmvnorm(300, mean = c(1, 1), sigma = diag(1/8, 2))
y <- rmvnorm(300, mean = c(1, 1), sigma = diag(1/2, 2))
result <- densratio(x, y)
result
##
## Call:
## densratio(x = x, y = y, method = "uLSIF")
##
## Kernel Information:
## Kernel type: Gaussian RBF
## Number of kernels: 100
## Bandwidth(sigma): 0.316
## Centers: num [1:100, 1:2] 1.113 0.517 1.418 1.294 1.28 ...
##
## Kernel Weights(alpha):
## num [1:100] 0.0465 0.0177 0.0702 0.0795 0.0798 ...
##
## Regularization Parameter(lambda): 0.3162278
##
## The Function to Estimate Density Ratio:
## compute_density_ratio()
Also in this case, we can compare the true density ratio with the estimated density ratio.
true_density_ratio <- function(x) {
dmvnorm(x, mean = c(1, 1), sigma = diag(1/8, 2)) /
dmvnorm(x, mean = c(1, 1), sigma = diag(1/2, 2))
}
N <- 20
range <- seq(0, 2, length.out = N)
input <- expand.grid(range, range)
w_true <- matrix(true_density_ratio(input), nrow = N)
w_hat <- matrix(result$compute_density_ratio(input), nrow = N)
par(mfrow = c(1, 2))
contour(range, range, w_true, main = "True Density Ratio")
contour(range, range, w_hat, main = "Estimated Density Ratio")
The dimensions of x
and y
must be same.
[1] Hido, S., Tsuboi, Y., Kashima, H., Sugiyama, M., & Kanamori, T. Statistical outlier detection using direct density ratio estimation. Knowledge and Information Systems 2011.
[2] Sugiyama, M., Nakajima, S., Kashima, H., von Bünau, P. & Kawanabe, M. Direct importance estimation with model selection and its application to covariate shift adaptation. NIPS 2007.
[3] Sugiyama, M., Suzuki, T. & Kanamori, T. Density Ratio Estimation in Machine Learning. Cambridge University Press 2012.