codec

Dependence and Conditional Dependence

For a random variable Y and random vectors Z and X, the conditional dependence coefficient T(Y, Z | X) ∈ [0, 1] measures the dependence of Y on Z given X. T(Y, Z | X) is 0 if and only if Y is independent of Z given X, and is 1 if and only if Y is a function of Z given X. The measure is well-defined provided Y is not almost surely a function of X. For the definition of T, its properties, and its estimator, see the paper A Simple Measure Of Conditional Dependence.

Given a sample of n i.i.d. observations of the triple (X, Y, Z), T(Y, Z | X) can be estimated efficiently and non-parametrically. The function codec computes this estimate. The default value of X is NULL; if X is not provided by the user, codec returns the estimate of the unconditional dependence measure of Y on Z, T(Y, Z).
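To make the unconditional case concrete, the following is a minimal base-R sketch of the rank and nearest-neighbour formula for T(Y, Z) given in the paper cited above. It is not the FOCI implementation: the name codec_sketch is hypothetical, the nearest-neighbour search is brute force (so it only suits small n), and ties among neighbours are resolved by taking the first candidate, whereas the paper breaks them at random.

```r
# Sketch of the unconditional estimator T_n(Y, Z), assuming the formula
#   sum(n * min(R_i, R_M(i)) - L_i^2) / sum(L_i * (n - L_i))
# from the paper; codec_sketch is a hypothetical name, not part of FOCI.
codec_sketch <- function(y, z) {
  z <- as.matrix(z)
  n <- length(y)
  R <- rank(y, ties.method = "max")    # R_i = number of j with y_j <= y_i
  L <- rank(-y, ties.method = "max")   # L_i = number of j with y_j >= y_i
  d <- as.matrix(dist(z))              # all pairwise distances between rows of z
  diag(d) <- Inf                       # a point is not its own neighbour
  M <- unname(apply(d, 1, which.min))  # M(i) = index of nearest neighbour of z_i
  sum(n * pmin(R, R[M]) - L^2) / sum(L * (n - L))
}

set.seed(1)
n <- 500
z <- runif(n)
t_fun <- codec_sketch((z - 0.5)^2, z)  # y is a (non-monotone) function of z
t_ind <- codec_sketch(runif(n), z)     # y is independent of z
```

Under these assumptions t_fun should be close to 1 and t_ind close to 0. For real use, codec itself should be preferred.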

In the following examples, we illustrate the behavior of this estimator in different settings.

library(FOCI)

In this example we generate a 10000×3 matrix x with i.i.d. entries from Unif[0, 1]. The observed value of y is the sum of the entries in each row of x, mod 1. Although y is a function of x, it is independent of each single column of x and of each pair of its columns. Conditional on the last column, on the other hand, y is a function of the first two columns, yet it is still independent of either of the first two columns taken separately.

n = 10000
p = 3
x = matrix(runif(n * p), ncol = p)
y = (x[, 1] + x[, 2] + x[, 3]) %% 1
# y is independent of each column of x
codec(y, x[, 1])
#> [1] -0.00225687
codec(y, x[, 2])
#> [1] 0.01326747
codec(y, x[, 3])
#> [1] 0.00130965

# y is independent of the first two columns of x, x[, c(1, 2)]
codec(y, x[, c(1, 2)])
#> [1] -0.00191775

# y is a function of x
codec(y, x)
#> [1] 0.8649652

# conditional on the last column of x, y is a function of the first two columns
codec(y, x[, c(1, 2)], x[, 3])
#> [1] 0.8647881
# conditional on x[, 3], y is independent of x[, 1]
codec(y, x[, 1], x[, 3])
#> [1] 0.02025655
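The independence claims in this example rest on one short argument, sketched here for completeness. Let U be uniform on [0, 1] and independent of a random variable W. For any fixed w, the map that sends u to (u + w) mod 1 is a rotation of the unit circle and preserves the uniform distribution, so

```latex
\[
\Pr\bigl((U + w) \bmod 1 \le t\bigr) = t
\qquad \text{for all } w \in \mathbb{R},\ t \in [0, 1].
\]
```

The conditional distribution of (U + W) mod 1 given W = w is therefore Unif[0, 1] whatever the value of w, which means (U + W) mod 1 is independent of W. Taking U = x[, 3] and W = x[, 1] + x[, 2] shows that y is independent of the first two columns jointly. Taking U = x[, 2] shows that y is independent of the pair (x[, 1], x[, 3]), which in turn implies that y is independent of x[, 1] given x[, 3].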

In the following example we generate a 1000×2 matrix x with i.i.d. standard normal entries. Each row of this matrix represents a point in the 2-dimensional plane. Let y be the squared distance of this point from the origin, and let z be the angle of the line through the origin and the point, computed as atan(x[, 1] / x[, 2]). Then y and z are independent of each other, but conditional on either coordinate of the point, y is fully determined by z.

n = 1000
p = 2
x = matrix(rnorm(n * p), ncol = p)
y = x[, 1]^2 + x[, 2]^2
z = atan(x[, 1] / x[, 2])
# y is independent of z
codec(y, z)
#> [1] -0.01983902
# conditional on x[, 1], y is a function of z
codec(y, z, x[, 1])
#> [1] 0.8077018
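The conditional estimates in both examples can be reproduced in spirit with a short base-R sketch of the formula for T(Y, Z | X) given in the paper. As before, this is an illustration under stated assumptions and not the FOCI implementation: nearest_neighbour and codec_cond_sketch are hypothetical names, the neighbour search is brute force, and ties are resolved by taking the first candidate.

```r
# Index of each row's nearest neighbour among the other rows (brute force, O(n^2)).
nearest_neighbour <- function(m) {
  d <- as.matrix(dist(as.matrix(m)))
  diag(d) <- Inf                        # a point is not its own neighbour
  unname(apply(d, 1, which.min))
}

# Sketch of the conditional estimator T_n(Y, Z | X), assuming the formula
#   sum(min(R_i, R_M(i)) - min(R_i, R_N(i))) / sum(R_i - min(R_i, R_N(i)))
# from the paper; both function names are hypothetical, not part of FOCI.
codec_cond_sketch <- function(y, z, x) {
  R <- rank(y, ties.method = "max")     # R_i = number of j with y_j <= y_i
  N <- nearest_neighbour(x)             # N(i): neighbour of x_i
  M <- nearest_neighbour(cbind(x, z))   # M(i): neighbour of (x_i, z_i)
  sum(pmin(R, R[M]) - pmin(R, R[N])) / sum(R - pmin(R, R[N]))
}

set.seed(1)
n <- 1000
x <- runif(n)
z <- runif(n)
w <- runif(n)                           # noise, independent of x, z and y
y <- (x + z) %% 1
t_dep <- codec_cond_sketch(y, z, x)     # given x, y is a function of z
t_indep <- codec_cond_sketch(y, w, x)   # given x, w says nothing about y
```

With this setup t_dep should be well above 0.5 and t_indep close to 0. As in the examples above, estimates for a functional relationship stay below 1 at moderate n because each point is compared with its nearest neighbour, not with itself.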