Friday, February 6, 2015

R9. Correlation in R

x and y each hold 50,000 simulated throws of a fair six-sided die, so each pair (x[i], y[i]) represents one throw of two dice.


The correlation is near 0 because the two samples are independent, and it tends to 0 as the number of samples increases.


The correlation computed step by step from its definition is compared with the result of cor(x,y).
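
For reference, the step-by-step calculation in the script appears to follow the usual Pearson correlation formula. The symbols here are notation chosen for this note, not taken from the script: x̄ and ȳ are the sample means, and the sums run over all n samples. The columns a and b in the script correspond to the deviations x_i - x̄ and y_i - ȳ.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator matches sum(df[["ab"]]) and the denominator matches den in the script.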

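# ex9.R
num_samples <- 50000
# Two independent series of throws of a fair six-sided die
x <- sample(6, num_samples, replace = TRUE)
y <- sample(6, num_samples, replace = TRUE)
df <- data.frame(x, y)
# Deviations from the sample means
df["a"] <- df["x"] - mean(df[["x"]])
df["b"] <- df["y"] - mean(df[["y"]])
# Cross products and squared deviations
df["ab"] <- df["a"] * df["b"]
df["sqa"] <- df["a"] * df["a"]
df["sqb"] <- df["b"] * df["b"]
# Pearson correlation: sum(a*b) / sqrt(sum(a^2) * sum(b^2))
den <- sqrt(sum(df[["sqa"]]) * sum(df[["sqb"]]))
correlation <- sum(df[["ab"]]) / den
print("correlation =")
print(correlation)
print("cor(x,y) =")
print(cor(x, y))
# Absolute difference between the manual result and the built-in cor()
err <- abs(correlation - cor(x, y))
print("error =")
print(err)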
# [1] "correlation ="
# [1] 0.0008143925
# [1] "cor(x,y) ="
# [1] 0.0008143925
# [1] "error ="
# [1] 1.12757e-17
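
As a rough check of the claim that the correlation goes to 0 as the number of samples is increased, the sketch below repeats the experiment for a few sample sizes. The seed, the sizes, and the names sizes and cors are arbitrary choices for illustration and are not part of ex9.R. For independent samples the sample correlation is typically on the order of 1/sqrt(n), so that value is printed alongside for comparison.

```r
# Sketch: correlation of two independent dice series for growing sample sizes.
# The seed and the sample sizes are assumptions made for this illustration.
set.seed(1)
sizes <- c(100, 10000, 1000000)
cors <- sapply(sizes, function(n) {
  x <- sample(6, n, replace = TRUE)
  y <- sample(6, n, replace = TRUE)
  cor(x, y)
})
# Compare each correlation with the rough scale 1 / sqrt(n)
print(data.frame(n = sizes, correlation = cors, scale = 1 / sqrt(sizes)))
```

The exact values depend on the seed, but the magnitude of the correlation should shrink roughly in step with the scale column.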
