rEDM的用法(2)——Determining causal variables by convergent cross mapping (CCM)

Load package and time series

We demonstrate the efficacy of CCM to correctly identify causation using the Moran effect model and the two-species competition model, following Sugihara et al. (2012). In the Moran effect model, we simulate adult-recruitment dynamics as commonly used in fisheries. In this model, two populations do not have any biological interaction, while both are driven by the same environmental factor.Despite that no interaction exists, the shared environmental driver leads to strong correlation between the two populations (Fig. 1b).

In the two-species competition model, we find no lasting correlation between the two species, as the sign of correlation flips through time (Fig. 1d), which is an example of mirage correlation, a hallmark of nonlinear systems.  Both models are simulated for 10000 time steps, but only the last 1000 data are kept for further analyses in order to exclude the transient dynamics.

# Loading R packages
library(rEDM)
library(Kendall)

# Loading the time series for the Moran effect and mirage correlation models 
dam <- read.csv('ESM3_Data_moran.csv',header=T) # Moran effect
dac <- read.csv('ESM4_Data_competition.csv',header=T) # Mirage correlation

# Data normalization
dac.n <- scale(dac[,-1], center = TRUE, scale = TRUE)
dam.n <- scale(dam[,-1], center = TRUE, scale = TRUE)

In the rEDM package, the function rEDM::ccm() is used for CCM analyses. Here, we illustrate how to implement the CCM causality test that examines whether N2 causes N1 in Moran effect model. To test this, we design a cross-mapping from N1 to N2 by the augument lib_column="N1" and target_column="N2". In CCM analysis, we firstly need to determine the best embedding dimension for the cross-mapping. At this step, we perform the cross-mapping with a fixed library size (lib_sizes = 1000). Then, we use the time lags of N1 to predict the lagged one time step values of N2 by setting the augment tp=-1 and determine the optimal E based on the hindcast skill to avoid over-fitting (Deyle et al. 2016). Similarly, we can repeat the same process to determine the embedding dimension in the cross-mapping from N2 to N1. Next, we carried out the CCM causality test with varying library size.

libs <- c(seq(20,80,20),seq(100,1000,100))

Moran effect

In CCM analysis, we use the time lags of N1 to predict the current value of N2 (tp=0). To precisely estimate the predictive skill (ρ), we generate the 200 random samples with replacement (replace=T) for each library length L (num_samples=200). As such, we obtain the sampling distribution of predictive skill ρ(L). A random seed is setup to make the results repeatable (RNGseed=2301). Finally, we offer a simple statistical test for the convergence of CCM by Mann-Kendall Tau trend test when the null time series is not easily accessible. This is a nonparametric test for the existence of monotonic increasing trend, using the function Kendall::MannKendall(). Practically, we can evaluate the significance of causations by examining whether all the quantiles of predictive skill demonstrate a significant increasing trend with increasing library size (τ statistics significantly > 0). Similarly, we can repeat all these procedures of CCM analysis to test the causality for the two species competition model with mirage correlation.

## CCM analysis of the Moran effect model, N1 and N2
# Design a sequence of library size
libs <- c(seq(20,80,20),seq(100,1000,100))

# Moran effect model: N1 cross-mapping N2 (i.e. testing N2 as a cause of N1)
# Determine the embedding dimension
E.test.n1=NULL
for(E.t in 2:8){
  cmxy.t <- ccm(dam.n, E = E.t, lib_column = "N1", target_column = "N2",
lib_sizes = 1000, num_samples = 1, tp=-1,random_libs = F)
  E.test.n1=rbind(E.test.n1,cmxy.t)}
(E_n1 <- E.test.n1$E[which.max(E.test.n1$rho)[1]]) # the optimal E

# CCM analysis with varying library size (L)
n1_xmap_n2 <- ccm(dam.n, E=E_n1,lib_column="N1", target_column="N2",
                lib_sizes=libs, num_samples=200, replace=T, RNGseed=2301)

# Calculate the median, maximum, and 1st & 3rd quantile of rho for each L
n12q=as.matrix(aggregate(n1_xmap_n2[,c('rho')],by = list(as.factor(n1_xmap_n2$lib_size)), quantile)[,'x'])
apply(n12q[,2:5],2,MannKendall)

###########################################################
# Moran effect model: N2 cross-mapping N1 (i.e. testing N1 as a cause of N2)
# Determine the embedding dimension
E.test.n2=NULL
for(E.t in 2:8){
  cmxy.t <- ccm(dam.n, E = E.t, lib_column = "N2", target_column = "N1",
               lib_sizes = 1000, num_samples = 1, tp=-1, random_libs = F)
  E.test.n2=rbind(E.test.n2,cmxy.t)}
(E_n2 <- E.test.n2$E[which.max(E.test.n2$rho)[1]])

# CCM analysis
n2_xmap_n1 <- ccm(dam.n, E=E_n2,lib_column="N2", target_column="N1",
                lib_sizes=libs, num_samples=200, replace=T, RNGseed=2301)

# Calculate the (25%,50%,75%,100%) quantile for predictive skills
n21q=as.matrix(aggregate(n2_xmap_n1[,c('rho')],by = list(as.factor(n2_xmap_n1$lib_size)), quantile)[,'x'])
apply(n21q[,2:5],2,MannKendall)

# Plot forecast skill vs library size
# Plot N1 cross-mapping N2
plot(n12q[,3]~libs,type="l",col="red",ylim=c(0,1),lwd=2,
     main="Convergent cross mapping CCM",xlab="Library size",ylab=expression(rho)) # median predictive skill vs library size (or we can use mean predictive skill)
lines(n12q[,2]~libs,col="red",lwd=1,lty=2) # 1st quantile 
lines(n12q[,4]~libs,col="red",lwd=1,lty=2) # 3rd quantile

# Plot N2 cross-mapping N1
lines(n21q[,3]~libs,col="blue",lwd=1,lty=1) # median 
lines(n21q[,2]~libs,col="blue",lwd=1,lty=2) # 1st quantile 
lines(n21q[,4]~libs,col="blue",lwd=1,lty=2) # 3rd quantile
legend(600,1,c("N1 xmap N2","N2 xmap N1"),lty=c(1,1),col=c("red","blue"))
abline(h=cor(dam[,'N1'],dam[,'N2']),lty=3)

(It will take a little time...)  

 

mirage correlations

Following the same procedure, we apply CCM to test the mutual causation between the two competitors exhibiting mirage correlations.

#########################################################################
#########################################################################
## CCM analysis of the two species competition model with mirage correlation
# Design a sequence of library size
libs <- c(seq(20,80,20),seq(100,1000,100))

# Mirage correlation model: M1 cross-mapping M2 (i.e. testing M2 as a cause of M1)
# Determine the embedding dimension
E.test.x=NULL
for(E.t in 2:8){
  cmxy.t <- ccm(dac.n, E = E.t, lib_column = "M1", target_column = "M2",
lib_sizes = 1000, num_samples = 1, tp=-1,random_libs = F)
  E.test.x=rbind(E.test.x,cmxy.t)}
(E_x <- E.test.x$E[which.max(E.test.x$rho)[1]])

# CCM analysis: varying library size
x_xmap_y <- ccm(dac.n, E=E_x,lib_column="M1", target_column="M2",
                lib_sizes=libs, num_samples=200, replace=T, RNGseed=2301)

# Calculate the median, maximum, and 1st & 3rd quantiles of rho
xyq=as.matrix(aggregate(x_xmap_y[,c('rho')],by = list(as.factor(x_xmap_y$lib_size)), quantile)[,'x'])
apply(xyq[,2:5],2,MannKendall)

###########################################################
# Mirage correlation model: M2 cross-mapping M1 (i.e. testing M1 as a cause of M2)
# Determine the embedding dimension
E.test.y=NULL
for(E.t in 2:8){
  cmxy.t <- ccm(dac.n, E = E.t, lib_column = "M2", target_column = "M1",
                lib_sizes = 1000, num_samples = 1,tp=-1,random_libs = F)
  E.test.y=rbind(E.test.y,cmxy.t)}
(E_y <- E.test.y$E[which.max(E.test.y$rho)[1]])

# CCM analysis
y_xmap_x <- ccm(dac.n, E=E_y,lib_column="M2", target_column="M1",
                lib_sizes=libs, num_samples=200, replace=T, RNGseed=2301)

# Calculate the (25%,50%,75%,100%) quantile for predictive skills
yxq=as.matrix(aggregate(y_xmap_x[,c('rho')],by = list(as.factor(y_xmap_x$lib_size)), quantile)[,'x'])
apply(yxq[,2:5],2,MannKendall)

# Plot forecast skill vs library size
# Plot X cross-mapping Y
plot(xyq[,3]~libs,type="l",col="red",ylim=c(0,1),lwd=2,
     main="Convergent cross mapping CCM",xlab="Library size",ylab=expression(rho)) # median predictive skill vs library size (or we can use mean predictive skill)
lines(xyq[,2]~libs,col="red",lwd=1,lty=2) # 1st quantile 
lines(xyq[,4]~libs,col="red",lwd=1,lty=2) # 3rd quantile

# Plot Y cross-mapping X
lines(yxq[,3]~libs,col="blue",lwd=1,lty=1) # median 
lines(yxq[,2]~libs,col="blue",lwd=1,lty=2) # 1st quantile 
lines(yxq[,4]~libs,col="blue",lwd=1,lty=2) # 3rd quantile
legend(600,0.4,c("M1 xmap M2","M2 xmap M1"),lty=c(1,1),col=c("red","blue"))
abline(h=cor(dac[,'M1'],dac[,'M2']),lty=3)

References

Deyle ER, Maher MC, Hernandez RD, Basu S, Sugihara G (2016) Global environmental drivers of influenza. Proc Natl Acad Sci USA 113: 13081-13086. DOI: 10.1073/pnas.1607747113(PDF)

Sugihara G, May R, Ye H, Hsieh CH, Deyle E, Fogarty M, Munch S (2012) Detecting causality in complex ecosystems. Science 338: 496-500. DOI: 10.1126/science.1227079(PDF)

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章