Introduction

When validating a statistical methodology it is often useful to test the method using simulated data. Here we test whether a method can correctly identify no pattern when the data is random, and similarly detect the correct pattern when there is a known structure. This is another way to test whether given the amount and temporal resolution of the data is sufficient to correctly identify the desired patterns.

Libraries

#if netTS is not yet installed you can use the following code to install it from github:
#devtools::install_github("tbonne/netTS")

library(ggplot2)
library(netTS)
library(igraph)
library(reshape2)
library(lubridate)
library(mgcv)
library(dplyr)

Measurement function

Mean Betweenness

#1. use mean betweenness of the network
mean_betweenness <- function (net) {
  md <- mean(betweenness(net))
    return(md)
}

Testing for structure when there is none

Simulate data: netts provides a function to simulate event data (sim.event.data). It is possible to feed this simulation function a pre-specified network or to have no underlying network when simulating data. When no network is provided the probability of events occurring between particular individuals is based on a randomly generated network. Below we simulate random interactions, with no inter-individual dependence. Here we simulate 100 sampling periods (scans), distributing 2 scans per day.

#simulate some data (too long to run)
df.rand <-sim.events.data(nodes=30, sampling.periods = 100, sampling.periods.per.day = 2)

#view the changes in individual probabilities of performing an interaction
head(df.rand$prob)

#view the changes in observed individual interaction events
head(df.rand$beha)

Get network measures over time and test with permutations.

#1. Extract network strength over time
str.30 <- graphTS(df.rand$beha,windowsize = days(30), windowshift = days(1), measureFun = mean_betweenness, nperm=1000)
## [1] "20 networks extracted"
## [1] "perm"
#2. Plot the results
graphTS.plot(str.30, plotCI = T)

Run a statistical model

#convert dates to time from start
str.30$time <- as.numeric(as.duration(str.30$windowstart-(min(str.30$windowstart))))
str.30$time <- scale(str.30$time)

#fit a generalized additive model to test for a temporal trend
fit.gam.30 <- gam(measure ~ s(time), data = str.30)

#plot estimated temporal trend in strength (should be flat!)
plot(fit.gam.30)

We can see from the outputs that, given the sample size and rate, we would have to be very careful about drawing inferences about small changes in time. I.e., random processes are sufficient to produce the observed changes in the time series.

Testing for structure when there is some

Simulate data: here we simulate event data, this time using an underlying structure.

#1. create an underlying network
net1 = sample_k_regular(30, k=3, directed = FALSE, multiple = FALSE) 
E(net1)$weight <- runif(ecount(net1))
true.betweeneess <- mean(betweenness(net1))
plot(net1, edge.width=E(net1)$weight)

#2. simulate event data using the 'true' network: net1
df.obs <-sim.events.data(nodes=30, sampling.periods = 100,sampling.periods.per.day = 2, true.net = net1 )

Get network measures over time and test with permutations.

#1. Extract mean network betweenness over time
bet.30.true <- graphTS(df.obs$beha,windowsize = days(30), windowshift = days(1), measureFun = mean_betweenness, nperm=1000)
## [1] "20 networks extracted"
## [1] "perm"
#2. Plot the results
graphTS.plot(bet.30.true, plotCI = T)

Run a statistical model

#convert dates to time from start
bet.30.true$time <- as.numeric(as.duration(bet.30.true$windowstart-(min(bet.30.true$windowstart))))
bet.30.true$time <- scale(bet.30.true$time)

#fit a generalized additive model to test for a temporal trend
fit.gam.30.true <- gam(measure ~ s(time), data = bet.30.true)

#true mean betweeness estimated?
summary(fit.gam.30.true)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## measure ~ s(time)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 42.48944    0.06578   645.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F p-value   
## s(time) 2.537  3.159 5.737 0.00619 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.485   Deviance explained = 55.4%
## GCV = 0.10514  Scale est. = 0.086548  n = 20
#plot estimated temporal trend in strength (should be flat!)
plot(fit.gam.30.true)

Similar to the case of no-structure above, we find a small effect of time here suggesting, again, that given the sample size and rate, we would have to be very careful about drawing inferences about small changes in time. The main idea here is that these simulated dataset can aid in the interpretation of statistical models.

More detailed simulations

It is also possible to initialize the simulation for the generation of more nuanced event data, where individuals behaviour is dependent on previous behaviour, behaviour of others, or covariates. Depending on the question of interest, the ability to simulate data with these dependencies and then test statistical models on these data can be very valuable for model construction and making inference from model results.

The simulation function allows the user to input individual mean behaviour (A), inter-individual dependence in mean behaviour (B), effects of covariates (C), inter-individual dependence in deviations from the mean, and the scale of the error (e). Below we show an example of generating individual behaviour where the deviations around individual means are inter-related.

set.seed(1111)
nodes = 5

#1.Set a known underlying network
net1 = sample_k_regular(nodes, k=2, directed = FALSE, multiple = FALSE) 
E(net1)$weight <- runif(ecount(net1))
true.betweeneess <- mean(betweenness(net1))
plot(net1, edge.width=E(net1)$weight)

#2.Set individuals to start with a mean probability of performing the behaviour between 0.4 and 0.6
individuial.behavioural.mean <- runif(nodes,0.4,0.6)

#3.Set the inter-individual dependence in deviations from their means: all individuals are dependent on their previous deviations from the mean, and individual 1 is dependent on 2.
individaul.dependence <- diag(nodes)*0.79
individaul.dependence[1,2]<-0.20 #here individual 1 and 2 are interdependent
individaul.dependence[2,1]<-0.20

#run the simulation
df.obs <-sim.events.data(nodes=nodes, sampling.periods = 1000,sampling.periods.per.day = 30, true.net = net1, A=individuial.behavioural.mean, D=individaul.dependence)

plot individual probabilities

#look at our two individuals who are dependent
plot(df.obs$probs[,1],type="l",col="red",ylim=c(0.0,1.0))
lines(df.obs$probs[,2],type="l",col="blue")
lines(df.obs$probs[,3],type="l",col="grey80")
lines(df.obs$probs[,4],type="l",col="grey80")
lines(df.obs$probs[,5],type="l",col="grey80")

#look at a cross-correlation plot between the two inter-dependent individuals
ccf(df.obs$probs[,1],df.obs$probs[,2])

Plot the observed interactions

#Here we extract node out-strength over time
out_strength <- function(net){
  strength(net, mode="out")
}

df.sim<-nodeTS(df.obs$beha, windowsize = days(10),windowshift = days(1), measureFun = out_strength, directed = T)
## [1] "24 networks extracted"
#plot the results
nodeTS.plot(df.sim)

#ccf
ccf(df.sim[,1],df.sim[,2])

We can see from the above plot that, with the size and resolution of the simulated dataset, we can extract the dependence between individual 1 and 2.