**********************************************************************
*
*  Reproducibility Information for
*
*  Spatio-Phylogenetic Multi-Species Distribution Models
*  by A. Kaldhusdal, R. Brandl, J. Mueller, L. Moest and T. Hothorn
*
**********************************************************************

Content

### bird data
data/birdsData.Rdata

### analysis of bird data
analysis_birdData/
mBirdsESM1_2.R  mBirdsESM3_2.R

### PSGLMM R code
R/
makeData.R  makeEnvir.R  psglmm.R  psglmmSim.R

### simulation study
simulationStudy/
simModel11bin.R   simModel12.R      simModel21pois.R  simModel23bin.R
simModel11pois.R  simModel13bin.R   simModel21.R      simModel23pois.R
simModel11.R      simModel13pois.R  simModel22bin.R   simModel23.R
simModel12bin.R   simModel13.R      simModel22pois.R
simModel12pois.R  simModel21bin.R   simModel22.R

Description

A PSGLMM may be fitted by calling the main function psglmm(). This function 
utilises the functions lmer() (for Gaussian responses) or glmer() (for binary or
Poisson responses) from the lme4 package, but with a twist. Instead of having 
the covariance structure in the data estimated by the function, psglmm() 
requires the user to provide an assumption for this in advance, which is then 
incorporated into the estimation procedure. The covariance structure is passed 
to the function as a list containing one element per random effect component in
the model. 

A formula is passed to the function in the known mixed model (lme4) form, e.g.
  y ~ spp - 1 + (spp - 1 | plot).
With spp being a factor expressing to which species an observation belongs and
plot expressing to which plot the observation was made, the above model assumes
a fixed effect for each species and a random effect in the plots for each 
species. At this stage psglmm() is merely an extension of (g)lmer() for which 
the covariance structure may be assumed to be known.

If several possible covariance structures are plausible, psglmm() also allows us
to test which of these is best fitted to the data. In doing this we pass one
random effect component for each covariance structure to be tested, e.g.:
  y ~ spp - 1 + (spp - 1 | plot) + (spp - 1 | plot).
With A_ind being a unity matrix with a size equal to the number of species,
A_phylo being a phylogenetic correlation matrix for the species and B being
a correlation matrix expressing the spatial dependency of the plots, we would
pass the following list of covariance matrices (VCV) to the function (naming
the list elements eases the interpretation of the model output):
  list("independence"= kronecker(B, A_ind), "phylogeny"= kronecker(B, A_phylo)).
To perform the model selection, the argument msel must be utilised. The model
selection procedure performs a predictive cross-validation and offers three
different proper scoring rules for this task: LS or DSS for Gaussian responses, 
BS or LS for binary responses and DSS for Poisson responses.


psglmm() first calls (g)lFormula() (lme4) to set up the needed model objects. 
These are then modified using several helper functions before they are passed to 
the lme4 fitting algorithms. These helper functions are briefly described in the 
following:
- As lmer() is only allowed to fit less random effects than there are 
  observations and the above mentioned formulas assume one random effect for 
  each observation, we apply the code in lines 301-350 (see psglmm.R) to disable 
  this limitation. Without executing this code a Gaussian model of the above 
  form cannot be fitted.
- adaptFormula() is used to ensure that the order of the random effect 
  components in the formula passed on to the lme4 fitting algorithms is indeed
  the same as in the formula passed to psglmm() by the user.
- randomTerms() is used to set up the random effect terms using the provided
  covariance structures.

- makeVARALL() simply sets up a data.frame for the model results.
- modelSelection() is the function which is called if an automated model 
  selection is wanted (argument msel set to TRUE in psglmm()). The function in 
  turn calls the helper functions pred.xval() in which the predictive cross-
  validation is performed. It also allows for an LR test to be performed to test 
  if the variance estimates for the covariance components are > 0 (because of 
  the issues concerning the testing of variance parameters on the boundary of 
  their parameter space, the calculated p-value should not be overly 
  interpreted).
- LR.test() performs the mentioned LR test.
- pred.xval() is the function in which the predictive cross-validation is
  actually performed. The function is based on the algorithm outlined by Braun 
  et. al (2013). The iterated weighted least squares algorithm is hard coded to 
  abort if no convergence is achieved within 10000 iterations. If this happens, 
  or if any other numerical issue arises, the model selection is aborted, but 
  the model fitting is continued. A corresponding warning is displayed in the 
  model output. Depending on which proper scoring rule is chosen by the user
  (argument msel passed to psglmm()), one of the following helper functions is
  called:
- BS(), DSS(), LS() incorporate the three proper scoring rules "Brier Score",
  "Dawid-Sebastiani Score" and "Logarithmic Score".
The following helper functions are used to simplify the calculation of some 
tasks, which were performed often and thus save some space and make the code a
little easier to read:
- nuniq() returns the number of unique elements in a vector.
- equal() checks if all elements of a vector are identical.
- IFELSE() works like ifelse() but allows for the objects to be returned in 
  either case to be of different classes.
- makePolys() actually didn't find any use in the final version of the paper, 
  but is practical for setting up the polygons of the plots when displaying 
  the estimated values from the model.


The results can be reproduced using the following package versions:

lme4 version 1.0-0 (from R-forge), 1.1-6 from CRAN
Matrix version 1.1-4
fields version 6.8
mvtnorm version 0.9-9995
picante version 1.6-2
ggplot2 version 1.0.0