2017 Apr 28

\((\textrm{I})\) Introduction

  • The data set has 40 rows and 9 columns.
  • Columns: SITE, Elevation, Profile.Area, Height, Half.height, Latitude, Longitude, No..Species and Total.density
    • Profile.Area: density of forest at a certain height
    • Height: the top of the foliage area
    • Half.height: the median density of the foliage area
  • Response (y) variables:
    • Total.density (\(y_1\))
    • No..Species (\(y_2\))
  • Candidate covariates (x variables): Elevation, Profile.Area, Height, Half.height, Latitude and Longitude.

\((\textrm{II})\) Scatter Plot Matrix of the data set

  • All data points on the map

\((\textrm{III})\) Model for Total.density (\(y_1\))

  • We chose the first covariate for our model by eyeballing the scatterplot matrix for one with a roughly linear relationship with the response. In the end we settled on Profile.Area.

  • For the next variable, we again looked at the scatterplot matrix for a linear-looking relationship. Both Height and Half.height seemed correlated with Total.density, so we added both after Profile.Area:
## Analysis of Variance Table
## 
## Response: Total.density
##              Df  Sum Sq Mean Sq F value   Pr(>F)   
## Profile.Area  1 102.174 102.174  9.0203 0.005837 **
## Height        1  22.512  22.512  1.9874 0.170462   
## Half.height   1  35.693  35.693  3.1511 0.087585 . 
## Residuals    26 294.503  11.327                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
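Because `anova()` on a single `lm` fit is sequential (Type I), each row tests its term given only the terms listed above it: the Height row tests Height after Profile.Area, and the Half.height row tests Half.height after both. Each F value is the term's mean square divided by the residual mean square. A short R check that reproduces the F and p values from the numbers printed above (copied by hand, so the last digit can differ through rounding):

```r
# Mean squares copied by hand from the Total.density ANOVA table above
ms_terms <- c(Profile.Area = 102.174, Height = 22.512, Half.height = 35.693)
ms_resid <- 11.327  # residual mean square
df_resid <- 26      # residual degrees of freedom

# F = MS_term / MS_residual; every term here has 1 df
f_vals <- ms_terms / ms_resid
p_vals <- pf(f_vals, df1 = 1, df2 = df_resid, lower.tail = FALSE)

round(cbind(F = f_vals, p = p_vals), 4)
```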

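  • In this sequential table, Height is not significant after Profile.Area (p = 0.17). It also adds little once Half.height is in the model, since dropping it barely changes the residual sum of squares (compare the next table).
  • We therefore keep only Profile.Area and Half.height.
  • Our model is now Total.density ~ Profile.Area + Half.height.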
## Analysis of Variance Table
## 
## Response: Total.density
##              Df  Sum Sq Mean Sq F value   Pr(>F)   
## Profile.Area  1 102.174 102.174  9.2684 0.005152 **
## Half.height   1  55.064  55.064  4.9949 0.033892 * 
## Residuals    27 297.644  11.024                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
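The test that matches "Height adds nothing once Profile.Area and Half.height are in the model" is a partial F test comparing the two fits. Judging by the residual degrees of freedom (26 and 27), both fits use the same 30 complete cases, so the test can be computed by hand from the residual sums of squares printed in the two tables. With the fitted models available, `anova(reduced, full)` reports the same test; the data frame name `birds` in the comment is hypothetical.

```r
# Residual sums of squares and df copied from the two Total.density tables above
rss_full <- 294.503; df_full <- 26  # Profile.Area + Height + Half.height
rss_red  <- 297.644; df_red  <- 27  # Profile.Area + Half.height

# Partial F test for dropping Height from the full model
f_height <- ((rss_red - rss_full) / (df_red - df_full)) / (rss_full / df_full)
p_height <- pf(f_height, df1 = df_red - df_full, df2 = df_full, lower.tail = FALSE)

# Equivalent call on the fitted models (hypothetical data frame `birds`):
# anova(lm(Total.density ~ Profile.Area + Half.height, data = birds),
#       lm(Total.density ~ Profile.Area + Height + Half.height, data = birds))
c(F = f_height, p = p_height)
```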

  • We expected location to play a role in determining bird density at a site, so we tested how much Longitude and Latitude add when included in the current model.

  • Longitude
## Analysis of Variance Table
## 
## Response: Total.density
##              Df  Sum Sq Mean Sq F value   Pr(>F)   
## Profile.Area  1 102.174 102.174  9.6396 0.004557 **
## Half.height   1  55.064  55.064  5.1950 0.031101 * 
## Longitude     1  22.060  22.060  2.0813 0.161054   
## Residuals    26 275.584  10.599                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  • Latitude
## Analysis of Variance Table
## 
## Response: Total.density
##              Df  Sum Sq Mean Sq F value   Pr(>F)   
## Profile.Area  1 102.174 102.174  9.8256 0.004235 **
## Half.height   1  55.064  55.064  5.2952 0.029654 * 
## Latitude      1  27.276  27.276  2.6230 0.117392   
## Residuals    26 270.368  10.399                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  • Neither Longitude (p = 0.16) nor Latitude (p = 0.12) is significant when added to Profile.Area + Half.height, so we leave both out of this model in their current forms.
  • In our final report we would like to explore categorical predictors derived from Latitude and Longitude, e.g., grouping sites by their distance to the coast, or transforming the variables.
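A minimal sketch of one such derived categorical predictor, run on simulated data. The data frame `birds_sim`, the three equal-width bands and their labels are all hypothetical; a true "distance to the coast" would need coastline data that is not in this data set. Longitude is cut into bands with `cut()`, entered as a factor, and tested against the current model with a nested F test:

```r
set.seed(1)
# Simulated stand-in for the bird data (NOT the real sites)
n <- 30
birds_sim <- data.frame(Longitude    = runif(n, -124, -110),
                        Profile.Area = runif(n, 0, 10),
                        Half.height  = runif(n, 0, 5))
birds_sim$Total.density <- 5 + 0.5 * birds_sim$Profile.Area +
  0.8 * birds_sim$Half.height + rnorm(n)

# Hypothetical grouping: three equal-width Longitude bands
birds_sim$Lon.band <- cut(birds_sim$Longitude, breaks = 3,
                          labels = c("west", "central", "east"))

fit_base <- lm(Total.density ~ Profile.Area + Half.height, data = birds_sim)
fit_band <- lm(Total.density ~ Profile.Area + Half.height + Lon.band,
               data = birds_sim)
anova(fit_base, fit_band)  # F test for the 2 extra dummy coefficients
```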

  • For the last covariate, Elevation, it seemed plausible that it could affect Total.density. However, judging from the plots, we were not optimistic about its contribution to the current model, but we checked it anyway.

## Analysis of Variance Table
## 
## Response: Total.density
##              Df  Sum Sq Mean Sq F value  Pr(>F)  
## Profile.Area  1  72.779  72.779  5.5999 0.02874 *
## Half.height   1  76.322  76.322  5.8726 0.02553 *
## Elevation     1   6.826   6.826  0.5253 0.47744  
## Residuals    19 246.930  12.996                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
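  • As expected, both the added-variable plot and the p-value for Elevation (p = 0.48) suggest that Elevation is not useful for predicting Total.density. Note that this fit uses only 23 sites (19 residual df plus 4 coefficients) rather than 30, so its sums of squares are not directly comparable with those in the earlier tables.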
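By default `lm()` silently drops rows with missing values, which is one way a fit can end up using fewer sites than the models it is compared with. A sketch of fitting both models on the same complete cases before comparing them, using simulated data (`birds_sim` and the 7 missing Elevation values are made up for illustration):

```r
set.seed(2)
# Simulated stand-in (NOT the real data): Elevation missing for 7 of 30 sites
n <- 30
birds_sim <- data.frame(Profile.Area = runif(n, 0, 10),
                        Half.height  = runif(n, 0, 5),
                        Elevation    = runif(n, 0, 2500))
birds_sim$Total.density <- 5 + 0.5 * birds_sim$Profile.Area + rnorm(n)
birds_sim$Elevation[sample(n, 7)] <- NA

# These two fits use different sites, so anova(fit_a, fit_b) would stop with an error
fit_a <- lm(Total.density ~ Profile.Area + Half.height, data = birds_sim)
fit_b <- lm(Total.density ~ Profile.Area + Half.height + Elevation, data = birds_sim)
c(nobs(fit_a), nobs(fit_b))

# Keep only rows complete in every variable used, then refit both models
vars   <- c("Total.density", "Profile.Area", "Half.height", "Elevation")
common <- birds_sim[complete.cases(birds_sim[, vars]), vars]
fit_a2 <- update(fit_a, data = common)
fit_b2 <- update(fit_b, data = common)
anova(fit_a2, fit_b2)  # valid nested comparison on the same sites
```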

\((\textrm{IV})\) Model for No..Species (\(y_2\))

  • As with the previous response, we looked for a starting point by searching the scatterplot matrix for a covariate with a linear-looking relationship with the response. We again settled on Profile.Area.

  • This seems like a good start, but much of the variance remains unexplained, so we would like a more accurate model.

  • For the next variable, we again looked at the scatterplot matrix for a linear-looking relationship. As with Total.density, No..Species seemed linearly correlated with Height and Half.height.
## Analysis of Variance Table
## 
## Response: No..Species
##              Df  Sum Sq Mean Sq F value  Pr(>F)  
## Profile.Area  1  179.67 179.674  4.5200 0.04316 *
## Height        1   16.28  16.275  0.4094 0.52786  
## Half.height   1   13.23  13.227  0.3328 0.56900  
## Residuals    26 1033.52  39.751                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • This time, however, neither adds much once Profile.Area is included (p = 0.53 for Height and p = 0.57 for Half.height).
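To put a number on "much of the variance still lies unexplained", the total sum of squares can be recovered by adding every row of the No..Species table above. Assuming the Profile.Area-only model is fitted to the same 30 sites, its R² is the Profile.Area sum of squares divided by the total:

```r
# Sums of squares copied from the No..Species table above (1-df terms: SS = MS)
ss <- c(Profile.Area = 179.674, Height = 16.275, Half.height = 13.227,
        Residuals = 1033.52)

ss_total   <- sum(ss)                           # total SS about the mean
r2_profile <- ss[["Profile.Area"]] / ss_total   # R^2 of No..Species ~ Profile.Area
r2_all     <- 1 - ss[["Residuals"]] / ss_total  # R^2 with all three covariates
round(c(R2_Profile.Area = r2_profile, R2_all = r2_all), 3)
```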

  • To verify that adding just one of them is also not worthwhile, we checked the added-variable plots.
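An added-variable plot for a candidate covariate plots the residuals of the response regressed on the current covariates against the residuals of the candidate regressed on those same covariates; the least-squares slope through that cloud equals the candidate's coefficient in the larger model (the Frisch-Waugh-Lovell result). The `car` package's `avPlots()` draws these directly; the sketch below builds one by hand on simulated data (`birds_sim` is made up):

```r
set.seed(3)
# Simulated stand-in (NOT the real data)
n <- 30
birds_sim <- data.frame(Profile.Area = runif(n, 0, 10), Height = runif(n, 5, 25))
birds_sim$No..Species <- 10 + 1.2 * birds_sim$Profile.Area + rnorm(n, sd = 3)

# Added-variable plot for Height, given Profile.Area
e_y <- resid(lm(No..Species ~ Profile.Area, data = birds_sim))  # response adjusted
e_x <- resid(lm(Height ~ Profile.Area, data = birds_sim))       # Height adjusted
plot(e_x, e_y, xlab = "Height | Profile.Area", ylab = "No..Species | Profile.Area")
av_fit <- lm(e_y ~ e_x)
abline(av_fit)

# The added-variable slope matches Height's coefficient in the two-covariate model
full <- lm(No..Species ~ Profile.Area + Height, data = birds_sim)
c(av_slope = coef(av_fit)[["e_x"]], full_coef = coef(full)[["Height"]])
```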

  • Again, we expected location to play a role in determining the number of bird species at a site, so we checked whether Latitude and Longitude are worth adding to the current model.

  • One point on the far left of the added-variable plot for Longitude has high leverage and pulls the least-squares fit strongly. This point, La Giganta, is in Mexico, far from the other sampled sites.
  • This is a case where we wish we had more data near this outlier. Since that is not possible (it would take too much effort), we tentatively say that we do not know whether Longitude would be a good addition to the model. Keep in mind that if we removed La Giganta, the added-variable plot would have a much smaller slope, so Longitude would probably not be a useful addition.
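The leverage and influence of a single far-away site can be quantified with `hatvalues()` and `cooks.distance()`, and its effect checked by refitting without it. A sketch on simulated data: the 29 clustered sites, the one distant site and all coordinates are made up and only loosely mimic the La Giganta situation, and the `2p/n` cutoff is a common rule of thumb rather than a fixed standard.

```r
set.seed(4)
# Simulated stand-in (NOT the real data): 29 clustered sites plus one far-away site
n <- 30
birds_sim <- data.frame(Profile.Area = runif(n, 0, 10),
                        Longitude    = c(runif(n - 1, -122, -118), -111))
birds_sim$No..Species <- 10 + 1.2 * birds_sim$Profile.Area + rnorm(n, sd = 3)

fit <- lm(No..Species ~ Profile.Area + Longitude, data = birds_sim)
h <- hatvalues(fit)        # leverage of each site
p <- length(coef(fit))     # number of coefficients
which(h > 2 * p / n)       # sites flagged as high leverage
cooks.distance(fit)[n]     # influence of the far-away site

# Refit without the far-away site and compare the Longitude slope
fit_drop <- update(fit, data = birds_sim[-n, ])
c(with = coef(fit)[["Longitude"]], without = coef(fit_drop)[["Longitude"]])
```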

  • Adding Latitude raises the same issues as adding Longitude; all we can really say is that we wish we had more data to work with. The ranges of Latitude and Longitude are simply too narrow for us to draw meaningful conclusions from them. Transformations or categorization may help.

  • Finally, we check whether Elevation in its raw form is worth adding to the model.

  • Elevation clearly does not contribute much in its current form when we already have Profile.Area in our model.

Conclusion

  • Our tentative models for our two responses are:
  • Total.density ~ Profile.Area + Half.height
  • No..Species ~ Profile.Area
  • Other covariates may still be useful, but our tests suggest they add little once the covariates we chose are already in the model.

Things to Consider:

  • We said that location seems to play a big part in determining the number of birds in a given area. We still think so, but our reasoning was that location affects climate.

  • The range of Elevation is very large, so we may want to transform it to better fit our model. (In the last added-variable plot, even though the slope was 0, there may still be a pattern of some kind.)
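A log transform is one common choice for a covariate spanning a wide range. A sketch comparing raw and logged Elevation on simulated data (`birds_sim` is made up, and it is generated on the log scale, so the log model is expected to win here by construction). If any site sits at or below sea level, `log()` is undefined or infinite there, and `log1p()` or a shift would be needed instead.

```r
set.seed(5)
# Simulated stand-in (NOT the real data): Elevation spanning 10 to 3000
n <- 30
birds_sim <- data.frame(Profile.Area = runif(n, 0, 10),
                        Elevation    = exp(runif(n, log(10), log(3000))))
birds_sim$Total.density <- 5 + 0.5 * birds_sim$Profile.Area +
  1.5 * log(birds_sim$Elevation) + rnorm(n)

fit_raw <- lm(Total.density ~ Profile.Area + Elevation, data = birds_sim)
fit_log <- lm(Total.density ~ Profile.Area + log(Elevation), data = birds_sim)

# Same number of coefficients, so R^2 can be compared directly
c(raw = summary(fit_raw)$r.squared, log = summary(fit_log)$r.squared)
```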

Future Work

  • Cross-validation

  • Test some 'natural' categories that we can form from our existing covariates

  • Transformations
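For the planned cross-validation, a minimal k-fold sketch that compares candidate formulas by mean squared prediction error on held-out sites. The helper `cv_mse`, the fold count of 5 and the simulated `birds_sim` are assumptions for illustration; the same fold labels are reused for every candidate so the comparison is fair, and setting `folds <- seq_len(n)` gives leave-one-out cross-validation.

```r
set.seed(6)
# Simulated stand-in (NOT the real data)
n <- 30
birds_sim <- data.frame(Profile.Area = runif(n, 0, 10),
                        Half.height  = runif(n, 0, 5),
                        Height       = runif(n, 5, 25))
birds_sim$Total.density <- 5 + 0.5 * birds_sim$Profile.Area +
  0.8 * birds_sim$Half.height + rnorm(n)

# Hypothetical helper: cross-validated MSE of an lm formula for given fold labels
cv_mse <- function(formula, data, folds) {
  resp   <- all.vars(formula)[1]          # response name from the formula
  sq_err <- numeric(nrow(data))
  for (i in unique(folds)) {
    test <- folds == i
    fit  <- lm(formula, data = data[!test, ])
    pred <- predict(fit, newdata = data[test, ])
    sq_err[test] <- (data[[resp]][test] - pred)^2
  }
  mean(sq_err)
}

candidates <- list(two   = Total.density ~ Profile.Area + Half.height,
                   three = Total.density ~ Profile.Area + Height + Half.height)
folds <- sample(rep(1:5, length.out = n))   # 5 random folds of 6 sites each
cv <- sapply(candidates, cv_mse, data = birds_sim, folds = folds)
cv
```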