Detection of Anomalies in Traffic Flows with Large Amounts of Missing Data
Volume 1, Issue 1 (2023), pp. 84–94
Pub. online: 11 January 2023
Type: Methodology Article
Open Access
Area: Statistical Methodology
Accepted
4 January 2023
Published
11 January 2023
Abstract
Anomaly detection plays an important role in traffic operations and control. Missingness in spatial-temporal datasets hinders anomaly detection algorithms from learning characteristic rules and patterns, because too few observations remain. This paper proposes an anomaly detection scheme for the 2021 Algorithms for Threat Detection (ATD) challenge in which Gaussian process models generate features that are used in a logistic regression model, leading to high prediction accuracy for sparse traffic flow data with a large proportion of missing values. The dataset is provided by the National Science Foundation (NSF) in conjunction with the National Geospatial-Intelligence Agency (NGA), and it consists of thousands of labeled traffic flow records for 400 sensors from 2011 to 2020. The records of each sensor are purposely downsampled by NSF and NGA to simulate data that are missing completely at random, with missing rates of 99%, 98%, 95%, and 90%. Detecting anomalies in such sparse traffic flow data is therefore challenging. The proposed scheme makes use of traffic patterns at different times of day and on different days of the week to recover the complete data. It is also computationally efficient, because different sensors can be processed in parallel. The proposed method was one of the two top-performing algorithms in the 2021 ATD challenge.
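The general idea of feeding Gaussian process features to a logistic regression can be illustrated with a minimal sketch. This is not the authors' implementation: the simulated flow series, the periodic kernel and its hyperparameters, the leave-one-out standardized residual used as the single feature, and all function names (`periodic_kernel`, `gp_loo_zscores`, `fit_logistic`, `run_sketch`) are assumptions made for illustration. Only the time-of-day pattern is modeled here; a day-of-week component could presumably be added with a second periodic kernel.

```python
import numpy as np

def periodic_kernel(t1, t2, period=24.0, length=1.0, var=1600.0):
    """Exp-sine-squared kernel: correlates observations at similar times of day."""
    d = t1[:, None] - t2[None, :]
    return var * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length ** 2)

def gp_loo_zscores(t, y, noise_var=25.0, **kern):
    """Leave-one-out standardized residuals of a GP fit (closed form)."""
    resid = y - y.mean()
    K = periodic_kernel(t, t, **kern) + noise_var * np.eye(len(t))
    K_inv = np.linalg.inv(K)
    alpha = K_inv @ resid
    # LOO residual_i = alpha_i / K_inv[i, i]; LOO variance_i = 1 / K_inv[i, i]
    return alpha / np.sqrt(np.diag(K_inv))

def fit_logistic(x, labels, lr=0.1, steps=5000):
    """One-feature logistic regression fitted by plain gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        w -= lr * np.mean((p - labels) * x)
        b -= lr * np.mean(p - labels)
    return w, b

def run_sketch(seed=0, missing_rate=0.90):
    rng = np.random.default_rng(seed)
    # Hypothetical sensor: 60 days of hourly flow with a daily cycle (assumed).
    t = np.arange(24.0 * 60)
    flow = 100 + 40 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 5, t.size)
    # Inject labeled anomalies as large shifts in about 10% of the records.
    labels = (rng.random(t.size) < 0.10).astype(float)
    flow = flow + labels * rng.choice([-60.0, 60.0], t.size)
    # Downsample completely at random: each record is dropped independently.
    keep = rng.random(t.size) >= missing_rate
    t_obs, y_obs, lab_obs = t[keep], flow[keep], labels[keep]
    # GP-derived feature: absolute leave-one-out standardized residual.
    feat = np.abs(gp_loo_zscores(t_obs, y_obs))
    w, b = fit_logistic(feat, lab_obs)
    pred = (w * feat + b > 0).astype(float)
    return feat, lab_obs, pred, keep

feat, lab_obs, pred, keep = run_sketch()
print("observed records:", int(keep.sum()))
print("in-sample accuracy:", float((pred == lab_obs).mean()))
```

Because each sensor is handled by an independent call, a function such as `run_sketch` could be mapped over sensors in parallel. The accuracy printed here is in-sample on simulated data and says nothing about performance on the ATD challenge data.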