The New England Journal of Statistics in Data Science


Detection of Anomalies in Traffic Flows with Large Amounts of Missing Data
Volume 1, Issue 1 (2023), pp. 84–94
Qing He, Charles W. Harrison, Hsin-Hsiung Huang

https://doi.org/10.51387/23-NEJSDS20
Pub. online: 11 January 2023      Type: Methodology Article      Open Access
Area: Statistical Methodology

Accepted: 4 January 2023
Published: 11 January 2023

Abstract

Anomaly detection plays an important role in traffic operations and control. Missingness in spatial-temporal datasets hinders anomaly detection algorithms from learning characteristic rules and patterns, because too few observations remain. This paper proposes an anomaly detection scheme for the 2021 Algorithms for Threat Detection (ATD) challenge. The scheme uses Gaussian process models to generate features for a logistic regression model, which yields high prediction accuracy on sparse traffic flow data with a large proportion of missing values. The dataset is provided by the National Science Foundation (NSF) in conjunction with the National Geospatial-Intelligence Agency (NGA) and consists of thousands of labeled traffic flow records for 400 sensors from 2011 to 2020. NSF and NGA purposely downsampled each sensor to simulate data missing completely at random, with missing rates of 99%, 98%, 95%, and 90%, which makes detecting anomalies in the sparse traffic flow data challenging. The proposed scheme uses traffic patterns at different times of day and on different days of the week to recover the complete data. It is computationally efficient because different sensors can be processed in parallel. The proposed method is one of the two top-performing algorithms in the 2021 ATD challenge.
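
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic hourly data for a single sensor, scikit-learn's `GaussianProcessRegressor` with fixed periodic kernels (24-hour and 168-hour periods), and two illustrative features (absolute residual and posterior standard deviation). The function names `simulate_sensor`, `gp_features`, and `run_sensor`, and all numeric settings, are hypothetical.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, ExpSineSquared
from sklearn.linear_model import LogisticRegression


def simulate_sensor(rng, n_hours=24 * 7 * 8, anomaly_rate=0.1):
    """Hypothetical hourly flow: daily and weekly cycles plus injected anomalies."""
    t = np.arange(n_hours, dtype=float)
    flow = (300.0
            + 100.0 * np.sin(2 * np.pi * t / 24.0)   # time-of-day pattern
            + 30.0 * np.sin(2 * np.pi * t / 168.0)   # day-of-week pattern
            + rng.normal(0.0, 10.0, n_hours))        # measurement noise
    is_anomaly = rng.random(n_hours) < anomaly_rate
    flow[is_anomaly] += rng.choice([-150.0, 150.0], size=is_anomaly.sum())
    return t, flow, is_anomaly


def gp_features(t_obs, y_obs):
    """Fit a GP to the observed records and derive per-record features."""
    # Assumed kernel: sum of a daily and a weekly periodic component.
    kernel = (ConstantKernel(1.0) * ExpSineSquared(length_scale=1.0, periodicity=24.0)
              + ConstantKernel(0.1) * ExpSineSquared(length_scale=1.0, periodicity=168.0))
    # optimizer=None keeps the hyperparameters fixed (an assumption made for
    # speed and determinism); alpha acts as the observation-noise variance.
    gp = GaussianProcessRegressor(kernel=kernel, alpha=0.1,
                                  normalize_y=True, optimizer=None)
    X = t_obs.reshape(-1, 1)
    gp.fit(X, y_obs)
    mean, std = gp.predict(X, return_std=True)
    # Features: absolute deviation from the GP mean, and posterior std.
    return np.column_stack([np.abs(y_obs - mean), std])


def run_sensor(seed, missing_rate=0.90):
    """Process one sensor independently of all others."""
    rng = np.random.default_rng(seed)
    t, flow, is_anomaly = simulate_sensor(rng)
    # Missing completely at random: each record is dropped independently
    # of its value and of its time stamp.
    observed = rng.random(t.size) >= missing_rate
    features = gp_features(t[observed], flow[observed])
    labels = is_anomaly[observed]
    clf = LogisticRegression(max_iter=1000).fit(features, labels)
    probs = clf.predict_proba(features)[:, 1]  # P(anomaly) per observed record
    return observed, features, labels, probs


observed, features, labels, probs = run_sensor(seed=0)
```

Because `run_sensor` depends only on its own seed and data, calls for different sensors could be dispatched to separate workers, which is one way to read the per-sensor parallelism mentioned in the abstract. The probabilities here are computed on the training records only; a real evaluation would use held-out labeled data.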



Copyright
© 2023 New England Statistical Society
Open access article under the CC BY license.

Keywords
Anomaly detection, Gaussian process, Spatiotemporal, High dimensional, Missing completely at random

Funding
This research was supported in part by National Science Foundation grant DMS-1924792.

  • ISSN: 2693-7166