Open access peer-reviewed chapter - ONLINE FIRST

Machine Learning and Seismic Hazard: A Combination of Probabilistic Approaches for Probabilistic Seismic Hazard Analysis

Written By

Roberto Ortega

Submitted: 16 July 2024 Reviewed: 27 July 2024 Published: 22 August 2024

DOI: 10.5772/intechopen.1006533

From the Edited Volume

Exploring the Unseen Hazards of Our World [Working Title]

Dr. Mohammad Mokhtari

Abstract

Probabilistic seismic hazard analysis (PSHA) integrates seismology with the needs of civil engineering. Allin Cornell’s 1968 work, developed with Dr. Emilio Rosenblueth and Dr. Luis Esteva Maraboto, revolutionized earthquake engineering by making seismology practical for construction. Cornell’s deterministic equations, once valued for their elegance and simplicity, can now be enhanced with modern tools. Today, PSHA is evolving by integrating both deterministic and nondeterministic models, leveraging machine learning (ML) techniques such as Random Forests, Support Vector Machines, Neural Networks, Reinforcement Learning, and Bayesian Inference. This chapter explores the future of PSHA through these advanced methods. While ML offers powerful solutions, it is crucial to recognize that it is not a one-size-fits-all answer. The optimal approach involves using a hybrid ensemble of systems, each designed to address specific challenges in detail.

Keywords

  • probabilistic seismic hazard analysis (PSHA)
  • ground motion prediction equations (GMPEs)
  • deterministic models
  • Bayesian inference
  • machine learning

1. Introduction

The study of probabilistic seismic hazard is one of the most prolific fields in engineering. It connects seismology with the needs of civil engineering. Allin Cornell’s work, carried out in 1966 and published in 1968 [1], changed earthquake engineering, making seismology practical for construction. Cornell’s paper, partly written during a stay at the National Autonomous University of Mexico at the invitation of Dr. Emilio Rosenblueth [2] and supported by Dr. Luis Esteva Maraboto [3], is highly cited in earthquake engineering. It is important to review its origins and impact, especially the role of geometric probability [3]. Cornell’s paper emphasizes the probability distribution of distance based on the geometry of the source area. However, seismologists later focused on ground motion prediction equations (GMPEs) [4]. These equations, which are studied and cited extensively [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], map logarithmic-scale problems to linear values for construction, complicating solutions to probabilistic seismic hazard problems [5].
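
The geometric probability of distance can be illustrated with the simplest case one can assume: epicenters spread uniformly over a circular area source centered on the site. This geometry, the source radius, and the sample size are assumptions made for this sketch and are not taken from [1]. Under that assumption, the probability that an event falls within distance r is the ratio of areas, (r/R)^2, which the code checks against a simple Monte Carlo simulation.

```python
import math
import random

def distance_cdf(r_km, source_radius_km):
    """P(distance <= r) for epicenters uniform over a disc centered on the site."""
    r = min(max(r_km, 0.0), source_radius_km)
    return (r / source_radius_km) ** 2

rng = random.Random(3)
source_radius_km = 100.0  # hypothetical source radius

# Monte Carlo check: sample uniform points in the disc by rejection
# from the bounding square, keeping the distance to the site at the center.
distances = []
while len(distances) < 20000:
    x = rng.uniform(-source_radius_km, source_radius_km)
    y = rng.uniform(-source_radius_km, source_radius_km)
    d = math.hypot(x, y)
    if d <= source_radius_km:
        distances.append(d)

empirical = sum(1 for d in distances if d <= 50.0) / len(distances)
analytical = distance_cdf(50.0, source_radius_km)
print(f"analytical: {analytical:.3f}, simulated: {empirical:.3f}")
```

The analytical expression needs no simulation at all, which is the kind of closed-form convenience the text attributes to the analytical approach.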

Dr. Cornell’s idea was to present an analytical problem with clear formulas. His deterministic equations outperformed Monte Carlo algorithms based on Markov chains. At the time, there were two main approaches: seeking elegant analytical solutions or using computational models for complex equations. Over time, methods have been developed to solve nondeterministic probability problems with stochastic processes, which do not require strict definitions in the equations. These include Bayesian inference, which uses prior and posterior probability distributions linked by a likelihood function.
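
As a minimal sketch of how a prior and a posterior distribution are linked by a likelihood, the example below assumes a Gamma prior on the annual rate of earthquakes and a Poisson likelihood for the yearly counts. This conjugate pair is chosen here only because the update has a closed form; it is not a method described in this chapter, and the prior parameters and yearly counts are hypothetical.

```python
def update_gamma_prior(alpha, beta, counts):
    """Posterior (alpha, beta) of a Gamma(alpha, beta) prior on a Poisson rate.

    alpha is the shape and beta the rate parameter (in years) of the Gamma
    distribution; counts holds the number of events observed in each year.
    """
    return alpha + sum(counts), beta + len(counts)

# Hypothetical prior: about 2 events per year, worth 1 year of information.
prior_alpha, prior_beta = 2.0, 1.0
# Hypothetical catalog: number of events in each of five years.
yearly_counts = [3, 1, 4, 2, 5]

post_alpha, post_beta = update_gamma_prior(prior_alpha, prior_beta, yearly_counts)
prior_mean = prior_alpha / prior_beta
posterior_mean = post_alpha / post_beta
print(f"prior mean rate: {prior_mean:.2f} events/year")
print(f"posterior mean rate: {posterior_mean:.2f} events/year")
```

With more years of data, the posterior mean moves away from the prior and toward the observed average count, which is the learning-from-data behavior described above.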

This approach is praised and discussed for its ability to learn from previous data and adjust parameters by trial and error to solve complex problems. It has led to the development of artificial intelligence and machine learning, which generate equations fitting inferences to established patterns.

Figure 1 illustrates the three probability approaches currently used in PSHA; as expected, the best option is a hybrid approach that takes advantage of the advances in each methodology.

Figure 1.

Three probability approaches currently used in probabilistic seismic hazard analysis (PSHA): current maps are based on stochastic and analytical approximations, while the machine learning approach is evolving toward hybrid and more robust methods.

2. Machine learning

Currently, artificial intelligence is booming, and all lines of thought are trying to solve problems using this new paradigm [15]. Probabilistic seismic hazard is no exception, and new attempts in this area combine deterministic and nondeterministic models. In this chapter, I will present the combination of both approaches to envision the future of probabilistic seismic hazards through the new resources offered by machine learning.

In the difficult art of solving technical and scientific problems, we are always tempted to try new techniques, many of which are complicated. Mastering them can take several years, or even several generations of PhD students. Neural networks and other ML techniques are powerful tools, but they are not suitable for all problems. Their greatest successes are in image recognition, speech recognition, and generative text and audio tasks. The real question is how far these breakthroughs can substitute for established PSHA methods. In practice, most PSHA problems have already been solved with very ingenious solutions, many of them breakthroughs in their own right. But other problems have not yet been explored, and there AI can help. One of them is the decision tree problem.

The decision tree is a classic problem in computational science, which has evolved into Random Forests [16]. Decision trees play a crucial role in allowing the modeling of uncertainties and the combination of multiple sources of information. However, conventional decision trees can be limited due to their tendency to overfit and their sensitivity to variations in the input data. This is where Random Forest, an advanced extension of decision trees, provides a significant improvement.

Random Forest is particularly useful in PSHA for integrating and analyzing complex data, such as ground motion prediction equations (GMPEs). In a GMPE context, multiple models are considered to predict ground motion intensity as a function of variables such as earthquake magnitude, distance to the epicenter, and ground characteristics, and Random Forest allows these multiple GMPEs to be combined efficiently. By creating a set of decision trees, each trained with different subsets of data and features, Random Forest can capture the variability and uncertainty inherent in individual GMPEs, providing more robust and accurate ground motion predictions. Figure 2 presents the decision tree and the Random Forest in the context of PSHA. Note that the Random Forest approach improves on the single decision tree. The decision tree (Figure 2a) is a simple probabilistic combination of models, known in ML as an ensemble. A Random Forest (Figure 2b) is its natural extension using ML techniques.

Figure 2.

(a) Schematic representation of a decision tree in PSHA and (b) the natural extension of Random Forest. In general, Random Forest allows combining these multiple GMPEs in an efficient manner.
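
The weighted combination of GMPEs sketched in Figure 2a can be written in a few lines. The two GMPE functions, their coefficients, and the branch weights below are hypothetical placeholders rather than published models; the sketch only shows how branch weights combine the predicted mean ln(PGA) of each branch.

```python
import math

def gmpe_a(magnitude, distance_km):
    """Hypothetical GMPE returning the mean ln(PGA in g)."""
    return -3.5 + 0.9 * magnitude - 1.2 * math.log(distance_km + 10.0)

def gmpe_b(magnitude, distance_km):
    """Second hypothetical GMPE with a different distance decay."""
    return -4.0 + 1.0 * magnitude - 1.3 * math.log(distance_km + 5.0)

def weighted_tree_mean(branches, magnitude, distance_km):
    """Weighted mean of ln(PGA) over (weight, gmpe) branches; weights must sum to 1."""
    total_weight = sum(weight for weight, _ in branches)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("branch weights must sum to 1")
    return sum(weight * gmpe(magnitude, distance_km) for weight, gmpe in branches)

# Hypothetical branch weights expressing the analyst's confidence in each model.
branches = [(0.6, gmpe_a), (0.4, gmpe_b)]
ln_pga = weighted_tree_mean(branches, magnitude=6.5, distance_km=30.0)
print(f"combined PGA estimate: {math.exp(ln_pga):.3f} g")
```

In this form the weights are fixed by judgment; the Random Forest idea in Figure 2b replaces that fixed weighting with trees learned from data.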

In terms of seismic risk assessment, a crucial aspect is the estimation of Value at Risk (VaR), the loss due to seismic events that is not expected to be exceeded in a given period at a chosen confidence level. Random Forest can improve these estimates by combining multiple scenarios and seismic data sources. Each decision tree in the forest can represent different seismic hazard scenarios and their impact on b-values. By averaging the results of these multiple trees, Random Forest provides a more reliable and less biased assessment of seismic risk, incorporating uncertainty more effectively.

In addition, the ability of Random Forest to assess the importance of each feature is especially valuable in PSHA. For example, it can identify which factors are most influential in predicting ground motion and estimating b-values. This allows engineers and scientists to focus on the most critical parameters and improve the accuracy of their seismic models.
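
To make the ideas of bagging, random feature selection, and feature importance concrete, the following toy sketch builds an ensemble of one-split regression trees (stumps) in plain Python. It is written in the spirit of a Random Forest but is not a full implementation, and the synthetic records come from a hypothetical GMPE with made-up coefficients. The importance measure used here, the share of squared-error reduction credited to each feature, is an assumption of this sketch.

```python
import math
import random

def make_records(n, rng):
    """Synthetic (magnitude, ln distance, ln PGA) records from a hypothetical GMPE plus noise."""
    records = []
    for _ in range(n):
        magnitude = rng.uniform(4.5, 7.5)
        ln_distance = math.log(rng.uniform(5.0, 200.0))
        ln_pga = -1.0 + 1.0 * magnitude - 1.3 * ln_distance + rng.gauss(0.0, 0.5)
        records.append((magnitude, ln_distance, ln_pga))
    return records

def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_stump(sample, feature):
    """Best single split on one feature: (threshold, left mean, right mean, SSE reduction)."""
    ordered = sorted(sample, key=lambda rec: rec[feature])
    targets = [rec[2] for rec in ordered]
    total = sse(targets)
    best = None
    for i in range(10, len(ordered) - 10, 5):  # coarse scan of candidate split positions
        left, right = targets[:i], targets[i:]
        gain = total - sse(left) - sse(right)
        if best is None or gain > best[3]:
            best = (ordered[i][feature], mean(left), mean(right), gain)
    return best

def fit_forest(records, n_trees, rng):
    """Bagging: each stump sees a bootstrap sample and one randomly chosen feature."""
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(records) for _ in records]
        feature = rng.randrange(2)  # 0 = magnitude, 1 = ln distance
        forest.append((feature,) + fit_stump(sample, feature))
    return forest

def predict(forest, magnitude, distance_km):
    """Average the predictions of all stumps for one scenario."""
    x = (magnitude, math.log(distance_km))
    votes = [left if x[f] < thr else right for f, thr, left, right, _ in forest]
    return mean(votes)

def importance(forest):
    """Share of the total SSE reduction credited to each feature."""
    gains = [0.0, 0.0]
    for f, _, _, _, gain in forest:
        gains[f] += gain
    total = sum(gains)
    return {"magnitude": gains[0] / total, "ln_distance": gains[1] / total}

rng = random.Random(0)
records = make_records(300, rng)
forest = fit_forest(records, n_trees=50, rng=rng)
print(importance(forest))
print(predict(forest, 7.4, 20.0), predict(forest, 4.6, 20.0))
```

A production study would more likely rely on an established library implementation with deeper trees; the point here is only to show where the averaging and the importance scores come from.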

3. Catalogs and statistical seismology

The compilation of earthquake catalogs is one area of seismology where machine learning has brought a significant technological advance. Less than a decade ago, catalogs contained information on a few tens of thousands of events within a regional network. Today, millions of events, many with magnitudes of less than one, are detected. Although classic PSHA studies typically consider minimum magnitudes of around 4.5, the extensive information from these catalogs profoundly enhances our understanding of seismic activity. This growth in data and techniques is propelling the development of new big data paradigms [17, 18, 19, 20, 21, 22, 23, 24, 25, 26].

The most important parameter in statistical seismology is the b-value [27], which describes the ratio between the number of smaller-magnitude events and the number of larger-magnitude events. In general, the b-value is used to model the probability distribution of magnitudes, so that the number of events of destructive magnitude over a long period can be estimated. This value is very important because, by defining long periods of time, we can estimate how often destructive magnitudes are expected.
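
As an illustration of how a b-value can be estimated from a catalog, the sketch below uses the classical maximum-likelihood estimator of Aki (1965), b = log10(e) / (mean(M) - Mc), which is a standard technique and not one specified in this chapter. The catalog is synthetic, with an assumed true b-value of 1.0 and an assumed cutoff magnitude of 1.0, and magnitudes are treated as continuous (no binning correction).

```python
import math
import random

def b_value_aki(magnitudes, cutoff_magnitude):
    """Maximum-likelihood b-value for continuous magnitudes at or above the cutoff."""
    above = [m for m in magnitudes if m >= cutoff_magnitude]
    mean_excess = sum(above) / len(above) - cutoff_magnitude
    return math.log10(math.e) / mean_excess

rng = random.Random(42)
true_b, mc = 1.0, 1.0  # assumed values for the synthetic catalog

# Under Gutenberg-Richter, magnitudes above Mc follow an exponential
# distribution with rate b * ln(10).
catalog = [mc + rng.expovariate(true_b * math.log(10)) for _ in range(20000)]

b_hat = b_value_aki(catalog, mc)
print(f"estimated b-value: {b_hat:.2f}")
```

The uncertainty of this estimate shrinks roughly with the square root of the number of events, which is one reason the much larger modern catalogs matter for statistical seismology.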

However, in the final stage of PSHA, this distribution is combined with another, discrete distribution, which assumes that events in time follow a Poisson distribution. The occurrence of earthquakes is reduced to an average number of events per year, and once aftershocks have been removed, all events are assumed to be independent. That is to say, the events have no memory: a seismic event does not depend on any other event over a long period. This assumption is incorrect, and we face a huge challenge in finding an adequate probabilistic model that integrates medium- and long-term seasonal changes with non-homogeneous Poisson distributions, drawing on large-scale knowledge of the physics of rupture, of geology, and of the stress transfer that occurs when large-magnitude seismic events are triggered.

A typical Gutenberg-Richter relationship (Figure 3) is a linear trend, on a logarithmic scale of event counts, whose slope gives the b-value. It starts from the cutoff magnitude (Mc). Mc varies with the technological limitations on the data. In the analog period (about 1950–1970), Mc was about 3. Nowadays, with the use of machine learning algorithms, it is expected to be less than 1. With this amount of data, a Poisson process need no longer be limited to annual cumulative event counts and can reach new thresholds, such as monthly or even daily rates with statistical significance. This opens new perspectives on homogeneous and non-homogeneous Poisson distributions [28, 29, 30].

Figure 3.

Three different examples of Gutenberg-Richter distributions, depending on technological limitations. In the analog period, Mc was about 3. Nowadays, with the use of machine learning algorithms, it is expected to be less than 1, with millions of events.
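
The way the magnitude distribution and the Poisson assumption are combined can be sketched as follows. The Gutenberg-Richter relation log10 N = a - b M gives the annual rate of events at or above a magnitude, and a homogeneous Poisson process turns that rate into the probability of at least one event within an exposure time. The a- and b-values below are hypothetical regional parameters chosen for round numbers.

```python
import math

def annual_rate(a_value, b_value, magnitude):
    """Gutenberg-Richter annual number of events with magnitude >= M."""
    return 10.0 ** (a_value - b_value * magnitude)

def prob_at_least_one(rate_per_year, years):
    """Homogeneous Poisson probability of one or more events within the exposure time."""
    return 1.0 - math.exp(-rate_per_year * years)

a_value, b_value = 4.0, 1.0  # hypothetical regional parameters

rate_m7 = annual_rate(a_value, b_value, 7.0)  # 10**(4 - 7) = 0.001 events/year
p50 = prob_at_least_one(rate_m7, 50.0)
print(f"annual rate of M>=7: {rate_m7:.4f}")
print(f"probability of at least one M>=7 in 50 years: {p50:.3f}")

# The familiar design level of about 10% in 50 years corresponds to a
# return period of roughly 475 years under the same Poisson assumption.
print(f"{prob_at_least_one(1.0 / 475.0, 50.0):.3f}")
```

Because the Poisson formula depends only on the mean rate, it cannot represent clustering or seasonality, which is the limitation discussed in the text.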

Analyzing these events in time is well suited to unsupervised machine learning, which looks for clusters and will allow us to search for spatial and temporal relationships. At the same time, it is possible to create new, more complex and realistic models with new conditions. Examples include the anecdotal but valid observation that some regions experience earthquakes seasonally [31, 32, 33, 34, 35], such as September in the case of central Mexico, and the interactions between earthquakes [36].
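
One simple way to explore what a seasonally varying occurrence model would look like is to simulate a non-homogeneous Poisson process. The sketch below uses the thinning method: candidate events are drawn from a homogeneous process at the maximum rate and accepted with probability rate(t) / max_rate. The sinusoidal annual cycle, the base rate, and the amplitude are hypothetical choices for illustration, not values from any of the cited studies.

```python
import math
import random

def seasonal_rate(t_years, base_rate, amplitude):
    """Hypothetical annual-cycle rate in events/year; amplitude must lie in [0, 1)."""
    return base_rate * (1.0 + amplitude * math.sin(2.0 * math.pi * t_years))

def simulate_by_thinning(duration_years, base_rate, amplitude, rng):
    """Event times of a non-homogeneous Poisson process, simulated by thinning."""
    max_rate = base_rate * (1.0 + amplitude)  # upper bound on the rate
    times, t = [], 0.0
    while True:
        t += rng.expovariate(max_rate)  # next candidate from a homogeneous process
        if t > duration_years:
            return times
        if rng.random() < seasonal_rate(t, base_rate, amplitude) / max_rate:
            times.append(t)  # accept with probability rate(t) / max_rate

rng = random.Random(7)
events = simulate_by_thinning(duration_years=20.0, base_rate=100.0, amplitude=0.5, rng=rng)

# Fraction of events in the half of each year where the assumed rate is elevated.
elevated_half = sum(1 for t in events if (t % 1.0) < 0.5)
print(len(events), round(elevated_half / len(events), 3))
```

A synthetic catalog of this kind gives a known ground truth against which clustering or seasonality-detection methods can be tested before they are applied to real data.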

4. Site effects

Site effects are not always part of the PSHA study per se but are part of the site-specific analysis. Once the PSHA maps have been prepared, each construction project studies the specific conditions of its site, for example, the type of soil and sediments that amplify the seismic waves, and seismic design spectra that include the site amplification are constructed. However, in places like Mexico City, this phenomenon is very complex and must be addressed from the beginning as the most important phenomenon.

In machine learning, there are many ways to approach the learning process. Data-driven and physics-based modeling are two different ways to predict earthquake site response [37]. The data-driven approach seems more suitable because it captures more variability and better suits site-effect phenomena [38]. However, there is still a long way to go to understand the details of this complex problem.

Figure 4 shows some of the problems of using GMPEs and site effects. A large regression of observed peak ground acceleration (PGA) against distance is performed. The final curve is a parametric function that predicts the PGA for a given distance. It is also important to estimate uncertainties and correct them for a specific site. Usually, a site is characterized by its shear-wave velocity, which reflects stiffness and correlates well with seismic wave amplification. The reference site condition is commonly taken as 720 or 750 m/s.

Figure 4.

Graphical sketch of a ground motion prediction. The equation is a parametric representation that simplifies the prediction. However, the real challenge is that outliers in the upper region need to be studied comprehensively so that site amplification provides a real prediction of a GMP.
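
The regression sketched in Figure 4 can be illustrated with the simplest possible functional form, ln(PGA) = c0 + c1 ln(R), fitted by ordinary least squares. The functional form, the coefficients used to generate the synthetic observations, and the noise level are all assumptions of this sketch; published GMPEs include magnitude, site, and other terms. The residual standard deviation (sigma) is what allows the scatter above the median curve to be quantified.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope, sigma)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return intercept, slope, sigma

rng = random.Random(1)
# Synthetic observations for a single magnitude: ln(PGA) = 1.0 - 1.2 ln(R) + noise.
distances_km = [rng.uniform(5.0, 200.0) for _ in range(500)]
ln_r = [math.log(r) for r in distances_km]
ln_pga = [1.0 - 1.2 * x + rng.gauss(0.0, 0.6) for x in ln_r]

intercept, slope, sigma = fit_line(ln_r, ln_pga)
median_at_50km = math.exp(intercept + slope * math.log(50.0))
# One standard deviation above the median, in natural-log units.
upper_at_50km = median_at_50km * math.exp(sigma)
print(f"slope={slope:.2f}, sigma={sigma:.2f}")
print(f"median PGA at 50 km={median_at_50km:.4f}, median plus one sigma={upper_at_50km:.4f}")
```

Records that sit well above the one-sigma curve are the upper-region outliers mentioned in the caption, and they are the ones a site-specific correction would need to explain.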

The primary challenge lies in effectively integrating the Ground Motion Prediction Equation (GMPE) with a design spectrum in such a way that it virtually eliminates false negatives. This means that the design should ensure that no buildings sustain damage during seismic events. Achieving this goal requires a robust approach that accounts for various factors influencing ground motion and building response.

The GMPEs are mathematical models used to predict the expected ground motion intensity (such as peak ground acceleration or spectral acceleration) at a given site based on parameters such as earthquake magnitude, distance from the fault, and local site conditions. GMPEs are usually isolated and should be integrated into the design spectrum. The design spectrum is a crucial tool in seismic design, representing the maximum expected response (acceleration, velocity, or displacement) of a building to ground motion across a range of frequencies. It is derived from GMPEs and other seismic data to ensure that buildings can safely absorb and dissipate seismic energy. The future should involve ML, which integrates GMPE and design spectrum in the same framework. However, this is not the only problem, as every site is different, and so should be every construction.

Figure 5 shows a simple representation of how machine learning should integrate site effects and GMPEs to estimate the design spectrum using data-driven and physics-based hybrid ensembles.

Figure 5.

Machine learning should integrate site effects and GMPEs to estimate the design spectrum using data-driven and physics-based hybrid ensembles.

A challenge today is to distinguish the types of problems that machine learning helps us with, without being tempted to use it for everything. The advantage of this paradigm is that enormous amounts of data can be processed in a short time, which opens ways to solve more specific problems.

5. Conclusions

The integration of machine learning into probabilistic seismic hazard analysis (PSHA) marks a significant advancement in the field. By combining deterministic and nondeterministic models, machine learning provides robust methods for analyzing and predicting seismic activity. One prominent application is the use of Random Forests, an advanced form of decision tree, which enhances the modeling of uncertainties and integrates multiple sources of information. This approach is particularly effective for ground motion prediction equations (GMPEs), enabling the combination of various models to yield more accurate predictions of ground motion intensity. Additionally, Random Forests help identify the most influential factors in predicting seismic activity, thus improving the overall accuracy of seismic models.

Despite these advancements, several challenges remain, particularly in integrating GMPEs with design spectra to ensure that buildings can withstand seismic events without damage. This requires a comprehensive understanding of site-specific conditions and the ability to process vast amounts of data efficiently. Machine learning’s capability to handle big data and uncover spatial and temporal relationships offers new perspectives, especially in creating realistic models that account for complex phenomena like seasonal seismic activity. However, distinguishing the appropriate applications of machine learning and addressing the unique characteristics of each site and construction remain crucial. The future of PSHA lies in leveraging machine learning to develop integrated frameworks that combine GMPEs and design spectra, ultimately enhancing the resilience of structures against earthquakes.

Acknowledgments

I acknowledge all my colleagues in AI and seismic engineering, especially Javier Morales, Reynaldo Rubio, Dana Carciumaru, and Israel Santillan, who have worked in both fields of science for many years. The financial support for this research was provided through the grants CF-2023-G-958 and 319664 from CONAHCYT.

Conflict of interest

The author declares no conflict of interest.

References

  1. Cornell CA. Engineering seismic risk analysis. Bulletin of the Seismological Society of America. 1968;58:1583-1606
  2. Esteva L. The legacy of Emilio Rosenblueth. Engineering Structures. 1994;16:459
  3. McGuire RK. Probabilistic seismic hazard analysis: Early history. Earthquake Engineering and Structural Dynamics. 2008;37:329-338
  4. Stewart JP, Douglas J, Javanbarg M, et al. Selection of ground motion prediction equations for the global earthquake model. Earthquake Spectra. 2015;31:19-45
  5. Douglas J, Edwards B. Recent and future developments in earthquake ground motion estimation. Earth-Science Reviews. 2016;160:203-219
  6. Dang H, Wang Z, Zhao D, et al. Ground motion prediction model for shallow crustal earthquakes in Japan based on XGBoost with Bayesian optimization. Soil Dynamics and Earthquake Engineering. 2024;177:108391
  7. Kale Ö, Engineering SA-15th WConfE. A method to determine the appropriate GMPEs for a selected seismic prone region. In: Proceedings of the Fifteenth World Conference on Earthquake Engineering. Lisbon, Portugal. 2012. iitk.ac.in. Available from: http://www.iitk.ac.in/nicee/wcee/article/WCEE2012_2827.pdf [Accessed: July 7, 2024]
  8. Arroyo D, Ordaz M et al. On the selection of ground-motion prediction equations for probabilistic seismic-hazard analysis. Bulletin of the Seismological Society of America. 2014;104:4. pubs.geoscienceworld.org. Available from: https://pubs.geoscienceworld.org/ssa/bssa/article-abstract/104/4/1860/325529 [Accessed: July 7, 2024]
  9. Slejko D, Valensise G, Meletti C et al. The assessment of earthquake hazard in Italy: A review. Annals of Geophysics = Annali di geofisica. 2022;65. ricerca.ogs.it. Available from: https://ricerca.ogs.it/handle/20.500.14083/14962 [Accessed: July 7, 2024]
  10. Bommer JJ, Douglas J, Scherbaum F et al. On the selection of ground-motion prediction equations for seismic hazard analysis. 2010;81:783. pubs.geoscienceworld.org
  11. Fallah-Tafti M, Amini-Hosseini K et al. Ranking of GMPEs for seismic hazard analysis in Iran using LH, LLH and EDR approaches. Journal of Seismology and Earthquake Engineering. 2017;19:2. jsee.ir. Available from: http://www.jsee.ir/article_47726_330621d09ccfa18823d7f6ee8466e686.pdf [Accessed: July 7, 2024]
  12. Atkinson GM, Adams J. Ground motion prediction equations for application to the 2015 Canadian national seismic hazard maps. Canadian Journal of Civil Engineering. 2013;40:10. cdnsciencepub.com. DOI: 10.1139/cjce-2012-0544 [Accessed: July 7, 2024]
  13. Atkinson GM. Effects of seismicity models and new ground-motion prediction equations on seismic hazard assessment for four Canadian cities. Bulletin of the Seismological Society of America. 2011;101:1. pubs.geoscienceworld.org. Available from: https://pubs.geoscienceworld.org/ssa/bssa/article-abstract/101/1/176/349458 [Accessed: July 7, 2024]
  14. Lam N. A review of stochastic earthquake ground motion prediction equations for stable regions. International Journal of Advances Sciences and Applied Mathematics (Springer). 2023;15:1. DOI: 10.1007/s12572-022-00325-0 [Accessed: July 15, 2024]
  15. Zhou Z-H. Machine Learning. Singapore: Springer; 2021. Epub ahead of print 2021. DOI: 10.1007/978-981-15-1967-3
  16. Biau G, Scornet E. A random forest guided tour. Test. 2016;25:197-227
  17. Corbi F, Sandri L, Bedford J, et al. Machine learning can predict the timing and size of analog earthquakes. Geophysical Research Letters. 2019;46:1303-1311
  18. Cheng Y, Ben-Zion Y, Brenguier F, et al. An automated method for developing a catalog of small earthquakes using data of a dense seismic array and nearby stations. Seismological Research Letters. 2020;91:2862-2871
  19. Chen X, Shearer PM. Analysis of foreshock sequences in California and implications for earthquake triggering. Pure and Applied Geophysics. 2016;173:133-152
  20. Chai C, Maceira M, Santos-Villalobos HJ, et al. Using a deep neural network and transfer learning to bridge scales for seismic phase picking. Geophysical Research Letters;47:e2020GL088651. Epub ahead of print 28 August 2020. DOI: 10.1029/2020GL088651
  21. Ben-Zion Y, Vernon FL, Ozakin Y, et al. Basic data features and results from a spatially dense seismic array on the San Jacinto fault zone. Geophysical Journal International. 2015;202:370-380
  22. Bedle H, Lou X, van der Lee S. Continental tectonics inferred from high-resolution imaging of the mantle beneath the United States, through the combination of USArray data types. Geochemistry, Geophysics, Geosystems;22:e2021GC009674. Epub ahead of print 1 October 2021. DOI: 10.1029/2021GC009674
  23. Brodsky EE. The importance of studying small earthquakes. Science. 2019;364:736-737
  24. Bergen KJ, Johnson PA, De Hoop MV, et al. Machine learning for data-driven discovery in solid earth geoscience. Science. 2019;363:eaau0323. Epub ahead of print 22 March 2019. DOI: 10.1126/SCIENCE.AAU0323
  25. Aster RC, McNamara DE, Bromirski PD. Global trends in extremal microseism intensity. Geophysical Research Letters. 2017;37. Epub ahead of print 1 July 2010. DOI: 10.1029/2010GL043472
  26. Campbell KW. Comprehensive comparison among the Campbell–Bozorgnia NGA-West2 GMPE and three GMPEs from Europe and the Middle East. Bulletin of the Seismological Society of America. 2016;106:2081-2103
  27. El-Isa Z. Spatiotemporal variations in the b-value of earthquake magnitude–frequency distributions: Classification and causes. Tectonophysics (Elsevier). 2014;615. Available from: https://www.sciencedirect.com/science/article/pii/S0040195113007063 [Accessed: July 15, 2024]
  28. Giorgio M. On multisite probabilistic seismic hazard analysis. Bulletin of the Seismological Society of America. 2016;106:3. pubs.geoscienceworld.org. DOI: 10.1785/0120150369
  29. Iervolino I, Giorgio M, Polidoro B et al. Probabilistic seismic hazard analysis for seismic sequences. Vienna Congress on Recent Advances in Earthquake Engineering and Structural Dynamics. 2013. wpage.unina.it. Available from: http://wpage.unina.it/iuniervo/papers/Iervolino_et_al_VEESD-066.pdf [Accessed: July 15, 2024]
  30. Iervolino I, Giorgio M, et al. Sequence-based probabilistic seismic hazard analysis. Bulletin of the Seismological Society of America. 2014;104:2. pubs.geoscienceworld.org. Available from: https://pubs.geoscienceworld.org/ssa/bssa/article-abstract/104/2/1006/331697 [Accessed: July 15, 2024]
  31. Smirnov VB, Potanina MG, Kartseva TI, et al. Seasonal variations in the b-value of the reservoir-triggered seismicity in the Koyna–Warna region, Western India. Izvestiya, Physics of the Solid Earth. 2022;58:364-378
  32. Lordi A, Neves M, Science SC-F in E et al. Seasonal modulation of oceanic seismicity in the Azores. Frontiers in Earth Science. 2022;10:995401. frontiersin.org. DOI: 10.3389/feart.2022.995401/full [Accessed: July 15, 2024]
  33. Saar MO, Zurich E, Manga M et al. Seismicity induced by seasonal groundwater recharge at Mt. Hood, Oregon. Earth and Planetary Science Letters (Elsevier). 2010;214:3-4. DOI: 10.1016/S0012-821X(03)00418-7
  34. Seismicity RW-TM of I. Seasonal seismicity of Northern California before the great 1906 earthquake. The Mechanism of Induced Seismicity (Springer). 2002:7-62. DOI: 10.1007/978-3-0348-8179-1_2 [Accessed: July 15, 2024]
  35. Christiansen L, Hurwitz S, Saar MO, Ingebritsen SE, Hsieh PA. Seasonal seismicity at western United States volcanic centers. Earth and Planetary Science Letters. 2005;240:2. Available from: https://www.sciencedirect.com/science/article/pii/S0012821X0500587X [Accessed: July 15, 2024]
  36. Sarlis NV, Skordas ES, Varotsos PA, et al. Investigation of the temporal correlations between earthquake magnitudes before the Mexico M8.2 earthquake on 7 September 2017. Physica A: Statistical Mechanics and Its Applications. 2019;517:475-483
  37. Zhu C, Cotton F, Kawase H et al. How well can we predict earthquake site response so far? Machine learning vs physics-based modeling. Earthquake Spectra. 2023;39:1. journals.sagepub.com. DOI: 10.1177/87552930221116399 [Accessed: July 15, 2024]
  38. Pilz M, Cotton F, International SK-GJ et al. Data-driven and machine learning identification of seismic reference stations in Europe. Data-driven and Machine Learning Identification of Seismic Reference Stations in Europe. 2020;222:2. academic.oup.com. Available from: https://academic.oup.com/gji/article-abstract/222/2/861/5824633 [Accessed: July 15, 2024]
