Random forest prediction

11/17/2023

Due to the wealth of exposome data from longitudinal cohort studies that is currently available, the need for methods to adequately analyze these data is growing. We propose an approach in which machine learning is used to identify longitudinal exposome-related predictors of health, and illustrate its potential through an application. Our application studies the relation between the exposome and self-perceived health, based on the Doetinchem Cohort Study, which has been running for 30 years. Random Forest (RF) was used to identify the strongest predictors because of its favorable prediction performance in prior research. The relation between predictors and outcome was visualized with partial dependence and accumulated local effects plots. To facilitate interpretation, exposures were summarized by expressing them as the average exposure and the average trend over time. The RF model’s ability to discriminate poor from good self-perceived health was acceptable (Area-Under-the-Curve = 0.707). Nine exposures from different exposome-related domains were largely responsible for the model’s performance, while 87 exposures seemed to contribute little to the performance. Our approach demonstrates that ML models are more interpretable than widely believed, and that ML can be applied to identify important longitudinal predictors of health over the life course in studies with repeated measures of exposure. The approach is context-independent and broadly applicable.

The development of health problems in older age is influenced by a multitude of risk factors to which people are exposed over the life course [1]. With increasing knowledge on risk factors, an ‘exposome approach’ is often advocated, taking into account a broad range of exposures from different domains (i.e., the specific external, general external, and internal environment) that are repeatedly measured over the life course [2]. Long-term cohort studies applying this approach can help in identifying predictors of health in older age, which is important for personalized prevention. Since a wealth of data from longitudinal cohort studies is currently available, with each study measuring more aspects of the exposome [3], there is a need for methods to adequately analyze these large amounts of data.

In trying to predict health based on multiple exposures, we are faced with several challenges. First, the inclusion of many (repeated measurements of) exposures poses considerable challenges, as traditional regression models are generally not well-suited to deal with large numbers of covariates [4]. Second, in such regression models it is often assumed that the relation between each exposure and the outcome is linear and that there are no (or a limited number of prespecified) interactions between exposures. However, these assumptions often cannot be verified, and if they are violated, they may lead to wrong conclusions. Nonetheless, these assumptions are frequently ignored or violated, thereby potentially biasing study results [5, 6].

Machine learning (ML), which has been defined as “a family of mathematical modelling techniques that uses a variety of approaches to automatically learn from data, without explicit programming” [7], offers a way to deal with the limitations of traditional statistical techniques. ML is able to analyze large amounts of data consisting of numerous exposures [8, 9]. It can be used to automatically create models that predict the outcome with high accuracy and to identify the most important predicting exposures. In doing so, ML techniques often make no assumptions about the exact functional form of the model and instead attempt to learn the model form directly from the data, such that prediction accuracy is maximized [5].

While the use of ML is already established in other research fields, within epidemiology and public health research the use of these techniques is still limited, although an increasing number of examples exist where ML has contributed to building prediction models for both diagnosis and prognosis in healthcare research [9, 10]. In particular, the application of ML to longitudinal data is still in its infancy. The limited use of ML in epidemiology and public health research may be partly because ML models are often considered difficult to grasp. But when ML is used together with methods that facilitate interpretability, there are opportunities for these fields to also incorporate ML techniques to analyze the wealth of data that arise from longitudinal cohort studies [8, 9, 11, 12]. ML can be used not only to generate predictions, but also to identify the strongest predictors for a certain outcome.
