Decision Tree Ensemble Approach for Crop Yield Prediction
DOI:
https://doi.org/10.63282/3050-9262.IJAIDSML-V7I2P124
Keywords:
Crop Yield Prediction, Corn, Soybean, Gradient Boosting, Random Forest, Machine Learning, Feature Importance, Precision Agriculture, Maryland Eastern Shore
Abstract
This study analyzes the determinants of corn and soybean yield on the Maryland Eastern Shore over ten growing seasons (2015–2024). A dataset of 2,520 field-season observations (1,260 corn and 1,260 soybean) from nine counties was characterized with descriptive statistics and then modeled using a two-tier machine learning architecture: a combined Gradient Boosting (GB) model for mixed-crop prediction and interpretation, and crop-specific Random Forest (RF) models deployed for single-crop inference. Corn yields averaged 150.2 bu/ac (SD = 15.1; CV = 10%) and soybean yields averaged 44.0 bu/ac (SD = 8.6; CV = 20%), so soybean showed roughly twice the relative variability of corn. The combined GB model achieved a cross-validated R² of 0.9801 (MAE = 5.70 bu/ac, RMSE = 7.68 bu/ac), while the crop-specific RF models yielded R² of 0.6835 (corn) and 0.7528 (soybean). Feature attribution analysis identified seeding rate, crop maturity rating, and nitrogen application rate as the dominant agronomic predictors, and June–August precipitation as the primary weather driver, particularly for soybean, where summer moisture governs pod fill. Partial dependence analysis revealed nonlinear response curves consistent with established agronomic principles. Together, these findings provide an actionable, data-driven framework for precision crop management in the Mid-Atlantic coastal plain.
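The summary statistics and error metrics quoted in the abstract follow standard definitions, which the short Python sketch below spells out. It is an illustration only, not the authors' code: the function names (`cv_percent`, `mae`, `rmse`, `r2`) and the four-point toy arrays are hypothetical and chosen for this example. Only the means and standard deviations passed to `cv_percent` come from the abstract.

```python
import math


def cv_percent(mean, sd):
    """Coefficient of variation as a percentage: 100 * SD / mean."""
    return 100.0 * sd / mean


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Means and standard deviations as reported in the abstract (bu/ac).
corn_cv = cv_percent(150.2, 15.1)
soy_cv = cv_percent(44.0, 8.6)
print(f"corn CV = {corn_cv:.1f}%, soybean CV = {soy_cv:.1f}%")

# Hypothetical toy yields (bu/ac), NOT data from the study:
# three corn-like values and one soybean-like value.
toy_true = [150.0, 160.0, 140.0, 45.0]
toy_pred = [152.0, 157.0, 141.0, 44.0]
print(f"MAE = {mae(toy_true, toy_pred):.2f}, "
      f"RMSE = {rmse(toy_true, toy_pred):.2f}, "
      f"R2 = {r2(toy_true, toy_pred):.4f}")
```

Rounded to the nearest percent, the two CV values reproduce the 10% and 20% reported above. Note that R² is measured relative to the variance of the target, so values from the pooled model and from the single-crop models are computed against different baselines.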