Sampling Strategies for Training Machine Learning Emulators of Gravity Wave Parameterizations
Date:
This talk introduces a sampling strategy designed to mitigate data imbalance in high-dimensional datasets for regression tasks. In a case study that trains emulators of a gravity wave parameterization scheme on a dataset with a long-tailed distribution, we find that the strategy reduces errors in the tail of the distribution, except at its extreme end, with minimal loss of accuracy at the peak of the distribution.
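
For readers unfamiliar with resampling for imbalanced regression, the general idea can be illustrated with a minimal sketch. This is not the strategy presented in the talk, whose details are not given here. It is a generic inverse-frequency resampler, written under the simplifying assumption of a single scalar target rather than a high-dimensional one. The function names, the bin count, and the synthetic exponential data are all hypothetical choices made for illustration.

```python
import numpy as np

def inverse_frequency_weights(y, n_bins=20, power=1.0):
    """Sampling probabilities inversely proportional to histogram bin counts.

    Assumption: y is a 1-D scalar target. power=0 recovers uniform sampling;
    power=1 gives every non-empty bin the same total probability.
    """
    y = np.asarray(y, dtype=float)
    edges = np.histogram_bin_edges(y, bins=n_bins)
    # Using only the interior edges maps each value to a bin index 0..n_bins-1.
    bin_idx = np.digitize(y, edges[1:-1])
    counts = np.bincount(bin_idx, minlength=n_bins)
    weights = 1.0 / counts[bin_idx] ** power
    return weights / weights.sum()

def resample_indices(y, n_samples, rng, **kwargs):
    """Draw training indices with replacement, favoring rare target values."""
    p = inverse_frequency_weights(y, **kwargs)
    return rng.choice(len(y), size=n_samples, replace=True, p=p)

# Synthetic long-tailed target (illustrative only, not gravity wave data).
rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=10_000)
idx = resample_indices(y, 10_000, rng)

threshold = np.quantile(y, 0.9)
print("tail fraction before resampling:", (y > threshold).mean())
print("tail fraction after resampling: ", (y[idx] > threshold).mean())
```

In this sketch the `power` parameter controls the trade-off the abstract alludes to: larger values draw more samples from the tail, while values closer to zero keep more samples near the peak of the distribution.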