Aug 27, 2023

A former colleague of ours suggested splitting the range into [0, K) and [K, +inf) for some K (for example, the 95th percentile of the data), with a binary model predicting which range a value falls into. The lower range would then be fit with plain MSE on the raw value, while the upper range would be modelled as a distribution: e.g., predict both the mean (with MSE) and the variance of the log-value, assuming the value is log-normally distributed. At inference time, the upper-range model gives an unbiased estimate of the mean of the value by combining the predicted mean and variance in log scale: for a log-normal variable, E[value] = exp(mean + variance / 2).
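To make the inference step concrete, here is a minimal sketch of the log-normal correction, assuming the upper-range model outputs a mean and a variance of the log-value. The function names (`lognormal_mean`, `empirical_mean`) and the numbers are made up for illustration; they are not from the colleague's actual setup. The sketch compares the naive back-transform exp(mean) with the corrected exp(mean + variance / 2) against a simulated sample mean.

```python
import math
import random


def lognormal_mean(mu_log: float, var_log: float) -> float:
    """Mean of X when log(X) ~ Normal(mu_log, var_log)."""
    return math.exp(mu_log + 0.5 * var_log)


def empirical_mean(mu_log: float, var_log: float, n: int = 200_000, seed: int = 0) -> float:
    """Sample mean of n simulated log-normal draws (sanity check only)."""
    rng = random.Random(seed)
    sigma = math.sqrt(var_log)
    return sum(rng.lognormvariate(mu_log, sigma) for _ in range(n)) / n


# Hypothetical model outputs for one example in the upper range.
mu_log, var_log = 2.0, 0.5

naive = math.exp(mu_log)                      # ignores the variance: this is the median
corrected = lognormal_mean(mu_log, var_log)   # uses both predicted mean and variance
sampled = empirical_mean(mu_log, var_log)

print(f"naive={naive:.3f} corrected={corrected:.3f} sampled={sampled:.3f}")
```

The naive back-transform recovers the median rather than the mean, so it underestimates the expected value whenever the predicted variance is positive; the corrected estimate should land close to the simulated sample mean.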

I'm not sure about splitting the range (and it's a separate idea), but modelling a distribution instead of a point estimate does make sense.

Written by Michael Roizner
