Add support for mixture distributions
Alexander März committed Aug 25, 2023
1 parent 7701f57 commit c97e62a
Showing 2 changed files with 4 additions and 2 deletions.
4 changes: 3 additions & 1 deletion docs/dgbm.md
@@ -44,7 +44,9 @@ Within the original distributional regression framework, the functions $f_{k}(\c

Mixture densities, or mixture distributions, extend traditional univariate distributions by allowing the observed data to be thought of as arising from multiple underlying processes. In essence, a mixture distribution is a weighted combination of several component distributions, where each component contributes to the overall mixture and the weights indicate the importance of each component. For instance, if the observed data distribution has multiple modes, a mixture of Gaussians could be employed to capture each mode with a separate Gaussian distribution.
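As a minimal, hypothetical sketch (not part of XGBoostLSS), bimodal data of this kind can be simulated from a two-component Gaussian mixture with NumPy; the weights, means, and scales below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
weights = np.array([0.3, 0.7])   # mixing coefficients, must sum to 1
means = np.array([-2.0, 3.0])    # component means (one per mode)
scales = np.array([0.5, 1.0])    # component standard deviations

# Draw a component index for each observation according to the weights,
# then sample from the selected Gaussian component.
component = rng.choice(len(weights), size=n, p=weights)
y = rng.normal(loc=means[component], scale=scales[component])
```

A histogram of `y` shows two separated modes, the situation where a single parametric distribution is inadequate.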

<img align="middle" width="400" src="https://github.com/StatMixedML/XGBoostLSS/blob/master/docs/mixture.png">
<center>
<img src="mixture.png" width=400/>
</center>

Each component of the mixture has its own set of parameters that depend on covariates, along with mixing coefficients that are likewise modeled as functions of covariates. This is particularly useful when a single parametric distribution cannot adequately capture the underlying data generating process. A mixture distribution can be represented as follows:

@@ -30,7 +30,7 @@
"f\\bigl(y_{i} | \\boldsymbol{\\theta}_{i}(x_{i})\\bigr) = \\sum_{m=1}^{M} w_{i,m}(x_{i}) \\cdot f_{m}\\bigl(y_{i} | \\boldsymbol{\\theta}_{i,m}(x_{i})\\bigr)\n",
"\\end{equation}\n",
"\n",
"where $f(\\cdot)$ represents the mixture density for the $i$-th observation, $f_{m}(\\cdot)$ is the $m$-th component density, each with its own set of parameters $\\boldsymbol{\\theta}_{i,m}(\\cdot)$, and $w_{i,m}(\\cdot)$ represents the weight of the $m$-th component in the mixture, subject to $\\sum_{m=1}^{M} w_{i,m} = 1$. The components can either be a combination of different parametric univariate distributions, such as a combination of a Normal and a StudentT, or, as in our implementation, a combination of the same distribution-type with different parameterizations, e.g., Gaussian-Mixture or StudentT-Mixture. The choice of the component distributions depends on the characteristics of the data and the underlying assumptions. Due to their high flexibility, mixture densities can portray a diverse range of shapes, making them adaptable to a plethora of datasets."
"where $f(\\cdot)$ represents the mixture density for the $i$-th observation, $f_{m}(\\cdot)$ is the $m$-th component density, each with its own set of parameters $\\boldsymbol{\\theta}_{i,m}(\\cdot)$, and $w_{i,m}(\\cdot)$ represents the weight of the $m$-th component in the mixture, subject to $\\sum_{m=1}^{M} w_{i,m} = 1$. The components can either be a combination of different parametric univariate distributions, such as a combination of a Normal and a StudentT, or, as in our implementation, a combination of the same distribution-type with different parameterizations, e.g., Gaussian-Mixture or StudentT-Mixture. The choice of the component distributions depends on the characteristics of the data and the underlying assumptions. Due to their high flexibility, mixture densities can portray a diverse range of shapes, making them adaptable to a plethora of datasets. "
]
},
{
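The mixture density $f(y_{i} \,|\, \boldsymbol{\theta}_{i}) = \sum_{m=1}^{M} w_{i,m} \cdot f_{m}(y_{i} \,|\, \boldsymbol{\theta}_{i,m})$ can be sketched directly. The helper below is a hypothetical illustration with Gaussian components, not the XGBoostLSS API:

```python
import numpy as np

def gaussian_pdf(y, mu, sigma):
    # Density of a Normal(mu, sigma) distribution evaluated at y.
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_density(y, weights, mus, sigmas):
    # f(y) = sum_m w_m * f_m(y | mu_m, sigma_m); weights must sum to 1.
    assert np.isclose(np.sum(weights), 1.0)
    return sum(w * gaussian_pdf(y, m, s)
               for w, m, s in zip(weights, mus, sigmas))

# Two-component example: each term contributes in proportion to its weight.
dens = mixture_density(0.0, [0.3, 0.7], [-2.0, 3.0], [0.5, 1.0])
```

In the boosting setting, the weights and component parameters would all be functions of the covariates $x_{i}$, so each observation gets its own mixture.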
