Skip to content

Commit a23503c

Browse files
authored
Fix variance gradient computation (#177)
* Initial fix * Tight test for variance derivatives and cleanup * Fix doc indentation * Remove dead code * Fix doc indentation * Allow dead code * Fix python var gradient test
1 parent 9dfa9c4 commit a23503c

File tree

9 files changed

+119
-64
lines changed

9 files changed

+119
-64
lines changed

doe/src/lhs.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -89,7 +89,7 @@ impl<F: Float, R: Rng + Clone> SamplingMethod<F> for Lhs<F, R> {
8989
impl<F: Float, R: Rng + Clone> Lhs<F, R> {
9090
/// Constructor with given design space and random generator.
9191
/// * `xlimits`: (nx, 2) matrix where nx is the dimension of the samples and the ith row
92-
/// is the definition interval of the ith component of x.
92+
/// is the definition interval of the ith component of x.
9393
/// * `rng`: random generator used for [LhsKind::Classic] and [LhsKind::Optimized] LHS
9494
pub fn new_with_rng(xlimits: &ArrayBase<impl Data<Elem = F>, Ix2>, rng: R) -> Self {
9595
if xlimits.ncols() != 2 {

doe/src/traits.rs

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -23,7 +23,7 @@ pub trait SamplingMethod<F: Float> {
2323
/// # Returns
2424
///
2525
/// * A (ns, nx) matrix of samples where nx is the dimension of the sample space
26-
/// each sample belongs to `[0., 1.]^nx` hypercube
26+
/// each sample belongs to `[0., 1.]^nx` hypercube
2727
fn normalized_sample(&self, ns: usize) -> Array2<F>;
2828

2929
/// Generates a (ns, nx)-shaped array of samples belonging to `[lower_bound_xi, upper_bound_xi]^nx`
@@ -35,8 +35,8 @@ pub trait SamplingMethod<F: Float> {
3535
/// # Returns
3636
///
3737
/// * A (ns, nx) matrix where nx is the dimension of the sample space.
38-
/// each sample belongs to `[lower_bound_xi, upper_bound_xi]^nx` where bounds
39-
/// are defined as returned values of `sampling_space` function.
38+
/// each sample belongs to `[lower_bound_xi, upper_bound_xi]^nx` where bounds
39+
/// are defined as returned values of `sampling_space` function.
4040
fn sample(&self, ns: usize) -> Array2<F> {
4141
let xlimits = self.sampling_space();
4242
let lower = xlimits.column(0);

ego/src/lib.rs

Lines changed: 11 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -96,8 +96,8 @@
9696
//! Some of the most useful options are:
9797
//!
9898
//! * Specification of the size of the initial DoE. The default is nx+1 where nx is the dimension of x.
99-
//! If your objective function is not expensive you can take `3*nx` to help the optimizer
100-
//! approximating your objective function.
99+
//! If your objective function is not expensive you can take `3*nx` to help the optimizer
100+
//! approximating your objective function.
101101
//!
102102
//! ```no_run
103103
//! # use egobox_ego::{EgorConfig};
@@ -108,26 +108,26 @@
108108
//! You can also provide your initial doe though the `egor.doe(your_doe)` method.
109109
//!
110110
//! * As the dimension increase the gaussian process surrogate building may take longer or even fail
111-
//! in this case you can specify a PLS dimension reduction \[[Bartoli2019](#Bartoli2019)\].
112-
//! Gaussian process will be built using the `ndim` (usually 3 or 4) main components in the PLS projected space.
111+
//! in this case you can specify a PLS dimension reduction \[[Bartoli2019](#Bartoli2019)\].
112+
//! Gaussian process will be built using the `ndim` (usually 3 or 4) main components in the PLS projected space.
113113
//!
114114
//! ```no_run
115115
//! # let egor_config = egobox_ego::EgorConfig::default();
116116
//! egor_config.kpls_dim(3);
117117
//! ```
118118
//!
119119
//! * Specifications of constraints (expected to be negative at the end of the optimization)
120-
//! In this example below we specify that 2 constraints will be computed with the objective values meaning
121-
//! the objective function is expected to return an array '\[nsamples, 1 obj value + 2 const values\]'.
120+
//! In this example below we specify that 2 constraints will be computed with the objective values meaning
121+
//! the objective function is expected to return an array '\[nsamples, 1 obj value + 2 const values\]'.
122122
//!
123123
//! ```no_run
124124
//! # let egor_config = egobox_ego::EgorConfig::default();
125125
//! egor_config.n_cstr(2);
126126
//! ```
127127
//!
128128
//! * If the default infill strategy (WB2, Watson and Barnes 2nd criterion),
129-
//! you can switch for either EI (Expected Improvement) or WB2S (scaled version of WB2).
130-
//! See \[[Priem2019](#Priem2019)\]
129+
//! you can switch for either EI (Expected Improvement) or WB2S (scaled version of WB2).
130+
//! See \[[Priem2019](#Priem2019)\]
131131
//!
132132
//! ```no_run
133133
//! # use egobox_ego::{EgorConfig, InfillStrategy};
@@ -136,9 +136,9 @@
136136
//! ```
137137
//!
138138
//! * The default gaussian process surrogate is parameterized with a constant trend and a squared exponential correlation kernel, also
139-
//! known as Kriging. The optimizer use such surrogates to approximate objective and constraint functions. The kind of surrogate
140-
//! can be changed using `regression_spec` and `correlation_spec()` methods to specify trend and kernels tested to get the best
141-
//! approximation (quality tested through cross validation).
139+
//! known as Kriging. The optimizer use such surrogates to approximate objective and constraint functions. The kind of surrogate
140+
//! can be changed using `regression_spec` and `correlation_spec()` methods to specify trend and kernels tested to get the best
141+
//! approximation (quality tested through cross validation).
142142
//!
143143
//! ```no_run
144144
//! # use egobox_ego::{EgorConfig, RegressionSpec, CorrelationSpec};

ego/src/utils/sort_axis.rs

Lines changed: 0 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -166,25 +166,6 @@ where
166166
}
167167
}
168168

169-
#[cfg(feature = "std")]
170-
fn main() {
171-
let a = Array::linspace(0., 63., 64).into_shape((8, 8)).unwrap();
172-
let strings = a.map(|x| x.to_string());
173-
174-
let perm = a.sort_axis_by(Axis(1), |i, j| a[[i, 0]] > a[[j, 0]]);
175-
println!("{:?}", perm);
176-
let b = a.permute_axis(Axis(0), &perm);
177-
println!("{:?}", b);
178-
179-
println!("{:?}", strings);
180-
let c = strings.permute_axis(Axis(1), &perm);
181-
println!("{:?}", c);
182-
}
183-
184-
#[cfg(not(feature = "std"))]
185-
#[allow(dead_code)]
186-
fn main() {}
187-
188169
#[cfg(test)]
189170
mod tests {
190171
use super::*;

gp/src/algorithm.rs

Lines changed: 90 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -85,21 +85,21 @@ impl<F: Float> Clone for GpInnerParams<F> {
8585
/// * `regr(x)` a vector of polynomial basis functions
8686
/// * `sigma^2` is the process variance
8787
/// * `corr(x, x')` is a correlation function which depends on `distance(x, x')`
88-
/// and a set of unknown parameters `thetas` to be determined.
88+
/// and a set of unknown parameters `thetas` to be determined.
8989
///
9090
/// # Implementation
9191
///
9292
/// * Based on [ndarray](https://github.com/rust-ndarray/ndarray)
93-
/// and [linfa](https://github.com/rust-ml/linfa) and strive to follow [linfa guidelines](https://github.com/rust-ml/linfa/blob/master/CONTRIBUTE.md)
93+
/// and [linfa](https://github.com/rust-ml/linfa) and strive to follow [linfa guidelines](https://github.com/rust-ml/linfa/blob/master/CONTRIBUTE.md)
9494
/// * GP mean model can be constant, linear or quadratic
9595
/// * GP correlation model can be build the following kernels: squared exponential, absolute exponential, matern 3/2, matern 5/2
96-
/// cf. [SMT Kriging](https://smt.readthedocs.io/en/latest/_src_docs/surrogate_models/krg.html)
96+
/// cf. [SMT Kriging](https://smt.readthedocs.io/en/latest/_src_docs/surrogate_models/krg.html)
9797
/// * For high dimensional problems, the classic GP algorithm does not perform well as
98-
/// it depends on the inversion of a correlation (n, n) matrix which is an O(n3) operation.
99-
/// To work around this problem the library implements dimension reduction using
100-
/// Partial Least Squares method upon Kriging method also known as KPLS algorithm (see Reference)
98+
/// it depends on the inversion of a correlation (n, n) matrix which is an O(n3) operation.
99+
/// To work around this problem the library implements dimension reduction using
100+
/// Partial Least Squares method upon Kriging method also known as KPLS algorithm (see Reference)
101101
/// * GP models can be saved and loaded using [serde](https://serde.rs/).
102-
/// See `serializable` feature section below.
102+
/// See `serializable` feature section below.
103103
///
104104
/// # Features
105105
///
@@ -517,19 +517,18 @@ impl<F: Float, Mean: RegressionModel<F>, Corr: CorrelationModel<F>> GaussianProc
517517
let x = &(x.to_owned().insert_axis(Axis(0)));
518518
let xnorm = (x - &self.xt_norm.mean) / &self.xt_norm.std;
519519
let dx = pairwise_differences(&xnorm, &self.xt_norm.data);
520-
521520
let sigma2 = self.inner_params.sigma2;
522521
let r_chol = &self.inner_params.r_chol;
523522

524523
let r = self.params.corr.value(&dx, &self.theta, &self.w_star);
525524
let dr =
526525
self.params
527526
.corr
528-
.jacobian(&xnorm.row(0), &self.xt_norm.data, &self.theta, &self.w_star)
529-
/ &self.xt_norm.std.to_owned().insert_axis(Axis(0));
527+
.jacobian(&xnorm.row(0), &self.xt_norm.data, &self.theta, &self.w_star);
530528

531529
// rho1 = Rc^-1 . r(x, X)
532530
let rho1 = r_chol.solve_triangular(&r, UPLO::Lower).unwrap();
531+
533532
// inv_kr = Rc^t^-1 . Rc^-1 . r(x, X) = R^-1 . r(x, X)
534533
let inv_kr = r_chol.t().solve_triangular(&rho1, UPLO::Upper).unwrap();
535534

@@ -569,12 +568,11 @@ impl<F: Float, Mean: RegressionModel<F>, Corr: CorrelationModel<F>> GaussianProc
569568

570569
// p4 = (B^-1 . A)^t . dA/dx^t = A^t . B^-1 . dA/dx^t = p3
571570
let p4 = d_mat.t().dot(&d_a.t());
572-
573571
let two = F::cast(2.);
574-
let prime_t = (-p2 + p4).mapv(|v| two * v).t().to_owned();
572+
let prime = (p4 - p2).mapv(|v| two * v);
575573

576574
let x_std = &self.xt_norm.std;
577-
let dvar = (prime_t / x_std).mapv(|v| v * sigma2);
575+
let dvar = (prime / x_std).mapv(|v| v * sigma2);
578576
dvar.row(0).into_owned()
579577
}
580578

@@ -693,6 +691,7 @@ where
693691
}
694692

695693
/// Gausssian Process adaptator to implement `linfa::Predict` trait for variance prediction.
694+
#[allow(dead_code)]
696695
pub struct GpVariancePredictor<'a, F, Mean, Corr>(&'a GaussianProcess<F, Mean, Corr>)
697696
where
698697
F: Float,
@@ -1091,7 +1090,7 @@ mod tests {
10911090
use super::*;
10921091
use approx::{assert_abs_diff_eq, assert_abs_diff_ne};
10931092
use argmin_testfunctions::rosenbrock;
1094-
use egobox_doe::{Lhs, SamplingMethod};
1093+
use egobox_doe::{Lhs, LhsKind, SamplingMethod};
10951094
use linfa::prelude::Predict;
10961095
#[cfg(not(feature = "blas"))]
10971096
use linfa_linalg::norm::Norm;
@@ -1436,7 +1435,7 @@ mod tests {
14361435
paste! {
14371436

14381437
#[test]
1439-
fn [<test_gp_variance_derivatives_ $regr:snake _ $corr:snake>]() {
1438+
fn [<test_gp_variance_derivatives_ $regr:snake _ $corr:snake _ $func:snake>]() {
14401439
let mut rng = Xoshiro256Plus::seed_from_u64(42);
14411440
let xt = egobox_doe::Lhs::new(&array![[-$limit, $limit], [-$limit, $limit]]).with_rng(rng.clone()).sample($nt);
14421441
let yt = [<$func>](&xt);
@@ -1602,15 +1601,88 @@ mod tests {
16021601

16031602
fn assert_rel_or_abs_error(y_deriv: f64, fdiff: f64) {
16041603
println!("analytic deriv = {y_deriv}, fdiff = {fdiff}");
1605-
if fdiff.abs() < 2e-1 || y_deriv.abs() < 2e-1 {
1606-
let atol = 2e-1;
1604+
if fdiff.abs() < 6e-1 {
1605+
let atol = 6e-1;
16071606
println!("Check absolute error: abs({y_deriv}) should be < {atol}");
16081607
assert_abs_diff_eq!(y_deriv, 0.0, epsilon = atol); // check absolute when close to zero
16091608
} else {
1610-
let rtol = 3e-1;
1609+
let rtol = 6e-1;
16111610
let rel_error = (y_deriv - fdiff).abs() / fdiff.abs(); // check relative
16121611
println!("Check relative error: {rel_error} should be < {rtol}");
16131612
assert_abs_diff_eq!(rel_error, 0.0, epsilon = rtol);
16141613
}
16151614
}
1615+
1616+
fn sin_linear(x: &Array2<f64>) -> Array2<f64> {
1617+
// sin + linear trend
1618+
let x1 = x.column(0).to_owned().mapv(|v| v.sin());
1619+
let x2 = x.column(0).mapv(|v| 2. * v) + x.column(1).mapv(|v| 5. * v);
1620+
(x1 + x2)
1621+
.mapv(|v| v + 10.)
1622+
.into_shape((x.nrows(), 1))
1623+
.unwrap()
1624+
}
1625+
1626+
#[test]
1627+
fn test_bug_var_derivatives() {
1628+
let _xt = egobox_doe::Lhs::new(&array![[-5., 10.], [-5., 10.]])
1629+
.kind(LhsKind::Centered)
1630+
.sample(12);
1631+
let _yt = sin_linear(&_xt);
1632+
1633+
let xt = array![
1634+
[6.875, -4.375],
1635+
[-3.125, 1.875],
1636+
[1.875, -1.875],
1637+
[-4.375, 3.125],
1638+
[8.125, 9.375],
1639+
[4.375, 4.375],
1640+
[0.625, 0.625],
1641+
[9.375, 6.875],
1642+
[5.625, 8.125],
1643+
[-0.625, -3.125],
1644+
[3.125, 5.625],
1645+
[-1.875, -0.625]
1646+
];
1647+
let yt = array![
1648+
[2.43286801],
1649+
[13.10840811],
1650+
[5.32908578],
1651+
[17.81862219],
1652+
[74.08849877],
1653+
[39.68137781],
1654+
[14.96009727],
1655+
[63.17475741],
1656+
[61.26331775],
1657+
[-7.46009727],
1658+
[44.39159189],
1659+
[2.17091422],
1660+
];
1661+
1662+
let gp = GaussianProcess::<f64, ConstantMean, SquaredExponentialCorr>::params(
1663+
ConstantMean::default(),
1664+
SquaredExponentialCorr::default(),
1665+
)
1666+
.theta_tuning(ThetaTuning::Fixed(vec![0.0437386, 0.00697978]))
1667+
.fit(&Dataset::new(xt, yt))
1668+
.expect("GP fitting");
1669+
1670+
let e = 5e-6;
1671+
let xa = -1.3;
1672+
let xb = 2.5;
1673+
let x = array![
1674+
[xa, xb],
1675+
[xa + e, xb],
1676+
[xa - e, xb],
1677+
[xa, xb + e],
1678+
[xa, xb - e]
1679+
];
1680+
let y_pred = gp.predict_var(&x).unwrap();
1681+
let y_deriv = gp.predict_var_gradients(&array![[xa, xb]]);
1682+
let diff_g = (y_pred[[1, 0]] - y_pred[[2, 0]]) / (2. * e);
1683+
let diff_d = (y_pred[[3, 0]] - y_pred[[4, 0]]) / (2. * e);
1684+
1685+
assert_abs_diff_eq!(y_deriv[[0, 0]], diff_g, epsilon = 1e-5);
1686+
assert_abs_diff_eq!(y_deriv[[0, 1]], diff_d, epsilon = 1e-5);
1687+
}
16161688
}

gp/src/sparse_algorithm.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -377,6 +377,7 @@ where
377377
}
378378

379379
/// Sparse Gausssian Process adaptator to implement `linfa::Predict` trait for variance prediction.
380+
#[allow(dead_code)]
380381
pub struct SparseGpVariancePredictor<'a, F, Corr>(&'a SparseGaussianProcess<F, Corr>)
381382
where
382383
F: Float,

moe/src/algorithm.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -917,6 +917,7 @@ impl<D: Data<Elem = f64>> PredictInplace<ArrayBase<D, Ix2>, Array2<f64>> for GpM
917917
}
918918

919919
/// Adaptator to implement `linfa::Predict` for variance prediction
920+
#[allow(dead_code)]
920921
pub struct MoeVariancePredictor<'a>(&'a GpMixture);
921922
impl<'a, D: Data<Elem = f64>> PredictInplace<ArrayBase<D, Ix2>, Array2<f64>>
922923
for MoeVariancePredictor<'a>

moe/src/lib.rs

Lines changed: 11 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -7,23 +7,23 @@
77
//!
88
//! The recombination between the GP models can be either:
99
//! * `hard`: one GP model is being responsible to provide the predicted value
10-
//! at the given point. GP selection is done by taking the largest probability of the
11-
//! given point being part of the cluster corresponding to the expert GP.
12-
//! In hard mode, transition between models leads to discontinuity.
10+
//! at the given point. GP selection is done by taking the largest probability of the
11+
//! given point being part of the cluster corresponding to the expert GP.
12+
//! In hard mode, transition between models leads to discontinuity.
1313
//! * `smooth`: all GPs models are taken and their predicted values at a given point are
14-
//! weighted regarding their responsability (probability of the given point being part
15-
//! of the cluster corresponding to the expert GP). In this case the MoE model is continuous.
16-
//! The smoothness is automatically adjusted using a factor, the heaviside factor,
17-
//! which can also be set manually.
14+
//! weighted regarding their responsability (probability of the given point being part
15+
//! of the cluster corresponding to the expert GP). In this case the MoE model is continuous.
16+
//! The smoothness is automatically adjusted using a factor, the heaviside factor,
17+
//! which can also be set manually.
1818
//!
1919
//! # Implementation
2020
//!
2121
//! * Clusters are defined by clustering the training data with
22-
//! [linfa-clustering](https://docs.rs/linfa-clustering/latest/linfa_clustering/)
23-
//! gaussian mixture model.
22+
//! [linfa-clustering](https://docs.rs/linfa-clustering/latest/linfa_clustering/)
23+
//! gaussian mixture model.
2424
//! * This library is a port of the
25-
//! [SMT MoE method](https://smt.readthedocs.io/en/latest/_src_docs/applications/moe.html)
26-
//! using egobox GP models as experts.
25+
//! [SMT MoE method](https://smt.readthedocs.io/en/latest/_src_docs/applications/moe.html)
26+
//! using egobox GP models as experts.
2727
//! * It leverages on the egobox GP PLS reduction feature to handle high dimensional problems.
2828
//! * MoE trained model can be save to disk and reloaded. See
2929
//!

python/egobox/tests/test_gpmix.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -50,7 +50,7 @@ def test_gpx_kriging(self):
5050
1.1204, gpx.predict_gradients(np.array([[1.1]])).item(), delta=1e-3
5151
)
5252
self.assertAlmostEqual(
53-
0.0092, gpx.predict_var_gradients(np.array([[1.1]])).item(), delta=1e-3
53+
0.0145, gpx.predict_var_gradients(np.array([[1.1]])).item(), delta=1e-3
5454
)
5555

5656
def test_gpx_save_load(self):

0 commit comments

Comments
 (0)