diPLSlib package

diPLSlib.functions module

diPLSlib.functions.convex_relaxation(xs, xt)

Perform convex relaxation of the covariance difference matrix.

This relaxation involves computing the eigenvalue decomposition of the symmetric covariance difference matrix, inverting the signs of negative eigenvalues, and reconstructing the matrix. This corresponds to an upper bound on the covariance difference between source and target domains.

Parameters:
xs : ndarray of shape (n_source_samples, n_features)

Feature data from the source domain.

xt : ndarray of shape (n_target_samples, n_features)

Feature data from the target domain.

Returns:
D : ndarray of shape (n_features, n_features)

Relaxed covariance difference matrix.

References

Ramin Nikzad-Langerodi et al., “Domain-Invariant Regression under Beer-Lambert’s Law”, Proc. ICMLA, 2019.

Examples

>>> import numpy as np
>>> from diPLSlib.functions import convex_relaxation
>>> xs = np.random.random((100, 10))
>>> xt = np.random.random((100, 10))
>>> D = convex_relaxation(xs, xt)
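
The relaxation described above can be sketched in plain NumPy. This is a minimal illustration rather than the library's implementation: the helper name `relaxed_covariance_difference`, the mean-centering, and the `n - 1` normalization (both implied by `np.cov`) are assumptions made for the example.

```python
import numpy as np


def relaxed_covariance_difference(xs, xt):
    """Hypothetical helper: flip negative eigenvalues of cov(xs) - cov(xt)."""
    # Sample covariance of each domain (assumed: centered, n - 1 normalization).
    cov_s = np.cov(xs, rowvar=False)
    cov_t = np.cov(xt, rowvar=False)
    diff = cov_s - cov_t  # symmetric, generally indefinite

    # Eigendecomposition of the symmetric difference matrix.
    eigvals, eigvecs = np.linalg.eigh(diff)

    # Replace each eigenvalue by its absolute value and reconstruct.
    relaxed = (eigvecs * np.abs(eigvals)) @ eigvecs.T
    return relaxed, diff


rng = np.random.default_rng(0)
xs = rng.random((100, 10))
xt = rng.random((100, 10))
D, diff = relaxed_covariance_difference(xs, xt)

# For any direction w, w' D w is an upper bound on |w' diff w|.
w = rng.standard_normal(10)
print(w @ D @ w >= abs(w @ diff @ w))
```

Because every eigenvalue of the reconstructed matrix is non-negative, the quadratic form is convex in `w`, which is what makes it usable as a penalty.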
diPLSlib.functions.dipals(x, y, xs, xt, A, l, heuristic: bool = False, target_domain=0, laplacian: bool = False)

Perform (Multiple) Domain-Invariant Partial Least Squares (di-PLS) regression.

This method fits a PLS regression model using labeled source domain data and potentially unlabeled target domain data across multiple domains, aiming to build a model that generalizes well across different domains.

Parameters:
x : ndarray of shape (n_samples, n_features)

Labeled source domain data.

y : ndarray of shape (n_samples, 1)

Response variable associated with the source domain.

xs : ndarray of shape (n_source_samples, n_features)

Source domain feature data.

xt : ndarray of shape (n_target_samples, n_features) or list of ndarray

Target domain feature data. Multiple domains can be provided as a list.

A : int

Number of latent variables to use in the model.

l : float or tuple of length A

Regularization parameter. If a single value is provided, the same regularization is applied to all latent variables.

heuristic : bool, default=False

If True, automatically determine the regularization parameter to equally balance fitting Y and minimizing domain discrepancy.

target_domain : int, default=0

Specifies which target domain the model should apply to, where 0 indicates the source domain.

laplacian : bool, default=False

If True, uses a Laplacian matrix to regularize distances between matched calibration transfer samples in latent variable space.

Returns:
b : ndarray of shape (n_features, 1)

Regression coefficient vector.

T : ndarray of shape (n_samples, A)

Training data projections (scores).

Ts : ndarray of shape (n_source_samples, A)

Source domain projections (scores).

Tt : ndarray of shape (n_target_samples, A) or list of ndarray

Target domain projections (scores).

W : ndarray of shape (n_features, A)

Weight matrix.

P : ndarray of shape (n_features, A)

Loadings matrix corresponding to x.

Ps : ndarray of shape (n_features, A)

Loadings matrix corresponding to xs.

Pt : ndarray of shape (n_features, A) or list of ndarray

Loadings matrix corresponding to xt.

E : ndarray

Residuals of training data.

Es : ndarray

Source domain residual matrix.

Et : ndarray or list of ndarray

Target domain residual matrix.

Ey : ndarray

Residuals of response variable in the source domain.

C : ndarray of shape (A, 1)

Regression vector relating source projections to the response variable.

opt_l : ndarray of shape (A,)

Heuristically determined regularization parameter for each latent variable.

discrepancy : ndarray of shape (A,)

The variance discrepancy between source and target domain projections.

References

  1. Ramin Nikzad-Langerodi et al., “Domain-Invariant Partial Least Squares Regression”, Analytical Chemistry, 2018.

  2. Ramin Nikzad-Langerodi et al., “Domain-Invariant Regression under Beer-Lambert’s Law”, Proc. ICMLA, 2019.

  3. Ramin Nikzad-Langerodi et al., “Domain adaptation for regression under Beer–Lambert’s law”, Knowledge-Based Systems, 2020.

  4. Mikulasek et al., “Partial least squares regression with multiple domains”, Journal of Chemometrics, 2023.

Examples

>>> import numpy as np
>>> from diPLSlib.functions import dipals
>>> x = np.random.random((100, 10))
>>> y = np.random.random((100, 1))
>>> xs = np.random.random((50, 10))
>>> xt = np.random.random((50, 10))
>>> b, T, Ts, Tt, W, P, Ps, Pt, E, Es, Et, Ey, C, opt_l, discrepancy = dipals(x, y, xs, xt, 2, 0.1)
diPLSlib.functions.edpls(x: ndarray, y: ndarray, n_components: int, epsilon: float, delta: float = 0.05, rng=None)

(epsilon, delta)-Differentially Private Partial Least Squares Regression.

A Gaussian mechanism according to Balle & Wang (2018) is used to privately release weights \(\mathbf{W}\), scores \(\mathbf{T}\), and \(X/Y\)-loadings \(\mathbf{P}\)/\(\mathbf{c}\) from the PLS1 algorithm. For each latent variable, i.i.d. noise from \(\mathcal{N}(0,\sigma^2)\) with variance satisfying

\[\Phi\left( \frac{\Delta}{2\sigma} - \frac{\epsilon\sigma}{\Delta} \right) - e^{\epsilon} \Phi\left( -\frac{\Delta}{2\sigma} - \frac{\epsilon\sigma}{\Delta} \right)\leq \delta,\]

with \(\Phi(t) = \mathrm{P}[\mathcal{N}(0,1)\leq t]\) (i.e., the CDF of the standard univariate Gaussian distribution), is added to the weights, scores, and loadings, whereas the sensitivity \(\Delta(\cdot)\) for the functions releasing the corresponding quantities is calculated as follows:

\[\Delta(w) = \sup_{(\mathbf{x}, y)} |y| \|\mathbf{x}\|_2\]
\[\Delta(t) \leq \sup_{\mathbf{x}} \|\mathbf{x}\|_2\]
\[\Delta(p) \leq \sup_{\mathbf{x}} \|\mathbf{x}\|_2\]
\[\Delta(c) \leq \sup_{y} |y|.\]

Note that in contrast to the Gaussian mechanism proposed in Dwork et al. (2006) and Dwork et al. (2014), the mechanism of Balle & Wang (2018) guarantees \((\epsilon, \delta)\)-differential privacy for any value of \(\epsilon > 0\) and not only for \(\epsilon \leq 1\).
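
The calibration condition above has no closed-form solution for \(\sigma\), but its left-hand side decreases from 1 to 0 as \(\sigma\) grows, so the smallest admissible \(\sigma\) can be found numerically. The following is a minimal sketch using bisection with the standard library only; the function names are hypothetical, it assumes \(0 < \delta < 1\), and it is not claimed to be how the library solves the inequality.

```python
import math


def phi(t):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def privacy_gap(sigma, sensitivity, epsilon):
    """Left-hand side of the calibration inequality for a given sigma."""
    a = sensitivity / (2.0 * sigma)
    b = epsilon * sigma / sensitivity
    return phi(a - b) - math.exp(epsilon) * phi(-a - b)


def calibrate_sigma(sensitivity, epsilon, delta, n_iter=200):
    """Smallest sigma (up to bisection accuracy) with privacy_gap <= delta."""
    # Bracket the solution: the gap decreases from 1 to 0 as sigma grows.
    hi = sensitivity
    while privacy_gap(hi, sensitivity, epsilon) > delta:
        hi *= 2.0
    lo = hi
    while privacy_gap(lo, sensitivity, epsilon) <= delta:
        lo /= 2.0
    # Bisection keeps the invariant gap(lo) > delta >= gap(hi).
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if privacy_gap(mid, sensitivity, epsilon) <= delta:
            hi = mid
        else:
            lo = mid
    return hi


sigma = calibrate_sigma(sensitivity=1.0, epsilon=0.1, delta=0.05)
print(sigma, privacy_gap(sigma, 1.0, 0.1))
```

Returning the upper end of the bracket means the reported \(\sigma\) always satisfies the inequality, at the cost of being marginally larger than the exact root.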

Parameters:
x : ndarray of shape (n_samples, n_features)

Input data.

y : ndarray of shape (n_samples, n_targets)

Target values.

n_components : int

Number of latent variables.

epsilon : float

Privacy loss parameter.

delta : float, default=0.05

Failure probability.

rng : numpy.random.Generator, optional

Random number generator.

Returns:
coef_ : ndarray of shape (n_features, n_targets)

Regression coefficients.

x_weights_ : ndarray of shape (n_features, n_components)

X weights.

x_loadings_ : ndarray of shape (n_features, n_components)

X loadings.

y_loadings_ : ndarray of shape (n_components, n_targets)

Y loadings.

x_scores_ : ndarray of shape (n_samples, n_components)

X scores.

x_residuals_ : ndarray of shape (n_samples, n_features)

X residuals.

y_residuals_ : ndarray of shape (n_samples, n_targets)

Y residuals.

References

  1. Nikzad-Langerodi et al. (2024). (epsilon,delta)-Differentially private partial least squares regression (unpublished).

  2. Balle, B., & Wang, Y. X. (2018, July). Improving the Gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. In International Conference on Machine Learning (pp. 394-403). PMLR.

Examples

>>> from diPLSlib.functions import edpls
>>> import numpy as np
>>> x = np.random.rand(100, 10)
>>> y = np.random.rand(100, 1)
>>> coef_, x_weights_, x_loadings_, y_loadings_, x_scores_, x_residuals_, y_residuals_ = edpls(x, y, 2, epsilon=0.1, delta=0.05)
diPLSlib.functions.kdapls(x: ndarray, y: ndarray, xs: ndarray, xt, A: int, l, kernel_params: dict = {'gamma': 10, 'type': 'rbf'})

Perform Kernel Domain Adaptive Partial Least Squares (kda-PLS) regression.

This method fits a Kernel PLS regression model using labeled source domain data and potentially unlabeled target domain data. In contrast to di-PLS, kda-PLS aligns the source and target distributions in a RKHS in a non-parametric way, thus making no assumptions about the underlying data distributions.

Mathematically, for each latent variable (LV), kda‐PLS finds a weight vector \(\mathbf{w}\) (with \(\mathbf{w}^T\mathbf{w} = 1\)) that maximizes

\[\max_{\mathbf{w} : \mathbf{w}^T\mathbf{w} = 1} \Biggl( \mathbf{w}^T K(X_s, X_s)^T Y Y^T K(X_s, X_s) \mathbf{w} - \gamma \mathbf{w}^T K(X_{st}, X_s)^T H L H K(X_{st}, X_s) \mathbf{w} \Biggr),\]

where

  • \(K(X_s, X_s)\) is the kernel matrix computed from the source-domain data,

  • \(K(X_{st}, X_s)\) is the kernel matrix computed between the combined source/target data \(X_{st} = [X_s; X_t]\) and the source-domain data,

  • \(Y\) is the response variable,

  • \(H\) denotes the centering matrix,

  • \(L\) is the Laplacian matrix defined such that \(L_{ij}=1\) if the i-th and j-th sample in \(X_{st}\) belong to the same domain and 0 otherwise,

  • \(\gamma\) is the regularization parameter that balances maximizing the covariance between \(K(X_s, X_s)\) and \(Y\) with minimizing the domain discrepancy.
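
For a single latent variable, the constrained maximization above is a symmetric eigenvalue problem: the maximizer is the unit eigenvector belonging to the largest eigenvalue of the matrix inside the quadratic form. The sketch below builds that matrix with NumPy, using \(L\) exactly as defined in the list above. The RBF kernel form, the variable names, and the toy data are assumptions for illustration; the library may additionally center the kernels or deflate between latent variables.

```python
import numpy as np


def rbf_kernel(a, b, kernel_gamma):
    """RBF kernel matrix exp(-kernel_gamma * ||a_i - b_j||^2) (assumed form)."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-kernel_gamma * np.maximum(sq, 0.0))


rng = np.random.default_rng(0)
Xs = rng.random((20, 5))   # source-domain data
Xt = rng.random((15, 5))   # target-domain data
Y = rng.random((20, 1))    # source-domain response
Xst = np.vstack([Xs, Xt])

K_ss = rbf_kernel(Xs, Xs, kernel_gamma=1.0)    # K(X_s, X_s), shape (20, 20)
K_st = rbf_kernel(Xst, Xs, kernel_gamma=1.0)   # K(X_st, X_s), shape (35, 20)

n = Xst.shape[0]
H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
domain = np.r_[np.zeros(len(Xs)), np.ones(len(Xt))]
L = (domain[:, None] == domain[None, :]).astype(float)  # 1 within a domain, else 0

reg = 0.5  # regularization parameter (gamma in the formula)
M = K_ss.T @ Y @ Y.T @ K_ss - reg * K_st.T @ H @ L @ H @ K_st
M = 0.5 * (M + M.T)  # remove round-off asymmetry

# The unit-norm maximizer of w' M w is the top eigenvector of M.
eigvals, eigvecs = np.linalg.eigh(M)
w = eigvecs[:, -1]
print(w @ M @ w, eigvals[-1])
```

The regularization weight is named `reg` here to keep it apart from the kernel coefficient, which the `kernel_params` dictionary also calls "gamma".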

Parameters:
x : ndarray of shape (n_samples, n_features)

Labeled source domain data.

y : ndarray of shape (n_samples, 1)

Response variable associated with the source domain.

xs : ndarray of shape (n_source_samples, n_features)

Source domain feature data.

xt : ndarray of shape (n_target_samples, n_features) or list of ndarray

Target domain feature data. Multiple domains can be provided as a list.

A : int

Number of latent variables to use in the model.

l : float or tuple of length A

Regularization parameter. If a single value is provided, the same regularization is applied to all latent variables.

kernel_params : dict, default={"type": "rbf", "gamma": 10}

Kernel parameters. The dictionary must contain the following keys:

  • "type": str, default="rbf"

    Type of kernel to use. Supported types are "rbf", "linear", and "primal".

  • "gamma": float, default=10

    Kernel coefficient for the RBF kernel.

Returns:
b : ndarray of shape (n_features, 1)

Regression coefficient vector.

bst : ndarray of shape (n_features, 1)

Regression coefficient vector for the target domain.

T : ndarray of shape (n_samples, A)

Training data projections (scores).

Tst : ndarray of shape (n_source_samples + n_target_samples, A)

Source and target domain projections (scores).

W : ndarray of shape (m, A)

Weight matrix.

P : ndarray of shape (m, A)

Loadings matrix for source domain.

Pst : ndarray of shape (m, A)

Loadings matrix for source and target domains.

E : ndarray

Residuals for source domain.

Est : ndarray

Residuals for source and target domains.

Ey : ndarray

Residuals of response variable.

C : ndarray of shape (A, q)

Regression vector relating projections to the response variable.

centering : dict

Dictionary containing centering information.

References

  1. Huang, G., Chen, X., Li, L., Chen, X., Yuan, L., & Shi, W. (2020). Domain adaptive partial least squares regression. Chemometrics and Intelligent Laboratory Systems, 201, 103986.

Examples

>>> import numpy as np
>>> from diPLSlib.functions import kdapls
>>> x = np.random.random((100, 10))
>>> y = np.random.random((100, 1))
>>> xs = np.random.random((50, 10))
>>> xt = np.random.random((50, 10))
>>> b, bst, T, Tst, W, P, Pst, E, Est, Ey, C, centering = kdapls(x, y, xs, xt, 2, 0.5)
diPLSlib.functions.transfer_laplacian(x: ndarray, y: ndarray) → ndarray

Construct a Laplacian matrix for calibration transfer problems.

Parameters:
x : ndarray of shape (n_samples, n_features)

Data samples from device 1.

y : ndarray of shape (n_samples, n_features)

Data samples from device 2.

Returns:
L : ndarray of shape (2 * n_samples, 2 * n_samples)

The Laplacian matrix for the calibration transfer problem.

References

Nikzad‐Langerodi, R., & Sobieczky, F. (2021). Graph‐based calibration transfer. Journal of Chemometrics, 35(4), e3319.

Examples

>>> import numpy as np
>>> from diPLSlib.functions import transfer_laplacian
>>> x = np.array([[1, 2], [3, 4]])
>>> y = np.array([[2, 3], [4, 5]])
>>> L = transfer_laplacian(x, y)
>>> print(L)
[[ 1.  0. -1. -0.]
 [ 0.  1. -0. -1.]
 [-1. -0.  1.  0.]
 [-0. -1.  0.  1.]]
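
The printed matrix has the form "degree minus adjacency" for a graph that joins each sample in x to its matched sample in y. The sketch below rebuilds that structure with unit edge weights and shows why it is useful as a regularizer: the quadratic form sums the squared distances between matched rows. `paired_laplacian` is a hypothetical helper, and whether the library applies any edge weighting is not assumed here.

```python
import numpy as np


def paired_laplacian(n_pairs):
    """Hypothetical helper: graph Laplacian linking sample i to sample n_pairs + i."""
    eye = np.eye(n_pairs)
    zero = np.zeros((n_pairs, n_pairs))
    adjacency = np.block([[zero, eye], [eye, zero]])
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


L = paired_laplacian(2)

# Stack source scores on top of target scores; trace(T' L T) equals the
# summed squared distance between matched rows.
rng = np.random.default_rng(0)
Ts = rng.random((2, 3))
Tt = rng.random((2, 3))
T = np.vstack([Ts, Tt])
print(np.trace(T.T @ L @ T), np.sum((Ts - Tt) ** 2))
```

The two printed numbers agree up to round-off, so penalizing the quadratic form pulls matched calibration transfer samples together in latent variable space.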

diPLSlib.models module

diPLSlib model classes

  • DIPLS base class

  • GCTPLS class

  • EDPLS class

  • KDAPLS class

class diPLSlib.models.DIPLS(A=2, l=0, centering=True, heuristic=False, target_domain=0, rescale='Target')

Bases: RegressorMixin, BaseEstimator

Domain-Invariant Partial Least Squares (DIPLS) algorithm for domain adaptation.

This class implements the DIPLS algorithm, which is designed to align feature distributions across different domains while predicting the target variable y. It supports multiple source and target domains through domain-specific feature transformations.

Parameters:
A : int, default=2

Number of latent variables to use in the model.

l : float or tuple of length A, default=0

Regularization parameter. If a single value is provided, the same regularization is applied to all latent variables.

centering : bool, default=True

If True, source and target domain data are mean-centered.

heuristic : bool, default=False

If True, the regularization parameter is set to a heuristic value that balances fitting the output variable y and minimizing domain discrepancy.

target_domain : int, default=0

If multiple target domains are passed, target_domain specifies for which of the target domains the model should apply. If target_domain=0, the model applies to the source domain, if target_domain=1, it applies to the first target domain, and so on.

rescale : str or ndarray, default='Target'

Determines rescaling of the test data. If 'Target' or 'Source', the test data will be rescaled to the mean of xt or xs, respectively. If an ndarray is provided, the test data will be rescaled to the mean of the provided array.

Attributes:
n_ : int

Number of samples in X.

ns_ : int

Number of samples in xs.

nt_ : int

Number of samples in xt.

n_features_in_ : int

Number of features in X.

mu_ : ndarray of shape (n_features,)

Mean of columns in X.

mu_s_ : ndarray of shape (n_features,)

Mean of columns in xs.

mu_t_ : ndarray of shape (n_features,) or list of ndarray

Mean of columns in xt, averaged per target domain if multiple domains exist.

b_ : ndarray of shape (n_features, 1)

Regression coefficient vector.

b0_ : float

Intercept of the regression model.

T_ : ndarray of shape (n_samples, A)

Training data projections (scores).

Ts_ : ndarray of shape (n_source_samples, A)

Source domain projections (scores).

Tt_ : ndarray of shape (n_target_samples, A) or list of ndarray

Target domain projections (scores).

W_ : ndarray of shape (n_features, A)

Weight matrix.

P_ : ndarray of shape (n_features, A)

Loadings matrix corresponding to X.

Ps_ : ndarray of shape (n_features, A)

Loadings matrix corresponding to xs.

Pt_ : ndarray of shape (n_features, A) or list of ndarray

Loadings matrix corresponding to xt.

E_ : ndarray

Residuals of training data.

Es_ : ndarray

Source domain residual matrix.

Et_ : ndarray or list of ndarray

Target domain residual matrix.

Ey_ : ndarray

Residuals of response variable in the source domain.

C_ : ndarray of shape (A, 1)

Regression vector relating source projections to the response variable.

opt_l_ : ndarray of shape (A,)

Heuristically determined regularization parameter for each latent variable.

discrepancy_ : ndarray of shape (A,)

The variance discrepancy between source and target domain projections.

is_fitted_ : bool

Whether the model has been fitted to data.

Methods

fit(X, y[, xs, xt])

Fit the DIPLS model.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(X)

Predict y using the fitted DIPLS model.

score(X, y[, sample_weight])

Return the coefficient of determination of the prediction.

set_fit_request(*[, xs, xt])

Request metadata passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

set_score_request(*[, sample_weight])

Request metadata passed to the score method.

References

  1. Ramin Nikzad-Langerodi et al., “Domain-Invariant Partial Least Squares Regression”, Analytical Chemistry, 2018.

  2. Ramin Nikzad-Langerodi et al., “Domain-Invariant Regression under Beer-Lambert’s Law”, Proc. ICMLA, 2019.

  3. Ramin Nikzad-Langerodi et al., “Domain adaptation for regression under Beer–Lambert’s law”, Knowledge-Based Systems, 2020.

  4. Mikulasek et al., “Partial least squares regression with multiple domains”, Journal of Chemometrics, 2023.

Examples

>>> import numpy as np
>>> from diPLSlib.models import DIPLS
>>> x = np.random.rand(100, 10)
>>> y = np.random.rand(100, 1)
>>> xs = np.random.rand(100, 10)
>>> xt = np.random.rand(50, 10)
>>> model = DIPLS(A=5, l=10)
>>> model.fit(x, y, xs, xt)
DIPLS(A=5, l=10)
>>> xtest = np.array([5, 7, 4, 3, 2, 1, 6, 8, 9, 10]).reshape(1, -1)
>>> yhat = model.predict(xtest)
fit(X, y, xs=None, xt=None, **kwargs)

Fit the DIPLS model.

This method fits the domain-invariant partial least squares (di-PLS) model using the provided source and target domain data. It can handle both single and multiple target domains.

Parameters:
X : ndarray of shape (n_samples, n_features)

Labeled input data from the source domain.

y : ndarray of shape (n_samples, 1)

Response variable corresponding to the input data X.

xs : ndarray of shape (n_samples_source, n_features)

Source domain X-data. If not provided, defaults to X.

xt : Union[ndarray of shape (n_samples_target, n_features), List[ndarray]]

Target domain X-data. Can be a single target domain or a list of arrays representing multiple target domains. If not provided, defaults to X.

**kwargs : dict, optional

Additional keyword arguments to pass to the model (e.g., for model selection purposes).

Returns:
self : object

Fitted model instance.

predict(X)

Predict y using the fitted DIPLS model.

This method predicts the response variable for the provided test data using the fitted domain-invariant partial least squares (di-PLS) model.

Parameters:
X : ndarray of shape (n_samples_test, n_features)

Test data matrix to perform the prediction on.

Returns:
yhat : ndarray of shape (n_samples_test,)

Predicted response values for the test data.
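
As a rough illustration of how the fitted attributes could combine at prediction time, the sketch below assumes that the test data are centered with the mean selected by `rescale` (for example `mu_t_` for 'Target') and then mapped through `b_` and `b0_`. This reading is an assumption based on the attribute descriptions, not a statement of the library's code; `predict_sketch` and the toy values are hypothetical.

```python
import numpy as np


def predict_sketch(X, b, b0, mu):
    """Hypothetical linear prediction: center X with mu, then apply b and b0."""
    return ((X - mu) @ b + b0).ravel()


# Toy stand-ins for fitted attributes (not produced by the library).
b = np.array([[0.5], [-1.0], [2.0]])   # plays the role of b_
b0 = 0.25                              # plays the role of b0_
mu_t = np.array([1.0, 2.0, 3.0])       # plays the role of mu_t_

X_test = np.array([[1.0, 2.0, 3.0],
                   [2.0, 2.0, 4.0]])
yhat = predict_sketch(X_test, b, b0, mu_t)
print(yhat)  # [0.25 2.75]
```

Under this assumption, a test sample equal to the chosen mean is predicted as the intercept alone, which is why the choice of `rescale` matters when source and target means differ.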

set_fit_request(*, xs: bool | None | str = '$UNCHANGED$', xt: bool | None | str = '$UNCHANGED$') → DIPLS

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
xs : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for xs parameter in fit.

xt : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for xt parameter in fit.

Returns:
self : object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → DIPLS

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
self : object

The updated object.

class diPLSlib.models.EDPLS(A: int = 2, epsilon: float = 1.0, delta: float = 0.05, centering: bool = True, random_state=None)

Bases: DIPLS

(epsilon, delta)-Differentially Private Partial Least Squares Regression.

This class implements the (epsilon, delta)-Differentially Private Partial Least Squares (PLS) regression method by Nikzad-Langerodi et al. (2024, unpublished).

Parameters:
A : int, default=2

Number of latent variables.

epsilon : float, default=1.0

Privacy loss parameter.

delta : float, default=0.05

Failure probability.

centering : bool, default=True

If True, the data will be centered before fitting the model.

random_state : int, RandomState instance or None, default=None

Controls the randomness of the noise added for differential privacy.

Attributes:
n_ : int

Number of samples in the training data.

n_features_in_ : int

Number of features in the training data.

x_mean_ : ndarray of shape (n_features,)

Estimated mean of each feature.

coef_ : ndarray of shape (n_features, 1)

Estimated regression coefficients.

y_mean_ : float

Estimated intercept.

x_scores_ : ndarray of shape (n_samples, A)

X scores.

x_loadings_ : ndarray of shape (n_features, A)

X loadings.

x_weights_ : ndarray of shape (n_features, A)

X weights.

y_loadings_ : ndarray of shape (A, 1)

Y loadings.

x_residuals_ : ndarray of shape (n_samples, n_features)

X residuals.

y_residuals_ : ndarray of shape (n_samples, 1)

Y residuals.

is_fitted_ : bool

True if the model has been fitted.

Methods

fit(X, y, **kwargs)

Fit the EDPLS model.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(x)

Predict y using the fitted EDPLS model.

score(X, y[, sample_weight])

Return the coefficient of determination of the prediction.

set_fit_request(*[, xs, xt])

Request metadata passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

set_predict_request(*[, x])

Request metadata passed to the predict method.

set_score_request(*[, sample_weight])

Request metadata passed to the score method.

References

  1. Nikzad-Langerodi et al. (2024). (epsilon,delta)-Differentially private partial least squares regression (unpublished).

  2. Balle, B., & Wang, Y. X. (2018, July). Improving the Gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. In International Conference on Machine Learning (pp. 394-403). PMLR.

Examples

>>> from diPLSlib.models import EDPLS
>>> import numpy as np
>>> x = np.random.rand(100, 10)
>>> y = np.random.rand(100, 1)
>>> model = EDPLS(A=5, epsilon=0.1, delta=0.01)
>>> model.fit(x, y)
EDPLS(A=5, delta=0.01, epsilon=0.1)
>>> xtest = np.array([5, 7, 4, 3, 2, 1, 6, 8, 9, 10]).reshape(1, -1)
>>> yhat = model.predict(xtest)
fit(X: ndarray, y: ndarray, **kwargs)

Fit the EDPLS model.

Parameters:
X : ndarray of shape (n_samples, n_features)

Training data.

y : ndarray of shape (n_samples,)

Target values.

**kwargs : dict, optional

Additional keyword arguments to pass to the model (e.g., for model selection purposes).

Returns:
self : object

Fitted model instance.

predict(x: ndarray)

Predict y using the fitted EDPLS model.

Parameters:
x : ndarray of shape (n_samples_test, n_features)

Test data matrix to perform the prediction on.

Returns:
yhat : ndarray of shape (n_samples_test,)

Predicted response values for the test data.

set_predict_request(*, x: bool | None | str = '$UNCHANGED$') → EDPLS

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
x : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for x parameter in predict.

Returns:
self : object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → EDPLS

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
self : object

The updated object.

class diPLSlib.models.GCTPLS(A=2, l=0, centering=True, heuristic=False, rescale='Target')

Bases: DIPLS

Graph-based Calibration Transfer Partial Least Squares (GCT-PLS).

This method minimizes the distance between source (xs) and target (xt) domain data pairs in the latent variable space while fitting the response.

Parameters:
A : int, default=2

Number of latent variables to use in the model.

l : float or tuple of length A, default=0

Regularization parameter. If a single value is provided, the same regularization is applied to all latent variables.

centering : bool, default=True

If True, source and target domain data are mean-centered before fitting.

heuristic : bool, default=False

If True, the regularization parameter is set to a heuristic value aimed at balancing model fitting quality for the response variable y while minimizing discrepancies between domain representations.

rescale : str or ndarray, default='Target'

Determines rescaling of the test data. If 'Target' or 'Source', the test data will be rescaled to the mean of xt or xs, respectively. If an ndarray is provided, the test data will be rescaled to the mean of the provided array.

Attributes:
n_ : int

Number of samples in X.

ns_ : int

Number of samples in xs.

nt_ : int

Number of samples in xt.

n_features_in_ : int

Number of features in X.

mu_ : ndarray of shape (n_features,)

Mean of columns in X.

mu_s_ : ndarray of shape (n_features,)

Mean of columns in xs.

mu_t_ : ndarray of shape (n_features,)

Mean of columns in xt.

b_ : ndarray of shape (n_features, 1)

Regression coefficient vector.

b0_ : float

Intercept of the regression model.

T_ : ndarray of shape (n_samples, A)

Training data projections (scores).

Ts_ : ndarray of shape (n_source_samples, A)

Source domain projections (scores).

Tt_ : ndarray of shape (n_target_samples, A)

Target domain projections (scores).

W_ : ndarray of shape (n_features, A)

Weight matrix.

P_ : ndarray of shape (n_features, A)

Loadings matrix corresponding to X.

Ps_ : ndarray of shape (n_features, A)

Loadings matrix corresponding to xs.

Pt_ : ndarray of shape (n_features, A)

Loadings matrix corresponding to xt.

E_ : ndarray of shape (n_source_samples, n_features)

Residuals of source domain data.

Es_ : ndarray of shape (n_source_samples, n_features)

Source domain residual matrix.

Et_ : ndarray of shape (n_target_samples, n_features)

Target domain residual matrix.

Ey_ : ndarray of shape (n_source_samples, 1)

Residuals of response variable in the source domain.

C_ : ndarray of shape (A, 1)

Regression vector relating source projections to the response variable.

opt_l_ : ndarray of shape (A,)

Heuristically determined regularization parameter for each latent variable.

discrepancy_ : ndarray

The variance discrepancy between source and target domain projections.

is_fitted_ : bool

Whether the model has been fitted to data.

Methods

fit(X, y[, xs, xt])

Fit the GCT-PLS model to data.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(X)

Predict y using the fitted DIPLS model.

score(X, y[, sample_weight])

Return the coefficient of determination of the prediction.

set_fit_request(*[, xs, xt])

Request metadata passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

set_score_request(*[, sample_weight])

Request metadata passed to the score method.

References

Nikzad‐Langerodi, R., & Sobieczky, F. (2021). Graph‐based calibration transfer. Journal of Chemometrics, 35(4), e3319.

Examples

>>> import numpy as np
>>> from diPLSlib.models import GCTPLS
>>> x = np.random.rand(100, 10)
>>> y = np.random.rand(100, 1)
>>> xs = np.random.rand(80, 10)
>>> xt = np.random.rand(80, 10)
>>> model = GCTPLS(A=3, l=(2, 5, 7))
>>> model.fit(x, y, xs, xt)
GCTPLS(A=3, l=(2, 5, 7))
>>> xtest = np.array([5, 7, 4, 3, 2, 1, 6, 8, 9, 10]).reshape(1, -1)
>>> yhat = model.predict(xtest)
fit(X, y, xs=None, xt=None, **kwargs)

Fit the GCT-PLS model to data.

Parameters:
X : ndarray of shape (n_samples, n_features)

Labeled input data from the source domain.

y : ndarray of shape (n_samples, 1)

Response variable corresponding to the input data X.

xs : ndarray of shape (n_sample_pairs, n_features)

Source domain X-data. If not provided, defaults to X.

xt : ndarray of shape (n_sample_pairs, n_features)

Target domain X-data. If not provided, defaults to X.

**kwargs : dict, optional

Additional keyword arguments to pass to the model (e.g., for model selection purposes).

Returns:
self : object

Fitted model instance.

set_fit_request(*, xs: bool | None | str = '$UNCHANGED$', xt: bool | None | str = '$UNCHANGED$') → GCTPLS

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
xsstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for xs parameter in fit.

xtstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for xt parameter in fit.

Returns:
selfobject

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') GCTPLS

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
selfobject

The updated object.

class diPLSlib.models.KDAPLS(A=2, l=0, kernel_params=None, target_domain=0)[source]

Bases: RegressorMixin, BaseEstimator

Kernel Domain Adaptive Partial Least Squares (KDAPLS) algorithm for domain adaptation.

This class implements KDAPLS by calling the kdapls function from the diPLSlib.functions module. KDAPLS projects both source and target data into a reproducing kernel Hilbert space (RKHS) and aligns the domains in that space while fitting the regression model on the labeled data.

Parameters:
Aint, default=2

Number of latent variables to use in the model.

lfloat or tuple, default=0

Regularization parameter. If a single value is provided, the same regularization is applied to all latent variables.

kernel_paramsdict, optional

Dictionary specifying the kernel type and parameters. Accepted keys:

  • “type” : str, default=“rbf”

    Kernel type, can be “rbf”, “linear”, or “primal”.

  • “gamma” : float, default=0.0001

    Kernel coefficient for RBF kernels.

target_domainint, default=0

Specifies which domain’s coefficient vector is used for predictions.
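To make the role of the “gamma” key in kernel_params more concrete, the sketch below builds an RBF Gram matrix under the common definition k(a, b) = exp(-gamma * ||a - b||^2). Whether KDAPLS uses exactly this parametrization is an assumption, and rbf_kernel is an illustrative helper rather than a diPLSlib function.

```python
import numpy as np


def rbf_kernel(a, b, gamma=0.0001):
    """Gram matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


rng = np.random.default_rng(0)
x = rng.random((6, 10))
K = rbf_kernel(x, x, gamma=0.001)
```

With a small gamma the entries of K stay close to 1, so samples look similar to one another; larger values make the kernel more local.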

Attributes:
n_int

Number of samples in X.

n_features_in_int

Number of features in X.

ns_int

Number of samples in xs.

nt_int or list

Number of samples in xt. If multiple target domains are provided, this is a list of sample counts for each domain.

coef_ndarray of shape (n_features, 1)

Regression coefficient vector used for predictions.

X_ndarray of shape (n_, n_features_in_)

Training data used for fitting the model.

xs_ndarray of shape (ns_, n_features_in_)

(Unlabeled) source domain data used for fitting the model.

xt_ndarray of shape (nt_, n_features_in_)

(Unlabeled) target domain data used for fitting the model.

y_mean_float

Mean of the training response variable.

centering_dict

Dictionary of stored centering information for kernel operations.

is_fitted_bool

Whether the model has been fitted to data.
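The centering_ attribute above holds centering information for kernel operations. What exactly it stores is not documented here, so the following is only a sketch of feature-space kernel centering as described by Schölkopf et al. (1998); center_train_kernel and center_test_kernel are hypothetical helper names.

```python
import numpy as np


def center_train_kernel(K):
    """Center a square training Gram matrix in feature space."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n


def center_test_kernel(K_test, K_train):
    """Center an (n_test, n_train) kernel block using training statistics."""
    n = K_train.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    one_t = np.full((K_test.shape[0], n), 1.0 / n)
    return K_test - one_t @ K_train - K_test @ one_n + one_t @ K_train @ one_n


rng = np.random.default_rng(0)
X = rng.random((8, 3))
X_new = rng.random((4, 3))

# With a linear kernel, centering the Gram matrix matches mean-centering X.
K_c = center_train_kernel(X @ X.T)
K_new_c = center_test_kernel(X_new @ X.T, X @ X.T)
```

Test-time kernel blocks have to be centered with the training statistics, which is presumably why such information is kept on the fitted model.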

Methods

fit(X, y[, xs, xt])

Fit the KDAPLS model.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(X)

Predict with KDAPLS model.

score(X, y[, sample_weight])

Return the coefficient of determination of the prediction.

set_fit_request(*[, xs, xt])

Request metadata passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

set_score_request(*[, sample_weight])

Request metadata passed to the score method.

References

  1. Huang, G., Chen, X., Li, L., Chen, X., Yuan, L., & Shi, W. (2020). Domain adaptive partial least squares regression. Chemometrics and Intelligent Laboratory Systems, 201, 103986.

  2. Schölkopf, B., Smola, A., & Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5), 1299-1319.

Examples

>>> import numpy as np
>>> from diPLSlib.models import KDAPLS
>>> x = np.random.rand(100, 10)
>>> y = np.random.rand(100, 1)
>>> xs = np.random.rand(80, 10)
>>> xt = np.random.rand(50, 10)
>>> model = KDAPLS(A=2, l=0.5, kernel_params={"type": "rbf", "gamma": 0.001})
>>> model.fit(x, y, xs, xt)
KDAPLS(kernel_params={'gamma': 0.001, 'type': 'rbf'}, l=0.5)
>>> xtest = np.random.rand(5, 10)
>>> yhat = model.predict(xtest)
fit(X, y, xs=None, xt=None, **kwargs)[source]

Fit the KDAPLS model.

Parameters:
Xnp.ndarray

Labeled source domain data (usually the same as xs).

ynp.ndarray

Corresponding labels for X.

xsnp.ndarray

Source domain data.

xtnp.ndarray

Target domain data.

**kwargsdict, optional

Additional keyword arguments to pass to the model (e.g., for model selection purposes).

Returns:
selfobject

Fitted estimator.

predict(X)[source]

Predict with KDAPLS model.

Parameters:
Xndarray of shape (n_samples, n_features)

Test data matrix to perform the prediction on.

Returns:
yhatndarray of shape (n_samples,)

Predicted response values for the test data.

set_fit_request(*, xs: bool | None | str = '$UNCHANGED$', xt: bool | None | str = '$UNCHANGED$') KDAPLS

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
xsstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for xs parameter in fit.

xtstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for xt parameter in fit.

Returns:
selfobject

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') KDAPLS

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
selfobject

The updated object.
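The sample_weight metadata requested here feeds into score, which returns the coefficient of determination. As a minimal sketch of how weights enter that quantity, assuming the standard weighted R² definition (weighted_r2 is an illustrative helper, not a diPLSlib or scikit-learn function):

```python
import numpy as np


def weighted_r2(y_true, y_pred, sample_weight=None):
    """Weighted coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    if sample_weight is None:
        w = np.ones_like(y_true)
    else:
        w = np.asarray(sample_weight, dtype=float).ravel()
    # The reference level is the weighted mean of the true responses.
    y_bar = np.average(y_true, weights=w)
    ss_res = np.sum(w * (y_true - y_pred) ** 2)
    ss_tot = np.sum(w * (y_true - y_bar) ** 2)
    return 1.0 - ss_res / ss_tot


y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 6.0])
r2_plain = weighted_r2(y_true, y_pred)
# Giving the badly predicted last sample zero weight removes its influence.
r2_down = weighted_r2(y_true, y_pred, sample_weight=[1, 1, 1, 0])
```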

diPLSlib.utils subpackage

Module contents