diPLSlib package
diPLSlib.functions module
- diPLSlib.functions.convex_relaxation(xs, xt)[source]
Perform convex relaxation of the covariance difference matrix.
This relaxation involves computing the eigenvalue decomposition of the symmetric covariance difference matrix, inverting the signs of negative eigenvalues, and reconstructing the matrix. This corresponds to an upper bound on the covariance difference between source and target domains.
- Parameters:
- xs : ndarray of shape (n_source_samples, n_features)
Feature data from the source domain.
- xt : ndarray of shape (n_target_samples, n_features)
Feature data from the target domain.
- Returns:
- D : ndarray of shape (n_features, n_features)
Relaxed covariance difference matrix.
References
Ramin Nikzad-Langerodi et al., “Domain-Invariant Regression under Beer-Lambert’s Law”, Proc. ICMLA, 2019.
Examples
>>> import numpy as np
>>> from diPLSlib.functions import convex_relaxation
>>> xs = np.random.random((100, 10))
>>> xt = np.random.random((100, 10))
>>> D = convex_relaxation(xs, xt)
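The relaxation described for convex_relaxation can be sketched in plain NumPy. This is a minimal illustration, not the library's implementation: the helper name convex_relaxation_sketch is hypothetical, and it assumes the covariance difference is formed from the sample covariances of the two domains.

```python
import numpy as np

def convex_relaxation_sketch(xs, xt):
    """Hypothetical helper: relaxed covariance difference of two domains."""
    # Assumption: sample covariance matrices (features are in columns)
    cov_s = np.cov(xs, rowvar=False)
    cov_t = np.cov(xt, rowvar=False)
    diff = cov_s - cov_t  # symmetric, but generally indefinite

    # Eigenvalue decomposition of the symmetric difference matrix
    eigvals, eigvecs = np.linalg.eigh(diff)

    # Invert the sign of negative eigenvalues and reconstruct the matrix
    return eigvecs @ np.diag(np.abs(eigvals)) @ eigvecs.T

rng = np.random.default_rng(0)
xs = rng.random((100, 10))
xt = rng.random((100, 10))
D = convex_relaxation_sketch(xs, xt)
```

Because all eigenvalues of D are non-negative, the quadratic form w @ D @ w is convex in w and is never smaller than the absolute value of w @ (cov_s - cov_t) @ w, which is the sense in which the relaxation acts as an upper bound.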
- diPLSlib.functions.dipals(x, y, xs, xt, A, l, heuristic: bool = False, target_domain=0, laplacian: bool = False)[source]
Perform (Multiple) Domain-Invariant Partial Least Squares (di-PLS) regression.
This method fits a PLS regression model using labeled source domain data and potentially unlabeled target domain data across multiple domains, aiming to build a model that generalizes well across different domains.
- Parameters:
- x : ndarray of shape (n_samples, n_features)
Labeled source domain data.
- y : ndarray of shape (n_samples, 1)
Response variable associated with the source domain.
- xs : ndarray of shape (n_source_samples, n_features)
Source domain feature data.
- xt : ndarray of shape (n_target_samples, n_features) or list of ndarray
Target domain feature data. Multiple domains can be provided as a list.
- A : int
Number of latent variables to use in the model.
- l : float or tuple of length A
Regularization parameter. If a single value is provided, the same regularization is applied to all latent variables.
- heuristic : bool, default=False
If True, automatically determine the regularization parameter to equally balance fitting Y and minimizing domain discrepancy.
- target_domain : int, default=0
Specifies which target domain the model should apply to, where 0 indicates the source domain.
- laplacian : bool, default=False
If True, uses a Laplacian matrix to regularize distances between matched calibration transfer samples in latent variable space.
- Returns:
- b : ndarray of shape (n_features, 1)
Regression coefficient vector.
- T : ndarray of shape (n_samples, A)
Training data projections (scores).
- Ts : ndarray of shape (n_source_samples, A)
Source domain projections (scores).
- Tt : ndarray of shape (n_target_samples, A) or list of ndarray
Target domain projections (scores).
- W : ndarray of shape (n_features, A)
Weight matrix.
- P : ndarray of shape (n_features, A)
Loadings matrix corresponding to x.
- Ps : ndarray of shape (n_features, A)
Loadings matrix corresponding to xs.
- Pt : ndarray of shape (n_features, A) or list of ndarray
Loadings matrix corresponding to xt.
- E : ndarray
Residuals of training data.
- Es : ndarray
Source domain residual matrix.
- Et : ndarray or list of ndarray
Target domain residual matrix.
- Ey : ndarray
Residuals of response variable in the source domain.
- C : ndarray of shape (A, 1)
Regression vector relating source projections to the response variable.
- opt_l : ndarray of shape (A,)
Heuristically determined regularization parameter for each latent variable.
- discrepancy : ndarray of shape (A,)
The variance discrepancy between source and target domain projections.
References
Ramin Nikzad-Langerodi et al., “Domain-Invariant Partial Least Squares Regression”, Analytical Chemistry, 2018.
Ramin Nikzad-Langerodi et al., “Domain-Invariant Regression under Beer-Lambert’s Law”, Proc. ICMLA, 2019.
Ramin Nikzad-Langerodi et al., “Domain adaptation for regression under Beer–Lambert’s law”, Knowledge-Based Systems, 2020.
Mikulasek et al., “Partial least squares regression with multiple domains”, Journal of Chemometrics, 2023.
Examples
>>> import numpy as np
>>> from diPLSlib.functions import dipals
>>> x = np.random.random((100, 10))
>>> y = np.random.random((100, 1))
>>> xs = np.random.random((50, 10))
>>> xt = np.random.random((50, 10))
>>> b, T, Ts, Tt, W, P, Ps, Pt, E, Es, Et, Ey, C, opt_l, discrepancy = dipals(x, y, xs, xt, 2, 0.1)
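To make the discrepancy output more concrete, the following sketch shows one plausible reading of "variance discrepancy between source and target domain projections". It is an assumption, not the library's code: the helper variance_discrepancy is hypothetical, each domain is centered on its own column means, and the deflation steps of PLS are ignored.

```python
import numpy as np

def variance_discrepancy(xs, xt, W):
    """Hypothetical helper: per-latent-variable gap in score variance."""
    # Assumption: project the centered data of each domain onto the weights
    Ts = (xs - xs.mean(axis=0)) @ W
    Tt = (xt - xt.mean(axis=0)) @ W
    # Absolute difference of the score variances, one value per column of W
    return np.abs(Ts.var(axis=0, ddof=1) - Tt.var(axis=0, ddof=1))

rng = np.random.default_rng(1)
xs = rng.random((50, 10))
W = rng.standard_normal((10, 2))

same = variance_discrepancy(xs, xs, W)           # identical domains
scaled = variance_discrepancy(xs, 2.0 * xs, W)   # target variance is 4x larger
```

Under this reading, identical domains give a discrepancy of zero, and larger values indicate that the latent variables separate the two domains more strongly.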
- diPLSlib.functions.edpls(x: ndarray, y: ndarray, n_components: int, epsilon: float, delta: float = 0.05, rng=None)[source]
(epsilon, delta)-Differentially Private Partial Least Squares Regression.
A Gaussian mechanism according to Balle & Wang (2018) is used to privately release weights \(\mathbf{W}\), scores \(\mathbf{T}\), and \(X/Y\)-loadings \(\mathbf{P}\)/\(\mathbf{c}\) from the PLS1 algorithm. For each latent variable, i.i.d. noise from \(\mathcal{N}(0,\sigma^2)\) with variance satisfying
\[\Phi\left( \frac{\Delta}{2\sigma} - \frac{\epsilon\sigma}{\Delta} \right) - e^{\epsilon} \Phi\left( -\frac{\Delta}{2\sigma} - \frac{\epsilon\sigma}{\Delta} \right)\leq \delta,\]
with \(\Phi(t) = \mathrm{P}[\mathcal{N}(0,1)\leq t]\) (i.e., the CDF of the standard univariate Gaussian distribution), is added to the weights, scores, and loadings. The sensitivity \(\Delta(\cdot)\) of the functions releasing the corresponding quantities is calculated as follows:
\[\Delta(w) = \sup_{(\mathbf{x}, y)} |y| \|\mathbf{x}\|_2\]
\[\Delta(t) \leq \sup_{\mathbf{x}} \|\mathbf{x}\|_2\]
\[\Delta(p) \leq \sup_{\mathbf{x}} \|\mathbf{x}\|_2\]
\[\Delta(c) \leq \sup_{y} |y|.\]
Note that in contrast to the Gaussian mechanism proposed in Dwork et al. (2006) and Dwork et al. (2014), the mechanism of Balle & Wang (2018) guarantees \((\epsilon, \delta)\)-differential privacy for any value of \(\epsilon > 0\) and not only for \(\epsilon \leq 1\).
- Parameters:
- x : ndarray of shape (n_samples, n_features)
Input data.
- y : ndarray of shape (n_samples, n_targets)
Target values.
- n_components : int
Number of latent variables.
- epsilon : float
Privacy loss parameter.
- delta : float, default=0.05
Failure probability.
- rng : numpy.random.Generator, optional
Random number generator.
- Returns:
- coef_ : ndarray of shape (n_features, n_targets)
Regression coefficients.
- x_weights_ : ndarray of shape (n_features, n_components)
X weights.
- x_loadings_ : ndarray of shape (n_features, n_components)
X loadings.
- y_loadings_ : ndarray of shape (n_components, n_targets)
Y loadings.
- x_scores_ : ndarray of shape (n_samples, n_components)
X scores.
- x_residuals_ : ndarray of shape (n_samples, n_features)
X residuals.
- y_residuals_ : ndarray of shape (n_samples, n_targets)
Y residuals.
References
Nikzad-Langerodi, et al. (2024). (epsilon,delta)-Differentially private partial least squares regression (unpublished).
Balle, B., & Wang, Y. X. (2018, July). Improving the Gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. In International Conference on Machine Learning (pp. 394-403). PMLR.
Examples
>>> from diPLSlib.functions import edpls
>>> import numpy as np
>>> x = np.random.rand(100, 10)
>>> y = np.random.rand(100, 1)
>>> coef_, x_weights_, x_loadings_, y_loadings_, x_scores_, x_residuals_, y_residuals_ = edpls(x, y, 2, epsilon=0.1, delta=0.05)
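The calibration condition on sigma in the edpls description can be solved numerically. The sketch below is not taken from the library: the names std_normal_cdf, privacy_gap, and calibrate_sigma are hypothetical, and the bisection relies on the left-hand side of the inequality decreasing as sigma grows.

```python
import math

def std_normal_cdf(t):
    """CDF of the standard univariate Gaussian, Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def privacy_gap(sigma, sensitivity, epsilon):
    """Left-hand side of the calibration inequality for a given sigma."""
    a = sensitivity / (2.0 * sigma)
    b = epsilon * sigma / sensitivity
    return std_normal_cdf(a - b) - math.exp(epsilon) * std_normal_cdf(-a - b)

def calibrate_sigma(sensitivity, epsilon, delta, tol=1e-10):
    """Approximate the smallest sigma with privacy_gap(sigma) <= delta."""
    lo, hi = 0.0, sensitivity
    # Grow the upper bracket until the inequality holds
    while privacy_gap(hi, sensitivity, epsilon) > delta:
        lo, hi = hi, 2.0 * hi
    # Bisect between a violating (lo) and a satisfying (hi) value
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if privacy_gap(mid, sensitivity, epsilon) > delta:
            lo = mid
        else:
            hi = mid
    return hi

# Noise scale for sensitivity 1 with the parameters used in the example above
sigma = calibrate_sigma(sensitivity=1.0, epsilon=0.1, delta=0.05)
```

Since the inequality depends on sigma only through the ratio sigma / sensitivity, doubling the sensitivity doubles the calibrated noise scale.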
- diPLSlib.functions.kdapls(x: ndarray, y: ndarray, xs: ndarray, xt, A: int, l, kernel_params: dict = {'gamma': 10, 'type': 'rbf'})[source]
Perform Kernel Domain Adaptive Partial Least Squares (kda-PLS) regression.
This method fits a kernel PLS regression model using labeled source domain data and potentially unlabeled target domain data. In contrast to di-PLS, kda-PLS aligns the source and target distributions in a reproducing kernel Hilbert space (RKHS) in a non-parametric way, thus making no assumptions about the underlying data distributions.
Mathematically, for each latent variable (LV), kda‐PLS finds a weight vector \(\mathbf{w}\) (with \(\mathbf{w}^T\mathbf{w} = 1\)) that maximizes
\[\max_{\mathbf{w} : \mathbf{w}^T\mathbf{w} = 1} \Biggl( \mathbf{w}^T K(X_s, X_s)^T Y Y^T K(X_s, X_s) \mathbf{w} - \gamma \mathbf{w}^T K(X_{st}, X_s)^T H L H K(X_{st}, X_s) \mathbf{w} \Biggr),\]
where
\(K(X_s, X_s)\) is the kernel matrix computed from the source-domain data,
\(K(X_{st}, X_s)\) is the kernel matrix computed between the combined source/target data \(X_{st} = [X_s; X_t]\) and the source-domain data,
\(Y\) is the response variable,
\(H\) denotes the centering matrix,
\(L\) is the Laplacian matrix defined such that \(L_{ij}=1\) if the i-th and j-th sample in \(X_{st}\) belong to the same domain and 0 otherwise,
\(\gamma\) is the regularization parameter that balances maximizing the covariance between \(K(X_s, X_s)\) and \(Y\) with minimizing the domain discrepancy.
- Parameters:
- x : ndarray of shape (n_samples, n_features)
Labeled source domain data.
- y : ndarray of shape (n_samples, 1)
Response variable associated with the source domain.
- xs : ndarray of shape (n_source_samples, n_features)
Source domain feature data.
- xt : ndarray of shape (n_target_samples, n_features) or list of ndarray
Target domain feature data. Multiple domains can be provided as a list.
- A : int
Number of latent variables to use in the model.
- l : float or tuple of length A
Regularization parameter. If a single value is provided, the same regularization is applied to all latent variables.
- kernel_params : dict, default={"type": "rbf", "gamma": 10}
Kernel parameters. The dictionary must contain the following keys:
- "type" : str, default="rbf"
Type of kernel to use. Supported types are "rbf", "linear", and "primal".
- "gamma" : float, default=10
Kernel coefficient for the RBF kernel.
- Returns:
- b : ndarray of shape (n_features, 1)
Regression coefficient vector.
- bst : ndarray of shape (n_features, 1)
Regression coefficient vector for the target domain.
- T : ndarray of shape (n_samples, A)
Training data projections (scores).
- Tst : ndarray of shape (n_source_samples + n_target_samples, A)
Source and target domain projections (scores).
- W : ndarray of shape (m, A)
Weight matrix.
- P : ndarray of shape (m, A)
Loadings matrix for source domain.
- Pst : ndarray of shape (m, A)
Loadings matrix for source and target domains.
- E : ndarray
Residuals for source domain.
- Est : ndarray
Residuals for source and target domains.
- Ey : ndarray
Residuals of response variable.
- C : ndarray of shape (A, q)
Regression vector relating projections to the response variable.
- centering : dict
Dictionary containing centering information.
References
Huang, G., Chen, X., Li, L., Chen, X., Yuan, L., & Shi, W. (2020). Domain adaptive partial least squares regression. Chemometrics and Intelligent Laboratory Systems, 201, 103986.
Examples
>>> import numpy as np
>>> from diPLSlib.functions import kdapls
>>> x = np.random.random((100, 10))
>>> y = np.random.random((100, 1))
>>> xs = np.random.random((50, 10))
>>> xt = np.random.random((50, 10))
>>> b, bst, T, Tst, W, P, Pst, E, Est, Ey, C, centering = kdapls(x, y, xs, xt, 2, 0.5)
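The objective in the kdapls description can be evaluated for a candidate weight vector with a few lines of NumPy. This is an illustrative sketch only: rbf_kernel and kdapls_objective are hypothetical helpers, the RBF kernel is assumed to have the form exp(-kernel_gamma * squared distance), and the matrix L is built exactly as stated in the description (1 for same-domain pairs, 0 otherwise). The argument reg plays the role of the regularization parameter written as gamma in the formula, which is distinct from the kernel coefficient.

```python
import numpy as np

def rbf_kernel(a, b, kernel_gamma):
    """Assumed RBF form: exp(-kernel_gamma * ||a_i - b_j||^2)."""
    sq = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-kernel_gamma * np.maximum(sq, 0.0))

def kdapls_objective(w, Xs, Xt, Y, L, reg, kernel_gamma=10.0):
    """Hypothetical helper: value of the documented objective at w."""
    Xst = np.vstack([Xs, Xt])                    # combined source/target data
    Kss = rbf_kernel(Xs, Xs, kernel_gamma)       # K(X_s, X_s)
    Ksts = rbf_kernel(Xst, Xs, kernel_gamma)     # K(X_st, X_s)
    n = Xst.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    fit_term = w @ Kss.T @ Y @ Y.T @ Kss @ w
    align_term = w @ Ksts.T @ H @ L @ H @ Ksts @ w
    return float(fit_term - reg * align_term)

rng = np.random.default_rng(2)
Xs, Xt = rng.random((20, 5)), rng.random((15, 5))
Y = rng.random((20, 1))

# L as described: 1 where both samples of X_st come from the same domain
domain = np.r_[np.zeros(20), np.ones(15)]
L = (domain[:, None] == domain[None, :]).astype(float)

w = rng.standard_normal(20)
w /= np.linalg.norm(w)                           # enforce w^T w = 1
value = kdapls_objective(w, Xs, Xt, Y, L, reg=0.5)
```

With reg=0 the objective reduces to the squared covariance-type term between the kernel scores and Y, so the second term is what trades predictive fit against domain alignment.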
- diPLSlib.functions.transfer_laplacian(x: ndarray, y: ndarray) -> ndarray [source]
Construct a Laplacian matrix for calibration transfer problems.
- Parameters:
- x : ndarray of shape (n_samples, n_features)
Data samples from device 1.
- y : ndarray of shape (n_samples, n_features)
Data samples from device 2.
- Returns:
- L : ndarray of shape (2 * n_samples, 2 * n_samples)
The Laplacian matrix for the calibration transfer problem.
References
Nikzad‐Langerodi, R., & Sobieczky, F. (2021). Graph‐based calibration transfer. Journal of Chemometrics, 35(4), e3319.
Examples
>>> import numpy as np
>>> from diPLSlib.functions import transfer_laplacian
>>> x = np.array([[1, 2], [3, 4]])
>>> y = np.array([[2, 3], [4, 5]])
>>> L = transfer_laplacian(x, y)
>>> print(L)
[[ 1.  0. -1. -0.]
 [ 0.  1. -0. -1.]
 [-1. -0.  1.  0.]
 [-0. -1.  0.  1.]]
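The matrix printed in the example above has the structure of a graph Laplacian in which sample i of device 1 is linked only to sample i of device 2. The sketch below rebuilds that structure from a degree and an adjacency matrix. The helper paired_laplacian is hypothetical, and the library may arrive at the same matrix by a different computation.

```python
import numpy as np

def paired_laplacian(n_pairs):
    """Hypothetical helper: Laplacian of a graph linking matched samples."""
    eye = np.eye(n_pairs)
    zero = np.zeros((n_pairs, n_pairs))
    # Edge between sample i of device 1 and sample i of device 2
    adjacency = np.block([[zero, eye], [eye, zero]])
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency

L = paired_laplacian(2)

# For stacked score vectors, the quadratic form equals the summed squared
# distance between matched samples
ts = np.array([0.5, -1.0])
tt = np.array([0.2, 0.3])
t = np.concatenate([ts, tt])
pair_distance = t @ L @ t
```

This quadratic-form identity is why penalizing the Laplacian term pulls matched calibration transfer samples together in latent variable space.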
diPLSlib.models module
diPLSlib model classes
DIPLS base class
GCTPLS class
EDPLS class
KDAPLS class
- class diPLSlib.models.DIPLS(A=2, l=0, centering=True, heuristic=False, target_domain=0, rescale='Target')[source]
Bases: RegressorMixin, BaseEstimator
Domain-Invariant Partial Least Squares (DIPLS) algorithm for domain adaptation.
This class implements the DIPLS algorithm, which is designed to align feature distributions across different domains while predicting the target variable y. It supports multiple source and target domains through domain-specific feature transformations.
- Parameters:
- A : int, default=2
Number of latent variables to use in the model.
- l : float or tuple of length A, default=0
Regularization parameter. If a single value is provided, the same regularization is applied to all latent variables.
- centering : bool, default=True
If True, source and target domain data are mean-centered.
- heuristic : bool, default=False
If True, the regularization parameter is set to a heuristic value that balances fitting the output variable y and minimizing domain discrepancy.
- target_domain : int, default=0
If multiple target domains are passed, target_domain specifies for which of the target domains the model should apply. If target_domain=0, the model applies to the source domain, if target_domain=1, it applies to the first target domain, and so on.
- rescale : str or ndarray, default='Target'
Determines rescaling of the test data. If 'Target' or 'Source', the test data will be rescaled to the mean of xt or xs, respectively. If an ndarray is provided, the test data will be rescaled to the mean of the provided array.
- Attributes:
- n_ : int
Number of samples in X.
- ns_ : int
Number of samples in xs.
- nt_ : int
Number of samples in xt.
- n_features_in_ : int
Number of features in X.
- mu_ : ndarray of shape (n_features,)
Mean of columns in X.
- mu_s_ : ndarray of shape (n_features,)
Mean of columns in xs.
- mu_t_ : ndarray of shape (n_features,) or list of ndarray
Mean of columns in xt, averaged per target domain if multiple domains exist.
- b_ : ndarray of shape (n_features, 1)
Regression coefficient vector.
- b0_ : float
Intercept of the regression model.
- T_ : ndarray of shape (n_samples, A)
Training data projections (scores).
- Ts_ : ndarray of shape (n_source_samples, A)
Source domain projections (scores).
- Tt_ : ndarray of shape (n_target_samples, A) or list of ndarray
Target domain projections (scores).
- W_ : ndarray of shape (n_features, A)
Weight matrix.
- P_ : ndarray of shape (n_features, A)
Loadings matrix corresponding to X.
- Ps_ : ndarray of shape (n_features, A)
Loadings matrix corresponding to xs.
- Pt_ : ndarray of shape (n_features, A) or list of ndarray
Loadings matrix corresponding to xt.
- E_ : ndarray
Residuals of training data.
- Es_ : ndarray
Source domain residual matrix.
- Et_ : ndarray or list of ndarray
Target domain residual matrix.
- Ey_ : ndarray
Residuals of response variable in the source domain.
- C_ : ndarray of shape (A, 1)
Regression vector relating source projections to the response variable.
- opt_l_ : ndarray of shape (A,)
Heuristically determined regularization parameter for each latent variable.
- discrepancy_ : ndarray of shape (A,)
The variance discrepancy between source and target domain projections.
- is_fitted_ : bool
Whether the model has been fitted to data.
Methods
- fit(X, y[, xs, xt]): Fit the DIPLS model.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- predict(X): Predict y using the fitted DIPLS model.
- score(X, y[, sample_weight]): Return the coefficient of determination of the prediction.
- set_fit_request(*[, xs, xt]): Request metadata passed to the fit method.
- set_params(**params): Set the parameters of this estimator.
- set_score_request(*[, sample_weight]): Request metadata passed to the score method.
References
Ramin Nikzad-Langerodi et al., “Domain-Invariant Partial Least Squares Regression”, Analytical Chemistry, 2018.
Ramin Nikzad-Langerodi et al., “Domain-Invariant Regression under Beer-Lambert’s Law”, Proc. ICMLA, 2019.
Ramin Nikzad-Langerodi et al., “Domain adaptation for regression under Beer–Lambert’s law”, Knowledge-Based Systems, 2020.
Mikulasek et al., “Partial least squares regression with multiple domains”, Journal of Chemometrics, 2023.
Examples
>>> import numpy as np
>>> from diPLSlib.models import DIPLS
>>> x = np.random.rand(100, 10)
>>> y = np.random.rand(100, 1)
>>> xs = np.random.rand(100, 10)
>>> xt = np.random.rand(50, 10)
>>> model = DIPLS(A=5, l=10)
>>> model.fit(x, y, xs, xt)
DIPLS(A=5, l=10)
>>> xtest = np.array([5, 7, 4, 3, 2, 1, 6, 8, 9, 10]).reshape(1, -1)
>>> yhat = model.predict(xtest)
- fit(X, y, xs=None, xt=None, **kwargs)[source]
Fit the DIPLS model.
This method fits the domain-invariant partial least squares (di-PLS) model using the provided source and target domain data. It can handle both single and multiple target domains.
- Parameters:
- X : ndarray of shape (n_samples, n_features)
Labeled input data from the source domain.
- y : ndarray of shape (n_samples, 1)
Response variable corresponding to the input data X.
- xs : ndarray of shape (n_samples_source, n_features)
Source domain X-data. If not provided, defaults to X.
- xt : ndarray of shape (n_samples_target, n_features) or list of ndarray
Target domain X-data. Can be a single target domain or a list of arrays representing multiple target domains. If not provided, defaults to X.
- **kwargs : dict, optional
Additional keyword arguments to pass to the model (e.g., for model selection purposes).
- Returns:
- self : object
Fitted model instance.
- predict(X)[source]
Predict y using the fitted DIPLS model.
This method predicts the response variable for the provided test data using the fitted domain-invariant partial least squares (di-PLS) model.
- Parameters:
- X : ndarray of shape (n_samples, n_features)
Test data matrix to perform the prediction on.
- Returns:
- yhat : ndarray of shape (n_samples,)
Predicted response values for the test data.
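How the rescale option of the class interacts with prediction can be pictured as centering the test data on a reference mean before applying the linear model. The sketch below is an assumption about how b_, b0_, and the stored means combine, not the library's code. The helper predict_sketch and the argument reference_mean (standing in for the mean of xt, the mean of xs, or the mean of a user-supplied array) are hypothetical.

```python
import numpy as np

def predict_sketch(X, b, b0, reference_mean):
    """Hypothetical helper: center on a reference mean, then apply b and b0."""
    X_centered = X - reference_mean          # rescale test data to the chosen mean
    return (X_centered @ b + b0).ravel()     # one prediction per test sample

X = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[1.0], [1.0]])                 # regression coefficient vector
b0 = 0.5                                     # intercept
reference_mean = np.array([1.0, 1.0])        # e.g. column means of target data

yhat = predict_sketch(X, b, b0, reference_mean)
```

Choosing the target-domain mean as the reference is what lets a model calibrated on source data be applied to test samples that are shifted relative to the source domain.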
- set_fit_request(*, xs: bool | None | str = '$UNCHANGED$', xt: bool | None | str = '$UNCHANGED$') -> DIPLS
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- xs : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for xs parameter in fit.
- xt : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for xt parameter in fit.
- Returns:
- self : object
The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> DIPLS
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
- Returns:
- self : object
The updated object.
- class diPLSlib.models.EDPLS(A: int = 2, epsilon: float = 1.0, delta: float = 0.05, centering: bool = True, random_state=None)[source]
Bases: DIPLS
(epsilon, delta)-Differentially Private Partial Least Squares Regression.
This class implements the (epsilon, delta)-Differentially Private Partial Least Squares (PLS) regression method by Nikzad-Langerodi et al. (2024, unpublished).
- Parameters:
- A : int, default=2
Number of latent variables.
- epsilon : float, default=1.0
Privacy loss parameter.
- delta : float, default=0.05
Failure probability.
- centering : bool, default=True
If True, the data will be centered before fitting the model.
- random_state : int, RandomState instance or None, default=None
Controls the randomness of the noise added for differential privacy.
- Attributes:
- n_ : int
Number of samples in the training data.
- n_features_in_ : int
Number of features in the training data.
- x_mean_ : ndarray of shape (n_features,)
Estimated mean of each feature.
- coef_ : ndarray of shape (n_features, 1)
Estimated regression coefficients.
- y_mean_ : float
Estimated intercept.
- x_scores_ : ndarray of shape (n_samples, A)
X scores.
- x_loadings_ : ndarray of shape (n_features, A)
X loadings.
- x_weights_ : ndarray of shape (n_features, A)
X weights.
- y_loadings_ : ndarray of shape (A, 1)
Y loadings.
- x_residuals_ : ndarray of shape (n_samples, n_features)
X residuals.
- y_residuals_ : ndarray of shape (n_samples, 1)
Y residuals.
- is_fitted_ : bool
True if the model has been fitted.
Methods
- fit(X, y, **kwargs): Fit the EDPLS model.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- predict(x): Predict y using the fitted EDPLS model.
- score(X, y[, sample_weight]): Return the coefficient of determination of the prediction.
- set_fit_request(*[, xs, xt]): Request metadata passed to the fit method.
- set_params(**params): Set the parameters of this estimator.
- set_predict_request(*[, x]): Request metadata passed to the predict method.
- set_score_request(*[, sample_weight]): Request metadata passed to the score method.
References
Nikzad-Langerodi, et al. (2024). (epsilon,delta)-Differentially private partial least squares regression (unpublished).
Balle, B., & Wang, Y. X. (2018, July). Improving the Gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. In International Conference on Machine Learning (pp. 394-403). PMLR.
Examples
>>> from diPLSlib.models import EDPLS
>>> import numpy as np
>>> x = np.random.rand(100, 10)
>>> y = np.random.rand(100, 1)
>>> model = EDPLS(A=5, epsilon=0.1, delta=0.01)
>>> model.fit(x, y)
EDPLS(A=5, delta=0.01, epsilon=0.1)
>>> xtest = np.array([5, 7, 4, 3, 2, 1, 6, 8, 9, 10]).reshape(1, -1)
>>> yhat = model.predict(xtest)
- fit(X: ndarray, y: ndarray, **kwargs)[source]
Fit the EDPLS model.
- Parameters:
- X : ndarray of shape (n_samples, n_features)
Training data.
- y : ndarray of shape (n_samples,)
Target values.
- **kwargs : dict, optional
Additional keyword arguments to pass to the model (e.g., for model selection purposes).
- Returns:
- self : object
Fitted model instance.
- predict(x: ndarray)[source]
Predict y using the fitted EDPLS model.
- Parameters:
- x : ndarray of shape (n_samples_test, n_features)
Test data matrix to perform the prediction on.
- Returns:
- yhat : ndarray of shape (n_samples_test,)
Predicted response values for the test data.
- set_predict_request(*, x: bool | None | str = '$UNCHANGED$') -> EDPLS
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- x : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for x parameter in predict.
- Returns:
- self : object
The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> EDPLS
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
- Returns:
- self : object
The updated object.
- class diPLSlib.models.GCTPLS(A=2, l=0, centering=True, heuristic=False, rescale='Target')[source]
Bases: DIPLS
Graph-based Calibration Transfer Partial Least Squares (GCT-PLS).
This method minimizes the distance between source (xs) and target (xt) domain data pairs in the latent variable space while fitting the response.
- Parameters:
- A : int, default=2
Number of latent variables to use in the model.
- l : float or tuple of length A, default=0
Regularization parameter. If a single value is provided, the same regularization is applied to all latent variables.
- centering : bool, default=True
If True, source and target domain data are mean-centered before fitting.
- heuristic : bool, default=False
If True, the regularization parameter is set to a heuristic value aimed at balancing model fitting quality for the response variable y while minimizing discrepancies between domain representations.
- rescale : str or ndarray, default='Target'
Determines rescaling of the test data. If 'Target' or 'Source', the test data will be rescaled to the mean of xt or xs, respectively. If an ndarray is provided, the test data will be rescaled to the mean of the provided array.
- Attributes:
- n_ : int
Number of samples in X.
- ns_ : int
Number of samples in xs.
- nt_ : int
Number of samples in xt.
- n_features_in_ : int
Number of features in X.
- mu_ : ndarray of shape (n_features,)
Mean of columns in X.
- mu_s_ : ndarray of shape (n_features,)
Mean of columns in xs.
- mu_t_ : ndarray of shape (n_features,)
Mean of columns in xt.
- b_ : ndarray of shape (n_features, 1)
Regression coefficient vector.
- b0_ : float
Intercept of the regression model.
- T_ : ndarray of shape (n_samples, A)
Training data projections (scores).
- Ts_ : ndarray of shape (n_source_samples, A)
Source domain projections (scores).
- Tt_ : ndarray of shape (n_target_samples, A)
Target domain projections (scores).
- W_ : ndarray of shape (n_features, A)
Weight matrix.
- P_ : ndarray of shape (n_features, A)
Loadings matrix corresponding to X.
- Ps_ : ndarray of shape (n_features, A)
Loadings matrix corresponding to xs.
- Pt_ : ndarray of shape (n_features, A)
Loadings matrix corresponding to xt.
- E_ : ndarray of shape (n_source_samples, n_features)
Residuals of source domain data.
- Es_ : ndarray of shape (n_source_samples, n_features)
Source domain residual matrix.
- Et_ : ndarray of shape (n_target_samples, n_features)
Target domain residual matrix.
- Ey_ : ndarray of shape (n_source_samples, 1)
Residuals of response variable in the source domain.
- C_ : ndarray of shape (A, 1)
Regression vector relating source projections to the response variable.
- opt_l_ : ndarray of shape (A,)
Heuristically determined regularization parameter for each latent variable.
- discrepancy_ : ndarray
The variance discrepancy between source and target domain projections.
- is_fitted_ : bool
Whether the model has been fitted to data.
Methods
- fit(X, y[, xs, xt]): Fit the GCT-PLS model to data.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- predict(X): Predict y using the fitted DIPLS model.
- score(X, y[, sample_weight]): Return the coefficient of determination of the prediction.
- set_fit_request(*[, xs, xt]): Request metadata passed to the fit method.
- set_params(**params): Set the parameters of this estimator.
- set_score_request(*[, sample_weight]): Request metadata passed to the score method.
References
Nikzad‐Langerodi, R., & Sobieczky, F. (2021). Graph‐based calibration transfer. Journal of Chemometrics, 35(4), e3319.
Examples
>>> import numpy as np
>>> from diPLSlib.models import GCTPLS
>>> x = np.random.rand(100, 10)
>>> y = np.random.rand(100, 1)
>>> xs = np.random.rand(80, 10)
>>> xt = np.random.rand(80, 10)
>>> model = GCTPLS(A=3, l=(2, 5, 7))
>>> model.fit(x, y, xs, xt)
GCTPLS(A=3, l=(2, 5, 7))
>>> xtest = np.array([5, 7, 4, 3, 2, 1, 6, 8, 9, 10]).reshape(1, -1)
>>> yhat = model.predict(xtest)
- fit(X, y, xs=None, xt=None, **kwargs)[source]
Fit the GCT-PLS model to data.
- Parameters:
- X : ndarray of shape (n_samples, n_features)
Labeled input data from the source domain.
- y : ndarray of shape (n_samples, 1)
Response variable corresponding to the input data X.
- xs : ndarray of shape (n_sample_pairs, n_features)
Source domain X-data. If not provided, defaults to X.
- xt : ndarray of shape (n_sample_pairs, n_features)
Target domain X-data. If not provided, defaults to X.
- **kwargs : dict, optional
Additional keyword arguments to pass to the model (e.g., for model selection purposes).
- Returns:
- self : object
Fitted model instance.
- set_fit_request(*, xs: bool | None | str = '$UNCHANGED$', xt: bool | None | str = '$UNCHANGED$') -> GCTPLS
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- xs : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for xs parameter in fit.
- xt : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for xt parameter in fit.
- Returns:
- self : object
The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> GCTPLS
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
- Returns:
- self : object
The updated object.
- class diPLSlib.models.KDAPLS(A=2, l=0, kernel_params=None, target_domain=0)[source]
Bases: RegressorMixin, BaseEstimator
Kernel Domain Adaptive Partial Least Squares (KDAPLS) algorithm for domain adaptation.
This class implements KDAPLS by calling the kdapls function from functions.py. KDAPLS projects both source and target data into a reproducing kernel Hilbert space (RKHS) and aligns domains in that space while fitting the regression model on labeled data.
- Parameters:
- A : int, default=2
Number of latent variables to use in the model.
- l : float or tuple, default=0
Regularization parameter. If a single value is provided, the same regularization is applied to all latent variables.
- kernel_params : dict, optional
Dictionary specifying the kernel type and parameters. Accepted keys:
- "type" : str, default="rbf"
Kernel type, can be "rbf", "linear", or "primal".
- "gamma" : float, default=0.0001
Kernel coefficient for RBF kernels.
- target_domain : int, default=0
Specifies which domain’s coefficient vector is used for predictions.
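To make the "gamma" key concrete, the sketch below computes an RBF kernel matrix with NumPy, assuming the common definition k(x, z) = exp(-gamma * ||x - z||^2). Whether kdapls uses exactly this parameterization is an assumption, and `rbf_kernel` is a hypothetical helper rather than part of diPLSlib.

```python
import numpy as np


def rbf_kernel(a, b, gamma=0.0001):
    """RBF kernel matrix between the rows of a (n, p) and the rows of b (m, p).

    Assumes k(x, z) = exp(-gamma * ||x - z||^2); illustrative only.
    """
    # Squared Euclidean distances via ||x||^2 + ||z||^2 - 2 x.z
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-gamma * np.maximum(sq, 0.0))


rng = np.random.default_rng(0)
xs = rng.random((80, 10))   # source-domain samples
xt = rng.random((50, 10))   # target-domain samples

K_ss = rbf_kernel(xs, xs, gamma=0.001)   # source against source, symmetric up to round-off
K_ts = rbf_kernel(xt, xs, gamma=0.001)   # target samples against source samples
print(K_ss.shape, K_ts.shape)
```

Under this definition, a small gamma makes the kernel vary slowly with distance, so for features on the unit scale all entries stay close to 1.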
- Attributes:
- n_ : int
Number of samples in X.
- n_features_in_ : int
Number of features in X.
- ns_ : int
Number of samples in xs.
- nt_ : int or list
Number of samples in xt. If multiple target domains are provided, this is a list of sample counts for each domain.
- coef_ : ndarray of shape (n_features, 1)
Regression coefficient vector used for predictions.
- X_ : ndarray of shape (n_, n_features_in_)
Training data used for fitting the model.
- xs_ : ndarray of shape (ns_, n_features_in_)
(Unlabeled) source domain data used for fitting the model.
- xt_ : ndarray of shape (nt_, n_features_in_)
(Unlabeled) target domain data used for fitting the model.
- y_mean_ : float
Mean of the training response variable.
- centering_ : dict
Dictionary of stored centering information for kernel operations.
- is_fitted_ : bool
Whether the model has been fitted to data.
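The centering_ attribute is described only as stored centering information for kernel operations. As background, kernel methods commonly center the kernel matrix in feature space, as in the Schölkopf et al. reference listed for this class. The sketch below shows that standard double-centering with NumPy. That KDAPLS stores comparable quantities is an assumption, and `center_kernel` is a hypothetical helper.

```python
import numpy as np


def center_kernel(K):
    """Double-center a square kernel matrix: K_c = K - 1K - K1 + 1K1, with 1 = ones / n."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n


rng = np.random.default_rng(0)
x = rng.random((6, 3))
K = x @ x.T                      # linear kernel, used here only to keep the check simple
K_c = center_kernel(K)

# For a linear kernel, centering K equals building the kernel from mean-centered data
x_c = x - x.mean(axis=0)
print(np.allclose(K_c, x_c @ x_c.T))
```

The final check works because centering the kernel is the same as subtracting the feature-space mean from every sample before taking inner products.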
Methods
- fit(X, y[, xs, xt]): Fit the KDAPLS model.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- predict(X): Predict with KDAPLS model.
- score(X, y[, sample_weight]): Return the coefficient of determination of the prediction.
- set_fit_request(*[, xs, xt]): Request metadata passed to the fit method.
- set_params(**params): Set the parameters of this estimator.
- set_score_request(*[, sample_weight]): Request metadata passed to the score method.
References
Huang, G., Chen, X., Li, L., Chen, X., Yuan, L., & Shi, W. (2020). Domain adaptive partial least squares regression. Chemometrics and Intelligent Laboratory Systems, 201, 103986.
Schölkopf, B., Smola, A., & Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5), 1299-1319.
Examples
>>> import numpy as np
>>> from diPLSlib.models import KDAPLS
>>> x = np.random.rand(100, 10)   # labeled source-domain data
>>> y = np.random.rand(100, 1)    # response for x
>>> xs = np.random.rand(80, 10)   # source-domain data
>>> xt = np.random.rand(50, 10)   # target-domain data
>>> model = KDAPLS(A=2, l=0.5, kernel_params={"type": "rbf", "gamma": 0.001})
>>> model.fit(x, y, xs, xt)
KDAPLS(kernel_params={'gamma': 0.001, 'type': 'rbf'}, l=0.5)
>>> xtest = np.random.rand(5, 10)
>>> yhat = model.predict(xtest)
- fit(X, y, xs=None, xt=None, **kwargs)[source]
Fit the KDAPLS model.
- Parameters:
- X : np.ndarray
Labeled source domain data (usually the same as xs).
- y : np.ndarray
Corresponding labels for X.
- xs : np.ndarray
Source domain data.
- xt : np.ndarray
Target domain data.
- **kwargs : dict, optional
Additional keyword arguments to pass to the model (e.g., for model selection purposes).
- Returns:
- self : object
Fitted estimator.
- predict(X)[source]
Predict with KDAPLS model.
- Parameters:
- X : ndarray of shape (n_samples, n_features)
Test data matrix to perform the prediction on.
- Returns:
- yhat : ndarray of shape (n_samples,)
Predicted response values for the test data.
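The class example passes y as a column of shape (n, 1), while predict is documented to return a 1-D array. Subtracting the two directly would broadcast to an (n, n) matrix, so it is safer to flatten both before computing errors. The sketch below uses placeholder arrays in place of real model output and computes the coefficient of determination by hand. `r2` is a hypothetical helper, not a diPLSlib function.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot, on flattened inputs."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


y_test = np.array([[1.0], [2.0], [3.0], [4.0]])   # column vector, shape (4, 1)
yhat = np.array([1.1, 1.9, 3.2, 3.8])             # stand-in for model.predict(...), shape (4,)

# Pitfall: (4, 1) - (4,) broadcasts to (4, 4) instead of giving 4 residuals
print((y_test - yhat).shape)

# Flattening inside r2 avoids the problem
print(r2(y_test, yhat))
```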
- set_fit_request(*, xs: bool | None | str = '$UNCHANGED$', xt: bool | None | str = '$UNCHANGED$') -> KDAPLS
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- xs : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the xs parameter in fit.
- xt : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the xt parameter in fit.
- Returns:
- self : object
The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> KDAPLS
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
- Returns:
- self : object
The updated object.