User API

This page shows only the APIs relevant to users. See Developer API for the full list of APIs.

NodeGAM Packages

class nodegam.sklearn.NodeGAMClassifier(in_features, cat_features=None, validation_size=0.15, quantile_dist='normal', quantile_noise=0.001, name=None, seed=1377, arch='GAM', ga2m=1, num_classes=1, num_trees=200, num_layers=2, depth=3, addi_tree_dim=0, colsample_bytree=0.5, output_dropout=0, last_dropout=0.3, l2_lambda=0, dim_att=8, n_last_checkpoints=5, batch_size=2048, lr=0.01, lr_warmup_steps=100, lr_decay_steps=300, early_stopping_steps=2000, max_steps=10000, max_time=72000, anneal_steps=2000, report_frequency=100, fp16=0, device='cuda', objective='ce_loss', verbose=1)

Bases: NodeGAMBase

A NodeGAM classifier that follows the sklearn interface for training.

Parameters

in_features (int) – number of input features.
cat_features – the names of the categorical features. They must match the column names of X.
validation_size – the fraction of the data held out for validation.
quantile_dist – choose between [‘normal’, ‘uniform’]. Data is projected onto this distribution. See the output_distribution parameter of sklearn QuantileTransformer.
quantile_noise – fits QuantileTransformer on the data with added Gaussian noise of std = quantile_noise * data.std, which makes discrete values more separable. Note that the noise is only used in QuantileTransformer.fit(); it is NOT added to the transformed data.
name – the model’s name. It is used to store checkpoints under logs/{name}. If not specified, a temporary name is generated randomly.
seed – random seed.
arch – choose between [‘GAM’, ‘GAMAtt’]. GAMAtt is the architecture with attention. GAMAtt often works better on large datasets, while GAM often works better on smaller ones.
ga2m – if 0, model a GAM (main effects only). If 1, model a GA2M (main effects plus pairwise interactions).
num_classes – number of target classes. Set to 1 for binary classification, or to the number of classes (> 2) for multi-class classification. Visualization is not yet available for the multi-class setup.
num_trees – number of trees per layer.
num_layers – number of layers of trees.
depth – depth of the tree. Should be at least 2 if ga2m=1.
addi_tree_dim – additional dimension of tree’s output. Default: 0.
colsample_bytree – the random fraction of features available to each tree, as in the xgboost package. If less than 1, each tree is trained on only a random subset of the features.
output_dropout – the dropout rate on the output of each tree.
last_dropout – the dropout rate on the weight of the last linear layer.
l2_lambda – the l2 penalty coefficient on the outputs of trees.
dim_att – the dimension of the attention embedding.
n_last_checkpoints – number of most recent checkpoints to average.
batch_size – batch size. Should be bigger than 1024.
lr – the learning rate.
lr_warmup_steps – number of initial steps over which the learning rate is warmed up.
lr_decay_steps – halve the learning rate if the validation metric does not improve for this many steps.
early_stopping_steps – stop training early if the validation metric does not improve for this many steps.
max_steps – maximum number of optimization steps.
max_time – maximum training time in seconds.
anneal_steps – number of temperature-annealing steps. After this many steps, EntMax becomes Max.
report_frequency – report the training progress every this many steps.
fp16 – if 1, train with fp16 (half precision).
device – choose from [‘cpu’, ‘cuda’]. Default: ‘cuda’.
objective – the evaluation objective. For binary classification (num_classes=1), choose from [‘ce_loss’, ‘negative_auc’, ‘error_rate’]. For multi-class classification (num_classes > 2), only [‘ce_loss’, ‘error_rate’] are allowed.
verbose – if 1, print the training progress.
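The three binary objective names suggest how the validation metric is computed from the model's logits. The sketch below assumes standard definitions (mean binary cross-entropy, 0/1 error at a 0.5 probability threshold, and AUC negated so that lower is better); the function names are hypothetical and this is not NodeGAM's implementation.

```python
import math


def ce_loss(logits, labels):
    # Mean binary cross-entropy from logits (labels are 0/1), written in the
    # numerically stable form max(z, 0) - y*z + log(1 + exp(-|z|)).
    total = 0.0
    for z, y in zip(logits, labels):
        total += max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def error_rate(logits, labels):
    # A logit > 0 means P(y=1) > 0.5, so the predicted class is 1.
    wrong = sum(int((z > 0) != bool(y)) for z, y in zip(logits, labels))
    return wrong / len(logits)


def negative_auc(logits, labels):
    # AUC = fraction of (positive, negative) pairs ranked correctly, ties count 0.5.
    # Negated so that, like the other two objectives, lower is better.
    pos = [z for z, y in zip(logits, labels) if y == 1]
    neg = [z for z, y in zip(logits, labels) if y == 0]
    score = 0.0
    for p in pos:
        for n in neg:
            score += 1.0 if p > n else 0.5 if p == n else 0.0
    return -score / (len(pos) * len(neg))
```

Negating AUC lets all three objectives be minimized the same way during early stopping, which is presumably why the option is named negative_auc.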

predict(X: DataFrame)

Predict logits.

Parameters: X (pandas dataframe) – Input.
Returns: logits (numpy array) – logits.

predict_proba(X: DataFrame)

Predict probability.

Parameters: X (pandas dataframe) – Input.
Returns: prob (numpy array) – the probabilities of the two classes, with shape [N, 2].
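In binary classification, predict returns one logit per sample while predict_proba returns shape [N, 2]. A minimal sketch of how the two are presumably related, assuming the logit is the log-odds of class 1 and the columns are ordered [class 0, class 1]; logits_to_proba is a hypothetical helper, not part of nodegam.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logits_to_proba(logits):
    """Turn one binary logit per sample into rows [P(y=0), P(y=1)] (hypothetical)."""
    rows = []
    for z in logits:
        p1 = _sigmoid(z)
        rows.append([1.0 - p1, p1])
    return rows
```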

class nodegam.sklearn.NodeGAMRegressor(in_features, cat_features=None, validation_size=0.15, quantile_dist='normal', quantile_noise=0.001, name=None, seed=1377, arch='GAM', ga2m=1, num_trees=200, num_layers=2, depth=3, addi_tree_dim=0, colsample_bytree=0.5, output_dropout=0, last_dropout=0.3, l2_lambda=0, dim_att=8, n_last_checkpoints=5, batch_size=2048, lr=0.01, lr_warmup_steps=100, lr_decay_steps=600, early_stopping_steps=2000, max_steps=20000, max_time=72000, anneal_steps=2000, report_frequency=100, fp16=0, device='cuda', verbose=1)

Bases: NodeGAMBase

A NodeGAM regressor that follows the sklearn interface for training.

Parameters

in_features (int) – number of input features.
cat_features – the names of the categorical features. They must match the column names of X.
validation_size – the fraction of the data held out for validation.
quantile_dist – choose between [‘normal’, ‘uniform’]. Data is projected onto this distribution. See the output_distribution parameter of sklearn QuantileTransformer.
quantile_noise – fits QuantileTransformer on the data with added Gaussian noise of std = quantile_noise * data.std, which makes discrete values more separable. Note that the noise is only used in QuantileTransformer.fit(); it is NOT added to the transformed data.
name – the model’s name. It is used to store checkpoints under logs/{name}. If not specified, a temporary name is generated randomly.
seed – random seed.
arch – choose between [‘GAM’, ‘GAMAtt’]. GAMAtt is the architecture with attention. GAMAtt often works better on large datasets, while GAM often works better on smaller ones.
ga2m – if 0, model a GAM (main effects only). If 1, model a GA2M (main effects plus pairwise interactions).
num_trees – number of trees per layer.
num_layers – number of layers of trees.
depth – depth of the tree. Should be at least 2 if ga2m=1.
addi_tree_dim – additional dimension of tree’s output. Default: 0.
colsample_bytree – the random fraction of features available to each tree, as in the xgboost package. If less than 1, each tree is trained on only a random subset of the features.
output_dropout – the dropout rate on the output of each tree.
last_dropout – the dropout rate on the weight of the last linear layer.
l2_lambda – the l2 penalty coefficient on the outputs of trees.
dim_att – the dimension of the attention embedding.
n_last_checkpoints – number of most recent checkpoints to average.
batch_size – batch size. Should be bigger than 1024.
lr – the learning rate.
lr_warmup_steps – number of initial steps over which the learning rate is warmed up.
lr_decay_steps – halve the learning rate if the validation metric does not improve for this many steps.
early_stopping_steps – stop training early if the validation metric does not improve for this many steps.
max_steps – maximum number of optimization steps.
max_time – maximum training time in seconds.
anneal_steps – number of temperature-annealing steps. After this many steps, EntMax becomes Max.
report_frequency – report the training progress every this many steps.
fp16 – if 1, train with fp16 (half precision).
device – choose from [‘cpu’, ‘cuda’]. Default: ‘cuda’.
verbose – if 1, print the training progress.
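The three step-based parameters lr_warmup_steps, lr_decay_steps, and early_stopping_steps interact during training. The sketch below shows one plausible interaction using the regressor's defaults. It assumes a linear warm-up and a validation loss where lower is better. PlateauSchedule is a hypothetical class, and NodeGAM's actual schedule may differ.

```python
class PlateauSchedule:
    """Hypothetical sketch: linear warm-up, halve-on-plateau, early stopping.

    Assumes a linear warm-up and a lower-is-better validation loss; this is an
    illustration, not NodeGAM's scheduler.
    """

    def __init__(self, lr=0.01, lr_warmup_steps=100, lr_decay_steps=600,
                 early_stopping_steps=2000):
        self.base_lr = lr
        self.warmup = lr_warmup_steps
        self.decay_steps = lr_decay_steps
        self.stop_steps = early_stopping_steps
        self.decay_factor = 1.0      # halved on every plateau
        self.best = float("inf")
        self.best_step = 0           # step of the last improvement
        self.last_decay_step = 0     # step of the last improvement or decay

    def lr_at(self, step):
        # Linear warm-up to base_lr, then the (possibly halved) rate.
        warm = min(1.0, (step + 1) / self.warmup) if self.warmup > 0 else 1.0
        return self.base_lr * warm * self.decay_factor

    def update(self, step, val_loss):
        """Record a validation loss; return True if training should stop."""
        if val_loss < self.best:
            self.best, self.best_step, self.last_decay_step = val_loss, step, step
        elif step - self.last_decay_step >= self.decay_steps:
            self.decay_factor *= 0.5  # no improvement for lr_decay_steps: halve
            self.last_decay_step = step
        return step - self.best_step >= self.stop_steps
```

Because early_stopping_steps (2000) exceeds lr_decay_steps (600) by default, the learning rate is typically halved a few times before training stops.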

predict(X: DataFrame)

Predict regression targets.

Parameters: X (pandas dataframe) – Input.
Returns: prediction (numpy array) – the predicted values.

Data Preprocessor

class nodegam.mypreprocessor.MyPreprocessor(random_state=1377, cat_features=None, y_normalize=False, quantile_transform=False, output_distribution='normal', n_quantiles=2000, quantile_noise=0.001)

Bases: object

Performs data preprocessing such as input and target normalization.

Parameters

random_state – Global random seed for an experiment.
cat_features – If passed in, it does the ordinal encoding for these features before other input normalization like quantile transformation. Default: None.
y_normalize – If True, standardizes the target y to zero mean and unit standard deviation. Useful in the regression setting.
quantile_transform – If True, transforms the features to follow a normal or uniform distribution.
output_distribution – Choose between [‘normal’, ‘uniform’]. Data is projected onto this distribution. See the same parameter of sklearn QuantileTransformer. ‘normal’ usually works better.
n_quantiles – Number of quantiles to estimate the distribution. Default: 2000.
quantile_noise – If specified, fits QuantileTransformer on the data with added Gaussian noise of std = quantile_noise * data.std, which makes discrete values more separable. Note that the noise is only used when fitting the QuantileTransformer; it is NOT added to the transformed data.
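The point that noise enters only at fit time can be illustrated without sklearn. This is a simplified, hypothetical version: an empirical CDF built from a noisy copy of the data and mapped to a standard normal. sklearn's QuantileTransformer interpolates between quantiles and handles ties differently, so treat this only as a demonstration of where the noise is used.

```python
import bisect
import random
import statistics


def fit_quantiles(values, n_quantiles=2000, quantile_noise=0.001, seed=1377):
    """Hypothetical fit step: reference quantiles from a NOISY copy of the data."""
    rng = random.Random(seed)
    std = statistics.pstdev(values)
    # Gaussian noise with std = quantile_noise * data.std breaks ties between
    # repeated (discrete) values. It is used only here, never in transform().
    noisy = sorted(v + rng.gauss(0.0, quantile_noise * std) for v in values)
    n = max(2, min(n_quantiles, len(noisy)))
    # Reference points taken at evenly spaced ranks of the noisy sample.
    return [noisy[round(i * (len(noisy) - 1) / (n - 1))] for i in range(n)]


def transform(values, refs, eps=1e-7):
    """Hypothetical transform step: map the ORIGINAL values to a standard normal."""
    normal = statistics.NormalDist()
    out = []
    for v in values:
        # Empirical CDF: fraction of reference points <= v, kept inside (0, 1).
        q = bisect.bisect_right(refs, v) / len(refs)
        out.append(normal.inv_cdf(min(max(q, eps), 1 - eps)))
    return out
```

Because transform() never adds noise, equal inputs always map to equal outputs, while the noisy fit spreads the reference points of each repeated value across a range of ranks.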

Example

>>> import nodegam
>>> preprocessor = nodegam.mypreprocessor.MyPreprocessor(
...     cat_features=['ethnicity', 'gender'],
...     y_normalize=True,
...     random_state=1337,
... )
>>> preprocessor.fit(X_train, y_train)
>>> X_train, y_train = preprocessor.transform(X_train, y_train)

fit(X, y)

Fit the preprocessor.

Parameters

X (pandas dataframe) – Input data.
y (numpy array) – Target y.

transform(*args)

Transform the data.

Parameters

X (pandas dataframe) – Input data.
y (numpy array) – Optional. If passed in, target normalization is also applied to it.

Returns

X (pandas dataframe) – Normalized input data.
y (numpy array) – Optional. Normalized y.
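With y_normalize=True, a regressor is trained on standardized targets, so its predictions must be mapped back to the original scale. The sketch below shows the arithmetic, assuming the population standard deviation is used. TargetScaler and inverse_transform are hypothetical names, not documented MyPreprocessor methods.

```python
import statistics


class TargetScaler:
    """Hypothetical sketch of y_normalize=True: standardize y, then undo it."""

    def fit(self, y):
        self.mean = statistics.fmean(y)
        # Guard against constant targets, whose stdev is 0.
        self.std = statistics.pstdev(y) or 1.0
        return self

    def transform(self, y):
        # Zero mean, unit standard deviation.
        return [(v - self.mean) / self.std for v in y]

    def inverse_transform(self, y_scaled):
        # Map model outputs back to the original target scale.
        return [v * self.std + self.mean for v in y_scaled]
```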

Utilities

nodegam.utils.average_GAM_dfs(all_dfs)

Average a list of GAM dataframes to derive the mean and stdev of each term.

Parameters: all_dfs – a list of dataframes.
Returns: df – the averaged dataframe with mean, stdev and the importance.
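Averaging GAM dataframes from several runs amounts to averaging each term's outputs point by point across models. The sketch uses plain dicts instead of dataframes, and it assumes every model evaluates a term on the same x grid. The importance formula, the mean absolute value of the averaged curve, is an assumption for illustration. average_terms is a hypothetical helper.

```python
import statistics


def average_terms(all_terms):
    """Hypothetical sketch of averaging GAM terms across models.

    all_terms: list of dicts (one per model) mapping feature name -> list of
    term outputs y, evaluated at the same x grid in every model.
    Returns feature -> (means, stdevs, importance).
    """
    result = {}
    for feat in all_terms[0]:
        curves = [terms[feat] for terms in all_terms]
        means = [statistics.fmean(col) for col in zip(*curves)]
        stds = [statistics.pstdev(col) for col in zip(*curves)]
        # Assumed importance: mean absolute value of the averaged curve.
        importance = statistics.fmean(abs(m) for m in means)
        result[feat] = (means, stds, importance)
    return result
```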

nodegam.utils.output_csv(the_path, data_dict, order=None, delimiter=',')

Write a Python dictionary to a csv file as one row.

If the csv file already exists, the row is appended to it.

Parameters

the_path – the filename of the csv file.
data_dict – the data dictionary.
order – if specified, the columns of the csv follow this order. Default: None.
delimiter – the delimiter. Default: ‘,’.
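The append-a-row behavior can be sketched with the standard csv module. This is a hypothetical re-implementation for illustration only. It assumes a header row is written only when the file is new, which the docstring does not state. append_row is not a nodegam function.

```python
import csv
import os
import tempfile


def append_row(path, data_dict, order=None, delimiter=","):
    """Hypothetical sketch of output_csv: append one dict as a csv row.

    Assumes a header is written only when the file does not exist yet.
    """
    fieldnames = list(order) if order is not None else list(data_dict.keys())
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter=delimiter)
        if is_new:
            writer.writeheader()
        writer.writerow(data_dict)


# Usage: record two experiment results in a temporary file.
demo_path = os.path.join(tempfile.mkdtemp(), "results.csv")
append_row(demo_path, {"auc": 0.91, "name": "m1"}, order=["name", "auc"])
append_row(demo_path, {"auc": 0.88, "name": "m2"}, order=["name", "auc"])
```

Passing order keeps columns stable across calls even when the dictionaries are built in different key orders.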

nodegam.vis_utils.vis_GAM_effects(all_dfs, num_cols=4, figsize=None, vertical_margin=2, horizontal_margin=2, sort_by_imp=False, show_density=False, model_names=None, feature_names=None, feature_idxes=None, top_main=-1, top_interactions=-1, only_interactions=False, call_backs=None)

Visualize the main and interaction effects of the GAM model.

Parameters

all_dfs – a dictionary of dataframes. The key is the model name and the value is the GAM dataframe of that model.
num_cols – number of columns of GAM graphs.
figsize – the figure size. If not specified, it uses (width, height) = (4 * num_cols + (num_cols - 1) * horizontal_margin, 3 * num_rows + (num_rows - 1) * vertical_margin).
vertical_margin – the vertical margin. Default: 2.
horizontal_margin – the horizontal margin. Default: 2.
sort_by_imp – if True, sort the figures by feature importance. Otherwise, use the default feature order.
show_density – if True, show the data density as a red background in the main-effect GAM graphs.
model_names – if specified, only show the GAM graphs of these models.
feature_names – if specified, only show the GAM graphs of these features.
feature_idxes – if specified, only show the GAM graphs of these feature indexes.
top_main – if > 0, only show the top k main effects. If -1, show all main effects.
top_interactions – if > 0, only show the top k interactions. If -1, show all interactions.
only_interactions – if True, hide all the main-effect plots and only show interaction terms.
call_backs – if specified, these functions are called at the end of plotting. It should be a dict whose keys are feature names and whose values are functions (e.g. lambda ax: f(ax)) that modify the axis of that feature. Useful for feature-specific adjustments.

Returns

fig – the figure.
axes (numpy array) – all the axes.
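The default figsize formula can be checked numerically. The sketch assumes num_rows = ceil(number of plots / num_cols). The docstring does not say how num_rows is derived. default_figsize is a hypothetical helper.

```python
import math


def default_figsize(n_plots, num_cols=4, vertical_margin=2, horizontal_margin=2):
    """Default (width, height) per the vis_GAM_effects docstring (sketch).

    Assumes num_rows = ceil(n_plots / num_cols).
    """
    num_rows = math.ceil(n_plots / num_cols)
    width = 4 * num_cols + (num_cols - 1) * horizontal_margin
    height = 3 * num_rows + vertical_margin * (num_rows - 1)
    return width, height
```

For example, 10 plots in 4 columns give 3 rows, so the figure is (16 + 6, 9 + 4) = (22, 13).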

EBM Packages

class nodegam.gams.MyEBM.MyExplainableBoostingClassifier(feature_names=None, feature_types=None, max_bins=256, max_interaction_bins=32, binning='quantile', mains='all', interactions=10, outer_bags=8, inner_bags=0, learning_rate=0.01, validation_size=0.15, early_stopping_rounds=50, early_stopping_tolerance=0.0001, max_rounds=5000, min_samples_leaf=2, max_leaves=3, n_jobs=-2, random_state=42)

Bases: LabelEncodingClassifierMixin, MyExplainableBoostingMixin, ExplainableBoostingClassifier

Explainable Boosting Classifier. The arguments will change in a future release; watch the changelog.

Parameters

feature_names – List of feature names.
feature_types – List of feature types.
max_bins – Max number of bins per feature for the pre-processing stage on main effects.
max_interaction_bins – Max number of bins per feature for pre-processing stage on interaction terms. Only used if interactions is non-zero.
binning – Method to bin values for pre-processing. Choose “uniform”, “quantile” or “quantile_humanized”.
mains – Features to be trained on in main effects stage. Either “all” or a list of feature indexes.
interactions – Interactions to be trained on. Either a list of lists of feature indices, or an integer for number of automatically detected interactions. Interactions are forcefully set to 0 for multiclass problems.
outer_bags – Number of outer bags.
inner_bags – Number of inner bags.
learning_rate – Learning rate for boosting.
validation_size – Validation set size for boosting.
early_stopping_rounds – Number of rounds of no improvement to trigger early stopping.
early_stopping_tolerance – Tolerance that dictates the smallest delta required to be considered an improvement.
max_rounds – Number of rounds for boosting.
min_samples_leaf – Minimum number of cases for tree splits used in boosting.
max_leaves – Maximum leaf nodes used in boosting.
n_jobs – Number of jobs to run in parallel.
random_state – Random state.
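The binning option controls where the cut points of each feature fall. This is a simplified sketch, not interpret's binning code. It contrasts equal-width ("uniform") cut points with equal-count ("quantile") cut points on a skewed feature. The helper names are hypothetical.

```python
def uniform_edges(values, max_bins):
    # Equal-width bins between the minimum and maximum value.
    lo, hi = min(values), max(values)
    width = (hi - lo) / max_bins
    return [lo + i * width for i in range(1, max_bins)]


def quantile_edges(values, max_bins):
    # Equal-count bins: cut points at evenly spaced ranks; duplicate cuts merged.
    s = sorted(values)
    n = len(s)
    edges = [s[(i * n) // max_bins] for i in range(1, max_bins)]
    return sorted(set(edges))
```

With values [1, 2, 3, 4, 100] and two bins, the uniform cut lands at 50.5 and leaves four of five points in one bin. The quantile cut lands at 3 and splits the data roughly evenly.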

class nodegam.gams.MyEBM.MyExplainableBoostingRegressor(feature_names=None, feature_types=None, max_bins=256, max_interaction_bins=32, binning='quantile', mains='all', interactions=10, outer_bags=8, inner_bags=0, learning_rate=0.01, validation_size=0.15, early_stopping_rounds=50, early_stopping_tolerance=0.0001, max_rounds=5000, min_samples_leaf=2, max_leaves=3, n_jobs=-2, random_state=42)

Bases: LabelEncodingRegressorMixin, MyExplainableBoostingMixin, ExplainableBoostingRegressor

Explainable Boosting Regressor. The arguments will change in a future release; watch the changelog.

Parameters

feature_names – List of feature names.
feature_types – List of feature types.
max_bins – Max number of bins per feature for pre-processing stage on main effects.
max_interaction_bins – Max number of bins per feature for pre-processing stage on interaction terms. Only used if interactions is non-zero.
binning – Method to bin values for pre-processing. Choose “uniform”, “quantile”, or “quantile_humanized”.
mains – Features to be trained on in main effects stage. Either “all” or a list of feature indexes.
interactions – Interactions to be trained on. Either a list of lists of feature indices, or an integer for number of automatically detected interactions.
outer_bags – Number of outer bags.
inner_bags – Number of inner bags.
learning_rate – Learning rate for boosting.
validation_size – Validation set size for boosting.
early_stopping_rounds – Number of rounds of no improvement to trigger early stopping.
early_stopping_tolerance – Tolerance that dictates the smallest delta required to be considered an improvement.
max_rounds – Number of rounds for boosting.
min_samples_leaf – Minimum number of cases for tree splits used in boosting.
max_leaves – Maximum leaf nodes used in boosting.
n_jobs – Number of jobs to run in parallel.
random_state – Random state.

Spline Packages

Note: Spline models can be combined with MyBagging (MyBaggingClassifier or MyBaggingRegressor) to estimate uncertainty.

class nodegam.gams.MySpline.MySplineLogisticGAM(**kwargs)

Bases: OnehotEncodingClassifierMixin, MySplineLogisticGAMBase

Logistic spline GAM for binary classification, with one-hot encoding for categorical features.

Parameters

search (bool) – if True, it searches for the best lam penalty for the model.
search_lam (list or numpy array) – the range of lam penalty to search. If None, it is set to np.linspace(-3, 3, 15).
max_iter (int) – maximum iterations to train.
n_splines (int) – number of splines. Default: 50.
cat_features (list) – the column names of the categorical features. Default: None.

fit(X, y, **kwargs)

get_GAM_df(x_values_lookup=None, **kwargs)

Get the GAM dataframe.

Parameters

x_values_lookup (dict) – the unique values of X for each feature. If passed, the outputs of the GAM model w.r.t. these x values are extracted. Useful to get a coarser graph when there are too many unique values in a feature.
center (bool) – if True, it centers each GAM graph to 0 by moving its mean to the intercept term.

Returns

df (pandas dataframe) – a GAM dataframe where each row represents a GAM term with the inputs x, outputs y, and feature importance.
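Centering with center=True shifts each term by a constant and moves that constant into the intercept, so predictions do not change. A minimal sketch with plain dicts. The optional count weights are a hypothetical addition, since the docstring does not say whether the mean is weighted. center_terms is not a nodegam function.

```python
def center_terms(intercept, terms, weights=None):
    """Hypothetical sketch: shift each term to mean zero, move offset to intercept.

    terms: feature -> list of term outputs y at that feature's x values.
    weights: optional feature -> list of data counts per x (an assumption).
    """
    new_terms = {}
    for feat, ys in terms.items():
        w = weights[feat] if weights else [1.0] * len(ys)
        mean = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
        new_terms[feat] = [y - mean for y in ys]
        # intercept + sum of terms is unchanged by this shift.
        intercept += mean
    return intercept, new_terms
```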

get_lam(): Return the lambda penalty.

get_params(*args, **kwargs): Return the parameters.

property is_GAM: Returns True if it’s a GAM.

property param_distributions

predict(X)

predict_proba(X)

Predict probabilities.

Parameters: X (pandas dataframe) – inputs.
Returns: prob (numpy array) – the probabilities of both classes, with shape [N, 2].

revert_dataframe(df): Convert the old one-hot-encoded GAM dataframe into a new one without one-hot encoding.

set_params(*args, **kwargs)

class nodegam.gams.MySpline.MySplineGAM(**kwargs)

Bases: OnehotEncodingRegressorMixin, MySplineGAMBase

Spline GAM for regression, with one-hot encoding for categorical features.

Parameters

search (bool) – if True, it searches for the best lam penalty for the model.
search_lam (list or numpy array) – the range of lam penalty to search. If None, it is set to np.linspace(-3, 3, 15).
max_iter (int) – maximum iterations to train.
n_splines (int) – number of splines. Default: 50.
cat_features (list) – the column names of the categorical features. Default: None.

fit(X, y, **kwargs)

get_GAM_df(x_values_lookup=None, **kwargs)

Get the GAM dataframe.

Parameters

x_values_lookup (dict) – the unique values of X for each feature. If passed, the outputs of the GAM model w.r.t. these x values are extracted. Useful to get a coarser graph when there are too many unique values in a feature.
center (bool) – if True, it centers each GAM graph to 0 by moving its mean to the intercept term.

Returns

df (pandas dataframe) – a GAM dataframe where each row represents a GAM term with the inputs x, outputs y, and feature importance.
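revert_dataframe is described only briefly. Conceptually, one-hot encoding a categorical feature produces one term per indicator column. The contribution of category c is the sum of all indicator terms, evaluated with only c's indicator set to 1. The sketch assumes columns are named "<feature>_<category>" and each term is stored as (y when 0, y when 1). revert_onehot is hypothetical, not nodegam's code.

```python
def revert_onehot(onehot_terms, feature, categories, sep="_"):
    """Hypothetical sketch: collapse one-hot indicator terms into one term.

    onehot_terms maps a column name such as "race_A" to (y_when_0, y_when_1).
    The "<feature><sep><category>" naming is an assumption for illustration.
    """
    columns = [f"{feature}{sep}{c}" for c in categories]
    table = {}
    for c in categories:
        active = f"{feature}{sep}{c}"
        # Category c sets its own indicator to 1 and all other indicators to 0.
        table[c] = sum(
            onehot_terms[col][1] if col == active else onehot_terms[col][0]
            for col in columns
        )
    return table
```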

get_lam(): Return the lambda penalty.

get_params(*args, **kwargs): Return the parameters.

property is_GAM: Returns True if it’s a GAM.

property param_distributions

predict(X)

Predict regression targets.

Parameters: X (pandas dataframe) – inputs.
Returns: prediction (numpy array) – the predictions, with shape [N].

revert_dataframe(df): Convert the old one-hot-encoded GAM dataframe into a new one without one-hot encoding.

set_params(*args, **kwargs)

XGB-GAM Packages

Note: XGB-GAM can be combined with MyBagging to estimate uncertainty.

class nodegam.gams.MyXGB.MyXGBOnehotClassifier(*args, **kwargs)

Bases: OnehotEncodingClassifierMixin, MyXGBClassifier

XGB-GAM Classifier with one-hot encoding for categorical features.

Parameters

max_depth=1 – The tree depth. Should be set to 1 so that the model remains a GAM.
random_state=1377 – Seed.
n_estimators=5000 – Maximum number of boosting rounds.
n_jobs=-1 – Set to -1 to use multi-threaded parallel training.
validation_size=0.15 – The validation proportion.
early_stopping_rounds=50 – Early stopping rounds.
objective='binary:logistic' – The validation objective.

fit(X, y, **kwargs)

get_GAM_df(x_values_lookup=None, **kwargs)

Get the GAM dataframe.

Parameters

x_values_lookup (dict) – the unique values of X for each feature. If passed, the outputs of the GAM model w.r.t. these x values are extracted. Useful to get a coarser graph when there are too many unique values in a feature.
center (bool) – if True, it centers each GAM graph to 0 by moving its mean to the intercept term.

Returns

df (pandas dataframe) – a GAM dataframe where each row represents a GAM term with the inputs x, outputs y, and feature importance.
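Setting max_depth=1 keeps the boosted model additive. Each depth-1 tree (a stump) splits on a single feature, so grouping the stumps by feature yields one shape function per feature. The sketch uses a hypothetical stump representation, (feature, threshold, left value, right value), with values below the threshold going left. It checks that the boosted prediction equals the sum of the per-feature shape functions.

```python
from collections import defaultdict


def predict_stumps(stumps, x, base=0.0):
    # Boosted prediction: base score plus the leaf value of every stump.
    return base + sum(left if x[f] < t else right for f, t, left, right in stumps)


def shape_functions(stumps):
    """Group stumps by feature; each group is that feature's GAM shape function."""
    groups = defaultdict(list)
    for f, t, left, right in stumps:
        groups[f].append((t, left, right))

    def make(g):
        # Bind this feature's stumps into a one-argument shape function.
        return lambda v: sum(left if v < t else right for t, left, right in g)

    return {f: make(g) for f, g in groups.items()}
```

With max_depth >= 2, a single tree could split on two features at once. The model would then contain interaction terms and would no longer be a pure GAM.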

get_params(*args, **kwargs)

property is_GAM: Returns True if it’s a GAM.

property param_distributions

predict(X)

predict_proba(X)

revert_dataframe(df): Convert the old one-hot-encoded GAM dataframe into a new one without one-hot encoding.

set_params(*args, **kwargs)

class nodegam.gams.MyXGB.MyXGBOnehotRegressor(*args, **kwargs)

Bases: OnehotEncodingRegressorMixin, MyXGBRegressor

XGB-GAM Regressor with one-hot encoding for categorical features.

Parameters

max_depth=1 – The tree depth. Should be set to 1 so that the model remains a GAM.
random_state=1377 – Seed.
n_estimators=5000 – Maximum number of boosting rounds.
n_jobs=-1 – Set to -1 to use multi-threaded parallel training.
validation_size=0.15 – The validation proportion.
early_stopping_rounds=50 – Early stopping rounds.
objective='reg:squarederror' – The validation objective.

fit(X, y, **kwargs)

get_GAM_df(x_values_lookup=None, **kwargs)

Get the GAM dataframe.

Parameters

x_values_lookup (dict) – the unique values of X for each feature. If passed, the outputs of the GAM model w.r.t. these x values are extracted. Useful to get a coarser graph when there are too many unique values in a feature.
center (bool) – if True, it centers each GAM graph to 0 by moving its mean to the intercept term.

Returns

df (pandas dataframe) – a GAM dataframe where each row represents a GAM term with the inputs x, outputs y, and feature importance.

get_params(*args, **kwargs)

property is_GAM: Returns True if it’s a GAM.

property param_distributions

predict(X)

revert_dataframe(df): Convert the old one-hot-encoded GAM dataframe into a new one without one-hot encoding.

set_params(*args, **kwargs)

Bagging Packages

class nodegam.gams.MyBagging.MyBaggingClassifier(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)

Bases: OnehotEncodingClassifierMixin, MyBaggingClassifierBase, MyCommonBase

Bagging ensemble of GAM base estimators.

It overrides sklearn.ensemble.BaggingClassifier to (1) ensemble the logits rather than the probabilities, so that the ensemble is still a GAM, and (2) support GAM dataframe extraction.

Parameters

base_estimator – the base estimator model.
n_estimators – the number of estimators in the bagging ensemble.
max_samples (int or float) – the number of samples to draw from X to train each base estimator (with replacement by default, see bootstrap for more details). If int, then draw max_samples samples. If float, then draw max_samples * X.shape[0] samples.
max_features (int or float) – The number of features to draw from X to train each base estimator (without replacement by default, see bootstrap_features for more details). If int, then draw max_features features. If float, then draw max_features * X.shape[1] features.
bootstrap (bool) – Whether samples are drawn with replacement. If False, sampling without replacement is performed.
bootstrap_features (bool) – Whether features are drawn with replacement.
oob_score (bool) – Whether to use out-of-bag samples to estimate the generalization error. Only available if bootstrap=True.
warm_start (bool) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
n_jobs (int) – The number of jobs to run in parallel for both fit() and predict(). None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
random_state – random state.
verbose – verbose.
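The max_samples and bootstrap parameters determine which rows each base estimator sees. The sketch follows sklearn's documented rule: an int is an absolute count, and a float is a fraction of the rows. It uses Python's random module rather than sklearn's random state, so the drawn indices will differ from sklearn's. draw_samples is a hypothetical helper.

```python
import random


def draw_samples(n_samples, max_samples=1.0, bootstrap=True, seed=None):
    """Hypothetical sketch: row indices used to train one base estimator.

    An int max_samples is an absolute count; a float is a fraction of n_samples.
    """
    k = max_samples if isinstance(max_samples, int) else int(max_samples * n_samples)
    rng = random.Random(seed)
    if bootstrap:
        # With replacement: some rows repeat and others are left out (out-of-bag).
        return [rng.randrange(n_samples) for _ in range(k)]
    # Without replacement: k distinct rows.
    return rng.sample(range(n_samples), k)
```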

decision_function(X)

Average of the decision functions of the base classifiers.

Parameters: X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
Returns: score (ndarray of shape (n_samples, k)) – The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k==n_classes.

property estimators_samples_

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.

fit(X, y, **kwargs)

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters

X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
y (array-like of shape (n_samples,)) – The target values (class labels in classification, real numbers in regression).
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

Returns

self (object) – Fitted estimator.

get_GAM_df(x_values_lookup=None, **kwargs)

Get the GAM graph parameters.

Parameters

x_values_lookup – a dictionary mapping each feature name to its corresponding unique, increasing x values. E.g. {‘BUN’: [1.1, 1.5, 3.1, 5.0], ‘cancer’: [0, 1]}.
get_y_std – if True, also compute the error bar (stdev) of y. This is slower. Default: True.

Returns

A dataframe of GAM graph.

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params (dict) – Parameter names mapped to their values.

property is_GAM: Returns True if it’s a GAM.

property n_features_

Attribute n_features_ was deprecated in version 1.0 and will be removed in 1.2. Use n_features_in_ instead.

Type: DEPRECATED

property param_distributions

predict(X)

Predict class for X.

The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.

Parameters: X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
Returns: y (ndarray of shape (n_samples,)) – The predicted classes.

predict_log_proba(X)

Predict class log-probabilities for X.

The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.

Parameters: X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
Returns: p (ndarray of shape (n_samples, n_classes)) – The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

predict_proba(X)

Predict class probabilities for X. Unlike sklearn's BaggingClassifier, it averages the log-odds of the base estimators instead of their probabilities.

The predicted class probabilities of an input sample are computed by averaging the predicted log-odds of the base estimators in the ensemble and converting the average back to probabilities. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.

Parameters

X – {array-like, sparse matrix} of shape = [n_samples, n_features] The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
parallel – if True, predict with parallel threads to speed up prediction. With xgboost base estimators, which already use multiple threads, this actually slows prediction down.

Returns

p – array of shape = [n_samples, n_classes]. The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
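Averaging log-odds keeps the bagged model a GAM. Each base model's logit is additive in the features, f_0 + Σ_j f_j(x_j), so the average logit is also additive, with shape functions equal to the averaged f_j. Averaging the probabilities instead mixes sigmoids and is not additive on the logit scale. A minimal binary sketch with hypothetical helper names:

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bag_proba_logodds(probs):
    """Average the base models' log-odds, then map back to a probability."""
    return sigmoid(sum(logit(p) for p in probs) / len(probs))


def bag_proba_mean(probs):
    """sklearn's default BaggingClassifier rule: average the probabilities."""
    return sum(probs) / len(probs)
```

For two base predictions of 0.99 and 0.5, probability averaging gives 0.745. Log-odds averaging gives sqrt(99) / (sqrt(99) + 1) ≈ 0.909.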

revert_dataframe(df): Convert the old one-hot-encoded GAM dataframe into a new one without one-hot encoding.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters

X (array-like of shape (n_samples, n_features)) – Test samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns

score (float) – Mean accuracy of self.predict(X) wrt. y.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self (estimator instance) – Estimator instance.

class nodegam.gams.MyBagging.MyBaggingRegressor(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)

Bases: OnehotEncodingRegressorMixin, MyBaggingRegressorBase, MyCommonBase

Bagging ensemble of GAM regressor base estimators.

It overrides sklearn.ensemble.BaggingRegressor to support GAM dataframe extraction.

Parameters

base_estimator – the base estimator model.
n_estimators – the number of estimators in the bagging ensemble.
max_samples (int or float) – the number of samples to draw from X to train each base estimator (with replacement by default, see bootstrap for more details). If int, then draw max_samples samples. If float, then draw max_samples * X.shape[0] samples.
max_features (int or float) – The number of features to draw from X to train each base estimator (without replacement by default, see bootstrap_features for more details). If int, then draw max_features features. If float, then draw max_features * X.shape[1] features.
bootstrap (bool) – Whether samples are drawn with replacement. If False, sampling without replacement is performed.
bootstrap_features (bool) – Whether features are drawn with replacement.
oob_score (bool) – Whether to use out-of-bag samples to estimate the generalization error. Only available if bootstrap=True.
warm_start (bool) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
n_jobs (int) – The number of jobs to run in parallel for both fit() and predict(). None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
random_state – random state.
verbose – verbose.

property estimators_samples_

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.

fit(X, y, **kwargs)

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters

X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
y (array-like of shape (n_samples,)) – The target values (class labels in classification, real numbers in regression).
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

Returns

self (object) – Fitted estimator.

get_GAM_df(x_values_lookup=None, **kwargs)

Get the GAM graph parameters.

Parameters

x_values_lookup – a dictionary mapping each feature name to its corresponding unique, increasing x values. E.g. {‘BUN’: [1.1, 1.5, 3.1, 5.0], ‘cancer’: [0, 1]}.
get_y_std – if True, also compute the error bar (stdev) of y. This is slower. Default: True.

Returns

A dataframe of GAM graph.

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params (dict) – Parameter names mapped to their values.

property is_GAM: Returns True if it’s a GAM.

property n_features_

Attribute n_features_ was deprecated in version 1.0 and will be removed in 1.2. Use n_features_in_ instead.

Type: DEPRECATED

property param_distributions

predict(X)

Predict regression target for X.

The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.

Parameters: X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
Returns: y (ndarray of shape (n_samples,)) – The predicted values.

revert_dataframe(df): Convert the old one-hot-encoded GAM dataframe into a new one without one-hot encoding.

score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred)** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
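The definition above translates directly into code. This plain-Python sketch covers single-output targets only and is not sklearn's r2_score, which also handles sample weights and multioutput averaging.

```python
def r2_score(y_true, y_pred):
    """R^2 = 1 - u / v for single-output targets (sketch of the definition above)."""
    mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean) ** 2 for t in y_true)              # total sum of squares
    return 1.0 - u / v
```

Predicting the constant mean [2, 2, 2] for y = [1, 2, 3] gives u = v = 2, so R^2 = 0.0. Predicting the reversed sequence [3, 2, 1] gives u = 8, so R^2 = 1 - 8/2 = -3.0, illustrating how the score can be negative.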

Parameters

X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns

score (float) – \(R^2\) of self.predict(X) wrt. y.

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).

set_params(**params)

Set the parameters of this estimator.

Parameters: **params (dict) – Estimator parameters.
Returns: self (estimator instance) – Estimator instance.