nodegam.gams package#
Submodules#
nodegam.gams.EncodingBase module#
GAM baselines adapted from https://github.com/zzzace2000/GAMs_models/.
- class nodegam.gams.EncodingBase.EncodingBase#
Bases: object
A base class for handling label or one-hot encoding.
- get_GAM_df(x_values_lookup=None, **kwargs)#
- revert_dataframe(df)#
- class nodegam.gams.EncodingBase.LabelEncodingClassifierMixin#
Bases: LabelEncodingFitMixin
- predict_proba(X)#
- class nodegam.gams.EncodingBase.LabelEncodingFitMixin#
Bases: EncodingBase
- fit(X, y, **kwargs)#
- my_fit(X, y)#
- my_transform(X)#
- revert_dataframe(df)#
- class nodegam.gams.EncodingBase.LabelEncodingRegressorMixin#
Bases: LabelEncodingFitMixin
- predict(X)#
- class nodegam.gams.EncodingBase.OnehotEncodingClassifierMixin#
Bases: OnehotEncodingFitMixin
- predict_proba(X)#
- class nodegam.gams.EncodingBase.OnehotEncodingFitMixin#
Bases: EncodingBase
- fit(X, y, **kwargs)#
- predict(X)#
- revert_dataframe(df)#
Convert a GAM dataframe built on one-hot-encoded features back into one that uses the original, non-one-hot-encoded features.
- class nodegam.gams.EncodingBase.OnehotEncodingRegressorMixin#
Bases: OnehotEncodingFitMixin
nodegam.gams.MyBagging module#
Adapted from https://github.com/zzzace2000/GAMs_models/.
It implements bagging of GAM models. Unlike sklearn's implementation of BaggingClassifier, it averages the logits instead of the probabilities, so that a bagged ensemble of GAMs is still a GAM. It also implements get_GAM_df(), which automatically averages the GAM graphs of the bagged models.
Usage:
>>> from nodegam.gams.MyXGB import MyXGBClassifier
>>> from nodegam.gams.MyBagging import MyBaggingClassifier
>>> base_model = MyXGBClassifier()
>>> # Train an XGB-GAM bagged over 10 estimators
>>> bag_model = MyBaggingClassifier(base_estimator=base_model, n_estimators=10)
>>> bag_model.fit(X, y)
>>> df = bag_model.get_GAM_df()
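Why averaging logits keeps a bagged ensemble additive can be checked numerically. The sketch below uses two made-up log-odds functions (gam_a and gam_b are hypothetical toys, not nodegam models) and a mixed second difference, which is exactly zero for any function of the form f1(x1) + f2(x2).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


# Two toy additive models (hypothetical, for illustration only).
def gam_a(x1, x2):
    return 3.0 * x1  # log-odds depends on x1 only


def gam_b(x1, x2):
    return 3.0 * x2  # log-odds depends on x2 only


def bag_logits(x1, x2):
    # Average on the log-odds scale, as described for MyBaggingClassifier.
    return (gam_a(x1, x2) + gam_b(x1, x2)) / 2


def bag_probs(x1, x2):
    # Average on the probability scale (stock BaggingClassifier), mapped back to log-odds.
    p = (sigmoid(gam_a(x1, x2)) + sigmoid(gam_b(x1, x2))) / 2
    return logit(p)


def interaction(h, a=0.0, b=2.0):
    # Mixed second difference: exactly 0 for every additive h(x1, x2) = f1(x1) + f2(x2).
    return h(b, b) - h(a, b) - h(b, a) + h(a, a)


print(interaction(bag_logits))  # 0.0: the ensemble is still additive
print(interaction(bag_probs))   # about 3.8: an x1-x2 interaction has appeared
```

Because an average of additive log-odds is itself additive, each shape function of the bagged model is simply the average of the corresponding base-model shape functions.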
- class nodegam.gams.MyBagging.MyBaggingClassifier(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)#
Bases: OnehotEncodingClassifierMixin, MyBaggingClassifierBase, MyCommonBase
Bagging for a base GAM estimator.
It overrides sklearn.ensemble.BaggingClassifier to (1) average the logits rather than the probabilities, so that the ensemble remains a GAM, and (2) support GAM dataframe extraction.
- Parameters
base_estimator – the base estimator model.
n_estimators – the number of base estimators in the ensemble.
max_samples (int or float) – The number of samples to draw from X to train each base estimator (with replacement by default, see bootstrap for more details). - If int, then draw max_samples samples. - If float, then draw max_samples * X.shape[0] samples.
max_features (int or float) – The number of features to draw from X to train each base estimator (without replacement by default, see bootstrap_features for more details). - If int, then draw max_features features. - If float, then draw max_features * X.shape[1] features.
bootstrap (bool) – Whether samples are drawn with replacement. If False, sampling without replacement is performed.
bootstrap_features (bool) – Whether features are drawn with replacement.
oob_score (bool) – Whether to use out-of-bag samples to estimate the generalization error. Only available if bootstrap=True.
warm_start (bool) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
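n_jobs (int) – The number of jobs to run in parallel for both fit() and predict(). None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
random_state – random state; controls the random resampling of samples and features.
verbose – controls the verbosity when fitting and predicting.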
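The int/float rule for max_samples and max_features can be made concrete with a small helper. resolve_draw_count is hypothetical; it is a sketch of the rule as documented above, not a function in nodegam or sklearn.

```python
import numbers


def resolve_draw_count(value, n_total):
    """Hypothetical helper for the documented int/float rule.

    Use n_total = number of rows for max_samples,
    and n_total = number of columns for max_features.
    """
    if isinstance(value, numbers.Integral):
        count = int(value)            # int: draw exactly this many
    elif isinstance(value, float):
        count = int(value * n_total)  # float: draw this fraction of the total
    else:
        raise TypeError("expected an int or a float")
    if not 1 <= count <= n_total:
        raise ValueError(f"draw count {count} must be in [1, {n_total}]")
    return count


print(resolve_draw_count(1.0, 1000))  # the default max_samples=1.0 uses all 1000 rows
print(resolve_draw_count(0.5, 1000))  # half of the rows
print(resolve_draw_count(3, 20))      # 3 of 20 features
```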
- class nodegam.gams.MyBagging.MyBaggingClassifierBase(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)#
Bases: MyBaggingMixin, BaggingClassifier
- predict_proba(X, parallel=False)#
Modified to average the log-odds of the base estimators instead of their probabilities.
The predicted class probabilities of an input sample are computed from the mean predicted log-odds of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting, and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters
X – {array-like, sparse matrix} of shape [n_samples, n_features]. The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
parallel – if True, predict with parallel threads. With XGBoost base estimators this tends to slow prediction down, since each base estimator already uses multiple threads.
- Returns
p – array of shape = [n_samples, n_classes]. The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
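For the binary case, averaging on the log-odds scale and returning an [n_samples, 2] array can be sketched with NumPy. bagged_predict_proba and the clipping constant eps are assumptions for illustration, not nodegam's implementation.

```python
import numpy as np


def bagged_predict_proba(per_estimator_proba, eps=1e-12):
    """Average base-estimator predictions on the log-odds scale (binary case).

    per_estimator_proba: array [n_estimators, n_samples, 2]; [..., 1] is each
    estimator's probability of the positive class. Returns [n_samples, 2].
    """
    p1 = np.clip(per_estimator_proba[..., 1], eps, 1 - eps)  # avoid log(0)
    logits = np.log(p1) - np.log1p(-p1)   # per-estimator log-odds
    mean_logit = logits.mean(axis=0)       # average over estimators
    pos = 1.0 / (1.0 + np.exp(-mean_logit))
    return np.stack([1.0 - pos, pos], axis=1)


probs = np.array([
    [[0.1, 0.9], [0.5, 0.5]],  # estimator 1, two samples
    [[0.5, 0.5], [0.5, 0.5]],  # estimator 2, two samples
])
out = bagged_predict_proba(probs)
print(out[0, 1])  # about 0.75, whereas averaging probabilities would give 0.7
```

For the first sample the log-odds are ln 9 and 0, whose mean ln 3 maps back to 0.75, so the two averaging schemes give visibly different probabilities.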
- class nodegam.gams.MyBagging.MyBaggingLabelEncodingClassifier(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)#
Bases: LabelEncodingClassifierMixin, MyBaggingClassifierBase, MyCommonBase
- class nodegam.gams.MyBagging.MyBaggingLabelEncodingRegressor(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)#
Bases: LabelEncodingRegressorMixin, MyBaggingRegressorBase, MyCommonBase
- class nodegam.gams.MyBagging.MyBaggingMixin#
Bases: MyGAMPlotMixinBase
- get_GAM_df(x_values_lookup=None, get_y_std=True)#
Get the GAM graph parameters, averaged over the bagged estimators.
- Parameters
x_values_lookup – a dictionary mapping each feature name to its corresponding unique, increasing x values. E.g. {‘BUN’: [1.1, 1.5, 3.1, 5.0], ‘cancer’: [0, 1]}.
get_y_std – if True, also compute the error bar of y across the bagged estimators. This is slower. Default: True
- Returns
A dataframe of GAM graph.
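Averaging one feature's graph over bagged models, with a spread that can serve as an error bar, might look like the sketch below. average_shape_functions and the made-up y values are hypothetical, and using the population standard deviation (ddof=0) is an assumption; nodegam may compute y_std differently.

```python
import numpy as np


def average_shape_functions(per_model_y):
    """Combine one feature's graph from several bagged GAMs (sketch).

    per_model_y: array [n_models, n_x], the y value each model assigns to the
    same increasing x grid. Returns (mean_y, std_y).
    """
    per_model_y = np.asarray(per_model_y, dtype=float)
    return per_model_y.mean(axis=0), per_model_y.std(axis=0)


x_values_lookup = {"cancer": [0, 1]}                 # same shape as the docstring example
y_by_model = [[-0.2, 0.4], [0.0, 0.6], [-0.1, 0.5]]  # made-up outputs of 3 bagged models
mean_y, std_y = average_shape_functions(y_by_model)
for x, m, s in zip(x_values_lookup["cancer"], mean_y, std_y):
    print(x, round(m, 3), round(s, 3))
```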
- property is_GAM#
Returns True if it’s a GAM.
- class nodegam.gams.MyBagging.MyBaggingRegressor(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)#
Bases: OnehotEncodingRegressorMixin, MyBaggingRegressorBase, MyCommonBase
Bagging for a base GAM regressor.
It overrides sklearn.ensemble.BaggingRegressor to support GAM dataframe extraction.
- Parameters
base_estimator – the base estimator model.
n_estimators – the number of base estimators in the ensemble.
max_samples (int or float) – The number of samples to draw from X to train each base estimator (with replacement by default, see bootstrap for more details). - If int, then draw max_samples samples. - If float, then draw max_samples * X.shape[0] samples.
max_features (int or float) – The number of features to draw from X to train each base estimator (without replacement by default, see bootstrap_features for more details). - If int, then draw max_features features. - If float, then draw max_features * X.shape[1] features.
bootstrap (bool) – Whether samples are drawn with replacement. If False, sampling without replacement is performed.
bootstrap_features (bool) – Whether features are drawn with replacement.
oob_score (bool) – Whether to use out-of-bag samples to estimate the generalization error. Only available if bootstrap=True.
warm_start (bool) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
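n_jobs (int) – The number of jobs to run in parallel for both fit() and predict(). None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
random_state – random state; controls the random resampling of samples and features.
verbose – controls the verbosity when fitting and predicting.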
- class nodegam.gams.MyBagging.MyBaggingRegressorBase(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)#
Bases: MyBaggingMixin, BaggingRegressor
nodegam.gams.MyBaselines module#
GAM baselines adapted from https://github.com/zzzace2000/GAMs_models/.
- class nodegam.gams.MyBaselines.MyEBMPreprocessorTransformMixin(binning='uniform', **kwargs)#
Bases: object
- fit(X, y)#
- transform(X)#
- class nodegam.gams.MyBaselines.MyIndicatorLinearRegressionCV(binning='uniform', **kwargs)#
Bases: LabelEncodingRegressorMixin, MyGAMPlotMixinBase, MyEBMPreprocessorTransformMixin, MyIndicatorTransformMixin, MyTransformRegressionMixin, MyLinearRegressionCVBase
- class nodegam.gams.MyBaselines.MyIndicatorLogisticRegressionCV(binning='uniform', **kwargs)#
Bases: LabelEncodingClassifierMixin, MyGAMPlotMixinBase, MyEBMPreprocessorTransformMixin, MyIndicatorTransformMixin, MyTransformClassifierMixin, MyLogisticRegressionCVBase
- class nodegam.gams.MyBaselines.MyLinearRegressionCVBase(alphas=array([0.001, 0.00351119173, 0.0123284674, 0.0432876128, 0.151991108, 0.533669923, 1.87381742, 6.57933225, 23.101297, 81.1130831, 284.803587, 1000.0]), **kwargs)#
Bases: RidgeCV
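The default alphas listed in the MyLinearRegressionCVBase signature appear to be 12 values evenly spaced on a log10 scale from 1e-3 to 1e3. The quick check below compares the printed defaults with np.logspace; it only verifies the numbers and does not claim this is how the default is written in the source.

```python
import numpy as np

# Default alphas as printed in the MyLinearRegressionCVBase signature.
listed = np.array([0.001, 0.00351119173, 0.0123284674, 0.0432876128,
                   0.151991108, 0.533669923, 1.87381742, 6.57933225,
                   23.101297, 81.1130831, 284.803587, 1000.0])

# 12 values evenly spaced in log10 between 1e-3 and 1e3.
grid = np.logspace(-3, 3, 12)
print(np.allclose(listed, grid))  # True
```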
- class nodegam.gams.MyBaselines.MyLinearRegressionRidgeCV(*args, **kwargs)#
Bases: OnehotEncodingRegressorMixin, MyGAMPlotMixinBase, MyStandardizedTransformMixin, MyTransformRegressionMixin, MyLinearRegressionCVBase
- class nodegam.gams.MyBaselines.MyLogisticRegressionCV(*args, **kwargs)#
Bases: OnehotEncodingClassifierMixin, MyGAMPlotMixinBase, MyStandardizedTransformMixin, MyTransformClassifierMixin, MyLogisticRegressionCVBase
- class nodegam.gams.MyBaselines.MyLogisticRegressionCVBase(Cs=12, cv=5, penalty='l2', random_state=1377, solver='lbfgs', max_iter=3000, n_jobs=-1, **kwargs)#
Bases: LogisticRegressionCV
- class nodegam.gams.MyBaselines.MyMarginalLinearRegressionCV(binning='uniform', **kwargs)#
Bases: LabelEncodingRegressorMixin, MyGAMPlotMixinBase, MyEBMPreprocessorTransformMixin, MyMarginalizedTransformMixin, MyTransformRegressionMixin, MyLinearRegressionCVBase
- class nodegam.gams.MyBaselines.MyMarginalLogisticRegressionCV(binning='uniform', **kwargs)#
Bases: LabelEncodingClassifierMixin, MyGAMPlotMixinBase, MyEBMPreprocessorTransformMixin, MyMarginalizedTransformMixin, MyTransformClassifierMixin, MyLogisticRegressionCVBase
- class nodegam.gams.MyBaselines.MyMarginalizedTransformMixin(*args, **kwargs)#
Bases: object
- fit(X, y)#
- transform(X)#
- class nodegam.gams.MyBaselines.MyMaxMinTransformMixin(*args, **kwargs)#
Bases: object
- fit(X, y)#
- transform(X)#
- class nodegam.gams.MyBaselines.MyRandomForestClassifier(n_estimators=100, *, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, class_weight=None, ccp_alpha=0.0, max_samples=None)#
Bases: LabelEncodingClassifierMixin, MyCommonBase, RandomForestClassifier
- property is_GAM#
Returns True if it’s a GAM.
- class nodegam.gams.MyBaselines.MyRandomForestRegressor(n_estimators=100, *, criterion='squared_error', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, ccp_alpha=0.0, max_samples=None)#
Bases: LabelEncodingRegressorMixin, MyCommonBase, RandomForestRegressor
- property is_GAM#
Returns True if it’s a GAM.
- class nodegam.gams.MyBaselines.MyStandardizedTransformMixin(*args, **kwargs)#
Bases: object
- fit(X, y)#
- transform(X)#
- class nodegam.gams.MyBaselines.MyTransformClassifierMixin#
Bases: MyTransformMixin
- predict_proba(X)#
- class nodegam.gams.MyBaselines.MyTransformRegressionMixin#
Bases: MyTransformMixin
- predict(X)#
nodegam.gams.MyEBM module#
GAM baselines adapted from https://github.com/zzzace2000/GAMs_models/.
- class nodegam.gams.MyEBM.MyExplainableBoostingClassifier(feature_names=None, feature_types=None, max_bins=256, max_interaction_bins=32, binning='quantile', mains='all', interactions=10, outer_bags=8, inner_bags=0, learning_rate=0.01, validation_size=0.15, early_stopping_rounds=50, early_stopping_tolerance=0.0001, max_rounds=5000, min_samples_leaf=2, max_leaves=3, n_jobs=-2, random_state=42)#
Bases: LabelEncodingClassifierMixin, MyExplainableBoostingMixin, ExplainableBoostingClassifier
Explainable Boosting Classifier. The arguments will change in a future release; watch the changelog.
- Parameters
feature_names – List of feature names.
feature_types – List of feature types.
max_bins – Max number of bins per feature for pre-processing stage.
max_interaction_bins – Max number of bins per feature for pre-processing stage on interaction terms. Only used if interactions is non-zero.
binning – Method to bin values for pre-processing. Choose “uniform”, “quantile” or “quantile_humanized”.
mains – Features to be trained on in main effects stage. Either “all” or a list of feature indexes.
interactions – Interactions to be trained on. Either a list of lists of feature indices, or an integer for number of automatically detected interactions. Interactions are forcefully set to 0 for multiclass problems.
outer_bags – Number of outer bags.
inner_bags – Number of inner bags.
learning_rate – Learning rate for boosting.
validation_size – Validation set size for boosting.
early_stopping_rounds – Number of rounds of no improvement to trigger early stopping.
early_stopping_tolerance – Tolerance that dictates the smallest delta required to be considered an improvement.
max_rounds – Number of rounds for boosting.
min_samples_leaf – Minimum number of cases for tree splits used in boosting.
max_leaves – Maximum leaf nodes used in boosting.
n_jobs – Number of jobs to run in parallel.
random_state – Random state.
- class nodegam.gams.MyEBM.MyExplainableBoostingMixin#
Bases: MyCommonBase
- fit(X, y)#
- get_GAM_df(x_values_lookup=None)#
- class nodegam.gams.MyEBM.MyExplainableBoostingRegressor(feature_names=None, feature_types=None, max_bins=256, max_interaction_bins=32, binning='quantile', mains='all', interactions=10, outer_bags=8, inner_bags=0, learning_rate=0.01, validation_size=0.15, early_stopping_rounds=50, early_stopping_tolerance=0.0001, max_rounds=5000, min_samples_leaf=2, max_leaves=3, n_jobs=-2, random_state=42)#
Bases: LabelEncodingRegressorMixin, MyExplainableBoostingMixin, ExplainableBoostingRegressor
Explainable Boosting Regressor. The arguments will change in a future release; watch the changelog.
- Parameters
feature_names – List of feature names.
feature_types – List of feature types.
max_bins – Max number of bins per feature for pre-processing stage on main effects.
max_interaction_bins – Max number of bins per feature for pre-processing stage on interaction terms. Only used if interactions is non-zero.
binning – Method to bin values for pre-processing. Choose “uniform”, “quantile”, or “quantile_humanized”.
mains – Features to be trained on in main effects stage. Either “all” or a list of feature indexes.
interactions – Interactions to be trained on. Either a list of lists of feature indices, or an integer for number of automatically detected interactions.
outer_bags – Number of outer bags.
inner_bags – Number of inner bags.
learning_rate – Learning rate for boosting.
validation_size – Validation set size for boosting.
early_stopping_rounds – Number of rounds of no improvement to trigger early stopping.
early_stopping_tolerance – Tolerance that dictates the smallest delta required to be considered an improvement.
max_rounds – Number of rounds for boosting.
min_samples_leaf – Minimum number of cases for tree splits used in boosting.
max_leaves – Maximum leaf nodes used in boosting.
n_jobs – Number of jobs to run in parallel.
random_state – Random state.
- class nodegam.gams.MyEBM.MyOnehotExplainableBoostingClassifier(feature_names=None, feature_types=None, max_bins=256, max_interaction_bins=32, binning='quantile', mains='all', interactions=10, outer_bags=8, inner_bags=0, learning_rate=0.01, validation_size=0.15, early_stopping_rounds=50, early_stopping_tolerance=0.0001, max_rounds=5000, min_samples_leaf=2, max_leaves=3, n_jobs=-2, random_state=42)#
Bases: OnehotEncodingClassifierMixin, MyFitMixin, MyExplainableBoostingMixin, ExplainableBoostingClassifier
Explainable Boosting Classifier. The arguments will change in a future release; watch the changelog.
- Parameters
feature_names – List of feature names.
feature_types – List of feature types.
max_bins – Max number of bins per feature for pre-processing stage.
max_interaction_bins – Max number of bins per feature for pre-processing stage on interaction terms. Only used if interactions is non-zero.
binning – Method to bin values for pre-processing. Choose “uniform”, “quantile” or “quantile_humanized”.
mains – Features to be trained on in main effects stage. Either “all” or a list of feature indexes.
interactions – Interactions to be trained on. Either a list of lists of feature indices, or an integer for number of automatically detected interactions. Interactions are forcefully set to 0 for multiclass problems.
outer_bags – Number of outer bags.
inner_bags – Number of inner bags.
learning_rate – Learning rate for boosting.
validation_size – Validation set size for boosting.
early_stopping_rounds – Number of rounds of no improvement to trigger early stopping.
early_stopping_tolerance – Tolerance that dictates the smallest delta required to be considered an improvement.
max_rounds – Number of rounds for boosting.
min_samples_leaf – Minimum number of cases for tree splits used in boosting.
max_leaves – Maximum leaf nodes used in boosting.
n_jobs – Number of jobs to run in parallel.
random_state – Random state.
- class nodegam.gams.MyEBM.MyOnehotExplainableBoostingRegressor(feature_names=None, feature_types=None, max_bins=256, max_interaction_bins=32, binning='quantile', mains='all', interactions=10, outer_bags=8, inner_bags=0, learning_rate=0.01, validation_size=0.15, early_stopping_rounds=50, early_stopping_tolerance=0.0001, max_rounds=5000, min_samples_leaf=2, max_leaves=3, n_jobs=-2, random_state=42)#
Bases: OnehotEncodingRegressorMixin, MyFitMixin, MyExplainableBoostingMixin, ExplainableBoostingRegressor
Explainable Boosting Regressor. The arguments will change in a future release; watch the changelog.
- Parameters
feature_names – List of feature names.
feature_types – List of feature types.
max_bins – Max number of bins per feature for pre-processing stage on main effects.
max_interaction_bins – Max number of bins per feature for pre-processing stage on interaction terms. Only used if interactions is non-zero.
binning – Method to bin values for pre-processing. Choose “uniform”, “quantile”, or “quantile_humanized”.
mains – Features to be trained on in main effects stage. Either “all” or a list of feature indexes.
interactions – Interactions to be trained on. Either a list of lists of feature indices, or an integer for number of automatically detected interactions.
outer_bags – Number of outer bags.
inner_bags – Number of inner bags.
learning_rate – Learning rate for boosting.
validation_size – Validation set size for boosting.
early_stopping_rounds – Number of rounds of no improvement to trigger early stopping.
early_stopping_tolerance – Tolerance that dictates the smallest delta required to be considered an improvement.
max_rounds – Number of rounds for boosting.
min_samples_leaf – Minimum number of cases for tree splits used in boosting.
max_leaves – Maximum leaf nodes used in boosting.
n_jobs – Number of jobs to run in parallel.
random_state – Random state.
nodegam.gams.MySpline module#
GAM baselines adapted from https://github.com/zzzace2000/GAMs_models/.
- class nodegam.gams.MySpline.MySplineGAM(**kwargs)#
Bases: OnehotEncodingRegressorMixin, MySplineGAMBase
Spline GAM for regression, with one-hot encoding for categorical features.
- Parameters
search (bool) – if True, it searches the best lam penalty for the model.
search_lam (list or numpy array) – the range of lam penalty to search. If None, it is set to np.linspace(-3, 3, 15).
max_iter (int) – maximum number of training iterations.
n_splines (int) – number of splines. Default: 50.
cat_features (list) – the column names of the categorical features. Default: None.
- class nodegam.gams.MySpline.MySplineGAMBase(**kwargs)#
Bases: MyFitMixin, MySplineMixin
- predict(X)#
Predict regression target.
- Parameters
X (pandas dataframe) – inputs.
- Returns
prob (numpy array) – the prediction of shape [N].
- class nodegam.gams.MySpline.MySplineLogisticGAM(**kwargs)#
Bases: OnehotEncodingClassifierMixin, MySplineLogisticGAMBase
Logistic spline GAM for binary classification, with one-hot encoding for categorical features.
- Parameters
search (bool) – if True, it searches the best lam penalty for the model.
search_lam (list or numpy array) – the range of lam penalty to search. If None, it is set to np.linspace(-3, 3, 15).
max_iter (int) – maximum number of training iterations.
n_splines (int) – number of splines. Default: 50.
cat_features (list) – the column names of the categorical features. Default: None.
- class nodegam.gams.MySpline.MySplineLogisticGAMBase(**kwargs)#
Bases: MyFitMixin, MySplineMixin
- predict_proba(X)#
Predict class probabilities.
- Parameters
X (pandas dataframe) – inputs.
- Returns
prob (numpy array) – the probability of both classes with shape [N, 2].
- class nodegam.gams.MySpline.MySplineMixin(model_cls, search=True, search_lam=None, max_iter=500, n_splines=50, fit_binary_feat_as_factor_term=False, cat_features=None, **kwargs)#
Bases: MyExtractLogOddsMixin
- fit(X, y, **kwargs)#
- get_lam()#
Return the lambda penalty.
- get_params(*args, **kwargs)#
Return the parameters.
- set_params(*args, **kwargs)#
nodegam.gams.MyXGB module#
GAM baselines adapted from https://github.com/zzzace2000/GAMs_models/.
- class nodegam.gams.MyXGB.MyXGBClassifier(*args, **kwargs)#
Bases: MyGAMPlotMixinBase, MyXGBMixin
- predict_proba(data, ntree_limit=None, validate_features=True)#
- class nodegam.gams.MyXGB.MyXGBLabelEncodingClassifier(*args, **kwargs)#
- class nodegam.gams.MyXGB.MyXGBLabelEncodingRegressor(*args, **kwargs)#
- class nodegam.gams.MyXGB.MyXGBMixin(max_depth=1, random_state=1377, n_estimators=5000, n_jobs=-1, model_cls=<class 'xgboost.sklearn.XGBClassifier'>, validation_size=0.15, early_stopping_rounds=50, objective=None, **kwargs)#
Bases: object
- fit(X, y, verbose=False, **kwargs)#
- get_params(*args, **kwargs)#
- property is_GAM#
- set_params(*args, **kwargs)#
- class nodegam.gams.MyXGB.MyXGBOnehotClassifier(*args, **kwargs)#
Bases: OnehotEncodingClassifierMixin, MyXGBClassifier
XGB-GAM Classifier with one-hot encoding for categorical features.
- Parameters
max_depth=1 – The depth of each boosted tree. Should be kept at 1 so that the model remains a GAM.
random_state=1377 – Seed.
n_estimators=5000 – Maximum number of rounds to fit.
n_jobs=-1 – Set to -1 to use multi-thread parallel training.
validation_size=0.15 – The validation proportion.
early_stopping_rounds=50 – Early stopping rounds.
objective='binary:logistic' – The training objective.
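The reason max_depth=1 keeps XGB a GAM is that each depth-1 tree splits on a single feature, so the boosted sum can be regrouped into one function per feature. The sketch below uses hypothetical stumps stored as (feature_index, threshold, left_value, right_value) tuples; this is a toy format for illustration, not XGBoost's internal tree representation.

```python
# Hypothetical depth-1 trees: (feature_index, threshold, left_value, right_value).
stumps = [
    (0, 0.5, -0.3, 0.2),
    (1, 10.0, 0.1, -0.4),
    (0, 1.5, 0.0, 0.6),
]


def predict_margin(x, stumps, base_score=0.0):
    # Boosted sum of stumps: each tree looks at exactly one feature.
    return base_score + sum(left if x[f] < t else right for f, t, left, right in stumps)


def shape_function(feature, value, stumps):
    # f_j(value): the sum of only those stumps that split on `feature`.
    return sum(left if value < t else right
               for f, t, left, right in stumps if f == feature)


x = [1.0, 12.0]
additive = shape_function(0, x[0], stumps) + shape_function(1, x[1], stumps)
print(predict_margin(x, stumps), additive)  # the two values agree
```

With max_depth greater than 1, a single tree could split on x1 and then on x2, and its output would no longer separate into per-feature terms.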
- class nodegam.gams.MyXGB.MyXGBOnehotRegressor(*args, **kwargs)#
Bases: OnehotEncodingRegressorMixin, MyXGBRegressor
XGB-GAM Regressor with one-hot encoding for categorical features.
- Parameters
max_depth=1 – The depth of each boosted tree. Should be kept at 1 so that the model remains a GAM.
random_state=1377 – Seed.
n_estimators=5000 – Maximum number of rounds to fit.
n_jobs=-1 – Set to -1 to use multi-thread parallel training.
validation_size=0.15 – The validation proportion.
early_stopping_rounds=50 – Early stopping rounds.
objective='reg:squarederror' – The training objective.
- class nodegam.gams.MyXGB.MyXGBRegressor(*args, **kwargs)#
Bases: MyGAMPlotMixinBase, MyXGBMixin
- predict(data, output_margin=False, ntree_limit=None, validate_features=True)#
nodegam.gams.base module#
GAM baselines adapted from https://github.com/zzzace2000/GAMs_models/.
- class nodegam.gams.base.MyCommonBase#
Bases: object
- property is_GAM#
Returns True if it’s a GAM.
- property param_distributions#
- class nodegam.gams.base.MyExtractLogOddsMixin#
Bases: MyCommonBase
Extracts the output from the underlying model.
It calls the underlying model's prediction function to extract its log-odds. This is useful for black-box models whose per-feature (marginal) graphs are hard to extract directly; the graphs can then be obtained with get_GAM_df(x_values_lookup=None).
- Requirement
The class must implement one of: (1) predict(), for regression models; or (2) predict_proba(), for binary classification.
- get_GAM_df(x_values_lookup=None, center=True)#
Get the GAM dataframe.
- Parameters
x_values_lookup (dict) – the unique values of X for each feature. If passed, the outputs of the GAM model w.r.t. these x values are extracted. Useful to get a coarser graph when there are too many unique values in a feature.
center (bool) – if True, it centers each GAM graph to 0 by moving its mean to the intercept term.
- Returns
df (pandas dataframe) – a GAM dataframe where each row represents a GAM term with the inputs x, outputs y, and feature importance.
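For a model that really is additive, the idea of extracting graphs through the prediction function can be sketched by sweeping one feature while holding the others at a fixed reference row. The helper extract_shape, the median reference row and the unweighted centering below are assumptions for illustration; they are not necessarily how MyExtractLogOddsMixin or extract_GAM are implemented.

```python
import numpy as np


def extract_shape(predict_logodds, X, feature, x_values, center=True):
    """Sketch: recover one feature's graph from a black-box additive model.

    Sweeps `feature` over `x_values` while the other columns stay at a fixed
    reference row (here, the column medians of X). For an additive model the
    result equals f_feature(x) up to a constant.
    """
    ref = np.median(X, axis=0)
    grid = np.tile(ref, (len(x_values), 1))
    grid[:, feature] = x_values
    y = predict_logodds(grid)
    shift = 0.0
    if center:
        shift = y.mean()  # unweighted mean over the grid, moved into the intercept
        y = y - shift
    return y, shift


def black_box(Z):
    # Hypothetical additive model: log-odds = 1 + x0**2 + 3 * x1.
    return 1.0 + Z[:, 0] ** 2 + 3.0 * Z[:, 1]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
xs = np.linspace(-2, 2, 5)
y0, _ = extract_shape(black_box, X, 0, xs)
y1, _ = extract_shape(black_box, X, 1, xs)
print(y0)  # x**2 on the grid, shifted to mean 0
```

If the model has feature interactions, the extracted curve depends on the chosen reference row, which is one reason this approach is only exact for additive models.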
- class nodegam.gams.base.MyFitMixin#
Bases: object
A mixin that records the feature names and value counts when fit() is called.
It overrides fit() to record self.feature_names and self.X_value_counts. It then calls super().fit() if such a method exists; otherwise it returns silently.
- fit(X, y, **kwargs)#
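The cooperative fit() described for MyFitMixin can be sketched with super() and getattr. RecordFitMixin, Estimator, Wrapped and Standalone are hypothetical stand-ins, and X is a plain dict of columns only to keep the sketch dependency-free.

```python
from collections import Counter


class RecordFitMixin:
    """Hypothetical stand-in for the behavior described for MyFitMixin."""

    def fit(self, X, y, **kwargs):
        # X: dict mapping feature name -> list of values.
        self.feature_names = list(X)
        self.X_value_counts = {name: Counter(col) for name, col in X.items()}
        parent_fit = getattr(super(), "fit", None)
        if parent_fit is None:
            return self  # nothing further down the MRO defines fit()
        return parent_fit(X, y, **kwargs)


class Estimator:
    def fit(self, X, y, **kwargs):
        self.fitted_ = True
        return self


class Wrapped(RecordFitMixin, Estimator):  # the mixin must come first in the bases
    pass


class Standalone(RecordFitMixin):
    pass


X = {"age": [30, 40, 30], "sex": [0, 1, 1]}
m = Wrapped().fit(X, [0, 1, 0])
s = Standalone().fit(X, [0, 1, 0])
print(m.feature_names, m.X_value_counts["age"][30], m.fitted_)
```

Listing the mixin before the estimator class matters: Python's method resolution order then runs the recording step first and forwards to the real fit().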
- class nodegam.gams.base.MyGAMPlotMixinBase#
Bases: MyFitMixin, MyExtractLogOddsMixin
nodegam.gams.general_utils module#
GAM baselines adapted from https://github.com/zzzace2000/GAMs_models/.
- class nodegam.gams.general_utils.Timer(name, remove_start_msg=True)#
Bases: object
- nodegam.gams.general_utils.output_csv(the_path, data_dict, order=None, delimiter=',')#
- nodegam.gams.general_utils.vector_in(vec, names)#
nodegam.gams.model_utils module#
GAM baselines adapted from https://github.com/zzzace2000/GAMs_models/.
- nodegam.gams.model_utils.get_ebm_model(model_name, problem, random_state=1377, **kwargs)#
- nodegam.gams.model_utils.get_ilr_model(model_name, problem, random_state=1377, **kwargs)#
Get the indicator logistic regression (ILR) model.
- nodegam.gams.model_utils.get_lr_model(model_name, problem, random_state=1377, **kwargs)#
- nodegam.gams.model_utils.get_mlr_model(model_name, problem, random_state=1377, **kwargs)#
Get the marginal logistic regression (MLR) model.
- nodegam.gams.model_utils.get_model(X_train, y_train, problem, model_name, random_state=1377, **kwargs)#
- nodegam.gams.model_utils.get_rf_model(model_name, problem, random_state=1377, **kwargs)#
- nodegam.gams.model_utils.get_spline_model(model_name, problem, random_state=1377, **kwargs)#
- nodegam.gams.model_utils.get_xgb_model(model_name, problem, random_state=1377, **kwargs)#
nodegam.gams.utils module#
GAM baselines adapted from https://github.com/zzzace2000/GAMs_models/.
- class nodegam.gams.utils.DotDict(*args, **kwargs)#
Bases: dict
Dot-notation access to dictionary attributes.
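A dict subclass with dot-notation access is commonly written by aliasing the attribute hooks to dict methods. The sketch below follows that common recipe; it is an assumption about the general technique, not a copy of nodegam's DotDict.

```python
class DotDict(dict):
    """Sketch of a dict with attribute access (common recipe, assumed)."""

    __getattr__ = dict.get          # missing keys read as None instead of raising
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__


cfg = DotDict(lr=0.01, n_estimators=10)
cfg.seed = 1377                     # stored as a regular key
print(cfg.lr, cfg["seed"], cfg.missing)
```

One trade-off of aliasing __getattr__ to dict.get is that typos silently return None rather than raising AttributeError.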
- class nodegam.gams.utils.Timer(name, remove_start_msg=True)#
Bases: object
- nodegam.gams.utils.bin_data(X, max_n_bins=256)#
Apply quantile binning to each feature of X.
- Parameters
X – the pandas table or numpy array with shape as [N, D] where N is number of samples and D is number of features.
max_n_bins – the maximum number of bins per feature. Default: 256.
- Returns
Binned X with the same input type (pandas table or numpy array)
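Per-feature quantile binning can be sketched with NumPy. quantile_bin is hypothetical and handles NumPy input only; keeping low-cardinality features unchanged and replacing each value with its bin's left edge are assumptions, since the docstring does not say how binned values are represented.

```python
import numpy as np


def quantile_bin(X, max_n_bins=256):
    """Sketch of per-feature quantile binning (NumPy input only)."""
    X = np.asarray(X, dtype=float)
    out = np.empty_like(X)
    for j in range(X.shape[1]):
        col = X[:, j]
        if len(np.unique(col)) <= max_n_bins:
            out[:, j] = col  # already few enough distinct values: keep as is
            continue
        # Edges at evenly spaced quantiles; duplicate edges collapse.
        edges = np.unique(np.quantile(col, np.linspace(0, 1, max_n_bins + 1)))
        # Largest edge <= value, clipped so the maximum joins the last bin.
        idx = np.searchsorted(edges, col, side="right") - 1
        idx = np.clip(idx, 0, len(edges) - 2)
        out[:, j] = edges[idx]  # represent each value by its bin's left edge
    return out


X = np.column_stack([np.arange(1000), np.tile([0, 1], 500)])
Xb = quantile_bin(X, max_n_bins=10)
print(len(np.unique(Xb[:, 0])), np.unique(Xb[:, 1]))
```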
- nodegam.gams.utils.extract_GAM(X, predict_fn, predict_type='binary_logodds', max_n_bins=None)#
Extract a GAM graph from a model's prediction function.
- Parameters
X – input 2D array.
predict_fn – the model prediction function.
predict_type – one of “binary_logodds”, “binary_prob” or “regression”; it should match the kind of output that predict_fn returns.
max_n_bins – Default: None (no binning). If set, values are binned into this many buckets to reduce clutter in the resulting GAM graph. It should be large enough not to change the predictions much.
- nodegam.gams.utils.get_GAM_df_by_models(models, x_values_lookup=None, aggregate=True)#
- nodegam.gams.utils.get_X_values_counts(X, feature_names=None)#
- nodegam.gams.utils.get_x_values_lookup(X, feature_names=None)#
Get x values lookup.
- Parameters
X – input features. Numpy array or pandas dataframe.
- Returns
x_values_lookup – a dictionary with key as feature name and the value is all unique values of that feature.
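Building such a lookup amounts to taking the sorted unique values of each column. x_values_lookup_sketch and its default "f0", "f1", ... names for NumPy input are hypothetical; the data reuse the BUN/cancer example from the MyBaggingMixin docstring.

```python
import numpy as np
import pandas as pd


def x_values_lookup_sketch(X, feature_names=None):
    """Map each feature name to its sorted unique values (sketch)."""
    if isinstance(X, pd.DataFrame):
        feature_names = list(X.columns)  # a DataFrame supplies its own names
        columns = [X[c].to_numpy() for c in X.columns]
    else:
        X = np.asarray(X)
        if feature_names is None:
            feature_names = [f"f{j}" for j in range(X.shape[1])]  # hypothetical naming
        columns = [X[:, j] for j in range(X.shape[1])]
    # np.unique returns the unique values in increasing order.
    return {name: np.unique(col) for name, col in zip(feature_names, columns)}


df = pd.DataFrame({"BUN": [3.1, 1.1, 5.0, 1.5, 1.1], "cancer": [0, 1, 0, 0, 1]})
lookup = x_values_lookup_sketch(df)
print(lookup)
```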
- nodegam.gams.utils.my_interpolate(x, y, new_x)#
Handle edge cases for interpolation.
- nodegam.gams.utils.predict_score(model, X)#
- nodegam.gams.utils.predict_score_by_df(GAM_plot_df, X)#
- nodegam.gams.utils.predict_score_with_each_feature(model, X)#
- nodegam.gams.utils.predict_score_with_each_feature_by_df(GAM_plot_df, X, sum_directly=False)#
- nodegam.gams.utils.sigmoid(x)#
Numerically stable sigmoid function.
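A common way to make the sigmoid numerically stable is to evaluate a different but equivalent formula for negative and non-negative inputs, so exp() is only ever called on non-positive arguments. The sketch below shows that standard technique; it is not necessarily the exact code in nodegam.gams.utils.

```python
import numpy as np


def stable_sigmoid(x):
    """Numerically stable logistic sigmoid (standard two-branch formulation)."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) <= 1, so 1 / (1 + exp(-x)) cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, use exp(x) / (1 + exp(x)); exp(x) < 1 here, so again no overflow.
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out


vals = stable_sigmoid([-1000.0, 0.0, 1000.0])
print(vals)  # approximately [0, 0.5, 1], with no overflow warnings
```

The naive 1 / (1 + np.exp(-x)) gives the same values here but triggers an overflow warning at x = -1000, because np.exp(1000) is inf.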
Module contents#
GAM baselines adapted from https://github.com/zzzace2000/GAMs_models/.