Metadata-Version: 1.1
Name: sklearn-gbmi
Version: 1.0.0
Summary: Compute Friedman and Popescu's H statistics, in order to look for interactions among variables in scikit-learn gradient-boosting models.
Home-page: https://github.com/ralphhaygood/sklearn-gbmi
Author: Ralph Haygood
Author-email: ralph@ralphhaygood.org
License: MIT
Download-URL: https://github.com/ralphhaygood/sklearn-gbmi/tarball/1.0.0
Description: sklearn-gbmi: scikit-learn gradient-boosting-model interactions
        ===============================================================
        
        This distribution provides a Python module for computing Friedman and Popescu's *H* statistics, in order to look for
        interactions among variables in scikit-learn gradient-boosting models
        (http://scikit-learn.org/stable/modules/ensemble.html#gradient-tree-boosting).
        
        See Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.*
        **2**:916-954, http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046, section 8.1.
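        
        For reference, the two-variable statistic of section 8.1 is built from partial dependence functions of the fitted
        model *F*, estimated by averaging over the *N* observations and each centered to have mean zero. Transcribed from the
        paper, with notation lightly adapted:
        
        ```latex
        H_{jk}^2 = \frac{\sum_{i=1}^{N} \left[ \hat{F}_{jk}(x_{ij}, x_{ik}) - \hat{F}_j(x_{ij}) - \hat{F}_k(x_{ik}) \right]^2}
                        {\sum_{i=1}^{N} \hat{F}_{jk}^2(x_{ij}, x_{ik})},
        \qquad
        \hat{F}_s(\mathbf{x}_s) = \frac{1}{N} \sum_{i=1}^{N} F(\mathbf{x}_s, \mathbf{x}_{i \setminus s})
        ```
        
        Here the second formula averages *F* over the observed values of the variables not in the set *s*. Consistent with
        *H* varying from 0 to 1, the *H* discussed below is taken to be the square root of this ratio.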
        
        
        Installation
        ------------
        
            pip install sklearn-gbmi
        
        
        Usage
        -----
        
        Given a scikit-learn gradient-boosting model `gbm` that has been fitted to a NumPy array or pandas data frame
        `array_or_frame`, and a list `indices_or_columns` of indices of columns of the array or columns of the data frame,
        the *H* statistic of the variables specified by `indices_or_columns` can be computed via
        
            from sklearn_gbmi import h
        
            h(gbm, array_or_frame, indices_or_columns)
        
        Alternatively, the two-variable *H* statistic of each pair of the variables specified by `indices_or_columns` can be
        computed via
        
            from sklearn_gbmi import h_all_pairs
        
            h_all_pairs(gbm, array_or_frame, indices_or_columns)
        
        (Compared to iteratively calling `h`, calling `h_all_pairs` avoids redundant computations.)
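        
        A likely source of the savings, assumed here rather than taken from the module's code, is that the one-variable
        partial dependence functions can be shared among pairs. Each pairwise statistic needs F_jk, F_j and F_k; calling `h`
        on every pair recomputes F_j for every pair containing *j*, whereas handling all pairs together lets each F_j be
        computed once. The hypothetical function below, which is not part of `sklearn_gbmi`, just counts the partial
        dependence functions needed under each approach:
        
        ```python
        def partial_dependence_counts(n_vars):
            """Partial dependence functions needed for all pairwise H statistics of n_vars variables.
        
            Returns (separate, shared): the count when each pair is handled by its own call,
            recomputing F_j and F_k every time, and the count when each one-variable function
            F_j is computed once and reused across all pairs containing j.
            """
            n_pairs = n_vars * (n_vars - 1) // 2
            separate = 3 * n_pairs       # F_jk, F_j and F_k for every pair
            shared = n_pairs + n_vars    # each F_jk once, plus each F_j once
            return separate, shared
        
        
        print(partial_dependence_counts(10))  # (135, 55)
        ```
        
        For 10 variables, that is 135 partial dependence functions versus 55, and the gap grows with the number of variables.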
        
        `indices_or_columns` is optional, with default value `'all'`. If it is `'all'`, then all columns of `array_or_frame` are
        used.
        
        `NaN` is returned if a computation is spoiled by weak main effects and rounding errors.
        
        *H* varies from 0 to 1. The larger *H*, the stronger the evidence for an interaction among the variables.
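        
        To make the definition concrete, here is a minimal NumPy sketch of the two-variable statistic for an arbitrary
        prediction function `f` that maps a 2-D array of observations to a 1-D array of predictions (for a fitted regressor,
        `gbm.predict` could play that role). It estimates each partial dependence function by brute-force averaging over the
        data, which costs one call to `f` per observation and is only practical for small data sets; it illustrates the
        formula and is not necessarily how this module computes *H*. The names `centered_partial_dependence` and `h_pair`
        are hypothetical, not part of `sklearn_gbmi`.
        
        ```python
        import numpy as np
        
        
        def centered_partial_dependence(f, X, cols):
            """Partial dependence of f on the columns `cols`, evaluated at each row of X.
        
            For row i, the columns in `cols` are fixed at X[i, cols] while the other
            columns keep their observed values, and the predictions are averaged over
            all rows. The result is centered to have mean zero.
            """
            values = np.empty(X.shape[0])
            for i in range(X.shape[0]):
                X_fixed = X.copy()
                X_fixed[:, cols] = X[i, cols]
                values[i] = f(X_fixed).mean()
            return values - values.mean()
        
        
        def h_pair(f, X, j, k):
            """Two-variable H statistic of columns j and k (square root of the ratio H_jk^2)."""
            f_jk = centered_partial_dependence(f, X, [j, k])
            f_j = centered_partial_dependence(f, X, [j])
            f_k = centered_partial_dependence(f, X, [k])
            numer = np.sum((f_jk - f_j - f_k) ** 2)
            denom = np.sum(f_jk ** 2)
            # numer >= denom can happen when main effects are weak and rounding errors
            # intrude; report NaN rather than an H outside [0, 1).
            return float(np.sqrt(numer / denom)) if numer < denom else float("nan")
        
        
        def additive(data):
            return data[:, 0] + data[:, 1]
        
        
        def interacting(data):
            return data[:, 0] + data[:, 1] + data[:, 0] * data[:, 1]
        
        
        # Toy data: every point of an 11 x 11 grid on [-1, 1] x [-1, 1].
        grid = np.linspace(-1.0, 1.0, 11)
        X = np.array([[a, b] for a in grid for b in grid])
        
        print(h_pair(additive, X, 0, 1))     # essentially 0
        print(h_pair(interacting, X, 0, 1))  # about 0.408, i.e. sqrt(1/6)
        ```
        
        On this grid the additive function gives *H* essentially 0, while the function with an `x0 * x1` term gives *H* equal
        to sqrt(1/6), because the interaction term accounts for one sixth of the sum of squares of the two-variable partial
        dependence. Dropping the main effects from `interacting` leaves the one-variable partial dependences at rounding-error
        size, and the ratio can then land at or above 1, which is the `NaN` case described above.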
        
        
        Example
        -------
        
        See the Jupyter notebook example.ipynb (https://github.com/ralphhaygood/sklearn-gbmi/blob/master/example.ipynb) for a
        complete example of how to use the module.
        
        
        Notes
        -----
        
        1. Per Friedman and Popescu, only variables with strong main effects should be examined for interactions. Once `gbm`
        has been fitted, `gbm.feature_importances_` gives a rough guide to the strengths of main effects.
        
        2. Per Friedman and Popescu, collinearity among variables can lead to interactions in `gbm` that are not present in the
        target function. To forestall such spurious interactions, check for strong correlations among variables before fitting
        `gbm`.
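        
        Both checks are easy to script. The sketch below uses two hypothetical helpers, not part of `sklearn_gbmi`:
        `top_k_indices` picks the *k* variables with the largest importances (for example, from `gbm.feature_importances_`),
        and `strongly_correlated_pairs` lists pairs of columns whose absolute Pearson correlation exceeds a threshold. The
        cutoffs `k` and `threshold=0.8` are arbitrary illustrative choices, not values from Friedman and Popescu, and Pearson
        correlation only detects linear dependence.
        
        ```python
        from itertools import combinations
        
        import numpy as np
        
        
        def top_k_indices(importances, k):
            """Indices of the k largest importances, largest first."""
            return sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)[:k]
        
        
        def strongly_correlated_pairs(X, threshold=0.8):
            """(j, k, r) for each pair of columns of X whose Pearson correlation r has |r| > threshold."""
            r = np.corrcoef(X, rowvar=False)  # columns are variables
            return [
                (j, k, float(r[j, k]))
                for j, k in combinations(range(X.shape[1]), 2)
                if abs(r[j, k]) > threshold
            ]
        
        
        # Toy data: column 1 is nearly a multiple of column 0; column 2 is independent.
        rng = np.random.default_rng(0)
        x0 = rng.normal(size=500)
        X = np.column_stack([x0, 2.0 * x0 + 0.01 * rng.normal(size=500), rng.normal(size=500)])
        
        print(top_k_indices([0.1, 0.5, 0.05, 0.35], 2))  # [1, 3]
        print(strongly_correlated_pairs(X))               # a single pair: columns 0 and 1
        ```
        
        With a model fitted to an array `X`, `h_all_pairs(gbm, X, top_k_indices(gbm.feature_importances_, 5))` would then
        examine only the five most important variables; for a data frame, the indices would first need converting to column
        names, e.g. via `frame.columns`.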
        
Keywords: boosted,boosting,data science,Friedman,gradient boosted,gradient boosting,H statistic,interaction,machine learning,Popescu,scikit learn,sklearn
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Topic :: Scientific/Engineering
