Metadata-Version: 1.1
Name: sklearn-gbmi
Version: 1.0.4
Summary: Compute Friedman and Popescu's H statistics, in order to look for interactions among variables in scikit-learn gradient-boosting models.
Home-page: https://github.com/ralphhaygood/sklearn-gbmi
Author: Ralph Haygood
Author-email: ralph@ralphhaygood.org
License: MIT
Download-URL: https://github.com/ralphhaygood/sklearn-gbmi/tarball/1.0.4
Description: **Warning:** this package has reached the end of its life.
        It is incompatible with recent versions of scikit-learn, and I (Ralph Haygood) have neither the time nor the interest to solve this problem.
        If you wish to have a go at it yourself, by all means fork the GitHub repository and proceed.
        
        The underlying problem may be one or more incompatibilities between recent versions of scikit-learn or NumPy and a large body (25,644 lines) of C code not written by me. I was forced to add this code to the package when the maintainers of scikit-learn replaced the `sklearn.ensemble.partial_dependence.partial_dependence` function with the not-fully-equivalent `sklearn.inspection.partial_dependence` function.
        This package depended on the grid argument of the former, which is missing from the latter.
        I worked around this defect by extracting the code that had implemented the grid argument and integrating it into this package.
        However, this code may well depend on internal characteristics of scikit-learn or NumPy that have changed since then.
        
        
        sklearn-gbmi: scikit-learn gradient-boosting-model interactions
        ===============================================================
        
        This package provides a Python module for computing Friedman and Popescu's *H* statistics, in order to look for
        interactions among variables in scikit-learn gradient-boosting models
        (http://scikit-learn.org/stable/modules/ensemble.html#gradient-tree-boosting).
        
        See Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.*
        **2**:916-954, http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046, Section 8.1.
        
        
        Installation
        ------------
        
            pip install sklearn-gbmi
        
        On some systems, if you wish to use this package with Python 3, then you must install with `pip3` rather than `pip`.
        
        In case of difficulties with installing or using this package, consult "Advanced installation" below.
        
        
        Usage
        -----
        
        Suppose `gbm` is a scikit-learn gradient-boosting model that has been fitted to `array_or_frame`, a NumPy array or
        pandas data frame, and `indices_or_columns` is a list of indices of columns of the array or of columns of the data
        frame. The *H* statistic of the variables specified by `indices_or_columns` can then be computed via
        
            from sklearn_gbmi import h
        
            h(gbm, array_or_frame, indices_or_columns)
        
        Alternatively, the two-variable *H* statistic of each pair of variables specified by `indices_or_columns` can be
        computed via
        
            from sklearn_gbmi import h_all_pairs
        
            h_all_pairs(gbm, array_or_frame, indices_or_columns)
        
        (Compared to iteratively calling `h`, calling `h_all_pairs` avoids redundant computations.)
        
        `indices_or_columns` is optional, with default value `'all'`. If it is `'all'`, then all columns of `array_or_frame` are
        used.
        
        `NaN` is returned if a computation is spoiled by weak main effects and rounding errors.
        
        *H* varies from 0 to 1. The larger *H* is, the stronger the evidence for an interaction among the variables.
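        
        For orientation, the two-variable case can be sketched directly from Friedman and Popescu's definition. With partial
        dependence functions centered to have mean zero over the data, *H* squared for variables *j* and *k* is the sum over
        observations of (*F_jk* - *F_j* - *F_k*) squared, divided by the sum of *F_jk* squared. The following is a brute-force
        illustration using only NumPy, not this package's implementation. The names `centered_partial_dependence` and `h_pair`
        and the toy prediction functions are hypothetical; any callable that maps an array of rows to predictions (such as a
        fitted model's `predict` method) could be substituted.
        
        ```python
        import numpy as np
        
        
        def centered_partial_dependence(predict, X, cols):
            """Partial dependence of `predict` on `cols` at each row of X, centered to mean zero."""
            values = np.empty(X.shape[0])
            for i in range(X.shape[0]):
                X_mod = X.copy()
                # Fix the chosen columns at row i's values, then average over the other columns.
                X_mod[:, cols] = X[i, cols]
                values[i] = predict(X_mod).mean()
            return values - values.mean()
        
        
        def h_pair(predict, X, j, k):
            """Two-variable H statistic of columns j and k (hypothetical helper; O(n^2) predictions)."""
            f_jk = centered_partial_dependence(predict, X, [j, k])
            f_j = centered_partial_dependence(predict, X, [j])
            f_k = centered_partial_dependence(predict, X, [k])
            return np.sqrt(np.sum((f_jk - f_j - f_k) ** 2) / np.sum(f_jk ** 2))
        
        
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 3))
        
        h_additive = h_pair(lambda A: A[:, 0] + A[:, 1], X, 0, 1)  # no interaction
        h_product = h_pair(lambda A: A[:, 0] * A[:, 1], X, 0, 1)   # pure interaction
        print(h_additive, h_product)
        ```
        
        On the additive toy function the result should be zero up to rounding error, while on the product function it should
        be close to 1.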
        
        
        Example
        -------
        
        See the Jupyter notebook example.ipynb (https://github.com/ralphhaygood/sklearn-gbmi/blob/master/example.ipynb) for a
        complete example of how to use this package.
        
        
        Notes
        -----
        
        1. Per Friedman and Popescu, only variables with strong main effects should be examined for interactions. Strengths of
        main effects are available as `gbm.feature_importances_` once `gbm` has been fitted.
        
        2. Per Friedman and Popescu, collinearity among variables can lead to interactions in `gbm` that are not present in the
        target function. To forestall such spurious interactions, check for strong correlations among variables before fitting
        `gbm`.
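        
        The correlation check in note 2 could look something like the following sketch, which uses only NumPy. The helper name
        `strongly_correlated_pairs` and the threshold of 0.9 are assumptions chosen for illustration, not recommendations from
        Friedman and Popescu, and Pearson correlation only detects linear relationships.
        
        ```python
        import numpy as np
        
        
        def strongly_correlated_pairs(X, threshold=0.9):
            """Return (i, j, r) for column pairs of X whose absolute Pearson correlation exceeds `threshold`."""
            corr = np.corrcoef(X, rowvar=False)  # treat columns as variables
            pairs = []
            for i in range(corr.shape[0]):
                for j in range(i + 1, corr.shape[1]):
                    if abs(corr[i, j]) > threshold:
                        pairs.append((i, j, float(corr[i, j])))
            return pairs
        
        
        # Synthetic data: column 2 is nearly a copy of column 0.
        rng = np.random.default_rng(0)
        x0 = rng.normal(size=500)
        x1 = rng.normal(size=500)
        x2 = x0 + 0.05 * rng.normal(size=500)
        X = np.column_stack([x0, x1, x2])
        
        flagged = strongly_correlated_pairs(X)
        print(flagged)
        ```
        
        Any pair flagged this way is a candidate for dropping or combining variables before fitting `gbm`.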
        
        
        Advanced installation
        ---------------------
        
        Installing this package requires NumPy, so if installation fails with a complaint that NumPy is missing, add it to the
        install command:
        
            pip install numpy sklearn-gbmi
        
        For performance, this package is partly implemented using Cython (C extensions for Python). It includes a C file that
        was generated by Cython, which is compiled for your system when you install the package. Normally, this C file is fine,
        but occasionally, it may not compile, or the result may not run. In the first case, installing the package fails, while
        in the second case, using the package fails, typically with a cryptic error message; for example:
        
            ValueError: sklearn.tree._criterion.Criterion size changed, may indicate binary incompatibility.
        
        In such a case, you may still be able to install and use the package by regenerating the C file, as follows.
        
        First, if this package is installed (i.e., installation succeeds, but usage fails), uninstall it:
        
            pip uninstall sklearn-gbmi
        
        Then, install Cython:
        
            pip install cython
        
        Next, set the environment variable `USE_CYTHONIZE` to 1. For bash and similar shells:
        
            export USE_CYTHONIZE=1
        
        For csh and similar shells:
        
            setenv USE_CYTHONIZE 1
        
        Finally, reinstall this package:
        
            pip install sklearn-gbmi --no-cache-dir
        
        The C file should then be regenerated and compiled for your system, which will hopefully make this package usable.
        
Keywords: boosted,boosting,data science,Friedman,gradient boosted,gradient boosting,H statistic,interaction,machine learning,Popescu,scikit learn,sklearn
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Topic :: Scientific/Engineering
