Backlog

  • Interface: Need to determine options (SKLearn transformer, custom interface, etc)
  • Interface: Need to outline functionality
  • Boolean: Need to determine if it’ll be handled as numerical or categorical
  • Pip installable: Need to determine level of effort

Interface: Need to determine options (SKLearn transformer, custom interface, etc)

Options:

  • SKLearn transformer
  • Pandas-SKLearn style module
  • Custom module

Requirements:

  • Be able to expand one column to many columns (datetime)
  • Ability to use SKLearn transformers
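
The first requirement (one column in, many columns out) can be sketched with the standard library alone. This is only an illustration: the function name expand_datetime, the column name "signup", and the choice of output fields are assumptions, not part of the planned interface.

```python
from datetime import datetime


def expand_datetime(column_name, values):
    """Expand one column of datetimes into several numeric columns."""
    return {
        f"{column_name}_year": [v.year for v in values],
        f"{column_name}_month": [v.month for v in values],
        f"{column_name}_day": [v.day for v in values],
        f"{column_name}_weekday": [v.weekday() for v in values],  # Monday == 0
        f"{column_name}_hour": [v.hour for v in values],
    }


# Hypothetical sample data: a single "signup" column with two rows.
signup = [datetime(2020, 2, 29, 13, 5), datetime(2021, 12, 1, 8, 0)]
expanded = expand_datetime("signup", signup)

for name, column in expanded.items():
    print(name, column)
```

Whichever interface is chosen would need to accept a transformer like this, whose output has more columns than its input.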

SKLearn transformer

  • Inputs would have to be NumPy arrays (not pandas dataframes)
  • Tighter integration to SKLearn infrastructure

Pandas-SKLearn style module

  • Can have pandas dataframe inputs
  • Can have multiple column inputs
  • Can apply multiple transformations to the same column (many to one relationship)
  • Can leave some columns untransformed
  • Can use default transformer

Custom module

  • Can sit on top of Pandas-SKLearn
  • Can mimic SKLearn fit and transform interface
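
A rough sketch of what mimicking the fit and transform interface could look like. It uses plain Python dicts and lists instead of pandas or Pandas-SKLearn, purely to show the shape of the interface. ColumnMapper, MinMaxScaler, and the column names are hypothetical stand-ins, not existing library classes.

```python
class MinMaxScaler:
    """Toy stand-in for a scaler: learns min and max in fit()."""

    def fit(self, values):
        self.low_, self.high_ = min(values), max(values)
        return self

    def transform(self, values):
        span = (self.high_ - self.low_) or 1.0  # avoid dividing by zero
        return [(v - self.low_) / span for v in values]


class ColumnMapper:
    """Hypothetical wrapper: one transformer per column name."""

    def __init__(self, transformers, default=None):
        self.transformers = transformers  # {column name: transformer or None}
        self.default = default  # factory used for unlisted columns, or None

    def fit(self, table):
        self.fitted_ = {}
        for name, values in table.items():
            if name in self.transformers:
                step = self.transformers[name]
            else:
                step = self.default() if self.default else None
            # None means "leave this column untransformed".
            self.fitted_[name] = step.fit(values) if step is not None else None
        return self

    def transform(self, table):
        out = {}
        for name, values in table.items():
            step = self.fitted_[name]
            out[name] = list(values) if step is None else step.transform(values)
        return out

    def fit_transform(self, table):
        return self.fit(table).transform(table)


table = {"age": [20, 30, 40], "income": [10.0, 20.0, 50.0], "id": [1, 2, 3]}
mapper = ColumnMapper({"age": MinMaxScaler(), "id": None}, default=MinMaxScaler)
result = mapper.fit_transform(table)
print(result)
```

Here "age" gets an explicit transformer, "id" is passed through untouched, and "income" falls back to the default transformer.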

Boolean: Need to determine if it’ll be handled as numerical or categorical

Numerical

  • Less compute time
  • Reduced complexity

Categorical

  • Embedding representing different values
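
The two options can be compared side by side in a small sketch. The function names and the embedding values below are made up for illustration; in practice an embedding table would be learned by the model, not hard-coded.

```python
def encode_numerical(values):
    """Numerical handling: each boolean becomes a single 0.0 / 1.0 feature."""
    return [1.0 if v else 0.0 for v in values]


def encode_categorical(values):
    """Categorical handling: each boolean becomes an index into an embedding table."""
    vocabulary = {False: 0, True: 1}
    return [vocabulary[v] for v in values]


# Hypothetical 2-dimensional embedding table, one row per category.
embedding_table = [[0.1, -0.3], [0.7, 0.2]]

flags = [True, False, True]
numeric = encode_numerical(flags)
indices = encode_categorical(flags)
embedded = [embedding_table[i] for i in indices]

print(numeric)
print(embedded)
```

The numerical path yields one value per row with no extra parameters, while the categorical path adds a lookup step and an embedding table to maintain.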

Pip installable: Need to determine level of effort

Lit review

  • PMOTW: Setuptools
  • Common library: setuptools
  • PMOTW: distutils
  • Common library: distutils
  • Blog review

PMOTW: Setuptools

  • Unavailable

Common library: setuptools

  • Designed to facilitate packaging Python projects
  • Enhancement to distutils

Highlights

Superset of distutils
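
Because setuptools extends distutils, a setup script keeps the same overall shape and gains extras such as dependency declaration through install_requires. A minimal sketch follows; the package name, version, description, and dependency list are placeholders, not decisions made in these notes.

```python
# setup.py: hypothetical package metadata; every value below is a placeholder.
from setuptools import setup, find_packages

setup(
    name="example_preprocessor",
    version="0.1.0",
    description="Placeholder description",
    packages=find_packages(),  # discovers packages that contain an __init__.py
    # Assumed dependencies, listed only to show how they are declared.
    install_requires=["pandas", "scikit-learn", "sklearn-pandas"],
)
```

With a file like this at the project root, running "pip install ." from that directory installs the package along with the declared dependencies.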

PMOTW: distutils

  • Unavailable

Common library: distutils

Intro

  • Setup script
  • Source distribution
  • Binary distributions

Setup script

  • Handles packaging
  • Not aware of package managers

PyPI

  • Registering
  • Upload

Blogs

Marthall

  • Strong, convenient walk through

so

  • Rambling

Scott Torborg

  • Strong advanced discussion
  • How to declare dependencies

Decisions

  • Interface: Will use custom interface, similar to SKLearn, with Pandas-SKLearn under the hood.
  • Pip installable: Will move forward w/ setuptools