Automater

class Automater.Automater(numerical_vars=[], categorical_vars=[], boolean_vars=[], datetime_vars=[], text_vars=[], non_transformed_vars=[], response_var=None, df_out=False)

Bases: object

__init__(numerical_vars=[], categorical_vars=[], boolean_vars=[], datetime_vars=[], text_vars=[], non_transformed_vars=[], response_var=None, df_out=False)

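The constructor sorts column names into lists by variable type. Several private methods below take a `variable_type_dict` ({str:[str]}), which is presumably built from these lists. Below is a minimal sketch of that grouping. The helper `build_variable_type_dict`, its key names, and the rule that a column may belong to only one type are all hypothetical, not taken from the library.

```python
def build_variable_type_dict(numerical_vars=None, categorical_vars=None,
                             boolean_vars=None, datetime_vars=None,
                             text_vars=None, non_transformed_vars=None):
    """Group column names by variable type (hypothetical helper)."""
    groups = {
        'numerical_vars': numerical_vars,
        'categorical_vars': categorical_vars,
        'boolean_vars': boolean_vars,
        'datetime_vars': datetime_vars,
        'text_vars': text_vars,
        'non_transformed_vars': non_transformed_vars,
    }
    # Replace None with an empty list, and copy each list so later
    # mutation by the caller does not affect the stored grouping
    variable_type_dict = {key: list(value or []) for key, value in groups.items()}

    # Assumption: each column should belong to exactly one type
    seen = {}
    for var_type, variables in variable_type_dict.items():
        for variable in variables:
            if variable in seen:
                raise ValueError(
                    f'{variable!r} listed as both {seen[variable]} and {var_type}')
            seen[variable] = var_type
    return variable_type_dict


print(build_variable_type_dict(numerical_vars=['age'], categorical_vars=['city']))
```

The documented signature uses `[]` as default values. Python evaluates default arguments only once, so a single list object would be shared across calls. The sketch uses `None` defaults to avoid this.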

_check_input_dataframe_columns_(input_dataframe)
_check_output_dataframe_columns_(output_dataframe)
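These two checks are not documented here. The `transform` documentation describes validating that the provided dataframe contains the required input columns. Below is a minimal sketch of such a check. It assumes missing columns should raise an error; the helper `check_required_columns` and the use of `ValueError` are hypothetical.

```python
import pandas as pd

def check_required_columns(dataframe, required_columns):
    """Raise if any required column is absent (hypothetical helper)."""
    # Report every missing column at once, in the order requested
    missing = [col for col in required_columns if col not in dataframe.columns]
    if missing:
        raise ValueError(f'Missing required columns: {missing}')
    return True


df = pd.DataFrame({'age': [31, 45], 'city': ['a', 'b']})
check_required_columns(df, ['age', 'city'])  # passes silently
```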
_create_input_nub(variable_type_dict, input_dataframe)

Generate a ‘nub’, appropriate for use as the input portion of a model (Keras Input layers, plus any additional Keras layers). Each Keras input variable has one input pipeline, with:

  • One Input (required)
  • Possible additional layers (optional, such as embedding layers for text)

All input pipelines are then joined with a Concatenate layer.

Parameters:
  • variable_type_dict ({str:[str]}) – A dictionary, with keys describing variable types, and values listing particular variables
  • input_dataframe (pandas.DataFrame) – A pandas dataframe, containing the columns for all Keras input variables
Returns:

A tuple of the Keras Input layers and a single layer joining all input pipelines, which can be fed into subsequent layers

Return type:

([keras.Input], Layer)
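A Concatenate layer joins the pipelines along the feature axis, so the nub's output width is the sum of each pipeline's output width. Below is a pandas-only sketch of this bookkeeping that does not build real Keras layers. It makes several assumptions, none taken from keras-pandas' actual implementation:

* Numerical and boolean variables contribute one column each.
* Categorical variables pass through an Embedding sized by the number of observed levels plus one (reserved for unseen levels), followed by a Flatten.
* The helper `describe_input_pipelines` and the `embedding_dim` parameter are hypothetical.

```python
import pandas as pd

def describe_input_pipelines(variable_type_dict, input_dataframe, embedding_dim=4):
    """Plan one pipeline per input variable and the concatenated width (hypothetical)."""
    pipelines = []
    for variable in variable_type_dict.get('numerical_vars', []):
        # Input only: a single float column
        pipelines.append({'variable': variable, 'layers': ['Input'], 'width': 1})
    for variable in variable_type_dict.get('boolean_vars', []):
        # Input only: a single 0/1 column
        pipelines.append({'variable': variable, 'layers': ['Input'], 'width': 1})
    for variable in variable_type_dict.get('categorical_vars', []):
        # Integer level index -> Embedding -> Flatten to (batch, embedding_dim)
        n_levels = input_dataframe[variable].nunique() + 1  # +1 for unseen levels
        pipelines.append({'variable': variable,
                          'layers': ['Input',
                                     f'Embedding({n_levels}, {embedding_dim})',
                                     'Flatten'],
                          'width': embedding_dim})
    # Concatenate joins all pipelines along the feature axis,
    # so its output width is the sum of the pipeline widths
    total_width = sum(p['width'] for p in pipelines)
    return pipelines, total_width


df = pd.DataFrame({'age': [20, 30, 40, 50], 'city': ['a', 'b', 'a', 'c']})
print(describe_input_pipelines({'numerical_vars': ['age'], 'categorical_vars': ['city']}, df))
```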

_create_mappers(variable_type_dict)

Creates two sklearn-pandas mappers, one for the input variables, and another for the output variable(s).

Parameters: variable_type_dict ({str:[str]}) – A dictionary, with keys describing variable types, and values listing particular variables
Returns: Two sklearn-pandas mappers, one for the input variables, and another for the output variable(s)
Return type: (DataFrameMapper, DataFrameMapper)
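A mapper applies per-column transformations that are fitted once on training data and then reused. Keeping separate mappers lets the inputs and the response be fitted and transformed independently. Below is a hand-rolled, pandas-only stand-in used purely for illustration. The class `SimpleColumnMapper` and its rules (standardize numerical columns, integer-encode categorical columns with 0 reserved for unseen levels) are hypothetical and are not sklearn-pandas' `DataFrameMapper` API.

```python
import pandas as pd

class SimpleColumnMapper:
    """Pandas-only stand-in for a per-column mapper (hypothetical)."""

    def __init__(self, numerical=(), categorical=()):
        self.numerical = list(numerical)
        self.categorical = list(categorical)
        self.params_ = {}

    def fit(self, df):
        for col in self.numerical:
            # Remember mean and (population) std; guard against zero variance
            std = df[col].std(ddof=0)
            self.params_[col] = (df[col].mean(), std if std > 0 else 1.0)
        for col in self.categorical:
            # Map each observed level to 1..n; 0 is reserved for unseen levels
            levels = sorted(df[col].dropna().unique())
            self.params_[col] = {level: i + 1 for i, level in enumerate(levels)}
        return self

    def transform(self, df):
        out = pd.DataFrame(index=df.index)
        for col in self.numerical:
            mean, std = self.params_[col]
            out[col] = (df[col] - mean) / std
        for col in self.categorical:
            out[col] = df[col].map(self.params_[col]).fillna(0).astype(int)
        return out


train = pd.DataFrame({'age': [20.0, 30.0, 40.0], 'city': ['b', 'a', 'b'], 'price': [1.0, 2.0, 3.0]})
input_mapper = SimpleColumnMapper(numerical=['age'], categorical=['city']).fit(train)
output_mapper = SimpleColumnMapper(numerical=['price']).fit(train)
print(input_mapper.transform(train))
```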
_create_output_nub(variable_type_dict, output_variables_df, y)

Generate a ‘nub’, appropriate for use as an output / final Keras layer.

The structure of this nub will depend on the y variable’s data type.

Parameters:
  • variable_type_dict ({str:[str]}) – A dictionary, with keys describing variable types, and values listing particular variables
  • output_variables_df (pandas.DataFrame) – A dataframe containing the output variable. This is necessary for some data types (e.g. a categorical output needs to know how many levels the categorical variable has)
  • y (str) – The name of the response variable
Returns:

A single Keras layer, correctly formatted to output the response variable provided

Return type:

Layer
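The output nub's shape depends on the response type. For a categorical response, it also depends on the number of levels, which is why `output_variables_df` is needed. Below is a sketch of that dispatch. It assumes the common Keras conventions below; the function `output_layer_spec` and these choices are illustrative and not necessarily what keras-pandas uses.

* Numerical response: one linear unit with mean squared error.
* Boolean response: one sigmoid unit with binary cross-entropy.
* Categorical response: a softmax over one unit per level with categorical cross-entropy.

```python
import pandas as pd

def output_layer_spec(variable_type_dict, output_variables_df, y):
    """Return (units, activation, loss) for the response y (hypothetical helper)."""
    if y in variable_type_dict.get('numerical_vars', []):
        return 1, 'linear', 'mean_squared_error'
    if y in variable_type_dict.get('boolean_vars', []):
        return 1, 'sigmoid', 'binary_crossentropy'
    if y in variable_type_dict.get('categorical_vars', []):
        # One output unit per level: the dataframe supplies the level count
        n_levels = output_variables_df[y].nunique()
        return n_levels, 'softmax', 'categorical_crossentropy'
    raise ValueError(f'No output nub defined for response variable {y!r}')


df = pd.DataFrame({'species': ['a', 'b', 'c', 'a']})
print(output_layer_spec({'categorical_vars': ['species']}, df, 'species'))
```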

fit(input_dataframe)

Get the data and layers ready for use:

  • Train the input transformation pipelines
  • Create the Keras input layers
  • Train the output transformation pipeline(s) (optional, only if there is a response variable)
  • Create the output layer(s) (optional, only if there is a response variable)
  • Set self.fitted to True
Parameters: input_dataframe (pandas.DataFrame) – A pandas dataframe, containing all Keras input variables and, optionally, the response variable
Returns: self, now in a fitted state. The Automater now has initialized input layers, output layer(s) (if a response variable is present), and can be used for the transform step
Return type: Automater
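The `fit` contract described above (train the pipelines, set `self.fitted` to `True`, return `self`) is what makes call chaining and `fit_transform` work. Below is a stripped-down sketch of the pattern. `TinyAutomater` is hypothetical: it stores only column means in place of the real pipelines and layers, and its `ValueError` guard in `transform` is an assumption.

```python
import pandas as pd

class TinyAutomater:
    """Hypothetical stand-in illustrating the fit / transform contract."""

    def __init__(self, numerical_vars=None):
        self.numerical_vars = list(numerical_vars or [])
        self.fitted = False

    def fit(self, input_dataframe):
        # Toy "training": remember each numerical column's mean
        self.means_ = input_dataframe[self.numerical_vars].mean()
        self.fitted = True
        return self  # returning self allows fit(...).transform(...)

    def transform(self, input_dataframe):
        if not self.fitted:
            raise ValueError('Call fit before transform')
        # Subtracting a Series aligns on column names
        return input_dataframe[self.numerical_vars] - self.means_

    def fit_transform(self, input_dataframe):
        # Perform a fit, and then a transform
        return self.fit(input_dataframe).transform(input_dataframe)


print(TinyAutomater(['x']).fit_transform(pd.DataFrame({'x': [1.0, 3.0]})))
```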
fit_transform(input_dataframe)

Perform a fit, and then a transform. See transform for documentation of the return value.

get_transformer(variable)
get_transformers()
list_default_transformation_pipelines()
transform(input_dataframe, df_out=None)
  • Validate that the provided input_dataframe contains the required input columns
  • Transform the Keras input columns
  • Transform the response variable, if it is present
  • Format the data for return
Parameters: input_dataframe (pandas.DataFrame) – A pandas dataframe, containing the columns for all Keras input variables
Returns: Either a pandas dataframe (if df_out = True), or a numpy object (if df_out = False). This object will contain the transformed input variables, and the transformed output variables (if the output variable is present in input_dataframe)
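The final "format the data for return" step keeps the transformed inputs, adds the response only when it is present, and then picks the return type. Below is a minimal sketch of that step. It assumes `df_out=None` falls back to the value given to the constructor; the helper `format_transformed` and its `default_df_out` parameter are hypothetical.

```python
import pandas as pd

def format_transformed(transformed_df, input_columns, response_var=None,
                       df_out=None, default_df_out=False):
    """Select output columns and choose DataFrame vs NumPy (hypothetical helper)."""
    # Keep the input columns, plus the response when it is present
    columns = list(input_columns)
    if response_var is not None and response_var in transformed_df.columns:
        columns.append(response_var)
    result = transformed_df[columns]
    # Assumption: df_out=None defers to the constructor-level setting
    if df_out is None:
        df_out = default_df_out
    return result if df_out else result.to_numpy()


df = pd.DataFrame({'a': [0.5, -0.5], 'y': [0, 1]})
print(format_transformed(df, ['a'], response_var='y', df_out=True))
```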