Automater¶
-
class
Automater.
Automater
(numerical_vars=[], categorical_vars=[], boolean_vars=[], datetime_vars=[], text_vars=[], non_transformed_vars=[], response_var=None, df_out=False)¶ Bases:
object
-
__init__
(numerical_vars=[], categorical_vars=[], boolean_vars=[], datetime_vars=[], text_vars=[], non_transformed_vars=[], response_var=None, df_out=False)¶
-
_check_input_dataframe_columns_
(input_dataframe)¶
-
_check_output_dataframe_columns_
(output_dataframe)¶
-
_create_input_nub
(variable_type_dict, input_dataframe)¶ Generate a ‘nub’, appropriate for use as an input (and possibly additional Keras layers). Each Keras input variable has one input pipeline, with:
- One Input (required)
- Possible additional layers (optional, such as embedding layers for text)
All input pipelines are then joined with a Concatenate layer
Parameters: - variable_type_dict ({str:[str]}) – A dictionary, with keys describing variable types, and values listing particular variables
- input_dataframe (pandas.DataFrame) – A pandas dataframe, containing the columns for all Keras input variables
Returns: A Keras layer, which can be fed into future layers
Return type: ([keras.Input], Layer)
-
_create_mappers
(variable_type_dict)¶ Creates two sklearn-pandas mappers, one for the input variables, and another for the output variable(s)
Parameters: variable_type_dict ({str:[str]}) – A dictionary, with keys describing variable types, and values listing particular variables
Returns: Two sklearn-pandas mappers, one for the input variables, and another for the output variable(s)
Return type: (DataFrameMapper, DataFrameMapper)
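The `{str:[str]}` shape of variable_type_dict can be illustrated with a minimal sketch. The key names below mirror the constructor's parameter names, which is an assumption (this reference does not list the keys used internally), and `build_variable_type_dict` is a hypothetical helper, not part of the library.

```python
def build_variable_type_dict(numerical_vars=None, categorical_vars=None,
                             boolean_vars=None, response_var=None):
    """Hypothetical helper: group column names by variable type."""
    variable_type_dict = {
        # Keys describe variable types; values list particular variables.
        'numerical_vars': list(numerical_vars or []),
        'categorical_vars': list(categorical_vars or []),
        'boolean_vars': list(boolean_vars or []),
    }
    # Assumption: the response variable is tracked under its own key.
    variable_type_dict['response_var'] = [response_var] if response_var else []
    return variable_type_dict


example = build_variable_type_dict(
    numerical_vars=['age'],
    categorical_vars=['sex'],
    response_var='survived',
)
```

Every value is a list of column names, so a consumer can iterate over each variable type in the same way regardless of how many variables it holds.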
-
_create_output_nub
(variable_type_dict, output_variables_df, y)¶ Generate a ‘nub’, appropriate for use as an output / final Keras layer.
The structure of this nub will depend on the y variable’s data type
Parameters: - variable_type_dict ({str:[str]}) – A dictionary, with keys describing variable types, and values listing particular variables
- output_variables_df (pandas.DataFrame) – A dataframe containing the output variable. This is necessary for some data types (e.g. a categorical output needs to know how many levels the categorical variable has)
- y (str) – The name of the response variable
Returns: A single Keras layer, correctly formatted to output the response variable provided
Return type: Layer
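To illustrate why the output data is needed, here is a rough sketch of choosing an output layer's size from the response variable's type. The type names, the activation choices, and both function names are assumptions based on common Keras practice, not the library's documented behaviour.

```python
def count_levels(values):
    """Number of distinct non-missing levels in a categorical column."""
    return len({v for v in values if v is not None})


def describe_output_layer(variable_type, values):
    """Hypothetical: pick output units and activation from the response type."""
    if variable_type == 'categorical':
        # One unit per level, which is why the observed data must be inspected.
        return {'units': count_levels(values), 'activation': 'softmax'}
    if variable_type == 'boolean':
        return {'units': 1, 'activation': 'sigmoid'}
    # Fallback for numerical responses: a single linear unit.
    return {'units': 1, 'activation': 'linear'}


categorical_spec = describe_output_layer('categorical', ['a', 'b', 'a', None])
numerical_spec = describe_output_layer('numerical', [1.5, 2.0])
```

Only the categorical branch reads the values. The other branches have a fixed size, which matches the note that the dataframe is necessary for some data types only.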
-
fit
(input_dataframe)¶ Get the data and layers ready for use
- Train the input transformation pipelines
- Create the keras input layers
- Train the output transformation pipeline(s) (optional, only if there is a response variable)
- Create the output layer(s) (optional, only if there is a response variable)
- Set self.fitted to True
Parameters: input_dataframe –
Returns: self, now in a fitted state. The Automater now has initialized input layers, output layer(s) (if response variable is present), and can be used for the transform step
Return type: Automater
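The fit-returns-self convention described above can be sketched with a small stand-in class. `FittableSketch` and its mean-learning "training" step are hypothetical and only illustrate the state change; they do not reproduce the Automater's pipelines or Keras layers.

```python
class FittableSketch:
    """Hypothetical stand-in showing the fit-returns-self convention."""

    def __init__(self, input_vars, response_var=None):
        self.input_vars = list(input_vars)
        self.response_var = response_var
        self.fitted = False
        self.column_means = {}

    def fit(self, rows):
        # "Train" a trivial transformation: learn each input column's mean.
        for name in self.input_vars:
            values = [row[name] for row in rows]
            self.column_means[name] = sum(values) / len(values)
        # Mark the object as ready for the transform step.
        self.fitted = True
        return self  # returning self allows call chaining


rows = [{'age': 20.0}, {'age': 40.0}]
sketch = FittableSketch(['age']).fit(rows)
```

Because fit returns the object itself, construction and fitting can be chained in a single expression, as on the last line.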
-
fit_transform
(input_dataframe)¶ Perform a fit, and then a transform. See transform for return documentation
-
get_transformer
(variable)¶
-
get_transformers
()¶
-
list_default_transformation_pipelines
()¶
-
transform
(input_dataframe, df_out=None)¶
- Validate that the provided input_dataframe contains the required input columns
- Transform the keras input columns
- Transform the response variable, if it is present
- Format the data for return
Parameters: input_dataframe (pandas.DataFrame) – A pandas dataframe, containing the columns for all Keras input variables
Returns: Either a pandas dataframe (if df_out = True), or a numpy object (if df_out = False). This object will contain the transformed input variables, and the transformed output variables (if the output variable is present in input_dataframe)
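The validation step, and the optional handling of the response variable, might look roughly like the sketch below. `check_input_columns` is a hypothetical function working on plain lists of column names. The library's own check (_check_input_dataframe_columns_) is not documented here and may behave differently.

```python
def check_input_columns(available_columns, required_columns, response_var=None):
    """Hypothetical check: fail on missing inputs, report response presence."""
    missing = [c for c in required_columns if c not in available_columns]
    if missing:
        # Required input columns must all be present before transforming.
        raise ValueError('Missing required input columns: {}'.format(missing))
    # The response variable is optional: it is transformed only if present.
    return response_var is not None and response_var in available_columns


has_response = check_input_columns(
    ['age', 'sex', 'survived'], ['age', 'sex'], response_var='survived')
```

With a pandas dataframe, the first argument could be `list(df.columns)`. The boolean result distinguishes training data (response present) from prediction-time data (inputs only).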
-