This section will explain how additional features can be added to the framework.
Be careful to note that features are generally linked to the Data type (see New data).
Form implementation
Features Choice
First, you need to add a tuple in the form (acronym, full name)
to the alambic_app > constantes.py
script in the corresponding list of the data preprocessing and features choice, according to the fact if your new features is one or the other, for example PREPROCESSING_TEXT_CHOICES
and FEATURES_TEXT_CHOICES
are for the text data.
This operation will link your new features to the data type and make it available to the user when setting up the process.
Parametrization
New features must be added to the form of the corresponding data type they can be used with in the folder alambic_app > forms > data
.
It should be noted that initial preprocessing steps must be put before the features in order for the pipeline to work correctly.
This operation will add the features in the front end without any additional steps.
Features implementation
Location
If needed, new features should be implemented in the script with the name corresponding to their related data type (e.g. text.py
) in the folder alambic_app > features
.
Once implemented, the features must be imported and added to the OPERATIONS_MATCH
dictionary in the alambic_app > machine_learning > setup.py
script with their corresponding acronym
(see Features Choice in the previous section).
Features from existing libraries (such as sklearn) can simply be directly imported and included in the dictionary with their corresponding acronym
.
If the feature is custom and do not follow the structure of the functions in sklearn in order to be included in a pipeline (see Pipeline in sklearn, their acronym must be added to the set of CUSTOM_FUNCTIONS
so that they can be transformed to be used as such.
Parametrization
If specific parameters must be added to the features process (outside to what can be selected in a form), you can add the code in the alambic_app > machine_learning > preprocessing.py
script, in the get_pipeline
method.
Documentation
Update the documentation ! It is present in the folder docs
in the features.md
document.
Checklist
- Add the (acronym, full name) tuple in the PREPROCESSING_DATA_CHOICES or FEATURES_DATA_CHOICES list in the constantes.py script
- Add your features in the corresponding data form in the folder alambic_app > forms > data
- (Opt.) Implement the feature if not existing in the corresponding data script in the alambic_app > features folder and add the function to the init script
- Import in alambic_app > machine_learning > preprocessing.py and add to OPERATIONS_MATCH with the acronym as key and the feature function as value
- (Opt.) If custom function, add the acronym of the function to the CUSTOM_FUNCTIONS set
- Modify the get_pipeline method if needed for the parameters of the features.
- Update the documentation (it’s important too !)
- Be proud of yourself, you did it ! (Well, after tons of debug and testing, of course)