Modeling
Here are the grammar elements available for modeling.
A modeling technique can be specified either through the grammar or through a custom function. For any model built into Nylon, specify the model with the type tag, like this:
"modeling": {
    "type": "neighbors"
}
Each model type listed below is identified by the vocabulary element that selects it. In the example above, setting the type tag to neighbors selects the nearest neighbors classifier.
You can find the supported vocabulary elements below:

Modeling Vocabulary: Specific Models

neighbors: Supervised nearest neighbors classifier. Finds a predefined number of training samples closest in distance to the new point and predicts the label from these.
tree: Supervised decision tree classifier. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
sgd: Regularized classifier trained with stochastic gradient descent (SGD): the gradient of the loss is estimated one sample at a time, and the model is updated along the way with a decreasing strength schedule (the learning rate).
gradient-boost: Gradient boosting for classification problems. GB builds an additive model in a forward stage-wise fashion.
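adaboost: AdaBoost classifier. A meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset, with the weights of incorrectly classified instances adjusted so that subsequent classifiers focus more on difficult cases.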
rf: Random forest classifier that is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
mlp: Multi-layer Perceptron classifier. This model optimizes the log-loss function using LBFGS.
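
As a minimal sketch, assuming you assemble the specification in Python before passing it to Nylon, the vocabulary above can be checked up front so that a typo in the type tag fails early. The helper name build_modeling_spec is hypothetical and not part of Nylon; only the vocabulary names and the "modeling"/"type" keys come from this page.

```python
import json

# Vocabulary elements copied from the list of specific models above.
SPECIFIC_MODELS = {
    "neighbors", "tree", "sgd", "gradient-boost", "adaboost", "rf", "mlp",
}


def build_modeling_spec(model_type):
    """Return a {"modeling": {"type": ...}} dict, rejecting unknown names.

    Hypothetical helper for illustration; Nylon itself may validate differently.
    """
    if model_type not in SPECIFIC_MODELS:
        raise ValueError(f"unknown modeling type: {model_type!r}")
    return {"modeling": {"type": model_type}}


spec = build_modeling_spec("neighbors")
print(json.dumps(spec, indent=4))
```

Building the dictionary in code and serializing it with json.dumps also guards against hand-written JSON mistakes such as a missing quote or brace.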

Modeling Vocabulary: Broad Strokes

You can also try a whole category of models. This lets you be less selective about the specific model while still restricting the class of models that is tried. In Nylon, we call these strokes.

Ensemble Stroke

ensembles: Tries a series of ensemble models to find the best option. Includes Random Forest, Extra Trees, Gradient Boosting, and AdaBoost classifiers.

SVM Stroke

svms: Tries a collection of SVM models with a collection of different kernels and hyperparameters.
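
Conceptually, a stroke fits several candidate models and keeps the one that performs best. The sketch below illustrates that idea only and is not Nylon's implementation: the two toy candidates stand in for the real ensemble or SVM models, and scoring by accuracy on a held-out split is an assumption, since this page does not say how Nylon ranks candidates. All function names are hypothetical.

```python
def majority_class(train_rows, train_labels):
    """Toy candidate: always predict the most frequent training label."""
    most_common = max(set(train_labels), key=train_labels.count)
    return lambda row: most_common


def one_nearest_neighbor(train_rows, train_labels):
    """Toy candidate: predict the label of the closest training row."""
    def predict(row):
        distances = [
            sum((a - b) ** 2 for a, b in zip(row, other))
            for other in train_rows
        ]
        return train_labels[distances.index(min(distances))]
    return predict


def try_stroke(candidates, train, holdout):
    """Fit every candidate and return (name, accuracy) of the best one."""
    (train_rows, train_labels), (test_rows, test_labels) = train, holdout
    best_name, best_score = None, -1.0
    for name, fit in candidates.items():
        predict = fit(train_rows, train_labels)
        correct = sum(predict(r) == l for r, l in zip(test_rows, test_labels))
        score = correct / len(test_labels)
        if score > best_score:  # assumed criterion: held-out accuracy
            best_name, best_score = name, score
    return best_name, best_score


train = ([[0.0], [0.1], [0.2], [1.0], [1.1]], ["a", "a", "a", "b", "b"])
holdout = ([[0.05], [1.05]], ["a", "b"])
candidates = {"majority": majority_class, "1-nn": one_nearest_neighbor}
best = try_stroke(candidates, train, holdout)
print(best)
```

On this toy data the nearest-neighbor candidate classifies both held-out points correctly, while the majority-class baseline gets only one right, so the loop selects the former.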

Custom Modeling

As with custom functions in other components of Nylon, you specify the location and name of your custom modeling function with two parameters. Your custom function should accept the following parameters, in this order:
Parameter           Description
Pandas DataFrame    DataFrame
Labeled Data        Target Column Data

Example

"modeling": {
    "custom": {
        "loc": "file_path",
        "name": "function_name"
    }
}
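
A custom modeling function referenced this way might look like the following sketch, which would live in the file given by "loc" and be named as in "name". It takes the DataFrame first and the target column data second, as required above. What the function should return is not stated on this page, so returning a fitted object with a predict method is an assumption, and MajorityClassModel is a hypothetical placeholder for a real model.

```python
from collections import Counter


class MajorityClassModel:
    """Hypothetical baseline model: always predicts one fixed label."""

    def __init__(self, label):
        self.label = label

    def predict(self, rows):
        # One prediction per input row.
        return [self.label for _ in range(len(rows))]


def function_name(df, y):
    """Custom modeling function with parameters in the required order.

    df: pandas DataFrame holding the data
    y:  target column data (the labels)
    Returns a fitted model object; the return type is an assumption.
    """
    most_common_label = Counter(list(y)).most_common(1)[0][0]
    return MajorityClassModel(most_common_label)
```

The body relies only on iterating over y and calling len on the rows, so it works with a pandas DataFrame and Series as well as with plain lists, which makes it easy to try out before wiring it into a specification.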