Public API

fit_evotree

EvoTrees.fit_evotree - Function
fit_evotree(
    params::EvoTypes{L}, 
    dtrain;
    target_name,
    fnames=nothing,
    w_name=nothing,
    offset_name=nothing,
    deval=nothing,
    metric=nothing,
    early_stopping_rounds=9999,
    print_every_n=9999,
    verbosity=1,
    return_logger=false,
    device="cpu")

Main training function. Performs model fitting given configuration params, dtrain, target_name and other optional kwargs.

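Arguments

  • params::EvoTypes: model configuration holding the hyper-parameters, for example an EvoTreeClassifier, EvoTreeCount or EvoTreeMLE.
  • dtrain: Tables compatible training data containing features and target variables.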

Keyword arguments

  • target_name: name of target variable.
  • fnames = nothing: the names of the feature columns in dtrain. If provided, should be a vector of strings.
  • w_name = nothing: name of the variable containing weights. If nothing, a weight of one is used for every observation.
  • offset_name = nothing: name of the offset variable.
  • deval: Tables compatible evaluation data containing features and target variables.
  • metric: the evaluation metric that will be tracked on deval. Supported metrics are:
    • :mse: mean-squared error. Suited to general regression models.
    • :rmse: root-mean-squared error (CPU only). Suited to general regression models.
    • :mae: mean absolute error. Suited to general regression models.
    • :logloss: log loss. Suited to :logistic regression models.
    • :mlogloss: multi-class cross entropy. Suited to EvoTreeClassifier classification models.
    • :poisson: Poisson deviance. Suited to EvoTreeCount count models.
    • :gamma: Gamma deviance. Suited to regression problems on Gamma-like, positively distributed targets.
    • :tweedie: Tweedie deviance. Suited to regression problems on Tweedie-like, positively distributed targets with probability mass at y == 0.
    • :gaussian_mle: Gaussian maximum log-likelihood. Suited to EvoTreeMLE models with loss = :gaussian_mle.
    • :logistic_mle: Logistic maximum log-likelihood. Suited to EvoTreeMLE models with loss = :logistic_mle.
  • early_stopping_rounds::Integer: number of consecutive rounds without metric improvement after which fitting is stopped.
  • print_every_n: number of rounds between two printed logging messages.
  • verbosity: set to 1 to print logging info during training.
  • return_logger::Bool = false: if set to true, fit_evotree returns a tuple (m, logger) where logger is a dict containing various tracking information. Defaults to false.
  • device="cpu": hardware device to use for computations. Can be either "cpu" or "gpu". The following losses are not supported on GPU at the moment: :l1, :quantile, :logistic_mle.
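
As an illustration of the Tables-based method, the sketch below fits a regressor on a small synthetic DataFrame and tracks :mse on a held-out evaluation set. It assumes DataFrames.jl is installed and that EvoTreeRegressor accepts the nrounds, max_depth and eta hyper-parameters; the data, column names and chosen values are made up for the example and are not part of the documented API.

```julia
using EvoTrees
using DataFrames
using Random

Random.seed!(123)

# Synthetic data (illustrative only): three features and a noisy linear target.
nobs = 1_000
df = DataFrame(x1=rand(nobs), x2=rand(nobs), x3=rand(nobs))
df.y = 2 .* df.x1 .- df.x2 .+ 0.1 .* randn(nobs)

# Hold out the last 20% of the rows as evaluation data.
dtrain = df[1:800, :]
deval = df[801:end, :]

# Hyper-parameter values are arbitrary choices for this sketch.
config = EvoTreeRegressor(nrounds=200, max_depth=4, eta=0.1)

# return_logger=true makes fit_evotree return the tuple (m, logger).
m, logger = fit_evotree(
    config,
    dtrain;
    target_name="y",
    fnames=["x1", "x2", "x3"],
    deval=deval,
    metric=:mse,
    early_stopping_rounds=10,
    print_every_n=25,
    return_logger=true,
)
```

With early_stopping_rounds=10, fitting may stop before the 200 configured rounds if the evaluation metric stops improving.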
fit_evotree(
    params::EvoTypes{L};
    x_train::AbstractMatrix, 
    y_train::AbstractVector, 
    w_train=nothing, 
    offset_train=nothing,
    x_eval=nothing, 
    y_eval=nothing, 
    w_eval=nothing, 
    offset_eval=nothing,
    early_stopping_rounds=9999,
    print_every_n=9999,
    verbosity=1)

Main training function. Performs model fitting given configuration params, x_train, y_train and other optional kwargs.

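Arguments

  • params::EvoTypes: model configuration holding the hyper-parameters, for example an EvoTreeClassifier, EvoTreeCount or EvoTreeMLE.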

Keyword arguments

  • x_train::Matrix: training data of size [#observations, #features].
  • y_train::Vector: vector of train targets of length #observations.
  • w_train::Vector: vector of train weights of length #observations. If nothing, a vector of ones is assumed.
  • offset_train::VecOrMat: offset for the training data. Should match the size of the predictions.
  • x_eval::Matrix: evaluation data of size [#observations, #features].
  • y_eval::Vector: vector of evaluation targets of length #observations.
  • w_eval::Vector: vector of evaluation weights of length #observations. Defaults to nothing (assumes a vector of 1s).
  • offset_eval::VecOrMat: evaluation data offset. Should match the size of the predictions.
  • metric: the evaluation metric that will be tracked on x_eval, y_eval and optionally w_eval / offset_eval data. Supported metrics are:
    • :mse: mean-squared error. Suited to general regression models.
    • :rmse: root-mean-squared error (CPU only). Suited to general regression models.
    • :mae: mean absolute error. Suited to general regression models.
    • :logloss: log loss. Suited to :logistic regression models.
    • :mlogloss: multi-class cross entropy. Suited to EvoTreeClassifier classification models.
    • :poisson: Poisson deviance. Suited to EvoTreeCount count models.
    • :gamma: Gamma deviance. Suited to regression problems on Gamma-like, positively distributed targets.
    • :tweedie: Tweedie deviance. Suited to regression problems on Tweedie-like, positively distributed targets with probability mass at y == 0.
    • :gaussian_mle: Gaussian maximum log-likelihood. Suited to EvoTreeMLE models with loss = :gaussian_mle.
    • :logistic_mle: Logistic maximum log-likelihood. Suited to EvoTreeMLE models with loss = :logistic_mle.
  • early_stopping_rounds::Integer: number of consecutive rounds without metric improvement after which fitting is stopped.
  • print_every_n: number of rounds between two printed logging messages.
  • verbosity: set to 1 to print logging info during training.
  • fnames: the names of the x_train features. If provided, should be a vector of strings with length(fnames) = size(x_train, 2).
  • return_logger::Bool = false: if set to true, fit_evotree returns a tuple (m, logger) where logger is a dict containing various tracking information. Defaults to false.
  • device="cpu": hardware device to use for computations. Can be either "cpu" or "gpu". The following losses are not supported on GPU at the moment: :l1, :quantile, :logistic_mle.
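
The matrix-based method can be sketched in the same way. The example below passes training and evaluation data as keyword arguments and tracks :mae with early stopping. The random data and the hyper-parameter values (nrounds, max_depth, eta) are assumptions made for the illustration.

```julia
using EvoTrees
using Random

Random.seed!(123)

# Random data (illustrative only): 5 features, 800 training and 200 evaluation rows.
x_train, y_train = rand(800, 5), rand(800)
x_eval, y_eval = rand(200, 5), rand(200)

# Hyper-parameter values are arbitrary choices for this sketch.
config = EvoTreeRegressor(nrounds=100, max_depth=4, eta=0.1)

# All data is passed by keyword; weights and offsets are left at nothing.
m = fit_evotree(
    config;
    x_train,
    y_train,
    x_eval,
    y_eval,
    metric=:mae,
    early_stopping_rounds=10,
    print_every_n=20,
)
```

Since return_logger is left at its default of false, only the fitted model is returned.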

predict

MLJModelInterface.predict - Function
predict(model::EvoTree, X::AbstractMatrix; ntree_limit = length(model.trees))

Predictions from an EvoTree model: sums the predictions from all trees composing the model. Use ntree_limit=N to predict with only the first N trees.
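
A short sketch of both call forms follows. It fits a small model on made-up data, then predicts once with every tree and once with a truncated ensemble. The call is qualified as EvoTrees.predict on the assumption that this name resolves whether or not predict is exported; the data and hyper-parameters are illustrative.

```julia
using EvoTrees
using Random

Random.seed!(123)

# Made-up regression data: the target depends on the first feature.
x = rand(500, 4)
y = x[:, 1] .+ 0.1 .* randn(500)

config = EvoTreeRegressor(nrounds=50, max_depth=3)
m = fit_evotree(config; x_train=x, y_train=y)

# Predictions using all trees of the model.
pred_all = EvoTrees.predict(m, x)

# Predictions using only the first 10 trees.
pred_10 = EvoTrees.predict(m, x; ntree_limit=10)
```

Comparing pred_10 with pred_all on held-out data is one way to check how much the later trees contribute.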


importance

EvoTrees.importance - Function
importance(model::EvoTree; fnames=model.info[:fnames])

Sorted, normalized feature importance based on loss function gain. Feature names associated with the model are stored in model.info[:fnames] as a Vector of strings and can be updated at any time, e.g. model.info[:fnames] = new_fnames_vec.
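
The sketch below shows one way this might be used: fit a model on made-up data, overwrite the stored feature names, then request the importance. The feature names and data are hypothetical, and the call is qualified as EvoTrees.importance on the assumption that it works whether or not the function is exported.

```julia
using EvoTrees
using Random

Random.seed!(123)

# Made-up data: the target depends mostly on the first two features.
x = rand(500, 4)
y = 2 .* x[:, 1] .+ x[:, 2] .+ 0.1 .* randn(500)

config = EvoTreeRegressor(nrounds=50, max_depth=3)
m = fit_evotree(config; x_train=x, y_train=y)

# Replace the stored feature names (hypothetical names for this example).
m.info[:fnames] = ["age", "income", "height", "weight"]

# Importance is reported using the names currently stored in m.info[:fnames].
imp = EvoTrees.importance(m)
```

Names can also be supplied for a single call through the fnames keyword, which leaves model.info[:fnames] unchanged.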
