Ranking with the Yahoo! Learning to Rank Challenge

In this tutorial, we show how a ranking task can be tackled with regular regression techniques without compromising performance relative to specialized ranking learners. The data comes from the C14 - Yahoo! Learning to Rank Challenge and can be obtained upon request at https://webscope.sandbox.yahoo.com.

Getting started

To begin, we load the required packages:

using EvoTrees
using DataFrames
using Statistics: mean
using CategoricalArrays
using Random

Load LIBSVM format data

Some datasets come in the so-called LIBSVM format, which stores data using a sparse representation:

<label> qid:<query_id> <feature_id_1>:<feature_value_1> <feature_id_2>:<feature_value_2> ...
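
For instance, a document with relevance label 2 belonging to query 10 could look as follows (an illustrative line, not taken from the actual dataset):

2 qid:10 1:0.031 5:0.5 104:1.0

Only the non-zero features of an observation are listed, which is what makes the representation sparse.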

We use the ReadLIBSVM.jl package to parse these files:

using ReadLIBSVM
dtrain = read_libsvm("set1.train.txt"; has_query=true)
deval = read_libsvm("set1.valid.txt"; has_query=true)
dtest = read_libsvm("set1.test.txt"; has_query=true)
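
The parsed splits expose the feature matrix, targets and query ids under the :x, :y and :q keys, which is how they are accessed throughout this tutorial. A quick shape check (comments indicate the expected structure):

size(dtrain[:x])    # (nobs, nfeats)
length(dtrain[:y])  # nobs
length(dtrain[:q])  # nobs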

Preprocessing

Preprocessing is minimal since all features are parsed as floats and dedicated files are provided for each of the train, eval and test splits.

Several features contain only 0s in the training dataset. They are removed from all splits since they cannot bring any value to the model.

Then, the features, targets and query ids are extracted from the parsed LIBSVM data.

# identify features that contain only 0s in the training data
colsums_train = map(sum, eachcol(dtrain[:x]))
drop_cols = colsums_train .== 0

# drop those features from all splits
x_train = dtrain[:x][:, .!drop_cols]
x_eval = deval[:x][:, .!drop_cols]
x_test = dtest[:x][:, .!drop_cols]

# assign queries
q_train = dtrain[:q]
q_eval = deval[:q]
q_test = dtest[:q]

# assign targets
y_train = dtrain[:y]
y_eval = deval[:y]
y_test = dtest[:y]

Training

Now we are ready to train our model. We first define a model configuration using the EvoTreeRegressor constructor, then use fit_evotree to train a boosted trees model. The optional x_eval and y_eval arguments enable early stopping: training halts once the evaluation metric has stopped improving for early_stopping_rounds rounds.

config = EvoTreeRegressor(
    nrounds=6000,
    loss=:mse,
    eta=0.02,
    nbins=64,
    max_depth=11,
    rowsample=0.9,
    colsample=0.9,
)

m_mse, logger_mse = fit_evotree(
    config;
    x_train=x_train,
    y_train=y_train,
    x_eval=x_eval,
    y_eval=y_eval,
    early_stopping_rounds=200,
    print_every_n=50,
    metric=:mse,
    return_logger=true
);

p_test = m_mse(x_test);

Model evaluation

For ranking problems, a commonly used metric is the Normalized Discounted Cumulative Gain (NDCG). It essentially measures whether the model is good at identifying the top k outcomes within a group. There are various flavors to its implementation; the most commonly used one is the following:
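
Considering the observations of a group sorted by decreasing predicted score, with y_i the true relevance of the observation in position i:

$$\mathrm{DCG@k} = \sum_{i=1}^{k} \frac{2^{y_i} - 1}{\log_2(i + 1)}, \qquad \mathrm{NDCG@k} = \frac{\mathrm{DCG@k}}{\mathrm{IDCG@k}}$$

where IDCG@k is the DCG@k of the ideal ordering, obtained by sorting on the true relevances. The implementation below follows this definition: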

function ndcg(p, y, k=10)
    k = min(k, length(p))

    # DCG: gains of the true targets, ordered by decreasing predicted score
    p_order = partialsortperm(p, 1:k, rev=true)
    _y = y[p_order]
    gains = 2 .^ _y .- 1
    discounts = log2.((1:k) .+ 1)
    dcg = sum(gains ./ discounts)

    # IDCG: best achievable DCG, obtained by ordering on the true targets
    y_order = partialsortperm(y, 1:k, rev=true)
    _y = y[y_order]
    gains = 2 .^ _y .- 1
    idcg = sum(gains ./ discounts)

    return idcg == 0 ? 1.0 : dcg / idcg
end
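
As a quick sanity check of this implementation, a prediction that orders a group perfectly should score 1.0, while a scrambled one should score less (toy, made-up values):

p = [0.9, 0.5, 0.1]
y = [4.0, 2.0, 0.0]
ndcg(p, y)                # 1.0: predictions reproduce the ideal ordering
ndcg([0.1, 0.9, 0.5], y)  # ≈ 0.62: imperfect ordering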

To compute the NDCG over a collection of groups, it is handy to leverage DataFrames' combine and groupby functionalities:

test_df = DataFrame(p=p_test, y=y_test, q=q_test)
test_df_agg = combine(groupby(test_df, "q"), ["p", "y"] => ndcg => "ndcg")
ndcg_test = round(mean(test_df_agg.ndcg), sigdigits=5)
@info "ndcg_test MSE" ndcg_test

┌ Info: ndcg_test MSE
└   ndcg_test = 0.8008

Logistic regression alternative

The above regression experiment achieves performance competitive with the results outlined in CatBoost's ranking benchmarks.

Another approach is to scale the target ranking scores into the [0, 1] interval and perform a logistic regression. Since the relevance labels take values from 0 to 4, dividing by the maximum rank achieves this:

max_rank = 4

# map the relevance labels {0, 1, 2, 3, 4} into the [0, 1] range expected by the logistic loss
y_train = dtrain[:y] ./ max_rank
y_eval = deval[:y] ./ max_rank
y_test = dtest[:y] ./ max_rank

config = EvoTreeRegressor(
    nrounds=6000,
    loss=:logloss,
    eta=0.01,
    nbins=64,
    max_depth=11,
    rowsample=0.9,
    colsample=0.9,
)

m_logloss, logger_logloss = fit_evotree(
    config;
    x_train=x_train,
    y_train=y_train,
    x_eval=x_eval,
    y_eval=y_eval,
    early_stopping_rounds=200,
    print_every_n=50,
    metric=:logloss,
    return_logger=true
);

To measure the NDCG, the original targets must be used since NDCG is a scale-sensitive measure: the gains 2^y - 1 depend on the actual relevance values, not just their ordering.
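
This sensitivity can be illustrated with the earlier toy values: the prediction ordering is unchanged, yet scaled targets alter the gains and hence the NDCG.

p = [0.1, 0.9, 0.5]
y = [4.0, 2.0, 0.0]
ndcg(p, y)       # ≈ 0.62
ndcg(p, y ./ 4)  # ≈ 0.72: same ordering, different NDCG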

# restore the original, unscaled targets
y_train = dtrain[:y]
y_eval = deval[:y]
y_test = dtest[:y]

p_test = m_logloss(x_test);
test_df = DataFrame(p=p_test, y=y_test, q=q_test)
test_df_agg = combine(groupby(test_df, "q"), ["p", "y"] => ndcg => "ndcg")
ndcg_test = round(mean(test_df_agg.ndcg), sigdigits=5)
@info "ndcg_test LogLoss" ndcg_test

┌ Info: ndcg_test LogLoss
└   ndcg_test = 0.80267

Conclusion

We've seen that a ranking problem can be efficiently handled with generic regression techniques while achieving performance comparable to specialized ranking loss functions. Below, we present the NDCG obtained from the above experiments alongside those published in CatBoost's benchmarks.

Model                      NDCG
EvoTrees - mse             0.80080
EvoTrees - logistic        0.80267
cat-rmse                   0.802115
cat-query-rmse             0.802229
cat-pair-logit             0.797318
cat-pair-logit-pairwise    0.790396
cat-yeti-rank              0.802972
xgb-rmse                   0.798892
xgb-pairwise               0.800048
xgb-lambdamart-ndcg        0.800048
lgb-rmse                   0.8013675
lgb-pairwise               0.801347

It should be noted that the latter results were not reproduced within the scope of this tutorial, so one should be careful about any claim of model superiority. The results from CatBoost's benchmarks were, however, already indicative of the strong performance of non-specialized ranking loss functions, to which this tutorial brings further support.