Bipartite ranking



Distribution 1.0
29 March 2016

Massih R. Amini

Laboratoire d'Informatique de Grenoble

Description


This program is an implementation of the RankBoost algorithm [Freund et al., 2003], which is also described in [Amini, 2015, pp. 177-184].
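
For intuition, here is a minimal Python sketch of the bipartite RankBoost procedure with decision stumps as base ranking functions, following [Freund et al., 2003]. It was written for this documentation only (all names in it are illustrative) and is not the program's actual source code:

    import numpy as np

    def rankboost_train(X, y, n_rounds=10, n_thresholds=10):
        # X: (n, d) matrix of feature values; y: relevance judgements in {-1, +1}.
        pos, neg = X[y == 1], X[y == -1]
        # Bipartite case: the distribution over (irrelevant, relevant) pairs
        # factorizes into one weight per example.
        v_pos = np.full(len(pos), 1.0 / len(pos))
        v_neg = np.full(len(neg), 1.0 / len(neg))
        model = []
        for _ in range(n_rounds):
            best, best_r = (0, 0.0), 0.0
            for j in range(X.shape[1]):
                # Candidate thresholds over feature j (the role of the -p option).
                for theta in np.quantile(X[:, j], np.linspace(0, 1, n_thresholds)):
                    # r measures how well the stump 1[x_j > theta] separates
                    # relevant from irrelevant examples under the current weights.
                    r = (v_pos * (pos[:, j] > theta)).sum() \
                        - (v_neg * (neg[:, j] > theta)).sum()
                    if abs(r) > abs(best_r):
                        best, best_r = (j, theta), r
            j_star, theta_star = best
            # Boosting weight (Eq. 6 of [Freund et al., 2003]).
            alpha = 0.5 * np.log((1.0 + best_r) / (1.0 - best_r))
            # Re-weight so that still-misordered pairs get more mass, then renormalize.
            v_pos *= np.exp(-alpha * (pos[:, j_star] > theta_star))
            v_neg *= np.exp(+alpha * (neg[:, j_star] > theta_star))
            v_pos /= v_pos.sum()
            v_neg /= v_neg.sum()
            model.append((alpha, j_star, theta_star))
        return model

    def rankboost_score(model, x):
        # Final ranking function: H(x) = sum_t alpha_t * 1[x_{j_t} > theta_t].
        return sum(alpha * float(x[j] > theta) for alpha, j, theta in model)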


Download and Installation

The program is free for scientific use only. It was developed on Linux, and the source code is available from:
http://ama.liglab.fr/~amini/RankBoost/RankBoost.tar.bz2

After downloading the file, unpack it:

> bzip2 -cd RankBoost.tar.bz2 | tar xvf -

Then compile the program in the newly created directory RankBoost/:

> make

After compilation, two executables are created:

  • RankBoost-Train (for training the model)
  • RankBoost-Test (for testing it)


Training and Testing

Both the training and test modules operate on a feature:value representation of examples:
  • Rel feature:value feature:value ...
where Rel (in {-1, +1}) is the relevance judgement of an example: -1 if the example is judged irrelevant to the given topic, and +1 if it is judged relevant. Two files, training_set and test_set, are provided in the directory RankBoost/Example/ for test purposes: training_set contains 10 training examples and test_set contains 8,000 test examples. (These datasets are built over a subset of the RCV1 Reuters collection.)
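
For instance, a file with one relevant and one irrelevant example could look as follows (the feature indices and values below are made up for illustration):

  1 3:0.25 28:1.0 657:3.0
  -1 5:0.5 53:2.0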


Train the model:
> RankBoost-Train [options] input_file parameter_file

Options are:
-t   (integer) The number of boosting iterations (default 10)
-p   (integer) The number of candidate thresholds per feature, defining the stumps (default 10)
-d   (integer) Display (default 1)
-?   Help
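
For example, to train for 100 boosting iterations with 20 candidate thresholds per feature on the supplied data (the parameter values here are chosen only for illustration):

> RankBoost-Train -t 100 -p 20 Example/training_set Params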


Test the model:
> RankBoost-Test input_file parameter_file


Example

Running the RankBoost algorithm [Freund et al., 2003] on training_set with the default settings (i.e. 10 candidate thresholds and 10 boosting iterations) gives:

> ./RankBoost-Train Example/training_set Params
La base d'apprentissage contient 10 exemples, en dimension 20080
Apprentissage ...
   1 --> alpha= 0.9729550745276566 j*=  28 theta*=1.00000000
   2 --> alpha= 0.7679765526894443 j*=  3 theta*=1.00000000
   3 --> alpha= 0.8061756522837317 j*= 28 theta*=1.00000000
   4 --> alpha= 0.9058990607798512 j*= 53 theta*=1.00000000
   5 --> alpha= 0.8843577731700001 j*= 657 theta*=3.00000000
   6 --> alpha= 0.8843577731700001 j*= 657 theta*=3.00000000
   7 --> alpha= 0.8843577731700001 j*= 657 theta*=3.00000000
   8 --> alpha= 0.8843577731700001 j*= 657 theta*=3.00000000
   9 --> alpha= 0.8843577731700001 j*= 657 theta*=3.00000000
   10 --> alpha= 0.8843577731700001 j*= 657 theta*=3.00000000


Here, the first line of the output reports (in French) that the training set contains 10 examples in dimension 20,080. At each iteration, j* and theta* are the chosen feature and the corresponding threshold, which together determine the base ranking function ([Freund et al., 2003], Eq. 9), and alpha is the corresponding boosting weight ([Freund et al., 2003], Eq. 6).
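
Assuming {0, 1}-valued stumps, the learned scoring function combines the base rankers linearly:

   H(x) = sum_t alpha_t h_t(x),   where h_t(x) = 1 if x_{j*_t} > theta*_t and 0 otherwise

For instance, the first iteration above adds 0.973 to the score of any example whose feature 28 exceeds 1.0.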

Testing the previous model gives:
> ./RankBoost-Test Example/test_set Params
La base de test contient 8000 exemples en dimension 21530
AUC=0.518593 AvP=0.299255
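
Here the output reports (in French) that the test set contains 8,000 examples in dimension 21,530, followed by the area under the ROC curve (AUC) and the average precision (AvP) of the learned model over the test set. Both quantities can be computed from the model's scores as in the following Python sketch (written for this documentation; the program's own implementation may differ, for instance in the handling of tied scores):

    import numpy as np

    def auc_avp(y, scores):
        # y: relevance judgements in {-1, +1}; scores: real-valued model outputs.
        order = np.argsort(-scores)        # sort examples by decreasing score
        rel = (y[order] == 1)
        ranks = np.arange(1, len(y) + 1)   # 1-based rank of each sorted example
        hits = np.cumsum(rel)              # number of relevant examples up to each rank
        n_pos = rel.sum()
        n_neg = len(y) - n_pos
        # AvP: precision averaged over the ranks of the relevant examples.
        avp = (hits[rel] / ranks[rel]).mean()
        # AUC: fraction of (relevant, irrelevant) pairs ranked in the right order
        # (tied scores are ignored in this sketch).
        n_misordered = (ranks[rel] - hits[rel]).sum()
        auc = 1.0 - n_misordered / (n_pos * n_neg)
        return auc, avp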


License

This program is publicly available for research use only. It should not be distributed for commercial use, and the author is not responsible for any (mis)use of this algorithm.


Acknowledgements

The author is thankful to Reuters for making the RCV1/RCV2 data available and for granting permission to distribute processed versions of it; the examples used in this release come from a part of the RCV1 collection.


Bibliography


[Amini, 2015] Massih-Reza Amini. Apprentissage Machine: de la théorie à la pratique. Eyrolles, 2015.

[Freund et al., 2003] Yoav Freund, Raj Iyer, Robert E. Schapire, Yoram Singer. An Efficient Boosting Algorithm for Combining Preferences. Journal of Machine Learning Research, 4:933-969, 2003.