Using the COLLE Benchmark

Training and Testing

The COLLE benchmark can be used to train and/or test models on multiple tasks. To train or fine-tune a model, you can fetch the train, validation, and test data splits from our repository. We recommend using Hugging Face's libraries to simplify the process.
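As a minimal sketch (the dataset identifier below is a placeholder, not the actual COLLE path), the splits can be loaded with the datasets library:

from datasets import load_dataset

# "colle/qfrcola" is a placeholder identifier; replace it with the
# actual COLLE dataset path and task configuration.
dataset = load_dataset("colle/qfrcola")
train_split = dataset["train"]
validation_split = dataset["validation"]
test_split = dataset["test"]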

To test a model, you also need to fetch the data in the same way. Once done, your model should infer predictions for each line in the test split. Our repository includes evaluation scripts for each benchmark dataset; you only need to plug in your model's inference method using the Hugging Face model interface. Our inference scripts are available in our GitHub repository.
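For illustration only (the model identifier and label mapping below are assumptions, not part of the benchmark), a classification model can be exposed to the evaluation scripts through a simple prediction function:

from transformers import pipeline

# Placeholder checkpoint; replace with your own model identifier.
classifier = pipeline("text-classification", model="your-username/your-model")

def predict(texts):
    # Return one integer label per test example, e.g. "LABEL_1" -> 1.
    outputs = classifier(texts)
    return [int(out["label"].split("_")[-1]) for out in outputs]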

If you prefer to run inference separately, please ensure that the predictions are formatted correctly before submitting them for evaluation (see our "Formatting the Dataset" section).

Formatting the Dataset

Before submitting your results, make sure your output is properly formatted so that our systems can process it. The expected format is a nested JSON dictionary as follows:

{
  "model_name": "a_model_name",
  "model_url": "a_model_url",
  "tasks": [
    {
      "qfrcola": { "predictions": [1,1,1,1,1] }
    },
    {
      "allocine": { "predictions": [1,1,1,1,1] }
    }
  ]
}
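
As a rough sketch (assuming your predictions are already plain Python lists of integer labels), the submission file can be assembled and written as follows:

import json

# qfrcola_predictions and allocine_predictions are assumed to be lists of
# integer labels produced by your model on the corresponding test splits.
submission = {
    "model_name": "a_model_name",
    "model_url": "a_model_url",
    "tasks": [
        {"qfrcola": {"predictions": qfrcola_predictions}},
        {"allocine": {"predictions": allocine_predictions}},
    ],
}

with open("submission.json", "w", encoding="utf-8") as f:
    json.dump(submission, f)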