Colle is made up of eleven tasks, each of which aims to test one or more facets of language understanding in machine-learning models. Each task is described in more detail below.

Allo-ciné.ca

Allo-ciné tests sentiment classification: the model is fed movie reviews that are either positive or negative, and the task is to predict the correct sentiment for each review.

Metrics:: Accuracy
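
As a rough illustration, a labelled review and the accuracy computation could look like the sketch below. The field names ("review", "label") and the example texts are invented for illustration, not the dataset's actual schema.

```python
# Hypothetical Allo-ciné-style examples; field names and texts are illustrative only.
examples = [
    {"review": "Un film magnifique, je le recommande.", "label": "positive"},
    {"review": "Scénario confus et acteurs peu convaincants.", "label": "negative"},
]

predictions = ["positive", "positive"]  # one model output per review

# Accuracy: fraction of reviews whose predicted sentiment matches the gold label.
accuracy = sum(p == ex["label"] for p, ex in zip(predictions, examples)) / len(examples)
print(f"accuracy = {accuracy:.2f}")  # 0.50 for the predictions above
```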

FQuAD - French Question Answering Dataset

FQuAD consists of question-answer pairs built on high-quality French Wikipedia articles. The goal of this task is to extract, from the provided article, the span of text that answers each question.

Metrics:: F1 Score, Exact Match Ratio
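
Both metrics compare the predicted answer string to the gold answer: exact match checks for string equality, while F1 measures token overlap between the two. The sketch below is a simplified version of these metrics; official evaluation scripts typically apply extra answer normalization (punctuation and article stripping) that is omitted here.

```python
# Simplified SQuAD-style answer metrics: exact match and token-overlap F1.
# This is an illustrative sketch, not the official FQuAD evaluation script.
from collections import Counter

def exact_match(prediction: str, gold: str) -> float:
    return float(prediction.strip().lower() == gold.strip().lower())

def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("en 1789", "en 1789"))                    # 1.0
print(round(token_f1("le 14 juillet 1789", "en 1789"), 2))  # 0.33
```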

GQNLI-Fr - The Generalized Quantifier NLI Challenge Dataset

The dataset consists of carefully constructed premise-hypothesis pairs. The task is to predict whether each hypothesis logically follows from the premise, contradicts it, or is neutral with respect to it.

Metrics:: Accuracy
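
For reference, the sketch below shows the three-way label scheme this kind of NLI task uses (the same scheme applies to XNLI below). The premise, hypothesis, and label-id mapping are invented for illustration.

```python
# Illustrative three-way NLI example; the texts and the id mapping are assumptions.
LABEL2ID = {"entailment": 0, "neutral": 1, "contradiction": 2}

pair = {
    "premise": "Tous les élèves ont rendu leur devoir.",
    "hypothesis": "Aucun élève n'a rendu son devoir.",
    "label": "contradiction",
}

predicted_id = 2  # a classifier typically outputs one of the three label ids
correct = predicted_id == LABEL2ID[pair["label"]]
print(correct)  # True
```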

Opus Parcus - Open Subtitles Paraphrase Corpus

Opus Parcus, built from Open Subtitles data, consists of sentence pairs. At test time, each pair comes with a score from 1 to 5, where 1 means the sentences are dissimilar and 5 means they are paraphrases with identical meaning.

Metrics:: Pearson
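
The metric is the Pearson correlation between the model's predicted similarity scores and the gold annotations. A minimal sketch of that computation, with made-up scores, could look like this:

```python
# Pearson correlation between predicted scores and gold 1-5 annotations.
# The score values below are made up for illustration.
import math

gold = [5.0, 1.0, 3.5, 4.0, 2.0]
pred = [4.6, 1.5, 3.0, 4.2, 2.4]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(f"pearson = {pearson(gold, pred):.3f}")
```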

PAWS: Paraphrase Adversaries from Word Scrambling

This task tests paraphrase identification: given two sentences, the model must decide whether or not they are equivalent in meaning.

Metrics:: Accuracy

PIAF - A French-Language Question-Answering Dataset

This task consists of questions paired with text passages, with an indication of where the answer is located within the passage.

Metrics:: F1 Score, Exact Match Ratio
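
As an illustration of how the answer location can be indicated, the sketch below uses a SQuAD-like layout with a character offset into the passage. The field names and the example itself are assumptions for illustration, not the released schema.

```python
# Hypothetical PIAF-style example in a SQuAD-like layout (illustrative only).
example = {
    "question": "En quelle année la tour Eiffel a-t-elle été inaugurée ?",
    "context": "La tour Eiffel, achevée en 1889, fut inaugurée lors de l'Exposition universelle.",
    "answers": {"text": ["1889"], "answer_start": [27]},
}

# The span indication lets us verify that the answer text really sits at the offset.
start = example["answers"]["answer_start"][0]
text = example["answers"]["text"][0]
assert example["context"][start:start + len(text)] == text
```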

QFrCoLA - a Quebec-French Corpus of Linguistic Acceptability Judgments

QFrCoLA is a French dataset sourced from several linguistic reference sites, such as académie-française.fr and vitrinelinguistique.com. It tests a model's ability to judge grammatical correctness: the answer is a binary label indicating whether the sentence is acceptable or not.

Metrics:: Accuracy

QFrBLiMP - Quebec-French Linguistic Minimal Pairs

This task presents the model with minimal pairs: two sentences that differ only slightly in syntax or wording, one of which is grammatically acceptable and one of which is not. The goal is to identify the acceptable sentence in each pair.

Metrics:: Accuracy
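
Assuming QFrBLiMP follows the usual BLiMP evaluation protocol, a pair counts as correct when a language model assigns a higher probability to the acceptable sentence than to the unacceptable one. The sketch below illustrates that idea; `sentence_log_prob` is a hypothetical scoring function to be implemented with whatever model is being evaluated.

```python
# Sketch of BLiMP-style minimal-pair scoring (an assumed protocol, not an
# official evaluation script). `sentence_log_prob` is a hypothetical hook:
# plug in your own language model to score a whole sentence.

def sentence_log_prob(sentence: str) -> float:
    raise NotImplementedError("score the sentence with your language model")

def minimal_pair_accuracy(pairs: list[tuple[str, str]]) -> float:
    """pairs: (acceptable_sentence, unacceptable_sentence) tuples."""
    correct = sum(
        sentence_log_prob(good) > sentence_log_prob(bad) for good, bad in pairs
    )
    return correct / len(pairs)
```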

SICK-FR - French Sentences Involving Compositional Knowledge

This task also consists of sentence pairs, annotated along two dimensions: relatedness (scored from 1 to 5) and entailment (entails, contradicts, or neutral).

Metrics:: Pearson

STS22-Crosslingual - Multilingual News Article Similarity

This task evaluates whether pairs of news articles, written in different languages, cover the same story. It focuses on document-level similarity, where systems rate article pairs on a 4-point scale from most to least similar.

Metrics:: Pearson

XNLI - The Cross-Lingual NLI Corpus

This task consists of pairs of sentences where the goal is to determine the relation between the two: entailment, neutral, or contradiction.

Metrics:: Accuracy