Colle is composed of X tasks, each of which tests one or more facets of a model's language understanding. Each task is described in more detail below.
Allo-ciné.ca
Allo-ciné tests language understanding through sentiment classification of movie reviews, each of which is either positive or negative. The task consists of predicting the correct sentiment for each review.
Metrics:: Accuracy
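Accuracy is the fraction of reviews whose predicted sentiment matches the gold label. Here is a minimal sketch in Python. The function name and the label strings are illustrative assumptions, not part of any official Colle evaluation code:

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of examples whose predicted label equals the gold label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical predictions for four reviews.
gold = ["positive", "negative", "negative", "positive"]
pred = ["positive", "negative", "positive", "positive"]
print(accuracy(pred, gold))  # 0.75
```

The same function applies to any task in this list that reports accuracy, since it only compares label strings.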
FQuAD - French Question Answering Dataset
FQuAD is a set of question/answer pairs built from high-quality French Wikipedia articles. Given a question and a passage from an article, the model must extract the span of the passage that answers the question.
Metrics:: F1 Score, Exact Match Ratio
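Exact Match and F1 are usually computed SQuAD-style. Both strings are first normalized. Exact Match then checks whether the strings are identical, and F1 measures the token overlap between the predicted and gold answers. The sketch below follows that scheme. The normalization steps, in particular the list of French articles that are dropped, are assumptions and may differ from the official FQuAD evaluation script:

```python
import string
from collections import Counter

# Assumption: illustrative French article list; the official
# FQuAD/PIAF scripts may normalize differently.
ARTICLES = {"le", "la", "les", "l", "un", "une", "des"}
PUNCTUATION = set(string.punctuation) | {"’", "«", "»"}


def normalize(text: str) -> list[str]:
    """Lowercase, turn punctuation into spaces, drop articles, split into tokens."""
    text = "".join(" " if ch in PUNCTUATION else ch for ch in text.lower())
    return [tok for tok in text.split() if tok not in ARTICLES]


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_toks, gold_toks = normalize(prediction), normalize(gold)
    common = Counter(pred_toks) & Counter(gold_toks)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


print(exact_match("La tour Eiffel", "tour Eiffel"))              # 1.0
print(round(f1("la tour Eiffel à Paris", "la tour Eiffel"), 3))  # 0.667
```

Over a dataset, both scores are averaged across questions. When a question has several reference answers, SQuAD-style scripts keep the best score over the references.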
GQNLI-Fr - The Generalized Quantifier NLI Challenge Dataset
The dataset consists of carefully constructed premise-hypothesis pairs built around generalized quantifiers. For each pair, the model must decide whether the hypothesis follows from the premise (entailment), contradicts it, or is neutral.
Metrics:: Accuracy
Opus Parcus - Open Subtitles Paraphrase Corpus
Opus Parcus, built from Open Subtitles data, consists of candidate paraphrase pairs. In the test data, each pair is labeled with a score from 1 to 5, where 1 means the sentences are dissimilar and 5 means they are identical in meaning.
Metrics:: Pearson
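Pearson correlation measures how linearly the model's predicted similarity scores track the gold labels. A value of 1.0 is a perfect positive linear relationship, and 0 means no linear relationship. Here is a dependency-free sketch; the function name and the sample scores are illustrative:

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, left unnormalized since the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


gold = [1, 2, 3, 4, 5]        # hypothetical gold similarity labels
pred = [3, 5, 7, 9, 11]       # predictions that are a linear function of gold
print(pearson(gold, pred))    # 1.0
```

On Python 3.10+, `statistics.correlation` from the standard library computes the same coefficient.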
PAWS - Paraphrase Adversaries from Word Scrambling
This task tests paraphrase identification: given two sentences, the model must determine whether they are equivalent in meaning.
Metrics:: Accuracy
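As the name suggests, PAWS pairs are built by scrambling or swapping words, so the two sentences often contain exactly the same words. The toy example below uses hypothetical sentences, not pairs drawn from the dataset. It shows why a bag-of-words comparison cannot separate a paraphrase from a non-paraphrase in such cases:

```python
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lowercased word counts; word order is discarded."""
    return Counter(sentence.lower().rstrip(".").split())


# Hypothetical pair with identical words but different meanings.
a = "Les vols de Paris vers Montréal."
b = "Les vols de Montréal vers Paris."

print(a == b)                              # False: the sentences differ
print(bag_of_words(a) == bag_of_words(b))  # True: same words, same counts
```

A model that does well on this task therefore has to take word order and sentence structure into account, not just lexical overlap.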
PIAF - The French-Language Dataset of Questions-Answers
This task consists of questions paired with text passages, each annotated with the location of the answer within the passage. The model must extract the answer span from the passage.
Metrics:: F1 Score, Exact Match Ratio
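This sketch assumes PIAF uses a SQuAD-style layout, in which each answer is stored as its text together with a character offset into the passage. That offset is how the location of the answer is indicated. The `QAExample` class and its field names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class QAExample:
    """Hypothetical SQuAD-style record: the answer is located by a character offset."""
    context: str
    question: str
    answer_text: str
    answer_start: int  # character index where the answer begins in `context`

    def answer_span(self) -> str:
        """Slice the passage at the stored offset."""
        end = self.answer_start + len(self.answer_text)
        return self.context[self.answer_start:end]

    def is_consistent(self) -> bool:
        """Check that the offset really points at the stored answer text."""
        return self.answer_span() == self.answer_text


context = "La tour Eiffel a été construite pour l'Exposition universelle de 1889."
example = QAExample(
    context=context,
    question="Pour quel événement la tour Eiffel a-t-elle été construite ?",
    answer_text="l'Exposition universelle de 1889",
    answer_start=context.index("l'Exposition"),
)
print(example.answer_span())  # l'Exposition universelle de 1889
```

Checking `is_consistent()` on every record is a cheap way to catch offsets that drifted during preprocessing.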
QFrCoLA - a Quebec-French Corpus of Linguistic Acceptability Judgments
QFrCoLA is a French dataset sourced from multiple linguistic websites, such as académie-française.fr and vitrinelinguistique.com. It tests a model's ability to judge grammatical acceptability: each sentence has a binary label indicating whether it is correct.
Metrics:: Accuracy
QFrBLiMP - A Quebec-French Benchmark of Linguistic Minimal Pairs
This task presents the model with minimal pairs: two sentences that differ only slightly, such as in a single word or inflection, where one is grammatical and the other is not. The goal is to identify the grammatical sentence in each pair.
Metrics:: Accuracy
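In the original BLiMP protocol, a language model counts as correct on a minimal pair when it assigns a higher probability to the grammatical sentence. The sketch below assumes QFrBLiMP accuracy is computed the same way. The scoring function is a hypothetical stand-in for a model's sentence log-probability and is not a real model:

```python
from typing import Callable, Sequence, Tuple

# Each pair is (grammatical sentence, ungrammatical sentence).
MinimalPair = Tuple[str, str]


def minimal_pair_accuracy(
    pairs: Sequence[MinimalPair], score: Callable[[str], float]
) -> float:
    """Fraction of pairs in which the grammatical sentence scores strictly higher.

    In practice `score` would be a model's log-probability for the sentence;
    ties count as errors.
    """
    if not pairs:
        raise ValueError("no pairs to evaluate")
    wins = sum(score(good) > score(bad) for good, bad in pairs)
    return wins / len(pairs)


# Hypothetical stand-in scorer, NOT a language model: it subtracts one point
# for each hand-listed agreement error (stored as a word bigram) it finds.
KNOWN_ERRORS = {("les", "chat"), ("elle", "sont")}


def toy_score(sentence: str) -> float:
    words = sentence.lower().rstrip(".").split()
    return -float(len(set(zip(words, words[1:])) & KNOWN_ERRORS))


pairs = [
    ("Les chats dorment.", "Les chat dorment."),
    ("Elle est partie.", "Elle sont partie."),
]
print(minimal_pair_accuracy(pairs, toy_score))  # 1.0
```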
Sick-FR - French Sentences Involving Compositional Knowledge
This task uses sentence pairs annotated along two dimensions: relatedness (scored from 1 to 5) and entailment (entails, contradicts, or neutral).
Metrics:: Pearson
Sts22-Crosslingual - Multilingual News Article Similarity
This task evaluates whether pairs of news articles, written in different languages, cover the same story. It focuses on document-level similarity, where systems rate article pairs on a 4-point scale from most to least similar.
Metrics:: Pearson
XNLI - The Cross-Lingual NLI Corpus
This task consists of sentence pairs (a premise and a hypothesis); the goal is to classify the relation between them as entailment, neutral, or contradiction.
Metrics:: Accuracy