Metadata-Version: 2.1
Name: experimaestro-ir
Version: 0.0.0
Summary: "Experimaestro common module for IR experiments"
Home-page: https://github.com/bpiwowar/experimaestro-ir
Author: Benjamin Piwowarski
Author-email: benjamin@piwowarski.fr
License: GPL-3
Keywords: neural information retrieval,experiment platform
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Provides-Extra: anserini
Provides-Extra: neural
License-File: LICENSE

[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![Documentation Status](https://readthedocs.org/projects/experimaestro-ir/badge/?version=latest)](https://experimaestro-ir.readthedocs.io/en/latest/?badge=latest)

# Information Retrieval for experimaestro

An Information Retrieval module for [experimaestro](https://experimaestro-python.readthedocs.io/).

The full documentation can be read at [IR@experimaestro](https://experimaestro-ir.readthedocs.io/).

## Install

The base experimaestro-IR package can be installed with `pip install xpmir`.
Additional functionality is enabled by installing optional dependencies:

- `pip install xpmir[neural]` to install neural-IR packages (torch, etc.)
- `pip install xpmir[anserini]` to install Anserini-related packages
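
After installing, it can be useful to check which packages are importable in the current environment. The snippet below is a minimal sketch using only the Python standard library. It assumes that the import name of the base package is `xpmir` (the same as the name used with `pip` above) and that `torch` is the module pulled in by the `neural` extra; `is_installed` and `report` are hypothetical helper names introduced for illustration and are not part of experimaestro-IR.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def report(module_names):
    """Map each top-level module name to whether it is available."""
    return {name: is_installed(name) for name in module_names}


if __name__ == "__main__":
    # "xpmir" as an import name is an assumption; "torch" comes from the
    # `neural` extra described above.
    for name, available in report(["xpmir", "torch"]).items():
        print(f"{name}: {'available' if available else 'missing'}")
```

Note that some shells (zsh, for instance) treat square brackets as glob characters, so the extras may need quoting there, e.g. `pip install "xpmir[neural]"`.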

## What's inside?

- Collection management (using datamaestro)
    - Interface for the [IR datasets library](https://ir-datasets.com/)
    - Splitting IR datasets
    - Shuffling training triplets
- Representation
    - Word Embeddings
    - HuggingFace transformers
- Indices
    - dense: [FAISS](https://github.com/facebookresearch/faiss) interface
    - sparse: [xpmir-rust library](https://github.com/experimaestro/experimaestro-ir-rust)
- Standard Indexing and Retrieval
    - Anserini
- Learning to Rank
    - Pointwise
    - Pairwise
    - Distillation
    - (*planned*) Pipelines (e.g. ANCE)
- Neural IR
    - Cross-Encoder
    - SPLADE
    - DRMM
    - ColBERT
- Paper reproduction
    - *MonoBERT* (Passage Re-ranking with BERT. Rodrigo Nogueira and Kyunghyun Cho. 2019)
    - (*planned*) *DuoBERT* (Multi-Stage Document Ranking with BERT. Rodrigo Nogueira, Wei Yang, Kyunghyun Cho, and Jimmy Lin. 2019)
    - (*planned*) *SPLADE v2* (SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval. Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant. SIGIR 2021)
- Pre-trained models
    - [HuggingFace](https://huggingface.co) integration (direct, through the Sentence Transformers library)

## Examples

- [BM25 retrieval](./examples/bm25.py)
- [MS Marco (cross-encoder)](./examples/msmarco-rerank.py)

## Thanks

Some parts of the code have been adapted from [OpenNIR](https://github.com/Georgetown-IR-Lab/OpenNIR).
