pybool_ir.experiments.retrieval

Classes and methods for running retrieval experiments.

Functions

AdHocExperiment(indexer[, raw_query, ...])

Unlike the RetrievalExperiment class, which expects a Collection object, this function allows ad-hoc queries to be run.

Classes

LuceneSearcher(indexer)

Basic wrapper around a Lucene index that provides a simple interface for searching.

RetrievalExperiment(indexer, collection, ...)

This class provides a convenient interface for running retrieval experiments.

pybool_ir.experiments.retrieval.AdHocExperiment(indexer: pybool_ir.index.index.Indexer, raw_query: str | None = None, query_parser: pybool_ir.query.parser.QueryParser = <pybool_ir.query.pubmed.parser.PubmedQueryParser object>, date_from='1900/01/01', date_to='3000/01/01', ignore_dates: bool = False, date_field: str = 'dp') → RetrievalExperiment

Unlike the RetrievalExperiment class, which expects a Collection object, this function allows ad-hoc queries to be run, for example:

>>> from pybool_ir.experiments.retrieval import AdHocExperiment
>>> from pybool_ir.index.pubmed import PubmedIndexer
>>>
>>> with AdHocExperiment(PubmedIndexer(index_path="pubmed"), raw_query="headache[tiab]") as experiment:
...     print(experiment.count())

class pybool_ir.experiments.retrieval.LuceneSearcher(indexer: Indexer)

Bases: ABC

Basic wrapper around a Lucene index that provides a simple interface for searching. This class can be used as a context manager, which automatically opens and closes the index. It can be used directly to run experiments, but the other classes in this module provide a more convenient interface.

index: Indexer

The underlying Lucene index.

indexer

The pybool_ir.index.index.Indexer object that is used to open the index.
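
The context-manager behaviour described for LuceneSearcher can be pictured with a small stand-in. This is only a sketch of the general open-on-entry, close-on-exit pattern, not pybool_ir's implementation: FakeIndex and SearcherSketch are hypothetical names, and no Lucene index is involved.

```python
class FakeIndex:
    """Hypothetical stand-in for an index that must be opened and closed."""

    def __init__(self):
        self.is_open = False

    def open(self):
        self.is_open = True

    def close(self):
        self.is_open = False


class SearcherSketch:
    """Hypothetical wrapper that opens its index on entry and closes it on exit."""

    def __init__(self, index):
        self.index = index

    def __enter__(self):
        self.index.open()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Close the index even if the body of the with-block raised.
        self.index.close()
        return False  # do not suppress exceptions


index = FakeIndex()
with SearcherSketch(index) as searcher:
    # Inside the block the index is open and ready for use.
    opened_inside = searcher.index.is_open
# After the block the index has been closed again.
closed_after = not index.is_open
```

The point of the pattern is that callers never have to remember to close the index themselves, which is why the examples in this module all use a with-statement.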

class pybool_ir.experiments.retrieval.RetrievalExperiment(indexer: pybool_ir.index.index.Indexer, collection: pybool_ir.experiments.collections.Collection, query_parser: pybool_ir.query.parser.QueryParser = <pybool_ir.query.pubmed.parser.PubmedQueryParser object>, eval_measures: typing.List[ir_measures.measures.base.Measure] | None = None, run_path: pathlib.Path | None = None, filter_topics: typing.List[str] | None = None, ignore_dates: bool = False, date_field: str = 'dp')

Bases: LuceneSearcher

This class provides a convenient interface for running retrieval experiments. It can be used as a context manager, which will automatically open and close the index. This class should be used for simple experiments where a collection of queries is executed on the index, for example:

>>> from pybool_ir.experiments.collections import load_collection
>>> from pybool_ir.experiments.retrieval import RetrievalExperiment
>>> from pybool_ir.index.pubmed import PubmedIndexer
>>> from ir_measures import *
>>> import ir_measures
>>>
>>> # Automatically downloads, then loads this collection.
>>> collection = load_collection("ielab/sysrev-seed-collection")
>>> # Point the experiment to your index, your collection.
>>> with RetrievalExperiment(PubmedIndexer(index_path="pubmed"),
...                          collection=collection) as experiment:
...     # Get the run of the experiment.
...     # This automatically executes the queries.
...     run = experiment.run
>>> # Evaluate the run using ir_measures.
>>> ir_measures.calc_aggregate([SetP, SetR, SetF], collection.qrels, run)

go() → None

Run the experiment without returning anything.
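
The RetrievalExperiment example above evaluates a run with SetP, SetR, and SetF. As a rough illustration of what these set-based measures compute for a single topic, the following sketch works them out by hand in plain Python. It does not call ir_measures or pybool_ir; the function name set_measures and the toy document IDs are made up for illustration, and it assumes F is the balanced harmonic mean of precision and recall.

```python
def set_measures(retrieved, relevant):
    """Return (precision, recall, F1) for one topic, treating both inputs as sets."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved

    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy data: four documents retrieved by a Boolean query, three judged relevant.
precision, recall, f1 = set_measures(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d5"],
)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Because a Boolean query returns an unranked set of documents, set-based measures like these are a natural fit for the runs produced by this module.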