pybool_ir.experiments.collections

Classes and methods for loading collections.

Functions

load_collection(name)

Given the name of a collection, load it from disk.

load_collection_ir_datasets(name)

Load a collection from the ir_datasets package.

parse_clef_tar_topic(topic_str[, date_from, ...])

Helper function that parses a topic from the CLEF TAR collection.

Classes

Collection(identifier, topics, qrels)

A collection contains a list of topics and a list of qrels.

Topic(identifier, description, raw_query, ...)

A topic contains a query and a date range for reproducing when the query was issued.

class pybool_ir.experiments.collections.Collection(identifier: str, topics: List[Topic], qrels: List[Qrel])

Bases: object

A collection contains a list of topics and a list of qrels.

classmethod from_dir(collection_path: Path) → Collection

Internally, pybool_ir stores each collection as a directory containing a topics.jsonl file and a qrels file, which gives all collections a common on-disk format. This method loads a collection from a directory in that format.
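The directory layout described above can be sketched with the standard library alone. This is a minimal illustration that does not call pybool_ir. The qrels file name (`qrels`), the TREC-style four-column qrels lines, and the JSON keys (borrowed from the `Topic` constructor arguments) are assumptions rather than confirmed details of the library's format, and `write_example_collection` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def write_example_collection(root: Path) -> Path:
    """Create a toy collection directory (layout details are assumptions)."""
    collection_dir = root / "example-collection"
    collection_dir.mkdir(parents=True, exist_ok=True)

    # One JSON object per line; keys mirror the Topic constructor arguments.
    topic = {
        "identifier": "T1",
        "description": "Example topic",
        "raw_query": "(heart OR cardiac) AND surgery",
        "date_from": "1940",
        "date_to": "2017",
    }
    with open(collection_dir / "topics.jsonl", "w", encoding="utf-8") as f:
        f.write(json.dumps(topic) + "\n")

    # Assumed TREC-style qrels: topic id, iteration, document id, relevance.
    with open(collection_dir / "qrels", "w", encoding="utf-8") as f:
        f.write("T1 0 12345 1\n")
        f.write("T1 0 67890 0\n")
    return collection_dir


with tempfile.TemporaryDirectory() as tmp:
    collection_dir = write_example_collection(Path(tmp))
    created_files = sorted(p.name for p in collection_dir.iterdir())

print(created_files)
```

With pybool_ir installed, a directory in the library's actual format would be passed to `Collection.from_dir` as a `Path`.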

class pybool_ir.experiments.collections.Topic(identifier: str, description: str, raw_query: str, date_from: str, date_to: str)

Bases: object

A topic contains a query and a date range for reproducing when the query was issued.

classmethod from_file(topic_path: Path) → List[Topic]

Internally, pybool_ir uses a jsonl file to store topics, one topic per line. This method loads all topics from such a file and returns them as a list.
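As a rough sketch of what reading such a file involves, the snippet below parses one JSON object per line into a list of records. It is not the library's implementation: `TopicRecord` and `read_topics` are hypothetical names, and the JSON keys are assumed to match the `Topic` constructor arguments.

```python
import json
from dataclasses import dataclass
from io import StringIO
from typing import List, TextIO


@dataclass
class TopicRecord:
    """Hypothetical stand-in for Topic, with the same constructor fields."""
    identifier: str
    description: str
    raw_query: str
    date_from: str
    date_to: str


def read_topics(lines: TextIO) -> List[TopicRecord]:
    """Parse one JSON object per non-empty line into a TopicRecord."""
    topics = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        topics.append(TopicRecord(**json.loads(line)))
    return topics


# An in-memory file keeps the example self-contained.
example = StringIO(
    '{"identifier": "T1", "description": "First", "raw_query": "a AND b", '
    '"date_from": "1940", "date_to": "2017"}\n'
    '{"identifier": "T2", "description": "Second", "raw_query": "c OR d", '
    '"date_from": "2000", "date_to": "2010"}\n'
)
topics = read_topics(example)
print([t.identifier for t in topics])
```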

pybool_ir.experiments.collections.load_collection(name: str) → Collection

Given the name of a collection, load it from disk. A collection contains a list of topics and a list of qrels. The actual documents for a collection are handled separately.

pybool_ir.experiments.collections.load_collection_ir_datasets(name: str) → Collection

Load a collection from the ir_datasets package.

pybool_ir.experiments.collections.parse_clef_tar_topic(topic_str: str, date_from: str = '1940', date_to: str = '2017', parse_query: bool = False) → Topic

Helper function that parses a topic from the CLEF TAR collection. These topic files use a non-standard variant of the TREC format, so they need this dedicated parser.
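To give a feel for the kind of parsing involved, here is a minimal sketch that splits a topic string into labelled sections. It assumes the topic text uses `Topic:`, `Title:`, `Query:` and `Pids:` labels, which is an assumption about the file layout rather than something stated here. `parse_topic_sections` is a hypothetical helper, not the implementation of `parse_clef_tar_topic`, and it does not handle the date range or query parsing.

```python
from typing import Dict, List

# Assumed section labels; adjust if the actual topic files differ.
SECTION_LABELS = ("Topic:", "Title:", "Query:", "Pids:")


def parse_topic_sections(topic_str: str) -> Dict[str, str]:
    """Split a topic string into sections keyed by lowercased label."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in topic_str.splitlines():
        stripped = line.strip()
        label = next((s for s in SECTION_LABELS if stripped.startswith(s)), None)
        if label is not None:
            # Start a new section; keep any text that follows the label.
            current = label.rstrip(":").lower()
            sections[current] = [stripped[len(label):].strip()]
        elif current is not None:
            sections[current].append(stripped)
    # Drop empty lines and join what remains of each section.
    return {k: "\n".join(p for p in parts if p) for k, parts in sections.items()}


example_topic = """Topic: CD000001

Title: Example review title

Query:
exp Heart/
surgery.ti,ab.
1 and 2

Pids:
    12345
    67890
"""
parsed = parse_topic_sections(example_topic)
print(parsed["topic"])
print(parsed["query"].splitlines())
```

The identifier, title and query recovered this way correspond loosely to the `identifier`, `description` and `raw_query` fields of a `Topic`, though how the library maps them is not specified here.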