pybool_ir.index.ir_datasets
Classes
- class pybool_ir.index.ir_datasets.IRDatasetsIndexer(index_path: Path | str, dataset_name: str, store_fields: bool = True, optional_fields: List[str] | None = None)
Bases: Indexer
- bulk_index()
Index a collection of documents from a file or directory.
- parse_documents() -> (Iterable[pybool_ir.index.document.Document], int)
Return an iterable of documents from a path. Because documents can be stored in different formats, an indexer may have multiple ways to parse files; this method chooses the best way to parse a file given its filename.
- set_index_fields(store_fields: bool = False)
Set the fields of the index. Off-the-shelf indexers for particular collections require specific Lucene fields to be set.
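The way `bulk_index` can consume the `(Iterable[Document], int)` pair returned by `parse_documents` can be illustrated without Lucene. The sketch below is hypothetical and is not the pybool_ir implementation: `Document` is a stand-in dataclass rather than `pybool_ir.index.document.Document`, a plain dict replaces the Lucene index, and it assumes (this page does not confirm it) that the int is the total number of documents, which lets the indexer report progress while streaming documents lazily.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass
class Document:
    # Hypothetical stand-in for pybool_ir.index.document.Document.
    id: str
    fields: Dict[str, str] = field(default_factory=dict)


def parse_documents(raw: List[Dict[str, str]]) -> Tuple[Iterable[Document], int]:
    # Hypothetical helper: returns a lazy iterator over documents plus,
    # by assumption, the total document count (known up front here).
    def generate() -> Iterator[Document]:
        for record in raw:
            extra = {k: v for k, v in record.items() if k != "doc_id"}
            yield Document(id=record["doc_id"], fields=extra)

    return generate(), len(raw)


def bulk_index(raw: List[Dict[str, str]], store: Dict[str, Document]) -> int:
    # Hypothetical consumer: drains the iterable and "indexes" each document
    # into a dict that stands in for a Lucene index writer.
    docs, total = parse_documents(raw)
    indexed = 0
    for doc in docs:
        store[doc.id] = doc
        indexed += 1
        print(f"indexed {indexed}/{total}")
    return indexed
```

Returning a generator keeps memory flat for large collections, while the separately returned count gives the caller a progress denominator without materialising every document first.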