pybool_ir.index.index
Base classes for indexing and searching documents.
Classes
Class | Description
Indexer | Base class that provides the basic functionality for indexing and searching documents.
SearcherMixin | Include this mixin to add search functionality to an Indexer.
- class pybool_ir.index.index.Indexer(index_path: Path | str, store_fields: bool = True, optional_fields: List[str] | None = None)
Bases: ABC
Base class that provides the basic functionality for indexing and searching documents. On its own, this class offers no way to search documents other than calling the Lucene API directly; include SearcherMixin to add search functionality.
- add_document(doc: Document, optional_fields: Dict[str, Callable[[Document], Any]] | None = None) -> None
Add a single document to the index. This method is called by bulk_index.
optional_fields maps field names to functions that take a document and return the value for that field. Use it to add fields that are not part of the document itself but are derived from it and calculated at index time.
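To illustrate the shape of optional_fields, here is a minimal, self-contained sketch that does not use pybool_ir or Lucene. `Document` is a hypothetical stand-in dataclass (its `id`, `title`, and `fields` attributes are assumptions), and `apply_optional_fields` is a hypothetical helper that only mimics computing derived values at index time. A dictionary mapping field names to callables, like `optional_fields` below, matches the `Dict[str, Callable[[Document], Any]]` type in the signature; with the real Document class, the functions would read whatever attributes it actually exposes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class Document:
    """Hypothetical stand-in for pybool_ir.index.document.Document."""
    id: str
    title: str
    fields: Dict[str, Any] = field(default_factory=dict)


def apply_optional_fields(
    doc: Document,
    optional_fields: Optional[Dict[str, Callable[[Document], Any]]] = None,
) -> Document:
    """Hypothetical helper: compute each derived field and store it on the document."""
    for name, fn in (optional_fields or {}).items():
        doc.fields[name] = fn(doc)  # value is derived from the document at "index time"
    return doc


# Field name -> function that derives the field's value from a document.
optional_fields = {
    "title_length": lambda d: len(d.title),
    "title_lower": lambda d: d.title.lower(),
}

doc = apply_optional_fields(Document(id="1", title="Boolean Retrieval"), optional_fields)
print(doc.fields)  # {'title_length': 17, 'title_lower': 'boolean retrieval'}
```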
- bulk_index(fname: Path | str, optional_fields: Dict[str, Callable[[Document], Any]] | None = None)
Index a collection of documents from a file or directory.
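The relationship between bulk_index, parse_documents, and add_document can be sketched with a toy, in-memory class. This is a hypothetical re-creation of the control flow under stated assumptions, not pybool_ir's implementation: `ToyIndexer` and `LinesIndexer` are invented names, documents are plain dicts, a Python list stands in for the Lucene index, and the one-document-per-line file format is assumed.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple, Union

Doc = Dict[str, Any]  # hypothetical: plain dicts instead of pybool_ir Documents


class ToyIndexer(ABC):
    """Hypothetical in-memory stand-in that mirrors the documented control flow."""

    def __init__(self) -> None:
        self.docs: List[Doc] = []  # plays the role of the Lucene index

    @abstractmethod
    def parse_documents(self, fname: Path) -> Tuple[Iterable[Doc], int]:
        """Subclasses decide how to read a file."""

    def add_document(self, doc: Doc,
                     optional_fields: Optional[Dict[str, Callable[[Doc], Any]]] = None) -> None:
        record = dict(doc)
        # Derived fields are computed from the document at index time.
        for name, fn in (optional_fields or {}).items():
            record[name] = fn(doc)
        self.docs.append(record)

    def bulk_index(self, fname: Union[Path, str],
                   optional_fields: Optional[Dict[str, Callable[[Doc], Any]]] = None) -> None:
        docs, _count = self.parse_documents(Path(fname))  # the count is unused in this sketch
        for doc in docs:
            self.add_document(doc, optional_fields)  # one call per parsed document


class LinesIndexer(ToyIndexer):
    """Assumed format: one document per non-empty line."""

    def parse_documents(self, fname: Path) -> Tuple[Iterable[Doc], int]:
        lines = [ln for ln in fname.read_text(encoding="utf-8").splitlines() if ln.strip()]
        return ({"id": str(i), "text": ln} for i, ln in enumerate(lines)), len(lines)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "docs.txt"
    path.write_text("first doc\nsecond doc\n", encoding="utf-8")
    indexer = LinesIndexer()
    indexer.bulk_index(path, {"n_words": lambda d: len(d["text"].split())})

print(indexer.docs)
# [{'id': '0', 'text': 'first doc', 'n_words': 2}, {'id': '1', 'text': 'second doc', 'n_words': 2}]
```

Keeping parse_documents abstract means the base class owns the indexing loop while each subclass only decides how to read its collection's files.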
- index: Indexer
The underlying Lucene index.
- abstract parse_documents(fname: Path) -> (Iterable[Document], int)
Return an iterable of documents parsed from a path, together with an int (per the return annotation). Because a collection can be stored in different formats, an indexer may support more than one way of reading files; this method chooses the appropriate parser based on the filename.
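A filename-based dispatch of the kind parse_documents describes might look like the following sketch. The supported formats (`.jsonl` and `.jsonl.gz`), the dict-based documents, and returning the document count as the int are all assumptions for illustration; real indexers define their own formats and parsers.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]  # hypothetical stand-in for pybool_ir's Document


def _read_jsonl(lines: Iterable[str]) -> List[Doc]:
    # One JSON object per non-blank line.
    return [json.loads(line) for line in lines if line.strip()]


def parse_documents(fname: Path) -> Tuple[Iterable[Doc], int]:
    """Choose a parser from the filename; the supported suffixes are assumptions."""
    name = fname.name
    if name.endswith(".jsonl.gz"):
        with gzip.open(fname, "rt", encoding="utf-8") as f:
            docs = _read_jsonl(f)
    elif name.endswith(".jsonl"):
        with open(fname, encoding="utf-8") as f:
            docs = _read_jsonl(f)
    else:
        raise ValueError(f"no parser for {fname}")
    return docs, len(docs)  # in this sketch the int is the document count


records = [{"id": "1", "title": "a"}, {"id": "2", "title": "b"}]
payload = "".join(json.dumps(r) + "\n" for r in records)

with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "docs.jsonl"
    plain.write_text(payload, encoding="utf-8")
    packed = Path(tmp) / "docs.jsonl.gz"
    with gzip.open(packed, "wt", encoding="utf-8") as f:
        f.write(payload)
    plain_docs, plain_n = parse_documents(plain)
    packed_docs, packed_n = parse_documents(packed)

print(plain_n, packed_n, plain_docs == packed_docs == records)  # 2 2 True
```

The more specific suffix (`.jsonl.gz`) is checked first so that compressed files are not mistaken for plain ones.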
- abstract set_index_fields(store_fields: bool = False)
Configure the fields of the index. Off-the-shelf indexers for particular collections require specific Lucene fields to be set.
- class pybool_ir.index.index.SearcherMixin
Bases: ABC
Include this mixin to add search functionality to an Indexer.
- abstract search(query: str, n_hits=10) -> List[Document]
Given a query, return the top n_hits documents. When n_hits is None, return all documents that match the query.
- abstract search_fmt(query: str, n_hits=10, hit_formatter: str | None = None) -> None
Perform a search and print the results.
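The mixin pattern can be sketched without Lucene. `SketchSearcherMixin` is a local re-declaration of the documented abstract interface (not the library class), `InMemoryIndexer` and `InMemorySearcher` are hypothetical, the term matching is a toy (a hit must contain every query term) rather than Lucene query parsing, and treating hit_formatter as a `str.format` template is an assumption. The sketch does follow the documented n_hits contract: a number caps the results, and None returns every match.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

Doc = Dict[str, str]  # hypothetical: plain dicts instead of pybool_ir Documents


class SketchSearcherMixin(ABC):
    """Local re-declaration of the documented abstract interface (not the library class)."""

    @abstractmethod
    def search(self, query: str, n_hits: Optional[int] = 10) -> List[Doc]: ...

    @abstractmethod
    def search_fmt(self, query: str, n_hits: Optional[int] = 10,
                   hit_formatter: Optional[str] = None) -> None: ...


class InMemoryIndexer:
    """Hypothetical stand-in for an Indexer subclass: keeps documents in a list."""

    def __init__(self, docs: List[Doc]) -> None:
        self.docs = docs


class InMemorySearcher(InMemoryIndexer, SketchSearcherMixin):
    def search(self, query: str, n_hits: Optional[int] = 10) -> List[Doc]:
        # Toy matching: a hit must contain every query term (not Lucene query syntax).
        terms = query.lower().split()
        hits = [d for d in self.docs if all(t in d["text"].lower().split() for t in terms)]
        return hits if n_hits is None else hits[:n_hits]  # None means "all matches"

    def search_fmt(self, query: str, n_hits: Optional[int] = 10,
                   hit_formatter: Optional[str] = None) -> None:
        fmt = hit_formatter or "{id}\t{text}"  # assumed: a str.format template over doc fields
        for hit in self.search(query, n_hits):
            print(fmt.format(**hit))


searcher = InMemorySearcher([
    {"id": "1", "text": "boolean query syntax"},
    {"id": "2", "text": "boolean retrieval"},
    {"id": "3", "text": "neural ranking"},
])
print(len(searcher.search("boolean", n_hits=1)))     # 1
print(len(searcher.search("boolean", n_hits=None)))  # 2
searcher.search_fmt("retrieval", hit_formatter="{id}: {text}")  # 2: boolean retrieval
```

Listing the storage class before the mixin in the bases is a choice made for this sketch; the mixin only declares the search interface, so the concrete class supplies both the data and the search logic.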