pybool_ir.index.generic
Generic indexers and searchers for JSONL and JSONLD files.
Classes
| GenericSearcher | Generic searcher for any kind of index. |
| JsonlIndexer | Generic indexer for JSONL files. |
| JsonldIndexer | Generic indexer for JSONLD files. |
- class pybool_ir.index.generic.GenericSearcher(index_path: Path | str, store_fields: bool = True, optional_fields: List[str] | None = None)
Bases: Indexer, SearcherMixin
Generic searcher for any kind of index.
- parse_documents(fname: Path) -> (Iterable[Document], int)
Return an iterable of documents parsed from a path. Documents can be stored in different ways, so an indexer may need to handle several file layouts; this method chooses the best way to parse a file given its filename.
- search(query: str, n_hits=10) -> List[Document]
Given a query, return the top n_hits documents. When n_hits is None, return all documents that match the query.
- search_fmt(query: str, n_hits=10, hit_formatter: str | None = None)
Perform a search and print the results.
- set_index_fields(store_fields: bool = False)
Set the fields of the index. Off-the-shelf indexers for particular collections require specific Lucene fields to be set.
- class pybool_ir.index.generic.JsonlIndexer(index_path: Path | str, store_fields: bool = True, optional_fields: List[str] | None = None)
Bases: Indexer
Generic indexer for JSONL files. The JSONL file should contain one JSON object per line.
Each document must have an id and date field.
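As a minimal sketch of the input format described above, the following standard-library code writes a JSONL file (one JSON object per line) and reports lines that lack the required id or date field. The helper names `write_jsonl` and `find_missing_fields` are hypothetical and not part of pybool_ir, and the ISO 8601 date string is an assumption: the documentation does not state which date representation the indexer expects.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path, records):
    """Write each record as one JSON object per line (hypothetical helper)."""
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def find_missing_fields(path, required=("id", "date")):
    """Return (line_number, missing_field_names) for every line lacking a required field."""
    problems = []
    with path.open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            doc = json.loads(line)
            missing = [name for name in required if name not in doc]
            if missing:
                problems.append((line_number, missing))
    return problems


# The date format below is an assumption made for illustration only.
records = [
    {"id": "doc1", "date": "2020-01-31", "title": "First document"},
    {"id": "doc2", "title": "Second document, date missing"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "collection.jsonl"
    write_jsonl(path, records)
    problems = find_missing_fields(path)

print(problems)  # [(2, ['date'])]
```

Running a check like this before indexing may surface malformed lines early, rather than partway through building an index.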
- parse_documents(fname: Path) -> (Iterable[Document], int)
Return an iterable of documents parsed from a path. Documents can be stored in different ways, so an indexer may need to handle several file layouts; this method chooses the best way to parse a file given its filename.
- set_index_fields(store_fields: bool = False)
Set the fields of the index. Off-the-shelf indexers for particular collections require specific Lucene fields to be set.
- class pybool_ir.index.generic.JsonldIndexer(index_path: Path | str, store_fields: bool = True, optional_fields: List[str] | None = None)
Bases: JsonlIndexer
Generic indexer for JSONLD files. The JSONLD file should contain one JSON object per line. This indexer assumes that lines come in pairs: the first line of each pair holds the document ID and the second line holds the document data. This class can be used to index datasets in the same way Elasticsearch does.
Each document must have an id and date field.
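The two-line layout can be illustrated with a short standard-library sketch. It reads the description as alternating ID and data lines, and it borrows the `{"index": {"_id": ...}}` header shape from the Elasticsearch bulk format. Both points are assumptions: the exact keys that JsonldIndexer reads are not given here, and `iter_pairs` is a hypothetical helper, not a pybool_ir function.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple


def iter_pairs(lines: Iterable[str]) -> Iterator[Tuple[Dict, Dict]]:
    """Yield (header, body) pairs from alternating JSON lines (hypothetical helper)."""
    objects = (json.loads(line) for line in lines if line.strip())
    for header in objects:
        try:
            # The line after each header is taken to be the document data.
            body = next(objects)
        except StopIteration:
            raise ValueError("header line without a following document line") from None
        yield header, body


# Header shape follows the Elasticsearch bulk convention (an assumption here).
lines = [
    '{"index": {"_id": "doc1"}}',
    '{"title": "First document", "date": "2020-01-31"}',
    '{"index": {"_id": "doc2"}}',
    '{"title": "Second document", "date": "2020-02-15"}',
]

pairs = list(iter_pairs(lines))
ids = [header["index"]["_id"] for header, _ in pairs]
print(ids)  # ['doc1', 'doc2']
```

Pairing the lines up front keeps each document's ID next to its body, and a file with an odd number of lines raises an error rather than silently dropping the last header.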
- parse_documents(fname: Path) -> (Iterable[Document], int)
Return an iterable of documents parsed from a path. Documents can be stored in different ways, so an indexer may need to handle several file layouts; this method chooses the best way to parse a file given its filename.