Query Automation for Systematic Reviews

PhD Thesis, School of Information Technology and Electrical Engineering, The University of Queensland

This thesis investigates query automation methods and tools to support more effective study retrieval for the creation of medical systematic reviews. A systematic review is a synthesis of high-quality medical studies whose creation is guided by a stringent protocol designed to minimise bias and error. Modern medical practice and healthcare policy rely heavily on systematic reviews to make decisions based on the best available evidence.

Systematic review creation is often highly time-intensive and costly. Consequently, considerable research effort has gone into automating phases of the systematic review process. In this context, automation is defined as the encoding of manual tasks into computational procedures. Existing automation research has primarily focused on the phase concerned with the critical appraisal of studies. Prior research has attempted to automate this appraisal phase by, for example, simulating human appraisal, (re-)ranking the retrieved studies, and classifying relevant studies. A major drawback of these approaches is that they all rely on the retrieval effectiveness of the initial search, which is contingent on the quality of the query. We follow a new direction of research by investigating methods for query automation.

In investigating automatic methods for building effective queries for systematic review creation, this thesis develops three main areas of enquiry: (i) query formulation: the development of Boolean queries suitable for systematic review literature search, (ii) query refinement: the iterative process of improving the search effectiveness of such a Boolean query, and (iii) query exploitation: the application of methods to Boolean queries in order to improve systematic review literature search effectiveness. For each of these query automation areas, this thesis devises computational methods as well as practical tools.

This thesis makes contributions across these three areas of enquiry. They comprise: (i) computational adaptations of human query formulation methodologies to support the automatic development of complex Boolean queries for systematic review literature search, and practical tools built upon them, (ii) a framework for the generation and identification of more effective query variations to support the automatic refinement of high-quality Boolean queries used in systematic review literature search, as well as tools built upon it, and (iii) methods to exploit intrinsic characteristics of Boolean queries in order to improve the retrieval effectiveness of systematic review literature search. For each of these areas of enquiry, we found that query automation improved search effectiveness, reducing the time needed to create systematic reviews.