Journal of Chemical Information and Modeling 46, 2206-2219 (2006).
Martin Whittle, Valerie J. Gillet, Peter Willett and Jens Loesel.
In a recent companion paper we related the operation of simple data fusion rules used in virtual screening to a multiple integral formalism. In this paper we extend these ideas to the analysis of data fusion methods applied to real data. We examine several cases of similarity fusion using different coefficients and different representations, and consider the reasons for positive or negative results in terms of the similarity distributions. Results are obtained using the SUM-, MAX-, MIN-, and CombMNZ-fusion rules. We also develop a customized fusion rule, which provides an estimate of the optimal possible result for fusing multiple searches of a specific database; this shows that similarity fusion can, in principle, achieve retrieval enhancements even if these are not achieved in practice with current fusion rules. The methods are extended to analyze the comparatively successful results of group fusion with multiple actives, and we provide a rationale for the observed superiority of the MAX-rule over the SUM-rule in this context.
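For readers unfamiliar with the four named rules, the following Python sketch shows one common way to state them; it is illustrative only and is not taken from the paper. It assumes each database molecule carries one similarity score per search, with higher scores meaning greater similarity. The function names, the molecule identifiers, the example scores, and the `threshold` used to decide whether a search counts as a "hit" for CombMNZ are all hypothetical; the paper may apply the rules to ranks rather than raw scores.

```python
from typing import Callable, Dict, List, Sequence

# A fusion rule maps the scores one molecule received from several
# searches to a single fused score (higher = ranked earlier).
FusionRule = Callable[[Sequence[float]], float]


def fuse_sum(scores: Sequence[float]) -> float:
    """SUM rule: add the scores from all searches."""
    return sum(scores)


def fuse_max(scores: Sequence[float]) -> float:
    """MAX rule: keep the best score from any search."""
    return max(scores)


def fuse_min(scores: Sequence[float]) -> float:
    """MIN rule: keep the worst score from any search."""
    return min(scores)


def fuse_combmnz(scores: Sequence[float], threshold: float = 0.0) -> float:
    """CombMNZ rule: summed score times the number of searches in which
    the molecule scored above `threshold` (an assumed hit criterion)."""
    hits = sum(1 for s in scores if s > threshold)
    return hits * sum(scores)


def rank_by_fusion(score_lists: Dict[str, List[float]],
                   rule: FusionRule) -> List[str]:
    """Return molecule identifiers sorted by decreasing fused score."""
    fused = {mol: rule(scores) for mol, scores in score_lists.items()}
    return sorted(fused, key=lambda mol: fused[mol], reverse=True)


# Hypothetical similarity scores from three searches of one database.
example_scores = {
    "mol_a": [0.9, 0.1, 0.0],
    "mol_b": [0.5, 0.5, 0.5],
    "mol_c": [0.2, 0.8, 0.3],
}

if __name__ == "__main__":
    for name, rule in [("SUM", fuse_sum), ("MAX", fuse_max),
                       ("MIN", fuse_min), ("CombMNZ", fuse_combmnz)]:
        print(name, rank_by_fusion(example_scores, rule))
```

In this toy data, MAX ranks `mol_a` first because a single search scores it highly, whereas SUM, MIN and CombMNZ favour `mol_b`, which scores moderately in every search. This is the kind of rule-dependent reordering that the abstract attributes to the underlying similarity distributions.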