Software systems and computational methodsReference:
On approximation of the output of a probabilistic model of hierarchical bit indices
Abstract.The subject of the study is a probabilistic model of hierarchical bit indexes of databases. The object of the study is the output of the model — a three-parameter discrete distribution of the number of indexes for implementing queries to the database, parametrized by the intensity of recording records in the database, the average query length, and the size of a large index. The author considers such aspects of the topic as the choice of a hypothesis from known theoretical distributions, a method for testing a hypothesis, selection of functions for approximating the dependence of the expectation on the third parameter, selection of a function for approximating the dependence of the minimum point of the expectation for the third parameter from the first two. The study of such dependencies is explained by the fact that the optimal choice of the third parameter is the goal of the designer, and the first two are the initial data of the model. The methodology of the research is the methods of mathematical statistics, in particular, the estimation of parameters and the Pearson criterion of testing hypotheses, methods for constructing the best approximations, in particular, the method of least squares, the theory of curves of the third order. The main conclusions of the study: the best approximation for the studied family of distributions is the Polya distribution; The best approximations for the dependence of the expectation on the third parameter are the Bacon-Watts model and the heat capacity model. A special contribution of the author to the study of the topic is the derivation of an empirical formula that has practical significance. It allows the designer on the basis of the first two parameters at once, without using cumbersome calculations on the model, to obtain an approximate optimal value of the third parameter and thus construct an index of the database of the optimal size. The novelty of the research lies in obtaining approximate dependencies for a new type of distribution that cannot be described by a closed formula.
Keywords: power function, output analysis, least square method, heat capacity model, third order curves, Bacon-Watts model, Polia distribution, negative binomial distribution, discrete probability distribution, hierarchical bitmap indexes
Article was received:02-11-2018
This article written in Russian. You can find full text of article in Russian here .
The journal allows the author(s) to hold the copyright without restrictions. All authors automatically own full copyright in their work as soon as they create it, and current Russian Federal legislation protects them.
Licence type: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
The journal is an open access journal which means that everybody can read, download, copy, distribute, print, search, or link to the full texts of these articles in accordance with Creative Commons Attribution- NonCommercial 4.0 International License.
You are free to:
Share — copy and redistribute the material in any medium or format.
Adapt — remix, transform, and build upon the material The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial — You may not use the material for commercial purposes.
No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.