|Articles and journals | Tariffs | Payments | Your profile|
On approximation of the output of a probabilistic model of hierarchical bit indices
Abstract.The subject of the study is a probabilistic model of hierarchical bit indexes of databases. The object of the study is the output of the model — a three-parameter discrete distribution of the number of indexes for implementing queries to the database, parametrized by the intensity of recording records in the database, the average query length, and the size of a large index. The author considers such aspects of the topic as the choice of a hypothesis from known theoretical distributions, a method for testing a hypothesis, selection of functions for approximating the dependence of the expectation on the third parameter, selection of a function for approximating the dependence of the minimum point of the expectation for the third parameter from the first two. The study of such dependencies is explained by the fact that the optimal choice of the third parameter is the goal of the designer, and the first two are the initial data of the model. The methodology of the research is the methods of mathematical statistics, in particular, the estimation of parameters and the Pearson criterion of testing hypotheses, methods for constructing the best approximations, in particular, the method of least squares, the theory of curves of the third order. The main conclusions of the study: the best approximation for the studied family of distributions is the Polya distribution; The best approximations for the dependence of the expectation on the third parameter are the Bacon-Watts model and the heat capacity model. A special contribution of the author to the study of the topic is the derivation of an empirical formula that has practical significance. It allows the designer on the basis of the first two parameters at once, without using cumbersome calculations on the model, to obtain an approximate optimal value of the third parameter and thus construct an index of the database of the optimal size. The novelty of the research lies in obtaining approximate dependencies for a new type of distribution that cannot be described by a closed formula.
Keywords: power function, output analysis, least square method, heat capacity model, third order curves, Bacon-Watts model, Polia distribution, negative binomial distribution, discrete probability distribution, hierarchical bitmap indexes
Article was received:02-11-2018
This article written in Russian. You can find full text of article in Russian here .