Eng During last 365 days Approved articles: 2075,   Articles in work: 306 Declined articles: 850 
Library
Articles and journals | Tariffs | Payments | Your profile

Back to contents

On approximation of the output of a probabilistic model of hierarchical bit indices
Trub Il'ya

PhD in Technical Science

Senior Software Engineer, Samsung Research Center Ltd.

127018, Russia, Moscow, ul. Dvintsev, 12, of. C

itrub@yandex.ru

 

 

Abstract.

The subject of the study is a probabilistic model of hierarchical bit indexes of databases. The object of the study is the output of the model — a three-parameter discrete distribution of the number of indexes for implementing queries to the database, parametrized by the intensity of recording records in the database, the average query length, and the size of a large index. The author considers such aspects of the topic as the choice of a hypothesis from known theoretical distributions, a method for testing a hypothesis, selection of functions for approximating the dependence of the expectation on the third parameter, selection of a function for approximating the dependence of the minimum point of the expectation for the third parameter from the first two. The study of such dependencies is explained by the fact that the optimal choice of the third parameter is the goal of the designer, and the first two are the initial data of the model. The methodology of the research is the methods of mathematical statistics, in particular, the estimation of parameters and the Pearson criterion of testing hypotheses, methods for constructing the best approximations, in particular, the method of least squares, the theory of curves of the third order. The main conclusions of the study: the best approximation for the studied family of distributions is the Polya distribution; The best approximations for the dependence of the expectation on the third parameter are the Bacon-Watts model and the heat capacity model. A special contribution of the author to the study of the topic is the derivation of an empirical formula that has practical significance. It allows the designer on the basis of the first two parameters at once, without using cumbersome calculations on the model, to obtain an approximate optimal value of the third parameter and thus construct an index of the database of the optimal size. The novelty of the research lies in obtaining approximate dependencies for a new type of distribution that cannot be described by a closed formula.

Keywords: power function, output analysis, least square method, heat capacity model, third order curves, Bacon-Watts model, Polia distribution, negative binomial distribution, discrete probability distribution, hierarchical bitmap indexes

DOI:

10.7256/2454-0714.2018.4.27809

Article was received:

02-11-2018


Review date:

03-11-2018


Publish date:

10-01-2019


This article written in Russian. You can find full text of article in Russian here .

References
1.
Buslenko N.P. Modelirovanie slozhnykh sistem. - M.: Nauka, 1968. 356 s.
2.
Vadzinskii R.N. Spravochnik po veroyatnostnym raspredeleniyam.-Sankt-Peterburg, "Nauka", 2001. 150 s.
3.
Gmurman V.E. Teoriya veroyatnostei i matematicheskaya statistika.-Moskva, Vysshaya shkola, 1997.-480 s.
4.
Devyatkov T. V. Nekotorye voprosy sozdaniya sistem avtomatizatsii imitatsionnykh issledovanii // Prikladnaya informatika. 2010. 5 (29). S. 102 116.
5.
Eliseeva I.I., Yuzbashev M.M. Obshchaya teoriya statistiki. Uchebnik.-Moskva, Finansy i statistika, 2004.-656 s.
6.
Kleinrok L. Teoriya massovogo obsluzhivaniya.-L.: Mashinostroenie, 1979.-380 s.
7.
Savelov A.A. Ploskie krivye: semantika, svoistva, primenenie.-Moskva, Gosudarstvennoe izdatel'stvo fiziko-matematicheskoi literatury, 1960.-294 s.
8.
Smogorzhevskii A.S., Stolova E.S. Spravochnik po teorii ploskikh krivykh tret'ego poryadka.-Moskva, Gosudarstvennoe izdatel'stvo fiziko-matematicheskoi literatury, 1961.-272 s.
9.
Trub I.I. Analiticheskoe veroyatnostnoe modelirovanie bitmap-indeksov // Programmnye sistemy i vychislitel'nye metody. 2016. 4. S. 315-323. DOI: 10.7256/2305-6061.2016.4.2109
10.
Trub I.I. Veroyatnostnaya model' ierarkhicheskikh indeksov bazy dannykh // Programmnye sistemy i vychislitel'nye metody. 2017.- 4.-S.15-31. DOI: 10.7256/2454-0714.2017.4.24437. URL: http://e-notabene.ru/ppsvm/article_24437.html
11.
Trub I. I. Imitatsionnoe modelirovanie ierarkhicheskikh bitmap-indeksov // Prikladnaya informatika. 2018. 4 (76). S.5369.
12.
Trub I.I., Trub N.V. Model' ierarkhicheskikh indeksov baz dannykh s prinyatiem reshenii i ee sravnenie s minimaksnoi model'yu // Programmnye sistemy i vychislitel'nye metody. 2018.- 1.-S.18-36. DOI: 10.7256/2454-0714.2018.1.25369. URL: http://e-notabene.ru/ppsvm/article_25369.html
13.
Tyrsin A.N. Metod podbora nailuchshego zakona raspredeleniya nepreryvnoi sluchainoi velichiny na osnove obratnogo otobrazheniya.-Vestnik YuUrGU, serii "Matematika. Mekhanika. Fizika", t.9, 1, 2017. s.31-38.
14.
Shikin E.V., Frank-Kamenetskii M.M. Krivye na ploskosti i v prostranstve.-Moskva, Fazis, 1997.-336 s.
15.
Bacon D.W., Watts D.G. Estimating the Transition between Two Intersecting Straight Lines.-Biometrika, vol. 58, no. 3, December, 1971. pp. 525-534.
16.
Lockwood E.H. A Book of Curves.-Cambridge University Press, Great Britain, 1961. 210 p.
17.
McLaughlin M.P. Compendium of Common Probability Distributions.-December, 2016. URL: www.causascientia.org/math-stat/Dists/Compendium.pdf
18.
Walkowiak R., Kala R. Two-Phase Nonlinear Regression with Smooth Transition.-Communication in Statistics-Simulation and Computation, vol. 29, no. 2, January, 2000. pp. 385-397.
19.
Yates R.C. Curves and Their Properties.-National Council of Teachers Mathematics, Washington, USA, 1974. 260 p