Article

*Olena Syrotkina*

Department of Software Engineering

Dnipro University of Technology

Dnipro, Ukraine

*Mykhailo Aleksieiev*

Department of Software Engineering

Dnipro University of Technology

Dnipro, Ukraine

*Iryna Udovyk*

Department of Software Engineering

Dnipro University of Technology

Dnipro, Ukraine

**Abstract**. This paper addresses the creation and application of mathematical methods for optimizing time and computing resources when processing multiple data streams circulating in a distributed information management system. To solve this problem, we propose methods for reducing the space of analyzed states using the data organization structure “m-tuples based on ordered sets of arbitrary cardinality”. Using this data structure allows us to minimize the time and computing resources involved.

**Keywords**: Big Data, Big Data reduction methods, data organization structure, ordered set of arbitrary cardinality, minimization of time and computing resources.

At the present stage in the creation and application of information technologies in industry and the energy sector, diagnosing the operability of the complex hardware/software systems of an industrial enterprise remains a relevant problem [1, 2]. When a system failure or an abnormal situation is detected, large volumes of poorly structured, low-level primary diagnostic information are generated within seconds, and this information requires further processing and analysis to restore the system [3, 4]. The shortage of time, computing and information resources for data processing, the methods used for data analysis, and the need for prompt decision-making in real time all affect the quality of diagnostics and the recovery time of the system [5]. Therefore, the most significant problem is the creation and application of mathematical methods for optimizing time and computing resources when processing multiple streams of diagnostic data circulating in the information management systems of an industrial enterprise.

We propose mathematical methods for working with the data structure “m-tuples based on ordered sets of arbitrary cardinality (OSAC)” to solve problems of analyzing and processing various combinations of several basic parameter sets for information management systems [6, 7]. This data structure allows us to describe a Boolean (power set) template in a general way. This template is an ordered set of all subsets of an ordered basis set of arbitrary cardinality for any data type. In this case, the data are ordered in three ways:

a) Ordering by basis set. When a template class based on the given data structure is instantiated, an arbitrary algorithm can be supplied for ordering the elements of the basis set, chosen for the particular data type and the specific data processing task.

b) Ordering by a Boolean of the basis set. The Boolean, whose elements are tuples formed from combinations of basis set elements, is divided into subsets according to tuple length. This allows us to order the subsets of the Boolean by tuple length.

c) Ordering by a Boolean subset. Each Boolean subset contains tuples of the same length, and the ordering of these tuples is consistent with the ordering of the basis set elements.

Triple ordering of data allows us to determine the characteristic properties of the data structure elements and the functional relationships between m-tuples based on the analysis of their location in the data structure.
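The triple ordering described above can be illustrated with a minimal Python sketch. It is not the authors' template class from [6, 7]: the function name `m_tuples` and the `key` parameter are hypothetical, and the empty tuple is assumed to be excluded from the structure. The sketch materializes every m-tuple only to make the ordering visible.

```python
from itertools import combinations

def m_tuples(basis, key=None):
    """Build {m: [m-tuples]} for m = 1..n from a basis set.

    Hypothetical helper for illustration only.
    (a) The basis set is ordered by a caller-supplied key.
    (b) The subsets of the Boolean are ordered by tuple length m.
    (c) Inside each subset, tuples follow the order of the basis set,
        because itertools.combinations emits them in lexicographic
        order of the input positions.
    """
    ordered = sorted(basis, key=key)          # ordering (a)
    n = len(ordered)
    return {m: list(combinations(ordered, m))  # orderings (b) and (c)
            for m in range(1, n + 1)}

structure = m_tuples(["c", "a", "b"])
for m, tuples in structure.items():
    print(m, tuples)
# 1 [('a',), ('b',), ('c',)]
# 2 [('a', 'b'), ('a', 'c'), ('b', 'c')]
# 3 [('a', 'b', 'c')]
```

Passing a different `key` changes ordering (a) only; orderings (b) and (c) then follow from it automatically.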

Applying mathematical methods to the given data structure, based on the properties and functional dependencies thus derived, is one way to solve the stated problem of optimizing time and computing resources.

The designations of the main elements of the data structure “m-tuples based on OSAC” are listed in Table 1. A more detailed description of the basic terms and definitions, properties and patterns, as well as mathematical methods for working with the data structure “m-tuples based on OSAC”, is given in [6, 7].

TABLE 1 THE LIST OF BASIC ELEMENT DESIGNATIONS FOR THE GIVEN DATA STRUCTURE “M-TUPLES BASED ON OSAC”

In the data structure under consideration, each m-tuple is unique and its location is uniquely determined. The location of an m-tuple in the two-dimensional structure is defined by a pair of indexes (j, m). Research into the properties of the elements of the data structure “m-tuples based on OSAC” and the functional dependencies between them has established and theoretically proven a number of properties of this structure, some of which are described below.
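To make the pair of indexes (j, m) concrete, the following sketch computes the location of an m-tuple from the positions of its elements in the basis set. It assumes that j is the 1-based position of the tuple inside the subset of tuples of length m and that tuples are ordered lexicographically; both are assumptions for illustration, and the function name `locate` is hypothetical. The exact indexing rules are given in [6, 7].

```python
from math import comb

def locate(positions, n):
    """Return (j, m) for an m-tuple given as strictly increasing,
    0-based positions of its elements in a basis set of cardinality n.

    Assumption: j is the 1-based lexicographic rank of the tuple among
    all tuples of the same length m.
    """
    m = len(positions)
    rank = 0
    prev = -1
    for i, c in enumerate(positions):
        # Count the tuples that share the first i positions but have a
        # smaller value v at position i; each leaves m-1-i free slots
        # to fill from the n-1-v elements that follow v.
        for v in range(prev + 1, c):
            rank += comb(n - 1 - v, m - 1 - i)
        prev = c
    return rank + 1, m

# Basis set of 4 elements; tuples of length 2 in order are
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
print(locate((1, 2), 4))  # (4, 2)
print(locate((2, 3), 4))  # (6, 2)
```

The location is obtained from integer arithmetic alone, without enumerating the other tuples of the structure.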

A) Certain properties of the ordered set of cardinalities $K^{n}$ of subsets $Y_{m}^{n}$ for Boolean $2^{X}$

Property A.1. The value of the first element of set $K^{n}$ is always equal to the cardinality n of basis set X.

Property A.2. The value of the last element of set $K^{n}$ is always equal to 1.

Property A.3. The values of the mth elements corresponding to the first and last elements of set $K^{n}$ are always equal.
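Properties A.1 to A.3 can be checked numerically with a short sketch. It assumes that $Y_{m}^{n}$ is the subset of tuples of length m, so that its cardinality is the binomial coefficient C(n, m) for m = 1..n. Property A.3 is read here as the symmetry $k_{m}^{n} = k_{n-m}^{n}$ for 1 ≤ m < n, which is an interpretation rather than the authors' exact wording. The function names are hypothetical.

```python
from math import comb

def cardinalities(n):
    """K^n as a list [k_1, ..., k_n], assuming k_m = C(n, m)."""
    return [comb(n, m) for m in range(1, n + 1)]

def check_properties_a(n):
    """Return the truth values of A.1, A.2 and A.3 (as read above)."""
    K = cardinalities(n)
    a1 = K[0] == n                    # first element equals n
    a2 = K[-1] == 1                   # last element equals 1
    # A.3 read as k_m == k_(n-m); K is 0-based, so k_m is K[m - 1].
    a3 = all(K[m - 1] == K[n - m - 1] for m in range(1, n))
    return a1, a2, a3

print(cardinalities(5))       # [5, 10, 10, 5, 1]
print(check_properties_a(5))  # (True, True, True)
```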

B) Certain properties of the maximum elements of the ordered set of cardinalities $K^{n}$

Property B.1. If basis set X contains an odd number of elements n, then the set of cardinalities $K^{n}$ of subsets $Y_{m}^{n}$ for Boolean $2^{X}$ has two elements with the maximum value.

Property B.2. If basis set X contains an even number of elements n, then the set of cardinalities $K^{n}$ of subsets $Y_{m}^{n}$ for Boolean $2^{X}$ has one element with the maximum value.

The set $K_{\max}^{n}$ of maximal elements of set $K^{n}$ is defined in general as follows:

\(K_{\max}^{n} := (n \% 2)\ ?\ (\{k_{\frac{n-1}{2}}^{n}, k_{\frac{n+1}{2}}^{n}\}) : (\{k_{\frac{n}{2}}^{n}\})\),

where % denotes the remainder of integer division.
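The ternary definition of $K_{\max}^{n}$ can be compared against a brute-force search. This sketch assumes $k_{m}^{n} = C(n, m)$ for m = 1..n and restricts itself to n ≥ 2, because for n = 1 the index (n-1)/2 = 0 falls outside that range. The function names are hypothetical.

```python
from math import comb

def k_max_indices(n):
    """Indexes m of the maximal elements of K^n, per the ternary formula."""
    if n < 2:
        raise ValueError("this sketch assumes n >= 2")
    return [(n - 1) // 2, (n + 1) // 2] if n % 2 else [n // 2]

def k_max_indices_brute(n):
    """Same indexes found by scanning all k_m = C(n, m), m = 1..n."""
    K = {m: comb(n, m) for m in range(1, n + 1)}
    top = max(K.values())
    return [m for m, value in K.items() if value == top]

for n in (4, 5):
    idx = k_max_indices(n)
    print(n, idx, [comb(n, m) for m in idx])
# 4 [2] [6]
# 5 [2, 3] [10, 10]
```

For even n the formula yields one index (Property B.2), and for odd n it yields two adjacent indexes with equal values (Property B.1).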

Graphs evaluating the execution time of some of the methods for working with the data structure “m-tuples based on OSAC” are given in [6].

The methods for working with the data structure “m-tuples based on OSAC” make it possible to optimize data processing because:

- in this data structure only the basis set X is initialized, while the other elements of the structure are generated as required from a set of formal rules for obtaining data structure elements;

- the memory needed to store only the basis set, compared with storing the entire data structure, is smaller by $(2^{n}-n)\cdot sizeof(T)$, where T is the element type of the basis set;

- for a certain class of operations on data structure elements whose basis set X is a large array of a complex data type, the mathematical methods used can represent m-tuples of that complex type as sets of integers that are the indexes of their location in the structure. This minimizes the time and computing resources needed for data processing;

- for a certain class of operations on data structure elements, the functional dependencies between these elements are determined by their position in the structure, which is defined by a pair of indexes (j, m). This allows us to calculate the result from these functional dependencies without running a cumbersome data processing algorithm over a large array of a complex data type.
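The idea of generating structure elements on demand, working with integer indexes instead of complex data, can be sketched as follows. This is a standard lexicographic unranking of combinations, not necessarily the formal rules used in [6, 7]. It assumes that j is 1-based and that tuples of equal length are ordered lexicographically; the name `tuple_at` and the sample basis set are hypothetical.

```python
from math import comb

def tuple_at(j, m, n):
    """Return the 0-based basis positions of the m-tuple located at
    (j, m) for a basis set of cardinality n.

    Assumption: j is the 1-based lexicographic rank among the tuples of
    length m. Only integer arithmetic is used; the other tuples of the
    structure are never materialized.
    """
    if not (1 <= m <= n and 1 <= j <= comb(n, m)):
        raise IndexError("(j, m) is outside the structure")
    rank = j - 1
    positions = []
    v = 0  # smallest basis position still available
    for i in range(m):
        # Skip whole blocks of tuples whose i-th position is v.
        while True:
            block = comb(n - 1 - v, m - 1 - i)
            if rank < block:
                break
            rank -= block
            v += 1
        positions.append(v)
        v += 1
    return tuple(positions)

# Only the basis set is stored; its elements may be arbitrarily complex.
basis = ["pump", "valve", "sensor", "relay"]
idx = tuple_at(4, 2, len(basis))
print(idx)                      # (1, 2)
print([basis[p] for p in idx])  # ['valve', 'sensor']
```

Operations that depend only on the location of a tuple can work with `idx` directly and dereference the basis set only when the actual elements are needed.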

[1] K.S. Manoj, “Industrial Automation with SCADA: Concepts, Communications and Security”, Chennai: Notion Press, 2019, 242 p.

[2] Technical Manual, “Supervisory Control and Data Acquisition (SCADA) Systems for Command, Control, Communications, Computer, Intelligence, Surveillance, and Reconnaissance (C4ISR) Facilities”, Washington: Department of the Army, 2006, 94 p.

[3] S. Windmann and O. Niggemann, “Efficient Fault Detection for Industrial Automation Processes with Observable Process Variables”, Proceedings of the IEEE International Conference on Industrial Informatics (INDIN), Cambridge, 2015, pp. 121–126.

[4] J. MacGregor and A. Cinar, “Monitoring, Fault Diagnosis, Fault-Tolerant Control and Optimization: Data Driven Methods”, Computers & Chemical Engineering, 2012, vol. 47, pp. 111–120.

[5] M. Chen, S. Mao, Y. Zhang and V. Leung, “Big Data: Related Technologies, Challenges, and Future Prospects”, Springer, 2014, 100 p.

[6] O. Syrotkina, M. Alekseyev and O. Aleksieiev, “Evaluation to determine the efficiency for the diagnosis search formation method of failures in automated systems”, Eastern-European Journal of Enterprise Technologies, 2017, vol. 4, issue 9 (88), pp. 59–68.

[7] O. Syrotkina, M. Alekseyev, V. Asotskyi and I. Udovyk, “Analysis of how the properties of structured data can influence the way these data are processed”, Naukovyi Visnyk NHU, Dnipro, 2019, vol. 3 (171), pp. 119–129.

May 30, 2020