Essbase dense vs sparse

Essbase dense vs sparse, the fundamentals

In “new to Essbase” by Arthur van den Berg

If you are serious in building or maintaining planning solutions you need to know about Essbase dense vs sparse. You might even have heard of these concepts along your path. I sincerely hope that you get a proper understanding of “dense” and “spare” before you start building models as this is a subject that will have serious consequences on the overall performance of your solution. This blog covers the fundamentals you should know about Essbase dense vs sparse.

Essbase dense vs sparse, the theory

“Dense” and “sparse” are the key words to understand the data structures around Essbase. If you Google on these words you should come to the Essbase Database Administration Guide (link to version 12 here). The information in the admin guide on this topic is pretty old — it has not changed since 2006 — but still valid.

So please open the link I provided and let me know if it is clear to you.

… pause for browsing and reading … do you get it? If so, you can skip the rest of this blog and continue to the next one … or may be — just to be sure you do not miss anything out — read a bit further …

To really understand the background of the terms “dense” and “sparse” I will to give you a little bit of background on Essbase itself. Essbase is a really a veteran when it comes to EPM tools in the digital age. Back in 1991, the Essbase founding fathers at the Arbor company, invented a specific data storage system that would allow for optimal performance whilst using as little data space as required. Another thing you need to realize — if you are completely fresh to the subject — is that Essbase is a multidimensional database; completely different than relational database. You can read more on this in the “what is an Essbase cube” article.

dense vs sparse, explanation by example

The creators of Essbase realized that they needed to find a solution for presenting data in a pivot table-like way (compare the pivot-table functionality in Excel). The strangest thing with presenting data in a “pivot table” fashion is that in a database like Essbase, data “explodes” when you aggregate the data over the dimensions. I would like to illustrate the Essbase solution they invented with a — I hope — comprehensible example.

Consider a cube with the 5 dimensions with each dimension having the following hierarchy with members:

consider a cube with these 5 dimensions (note the definition of “level 0”)

You could view the dimension as the axes in a “multidimensional” coordinate. This means that to store an amount in this cube or to view an amount, you need to provide a valid coordinate for each dimension.

a valid coordinate in this Essbase cube example (#1)

For example, valid coordinates are:

1 1-a 2-b 3-c 4-f 5-e 99.75
2 1-total 2-total 3-a 4-total 5-abcde 23.35

Invalid examples are:

3 1-a 2-a, 2-b 3-a 4-b 5-b 89.1
4 1-a 5-b -122.335

(#3 is invalid because there are two values for dim 2, #4,is invalid because there is no value for dim2, dim3 and dim 4)

Similar to an Excel pivot table, the amounts can be aggregated from the bottom level (level 0) to the (sub-)total levels in each direction. If we aggregate the valid examples as mentioned above the following 35 new values will be created (#1 is the input example) (please try to create the list yourself):

dim 1 dim 2 dim 3 dim 4 dim 5 value datablock
1 1-a 2-b 3-c 4-f 5-e 99.75 level 0, input
2 1-ab 2-b 3-c 4-f 5-e 99.75
3 1-total 2-b 3-c 4-f 5-e 99.75
4 1-a 2-total 3-c 4-f 5-e 99.75
5 1-ab 2-total 3-c 4-f 5-e 99.75
6 1-total 2-total 3-c 4-f 5-e 99.75
7 1-a 2-b 3-c 4-def 5-e 99.75 (*)
8 1-ab 2-b 3-c 4-def 5-e 99.75
9 1-total 2-b 3-c 4-def 5-e 99.75
10 1-a 2-total 3-c 4-def 5-e 99.75
11 1-ab 2-total 3-c 4-def 5-e 99.75
12 1-total 2-total 3-c 4-def 5-e 99.75
13 1-a 2-b 3-c 4-total 5-e 99.75 (*)
14 1-ab 2-b 3-c 4-total 5-e 99.75
15 1-total 2-b 3-c 4-total 5-e 99.75
16 1-a 2-total 3-c 4-total 5-e 99.75
17 1-ab 2-total 3-c 4-total 5-e 99.75
18 1-total 2-total 3-c 4-total 5-e 99.75
19 1-a 2-b 3-c 4-f 5-abcde 99.75 (*)
20 1-ab 2-b 3-c 4-f 5-abcde 99.75
21 1-total 2-b 3-c 4-f 5-abcde 99.75
22 1-a 2-total 3-c 4-f 5-abcde 99.75
23 1-ab 2-total 3-c 4-f 5-abcde 99.75
24 1-total 2-total 3-c 4-f 5-abcde 99.75
25 1-a 2-b 3-c 4-def 5-abcde 99.75 (*)
26 1-ab 2-b 3-c 4-def 5-abcde 99.75
27 1-total 2-b 3-c 4-def 5-abcde 99.75
28 1-a 2-total 3-c 4-def 5-abcde 99.75
29 1-ab 2-total 3-c 4-def 5-abcde 99.75
30 1-total 2-total 3-c 4-def 5-abcde 99.75
31 1-a 2-b 3-c 4-total 5-abcde 99.75 (*)
32 1-ab 2-b 3-c 4-total 5-abcde 99.75
33 1-total 2-b 3-c 4-total 5-abcde 99.75
34 1-a 2-total 3-c 4-total 5-abcde 99.75
35 1-ab 2-total 3-c 4-total 5-abcde 99.75
36 1-total 2-total 3-c 4-total 5-abcde 123.10

(*) indicates if a new data block is created, read on for explanation

This “exploding” behavior is typically for “multidimensional” databases and it increases exponentially with deeper hierarchies and more dimensions. Vendors for these types of databases have invented all sorts of technical structures to ensure the products will keep on performing. Some of the vendors rely on in-memory data storage, others, like Essbase, rely on a combination of data storage on disk and storage in-memory.

how multidimensionality is implemented in Essbase

Focusing on Essbase again; this is the solution they invented for optimal performance:

  • data is stored on disk (and in memory) in structures called data blocks and indexes;
  • depending on the “performance setting” of a dimension, its members are part of the data block or the data index;
  • each member of a dimension must be set to either “store in the database” or “calculate dynamically”;
  • If the dimension is tagged as “dense”, each member that is set to “store in the database” will always physically get space allocated in the data block;
  • If the dimension is tagged as “sparse”, each member that is set to “store in the database” will be part of the index only if there is a value available.

Back to the example. Suppose we have the following performance settings for our multidimensional database in Essbase:

suppose dim1 and dim2 are dense, the others sparse

As dim1 and dim2 are “dense” they are — by definition —part of the data block. Dim3, dim4 and dim5 are “sparse” and therefore part of the index.

Now also suppose that we set all level 0 members for the dense dimension to be “stored on disk” and all (sub-)totals to be “calculate dynamically”. We set all members for the “sparse” dimensions to “store on disk”. The result is the following data block, having all possible combinations of the dense dimensions: 1-a, 1-b, 1-d, 1e and 2-a, 2-b, 2-c, 2-d, 2-e including a “tag” with a member for each spare dimension:

two data blocks illustrating the reserved spaces for the stored dense members including the reference to its index coordinates

The example above shows the data blocks for the data input (#1) and aggregation #25.

Note that Essbase reserves space for each combination in the data block regardless if there is data for that particular combination. The presence of one value alone, in this case 99.75, forces Essbase to reserve space for 20 values in the data block!

A data block that has an index with all level 0 members is called a level 0 data block. Coordinate #1, the input value, is an example of a level 0 block. If a data block contains a “tag” with at least one index member that refers to a (sub-)total, the block is considered to be non-level 0. Coordinate #25 is an example of a non-level0 block (4-def and 5-abcde are non-level 0).

If Essbase aggregates the data, it will execute the following logical sequence:

  1. open all level 0 data block that are present;
  2. based on these blocks, calculate the (intermediate) aggregations of the sparse dimension combinations and store the results in a new data block or update the existing;
  3. continue until all values are calculated.

If you follow this specific sequence, you will see that only data blocks for #7, #13, #19, #25 and #31 are created in Essbase (check this for yourself). The index will store the combination of sparse coordinates for which data blocks are present. The other combinations in the table (like #8 till #12) will be dynamically calculated using the block with the level 0 dense members. If, for example, a user would request the value for the combination as given in #27, Essbase will perform the following operations:

  1. Retrieve the data block for the given sparse coordinates (this is #25). To load a data block into memory takes a considerable amount of time.
  2. As a result, all 20 values will be available in-memory for Essbase.
  3. Calculate the value “1-total” in memory (dynamically) using the underlying values “1-a” and “1-b”). An in-memory calculation like this is really fast.

Note the remarks for the duration of the specific step; to retrieve a block in memory takes a considerable amount of time; to calculate a value dynamically with data already in memory is relatively fast. Based on this characteristic of Essbase you might notice now, that it is a pretty good decision to only set the level 0 members of the dense dimension to “store on disk”, don’t you think?

how Essbase dense vs sparse affects the disk file

So, the dense dimensions are part of the Essbase data block. And the data block is the data that is stored on disk. Now, let’s focus on how this data will be stored on disk. Below a picture of the data blocks that have been created in our example:

Essbase database file (.pag) as a result of the current dense/sparse settings, the data storage settings of the members and the data in the database. The “x” indicates the value, the empty squares are the empty “reserved spaces”.

The Essbase database file — called “.pag file” or “page file” — contains the space for the input block and the 4 additional blocks that have been generated after aggregation. This representation shows the 5 data blocks positioned on disk next to each other. In reality, the exact structure is determined by block compression and fragmentation due to block recalculation, but that is a different topic. It’s about the general idea.

Suppose, we had all the dense members stored, the data block would consists of 7 x 6 members = 42 spaces. In this case, how many data blocks will be created after aggregation, do you think?

Excacly, also 5, as the number of data blocks are determined by the settings in the sparse dimension and nothing has changed there. The Essbase data block and resulting page file after aggregation would now be:

the data block layout (size) and the resulting Essbase page file when all dense members are stored on disk

aggregation and the page file

Although you now have the sub-totals for the dense dimensions also in the data block, you still need to tell Essbase to aggregate these. Aggregating will mean the execution of the following steps:

  1. open all level 0 data block that are present;
  2. calculate the dense sub-totals;
  3. based on these blocks, calculate the (intermediate) aggregations of the sparse dimension combinations and store the results in a new data block or update the existing;
  4. continue until all values are calculated.

Retrieving the data will have a step less than the smaller block example as the dense sub-totals do not have to be calculated anymore, they are already present.

Now comes the key-point of this explanation: The benefit of having the dense sub-totals already calculated does not weigh against the benefit of having a small block size and have Essbase calculate dense sub-totals on-the-fly! Therefore, out best practice from the DSED to only store the level 0 members of a dense dimension to disk and set the members that can be calculated on the fly  to be calculated dynamically. In this way the data block will remain as small as possible and the resulting Essbase page file will also be as small as possible. A “win” times two!

I sincerely hope that this rather complex explanation has helped you to understand what is happening under the Essbase “hood” when it stores and calculates data. It is all about making the right design decision on Essbase dense vs sparse; which dimension to set to “dense” and which dimension to set to “sparse”. And if you are not sure what to do, I recommend you have a look at our Essbase best practices for Hyperion Planning.

If you are still struggling and would welcome a second opinion from the Essbase expert you might consider trainingour quick stress-relief service or expert consulting. We’re glad to help you out.

-----
about the author
mm

Arthur van den Berg

LinkedIn

I love the calculation power of Essbase and the financial problems you can solve with it. I have developed, implemented and improved dozens of EPM solutions with Essbase at its core and I love sharing what I learned with professionals that need to build and/or support these solutions.

share this post