Beginner's Guide

Dima Enns edited this page Jan 9, 2022 · 1 revision

What is data virtualization?

Data virtualization is the concept of working with any kind of data without having to know or hold all of it at once. Take movie information as an example. For a list of movies, you wouldn't need every detail, such as who directed or acted in each one. These details might even be distracting in a quick overview, so a list of just the titles might be sufficient. That way, each title virtually represents the whole movie without all of that movie's information having to be at hand. Data such as the director and the actors is virtualized.

So data virtualization lets you virtualize data in such a way that only the information necessary for your use case is used. It is a broad technical field (for example, it is often used in cloud computing), and the implementation details can differ greatly. So it makes sense to classify the BFF.DataVirtualizingCollection (DVC) project within it.

How does the DVC fit into the broad field of data virtualization?

The DVC implements and virtualizes the interface IList<T> by loading items only on demand (i.e. when they are requested through the indexer). That means the DVC mimics a full list as if it weren't virtualized, but loads items selectively only once they are requested. So the DVC's use case is whenever you need to work with a relatively huge list, but actually work on only a small section of it at a time.

Conversely, the DVC won't be a good fit if your application needs all items at once.
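To make the idea of loading on demand more tangible, here is a minimal sketch of a list-like facade that fetches a page only when one of its indices is first requested. It is not the DVC's implementation; the class LazyPagedList, its constructor parameters, and the simple dictionary cache are assumptions made purely for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not the DVC's implementation: a read-only list facade
// that fetches a page only when one of its indices is first requested.
var loadedPages = new List<int>();
var list = new LazyPagedList<int>(
    count: 1_000_000,
    pageSize: 100,
    pageFetcher: (offset, size) =>
    {
        loadedPages.Add(offset / 100); // record which page was fetched
        var page = new int[size];
        for (var i = 0; i < size; i++) page[i] = offset + i;
        return page;
    });

Console.WriteLine(list[123_456]);     // triggers the fetch of a single page
Console.WriteLine(list[123_457]);     // served from the already cached page
Console.WriteLine(loadedPages.Count); // number of page fetches so far

sealed class LazyPagedList<T>
{
    private readonly Dictionary<int, T[]> _pages = new();
    private readonly Func<int, int, T[]> _pageFetcher;
    private readonly int _pageSize;

    public LazyPagedList(int count, int pageSize, Func<int, int, T[]> pageFetcher)
    {
        Count = count;
        _pageSize = pageSize;
        _pageFetcher = pageFetcher;
    }

    public int Count { get; }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= Count)
                throw new ArgumentOutOfRangeException(nameof(index));

            var pageKey = index / _pageSize;
            if (!_pages.TryGetValue(pageKey, out var page))
            {
                var offset = pageKey * _pageSize;
                // The last page may be shorter than the page size.
                var size = Math.Min(_pageSize, Count - offset);
                page = _pageFetcher(offset, size);
                _pages[pageKey] = page;
            }
            return page[index % _pageSize];
        }
    }
}
```

The consumer only ever sees an indexer and a Count, while the fetcher decides how items come into existence. The sketch keeps every fetched page forever and blocks on each fetch; it is only meant to show the basic mechanism.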

Use cases are elaborated further in this wiki entry (Why use data virtualization).

Let's go through the Getting-Started sample

The Getting-Started sample is the most minimal example and therefore a suitable starting point for using the DVC. Because of its minimal nature, the sample just shows what is done without detailed explanation, so it might still be difficult to grasp for someone who is completely new to the DVC. Therefore, we'll take a more detailed look at this minimal example here.

First, we'll take a look at the data source. Then, we'll see what the user has to provide at a minimum, and finally we'll see where to go next.

The data source of the sample

The data source is the integers from 0 to 2,147,483,646 (int.MaxValue - 1). As a mathematical construct, integers are inherently virtual: they are not persisted anywhere and take hardly any effort to compute. Plus, you can skip an arbitrarily big section and jump straight into the section that is relevant to you. So they are perfect for a minimal example.

Why so many numbers? Just for demonstration purposes, the maximum possible amount was chosen. Both the index type and the type of the Count property of IList<T> are int. That means you cannot access more elements through the indexer (well, theoretically you could access negative indices; however, UI collection elements usually only support non-negative indices).

A small excursus: in preparation for this wiki page, I tried to replicate a list of the same number sequence, but non-virtualized, for a rough comparison. I came up with:

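using System;
using System.Collections.Generic;

var start = DateTime.Now;

// Only half of int.MaxValue items; see the remark below the code.
var max = int.MaxValue / 2;
var array = new int[max];
for (var i = 0; i < max; i++)
    array[i] = i; // every single item has to be materialized up front

var list = new List<int>(array);

var end = DateTime.Now;
var timeSpan = end - start; // elapsed time for building the non-virtualized list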

The time span was about three seconds, which is an eternity compared to instantaneous. Also, please notice that the backing array, and therefore also the list, contains only half the number of items. That's because the full amount of items is seemingly impossible to create with a standard array (a List always uses a backing array); see the answer to this Stack Overflow post for more detail.

So the sheer number of items is one example of what is more easily accomplished with the DVC than with the standard elements (arrays, Lists).

What needs to be provided by the user

There are three things the user needs to give to the DVC at a minimum:

Func<CancellationToken, int> countFetcher = _ => int.MaxValue;
  1. A function fetching the count of the items. It is a Func<CancellationToken, int> parameter, so you don't need to conform to an interface; instead, you can use anything that retrieves the count for you through a lambda. The cancellation token parameter can be ignored for the Getting-Started sample. So for our sample data source it just returns int.MaxValue.
Func<int, int, CancellationToken, int[]> pageFetcher = (offset, pageSize, _) => Enumerable.Range(offset, pageSize).ToArray();
  2. A function to fetch an arbitrary section (or page) of the virtualized data as an array. It is a Func<int, int, CancellationToken, int[]> parameter, for the same reasons as the count fetching function. Its first parameter is an offset, meaning all items before the offset index are skipped, and the second parameter is the number of consecutive items (the page size) the returned array should have. The cancellation token can be ignored here as well. For our sample data source, the function Enumerable.Range(int, int) is exactly the implementation we need.
var notificationScheduler =
    new SynchronizationContextScheduler(
        new DispatcherSynchronizationContext(
            Application.Current.Dispatcher));
  3. A Reactive Extensions (Rx) scheduler for the change notifications. In case you would like to use the DVC with WPF or a similar UI framework, this scheduler makes sure that the change notifications the DVC wants to send to the UI are emitted on the right thread (the main thread). It is an Rx scheduler because the DVC uses Rx heavily in the background. However, as the sample code shows, WPF's dispatcher is easily wrapped into an Rx scheduler.
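To get a feeling for the two fetcher delegates, they can be called by hand outside of the DVC. The following sketch reuses the two delegates from the sample unchanged; the chosen offset and page size are arbitrary values for illustration, and the notification scheduler is left out because it requires a running WPF application.

```csharp
using System;
using System.Linq;
using System.Threading;

Func<CancellationToken, int> countFetcher = _ => int.MaxValue;
Func<int, int, CancellationToken, int[]> pageFetcher =
    (offset, pageSize, _) => Enumerable.Range(offset, pageSize).ToArray();

// Call the delegates directly; the cancellation token is ignored by both.
var count = countFetcher(CancellationToken.None);
var page = pageFetcher(1_000_000, 5, CancellationToken.None);

Console.WriteLine(count);                   // 2147483647
Console.WriteLine(string.Join(", ", page)); // 1000000, 1000001, 1000002, 1000003, 1000004
```

Only the five requested numbers are ever created, no matter how far into the sequence the offset points.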

That's it. A few remarks on the builder pattern:

var dataVirtualizingCollection = DataVirtualizingCollectionBuilder
    .Build<int>(notificationScheduler)
    .NonPreloading()
    .Hoarding()
    .NonTaskBasedFetchers(pageFetcher, countFetcher)
    .SyncIndexAccess();

It is designed in such a way that you should be able to "dot" your way (i.e. use the autocompletion of Visual Studio or Rider) from start to end. At each stage, only legitimate options should be offered, which should prevent misconfiguration. After the Build call, there should always be four calls, each choosing one kind of option (these are described in more detail in this wiki page). Again, the options chosen for the Getting-Started sample are the simplest ones.

Where to from here?

The getting started sample is minimalistic and just a starting point. Depending on what you would like to do, you could do various things from here on:

  • Explore the sample further. For example, put breakpoints into the count and/or page fetching functions to get a feeling for when and with which parameters the DVC calls them.
  • Replace the data source with something that is (closer to) your use case. For example, as you have probably seen, the count and page fetching functions should map well onto databases (such as tables in SQLite).
  • If you would like to explore the builder pattern options and how they behave, the WPF application of the "Sample.View" project is recommended. It is like a playground and designed to be configurable to all possible option combinations.
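As a rough sketch of the second suggestion, the fetchers below page through an in-memory list that merely stands in for a database table. The list movieTitles and the SQL statements in the comments are assumptions for illustration; with a real database, each lambda would run the corresponding query instead.

```csharp
using System;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for a database table of movies.
var movieTitles = Enumerable.Range(1, 250).Select(i => $"Movie {i}").ToList();

// Roughly what "SELECT COUNT(*) FROM Movies" would answer.
Func<CancellationToken, int> countFetcher = _ => movieTitles.Count;

// Roughly what "SELECT Title FROM Movies ORDER BY Id LIMIT pageSize OFFSET offset" would answer.
Func<int, int, CancellationToken, string[]> pageFetcher =
    (offset, pageSize, _) => movieTitles.Skip(offset).Take(pageSize).ToArray();

Console.WriteLine(countFetcher(CancellationToken.None));                           // 250
Console.WriteLine(string.Join(", ", pageFetcher(100, 3, CancellationToken.None))); // Movie 101, Movie 102, Movie 103
```

The shape of the delegates stays the same as in the Getting-Started sample; only the item type and the source behind the lambdas change.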

Have fun.