This is internal documentation. There is a good chance you’re looking for something else. See Disclaimer.

Memory management

The new persistence layer based on Hibernate behaves a bit differently when a lot of data is loaded. All entities that are loaded in a specific Context will be referenced by the Session until it is closed, even after the transaction is committed. When a lot of data is loaded or persisted within one action, the action should be split up into multiple transactions and contexts.

PartitionedTask

The easiest way to split a big task into multiple transactions is to use a PartitionedTask. A partitioned task is an extension of PersistTask and can be passed to the CommandExecutor.

The PartitionedTask requires an inner task, a subclass of AbstractPartitionedPersistTask, that contains the execution logic.

The AbstractPartitionedPersistTask has the following functionality:

  • A method partition() that converts it into a PartitionedTask, which can then be passed to the CommandExecutor for memory-safe execution.

  • Managing the ‘task data’: Task data are context-dependent objects that are required for every iteration (for example an Entity). Because a new context is initialized after a certain number of iterations, these objects need to be reinitialized for each context. The task data object is defined by overriding the createTaskData() method. During task execution it can be accessed using the getTaskData(Context) method. It is cached automatically per context and recreated when a new context is created.
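The per-context caching of task data can be pictured with a small, self-contained sketch. It is not the actual AbstractPartitionedPersistTask code: TaskDataContext, TaskDataHolder and TaskDataDemo are hypothetical stand-ins, and the assumption that createTaskData receives the context as a parameter is made only for illustration.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical stand-in for the persistence Context (not the real type). */
final class TaskDataContext {
    final String name;

    TaskDataContext(String name) {
        this.name = name;
    }
}

/** Sketch of a holder that keeps one task data object per context. */
abstract class TaskDataHolder<D> {
    // Keyed by context identity, so a new context always gets fresh task data.
    private final Map<TaskDataContext, D> cache = new IdentityHashMap<>();
    private int creations = 0;

    /** Assumed signature: builds the context-dependent object for one context. */
    protected abstract D createTaskData(TaskDataContext context);

    /** Returns the cached task data for the context, creating it on first access. */
    final D getTaskData(TaskDataContext context) {
        D data = cache.get(context);
        if (data == null) {
            data = createTaskData(context);
            cache.put(context, data);
            creations++;
        }
        return data;
    }

    /** Drops the task data of a closed context so it can be garbage collected. */
    final void contextClosed(TaskDataContext context) {
        cache.remove(context);
    }

    int creations() {
        return creations;
    }
}

final class TaskDataDemo {
    /** Holder whose task data is just a label derived from the context. */
    static TaskDataHolder<String> newHolder() {
        return new TaskDataHolder<String>() {
            @Override
            protected String createTaskData(TaskDataContext context) {
                return "data for " + context.name;
            }
        };
    }

    public static void main(String[] args) {
        TaskDataHolder<String> holder = newHolder();
        TaskDataContext first = new TaskDataContext("context-1");
        TaskDataContext second = new TaskDataContext("context-2");
        System.out.println(holder.getTaskData(first));
        System.out.println(holder.getTaskData(first));  // served from the cache
        System.out.println(holder.getTaskData(second)); // created for the new context
        System.out.println("creations: " + holder.creations());
    }
}
```

The point of the sketch is only that the same context always yields the same task data object, while a new context triggers a new call to createTaskData.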

In addition to the inner task, it is possible to specify input and output transformation functions. Context-dependent objects (like entities) must not be used as input parameters; EntityId or PrimaryKey should be used instead. To facilitate the conversion between these objects, a BiFunction can be passed to the partitioned task. This function is executed each time a new partition is started and converts the input parameters before they are passed to the inner task.

This means that it’s possible to write the inner task using an Entity argument as usual, but pass EntityId or PrimaryKey instances to the PartitionedTask.

There are several standard transformation functions available, for example PartitionedTask#loadEntities(). Similar functions are available for the output value, for example to convert an Entity to an EntityId.

The final argument is the size of the transaction, that is, how many iterations should be completed before a new context is created.

Note

It is currently only possible to split a task up into multiple transactions. Hibernate would allow the session to be flushed with flush() and then cleared with clear() without committing the transaction. However, this is not available in the Tocco API yet (it’s not clear which listeners to invoke on a flush()).

Internally, the PartitionedTask simply splits the input arguments into partitions of the given transaction size. For each partition a new Context is created. Then the input transformation function is applied and the inner task is executed. After that, the output transformation is applied to the result and the context is closed.
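The sequence described above (partition, new context, input transformation, inner task, output transformation, close) can be sketched in plain Java. This is a simplified model under stated assumptions, not the real implementation: PartitionContext and PartitionSketch are hypothetical names, the shapes of the three BiFunction parameters are guesses, and transactions are not modelled at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical stand-in for a persistence Context; it only tracks open/closed. */
final class PartitionContext implements AutoCloseable {
    final int number;
    private boolean open = true;

    PartitionContext(int number) {
        this.number = number;
    }

    boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        open = false;
    }
}

final class PartitionSketch {
    /** Runs the inner task partition by partition, each in its own context. */
    static <I, T, R, O> List<O> run(
            List<I> inputs,
            int transactionSize,
            BiFunction<PartitionContext, List<I>, List<T>> inputTransformation,
            BiFunction<PartitionContext, T, R> innerTask,
            BiFunction<PartitionContext, R, O> outputTransformation) {
        if (transactionSize <= 0) {
            throw new IllegalArgumentException("transactionSize must be positive");
        }
        List<O> results = new ArrayList<>();
        int contextNumber = 0;
        for (int start = 0; start < inputs.size(); start += transactionSize) {
            int end = Math.min(start + transactionSize, inputs.size());
            List<I> partition = inputs.subList(start, end);
            // One context per partition; it is closed when the block is left.
            try (PartitionContext context = new PartitionContext(++contextNumber)) {
                for (T argument : inputTransformation.apply(context, partition)) {
                    R result = innerTask.apply(context, argument);
                    results.add(outputTransformation.apply(context, result));
                }
            }
        }
        return results;
    }

    /** Five integer keys, two per partition, with strings playing the entities. */
    static List<String> demo() {
        List<Integer> keys = Arrays.asList(1, 2, 3, 4, 5);
        BiFunction<PartitionContext, List<Integer>, List<String>> loadEntities =
                (context, partitionKeys) -> {
                    List<String> entities = new ArrayList<>();
                    for (Integer key : partitionKeys) {
                        entities.add("entity-" + key);
                    }
                    return entities;
                };
        BiFunction<PartitionContext, String, String> innerTask =
                (context, entity) -> entity + ":processed";
        BiFunction<PartitionContext, String, String> toResult =
                (context, result) -> result + "@context" + context.number;
        return run(keys, 2, loadEntities, innerTask, toResult);
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

In the demo, keys 1 and 2 are handled in the first context, 3 and 4 in the second, and 5 in the third, so no context ever holds more than two of the stand-in entities.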

EntityList

The different EntityList implementations behave slightly differently than they did in the old persistence layer.

EntityListImpl

The EntityListImpl is the default implementation. It is based on a List of Entity instances. Because these entities are already loaded, this implementation should not be used for very large lists; otherwise a lot of memory will be required.

The EntityListImpl is mainly used as a result of the execute() method of the Query class.

Note

Queries that are expected to have a lot of result rows should not use the execute() method. Instead getKeys() or the PathQueryBuilder should be used (perhaps in combination with a PartitionedTask).

LazyEntityList

The LazyEntityList is based on a PrimaryKeyList. No entities are loaded unless required and getKeys() can be called without any additional queries.

When an Entity is accessed, a whole page of entities (see setPageSize()) is loaded at once.

This implementation works well when only getKeys() is called or only a few entities are accessed, because entities that are never used are never loaded.

However, loaded entities remain referenced by the list (and the context), so high memory usage is still possible when the entire list is loaded.
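The paging behaviour can be illustrated with a self-contained sketch. LazyPagedListSketch is a hypothetical class written for this page; it does not reproduce the LazyEntityList API and only models the two properties described above: keys are available without loading anything, and loaded pages are never released.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical model of a key-based list that loads entities page by page. */
final class LazyPagedListSketch<K, E> {
    private final List<K> keys;
    private final int pageSize;
    private final Function<List<K>, List<E>> pageLoader;
    // Every loaded page stays referenced here until the list itself is dropped.
    private final Map<Integer, List<E>> loadedPages = new HashMap<>();
    private int loadCount = 0;

    LazyPagedListSketch(List<K> keys, int pageSize, Function<List<K>, List<E>> pageLoader) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.keys = new ArrayList<>(keys);
        this.pageSize = pageSize;
        this.pageLoader = pageLoader;
    }

    /** Keys are known up front, so no page has to be loaded. */
    List<K> getKeys() {
        return Collections.unmodifiableList(keys);
    }

    /** Loads the page containing the index on first access, then reuses it. */
    E get(int index) {
        if (index < 0 || index >= keys.size()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int page = index / pageSize;
        List<E> entities = loadedPages.get(page);
        if (entities == null) {
            int from = page * pageSize;
            int to = Math.min(from + pageSize, keys.size());
            entities = pageLoader.apply(keys.subList(from, to));
            loadedPages.put(page, entities);
            loadCount++;
        }
        return entities.get(index - page * pageSize);
    }

    /** Number of page loads so far (stands in for database queries). */
    int loadCount() {
        return loadCount;
    }

    /** Number of entities currently held in memory by this list. */
    int loadedEntityCount() {
        int count = 0;
        for (List<E> page : loadedPages.values()) {
            count += page.size();
        }
        return count;
    }

    /** Ten integer keys, page size four, strings standing in for entities. */
    static LazyPagedListSketch<Integer, String> demo() {
        List<Integer> keys = new ArrayList<>();
        for (int key = 1; key <= 10; key++) {
            keys.add(key);
        }
        return new LazyPagedListSketch<Integer, String>(keys, 4, pageKeys -> {
            List<String> entities = new ArrayList<>();
            for (Integer key : pageKeys) {
                entities.add("entity-" + key);
            }
            return entities;
        });
    }
}
```

Calling getKeys() on the demo list causes no load, accessing indexes 0 and 1 causes a single load, and touching every page leaves all ten stand-in entities referenced by the list.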

The LazyEntityList is returned from EntityManager#createEntityList(PrimaryKey...) and PrimaryKeyList#toEntityList().

MemoryEfficientLazyEntityList

The MemoryEfficientLazyEntityList is also based on a PrimaryKeyList and, like the LazyEntityList, loads its entities in pages.

The difference is that the MemoryEfficientLazyEntityList keeps only one page loaded at a time. Each page is loaded with a new Context, and the previous Context is closed as soon as a new page is loaded.

This implementation implements the AutoCloseable interface and should be used with the try-with-resources pattern so that the last Context is closed properly.

This list can be used with very large sizes, because the memory of the previous page is freed when a new page is loaded (or close() is called on the list).

Warning

This list is only efficient when its elements are accessed in the given order. If the elements are accessed randomly, pages are reloaded repeatedly and far too much data is loaded from the database.

Entity instances obtained from this list should only be used within the loop and primarily for read-only operations. Once the Context of such an entity is closed, the entity can no longer participate in a transaction or load associations.
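The single-page behaviour and the effect of the access order can be illustrated with a self-contained sketch. PageContext and SinglePageListSketch are hypothetical names, not the MemoryEfficientLazyEntityList API; the sketch only models one page per context, closing the previous context on a page change, and try-with-resources for the last page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical stand-in for a persistence Context; it only tracks open/closed. */
final class PageContext implements AutoCloseable {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        open = false;
    }
}

/** Hypothetical model of a list that keeps exactly one page (and context) alive. */
final class SinglePageListSketch<K, E> implements AutoCloseable {
    private final List<K> keys;
    private final int pageSize;
    private final BiFunction<PageContext, List<K>, List<E>> pageLoader;
    private int currentPage = -1;
    private List<E> currentEntities;
    private PageContext currentContext;
    private int loadCount = 0;

    SinglePageListSketch(List<K> keys, int pageSize,
                         BiFunction<PageContext, List<K>, List<E>> pageLoader) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.keys = new ArrayList<>(keys);
        this.pageSize = pageSize;
        this.pageLoader = pageLoader;
    }

    E get(int index) {
        if (index < 0 || index >= keys.size()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int page = index / pageSize;
        if (page != currentPage) {
            close(); // release the previous page and close its context first
            int from = page * pageSize;
            int to = Math.min(from + pageSize, keys.size());
            currentContext = new PageContext();
            currentEntities = pageLoader.apply(currentContext, keys.subList(from, to));
            currentPage = page;
            loadCount++;
        }
        return currentEntities.get(index - page * pageSize);
    }

    int loadCount() {
        return loadCount;
    }

    /** Closes the context of the current page, if any, and drops its entities. */
    @Override
    public void close() {
        if (currentContext != null) {
            currentContext.close();
        }
        currentContext = null;
        currentEntities = null;
        currentPage = -1;
    }

    /** Returns {page loads, contexts still open afterwards} for an access order. */
    static int[] demo(int... accessOrder) {
        List<Integer> keys = new ArrayList<>();
        for (int key = 1; key <= 10; key++) {
            keys.add(key);
        }
        List<PageContext> contexts = new ArrayList<>();
        int loads = 0;
        try (SinglePageListSketch<Integer, String> list =
                     new SinglePageListSketch<Integer, String>(keys, 4, (context, pageKeys) -> {
                         contexts.add(context);
                         List<String> entities = new ArrayList<>();
                         for (Integer key : pageKeys) {
                             entities.add("entity-" + key);
                         }
                         return entities;
                     })) {
            for (int index : accessOrder) {
                list.get(index);
            }
            loads = list.loadCount();
        }
        int stillOpen = 0;
        for (PageContext context : contexts) {
            if (context.isOpen()) {
                stillOpen++;
            }
        }
        return new int[] {loads, stillOpen};
    }
}
```

With ten keys and a page size of four, reading indexes 0 to 9 in order needs three page loads, whereas alternating between index 0 and index 9 reloads a page on every access. In both cases no context remains open once the try block is left.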

PrimaryKeyList

The PrimaryKeyList is basically a List<PrimaryKey> with the following additional methods:

It should be used wherever a list can be expected to become very large, to signal to the developer that loading all entities at once is probably not a good idea.