Child pages
  • DSO Data Structures Guide
Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 88 Next »

About Terracotta Documentation

This documentation is about Terracotta DSO, an advanced distributed-computing technology aimed at meeting special clustering requirements.

Terracotta products without the overhead and complexity of DSO meet the needs of almost all use cases and clustering requirements. To learn how to migrate from Terracotta DSO to standard Terracotta products, see Migrating From Terracotta DSO. To find documentation on non-DSO (standard) Terracotta products, see Terracotta Documentation. Terracotta release information, such as release notes and platform compatibility, is found in Product Information.

Unknown macro: {div9}
Release: 3.6
Publish Date: November, 2011

Documentation Archive »

Clustered Data Structure Guide

Introduction

This guide is intended to provide you with the information you need to make informed decisions about which data structures to use in which context, how to maximize their efficiency, and how to avoid certain pitfalls.

Clustered data structures in Terracotta are functionally equivalent to their non-clustered counterparts. However, there are some performance differences between the different data structures in a clustered context that are important to understand when designing a system. Some of the performance differences are inherent to the design of the data structure; for example, the very different synchronization and concurrency characteristics of Hashtable, HashMap, and ConcurrentHashMap. Other performance differences come from the implementation of a datastructure that is expected to work in a local context but operates somewhat differently in a clustered context. Still other differences are simply the result of point-in-time implementation details in the Terracotta core engine. Be sure to check this document regularly to see the status of these point-in-time effects.

Unknown macro: {HTMLcomment}

Coming Soon

A discussion of cross-JVM latency and concurrency patterns.

Latency

TODO: describe the effect of cross-JVM latency and how to plan for it – what is "cross-JVM" latency

Concurrency

TODO: describe the effect of different concurrency patterns and how to plan for them.

Important Concepts for Concurrent Data Structures

A number of important concepts relating to concurrent data structures are discussed in this document. These concepts are defined in the following sections.

Latency

Latency is the lag of response to some requested action.

Locality of Reference

High locality of reference typically implies low latency because required data is available in local memory. Locality of reference is Low if the fault rate for data is high.

Concurrency

Partial Loading

A data structure that can be partially loaded has parts of itself loaded into a client JVM, while other parts remain unloaded until accessed. Partial loading, which is sometimes referred to as "lazy loading," happens on Terracotta clients, not Terracotta server instances.

To understand one of the main reasons for partial loading, it's important to understand logically managed objects. Most of the data structures in the java.util and java.util.concurrent packages that are supported by Terracotta are logically managed, which means they are kept up-to-date across JVMs by replaying the method calls made in another JVM. This is done for performance reasons and, in the case of hash-based structures, for correctness. However, because they aren't managed by keeping track of their physical references to other objects, they do not share the same virtual heap mechanism enjoyed by physically managed objects. (For more information on physically and logically managed objects, see the Concept and Architecture Guide)

Without special support in the Terracotta core, logically managed data structures are loaded in their entirety into a client JVM when they are accessed. In the case of very large data structures, this can lead to a number of application issues:

  • The data structure may take a long time to load from the Terracotta server.
  • The data structure may take up too much space in the local JVM heap.
  • The data structure may exceed the available JVM heap, leading to OutOfMemoryErrors.

To solve these problems, Terracotta implemented partial loading for some data structures. In the case of certain Map implementations, the entire key set is loaded when the Map is loaded, but the values are not. The values are loaded automatically when the value is accessed. This can have a significant improvement on heap usage for clustered Maps. However, the following conditions can negate the benefits of partial loading:

  • Values composed of Java primitives – Only values composed of object references can be partially loaded. A map with large numerical or String values is fully loaded.
  • Extremely large key sets – A collection's keys are always fully loaded. Maps with an extremely large number of keys, and maps with extremely large keys, may not benefit from partial loading.

Because the partial loading algorithm for each data structure is different, not all data structures are partially loaded. The implication is that data structures that are partially loaded have different implementation and performance characteristics under different scenarios. For a list of data structures that can be partially loaded, see [Data Structure Characteristics Table].

Synchronization

Synchronization refers to standard Java synchronization, which makes a data structure thread-safe.

Auto-Locking

Auto-locking refers to the locking available on synchronized data structures. In Terracotta, auto-locking can be automatic on certain data structures, or configured on other data structures.

Data Structure Characteristics Table

The following table shows how a number of the concepts discussed in [Important Concepts for Concurrent Data Structures] are applied to certain data structures. If a data structure name in the Data Structure column is a link, you can click it to go to a section with more information on that data structure.

Data Structure

Partial Loading

Concurrent

Synchronized

Auto-Locking

Array

Unknown macro: {center}

Unknown macro: {center}

N/A

Unknown macro: {center}

N/A

Unknown macro: {center}

N/A

ArrayList

Unknown macro: {center}

*

Unknown macro: {center}

N/A

Unknown macro: {center}

N/A

Unknown macro: {center}

N/A

AtomicInteger

Unknown macro: {center}

N/A

Unknown macro: {center}

Unknown macro: {center}

Unknown macro: {center}

N/A

AtomicLong

Unknown macro: {center}

N/A

Unknown macro: {center}

Unknown macro: {center}

Unknown macro: {center}

N/A

ConcurrentHashMap

Unknown macro: {center}

Unknown macro: {center}

Unknown macro: {center}

Unknown macro: {center}

HashMap

Unknown macro: {center}

Unknown macro: {center}

N/A

Unknown macro: {center}

N/A

Unknown macro: {center}

N/A

HashSet

Unknown macro: {center}

*

Unknown macro: {center}

N/A

Unknown macro: {center}

N/A

Unknown macro: {center}

N/A

HashTable

Unknown macro: {center}

Unknown macro: {center}

*

Unknown macro: {center}

Unknown macro: {center}

TIM

LinkedBlockingQueue

Unknown macro: {center}

Unknown macro: {center}

N/A

Unknown macro: {center}

Unknown macro: {center}

LinkedHashMap

Unknown macro: {center}

Unknown macro: {center}

N/A

Unknown macro: {center}

N/A

Unknown macro: {center}

N/A

LinkedList

Unknown macro: {center}

*

Unknown macro: {center}

N/A

Unknown macro: {center}

N/A

Unknown macro: {center}

N/A

TreeMap

Unknown macro: {center}

2.8

Unknown macro: {center}

N/A

Unknown macro: {center}

N/A

Unknown macro: {center}

N/A

TreeSet

Unknown macro: {center}

*

Unknown macro: {center}

N/A

Unknown macro: {center}

N/A

Unknown macro: {center}

N/A

Vector

Unknown macro: {center}

*

Unknown macro: {center}

*

Unknown macro: {center}

Unknown macro: {center}

TIM

– The characteristic is part of this data structure.
TIM – The characteristic is not part of this data structure, but can be added by using the appropriate Terracotta Integration Module.
N/A – The characteristic is not applicable to (or is not a part of) this data structure.
* – The characteristic is not currently part of this data structure, but may become so in the future.
Version number – Support for this characteristic begins with the indicated Terracotta version number.

Selected Data Structures

ConcurrentHashMap

Unknown macro: {table}
Unknown macro: {tr}
Unknown macro: {td}

Package

Unknown macro: {td}

java.util.concurrent

Unknown macro: {tr}
Unknown macro: {td}

Type

Unknown macro: {td}

Map

Unknown macro: {tr}
Unknown macro: {td}

Available with Java Version

Unknown macro: {td}

1.5

Unknown macro: {tr}
Unknown macro: {td}

Latency Characteristics

Unknown macro: {td}

Slow on full iterations because all segments must be loaded and traversed. Low-latency if accessing narrow set of values; higher latency if accessing wide set of values.

Unknown macro: {tr}
Unknown macro: {td}

Special Characteristics

Unknown macro: {td}

Read access is fully concurrent. Write access is exclusive of read and write access per segment.

ConcurrentHashMap (CHM), unlike a synchronized wrapper around a java.util.HashMap, has very good concurrency characteristics that are exploited by Terracotta in a clustered context.

  • Unlike synchronized collections which are locked using a monolithic mutex for all operations, CHM uses read and write locks.
  • CHM keys are partitioned into separate segments so when exclusive access is required (e.g., two threads need to write to the same segment), only the segment is locked instead of the entire data structure.
  • Terracotta clustered CHMs have full read concurrency such that all read operations are allowed to occur concurrently.
  • Terracotta clustered CHMs have a read locking optimization such that all read lock acquisitions become greedy concurrently on all relevant nodes. This allows read lock acquisition to happen without a network hop to the Terracotta server on every node. For more information on the greedy lock optimization in Terracotta, see the Concept and Architecture Guide.
  • Read operations do not have to wait for write operations on other partitions to fully complete.
  • While the key set is monolithically loaded, the values are partially loaded.

Coming Soon

Information on the following data structures:

Array Primitives

ArrayList

HashMap

HashTable

LinkedBlockingQueue

HashSet

LinkedList

POJOs

While POJOs are not explicitly a collection, they can be constructed to behave like collections. If your code uses POJOs

TreeMap

TreeSet

Vector

  • No labels