
Tuesday, April 7, 2015

What's this Offheap thing anyway?

As you may have noticed already, there's been a lot of open-source activity around Ehcache and Terracotta in the past couple of weeks:
  1. Ehcache 3 Milestone 1 is out, and includes offheap storage. Check it out at http://ehcache.github.io/
  2. Terracotta 4.3 with offheap storage is also available as an open-source offering. Check it out at http://blog.terracotta.org/2015/04/02/terracotta-bolsters-in-memory-open-source-offerings/.
So that's all great and there's a lot to talk about on both these announcements...

But it turns out that one of the first questions I got while sharing the news with wider non-tech circles was:
"What's this offheap thing anyway? What's so special about it, and why should I care?"

Really fair questions indeed!

So as I wrote my reply and tried hard not to dive into "geeky" land while doing so (please refer to https://github.com/Terracotta-OSS/offheap-store for the technical aspects, such as detailed explanations and implementation code), I figured it could be useful to others as well...
So hopefully the following explanation will make sense to a wider non-developer audience (and developers out there too of course!).

So here it is...starting from, well, the start:

Traditionally in Java programming land, the memory space accessible to Java programs (called the "heap") is totally managed by the Java Virtual Machine (JVM)...making it much easier for developers to NOT have to think about memory allocations and clean-ups (like we used to with programming languages such as C, C++, etc.). And really, "not having to think about memory complexities" is a big part of Java's success over the years.

But the memory management that Java performs under the hood (referred to as Garbage Collection, or GC) can become costly performance-wise (lower throughput, higher latencies), especially as the used "heap" space grows (for example, heap space would grow if you started to cache lots of objects in memory).

So to reconcile these 2 seemingly contradictory goals of:

(A) Being able to cache a lot more data (10s of GB or TBs possibly) within your Java application, and
(B) Not incurring a big cost on application performance due to underlying Java memory management operations,

--> Enter Offheap Memory.

Offheap Memory, as the name implies, is a memory space that is "outside the Java heap" (and hence outside the traditional Java memory management responsibilities), yet still accessible from within the Java process through the java.nio API.
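For the developers out there, here is a minimal sketch of what "accessible through the java.nio API" can look like in practice. It only uses the standard ByteBuffer class; the class and method names (DirectBufferDemo, roundTrip) are made up for this illustration and are not taken from Ehcache or offheap-store.

```java
import java.nio.ByteBuffer;

final class DirectBufferDemo {

    /** Writes a long into off-heap memory and reads it back. */
    static long roundTrip(long value) {
        // allocateDirect asks for native memory outside the Java heap;
        // only the small ByteBuffer wrapper object itself lives on the heap.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(Long.BYTES);
        offHeap.putLong(0, value);   // absolute write at byte offset 0
        return offHeap.getLong(0);   // absolute read from the same offset
    }

    public static void main(String[] args) {
        ByteBuffer onHeap = ByteBuffer.allocate(1024);        // backed by a byte[] on the heap
        ByteBuffer offHeap = ByteBuffer.allocateDirect(1024); // backed by native memory

        System.out.println("heap buffer is direct?     " + onHeap.isDirect());
        System.out.println("off-heap buffer is direct? " + offHeap.isDirect());
        System.out.println("value read back:           " + roundTrip(42L));
    }
}
```

Note that the buffer only holds raw bytes: anything you want to keep offheap has to be turned into bytes first, which is part of what makes doing this by hand tedious.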

So when a product or framework refers to "Offheap" as a general concept, it really means that this product/framework can access the machine's RAM directly from the Java process (as opposed to doing it the "traditional" way of accessing the machine's RAM through Java's managed heap space). In other words, it's like poking a hole through Java's walls to access the RAM directly.

As for why you should care:
  1. With offheap, your Java program can put as much data as it needs in memory and access it all within the process (there's no memory limitation aside from the amount of RAM the machine has to offer), even TBs of data (check out this Intel white paper [PDF] showing offheap usage and benchmarks with a single 6TB Intel server).
  2. Your Java program will demonstrate very predictable latencies even if you're storing large amounts of data in memory (even at the TB scale)...
    1. This is because the offheap memory space is not managed by Java in the first place, and as such, storing data in that offheap space will simply not add any extra Java memory management overhead to the picture.
So overall, it's really the best of both worlds: storing lots of data in memory without incurring performance unpredictability in the process.

The next question you might have is: if it is such a great concept, why doesn’t everybody do it in their own Java programs?

And the simple answer is that it's not a straightforward thing to do, because you have to build all that low-level memory management yourself when you use offheap memory.
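To give a feel for that do-it-yourself work, here is a deliberately naive sketch, assuming String values and one fixed-size block of offheap memory. TinyOffHeapStore and everything in it is hypothetical and written for this post only; it is not how offheap-store is implemented, and it leaves out the hard parts (freeing and reusing space, eviction, thread safety).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy store, for illustration only. */
final class TinyOffHeapStore {
    private final ByteBuffer memory;                           // native memory, outside the heap
    private final Map<String, int[]> index = new HashMap<>();  // key -> {offset, length}, kept on the heap
    private int nextFree = 0;                                  // naive "allocator": always append at the end

    TinyOffHeapStore(int capacityBytes) {
        this.memory = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Serializes the value and copies it offheap; returns false when out of space. */
    boolean put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8); // manual serialization
        if (bytes.length > memory.capacity() - nextFree) {
            return false; // a real store would evict entries or reuse freed space here
        }
        ByteBuffer view = memory.duplicate(); // same memory, independent position
        view.position(nextFree);
        view.put(bytes);
        // Overwriting a key leaks the old bytes: nothing ever frees them in this sketch.
        index.put(key, new int[] {nextFree, bytes.length});
        nextFree += bytes.length;
        return true;
    }

    /** Copies the bytes back onto the heap and deserializes them; null if the key is absent. */
    String get(String key) {
        int[] slot = index.get(key);
        if (slot == null) {
            return null;
        }
        byte[] bytes = new byte[slot[1]];
        ByteBuffer view = memory.duplicate();
        view.position(slot[0]);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        TinyOffHeapStore store = new TinyOffHeapStore(1024);
        store.put("greeting", "hello offheap");
        System.out.println(store.get("greeting"));
    }
}
```

Even in this toy version you end up writing your own serialization, your own bookkeeping of where each value lives, and your own out-of-space handling, which are all things the garbage-collected heap normally does for you.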

And that's really the "secret sauce" of libraries implementing offheap usage, such as the Ehcache/Terracotta libraries (not so secret anymore for Ehcache/Terracotta since it's now officially open-sourced: refer to offheap-store on GitHub). All these low-level memory mechanisms are done for you and hidden from you, so you don't have to care about them as a Java developer. All you have to know is that you can cache as much as you want/need on a single machine (GBs, TBs even) and that it will not slow down your app unpredictably while doing so (as it would if you were putting all that stuff in the traditional Java heap).

To explore further, find the Ehcache offheap store implementation at https://github.com/Terracotta-OSS/offheap-store

Please leave comments if you have any questions, or better yet, post your question on the Ehcache-users Google group!