Monday, April 13, 2015

Java 7 and TLSv1.2 - supported, but not enabled by default

Lately, I had to investigate how to make a Java 7 client connect to a server resource (in my case a JMS broker server) using sockets or http traffic over secure TLS v1.2 protocol (and only TLS v1.2... all other communication protocols should be refused). This would all have been a breeze if my client app was on Java 8...but being on Java 7, it was a completely different story.

At first, I thought it'd be easy since my JMS broker server (Terracotta Universal Messaging in this case) already supports TLS v1.2, lets you enforce it through system property (-DSSLProtocols), and also lets you pick which cipher(s) to use. So I thought all I needed was to set that system property on the server side and disable in the UI (called Enterprise Manager) all ciphers for this communication interface BUT the TLS v1.2 ciphers I wanted to use (the "SHA256" cipher suites in this case -- check full list of available ciphers for TLS v1.2 at shown below in screenshot:

And everything would work fine, right? The client would start the secure handshake with the server, figure out that TLSv1.2 is the only protocol that can be used, and use the right secure protocol version and ciphers as defined by the server.

Well, this theoretical scenario is partially true: Everything is setup properly on the server side, as it rightly says "I only accept TLSv1.2 sessions with the following ciphers". But unfortunately, the Java 7 client does not seem to be able to start a TLSv1.2 secure connection by default (if you're on Java 8, stop reading right now -- unless you're curious of course -- as it would all work fine out of the box)

And yes, after quick research, Java 7 introduced support for TLS v1.2 (refer to BUT does not enable it by default. In other words, your client app must explicitly specify "TLS v1.2" at SSLContext creation, or otherwise will just not be able to use it.

This is a pretty important nuance that you can test yourself very easily with the following 2 test cases:

1) Output for Gist - Java 7 + SSLContext context = SSLContext.getDefault()
Supported Protocols: 5
Enabled Protocols: 1
Enabled Ciphers: 80
... list of available ciphers omitted here ...
2) Output for Gist - Java 7 + SSLContext context = SSLContext.getInstance("TLSv1.2")
Supported Protocols: 5
Enabled Protocols: 3
Enabled Ciphers: 80
... list of available ciphers omitted here ...
So this is great! All you and I need is to update the client app's code to specify TLSv1.2 at SSL Context creation! 
But wait...what if I'm using a client library to perform all that low level SSLContext connection that case, it's going to be tricky to update the code, unless specifying the SSL protocol versions and ciphers are exposed to client app in some way (eg. system properties).
If you reach this situation, here are 3 solutions I found (well, I found 1 and 3...solution 2 is credited to a smarter developer :) )
1) If all you want is to create a HTTPS connection (over TLSv1.2) from your client app to the server, you're in luck: all you need is add the following 2 system properties to your client app and the SSL context will be created properly as specified:
  • https.protocols ==> -Dhttps.protocols=TLSv1.2 (comma-separated list of you want to use several protocols)
  • https.cipherSuites ==> -Dhttps.cipherSuites=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 (comma-separated list of you want to use several ciphers)
2) If you want to use the secure socket protocol (as opposed to HTTPs traffic), the above properties won't work unfortunately...but you could create a "TLSv1.2" SSLContext at application startup and use the "SSLContext.setDefault(ctx)" call to register that new context as the default one.

3) Upgrade your client to JAVA 8, which enables TLSv1.2 protocol by default (at this time, JAVA 8 should be already used anyway, or on most project roadmaps hopefully)

That's it! Another fun investigative work day at the office!

Tuesday, April 7, 2015

What's this Offheap thing anyway?

As you may have noticed already, there's a lot of open-source activity around Ehcache and Terracotta in the past couple of week:
  1. Ehcache 3 Milestone 1 is out, and includes offheap storage. Check it out at
  2. Terracotta 4.3 with offheap storage is also available as an open-source offering. Check it out at
So that's all great and there's a lot to talk about on both these announcements...

But it turns out that one of the first question I got while sharing the news to wider non-tech circles was:
"What's this offheap thing anyway? What's so special about it, and why should I care."

Really fair questions indeed!

So as I wrote my reply and tried hard not to dive in "geeky" land while doing so (please refer to for these technical aspects such as detailed explanations and implementation code), I figured it could be useful to others as well...
So hopefully the following explanation will make sense to a wider non-developper audience (and developers out there too of course!).

So here it is...starting from, well, the start:

Traditionally in Java programming land, the memory space accessible to Java programs (called "heap") is totally managed by the Java Virtual Machine (JVM)...making it much easier for developers to NOT have to think about memory allocations and clean ups (like we used to with programming languages such as C, C++ etc…). And really, "not having to think about memory complexities" is a big part of JAVA’s success over the years.

But the memory management that JAVA performs under-the-hood (refer to as Garbage Collection, or GC) is something that potentially becomes costly performance-wise (lower throughput, higher latencies) especially as the used "heap" space grows (for example, heap space would grow if you started to cache lots of objects in memory)

So to reconcile these 2 contradictory concepts of:

(A) Being able to cache a lot more data (10s of GB or TBs possibly) within your Java application, and
(B) Not incurring a big cost on application performance due to underlying Java memory management operations,

--> Enter Offheap Memory.

Offheap Memory, as the name implies, is a memory space that is "outside the Java heap" (and hence outside the traditional Java memory management responsibilities), but yet still accessible within the Java process through the java.nio API.

So when a product or framework refers to "Offheap" as a general concept, really it means that this product/framework can natively access the machine’s RAM memory directly from the JAVA process (as opposed to doing it the "traditional" way of accessing the machine’s RAM memory through JAVA’s managed memory heap space). In other words, it’s like poking a hole through JAVA’s walls to access the RAM directly.

To the question of why should you care:
  1. With offheap, your Java program can put as much data as it needs in-memory, and access it all within process (there’s no memory limitation aside from the amount of RAM the machine has to offer), even TBs of data (check out this Intel white paper[PDF] showing offheap usage and benchmarks with a single 6TB Intel server)
  2. Your Java program will demonstrate very predictable latencies even if you're storing large amounts of data in-memory (even at the TB scale)...
    1. This is because offheap memory space is not managed by Java in the first place, and as such, storing data in that Offheap space will simply not add any extra JAVA memory management overhead to the picture.
So overall, it’s really the best of both worlds: storing lots of data in memory but not incurring performance unpredictability in the process.

The next question you might have is: if it is such a great concept, why doesn’t everybody do it in their own Java programs?

And the simple answer is that it’s not a straightforward thing to do because you have to create yourself all that low-level memory management when you use the offheap.

And that’s really the "secret" sauce of libraries implementing offheap usage...such as Ehcache/Terracotta libraries (not so secret anymore for ehcache/terracotta since it's officially open-sourced now - refer to offheap-store on github): all these low-level memory mechanisms are done for you and are especially hidden from you so you don’t have to care about them as a Java developer. All you have to know is that you can cache as much as you want/need on a single machine (GBs, TBs even) and that it will not slow down your app unpredictably while doing so (as it would if you were putting all that stuff in the traditional JAVA heap.)

To explore further, find Ehcache Offheap store implementation at

Please leave comments if you have any questions, or better yet, post your question on the Ehcache-users google group!

Friday, December 5, 2014

JBOSS Vault to encrypt JMS password for secure JCA configuration

In the previous post (SSL Encryption / Authentication between JBOSS JCA + webMethods Broker) I explained how you can setup a MDB (hosted on JBOSS) to securely connect and consume JMS messages from SoftwareAG webMethods Broker using JCA, SSL encryption and authentication...

That was a very easy setup since all we had to do is add a couple of system properties in the JBOSS admin console to make this work fine.
But as you may have noticed, we specified the keystore password in clear text (for simplicity sake...and also because I knew I'd be writing this post soon after of course!!) as part of those system properties...and that should raise a couple of alarms for most IT professional...

So this post is to explain how we can remediate this situation using JBOSS built-in Vault feature.

Please note this post is the 3rd out of the following 3 related posts:
  1. Integrating SoftwareAG webMethods messaging Broker with JBOSS AS 7 through standard JCA
  2. SSL Encryption / Authentication between JBOSS JCA + WebMethods Broker
  3. JBOSS Vault to encrypt JMS password for secure JCA configuration


In short, this component offers a very nice way to obfuscate/encrypt sensitive text information within JBOSS configuration files. Using this concept, I'm going to show how to encrypt the SoftwareAG webMethods Broker password and how to use it through the Resource Adapter configuration...

First, Since I'm not a big fan of recreating documentation when the original product one is pretty good already, please refer to the well-written JBOSS EAP 6.1 doc to setup the JBOSS Vault for your environment.

If you follow it pretty closely, you should have:
- a keystore file saved somewhere on your file system,
- added in that keystore the various sensitive passwords you want to securely use in the JBOSS configuration, by using the provided JBOSS script
- A "VAULT" block in your JBOSS configuration file, similar to the following:


From there, you should be able to add the encrypted strings in most JBOSS configurations without too much problem...using the pattern that should have been given to you during the encrypting process...

For example, here is my sample "encrypted" string for the test password I used with my test instance of webMethods Broker:

Wrong instinct!

Ok now if you're like me, your first reflex is going to be to use this encrypted string in the password system property we added in the previous post...


But unfortunately, that does not least not with EAP 6 standalone...Seems like it's due to a race condition where the JBOSS Vault is not yet initialized when the system properties are added...hence the system property "com.webmethods.jms.password" ends up still having the encrypted string in its opposed to the decrypted value...

And of course, this is not quite what we the resource adapter is not going to be able to do anything with that encrypted password...

Keep pushing, there's light at the end of the tunnel!

But fear not! There's another simple way to make use of that encrypted password (and I think ultimately a better way...which does not save the clear password in a system property that is readable by anybody...).

The generic jms resource adapter provides 2 properties for username and password (refer to user guide), and fortunately for us, we can use the {VAULT} encrypted string for those since the resource-adapters subsystem is initialized after the VAULT subsystem!

So the solution is to remove from the global system properties block those 2 system properties for username and password (com.webmethods.jms.username, com.webmethods.jms.password), and instead add them in the wm Broker Resource Adapter configuration block a follow (using the VAULT encrypted password instead of the clear text one!):

                [some jms username]

Now, the Resource Adapter component will "magically" get access to the right decrypted password (because the VAULT will have decrypted it first) and provide it (alongside the username) to the JMS new connection method!

Done and happy...onto next challenge!

So with all this in place, you get a secured JBOSS / Resource Adapter configuration file free of any clear text password...and all the while can take advantage of JBOSS Vault auto-decryption to have your application components (and/or resource adapter in this case) use that precious password in the very same way as before...all this without having to add/write one extra line of code!! Sweet!

SSL Encryption / Authentication between JBOSS JCA + SoftwareAG webMethods Broker

In our previous post (Integrating SoftwareAG webMethods messaging Broker with JBOSS AS 7 through standard JCA) we created a simple setup to publish and consume JMS messages using JCA Resource Adapter construct on JBOSS AS 7.

This post will extend this simple setup by explaining how to use secure communications (SSL encryption + SSL Authentication) between JBOSS and webMethods Broker.

Please note this post is the 2nd out of the following 3 related posts:
  1. Integrating SoftwareAG webMethods messaging Broker with JBOSS AS 7 through standard JCA
  2. SSL Encryption / Authentication between JBOSS JCA + SoftwareAG webMethods Broker
  3. JBOSS Vault to encrypt JMS password for secure JCA configuration

First, let's assume that you're already a webMethods Broker expert and have already setup your Broker server with the right SSL certificates (and if not, please refer to that "pretty"-screenshots SoftwareAG "techcommunity" document (PDF - 4MB) I was referring to in the previous post -- go to "Configuring SSL Communication / Authentication" on page 13).

And all we need to do now is to have our JBOSS client encrypt all communications and authenticate to Wm Broker over SSL...

It's actually very easy:
All you need to do is add the right system properties for it (jboss admin console at “profile > General Configuration > System Properties”)...and the wM Broker client library will take care of the rest without changing anything in the code or configuration!

Here are the needed properties:
  • com.webmethods.jms.username
  • com.webmethods.jms.password
  • com.webmethods.jms.ssl.keystore
  • com.webmethods.jms.ssl.keystoretype
  • com.webmethods.jms.ssl.truststore
  • com.webmethods.jms.ssl.truststoretype
All these values must match the keystore and trustore you used on the wM Broker server side...

Important notes:
  • keystore should be of type "PKCS12" (keystoretype=PKCS12)
  • trustore of type "JKS" (truststoretype=JKS)
  • username / password must match (of course) the ones used by your keystore...

That's it: once those properties are set, you should be able to verify in the wM Broker client sessions are indeed using SSL encryption + SSL authentication...

Wednesday, July 23, 2014

Integrating SoftwareAG webMethods messaging Broker with JBOSS AS 7 through standard JCA

Lately I had to work a bit on integrating some of SoftwareAG's messaging brokers (SoftwareAG webMethods Broker and Universal Messaging) with common application servers (JBOSS AS 7, WebSphere 8) through standard JCA resource adaptor construct.

So this post, the first of 3, is to summarize (before I forget myself :) ) some of the steps involved with that setup using SoftwareAG webMethods JMS Broker + JBOSS combination...Hoping that it might be useful to somebody in the meantime.

Please note this post is the 1st out of the following 3 related posts:
  1. Integrating SoftwareAG webMethods messaging Broker with JBOSS AS 7 through standard JCA
  2. SSL Encryption / Authentication between JBOSS JCA + SoftwareAG webMethods Broker
  3. JBOSS Vault to encrypt JMS password for secure JCA configuration

Note 1*: all the resources/code I mention in this post are accessible on github at jbossjca-sample-mdbs

Note 2**: If you like a more formal documentation with "pretty" screenshots, I posted such doc (PDF) on the SoftwareAG "techcommunity" resource wiki, accessible publicly at Please check it out as well...

First words

First, many posts out there talk about the JCA Resource Adapter construct and why it is useful...(eg. Yes it is useful as it decouples your code from the JMS low level implementation, and make directly available to you all the enterprise production-ready features that you really don’t want to re-develop yourself (unless you have a lot of extra time on your hand and don’t know what to do with it) such as connection pooling, transactional support, connection validation, connection failure strategies, reconnection strategies, etc…

Secondly, if you’re reading this, you’re likely a knowledgeable webMethods users already, and as such, I won’t go into the details of setting up the SoftwareAG webMethods JMS Broker etc... But for you to reproduce quickly some of the steps identified by this post, I’ve added the admin script that allows you to create all the webMethods Broker objects in 1 liner executable command (using the jmsadmin tool), as follow:

$WM_BROKER_HOME/bin/jmsadmin -properties -f jmsadmin.script

At this point, you should have a SoftwareAG webMethods Broker working and accessible with the following objects:
  • InboundQueueConnectionFactory
    • This is the factory we’ll use to consume messages from our MDBs
  • OutboundQueueConnectionFactory
    • This is the factory we’ll use to send messages from our sample servlet
  • simplequeue
    • This is the queue we’ll use
Let’s now get started on the JBOSS side.

1 - Deploy the SoftwareAG webMethods Broker RAR

  • The RAR package is at $WM_BROKER_HOME/lib/webm-jmsra.rar
  • Deploy RAR package onto JBOSS using either way:
    • Copy to the JBOSS deployment folder (<JBOSS-EAP-HOME>/standalone/deployments) and the RAR should be deployed automatically
      • A file “webm-jmsra.rar.deployed” should be created.
      • If nothing is created, or a file “webm-jmsra.rar.failed” is created, an error occurred during deployment.
    • Use the JBOSS admin console to deploy the package just like you would do it for any other deployable resource (EAR, WAR, etc…)
      • Success or failure should be displayed in the console

2 - Configure the Resource Adapter

Then, we just need to configure the resource adapter for both inbound (used by message consumers) and outbound (used by message producers). This can all be done thorough the JBOSS admin console at “profile > subsystems > Connector > Resource Adapters”.
For brevity, I’ll pass on the multi-screen setup (it's really well explained on the red-hat website at Red-Hat Doc: Configure_a_Deployed_Resource_Adapter), and show here the end result that will be written in jboss configuration (standalone.xml or domain.xml) within the subsystem resource-adapters:

Couple of quick notes on this:
  • The resource adapter id / name should be same as the RAR you just uploaded
  • "JndiProperties" setting is self-explanatory: it's the usual connection settings to the JMS broker JNDI. For SoftwareAG webMethods broker, the factory class is "com.webmethods.jms.naming.WmJmsNamingCtxFactory"...make sure to use that. And then, customize the url based on your setup.
  • The connection-definition section is for outbound pooled connections (sending messages), hence that's why we used the "OutboundQueueConnectionFactory" broker object for the "ConnectionFactoryJndiName" setting
  • In the connection-definition section, you notice also that it's registered in JBOSS JNDI with the name specified in jndi-name="java:/jms/broker". This is important as we'll need to refer to that in our code to send messages to the queue (see further down)
  • In that same connection-definition section, you can also see that the "class-name" attribute is "com.sun.genericra.outbound.ManagedJMSConnectionFactory" not change that, as it's the resource adapter connection factory that will take advantage of application server connection pooling amongst others
  • "pool" section: customize it to meet you needs
For a complete reference of all the properties available in the resource adapter, please check out the RA implementation page at user guide (this is the core RA implementation used in the SoftwareAG webMethods Broker RAR package)

When you restart JBOSS, you now should see some JCA activity in the console output. If something is not quite right, and you'd like to see more of what's going on under the hood, an easy way I found was to enable deeper logging within JBOSS for the resource-adapter components:





3 - Tune SoftwareAG webMethods Broker behavior by setting the right

To tweak the webMethods broker client library, all you need to do is add the right system property in JBOSS...This can be done through the admin console at “profile > General Configuration > System Properties”, or simply be written directly in the jboss configuration (standalone.xml or domain.xml), right under the "extensions" section...
Here is a sample block containing some useful webMethods Broker properties:


4 - Create your MDBs with the right Activation Properties

Here is a sample of an MDB that just prints the received messages in the logs...(full code and working maven-enabled project is on github at

As you noticed, I specified - on purpose - a lot of the activation config available in the resource adapter implementation. Interestingly though, you don't see the "JndiProperties" activation spec here...and that's good and expected, since we specified it in the resource adapter "JndiProperties" property...That way I don't have to specify the connection definition in all my MDBs...which is great.

For a complete reference of all the activation parameters available, refer to user guide and go to the "Activation Spec Properties" section.


@MessageDriven(name = "SimpleQueueConsumerBean", activationConfig = {
  @ActivationConfigProperty(propertyName = "connectionFactoryJndiName", propertyValue = "InboundQueueConnectionFactory"),
  @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
  @ActivationConfigProperty(propertyName = "destinationJndiName", propertyValue = "simplequeue"),
        @ActivationConfigProperty(propertyName = "maxPoolSize", propertyValue = "50"),
        @ActivationConfigProperty(propertyName = "maxWaitTime", propertyValue = "10"),
        @ActivationConfigProperty(propertyName = "redeliveryAttempts", propertyValue = "10"),
        @ActivationConfigProperty(propertyName = "redeliveryInterval", propertyValue = "1"),
        @ActivationConfigProperty(propertyName = "reconnectAttempts", propertyValue = "10"),
        @ActivationConfigProperty(propertyName = "reconnectInterval", propertyValue = "5")

public class SimpleQueueConsumerBean implements MessageListener, MessageDrivenBean {

 public void onMessage(Message rcvMessage) {
  TextMessage msg = null;
  try {
   if(null != rcvMessage){
    if (rcvMessage instanceof TextMessage) {
     msg = (TextMessage) rcvMessage;"SimpleQueueConsumerBean: Received Message from queue: " + msg.getText());
    } else {
                    log.error("SimpleQueueConsumerBean: Message of wrong type: " + rcvMessage.getClass().getName());
   } else {
      "SimpleQueueConsumerBean: Received Message from queue: null");
  } catch (JMSException e) {
   throw new RuntimeException(e);

Another thing you might have noticed is the 2 JBOSS-specific annotations @ResourceAdaper (org.jboss.ejb3.annotation.ResourceAdapter) and @Pool (org.jboss.ejb3.annotation.Pool). This is 1 of the ways to specify which resource adapter and MDB pool your MDB should use.

Depending on your setup (eg. all your MDBs should be using the same resource adapter + parameters), it might just be easier/better to make the webMethods Broker resource adapter the default one, and assign it a default pool...This is done in the EJB3 subsystem, as follow:





Tune the "mdb-strict-max-pool" to meet your performance needs...One rule of thumb though: whatever "max-pool-size" you chose for the MDB pool, make sure that it's the same number also for the maxPoolSize activation property. For example, I used 50 for both.

@ActivationConfigProperty(propertyName = "maxPoolSize", propertyValue = "50"),

5 - Create message producer that uses the Resource Adapter outbound pooled connection

Ok, so now we need to send messages to that webMethods Broker queue and see if our MDB setup works ok. Sending messages to a queue is nothing new in Java JMS and we've written that type of code thousands of time. BUT the interesting part here is to send messages using the outbound connection defined in the resource adapter and identified by the JBOSS JNDI name "java:/jms/broker". By doing so, we automatically get access to app server goodness such as connection pooling for best performance...
All your code needs to do "different" (from a non-managed implementation) is to bind the connection factory to that jndi entry "java:/jms/broker", which is easily achieved using the standard @resource annotation. See the extract below for details (full class at
public class JcaMessageProducer extends HttpServlet {

    //this uses the resource-adapter to make sure it's a managed connection etc...
    @Resource(mappedName = "java:/jms/broker")
    private ConnectionFactory connectionFactory;

 private void sendMessage(String textToSend, String destinationName, boolean isQueue) throws JMSException {
        Connection connection = null;

        try {
            if (null == connectionFactory)
                throw new JMSException("connection factory is null...can't do anything.");

            connection = connectionFactory.createConnection();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

            //here we avoid a JNDI lookup...
            Destination destination;
            if (isQueue)
                destination = session.createQueue(destinationName);
                destination = session.createTopic(destinationName);

            MessageProducer messageProducer = session.createProducer(destination);
            TextMessage message = session.createTextMessage();

  "Sending new message to %s %s : %s ", (isQueue) ? "queue" : "topic", destinationName, textToSend));

            messageProducer.send(message); // Send Message

  "Messages Sent"));
        } catch (Exception e) {
            log.error("error while sending message", e);
            throw new JMSException("Couldn't send to queue");
        } finally {
            if (null != connection)

6 - Deploy and test!

To deploy the sample code directly onto your local JBOSS, a simple maven command (thanks maven and jboss plugin!):
"mvn clean package jboss-as:deploy"

Once successfully deployed, you should be able to access the URL "http://localhost:8080/jbossjca-sample-mdbs/JcaMessageProducer", providing the right parameter for queue name and number of message to send...

For example, I have a queue called "simplequeue" and want to send 20 message...hence url would be: "http://localhost:8080/jbossjca-sample-mdbs/JcaMessageProducer?queue=simplequeue&count=20"

Upon execution of that url, you should see in the JBOSS standard output the messages being submitted as well as consumed by our sample MDB...

Final Words

I think that's it for now...You can easily recreate all this on your local laptop (as long as you have access to JBOSS EAP 6 / JBOSS AS 7 and webMethods Broker that is) as I put everything I wrote about (and more) on github at j2ee-jms-examples.

To build and deploy the project:

  • First, You'll need to put the webMethods Broker client libraries in the jbossjca-sample-mdbs/libs folder (not required to compile, but required to be added to the WAR package). Another way would be to put these jars in a Jboss global module...
  • Then simple maven command: mvn clean package jboss-as:deploy -P jboss-ga-repository
In some follow up posts, I'll go over the steps involved in setting up JBOSS for SSL encryption and authentication to SoftwareAG webMethods well as specifics around JBOSS vault usage in order to encrypt the password for SSL certs...

Hope that was helpful...

Friday, July 19, 2013

Terracotta BigMemory-Hadoop connector: A detailed hands-on tutorial

In my previous post, "How to reconcile Real-Time and Batch processing using In-Memory technology: A demo at the AFCEA Cyber Symposium Plugfest", I went over the challenges and benefits of reconciling real-time analytics with batched analytics. Doing so, I explained the solution we put together to create an integrated Real-Time analytical capability "augmented" by a batched BigData Hadoop cluster.

A critical piece of that architecture is the ability for Terracotta BigMemory to act as a fast In-Memory buffer, accessible by both the real-time world and the batch world...effectively bridging the gap between the 2.
The Terracotta BigMemory-Hadoop connector is at the center of that piece, allowing hadoop to write seamlessly to BigMemory.

For general information, please refer to existing writings about this connector:
But in this post, I want to be "hands-on" and enable you to see it running for yourself on your own development box. I've outlined the 5 major steps to successfully install and test the Hadoop-to-BigMemory connector on your own development platform.
I'll be using as a guide the code I put together for "AFCEA Cyber Symposium Plugfest", available on github at

Master Step 1 - Get the software components up and running

1 - Let's download the needed components:

2 - Clone the git repository to get the cyberplugfest code: 

git clone
In the rest of the article, we will assume that $CYBERPLUGFEST_CODE_HOME is the root install directory for the code.

3 - Extract the hadoop connector somewhere on your development box. 

The content of the package has some simple instructions as well as a "wordcount" map reduce package.
If you want to explore and follow the default instructions + sample word count program, it works fine…but please note that I took some liberties when it comes to my setup…and these will be explained in this article.
In the rest of the article, we will assume that $TC_HADOOP_HOME is the root install directory of the terracotta hadoop connector.

4 - Install, configure, and start BigMemory Max 

Follow this guide at Make sure to try the helloWorld application to see if things are setup properly.
In the rest of the article, we will assume that $TC_HOME is the root directory of BigMemory Max.

I added a sample tc-config.xml at

To get bigmemory-max started with that configuration file on your local machine, run:

export CYBERPLUGFEST_CODE_HOME=<root path to cyberplugfest code cloned from github>
export TC_HOME=<root path to terracotta install>
$TC_HOME/server/bin/ -f $CYBERPLUGFEST_CODE_HOME/configs/tc-config.xml -n Server1

5 - Install Hadoop

I used the pseudo distributed mode for development…Tuning and configuring hadoop is outside the scope of this article…but should certainly be explored as a "go further" step. The apache page is good to get started on that...
In the rest of the article, we will assume that $HADOOP_INSTALL is the root install directory of Apache Hadoop

6 - Add the needed terracotta libraries to the Hadoop class path

  • The hadoop connector library: bigmemory-hadoop-0.1.jar
  • The ehcache client library: ehcache-ee-<downloaded version>.jar
  • The terracotta toolkit library: terracotta-toolkit-runtime-ee-<downloaded version>.jar
Note: I've downloaded on step 4 the version 4.0.2 of bigmemory-max, so that's the version I'll be using here. Adjust appropriately the HADOOP_CLASSPATH below based on the version you downloaded.

Edit $HADOOP_INSTALL/conf/ and add the following towards the top (replace the default empty HADOOP_CLASSPATH= line with it)
export TC_HOME=<root path to terracotta install>
export TC_HADOOP_HOME=<root path to hadoop install>
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${TC_HADOOP_HOME}/jars/bigmemory-hadoop-0.1.jar:${TC_HOME}/apis/ehcache/lib/ehcache-ee-2.7.2.jar:${TC_HOME}/apis/toolkit/lib/terracotta-toolkit-runtime-ee-4.0.2.jar

7 - Start Hadoop in pseudo-distributed mode

Master Step 2 - Write the Map/Reduce job using spring-hadoop and Terracotta BigMemory output connector

Ok, at this point, you should have all the software pieces (big memory max and hadoop) ready and running in the background. Now it's time to build a map/reduce job that will output something in Terracotta BigMemory. For the "AFCEA Cyber Symposium Plugfest" which this article is based on, I decided to build a simple "Mean Calculation" map/reduce job…the idea being that the job would run on a schedule, calculate the mean for all the transactions per Vendor, and output the calculated mean per vendor into a Terracotta BigMemory cache.

Key Value
Vendor A Mean A
Vendor B Mean B
... ...
Vendor N Mean N

And since I really like Spring ( and wanted to extend the simple hadoop wordCount example, I decided to use Spring-Data Hadoop ( to build the map reduce job for the plugfest.

Some really good tutorials for Spring Hadoop out there, so I don't want to duplicate here…One I liked for it's simplicity and clarity was

Rather, I'll concentrate at the specificities related to the Terracotta BigMemory output writing.
Code available at:

1 - Let's explore the application-context.xml

a - Specify the output cache name for the BigMemory hadoop job

In the <hdp:configuration><hdp:configuration>, make sure to add the "bigmemory.output.cache" entry that specifies the output cache. Since our output cache is "vendorAvgSpend", it should basically be: bigmemory.output.cache=vendorAvgSpend
NOTE: I use Maven resource plugin, so this value is actually specific in the pom.xml (in the property "hadoop.output.cache")

b - Check the difference between hadoop jobs

  • hdjob-vendoraverage=standard M/R job that outputs to HDFS
  • hdjob-vendoraverage-bm=the same M/R job that outputs to BigMemory
You'll notice 4 differences:
  1. output-format
    1. For the hadoop BigMemory job (hdjob-vendoraverage-bm), output-format value is: "org.terracotta.bigmemory.hadoop.BigmemoryOutputFormat"
    2. For hdjob-vendoraverage, it's the standard "org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
  2. output-path
    1. It is not needed for the hadoop BigMemory job since it does not write onto HDFS...
  3. reducer
    1. For the hadoop BigMemory job, a different reducer implementation is needed (org.terracotta.pocs.cyberplugfest.VendorSalesAvgReducerBigMemory) because you need to return an object of type "BigmemoryElementWritable"
    2. For hdjob-vendoraverage job, the reducer returns an object of type"Text".
  4. files
    1. In the hdjob-vendoraverage-bm job, you need to add the terracotta license file so the hadoop job can connect to the terracotta bigmemory (enterprise feature)

c - Specify the job to run.

Done in the <hdp:job-runner ...> tag. You can switch back and forth to see the difference...

2 - Now, let's look at the reducers

  • hdjob-vendoraverage-bm reducer class: org.terracotta.pocs.cyberplugfest.VendorSalesAvgReducerBigMemory
  • hdjob-vendoraverage: org.terracotta.pocs.cyberplugfest.VendorSalesAvgReducer
The difference is pretty much the return type that must be a "BigmemoryElementWritable" type if you want to output the results to Terracotta BigMemory.

3 - Include the cache configuration (Ehcache.xml) in your M/R project

To specify the details for the vendorAvgSpend cache. Using the maven conventions, the file is included (along my other resources files) in the resources folder (

In this ehcache.xml file, you'll notice our hadoop output cache (as well as several other caches that are NOT used by the hadoop jobs). The one thing that is needed is that it must be a "distributed" cache - in other word, the data will be stored on the BigMemory Max Server instance that should be already running on your development box (The "" and "" tags specifies that)

For more info on that, go to

Master Step 3 - Prepare the sample data

In the real demo scenario, I use Apache Flume ( to "funnel" near real-time the generated sample data into HDFS…But for the purpose of this test, it can all work fine with some sample data. All we need to do is import the data into our local HDFS.

Extract the sample at: $CYBERPLUGFEST_CODE_HOME/HadoopJobs/SampleTransactionsData/
It should create a "flume" folder with the following hierarchy:
  1. flume/
    1. events/
      1. 13-07-17/
        1. events.* (those are the files with the comma separated data)

Navigate to $CYBERPLUGFEST_CODE_HOME/HadoopJobs/SampleTransactionsData/
Run the hadoop shell "put" command to add all these files into HDFS:
$HADOOP_INSTALL/bin/hadoop dfs -put flume/ .
Once done, verify that the data is in HDFS by running (This should bring a lot of event files…):
$HADOOP_INSTALL/bin/hadoop dfs -ls flume/events/13-07-17/ 

Master Step 4 - Compile and Run the hadoop job

Still referring to the Cyberplugfest code available at, you'll need to simply execute a maven build to get going.

Before that though, make sure the maven properties are right for your environment (i.e. hadoop name node url, hadoop job tracker url, terracotta url, cache name, etc…). These properties are specified towards the end of the pom file, in the maven profiles I created for that event (dev profile is for my local, prod profile is to deploy in the amazon ec2 cloud)

Then, navigate to the $CYBERPLUGFEST_CODE_HOME/HadoopJobs folder and run:
mvn clean package appassembler:assemble
This should build without an issue, and create a "PlugfestHadoopApp" executable script (the maven appassembler plugin helps with that) in $CYBERPLUGFEST_CODE_HOME/HadoopJobs/target/appassembler/bin folder.

Depending on your platform (window or nix), chose the right script (sh or bat) and run:
sh $CYBERPLUGFEST_CODE_HOME/HadoopJobs/target/appassembler/bin/PlugfestHadoopApp
Your hadoop job should be running.

Master Step 5 - Verify data is written to Terracotta BigMemory

Now we'll verify that the data was written to BigMemory from the hadoop job. Simply run:
sh $CYBERPLUGFEST_CODE_HOME/HadoopJobs/target/appassembler/bin/VerifyBigmemoryData
You should see 6 entries being printed for cache vendorAvgSpend

Final Words

Using this hadoop-to-bigmemory connector, you can truly start to think: "I can now access all my BigData insights at micro-second speed directly from within all my enterprise applications, AND confidently rely on the fact that these insights will be updated automatically whenever you hadoop jobs are running next".

Hope you find this hands-on post useful.

Monday, July 1, 2013

How to reconcile Real-Time and Batch processing using In-Memory technology: A demo at the AFCEA Cyber Symposium Plugfest

As you might remember, we(*) participated in a "Plugfest"(**) earlier this year in San Diego. Here is the summary post of what we built for that occasion:

This time around, we entered the plugfest competition as a technology provider at the AFCEA Cyber Symposium, which happened last week (June 25-27 2013) in Baltimore. We not only provided technologies components and data feeds to the challengers (San Diego State University, GMU, Army PEO C3T milSuite), but also built a very cool Fraud Detection and Money Laundering demo which was 1 of the plugfest use case for this cyber event.

Our demo was centered around a fundamental "Big Data" question: How can you detect fraud on 100,000s transactions per seconds in real-time (which is absolutely critical if you don't want to lose lots of $$$$ to fraud) while efficiently incorporating in that real-time process data from external systems (i.e. data warehouse or hadoop clusters).
Or in more general words: How to reconcile Real-time processing and Batch processing when dealing with large amounts of data.

To answer this question, we put together a demo centered around Terracotta's In-Genius intelligence platform ( which provides a highly scalable low-latency in-memory layer capable of "reconciling" the real-time processing needs (ultra low latency with large amounts of new transactions) with the traditional batch processing needs (100s of TB/PB processed in an asynchronous background jobs), all bundled in a simple software package deployable on any commodity hardware.

Here is the solution we assembled:

Cyber Plugfest Software Architecture
How to reconcile real-time and batch processing

A quick view at how it all works:
  1. A custom transaction simulator generates pseudo-random fictional credit card transactions and publish all of them onto a JMS topic (Terracotta Universal Messaging bus)
  2. Each JMS message is delivered through pub/sub messaging to both real-time and batch track:
    1. The Complex Event Processing (CEP) engine which will identify fraud in real-time through the use of continuous queries.
      1. See "Real-Time fraud detection route"
    2. Apache Flume, an open source platform which will efficiently and reliably route all the messages into HDFS for further batch processing.
      1. See "Batch Processing Route"
  3. Batch Processing Route:
    1. Apache hadoop to collect and store all the transaction data in its powerful batch-optimized file system
    2. Map-Reduce jobs to compute transaction trends (simplified rolling average in this demo case) on the full transaction data for each vendors, customer, or purchase types.
    3. Output of map-reduce jobs stored in Terracotta BigMemory Max In-Memory platform.
  4. Real-Time fraud detection route:
    1. CEP fraud detection queries fetch from Terracotta BigMemory Max (microsecond latency under load) the hadoop-calculated averages (in 3.2), and correlates those with the current incoming transaction to detect anomalies (potential fraud) in real-time.
    2. Mashzone, a mashup and data aggregation tool to provide visualization on detected fraud data.
    3. For other plugfest challengers and technology providers to be able to use our data, all our data feeds were also available in REST, SOAP, and Web socket formats (which were used by ESRI, Visual Analytics, and others)

As I hope you can see in this post, having a scalable and powerful in-memory layer acting as the middle man between Hadoop and CEP is the key to providing true real-time analysis while still taking advantage of all the powerful computing capabilities that Hadoop has to offer.

In further posts, I'll explain in more detail all the components and code (code and configs are available on github at


(*) "we" = The SoftwareAG Government Solutions team, which I'm part of...
(**) "Plugfest" = "collaborative competitive challenge where industry vendors, academic, and government teams work towards solving a specific set of "challenges" strictly using the RI2P industrial best practices (agile, open standard, SOA, cloud, etc.) for enterprise information system development and deployment." (source: