Spring into Apache Hadoop

Spring is a popular open-source application development framework for enterprise Java. Spring focuses on the “plumbing” of enterprise applications so that development teams can focus on application-level business logic.

Spring for Apache Hadoop is a sub-project under the Spring Data umbrella. It supports the development of applications based on Apache Hadoop technologies by leveraging the capabilities of the Spring ecosystem. Whether you are writing MapReduce applications, performing HBase analytics, or coordinating Hive jobs, Spring for Apache Hadoop harnesses the power of the Spring framework to offer a simplified programming model and to address the complexity caused by the infrastructure.

Let’s walk through an example that shows how to create a Java application that scans the data in an HBase table. Assume there is an HBase table ‘test’ with one column, “cf:a”, and two rows.


hbase(main):001:0> scan 'test'
ROW           COLUMN+CELL
row1         column=cf:a, timestamp=1358276934265, value=foo
row2         column=cf:a, timestamp=1358367853745, value=bar
2 row(s) in 0.4020 seconds

Maven is the easiest way to set up a Spring for Apache Hadoop application. Add the following dependencies to the Maven configuration file, pom.xml.


<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-hadoop</artifactId>
    <version>1.0.0.RC1</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.0.0</version>
</dependency>

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>0.94.3</version>
</dependency>

<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>1.5.8</version>
</dependency>

<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-tx</artifactId>
    <version>3.0.7.RELEASE</version>
</dependency>

As in most XML-configured Spring applications, there is a Spring configuration file, applicationContext.xml. Here we add definitions for the Hadoop and HBase configuration, as well as our own client bean, “hbaseClient”, which is built on Spring’s “hbaseTemplate”.


<!-- The hdp and p namespace prefixes must be declared on the enclosing <beans> element -->
<hdp:configuration />
<hdp:hbase-configuration>
    hbase.zookeeper.quorum=bd01.iguident.com
</hdp:hbase-configuration>

<bean id="hbaseTemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate" p:configuration-ref="hbaseConfiguration"/>
<bean id="hbaseClient" class="com.guident.bd.hbase.HBaseClientImpl">
    <property name="hbaseTemplate" ref="hbaseTemplate" />
</bean>
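The line inside the hbase-configuration element is an ordinary key=value entry. As a rough illustration of how such an entry is read, the sketch below parses the same text with the JDK's java.util.Properties. It assumes the element body follows the standard Properties format; the class name InlinePropertiesSketch and its parse method are hypothetical and are not part of Spring for Apache Hadoop.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only; not a Spring for Apache Hadoop class.
public class InlinePropertiesSketch {

    /** Parses key=value text (assumed to be in java.util.Properties format). */
    public static Properties parse(String body) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(body));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Same entry as in the configuration file above.
        Properties props = parse("hbase.zookeeper.quorum=bd01.iguident.com\n");
        System.out.println(props.getProperty("hbase.zookeeper.quorum"));
    }
}
```

Further HBase settings could be added in the same way, one key=value pair per line.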

Spring’s HbaseTemplate provides the API for working with HBase tables. Here we use its “find” method to scan the “cf:a” column in the “test” table. The method returns the values “foo” and “bar” stored in the table.


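// Imports needed by this snippet
import java.util.List;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.springframework.data.hadoop.hbase.RowMapper;

// Scan column "cf:a" of table "test" and map each row's value to a String
List<String> rows = hbaseTemplate.find("test", "cf", "a",
    new RowMapper<String>() {
        @Override
        public String mapRow(Result result, int rowNum) throws Exception {
            // value() returns the value of the first column in the Result
            byte[] bytes = result.value();
            return Bytes.toString(bytes);
        }
    });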
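To make the callback style concrete, here is a minimal, self-contained sketch of the pattern that “find” follows: the template walks the rows and hands each one to a mapper, which converts the raw bytes into an application object. It uses only the JDK. RowMapperSketch, InMemoryTable, and the nested RowMapper interface are hypothetical stand-ins invented for this illustration, not Spring for Apache Hadoop classes, and the real HbaseTemplate also takes care of table access and resource cleanup, which are omitted here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the template/row-mapper pattern; NOT the Spring API.
public class RowMapperSketch {

    /** Callback that turns one stored cell value into an application object. */
    interface RowMapper<T> {
        T mapRow(byte[] value, int rowNum) throws Exception;
    }

    /** Toy table: row key to the value of a single column, iterated in key order. */
    static final class InMemoryTable {
        private final Map<String, byte[]> cells = new TreeMap<>();

        void put(String rowKey, String value) {
            cells.put(rowKey, value.getBytes(StandardCharsets.UTF_8));
        }

        /** Visits every row in key order and collects whatever the mapper returns. */
        <T> List<T> find(RowMapper<T> mapper) {
            List<T> results = new ArrayList<>();
            int rowNum = 0;
            try {
                for (byte[] value : cells.values()) {
                    results.add(mapper.mapRow(value, rowNum++));
                }
            } catch (Exception e) {
                // Convert the mapper's checked exception into an unchecked one.
                throw new IllegalStateException("Row mapping failed", e);
            }
            return results;
        }
    }

    /** Builds the two-row example table and maps each value to a String. */
    public static List<String> scanExample() {
        InMemoryTable table = new InMemoryTable();
        table.put("row2", "bar");
        table.put("row1", "foo");
        return table.find((value, rowNum) -> new String(value, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(scanExample()); // prints [foo, bar]
    }
}
```

The point of the design is that the iteration logic lives in one place while each caller supplies only the conversion step, which is the same division of labor the HbaseTemplate example relies on.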

As the example shows, Spring for Apache Hadoop hides much of the complexity of the Hadoop platform behind developer-friendly configuration and APIs. It gives Java developers a practical way to build big data solutions on Apache Hadoop.
