About Me

I love Java-related technologies. Recently I have been researching Enterprise Integration (SOA and messaging), Mobility, and Big Data. I have worked with Java technologies as a Software Architect, Enterprise Architect, and Software Developer/Engineer for over 11 years. Currently, I am working as a Senior Consultant at VMware Inc.

Tuesday, November 20, 2012

Spring bean lifecycle

1 Spring instantiates the bean.
2 Spring injects values and bean references into the bean’s properties.
3 If the bean implements BeanNameAware, Spring passes the bean’s ID to the setBeanName() method.
4 If the bean implements BeanFactoryAware, Spring calls the setBeanFactory() method, passing in the bean factory itself.
5 If the bean implements ApplicationContextAware, Spring will call the setApplicationContext() method, passing in a reference to the enclosing application context.
6 If any of the beans implement the BeanPostProcessor interface, Spring calls their postProcessBeforeInitialization() method.
7 If any beans implement the InitializingBean interface, Spring calls their afterPropertiesSet() method. Similarly, if the bean was declared with an init-method, then the specified initialization method will be called.
8 If there are any beans that implement BeanPostProcessor, Spring will call their postProcessAfterInitialization() method.
9 At this point, the bean is ready to be used by the application and will remain in the application context until the application context is destroyed.
10 If any beans implement the DisposableBean interface, then Spring will call their destroy() methods. Likewise, if any bean was declared with a destroy-method, then the specified method will be called.
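
To make steps 3, 7, and 10 concrete, here is a minimal sketch of a bean that hooks into those callbacks (the class name AuditService is made up for this example; it is not part of any real project):

import org.springframework.beans.factory.BeanNameAware;
import org.springframework.beans.factory.DisposableBean;
import org.springframework.beans.factory.InitializingBean;

public class AuditService implements BeanNameAware, InitializingBean, DisposableBean {

    private String beanName;

    // step 3: Spring passes in the bean's ID from the configuration
    public void setBeanName(String name) {
        this.beanName = name;
    }

    // step 7: called once all properties have been injected
    public void afterPropertiesSet() throws Exception {
        System.out.println("Initializing bean: " + beanName);
    }

    // step 10: called when the application context is destroyed
    public void destroy() throws Exception {
        System.out.println("Destroying bean: " + beanName);
    }
}

The init-method and destroy-method attributes on the <bean> element give you the same initialization and destruction hooks without coupling the class to Spring's interfaces.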

Wednesday, November 14, 2012

Hadoop (setup fully-distributed/multiple nodes mode)

Coming soon!

Hadoop (setup in standalone)

Pre-setup
1) Install Hadoop
2) Set up the environment variables
  • JAVA_HOME
  • HADOOP_HOME

Setup SSH for a Hadoop cluster

1) Define a common account
Create a user-level account with no Hadoop management privileges. Assume it is "hadoopUser".

2) Generate SSH key pair
Execute "ssh-keygen -t rsa" and follow the prompts for additional inputs.
The generated public key is stored in the location you specified.

3) Distribute the public key to all nodes (master and slaves)
scp <the location of your public key> hadoopUser@<hostname>:<new location>/master_key

On the target host, execute the following commands:
$ mkdir ~/.ssh
$ chmod 700 ~/.ssh
$ mv ~/master_key ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
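
To verify the setup, run "ssh hadoopUser@<hostname>" from the machine that holds the private key; it should log you in without prompting for a password.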

4) Hadoop configuration
cd $HADOOP_HOME
In "hadoop-env.sh" add "export JAVA_HOME=/usr/share/jdk"

5) The three main configuration files should be left empty:
  1. core-site.xml
  2. hdfs-site.xml
  3. mapred-site.xml
With these files empty, Hadoop runs completely on the local machine and does not launch any of the Hadoop daemons.
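
As a quick sanity check (a sketch, assuming the Hadoop 1.x property names and the old org.apache.hadoop.mapred API), you can print the effective configuration: with the three files left empty, Hadoop falls back to its built-in defaults, which point the filesystem at the local disk and the job tracker at the local in-process runner.

import org.apache.hadoop.mapred.JobConf;

public class StandaloneCheck {
    public static void main(String[] args) {
        // With empty *-site.xml files, only the bundled *-default.xml values apply.
        JobConf conf = new JobConf();
        System.out.println("fs.default.name    = " + conf.get("fs.default.name"));    // expect file:///
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker")); // expect local
    }
}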

Tuesday, November 13, 2012

Hadoop clustering components

A Hadoop cluster is composed of the following daemons, running on a single server or across multiple servers:
  • NameNode -- keeps track of the file metadata: which files are in the system and how each file is broken down into blocks
  • DataNode -- stores and serves the actual data blocks and constantly reports back to the NameNode so the block metadata stays up to date
  • Secondary NameNode -- assistant daemon for monitoring the state of the cluster's HDFS. It communicates with the NameNode to take snapshots of the HDFS metadata at intervals defined by the cluster configuration
  • JobTracker -- the liaison between your application and Hadoop. It determines the execution plan by deciding which files to process, assigns nodes to different tasks, and monitors all tasks as they run (see the job-submission sketch after this list)
    • one per Hadoop cluster
    • automatically relaunches failed tasks
    • oversees the overall execution of a MapReduce job
  • TaskTracker -- slave to the JobTracker
    • executes the individual tasks that the JobTracker assigns
    • one per slave node
    • able to spawn multiple map or reduce tasks in parallel
    • sends heartbeats to the JobTracker
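
To see how these daemons fit together from the client side, here is a minimal WordCount-style job written against the old org.apache.hadoop.mapred API (a sketch of the standard example, not a production job): the driver submits the job through JobClient, the JobTracker builds the execution plan, and the TaskTrackers on the slave nodes run the individual map and reduce tasks.

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCount {

    // map task (run by a TaskTracker): emit (word, 1) for every token in the input split
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                output.collect(word, one);
            }
        }
    }

    // reduce task (also run by a TaskTracker): sum the counts for each word
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // JobClient hands the job to the JobTracker and blocks until it finishes,
        // while the JobTracker monitors (and relaunches, if needed) the tasks.
        JobClient.runJob(conf);
    }
}

Package the class into a jar and submit it with something like "hadoop jar wordcount.jar WordCount <input dir> <output dir>" (the jar name is illustrative).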

Hadoop commands

Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  • namenode -format           format the DFS filesystem
  • secondarynamenode     run the DFS secondary namenode
  • namenode                     run the DFS namenode
  • datanode                      run a DFS datanode
  • dfsadmin                      run a DFS admin client
  • fsck                             run a DFS filesystem checking utility
  • fs                                 run a generic filesystem user client
  • balancer                       run a cluster balancing utility
  • jobtracker                    run the MapReduce job Tracker node
  • pipes                            run a Pipes job
  • tasktracker                   run a MapReduce task Tracker node
  • job                               manipulate MapReduce jobs
  • version                         print the version
  • jar <jar>                       run a jar file
  • distcp <srcurl> <desturl> copy file or directories recursively
  • archive -archiveName NAME <src>* <dest> create a hadoop archive
  • daemonlog                     get/set the log level for each daemon
  • CLASSNAME              run the class named CLASSNAME

Hadoop (software stack)

Currently there are nine sub-projects in Hadoop:
  • Common - common code
  • Avro - serialization and RPC
  • MapReduce - computation
  • HDFS - storage
  • Pig - data flow language
  • Hive - data warehousing and query language
  • HBase - column-oriented database
  • ZooKeeper - coordination service
  • Chukwa - data collection and analysis

Sunday, November 11, 2012

Useful Hadoop CLI commands

$ hadoop fs -ls /                           (list all files in HDFS root directory)
$ hadoop job -list                         (find all running MapReduce jobs)
$ for svc in /etc/init.d/hadoop-0.20-*; do sudo $svc start; done    (start up the Hadoop cluster)
$ for svc in /etc/init.d/hadoop-0.20-*; do sudo $svc stop; done    (stop the Hadoop cluster)
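
The first listing can also be done from Java through the HDFS FileSystem API. Below is a minimal sketch (the class name ListRoot is made up) that is roughly equivalent to "hadoop fs -ls /":

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListRoot {
    public static void main(String[] args) throws Exception {
        // Picks up fs.default.name from the configuration on the classpath,
        // so it lists whatever filesystem the cluster is configured to use.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
        }
    }
}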

(HDFS commands)
http://hadoop.apache.org/docs/r1.0.0/file_system_shell.html

(MapReduce commands)
http://hadoop.apache.org/docs/r1.0.0/commands_manual.html#job