About Me

I love Java-related technologies. Recently I have been researching Enterprise Integration (SOA and messaging), Mobility, and Big Data. I have worked with Java technologies as a Software Architect, Enterprise Architect, and Software Developer/Engineer for over 11 years. Currently, I am working as a Senior Consultant at VMware Inc.

Tuesday, November 20, 2012

Spring bean lifecycle

1 Spring instantiates the bean.
2 Spring injects values and bean references into the bean’s properties.
3 If the bean implements BeanNameAware, Spring passes the bean’s ID to the setBeanName() method.
4 If the bean implements BeanFactoryAware, Spring calls the setBeanFactory() method, passing in the bean factory itself.
5 If the bean implements ApplicationContextAware, Spring will call the setApplicationContext() method, passing in a reference to the enclosing application context.
6 If any of the beans implement the BeanPostProcessor interface, Spring calls their postProcessBeforeInitialization() method.
7 If any beans implement the InitializingBean interface, Spring calls their afterPropertiesSet() method. Similarly, if the bean was declared with an init-method, then the specified initialization method will be called.
8 If there are any beans that implement BeanPostProcessor, Spring will call their postProcessAfterInitialization() method.
9 At this point, the bean is ready to be used by the application and will remain in the application context until the application context is destroyed.
10 If any beans implement the DisposableBean interface, then Spring will call their destroy() methods. Likewise, if any bean was declared with a destroy-method, then the specified method will be called.
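
To make steps 3, 7, and 10 concrete, here is a minimal sketch of a bean that hooks into those callbacks (the class name AuditService is made up for this example; it is not part of any real project):

import org.springframework.beans.factory.BeanNameAware;
import org.springframework.beans.factory.DisposableBean;
import org.springframework.beans.factory.InitializingBean;

public class AuditService implements BeanNameAware, InitializingBean, DisposableBean {

    private String beanName;

    // step 3: Spring passes in the bean's ID from the configuration
    public void setBeanName(String name) {
        this.beanName = name;
    }

    // step 7: called once all properties have been injected
    public void afterPropertiesSet() throws Exception {
        System.out.println("Initializing bean: " + beanName);
    }

    // step 10: called when the application context is destroyed
    public void destroy() throws Exception {
        System.out.println("Destroying bean: " + beanName);
    }
}

The init-method and destroy-method attributes on the <bean> element give you the same initialization and destruction hooks without coupling the class to Spring's interfaces.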

Wednesday, November 14, 2012

Hadoop (setup fully-distributed/multiple nodes mode)

Coming soon!

Hadoop (setup in standalone)

Pre-setup
1) Install Hadoop
2) Set up the environment variables
  • JAVA_HOME
  • HADOOP_HOME

Setup SSH for a Hadoop cluster

1) Define a common account
Create a user-level account with no Hadoop management privileges. Assume it is "hadoopUser".

2) Generate SSH key pair
Execute "ssh-keygen -t rsa" and follow the prompts for additional inputs.
The generated public key is stored in the location you specified.

3) Distribute the public key to all nodes (master and slaves)
scp <the location of your public key> hadoopUser@<hostname>:<new location>/master_key

On the target host, execute the following commands:
$ mkdir ~/.ssh
$ chmod 700 ~/.ssh
$ mv ~/master_key ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
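
To verify the setup, run "ssh hadoopUser@<hostname>" from the machine that holds the private key; it should log you in without prompting for a password.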

4) Hadoop configuration
cd $HADOOP_HOME
In "hadoop-env.sh" add "export JAVA_HOME=/usr/share/jdk"

5) The three main configuration files should be left empty:
  1. core-site.xml
  2. hdfs-site.xml
  3. mapred-site.xml
With these files empty, Hadoop runs completely on the local machine and does not launch any of the Hadoop daemons.
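
As a quick sanity check (a sketch, assuming the Hadoop 1.x property names and the old org.apache.hadoop.mapred API), you can print the effective configuration: with the three files left empty, Hadoop falls back to its built-in defaults, which point the filesystem at the local disk and the job tracker at the local in-process runner.

import org.apache.hadoop.mapred.JobConf;

public class StandaloneCheck {
    public static void main(String[] args) {
        // With empty *-site.xml files, only the bundled *-default.xml values apply.
        JobConf conf = new JobConf();
        System.out.println("fs.default.name    = " + conf.get("fs.default.name"));    // expect file:///
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker")); // expect local
    }
}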

Tuesday, November 13, 2012

Hadoop clustering components

A Hadoop cluster is composed of the following daemons, running on a single server or across multiple servers:
  • NameNode -- keeps track of the file metadata: which files are in the system and how each file is broken down into blocks
  • DataNode -- stores and serves the actual data blocks and constantly reports back to the NameNode so the block metadata stays up to date
  • Secondary NameNode -- assistant daemon for monitoring the state of the cluster's HDFS. It communicates with the NameNode to take snapshots of the HDFS metadata at intervals defined by the cluster configuration
  • JobTracker -- the liaison between your application and Hadoop. It determines the execution plan by deciding which files to process, assigns nodes to different tasks, and monitors all tasks as they run (see the job-submission sketch after this list)
    • one per Hadoop cluster
    • automatically relaunches failed tasks
    • oversees the overall execution of a MapReduce job
  • TaskTracker -- slave to the JobTracker
    • executes the individual tasks that the JobTracker assigns
    • one per slave node
    • able to spawn multiple map or reduce tasks in parallel
    • sends heartbeats to the JobTracker
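
To see how these daemons fit together from the client side, here is a minimal WordCount-style job written against the old org.apache.hadoop.mapred API (a sketch of the standard example, not a production job): the driver submits the job through JobClient, the JobTracker builds the execution plan, and the TaskTrackers on the slave nodes run the individual map and reduce tasks.

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCount {

    // map task (run by a TaskTracker): emit (word, 1) for every token in the input split
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                output.collect(word, one);
            }
        }
    }

    // reduce task (also run by a TaskTracker): sum the counts for each word
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // JobClient hands the job to the JobTracker and blocks until it finishes,
        // while the JobTracker monitors (and relaunches, if needed) the tasks.
        JobClient.runJob(conf);
    }
}

Package the class into a jar and submit it with something like "hadoop jar wordcount.jar WordCount <input dir> <output dir>" (the jar name is illustrative).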

Hadoop commands

Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  • namenode -format           format the DFS filesystem
  • secondarynamenode     run the DFS secondary namenode
  • namenode                     run the DFS namenode
  • datanode                      run a DFS datanode
  • dfsadmin                      run a DFS admin client
  • fsck                             run a DFS filesystem checking utility
  • fs                                 run a generic filesystem user client
  • balancer                       run a cluster balancing utility
  • jobtracker                    run the MapReduce job Tracker node
  • pipes                            run a Pipes job
  • tasktracker                   run a MapReduce task Tracker node
  • job                               manipulate MapReduce jobs
  • version                         print the version
  • jar <jar>                       run a jar file
  • distcp <srcurl> <desturl> copy file or directories recursively
  • archive -archiveName NAME <src>* <dest> create a hadoop archive
  • daemonlog                     get/set the log level for each daemon
  • CLASSNAME              run the class named CLASSNAME

Hadoop (software stack)

Currently there are nine sub-projects in Hadoop:
  • Common - common code
  • Avro - serialization and RPC
  • MapReduce - computation
  • HDFS - storage
  • Pig - data flow language
  • Hive - data warehousing and query language
  • HBase - column-oriented database
  • ZooKeeper - coordination service
  • Chukwa - data collection and analysis

Sunday, November 11, 2012

Useful Hadoop CLI commands

$ hadoop fs -ls /                           (list all files in HDFS root directory)
$ hadoop job -list                         (find all running MapReduce jobs)
$ for svc in /etc/init.d/hadoop-0.20-*; do sudo $svc start; done    (start up the Hadoop cluster)
$ for svc in /etc/init.d/hadoop-0.20-*; do sudo $svc stop; done    (stop the Hadoop cluster)
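
The first listing can also be done from Java through the HDFS FileSystem API. Below is a minimal sketch (the class name ListRoot is made up) that is roughly equivalent to "hadoop fs -ls /":

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListRoot {
    public static void main(String[] args) throws Exception {
        // Picks up fs.default.name from the configuration on the classpath,
        // so it lists whatever filesystem the cluster is configured to use.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
        }
    }
}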

(HDFS commands)
http://hadoop.apache.org/docs/r1.0.0/file_system_shell.html

(MapReduce commands)
http://hadoop.apache.org/docs/r1.0.0/commands_manual.html#job