1 Spring instantiates the bean.
2 Spring injects values and bean references into the bean's properties.
3 If the bean implements BeanNameAware, Spring passes the bean's ID to the setBeanName() method.
4 If the bean implements BeanFactoryAware, Spring calls the setBeanFactory() method, passing in the bean factory itself.
5 If the bean implements ApplicationContextAware, Spring will call the setApplicationContext() method, passing in a reference to the enclosing application context.
6 If any of the beans implement the BeanPostProcessor interface, Spring calls their postProcessBeforeInitialization() method.
7 If any beans implement the InitializingBean interface, Spring calls their afterPropertiesSet() method. Similarly, if the bean was declared with an init-method, then the specified initialization method will be called.
8 If there are any beans that implement BeanPostProcessor, Spring will call their postProcessAfterInitialization() method.
9 At this point, the bean is ready to be used by the application and will remain in the application context until the application context is destroyed.
10 If any beans implement the DisposableBean interface, then Spring will call their destroy() method. Likewise, if any bean was declared with a destroy-method, then the specified method will be called.
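As a concrete illustration, here is a minimal sketch of a bean that hooks into a few of these callbacks (the class name AuditService is hypothetical, and the printlns just stand in for real work):

import org.springframework.beans.factory.BeanNameAware;
import org.springframework.beans.factory.DisposableBean;
import org.springframework.beans.factory.InitializingBean;

// Hypothetical bean that participates in steps 3, 7, and 10 above.
public class AuditService implements BeanNameAware, InitializingBean, DisposableBean {

    private String beanName;

    // Step 3: Spring passes in the bean's ID.
    public void setBeanName(String name) {
        this.beanName = name;
    }

    // Step 7: called once values and references have been injected.
    public void afterPropertiesSet() {
        System.out.println(beanName + " initialized");
    }

    // Step 10: called when the application context is destroyed.
    public void destroy() {
        System.out.println(beanName + " destroyed");
    }
}

The same initialization and destruction hooks can also be declared on the bean definition itself via init-method and destroy-method, without implementing any Spring interfaces.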
Wednesday, November 14, 2012
Hadoop (setup in standalone)
Pre-setup
1) Install Hadoop
2) Set up the environment variables
- JAVA_HOME
- HADOOP_HOME
Set up SSH for the Hadoop cluster
1) Define a common account
Create a user-level account with no Hadoop management privileges; assume it is "hadoopUser"
2) Generate an SSH key pair
Execute "ssh-keygen -t rsa" and follow the prompts for any additional input
The public key is stored in the location you specify
3) Distribute the public key to all nodes (master and slaves)
scp <the location of your public key> hadoopUser@<hostname>:<new location>/master_key
On each target host, execute the following commands
$ mkdir ~/.ssh
$ chmod 700 ~/.ssh
$ mv ~/master_key ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
4) Hadoop configuration
cd $HADOOP_HOME
In "conf/hadoop-env.sh" add "export JAVA_HOME=/usr/share/jdk"
5) For standalone mode, the 3 main configuration files should be left empty (see the sketch after this list)
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
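With the three site files left empty, Hadoop falls back to its local (standalone) defaults: the local filesystem and an in-process job runner. Here is a minimal sketch to sanity-check this, assuming the Hadoop jars are on the classpath (the class name StandaloneCheck is just for illustration):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapred.JobConf;

// With empty core-site.xml / hdfs-site.xml / mapred-site.xml,
// Hadoop should report its local (standalone) defaults.
public class StandaloneCheck {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();   // loads both the core and mapred config resources
        // Expected to default to "file:///" (local filesystem) when core-site.xml is empty.
        System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
        // Expected to default to "local" (no JobTracker) when mapred-site.xml is empty.
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
        // Should resolve to the local filesystem in standalone mode.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("filesystem         = " + fs.getUri());
    }
}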
Tuesday, November 13, 2012
Hadoop clustering components
A Hadoop cluster is composed of the following daemons, running on a single server or spread across multiple servers (a small status-check sketch follows this list):
- NameNode -- keeps track of the file metadata: which files are in the system and how each file is broken down into blocks
- DataNode -- stores the actual HDFS data blocks and constantly reports back to the NameNode to keep the block metadata up to date
- Secondary NameNode -- assistant daemon for monitoring the state of the cluster's HDFS. It communicates with the NameNode to take snapshots of the HDFS metadata at intervals defined by the cluster configuration
- JobTracker -- the liaison between your application and Hadoop. It determines the execution plan: which files to process, which nodes to assign to which tasks, and it monitors all tasks as they run
- one per Hadoop cluster
- automatically relaunches failed tasks
- oversees the overall execution of a MapReduce job
- TaskTracker -- slave to the JobTracker
- executes the individual tasks that the JobTracker assigns
- one per slave node
- able to spawn multiple map or reduce tasks in parallel
- sends heartbeats to the JobTracker
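As a rough sketch of how these daemons fit together from a client's point of view, the old org.apache.hadoop.mapred API lets a program ask the JobTracker about its TaskTrackers; this assumes mapred-site.xml on the classpath points at the JobTracker, and the class name ClusterReport is just for illustration:

import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Asks the JobTracker for a summary of the TaskTrackers it oversees.
public class ClusterReport {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();            // picks up mapred-site.xml from the classpath
        JobClient client = new JobClient(conf);  // connects to the JobTracker
        ClusterStatus status = client.getClusterStatus();
        System.out.println("TaskTrackers      : " + status.getTaskTrackers());
        System.out.println("Running map tasks : " + status.getMapTasks());
        System.out.println("Running reduces   : " + status.getReduceTasks());
        System.out.println("Max map slots     : " + status.getMaxMapTasks());
        System.out.println("Max reduce slots  : " + status.getMaxReduceTasks());
    }
}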
Hadoop commands
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
- namenode -format format the DFS filesystem
- secondarynamenode run the DFS secondary namenode
- namenode run the DFS namenode
- datanode run a DFS datanode
- dfsadmin run a DFS admin client
- fsck run a DFS filesystem checking utility
- fs run a generic filesystem user client
- balancer run a cluster balancing utility
- jobtracker run the MapReduce job Tracker node
- pipes run a Pipes job
- tasktracker run a MapReduce task Tracker node
- job manipulate MapReduce jobs
- version print the version
- jar <jar> run a jar file
- distcp <srcurl> <desturl> copy files or directories recursively
- archive -archiveName NAME <src>* <dest> create a hadoop archive
- daemonlog get/set the log level for each daemon
- CLASSNAME run the class named CLASSNAME
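The "jar" and "CLASSNAME" entries run user code. A common pattern is to implement Tool so that ToolRunner feeds the generic options (-D, -fs, -jt, ...) into the Configuration before run() is invoked; the sketch below assumes the class is packaged into the jar passed to "hadoop jar", and the class name MyTool is hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Skeleton of a class launched with "hadoop jar <jar> MyTool <args>".
public class MyTool extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();   // already populated with any -D overrides
        System.out.println("fs.default.name = " + conf.get("fs.default.name"));
        // ... set up and submit a MapReduce job here ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyTool(), args));
    }
}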
Hadoop (software stack)
Currently there are nine sub-projects under Hadoop
- Common - common code
- Avro - serialization and RPC
- MapReduce - computation
- HDFS - storage
- Pig - data flow language
- Hive - data warehousing and query language
- HBase - column-oriented database
- ZooKeeper - coordination service
- Chukwa - data collection and analysis
Sunday, November 11, 2012
hadoop useful CLI commands
$ hadoop fs -ls / (list all files in the HDFS root directory; a Java equivalent is sketched at the end of this post)
$ hadoop job -list (find all running MapReduce jobs)
$ for svc in /etc/init.d/hadoop-0.20-*; do sudo $svc start; done (Start up hadoop cluster)
$ for svc in /etc/init.d/hadoop-0.20-*; do sudo $svc stop; done (Stop cluster)
(HDFS commands)
http://hadoop.apache.org/docs/r1.0.0/file_system_shell.html
(MapReduce commands)
http://hadoop.apache.org/docs/r1.0.0/commands_manual.html#job
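For reference, a rough Java equivalent of "hadoop fs -ls /", assuming core-site.xml on the classpath points at the NameNode (the class name ListRoot is just for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Lists the entries directly under the HDFS root directory.
public class ListRoot {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            String type = status.isDir() ? "dir " : "file";
            System.out.println(type + "  " + status.getLen() + "  " + status.getPath());
        }
    }
}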
hadoop configuration
Configuration
The Hadoop configs are contained under /etc/hadoop/conf.
Log
/var/log/hadoop (where all Hadoop daemon log files reside)
hadoop-hadoop-namenode-<HOSTNAME>.log (NameNode logs)
CentOS add user to the sudoers list
Edit /etc/sudoers (preferably with visudo)
Find the "root ALL=(ALL) ALL" line and add the following on the next line
replace <username> with your username
"<username> ALL=(ALL) ALL"
Friday, November 9, 2012
Hadoop stack (Installation) - redhat
Download Hadoop from http://www.cloudera.com/hadoop
Prerequisites
1) JDK1.6 update 8 or newer
Download and install the “bootstrap” RPM
$ sudo -s
$ wget http://archive.cloudera.com/redhat/cdh/cdh3-repository-1.0-1.noarch.rpm
$ rpm -ivh cdh3-repository-1.0-1.noarch.rpm
Import Cloudera's RPM signing key
$ rpm --import \
http://archive.cloudera.com/redhat/cdh/RPM-GPG-KEY-cloudera
Install the pseudo-distributed RPM package and its dependencies: Pig, Hive, and Snappy
$ yum install hadoop-0.20-conf-pseudo hadoop-0.20-native \
hadoop-pig hadoop-hive
Hadoop limitations
- Availability -- master processes are single points of failure (Hadoop 2.x brings HA support for the NameNode and JobTracker to mitigate this issue)
- Security -- the security model is disabled by default, and there is no storage- or wire-level encryption
- can be configured to run with Kerberos (a network authentication protocol)
- HDFS -- lacks high availability
- MapReduce -- a "batch-based", "shared-nothing" architecture; not a good fit for jobs that need real-time data access
- Ecosystem version compatibility