Syllabus
- Course Duration: 8 weeks
- Price: Rs. 7000/-
Hadoop course in Pune
Hadoop is an open-source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It is an Apache project sponsored by the Apache Software Foundation.
Pay Fees After Satisfaction With Interview Guidance
Hadoop Syllabus
- What is Hadoop?
- The Hadoop Distributed File System
- How Hadoop MapReduce Works
- Anatomy of a Hadoop Cluster
- Setting up Hadoop Cluster
- Make a fully distributed Hadoop cluster
- Cluster Specification
- Network Topology
- Cluster Specification and installation
- Hadoop Daemons
- Master Daemons
- Name node
- Job Tracker
- Secondary name node
- Slave Daemons
- Data Node
- Task tracker
- Examining a Sample MapReduce Program With several examples
- Basic API Concepts
- The Driver Code
- The Mapper
- The Reducer
- The configure and close Methods
- Sequence Files
- Record Reader
- Record Writer
- Role of Reporter
- Output Collector
- Processing XML files
- Counters
- Directly Accessing HDFS
- ToolRunner
- Using The Distributed Cache
- Common MapReduce Algorithms
- Sorting, Searching and Indexing
- Word Co-Occurrence
- Identity Mapper
- Identity Reducer
- Exploring well known problems using MapReduce applications
- HDFS (Hadoop Distributed File System)
- Blocks and Splits
- Input Splits
- HDFS Splits
- Methods of accessing HDFS
- JAVA Approach
- CLI Approach
- Cluster architecture and block placement
- Data Replication
- Hadoop Rack Awareness
- High data availability
- Data Integrity
- Programming Practices
- Developing MapReduce Programs in:
- Local Mode
- Running without HDFS and MapReduce
- Pseudo-distributed Mode
- Running all daemons in a single node
- Fully distributed mode
- Running daemons on dedicated nodes
- Testing with MRUnit
- Logging
- Other Debugging Strategies
- Advanced MapReduce Program
- A Recap of the MapReduce Flow
- The Secondary Sort
- Customized Input Formats and Output Formats
- Introduction to YARN
- What is YARN?
- Why YARN?
- Advantages of YARN
- YARN Daemons
- Resource Manager
- Node Manager
- Application Master
- Classic MapReduce vs YARN
- Anatomy of a YARN application run
- Scheduling in YARN
- Fair Scheduler
- Capacity Scheduler
- YARN as a platform for multiple applications
- Supported YARN applications
- Overview of Spark
- What is Spark?
- Hadoop & Spark
- Features of Spark
- Spark Ecosystems
- Spark Streaming
- Spark SQL
- Spark MLlib
- Spark Architecture
- Resilient Distributed Datasets
- How to Install Spark
- How to Run Spark
- How to Interact with Spark
- Spark Web Console
- Shared Variables
- Spark Applications
- Word Count Application
- HIVE
- Hive concepts
- Hive architecture
- Create a database and access it from a Java client
- Buckets
- Partition
- Joins in hive
- Inner joins
- Outer Joins
- Hive UDF
- Introducing Cloudera Impala
- Impala Benefits
- How Cloudera Impala Works with CDH
- Primary Impala Features
- Impala Concepts and Architecture
- Components of the Impala Server
- The Impala Daemon
- The Impala Statestore
- The Impala Catalog Service
- Overview of the Impala SQL Dialect
- How Impala Fits Into the Hadoop Ecosystem
- How Impala Works with Hive
- Overview of Impala Metadata and the Metastore
- How Impala Uses HDFS
- FLUME
- Flume concepts
- Create a sample application to capture logs from Apache using flume
- SQOOP
- Getting Sqoop
- A Sample Import
- Database Imports
- Controlling the import
- Imports and consistency
- Direct-mode imports
- Performing an Export
- Overview of services in Android
- Implementing a Service
- Service lifecycle
- Bound versus unbound services
- PIG
- Pig basics
- Pig Vs MapReduce and SQL
- Pig Vs Hive
- Write sample Pig Latin scripts
- Modes of running PIG
- Running in Grunt shell
- PIG UDFs
- Pig Macros
- Name Node High Availability
- Name Node federation
- Fencing
- Interview Preparation
- Personal Interview
- Group Discussion