
Pig and Hive

Developer: Pig and Hive

Day 1 Objective

  • List the Three “V”s of Big Data
  • List the Six Key Hadoop Data Types
  • Describe Hadoop, YARN and Use Cases for Hadoop
  • Describe Hadoop Ecosystem Tools and Frameworks
  • Describe the Differences Between Relational Databases and Hadoop
  • Describe What is New in Hadoop 2.x
  • Describe the Hadoop Distributed File System (HDFS)
  • Describe the Differences Between HDFS and an RDBMS
  • Describe the Purpose of NameNodes and DataNodes
  • List Common HDFS Commands (a short command sketch follows this list)
  • Describe HDFS File Permissions
  • List Options for Data Input
  • Describe WebHDFS
  • Describe the Purpose of Sqoop and Flume
  • Describe How to Export to a Table
  • Describe the Purpose of MapReduce
  • Define Key/Value Pairs in MapReduce
  • Describe the Map and Reduce Phases
  • Describe Hadoop Streaming (a word-count sketch follows the Day 1 lists)
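
Most of the HDFS objectives above come down to the hdfs dfs command-line utility. The following is a minimal sketch, assuming a configured Hadoop client on the PATH, that drives a few common commands from Python; the directory and file names are placeholders, not part of the course material.

    # hdfs_basics.py -- a minimal sketch of common HDFS commands driven from Python.
    # Assumes a configured Hadoop client on the PATH; paths below are placeholders.
    import subprocess

    def hdfs(*args):
        """Run an 'hdfs dfs' command and raise if it fails."""
        subprocess.run(["hdfs", "dfs", *args], check=True)

    hdfs("-mkdir", "-p", "/user/student/demo")                   # create a directory
    hdfs("-put", "local_data.txt", "/user/student/demo")         # copy a local file into HDFS
    hdfs("-ls", "/user/student/demo")                            # list directory contents
    hdfs("-cat", "/user/student/demo/local_data.txt")            # print file contents
    hdfs("-chmod", "640", "/user/student/demo/local_data.txt")   # adjust file permissions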

Day 1 Demonstrations

  • Starting a Hadoop Cluster
  • Demonstration: Understanding Block Storage
  • Using HDFS Commands
  • Importing RDBMS Data into HDFS
  • Exporting HDFS Data to an RDBMS
  • Importing Log Data into HDFS Using Flume
  • Demonstration: Understanding MapReduce
  • Running a MapReduce Job
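
To make the MapReduce and Hadoop Streaming topics concrete, here is a minimal word-count sketch in Python: Streaming passes records to external scripts on stdin and expects tab-separated key/value lines on stdout, so the mapper and reducer below are plain filters. The file names, HDFS paths, and streaming jar location are illustrative assumptions, not taken from the course.

    # wordcount_streaming.py -- a minimal Hadoop Streaming sketch (mapper and reducer
    # in one file, selected by a command-line argument). Illustrative only.
    import sys

    def mapper():
        # Emit one (word, 1) pair per word; Streaming uses tab-separated key/value lines.
        for line in sys.stdin:
            for word in line.split():
                print(f"{word.lower()}\t1")

    def reducer():
        # Input arrives sorted by key, so counts for a given word are contiguous.
        current, count = None, 0
        for line in sys.stdin:
            word, value = line.rstrip("\n").split("\t", 1)
            if word == current:
                count += int(value)
            else:
                if current is not None:
                    print(f"{current}\t{count}")
                current, count = word, int(value)
        if current is not None:
            print(f"{current}\t{count}")

    if __name__ == "__main__":
        mapper() if sys.argv[1] == "map" else reducer()

    # A typical (illustrative) submission; the streaming jar path depends on your distribution:
    #   hadoop jar /path/to/hadoop-streaming.jar \
    #     -input /user/student/demo -output /user/student/wc-out \
    #     -mapper "python3 wordcount_streaming.py map" \
    #     -reducer "python3 wordcount_streaming.py reduce" \
    #     -file wordcount_streaming.py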

Day 2 Objective

  • Describe the Purpose of Apache Pig
  • Describe the Purpose of Pig Latin
  • Demonstrate the Use of the Grunt Shell (a short Pig Latin sketch follows the Day 2 lists)
  • List Pig Latin Relation Names and Field Names
  • List Pig Data Types
  • Define a Schema
  • Describe the Purpose of the GROUP Operator
  • Describe Common Pig Operators, Including:
    o ORDER BY
    o CASE
    o DISTINCT
    o PARALLEL
    o FLATTEN
    o FOREACH

  • Perform an Inner, Outer and Replicated Join
  • Describe the Purpose of the DataFu Library

Day 2 Demonstrations

  • Demonstration: Understanding Apache Pig
  • Getting Started with Apache Pig
  • Exploring Data with Apache Pig
  • Splitting a Dataset
  • Joining Datasets with Apache Pig
  • Preparing Data for Apache Hive
  • Demonstration: Computing Page Rank
  • Analyzing Clickstream Data
  • Analyzing Stock Market Data Using Quantiles 
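
As a taste of the Day 2 material, below is a minimal sketch of a Python (Jython) UDF for Pig, together with Pig Latin (shown as comments) that registers it and then applies FOREACH, GROUP, and ORDER BY. The relation, field, and file names are made-up placeholders, not course assets.

    # clean_udf.py -- a minimal Python (Jython) UDF for Pig, plus the Pig Latin that
    # would use it (shown as comments). Names and paths are illustrative.
    from pig_util import outputSchema

    @outputSchema('word:chararray')
    def normalize(s):
        # Trim and lower-case a field before grouping.
        return None if s is None else s.strip().lower()

    # Pig Latin (run in the Grunt shell or as a script):
    #   REGISTER 'clean_udf.py' USING jython AS udf;
    #   lines  = LOAD '/user/student/words.txt' AS (word:chararray, cnt:int);
    #   clean  = FOREACH lines GENERATE udf.normalize(word) AS word, cnt;
    #   byword = GROUP clean BY word;
    #   totals = FOREACH byword GENERATE group AS word, SUM(clean.cnt) AS total;
    #   ranked = ORDER totals BY total DESC;
    #   DUMP ranked;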

Day 3 Objective

  • Describe the Purpose of Apache Hive
  • Describe the Differences Between Apache Hive and SQL
  • Describe the Apache Hive Architecture
  • Demonstrate How to Submit Hive Queries
  • Describe How to Define Tables
  • Describe How to Load Data Into Hive (a HiveQL sketch follows the Day 3 lists)
  • Define Hive Partitions, Buckets and Skew
  • Describe How to Sort Data
  • List Hive Join Strategies
  • Describe the Purpose of HCatalog
  • Describe the HCatalog Ecosystem
  • Define a New Schema
  • Demonstrate the Use of HCatLoader and HCatStorer with Apache Pig
  • Perform a Multi-table/File Insert
  • Describe the Purpose of Views
  • Describe the Purpose of the OVER Clause
  • Describe the Purpose of Windows
  • List Hive Analytics Functions
  • List Hive File Formats
  • Describe the Purpose of Hive SerDe

Day 3 Demonstrations

  • Understanding Hive Tables
  • Understanding Partition and Skew
  • Analyzing Big Data with Apache Hive
  • Demonstration: Computing NGrams
  • Joining Datasets in Apache Hive
  • Computing NGrams of Emails in Avro Format
  • Using HCatalog with Apache Pig
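
The Day 3 table, partition, and windowing topics translate into ordinary HiveQL. The sketch below keeps to Python: it writes a small illustrative HiveQL script (the table, columns, and JDBC URL are placeholders, not course assets) and submits it with beeline.

    # hive_demo.py -- a minimal sketch that submits illustrative HiveQL from Python.
    # Table names, columns, and the JDBC URL are placeholders.
    import subprocess
    import textwrap

    HQL = textwrap.dedent("""
        -- A partitioned table in a columnar file format.
        CREATE TABLE IF NOT EXISTS page_views (
            user_id STRING,
            url     STRING,
            ts      TIMESTAMP
        )
        PARTITIONED BY (view_date STRING)
        STORED AS ORC;

        -- A windowed query: rank each user's page views by time.
        SELECT user_id, url,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts) AS visit_rank
        FROM page_views
        WHERE view_date = '2019-07-11';
    """)

    with open("demo.hql", "w") as f:
        f.write(HQL)

    # Submit through HiveServer2 with beeline; the connection URL is a placeholder.
    subprocess.run(["beeline", "-u", "jdbc:hive2://localhost:10000/default",
                    "-f", "demo.hql"], check=True)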

Day 4 Objective

  • Describe the Purpose of HDFS Federation
  • Describe the Purpose of HDFS High Availability (HA)
  • Describe the Purpose of the Quorum Journal Manager
  • Demonstrate How to Configure Automatic Failover
  • Describe the Purpose of YARN
  • List the Components of YARN
  • Describe the Lifecycle of a YARN Application
  • Describe the Purpose of a Cluster View
  • Describe the Purpose of Apache Slider
  • Describe the Origin and Purpose of Apache Spark
  • List Common Spark Use Cases
  • Describe the Differences Between Apache Spark and MapReduce
  • Demonstrate the Use of the Spark Shell
  • Describe the Purpose of a Resilient Distributed Dataset (RDD) (a PySpark sketch follows the Day 4 lists)
  • Demonstrate How to Load Data and Perform a Word Count
  • Define Lazy Evaluation
  • Describe How to Load Multiple Types of Data
  • Demonstrate How to Perform SQL Queries
  • Demonstrate How to Perform DataFrame Operations
  • Describe the Purpose of the Optimization Engine
  • Describe the Purpose of Apache Oozie
  • Describe Apache Pig Actions
  • Describe Apache Hive Actions
  • Describe MapReduce Actions
  • Describe How to Submit an Apache Oozie Workflow
  • Define an Oozie Coordinator Job

Day 4 Demonstrations

  • Advanced Apache Hive Programming
  • Running a YARN Application
  • Getting Started with Apache Spark
  • Exploring Apache Spark SQL
  • Defining an Apache Oozie Workflow
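
The Spark objectives above can be tried directly from a PySpark session. Below is a minimal sketch of the RDD word count plus a DataFrame and Spark SQL query; the input path and column names are placeholders. Because Spark evaluates lazily, no job runs until the take() and show() actions.

    # spark_demo.py -- a minimal PySpark sketch: RDD word count plus a DataFrame/SQL
    # query. The input path and column names are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pig-hive-course-demo").getOrCreate()
    sc = spark.sparkContext

    # RDD word count: transformations are lazy; take() triggers the job.
    counts = (sc.textFile("/user/student/demo/input.txt")
                .flatMap(lambda line: line.split())
                .map(lambda word: (word.lower(), 1))
                .reduceByKey(lambda a, b: a + b))
    print(counts.take(10))

    # DataFrame operations and Spark SQL over the same counts.
    df = counts.toDF(["word", "total"])
    df.createOrReplaceTempView("word_counts")
    spark.sql("SELECT word, total FROM word_counts ORDER BY total DESC LIMIT 10").show()

    spark.stop()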

Course Prerequisites

Students should be familiar with programming principles and have experience in software development. SQL knowledge is also helpful. No prior Hadoop knowledge is required.

 

Course Calendar

Start Date       End Date         Duration   Location
11th July 2019   14th July 2019   4 Days     Pune, Bangalore
8th Aug 2019     11th Aug 2019    4 Days     Pune, Bangalore
5th Sep 2019     8th Sep 2019     4 Days     Pune, Bangalore

