Spark 2.0 – TechSoft

Day 1 Objective

  • Scala Introduction
  • Working with:

              o Variables
              o Data Types
              o Control Flow

  • The Scala Interpreter
  • Collections and their Standard Methods (e.g. map())
  • Working with:

             o Functions
             o Methods
             o Function Literals

  • Define the Following as they Relate to Scala:

             o Class
             o Object
             o Case Class

  • Overview, Motivations, Spark Systems
  • Spark Ecosystem
  • Spark vs. Hadoop
  • Acquiring and Installing Spark
  • The Spark Shell, SparkContext
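
The Day 1 Scala topics can be previewed with a short sketch like the one below. It is illustrative only (the object name `Day1Sketch` is not part of the course material): an immutable variable, a case class, a function literal, and a standard collection method such as map().

```scala
// A minimal sketch of the Day 1 Scala topics, runnable in the Scala
// interpreter or as a standalone object.
object Day1Sketch {
  // Immutable binding (val); use var for mutable variables
  val greeting: String = "Hello, Scala"

  // A case class gets equals, hashCode, toString, and copy for free
  case class Point(x: Int, y: Int)

  // A function literal (anonymous function) assigned to a val
  val double: Int => Int = n => n * 2

  def main(args: Array[String]): Unit = {
    val nums = List(1, 2, 3)
    // map() applies the function literal to every element
    val doubled = nums.map(double)
    println(doubled)                 // List(2, 4, 6)
    println(Point(1, 2).copy(y = 5)) // Point(1,5)
  }
}
```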

Day 1 Demonstrations

  • Setting Up the Lab Environment
  • Starting the Scala Interpreter
  • A First Look at Spark
  • A First Look at the Spark Shell

Day 2 Objective

  • RDD Concepts, Lifecycle, Lazy Evaluation
  • RDD Partitioning and Transformations
  • Working with RDDs, Including:

         o Creating and Transforming (map, filter, etc.)
  • An Overview of RDDs
  • SparkSession, Loading/Saving Data, Data Formats (JSON, CSV, Parquet, text …)
  • Introducing DataFrames and DataSets (Creation and Schema Inference)
  • Identify Supported Data Formats, Including:

           o JSON
           o Text
           o CSV
           o Parquet

  • Working with the DataFrame (untyped) Query DSL, including:

         o Column
         o Filtering
         o Grouping
         o Aggregation

  • SQL-based Queries
  • Working with the DataSet (typed) API
  • Mapping and Splitting (flatMap(), explode(), and split())
  • DataSets vs. DataFrames vs. RDDs
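
The Day 2 topics fit together as in the sketch below: a local SparkSession, a typed DataSet with an inferred schema, the untyped DataFrame DSL, and splitting with split() and flatMap(). This is a minimal illustration, not course material; the object name and sample data are placeholders, and it assumes a local Spark installation.

```scala
import org.apache.spark.sql.SparkSession

// A minimal Day 2 sketch: SparkSession, DataSets, the DataFrame DSL,
// and mapping/splitting. Assumes Spark is available locally.
object Day2Sketch {
  case class Person(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Day2Sketch")
      .master("local[*]") // local mode, as in the lab environment
      .getOrCreate()
    import spark.implicits._

    // Typed DataSet with an inferred schema
    val people = Seq(Person("Ann", 34), Person("Bob", 19)).toDS()

    // Untyped DataFrame DSL: filtering, grouping, aggregation
    people.filter($"age" > 20)
          .groupBy($"name")
          .count()
          .show()

    // Splitting: flatMap() with split() on a typed DataSet of lines
    val lines = Seq("spark makes big data simple").toDS()
    lines.flatMap(_.split(" ")).show()

    spark.stop()
  }
}
```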

Day 2 Demonstrations

  • RDD Basics
  • Operations on Multiple RDDs
  • Data Formats
  • Spark SQL Basics
  • DataFrame Transformations
  • The DataSet Typed API
  • Splitting Up Data

Day 3 Objective

  • Working with:

        o Grouping
        o Reducing
        o Joining

  • Shuffling, Narrow vs. Wide Dependencies, and Performance Implications
  • Exploring the Catalyst Query Optimizer (explain(), Query Plans, Issues with lambdas)
  • The Tungsten Optimizer (Binary Format, Cache Awareness, Whole-Stage Code Gen)
  • Discuss Caching, Including:

        o Concepts
        o Storage Type
        o Guidelines

  • Minimizing Shuffling for Increased Performance
  • Using Broadcast Variables and Accumulators
  • General Performance Guidelines:

         o Using the Spark UI
         o Efficient Transformations
         o Data Storage
         o Monitoring
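
Two of the Day 3 performance tools, broadcast variables and accumulators, can be sketched as below. This is an illustrative example only (names and data are placeholders), assuming a local SparkSession: broadcasting a small lookup table avoids a shuffle-heavy join, and an accumulator counts records on the driver side.

```scala
import org.apache.spark.sql.SparkSession

// A minimal Day 3 sketch: caching, a broadcast variable, and an
// accumulator. Assumes Spark is available locally.
object Day3Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Day3Sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Broadcast a small lookup table once per executor instead of
    // shipping it with every task (or joining, which would shuffle)
    val codes = sc.broadcast(Map("IN" -> "India", "US" -> "United States"))

    // Accumulator: a counter that tasks add to and the driver reads
    val unknown = sc.longAccumulator("unknown-codes")

    val countries = sc.parallelize(Seq("IN", "US", "XX")).map { c =>
      val name = codes.value.getOrElse(c, "?")
      if (name == "?") unknown.add(1)
      name
    }
    countries.cache() // keep the RDD in memory across reuses

    println(countries.collect().toList)
    println(s"unknown codes: ${unknown.value}")
    spark.stop()
  }
}
```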

Day 3 Demonstrations

  • Exploring Group Shuffling
  • Seeing Catalyst at Work
  • Seeing Tungsten at Work
  • Working with Caching, Joins, Shuffles, Broadcasts, Accumulators
  • Broadcast General Guidelines

Day 4 Objective

    • Core API, SparkSession.Builder
    • Configuring and Creating a SparkSession
    • Building and Running Applications – sbt/build.sbt and spark-submit
    • Application Lifecycle (Driver, Executors, and Tasks)
    • Cluster Managers (Standalone, YARN, Mesos)
    • Logging and Debugging
    • Introduction and Streaming Basics
    • Spark Streaming (Spark 1.0+) 

             o DStreams, Receivers, Batching
             o Stateless Transformation
             o Windowed Transformation
             o Stateful Transformation

  • Structured Streaming (Spark 2+)

        o Continuous Applications
        o Table Paradigm, Result Table
        o Steps for Structured Streaming
        o Sources and Sinks

  • Consuming Kafka Data

         o Kafka Overview
         o Structured Streaming – “kafka” Format
         o Processing the Stream
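
The Structured Streaming source/sink wiring covered on Day 4 looks roughly like the sketch below. It is a configuration-style illustration, not course material: the broker address and topic name are placeholders, and it assumes the spark-sql-kafka connector is on the classpath and a Kafka broker is reachable.

```scala
import org.apache.spark.sql.SparkSession

// A minimal Day 4 sketch: Structured Streaming with the "kafka" format.
// Broker and topic below are placeholders.
object Day4Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Day4Sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Source: subscribe to a Kafka topic; each row carries key/value bytes
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
      .option("subscribe", "events")                        // placeholder topic
      .load()
      .selectExpr("CAST(value AS STRING)")
      .as[String]

    // Sink: append each micro-batch's result table to the console
    val query = stream.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```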

Day 4 Demonstrations

  • Spark Job Submission
  • Additional Spark Capabilities
  • Spark Streaming
  • Spark Structured Streaming
  • Spark Structured Streaming with Kafka

Course Prerequisites

Students should be familiar with programming principles and have previous experience in software development using Scala. Previous experience with data streaming, SQL, and Hadoop is also helpful, but not required.

 

Course Calendar

Start Date       End Date         Duration   Location
18th July 2019   21st July 2019   4 Days     Pune, Bangalore
19th Aug 2019    22nd Aug 2019    4 Days     Pune, Bangalore
12th Sep 2019    15th Sep 2019    4 Days     Pune, Bangalore
