Apache Spark Basics

Total time: 2 days
Location: On-site, Online
Starting dates and places: 12 starting dates

GFU Cyrus AG

Starting dates and places

Köln: 23 Jul 2026 until 24 Jul 2026
Online (Zoom): 23 Jul 2026 until 24 Jul 2026
Köln: 10 Dec 2026 until 11 Dec 2026
Online (Zoom): 10 Dec 2026 until 11 Dec 2026
Köln: 11 Mar 2027 until 12 Mar 2027
Online (Zoom): 11 Mar 2027 until 12 Mar 2027
Köln: 10 Jun 2027 until 11 Jun 2027
Online (Zoom): 10 Jun 2027 until 11 Jun 2027
Köln: 9 Sep 2027 until 10 Sep 2027
Online (Zoom): 9 Sep 2027 until 10 Sep 2027
Köln: 9 Dec 2027 until 10 Dec 2027
Online (Zoom): 9 Dec 2027 until 10 Dec 2027

Description

First-class training ✔ Guaranteed to take place ✔ Trainers with practical experience ✔ Free cancellation ✔ 3=2: free participation for the third attendee ✔ Personal learning environment ✔ Small learning groups

Course objective

The Apache Spark Basics course gives participants a solid understanding of Apache Spark and its fundamental concepts. By the end of the course, participants will understand the challenges of big data processing and the advantages Spark offers. They will know Spark's architecture and its components, such as the driver, the executors, and the cluster manager. Participants will also learn how to work with Resilient Distributed Datasets (RDDs) and how to apply transformations and actions to them. In addition, they will learn to use Spark Streaming for real-time data processing and to integrate Spark with …

Contents

Introduction to Apache Spark with Python (PySpark)
- Overview of big data processing challenges
- Introduction to distributed computing and parallel processing
- Introduction to Spark's architecture and components (driver, executor, cluster manager)
- Comparison with traditional batch processing frameworks (Hadoop MapReduce)
- Setting up Spark with the Python shell (PySpark shell)
Spark Fundamentals with PySpark
- Understanding Resilient Distributed Datasets (RDDs)
  - RDD characteristics (immutable, partitioned, resilient)
  - RDD operations: transformations (map, filter, flatMap, etc.) and actions (count, collect, reduce, etc.)
  - Lazy evaluation and lineage in Spark
- Hands-on exercises using PySpark
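
The RDD ideas listed above (immutability, transformations versus actions, lazy evaluation, lineage) can be sketched without a cluster. The block below is a minimal plain-Python model, not the PySpark API: `ToyRDD`, `tokenize`, and the sample sentences are hypothetical names introduced only for illustration. The method names merely mirror the RDD operations named in the outline, and the model ignores partitioning and distribution entirely.

```python
from functools import reduce as _reduce


class ToyRDD:
    """Plain-Python stand-in for an RDD (assumption: NOT the PySpark API)."""

    def __init__(self, source, lineage=("parallelize",)):
        self._source = source    # zero-argument callable that returns an iterator
        self.lineage = lineage   # names of the steps that produce this dataset

    @classmethod
    def parallelize(cls, items):
        data = tuple(items)      # immutable snapshot of the input
        return cls(lambda: iter(data))

    # Transformations: each returns a new ToyRDD and does no work yet.
    def map(self, fn):
        return ToyRDD(lambda: (fn(x) for x in self._source()),
                      self.lineage + ("map",))

    def filter(self, pred):
        return ToyRDD(lambda: (x for x in self._source() if pred(x)),
                      self.lineage + ("filter",))

    def flatMap(self, fn):
        return ToyRDD(lambda: (y for x in self._source() for y in fn(x)),
                      self.lineage + ("flatMap",))

    # Actions: these walk the whole chain and produce a result.
    def collect(self):
        return list(self._source())

    def count(self):
        return sum(1 for _ in self._source())

    def reduce(self, fn):
        return _reduce(fn, self._source())


calls = []  # records every line that tokenize actually processes


def tokenize(line):
    calls.append(line)
    return line.split()


lines = ToyRDD.parallelize(["spark is fast", "spark is lazy"])
words = lines.flatMap(tokenize).filter(lambda w: w != "is").map(str.upper)

calls_before_action = len(calls)  # 0: no transformation has run yet
result = words.collect()          # ['SPARK', 'FAST', 'SPARK', 'LAZY']
```

Building `words` only records the chain of steps; `tokenize` first runs when `collect()` is called. Each further action replays the recorded lineage from the source, which loosely corresponds to how an uncached RDD is recomputed.
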
Spark Streaming
- Introduction to Spark Streaming
- Streaming data processing concepts
- DStream (Discretized Stream) operations in Spark Streaming
  - Windowed operations
  - Stateful processing using updateStateByKey()
- Handling data sources (Flume, Kafka) and sinks (HDFS, Cassandra) in Spark Streaming
- Hands-on exercises with Spark Streaming
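
The two DStream topics named above, windowed operations and stateful processing, can be illustrated with a stream modeled as a list of micro-batches. This is a hedged plain-Python sketch, not Spark Streaming itself: `update_state_by_key`, `windowed_counts`, `running_sum`, and the sample batches are hypothetical, and the window is measured in batches rather than seconds. The update function takes the new values for a key plus the previous state, which is the general shape of the function passed to `updateStateByKey()`.

```python
from collections import Counter


def update_state_by_key(batches, update_fn):
    """Yield the per-key state after each micro-batch (toy model)."""
    state = {}
    for batch in batches:
        grouped = {}
        for key, value in batch:
            grouped.setdefault(key, []).append(value)
        new_state = {}
        # Keys with existing state are updated even if the batch has no new values.
        for key in state.keys() | grouped.keys():
            updated = update_fn(grouped.get(key, []), state.get(key))
            if updated is not None:  # returning None drops the key
                new_state[key] = updated
        state = new_state
        yield dict(state)


def windowed_counts(batches, window_length, slide_interval):
    """Sum values per key over the last `window_length` batches,
    emitting one result every `slide_interval` batches."""
    batches = list(batches)
    results = []
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        counts = Counter()
        for batch in batches[max(0, end - window_length):end]:
            for key, value in batch:
                counts[key] += value
        results.append(dict(counts))
    return results


def running_sum(new_values, last_state):
    # last_state is None the first time a key is seen
    return sum(new_values) + (last_state or 0)


batches = [
    [("spark", 1), ("kafka", 1)],
    [("spark", 1)],
    [("flume", 1), ("spark", 1)],
    [("kafka", 1)],
]

states = list(update_state_by_key(batches, running_sum))
final_state = states[-1]  # {'spark': 3, 'kafka': 2, 'flume': 1} (key order may vary)
windows = windowed_counts(batches, window_length=2, slide_interval=2)
```

The stateful result accumulates over the whole stream, while each windowed result only covers the most recent two batches, so the two approaches answer different questions about the same data.
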
Integration with Flume, Kafka, and Cassandra
- Introduction to Apache Flume and its integration with Spark
  - Overview of Flume's event-based data ingestion
  - Setting up Flume agents and Spark integration
- Integration of Apache Kafka with Spark Streaming
  - Overview of Kafka's distributed publish-subscribe messaging system
  - Configuring Kafka and Spark integration for real-time data processing
- Introduction to Apache Cassandra and its integration with Spark
  - Overview of Cassandra's distributed NoSQL database
  - Connecting Spark to Cassandra for data storage and retrieval
