Apache Spark

Fast and general engine for large-scale data processing

What is Apache Spark?

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning. Apache Spark is a tool in the Big Data Tools category of a tech stack.
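The MapReduce-style batch model mentioned above can be sketched in plain Python, without Spark installed. This is only an illustration of the pattern, not Spark code: the function name `word_count` and the sample input are hypothetical. The comments note which operations of Spark's RDD API (`flatMap`, `map`, `reduceByKey`) play the corresponding role in a real job.

```python
from collections import defaultdict


def word_count(lines):
    """MapReduce-style word count over an in-memory list of lines."""
    # Map phase: split each line into words and emit (word, 1) pairs.
    # In Spark's RDD API this role is played by flatMap followed by map.
    pairs = [(word.lower(), 1) for line in lines for word in line.split()]

    # Shuffle and reduce phase: sum the counts for each distinct word.
    # In Spark this role is played by reduceByKey with addition.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


if __name__ == "__main__":
    sample = ["Spark is fast", "Spark is general"]
    print(word_count(sample))
```

In Spark the same two phases would run in parallel across the partitions of a distributed dataset rather than over a single Python list.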

Who is using it?

522 companies use Apache Spark in their tech stacks, including Uber, Shopify, Slack, CRED, Delivery Hero, Hepsiburada, Walmart, Hubspot, Groww technology, Wellhub, and Trendyol Group.

Why developers like Apache Spark

Open-source

Fast and Flexible

One platform for every big data problem

Great for distributed SQL-like applications

Easy to install and to use

Works well for most data science use cases

Interactive Query

Machine learning library, streaming in real time

In-memory computation