EMR (Elastic MapReduce)

aws/analytics aws/big-data aws/service

💡 Definition

Amazon EMR (formerly Amazon Elastic MapReduce) is a managed cloud big data platform for processing large amounts of data with open source frameworks such as Apache Hadoop, Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto.

🔑 Key Concepts

⚙️ How it Works

  1. Launch Cluster: Choose the applications to install (e.g., Spark) and the hardware configuration (instance types and counts).
  2. Process: Submit steps (jobs) to the cluster.
  3. Output: Results are typically written to Amazon S3.
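
The three steps above can be sketched with boto3, the AWS SDK for Python. This is a minimal illustration, not a production setup: the bucket, script path, cluster name, release label, and instance types are all hypothetical placeholders, and the role names assume the default EMR roles already exist in the account. The request is built as a plain dictionary so it can be inspected without AWS credentials; only `launch` actually calls AWS.

```python
import json


def build_run_job_flow_request(bucket: str, script_key: str, output_prefix: str) -> dict:
    """Assemble keyword arguments shaped for boto3's EMR run_job_flow call.

    All concrete values (names, release label, instance types) are
    illustrative assumptions, not recommendations.
    """
    script_uri = f"s3://{bucket}/{script_key}"
    output_uri = f"s3://{bucket}/{output_prefix}"
    return {
        # Step 1: Launch Cluster (applications + hardware config)
        "Name": "example-spark-cluster",
        "ReleaseLabel": "emr-6.15.0",  # assumed release; pick one available to you
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # Terminate the cluster once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        # Step 2: Process (one Spark step submitted with the cluster)
        "Steps": [
            {
                "Name": "example-spark-step",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    # Step 3: Output (the job receives an S3 location to write to)
                    "Args": ["spark-submit", "--deploy-mode", "cluster",
                             script_uri, output_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumes default roles exist
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request: dict) -> str:
    """Send the request to AWS and return the new cluster ID (needs credentials)."""
    import boto3  # imported here so building the request works without boto3

    emr = boto3.client("emr")
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    # Hypothetical bucket and paths; prints the request instead of launching.
    request = build_run_job_flow_request("my-example-bucket", "jobs/etl.py", "output/")
    print(json.dumps(request, indent=2))
```

Setting `KeepJobFlowAliveWhenNoSteps` to `False` makes this a transient cluster that shuts down after its steps complete, which fits the launch, process, output flow described above.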

🎯 Use Cases

💰 Pricing Model

📝 Exam Tips (CLF-C02)


See Also:

  * Redshift
  * Athena
  * Glue