Redshift
💡 Definition
Amazon Redshift is a fully managed, petabyte-scale data warehouse service for running complex analytic queries against structured data using standard SQL. Many queries return in seconds, though actual latency depends on data volume, cluster size, and query design.
🔑 Key Concepts
- Data Warehouse: Optimized for analytical processing (OLAP), not transactional processing (OLTP).
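- Columnar Storage: Stores each column's values together, so an analytical query reads only the columns it needs and similar values compress well.
- Massively Parallel Processing (MPP): Distributes data and query execution across multiple compute nodes that work in parallel, then combines their partial results.
- Managed Service: AWS handles provisioning, patching, backups, and other operational tasks; you choose when to resize a provisioned cluster.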
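The columnar storage and MPP ideas above can be illustrated with a toy sketch in plain Python. This is not how Redshift is implemented; the sample rows, the function names, and the use of threads as stand-ins for compute slices are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Row-oriented layout: each record is stored together (typical for OLTP).
# The sample data is made up for this sketch.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
    {"order_id": 4, "region": "US", "amount": 200.0},
]

# Column-oriented layout: the values of each column are stored together.
columns = {key: [row[key] for row in rows] for key in rows[0]}


def total_amount_columnar(cols):
    """Sum one column; the other columns are never read."""
    return sum(cols["amount"])


def split_into_slices(values, n_slices):
    """Distribute values round-robin across n_slices partitions."""
    return [values[i::n_slices] for i in range(n_slices)]


def parallel_sum(values, n_slices=2):
    """MPP-style aggregation: each slice computes a partial sum,
    then a coordinator combines the partial results."""
    slices = split_into_slices(values, n_slices)
    with ThreadPoolExecutor(max_workers=n_slices) as pool:
        partials = list(pool.map(sum, slices))
    return sum(partials)


if __name__ == "__main__":
    print(total_amount_columnar(columns))
    print(parallel_sum(columns["amount"], n_slices=2))
```

Both functions give the same total; the point is that the aggregate touches a single column and that the work can be split into independent partial sums.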
⚙️ How it Works
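You launch a Redshift cluster, which consists of one or more compute nodes coordinated by a leader node. The leader node receives SQL queries and plans them; the compute nodes store the data and execute the work in parallel. You load your data (often from Amazon S3, using the COPY command) and then run SQL queries to analyze large datasets.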
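As a rough sketch of the load-then-query flow, the helper below builds a Redshift COPY statement for CSV files in S3 and pairs it with an analytic query. The table name, bucket, and IAM role ARN are hypothetical placeholders, and the helper only assembles SQL text; it does not connect to a cluster or escape its inputs.

```python
def build_copy_statement(table, s3_uri, iam_role_arn, ignore_header=True):
    """Build a COPY statement that loads CSV files from S3 into a table.

    Illustrative only: inputs are interpolated as-is, so pass trusted values.
    """
    parts = [
        f"COPY {table}",
        f"FROM '{s3_uri}'",
        f"IAM_ROLE '{iam_role_arn}'",
        "FORMAT AS CSV",
    ]
    if ignore_header:
        # Skip the first line of each file (the CSV header row).
        parts.append("IGNOREHEADER 1")
    return "\n".join(parts) + ";"


# Hypothetical names, used only for this example.
copy_sql = build_copy_statement(
    table="sales",
    s3_uri="s3://example-bucket/sales/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)

# A typical OLAP-style query to run once the data is loaded.
report_sql = """
SELECT region, SUM(amount) AS total_sales
FROM sales
GROUP BY region
ORDER BY total_sales DESC;
"""

if __name__ == "__main__":
    print(copy_sql)
    print(report_sql)
```

The generated statements would then be run through any SQL client connected to the cluster.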
🎯 Use Cases
- Business Intelligence (BI): Analyzing sales data, customer behavior, and operational metrics.
- Big Data Analytics: Performing complex queries on very large datasets.
- Reporting: Generating reports from consolidated data.
💰 Pricing Model
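- Compute Nodes: Provisioned clusters are charged per hour per node, based on node type. Redshift Serverless instead bills for the compute capacity used, measured in RPU-hours.
- Storage: Included with the node price on older node types such as DC2; RA3 nodes bill Redshift Managed Storage separately per GB-month.
- Data Transfer: Standard AWS data transfer rates apply, mainly for data leaving the Region or AWS. Loading from or unloading to Amazon S3 in the same Region carries no transfer charge.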
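The per-node, per-hour compute charge can be worked through with simple arithmetic. The sketch below uses a made-up hourly rate and an assumed 730-hour month; real rates vary by node type and Region, so check the current pricing page before relying on any number.

```python
HOURS_PER_MONTH = 730  # common approximation: 365 days * 24 hours / 12 months


def estimate_monthly_compute_cost(node_count, hourly_rate_usd,
                                  hours=HOURS_PER_MONTH):
    """Estimate on-demand compute cost: nodes x hourly rate x hours running."""
    return node_count * hourly_rate_usd * hours


if __name__ == "__main__":
    # Hypothetical rate of $0.25 per node-hour, not an actual AWS price.
    cost = estimate_monthly_compute_cost(node_count=2, hourly_rate_usd=0.25)
    print(f"Estimated monthly compute cost: ${cost:.2f}")  # $365.00
```

Doubling the node count doubles the estimate, which is why right-sizing the cluster matters for cost.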
📝 Exam Tips (CLF-C02)
- Remember Redshift = Data Warehouse.
- Best for OLAP (Online Analytical Processing) workloads.
- Uses columnar storage and MPP architecture.
- Good for BI and big data analytics.