Redshift

aws/database aws/service

💡 Definition

Amazon Redshift is a fully managed, petabyte-scale data warehouse service for running complex analytic queries against structured data with standard SQL. It is designed to return results quickly, often in seconds, even on large datasets.

🔑 Key Concepts

Data Warehouse: Optimized for analytical processing (OLAP), not transactional processing (OLTP).
Columnar Storage: Stores data in a columnar format, which is efficient for analytical queries.
Massively Parallel Processing (MPP): Distributes queries across multiple nodes for faster execution.
Managed Service: AWS handles provisioning, patching, backups, and other operational tasks; you choose the node type and cluster size and can resize when needed.
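
To make the columnar storage and MPP ideas above concrete, here is a toy sketch in plain Python. It does not use Redshift at all: the sample rows, the `distribute` helper, and the choice of two slices are all made up for illustration. It only shows why reading a single column is cheap and how per-node partial results can be merged into one answer.

```python
from collections import defaultdict

# Toy sales rows: (order_id, region, amount). All values are made up.
ROWS = [
    (1, "eu", 120.0),
    (2, "us", 80.0),
    (3, "eu", 45.5),
    (4, "ap", 210.0),
    (5, "us", 15.0),
]


def to_columns(rows):
    """Pivot row tuples into one list per column (a columnar layout)."""
    order_ids, regions, amounts = zip(*rows)
    return {
        "order_id": list(order_ids),
        "region": list(regions),
        "amount": list(amounts),
    }


def distribute(rows, num_slices):
    """Assign each row to a slice by order_id (hypothetical distribution key)."""
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        slices[row[0] % num_slices].append(row)
    return slices


def parallel_sum_by_region(slices):
    """Each slice aggregates its own rows; a coordinator merges the partials."""
    partials = []
    for rows in slices:  # on a real cluster these would run concurrently
        local = defaultdict(float)
        for _, region, amount in rows:
            local[region] += amount
        partials.append(local)

    merged = defaultdict(float)
    for local in partials:
        for region, subtotal in local.items():
            merged[region] += subtotal
    return dict(merged)


columns = to_columns(ROWS)
# Columnar benefit: the sum touches only the "amount" list, not whole rows.
total_amount = sum(columns["amount"])

# MPP idea: split the work across slices, then combine partial results.
totals_by_region = parallel_sum_by_region(distribute(ROWS, num_slices=2))
```

The loop over slices runs sequentially here; the point is only the split, aggregate, merge shape of the work.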

⚙️ How it Works

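You launch a Redshift cluster, which consists of one or more compute nodes. In a multi-node cluster, a leader node receives each query, plans it, and coordinates the compute nodes that execute it. You load your data (often from S3) and then run SQL queries to analyze large datasets.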
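
As a rough sketch of the load-then-query flow described above, the Python below builds a Redshift `COPY` statement for loading from S3 and pairs it with an analytic query. The helper `build_copy_statement`, the table `sales`, the bucket `example-bucket`, and the IAM role ARN are all hypothetical placeholders. The code only builds SQL strings; actually running them would need a SQL client connected to your cluster.

```python
def build_copy_statement(table, s3_uri, iam_role_arn, data_format="PARQUET"):
    """Return a Redshift COPY statement that loads `table` from `s3_uri`.

    Inputs are interpolated without escaping, so this sketch is only
    suitable for trusted, hard-coded values.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with 's3://'")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {data_format};"
    )


# Hypothetical table, bucket, and role names, for illustration only.
copy_sql = build_copy_statement(
    table="sales",
    s3_uri="s3://example-bucket/sales/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)

# A typical OLAP-style query to run once the data is loaded.
analytics_sql = (
    "SELECT region, SUM(amount) AS revenue\n"
    "FROM sales\n"
    "GROUP BY region\n"
    "ORDER BY revenue DESC;"
)

print(copy_sql)
print(analytics_sql)
```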

🎯 Use Cases

Business Intelligence (BI): Analyzing sales data, customer behavior, and operational metrics.
Big Data Analytics: Performing complex queries on very large datasets.
Reporting: Generating reports from consolidated data.

💰 Pricing Model

Compute Nodes: Charged per hour per node, based on node type.
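Storage: Included in the node price for node types with local storage (such as DC2). RA3 nodes use Redshift Managed Storage, which is billed separately per GB-month.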
Data Transfer: Standard AWS data transfer rates apply. Inbound transfer is generally free, while outbound transfer to the internet or to other Regions is charged.
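
The per-hour, per-node compute charge can be turned into a rough monthly estimate with simple arithmetic. In the sketch below, the hourly rate of 0.25 USD and the four-node cluster are invented numbers, not real Redshift prices, and 730 hours per month is a common approximation. The function name is hypothetical, and the estimate covers compute only, not storage or data transfer.

```python
HOURS_PER_MONTH = 730  # common approximation: 8760 hours per year / 12


def estimate_monthly_compute_cost(node_count, hourly_rate_usd, hours=HOURS_PER_MONTH):
    """Estimate on-demand compute cost: nodes x hourly rate x hours running."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if hourly_rate_usd < 0 or hours < 0:
        raise ValueError("rate and hours must be non-negative")
    return node_count * hourly_rate_usd * hours


# Hypothetical cluster: 4 nodes at a made-up rate of 0.25 USD per node-hour.
estimate = estimate_monthly_compute_cost(node_count=4, hourly_rate_usd=0.25)
print(f"Estimated compute cost: {estimate:.2f} USD per month")
```

Check the current Redshift pricing page for real rates before using numbers like these.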

📝 Exam Tips (CLF-C02)

Remember Redshift = Data Warehouse.
Best for OLAP (Online Analytical Processing) workloads.
Uses columnar storage and MPP architecture.
Good for BI and big data analytics.

See Also: RDS, DynamoDB, S3