Platform features
Data management and analytics functions | Projects & components | Cloudera Private Cloud Base Edition | Enterprise Data Hub | HDP Enterprise Plus |
Distributed batch processing of large data sets | Apache Hadoop | |||
Database for structured data storage of large tables | Apache HBase +conn, +indx | |||
Data warehouse summarization & ad hoc querying | Apache Hive | |||
Metadata store for Hive tables | Hive Metastore (HMS) | |||
Workflow scheduler to manage Hadoop jobs | Apache Oozie | |||
Columnar storage format for Hadoop ecosystem | Apache Parquet | |||
Fast compute engine for ETL, ML, stream processing | Apache Spark | |||
Bulk data transfer between Hadoop and structured datastores | Apache Sqoop | |||
Job scheduling and cluster resource management | YARN | |||
Coordination service for distributed applications | Apache ZooKeeper | |||
Store and manage large data sets across a cluster | Apache Accumulo | |||
Metadata management, governance & data catalog | Apache Atlas | |||
OLTP and real-time SQL access of large datasets | Apache Phoenix | |||
Manage data security across the Hadoop ecosystem | Apache Ranger | |||
Smallest, fastest columnar storage for Hadoop | Apache ORC | |||
Data-flow framework for batch and interactive use cases | Apache Tez | |||
Fast analytical queries on event-driven data | Apache Druid | |||
Perimeter security governing access to Hadoop | Apache Knox | |||
Easy interaction with Spark clusters via REST interface | Apache Livy | |||
Cryptographic key management | Ranger KMS | |||
Notebook for interactive analytics | Apache Zeppelin | |||
Data serialization system | Apache Avro | |||
Manage and control Hadoop ecosystem functions | Cloudera Manager | |||
SQL workbench for data warehouses | Hue | |||
Distributed MPP SQL query engine for Hadoop | Apache Impala | |||
Cryptographic key management | Key Trustee Server | |||
Column-oriented data store for fast data analytics | Apache Kudu | |||
Enterprise search platform | Apache Solr | |||
Key Trustee Server hardware security integration | Key HSM | |||
Transparently encrypts and secures data at rest | Navigator Encrypt | |||
Real-time streaming data pipelines and apps | Apache Kafka | |||
Distributed object store for Hadoop | Apache Ozone | |||
Streams Messaging for data ingestion and buffering | Apache Kafka | |||
Monitoring and management of Kafka clusters | Streams Messaging Manager | |||
Replication of cross-cluster Kafka data | Streams Replication Manager | |||
Integrate with data sources from Kafka | Kafka Connect | |||
Governance and management of metadata and schemas | Schema Registry | |||
Auto-balancing of Kafka clusters | Cruise Control | |||
Light-weight stream processing engine for Kafka | Kafka Streams | |||
High-performance format for huge analytic tables | Apache Iceberg | |||
Disaster Recovery & Backups | Iceberg Replication |