Overview
Deploy a broad range of analytics in the public cloud quickly and easily.
Cloudera Data Hub is a powerful analytics service for Cloudera on cloud that makes it easier and faster to achieve high-value analytics from the Edge to AI in a familiar cluster model in the cloud. Featuring the widest range of analytical workloads—including streaming, ETL, data marts, databases, and machine learning—Data Hub lets you easily move existing workloads from on premises to the cloud or build directly in the cloud.
The comprehensive, cloud-based solution is powered by Cloudera Runtime, a suite of integrated open source technologies, and built on SDX. It offers extensive choices in cluster shapes, workload types, pre-built templates, and configuration options, delivering an intuitive, customizable experience for users who are comfortable with traditional architectures.
Data Hub use cases
Simplify your journey to cloud
Easily lift and shift on-premises Cloudera workloads to the public cloud thanks to a platform that spans both public and private clouds and provides:
- The improved performance, robust governance, and availability of public cloud
- The flexibility to optimize your workloads in both deployment models
- A familiar form factor, with a traditional cluster model that eases your move to the cloud
- A seamless migration path to Cloudera's containerized experiences
Deploy complex multi-analytic workloads quickly
Speed up the deployment of complex workloads in the public cloud across the data lifecycle with:
- A cloud-based architecture that lets you deploy a wide variety of flexible, custom analytics workloads
- An intuitive experience built on familiar node-based clusters, whether you choose a templated approach or build your own workloads
- A high degree of customization, allowing you to deploy workloads tailor-made for your specific business requirements
Select workloads
Real-time data mart
Enable analytics on high volumes of fast-arriving data.
The Real-time Data Mart template in Data Hub lets you ingest millions of records per second, with in-place updates as needed. The data is immediately available in a format optimized for querying. This pattern is ideal for time-series applications, event analytics, change data capture (CDC) reconciliation, and real-time data processing pipelines. The template features the Apache Kudu analytic storage engine, Apache Impala for fast SQL execution, Hue for SQL development and analysis, and Apache Spark Streaming for stream processing and analytics.
Data engineering for complex pipelines
Enrich, transform, and load data.
Data Hub enables you to enrich, transform, and cleanse data, and to create, execute, and manage end-to-end data pipelines with a high degree of flexibility and customization. The Data Engineering template lets you run a wide range of data processing workloads, including batch and real-time stream processing, using Apache Spark and Apache Hive.
Streaming on hybrid cloud
Collect, process, and build real-time analytics.
DataFlow for Data Hub is a comprehensive edge-to-cloud streaming data platform that addresses streaming data challenges across hybrid environments with Apache NiFi and Apache Kafka. It lets users extend the on-premises streaming experience of Cloudera DataFlow to the cloud without spending enormous resources to develop, configure, and maintain it.
Operational database
Build highly reliable enterprise-class applications.
Data Hub allows you to run high-performance NoSQL databases with support for ANSI SQL, powered by Apache HBase. This provides the scale and performance that business-critical operational applications require. Operational Database provides evolutionary schema support, which lets developers leverage the power of data while preserving flexibility in application design. It also auto-scales based on the cluster's workload utilization to optimize infrastructure use and cost.