Overview

Deploy a broad range of analytics in the public cloud quickly and easily.

Cloudera Data Hub is an analytics service for Cloudera on cloud that makes it easier and faster to run high-value analytics, from the Edge to AI, on a familiar cluster model. It supports a wide range of analytical workloads, including streaming, ETL, data marts, databases, and machine learning, so you can move existing workloads from on premises to the cloud or build directly in the cloud.

The comprehensive, cloud-based solution is powered by Cloudera Runtime, a suite of integrated open source technologies, and built on SDX. It offers extensive choices in cluster shapes, workload types, pre-built templates, and configuration options, delivering an intuitive, customizable experience for users who are comfortable with traditional architectures.

Data Hub use cases

Simplify your journey to cloud

Easily lift and shift on-premises Cloudera workloads to the public cloud thanks to a platform that spans both public and private clouds and provides:

  • The improved performance, robust governance, and availability of public cloud
  • The flexibility to optimize your workloads in both deployment models
  • A familiar form factor, with a traditional cluster model that eases your move to the cloud
  • A seamless migration path to Cloudera's containerized experiences

Deploy complex multi-analytic workloads quickly

Speed up the deployment of complex workloads in the public cloud across the data lifecycle with: 

  • A cloud-based architecture that lets you deploy a wide variety of flexible, custom analytics workloads 
  • An intuitive experience built on familiar node-based clusters, whether you choose a templated approach or build your own workloads
  • A high degree of customization, allowing you to deploy workloads tailor-made for your specific business requirements

Real-time data mart


Enable analytics on high volumes of fast-arriving data. 

The Real-time Data Mart template in Data Hub lets you ingest millions of records per second, with in-place updates as needed. The data is immediately available in an optimal format for querying. This pattern is ideal for time-series applications, event analytics, CDC reconciliation, and real-time data processing pipelines. The template features the Apache Kudu analytic storage engine, Apache Impala for fast SQL execution, Hue for SQL development and analysis, and Apache Spark Streaming for stream processing and analytics.
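The "in-place updates" pattern described above can be illustrated with a minimal local sketch. It assumes nothing about the template itself: Python's built-in `sqlite3` module stands in for the cluster's storage and SQL engines, and the table name, column names, and `upsert_readings` helper are all hypothetical. It only shows the semantics of writing a row by primary key so that a late correction overwrites the earlier value and is visible to the next query.

```python
import sqlite3

# Local stand-in for an updatable analytic table (not Kudu or Impala).
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensor_readings ("
    " sensor_id TEXT, ts INTEGER, value REAL,"
    " PRIMARY KEY (sensor_id, ts))"
)


def upsert_readings(rows):
    """Insert new readings; replace the row when the (sensor_id, ts) key exists."""
    conn.executemany(
        "INSERT OR REPLACE INTO sensor_readings (sensor_id, ts, value) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


# Initial batch of fast-arriving events.
upsert_readings([("a", 1, 20.5), ("a", 2, 21.0), ("b", 1, 18.2)])

# A late correction for an existing key updates the row in place.
upsert_readings([("a", 2, 21.4)])

# The corrected value is available to the very next query.
readings = conn.execute(
    "SELECT sensor_id, ts, value FROM sensor_readings ORDER BY sensor_id, ts"
).fetchall()
print(readings)
```

The row count stays at three after the second call, which is the property that makes this pattern useful for CDC reconciliation: replayed or corrected events do not create duplicates.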

 

Data engineering for complex pipelines


Enrich, transform, and load data. 

Data Hub lets you enrich, transform, and cleanse data so that you can create, execute, and manage end-to-end data pipelines with a high degree of flexibility and customization. The Data Engineering template supports a wide range of data processing workloads, including batch and real-time stream processing, using Apache Spark and Apache Hive.

Streaming on hybrid cloud



Collect and process streaming data, and build real-time analytics.

DataFlow for Data Hub is an edge-to-cloud streaming data platform that addresses streaming data challenges across hybrid environments with Apache NiFi and Apache Kafka. It lets users extend the on-premises streaming experience of Cloudera DataFlow to the cloud without committing enormous resources to developing, configuring, and maintaining that environment.


Operational database


Build highly reliable enterprise-class applications. 

Data Hub allows you to run high-performance NoSQL databases with support for ANSI SQL. Built on Apache HBase, this provides scale and performance for business-critical operational applications. Operational Database provides evolutionary schema support, which lets developers take advantage of their data while preserving flexibility in application design. It also auto-scales based on the workload utilization of the cluster to optimize infrastructure utilization and cost.
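To make the idea of utilization-based auto-scaling concrete, here is a minimal, purely illustrative sketch of a threshold policy. It is not Cloudera's algorithm: the `ScalingPolicy` fields, the thresholds, and the `target_workers` function are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy; every field and default is an assumption."""
    min_workers: int = 3
    max_workers: int = 20
    scale_up_above: float = 0.80    # add workers above 80% utilization
    scale_down_below: float = 0.30  # remove workers below 30% utilization
    step: int = 2                   # workers added or removed per decision


def target_workers(current: int, utilization: float,
                   policy: ScalingPolicy = ScalingPolicy()) -> int:
    """Return the desired worker count for a utilization in [0.0, 1.0]."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    if utilization > policy.scale_up_above:
        desired = current + policy.step
    elif utilization < policy.scale_down_below:
        desired = current - policy.step
    else:
        desired = current
    # Clamp to the configured bounds so the cluster never over- or under-shoots.
    return max(policy.min_workers, min(policy.max_workers, desired))


print(target_workers(5, 0.90))   # busy cluster: grow by one step
print(target_workers(5, 0.10))   # idle cluster: shrink, but not below the minimum
print(target_workers(5, 0.50))   # within the band: no change
```

The gap between the two thresholds is a deliberate design choice in this sketch: it keeps the cluster from oscillating between sizes when utilization hovers near a single cutoff.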

Key features

Data Hub is for users who want flexibility, scalability, and ease of use. It allows you to rearrange worker roles, configure GPU support, adjust resource management settings, and tune clusters to implement complex, multi-function analytics use cases at scale.

Data Hub clusters can be provisioned and disposed of quickly with pre-built or custom configuration options for infrastructure. Pre-configured cluster definitions with cloud provider-specific settings and cluster templates with Cloudera Runtime service configurations allow you to quickly provision workload clusters for prescriptive use cases. You can also save your own cluster definitions and templates for future reuse.

Data Hub enables you to easily move your legacy workloads in a familiar form factor to a cloud model. The cloud-based architecture decouples data from the compute infrastructure, and the data delivery layer is abstracted from raw data. This decoupled architecture significantly improves flexibility, agility, data protection, and scale.

It’s easy to provision multiple clusters on shared data, so customers can launch new applications that can be fully isolated with the right security and governance and without interrupting existing production applications.

Data Hub is underpinned by Cloudera SDX, which lets you secure and govern platform data and metadata and manage those controls through dedicated, integrated interfaces. Data security, governance, and control policies are set once and enforced consistently everywhere, reducing operational costs and business risk while preserving infrastructure choice and flexibility.

Data Hub is built on Cloudera Runtime, the core open source software distribution within Cloudera's platform, which includes approximately 50 open source projects. Runtime lets you choose the right set of open source tools to build your workloads and applications.

Ready to take a deeper look?


Experience Cloudera Data Hub for yourself

Datasheet

CloudSmart: Get started on your cloud analytics journey

Webinar

Scale analytics with confidence in the public cloud

Ebook

3 steps to successfully migrate to public cloud

Solution Brief

Drive better health outcomes with Cloudera and IQVIA
