Cloudera Launches Kudu, New Hadoop Storage for Fast Analytics on Fast Data

PALO ALTO, Calif., Sept. 28, 2015 (GLOBE NEWSWIRE) — Cloudera, the leader in enterprise analytic data management powered by Apache Hadoop™, announced today the public beta release of Kudu, a revolutionary new columnar store for Hadoop that enables the powerful combination of fast analytics on fast data. Complementing the existing Hadoop storage options, HDFS and Apache HBase, Kudu is the first native Hadoop storage engine that supports both low-latency random access and high-throughput analytics, dramatically simplifying Hadoop architectures for increasingly common real-time use cases. A public beta of Kudu is immediately available under the Apache open source license, and will be transitioned to the Apache Software Foundation incubator in the future.

Until now, developers have been forced to make a choice between fast analytics with HDFS or efficient updates with HBase. Especially with the rise of streaming data, there has been a growing demand for combining the two features to build real-time analytic applications on changing data – leading developers to create complex architectures with the storage options available. Kudu complements the capabilities of HDFS and HBase, providing simultaneous fast inserts and updates and efficient columnar scans. This powerful combination enables real-time analytic workloads with a single storage layer, eliminating the need for complex architectures.

“We’ve been making Hadoop better since the very beginning,” said Charles Zedlewski, vice president, products, Cloudera. “We have an ambitious mission: to constantly drive innovation within the community to usher in the next generation of analytics supported by Hadoop, so companies can adapt to the latest technologies. Cloudera has already transformed what’s possible with Hadoop — enabling interactive data discovery and analytics with Impala and flexible data processing and streaming with Apache Spark. Kudu continues this trend by revolutionizing Hadoop’s storage architecture to better support development of real-time analytic applications, and serves as a crucial step towards solidifying Hadoop as the leading platform for modern analytics.”

Kudu’s architecture streamlines the developer experience for building analytic applications – supporting common use cases that include time series analysis, machine data analytics, and online reporting. Additionally, Kudu is designed to take advantage of changing trends in hardware and in-memory processing. It delivers outstanding CPU performance, takes advantage of RAM and flash, and drives high I/O efficiency as a true columnar store. Finally, as a native, open component within Hadoop, Kudu is integrated with and provides faster query performance for the most powerful analytic frameworks. Users already rely upon many of them, including Impala and Spark – for end-to-end analytic applications in a single platform.

Kudu was jointly engineered by Cloudera and Intel in advance of the changing hardware landscape. Intel has actively contributed to Kudu to help it take full advantage of current and future Intel processor and memory technologies. Kudu was designed to use new persistent memory (pmem) innovations being developed through Intel’s pmem project.

“As Hadoop analytics evolve, it’s critical that they are designed with next-generation hardware in mind,” said Vin Sharma, Intel’s Director of Strategy & Products for Big Data Analytics. “Kudu is a critical milestone for Hadoop, supporting the growing need for simplified real-time applications. Intel worked with Cloudera and the community to ensure Kudu is optimized for fast analytic performance today, but is also built to use Intel’s platform advancements well into the future, such as Intel DIMMs with 3D XPoint memory.”

As an open source project, Kudu has drawn wide involvement from the community. Xiaomi, one of the largest smartphone developers in the world, has been one of the first beta users of Kudu and actively contributes to the project. Other organizations, including AtScale, Splice Machine and Zoomdata, have also been developing on Kudu.

“Xiaomi has been a long-time user of and contributor to the Hadoop ecosystem, using it to power a wide range of use cases across our business,” said Baoqiu Cui, Chief Architect at Xiaomi. “Our infrastructure team has been working with Cloudera to develop Kudu, taking advantage of its unique ability to support columnar scans and fast inserts and updates to continue to expand our Hadoop ecosystem footprint. Using Kudu, alongside interactive SQL tools like Impala, has allowed us to build a next-generation data analytics platform for real-time analytics and online reporting. We are excited to continue to work with the community to further drive Kudu and the capabilities of Hadoop as a whole.”

“Kudu effectively enables the next generation of analytic architectures, especially for Business Intelligence (BI). With its support for both random and sequential high volume reads and writes, it is the ideal storage system for low-latency, scale-out BI architectures of the kind that AtScale customers demand. As enterprises look to democratize access to data and enable Hadoop to run large-scale, fast analytical workloads, Kudu will play a critical role,” says Josh Klahr, VP of Product Management at AtScale. “As a strong proponent of the Apache Hadoop open source ecosystem, AtScale is to be a part of this community effort and is excited to help further develop it for our customers.”

“We are very excited to be part of the Kudu community,” said John Leach, co-founder and CTO, Splice Machine. “At Splice Machine, we have developed an ACID compliant RDBMS that runs on Hadoop and are pushing the envelope in terms of running mixed workloads on Hadoop. As a result, we welcome and support innovation in Hadoop’s storage architecture. Kudu holds incredible promise in its ability to handle real-time updates combined with long-running analytics. It strengthens the Hadoop ecosystem by providing a scalable, alternate storage engine that complements existing ones.”

“Kudu provides a simplified storage architecture for use cases that are quite common among Zoomdata users,” said Justin Langseth, CEO at Zoomdata. “As a native component of Hadoop, Kudu’s integration with Impala and Spark make it easy to open up this data using Zoomdata’s fast visual analytics solution. We have worked closely with Cloudera and the community to help develop Kudu to meet the needs of our users – supporting the streamlined combination of real-time and analytic applications – and we are excited to continue this effort with the public beta release.”

For organizations to continue to benefit from data-driven insights, Hadoop’s architecture has to work at the same, ever-accelerating speed at which data is being created and changed. With Kudu, the Hadoop community ushers in the next generation of Hadoop applications with storage for fast analytics on fast data.

“In the era of machine-generated data, there’s an increasing need to analyze data in human real-time. This is true across a broad range of analytic use cases, from monitoring and business intelligence to predictive modeling and recommendation,” said Curt Monash, president, Monash Research. “Kudu, Spark and the rest of the Hadoop stack are a promising approach toward eventually meeting those needs.”

Resources

Start Contributing – [http://getkudu.io/contributing.html]

Download the Public Beta or Try the VM – [www.cloudera.com/downloads]

Technical Whitepaper – [getkudu.io/kudu.pdf]

###

This information is not a commitment, promise or legal obligation to deliver any material, code or functionality. Cloudera does not guarantee that the beta software will be made generally available or that any individual feature in the beta version will be made generally available. Cloudera may make the beta software generally available, or not, in its sole discretion and without obligation to make any communication of any kind with regard to such availability.

About Cloudera

Cloudera is revolutionizing enterprise data management by offering the first unified Platform for big data, an enterprise data hub built on Apache Hadoop. Cloudera offers enterprises one place to store, access, process, secure, and analyze all their data, empowering them to extend the value of existing investments while enabling fundamental new ways to derive value from their data. Cloudera’s open source big data platform is the most widely adopted in the world, and Cloudera is the most prolific contributor to the open source Hadoop ecosystem. As the leading educator of Hadoop professionals, Cloudera has trained over 40,000 individuals worldwide. Over 1,800 partners and a seasoned professional services team help deliver greater time to value. Leading organizations in every industry plus top public sector organizations globally run Cloudera in production.

Connect with Cloudera

Read our blogs: http://www.cloudera.com/blog/ and http://vision.cloudera.com/

Visit us on Facebook: http://www.facebook.com/cloudera

Join the Cloudera Community: http://cloudera.com/community

Cloudera, Cloudera’s Platform for Big Data, Cloudera Enterprise Data Hub Edition, Cloudera Enterprise Flex Edition, Cloudera Enterprise Basic Edition and CDH are trademarks or registered trademarks of Cloudera Inc. in the United States, and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.

###

CONTACT: Deborah Wiltshire
         Cloudera
         +1 (650) 644-3900 ext. 5907
         [email protected]