Apache Kylin

From Wikipedia, the free encyclopedia
Apache Kylin
Developer(s): Apache Kylin Committee
Initial release: June 10, 2015[1]
Stable release:
  3.x: 3.1.3 / 5 January 2022[2]
  4.x: 4.0.1 / 5 January 2022[2]
Repository: Kylin Repository
Written in: Java
License: Apache License 2.0
Website: kylin.apache.org

Apache Kylin is an open-source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop and Alluxio, supporting extremely large datasets.

It was originally developed by eBay, and is now a project of the Apache Software Foundation.[3]

History

The Kylin project was started in 2013 at eBay's R&D in Shanghai, China. In October 2014, Kylin v0.6 was open-sourced on github.com under the name "KylinOLAP".[4]

In November 2014, Kylin joined the Apache Software Foundation incubator.

In December 2015, Apache Kylin graduated to become a Top-Level Project.[3]

In March 2016, Kyligence, Inc. was founded by the creators of Apache Kylin.[5][6] Kyligence provides a commercial analytics platform based on Apache Kylin for on-premises and cloud-based datasets.[7]

Architecture

Apache Kylin is built on top of Apache Hadoop, Apache Hive, Apache HBase, Apache Parquet, Apache Calcite, Apache Spark and other technologies.[8] These technologies enable Kylin to easily scale to support massive data loads.[9]

Kylin has the following core components:[10][8]

  • REST Server: Receives and responds to user or API requests;
  • Metadata: Persists and manages system metadata, especially the cube metadata;
  • Query Engine: Parses SQL queries into an execution plan, and then talks to the storage engine;
  • Storage Engine: Pushes down and scans the underlying cube storage (HBase by default);
  • Job Engine: Generates and executes MapReduce or Spark jobs to build source data into cubes.
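
As a rough illustration of how the Job Engine and Query Engine roles described above relate to each other, the following Python sketch pre-aggregates a small fact table into a cube and then answers grouped queries from the pre-aggregated results instead of rescanning the source rows. It is not Kylin code: the sample rows, the function names `build_cube` and `query`, and the in-memory dictionary standing in for cube storage are all hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: two dimensions (region, category), one measure (sales).
ROWS = [
    {"region": "APAC", "category": "books", "sales": 10},
    {"region": "APAC", "category": "toys", "sales": 5},
    {"region": "EMEA", "category": "books", "sales": 7},
    {"region": "EMEA", "category": "toys", "sales": 3},
]
DIMENSIONS = ("region", "category")


def build_cube(rows, dimensions, measure):
    """Job Engine role (sketch): pre-aggregate SUM(measure) for every
    subset of the dimensions. Each subset is one 'cuboid'."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                cuboid[key] += row[measure]
            cube[dims] = dict(cuboid)
    return cube


def query(cube, dimensions, group_by):
    """Query Engine role (sketch): pick the cuboid matching the GROUP BY
    columns and return it, without touching the source rows."""
    dims = tuple(d for d in dimensions if d in group_by)
    return cube[dims]


cube = build_cube(ROWS, DIMENSIONS, "sales")
print(query(cube, DIMENSIONS, ["region"]))  # totals per region
print(query(cube, DIMENSIONS, []))          # grand total
```

The sketch materializes every combination of dimensions, so a table with n dimensions yields 2^n cuboids. This is the trade-off behind cube-based OLAP in general: build time and storage are spent up front so that queries become lookups.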

Users

Apache Kylin has been adopted by many companies as their production OLAP platform. Typical users include eBay, Meituan, XiaoMi, NetEase, Beike, and Yahoo! Japan.

Roadmap

The Apache Kylin roadmap, as listed on the Kylin website:[11]

  • Hadoop 3.0 support (Erasure Coding) - completed (v2.5)
  • Cube engine running fully on Spark - completed (v2.5)
  • Connect more data sources (MySQL, Oracle, SparkSQL, etc.) - completed (v2.6)
  • Real-time analytics with Lambda Architecture - completed (v3.0)
  • Cloud-native storage (Parquet) - in progress (v4.0.0-alpha)
  • Ad hoc queries without Cubing

References

  1. ^ "Previous Release". v0.7.1-incubating (First Apache Release). Retrieved 15 June 2019.
  2. ^ a b "Apache Kylin - Release Notes". Retrieved 27 September 2022.
  3. ^ a b "The Apache Software Foundation Announces Apache Kylin as a Top-Level Project". Apache Software Foundation. 8 December 2015.
  4. ^ "Announcing Kylin: Extreme OLAP Engine for Big Data". www.ebayinc.com. 2014-10-20. Retrieved 2018-11-08.
  5. ^ "Apache Kylin Through the Eyes of the Founders - Part One". Kyligence. 2020-06-12. Retrieved 2020-09-30.
  6. ^ "Big Data Analytics Platform | Learn More About Kyligence". Kyligence. Retrieved 2020-09-30.
  7. ^ "Big Data Analytics Platform: Apache Kylin vs. Kyligence". Kyligence. Retrieved 2020-09-30.
  8. ^ a b "Apache Kylin | Analytical Data Warehouse for Big Data". kylin.apache.org. Retrieved 2020-09-30.
  9. ^ Knorr, Eric (2016-03-07). "What eBay looks like under the hood". InfoWorld. Retrieved 2020-09-30.
  10. ^ "Apache Kylin Adds Real-time OLAP". www.i-programmer.info. Retrieved 2020-09-30.
  11. ^ "Apache Kylin | Development Quick Guide". kylin.apache.org. Retrieved 2020-09-30.