KNIME

Developer(s): KNIME
Stable release: 5.2 / December 6, 2023[1]
Written in: Java
Operating system: Linux, macOS, Windows
Available in: English
Type: Guided Analytics / Enterprise Reporting / Business Intelligence / Data Mining / Deep Learning / Data Analysis / Text Mining / Big Data
License: GNU General Public License
Website: www.knime.com

KNIME (/naɪm/), the Konstanz Information Miner,[2] is a free and open-source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining "Building Blocks of Analytics" concept. A graphical user interface and the use of JDBC allow users to assemble nodes that blend different data sources, including preprocessing (ETL: Extraction, Transformation, Loading), for modeling, data analysis and visualization with little or no programming.
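
The idea of chaining small processing nodes can be illustrated outside KNIME with a short sketch. The Python below is not KNIME's API and says nothing about how KNIME is implemented; the names `blend`, `drop_incomplete`, `add_total` and `run_pipeline`, as well as the sample rows, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Node = Callable[[List[Row]], List[Row]]  # a node maps a table to a table


def blend(*sources: List[Row]) -> List[Row]:
    """Concatenate rows from several sources into one table."""
    return [dict(row) for source in sources for row in source]


def drop_incomplete(rows: List[Row]) -> List[Row]:
    """Transformation step: discard rows that contain a missing value."""
    return [row for row in rows if all(value is not None for value in row.values())]


def add_total(rows: List[Row]) -> List[Row]:
    """Transformation step: derive a `total` column from `price` and `qty`."""
    return [{**row, "total": row["price"] * row["qty"]} for row in rows]


def run_pipeline(rows: List[Row], nodes: List[Node]) -> List[Row]:
    """Feed the output of each node into the next one, in order."""
    for node in nodes:
        rows = node(rows)
    return rows


# Two made-up sources standing in for, say, a file and a database query.
file_rows = [{"price": 2.0, "qty": 3}]
db_rows = [{"price": None, "qty": 1}, {"price": 1.5, "qty": 2}]

result = run_pipeline(blend(file_rows, db_rows), [drop_incomplete, add_total])
```

Because every node has the same table-in, table-out shape, nodes can be reordered or swapped without touching the others, which is the property a visual workflow editor relies on.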

Since 2006, KNIME has been used in pharmaceutical research;[3] it is also used in other areas such as CRM customer data analysis, business intelligence, text mining and financial data analysis. More recently, attempts have been made to use KNIME as a robotic process automation (RPA) tool.[4]

KNIME's headquarters are based in Zurich, with additional offices in Konstanz, Berlin, and Austin (USA).

History

Development of KNIME as an open-source platform began in January 2004 with a team of software engineers at the University of Konstanz. The original developer team, headed by Michael Berthold, came from a company in Silicon Valley that provided software for the pharmaceutical industry. The initial goal was to create a modular, highly scalable and open data processing platform that allowed easy integration of different data loading, processing, transformation, analysis and visual exploration modules without focusing on any particular application area. The platform was intended to be a collaboration and research platform and also to serve as an integration platform for various other data analysis projects.[5]

The first version of KNIME was released in 2006. Several pharmaceutical companies started using it, and a number of life science software vendors began integrating their tools into KNIME.[6][7][8][9][10] Later that year, after an article in the German magazine c't,[11] users from a number of other areas[12][13] adopted it as well. As of 2012, KNIME was in use by over 15,000 actual users (counting not downloads but users who regularly retrieve updates when they become available), not only in the life sciences but also at banks, publishers, car manufacturers, telcos, consulting firms, and various other industries, as well as at a large number of research groups worldwide. Recent updates to KNIME Server and KNIME Big Data Extensions provide support for Apache Spark 2.3, Parquet and HDFS-type storage.

For the sixth year in a row, KNIME has been named a leader in Gartner's Magic Quadrant for Data Science and Machine Learning Platforms.

A screenshot of KNIME

Internals

KNIME allows users to visually create data flows (or pipelines), selectively execute some or all analysis steps, and later inspect the results and models using interactive widgets and views. KNIME is written in Java and based on Eclipse. It uses an extension mechanism to add plugins that provide additional functionality. The core version already includes hundreds of modules for data integration (file I/O, database nodes supporting all common database management systems through JDBC or native connectors: SQLite, MS-Access, SQL Server, MySQL, Oracle, PostgreSQL, Vertica and H2), data transformation (filter, converter, splitter, combiner, joiner) as well as the commonly used methods of statistics, data mining, analysis and text analytics. Visualization is supported through the free Report Designer extension. KNIME workflows can be used as data sets to create report templates that can be exported to document formats such as doc, ppt, xls, pdf and others. Other capabilities of KNIME are:

  • KNIME's core architecture allows processing of large data volumes that are limited only by the available hard disk space (not by the available RAM). For example, KNIME allows analysis of 300 million customer addresses, 20 million cell images and 10 million molecular structures.
  • Additional plugins allow the integration of methods for text mining, image mining, time series analysis and network analysis.
  • KNIME integrates various other open-source projects, e.g., machine learning algorithms from Weka, H2O.ai, Keras, Spark, the R project and LIBSVM; as well as plotly, JFreeChart, ImageJ, and the Chemistry Development Kit.[14]
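
The first point in the list, data volumes bounded by disk rather than RAM, is commonly achieved by streaming rows from storage instead of loading a whole table at once. The article does not describe KNIME's internal mechanism, so the Python below is only a generic illustration of that technique; `stream_rows`, `count_by_key` and the column names used with them are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, Iterator


def stream_rows(path: str) -> Iterator[Dict[str, str]]:
    """Yield one CSV row at a time, so the file is never held in memory whole."""
    with open(path, newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)


def count_by_key(rows: Iterable[Dict[str, str]], key: str) -> Counter:
    """Count rows per distinct value of `key`.

    Memory use grows with the number of distinct values,
    not with the number of rows read.
    """
    counts: Counter = Counter()
    for row in rows:
        counts[row[key]] += 1
    return counts
```

A call such as `count_by_key(stream_rows("addresses.csv"), "city")` (file and column names made up) would then work on a file larger than main memory, as long as the number of distinct cities stays small.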

Although KNIME is implemented in Java, it allows wrappers that call other code and provides nodes that run Java, Python, R, Ruby and other code fragments.
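
A wrapper of this kind typically hands data to an external program and reads the result back. The sketch below shows that general pattern in Python using the standard `subprocess` module; it is not KNIME's node API, and `run_external`, `DOUBLE_SCRIPT` and `double_with_external` are hypothetical names. Exchanging data as JSON over standard input and output is an assumption made for the example.

```python
import json
import subprocess
import sys
from typing import List


def run_external(command: List[str], payload: list) -> list:
    """Send `payload` as JSON on stdin to `command` and parse JSON from its stdout."""
    completed = subprocess.run(
        command,
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool exits with an error
    )
    return json.loads(completed.stdout)


# Stand-in for an external tool: a one-line script that doubles each number.
DOUBLE_SCRIPT = (
    "import json, sys; "
    "print(json.dumps([x * 2 for x in json.load(sys.stdin)]))"
)


def double_with_external(values: List[int]) -> List[int]:
    """Wrap the stand-in tool by running it in a separate interpreter process."""
    return run_external([sys.executable, "-c", DOUBLE_SCRIPT], values)
```

The wrapper only needs to know the command line and the exchange format, so the wrapped tool can be written in any language.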

License

As of version 2.1, KNIME is released under the GPLv3 license, with an exception that allows others to use the well-defined node API to add proprietary extensions.[15] This also allows commercial software vendors to add wrappers that call their tools from KNIME.

KNIME Courses

Although data science usually assumes programming skills, KNIME allows data analysts to practice data science without them. For learners, KNIME provides two lines of online courses, one on data wrangling and one on data science.[16]

See also

  • Weka – machine-learning algorithms that can be integrated in KNIME
  • ELKI – data mining framework with many clustering algorithms
  • Keras – neural network library
  • Orange – an open-source data visualization, machine learning and data mining toolkit with a similar visual programming front-end
  • List of free and open-source software packages

References

  1. "What's New in KNIME Analytics Platform 5.2". knime.com.
  2. Berthold, Michael R.; Cebron, Nicolas; Dill, Fabian; Gabriel, Thomas R.; Kötter, Tobias; Meinl, Thorsten; Ohl, Peter; Thiel, Kilian; Wiswedel, Bernd (16 November 2009). "KNIME - the Konstanz information miner" (PDF). ACM SIGKDD Explorations Newsletter. 11 (1): 26. doi:10.1145/1656274.1656280. S2CID 408188.
  3. Tiwari, Abhishek; Sekhar, Arvind K.T. (October 2007). "Workflow based framework for life science informatics". Computational Biology and Chemistry. 31 (5–6): 305–319. doi:10.1016/j.compbiolchem.2007.08.009. PMID 17931570.
  4. "KNIME Analytics Platform Bot".
  5. "Open for Innovation". KNIME.com.
  6. Tripos, Inc. Archived 2011-07-17 at the Wayback Machine
  7. Schrödinger Archived 2009-09-25 at the Wayback Machine
  8. ChemAxon Archived 2011-07-17 at the Wayback Machine
  9. NovaMechanics Ltd.
  10. Treweren Consultants
  11. Datenbank-Mosaik Data Mining oder die Kunst, sich aus Millionen Datensätzen ein Bild zu machen, c't 20/2006, p. 164ff, Heise Verlag.
  12. Forum on the KNIME website
  13. "Pervasive". Archived from the original on 2010-08-29. Retrieved 2010-12-07.
  14. Beisken, S.; Meinl, T.; Wiswedel, B.; De Figueiredo, L. F.; Berthold, M.; Steinbeck, C. (2013). "KNIME-CDK: Workflow-driven Cheminformatics". BMC Bioinformatics. 14: 257. doi:10.1186/1471-2105-14-257. PMC 3765822. PMID 24103053.
  15. KNIME 2.1.0 released Archived 2010-04-17 at the Wayback Machine
  16. the new learning path[permanent dead link]

External links

  • KNIME Homepage
  • KNIME Hub - Official community platform to search and find nodes, components, workflows and collaborate on new solutions
  • Nodepit - KNIME node collection supporting versioning and node installation