_db_sql_bigdata_hive_spark_meta_collection

Hub · September 24, 2020, 4:17pm

A meta collection of KNIME and database related issues, dealing with basic SQL as well as Big Data related workflows with Hive/Impala and Spark (including PySpark) A meta collection of KNIME and database related issues, dealing with basic SQL as well as Big Data related workflows with Hive/Impala and Spark (including PySpark) KNIME as a powerful data anaylytics platform is able to connecct to all sorts of data bases and supports a lot of functions mainly by SQL and SQL-like languages. KNIME also supports big data platforms and related frameworks - like Spark and PySpark. There are myriads of database related topics and books and online ressources but ... What I try to do is present a collection of useful links and examples when you want to set out to use databases and KNIME abd hopefully by providing some sampel workflows that one could try out at home ease some quirks especially when it comes to dealing with big data systems and Spark (including PySpark). The 'normal' SQL examples will be with SQLite and H2 so you can easily try this at home (but they will work for other databases like Postgres or MySQL as well). The big data examples are based on KNIME's local big data environment - which is very useful if you happen not to have such a cluster at home. A good starting point in general is the official "KNIME Database Extension Guide" https://docs.knime.com/latest/db_extension_guide/index.html?u=mlauber71 => you can dive right into the examples and will see quick results, but sometimes it helps to read about the concept. ------------------------------- A - About SQL in general - a few links to refersh and learn Structured Query Language Basics https://www.nuwavesolutions.com/structured-query-language-basics/ SQL CASE WHEN - and other code in the sqltutorial https://www.sqltutorial.org/sql-case/ KNIME courses Lesson 4. Bringing Things Together - Join and Concatenation https://www.knime.com/self-paced-course/l1-dw-knime-analytics-platform-for-data-wranglers-basics/lesson4#join-concatenation?u=mlauber71 KNIME Nodeguide for Joining and Concatenating https://www.knime.com/nodeguide/etl-data-manipulation/joining-and-concatenating?u=mlauber71 ------------------------------- B - KNIME and SQL databases - going from basic to advanced Database - Simple IO (using standalone SQL-databse SQLite) https://hub.knime.com/knime/spaces/Examples/latest/01_Data_Access/02_Databases/01_Database_Simple_IO_Example?u=mlauber71 KNIME and SQLite - simple example (with the New DB nodes) https://hub.knime.com/mlauber71/spaces/Public/latest/sqlite_knime_40/t_001_sqlite_knime_40_db_nodes?u=mlauber71 H2 - use a SQL standalone Database from Scratch or from upload https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_h2_create_table_from_scratch?u=mlauber71 Databases - Advanced Usage (using SQLite) https://hub.knime.com/knime/spaces/Examples/latest/01_Data_Access/02_Databases/02_Database_Advanced_Example?u=mlauber71 ------------------------------- C - KNIME and Big Data KNIME can also handle Big Data systems and databases. So you can go from handling your local or small SQL-DB right up to big enterprise systems with (eg.) Cloudera. KNIME Big Data Extensions User Guide https://docs.knime.com/latest/bigdata_extensions_user_guide/index.html?u=mlauber71 School of Hive – everything you need to know to work with Hive tables on a Big Data system https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_hive_school_of?u=mlauber71 => if you want to understand the concepts of Big Data tables and partitions by doing it in code steps from scratch An overview of KNIME based functions to access big data systems (with KNIME's local big data environment) https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_bigdata_nodes/m_020_db_access_local_bigdata_tables?u=mlauber71 => see for yourself how the big data nodes work on your local computer KNIME and Hive - load multiple CSV files at once via external table https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_csv_loader/m_001_import_hive_csv?u=mlauber71 -- C+ Bonus Track (some more advanced big data functions with Hive like external tables demonstrated and partitions half-open code) work with Hive and external tables in CSV and Parquet KNIME and Hive - load multiple CSV files at once via external table https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_csv_loader/m_001_import_hive_csv?u=mlauber71 KNIME and Hive - load multiple Parquet files at once via external table https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_parquet_loader/m_001_import_hive_parquet?u=mlauber71 Some more functions with Hive like adding fields to Hive table https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_hive_add_column_db_40?u=mlauber71 Hive - how to handle missing tables in Hive? - us a try-if combination (THX to H. Stölting for the inspiration) https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_hive_missing_table_switch?u=mlauber71 D ------------------------------- KNIME and Spark Being Lazy is Useful — Lazy Evaluation in Spark https://medium.com/analytics-vidhya/being-lazy-is-useful-lazy-evaluation-in-spark-1f04072a3648 => you can do without and jump right into it, but it makes sense to understand this basic concept of Spark Comparison of Hive and Spark SQL - a gentle introductory example https://hub.knime.com/knime/spaces/Examples/latest/10_Big_Data/02_Spark_Executor/07_SparkSQL_meets_HiveQL?u=mlauber71 An overview of KNIME based functions to access big data systems - use it on your own big data system (including PySpark) https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_bigdata_nodes/m_120_db_access_bigdata_tables?u=mlauber71 => use the DB, Spark and PySpark nodes on your big data system -- going further with Spark Overview of Examples using Spark (and ML) with KNIME https://hub.knime.com/knime/spaces/Education/latest/Courses/L4-BD%20Introduction%20to%20Big%20Data%20with%20KNIME%20Analytics%20Platform/3_Spark/4_Examples/?u=mlauber71 Local Big Data Irish Meter https://hub.knime.com/knime/spaces/Examples/latest/10_Big_Data/02_Spark_Executor/09_Big_Data_Irish_Meter_on_Spark_only?u=mlauber71 Cleaning the NYC taxi dataset on Spark https://hub.knime.com/knime/spaces/Examples/latest/50_Applications/49_NYC_Taxi_Visualization/Data_Preparation?u=mlauber71 E ------------------------------- More database and SQL related stuff Tobias Kötter's KNIME hub space with a lot of DB related workflows https://hub.knime.com/tobias.koetter/spaces/Public/latest/DB/?u=mlauber71 If you want to transfer data between data bases without down- and uploading it all https://hub.knime.com/tobias.koetter/spaces/Public/latest/DB/DBStreamingDataTransfer?u=mlauber71 Microsoft Access is still a thing - and you can access it with KNIME https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_db_update_merge_ms_access?u=mlauber71 The full SQL can be found in this free eBook (not all SQL commands might work on all databases) https://goalkicker.com/SQLBook/

This is a companion discussion topic for the original entry at https://kni.me/w/5b-Kvx67vcvieGfC