Ka-Boom! Palladian 2.0

Palladian 2.0 for KNIME is here. This is our first major update since we introduced Palladian into the KNIME ecosystem nine years ago. It involved tweaking, fixing and improving existing nodes, replacing old nodes with updated ones, and adding a new node which we’d been working on for a while and of which we’re particularly proud: Say hello to Regex Extractor!

Regex Extractor

Fiddling with data inevitably brings you down to the hell of string voodoo and dark regex magic. If you’re like us and you’ve always felt both under- and overwhelmed by KNIME’s “String Manipulation” or “Regex Split” nodes, then we’ve got a true gem for you. The new Regex Extractor lets you create, test and tweak your regular expressions with ease – and it shows you what the result will look like in real time. Extract URLs, numbers, email addresses, product codes, split your string or tokenize your texts.
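
To give a feel for the kind of patterns involved, here is a minimal Python sketch. It is not the node's implementation: the sample text, the two patterns and the `extract` helper are made up for illustration, and the patterns are deliberately simple rather than production-grade.

```python
import re

# Hypothetical sample text, invented for this example.
text = "Contact sales@example.com or support@example.org about items AB-1234 and XY-9876."

# Simplified email pattern: group 1 is the local part, group 2 the domain.
EMAIL = re.compile(r"([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)")
# Assumed product code format: two uppercase letters, a hyphen, four digits.
PRODUCT_CODE = re.compile(r"\b([A-Z]{2})-(\d{4})\b")

def extract(pattern, s):
    """Return one (full_match, group1, group2, ...) tuple per match."""
    return [(m.group(0), *m.groups()) for m in pattern.finditer(s)]

emails = extract(EMAIL, text)
codes = extract(PRODUCT_CODE, text)

for row in emails + codes:
    print(row)
```

Each tuple holds the full match followed by its capture groups, which is roughly the information you would inspect when tweaking a pattern interactively.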

A picture is worth a thousand words, and a movie a million, so check this out:

Further Improvements

  • The Date Extractor now provides several output modes (single row, multi row, collection cells) and keeps the original input table structure intact – yes, this was long overdue!

  • Map Viewer fixes an issue with OSM tiles and adds new tile presets to make your maps fun. Welcome the new kids called “Toner”, “Terrain”, and “Watercolor”.

  • The Hash Calculator supports a whole bunch of hashing algorithms now: MD2, MD5, SHA, SHA-224, SHA-256, SHA-384, SHA-512 – and many many more whose names we can’t even remember ourselves. You can hash strings or binary data.
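
For readers who want to cross-check a hash outside of KNIME, the following is a small Python sketch using the standard `hashlib` module. It only illustrates hashing strings versus binary data; the `hash_hex` helper and the UTF-8 encoding of strings are assumptions made for this example, not a description of how the Hash Calculator works internally.

```python
import hashlib

def hash_hex(data, algorithm="sha256"):
    """Hash a str (encoded as UTF-8, an assumption) or bytes; return the hex digest."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    return hashlib.new(algorithm, data).hexdigest()

# Hashing a string and hashing raw bytes go through the same code path.
digest_text = hash_hex("hello", "md5")
digest_bytes = hash_hex(b"\x00\x01\x02", "sha256")

print(digest_text)
print(digest_bytes)
```

Which algorithm names are accepted depends on the local Python build; `hashlib.algorithms_available` lists them.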

Other Changes

  • We have moved all nodes which depend on “KNIME Textprocessing” to a separate, optional installation. Why? We do not want to force our users to install a big, heavyweight extension which they probably do not need. Old nodes such as the “Content Extractor” have a modern successor which works more generically and independently from the Textprocessing plugin now.

  • Palladian is no longer a “KNIME Community Contribution”. This was a deliberate decision on our side and you can find some of our reasons here.

  • Check NodePit for the detailed changelog.

How to get it? If you’re running the NodePit plugin within KNIME (which you undoubtedly should), then just search for “Palladian” and start the installation. Alternatively, add the following URL in Preferences → Available Software Sites → Add … and then go to File → Install KNIME Extensions.

Installation

http://download.nodepit.com/palladian/4.1

Did we miss something? Let us know! Do you like what we’re doing? Let us know! Do you have a cool use case which involved our nodes? Let us know! Do you want to support us? Let us know!

Cheers,
Philipp

11 Likes

Finally! Thank you very much for the hard work, @qqilihq!

Just my two cents, but the Regex Extractor is one of the best nodes that have been released in the KNIME universe in recent years. Incredibly awesome piece of software. I really love it :heart:

Keep on going!

Best regards,
Daniel

4 Likes

Very cool thank you :slight_smile:

2 Likes

Today, we released Palladian 2.2 for KNIME into the wild. Here’s a wrap-up of the changes since the initial 2.0 release:

version-2.2.0 (2020-05-15)

  • [Add] Regex Extractor: Add a “Columns” output mode which appends a column for each matched group.
  • [Fix] Google Address Geocoder: Fix pointer to preferences in node documentation – kudos to joan_beneyto
  • [Fix] Location Extractor: Fix pointer to preferences in node documentation
  • [Fix] MapQuest Geocoder: Fix pointer to preferences in node documentation

version-2.1.0 (2020-05-08)

  • [Add] Regex Extractor: Add “Drop Full Match” option (see here)

version-2.0.2 (2020-02-01)

  • [Fix] Regex Extractor: Fix configuration logic which would prevent output when picking a different input column than the first – kudos to Armin Ghassemi Rudd

version-2.0.1 (2020-01-26)

  • [Fix] Date Extractor: Fix execution exception which would happen for some settings combinations – kudos to Armin Ghassemi Rudd
  • [Fix] Improve KNIME server detection, avoid false alarms on “normal” KNIME when “KNIME Executor connector” is installed

Also, the current changelog is always available here:

4 Likes

Hello autumn, hello Palladian 2.3! :maple_leaf:

Palladian 2.3 contains three new “GeoIP2” nodes for resolving geo-related data based on IP addresses via MaxMind. This is perfect for analyzing log data, getting statistics about your web site users, or detecting fraudulent behavior.

Changes since 2.2.0 in detail (see also https://nodepit.com/product/palladian):

  • (Info) Requires at least KNIME 4.0 (please make sure you’re using an update site URL corresponding to your KNIME version)
  • (Add) Regex Extractor: Add a “Rows or Missing” output mode which appends a row with missing value cells in case of a no-match (see here)
  • (Add) Text Classifier Model Writer: Report progress while writing model
  • (Add) Text Classifier Model Reader: Report progress while reading model
  • (Add) GeoIP2 Extractor, GeoIP2 DB Connector, GeoIP2 WS Connector: New nodes to get information for IP addresses using the MaxMind API or MMDB files
  • (Change) More efficient storage of HttpResult cells
  • (Change) Improved renderer for HttpResult cells showing headers and payload
  • (Change) HTML Parser: Add “Drop input column” setting (see here)
  • (Change) HTML Parser: Allow to input HTML strings
  • (Change) Regex Extractor: Timeout presumably endless regexes in dialog after 15 seconds
  • (Change) Regex Extractor: Allow to cancel long running regexes during node execution
  • (Change) String Similarities: Allow to configure name of output column (see here)
  • (Fix) Text Classifier Model Writer: Ensure that model file is always written in GZIP format (see here)
  • (Fix) Text Classifier Model Writer: Ensure that .palladianDictionaryModel extension is appended
  • (Fix) String Similarities: Handle missing value input
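
The difference between a plain row-per-match output and the new “Rows or Missing” mode can be sketched in a few lines of Python. This is only one plausible reading of the changelog entry, not the node's actual code: the `rows_mode` function, the `or_missing` flag and the use of `None` for missing cells are assumptions made for this illustration.

```python
import re

def rows_mode(pattern, values, or_missing=False):
    """Emit one output row per match.

    With or_missing=True, an input without any match still yields one row,
    with None standing in for each missing group cell (an assumption).
    """
    regex = re.compile(pattern)
    out = []
    for value in values:
        matches = list(regex.finditer(value))
        if not matches and or_missing:
            out.append((value,) + (None,) * regex.groups)
        for m in matches:
            out.append((value, *m.groups()))
    return out

# Hypothetical inputs: the second string contains no key=value pair.
inputs = ["a=1, b=2", "no pairs here"]
pattern = r"(\w)=(\d)"

plain = rows_mode(pattern, inputs)
with_missing = rows_mode(pattern, inputs, or_missing=True)

print(plain)
print(with_missing)
```

In the plain variant the second input silently disappears from the output, whereas the second variant keeps it as a row of missing values, which makes it easier to see which inputs produced nothing.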

Update site URLs:

  • (KNIME 4.2) https://download.nodepit.com/palladian/4.2
  • (KNIME 4.1) https://download.nodepit.com/palladian/4.1
  • (KNIME 4.0) https://download.nodepit.com/palladian/4.0

5 Likes

:birthday: Happy Birthday Palladian – Here’s v2.4

We have just celebrated the 10th birthday of the Palladian plugin for KNIME (useless fact: the first commit was 2011-02-01, 17:59 CET), so this seems like a good opportunity to release this update.

Please find the detailed change log below. We have upgraded the version of the wrapped Palladian lib to fix several issues and added some new nodes and functionalities.

There’s a new OAuth Connector [BETA] node, which allows you to build browser-based OAuth flows in your KNIME workflow:

This way you can easily authenticate with the following API services: 500px, Asana, AWeber, Box, Dataporten, Digg, Discord, Dropbox, Etsy, Facebook, Fitbit, Flickr, Foursquare, Freelancer, Genius, GitHub, Google, HeadHunter ХэдХантер, HiOrg-Server, Imgur, Kaixin 开心网, Kakao, Keycloak, LinkedIn, Mail.Ru, MediaWiki, Meetup, Microsoft Azure Active Directory (Azure AD), Microsoft Azure Active Directory (Azure AD) 2.0, Microsoft Live, NAVER, Odnoklassniki Одноклассники, Pinterest, Polar, Renren, Salesforce, Sina, Skyrock, Slack, StackExchange, Trello, Tumblr, TUT.BY, Twitter, uCoz, Viadeo, VK ВКонтакте, Xero, XING, and Yahoo. Admittedly, this is a node for advanced users familiar with REST APIs and we still consider it BETA. As there was quite some demand recently, we decided to roll it out in this early stage and we’re looking for your feedback. To install it, you must explicitly enable “Palladian for KNIME: OAuth Nodes”.

Changes since 2.3 in detail (see also here):

  • (Info) Requires at least KNIME 4.1 (please make sure you’re using an update site URL corresponding to your KNIME version)
  • (Add) Trim Image Whitespace: Node to remove white space surrounding a PNG image
  • (Add) HTTP Retriever: Allow to override default proxy configuration in the “Proxy” tab (see here)
  • (Add) HTTP Retriever: Store redirected location in HTTP Results
  • (Add) HTTP Result Data Extractor: Add setting “Append redirected locations”
  • (Add) HTTP Retriever: Allow PATCH
  • (Add) N-Gram Extractor: Allow to specify output column name
  • (Add) N-Gram Extractor: Allow to drop input column
  • (Add) Base64 Encoder, Base64 Decoder: New nodes for encoding/decoding Base64
  • (Add) OAuth Connector: New node for connecting to 50+ OAuth-based APIs. This node is currently labeled as “BETA” – there might be bugs, or later versions might change the functionality. In case of feedback or bug reports, please do reach out!
  • (Change) Use version 2.0 of Palladian Toolkit library
  • (Change) HTTP Retriever: Show execution warnings on the node in addition to logging them (e.g. when HTTP method is missing or invalid, when URL contains whitespace, in case of network errors)
  • (Change) HTTP Retriever: Automatically trim whitespace around URLs (see here)
  • (Change) AP Calculator: Make node streamable
  • (Change) Coordinate to Latitude/Longitude: Make node streamable
  • (Change) Form Encoded HTTP Entity Creator: Make node streamable
  • (Change) Hash Calculator: Make node streamable
  • (Change) HTML Parser: Make node streamable
  • (Change) Latitude/Longitude to Coordinate: Make node streamable
  • (Change) Multipart Encoded HTTP Entity Creator: Make node streamable
  • (Change) String Similarity: Make node streamable
  • (Change) Trim Image Whitespace: Make node streamable
  • (Change) URL Domain Extractor: Make node streamable
  • (Change) URL Normalizer: Make node streamable
  • (Change) URL Resolver: Make node streamable
  • (Change) Web Page Content Extractor: Make node streamable
  • (Remove) Ranking Services: Remove obsolete Compete, Delicious, DMOZ
  • (Fix) HTTP Retriever: Required validation for User Agent input in dialog
  • (Fix) HTTP Retriever: Prevent entering negative values for Socket Timeout
  • (Fix) Text Classifier Learner, Text Classifier Predictor: Fix link to press release in node documentation (kudos to @armingrudd)
  • (Fix) TF-IDF Similarity: Fix NaN values (see here)
  • (Fix) HTML Parser: Fix absolute URLs on redirected requests
  • (Fix) HTTP Retriever: Properly handle URLs with ? which are not query params (see here)
  • (Fix) HTTP Retriever: Make parsing of cookie “expires” attribute more lenient and behave more like a web browser (see here)
  • (Fix) HTTP Retriever: Automatically strip away URL #fragments (see here)
  • (Fix) HTTP Retriever: Properly honor the given “Maximum file size” limit, even when below 1024 bytes (see here)
  • (Fix) GeoIP2 Extractor: Improve error message when DB file cannot be accessed
  • (Fix) Hash Calculator: Correctly honor “Remove input column” setting
  • (Fix) HTTP Retriever: Catch potential NullPointerException in SSL-related code (see here)
  • (Fix) Fix potential version conflicts with NodePit licensing plugin
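
Two of the HTTP Retriever changes above (trimming whitespace around URLs and stripping #fragments) are easy to picture with a short Python sketch based on the standard `urllib.parse` module. The `clean_url` helper and the sample URL are hypothetical, and the node itself may well handle more edge cases than this.

```python
from urllib.parse import urldefrag

def clean_url(raw):
    """Trim surrounding whitespace and drop any #fragment from a URL string."""
    url, _fragment = urldefrag(raw.strip())
    return url

# Hypothetical input with stray whitespace and a fragment.
cleaned = clean_url("  https://example.com/docs?page=2#section-3 \n")
print(cleaned)
```

Fragments are only meaningful on the client side and are not sent to the server, so dropping them before a request does not change which resource is fetched.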

Anything missing? Any question? Any feedback? Get in touch!

9 Likes

Happy Birthday Palladian! :partying_face:

Great new features and fixes. Thank you @qqilihq

:blush:

3 Likes

Great news. Happy celebration!

3 Likes

YES!! I was able to use this new OAuth Connector [BETA] to successfully authenticate with Google Search Console Webmasters API and make POST Requests. It was sooo much easier than anything else out there. I’ll follow up on the subsequent tasks/workflow in this thread here - Having difficulties setting up Post Request to Google API requiring OAuth2.0 - #14 by qqilihq.

Thank you @qqilihq for great features and community support.

3 Likes
