Improve Path type support for filtering and classification

Hi,

I just stumbled upon this error where the rule engine seems to have no access to the row id / index as it cannot cope with the column type Path and therefore believes the table is empty.

Best
Mike

Hello @mwiegand,

This is intended behavior. The Rule Engine is not designed to work directly with file paths because they are not values in the usual sense; they are file locations, and comparison operators should not apply to them. You can, however, convert them to strings and then run them through the Rule Engine node.

Hope this explains it,
TL


Good evening @thor_landstrom,

thanks for the explanation. It took me quite some time to put this together. The test workflow further down was reworked a few times and might not match the screenshots, but the findings should still apply.

The point I want to make is that the Path type was introduced quite some time ago, yet its support / integration into nodes seems to have stalled.

Let me provide a more comprehensive overview with an example workflow. What follows are a few remarks highlighting the inconsistencies (not exhaustive due to time constraints):

Conversion Path>String>Path not working
Simply converting an existing path to a string and back to a path yields a path that no longer matches the original.

The cause is that the String to Path node does not support creating KNIME-specific destinations (“knime://”).

Start
"temp_dir_path" (FSLocation: (RELATIVE, knime.workflow.data, ./knimetemp-52d6fbf8b2074301))	(RELATIVE, knime.workflow.data, ./knimetemp-52d6fbf8b2074301)

Path > String
"temp_dir_path_location" (STRING: knime://knime.workflow/data/knimetemp-52d6fbf8b2074301)	knime://knime.workflow/data/knimetemp-52d6fbf8b2074301

String > Path
knime:/knime.workflow/data/knimetemp-52d6fbf8b2074301
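
For illustration only, the same `knime:/` form can be reproduced outside KNIME in a few lines of plain Python. This is a sketch under the assumption that the string gets treated as an ordinary file-system path somewhere along the way; whether the String to Path node actually does this is not confirmed here. Normalizing a URL as if it were a POSIX path drops the empty segment between the two slashes:

```python
import posixpath
from urllib.parse import urlsplit

url = "knime://knime.workflow/data/knimetemp-52d6fbf8b2074301"

# Parsed as a URL, scheme and host survive as separate parts.
parts = urlsplit(url)
print(parts.scheme, parts.netloc, parts.path)

# Treated as a plain POSIX path, the empty segment between the two
# slashes is removed, so "knime://" turns into "knime:/".
as_path = posixpath.normpath(url)
print(as_path)
print(as_path == url)
```

If something similar happens inside the node, it would explain why the round trip does not return the original location.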

Rule-based Row Filter: URI supported, Path not

Rule Engine: URI-Type (as boolean???) supported, Path not

Reference Row Filter: Path is supported and working

Path-Variable Handling unnecessarily complex
Working with a path as a variable is a chore, as it is supported neither by Constant Value Column nor by Missing Value, nor by any other means of populating data into a cell / column. Hence, it is necessary to:

  1. Convert the Path-Variable to a String
  2. Write the string to the table
  3. Convert back to Path
  4. Clean up all names and not required columns
  5. Do not lose focus xD

Overall, I’d suggest:

  1. If a data type is not supported, it should not be made available as a choice or at least be disabled to indicate that it is not supported.
  2. Improve consistency by enabling generation of KNIME-specific paths (“knime://”) in all nodes that handle the Path type, like the List Files/Folders or String to Path nodes, or remove that specific path declaration (unlikely because of the Hub, I assume)
  3. Add the option in the List Files/Folders node to create URI or string

As a concrete use case: when you have an index of files, using the Rule-based Row Filter to match multiple criteria should illustrate the need well.
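
As a rough sketch of that use case outside KNIME (plain Python, with made-up file names and criteria), this is the kind of multi-criteria filtering meant here. Note that the wildcard match runs on the string form of each path, which mirrors the convert-to-string workaround:

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical file index; the names are invented for illustration.
index = [
    PurePosixPath("data/reports/2023_sales.csv"),
    PurePosixPath("data/reports/2023_sales.xlsx"),
    PurePosixPath("data/tmp/2023_sales.csv"),
    PurePosixPath("data/reports/2022_costs.csv"),
]


def keep(path: PurePosixPath) -> bool:
    """Combine several criteria, similar to rules joined with AND."""
    text = str(path)  # wildcard matching needs the string form
    return (
        fnmatch(text, "*/reports/*")  # roughly a LIKE "*/reports/*" rule
        and path.suffix == ".csv"
        and path.name.startswith("2023")
    )


selected = [p for p in index if keep(p)]
print(selected)
```

Only the 2023 CSV file inside the reports folder passes all three criteria.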

PS: I might have lost the thread, going astray with some thoughts or ideas while pushing this over the line. Let me know if you have any questions.

Best
Mike


Hey @mwiegand,

Thanks for the detailed response and for attaching the test workflow you were using. Yes, I agree, at least in my experience, that paths are not fun to work with, especially when I tried passing them as a variable to a Python Script node. If I recall correctly, I had to convert them from a relative path to an absolute URI.

I do not have a deep understanding of how paths were implemented in the code base. If I had to guess why this behavior occurs (as in the picture you attached, trying to use ‘LIKE’ on a path vs. a URI), I would say it is because paths are abstractions of the actual URI. Because of this, I believe KAP handles URIs like strings, which is why you can use string operations such as ‘LIKE’ on them, but it does not do that for the Path type, which is why ‘LIKE’ is not available there. Also, a URI is a standardized way of representing files across different systems, so I assume this is why URI works where Path does not (and why, at least in my experience, it behaves more consistently across various nodes).

Now while PATH doesn’t support operations such as those for strings, I believe it is included for example in the rule engine node because it looks like you can use the operation ‘MISSING ?’ for path types.

I like the suggestions you lay out: there should be a clear distinction between URIs and Paths, and there should be options to use a URI instead of a Path.

These are some great suggestions!

If you want, you can probably rename the title to cover your new suggestions as I think these suggestions should be considered for future implementations.

Thanks for taking the time to share!
TL


De rien @thor_landstrom. Though, I must also applaud you for the in-depth explanation. I’d never have thought that the Path type is an abstraction of the URI type.

Which raises the question: why was the Path type introduced in the first place instead of improving the already solid URI integration? It rather feels like the Path type adds an unnecessary developer burden, reduces interoperability (e.g. when data is passed to other formats such as Excel, CSV or DBs), increases complexity, especially because of the back-and-forth conversion, and so on.

From my perspective, and please correct me if that is wrong or misses something important, there is barely any advantage at all for anyone using Path.

PS: Do you think the title accurately reflects the suggestion now?

Best
Mike

Hi,

A Path comes with a file system that you can traverse, read from, write to, etc. (see File Handling), and it can be represented in URI format.

A URI can be anything that identifies a resource, e.g. an ISBN. It would not make sense to list files of an ISBN, right? But it makes sense to list the files in a Path on the Hub File System – that it is a Hub and where it is (and credentials…) is information available from the Path cell, but not from a URI cell.
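
To make the distinction concrete, here is a small sketch in plain Python rather than KNIME; the ISBN value and the file name are arbitrary examples. The URI only names a resource, while the path is tied to a file system that can be listed and written to:

```python
import tempfile
from pathlib import Path
from urllib.parse import urlsplit

# A URI only identifies something; this one names a book, not a location.
isbn = urlsplit("urn:isbn:0451450523")
print(isbn.scheme, isbn.path)

# A path belongs to a file system, so it can be written to and listed.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "example.txt").write_text("hello")
    names = [child.name for child in folder.iterdir()]
    print(names)
```

There is no meaningful way to "list the files" of the first value, which is the point of keeping the two types apart.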

Hope that makes it clear why both types exist. Just because one can be represented as the other does not invalidate the existence of the former :). Otherwise, we could just use BLOB cells :)
