Stanage again (last post for a while).
OK, here is the problem. I have a text file that I need to parse. It is a tab-delimited file, but it contains data that is comma-separated. It would be great to be able to either call an external application to parse this information (hence my previous post) or use a node that can handle text manipulation, so that the file can be split and the data extracted (Perl/Python scripts? Although I don't have the programming skills and will need to learn them). Can this be hidden behind a more user-friendly interface or a series of nodes?
Further to the above post: what is needed is a general node that allows for text transformations so that, for example, comma-separated values can be parsed into a new table view, i.e. split into separate table rows or columns for further analysis or filtering.
This is a pretty basic requirement, as you need these functions to manipulate your data prior to analysis (particularly if the datasets are large).
Have you any of these nodes in development?
I am not quite sure what you are looking for. We have a CSV reader (just have the File Reader guess the right column separator), and extracting new columns from an existing cell (sort of subcolumns inside the CSV file) can easily be achieved using e.g. the Java Snippet node. What else would you want to do?
This is exactly what I want to do. Unfortunately my Java skills are nil. At the moment I am trying to learn Python, but this is where the text mining functions come into play. It would be great to be able to take a file reader --> filter columns to extract the CSV value row --> text mining node (to split this further into separate cells forming a new column/table that can be combined with another column to form a new table or appended to an existing table).
At the moment, I am just exploring the possibilities that KNIME has to offer, but the text manipulation/filter node functions would be a big asset.
That sounds more like text manipulation than real "mining" to me. I guess an example of how to split a string into pieces using the Java Snippet node may come in handy here.
Something along those lines:
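// Extract the text between the first two single quotes in column "class2".
String result = "";
int firstOccurence = $class2$.indexOf('\'');
int secondOccurence = $class2$.indexOf('\'', firstOccurence + 1);
// Only take the substring if both quotes were found.
if ((firstOccurence >= 0) && (secondOccurence > firstOccurence)) {
    result = $class2$.substring(firstOccurence + 1, secondOccurence);
}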
This will extract the fragment between the first two single quotes in the original
String cell (called class2 in this example).
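For the comma-separated case from the original question, the same idea can be sketched in plain Java outside of KNIME. This is only a rough illustration, not a KNIME node: the class name CellSplitter, the helper names splitCell and expandLine, and the sample line are all made up for this example, and it assumes the values themselves contain no quoted or escaped commas.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only (not part of KNIME).
class CellSplitter {

    /** Splits one comma-separated cell value into trimmed pieces. */
    static String[] splitCell(String cell) {
        // A limit of -1 keeps trailing empty pieces, so "a,b," yields three entries.
        String[] pieces = cell.split(",", -1);
        for (int i = 0; i < pieces.length; i++) {
            pieces[i] = pieces[i].trim();
        }
        return pieces;
    }

    /** Splits a tab-delimited line and expands the comma-separated column at index csvColumn. */
    static List<String> expandLine(String line, int csvColumn) {
        String[] columns = line.split("\t", -1);
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < columns.length; i++) {
            if (i == csvColumn) {
                // Replace the single cell with one entry per comma-separated piece.
                for (String piece : splitCell(columns[i])) {
                    result.add(piece);
                }
            } else {
                result.add(columns[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample line: three tab-separated columns, the middle one comma-separated.
        System.out.println(expandLine("id1\tred, green,blue\t42", 1));
    }
}
```

Inside a Java Snippet node, the split call would presumably be applied to the column placeholder (such as $class2$ above) instead of a method argument, with each new column picking out one of the pieces.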
The next version of KNIME (1.3) will very likely(*) be based on Java 6 and
make use of the Mustang scripting support. We should(*) then also be able
to offer nodes for Python and Perl scripts.
(*) barring any nasty surprises when moving KNIME to Java 6...