Hi all
I'm hoping KNIME can help with text processing that I'm currently having to use VBA for.
I have a list of 100,000 URLs which contain values for search terms in the following format:
search?parameter1=value1&parameter2=value2&parameter3=value3
There is little consistency overall in the URLs: some contain 1 parameter, others 10, and the value length can vary or be blank.
The constants are:
"?" before the parameters start.
"&" before and after each combination of parameter and associated value (if present)
"=" separating parameter and value (if present)
Can KNIME help? If so, can someone point me in the right direction?
Thanks
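The three constants listed above are enough to split such a string by hand. A minimal sketch in plain Ruby, assuming nothing beyond the format described in the question (the helper name split_params is made up for illustration, and no URL decoding of %-escapes is attempted):

```ruby
# Split a search URL into [parameter, value] pairs using only the three
# constants from the question: "?", "&" and "=".
def split_params(url)
  # Everything after the first "?" is the parameter string (may be absent).
  _, query = url.split("?", 2)
  return [] if query.nil?

  query.split("&").reject(&:empty?).map do |pair|
    # A limit of 2 keeps any further "=" inside the value.
    name, value = pair.split("=", 2)
    [name, value.to_s] # a missing or blank value becomes ""
  end
end

p split_params("search?parameter1=value1&parameter2=&parameter3")
```

Each URL yields a list of pairs of whatever length it happens to have, so 1 parameter or 10 is handled the same way.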
You could use the Java Snippet node, and use the Apache URLEncodedUtils class to do the parsing. You would need to load the JAR file containing the library in the "Additional Libraries" tab.
References: http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/client/utils/URLEncodedUtils.html
Great, thanks I'll take a look into it.
Well, I've looked, and I'm guessing I need to know something about Java programming, right? It's not a language I've used before, so any pointers would be appreciated.
My URL strings are contained in one column of a CSV, which has already been read in.
rss
January 31, 2014, 7:30am
Hi! It is a trivial task for scripting languages. E.g. in Ruby:
require 'uri'
require 'cgi'
uri = URI("search?parameter1=value1&parameter2=value2&parameter3=value3")
params = CGI::parse(uri.query)
Result is a hash: {"parameter1"=>["value1"], "parameter2"=>["value2"], "parameter3"=>["value3"]}
It is possible to use either a snippet node or a script node. See code examples at http://tech.knime.org/forum/scripting-integrations/ruby-scripting#comment-31047
http://www.ruby-doc.org/stdlib-2.0.0/libdoc/uri/rdoc/URI.html
http://www.ruby-doc.org/stdlib-2.0.0/libdoc/cgi/rdoc/CGI.html#method-c-parse
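As an alternative to CGI::parse, the uri library also offers URI.decode_www_form, which returns the parameters as ordered [name, value] pairs and keeps blank values as empty strings. A minimal sketch, assuming the query is simply everything after the first "?" (the helper name param_pairs is hypothetical). As far as I know, decode_www_form raises ArgumentError on non-ASCII input or malformed %-escapes, so messy real-world URLs may need a rescue around the call.

```ruby
require 'uri'

# Hypothetical helper: return the parameters of one URL as ordered
# [name, value] pairs. The query is taken as everything after the first "?".
def param_pairs(url)
  query = url.to_s.split("?", 2)[1].to_s
  URI.decode_www_form(query)
end

pairs = param_pairs("search?parameter1=value1&parameter2=&parameter3=a%20b")
pairs.each { |name, value| puts "#{name} -> #{value.inspect}" }
```

Taking the query with a plain split avoids parsing the whole URL, which matters when some of the input strings are not valid URIs.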
Bobblank,
If you have the logic in VBA it should be easy to replicate in KNIME. Have a look at nodes like CSV Reader, Cell Splitter, the column and row functions, GroupBy and String Manipulation.
Between them it should not be hard to encode your previous logic in KNIME.
Hi InsilicoConsulting
Where I'm struggling is getting the info into a useful list or table. Because the search URL changes in what it contains, the order of variables can be mixed up. The output is looking like either of the following two tables once transformed:
URL_No  Col1               Col2               Col3
1       Search_Var1=Value  Search_Var2=Value  Search_Var3=Value
2       Search_Var3=Value
3       Search_Var1=Value  Search_Var3=Value
OR
URL_No Col1
1 [Search_Var1=Value,Search_Var2=Value,Search_Var3=Value]
What I need is either:
RowNo  URL_No  Col1
Row1   1       Search_Var1=Value
Row1   1       Search_Var2=Value
Row1   1       Search_Var3=Value
Row2   1       Search_Var3=Value
OR
RowNo  URL_No  Search_Var1  Search_Var2  Search_Var3
1      1       Value        Value        Value
2      2                                 Value
If that makes sense??!
Thanks
Rob
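The first of the two target layouts (one row per URL and parameter) can be sketched in plain Ruby as follows. The sample URLs and the variable names urls and long_rows are made up for illustration; URL numbers are assumed to be 1-based row positions.

```ruby
require 'uri'

# Hypothetical input: one URL per row, numbered from 1.
urls = [
  "search?Search_Var1=a&Search_Var2=b&Search_Var3=c",
  "search?Search_Var3=d",
  "search?Search_Var1=e&Search_Var3=f"
]

# Long format: one output row per (URL number, parameter, value).
long_rows = urls.each_with_index.flat_map do |url, i|
  query = url.split("?", 2)[1].to_s
  URI.decode_www_form(query).map { |name, value| [i + 1, name, value] }
end

long_rows.each { |row| puts row.join("\t") }
```

Because every row carries its own parameter name, the order in which parameters appear in a URL no longer matters, and a table in this shape is what a pivot step would need to produce one column per parameter.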
Hi Insilico
Forgive me if we get a double post but my last one hasn't appeared so here goes again!
Where I'm struggling is getting the data from a list of parameter/value pairs into a usable table.
I can split everything out so that I either have one column with a list of all the pairs in it
e.g.
RowID  Column1
Row1   [parameter1=value1, parameter2=value2, parameter4=value4]
Row2   [parameter1=value5, parameter3=value3]
Or the pairs in separate columns
e.g.
RowID  Column1            Column2            Column3            Column4
Row1   parameter1=value1  parameter2=value2  parameter4=value4
Row2   parameter1=value5  parameter3=value3
What I can't get to is a table with a column for each parameter, holding the value for that parameter if it is present in the row:
RowID  parameter1  parameter2  parameter3  parameter4
Row1   value1      value2                  value4
Row2   value5                  value3
Hope that makes sense?
Rob
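The wide table asked for above can be built in two passes: first collect every parameter name that occurs anywhere, then emit one row per URL with an empty string where a parameter is absent. A minimal sketch in plain Ruby, using the two example rows from the post (the variable names are hypothetical, and a parameter repeated within one URL keeps only its last value):

```ruby
require 'uri'

urls = [
  "search?parameter1=value1&parameter2=value2&parameter4=value4",
  "search?parameter1=value5&parameter3=value3"
]

# First pass: parse every URL into a Hash of name => value.
parsed = urls.map do |url|
  URI.decode_www_form(url.split("?", 2)[1].to_s).to_h
end

# Column list: every parameter name seen anywhere, sorted for a stable order.
columns = parsed.flat_map(&:keys).uniq.sort

# Second pass: one row per URL, "" where the parameter is absent.
wide_rows = parsed.map { |params| columns.map { |c| params.fetch(c, "") } }

header = ["RowID"] + columns
puts header.join("\t")
wide_rows.each_with_index do |row, i|
  puts (["Row#{i + 1}"] + row).join("\t")
end
```

Discovering the columns from the data avoids having to know all parameter names in advance, at the cost of reading the list twice.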
rss
February 5, 2014, 7:46am
Hi!
This is a simple RubyScript that parses the URI column (with index 0) and maps the parameters into separate columns:
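require 'uri'
require 'cgi'

# Parameter names to extract; each one becomes an output column.
PARAMS = [
  "parameter1",
  "parameter2",
  "parameter3",
  "parameter4",
  "parameter5"
]

$inData0.each do |row|
  # Parse the query string of the URL in column 0 into a hash of name => [values].
  # .to_s guards against URLs without a "?" part, where query is nil.
  parsed_params = CGI::parse(URI(row[0].to_s).query.to_s)
  # Add one string cell per parameter (empty when the parameter is absent).
  $outContainer << PARAMS.reduce(Cells.new) do |cells, param|
    cells.string(parsed_params[param].join)
  end
end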
The number of columns in the output DataTable must equal the number of strings in PARAMS.
Hi rss
Which node/s does that need?
Rob
rss
February 5, 2014, 6:03pm
Unofficial ruby4knime: https://github.com/rssdev10/ruby4knime
See installation section for the binary.
Apologies for reviving an old thread, but this covers exactly what I need to do, and I got excited. However, I am having issues installing the ruby4knime extension. I've tried adding it to the dropins and plugins directories under KNIME, but I still can't install it.
If anyone could help that would be great.
Cheers,