Retrieving search parameters & values from URL

Hi all

I'm hoping KNIME can help with text processing that I'm currently having to use VBA for.

I have a list of 100,000 URLs which contain values for search terms in the following format:

search?parameter1=value1&parameter2=value2&parameter3=value3

There is little consistency across the URLs: some contain 1 parameter, others 10, and the value length can vary or the value can be blank.

The constants are:

  • "?" before the parameters start. 
  • "&" before and after each combination of parameter and associated value (if present)
  • "=" separating parameter and value (if present)

Can KNIME help? If so, can someone point me in the right direction?

Thanks

You could use the Java Snippet node, and use the Apache URLEncodedUtils class to do the parsing. You would need to load the JAR file containing the library in the "Additional Libraries" tab.

References: http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/client/utils/URLEncodedUtils.html

Great, thanks I'll take a look into it.

Well, I've looked, and I'm guessing I need to know something about Java programming, right? It's not a language I've used before, so any pointers would be appreciated.

My URL strings are in one column of a CSV file, which I have already read in.

Hi! It is a trivial task for scripting languages. E.g. in Ruby:

require 'uri'
require 'cgi'

uri = URI("search?parameter1=value1&parameter2=value2&parameter3=value3")
params = CGI::parse(uri.query)

Result is a hash: {"parameter1"=>["value1"], "parameter2"=>["value2"], "parameter3"=>["value3"]}

It is possible to use either snippet or script node. See code examples at http://tech.knime.org/forum/scripting-integrations/ruby-scripting#comment-31047

http://www.ruby-doc.org/stdlib-2.0.0/libdoc/uri/rdoc/URI.html

http://www.ruby-doc.org/stdlib-2.0.0/libdoc/cgi/rdoc/CGI.html#method-c-parse
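Since the original question mentions blank values and URLs of varying shape, here is a small sketch of how those cases might be handled. It assumes plain Ruby outside KNIME; `query_pairs` is a hypothetical helper name, and it uses `URI.decode_www_form` (also in the standard `uri` library) rather than `CGI::parse`. Malformed percent-encoding in the input may still raise an ArgumentError.

```ruby
require 'uri'

# Hypothetical helper: returns a parameter => value hash for one URL string.
def query_pairs(url)
  # Take the text after the first "?"; nil when the URL has no query part.
  query = url.to_s.split('?', 2)[1]
  return {} if query.nil?
  # A blank value ("parameter2=") is kept as an empty string.
  URI.decode_www_form(query).to_h
end

p query_pairs("search?parameter1=value1&parameter2=&parameter3=value3")
p query_pairs("search")
```

Unlike `CGI::parse`, this keeps a single string per parameter, so a repeated parameter name would retain only its last value.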

Bobblank,

 

If you have the logic in VBA, it should be easy to replicate in KNIME. Have a look at nodes like CSV Reader, Cell Splitter, column and row functions, GroupBy and String Manipulation.

Between them, it should not be hard to encode your previous logic in KNIME.

Hi InsilicoConsulting

Where I'm struggling is getting the info into a useful list or table. Because the search URL changes in what it contains, the order of variables can be mixed up. The output is looking like either of the following two tables once transformed:

URL_No  Col1               Col2               Col3
1       Search_Var1=Value  Search_Var2=Value  Search_Var3=Value
2       Search_Var3=Value
3       Search_Var1=Value  Search_Var3=Value

OR

URL_No  Col1
1       [Search_Var1=Value,Search_Var2=Value,Search_Var3=Value]

What I need is either:

RowNo  URL_No  Col1
Row1   1       Search_Var1=Value
Row1   1       Search_Var2=Value
Row1   1       Search_Var3=Value
Row2   1       Search_Var3=Value

OR

RowNo  URL_No  Search_Var1  Search_Var2  Search_Var3
1      1       Value        Value        Value
2      2                                 Value

 

If that makes sense??!

Thanks

 

Rob 
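For illustration, the first target layout (one row per URL and parameter/value pair) could be sketched in plain Ruby as below. This is only a sketch under assumptions: the `urls` array stands in for the URL column, its values are made up, and `URI.decode_www_form` from the standard `uri` library does the splitting. Writing the rows back into a KNIME table is not shown.

```ruby
require 'uri'

# Sample input standing in for the URL column (illustrative values only).
urls = [
  "search?Search_Var1=a&Search_Var2=b&Search_Var3=c",
  "search?Search_Var3=d"
]

# Build one output row per (URL number, parameter, value).
long_rows = urls.each_with_index.flat_map do |url, i|
  # Text after the first "?"; empty string when the URL has no query part.
  query = url.split('?', 2)[1] || ""
  URI.decode_www_form(query).map { |name, value| [i + 1, name, value] }
end

long_rows.each { |row| puts row.join("\t") }
```

Because each pair becomes its own row, the order and number of parameters in a URL no longer matter.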

 

Hi Insilico

Forgive me if we get a double post but my last one hasn't appeared so here goes again!

Where I'm struggling is getting the data from a list of paired parameter and value to a useable table.

I can split everything out so that I either have one column with a list of all the pairs in it

e.g.

RowID Column1
Row1 [parameter1=value1, parameter2=value2, parameter4=value4]
Row2 [parameter1=value5, parameter3=value3]

Or the pairs in separate columns

e.g.

RowID  Column1            Column2            Column3            Column4
Row1   parameter1=value1  parameter2=value2  parameter4=value4
Row2   parameter1=value5  parameter3=value3

What I can't get to is a table with columns for each parameter and the value for the parameter if present in each row

RowID  parameter1  parameter2  parameter3  parameter4
Row1   value1      value2                  value4
Row2   value5                  value3

Hope that makes sense?

Rob

Hi! 

This is a simple RubyScript that parses the URI column (index 0) and maps the parameters into separate columns:

require 'uri'
require 'cgi'

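# Parameter names to extract; one output column per entry, in this order.
PARAMS = [
    "parameter1",
    "parameter2",
    "parameter3",
    "parameter4",
    "parameter5"
]

$inData0.each do |row|
    # CGI::parse returns a hash of arrays; a missing key gives an empty array.
    # The || "" fallback covers URLs that have no "?" part (query is nil).
    parsed_params = CGI::parse(URI(row[0].to_s).query || "")
    $outContainer << PARAMS.reduce(Cells.new) do |cells, param|
      cells.string(parsed_params[param].join)
    end
end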

The number of columns in the output DataTable must equal the number of strings in PARAMS.
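If the parameter names are not known in advance, they could be collected from the data first. The following is a minimal plain-Ruby sketch of that idea, not part of the ruby4knime API: the `urls` array and its values are assumptions standing in for the input column, and `URI.decode_www_form` from the standard `uri` library is used for parsing.

```ruby
require 'uri'

# Sample input standing in for the URL column (illustrative values only).
urls = [
  "search?parameter1=value1&parameter2=value2&parameter4=value4",
  "search?parameter1=value5&parameter3=value3"
]

# Parse each URL into a parameter => value hash.
parsed = urls.map do |url|
  query = url.split('?', 2)[1] || ""
  URI.decode_www_form(query).to_h
end

# Column names: every parameter seen in any URL, sorted for a stable order.
columns = parsed.flat_map(&:keys).uniq.sort

# One row per URL; nil marks a parameter that is absent from that URL.
wide_rows = parsed.map { |params| columns.map { |name| params[name] } }

puts columns.join("\t")
wide_rows.each { |row| puts row.map(&:to_s).join("\t") }
```

The resulting `columns` list could then play the role of PARAMS, provided the output table is configured with the same number of columns.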

 

Hi rss

Which node/s does that need?

 

Rob

It needs the unofficial ruby4knime extension: https://github.com/rssdev10/ruby4knime

See installation section for the binary.

Apologies for reviving an old thread, but this covers exactly what I need to do, and I got excited. However, I am having issues installing the ruby4knime extension. I've tried adding it to the dropins and plugins directories under KNIME, but I still can't install it.

If anyone could help that would be great.

 

Cheers,