I used the Weblog Reader node in KNIME to parse a file with the following structure:
%t , %h , %b , %0, "%r", %>s , %0, %0, %0, %0
But KNIME tells me that the first line does not match the pattern. The problem is that I'm not sure whether the weblog comes from an Apache server. The file uses commas as separators (like a CSV), but the KNIME node is unable to parse it.
The format doesn't look like an Apache log file. You cannot read it with the weblog reader in this case. You can try to use the normal File Reader instead.
I used the File Reader, but I want an automatic parser, because I have many records for the same URL request, and some of them are for .js, .jpeg, .html files, etc. I have to filter the different elements that make up a visited web page, and sometimes it's hard to know which element is part of a page and which is not. This can produce misleading statistical results. A parser node or algorithm can generally help filter and group the results by user or request.
Otherwise, is there a way or a node to convert a non-Apache log into the Apache log format?
The WebLog reader does not perform any automatic grouping of requests either. However, if you can read it with a File Reader you can do the pre-processing with other nodes, such as String Manipulation or GroupBy.
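To make the idea of reading the file as plain comma-separated text and then grouping it more concrete, here is a rough Python sketch of the same logic outside KNIME. It assumes the column order from the format string above (time, host, bytes, an unknown field, the quoted request line, status, then four more unknown fields). The sample rows and the function names `read_requests` and `hits_per_host` are made up for illustration and are not part of KNIME.

```python
import csv
import io
from collections import Counter

# Made-up sample rows in the assumed column order:
# time, host, bytes, unknown, "request line", status, then four unknown fields.
SAMPLE = """\
2014-01-01 10:00:00, 10.0.0.1, 512, -, "GET /index.html HTTP/1.1", 200, -, -, -, -
2014-01-01 10:00:01, 10.0.0.1, 2048, -, "GET /logo.jpeg HTTP/1.1", 200, -, -, -, -
2014-01-01 10:00:05, 10.0.0.2, 734, -, "GET /about.php HTTP/1.1", 200, -, -, -, -
"""


def read_requests(text):
    """Parse comma-separated log lines into dicts (column order is an assumption)."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        fields = [f.strip() for f in fields]
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        # The request line looks like: METHOD URL PROTOCOL
        parts = fields[4].split()
        rows.append({
            "time": fields[0],
            "host": fields[1],
            "bytes": fields[2],
            "url": parts[1] if len(parts) > 1 else "",
            "status": fields[5],
        })
    return rows


def hits_per_host(rows):
    """Count requests per host, similar in spirit to a GroupBy with a count."""
    return Counter(row["host"] for row in rows)


rows = read_requests(SAMPLE)
print(hits_per_host(rows))
```

In a KNIME workflow the equivalent would roughly be a File Reader followed by a String Manipulation step to extract the URL and a GroupBy on the host column; the exact node settings depend on the real file.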
The difficulty is parsing multiple lines and knowing what to select. I have seen the text nodes, but I still wonder whether there is a clean and reliable way to parse the data and be sure which hits should be counted, for example with rules such as counting only html, htm, pdf and php. This is the hard part!
Anyway, I will continue searching. Thanks again!
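For the counting rule mentioned above (only html, htm, pdf and php count as hits), a minimal Python sketch could look like the following. The names `PAGE_EXTENSIONS`, `is_page_view` and `count_page_views` and the sample hits are hypothetical, and the extension list is just the one from this thread, not a general standard.

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Extensions that should count as a page view (taken from the rule in this thread).
PAGE_EXTENSIONS = {".html", ".htm", ".pdf", ".php"}


def is_page_view(url):
    """Return True if the URL path ends in one of the page extensions."""
    path = urlsplit(url).path  # drops any ?query or #fragment
    return PurePosixPath(path).suffix.lower() in PAGE_EXTENSIONS


def count_page_views(requests):
    """Count page views per user from an iterable of (user, url) pairs."""
    return Counter(user for user, url in requests if is_page_view(url))


# Made-up hits: one page load usually drags in scripts and images as well.
hits = [
    ("10.0.0.1", "/index.html"),
    ("10.0.0.1", "/js/app.js"),
    ("10.0.0.1", "/img/logo.jpeg"),
    ("10.0.0.2", "/report.pdf?download=1"),
    ("10.0.0.2", "/about.php"),
]
print(count_page_views(hits))
```

One limitation of an extension whitelist like this: URLs without an extension, such as "/" or "/news/", are not counted, so they would need an extra rule if the site uses them for pages.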