Hello, I am stuck on a rather "simple issue". We are loading a 2 GB data file that is in a multi-line format:
product/productId: B00006HAXW
product/title: Rock Rhythm & Doo Wop: Greatest Early Rock
product/price: unknown
review/userId: A1RSDE90N6RSZF
review/profileName: Joseph M. Kotow
review/helpfulness: 9/9
review/score: 5.0
review/time: 1042502400
review/summary: Pittsburgh - Home of the OLDIES
review/text: I have all of the doo wop DVD's and this one is as good or
I usually parse it with Python:
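pyOut = {}
for line in kIn['Col0']:
    # Split on the first colon only, since values (e.g. titles) may contain colons.
    tokens = line.split(':', 1)
    # Skip blank lines and lines that are not a "key: value" pair.
    if len(tokens) == 2:
        # Fetch the list for this key, creating it on first use.
        column = pyOut.setdefault(tokens[0], [])
        column.append(tokens[1].strip())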
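For anyone who wants to try the parsing outside of KNIME, here is a minimal standalone sketch of the same idea. It assumes the input is any iterable of text lines; the function name `parse_lines` and the `sample` list are made up for illustration and are not part of the workflow.

```python
from collections import defaultdict


def parse_lines(lines):
    """Group 'key: value' lines into a dict of key -> list of values.

    Hypothetical helper for illustration; not part of the KNIME snippet.
    """
    columns = defaultdict(list)
    for line in lines:
        # Split on the first colon only; values may contain further colons.
        key, sep, value = line.partition(':')
        if not sep:
            # Blank line or no colon at all: nothing to store.
            continue
        columns[key.strip()].append(value.strip())
    return dict(columns)


# Shortened sample taken from the record shown above, plus a blank separator.
sample = [
    "product/productId: B00006HAXW",
    "review/helpfulness: 9/9",
    "review/score: 5.0",
    "",
    "product/productId: B00006HAXW",
]
result = parse_lines(sample)
```

Because it only iterates over the lines once, the same function could be fed an open file handle instead of a list, so the whole file would not have to be held as a list of lines first.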
It works fine, but when I try to load larger files (~32 GB), I get this error:
ERROR Python Snippet 0:325:5 Execute failed: java.lang.RuntimeException: java.io.IOException: Cannot run program "python": CreateProcess error=2, The system cannot find the file specified
I would rather just skip Python. Given the data above, are there any recommendations for a native node to transform it?
Thank you