Generating social network data through KNIME

Hello,

I intend to use KNIME to generate social network data in the “link list” format used by the Visone software (www.visone.info). In this format, the first two columns identify the actors that have a relation (e.g. project members), and the additional columns hold relational attributes (e.g. project identification). So the relation between two members of a project is represented in a table as follows:

Header:     [id]    [id]     [project_title]

Row1:         m1    m2               p1

Here m1 has a relation with m2 with direction ->, since the arrow points from the first-column actor to the second-column actor (m1->m2).

My question: how can I handle data in KNIME to achieve this “link list” structure? Below is an example data set with two different situations:

------------------------------------------

header:     [member1] [member2] [member3] [project_title]

row1:               m1                -                     -                    p1

row2:               m1                m2                m3                  p2        

------------------------------------------

Situation 1 is represented by “row1” of the data set. There is just one member (m1) in project 1 (p1). After being handled in KNIME, the data for this situation should be arranged as follows:

header:     [id1] [id2] [project_title]

row1:          m1    m1           p1

When just one actor appears, it needs to be duplicated in the second column.

------------------------------------------

Situation 2 is represented by “row2” of the data set. There are three members (m1, m2, m3) in project 2 (p2). After being handled in KNIME, the data for this situation should be arranged as follows:

header:     [id1] [id2] [project_title]

row1:          m1    m2           p2

row2:          m1    m3           p2

row3:          m2    m3           p2

When three or more actors are members of the same project, they need to be combined pairwise. So one row of the data set (row2) is transformed into three rows. The “project_title” just needs to be duplicated.

If the number of actors were 4, one row would be transformed into six rows, covering all pairwise combinations among them (and so on for 5, 6 or more actors; n actors yield n(n-1)/2 rows):

header:     [id1] [id2] [project_title]

row1:          m1    m2           p3

row2:          m1    m3           p3

row3:          m1    m4           p3

row4:          m2    m3           p3

row5:          m2    m4           p3

row6:          m3    m4           p3
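
To make the rule concrete outside KNIME, here is a minimal Python sketch of the transformation described above. It is only an illustration, not a KNIME workflow: the function name `to_link_list` and the use of "-" as the marker for an empty member cell are assumptions taken from the example tables.

```python
from itertools import combinations

def to_link_list(rows, missing="-"):
    """Turn wide rows (member columns, then project title) into link-list rows."""
    links = []
    for *members, project in rows:
        # Drop placeholder cells so only real actors remain.
        actors = [m for m in members if m and m != missing]
        if len(actors) == 1:
            # A lone actor is duplicated into the second column (loop).
            links.append((actors[0], actors[0], project))
        else:
            # Every unordered pair of actors becomes one row; the project is repeated.
            links.extend((a, b, project) for a, b in combinations(actors, 2))
    return links

rows = [
    ("m1", "-", "-", "p1"),
    ("m1", "m2", "m3", "p2"),
    ("m1", "m2", "m3", "m4", "p3"),
]
link_list = to_link_list(rows)
for id1, id2, project in link_list:
    print(id1, id2, project)
```

The three input rows correspond to the p1, p2 and p3 examples, so the printed rows can be compared directly with the tables above.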

------------------------------------------

Many thanks in advance for any information on how to generate this kind of network data (link list) using KNIME.

 Cadu

Hello Cadu,

attached you can find a workflow that transforms your input data into the desired output format.

You might also be interested in the new plugin for network creation and mining within KNIME. We plan to release it with the next KNIME release, and it will be available on the KNIME labs site (http://tech.knime.org/knime-labs). The plugin will also provide nodes to send and receive networks to/from Visone. So check out the KNIME labs site after the next KNIME release.

Bye,

Tobias

Hi Tobias,

This workflow is amazing!!! Thank you very much for that. I will study it in more detail…

About the new network plugin: will it handle data in a similar way to the workflow you sent me? Compared to the BisoNet plugin, will the new plugin have more powerful tools?

All the best,

Cadu

Hi Cadu,

the network plugin is an offspring of the BisoNet plugin and contains most of its nodes. The first release is more of a maintenance release that will have better documentation of the nodes and the API. It will thus contain only a few new nodes, such as the network viewer to view networks within KNIME. But we are always open to feature requests. If you miss some functionality, just let me know.

Up to now it won't handle such data. Is it a very common type of data? I would expect such data to be formatted more like the following, since the number of members varies from project to project:

header:    [member]    [project id]
row1        m1            p1
row2        m1            p2
row3        m2            p2
row4        m3            p2
row5        m4            p2

For data formatted like this you could create a hypergraph with the project id as edge id and the members as nodes. But this would of course result in a different graph structure.
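
As a rough illustration of that idea in plain Python (this is not the plugin's API; the function name `hyperedges` is hypothetical), each project id becomes one hyperedge that contains all of its members:

```python
from collections import defaultdict

def hyperedges(records):
    """Group members by project id: each project is one hyperedge over its members."""
    edges = defaultdict(list)
    for member, project in records:
        # Keep each member only once per project, in order of first appearance.
        if member not in edges[project]:
            edges[project].append(member)
    return dict(edges)

# The (member, project id) rows from the table above.
records = [("m1", "p1"), ("m1", "p2"), ("m2", "p2"), ("m3", "p2"), ("m4", "p2")]
edges = hyperedges(records)
for project, members in edges.items():
    print(project, members)
```

Here p2 is a single edge connecting four nodes, whereas the link list would represent the same project as six pairwise edges.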

Bye,

Tobias

Hi Tobias,

Thanks for your interest and help in dealing with network data through KNIME!

Below I describe an addition to the workflow you sent and some other network data arrangements:

Regarding item “b”, I attached an extended workflow with a “CSV writer” node (socialnetwork_csv.zip), whose output is meant to be in the format handled by Visone. I also added a new edge attribute (year) to the dataset (socialnetwork_dataset). I noticed an error at the “rule engine” node, whose output shows only “false” values.

Item “c” illustrates a kind of two-mode network that could be arranged through KNIME as well.

Regarding item “d”, I think a workflow that rearranges the RIS format into a link list file handled by Visone would be very useful to many people.

I would very much appreciate your additional help with this.

Bye,

Cadu

------------------------------------

a) I think this type of data is relatively common, for a couple of reasons:

- A lot of data handled by people without database knowledge is arranged in Excel sheets with everything in a single row (as in the examples above). Later they want to extract network data, and the marvelous KNIME is here to help!!!

- Google Scholar provides author data arranged inside just one cell (S Shepler, M Eisler, D Robinson, B Callaghan…). So it is first necessary to split them and then proceed as in the workflow you attached in this post. However, for some time now Google has not listed all authors when a paper has too many of them: it just inserts “…” after some author to indicate that there are more. I think this is a problem Google has caused for fast network-bibliometric studies! The other way to deal with Google data is to export it to reference management software (e.g. Zotero, EndNote), as discussed in item d (below).

b) Besides the simple data I showed before, some other rules might need to be applied, depending on whether the graph is undirected or directed (digraph). In the example I gave, the graph is undirected (all actors have reciprocal relations). Although in network terms the workflow output is directed (e.g. m1->m2; m1->m3; m2->m3), inside the network software (I use Visone, which is very good software as well) it is possible to transform these directed relations into undirected ones (e.g. m1--m2; m1--m3; m2--m3). Otherwise, it could be necessary to duplicate the rows of the workflow’s last node output with the two id columns swapped to achieve reciprocal relations (e.g. m1->m2; m2->m1; m1->m3; m3->m1; m2->m3; m3->m2).

Besides a single edge attribute (e.g. project_title), real networks will have others (e.g. tie strength, tie type, year). In academic networks, these edge attributes could be journal, journal impact factor, citation, year, etc. In any case, these edge attributes seem to be just an extension of the workflow we are talking about. To this workflow I added a final “CSV Writer” node to produce the file format that Visone can open (workflow socialnetwork_csv.zip is attached). Just to clarify, the link list format (handled by Visone) expects the first two columns to have the same name (id, id), not (id1, id2). So it will be necessary to replace the header in the CSV file, since KNIME only allows columns with different names. I also added a column named “year” in this workflow. Something is going wrong in the “rule engine” node, because its output shows only “false” values. At the output of the “column resorter” node, only m1 should have a reflexive relation (loop m1->m1), related to project 1 (p1). The “socialnetwork_dataset” I am using is attached.
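
The header replacement and the optional reciprocal rows can be sketched in Python as a post-processing step. This is only an illustration under assumptions: the function name `write_link_list`, the comma delimiter and the year values are hypothetical and not taken from the attached workflow or from the Visone documentation.

```python
import csv
import io

def write_link_list(links, attribute_names, out, reciprocal=False):
    """Write link-list rows as CSV with both actor columns named "id"."""
    writer = csv.writer(out, lineterminator="\n")
    # The header repeats "id" for the first two columns, as described above.
    writer.writerow(["id", "id", *attribute_names])
    for source, target, *attributes in links:
        writer.writerow([source, target, *attributes])
        if reciprocal and source != target:
            # Mirror the edge (m1->m2 also yields m2->m1); loops are not repeated.
            writer.writerow([target, source, *attributes])

# Hypothetical sample rows: (id, id, project_title, year).
links = [("m1", "m1", "p1", 2010), ("m1", "m2", "p2", 2011)]
buffer = io.StringIO()
write_link_list(links, ["project_title", "year"], buffer, reciprocal=True)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing an open file instead of the `io.StringIO` buffer would write the same content to disk.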

c) For a directed network (digraph), other rules need to be applied. For instance, let’s consider that there is a project leader and we are dealing with a kind of two-mode network. What matters is to identify the relations between leaders and members (each project has a leader, and one member can be tied to more than one project, but it isn’t necessary to know about relations among members). The data set is arranged as follows:

header:     [leader] [member1] [member2] [member3] [project_title]

row1:               l1         m1                  -                   -                    p1

row2:               l2         m1                m2                 -                    p2 

row3:               l3         m2                m3                m4                  p3 

After being handled in KNIME, the data for this situation should be arranged as follows:

header:     [id1] [id2] [project_title]

row1:           l1      m1           p1

row2:           l2      m1           p2

row3:           l2      m2           p2

row4:           l3      m2           p3

row5:           l3      m3           p3

row6:           l3      m4           p3
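
The leader-to-member rule can be sketched in Python as follows. Again this is only an illustration outside KNIME: the function name `leader_links` is hypothetical, and "-" is assumed to mark an empty member cell as in the table above.

```python
def leader_links(rows, missing="-"):
    """Create one (leader, member, project) row per non-empty member cell."""
    links = []
    for leader, *members, project in rows:
        for member in members:
            # Skip placeholder cells; members are never linked to each other.
            if member and member != missing:
                links.append((leader, member, project))
    return links

rows = [
    ("l1", "m1", "-", "-", "p1"),
    ("l2", "m1", "m2", "-", "p2"),
    ("l3", "m2", "m3", "m4", "p3"),
]
links = leader_links(rows)
for id1, id2, project in links:
    print(id1, id2, project)
```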

d) Another very common data type is the RIS (Research Information Systems) format. It is very useful for bibliometrics and for building networks from bibliographic records (an example and the documentation for this data type are attached). Web of Science, Jstor, Google Scholar etc. provide the RIS format, as does any reference management software (e.g. Zotero, EndNote). I think a KNIME node that rearranges the RIS format into the link list format handled by Visone would be useful to many people. Besides, if it were possible to connect a “Visone output connector” to the “Knime network RIS node” and then load the network directly into Visone (without having to generate a CSV file), that would be amazing! An example of a single RIS record, for the paper (TI- Networks and meaning: styles and switchings), is shown below. But RIS data for generating a network is easier to handle when saved in a TXT file with many such records together (as in the attached RIS_Format_Example).

TY  - JOUR

AU  - White, H.

AU  - Fuhse, J.

AU  - Thiemann, M.

AU  - Buchholz, L.

N1  - importante

PY  - 2007

SP  - 514-526

ST  - Networks and meaning: styles and switchings

T2  - Soziale Systeme

TI  - Networks and meaning: styles and switchings

VL  - 13

ID  - 1271

After being handled in KNIME, the RIS data should be arranged as follows:

header:   [id1]            [id2]                              [TI]                          [PY]      [TY, T2 etc.]

row1:  White, H.       Fuhse, J.          Networks and meaning…      2007      others

row2: White, H.        Thiemann, M.   Networks and meaning…      2007      others

row3: White, H.        Buchholz, L.     Networks and meaning…      2007      others

row4: Fuhse, J.        Thiemann, M.    Networks and meaning…      2007      others

row5: Fuhse, J.        Buchholz, L.      Networks and meaning…      2007      others

row6: Thiemann, M. Buchholz, L.      Networks and meaning…      2007      others
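
As a minimal sketch of this rearrangement in Python (not a KNIME node; the function names `parse_ris` and `ris_link_list` are hypothetical), the record above can be parsed tag by tag and its AU entries combined pairwise. The sketch assumes one "TAG  - value" pair per line and treats TY as the start of a record and ER, if present, as its end. It keeps only TI and PY as edge attributes and does not handle single-author records.

```python
from itertools import combinations

def parse_ris(text):
    """Parse RIS text into records, each mapping a tag to its list of values."""
    records, current = [], None
    for line in text.splitlines():
        tag, sep, value = line.partition("-")
        tag, value = tag.strip(), value.strip()
        if not sep or len(tag) != 2:
            continue  # skip blank lines and anything that is not "TAG  - value"
        if tag == "TY" and current:
            records.append(current)  # a new TY closes a record left open
            current = None
        if tag == "ER":
            if current:
                records.append(current)
            current = None
            continue
        if current is None:
            current = {}
        current.setdefault(tag, []).append(value)
    if current:
        records.append(current)
    return records

def ris_link_list(records):
    """One row per pair of co-authors, with title (TI) and year (PY) repeated."""
    rows = []
    for record in records:
        title = record.get("TI", [""])[0]
        year = record.get("PY", [""])[0]
        for a, b in combinations(record.get("AU", []), 2):
            rows.append((a, b, title, year))
    return rows

ris_text = """TY  - JOUR
AU  - White, H.
AU  - Fuhse, J.
AU  - Thiemann, M.
AU  - Buchholz, L.
PY  - 2007
SP  - 514-526
T2  - Soziale Systeme
TI  - Networks and meaning: styles and switchings
VL  - 13
ID  - 1271
"""
rows = ris_link_list(parse_ris(ris_text))
for row in rows:
    print(row)
```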

Hello Cadu,

a) This type of data is very common (e.g. co-authorship), but I wouldn't create a network structure like the one you propose from it. In the case of co-authorship I would create a bipartite graph with the documents as the first node partition and the authors as the second partition. Every author is then connected to each of their documents.

b) The network plugin allows the conversion from directed networks to undirected networks simply by filtering the is_target feature using the Feature Filter node.

In order to add additional features to network objects (nodes, edges, the network itself) you can use the Feature Inserter node.

In your attached example workflow you need to change the join column in the Joiner node from RowID to project for both tables, since this is the identifier per project.

c) This could be achieved using the Unpivoting node with the member columns as pivoting columns and the project_title as retained columns.

d) See the attached workflow for an idea of how to process RIS data in KNIME.
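
To illustrate the bipartite structure from item a) in plain Python (this is not the network plugin's API; the function name `bipartite_graph` and the use of the RIS ID as document identifier are assumptions for this sketch):

```python
def bipartite_graph(documents):
    """Build document nodes, author nodes and author-document edges."""
    document_nodes = set(documents)
    author_nodes = {author for authors in documents.values() for author in authors}
    # Each author is connected to each of their documents, never to another author.
    edges = [(author, doc) for doc, authors in documents.items() for author in authors]
    return document_nodes, author_nodes, edges

# One document (identified here by its RIS ID) with its four authors.
documents = {"1271": ["White, H.", "Fuhse, J.", "Thiemann, M.", "Buchholz, L."]}
document_nodes, author_nodes, edges = bipartite_graph(documents)
for author, doc in edges:
    print(author, "--", doc)
```

For a paper with four authors this gives four author-document edges instead of the six author-author edges of the link list.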

Hello Tobias,

Thank you very much for your valuable assistance! I am going to study the workflow and the other information you sent.

About the network plugin: I am still working with BisoNet (2.2.2). Is the new, improved network plugin available? What will its name be?

Bye,

Cadu

Hello Cadu,

the plan is to release the network plugin with the next KNIME version coming in December. It will be available via the http://tech.knime.org/knime-labs page. The name is not final, but I guess it will simply be the network plugin.

Bye,

Tobias

Hello Tobias,

Could I consider this "network" plugin a substitute for BisoNet? Or will each one have its own application?

Best,

Cadu

Hello Cadu,

the network plugin will replace the BisoNet plugin. It will contain most of the nodes from the BisoNet plugin and some new ones, such as a network viewer within KNIME. We have also revised the node descriptions and icons to improve usability. The two plugins can be installed side by side without interference. However, the port objects are not compatible. In order to read a BisoNet with a Network node you have to write the network to a beef or beef.zip file using the network writer from the BisoNet plugin and read it with the network reader from the Network plugin.

Bye,

Tobias

TKS for this!!