Ported to the latest gensim, line model dimensionality fixed, output format extended by luav · Pull Request #9 · GTmac/HARP

luav · 2019-08-04T17:24:25Z

Fix for #8 (ported to the latest gensim), output extended with the .mat format

GTmac

LGTM. Thanks for the effort!

GTmac

This seems to be your clone of DeepWalk -- could you change this to the official DeepWalk repo? Thanks!

GTmac

Thanks for adding this! I actually feel it would be better to set the default number of workers to a smaller value (for example, gensim word2vec uses 3: https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/word2vec.py#L660). Sometimes it is not desirable to use up all the CPUs, especially when you are running a model on a shared server. Let me know your thoughts :-)

GTmac

LGTM. Thanks for adding this!

GTmac · 2019-08-08T21:02:54Z

@@ -1,7 +1,10 @@
-from gensim.models import Word2Vec
+from gensim.models.word2vec  import Word2Vec


Nit: could you remove that extra space? Thanks!

GTmac

LGTM. Thanks!

GTmac

LGTM. Thanks!

GTmac

This commit is touching the model part, so have you tested this change in terms of classification performance? Do you still get similar classification F1 score compared to the numbers in the paper / README file? Thanks!

luav · 2019-08-09T04:39:12Z

set the default number of workers to a smaller value (for example, gensim word2vec uses 3

Up to you. In all other functions I set the default number of workers to at least half of the available logical CPUs: int(cpu_num() / 2) + 1. Any hardcoded value is undesirable: the host might have only 1 or 2 cores, in which case a hardcoded value of 3 may hurt performance, unlike a value that depends on cpu_num.
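The CPU-dependent default described here can be sketched as follows. This assumes `cpu_num` is an alias for `multiprocessing.cpu_count` (the comment does not show where `cpu_num` comes from), and `default_workers` is a hypothetical helper name used only for illustration, not a function from the HARP code.

```python
from multiprocessing import cpu_count as cpu_num  # assumption: cpu_num aliases cpu_count


def default_workers(ncpu=None):
    """Hypothetical helper: at least half of the logical CPUs, never fewer than 1.

    `ncpu` may be passed explicitly (e.g. to compare host sizes);
    otherwise the logical CPU count of the current host is used.
    """
    if ncpu is None:
        ncpu = cpu_num()
    return int(ncpu / 2) + 1


# Compare the adaptive default with a hardcoded value of 3 on a few host sizes.
for ncpu in (1, 2, 4, 16):
    print(ncpu, "CPUs ->", default_workers(ncpu), "workers (hardcoded: 3)")
```

On a 1- or 2-core host this yields 1 or 2 workers instead of oversubscribing with 3, while on a 16-core server it uses 9 workers and leaves 7 CPUs free.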

This commit is touching the model part, so have you tested this change in terms of classification

I have not changed any model parameters or internals, apart from adapting to the updated gensim API. I did train and evaluate the HARP + DeepWalk / LINE embeddings on my datasets, and they look fine.

luav · 2019-08-09T04:56:44Z

This seems to be your clone of DeepWalk -- could you change this to the official DeepWalk repo? Thanks!

In the official DeepWalk, walk persistence expects text rather than numbers (I made a pull request to the official repository). I'm not sure whether the need for numerical values there was caused by bugs that arose and were fixed while porting HARP to the updated DeepWalk and gensim, or whether HARP genuinely requires numeric walk items from DeepWalk. Either way, HARP works fine with the extended version of DeepWalk in my repository (which accepts numerical walk items), but I have not tested whether it works with the official repository without that extension.
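To illustrate the text-versus-numbers mismatch described above, here is a minimal sketch of persisting random walks whose items are integer node ids as whitespace-separated text, one walk per line, and reading them back as string tokens. The names `write_walks` and `read_walks` and the one-walk-per-line layout are assumptions for illustration; they are not the DeepWalk or HARP API.

```python
import io


def write_walks(walks, out):
    """Write one walk per line to a text stream, converting node ids (int or str) to text."""
    for walk in walks:
        out.write(" ".join(str(node) for node in walk) + "\n")


def read_walks(inp):
    """Read walks back as lists of string tokens, skipping blank lines."""
    return [line.split() for line in inp if line.strip()]


walks = [[0, 4, 2, 4], [3, 1, 0]]  # numeric walk items over integer node ids
buf = io.StringIO()
write_walks(walks, buf)
buf.seek(0)
print(read_walks(buf))  # [['0', '4', '2', '4'], ['3', '1', '0']]
```

Converting with `str()` at write time lets a text-only persistence path accept numeric walk items; the trade-off is that ids come back as strings, so any code that maps tokens to node ids must convert them back.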

luav · 2019-08-09T14:18:39Z

I just verified: the latest official DeepWalk lacks support for the numerical walk items that HARP requires, so the specified repository should be used until this DeepWalk pull request is merged.

luav · 2019-08-09T14:21:08Z

Since 71c490c, the number of workers defaults to 1 and is set to cpu_num when workers = -1 is specified.
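A minimal sketch of the policy described above (1 worker by default, `-1` meaning all logical CPUs). The helper name `resolve_workers`, the use of `multiprocessing.cpu_count` in place of `cpu_num`, and the rejection of other non-positive values are assumptions for illustration, not code taken from commit 71c490c.

```python
from multiprocessing import cpu_count


def resolve_workers(workers=1):
    """Hypothetical helper mapping the user-facing workers argument to a worker count.

    workers == -1 -> use all logical CPUs of the host
    workers >= 1  -> use exactly that many workers
    """
    if workers == -1:
        return cpu_count()
    if workers < 1:
        raise ValueError(f"workers must be -1 or a positive integer, got {workers!r}")
    return workers


print(resolve_workers())    # 1
print(resolve_workers(4))   # 4
print(resolve_workers(-1))  # number of logical CPUs on this host
```

Defaulting to 1 addresses the shared-server concern raised in the review, while `-1` keeps an explicit opt-in for using every CPU.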

luav added 2 commits August 4, 2019 17:17

Ported to the latest gensim, output extended with .mat format

93e597d

Representation size fixed for the 'line' model

c693362

luav changed the title ~~Ported to the latest gensim, output extended with the .mat format~~ Ported to the latest gensim, line model dimensionality fixed, output format extended Aug 4, 2019

luav added 5 commits August 5, 2019 06:48

Ported to the latest deepwalk, workers parameter is considered

acc6f87

Temporary walk files cleaned up on completion

591b7fb

Workers described in the readme

87d4173

Deepwalk installation adjusted to HARP described

cb48e68

Case-insensitive extension made

a82bf88

GTmac approved these changes Aug 8, 2019

GTmac reviewed Aug 8, 2019

GTmac approved these changes Aug 8, 2019

GTmac reviewed Aug 8, 2019

GTmac approved these changes Aug 8, 2019

GTmac reviewed Aug 8, 2019

Redundant space removed

5290386

Default workers number set to 1, strings formatting updated to the Py3 style

71c490c

luav closed this Aug 29, 2019

luav wants to merge 9 commits into GTmac:master from eXascaleInfolab:master