Skip to content

Ported to the latest gensim, line model dimentionality fixed, output format extended#9

Closed
luav wants to merge 9 commits into
GTmac:masterfrom
eXascaleInfolab:master
Closed

Ported to the latest gensim, line model dimentionality fixed, output format extended#9
luav wants to merge 9 commits into
GTmac:masterfrom
eXascaleInfolab:master

Conversation

@luav

@luav luav commented Aug 4, 2019

Copy link
Copy Markdown

Fix for #8 (ported to the latest gensim), output extended with the .mat format

@luav luav changed the title Ported to the latest gensim, output extended with the .mat format Ported to the latest gensim, line model dimentionality fixed, output format extended Aug 4, 2019

@GTmac GTmac left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the effort!

@GTmac GTmac left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be your clone of DeepWalk -- could you change this to the official DeepWalk repo? Thanks!

@GTmac GTmac left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this! I actually feel it would be better to set the default number of workers to a smaller value (for example, gensim word2vec uses 3: https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/word2vec.py#L660). Sometimes it is not desirable to use up all the CPUs, especially when you are running a model on a shared server. Let me know your thoughts :-)

@GTmac GTmac left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for adding this!

Comment thread src/baseline.py Outdated
@@ -1,7 +1,10 @@
from gensim.models import Word2Vec
from gensim.models.word2vec import Word2Vec

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: could you remove that extra space? Thanks!

@GTmac GTmac left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@GTmac GTmac left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@GTmac GTmac left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit is touching the model part, so have you tested this change in terms of classification performance? Do you still get similar classification F1 score compared to the numbers in the paper / README file? Thanks!

@luav

luav commented Aug 9, 2019

Copy link
Copy Markdown
Author

set the default number of workers to a smaller value (for example, gensim word2vec uses 3

Up to you, in all other functions I set the default number of workers to at least half of the available logical cpus: int(cpu_num() / 2) + 1. Any hardcoded value is not desirable, because the host might have 1 or 2 cores then a harcoded value 3 may affect the performance unlike the value dependent on cpu_num.

This commit is touching the model part, so have you tested this change in terms of classification

I have not changed any model parameters or internals except adaptation to the updated gensim API. I did perform training and evaluation of the harp + deepwalk / line embeddings on my datasets and they look fine.

@luav

luav commented Aug 9, 2019

Copy link
Copy Markdown
Author

This seems to be your clone of DeepWalk -- could you change this to the official DeepWalk repo? Thanks!

In the official deepwalk, the walks persistence expects text and not numbers (I made a pull request to the official repository). I'm not sure whether the numerical values necessity there was caused by some bugs that occurred and fixed during Harp porting to the updated Deepwalk and gensim, or the numeric walk items are required by Harp from Deepwalk. Anyway, Harp works fine with the extended version (accepting numerical walk items) of Deepwalk in my repository but I have not tested whether it works with the official repository without that extension.

@luav

luav commented Aug 9, 2019

Copy link
Copy Markdown
Author

This seems to be your clone of DeepWalk -- could you change this to the official DeepWalk repo? Thanks!

In the official deepwalk, the walks persistence expects text and not numbers (I made a pull request to the official repository). I'm not sure whether the numerical values necessity there was caused by some bugs that occurred and fixed during Harp porting to the updated Deepwalk and gensim, or the numeric walk items are required by Harp from Deepwalk. Anyway, Harp works fine with the extended version (accepting numerical walk items) of Deepwalk in my repository but I have not tested whether it works with the official repository without that extension.

I just verified, the original latest Deepwalk lacks support of the numerical walk items to work with HARP, so the specified repository should be used until this Deepwalk pull request is merged.

@luav

luav commented Aug 9, 2019

Copy link
Copy Markdown
Author

set the default number of workers to a smaller value (for example, gensim word2vec uses 3

Up to you, in all other functions I set the default number of workers to at least half of the available logical cpus: int(cpu_num() / 2) + 1. Any hardcoded value is not desirable, because the host might have 1 or 2 cores then a harcoded value 3 may affect the performance unlike the value dependent on cpu_num.

Workers number is set to 1 by default and to cpu_num for the specified workers = -1 since
71c490c.

@luav luav closed this Aug 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants