@@ -7,7 +7,7 @@ This repository contains an R package allowing to build `Paragraph Vector` model
77- The package allows one
88 - to train paragraph embeddings (also known as document embeddings) on character data or data in a text file
99 - use the embeddings to find similar documents, paragraphs, sentences or words
10- - Note. For getting word vectors in R: look at package https://github.com/bnosac/word2vec
10+ - Note. For getting word vectors in R: look at package https://github.com/bnosac/word2vec, details [here](https://www.bnosac.be/index.php/blog/100-word2vec-in-r), for Starspace embeddings: look at package https://github.com/bnosac/ruimtehol, details [here](https://cran.r-project.org/web/packages/ruimtehol/vignettes/ground-control-to-ruimtehol.pdf)
1111
1212## Installation
1313
@@ -61,7 +61,7 @@ str(model)
6161## List of 3
6262## $ model :<externalptr>
6363## $ data :List of 4
64- ## ..$ file : chr "C:\\Users\\Jan\\AppData\\Local\\Temp\\Rtmpk9Npjg\\textspace_1c4458cb6943.txt"
64+ ## ..$ file : chr "C:\\Users\\Jan\\AppData\\Local\\Temp\\Rtmpk9Npjg\\textspace_1c4432666686.txt"
6565## ..$ n : num 170469
6666## ..$ n_vocabulary: num 3867
6767## ..$ n_docs : num 1000
@@ -117,10 +117,10 @@ embedding[, 1:4]
117117```
118118
119119```
120- ##              [,1]        [,2]        [,3]        [,4]
121- ## doc_1  0.08172660 -0.03679979  0.05726605 -0.06496991
122- ## doc_10 0.13976580  0.10821507 -0.06986591 -0.05825572
123- ## doc_3  0.09486584 -0.07999156  0.03448128  0.02999697
120+ ##               [,1]        [,2]        [,3]        [,4]
121+ ## doc_1  0.038523957 -0.14341952 -0.06087392 -0.01625664
122+ ## doc_10 0.003298676 -0.04789201  0.06048679 -0.14829759
123+ ## doc_3  0.030986091  0.08946659  0.02453904 -0.01900235
124124```
125125
126126- Get similar documents or words when providing sentences, documents or words
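
A minimal base-R sketch of how a nearest-neighbour lookup over an embedding matrix could produce tables with `term1`, `term2`, `similarity` and `rank` columns. It assumes cosine similarity, which this README does not confirm as the package's measure. The matrix values are made up for illustration and `nearest()` is a hypothetical helper, not part of the package.

``` r
# Toy embedding matrix: rows are items (words or documents), columns are dimensions.
# The values are invented for illustration; they are not model output.
emb <- rbind(
  proximus = c(1.0, 0.0, 0.0),
  belfius  = c(0.8, 0.6, 0.0),
  koning   = c(0.0, 0.0, 1.0),
  grondwet = c(0.0, 0.6, 0.8)
)

# Hypothetical helper: rank all other rows by cosine similarity to `term`.
nearest <- function(emb, term, top_n = 2) {
  unit <- emb / sqrt(rowSums(emb^2))   # scale every row to unit length
  sims <- drop(unit %*% unit[term, ])  # dot product of unit vectors = cosine
  sims <- sims[names(sims) != term]    # exclude the query term itself
  sims <- sort(sims, decreasing = TRUE)
  sims <- sims[seq_len(min(top_n, length(sims)))]
  data.frame(term1 = term, term2 = names(sims), similarity = unname(sims),
             rank = seq_along(sims), stringsAsFactors = FALSE)
}

# One data.frame per query term, returned as a list.
result <- lapply(c("proximus", "koning"), function(term) nearest(emb, term))
```

Scaling the rows to unit length first means a single matrix product yields all similarities at once, which keeps the lookup short for a small matrix like this one.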
134134```
135135## [[1]]
136136## term1 term2 similarity rank
137- ## 1 proximus telefoontoestellen 0.5571629 1
138- ## 2 proximus belfius 0.4994604 2
139- ## 3 proximus toenmalige 0.4873388 3
140- ## 4 proximus internetverbinding 0.4730936 4
141- ## 5 proximus gefactureerd 0.4568973 5
137+ ## 1 proximus telefoontoestellen 0.5364115 1
138+ ## 2 proximus belfius 0.5292925 2
139+ ## 3 proximus internetverbinding 0.5140554 3
140+ ## 4 proximus ceo 0.4961080 4
141+ ## 5 proximus fusie 0.4803250 5
142142##
143143## [[2]]
144- ## term1 term2 similarity rank
145- ## 1 koning grondwet 0.5572801 1
146- ## 2 koning verplaatsingen 0.5373006 2
147- ## 3 koning ministerie 0.5140343 3
148- ## 4 koning familie 0.4943074 4
149- ## 5 koning vereiste 0.4715540 5
144+ ## term1 term2 similarity rank
145+ ## 1 koning ministerie 0.5567209 1
146+ ## 2 koning verplaatsingen 0.5317563 2
147+ ## 3 koning grondwet 0.5118545 3
148+ ## 4 koning gedragen 0.4884593 4
149+ ## 5 koning verantwoordelijk 0.4788159 5
150150```
151151
152152``` r
157157```
158158## [[1]]
159159## term1 term2 similarity rank
160- ## 1 proximus doc_105 0.6922343 1
161- ## 2 proximus doc_863 0.5826316 2
162- ## 3 proximus doc_186 0.5146015 3
163- ## 4 proximus doc_862 0.5051525 4
164- ## 5 proximus doc_746 0.4467830 5
160+ ## 1 proximus doc_105 0.7080573 1
161+ ## 2 proximus doc_863 0.6275553 2
162+ ## 3 proximus doc_186 0.5301130 3
163+ ## 4 proximus doc_862 0.4656175 4
164+ ## 5 proximus doc_620 0.4396312 5
165165##
166166## [[2]]
167167## term1 term2 similarity rank
168- ## 1 koning doc_44 0.6228581 1
169- ## 2 koning doc_583 0.5643232 2
170- ## 3 koning doc_45 0.5535781 3
171- ## 4 koning doc_797 0.4408725 4
172- ## 5 koning doc_943 0.4039679 5
168+ ## 1 koning doc_44 0.6395732 1
169+ ## 2 koning doc_583 0.5574296 2
170+ ## 3 koning doc_45 0.5361990 3
171+ ## 4 koning doc_943 0.4225507 4
172+ ## 5 koning doc_797 0.4086391 5
173173```
174174
175175``` r
180180```
181181## [[1]]
182182## term1 term2 similarity rank
183- ## 1 doc_198 doc_343 0.4893735 1
184- ## 2 doc_198 doc_569 0.4858374 2
185- ## 3 doc_198 doc_358 0.4831750 3
186- ## 4 doc_198 doc_498 0.4766597 4
187- ## 5 doc_198 doc_983 0.4761481 5
183+ ## 1 doc_198 doc_343 0.4947847 1
184+ ## 2 doc_198 doc_899 0.4893836 2
185+ ## 3 doc_198 doc_923 0.4850165 3
186+ ## 4 doc_198 doc_708 0.4697377 4
187+ ## 5 doc_198 doc_642 0.4622465 5
188188##
189189## [[2]]
190190## term1 term2 similarity rank
191- ## 1 doc_285 doc_319 0.5304061 1
192- ## 2 doc_285 doc_286 0.5205777 2
193- ## 3 doc_285 doc_76 0.5086077 3
194- ## 4 doc_285 doc_74 0.4975725 4
195- ## 5 doc_285 doc_537 0.4802507 5
191+ ## 1 doc_285 doc_286 0.5537772 1
192+ ## 2 doc_285 doc_319 0.5478524 2
193+ ## 3 doc_285 doc_874 0.5095125 3
194+ ## 4 doc_285 doc_113 0.4878533 4
195+ ## 5 doc_285 doc_76 0.4863345 5
196196```
197197
198198``` r
206206```
207207## $sent1
208208## term1 term2 similarity rank
209- ## 1 sent1 doc_740 0.4637638 1
210- ## 2 sent1 doc_742 0.4621139 2
211- ## 3 sent1 doc_206 0.4315273 3
212- ## 4 sent1 doc_825 0.4221503 4
213- ## 5 sent1 doc_151 0.4183135 5
209+ ## 1 sent1 doc_742 0.4385398 1
210+ ## 2 sent1 doc_776 0.4269895 2
211+ ## 3 sent1 doc_740 0.4247892 3
212+ ## 4 sent1 doc_206 0.4162723 4
213+ ## 5 sent1 doc_509 0.4153925 5
214214##
215215## $sent2
216216## term1 term2 similarity rank
217- ## 1 sent2 doc_105 0.5789919 1
218- ## 2 sent2 doc_186 0.4938067 2
219- ## 3 sent2 doc_862 0.4848365 3
220- ## 4 sent2 doc_863 0.4685720 4
221- ## 5 sent2 doc_620 0.4497271 5
217+ ## 1 sent2 doc_105 0.5738307 1
218+ ## 2 sent2 doc_863 0.5229421 2
219+ ## 3 sent2 doc_862 0.4981593 3
220+ ## 4 sent2 doc_186 0.4873295 4
221+ ## 5 sent2 doc_18 0.4671208 5
222222```
223223
224224``` r