Tutorial on training Skip-Thought vectors for sentence feature extraction.


 

  1. Send an email and download the training dataset. 

    The dataset used for skip-thought vectors is BookCorpus: http://yknzhu.wixsite.com/mbweb 

    First, send an email to the authors of the paper and ask for the download link, then download the dataset files. 

    Unzip these files into the current folder; one way to do this is sketched below. 
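    As a convenience, here is a minimal Python sketch for extracting the archives. It assumes the files arrive as .tar archives in the current directory; the exact names and format depend on what the authors send you, so adjust accordingly.

import glob
import tarfile

# Extract every .tar archive in the current directory.
# NOTE: the archive names/format are assumptions; adjust to the files you received.
for archive in glob.glob("*.tar"):
    with tarfile.open(archive) as tar:
        tar.extractall(".")
        print("Extracted", archive)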

  2. Download the TensorFlow version of the code.   

    Follow the instructions at the following link: https://github.com/tensorflow/models/tree/master/skip_thoughts 

    Then you will see the setup process run in your terminal. 

    [Attention] You need to install Bazel, but do not update it afterwards; otherwise it may show errors in the following steps. 

 

  3. Encoding Sentences:   

    (1). First, open a terminal and enter "ipython": 

    (2). Then input the following code into the IPython session: 

ipython  # Launch IPython.

In [0]:

# Imports.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import os.path
import scipy.spatial.distance as sd
from skip_thoughts import configuration
from skip_thoughts import encoder_manager

In [1]:
# Set paths to the model.
VOCAB_FILE = "/path/to/vocab.txt"
EMBEDDING_MATRIX_FILE = "/path/to/embeddings.npy"
CHECKPOINT_PATH = "/path/to/model.ckpt-9999"
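
    Optionally, you can sanity-check the model paths before loading anything. This small check is not part of the original walkthrough; it simply uses the os.path module imported above:

# Optional sanity check (not part of the original walkthrough).
# CHECKPOINT_PATH is a checkpoint prefix rather than a single file,
# so only the vocabulary and embedding files are checked here.
for path in [VOCAB_FILE, EMBEDDING_MATRIX_FILE]:
    assert os.path.exists(path), "Missing file: %s" % path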

 

    At this point you have defined the environment. Next, you also need to do the following: 

In [2]:
# Set up the encoder. Here we are using a single unidirectional model.
# To use a bidirectional model as well, call load_model() again with
# configuration.model_config(bidirectional_encoder=True) and paths to the
# bidirectional model's files. The encoder will use the concatenation of
# all loaded models.
encoder = encoder_manager.EncoderManager()
encoder.load_model(configuration.model_config(),
                   vocabulary_file=VOCAB_FILE,
                   embedding_matrix_file=EMBEDDING_MATRIX_FILE,
                   checkpoint_path=CHECKPOINT_PATH)
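
    The comment above mentions bidirectional models. If you have also trained or downloaded one, it can be loaded on top of the unidirectional model with a second load_model() call; the file paths below are placeholders for the bidirectional model's files:

# Optional: load a bidirectional model as well; the encoder will then use the
# concatenation of all loaded models. The paths below are placeholders.
encoder.load_model(configuration.model_config(bidirectional_encoder=True),
                   vocabulary_file="/path/to/bi/vocab.txt",
                   embedding_matrix_file="/path/to/bi/embeddings.npy",
                   checkpoint_path="/path/to/bi/model.ckpt-9999")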

In [3]:
# Define the input sentence(s) to encode.
data = [' This is my first attempt  to the tensorflow version skip_thought_vectors ... ']
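
    If you would rather encode your own corpus, a simple alternative is to read one sentence per line from a plain-text file (the file name here is a hypothetical placeholder):

# Hypothetical alternative: read one sentence per line from your own file.
# "my_sentences.txt" is a placeholder name.
with open("my_sentences.txt") as f:
    data = [line.strip() for line in f if line.strip()]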

 

  Then it is time to get the 2400-dimensional feature vectors. 

In [4]:
# Generate Skip-Thought Vectors for each sentence in the dataset.
encodings = encoder.encode(data)
print(encodings)
print(encodings[0]) 
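
  Since scipy.spatial.distance was imported above, you can also compare encodings directly. As an illustrative sketch (the sentences below are examples, not from the original walkthrough), the cosine distance between two sentence vectors looks like this:

# Example: compare two sentences by the cosine distance of their encodings.
# The sentences are illustrative placeholders.
pair = encoder.encode(["the movie was great .",
                       "the film was wonderful ."])
print(sd.cdist([pair[0]], [pair[1]], "cosine")[0][0])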

  

  You can see the results printed in the terminal: the encoding matrix, with one row of floating-point features per input sentence. 

  Now you have obtained the features of the input sentence. You can load your own texts to extract their features in the same way. Come on ... 

 
