Notes on bulk-importing data into Elasticsearch

2017-12-16


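A freshly initialized Kibana instance has no indices. If data has already been imported into Elasticsearch, however, Kibana will automatically match the existing indices.

Following the official example, we now bulk-import data into Elasticsearch.

The tutorial is at https://www.elastic.co/guide/en/kibana/6.1/tutorial-load-dataset.html

We will import the following three data sets in turn.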

1. The Shakespeare data: the structure of the Shakespeare works data set

{
    "line_id": INT,
    "play_name": "String",
    "speech_number": INT,
    "line_number": "String",
    "speaker": "String",
    "text_entry": "String"
}

2. The accounts data: the structure of the accounts data set

{
    "account_number": INT,
    "balance": INT,
    "firstname": "String",
    "lastname": "String",
    "age": INT,
    "gender": "M or F",
    "address": "String",
    "employer": "String",
    "email": "String",
    "city": "String",
    "state": "String"
}

3. The schema for the logs data

{
    "memory": INT,
    "geo.coordinates": "geo_point",
    "@timestamp": "date"
}

Next, set up the field mappings in Elasticsearch.

Use the following request to set up a mapping for the Shakespeare data set. It can be sent with Postman, curl, or a similar tool; the full URL is localhost:9200/shakespeare.

PUT /shakespeare
{
  "mappings": {
    "doc": {
      "properties": {
        "speaker": {"type": "keyword"},
        "play_name": {"type": "keyword"},
        "line_id": {"type": "integer"},
        "speech_number": {"type": "integer"}
      }
    }
  }
}
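The request above is written in Kibana Console syntax. Sent from a terminal with curl, it might look like the sketch below. The host localhost:9200 is the one mentioned above, and Elasticsearch 6.x is assumed; the file name shakespeare_mapping.json and the function name create_shakespeare_index are illustrative only.

```shell
# Save the mapping body to a file (the file name is an arbitrary choice).
cat > shakespeare_mapping.json <<'EOF'
{
  "mappings": {
    "doc": {
      "properties": {
        "speaker": {"type": "keyword"},
        "play_name": {"type": "keyword"},
        "line_id": {"type": "integer"},
        "speech_number": {"type": "integer"}
      }
    }
  }
}
EOF

# Hypothetical helper: PUT the saved mapping to create the index.
# Assumes Elasticsearch is listening on localhost:9200.
create_shakespeare_index() {
  curl -H 'Content-Type: application/json' \
       -XPUT 'localhost:9200/shakespeare?pretty' \
       --data-binary @shakespeare_mapping.json
}

# Call it once the cluster is up:
# create_shakespeare_index
```

Keeping the body in a file avoids quoting the JSON on the command line, which matters most on Windows.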

Use the following requests to establish the geo_point mapping for the logs. The same mapping is applied to each of the three daily indices:

PUT /logstash-2015.05.18
{
  "mappings": {
    "log": {
      "properties": {
        "geo": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}

PUT /logstash-2015.05.19
{
  "mappings": {
    "log": {
      "properties": {
        "geo": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}

PUT /logstash-2015.05.20
{
  "mappings": {
    "log": {
      "properties": {
        "geo": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
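The three log requests differ only in the index name, so from a shell they can be issued in a loop. This is a minimal sketch, assuming the same localhost:9200 endpoint as the bulk commands in this article; the file name and both function names are illustrative.

```shell
# Write the shared geo_point mapping once (file name is illustrative).
cat > logstash_mapping.json <<'EOF'
{
  "mappings": {
    "log": {
      "properties": {
        "geo": {
          "properties": {
            "coordinates": {"type": "geo_point"}
          }
        }
      }
    }
  }
}
EOF

# Print the three daily index names used in this tutorial.
logstash_indices() {
  for day in 18 19 20; do
    echo "logstash-2015.05.$day"
  done
}

# Hypothetical helper: create every index with the same mapping.
create_logstash_indices() {
  for index in $(logstash_indices); do
    curl -H 'Content-Type: application/json' \
         -XPUT "localhost:9200/$index?pretty" \
         --data-binary @logstash_mapping.json
  done
}

# Run when Elasticsearch is reachable:
# create_logstash_indices
```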

The accounts data set needs no field mapping.

Now run the bulk import:

curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @accounts.json

curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/shakespeare/doc/_bulk?pretty' --data-binary @shakespeare_6.0.json

curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/_bulk?pretty' --data-binary @logs.jsonl
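The bulk API expects newline-delimited JSON, and the last line of the file must itself end with a newline. A small pre-flight check can catch a truncated or re-saved file before it is posted. The sketch below is one way to do that; the helper names ends_with_newline and check_bulk_files are hypothetical, not part of curl or Elasticsearch.

```shell
# Hypothetical helper: succeed only if the file exists, is non-empty,
# and its final byte is a newline, as the bulk API requires.
ends_with_newline() {
  [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Report the status of each data file passed as an argument.
check_bulk_files() {
  for f in "$@"; do
    if ends_with_newline "$f"; then
      echo "ok: $f"
    else
      echo "missing, empty, or no trailing newline: $f"
    fi
  done
}

# Example, run from the directory that holds the downloads:
# check_bulk_files accounts.json shakespeare_6.0.json logs.jsonl
```

The check relies on command substitution stripping a trailing newline: if the last byte is a newline, the captured string is empty.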

On Windows, curl can be downloaded from https://curl.haxx.se/download.html#Win64. Unpack it and add its directory to the PATH environment variable.

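Note that accounts.json, shakespeare_6.0.json, and logs.jsonl (the names after @) must be in the directory you run the command from. If your prompt is at the root of drive D:, for example, the files must be placed there too, otherwise curl cannot read them.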
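curl resolves a relative name after @ against the directory the command is run from. One way to avoid surprises is to check for the files first, or to pass a full path after @. The following is only a sketch; the helper name missing_data_files and the example directory are hypothetical.

```shell
# Hypothetical helper: print the data files that are missing from the
# current directory, the place curl looks for a relative @file name.
missing_data_files() {
  for f in "$@"; do
    [ -f "$f" ] || echo "$f"
  done
}

# List anything that curl would fail to read from here:
# missing_data_files accounts.json shakespeare_6.0.json logs.jsonl

# Alternatively, give curl a path that does not depend on the current
# directory (the directory below is only an example):
# curl -H 'Content-Type: application/x-ndjson' \
#      -XPOST 'localhost:9200/bank/account/_bulk?pretty' \
#      --data-binary @/path/to/data/accounts.json
```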

One more point: on Windows, replace the single quotes in these commands with double quotes. Otherwise curl fails with an error such as:

curl: (6) Could not resolve host: application
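The error arises because the Windows command prompt does not treat single quotes as quoting characters, so the header is split at its space and curl takes the second half as a URL. The sketch below imitates that splitting in a POSIX shell to show the effect; count_args is a hypothetical helper written for this illustration.

```shell
# Print how many arguments a command would receive.
count_args() { echo "$#"; }

# The header exactly as typed, single quotes included.
header="'Content-Type: application/x-ndjson'"

# Unquoted expansion splits at the space, imitating a command line on
# which ' has no quoting effect: the header arrives as two arguments.
count_args $header

# With effective quoting the header stays in one piece.
count_args "$header"
```

With double quotes, which both the Windows command prompt and Unix shells honor, the accounts import becomes: curl -H "Content-Type: application/x-ndjson" -XPOST "localhost:9200/bank/account/_bulk?pretty" --data-binary @accounts.json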
