kafka 0.11 spark 2.11 streaming例子-阿里云开发者社区

kafka 0.11 spark 2.11 streaming例子

2017-11-09 1470

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

服务治理 MSE Sentinel/OpenSergo，Agent数量不受限

简介：

"""
 Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
 Usage: kafka_wordcount.py <zk> <topic>
 To run this on your local machine, you need to setup Kafka and create a producer first, see
 http://kafka.apache.org/documentation.html#quickstart
 and then run the example
    `$ bin/spark-submit --jars \
      external/kafka-assembly/target/scala-*/spark-streaming-kafka-assembly-*.jar \
      examples/src/main/python/streaming/kafka_wordcount.py \
      localhost:2181 test`
"""
from __future__ import print_function

import sys

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: kafka_wordcount.py <zk> <topic>", file=sys.stderr)
        exit(-1)

    sc = SparkContext(appName="PythonStreamingKafkaWordCount")
    ssc = StreamingContext(sc, 1)

    zkQuorum, topic = sys.argv[1:]
    kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
    lines = kvs.map(lambda x: x[1])
    counts = lines.flatMap(lambda line: line.split(" ")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a+b)
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()

/spark-kafka/spark-2.1.1-bin-hadoop2.6# ./bin/spark-submit --jars ~/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar examples/src/main/python/streaming/kafka_wordcount.py localhost:2181 test

其中：spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar在 http://search.maven.org/#search%7Cga%7C1%7Cspark-streaming-kafka-0-8-assembly 下载

kafka 使用0.11版本：

1.3 Quick Start

This tutorial assumes you are starting fresh and have no existing Kafka or ZooKeeper data. Since Kafka console scripts are different for Unix-based and Windows platforms, on Windows platforms use bin\windows\ instead of bin/, and change the script extension to .bat.

Step 1: Download the code

Download the 0.11.0.0 release and un-tar it.

 
         > 
         tar 
         -xzf kafka_2.11-0.11.0.0.tgz 
        
         > 
         cd 
         kafka_2.11-0.11.0.0

Step 2: Start the server

Kafka uses ZooKeeper so you need to first start a ZooKeeper server if you don't already have one. You can use the convenience script packaged with kafka to get a quick-and-dirty single-node ZooKeeper instance.

 
         > bin
         /zookeeper-server-start
         .sh config
         /zookeeper
         .properties 
        
         [2013-04-22 15:01:37,495] INFO Reading configuration from: config
         /zookeeper
         .properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig) 
        
         ...

Now start the Kafka server:

 
         > bin
         /kafka-server-start
         .sh config
         /server
         .properties 
        
         [2013-04-22 15:01:47,028] INFO Verifying properties (kafka.utils.VerifiableProperties)
        
         [2013-04-22 15:01:47,051] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
        
         ...

Step 3: Create a topic

Let's create a topic named "test" with a single partition and only one replica:

 
         > bin
         /kafka-topics
         .sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic 
         test

We can now see that topic if we run the list topic command:

 
         > bin
         /kafka-topics
         .sh --list --zookeeper localhost:2181 
        
         test

Alternatively, instead of manually creating topics you can also configure your brokers to auto-create topics when a non-existent topic is published to.

Step 4: Send some messages

Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default, each line will be sent as a separate message.

Run the producer and then type a few messages into the console to send to the server.

 
         > bin
         /kafka-console-producer
         .sh --broker-list localhost:9092 --topic 
         test 
        
         This is a message
        
         This is another message

Step 5: Start a consumer

Kafka also has a command line consumer that will dump out messages to standard output.

 
         > bin
         /kafka-console-consumer
         .sh --bootstrap-server localhost:9092 --topic 
         test 
         --from-beginning 
        
         This is a message
        
         This is another message

    
    本文转自张昺华-sky博客园博客，原文链接：http://www.cnblogs.com/bonelee/p/7435506.html
    ，如需转载请自行联系原作者

kafka 0.11 spark 2.11 streaming例子

1.3 Quick Start

Step 1: Download the code

Step 2: Start the server

Step 3: Create a topic

Step 4: Send some messages

Step 5: Start a consumer

热门文章

最新文章

相关课程

相关电子书