开发者社区> 问答> 正文

当yarn崩溃时恢复Flink

我正在EMR上运行一个Yarn3节点集群(1个Master 2 Core节点)。我使用的是1.6.0。我启用了检查指向(rocksdb),写入S3。检查指向似乎在其他测试中正常工作。在主节点上Yarn崩溃的情况下,我无法从最后一个检查点恢复我的应用程序。这是我尝试重启时的输出:

[hadoop@emr flink-1.6.0]$ bin/flink run -s s3://bucket/kinesis-pipeline-checkpoint/a8a9ceb95845c3ea9833e025b5771470 -p 1 -d ~/pipeline-assembly-0.2.0.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/flink-1.6.0/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-11-08 19:01:06,069 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found Yarn properties file under /tmp/.yarn-properties-hadoop.
2018-11-08 19:01:06,069 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found Yarn properties file under /tmp/.yarn-properties-hadoop.
2018-11-08 19:01:06,488 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - YARN properties set default parallelism to 1
2018-11-08 19:01:06,488 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - YARN properties set default parallelism to 1
YARN properties set default parallelism to 1
2018-11-08 19:01:06,637 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at emr:8032
2018-11-08 19:01:06,745 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-11-08 19:01:06,745 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-11-08 19:01:06,845 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Found application JobManager host name 'emr' and port '39541' from supplied application id 'application_1541703591281_0001'
Starting execution of program


The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: Could not submit job (JobID: c701b6511ad76b5e4faae703763f388e)

at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:249)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:486)
at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:432)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:804)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:280)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:215)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1044)
at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1120)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)

Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.

at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:379)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.

at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
... 12 more

Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.

... 10 more

Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]

at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:953)
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
... 4 more

Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]

at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:310)
at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:294)
at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
... 5 more

这是预期的行为,还是我在这种情况下做错了什么?

更新:jobmanager.log

LogType:jobmanager.log
Log Upload Time:Tue Nov 20 16:37:52 +0000 2018
LogLength:49255
Log Contents:
2018-11-20 16:33:33,276 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - --------------------------------------------------------------------------------
2018-11-20 16:33:33,277 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Starting YarnSessionClusterEntrypoint (Version: 1.6.0, Rev:ff472b4, Date:07.08.2018 @ 13:31:13 UTC)
2018-11-20 16:33:33,278 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  OS current user: yarn
2018-11-20 16:33:33,672 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Current Hadoop/Kerberos user: hadoop
2018-11-20 16:33:33,672 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
2018-11-20 16:33:33,672 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Maximum heap size: 13653 MiBytes
2018-11-20 16:33:33,672 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JAVA_HOME: /usr/lib/jvm/java-openjdk
2018-11-20 16:33:33,673 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Hadoop version: 2.8.3
2018-11-20 16:33:33,673 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JVM Options:
2018-11-20 16:33:33,673 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Xmx15360m
2018-11-20 16:33:33,673 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Dlog.file=/var/log/hadoop-yarn/containers/application_1542731534971_0001/container_1542731534971_0001_01_000001/jobmanager.log
2018-11-20 16:33:33,673 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Dlogback.configurationFile=file:logback.xml
2018-11-20 16:33:33,673 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Dlog4j.configuration=file:log4j.properties
2018-11-20 16:33:33,673 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Program Arguments: (none)
2018-11-20 16:33:33,674 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - --------------------------------------------------------------------------------
2018-11-20 16:33:33,675 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Registered UNIX signal handlers for [TERM, HUP, INT]
2018-11-20 16:33:33,678 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - YARN daemon is running as: hadoop Yarn client user obtainer: hadoop
2018-11-20 16:33:33,680 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: state.backend.fs.checkpointdir, s3://bucket/kinesis-checkpoint
2018-11-20 16:33:33,680 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: web.timeout, 60000
2018-11-20 16:33:33,680 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: high-availability.cluster-id, application_1542731534971_0001
2018-11-20 16:33:33,680 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.address, localhost
2018-11-20 16:33:33,681 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.port, 6123
2018-11-20 16:33:33,681 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: rest.port, 8081
2018-11-20 16:33:33,681 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: internal.cluster.execution-mode, NORMAL
2018-11-20 16:33:33,681 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.memory.fraction, 0.9
2018-11-20 16:33:33,681 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1
2018-11-20 16:33:33,681 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2018-11-20 16:33:33,681 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: state.backend, rocksdb
2018-11-20 16:33:33,681 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: akka.ask.timeout, 60s
2018-11-20 16:33:33,681 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.heap.size, 20480m
2018-11-20 16:33:33,682 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.heap.size, 20480m
2018-11-20 16:33:33,682 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: state.checkpoints.dir, s3://bucket/kinesis-checkpoint
2018-11-20 16:33:33,695 INFO  org.apache.flink.runtime.clusterframework.BootstrapTools      - Setting directories for temporary files to: /mnt/yarn/usercache/hadoop/appcache/application_1542731534971_0001
2018-11-20 16:33:33,708 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Starting YarnSessionClusterEntrypoint.
2018-11-20 16:33:33,708 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Install default filesystem.
2018-11-20 16:33:33,772 INFO  org.apache.flink.runtime.security.modules.HadoopModule        - Hadoop user set to hadoop (auth:SIMPLE)
2018-11-20 16:33:33,786 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Initializing cluster services.
2018-11-20 16:33:33,791 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Trying to start actor system at ip-172-31-18-80.us-west-2.compute.internal:45751
2018-11-20 16:33:34,239 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
2018-11-20 16:33:34,328 INFO  akka.remote.Remoting                                          - Starting remoting
2018-11-20 16:33:34,428 INFO  akka.remote.Remoting                                          - Remoting started; listening on addresses :[akka.tcp://flink@ip-172-31-18-80.us-west-2.compute.internal:45751]
2018-11-20 16:33:34,437 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Actor system started at akka.tcp://flink@ip-172-31-18-80.us-west-2.compute.internal:45751
2018-11-20 16:33:34,469 INFO  org.apache.flink.runtime.blob.BlobServer                      - Created BLOB server storage directory /mnt/yarn/usercache/hadoop/appcache/application_1542731534971_0001/blobStore-1dc43ec8-8ed7-4342-adae-c8d20a691640
2018-11-20 16:33:34,473 INFO  org.apache.flink.runtime.blob.BlobServer                      - Started BLOB server at 0.0.0.0:39955 - max concurrent requests: 50 - max backlog: 1000
2018-11-20 16:33:34,488 INFO  org.apache.flink.runtime.metrics.MetricRegistryImpl           - No metrics reporter configured, no metrics will be exposed/reported.
2018-11-20 16:33:34,492 INFO  org.apache.flink.runtime.dispatcher.FileArchivedExecutionGraphStore  - Initializing FileArchivedExecutionGraphStore: Storage directory /mnt/yarn/usercache/hadoop/appcache/application_1542731534971_0001/executionGraphStore-0c4fd7ac-17d2-40d6-b279-dfef5041a76f, expiration time 3600000, maximum cache size 52428800 bytes.
2018-11-20 16:33:34,514 INFO  org.apache.flink.runtime.blob.TransientBlobCache              - Created BLOB cache storage directory /mnt/yarn/usercache/hadoop/appcache/application_1542731534971_0001/blobStore-4c662c5c-afa5-4bf2-8a01-3acc0b9aa491
2018-11-20 16:33:34,521 WARN  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Upload directory /tmp/flink-web-6885656b-18cc-451f-8853-03ff7cf14b0e/flink-web-upload does not exist, or has been deleted externally. Previously uploaded files are no longer available.
2018-11-20 16:33:34,522 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Created directory /tmp/flink-web-6885656b-18cc-451f-8853-03ff7cf14b0e/flink-web-upload for file uploads.
2018-11-20 16:33:34,525 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Starting rest endpoint.
2018-11-20 16:33:34,702 INFO  org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Determined location of main cluster component log file: /var/log/hadoop-yarn/containers/application_1542731534971_0001/container_1542731534971_0001_01_000001/jobmanager.log
2018-11-20 16:33:34,702 INFO  org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Determined location of main cluster component stdout file: /var/log/hadoop-yarn/containers/application_1542731534971_0001/container_1542731534971_0001_01_000001/jobmanager.out
2018-11-20 16:33:34,844 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Rest endpoint listening at ip-172-31-18-80.us-west-2.compute.internal:35939
2018-11-20 16:33:34,844 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - http://ip-172-31-18-80.us-west-2.compute.internal:35939 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
2018-11-20 16:33:34,844 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Web frontend listening at http://ip-172-31-18-80.us-west-2.compute.internal:35939.
2018-11-20 16:33:34,857 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.yarn.YarnResourceManager at akka://flink/user/resourcemanager .
2018-11-20 16:33:34,948 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
2018-11-20 16:33:34,981 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at ip-172-31-30-52.us-west-2.compute.internal/172.31.30.52:8030
2018-11-20 16:33:35,234 INFO  org.apache.flink.yarn.YarnResourceManager                     - Recovered 0 containers from previous attempts ([]).
2018-11-20 16:33:35,237 INFO  org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy  - yarn.client.max-cached-nodemanagers-proxies : 0
2018-11-20 16:33:35,238 INFO  org.apache.flink.yarn.YarnResourceManager                     - ResourceManager akka.tcp://flink@ip-172-31-18-80.us-west-2.compute.internal:45751/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000
2018-11-20 16:33:35,239 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Starting the SlotManager.
2018-11-20 16:33:35,252 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Dispatcher akka.tcp://flink@ip-172-31-18-80.us-west-2.compute.internal:45751/user/dispatcher was granted leadership with fencing token 00000000-0000-0000-0000-000000000000
2018-11-20 16:33:35,252 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Recovering all persisted jobs.
2018-11-20 16:34:20,094 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Submitting job bd0d5dbaeba3990a3bef1eebee49cd79 (Data Session Pipeline v0.0.7).
2018-11-20 16:34:20,108 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/jobmanager_0 .
2018-11-20 16:34:20,115 INFO  org.apache.flink.runtime.jobmaster.JobMaster                  - Initializing job Data Session Pipeline v0.0.7 (bd0d5dbaeba3990a3bef1eebee49cd79).
2018-11-20 16:34:20,124 INFO  org.apache.flink.runtime.jobmaster.JobMaster                  - Using restart strategy FixedDelayRestartStrategy(maxNumberRestartAttempts=2147483647, delayBetweenRestartAttempts=0) for Data Session Pipeline v0.0.7 (bd0d5dbaeba3990a3bef1eebee49cd79).
2018-11-20 16:34:20,127 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.slotpool.SlotPool at akka://flink/user/0e6f5de3-53ad-4bae-acf3-3c66106c0a54 .
2018-11-20 16:34:20,148 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job recovers via failover strategy: full graph restart
2018-11-20 16:34:20,170 INFO  org.apache.flink.runtime.jobmaster.JobMaster                  - Running initialization on master for job Data Session Pipeline v0.0.7 (bd0d5dbaeba3990a3bef1eebee49cd79).
2018-11-20 16:34:20,170 INFO  org.apache.flink.runtime.jobmaster.JobMaster                  - Successfully ran initialization on master in 0 ms.
2018-11-20 16:34:20,203 INFO  org.apache.flink.runtime.jobmaster.JobMaster                  - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 's3://bucket/kinesis-checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=null, enableIncrementalCheckpointing=TRUE}
2018-11-20 16:34:20,203 INFO  org.apache.flink.runtime.jobmaster.JobMaster                  - Configuring application-defined state backend with job/cluster config
2018-11-20 16:34:22,624 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Starting job bd0d5dbaeba3990a3bef1eebee49cd79 from savepoint s3://bucket/kinesis-pipeline-checkpoint/8a6e5aeebeef202a2daddd3cf9419a80 ()
2018-11-20 16:34:22,663 ERROR org.apache.flink.runtime.rest.handler.job.JobSubmitHandler    - Exception occurred in REST handler.
org.apache.flink.runtime.rest.handler.RestHandlerException: Job submission failed.
        at org.apache.flink.runtime.rest.handler.job.JobSubmitHandler.lambda$handleRequest$2(JobSubmitHandler.java:119)
        at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
        at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
        at akka.dispatch.OnComplete.internal(Future.scala:258)
        at akka.dispatch.OnComplete.internal(Future.scala:256)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
        at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
        at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:534)
        at akka.pattern.PipeToSupport$PipeableFuture

$$ anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:20) at akka.pattern.PipeToSupport$PipeableFuture $$

anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:18)

        at scala.concurrent.Future

$$ anonfun$andThen$1.apply(Future.scala:436) at scala.concurrent.Future $$

anonfun$andThen$1.apply(Future.scala:435)

        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
        at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
        at akka.dispatch.BatchingExecutor$BlockableBatch

$$ anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91) at akka.dispatch.BatchingExecutor$BlockableBatch $$

anonfun$run$1.apply(BatchingExecutor.scala:91)

        at akka.dispatch.BatchingExecutor$BlockableBatch

$$ anonfun$run$1.apply(BatchingExecutor.scala:91) at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job. at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$submitJob$2(Dispatcher.java:256) at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561) at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:690) at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158) at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142) at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40) at akka.actor.UntypedActor $$

anonfun$receive$1.applyOrElse(UntypedActor.scala:165)

        at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
        at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
        at akka.actor.ActorCell.invoke(ActorCell.scala:495)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
        at akka.dispatch.Mailbox.run(Mailbox.scala:224)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
        ... 4 more
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
        ... 24 more
Caused by: java.util.concurrent.CompletionException: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
        at java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:708)
        at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:687)
        ... 18 more
Caused by: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
        at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:199)
        at org.apache.flink.util.function.ConsumerWithException.accept(ConsumerWithException.java:40)
        at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$waitForTerminatingJobManager$29(Dispatcher.java:820)
        at java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:705)
        ... 19 more
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
        at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:176)
        at org.apache.flink.runtime.dispatcher.Dispatcher$DefaultJobManagerRunnerFactory.createJobManagerRunner(Dispatcher.java:936)
        at org.apache.flink.runtime.dispatcher.Dispatcher.createJobManagerRunner(Dispatcher.java:291)
        at org.apache.flink.runtime.dispatcher.Dispatcher.runJob(Dispatcher.java:281)
        at org.apache.flink.runtime.dispatcher.Dispatcher.persistAndRunJob(Dispatcher.java:266)
        at org.apache.flink.util.function.ConsumerWithException.accept(ConsumerWithException.java:38)
        ... 21 more
Caused by: java.io.FileNotFoundException: Cannot find meta data file '_metadata' in directory 's3://sledfs/kinesis-pipeline-checkpoint/8a6e5aeebeef202a2daddd3cf9419a80'. Please try to load the checkpoint/savepoint directly from the metadata file instead of the directory.
        at org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpointPointer(AbstractFsCheckpointStorage.java:256)
        at org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpoint(AbstractFsCheckpointStorage.java:109)
        at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1102)
        at org.apache.flink.runtime.jobmaster.JobMaster.tryRestoreExecutionGraphFromSavepoint(JobMaster.java:1220)
        at org.apache.flink.runtime.jobmaster.JobMaster.createAndRestoreExecutionGraph(JobMaster.java:1144)
        at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:295)
        at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:157)
        ... 26 more
2018-11-20 16:37:52,321 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2018-11-20 16:37:52,322 INFO  org.apache.flink.runtime.blob.TransientBlobCache              - Shutting down BLOB cache
2018-11-20 16:37:52,340 INFO  org.apache.flink.runtime.blob.BlobServer                      - Stopped BLOB server at 0.0.0.0:39955

展开
收起
flink小助手 2018-12-13 14:47:02 9146 0
1 条回答
写回答
取消 提交回答
  • flink小助手会定期更新直播回顾等资料和文章干货,还整合了大家在钉群提出的有关flink的问题及回答。

    您指的检查点s3://bucket/kinesis-pipeline-checkpoint/a8a9ceb95845c3ea9833e025b5771470不包含有效_metadata文件。这表示此检查点已启动但无法完成。请选择已成功完成的检查站。

    2019-07-17 23:20:43
    赞同 展开评论 打赏
问答排行榜
最热
最新

相关电子书

更多
Flink CDC Meetup PPT - 龚中强 立即下载
Flink CDC Meetup PPT - 王赫 立即下载
Flink CDC Meetup PPT - 覃立辉 立即下载