Everyday Flink Problems

Problem 1:

The Flink log kept reporting the following error:
2019-07-29 16:41:42,634 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler   - Exception occurred in REST handler: Job 4b1ab21b418e7b4128838ee4efbde4dc not found
2019-07-29 16:41:45,632 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler: Job 4b1ab21b418e7b4128838ee4efbde4dc not found
2019-07-29 16:41:48,633 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler: Job 4b1ab21b418e7b4128838ee4efbde4dc not found
2019-07-29 16:41:51,634 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler: Job 4b1ab21b418e7b4128838ee4efbde4dc not found
2019-07-29 16:41:54,634 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler: Job 4b1ab21b418e7b4128838ee4efbde4dc not found
2019-07-29 16:41:57,633 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler: Job 4b1ab21b418e7b4128838ee4efbde4dc not found
2019-07-29 16:42:00,634 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler: Job 4b1ab21b418e7b4128838ee4efbde4dc not found
2019-07-29 16:42:03,639 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler: Job 4b1ab21b418e7b4128838ee4efbde4dc not found
2019-07-29 16:42:06,634 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler: Job 4b1ab21b418e7b4128838ee4efbde4dc not found
2019-07-29 16:42:09,645 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler: Job 4b1ab21b418e7b4128838ee4efbde4dc not found
2019-07-29 16:42:12,633 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler: Job 4b1ab21b418e7b4128838ee4efbde4dc not found
2019-07-29 16:42:15,587 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler: Job 4b1ab21b418e7b4128838ee4efbde4dc not found
2019-07-29 16:42:17,872 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler: Job 4b1ab21b418e7b4128838ee4efbde4dc not found
Cause:

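A browser tab showing this job's page in the Flink web UI had been left open. The page kept requesting details for a job the cluster no longer knew about. Once the page was closed, the errors stopped.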
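
The timestamps in the log are almost exactly 3 seconds apart, which fits a page polling the REST API on a timer (Flink's web frontend has a `web.refresh-interval` option, 3000 ms by default in the 1.x releases, as far as I know). The minimal Python sketch below is not part of Flink: it groups these "not found" errors by job ID and computes the gaps between them. A steady, short gap points to a client polling the REST API rather than a problem inside the cluster. The regex is tuned to the log format shown above, so treat it as an assumption for other log layouts.

```python
import re
from datetime import datetime
from statistics import median

# Matches "Job <id> not found" REST handler errors in the log format above (assumed layout).
NOT_FOUND = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ERROR .*JobDetailsHandler"
    r".*Job ([0-9a-f]{32}) not found"
)


def not_found_gaps(lines):
    """Map each missing job ID to the seconds between consecutive "not found" errors."""
    seen = {}
    for line in lines:
        match = NOT_FOUND.match(line)
        if match is None:
            continue  # ignore unrelated log lines
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        seen.setdefault(match.group(2), []).append(ts)
    return {
        job: [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]
        for job, stamps in seen.items()
    }


# Three of the lines from the log above.
SAMPLE_LOG = """\
2019-07-29 16:41:42,634 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler   - Exception occurred in REST handler: Job 4b1ab21b418e7b4128838ee4efbde4dc not found
2019-07-29 16:41:45,632 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler: Job 4b1ab21b418e7b4128838ee4efbde4dc not found
2019-07-29 16:41:48,633 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler: Job 4b1ab21b418e7b4128838ee4efbde4dc not found
"""

if __name__ == "__main__":
    for job_id, gaps in not_found_gaps(SAMPLE_LOG.splitlines()).items():
        # Median gap in seconds, rounded; 3.0 for the sample above.
        print(job_id, round(median(gaps), 1))
```

Feeding it the full log file instead of the sample works the same way; a job ID whose median gap matches the UI refresh interval is a good hint that some open page is still asking for it.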

Problem 2:

After the Flink job started, checkpoints failed with the error below. It looked like a Kafka problem:
org.apache.kafka.common.errors.TimeoutException
Cause:

The job's sink is a Kafka producer, but the target topic had never been created, so the producer timed out.
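
Presumably the brokers here do not auto-create topics, so the producer waits for metadata of a topic that never appears and gives up with this TimeoutException once `max.block.ms` (60 s by default) runs out. A cheap guard is to check that every sink topic exists before submitting the job. Below is a minimal Python sketch of such a pre-flight check. It shells out to Kafka's `kafka-topics.sh --list`, assuming Kafka 2.2 or later, where the tool accepts `--bootstrap-server` (older versions take `--zookeeper` instead). The script path, broker address and topic name in the usage comment are hypothetical placeholders.

```python
import subprocess


def list_topics(kafka_topics_sh, bootstrap_server):
    """Return the topic names that `kafka-topics.sh --list` reports for the cluster."""
    result = subprocess.run(
        [kafka_topics_sh, "--bootstrap-server", bootstrap_server, "--list"],
        check=True,  # raise if the tool itself fails, e.g. broker unreachable
        capture_output=True,
        text=True,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def missing_topics(required, existing):
    """Sink topics that are not on the cluster yet and must be created before the job starts."""
    return sorted(set(required) - set(existing))


# Usage (hypothetical path, broker and topic name):
#   existing = list_topics("/opt/kafka/bin/kafka-topics.sh", "broker1:9092")
#   missing = missing_topics(["my_sink_topic"], existing)
#   if missing:
#       raise SystemExit(f"create these topics first: {missing}")
```

Keeping the pure comparison in `missing_topics` separate from the shell call makes the check easy to reuse with any other way of listing topics.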

Problem 3:

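Flink checkpoints used to be stored in a local directory, which caused two problems.
First, every TaskManager running a task kept its own copy of the checkpoint files.
Second, after the job had been running for a while, a new chk-n file would fail to be created, and from then on the next checkpoint kept failing because the file could not be found: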

java.lang.Exception: Exception while creating StreamOperatorStateContext.
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkException: Could not restore operator state backend for StreamSource_6cdc5bb954874d922eaee11a8e7b5dd5_(9/18) from any of the 1 provided restore options.
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:137)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.operatorStateBackend(StreamTaskStateInitializerImpl.java:245)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:143)
... 5 more
Caused by: java.io.FileNotFoundException: /data/flink/checkpoints/BDP_WATCH_ANDROID_PKSP_GROUP_2/8898aa4e9a78e55bd99c65313a077d78/chk-1955/6da428d6-9219-41f8-8b45-abcb7baff483 (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.apache.flink.core.fs.local.LocalDataInputStream.<init>(LocalDataInputStream.java:50)
at org.apache.flink.core.fs.local.LocalFileSystem.open(LocalFileSystem.java:142)
at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.open(SafetyNetWrapperFileSystem.java:85)
at org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:68)
at org.apache.flink.runtime.state.OperatorStreamStateHandle.openInputStream(OperatorStreamStateHandle.java:66)
at org.apache.flink.runtime.state.DefaultOperatorStateBackend.restore(DefaultOperatorStateBackend.java:286)
at org.apache.flink.runtime.state.DefaultOperatorStateBackend.restore(DefaultOperatorStateBackend.java:62)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123)
... 7 more
Solution:

Switch the state backend to HDFS and save checkpoints under an HDFS path, which every TaskManager can reach.
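
A plausible explanation for both symptoms: a path like /data/flink/checkpoints/... names a different physical directory on every machine, so a state file written by one TaskManager is invisible to a task that is restored on another. The toy Python model below is not Flink code (`TaskManager`, `write_state` and `read_state` are hypothetical names). It represents each machine's local disk as its own temporary directory and HDFS as one directory shared by both, writes a state file on one TaskManager, and tries to read it back on another, as happens after a failover. With per-machine directories the read fails with `FileNotFoundError`, matching the trace above; with the shared directory it succeeds. In the Flink 1.x releases of that era, the real switch is usually made in flink-conf.yaml via `state.backend: filesystem` plus an `hdfs://` URI for `state.checkpoints.dir`.

```python
import tempfile
from pathlib import Path


class TaskManager:
    """Hypothetical stand-in for a TaskManager that keeps checkpoint files under `checkpoint_root`."""

    def __init__(self, checkpoint_root: Path):
        self.checkpoint_root = checkpoint_root

    def write_state(self, chk_id: int, file_name: str, data: bytes) -> Path:
        # Write one state file into chk-<id>/ and return its path relative to the root,
        # i.e. the same path string whichever machine later tries to read it.
        path = self.checkpoint_root / f"chk-{chk_id}" / file_name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return path.relative_to(self.checkpoint_root)

    def read_state(self, relative_path: Path) -> bytes:
        # Raises FileNotFoundError if the file is not under this TaskManager's root.
        return (self.checkpoint_root / relative_path).read_bytes()


def restore_succeeds(shared_storage: bool) -> bool:
    """Write state on one TaskManager, restore it on another, and report whether that worked."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        if shared_storage:
            # Every machine sees the same directory, as with an hdfs:// checkpoint path.
            root_a = root_b = base / "hdfs"
        else:
            # Same path string, but a separate local disk on each machine.
            root_a, root_b = base / "tm-a", base / "tm-b"
        handle = TaskManager(root_a).write_state(1955, "operator-state", b"state bytes")
        try:
            return TaskManager(root_b).read_state(handle) == b"state bytes"
        except FileNotFoundError:
            return False


if __name__ == "__main__":
    print("local dirs:", restore_succeeds(shared_storage=False))  # local dirs: False
    print("shared dir:", restore_succeeds(shared_storage=True))   # shared dir: True
```

The model deliberately ignores everything else a real checkpoint involves (JobManager metadata, retention, cleanup); it only isolates the "same path, different disk" effect that a shared filesystem removes.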