Summary of Errors and Issues

Spark reading a file from HDFS throws java.net.ConnectException: Connection refused

The error message is as follows:

java.net.ConnectException: Call From josonlee-PC/127.0.1.1 to 192.168.17.10:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
  at org.apache.hadoop.ipc.Client.call(Client.java:1474)
  at org.apache.hadoop.ipc.Client.call(Client.java:1401)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
  at com.sun.proxy.$Proxy24.getListing(Unknown Source)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:554)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
  at com.sun.proxy.$Proxy25.getListing(Unknown Source)
  at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1958)
  at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1941)
  at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:693)
  at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
  at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
  at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
  at org.apache.hadoop.fs.Globber.listStatus(Globber.java:69)
  at org.apache.hadoop.fs.Globber.glob(Globber.java:217)
  at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1644)
  at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:257)
  at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:204)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
  at org.apache.spark.rdd.RDD.count(RDD.scala:1168)
  ... 49 elided
Caused by: java.net.ConnectException: Connection refused
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
  at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
  at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609)
  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
  at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
  at org.apache.hadoop.ipc.Client.getConnection(Client.java:1523)
  at org.apache.hadoop.ipc.Client.call(Client.java:1440)
  ... 86 more

At first I was writing the code while following the official documentation. For the part on loading external datasets I knew the argument should be the URL of the file on HDFS, but there was no example, so I wrote it like this:

scala> val data=sc.textFile("hdfs://192.168.17.10//sparkData/test/*")

The error message also points out Call From josonlee-PC/127.0.1.1 to 192.168.17.10:8020 failed, which shows that when no port is given in the URL, the HDFS client falls back to the default NameNode RPC port 8020. The error is easy to fix: Hadoop's configuration file core-site.xml specifies that the NameNode is exposed on port 9000, so just write the port explicitly into the URL:

scala> val data=sc.textFile("hdfs://192.168.17.10:9000//sparkData/test/*")
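An alternative to hard-coding the host and port in every path is to let bare paths resolve against fs.defaultFS from core-site.xml. A minimal sanity check from the spark-shell, assuming the Hadoop configuration directory is visible to Spark (for example via HADOOP_CONF_DIR):

scala> sc.hadoopConfiguration.get("fs.defaultFS")   // should print hdfs://192.168.17.10:9000 if core-site.xml was picked up
scala> val data = sc.textFile("/sparkData/test/*")  // a path without scheme or host resolves against fs.defaultFS

If the first line prints file:/// (the built-in default) or some other value, Spark is not reading the intended core-site.xml, and spelling out hdfs://192.168.17.10:9000 in the path, as above, is the safe fallback.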

A Spark cluster (or spark-shell) fails to read a local file: file cannot be found

By default the spark-shell here is not in local mode. For the cluster to read a file, every Worker must first be able to access it; a local file that exists only on the Master node and not on the Worker nodes therefore cannot be read by the Workers through a file:/// URL.
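(As an aside, a sketch not from the original notes: if you only want to experiment with a file that sits on the driver machine, you can start the shell explicitly in local mode, in which case a file:/// path only has to exist on that one machine. The path below is made up.)

$ spark-shell --master local[*]
scala> val data = sc.textFile("file:///home/josonlee/sparkData/test.txt")   // local mode: only the driver needs this file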

Solutions:

  • Upload the file to HDFS and read it through an hdfs:// URL, or
  • copy the file to the same path on every Worker node (both options are sketched below).
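A minimal sketch of both options from the spark-shell, assuming the NameNode at 192.168.17.10:9000 as above; the local path and the Worker hostname are made up for illustration.

Option 1, upload the file to HDFS and read it back over hdfs://:

$ hdfs dfs -put /home/josonlee/test.txt /sparkData/test/
scala> val fromHdfs = sc.textFile("hdfs://192.168.17.10:9000/sparkData/test/test.txt")

Option 2, copy the file to the same path on every Worker node, then read it as a local file:

$ scp /home/josonlee/test.txt worker1:/home/josonlee/test.txt
scala> val fromLocal = sc.textFile("file:///home/josonlee/test.txt")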