Spark type mismatch: cannot convert from JavaRDD<Object> to JavaRDD<String>

By simon at 2018-02-07 • 0 favorites • 58 views

I have started porting my PySpark application to a Java implementation, using Java 8. I am just getting started writing some basic Spark programs in Java, and I followed the [wordcount](http://ingini.org/2015/05/04/apache-spark-java-8-streams-overview/) example below.

SparkConf conf = new SparkConf().setMaster("local").setAppName("Work Count App");

// Create a Java version of the Spark Context from the configuration
JavaSparkContext sc = new JavaSparkContext(conf);

JavaRDD<String> lines = sc.textFile(filename);

JavaPairRDD<String, Integer> counts = lines.flatMap(line -> Arrays.asList(line.split(" ")))
        .mapToPair(word -> new Tuple2(word, 1))
        .reduceByKey((x, y) -> (Integer) x + (Integer) y)
        .sortByKey();
I get the error Type mismatch: cannot convert from JavaRDD<Object> to JavaRDD<String> on the line lines.flatMap(line -> Arrays.asList(line.split(" "))). When I googled, every Java 8 based Spark example I found uses this same implementation. Is something wrong with my environment or my program? Could someone help me?

5 replies | Last updated 2018-02-07
2018-02-07   #1

Use the code below. The actual problem is that the rdd.flatMap function expects an Iterator<String>, while your code is producing a List<String>. Calling iterator() on the list fixes the problem.

JavaPairRDD<String, Integer> counts = lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator())
        .mapToPair(word -> new Tuple2<String, Integer>(word, 1))
        .reduceByKey((x, y) -> x + y)
        .sortByKey();

counts.foreach(data -> {
    System.out.println(data._1() + "-" + data._2());
});
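For reference, the reason iterator() is needed: starting with Spark 2.0, the Java flatMap takes a FlatMapFunction whose call() method returns an Iterator<R>, while in Spark 1.x it returned an Iterable<R>, which is why many older examples compile without it. Spelled out as an anonymous class instead of a lambda, the same flatMap looks roughly like the sketch below (it assumes the lines RDD from the question and needs java.util.Iterator, java.util.Arrays and org.apache.spark.api.java.function.FlatMapFunction on the import list).

// Sketch only: the lambda written as an explicit FlatMapFunction, which makes
// the required Iterator<String> return type visible.
JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public Iterator<String> call(String line) throws Exception {
        return Arrays.asList(line.split(" ")).iterator();
    }
});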

2018-02-07   #2

Try this code:

// On Spark 2.x the flatMap lambda must return an Iterator, so call iterator() on the List.
JavaRDD<String> words =
    lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator());
JavaPairRDD<String, Integer> counts =
    words.mapToPair(w -> new Tuple2<String, Integer>(w, 1))
         .reduceByKey((x, y) -> x + y);


2018-02-07   #4

JavaRDD<String> obj = jsc.textFile("<Text File Path>");
JavaRDD<String> obj1 = obj.flatMap(l -> {
    // Collect the words of each line and return an Iterator, as Spark 2.x expects.
    ArrayList<String> al = new ArrayList<>();
    String[] str = l.split(" ");
    for (int i = 0; i < str.length; i++) {
        al.add(str[i]);
    }
    return al.iterator();
});
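The snippet above only produces the RDD of words. To finish the word count from it, a minimal continuation could look like the sketch below (it reuses the obj1 variable from the snippet and assumes scala.Tuple2 and org.apache.spark.api.java.JavaPairRDD are imported).

// Sketch: pair each word with 1, sum the counts per word, then print them.
JavaPairRDD<String, Integer> counts = obj1
        .mapToPair(word -> new Tuple2<String, Integer>(word, 1))
        .reduceByKey((a, b) -> a + b);

counts.foreach(pair -> System.out.println(pair._1() + " : " + pair._2()));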

2018-02-07   #5


import org.apache.spark.api.java.*;
import org.apache.spark.SparkConf;
import scala.Tuple2;
import java.util.Arrays;

public class CountWord {
    public static void main(String[] args) {

        SparkConf conf = new SparkConf().setMaster("local").setAppName("wordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        String inputFile = "D://input/abc.txt";

        JavaRDD<String> input = sc.textFile(inputFile);

        // Split each line into words; iterator() satisfies the Spark 2.x flatMap signature.
        JavaRDD<String> words = input.flatMap(line -> Arrays.asList(line.split(" ")).iterator());

        // Pair each word with 1 and sum the counts per word.
        JavaPairRDD<String, Integer> pairs = words.mapToPair(w -> new Tuple2<String, Integer>(w, 1));
        JavaPairRDD<String, Integer> counts = pairs.reduceByKey((x, y) -> x + y);

        System.out.println(counts.collect());
    }
}
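If you want to persist the result instead of printing it, one option is saveAsTextFile at the end of main, followed by closing the context. The output path below is only an example; the target directory must not already exist.

// Optional: write the counts to a directory and shut the context down cleanly.
counts.saveAsTextFile("D://output/wordcount");
sc.close();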

