SparkSession sparkSession = SparkSession.builder()
.master("yarn")
.appName("MongoDb12")
.config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.config("spark.mongodb.input.uri", cfg.mongodbUrl)
.config("spark.mongodb.input.database", cfg.dataBase)
.config("spark.mongodb.input.collection", cfg.tableName)
.getOrCreate();
JavaSparkContext jsc = new JavaSparkContext(sparkSession.sparkContext());
Map<String, String> readOverrides = new HashMap<String, String>();
readOverrides.put("collection", cfg.tableName);
readOverrides.put("database", cfg.dataBase);
readOverrides.put("registerSQLHelperFunctions", "true");
ReadConfig readConfig = ReadConfig.create(jsc).withOptions(readOverrides);
Dataset<Row> dataset = MongoSpark.load(jsc, readConfig).toDF();
dataset.printSchema();
<dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector_2.11</artifactId>
    <version>2.4.2</version>
</dependency>
Hello, community.
I use the code above to read MongoDB data with Spark. However, some fields cannot be read. I have checked every resource I could find, but I still can't solve it.
The missing field is a JSON string, stored as a string type in MongoDB.
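A possible cause worth checking, sketched below (not a confirmed fix): the connector infers the DataFrame schema by sampling documents, so a field that is absent or null in the sampled documents may be dropped from the inferred schema entirely. Two things to try are raising the sample size, or declaring the schema explicitly so the string field is always present. Here `jsonField` is a placeholder for the real field name:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import com.mongodb.spark.MongoSpark;
import com.mongodb.spark.config.ReadConfig;

// Option 1: sample many more documents during schema inference,
// so a sparsely populated field is more likely to be seen.
Map<String, String> readOverrides = new HashMap<>();
readOverrides.put("database", cfg.dataBase);
readOverrides.put("collection", cfg.tableName);
readOverrides.put("sampleSize", "100000"); // default is much smaller
ReadConfig readConfig = ReadConfig.create(jsc).withOptions(readOverrides);
Dataset<Row> dataset = MongoSpark.load(jsc, readConfig).toDF();

// Option 2: skip inference entirely by declaring the schema, so the
// string field ("jsonField" is a placeholder) is guaranteed to appear
// in the DataFrame, as null where a document lacks it.
StructType schema = new StructType()
        .add("_id", DataTypes.StringType)
        .add("jsonField", DataTypes.StringType);
Dataset<Row> typed = sparkSession.read()
        .format("mongo")
        .option("database", cfg.dataBase)
        .option("collection", cfg.tableName)
        .schema(schema)
        .load();
```

With an explicit schema you can then parse the JSON string yourself, e.g. with `from_json` and a nested `StructType`, instead of relying on the connector's inference.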