
How to read multi-line JSON with a root element in Spark Scala?

Here is a sample JSON file. It has a root tag. How do I read the JSON data into a DataFrame and print it to the console?

{

    "Crimes": [
    {
        "ID": 11034701,
        "Case Number": "JA366925",
        "Date": "01/01/2001 11:00:00 AM",
        "Block": "016XX E 86TH PL",
        "IUCR": "1153",
        "Primary Type": "DECEPTIVE PRACTICE",
        "Description": "FINANCIAL IDENTITY THEFT OVER $ 300",
        "Location Description": "RESIDENCE",
        "Arrest": false,
        "Domestic": false,
        "Beat": 412,
        "District": 4,
        "Ward": 8,
        "Community Area": 45,
        "FBI Code": "11",
        "Year": 2001,
        "Updated On": "08/05/2017 03:50:08 PM"
    },

    {
        "ID": 11162428,
        "Case Number": "JA529032",
        "Date": "11/28/2017 09:43:00 PM",
        "Block": "026XX S CALIFORNIA BLVD",
        "IUCR": "5131",
        "Primary Type": "OTHER OFFENSE",
        "Description": "VIOLENT OFFENDER: ANNUAL REGISTRATION",
        "Location Description": "JAIL / LOCK-UP FACILITY",
        "Arrest": true,
        "Domestic": false,
        "Beat": 1034,
        "District": 10,
        "Ward": 12,
        "Community Area": 30,
        "FBI Code": "26",
        "X Coordinate": 1158280,
        "Y Coordinate": 1886310,
        "Year": 2017,
        "Updated On": "02/11/2018 03:54:58 PM",
        "Latitude": 41.843778126,
        "Longitude": -87.694637678,
        "Location": "(41.843778126, -87.694637678)"
    }, {
        "ID": 4080525,
        "Case Number": "HL425503",
        "Date": "06/16/2005 09:40:00 PM",
        "Block": "062XX N KIRKWOOD AVE",
        "IUCR": "1365",
        "Primary Type": "CRIMINAL TRESPASS",
        "Description": "TO RESIDENCE",
        "Location Description": "RESIDENCE",
        "Arrest": false,
        "Domestic": false,
        "Beat": 1711,
        "District": 17,
        "Ward": 39,
        "Community Area": 12,
        "FBI Code": "26",
        "X Coordinate": 1145575,
        "Y Coordinate": 1941395,
        "Year": 2005,
        "Updated On": "02/28/2018 03:56:25 PM",
        "Latitude": 41.99518667,
        "Longitude": -87.739863972,
        "Location": "(41.99518667, -87.739863972)"
    }, {
        "ID": 4080539,
        "Case Number": "HL422433",
        "Date": "06/15/2005 12:55:00 PM",
        "Block": "042XX S ST LAWRENCE AVE",
        "IUCR": "0460",
        "Primary Type": "BATTERY",
        "Description": "SIMPLE",
        "Location Description": "SCHOOL, PUBLIC BUILDING",
        "Arrest": false,
        "Domestic": false,
        "Beat": 213,
        "District": 2,
        "Ward": 4,
        "Community Area": 38,
        "FBI Code": "08B",
        "X Coordinate": 1180964,
        "Y Coordinate": 1877123,
        "Year": 2005,
        "Updated On": "02/28/2018 03:56:25 PM",
        "Latitude": 41.818075262,
        "Longitude": -87.611675899,
        "Location": "(41.818075262, -87.611675899)"
    }
]
}

I am using this code:

import org.apache.spark.sql.SparkSession

// A single SparkSession is enough: it creates the SparkContext internally,
// so separate SparkConf/SparkContext/SQLContext objects are not needed.
val spark = SparkSession.builder().master("local").appName("ValidationFrameWork").getOrCreate()
import spark.implicits._ // provides .toDS() for the RDD[String] below
val sc = spark.sparkContext
sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")

// wholeTextFiles returns (path, content) pairs; keeping only the content
// turns the whole file into a single JSON record.
val jsonRDD = sc.wholeTextFiles("D:/FinalScripts/output/Crimes1.json").map(x => x._2)
val namesJson = spark.read.json(jsonRDD.toDS())
namesJson.printSchema()
namesJson.createOrReplaceTempView("JSONdata")
val data = spark.sql("select * from JSONdata")
data.show()
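For comparison, a minimal sketch of a more direct read, assuming Spark 2.2 or later: the JSON reader's `multiLine` option parses each file as one JSON document, so the `wholeTextFiles` step is not required. The object `MultiLineJsonReader` and the helper `readMultiLine` are hypothetical names; the path is the one from the question. The result still has the single root column `Crimes`, i.e. the same nested shape described here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object MultiLineJsonReader {
  // With multiLine enabled, Spark parses each file as one JSON document
  // (here: a root object holding the "Crimes" array) instead of expecting
  // one JSON record per line, which is its default (JSON Lines) layout.
  def readMultiLine(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("multiLine", "true")
      .json(path)

  def main(args: Array[String]): Unit = {
    // One SparkSession replaces the separate SparkConf/SparkContext/SQLContext.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("demo")
      .getOrCreate()

    // Path copied from the question; adjust it to your environment.
    val df = readMultiLine(spark, "D:/FinalScripts/output/Crimes1.json")
    df.printSchema() // one top-level column, Crimes, an array of structs
    df.show(false)   // a single row holding the whole root object
    spark.stop()
  }
}
```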

How do I ignore the root element and get only the raw records?

How can I read the nested JSON into a DataFrame and print it to the console?

社区小助手 2018-12-21 13:52:20
1 answer
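  • 社区小助手 is an administrator of the Spark China community. I regularly post live-stream recaps and technical articles, and compile the Spark questions and answers raised in the DingTalk group.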

    Try this:

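    import spark.implicits._                 // spark is your SparkSession; enables the $"Crimes" syntax
    import org.apache.spark.sql.functions._  // provides explode
    ds.select(explode($"Crimes") as "exploded").select("exploded.*")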
    Here ds is the Dataset you created from the JSON records.
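A fuller sketch of this approach, assuming Spark 2.2+ and the question's file read with the `multiLine` option. It uses `col("Crimes")` instead of `$"Crimes"` so no implicits import is needed, and adds an equivalent Spark SQL query for code that, like the question, queries the `JSONdata` temp view. `FlattenCrimes`, `flatten`, and `flattenWithSql` are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object FlattenCrimes {
  // DataFrame API: explode emits one row per element of the Crimes array,
  // and "exploded.*" promotes the struct fields (ID, Case Number, ...) to columns.
  def flatten(root: DataFrame): DataFrame =
    root.select(explode(col("Crimes")).as("exploded"))
      .select("exploded.*")

  // The same flattening in Spark SQL; the view name JSONdata is taken
  // from the question.
  def flattenWithSql(spark: SparkSession, root: DataFrame): DataFrame = {
    root.createOrReplaceTempView("JSONdata")
    spark.sql(
      "SELECT exploded.* FROM (SELECT explode(Crimes) AS exploded FROM JSONdata) t")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
    val root = spark.read
      .option("multiLine", "true")
      .json("D:/FinalScripts/output/Crimes1.json") // path from the question
    val crimes = flatten(root)
    crimes.printSchema() // one column per crime field, no Crimes wrapper
    crimes.show(false)   // one row per crime
    spark.stop()
  }
}
```

Both functions return the same rows; which one to use mostly depends on whether the rest of the job is written against the DataFrame API or against SQL views.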

    Note that if your data is large, Spark has to hold the entire file in memory as a single record before it can flatten it.
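One way to observe why, sketched under the assumption of Spark 2.2+ and a `multiLine` read of the question's file: the root object comes back as a single row, and only `explode` turns it into one row per crime. `RootRecordCounts` and `counts` are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object RootRecordCounts {
  // Returns (rows before explode, rows after explode). A file holding a single
  // root object is parsed as one JSON record, so every crime in that file
  // lives inside one row until explode splits the Crimes array apart.
  def counts(root: DataFrame): (Long, Long) = {
    val before = root.count()
    val after = root.select(explode(col("Crimes"))).count()
    (before, after)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
    val root = spark.read
      .option("multiLine", "true")
      .json("D:/FinalScripts/output/Crimes1.json") // path from the question
    val (before, after) = counts(root)
    println(s"rows before explode: $before, after explode: $after")
    spark.stop()
  }
}
```

For the question's sample file, which holds four crimes, this should report 1 row before `explode` and 4 after.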

    2019-07-17 23:23:26