SparkSQL中開窗函數

  1. 開窗函數

注意:

row_number() 開窗函數是按照某個字段分組,然後取另一字段的前幾個的值,相當於 分組取topN

如果SQL語句裏面使用到了開窗函數,那麼這個SQL語句必須使用HiveContext來執行,HiveContext默認情況下在本地無法創建。

開窗函數格式:

row_number() over (partitin by XXX order by XXX)

Java代碼:

   SparkConf conf = new SparkConf();
   conf.setAppName("windowfun");
   JavaSparkContext sc = new JavaSparkContext(conf);
   HiveContext hiveContext = new HiveContext(sc);
   hiveContext.sql("use spark");
   hiveContext.sql("drop table if exists sales");
   hiveContext.sql("create table if not exists sales (riqi string,leibie string,jine Int) "
      + "row format delimited fields terminated by '\t'");
   hiveContext.sql("load data local inpath '/root/test/sales' into table sales");
   /**
    * 開窗函數格式:
    * 【 rou_number() over (partitin by XXX order by XXX) 】
    */
   DataFrame result = hiveContext.sql("select riqi,leibie,jine "
         	+ "from ("
            + "select riqi,leibie,jine,"
            + "row_number() over (partition by leibie order by jine desc) rank "
            + "from sales) t "
         + "where t.rank<=3");
   result.show();
   sc.stop();

Scala代碼:

 val conf = new SparkConf()
 conf.setAppName("windowfun")
 val sc = new SparkContext(conf)
 val hiveContext = new HiveContext(sc)
 hiveContext.sql("use spark");
 hiveContext.sql("drop table if exists sales");
 hiveContext.sql("create table if not exists sales (riqi string,leibie string,jine Int) "
  + "row format delimited fields terminated by '\t'");
 hiveContext.sql("load data local inpath '/root/test/sales' into table sales");
 /**
  * 開窗函數格式:
  * 【 rou_number() over (partitin by XXX order by XXX) 】
  */
 val result = hiveContext.sql("select riqi,leibie,jine "
   	+ "from ("
    + "select riqi,leibie,jine,"
    + "row_number() over (partition by leibie order by jine desc) rank "
    + "from sales) t "
   + "where t.rank<=3");
 result.show();
 sc.stop()

 

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章