- File Formats
- Text File
sc.textFile: loads a text file as an RDD with one element per line
sc.wholeTextFiles: loads every file under the specified directory as (filename, entire content) pairs
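A minimal sketch contrasting the two calls, assuming Spark is on the classpath and a local master is acceptable. The object name, app name, and the temporary files it writes are hypothetical and exist only for illustration.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object TextFileExample {
  // Returns (total lines seen by textFile, line count per file name from wholeTextFiles).
  def run(): (Long, Map[String, Int]) = {
    // Hypothetical input: two small files in a temporary directory.
    val dir = Files.createTempDirectory("text-demo")
    Files.write(dir.resolve("a.txt"), "one\ntwo\n".getBytes(UTF_8))
    Files.write(dir.resolve("b.txt"), "three\n".getBytes(UTF_8))

    val conf = new SparkConf().setMaster("local[*]").setAppName("TextFileExample")
    val sc = new SparkContext(conf)
    try {
      // textFile also accepts a directory; every element is a single line.
      val lineCount = sc.textFile(dir.toString).count()

      // wholeTextFiles yields one (path, content) pair per file.
      val perFile = sc.wholeTextFiles(dir.toString)
        .map { case (path, content) => (path.split("/").last, content.split("\n").length) }
        .collect()
        .toMap

      (lineCount, perFile)
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

wholeTextFiles keeps each file in one record, so it suits many small files where the file name matters; large files are usually better read line by line with textFile.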
- JSON
- CSV
Parse each line with a third-party tool such as opencsv (reader.readNext();)
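A sketch of per-line parsing with opencsv, assuming the `com.opencsv` artifact is on the classpath (older releases use the package `au.com.bytecode.opencsv` instead). The object name `CsvLineParser` and the sample record are hypothetical.

```scala
import java.io.StringReader
import com.opencsv.CSVReader

object CsvLineParser {
  // Parses one CSV line into its fields; intended for use inside rdd.map(...).
  def parseLine(line: String): Array[String] = {
    val reader = new CSVReader(new StringReader(line))
    try reader.readNext() finally reader.close()
  }

  def main(args: Array[String]): Unit = {
    // A quoted field may contain a comma, which a plain split(",") would break.
    val fields = parseLine("panda,\"Holden, K.\",3")
    println(fields.mkString(" | "))
  }
}
```

With an RDD of lines from sc.textFile, the same function could be applied as `lines.map(CsvLineParser.parseLine)`. This only works when no record spans multiple lines; files with embedded newlines would need to be read whole, for example with sc.wholeTextFiles.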
- Sequence File
sc.sequenceFile(inFile, classOf[Text], classOf[IntWritable])
.map { case (x, y) => (x.toString, y.get()) }
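A round-trip sketch that writes a small sequence file and reads it back with the call above, assuming Spark and the Hadoop client libraries are on the classpath. The object name, app name, sample pairs, and temporary output path are hypothetical.

```scala
import java.nio.file.Files
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.spark.{SparkConf, SparkContext}

object SequenceFileExample {
  def run(): Map[String, Int] = {
    // Output path inside a fresh temp directory; the path itself must not exist yet.
    val out = Files.createTempDirectory("seq-demo").resolve("pandas").toString

    val conf = new SparkConf().setMaster("local[*]").setAppName("SequenceFileExample")
    val sc = new SparkContext(conf)
    try {
      // String/Int pairs are converted to Text/IntWritable implicitly on save.
      sc.parallelize(Seq(("panda", 3), ("kay", 6), ("snail", 2))).saveAsSequenceFile(out)

      // Convert Writables to native types immediately after reading,
      // because Hadoop record readers reuse the same Writable instances.
      sc.sequenceFile(out, classOf[Text], classOf[IntWritable])
        .map { case (k, v) => (k.toString, v.get()) }
        .collect()
        .toMap
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = println(run())
}
```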
- Object File
- Hadoop InputFormat and OutputFormat
// old Hadoop API
val input = sc.hadoopFile[Text, Text, KeyValueTextInputFormat](inputFile).map {
case (x, y) => (x.toString, y.toString)
}
// new Hadoop API
val input = sc.newAPIHadoopFile(inputFile, classOf[LzoJsonInputFormat],
classOf[LongWritable], classOf[MapWritable], conf)
- Others
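hadoopDataset/saveAsHadoopDataset: read from and write to non-filesystem data sources through a Hadoop JobConf (in the Scala API the read side is sc.hadoopRDD)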