Data Reading
hadoopFile
Parameters:
- path – path to Hadoop file
- inputFormatClass – fully qualified classname of Hadoop InputFormat (e.g. "org.apache.hadoop.mapred.TextInputFormat")
- keyClass – fully qualified classname of key Writable class (e.g. "org.apache.hadoop.io.Text")
- valueClass – fully qualified classname of value Writable class (e.g. "org.apache.hadoop.io.LongWritable")
- keyConverter – (None by default)
- valueConverter – (None by default)
- conf – Hadoop configuration, passed in as a dict (None by default)
- batchSize – The number of Python objects represented as a single Java object. (default 0, choose batchSize automatically)
# hadoopFile: returns key-value pairs; the key is the byte offset of each line, the value is the line's content
# log.txt:
# http://www.baidu.com
# http://www.google.com
# http://www.google.com
# ... ... ...
rdd = sc.hadoopFile("hdfs://centos03:9000/datas/log.txt",
                    inputFormatClass="org.apache.hadoop.mapred.TextInputFormat",
                    keyClass="org.apache.hadoop.io.LongWritable",
                    valueClass="org.apache.hadoop.io.Text")
print(rdd.collect()) #1
rdd1 = rdd.map(lambda x: x[1].split(":"))
print(rdd1.collect()) #2
#1
[(0, 'http://www.baidu.com'), (22, 'http://www.google.com'), (45, 'http://www.google.com'), (68, 'http://cn.bing.com'), (88, 'http://cn.bing.com'), (108, 'http://www.baidu.com'), (130, 'http://www.sohu.com'), (151, 'http://www.sina.com'), (172, 'http://www.sin2a.com'), (194, 'http://www.sin2desa.com'), (219, 'http://www.sindsafa.com')]
#2
[['http', '//www.baidu.com'], ['http', '//www.google.com'], ['http', '//www.google.com'], ['http', '//cn.bing.com'], ['http', '//cn.bing.com'], ['http', '//www.baidu.com'], ['http', '//www.sohu.com'], ['http', '//www.sina.com'], ['http', '//www.sin2a.com'], ['http', '//www.sin2desa.com'], ['http', '//www.sindsafa.com']]
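Splitting on ":" only separates the scheme from everything after it. For more robust URL parsing one could use the standard-library urllib.parse instead (an illustrative sketch, not part of the original code; in Spark this logic would go inside rdd.map):

```python
from urllib.parse import urlparse

# Parse scheme and host out of each URL instead of a raw split(":").
urls = ["http://www.baidu.com", "http://www.google.com"]
parsed = [(urlparse(u).scheme, urlparse(u).netloc) for u in urls]
print(parsed)  # [('http', 'www.baidu.com'), ('http', 'www.google.com')]
```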
newAPIHadoopFile
Parameters:
- path – path to Hadoop file
- inputFormatClass – fully qualified classname of Hadoop InputFormat (e.g. "org.apache.hadoop.mapreduce.lib.input.TextInputFormat")
- keyClass – fully qualified classname of key Writable class (e.g. "org.apache.hadoop.io.Text")
- valueClass – fully qualified classname of value Writable class (e.g. "org.apache.hadoop.io.LongWritable")
- keyConverter – (None by default)
- valueConverter – (None by default)
- conf – Hadoop configuration, passed in as a dict (None by default)
- batchSize – The number of Python objects represented as a single Java object. (default 0, choose batchSize automatically)
# newAPIHadoopFile: returns key-value pairs; the key is the byte offset of each line, the value is the line's content
# inputFormatClass differs from the old API
rdd = sc.newAPIHadoopFile("hdfs://centos03:9000/datas/log.txt",
                          inputFormatClass="org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
                          keyClass="org.apache.hadoop.io.LongWritable",
                          valueClass="org.apache.hadoop.io.Text")
print(rdd.collect()) #1
rdd1 = rdd.map(lambda x: x[1].split(":"))
print(rdd1.collect()) #2
#1
[(0, 'http://www.baidu.com'), (22, 'http://www.google.com'), (45, 'http://www.google.com'), (68, 'http://cn.bing.com'), (88, 'http://cn.bing.com'), (108, 'http://www.baidu.com'), (130, 'http://www.sohu.com'), (151, 'http://www.sina.com'), (172, 'http://www.sin2a.com'), (194, 'http://www.sin2desa.com'), (219, 'http://www.sindsafa.com')]
#2
[['http', '//www.baidu.com'], ['http', '//www.google.com'], ['http', '//www.google.com'], ['http', '//cn.bing.com'], ['http', '//cn.bing.com'], ['http', '//www.baidu.com'], ['http', '//www.sohu.com'], ['http', '//www.sina.com'], ['http', '//www.sin2a.com'], ['http', '//www.sin2desa.com'], ['http', '//www.sindsafa.com']]
hadoopRDD
Parameters:
- inputFormatClass – fully qualified classname of Hadoop InputFormat (e.g. "org.apache.hadoop.mapred.TextInputFormat")
- keyClass – fully qualified classname of key Writable class (e.g. "org.apache.hadoop.io.Text")
- valueClass – fully qualified classname of value Writable class (e.g. "org.apache.hadoop.io.LongWritable")
- keyConverter – (None by default)
- valueConverter – (None by default)
- conf – Hadoop configuration, passed in as a dict (None by default)
- batchSize – The number of Python objects represented as a single Java object. (default 0, choose batchSize automatically)
confs = {"mapred.input.dir": "hdfs://centos03:9000/datas/log.txt"}
rdd = sc.hadoopRDD(inputFormatClass="org.apache.hadoop.mapred.TextInputFormat",
                   keyClass="org.apache.hadoop.io.LongWritable",
                   valueClass="org.apache.hadoop.io.Text",
                   conf=confs)
print(rdd.collect()) #1
#1
[(0, 'http://www.baidu.com'), (22, 'http://www.google.com'), (45, 'http://www.google.com'), (68, 'http://cn.bing.com'), (88, 'http://cn.bing.com'), (108, 'http://www.baidu.com'), (130, 'http://www.sohu.com'), (151, 'http://www.sina.com'), (172, 'http://www.sin2a.com'), (194, 'http://www.sin2desa.com'), (219, 'http://www.sindsafa.com')]
newAPIHadoopRDD
Parameters:
- inputFormatClass – fully qualified classname of Hadoop InputFormat (e.g. "org.apache.hadoop.mapreduce.lib.input.TextInputFormat")
- keyClass – fully qualified classname of key Writable class (e.g. "org.apache.hadoop.io.Text")
- valueClass – fully qualified classname of value Writable class (e.g. "org.apache.hadoop.io.LongWritable")
- keyConverter – (None by default)
- valueConverter – (None by default)
- conf – Hadoop configuration, passed in as a dict (None by default)
- batchSize – The number of Python objects represented as a single Java object. (default 0, choose batchSize automatically)
confs = {"mapreduce.input.fileinputformat.inputdir":"hdfs://centos03:9000/datas/log.txt"}
rdd = sc.newAPIHadoopRDD(
    inputFormatClass="org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
    keyClass="org.apache.hadoop.io.LongWritable",
    valueClass="org.apache.hadoop.io.Text",
    conf=confs)
print(rdd.collect()) #1
#1
[(0, 'http://www.baidu.com'), (22, 'http://www.google.com'), (45, 'http://www.google.com'), (68, 'http://cn.bing.com'), (88, 'http://cn.bing.com'), (108, 'http://www.baidu.com'), (130, 'http://www.sohu.com'), (151, 'http://www.sina.com'), (172, 'http://www.sin2a.com'), (194, 'http://www.sin2desa.com'), (219, 'http://www.sindsafa.com')]
pickleFile
Parameter:
- name – path of the data to load
- minPartitions=None
Reads an RDD previously saved with saveAsPickleFile.
# pickleFile reads data saved with saveAsPickleFile; the data comes back in the same form as it was saved
rdd = sc.newAPIHadoopFile("hdfs://centos03:9000/datas/log.txt",
                          inputFormatClass="org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
                          keyClass="org.apache.hadoop.io.LongWritable",
                          valueClass="org.apache.hadoop.io.Text")
print(rdd.collect()) #1
rdd1 = rdd.map(lambda x: x[1].split(":")).map(lambda x: (x[0], x[1]))
print(rdd1.collect()) #2
rdd1.saveAsPickleFile("hdfs://centos03:9000/datas/logp.txt")
print(sc.pickleFile("hdfs://centos03:9000/datas/logp.txt").collect()) #3
#1
[(0, 'http://www.baidu.com'), (22, 'http://www.google.com'), (45, 'http://www.google.com'), (68, 'http://cn.bing.com'), (88, 'http://cn.bing.com'), (108, 'http://www.baidu.com'), (130, 'http://www.sohu.com'), (151, 'http://www.sina.com'), (172, 'http://www.sin2a.com'), (194, 'http://www.sin2desa.com'), (219, 'http://www.sindsafa.com')]
#2
[('http', '//www.baidu.com'), ('http', '//www.google.com'), ('http', '//www.google.com'), ('http', '//cn.bing.com'), ('http', '//cn.bing.com'), ('http', '//www.baidu.com'), ('http', '//www.sohu.com'), ('http', '//www.sina.com'), ('http', '//www.sin2a.com'), ('http', '//www.sin2desa.com'), ('http', '//www.sindsafa.com')]
#3
[('http', '//www.baidu.com'), ('http', '//www.google.com'), ('http', '//www.google.com'), ('http', '//cn.bing.com'), ('http', '//cn.bing.com'), ('http', '//www.baidu.com'), ('http', '//www.sohu.com'), ('http', '//www.sina.com'), ('http', '//www.sin2a.com'), ('http', '//www.sin2desa.com'), ('http', '//www.sindsafa.com')]
sequenceFile
Parameters:
- path – path to sequence file
- keyClass – fully qualified classname of key Writable class (e.g. "org.apache.hadoop.io.Text")
- valueClass – fully qualified classname of value Writable class (e.g. "org.apache.hadoop.io.LongWritable")
- keyConverter –
- valueConverter –
- minSplits – minimum splits in dataset (default min(2, sc.defaultParallelism))
- batchSize – The number of Python objects represented as a single Java object. (default 0, choose batchSize automatically)
# Reads a Hadoop SequenceFile; keyClass and valueClass can be omitted
rdd = sc.sequenceFile(path="hdfs://centos03:9000/datas/seqFile",
                      keyClass="org.apache.hadoop.io.LongWritable",
                      valueClass="org.apache.hadoop.io.Text")
print(rdd.collect()) #1
#1
[('Pandas', 3), ('Key', 6), ('Sanil', 2)]
textFile
Parameter:
- name – file path
- minPartitions=None
- use_unicode=True
# textFile: with use_unicode=False, strings are of type str (bytes), which is faster and smaller than unicode
rdd = sc.textFile(name="hdfs://centos03:9000/datas/log.txt")
print(rdd.collect()) #1
#1
['http://www.baidu.com', 'http://www.google.com', 'http://www.google.com', 'http://cn.bing.com', 'http://cn.bing.com', 'http://www.baidu.com', 'http://www.sohu.com', 'http://www.sina.com', 'http://www.sin2a.com', 'http://www.sin2desa.com', 'http://www.sindsafa.com']
wholeTextFiles
Reads a directory of files from HDFS, the local file system, or any other Hadoop-supported file system. Each file is read as a single record and returned as a key-value pair, where the key is the file's path and the value is the file's content.
Parameters:
- path
- minPartitions=None
- use_unicode=True
# wholeTextFiles is best suited to scenarios with many small files
rdd = sc.wholeTextFiles(path="hdfs://centos03:9000/table")
print(rdd.collect()) #1
rdd1 = rdd.map(lambda x: x[1].split("\t"))
print(rdd1.collect()) #2
#1
[('hdfs://centos03:9000/table/order.txt', '1001\t01\t1\r\n1002\t02\t2\r\n1003\t03\t3\r\n1004\t01\t4\r\n1005\t02\t5\r\n1006\t03\t6'), ('hdfs://centos03:9000/table/pd.txt', '01\t小米\r\n02\t華爲\r\n03\t格力\r\n')]
#2
[['1001', '01', '1\r\n1002', '02', '2\r\n1003', '03', '3\r\n1004', '01', '4\r\n1005', '02', '5\r\n1006', '03', '6'], ['01', '小米\r\n02', '華爲\r\n03', '格力\r\n']]
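Note that split("\t") above cuts across line boundaries, producing fields like '1\r\n1002'. A cleaner parse (shown here as a plain-Python sketch; in Spark it would be roughly rdd.flatMap(lambda kv: kv[1].splitlines()).map(lambda line: line.split("\t"))) splits each file's content into lines first:

```python
# A (path, content) pair like those returned by wholeTextFiles (truncated sample).
path = "hdfs://centos03:9000/table/order.txt"
content = "1001\t01\t1\r\n1002\t02\t2\r\n1003\t03\t3"

# splitlines() handles the \r\n line endings, then each line splits on tabs.
rows = [line.split("\t") for line in content.splitlines()]
print(rows)  # [['1001', '01', '1'], ['1002', '02', '2'], ['1003', '03', '3']]
```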
Data Saving
saveAsHadoopFile
Output a Python RDD of key-value pairs(of form RDD[(K, V)])
Parameters:
- path – path to Hadoop file
- outputFormatClass – fully qualified classname of Hadoop OutputFormat (e.g. "org.apache.hadoop.mapred.SequenceFileOutputFormat")
- keyClass – fully qualified classname of key Writable class (e.g. "org.apache.hadoop.io.IntWritable", None by default)
- valueClass – fully qualified classname of value Writable class (e.g. "org.apache.hadoop.io.Text", None by default)
- keyConverter – (None by default)
- valueConverter – (None by default)
- conf – (None by default)
- compressionCodecClass – (None by default)
# saveAsHadoopFile
rdd = sc.parallelize([('good', 1), ("spark", 4), ("beats", 3)])
print(rdd.collect())
rdd.saveAsHadoopFile(
    path="hdfs://centos03:9000/datas/rdd_seq",
    outputFormatClass="org.apache.hadoop.mapred.SequenceFileOutputFormat")
print(sc.sequenceFile("hdfs://centos03:9000/datas/rdd_seq").collect()) #1
#1
[('good', 1), ('spark', 4), ('beats', 3)]
Or:
# saveAsHadoopFile
rdd = sc.parallelize([('good', 1), ("spark", 4), ("beats", 3)])
print(rdd.collect())
rdd.saveAsHadoopFile(
    path="hdfs://centos03:9000/datas/rdd_seq",
    outputFormatClass="org.apache.hadoop.mapred.TextOutputFormat")
rdd1 = sc.hadoopFile(
    "hdfs://centos03:9000/datas/rdd_seq",
    inputFormatClass="org.apache.hadoop.mapred.TextInputFormat",
    keyClass="org.apache.hadoop.io.IntWritable",
    valueClass="org.apache.hadoop.io.Text")
print(rdd1.collect()) #1
#1
[(0, 'good\t1'), (0, 'spark\t4'), (0, 'beats\t3')]
Judging from the two snippets above, saving the data in serialized (SequenceFile) form is preferable. However, when the data is sc.parallelize([{'good': 1}, {'spark': 4}, {'beats': 3}]), the save fails with org.apache.spark.SparkException: RDD element of type java.util.HashMap cannot be used, and it still fails even after applying json.dumps to the elements (org.apache.spark.SparkException: RDD element of type java.lang.String cannot be used). A note found online explains: "To use String and Map objects you will need to use the more extensive native support available in Scala and Java." In fact, the official API docs also state that the output must be a Python RDD of key-value pairs.
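Since these save methods require an RDD of (key, value) pairs, one workaround (an illustrative sketch, not from the original) is to flatten each dict into tuples before saving, e.g. rdd.flatMap(lambda d: d.items()). The reshaping itself can be shown without Spark:

```python
# Elements that fail to save: plain dicts rather than (key, value) tuples.
data = [{'good': 1}, {'spark': 4}, {'beats': 3}]

# Equivalent of rdd.flatMap(lambda d: d.items()): turn each dict into pairs.
pairs = [pair for d in data for pair in d.items()]
print(pairs)  # [('good', 1), ('spark', 4), ('beats', 3)]
```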
saveAsNewAPIHadoopFile
Output a Python RDD of key-value pairs(of form RDD[(K, V)])
Parameters:
- path – path to Hadoop file
- outputFormatClass – fully qualified classname of Hadoop OutputFormat (e.g. "org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat")
- keyClass – fully qualified classname of key Writable class (e.g. "org.apache.hadoop.io.IntWritable", None by default)
- valueClass – fully qualified classname of value Writable class (e.g. "org.apache.hadoop.io.Text", None by default)
- keyConverter – (None by default)
- valueConverter – (None by default)
- conf – Hadoop job configuration, passed in as a dict (None by default)
# saveAsNewAPIHadoopFile
rdd = sc.parallelize([('good', 1), ("spark", 4), ("beats", 3)])
print(rdd.collect())
rdd.saveAsNewAPIHadoopFile(
    path="hdfs://centos03:9000/datas/rdd_seq",
    outputFormatClass="org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat")
print(sc.sequenceFile("hdfs://centos03:9000/datas/rdd_seq").collect()) #1
#1
[('good', 1), ('spark', 4), ('beats', 3)]
# saveAsNewAPIHadoopFile
rdd = sc.parallelize([('good', 1), ("spark", 4), ("beats", 3)])
print(rdd.collect())
rdd.saveAsNewAPIHadoopFile(
    path="hdfs://centos03:9000/datas/rdd_seq",
    outputFormatClass="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat")
rdd2 = sc.hadoopFile(
    "hdfs://centos03:9000/datas/rdd_seq",
    inputFormatClass="org.apache.hadoop.mapred.TextInputFormat",
    keyClass="org.apache.hadoop.io.IntWritable",
    valueClass="org.apache.hadoop.io.Text")
print(rdd2.collect()) #1
#1
[(0, 'good\t1'), (0, 'spark\t4'), (0, 'beats\t3')]
What if we change the shape of the stored data:
rdd = sc.parallelize([(1, {'good': 1}), (2, {'spark': 4}), (3, {'beats': 3})])
print(rdd.collect())
rdd.saveAsNewAPIHadoopFile(
    path="hdfs://centos03:9000/datas/rdd_seq",
    outputFormatClass="org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat")
print(sc.sequenceFile("hdfs://centos03:9000/datas/rdd_seq").collect()) #1
#1
[(1, {'good': 1}), (2, {'spark': 4}), (3, {'beats': 3})]
rdd = sc.parallelize([(1, {'good': 1}), (2, {'spark': 4}), (3, {'beats': 3})])
print(rdd.collect())
rdd.saveAsNewAPIHadoopFile(
    path="hdfs://centos03:9000/datas/rdd_seq",
    outputFormatClass="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat")
rdd2 = sc.hadoopFile(
    "hdfs://centos03:9000/datas/rdd_seq",
    inputFormatClass="org.apache.hadoop.mapred.TextInputFormat",
    keyClass="org.apache.hadoop.io.IntWritable",
    valueClass="org.apache.hadoop.io.Text")
print(rdd2.collect()) #1
#1
[(0, '1\torg.apache.hadoop.io.MapWritable@3e9840'), (0, '2\torg.apache.hadoop.io.MapWritable@83dcb79'), (0, '3\torg.apache.hadoop.io.MapWritable@7493c20')]
The code above shows that saving data in serialized (SequenceFile) form is still preferable, since it preserves the structure of the original data.
saveAsHadoopDataset
Output a Python RDD of key-value pairs (of form RDD[(K, V)])
Parameters:
- conf – Hadoop job configuration, passed in as a dict
- keyConverter – (None by default)
- valueConverter – (None by default)
# saveAsHadoopDataset
confs = {"outputFormatClass": "org.apache.hadoop.mapred.TextOutputFormat",
         "keyClass": "org.apache.hadoop.io.LongWritable",
         "valueClass": "org.apache.hadoop.io.Text",
         "mapred.output.dir": "hdfs://centos03:9000/datas/rdd"}
rdd = sc.parallelize([('good', 1), ("spark", 4), ("beats", 3)])
rdd.saveAsHadoopDataset(conf=confs)  # job parameters are passed via conf
rdd2 = sc.hadoopFile("hdfs://centos03:9000/datas/rdd",
                     inputFormatClass="org.apache.hadoop.mapred.TextInputFormat",
                     keyClass="org.apache.hadoop.io.LongWritable",
                     valueClass="org.apache.hadoop.io.Text")
print(rdd2.collect()) #1
#1
[(0, 'good\t1'), (0, 'spark\t4'), (0, 'beats\t3')]
# saveAsHadoopDataset
confs = {"outputFormatClass": "org.apache.hadoop.mapred.SequenceFileOutputFormat",
         "keyClass": "org.apache.hadoop.io.LongWritable",
         "valueClass": "org.apache.hadoop.io.Text",
         "mapred.output.dir": "hdfs://centos03:9000/datas/rdd"}
rdd = sc.parallelize([('good', 1), ("spark", 4), ("beats", 3)])
rdd.saveAsHadoopDataset(conf=confs)
# The result can be read back with textFile. Note: "outputFormatClass" is not a
# standard Hadoop configuration key (that would be "mapred.output.format.class"),
# so the job most likely fell back to the default TextOutputFormat, which is why
# the output below is plain text rather than a binary SequenceFile.
rdd2 = sc.textFile("hdfs://centos03:9000/datas/rdd")
print(rdd2.collect()) #1
#1
['good\t1', 'spark\t4', 'beats\t3']
saveAsNewAPIHadoopDataset
Output a Python RDD of key-value pairs (of form RDD[(K, V)])
Parameters:
- conf – Hadoop job configuration, passed in as a dict
- keyConverter – (None by default)
- valueConverter – (None by default)
# saveAsNewAPIHadoopDataset
confs = {"outputFormatClass": "org.apache.hadoop.mapreduce.lib.output.TextOutputFormat",
         "keyClass": "org.apache.hadoop.io.LongWritable",
         "valueClass": "org.apache.hadoop.io.Text",
         "mapreduce.output.fileoutputformat.outputdir": "hdfs://centos03:9000/datas/rdd"}
rdd = sc.parallelize([('good', 1), ("spark", 4), ("beats", 3)])
rdd.saveAsNewAPIHadoopDataset(conf=confs)
rdd1 = sc.newAPIHadoopFile(path="hdfs://centos03:9000/datas/rdd",
                           inputFormatClass="org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
                           keyClass="org.apache.hadoop.io.LongWritable",
                           valueClass="org.apache.hadoop.io.Text")
print(rdd1.collect()) #1
rdd2 = sc.textFile("hdfs://centos03:9000/datas/rdd")
print(rdd2.collect()) #2
#1
[(0, 'good\t1'), (0, 'spark\t4'), (0, 'beats\t3')]
#2
['good\t1', 'spark\t4', 'beats\t3']
saveAsPickleFile
Save this RDD as a SequenceFile of serialized objects. The serializer used is pyspark.serializers.PickleSerializer; the default batch size is 10.
Parameters:
- path
- batchSize=10
# saveAsPickleFile
rdd = sc.parallelize([('good', 1), ("spark", 4), ("beats", 3)])
rdd.saveAsPickleFile("hdfs://centos03:9000/datas/rdd")
rdd1 = sc.pickleFile("hdfs://centos03:9000/datas/rdd")
print(rdd1.collect()) #1
#1
[('good', 1), ('spark', 4), ('beats', 3)]
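saveAsPickleFile preserves arbitrary Python structure because each batch of elements is serialized with pickle. A rough plain-Python analogue of the round trip (a sketch for illustration, not the internal Spark code path):

```python
import pickle

# Nested structures survive because pickle serializes Python objects directly.
data = [('good', 1), (2, {'spark': 4})]
blob = pickle.dumps(data)        # roughly what one saved batch looks like
restored = pickle.loads(blob)
print(restored)  # [('good', 1), (2, {'spark': 4})]
```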
saveAsSequenceFile
Output a Python RDD of key-value pairs (of form RDD[(K, V)])
Internally this takes three steps: 1. the pickled Python RDD is converted to a Java RDD; 2. keys and values are converted to Writables; 3. the result is written out.
Parameters:
- path – path to sequence file
- compressionCodecClass – (None by default)
# saveAsSequenceFile
rdd = sc.parallelize([('good', 1), ("spark", 4), ("beats", 3)])
rdd.saveAsSequenceFile("hdfs://centos03:9000/datas/rdd")
rdd1 = sc.sequenceFile("hdfs://centos03:9000/datas/rdd")
print(rdd1.collect()) #1
rdd2 = sc.textFile("hdfs://centos03:9000/datas/rdd")
print(rdd2.collect()) #2
#1
[('good', 1), ('spark', 4), ('beats', 3)]
#2
['SEQ\x06\x19org.apache.hadoop.io.Text org.apache.hadoop.io.IntWritable\x00\x00\x00\x00\x00\x00�ekpR2\x08�
U��Yn$’, 'SEQ\x06\x19org.apache.hadoop.io.Text org.apache.hadoop.io.IntWritable\x00\x00\x00\x00\x00\x00�4��E�}βZ;�v\x1f\t\x00\x00\x00\t\x00\x00\x00\x05\x04good\x00\x00\x00\x01', 'SEQ\x06\x19org.apache.hadoop.io.Text org.apache.hadoop.io.IntWritable\x00\x00\x00\x00\x00\x00\x14��˹\x02oM�g��f�\x02v\x00\x00\x00', '\x00\x00\x00\x06\x05spark\x00\x00\x00\x04', 'SEQ\x06\x19org.apache.hadoop.io.Text org.apache.hadoop.io.IntWritable\x00\x00\x00\x00\x00\x00F\x0b��\x04lD\x116+\x16n��d�\x00\x00\x00', '\x00\x00\x00\x06\x05beats\x00\x00\x00\x03']
saveAsTextFile
# saveAsTextFile
rdd = sc.parallelize([('good', 1), ("spark", 4), ("beats", 3)])
rdd.saveAsTextFile("hdfs://centos03:9000/datas/rdd")
rdd2 = sc.textFile("hdfs://centos03:9000/datas/rdd")
print(rdd2.collect()) #1
#1
["('good', 1)", "('spark', 4)", "('beats', 3)"]
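saveAsTextFile writes each element's string representation, so the tuple structure is lost on disk. One way to recover it when reading back (an illustrative sketch; in Spark this would be rdd2.map(ast.literal_eval)) is the standard-library ast.literal_eval:

```python
import ast

# Lines as read back by textFile: the repr of the original tuples.
lines = ["('good', 1)", "('spark', 4)", "('beats', 3)"]

# literal_eval safely parses Python literal syntax back into objects.
pairs = [ast.literal_eval(line) for line in lines]
print(pairs)  # [('good', 1), ('spark', 4), ('beats', 3)]
```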