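In Scala, "==" calls the equals method, which compares values for most types (strings, case classes, collections); checking whether two references point to the same object is done with "eq". The definition of "===" depends on the context/object. In Spark, "===" is defined on Column and is an equality test, equivalent to the equalTo method.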
== returns a boolean
=== returns a column (which contains the result of the comparisons of the elements of two columns)
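The difference can be sketched as follows. This is only an illustration, assuming Spark 2.x or later with spark-sql on the classpath; the object name EqualityDemo, the helper sameValueRows and the column names "left" and "right" are made up for the example.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object EqualityDemo {
  // === builds a Column expression that Spark evaluates row by row.
  def sameValueRows(df: DataFrame): Long = {
    val condition: Column = col("left") === col("right")
    df.filter(condition).count()
  }

  def main(args: Array[String]): Unit = {
    // Plain Scala: == compares values, eq compares references.
    val s1 = new String("spark")
    val s2 = new String("spark")
    println(s1 == s2) // true
    println(s1 eq s2) // false

    // Local session for illustration only.
    val spark = SparkSession.builder().master("local[1]").appName("equality-demo").getOrCreate()
    import spark.implicits._
    val df = Seq(("a", "a"), ("a", "b")).toDF("left", "right")

    // == on two Column objects is evaluated immediately on the driver and gives
    // a Boolean about the Column objects themselves, not about the rows.
    val driverSide: Boolean = col("left") == col("right")
    println(driverSide)

    println(sameValueRows(df)) // 1
    spark.stop()
  }
}
```

Because filter expects a Column, passing the result of == would not even compile here, which is usually the first hint that === was intended.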
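Quick sample: a SchemaRDD is an RDD[org.apache.spark.sql.Row], so you extract a field from each record with rec.getString(column index), where the index is zero-based. The line below keeps the records whose field 5 is null or empty and returns field 1 of each.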
.filter(rec => rec.getString(5) == null || rec.getString(5) == "").map(rec => rec.getString(1))
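A small self-contained sketch of the same filter, run on a local Seq[Row] instead of a SchemaRDD so it needs no cluster (spark-sql must still be on the classpath for the Row class). The sample records, the object name BlankFieldDemo and the helper isBlank are hypothetical; isNullAt is used here as an alternative to comparing against null.

```scala
import org.apache.spark.sql.Row

object BlankFieldDemo {
  // True when the string field at index i is null or empty.
  def isBlank(rec: Row, i: Int): Boolean =
    rec.isNullAt(i) || rec.getString(i).isEmpty

  def main(args: Array[String]): Unit = {
    // Hypothetical six-column records; field 5 is the one being checked.
    val rows = Seq(
      Row("1", "alice", "x", "x", "x", ""),
      Row("2", "bob", "x", "x", "x", null),
      Row("3", "carol", "x", "x", "x", "NY")
    )

    // Same shape as the filter/map chain on the SchemaRDD.
    val names = rows.filter(rec => isBlank(rec, 5)).map(rec => rec.getString(1))
    println(names) // List(alice, bob)
  }
}
```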
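Merging the part files into a single file: Spark writes one part file per partition. You can either merge them afterwards with getmerge, as in the command below, or call .repartition(1) or .coalesce(1) before saving so that only one part file is written.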
hadoop fs -getmerge "HDFS_LOCATION_PART_FILES_LIST" "NFS_LOCATION_SINGLE_PART_FILE"
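The coalesce(1) route can be sketched like this. It is a minimal example, assuming Spark 2.x or later running in local mode and writing to a temporary directory on the local filesystem; the object name SinglePartDemo, the helper writeSinglePart and the output path are hypothetical.

```scala
import java.io.File
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object SinglePartDemo {
  // Saves the numbers 1..100 under outputDir and returns how many part files were written.
  def writeSinglePart(spark: SparkSession, outputDir: String): Int = {
    val rdd = spark.sparkContext.parallelize(1 to 100, 4) // starts with 4 partitions
    // coalesce(1) collapses the data into one partition, so one part file is saved.
    rdd.coalesce(1).saveAsTextFile(outputDir)
    new File(outputDir).listFiles().count(_.getName.startsWith("part-"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("single-part").getOrCreate()
    // saveAsTextFile requires that the target directory does not exist yet.
    val outputDir = Files.createTempDirectory("single-part").resolve("out").toString
    println(writeSinglePart(spark, outputDir)) // 1
    spark.stop()
  }
}
```

Note that coalesce(1) pushes all the data through a single task, so it suits small outputs; for large outputs, writing many part files and merging them with getmerge may be the better fit.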