Tuesday, July 18, 2017

Difference between == and === in Scala and Spark


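In Scala, "==" calls the equals method after a null check, so it compares values (for example, the contents of two strings). To check whether two references point to the same object, use "eq" instead. The meaning of "===" depends on the library that defines it. In Spark, "===" on a Column is equivalent to the equalTo method.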
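A minimal plain-Scala sketch of the difference between == and eq (no Spark needed; the object and method names are made up for illustration):

```scala
// Minimal sketch in plain Scala (no Spark): == compares values via equals
// and is null-safe, while eq compares object references.
object EqualityDemo {
  // Returns (value equality, reference equality) for two strings.
  def compare(x: String, y: String): (Boolean, Boolean) = (x == y, x eq y)

  def main(args: Array[String]): Unit = {
    val a = new String("spark") // a distinct String object
    val b = new String("spark") // another object with the same contents

    println(compare(a, b))     // (true,false): equal contents, different objects
    println(compare(a, a))     // (true,true): same reference
    println(compare(null, "")) // (false,false): == does not throw on a null left operand
  }
}
```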

== returns a Boolean
=== returns a Column: an expression that Spark evaluates later, comparing the two operands row by row (the result is null if either side is null)
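Why can't Spark just use ==? In Scala, == is declared final on Any and always returns a Boolean, so a library that wants a comparison to produce a deferred expression has to choose another operator name. The toy DSL below is hypothetical (Col, Expr, EqualTo and ToyDsl are not Spark classes) and ignores Spark's null semantics; it only shows === building an expression tree that is evaluated later, row by row:

```scala
// Hypothetical toy DSL, loosely modeled on Spark's Column (not Spark's real API).
sealed trait Expr
final case class ColRef(name: String) extends Expr
final case class Lit(value: Any) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr

final case class Col(name: String) {
  // Builds an expression node; nothing is compared yet.
  def ===(other: Any): EqualTo = other match {
    case c: Col => EqualTo(ColRef(name), ColRef(c.name))
    case v      => EqualTo(ColRef(name), Lit(v))
  }
}

object ToyDsl {
  // Evaluates an expression against one row, modeled here as a Map.
  def eval(e: Expr, row: Map[String, Any]): Any = e match {
    case ColRef(n)     => row(n)
    case Lit(v)        => v
    case EqualTo(l, r) => eval(l, row) == eval(r, row)
  }

  def main(args: Array[String]): Unit = {
    val a = Col("a")
    val b = Col("b")
    val plain: Boolean = a == b  // ordinary Scala equality on the Col objects
    val expr: EqualTo  = a === b // an expression tree, not a Boolean
    println(plain)
    println(expr)
    println(eval(expr, Map("a" -> 1, "b" -> 1)))
    println(eval(expr, Map("a" -> 1, "b" -> 2)))
  }
}
```

In real Spark code the same idea appears as df.filter(col("a") === col("b")), where Spark's engine, not a hand-written eval, evaluates the Column for each row.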

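Quick sample: a SchemaRDD (renamed DataFrame in Spark 1.3) is essentially an RDD[org.apache.spark.sql.Row], so you extract values from each record with rec.getString(columnNumber). The snippet keeps the records whose column 5 is null or empty and returns column 1 of each; since getString returns a plain String, the comparison uses ordinary Scala ==, which returns a Boolean.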


.filter(rec => rec.getString(5) == null || rec.getString(5) == "").map(rec => rec.getString(1))
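For comparison, a minimal sketch of the same filter on a DataFrame, assuming Spark 2.x or later with spark-sql on the classpath (the object name, the column names id, name and flag, and the sample rows are hypothetical). Here the comparison must use ===, and null is checked with isNull, because col === null evaluates to null rather than true:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Minimal sketch, assuming Spark 2.x+; column names are hypothetical.
object EmptyOrNullFilter {
  // === builds a Column expression; isNull catches the null case,
  // since comparing a column to null with === yields null, not true.
  def namesWithEmptyFlag(df: DataFrame): DataFrame =
    df.filter(col("flag") === "" || col("flag").isNull).select("name")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("eq-vs-triple-eq")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("1", "a", ""),
      ("2", "b", null: String),
      ("3", "c", "x")
    ).toDF("id", "name", "flag")

    namesWithEmptyFlag(df).show() // expected to keep the rows named "a" and "b"
    spark.stop()
  }
}
```

Spark also provides <=> (eqNullSafe), a null-safe equality that returns true when both sides are null.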

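Merging the part files into a single file: hadoop fs -getmerge concatenates the files in an HDFS source directory into a single file on the local file system. Alternatively, call .repartition(1) or .coalesce(1) before saving so that Spark writes only one part file.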

hadoop fs -getmerge "HDFS_LOCATION_PART_FILES_LIST"  "NFS_LOCATION_SINGLE_PART_FILE"
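A minimal sketch of the coalesce(1) alternative, assuming Spark 2.x or later; the object name and the output path are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

// Minimal sketch, assuming Spark 2.x+; names and paths are hypothetical.
object SinglePartFileWrite {
  // coalesce(1) collapses the data into one partition, so the write
  // produces a single part-* file (plus marker files such as _SUCCESS).
  def writeSingleFile(df: DataFrame, outDir: String): Unit =
    df.coalesce(1).write.mode(SaveMode.Overwrite).option("header", "true").csv(outDir)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("single-part")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = (1 to 1000).map(i => (i, s"name$i")).toDF("id", "name")
    writeSingleFile(df, "/tmp/single_part_output") // hypothetical output directory
    spark.stop()
  }
}
```

coalesce(1) avoids a full shuffle, while repartition(1) shuffles all the data first; either way a single task writes every row, so this approach only suits small outputs.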
