Spark Cyclone Configuration

Basic configuration to run a Spark job on the Vector Engine:

$SPARK_HOME/bin/spark-submit \
--master yarn \
--num-executors=8 --executor-cores=1 --executor-memory=7G \
--name job \
--jars /opt/cyclone/${USER}/spark-cyclone-sql-plugin.jar \
--conf spark.executor.extraClassPath=/opt/cyclone/${USER}/spark-cyclone-sql-plugin.jar \
--conf spark.plugins=com.nec.spark.AuroraSqlPlugin \
--conf spark.executor.resource.ve.amount=1 \
--conf spark.executor.resource.ve.discoveryScript=/opt/spark/getVEsResources.py \
job.py
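The `spark.executor.resource.ve.discoveryScript` setting points Spark at a script that reports the available Vector Engines; the plugin's `getVEsResources.py` fills that role. As an illustration only, a minimal discovery script could look like the sketch below. Spark expects such a script to print a single JSON object with the resource name and its addresses; the `/dev/veslot*` device naming is an assumption about the host system, not something this guide prescribes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a VE discovery script (the real getVEsResources.py
# ships with the plugin). Spark runs the script and expects one JSON object
# on stdout, e.g. {"name": "ve", "addresses": ["0", "1"]}.
ve_resources_json() {
  local addresses="" dev n
  # Assumption: VE slots appear as /dev/veslot0, /dev/veslot1, ...
  for dev in /dev/veslot*; do
    [ -e "$dev" ] || continue
    n="${dev#/dev/veslot}"
    addresses="${addresses:+${addresses}, }\"${n}\""
  done
  printf '{"name": "ve", "addresses": [%s]}\n' "$addresses"
}

ve_resources_json
```

On a host with no VE devices the script prints an empty address list, which Spark treats as "no `ve` resources on this node".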

General Configuration

| Name | Description | Default Value |
| --- | --- | --- |
| spark.com.nec.spark.ncc.path | Path to ncc. Specify the absolute path if ncc is not in your $PATH. | ncc |
| spark.com.nec.spark.ncc.debug | ncc debug mode. | false |
| spark.com.nec.spark.ncc.o | Optimization level for the Vector Engine compiler. | 4 |
| spark.com.nec.spark.ncc.openmp | Use OpenMP. | false |
| spark.com.nec.spark.ncc.extra-argument.0 | Additional option for the Vector Engine compiler. For example: "-X". | "" |
| spark.com.nec.spark.ncc.extra-argument.1 | Additional option for the Vector Engine compiler. For example: "-Y". | "" |
| spark.com.nec.native-csv | Native CSV parser. Available options: "x86" uses CNativeEvaluator; "ve" uses ExecutorPluginManagedEvaluator. | off |
| spark.com.nec.native-csv-ipc | Use IPC for parsing CSV (Spark -> IPC -> VE CSV). | true |
| spark.com.nec.native-csv-skip-strings | Set to false to use String allocation instead of the ByteArray optimization in NativeCsvExec. | true |
| spark.executor.resource.ve.amount | Required. For example: "1". | - |
| spark.task.resource.ve.amount | Not clear if this is needed. For example: "1". | - |
| spark.worker.resource.ve.amount | Appears to be necessary for cluster-local mode. For example: "1". | - |
| spark.resources.discoveryPlugin | Detect resources automatically. Set it to com.nec.ve.DiscoverVectorEnginesPlugin to enable it. | - |
| spark.[executor\|driver].resource.ve.discoveryScript | Specify resources via a script. Set it to /opt/spark/getVEsResources.py, or wherever your script is located. | - |
| spark.com.nec.spark.kernel.precompiled | Use a precompiled kernel directory. | - |
| spark.com.nec.spark.kernel.directory | If the precompiled directory does not yet exist, you can specify a destination for on-demand compilation. If this is not specified, a random temporary directory is used (and not removed afterwards). | random temporary directory |
| spark.com.nec.spark.batch-batches | Batches ColumnarBatches together to allow larger input sizes into the VE. This may, however, use more on-heap and off-heap memory. | 0 |
| com.nec.spark.preshuffle-partitions | Avoids a coalesce into a single partition, trading it off for pre-sorting/pre-partitioning data by hashes of the group-by expressions. | - |
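Several of these options can be combined on a single spark-submit invocation. The fragment below is illustrative only: the ncc path, kernel directory, and the "-fdiag-vector=2" extra argument are placeholder values, not settings prescribed by this guide.

```shell
# Illustrative spark-submit combining compiler and kernel-cache options.
# All paths and the example ncc flag are placeholders for your environment.
$SPARK_HOME/bin/spark-submit \
  --master yarn \
  --jars /opt/cyclone/${USER}/spark-cyclone-sql-plugin.jar \
  --conf spark.executor.extraClassPath=/opt/cyclone/${USER}/spark-cyclone-sql-plugin.jar \
  --conf spark.plugins=com.nec.spark.AuroraSqlPlugin \
  --conf spark.com.nec.spark.ncc.path=/opt/nec/ve/bin/ncc \
  --conf spark.com.nec.spark.ncc.o=4 \
  --conf spark.com.nec.spark.ncc.extra-argument.0=-fdiag-vector=2 \
  --conf spark.com.nec.spark.kernel.directory=/opt/cyclone/${USER}/kernels \
  --conf spark.executor.resource.ve.amount=1 \
  --conf spark.executor.resource.ve.discoveryScript=/opt/spark/getVEsResources.py \
  job.py
```

Pointing spark.com.nec.spark.kernel.directory at a stable location lets repeated runs reuse previously compiled kernels instead of recompiling into a fresh temporary directory each time.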

For the arguments accepted by spark.com.nec.spark.ncc.extra-argument.[0-n], please refer to the NEC C++ compiler guide.