
Spark Cyclone Configuration

Basic configuration to run a Spark job on the Vector Engine:

```sh
$SPARK_HOME/bin/spark-submit \
  --master yarn \
  --num-executors=8 --executor-cores=1 --executor-memory=7G \
  --name job \
  --jars /opt/cyclone/${USER}/spark-cyclone-sql-plugin.jar \
  --conf spark.executor.extraClassPath=/opt/cyclone/${USER}/spark-cyclone-sql-plugin.jar \
  --conf <additional settings from the General Configuration table below>
```

General Configuration

| Name | Description | Default Value |
|---|---|---|
| spark.com.nec.spark.ncc.path | The ncc path. Please specify the absolute path if ncc is not in your $PATH. | ncc |
| spark.com.nec.spark.ncc.debug | Enable ncc debug mode. | false |
| spark.com.nec.spark.ncc.o | Optimization level for the Vector Engine compiler. | 4 |
| spark.com.nec.spark.ncc.openmp | Enable OpenMP. | false |
| spark.com.nec.spark.ncc.extra-argument.0 | Extra options for the Vector Engine compiler. For example: "-X" | "" |
| spark.com.nec.spark.ncc.extra-argument.1 | Extra options for the Vector Engine compiler. For example: "-Y" | "" |
| spark.com.nec.native-csv | Native CSV parser. Available options: "x86" uses CNativeEvaluator, "ve" uses ExecutorPluginManagedEvaluator. | off |
| spark.com.nec.native-csv-ipc | Use IPC for parsing CSV: Spark -> IPC -> VE CSV. | true |
| spark.com.nec.native-csv-skip-strings | To use String allocation as opposed to the ByteArray optimization in NativeCsvExec, set it to false. | true |
| spark.executor.resource.ve.amount | This is definitely needed. For example: "1" | - |
| spark.task.resource.ve.amount | Not clear if this is needed. For example: "1" | - |
| spark.worker.resource.ve.amount | Seems to be necessary for cluster-local mode. For example: "1" | - |
| spark.resources.discoveryPlugin | Detect resources automatically. Set it to com.nec.ve.DiscoverVectorEnginesPlugin to enable it. | - |
| spark.[executor\|driver].resource.ve.discoveryScript | Detect resources via a script. Set it to /opt/spark/ (or wherever your script is located) to enable it. | - |
| spark.com.nec.spark.kernel.precompiled | Use a precompiled directory. | - |
| spark.com.nec.spark.kernel.directory | If the precompiled directory does not yet exist, you can also specify a destination for on-demand compilation. If this is not specified, a random temporary directory will be used (it is not removed afterwards, however). | random temporary directory |
| spark.com.nec.spark.batch-batches | Batch multiple ColumnarBatches together to allow larger input sizes into the VE. This may, however, use more on-heap and off-heap memory. | 0 |
|  | Disable the coalesce into a single partition, trading it off for pre-sorting/pre-partitioning data by hashes of the group-by expressions. | - |
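As a worked example, the VE resource settings above can be combined into a single submission. This is only a sketch: the plugin class name (com.nec.spark.AuroraSqlPlugin) and the discovery script path /opt/spark/ve-discovery.sh are assumptions; substitute the values shipped with your Spark Cyclone release.

```shell
# Sketch of a submission that loads the plugin and requests VE resources.
# ASSUMPTIONS: spark.plugins value and the discovery script path are
# illustrative; verify both against your installation.
$SPARK_HOME/bin/spark-submit \
  --master yarn \
  --jars /opt/cyclone/${USER}/spark-cyclone-sql-plugin.jar \
  --conf spark.executor.extraClassPath=/opt/cyclone/${USER}/spark-cyclone-sql-plugin.jar \
  --conf spark.plugins=com.nec.spark.AuroraSqlPlugin \
  --conf spark.executor.resource.ve.amount=1 \
  --conf spark.task.resource.ve.amount=1 \
  --conf spark.worker.resource.ve.amount=1 \
  --conf spark.executor.resource.ve.discoveryScript=/opt/spark/ve-discovery.sh \
  your-application.jar
```

Setting the three resource amounts together mirrors the table's guidance: the executor amount is required, while the task and worker amounts are included here defensively for cluster-local mode.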

For the extra-argument options [0-?], please refer to the NEC C++ compiler guide.
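For instance, the ncc-related settings can be passed as a group. The flag values "-X" and "-Y" below are the placeholder examples from the table, not real compiler options, and /opt/nec/ve/bin/ncc is an assumed install path:

```shell
# Sketch: forward extra flags to ncc for on-demand kernel compilation.
# "-X" and "-Y" are placeholders; consult the NEC C++ compiler guide
# for actual flags. The ncc path is an assumption for a typical install.
$SPARK_HOME/bin/spark-submit \
  --conf spark.com.nec.spark.ncc.path=/opt/nec/ve/bin/ncc \
  --conf spark.com.nec.spark.ncc.o=4 \
  --conf "spark.com.nec.spark.ncc.extra-argument.0=-X" \
  --conf "spark.com.nec.spark.ncc.extra-argument.1=-Y" \
  your-application.jar
```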