# ask-metaflow
l
Hello, I am trying to create a flow for a feature engineering task. The feature engineering is done purely in PySpark, and I was wondering what the best way to use Spark in Metaflow is. Is the experimental Spark decorator https://github.com/outerbounds/metaflow-pyspark the best approach for this? Any advice would be greatly appreciated :simple_smile:
s
That would be a good starting point. The experimental decorator works with serverless EMR, but it wouldn't be too hard to extend it to your flavor of Spark.
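For reference, here is a minimal sketch of the shape such a flow could take, written without the experimental decorator (its actual API lives in the repo README): a PySpark feature-engineering job running inside an ordinary Metaflow step. The bucket paths and feature logic are placeholders.

```python
from metaflow import FlowSpec, step

class FeatureEngineeringFlow(FlowSpec):

    @step
    def start(self):
        # Local/driver-attached Spark session; the experimental decorator
        # targets serverless EMR instead, but the step body looks similar.
        from pyspark.sql import SparkSession
        spark = SparkSession.builder.appName("feature-eng").getOrCreate()

        df = spark.read.parquet("s3://my-bucket/raw/")      # placeholder input
        features = df.groupBy("user_id").count()            # your feature logic here
        features.write.mode("overwrite").parquet("s3://my-bucket/features/")  # placeholder output

        spark.stop()
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    FeatureEngineeringFlow()
```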
h
Another option could be to use SageMaker, but you will be responsible for packaging up the dependencies yourself. Not that the experimental decorator does that either, but I have a PR on that repo showing one way to do it. https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.spark.processing.PySparkProcessor
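Roughly, the SageMaker route looks like the sketch below: a Metaflow step that submits your PySpark script as a SageMaker processing job via `PySparkProcessor`. The role ARN, script name, S3 URIs, and instance settings are all placeholders you'd swap for your own, and any extra dependencies the script needs still have to be packaged by you.

```python
from metaflow import FlowSpec, step

class SparkOnSageMakerFlow(FlowSpec):

    @step
    def start(self):
        from sagemaker.spark.processing import PySparkProcessor

        processor = PySparkProcessor(
            base_job_name="feature-eng",
            framework_version="3.1",                               # Spark container version
            role="arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder role ARN
            instance_count=2,
            instance_type="ml.m5.xlarge",
        )

        # preprocess.py is your PySpark feature-engineering script; dependency
        # packaging for it is on you, as noted above.
        processor.run(
            submit_app="preprocess.py",
            arguments=[
                "--input", "s3://my-bucket/raw/",        # placeholder input
                "--output", "s3://my-bucket/features/",  # placeholder output
            ],
        )
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    SparkOnSageMakerFlow()
```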