fancy-mouse-14245
07/28/2023, 8:57 AM
A 0.data.json file is present corresponding to the task ID of the step. That file contains a key called objects; the objects are the following:
"name": "941c65aaf9cb5bfc65491038c316b2cd35cc1313",
"_transition": "c1b945a8d07ea84f63e82ce28d16b2b2e7b90498",
"_task_ok": "69e77141c3eb7a8c9ce864251d70c02723f29332",
"_success": "69e77141c3eb7a8c9ce864251d70c02723f29332",
"_graph_info": "80e79fc09f06dd1b03a1f180d49e82297fcb8c7f",
"_foreach_stack": "4ca058df2ea422cca260c585409d6ac9face7ebe",
"status_dict": "89f8475b8276fe53f69afbb15a963015d9cc0016",
"base_path": "e9d9627102ec5832fc63c3d4e3d5853e1bb6fb9b",
"_foreach_var": "f3627f46179fdd95bf0e83101840fd1d71b60e40",
"_foreach_num_splits": "f3627f46179fdd95bf0e83101840fd1d71b60e40",
"_exception": "f3627f46179fdd95bf0e83101840fd1d71b60e40",
"_current_step": "01b168d0b2c70f906b295772af9efb98b0797cae"
What are the values associated with these keys?
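For context, each value looks like a 40-character hex digest that acts as a content-addressed key: the artifact blob itself sits under the flow's data/ prefix at data/<first two characters of the key>/<key>, which is the same layout the download snippet later in this thread uses. A minimal sketch of resolving the keys to blob paths, assuming that layout, the bucket name my-bucket, and a locally downloaded copy of 0.data.json:

import json

bucket = "my-bucket"      # assumption, matches the snippet later in the thread
flow_name = "FlowName"    # hypothetical flow name

# Read the objects mapping (artifact name -> content-addressed key).
with open("0.data.json") as f:
    objects = json.load(f)["objects"]

for artifact_name, key in objects.items():
    blob_path = f"s3://{bucket}/{flow_name}/data/{key[:2]}/{key}"
    print(f"{artifact_name:25s} -> {blob_path}")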
Under the info key of the same file, we have something like:
"status_dict": {
"size": 80,
"type": "<class 'dict'>",
"encoding": "gzip+pickle-v2"
}
How are we calculating the value of size?
2. Are the real values associated with the variables used in a step stored in RDS?
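On the size question above: the thread does not settle exactly what the number measures. One hedged way to probe it is to serialize a known artifact the way the encoding string suggests (pickle protocol 2, then gzip) and compare both lengths against the recorded size; this is purely a sketch with a hypothetical value:

import gzip
import pickle

status_dict = {"start": {"status": "done"}}   # hypothetical artifact value

pickled = pickle.dumps(status_dict, protocol=2)   # "pickle-v2" suggests protocol 2
compressed = gzip.compress(pickled)

print("pickled bytes:", len(pickled))
print("gzipped bytes:", len(compressed))
# Compare these against the "size": 80 recorded in 0.data.json to see which length it tracks.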
victorious-lawyer-58417
07/28/2023, 6:20 PM
They correspond to the artifacts of the task (everything you assign to self.), i.e. the inputs and the outputs of each task.
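For what it's worth, those task artifacts can also be read back without touching the bucket layout at all, via Metaflow's Client API; a minimal sketch, assuming hypothetical flow/run/step names:

from metaflow import Step

# The client resolves the datastore paths itself.
step = Step("FlowName/RunName/StepName")
task = step.task                      # the task of the step
print(task.data.dataset)              # reads the 'dataset' artifact transparently
print(task.artifacts.dataset.data)    # equivalent, via the artifacts accessor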
victorious-lawyer-58417
07/28/2023, 6:21 PM
victorious-lawyer-58417
07/28/2023, 6:25 PM
cool-father-88039
08/01/2023, 1:35 PM
cool-father-88039
08/01/2023, 2:44 PM
import gzip
import json
import pickle
import subprocess
from typing import Any

from s3path import S3Path  # assumption: S3Path comes from the s3path package

METAFLOW_BUCKET = "my-bucket"

def download_s3_artifact(s3_path: S3Path) -> Any:
    # Download one gzip+pickle blob from S3 and deserialize it.
    assert s3_path.exists(), f"{s3_path} does not exist"
    local_path = f"/tmp/{s3_path.stem}.pkl"
    subprocess.run(["aws", "s3", "cp", s3_path.as_uri(), local_path], check=True)
    with gzip.open(local_path, "rb") as f:
        data = pickle.load(f)
    return data

def download_run_artifact(step_name: str, artifact_name: str) -> Any:
    # Look up the artifact's content-addressed key in 0.data.json, then fetch the blob.
    step_dir = S3Path.from_uri(f"s3://{METAFLOW_BUCKET}/{step_name}")
    assert step_dir.exists(), f"step {step_name} does not exist"
    task_dir = list(step_dir.iterdir())[0]
    data_path = task_dir / "0.data.json"
    with data_path.open("r") as f:
        metadata = json.load(f)
    key = metadata["objects"][artifact_name]
    # Blobs live under <flow>/data/<first two chars of key>/<key>
    flow_name = step_name.split("/")[0]
    artifact_path = S3Path.from_uri(f"s3://{METAFLOW_BUCKET}/{flow_name}/data/{key[:2]}/{key}")
    return download_s3_artifact(artifact_path)

step_name = "FlowName/RunName/StepName"
data = download_run_artifact(step_name=step_name, artifact_name="dataset")
fancy-mouse-14245
08/02/2023, 6:56 AM
.metaflow/flowname/data/ contains all the data artifacts of the flow?
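One way to check is to list that prefix directly; a minimal sketch using boto3, assuming the bucket from the earlier snippet and a .metaflow/ datastore root (adjust the prefix to match your configuration):

import boto3

bucket = "my-bucket"                     # assumption
prefix = ".metaflow/flowname/data/"      # the prefix asked about above

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

total = 0
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        total += 1
        print(obj["Key"], obj["Size"])
print("objects under prefix:", total)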
fancy-mouse-14245
08/02/2023, 6:57 AM
victorious-lawyer-58417
08/02/2023, 7:04 AM
fancy-mouse-14245
08/02/2023, 7:05 AM