Description
I am trying to evaluate reflow but am getting stopped in my tracks at the Quick Start. I am simply trying to run the hello world example:
$ cat hello.rf
val Main = exec(image := "ubuntu") (out file) {"
echo hello world >>{{out}}
"}
$
I initially assumed that local mode (Docker) would be the quickest. So I ran:
$ reflow run -local hello.rf
2022/12/11 15:36:37 localcluster Init requires taskdb.TaskDB: unspecified
$
This smells like an internal error (dependency injection failure). It is surprising that TaskDB is a hard dependency in -local mode. It contradicts the official documentation, which states that TaskDB is a soft dependency even for cluster mode.
Having given up on Docker, I fell back on the official EC2 quickstart from the README. The setup-ec2 / setup-s3-repository / setup-dynamodb-assoc trio worked fine. Unfortunately, reflow run failed in a surprising way:
$ reflow run hello.rf
reflow: reflow runtime: ===== started =====
reflow: reflow version: 1.27.0 (go1.18.4)
reflow: run ID: 44898ba0
reflow: evaluating program /Users/me/reflow/hello.rf
(no params)
(no arguments)
reflow: Trace: none (since nopTracer is in use)
reflow: evaluating with configuration: scheduler *sched.Scheduler snapshotter blob.Mux repository *blobrepo.Repository,url=s3://masiunet-reflow-test/ assoc *dydbassoc.Assoc,TableName=masiunet-reflow-test flags nocache,norecomputeempty,topdown flowconfig hashv2 cachelookuptimeout 20m0s imagemap map[ubuntu:index.docker.io/library/ubuntu@sha256:965fbcae990b0467ed5657caceaec165018ef44a4d2d46c7cdea80a9dff0d1ea] dotwriter(*os.File)
reflow: (flow 3dca1cc0): reviseResources {mem:500.0MiB cpu:1 disk:0B}: resources {mem:500.0MiB cpu:1 disk:0B} are way higher than max {mem:0B cpu:128 disk:250.0GiB intel_avx:128 intel_avx2:128 intel_avx512:128 intel_turbo:128}
reflow: -> hello.Main 3dca1cc0 exec exec ..aec165018ef44a4d2d46c7cdea80a9dff0d1ea echo hello world >>{{out}}
reflow: hello.Main 3dca1cc0 /Users/me/reflow/hello.rf:1:16:
resources: {mem:500.0MiB cpu:1 disk:0B}
sha256:143d42326a7796eab8314a0030604c95e7afad1587ce681492f911b501b54db9
sha256:b5cf39692f785fbbbc9ac03dbc00c2bde0ff2076d0373724293f810b2f1276b3
sha256:3dca1cc06adb7b4a76dbc5a526c60ebed36ad8793b5a13cc6449c4c7ff329c8e
index.docker.io/library/ubuntu@sha256:965fbcae990b0467ed5657caceaec165018ef44a4d2d46c7cdea80a9dff0d1ea
command:
echo hello world >>{{out}}
where:
reflow: <- hello.Main 3dca1cc0 err exec 0s ?
error resources exhausted: requested resources {mem:500.0MiB cpu:1 disk:0B} not satisfiable even by largest available instance type x2iedn.32xlarge with resources {mem:0B cpu:128 disk:250.0GiB intel_avx:128 intel_avx2:128 intel_avx512:128 intel_turbo:128}
/Users/me/reflow/hello.rf:1:16
index.docker.io/library/ubuntu@sha256:965fbcae990b0467ed5657caceaec165018ef44a4d2d46c7cdea80a9dff0d1ea
command:
echo hello world >>{{out}}
where:
profile:
cpu mean=0.0 max=0.0 (N=0, duration=0s)
mem mean=0B max=0B (N=0, duration=0s)
disk mean=0B max=0B (N=0, duration=0s)
tmp mean=0B max=0B (N=0, duration=0s)
reflow: total n=1 time=0s
ident n ncache runtime(m) cpu mem(GiB) disk(GiB) tmp(GiB) requested
hello.Main 1 0
reflow: marking run done after nonrecoverable error resources exhausted: requested resources {mem:500.0MiB cpu:1 disk:0B} not satisfiable even by largest available instance type x2iedn.32xlarge with resources {mem:0B cpu:128 disk:250.0GiB intel_avx:128 intel_avx2:128 intel_avx512:128 intel_turbo:128}
reflow: resources exhausted: requested resources {mem:500.0MiB cpu:1 disk:0B} not satisfiable even by largest available instance type x2iedn.32xlarge with resources {mem:0B cpu:128 disk:250.0GiB intel_avx:128 intel_avx2:128 intel_avx512:128 intel_turbo:128}
$
The advertised mem:0B looks suspicious, but I have not looked deeper than that.
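To illustrate why an advertised mem:0B would make even a tiny request unsatisfiable, here is a minimal sketch of a per-dimension fit check. The Resources type and the Fits function are hypothetical simplifications made up for this example; they are not reflow's actual types or scheduler logic, which I have not inspected.

```go
package main

import "fmt"

// Resources is a hypothetical, simplified resource vector keyed by
// dimension name (e.g. "mem" in bytes, "cpu" in cores). It is NOT
// reflow's real type; it only mirrors the shape seen in the log output.
type Resources map[string]float64

// Fits reports whether every requested quantity is covered by avail.
// A dimension missing from avail counts as zero.
func Fits(req, avail Resources) bool {
	for name, want := range req {
		if avail[name] < want {
			return false
		}
	}
	return true
}

func main() {
	// The request from the log: {mem:500.0MiB cpu:1 disk:0B}.
	req := Resources{"mem": 500 * (1 << 20), "cpu": 1, "disk": 0}

	// The instance as advertised in the log: mem is 0B.
	advertised := Resources{"mem": 0, "cpu": 128, "disk": 250 * (1 << 30)}

	// With zero memory advertised, no request that needs memory can fit,
	// regardless of how large the instance really is.
	fmt.Println(Fits(req, advertised)) // prints false
}
```

If the real check works along these lines, the problem would lie in how the instance type's memory is populated rather than in the request itself, but that is only a guess.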
I tried a few older release builds but they all fail with the same error. If I go back far enough, I get a different error:
$ ~/Downloads/reflow1.13.0.darwin.amd64 run hello.rf
infra.Init: provider ec2cluster for type *ec2cluster.Cluster: missing AMI parameter
$
I was going to attempt some code fixups, but here I encountered yet more trouble: the standard go install workflow does not work:
$ ~/sdk/go1.19.3/bin/go install github.com/grailbio/reflow/cmd/reflow@latest
go: downloading github.com/grailbio/reflow v0.0.0-20221206232358-04b01f719b84
go: finding module for package github.com/grailbio/base/s3util
go: finding module for package github.com/grailbio/base/cloud/spotadvisor
go: finding module for package github.com/grailbio/base/cloud/spotfeed
go/pkg/mod/github.com/grailbio/reflow@v0.0.0-20221206232358-04b01f719b84/ec2cluster/ec2cluster.go:33:2: module github.com/grailbio/base@latest found (v0.0.10), but does not contain package github.com/grailbio/base/cloud/spotadvisor
go/pkg/mod/github.com/grailbio/reflow@v0.0.0-20221206232358-04b01f719b84/tool/cost.go:15:2: module github.com/grailbio/base@latest found (v0.0.10), but does not contain package github.com/grailbio/base/cloud/spotfeed
go/pkg/mod/github.com/grailbio/reflow@v0.0.0-20221206232358-04b01f719b84/blob/s3blob/s3blob.go:27:2: module github.com/grailbio/base@latest found (v0.0.10), but does not contain package github.com/grailbio/base/s3util
$
My guess is that go.mod is not being kept in sync with the internal Bazel repo...
I eventually managed to get it to build after a series of guesses around package upgrades and some local patching, but by that point I had lost any confidence that my local sandbox bears any resemblance to what upstream uses. Belatedly, I realized I could perhaps have extracted an up-to-date go.mod from the buildinfo metadata embedded in the released binaries, but I ran out of the time I had set aside for this experiment.
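For anyone who wants to try the buildinfo route, here is a minimal sketch using the standard library's debug/buildinfo package (Go 1.18+). It assumes the released binary actually embeds module dependency information, which has not been verified here; a binary built outside the regular go toolchain may carry none. The requireLines helper is a name invented for this example.

```go
package main

import (
	"debug/buildinfo"
	"fmt"
	"os"
)

// requireLines reads the build metadata embedded in a Go binary and
// returns one "module version" line per recorded dependency, in the
// same shape as entries of a go.mod require block. Replaced modules
// are annotated in a trailing comment rather than substituted.
func requireLines(binary string) ([]string, error) {
	info, err := buildinfo.ReadFile(binary)
	if err != nil {
		return nil, err
	}
	var lines []string
	for _, dep := range info.Deps {
		line := fmt.Sprintf("%s %s", dep.Path, dep.Version)
		if dep.Replace != nil {
			line += fmt.Sprintf(" // replaced by %s %s", dep.Replace.Path, dep.Replace.Version)
		}
		lines = append(lines, line)
	}
	return lines, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: deps <path-to-go-binary>")
		os.Exit(2)
	}
	lines, err := requireLines(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading build info:", err)
		os.Exit(1)
	}
	fmt.Println("require (")
	for _, l := range lines {
		fmt.Println("\t" + l)
	}
	fmt.Println(")")
}
```

Running go version -m against the binary prints the same embedded module list without any extra code. Either way, the output is only a starting point for a go.mod, not a guaranteed reproduction of the upstream build.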
Overall, a surprisingly poor experience for a project in its 1.x life phase. It's a shame because the technology seems interesting.