
Using private libraries in cloudml_train #171

Open
zamorarr opened this issue Jun 14, 2018 · 5 comments

Comments

@zamorarr

Hi - thanks for this great package! Is there any way to use private libraries in a training script sent to cloudml_train? For example, in my train.R file:

library(keras)
library(myownlib)

model <- keras_model(....

I'm hoping to be able to point the training function to an archive stored on Google Cloud Storage. For example:

cloudml_train("train-batters.r", mylibs = "gs://mybucket/rpkgs")

The idea is that packrat would know to look there for libraries it couldn't find on CRAN.

Is this feasible? Happy to help out if I can.

@javierluraschi
Contributor

@kevinushey might have some thoughts. In the meantime, since cloudml copies all the contents of your local directory, you could build your private library and save the .tar.gz file in the same folder as your train.R file. cloudml will upload this package for you, and you could then try installing it from source by adding something like this to the top of train.R:

install.packages("myownlib.tar.gz", repos = NULL, type="source")

# .... your existing code ....
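
A slightly more defensive variant of the snippet above, as a minimal sketch: it skips the install when the package is already available and fails with a clear message when the tarball was not uploaded alongside train.R. The helper name `install_local_if_missing` is hypothetical and not part of cloudml; `myownlib` and `myownlib.tar.gz` are the example names used in this thread.

```r
# Hypothetical helper (not part of cloudml): install a bundled source tarball
# only when the package is not already available in the current library.
install_local_if_missing <- function(pkg, tarball) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already installed, nothing to do
  }
  if (!file.exists(tarball)) {
    stop("Source tarball not found: ", tarball)
  }
  # repos = NULL makes install.packages() treat the path as a local file
  install.packages(tarball, repos = NULL, type = "source")
  invisible(TRUE)
}

# Usage at the top of train.R (names assumed from the example above):
# install_local_if_missing("myownlib", "myownlib.tar.gz")
# library(myownlib)
```

Note that installing from source this way does not resolve the private package's own dependencies, so those would still need to be available from CRAN or installed beforehand.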

@zamorarr
Author

zamorarr commented Jun 15, 2018 via email

@fmannhardt
Contributor

fmannhardt commented May 8, 2019

Any news on this issue? I tried adding the package and installing it as suggested, but I get this error message:

 Unable to retrieve package records for the following packages:
myprivatepackage

I solved the issue by activating packrat for the project myself and following this post:
https://stackoverflow.com/questions/31314229/packrat-with-local-binary-repository

@fmannhardt
Contributor

It turned out that using packrat for dependency management creates more problems than it solves. When it is used, all the packages are uploaded to Google Cloud each time, which slows everything down quite a bit.

I solved the issue by making it possible to add the private package to an IGNORE list that was empty so far. See the pull request for details.

@Z-ingdotnet

@fmannhardt
Sorry, I'm running into a similar issue, but with common packages instead. For example, CloudML failed to obtain the packages lime, funModeling, and latticeExtra and their dependencies, returning errors like:

2020-02-09T09:30:49.407953977Z master-replica-0 Installing latticeExtra (0.6-29) ... I master-replica-0
2020-02-09T09:30:49.408627985Z master-replica-0 curl: (22) The requested URL returned error: 404 Not Found I master-replica-0
2020-02-09T09:30:49.408884048Z master-replica-0 curl: (22) The requested URL returned error: 404 Not Found I master-replica-0
2020-02-09T09:30:49.409107923Z master-replica-0 curl: (22) The requested URL returned error: 404 Not Found I master-replica-0
2020-02-09T09:30:49.409305094Z master-replica-0 curl: (22) The requested URL returned error: 404 Not Found I master-replica-0
2020-02-09T09:30:49.409548043Z master-replica-0 curl: (22) The requested URL returned error: 404 Not Found I master-replica-0
2020-02-09T09:30:49.409764050Z master-replica-0 FAILED I master-replica-0
2020-02-09T09:30:49.409967898Z master-replica-0 Error in getSourceForPkgRecord(pkgRecord, srcDir(project), availablePackagesSource(repos = repos), : I master-replica-0
undefined
2020-02-09T09:30:49.410171031Z master-replica-0 Failed to retrieve package sources for latticeExtra 0.6-29 from CRAN (internet connectivity issue?) I master-replica-0

I would have thought that cloudml only uploads the contents of the directory where the training script is located and then installs the required R packages in the cloud.
Could you please clarify whether this is how the CloudML package works, or does it actually use packrat and upload libraries from my local machine as well?
Also, how can I resolve errors like the one above, where CloudML cannot access common packages in the cloud? These are not private packages and are readily available from CRAN, so I'm stuck as to why the CloudML package gives such an error.

Many thanks
