AzureDSVM: a new R package for elastic use of the Azure Data Science Virtual Machine
by Le Zhang (Data Scientist, Microsoft) and Graham Williams (Director of Data Science, Microsoft)
The Azure Data Science Virtual Machine (DSVM) is a curated VM which provides commonly-used tools and software for data science and machine learning, pre-installed. AzureDSVM is a new R package that enables seamless interaction with the DSVM from a local R session, by providing functions for the following tasks:
- Deployment, deallocation, deletion of one or multiple DSVMs;
- Remote execution of local R scripts: compute contexts available in Microsoft R Server can be enabled for enhanced computation efficiency for either a single DSVM or a cluster of DSVMs;
- Retrieval of consumption data and the total expense incurred by using DSVM(s).
To install AzureDSVM from GitHub using the devtools package:
library(devtools)
devtools::install_github("Azure/AzureDSVM")
library("AzureDSVM")
When deploying a Data Science Virtual Machine, the machine name, size, OS type, etc. must be specified. AzureDSVM supports DSVMs on Ubuntu, CentOS, Windows, and Windows with the Deep Learning Toolkit (on GPU-class instances). For example, the following code fires up a D4 v2 Ubuntu DSVM located in Southeast Asia:
deployDSVM(context,
           resource.group="example",
           location="southeastasia",
           size="Standard_D4_v2",
           os="Ubuntu",
           hostname="mydsvm",
           username="myname",
           pubkey="pubkey")
Here, context is an azureActiveContext object, created by the AzureSMR::createAzureContext() function, that encapsulates the credentials (Tenant ID, Client ID, etc.) needed for Azure authentication.
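As a minimal sketch of how such a context object might be built, the snippet below reads the credentials from environment variables and passes them to AzureSMR::createAzureContext(). The helper read_azure_credentials and the environment variable names are hypothetical and chosen only for illustration; the call to createAzureContext() runs only when the credentials and the AzureSMR package are both available.

```r
# Hypothetical helper: collect the three credentials from environment
# variables. The variable names below are assumptions made for this sketch,
# not names that AzureSMR or AzureDSVM require.
read_azure_credentials <- function(lookup = function(key) Sys.getenv(key, unset = "")) {
  keys <- c(tenantID = "AZURE_TENANT_ID",
            clientID = "AZURE_CLIENT_ID",
            authKey  = "AZURE_AUTH_KEY")
  values <- vapply(keys, lookup, character(1))
  absent <- keys[!nzchar(values)]
  if (length(absent) > 0) {
    stop("Missing environment variable(s): ", paste(absent, collapse = ", "))
  }
  as.list(values)
}

# Returns NULL instead of failing when the credentials are not set.
creds <- tryCatch(read_azure_credentials(), error = function(e) NULL)

if (!is.null(creds) && requireNamespace("AzureSMR", quietly = TRUE)) {
  # Build the context that is later passed to the AzureDSVM functions.
  context <- AzureSMR::createAzureContext(tenantID = creds$tenantID,
                                          clientID = creds$clientID,
                                          authKey  = creds$authKey)
}
```

Keeping the credentials out of the script itself makes it easier to share deployment code without exposing secrets.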
In addition to launching a single DSVM, the AzureDSVM package makes it easy to launch a cluster with multiple virtual machines. Multi-deployment supports:
- creating a collection of independent DSVMs which can be distributed to a group of data scientists for collaborative projects, as well as
- clustering a set of connected DSVMs for high-performance computation.
To create a cluster of 5 Ubuntu DSVMs with the default VM size, use:
cluster <- deployDSVMCluster(context,
                             resource.group=RG,
                             location="southeastasia",
                             hostnames="mydsvm",
                             usernames="myname",
                             pubkeys="pubkey",
                             count=5)
To execute a local script on a remote cluster of DSVMs with a specified Microsoft R Server compute context, use the executeScript function. (NOTE: only Linux-based DSVM instances are supported at the moment, because remote execution is implemented over SSH. Microsoft R Server 9.x allows remote interaction for both Linux and Windows, and more details can be found here.) Here, we use the RxForeachDoPar context (as indicated by the compute.context argument):
executeScript(context,
              resource.group="southeastasia",
              machines="dsvm_names_in_the_cluster",
              remote="fqdn_of_dsvm_used_as_master",
              user="myname",
              script="path_to_the_script_for_remote_execution",
              master="fqdn_of_dsvm_used_as_master",
              slaves="fqdns_of_dsvms_used_as_slaves",
              compute.context="clusterParallel")
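Since the script argument points at a local R file, it can help to check that file locally before sending it to the cluster. The sketch below writes a toy script to a temporary location, confirms that it parses, and runs it once in a fresh environment. The file name remote_job.R and the Monte Carlo workload are purely illustrative, and nothing here contacts Azure.

```r
# Write a small, self-contained R script to a temporary file. The file name
# and its contents are illustrative placeholders for a real workload.
script_path <- file.path(tempdir(), "remote_job.R")
writeLines(c(
  "# Toy workload: estimate pi by Monte Carlo sampling.",
  "set.seed(42)",
  "n <- 1e5",
  "inside <- sum(runif(n)^2 + runif(n)^2 <= 1)",
  "pi_estimate <- 4 * inside / n",
  "print(pi_estimate)"
), script_path)

# parse() raises an error on syntax problems without running the code.
expressions <- parse(file = script_path)

# Run the script once in a fresh environment as a local smoke test.
local_env <- new.env()
source(script_path, local = local_env)
```

The resulting script_path could then be supplied as the script argument in a call like the one above.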
Consumption and expense information for DSVMs can be retrieved with:
consum <- expenseCalculator(context,
                            instance="mydsvm",
                            time.start="time_stamp_of_starting_point",
                            time.end="time_stamp_of_ending_point",
                            granularity="Daily",
                            currency="USD",
                            locale="en-US",
                            offerId="offer_id_of_azure_subscription",
                            region="southeastasia")
print(consum)
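The time stamps for the start and end of the billing window can be generated rather than typed by hand. The following is a rough sketch: the helper billing_window is hypothetical, and both the "%Y-%m-%d %H:%M:%S" layout and the use of UTC are assumptions that should be checked against the package documentation.

```r
# Hypothetical helper: build start and end time stamps covering the last
# `days` days. The string layout and the UTC time zone are assumptions, not
# confirmed requirements of expenseCalculator.
billing_window <- function(days = 7, now = Sys.time(),
                           fmt = "%Y-%m-%d %H:%M:%S") {
  stopifnot(is.numeric(days), length(days) == 1, days > 0)
  start <- now - as.difftime(days, units = "days")
  list(time.start = format(start, format = fmt, tz = "UTC"),
       time.end   = format(now, format = fmt, tz = "UTC"))
}

window <- billing_window(days = 7)
print(window)
```

The two elements of window could then be supplied as the time.start and time.end arguments.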
Detailed introductions and tutorials can be found in the AzureDSVM GitHub repository, linked below.
GitHub (Azure): AzureDSVM
via Revolutions http://ift.tt/ImDvyF
May 19, 2017 at 04:30PM