|
13 | 13 | "cell_type": "markdown",
|
14 | 14 | "source": [
|
15 | 15 | "In this Tutorial we will:\n",
|
16 | - " - create blob storage\n",
17 | - " - connect to Exasol SaaS\n",
18 | - " - load tables in AzureML\n",
19 | - " - save tables to blob storage as csv files\n"
| 16 | + " - Connect to Exasol SaaS from AzureML\n", |
| 17 | + " - Export Exasol tables to an Azure Blob Storage container\n", |
| 18 | + " - Create a Datastore" |
20 | 19 | ],
|
21 | 20 | "metadata": {
|
22 | 21 | "collapsed": false
|
|
26 | 25 | "cell_type": "markdown",
|
27 | 26 | "source": [
|
28 | 27 | "## Prerequisites\n",
|
29 | - " (..)"
| 28 | + "\n", |
| 29 | + "You will need:\n", |
| 30 | + " - A running Exasol SaaS cluster with your data loaded into it\n", |
| 31 | + " - Authentication information for your Exasol SaaS cluster\n", |
| 32 | + " - An AzureML account and Azure Storage account\n", |
| 33 | + " - AzureML set up with a:\n", |
| 34 | + " - Workspace\n", |
| 35 | + " - Compute instance" |
30 | 36 | ],
|
31 | 37 | "metadata": {
|
32 | 38 | "collapsed": false
|
|
35 | 41 | {
|
36 | 42 | "cell_type": "markdown",
|
37 | 43 | "source": [
|
38 | - "## Why blob storage is necessary\n",
39 | - "(explanation)\n",
40 | - "(word on data duplication?)"
| 44 | + "## Why Azure Blob Storage is necessary\n", |
| 45 | + "\n", |
| 46 | + "In this tutorial we copy the data from an Exasol SaaS database into an Azure Blob Storage container. This is necessary because, while AzureML can import directly from some SQL databases, the Exasol SQL dialect is not supported by AzureML at the time of writing.\n" |
41 | 47 | ],
|
42 | 48 | "metadata": {
|
43 | 49 | "collapsed": false
|
|
48 | 54 | "source": [
|
49 | 55 | "## AzureML setup\n",
|
50 | 56 | "\n",
|
51 | - "\n",
52 | - " - create\n",
53 | - " - resource group,\n",
54 | - " - workspace,\n",
55 | - " - launch azML studio\n",
56 | - " - compute instance\n",
57 | - " - notebook (this notebook) link your compute\n",
58 | - " - in compute find public ip (if load data was run from inside azurML skip this)\n",
59 | - " add to exasol saas like local above\n",
60 | - "\n",
61 | - "\"When you create a workspace, an Azure blob container and an Azure file share are automatically registered as datastores to the workspace. They're named workspaceblobstore and workspacefilestore, respectively. \"\"https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data\"\n",
62 | - "can find in data in azureml and in azure portal storage account in container as \"azureml-blobstore-someID\"\n",
63 | - "\n",
64 | - "if you do not want to use default blobstore\n",
65 | - " - create blob storage:\n",
66 | - " - in azure portal of storage account:\n",
67 | - " -> data storage -> containers -> + container -> in menu on left enter name -> create\n",
68 | - "\n",
69 | - " while here: -> navigate security & networking -> access keys : remember access key and storage account name\n",
70 | - "\n",
71 | - "\n",
72 | - "\n",
73 | - "create datastore in azureML :(this step might be moved to following tutorial or removed)\n",
74 | - " - go to assets -> data on th left -> datastores -> create\n",
75 | - " menu opens on right. ender:\n",
76 | - " - datastore name\n",
77 | - " - datastore type (we use blob storage because the offered SQL databases are not compatible with exasol atm.)\n",
78 | - " - subscription if needed\n",
79 | - " - your storage account from dropdown ( from step before) pic\n",
80 | - " - newly created blob storage/blob container from step above\n",
81 | - " - authentication info (key from step before)\n",
82 | - " pic\n",
83 | - "\n",
84 | - "\n"
| 57 | + "If you do not know how to set up your AzureML studio, please refer to the [AzureML documentation](https://learn.microsoft.com/en-us/azure/machine-learning/quickstart-create-resources).\n", |
| 58 | + "Once you are set up with a workspace and compute instance, you can copy this notebook into your notebook files. Open it and select your compute instance in the drop-down menu at the top of the notebook. Now we can get started with connecting to the Exasol SaaS cluster.\n" |
85 | 59 | ],
|
86 | 60 | "metadata": {
|
87 | 61 | "collapsed": false
|
|
90 | 64 | {
|
91 | 65 | "cell_type": "markdown",
|
92 | 66 | "source": [
|
93 | - "explanation (link to pyexasol)"
| 67 | + "### Connect to Exasol SaaS\n", |
| 68 | + "\n", |
| 69 | + "\n", |
| 70 | + "We are going to use the [PyExasol](https://docs.exasol.com/db/latest/connect_exasol/drivers/python/pyexasol.htm) package to connect to the Exasol database and read the data. First, we need to install PyExasol using pip on our AzureML compute instance." |
94 | 71 | ],
|
95 | 72 | "metadata": {
|
96 | 73 | "collapsed": false
|
|
113 | 90 | {
|
114 | 91 | "cell_type": "markdown",
|
115 | 92 | "source": [
|
116 | - "explanation"
| 93 | + "Then we need to use PyExasol to connect to the Exasol SaaS cluster that holds our data. Change these values to reflect your cluster.\n", |
| 94 | + "We request 10 rows of the \"IDA.TEST\" table from the [Scania Trucks](https://archive.ics.uci.edu/ml/datasets/IDA2016Challenge) dataset to check that the connection is working." |
117 | 95 | ],
|
118 | 96 | "metadata": {
|
119 | 97 | "collapsed": false
|
|
126 | 104 | "source": [
|
127 | 105 | "import pyexasol\n",
|
128 | 106 | "import pandas\n",
|
129 | - "EXASOL_HOST = \"your.clusters.exasol.com\" # change\n",
130 | - "EXASOL_PORT = \"8563\" # change if needed\n",
131 | - "EXASOL_USER = \"integration-team\" # change if needed\n",
132 | - "EXASOL_PASSWORD = \"exa_pat_your_password\" #change\n",
133 | - "EXASOL_SCHEMA = \"IDA\" # change if needed\n",
| 107 | + "\n", |
| 108 | + "EXASOL_HOST = \"<your>.clusters.exasol.com\" # change\n", |
| 109 | + "EXASOL_PORT = \"8563\" # change if needed\n", |
| 110 | + "EXASOL_USER = \"<your-exasol-user>\" # change\n", |
| 111 | + "EXASOL_PASSWORD = \"exa_pat_<your_password>\" # change\n", |
| 112 | + "EXASOL_SCHEMA = \"IDA\" # change if needed\n", |
| 113 | + "\n", |
| 114 | + "# get the connection\n", |
134 | 115 | "EXASOL_CONNECTION = \"{host}:{port}\".format(host=EXASOL_HOST, port=EXASOL_PORT)\n",
|
135 | 116 | "exasol = pyexasol.connect(dsn=EXASOL_CONNECTION, user=EXASOL_USER, password=EXASOL_PASSWORD, compression=True)\n",
|
136 | 117 | "\n",
|
137 | - "# check if working\n",
138 | - "data = exasol.export_to_pandas(\"SELECT * FROM TABLE IDA.TEST LIMIT 10\")\n",
139 | - "print(data)"
| 118 | + "# check if the connection is working\n", |
| 119 | + "exasol.export_to_pandas(\"SELECT * FROM IDA.TEST LIMIT 10\")" |
140 | 120 | ],
|
141 | 121 | "metadata": {
|
142 | 122 | "collapsed": false,
|
|
148 | 128 | {
|
149 | 129 | "cell_type": "markdown",
|
150 | 130 | "source": [
|
151 | - "explanation"
| 131 | + "\n", |
| 132 | + "### Load data into Azure Blob Storage\n", |
| 133 | + "\n", |
| 134 | + "\n", |
| 135 | + "For this step, we need access to the Azure storage account, so you need to insert your Azure storage account name and access key. To find your access key, navigate to your storage account in the Azure portal, click on \"Access keys\" under \"Security + networking\", and copy one of your access keys.\n", |
| 136 | + "\n", |
| 137 | + "\n" |
152 | 138 | ],
|
153 | 139 | "metadata": {
|
154 | 140 | "collapsed": false
|
|
161 | 147 | "source": [
|
162 | 148 | "from azure.ai.ml.entities import AccountKeyConfiguration\n",
|
163 | 149 | "\n",
|
164 | - "my_storage_account_name = \"your_storage_account_name\"\n",
165 | - "credentials= AccountKeyConfiguration(\n",
166 | - " account_key=\"your_storage_account_key\"\n",
167 | - " )"
| 150 | + "my_storage_account_name = \"your_storage_account_name\" # change\n", |
| 151 | + "account_key = \"your_storage_account_key\" # change\n", |
| 152 | + "\n", |
| 153 | + "credentials = AccountKeyConfiguration(account_key=account_key)" |
168 | 154 | ],
|
169 | 155 | "metadata": {
|
170 | 156 | "collapsed": false,
|
|
176 | 162 | {
|
177 | 163 | "cell_type": "markdown",
|
178 | 164 | "source": [
|
179 | - "explanation\n",
180 | - "mention small table size in example"
| 165 | + "Lastly, we use an \"EXPORT TABLE\" command for each of our data tables to export them into a CSV file in our Blob Storage using \"INTO CSV AT CLOUD AZURE BLOBSTORAGE\". You can find [the documentation for this export command](https://docs.exasol.com/db/latest/sql/export.htm) in the Exasol documentation.\n", |
| 166 | + "If you choose an existing \"azure_storage_container_name\", this command will save your files in that container. Otherwise, an Azure storage container with that name will be created automatically.\n", |
| 167 | + "When you created your AzureML workspace, an Azure blob container was [created automatically](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data) and added as a Datastore named \"workspaceblobstore\" to your workspace. You can use it here and skip the \"Create a Datastore\" step below if you want. To do so, find its container name (\"azureml-blobstore-some-ID\") in the datastore details and insert it here." |
181 | 168 | ],
|
182 | 169 | "metadata": {
|
183 | 170 | "collapsed": false
|
|
188 | 175 | "execution_count": null,
|
189 | 176 | "outputs": [],
|
190 | 177 | "source": [
|
| 178 | + "azure_storage_container_name = \"your-container-name\" # change; you might need to remove the \"_datastore\" suffix\n", |
| 179 | + "\n", |
191 | 180 | "for table in [\"TEST\", \"TRAIN\"]:\n",
|
| 181 | + " save_path = f'{azure_storage_container_name}/ida/{table}'\n", |
192 | 182 | " sql_export = f\"EXPORT TABLE IDA.{table} INTO CSV AT CLOUD AZURE BLOBSTORAGE 'DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net'\"\\\n",
|
193 | - " f\"USER '{my_storage_account_name}' IDENTIFIED BY '{credentials.account_key}' FILE 'azureml-tutorial/ida/{table}'\"\n",
194 | - " exasol.execute(sql_export)\n"
| 183 | + " f\"USER '{my_storage_account_name}' IDENTIFIED BY '{credentials.account_key}' FILE '{save_path}'\"\n", |
| 184 | + " exasol.execute(sql_export)\n", |
| 185 | + " print(f\"Saved {table} in file {save_path}\")\n" |
195 | 186 | ],
|
196 | 187 | "metadata": {
|
197 | 188 | "collapsed": false,
|
|
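For readers who want to sanity-check the full EXPORT statement before sending it to the cluster, here is a minimal sketch of how the string in the loop above is assembled. The account name, access key, and container name below are hypothetical placeholders, not values from this tutorial:

```python
# Sketch: assemble the EXPORT ... INTO CSV AT CLOUD AZURE BLOBSTORAGE statement
# for one table. All three values below are hypothetical placeholders.
my_storage_account_name = "mystorageaccount"
account_key = "my-secret-access-key"
azure_storage_container_name = "my-container"

def build_export_sql(table: str) -> str:
    """Render the export statement for IDA.<table>, mirroring the loop above."""
    save_path = f"{azure_storage_container_name}/ida/{table}"
    return (
        f"EXPORT TABLE IDA.{table} INTO CSV AT CLOUD AZURE BLOBSTORAGE "
        f"'DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net' "
        f"USER '{my_storage_account_name}' IDENTIFIED BY '{account_key}' "
        f"FILE '{save_path}'"
    )

print(build_export_sql("TEST"))
```

Printing the statement first makes it easy to spot quoting mistakes in the connection string before `exasol.execute` runs it.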
203 | 194 | {
|
204 | 195 | "cell_type": "markdown",
|
205 | 196 | "source": [
|
206 | - "- Success!\n",
207 | - "- pic of show tables in AzureML storage"
| 197 | + "You can check the success of the command by navigating to your container in your Azure storage account in the Azure portal.\n", |
| 198 | + "In the menu on the left, you can find \"Containers\" under \"Data storage\". Find the container named \"your-container-name\" and click on it. Your files should be there.\n" |
| 199 | + ], |
| 200 | + "metadata": { |
| 201 | + "collapsed": false |
| 202 | + } |
| 203 | + }, |
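Instead of clicking through the portal, you can also predict where the exported files end up, since the blob paths follow directly from the values used in the export command. A minimal sketch, assuming the default `core.windows.net` endpoint suffix from the connection string above; the account and container names are hypothetical placeholders:

```python
# Compute the blob URLs where the exported CSV files should appear.
# Both names below are hypothetical placeholders.
my_storage_account_name = "mystorageaccount"
azure_storage_container_name = "my-container"

def expected_blob_url(table: str) -> str:
    """URL of the exported file for IDA.<table> under the default endpoint suffix."""
    return (
        f"https://{my_storage_account_name}.blob.core.windows.net/"
        f"{azure_storage_container_name}/ida/{table}"
    )

for table in ["TEST", "TRAIN"]:
    print(expected_blob_url(table))
```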
| 204 | + { |
| 205 | + "cell_type": "markdown", |
| 206 | + "source": [ |
| 207 | + "### Create a Datastore\n", |
| 208 | + "\n", |
| 209 | + "We recommend that you create a connection between your Azure Storage Container and your AzureML Workspace. For this, enter your workspace in AzureML Studio and select \"Data\" under \"Assets\" in the menu on the left. Now select \"Datastores\" and click on \"+Create\".\n", |
| 210 | + "\n", |
| 211 | + "\n", |
| 212 | + "\n", |
| 213 | + "In the view that opens, enter the information for your datastore. Enter a name and select \"Azure Blob Storage\" as the datastore type. Then select your Azure subscription and, from the drop-down menu, the blob container we loaded the data into. Use the authentication type \"Account key\", enter your Azure storage account access key, and click \"Create\"." |
| 214 | + ], |
| 215 | + "metadata": { |
| 216 | + "collapsed": false |
| 217 | + } |
| 218 | + }, |
| 219 | + { |
| 220 | + "cell_type": "markdown", |
| 221 | + "source": [ |
| 222 | + "\n", |
| 223 | + "\n", |
| 224 | + "You can now see your data directly in AzureML by navigating to \"Datastores\" and clicking on <your_datastore_name>. If you then switch to the \"Browse\" view, you can open your files and have a look at them.\n", |
| 225 | + "\n", |
| 226 | + "\n", |
| 227 | + "Great, we successfully connected to our Exasol SaaS instance and loaded data from there into our Azure Blob Storage!\n", |
| 228 | + "\n", |
| 229 | + "Now we move on to [working with the data in AzureML and training a model on it](TrainModelInAzureML.ipynb)." |
208 | 230 | ],
|
209 | 231 | "metadata": {
|
210 | 232 | "collapsed": false
|
|