Commit 83c5ed4

V021 prep2 (#124)
* release 0.2.1 changes for release with PyPi build
- on release github action to push to PyPi
- doc updates
- readme and related artifact changes
- version update
- tutorials update for use of PyPi install
1 parent 23a83f5 commit 83c5ed4

14 files changed, +220 -150 lines changed


.github/workflows/onrelease.yml

Lines changed: 40 additions & 0 deletions
```diff
@@ -0,0 +1,40 @@
+name: release
+
+on:
+  release:
+    types:
+      [published]
+
+jobs:
+  release:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      max-parallel: 1
+      matrix:
+        python-version: [ 3.8 ]
+        os: [ ubuntu-latest ]
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v2
+
+      - name: Unshallow
+        run: git fetch --prune --unshallow
+
+      - name: Install
+        run: pip install pipenv
+
+      - name: Build dist
+        run: pipenv run python setup.py sdist bdist_wheel
+
+      - name: Run tests
+        run: make test
+
+      - name: Publish a Python distribution to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          user: __token__
+          password: ${{ secrets.LABS_PYPI_TOKEN }}
+
+
+
```

CHANGELOG.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -21,8 +21,8 @@ See the contents of the file `python/require.txt` to see the Python package depe
 * moved docs to docs folder
 * added support for specific distributions
 * renamed packaging to `dbldatagen`
-* moved Github repo to https://github.com/databrickslabs/dbldatagen/releases
+* Releases now available at https://github.com/databrickslabs/dbldatagen/releases
 * code tidy up and rename of options
 * added text generation plugin support for python functions and 3rd party libraries such as Faker
 * Use of data generator to generate static and streaming data sources in Databricks Delta Live Tables
-
+* added support for install from PyPi
```
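The changelog entry "added support for install from PyPi" means the package can now be installed with `pip install dbldatagen`, the command the README diff in this commit adds. As a minimal sketch, not part of this commit, the installed release can be confirmed with the standard library's `importlib.metadata` (available from Python 3.8, the version used in the release workflow; the README states Python 3.6 or later, where this module is not available). The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "dbldatagen") -> Optional[str]:
    """Hypothetical helper: return the installed version of a distribution, or None if absent."""
    try:
        # Reads the version from the installed distribution's metadata.
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


# After `pip install dbldatagen` (or `%pip install dbldatagen` in a notebook cell),
# this prints the installed release string; before installation it prints None.
print(installed_version())
```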

README.md

Lines changed: 47 additions & 63 deletions
````diff
@@ -1,9 +1,12 @@
-# Databricks Labs Data Generator (`dbldatagen`)
+# Databricks Labs Data Generator (`dbldatagen`)
+
+<!-- Top bar will be removed from PyPi packaged versions -->
+<!-- Dont remove: exclude package -->
+[Documentation](https://databrickslabs.github.io/dbldatagen/public_docs/index.html) |
 [Release Notes](CHANGELOG.md) |
-[Python Wheel](https://github.com/databrickslabs/dbldatagen/releases/tag/v.0.2.0-rc1-master) |
-[Developer Docs](docs/USING_THE_APIS.md) |
 [Examples](examples) |
 [Tutorial](tutorial)
+<!-- Dont remove: end exclude package -->
 
 [![build](https://github.com/databrickslabs/dbldatagen/workflows/build/badge.svg?branch=master)](https://github.com/databrickslabs/dbldatagen/actions?query=workflow%3Abuild+branch%3Amaster)
 [![codecov](https://codecov.io/gh/databrickslabs/dbldatagen/branch/master/graph/badge.svg)](https://codecov.io/gh/databrickslabs/dbldatagen)
@@ -23,6 +26,7 @@ It has no dependencies on any libraries that are not already incuded in the Data
 runtime, and you can use it from Scala, R or other languages by defining
 a view over the generated data.
 
+### Feature Summary
 It supports:
 * Generating synthetic data at scale up to billions of rows within minutes using appropriately sized clusters
 * Generating repeatable, predictable data supporting the needs for producing multiple tables, Change Data Capture,
@@ -43,16 +47,32 @@ used in other computations
 * plugin mechanism to allow use of 3rd party libraries such as Faker
 * Use of data generator to generate data sources in Databricks Delta Live Tables
 
-Details of these features can be found in the [Developer Docs](docs/source/APIDOCS.md) and the online help
-(which contains the full documentation including the HTML version of the Developer Docs) -
-[Online Help](https://databrickslabs.github.io/dbldatagen/public_docs/index.html).
+Details of these features can be found in the online documentation -
+[online documentation](https://databrickslabs.github.io/dbldatagen/public_docs/index.html).
 
+## Documentation
 
+Please refer to the [online documentation](https://databrickslabs.github.io/dbldatagen/public_docs/index.html) for
+details of use and many examples.
 
-## Project Support
-Please note that all projects in the `databrickslabs` github space are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.
+Release notes and details of the latest changes for this specific release
+can be found in the Github repository
+[here](https://github.com/databrickslabs/dbldatagen/blob/release/v0.2.1/CHANGELOG.md)
+
+# Installation
+
+Use `pip install dbldatagen` to install the PyPi package
+
+Within a Databricks notebook, invoke the following in a notebook cell
+```commandline
+%pip install dbdatagen
+```
+
+This can be invoked within a Databricks notebook, a Delta Live Tables pipeline and even works on the Databricks
+community edition.
 
-Any issues discovered through the use of this project should be filed as GitHub Issues on the Repo. They will be reviewed as time permits, but there are no formal SLAs for support.
+The documentation [installation notes](https://databrickslabs.github.io/dbldatagen/public_docs/installation_notes.html)
+contains details of installation using alternative mechanisms.
 
 ## Compatibility
 The Databricks Labs data generator framework can be used with Pyspark 3.x and Python 3.6 or later
@@ -65,23 +85,6 @@ release notes for library compatibility
 
 - https://docs.databricks.com/release-notes/runtime/releases.html
 
-## Using a pre-built release
-The release binaries can be accessed at:
-- Databricks Labs Github Data Generator releases - https://github.com/databrickslabs/dbldatagen/releases
-
-You can install the library as a notebook scoped library when working within the Databricks
-notebook environment through the use of a `%pip install` cell in your notebook.
-
-To install as a notebook-scoped library, create and execute a notebook cell with the following text:
-
-> `%pip install git+https://github.com/databrickslabs/dbldatagen@current`
-
-The `%pip install` method will work in Delta Live Tables pipelines and in the Databricks Community
-Environment also.
-
-Alternatively, you can download a wheel file and install using the Databricks install mechanism to install a wheel based
-library into your workspace.
-
 ## Using the Data Generator
 To use the data generator, install the library using the `%pip install` method or install the Python wheel directly
 in your environment.
@@ -98,58 +101,39 @@ data_rows = 1000 * 1000
 df_spec = (dg.DataGenerator(spark, name="test_data_set1", rows=data_rows,
                             partitions=4)
            .withIdOutput()
-           .withColumn("r", FloatType(), expr="floor(rand() * 350) * (86400 + 3600)",
-                       numColumns=column_count)
+           .withColumn("r", FloatType(),
+                       expr="floor(rand() * 350) * (86400 + 3600)",
+                       numColumns=column_count)
            .withColumn("code1", IntegerType(), minValue=100, maxValue=200)
            .withColumn("code2", IntegerType(), minValue=0, maxValue=10)
            .withColumn("code3", StringType(), values=['a', 'b', 'c'])
-           .withColumn("code4", StringType(), values=['a', 'b', 'c'], random=True)
-           .withColumn("code5", StringType(), values=['a', 'b', 'c'], random=True, weights=[9, 1, 1])
+           .withColumn("code4", StringType(), values=['a', 'b', 'c'],
+                       random=True)
+           .withColumn("code5", StringType(), values=['a', 'b', 'c'],
+                       random=True, weights=[9, 1, 1])
 
            )
 
 df = df_spec.build()
 num_rows=df.count()
 ```
+Refer to the [online documentation](https://databrickslabs.github.io/dbldatagen/public_docs/index.html) for further
+examples.
 
+The Github repository also contains further examples in the examples directory
 
-# Building the code
-
-See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed build and testing instructions, including use of alternative
-build environments such as conda.
-
-Dependencies are maintained by [Pipenv](https://pipenv.pypa.io/). In order to start with depelopment,
-you should install `pipenv` and `pyenv`.
-
-Use `make test-with-html-report` to build and run the tests with a coverage report.
-
-Use `make dist` to make the distributable. The resulting wheel file will be placed in the `dist` subdirectory.
-
-## Creating the HTML documentation
-
-Run `make docs` from the main project directory.
-
-The main html document will be in the file (relative to the root of the build directory) `./python/docs/docs/build/html/index.html`
-
-## Running unit tests
-
-If using an environment with multiple Python versions, make sure to use virtual env or similar to pick up correct python versions.
+## Project Support
+Please note that all projects released under [`Databricks Labs`](https://www.databricks.com/learn/labs)
+are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements
+(SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket
+relating to any issues arising from the use of these projects.
 
-If necessary, set `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` to point to correct versions of Python.
+Any issues discovered through the use of this project should be filed as issues on the Github Repo.
+They will be reviewed as time permits, but there are no formal SLAs for support.
 
-Run `make test` from the main project directory to run the unit tests.
 
 ## Feedback
 
 Issues with the application? Found a bug? Have a great idea for an addition?
-Feel free to file an issue.
-
-## Project Support
-
-Please note that all projects in the /databrickslabs github account are provided for your exploration only, and are
-not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not
-make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use
-of these projects.
+Feel free to file an [issue](https://github.com/databrickslabs/dbldatagen/issues/new).
 
-Any issues discovered through the use of this project should be filed as GitHub Issues on the Repo. They will
-be reviewed as time permits, but there are no formal SLAs for support.
````
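The README example in this diff builds `code5` with `values=['a', 'b', 'c'], random=True, weights=[9, 1, 1]` and `r` from the SQL expression `floor(rand() * 350) * (86400 + 3600)`. A minimal sketch of what those two specs should produce, assuming the weights are relative (so `'a'` appears in roughly 9 of every 11 rows) and that `rand()` is uniform on [0.0, 1.0) as in Spark SQL. It uses Python's `random` module as a stand-in; it is not dbldatagen's implementation and needs no Spark cluster.

```python
import math
import random
from collections import Counter

# Stand-in for the README's "code5" column: values ['a', 'b', 'c'] with weights [9, 1, 1].
# Assumption: weights are relative, so each value's expected share is weight / sum(weights).
values = ["a", "b", "c"]
weights = [9, 1, 1]
expected = {v: w / sum(weights) for v, w in zip(values, weights)}

# Simulate the draw with Python's weighted sampler (not dbldatagen's own code path).
rng = random.Random(42)
sample = rng.choices(values, weights=weights, k=110_000)
counts = Counter(sample)
observed = {v: counts[v] / len(sample) for v in values}

for v in values:
    print(f"{v}: expected {expected[v]:.3f}, observed {observed[v]:.3f}")

# Stand-in for the "r" column: floor(rand() * 350) * (86400 + 3600).
# floor(rand() * 350) is an integer in 0..349, so every value is a multiple of 90000
# between 0 and 349 * 90000.
step = 86400 + 3600
r_values = [math.floor(rng.random() * 350) * step for _ in range(10_000)]
print(min(r_values), max(r_values), 349 * step)
```

Using a seeded `random.Random` instance keeps the simulation repeatable, which mirrors the README's emphasis on repeatable, predictable data.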

dbldatagen/_version.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -33,5 +33,5 @@ def get_version(version):
     return version_info
 
 
-__version__ = "0.2.0-rc1" # DO NOT EDIT THIS DIRECTLY! It is managed by bumpversion
+__version__ = "0.2.1" # DO NOT EDIT THIS DIRECTLY! It is managed by bumpversion
 __version_info__ = get_version(__version__)
```
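This change moves `__version__` from the pre-release string `0.2.0-rc1` to the final release `0.2.1` and passes it to `get_version`. Only the `return version_info` line of that function is visible in this diff. As an illustration only, the hypothetical `parse_version` helper and `VersionInfo` tuple below (not the project's `get_version`) show one way such strings split into numeric parts and an optional pre-release tag:

```python
from typing import NamedTuple


class VersionInfo(NamedTuple):
    """Hypothetical structured form of a version string such as "0.2.1"."""
    major: int
    minor: int
    patch: int
    release: str  # pre-release tag such as "rc1", or "" for a final release


def parse_version(version: str) -> VersionInfo:
    """Split 'MAJOR.MINOR.PATCH[-TAG]' into its components (illustrative only)."""
    # partition() returns ('0.2.0', '-', 'rc1') for '0.2.0-rc1' and ('0.2.1', '', '') for '0.2.1'.
    core, _, tag = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    return VersionInfo(major, minor, patch, tag)


print(parse_version("0.2.0-rc1"))  # VersionInfo(major=0, minor=2, patch=0, release='rc1')
print(parse_version("0.2.1"))      # VersionInfo(major=0, minor=2, patch=1, release='')
```

Because `__version__` is managed by bumpversion, a helper like this would only read the string, never write it.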

docs/source/APIDOCS.md

Lines changed: 11 additions & 17 deletions
```diff
@@ -19,14 +19,8 @@ For example, at the time of writing, a billion row version of the IOT data set e
 can be generated and written to a Delta table in
 [under 2 minutes using a 12 node x 8 core cluster (using DBR 8.3)](#scaling-it-up)
 
-> NOTE: The markup version of this document does not cover all of the classes and methods in the codebase.
-> For further information on classes and methods contained in these modules, and
-> to explore the python documentation for these modules, build the HTML documentation from
-> the main project directory using `make docs`. Use your browser to explore the documentation by
-> starting with the html file `./docs/build/html/index.html`
->
-> If you are viewing the online help version of this document, the classes and methods are already included.
-
+> NOTE: The markup version of this document does not cover all of the classes and methods in the codebase and some links
+> may not work. To see the documentation for the latest release, see the online documentation.
 
 ## General Overview
 
@@ -54,13 +48,15 @@ and [formatting on string columns](textdata)
 
 ## Tutorials and examples
 
-In the root directory of the project, there are a number of examples and tutorials.
+In the
+[Github project directory](https://github.com/databrickslabs/dbldatagen/tree/release/v0.2.1) ,
+there are a number of examples and tutorials.
 
 The Python examples in the `examples` folder can be run directly or imported into the Databricks runtime environment
 as Python files.
 
-The examples in the `tutorials` folder are in notebook export format and are intended to be imported into the Databricks
-runtime environment.
+The examples in the `tutorials` folder are in Databricks notebook export format and are intended to be imported
+into the Databricks workspace environment.
 
 ## Basic concepts
 
@@ -99,24 +95,22 @@ There is also support for applying arbitrary SQL expressions, and generation of
 ### Getting started
 
 Before using the data generator, you need to install the package in your environment and import it in your code.
-You can install the package from the Github releases as a library on your cluster.
+You can install the package from PyPi as a library on your cluster.
 
 > NOTE: When running in a Databricks notebook environment, you can install directly using
 > the `%pip` command in a notebook cell
 >
 > To install as a notebook scoped library, add a cell with the following text and execute it:
 >
-> `%pip install git+https://github.com/databrickslabs/dbldatagen@current`
+> `%pip install dbldatagen`
 
 The `%pip install` method will work in the Databricks Community Environment and in Delta Live Tables pipelines also.
 
-You can also manually download a wheel file from the releases and install it in your environment.
+You can find more details and alternative installation methods at [Installation notes](installation_notes)
 
-The releases are located at
+The Github based releases are located at
 [Databricks Labs Data Generator releases](https://github.com/databrickslabs/dbldatagen/releases)
 
-You can find more details at [Installation notes](installation_notes)
-
 Once installed, import the framework in your Python code to use it.
 
 For example:
```
