Commit 346c4c2

Merge pull request #147 from rustprooflabs/bash-to-python
Rewrite Docker runtime in Python from Bash
2 parents 0d6772b + 36c5e01 commit 346c4c2

19 files changed: +1915 -503 lines changed
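
For orientation, the command-line surface that this commit documents in DOCKER-RUN.md can be sketched with the standard library. This is only a hedged illustration: the documented `--help` output looks like it comes from click, while this sketch swaps in `argparse` to stay dependency-free. The function name `build_parser` is hypothetical, `--basepath` and `--schema-name` are left out, and the defaults are copied from the documented help text rather than from `docker/pgosm_flex.py` itself.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse mirror of the options listed in the --help output."""
    parser = argparse.ArgumentParser(
        prog="pgosm_flex.py",
        description="Logic to run PgOSM Flex within Docker.",
    )
    parser.add_argument("--layerset", default="run-all",
                        help="Layer set from PgOSM Flex to load.")
    parser.add_argument("--ram", type=int, default=4,
                        help="RAM in GB available on the server.")
    parser.add_argument("--region", default="north-america/us",
                        help="Region name as used by Geofabrik.")
    parser.add_argument("--subregion", default="district-of-columbia",
                        help="Sub-region name as used by Geofabrik.")
    parser.add_argument("--srid", default=None,
                        help="SRID for data in PostGIS.")
    parser.add_argument("--pgosm-date", default=None,
                        help="Date of the data in YYYY-MM-DD format.")
    parser.add_argument("--language", default=None,
                        help="Preferred language, e.g. 'en' or 'kn'.")
    # Boolean switches: present means True, absent means False.
    parser.add_argument("--skip-nested", action="store_true",
                        help="Skip calculating nested admin polygons.")
    parser.add_argument("--data-only", action="store_true",
                        help="Skip running Sqitch and importing QGIS styles.")
    parser.add_argument("--debug", action="store_true",
                        help="Enable additional log output.")
    return parser


# Parse a subset of the flags used in the documented `docker exec` example.
args = build_parser().parse_args(
    ["--layerset=run-all", "--ram=8", "--region=north-america/us",
     "--subregion=district-of-columbia", "--srid=4326", "--skip-nested"]
)
print(args.ram, args.srid, args.skip_nested)
```

Named options replace the positional arguments of the earlier Bash script, so the order of `--region`, `--ram` and the other flags no longer matters.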

.dockerignore

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+tests/
+**/.coverage
+*.md
+.github
+.gitignore
+Makefile
+.dockerignore
+Dockerfile

.gitignore

Lines changed: 4 additions & 1 deletion
@@ -1 +1,4 @@
-tests/tmp/*
+tests/tmp/*
+output/
+**/.coverage
+**/__pycache__

DOCKER-BUILD.md

Lines changed: 2 additions & 2 deletions
@@ -17,13 +17,13 @@ docker build -t rustprooflabs/pgosm-flex .
 
 Tag with version.
 
-```
+```bash
 docker build -t rustprooflabs/pgosm-flex:0.1.2 .
 ```
 
 Push to Docker Hub.
 
-```
+```bash
 docker push rustprooflabs/pgosm-flex:0.1.2
 docker push rustprooflabs/pgosm-flex:latest
 ```

DOCKER-RUN.md

Lines changed: 75 additions & 47 deletions
@@ -1,7 +1,7 @@
 # Using PgOSM-Flex within Docker
 
 This README provides details about running PgOSM-Flex using the image defined
-in `Dockerfile` and the script loaded from `docker/run_pgosm_flex.sh`.
+in `Dockerfile` and the script loaded from `docker/pgosm_flex.py`.
 
 
 ## Setup and Run Container
@@ -12,6 +12,15 @@ Create directory for the `.osm.pbf` file and output `.sql` file.
 mkdir ~/pgosm-data
 ```
 
+
+Set environment variables for the temporary Postgres connection in Docker.
+
+```bash
+export POSTGRES_USER=postgres
+export POSTGRES_PASSWORD=mysecretpassword
+```
+
+
 Start the `pgosm` Docker container to make PostgreSQL/PostGIS available.
 This command exposes Postgres inside Docker on port 5433 and establishes links
 to the local directory created above (`~/pgosm-data`).
@@ -20,10 +29,10 @@ your the host machine's timezone, important when for archiving PBF & MD5 files b
 
 
 ```bash
-docker run --name pgosm -d \
+docker run --name pgosm -d --rm \
     -v ~/pgosm-data:/app/output \
     -v /etc/localtime:/etc/localtime:ro \
-    -e POSTGRES_PASSWORD=mysecretpassword \
+    -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
     -p 5433:5432 -d rustprooflabs/pgosm-flex
 ```
 
@@ -32,84 +41,103 @@ docker run --name pgosm -d \
 The following `docker exec` command runs PgOSM Flex to load the District of Columbia
 region
 
-The command `bash docker/run_pgosm_flex.sh` runs the full process. The
+The command `python3 docker/pgosm_flex.py` runs the full process. The
 script uses a region (`north-america/us`) and sub-region (`district-of-columbia`)
 that must match values in URLs from the Geofabrik download server.
 The 3rd parameter tells the script the server has 8 GB RAM available for osm2pgsql, Postgres, and the OS. The PgOSM-Flex layer set is defined (`run-all`).
 
 
 ```bash
 docker exec -it \
-    -e POSTGRES_PASSWORD=mysecretpassword -e POSTGRES_USER=postgres \
-    pgosm bash docker/run_pgosm_flex.sh \
-    north-america/us \
-    district-of-columbia \
-    8 \
-    run-all
+    -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD -e POSTGRES_USER=$POSTGRES_USER \
+    pgosm python3 docker/pgosm_flex.py \
+    --layerset=run-all --ram=8 \
+    --region=north-america/us \
+    --subregion=district-of-columbia
 ```
 
 
 ## Customize PgOSM-Flex
 
-The following command sets the four (4) main env vars used to customize PgOSM-Flex.
-
-* `PGOSM_SRID` - Set custom SRID, must be in `public.spatial_ref_sys`. Default `3857`
-* `PGOSM_DATA_SCHEMA_NAME` - Final schema name for the OpenStreetMap data. Default `osm`
-* `PGOSM_DATA_SCHEMA_ONLY` - When `false` (default) the QGIS styles and `pgosm` schema are exported along with the `PGOSM_DATA_SCHEMA_NAME` schema
-* `PGOSM_DATE` - Used to document data loaded to DB in `osm.pgosm_flex.pgosm_date`, and for archiving PBF/MD5 files. Defaults to today.
-* `PGOSM_LANGUAGE` - Used to prefer specific language when it exists.
-
+See full set of options via `--help`.
 
 ```bash
-docker exec -it \
-    -e POSTGRES_PASSWORD=mysecretpassword -e POSTGRES_USER=postgres \
-    -e PGOSM_SRID=4326 \
-    -e PGOSM_DATA_SCHEMA_ONLY=true \
-    -e PGOSM_DATA_SCHEMA_NAME=osm_dc \
-    -e PGOSM_DATE='2021-03-11' \
-    -e PGOSM_LANGUAGE=en \
-    pgosm bash docker/run_pgosm_flex.sh \
-    north-america/us \
-    district-of-columbia \
-    8 \
-    run-all
+docker exec -it pgosm python3 docker/pgosm_flex.py --help
 ```
 
-## Skip nested polygon calculation
+```bash
+Usage: pgosm_flex.py [OPTIONS]
+
+  Logic to run PgOSM Flex within Docker.
+
+Options:
+  --layerset TEXT     Layer set from PgOSM Flex to load. e.g. run-all
+                      [default: (run-all);required]
+  --ram INTEGER       Amount of RAM in GB available on the server running this
+                      process. [default: 4;required]
+  --region TEXT       Region name matching the filename for data sourced from
+                      Geofabrik. e.g. north-america/us [default: (north-
+                      america/us);required]
+  --subregion TEXT    Sub-region name matching the filename for data sourced
+                      from Geofabrik. e.g. district-of-columbia [default:
+                      (district-of-columbia)]
+  --srid TEXT         SRID for data in PostGIS.
+  --pgosm-date TEXT   Date of the data in YYYY-MM-DD format. Set to historic
+                      date to load locally archived PBF/MD5 file, will fail if
+                      both files do not exist.
+  --language TEXT     Set default language in loaded OpenStreetMap data when
+                      available. e.g. 'en' or 'kn'.
+  --schema-name TEXT  Coming soon
+  --skip-nested       When True, skips calculating nested admin polygons. Can
+                      be time consuming on large regions.
+  --data-only         When True, skips running Sqitch and importing QGIS
+                      Styles.
+  --debug             Enables additional log output
+  --basepath TEXT     Debugging option. Used when testing locally and not
+                      within Docker
+  --help              Show this message and exit.
+```
 
-The default is to run the nested polygon calculation. This can take considerable time on larger regions or may
-be otherwise unwanted. Define the env var `PGOSM_SKIP_NESTED_POLYGON` with the `docker exec` command
-to skip this process.
+An example of running with all current options, except `--basepath` which is only
+used during development.
 
 ```bash
--e PGOSM_SKIP_NESTED_POLYGON=anything
+docker exec -it \
+    -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD -e POSTGRES_USER=$POSTGRES_USER \
+    pgosm python3 docker/pgosm_flex.py \
+    --layerset=run-all \
+    --ram=8 \
+    --region=north-america/us \
+    --subregion=district-of-columbia \
+    --schema-name=osm_dc \
+    --pgosm-date="2021-03-11" \
+    --language="en" \
+    --srid="4326" \
+    --data-only \
+    --skip-nested \
+    --debug
 ```
 
 
-## Always download
+## Skip nested polygon calculation
 
-To force the processing to remove existing files and re-download the latest PBF and MD5 files from Geofabrik, set the `PGOSM_ALWAYS_DOWNLOAD` env var when running the Docker container.
+Use `--skip-nested` to bypass the calculation of nested admin polygons.
+
+The default is to run the nested polygon calculation. This can take considerable time on larger regions or may
+be otherwise unwanted.
 
-```bash
-docker run --name pgosm -d \
-    -v ~/pgosm-data:/app/output \
-    -e POSTGRES_PASSWORD=mysecretpassword \
-    -e PGOSM_ALWAYS_DOWNLOAD=1 \
-    -p 5433:5432 -d rustprooflabs/pgosm
-```
 
 ## Configure Postgres in Docker
 
 Add customizations with the `-c` switch, e.g. `-c shared_buffers=1GB`,
 to customize Postgres' configuration at run-time in Docker.
 
 
-
 ```bash
-docker run --name pgosm -d \
+docker run --name pgosm -d --rm \
     -v ~/pgosm-data:/app/output \
     -v /etc/localtime:/etc/localtime:ro \
-    -e POSTGRES_PASSWORD=mysecretpassword \
+    -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
     -p 5433:5432 -d rustprooflabs/pgosm-flex \
     -c shared_buffers=1GB \
     -c maintenance_work_mem=1GB \

Dockerfile

Lines changed: 5 additions & 5 deletions
@@ -9,23 +9,20 @@ RUN apt-get update \
         libboost-dev libboost-system-dev \
         libboost-filesystem-dev libexpat1-dev zlib1g-dev \
         libbz2-dev libpq-dev libproj-dev lua5.2 liblua5.2-dev \
-        python3 python3-distutils \
+        python3 python3-distutils python3-psycopg2 \
         curl \
     && rm -rf /var/lib/apt/lists/*
 
 RUN curl -o /tmp/get-pip.py https://bootstrap.pypa.io/get-pip.py \
     && python3 /tmp/get-pip.py \
     && rm /tmp/get-pip.py
 
-RUN pip install requests click
-
-
 WORKDIR /tmp
 RUN git clone git://github.com/openstreetmap/osm2pgsql.git \
     && mkdir osm2pgsql/build \
     && cd osm2pgsql/build \
     && cmake .. \
-    && make \
+    && make -j$(nproc) \
     && make install \
     && apt remove -y \
         make cmake g++ \
@@ -36,6 +33,9 @@ RUN git clone git://github.com/openstreetmap/osm2pgsql.git \
     && apt autoremove -y \
     && cd /tmp && rm -r /tmp/osm2pgsql
 
+COPY ./sqitch.conf /etc/sqitch/sqitch.conf
 
 WORKDIR /app
 COPY . ./
+
+RUN pip install -r requirements.txt

MANUAL-STEPS-RUN.md

Lines changed: 45 additions & 1 deletion
@@ -3,7 +3,7 @@
 These instructions show how to manually run the PgOSM-Flex process.
 This is the best option for scaling to larger regions (North America, Europe, etc.)
 due to the need to customize a number of configurations. Review the
-`docker/run_pgosm_flex.sh` for a starting point to automating the process.
+`python3 docker/pgosm_flex.py` for a starting point to automating the process.
 
 This basic working example uses Washington D.C. for a small, fast test of the
 process.
@@ -253,3 +253,47 @@ psql -d pgosm -f data/roads-us.sql
 
 Currently only U.S. region drafted, more regions with local `maxspeed` are welcome via PR!
 
+
+## Customize PgOSM Flex
+
+Track additional details in the `osm.pgosm_meta` table (see more below)
+and customize behavior with the use of environment variables.
+
+* `OSM_DATE`
+* `PGOSM_SRID`
+* `PGOSM_REGION`
+* `PGOSM_LANGUAGE`
+
+
+### Custom SRID
+
+To use `SRID 4326` instead of the default `SRID 3857`, set the `PGOSM_SRID`
+environment variable before running osm2pgsql.
+
+```bash
+export PGOSM_SRID=4326
+```
+
+Changes to the SRID are reflected in output printed.
+
+```bash
+2021-01-08 15:01:15 osm2pgsql version 1.4.0 (1.4.0-72-gc3eb0fb6)
+2021-01-08 15:01:15 Database version: 13.1 (Ubuntu 13.1-1.pgdg20.10+1)
+2021-01-08 15:01:15 Node-cache: cache=800MB, maxblocks=12800*65536, allocation method=11
+Custom SRID: 4326
+...
+```
+
+### Preferred Language
+
+The `name` column throughout PgOSM-Flex defaults to using the highest priority
+name tag according to the [OSM Wiki](https://wiki.openstreetmap.org/wiki/Names). Setting `PGOSM_LANGUAGE` allows giving preference to name tags with the
+given language.
+The value of `PGOSM_LANGUAGE` should match the codes used by OSM:
+
+> where code is a lowercase language's ISO 639-1 alpha2 code, or a lowercase ISO 639-2 code if an ISO 639-1 code doesn't exist." -- [Multilingual names on OSM Wiki](https://wiki.openstreetmap.org/wiki/Multilingual_names)
+
+
+```bash
+export PGOSM_LANGUAGE=kn
+```
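
The environment variables introduced in this section could also be read from a Python wrapper. The following is a minimal sketch, assuming only what the text states: `PGOSM_SRID` defaults to 3857, and `PGOSM_LANGUAGE` is optional. The helper name `read_pgosm_env` is hypothetical and is not part of PgOSM Flex.

```python
import os


def read_pgosm_env(environ=None):
    """Return PgOSM Flex customization values found in the environment.

    Hypothetical helper for illustration only.
    """
    env = os.environ if environ is None else environ
    return {
        # SRID falls back to 3857, the default named in the text above.
        "srid": int(env.get("PGOSM_SRID", "3857")),
        # None means no language preference was requested.
        "language": env.get("PGOSM_LANGUAGE"),
    }


# Values as they would look after the two `export` commands above.
print(read_pgosm_env({"PGOSM_SRID": "4326", "PGOSM_LANGUAGE": "kn"}))
# Nothing set: only the default SRID applies.
print(read_pgosm_env({}))
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in a unit test without touching the real environment.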

Makefile

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+## ----------------------------------------------------------------------
+## This Makefile builds and tests the PgOSM Flex Docker image.
+##
+## For full build/test use:
+## make
+##
+## To cleanup after you are done:
+## make docker-clean
+## ----------------------------------------------------------------------
+CURRENT_UID := $(shell id -u)
+CURRENT_GID := $(shell id -g)
+TODAY := $(shell date +'%Y-%m-%d')
+
+.PHONY: all
+all: docker-clean build-run-docker unit-tests
+
+.PHONY: docker-clean
+docker-clean: ## Stops pgosm Docker container and removes local pgosm-data directory
+	@docker stop pgosm > /dev/null 2>&1 && echo "pgosm container removed"|| echo "pgosm container not present, nothing to remove"
+	rm -rvf pgosm-data|| echo "folder pgosm-data did not exist"
+
+
+.PHONY: build-run-docker
+build-run-docker: ## Builds and runs PgOSM Flex with D.C. test file
+	docker build -t rustprooflabs/pgosm-flex .
+	docker run --name pgosm \
+		--rm \
+		-v $(shell pwd)/pgosm-data:/app/output \
+		-v /etc/localtime:/etc/localtime:ro \
+		-e POSTGRES_PASSWORD=mysecretpassword \
+		-p 5433:5432 \
+		-d \
+		rustprooflabs/pgosm-flex
+	# copy the test data pretending it's latest to avoid downloading each time
+	docker cp tests/data/district-of-columbia-2021-01-13.osm.pbf \
+		pgosm:/app/output/district-of-columbia-$(TODAY).osm.pbf
+	docker cp tests/data/district-of-columbia-2021-01-13.osm.pbf.md5 \
+		pgosm:/app/output/district-of-columbia-$(TODAY).osm.pbf.md5
+
+	# allow files created in later step to be created
+	docker exec -it pgosm \
+		chown $(CURRENT_UID):$(CURRENT_GID) /app/output/
+	# Needed for unit-tests
+	docker exec -it pgosm \
+		chown $(CURRENT_UID):$(CURRENT_GID) /app/docker/
+
+	docker exec -it \
+		-e POSTGRES_PASSWORD=mysecretpassword \
+		-e POSTGRES_USER=postgres \
+		-u $(CURRENT_UID):$(CURRENT_GID) \
+		pgosm python3 docker/pgosm_flex.py \
+		--layerset=run-all \
+		--ram=1 \
+		--region=north-america/us \
+		--subregion=district-of-columbia \
+		--debug
+
+
+.PHONY: unit-tests
+unit-tests: ## Runs Python unit tests and data import tests
+	# Unit tests covering Python runtime
+	docker exec -it \
+		-e POSTGRES_PASSWORD=mysecretpassword \
+		-e POSTGRES_USER=postgres \
+		-u $(CURRENT_UID):$(CURRENT_GID) \
+		pgosm /bin/bash -c "cd docker && coverage run -m unittest tests/*.py"
+
+	# Data import tests
+	docker cp tests \
+		pgosm:/app/tests
+	docker exec -it pgosm \
+		chown $(CURRENT_UID):$(CURRENT_GID) /app/tests/
+
+	# Detailed results from tests currently get buried in the Docker container
+	# Error such as: FAILED TEST: sql/amenity_point_osm_type_count.sql - See tmp/amenity_point_osm_type_count.diff
+	# Use command (changing file at end): docker exec -it -e POSTGRES_PASSWORD=mysecretpassword -e POSTGRES_USER=postgres pgosm /bin/cat /app/tests/tmp/amenity_point_osm_type_count.diff
+	docker exec -it \
+		-e POSTGRES_PASSWORD=mysecretpassword \
+		-e POSTGRES_USER=postgres \
+		-u $(CURRENT_UID):$(CURRENT_GID) \
+		pgosm /bin/bash -c "cd tests && ./run-output-tests.sh"
+
+
+help: ## Show this help
+	@egrep -h '\s##\s' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m %-30s\033[0m %s\n", $$1, $$2}'
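
The `docker cp` steps in `build-run-docker` rename the bundled test file so that it carries today's date, which lets the run pick it up as if it were the latest download. The naming pattern can be sketched in Python as below. This is a hedged illustration: the function name `dated_pbf_name` is hypothetical, and the pattern `<subregion>-YYYY-MM-DD.osm.pbf` is inferred only from the file names in the Makefile.

```python
from datetime import date


def dated_pbf_name(subregion, day=None):
    """Build '<subregion>-YYYY-MM-DD.osm.pbf', defaulting to today's date."""
    day = date.today() if day is None else day
    # date.isoformat() yields YYYY-MM-DD, like `date +'%Y-%m-%d'` in the Makefile.
    return f"{subregion}-{day.isoformat()}.osm.pbf"


pbf = dated_pbf_name("district-of-columbia", date(2021, 1, 13))
print(pbf)           # district-of-columbia-2021-01-13.osm.pbf
print(pbf + ".md5")  # district-of-columbia-2021-01-13.osm.pbf.md5
```

Calling `dated_pbf_name("district-of-columbia")` with no date gives the name that corresponds to the `$(TODAY)` target paths in the Makefile.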
