# Data Files

PgOSM Flex automatically manages downloading the appropriate data and `.md5`
files from the [Geofabrik download server](https://download.geofabrik.de/).
With the default behavior, PgOSM Flex downloads the two necessary files:

* `<region/subregion>-latest.osm.pbf`
* `<region/subregion>-latest.osm.pbf.md5`

The data path on the host machine is set by the `-v` option of the `docker run`
command. This documentation always uses `~/pgosm-data`, per the
[quick start](quick-start.md).

```bash
docker run --name pgosm -d --rm \
    -v ~/pgosm-data:/app/output \
    ...
```

> See the [Selecting Region and Sub-region](common-customization.md#selecting-region-and-subregion)
> section for more about the default behavior.

There are two ways to override the default download behavior: specify
`--pgosm-date` or use `--input-file`.
If you have manually saved files with `-latest` in the filename in the data
path used by PgOSM Flex, they **will be overwritten** unless you use one of the
methods described below.

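By default PgOSM Flex validates the MD5 checksum itself. If you copy `.osm.pbf` and `.md5` files into the data path by hand, you may want to check a pair before running. The following is a minimal sketch, not part of PgOSM Flex: `verify_pbf_md5` is a hypothetical helper that assumes GNU `md5sum` is installed and that the hash is the first field of the `.md5` file.

```bash
# Hypothetical helper (not part of PgOSM Flex): compare a .osm.pbf file
# against the hash stored in its companion .md5 file.
# Assumes GNU md5sum and that the hash is the first field of the .md5 file.
# Returns 0 on match, 1 on mismatch, 2 if either file is missing.
verify_pbf_md5() {
    local pbf="$1"
    local md5_file="${2:-$1.md5}"   # Default: "<pbf>.md5" next to the PBF.
    local expected actual

    if [ ! -f "$pbf" ] || [ ! -f "$md5_file" ]; then
        echo "Missing $pbf or $md5_file" >&2
        return 2
    fi

    expected=$(cut -d ' ' -f 1 "$md5_file")
    actual=$(md5sum "$pbf" | cut -d ' ' -f 1)

    if [ "$expected" = "$actual" ]; then
        echo "OK: $pbf"
    else
        echo "MISMATCH: $pbf (expected $expected, got $actual)" >&2
        return 1
    fi
}
```

For example, `verify_pbf_md5 ~/pgosm-data/district-of-columbia-latest.osm.pbf` checks the file against `district-of-columbia-latest.osm.pbf.md5` in the same directory.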
## Specific date with `--pgosm-date`

Use `--pgosm-date` to load data from a specific date. The date must be in
`yyyy-mm-dd` format.
This mode requires that a valid `.pbf` file and its matching `.md5` file for
that date already exist in the data path. The following example shows the
`docker exec` command with `--pgosm-date` defined.

```bash
docker exec -it \
    pgosm python3 docker/pgosm_flex.py \
    --ram=8 \
    --region=north-america/us \
    --subregion=district-of-columbia \
    --pgosm-date=2024-05-14
```

The output should confirm that PgOSM Flex finds and uses the file with the
specified date.
Remember, the paths reported from Docker (`/app/output/`) are
container-internal paths, not your local paths on the host. With the
`docker run` example above, `/app/output/` corresponds to `~/pgosm-data`.

```bash
INFO:pgosm-flex:geofabrik:PBF File exists /app/output/district-of-columbia-2024-05-14.osm.pbf
INFO:pgosm-flex:geofabrik:PBF & MD5 files exist. Download not needed
INFO:pgosm-flex:geofabrik:Copying Archived files
INFO:pgosm-flex:pgosm_flex:Running osm2pgsql
```

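When reading log lines like these on the host, a small helper can translate container paths back into host paths. This is a sketch that assumes the `-v ~/pgosm-data:/app/output` mount from the `docker run` example. The `to_host_path` function and the `HOST_DATA_DIR` variable are hypothetical names, not part of PgOSM Flex.

```bash
# Hypothetical helper: map a container path from the PgOSM Flex logs to the
# matching host path. Assumes the mount from the docker run example:
#   -v ~/pgosm-data:/app/output
HOST_DATA_DIR="${HOST_DATA_DIR:-$HOME/pgosm-data}"
CONTAINER_DATA_DIR="/app/output"

to_host_path() {
    case "$1" in
        "$CONTAINER_DATA_DIR"/*)
            # Swap the container prefix for the host directory.
            printf '%s\n' "$HOST_DATA_DIR/${1#"$CONTAINER_DATA_DIR"/}"
            ;;
        *)
            # Not under the mounted directory: return the path unchanged.
            printf '%s\n' "$1"
            ;;
    esac
}
```

For example, `to_host_path /app/output/district-of-columbia-2024-05-14.osm.pbf` prints the same file name under `~/pgosm-data` (expanded to your home directory).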
If no file matches the specified date, PgOSM Flex logs an error and exits.

```bash
ERROR:pgosm-flex:geofabrik:Missing PBF file for 2024-05-15. Cannot proceed.
```

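To catch this error before starting the container work, you can check the data path on the host first. This is a minimal sketch, not part of PgOSM Flex. The `.osm.pbf` naming (`<subregion>-<yyyy-mm-dd>.osm.pbf`) is taken from the log output above. The `.osm.pbf.md5` naming is an assumption. Both function names are hypothetical.

```bash
# Hypothetical pre-flight check (not part of PgOSM Flex) for --pgosm-date.
# Assumed naming, based on the log output:
#   <subregion>-<yyyy-mm-dd>.osm.pbf and <subregion>-<yyyy-mm-dd>.osm.pbf.md5
# Returns 0 if both files exist, 1 if any is missing, 2 for a bad date format.
check_pgosm_date_files() {
    local data_dir="$1" subregion="$2" pgosm_date="$3"
    local pbf f status=0

    # --pgosm-date must use yyyy-mm-dd format.
    if ! printf '%s\n' "$pgosm_date" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'; then
        echo "Invalid date '$pgosm_date', expected yyyy-mm-dd" >&2
        return 2
    fi

    pbf="$data_dir/$subregion-$pgosm_date.osm.pbf"
    for f in "$pbf" "$pbf.md5"; do
        if [ ! -f "$f" ]; then
            echo "Missing: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# List the dates that have both files for a subregion, oldest first.
list_pgosm_dates() {
    local data_dir="$1" subregion="$2" pbf name
    for pbf in "$data_dir/$subregion"-????-??-??.osm.pbf; do
        [ -f "$pbf" ] || continue        # No match leaves the pattern as-is.
        [ -f "$pbf.md5" ] || continue    # Skip dates without an .md5 file.
        name=${pbf##*/}                  # Drop the directory.
        name=${name#"$subregion"-}       # Drop the "<subregion>-" prefix.
        printf '%s\n' "${name%.osm.pbf}" # Drop the extension, leaving the date.
    done
}
```

Running `check_pgosm_date_files ~/pgosm-data district-of-columbia 2024-05-15` reports the missing files up front. `list_pgosm_dates ~/pgosm-data district-of-columbia` shows which dates have both files available.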
## Specific input file with `--input-file`

The automatic Geofabrik download can be overridden by providing PgOSM Flex
with the path to a valid `.osm.pbf` file using `--input-file`.
This option bypasses the default file handling, archiving, and MD5
checksum validation. With `--input-file` you can use a custom `.osm.pbf` file
you created, or remove the need for an internet connection on the instance
running the processing.

> Note: The `--region` option is always required. The `--subregion` option can be used with `--input-file` to populate the `subregion` column of `osm.pgosm_flex`.

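Because `docker exec` runs PgOSM Flex inside the container, the input file must be visible there. With the mount from the `docker run` example, anything copied into `~/pgosm-data` on the host appears under `/app/output` in the container. The following is a minimal sketch that assumes that mount and assumes `--input-file` accepts an absolute container path. `stage_input_file` and `HOST_DATA_DIR` are hypothetical names.

```bash
# Hypothetical helper: copy a custom .osm.pbf into the host data directory
# mounted at /app/output, then print the path as seen inside the container.
# Assumes the -v ~/pgosm-data:/app/output mount from the docker run example.
stage_input_file() {
    local src="$1"
    local host_dir="${HOST_DATA_DIR:-$HOME/pgosm-data}"
    local name="${src##*/}"

    case "$name" in
        *.osm.pbf) ;;
        *) echo "Expected a .osm.pbf file, got: $src" >&2; return 2 ;;
    esac

    mkdir -p "$host_dir" && cp "$src" "$host_dir/$name" || return 1
    printf '/app/output/%s\n' "$name"
}
```

For example, `stage_input_file ./denver.osm.pbf` copies the file to `~/pgosm-data/denver.osm.pbf` and prints `/app/output/denver.osm.pbf`.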
### Small area / custom extract

Even some of the smallest subregions provided by Geofabrik are quite large
compared to the area of interest for a project.
The `osmium` tool makes it quick and easy to
[extract a bounding box](https://docs.osmcode.org/osmium/latest/osmium-extract.html).
The following example extracts an area roughly around Denver, Colorado.
Extracting the 3.2 MB `denver.osm.pbf` output from the 239 MB Colorado input
takes about 3 seconds.

```bash
osmium extract --bbox=-105.0193,39.7323,-104.9687,39.7663 \
    -o denver.osm.pbf \
    colorado-2023-04-18.osm.pbf
```

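`osmium extract --bbox` expects the corners in `LEFT,BOTTOM,RIGHT,TOP` order, that is, minimum longitude, minimum latitude, maximum longitude, maximum latitude. Coordinates copied from a map are easy to mix up, so a small helper that sorts two opposite corners can help. This is a sketch: `make_bbox` and `num_gt` are hypothetical helper names. `awk` is used only for the decimal comparisons, because POSIX shell arithmetic is integer-only.

```bash
# Hypothetical helpers: build an osmium --bbox value in LEFT,BOTTOM,RIGHT,TOP
# order from two opposite corners given as "lon lat lon lat", in any order.

# Succeeds (exit 0) when decimal $1 is greater than decimal $2.
num_gt() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 > b + 0) }'
}

make_bbox() {
    local left="$1" bottom="$2" right="$3" top="$4" tmp
    # Ensure left <= right (longitudes) and bottom <= top (latitudes).
    if num_gt "$left" "$right"; then tmp=$left; left=$right; right=$tmp; fi
    if num_gt "$bottom" "$top"; then tmp=$bottom; bottom=$top; top=$tmp; fi
    # Print the original strings so no precision is lost.
    printf '%s,%s,%s,%s\n' "$left" "$bottom" "$right" "$top"
}
```

For example, `osmium extract --bbox="$(make_bbox -105.0193 39.7663 -104.9687 39.7323)" -o denver.osm.pbf colorado-2023-04-18.osm.pbf` passes `-105.0193,39.7323,-104.9687,39.7663` to osmium.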
Processing the smaller Denver extract with PgOSM Flex takes less than 20 seconds
on a typical laptop, versus 11 minutes for all of Colorado.

```bash
docker exec -it \
    pgosm python3 docker/pgosm_flex.py \
    --ram=8 \
    --region=custom \
    --subregion=denver \
    --input-file=denver.osm.pbf \
    --layerset=everything
```