# Demo notebooks and scripts for EPOS AI Platform

This repo contains notebooks and scripts demonstrating how to:

- Prepare IGF data for training a SeisBench model that detects the P phase (i.e. transform mseed files into the [SeisBench data format](https://seisbench.readthedocs.io/en/stable/pages/data_format.html)); see the [notebook](utils/Transforming%20mseeds%20to%20SeisBench%20dataset.ipynb).
- Explore the available data; see the [notebook](notebooks/Explore%20igf%20data.ipynb).
- Train various CNN models available in the SeisBench library and compare their performance at detecting the P phase; see the [script](scripts/pipeline.py).
- [to update] Validate model performance; see the [notebook](notebooks/Check%20model%20performance%20depending%20on%20station-random%20window.ipynb).
- [to update] Use a model to detect the P phase; see the [notebook](notebooks/Present%20model%20predictions.ipynb).
### Acknowledgments

This code is based on the [pick-benchmark](https://github.com/seisbench/pick-benchmark), the repository accompanying the paper:
[Which picker fits my data? A quantitative evaluation of deep learning based seismic pickers](https://doi.org/10.1029/2021JB023499)
### Installation method 1

Download and install [Mambaforge](https://github.com/conda-forge/miniforge#mambaforge) following the [official guide](https://github.com/conda-forge/miniforge#install).

After a successful installation, clone this repository from within the Mambaforge environment:

```
git clone ssh://git@git.plgrid.pl:7999/eai/platform-demo-scripts.git
```

Then, on Linux or Windows, run:
```
cd platform-demo-scripts
mamba env create -f epos-ai-train.yml
```
or, on macOS:
```
cd platform-demo-scripts
mamba env create -f epos-ai-train-osx.yml
```
This creates a conda environment named `platform-demo-scripts` with all required packages installed.

To run the notebooks and scripts from this repository, activate the `platform-demo-scripts` environment:

```
conda activate platform-demo-scripts
```
### Installation method 2

[Install Poetry](https://python-poetry.org/docs/#installation), a tool for dependency management and packaging in Python.
With this method, Poetry alone creates the Python environment and installs the dependencies.

To install all dependencies with Poetry, run:

```
poetry install
```

To run the notebooks and scripts from this repository, activate the Poetry environment:

```
poetry shell
```
### Usage

1. Prepare a `.env` file with the following content:

    ```
    WANDB_HOST="https://epos-ai.grid.cyfronet.pl/"
    WANDB_API_KEY="your key"
    WANDB_USER="your user"
    WANDB_PROJECT="training_seisbench_models_on_igf_data"
    BENCHMARK_DEFAULT_WORKER=2
    ```

2. Transform the data into the SeisBench format (unofficial):

    * Download the original data from the [drive](https://drive.google.com/drive/folders/1InVI9DLaD7gdzraM2jMzeIrtiBSu-UIK?usp=drive_link)
    * Run the notebook: `utils/Transforming mseeds to SeisBench dataset.ipynb`

3. Run the pipeline script:

    `python pipeline.py`
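
Before launching the pipeline, it can help to confirm that the `.env` file from step 1 defines every variable listed there. The following is a minimal sketch using only the Python standard library. The helper names `parse_env` and `missing_keys` are hypothetical, and treating all five variables as mandatory is an assumption, not something the pipeline is known to enforce.

```python
# Variable names taken from the .env example in step 1.
# Assumption: all of them must be set for the pipeline to run.
REQUIRED_KEYS = (
    "WANDB_HOST",
    "WANDB_API_KEY",
    "WANDB_USER",
    "WANDB_PROJECT",
    "BENCHMARK_DEFAULT_WORKER",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    try:
        with open(".env", encoding="utf-8") as env_file:
            settings = parse_env(env_file.read())
    except FileNotFoundError:
        settings = {}
    print("Missing variables:", missing_keys(settings))
```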

The `pipeline.py` script performs the following steps:

* Generates evaluation targets
* Trains multiple versions of GPD, PhaseNet and ... models to find the hyperparameters that produce the lowest validation loss.
  This step uses the Weights & Biases platform to run the hyperparameter search (called a sweep), track the training process and store the results.
  The results are available at
  `https://epos-ai.grid.cyfronet.pl/<your user name>/<your project name>`
* Uses the best-performing model of each type to generate predictions
* Evaluates the performance of each model by comparing the predictions with the evaluation targets
* Saves the results in the `scripts/pred` directory
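
The sweep in the training step can be pictured with a small, self-contained sketch. It is an illustration only: the dictionary follows the general layout of a Weights & Biases sweep configuration (`method`, `metric`, `parameters`), but the grid method, the metric name `val_loss`, the hyperparameter names and their values are all assumptions, not the settings used in this repository. The snippet does not call W&B; it only enumerates the combinations a grid sweep would try, with the run reaching the lowest validation loss being the one kept.

```python
from itertools import product

# Hypothetical sweep configuration; every name and value is illustrative.
sweep_config = {
    "method": "grid",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"values": [0.01, 0.001]},
        "batch_size": {"values": [128, 256]},
    },
}


def grid_runs(config):
    """Return one dict of hyperparameters per run of a grid sweep."""
    names = list(config["parameters"])
    value_lists = [config["parameters"][name]["values"] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


if __name__ == "__main__":
    for run in grid_runs(sweep_config):
        print(run)
```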

The default settings are stored in the `config.json` file. To change them, edit `config.json` or pass the new settings as arguments to the script.
For example, to change the sweep configuration file for the GPD model, run:

`python pipeline.py --gpd_config <new config file>`

The new config file should be placed in the `experiments` directory or in the location specified by the `configs_path` parameter in `config.json`.
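
How command-line arguments might take precedence over `config.json` can be sketched as follows. This is a minimal illustration, not the actual logic of `pipeline.py`: the `load_settings` helper, the `gpd_config` key and the example file names are assumptions. Only the `--gpd_config` flag and the `configs_path` parameter are taken from the description of the script.

```python
import argparse
import json


def load_settings(config_text, argv):
    """Merge JSON defaults with command-line overrides (overrides win)."""
    defaults = json.loads(config_text)

    parser = argparse.ArgumentParser()
    # Flag named in this README; a default of None means "not provided".
    parser.add_argument("--gpd_config", default=None)
    args = parser.parse_args(argv)

    # Keep only the options the user actually passed.
    overrides = {key: value for key, value in vars(args).items() if value is not None}
    return {**defaults, **overrides}


if __name__ == "__main__":
    # Hypothetical config.json content, for illustration only.
    example_config = '{"configs_path": "experiments", "gpd_config": "sweep_gpd.yaml"}'
    print(load_settings(example_config, ["--gpd_config", "my_sweep.yaml"]))
```

Calling `load_settings(example_config, [])` with no arguments returns the defaults unchanged.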
### Troubleshooting

* `wandb: ERROR Run .. errored: OSError(24, 'Too many open files')`
  -> https://github.com/wandb/wandb/issues/2825
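
A common workaround for this class of error, independent of whatever the linked issue recommends, is to raise the per-process limit on open files before training starts. The sketch below uses the standard `resource` module, which is available on Linux and macOS but not on Windows. The helper name `raise_open_file_limit` and the target of 4096 descriptors are assumptions chosen for illustration.

```python
import resource


def raise_open_file_limit(desired=4096):
    """Raise the soft open-file limit towards `desired`, capped by the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY

    if soft == unlimited:
        # Nothing to do: the soft limit is already unlimited.
        return soft, hard

    # An unprivileged process cannot exceed the hard limit.
    target = desired if hard == unlimited else min(desired, hard)
    if soft < target:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))

    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("Open-file limits (soft, hard):", raise_open_file_limit())
```

The same effect can be obtained for a shell session with `ulimit -n <number>` before running `python pipeline.py`.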
### Licence

TODO

### Copyright

Copyright © 2023 ACK Cyfronet AGH, Poland.

This work was partially funded by the EPOS project, funded within the framework of PL-POIR4.2.