# Demo notebooks and scripts for EPOS AI Platform
This repo contains notebooks and scripts demonstrating how to:
- Prepare IGF data for training a SeisBench model that detects the P phase, i.e. transform MiniSEED files into the [SeisBench data format](https://seisbench.readthedocs.io/en/stable/pages/data_format.html); see the [notebook](utils/Transforming%20mseeds%20to%20SeisBench%20dataset.ipynb).
- Explore the available data; see the [notebook](notebooks/Explore%20igf%20data.ipynb)
- Train various CNN models available in the SeisBench library and compare their performance at detecting the P phase; see the [script](scripts/pipeline.py)
- [to update] Validate model performance; see the [notebook](notebooks/Check%20model%20performance%20depending%20on%20station-random%20window.ipynb)
- [to update] Use a model to detect the P phase; see the [notebook](notebooks/Present%20model%20predictions.ipynb)
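
Once the conversion notebook has produced a dataset in the SeisBench format, it can be loaded like any other SeisBench dataset. A minimal sketch (the path `datasets/igf` is only an illustration; point it at wherever the notebook writes its output):

```python
import seisbench.data as sbd

# Load the converted dataset; the directory path is hypothetical and should match
# the output location of the "Transforming mseeds to SeisBench dataset" notebook.
data = sbd.WaveformDataset("datasets/igf")

print(data)                  # summary of the dataset (traces, components, sampling rate)
print(data.metadata.head())  # per-trace metadata as a pandas DataFrame

# Fetch the first waveform as a (channels, samples) numpy array.
waveform = data.get_waveforms(0)
print(waveform.shape)
```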
### Acknowledgments
This code is based on the [pick-benchmark](https://github.com/seisbench/pick-benchmark), the repository accompanying the paper:
[Which picker fits my data? A quantitative evaluation of deep learning based seismic pickers](https://github.com/seisbench/pick-benchmark#:~:text=Which%20picker%20fits%20my%20data%3F%20A%20quantitative%20evaluation%20of%20deep%20learning%20based%20seismic%20pickers)
### Usage
1. Install all dependencies with Poetry; run:
`poetry install`
2. Prepare a `.env` file with the following content (see the sketch at the end of this section for how these variables can be used):
```
WANDB_HOST="https://epos-ai.grid.cyfronet.pl/"
WANDB_API_KEY="your key"
WANDB_USER="your user"
WANDB_PROJECT="training_seisbench_models_on_igf_data"
BENCHMARK_DEFAULT_WORKER=2
```
3. Transform the data into the SeisBench format:
* Download original data from the [drive](https://drive.google.com/drive/folders/1InVI9DLaD7gdzraM2jMzeIrtiBSu-UIK?usp=drive_link)
* Run the notebook: `utils/Transforming mseeds to SeisBench dataset.ipynb`
4. Activate the Poetry environment:
`poetry shell`
5. Run the pipeline script:
`python pipeline.py`
The script performs the following steps:
* Generates evaluation targets
* Trains multiple versions of GPD, PhaseNet and ... models to find the hyperparameters that produce the lowest validation loss.
This step uses the Weights & Biases platform to perform the hyperparameter search (called sweeping), to track the training process, and to store the results.
The results are available at
`https://epos-ai.grid.cyfronet.pl/<your user name>/<your project name>`
* Uses the best performing model of each type to generate predictions
* Evaluates the performance of each model by comparing the predictions with the evaluation targets
* Saves the results in the `scripts/pred` directory
The default settings are saved in the `config.json` file. To change the settings, edit `config.json` or pass the new settings as command-line arguments to the script.
For example, to change the sweep configuration file for the GPD model, run:
`python pipeline.py --gpd_config <new config file>`
The new config file should be placed in the `experiments` directory, or in the location specified by the `configs_path` parameter in `config.json`.
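
As a reference for step 2, below is a minimal sketch of how the `.env` variables might be loaded and used to authenticate against the self-hosted Weights & Biases instance. It assumes the `python-dotenv` package and is only an illustration, not the pipeline's actual initialization code:

```python
import os

import wandb
from dotenv import load_dotenv  # assumes python-dotenv is installed

# Read the WANDB_* and BENCHMARK_* variables from .env into the process environment.
load_dotenv()

# Log in to the self-hosted Weights & Biases instance defined in .env.
wandb.login(host=os.getenv("WANDB_HOST"), key=os.getenv("WANDB_API_KEY"))

# Start a run in the project used for tracking the training experiments.
run = wandb.init(
    entity=os.getenv("WANDB_USER"),
    project=os.getenv("WANDB_PROJECT"),
)
print("Logging to:", run.url)
run.finish()
```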
### Troubleshooting
* `wandb: ERROR Run .. errored: OSError(24, 'Too many open files')`
see https://github.com/wandb/wandb/issues/2825
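
A common workaround is to raise the per-process limit on open file descriptors before starting training, e.g. `ulimit -n 4096` in the shell or, from Python (the value 4096 is only an example):

```python
import resource

# Raise the soft limit on open file descriptors for the current process;
# the hard limit is left unchanged and caps how far the soft limit can go.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
new_soft = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
print("Open-file limit (soft, hard):", resource.getrlimit(resource.RLIMIT_NOFILE))
```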