
Commit 475d802

Update hands-on.mdx
Changed the LunarLander environment references from v2 to the new v3 version.
1 parent f8ed470 commit 475d802

1 file changed: 23 additions, 23 deletions


units/en/unit1/hands-on.mdx

Lines changed: 23 additions & 23 deletions
@@ -47,7 +47,7 @@ In this notebook, you'll train your **first Deep Reinforcement Learning agent**
 
 ### The environment 🎮
 
-- [LunarLander-v2](https://gymnasium.farama.org/environments/box2d/lunar_lander/)
+- [LunarLander-v3](https://gymnasium.farama.org/environments/box2d/lunar_lander/)
 
 ### The library used 📚
 
@@ -138,7 +138,7 @@ For more information about the certification process, check this section 👉 ht
 
 The first step is to install the dependencies, we’ll install multiple ones.
 
-- `gymnasium[box2d]`: Contains the LunarLander-v2 environment 🌛
+- `gymnasium[box2d]`: Contains the LunarLander-v3 environment 🌛
 - `stable-baselines3[extra]`: The deep reinforcement learning library.
 - `huggingface_sb3`: Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub.
 
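Once those dependencies are installed, a quick check like the following (not part of the notebook, just an illustrative sanity test) confirms that the Box2D extra is present and that the renamed environment id is registered:

```python
# Illustrative sanity check, not from the course notebook:
# creating the environment fails with a DependencyNotInstalled error
# if the Box2D extra is missing.
import gymnasium as gym

env = gym.make("LunarLander-v3")
print(env.spec.id)  # "LunarLander-v3"
env.close()
```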
@@ -256,8 +256,8 @@ If the episode is terminated:
 ```python
 import gymnasium as gym
 
-# First, we create our environment called LunarLander-v2
-env = gym.make("LunarLander-v2")
+# First, we create our environment called LunarLander-v3
+env = gym.make("LunarLander-v3")
 
 # Then we reset this environment
 observation, info = env.reset()
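For context, this excerpt belongs to the classic observe/act/step loop. A minimal sketch of that loop against the Gymnasium API (random actions, not the notebook cell verbatim) looks like this:

```python
import gymnasium as gym

env = gym.make("LunarLander-v3")
observation, info = env.reset()

for _ in range(20):
    # Sample a random action; a trained agent would pick the action from its policy
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    # If the lander crashed/landed (terminated) or the time limit was hit (truncated),
    # start a new episode
    if terminated or truncated:
        observation, info = env.reset()

env.close()
```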
@@ -301,7 +301,7 @@ Let's see what the Environment looks like:
 
 ```python
 # We create our environment with gym.make("<name_of_the_environment>")
-env = gym.make("LunarLander-v2")
+env = gym.make("LunarLander-v3")
 env.reset()
 print("_____OBSERVATION SPACE_____ \n")
 print("Observation Space Shape", env.observation_space.shape)
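The matching inspection of the action space (a sketch along the same lines; LunarLander exposes four discrete actions) would be:

```python
print("\n _____ACTION SPACE_____ \n")
print("Action Space Shape", env.action_space.n)          # 4 discrete actions: do nothing, fire left, main, or right engine
print("Action Space Sample", env.action_space.sample())  # sample a random action
```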
@@ -355,7 +355,7 @@ An episode is **considered a solution if it scores at least 200 points.**
 
 ```python
 # Create the environment
-env = make_vec_env("LunarLander-v2", n_envs=16)
+env = make_vec_env("LunarLander-v3", n_envs=16)
 ```
 
 ## Create the Model 🤖
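For reference, `make_vec_env` comes from Stable-Baselines3's `env_util` module; a self-contained version of this cell (a sketch with the import it assumes) is:

```python
from stable_baselines3.common.env_util import make_vec_env

# 16 parallel copies of the environment: each update sees more diverse experience
env = make_vec_env("LunarLander-v3", n_envs=16)
```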
@@ -390,7 +390,7 @@ Stable-Baselines3 is easy to set up:
 
 ```
 # Create environment
-env = gym.make('LunarLander-v2')
+env = gym.make('LunarLander-v3')
 
 # Instantiate the agent
 model = PPO('MlpPolicy', env, verbose=1)
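PPO accepts many more hyperparameters than the one-liner above. One plausible configuration for LunarLander (illustrative values only, not necessarily the ones the notebook settles on) could look like this:

```python
from stable_baselines3 import PPO

model = PPO(
    policy="MlpPolicy",   # multilayer-perceptron policy, suited to vector observations
    env=env,
    n_steps=1024,         # rollout length per environment before each update
    batch_size=64,
    n_epochs=4,
    gamma=0.999,          # discount factor
    gae_lambda=0.98,
    ent_coef=0.01,        # entropy bonus to encourage exploration
    verbose=1,
)
```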
@@ -433,7 +433,7 @@ model = PPO(
 # TODO: Train it for 1,000,000 timesteps
 
 # TODO: Specify file name for model and save the model to file
-model_name = "ppo-LunarLander-v2"
+model_name = "ppo-LunarLander-v3"
 ```
 
 #### Solution
@@ -443,7 +443,7 @@ model_name = "ppo-LunarLander-v2"
 # Train it for 1,000,000 timesteps
 model.learn(total_timesteps=1000000)
 # Save the model
-model_name = "ppo-LunarLander-v2"
+model_name = "ppo-LunarLander-v3"
 model.save(model_name)
 ```
 
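`model.save(model_name)` writes a `ppo-LunarLander-v3.zip` archive next to the notebook; reloading it later is one line (a sketch):

```python
from stable_baselines3 import PPO

# Loads ppo-LunarLander-v3.zip from the current working directory
model = PPO.load("ppo-LunarLander-v3")
```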
@@ -473,7 +473,7 @@ mean_reward, std_reward =
 
 ```python
 # @title
-eval_env = Monitor(gym.make("LunarLander-v2"))
+eval_env = Monitor(gym.make("LunarLander-v3"))
 mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
 print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
 ```
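This evaluation cell relies on two Stable-Baselines3 helpers; the imports it assumes are:

```python
from stable_baselines3.common.evaluation import evaluate_policy  # runs n evaluation episodes and aggregates rewards
from stable_baselines3.common.monitor import Monitor             # wrapper that records episode rewards and lengths
```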
@@ -483,7 +483,7 @@ print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
 ## Publish our trained model on the Hub 🔥
 Now that we saw we got good results after the training, we can publish our trained model on the hub 🤗 with one line of code.
 
-📚 The libraries documentation 👉 https://github.com/huggingface/huggingface_sb3/tree/main#hugging-face--x-stable-baselines3-v20
+📚 The libraries documentation 👉 https://github.com/huggingface/huggingface_sb3/tree/main#hugging-face--x-stable-baselines3-v30
 
 Here's an example of a Model Card (with Space Invaders):
 
@@ -521,7 +521,7 @@ Let's fill the `package_to_hub` function:
 - `model`: our trained model.
 - `model_name`: the name of the trained model that we defined in `model_save`
 - `model_architecture`: the model architecture we used, in our case PPO
-- `env_id`: the name of the environment, in our case `LunarLander-v2`
+- `env_id`: the name of the environment, in our case `LunarLander-v3`
 - `eval_env`: the evaluation environment defined in eval_env
 - `repo_id`: the name of the Hugging Face Hub Repository that will be created/updated `(repo_id = {username}/{repo_name})`
 
@@ -537,7 +537,7 @@ from stable_baselines3.common.env_util import make_vec_env
 from huggingface_sb3 import package_to_hub
 
 ## TODO: Define a repo_id
-## repo_id is the id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v2
+## repo_id is the id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v3
 repo_id =
 
 # TODO: Define the name of the environment
@@ -559,7 +559,7 @@ package_to_hub(model=model, # Our trained model
 model_architecture=model_architecture, # The model architecture we used: in our case PPO
 env_id=env_id, # Name of the environment
 eval_env=eval_env, # Evaluation Environment
-repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v2
+repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v3
 commit_message=commit_message)
 ```
 
@@ -577,18 +577,18 @@ from huggingface_sb3 import package_to_hub
 
 # PLACE the variables you've just defined two cells above
 # Define the name of the environment
-env_id = "LunarLander-v2"
+env_id = "LunarLander-v3"
 
 # TODO: Define the model architecture we used
 model_architecture = "PPO"
 
 ## Define a repo_id
-## repo_id is the id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v2
+## repo_id is the id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v3
 ## CHANGE WITH YOUR REPO ID
-repo_id = "ThomasSimonini/ppo-LunarLander-v2" # Change with your repo id, you can't push with mine 😄
+repo_id = "ThomasSimonini/ppo-LunarLander-v3" # Change with your repo id, you can't push with mine 😄
 
 ## Define the commit message
-commit_message = "Upload PPO LunarLander-v2 trained agent"
+commit_message = "Upload PPO LunarLander-v3 trained agent"
 
 # Create the evaluation env and set the render_mode="rgb_array"
 eval_env = DummyVecEnv([lambda: Monitor(gym.make(env_id, render_mode="rgb_array"))])
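The `eval_env` line above assumes the usual Gymnasium and Stable-Baselines3 imports; spelled out, they are:

```python
import gymnasium as gym

from stable_baselines3.common.monitor import Monitor      # records episode statistics during evaluation
from stable_baselines3.common.vec_env import DummyVecEnv  # wraps the single env in the vectorized interface package_to_hub expects
```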
@@ -600,7 +600,7 @@ package_to_hub(
 model_architecture=model_architecture, # The model architecture we used: in our case PPO
 env_id=env_id, # Name of the environment
 eval_env=eval_env, # Evaluation Environment
-repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v2
+repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v3
 commit_message=commit_message,
 )
 ```
@@ -613,7 +613,7 @@ Congrats 🥳 you've just trained and uploaded your first Deep Reinforcement Lea
 
 Under the hood, the Hub uses git-based repositories (don't worry if you don't know what git is), which means you can update the model with new versions as you experiment and improve your agent.
 
-Compare the results of your LunarLander-v2 with your classmates using the leaderboard 🏆 👉 https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard
+Compare the results of your LunarLander-v3 with your classmates using the leaderboard 🏆 👉 https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard
 
 ## Load a saved LunarLander model from the Hub 🤗
 Thanks to [ironbar](https://github.com/ironbar) for the contribution.
@@ -641,7 +641,7 @@ Shimmy Documentation: https://github.com/Farama-Foundation/Shimmy
 from huggingface_sb3 import load_from_hub
 
 repo_id = "Classroom-workshop/assignment2-omar" # The repo_id
-filename = "ppo-LunarLander-v2.zip" # The model filename.zip
+filename = "ppo-LunarLander-v3.zip" # The model filename.zip
 
 # When the model was trained on Python 3.8 the pickle protocol is 5
 # But Python 3.6, 3.7 use protocol 4
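For context, the downloaded checkpoint is then loaded back into a PPO model roughly like this (a sketch; when the saved model comes from a different SB3 or Python version you may also need to pass a `custom_objects` dict to `PPO.load`):

```python
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO

checkpoint = load_from_hub(repo_id, filename)  # downloads the .zip from the Hub and returns its local path
model = PPO.load(checkpoint, print_system_info=True)
```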
@@ -662,7 +662,7 @@ Let's evaluate this agent:
 
 ```python
 # @title
-eval_env = Monitor(gym.make("LunarLander-v2"))
+eval_env = Monitor(gym.make("LunarLander-v3"))
 mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
 print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
 ```
@@ -678,7 +678,7 @@ Here are some ideas to achieve so:
 * Check the [Stable-Baselines3 documentation](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html) and try another model such as DQN.
 * **Push your new trained model** on the Hub 🔥
 
-**Compare the results of your LunarLander-v2 with your classmates** using the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) 🏆
+**Compare the results of your LunarLander-v3 with your classmates** using the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) 🏆
 
 Is moon landing too boring for you? Try to **change the environment**, why not use MountainCar-v0, CartPole-v1 or CarRacing-v0? Check how they work [using the gym documentation](https://www.gymlibrary.dev/) and have fun 🎉.
 
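If you swap in one of the environments suggested in that last line, only the environment id changes; a minimal sketch of the adapted training cell (illustrative, assuming the same imports as earlier):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

env_id = "CartPole-v1"  # e.g. one of the suggested alternatives
env = make_vec_env(env_id, n_envs=16)

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)  # illustrative budget; simpler tasks need fewer steps
model.save(f"ppo-{env_id}")
```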