
Updated FP16 bash script, and added plots

tanyksg committed 5 years ago
commit d6f6bd0394

33 files changed, 172 insertions(+), 92 deletions(-)
  1. .ipynb_checkpoints/README-checkpoint.md  (+116 -0)
  2. README.md  (+2 -2)
  3. save/MNIST (MLP, IID) FP16 and FP32 Comparison_acc_FP16_32.png  (binary)
  4. save/MNIST (MLP, IID) FP16 and FP32 Comparison_loss_FP16_32.png  (binary)
  5. save/MNIST_CNN_IID FP16 and FP32 Comparison_acc_FP16_32.png  (binary)
  6. save/MNIST_CNN_IID FP16 and FP32 Comparison_loss_FP16_32.png  (binary)
  7. save/MNIST_CNN_IID_FP16_acc_FP16.png  (binary)
  8. save/MNIST_CNN_IID_FP16_loss_FP16.png  (binary)
  9. save/MNIST_CNN_NONIID FP16 and FP32 Comparison_acc_FP16_32.png  (binary)
  10. save/MNIST_CNN_NONIID FP16 and FP32 Comparison_loss_FP16_32.png  (binary)
  11. save/MNIST_CNN_NONIID_FP16_acc_FP16.png  (binary)
  12. save/MNIST_CNN_NONIID_FP16_loss_FP16.png  (binary)
  13. save/MNIST_MLP_IID FP16 and FP32 Comparison_acc_FP16_32.png  (binary)
  14. save/MNIST_MLP_IID FP16 and FP32 Comparison_loss_FP16_32.png  (binary)
  15. save/MNIST_MLP_IID_FP16_acc_FP16.png  (binary)
  16. save/MNIST_MLP_IID_FP16_loss_FP16.png  (binary)
  17. save/MNIST_MLP_NONIID FP16 and FP32 Comparison_acc_FP16_32.png  (binary)
  18. save/MNIST_MLP_NONIID FP16 and FP32 Comparison_loss_FP16_32.png  (binary)
  19. save/MNIST_MLP_NONIID_FP16_acc_FP16.png  (binary)
  20. save/MNIST_MLP_NONIID_FP16_loss_FP16.png  (binary)
  21. save/objects_fp16/FL_cifar_cnn_500_lr[0.01]_C[0.1]_iid[1]_E[5]_B[50]_FP16.pkl  (binary)
  22. save/objects_fp16/FL_mnist_mlp_600_lr[0.1]_C[0.1]_iid[0]_E[1]_B[10]_FP16.pkl  (binary)
  23. save/objects_fp16/FL_mnist_mlp_650_lr[0.01]_C[0.1]_iid[1]_E[1]_B[10]_FP16.pkl  (binary)
  24. save/objects_fp16/HFL2_mnist_mlp_300_lr[0.01]_C[0.1]_iid[0]_E[1]_B[10]_FP16.pkl  (binary)
  25. save/objects_fp16/HFL4_cifar_cnn_500_lr[0.01]_C[0.1]_iid[1]_E[5]_B[50]_FP16.pkl  (binary)
  26. save/objects_fp16/HFL4_mnist_mlp_250_lr[0.01]_C[0.1]_iid[1]_E[1]_B[10]_FP16.pkl  (binary)
  27. save/objects_fp16/HFL8_mnist_mlp_100_lr[0.01]_C[0.1]_iid[0]_E[1]_B[10]_FP16.pkl  (binary)
  28. src/.ipynb_checkpoints/Eval_fp16-checkpoint.ipynb  (+22 -0)
  29. src/Eval_fp16.ipynb  (+1 -1)
  30. src/script_bash_FL_diffFP.sh  (+5 -5)
  31. src/script_bash_FL_diffFP_cifar.sh  (+7 -17)
  32. src/script_bash_FL_diffFP_mnist_cnn.sh  (+9 -17)
  33. src/script_bash_FL_diffFP_mnist_mlp.sh  (+10 -50)

+ 116 - 0
.ipynb_checkpoints/README-checkpoint.md

@@ -0,0 +1,116 @@
+# Federated-Learning (PyTorch)
+
+Implementation of both hierarchical and vanilla federated learning based on the paper: [Communication-Efficient Learning of Deep Networks from Decentralized Data](https://arxiv.org/abs/1602.05629).
+Blog Post: https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
+
+Experiments are conducted on the MNIST and CIFAR10 datasets, with both IID and non-IID splits of the training data. In the non-IID case, the data can be split amongst the users equally or unequally.
+
+Since the purpose of these experiments is to illustrate the effectiveness of the federated learning paradigm, only simple models such as an MLP and a CNN are used.
+
+## Requirements
+Install all the packages from requirements.txt
+* Python=3.7.3
+* PyTorch=1.2.0
+* Torchvision=0.4.0
+* NumPy=1.15.4
+* TensorboardX=1.4
+* Matplotlib=3.0.1
+
+
+## Data
+* Download the train and test datasets manually, or they will be downloaded automatically from the torchvision datasets.
+* Experiments are run on MNIST and CIFAR10.
+* To use your own dataset: move it to the data directory and write a wrapper over the PyTorch Dataset class.
+
+## Running the experiments
+The baseline experiment trains the model in the conventional way.
+
+* To run the baseline experiment with MNIST on MLP using CPU:
+```
+python baseline_main.py --model=mlp --dataset=mnist --epochs=10
+```
+* Or to run it on a GPU (e.g., if gpu:0 is available):
+```
+python baseline_main.py --model=mlp --dataset=mnist --gpu=1 --epochs=10
+```
+-----
+
+The federated experiment trains a global model by aggregating updates from many local models.
+
+* To run the federated experiment with CIFAR on CNN (IID):
+```
+python federated_main.py --local_ep=1 --local_bs=10 --frac=0.1 --model=cnn --dataset=cifar --iid=1 --test_acc=99 --gpu=1
+```
+* To run the same experiment under non-IID condition:
+```
+python federated_main.py --local_ep=1 --local_bs=10 --frac=0.1 --model=cnn --dataset=cifar --iid=0 --test_acc=99 --gpu=1
+```
+-----
+
+The hierarchical federated experiments train a global model by aggregating local models within each cluster, then aggregating across clusters.
+
+* To run the hierarchical federated experiment with MNIST on MLP (IID):
+```
+python federated-hierarchical_main.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=5 --model=mlp --dataset=mnist --iid=1 --num_cluster=2 --test_acc=97  --gpu=1
+```
+* To run the same experiment under non-IID condition:
+```
+python federated-hierarchical_main.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=5 --model=mlp --dataset=mnist --iid=0 --num_cluster=2 --test_acc=97  --gpu=1
+```
+
+You can change the default values of other parameters to simulate different conditions. Refer to the options section.
+
+## Options
+The default values for the various parameters passed to the experiments are given in ```options.py```. Details on some of those parameters:
+
+* ```--dataset:```  Default: 'mnist'. Options: 'mnist', 'fmnist', 'cifar'
+* ```--model:```    Default: 'mlp'. Options: 'mlp', 'cnn'
+* ```--gpu:```      Default: None (runs on CPU). Can also be set to the specific gpu id.
+* ```--epochs:```   Number of rounds of training.
+* ```--lr:```       Learning rate set to 0.01 by default.
+* ```--verbose:```  Detailed log outputs. Activated by default, set to 0 to deactivate.
+* ```--seed:```     Random Seed. Default set to 1.
+
+#### Federated Parameters
+* ```--iid:```      Distribution of data amongst users. Default set to IID. Set to 0 for non-IID.
+* ```--num_users:``` Number of users. Default is 100.
+* ```--frac:```     Fraction of users to be used for federated updates. Default is 0.1.
+* ```--local_ep:``` Number of local training epochs in each user. Default is 10.
+* ```--local_bs:``` Batch size of local updates in each user. Default is 10.
+* ```--unequal:```  Used in non-iid setting. Option to split the data amongst users equally or unequally. Default set to 0 for equal splits. Set to 1 for unequal splits.
+* ```--num_cluster:```  Number of clusters in the hierarchy.
+* ```--Cepochs:```  Number of rounds of training in each cluster.
+
+## Results on MNIST
+#### Baseline Experiment:
+The experiment involves training a single model in the conventional way.
+
+Parameters: <br />
+* ```Optimizer:```     SGD
+* ```Learning Rate:``` 0.01
+
+```Table 1:``` Test accuracy after training for 10 epochs:
+
+| Model | Test Acc |
+| ----- | -----    |
+|  MLP  |  92.71%  |
+|  CNN  |  98.42%  |
+
+----
+
+#### Federated Experiment:
+The experiment involves training a global model in the federated setting.
+
+Federated parameters (default values):
+* ```Fraction of users (C)```: 0.1 
+* ```Local Batch size  (B)```: 10 
+* ```Local Epochs      (E)```: 10 
+* ```Optimizer            ```: SGD 
+* ```Learning Rate        ```: 0.01 <br />
+
+```Table 2:``` Test accuracy after training for 10 global epochs:
+
+| Model |    IID   | Non-IID (equal)|
+| ----- | -----    |----            |
+|  MLP  |  88.38%  |     73.49%     |
+|  CNN  |  97.28%  |     75.94%     |

+ 2 - 2
README.md

@@ -3,7 +3,7 @@
 Implementation of both hierarchical and vanilla federated learning based on the paper: [Communication-Efficient Learning of Deep Networks from Decentralized Data](https://arxiv.org/abs/1602.05629).
 Blog Post: https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
 
-Experiments are produced on MNIST, Fashion MNIST and CIFAR10 (both IID and non-IID). In case of non-IID, the data amongst the users can be split equally or unequally.
+Experiments are conducted on the MNIST and CIFAR10 datasets, with both IID and non-IID splits of the training data. In the non-IID case, the data can be split amongst the users equally or unequally.
 
 Since the purpose of these experiments is to illustrate the effectiveness of the federated learning paradigm, only simple models such as an MLP and a CNN are used.
 
@@ -19,7 +19,7 @@ Install all the packages from requirements.txt
 
 ## Data
 * Download train and test datasets manually or they will be automatically downloaded from torchvision datasets.
-* Experiments are run on Mnist, Fashion Mnist and Cifar.
+* Experiments are run on MNIST and CIFAR10.
 * To use your own dataset: move it to the data directory and write a wrapper over the PyTorch Dataset class.
 
 ## Running the experiments
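For the "write a wrapper over the PyTorch Dataset class" step mentioned above, a minimal sketch follows; the class name and tensor shapes are hypothetical, so adapt them to your data:

```python
import torch
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    """Illustrative wrapper exposing (image, label) pairs to a DataLoader."""
    def __init__(self, images, labels):
        self.images = images  # e.g. float tensor of shape [N, 1, 28, 28]
        self.labels = labels  # e.g. long tensor of shape [N]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]
```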

Binary file diffs (the plots under save/ and the pickled results under save/objects_fp16/ listed above) are not shown.

File diff suppressed because it is too large
+ 22 - 0
src/.ipynb_checkpoints/Eval_fp16-checkpoint.ipynb


+ 1 - 1
src/Eval_fp16.ipynb

@@ -323,7 +323,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.3"
+   "version": "3.6.9"
   }
  },
  "nbformat": 4,

+ 5 - 5
src/script_bash_FL_diffFP.sh

@@ -4,12 +4,12 @@
 # Website on how to write bash scripts: https://hackernoon.com/know-shell-scripting-202b2fbe03a8
 
 # This is the baseline without FL for 16-bit floating point.
-python ./baseline_main_fp16.py --epochs=10 --model="mlp" --dataset="mnist" --num_classes=10 --gpu=1 --gpu_id="cuda:0" --mlpdim=200 | tee -a ../logs/terminal_output1.txt &
+python ./baseline_main_fp16.py --epochs=10 --model="mlp" --dataset="mnist" --num_classes=10 --gpu=1 --gpu_id="cuda:0" --mlpdim=200 
 
-python ./federated_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --model=mlp --dataset=mnist --iid=1 --gpu=1 --lr=0.01 --test_acc=95 --mlpdim=200 --epochs=200 | tee -a ../logs/terminal_output2.txt &
+python ./federated_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --model=mlp --dataset=mnist --iid=1 --gpu=1 --lr=0.01 --test_acc=95 --mlpdim=200 --epochs=200 
 
-python ./federated-hierarchical2_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --num_cluster=2 --gpu=1 --lr=0.01 --mlpdim=200 --epochs=100 --test_acc=94 | tee -a ../logs/terminal_output3.txt
+python ./federated-hierarchical2_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --num_cluster=2 --gpu=1 --lr=0.01 --mlpdim=200 --epochs=100 --test_acc=94 
 
-python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --num_cluster=4 --gpu=1 --lr=0.1 --mlpdim=200 --epochs=100 --test_acc=95 | tee -a ../logs/terminal_output4.txt
+python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --num_cluster=4 --gpu=1 --lr=0.1 --mlpdim=200 --epochs=100 --test_acc=95 
 
-python ./federated-hierarchical8_main_fp16.py --local_ep=1 --local_bs=10 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --gpu=1 --lr=0.01 --mlpdim=200 --epochs=30 --num_cluster=8 --test_acc=95 | tee -a ../logs/terminal_output5.txt
+python ./federated-hierarchical8_main_fp16.py --local_ep=1 --local_bs=10 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --gpu=1 --lr=0.01 --mlpdim=200 --epochs=30 --num_cluster=8 --test_acc=95 
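The *_fp16.py entry points invoked above train in half precision. The diff does not show how that is implemented; one common approach, sketched under the assumption of a CUDA device (FP16 kernels are limited on CPU):

```python
import torch

model = torch.nn.Linear(784, 10).cuda().half()  # cast weights to FP16
images = torch.randn(64, 784).cuda().half()     # cast inputs to FP16
logits = model(images)                          # forward pass runs in FP16
loss = logits.float().mean()                    # reduce in FP32 for stability
loss.backward()
```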

+ 7 - 17
src/script_bash_FL_diffFP_cifar.sh

@@ -3,8 +3,8 @@
 # Commands are surrounded by ()
 # Website on how to write bash scripts: https://hackernoon.com/know-shell-scripting-202b2fbe03a8
 
-# Set GPU device
-GPU_ID="cuda:1"
+# Set GPU device (you can ignore this if not using a GPU)
+GPU_ID="cuda:0"
 
 
 
@@ -27,27 +27,17 @@ python federated-hierarchical8_main.py --local_ep=5 --local_bs=50 --Cepochs=10 -
 
 
 # ================ 16-bit ================ 
-# This is the baseline without FL for 16-bit floating point.
-python ./baseline_main_fp16.py --epochs=10 --model=cnn --dataset=cifar --num_classes=10 --gpu=1 --gpu_id=$GPU_ID | tee -a ../logs/terminaloutput_cifar_fp16_baseline.txt &
-
-
-# This is for 1 cluster FL for 16-bit floating point
-python ./federated_main_fp16.py --local_ep=5 --local_bs=50 --frac=0.1 --model=cnn --dataset=cifar --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --test_acc=85 --epochs=100 | tee -a ../logs/terminaloutput_cifar_fp16_1c_10ep_ta85.txt &
-
-python ./federated_main_fp16.py --local_ep=5 --local_bs=50 --frac=0.1 --model=cnn --dataset=cifar --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=200 | tee -a ../logs/terminaloutput_cifar_fp16_1c_200ep_ta95.txt &
-
-python ./federated_main_fp16.py --local_ep=5 --local_bs=50 --frac=0.1 --model=cnn --dataset=cifar --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=300 | tee -a ../logs/terminaloutput_cifar_fp16_1c_300ep_ta95.txt &
+# This is for FL for 16-bit floating point
+python ./federated_main_fp16.py --local_ep=5 --local_bs=50 --frac=0.1 --model=cnn --dataset=cifar --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=300 
 
 
 # This is for 2 clusters FL for 16-bit floating point
-python ./federated-hierarchical2_main_fp16.py --local_ep=5 --local_bs=50 --frac=0.1 --Cepochs=10 --model=cnn --dataset=cifar --iid=1 --num_cluster=2 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=100 --test_acc=85 | tee -a ../logs/terminaloutput_cifar_fp16_2c_100ep_ta85.txt &
-
-python ./federated-hierarchical2_main_fp16.py --local_ep=5 --local_bs=50 --frac=0.1 --Cepochs=10 --model=cnn --dataset=cifar --iid=1 --num_cluster=2 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=100 | tee -a ../logs/terminaloutput_cifar_fp16_2c_100ep_t95.txt &
+python ./federated-hierarchical2_main_fp16.py --local_ep=5 --local_bs=50 --frac=0.1 --Cepochs=10 --model=cnn --dataset=cifar --iid=1 --num_cluster=2 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=100 
 
 
 # This is for 4 clusters FL for 16-bit floating point
-python ./federated-hierarchical4_main_fp16.py --local_ep=5 --local_bs=50 --frac=0.1 --Cepochs=10 --model=cnn --dataset=cifar --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=100 --num_cluster=4 | tee -a ../logs/terminaloutput_cifar_fp16_4c_100ep_t95.txt &
+python ./federated-hierarchical4_main_fp16.py --local_ep=5 --local_bs=50 --frac=0.1 --Cepochs=10 --model=cnn --dataset=cifar --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=100 --num_cluster=4 
 
 
 # This is for 8 clusters FL for 16-bit floating point
-python ./federated-hierarchical8_main_fp16.py --local_ep=5 --local_bs=50 --Cepochs=10 --model=cnn --dataset=cifar --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=100 --num_cluster=8 | tee -a ../logs/terminaloutput_cifar_fp16_8c_100ep_t95.txt &
+python ./federated-hierarchical8_main_fp16.py --local_ep=5 --local_bs=50 --Cepochs=10 --model=cnn --dataset=cifar --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=100 --num_cluster=8 
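The 2-, 4-, and 8-cluster runs above differ from plain FL only in how aggregation is nested: each cluster first averages its members, and the server then averages the cluster models. A sketch of one such round, reusing the illustrative average_weights() from the FedAvg sketch earlier:

```python
def hierarchical_round(clusters):
    """clusters: list of clusters, each a list of client state_dicts (illustrative)."""
    cluster_models = [average_weights(members) for members in clusters]
    return average_weights(cluster_models)  # new global model for this round
```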

+ 9 - 17
src/script_bash_FL_diffFP_mnist_cnn.sh

@@ -12,7 +12,7 @@ GPU_ID="cuda:1"
 # IID
 python federated_main.py --local_ep=1 --local_bs=10 --frac=0.1 --model=cnn --dataset=mnist --iid=1 --gpu=1 --lr=0.01 --test_acc=97 --epochs=100
 # NON-IID
-python federated_main.py --local_ep=1 --local_bs=10 --frac=0.1 --model=cnn --dataset=mnist --iid=0 --gpu=1 --lr=0.01 --epochs=100 --test_acc=97
+python federated_main.py --local_ep=1 --local_bs=10 --frac=0.1 --model=cnn --dataset=mnist --iid=0 --gpu=1 --lr=0.01 --epochs=300 --test_acc=97
 
 
 # This is for 2 clusters HFL for 32-bit floating point
@@ -39,32 +39,24 @@ python federated-hierarchical8_main.py --local_ep=1 --local_bs=10 --Cepochs=10 -
 
 
 # ================ 16-bit ================ 
-# This is the baseline without FL for 16-bit floating point.
-python ./baseline_main_fp16.py --epochs=10 --model=cnn --dataset=mnist --num_classes=10 --gpu=1  --gpu_id=$GPU_ID | tee -a ../logs/terminaloutput_mnist_CNN_fp16_baseline.txt &
-
-
 # This is for 1 cluster FL for 16-bit floating point
-python ./federated_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --model=cnn --dataset=mnist --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --test_acc=97 --epochs=100 | tee -a ../logs/terminaloutput_mnist_CNN_fp16_1c1.txt &
-
-python ./federated_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --model=cnn --dataset=mnist --iid=0 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=100 --test_acc=97 | tee -a ../logs/terminaloutput_mnist_CNN_fp16_1c2.txt &
+python ./federated_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --model=cnn --dataset=mnist --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --test_acc=97 --epochs=100 
 
-python ./federated_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --model=cnn --dataset=mnist --iid=0 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=261 --test_acc=97 | tee -a ../logs/terminaloutput_mnist_CNN_fp16_1c3.txt &
+python ./federated_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --model=cnn --dataset=mnist --iid=0 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=300 --test_acc=97 
 
 
 # This is for 2 clusters FL for 16-bit floating point
-python ./federated-hierarchical2_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=cnn --dataset=mnist --iid=1 --num_cluster=2 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=100 | tee -a ../logs/terminaloutput_mnist_CNN_fp16_2c1.txt &
-
-python ./federated-hierarchical2_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=cnn --dataset=mnist --iid=0 --num_cluster=2 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=100 | tee -a ../logs/terminaloutput_mnist_CNN_fp16_2c2.txt &
-
+python ./federated-hierarchical2_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=cnn --dataset=mnist --iid=1 --num_cluster=2 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=100 
 
+python ./federated-hierarchical2_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=cnn --dataset=mnist --iid=0 --num_cluster=2 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=100 
 
 
 # This is for 4 clusters FL for 16-bit floating point
-python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=cnn --dataset=mnist --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=100  --num_cluster=4 | tee -a ../logs/terminaloutput_mnist_CNN_fp16_4c1.txt &
+python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=cnn --dataset=mnist --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=100  --num_cluster=4
 
-python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=cnn --dataset=mnist --iid=0 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=100  --num_cluster=4 | tee -a ../logs/terminaloutput_mnist_CNN_fp16_4c2.txt &
+python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=cnn --dataset=mnist --iid=0 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=100  --num_cluster=4 
 
 # This is for 8 clusters FL for 16-bit floating point
-python ./federated-hierarchical8_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=cnn --dataset=mnist --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=30  --num_cluster=8 | tee -a ../logs/terminaloutput_mnist_CNN_fp16_8c1.txt &
-
+python ./federated-hierarchical8_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=cnn --dataset=mnist --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=30  --num_cluster=8 
 
+python ./federated-hierarchical8_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=cnn --dataset=mnist --iid=0 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --epochs=30  --num_cluster=8 
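The --iid=0 runs above depend on a pathological split of MNIST across users. A sketch of the usual shard-based construction (sort indices by label, deal out shards), with illustrative shard counts rather than this repo's exact values:

```python
import numpy as np

def noniid_split(labels, num_users=100, shards_per_user=2):
    """Assign each user a few label-sorted shards, so most users see few classes."""
    num_shards = num_users * shards_per_user
    shard_size = len(labels) // num_shards
    idxs = np.argsort(labels)  # indices grouped by class label
    shards = [idxs[i * shard_size:(i + 1) * shard_size] for i in range(num_shards)]
    order = np.random.permutation(num_shards)
    return {u: np.concatenate([shards[order[u * shards_per_user + j]]
                               for j in range(shards_per_user)])
            for u in range(num_users)}
```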

+ 10 - 50
src/script_bash_FL_diffFP_mnist_mlp.sh

@@ -9,9 +9,9 @@ GPU_ID="cuda:1"
 # ================ 32-bit ================ 
 # This is for FL for 32-bit floating point
 # IID
-python federated_main.py --local_ep=1 --local_bs=10 --frac=0.1 --model=mlp --dataset=mnist --iid=1 --gpu=1 --lr=0.01 --test_acc=95 --mlpdim=200 --epochs=200
+python federated_main.py --local_ep=1 --local_bs=10 --frac=0.1 --model=mlp --dataset=mnist --iid=1 --gpu=1 --lr=0.01 --test_acc=95 --mlpdim=200 --epochs=600
 # NON-IID
-python federated_main.py --local_ep=1 --local_bs=10 --frac=0.1 --model=mlp --dataset=mnist --iid=0 --gpu=1 --lr=0.1 --test_acc=95 --mlpdim=200 --epochs=300
+python federated_main.py --local_ep=1 --local_bs=10 --frac=0.1 --model=mlp --dataset=mnist --iid=0 --gpu=1 --lr=0.1 --test_acc=95 --mlpdim=200 --epochs=1200
 
 
 # This is for 2 clusters HFL for 32-bit floating point
@@ -36,67 +36,27 @@ python federated-hierarchical8_main.py --local_ep=1 --local_bs=10 --Cepochs=10 -
 
 
 
-
 # ================ 16-bit ================ 
-# This is the baseline without FL for 16-bit floating point.
-python ./baseline_main_fp16.py --epochs=10 --model=mlp --dataset=mnist --num_classes=10 --gpu=1 --gpu_id=$GPU_ID | tee -a ../logs/terminaloutput_mnist_fp16_baseline.txt &
-
-
 # This is for 1 cluster FL for 16-bit floating point
-python ./federated_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --model=mlp --dataset=mnist --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --test_acc=95 --mlpdim=200 --epochs=200 | tee -a ../logs/terminaloutput_mnist_fp16_1c.txt &
-
-python ./federated_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --model=mlp --dataset=mnist --iid=0 --gpu=1 --gpu_id=$GPU_ID --lr=0.1 --test_acc=95 --mlpdim=200 --epochs=300 | tee -a ../logs/terminaloutput_mnist_fp16_1c.txt &
-
-python ./federated_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --model=mlp --dataset=mnist --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.1 --test_acc=95 --mlpdim=250 --epochs=200 | tee -a ../logs/terminaloutput_mnist_fp16_1c.txt &
+python ./federated_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --model=mlp --dataset=mnist --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --test_acc=95 --mlpdim=200 --epochs=600 
 
-# FL_mnist_mlp_468_C[0.1]_iid[1]_E[1]_B[10]
-python ./federated_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --model=mlp --dataset=mnist --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --test_acc=95 --mlpdim=200 --epochs=468 | tee -a ../logs/terminaloutput_mnist_fp16_1c_468epoch.txt &
-
-# FL_mnist_mlp_1196_lr[0.01]_C[0.1]_iid[0]_E[1]_B[10]
-python ./federated_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --model=mlp --dataset=mnist --iid=0 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --test_acc=95 --mlpdim=200 --epochs=1196 | tee -a ../logs/terminaloutput_mnist_fp16_1c_1196epoch.txt &
+python ./federated_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --model=mlp --dataset=mnist --iid=0 --gpu=1 --gpu_id=$GPU_ID --lr=0.1 --test_acc=95 --mlpdim=200 --epochs=1200
 
 
 # This is for 2 clusters FL for 16-bit floating point
-python ./federated-hierarchical2_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --num_cluster=2 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=100 --test_acc=94 | tee -a ../logs/terminaloutput_mnist_fp16_2c.txt &
-
-python ./federated-hierarchical2_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=0 --num_cluster=2 --gpu=1 --gpu_id=$GPU_ID --lr=0.05 --mlpdim=200 --epochs=100 --test_acc=94 | tee -a ../logs/terminaloutput_mnist_fp16_2c.txt &
-
-python ./federated-hierarchical2_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --num_cluster=2 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=100 | tee -a ../logs/terminaloutput_mnist_fp16_2c.txt &
-
-python ./federated-hierarchical2_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=0 --num_cluster=2 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=100 | tee -a ../logs/terminaloutput_mnist_fp16_2c.txt &
-
+python ./federated-hierarchical2_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --num_cluster=2 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=100 
 
+python ./federated-hierarchical2_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=0 --num_cluster=2 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=100 
 
 
 # This is for 4 clusters FL for 16-bit floating point
-python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --num_cluster=4 --gpu=1 --gpu_id=$GPU_ID --lr=0.1 --mlpdim=200 --epochs=100 --test_acc=95 | tee -a ../logs/terminaloutput_mnist_fp16_4c.txt
-
-python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --num_cluster=4 --gpu=1 --gpu_id=$GPU_ID --lr=0.05 --mlpdim=200 --epochs=100 | tee -a ../logs/terminaloutput_mnist_fp16_4c.txt
-
-python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=0 --num_cluster=4 --gpu=1 --gpu_id=$GPU_ID --lr=0.05 --mlpdim=200 --epochs=100 | tee -a ../logs/terminaloutput_mnist_fp16_4c.txt
-
-python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --num_cluster=4 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=150 | tee -a ../logs/terminaloutput_mnist_fp16_4c.txt
-
-python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=0 --num_cluster=4 --gpu=1 --gpu_id=$GPU_ID --lr=0.05 --mlpdim=200 --epochs=150 | tee -a ../logs/terminaloutput_mnist_fp16_4c.txt
-
-python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --num_cluster=4 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=150 --optimizer='adam' | tee -a ../logs/terminaloutput_mnist_fp16_4c.txt
-
-python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=100 | tee -a ../logs/terminaloutput_mnist_fp16_4c.txt
-
-python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=100 | tee -a ../logs/terminaloutput_mnist_fp16_4c.txt
-
-python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=0 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=100 | tee -a ../logs/terminaloutput_mnist_fp16_4c.txt
-
-python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=100  --num_cluster=4 | tee -a ../logs/terminaloutput_mnist_fp16_4c.txt
-
-python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=0 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=150  --num_cluster=4 | tee -a ../logs/terminaloutput_mnist_fp16_4c.txt
+python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --num_cluster=4 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=100
 
-# HFL4_mnist_mlp_30_lr[0.01]_C[0.1]_iid[1]_E[1]_B[10]
-python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --num_cluster=4 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=30 --test_acc=95 | tee -a ../logs/terminaloutput_mnist_fp16_4c_30epoch.txt &
+python ./federated-hierarchical4_main_fp16.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=mlp --dataset=mnist --iid=0 --num_cluster=4 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=150
 
 
 # This is for 8 clusters FL for 16-bit floating point
-python ./federated-hierarchical8_main_fp16.py --local_ep=1 --local_bs=10 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=30 --num_cluster=8 --test_acc=95 | tee -a ../logs/terminaloutput_mnist_fp16_8c.txt
+python ./federated-hierarchical8_main_fp16.py --local_ep=1 --local_bs=10 --Cepochs=10 --model=mlp --dataset=mnist --iid=1 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=30 --num_cluster=8 --test_acc=95 
 
-python ./federated-hierarchical8_main_fp16.py --local_ep=1 --local_bs=10 --Cepochs=10 --model=mlp --dataset=mnist --iid=0 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=30 --num_cluster=8 --test_acc=95 | tee -a ../logs/terminaloutput_mnist_fp16_8c.txt
+python ./federated-hierarchical8_main_fp16.py --local_ep=1 --local_bs=10 --Cepochs=10 --model=mlp --dataset=mnist --iid=0 --gpu=1 --gpu_id=$GPU_ID --lr=0.01 --mlpdim=200 --epochs=30 --num_cluster=8 --test_acc=95 
 

Some files were not shown because too many files changed in this diff