
update readme, requirements, and eval notebooks

wesleyjtan 4 years ago
parent
commit
70a78d4ebc

+ 52 - 61
README.md

@@ -1,71 +1,87 @@
-# Federated-Learning (PyTorch)
+# Hierarchical Federated-Learning (PyTorch)
 
 
 Implementation of both hierarchical and vanilla federated learning based on the paper: [Communication-Efficient Learning of Deep Networks from Decentralized Data](https://arxiv.org/abs/1602.05629).
-Blog Post: https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
 
 
 Experiments are conducted on the MNIST and CIFAR10 datasets. During training, the datasets are split in both IID and non-IID ways. In the non-IID case, the data amongst the users can be split equally or unequally.
 
 
 Since the purpose of these experiments is to illustrate the effectiveness of the federated learning paradigm, only simple models such as MLP and CNN are used.
 
 
-## Requirments
-Install all the packages from requirments.txt
-* Python=3.7.3
-* Pytorch=1.2.0
-* Torchvision=0.4.0
-* Numpy=1.15.4
-* Tensorboardx=1.4
-* Matplotlib=3.0.1
+## Requirements
+Install all the packages from requirements.txt
+* Python==3.7.3
+* Pytorch==1.2.0
+* Torchvision==0.4.0
+* Numpy==1.15.4
+* Tensorboardx==1.4
+* Matplotlib==3.0.1
+* Tqdm==4.39.0 
+
+## Steps to set up a Python environment
+1. Create the environment:
+```
+conda create -n myenv python=3.7.3
+```
+2. Install PyTorch and torchvision:
+```
+conda install pytorch==1.2.0 torchvision==0.4.0 -c pytorch
+```
+3. Install the remaining package requirements:
+```
+pip install -r requirements.txt
+```
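+
+As a quick sanity check (assuming the environment above was created and activated as described), you can verify the installed versions:
+```
+conda activate myenv
+python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__)"
+```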
 
 
 
 
 ## Data
-* Download train and test datasets manually or they will be automatically downloaded from torchvision datasets.
-* Experiments are run on Mnist and Cifar.
+* Download the train and test datasets manually, or they will be downloaded automatically to the [data](/data/) folder from torchvision datasets.
+* Experiments are run on MNIST and CIFAR.
 * To use your own dataset: move your dataset to the data directory and write a wrapper on the PyTorch Dataset class (a minimal sketch is given below).
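+
+A minimal sketch of such a wrapper (the class and file names here are illustrative, not part of this repository; it assumes the samples and labels are stored as NumPy arrays):
+```
+# Illustrative wrapper around the PyTorch Dataset class.
+import numpy as np
+import torch
+from torch.utils.data import Dataset
+
+class MyDataset(Dataset):
+    def __init__(self, samples_path, labels_path, transform=None):
+        self.samples = np.load(samples_path)  # e.g. shape (N, H, W)
+        self.labels = np.load(labels_path)    # e.g. shape (N,)
+        self.transform = transform
+
+    def __len__(self):
+        return len(self.labels)
+
+    def __getitem__(self, idx):
+        x = torch.as_tensor(self.samples[idx], dtype=torch.float32)
+        y = int(self.labels[idx])
+        if self.transform:
+            x = self.transform(x)
+        return x, y
+```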
 
 
 ## Running the experiments
-The baseline experiment trains the model in the conventional way.
+All the experiments for the reported results are in the [scripts](/src/) below:
+* script_bash_FL_diffFP_mnist_mlp.sh
+* script_bash_FL_diffFP_mnist_cnn.sh
+* script_bash_FL_diffFP_cifar.sh
+* script_bash_FL_diffFP.sh
+-----
+The baseline experiment trains the model in the conventional federated learning setting.
 
 
-* To run the baseline experiment with MNIST on MLP using CPU:
+* To run the baseline federated experiment with MNIST on MLP using CPU:
 ```
-python baseline_main.py --model=mlp --dataset=mnist --epochs=10
+python federated_main.py --local_ep=1 --local_bs=10 --frac=0.1 --model=mlp --dataset=mnist --iid=1 --gpu=0 --lr=0.01 --test_acc=95 --mlpdim=200 --epochs=600
 ```
 * Or to run it on GPU (e.g., if gpu:0 is available):
 ```
-python baseline_main.py --model=mlp --dataset=mnist --gpu=1 --epochs=10
+python federated_main.py --local_ep=1 --local_bs=10 --frac=0.1 --model=mlp --dataset=mnist --iid=1 --gpu=1 --lr=0.01 --test_acc=95 --mlpdim=200 --epochs=600
 ```
 -----
 
 
-Federated experiment involves training a global model using many local models.
+The hierarchical federated experiment involves training a global model using many local models.
 
 
-* To run the federated experiment with CIFAR on CNN (IID):
+* To run the hierarchical federated experiment with 2 clusters on MNIST using CNN (IID):
 ```
-python federated_main.py --local_ep=1 --local_bs=10 --frac=0.1 --model=cnn --dataset=cifar --iid=1 --test_acc=99 --gpu=1
+python federated-hierarchical2_main.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=cnn --dataset=mnist --iid=1 --num_cluster=2 --gpu=1 --lr=0.01 --epochs=100
 ```
 * To run the same experiment under non-IID conditions:
 ```
-python federated_main.py --local_ep=1 --local_bs=10 --frac=0.1 --model=cnn --dataset=cifar --iid=0 --test_acc=99 --gpu=1
+python federated-hierarchical2_main.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=10 --model=cnn --dataset=mnist --iid=0 --num_cluster=2 --gpu=1 --lr=0.01 --epochs=100
 ```
 -----
+Hierarchical federated experiments with 16-bit floating point involve training a global model using different clusters, each with many local models.
 
 
-Hierarchical Federated experiments involve training a global model using different clusters with many local models.
-
-* To run the hierarchical federated experiment with MNIST on MLP (IID):
+* To run the hierarchical federated experiment with 2 clusters on CIFAR using CNN (IID):
 ```
-python federated-hierarchical_main.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=5 --model=mlp --dataset=mnist --iid=1 --num_cluster=2 --test_acc=97  --gpu=1
-```
-* To run the same experiment under non-IID condition:
-```
-python federated-hierarchical_main.py --local_ep=1 --local_bs=10 --frac=0.1 --Cepochs=5 --model=mlp --dataset=mnist --iid=0 --num_cluster=2 --test_acc=97  --gpu=1
+python ./federated-hierarchical2_main_fp16.py --local_ep=5 --local_bs=50 --frac=0.1 --Cepochs=10 --model=cnn --dataset=cifar --iid=1 --num_cluster=2 --gpu=1 --lr=0.01 --epochs=100 
 ```
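+
+The 16-bit experiments follow the same flow as their 32-bit counterparts. A minimal sketch of the underlying idea, half-precision weight exchange with full-precision aggregation (an illustration, not the repository's exact implementation):
+```
+# Illustrative only: communicate weights in float16, aggregate in float32.
+import copy
+import torch
+
+def to_fp16_state(model):
+    # Half-precision copy of the floating-point weights, as would be
+    # sent over the network; non-float buffers are left unchanged.
+    return {k: (v.detach().half() if v.is_floating_point() else v.detach())
+            for k, v in model.state_dict().items()}
+
+def average_states(states):
+    # FedAvg-style mean over client states, computed in float32
+    # for numerical stability.
+    avg = copy.deepcopy(states[0])
+    for k in avg:
+        avg[k] = torch.stack([s[k].float() for s in states]).mean(dim=0)
+    return avg
+```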
 
 
+
 You can change the default values of other parameters to simulate different conditions. Refer to the options section.
 
 
 ## Options
 The default values for the various parameters parsed to the experiment are given in ```options.py```. Details of some of those parameters are given below, followed by an example invocation:
 
 
-* ```--dataset:```  Default: 'mnist'. Options: 'mnist', 'fmnist', 'cifar'
+* ```--dataset:```  Default: 'mnist'. Options: 'mnist', 'cifar'
 * ```--model:```    Default: 'mlp'. Options: 'mlp', 'cnn'
-* ```--gpu:```      Default: None (runs on CPU). Can also be set to the specific gpu id.
+* ```--gpu:```      Default: 1 (runs on gpu:0). Set to 0 to run on CPU.
 * ```--epochs:```   Number of rounds of training.
 * ```--lr:```       Learning rate set to 0.01 by default.
 * ```--verbose:```  Detailed log outputs. Activated by default, set to 0 to deactivate.
@@ -75,42 +91,17 @@ The default values for the various parameters parsed to the experiment are given in `
 * ```--iid:```      Distribution of data amongst users. Default set to IID. Set to 0 for non-IID.
 * ```--num_users:``` Number of users. Default is 100.
 * ```--frac:```     Fraction of users to be used for federated updates. Default is 0.1.
-* ```--local_ep:``` Number of local training epochs in each user. Default is 10.
+* ```--local_ep:``` Number of local training epochs in each user. Default is 1.
 * ```--local_bs:``` Batch size of local updates in each user. Default is 10.
-* ```--unequal:```  Used in non-iid setting. Option to split the data amongst users equally or unequally. Default set to 0 for equal splits. Set to 1 for unequal splits.
 * ```--num_clusters:```  Number of clusters in the hierarchy.
 * ```--Cepochs:```  Number of rounds of training in each cluster.
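+
+For example, a non-IID CIFAR run with more local epochs might be launched as follows (the parameter values here are illustrative, not reported settings):
+```
+python federated_main.py --model=cnn --dataset=cifar --iid=0 --num_users=100 --frac=0.1 --local_ep=5 --local_bs=50 --lr=0.01 --epochs=300
+```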
 
 
-## Results on MNIST
-#### Baseline Experiment:
-The experiment involves training a single model in the conventional way.
-
-Parameters: <br />
-* ```Optimizer:```    : SGD 
-* ```Learning Rate:``` 0.01
-
-```Table 1:``` Test accuracy after training for 10 epochs:
-
-| Model | Test Acc |
-| ----- | -----    |
-|  MLP  |  92.71%  |
-|  CNN  |  98.42%  |
-
-----
+## Experimental Results
+The results and figures can be found in the [evaluation notebooks](/src/) listed below; a minimal loading sketch follows the list:
+* Eval.ipynb
+* Eval_fp16.ipynb
+* Eval_fp16-32-compare.ipynb
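+
+The notebooks read the pickled training histories from ../save/objects/. A minimal sketch of that loading pattern (the filename is one of the saved runs; the data layout, [train_loss, accuracy], follows Eval.ipynb):
+```
+import pickle
+import matplotlib.pyplot as plt
+
+filename = "[1]FL_mnist_mlp_468_C[0.1]_iid[1]_E[1]_B[10]"
+with open("../save/objects/" + filename + ".pkl", "rb") as f:
+    data = pickle.load(f)
+
+train_loss, accuracy = data[0], data[1]
+plt.plot(accuracy)
+plt.xlabel("Communication round")
+plt.ylabel("Accuracy")
+plt.show()
+```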
 
 
-#### Federated Experiment:
-The experiment involves training a global model in the federated setting.
 
 
-Federated parameters (default values):
-* ```Fraction of users (C)```: 0.1 
-* ```Local Batch size  (B)```: 10 
-* ```Local Epochs      (E)```: 10 
-* ```Optimizer            ```: SGD 
-* ```Learning Rate        ```: 0.01 <br />
 
 
-```Table 2:``` Test accuracy after training for 10 global epochs with:
 
 
-| Model |    IID   | Non-IID (equal)|
-| ----- | -----    |----            |
-|  MLP  |  88.38%  |     73.49%     |
-|  CNN  |  97.28%  |     75.94%     |

+ 4 - 0
requirements.txt

@@ -0,0 +1,4 @@
+tqdm==4.39.0 
+numpy==1.15.4
+matplotlib==3.0.1
+tensorboardx==1.4

+ 0 - 0
save/objects/[7]HFL4_mnist_mlp_30_lr[0.01]_C[0.1]_iid[1]_E[1]_B[10].pkl → save/objects/[7]HFL8_mnist_mlp_30_lr[0.01]_C[0.1]_iid[1]_E[1]_B[10].pkl


File diff suppressed because it is too large
+ 50 - 34
src/.ipynb_checkpoints/Eval-checkpoint.ipynb


+ 53 - 21
src/Eval.ipynb

@@ -46,7 +46,7 @@
     "filename1 = \"[1]FL_mnist_mlp_468_C[0.1]_iid[1]_E[1]_B[10]\"\n",
     "filename2 = \"[3]HFL2_mnist_mlp_100_lr[0.01]_C[0.1]_iid[1]_E[1]_B[10]\"\n",
     "filename3 = \"[5]HFL4_mnist_mlp_100_lr[0.01]_C[0.1]_iid[1]_E[1]_B[10]\"\n",
-    "filename4 = \"[7]HFL4_mnist_mlp_30_lr[0.01]_C[0.1]_iid[1]_E[1]_B[10]\"\n",
+    "filename4 = \"[7]HFL8_mnist_mlp_30_lr[0.01]_C[0.1]_iid[1]_E[1]_B[10]\"\n",
     "\n",
     "with open(r\"../save/objects/\" + filename1 + \".pkl\", \"rb\") as input_file: data1 = pickle.load(input_file)\n",
     "with open(r\"../save/objects/\" + filename2 + \".pkl\", \"rb\") as input_file: data2 = pickle.load(input_file)\n",
@@ -451,35 +451,67 @@
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
+   "cell_type": "markdown",
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "source": [
+    "### Function to find out the number of communication rounds needed to exceed a certain prediction accuracy."
+   ]
   },
   {
-   "cell_type": "raw",
+   "cell_type": "code",
+   "execution_count": 4,
    "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "The number of global training round just greater than 97.0% : 66\n"
+     ]
+    }
+   ],
    "source": [
     "import pickle\n",
     "\n",
-    "filename1 = \"FL_cifar_cnn_300_lr[0.01]_C[0.1]_iid[1]_E[5]_B[50]\"\n",
+    "##### MNIST_MLP_IID\n",
+    "# filename1 = \"[1]FL_mnist_mlp_468_C[0.1]_iid[1]_E[1]_B[10]\"\n",
+    "# filename1 = \"[3]HFL2_mnist_mlp_100_lr[0.01]_C[0.1]_iid[1]_E[1]_B[10]\"\n",
+    "# filename1 = \"[5]HFL4_mnist_mlp_100_lr[0.01]_C[0.1]_iid[1]_E[1]_B[10]\"\n",
+    "# filename1 = \"[7]HFL8_mnist_mlp_30_lr[0.01]_C[0.1]_iid[1]_E[1]_B[10]\"\n",
+    "\n",
+    "##### MNIST_MLP_NON-IID\n",
+    "# filename1 = \"[2]FL_mnist_mlp_1196_lr[0.01]_C[0.1]_iid[0]_E[1]_B[10]\"\n",
+    "# filename1 = \"[4]HFL2_mnist_mlp_100_lr[0.01]_C[0.1]_iid[0]_E[1]_B[10]\"\n",
+    "# filename1 = \"[6]HFL4_mnist_mlp_150_lr[0.01]_C[0.1]_iid[0]_E[1]_B[10]\"\n",
+    "\n",
+    "##### MNIST_CNN_IID\n",
+    "filename1 = \"[9]FL_mnist_cnn_100_lr[0.01]_C[0.1]_iid[1]_E[1]_B[10]\"\n",
+    "# filename1 = \"[11]HFL2_mnist_cnn_100_lr[0.01]_C[0.1]_iid[1]_E[1]_B[10]\"\n",
+    "# filename1 = \"[13]HFL4_mnist_cnn_100_lr[0.01]_C[0.1]_iid[1]_E[1]_B[10]\"\n",
+    "# filename1 = \"[15]HFL8_mnist_cnn_30_lr[0.01]_C[0.1]_iid[1]_E[1]_B[10]\"\n",
+    "\n",
+    "##### MNIST_CNN_NON-IID\n",
+    "# filename1 = \"[10]FL_mnist_cnn_261_lr[0.01]_C[0.1]_iid[0]_E[1]_B[10]\"\n",
+    "# filename1 = \"[12]HFL2_mnist_cnn_100_lr[0.01]_C[0.1]_iid[0]_E[1]_B[10]\"\n",
+    "# filename1 = \"[14]HFL4_mnist_cnn_100_lr[0.01]_C[0.1]_iid[0]_E[1]_B[10]\"\n",
+    "# filename1 = \"[16]HFL8_mnist_cnn_30_lr[0.01]_C[0.1]_iid[0]_E[1]_B[10]\"\n",
+    "\n",
+    "##### CIFAR_CNN_IID\n",
+    "# filename1 = \"[20]FL_cifar_cnn_300_lr[0.01]_C[0.1]_iid[1]_E[5]_B[50]\"\n",
+    "# filename1 = \"[21]HFL2_cifar_cnn_100_lr[0.01]_C[0.1]_iid[1]_E[5]_B[50]\"\n",
+    "# filename1 = \"[22]HFL4_cifar_cnn_100_lr[0.01]_C[0.1]_iid[1]_E[5]_B[50]\"\n",
+    "# filename1 = \"[23]HFL8_cifar_cnn_100_lr[0.01]_C[0.1]_iid[1]_E[5]_B[50]\"\n",
+    "\n",
+    "\n",
+    "\n",
+    "# with open(r\"../save/objects_fp16/\" + filename1 + \".pkl\", \"rb\") as input_file: data = pickle.load(input_file)\n",
     "with open(r\"../save/objects/\" + filename1 + \".pkl\", \"rb\") as input_file: data = pickle.load(input_file)\n",
-    "        \n",
-    "# print(data)\n",
+    "\n",
     "trloss = data[0]\n",
     "tracc = data[1]\n",
-    "# testloss = data[2]\n",
-    "# print(len(trloss))\n",
-    "# (len(tracc))"
-   ]
-  },
-  {
-   "cell_type": "raw",
-   "metadata": {},
-   "source": [
-    "# using enumerate() + next() to find index of first element just greater than 80%\n",
-    "testacc = 0.63\n",
+    "\n",
+    "# using enumerate() + next() to find index of first element just greater than a certain percentage\n",
+    "testacc = 0.97\n",
     "res = next(x for x, val in enumerate(tracc) if val >= testacc) \n",
     "\n",
     "# printing result \n",
@@ -510,7 +542,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.9"
+   "version": "3.7.3"
   }
  },
  "nbformat": 4,

+ 1 - 1
src/Eval_fp16.ipynb

@@ -518,7 +518,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.9"
+   "version": "3.7.3"
   }
  },
  "nbformat": 4,

BIN
src/__pycache__/models.cpython-37.pyc


BIN
src/__pycache__/options.cpython-37.pyc


BIN
src/__pycache__/sampling.cpython-37.pyc


BIN
src/__pycache__/update.cpython-37.pyc


BIN
src/__pycache__/utils.cpython-37.pyc


+ 1 - 1
src/script_bash_FL_diffFP_mnist_cnn.sh

@@ -4,7 +4,7 @@
 # Website on how to write bash script https://hackernoon.com/know-shell-scripting-202b2fbe03a8
 
 # Set GPU device
-GPU_ID="cuda:1"
+GPU_ID="cuda:0"
 
 
 # ================ 32-bit ================ 

+ 1 - 1
src/script_bash_FL_diffFP_mnist_mlp.sh

@@ -4,7 +4,7 @@
 # Website on how to write bash script https://hackernoon.com/know-shell-scripting-202b2fbe03a8
 
 # Set GPU device
-GPU_ID="cuda:1"
+GPU_ID="cuda:0"
 
 # ================ 32-bit ================ 
 # This is for FL for 32-bit floating point

Some files were not shown because too many files changed in this diff