---
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
library_name: transformers
pipeline_tag: text-generation
tags:
- facebook
- meta
- pytorch
- llama
- llama-3
license: llama3.2
---

# Evolution Learning Network (ELN) with QLoRA and Genetic Algorithms for LLMs

## Overview

This project implements an **Evolution Learning Network (ELN)** to fine-tune transformer-based models such as LLaMA using a combination of **Quantized Low-Rank Adaptation (QLoRA)** and **Genetic Algorithms (GA)**. The primary objective is to evolve a population of models across multiple generations, optimizing for performance (fitness) and specialization while maintaining diversity.

### Key Features
- Efficient model fine-tuning using **QLoRA** with 4-bit quantization
- Evolutionary strategies with tournament selection and blended crossover
- Adaptive mutation rates based on generation progress
- Comprehensive experiment tracking with **WandB**
- Diversity maintenance through LoRA weight fingerprinting (see the sketch below)

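The card does not spell out how fingerprinting works; one plausible reading is to flatten each member's LoRA weights into a vector and compare vectors across the population. A minimal sketch under that assumption (the function names `lora_fingerprint` and `pairwise_diversity` are illustrative, not from the project):

```python
import torch

def lora_fingerprint(model) -> torch.Tensor:
    """Flatten all LoRA adapter weights into one comparison vector."""
    parts = [p.detach().float().flatten().cpu()
             for name, p in model.named_parameters() if "lora_" in name]
    return torch.cat(parts)

def pairwise_diversity(population) -> float:
    """Mean pairwise cosine distance between population fingerprints."""
    prints = [lora_fingerprint(m) for m in population]
    dists = [1.0 - torch.nn.functional.cosine_similarity(a, b, dim=0)
             for i, a in enumerate(prints) for b in prints[i + 1:]]
    return torch.stack(dists).mean().item()
```
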
## Model Details

### Base Model
- **Name**: [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
- **Architecture**: Transformer-based causal language model

### Quantization Configuration
- **Type**: 4-bit quantization using `bitsandbytes`
- **Parameters**:
  - Compute dtype: `torch.float16`
  - Quantization type: `"nf4"` (NormalFloat 4-bit)
  - Double (nested) quantization: enabled

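In `transformers`, these settings correspond to a `BitsAndBytesConfig` along the following lines (a sketch assembled from the bullet points above, not the project's exact code):

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with nested (double) quantization enabled
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
```
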
### LoRA Configuration
- **Rank (r)**: 8
- **Alpha**: 16
- **Target Modules**: `q_proj`, `v_proj`
- **Dropout**: 0.05
- **Task Type**: `CAUSAL_LM`

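Expressed as a `peft` `LoraConfig`, this is roughly (a sketch from the values above):

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
# Attach the adapter to a (quantized) base model:
# model = get_peft_model(base_model, lora_config)
```
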
### Training Configuration
- **Optimizer**: `paged_adamw_8bit`
- **Precision**: Mixed precision (`fp16`)
- **Batch Size Range**: 2-16 (genome-controlled)
- **Learning Rate Range**: 1e-6 to 1e-2 (genome-controlled)
- **Epochs Range**: 1-4 (genome-controlled)

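Since the hyperparameters are genome-controlled, each individual trains with its own values drawn from these ranges. A hedged sketch of how a genome might map onto `TrainingArguments` (the example genome values are illustrative, not from the experiments):

```python
from transformers import TrainingArguments

# Example genome; actual values are sampled per individual
# (see the Evolution Process section below)
genome = {"learning_rate": 2e-4, "batch_size": 4, "epochs": 2}

training_args = TrainingArguments(
    output_dir="./eln-checkpoints",
    optim="paged_adamw_8bit",
    fp16=True,
    learning_rate=genome["learning_rate"],
    per_device_train_batch_size=genome["batch_size"],
    num_train_epochs=genome["epochs"],
)
```
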
## Dataset

### Source
- **Name**: WikiText-2 Raw
- **Configuration**: `wikitext-2-raw-v1`
- **Processing**:
  - Max Length: 128 tokens
  - Padding: Fixed to max length
  - Splits: train, validation (general), test (specific)

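The preprocessing described above maps onto the `datasets` library roughly as follows (a sketch, assuming the standard `wikitext` loader and the base model's tokenizer):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer.pad_token = tokenizer.eos_token  # LLaMA defines no pad token

def tokenize(batch):
    # Fixed-length padding/truncation to 128 tokens, as noted above
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
```
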
## Evolution Process

### Population Management
1. **Initialization**:
   - Population Size: 6 models
   - Initial random mutations (20% rate)
   - Randomized hyperparameter genomes (see the sketch below)

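A minimal sketch of genome initialization under these settings (the sampling scheme, e.g. log-uniform learning rates, is an assumption):

```python
import random

def random_genome():
    """Sample one hyperparameter genome from the ranges listed above."""
    return {
        "learning_rate": 10 ** random.uniform(-6, -2),  # 1e-6 to 1e-2
        "batch_size": random.choice([2, 4, 8, 16]),      # 2-16
        "epochs": random.randint(1, 4),                  # 1-4
    }

population = [random_genome() for _ in range(6)]  # population size 6
```
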
2. **Selection & Evolution** (operators sketched below):
   - Tournament selection (k=3)
   - Blended crossover of LoRA weights
   - Adaptive mutation rate that decays across generations
   - Hyperparameter mutation within controlled ranges

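A condensed sketch of these operators (the function names, blend formula, and noise scale are assumptions; the card describes the operators only at a high level):

```python
import random
import torch

def tournament_select(population, fitnesses, k=3):
    """Return the fittest of k randomly drawn individuals."""
    contenders = random.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitnesses[i])]

def blend_crossover(lora_a, lora_b):
    """Blend two parents' LoRA state dicts: child = a*A + (1-a)*B."""
    alpha = random.random()
    return {name: alpha * w + (1.0 - alpha) * lora_b[name]
            for name, w in lora_a.items()}

def mutate(lora_state, generation, max_generations, base_rate=0.2):
    """Adaptive mutation: noise magnitude decays with generation."""
    rate = base_rate * (1.0 - generation / max_generations)
    return {name: w + rate * 0.01 * torch.randn_like(w)
            for name, w in lora_state.items()}
```
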
## Experimental Results

### Evolution Progress

The evolutionary learning process was run for 8 generations with a population size of 6 models. The experiment tracked several key metrics across generations:

Evolution Metrics
<div style="display: flex;">
<img src="https://huggingface.co/diabolic6045/ELN-llama-1B-adapter/resolve/main/images/output.png" alt="Evolution Metrics" style="width: 50%; height: 50%;"/>
</div>

#### Fitness Progression
- **Initial Performance**: Best fitness started at ~0.480 (Generation 1)
- **Convergence**: Gradual decline to ~0.476 by Generation 8
- **Population Stability**: Average fitness closely tracked best fitness after Generation 2, indicating good convergence
- **Fitness Range**: Maintained between 0.476 and 0.480 throughout evolution

#### Specialization Trends
- **High Baseline**: Started at ~0.9975 specialization
- **Consistency**: Fluctuated minimally between 0.9975 and 0.9990
- **Peak Performance**: Reached ~0.9991 specialization in Generation 6
- **Population Average**: Maintained above 0.997 throughout evolution

### Comparison with Standard Training



The comparison reveals several key differences between ELN and standard training:

#### Fitness Metrics
- **ELN**: 0.4762 final fitness with stable progression
- **Standard**: 0.4779 final fitness with a steeper learning curve
- **Difference**: ~0.3% performance gap, favoring standard training

#### Training Characteristics
- **Loss Reduction**:
  - Standard: Sharp initial drop followed by gradual improvement
  - ELN: More controlled, stable descent
- **Specialization**:
  - Standard: More variable specialization scores
  - ELN: Consistently high specialization maintenance

#### Key Advantages of ELN
1. More stable learning trajectory
2. Better maintenance of model diversity
3. Consistent specialization scores
4. Reduced risk of catastrophic forgetting

## Hardware & Framework Requirements

### Hardware
- Multi-GPU support via `DistributedDataParallel`
- Memory optimization through gradient accumulation
- Hardware monitoring (CPU/GPU usage)

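When launched with `torchrun`, the `transformers` `Trainer` wraps the model in `DistributedDataParallel` automatically, and gradient accumulation is a `TrainingArguments` flag. A brief sketch (the accumulation factor is an assumption, not the project's setting):

```python
from transformers import TrainingArguments

# Effective batch = per_device_batch x accumulation_steps x num_gpus;
# accumulation trades extra steps for lower peak memory.
args = TrainingArguments(
    output_dir="./eln-checkpoints",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    fp16=True,
)
```
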
### Dependencies
- transformers
- peft
- bitsandbytes
- accelerate
- wandb
- torch >= 2.0

## Usage

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("diabolic6045/ELN-Llama-1B")
model = AutoModelForCausalLM.from_pretrained("diabolic6045/ELN-Llama-1B")
```

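Continuing from the snippet above, a minimal generation example (the prompt and decoding settings are illustrative):

```python
inputs = tokenizer("The theory of evolution", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50,
                         do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
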
## Framework Versions
- PEFT 0.14.0

## Future Work
- Explore larger population sizes and more generations
- Implement additional mutation strategies
- Test on diverse datasets and tasks
- Investigate multi-objective optimization

---

## Citation

If you use this work, please cite:

```bibtex
@misc{eln2024,
  title={Evolution Learning Network (ELN): Combining QLoRA and Genetic Algorithms for LLM Optimization},
  year={2024},
  howpublished={\url{https://github.com/diabolic6045/ELN-llama-1B-adapter}}
}
```

### Related Works

This project builds upon several key papers and techniques:

```bibtex
@article{dettmers2023qlora,
  title={QLoRA: Efficient Finetuning of Quantized LLMs},
  author={Dettmers, Tim and Pagnoni, Artidoro and Holtzman, Ari and Zettlemoyer, Luke},
  journal={arXiv preprint arXiv:2305.14314},
  year={2023}
}

@article{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023}
}

@article{such2017deep,
  title={Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning},
  author={Such, Felipe Petroski and Madhavan, Vashisht and Conti, Edoardo and Lehman, Joel and Stanley, Kenneth O and Clune, Jeff},
  journal={arXiv preprint arXiv:1712.06567},
  year={2017}
}

@article{real2019regularized,
  title={Regularized Evolution for Image Classifier Architecture Search},
  author={Real, Esteban and Aggarwal, Alok and Huang, Yanping and Le, Quoc V},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={33},
  number={01},
  pages={4780--4789},
  year={2019}
}
```

These citations cover:
1. The QLoRA quantization and fine-tuning technique
2. The base LLaMA model architecture
3. Deep neuroevolution fundamentals
4. Regularized evolution in neural networks

The implementation also draws inspiration from recent advances in evolutionary algorithms and neural architecture search.

~ [diabolic6045](https://huggingface.co/diabolic6045)

---