Commit e025bf51 authored by MMZK1526

length grouping tuning

parent 5160d7ec
{"learning_rate": 1.4077615408330847e-05, "per_device_train_batch_size": 16, "weight_decay": 0.02877179303495793, "upsample_factor": 10, "hypertune_trials": 4}
\ No newline at end of file
@@ -40,6 +40,7 @@ class TuningToggle:
     logging_strategy="epoch",
     load_best_model_at_end=False,
     push_to_hub=False,
+    group_by_length=True,
     gradient_checkpointing=True,
     gradient_checkpointing_kwargs={"use_reentrant": False},
     fp16=self.env.device == "cuda:0",
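The new `group_by_length=True` flag asks the `Trainer` to batch samples of similar length together, which cuts the number of padding tokens per batch. A rough sketch of the idea in plain Python (not the exact `transformers` sampler; the `mega_mult` window and both helper names here are illustrative):

```python
import random


def length_grouped_batches(lengths, batch_size, mega_mult=4, seed=0):
    """Chunk a shuffled dataset into windows of batch_size * mega_mult
    samples, sort each window by length, then cut it into batches, so
    every batch contains samples of similar length."""
    rng = random.Random(seed)
    indices = list(range(len(lengths)))
    rng.shuffle(indices)  # keep batch composition random across epochs
    mega = batch_size * mega_mult
    megabatches = [
        sorted(indices[i:i + mega], key=lambda j: lengths[j], reverse=True)
        for i in range(0, len(indices), mega)
    ]
    return [mb[i:i + batch_size]
            for mb in megabatches
            for i in range(0, len(mb), batch_size)]


def padding_cost(lengths, batches):
    """Padding tokens wasted when each batch is padded to its longest sample."""
    return sum(len(b) * max(lengths[i] for i in b) - sum(lengths[i] for i in b)
               for b in batches)
```

On variable-length data, batches produced this way waste far fewer padding tokens than batches drawn in arbitrary order, at the cost of slightly less randomness in batch composition.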
@@ -72,7 +72,7 @@ if __name__ == '__main__':
     optuna.logging.disable_default_handler()
     # Perform tuning
-    configs_optim: dict[str, any] = get_optimal_hyperparameters(env, [7, 6, 5, 4, 3])
+    configs_optim: dict[str, any] = get_optimal_hyperparameters(env, range(10, 0, -1))
     # Save optimal hyperparameters
     configs.update(configs_optim)
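The second hunk widens the sweep from five upsample factors to all of `range(10, 0, -1)`, i.e. 10 down to 1. If `upsample_factor` works the usual way, repeating minority-class examples to rebalance the training set, its effect can be sketched as follows (the `upsample` helper is hypothetical; the project's actual preprocessing may differ):

```python
def upsample(examples, labels, minority_label, factor):
    """Repeat each minority-class example `factor` times; keep the rest as-is.
    Hypothetical helper illustrating the role of upsample_factor."""
    out = []
    for x, y in zip(examples, labels):
        out.extend([(x, y)] * (factor if y == minority_label else 1))
    return out


# The tuning script now tries every factor from 10 down to 1,
# running one hyperparameter study per factor.
factors = list(range(10, 0, -1))  # [10, 9, 8, ..., 1]
```

Sweeping the full range instead of a hand-picked subset lets the logs below compare the best trial at every factor, at the cost of twice as many studies.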
upsample_factor = 10
A new study created in memory with name: no-name-7c96a4d5-6148-4df4-a294-1179cae2055e
Trial 0 finished with value: 0.45714285714285713 and parameters: {'learning_rate': 6.557028252243745e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.04407276262748085}. Best is trial 0 with value: 0.45714285714285713.
Trial 1 finished with value: 0.17329700272479565 and parameters: {'learning_rate': 6.413238689860444e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.02920115661456153}. Best is trial 0 with value: 0.45714285714285713.
Trial 2 finished with value: 0.546448087431694 and parameters: {'learning_rate': 6.124421924326035e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.010762306661823779}. Best is trial 2 with value: 0.546448087431694.
Trial 3 finished with value: 0.19290603609209708 and parameters: {'learning_rate': 2.765261435850596e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.013009093900656282}. Best is trial 2 with value: 0.546448087431694.
Trial 4 finished with value: 0.5882352941176471 and parameters: {'learning_rate': 1.4077615408330847e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.02877179303495793}. Best is trial 4 with value: 0.5882352941176471.
Trial 5 pruned.
Trial 6 finished with value: 0.4540727902946274 and parameters: {'learning_rate': 5.8731410409653775e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.015471102487484931}. Best is trial 4 with value: 0.5882352941176471.
Trial 7 finished with value: 0.5734597156398105 and parameters: {'learning_rate': 2.4447986545462176e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.035863304784735456}. Best is trial 4 with value: 0.5882352941176471.
Trial 8 finished with value: 0.5847665847665847 and parameters: {'learning_rate': 3.340059985476931e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.015305847978146655}. Best is trial 4 with value: 0.5882352941176471.
Trial 9 finished with value: 0.5478841870824054 and parameters: {'learning_rate': 2.0635414977246645e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.031100180566782917}. Best is trial 4 with value: 0.5882352941176471.
upsample_factor = 9
Parameter 'function'=<function preprocess_train_data.<locals>.<lambda> at 0x39c086280> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
A new study created in memory with name: no-name-0ccb0d0e-4b71-4960-bea3-789aa8ba6b18
Trial 0 finished with value: 0.17329700272479565 and parameters: {'learning_rate': 9.954541926196344e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.01490528283886534}. Best is trial 0 with value: 0.17329700272479565.
Trial 1 finished with value: 0.5329512893982808 and parameters: {'learning_rate': 4.784285686099306e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.012322486510839871}. Best is trial 1 with value: 0.5329512893982808.
Trial 2 finished with value: 0.0 and parameters: {'learning_rate': 9.509642977304561e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.021072877821663668}. Best is trial 1 with value: 0.5329512893982808.
Trial 3 finished with value: 0.3248587570621469 and parameters: {'learning_rate': 9.373088741002167e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.03238481129957002}. Best is trial 1 with value: 0.5329512893982808.
Trial 4 finished with value: 0.5136612021857924 and parameters: {'learning_rate': 4.056211538712784e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.039615964114542136}. Best is trial 1 with value: 0.5329512893982808.
Trial 5 finished with value: 0.5519125683060109 and parameters: {'learning_rate': 3.7075199885330543e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.04446407591422239}. Best is trial 5 with value: 0.5519125683060109.
Trial 6 pruned.
Trial 7 pruned.
Trial 8 finished with value: 0.5132743362831859 and parameters: {'learning_rate': 8.444219476201994e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.03366218091814306}. Best is trial 5 with value: 0.5519125683060109.
Trial 9 finished with value: 0.49624060150375937 and parameters: {'learning_rate': 4.711476115400386e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.03854921555275274}. Best is trial 5 with value: 0.5519125683060109.
upsample_factor = 8
A new study created in memory with name: no-name-5f314267-4280-42e3-ac3c-4cc1e835483e
Trial 0 finished with value: 0.49877750611246946 and parameters: {'learning_rate': 3.589066370534868e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.018082287586032506}. Best is trial 0 with value: 0.49877750611246946.
Trial 1 finished with value: 0.5146198830409356 and parameters: {'learning_rate': 5.62727386977522e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.04847598202130434}. Best is trial 1 with value: 0.5146198830409356.
Trial 2 finished with value: 0.4636363636363636 and parameters: {'learning_rate': 9.747238714996074e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.040733260685744746}. Best is trial 1 with value: 0.5146198830409356.
Trial 3 finished with value: 0.39365079365079364 and parameters: {'learning_rate': 9.474461397196971e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.007843793437314107}. Best is trial 1 with value: 0.5146198830409356.
Trial 4 finished with value: 0.5433526011560693 and parameters: {'learning_rate': 4.722622411238557e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.04942944425181867}. Best is trial 4 with value: 0.5433526011560693.
Trial 5 finished with value: 0.522911051212938 and parameters: {'learning_rate': 4.536249708535563e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.03436031176189035}. Best is trial 4 with value: 0.5433526011560693.
Trial 6 finished with value: 0.5327313769751693 and parameters: {'learning_rate': 1.3568503929506977e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.017373027794766037}. Best is trial 4 with value: 0.5433526011560693.
Trial 7 finished with value: 0.521978021978022 and parameters: {'learning_rate': 3.439123458164407e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.022057440753831918}. Best is trial 4 with value: 0.5433526011560693.
Trial 8 pruned.
Trial 9 pruned.
upsample_factor = 7
A new study created in memory with name: no-name-e09cbd51-4df8-47f5-be3f-c261a807167e
Trial 0 finished with value: 0.5102880658436214 and parameters: {'learning_rate': 1.902731450949942e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.02471114471697448}. Best is trial 0 with value: 0.5102880658436214.
Trial 1 finished with value: 0.5417721518987342 and parameters: {'learning_rate': 2.965186187342516e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.037159130383161404}. Best is trial 1 with value: 0.5417721518987342.
Trial 2 finished with value: 0.5454545454545454 and parameters: {'learning_rate': 1.4697902614155984e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.01830956819247998}. Best is trial 2 with value: 0.5454545454545454.
Trial 3 finished with value: 0.46924829157175396 and parameters: {'learning_rate': 7.346014332050937e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.04412779906595838}. Best is trial 2 with value: 0.5454545454545454.
Trial 4 finished with value: 0.5316455696202531 and parameters: {'learning_rate': 3.035478855902646e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.04301949606167546}. Best is trial 2 with value: 0.5454545454545454.
Trial 5 finished with value: 0.5153664302600472 and parameters: {'learning_rate': 5.992413595044476e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.03691148789176883}. Best is trial 2 with value: 0.5454545454545454.
Trial 6 finished with value: 0.5436893203883495 and parameters: {'learning_rate': 2.4583568349440317e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.03955409241521041}. Best is trial 2 with value: 0.5454545454545454.
Trial 7 pruned.
Trial 8 pruned.
Trial 9 pruned.
upsample_factor = 6
A new study created in memory with name: no-name-63e4f5f5-2428-4896-b356-3bc5fd7257f9
Trial 0 finished with value: 0.37037037037037035 and parameters: {'learning_rate': 9.695288076642754e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.02982685675074162}. Best is trial 0 with value: 0.37037037037037035.
Trial 1 finished with value: 0.27586206896551724 and parameters: {'learning_rate': 8.149726684689566e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.03993784549540626}. Best is trial 0 with value: 0.37037037037037035.
Trial 2 finished with value: 0.5323383084577115 and parameters: {'learning_rate': 2.6974644137563597e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.0104687679011848}. Best is trial 2 with value: 0.5323383084577115.
Trial 3 finished with value: 0.5384615384615384 and parameters: {'learning_rate': 2.723729186057461e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.024438821785084403}. Best is trial 3 with value: 0.5384615384615384.
Trial 4 finished with value: 0.0 and parameters: {'learning_rate': 9.703030690261806e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.032056494532394716}. Best is trial 3 with value: 0.5384615384615384.
Trial 5 finished with value: 0.5458823529411765 and parameters: {'learning_rate': 2.2881441665983104e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.031101249628729215}. Best is trial 5 with value: 0.5458823529411765.
Trial 6 finished with value: 0.5526932084309133 and parameters: {'learning_rate': 2.1336398109721225e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.04448289819295083}. Best is trial 6 with value: 0.5526932084309133.
Trial 7 finished with value: 0.5164319248826291 and parameters: {'learning_rate': 1.7975835615121758e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.023336255769612936}. Best is trial 6 with value: 0.5526932084309133.
Trial 8 finished with value: 0.4732824427480916 and parameters: {'learning_rate': 5.3340758174770044e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.006534224628853531}. Best is trial 6 with value: 0.5526932084309133.
Trial 9 pruned.
upsample_factor = 5
A new study created in memory with name: no-name-f08a7aca-8a3b-42c3-a33d-99eaec453372
Trial 0 finished with value: 0.5310734463276836 and parameters: {'learning_rate': 4.4387232166394996e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.013339920202184906}. Best is trial 0 with value: 0.5310734463276836.
Trial 1 finished with value: 0.46924829157175396 and parameters: {'learning_rate': 9.666224302023454e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.011118018840077615}. Best is trial 0 with value: 0.5310734463276836.
Trial 2 finished with value: 0.5181347150259067 and parameters: {'learning_rate': 6.060421129576445e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.011416516862050245}. Best is trial 0 with value: 0.5310734463276836.
Trial 3 finished with value: 0.5340909090909091 and parameters: {'learning_rate': 2.4938165559520836e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.015197242683874655}. Best is trial 3 with value: 0.5340909090909091.
Trial 4 finished with value: 0.5068870523415978 and parameters: {'learning_rate': 7.956200029434267e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.04315218443594787}. Best is trial 3 with value: 0.5340909090909091.
Trial 5 finished with value: 0.5329341317365269 and parameters: {'learning_rate': 3.2199876296449555e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.04009655776222147}. Best is trial 3 with value: 0.5340909090909091.
Trial 6 pruned.
Trial 7 finished with value: 0.5033407572383074 and parameters: {'learning_rate': 1.5806461022098265e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.023411504744949335}. Best is trial 3 with value: 0.5340909090909091.
Trial 8 pruned.
Trial 9 finished with value: 0.5459610027855153 and parameters: {'learning_rate': 5.6674668238020006e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.04056942318138311}. Best is trial 9 with value: 0.5459610027855153.
upsample_factor = 4
A new study created in memory with name: no-name-f0b016f6-9218-4201-9ae2-08933d99d3c2
Trial 0 finished with value: 0.46965699208443273 and parameters: {'learning_rate': 9.6161206009484e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.035497138927888534}. Best is trial 0 with value: 0.46965699208443273.
Trial 1 finished with value: 0.5423728813559322 and parameters: {'learning_rate': 1.9807541433287844e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.03210059830744783}. Best is trial 1 with value: 0.5423728813559322.
Trial 2 finished with value: 0.5114942528735632 and parameters: {'learning_rate': 6.501942459982723e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.026694186754259098}. Best is trial 1 with value: 0.5423728813559322.
Trial 3 finished with value: 0.5136612021857924 and parameters: {'learning_rate': 3.8784810098615075e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.024001017033869056}. Best is trial 1 with value: 0.5423728813559322.
Trial 4 finished with value: 0.3713355048859935 and parameters: {'learning_rate': 9.516530186618934e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.027403298986986062}. Best is trial 1 with value: 0.5423728813559322.
Trial 5 finished with value: 0.5458937198067633 and parameters: {'learning_rate': 2.244665924939244e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.026304550566617693}. Best is trial 5 with value: 0.5458937198067633.
Trial 6 pruned.
Trial 7 pruned.
Trial 8 pruned.
Trial 9 pruned.
upsample_factor = 3
A new study created in memory with name: no-name-5f6592f5-1b4b-4f3d-b290-42cdcc9042af
Trial 0 finished with value: 0.46900269541778977 and parameters: {'learning_rate': 6.361313395697918e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.0197911070344553}. Best is trial 0 with value: 0.46900269541778977.
Trial 1 finished with value: 0.5454545454545454 and parameters: {'learning_rate': 3.365794135236168e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.02302977301759517}. Best is trial 1 with value: 0.5454545454545454.
Trial 2 finished with value: 0.5387205387205387 and parameters: {'learning_rate': 6.683884950467047e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.01843882619023822}. Best is trial 1 with value: 0.5454545454545454.
Trial 3 finished with value: 0.49291784702549574 and parameters: {'learning_rate': 8.323445198628008e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.031108901172009224}. Best is trial 1 with value: 0.5454545454545454.
Trial 4 finished with value: 0.0 and parameters: {'learning_rate': 7.760943317501582e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.033441028147665725}. Best is trial 1 with value: 0.5454545454545454.
Trial 5 pruned.
Trial 6 finished with value: 0.5502645502645502 and parameters: {'learning_rate': 1.9230634004760798e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.035847671145160284}. Best is trial 6 with value: 0.5502645502645502.
Trial 7 pruned.
Trial 8 finished with value: 0.5480225988700564 and parameters: {'learning_rate': 2.9216054389094702e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.011749074981149181}. Best is trial 6 with value: 0.5502645502645502.
Trial 9 pruned.
upsample_factor = 2
A new study created in memory with name: no-name-5d2efbf7-6570-4483-8528-53430fb9217d
Trial 0 finished with value: 0.5380116959064327 and parameters: {'learning_rate': 3.926921232322188e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.0372830263279526}. Best is trial 0 with value: 0.5380116959064327.
Trial 1 finished with value: 0.5285285285285285 and parameters: {'learning_rate': 5.4713799954531994e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.017216088699518507}. Best is trial 0 with value: 0.5380116959064327.
Trial 2 finished with value: 0.49504950495049505 and parameters: {'learning_rate': 3.546966782096452e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.014736501397552806}. Best is trial 0 with value: 0.5380116959064327.
Trial 3 finished with value: 0.5 and parameters: {'learning_rate': 3.356584838681229e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.028432065654267785}. Best is trial 0 with value: 0.5380116959064327.
Trial 4 finished with value: 0.5313432835820896 and parameters: {'learning_rate': 2.6722245870579097e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.006341604817314253}. Best is trial 0 with value: 0.5380116959064327.
Trial 5 pruned.
Trial 6 pruned.
Trial 7 pruned.
Trial 8 pruned.
Trial 9 pruned.
upsample_factor = 1
A new study created in memory with name: no-name-36361def-6398-4b4b-a063-0a7974d05f17
Trial 0 finished with value: 0.0 and parameters: {'learning_rate': 7.251175362119161e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.01440468074332283}. Best is trial 0 with value: 0.0.
Trial 1 finished with value: 0.4 and parameters: {'learning_rate': 5.716502419661862e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.025131379526653827}. Best is trial 1 with value: 0.4.
Trial 2 finished with value: 0.35514018691588783 and parameters: {'learning_rate': 4.824475575204772e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.029609047520860594}. Best is trial 1 with value: 0.4.
Trial 3 finished with value: 0.47191011235955055 and parameters: {'learning_rate': 2.2754144045293067e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.04750797379159929}. Best is trial 3 with value: 0.47191011235955055.
Trial 4 finished with value: 0.39823008849557523 and parameters: {'learning_rate': 1.75236459819577e-05, 'per_device_train_batch_size': 32, 'weight_decay': 0.04100932515833351}. Best is trial 3 with value: 0.47191011235955055.
Trial 5 pruned.
Trial 6 pruned.
Trial 7 pruned.
Trial 8 pruned.
Trial 9 finished with value: 0.3373493975903614 and parameters: {'learning_rate': 9.628715531810688e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.04241237939051233}. Best is trial 3 with value: 0.47191011235955055.
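The logs above come from one Optuna study per `upsample_factor`. Judging by the trial parameters, the search space spans a log-uniform learning rate in roughly [1e-5, 1e-4], a batch size in {16, 32}, and a weight decay in roughly [0.005, 0.05]; the objective is maximized, since "Best" always points at the trial with the highest value. A minimal random-search stand-in for that loop (bounds inferred from the logs, so they are assumptions; the real run uses Optuna's sampler and pruning, which this sketch omits):

```python
import random


def sample_params(rng):
    """Sample one configuration from the search space implied by the logs."""
    return {
        "learning_rate": 10 ** rng.uniform(-5, -4),      # log-uniform in [1e-5, 1e-4]
        "per_device_train_batch_size": rng.choice([16, 32]),
        "weight_decay": rng.uniform(0.005, 0.05),
    }


def random_search(objective, n_trials=10, seed=0):
    """Maximize `objective` over `n_trials` random configurations."""
    rng = random.Random(seed)
    best_params, best_value = None, float("-inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        value = objective(params)
        if value > best_value:
            best_params, best_value = params, value
    return best_params, best_value
```

Unlike this sketch, Optuna's TPE sampler focuses later trials near promising regions, and its pruner stops unpromising trials early (the "Trial N pruned." lines above).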