Short answer: To use NVIDIA GPUs for AI training, first confirm that the driver and GPU are visible with nvidia-smi, then install a compatible framework/CUDA stack and run a tiny "model + batch on cuda" test. If you hit out-of-memory errors, reduce the batch size and use mixed precision, and monitor utilization, memory, and temperatures.
Key takeaways:
- Basic checks: Start with nvidia-smi; fix driver visibility before installing frameworks.
- Stack compatibility: Keep the driver, CUDA runtime, and framework versions aligned to prevent crashes and broken installs.
- Small win: Confirm that a single forward pass runs on CUDA before scaling up experiments.
- VRAM discipline: Lean on mixed precision, gradient accumulation, and checkpointing to fit larger models.
- Monitoring habit: Track utilization, memory patterns, power, and temperatures so you spot bottlenecks early.

1) The big picture - what you're actually doing when you "train on a GPU" 🧠⚡
When you train AI models, you are doing an enormous amount of matrix math. GPUs are built for that kind of parallel work, so frameworks like PyTorch, TensorFlow, and JAX can offload the heavy lifting to the GPU. (PyTorch CUDA docs, TensorFlow install (pip), JAX Quickstart)
In practice, "using NVIDIA GPUs for training" means:
- Your model parameters live (mostly) in GPU VRAM
- Your batches are moved from RAM to VRAM at every step
- Your forward and backward passes run on CUDA kernels (CUDA Programming Guide)
- Your optimizer updates happen on the GPU (usually)
- You monitor temperature, memory, and utilization so you don't cook anything 🔥 (NVIDIA nvidia-smi docs)
If that sounds like a lot, don't worry. It's mostly a checklist and a few habits you build over time.
2) What makes a good NVIDIA GPU AI training setup 🤌
This is the "don't build a house on jelly" section. A good setup for AI training is a low-drama one. Low-drama is stable. Stable is fast. Fast is… well, fast 😄
A solid training setup usually looks like this:
- Enough VRAM for your batch size + model + optimizer states
  - VRAM is like suitcase space. You can pack smarter, but you can't pack infinitely.
- A matched software stack (driver + CUDA runtime + framework compatibility) (PyTorch Get Started (CUDA selector), TensorFlow install (pip))
- Fast storage (NVMe helps a lot with large datasets)
- A decent CPU and enough RAM so data loading doesn't starve the GPU (PyTorch Performance Tuning Guide)
- Cooling and power headroom (underrated until it isn't 😬)
- A reproducible environment (venv/conda, or containers) so upgrades don't turn into chaos (NVIDIA Container Toolkit overview)
And one more thing people skip:
- A monitoring habit - you check GPU memory and utilization the way you check your mirrors while driving. (NVIDIA nvidia-smi docs)
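The "batch size + model + optimizer states" budget above can be turned into a back-of-the-envelope number. The sketch below is only a rough lower bound under stated assumptions: FP32 weights and gradients (4 bytes each), an Adam-style optimizer that keeps two extra values per parameter, and no accounting for activations, which depend on batch size and architecture. The function name and the 1-billion-parameter model are hypothetical.

```python
def estimate_training_vram_gib(num_params, bytes_per_param=4, optimizer_states_per_param=2):
    """Rough lower bound: weights + gradients + optimizer states, ignoring activations."""
    weights = num_params * bytes_per_param
    gradients = num_params * bytes_per_param
    optimizer = num_params * bytes_per_param * optimizer_states_per_param
    return (weights + gradients + optimizer) / 1024**3


# Hypothetical 1-billion-parameter model trained in FP32 with an Adam-style optimizer.
estimate = estimate_training_vram_gib(1_000_000_000)
print(f"{estimate:.1f} GiB before activations")  # prints "14.9 GiB before activations"
```

If the estimate is already close to your card's VRAM, the batch will not fit on top of it, which is usually the cue to reach for mixed precision, a lighter optimizer, or a smaller model.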
3) Comparison table - popular ways to train with NVIDIA GPUs (with quirks) 📊
Below is a quick cheat sheet for "which one fits?". Prices are approximate (because real life varies), and yes, one of these cells is a bit rambly, on purpose.

| Tool / Approach | Best for | Price | Why it works (mostly) |
|---|---|---|---|
| PyTorch (vanilla) (PyTorch) | most people, most projects | Free | Flexible, huge ecosystem, easy to debug - also, everyone has opinions |
| PyTorch Lightning (Lightning docs) | teams, structured training | Free | Cuts boilerplate, cleaner loops; sometimes feels like "magic", until it doesn't |
| Hugging Face Transformers + Trainer (Trainer docs) | NLP + LLM fine-tuning | Free | Batteries-included training, good defaults, quick wins 👍 |
| Accelerate (Accelerate docs) | multi-GPU without the pain | Free | Takes the sting out of DDP, good for scaling up without rewriting everything |
| DeepSpeed (ZeRO docs) | large models, memory tricks | Free | ZeRO, offload, scaling - fiddly but satisfying when it clicks |
| TensorFlow + Keras (TF) | production pipelines | Free | Solid tooling, good deployment story; some love it, some don't |
| JAX + Flax (JAX Quickstart / Flax docs) | research + speed enthusiasts | Free | XLA compilation can be very fast, but debugging can be… awkward |
| NVIDIA NeMo (NeMo overview) | speech + LLM workflows | Free | NVIDIA-optimized stack, good recipes - like cooking with a fancy oven 🍳 |
| Docker + NVIDIA Container Toolkit (Toolkit overview) | reproducible environments | Free | "Works on my machine" becomes "works on our machines" (mostly, again) |

4) Step one - confirm your GPU is properly visible 🕵️
Before you install a dozen things, check the basics.
What you want to be true:
- The machine sees the GPU
- The NVIDIA driver is installed correctly
- The GPU isn't stuck doing something else
- You can query it reliably
The classic check is:
- nvidia-smi (NVIDIA nvidia-smi docs)
What you're looking for:
- GPU name (for example RTX, A-series, and so on)
- Driver version
- Memory usage
- Running processes (NVIDIA nvidia-smi docs)
If nvidia-smi fails, stop there. Don't install frameworks yet. It's like trying to bake bread when your oven isn't plugged in. (NVIDIA System Management Interface (NVSMI))
A small note: sometimes nvidia-smi works but your training still fails, because the CUDA runtime your framework uses isn't supported by the installed driver (typically the driver is too old for that runtime). That doesn't mean you did something stupid. It's just how this goes 😭 (PyTorch Get Started (CUDA selector), TensorFlow install (pip))
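If you want that check inside a script rather than a terminal, a minimal sketch using only the Python standard library could look like this. It assumes nvidia-smi is on the PATH and uses its `--query-gpu` CSV output; the helper names `parse_gpu_rows` and `query_gpus` are made up for illustration.

```python
import shutil
import subprocess

QUERY_FIELDS = ["name", "driver_version", "memory.used", "memory.total"]


def parse_gpu_rows(csv_text, fields=QUERY_FIELDS):
    """Turn nvidia-smi CSV output (no header) into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        if len(values) == len(fields):
            rows.append(dict(zip(fields, values)))
    return rows


def query_gpus(fields=QUERY_FIELDS):
    """Return GPU info dicts, or an empty list if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    command = [
        "nvidia-smi",
        f"--query-gpu={','.join(fields)}",
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return parse_gpu_rows(result.stdout, fields)


gpus = query_gpus()
if not gpus:
    print("nvidia-smi not found or failed: fix the driver before installing frameworks")
for gpu in gpus:
    print(gpu)
```

The same helper can be pointed at other query fields, such as `utilization.gpu`, `temperature.gpu`, or `power.draw`, once training is running.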
5) Build the software stack - drivers, CUDA, cuDNN, and the "compatibility dance" 💃
This is where people lose hours. The trick: pick a path and stick to it.
Option A: CUDA bundled with the framework (often the easiest)
Many PyTorch builds ship with their own CUDA runtime, which means you don't need a full CUDA toolkit installed system-wide. You mostly just need a compatible NVIDIA driver. (PyTorch Get Started (CUDA selector), PyTorch Previous Versions (CUDA wheels))
Pros:
- Fewer moving parts
- Easier installs
- More reproducible per environment
Cons:
- If you mix environments carelessly, you can get confused about which CUDA is in use
Option B: System CUDA toolkit (more control)
You install the CUDA toolkit on the system and align everything else to it. (CUDA Toolkit docs)
Pros:
- More control for custom builds and some specialized tools
- Useful for compiling certain ops
Cons:
- More ways to mismatch versions and cry quietly
cuDNN and NCCL, in human terms
- cuDNN accelerates deep learning primitives (convolutions, RNN pieces, and so on) (NVIDIA cuDNN docs)
- NCCL is the fast "GPU-to-GPU communication" library for multi-GPU training (NCCL overview)
If you train on multiple GPUs, NCCL is your best friend - and, sometimes, your grumpy roommate. (NCCL overview)
6) Your first GPU training run (PyTorch mindset example) ✅🔥
To get started with NVIDIA GPUs for AI training, you don't need a big project. You need a small win.
Core ideas:
- Detect the device
- Move the model to the GPU
- Move the tensors to the GPU
- Confirm that forward passes run there (PyTorch CUDA docs)
Things I always sanity-check early:
- torch.cuda.is_available() returns True (torch.cuda.is_available)
- next(model.parameters()).device shows cuda (PyTorch Forum: check model on CUDA)
- A single-batch forward pass doesn't error
- GPU memory goes up when you start training (a good sign!) (NVIDIA nvidia-smi docs)
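The checks above fit in a few lines. This is a minimal sketch, not a training script: the toy model and the synthetic batch are placeholders, and the code falls back to the CPU so it still runs on a machine without a GPU.

```python
import torch
from torch import nn

# Detect the device; fall back to CPU so the script still runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("CUDA available:", torch.cuda.is_available())

# Hypothetical toy model: 32 input features, 10 output classes.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
print("Model device:", next(model.parameters()).device)

# One synthetic batch, created directly on the same device as the model.
batch = torch.randn(8, 32, device=device)
with torch.no_grad():
    logits = model(batch)

print("Output shape:", tuple(logits.shape), "on", logits.device)
```

If the printed devices say cpu on a machine where nvidia-smi works, the usual suspect is a CPU-only framework build rather than the hardware.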
Common "why is it slow?" gotchas
- Your data loader is too slow (the GPU sits idle, waiting) (PyTorch Performance Tuning Guide)
- You forgot to move the data to the GPU (it happens)
- The batch size is tiny (the GPU is underused)
- You're doing heavy CPU preprocessing inside the training step
And yes, your GPU will often look "not very busy" when data is the bottleneck. It's like hiring a race car driver and then making them wait for fuel every lap.
7) The VRAM game - batch size, mixed precision, and not blowing up 💥🧳
Most practical training problems start with memory. If you learn one skill, learn VRAM management.
Quick ways to reduce memory use
- Mixed precision (FP16/BF16)
  - Often a big speed boost as well. A win all round 😌 (PyTorch AMP docs, TensorFlow mixed precision guide)
- Gradient accumulation
  - Simulate a larger batch size by accumulating gradients over several steps (Transformers training docs (gradient accumulation, fp16))
- Smaller sequence length / crop size
  - Brutal but effective
- Activation checkpointing
  - Trade compute for memory (recompute activations during the backward pass) (torch.utils.checkpoint)
- Use a lighter optimizer
  - Some optimizers store extra state that eats VRAM
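The first two items on that list combine naturally in one loop. The sketch below assumes PyTorch with FP16 autocast and a gradient scaler on CUDA, and disables both on the CPU; the model, data, and step counts are placeholders. It uses torch.cuda.amp.GradScaler, which recent PyTorch releases flag as deprecated in favor of torch.amp.GradScaler, so treat the exact constructor as version-dependent.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"  # mixed precision only when a GPU is present
amp_dtype = torch.float16 if use_amp else torch.bfloat16

model = nn.Linear(16, 4).to(device)  # hypothetical toy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
# Loss scaling guards FP16 gradients against underflow; it is a no-op when disabled.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

accum_steps = 4   # micro-batches per optimizer step
micro_batch = 8   # effective batch size = 8 * 4 = 32
optimizer_steps = 0

optimizer.zero_grad(set_to_none=True)
for step in range(8):
    x = torch.randn(micro_batch, 16, device=device)
    y = torch.randn(micro_batch, 4, device=device)
    with torch.autocast(device_type=device.type, dtype=amp_dtype, enabled=use_amp):
        # Divide so the accumulated gradient matches one large-batch average.
        loss = loss_fn(model(x), y) / accum_steps
    scaler.scale(loss).backward()  # gradients add up across micro-batches
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
        optimizer_steps += 1

print("optimizer steps attempted:", optimizer_steps)
```

Only one micro-batch of activations is in memory at a time, which is why accumulation lets a batch of 32 fit where a single batch of 32 would not.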
The "why is VRAM still full after I stopped?" moment
Frameworks often cache memory for performance. This is normal. It looks scary, but it isn't always a leak. You learn to read the patterns. (PyTorch CUDA semantics: caching allocator)
Practical habit:
- Watch allocated memory versus reserved memory (framework-specific) (PyTorch CUDA semantics: memory management)
- Don't panic at the first scary number 😅
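In PyTorch, the two numbers come from torch.cuda.memory_allocated and torch.cuda.memory_reserved. The sketch below is an illustration under simple assumptions: a single GPU at index 0, and a throwaway tensor of about 256 MiB. The helper name `cuda_memory_mib` is hypothetical, and it returns zeros on a machine without CUDA.

```python
import torch


def cuda_memory_mib(device_index=0):
    """Return allocated and reserved CUDA memory in MiB (zeros without a GPU)."""
    if not torch.cuda.is_available():
        return {"allocated_mib": 0.0, "reserved_mib": 0.0}
    mib = 1024 ** 2
    return {
        "allocated_mib": torch.cuda.memory_allocated(device_index) / mib,
        "reserved_mib": torch.cuda.memory_reserved(device_index) / mib,
    }


print("before:", cuda_memory_mib())
if torch.cuda.is_available():
    scratch = torch.empty(64, 1024, 1024, device="cuda")  # about 256 MiB of float32
    print("with tensor:", cuda_memory_mib())
    del scratch  # allocated drops, reserved usually stays cached
    print("after del:", cuda_memory_mib())
    torch.cuda.empty_cache()  # hand unused cached blocks back to the driver
    print("after empty_cache:", cuda_memory_mib())
```

nvidia-smi reports what the whole process holds, which tends to track the reserved figure plus CUDA context overhead rather than the allocated figure, so a high number there after a run is not proof of a leak on its own.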
8) Make the GPU work - performance tuning that's worth your time 🏎️
Getting "GPU training" to work is step one. Getting it to run fast is step two.
High-impact optimizations
- Increase the batch size (until it hurts, then back off a little)
- Use pinned memory in data loaders (faster host-to-device copies) (PyTorch Performance Tuning Guide, PyTorch pin_memory/non_blocking tutorial)
- Increase data loader workers (carefully, too many can hurt performance) (PyTorch Performance Tuning Guide)
- Prefetch batches so the GPU doesn't sit idle
- Use fused ops / optimized kernels when available
- Use mixed precision (yes, it really is that good) (PyTorch AMP docs)
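The data loader items above map onto a handful of torch.utils.data.DataLoader arguments. This is a sketch with illustrative values, not tuned recommendations: the helper names `make_loader` and `to_device` are hypothetical, and the right worker count depends on your CPU and dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, batch_size=64, num_workers=2):
    """Build a DataLoader configured for feeding a GPU."""
    kwargs = {}
    if num_workers > 0:
        # These options are only valid when worker processes are used.
        kwargs["prefetch_factor"] = 2        # batches prepared ahead per worker
        kwargs["persistent_workers"] = True  # keep workers alive between epochs
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=torch.cuda.is_available(),  # page-locked host memory
        **kwargs,
    )


def to_device(batch, device):
    """Copy a tuple of tensors to the device; non_blocking pairs with pin_memory."""
    return tuple(tensor.to(device, non_blocking=True) for tensor in batch)
```

In a training script you would call make_loader(dataset) once, then pass each batch through to_device(batch, device) at the top of the step. On platforms that start workers with spawn, the loop that iterates the loader should sit under an `if __name__ == "__main__":` guard.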
The most overlooked bottleneck
Your storage and preprocessing pipeline. If your dataset is huge and sits on a slow disk, your GPU becomes an expensive space heater. A very sophisticated, very shiny space heater.
And a small confession: I once "optimized" a model for an hour before realizing that logging was the bottleneck. Too much printing can slow training down. Yes, really.
9) Multi-GPU training - DDP, NCCL, and scaling without chaos 🧩🤝
When you want more speed or bigger models, you use multiple GPUs. This is where the real work starts.
Common approaches
- Data Parallel (DDP)
  - Split batches across GPUs, sync gradients
  - Usually the default "good" choice (PyTorch DDP docs)
- Model Parallel / Tensor Parallel
  - Split the model across GPUs (for very large models)
- Pipeline Parallel
  - Split model layers into stages (like an assembly line, but for tensors)
If you're just starting out, DDP-style training is the sweet spot. (PyTorch DDP tutorial)
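As a rough illustration of the DDP pattern, here is a minimal sketch meant to be launched with torchrun, for example `torchrun --nproc_per_node=2 train_ddp.py` (the filename is hypothetical). The toy model and synthetic dataset are placeholders, and the Gloo fallback only exists so the script can be dry-run on a machine without GPUs.

```python
import os

import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def pick_backend():
    """NCCL for NVIDIA GPUs; Gloo as a CPU fallback for dry runs."""
    return "nccl" if torch.cuda.is_available() else "gloo"


def main():
    # torchrun sets RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR and MASTER_PORT.
    dist.init_process_group(backend=pick_backend())
    local_rank = int(os.environ["LOCAL_RANK"])
    if torch.cuda.is_available():
        torch.cuda.set_device(local_rank)
        device = torch.device("cuda", local_rank)
    else:
        device = torch.device("cpu")

    model = nn.Linear(32, 10).to(device)  # hypothetical toy model
    ddp_model = DDP(model, device_ids=[local_rank] if device.type == "cuda" else None)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(dataset)  # each rank sees a different shard
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for features, labels in loader:
            features, labels = features.to(device), labels.to(device)
            optimizer.zero_grad(set_to_none=True)
            loss = loss_fn(ddp_model(features), labels)
            loss.backward()  # DDP all-reduces gradients during backward
            optimizer.step()

    if dist.get_rank() == 0:
        print("final loss on rank 0:", loss.item())
    dist.destroy_process_group()


# Only start training when launched by torchrun, which sets LOCAL_RANK.
if __name__ == "__main__" and "LOCAL_RANK" in os.environ:
    main()
```

Each process owns one GPU and one shard of the data, so the batch size of 64 here is per GPU; the effective batch size grows with the number of processes.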
Practical multi-GPU tips
- Keep the GPUs similarly capable (mixing models can get awkward)
- Watch the interconnect: NVLink vs PCIe matters for sync-heavy workloads (NVIDIA NVLink overview, NVIDIA NVLink docs)
- Keep per-GPU batch sizes balanced
- Don't forget the CPU and storage - multi-GPU can amplify data bottlenecks
And yes, NCCL errors can feel like a riddle wrapped in a mystery wrapped in "why now". You're not cursed. Probably. (NCCL overview)
10) Monitoring and profiling - the unglamorous stuff that saves you hours 📈🧯
You don't need fancy dashboards to start. You need to notice when something is off.
Key signals to watch
- GPU utilization: consistently high, or spiky?
- Memory usage: stable, climbing, or erratic?
- Power draw: unusually low draw can mean the GPU is underused
- Temperatures: sustained high temperatures can throttle performance
- CPU usage: data pipeline problems show up here (PyTorch Performance Tuning Guide)
Profiling mindset (simple version)
- If GPU utilization is low - a data bottleneck or a CPU bottleneck
- If GPU utilization is high but training is slow - kernel inefficiency, precision, or model design
- If training speed drops at random - thermal throttling, background processes, I/O hiccups
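One cheap way to tell a data bottleneck from a compute bottleneck is to time the two halves of each step separately. The sketch below uses only the standard library; `profile_steps` and the slow stand-in loader are hypothetical. With a real GPU job you would pass a synchronization function such as torch.cuda.synchronize as `sync`, because GPU work is launched asynchronously and would otherwise be under-counted.

```python
import time


def profile_steps(batches, train_step, sync=None):
    """Split wall time into 'waiting for data' and 'running the step'."""
    data_time = 0.0
    step_time = 0.0
    iterator = iter(batches)
    while True:
        start = time.perf_counter()
        try:
            batch = next(iterator)  # time spent here is the input pipeline
        except StopIteration:
            break
        fetched = time.perf_counter()
        train_step(batch)
        if sync is not None:
            sync()  # wait for queued GPU work before stopping the clock
        done = time.perf_counter()
        data_time += fetched - start
        step_time += done - fetched
    total = data_time + step_time
    data_share = data_time / total if total > 0 else 0.0
    return {"data_s": data_time, "step_s": step_time, "data_share": data_share}


# Hypothetical stand-ins: a slow loader and a training step that does nothing.
def slow_batches(count=5, delay=0.02):
    for index in range(count):
        time.sleep(delay)
        yield index


report = profile_steps(slow_batches(), train_step=lambda batch: None)
print(report)
```

A data_share close to 1 means the GPU is mostly waiting for input, so tuning the model will not help until the loader is faster.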
I know, monitoring doesn't sound like fun. But it's like flossing. It's annoying, and then your life gets better.
11) Troubleshooting - the usual suspects (and the less obvious ones) 🧰😵
The theme of this section is: "the same five issues, forever."
Issue: CUDA out of memory
Fixes:
- reduce the batch size
- use mixed precision (PyTorch AMP docs, TensorFlow mixed precision guide)
- use gradient accumulation (Transformers training docs (gradient accumulation, fp16))
- checkpoint activations (torch.utils.checkpoint)
- close other GPU processes
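Of those fixes, activation checkpointing is the least obvious to wire in. A minimal sketch with torch.utils.checkpoint follows; the two-block model is a made-up example, and `use_reentrant=False` assumes a reasonably recent PyTorch release that accepts that argument.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class CheckpointedMLP(nn.Module):
    """Hypothetical two-block model; block1's activations are recomputed in backward."""

    def __init__(self, width=64):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(width, width), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(width, width), nn.ReLU())
        self.head = nn.Linear(width, 1)

    def forward(self, x):
        # Intermediate activations inside block1 are not kept; they are
        # recomputed when backward() reaches this segment.
        x = checkpoint(self.block1, x, use_reentrant=False)
        x = self.block2(x)
        return self.head(x)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CheckpointedMLP().to(device)
inputs = torch.randn(16, 64, device=device)
loss = model(inputs).mean()
loss.backward()
print("grad norm of block1 weight:", model.block1[0].weight.grad.norm().item())
```

The savings only become noticeable on deep models with large activations; on a toy model like this one the point is just to show where the call goes.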
Issue: Training accidentally runs on the CPU
Fixes:
- make sure the model is moved to cuda
- make sure the tensors are moved to cuda
- check the framework's device configuration (PyTorch CUDA docs)
Issue: Random crashes or illegal memory access
Fixes:
- confirm driver and runtime compatibility (PyTorch Get Started (CUDA selector), TensorFlow install (pip))
- try a clean environment
- cut back on custom ops
- re-run with deterministic settings to reproduce the failure
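"Deterministic settings" in PyTorch usually means fixing the seeds and asking for deterministic kernels. The sketch below is one way to do that; `seed_everything` is a hypothetical helper, and `warn_only=True` assumes a PyTorch version that supports that argument.

```python
import random

import torch


def seed_everything(seed=0):
    """Seed Python and PyTorch RNGs and prefer deterministic kernels."""
    random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU and CUDA generators
    torch.backends.cudnn.benchmark = False  # no autotuned algorithm switching
    torch.backends.cudnn.deterministic = True
    # warn_only=True reports nondeterministic ops instead of raising an error.
    torch.use_deterministic_algorithms(True, warn_only=True)


seed_everything(123)
first = torch.randn(4)
seed_everything(123)
second = torch.randn(4)
print("repeatable:", torch.equal(first, second))  # prints "repeatable: True"
```

Deterministic kernels can be slower, so this is a debugging setting rather than a default. For illegal memory access errors, running with the environment variable CUDA_LAUNCH_BLOCKING=1 can also help, since it makes kernel launches synchronous and the error surfaces closer to the call that caused it.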
Issue: Slower than expected
Fixes:
- check the data loading speed (PyTorch Performance Tuning Guide)
- increase the batch size
- reduce logging
- enable mixed precision (PyTorch AMP docs)
- profile the step-time breakdown
Issue: Multi-GPU hangs
Fixes:
- confirm the correct backend settings (PyTorch distributed docs)
- check the NCCL environment configuration (carefully) (NCCL overview)
- test on a single GPU first
- make sure the network/interconnect is healthy
A small aside: sometimes the fix is rebooting the system. It feels silly. It works. Computers are like that.
12) Cost and practicality - picking the right NVIDIA GPU and setup without overthinking it 💸🧠
Not every project needs the biggest GPU. Sometimes you just need enough GPU.
If you're fine-tuning mid-sized models
- Prioritize VRAM and stability
- Mixed precision helps a lot (PyTorch AMP docs, TensorFlow mixed precision guide)
- You can often get away with a single strong GPU
If you're training large models from scratch
- You'll want multiple GPUs or very large VRAM
- You'll care a lot about NVLink and communication speed (NVIDIA NVLink overview, NCCL overview)
- You'll probably use memory optimization tools (ZeRO, offload, and so on) (DeepSpeed ZeRO docs, Microsoft Research: ZeRO/DeepSpeed)
If you're experimenting
- You want fast iteration
- Don't spend all your money on the GPU and then skimp on storage and RAM
- A balanced system beats a lopsided one (most days)
And honestly, you can waste weeks chasing "perfect" hardware choices. Build something workable, measure it, then adjust. The real enemy is not having a feedback loop.
In closing - how to use NVIDIA GPUs for AI training without losing your mind 😌✅
If you take nothing else from this guide on using NVIDIA GPUs for AI training, take this:
- Make sure nvidia-smi works first (NVIDIA nvidia-smi docs)
- Pick a clean software path (framework-bundled CUDA is often the easiest) (PyTorch Get Started (CUDA selector))
- Validate a tiny GPU training run before scaling up (torch.cuda.is_available)
- Manage VRAM like it's a small pantry
- Use mixed precision early - it's not just an "advanced thing" (PyTorch AMP docs, TensorFlow mixed precision guide)
- If it's slow, suspect data loading and I/O before blaming the GPU (PyTorch Performance Tuning Guide)
- Multi-GPU is powerful but adds complexity - scale up gradually (PyTorch DDP docs, NCCL overview)
- Monitor utilization and temperatures so problems surface early (NVIDIA nvidia-smi docs)
Training on NVIDIA GPUs is one of those skills that seems intimidating and then becomes… normal. It's like learning to drive. At first everything is loud and confusing and you grip the wheel too hard. Then one day you're cruising along, sipping coffee, and fixing a batch size issue like it's no big deal ☕😄
Frequently Asked Questions
What it means to train an AI model on an NVIDIA GPU
Training on an NVIDIA GPU means your model parameters and training batches live in GPU VRAM, and the heavy math (forward pass, backpropagation, optimizer steps) runs through CUDA kernels. In practice, that means making sure the model and tensors sit on cuda, then monitoring memory, utilization, and temperatures so throughput stays consistent.
How to confirm the NVIDIA GPU works before installing anything else
Start with nvidia-smi. It should show the GPU name, driver version, current memory usage, and running processes. If nvidia-smi fails, hold off on PyTorch/TensorFlow/JAX and fix driver visibility first. This is the baseline "is the oven plugged in" check for GPU training.
Choosing between a system CUDA install and the CUDA bundled with PyTorch
A common approach is to use framework-bundled CUDA (as many PyTorch wheels do) because it reduces moving parts: you mostly just need a compatible NVIDIA driver. Installing a full system-wide CUDA toolkit gives you more control (custom builds, compiling ops), but it also creates more opportunities for version mismatches and confusing runtime errors.
Why training can still be slow even with an NVIDIA GPU
Often, the GPU is being starved by the input pipeline. Lagging data loaders, heavy CPU preprocessing inside the training step, tiny batch sizes, or slow storage can make a powerful GPU behave like an idle space heater. Increasing data loader workers, enabling pinned memory, adding prefetching, and trimming logging are common first moves before blaming the model.
How to prevent "CUDA out of memory" errors during NVIDIA GPU training
Most fixes are VRAM tactics: reduce the batch size, enable mixed precision (FP16/BF16), use gradient accumulation, shorten the sequence length/crop size, or use activation checkpointing. Also check whether other GPU processes are consuming memory. Some trial and error is normal - VRAM budgeting becomes a core habit in practical GPU training.
Why VRAM can still look full after the training script ends
Frameworks often cache GPU memory for speed, so reserved memory can stay high even when allocated memory drops. It can look like a leak, but often the caching allocator is working as designed. The practical habit is to follow the pattern over time and compare "allocated vs reserved" rather than fixating on one scary snapshot.
How to confirm the model isn't silently training on the CPU
Do the early sanity checks: confirm that torch.cuda.is_available() returns True, confirm that next(model.parameters()).device shows cuda, and run a single forward pass without errors. If performance looks slow, also confirm that your batches are being moved to the GPU. It's common to move the model and accidentally leave the data behind.
The simple path to multi-GPU training
Data Parallel (DDP-style training) is often the best first step: split batches across GPUs and sync gradients. Tools like Accelerate can reduce multi-GPU pain without rewriting everything. Expect extra variables - NCCL communication, interconnect differences (NVLink vs PCIe), and amplified data bottlenecks - so scaling up gradually goes better once you have a solid single-GPU run.
What to monitor during NVIDIA GPU training to catch problems early
Watch GPU utilization, memory usage (stable vs climbing), power draw, and temperatures - throttling can quietly reduce speed. Watch CPU usage too, since data pipeline problems often show up there first. If utilization is spiky or low, suspect I/O or the data loaders; if it's high but step time is still slow, look at kernels, precision mode, and the step-time breakdown.
References
- NVIDIA - NVIDIA nvidia-smi docs - docs.nvidia.com
- NVIDIA - NVIDIA System Management Interface (NVSMI) - developer.nvidia.com
- NVIDIA - NVIDIA NVLink overview - nvidia.com
- PyTorch - PyTorch Get Started (CUDA selector) - pytorch.org
- PyTorch - PyTorch CUDA docs - docs.pytorch.org
- TensorFlow - TensorFlow install (pip) - tensorflow.org
- JAX - JAX Quickstart - docs.jax.dev
- Hugging Face - Trainer docs - huggingface.co
- Lightning AI - Lightning docs - lightning.ai
- DeepSpeed - ZeRO docs - deepspeed.readthedocs.io
- Microsoft Research - Microsoft Research: ZeRO/DeepSpeed - microsoft.com
- PyTorch Forums - PyTorch Forum: check model on CUDA - discuss.pytorch.org