Distributed block coordinate descent for training logistic regression with L1 regularization
Ilya Trofimov (Yandex Data Factory), Alexander Genkin (AVG Consulting)
4th International Conference on Analysis of Images, Social Networks and Texts (AIST)
Yekaterinburg, 09.04.2015
Generalized linear models
A supervised learning problem.
Given: a training set $(x_i, y_i)_{i=1}^n$, $x_i \in \mathbb{R}^p$.
Goal: learn the dependence $y(x)$.
We model the dependence as $y \sim f(\beta^T x)$ and need to fit $\beta \in \mathbb{R}^p$.
Examples: linear regression, logistic regression, Poisson regression, probit regression.
Logistic regression: $y_i \in \{-1, +1\}$
$$P(y = +1 \mid x) = \frac{1}{1 + \exp(-\beta^T x)}$$
Negative log-likelihood (empirical risk) $L(\beta)$:
$$L(\beta) = \sum_{i=1}^n \log\left(1 + \exp(-y_i \beta^T x_i)\right)$$
$$\operatorname*{argmin}_{\beta} L(\beta)$$
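As a minimal sketch of the two formulas above, the following pure-Python functions evaluate $P(y = +1 \mid x)$ and $L(\beta)$. The function names and the toy data are illustrative assumptions and do not come from the talk; dense lists are used only to keep the example short.

```python
import math

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def prob_positive(beta, x):
    """P(y = +1 | x) = 1 / (1 + exp(-beta^T x))."""
    return 1.0 / (1.0 + math.exp(-dot(beta, x)))

def neg_log_likelihood(beta, X, y):
    """L(beta) = sum_i log(1 + exp(-y_i * beta^T x_i)), labels y_i in {-1, +1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        t = -yi * dot(beta, xi)
        # log(1 + exp(t)) written so that exp never overflows
        total += max(t, 0.0) + math.log1p(math.exp(-abs(t)))
    return total

# Toy data (hypothetical, not from the talk)
X = [[1.0, 2.0], [1.0, -1.0], [0.5, 0.3]]
y = [1, -1, 1]
print(neg_log_likelihood([0.0, 0.0], X, y))  # at beta = 0 every term is log(2)
```

At $\beta = 0$ every example contributes $\log 2$, which gives a quick sanity check for any implementation of the loss.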
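Generalized linear models, regularization
L2 regularization
$$\operatorname*{argmin}_{\beta} \left( L(\beta) + \frac{\lambda_2}{2} \|\beta\|^2 \right)$$
L1 regularization (simultaneous regularization + feature selection)
$$\operatorname*{argmin}_{\beta} \left( L(\beta) + \lambda_1 \|\beta\|_1 \right)$$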
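To illustrate why the L1 penalty also selects features, here is a one-dimensional sketch. It replaces $L(\beta)$ with the toy loss $\frac{1}{2}(u - c)^2$, which is an assumption made purely for illustration; for that loss both penalized minimizers have closed forms.

```python
def ridge_shrink(c, lam):
    """argmin_u 0.5*(u - c)**2 + 0.5*lam*u**2, which equals c / (1 + lam)."""
    return c / (1.0 + lam)

def soft_threshold(c, lam):
    """argmin_u 0.5*(u - c)**2 + lam*abs(u)."""
    if c > lam:
        return c - lam
    if c < -lam:
        return c + lam
    return 0.0

coefs = [3.0, 0.4, -0.2, -2.5]                   # hypothetical unpenalized values
print([ridge_shrink(c, 1.0) for c in coefs])     # every value shrinks, none is exactly zero
print([soft_threshold(c, 1.0) for c in coefs])   # values with |c| <= lam become exactly zero
```

The L2 penalty scales every coefficient towards zero, while the L1 penalty sets small coefficients to exactly zero, which is the feature-selection effect mentioned on the slide.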
Big Data
Large training sets: $n, p > 10^6$, size > 10 GB.
We need fast algorithms that parallelize
across multiple processors/cores
across multiple servers
Generalized linear models, regularization
L2 regularization
$$\operatorname*{argmin}_{\beta} \left( L(\beta) + \frac{\lambda_2}{2} \|\beta\|^2 \right)$$
Minimization of a smooth convex function.
How to optimize?
SGD: parallelizes poorly
Conjugate gradient method: parallelizes well
L-BFGS: parallelizes well
Generalized linear models, regularization
L1 regularization (simultaneous regularization + feature selection)
$$\operatorname*{argmin}_{\beta} \left( L(\beta) + \lambda_1 \|\beta\|_1 \right)$$
Minimization of a non-smooth convex function.
How to optimize?
Subgradient method: works poorly
Online learning via truncated gradient: parallelizes poorly
Coordinate descent methods (GLMNET, BBR): ?
Goal
Find the best algorithm for minimizing the objective function of L1-regularized logistic regression on a single machine
...and parallelize it
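GLMNET algorithm
We need to find $\operatorname*{argmin}_{\beta} \left( L(\beta) + \lambda_1 \|\beta\|_1 \right)$.
$$L(\beta + \Delta\beta) + \lambda_1 \|\beta + \Delta\beta\|_1 \approx \left( L(\beta) + L'(\beta)^T \Delta\beta + \frac{1}{2} \Delta\beta^T \nabla^2 L(\beta) \Delta\beta \right) + \lambda_1 \|\beta + \Delta\beta\|_1$$
$$= \frac{1}{2} \sum_{i=1}^n w_i (z_i - \Delta\beta^T x_i)^2 + C(\beta) + \lambda_1 \|\beta + \Delta\beta\|_1$$
where
$$z_i = \frac{(y_i + 1)/2 - p(x_i)}{p(x_i)(1 - p(x_i))}$$
$$w_i = p(x_i)(1 - p(x_i))$$
$$p(x_i) = \frac{1}{1 + e^{-\beta^T x_i}}$$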
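The weighted least-squares form above can be checked numerically. The sketch below computes $p(x_i)$, $w_i$ and $z_i$ and compares the change in the quadratic model with the change in the true loss for a small step $\Delta\beta$. Function names and the toy data are assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def neg_log_likelihood(beta, X, y):
    """L(beta) = sum_i log(1 + exp(-y_i * beta^T x_i))."""
    return sum(math.log1p(math.exp(-yi * dot(beta, xi))) for xi, yi in zip(X, y))

def working_response(beta, X, y):
    """Return the lists (w, z) defined on the slide, for labels in {-1, +1}."""
    w, z = [], []
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-dot(beta, xi)))
        wi = p * (1.0 - p)
        w.append(wi)
        z.append(((yi + 1) / 2 - p) / wi)
    return w, z

def quad_model(delta, X, w, z):
    """0.5 * sum_i w_i * (z_i - delta^T x_i)^2, i.e. the model without C(beta)."""
    return 0.5 * sum(wi * (zi - dot(delta, xi)) ** 2 for xi, wi, zi in zip(X, w, z))

# Toy data (hypothetical)
X = [[1.0, 2.0], [1.0, -1.0], [0.5, 0.3]]
y = [1, -1, 1]
beta = [0.1, -0.2]
delta = [1e-3, -2e-3]

w, z = working_response(beta, X, y)
true_change = neg_log_likelihood([b + d for b, d in zip(beta, delta)], X, y) \
    - neg_log_likelihood(beta, X, y)
model_change = quad_model(delta, X, w, z) - quad_model([0.0, 0.0], X, w, z)
print(true_change, model_change)  # the two agree up to third-order terms in delta
```

Taking differences removes the constant $C(\beta)$, so the two printed numbers should agree up to terms of third order in $\Delta\beta$.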
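GLMNET algorithm
Input: training set $\{x_i, y_i\}_{i=1}^n$, initial approximation $\beta$, regularization parameter $\lambda_1$
Repeat until the stopping criterion is met:
1. For $k = 1 \dots p$
2. Until the stopping criterion is met:
$$\Delta\beta_k \leftarrow \operatorname*{argmin}_{\Delta\beta_k} \left( \frac{1}{2} \sum_{i=1}^n w_i (z_i - \Delta\beta^T x_i)^2 + \lambda_1 \|\beta + \Delta\beta\|_1 \right)$$
$$\Delta\beta_k \leftarrow \frac{S\left( \sum_{i=1}^n w_i x_{ik} q_i,\ \lambda_1 \right)}{\sum_{i=1}^n w_i x_{ik}^2} - \beta_k$$
$$q_i = z_i - \Delta\beta^T x_i + (\beta_k + \Delta\beta_k) x_{ik}$$
$$S(x, a) = \operatorname{sgn}(x) \max(|x| - a, 0)$$
3. $\beta \leftarrow \beta + \Delta\beta$
Return $\beta$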
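The listing above can be sketched in pure Python as follows. This is one reading of the slide, not the authors' implementation: an outer loop rebuilds the quadratic approximation, and an inner loop sweeps over the coordinates with the soft-thresholding update. The fixed iteration counts `outer_iters` and `inner_sweeps` stand in for the unspecified stopping criteria, the floor on `w` is an added safeguard, and the data are dense lists; all of these are assumptions. The vectors `bx` and `dx` cache $\beta^T x_i$ and $\Delta\beta^T x_i$ so that each coordinate update touches only one column.

```python
import math

def soft_threshold(x, a):
    """S(x, a) = sgn(x) * max(|x| - a, 0)."""
    return math.copysign(max(abs(x) - a, 0.0), x)

def glmnet_l1_logreg(X, y, lam, outer_iters=20, inner_sweeps=10):
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    bx = [0.0] * n                        # cached beta^T x_i
    for _ in range(outer_iters):
        # Quadratic approximation of L at the current beta
        prob = [1.0 / (1.0 + math.exp(-t)) for t in bx]
        w = [max(pi * (1.0 - pi), 1e-10) for pi in prob]   # floor is a safeguard
        z = [((yi + 1) / 2 - pi) / wi for yi, pi, wi in zip(y, prob, w)]
        delta = [0.0] * p
        dx = [0.0] * n                    # cached delta^T x_i
        for _ in range(inner_sweeps):
            for k in range(p):
                num = den = 0.0
                for i in range(n):
                    xik = X[i][k]
                    q = z[i] - dx[i] + (beta[k] + delta[k]) * xik
                    num += w[i] * xik * q
                    den += w[i] * xik * xik
                if den == 0.0:
                    continue              # all-zero column: nothing to update
                new = soft_threshold(num, lam) / den - beta[k]
                change = new - delta[k]
                if change != 0.0:
                    for i in range(n):    # keep the cache consistent
                        dx[i] += change * X[i][k]
                    delta[k] = new
        beta = [b + d for b, d in zip(beta, delta)]
        bx = [t + d for t, d in zip(bx, dx)]
    return beta

# Toy, non-separable data (hypothetical)
X = [[1.0, 0.5], [2.0, -0.3], [-1.0, 0.2], [-2.0, 0.1],
     [1.5, -0.4], [-0.5, 0.6], [0.5, 0.1], [-1.5, -0.2]]
y = [1, 1, -1, -1, 1, -1, -1, 1]
print(glmnet_l1_logreg(X, y, lam=0.1))
```

With a sufficiently large `lam` every soft-thresholded numerator is zero and the function returns the all-zero vector, which matches the sparsity behaviour expected from the L1 penalty.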
GLMNET algorithm
For an efficient implementation, the vectors $(\beta^T x_i)$ and $(\Delta\beta^T x_i)$, each of size $n$, must be kept in RAM.
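How to parallelize GLMNET?
We use several machines (a cluster).
It is natural for each machine to be responsible for its own subset of variables.
$$S_1 \cup \dots \cup S_M = \{1, \dots, p\}$$
$$S_m \cap S_k = \emptyset, \quad k \neq m$$
Idea: each machine performs, in parallel, steps over its own subset of variables $\Delta\beta^m$
$$\Delta\beta^m \leftarrow \operatorname*{argmin}_{\Delta\beta^m} \left\{ \frac{1}{2} \sum_{i=1}^n w_i \left( z_i - (\Delta\beta^m)^T x_i \right)^2 + \lambda_1 \|\beta + \Delta\beta^m\|_1 \;\middle|\; \Delta\beta^m_j = 0 \text{ if } j \notin S_m \right\}$$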
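A minimal sketch of the partition $S_1, \dots, S_M$ follows. The slides do not say how features are assigned to machines, so the round-robin rule here is an assumption, and indices are 0-based rather than 1-based as on the slide.

```python
def partition_features(p, M):
    """Split feature indices 0..p-1 into M disjoint blocks (round-robin)."""
    return [list(range(m, p, M)) for m in range(M)]

def is_partition(blocks, p):
    """Check that the blocks are pairwise disjoint and cover 0..p-1."""
    flat = [j for block in blocks for j in block]
    return len(flat) == p and set(flat) == set(range(p))

blocks = partition_features(p=10, M=3)
print(blocks)                    # three blocks of feature indices
print(is_partition(blocks, 10))  # True
```

Any other rule works as long as both conditions on the slide hold: the blocks cover all features and no feature belongs to two machines.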
How to parallelize coordinate descent methods?
d-GLMNET algorithm
Input: training set $\{x_i, y_i\}_{i=1}^n$, split into $M$ parts by variables.
$\beta \leftarrow 0$, $\Delta\beta^m \leftarrow 0$, where $m$ is the machine index
Until the stopping criterion is met:
1. Run in parallel on $M$ machines:
2. Perform steps over the variables, store $\Delta\beta^m$ and $((\Delta\beta^m)^T x_i)$
3. Sum the vectors $\Delta\beta^m$ and $((\Delta\beta^m)^T x_i)$ using MPI_Allreduce
4. $\Delta\beta \leftarrow \sum_{m=1}^M \Delta\beta^m$
5. $(\Delta\beta^T x_i) \leftarrow \sum_{m=1}^M ((\Delta\beta^m)^T x_i)$
6. Find $\alpha$ using a line search (Armijo rule)
7. $\beta \leftarrow \beta + \alpha \Delta\beta$
8. $(\exp(\beta^T x_i)) \leftarrow (\exp(\beta^T x_i + \alpha \Delta\beta^T x_i))$
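The steps above can be simulated on a single process. In the sketch below the "machines" run one after another and a plain Python sum stands in for the MPI reduction, so it illustrates the data flow only and is not the authors' code. The Armijo constants `sigma` and `shrink`, the fixed iteration counts, the floor on `w`, and caching $\beta^T x_i$ instead of $\exp(\beta^T x_i)$ are all assumptions.

```python
import math

def soft_threshold(x, a):
    return math.copysign(max(abs(x) - a, 0.0), x)

def objective(bx, beta, y, lam):
    """L(beta) + lam * ||beta||_1 from cached margins bx[i] = beta^T x_i."""
    loss = sum(max(-yi * t, 0.0) + math.log1p(math.exp(-abs(t)))
               for t, yi in zip(bx, y))
    return loss + lam * sum(abs(b) for b in beta)

def block_step(X, w, z, beta, block, lam, sweeps=5):
    """One machine: coordinate descent over its block, other coordinates fixed at 0."""
    n = len(X)
    delta = {k: 0.0 for k in block}
    dx = [0.0] * n                        # (delta^m)^T x_i
    for _ in range(sweeps):
        for k in block:
            num = den = 0.0
            for i in range(n):
                xik = X[i][k]
                q = z[i] - dx[i] + (beta[k] + delta[k]) * xik
                num += w[i] * xik * q
                den += w[i] * xik * xik
            if den == 0.0:
                continue
            new = soft_threshold(num, lam) / den - beta[k]
            for i in range(n):
                dx[i] += (new - delta[k]) * X[i][k]
            delta[k] = new
    return delta, dx

def d_glmnet(X, y, lam, blocks, iters=30, sigma=0.01, shrink=0.5):
    n, p = len(X), len(X[0])
    beta, bx = [0.0] * p, [0.0] * n
    for _ in range(iters):
        prob = [1.0 / (1.0 + math.exp(-t)) for t in bx]
        w = [max(pi * (1.0 - pi), 1e-10) for pi in prob]
        z = [((yi + 1) / 2 - pi) / wi for yi, pi, wi in zip(y, prob, w)]
        # Steps 1-2: every block is processed independently (sequential stand-in)
        results = [block_step(X, w, z, beta, blk, lam) for blk in blocks]
        # Steps 3-5: sum the per-machine vectors (stand-in for the MPI reduction)
        delta, dx = [0.0] * p, [0.0] * n
        for d, dxm in results:
            for k, v in d.items():
                delta[k] += v
            for i in range(n):
                dx[i] += dxm[i]
        # Step 6: backtracking line search with the Armijo rule
        f0 = objective(bx, beta, y, lam)
        grad_dot = sum((pi - (yi + 1) / 2) * di for pi, yi, di in zip(prob, y, dx))
        D = grad_dot + lam * (sum(abs(b + d) for b, d in zip(beta, delta))
                              - sum(abs(b) for b in beta))
        alpha, new_beta, new_bx = 1.0, beta, bx   # fallback: reject the step
        while alpha > 1e-10:
            cand_beta = [b + alpha * d for b, d in zip(beta, delta)]
            cand_bx = [t + alpha * d for t, d in zip(bx, dx)]
            if objective(cand_bx, cand_beta, y, lam) <= f0 + sigma * alpha * D:
                new_beta, new_bx = cand_beta, cand_bx
                break
            alpha *= shrink
        # Steps 7-8: accept the step and refresh the cached margins
        beta, bx = new_beta, new_bx
    return beta

# Toy, non-separable data (hypothetical); one feature per simulated machine
X = [[1.0, 0.5], [2.0, -0.3], [-1.0, 0.2], [-2.0, 0.1],
     [1.5, -0.4], [-0.5, 0.6], [0.5, 0.1], [-1.5, -0.2]]
y = [1, 1, -1, -1, 1, -1, -1, 1]
print(d_glmnet(X, y, lam=0.1, blocks=[[0], [1]]))
```

The line search is what makes the combined step safe: each machine ignores the changes made by the others, so the summed direction can overshoot, and the Armijo rule shortens it until the objective decreases sufficiently.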
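Theoretical results
Theorem 1. An iteration of the d-GLMNET algorithm is equivalent to solving
$$\operatorname*{argmin}_{\Delta\beta} \left( L(\beta) + L'(\beta)^T \Delta\beta + \frac{1}{2} \Delta\beta^T H \Delta\beta + \lambda_1 \|\beta + \Delta\beta\|_1 \right)$$
where $H$ is a block-diagonal approximation of the Hessian $\nabla^2 L(\beta)$.
Theorem 2. The d-GLMNET algorithm has at least a linear rate of convergence.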
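One way to see where the block-diagonal matrix in Theorem 1 comes from is sketched below. It is a short derivation from the per-machine subproblems, under the assumption that the blocks of $H$ are the diagonal blocks of $\nabla^2 L(\beta)$ indexed by the feature subsets $S_m$; the slides do not spell this out. Write $Q(\Delta\beta) = L'(\beta)^T \Delta\beta + \frac{1}{2} \Delta\beta^T \nabla^2 L(\beta) \Delta\beta$, let each $\Delta\beta^m$ be supported on $S_m$, and let $\Delta\beta = \sum_m \Delta\beta^m$.

```latex
\begin{align*}
\sum_{m=1}^{M} \Bigl[ Q(\Delta\beta^m) + \lambda_1 \|\beta + \Delta\beta^m\|_1 \Bigr]
  &= L'(\beta)^T \Delta\beta
   + \frac{1}{2} \sum_{m=1}^{M} (\Delta\beta^m)^T \nabla^2 L(\beta)\, \Delta\beta^m
   + \lambda_1 \|\beta + \Delta\beta\|_1 + (M - 1)\,\lambda_1 \|\beta\|_1 \\
  &= L'(\beta)^T \Delta\beta + \frac{1}{2} \Delta\beta^T H \Delta\beta
   + \lambda_1 \|\beta + \Delta\beta\|_1 + \text{const},
\qquad
H_{jk} =
\begin{cases}
  \bigl(\nabla^2 L(\beta)\bigr)_{jk}, & j, k \in S_m \text{ for some } m, \\
  0, & \text{otherwise.}
\end{cases}
\end{align*}
```

The sum on the left is separable across machines, so minimizing each term independently minimizes the right-hand side, which differs from the expression in Theorem 1 only by terms that do not depend on $\Delta\beta$. The cross-block entries of the Hessian are exactly what the parallel scheme drops.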
Numerical experiments
dataset   size    #examples (train/test)         #features      nnz
epsilon   12 GB   0.4 × 10^6 / 0.1 × 10^6        2000           8.0 × 10^8
webspam   21 GB   0.315 × 10^6 / 0.035 × 10^6    16.6 × 10^6    1.2 × 10^9
dna       71 GB   45 × 10^6 / 5 × 10^6           800            9.0 × 10^9
16 machines with Intel(R) Xeon(R) CPU E5-2660 2.20GHz, 32 GB RAM, gigabit Ethernet.
Algorithms compared:
d-GLMNET
Online learning via truncated gradient (Vowpal Wabbit)
One d-GLMNET or Vowpal Wabbit process was run on each machine.
Numerical experiments
1. d-GLMNET was used to compute the regularization path for 20 values of $\lambda_1$. For each solution, the number of non-zero weights and the accuracy on the test set were computed.
2. For all values $\lambda \in [\lambda_{max} 2^{-1}, \lambda_{max} 2^{-2}, \dots, \lambda_{max} 2^{-20}]$, the online-learning hyperparameters were searched jointly over the ranges $\eta \in [0.1, 0.5]$, $p \in [0.5, 0.9]$, and 50 passes of online learning were performed. For each combination ($\eta$, $p$, pass number), the number of non-zero weights and the accuracy on the test set were computed.
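The grid of regularization values can be sketched as follows. The slides do not define $\lambda_{max}$; the code assumes the usual choice, the smallest $\lambda_1$ for which $\beta = 0$ is optimal, which for the logistic loss without an intercept equals $\max_k \left| \frac{1}{2} \sum_i y_i x_{ik} \right|$. Function names and the toy data are hypothetical.

```python
def lambda_max(X, y):
    """Largest absolute gradient component of L at beta = 0 (assumed definition)."""
    p = len(X[0])
    return max(abs(0.5 * sum(yi * xi[k] for xi, yi in zip(X, y))) for k in range(p))

def lambda_grid(lam_max, steps=20):
    """lam_max * 2^-1, lam_max * 2^-2, ..., lam_max * 2^-steps."""
    return [lam_max * 2.0 ** (-k) for k in range(1, steps + 1)]

# Toy data (hypothetical)
X = [[1.0, 0.0], [-1.0, 1.0]]
y = [1, -1]
grid = lambda_grid(lambda_max(X, y))
print(len(grid), grid[0], grid[-1])
```

When computing a path like this, a common practice is to start each solve from the solution for the previous, larger $\lambda_1$; whether the experiments did so is not stated on the slide.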
Dataset "epsilon"
[Figure: auPRC (y-axis, 0.93 to 0.96) versus time in seconds (x-axis, 0 to 1400) for d-GLMNET and VW]
Speed of the algorithms for the best $\lambda_1$ and the best online-learning parameters
d-GLMNET
The d-GLMNET implementation is available at https://github.com/IlyaTrofimov/dlr
Future work:
L2 regularization, elastic net
use of multiple cores
LASSO implementation