sigmoid function:
$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr)$$
trainSet: $(x^{(i)}, y^{(i)})$, $i = 1, \dots, m$
input: $x$
output: $y \in \{0, 1\}$
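The gradient derivation further down relies on the identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$. Below is a minimal Python sketch (standard library only; the names `sigmoid` and `sigmoid_prime` are our own, not from the notes). It evaluates the sigmoid without overflow and checks the identity against a central finite difference:

```python
import math


def sigmoid(z: float) -> float:
    """sigma(z) = 1 / (1 + e^(-z)), evaluated so that math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For z < 0 use the algebraically equal form e^z / (1 + e^z).
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sigmoid_prime(z: float) -> float:
    """Closed-form derivative sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


if __name__ == "__main__":
    h = 1e-6  # step for the central difference (sigma(z + h) - sigma(z - h)) / (2h)
    for z in (-4.0, -0.5, 0.0, 1.0, 3.0):
        numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
        print(f"z={z:+.1f}  closed form={sigmoid_prime(z):.8f}  numeric={numeric:.8f}")
```

The two columns should agree closely. The branch on the sign of `z` only matters for large $|z|$, where the naive `1 / (1 + math.exp(-z))` would raise `OverflowError` for very negative `z`.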
model:
$$z = w^T x + b, \qquad f_{w,b}(x) = \sigma(z) = \frac{1}{1 + e^{-(w^T x + b)}}$$
loss:
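$$
L\bigl(f_{w,b}(x^{(i)}), y^{(i)}\bigr) =
\begin{cases}
-\log\bigl(f_{w,b}(x^{(i)})\bigr) & \text{if } y^{(i)} = 1 \\
-\log\bigl(1 - f_{w,b}(x^{(i)})\bigr) & \text{if } y^{(i)} = 0
\end{cases}
= -y^{(i)}\log\bigl(f_{w,b}(x^{(i)})\bigr) - (1 - y^{(i)})\log\bigl(1 - f_{w,b}(x^{(i)})\bigr)
$$
cost function:
$$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} L\bigl(f_{w,b}(x^{(i)}), y^{(i)}\bigr)$$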
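The model and cost above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation, and the names `linear`, `f_wb`, `softplus`, `loss_from_z` and `cost` are hypothetical. One deliberate substitution: instead of taking $\log f_{w,b}(x)$ directly, the loss uses the equivalent forms $-\log\sigma(z) = \log(1 + e^{-z})$ and $-\log(1 - \sigma(z)) = \log(1 + e^{z})$. These give the same values but never evaluate $\log 0$ when $f_{w,b}(x)$ rounds to exactly 0 or 1 in floating point.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable sigma(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """z = w^T x + b."""
    if len(x) != len(w):
        raise ValueError("x and w must both have n components")
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b


def f_wb(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Model output f_{w,b}(x) = sigma(w^T x + b)."""
    return sigmoid(linear(x, w, b))


def softplus(t: float) -> float:
    """log(1 + e^t) without overflow: max(t, 0) + log1p(e^(-|t|))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def loss_from_z(z: float, y: int) -> float:
    """Per-example loss L(f_{w,b}(x), y), written in terms of z = w^T x + b.

    y == 1: -log(sigma(z))     = log(1 + e^(-z)) = softplus(-z)
    y == 0: -log(1 - sigma(z)) = log(1 + e^z)    = softplus(z)
    """
    if y not in (0, 1):
        raise ValueError("labels must be 0 or 1")
    return softplus(-z) if y == 1 else softplus(z)


def cost(X: Sequence[Sequence[float]], Y: Sequence[int],
         w: Sequence[float], b: float) -> float:
    """J(w, b) = (1/m) * sum over i of L(f_{w,b}(x^(i)), y^(i))."""
    m = len(X)
    return sum(loss_from_z(linear(x, w, b), y) for x, y in zip(X, Y)) / m
```

For example, with $w = (1, -2)$, $b = 0$ and $x = (2, 1)$ we get $z = 0$, so `f_wb([2.0, 1.0], [1.0, -2.0], 0.0)` returns `0.5`.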
gradient descent: $\min_{w,b} J(w,b)$
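$$
\begin{aligned}
\frac{\partial J(w,b)}{\partial w_j}
&= \frac{1}{m}\sum_{i=1}^{m}\frac{\partial}{\partial w_j} L\bigl(f_{w,b}(x^{(i)}), y^{(i)}\bigr) \\
&= \frac{1}{m}\sum_{i=1}^{m}\frac{\partial}{\partial w_j}\Bigl(-y^{(i)}\log\bigl(f_{w,b}(x^{(i)})\bigr) - (1-y^{(i)})\log\bigl(1-f_{w,b}(x^{(i)})\bigr)\Bigr) \\
&= \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\,\frac{\frac{\partial}{\partial w_j} f_{w,b}(x^{(i)})}{f_{w,b}(x^{(i)})} - (1-y^{(i)})\,\frac{-\frac{\partial}{\partial w_j} f_{w,b}(x^{(i)})}{1-f_{w,b}(x^{(i)})}\right] \\
&= \frac{1}{m}\sum_{i=1}^{m}\left[-\frac{y^{(i)}}{f_{w,b}(x^{(i)})} + \frac{1-y^{(i)}}{1-f_{w,b}(x^{(i)})}\right]\frac{\partial}{\partial w_j} f_{w,b}(x^{(i)}) \\
&= \frac{1}{m}\sum_{i=1}^{m}\left[-\frac{y^{(i)}}{f_{w,b}(x^{(i)})} + \frac{1-y^{(i)}}{1-f_{w,b}(x^{(i)})}\right] f_{w,b}(x^{(i)})\bigl(1-f_{w,b}(x^{(i)})\bigr)\,\frac{\partial}{\partial w_j}\bigl(w^T x^{(i)} + b\bigr) \\
&= \frac{1}{m}\sum_{i=1}^{m}\Bigl[-y^{(i)}\bigl(1-f_{w,b}(x^{(i)})\bigr) + (1-y^{(i)})\,f_{w,b}(x^{(i)})\Bigr]\,x_j^{(i)} \\
&= \frac{1}{m}\sum_{i=1}^{m}\bigl[f_{w,b}(x^{(i)}) - y^{(i)}\bigr]\,x_j^{(i)}
\end{aligned}
$$
update rules, for $j = 0, \dots, n-1$:
$$
\begin{aligned}
w_j' &:= w_j - \alpha\frac{\partial J(w,b)}{\partial w_j} = w_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\bigl(f_{w,b}(x^{(i)}) - y^{(i)}\bigr)x_j^{(i)} \\
b' &:= b - \alpha\frac{\partial J(w,b)}{\partial b} = b - \alpha\frac{1}{m}\sum_{i=1}^{m}\bigl(f_{w,b}(x^{(i)}) - y^{(i)}\bigr)
\end{aligned}
$$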
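The last line of the derivation and the update rules translate directly into a batch gradient-descent loop. The sketch below is a hedged illustration in plain Python. The helper names, the toy data, and the values of `alpha` and `num_iters` are assumptions, not part of the notes. It also compares the analytic gradient with a central finite difference of $J$, which is a cheap way to confirm the derivation.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable sigma(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """z = w^T x + b."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b


def cost(X, Y, w, b) -> float:
    """J(w, b), using -log(sigma(z)) = softplus(-z) and -log(1 - sigma(z)) = softplus(z)."""
    total = 0.0
    for x, y in zip(X, Y):
        z = linear(x, w, b)
        t = -z if y == 1 else z
        total += max(t, 0.0) + math.log1p(math.exp(-abs(t)))  # softplus(t)
    return total / len(X)


def gradients(X, Y, w, b) -> Tuple[List[float], float]:
    """dJ/dw_j = (1/m) sum_i (f(x^(i)) - y^(i)) x_j^(i);  dJ/db = (1/m) sum_i (f(x^(i)) - y^(i))."""
    m, n = len(X), len(w)
    dw = [0.0] * n
    db = 0.0
    for x, y in zip(X, Y):
        err = sigmoid(linear(x, w, b)) - y  # f_{w,b}(x^(i)) - y^(i)
        for j in range(n):
            dw[j] += err * x[j]
        db += err
    return [d / m for d in dw], db / m


def gradient_descent(X, Y, w, b, alpha: float, num_iters: int):
    """Apply w_j' := w_j - alpha * dJ/dw_j and b' := b - alpha * dJ/db, num_iters times.

    Both gradients are computed from the current (w, b) before anything changes,
    so all parameters are updated simultaneously.
    """
    w = list(w)
    for _ in range(num_iters):
        dw, db = gradients(X, Y, w, b)
        w = [w_j - alpha * d_j for w_j, d_j in zip(w, dw)]
        b = b - alpha * db
    return w, b


if __name__ == "__main__":
    # Hypothetical one-feature toy data (n = 1, m = 4).
    X = [[1.0], [2.0], [3.0], [4.0]]
    Y = [0, 0, 1, 1]

    # Gradient check at an arbitrary point: analytic vs. central difference.
    w0, b0, h = [0.3], -0.2, 1e-5
    dw, db = gradients(X, Y, w0, b0)
    num_dw = (cost(X, Y, [w0[0] + h], b0) - cost(X, Y, [w0[0] - h], b0)) / (2 * h)
    num_db = (cost(X, Y, w0, b0 + h) - cost(X, Y, w0, b0 - h)) / (2 * h)
    print("dJ/dw:", dw[0], num_dw, "  dJ/db:", db, num_db)

    w, b = gradient_descent(X, Y, [0.0], 0.0, alpha=0.1, num_iters=1000)
    print("J before:", cost(X, Y, [0.0], 0.0), "  J after:", cost(X, Y, w, b))
```

At $w = 0, b = 0$ every prediction is $0.5$. On this toy data the first gradient is therefore $\partial J/\partial w = (0.5 + 1.0 - 1.5 - 2.0)/4 = -0.5$ and $\partial J/\partial b = 0$, so the first step only increases $w$.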