chainsawriot

Newton-Raphson #1

Posted on Dec 29, 2011 by Chung-hong Chan

第一次聽 Newton-Raphson 是在 MIT 的 6.00 ，當時講到找 Root 的方法。其中一種找 root 的方法是 bisection method ，屬 binary search 的一種，每次 iteration 的 search space 會減少一半。
要找 5 的 root ，用任何的計算機都可以得出 2.236 。設 θ = 2.236 ，那麼 θ * θ 應該等於 5 。而 2.236 * 2.236 - 5 應該是等於 0 。
那麼，如果要找 x 的 root θ ， θ² - x 應該是等於 0 。可以將此算式寫成
f(θ) = Δ
我們的目的是要找出 Δ = 0 的 θ 。
假設我們要找 10 的 root ，而我們想答案精確度 ε 為 0.1 ， initial θ 為 0 至 10 的中間位，即是 5 。
f(5) = 5*5 - 10 = 15 ，由於 Δ 大於 ε ，我們將 θ 的最大可能值轉為 5 。下一次的估算是 0 至 5 的中間位，即是 2.5 。
f(2.5) = 2.5 * 2.5 - 10 = -3.75 ，由於 Δ 小於 ε ，我們將 θ 的最小可能值轉為 2.5 。下一次的估算是 2.5 至 5 的中間位，即是 3.75 。
f(3.75) = 3.75 * 3.75 - 10 = 4.0625 ，由於 Δ 大於 ε ，我們將 θ 的最大可能值轉為 3.75 ，下一次的估算是 2.5 至 3.75 的中間位，即是 3.125 。

之後的 θ 和 Δ 分別是：

f(3.125) = -0.234
f(3.4375) = 1.816
f(3.28125) = 0.767
f(3.203125) = 0.26
f(3.1640625) = 0.011
當 θ = 3.1640625 時， f(θ) = 0.011 ，其 absolute value 比 ε 細，故此 10 的 root 應該近似 3.1640625 。

使用 Bisection method 不是最有效的方法，可改用 Newton-Raphson method 。 Intuition 是，如要找 f(x) = 0 ，將此 function 畫在坐標圖上，找出 x intercept (y=0)。計出 f(x) 及 f(x) 的 slope of the tangent。更新 x 時，改用 tangent 的 x intercept 。維基百科的圖是這樣

根據 Calculus ， f(x) 的 slope of the tangent 等於 f'(x) ，即是其 1st derivative 。
以我們上例找 root 的 f(θ) = θ² - x，其 first derivative f'(θ) = 2θ 。
Newton-Raphson method 是，在更新 θ 時，應該將舊的 θ 減去 f(θ)/f'(θ) 。即是

θ_n+1 = θ_n - (f(θ_n)/f'(θ_n))

以上面找 10 的 root 為例，用 Newton-Raphson method 的計算就是
f(5) = 15 。下次估的 θ 是 5 - (f(θ)/f'(θ)) = 5 - (15/5*2) = 3.5
f(3.5) = 2.25 。下次估的 θ 是 3.5 - (2.5/3.5*2) = 3.143
f(3.143) = -0.121551 ，下次估的 θ 是 3.143 - (-0.121551/3.143*2) = 3.162
f(3.162) = 0.00037 。 f(θ) 細於 ε 。

以找 10 的 root 為例， Bisection method 要用 8 個 iterations ，而 Newton-Raphson method 只需 4 個 iterations 。
以下是 x 的大少與兩種方法的 iterations 次數之關係，是用以下的 R code 畫出來的。

[code]
# < -
sqrtBisec <- function(x, epsilon=0.0001, max.iters=500) {
low.bound <- 0
hi.bound <- max(c(x,1))
theta <- (low.bound+hi.bound) / 2
iters <- 0
while (abs(theta^2 - x) > epsilon & iters < = max.iters) {
if (theta^2 < x) {
low.bound <- theta
} else {
hi.bound <- theta
}
theta <- (low.bound+hi.bound) / 2
iters <- iters+1
}
if (iters == max.iters) {
return(NA)
} else {
return(list(root=theta, iters=iters))
}
}
sqrtNewt <- function(x, epsilon=0.0001, max.iters=500) {
theta <- max(c(x,1)) / 2
iters <- 0
delta <- abs(theta^2 - x)
while (delta > epsilon & iters < = max.iters) {
theta <- theta - (delta/(2*theta))
delta <- abs(theta^2 - x)
iters <- iters+1
}
if (iters == max.iters) {
return(NA)
} else {
return(list(root=theta, iters=iters))
}
}
testSqrtIterations <- function() {
testcases <- c(1,12,123,1234,12345,123456,1234567,12345678,1234567890,12345678901)
bisecIters <- c()
newtIters <- c()
for (i in 1:length(testcases)) {
bisecIters <- append(bisecIters, sqrtBisec(testcases[i])$iters)
newtIters <- append(newtIters, sqrtNewt(testcases[i])$iters)
}
plot(bisecIters, type="l", ylim=c(0,70), xlab="No of digits", ylab="No of iterations")
lines(newtIters, type="l", col="red")
legend("topleft", legend=c("Bisection","Newton-Rhapson"), col=c("black", "red"), lty=c(1,1), box.col="white")
}
testSqrtIterations()
#>
[/code]

Newton-Raphson 也可取代 Gradient Descent ，這個留待下次再講。