Shawn Zhong

钟万祥



Math 514 – Norm & Condition Number

  • Sep 26, 2018
  • Shawn
  • Math 514
  • 1 comment
Norm (Definition 2.6)
• Let $V$ be a linear space, and $\|\cdot\|: V \to \mathbb{R}_{\ge 0}$
• $\|\cdot\|$ is said to be a norm if
  ○ $\|\vec{v}\| = 0 \iff \vec{v} = \vec{0}$ (positive definiteness)
  ○ $\|\alpha \vec{v}\| = |\alpha| \, \|\vec{v}\|$ (absolute homogeneity)
  ○ $\|\vec{v} + \vec{w}\| \le \|\vec{v}\| + \|\vec{w}\|$ (triangle inequality)

Vector Norm (Definitions 2.7, 2.8 & 2.9)
• 2-norm (Euclidean norm)
  ○ $\|\vec{v}\|_2 := \sqrt{v_1^2 + \dots + v_n^2} = \sqrt{\sum_{i=1}^n v_i^2}$
• 1-norm (taxicab norm / Manhattan norm)
  ○ $\|\vec{v}\|_1 := |v_1| + \dots + |v_n| = \sum_{i=1}^n |v_i|$
  ○ To show that the 1-norm is a valid norm, we still need to prove the triangle inequality
    § $\|\vec{v} + \vec{w}\|_1 = \sum_{i=1}^n |v_i + w_i| \le \sum_{i=1}^n \left(|v_i| + |w_i|\right) = \|\vec{v}\|_1 + \|\vec{w}\|_1$
• ∞-norm (maximum norm)
  ○ $\|\vec{v}\|_\infty := \max_{i \in \{1,\dots,n\}} |v_i|$
• $p$-norm, for $p \ge 1$
  ○ $\|\vec{v}\|_p := \left( \sum_{i=1}^n |v_i|^p \right)^{1/p}$
  ○ Minkowski's inequality
    § $\|\vec{u} + \vec{v}\|_p \le \|\vec{u}\|_p + \|\vec{v}\|_p$
    § This proves the triangle inequality for the $p$-norm

Matrix Norm (Definition 2.10)
• Given a matrix $A_{m \times n} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}$
• Frobenius norm
  ○ $\|A\|_F := \sqrt{\sum_{i=1}^m \sum_{j=1}^n a_{ij}^2}$
  ○ The matrix is viewed as a list of numbers
• Operator norm / induced norm
  ○ $\|A\|_{p,q} := \sup_{\vec{x} \in \mathbb{R}^n \setminus \{\vec{0}\}} \frac{\|A\vec{x}\|_q}{\|\vec{x}\|_p}$
  ○ The matrix is viewed as a linear transformation
  ○ The operator norm is a means to measure the "size" of a linear operator
  ○ Note that the operator norm has two parameters, $p$ and $q$
• In particular, if both parameters equal $p$, we simply call it the $p$-norm
  ○ $\|A\|_p := \sup_{\vec{x} \in \mathbb{R}^n \setminus \{\vec{0}\}} \frac{\|A\vec{x}\|_p}{\|\vec{x}\|_p}$
  ○ The 1-norm, 2-norm, and ∞-norm are defined similarly
• Triangle inequality for the $p$-norm
  ○ Without loss of generality, suppose $\|\vec{x}\|_p = 1$
  ○ $\|(A + B)\vec{x}\|_p = \|A\vec{x} + B\vec{x}\|_p \le \|A\vec{x}\|_p + \|B\vec{x}\|_p$
  ○ $\Rightarrow \frac{\|(A + B)\vec{x}\|_p}{\|\vec{x}\|_p} \le \frac{\|A\vec{x}\|_p}{\|\vec{x}\|_p} + \frac{\|B\vec{x}\|_p}{\|\vec{x}\|_p}$
  ○ $\Rightarrow \|A + B\|_p \le \|A\|_p + \|B\|_p$ (take the supremum over $\vec{x}$)
• Submultiplicativity: $\|AB\|_p \le \|A\|_p \|B\|_p$
  ○ $\|AB\vec{x}\|_p \le \|A\|_p \|B\vec{x}\|_p \le \|A\|_p \|B\|_p \|\vec{x}\|_p$
  ○ $\Rightarrow \frac{\|AB\vec{x}\|_p}{\|\vec{x}\|_p} \le \|A\|_p \|B\|_p$
  ○ $\Rightarrow \|AB\|_p \le \|A\|_p \|B\|_p$

$\|A\|_1$ = Maximum Absolute Column Sum (Theorem 2.8)
• Statement
  ○ Given a matrix $A_{m \times n} = [\vec{a}_1, \dots, \vec{a}_n]$ written in columns, then
  ○ $\|A\|_1 = \sup_{\vec{x} \in \mathbb{R}^n \setminus \{\vec{0}\}} \frac{\|A\vec{x}\|_1}{\|\vec{x}\|_1} = \max_{j \in \{1,\dots,n\}} \|\vec{a}_j\|_1 = \max_{j \in \{1,\dots,n\}} \sum_{i=1}^m |a_{ij}|$
• Note
  ○ The matrix 1-norm is also called the maximum absolute column sum
• Proof
  ○ Let $C := \max_{j \in \{1,\dots,n\}} \|\vec{a}_j\|_1 = \max_{j \in \{1,\dots,n\}} \sum_{i=1}^m |a_{ij}|$
  ○ Show that $\|A\vec{x}\|_1 \le C \|\vec{x}\|_1$ for all $\vec{x} \in \mathbb{R}^n \setminus \{\vec{0}\}$
    § $\|A\vec{x}\|_1 = \sum_{i=1}^m |(A\vec{x})_i|$, by definition of the vector 1-norm
    § $= \sum_{i=1}^m \left| \sum_{j=1}^n a_{ij} x_j \right|$, by definition of $A\vec{x}$
    § $\le \sum_{i=1}^m \sum_{j=1}^n |a_{ij}| |x_j|$, by the triangle inequality
    § $= \sum_{j=1}^n \left( \sum_{i=1}^m |a_{ij}| \right) |x_j|$, since only $|a_{ij}|$ depends on $i$
    § $\le C \sum_{j=1}^n |x_j|$, by maximality of $C$
    § $= C \|\vec{x}\|_1$, by definition of the vector 1-norm
  ○ Look for an $\vec{x}$ s.t. $\|A\vec{x}\|_1 = C \|\vec{x}\|_1$
    § Let $J := \arg\max_{j \in \{1,\dots,n\}} \|\vec{a}_j\|_1$, so that $\|\vec{a}_J\|_1 = C$
    § Let $\vec{x} \in \mathbb{R}^n$ with $[\vec{x}]_k = 1$ if $k = J$ and $0$ otherwise; then $\|\vec{x}\|_1 = 1$
    § Therefore $\|A\vec{x}\|_1 = \|\vec{a}_J\|_1 = C = C \|\vec{x}\|_1$

$\|A\|_\infty$ = Maximum Absolute Row Sum (Theorem 2.7)
• Statement
  ○ Given a matrix $A_{m \times n}$ with rows $\vec{b}_1, \dots, \vec{b}_m$, then
  ○ $\|A\|_\infty = \sup_{\vec{x} \in \mathbb{R}^n \setminus \{\vec{0}\}} \frac{\|A\vec{x}\|_\infty}{\|\vec{x}\|_\infty} = \max_{i \in \{1,\dots,m\}} \|\vec{b}_i\|_1 = \max_{i \in \{1,\dots,m\}} \sum_{j=1}^n |a_{ij}|$
• Note
  ○ The matrix ∞-norm is also called the maximum absolute row sum
• Proof
  ○ Let $C := \max_{i \in \{1,\dots,m\}} \|\vec{b}_i\|_1 = \max_{i \in \{1,\dots,m\}} \sum_{j=1}^n |a_{ij}|$
  ○ Show that $\|A\vec{x}\|_\infty \le C \|\vec{x}\|_\infty$ for all $\vec{x} \in \mathbb{R}^n \setminus \{\vec{0}\}$
    § $\|A\vec{x}\|_\infty = \max_{i \in \{1,\dots,m\}} \left| \sum_{j=1}^n a_{ij} x_j \right|$, by definition of the ∞-norm
    § $\le \max_{i \in \{1,\dots,m\}} \sum_{j=1}^n |a_{ij}| |x_j|$, by the triangle inequality
    § $\le \left[ \max_{i \in \{1,\dots,m\}} \sum_{j=1}^n |a_{ij}| \right] \|\vec{x}\|_\infty$, by definition of the ∞-norm
    § $= C \|\vec{x}\|_\infty$
  ○ Look for an $\vec{x}$ s.t. $\|A\vec{x}\|_\infty = C \|\vec{x}\|_\infty$
    § Let $I := \arg\max_{i \in \{1,\dots,m\}} \|\vec{b}_i\|_1$, so that $\|\vec{b}_I\|_1 = \sum_{j=1}^n |a_{Ij}| = C$
    § Let $\vec{x} \in \mathbb{R}^n$ with $[\vec{x}]_j = 1$ if $a_{Ij} \ge 0$ and $-1$ if $a_{Ij} < 0$; then $\|\vec{x}\|_\infty = 1$
    § Then $[A\vec{x}]_I = \vec{b}_I^T \vec{x} = \sum_{j=1}^n |a_{Ij}| = C = C \|\vec{x}\|_\infty$

$\|A\|_2$ = Largest Singular Value (Theorem 2.9)
• Positive-definite
  ○ A symmetric matrix $A$ is said to be positive-definite if all its eigenvalues are positive; its eigenvectors can then be chosen orthonormal
  ○ i.e. $\lambda_i \in \mathbb{R}_{>0}$ and $\langle \vec{x}_i, \vec{x}_j \rangle = \delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases}$, where $A \vec{x}_i = \lambda_i \vec{x}_i$
• Symmetric
  ○ A matrix $A$ is said to be symmetric if $A^T = A$
• Statement (special case)
  ○ Assume $A_{n \times n}$ is a positive-definite symmetric matrix
  ○ Then $\|A\|_2 = \sup_{\vec{x} \in \mathbb{R}^n \setminus \{\vec{0}\}} \frac{\|A\vec{x}\|_2}{\|\vec{x}\|_2} = \max_{i \in \{1,\dots,n\}} |\lambda_i|$
• Proof (for the special case)
  ○ Let $C = \sup_{\vec{x} \in \mathbb{R}^n \setminus \{\vec{0}\}} \frac{\|A\vec{x}\|_2}{\|\vec{x}\|_2}$
  ○ Show $C \le \max_{i \in \{1,\dots,n\}} |\lambda_i|$
    § Express $\vec{x}$ as a linear combination of the orthonormal eigenvectors $\{\vec{x}_i\}$
      □ $\vec{x} = c_1 \vec{x}_1 + c_2 \vec{x}_2 + \dots + c_n \vec{x}_n$
      □ Then $\|\vec{x}\|_2 = \sqrt{\sum c_i^2}$
    § Similarly, express $A\vec{x}$ using $\{\vec{x}_i\}$
      □ $A\vec{x} = c_1 A\vec{x}_1 + \dots + c_n A\vec{x}_n = c_1 \lambda_1 \vec{x}_1 + \dots + c_n \lambda_n \vec{x}_n$
      □ Then $\|A\vec{x}\|_2 = \sqrt{\sum c_i^2 \lambda_i^2}$
    § Thus $\frac{\|A\vec{x}\|_2}{\|\vec{x}\|_2} = \sqrt{\frac{\sum c_i^2 \lambda_i^2}{\sum c_i^2}} \le \max_{i \in \{1,\dots,n\}} |\lambda_i|$ for all $\vec{x} \in \mathbb{R}^n \setminus \{\vec{0}\}$
  ○ Look for an $\vec{x}$ s.t. $\|A\vec{x}\|_2 = C \|\vec{x}\|_2$
    § Let $I := \arg\max_{i \in \{1,\dots,n\}} |\lambda_i|$, so that $|\lambda_I| = C$
    § Let $\vec{x} = \vec{x}_I$; then $\frac{\|A\vec{x}\|_2}{\|\vec{x}\|_2} = \frac{\|A\vec{x}_I\|_2}{\|\vec{x}_I\|_2} = \sqrt{\frac{c_I^2 \lambda_I^2}{c_I^2}} = |\lambda_I| = C$
• Statement (general case)
  ○ Define $B_{n \times n} = A^T A$ for $A_{m \times n}$; then $B$ is a symmetric positive semi-definite matrix
  ○ Let $\sigma_i = \sqrt{\lambda_i(B)}$; the $\sigma_i$ are called the singular values of $A$
  ○ The previous statement generalizes to $\|A\|_2 = \max_{i \in \{1,\dots,n\}} \sigma_i$

Conditioning of a Function (Examples 2.5 & 2.6)
• Motivation
  ○ Suppose the input $x$ has a perturbation $\tau$ (because of machine precision)
  ○ We'd like to know how the output $f$ is affected by $\tau$
  ○ Conditioning measures the sensitivity of the output to perturbations in the input
• Absolute conditioning
  ○ $\mathrm{Cond}(f) = \sup_{x, y \in D,\, x \ne y} \frac{|f(x) - f(y)|}{|x - y|}$
  ○ If $f$ is differentiable, then $\mathrm{Cond}(f) = \sup_{x \in D} |f'(x)|$
• Absolute local conditioning
  ○ $\mathrm{Cond}_x(f) = \sup_{|\delta x| \to 0,\, x + \delta x \in D} \frac{|f(x + \delta x) - f(x)|}{|\delta x|}$
  ○ If $f$ is differentiable, then $\mathrm{Cond}_x(f) = |f'(x)|$ for a scalar function and $|\nabla f(x)|$ for a vector function
• Relative local conditioning (a numerical sketch follows below)
  ○ $\mathrm{Cond}_x(f) = \sup_{|\delta x| \to 0,\, x + \delta x \in D} \frac{|f(x + \delta x) - f(x)| / |f(x)|}{|\delta x| / |x|} = \sup_{|\delta x| \to 0,\, x + \delta x \in D} \frac{|f(x + \delta x) - f(x)|}{|\delta x|} \cdot \frac{|x|}{|f(x)|}$
  ○ If $f$ is differentiable, then $\mathrm{Cond}_x(f) = \frac{|f'(x)|}{|f(x)|} |x|$
  ○ Motivation
    § $f(x) = 1$, $f(x + \delta x) = 2$
    § $g(x) = 100$, $g(x + \delta x) = 101$
    § Both $f$ and $g$ increased by 1, but the relative effects are very different!
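The relative local condition number can be estimated numerically with a finite difference. Here is a minimal Python sketch under my own assumptions (the helper `rel_cond`, the step size, and the sample points are mine, not from the lecture), applied to $f(x) = \sqrt{x}$, whose conditioning is worked out analytically in the next example:

```python
import math

def rel_cond(f, x, h=1e-8):
    """Finite-difference estimate of |f'(x)| / |f(x)| * |x| (hypothetical helper)."""
    fprime = (f(x + h * x) - f(x)) / (h * x)   # relative step h*x
    return abs(fprime) * abs(x) / abs(f(x))

# For f(x) = sqrt(x), the relative condition number is 1/2 at every x > 0,
# even though the absolute conditioning blows up as x -> 0
for x in [1e-6, 1.0, 1e6]:
    print(x, rel_cond(math.sqrt, x))   # ~0.5 each time
```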
• Example: $f(x) = \sqrt{x}$
  ○ Absolute
    § If $D = [0, 1]$, then $\mathrm{Cond}(f) = +\infty$
    § If $D = [1, 2]$, then $\mathrm{Cond}(f) = \frac{1}{2}$
  ○ Absolute local
    § $\mathrm{Cond}_x(f) = f'(x) = \frac{1}{2\sqrt{x}} \to \begin{cases} \infty \text{ (ill-conditioned)} & \text{as } x \to 0 \\ 0 \text{ (well-conditioned)} & \text{as } x \to +\infty \end{cases}$
  ○ Relative local
    § $\mathrm{Cond}_x(f) = \frac{|f'(x)|}{|f(x)|} |x| = \frac{1 / (2\sqrt{x})}{\sqrt{x}} |x| = \frac{1}{2}$ for all $x \in D$

Condition Number of a Matrix (Definition 2.12)
• Definition
  ○ $\kappa(A) = \|A\| \, \|A^{-1}\|$ is called the condition number of $A$
  ○ If $\kappa(A) \gg 1$, then we say $A$ is ill-conditioned
• Note
  ○ $\kappa(A) = \kappa(A^{-1})$
  ○ $\kappa(A) \ge 1$
• Perturbing the input: $x \,(+\delta x) \xrightarrow{A} b \,(+\delta b)$
  ○ $Ax = b$ and $A(x + \delta x) = b + \delta b$ together give $\delta b = A \, \delta x$
  ○ $\mathrm{Cond}_x(A) = \frac{\|\delta b\| / \|b\|}{\|\delta x\| / \|x\|}$, by definition
    $= \frac{\|\delta b\|}{\|b\|} \cdot \frac{\|x\|}{\|\delta x\|} = \frac{\|A \delta x\|}{\|Ax\|} \cdot \frac{\|x\|}{\|\delta x\|}$, since $\delta b = A \delta x$ and $b = Ax$
    $= \frac{\|A \delta x\|}{\|\delta x\|} \cdot \frac{\|A^{-1} b\|}{\|b\|}$, assuming $A$ is nonsingular
    $\le \|A\| \, \|A^{-1}\|$, by definition of the matrix norm
• Perturbing the matrix: $A \,(+\delta A) \xrightarrow{b} x \,(+\delta x)$
  ○ $Ax = b$
  ○ $Ax = (A + \delta A)(x + \delta x)$, since $b$ is held fixed here
  ○ $Ax = Ax + \delta A \, x + A \, \delta x + \underbrace{\delta A \, \delta x}_{\approx 0}$, since $\delta A \, \delta x$ is a second-order perturbation
  ○ $\delta A \, x + A \, \delta x = 0$
  ○ $\delta x = -A^{-1} \, \delta A \, x$
  ○ $\|\delta x\| \le \|A^{-1}\| \, \|\delta A\| \, \|x\|$, since $\|PQ\| \le \|P\| \, \|Q\|$ for any matrices $P, Q$
  ○ $\frac{\|\delta x\| / \|x\|}{\|\delta A\| / \|A\|} = \frac{\|\delta x\|}{\|x\|} \cdot \frac{\|A\|}{\|\delta A\|} \le \frac{\|A^{-1}\| \, \|\delta A\| \, \|x\|}{\|x\|} \cdot \frac{\|A\|}{\|\delta A\|} = \|A^{-1}\| \, \|A\|$
• We can similarly analyze $b \,(+\delta b) \xrightarrow{A} x \,(+\delta x)$ and $A \,(+\delta A) \xrightarrow{x} b \,(+\delta b)$
• Note: the choice of norm affects the condition number (see the NumPy sketch at the end of these notes)
  ○ $A = \begin{bmatrix} 1 & & & \\ 1 & 1 & & \\ \vdots & & \ddots & \\ 1 & & & 1 \end{bmatrix} \Rightarrow A^{-1} = \begin{bmatrix} 1 & & & \\ -1 & 1 & & \\ \vdots & & \ddots & \\ -1 & & & 1 \end{bmatrix}$
  ○ $\|A\|_\infty = \|A^{-1}\|_\infty = 2$ and $\|A\|_1 = \|A^{-1}\|_1 = n$, so $\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1 = n^2$ while $\kappa_\infty(A) = \|A\|_\infty \|A^{-1}\|_\infty = 4$

Example for Condition Number
• Let $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$
• 1-norm
• ∞-norm
• 2-norm
  ○ $S = \left\{ \begin{bmatrix} c \\ d \end{bmatrix} = A \begin{bmatrix} x \\ y \end{bmatrix} \,\middle|\, x^2 + y^2 = 1 \right\}$
  ○ Let $A^{-1} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}$; then $\begin{bmatrix} x \\ y \end{bmatrix} = A^{-1} \begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} b_{11} c + b_{12} d \\ b_{21} c + b_{22} d \end{bmatrix}$
  ○ Since $x^2 + y^2 = 1$, we have $\alpha c^2 + \beta c d + \gamma d^2 = 1$, where
    § $\alpha = b_{11}^2 + b_{21}^2$
    § $\beta = 2 b_{11} b_{12} + 2 b_{21} b_{22}$
    § $\gamma = b_{12}^2 + b_{22}^2$
  ○ Since the discriminant $\beta^2 - 4\alpha\gamma < 0$, the graph $S$ is an ellipse
  ○ $\kappa(A) = \frac{\text{length of major axis}}{\text{length of minor axis}} = \frac{\sigma_{\max}}{\sigma_{\min}}$, the ratio of the singular values

Exercise 2.12
• Given: $\exists x \ne 0$ s.t. $(I - B)x = 0$
• Show $\|B\|_2 = \max_{i = 1,\dots,n} \sqrt{\lambda_i} \ge 1$, i.e. some eigenvalue of $B^T B$ is at least 1
  ○ $x^T (I - B) x = x^T x - x^T B x = 0$ and $x^T (I - B)^T (I - B) x = x^T x - 2 x^T B x + x^T B^T B x = 0$ together give $x^T B^T B x = x^T x$
  ○ $\|B\|_2 = \sup_{x \in \mathbb{R}^n \setminus \{0\}} \frac{\|Bx\|_2}{\|x\|_2} = \sup_{x \in \mathbb{R}^n \setminus \{0\}} \sqrt{\frac{x^T B^T B x}{x^T x}} \ge 1$
• Show $(I - A)^{-1} = I + A(I - A)^{-1}$
  ○ $(I - A)\left(I + A(I - A)^{-1}\right) = (I - A) + A = I$, since $A$ commutes with $(I - A)^{-1}$
• Note: Neumann expansion
  ○ $(I + A)^{-1} = I - A + A^2 - A^3 + \cdots$ (valid when $\|A\| < 1$)
  ○ the matrix analogue of $\frac{1}{1 + x} = 1 - x + x^2 - x^3 + \cdots$
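The norm theorems (2.7, 2.8, 2.9) and the norm-dependence of $\kappa(A)$ are easy to sanity-check numerically. Below is a minimal NumPy sketch of my own (not from the lecture notes); `np.linalg.norm` and `np.linalg.cond` accept the matrix-norm orders used here:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Theorem 2.8: 1-norm = max absolute column sum
print(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())        # 3.0 3.0
# Theorem 2.7: inf-norm = max absolute row sum
print(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())   # 3.0 3.0
# Theorem 2.9: 2-norm = largest singular value = sqrt(max eigenvalue of A^T A)
print(np.linalg.norm(A, 2), np.sqrt(np.linalg.eigvalsh(A.T @ A).max()))  # 3.0 3.0

# kappa_2(A) = sigma_max / sigma_min = 3 / 1 for this A
print(np.linalg.cond(A, 2))       # 3.0

# The triangular example above: kappa_1 = n^2 grows while kappa_inf stays at 4
n = 50
T = np.eye(n)
T[1:, 0] = 1.0                    # first column all ones
print(np.linalg.cond(T, 1))       # 2500.0 (= n^2)
print(np.linalg.cond(T, np.inf))  # 4.0
```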

Math 514 – LU Decomposition

  • Sep 18, 2018
  • Shawn
  • Math 514
  • 1 comment

Math 514 – Ch 1

  • Sep 18, 2018
  • Shawn
  • Math 514
  • No comments yet
Motivation
• Goal: given $f(x) = 0$, find $x$
• Motivation for numerical methods
  ○ $ax + b = 0 \Rightarrow x = -\frac{b}{a}$
  ○ $ax^2 + bx + c = 0 \Rightarrow x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$
  ○ ⋮
  ○ $ax^5 + bx^4 + cx^3 + dx^2 + ex + f = 0 \Rightarrow$ no formula!
• If the degree of the polynomial is $\ge 5$, there is no explicit formula for the zeros

Theorem 1.1: Bolzano's Theorem
• Statement
  ○ Let $f$ be a real-valued continuous function on the interval $[a, b]$
  ○ If $f(a) f(b) \le 0$, then $\exists \xi \in [a, b]$ s.t. $f(\xi) = 0$
• Explanation
  ○ If a continuous function takes values of opposite sign inside an interval
  ○ Then it has a root in that interval
• Proof
  ○ By the Intermediate Value Theorem
• Note
  ○ This theorem does not guarantee uniqueness of the solution

Theorem 1.2: Brouwer's Fixed Point Theorem
• Statement
  ○ If $g \in C[a, b]$, and $g(x) \in [a, b]$ for $x \in [a, b]$
  ○ Then $\exists \xi \in [a, b]$ s.t. $g(\xi) = \xi$
  ○ Here, the real number $\xi$ is called a fixed point of $g$
• Proof
  ○ Let $f(x) := x - g(x)$
  ○ Then $f(a) f(b) = \underbrace{(a - g(a))}_{\le 0} \underbrace{(b - g(b))}_{\ge 0} \le 0$
  ○ By the Intermediate Value Theorem, $\exists \xi \in [a, b]$ s.t. $f(\xi) = 0$
  ○ $\Rightarrow f(\xi) = \xi - g(\xi) = 0$
  ○ $\Rightarrow g(\xi) = \xi$
• Why care about fixed points?
  ○ Finding a fixed point is numerically easy, in the sense that it can be done by iteration
• Algorithm (see the Python sketch after Example 1.7 below)
  ○ Initial guess: $x_0 \in [a, b]$
  ○ Iterate: $x_{k+1} := g(x_k)$
  ○ Stop when $|x_{n+1} - x_n| < \varepsilon$, where $\varepsilon$ is a small tolerance
• Example
  ○ Given $g(x) = \frac{1}{2}\left(x^2 + \frac{1}{2}\right)$
  ○ The fixed point of $g$ satisfies $x = \frac{1}{2}\left(x^2 + \frac{1}{2}\right) \iff x^2 - 2x + \frac{1}{2} = 0$
  ○ Let $f(x) := x^2 - 2x + \frac{1}{2}$; then we need to solve $f(x) = 0$
  ○ Analytical method
    § $x = 1 \pm \frac{\sqrt{2}}{2} \approx 1.7$ or $0.3$
  ○ Numerical method
    § $x_0 = 1$
    § $x_1 = g(x_0) = g(1) = \frac{3}{4} = 0.75$
    § $x_2 = g(x_1) = g\left(\frac{3}{4}\right) = \frac{17}{32} \approx 0.53$
    § $x_3 = g(x_2) = g\left(\frac{17}{32}\right) \approx 0.39$
    § ⋮
• Counter-example
  ○ Suppose $f(x) = x^2 - 2$
  ○ $f(x) = 0 \iff x^2 = 2 \iff x = \frac{2}{x}$
  ○ Let $x_{k+1} = \frac{2}{x_k}$ and $x_0 = 1$; then $x_1 = 2$, $x_2 = 1$, $x_3 = 2$, … (the iteration oscillates forever)

Two Main Questions in This Chapter
• When does $x_{k+1} = g(x_k)$ converge?
  ○ If the iteration is unstable
    § $x_{k+1} = g(x_k)$ diverges
  ○ If the iteration is stable
    § The contraction argument guarantees convergence
    § And the convergence rate is linear
• Given $f(x)$, how do we find $g(x)$?
  ○ There are infinitely many choices of $g$ for a given $f$, as long as $f(x) = 0 \iff g(x) = x$
  ○ Possible choices for $g(x)$
    § $g(x) = x + f(x)$, or
    § $g(x) = x + \ln(f(x) + 1)$
  ○ Newton's method (and the secant method) will guarantee a contracting $g(x)$

Theorems 1.3, 1.4 & 1.5: Contraction Mapping Theorem
• Contraction
  ○ Let $g$ be a real-valued continuous function on the interval $[a, b]$
  ○ Then $g$ is a contraction on $[a, b]$ if $\exists L \in (0, 1)$ s.t.
  ○ $|g(x) - g(y)| \le L |x - y|$ for all $x, y \in [a, b]$ (Lipschitz condition)
  ○ Here, $L$ is called the Lipschitz constant
• Remark on the Lipschitz condition
  ○ $|g(x) - g(y)| \le L |x - y|$ for all $x, y \in [a, b]$
  ○ $\Rightarrow \frac{|g(x) - g(y)|}{|x - y|} \le L$
  ○ $\Rightarrow \lim_{y \to x} \frac{|g(x) - g(y)|}{|x - y|} \le L$
  ○ $\Rightarrow |g'(x)| \le L < 1$ (assuming $g$ is differentiable)
• Statements
  ○ Let $g$ be a contraction on $[a, b]$
  ○ Suppose $g(x) \in [a, b]$ for all $x \in [a, b]$. Then,
    (1) $\exists \xi \in [a, b]$ s.t. $g(\xi) = \xi$ (i.e. there exists a fixed point)
    (2) $\{x_{k+1} = g(x_k)\}$ converges to $\xi$ for every $x_0 \in [a, b]$ (i.e. the iterative algorithm works)
    (3) If the iteration stops at $|x_k - \xi| \le \varepsilon$, then $k \le 1 + \left[\frac{\ln |x_1 - x_0| - \ln(\varepsilon (1 - L))}{\ln(1 / L)}\right]$, where $[x]$ is the largest integer less than or equal to $x$
• Proof of (1)
  ○ See Theorem 1.2
• Proof of (2)
  ○ $\underbrace{|x_{k+1} - \xi|}_{E_{k+1}} = |g(x_k) - g(\xi)|$
  ○ $\le L \underbrace{|x_k - \xi|}_{E_k}$, by the Lipschitz condition
  ○ $\le L^2 \underbrace{|x_{k-1} - \xi|}_{E_{k-1}}$, since $E_k \le L E_{k-1}$, and so on by induction
  ○ ⋮
  ○ $\le L^{k+1} \underbrace{|x_0 - \xi|}_{E_0} \to 0$ as $k \to \infty$
• Proof of (3)
  ○ From the proof of (2), $E_k \le L^k E_0$, so it suffices to have $L^k E_0 \le \varepsilon$
  ○ Taking logarithms (and noting $\ln L < 0$), this holds once $k \ge \log_L \frac{\varepsilon}{E_0}$
  ○ Bound $E_0$
    § $E_0 = |x_0 - \xi| = |x_0 - x_1 + x_1 - \xi| \le |x_0 - x_1| + |x_1 - \xi| \le |x_0 - x_1| + L |x_0 - \xi|$
    § $\Rightarrow E_0 \le |x_0 - x_1| + L E_0$
    § $\Rightarrow E_0 \le \frac{|x_1 - x_0|}{1 - L}$
  ○ Therefore it suffices that
    § $k \ge \log_L \frac{\varepsilon}{|x_1 - x_0| / (1 - L)} = \log_L \frac{\varepsilon (1 - L)}{|x_1 - x_0|} = \frac{\ln |x_1 - x_0| - \ln(\varepsilon (1 - L))}{\ln(1 / L)}$
• Corollary
  ○ Given $g: [a, b] \to [a, b]$ with $g \in C^1[a, b]$
  ○ If $|g'(x)| \le L < 1$, then the sequence $\{x_k = g(x_{k-1})\}$ converges to $\xi$
• Remark on the corollary
  ○ If we relax $|g'(x)| < 1$ to just $|g'(\xi)| < 1$
  ○ Then whenever $x_0$ is close enough to $\xi$, $\{x_k\}$ will still converge to $\xi$
  ○ Since in a small neighborhood of $\xi$, $|g'(x)| \approx |g'(\xi)| < 1$

Theorem 1.3: Stability of Fixed Points
• Stable fixed point
  ○ If $\xi = g(\xi)$ and $|g'(\xi)| < 1$, then $\xi$ is a stable fixed point
  ○ A stable fixed point can be found via $\{x_{k+1} = g(x_k)\}$
• Unstable fixed point
  ○ If $\xi = g(\xi)$ and $|g'(\xi)| > 1$, then $\xi$ is an unstable fixed point
  ○ If $\xi$ is an unstable fixed point, then $\{x_{k+1} = g(x_k)\}$ won't converge to $\xi$

Definitions 1.4 & 1.7: Rate of Convergence
• Suppose $\xi = \lim_{k \to \infty} x_k$, and define $E_k := |x_k - \xi|$
• An algorithm is said to converge linearly if
  ○ $\lim_{k \to \infty} \frac{E_{k+1}}{E_k} = \mu$, for some constant $\mu \in (0, 1)$
• An algorithm is said to converge superlinearly if
  ○ $\lim_{k \to \infty} \frac{E_{k+1}}{E_k} = 0$
• An algorithm is said to converge quadratically if
  ○ $\lim_{k \to \infty} \frac{E_{k+1}}{E_k^2} = \mu$, for some constant $\mu > 0$
• An algorithm is said to converge with order $q$ if
  ○ $\lim_{k \to \infty} \frac{E_{k+1}}{E_k^q} = \mu$, for some constant $\mu > 0$

Example 1.7: $f(x) = e^x - x - 2$
• $g(x) = e^x - 2$
  ○ We observe that $g(x)$ maps $[1, 2]$ to $[1, 2]$
    § By the Fixed Point Theorem, $\exists \xi \in [1, 2]$ s.t. $g(\xi) = \xi$
    § We need to check whether $g(x)$ satisfies the Lipschitz condition
    § $g'(\xi) = e^\xi \in [e^1, e^2]$
    § $\Rightarrow |g'(\xi)| > 1$
    § $\Rightarrow$ unstable fixed point
    § $\Rightarrow$ the algorithm won't work
  ○ And $g(x)$ also maps $[-2, -1]$ to $[-2, -1]$
    § By the Fixed Point Theorem, $\exists \xi \in [-2, -1]$ s.t. $g(\xi) = \xi$
    § $g'(\xi) = e^\xi \in [e^{-2}, e^{-1}]$
    § $\Rightarrow |g'(\xi)| < 1$
    § $\Rightarrow$ stable fixed point
    § $\Rightarrow$ run $\{x_{k+1} = g(x_k)\}$ to find $\xi$
• $g(x) = \ln(x + 2)$
  ○ We observe that $g(x)$ maps $[1, 2]$ to $[1, 2]$
    § $g'(\xi) = \frac{1}{\xi + 2} \in \left[\frac{1}{4}, \frac{1}{3}\right]$
    § $\Rightarrow |g'(\xi)| < 1$
    § $\Rightarrow$ stable fixed point
    § $\Rightarrow$ run $\{x_{k+1} = g(x_k)\}$ to find $\xi$
  ○ And $g(x)$ also maps $(-2, -1)$ to $(-2, -1)$
    § $g'(\xi) = \frac{1}{\xi + 2} \in (1, +\infty)$
    § $\Rightarrow |g'(\xi)| > 1$
    § $\Rightarrow$ unstable fixed point
    § $\Rightarrow$ the algorithm won't work
• Remark
  ○ $x = e^x - 2 \Rightarrow f(x) = e^x - x - 2$
    § We have a stable fixed point $\xi \in [-2, -1]$ and an unstable one $\xi \in [1, 2]$
  ○ $x = \ln(x + 2) \Rightarrow e^x = x + 2 \Rightarrow f(x) = e^x - x - 2$
    § We have a stable fixed point $\xi \in [1, 2]$ and an unstable one $\xi \in [-2, -1]$
  ○ Therefore the choice of $g$ affects the convergence behavior
  ○ So how can we design a function $g(x)$ s.t. every fixed point is stable?
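To make the algorithm concrete, here is a minimal Python sketch of fixed-point iteration, run on the two stable choices of $g$ from Example 1.7 (the helper name `fixed_point` and the tolerances are my own, not from the notes):

```python
import math

def fixed_point(g, x0, eps=1e-10, max_iter=100):
    """Iterate x_{k+1} = g(x_k) until |x_{k+1} - x_k| < eps."""
    x = x0
    for _ in range(max_iter):
        x_next = g(x)
        if abs(x_next - x) < eps:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge (unstable fixed point?)")

# Roots of f(x) = e^x - x - 2, via the two *stable* pairings of g and interval:
print(fixed_point(lambda x: math.log(x + 2), 1.0))   # ~1.1462, stable on [1, 2]
print(fixed_point(lambda x: math.exp(x) - 2, -1.0))  # ~-1.8414, stable on [-2, -1]
```

Starting $g(x) = e^x - 2$ near its unstable fixed point $\xi \approx 1.146$ instead, the iterates are pushed away from $\xi$, consistent with the $|g'(\xi)| > 1$ analysis above.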
Definition 1.6: Newton's Method
• In Newton's method, $g(x)$ is defined as
  ○ $g(x) = x - \frac{f(x)}{f'(x)}$
• It's obvious that $f(\xi) = 0 \iff g(\xi) = \xi$
• Why the fixed points of $g$ are stable
  ○ We want to show that $|g'(\xi)| < 1$
  ○ $g(x) = x - \frac{f(x)}{f'(x)} \Rightarrow g'(x) = 1 - \left(\frac{f(x)}{f'(x)}\right)'$
  ○ $\left(\frac{f}{f'}\right)' = \frac{f' \cdot f' - f \cdot f''}{(f')^2} = 1 - \frac{f \cdot f''}{(f')^2}$
  ○ $\Rightarrow g'(x) = 1 - \left(1 - \frac{f(x) f''(x)}{[f'(x)]^2}\right) = \frac{f(x) f''(x)}{[f'(x)]^2}$
  ○ $\Rightarrow |g'(\xi)| = \frac{|f(\xi) f''(\xi)|}{[f'(\xi)]^2} = 0 < 1$, since $f(\xi) = 0$

Theorem 1.8: Convergence of Newton's Method
• Statement
  ○ In Newton's method, $\{x_{k+1} = g(x_k)\}$ converges quadratically
  ○ i.e. $\lim_{k \to \infty} \frac{E_{k+1}}{E_k^2} = \mu$, for some constant $\mu > 0$
• Assumptions
  ○ $f(\xi) = 0$
  ○ $f \in C^2$ on $I_\delta = [\xi - \delta, \xi + \delta]$, since we need to use $f'$ and $f''$
  ○ $f'(\xi) \ne 0$, since it appears in the denominator
  ○ $\left|\frac{f''(x)}{f'(y)}\right| \le A$ for all $x, y \in I_\delta$
  ○ $|x_0 - \xi| \le \frac{1}{A}$ (i.e. the initial guess is not too far away from $\xi$)
• Proof
  ○ Expand $f(\xi)$ at $x_k$ to obtain $f(x_k)$ and $f'(x_k)$
    § $f(\xi) = f(x_k + \xi - x_k)$
    § $= f(x_k) + f'(x_k)(\xi - x_k) + \frac{1}{2} f''(x_k)(x_k - \xi)^2 + \dots$ (by Taylor expansion of $f$)
    § $= f(x_k) + f'(x_k)(\xi - x_k) + \frac{1}{2} f''(\theta_k)(x_k - \xi)^2$ (for some $\theta_k$ between $x_k$ and $\xi$, by the Mean Value Theorem)
  ○ Express the $\frac{f(x_k)}{f'(x_k)}$ in $x_{k+1}$ using $\frac{f''}{f'}$, since we already know $\left|\frac{f''(x)}{f'(y)}\right| \le A$
    § By assumption, $f(\xi) = 0$
    § $\Rightarrow f(x_k) + f'(x_k)(\xi - x_k) + \frac{1}{2} f''(\theta_k)(x_k - \xi)^2 = 0$
    § $\Rightarrow f(x_k) = -\frac{1}{2} f''(\theta_k)(x_k - \xi)^2 - f'(x_k)(\xi - x_k)$
    § $\Rightarrow \frac{f(x_k)}{f'(x_k)} = -\frac{1}{2} \frac{f''(\theta_k)}{f'(x_k)}(\xi - x_k)^2 - (\xi - x_k)$
  ○ Compute $E_{k+1}$, and express it with $E_k$
    § $E_{k+1} = |x_{k+1} - \xi| = |g(x_k) - \xi|$
    § $= \left| \left(x_k - \frac{f(x_k)}{f'(x_k)}\right) - \xi \right|$
    § $= \left| x_k + \frac{1}{2} \frac{f''(\theta_k)}{f'(x_k)}(\xi - x_k)^2 + (\xi - x_k) - \xi \right|$
    § $= \frac{1}{2} \left| \frac{f''(\theta_k)}{f'(x_k)} \right| (\xi - x_k)^2$
    § $= \frac{1}{2} \left| \frac{f''(\theta_k)}{f'(x_k)} \right| E_k^2$
  ○ Show that the algorithm converges
    § By assumption, $|x_k - \xi| \le \frac{1}{A}$ and $\left|\frac{f''(x)}{f'(y)}\right| \le A$ for all $x, y \in I_\delta$
    § So $E_{k+1} = \frac{1}{2} \underbrace{\left|\frac{f''(\theta_k)}{f'(x_k)}\right|}_{\le A} \underbrace{E_k}_{\le 1/A} E_k \le \frac{1}{2} \cdot A \cdot \frac{1}{A} \cdot E_k = \frac{1}{2} E_k \to 0$ as $k \to +\infty$
    § Therefore $x_k$ converges to $\xi$
  ○ Show that the algorithm converges quadratically
    § $\frac{E_{k+1}}{E_k^2} = \frac{1}{2} \left|\frac{f''(\theta_k)}{f'(x_k)}\right| \le \frac{A}{2}$
    § As $k \to +\infty$, both $x_k$ and $\theta_k$ converge to $\xi$
    § Thus $\lim_{k \to +\infty} \frac{E_{k+1}}{E_k^2} = \frac{1}{2} \left|\frac{f''(\xi)}{f'(\xi)}\right| = \mu$, where $\mu \in (0, A/2]$ is a constant

Definition 1.8: Secant Method
• Motivation
  ○ Sometimes $f'$ can be hard to find in Newton's method
  ○ But we can approximate $f'$ by a difference quotient
  ○ i.e. $f'(x_k) \approx \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}$
• Definition (see the sketch after Theorem 1.10 below)
  ○ $x_{k+1} = x_k - f(x_k) \bigg/ \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}} = x_k - f(x_k) \cdot \frac{x_k - x_{k-1}}{f(x_k) - f(x_{k-1})}$
• Note
  ○ The secant method requires two initial values $x_0$ and $x_1$

Theorem 1.10: Convergence of Secant Method
• Statement
  ○ Let $f \in C^1[\xi - \delta, \xi + \delta]$ s.t. $f(\xi) = 0$ and $f'(\xi) \ne 0$
  ○ If $x_0, x_1$ are close to $\xi$, then the secant iteration converges at least linearly
• Proof
  ○ WLOG, assume $\alpha := f'(\xi) > 0$ in a small neighborhood of $\xi$
  ○ Choose $I$ to be a neighborhood of $\xi$ such that
    § $0 < \frac{3}{4}\alpha < f'(x) < \frac{5}{4}\alpha$ for all $x \in I$
  ○ Compute $x_{k+1}$
    § $x_{k+1} = x_k - f(x_k) \bigg/ \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}$
    § By the Mean Value Theorem
      □ $\frac{f(x_k) - \overbrace{f(\xi)}^{0}}{x_k - \xi} = f'(\eta_k) \Rightarrow f(x_k) = f'(\eta_k)(x_k - \xi)$, and
      □ $\frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}} = f'(\theta_k)$
      □ for some $\theta_k \in [x_{k-1}, x_k]$ and $\eta_k \in (x_k, \xi)$
    § Therefore $x_{k+1} = x_k - \frac{f'(\eta_k)(x_k - \xi)}{f'(\theta_k)}$
  ○ Check that $\left|\frac{E_{k+1}}{E_k}\right| < 1$
    § $E_{k+1} = x_{k+1} - \xi = E_k - \frac{f'(\eta_k)}{f'(\theta_k)} E_k = \left[1 - \frac{f'(\eta_k)}{f'(\theta_k)}\right] E_k$
    § Since $\frac{f'(\eta_k)}{f'(\theta_k)} \in \left(\frac{3}{5}, \frac{5}{3}\right)$, we get $\left|\frac{E_{k+1}}{E_k}\right| = \left|1 - \frac{f'(\eta_k)}{f'(\theta_k)}\right| < \frac{5}{3} - 1 = \frac{2}{3} < 1$
  ○ Therefore the secant method converges at least linearly
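Here is a minimal sketch of both methods applied to Example 1.7's $f(x) = e^x - x - 2$ (the function names `newton` and `secant` and the tolerances are my own choices, not from the notes):

```python
import math

def newton(f, fprime, x0, eps=1e-12, max_iter=50):
    """Newton's method: x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < eps:
            return x
    raise RuntimeError("Newton's method did not converge")

def secant(f, x0, x1, eps=1e-12, max_iter=50):
    """Secant method: Newton with f' replaced by a difference quotient."""
    for _ in range(max_iter):
        denom = f(x1) - f(x0)
        if denom == 0:            # iterates have effectively converged
            return x1
        x2 = x1 - f(x1) * (x1 - x0) / denom
        if abs(x2 - x1) < eps:
            return x2
        x0, x1 = x1, x2
    raise RuntimeError("secant method did not converge")

f = lambda x: math.exp(x) - x - 2
print(newton(f, lambda x: math.exp(x) - 1, 1.0))  # ~1.146193
print(secant(f, 1.0, 2.0))                        # ~1.146193
```

Both reach the root $\xi \approx 1.146193$ in a handful of iterations; Newton needs $f'$ explicitly, while the secant variant trades that requirement for a second starting value.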

Math 632 – 9/18

  • Sep 18, 2018
  • Shawn
  • Math 632
  • 1 comment

Math 632 – 9/13

  • Sep 18, 2018
  • Shawn
  • Math 632
  • No comments yet