Mathematics desk

Welcome to the Wikipedia Mathematics Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.

April 30

Logistic regression

Hi, I need a bit of help with performing logistic regression. I think my confusion stems from two sources:

1) I apparently need to use IRLS, but I'm struggling to see how to apply that to my logistic parameters and what the parameters in the IRLS equation relate to.

2) The particular problem I'm working on requires that I re-calculate the beta parameters on-the-fly, as new data comes in. In the background I come from I'd refer to this as an online algorithm but I don't really know how to approach it with regression.

I realise this is pretty basic stuff so I suppose what I'm looking for is an example that walks through it. Thanks for your time, --Iae (talk) 13:42, 30 April 2011 (UTC)

1) Instead of thinking in terms of IRLS, I recommend treating it as maximizing the log-likelihood function by applying the Newton-Raphson method to its gradient.

Denoting by lgs the logistic function, by

X_{i}

the feature vector for point i, by

n_{i}^{+}

the weighted total of positive examples with this feature vector and by

n_{i}^{-}

the weighted total of negative examples, you have

\ell ({\boldsymbol {w}})=\sum _{i=1}^{n}\left(n_{i}^{+}\log \mathrm {lgs} ({\boldsymbol {w}}\cdot X_{i})+n_{i}^{-}\log \mathrm {lgs} (-{\boldsymbol {w}}\cdot X_{i})\right).

If X, viewed as an

n\times m

-matrix, is of rank m, then

\ell

is strictly concave. It therefore achieves its global maximum where its gradient is 0 (which, in edge cases such as perfectly separable data, happens only when one or more of the parameters is infinite). You can use the Newton-Raphson method to find this point. The score function, which is the gradient of

\ell

, is

{\boldsymbol {u}}({\boldsymbol {w}})=\sum _{i=1}^{n}\left(n_{i}^{+}\mathrm {lgs} (-{\boldsymbol {w}}\cdot X_{i})-n_{i}^{-}\mathrm {lgs} ({\boldsymbol {w}}\cdot X_{i})\right)X_{i}.

The Jacobian of the score, which is also the Hessian of the log-likelihood and the negative of the observed Fisher information matrix, is

-{\mathcal {J}}({\boldsymbol {w}})=-\sum _{i=1}^{n}(n_{i}^{+}+n_{i}^{-})\mathrm {lgs} ({\boldsymbol {w}}\cdot X_{i})\mathrm {lgs} (-{\boldsymbol {w}}\cdot X_{i})X_{i}^{T}X_{i}.

To solve

{\boldsymbol {u}}({\boldsymbol {w}})=0

, you start with a seed value

{\boldsymbol {w}}^{(0)}

and iteratively calculate

{\boldsymbol {w}}^{(t+1)}={\boldsymbol {w}}^{(t)}+{\mathcal {J}}^{-1}\left({\boldsymbol {w}}^{(t)}\right){\boldsymbol {u}}\left({\boldsymbol {w}}^{(t)}\right).
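The iteration above can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions: the function names (`lgs`, `solve`, `score_and_info`, `fit`), the seed value of zero, the stopping tolerance, and the toy data are all made up for the example, and a small Gaussian elimination stands in for whatever linear solver you would normally use.

```python
import math


def lgs(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in reversed(range(m)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, m))
        x[r] = (M[r][m] - tail) / M[r][r]
    return x


def score_and_info(w, X, n_pos, n_neg):
    """Return the score u(w) and the observed information J(w)."""
    m = len(w)
    u = [0.0] * m
    J = [[0.0] * m for _ in range(m)]
    for x, pos, neg in zip(X, n_pos, n_neg):
        p = lgs(sum(wi * xi for wi, xi in zip(w, x)))
        g = pos * (1.0 - p) - neg * p        # uses lgs(-z) = 1 - lgs(z)
        h = (pos + neg) * p * (1.0 - p)
        for a in range(m):
            u[a] += g * x[a]
            for b in range(m):
                J[a][b] += h * x[a] * x[b]
    return u, J


def fit(X, n_pos, n_neg, iters=25, tol=1e-10):
    """Newton-Raphson: w <- w + J(w)^{-1} u(w), starting from w = 0."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        u, J = score_and_info(w, X, n_pos, n_neg)
        step = solve(J, u)
        w = [wi + si for wi, si in zip(w, step)]
        if max(abs(s) for s in step) < tol:
            break
    return w


# Toy data (hypothetical): an intercept column plus one feature.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
n_pos = [1.0, 2.0, 3.0, 4.0]
n_neg = [4.0, 3.0, 2.0, 1.0]
w_hat = fit(X, n_pos, n_neg)
u_hat, _ = score_and_info(w_hat, X, n_pos, n_neg)
```

At the returned `w_hat` the score `u_hat` should be numerically zero, which is the condition being solved for. No safeguards such as step halving are included, so the sketch can fail on separable or badly scaled data.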

2) You will need to update the gradient and Hessian as the data and w change. For a fixed w, adding datapoints is easy. So you can keep

{\boldsymbol {w}}_{0}

fixed and let

{\boldsymbol {w}}^{*}={\boldsymbol {w}}_{0}+{\mathcal {J}}^{-1}\left({\boldsymbol {w}}_{0}\right){\boldsymbol {u}}\left({\boldsymbol {w}}_{0}\right)

. You will still need to do a batch calculation once in a while to recompute the gradient and Hessian at the new w.

-- Meni Rosenfeld (talk) 08:42, 1 May 2011 (UTC)
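A rough sketch of the online scheme in point 2, in plain Python. Everything named here is an assumption made for illustration: the class `OnlineLogistic`, its methods `add`, `estimate` and `rebase`, the choice to store all data points so that the batch recalculation is possible, and the toy stream. The score and information are accumulated at a fixed reference point, the estimate is one Newton step from it, and `rebase` is the occasional batch recalculation.

```python
import math


def lgs(z):
    """Logistic function, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in reversed(range(m)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, m))
        x[r] = (M[r][m] - tail) / M[r][r]
    return x


class OnlineLogistic:
    """Keeps the score u and information J at a fixed reference point w0."""

    def __init__(self, w0):
        self.w0 = list(w0)
        self.data = []          # stored so that rebase() can recompute
        self._reset()

    def _reset(self):
        m = len(self.w0)
        self.u = [0.0] * m
        self.J = [[0.0] * m for _ in range(m)]

    def _accumulate(self, x, positive, weight):
        p = lgs(sum(wi * xi for wi, xi in zip(self.w0, x)))
        g = weight * ((1.0 - p) if positive else -p)
        h = weight * p * (1.0 - p)
        for a in range(len(x)):
            self.u[a] += g * x[a]
            for b in range(len(x)):
                self.J[a][b] += h * x[a] * x[b]

    def add(self, x, positive, weight=1.0):
        """Cheap update: one new data point, w0 unchanged."""
        self.data.append((x, positive, weight))
        self._accumulate(x, positive, weight)

    def estimate(self):
        """One Newton step from w0: w* = w0 + J(w0)^{-1} u(w0)."""
        step = solve(self.J, self.u)
        return [wi + si for wi, si in zip(self.w0, step)]

    def rebase(self):
        """Batch recalculation: move w0 to the estimate, recompute u and J."""
        self.w0 = self.estimate()
        self._reset()
        for x, positive, weight in self.data:
            self._accumulate(x, positive, weight)


# Hypothetical stream: (feature vector, is_positive, weight).
stream = []
for x1, pos, neg in [(0.0, 1.0, 4.0), (1.0, 2.0, 3.0), (2.0, 3.0, 2.0), (3.0, 4.0, 1.0)]:
    stream.append(([1.0, x1], True, pos))
    stream.append(([1.0, x1], False, neg))

model = OnlineLogistic([0.0, 0.0])
for x, positive, weight in stream:
    model.add(x, positive, weight)
first = model.estimate()        # single Newton step from w0 = 0
for _ in range(8):
    model.rebase()              # repeated rebasing is ordinary Newton-Raphson
```

Note that `estimate` needs J to be invertible, so it should only be called once enough linearly independent feature vectors have arrived. How often to call `rebase` is a tuning decision that this sketch does not address.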

Thanks a lot for this. I'll need to read up a bit on some of the matrix stuff to properly understand what's happening but it looks really helpful. --Iae (talk) 15:22, 1 May 2011 (UTC)

Question on algebraic topology, absolutely no clue where to get started, please help!

Hello everyone,

I'm trying to answer the following problem:

Let X be a simplicial complex. Suppose X = B ∪ C, subcomplexes B and C, and let A = B ∩ C. Show that the inclusion of A in B induces an isomorphism H_∗A → H_∗B if and only if the inclusion of C in X induces an isomorphism H_∗C → H_∗X.

To the best of my knowledge my notation is standard, as are my definitions. Usually I consider myself quite good with my algebraic topology (this is at undergraduate level, I should mention). However, the only material we've covered around this area is things like Mayer-Vietoris and the simplicial approximation theorem, neither of which, I think, has much application here. This is, however, a past paper for one of my courses from 6 years ago, so it may simply be the case that my lecture course has changed since then. Either way, I genuinely have no idea where to begin, and I dislike simplicial complexes (the area of AT I am least fond of!), so in the spirit of the post two above this one, would anyone please help me? I have an account and I will even thank you at the end! ;-) Estrenostre (talk) 17:42, 30 April 2011 (UTC)

Have you tried using Mayer-Vietoris? Sławomir Biały (talk) 19:37, 30 April 2011 (UTC)

It isn't clear to me how I would apply MV to this question: is that what I should be doing here? Estrenostre (talk) 21:35, 30 April 2011 (UTC)

Basically a diagram chase (so don't expect great clarity here ;-). You have an exact sequence

H_{*}(A)\to H_{*}(B)\oplus H_{*}(C)\to H_{*}(X)\to H_{*}(A)

with the kernel of the first map equal to the image of the last. If the inclusion A into B gives an isomorphism in homology, then the kernel of the first map is zero, so the middle arrow is surjective. Since the image of the first map does not meet

H_{*}(C)

, the map

H_{*}(C)\to H_{*}(X)

is surjective. Since

H_{*}(X)\to H_{*}(A)

is the zero map,

H_{*}(C)\to H_{*}(X)

is an isomorphism. The converse is similar. Sławomir Biały (talk) 22:06, 30 April 2011 (UTC)
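One way to spell out the forward direction of this chase is sketched below. The map names and the sign convention are assumptions introduced here for illustration only: i and j denote the inclusions of A into B and C, k and l the inclusions of B and C into X, the Mayer-Vietoris maps are taken to be \varphi = (i_*, j_*) and \psi = k_* - l_*, and \partial is the connecting map.

```latex
\begin{align*}
&\text{Assume } i_* : H_*(A) \to H_*(B) \text{ is an isomorphism.} \\[4pt]
&\textbf{Step 1.}\quad \varphi(a) = (i_* a,\, j_* a) = 0 \;\Rightarrow\; i_* a = 0 \;\Rightarrow\; a = 0,
  \text{ so } \ker\varphi = 0. \\
&\qquad\quad \operatorname{im}\partial = \ker\varphi = 0 \;\Rightarrow\; \partial = 0
  \;\Rightarrow\; \operatorname{im}\psi = \ker\partial = H_*(X). \\[4pt]
&\textbf{Step 2 } (l_* \text{ surjective).}\quad
  \text{Write } x = \psi(b, c) = k_* b - l_* c \text{ and put } a = i_*^{-1} b. \\
&\qquad\quad \psi\varphi(a) = 0 \;\Rightarrow\; k_* b = k_* i_* a = l_* j_* a
  \;\Rightarrow\; x = l_*\,(j_* a - c). \\[4pt]
&\textbf{Step 3 } (l_* \text{ injective).}\quad
  l_* c = 0 \;\Rightarrow\; \psi(0, c) = 0 \;\Rightarrow\; (0, c) = \varphi(a) = (i_* a,\, j_* a) \\
&\qquad\quad \Rightarrow\; i_* a = 0 \;\Rightarrow\; a = 0 \;\Rightarrow\; c = j_* a = 0.
\end{align*}
```

Step 3 is the precise sense in which the image of the first map does not meet H_*(C).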

Ah, got it, thank you very much! I didn't expect it to be that simple :) I have one further question if you wouldn't mind lending some assistance: I want to use MV to calculate the homology groups of X = two Möbius strips Y and Z identified along their boundaries (assuming the relevant triangulations exist). If I've done it correctly, using of course Y∩Z, Y, Z, Y∪Z, the intersection (i.e. the boundary of a Möbius strip) is homotopy equivalent to the circle, and similarly the whole Möbius strip Y or Z itself is h.e. to the circle, so we get that

H_{n}(X)=H_{n}(Y)\oplus H_{n}(Z)=0

for n > 2, as

H_{n}(Y\cap Z)=0

, and for n=2 we get the exact sequence

0\to H_{2}(X)\to H_{1}(Y\cap Z)\to \ldots \to H_{0}(X)\to 0

, i.e.

0\to H_{2}(X)\to \mathbb {Z} \to \mathbb {Z} \oplus \mathbb {Z} \to H_{1}(X)\to \mathbb {Z} \to \mathbb {Z} \oplus \mathbb {Z} \to H_{0}(X)\to 0

: we can deduce a little about the maps by the fact the endpoints are 0, but not a great deal else. Should I be looking at specifically how the various inclusion maps behave in this particular case, or is there a smarter way to do this? Thanks again! (Oh, and I meant to ask - I never had the notation of

H_{*}(W)

defined for me when I was taught this material, so I just had to have a bit of a guess at what it meant here - is there a proper definition for the notation? I couldn't find anything on your Homology Group or Mayer-Vietoris pages...) Estrenostre (talk) 00:20, 1 May 2011 (UTC)

On the last question, I'd consider

H_{*}(W)\,

the notation for the direct sum of the homology groups, thought of as a graded module. There's probably some sense in which this is a ring as well, but I'm a little vague on how it works in the CW category, so I don't want to say graded ring here. I'm not sure that this appears in any standard textbooks, but I knew what you meant. For the first question (for what it's worth), gluing two Moebius strips together gives a Klein bottle. The homology is worked as an example in Mayer–Vietoris sequence#Klein bottle. Sławomir Biały (talk) 22:04, 1 May 2011 (UTC)
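For orientation, the Klein bottle computation can be sketched as follows. The choice of generators is an assumption made here: each H_1 is generated by a circle, and signs depend on orientation conventions. The key input is that the boundary circle of a Möbius strip wraps twice around its core circle.

```latex
\begin{align*}
&\alpha : H_1(Y \cap Z) = \mathbb{Z} \;\longrightarrow\; H_1(Y) \oplus H_1(Z) = \mathbb{Z} \oplus \mathbb{Z},
  \qquad \alpha(1) = (2, -2). \\[4pt]
&H_2(Y) \oplus H_2(Z) = 0 \;\Rightarrow\; H_2(X) \cong \ker\alpha = 0. \\[4pt]
&H_0(Y \cap Z) \to H_0(Y) \oplus H_0(Z) \text{ is injective, so } H_1(X) \to H_0(Y \cap Z) \text{ is zero and} \\
&H_1(X) \cong \operatorname{coker}\alpha
  = (\mathbb{Z} \oplus \mathbb{Z}) / \langle (2, -2) \rangle
  \cong \mathbb{Z} \oplus \mathbb{Z}/2, \\
&\text{using the basis } \{(1, 0),\, (1, -1)\} \text{ of } \mathbb{Z} \oplus \mathbb{Z},
  \text{ in which } (2, -2) = 2 \cdot (1, -1). \\[4pt]
&H_0(X) \cong \mathbb{Z}, \qquad H_n(X) = 0 \text{ for } n \ge 2.
\end{align*}
```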

Perhaps I'm being slow, but why does that final map go to 0? Surely it should be

H_{1}(X)\to H_{0}(S^{1})=\mathbb {Z} \to H_{0}(S^{1})\oplus H_{0}(S^{1})=\mathbb {Z} \oplus \mathbb {Z} \to H_{0}(X)\to 0

? I can't see why we can guarantee that the image of that map from

H_{1}(X)

must be 0; maybe I'm overlooking something. Thanks for persevering! Estrenostre (talk) 06:02, 2 May 2011 (UTC)

If all the spaces are path connected, the MV sequence can be terminated at

H_{0}

, since exactness of

\mathbb {Z} \to \mathbb {Z} \oplus \mathbb {Z} \to \mathbb {Z} \to 0

always implies short exactness of this sequence. This trick is exploited by the MV sequence for reduced homology groups. Sławomir Biały (talk) 11:58, 2 May 2011 (UTC)
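A short algebraic justification of that implication, with the map names f and g introduced here purely for illustration:

```latex
\begin{align*}
&\mathbb{Z} \xrightarrow{\;f\;} \mathbb{Z} \oplus \mathbb{Z} \xrightarrow{\;g\;} \mathbb{Z} \to 0
  \quad \text{exact.} \\[4pt]
&g \text{ surjective} \;\Rightarrow\; \ker g \text{ is a subgroup of } \mathbb{Z} \oplus \mathbb{Z}
  \text{ of rank } 2 - 1 = 1, \text{ hence } \ker g \cong \mathbb{Z}. \\
&\operatorname{im} f = \ker g \cong \mathbb{Z}
  \;\Rightarrow\; f : \mathbb{Z} \to \operatorname{im} f \text{ is a surjection } \mathbb{Z} \to \mathbb{Z},
  \text{ i.e. multiplication by } \pm 1. \\
&\Rightarrow\; f \text{ is injective, so }
  0 \to \mathbb{Z} \xrightarrow{\;f\;} \mathbb{Z} \oplus \mathbb{Z} \xrightarrow{\;g\;} \mathbb{Z} \to 0
  \text{ is exact.}
\end{align*}
```

Since f is injective, the preceding map in the long exact sequence, from H_1(X) into H_0 of the intersection, has zero image.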

How do finitists define real numbers?

I've read that you can base analysis on finitist concepts only. Do they define real numbers, or do they bypass this somehow? Count Iblis (talk) 18:41, 30 April 2011 (UTC)

I have a feeling computable number and computable analysis are relevant. These notions seem to be consistent with the finitist philosophy. Sławomir Biały (talk) 22:30, 30 April 2011 (UTC)

Finitism can be viewed as a kind of constructivism. Analysis can be done in a constructivist framework. See our article on constructivism (mathematics) or Bridges' article on Constructive Mathematics in the Stanford Encyclopedia of Philosophy. See also finitism. EdJohnston (talk) 03:20, 1 May 2011 (UTC)

You might like [1] by Doron Zeilberger. 69.111.194.167 (talk) 09:05, 1 May 2011 (UTC)

Ha, that's interesting! Perhaps the constructivist approach and the computationalist approach mentioned above are too much modeled on classical calculus and all you really need is just discrete mathematics... Count Iblis (talk) 21:38, 1 May 2011 (UTC)