Mathematics desk

Welcome to the Wikipedia Mathematics Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.

March 8

Statistical analysis

Consider this hypothetical example: I have a number of parameters that I think may predict the level of theft loss at a given convenience store, and have collected data on those parameters for a number of stores.

How do I determine which of those parameters, alone or in combination, actually help predict the level of theft loss? (I want to throw away variables that don't really help and have a simpler statistical model.)
How do I build a statistical model to predict the theft loss of a proposed new store? If I have no idea what form a good predictive formula would take, what do I do or try?
Can the above be done "automagically" using a tool like R (for common/well-known types of statistical models)?
What would be some good introductory articles to read on the subject?

Thanks in advance. — Preceding unsigned comment added by 96.227.60.60 (talk) 13:43, 8 March 2013 (UTC)[reply]

This is going to depend on the type of data you have to work with. Ideally, you would have many data cases in which only one independent parameter varies at a time. For example, if one of your parameters is the average income of people living in the area, you would want many data points where nothing varies except income. If you see no difference in theft, then you can disregard that parameter. If you do see differences, then you can determine the correlation factor with that particular parameter. Then do the same test for the next parameter, etc., until you determine the correlation factor for each parameter. That would then allow you to come up with a formula which includes each parameter and its correlation factor as a coefficient.

However, the real world is very messy. You probably will only have a small number of data cases, and many parameters will vary for each. Another complicating factor is that some parameters may not be independent of others. For example, if you also have "high crime area" as a parameter, that may well be dependent on the average income. The danger of this is that you can over-represent these two related parameters by counting essentially the same factor twice.

BTW, I'd assume this type of research has already been done. Have you done web searches to see if it has? StuRat (talk) 15:51, 8 March 2013 (UTC)[reply]

(ec) You might want to look up Regression analysis. IBE (talk) 15:54, 8 March 2013 (UTC)[reply]

Basically you are trying to do data analysis with the goal of developing a statistical model. Those articles should give you a reasonable starting point. Looie496 (talk) 23:04, 8 March 2013 (UTC)[reply]
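As a rough illustration of the regression approach suggested above, here is a minimal sketch in Python with NumPy (the question mentions R; Python is used here only as a stand-in). Everything in it is an assumption made for illustration: the data are synthetic, and the predictor names (income, hours, noise_var) and the helper fit_ols are hypothetical. It fits a linear model by least squares, then drops each predictor in turn to see how much the fit worsens, which is one simple way to spot variables that do not help.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y on X plus an intercept; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot

# Synthetic, made-up data: 200 hypothetical stores, three hypothetical predictors.
rng = np.random.default_rng(0)
n = 200
income = rng.normal(50.0, 10.0, n)       # average local income (thousands)
hours = rng.uniform(8.0, 24.0, n)        # opening hours per day
noise_var = rng.normal(0.0, 1.0, n)      # a predictor with no real effect
loss = 100.0 - 1.5 * income + 2.0 * hours + rng.normal(0.0, 5.0, n)

X = np.column_stack([income, hours, noise_var])
names = ["income", "hours", "noise_var"]

coef_full, r2_full = fit_ols(X, loss)
print("full model R^2:", round(r2_full, 3))

# Drop each predictor in turn and record how much R^2 falls without it.
r2_drop = {}
for j, name in enumerate(names):
    _, r2_without = fit_ols(np.delete(X, j, axis=1), loss)
    r2_drop[name] = r2_full - r2_without
    print(f"R^2 lost by dropping {name}: {r2_drop[name]:.4f}")

# Predict the loss for a proposed new store (intercept term, then predictors).
new_store = np.array([1.0, 45.0, 16.0, 0.0])
pred = float(new_store @ coef_full)
print("predicted loss for new store:", round(pred, 1))
```

A predictor whose removal barely changes R^2 is a candidate for dropping. With real data, correlated predictors (such as income and "high crime area" in the example above) can mask each other in this kind of one-at-a-time comparison, so the result should be read with care.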

exp(i2PI)=1 then i2PI=ln1=0?? thank you!!

--Ulisse0 (talk) 15:25, 8 March 2013 (UTC)[reply]

The logarithm is a multi-valued function, or, if you want to make it single-valued, you have to introduce a branch cut. This is related to the fact that

$\oint _{C}{\frac {dz}{z}}=2\pi i$

where C is a contour that encircles the origin counterclockwise.

Thank you, it seemed a violation of the transitive property of equality, like saying 1^(1/2)=1 but also 1^(1/2)=-1 then 1=-1 --Ulisse0 (talk) 15:46, 8 March 2013 (UTC)[reply]

Complex logarithm may be a better place to start for the concept. To perhaps make this more clear, think of the fact that sin(5*Pi/2) = 1, but 5*Pi/2 ≠ arcsin(1) = Pi/2. Naraht (talk) 16:09, 8 March 2013 (UTC)[reply]
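The sine example can be checked numerically. The following is a small sketch using Python's standard math module; the variable names are chosen here for illustration. It shows that applying arcsin after sin returns the principal value in [-pi/2, pi/2] rather than the original angle, and that the two differ by a full period.

```python
import math

x = 5 * math.pi / 2        # the angle from the example
s = math.sin(x)            # approximately 1, up to floating-point rounding
back = math.asin(s)        # principal value, restricted to [-pi/2, pi/2]

print("sin(5*pi/2)        =", s)
print("asin(sin(5*pi/2))  =", back)
print("original angle     =", x)
print("difference / (2*pi) =", (x - back) / (2 * math.pi))
```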

It's interesting what you say: indeed arcsin is a function only on [a certain domain but also with a certain] codomain (or image; this is another question, because I've never understood the real difference), which is [-pi/2,pi/2]. The (apparent?) violation of the transitive property of equality should happen for every 'non-function', e.g. the square root itself if its codomain is 'defined' to include not only the 1st quadrant but the 2nd too. --Ulisse0 (talk) 16:47, 8 March 2013 (UTC)[reply]

The codomain is the set of values the function "could" have; the image is the set it does have (given the domain chosen). For example, when discussing polynomials, it's useful for consistency to say that they're all $\mathbb{R}\rightarrow\mathbb{R}$, although some of them (like $x\rightarrow x^{2}$) merely have the range $\mathbb{R}_{0}^{+}$. --Tardis (talk) 14:49, 15 March 2013 (UTC)[reply]

No. ln(1) = 0 + 2k pi i. The same goes for any other number. — 79.113.230.39 (talk) 22:51, 8 March 2013 (UTC)[reply]
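The multi-valued logarithm described in this thread can be illustrated with a short sketch using Python's standard cmath module. The helper log_branches is a hypothetical name introduced here, and k is assumed to range over integers. The principal logarithm of exp(2*pi*i) comes back as approximately 0 rather than 2*pi*i, while every value 0 + 2k*pi*i exponentiates back to 1.

```python
import cmath
import math

z = 2j * math.pi
w = cmath.exp(z)               # approximately 1, with a tiny rounding error
principal = cmath.log(w)       # principal branch: approximately 0, not 2*pi*i

def log_branches(value, ks):
    """Hypothetical helper: principal log of value shifted by 2*pi*i*k for each integer k."""
    return [cmath.log(value) + 2j * math.pi * k for k in ks]

branches = log_branches(1, range(-1, 2))   # k = -1, 0, 1

print("exp(2*pi*i)      =", w)
print("principal log    =", principal)
for b in branches:
    # Each branch value is a valid logarithm of 1: exp of it returns 1.
    print("branch", b, "-> exp =", cmath.exp(b))
```

So exp(i2PI) = 1 does not force i2PI = ln 1 = 0; it only says that i2PI is one of the values of the logarithm of 1.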