Talk:Iterative proportional fitting

A fact from Iterative proportional fitting appeared on Wikipedia's Main Page in the Did you know column on 10 July 2009. The text of the entry was as follows:

Did you know... that convergence of the iterative proportional fitting procedure for estimating cell values of a contingency table was rigorously proved using differential geometry?

A record of the entry may be seen at Wikipedia:Recent additions/2009/July.

Notability comment

The reference to Pukelsheim and Simeone (2009) seems a bit obscure. While I can't claim to be knowledgeable enough to assess the quality or impact of the work, an unpublished preprint written 4 months ago doesn't seem to be a particularly notable reference. –3mta3 (talk) 21:28, 12 July 2009 (UTC)

Additional Contributions

I was expecting to see (and was trying to research) contributions from Messrs. Fratar and Furness. In the travel demand modeling industry, one often says "Fratar a matrix" or "Furness" to mean "iteratively adjust a matrix to match new row and column totals." -- Preceding unsigned comment added by 12.9.33.198 (talk) 17:44, 5 August 2009 (UTC)

I don't understand the chi-square value in the last section

The IPFP solution of the 2x2 contingency table produces the table of expected frequencies (from the chi-square formulae). Since it is the "expected" frequencies table, its chi-square is zero. Then how can the probability p(0) ~ 0.18 be meaningful? --Gotti 23:52, 24 January 2011 (UTC) -- Preceding unsigned comment added by Druseltal2005 (talk • contribs)

What's the point?

As far as I can tell, the article completely fails to explain what the utility of the method is. Isn't the classical use to adjust cell counts for new marginals? For example, to remove bias in a poll where the gender marginals don't match the actual population gender distribution. If so, the example could be used to motivate this. As it is, I'm left wondering what I'm supposed to make of the new cell totals. What do they mean?

Yes, I would like to know that, too. Moreover, the values in the final table can easily be calculated directly in one step by renormalizing the initial table's marginals to relative frequencies and then multiplying those, e.g. (87/100)*(52/100)*100 = 45.24. - Saibod (talk) 13:35, 5 September 2012 (UTC)

That's one real use. Another is fitting loglinear models, which are statistical models for multidimensional tables that match observed and expected lower-dimensional tables. In my experience, people fitting loglinear models are likely to call the algorithm IPF, and people correcting marginals in a poll are likely to call it 'raking' or 'rim weighting'. Tslumley (talk) 07:13, 2 December 2018 (UTC)
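For readers of this thread, here is a minimal sketch of the poll-reweighting ("raking") use described above. It is only an illustration under stated assumptions: the function name `ipf`, the sample counts, and the target marginals are all made up for this example and do not come from the article.

```python
import numpy as np

def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Hypothetical 2-D IPF helper: scale `seed` to match the target marginals."""
    m = np.asarray(seed, dtype=float).copy()
    r = np.asarray(row_targets, dtype=float)
    c = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Scale each row so that row sums equal the row targets.
        m *= (r / m.sum(axis=1))[:, None]
        # Scale each column so that column sums equal the column targets.
        m *= (c / m.sum(axis=0))[None, :]
        # Columns now match exactly; stop once rows match as well.
        if np.allclose(m.sum(axis=1), r, atol=tol, rtol=0):
            break
    return m

# Made-up poll sample: rows = gender, columns = age group.
sample = np.array([[30.0, 30.0],
                   [25.0, 15.0]])

# Made-up population marginals the sample should be adjusted to.
weighted = ipf(sample, row_targets=[50, 50], col_targets=[55, 45])
print(weighted)
print(weighted.sum(axis=1), weighted.sum(axis=0))
```

The adjusted cells keep the association structure of the sample (the cross-product ratio is unchanged by row and column scaling) while matching the new marginals, which is arguably what the "new cell totals" mean in the reweighting setting.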

Error in RAS Section?

Should the ${\hat {a}}_{ij}$ really be ${\hat {m}}_{ij}$? The $a$'s seem to appear out of nowhere and are not defined in the statement of the problem... 69.142.244.49 (talk) 13:46, 15 January 2013 (UTC)

Indeed. It is fixed now. -- Preceding unsigned comment added by 203.10.91.11 (talk) 04:16, 24 February 2015 (UTC)

New edit shifts emphasis from a special, usually trivial case to the general, common meaning of IPF

The article emphasized how to approximate a matrix strictly as an outer product of vectors, for such purposes as chi-squared tests of independence, by ignoring the given table, replacing it with all ones, and factoring to match the original marginals. Usually this doesn't even need iteration. But most of the references (e.g. Kruithof) refer to the process of scaling a matrix to match new row- and column-totals, for such purposes as gravity models and Furness-Fratar factoring in transportation models, weighting of survey data, and synthesizing cross-classified demographic data estimates. The former is a special case, and usually a trivial case, of the latter. I edited the article to emphasize the general case. What's surprising is how much of the content was already mostly there and adaptable. The special case is mentioned.

Some things left unchanged that I think warrant review and/or verification:

No reference or methodology is apparent for how the number of elementary operations was determined, or whether these apply to the general or ignore-matrix case. The two general algorithms can both be done with I(J-1) adds, I divisions, and IJ multiplications (for 2 dimensions). Algorithm 1 requires more array traversal and/or input/output, and might accumulate more round-off error.
Whether the Pukelsheim and Simeone article stands out in history among the many modern studies of IPF.
Improve the Existence and Uniqueness of MLEs section
G.U. Yule's 1912 paper didn't propose an IPF procedure, but he gave an analytic solution to a 2x2 problem. May add this remark, to bring Yule back into the history.
Should find and add a specific reference or cross-reference to the special-case usage, in lieu of the removed example chi-squared test.
Additional usages and variations: biproportional political representation systems (with integers), inequalities, e.g. supply constraints.

Jaguarmountain (talk) 20:21, 30 September 2020 (UTC)
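The remark above that the all-ones special case usually needs no iteration can be checked directly in two dimensions. This is a sketch only; the symbols are assumptions introduced here, not the article's notation: $u_i$ and $v_j$ are the target row and column totals, $N=\sum_i u_i=\sum_j v_j$, $J$ is the number of columns, and $m^{(0)}_{ij}=1$ is the seed.

$$
m^{(1)}_{ij} = m^{(0)}_{ij}\,\frac{u_i}{\sum_{k} m^{(0)}_{ik}} = \frac{u_i}{J},
\qquad
m^{(2)}_{ij} = m^{(1)}_{ij}\,\frac{v_j}{\sum_{k} m^{(1)}_{kj}} = \frac{u_i}{J}\cdot\frac{v_j}{N/J} = \frac{u_i v_j}{N}.
$$

Since $\sum_j m^{(2)}_{ij} = u_i$ and $\sum_i m^{(2)}_{ij} = v_j$, both sets of marginals already match after one row step and one column step. This agrees with the direct calculation quoted earlier on this page, (87/100)*(52/100)*100 = 45.24.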

poorly explained as no usage cases

Despite this article claiming usage in several fields, there are no sources for those uses. And the article is very poor at explaining what IPF is or how it is of practical use. Which is a pity, as it is quite simple to explain this. https://www.tandfonline.com/doi/full/10.1080/00330124.2015.1099449 has a go at doing this, and maybe someone can bring the relevant material over so that someone who did not finish high school and has not studied economics, mathematics, or computer-related fields can understand.

My understanding is that IPF allows gaps in data to be filled through a process of comparing data from different sources. The linked paper puts it this way: "Iterative proportional fitting (IPF) is a technique that can be used to adjust a distribution reported in one data set by totals reported in others. IPF is used to revise tables of data where the information is incomplete, inaccurate, outdated, or a sample." -- Preceding unsigned comment added by 88.112.30.115 (talk) 18:29, 29 November 2020 (UTC)